Cause and Corrective Action – Engineering an Escape from Man-Made Disaster

Unexplained field failures are the bane of new product launches. This is especially true when the product is sold based on its theoretical reliability, but its performance in practice falls short.

A Case Study

In one market incident, a large fraction of gas turbine engine production failed catastrophically within one year of commissioning, far short of the anticipated five- to ten-year life expectancy. Every engine in the group showed a specific “burn pattern” on the shaft at the location of its seized gas foil journal bearings, but the similarities ended there. The lot mixed builds with parts from different manufacturers, bearings with different design details, and both remanufactured and new components. A naive physical examination of the components of all the engines failed to find a common denominator. The circumstances required a more rational approach of the kind that good statistics can bring to the table.

Fortunately, electronic monitoring of all the engines in the field had permitted the collection of clean time-to-failure data. We analyzed that data with the Weibull method, which works with small data sets and can indicate the number and type of failure modes present. When the failures were divided into groups according to the analysis, patterns emerged.
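
For readers unfamiliar with the method, the following is a minimal sketch of one common way to fit a two-parameter Weibull distribution to a small set of failure times: median rank regression with Benard's approximation. It is not the tool or the data used in this study. The function name `weibull_mrr` and the hours-to-failure values are invented for illustration, and the sketch assumes every unit in the sample has failed (no suspensions).

```python
import math

def weibull_mrr(times):
    """Fit a two-parameter Weibull by median rank regression.

    Returns (beta, eta): the shape parameter and the characteristic life.
    Assumes complete data: every unit failed, with no suspensions.
    """
    t = sorted(times)
    n = len(t)
    # Benard's approximation to the median rank of the i-th ordered failure
    ranks = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]
    # Linearized Weibull CDF: ln(-ln(1 - F)) = beta * ln(t) - beta * ln(eta)
    x = [math.log(ti) for ti in t]
    y = [math.log(-math.log(1.0 - f)) for f in ranks]
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx                  # slope of the least-squares line
    intercept = y_bar - beta * x_bar
    eta = math.exp(-intercept / beta)
    return beta, eta

# Hypothetical hours-to-failure, invented for illustration only
hours = [310, 480, 560, 640, 720, 790, 880, 1010]
beta, eta = weibull_mrr(hours)
print(f"shape beta = {beta:.2f}, characteristic life eta = {eta:.0f} h")
```

The sketch regresses the transformed rank on log-time for simplicity; some Weibull practitioners prefer to regress in the other direction. On a Weibull probability plot, a sharp bend in the points is a common sign that more than one failure mode is mixed into the sample.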

Statistical Analysis Results

One group, consisting of half the failures, conformed to the pattern of infant mortality, which means that most of the failures occurred almost immediately after commissioning. The bearings that supported the high-speed rotating assembly in this group were all of the same recently-introduced type. When the manufacturing and materials teams examined bearings sampled from the factory process stream, they found cracks in some of the elements, caused by the forming process. Corrective action, at least for the moment, consisted of reversion to the former bearing type.

The analysis identified a second failure mode, accounting for about a third of all failures, in which most of the failures occurred at close to the same time in service. This pattern is usually termed a “wearout” failure mode, and some people compared it to “hitting a wall”.

After additional engines had accumulated in our “morgue”, a third failure mode emerged from the data. It showed a more normal distribution, as if the design life had been reduced by something in the environment that affected all of the engines in the group equally.
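
The three patterns described above correspond to different ranges of the Weibull shape parameter, usually written beta. The sketch below is an illustration under stated assumptions, not a record of the fitted values, which are not reported here. The function names, the tolerance band around beta = 1, and the sample beta values are all hypothetical. The skewness formula is the standard one for a Weibull distribution; it is close to zero near beta = 3.6, which is why a Weibull fit in that range looks roughly normal.

```python
import math

def weibull_skewness(beta):
    """Skewness of a Weibull distribution; it depends only on the shape beta."""
    g1 = math.gamma(1 + 1 / beta)
    g2 = math.gamma(1 + 2 / beta)
    g3 = math.gamma(1 + 3 / beta)
    variance = g2 - g1 ** 2
    # Third central moment divided by the cube of the standard deviation
    return (g3 - 3 * g1 * variance - g1 ** 3) / variance ** 1.5

def describe_shape(beta, tol=0.1):
    """Map a shape parameter to the usual reading (tol is an assumed band)."""
    if beta <= 0:
        raise ValueError("shape parameter must be positive")
    if beta < 1 - tol:
        return "infant mortality (failure rate falls with age)"
    if beta <= 1 + tol:
        return "random (roughly constant failure rate)"
    return "wearout (failure rate rises with age)"

# Illustrative shape values only, not results from the study
for beta in (0.5, 1.0, 3.6, 8.0):
    print(beta, describe_shape(beta), round(weibull_skewness(beta), 2))
```

A large beta means the failures bunch tightly around one age, which matches the “hitting a wall” description of the second mode.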

At this stage, we had enough information to associate a type of failure mode with each failed engine, and we had identified the physical cause of one of them. Identifying the underlying physics of the remaining two failure modes turned out to be something of a challenge.

Cause and Corrective Action

Building on existing knowledge, we created a Failure Modes and Effects Analysis (FMEA) that incorporated both design and manufacturing potentials in a single matrix, but otherwise followed Automotive Industry Action Group (AIAG) rules. The result of an FMEA is an array that identifies the potential failures associated with each system, subsystem, and component, along with their likelihood, severity, and ease of early detection. When the process was complete, we were able to identify four leading suspects for the remaining two failure modes, not counting interactions. Pareto charting was ultimately helpful, although its use, and that of Quality Function Deployment (QFD) tools, met with much resistance from some members of the team. That, however, is another story.

The fattest targets in our war plan turned out to be rotor imbalance and cooling flow issues. Rotor imbalance was a particularly thorny problem because it involved both in-house and vendor processes. The third and fourth items, the bearing dynamic characteristics of stiffness and damping, had never been measured, even approximately, in any satisfactory way.
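
The ranking step can be sketched as follows. In a traditional AIAG-style FMEA, each potential failure is rated from 1 to 10 for severity, occurrence, and detection, and the product of the three is the Risk Priority Number (RPN). Sorting by RPN and accumulating the shares gives the Pareto ordering. The four item names come from the discussion above, but every rating here is invented for illustration and does not reproduce the team's worksheet; `FmeaRow` and `pareto` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class FmeaRow:
    item: str
    severity: int    # 1 (negligible effect) .. 10 (most severe)
    occurrence: int  # 1 (remote) .. 10 (almost certain)
    detection: int   # 1 (almost certain to be caught) .. 10 (undetectable)

    @property
    def rpn(self) -> int:
        """Risk Priority Number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection

def pareto(rows):
    """Return (item, rpn, cumulative share) tuples, highest RPN first."""
    ranked = sorted(rows, key=lambda r: r.rpn, reverse=True)
    total = sum(r.rpn for r in ranked)
    running = 0
    table = []
    for r in ranked:
        running += r.rpn
        table.append((r.item, r.rpn, running / total))
    return table

# Hypothetical ratings, invented for illustration; not the study's values
rows = [
    FmeaRow("bearing damping", 8, 3, 9),
    FmeaRow("rotor imbalance", 9, 7, 8),
    FmeaRow("bearing stiffness", 8, 4, 9),
    FmeaRow("secondary cooling flow", 9, 6, 8),
]
table = pareto(rows)
for item, rpn, share in table:
    print(f"{item:24s} RPN={rpn:4d} cumulative={share:.0%}")
```

High detection scores mean the problem is hard to catch early, which is why items that had never been measured satisfactorily can rank well up the list even when their occurrence rating is modest.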

Process mapping and repeatability and reproducibility (R&R) measurements in the balance room convinced us that the uncertainty of our balance measurements had been at least an order of magnitude greater than the design limits. Further, the results depended strongly on assembly technique, and the onset of field failures correlated with both a transfer of balance room personnel and an increase in production.
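
As a rough illustration of what an R&R study estimates, the sketch below separates the spread of repeated balance readings on a single rotor into repeatability (the same operator repeating the measurement) and reproducibility (differences between operators), then compares the combined spread with a design limit. It is a simplified one-part variance-components estimate, not the full study procedure used in the balance room. The function name, the readings, their units, and the design limit are all hypothetical.

```python
import statistics

def gauge_spread(readings_by_operator):
    """Return (repeatability_sd, reproducibility_sd, combined_sd) for one part.

    Assumes every operator took the same number of readings of the same part.
    """
    groups = list(readings_by_operator.values())
    n = len(groups[0])
    # Repeatability: average within-operator variance
    within_var = statistics.mean([statistics.variance(g) for g in groups])
    # Reproducibility: variance of the operator means, less the part of it
    # that repeatability alone would produce (floored at zero)
    means_var = statistics.variance([statistics.mean(g) for g in groups])
    between_var = max(means_var - within_var / n, 0.0)
    combined_var = within_var + between_var
    return within_var ** 0.5, between_var ** 0.5, combined_var ** 0.5

# Hypothetical residual-unbalance readings in arbitrary units
readings = {
    "operator A": [4.1, 9.8, 6.5],
    "operator B": [12.2, 15.9, 9.4],
    "operator C": [2.3, 7.7, 5.0],
}
design_limit = 0.5  # hypothetical limit in the same arbitrary units

repeat_sd, reprod_sd, combined_sd = gauge_spread(readings)
print(f"repeatability sd   = {repeat_sd:.2f}")
print(f"reproducibility sd = {reprod_sd:.2f}")
print(f"combined sd / design limit = {combined_sd / design_limit:.1f}")
```

When the ratio in the last line is well above one, the measurement system cannot tell a good rotor from a bad one, whatever the acceptance test says.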

The physical clues that pointed us in the direction of a secondary cooling flow problem were discoloration of a heat shield adjacent to the turbine, and (by process mapping again) conceptual errors in the way the compression of an important static seal was measured. A combination of finite element analysis (FEA) and computational fluid dynamics (CFD) led us to the conclusion that leakage past the seal likely elevated the temperature of the turbine bearing coating beyond its design limits.

Follow-up testing, planned with the “Design-Expert” software tool by Stat-Ease, gave us 75% confidence that unintentional balance errors of the magnitude we measured would result in vibration sufficient to cause progressive bearing damage, culminating in failure, in engines that would still pass our acceptance test! We could not check the high temperature levels predicted by our analysis due to a lack of nonintrusive instrumentation.

Because the rotor speed, at nearly 100,000 rpm, pushed the inside of the balancing envelope, the original specification had required both component and group balancing. The complicating factor was that the design of the turbine required disassembly of the rotor after balancing, with the potential for an out-of-balance condition arising from reassembly during build. Moreover, the rotor design called for a cost-effective, but potentially damaging, radial interference fit between the shaft and both rotating aerodynamic components (a compressor impeller and a turbine rotor).
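
To give a feel for why speeds near 100,000 rpm leave so little room in balancing, the sketch below applies the balance quality grade relation used in ISO 21940-11 (formerly ISO 1940-1), in which the grade G is the product of permissible mass eccentricity and angular speed, in mm/s. The grade G 2.5 and the 1 kg rotor mass are assumptions chosen for illustration; neither the grade nor the mass specified for this engine is stated here, and the function names are hypothetical.

```python
import math

def permissible_eccentricity_um(grade_mm_s, speed_rpm):
    """Permissible mass eccentricity in micrometres for a balance grade G."""
    omega = 2 * math.pi * speed_rpm / 60   # angular speed in rad/s
    return grade_mm_s / omega * 1000       # mm converted to micrometres

def unbalance_force_n(mass_kg, eccentricity_um, speed_rpm):
    """Rotating force in newtons produced by a given mass eccentricity."""
    omega = 2 * math.pi * speed_rpm / 60
    return mass_kg * (eccentricity_um * 1e-6) * omega ** 2

speed = 100_000        # rpm, the approximate rotor speed quoted above
grade = 2.5            # mm/s, assumed grade for illustration only
rotor_mass = 1.0       # kg, assumed mass for illustration only

e_per = permissible_eccentricity_um(grade, speed)
force = unbalance_force_n(rotor_mass, e_per, speed)
print(f"permissible eccentricity = {e_per:.3f} micrometres")
print(f"force at that limit      = {force:.1f} N")
```

Because the force grows with the square of speed, a shift of the rotor's mass center by a fraction of a micrometre on reassembly, or from uneven seating of an interference fit, is enough to matter at this speed.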
