5 March 2026

Reliability Engineering for Wind: Why Reactive Doesn't Work

Wind turbines are designed to operate for 20 years or more in harsh environments. The fact that most of them do is a testament to good design and, in many cases, good maintenance. But "still running" is not the same as "running well," and the difference between the two often comes down to whether anyone is paying sustained, structured attention to reliability.

The P-F Curve: A Framework for Thinking About Failure

At the heart of reliability engineering is a simple concept known as the P-F curve. It describes the timeline of a component's journey from healthy operation through to functional failure, and the different maintenance strategies that apply at each stage.

The first line of defence is preventive maintenance. This is about keeping components within their design parameters so that defects are less likely to form in the first place. That means ensuring oil is clean and at the right temperature, bolts are at the correct torque, filters are changed on schedule, and control systems are operating as intended. None of this is glamorous, but it is the foundation. A component that consistently operates within its design envelope has the best chance of reaching its intended service life without developing problems.

Despite best efforts, defects will still form. Bearings will begin to pit, gear teeth will develop surface damage, blade surfaces will erode, and electrical insulation will progressively age. This is where predictive maintenance takes over. The objective is to detect the defect as early as possible after it forms, the "P" (potential failure) point on the curve, and to monitor its progression so that intervention can be planned before it reaches the "F" point, functional failure. The tools here are condition monitoring, oil analysis, borescope inspections, SCADA trend analysis, and visual and audio inspections. The earlier the detection, the longer the window available to plan a response, and the more options are on the table. A gearbox bearing defect detected six months before it would cause a shutdown gives you time to source a replacement and choose your weather window. The same defect detected six days before does not.

When both preventive and predictive strategies fail, or when a failure mode was not anticipated, the result is reactive maintenance. The component has reached functional failure, the turbine is shut down, and the priority shifts to restoring operation as quickly as possible. This is invariably the most expensive outcome: unplanned downtime, emergency procurement, mobilisation at short notice, and secondary damage to other components, because failures at this stage are typically catastrophic.

The goal of a reliability programme is to push as much activity as possible towards the left-hand side of the P-F curve. Maximise the preventive work that stops defects forming. Invest in the predictive capability that catches them early when they do. Minimise the proportion of failures that reach the reactive stage. No programme will eliminate reactive maintenance entirely, but a well-structured one will make it the exception rather than the routine.

Getting the Basics Right

If the P-F curve provides the framework, the preventive layer is where reliability is won or lost in practice. And it starts with something deceptively simple: keeping components within their design parameters. This is fundamentally the work of your maintenance technicians. The industry does not always value this role as highly as it should, but the reality is that a skilled technician completing routine preventive maintenance to a high standard does more for long-term reliability than any amount of engineering analysis after the fact. Everything that follows in the P-F curve, the detection, the interpretation, the intervention planning, depends on this foundation being solid.

This sounds obvious. In practice, it is surprisingly easy for things to slip. A gearbox oil temperature that has been creeping up over six months may not trigger an alarm, but it may indicate a cooling system issue that, left unaddressed, will shorten gearbox life. An automatic greasing reservoir that runs dry because of delayed maintenance will not cause a shutdown, but it may initiate a defect that is picked up weeks later via condition monitoring or, worse, only when the bearing catastrophically fails.

None of these are dramatic events at the time, and none are likely to appear on a monthly operations report. But they are exactly the kind of issues that a structured reliability programme is designed to catch before they become expensive. They are the preventable precursors to the defects that then need to be detected and managed further down the P-F curve.
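
As a rough illustration of how the creeping oil temperature described above might be screened for, the sketch below fits a straight line to monthly mean gearbox oil temperatures and flags a turbine whose readings are all below the alarm limit but are rising faster than a chosen rate. The readings, the 75 °C alarm limit, and the 0.5 °C per month threshold are illustrative assumptions, not values from any particular turbine or OEM. A real screen would also normalise for ambient temperature and power output.

```python
from statistics import mean


def trend_per_step(values):
    """Least-squares slope of the values against their position in the series."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two readings to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def flag_creep(values, alarm_limit, slope_limit):
    """True when no reading has reached the alarm limit but the trend is steep."""
    below_alarm = max(values) < alarm_limit
    return below_alarm and trend_per_step(values) > slope_limit


# Hypothetical monthly mean gearbox oil temperatures in deg C (assumed data).
oil_temp_c = [58.0, 58.5, 59.5, 60.0, 61.0, 61.5]
ALARM_LIMIT_C = 75.0             # assumed alarm threshold
SLOPE_LIMIT_C_PER_MONTH = 0.5    # assumed screening threshold

if flag_creep(oil_temp_c, ALARM_LIMIT_C, SLOPE_LIMIT_C_PER_MONTH):
    print("Oil temperature is rising below the alarm limit: review the cooling system")
```

A flag from a screen like this is a prompt to inspect the cooling system, not a diagnosis.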

The Shift in Risk

For much of the onshore wind industry's history, the question of major component risk was largely somebody else's problem. Full-scope OEM service contracts covered the main drivetrain components, and the operator's exposure was limited to the terms of the agreement. If a gearbox failed, the OEM managed the replacement. The cost was significant, but it was known and contracted.

As we have detailed previously on this blog, the industry is changing. OEMs are increasingly selective about the contracts they offer, particularly on older platforms and smaller fleets. Many operators are moving to reduced-scope or independent maintenance arrangements, whether by choice or because the alternative is no longer available at an acceptable price. This shift is well understood across the industry, and for many operators it makes sound commercial sense.

But it changes the reliability conversation fundamentally. When you carry major component risk on your own balance sheet, a major component replacement is not a contract claim. It is a capital event, often costing hundreds of thousands of pounds, requiring crane mobilisation, procurement lead times, and potentially months of lost production.

Under a full-scope contract, the operator needed to know that the turbine was running. Under a self-managed or reduced-scope arrangement, the operator needs to know how it is running, in enough detail to make informed decisions about when to intervene, what to prioritise, and how to plan capital expenditure against a realistic view of component condition. In the language of the P-F curve, the operator now owns the entire curve, not just the reactive end of it.

This is where reliability engineering earns its value. Not as a reporting function, but as the engineering-led capability that interprets findings, identifies risks, and supports decisions where the cost of getting it wrong can be measured in six figures.

The Interpretation Challenge

Having data and having the ability to interpret it are not the same thing. Most operators have access to SCADA data, maintenance records, inspection reports, and oil analysis results. The challenge is knowing what to do with them.

A borescope report showing micropitting on intermediate-stage gearbox teeth raises a question, but it does not answer it. Is this within acceptable limits for the age and loading history of the component? Is it progressing, and if so, at what rate? Does it warrant continued monitoring, a change in oil, a planned uptower replacement at the next suitable weather window, a full gearbox exchange, turbine curtailment, or an immediate shutdown? The cost difference between these options can be enormous, and the decision requires engineering judgement grounded in an understanding of the specific turbine, its operational history, and the broader fleet context.

This is the predictive layer of the P-F curve in action. The defect has been detected. The question is: where are we on the curve, how fast are we moving along it, and what is the right response?

The same applies to blade inspection findings, tower bolt torque results, vibration trends, and any number of other condition indicators. Each finding needs to be assessed not in isolation, but in the context of the turbine's design basis, its loading history, and its remaining operational life. A finding that is acceptable on a turbine with five years left to run may not be acceptable on one with fifteen.

For operators who have spent years under full-scope contracts, this interpretive capability may not exist in-house, because it was never needed. The OEM made the engineering decisions and managed the risk. Building that capability from scratch takes time, and in the interim, the decisions still need to be made.
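
The question of where a defect sits on the curve, and how fast it is moving, can be roughed out numerically once a trend exists. The sketch below extrapolates a condition indicator linearly to an intervention limit and compares the result with a planning lead time. The indicator values, the limit, and the four-month lead time are hypothetical. Real defect progression usually accelerates, so a linear estimate should be read as an optimistic upper bound rather than a forecast.

```python
from statistics import mean


def least_squares_slope(readings):
    """Least-squares slope of evenly spaced readings (units per interval)."""
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = mean(readings)
    num = sum((i - x_mean) * (r - y_mean) for i, r in enumerate(readings))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def intervals_to_limit(readings, limit):
    """Intervals until the trend line reaches the limit, or None if not rising."""
    slope = least_squares_slope(readings)
    if slope <= 0:
        return None
    return max(0.0, (limit - readings[-1]) / slope)


def can_plan(readings, limit, lead_time):
    """True when the extrapolated window covers the planning lead time."""
    remaining = intervals_to_limit(readings, limit)
    return remaining is None or remaining >= lead_time


# Hypothetical monthly values of a bearing vibration indicator (assumed data).
indicator = [1.0, 1.2, 1.4, 1.6]
INTERVENTION_LIMIT = 2.8   # assumed level at which the turbine must be stopped
LEAD_TIME_MONTHS = 4       # assumed time to source parts and book a crane

print("Estimated months to limit:", intervals_to_limit(indicator, INTERVENTION_LIMIT))
print("Planned intervention feasible:", can_plan(indicator, INTERVENTION_LIMIT, LEAD_TIME_MONTHS))
```

The number itself matters less than the habit: each new reading updates the estimate, and a shrinking window is the signal to move from monitoring to planning.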

The Staffing Reality

Many operators recognise the need for dedicated reliability engineering but struggle to resource it practically. A reliability engineer with genuine wind turbine experience, someone who understands drivetrain failure modes, blade structural behaviour, control system dynamics, and the commercial context in which these assets operate, is a specialist hire. For a portfolio of 10 wind farms, the economics of a permanent full-time role are hard to justify. A fleet that mixes manufacturers, models, and vintages makes the case harder still, because it demands a breadth of technical knowledge that would stretch even an experienced individual.

This is especially true for smaller operators, community-owned schemes, and portfolios that have grown through acquisition. The engineering challenges are no less complex than those facing a large utility-scale operator, but the budget available to address them is substantially smaller.

Outsourcing to a large consultancy on a project-by-project basis provides access to expertise when needed, but introduces its own limitations. Each engagement begins with a mobilisation period, a data familiarisation phase, and a scoping discussion. The consultant delivers a report, the recommendations are actioned, and the accumulated knowledge of the fleet disperses until the next engagement. There is limited continuity, and the operator pays the learning curve cost each time.

A Retained Model

The alternative is a retained technical partner who maintains an ongoing, evolving understanding of the fleet. Under a retainer arrangement, the scope is not defined by a single problem or a single deliverable. It is defined by an agreed allocation of engineering time, a bank of hours, applied flexibly across whatever the portfolio needs at any given point.

In one month, that might mean reviewing gearbox oil analysis results and assessing whether a blade suction-side crack warrants further investigation or continued monitoring. In another, it might be a fleet-wide screening of pitch system alarm data after a pattern has been noticed. In a third, it might be supporting the operations and commercial teams in the refurbishment and storage of a generator.

The value of this approach is cumulative. The retained engineers develop familiarity with the specific turbines, the site conditions, the maintenance history, and the operational context. Patterns that would not be visible in a one-off study become apparent when the same fleet is being observed month after month. The cost of each engagement is lower because there is no mobilisation overhead or familiarisation period, and the operator has access to engineering judgement when they need it rather than when a procurement process allows it.

For smaller and more varied fleets, this model is particularly effective. It provides access to the same depth of engineering capability that a large operator might employ in-house, at a fraction of the cost and without the recruitment challenge of finding a single individual who covers every turbine type in the portfolio. The retained partner brings experience across multiple platforms, and the bank-of-hours structure means the operator only pays for what they use.

Prevention, Not Just Response

The operators who manage reliability well are not necessarily the ones with the largest teams or the most expensive monitoring systems. They are the ones who maintain a continuous line of sight on how their turbines are actually operating, who ensure the basics are being done well, and who invest in the engineering capability to interpret what they see and act before problems escalate.

Reliability engineering is not a luxury reserved for large portfolios. It is the discipline that keeps components within their design parameters, catches developing issues while intervention is still straightforward, and ensures that when a significant engineering decision does arise, it is made on the basis of evidence and judgement rather than urgency and guesswork.

Getting Started

At Atharra, we offer retained reliability engineering support, providing operators with flexible access to specialist wind turbine engineering. We will stand alongside you and help embed reliability into your fleet, supporting you with activities such as SCADA-based fleet screening, failure investigation, alarm trend analysis, component condition interpretation, maintenance review, and technical support for the engineering decisions that matter most.

If you are interested in exploring how a retained model could work for your portfolio, we would welcome a conversation.

Please get in touch at info@atharra.com.