O&M Best Practice Issue Discussion: Maintenance Approaches
Table of Contents
- Types of Maintenance Programs
- Maintenance Approaches (FEMP 2010)
- Reactive Maintenance
- Preventive Maintenance
- Predictive Maintenance
- Solutions and Actions
- Conclusions and Next Steps
- Additional Resources
- FEMP O&M Best Practices Website
Over the past century, maintenance has evolved from a nuisance cost of doing business into an economic, strategic, and resilience function. Where maintenance was once a sub-function within an organization, designed to handle equipment failure, it has now become an essential organizational function in its own right.
As buildings, systems, and equipment have become more technologically complex, so too have maintenance approaches, processes, and procedures. What started over 100 years ago as failure response and correction has evolved into failure analysis; simple preventive activities have given way to predictive functions; and the nuisance cost of doing business has become a reliability-based strategic function within many agencies and industries.
Well-practiced operations and maintenance (O&M) is one of the most cost-effective methods for assuring reliability, safety, resilience, and energy efficiency. Good maintenance practices can generate substantial energy savings and should be considered a resource. Moreover, improvements to facility maintenance programs can often be accomplished immediately and at a relatively low cost.
While there are many definitions of O&M, the following is particularly comprehensive (FEMP 2010):
Operations and Maintenance are the decisions and actions regarding the control and upkeep of property and equipment. These are inclusive, but not limited to, the following: (1) actions focused on scheduling, procedures, and work/systems control and optimization; and (2) performance of routine, preventive, predictive, scheduled and unscheduled actions aimed at preventing equipment failure or decline with the goal of increasing efficiency, reliability, resilience, and safety.
Modern, effective O&M programs rely on four basic approaches: (1) reactive/corrective (includes run-to-failure) O&M—fix or replace when broken; (2) preventive O&M—time-based actions; (3) predictive O&M—fix it before it breaks; and (4) reliability-centered O&M—a strategic combination of the previous three approaches. This Best Practice will discuss the first three approaches and their relevancy in O&M program development.
It has been estimated that O&M programs targeting energy efficiency can save 5% to 20% on energy bills without a significant capital investment (PECI 1999). From small to large sites, these savings can represent thousands to hundreds of thousands of dollars each year, and many of them can be achieved with minimal cash outlays.
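To make the 5% to 20% range concrete, the short Python sketch below converts it into annual dollar savings for a site. The $250,000 annual energy bill is a hypothetical input, not a figure from PECI (1999); only the percentage range comes from the estimate above.

```python
def energy_savings_range(annual_energy_bill, low_pct=0.05, high_pct=0.20):
    """Return (low, high) annual savings for an energy-focused O&M program.

    low_pct and high_pct default to the 5% to 20% range estimated by PECI (1999).
    """
    if annual_energy_bill < 0:
        raise ValueError("annual_energy_bill must be non-negative")
    return annual_energy_bill * low_pct, annual_energy_bill * high_pct


# Hypothetical site spending $250,000 per year on energy.
low, high = energy_savings_range(250_000)
print(f"Estimated savings: ${low:,.0f} to ${high:,.0f} per year")
# Estimated savings: $12,500 to $50,000 per year
```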
The need for effective building O&M is illustrated in Figure 1. The figure shows how, over time, the performance of a building (and its components) eventually degrades under three scenarios: without normal maintenance, with normal maintenance, and with optimal maintenance. Of note in the figure is the prolonged service life achieved through effective O&M. The figure does not show the additional benefit of reduced building energy operating costs that results from effectively maintaining mechanical and electrical equipment (e.g., lighting; heating, ventilation, and air conditioning [HVAC]; controls; and on-site generation).
Beyond the potential for significant cost and energy/resource savings, an O&M program operating at peak efficiency has other important implications (FEMP 2010):
- A well-functioning O&M program is a safe O&M program. Equipment is maintained properly, mitigating any potential hazard arising from deferred maintenance.
- In most federal buildings, O&M staff are responsible not only for the comfort but also for the health and safety of the occupants. Indoor air quality (IAQ) is of increasing productivity (and legal) concern in these buildings. Proper O&M reduces the risk that dangerous and costly IAQ situations will develop.
- Properly performed O&M increases the probability that the design-life expectancy of equipment will be achieved and, in some cases, exceeded. Conversely, the costs associated with early equipment failure are usually not budgeted for and often come at the expense of other planned O&M activities.
- An effective O&M program supports site compliance with federal legislation such as the Clean Air Act and the Clean Water Act, as well as with federal energy efficiency requirements and current and future carbon management requirements.
- A well-functioning O&M program is proactive rather than complaint-driven: it corrects situations before they become problems. This model minimizes callbacks and keeps occupants satisfied while allowing more time for scheduled maintenance.
Types of Maintenance Programs
An accepted need for maintenance is predicated on actual or impending failure. Ideally, maintenance is performed to keep equipment and systems running efficiently for at least the design life of the component(s). In practice, therefore, a component's operation is a function of time. A graph of the failure rate of a component population versus time would likely take the “bathtub” shape shown in Figure 2, where the y-axis represents the failure rate and the x-axis represents time. The curve can be divided into three distinct parts: the infant mortality, useful life, and wear-out periods.
The initial infant mortality period of a bathtub curve is characterized by a high failure rate that decreases over time. Many of the failures in this region are linked to poor design, poor installation, or misapplication. The infant mortality period is followed by a period of nearly constant failure rate known as useful life. There are many theories on why components fail in this region, but most acknowledge that poor O&M often plays a significant role. There is also general consensus that exceptional maintenance practices encompassing preventive and predictive elements can extend this period. The wear-out period is characterized by a failure rate that increases rapidly with time. This period usually encompasses the normal distribution of design-life failures.
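One common way to model the bathtub curve is to add three hazard (failure-rate) terms. This is offered as an illustrative assumption, not as FEMP's method:

- a Weibull term with shape β < 1 for infant mortality,
- a constant term for the useful-life period,
- a Weibull term with β > 1 for wear-out.

The Weibull hazard is h(t) = (β/η)(t/η)^(β−1), where η is the scale parameter. All parameter values below are hypothetical and were chosen only to reproduce the three regions described above.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard (instantaneous failure rate): (shape/scale) * (t/scale)**(shape - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t, infant=(0.5, 1e6), constant_rate=1e-4, wearout=(4.0, 60_000.0)):
    """Failure rate (failures per operating hour) at operating time t > 0 hours.

    infant:        (shape < 1, scale) -> decreasing term that dominates early life
    constant_rate: random failures during the useful-life period
    wearout:       (shape > 1, scale) -> increasing term that dominates late life
    All defaults are hypothetical values chosen only to produce a bathtub shape.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    return weibull_hazard(t, *infant) + constant_rate + weibull_hazard(t, *wearout)


for hours in (1, 10, 100, 1_000, 10_000, 30_000, 60_000, 100_000):
    print(f"{hours:>7,} h: {bathtub_hazard(hours):.2e} failures/h")
```

With these inputs the printed rate falls steeply over the first few hundred hours and stays close to the constant term through the middle of the range. It then climbs as the wear-out term takes over.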
Maintenance Approaches (FEMP 2010)
Achieving the design life of most equipment requires periodic maintenance. For example, belts need adjustment, alignment must be maintained, and rotating equipment requires proper lubrication. Certain components usually need replacement (e.g., a bearing on a distribution fan or pump) to improve the probability that the main piece of equipment (in this case, the HVAC system) lasts for its design life.
Any time we fail to perform maintenance activities intended by the equipment's designer, we risk shortening the equipment's operating life. Over the last 40 years, starting with the U.S. airline industry, different approaches have been developed for performing maintenance so that equipment reaches or exceeds its design life. In addition to waiting for a piece of equipment to fail (reactive maintenance), well-defined programs now use preventive maintenance (time-based actions), predictive maintenance (fix before failure), or an optimized combination of all three in a reliability-centered maintenance approach.
Reactive Maintenance
Reactive maintenance (also called corrective maintenance) is the “run it until it breaks” maintenance approach. No actions or efforts are taken to maintain the equipment as the designer originally intended to assure design life is reached. Reactive maintenance is almost always unscheduled because of the unpredicted failure. The sole function of reactive maintenance is to restore the device or system to a functioning condition after the failure has occurred; this may include device repair or replacement.
The benefits of reactive maintenance can be viewed as a double-edged sword. If we are working with new equipment, we can expect minimal incidents of failure at the beginning of its operating life. If our maintenance program is purely reactive, we will not expend labor costs or incur capital costs until something breaks. Because we do not see any associated maintenance cost, we could view this period as saving money.
The risk increases during the period when we believe we are saving maintenance and capital cost but are actually spending more money than we would under a different maintenance approach.

Capital costs rise because, while waiting for equipment to break, we shorten its life and increase replacement frequency. Beyond the cost of the primary device's failure, we may also cause a secondary device to fail, and these additional costs can be significant. A more proactive maintenance program would have avoided them.

Repair labor costs will probably be higher than normal, because a run-to-failure event usually requires more extensive repairs than would otherwise have been needed. The equipment may also fail during off hours or near the end of the normal workday. If it is a critical piece of equipment that must be back online quickly, we must pay maintenance overtime and run the risk of disrupting critical services, missions, or tenant activities.

Finally, because we expect to run equipment to failure, we will require a large inventory of repair parts. A different maintenance strategy could minimize this cost.
Advantages
- Low cost.
- Less staff.
Disadvantages
- Increased cost due to unplanned equipment downtime.
- Increased labor cost, especially if overtime is needed.
- Cost involved with repair or replacement of equipment.
- Possible secondary equipment or process damage from equipment failure.
- Inefficient use of staff resources.
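The cost drivers listed above can be combined into a rough expected cost per run-to-failure event. The sketch below is a hypothetical model, not a FEMP formula, and it does not capture the repair-parts inventory cost. Every field name and value is an assumption for illustration; real figures should come from a site's own work-order and cost records.

```python
from dataclasses import dataclass


@dataclass
class FailureEvent:
    """Hypothetical cost inputs for one run-to-failure event (dollars, hours, or fractions)."""
    parts_and_equipment: float    # repair or replacement of the failed primary device
    labor_hours: float            # repair labor, often higher after a run-to-failure event
    labor_rate: float             # straight-time hourly labor rate
    overtime_share: float         # fraction of labor hours worked on overtime (0 to 1)
    overtime_multiplier: float    # e.g., 1.5 for time-and-a-half
    p_secondary_damage: float     # probability the failure damages another device (0 to 1)
    secondary_damage_cost: float  # cost to repair that secondary damage
    downtime_cost: float          # cost of service, mission, or tenant disruption


def expected_failure_cost(e: FailureEvent) -> float:
    """Sum the direct, labor, probability-weighted secondary, and downtime costs."""
    straight = e.labor_hours * (1 - e.overtime_share) * e.labor_rate
    overtime = e.labor_hours * e.overtime_share * e.labor_rate * e.overtime_multiplier
    secondary = e.p_secondary_damage * e.secondary_damage_cost
    return e.parts_and_equipment + straight + overtime + secondary + e.downtime_cost


event = FailureEvent(parts_and_equipment=4_000, labor_hours=20, labor_rate=60,
                     overtime_share=0.5, overtime_multiplier=1.5,
                     p_secondary_damage=0.25, secondary_damage_cost=6_000,
                     downtime_cost=2_000)
print(f"Expected cost per failure: ${expected_failure_cost(event):,.0f}")
# Expected cost per failure: $9,000
```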
Preventive Maintenance
Preventive maintenance can be defined as actions performed on a time-based or machine-run-based schedule that detect, preclude, or mitigate degradation of a component or system. The aim is to sustain or extend its useful life by controlling degradation to an acceptable level.
The U.S. Navy pioneered preventive maintenance as a means to increase the reliability of its vessels. By simply expending the resources needed to conduct the maintenance activities intended by the equipment designer, equipment life is extended and reliability is increased. In addition to greater reliability, such a program yields greater monetary savings than one that relies only on reactive maintenance; studies indicate that these savings average 12% to 18%. Depending on a facility's current maintenance practices, present equipment reliability, and facility downtime, there is little doubt that many facilities purely reliant on reactive maintenance could reduce costs by much more than 18% by instituting a proper preventive maintenance program (FEMP 2010).
While preventive maintenance is not the optimum maintenance program, it does have several advantages over a purely reactive program. By performing preventive maintenance as the equipment designer envisioned, we extend the life of the equipment closer to its design life, which also translates into monetary savings. Preventive maintenance (lubrication, filter changes, etc.) will generally keep equipment running more efficiently, reducing costs compared with a reactive approach. While this approach will not prevent all catastrophic equipment failures, it will decrease the number of failures. Fewer failures yield maintenance and capital cost savings through reduced downtime and reduced impacts on the facility, its mission, and tenants.
Advantages
- Cost effective in many capital-intensive processes.
- Flexibility allows for the adjustment of maintenance periodicity.
- Increased component life cycle.
- Energy savings.
- Reduced equipment or process failure.
- Estimated 12% to 18% cost savings over reactive maintenance program.
Disadvantages
- Catastrophic failures still likely to occur.
- Labor intensive.
- Includes performance of unneeded maintenance.
- Potential for incidental damage to components while conducting unneeded maintenance.
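Because preventive maintenance is triggered by calendar time or machine run time, whether a task is due can be computed without any condition data. The sketch below shows both trigger types. The `PMTask` structure, task names, intervals, and dates are hypothetical and are not drawn from any particular computerized maintenance management system (CMMS).

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class PMTask:
    """A hypothetical preventive maintenance task with a calendar and/or run-hour interval."""
    name: str
    last_done: date
    interval_days: Optional[int] = None          # calendar-based trigger
    run_hours_at_last_done: float = 0.0
    interval_run_hours: Optional[float] = None   # machine-run-based trigger


def is_due(task: PMTask, today: date, current_run_hours: float) -> bool:
    """Return True when either of the task's time-based triggers has been reached.

    Equipment condition is never consulted: that is what makes this rule
    preventive (time-based) rather than predictive (condition-based).
    """
    if task.interval_days is not None:
        if today >= task.last_done + timedelta(days=task.interval_days):
            return True
    if task.interval_run_hours is not None:
        if current_run_hours - task.run_hours_at_last_done >= task.interval_run_hours:
            return True
    return False


tasks = [
    PMTask("Replace AHU-1 filters", last_done=date(2021, 1, 4), interval_days=90),
    PMTask("Lubricate pump P-2 bearings", last_done=date(2021, 3, 1),
           run_hours_at_last_done=12_000, interval_run_hours=2_000),
]
today = date(2021, 4, 15)
pump_run_hours = 13_500  # hypothetical hour-meter reading on pump P-2
for task in tasks:
    # Run hours only matter for tasks that define a run-hour interval.
    due = is_due(task, today, pump_run_hours)
    print(f"{task.name}: {'DUE' if due else 'not due'}")
```

Here the filter task is due because its 90-day interval ended on April 4. The pump task is not yet due, since only 1,500 of its 2,000 run hours have elapsed.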
Predictive Maintenance
Predictive maintenance can be defined as follows: Measurements that detect the onset of system degradation (lower functional state), thereby allowing causal stressors to be eliminated or controlled before any significant deterioration in the component’s physical state. These measurements and their analysis and results are an indication of current and future system functionality.
Predictive maintenance differs from preventive maintenance by basing maintenance need on the actual condition of the machine rather than on a preset schedule. Recall that preventive maintenance is time based. Activities such as changing lubricant are based on time intervals, like calendar time or equipment run time. For example, most people change the oil in their vehicles every 3,000 to 5,000 miles traveled. This effectively bases oil change needs on equipment run time. No concern is given to the actual condition and performance capability of the oil. It is changed because it is time to do so. This methodology would be analogous to a preventive maintenance task. If, on the other hand, the operator of the car discounted the vehicle run time and had the oil analyzed at some periodicity to determine its actual condition and lubrication properties, they may be able to extend the oil change until the vehicle had traveled 10,000 miles. Of course, the cost and logistics to analyze a passenger vehicle’s oil may be prohibitive. However, performing this analysis on a large rotary screw compressor where oil volumes are measured in gallons may make more sense.
The fundamental difference between predictive and preventive maintenance is that predictive maintenance schedules maintenance tasks based on the quantified condition of the equipment, whereas preventive maintenance schedules maintenance tasks solely based on time.
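The oil-change analogy can be expressed as two decision rules side by side:

- The time-based rule uses the 5,000-mile interval from the example above.
- The condition-based rule is a hypothetical stand-in for an oil analysis result. It flags a change when measured viscosity drifts more than 10% from the new-oil value.

The 10% threshold and the sample readings are assumptions chosen for illustration. Actual condition limits should come from the equipment manufacturer or the oil analysis laboratory.

```python
def time_based_change_due(miles_since_change: float, interval_miles: float = 5_000) -> bool:
    """Preventive rule: change the oil on a fixed run-time (mileage) interval."""
    return miles_since_change >= interval_miles


def condition_based_change_due(measured_viscosity: float, new_oil_viscosity: float,
                               max_fractional_change: float = 0.10) -> bool:
    """Predictive rule: change the oil only when analysis shows it has degraded.

    Uses viscosity drift from the new-oil value as a hypothetical condition
    indicator; the 10% limit is an illustrative assumption.
    """
    drift = abs(measured_viscosity - new_oil_viscosity) / new_oil_viscosity
    return drift > max_fractional_change


# Hypothetical oil samples: (miles since last change, measured viscosity in cSt).
new_oil_cst = 10.0
samples = [(3_000, 9.9), (5_000, 9.6), (8_000, 9.3), (10_000, 8.8)]
for miles, cst in samples:
    print(f"{miles:>6} mi  time-based: {time_based_change_due(miles)!s:5}  "
          f"condition-based: {condition_based_change_due(cst, new_oil_cst)!s:5}")
```

With these hypothetical readings, the time-based rule calls for a change at 5,000 miles. The condition-based rule does not call for one until the 10,000-mile sample, which mirrors the extended interval described in the example.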
Traditional predictive maintenance technologies include:
- Oil analysis.
- Ultrasonic analysis.
- Vibration analysis.
- Motor analysis.
- Performance trending and root-cause analysis.
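Performance trending often means fitting a trend to periodic condition readings and estimating when an alarm limit will be reached, so that work can be planned before failure. The sketch below fits an ordinary least-squares line to hypothetical monthly vibration readings and projects when that line crosses a limit. The readings and the 7.0 mm/s alert limit are assumptions for illustration. Real alert levels should come from applicable vibration standards or the equipment manufacturer, and real degradation is often nonlinear.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def time_to_limit(xs, ys, limit):
    """Projected x at which the fitted trend reaches `limit`, or None if the trend is not rising."""
    intercept, slope = linear_fit(xs, ys)
    if slope <= 0:
        return None
    return (limit - intercept) / slope


# Hypothetical monthly vibration readings (mm/s RMS) on a fan bearing.
months = [0, 1, 2, 3, 4, 5]
velocity = [2.0, 2.2, 2.5, 2.7, 3.0, 3.2]
alert_limit = 7.0  # hypothetical alert level, mm/s
eta = time_to_limit(months, velocity, alert_limit)
print(f"Trend reaches {alert_limit} mm/s at about month {eta:.1f}")
```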
The benefits of predictive maintenance are many. A well-orchestrated predictive maintenance program will all but eliminate catastrophic equipment failures. This type of program allows maintenance activities to be scheduled so that overtime cost is minimized or eliminated. Inventory stock can be minimized, and parts can be ordered well ahead of time to support downstream maintenance needs. Equipment operation can be optimized, saving energy cost and increasing plant reliability. Past studies have estimated that a properly functioning predictive maintenance program can provide savings of 8% to 12% over a program that uses preventive maintenance alone. Depending on a facility's reliance on reactive maintenance and its material condition, it could recognize savings opportunities exceeding 30% to 40% (FEMP 2010).
Advantages
- Increased component operational life and availability.
- Allows for preemptive corrective actions.
- Decrease in equipment or process downtime.
- Decreased disruptions to mission and/or tenant activities.
- Decrease in costs for parts and labor.
- Better product quality.
- Improved worker and environmental safety.
- Improved worker morale.
- Energy savings.
- Estimated 8% to 12% cost savings over preventive maintenance program.
Disadvantages
- Increased investment in diagnostic equipment.
- Increased investment in staff training.
- Savings potential not readily seen by management.
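The two savings estimates in this section can be chained to gauge what moving from a purely reactive program might yield. They are 12% to 18% for preventive over reactive, and a further 8% to 12% for predictive over preventive.

The sketch assumes that each percentage applies to the cost remaining after the previous step. That is a simplification, not a method stated in FEMP (2010). Under that assumption the combined savings fall between about 19% and 28%. FEMP (2010) notes that facilities heavily reliant on reactive maintenance could see savings exceeding 30% to 40%, so treat these chained figures as an illustration rather than a prediction.

```python
from math import prod


def chained_savings(step_savings):
    """Combined fractional savings when each step's savings apply to the remaining cost."""
    for s in step_savings:
        if not 0 <= s < 1:
            raise ValueError("each step's savings must be in [0, 1)")
    return 1 - prod(1 - s for s in step_savings)


# Ranges quoted in this Best Practice (FEMP 2010).
preventive_over_reactive = (0.12, 0.18)
predictive_over_preventive = (0.08, 0.12)

low = chained_savings([preventive_over_reactive[0], predictive_over_preventive[0]])
high = chained_savings([preventive_over_reactive[1], predictive_over_preventive[1]])
print(f"Reactive -> preventive -> predictive: {low:.1%} to {high:.1%} savings")
# Reactive -> preventive -> predictive: 19.0% to 27.8% savings
```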
Figure 3 summarizes the benefits and risks of the major maintenance approaches.
Solutions and Actions
The key to optimizing an O&M program is understanding the major O&M approaches, including how each is defined and what its strengths and weaknesses are. With this understanding, an O&M approach can be developed, assessed, tracked, and improved.
The first step of this process is to “baseline” the O&M program. The goal of this activity is to assess your program by allocating the relative percentage of program activity (time or cost) to each of the three O&M approach categories described here. The fourth approach, reliability-centered maintenance (RCM), is addressed in a separate Best Practice. While few organizations track these metrics at highly refined intervals, an organized O&M program should be able to make reasonable estimates (within 5% to 10%) of how much of its activity falls into each of the three approach categories.
Once category percentages are estimated, they should be reviewed, shared with other departments for input and comment, and finalized. The finalized baseline values become the reference point from which O&M program assessment, tracking, and improvements can be defined and implemented.
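As a sketch of the baselining step, the Python below tallies labor hours from a list of work orders into the three approach categories. It reports each category as a percentage of the total. The work-order records and category labels are hypothetical. Cost could be summed in place of hours, and a real baseline would draw on the site's own maintenance records.

```python
from collections import defaultdict

CATEGORIES = ("reactive", "preventive", "predictive")


def approach_mix(work_orders):
    """Percentage of total labor hours in each maintenance approach category.

    work_orders: iterable of (category, hours) pairs, with category in CATEGORIES.
    """
    totals = defaultdict(float)
    for category, hours in work_orders:
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        totals[category] += hours
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("no hours recorded")
    return {c: 100 * totals[c] / grand_total for c in CATEGORIES}


# Hypothetical work-order export: (approach category, labor hours).
work_orders = [
    ("reactive", 120), ("preventive", 60), ("reactive", 90),
    ("predictive", 15), ("preventive", 45), ("reactive", 70),
]
for category, pct in approach_mix(work_orders).items():
    print(f"{category:<11}{pct:5.1f}%")
```

In this hypothetical case, reactive work accounts for 70% of labor hours. That is the kind of baseline the following steps aim to shift toward preventive and predictive activity.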
Every O&M program will have its own optimal mix of representation in the three approach categories. For federal sites, this mix will typically include representation in each of the three categories. Table 1 presents benchmarks for maintenance approaches.
Table 1. Benchmarks for maintenance approaches (adapted from FEMP 2010).
Once your O&M program has been defined by approach category, activities and goals should be developed to move toward a target goal range and ultimately to a best-in-class range. Other O&M Best Practices in this series address O&M program optimization to develop and maintain a best-in-class O&M program (see Additional Resources below).
Conclusions and Next Steps
Understanding the predominant O&M approaches is a critical first step to developing a high-functioning O&M program. This starts with characterizing your program and looking for opportunities and risks within your current function. Improving the balance between reactive, preventive, and predictive approaches will improve the overall health of your O&M program.
As the program develops, these three approaches become optimized in a defined way specific to your site, buildings, systems, and equipment. This optimization is generally referred to as reliability-centered maintenance and is discussed in the O&M Best Practices Issue - An Advanced Maintenance Approach: Reliability Centered Maintenance.
Additional Resources
FEMP O&M Best Practices Website
FEMP O&M Best Practices have been developed for a variety of topics. The Best Practices listed below have been cited in this document, with additional documents available at the FEMP O&M Best Practices website.
- Approaches for Healthy Building Operations and Maintenance (O&M) in Existing Buildings.
- Existing Building Commissioning Approaches Summary.
- Key Performance Indicators.
- OMETA: An Integrated Approach to Operations, Maintenance, Engineering, Training, and Administration.
Glossary
- Asset. Maintenance term commonly taken to be any item of a physical plant or equipment.
- Backlog. Work that has not been completed by the nominated required-by date. The overdue period for a work order is the difference between the current and required-by dates.
- Benchmarking. The process of comparing performance with other metrics or organizations.
- Downtime. The time that an item of equipment is out of service because of failure.
- Failure. An item of equipment has suffered a failure when it is no longer capable of fulfilling one or more of its intended functions.
- Key Performance Indicators. A select number of key measures that enable performance against targets to be monitored.
- Mean Time Between Failures (MTBF). A measure of equipment reliability equal to the total equipment uptime in a given time period divided by the number of failures in that period.
- Reliability. The ability of an asset to continue performing its intended functions. Normally measured by mean time between failures.
- Repair. Any activity that returns the capability of a failed asset to a performance level equal to or greater than that specified by its functions, but not greater than its original maximum capability. An activity that increases the maximum capability of an asset is a modification.
- Risk. The potential for the realization of the unwanted, negative consequences of an event.
- Shutdown. That period of time when equipment is out of service.
- Unscheduled Maintenance. Any maintenance work that has not been included on an approved maintenance schedule before its commencement.
- Uptime. The time that an item of equipment is in service and operating.
- Useful Life. The maximum length of time that a component can be left in service before it will start to experience a rapidly increasing probability of failure.
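Two of the glossary metrics reduce to simple arithmetic:

- MTBF is the total uptime in a period divided by the number of failures in that period.
- The overdue period of a backlogged work order is the difference between the current and required-by dates.

The sketch below computes both. The uptime, failure count, and dates are hypothetical.

```python
from datetime import date


def mtbf(total_uptime_hours: float, failures: int) -> float:
    """Mean time between failures: total uptime in a period divided by failures in that period."""
    if failures <= 0:
        raise ValueError("MTBF is undefined when no failures occurred in the period")
    return total_uptime_hours / failures


def overdue_days(required_by: date, today: date) -> int:
    """Overdue period of a work order in days; 0 if it is not yet past its required-by date."""
    return max(0, (today - required_by).days)


# Hypothetical chiller: 8,400 hours of uptime and 3 failures in one year.
print(f"MTBF: {mtbf(8_400, 3):,.0f} hours")
# Hypothetical work order required by June 1 and still open on July 15.
print(f"Overdue: {overdue_days(date(2021, 6, 1), date(2021, 7, 15))} days")
```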
References
FEMP – Federal Energy Management Program. 2010. Operations & Maintenance Best Practices: A Guide to Achieving Operational Efficiency. Release 3.0. Prepared by Pacific Northwest National Laboratory for FEMP, Richland, WA. https://www.wbdg.org/FFC/DOE/DOECRIT/femp_omguide.pdf.
NRC – National Research Council. 1998. Stewardship of Federal Facilities: A Proactive Strategy for Managing the Nation’s Public Assets. The National Academies Press, Washington, D.C. https://www.nap.edu/catalog/6266/stewardship-of-federal-facilities-a-proactive-strategy-for-managing-the
PECI – Portland Energy Conservation, Inc. 1999. Operations and Maintenance Assessments. Portland Energy Conservation, Inc. Published by the U.S. Environmental Protection Agency and the Department of Energy, Washington D.C. https://www.energystar.gov/sites/default/files/buildings/tools/Operations%20and%20Maintenance%20Assessments.pdf
Actions and activities recommended in this Best Practice should only be attempted by trained and certified personnel. If such personnel are not available, the actions recommended here should not be initiated.
Published July 2021