“…there’s a cat in a box somewhere who’s alive and dead at the same time (although if they don’t ever open the box to feed it, it’ll eventually just be two different kinds of dead)…” — Neil Gaiman, in American Gods
Erwin Schrödinger never intended his thought experiment about a cat in a box to launch a thousand quantum physics jokes. He intended to convince Albert Einstein, as he eventually did and as Einstein put it, that “one cannot get around the assumption of reality, if only one is honest.”
Like Schrödinger’s cat, independent protection layers (IPLs) simultaneously exist in two possible states until called upon to act: ready to succeed on demand and ready to fail on demand. It’s not until there is a demand that this resolves into a single state. Because we don’t want to wait until there is a demand that poses a real hazard to us to learn how it will resolve, we check on our IPLs with proof tests.
But checking on our IPLs usually takes them out of service while we do the proof test. How do we account for that?
One of the key characteristics of an IPL is its probability of failure on demand (PFD). This tells us how likely it is that the IPL won’t work when we need it to at any specific time. The PFD usually increases as time goes on, asymptotically approaching 100% given enough time. In other words, everything fails.
However, with repairable systems, we are interested in the average probability of failure on demand (PFDavg). We also call this the Unavailability of the system. (Availability is 1 – Unavailability). For calculations of unavailability, which is independent of time, it’s more valuable to consider the average probability over the relevant span of time than the specific probability at a moment in time.
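The difference between the PFD at a moment in time and the PFDavg can be sketched in a few lines of Python. This is only an illustration under a stated assumption that the article does not make: a single device with a constant dangerous failure rate, restored to as-good-as-new at each proof test. The function names and the failure rate of 2.3e-6 per hour are hypothetical, chosen so the result lands near a PFDavg of 0.01.

```python
import math

def pfd_at(t_hours: float, failure_rate_per_hour: float) -> float:
    """PFD at time t since the last successful proof test.

    Assumes a constant failure rate (exponential model); this is an
    illustrative assumption, not a method taken from the article.
    """
    return 1.0 - math.exp(-failure_rate_per_hour * t_hours)

def pfd_avg(test_interval_hours: float, failure_rate_per_hour: float) -> float:
    """Average of pfd_at over one proof test interval (exact integral)."""
    x = failure_rate_per_hour * test_interval_hours
    if x == 0:
        return 0.0
    # Mean of 1 - exp(-rate * t) over [0, T] is 1 + (exp(-x) - 1) / x.
    return 1.0 + math.expm1(-x) / x

# Hypothetical device: failure rate 2.3e-6 per hour, proof tested yearly.
rate = 2.3e-6
interval = 24 * 365  # hours in one year

print(f"PFD just before the proof test: {pfd_at(interval, rate):.4f}")
print(f"PFDavg over the test interval:  {pfd_avg(interval, rate):.4f}")
```

Under these assumptions the PFD just before the proof test is about twice the PFDavg, which is why the average, not the value at any one moment, is the figure used for risk reduction.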
The smaller the PFDavg, the more we like the IPL. The risk reduction provided by the IPL is measured by the inverse of the PFDavg, which is the Risk Reduction Factor (RRF). An IPL with a PFDavg of 0.1 has an average probability of failing of 10% when there is a demand. That means that it has an RRF of 10. With a PFDavg of 0.01, an IPL has an average probability of failing of 1% when there is a demand, and an RRF of 100.
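The relationship between PFDavg and RRF is a simple reciprocal, shown here as a minimal Python sketch. The function name is ours, for illustration only.

```python
def risk_reduction_factor(pfd_avg: float) -> float:
    """Return the Risk Reduction Factor, the inverse of PFDavg."""
    if not 0.0 < pfd_avg <= 1.0:
        raise ValueError("PFDavg must be greater than 0 and at most 1")
    return 1.0 / pfd_avg

# The two examples from the text.
for value in (0.1, 0.01):
    print(f"PFDavg = {value}: RRF = {risk_reduction_factor(value):.0f}")
```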
Unavailability During a Proof Test
An IPL, especially a safety instrumented function (SIF), is not available to provide protection during a proof test. For 100% of the time that an IPL is being tested or repaired, it is unavailable. The risk reduction provided by the IPL, however, is based on the PFDavg of the IPL during the entire time it exists, not during any specific time. So, while the Unavailability of a SIF is 100% during the proof test, the PFDavg for the IPL takes into account that it will need to be tested periodically and that it is going to fail and require repair occasionally.
The contribution of the periodic proof tests and occasional repairs is usually negligible. Consider an IPL with a PFDavg of 0.01 and assume for a moment that the PFDavg does not account for the time it is unavailable for proof tests and repairs. That unavailability needs to be included.
In the case of a proof test that takes 4 hours and occurs every year, the contribution to unavailability is
(4 hours/1 year) / (24 hours/day x 365 days/year) = 0.00046
In the case of a repair that takes 48 hours and occurs every two years, the contribution to unavailability is
(48 hours/2 years) / (24 hours/day x 365 days/year) = 0.0027
The total PFDavg, accounting for proof tests and repairs as extra sources of unavailability, then, is
PFDavg = 0.01 + 0.00046 + 0.0027 = 0.013
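The same arithmetic can be written as a short Python sketch, which makes it easy to try other durations and intervals. The function and variable names are ours, and the base PFDavg of 0.01 is treated, as in the example, as not yet including outage time.

```python
HOURS_PER_YEAR = 24 * 365

def outage_unavailability(outage_hours: float, interval_years: float) -> float:
    """Fraction of time out of service for an outage of the given length
    that recurs once every interval_years."""
    return outage_hours / (interval_years * HOURS_PER_YEAR)

base_pfd_avg = 0.01                        # assumed to exclude outages
proof_test = outage_unavailability(4, 1)   # 4 hours every year
repair = outage_unavailability(48, 2)      # 48 hours every two years
total = base_pfd_avg + proof_test + repair

print(f"Proof test contribution: {proof_test:.5f}")
print(f"Repair contribution:     {repair:.4f}")
print(f"Total PFDavg:            {total:.3f}")
```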
On an order-of-magnitude basis, there is no difference between 0.01 and 0.013. More important, however, is that the reported values of PFDavg for non-instrumented IPLs and the calculated values of PFDavg for SIFs do take repair rates and proof test times into account. They are not extras but are built into the values. The software that we use, for instance, uses 72 hours as the mean time to repair as the default when calculating the PFDavg for SIFs.
The Mechanical Integrity (MI) element of the Process Safety Management (PSM) standard, 29 CFR 1910.119(j), has something to say about proof tests and repairs. In paragraph (4), it says that “Inspections and tests shall be performed on process equipment,” and that “The frequency of inspections and tests of process equipment shall be consistent with applicable manufacturers’ recommendations and good engineering practices….”
So, proof tests must be done.
The MI element also has something to say about repairs. In paragraph (5) it says, “The employer shall correct deficiencies in equipment that are outside acceptable limits (defined by the process safety information in paragraph (d) of this section) before further use or [emphasis added] in a safe and timely manner when necessary means are taken to assure safe operation.”
First, this means that if it’s broken, it must be fixed. It also means that safe operation must be assured while it is being repaired or the equipment must be shut down. When it comes to IPLs, safe operation means that they are providing the necessary risk reduction.
Not a Free Pass
There is no need to worry excessively about the brief time that an IPL is out of service while you perform proof tests, preventative maintenance, and repairs. When this time is kept brief—a few hours, a couple of days—the impact on PFDavg is already accounted for, either in the PFDavg calculations for SIFs or in the literature values reported for other IPLs. However, when the time stretches beyond a couple of days into several days, or weeks, or longer, the impact increases to a significant amount.
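To see how quickly a longer outage starts to matter, here is a rough Python sketch that reuses the simplified arithmetic from the earlier example: a base PFDavg of 0.01 treated as excluding outages, a 4-hour proof test every year, and one repair every two years. The function name and the longer durations are illustrative assumptions, not values from any standard.

```python
HOURS_PER_YEAR = 24 * 365

def total_pfd_avg(repair_hours: float,
                  base_pfd_avg: float = 0.01,
                  proof_test_hours: float = 4.0,
                  proof_interval_years: float = 1.0,
                  repair_interval_years: float = 2.0) -> float:
    """Base PFDavg plus the fraction of time lost to proof tests and repairs."""
    proof = proof_test_hours / (proof_interval_years * HOURS_PER_YEAR)
    repair = repair_hours / (repair_interval_years * HOURS_PER_YEAR)
    return base_pfd_avg + proof + repair

# Repair durations: two days, one week, two weeks, thirty days.
for hours in (48, 168, 336, 720):
    pfd = total_pfd_avg(hours)
    print(f"{hours:4d} h repair: PFDavg = {pfd:.4f}, RRF = {1 / pfd:.0f}")
```

With these assumed numbers, a two-day repair leaves the RRF at about 76, while a thirty-day repair pulls it down to roughly 19, a meaningful loss of the risk reduction the IPL was credited with.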
For protective equipment covered under the PSM Standard, letting the time stretch beyond a few hours or a couple of days without shutting down the process that relies on the risk reduction provided by that protective equipment is also a violation of the standard.
While you don’t need to worry excessively, there is not a free pass to just let things slide.
Call to Action
The best way to keep the unavailability that results from proof tests, preventative maintenance, or repairs to a minimum is to plan for those activities and ensure that appropriate spare parts are available. Those plans should include planning for a demand to occur while doing those activities. Good planning will further reduce the time that the protective measure is out of service, which minimizes the impact of unavailability.