In industries ranging from oil platform operations to e-commerce, people often need to calculate the mean time between failures (MTBF) to stay ahead of potential problems. MTBF prediction can help you anticipate many problems, but you shouldn't treat it as a magic wand. Understanding what goes into an MTBF calculation will help you apply its predictions well.
Logging
Logging failure data is the bedrock of MTBF calculation and prediction. You need to collect information so you can make grounded decisions.
An injection molding company, for example, might need to figure out how long it can run its main systems before shutting them down to clean the nozzles. It would run the systems normally and log their operation. After collecting enough data for the results to be statistically meaningful, the company would analyze that data to calculate MTBF. Depending on the results, it could then decide whether to lengthen or shorten the interval between scheduled maintenance cycles.
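As a minimal sketch of turning such a log into an MTBF figure, the Python below assumes a hypothetical log of timestamped `start` and `failure` events for a single machine (the log format and function name are illustrative, not a standard). It counts only the hours between each start and the next failure as operating time, so downtime spent on repairs is excluded, and then divides that total by the number of failures.

```python
from datetime import datetime

# Hypothetical log format: (ISO timestamp, event), where event is "start" or "failure".
LOG = [
    ("2024-01-01T00:00", "start"),
    ("2024-01-02T06:00", "failure"),
    ("2024-01-02T08:00", "start"),    # restarted after a 2-hour repair
    ("2024-01-03T20:00", "failure"),
]

def operating_hours_and_failures(log):
    """Sum the hours between each 'start' and the next 'failure', and count failures."""
    hours = 0.0
    failures = 0
    running_since = None
    for stamp, event in log:
        t = datetime.fromisoformat(stamp)
        if event == "start":
            running_since = t
        elif event == "failure" and running_since is not None:
            hours += (t - running_since).total_seconds() / 3600
            failures += 1
            running_since = None  # machine is down until the next "start"
    return hours, failures

hours, failures = operating_hours_and_failures(LOG)
mtbf = hours / failures  # MTBF = operating time / number of failures
print(f"{hours:.0f} h over {failures} failures -> MTBF {mtbf:.0f} h")
# prints: 66 h over 2 failures -> MTBF 33 h
```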
Failures
Notably, there should always be a clear distinction in how you code failures in the logs. If someone chooses to shut down a machine to perform maintenance, that doesn't count as a failure. A good MTBF database clearly separates voluntary shutdowns from shutdowns forced by a failure or an imminent risk of one. It's also a good idea to categorize incidents in the log so you can tell different types of failures apart.
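One way to keep voluntary and forced shutdowns apart is to give every logged stop an explicit type plus an optional failure category. The sketch below uses hypothetical names (`Stop`, `StopEvent`, and categories such as `nozzle_clog`); only stops marked `FAILURE` would count toward MTBF, and a `Counter` tallies those failures by category.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Stop(Enum):
    PLANNED = "planned"   # voluntary shutdown, e.g. scheduled maintenance
    FAILURE = "failure"   # unplanned stop forced by a fault or failure risk

@dataclass
class StopEvent:
    machine_id: str
    kind: Stop
    category: Optional[str] = None  # hypothetical failure category; None for planned stops

events = [
    StopEvent("press-1", Stop.PLANNED),
    StopEvent("press-1", Stop.FAILURE, "nozzle_clog"),
    StopEvent("press-2", Stop.FAILURE, "heater"),
    StopEvent("press-2", Stop.PLANNED),
    StopEvent("press-1", Stop.FAILURE, "nozzle_clog"),
]

# Planned stops are excluded; only forced stops are failures for MTBF purposes.
failures = [e for e in events if e.kind is Stop.FAILURE]
by_category = Counter(e.category for e in failures)
print(len(failures), dict(by_category))
# prints: 3 {'nozzle_clog': 2, 'heater': 1}
```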
Averaging
"Mean" is the term statisticians prefer. Most people who went through middle or high school math know the concept, though their teachers probably called it the average.
Suppose you have two machines. One fails every 20 hours, and the other fails every 30 hours. The mean of those two intervals is (20 + 30) / 2 = 25 hours.
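The standard MTBF definition is total operating time divided by the number of failures. Applying it to the same two machines over a shared 60-hour window (an assumption chosen only because 60 is a multiple of both intervals) gives a slightly lower figure than the simple mean, because the machine that fails more often contributes more failures:

```python
from statistics import mean

# Hours between failures for each machine, from the example above.
intervals = {"machine_a": 20.0, "machine_b": 30.0}

# Simple arithmetic mean of the two intervals.
simple_mean = mean(intervals.values())  # 25.0

# Pooled MTBF over a common observation window:
# total operating hours divided by total failures.
window = 60.0                                                 # hours per machine (assumed)
total_hours = window * len(intervals)                         # 120.0
total_failures = sum(window / t for t in intervals.values())  # 3 + 2 = 5
pooled_mtbf = total_hours / total_failures                    # 24.0

print(simple_mean, pooled_mtbf)
```

The pooled value here equals the harmonic mean of the two intervals, which is why fleet-wide MTBF is usually computed from pooled totals rather than by averaging per-machine figures.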
Computational Power
Most real-world applications are far less simple than that example. Systems often operate for thousands of hours at a time. A well-configured web server with reasonable traffic, for example, could go months or even years between failures that disrupt its operations. At the same time, a multinational corporation with a large customer base could deploy hundreds or thousands of such servers.
As the pool of systems in the dataset grows, the math becomes more complex. How do you cope with the rapidly increasing complexity of large systems? You use computing power. MTBF predictions are typically produced by software that runs very large numbers of analyses, often millions, to examine many possible scenarios, including worst cases.
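Monte Carlo simulation is one common way to explore many scenarios. The sketch below is an illustration under stated assumptions, not a description of any particular product: it assumes every server fails independently with exponentially distributed times between failures and is repaired instantly, and the fleet size, per-server MTBF, horizon, and run count are all hypothetical parameters. It repeats the simulation many times and reports the average number of fleet failures alongside a worst-case figure (roughly the 99th percentile).

```python
import random

def simulate_fleet_failures(n_servers, mtbf_hours, horizon_hours, rng):
    """Count failures across a fleet over one horizon, assuming exponential
    times between failures and instant repair (both simplifying assumptions)."""
    failures = 0
    for _ in range(n_servers):
        t = rng.expovariate(1.0 / mtbf_hours)      # time to first failure
        while t <= horizon_hours:
            failures += 1
            t += rng.expovariate(1.0 / mtbf_hours)  # time to the next failure
    return failures

def monte_carlo(n_runs=1000, n_servers=500, mtbf_hours=8760.0,
                horizon_hours=720.0, seed=42):
    """Repeat the fleet simulation n_runs times; return the mean failure
    count and an approximate 99th-percentile (worst-case) count."""
    rng = random.Random(seed)  # seeded so results are reproducible
    counts = sorted(
        simulate_fleet_failures(n_servers, mtbf_hours, horizon_hours, rng)
        for _ in range(n_runs)
    )
    average = sum(counts) / n_runs
    p99 = counts[int(0.99 * n_runs) - 1]
    return average, p99

average, p99 = monte_carlo()
print(f"mean failures per 30 days: {average:.1f}, ~99th percentile: {p99}")
```

Raising `n_runs` tightens both estimates at the cost of more computation, which is where the computing power comes in.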