Error proofing is a structured approach to ensuring quality all the way through your work processes. This approach enables you to improve your production or business processes to prevent specific errors—and, thus, defects—from occurring. Error-proofing methods enable you to discover sources of errors through fact-based problem solving. The focus of error proofing is not on identifying and counting defects. Rather, it is on the elimination of their cause: one or more errors that occur somewhere in the production process. The distinction between an error and a defect is as follows:
- An error is any deviation from a specified manufacturing or business process. Errors cause defects in products or services.
- A defect is a part, product, or service that does not conform to specifications or a customer’s expectations. Defects are caused by errors.
The goal of error proofing is to create an error-free production environment. It prevents defects by eliminating their root cause, which is the best way to produce high-quality products and services.
Shigeo Shingo is widely associated with a Japanese concept called poka-yoke (pronounced poker-yolk-eh), which means to mistake-proof the process. Shingo recognized that human error does not necessarily result in defects. Poka-yoke succeeds by providing a device or procedure that catches the mistake before it is translated into nonconforming product. Shingo lists the following characteristics of poka-yoke devices:
- They permit 100% inspection.
- They avoid sampling for monitoring and control.
- They are inexpensive.
Poka-yoke devices can be combined with other inspection systems to obtain near zero defect conditions.
Error proofing in a lean organization
For your organization to be competitive in the marketplace, you must deliver high-quality products and services that exceed your customers’ expectations. You cannot afford to produce defective products or services. A lean enterprise strives for quality at the source. This means that any defects that occur during one operation in a manufacturing or business process should never be passed on to the next operation. This ensures that your customers will receive only defect-free products or services. In a “fat” system, any defects that are found can simply be discarded while operations continue. These defects are later counted, and if their numbers are high enough, root-cause analysis is done to prevent their recurrence. But in a lean enterprise, which concentrates on producing smaller batch sizes and producing to order versus adding to inventory, a single defect can significantly impact performance levels. When a defect occurs in a lean enterprise, operations must stop while immediate action is taken to resolve the situation. Obviously, such pauses in operations can be costly if defects occur often. Therefore, it is important to prevent defects before they can occur. Your organization can achieve zero errors by understanding and implementing the four elements of error proofing. These are as follows:
- General inspection.
- 100% inspection.
- Error-proofing devices.
- Immediate feedback.
The first, and most important, element of error proofing is inspection. There are three types of inspections that organizations commonly use.
- Source inspections. Source inspections detect errors in a manufacturing process before a defect in the final part or product occurs. The goal of source inspections is to prevent the occurrence of defects by preventing the occurrence of errors. In addition to catching errors, source inspections provide feedback to employees before further processing takes place. Source inspections are often the most challenging element of error proofing to design and implement.
- Judgment inspections. Often referred to as end-of-the-line inspections, final inspections, or dock audits, these are inspections during which a quality inspector or operator compares a final product or part with a standard. If the product or part does not conform, it is rejected. This inspection method has two drawbacks. First, it might not prevent all defects from being shipped to customers. Second, it increases the delay between the time an error occurs and the time a resulting defect is discovered. This allows the production process to continue to make defective products and makes root-cause analysis difficult. If you rely on judgment inspections, it’s important to relay inspection results to all the earlier steps in your production process. This way, information about a defect is communicated to the point in the process at which the problem originated.
- Informative inspections. Informative inspections provide timely information about a defect so that root-cause analysis can be done and the production process can be adjusted before significant numbers of defects are created. Typically, these inspections are done close enough to the time of the occurrence of the defect so that action can be taken to prevent further defects from occurring.
There are two types of informative inspections. They are as follows:
- Successive inspections. These inspections are performed after one operation in the production process is completed, by employees who perform the next operation in the process. Feedback can be provided as soon as any defects are detected (which is preferable) or simply tracked and reported later. It is always better to report defects immediately.
- Self-inspections. Operators perform self-inspections at their own workstations. If an operator finds a defect in a product or part, he/she sets it aside and takes action to ensure that other defective products or parts are not passed on to the next operation. The root cause of the defect is then determined and corrected. Often this involves putting error-proofing measures and devices in place to prevent the problem from recurring.
Industrial engineering studies have shown that human visual inspection is only about 85% effective. Similar inaccuracies happen when humans directly measure physical properties, such as pressure, temperature, time, and distance. Use electronic or mechanical inspection devices to achieve better accuracy.
Operator self-inspection is the second most effective type of inspection. It is much more effective and timely than successive inspection. The number of errors detected depends on the diligence of the operator and the difficulty of detecting the defect.
Wherever practical, empower operators to stop the production line whenever a defect is detected. This creates a sense of urgency that focuses employees’ energy on prevention of the defect’s recurrence. It also creates the need for effective source inspections and self-inspections.
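
The 85% figure for human visual inspection can be turned into a rough estimate of how many defects slip through. The sketch below is a worked calculation, not a validated model: it assumes that each inspection pass catches a given defect independently and with the same probability, which the text does not claim. The function names are illustrative.

```python
# Worked arithmetic: how many defects slip past manual visual inspection.
# Assumption (not from the text): each inspection pass catches a defect
# independently, with the same probability every time.

DETECTION_RATE = 0.85  # "about 85% effective", as quoted in the text


def escape_probability(passes: int, detection_rate: float = DETECTION_RATE) -> float:
    """Probability that one defect survives `passes` independent inspections."""
    if passes < 0:
        raise ValueError("passes must be non-negative")
    return (1.0 - detection_rate) ** passes


def expected_escapes(defects: int, passes: int) -> float:
    """Expected number of defects still undetected after all passes."""
    return defects * escape_probability(passes)


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} pass(es): {expected_escapes(1000, n):.1f} of 1000 defects escape")
```

Under these assumptions, one manual pass lets roughly 150 of every 1,000 defects through, which is one reason to prefer electronic or mechanical inspection devices over adding more human inspectors.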
The second element of error proofing is 100% inspection, the most effective type of inspection. During these inspections, a comparison of actual parts or products to standards is done 100% of the time at the potential source of an error. The goal is to achieve 100% real-time inspection of the potential process errors that lead to defects. It is often physically impossible and too time-consuming to conduct 100% inspection of all products or parts for defects. To help you achieve zero defects, use low-cost error-proofing devices to perform 100% inspection of known sources of error. When an error is found, you should halt the process or alert an operator before a defect can be produced.
Zero defects is an achievable goal! Many organizations have attained this level of error proofing. One of the largest barriers to achieving it is the belief that it can’t be done. By changing this belief among your employees, you can make zero defects a reality in your organization.
Statistical process control (SPC) is the use of mathematics and statistical measurements to solve your organization’s problems and build quality into your products and services. When used to monitor product characteristics, SPC is an effective technique for diagnosing process-performance problems and gathering information for improving your production process. But because SPC relies on product sampling to provide both product and process characteristics, it can detect only those errors that occur in the sample that you analyze. It gives a reliable estimate of the number of total defects that are occurring, but it cannot prevent defects from happening, nor does it identify all the defective products that exist before they reach your customers.
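
To make the limitation of sampling concrete, the following sketch estimates how likely a sample is to contain any defect at all, and how many defects remain in the units that were never inspected. It is a minimal illustration under stated assumptions: defects are taken to occur independently at a fixed rate, and every sampled unit is assumed to be inspected perfectly. The lot size, sample size, and defect rate in the example are hypothetical.

```python
# Worked arithmetic: what sampling can and cannot see.
# Assumptions (not from the text): defects occur independently at a fixed
# rate, and every sampled unit is inspected without error.


def prob_sample_contains_defect(defect_rate: float, sample_size: int) -> float:
    """Probability that at least one defective unit appears in the sample."""
    if not 0.0 <= defect_rate <= 1.0:
        raise ValueError("defect_rate must be between 0 and 1")
    if sample_size < 0:
        raise ValueError("sample_size must be non-negative")
    return 1.0 - (1.0 - defect_rate) ** sample_size


def expected_unsampled_defects(defect_rate: float, lot_size: int, sample_size: int) -> float:
    """Expected number of defects among the units that were never inspected."""
    if sample_size > lot_size:
        raise ValueError("sample_size cannot exceed lot_size")
    return defect_rate * (lot_size - sample_size)


if __name__ == "__main__":
    # Hypothetical lot: 10,000 units, 0.1% defective, 50 units sampled.
    print(prob_sample_contains_defect(0.001, 50))
    print(expected_unsampled_defects(0.001, 10_000, 50))
```

With these hypothetical numbers the sample has only about a 5% chance of containing a defect, while roughly ten defective units remain in the uninspected part of the lot. This is the gap that 100% inspection of known error sources is meant to close.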
The third element of error proofing is the use of error-proofing devices: physical devices that enhance or substitute for the human senses and improve both the cost and reliability of your organization’s inspection activities. You can use mechanical, electrical, pneumatic, or hydraulic devices to sense, signal, or prevent existing or potential error conditions and thus achieve 100% inspection of errors in a cost-effective manner. Common error-proofing devices include the following:
- Guide pins of different sizes that physically capture or limit the movement of parts, tooling, or equipment during the production process.
- Limit switches, physical-contact sensors that show the presence and/or absence of products and machine components and their proper position.
- Counters, devices used to count the number of components, production of parts, and availability of components.
- Alarms that an operator activates when he/she detects an error.
- Checklists, which are written or graphical reminders of tasks, materials, events, and so on.
Such industrial sensing devices are the most versatile error-proofing tools available for work processes. Once such a device detects an unacceptable condition, it either warns the operator of the condition or automatically takes control of the function of the equipment, causing it to stop or correct itself. These warning and control steps are known as regulatory functions. Sensing devices can detect object characteristics by using both contact and non-contact methods. Contact sensors include micro-switches and limit switches; non-contact methods include transmitting and reflecting photoelectric switches. Setting functions describe specific attributes that sensing devices need to inspect. All four setting functions listed below are effective error-detection methods:
- Contact methods involve inspecting for physical characteristics of an object, such as size, shape, or color, to determine if any abnormalities exist.
Example: A sensor receives a reflective signal (sparks) only when the flint wheel is installed correctly.
- Fixed-value setting functions inspect for a specific number of items, events, and so on, to determine if any abnormalities exist. This technique is often used to ensure that the right quantity of parts has been used or the correct number of activities has been performed.
Example: All materials must be used to assemble a case, including eight screws. A counter on the drill keeps track of the number of screws used. Another method is to package screws in groups of eight.
- Motion-step setting functions inspect the sequence of actions to determine if they are done out of order.
Example: Materials are loaded into a hopper in a predetermined sequence. If the scale does not indicate the correct weight for each incremental addition, a warning light comes on.
- Information-setting functions check the accuracy of information and its movement over time and distance to determine if any gaps or errors exist. Here are some tips for using information-setting functions:
- To capture information that will be needed later, use work logs, schedules, and action lists.
- To distribute information accurately across distances, you can use e-mail, bar-coding systems, radio frequency devices, voice messaging systems, and integrated information systems, such as enterprise resource planning (ERP).
Example: Inventory placed in a temporary storage location must be accurately entered into the storeroom system for later retrieval during the picking operation. Bar-coding is used to identify part numbers and bin locations. This data is transferred directly from the bar-code reader to the storeroom system. Customers access the storeroom system through the internet.
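
Two of the setting functions above, fixed-value and motion-step, translate naturally into simple software checks. The sketch below is illustrative only: the class names and the hopper material names are hypothetical, and a real device would take its input from a sensor rather than from method calls.

```python
from dataclasses import dataclass


@dataclass
class FixedValueCheck:
    """Fixed-value setting function: exactly `required` events must occur."""
    required: int
    count: int = 0

    def record(self) -> None:
        """Register one event, for example one screw driven."""
        self.count += 1

    def is_ok(self) -> bool:
        """True only when the count matches the required value exactly."""
        return self.count == self.required


@dataclass
class MotionStepCheck:
    """Motion-step setting function: steps must occur in the given order."""
    sequence: list[str]
    position: int = 0

    def perform(self, step: str) -> bool:
        """Return True if `step` is the next expected step, otherwise False."""
        if self.position < len(self.sequence) and step == self.sequence[self.position]:
            self.position += 1
            return True
        return False  # out-of-order or unexpected step: flag it, do not advance

    def is_complete(self) -> bool:
        return self.position == len(self.sequence)


if __name__ == "__main__":
    screws = FixedValueCheck(required=8)
    for _ in range(7):
        screws.record()
    print("case complete:", screws.is_ok())  # one screw is still missing

    # Hypothetical loading sequence for the hopper example.
    hopper = MotionStepCheck(["resin", "filler", "pigment"])
    print(hopper.perform("resin"), hopper.perform("pigment"))
```

Both checks only detect the condition. What happens next, a warning or a forced stop, is a separate decision made by the regulatory function.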
The fourth element of error proofing is immediate feedback. Because time is of the essence in lean operations, giving immediate feedback to employees who can resolve errors before defects occur is vital to success. The ideal response to an error is to stop production and eliminate the source of the error. But this is not always possible, especially in continuous batch or flow operations. You should determine the most cost-effective scenario for stopping production in your work process when an error is detected. It is often better to use a sensor or other error-proofing device to improve feedback time rather than relying on human intervention. Methods for providing immediate feedback that use sensing devices are called regulatory functions. When a sensing device detects an error, it either warns an operator of the condition or makes adjustments to correct the error. There are two types of regulatory functions.
- The warning method: It does not stop operations but provides various forms of feedback for the operator to act upon. Common feedback methods include flashing lights or unusual sounds designed to capture an operator’s attention.
Example: A clogged meter sets off a warning light on a control panel. However, the operator can still run the mixer and produce bad powder.
- The control method: It takes control of the equipment when an error is detected, causing it to stop or correct itself. This method is preferred for responding to error conditions, especially where safety is a concern. However, it can also be a more frustrating method for the operator if a machine continually shuts itself down.
Example: A mixer will not operate until the water meter is repaired. The preventive maintenance program should have “meter visual inspections” on its schedule, and spare nozzles should be made available.
Warning methods are less effective than control methods because they rely on the operator’s ability to recognize and correct the situation. If the operator does not notice or react to the error quickly enough, defective parts or products will still be produced. However, warning methods are preferred over control methods when the automatic shutdown of a line or piece of equipment is very expensive.
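
The difference between the two regulatory functions can be sketched with the mixer example. This is a minimal illustration, not a description of any real controller: the `Mixer` class and its attributes are hypothetical, and the clogged-meter signal is passed in as a plain boolean.

```python
from enum import Enum


class Method(Enum):
    WARNING = "warning"  # signal the operator, keep running
    CONTROL = "control"  # refuse to run until the error is cleared


class Mixer:
    """Hypothetical mixer guarded by a water-meter sensor."""

    def __init__(self, method: Method) -> None:
        self.method = method
        self.warning_light = False
        self.running = False

    def start(self, meter_clogged: bool) -> bool:
        """Try to start the mixer; return True if it is running afterwards."""
        self.warning_light = meter_clogged
        if meter_clogged and self.method is Method.CONTROL:
            self.running = False  # interlock: the error blocks operation
        else:
            self.running = True  # warning method leaves the decision to the operator
        return self.running


if __name__ == "__main__":
    print(Mixer(Method.WARNING).start(meter_clogged=True))  # runs despite the light
    print(Mixer(Method.CONTROL).start(meter_clogged=True))  # does not run
```

With the warning method the mixer still runs and can produce bad powder if the operator misses the light. With the control method the error cannot be ignored, at the cost of a stopped machine.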
Some common sources of errors
Common sources of error include humans, methods, measurements, materials, machines, and environmental conditions. These are examined in detail below. Any one of these factors alone, or any combination of them, might be enough to cause errors, which can then lead to defects.
Unfortunately, human error is an unavoidable reality. The reasons are many.
- Lack of knowledge, skills, or ability. This happens when employees have not received proper training to perform a task and their skill or knowledge level is not verified.
- Mental errors. These include slips and mistakes. Slips are subconscious actions. They usually occur when an experienced employee forgets to perform a task. Mistakes are conscious actions. They occur when an employee decides to perform a task in a way that results in an error.
- Sensory overload. A person’s ability to perceive, recognize, and respond to stimuli is dramatically affected by the sharpness of the five senses. When an employee’s senses are bombarded by too many stimuli at once, sensory overload results, and his/her senses are dulled. This increases the chance for error.
- Mechanical process errors. Some tasks are physically difficult to do and are thus prone to error. They can result in repetitive-strain injuries and physical exhaustion, which are both known to cause errors.
- Distractions. There are two types of distractions: internal and external. External distractions include high-traffic areas, loud conversations, and ringing phones. Emotional stress and daydreaming are examples of internal distractions. Both types can lead to errors.
- Loss of memory. Many work tasks require employees to recall information that can be forgotten. In addition, aging, drug or alcohol use, and fatigue can all cause memory loss and lead to errors.
- Loss of emotional control. Anger, sorrow, jealousy, and fear often work as emotional blinders, hampering employees’ ability to work effectively.
Measurements must be accurate, repeatable, and reproducible if they are to successfully locate a problem. Unfortunately, measurement devices and methods are as prone to error as the processes and products that they measure. Inspection measurement practices, measurement graphs and reports, and measurement definitions are all potential sources of misinterpretation and disagreement. For instance, a measurement scale that is out of calibration can cause errors. Don’t be surprised if a root-cause analysis points to measurement as the source of an error. An accurate measurement is the product of many factors, including humans, machines, and methods.
Industry experts believe that nearly 85% of the errors that occur in a work process are caused by the tasks and technology involved in the process. The sources of error in a work process are as follows:
- Process steps. These are the physical or mental steps that convert raw materials into products, parts, or services.
- Transportation. This refers to the movement of materials, information, people, and technology during a work process.
- Decision making. This is the process of making a choice among alternatives. Make sure that all your employees’ decisions address six basic questions: Who? What? When? Where? How? Why?
- Inspections. These are activities that compare the actual to the expected. As noted above, they are prone to error.
The area of work processes is the one where lean enterprises make the largest gains in error reduction and quality improvement. Concentrate your organizational efforts on this area.
Materials can contribute to error in the following ways:
- Use of the wrong type or amount of raw materials or use of incompatible raw materials, components, or finished products.
- Inherent product, tool, or equipment designs. A root-cause analysis typically leads back to faulty manufacturing, materials handling, or packaging practices.
- Missing or ill-designed administrative tools (e.g., forms, documents, and office supplies) that do not support performance requirements.
Machine errors are classified as either predictable or unpredictable. Predictable errors are usually addressed in a preventative or scheduled maintenance plan. Unpredictable errors, which are caused by varying machine reliability, should be considered when your organization purchases equipment. If satisfactory machine reliability cannot be achieved, then you must plan other ways to prevent and catch machine-related errors.
Poor lighting, excessive heat or cold, and high noise levels all have a dramatic effect on human attention levels, energy levels, and reasoning ability.
In addition, unseen organizational influences—such as pressure to get a product shipped, internal competition among employees, and pressure to achieve higher wage levels—all affect quality and productivity. Error-proofing devices and techniques can be used for some, but not all, sources of environmentally caused errors. Often an organization’s operating and personnel policies must be revised to achieve a goal of zero defects.
The probability that errors will happen is high in certain types of situations. These so-called red-flag conditions include the following:
- Lack of an effective standard. Standard operating procedures (SOPs) are reliable instructions that describe the correct and most effective way to get a work process done. Without SOPs, employees cannot know the quality of the product or service they produce or know with certainty when an error has occurred. In addition, when there are no SOPs, or if the SOPs are complicated or hard to understand, variations can occur in the way a task is completed, resulting in errors.
- Symmetry. This is when opposite sides of a part, tool, material, or fixture are, or seem to be, identical. The identical sides of a symmetrical object can be confused during an operation, resulting in errors.
- Asymmetry. This is when opposite sides of a part, tool, material, or fixture are different in size, shape, or relative position. Slight differences are difficult to notice in asymmetrical parts, leading to confusion, delays, or errors.
- Rapid repetition. This is when the same action or operation is performed quickly, over and over again. Rapidly repeating a task, whether manually or by machine, increases the opportunity for error.
- High or extremely high volume. This refers to rapidly repeated tasks that have a very large output. Pressure to produce high volumes makes it difficult for an employee to follow the SOPs, increasing the opportunity for errors.
- Poor environmental conditions. Dim lighting, poor ventilation, inadequate housekeeping, and too much traffic density or poorly directed traffic can cause errors. The presence of foreign materials (e.g., dirt or oils), overhandling, and excessive transportation can also result in errors or damaged products and parts.
- Adjustments. These include bringing parts, tooling, or fixtures into a correct relative position.
- Tooling and tooling changes. These occur when any working part of a power-driven machine needs to be changed, either because of wear or breakage or to allow production of different parts or to different specifications.
- Dimensions, specifications, and critical conditions. Dimensions are measurements used to determine the precise position or location for a part or product, including height, width, length, and depth. Specifications and critical conditions include temperature, pressure, speed, tension coordinates, number, and volume. Deviation from exact dimensions or variation from standards leads to errors.
- Many or mixed parts. Some work processes involve a wide range of parts in varying quantities and mixes. Selecting the right part and the right quantity becomes more difficult when there are many of them or when they look similar.
- Multiple steps. Most work processes involve many small operations or sub-steps that must be done, often in a preset, strict order. If an employee forgets a step, does the steps in an incorrect sequence, or mistakenly repeats a step, errors occur and defects result.
- Infrequent production. This refers to an operation or task that is not done on a regular basis. Irregular or infrequent performance of a task leads to the increased likelihood that employees will forget the proper procedures or specifications for the task. The risk of error increases even more when these operations are complicated.
Always use data as a basis for making adjustments in your work processes. Using subjective opinion or intuition to make adjustments can result in errors and, eventually, defects. Any change in conditions can lead to errors that in turn lead to defects. For instance, wear or degradation of production equipment produces slow changes that occur without the operator’s awareness and can lead to the production of defective parts.
A Review of Human Error
A brief review of the concepts and language of human error will be useful. Human error has been studied extensively by cognitive psychologists. Their findings provide concepts and language that are vital to this discussion.
Errors of Intent vs. Errors in Execution
The process humans use to take action has been described in several ways. One description divides the process into two distinct steps:
- Determining the intent of the action.
- Executing the action based on that intention. Failure in either step can cause an error.
Norman divided errors into two categories, mistakes and slips. Mistakes are errors resulting from deliberations that lead to the wrong intention. Slips occur when the intent is correct, but the execution of the action does not occur as intended. Generally, error-proofing requires that the correct intention be known well before the action actually occurs. Otherwise, process design features that prevent errors in the action could not be put in place.
Rasmussen and Reason divide errors into three types, based on how the brain controls actions. They identify skill-based, rule-based, and knowledge-based actions. Their theory is that the brain minimizes effort by switching among different levels of control, depending on the situation. Common activities in routine situations are handled using skill-based actions, which operate with little conscious intervention. These are actions that are done on “autopilot.” Skill-based actions allow you to focus on the creativity of cooking rather than the mechanics of how to turn on the stove.
Rule-based actions utilize stored rules about how to respond to situations that have been previously encountered. When a pot boils over, the response does not require protracted deliberations to determine what to do. You remove the pot from the heat and lower the temperature setting before returning the pot to the burner.
When novel situations arise, conscious problem solving and deliberation are required. The result is knowledge-based actions. Knowledge-based actions are those actions that use the process of logical deduction to determine what to do on the basis of theoretical knowledge. Every skill- and rule-based action was a knowledge-based action at one time. Suppose you turn a burner on high but it does not heat up. That is unusual. You immediately start to troubleshoot by checking rule-based contingencies. When these efforts fail, you engage in knowledge-based problem solving and contingency planning. Substantial cognitive effort is involved.
Knowledge in the Head vs. Knowledge in the World
Norman introduces two additional concepts that will be employed throughout this book. He divides knowledge into two categories:
- Knowledge in the head is information contained in human memory.
- Knowledge in the world is information provided as part of the environment in which a task is performed.
Historically, organizations have focused on improving knowledge in the head. A comprehensive and elaborate quality manual is an example of knowledge in the head. A significant infrastructure has been developed to support this dependence on memory, including lengthy standard operating procedures that indicate how tasks are to be performed. These procedures are not intended to be consulted during the actual performance of the task, but rather to be committed to memory for later recall. Retaining large volumes of instructions in memory so that they are ready for use requires significant ongoing training efforts. When adverse events occur, organizational responses also tend to involve attempts to change what is in the memory of the worker. These include retraining the worker who errs, certifying (i.e., testing) workers regularly, attempting to enhance and manage worker attentiveness, and altering standard operating procedures. The passage of time will erase any gains made once the efforts to change memory are discontinued.
Putting “knowledge in the world” is an attractive alternative to trying to force more knowledge into the head. Knowledge can be put in the world by providing cues about what to do. This is accomplished by embedding the details of correct actions into the physical attributes of the process. In manufacturing, for example, mental energies that were used to generate precise action and monitor compliance with procedures stored in memory are now freed to focus on those critical, non-routine deliberations required for the best possible customer satisfaction. How do you recognize knowledge in the world when you see it? Here is a crude rule of thumb: if you can’t take a picture of it in use, it probably is not knowledge in the world. Error-proofing involves changing the physical attributes of a process, and error-proofing devices can usually be photographed. Error-proofing is one way of putting knowledge in the world. The rule is crude because there are gray areas, such as work instructions. If the instructions are visible and comprehensible at the point in the process where they are used, then they would probably be classified as knowledge in the world. Otherwise, work instructions are a means of creating knowledge in the head.
There is no comprehensive typology of error-proofing. The approaches to error reduction are diverse and evolving. More innovative approaches will evolve, and more categories will follow as more organizations and individuals think carefully about error-proofing their processes. Tsuda lists four approaches to error-proofing:
- Mistake prevention in the work environment.
- Mistake detection (Shingo’s informative inspection).
- Mistake prevention (Shingo’s source inspection).
- Preventing the influence of mistakes.
Mistake Prevention in the Work Environment
This approach involves reducing complexity, ambiguity, vagueness, and uncertainty in the workplace. An example from Tsuda is having only one set of instructions visible in a notebook rather than having two sets appear on facing pages. When only one set of instructions is provided, workers are unable to accidentally read inappropriate or incorrect instructions from the facing page. In another example, similar items with right-hand and left-hand orientations can sometimes lead to wrong-side errors. If the design can be altered and made symmetrical, no wrong-side errors can occur; whether the part is mounted on the left or right side, it is always correct. The orientation of the part becomes inconsequential. Likewise, any simplification of the process that leads to the elimination of process steps ensures that none of the errors associated with that step can ever occur again. Norman suggests several process design principles that make errors less likely. He recommends avoiding wide and deep task structures. The term “wide structures” means that there are lots of alternatives for a given choice, while “deep structures” means that the process requires a long series of choices. Humans can perform either moderately broad or moderately deep task structures relatively well. Humans have more difficulty if tasks are both moderately broad and moderately deep, meaning there are lots of alternatives for each choice, and many choices to be made. Task structures that are very broad or very deep can also cause difficulties.
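
One rough way to see why tasks that are both broad and deep are hard is to count the distinct ways through them. The sketch below is a crude proxy for task difficulty, not a measure Norman proposes: it assumes the same number of alternatives at every choice, and the function name and example sizes are hypothetical.

```python
# Crude proxy for task-structure complexity.
# Assumption (not from the text): every choice offers the same number of
# alternatives, so the number of distinct paths is breadth ** depth.


def distinct_paths(alternatives_per_choice: int, number_of_choices: int) -> int:
    """Number of distinct ways to work through the whole task."""
    if alternatives_per_choice < 1 or number_of_choices < 0:
        raise ValueError("need at least one alternative and a non-negative depth")
    return alternatives_per_choice ** number_of_choices


if __name__ == "__main__":
    print("broad, shallow:", distinct_paths(10, 1))
    print("narrow, deep:  ", distinct_paths(2, 5))
    print("broad and deep:", distinct_paths(10, 5))
```

A task that is only broad or only deep keeps the count small, while combining ten alternatives with five successive choices gives 100,000 possible paths, of which typically very few are correct.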
Mistake detection identifies process errors found by inspecting the process after actions have been taken. Often, immediate notification that a mistake has occurred is sufficient to allow remedial actions to be taken in order to avoid harm. The outcome or effect of the problem is inspected after an incorrect action or an omission has occurred. Informative inspection can also be used to reduce the occurrence of incorrect actions. This can be accomplished by using data acquired from the inspection to control the process and inform mistake prevention efforts. Another informative inspection technique is Statistical Process Control (SPC). SPC is a set of methods that uses statistical tools to detect if the observed process is being adequately controlled. SPC is used widely in industry to create and maintain the consistency of variables that characterize a process. Shingo identifies two other informative inspection techniques: successive checks and self-checks. Successive checks consist of inspections of previous steps as part of the process. Self-checks employ mistake-proofing devices to allow workers to assess the quality of their own work. Self-checks and successive checks differ only in who performs the inspection. Self-checks are preferred to successive checks because feedback is more rapid.
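
As an illustration of how SPC can flag a process that is drifting out of control, the sketch below computes simple three-sigma limits from baseline measurements and reports which later samples fall outside them. It is a simplified stand-in, not a full control-chart procedure: it uses the sample standard deviation of the baseline, whereas production control charts usually estimate variation in other ways (for example, from subgroup or moving ranges). The function names and the measurement values are hypothetical.

```python
from statistics import mean, stdev


def control_limits(baseline: list[float], sigmas: float = 3.0) -> tuple[float, float, float]:
    """Return (lower limit, center line, upper limit) from in-control baseline data.

    Simplification: spread is the sample standard deviation of the baseline.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline measurements")
    center = mean(baseline)
    spread = sigmas * stdev(baseline)
    return center - spread, center, center + spread


def out_of_control(samples: list[float], limits: tuple[float, float, float]) -> list[int]:
    """Indices of samples that fall outside the control limits."""
    lower, _, upper = limits
    return [i for i, x in enumerate(samples) if x < lower or x > upper]


if __name__ == "__main__":
    # Hypothetical fill weights in grams.
    limits = control_limits([10.0, 10.2, 9.8, 10.1, 9.9])
    print(limits)
    print(out_of_control([10.1, 10.6, 9.4, 10.0], limits))
```

Because the check runs on measurements taken after the fact, it is an informative inspection: it tells you that something has changed so that the process can be adjusted, but it does not stop the individual error from occurring.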
Whether mistake prevention or mistake detection is selected as the driving mechanism in a specific application, a setting function must be selected. A setting function is the mechanism for determining that an error is about to occur (prevention) or has occurred (detection). It differentiates between safe, accurate conditions and unsafe, inaccurate ones. The more precise the differentiation, the more effective the mistake-proofing can be. Chase and Stewart identify four setting functions, which are described in the table below.
Table: Setting functions
Setting function | Description
Physical (Shingo’s contact) | Checks to ensure the physical attributes of the product or process are correct and error-free.
Sequencing (Shingo’s motion step) | Checks the precedence relationship of the process to ensure that steps are conducted in the correct order.
Grouping or counting (Shingo’s fixed-value methods) | Facilitates checking that matched sets of resources are available when needed or that the correct number of repetitions has occurred.
Information enhancement | Determines and ensures that information required in the process is available at the correct time and place and that it stands out against a noisy background.
Once the setting function determines that an error has occurred or is going to occur, a control function (or regulatory function) must be utilized to indicate to the user that something has gone awry. Not all mistake-proofing is equally useful. Usually, mistake prevention is preferred to mistake detection. Similarly, forced control, shutdown, warning, and sensory alert are preferred, in that order. The preferred devices tend to be those that are the strongest and require the least attention and the least discretionary behavior by users.
Control (or regulatory) functions
Regulatory function | Mistake prevention | Mistake detection
Forced control | Physical shape and size of object or electronic controls detect mistakes that are being made and stop them from resulting in incorrect actions or omissions. | Physical shape and size of object or electronic controls detect incorrect actions or omissions before they can cause harm.
Shutdown | The process is stopped before mistakes can result in incorrect actions or omissions. | The process is stopped immediately after an incorrect action or omission is detected.
Warning | A visual or audible warning signal is given that a mistake or omission is about to occur. Although the error is signaled, the process is allowed to continue. | A visual or audible warning signal is given that a mistaken action or omission has just occurred.
Sensory alert | A sensory cue signals that a mistake is about to be acted upon or an omission made. The cue may be audible, visible, or tactile. Taste and smell have not proved to be as useful. Sensory alerts signal mistakes but allow the process to continue. | A sensory cue signals that a mistake has just been acted upon or an omission has just occurred.
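The relationship between a setting function and a control function can be sketched in a few lines. The example below is only an illustration: the `Control` enum, the counting check, and the `respond` helper are hypothetical names, and the ranking simply encodes the stated preference order (forced control, shutdown, warning, sensory alert).

```python
from enum import IntEnum

class Control(IntEnum):
    """Control functions, ordered from least to most preferred."""
    SENSORY_ALERT = 1
    WARNING = 2
    SHUTDOWN = 3
    FORCED_CONTROL = 4

def counting_setting_function(parts_installed, parts_required):
    """Setting function (fixed-value method): True when the count is wrong."""
    return parts_installed != parts_required

def respond(error_detected, available_controls):
    """Pick the most preferred control available, or None if nothing to do."""
    if not error_detected or not available_controls:
        return None
    return max(available_controls)

# Hypothetical station that can either warn or shut down.
error = counting_setting_function(parts_installed=3, parts_required=4)
chosen = respond(error, [Control.WARNING, Control.SHUTDOWN])
print(chosen.name)
```

The setting function only decides whether something is wrong; the control function decides what the process does about it.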
Mistake prevention identifies process errors found by inspecting the process before taking actions that would result in harm. The word “inspection” as it is used here is broadly defined. The inspection could be accomplished by physical or electronic means without human involvement. The 3.5-inch disk drive is an example of a simple inspection technique that does not involve a person making a significant judgment about the process. Rather, the person executes a process and the process performs an inspection by design and prevents an error from being made. Shingo called this type of inspection “source inspection.” The source or cause of the problem is inspected before the effect—an incorrect action or an omission—can actually occur.
Preventing the Influence of Mistakes
Preventing the influence of mistakes means designing processes so that the impact of errors is reduced or eliminated. This can be accomplished by facilitating correction or by decoupling processes.
This could include finding easy and immediate ways of allowing workers to reverse the errors they commit. While doing things right the first time is still the goal, effortless error corrections can often be nearly as good as not committing errors at all. This can be accomplished through planned responses to error or the immediate reworking of processes. Typewriters have joined mimeograph machines and buggy whips as obsolete technology because typing errors are so much more easily corrected on a computer. Errors that once required retyping an entire page can now be corrected with two keystrokes. Software that offers “undo” and “redo” capabilities also facilitates the correction of errors. Informal polls suggest that people use these features extensively. Some users even become upset when they cannot “undo” more than a few of their previous operations. Also, computers now auto-correct errors like “thsi” one. These features significantly increase the effectiveness of users. They did not come into being accidentally but are the result of intentional, purposeful design efforts based on an understanding of the errors that users are likely to make. Automotive safety has been enhanced by preventing the influence of mistakes. Air bags do not stop accidents. Rather, they are designed to minimize injuries experienced in an accident. Antilock brakes also prevent the influence of mistakes by turning a common driving error into the correct action. Prior to the invention of antilock brakes, drivers were instructed not to follow their instincts and slam on the brakes in emergencies. To do so would increase the stopping distance and cause accidents due to driver error. Pumping the brakes was the recommended procedure. With anti-lock brakes, drivers who follow their instincts and slam on the brakes are following the recommended emergency braking procedure. What once was an error has become the correct action.
“Decoupling” means separating an error-prone activity from the point at which the error becomes irreversible. Software developers try to help users avoid deleting files they may want later by decoupling. Pressing the delete button on an unwanted E-mail or computer file does not actually delete it. The software merely moves it to another folder named “deleted items,” “trash can,” or “recycling bin.” If you have ever retrieved an item that was previously “deleted,” you are the beneficiary of decoupling. Regrettably, this type of protection is not yet available when saving work. The files can be overwritten, and the only warning may be a dialogue box asking, “Are you sure?” Sometimes the separation of the error from the outcome need not be large. Stewart and Grout suggest a decoupling feature for telephoning across time zones. The first outward manifestation of forgetting or miscalculating the time difference is the bleary eyed voice of a former friend at 4:00 a.m. local time instead of the expected cheery voice at a local time of 10:00 a.m. One way to decouple the chain would be to provide an electronic voice that tells the caller the current time in the location being called. This allows the caller to hang up the phone prior to being connected and thus avoid the mistake.
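The "deleted items" folder described above can be sketched as a small data structure. This is a toy illustration, assuming a hypothetical `Mailbox` class; it is not how any particular mail client is implemented.

```python
class Mailbox:
    """Toy mailbox in which delete is reversible until the trash is emptied."""

    def __init__(self, messages):
        self.inbox = list(messages)
        self.trash = []

    def delete(self, message):
        # Decoupling: move the item instead of destroying it.
        self.inbox.remove(message)
        self.trash.append(message)

    def restore(self, message):
        # The mistake can still be reversed at this point.
        self.trash.remove(message)
        self.inbox.append(message)

    def empty_trash(self):
        # Only here does the deletion become irreversible.
        self.trash.clear()

box = Mailbox(["invoice", "newsletter"])
box.delete("invoice")    # the error-prone action
box.restore("invoice")   # cheap recovery before the point of no return
print(box.inbox, box.trash)
```

The error-prone action (`delete`) and the irreversible one (`empty_trash`) are separate calls, which is the essence of decoupling.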
Attributes of Error-Proofing
Error-Proofing is Inexpensive
The cost of Error-proofing devices is often the fixed cost of the initial installation plus minor ongoing calibration and maintenance costs. A device’s incurred cost per use can be zero, as it is with the 3.5-inch diskette drive. The cost per use can also be negative in cases in which the device actually enables the process to proceed more rapidly than before. In manufacturing, where data are available, mistake-proofing has been shown to be very effective. There are many management tools and techniques available to manufacturers. However, many manufacturers are unaware of error-proofing. The TRW Company reduced its defect rate from 288 parts per million (ppm) defective to 2 parts per million. Federal Mogul had 99.6 percent fewer customer defects than its nearest competitor and a 60 percent productivity increase by systematically thinking about the details of their operation and implementing mistake-proofing. DE-STA-CO manufacturing reduced omitted parts from 800 omitted ppm to 10; in all modes, they reduced omitted parts from 40,000 ppm to 200 ppm and, once again, productivity increased as a result. These are very good results for manufacturing. They would be phenomenal results in health care. Patients should be the recipients of processes that are more reliable than those in manufacturing. Regrettably, this is not yet the case.
Error-Proofing Can Result in Substantial Returns on Investment
Even in manufacturing industries, however, there is a low level of awareness of error-proofing as a concept. In an article published in 1997, Bhote stated that 10 to 1, 100 to 1, and even 1,000 to 1 returns are possible, but he also stated that awareness of error-proofing was as low as 10 percent and that implementation was “dismal” at 1 percent or less. Exceedingly high rates of return may seem impossible to realize, yet Whited cites numerous examples. The Dana Corporation reported employing one device that eliminated a mode of defect that cost $0.5 million a year. The device, which was conceived, designed, and fabricated by a production worker in his garage at home, cost $6.00. That is an 83,333 to 1 rate of return for the first year. The savings occur each year that the process and the device remain in place. A worker at Johnson & Johnson’s Ortho-Clinical Diagnostics Division found a way to use “Post-It® Notes” to reduce defects and save time that was valued at $75,000 per year. If the “Post-It® Notes” cost $100 per year, then the return on investment would be 750 to 1. These are examples of savings for a single device. Lucent Technologies’ Power System Division implemented 3,300 devices over 3 years. Each of these devices contributed a net savings of approximately $2,545 to the company’s bottom line. The median cost of each device was approximately $100. The economics in medicine are likely to be at least as compelling. A substantial amount of mistake-proofing can be done for the cost of settling a few malpractice suits out of court.
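The return figures quoted above follow from simple division, which the short sketch below reproduces. The helper name `return_ratio` is hypothetical, and the Lucent total is just the quoted per-device net savings multiplied by the quoted device count.

```python
def return_ratio(annual_savings, device_cost):
    """Return savings per unit of cost, i.e. the 'N to 1' figure."""
    return annual_savings / device_cost

# Figures quoted in the text.
dana = return_ratio(500_000, 6.00)    # Dana Corporation device
post_it = return_ratio(75_000, 100)   # Post-It Notes, assuming $100 per year

# Lucent: 3,300 devices at roughly $2,545 net savings each.
lucent_total_net = 3_300 * 2_545

print(round(dana), round(post_it), lucent_total_net)
```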
Error-proofing Is Not a Stand-Alone Technique
It will not obviate the need for other responses to error.
Error-Proofing Is Not Rocket Science
It is detail-oriented and requires cleverness and careful thought, but once implementation has been completed, hindsight bias will render the solution obvious.
Error-Proofing Is Not a Panacea
It cannot eliminate all errors and failures from a process. Perrow points out that no scheme can succeed in preventing every event in complex, tightly-linked systems. He argues that multiple failures in complex, tightly-linked systems will lead to unexpected and often incomprehensible events. Observers of these events might comment in hindsight, “Who would have ever thought that those failures could combine to lead to this?” Perrow’s findings apply to error-proofing as they do to any other technique. Error-proofing will not work to block events that cannot be anticipated. Usually, a good understanding of the cause-and-effect relationship is required in order to design effective Error-proofing devices. Therefore, the unanticipated events that arise from complex, tightly-linked systems cannot be mitigated using Error-proofing.
Error-Proofing Is Not New
It has been practiced throughout history and is based on simplicity and ingenuity. Error-proofing solutions are often viewed post hoc as “common sense.” Bottles of poison are variously identified by their rectangular shape, blue-colored glass, or the addition of small spikes to make an impression on inattentive pharmacists. Most organizations will find that examples of error-proofing already exist in their processes. The implementation of error-proofing, then, is not entirely new but represents a refocusing of attention on certain design issues in the process.
Creating Simplicity Is Not Simple
In hindsight, error-proofing devices seem simple and obvious. A good device will lead you to wonder why no one thought of it before. However, creating simple, effective error-proofing devices is a very challenging task. Significant effort should be devoted to the design process. Organizations should seek out multiple approaches to the problem before proceeding with the implementation of a solution. Each organization’s error-proofing needs may be different, depending on the differences in their processes. Consequently, some error-proofing solutions will require new, custom-made devices designed specifically for a given application. Other devices could be off-the-shelf solutions. Even off-the-shelf devices will need careful analysis, one that requires substantial process understanding, in light of the often subtly idiosyncratic nature of each organization’s own processes.
Some of the Error-Proofing tools
Just culture refers to a working environment that is conducive to “blame-free” reporting but also one in which accountability is not lost. Blame-free reporting ensures that those who make mistakes are encouraged to reveal them without fear of retribution or punishment. A policy of not blaming individuals is very important to enable and facilitate event reporting, which in turn enables mistake-proofing. The concern with completely blame-free reporting is that egregious acts, in which punishment would be appropriate, would go unpunished. Just culture divides behavior into three types: normal, risk-taking, and reckless. Of these, only reckless behavior is punished.
Event reporting refers to actions undertaken to obtain information about events and near-misses. The reporting reveals the type and severity of events and the frequency with which they occur. Event reports provide insight into the relative priority of events and errors, thereby enabling the mistake-proofing of processes. Consequently, events are prioritized and acted upon more quickly according to the seriousness of their consequences.
Root Cause Analysis
Root cause analysis (RCA) is a set of methodologies for determining at least one cause of an event that can be controlled or altered so that the event will not recur in the same situation. These methodologies reveal the cause-and-effect relationships that exist in a system. RCA is an important enabler of mistake-proofing, since mistake-proofing cannot be accomplished without a clear knowledge of the cause-and-effect relationships in the process. Care should be taken when RCA is used to formulate corrective actions, since it may only consider one instance or circumstance of failure. Other circumstances could also have led to the failure. Other failure analysis tools, such as fault tree analysis, consider all known causes and not just a single instance. Anticipatory failure determination facilitates inventing new circumstances that would lead to failure given existing resources.
Corrective Action Systems
Corrective action systems are formal systems of policies and procedures to ensure that adverse events are analyzed and that preventive measures are implemented to prevent their recurrence. Normally, the occurrence of an event triggers a requirement to respond with counter-measures within a certain period of time. Error-proofing is an effective form of counter-measure. It is often inexpensive and can be implemented rapidly. It is also important to look at all possible outcomes and counter-measures, not just those observed. Sometimes, mistake-proofing by taking corrective action is only part of the solution. For example, removing metal butter knives from the dinner trays of those flying in first class effectively eliminates knives from aircraft, but does not remove any of the other resources available for fashioning weapons out of materials available on commercial airplanes. This is mistake-proofing but not a fully effective counter-measure. Corrective action systems can also serve as a resource to identify likely mistake-proofing projects. Extensive discussion and consultation in a variety of industries, including health care, reveal that corrective actions are often variations on the following themes:
- An admonition to workers to “be more careful” or “pay attention.”
- A refresher course to “retrain” experienced workers.
- A change in the instructions, standard operating procedures, or other documentation.
All of these are essentially attempts to change “knowledge in the head”. Chappell states that “You’re not going to become world class through just training, you have to improve the system so that the easy way to do a job is also the safe, right way. The potential for human error can be dramatically reduced.” Error-proofing is an attempt to put “knowledge in the world.” Consequently, corrective actions that involve changing “knowledge in the head” can also be seen as opportunities to implement mistake-proofing devices. These devices address the cause of the event by putting “knowledge in the world.” Not all corrective actions deserve the same amount of attention. Therefore, not all corrective actions should be allotted the same amount of time in which to formulate a response. Determining which corrective actions should be allowed more time is difficult because events occur sequentially, one at a time. Responding to outcomes that are not serious, common, or difficult to detect should not consume too much time. For events that are serious, common, or difficult to detect, additional time should be spent in a careful analysis of critical corrective actions.
Substantial efforts to improve have been focused on specific events such as customer complaints, internal rejections, external rejections, accidents, and near-miss incidents. These specific foci provide areas of opportunity for the implementation of error-proofing.
In aviation, simulation is used to train pilots and flight crews. Logically enough, simulators have also begun to be employed in other industries, such as automotive, IT, and medicine. In addition to training, simulation can provide insights into likely errors and serve as a catalyst for the exploration of the psychological or causal mechanisms of errors. After likely errors are identified and understood, simulators can provide a venue for the experimentation and validation of new mistake-proofing devices.
The study of facility design complements error-proofing and sometimes is error-proofing. Adjacency, proper handrails and affordances, standardization, and the use of Failure Modes and Effects Analysis (FMEA) as a precursor are similar to error-proofing. Ensuring non-compatible connectors and pin-indexed medical gases is mistake-proofing.
Revising Standard Operating Procedures
When adverse events occur, it is not uncommon for standard operating procedures (SOPs) to be revised in an effort to change the instructions that employees refer to when providing care. This approach can either improve or impair patient safety, depending on the nature of the change and the length of the SOP. If SOPs become simpler and help reduce the cognitive load on workers, it is a very positive step. If the corrective responses to adverse events are to lengthen the SOPs with additional process steps, then efforts to improve patient safety may actually result in an increase in the number of errors. Evidence from the nuclear industry suggests that changing SOPs improves human performance up to a point but then becomes counterproductive. Chiu and Frick studied the human error rate at the San Onofre Nuclear Power Generation Facility since it began operation. They found that after a certain point, increasing procedure length or adding procedures resulted in an increase in the number of errors instead of reducing them as intended. Their facility is operating on the right side of the minimum, in the region labeled B. Consequently, they state that they “view with a jaundiced eye an incident investigation that calls only for more rules (i.e., procedure changes or additions), and we seek to simplify procedures and eliminate rules whenever possible.” Simplifying processes and providing clever work aids complement mistake-proofing and in some cases may be mistake-proofing. When organizations eliminate process steps, they also eliminate the errors that could have resulted from those steps.
Substantial resources are invested in ensuring that workers, in general, are alert and attentive as they perform their work. Attention management programs range from motivational posters in the halls and “time-outs” for safety to team-building “huddles.” Eye-scanning technology determines if workers have had enough sleep during their off hours to be effective during working hours. When work becomes routine and is accomplished on “autopilot” (skill-based), error-proofing can often reduce the amount of attentiveness required to accurately execute detailed procedures. The employee performing these procedures is then free to focus on higher level thinking. Error-proofing will not eliminate the need for attentiveness, but it does allow attentiveness to be used more effectively to complete tasks that require deliberate thought.
Crew Resource Management
Crew resource management (CRM) is a method of training team members to “consistently use sound judgment, make quality decisions, and access all required resources, under stressful conditions in a time-constrained environment.” It grew out of aviation disasters where each member of the crew was problem-solving, and no one was actually flying the plane. This outcome has been common enough that it has its own acronym: CFIT—Controlled Flight Into Terrain. Error-proofing often takes the form of reducing ambiguity in the work environment, making critical information stand out against a noisy background, reducing the need for attention to detail, and reducing cognitive content. Each of these benefits complements CRM and frees the crew’s cognitive resources to attend to more pressing matters.
FMEA
FMEA is a bottom-up approach in the sense that it starts at the component or task level to identify failures in the system. Fault trees are a top-down approach. A fault tree starts with an event and determines all the component (or task) failures that could contribute to that event. A fault tree is a graphical representation of the relationships that directly cause or contribute to an event or failure.
The top of the tree indicates the failure mode, the “top event.” At the bottom of the tree are causes, or “basic failures.” These causes can be combined as individual, independent causes using an “OR” symbol. They can be combined using an “AND” symbol if causes must co-exist for the event to occur. The tree can have as many levels as needed to describe all the known causes of the event. These failures can be analyzed to determine the sets of basic failures that can cause the top event to occur, called cut sets. A minimal cut set is the smallest combination of basic failures that produces the top event. A minimal cut set leads to the top event if, and only if, all events in the set occur. Minimal cut sets can also be used to assess the performance of mistake-proofing device designs. In the fault tree diagram, these minimal cut sets are shown with dashed lines. Fault trees also allow one to assess the probability that the top event will occur by first estimating the probability that each basic failure will occur. In the example, the probabilities of the basic failures are combined to calculate the probability of the top event. Minimal cut set 1 requires basic failures 1 and 2 to occur together, while minimal cut set 2 consists of basic failure 3 alone. The probability of basic failures 1 and 2 occurring within a fixed period of time is 20 percent each. The probability of basic failure 3 occurring within that same period is only 4 percent. However, since both basic failures 1 and 2 must occur before the top event results, their joint probability (0.20 × 0.20, assuming the two failures are independent) is also 4 percent. Basic failure 3 is far less likely to occur than either basic failure 1 or 2. However, since it can cause the top event by itself, the top event is equally likely to be caused by minimal cut set 1 or 2. Two changes can be made to the tree to reduce the probability of the top event:
- Reduce the probability of basic failures.
- Increase redundancy in the system.
That is, design the system so that more basic failures are required before a top event occurs. If one nurse makes an error and another nurse double checks it, then two basic failures must occur. One is not enough to cause the top event. The ability to express the interrelationship among contributory causes of events using AND and OR symbols provides a more precise description than is usually found in the “potential cause” column of an FMEA. Potential causes of an FMEA are usually described using only the conjunction OR. It is the fault tree’s ability to link causes with AND, in particular, that makes it more effective in describing causes. Gano suggests that events usually occur due to a combination of actions and conditions; therefore, fault trees may prove very worthwhile. FMEA and fault trees are not mutually exclusive. A fault tree can provide significant insights into truly understanding potential failure causes in FMEA.
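The probability arithmetic for the fault tree example can be sketched as follows. The code assumes that the basic failures are statistically independent, which the text does not state explicitly, and the helper names `p_and` and `p_or` are hypothetical. The redundancy scenario at the end (an independent second check that misses 20 percent of the time) is an invented illustration, not a figure from the source.

```python
def p_and(*probs):
    """Probability that all independent events occur (AND gate)."""
    result = 1.0
    for p in probs:
        result *= p
    return result

def p_or(*probs):
    """Probability that at least one independent event occurs (OR gate)."""
    none_occur = 1.0
    for p in probs:
        none_occur *= 1.0 - p
    return 1.0 - none_occur

cut_set_1 = p_and(0.20, 0.20)   # basic failures 1 AND 2
cut_set_2 = 0.04                # basic failure 3 alone
top_event = p_or(cut_set_1, cut_set_2)

# Redundancy (assumed numbers): basic failure 3 now causes the top event
# only if an independent second check, with a 20% miss rate, also fails.
cut_set_2_redundant = p_and(0.04, 0.20)
top_event_redundant = p_or(cut_set_1, cut_set_2_redundant)

print(round(cut_set_1, 4), round(top_event, 4), round(top_event_redundant, 4))
```

Under these assumptions the two cut sets contribute equally, and adding redundancy to the single-failure cut set lowers the top-event probability without changing any individual failure rate.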
FMEA and fault trees are useful in understanding the range of possible failures and their causes. The other tools (safety culture, just culture, event reporting, and root cause analysis) lead to a situation in which the information needed to conduct these analyses is available. These tools, on their own, may be enough to facilitate the design changes needed to reduce medical errors. Only fault tree analysis, however, comes with explicit prescriptions about what actions to take to improve the system. These prescriptions are: increase component reliability or increase redundancy. Fault trees are also less widely known or used than other existing tools.
Designing Mistake-Proofing Devices
1. Select an undesirable failure mode for further analysis.
In order to make an informed decision about which failure mode to analyze, the risk priority number (RPN) or the criticality number of the failure mode must have been determined in the course of performing FMEA or FMECA.
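As an illustration of how an RPN can guide this selection, the sketch below ranks a few failure modes. It assumes the common FMEA convention that the RPN is the product of severity, occurrence, and detection ratings, each on a 1 to 10 scale; the failure modes and their ratings are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class FailureMode:
    name: str
    severity: int     # 1 (minor) to 10 (catastrophic)
    occurrence: int   # 1 (rare) to 10 (frequent)
    detection: int    # 1 (almost always detected) to 10 (almost never detected)

    @property
    def rpn(self):
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection

# Hypothetical FMEA rows.
modes = [
    FailureMode("wrong part installed", 7, 4, 3),
    FailureMode("part omitted", 8, 3, 6),
    FailureMode("label misprinted", 3, 5, 2),
]

# The highest RPN is the first candidate for mistake-proofing.
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for m in ranked:
    print(m.name, m.rpn)
```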
2. Review FMEA findings and brainstorm solutions.
Most existing mistake-proofing has been done without the aid of a formal process. This is also where designers should search for existing solutions. Common sense, creativity, and adapting existing examples are often enough to solve the problem. If not, continue to Step 3.
3. Create a detailed fault tree of the undesirable failure mode.
This step involves the traditional use of fault tree analysis. Detailed knowledge regarding the process and its cause-and-effect relationships discovered during root cause analysis and FMEA provide a thorough understanding of how and why the failure mode occurs. The result of this step is a list and contents of minimal cut sets. Since severity and detectability of the failure mode could be the same for all of the minimal cut sets, the probability of occurrence will most likely be the deciding factor in a determination of which causes to focus on initially.
4. Select a benign failure mode(s) that would be preferred to the undesirable failure.
FMEA should precede multiple fault trees to provide information about other failure modes and their severity. Ideally, the benign failure alone should be sufficient to stop the process; the failure, which would normally lead to the undesirable event, causes the benign failure instead.
5. Using a detailed fault tree, identify “resources” available to create the benign failure.
These resources, basic events at the bottom of the benign fault tree, can be employed deliberately to cause the benign failure to occur.
6. Generate alternative mistake-proofing device designs that will create the benign failure.
This step requires individual creativity and problem-solving skills. Creativity is not always valued by organizations and may be scarce. If brainstorming alone does not result in solutions, employ creativity training, methodologies, and facilitation tools like TRIZ.
7. Consider alternative approaches to designed failures
Some processes have very few resources. If creativity tools do not provide adequate options for causing benign process failures, consider using cues to increase the likelihood of correct process execution. Changing focus is another option to consider when benign failures are not available. If you cannot solve the problem, change it into one that is solvable. Changing focus means, essentially, exploring the changes to the larger system or smaller subsystem that change the nature of the problem so that it is more easily solved. For example, change to a computerized physician order entry (CPOE) system instead of trying to error proof handwritten prescriptions. There are very few resources available to stop the processes associated with handwritten paper documents. Software, on the other hand, can thoroughly check inputs and easily stop the process.
8. Implement a solution.
Some basic tasks usually required as part of the implementation are listed below:
- Select a design from among the solution alternatives:
  - Forecast or model the device’s effectiveness.
  - Estimate implementation costs.
  - Assess the training needs and possible cultural resistance.
  - Assess any negative impact on the process.
  - Explore and identify secondary problems (side effects or new concerns raised by the device).
  - Assess device reliability.
- Create and test the prototype design:
  - Find sources who can fabricate, assemble, and install custom devices, or find manufacturers willing to make design changes.
  - Resolve technical issues of implementation.
  - Undertake trials if required.
- Trial implementation:
  - Resolve nontechnical and organizational issues of implementation.
  - Draft a maintenance plan.
  - Draft process documentation.
- Broad implementation leads to:
  - Consensus building.
  - Organizational change.
The eight steps to creating error-proofing devices can be initiated by a root cause analysis or FMEA team, an organization executive, a quality manager, or a risk manager. An interdisciplinary team of 6 to 10 individuals should execute the process steps. An existing FMEA or root cause analysis team is ideal because its members would already be familiar with the failure mode. Help and support from others with creative, inventive, or technical abilities may be required during the later stages of the process. A mistake-proofing device is designed using the eight steps just discussed in the application example that follows.
Some hints on POKA-YOKE
Some Examples of POKA-YOKE
- Preventing the wrong jig from being fitted during a jig change
- Preventing missing cooling water during high induction heating
- Preventing missing and wrong caulking
- Missed process step on the workpiece
- Incorrect process step on the workpiece
- Workpiece setting mistake
- Mixing in of foreign parts
Error proofing Caveats
Error Proof the Error-Proofing
Error-proofing devices should be error-proofed themselves. They should be designed with the same rigor as the processes the devices protect. The reliability of error-proofing devices should be analyzed, and if possible, the device should be designed to fail in benign ways. Systems with extensive automatic error detection and correction mechanisms are more prone to a devious form of failure called a latent error. Latent errors remain hidden until events reveal them and are very hard to predict, prevent, or correct. They often “hide” inside automatic error detection and correction devices. An error that compromises an inactive detection and recovery system is generally not noticed, but when the system is activated to prevent an error, it is unable to respond, leaving a hole in the system’s security. This is an important design issue, although it is quite likely that the errors prevented by the automatic error detection and correction systems would have caused more damage than the latent errors induced by the systems.
Avoid Moving Errors to Another Location
When designing error-proofing devices, it is important to avoid the common problem of moving errors instead of eliminating or reducing them. For example, in jet engine maintenance, placing the fan blades in the correct position is very important. The hub where the blade is mounted has a set screw that is slightly different in size for each blade so that only the correct blade will fit. This solves numerous problems in assembly and maintenance throughout the life of the engine. It also produces real problems for the machine shop that produces the hubs; it must ensure that each set screw hole is machined properly.
Prevent Devices from Becoming Too Cumbersome
How error-proofing devices affect processes is another design issue that must be considered. The device could be cumbersome because it slows down a process while in use or because the process, once stopped, is difficult to restart.
Avoid Type I Error Problems
If error-proofing is used for error detection and replaces an inspection or audit process that relied on sampling, changing to the 100 percent inspection provided by an error-proofing device may have unintended consequences. Specifically, significantly more information will be collected about the process than when only sampling is used. Suppose the error of inferring that something about the process is not correct when, in fact, the process is normal (a Type I error) occurs only a small percentage of the time. Because every unit is now checked, the number of opportunities for a Type I error increases dramatically. The relative frequency of Type I errors is unchanged, but the number of Type I errors per hour or day increases. It is possible that too many instances requiring investigation and corrective action will occur, and properly investigating and responding to each may not be feasible.
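The arithmetic behind this caveat can be sketched in a few lines of Python. All figures here are assumptions chosen for illustration only: 10,000 units per day, a former sampling plan that checked 200 of them, and a Type I error rate of 1 percent per check. The function name is likewise hypothetical.

```python
def expected_false_alarms(inspected_units_per_day: int, type_i_rate: float) -> float:
    """Expected number of Type I errors (false alarms) per day.

    type_i_rate is the probability that a single check flags a problem
    when the process is actually normal.
    """
    if not 0.0 <= type_i_rate <= 1.0:
        raise ValueError("type_i_rate must be between 0 and 1")
    return inspected_units_per_day * type_i_rate


# Assumed figures for illustration only.
UNITS_PER_DAY = 10_000   # total daily production
SAMPLED_PER_DAY = 200    # units checked under the old sampling plan
TYPE_I_RATE = 0.01       # same per-check rate in both cases

sampled = expected_false_alarms(SAMPLED_PER_DAY, TYPE_I_RATE)
full = expected_false_alarms(UNITS_PER_DAY, TYPE_I_RATE)

print(f"False alarms per day with sampling:        {sampled:.1f}")
print(f"False alarms per day with 100% inspection: {full:.1f}")
print(f"Increase factor: {full / sampled:.0f}x")
```

The per-check rate is identical in both cases. Only the number of checks changes, and that alone multiplies the daily investigation workload.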
Prevent Workers from Losing Skills
Bainbridge and Parasuraman et al. assert that reducing workers’ tasks to monitoring and intervention functions makes their tasks more difficult. Bainbridge asserts that workers whose primary tasks involve monitoring will see their skills degrade from lack of practice, so they will be less effective when intervention is called for. Workers will tend not to notice when usually stable process variables change and an intervention is necessary. Automatic features, like mistake-proofing devices, will isolate the workers from the system, concealing knowledge about its workings that is necessary during an intervention. And, finally, automatic systems will usually make decisions at a faster rate than they can be checked by the monitoring personnel. Parasuraman, Molloy, and Singh looked specifically at the ability of the operator to detect failures in automated systems. They found that the detection rate improved when the reliability of the system varied over time, but only when the operator was responsible for monitoring multiple tasks.