Design for Six Sigma

 Common DFSS Methodologies


Design for six sigma (DFSS) is the suggested method to bring order to product design. Hockman, Suh, and Paul have noted that 70% – 80% of all quality problems are design related. Emphasis on the manufacturing side alone concentrates on the tail end of the problem-solving process; the emphasis should be at the front end, since solving a problem downstream is more costly and time consuming than fixing it at the source. In 1999, NIST reported that the automotive supply chain lost at least a billion dollars a year due to poor interoperability of digitally designed product data.

In recent years, American industry has placed considerable emphasis on downsizing, restructuring, process redesign, cost containment, etc. These methods are directed at holding the line on costs and can be described as denominator management. In the business world, the equation for return on investment (or return on net operating assets) has both a numerator (net income) and a denominator (investment). Managers have found that cutting the denominator (investments in people, resources, materials, or other assets) is an easy way to make the desired return on investment rise, at least in the short term. Growing the numerator requires a different way of thinking: finding ways to increase sales or revenues, and one of those ways is introducing more new products for sale to customers. New products account for a large percentage of company sales (40%) and profits (46%). Of course, not every new product will survive. Two studies listed in the Table below provide some statistics.

Progression of New Products Through Development

The Table indicates that a large number of ideas are needed. These ideas are sorted, screened, and evaluated to obtain feasible ideas, which enter the development stage, pass into the launch stage, and become successful products. Cooper provides more details of how winning products are obtained:

  1. A unique, superior product: This is a product with benefits and value for the customer.
  2. A strong market orientation: An understanding of customer needs and wants exists.
  3. Predevelopment work: Up front activities such as screening, market analysis, technical assessment, market research, and business analysis are vital before development starts.
  4. Good product definition: A company must undertake good product and project definition before development begins.
  5. Quality of execution: The development process has many steps. A company must execute these steps with the proper amount of detail and correctness.
  6. Team effort: Product development is a team effort that includes research & development, marketing, sales, and operations.
  7. Proper project selection: Poor projects must be killed at the proper time. This provides adequate resources for the good projects.
  8. Prepare for the launch: A good product launch is important and resources must be available for future launches.
  9. Top management leadership: Management has a role to play in the product development process. They must provide guidance, strategy, resources, and leadership.
  10. Speed to market: Product development speed is the weapon of choice, but sound management practices should be maintained.
  11. A new product process: This is a screening (stage gate) process for new products.
  12. An attractive market: An attractive market makes it easier to have a successful product.
  13. Strength of company abilities: The new product provides a synergy between the company and internal abilities.

There are many product development processes to choose from. Rosenau suggests that the former “relay race” process (one function passing the product from marketing to engineering to manufacturing and back through the loop) is obsolete. Multi-functional team activities involving all departments are necessary for effectiveness and speed to market. The process consists of two parts: a “fuzzy front end” (idea generation and sorting) and new product development (NPD). The complete NPD process includes five activities:

  1. Concept study: A study is needed to uncover the unknowns about the market, technology, and/ or the manufacturing process.
  2. Feasibility investigations: There is a need to determine the limitations of the concept. Find out if the unknowns are resolvable, or if new research improves the project.
  3. Development of the new product: This is the start of the NPD process. This includes the specifications, needs of the customer, target markets, establishment of multi-functional teams, and determination of key stage gates.
  4. Maintenance: These are the post delivery activities associated with product development.
  5. Continuous learning: Project status reports and evaluations are needed to permit learning.

Stage Gate Process

A stage gate process is used by many companies to screen and pass projects as they progress through development stages. Each stage of a project has requirements that must be fulfilled. The gate is a management review of the particular stage in question. It is at the various gates that management should make the “kill” decision. Too many projects are allowed to live beyond their useful lives and clog the system. This dilutes the efforts of project teams and overloads the company resources. Table below illustrates some sample stages.
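The gate reviews and "kill" decisions described above can be sketched in a few lines of code. The gate names, review scores, and pass threshold below are hypothetical illustrations, not part of any published stage gate standard; the point is simply that a project dies at the first gate it fails, freeing resources for stronger projects.

```python
# Hypothetical gates and threshold -- real companies define their own stages
# (see the Table of company examples referenced in this section).
GATES = ["concept", "feasibility", "development", "validation", "launch"]
PASS_THRESHOLD = 7  # minimum management-review score (0-10) to pass a gate

def screen_project(scores):
    """Return (last gate reached, still alive?) for a dict of gate -> score."""
    for gate in GATES:
        if scores.get(gate, 0) < PASS_THRESHOLD:
            return gate, False  # the "kill" decision is made at this gate
    return GATES[-1], True      # passed every gate review

strong = {gate: 9 for gate in GATES}
weak = {"concept": 9, "feasibility": 5}   # fails its feasibility review

print(screen_project(strong))  # -> ('launch', True)
print(screen_project(weak))    # -> ('feasibility', False)
```

A project that never reaches a gate review (no score recorded) is treated here as failing it, which matches the intent of killing projects that stall.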


Product Development Stages for Various Companies

The above Table presents several examples of new product development processes. The individual organization should customize their process and allow a suitable time period for it to stabilize.

Product Development

In the area of new product management, there are some commonly accepted new product terms:

  1. New-to-the-world products: These are inventions and discoveries that include products like Polaroid cameras, laser printers, in-line skates, etc.
  2. New category entries: These are company products that are not new to the world, but new to the company. A “me-too” type product.
  3.  Additions to product lines: These products are extensions of the organization’s existing product line. Examples are Diet Coke, Caffeine-free Coke.
  4. Product improvements: Current products made better.
  5. Repositioning: Products that are retargeted for a new use. The original purpose was not broad enough. Arm & Hammer baking soda has been repositioned as a drain deodorant, refrigerator deodorant, etc.
  6. Cost reductions: New products which are designed to replace existing  products, but at a lower cost.

GE Plastics has formalized their product design development process. It is described as designing for six sigma using the product development process. The methodology is used to produce engineered plastics through a series of tollgates that describe the elements needed for completion of a stage. The best practices are used in each stage. Best practices include:

  • Understanding critical to quality characteristics for external customers and internal customers
  • Conducting failure mode and effects analysis (FMEA)
  • Performing design of experiments to identify key variables
  • Benchmarking other facilities using competitive analysis, surveys, etc.

Treffs, Simon and Shree provide additional insight on the  development  of other six sigma design methods. A standardized approach has not yet been established, but most authors recommend a framework that tries to remove “gut feel” and substitutes more control.


Treffs  presents a four step IDOV model:

  • Identify: Use a team charter, VOC, QFD, FMEA, and benchmarking.
  • Design: Emphasize CTQs, identify functional requirements, develop alternatives, evaluate, and select.
  • Optimize: Use process capability information, statistical tolerancing, robust design, and various six sigma tools.
  • Validate: Test and validate the design.


Simon  provides a five step define, measure, analyze, design and validate (DMADV) process for six sigma design. The DMADV method for the creation of a new product consists of the following steps:

  •  Define: Define the project goals and customer needs
  • Measure: Measure and determine customer needs and specifications
  •  Analyze: Analyze the process options to meet customer needs
  • Design: Develop the process details to meet customer needs
  • Verify: Verify and validate the design performance


The six sigma DMADOV process is used to develop new processes or products at high quality levels, or if a current process requires more than just incremental improvement. DMADOV is an acronym for define, measure, analyze, design, optimize, and verify. The process steps for a DMADOV project include:

  1. Define the project:
    • What are the project’s goals?
    • Who is the customer and what are their requirements?
  2. Measure the opportunity:
    • Determine customer needs and specifications
    • Benchmark competitors and industry
  3. Analyze the process options:
    • What option will meet the customer needs?
    •  Determine creative solutions
  4. Design the process:
    • Develop a detailed process
    • Design experiments that verify the design meets customer needs
  5. Optimize the process:
    • Test the new process to evaluate performance levels and impacts
    • Re-design the process, as necessary, to meet customer specifications
  6. Verify the performance:
    • Verify the design performance and ability to meet customer needs
    • Deploy the new process

The French Design Model
The model is named after the British author Michael Joseph French.


The French Design Model

The designer (and design team) will capture the needs, provide analysis, and produce a statement of the problem. The conceptual design will generate a variety of solutions to the problem. This brings together the elements of engineering, science, practical knowledge, production methods, and practices. The embodiment-of-schemes step produces a concrete working drawing (or item) from the abstract concept. The detailing step consolidates and coordinates the fine points of producing a product. The designer of a new product is responsible for taking the initial concept to final launch. In this effort, the designer will be part of a team. The project manager, product manager, or general manager for a new product or new design team (which includes marketing, sales, operations, design, and finance) will need to manage the process.

Design for X (DFX)

Design for X (DFX) is defined as a knowledge-based approach for designing products to have as many desirable characteristics as possible. The desirable characteristics include: quality, reliability, serviceability, safety, user friendliness, etc. This approach goes beyond the traditional quality aspects of function, features, and appearance of the item. AT&T Bell Laboratories coined the term DFX to describe the process of designing a product to meet the above characteristics. In doing so, the life cycle cost of a product and the lowering of downstream manufacturing costs would be addressed. The DFX toolbox has continued to grow in number from its inception  to include hundreds of tools today. The user can be overwhelmed by the choices available. Some researchers in DFX technology have developed sophisticated models and algorithms. The usual practice is to apply one DFX tool at a time. Multiple applications of DFX tools can be costly. The authors note that a systematic framework is not yet available for use for DFX methodology. A set methodology would aid in the following ways:

  • Understanding how DFX works
  • Aiding in the selection of a tool
  • Faster learning of DFX tools
  • Providing a platform for multiple DFX tools

Usage of DFX Techniques and Tools

  1. Design guidelines:
    DFX methods are usually presented as rules of thumb (design guidelines). These rules of thumb provide broad design rules and strategies. The design rule to increase assembly efficiency requires a reduction in the part count and part types. The strategy would be to verify that each part is needed.
  2. DFX analysis tools:
    Each DFX tool involves some analytical procedure that measures the effectiveness of the selected tool. For example a DFA (design for assembly) procedure addresses the handling time, insertion time, total assembly time, number of parts, and the assembly efficiency. Each tool should have some method of verifying its effectiveness.
  3. Determine DFX tool structure:
    A technique may require other calculations before the technique can be considered complete. An independent tool will not depend on the output of another tool. The handling analysis, insertion analysis, and number of parts are all capable of being calculated, but the total assembly time requires sub- system times for each component.
  4. Tool effectiveness and context:
    Each tool can be evaluated for usefulness by the user. The tool may be evaluated based on accuracy of analysis, reliability characteristics and/or integrity of the information generated.
  5. The focus of activity and the product development process:
    If the product development process is understood by the design team, the use of the DFX tools will be of benefit. Understanding the process activities will help determine when a particular tool can be used.
  6. Mapping tool focus by level:
    The mapping of a tool by level implies that DFX analysis can be complex. Several levels of analysis may be involved with one individual tool. The structure may dictate the feasibility of tool use. For routine product redesigns, the amount of information needed may already be available. For original designs, the amount of interdependence of tools can make it difficult to coordinate all of the changes downstream.

DFX Characteristics

The following characteristics and attributes should be considered by DFX projects.

  1. Function and performance:  These factors are vital for the product.
  2. Safety: Design for safety requires the elimination of potential failure prone elements that could occur in the operation and use of the product. The design should make the product safe for: manufacture, sale, use by the consumer, and disposal.
  3. Quality: The three characteristics of quality, reliability, and durability are required and are often grouped together in this category.
  4. Reliability: A reliable design has already anticipated all that can go wrong with the product, using the laws of probability to predict product failure. Techniques are employed to reduce failure rates in design testing. FMEA techniques consider how alternative designs can fail. Derating of parts is considered. Redundancy through parallel critical component systems may be used.
  5. Testability: The performance attributes must be easily measured.
  6. Manufacturability: The concept of design for manufacturability (DFM) includes the ability to test and ship a product. Producibility and manufacturability are terms used since the 1960s. Design for manufacturability (DFM) has been the dominant term used since 1985. A design must simplify the manufacture of a product through a reduced number of parts and a reduced number of manufacturing operations.
  7. Assembly (Design for Assembly, DFA): DFA means simplifying the product so that fewer parts are involved, making the product easier to assemble. This portion of DFX can often provide the most significant benefit. A product designed for ease of assembly can: reduce service, improve recycling, reduce repair times, and ensure faster time to market. This is accomplished by using fewer parts, reducing engineering documents, lowering inventory levels, reducing inspections, minimizing setups, minimizing material handling, etc.
  8. Environment: The objective is minimal pollution during manufacture, use, and disposal. This could be defined as Design for the Environment (DFE). The concept is to increase growth without increasing the amount of consumable resources. Some categories of environmental design practices include: recovery and reuse, disassembly, waste minimization, energy conservation, material conservation, chronic risk reduction, and accident prevention.
  9. Serviceability (Maintainability and Reparability): A product should be returned to operation and use easily after a failure. This is sometimes directly linked to maintainability.
  10. Maintainability: The product must perform satisfactorily throughout its intended life with minimal expenses. The best approach is to assure the reliability of components. There should be: reduced down time for maintenance activities; reduced user and technician time for maintenance tasks; reduced  requirements for parts; and lower costs of maintenance. Endres provides some specific methods for increasing maintainability (decreasing diagnosis and repair times): use modular construction in systems, use throw away parts (instead of parts requiring repair), use built-in testing, have parts operate in a constant failure rate mode, etc.
  11. User Friendliness or Ergonomics: Human factors engineering must fit the product to the human user. Some guidelines to consider are: fitting the product to the user’s attributes, simplifying the user’s tasks, making controls and functions obvious, anticipating human error, providing constraints to prevent incorrect use, properly positioning locating surfaces, improving component accessibility, and identifying components.
  12. Appearance (Aesthetics): Attractiveness is especially necessary for consumer products. These characteristics include: special requirements of the user, relevancy of the style, compatibility of materials and form, proportional shapes, or protection from damage in service.
  13. Packaging: The best package for the product must be considered. The size and physical  characteristics of the product are important, as are the economics of the package use. The method of packaging must be determined. Automated packaging methods are desirable.
  14. Features: Features are the accessories, options, and attachments available for a product.
  15. Time to Market: The ability to have shorter cycle times in the launch design of a product is desirable. The ability to produce the product either on time or faster than the competition is a tremendous advantage.

Robust Design and Process

Dr. Genichi Taguchi wrote that the United States has coined the term “Taguchi Methods” to describe his system of robustness for the evaluation and improvement of the product development processes. He has stated that he preferred the term “quality engineering” to describe the process. Other authors have used robust design or robust engineering  to describe the process. Any of the above mentioned terms can be used.

Robust Design Approach


Robust design is one of the more important developments in design processes in recent years. When used, robust approaches can produce extremely reliable designs both during manufacture and in use. Robust design uses the concept of parameter control to place the design in a position where random “noise” does not cause failure. The concept is that a product or process is controlled by a number of factors to produce the desired response. The signal factor is the signal used for the intended response: the actions taken (signal) to start the lawn mower (response), or the dial setting (signal) to obtain a furnace temperature (response). The success of obtaining the response is dependent on control factors and noise factors.

A Robust Design Schematic

Control factors are those parameters that are controllable by the designer. These  factors are the items in the product or process that operate to produce a response when triggered by a signal. For instance, in the case of the furnace, the control factors might be the design of the thermocouple and heat controller. Control factors are sometimes separated into those which add no cost to the product or process and those that do add cost. Since factors that add cost are frequently associated with selection of the tolerance of the components, these are called tolerance factors. Factors that don’t add cost are simply control factors. Noise factors are parameters or events that are not controllable by the designer. These are generally random, in that only the mean and variance can be predicted.
Examples of noise factors in furnace design include:

  • Line voltage variations
  • Outside temperature
  • Parallax errors in dial setting

These noise factors have the ability to produce an error in the desired response. The function of the designer is to select control factors so that the impact of noise factors on the response is minimized while maximizing the response to signal factors. This adjustment of factors is best done using statistical design of experiments or SDE.
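The interplay of signal, control, and noise factors can be illustrated with a small simulation. The furnace model, gain values, and noise magnitude below are invented for illustration only; the point is that a control-factor choice that damps the noise term shrinks the spread of the response without controlling the noise itself.

```python
import random

def furnace_temp(dial, noise_sensitivity, line_voltage_error):
    # Toy response model: the dial (signal) sets the temperature, while the
    # line-voltage error (noise) perturbs it through a design-dependent gain.
    return 20.0 * dial + noise_sensitivity * line_voltage_error

random.seed(1)
noise = [random.gauss(0.0, 5.0) for _ in range(1000)]  # volts of line noise

def response_spread(noise_sensitivity):
    """Standard deviation of the temperature over the sampled noise."""
    temps = [furnace_temp(10, noise_sensitivity, e) for e in noise]
    mean = sum(temps) / len(temps)
    return (sum((t - mean) ** 2 for t in temps) / len(temps)) ** 0.5

baseline = response_spread(noise_sensitivity=2.0)  # noise-sensitive design
robust = response_spread(noise_sensitivity=0.5)    # control factors chosen
                                                   # to damp the noise term
print(round(baseline / robust, 1))  # -> 4.0: same noise, 4x less spread
```

The noise samples are identical in both runs; only the designer's control-factor choice differs, which is the essence of robust design.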

 Some of the key principles are concept design, parameter design, and tolerance design.

  1. Concept Design

    Concept design is the selection of the process or product architecture based on technology, cost, customer, or other important considerations. This step depends  heavily on the abilities and creativity of the designer.

  2. Parameter Design

    During the parameter design stage the design is established using the lowest cost components and manufacturing techniques. The response is then optimized for control and minimized for noise. If the design meets the requirements, the designer has achieved an acceptable design at the lowest cost.

  3. Tolerance Design

    If the design doesn’t meet requirements, the designer begins considerations of more expensive components or processes that reduce the tolerances. The tolerances are reduced until the design requirements are met. With robust design approaches, the designer has the ability to produce a design with either the lowest cost, the highest reliability or an optimized combination of cost and reliability.

Example of Robust Design:

A mid-size tile manufacturing company in Japan in 1953 was having a serious problem with their $2 million kiln purchased from West Germany. The problem was extreme variation in the dimensions of the tile produced. The stacked tiles were baked inside a tunnel kiln as shown below. Tiles toward the outside of the stack tended to have a different average dimension and exhibited more variation than those toward the inside of the stack.

A Schematic of a Tile Tunnel Kiln

The cause of variation was readily understandable: there was an uneven temperature profile inside the kiln. To correct the cause, the company would have to redesign the kiln, which was a very expensive proposition. The company’s budget didn’t allow such costly action, but the kiln was creating a tremendous financial loss, so something had to be done. Although temperature was an important factor, it was treated as a noise factor. This meant that temperature was a necessary evil, and all other factors would be varied to see if the dimensional variation could be made insensitive to temperature. In Dr. Taguchi’s words, the question was “whether the robustness of the tile design could be improved.” People having knowledge about the process (the engineers, chemists, etc.) were brought together. They brainstormed and identified seven major controllable factors which they thought could affect the tile dimension. These were: (1) limestone content in the raw mix, (2) fineness of the additives, (3) amalgamate content, (4) type of amalgamate, (5) raw material quantity, (6) waste return content, and (7) type of feldspar.

After testing these factors over specified levels using an orthogonal design, the experimenters discovered that factor #1 (limestone content) was the most significant factor, although other factors had smaller effects. It was found that by increasing the limestone content from 1% to 2% (and by choosing a slightly better level for other factors), the percent warpage could be reduced from 30% to less than 1%. Fortunately, limestone was the cheapest material in the tile mix. Moreover, they found through the experimentation that they could use a smaller amount of amalgamate without adversely affecting the tile dimension. Amalgamate was the most expensive material in the tile. This is a classic example of improving quality (reducing the impact of a noise factor), reducing cost (using less amalgamate), and drastically reducing the number of defectives at the same time.

Functional Requirements

In the development of a new product, the product planning department must determine the functions required. The designer (or design engineer) will have a set of requirements that a new product must possess. The designer will develop various concepts, embodiments, or systems that will satisfy the customer’s requirements. All possible alternative systems should be considered, both existing systems and new, not-yet-developed ones. The criteria for selection of a design will be based on the quality level and development costs that will enable the product to survive in the highly competitive marketplace. The product design must be “functionally robust,” which implies that it must withstand variation in input conditions and still achieve desired performance capabilities. The designer has two objectives:

  1. Develop a product that can perform the desired functions and be robust under various operating or exposure conditions
  2. Have the product manufactured at the lowest possible cost

After selection of the new system, the nominal values and tolerance parameters of the new system must be determined. The optimal solution to the new system is called the “optimal condition” or “optimal design.”

Parameter Design

Parameter design improves the functional robustness of the process so that the desired dimensions or quality characteristics are obtained. The process is considered functionally robust if it produces the desired part dimensions despite wide variability in its operating conditions.
The steps to obtain this robustness are:

  1. Determine the signal factors (input signals) and the uncontrollable noise factors (error factors) and ranges.
  2. Choose as many controllable factors as possible, select levels for these factors, and assign these levels to appropriate orthogonal arrays. Controllable factors can be adjusted to different levels to improve the functional robustness of the process.
  3. Calculate S/N ratios from the experimental data. One common form of the dynamic S/N ratio is:
    S/N = 10 log [ (Sβ – Ve) / (r VN) ]
    where:
    r is a measurement of the magnitude of the input signals
    Sβ is the sum of squares of the ideal function (useful part)
    Ve is the mean square of nonlinearity
    VN is an error term combining nonlinearity and linearity
  4. Determine the optimal conditions for the process. The optimal conditions are derived from the experimental data. The maximum average S/N of each level of controllable factors will be used for the optimal settings. Additional experiments will be conducted for verification of the settings.
  5. Conduct actual production runs.

 Signal-to-Noise Ratio

A signal-to-noise ratio (S/N) is used to evaluate system performance. In assessing the results of experiments, the S/N ratio is calculated at each design point. The combinations of the design variables that maximize the S/N ratio are selected for consideration as product or process parameter settings.


There are 3 cases of S/N ratios:
Case 1: S/N ratio for “smaller is better,” used for minimizing the wear, shrinkage, deterioration, etc. of a product or process:
S/N = -10 log (mean-squared response) = -10 log [(1/n) Σ y²]
Some references use “r” instead of “n” in the equations for Case 1 and Case 2.
Case 2: S/N ratio for “larger is better”:
S/N = -10 log (mean-squared reciprocal response) = -10 log [(1/n) Σ (1/y²)]
In this case, S/N ratios will seek the highest values for items like strength, life, fuel efficiency, etc.
Case 3: S/N ratio for “nominal is best,” commonly written S/N = 10 log (ȳ²/s²):
This S/N ratio is applicable for dimensions, clearances, weights, viscosities, etc.
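The three S/N cases can be computed directly. The sketch below uses the common textbook forms (with n observations; some references write r), applied to invented sample data:

```python
import math

def sn_smaller_is_better(y):
    # S/N = -10 log[(1/n) * sum(y^2)] -- rewards responses near zero
    return -10 * math.log10(sum(v * v for v in y) / len(y))

def sn_larger_is_better(y):
    # S/N = -10 log[(1/n) * sum(1/y^2)] -- rewards large responses
    return -10 * math.log10(sum(1.0 / (v * v) for v in y) / len(y))

def sn_nominal_is_best(y):
    # One common form: 10 log(ybar^2 / s^2) -- rewards low spread about
    # the mean; other variants appear in the literature.
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((v - ybar) ** 2 for v in y) / (n - 1)
    return 10 * math.log10(ybar * ybar / s2)

low_wear, high_wear = [0.2, 0.3, 0.25], [0.5, 0.6, 0.55]  # invented data
print(sn_smaller_is_better(low_wear) > sn_smaller_is_better(high_wear))  # True
```

In every case a larger S/N ratio is better, which is why the final comparison favors the lower-wear data set.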

Parameter Design Case Study


A case study illustrates the parameter design approach. An experiment was conducted to find an assembly method to join an elastomer connector to a nylon tube for use in automotive engine components. The objective was to minimize the assembly effort. There are 4 controllable factors and 3 noise factors. The controllable factors are at 3 levels; the noise factors at 2 levels. This is illustrated in the Table below.

Parameter Design Case Study Factors

Given 4 factors at 3 levels, this would amount to 81 experiments. Taguchi provided orthogonal arrays to reduce the amount of testing required. They are fractional factorial experiments without regard for interactions, in most cases. An L9 array can be used for the controllable factors with 9 experimental runs. The 3 noise factors are placed in an L8 array. There are 8 runs of noise conditions. This array induces noise into the experiment to help identify the controllable factors that are least sensitive to a change in noise level.
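The L9 array used above is a standard published orthogonal array. The sketch below lists it (levels coded 1–3) and checks the balance and orthogonality properties that justify running 9 trials instead of 81:

```python
from itertools import combinations, product

# The standard Taguchi L9(3^4) orthogonal array: 9 runs, 4 columns,
# each column a three-level controllable factor.
L9 = [
    (1, 1, 1, 1), (1, 2, 2, 2), (1, 3, 3, 3),
    (2, 1, 2, 3), (2, 2, 3, 1), (2, 3, 1, 2),
    (3, 1, 3, 2), (3, 2, 1, 3), (3, 3, 2, 1),
]

# Balance: each level appears exactly three times in every column.
for col in range(4):
    assert [sum(run[col] == lvl for run in L9) for lvl in (1, 2, 3)] == [3, 3, 3]

# Orthogonality: every pair of columns contains all nine level pairs once,
# which lets main effects be estimated independently (interactions ignored).
for c1, c2 in combinations(range(4), 2):
    assert {(run[c1], run[c2]) for run in L9} == set(product((1, 2, 3), repeat=2))

print(len(L9), "runs instead of", 3 ** 4)  # -> 9 runs instead of 81
```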


The two arrays are combined to form the complete parameter design layout. The L9 array is called the inner array, while the L8 array is the outer array.

Example Orthogonal Design Layout

The completed matrix contains the mean response results. In addition, the variation of the signal-to-noise (S/N) ratio has been determined. The larger the S/N ratio, the better. S/N ratios are computed for each of the 9 experimental conditions. An ANOVA can also be used in the calculations to supplement the S/N ratios. Taguchi prefers to use graphing techniques to visually identify the significant factors, without using ANOVA. The optimum combination of factors and levels can be determined from the analysis. A confirmation run should be conducted to verify the results.

The Loss Function


The loss function is used to determine the financial loss that will occur when a quality characteristic, y, deviates from the target value, m. The quality loss is zero when the quality characteristic, y, is at the target value, m. The quality loss function is defined as the mean square deviation of the objective characteristics from their target values. The function is depicted as:

L(y) = k(y – m)²

The function L(y) shows that the further the quality characteristic is away from the target, the greater the quality loss. Of course, at a value outside the tolerance specifications, the product is a defective. The “A” value is the cost due to a defective product. The amount of deviation from the target, or “tolerance” as Taguchi calls it, is the delta (Δ) value. The constant k is derived as:

k = A / Δ²

The mean square deviation from the target (σ²), as used by Taguchi, does not indicate a variance.

Example of  the Loss Function

Mr. X wished to buy a pair of size 7 shoes. The store was out of size 7, and he had to settle for a pair of size 7.5 shoes. After two days, he found them to be ill-fitting and had to discard them. The original cost of the shoes was $50. Size 6.5 shoes were also not suitable. The quality loss function can be applied to this situation.


The target value m is 7.0
The existing quality characteristic y is 7.5
The cost of a defective product A is $50
The hypothetical tolerance Δ (7.5 – 7.0) is 0.5

Solving for the quality loss function:

k = A / Δ² = 50 / (0.5)² = 200
L(y) = k(y – m)² = 200(7.5 – 7.0)² = $50

The above calculation shows the quality loss to be $50. If the shoe size were 7.25, keeping the other variables the same, the resulting loss to society would be:

L(y) = 200(7.25 – 7.0)² = $12.50

This quality loss calculation indicates a loss to society of $12.50. The use of the loss function illustrates that there is value in reducing variation in the product.
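The arithmetic of the shoe example can be checked in code, using the quadratic loss L(y) = k(y – m)² with k = A/Δ²:

```python
def quality_loss(y, m, A, tolerance):
    """Taguchi quadratic loss: k = A / tolerance^2, L(y) = k * (y - m)^2."""
    k = A / tolerance ** 2
    return k * (y - m) ** 2

# Size-7 target, $50 defective cost, 0.5-size tolerance:
print(quality_loss(y=7.5, m=7.0, A=50, tolerance=0.5))   # -> 50.0 (dollars)
print(quality_loss(y=7.25, m=7.0, A=50, tolerance=0.5))  # -> 12.5 (dollars)
```

Halving the deviation quarters the loss, which is the key property of the quadratic loss function.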

Tolerance Design


The tolerances for all system components must be determined. This includes the types of materials used. In tolerance design, there is a balance between a given quality level and the cost of the design. The measurement criterion is quality loss. Quality losses are estimated by the functional deviation of the products from their target values plus the cost due to the malfunction of these products. Taguchi described the approach as using economical safety factors. A manufacturer without design responsibility will have its tolerances supplied by its customers. (Design responsibility indicates that the organization has the authority to change and produce design drawings.) Tolerances are usually established by using engineering experience, considering the uncertainty of design and production factors. A safety factor of 4 is typically used in the United States, though this factor is bound to vary across industry; the defense and communications sectors may require much larger values. The shipping specifications for a product characteristic are said to be on a higher level in relation to the subsystem and parts; the subsystem characteristic values are likewise on a higher level in relation to its parts and materials. The functional limit Δ0 must be determined by methods like experimentation and testing. Taguchi uses the LD50 point as a guide to establish the upper and lower functional limits. The LD50 point is where the product will fail 50% of the time; the 50% point is called the median. An example from Taguchi illustrates the determination of the functional limit:
A spark plug has a nominal ignition voltage of 20 kV. The lower functional limit Δ01 is -12 kV and the upper functional limit Δ02 is +18 kV. These values are determined by testing. The resulting specifications will have a lower tolerance (Δ01) of 8 kV and an upper tolerance (Δ02) of 38 kV. The relationships between the tolerance specification, the functional limit, and the safety factor are as follows:
Δ = Δ0 / φ, where Δ is the tolerance specification, Δ0 the functional limit, and φ the safety factor.
The economical safety factor φ is determined as follows: φ = √(A0 / A), where A0 is the loss when a product exceeds the functional limit and A is the cost of countering it in-house.
Given the value of the quality characteristic at y, and the target value at m, the quality loss function will appear as follows: L(y) = (A0 / Δ0²)(y − m)².
For example, a power supply for a TV set has functional limits at ±25% of output voltage. The average quality loss A0 after shipment of a bad TV is known to be $300. The cost of adjusting a power supply in-house before shipping is $1.00. The economical safety factor φ is calculated as:
φ = √(A0 / A) = √(300 / 1) ≈ 17.3
The tolerance specification for the output voltage, as a percentage, will be Δ = Δ0 / φ = 25% / 17.3 ≈ 1.45%.
Therefore, the tolerance specification for the output voltage of 120 volts will be:

120 ± (120)(0.0145) = 120 ± 1.74 volts

Although the functional limits were initially established at 120 ± 30 volts (25%), the TV sets should have output voltages within 1.74 volts of the nominal.
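A minimal sketch of the safety-factor arithmetic above, assuming the standard Taguchi relationships φ = √(A0/A) and Δ = Δ0/φ (the small difference from the ±1.74 V in the text comes from rounding φ):

```python
import math

A0 = 300.0      # average quality loss ($) after shipping a bad TV set
A = 1.0         # cost ($) of adjusting the power supply in-house
delta0 = 0.25   # functional limit: +/-25% of output voltage

phi = math.sqrt(A0 / A)   # economical safety factor, ~17.3
delta = delta0 / phi      # tolerance as a fraction of nominal, ~0.0144

nominal = 120.0
print(round(phi, 1))              # 17.3
print(round(nominal * delta, 2))  # ~1.73 volts about the 120 V nominal
```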

Taguchi’s Quality Imperatives

  • Robustness is a function of product design. The manufacturing process and on-line quality control cannot do much to change that. Quality losses are a loss to society.
  • Robust products have a strong signal with low internal noise. The design change of increasing the signal-to-noise ratio will improve the robustness of the product.
  • For new products, use planned experiments varying in values, stresses, and conditions to seek out the parameter targets. Orthogonal arrays are recommended.
  • To build robust products, simulate customer-use conditions.
  • Tolerances are set before going to manufacturing. The quality loss function can be measured.
  • Products that barely meet the standard are only slightly better than products that fail the specifications. The aim is for the target value.
  • The factory must manufacture products that are consistent. Reduced variation is needed for consistency.
  • Reducing product failure in the field will reduce the number of defectives in the factory. Part variation reduction decreases system variation.
  • Proposals for capital equipment for on-line quality efforts should have the average quality loss (quality loss function) added to the proposal.

The use of engineering techniques using robust design will improve customer satisfaction, reduce costs, and shorten development time. The reduction of rework in the development process will get the product to market more quickly and smoothly.

Statistical Tolerancing

Statistical tolerancing uses the square root of the sum of variances to determine the tolerances required, when two or more components are assembled. This results in tighter tolerances for the assembly than would be indicated by summing the individual tolerances.
Example: The assignment of tolerances involves many factors, including the sigma safety level required. Let’s assume that plus and minus four sigma is necessary and that three components are assembled. One might incorrectly assume that the dimensions of the final assembly would be 30″ ± 0.014″. The nominal thickness is correct, but the variation is incorrect. There are two important forces at work here: random assembly and a normal distribution of variation in each of the parts. The proper tolerance is determined by the additive law of variances (variance equals σ²).
The final assembly, without special effort, will be: 30″ ±  0.0082″
Compared to ± 0.014″, the ± 0.0082″ result is a 41% improvement. Consider the implications of this difference on the final product and the potential for unnecessary internal scrap.
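The root-sum-of-squares (RSS) calculation can be sketched as below. The individual component tolerances are an assumption, chosen only so that their worst-case sum reproduces the ±0.014″ figure in the example:

```python
import math

# Hypothetical component tolerances (inches); their worst-case sum is 0.014".
tolerances = [0.004, 0.004, 0.006]

worst_case = sum(tolerances)                      # simple sum of tolerances: 0.014
rss = math.sqrt(sum(t ** 2 for t in tolerances))  # additive law of variances: ~0.0082

print(round(worst_case, 3))                          # 0.014
print(round(rss, 4))                                 # 0.0082
print(round((worst_case - rss) / worst_case * 100))  # 41 (% improvement)
```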

Porter’s Five Competitive Forces

Professor Michael Porter of the Harvard Business School developed the five competitive forces as a strategy to analyze the marketplace and to gain a market advantage. He states that a company’s current position is the heart of strategy. The five forces affect most industries. An analyst may have to perform considerable research in order to determine the positioning of any individual company. The five competitive forces are:

  1. The threat of new entrants
  2. The power of suppliers
  3. The power of customers
  4. Substitute products or services
  5. Industry rivalry
  1. The Threat of New Entrants

    The ability of a new competitor to enter into an industrial sector is a major market force that existing companies have to consider. If the barriers are not too difficult, new competitors will bring additional capacity, new or greater resources, and the desire to gain market share. There are six possible barriers to consider:

    1. Economies of scale: The new entrant must be prepared to compete on a large scale. Economies of scale require very good operational techniques.
    2. Product differentiation: If tremendous brand loyalty is a barrier, this may cause new entrants to invest very heavily in methods to counter brand loyalty.
    3. Capital requirements: Large initial investments may be required in facilities, inventory, marketing, or R&D in order to compete.
    4. Learning curve advantage: A cost advantage may occur from being further down the learning curve. This advantage is due to elements like accumulated production experience or patents.
    5. Access to distribution channels: Market distribution of the product must be secured in some fashion. The existing distribution channels may be closed or open to new entrants.
    6. Government policy: Regulated industries enjoy some protection from new competitors. Examples include some airlines, coal mining companies, and liquor retailers.
  2. The Power of Suppliers

    Suppliers and customers (buyers) can be considered to be on opposing economic sides. Industrial profits can be affected by the two vying forces if there is an imbalance between them. Some of the factors that make a supplier a powerful force, and potentially difficult to bargain with, include:

    • The industry is dominated by a few companies
    • The supplier has a product or raw material that is unique
    • The product does not have substitutes
    • The supplier has the potential to integrate forward and perform the service itself
    • The industry is not important to the supplier
  3. The Power of Customers

    Customers (buyers) are powerful if:

    • Economies of scale matter, and purchases are large
    • The buyer can integrate backwards if needed, keeping costs down
    • The purchased product is a small part of the buyer’s total cost
    • The buyer is in a low profit industry, and must pursue low cost items
    • The product is deemed a commodity
  4. Substitute Products

    A product or industry that has a substitute product will find itself with a cap on potential profits. This can be seen in steel versus aluminum products, corn syrup versus sugar, or fiberglass versus Styrofoam products. Substitute products may be new technologies that have the potential to cause price reductions in the industry.

  5. Industry Rivalry 

    The jockeying among current contestants can be an important factor especially when the rivalry among industry foes is intense. There can be significant price competition, frequent product introductions, and industrial advertising wars. Industry rivalry will have the following characteristics:

    • There are numerous competitors with equal shares
    • There is slow industry growth
    • The product is not easily differentiated (a commodity)
    • There is excessive industry capacity
    • The exit barriers are high (the costs of leaving the industry are very high)
    • There is intense rivalry
Use of the Five Competitive Forces

An analysis of the five competitive forces may require considerable effort. Professor Porter presents an organized framework to perform the analysis. Once the forces are identified, an analyst can determine the strengths and weaknesses of a company as it pursues a particular strategy. The company can try to match its strengths and weaknesses to the current industry model. That is, if the company is not the low cost producer, it will not try to have a price war with the industry’s low cost producer, unless it has long staying power. The company might also try to position itself in a quadrant where the forces are weakest, and where higher profit opportunities might exist. Porter maintains that an effective competitive strategy will allow a company to be proactive in creating a defendable position against competition. The company can position itself in a certain segment supported by its capabilities and resources. It can also try to reduce or influence certain competitive forces in the industry. Finally, the analysis can help the company anticipate shifts in the underlying forces and take advantage of business opportunities.

Portfolio Architecting

Technical processes include technology portfolio architecting, research and technology development (R&TD), product commercialization, and post-launch engineering work. The older approach used DMAIC six sigma and lean methods to correct problems and increase flow in existing technical processes, which provided quick, “emergency” actions. The new approach involves enabling and enhancing technical processes to prevent problems before they become an issue. This uses six sigma on a sustained basis to become consistent and predictable at conducting value-adding tasks. Inbound R&TD is focused on strategic technology portfolio definition, development, optimization, and transfer. Inbound product design engineering is focused on tactical product commercialization to rapidly prepare a new design, which often possesses transferred, new technology to fulfill launch requirements. Outbound post-launch engineering is focused on operations in post-launch production and service engineering support. Service engineering professionals often function as a “reactionary force” to fix problems. Instead, the focus should be on planning engineering changes and upgrades to increase profit margins. Newly transferred technology is frequently immature, resulting in a delay in the delivery of new products. Executives want an orderly design and launch of new product lines. If the product portfolio and the technology needed to enable it are not linked and timed for integration, the work of executing the new portfolio cannot happen on time. There is a need to design a strong, strategic alignment between product and technology portfolio architecting tasks for the sake of downstream cycle-time efficiency and control. The product and technology portfolio renewal process is the first of two strategic processes in which research and development (R&D) professionals can use six sigma methods.
The second process is the formal development of new technologies that the product and technology portfolio process requires. 

Strategic to Tactical Workflow


The strategic component consists of the inbound technical processes, research and technology development; and the tactical component is product design engineering done during commercialization.
Figure below shows the integrated marketing and technical functions that reside within the inbound and outbound technical areas.

Process Linkage Diagram

To enable growth, marketing and technical processes and functions must be linked for six sigma in marketing, R&TD, and design. Integrated, multi-functional teams from inbound marketing, R&TD, design, and production/service support engineering must be used across all three process areas to develop and share data, to manage risk, and to make decisions. The IDEA process for product portfolio definition and development consists of the following phases:

  • Identify markets, their segments, and opportunities using technology benchmarking and road mapping
  • Define portfolio requirements and product architectural alternatives
  • Evaluate product alternatives against competitive portfolios, then select
  • Activate ranked and resourced individual product commercialization projects

With statistically significant data, differentiated needs between the market segments within the general markets may be defined. Diverse new, unique, and difficult (NUD) needs may be translated into a set of product portfolio requirements that possess common and differentiated targets and fulfillment ranges. These requirements drive the product portfolio architecting process. Innovation at this level is the most strategic form of creativity and idea generation that a company can conduct. The define phase is the key transfer point for delivering product portfolio requirements to the R&TD organization. R&TD receives these diverse requirements and translates them into technology requirements. With several alternative product portfolio architectures defined, the team enters the evaluate phase. This phase involves the data-driven evaluation of the candidate portfolio architectures against competitive benchmarks in light of the portfolio requirements. A superior hybrid portfolio architecture emerges from this process phase. The final phase of product and technology portfolio renewal (P&TPR) is to activate product commercialization projects out of the superior portfolio architecture. The focus here is on activating projects that will, in the first phase of commercialization, convert opportunities into specific product requirements and ideas into specific product concepts.

Set-Based Design

Set-based concurrent engineering (SBCE) design begins with broad sets of possible solutions, converging to a narrow set of alternatives and then to a final solution. Design teams from various functions can work on sets of solutions in parallel, gradually narrowing them. Information from development, testing, customers, and others helps narrow the decision sets. Sets of ideas are viewed and reworked, leading to more robust, optimized, and efficient projects. This approach is deemed more efficient than working with one idea at a time. An analogy to set-based concurrent design is the 20 questions game. A player is asked to identify an unknown object or problem and has only 20 questions to ask. The experienced player will use a series of broad questions to narrow the scope of the field of possibilities. Questions that establish animal, vegetable, or mineral will eliminate quite a few possibilities quickly. SBCE seeks to narrow the scope of design in a similarly efficient and robust manner. Toyota is frequently cited as the leading example of a company using practices consistent with SBCE. SBCE assumes that reasoning and communicating about sets of ideas is preferable to working with one idea at a time.

Principles of SBCE

  1. Define the feasible regions
  2. Communicate sets of possibilities
  3. Look for intersections
  4. Explore trade-offs by designing multiple alternatives
  5. Impose minimum constraint
  6. Narrow sets smoothly, balancing the need to learn and the need to decide
  7. Pursue high-risk and conservative options in parallel
  8. Establish feasibility before commitment
  9. Stay within sets once committed
  10. Control by managing uncertainty at process gates
  11. Seek solutions robust to physical, market, and design variation
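Principle 3 above (looking for intersections) can be illustrated with sets: each function proposes a set of feasible design alternatives, and the team narrows toward the region all functions can accept. The function names and alternatives below are hypothetical:

```python
# Each function proposes a set of feasible design alternatives (hypothetical).
design_sets = {
    "engineering": {"A", "B", "C", "D"},
    "manufacturing": {"B", "C", "D", "E"},
    "styling": {"C", "D", "F"},
}

# The intersection is the region every function can live with.
feasible = set.intersection(*design_sets.values())
print(sorted(feasible))  # ['C', 'D']
```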

Theory of Inventive Problem-Solving (TRIZ)

TRIZ is a Russian abbreviation for “the theory of inventive problem solving.” Altshuller states that inventiveness can be taught. Creativity can be learned; it is not innate, and one does not have to be born with it. Altshuller asserts that traditional inventing is “trial and error,” resulting in much wasted time, effort, and resources. Through his years of education and imprisonment, he solidified a theory that one solves problems through a collection of assembled techniques. Technical evolution and invention have certain patterns, and one should be familiar with them to solve technical problems. There is some common sense, logic, and use of physics in problem solving.
There are three groups of methods to solve technical problems:

  1. Various tricks (a reference to a technique)
  2. Methods based on utilizing physical effects and phenomena (changing the  state of the physical properties of substances)
  3. Complex methods (combination of tricks and physics)

Altshuller provides an introduction to ARIZ (algorithm to solve an inventive problem). This is a sequence of 9 action steps in the use of TRIZ. The steps are:

  • Analysis of the problem
  • Analysis of the problem’s model: Use of a block diagram defining the “operating zone”
  • Formulation of the ideal final result (IFR): Providing a detailed description of the desired final result
  • Utilization of outside substances and field resources
  • Utilization of an informational data bank: Determining the physical or chemical constraints (standards) of the problem
  • Change or reformulate the problem
  • Analysis of the method that removed the physical contradiction: Is a quality solution provided?
  • Utilization of the found solution: Seeking side effects of the solution on the system or other processes
  • Analysis of the steps that lead to the solution: An analysis may prove useful later

Initially, there were 27 TRIZ tools, which were later expanded to 40 innovative, technical tools. The list of the 40 principles is:

  • Segmentation
  • Partial or excessive action
  • Extraction
  • Transition into a new dimension
  • Local quality
  • Mechanical vibration
  • Asymmetry
  • Periodic action
  • Consolidation
  • Continuity of useful action
  • Universality
  • Rushing through
  • Nesting
  • Convert harm into benefit
  • Counterweight
  • Replacement of mechanical systems
  • Prior counteraction
  • Pneumatic or hydraulic construction
  • Prior action
  • Flexible membranes or thin films
  • Cushion in advance
  • Porous material
  • Equipotentiality
  • Changing the color
  • Do it in reverse
  • Homogeneity
  • Feedback
  • Rejecting or regenerating parts
  • Mediator
  • Transformation of properties
  • Self-service
  • Phase transition
  • Copying
  • Thermal expansion
  • Dispose
  • Accelerated oxidation
  • Spheroidality
  • Inert environment
  • Dynamicity
  • Composite materials

Systematic Design

Systematic design is a step-by-step approach to design. It provides a structure to the design process, following a German methodology. It is stated that systematic design is a very rational approach and will produce valid solutions. The authors who describe this approach detail a method that closely follows the German design standard: Guideline VDI 2221 (“Systematic Approach to the Design of Technical Systems and Products”) from the Design Committee of the VDI (Verein Deutscher Ingenieure).
Pahl presents four main phases in the design process:

  • Task clarification: collect information, formulate concepts, identify needs
  • Conceptual design: identify essential problems and sub-functions
  • Embodiment design: develop concepts, layouts, refinements
  • Detail design: finalize drawings, concepts and generate documentation

An abstract concept is developed into a concrete item, represented by a drawing. Synthesis involves search and discovery, and the act of combining parts or elements to produce a new form. Modern German design thinking uses the following structure:

  • The requirements of the design are determined
  • The appropriate process elements are selected
  • A step-by-step method transforms qualitative items to quantitative items
  • A deliberate combination of elements of differing complexities is used

The main steps in the conceptual phase:

  • Clarify the task
  • Identify essential problems
  • Establish function structures
  • Search for solutions using intuition and brainstorming
  • Combine solution principles and select qualitatively
  • Firm up concept variants: preliminary calculations and layouts
  • Evaluate concept variants

There are suggested tools and methods for various steps along the design process. The creativity of the designer is encouraged in this method, but on a more structured basis. Any and all design methods must employ the designer’s creativity to find new innovative solutions.

Critical Parameter Management

Critical parameter management (CPM) is a:

  • Disciplined methodology for managing, analyzing, and reporting technical product performance.
  • Process for linking system parameters for sensitivity analysis and optimization of critical performance factors.
  • Strategic tool for improving product development by integrating systems, software, design, and manufacturing activities.

CPM program benefits include:

  1. Facilitated analysis
    • Statistical modeling & optimization of the performance-cost trade-off
    • Real-time system-level sensitivity analysis
    • Connects analyses between system, subsystem and component levels
  2. Improved collaboration
    • Shares technical analysis and knowledge
    • Links ownership to parameters
    • Connects teams and parameters to understand flow-down of requirements
    • Captures and leverages invested intellectual capital for future business
  3. Streamlined reporting
    • Technical performance measure (TPM) design margins are statistically tracked over the product lifecycle
    • Automated, real-time TPM data gathering and report generation
    • Reconciliation of requirement allocation and engineering design capability

The proper place to initiate critical parameter management in a business is during advanced product portfolio planning, and research and technology development (R&TD). At these earliest stages of product development, a certified portfolio of critical functional parameters and responses can be rapidly transferred as a family of modular designs in the product commercialization program. Critical parameter management is a systems-engineering and integration process that is used within an overarching technology development and product commercialization roadmap. The I2DOV road map defines a generic technology development process approach to research and technology development which consists of the following phases:

  • I2 = Invention and Innovation
  • D = Develop technology
  • O = Optimization of the robustness of the baseline technologies
  • V = Verification of the platform or sublevel technologies

Critical parameter management derives from a carefully defined architectural flow-down of requirements that can be directly linked to functions that are engineered to flow up to fulfill the requirements. Customer needs drive system-level technical requirements, which drive the system-level engineering functions, which in turn drive the definition of the system-level architectural concepts. When a system architecture is evaluated from this perspective, the inevitable trade-offs due to subsystem, subassembly, and component architectures begin.
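The flow-down/flow-up linking described above can be illustrated with a toy structure. The parameter names, nominal values, and transfer function are hypothetical; the point is that once a system response is expressed as a function of subsystem parameters, sensitivities can be traced through the hierarchy:

```python
# Toy critical-parameter flow-down: a system response y depends on
# subsystem parameters. Names, values, and the transfer function are made up.
params = {"pump_flow": 2.0, "nozzle_diameter": 0.5, "line_pressure": 30.0}

def system_response(p):
    """Hypothetical transfer function linking subsystem parameters to y."""
    return p["pump_flow"] * p["line_pressure"] / p["nozzle_diameter"]

# Finite-difference sensitivity of y to each critical parameter.
base = system_response(params)
for name in params:
    perturbed = dict(params)
    perturbed[name] *= 1.01  # +1% change in one parameter
    sens = (system_response(perturbed) - base) / base * 100
    print(f"{name}: {sens:+.2f}% change in y per +1% change")
```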


Critical Parameter Management Model

Pugh Analysis

Stuart Pugh, former Professor of Engineering Design at the University of Strathclyde, Glasgow, Scotland (now deceased), was a leader in product development (total design) methodology. He was a practicing engineer in industry before turning to the academic world. His work provides a methodology for product conception and generation. Quality function deployment can be used to determine customer technical requirements, providing the starting point necessary to develop new products. Pugh suggests a cross-functional team activity to assist in the development of improved concepts. The process starts with a set of alternative designs. These early designs come from various individuals in response to the initial project charter. A matrix-based process is used to refine the concepts. During the selection process, additional new concepts are generated. The final concept will generally not be the original concept.
The Pugh concept selection process has 10 steps:

  1. Choose criteria: The criteria come from the technical requirements.
  2. Form the matrix: An example matrix is shown below.
  3. Clarify the concepts: The team members must be sure that all of the concepts are understood. New concepts may require a sketch for visualization.
  4. Choose the datum concept: Select a design that is among the best concepts available for the baseline (datum).
  5. Run the matrix: Comparisons are made of every concept against the datum. Use a simple scale to rate the concepts: a “+” for a better concept, a “-” for a worse design, and an “S” for the same.
  6. Evaluate the ratings: Add up the scores for each category. See what the positives will contribute to one’s insight of the design.
  7. Attack the negatives and enhance the positives: Actively discuss the most promising concepts. Kill or modify the negative ones.
  8. Select a new datum and rerun the matrix: A new hybrid concept can be entered into the matrix for consideration.
  9. Plan further work: At the end of the first working session, the team may gather more information, perform experiments, seek technical knowledge, etc.
  10. Iterate to arrive at a new winning concept: Return the team to work on the concepts. Rerun the matrix for further analysis as needed.
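Steps 5 and 6 (running the matrix and evaluating the ratings) can be sketched as follows; the concept names, number of criteria, and ratings are hypothetical:

```python
# Each concept is rated against the datum on four criteria:
# "+" = better than the datum, "-" = worse, "S" = same.
ratings = {
    "Concept B": ["+", "S", "-", "+"],
    "Concept C": ["-", "-", "S", "+"],
}

for concept, marks in ratings.items():
    plus = marks.count("+")
    minus = marks.count("-")
    same = marks.count("S")
    # A positive net score suggests a concept worth enhancing;
    # strong negatives are candidates to kill or modify (step 7).
    print(f"{concept}: {plus} better, {minus} worse, {same} same, net {plus - minus}")
```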

Example of a Pugh Evaluation Matrix

The Pugh concept selection method has proven to be successful in the product  development process. The team will acquire:

  • Better insight on the requirements
  • Better understanding of the design problems
  • Greater understanding of the potential solutions
  • Greater understanding of the iteration of concepts
  • More insight on why certain designs are stronger than others
  • The desire to create additional concepts


Design of Experiments

Design of experiments (DOE) is used to understand the effects of the factors and interactions that impact the output of a process. As a battery of tests, a DOE is designed to methodically build understanding and enhance the predictability of a process. A DOE investigates a list of potential factors whose variation might impact the process output. These factors can be derived from a variety of sources including process maps, FMEAs, Multi-Vari studies, Fishbone Diagrams, brainstorming techniques, and Cause and Effect Matrices. With most data-analysis methods, you observe what happens in a process without intervening. With a designed experiment, you change the process settings to see the effect this has on the process output. The term design of experiments refers to the structured way you change these settings so that you can study the effects of changing multiple settings simultaneously. This active approach allows you to effectively and efficiently explore the relationship between multiple process variables (x’s) and the output, or process performance variables (y’s). This tool is most commonly used in the Analyze step of the DMAIC method as an aid in identifying and quantifying the key drivers of variation, and in the Improve step as an aid in selecting the most effective solutions from a long list of possibilities.


  • DOE identifies the “vital few” sources of variation (x’s)—the factors that have the biggest impact on the results
  • DOE identifies the x’s that have little effect on the results
  • It quantifies the effects of the important x’s, including their interactions
  • It produces an equation that quantifies the relationship between the x’s and the y’s
  • It predicts how much gain or loss will result from changes in process conditions

The types of DOEs include:

  • Screening DOEs, which ignore most of the higher order interaction effects so that the team can reduce the candidate factors down to the most important ones.
  • Characterization DOEs, which evaluate main factors and interactions to provide a prediction equation. These equations can range from 2k designs up to general linear models with multiple factors at multiple levels. Some software packages readily evaluate nonlinear effects using center points and also allow for the use of blocking in 2k analyses.
  • Optimizing DOEs, which use more complex designs such as Response Surface Methodology or iterative simple designs such as evolutionary operation or plant experimentation to determine the optimum set of factors.
  • Confirming DOEs, where experiments are done to ensure that the prediction equation matches reality.

Classical experiments focus on 1FAT (one factor at a time) at two or three levels and attempt to hold everything else constant (which is impossible to do in a complicated process). When DOE is properly constructed, it can focus on a wide range of key input factors or variables and will determine the optimum levels of each of the factors. It should be recognized that the Pareto principle applies to the world of experimentation. That is, 20% of the potential input factors generally make 80% of the impact on the result.
The classical approach to experimentation, changing just one factor at a time, has shortcomings:

  • Too many experiments are necessary to study the effects of all the input factors.
  • The optimum combination of all the variables may never be revealed.
  • Interactions between factors (where the behavior of one factor depends on the level of another) cannot be determined.
  • Unless carefully planned and the results studied statistically, conclusions may be wrong or misleading.
  • Even if the answers are not actually wrong, non-statistical experiments are often inconclusive. Many of the observed effects tend to be mysterious or unexplainable.
  • Time and effort may be wasted by studying the wrong variables or obtaining too much or too little data.

The design of experiments overcomes these problems by careful planning. In short, DOE is a methodology of varying a number of input factors simultaneously, in a carefully planned manner, such that their individual and combined effects on the output can be identified. Getting good results from a DOE involves a number of steps:

  • Set objectives
  • Select process variables
  • Select an experimental design
  • Execute the design
  • Check that the data are consistent with the experimental assumptions
  • Analyze and interpret the results
  • Use/present the results (may lead to further runs or DOEs)
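The "carefully planned manner" above can be sketched concretely: the code below builds a 2³ full-factorial design (every combination of low and high settings for three factors) and computes a main effect. The factor names and response values are made up:

```python
from itertools import product

# 2^3 full-factorial design: all combinations of low (-1) and high (+1).
factors = ["temperature", "pressure", "time"]
runs = list(product([-1, +1], repeat=len(factors)))  # 8 runs

# Hypothetical measured responses, one per run, in the order generated above.
responses = [45, 52, 48, 60, 44, 53, 50, 63]

def main_effect(i):
    """Mean response at factor i's high setting minus mean at its low setting."""
    hi = [y for run, y in zip(runs, responses) if run[i] == +1]
    lo = [y for run, y in zip(runs, responses) if run[i] == -1]
    return sum(hi) / len(hi) - sum(lo) / len(lo)

for name in factors:
    print(name, "effect:", main_effect(factors.index(name)))
```

Because the design is balanced, each main effect is estimated from all eight runs rather than from a pair of one-factor-at-a-time trials.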

Applications of DOE

Situations where experimental design can be used effectively include:

  • Choosing between alternatives
  • Selecting the key factors affecting a response
  • Response surface modeling to:
    • Hit a target
    • Reduce variability
    • Maximize or minimize a response
    • Make a process robust (despite uncontrollable “noise” factors)
    • Seek multiple goals

Advantages of DOE:

  • Many factors can be evaluated simultaneously, making the DOE process economical and less interruptive to normal operations.
  • Sometimes factors having an important influence on the output cannot be controlled (noise factors), but other input factors can be controlled to make the output insensitive to noise factors.
  • In-depth statistical knowledge is not always necessary to get a big benefit from standard planned experimentation.
  • One can look at a process with relatively few experiments. The important factors can be distinguished from the less important ones. Concentrated effort can then be directed at the important ones.
  • Since the designs are balanced, there is confidence in the conclusions drawn. The factors can usually be set at the optimum levels for verification.
  • If important factors are overlooked in an experiment, the results will indicate that they were overlooked.
  • Precise statistical analysis can be run using standard computer programs.
  • Frequently, results can be improved without additional costs (other than the costs associated with the trials). In many cases, tremendous cost savings can be achieved.

DOE Terms

Understanding DOEs requires an explanation of certain concepts and terms.

  1. Alias: An alias occurs when the effects of two factors or interactions are confused or confounded with each other. An alias occurs when the analysis of a factor or interaction cannot be unambiguously determined because its settings are identical to those of another factor or interaction, or are a linear combination of other factors or interactions. As a result, one might not know which factor or interaction is responsible for the change in the output value. Note that aliasing/confounding can be additive, where two or more insignificant effects add and give a false impression of statistical significance. Aliasing can also offset two important effects and essentially cancel them out.
  2. Balanced: A design in which an equal number of trials is conducted for each factor at every level state. A balanced design will have an equal number of runs at each combination of the high and low settings for each factor.
  3. Block: A subdivision of the experiment into relatively homogeneous experimental units. The term is from agriculture, where a single field would be divided into blocks for different treatments.
  4. Blocking: When structuring fractional factorial experimental test trials, blocking is used to account for variables that the experimenter wishes to avoid. A block may be a dummy factor that doesn’t interact with the real factors. Blocking allows the team to study the effects of noise factors and remove any potential effects resulting from a known noise factor. For example, an experimental design may require a set of eight runs to be complete, but there is only enough raw material in a lot to perform four runs. There is a concern that different results may be obtained with the different lots of material. To prevent these differences, should they exist, from influencing the results of the experiment, the runs are divided into two halves with each being balanced and orthogonal. Thus, the DOE is done in two halves or “blocks” with “a material lot” as the blocking factor. (Because there is not enough material to run all eight experiments with one lot, some runs will have to be done with each material anyway.) The analysis will determine if there is a statistically significant difference between these two blocks. If there is no difference, the blocks can be removed from the model and the data treated as a whole. Blocking is a way of determining which trials to run with each lot so that any effect from the different material will not influence the decisions made about the effects of the factors being explored. If the blocks are significant, then the experimenter was correct in the choice of blocking factor and the noise due to the blocking factor was minimized. This may also lead to more experimentation on the blocking factor.
  5. Box-Behnken: When full, second-order polynomial models are to be used in response surface studies of three or more factors, Box-Behnken designs are often very efficient. They are highly fractional, three-level factorial designs.
  6. Collinear: A collinear condition occurs when two variables are totally correlated. One variable must be eliminated from the analysis for valid results.
  7. Confounded: When the effects of two factors are not separable. For example, with input factors A, B, and C, the columns AB, AC, and BC represent interactions (products of two factor columns); in a confounded design, the setting of one column cannot be separated from the setting of another.
  8. Continuous and discrete factors: A DOE may use continuous and/or discrete factors. A continuous factor is one (such as feed rate) whose levels can vary continuously, while a discrete factor will have a predetermined finite number of levels (such as supplier A or supplier B). Continuous factors are needed when true curvature/center point analysis is desired.
  9. Correlation coefficient (r): A number between -1 and 1 that indicates the degree of linear relationship between two sets of numbers. Zero (0) indicates no linear relationship.
  10. Covariates: Things that change during an experiment that were not planned to change, such as temperature or humidity. Randomize the test order to alleviate this problem, and record the value of the covariate for possible use in regression analysis.
  11. Curvature: Refers to non-straight-line behavior between one or more factors and the response. Curvature is usually expressed in mathematical terms involving the square or cube of the factor, such as the B11(X1 * X1) term in a second-order model.
  12. Degrees of freedom: Abbreviated DOF, DF, df, or ν (nu). The number of measurements that are independently available for estimating a population parameter.
  13. Design of experiments (DOE): The arrangement in which an experimental program is to be conducted, and the selection of the levels of one or more factors or factor combinations to be included in the experiment. Factor levels are assessed in a balanced full or fractional factorial design. The term SDE (statistical design of experiments) is also widely used.
  14. Design projection: The principle of design projection states that if the outcome of a fractional factorial design has insignificant terms, those terms can be removed from the model, thereby reducing the design. For example, determining the effect of four factors with a full factorial design would normally require sixteen runs (a 2⁴ design). Because of resource limitations, only a half fraction (a 2⁴⁻¹ design) consisting of eight trials can be run. If the analysis showed that one of the main effects (and its associated interactions) was insignificant, then that factor could be removed from the model and the design analyzed as a full factorial. A half fraction has therefore become a full factorial.
  15. Efficiency: One estimator is considered more efficient than another if it has a smaller variance. The relative efficiency of two estimators is the ratio of their variances, often expressed as a percentage.
  16. EVOP: Stands for evolutionary operation, a term that describes the way sequential experimental designs can be made to adapt to system behavior by learning from present results and predicting future treatments for better response. Often, small response improvements may be made via large sample sizes. The experimental risk, however, is quite low because the trials are conducted in the near vicinity of an already satisfactory process.
  17. Experiment: A test undertaken to make an improvement in a process or to learn previously unknown information.
  18. Experimental error: Variation in response or outcome of virtually identical test conditions. This is also called residual error.
  19. First-order: Refers to the power to which a factor appears in a model. If X1 represents a factor and B1 is its factor effect, then the model Y = B0 + B1X1 + B2X2 + ε is first-order in both X1 and X2. First-order models cannot account for curvature or interaction.
  20. Fractional: An adjective that means fewer experiments than the full design calls for. For example, a two-level, three-factor half-fractional design runs only four of the eight possible treatment combinations.
  21. Full factorial: Describes experimental designs that contain all combinations of all levels of all factors. No possible treatment combinations are omitted. A two-level, three-factor full factorial design, for example, consists of all eight combinations of the high and low settings.
  22. Inference space: The inference space is the operating range of the factors. It is where the factor’s range is used to infer an output at a setting not used in the design. Normally, it is assumed that settings within the minimum and maximum experimental settings are acceptable levels to use in a prediction equation. For example, if factor A has low and high settings of five and ten units, it is reasonable to make predictions when the factor is at a setting of six. However, predictions at a value of thirteen cannot and should not be attempted, because this setting is outside the region that was explored. (For a 2ᵏ design, a check for curvature should be done prior to assuming linearity between the high and low outputs.)
  23. Narrow inference: A narrow inference utilizes a small number of test factors and/or factor levels or levels that are close together to minimize the noise in a DOE. One example of a narrow inference is having five machines, but doing a DOE on just one machine to minimize the noise variables of machines and operators.
  24. Broad inference: A broad inference utilizes a large number of the test factors and/or factor levels or levels that are far apart, recognizing that noise will be present. An example of a broad inference is performing a DOE on all five machines. There will be more noise, but the results more fully address the entire process.
  25. Input factor: An independent variable that may affect a (dependent) response variable and is included at different levels in the experiment.
  26. Inner array: In Taguchi-style fractional factorial experiments, these are the factors that can be controlled in a process.
  27. Interaction: An interaction occurs when the effect of one input factor on the output depends upon the level of another input factor. Interactions can be readily examined with full factorial experiments. Often, interactions are lost with fractional factorial experiments.
  28. Level: A specific setting of an input factor. For example, four levels of heat treatment may be 100°F, 120°F, 140°F, and 160°F.
  29. Main effect: An estimate of the effect of a factor independent of any other factors.
  30. Mixture experiments: Experiments in which the variables are expressed as proportions of the whole and sum to 1.0.
  31. Multicollinearity:  This occurs when two or more input factors are expected to independently affect the value of an output factor but are found to be highly correlated. For example, an experiment is being conducted to determine the market value of a house. The input factors are square feet of living space and the number of bedrooms. In this case, the two input factors are highly correlated. Larger residences have more bedrooms.
  32. Nested: An experimental design in which all trials are not fully randomized. There is generally a logical reason for taking this action. For example, technicians might be nested within labs. As long as each technician stays with the same lab, the techs are nested; techs rarely travel to different labs just to make the design balanced.
  33. Optimization: This involves finding the treatment combinations that give the most desired response. Optimization can be “maximization” (as, for example, in the case of product yield) or “minimization” (in the case of impurities).
  34. Orthogonal: A design is orthogonal if the main and interaction effects in a given design can be estimated without confounding the other main effects or interactions. Two columns in a design matrix are orthogonal if the sum of the products of their elements within each row is equal to zero. A full factorial is said to be balanced, or orthogonal because there is an equal number of data points under each level of each factor. When a factorial experiment is balanced, the design is said to be completely orthogonal. The Pearson correlation coefficient of all of the factor and interaction columns will be zero.
  35. Outer array: In a Taguchi-style fractional factorial experiment, these are the factors that cannot be controlled in a process.
  36. Paired comparison: The basis of a technique for treating data so as to ignore sample-to-sample variability and focus more clearly on variability caused by a specific factor effect. Only the differences in response for each sample are tested because sample-to-sample differences are irrelevant.
  37. Parallel experiments: These experiments are done at the same time, not one after another, e.g., agricultural experiments in a big cornfield. Parallel experimentation is the opposite of sequential experimentation.
  38. Precision: The closeness of agreement between test results.
  39. Qualitative: This refers to descriptors of category and/or order, but not of interval or origin. Different machines, operators, materials, etc. represent qualitative levels or treatments.
  40. Quantitative: This refers to descriptors of order and interval (interval scale) and possibly also of origin (ratio scale). As a quantitative factor, “temperature” might describe the interval value 27.32°C. As a quantitative response, “yield” might describe the ratio value 87.42%.
  41. Random factor: A random factor is any factor whose settings (such as any speed within an operating range) could be randomly selected, as opposed to a fixed factor whose settings (such as the current and proposed levels) are those of specific interest to the experimenter. Fixed factors are used when an organization wishes to investigate the effects of particular settings or, at most, the inference space enclosed by them. Random factors are used when the organization wishes to draw conclusions about the entire population of levels.
  42. Randomization: Randomization is a technique to distribute the effect of unknown noise variables over all the factors. Because some noise factors may change over time, any factors whose settings are not randomized could be confounded with these time-dependent elements. Examples of factors that change over time are tool wear, operator fatigue, process bath concentrations, and changing temperatures throughout the day.
  43. Randomized trials: Running trials in random order frees an experiment from the environment and eliminates biases. This technique avoids the undue influence of systematic changes that are known or unknown.
  44. Repeated trials: Trials that are conducted to estimate the pure trial-to-trial experimental error so that lack of fit may be judged. Also called replications.
  45. Residual error: The difference between the observed value and the predicted value (ε or E) for a result, based on an empirically determined model. It is the variation in outcomes of virtually identical test conditions.
  46. Residual: The difference between experimental responses and predicted model values. A residual is a measure of the error in a model. A prediction equation estimates the output of a process at various levels within the inference space. These predicted values are called fits. The residual is the difference between a fit and an actual experimentally observed data point.
  47. Residual Analysis:  Residual analysis is the graphical analysis of residuals to determine if a pattern can be detected. If the prediction equation is a good model, the residuals will be independently and normally distributed with a mean of zero and a constant variance. Nonrandom patterns indicate that the underlying assumptions for the use of ANOVA have not been met. It is important to look for nonrandom and/or non-normal patterns in the residuals. These types of patterns can often point to potential solutions. For example, if the residuals have more than one mode, there is most likely a missing factor. If the residuals show trends or patterns vs. the run order, there is a time-linked factor.
  48. Resolution: Resolution describes the amount and structure of aliasing of factors and interactions in an experimental design. Roman numerals are used to indicate the degree of aliasing, with Resolution III being the most confounded. A full factorial design has no aliased terms. The numeral indicates the aliasing pattern: in a Resolution III design, main effects are confounded with two-way interactions (1 + 2 = III); in a Resolution V design, main effects are aliased with four-way interactions, and two-way interactions with three-way interactions (1 + 4 = V = 2 + 3).
  49. Resolution I: An experiment in which tests are conducted by adjusting one factor at a time, hoping for the best. This approach is not statistically sound.
  50. Resolution II: An experiment in which some of the main effects are confounded with each other. This is very undesirable.
  51. Resolution III: A fractional factorial design in which no main effects are confounded with each other, but main effects and two-factor interaction effects are confounded.
  52. Resolution IV: A fractional factorial design in which the main effects and two-factor interaction effects are not confounded, but the two-factor effects may be confounded with each other.
  53. Resolution V: A fractional factorial design in which no confounding of main effects and two-factor interactions occur. However, two-factor interactions may be confounded with three-factor and higher interactions.
  54. Resolution VI: Also called Resolution V+. This is at least a full factorial experiment with no confounding. It can also mean two blocks of 16 runs.
  55. Resolution VII: Can refer to eight blocks of 8 runs.
  56. Response surface methodology (RSM): The graph of a system response plotted against one or more system factors. Response surface methodology employs experimental design to discover the “shape” of the response surface and then uses geometric concepts to take advantage of the relationships discovered.
  57. Response variable: The variable that shows the observed results of an experimental treatment. Also known as the output or dependent variable.
  58. Robust design: A term associated with the application of Taguchi experimentation, in which a response variable is considered robust or immune to input variables that may be difficult or impossible to control.
  59. Screening experiment: A technique to discover the most probable important factors in an experimental system. Most screening experiments employ two-level designs. A word of caution about the results of screening experiments: if a factor is not highly significant, it does not necessarily mean that it is insignificant.
  60. Second-order: Refers to the power to which one or more factors appear in a model. If X1 represents a factor and B1 is its factor effect, then the model Y = B0 + B1X1 + B11(X1 * X1) + B2X2 + ε is second-order in X1 but not in X2. Second-order models can account for curvature and interaction. B12(X1 * X2) is another second-order term, representing an interaction between X1 and X2.
  61. Sequential experiments: Experiments are done one after another, not at the same time.  This is often required by the type of experimental design being used. Sequential experimentation is the opposite of parallel experimentation.
  62. Simplex: A geometric figure that has a number of vertexes (corners) equal to one more than the number of dimensions in the factor space.
  63. Simplex design: A spatial design used to determine the most desirable variable combination (proportions) in a mixture.
  64. Sparsity of effects: The sparsity of effects principle states that processes are usually driven by main effects and low-order interactions.
  65. Test coverage: The percentage of all possible combinations of input factor levels that are actually run in an experimental test.
  66. Treatments: The various factor levels that describe how an experiment is to be carried out. For example, a pH level of 3 and a temperature of 37°C describe an experimental treatment.
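Several of the terms above (alias, confounded, resolution) can be illustrated by computing an alias structure. The sketch below assumes a 2⁴⁻¹ half fraction with the defining relation I = ABCD; multiplying an effect by the generator, with letters cancelling in pairs, yields the effect it is aliased with.

```python
# Aliases in a fractional factorial follow from the defining relation:
# multiplying an effect by the design generator gives its alias.
def multiply(effect, word):
    # Symmetric difference of the letter sets, so letters cancel in
    # pairs: A * ABCD = BCD, AB * ABCD = CD, ABCD * ABCD = I.
    return "".join(sorted(set(effect) ^ set(word))) or "I"

generator = "ABCD"  # defining relation I = ABCD for a 2^(4-1) design
effects = ["A", "B", "C", "D", "AB", "AC", "AD"]
aliases = {e: multiply(e, generator) for e in effects}
# Main effects alias three-way interactions (1 + 3 = IV) and two-way
# interactions alias each other (2 + 2 = IV): a Resolution IV design.
```

This is why a 2⁴⁻¹ design with I = ABCD is Resolution IV: no main effect is confounded with another main effect or a two-way interaction, but two-way interactions are confounded with each other.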

Experimental Objectives

Choosing an experimental design depends on the objectives of the experiment and the number of factors to be investigated. Some experimental design objectives are:

  1. Comparative objective: If several factors are under investigation, but the primary goal of the experiment is to make a conclusion about whether a factor, in spite of the existence of the other factors, is “significant,” then the experimenter has a comparative problem and needs a comparative design solution.
  2. Screening objective: The primary purpose of this experiment is to select or screen out the few important main effects from the many lesser important ones. These screening designs are also termed main effects or fractional factorial designs.
  3. Response surface (method) objective: This experiment is designed to let an experimenter estimate interaction (and quadratic) effects and, therefore, give an idea of the (local) shape of the response surface under investigation. For this reason, these are termed response surface method (RSM) designs. RSM designs are used to:
    • Find improved or optimal process settings
    • Troubleshoot process problems and weak points
    • Make a product or process more robust against external influences
  4. Optimizing responses when factors are proportions of a mixture objective: If an experimenter has factors that are proportions of a mixture and wants to know the “best” proportions of the factors to maximize (or minimize) a response, then a mixture design is required.
  5. Optimal fitting of a regression model objective: If an experimenter wants to model response as a mathematical function (either known or empirical) of a few continuous factors, to obtain “good” model parameter estimates, then a regression design is necessary.
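As a sketch of the response surface objective, the code below evaluates an already-fitted second-order model over a grid of coded settings to locate improved settings. The coefficients are hypothetical (not from any real fit), and the search deliberately stays inside the coded -1 to +1 inference space rather than extrapolating.

```python
# Hypothetical fitted second-order model in two coded factors:
# includes linear, quadratic (curvature), and interaction terms.
def predicted_yield(x1, x2):
    return 60 + 4 * x1 + 2 * x2 - 3 * x1 ** 2 - 2 * x2 ** 2 + 1.5 * x1 * x2

# Search a coarse grid inside the explored region (coded -1..+1);
# predictions outside the inference space should not be attempted.
grid = [i / 10 for i in range(-10, 11)]
best = max((predicted_yield(a, b), a, b) for a in grid for b in grid)
# best holds (predicted response, x1 setting, x2 setting)
```

In practice, a fitted RSM model would be explored with contour plots or a steepest-ascent calculation; the grid search above just makes the "find improved settings" idea concrete.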

Important practical considerations in planning and running experiments are:

  • Check the performance of gauges/measurement devices first
  • Keep the experiment as simple as possible
  • Check that all planned runs are feasible
  • Watch out for process drifts and shifts during the run
  • Avoid unplanned changes (e.g., switching operators at half time)
  • Allow some time (and back-up material) for unexpected events
  • Obtain buy-in from all parties involved
  • Maintain effective ownership of each step in the experimental plan
  • Preserve all the raw data – do not keep only summary averages!
  • Record everything that happens
  • Reset equipment to its original state after the experiment

Select and Scale the Process Variables

Process variables include both inputs and outputs, i.e., factors and responses. The selection of these variables is best done as a team effort. The team should:

  • Include all important factors (based on engineering and operator judgments)
  • Be bold, but not foolish, in choosing the low and high factor levels
  • Avoid factor settings for impractical or impossible combinations
  • Include all relevant responses
  • Avoid using responses that combine two or more process measurements

When choosing the range of settings for input factors, it is wise to avoid extreme values. In some cases, extreme values will give runs that are not feasible; in other cases, extreme ranges might move the response surface into some erratic region. The most popular experimental designs are two-level designs. Two-level designs are simple and economical, and give most of the information required to decide whether to move on to a multi-level response surface experiment. The name is something of a misnomer, however, because it is often desirable to include some center points for quantitative factors (center points are located in the middle of the design “box”). The choice of a design depends on the resources available and the degree of control over making wrong decisions (Type I and Type II hypothesis errors). It is a good idea to choose a design that requires somewhat fewer runs than the budget permits, so that additional runs can be added to check for curvature and to correct any experimental mishaps.
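A minimal sketch of a two-level design augmented with center points, as suggested above. The 2² design, the choice of three center replicates, and the response values are all illustrative assumptions.

```python
import itertools

# Augment a two-level 2^2 design with center points so curvature can
# be checked before assuming linearity. Coded units: -1 = low,
# +1 = high, 0 = the middle of the design "box".
corner_runs = list(itertools.product([-1, +1], repeat=2))
center_runs = [(0, 0)] * 3          # a few replicated center points
design = corner_runs + center_runs

# Hypothetical responses for the corner and center runs.
corner_responses = [18, 24, 22, 30]
center_responses = [26, 25, 27]

# A simple curvature check: compare the mean of the corner responses
# with the mean of the center responses. A large gap suggests a
# quadratic term is needed (a first-order model would be inadequate).
curvature_gap = (sum(corner_responses) / len(corner_responses)
                 - sum(center_responses) / len(center_responses))
```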

DOE Checklist

Every experimental investigation will differ in detail, but the following checklist will be helpful for many investigations.

  • Define the objective of the experiment.
  • The principal experimenter should learn as many facts about the process as possible prior to brainstorming.
  • Brainstorm a list of the key independent and dependent variables with people knowledgeable of the process and determine if these factors can be controlled or measured.
  • Run “dabbling experiments” where necessary to debug equipment or determine measurement capability. Develop experimental skills and get some preliminary results.
  • Assign levels to each independent variable in the light of all available knowledge.
  • Select a standard DOE plan or develop one by consultation. It pays to have one person outline the DOE and another review it critically.
  • Run the experiments in random order and analyze results periodically.
  • Draw conclusions. Verify by replicating experiments, if necessary, and proceed to follow-up with further experimentation if an improvement trend is indicated in one or more of the factors.

It is often a mistake to believe that “one big experiment will give the answer.” A more useful approach is to recognize that while one experiment might give a useful result, it is more common to perform two, three, or more experiments before a complete answer is attained. An iterative approach is usually the most economical. Putting all one’s eggs in one basket is not advisable. It is logical to move through stages of experimentation, each stage supplying a different kind of answer.

Experimental Assumptions

In all experimentation, one makes assumptions. Some of the engineering and mathematical assumptions an experimenter can make include:

  • Are the measurement systems capable of measuring all responses?
    It is not a good idea to find, after finishing an experiment, that the measurement devices are incapable. This should be confirmed before embarking on the experiment itself. In addition, it is advisable, especially if the experiment lasts over a protracted period, that a check be made on all measurement devices from the start to the conclusion of the experiment. Strange experimental outcomes can often be traced to ‘hiccups’ in the metrology system.
  • Is the process stable?

    Experimental runs should have control runs that are done at the “standard” process setpoints, or at least at some identifiable operating conditions. The experiment should start and end with such runs. A plot of the outcomes of these control runs will indicate if the underlying process itself drifted or shifted during the experiment. It is desirable to experiment on a stable process. However, if this cannot be achieved, then the process instability must be accounted for in the analysis of the experiment.

  • Are the residuals (the difference between the model predictions and the actual observations) well behaved?
    Residuals are estimates of experimental error obtained by subtracting the predicted response from the observed response. The predicted response is calculated from the chosen model after all the unknown model parameters have been estimated from the experimental data. Residuals can be thought of as elements of variation unexplained by the fitted model. Since this is a form of error, the same general assumptions apply to the group of residuals that one typically uses for errors in general: one expects them to be normally and independently distributed with a mean of 0 and some constant variance. These are the assumptions behind ANOVA and classical regression analysis. This means that an analyst should expect a regression model to err in predicting the response in a random fashion; the model should predict values higher and lower than actual with equal probability. In addition, the level of the error should be independent of when the observation occurred in the study, the size of the observation being predicted, or even the factor settings involved in making the prediction. The overall pattern of the residuals should be similar to the bell-shaped pattern observed when plotting a histogram of normally distributed data. Graphical methods are used to examine residuals. Departures from assumptions usually mean that the residuals contain a structure that is not accounted for in the model. Identifying that structure, and adding a term representing it to the original model, leads to a better model. Any graph suitable for displaying the distribution of a set of data is suitable for judging the normality of the distribution of a group of residuals. The three most common types are histograms, normal probability plots, and dot plots.
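The residual calculation itself can be checked numerically as well as graphically. A minimal sketch with hypothetical data: fit a first-order model by least squares, form the residuals as observed minus predicted, and confirm that they average essentially zero (a property guaranteed by a least-squares fit with an intercept).

```python
# Hypothetical run data: xs are factor settings, ys are responses.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

# Ordinary least-squares fit of a first-order model y = b0 + b1*x.
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

# Fits are the model's predicted values; residuals are observed
# minus predicted. Plot residuals vs. run order and vs. fits to
# look for trends or non-constant variance.
fits = [intercept + slope * x for x in xs]
residuals = [y - f for y, f in zip(ys, fits)]
```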

Steps to perform a DOE:

In General

  1. Document the initial information.
  2. Verify the measurement systems.
  3. Determine if baseline conditions are to be included in the experiment. (This is usually desirable.)
  4. Make sure clear responsibilities are assigned for proper data collection.
  5. Always perform a pilot run to verify and improve data collection procedures.
  6. Watch for and record any extraneous sources of variation.
  7. Analyze data promptly and thoroughly.
  8. Always run one or more verification runs to confirm results (i.e., go from a narrow to a broad inference).

Setting up a DOE

  1. State the practical problem.
    For example, a practical problem may be “Improve yield by investigating factor A and factor B. Use an α of 0.05.”
  2. State the factors and levels of interest.
    For example, factors and levels of interest could be defined as, “Set coded values for factors A and B at -1 and +1.”
  3. Select the appropriate design and sample size based on the effect to be detected.
  4. Create an experimental data sheet with the factors in their respective columns. Randomize the experimental runs in the datasheet. Conduct the experiment and record the results.
  5. Construct an Analysis of Variance (ANOVA) table for the full model.
  6. Review the ANOVA table and eliminate effects with p-values above α. Remove these one at a time, starting with the highest order interactions.
  7.  Analyze the residual plots to ensure that the model fits.
  8.  Investigate the significant interactions (p-value < α). Assess the significance of the highest order interactions first. (For two-way interactions, an interactions plot may be used to efficiently determine optimum settings. For graphical analysis to determine settings for three-way interactions, it is necessary to evaluate two or more interaction plots simultaneously.). Once the highest order interactions are interpreted, analyze the next set of lower-order interactions.
  9. Investigate the significant main effects (p-value < α).
    (Note: If the level of the main effect has already been set as a result of a significant interaction, this step is not needed.) The use of main effects plots is an efficient way to identify these values. Main effects that are part of statistically significant interactions must be kept in the model, regardless of whether or not they are significant themselves. Care must be taken because, due to interactions, the settings chosen from the main effects plot may sometimes lead to a sub-optimized solution. If there is a significant interaction, use an interaction plot instead.
  10. State the mathematical model obtained.
    For a 2ᵏ design, the coefficients for each factor and interaction are one-half of their respective effects. Therefore, the difference in the mean of the response from the low setting to the high setting is twice the size of the coefficient.
    Commonly available software programs will provide these coefficients as well as the grand mean. For two factors, the prediction equation is stated as:
    y = grand mean + β1X1 + β2X2 + β3(X1 x X2)
  11. Calculate the percent contribution of each factor and each interaction relative to the total sum of squares. This is also called epsilon squared. It is calculated by dividing the sum of squares for each factor by the total sum of squares, and is a rough evaluation of “practical” significance.
  12. Translate the mathematical model into process terms and formulate conclusions and recommendations.
  13. Replicate optimum conditions and verify that results are in the predicted range. Plan the next experiment or institutionalize the change.
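The coefficients-are-half-the-effects rule in step 10 can be sketched for a replicated 2² design. The data below are hypothetical, not the yield data from the example that follows.

```python
# Hypothetical replicated 2^2 design: (A, B, [replicate responses]).
runs = [
    (-1, -1, [20, 22]),
    (+1, -1, [30, 28]),
    (-1, +1, [24, 26]),
    (+1, +1, [40, 42]),
]

# Cell means and grand mean.
means = {(a, b): sum(ys) / len(ys) for a, b, ys in runs}
grand_mean = sum(means.values()) / 4

# Effects: mean at the high level minus mean at the low level.
effect_A = (means[(+1, -1)] + means[(+1, +1)]) / 2 - (means[(-1, -1)] + means[(-1, +1)]) / 2
effect_B = (means[(-1, +1)] + means[(+1, +1)]) / 2 - (means[(-1, -1)] + means[(+1, -1)]) / 2
effect_AB = (means[(-1, -1)] + means[(+1, +1)]) / 2 - (means[(+1, -1)] + means[(-1, +1)]) / 2

# Coded coefficients are one-half of the effects.
b1, b2, b12 = effect_A / 2, effect_B / 2, effect_AB / 2

def predict(a, b):
    return grand_mean + b1 * a + b2 * b + b12 * (a * b)
```

With the full model (both main effects plus the interaction), the prediction at each corner reproduces that corner's cell mean, which is a useful sanity check on the arithmetic.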



An organization decided that it wanted to improve yield by investigating the pressure and temperature in one of its processes. Coded values for pressure and temperature were set at -1 and +1. The design and sample size chosen involved two replications of a 2² design, for a total of eight runs. The experiment was conducted and the results were recorded.


The Analysis of Variance (ANOVA) table for the full model was then constructed.

The ANOVA table was reviewed to eliminate the effects with a p-value above α. Because both main effects and the interaction were below the chosen α of 0.05, all three were included in the final model. The residual plots were analyzed in three ways, to ensure that the model fit:

  1. The residuals were plotted against the order of the data using an Individuals Chart and Run Chart to check that they were randomly distributed about zero.
  2. A normal probability plot was run on the residuals.
  3. A plot of the residuals vs. the fitted or predicted values was run to check that the variances were equal (i.e., the residuals were independent of the fitted values).

Creating an interactions plot for pressure and temperature showed that the optimum setting to maximize yield was to set both temperature and pressure at -1.

The chosen mathematical model involved the prediction equation:

 y = grand mean + β1X1 + β2X2 + β3(X1 x X2).

Substituting a grand mean of 14.00 and coefficients of -2.75 for pressure, -5.75 for temperature, and 1.50 for (P x T) into the equation, we get:
y = 14.00 – 2.75(Pressure) – 5.75(Temperature) + 1.5(P x T)
Using the optimum settings of pressure = -1  and
temperature = -1 that were identified earlier forces the setting for the interaction (P x T) to be (-1) x (-1) = +1.

Substituting these values into the prediction equation, we get:
y = 14.00 – 2.75(-1) – 5.75(-1) + 1.5(+1) = 24.00
This equation tells us that, to increase yield, the pressure and temperature must be lowered. The results should be verified via confirmation runs and experiments at even lower settings of temperature and pressure should also be considered.
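The prediction equation above can be evaluated directly in code. The coefficients come straight from the example: grand mean 14.00, pressure -2.75, temperature -5.75, interaction +1.50, all in coded units:

```python
# Evaluate the fitted model y = 14.00 - 2.75P - 5.75T + 1.5(P x T)
# at coded settings of pressure (P) and temperature (T).

def predicted_yield(pressure, temperature):
    return 14.00 - 2.75 * pressure - 5.75 * temperature + 1.50 * (pressure * temperature)

# Optimum settings identified from the interaction plot:
print(predicted_yield(-1, -1))   # 24.0, matching the hand calculation

# For comparison, the opposite corner (both factors high):
print(predicted_yield(+1, +1))   # 7.0
```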

Selecting Factor Settings:

  • Process knowledge: Standard operating conditions may limit the range of the factors of interest, and the optimum settings may lie outside that range. For this reason, choose bold settings, while never forgetting safety.
  • Risk: Bold settings that could endanger equipment or individuals must be evaluated for that risk. Avoid settings that have the potential for harm.
  • Cost: Cost is always a consideration. Time, materials, and/or resource constraints may also impact the design.
  • Linearity: If there is a suspected nonlinear effect, budget for runs to explore for curvature and also make sure the inference space is large enough to detect the nonlinear effect.



The general notation to designate a fractional factorial design is:

• k is the number of factors to be investigated.
• p designates the fraction of the design.
• 2ᵏ⁻ᵖ is the number of runs. For example, a 2⁵ design requires thirty-two runs, a 2⁵⁻¹ (or 2⁴) design requires sixteen runs (a half-fractional design), and a 2⁵⁻² (or 2³) design requires eight runs (a quarter-fractional design).
• R is the resolution.



Coding is the representation of the settings picked in a standardized format. Coding allows for a clear comparison of the effects of the chosen factors. The design matrix for 2ᵏ factorials is usually shown in standard order. In the Yates standard order, the first factor alternates between its low and high settings on successive runs. The second factor alternates in pairs: two runs at the low setting, followed by two runs at the high setting.
The low level of a factor is designated with a “-” or -1  and the high level is designated with a “+” or 1.

Coded values can be analyzed using the ANOVA method and yield a y = f (x) prediction equation. The prediction equation will be different for coded vs. uncoded units. However, the output range will be the same. Even though the actual factor settings in an example might be temperature 160° and 180° C, 20% and 40% concentration, and catalysts A and B, all the settings could be analyzed using -1 and +1 settings without losing any validity.

Fractional vs. Full DOEs

There are advantages and disadvantages for all DOEs. The DOE chosen for a particular situation will depend on the conditions involved.
Advantages of full factorial DOEs:

  • All possible combinations can be covered.
  • Analysis is straightforward, as there is no aliasing.

Disadvantages of full factorial DOEs:
The cost of the experiment increases as the number of factors increases. For instance, in a two-factor two-level experiment (2²), four runs are needed to cover the effects of A, B, AB, and the grand mean. In a five-factor two-level experiment (2⁵), thirty-two runs are required for a full factorial. Many of these runs are used to evaluate higher-order interactions that the experimenter may not be interested in. In a 2⁵ experiment, there are five one-way effects (A, B, C, D, E), ten two-ways, ten three-ways, five four-ways, and one five-way effect. The 2² experiment spends 75% of its runs learning about the likely one-way and two-way effects, while the 2⁵ design spends less than 50% of its runs examining these one-way and two-way effects.
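The effect counts in the preceding paragraph follow from binomial coefficients, and can be verified with a few lines of code:

```python
# Counting effects in a 2^k full factorial: the number of j-way effects
# is C(k, j). This reproduces the 2^5 counts quoted above and the share
# of runs spent on one-way and two-way effects.
from math import comb

k = 5
counts = [comb(k, j) for j in range(1, k + 1)]
print(counts)                    # [5, 10, 10, 5, 1]

runs = 2 ** k                    # 32 runs in a 2^5 full factorial
share = (counts[0] + counts[1]) / runs
print(share)                     # 0.46875 -> less than 50% on 1- and 2-way effects
```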
Advantages of fractional factorial DOEs:

  • Less money and effort is spent for the same amount of data.
  • It takes less time to do fewer experiments.
  • If data analysis indicates, runs can be added to eliminate confounding.

Disadvantages of fractional factorial DOEs:

  • Analysis of higher order interactions could be complex.
  • Confounding could mask factor and interaction effects.

Setting up a fractional factorial DOE


The effect of confounding should be minimized when setting up a fractional factorial. The Yates standard order will show the level settings of each factor and a coded value for all the interactions. For example, when A is high (+1) and B is low (-1), the interaction factor AB is (+1 x -1 = -1). A column for each interaction can thus be constructed as shown here:

Running a full factorial experiment with one more factor (D) would require doubling the number of runs. If the factor D settings are substituted for a likely insignificant effect, that expense can be saved. The highest-order interaction is the least likely candidate to have a significant effect. In this case, replacing the A x B x C interaction with factor D allows the experimenter to say ABC is aliased, or confounded, with D. The three-way interaction still exists but will be confounded with factor D; all credit for any output change will be attributed to factor D. This is a direct application of the sparsity of effects principle. In fact, there is more aliasing than just D and ABC. The aliasing of two-way and three-way effects can be computed in two ways:

  1.  By multiplying any two columns together (such as column A and column D), each of the values in the new column (AD) will be either -1 or +1. If the resulting column matches any other (in this case, it will match column BC), those two effects can be said to be confounded.
  2.  The Identity value (I) can be found and multiplied through to get the aliased values. For example, in this case, because D=ABC (also called the design generator), the Identity value is ABCD. Multiplying this Identity value by a factor will calculate its aliases. Multiplying ABCD and D equals ABCDD. Because any column multiplied by itself creates a column of 1's (the multiplication identity), the D² term drops out, leaving ABC and reaffirming that D=ABC.

Adding an additional factor to a full factorial without adding any additional runs will create a half fractional design. (The design has half the runs needed for a full factorial. If a design has one-quarter the runs needed for full factorial analysis, it is a quarter fractional design, etc.) The key to selecting the type of run and number of factors is to understand what the resolution of the design is, for any given number of factors and available runs. The experimenter  must decide how much confounding he or she is willing to accept. A partial list of fractional designs is included below.
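The column-multiplication check described above can be sketched in code: build the 2³ design in Yates standard order, substitute D = ABC, and confirm that the AD column matches the BC column:

```python
# A sketch of the aliasing check: in the half fraction with design
# generator D = ABC, multiplying columns shows AD is confounded with BC.
from itertools import product

# Yates standard order: A alternates every run, B every two runs, C every four.
runs = [(a, b, c) for c, b, a in product((-1, 1), repeat=3)]

def column(fn):
    return [fn(a, b, c) for a, b, c in runs]

A  = column(lambda a, b, c: a)
B  = column(lambda a, b, c: b)
C  = column(lambda a, b, c: c)
D  = column(lambda a, b, c: a * b * c)          # design generator D = ABC
AD = [x * y for x, y in zip(A, D)]
BC = [x * y for x, y in zip(B, C)]

print(AD == BC)    # True: AD is aliased with BC in this half fraction
```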

Interaction Case Study


A simple 2 x 2 factorial experiment (with replication) was conducted in the textile industry. The response variable was ED/MSH (ends down per thousand spindle hours). The independent factors were RH (relative humidity) and ion level (the environmental level of negative ions). Both of these factors were controllable. A low ED/MSH is desirable, since fewer thread breaks mean higher productivity. An ANOVA showed the main effects were not significant, but the interaction effects were highly significant. Consider the data table and plots in the figure below:
The above interaction plot demonstrates that if the goal is to reduce breaks, an economic choice could be made between low ion/low RH and high ion/high RH.

Randomized Block Plans


In comparing a number of factor treatments, it is desirable that all other conditions be kept as nearly constant as possible. The required number of tests may be too large to be carried out under similar conditions. In such cases, one may be able to divide the experiment into blocks, or planned homogeneous groups. When each group in the experiment contains exactly one measurement of every treatment, the experimental plan is called a randomized block plan. A randomized block design for air permeability response is shown below.

An experimental scheme may take several days to complete. If one expects some biasing differences among days, one might plan to measure each item on each day or to conduct one test per day on each item. A day would then represent a block. A randomized incomplete block (tension response) design is shown below. Only treatments A, C, and D are run on the first day; B, C, and D on the second, etc. In the whole experiment, note that each pair of treatments, such as BC, occurs twice together. The order in which the three treatments are run on a given day follows a randomized sequence. Blocking factors are commonly environmental phenomena outside the control of the experimenter.

Latin Square Designs

A Latin square design is called a one-factor design because it attempts to measure the effects of a single key input factor on an output factor. The experiment further attempts to block (or average out) the effects of two or more nuisance factors. Such designs were originally applied in agriculture, where the two sources of non-homogeneity (nuisance factors) were the two directions on the field, and the square was literally a plot of ground. In Latin square designs, a third variable, the experimental treatment, is then applied to the source variables in a balanced fashion. The Latin square plan is restricted by two conditions:

  1.  The number of rows, columns, and treatments must be the same.
  2. There should be no expected interactions between row and column factors, since these cannot be measured. If there are, the sensitivity of the experiment is reduced.

A Latin square design is essentially a fractional factorial experiment which requires less experimentation to determine the main treatment results.
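The balance property that the design relies on can be sketched by building a square from cyclic shifts, so that each treatment appears exactly once in every row and every column:

```python
# A sketch of building an n x n Latin square by cyclic shifts of the
# treatment list, followed by a check of the row/column balance property.

def latin_square(treatments):
    n = len(treatments)
    return [[treatments[(row + col) % n] for col in range(n)] for row in range(n)]

square = latin_square(["A", "B", "C", "D", "E"])
for row in square:
    print(" ".join(row))

# Balance check: every row and every column contains all five treatments.
ok = all(sorted(row) == ["A", "B", "C", "D", "E"] for row in square) and \
     all(sorted(col) == ["A", "B", "C", "D", "E"] for col in zip(*square))
print(ok)   # True
```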


Consider the following 5 x 5 Latin square design. In this design, five drivers and five carburetors were used to evaluate the gas mileage of five cars (A, B, C, D, and E). Note that only twenty-five of the potential 125 combinations are tested; thus, the resultant experiment is a one-fifth fractional factorial. Similar 3 x 3, 4 x 4, and 6 x 6 designs may be utilized. In some situations, what is thought to be a nuisance factor can turn out to be very important.

Graeco-Latin Designs


Graeco-Latin square designs are sometimes useful to eliminate more than two sources of variability in an experiment. A Graeco-Latin design is an extension of the Latin square design in which one extra blocking variable is added, for a total of three blocking variables. Consider the following 4 x 4 Graeco-Latin design. The output (response) variable could be gas mileage for the four cars (A, B, C, D).

Hyper-Graeco-Latin Designs


A hyper-Graeco-Latin square design permits the study of treatments with more than three blocking variables. Consider the following 4 x 4 hyper-Graeco-Latin design:

The output (response) variable could be gas mileage for the 4 cars (A, B, C, D).

Two-Level Fractional Factorial Example

The basic steps for a two-level fractional factorial design will be examined via the following hypothetical example. The following seven-step procedure will be followed:

  1. Select a process
  2. Identify the output factors of concern
  3. Identify the input factors and levels to be investigated
  4. Select a design (from a catalogue, Taguchi, self created, etc.)
  5. Conduct the experiment under the predetermined conditions
  6. Collect the data (relative to the identified outputs)
  7. Analyze the data and draw conclusions

Step 1: Select a process
We want to investigate UPSC (Union Public Service Commission) Prelims exam success using students of comparable educational levels.
Step 2: Identify the output factors
Student performance will be based on two results (output factors):
(1) Did they pass the test?
(2) What grade score did they receive?
Step 3: Establish the input factors and levels to be investigated
We want to study the effect of seven variables, each at two levels, that may affect student performance (7 factors at 2 levels).

Input factor       | Level 1 (-) | Level 2 (+)
UPSC coaching      | No          | Yes
Study time         | Morning     | Afternoon
Problems worked    | 200         | 800
Primary reference  | Book A      | Book B
Method of study    | Sequential  | Random
Work experience    | 0 years     | 4 years +
Duration of study  | 50 hours    | 120 hours

Note: The above inputs are both variable (quantitative) and attribute (qualitative).
Step 4: Select a design
A screening plan is selected from a design catalogue. Only eight (8) tests are needed to evaluate the main effects of all seven factors at two levels. The design is:

Input factors

One test example:

Test #3 means:

  • A (-) = No UPSC coaching
  • B (+) = Study in afternoon
  • C (+) = Work 800 problems
  • D (-) = Use reference book A
  • E (-) = Use sequential study method
  • F (+) = Have 4 years + of work experience
  • G (+) = Study 120 hours for the test

Step 5: Conduct the experiment
Step 6: Collect the data


Step 7: Analyze the data and draw conclusions
The pass/fail pattern of (+)s and (-)s does not track with any single input factor. Visually, there appears to be some correlation with factors C and G.

(+) means level 2 has a positive effect. (-) means level 2 has a negative effect. 0 means level 2 has no effect.

  • Factor A, taking coaching , will improve the exam results by 13 points
  • Factor B, study time of day, has no effect on exam results
  • Factor C, problems worked, will improve the exam results by 20 points
  • Factor D, primary reference, will improve the exam results by 5 points
  • Factor E, method of study, has no effect on exam results
  • Factor F, work experience, has no effect on exam results
  • Factor G, duration of study, will improve the exam results by 23 points

To calculate the optimum student performance:
1. Sum the absolute values of the significant differences (Δ) and divide the total by two. Call this value the improvement. Note that the absolute values are divided by 2 because the experiment is conducted about the midpoint of the high and low levels, so only one-half the difference (Δ) can be achieved.


Improvement = 61 ÷ 2 = 30.5. There were no significant negative effects (-) in this experiment. If there were, their absolute values would have been included (added) in determining the total effect. In this particular DOE format, the sign indicates direction only.
2. Average the test scores obtained in tests 1 through 8.
Average = 61.5
3. Add the improvement to the average to predict the optimum performance.
Optimum = Average + Improvement
= 61.5 + 30.5
= 92
The optimum performance would be obtained by running the following trial. This trial was one of the 120 tests not performed out of the 128 (2⁷) possible choices. Obviously, the predicted student scores can be confirmed by additional experimentation.
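The optimum-performance arithmetic above can be written out in code. The significant deltas (13, 20, 5, and 23 points for factors A, C, D, and G) and the test average of 61.5 are taken from the example:

```python
# Predicting the optimum from a screening DOE: sum the absolute values
# of the significant deltas, halve the total (the experiment is run
# about the midpoint, so only half the delta is achievable), and add the
# result to the average of the eight test scores.

significant_deltas = [13, 20, 5, 23]   # factors A, C, D, G

improvement = sum(abs(d) for d in significant_deltas) / 2
average = 61.5

optimum = average + improvement
print(improvement)   # 30.5
print(optimum)       # 92.0
```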


One can further examine the significance of the design results using the sum of squares and a scree plot. A scree plot is so named because it looks like the rubble or rocky debris lying on a slope or at the base of a cliff. The scree plot indicates that factors D, B, E, and F are noise. The SS (sum of squares) for the error term is 3.1 (3.1 + 0 + 0 + 0).
MSE (mean square error) = 3.1/4 = 0.775
The maximum F ratio for factor G is: 66.1/0.775 = 85.29
The critical maximum F value from the following F Table for k – 1 = 7, p = 4 and α = 0.05 is 73. Thus, factor G is important at the 95% confidence level.
The maximum F table accommodates screening designs for runs of 8, 12, 16, 20, and 24. p is the number of noise factors averaged to derive the MSE, and k is the number of runs.
The maximum F ratio for factor C is: 50.7/0.775 = 65.42
The critical maximum F value for k – 1 = 7, p = 4, and α = 0.10 is 49. Thus, factor C is important at the 90% confidence level.
The maximum F ratio for factor A is: 21.1/0.775 = 27.22
The critical maximum F values for both alpha values are larger than 27.22. Therefore, factor A is not considered important (at these alpha levels).
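The pooled-error and F-ratio arithmetic above can be sketched as follows, using the SS values quoted in the text (factor A's SS of 21.1, and the four noise factors pooled into the error term):

```python
# Screening-design significance check: pool the SS of the factors judged
# to be noise into an MSE, then compute each factor's maximum F ratio as
# its SS divided by that MSE.

noise_ss = [3.1, 0, 0, 0]            # SS for the noise factors D, B, E, F
mse = sum(noise_ss) / len(noise_ss)  # 0.775

ss_factor_a = 21.1
f_ratio_a = ss_factor_a / mse
print(round(mse, 3))        # 0.775
print(round(f_ratio_a, 1))  # 27.2

# Compared against the critical maximum F values for k - 1 = 7 and
# p = 4, factor A falls below both alpha levels, so it is judged
# unimportant at those confidence levels.
```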

A Full Factorial Example


Suppose that pressure, temperature, and concentration are three suspected key variables affecting the yield of a chemical process that is currently running at 64%. An experimenter may wish to fix these variables at two levels (high and low) to see how they influence yield. In order to find out the effect of all three factors and their interactions, a total of 2 x 2 x 2 = 2³ = 8 experiments must be conducted. This is called a full factorial experiment. The low and high levels of the input factors are noted below by (-) and (+).
Temperature:    (-) = 120°C        (+) = 150°C
Pressure:              (-) = 10 psi         (+) = 14 psi
Concentration: (-) = 10N             (+) = 12N
To find the effect of temperature, sum the yield values when the temperature is high and subtract the sum of yields when the temperature is low, dividing the results by four.


When the temperature is set at a high level rather than at a low level, one gains 23.5% yield. All of this yield improvement can be attributable to temperature alone since, during the four high temperature experiments, the other two variables were twice low and twice high.


The effect of changing the pressure from a low level to a high level is a loss of 6% yield. Higher concentration levels result in a relatively minor 2% improvement in yield. The interaction effects between the factors can be checked by using the T, P, and C columns to generate the interaction columns by the multiplication of signs: Note, a formal analysis of the above data (developing a scree plot and MSE term) would indicate that only the temperature effect is significant.


Following the same principles used for the main effects, the T x P interaction measures the change in yield when the pressure and temperature values are both low or both high, as opposed to when one is high and the other is low. The T x P interaction shows a marginal gain in yield when the temperature and pressure are both at the same level.

In this example, the interactions have either zero or minimal negative yield effects.  If the interactions are significant compared to the main effects, they must be considered before choosing the final level combinations. The best combination of factors here is a high temperature, low pressure, and high concentration (even though the true concentration contribution is probably minimal).

Comparison to a Fractional Factorial Design

In some situations, an experimenter can derive the same conclusions by conducting fewer experiments. If the experiments cost Rs 1,00,000 each, one might decide to conduct a one-half fractional factorial experiment.
Assume the following balanced design is chosen. Since a fractional factorial experiment is being conducted, only the main effects of factors can be determined. Please note that experiments 1, 4, 6, and 7 would have been equally valid. The results are not exactly identical to what was obtained by conducting eight experiments previously. But, the same relative conclusions as to the effects of temperature, pressure, and concentration on the final yield can be drawn. The average yield is 63.25%. If the temperature is high, an 11.75% increase is expected, plus 3.25% for low pressure, plus 1.25% for high concentration equals an anticipated maximum yield of 79.5% even though this experiment was not conducted. This yield is in line with the actual results from experiment number 6 from the full factorial.
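The anticipated-maximum-yield arithmetic in the paragraph above can be checked in code; the average yield and the three half-effect gains are taken directly from the example:

```python
# Anticipated maximum yield from the half-fraction: start from the
# average yield and add the gain expected from the favourable setting of
# each factor.

average_yield = 63.25
gains = {
    "high temperature": 11.75,
    "low pressure": 3.25,
    "high concentration": 1.25,
}

anticipated_max = average_yield + sum(gains.values())
print(anticipated_max)   # 79.5
```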



Most people don’t analyze experimental results using manual techniques. The following is a synopsis of the effects of temperature, pressure, and concentration on yield results using MINITAB. This analysis represents the very same data for the previously presented examples.
The F values and corresponding p-values indicate that temperature and pressure are significant to greater than 99% certainty. Concentration might also be important, but more replications would be necessary to see if the 93% certainty could be improved to something greater than 95%.
The regression equation will yield results similar to those for the previous manual calculations. Again, the p-values for temperature and pressure reflect high degrees of certainty.
Using either the manual or MINITAB recaps, would the experimenter stop at this point? Might a follow-up experiment, perhaps at three levels looking at higher temperatures and lower pressures, pay off? After all, the yield has improved by 16% since experimentation started.

DOE Variations

Response Surface Method

The Response Surface Method (RSM) is a technique that enables the experimenter to find the optimum condition for a response (y) given two or more significant factors (x’s). For the case of two factors, the basic strategy is to consider the graphical representation of the yield as a function of the two significant factors. The RSM graphic is similar to the contours of a topographical map. The higher up the “hill,” the better the yield. Data is gathered to enable the contours of the map to be plotted. Once done, the resulting map is used to find the path of steepest ascent to the maximum or steepest descent to the minimum. The ultimate RSM objective is to determine the optimum operating conditions for the system or to determine a region of the factor space in which the operating specifications are satisfied (usually using a second-order model).

RSM terms:

  • Response surface: It is the surface represented by the expected value of an output modeled as a function of significant inputs (variable inputs only):
    Expected (y) = f (x1, x2, x3,…xn)
  • The method of steepest ascent or descent is a procedure for moving sequentially along the direction of the maximum increase (steepest ascent) or maximum decrease (steepest descent) of the response variable using the first-order model:
    y (predicted) = β0 + Σ βi xi
  • The region of curvature is the region where one or more of the significant inputs will no longer conform to the first-order model. Once in this region of operation, most responses can be modeled using the following fitted second-order model:
    y (predicted) = β0 + Σ βi xi + Σ βii xi² + Σ βij xi xj
  • The central composite design is a common DOE matrix used to establish a valid second-order model.
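The method of steepest ascent can be sketched with a first-order model: the direction of maximum increase is along the coefficient vector, so each step moves every factor in proportion to its coefficient. The fitted coefficients here are hypothetical:

```python
# A minimal sketch of steepest ascent for a first-order model
# y = b0 + b1*x1 + b2*x2 (coded units): normalize the coefficient vector
# and step along it repeatedly.

def steepest_ascent_path(start, coefficients, step_size, n_steps):
    """Generate points along the steepest-ascent direction."""
    norm = sum(c * c for c in coefficients) ** 0.5
    direction = [c / norm for c in coefficients]
    path = [start]
    for _ in range(n_steps):
        last = path[-1]
        path.append([x + step_size * d for x, d in zip(last, direction)])
    return path

# Hypothetical fitted model: y = b0 + 3.0*x1 + 4.0*x2
path = steepest_ascent_path(start=[0.0, 0.0], coefficients=[3.0, 4.0],
                            step_size=1.0, n_steps=2)
print(path)   # [[0.0, 0.0], [0.6, 0.8], [1.2, 1.6]]
```

In practice, runs are made at these points until the response stops improving, at which point a second-order model (e.g., from a central composite design) takes over.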

Steps for Response Surface Method


1. Select the y. Select the associated confirmed x’s and boldly select their experimental ranges. These x’s should have been confirmed to have a significant effect on the y through prior experimentation.
2. Add center points to the basic 2ᵏ⁻ᵖ design. A center point is a point halfway between the high and low settings of each factor.
3. Conduct the DOE and plot the resulting data on a response surface.
4. Determine the direction of the steepest ascent to an optimum y.
5. Reset the x values to move the DOE in the direction of the optimum y. In general, the next DOE should have x values that overlap those used in the previous experiment.
6. Continue to conduct DOEs, evaluate the results, and step in the direction of the optimal y until a constraint has been encountered or the data shows that the optimum has been reached.


7. Add additional points to the last design to create a central composite design to allow for a second-order evaluation. This will verify if the analysis is at a maximum or minimum condition. If the condition is at an optimum solution, then the process is ended. If the second-order evaluation shows that the condition is not yet at optimum, it will provide direction for the next sequential experiment.

RSM is intended to be a sequence of experiments with an attempt to “dial in to an optimum setting.” Whenever an apparent optimum is reached, additional points are added to perform a more rigorous second-order evaluation.

Plackett-Burman Designs:   


Plackett-Burman designs are used for screening experiments and are very economical; the run number is a multiple of four rather than a power of two. Plackett-Burman geometric designs are two-level designs with 4, 8, 16, 32, 64, and 128 runs and work best as screening designs. In these geometric designs, each interaction effect is confounded with exactly one main effect. All other two-level Plackett-Burman designs (12, 20, 24, 28, etc.) are non-geometric designs. In these designs, a two-factor interaction will be partially confounded with each of the other main effects in the study. Thus, the non-geometric designs are essentially "main effect designs," used when there is reason to believe that any interactions are of little significance. For example, a Plackett-Burman design in 12 runs may be used to conduct an experiment containing up to 11 factors. With a 20-run design, an experimenter can do a screening experiment for up to 19 factors. As many as 27 factors can be evaluated in a 28-run design.
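The factor-to-run pairings quoted above follow a simple rule, which can be sketched as: the run count is the smallest multiple of four that exceeds the number of factors (one degree of freedom per factor, plus one for the mean):

```python
# Smallest Plackett-Burman run count for a given number of factors:
# round (n_factors + 1) up to the next multiple of four.
import math

def pb_runs(n_factors):
    return 4 * math.ceil((n_factors + 1) / 4)

print(pb_runs(11))   # 12 runs for up to 11 factors
print(pb_runs(19))   # 20 runs for up to 19 factors
print(pb_runs(27))   # 28 runs for up to 27 factors
```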

Plackett-Burman designs are orthogonal designs of Resolution III that are primarily used for screening designs. Each two-way interaction is positively or negatively aliased with the main effect.
Advantages:

  • A limited number of runs are needed to evaluate a large number of factors.
  • Clever assignment of factors might allow the Black Belt to determine which factor caused the output, despite aliasing.

Disadvantages:

  • It is assumed that the interactions are not strong enough to mask the main effects.
  • Aliasing can be complex.

A Design from a Design Catalogue


The preferred DOE approach examines (screens) a large number of factors with highly fractional experiments. Interactions are then explored, or additional levels examined, once the suspected factors have been reduced. A one-eighth fractional factorial design is shown below. A total of seven factors are examined at two levels. In this design, the main effects are independent of interactions, and six independent two-factor interactions can be measured. This design is an effective screening experiment. This particular design comes from a design catalogue. Often, experimenters will obtain a design generated by a statistical software program. Since this is a one-eighth fractional factorial, there are seven other designs that would work equally well.
Often, a full factorial or three-level fractional factorial trial (giving some interactions) is used in the follow-up experiment.
Note: 0 = low level and 1 = high level.

A Three-Factor, Three-Level Experiment

Often, a three-factor experiment is required after screening a large number of variables. These experiments may be full or fractional factorial. A one-third fractional factorial design is shown below. Generally, the (-) and (+) levels in two-level designs are expressed as 0 and 1 in most design catalogues. Three-level designs are often represented as 0, 1, and 2.


From a design catalogue test plan, the selected fractional factorial experiment looks like this:
EVOP and PLEX designs


Evolutionary operation (EVOP) is a continuous improvement design. Plant experimentation (PLEX) is a sequence of corrective designs meant to obtain rapid improvement. Both are typically small full factorial designs with possible entry points. They are designed to be run while maintaining production; therefore, the inference space is typically very small.

EVOP emphasizes a conservative experimental strategy for continuous process improvement. Tests are carried out in phase A until a response pattern is established. Then phase B is centered on the best conditions from phase A. This procedure is repeated until the best result is determined. When nearing a peak, the experimenter will switch to smaller step sizes or examine different variables. EVOP can entail small incremental changes so that little or no process scrap is generated, although large sample sizes may be required to determine the appropriate direction of improvement. The method can be extended to more than two variables using simple main effects designs. The experiment naturally tends to change variables in the direction of the expected improvement and thus follows an ascent path. Because only two or three variables are involved in EVOP experimentation, there are few considerations to take into account, and the formal calculation of the direction of steepest ascent is not particularly helpful.


Advantages:

  • They do not disrupt production and can be used in an administrative situation.
  • They force the organization to investigate factor relationships and prove factory physics.


Disadvantages:

  • They can be time-consuming. For example, because PLEX levels are generally set conservatively to ensure that production is not degraded, it is sometimes difficult to prove statistical validity with a single design. A first design may be used simply to decide factor levels for a subsequent design.
  • They require continuous and significant management support.

Box-Wilson (central composite) design:

A Box-Wilson design is a rotatable design (subject to the number of blocks) that allows for the identification of nonlinear effects. Rotatability is the characteristic that ensures constant prediction variance at all points equidistant from the design center, which improves the quality of prediction. The design consists of a cube portion made up from the characteristics of 2ᵏ factorial designs or 2ᵏ⁻ⁿ fractional factorial designs, plus axial points and center points.


Advantages:

  • It is a highly efficient second-order modeling design for quantitative factors.
  • It can be created by adding additional points to a 2ᵏ⁻ᵖ design, provided the original design was at least Resolution V.


Disadvantages:

  • It does not work with qualitative factors.
  • Axial points may exceed the settings of the simple model and may be outside the ability of the process to produce.

Box-Behnken design:

A Box-Behnken design looks like a basic factorial design with a center point, except that the corner points are missing and replaced with points on the edges. This type of design is used when the corner point settings are impossible or impractical because of their combined severity. For example, running three factors at their high settings could produce a volatile situation.


Advantages:

  • It is more efficient than three-level full factorials.
  • It is excellent for trials where corner points are not recommended.
  • It allows all two-factor interactions to be modeled.
  • It can identify interactions and quadratic effects.


Disadvantages:

  • Enough trials must be run to estimate all one-way and two-way effects (even if only one-way effects are of interest).
  • It is hard to modify into other studies.


Nonparametric Tests

The hypothesis testing presented in my previous two posts covered a number of tests of hypothesis for continuous, dichotomous, and discrete outcomes. Tests for continuous outcomes focused on comparing means, while tests for dichotomous and discrete outcomes focused on comparing proportions. All of the tests presented in those modules are called parametric tests and are based on certain assumptions. For example, when running tests of hypothesis for means of continuous outcomes, all parametric tests assume that the outcome is approximately normally distributed in the population. This does not mean that the data in the observed sample follow a normal distribution, but rather that the outcome follows a normal distribution in the full population, which is not observed. For many outcomes, investigators are comfortable with the normality assumption (i.e., most of the observations are in the center of the distribution while fewer are at either extreme). It also turns out that many statistical tests are robust, which means that they maintain their statistical properties even when assumptions are not entirely met. Tests are robust to violations of the normality assumption when the sample size is large, based on the Central Limit Theorem. When the sample size is small and the distribution of the outcome is not known and cannot be assumed to be approximately normal, alternative tests called nonparametric tests are appropriate.

Parametric vs. Nonparametric Tests

Parametric implies that a distribution is assumed for the population. Often, an assumption is made when performing a hypothesis test that the data are a sample from a certain distribution, commonly the normal distribution. Nonparametric implies that no specific distribution is assumed for the population. An advantage of a parametric test is that if the assumptions hold, the power, or the probability of rejecting H0 when it is false, is higher than the power of a corresponding nonparametric test with equal sample sizes. An advantage of nonparametric tests is that the test results are more robust against violation of the assumptions. Therefore, if the assumptions are violated for a test based upon a parametric model, conclusions based on parametric test p-values may be more misleading than conclusions based upon nonparametric test p-values.

Nonparametric tests are sometimes called distribution-free tests because they are based on fewer assumptions (e.g., they do not assume that the outcome is approximately normally distributed). Parametric tests involve specific probability distributions (e.g., the normal distribution) and the tests involve estimation of the key parameters of that distribution (e.g., the mean or difference in means) from the sample data. The cost of fewer assumptions is that nonparametric tests are generally less powerful than their parametric counterparts (i.e., when the alternative is true, they may be less likely to reject H0).

It can sometimes be difficult to assess whether a continuous outcome follows a normal distribution and, thus, whether a parametric or nonparametric test is appropriate. There are several statistical tests that can be used to assess whether data are likely from a normal distribution. The most popular are the Kolmogorov-Smirnov test, the Anderson-Darling test, and the Shapiro-Wilk test. Each is essentially a goodness-of-fit test that compares observed data to quantiles of the normal (or other specified) distribution. The null hypothesis for each test is H0: the data follow a normal distribution, versus H1: the data do not follow a normal distribution. If the test is statistically significant (e.g., p<0.05), then the data do not follow a normal distribution, and a nonparametric test is warranted. It should be noted that these tests for normality can be subject to low power. Specifically, the tests may fail to reject H0: the data follow a normal distribution, when in fact the data do not. Low power is a major issue when the sample size is small, which unfortunately is often exactly when we wish to employ these tests. The most practical approach to assessing normality involves investigating the distributional form of the outcome in the sample using a histogram and augmenting that with data from other studies, if available, that may indicate the likely distribution of the outcome in the population. There are some situations when it is clear that the outcome does not follow a normal distribution. These include situations:

  • when the outcome is an ordinal variable or a rank,
  • when there are definite outliers or
  • when the outcome has clear limits of detection.
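As a quick sketch, the normality checks described above can be run in Python with scipy.stats. The data here are simulated for illustration, not taken from any study in this post; a clearly non-normal (exponential) sample is included to show a rejection.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
normal_sample = rng.normal(loc=50, scale=5, size=200)   # hypothetical process data
skewed_sample = rng.exponential(scale=5, size=200)      # clearly non-normal data

# Shapiro-Wilk test: H0 = data follow a normal distribution
w_norm, p_norm = stats.shapiro(normal_sample)
w_skew, p_skew = stats.shapiro(skewed_sample)

# p < 0.05 -> reject normality, so a nonparametric test is warranted
print(f"normal sample:  p = {p_norm:.3f}")
print(f"skewed sample:  p = {p_skew:.3g}")
```

A histogram of the sample (as suggested above) should always accompany such a test, since these tests have low power for small samples.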

Nonparametric Techniques

Nonparametric techniques of hypothesis testing are applicable to many quality engineering problems and projects. Nonparametric tests are often called "distribution-free" since they make no assumption regarding the population distribution. Nonparametric tests may be applied as ranking tests, in which the data are not specific in any continuous sense but are simply ranks. Parametric tests are generally more powerful and can test a wider range of alternative hypotheses. It is worth repeating that if data are approximately normally distributed, then parametric tests (as in the modules on hypothesis testing) are more appropriate. However, there are situations in which the assumptions for a parametric test are violated and a nonparametric test is more appropriate.

In nonparametric tests, the hypotheses are not about population parameters (e.g., μ = 50 or μ1 = μ2). Instead, the null hypothesis is more general. For example, when comparing two independent groups in terms of a continuous outcome, the null hypothesis in a parametric test is H0: μ1 = μ2. In a nonparametric test, the null hypothesis is that the two populations are equal; often this is interpreted as the two populations being equal in terms of their central tendency.

Nonparametric tests have some distinct advantages. With outcomes such as those described above, nonparametric tests may be the only way to analyze these data. Outcomes that are ordinal, ranked, subject to outliers or measured imprecisely are difficult to analyze with parametric methods without making major assumptions about their distributions as well as decisions about coding some values (e.g., “not detected”). As described here, nonparametric tests can also be relatively simple to conduct.

Continuous data are quantitative measures based on a specific measurement scale (e.g., weight in pounds, height in inches). Some investigators make the distinction between continuous, interval, and ordinal scaled data. Interval data are like continuous data in that they are measured on a constant scale (i.e., there exists the same difference between adjacent scale scores across the entire spectrum of scores). Differences between interval scores are interpretable, but ratios are not. The temperature in Celsius or Fahrenheit is an example of an interval scale outcome. The difference between 30º and 40º is the same as the difference between 70º and 80º, yet 80º is not twice as warm as 40º. Ordinal outcomes can be less specific, as the ordered categories need not be equally spaced. Symptom severity is an example of an ordinal outcome, and it is not clear whether the difference between much worse and slightly worse is the same as the difference between no change and slightly improved. Some studies use visual scales to assess participants' self-reported signs and symptoms. Pain is often measured in this way, from 0 to 10, with 0 representing no pain and 10 representing agonizing pain. Participants are sometimes shown a visual scale such as that shown in the upper portion of the figure below and asked to choose the number that best represents their pain state. Sometimes pain scales use visual anchors as shown in the lower portion of the figure below.

In the upper portion of the figure, certainly, 10 is worse than 9, which is worse than 8; however, the difference between adjacent scores may not necessarily be the same. It is important to understand how outcomes are measured to make appropriate inferences based on statistical analysis and, in particular, not to overstate precision.

Assigning Ranks

The nonparametric procedures that we describe here follow the same general procedure. The outcome variable (ordinal, interval, or continuous) is ranked from lowest to highest and the analysis focuses on the ranks as opposed to the measured or raw values. For example, suppose we measure self-reported pain using a visual analog scale with anchors at 0 (no pain) and 10 (agonizing pain) and record the following in a sample of n=6 participants:

                            7               5               9              3             0               2

 The ranks, which are used to perform a nonparametric test, are assigned as follows: First, the data are ordered from smallest to largest. The lowest value is then assigned a rank of 1, the next lowest a rank of 2, and so on. The largest value is assigned a rank of n (in this example, n=6). The observed data and corresponding ranks are shown below:

Ordered Observed Data:    0     2     3     5     7     9
Ranks:                    1     2     3     4     5     6

Thus the original observations 7, 5, 9, 3, 0, 2 receive the ranks 5, 4, 6, 3, 1, 2, respectively.
A complicating issue that arises when assigning ranks occurs when there are ties in the sample (i.e., the same values are measured in two or more participants). For example, suppose that the following data are observed in our sample of n=6:

Observed Data:       7         7           9            3           0          2

The 4th and 5th ordered values are both equal to 7. When assigning ranks, the recommended procedure is to assign the mean rank of 4.5 to each (i.e. the mean of 4 and 5), as follows:

Ordered Observed Data:    0     2     3     7     7     9
Ranks:                    1     2     3     4.5   4.5   6

Suppose that there are three values of 7.   In this case, we assign a rank of 5 (the mean of 4, 5 and 6) to the 4th, 5th and 6th values, as follows:

Ordered Observed Data:    0     2     3     7     7     7
Ranks:                    1     2     3     5     5     5

Using this approach of assigning the mean rank when there are ties ensures that the sum of the ranks is the same in each sample (for example, 1+2+3+4+5+6=21, 1+2+3+4.5+4.5+6=21, and 1+2+3+5+5+5=21). Using this approach, the sum of the ranks will always equal n(n+1)/2. When conducting nonparametric tests, it is useful to check the sum of the ranks before proceeding with the analysis.
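The ranking rules above, including the mean-rank treatment of ties, match the default ("average") behavior of scipy.stats.rankdata; a small sketch using the three samples from this section:

```python
from scipy.stats import rankdata

samples = [
    [7, 5, 9, 3, 0, 2],   # no ties
    [7, 7, 9, 3, 0, 2],   # two values tied at 7
    [7, 7, 7, 3, 0, 2],   # three values tied at 7
]
n = 6
for data in samples:
    ranks = rankdata(data)                   # ties receive the mean rank by default
    assert ranks.sum() == n * (n + 1) / 2    # rank sum is always n(n+1)/2 = 21 here
    print(list(ranks))
```

The built-in assertion performs the recommended check that the rank sum equals n(n+1)/2 before proceeding with any analysis.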

To conduct nonparametric tests, we again follow the five-step approach outlined in the modules on hypothesis testing.

  1. Set up hypotheses and select the level of significance α. Analogous to parametric testing, the research hypothesis can be one- or two-sided (one- or two-tailed), depending on the research question of interest.
  2. Select the appropriate test statistic. A test statistic is a single number that summarizes the sample information. In nonparametric tests, the observed data is converted into ranks and then the ranks are summarized into a test statistic.
  3. Set up decision rule. The decision rule is a statement that tells under what circumstances to reject the null hypothesis. Note that in some nonparametric tests we reject H0 if the test statistic is large, while in others we reject H0 if the test statistic is small. We make the distinction as we describe the different tests.
  4. Compute the test statistic. Here we compute the test statistic by summarizing the ranks into the test statistic identified in Step 2.
  5. Conclusion. The final conclusion is made by comparing the test statistic (which is a summary of the information observed in the sample) to the decision rule.   The final conclusion is either to reject the null hypothesis (because it is very unlikely to observe the sample data if the null hypothesis is true) or not to reject the null hypothesis (because the sample data are not very unlikely if the null hypothesis is true).

Three powerful nonparametric techniques will be described with examples: the Kendall coefficient of concordance, the Spearman rank correlation coefficient (rs), and the Kruskal-Wallis one-way analysis of variance.

Kendall Coefficient of Concordance

Example: At a textile plant, some years ago, the primary product was denim. An important customer characteristic was "hand," that is, how the fabric drapes and feels to the touch. Traditionally, hand was evaluated by individuals (judges or inspectors) who had become experts over time by literally handling the fabric. The lab manager had obtained, on trial from the vendor, a "handleometer," an instrument to objectively measure hand. She believed that the current subjective procedure for determining hand was too insensitive to change and ineffective in establishing a common customer specification. The plant manager, two department heads, and a product engineer (the plant judgment panel) were opposed to the handleometer. They said that the handleometer measures only the bending moment of fabric, while they recognized multidimensional aspects of hand: stiffness, friction, drape, etc. The two measuring systems were compared using an analytic technique, first to determine whether the four panel members represented a statistically homogeneous decision-making group, and second to correlate the panel average ranking with the handleometer ranked values. Ten random samples from production were obtained, and the panel members were to rank them independently from most to least hand with no ties (although the expanded procedure permits ties). The null hypothesis is that the judges' rankings are independent of each other. The judges independently ranked the samples for the characteristic specified. The Kendall statistics are calculated:

  • Each judge ranks the samples from 1 to 10 (rank 1 is most hand)
  • Sum the ranks received by each sample across the judges (R)
  • Determine the average rank sum (R̅)
  • Subtract the average rank sum from each sample's rank sum (R - R̅)
  • Square the rank sum differences, (R - R̅)²
  • Sum the squares of the rank sum differences, s = Σ(R - R̅)²

R̅ = 220/10 = 22          s = 1066          K = judges = 4          N = samples = 10
W = 12s / [K²(N³ - N)] = 12(1066) / [(16)(990)] = 0.8076
Calculated chi-square = K(N - 1)W = (4)(9)(0.8076) = 29.07
Degrees of freedom = ν = N - 1 = 9          Critical chi-square = χ²(0.01, 9) = 21.67
The null hypothesis is rejected: the calculated chi-square (29.07) is larger than the critical chi-square (21.67). The four judges' rankings are not independent of each other; they constitute a homogeneous panel. This does not say that they are incorrect, only that they respond in a uniform way to this form of sensory input.
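The Kendall example can be checked numerically from the summary figures it reports (s = 1066, K = 4 judges, N = 10 samples), using the standard coefficient-of-concordance formulas W = 12s/[K²(N³ - N)] and χ² = K(N - 1)W:

```python
from scipy.stats import chi2

K, N, s = 4, 10, 1066                 # judges, samples, sum of squared rank-sum deviations

W = 12 * s / (K**2 * (N**3 - N))      # Kendall coefficient of concordance, 0 <= W <= 1
chi_sq = K * (N - 1) * W              # chi-square approximation with df = N - 1
critical = chi2.ppf(0.99, df=N - 1)   # critical value at alpha = 0.01

print(f"W = {W:.4f}, chi-square = {chi_sq:.2f}, critical = {critical:.2f}")
# the calculated chi-square (~29.07) exceeds 21.67, so H0 is rejected
```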

The Spearman Rank Correlation Coefficient (rs)


The Spearman correlation coefficient is a measure of association that requires both variables to be measured on at least an ordinal scale, so that the samples or individuals to be analyzed may be ranked in two ordered series. If one of the series is continuous and the other is ranked, the continuous series must also be ranked. If both series contain continuous data from an unknown distribution, both series must be ranked.
Example: The ten rank sums from the Kendall coefficient example are ranked from largest to smallest. The rank numbers from 1 through 10 are then assigned to the ranked panel sums. For the same samples, the handleometer values are ranked and then assigned the integer values from 1 through 10. The differences between the paired ranks are squared and summed.


N = 10. If N is equal to or greater than 10, the following correlation equation can be used:
rs = 1 - [6Σd²] / [N(N² - 1)]

The strong correlation (0.97) between the ranked handleometer variable measurements and the ranked panel subjective sensory responses of judges, shows that the handleometer could replace people. The handleometer values can be obtained more quickly, with greater objectivity, and with a longer life span than individuals. The lab manager was disappointed to learn that the instrument would not be purchased, due to the objections presented before the analysis.
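A sketch of the same calculation with scipy.stats.spearmanr follows. The original panel and handleometer rankings are not reproduced in this post, so the paired ranks below are hypothetical, chosen only to show two nearly concordant series:

```python
from scipy.stats import spearmanr

# Hypothetical paired ranks for 10 samples (illustrative, not the study's actual data)
panel_rank        = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
handleometer_rank = [2, 1, 3, 4, 6, 5, 7, 8, 10, 9]

# With no ties this matches rs = 1 - 6*sum(d^2) / (N*(N^2 - 1))
rs, p_value = spearmanr(panel_rank, handleometer_rank)
print(f"rs = {rs:.4f}, p = {p_value:.2g}")
```

Here Σd² = 6 and N = 10, so rs = 1 - 36/990 ≈ 0.964, a strong correlation comparable to the 0.97 reported in the example.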

Kruskal-Wallis One-Way Analysis of Variance by Ranks

This is a test of independent samples. The measurements may be continuous data, but the underlying distribution is either unknown or known to be non-normal. In either case, the data can be ranked and analyzed without the constraint of having to assume a known population distribution.
Example: Three different plants manufactured the same garment style. Variation in garment length was a customer concern. The length was measured to the nearest 1/4″. Within each plant, only four measurement increment values were obtained. This lack of measurement sensitivity indicated that ranking the data was preferred to assuming normality. The null hypothesis is that the population medians are the same. Ho: M1 = M2 =… = Mn. The following table shows data coded as deviations from a common reference value.

Original Data Measurements (Coded)

Plant A    Plant B    Plant C
0.25 0.50 0.25
0.50 0.25 1.00
1.00 0.75 0.75
0.50 0.25 1.00
0.50 0.25 0.50
0.25 1.00

For simplicity and convenience, the coded data can be further coded as integers.

Coded value:    0.25 → 1    0.50 → 2    0.75 → 3    1.00 → 4

The next step is to construct a combined sample, rank the combined data while retaining plant identity, and reconstitute the three plant sample sets with ranks replacing the original data. Tied ranks are replaced by the average value of the ties.
There were seven coded values tied at 1; they would have been ranks 1 through 7, and the average of ranks 1 - 7 is 4, so all coded measurement values of 1 received the average rank of 4. In a similar fashion, the five coded values tied at 2 received the average rank of 10, the three coded values tied at 3 received the average rank of 14, and the six coded values tied at 4 received the average rank of 18.5. Reconstitute the original sample sets of coded data of plants A, B, and C with the final tied ranks. In some applications, there may be both individual ranks and tied ranks; wherever there are tied ranks, they are to be used. Now do the following analysis for plant columns A, B, and C.

                 Plant A     Plant B     Plant C
Rank Sum          74.5        54.5       102.0
(Rank Sum)²/n    693.781     495.042    1486.286

G = Σ(Rank Sum)²/n = 693.781 + 495.042 + 1486.286 = 2675.109          N = 8 + 6 + 7 = 21
The significance statistic is H, which is distributed approximately as chi-square:
H = [12 / (N(N + 1))]G - 3(N + 1) = [12 / ((21)(22))](2675.109) - 3(22) = 3.483
Tied values are accounted for in the calculation. Let t = the number of tied values in each tied set; then T = t³ - t for that set.

Tied Set     t     T = t³ - t
1            7     336
2            5     120
3            3      24
4            6     210

Let J = ΣT = 690 and let k = the number of sample sets. DF = k - 1 = 3 - 1 = 2. Let α = 0.05.
Corrected H = H / (1 - J/(N³ - N)) = 3.483 / (1 - 690/9240) = 3.76
Critical chi-square = χ²(0.05, 2) = 5.99
The corrected H (3.76) is less than the critical chi-square (5.99). Therefore, the null hypothesis of equality of population medians cannot be rejected.
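The garment-length H statistic can be verified directly from the summary quantities in the example (G = 2675.109, N = 21, J = 690), assuming the standard Kruskal-Wallis formulas with the tie correction:

```python
# Summary figures from the garment-length example
N, G, J = 21, 2675.109, 690    # total observations, sum of (rank sum)^2/n, sum of t^3 - t

H = 12 / (N * (N + 1)) * G - 3 * (N + 1)    # uncorrected H statistic
H_corrected = H / (1 - J / (N**3 - N))      # adjust for the tied ranks

print(f"H = {H:.3f}, corrected H = {H_corrected:.3f}")
# corrected H (~3.76) is below the critical chi-square of 5.99 (df = 2, alpha = 0.05)
```

With complete raw data, scipy.stats.kruskal performs the same test (including the tie correction) directly from the three samples.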

Mann-Whitney U Test

With ordinal measurements, the Mann-Whitney U test is used to test whether two independent groups have been drawn from the same population. This is a powerful nonparametric test and is an alternative to the t-test when the normality of the population is either unknown or believed to be non-normal. Consider two populations, A and B. The null hypothesis, Ho, is that A and B have the same frequency distribution with the same shape and spread (the same median). An alternative hypothesis, H1, is that A is larger than B, a directional hypothesis. We accept H1 if the probability is greater than 0.5 that a score from A is larger than a score from B. That is, if a is one observation from population A, and b is one observation from population B, then H1 is that P(a > b) > 0.5. If the evidence from the data supports H1, this implies that the bulk of population A is higher than the bulk of population B. If we wished to test whether B is statistically larger than A, then H1 is P(a > b) < 0.5. For a two-tailed test, that is, for a prediction of differences that does not state direction, H1 would be P(a > b) ≠ 0.5 (the medians are not the same).

If there are n1, observations from population A, and n2 observations from population B, rank all (n1 + n2) observations in ascending order. Ties receive the average of their rank number. The data sets should be selected so that n1<n2. Calculate the sum of observation ranks for population A, and designate the total as Ra, and the sum of observation ranks for population B, and designate the total as Rb.

Ua = n1n2 + 0.5n1(n1 + 1) - Ra
Ub = n1n2 + 0.5n2(n2 + 1) - Rb
where Ua + Ub = n1n2

Calculate the U statistic as the smaller of Ua and Ub. For n2≤ 20, Mann-Whitney tables are used to determine the probability, based on the U, n1, and n2 values. This probability is then used to reject or fail to reject the null hypothesis. If n2 > 20, the distribution of U rapidly approaches the normal distribution and the following apply:


Umean = μu = 0.5n1n2          σu = √[n1n2(n1 + n2 + 1)/12]          Z = (U - μu)/σu

Example: Consider an experimental group (E) and a control group (C) with scores as shown in the Table below. Note that n1 = 3 and n2 = 4. Does the experimental group have higher scores than the control group? Ho: A and B have the same median. H1: median A is larger than median B. Accept H1 if P(a > b) > 0.5. To find U, we first rank the combined scores in ascending order, being careful to retain each score's identity as either an E or a C.
U = minimum(Ue, Uc) = minimum(3, 9) = 3. The Ho probability for n1 = 3, n2 = 4, and U = 3 is shown in the Table below as P = 0.200. Since this probability is greater than α = 0.05, we fail to reject Ho and conclude that the scores for both groups have come from the same population. The probabilities in the Tables given below are one-tailed. For a two-tailed test, the values for P shown in the Table should be doubled.
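The same test is available as scipy.stats.mannwhitneyu. The original score table is not reproduced here, so the scores below are hypothetical, chosen to match the worked example's setup (n1 = 3, n2 = 4, U = 3). Note that scipy reports U1 for the first sample, so the classical smaller U must be recovered explicitly:

```python
from scipy.stats import mannwhitneyu

# Hypothetical scores consistent with the worked example (n1 = 3, n2 = 4)
experimental = [9, 11, 15]
control      = [6, 8, 10, 13]

res = mannwhitneyu(experimental, control, alternative='greater', method='exact')
# scipy's statistic is U1 for the first sample; classical U = min(U1, n1*n2 - U1)
U = min(res.statistic, 3 * 4 - res.statistic)
print(f"U = {U:.0f}, one-tailed p = {res.pvalue:.3f}")   # U = 3, p = 0.200
```

The exact one-tailed probability of 0.200 matches the tabled value in the example, so Ho is not rejected.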

Wilcoxon-Mann-Whitney Rank Sum Test


The Wilcoxon-Mann-Whitney rank-sum test is similar in application to the Mann-Whitney test. The null hypothesis is that the two independent random samples are from the same distribution. The alternate hypothesis is that the two distributions are different in some way. Note that this test does not require normal distributions.
The observations or scores of the two samples (A and B) are combined in order of increasing rank and given rank numbers; where tied values occur, each receives the mean of the rank numbers that would otherwise have been assigned. Next find the rank-sum, R, of the smaller sample. Let N equal the size of the combined samples (N = n1 + n2) and n equal the size of the smaller sample. Then calculate:     R' = n(N + 1) - R
The rank-sum values, R and R', are compared with critical values from the Table below, which gives critical values of the smaller rank-sum. If either R or R' is less than the critical value, the null hypothesis of equal medians is rejected. If n2 > 20, the equations from the U test given above are used for the Z calculation.
Example: Determine whether the data from samples A and B have the same distribution. The null hypothesis, Ho, is that samples A and B have the same median; the alternate hypothesis, H1, is that the A median is larger than the B median. nA = 9, nB = 10, N = 19, R = 77, and R' = n(N + 1) - R = (9)(20) - 77 = 103.
Let α = 0.05 for a one-tailed test. From the Table below, the critical value is 69. Since R = 77 is larger than 69, we fail to reject the null hypothesis of equal medians. If H1 had been that the A median is different from the B median, then a two-tailed test would have been used.
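The arithmetic of this example is easily checked; a minimal sketch using the summary figures given above:

```python
n, N, R = 9, 19, 77            # smaller sample size, combined size, smaller rank sum
R_prime = n * (N + 1) - R      # mirrored rank sum: 9 * 20 - 77 = 103
critical = 69                  # one-tailed critical value at alpha = 0.05 (from the table)

# reject H0 only if the smaller of R and R' falls below the critical value
reject = min(R, R_prime) < critical
print(f"R' = {R_prime}, reject H0: {reject}")
```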


Wilcoxon-Mann-Whitney Critical values

Levene’s Test

Levene’s test is used to test the null hypothesis that multiple population variances (corresponding to multiple samples) are equal. Levene’s test determines whether a set of k samples have equal variances. Equal variances across samples are called homogeneity of variances. Some statistical tests, e.g., the analysis of variance, assume that variances are equal across groups or samples. The Levene test can be used to verify that assumption. Levene’s test is an alternative to the Bartlett test. The Levene test is less sensitive than the Bartlett test to departures from normality. If there is strong evidence that the data do in fact come from a normal, or approximately normal, distribution, then Bartlett’s test has better performance. The well-known F test for the ratio between two sample variances assumes the data are normally distributed. Levene’s variance test is more robust against departures from normality. When there are just two sets of data, the Levene procedure is to:

  1. Determine the mean
  2. Calculate the deviation of each observation from the mean
  3. Let Z equal the square of the deviation from the mean
  4. Apply the t test of two means to the Z data

The methodology for this calculation is remarkably similar to that presented earlier for the 2 mean equal variance t-test. The sample sizes do not need to be equal for Levene’s test to apply.
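A sketch of the test with scipy.stats.levene follows. The data are simulated (one sample deliberately given a much larger spread), and center='mean' selects the classic mean-centered form of the test; note the unequal sample sizes, which the test accepts:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
a = rng.normal(10, 1.0, size=30)   # hypothetical sample, sigma = 1
b = rng.normal(10, 3.0, size=25)   # hypothetical sample, sigma = 3; sizes need not match

# center='mean' gives the classic Levene form (center='median' is Brown-Forsythe)
stat, p = stats.levene(a, b, center='mean')
print(f"W = {stat:.2f}, p = {p:.4f}")   # small p -> variances differ
```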

Mood’s Median Test

Mood’s Median Test performs a hypothesis test of the equality of population medians in a one-way design. The test is robust against outliers and errors in data and is particularly appropriate in the preliminary stages of analysis. The median test determines whether k independent groups (equal size is not required) have been drawn from the same population or from populations with equal medians. The first step is to find the combined median for all scores in the k groups. Next, replace each score by a plus if the score is larger than the combined median and by a minus if it is smaller. If any scores fall at the combined median, they may be handled by assigning a plus to those scores above the combined median and a minus to those at the combined median or below. Next, set up a "k × 2" chi-square table with the frequencies of pluses and minuses in each of the k groups.


Table I shows the counts of critical defects that occurred in 52 lots from six different styles. Table II identifies and counts those scores above the combined median. The combined median is determined by pooling all of the Table I values and finding the middle value. In this case, the ordered 26th value is 3 and the 27th value is 4, so the median of all of the values is 3.5.


The (+O) is the number of observed cells with values greater than the median; the (-O) is the number of observed cells with values less than the median. The expected frequency (E) for each style, for the number of lots above or below the median, is one-half of the number of lots in that style, or N/2.
There are 26 scores (+) above the combined median and 26 scores (-, not shown) below the combined median. To apply the chi-square test, the Table below shows the k × 2 table, where (O) represents the observed frequencies and (E) represents the expected frequencies. Because cell expected frequencies (E) should not be less than 4 (preferably 5), the results of styles K and L are combined. The null hypothesis, Ho, states that all style medians are equal; the alternative hypothesis, H1, states that at least one style median is different. The chi-square calculation over all ten cells is: χ² = Σ(O - E)²/E. The degrees of freedom for contingency tables is:
df = (rows - 1) × (columns - 1) = (2 - 1) × (5 - 1) = 4
Assume we want a level of significance (alpha) of 0.05. The critical chi-square:
χ²(0.05, 4) = 9.49
Since the calculated χ2 is less than the critical χ2, the null hypothesis cannot be rejected, at a 0.05 level of significance (or a 95% confidence level).
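scipy provides this procedure as scipy.stats.median_test, which builds the k × 2 table of counts above and below the grand median and applies the chi-square test. The original 52-lot table is not reproduced here, so the defect counts below are hypothetical:

```python
from scipy import stats

# Hypothetical critical-defect counts for three styles (illustrative only)
style_j = [1, 2, 3, 4, 5, 6, 7, 8]
style_k = [2, 3, 3, 4, 5, 6]
style_l = [1, 1, 2, 2, 3, 4, 5]

# ties='below' counts scores equal to the grand median with the "below" group,
# matching the handling of at-the-median scores described above
stat, p, grand_median, table = stats.median_test(style_j, style_k, style_l, ties='below')
print(f"chi-square = {stat:.2f}, p = {p:.3f}, combined median = {grand_median}")
```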

Nonparametric Test Summary

For tests of population location, the following nonparametric tests are analogous to the parametric t-tests and analysis of variance procedures in that they are used to perform tests about population location or center value. The center value is the mean for parametric tests and the median for nonparametric tests.

  1. One-sample sign performs a test of the median and calculates the corresponding point estimate and confidence interval. Use this test as a nonparametric alternative to the one-sample Z and one-sample t-tests.
  2. One-sample Wilcoxon performs a signed-rank test of the median and calculates the corresponding point estimate and confidence interval. Use this test as a nonparametric alternative to the one-sample Z and one-sample t-tests.
  3. Mann-Whitney performs a hypothesis test of the equality of two population medians and calculates the corresponding point estimate and confidence interval. Use this test as a nonparametric alternative to the two-sample t-test.
  4. Kruskal-Wallis performs a hypothesis test of the equality of population medians for a one-way design (two or more populations). This test is a generalization of the procedure used by the Mann-Whitney test and, like Mood’s median test, offers a nonparametric alternative to the one-way analysis of variance. The Kruskal-Wallis test looks for differences among the population medians.
  5. Mood’s median test performs a hypothesis test of the equality of population medians in a one-way design. Mood’s median test, like the Kruskal-Wallis test, provides a nonparametric alternative to the usual one-way analysis of variance. Mood’s median test is sometimes called a median test or sign scores test.

The Kruskal-Wallis test is more powerful (the confidence interval is narrower, on average) than Mood’s median test for analyzing data from many distributions, including data from the normal distribution, but is less robust against outliers.


Comparison Summary of Nonparametric Tests

It should be noted that nonparametric tests are less powerful (they require more data to find the same size difference) than the equivalent t-tests or ANOVA tests. In general, nonparametric procedures are used either when parametric assumptions cannot be met, or when the nature of the data requires a nonparametric test.


Multivariate analysis

In univariate statistics, there are one or more independent variables (X1, X2), and only one dependent variable (Y). Multivariate analysis is concerned with two or more dependent variables, Y1, Y2, being simultaneously considered for multiple independent variables, X1, X2, etc. The manual effort used to solve multivariate problems was an obstacle to its earlier use. Recent advances in computer software and hardware have made it possible to solve more problems using multivariate analysis. Some of the software programs available to solve multivariate problems include SPSS, S-Plus, SAS, and Minitab. This coverage of multivariate analysis can only be considered an introduction to the subject. For more in-depth information the reader is advised to consult other references.

Multivariate analysis has found wide usage in the social sciences, psychology, and educational fields. Applications for multivariate analysis can also be found in the engineering, technology, and scientific disciplines. This element will highlight the following multivariate concepts or techniques:

  • Multi-Vari Studies
  • Principal components analysis
  • Factor analysis
  • Discriminant function analysis.
  • Cluster analysis
  • Canonical correlation analysis
  • Multivariate analysis of variance

Multi-Vari Studies

Multi-Vari charts are practical graphical tools that illustrate how variation in the input variables (x's) impacts the output variable (y), or response. These charts can help screen for possible sources of variation (x's). There are two types of Multi-Vari studies: 1) passive nested studies, which are conducted without disrupting the routine of the process, and 2) manipulated crossed studies, which are conducted by intentionally manipulating levels of the x's. Sources of variation can be controllable variables, noise variables, or both. Categorical x's are very typical for Multi-Vari studies (e.g., short vs. long, low vs. high, batch A vs. batch B vs. batch C). Multi-Vari studies help the organization determine where its efforts should be focused. Given either historic data or data collected from a constructed sampling plan, a Multi-Vari study is a visual comparison of the effects of each of the factors by displaying, for all factors, the means at each factor level. It is an efficient graphical tool that is useful in reducing the number of candidate factors that may be impacting a response (y) down to a practical number.

In statistical process control, one tracks variables like pressure, temperature, or pH by taking measurements at certain intervals. The underlying assumption is that the variables will have approximately one representative value when measured. Frequently, this is not the case. The temperature in the cross-section of a furnace will vary and the thickness of a part may also vary depending on where each measurement is taken. Often the variation is within the piece and the source of this variation is different from piece-to-piece and time-to-time variation. The multi-vari chart is a very useful tool for analyzing all three types of variation. Multi-Vari charts are used to investigate the stability or consistency of a process. The chart consists of a series of vertical lines, or other appropriate schematics, along a time scale. The length of each line or schematic shape represents the range of values found in each sample set. Variation within samples (five locations across the width) is shown by the line length. Variation from sample to sample is shown by the vertical positions of the lines.

To establish a multi-vari chart, a sample set is taken and plotted from the highest to lowest value. This variation may be represented by a vertical line or other rational schematics. The figure below shows an injection-molded plastic part. The thickness is measured at four points across the width as indicated by arrows.

Three hypothetical cases are presented to help understand the interpretation of multi-vari charts.

Interpretation of the chart is apparent once the values are plotted.

The advantages of multi-vari charts are:

  1. It can dramatize the variation within the piece (positional).
  2. It can dramatize the variation from piece-to-piece (cyclical).
  3. It helps to track any time-related changes (temporal).
  4. It helps minimize variation by identifying areas in which to look, and areas in which not to look, for excessive variation.

The table below identifies the typical areas of time and locational variation.

Note, positional variation can often be broken into multiple components:

Nested Designs

Sources of variation for a passive nested design might be:

  • Positional (i.e., within-piece variation).
  • Cyclical (i.e., consecutive piece-to-piece variation).
  • Temporal (time-to-time variation, i.e., shift-to-shift or day-to-day).

The y-axis in this figure records the measure of performance of units taken at different periods of time, in a time-order sequence. Each cluster (shaded box) represents three consecutive parts, each measured in three locations. Each of the three charts represents a different process, with each process having the greatest source of variation coming from a different component. In the Positional Chart, each vertical line represents a part with the three dots recording three measurements taken on that part. The greatest variation is within the parts. In the Cyclical Chart, each cluster represents three consecutive parts. Here, the greatest variation is shown to be between consecutive parts. The third chart, the Temporal Chart, shows three clusters representing three different shifts or days, with the largest variation between the clusters.
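As a sketch of this decomposition, the three components can be estimated directly from nested data by comparing ranges: within-part (positional), part-to-part within a day (cyclical), and day-to-day (temporal). The data values below are hypothetical, chosen only so that the temporal component dominates, as in the Temporal Chart.

```python
# Nested Multi-Vari variation breakdown (hypothetical data).
# data[day] -> list of parts; each part -> three positional measurements.
data = {
    "day1": [[9.8, 10.1, 10.0], [10.0, 10.2, 9.9]],
    "day2": [[10.4, 10.6, 10.5], [10.5, 10.7, 10.6]],
    "day3": [[11.0, 11.2, 11.1], [11.1, 11.3, 11.0]],
}

def mean(xs):
    return sum(xs) / len(xs)

# Positional: largest within-part range (high minus low on one part)
positional = max(max(p) - min(p) for parts in data.values() for p in parts)

# Cyclical: largest spread of part means within one day
cyclical = max(
    max(mean(p) for p in parts) - min(mean(p) for p in parts)
    for parts in data.values()
)

# Temporal: spread of the day means
day_means = [mean([x for p in parts for x in p]) for parts in data.values()]
temporal = max(day_means) - min(day_means)

# The largest of the three deltas points to the dominant source of variation
```

Comparing the three deltas mirrors reading the chart by eye: the component with the largest Δy is where improvement effort should focus first.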

Nested Multi-Vari Example:

In a nested Multi-Vari study, the positional readings taken were nested within a part. The positions within part were taken at random and were unique to that part; position 1 on part 1 was not the same as position 1 on part 2. The subgroups of three “consecutive parts” were nested within a shift or day. The parts inspected were unique to that shift or day. A sampling plan or hierarchy was created to define the parameters in obtaining samples for the study.

A passive nested study was conducted in which two consecutive parts (cyclical) were measured over three days (temporal). Each part was measured in three locations, which were randomly chosen on each part (positional). A nested Multi-Vari chart was then created to show the results.

The day-to-day variation appears to be the greatest source of variation, compared to the variation within part or part-to-part within a day (consecutive parts). The next step in this study would be to evaluate the process parameters that impact day-to-day variation, i.e., what changes (different material lots/batches, environmental factors, etc.) occur from day to day to affect the process.

Crossed Designs:

Sources of variation for a manipulated crossed design might be:

  • Machine (A or B).
  • Tool (standard or carbide).
  • Coolant (off or on).

Interactions can only be observed with crossed studies. When an interaction occurs, the factors associated with the interaction must be analyzed together to see the effect of one factor’s settings on the other factor’s settings. With fully crossed designs, the data may be reordered and a chart may be generated with the variables in different positions to clarify the analysis. In contrast, passive nested
designs are time-based analyses and therefore must maintain the data sequence in the Multi-Vari chart.

Crossed Design Example:

A sampling plan or hierarchy for a crossed design is shown below:

The coolant was turned “on” or “off” for each of two tools while the tools were being used on one of two machines. Every possible combination was run using the same two machines, the same two types of tools, and the same two coolant settings. The following chart uses these sources to investigate graphically the main effects and interactions of these factors in improving surface finish (lower is better).

It appears that the best (lowest) value occurs with carbide tools using no coolant. The different machines have a relatively small impact. It may also be noted that when the coolant is off, there is a large difference between the two tool types. Because of the crossed nature of this study, we would conclude that there is an interaction between coolant and tool type. The interaction is also apparent in the second chart, which shows the same data sorted differently. Coolant “off” with the carbide tool is again the lowest combination. Notice, however, that with the standard tool, coolant “on” gives the lower value. Hence, the interaction can also be seen here.
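The interaction logic described here can be checked numerically: compute the coolant effect separately for each tool and see whether it changes sign. The readings below are hypothetical, chosen only to mimic the pattern described (carbide with no coolant best, coolant effect reversing with the standard tool).

```python
# Hypothetical surface-finish readings (lower is better) for a crossed
# tool x coolant study.
runs = [
    ("standard", "off", 42), ("standard", "off", 44),
    ("standard", "on",  35), ("standard", "on",  33),
    ("carbide",  "off", 20), ("carbide",  "off", 22),
    ("carbide",  "on",  30), ("carbide",  "on",  28),
]

def cell_mean(tool, coolant):
    vals = [y for t, c, y in runs if t == tool and c == coolant]
    return sum(vals) / len(vals)

# Coolant effect (on minus off) computed separately for each tool
effect_standard = cell_mean("standard", "on") - cell_mean("standard", "off")
effect_carbide = cell_mean("carbide", "on") - cell_mean("carbide", "off")

# Opposite signs mean the factors interact and must be analyzed together
interaction = effect_standard * effect_carbide < 0
```

When the effect of one factor depends on the level of another like this, neither factor can be interpreted from its main effect alone.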

Steps to create a Multi-Vari chart:

Multi-Vari charts are most easily created with a computer, but they are not difficult to do by hand.

  1. Plan the Multi-Vari Study.
    • Identify the y (response) to be studied.
    • Determine how it will be measured and validate the measurement system.
    • Identify the potential sources of variation. For nested designs, the levels depend on passive data; for crossed designs, the levels are specifically selected for manipulation.
    • Create a balanced sampling plan or hierarchy of sources. Balance refers to equal numbers of samples within the upper levels in the hierarchy (i.e., two tools for each machine). A strict balance of exactly the same number of samples for each possible combination of factors, while desirable, is not an absolute requirement. However, there must be at least one data point for each possible combination.
    • Decide how to collect data in order to distinguish between the major sources of variation.
    • When doing a nested study, the order of the sampling plan should be maintained to preserve the hierarchy.
  2. Take data in the order of production (not randomly).
    • Continue to collect data until 80% of the typical range of the response variable is observed (low to high). (This range may be estimated from historical data.)
    • For fully crossed designs, a Multi-Vari study can be used to graphically look at interactions with factors that are not time-dependent (in which case, runs can be randomized as in a design of experiments).
  3. Take a representative sample.
    It is suggested that a minimum of three samples per lowest level subgroup be taken.
  4. Plot the data.
    • The y-axis will represent the scaled response variable.
    • Plot the positional component on a vertical line from low to high and plot the mean for each line (each piece). (Offsetting the bar at a slight angle from vertical can improve clarity.)
    • Repeat for each positional component on neighboring bars.
    • Connect the positional means of each bar to evaluate the cyclical component.
    • Plot the mean of all values for each cyclic group.
    • Connect cyclical means to evaluate the temporal component.
    • Compare components of variation for each component (largest change in y (Δy) for each component).
    • Many computer programs will not produce charts unless the designs are balanced or have at least one data point for each combination.
    • Each plotted point represents an average of the factor combination selected. When a different order of factors is selected, the data, while still the same, will be re-sorted. Remember, if the study is nested, the order of the hierarchy must be maintained from the top-down or bottom-up of the sampling plan.
  5.  Analyze the results.
    Ask: Is there an area that shows the greatest source of variation? Are there cyclic or unexpected nonrandom patterns of variation? Are the nonrandom patterns restricted to a single sample or to more than one? Are there areas of variation that can be eliminated (e.g., shift-to-shift variation)?
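One planning requirement above, at least one data point for each possible factor combination, is easy to verify mechanically before plotting. This sketch reuses the machine/tool/coolant factors from the crossed example; the sampling log is hypothetical.

```python
# Check whether a sampling log covers every factor combination at least
# once, as the balanced-plan step requires for a crossed Multi-Vari chart.
from itertools import product

machines = ["A", "B"]
tools = ["standard", "carbide"]
coolants = ["off", "on"]

# Combinations actually sampled so far (hypothetical log, one cell short)
collected = {
    ("A", "standard", "off"), ("A", "standard", "on"),
    ("A", "carbide", "off"), ("A", "carbide", "on"),
    ("B", "standard", "off"), ("B", "standard", "on"),
    ("B", "carbide", "off"),
}

required = set(product(machines, tools, coolants))
missing = required - collected   # cells with no data yet

# The chart cannot be drawn until `missing` is empty
```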

Several ribbons, one-half short and one-half long and in four colours (red, white, blue, and yellow), are studied. Three samples of each combination are taken, for a total of twenty-four data points (2 x 4 x 3). Ribbons are nested within the “length”: ribbon one is unique to “short” and ribbon four is unique to “long.” Length, however, is crossed with colour: “short” is not unique to “blue.” Length is repeated for all colours. (This example is a combination study, nested and crossed, as are many Gauge R&Rs.)

The following data set was collected. Note that there are three ribbons for each combination of length and colour as identified in the “Ribbon #” column.

The ribbons are sorted by length, then colour to get one chart.

  • Each observation is shown in coded circles.
  • The squares are averages within a given length and colour.
  • Each large diamond is the average of six ribbons of both lengths within a colour.
  • Note the obvious pattern of the first, second, and third measured ribbons within the subgroups. The short ribbons (length = 1) consistently measure low, while the long ribbons consistently measure high, and the difference between short and long ribbons (Δy) is consistent.
  • There is more variation between colours than lengths (Δy is greater between colours than between lengths).
  • Also note the graph indicates that while the value of a ribbon is based upon both its colour and length, longer (length = 2) ribbons are in general more valuable than short ribbons. However, a short red ribbon has a higher value than a long yellow one. Caution should be taken here because not much about how the individual values vary relative to this chart is known. Other tools (e.g., hypothesis tests and DOEs) are needed for that type of analysis.

Multi-Vari Case Study

A manufacturer produced flat sheets of aluminium on a hot rolling mill. Although a finish trimming operation followed, the basic aluminium plate thickness was established during the rolling operation. The thickness specification was 0.245″ ± 0.005″. The operation had been producing scrap. A process capability study indicated that the process spread was 0.0125″ (a Cp of 0.8) versus the requirement of 0.010″. The operation generated a profit of approximately Rs 20,00,000 per month even after a scrap loss of Rs 2,00,000 per month. Refitting the mill with a more modern design, featuring automatic gauge control and hydraulic roll bending, would cost Rs 80,00,000 and result in 6 weeks of downtime for installation. The department manager requested that a multi-vari study be conducted by a quality engineer before further consideration of the new mill design or other alternatives. Four positional measurements were made at the corners of each flat sheet in order to adequately determine within-piece variation. Three flat sheets were measured in consecutive order to determine piece-to-piece variation. Additionally, samples were collected each hour to determine temporal variation. The pictorial results are as follows. The maximum detected variation was 0.010″. Without sophisticated analysis, it appeared that the time-to-time variation was the largest culprit. A gross change was noted after the 10:00 AM break. During this time, the roll coolant tank was refilled.
Actions taken over the next two weeks included re-leveling the bottom back-up roll (approximately 30% of total variation) and initiating more frequent coolant tank additions, followed by an automatic coolant make-up modification (50% of total variation). Additional spray nozzles were added to the roll stripper housings to reduce heat build-up in the work rolls during the rolling process (10%-15% of total variation). The piece-to-piece variation was ignored; this dimensional variation may have resulted from roll bearing play or variation in incoming aluminium sheet temperature (or a number of other sources). The results from this single study indicated that, if all of the modifications were perfect, the resulting measurement spread would be 0.002″ total. In reality, the end result was 0.002″ to 0.004″ total, under conditions similar to those of the initial study. The total cash expenditure was Rs 80,000 for the described modifications. All work was completed in two weeks. The specification of 0.245″ ± 0.005″ was easily met.
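The capability arithmetic in this case study can be checked directly from the figures quoted: a 0.010″ total tolerance against the 0.0125″ spread before the fixes, and against the worst reported 0.004″ spread after them (the after-fix Cp below is derived from those quoted numbers, not stated in the study itself).

```python
# Cp = total tolerance / process spread
tolerance = 0.010                    # 0.245" +/- 0.005"
cp_before = tolerance / 0.0125       # spread before the fixes -> 0.8
cp_after_worst = tolerance / 0.004   # worst reported spread after the fixes
```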

Principal Components Analysis

Principal components analysis (PCA) and factor analysis (FA) are two related techniques used to find patterns of correlation among many possible variables or subsets of data and to reduce them to a smaller manageable number of components or factors. The researcher attempts to find the primary components, or factors, that account for most of the sources of variance. PCA refers to subsets as components and FA uses the term factors. Grimm states that a minimum of 100 observations should be used for PCA. The ratio is usually set at approximately 5 observations per variable. If there are 25 variables, then the ratio of 5:1 requires 5 observations/variable x 25 variables = 125 observations.

For illustration purposes, five independent variables will be considered in the growth of communities. The investigator wants to know how many of these components really contribute to growth: one, two, three, or all five? Perhaps two principal components will explain 95% of the variance, while the other three contribute only 5%. At one time, multivariate analysis required familiarity with linear algebra and matrices. To reduce this manual effort, a statistical software package such as Minitab, SPSS, or S-Plus can be used. Minitab is used in this discussion to display the variances and the correlation matrix. Higher correlation values indicate a key linkage of the factors. An example of PCA is presented in the table below. In this example, an investigator wishes to uncover the principal factors that are important for a community desiring high-tech growth. If only a few principal factors account for the vast majority of the variance in growth, then communities can focus on these vital few. The independent factors are:

  • High tech workers (thousands of workers)
  • Entrepreneurial culture (number of startups per year)
  • University-industry interactions (measured by projects per year)
  • Creative classes (percentage of professionals and knowledge workers)
  • Amount of venture capital (in millions of dollars)

The table below shows hypothetical data generated from interviews with community leaders.

For illustration purposes, the Minitab statistical recap of the information is shown in the table below. It provides the correlation matrix. A step-by-step analysis of the Minitab results is as follows:

  • A correlation matrix is used to determine the relationship between components.
  • Matrices define quantities known as eigenvalues and eigenvectors; this is an eigenvalue analysis.
  • The eigenvalues are summed and a proportion is calculated. The sum of eigenvalues is 4.9999 (5.0 due to rounding errors). Thus, 3.5856 divided by 4.9999 is 0.717. PC1 contains 71.7% of the variance.
  • PC1 and PC2 explain 89.2% of the variance. This may be sufficient for the researcher.
  • There are five total components. Pareto analysis indicates two principal components.
  • The first PC indicates that there is no clear separation for four components (high tech workers, entrepreneurial culture, university-industry projects, and venture capital). It is up to the researcher to further distinguish this grouping; it could be related to the need for a critical mass of necessary resources. The second PC indicates that “creative class” is the prime component. A closer look at the first principal component may be required since the values are negative. (This was a small illustrative sample.)
  •  A “scree” plot (similar to a Pareto line chart) is provided by Minitab software to display the “vital few” eigenvalues.

Finally, an equation can be generated for the two principal components via the use of the coefficients.

PC1 = -0.449 (hightec) - 0.507 (entre) - 0.512 (university) - 0.226 (creative) - 0.478 (venture)
PC2 = 0.154 (hightec) + 0.189 (entre) + 0.080 (university) - 0.966 (creative) - 0.025 (venture)
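As a minimal numeric illustration of the eigenvalue step, the correlation matrix of just two standardized variables, [[1, r], [r, 1]], has eigenvalues 1 + r and 1 − r, so PC1’s share of variance follows directly (the r value below is hypothetical). The block also checks that the PC1 coefficients listed above have approximately unit length, as the coefficients of an eigenvector should.

```python
# Two-variable PCA: eigenvalues of [[1, r], [r, 1]] are 1 + r and 1 - r.
r = 0.7  # hypothetical correlation between two standardized variables

eigenvalues = sorted([1 + r, 1 - r], reverse=True)
total = sum(eigenvalues)                 # equals the number of variables
proportion_pc1 = eigenvalues[0] / total  # share of variance on PC1

# Sanity check on the PC1 coefficients quoted in the text:
# an eigenvector's squared coefficients should sum to about 1.
coeffs_pc1 = [-0.449, -0.507, -0.512, -0.226, -0.478]
unit_length = abs(sum(c * c for c in coeffs_pc1) - 1) < 0.01
```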

Factor Analysis

Factor analysis is a data reduction technique used to identify factors that explain variation. It is very similar to principal components analysis; that is, factor analysis attempts to simplify complex sets of data by reducing many factors to a smaller set. However, some subjective judgment is involved in describing the factors in this method of analysis. The output variables are linearly related to the input factors. The variables under investigation should be measurable, have a range of measurements, and be symmetrically distributed. There should be four or more input factors for each dependent variable. Factor analysis undergoes two stages: factor extraction and factor rotation. The first stage distinguishes the major factors for further study (extraction). The second stage rotates the factors to make them more meaningful. A principal components analysis can be performed on the data to provide a reduction in the number of factors. (Minitab can also examine the data through a “maximum likelihood” method.) The economic development data from the previous example was channeled through a principal components analysis, which indicated that two factors were significant. From this information, a researcher can go back into Minitab, perform a factor analysis for two factors, and obtain a correlation matrix. To make sense of the information, note that Factor 1 has four factors in a grouping (enterprise, university, high tech, and venture) and Factor 2 has the creative class as the prime factor. This is a similar result to the earlier principal components analysis. Again, the first factor has negative readings, so the researcher should examine that grouping more closely for meaning. The communality column indicates how well the chosen variables explain the variability; here, the communality numbers are very high.
This means that the researcher can state that the two major factors in high technology community development would involve the five studied variables. The data and factors can be rotated (by the software) to view the data from a different perspective. The four rotational methods in Minitab are equimax, varimax, quartimax, and orthomax. Other software has other varieties.

Discriminant Analysis

If one has a sample with known groups, discriminant analysis can be used to classify the observations or attributes into two or more groups. Discriminant analysis can be used as either a predictive or a descriptive tool. The decisions could involve medical care, college success attributes, car loan creditworthiness, or the previous economic development issues. Discriminant analysis can be used as a follow-up to the use of MANOVA. The possible number of linear combinations (discriminant functions) for a study is the smaller of the number of groups minus one, or the number of variables. Some assumptions in discriminant analysis are: the variables are multivariate normally distributed, the population variances and covariances among the dependent variables are the same, and the samples within the variables are randomly obtained and exhibit independence of scores from the other samples. Minitab provides two forms of analysis: linear and quadratic discriminant analysis. Linear discriminant analysis assumes that all groups have the same covariance matrix; this is not the case for the quadratic form. In linear discriminant analysis, the Mahalanobis distance is the measure used to form or classify groups. The Mahalanobis distance is the squared distance from the observation to the group center, and the classification into groups is based on this distance measure. In quadratic discriminant analysis, the squared distance does not translate into a linear function but into a quadratic function; the quadratic distance is called the generalized squared distance.
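A minimal sketch of the squared (Mahalanobis) distance calculation, shown for two variables with a hand-inverted 2×2 covariance matrix. The observation, group center, and covariance values are illustrative, not taken from the Minitab example.

```python
# Squared Mahalanobis distance from an observation to a group center:
# d^2 = (x - mu)^T * inverse(cov) * (x - mu)
x = [6.0, 4.0]          # observation (illustrative)
mu = [5.0, 3.0]         # group center (mean vector)
cov = [[4.0, 2.0],      # covariance matrix of the group
       [2.0, 3.0]]

# Inverse of a 2x2 matrix [[a, b], [c, d]]: (1/det) * [[d, -b], [-c, a]]
det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
inv = [[cov[1][1] / det, -cov[0][1] / det],
       [-cov[1][0] / det, cov[0][0] / det]]

d = [x[0] - mu[0], x[1] - mu[1]]
d2 = sum(d[i] * inv[i][j] * d[j] for i in range(2) for j in range(2))
# The observation is assigned to the group with the smallest d2
```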
The previous example, which provided information on high technology growth, will be used for a discriminant analysis example. An additional column has been inserted. It is a column used to state that the area is a “new economy” community. For example, a “yes” or “no” will be used to indicate if a community is considered a “new economy” area. The discriminant analysis will correlate the data and verify if the decision was correct.

The Minitab analysis states that the decisions on the grouping were 10 out of 10 (100% correct). That is, the values in the various factors match up enough to place various regions in certain categories.

Discriminant Analysis: New Economy versus creative class, entrepreneurial culture, university-industry projects, high tech workers, and venture capital.
Linear Method for Response: New Economy. Predictors: creative, entrepre, universi, high tech, venture.

Summary of Classification:  N = 10     N Correct = 10       Proportion Correct = 1.000 (100%)

Squared distance between groups
(also called the Mahalanobis distance):


Linear Discriminant Function for Group:

The above results are Minitab outputs with a few adjustments.

Cluster Analysis

Cluster analysis is used to determine groupings or classifications for a set of data. A variety of rules or algorithms have been developed to assist in group formations. The natural groupings should have observations classified so that similar types are placed together. A file on attributes of high achieving students could be grouped or classified by IQ, parental support, school system, study habits, and available resources. Cluster analysis is used as a data reduction method in an attempt to make sense of large amounts of data from surveys, questionnaires, polls, test questions, scores, etc.
The economic development example in the previous discussion will again be used to validate groupings. The two types of groups will be the new economy and not the new economy. The graphic output from the analysis is the classification tree or dendrogram, a line graph linking variables and groups at the various stages of clustering.
The Table data will be analyzed by the cluster analysis method. Using Minitab, the first analysis request calls for two groups. (More groups can be used.) It is displayed below. The analysis shows the requested two groupings. However, instead of grouping into the presumed two groups of new economy and not new economy, the program used an algorithm based on measures of “closeness” between groups. Since two groups were requested, the final iteration provides two groups. The dendrogram in the figure below provides a visual indication that San Jose is distinctive and of a higher ranking than the other communities. The dendrogram also shows that Austin and Seattle are distinct from the other, lower-ranked communities. Communities 7, 8, and 9 form the lowest cluster. This result can be verified by rerunning the analysis and requesting four groupings. Another interesting analysis is to group the data by the original five factors, as shown in the figure above, which indicates that creative class is separated from the other factors in the grouping. In the principal components and factor analysis discussions, creative class was always the major factor listed in the second group, separated from the other four factors. Similar results can be obtained using different multivariate tools.
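The “closeness”-based merging the software performs can be sketched with a toy single-linkage agglomeration on one-dimensional community scores, stopping at two groups as in the Minitab run. All scores, and the community labels other than San Jose, Austin, and Seattle, are hypothetical.

```python
# Toy agglomerative clustering (single linkage) on 1-D scores,
# merged until two groups remain.
scores = {"SanJose": 9.5, "Austin": 7.8, "Seattle": 7.6,
          "City7": 3.1, "City8": 2.9, "City9": 3.0}

clusters = [[name] for name in scores]

def linkage(a, b):
    # Single linkage: distance between the closest pair of members
    return min(abs(scores[i] - scores[j]) for i in a for j in b)

while len(clusters) > 2:
    # Find and merge the two closest clusters
    i, j = min(
        ((i, j) for i in range(len(clusters))
         for j in range(i + 1, len(clusters))),
        key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
    )
    clusters[i] += clusters[j]
    del clusters[j]

groups = [sorted(c) for c in clusters]
```

The high-scoring communities end up in one cluster and the low-scoring ones in the other, which is the same kind of separation the dendrogram displays.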

Canonical Correlation Analysis

Canonical analysis tests the hypothesis that effects can have multiple causes and causes can have multiple effects. This technique was developed by Hotelling in 1935 but was not widely used for over 50 years. The emergence of personal computers and statistical software has led to its fairly recent adoption. Canonical correlation analysis is a form of multiple regression used to find the correlation between two sets of linear combinations. Each set may contain several related variables. Relating one set of independent variables to one set of dependent variables forms linear combinations. The largest correlation values for the sets are used in the analysis. The pairings of linear combinations are called canonical variates, and the correlations are called canonical correlations (also called characteristic roots). There may be more than one pair of linear combinations applicable to an investigation; the maximum number is limited by the number of variables in the smaller set. Most studies involve only two sets. The canonical correlation coefficient, rc, is similar to the Pearson product-moment correlation coefficient. The rule of thumb is to have values above 0.30; below this, the squared value would represent less than 10% of overlapping variance between pairs of canonical variates. The linear combinations can be determined from linear matrix algebra or statistical software. For instance, SPSS software can test for the significance of canonical correlations and will provide several additional tests.
The table below illustrates the correlation of sets of independent variables to sets of dependent variables. An industrial survey can be conducted to see if there is a correlation between the characteristics of a quality engineer to the listed job skills of a quality engineer. There may be a set of variables that are strongly correlated and canonical correlation can be used.

Hotelling’s T² test is a t-test extended to more than two variables at a time. The student t-test can also be used to compare two samples at a time, but if it is used to compare five samples, two at a time, the probability of obtaining a Type I error is increased, that is, finding a significant difference when the two samples are actually the same. If a 5% error level is used, the probability of obtaining such an error is 1 − (0.95)^p, where p is the number of pairwise comparisons. Hotelling’s T² is the preferred and recommended test method.
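The error-inflation figure is easy to compute. Reading p as the number of pairwise comparisons, five samples compared two at a time give C(5, 2) = 10 t-tests (the independence of the tests is a simplifying assumption here):

```python
# Overall chance of at least one false positive when each pairwise
# t-test is run at alpha = 0.05 (tests treated as independent).
from math import comb

alpha = 0.05
k = 5                 # number of samples
p = comb(k, 2)        # number of pairwise t-tests: 10

family_error = 1 - (1 - alpha) ** p   # far above the nominal 5%
```

The overall error rate climbs to roughly 40%, which is why a single multivariate test such as Hotelling’s T² is preferred.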

MANOVA (Multiple Analysis of Variance)

An analysis of variance is used with one or more independent X variables and one dependent Y variable. This method tests whether the mean differences among groups on a single dependent Y variable are significant. For multiple independent X variables and multiple dependent Y variables (that is, two or more Ys and one or more Xs), the multiple analysis of variance is used. MANOVA tests whether mean differences among groups on a combination of Ys are significant or not. The concepts of treatment levels and associated factors are still valid. The data should be normally distributed, have homogeneity of the covariance matrices, and have independence of observations. In ANOVA, a sum of squares is used for the treatments and for the error term. In MANOVA, these terms become matrices of the sum of squares and cross-products (SSCP). ANOVAs used multiple times across the dependent variables could result in inflated alpha errors. The MANOVA method reduces the alpha risk by using only one test.

MANOVA Example

In an engineered plastics company, a multivariate experiment was conducted with two independent variables (time and pressure of the extrusion process) at two levels, and three dependent responses (tensile strength, coefficient of friction, and bubble breaks). A MANOVA was conducted to test for relationships. The levels for the independent variables were:
Time: high (+) equals 30 seconds, low (-) equals 10 seconds
Pressure: high (+) equals 80 psi, low (-) equals 20 psi

The shortened Minitab output for the MANOVA is presented in the table below. It has only the three statistics tables for the responses and interactions. Minitab automatically inserts the four statistical tests (Wilks’, Lawley-Hotelling, Roy’s, and Pillai-Bartlett) and makes the analysis. The results indicate that both factors, time and pressure, are significant, with p-values well below 5%. The interaction of time x pressure is not significant. For simplicity, the extensive SSCP tables were not displayed. For the individual familiar with linear algebra and matrices, the manual calculations can also be made.


Analysis of Variance – ANOVA

The Analysis of Variance – ANOVA procedure is one of the most powerful statistical techniques. ANOVA is a general technique that can be used to test the hypothesis that the means among two or more groups are equal, under the assumption that the sampled populations are normally distributed. Suppose we wish to study the effect of temperature on a passive component such as a resistor. We select three different temperatures and observe their effect on the resistors. This experiment can be conducted by measuring all the participating resistors before dividing them among three different ovens, each heated to one of the selected temperatures. We then measure the resistors again after, say, 24 hours and analyze the responses, which are the differences between the readings before and after exposure to the temperatures. The temperature is called a factor. The different temperature settings are called levels. In this example, there are three levels or settings of the factor Temperature.

A factor is an independent treatment variable whose settings (values) are controlled and varied by the experimenter. The intensity setting of a factor is the level. Levels may be quantitative numbers or, in many cases, simply “present” or “not present” (“0” or “1”). For example, the temperature settings in the resistor experiment may be 100°F, 200°F, and 300°F. We can simply call them Level 1, Level 2, and Level 3.

  • The 1-way ANOVA
    In the experiment above, there is only one factor, temperature, and the analysis of variance that we will be using to analyze the effect of temperature is called a one-way or one-factor ANOVA.
  • The 2-way or 3-way ANOVA
    We could have opted to also study the effect of positions in the oven. In this case, there would be two factors, temperature and oven position. Here we speak of a two-way or two-factor ANOVA. Furthermore, we may be interested in a third factor, the effect of time. Now we deal with a three-way or three-factor ANOVA. In each of these ANOVAs, we test a variety of hypotheses of equality of means (or average responses when the factors are varied).

ANOVA is defined as a technique where the total variation present in the data is partitioned into two or more components, each having a specific source of variation. In the analysis, it is possible to obtain the contribution of each of these sources of variation to the total variation. It is designed to test whether the means of more than two quantitative populations are equal. It consists of classifying and cross-classifying statistical results and helps in determining whether the given classifications are important in affecting the results.
The assumptions in the analysis of variance are:

  • Normality
  • Homogeneity
  • Independence of error

Whenever any of these assumptions are not met, the analysis of variance technique cannot be employed to yield valid inferences.

With the analysis of variance, the variations in response measurement are partitioned into components that reflect the effects of one or more independent variables. The variability of a set of measurements is proportional to the sum of squares of deviations used to calculate the variance:

SS = Σ(xᵢ − x̄)²

Analysis of variance partitions the sum of squares of deviations of individual measurements from the grand mean (called the total sum of squares) into parts: the sum of squares of treatment means plus a remainder termed the experimental or random error. When an experimental variable is highly related to the response, its part of the total sum of squares will be highly inflated. This condition is confirmed by comparing the variable’s sum of squares with that of the random error sum of squares using an F test.
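The partition described here can be demonstrated numerically: for any grouped data, the total sum of squares equals the treatment (between-group) part plus the error (within-group) part. The three groups below are hypothetical.

```python
# Partition of the total sum of squares: SST = SSB (treatment) + SSE (error).
groups = [[10.0, 12.0, 11.0],
          [14.0, 15.0, 16.0],
          [20.0, 19.0, 21.0]]

all_values = [x for g in groups for x in g]
grand_mean = sum(all_values) / len(all_values)

# Total sum of squares: deviations of every value from the grand mean
sst = sum((x - grand_mean) ** 2 for x in all_values)

# Between-group: deviations of group means from the grand mean
ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)

# Within-group (error): deviations of values from their own group mean
sse = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

# The identity sst == ssb + sse underlies the ANOVA table
```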

Why Use ANOVA and Not the t-test Repeatedly?

  • The t-test, which is based on the standard error of the difference between two means, can only be used to test differences between two means
  • With more than two means, one could compare each mean with every other mean using t-tests
  • Conducting multiple t-tests can lead to severe inflation of the Type I error rate (false positives) and is NOT RECOMMENDED.
  • ANOVA is used to test for differences among several means without increasing the Type I error rate
  • The ANOVA uses data from all groups to estimate standard errors, which can increase the power of the analysis
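The Type I error inflation from repeated t-tests can be quantified. A minimal sketch (pure Python, hypothetical group count), assuming the pairwise tests are independent:

```python
# Family-wise Type I error rate when every pair of k groups is compared
# with its own t-test at significance level alpha. Independence is an
# assumption that slightly overstates the inflation but shows the trend.
from math import comb

def familywise_error(k_groups, alpha=0.05):
    m = comb(k_groups, 2)          # number of pairwise comparisons
    return 1 - (1 - alpha) ** m    # P(at least one false positive)

# With 5 groups there are 10 pairwise t-tests, and the chance of at
# least one false positive grows from 5% to roughly 40%.
rate = familywise_error(5)
```

ANOVA avoids this inflation by performing one overall test of equality of all means at the stated significance level.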

Why Look at Variance When Interested in Means?

  • When three groups are tightly spread about their respective means, the variability within each group is relatively small
  • It is then easy to see that there is a difference between the means of the three groups
  • When three groups have the same means as in the previous figure but the variability within each group is much larger
  • It is then not so easy to see that there is a difference between the means of the three groups
  • To distinguish between the groups, the variability between (or among) the groups must be greater than the variability of, or within, the groups
  • If the within-groups variability is large compared with the between-groups variability, any difference between the groups is difficult to detect
  • To determine whether or not the group means are significantly different, the variability between groups and the variability within groups are compared


Suppose there are k populations that are from a normal distribution with unknown parameters. A random sample X1, X2, X3, …, Xk is taken from these populations,
which satisfy the assumptions. If μ1, μ2, μ3, …, μk are the k population means, the hypotheses are:
H0 : μ1 = μ2 = μ3………… = μk (i.e. all means are equal)
HA : not all μj are equal (i.e. at least one mean differs)

The steps in carrying out the analysis are:

  1.  Calculate variance between the samples
    The variance between samples measures the difference between the sample mean of each group and the overall mean. It also measures the difference from one group to another. The sum of squares between the samples is denoted by SSB. For calculating variance between the samples, take the total of the square of the deviations of the means of various samples from the grand average and divide this total by the degree of freedom, k-1, where k = no. of samples.
  2. Calculate variance within samples
    The variance within samples measures the within-sample differences due to chance alone. It also measures the variability around the mean of each group. The sum of squares within the samples is denoted by SSW. For calculating variance within the samples, take the total sum of squares of the deviations of the various items from the mean values of the respective samples and divide this total by the degrees of freedom, n-k, where n = total number of all the observations and k = number of samples.
  3. Calculate the total variance
    The total variance measures the overall variation in the sample mean. The total sum of squares of variation is denoted by SST. The total variation is calculated by taking the squared deviation of each item from the grand average and dividing this total by the degree of freedom, n-1 where n = total number of observations.
  4.  Calculate the F ratio
    It measures the ratio of between–column variance and within-column variance. If there is a real difference between the groups, the variance between groups will be significantly larger than the variance within the groups.
    F = ( Variance between the Groups ) / ( Variance within the Groups )
    F = MSB / MSW, where MSB = SSB/(k-1) and MSW = SSW/(n-k)
  5. Decision Rule
    At a given level of significance α = 0.05 and at k-1 and n-k degrees of freedom, the critical value of F is read from the table. On comparing the values, if the calculated value is greater than the tabulated value, reject the null hypothesis. That means the test is significant, i.e., there is a significant difference between the sample means.
  6. Applicability of ANOVA
    Analysis of variance has wide applicability in analyzing experiments. It is used for two different purposes:
    • It is used to estimate and test hypotheses about population means.
    • It is used to estimate and test hypotheses about population variances.
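The steps above can be sketched in pure Python; the three samples here are hypothetical:

```python
# One-way ANOVA following the steps above: SSB between samples,
# SSW within samples, and the F ratio of their mean squares.

def one_way_anova(groups):
    n = sum(len(g) for g in groups)            # total observations
    k = len(groups)                            # number of samples
    grand_mean = sum(sum(g) for g in groups) / n
    # Step 1: sum of squares between the samples
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Step 2: sum of squares within the samples
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    # Step 4: F ratio of the mean squares
    msb = ssb / (k - 1)
    msw = ssw / (n - k)
    return msb / msw

# Hypothetical data: three groups of six measurements each
groups = [[6, 8, 4, 5, 3, 4], [8, 12, 9, 11, 6, 8], [13, 9, 11, 8, 7, 12]]
F = one_way_anova(groups)   # compare with the tabulated F at (k-1, n-k) DF
```

The decision rule of step 5 then compares the returned F with the tabulated critical value at k-1 and n-k degrees of freedom.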

An analysis of variance to detect a difference in three or more population means first requires obtaining some summary statistics for calculating variance of a set of data as shown below:              


  1. ΣX² is called the crude sum of squares
  2. (ΣX)² / N is the CM (correction for the mean), or CF (correction factor)
  3. ΣX² – (ΣX)² / N is termed SS (total sum of squares, or corrected SS).

  4. In the one-way ANOVA, the total variation in the data has two parts: the variation among treatment means and the variation within treatments.
  5. The  grand average GM = ΣX/N
  6. The total SS (Total SS) is then:
    Total SS = Σ(Xi – GM)², where Xi is any individual measurement.
  7. Total SS = SST + SSE Where SST = treatment sum of squares and SSE is the experimental error sum of squares.
  8. SST is the sum of the squared deviations of each treatment average from the grand average or grand mean.
  9. SSE is the sum of the squared deviations of each individual observation within a treatment from the treatment average. For the ANOVA calculations:
  10. Total Treatment CM: Σ(TCM) = Σ(T²/n), where T is the total of each treatment and n is the number of observations in that treatment
  11. SST = Σ(TCM) – CM
  12. SSE = Total SS – SST (Always obtained by difference)
  13. Total DF = N – 1 (Total Degrees of Freedom)
  14. TDF = K – 1 (Treatment DF = Number of treatments minus 1)
  15. EDF = (N – 1) – (K – 1) = N – K (Error DF, always obtained by difference)
  16. MST = SST/TDF = SST/(K-1) (Mean Square Treatments)
  17. MSE = SSE/EDF = SSE/(N-K) (Mean Square Error). To test the null hypothesis:
  18. H0 : μ1 = μ2 = μ3………… = μk            H1 : At least one mean different
  19. F = MST/MSE         When F > Fα , reject H0

Example: As an example of a comparison of three means, consider a single factor experiment: The following coded results were obtained from a single factor randomized experiment, in which the outputs of three machines were compared. Determine if there is a significant difference in the results (α = 0.05).
ΣX=30    N=15           Total DF=N-1=15-1=14
GM = ΣX/N = 30/15 = 2.0
ΣX² = 222                    CM=(ΣX)²/N=(30)²/15 =60
Total SS = ΣX² – CM = 222 – 60 = 162
Σ(TCM) = 197.2
SST = Σ(TCM) – CM =197.2 – 60 = 137.2 and
SSE = Total SS – SST = 162 – 137.2 = 24.8

The completed ANOVA table is:

Since the computed value of F (33.2) exceeds the critical value of F, the null hypothesis is rejected. Thus, there is evidence that a real difference exists among the machine means.
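The table entries above can be reproduced directly from the quoted summary statistics; a sketch:

```python
# Rebuilding the machine-comparison ANOVA table from the sums above:
# ΣX = 30, ΣX² = 222, Σ(TCM) = 197.2, N = 15 observations, K = 3 machines.
N, K = 15, 3
sum_x, sum_x2, sum_tcm = 30, 222, 197.2

CM = sum_x ** 2 / N          # correction for the mean = 60
total_ss = sum_x2 - CM       # total sum of squares = 162
SST = sum_tcm - CM           # treatment sum of squares = 137.2
SSE = total_ss - SST         # error sum of squares = 24.8
MST = SST / (K - 1)          # mean square treatments
MSE = SSE / (N - K)          # mean square error
F = MST / MSE                # ~33.2, exceeding the critical F
```
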
σe is the pooled standard deviation of within-treatment variation. It can also be considered the process capability sigma of individual measurements. It is the variation within measurements that would still remain if the differences among treatment means were eliminated.

EXAMPLE: The bursting strengths of diaphragms were determined in an experiment. Use analysis of variance techniques to determine if there is a difference at a level of 0.05.

The origination of these data could similarly be measurements from

  • Parts manufactured by 7 different operators
  •  Parts manufactured on 7 different machines
  • Time for purchase order requests from 7 different sites
  • Delivery time of 7 different suppliers

An analysis of variance tests the hypothesis for equality of treatment means, or it tests that the treatment effects are zero, which is expressed as

H0 : μ1 = μ2 = μ3………… = μk            H1 : At least one mean different

This analysis indicates that rejection of the null hypothesis is appropriate because the p-value is lower than 0.05. The probability values for the test of homogeneity of variances indicate that there is not enough information to reject the null hypothesis of equality of variances. No pattern or outlier data are apparent in either the “residuals versus order of the data” or “residuals versus fitted values” plots. The normal probability plot and histogram indicate that the residuals may not be normally distributed. Perhaps a transformation of the data could improve this fit; however, it is doubtful that any difference would be large enough to be of practical importance.


The  analysis of variance  indicated that there was a significant difference in the bursting strengths of seven different types of rubber diaphragms (k = 7). We will now determine which diaphragms differ from the grand mean. A data summary of the mean and variance for each rubber type, each having four observations (n = 4), is

The overall mean is

The pooled estimate for the standard deviation is

The number of degrees of freedom is (n – 1)k = (4 – 1)(7) = 21. For a significance level of 0.05 with 7 means and 21 degrees of freedom, it is determined by interpolation from the table below that h0.05 = 2.94. The upper and lower decision lines are then


It will be seen that the two-way analysis procedure is an extension of the patterns described in the one-way analysis. Recall that a one-way ANOVA has two components of variance: Treatments and experimental error (may be referred to as columns and error or rows and error). In the two-way ANOVA, there are three components of variance: Factor A treatments, Factor B treatments, and experimental error (may be referred to as columns, rows, and errors).

In a two-way analysis of variance, the treatments constitute different levels affected by more than one factor. For example, sales of car parts, in addition to being affected by the point of sale display, might also be affected by the price charged, the location of the store, and the number of competitive products. When two independent factors have an effect on the dependent factor, analysis of variance can be used to test for the effects of two factors simultaneously. Two sets of hypotheses are tested with the same data at the same time.
Suppose there are k populations that are from a normal distribution with unknown parameters. A random sample X1, X2, X3, …, Xk is taken from these populations, which satisfy the assumptions. The null hypothesis is that all population means are equal against the alternative that the members of at least one pair are not equal. The hypotheses follow:
H0 : μ1 = μ2 = μ3………… = μk
HA : Not all means μj are equal.

If the population means are equal, each population effect is equal to zero. The test hypotheses for the second factor are:

H0 : β1 = β2 = β3………… = βk
HA : Not all βj are equal.

  1. Calculate variance between the rows
    The variance between rows measures the difference between the sample mean of each row and the overall mean. It also measures the difference from one row to another. The sum of squares between the rows is denoted by SSR. For calculating variance between the rows, take the total of the square of the deviations of the means of various sample rows from the grand average and divide this total by the degree of freedom, r-1 , where r= no. of rows.
  2. Calculate variance between the columns
    The variance between columns measures the difference between the sample mean of each column and the overall mean. It also measures the difference from one column to another. The sum of squares between the columns is denoted by SSC. For calculating variance between the columns, take the total of the square of the deviations of the means of various sample columns from the grand average and divide this total by the degrees of freedom, c-1, where c = no. of columns.
  3. Calculate the total variance
    The total variance measures the overall variation in the sample mean. The total sum of squares of variation is denoted by SST. The total variation is calculated by taking the squared deviation of each item from the grand average and dividing this total by the degrees of freedom, n-1, where n = total number of observations.
  4.  Calculate the variance due to error
    The variance due to error, or residual variance, in the experiment is due to chance variation. It occurs when there is some error in taking observations or making calculations, or sometimes due to lack of information about the data. The sum of squares due to error is denoted by SSE. It is calculated as:
    Error Sum of Squares = Total Sum of Squares – Sum of Squares between Columns – Sum of Squares Between Rows.
    The degree of freedom, in this case, will be (c-1)(r-1).
  5. Calculate the F Ratio
    It measures the ratio of the between-column variance and the between-row variance to the variance due to error.
    F = ( Variance between the Columns ) / ( Variance due to Error ) = MSC / MSE
    F = ( Variance between the Rows ) / ( Variance due to Error ) = MSR / MSE
    where MSC = SSC/(c-1), MSR = SSR/(r-1), and MSE = SSE/[(c-1)(r-1)]
  6.  Decision Rule At a given level of significance α=0.05 and at the appropriate numerator (c-1 or r-1) and denominator (c-1)(r-1) degrees of freedom, the value of F is tabulated from the table. On comparing the values, if the calculated value is greater than the tabulated value, reject the null hypothesis. This means that the test is significant, or there is a significant difference between the sample means.

Example: Three different subjects were taught by two different instructors, each to three different students, with the following results. The responses are examination results as a percentage. The null hypothesis: instructor and subject means do not differ.

ΣX=1190    N=18           Total DF=N-1=18-1=17
GM = ΣX/N = 1190/18 = 66.11
ΣX² = 81844                    CM=(ΣX)²/N=(1190)²/18 =78672.22
Total SS = ΣX² – CM = 81844 – 78672.22 = 3171.78
ColSq = column total squared and divided by the no. of observations in the column
RowSq = row total squared and divided by the no. of observations in the row

SSCol = ΣColSq – CM = 79544.67 – 78672.22 = 872.44
SSRow = ΣRowSq – CM = 80677.78 – 78672.22 = 2005.56
SSE = Total SS – SSCol – SSRow = 3171.78 – 872.44 – 2005.56 = 293.78
The next step is to construct the ANOVA table.


If no interaction: Col DF=Col-1 =3-1=2 Row DF=Row-1=2-1 =1
ErrorDF=Total DF-Col DF-Row DF=17-2-1=14
Col F = MSCol/MSE = 436.22/20.98 = 20.79. This is larger than the critical F = 3.74. Therefore, the null hypothesis of equal subject means is rejected.

Row F = MSRow/MSE = 2005.56/20.98 = 95.59. This is larger than critical F = 4.60. Therefore, the null hypothesis of equal instructor means is rejected.
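Both F ratios follow directly from the sums of squares computed above; a sketch:

```python
# Two-way ANOVA F ratios from the sums of squares quoted above:
# SSCol = 872.44 (DF 2), SSRow = 2005.56 (DF 1), SSE = 293.78 (DF 14).
ss_col, df_col = 872.44, 2
ss_row, df_row = 2005.56, 1
ss_err, df_err = 293.78, 14

ms_col = ss_col / df_col     # 436.22
ms_row = ss_row / df_row     # 2005.56
ms_err = ss_err / df_err     # ~20.98

F_col = ms_col / ms_err      # ~20.79, vs critical F = 3.74
F_row = ms_row / ms_err      # ~95.6, vs critical F = 4.60
```
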

The difference between total sigma (13.66) and error sigma (4.58) is due to the significant difference in instructor means and subject means. If the instructor and subject differences were due only to chance causes, the sigma variation in the data would be equal to σe, the square root of the mean square error.

It should be noted that, in the example above, the data was listed in six cells, that is, six experimental combinations. There were also 3 replications (students) in each cell (k = 3). When k is greater than 1 in a two-factor ANOVA, there is the opportunity to analyze for a possible interaction between the two factors.

Interaction effect:

A similar analysis pattern is noted here. The data in each cell is summed, and that total is divided by the number of observations in that cell.

CellSq = (Sum of cell)²/k            InterSq = Σ(CellSq)
SSInter = InterSq – CM – SSCol – SSRow

For the sum of squares interaction (SSInter), it is not enough to just subtract the correction for the mean (CM), as was done to determine the main effects of SSCol and SSRow. This is because the data in replicated cells is affected by the treatment levels of the two factors of which it is a part, as well as a possible interaction effect. To net out the interaction effect, it is necessary to also subtract the sum of squares of the column and row factors previously calculated. The cell-by-cell calculations are shown below

Σ(CellSq) = (264)²/3 + (202)²/3 + (224)²/3 + (186)²/3 + (146)²/3 + (168)²/3
Σ(CellSq) = 23232 + 13601.33 + 16725.33 + 11532 + 7105.33 + 9408
Σ(CellSq) = 81604

SSInter = 81604 – 78672.22 – 872.44 – 2005.56 = 53.78
SSError = TotSS – SSCol – SSRow – SSInter
SSError = 3171.78 – 872.44 – 2005.56 – 53.78 = 240
The null hypothesis for the interaction effect is that there is no interaction. See the revised ANOVA table below:
With interaction: Replications per cell = k = 3
Col DF = Col – 1 = 3 – 1 = 2    Row DF = Row – 1 = 2 – 1 = 1
Inter DF = (Col – 1)(Row – 1) = (3 -1)(2 – 1) = 2
Error DF = Total DF -Col DF – Row DF – Inter DF =17 -2 -1- 2 =12
The interaction calculated F (1.34) is less than the critical F (3.89). The null hypothesis of no interaction is not rejected. There is an advantage in analyzing for possible interaction if the opportunity exists. The more effects that are significant, the greater the amount of total variation which is explained and the smaller the MS error (unexplained variation). As the MS error is the divisor in the F ratio, a smaller MS error increases the sensitivity of testing effects.
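The interaction test can be sketched from the sums of squares computed above:

```python
# Interaction F test: SSInter = 53.78 with 2 DF against
# SSError = 240 with 12 DF (values computed above).
ms_inter = 53.78 / 2           # mean square interaction = 26.89
ms_error = 240 / 12            # mean square error = 20.0
F_inter = ms_inter / ms_error  # ~1.34, below the critical F = 3.89,
                               # so no interaction is detected
```
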

Components of Variance

The analysis of variance can be extended with a determination of the COV (components of variance). The COV table uses the MS (mean square), F, and F(alpha) columns from the previous ANOVA table and adds columns for EMS (expected mean square), variance, adjusted variance, and percent contribution to design data variation. The model for the ANOVA is Xijk = μ + Mi + Ij + MIij + εijk. The model states that any measurement (X) represents the combined effect of the population mean (μ), the different subjects (M), the different instructors (I), the subject/instructor interaction (MI), and the experimental error (ε), where i represents subjects at 3 levels, j represents instructors at 2 levels, and k represents 3 replications per cell.
The variance coefficients are equal to the number of values used in calculating the respective MS: subject coef = k x Rows = 3 x 2 = 6, instructor coef = k x Cols = 3 x 3 = 9, interaction coef = k = 3. The general variance equation is given by:
Effect Variance = (MS Effect – MS Error)/(Variance Coefficient)
M Var = (436.22 – 20)/6 = 69.37

 I Var = (2005.56 – 20)/9 = 220.62
MI Var = (26.89 – 20)/3 = 2.30

  Error Var = 20
Subject differences are significant and account for 22.21% of the variation in the data. Instructor differences are significant and account for 70.65% of the variation in the data. The subject/instructor interaction is not significant and shows a negligible contribution. Experimental error accounts for only 6.40% of the total variation. The reason for the adjusted variance column is that variance calculations can be negative when the mean square effect is less than the mean square error; negative variance estimates are considered to have a value of 0. Knowing the percent contribution aids in establishing priorities when taking improvement actions.
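The percent contributions above follow from the general variance equation; a sketch using the mean squares from the example:

```python
# Components of variance from the example's mean squares (MS error = 20;
# variance coefficients: subject 6, instructor 9, interaction 3).
ms_error = 20.0
effects = {                      # effect: (mean square, variance coefficient)
    "subject":     (436.22, 6),
    "instructor":  (2005.56, 9),
    "interaction": (26.89, 3),
}
# Effect variance = (MS effect - MS error) / coefficient, floored at 0
variances = {name: max((ms - ms_error) / coef, 0.0)
             for name, (ms, coef) in effects.items()}
variances["error"] = ms_error
total = sum(variances.values())
percent = {name: 100 * v / total for name, v in variances.items()}
# subject ~22.2%, instructor ~70.6%, interaction negligible, error ~6.4%
```
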

EXAMPLE: A battery is to be used within a device that is subjected to extreme temperature variations. At some point during development, an engineer can select only one of three plate material types. After product shipment the engineer has no control over temperature; however, he/she believes that temperature could degrade the effective life of the battery. The engineer would like to determine if one of the material types is robust to temperature variations. The table below describes the observed effective life (hours) of this battery at controlled temperatures within a laboratory.

Using an α = 0.05 criterion, we conclude that there is a significant interaction between material types and temperature because its probability value is less than 0.05 [and F0 > (F0.05,4,27 = 2.73)]. We also conclude that the main effects of material type and temperature are significant because each of their probabilities is less than 0.05 [and F0 > (F0.05,2,27 = 3.35)].

A plot of the average response at each factor level is shown in Figure above, which aids the interpretation of experimental results. The significance of the interaction term in our model is shown as the lack of parallelism of these lines. From this plot, we note a degradation in life with an increase in temperature regardless of material type. If it is desirable for this battery to experience less loss of life at elevated temperature, type 3 material seems to be the best choice of the three materials. Whenever there is a difference in the rows’ or columns’ means, it can be beneficial to make additional comparisons. This analysis shows these differences; however, the significance of the interaction can obscure comparison tests. One approach to address this situation is to apply the test at only one level of a factor at a time.
Using this strategy, let us examine the data for significant differences at 70°F (i.e., level 2 of temperature). We can use ANOM techniques to gain insights into factor levels relative to the grand mean. The ANOM output shown in the figure below indicates that material types 1 and 3 are different from the grand mean.

Tukey’s multiple comparison test shown below indicates that for a temperature level of 70°F the mean battery life of material types 2 and 3 cannot be shown to differ. In addition, the mean battery life for material type 1 is significantly lower than that of both material types 2 and 3.

The coefficient of determination (R2) can help describe the amount of variability in battery life explained by battery material, temperature, and the interaction of the material with temperature. From the analysis of variance output, we note

SSmodel = SSmaterial + SStemperature + SSinteraction
= 10,683 + 39,118 + 9613
= 59,414

which results in

From this, we conclude that about 77% of the variability is described by our model factors. The adequacy of the underlying model should be checked before the adoption of conclusions. The figure below gives a normal plot of the residuals and a plot of residuals versus the fitted values for the analysis of variance. The normal probability plot of the residuals does not reveal anything of particular concern. The plot of residuals versus fitted values seems to indicate a mild tendency for the variance of the residuals to increase as battery life increases. The residual plots of battery type and temperature seem to indicate that material type 1 and low temperature might have more variability. However, these problems do not, in general, appear to be large enough to have a dramatic impact on the analysis and conclusions.

ANOVA Table for an A x B Factorial Experiment

In a factorial experiment involving factor A at a levels and factor B at b levels, the total sum of squares can be partitioned into:
Total SS = SS(A) + SS(B) + SS(AB) + SSE

ANOVA Table for a Randomized Block Design

The randomized block design implies the presence of two independent variables, blocks and treatments. The total sum of squares of the response measurements can be partitioned into three parts: the sum of squares for the blocks, the treatments, and the error. The analysis of a randomized block design is less complex than that of an A x B factorial experiment.
Goodness-of-Fit Tests

GOF (goodness-of-fit) tests are part of a class of procedures that are structured in cells. In each cell there is an observed frequency (Fo). From the nature of the problem, one either knows the expected or theoretical frequency (Fe) or can calculate it. Chi-square (χ²) is then summed across all cells according to the formula:
χ² = Σ (Fo – Fe)² / Fe
The calculated chi-square is then compared to the chi-square critical value for the appropriate degrees of freedom:

Uniform Distribution (GOF):

Example: Is a game die balanced? The null hypothesis, H0, states the die is honest and balanced. When a die is rolled, the expectation is that each side should come up an equal number of times. It is obvious there will be random departures from this theoretical expectation if the die is honest. A die was tossed 48 times with the following results:

The calculated chi-square is 8.75. The critical chi-square χ²0.05,5 = 11.07. The calculated chi-square does not exceed the critical chi-square. Therefore, the hypothesis of an honest die cannot be rejected. The random departures from theoretical expectation could well be explained by chance cause.
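The same computation can be sketched in pure Python; since the example's table of toss results is not reproduced here, the observed counts below are hypothetical:

```python
# Chi-square goodness-of-fit for a die, mirroring the example above.
# The observed counts are hypothetical (48 tosses in total).
observed = [4, 9, 7, 12, 6, 10]
expected = sum(observed) / len(observed)   # 8 per face under H0
chi_sq = sum((fo - expected) ** 2 / expected for fo in observed)
# Compare with the critical value chi-square(0.05, 5 DF) = 11.07:
balanced = chi_sq < 11.07                  # fail to reject H0 if True
```
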

Normal Distribution (GOF):

Example: The following data (105 observations) is taken from an X̄-R chart. There is sufficient data for ten cells; the alternative would be six cells, which are too few. Twelve integer cells fit the range of the data. The null hypothesis: the data was obtained from a normal distribution.

X̄ = 15.4, sigma = 1.54, number of effective cells = 6, DF = 3 and χ²0.05,3 = 7.81

One degree of freedom is lost because X̄ estimates μ. The second degree of freedom is lost because SD estimates sigma. The third degree of freedom is lost because the sample N represents the population.

  • Col A: The cell boundaries are one half unit from the cell midpoint.
  • Col B: The cell middle values are integers.
  • Col C: The observed frequencies in each cell are Fo.
  • Col D: Distances from X̄ are measured from cell boundaries.
  • Col E: Distances from X̄ are divided by SD to transform distances into z units.
  • Col F: z units are converted into cumulative normal distribution probabilities.
  • Col G: The theoretical probability in each cell is obtained by taking the difference between cumulative probabilities in Column F. The top cell theoretical  probability boundary is 1.0000.
  • Col H: The theoretical frequency in each cell is the product of N and Column G.
  • Col I: Each cell is required to have a theoretical frequency equal to or greater than four. Therefore, the top four cells must be added to the cell whose midpoint is 18. The bottom three cells must be added to the cell whose midpoint is 13. Thus, there are six effective cells, all of which have a theoretical frequency equal to or greater than four.
  • Col J: The observed frequency cells must be pooled to match the theoretical frequency cells. It does not matter if the observed frequencies are less than four.
  • Col K: The contributions to chi-square are obtained by squaring the difference between Column I and Column J and dividing by Column I.

Conclusion: Since the calculated chi-square, 6.057, is less than the critical chi-square, 7.81, we fail to reject the null hypothesis of normality, and therefore, conclude that the data is from a normal distribution.

Poisson Distribution (GOF)

Example: The bead drum is an attribute variable, random sample generating device, which was used to obtain the following data. In this exercise red beads represent defects. Seventy-five constant size samples were obtained. The goodness-of-fit test is analyzed based on sample statistics. The null hypothesis is that the bead drum samples represent a Poisson distribution.
N = 75
Sample Avg = 269/75 = 3.59
DF = 7 – 2 = 5
χ20.05,5 = 11.07
One degree of freedom is lost because the sample average (3.59) estimates μ. The second degree of freedom is lost because N (number of samples) represents the population.

  • Col A: Values of c which matched the actual distribution of sample defects found.
  • Col B: The probability that c defects would occur given the average value of the samples.
  • Col C: The theoretical number of defects that would occur (N x Col B).
  • Col D: The observed frequency of each number of defects.
  • Col E: The required minimum frequency of four for each effective cell resulted in pooling at both tails of the theoretical Poisson distribution.
  • Col F: The observed frequency distribution of defects must also be pooled to match the effective theoretical distribution.
  • Col G: The contributions to chi square are obtained from squaring the difference between Fe and Fo and dividing the result by Fe.
  • Col H: Total defects found result from the product of number of defects and observed frequency.

Conclusion: Since the calculated chi-square of 4.47 is less than the critical chi-square value of 11.07 at the 95% confidence level, we fail to reject the null hypothesis that the bead drum samples represent a Poisson distribution.
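The theoretical cell frequencies of Col B and Col C can be sketched from the sample statistics quoted above:

```python
# Theoretical Poisson cell frequencies, using the sample average 3.59
# and N = 75 samples (pure Python, no statistics library assumed).
from math import exp, factorial

mean, N = 3.59, 75

def poisson_p(c):
    # P(c defects) for a Poisson distribution with the given mean
    return exp(-mean) * mean ** c / factorial(c)

expected = {c: N * poisson_p(c) for c in range(8)}
# Cells with expected frequency below four (here c = 0 and the upper
# tail) would be pooled before computing chi-square, as described above.
```
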

Binomial Distribution (GOF)

Example: The null hypothesis states that the following industrial sample data comes from a binomial population of defectives (N = 80). In this case, we will estimate the probability of a defective from the sample data, p = 0.025625.
One degree of freedom is lost because the total sample frequency represents the population. The second degree of freedom is lost because the sample fraction defective is used to estimate the population parameter:

  • Col A: The range of defectives matching the observed sample data.
  • Col B: The probability of observed cell defective count given sample size N and d.
  • Col C: The expected theoretical frequency (cell probability)(N).
  • Col D: The observed cell frequency count from the 80 samples.
  • Col E: Theoretical frequency with cells pooled to meet n = 4 minimum.
  • Col F: Observed cell frequency pooled to match theoretical frequency pooled cells.
  • Col G: Contributions to chi square (Fe – Fo)2/Fe.
  • Col H: The count of defectives by cell (d)(Fo).
    Conclusion: The calculated chi square = 13.30. The critical chi square = 9.49. Since the calculated value is greater than the critical value, the null hypothesis that the sample data represents the binomial distribution is rejected at the 95% confidence level.

Contingency Tables

A two-way classification table (rows and columns) containing original frequencies can be analyzed to determine whether the two variables (classifications) are independent or have significant association. R. A. Fisher determined that when the marginal totals (of rows and columns) are analyzed in a certain way, the chi-square procedure will test whether there is a dependency between the two classifications. In addition, a contingency coefficient (correlation) can be calculated. If the chi-square test shows a significant dependency, the contingency coefficient shows the strength of the correlation. It often happens that results obtained in samples do not agree exactly with the theoretically expected results according to the rules of probability. A measure of the difference found between observed and expected frequencies is supplied by the statistic chi-square, χ², where:
χ² = Σ (Fo – Fe)² / Fe
If χ² = 0, the observed and theoretical frequencies agree exactly. If χ² > 0, they do not agree exactly. The larger the value of χ², the greater the discrepancy between observed and theoretical frequencies. The chi-square distribution is an appropriate reference distribution for critical values when the expected frequencies are at least equal to 5.
Example: The calculation for the E (expected or theoretical) frequency will be demonstrated in the following example. Five hospitals tried a new drug to alleviate the symptoms of emphysema. The results were classified at three levels: no change, slight improvement, marked improvement. The percentage matrix is shown in the table below. While the results expressed as percentages do suggest differences among hospitals, ratios presented as percentages can be misleading.
A proper analysis requires that original data be considered as frequency counts. The table below lists the original data on which the percentages are based. The calculation of expected, or theoretical, frequencies is based on the marginal totals. The marginal totals for the frequency data are the column totals, the row totals, and the grand total. The null hypothesis is that all hospitals have the same proportions over the three levels of classifications. To calculate the expected frequencies for each of the 15 cells under the null hypothesis requires the manipulation of the marginal totals, as illustrated by the following calculation for one cell. Consider the count of 15 for the Hospital A / no change cell. The expected value, E, is:
E = (row total × column total) / grand total

The same procedure repeated for the other 14 cells yields

Each of these 15 cells makes a contribution to chi-square (χ²). For the same selected (illustrative) cell, the contribution is (Fo – Fe)²/Fe; the total chi-square is the sum of these contributions over all 15 cells.

Assume alpha to be 0.01. The degrees of freedom for contingency tables is: d.f. = (rows – 1) x (columns -1).

For this example: d.f. = (5 – 1) x (3 – 1) = 8

The critical chi-square: χ²0.01,8 = 20.09
The calculated chi-square is larger than the critical chi-square. Therefore, one rejects the null hypothesis of hospital equality of results. The alternative hypothesis is that hospitals differ.
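The expected-frequency and chi-square arithmetic described above can be sketched generically; the counts below are hypothetical, not the hospital data:

```python
# Expected frequencies and chi-square for a contingency table built from
# its marginal totals. The 2 x 3 table of counts is hypothetical.
table = [[15, 30, 15],
         [25, 25, 10]]                     # observed frequency counts
row_tot = [sum(row) for row in table]
col_tot = [sum(col) for col in zip(*table)]
grand = sum(row_tot)

chi_sq = 0.0
for i, row in enumerate(table):
    for j, fo in enumerate(row):
        fe = row_tot[i] * col_tot[j] / grand   # E = (row tot)(col tot)/N
        chi_sq += (fo - fe) ** 2 / fe
# DF = (rows - 1)(columns - 1) = 2; compare chi_sq with the critical value
```
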

Coefficient of Contingency (C)

The degree of relationship, association, or dependence of the classifications in a contingency table is measured by the coefficient of contingency, where N equals the grand frequency total.
The contingency coefficient is:

C = √(χ2/(χ2 + N))

The maximum value of C is never greater than 1.0 and depends on the total number of rows and columns. For the example data, the maximum coefficient of contingency is:

Cmax = √((k – 1)/k)     where k = min(r, c), r = rows, and c = columns
There is a Yates correction for continuity test that can be performed when the contingency table has exactly two columns and two rows. That is, the degrees of freedom is equal to 1.

Correlation of Attributes

Contingency table classifications often describe characteristics of objects or individuals. Thus, they are often referred to as attributes, and the degree of dependence, association, or relationship is called correlation of attributes. For square (k = r = c) tables, the correlation coefficient, φ, is defined as:

φ = √(χ2/(N(k – 1)))

The value of φ falls between 0 and 1. If the calculated value of chi-square is significant, then φ is significant. In the above example, the numbers of rows and columns are not equal, so the correlation calculation is not applied.


Hypothesis Testing

Hypothesis testing helps an organization determine whether making a change to a process input (x) significantly changes the output (y) of the process. It statistically determines if there are differences between two or more process outputs. Hypothesis testing is used to help determine if the variation between groups of data is due to true differences between the groups or is the result of common-cause variation, the natural variation in a process.

This tool is most commonly used in the Analyze step of the DMAIC method to determine if different levels of a discrete process setting (x) result in significant differences in the output (y). An example would be “Do different regions of the country have different defect levels?” This tool is also used in the Improve step of the DMAIC method to prove a statistically significant difference in “before” and “after” data. It identifies whether a particular discrete x has an effect on the y and checks for the statistical significance of differences. In other words, it helps determine if the difference observed between groups is bigger than what would be expected from common-cause variation alone. The test gives a p-value, which is the probability that a difference as big as the one observed would arise from common-cause variation alone. It can be used to compare two or more groups of data, such as “before” and “after” data.

Hypothesis testing assists in using sample data to make decisions about population parameters such as averages, standard deviations, and proportions. Testing a hypothesis using statistical methods is equivalent to making an educated guess based on the probabilities associated with being correct. When an organization makes a decision based on a statistical test of a hypothesis, it can never know for sure whether the decision is right or wrong, because of sampling variation. Regardless of how many times the same population is sampled, it will never yield the same sample mean, sample standard deviation, or sample proportion. The real question is whether the differences observed are the result of changes in the population, or the result of sampling variation. Statistical tests are used because they have been designed to minimize the number of times an organization can make the wrong decision. There are two basic types of errors that can be made in a statistical test of a hypothesis:

  1. A conclusion that the population has changed when in fact it has not.
  2. A conclusion that the population has not changed when in fact it has.

The first error is referred to as a type I error. The second error is referred to as a type II error. The probability associated with making a type I error is called alpha (α) or the α risk. The probability of making a type II error is called beta (β) or the β risk. If the α risk is 0.05, any determination from a statistical test that the population has changed runs a 5% risk that it really has not changed. There is a 1 – α, or 0.95, confidence that the right decision was made in stating that the population has changed. If the β risk is 0.10, any determination from a statistical test that there is no change in the population runs a 10% risk that there really may have been a change. There would be a 1 – β, or 0.90, “power of the test,” which is the ability of the test to detect a change in the population. A 5% α risk and a 10% β risk are typical thresholds for the risk one should be willing to take when making decisions utilizing statistical tests. Based upon the consequence of making a wrong decision, it is up to the Black Belt to determine the risk he or she wants to establish for any given test, in particular the α risk. β risk, on the other hand, is usually determined by the following:

  • δ: The difference the organization wants to detect between the two population parameters. Holding all other factors constant, as the δ increases, the β decreases.
  • σ: The average (pooled) standard deviation of the two populations. Holding all other factors constant, as the σ decreases, the β decreases.
  • n: The number of samples in each data set. Holding all other factors constant, as the n increases, the β decreases.
  • α: The alpha risk or decision criteria. Holding all other factors constant, as the α decreases, the β increases.

Most statistical software packages will have programs that help determine the proper sample size, n, to detect a specific δ, given a certain σ and defined α and β risks.


How does an organization know if a new population parameter is different from an old population parameter? Conceptually, all hypothesis tests are the same in that a signal (δ)-to-noise (σ) ratio is calculated (δ/σ) based on the before and after data. This ratio is converted into a probability, called the p-value, which is compared to the decision criteria, the α risk. Comparing the p-value (which is the actual α of the test) to the decision criteria (the stated α risk) will help determine whether to state the system has or has not changed.
Unfortunately, a decision in a hypothesis test can never conclusively be defined as a correct decision. All the hypothesis test can do is minimize the risk of making a wrong decision. Conducting a hypothesis test is analogous to a prosecuting attorney trying a case in a court of law. The objective of the prosecuting attorney is to collect and present enough evidence to prove beyond a reasonable doubt that a defendant is guilty. If the attorney has not done so, then the jury will assume that not enough evidence has been presented to prove guilt; therefore, they will conclude the defendant is not guilty. Similarly, if one wants to make a change to an input (x) in an existing process to achieve a specified improvement in the output (y), he or she will need to collect data after the change in x to demonstrate beyond some criteria (the α risk) that the specified improvement in y was achieved.

The following steps describe how to conduct a hypothesis test:

  1.  Define the problem or issue to be studied.
  2.  Define the objective.
  3. State the null hypothesis, identified as H0.
    The null hypothesis is a statement of no difference between the before and after states (similar to a defendant being not guilty in court).
    H0: μbefore = μafter
    The goal of the test is to either reject or not reject H0.
  4. State the alternative hypothesis, identified as Ha.
    • The alternative hypothesis is what one is trying to prove and can be one of the following:
    • Ha: μbefore ≠ μafter (a two-sided test)
    • Ha: μbefore < μafter (a one-sided test)
    • Ha: μbefore > μafter (a one-sided test)
    • The alternative chosen depends on what one is trying to prove. In a two-sided test, it is important to detect differences from the hypothesized mean, μbefore, that lie on either side of μbefore. The α risk in a two-sided test is split on both sides of the histogram. In a one-sided test, it is only important to detect a difference on one side or the other.
  5. Determine the practical difference (δ).
    The practical difference is the meaningful difference the hypothesis test should detect.
  6. Establish the α and β risks for the test.
  7. Determine the number of samples needed to obtain the desired β risk. Remember that the power of the test is (1-β).
  8. Collect the samples and conduct the test to determine a p-value.
    Use a software package to analyze the data and determine a p-value.
  9. Compare the p-value to the decision criteria (α risk) and determine whether to reject H0 in favor of Ha, or not to reject H0.
    • If the p-value is less than the α risk, then reject H0 in favor Ha.
    • If the p-value is greater than the α risk, there is not enough evidence to reject H0.
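The steps above can be sketched in Python for the simplest case, a one-sample Z test. The data values, helper names, and α below are hypothetical and purely illustrative:

```python
import math

def normal_cdf(z):
    # Standard normal CDF expressed via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_test_p_value(x_bar, mu0, sigma, n, tail="two"):
    """p-value for a one-sample Z test; tail is 'two', 'left', or 'right'."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    if tail == "left":
        return normal_cdf(z)
    if tail == "right":
        return 1.0 - normal_cdf(z)
    return 2.0 * (1.0 - normal_cdf(abs(z)))

# Steps 8-9: compute the p-value and compare it to the alpha risk
alpha = 0.05
p = z_test_p_value(x_bar=10.4, mu0=10.0, sigma=1.0, n=30, tail="two")
decision = "reject H0" if p < alpha else "fail to reject H0"
print(f"p = {p:.4f} -> {decision}")
```

In practice a statistical software package performs this computation, but the decision rule is exactly the comparison in the last two lines.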

Depending on the population parameter of interest, there are different types of hypothesis tests; these types are described in the following table. The table is divided into two sections: parametric and non-parametric. Parametric tests are used when the underlying distribution of the data is known or can be assumed (e.g., the data used for t-testing should follow the normal distribution). Non-parametric tests are used when there is no assumption of a specific underlying distribution of the data.

Terminology used in Hypothesis Testing

A number of commonly used hypothesis test terms are presented below.

  1. Null Hypothesis

    This is the hypothesis to be tested. The null hypothesis directly stems from the problem statement and is denoted as H0. Examples:

    • If one is investigating whether a modified seed will result in a different yield/acre, the null hypothesis (two-tail) would assume the yields to be the same H0: Ya = Yb.
    •  If a strong claim is made that the average of process A is greater than the average of process B, the null hypothesis (one-tail) would state that process A ≤ process B. This is written as H0: A ≤ B.

    The procedure employed in testing a hypothesis is strikingly similar to a court trial. The hypothesis is that the defendant is presumed not guilty until proven guilty. However, the term innocent does not apply to a null hypothesis. A null hypothesis can only be rejected or fail to be rejected; it cannot be accepted, because a lack of evidence to reject it does not prove it true. If the means of two populations are different, the null hypothesis of equality can be rejected if enough data is collected. When rejecting the null hypothesis, the alternate hypothesis must be accepted.

  2. Test Statistic

    In order to test a null hypothesis, a test calculation must be made from sample information. This calculated value is called a test statistic and is compared to an appropriate critical value. A decision can then be made to reject or not reject the null hypothesis.

  3. Types of Errors

    When formulating a conclusion regarding a population based on observations from a small sample, two types of errors are possible:

    • Type I error: This error occurs when the null hypothesis is rejected when it is, in fact, true. The probability of making a type I error is called α (alpha) and is commonly referred to as the producer’s risk (in sampling). Examples are:
      incoming products are good but called bad; a process change is thought to be different when, in fact, there is no difference.
    • Type II error: This error occurs when the null hypothesis is not rejected when it should be rejected. This error is called the consumer’s risk (in sampling) , and is denoted by the symbol β (beta). Examples are: incoming products are bad, but called good; an adverse process change has occurred but is thought to be no different.

    The degree of risk (α) is normally chosen by the concerned parties (α is normally taken as 5%) in arriving at the critical value of the test statistic. The assumption is that a small value for α is desirable. Unfortunately, a small α risk increases the β risk. For a fixed sample size, α and β are inversely related. Increasing the sample size can reduce both the α and β risks.

    Any test of hypothesis has a risk associated with it, and one is generally concerned with the α risk (a type I error, which rejects the null hypothesis when it is true). The level of this α risk determines the level of confidence (1 – α) that one has in the conclusion. This risk factor is used to determine the critical value of the test statistic, which is compared to a calculated value.

  4. One-Tail Test

    If a null hypothesis is established to test whether a sample value is smaller or larger than a population value, then the entire α risk is placed on one end of a distribution curve. This constitutes a one-tail test. Examples:

    • A study was conducted to determine if the mean battery life produced by a new method is greater than the present battery life of 35 hours. In this case, the entire α risk will be placed on the right tail of the existing life distribution curve.
      H0: new ≤ present                   H1: new > present
      Determine if the true mean is within the α critical region.
    • A chemist is studying the vitamin levels in a brand of cereal to determine if the process level has fallen below 20% of the minimum daily requirement. It is the manufacturer’s intent to never average below the 20% level. A one-tail test would be applied in this case, with the entire α risk on the left tail.
      H0: level ≥ 20%                   H1: level < 20%
      Determine if the true mean is within the α critical region.
  5. Two-Tail Test

    If a null hypothesis is established to test whether a population shift has occurred, in either direction, then a two-tail test is required. The allowable α error is generally divided into two equal parts. Examples:

    • An economist must determine if unemployment levels have changed significantly over the past year.
    • A study is made to determine if the salary levels of company A differ significantly from those of company B.

    H0: levels are =                                                     H1: levels are ≠
    Determine if the true mean is within either the upper or lower α critical regions.

  6. Practical Significance vs. Statistical Significance

    The hypothesis is tested to determine if a claim has significant statistical merit. Traditionally, levels of 5% or 1% are used for the critical significance values. If the calculated test statistic has a p-value below the critical level, then it is deemed to be statistically significant. More stringent critical values may be required when human injury or catastrophic loss is involved. Less stringent critical values may be advantageous when there are no such risks and the potential economic gain is high. On occasion, an issue of practical versus statistical significance may arise; that is, some hypothesis or claim is found to be statistically significant but may not be worth the effort or expense to implement. This could occur if a large sample was tested, such as a diet that results in a net loss of 0.5 pounds for 10,000 people. The result is statistically significant, but a diet losing 0.5 pounds per person would not have any practical significance. Issues of practical significance often arise when the sample size is not appropriately chosen; a power analysis may be needed to aid in the decision-making process.

  7. Power of the Test (H0: μ = μ0)

    Consider a null hypothesis that a population is believed to have mean μ0 = 70.0 and σx = 0.80. The 95% confidence limits are 70 ± (1.96)(0.8) = 71.57 and 68.43. One accepts the hypothesis μ = 70 if sample means (X-bar) fall between these limits. The alpha risk is the risk that sample means will exceed those limits. One can ask “what if” questions such as, “If μ shifts to 71, would it be detected?” There is a risk that the null hypothesis would be accepted even if the shift occurred. This risk is termed β. The value of β is large if μ is close to μ0 and small if μ is very different from μ0. This indicates that slight differences from the hypothesis will be difficult to detect and large differences will be easier to detect. The normal distribution curves below show the null and alternative hypotheses. If the process shifts from 70 to 71, there is a 76% probability that it would not be detected.

    To construct a power curve, 1 – β is plotted against alternative values of μ. The power curve for the process under discussion is shown below. A shift in a mean away from the null increases the probability of detection. In general, as alpha increases, beta decreases and the power of 1 – β increases. One can say that a gain in power can be obtained by accepting a lower level of protection from the alpha error. Increasing the sample size makes it possible to decrease both alpha and beta and increase power.
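The β risk for this example can be checked with a short sketch. The acceptance limits follow the 70 ± (1.96)(0.8) computation above, and the normal CDF is built from the error function in the standard library:

```python
import math

def normal_cdf(z):
    # Standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Acceptance limits for H0: mu = 70 with sigma_xbar = 0.8 and alpha = 0.05
mu0, sigma_xbar, z_crit = 70.0, 0.8, 1.96
lower = mu0 - z_crit * sigma_xbar   # about 68.43
upper = mu0 + z_crit * sigma_xbar   # about 71.57

def beta(mu_true):
    # Probability the sample mean falls inside the acceptance limits
    # when the true mean is mu_true (a type II error)
    return (normal_cdf((upper - mu_true) / sigma_xbar)
            - normal_cdf((lower - mu_true) / sigma_xbar))

b = beta(71.0)
print(f"beta = {b:.2f}, power = {1 - b:.2f}")
```

For a shift to μ = 71 this reproduces the roughly 76% chance of non-detection quoted above; evaluating beta() over a range of μ values traces out the power curve.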


    The concept of power also relates to experimental design and analysis of variance.
    The following equation briefly states the relationship for ANOVA.
    1 – β = P(Reject H0 /H0 is false)
    1 – β = Probability of rejecting the null hypothesis given that the null hypothesis is false.

  8. Sample Size

    In the statistical inference discussion thus far, it has been assumed that the sample size (n) for hypothesis testing has been given and that the critical value of the test statistic will be determined based on the α error that can be tolerated. The ideal procedure, however, is to determine the α and β error desired and then to calculate the sample size necessary to obtain the desired decision confidence.

    The sample size (n) needed for hypothesis testing depends on:

    • The desired type I (α) and type II (β) risk
    • The minimum value to be detected between the population means (μ – μ0)
    • The variation in the characteristic being measured (S or σ)

    Variable data sample size, using only α, is illustrated by the following example. Assume that in a pilot process one wishes to determine whether an operational adjustment will alter the process hourly mean yield by as much as 4 tons per hour. What is the minimum sample size that, at the 95% confidence level (Z = 1.96), would confirm the significance of a mean shift greater than 4 tons per hour? Historic information suggests that the standard deviation of the hourly output is 20 tons. The general sample size equation for variable data (normal distribution) is:

    n = (Zσ/E)2, where E is the mean shift to be detected, giving n = ((1.96)(20)/4)2 = 96.04 ≈ 96

    Obtain 96 pilot hourly yield values and determine the hourly average. If this mean deviates by more than 4 tons from the previous hourly average, a significant change at the 95% confidence level has occurred. If the sample mean deviates by less than 4 tons/hr, the observable mean shift can be explained by chance cause.
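The sample-size arithmetic above can be verified with a short sketch:

```python
# Sample size to detect a mean shift E with known sigma at a given Z level
Z, sigma, E = 1.96, 20.0, 4.0

n = (Z * sigma / E) ** 2
print(f"n = {n:.2f} -> use {round(n)} samples")
```

Increasing the shift to be detected, or reducing the process standard deviation, shrinks the required sample size quadratically.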

    For binomial data, a common form of the sample size formula is:

    n = (Z/E)2 p(1 – p)

    where p is the expected proportion and E is the precision with which it must be estimated.

  9. Estimators

    In analyzing sample values to arrive at population probabilities, two major estimators are used: point estimation and interval estimation. For example, consider the following tensile strength readings from 4 piano wire segments: 28.7, 27.9, 29.2, and 26.5 psi. Based on these data, the following statements are true:

    1. Point estimation: If a single estimate value is desired (i.e., the sample average), then a point estimate can be obtained. Here, X-bar = 28.08 psi is the point estimate for the population mean.
    2. Interval Estimate or CI (Confidence Interval): From sample data one can calculate the interval within which the population mean is predicted to fall. Confidence intervals are always estimated for population parameters and, in general, are derived from the mean and standard deviation of sample data. For small samples, a critical value from the t distribution is required; for 95% confidence with n – 1 = 3 degrees of freedom, t = 3.182. The CI equation and interval are:

      X̄ ± t(s/√n) = 28.08 ± 3.182(1.18/√4) = 28.08 ± 1.88, or 26.20 psi to 29.96 psi

      If the population sigma is known (say σ = 2 psi), the Z distribution is used. The critical Z value for 95% confidence is 1.96. The CI equation and interval are:

      X̄ ± Z(σ/√n) = 28.08 ± 1.96(2/√4) = 28.08 ± 1.96, or 26.12 psi to 30.04 psi

      A confidence interval is a two-tail event and requires critical values based on an α/2 risk in each tail. Other confidence interval formulas exist, including those for percent nonconforming, Poisson distribution data, and very small sample sizes.
  1. Confidence Intervals for the Mean

    1. Continuous Data – Large Samples 

      Use the normal distribution to calculate the confidence interval for the mean:

      X̄ ± Z(σ/√n)

      Example: The average of 100 samples is 18 with a population standard deviation of 6. Calculate the 95% confidence interval for the population mean.

      18 ± 1.96(6/√100) = 18 ± 1.18, or 16.82 to 19.18

    2. Continuous Data – Small Samples

      If a relatively small sample (n < 30) is used, the t distribution must be applied:

      X̄ ± t(s/√n)

      Example: Use the same values as in the prior example, except that the sample size is 25. With d.f. = 24, t0.025 = 2.064:

      18 ± 2.064(6/√25) = 18 ± 2.48, or 15.52 to 20.48
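Both mean confidence intervals can be sketched as small helper functions; the critical values 1.96 and 2.064 are taken from Z and t tables rather than computed:

```python
import math

def ci_mean_z(x_bar, sigma, n, z=1.96):
    # Large-sample CI for the mean: population sigma known (or n > 30)
    half = z * sigma / math.sqrt(n)
    return x_bar - half, x_bar + half

def ci_mean_t(x_bar, s, n, t):
    # Small-sample CI for the mean: t looked up for n - 1 degrees of freedom
    half = t * s / math.sqrt(n)
    return x_bar - half, x_bar + half

lo_z, hi_z = ci_mean_z(18, 6, 100)          # large-sample example
lo_t, hi_t = ci_mean_t(18, 6, 25, 2.064)    # small-sample example, t(0.025, 24)
print(f"Z interval: {lo_z:.2f} to {hi_z:.2f}")
print(f"t interval: {lo_t:.2f} to {hi_t:.2f}")
```

Note how the smaller sample size and the larger t critical value both widen the interval.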

  2. Confidence Intervals for Variation

    The confidence intervals for the mean were symmetrical about the average. This is not true for the variance, since it is based on the chi square distribution. The formula is:

    (n – 1)s2/χ2α/2 ≤ σ2 ≤ (n – 1)s2/χ21–α/2

    Example: The sample variance for a set of 25 samples was found to be 36. Calculate the 90% confidence interval for the population variance. With d.f. = 24, χ20.05 = 36.42 and χ20.95 = 13.85:

    (24)(36)/36.42 ≤ σ2 ≤ (24)(36)/13.85, or 23.7 ≤ σ2 ≤ 62.4
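A sketch of the variance-interval arithmetic, with the chi-square critical values taken from a table (d.f. = 24):

```python
# 90% CI for the population variance from 25 samples with s^2 = 36.
chi2_upper_tail = 36.42   # chi-square(0.05, 24), from a table
chi2_lower_tail = 13.85   # chi-square(0.95, 24), from a table
n, s_sq = 25, 36.0

lower = (n - 1) * s_sq / chi2_upper_tail
upper = (n - 1) * s_sq / chi2_lower_tail
print(f"{lower:.1f} <= sigma^2 <= {upper:.1f}")
```

The asymmetry of the interval about s² = 36 reflects the skew of the chi-square distribution.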

  3. Confidence Intervals for Proportion

    For large sample sizes, with np and n(1 – p) greater than or equal to 4 or 5, the normal distribution can be used to calculate a confidence interval for a proportion. The following formula is used:

    p̂ ± Z√(p̂(1 – p̂)/n)

    Example: If 16 defectives were found in a sample size of 200 units, calculate the 90% confidence interval for the proportion. Here p̂ = 16/200 = 0.08 and Z0.05 = 1.645:

    0.08 ± 1.645√((0.08)(0.92)/200) = 0.08 ± 0.032, or 0.048 to 0.112
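The proportion interval can be sketched as:

```python
import math

def ci_proportion(defectives, n, z):
    # Normal-approximation CI for a proportion (valid when np and
    # n(1 - p) are both at least 4 or 5)
    p = defectives / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

lo, hi = ci_proportion(16, 200, 1.645)   # 90% confidence
print(f"{lo:.3f} to {hi:.3f}")
```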

Hypotheses Tests for Comparing  Single Population

We begin by considering hypothesis tests to compare parameters of a single population, such as the mean μ, standard deviation σ, and fraction defective p, to specified values. For example, viscosity may be an important characteristic in a process validation experiment, and we may want to determine whether the population standard deviation of viscosity is less than a certain value. Additional examples of such comparisons are suggested by the following questions.

  1. Is the process centered on target? Is the measurement bias acceptable?
  2. Is the measurement standard deviation less than 5% of the specification width? Is the process standard deviation less than 10% of the specification width?
  3. Let p denote the proportion of objects in a population that possess a certain property such as products that exceed a certain hardness, or cars that are domestically manufactured. Is this proportion p greater than a certain specified value?

Comparing Mean (Variance Known)

  1. Z Test

    When the population follows a normal distribution and the population standard deviation, σx, is known, then the hypothesis tests for comparing a population mean, μ, with a fixed value, μ0, are given by the following:

    • H0: μ = μ0                  H1: μ ≠ μ0
    • H0: μ ≤ μ0                  H1: μ> μ0
    • H0: μ ≥ μ0                  H1: μ < μ0

    The null hypothesis is denoted by H0 and the alternative hypothesis is denoted by H1. The test statistic is given by:

    Z = (X̄ – μ0)/(σx/√n)

    where the sample average is X̄, the number of samples is n, and the population standard deviation is σx. Note that if n > 30, the sample standard deviation, s, is often used as an estimate of the population standard deviation, σx. The test statistic, Z, is compared with a critical value, Zα or Zα/2, which is based on a significance level, α, for a one-tailed test or α/2 for a two-tailed test. If the H1 sign is ≠, it is a two-tailed test. If the H1 sign is >, it is a right, one-tailed test, and if the H1 sign is <, it is a left, one-tailed test.
    Example: The average vial height from an injection molding process has been 5.00″ with a standard deviation of 0.12″. An experiment is conducted using new material which yielded the following vial heights: 5.10″, 4.90″, 4.92″, 4.87″, 5.09″, 4.89″, 4.95″, and 4.88″. Can one state with 95% confidence that the new material is producing shorter vials with the existing molding machine setup? This question involves an inference about a population mean with a known sigma. The Z test applies. The null and alternative hypotheses are:

    H0: μ ≥ μ0                  H1: μ < μ0

    H0: μ ≥ 5.00″                  H1: μ <5.00″

    The sample average is X̄ = 4.95″ with n = 8, and the population standard deviation is σx = 0.12″. The test statistic is:

    Z = (4.95 – 5.00)/(0.12/√8) = –1.18

    Since the H1 sign is <, it is a left, one-tailed test, and with 95% confidence the level of significance is α = 1 – 0.95 = 0.05. Looking up the critical value in a normal distribution or Z table, one finds Z0.05 = –1.645. Since the test statistic, –1.18, does not fall in the reject (or critical) region, the null hypothesis cannot be rejected. There is insufficient evidence to conclude that the vials made with the new material are shorter.
    If the test statistic had been, for example, –1.85, we would have rejected the null hypothesis and concluded that the vials made with the new material are shorter.

  2. Student’s t Test

    This technique was developed by W. S. Gosset and published in 1908 under the pen name “Student.” Gosset referred to the quantity under study as t, and the test has since been known as Student’s t test. The Student’s t distribution applies to samples drawn from a normally distributed population. It is used for making inferences about a population mean when the population variance, σ2, is unknown and the sample size, n, is small. The use of the t distribution is never wrong for any sample size; however, a sample size of 30 is normally the crossover point between the t and Z tests. The test statistic formula is:

    t = (X̄ – μ0)/(s/√n)

    The null and alternative hypotheses are the same as those given for the Z test. The test statistic, t, is compared with a critical value, tα or tα/2, which is based on a significance level, α, for a one-tailed test or α/2 for a two-tailed test, and the number of degrees of freedom, d.f. The degrees of freedom is determined by the number of samples, n: d.f. = n – 1

    Example: The average daily yield of a chemical process has been 880 tons (μ = 880 tons). A new process has been evaluated for 25 days (n = 25) with a yield of 900 tons (X-bar) and sample standard deviation, s = 20 tons. Can one say with 95% confidence that the process has changed?

    The null and alternative hypotheses are:

    H0: μ = μ0                  H1: μ ≠ μ0

    H0: μ = 880 tons                  H1: μ ≠ 880 tons

    The test statistic calculation is:

    t = (900 – 880)/(20/√25) = 20/4 = 5

    Since the H1 sign is ≠, it is a two-tailed test, and with 95% confidence the level of significance is α = 1 – 0.95 = 0.05. Since it is a two-tailed test, α/2 is used to determine the critical values. The degrees of freedom d.f. = n – 1 = 24. Looking up the critical values in a t distribution table, one finds t0.025 = –2.064 and t0.975 = 2.064. Since the test statistic, 5, falls in the right-hand reject (or critical) region, the null hypothesis is rejected. We conclude with 95% confidence that the process has changed.


One underlying assumption is that the sampled population has a normal probability distribution. This is a restrictive assumption since the distribution of the sample is unknown. The t distribution works well for distributions that are bell-shaped.
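The yield t test reduces to a few lines; the critical value 2.064 comes from a t table for d.f. = 24:

```python
import math

# One-sample t test for the chemical-yield example:
# mu0 = 880 tons, n = 25 days, x_bar = 900 tons, s = 20 tons
mu0, n, x_bar, s = 880.0, 25, 900.0, 20.0

t = (x_bar - mu0) / (s / math.sqrt(n))
print(f"t = {t:.1f}")

# Two-tailed test at alpha = 0.05 with d.f. = 24: t(0.025, 24) = 2.064
reject = abs(t) > 2.064
print("reject H0" if reject else "fail to reject H0")
```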

Comparing Standard Deviations/ Variance

Chi Square (χ2) Test

Standard deviation (or variance) is fundamental in making inferences regarding the population mean. In many practical situations, variance (σ2) assumes a position of greater importance than the population mean. Consider the following examples:

  1. A shoe manufacturer wishes to develop a new sole material with a more stable wear pattern. The wear variation in the new material must be smaller than the variation in the existing material.
  2. An aircraft altimeter manufacturer wishes to compare the measurement precision among several instruments.
  3. Several inspectors examine finished parts at the end of a manufacturing process. Even when the same lots are examined by different inspectors, the number of defectives varies. Their supervisor wants to know if there is a significant difference in the knowledge or abilities of the inspectors.

The above problems represent a comparison of a target or population variance with an observed sample variance, a comparison between several sample variances, or a comparison between frequency proportions. The standardized test used is the chi square (χ2) test. Population variances are distributed according to the chi square distribution; therefore, inferences about a single population variance are based on chi square. The chi square test is widely used in two applications:
Case I. Comparing variances when the variance of the population is known.
Case II. Comparing observed and expected frequencies of test outcomes when there is no defined population variance (attribute data).
When the population follows a normal distribution, the hypothesis tests for comparing a population variance, σx2, with a fixed value, σ02, are given by the following:

  • H0: σx2 = σ02                  H1: σx2 ≠ σ02
  • H0: σx2 ≤ σ02                  H1: σx2> σ02
  • H0: σx2 ≥ σ02                  H1: σx2 < σ02

The null hypothesis is denoted by H0 and the alternative hypothesis is denoted by H1. The test statistic is given by:

χ2 = (n – 1)s2/σ02

where the number of samples is n and the sample variance is s2. The test statistic, χ2, is compared with a critical value, χα2 or χα/22, which is based on a significance level, α, for a one-tailed test or α/2 for a two-tailed test, and the number of degrees of freedom: d.f. = n – 1

If the H1 sign is≠, it is a two-tailed test. If the H1 sign is >, it is a right, one-tailed test, and if the H1 sign is <, it is a left, one-tailed test.

Please note, unlike the Z and t distributions, the tails of the chi square distribution are non-symmetrical.

  • Chi square Case I. Comparing Variances When the Variance of the Population Is Known.

    Example: The R & D department of a steel plant has tried to develop a new steel alloy with less tensile variability. The R & D department claims that the new material will show a four sigma tensile variation less than or equal to 60 psi 95% of the time. An eight-sample test yielded a standard deviation of 8 psi. Can a reduction in tensile strength variation be validated with 95% confidence?

    Solution: The best range of variation expected is 60 psi. This translates to a sigma of 15 psi (an approximate 4 sigma spread covering 95.44% of occurrences).

    H0: σx2 ≥ σ02                  H1: σx2 < σ02

    H0: σx2 ≥ (15)2                  H1: σx2 < (15)2

    From the chi square table: because s is less than σ0, this is a left-tail test with d.f. = n – 1 = 7. The critical value for 95% confidence is 2.17; that is, the calculated value will be less than 2.17 only 5% of the time. Please note that if one were looking for more variability in the process, a right-tail rejection region would have been selected and the critical value would be 14.07.
    The calculated statistic is:

    χ2 = (n – 1)s2/σ02 = (7)(8)2/(15)2 = 1.99

    Since 1.99 is less than 2.17, the null hypothesis must be rejected. The decreased variation in the new steel alloy tensile strength supports the R & D claim.
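The test statistic for this example can be checked with a short sketch; the critical value 2.17 comes from a chi-square table for d.f. = 7:

```python
# Chi-square test statistic for the steel-alloy variance example:
# n = 8 samples, s = 8 psi, sigma0 = 15 psi
n, s, sigma0 = 8, 8.0, 15.0

chi_sq = (n - 1) * s**2 / sigma0**2
print(f"chi-square = {chi_sq:.2f}")

# Left-tail test at alpha = 0.05 with d.f. = 7: critical value 2.17
reject = chi_sq < 2.17
print("reject H0" if reject else "fail to reject H0")
```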

  • Chi square Case II. Comparing Observed and Expected Frequencies of Test Outcomes (Attribute Data)

    It is often necessary to compare proportions representing various process conditions. Machines may be compared as to their ability to produce precise parts. The ability of inspectors to identify defective products can be evaluated. This application of chi square is called the contingency table or row and column analysis.
    The procedure is as follows:

    1. Take one subgroup from each of the various processes and determine the observed frequencies (O) for the various conditions being compared.
    2. Calculate for each condition the expected frequencies (E) under the assumption that no differences exist among the processes.
    3. Compare the observed and expected frequencies to obtain “reality.” The following calculation is made for each condition:

       (O – E)2/E

    4. Total all the process conditions: χ2 = Σ(O – E)2/E
    5. A critical value is determined using the chi square table with the entire level of significance, α, in the one-tail, right side, of the distribution. The degrees of freedom are determined from the calculation (R-1)(C-1) [the number of rows minus 1 times the number of columns minus 1].
    6. A comparison between the test statistic and the critical value confirms if a significant difference exists (at a selected confidence level).

    Example: An airport authority wanted to evaluate the ability of three X-ray inspectors to detect key items. A test was devised whereby transistor radios were placed in ninety pieces of luggage. Each inspector was exposed to exactly thirty of the preselected and “bugged” items in a random fashion. The observed results are summarized below. Is there any significant difference in the abilities of the inspectors? (95% confidence)
    Null hypothesis:
    There is no difference among the three inspectors, H0: p1 = p2 = p3
    Alternative hypothesis:
    At least one of the proportions is different, H1: p1 ≠ p2 ≠ p3
    The degrees of freedom = (rows – 1)(columns – 1) = (2-1)(3-1) = 2
    The critical value of χ2 for DF = 2 and α = 0.05 in the one-tail, right side of the distribution, is 5.99. There is only a 5% chance that the calculated value of χ2 will exceed 5.99.


    χ2 = 0.220 + 0.004 + 0.289 + 1.019 + 0.020 + 1.333 = 2.89

Since the calculated value of χ2 is less than the previously calculated critical value of 5.99 and this is a right tail test, the null hypothesis cannot be rejected. There is insufficient evidence to say with 95% confidence that the abilities of the inspectors differ.
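The row-and-column procedure above can be coded directly. The inspector counts below (detected/missed out of 30 items each) are hypothetical values chosen to be consistent with the term-by-term calculation quoted in the text; the original table is not reproduced here.

```python
# Contingency-table (row-by-column) chi-square statistic.
def chi_square_contingency(table):
    rows, cols = len(table), len(table[0])
    row_tot = [sum(r) for r in table]
    col_tot = [sum(table[i][j] for i in range(rows)) for j in range(cols)]
    grand = sum(row_tot)
    stat = 0.0
    for i in range(rows):
        for j in range(cols):
            expected = row_tot[i] * col_tot[j] / grand
            stat += (table[i][j] - expected) ** 2 / expected  # (O - E)^2 / E
    df = (rows - 1) * (cols - 1)
    return stat, df

observed = [[27, 25, 22],   # detected (hypothetical counts)
            [ 3,  5,  8]]   # missed   (hypothetical counts)
stat, df = chi_square_contingency(observed)
print(df, round(stat, 2), stat < 5.99)  # compare with the 95% critical value
```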

Comparing Proportions

p Test

When testing a claim about a population proportion, with a fixed number of independent trials having constant probabilities, and each trial having two outcome possibilities (a binomial experiment), a p test can be used. When np < 5 or n(1-p) < 5, the binomial distribution itself is used to test hypotheses relating to proportion.
If the conditions np ≥ 5 and n(1-p) ≥ 5 are met, then the binomial distribution of sample proportions can be approximated by a normal distribution. The hypothesis tests for comparing a sample proportion, p, with a fixed value, p0, are given by the following:

  • H0: p = p0                  H1: p ≠ p0
  • H0: p ≤ p0                  H1: p> p0
  • H0: p ≥ p0                  H1: p < p0

The null hypothesis is denoted by H0 and the alternative hypothesis is denoted by H1. The test statistic is given by:

Z = (p̂ – p0)/√(p0(1 – p0)/n),   where p̂ = x/n

where the number of successes is x and the number of samples is n. The test statistic, Z, is compared with a critical value Zα or Zα/2, which is based on a significance level, α, for a one-tailed test or α/2 for a two-tailed test. If the H1 sign is >, it is a right, one-tailed test, and if the H1 sign is <, it is a left, one-tailed test.

Example. A local newspaper stated that less than 10% of the rental properties did not allow renters with children. The city council conducted a random sample of 100 units and found 13 units that excluded children. Is the newspaper statement wrong based upon this data? In this case H0: p ≤ 0.1 and H1: p > 0.1, with p0 = 0.1, and the computed Z value is

Z = (0.13 – 0.10)/√((0.1)(0.9)/100) = 0.03/0.03 = 1.0

For α = 0.05, the critical value is Z = 1.64. Since 1.0 < 1.64, the newspaper statement cannot be rejected based upon this data at the 95% level of confidence.
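The p test in this example can be verified with a short sketch:

```python
import math

# One-proportion z test for the rental-property example:
# H0: p <= 0.10, H1: p > 0.10, with x = 13 exclusions in n = 100 units.
def proportion_z(x, n, p0):
    p_hat = x / n
    return (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

z = proportion_z(13, 100, 0.10)
print(round(z, 2), z > 1.64)  # prints 1.0 False -> cannot reject H0
```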

Hypothesis Tests for Comparing Two Populations

Here we consider hypothesis tests that compare parameters of two populations with each other. For example, we may want to know if, after a process change, the process is different from the way it was before the change. The data after the change constitute one population to be compared with the data prior to the change, which constitute the other population. Some specific comparative questions are: Has the process mean changed? Has the process variability been reduced? If the collected data are discrete, such as defectives and nondefectives, has the percent defective changed?

Comparing Two Means (Variance Known)

Z Test.

The following test applies when we want to compare two population means and the variance of each population is either known or the sample size is large (n > 30). Let μ1, n1, x̄1, and σ1 denote the population mean, sample size, sample average, and population standard deviation for the first population, and let μ2, n2, x̄2, and σ2 represent the same quantities for the second population. The hypotheses being compared are H0: μ1 = μ2 and H1: μ1 ≠ μ2. Under the null hypothesis, the difference x̄1 – x̄2 is normally distributed with mean zero and standard deviation √(σ12/n1 + σ22/n2).

Therefore, the test statistic

Z = (x̄1 – x̄2)/√(σ12/n1 + σ22/n2)

has a standard normal distribution. If the computed value of Z exceeds the critical value, the null hypothesis is rejected.

Example. We want to determine whether the tensile strength of products from two suppliers are the same. Thirty samples were tested from each supplier with the following results:  and Z =

The Z value for  α= 0.001 is 3.27; hence, the two means are different
with 99.9% confidence.
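A minimal sketch of the two-sample Z statistic; the supplier tensile-strength data is not reproduced in the text, so the summary numbers below are hypothetical.

```python
import math

# Two-sample Z statistic (variances known, or both samples large, n > 30).
def two_sample_z(x1, s1, n1, x2, s2, n2):
    return (x1 - x2) / math.sqrt(s1**2 / n1 + s2**2 / n2)

# Hypothetical summary statistics for two suppliers (psi):
z = two_sample_z(x1=3250.0, s1=60.0, n1=30, x2=3190.0, s2=70.0, n2=30)
print(round(z, 2))  # compare with the critical Z for the chosen alpha
```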

Comparing Two Means (Variance Unknown but Equal)

Independent t-Test.

This test is used to compare two population means when the sample sizes are small and the population variances are unknown but may be assumed to be equal. In this situation, a pooled estimate of the standard deviation is used to conduct the t-test. Prior to using this test, it is necessary to demonstrate that the two variances are not different, which can be done by using the F-test. The hypotheses being compared are H0: μ1 = μ2 and H1: μ1 ≠ μ2. A pooled estimate of variance is obtained by weighting the two variances in proportion to their degrees of freedom as follows:

Spooled2 = [(n1 – 1)S12 + (n2 – 1)S22]/(n1 + n2 – 2)

The test statistic

t = (x̄1 – x̄2)/(Spooled√(1/n1 + 1/n2))

has a tn1+n2–2 distribution. If the computed value of t exceeds the critical value, H0 is rejected and the difference is said to be statistically significant.

Example. The following results were obtained in comparing surface soil pH at two different locations:

Do the two locations have the same pH?
Assuming that the two variances are equal, we first obtain a pooled estimate of variance:

Spooled = 0.24. Then the t statistic is computed.
For a two-sided test with α = 0.05 and (n1 + n2 – 2) = 18 degrees of freedom, the critical value of t is t0.025,18 = 2.1. Since the computed value of t exceeds the critical value 2.1, the hypothesis that the two locations have the same pH is rejected.
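The pooled t procedure can be sketched as follows; the soil-pH measurements are not reproduced in the text, so the two small samples below are hypothetical.

```python
import math
from statistics import mean, stdev

# Pooled (equal-variance) two-sample t test.
def pooled_t(sample1, sample2):
    n1, n2 = len(sample1), len(sample2)
    s1, s2 = stdev(sample1), stdev(sample2)
    # Pooled standard deviation, weighting by degrees of freedom:
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    t = (mean(sample1) - mean(sample2)) / (sp * math.sqrt(1/n1 + 1/n2))
    return t, n1 + n2 - 2

a = [6.4, 6.6, 6.5, 6.7, 6.3]   # hypothetical pH readings, location 1
b = [6.9, 7.0, 6.8, 7.1, 6.9]   # hypothetical pH readings, location 2
t, df = pooled_t(a, b)
print(df, round(t, 2))  # compare |t| with the critical t for df degrees of freedom
```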

Comparing Two Means (Variance Unknown and Unequal)

Independent t-Test.

This test is used to compare two population means when the sample sizes are small (n < 30), the variances are unknown, and the two population variances are not equal, which should first be demonstrated by conducting the F-test to compare the two variances. The hypotheses being compared are H0: μ1 = μ2 and H1: μ1 ≠ μ2. The test statistic t and the degrees of freedom ν are

t = (x̄1 – x̄2)/√(S12/n1 + S22/n2)

ν = (S12/n1 + S22/n2)2 / [(S12/n1)2/(n1 – 1) + (S22/n2)2/(n2 – 1)]

If the computed t exceeds the critical t, the null hypothesis is rejected.
Example. The following data were obtained on the life of light bulbs made by two manufacturers:

Is there a difference in the mean life of light bulbs made by the two manufacturers? Assume that the F-test shows that the two standard deviations are not equal. The computed t and ν are:

For α= 0.05, tα/2,ν = 2.16. Since the computed value exceeds the critical t value, we are 95% sure that the mean life of the light bulbs from the two manufacturers is different.
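A sketch of the unequal-variance (Welch) t statistic and its degrees of freedom; the light-bulb data is not reproduced, so the summary statistics below are hypothetical.

```python
import math

# Welch's t statistic and degrees of freedom for unequal variances.
def welch_t(x1, s1, n1, x2, s2, n2):
    v1, v2 = s1**2 / n1, s2**2 / n2
    t = (x1 - x2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom:
    nu = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, nu

# Hypothetical bulb-life summaries (hours) for two manufacturers:
t, nu = welch_t(x1=1120.0, s1=75.0, n1=10, x2=1030.0, s2=40.0, n2=12)
print(round(t, 2), round(nu, 1))
```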

Comparing Two Means (Paired t-test)

This test is used to compare two population means when there is a physical reason to pair the data and the two sample sizes are equal. A paired test is more sensitive in detecting differences when the population standard deviation is large. The hypotheses being compared are H0: μ1 = μ2 and H1: μ1 ≠ μ2. The test statistic, with n – 1 degrees of freedom (where n is the number of pairs), is

t = dbar/(sd/√n)

where:

d = difference between each pair of values
dbar = observed mean difference
sd = standard deviation of d

Example. Two operators conducted simultaneous measurements on
percentage of ammonia in a plant gas on nine successive days to find the extent of bias in their measurements. Since the day-to-day differences in gas composition were larger than the expected bias, the tests were designed to permit paired comparison.


For α= 0.05, t0.025,8 = 2.31. Since the computed t value is less than the critical t value, the results do not conclusively indicate that a bias exists.
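The paired t calculation can be sketched as follows; the ammonia readings are not reproduced, so the paired data below are hypothetical.

```python
import math
from statistics import mean, stdev

# Paired t statistic, t = dbar / (s_d / sqrt(n)), df = n - 1.
def paired_t(sample1, sample2):
    d = [a - b for a, b in zip(sample1, sample2)]  # per-day differences
    n = len(d)
    return mean(d) / (stdev(d) / math.sqrt(n)), n - 1

# Hypothetical paired readings (% ammonia) from two operators over nine days:
op1 = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.0]
op2 = [12.0, 11.9, 12.3, 12.1, 11.8, 12.2, 12.3, 11.6, 12.0]
t, df = paired_t(op1, op2)
print(df, round(t, 2))  # compare |t| with t(0.025, df)
```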

Comparing Two Standard Deviations


F Test.

This test is used to compare two standard deviations and applies for all sample sizes. The hypotheses being compared are H0: σ1 = σ2 and H1: σ1 ≠ σ2.

The ratio S12/S22 follows the F distribution, which is a skewed distribution and is characterized by the degrees of freedom used to estimate S1 and S2, called the numerator degrees of freedom (n1 – 1) and denominator degrees of freedom (n2 – 1), respectively. Under the null hypothesis, the F statistic becomes F = S12/S22.

In calculating the F ratio, the larger variance is in the numerator, so that the calculated value of F is greater than one. If the computed value of F exceeds the critical value Fα/2,n1–1,n2–1, the null hypothesis is rejected.


Since the calculated F value is in the critical region, the null hypothesis is rejected. There is sufficient evidence to indicate a reduced variation and more consistency of strength after aging for 1 year.
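The F ratio itself is a one-line calculation; the sample standard deviations below are hypothetical.

```python
# F ratio for comparing two standard deviations; the larger variance goes
# in the numerator so that F >= 1.
def f_ratio(s1, s2):
    v1, v2 = s1**2, s2**2
    return v1 / v2 if v1 >= v2 else v2 / v1

F = f_ratio(s1=8.0, s2=4.0)   # hypothetical before/after standard deviations
print(F)  # 4.0 -- compare with the critical F for the two sample sizes
```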


Regression Analysis

y = f (x) Formula

The y = f(x) formula is used to determine which factors in your process (as indicated by a measure) you can change to improve the CTQs (Critical to Quality characteristics) and, ultimately, the key business measures. It illustrates the causal relationship among the key business measures (designated as Y), the process outputs directly affecting the Y’s (designated as CTQ or y), and the factors directly affecting the process outputs (designated as x). It enables members of your improvement team to communicate the team’s findings to others in a simple format, and it highlights the factors the team wants to change and what impact the change will have. It also provides a matrix that can be used in the Control step of the DMAIC method for ongoing monitoring of the process after the team’s improvement work is complete. Many people understand the concept of y = f(x) from mathematical education. The x, y, Y matrix is based on this concept. If it confuses team members to use these letters, simply use the terms key business measures, CTQ or process outputs, and causal factors instead; they represent the same concepts.

Gather the key business measures for your project (either from the team charter or by checking with your sponsor). Gather the CTQs that the improvement team selects as the most important for your project. List the key business measure and the CTQ operational definition in a matrix. As your team progresses through the Measure and Analyze steps of the DMAIC method, add the causal-factor definitions (x’s) you discover.

Guidelines for Filling Out an x, y, Y Matrix

A sample x,y, Y Matrix




Correlation is used to determine the strength of the linear relationship between two process variables. It allows the comparison of an input to an output, two inputs against each other, or two outputs against each other. Correlation measures the degree of association between two continuous variables. However, even if there is a high degree of correlation, this tool does not establish causation. For example, the number of skiing accidents in Colorado is highly correlated with sales of warm clothing, but buying warm clothes did not cause the accidents. Correlation can be analyzed by calculating the Pearson product moment correlation coefficient (r). This coefficient is calculated as follows:

r = Σ(xi – x̄)(yi – ȳ)/[(n – 1)SxSy]

Where Sx and Sy are the sample standard deviations. The resulting value will be a number between -1 and +1. The higher the absolute value of r, the stronger the correlation. A value of zero means there is no correlation. A strong correlation is characterized by a tight distribution of plotted pairs about a best-fit line. It should be noted that correlation does not measure the slope of the best-fit line; it measures how close the data are to the best-fit line. A negative r implies that as one variable (x2) increases, the other variable (x1) decreases.


A positive r implies that as one variable (x3) increases, the other variable (x1) also increases.


A strong relationship other than linear can exist, yet r can be close to zero.
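The Pearson coefficient can be computed directly from its definition; the paired data below are hypothetical.

```python
from statistics import mean, stdev

# Pearson product-moment correlation r, computed from its definition:
# r = cov(x, y) / (Sx * Sy), with the (n - 1) sample covariance.
def pearson_r(xs, ys):
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    cov = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)
    return cov / (stdev(xs) * stdev(ys))

x = [1, 2, 3, 4, 5]   # hypothetical input values
y = [2, 4, 5, 4, 6]   # hypothetical output values
print(round(pearson_r(x, y), 3))  # a value near +1 means strong positive correlation
```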


Regression measures the strength of association between independent factor(s) (also called predictor variables or regressors) and a dependent variable (also called a response variable). For simple or multiple linear regression, the dependent variable must be a continuous variable. Predictor variables can be continuous or discrete, but must be independent of one another. Discrete variables may be coded as discrete levels (dummy variables (0, 1) or effects coding (-1, +1)). Regression is used to investigate suspected correlations by generating an equation that quantifies the relationship. It explains the relationship through an equation for a line, curve, or surface. It explains the variation in y values and helps to predict the impact of controlling a process variable (x). It helps to predict future process performance for certain values of x. It also helps to identify the vital few x’s that drive y, and it helps you to manipulate process conditions to generate desirable results (if x is controllable) and/or avoid undesirable results.
For linear regressions (i.e., when the relationship is defined by a line), the regression equation is represented as y = a0 + a1x, where a0 = intercept (i.e., the point where the line crosses x = 0) and a1 = slope (i.e., rise over run, or the change in y per unit increase in x).

  • Simple linear regression relates a single x to a y. It has a single regressor (x) variable and its model is linear with respect to coefficients (a).
    y = a0 + a1x + error
    y = a0 + a1x + a2 x2 + a3 x3 + error.

    “Linear” refers to the coefficients a0, a1, a2, etc. In the second example, the relationship between x and y is a cubic polynomial in nature, but the model is still linear with respect to the coefficients.

  • Multiple linear regression relates multiple x’s to a y. It has multiple regressor (x) variables such as x1, x2, and x3. Its model is linear with respect to coefficients (b).
    y = b0 + b1x1 + b2x2 + b3x3 + error
  • Binary logistic regression relates x’s to a y that can only have a dichotomous value (one of two mutually exclusive outcomes such as pass/fail, on/off, etc.)
  • Least squares method: Use the least squares method, where you determine the regression equation by using a procedure that minimizes the total squared distance from all points to the line. This method finds the line where the squared vertical distance from each data point to the line is as small as possible (or the “least”). This means that the method minimizes the “square” of all the residuals.
Steps in Regression Analysis
  1. Plot the data on a Scatter Diagram: Be sure to plot your data before doing regression. The charts below show four sets of data that have the same regression equation: y = 3 + 0.5x.
    Obviously, there are four completely different relationships.
  2. Measure the vertical distance from the points to the line
  3.  Square the figures
  4. Sum the total squared distance
  5.  Find the line that minimizes the sum
    Generally a computer program is used to generate the “best fit” line that represents the relationship between x and y.  The following sets of terms are often used interchangeably:
    • Regression equation and regression line.
    • Prediction equation and prediction line.
    • Fitted line, or fits, and model.

    When two variables show a relationship on a scatter plot, they are said to be correlated, but this does not necessarily mean they have a cause/ effect relationship. Correlation means two things vary together. Causation means changes in one variable cause changes in the other.

    The residual is the leftover variation in y after you use x to predict y. The residual represents common-cause (i.e., random and unexplained) variation. You determine a residual by subtracting the predicted y from the observed y
    Residuals are assumed to have the following properties:

    • Not related to the x’s.
    • Stable, independent, and not changing over time.
    • Constant and not increasing as the predicted y’s increase.
    • Normal (i.e., bell-shaped) with a mean of zero.

    check for  each of these assumptions. If the assumptions do not hold, the regression equation might be incorrect or misleading.

Simple Linear Regression Model

Consider the problem of predicting the test results (y) for students based upon an input variable (x), the amount of preparation time in hours using the data presented in Table below.



Study Time Versus Test Results

[Table: study times in hours versus test results in %; the data values are not reproduced in this extract.]


An initial approach to the analysis of the data  is to plot the points on a graph known as a scatter diagram. Observe that y appears to increase as x increases. One method of obtaining a prediction equation relating y to x is to place a ruler on the graph and move it about until it seems to pass through the majority of the points, thus providing what is regarded as the “best fit” line.

The mathematical equation of a straight line is:

Y = β0 + β1x

Where β0 is the y intercept when x = 0 and β1 is the slope of the line. Here the x axis does not go to zero so the y intercept appears too high.  The equation for a straight line in this example is too simplistic. There will actually be a random error which is the difference between an observed value of y and the mean value of y for a given value of x. One assumes that for any given value of x, the observed value of y varies in a random manner and possesses a normal probability distribution.


The probabilistic model for any particular observed value of y is:
y = (mean value of y for a given value of x) + (random error)

Y = β0 + β1x+ε

The Method of Least Squares


The statistical procedure of finding the “best-fitting” straight line is, in many respects, a formalization of the procedure used when one fits a line by eye. The objective is to minimize the deviations of the points from the prospective line. If one denotes the predicted value of y obtained from the fitted line as ŷ, the prediction equation is:

ŷ = β̂0 + β̂1x
Having decided to minimize the deviation of the points in choosing the best fitting line, one must now define what is meant by “best.”


The best fit criterion of goodness known as the principle of least squares is employed:
Choose, as the best fitting line, the line that minimizes the sum of squares of the deviations of the observed values of y from those predicted. Expressed mathematically, minimize the sum of squared errors given by:

SSE = Σ(yi – ŷi)2

The least squares estimators of β0 and β1 are calculated as follows:

β̂1 = Sxy/Sxx = Σ(xi – x̄)(yi – ȳ)/Σ(xi – x̄)2

β̂0 = ȳ – β̂1x̄
One may predict y for a given value of x by substitution into the prediction equation. For example, if 60 hours of study time is allocated, the predicted test score would be:
While doing regression analysis, be careful of rounding errors. Normally, the calculations should carry a minimum of six significant figures in computing sums of squares of deviations. Note that the prior example consisted of convenient whole numbers, which does not occur often. Always plot the data points and graph the least squares line. If the line does not provide a reasonable fit to the data points, there may be a calculation error. Projecting a regression line outside of the test area can be risky. The above equation suggests that, without study, a student would make 31% on the test. The odds favor 25% if answer a is selected for all questions. The equation also suggests that with 100 hours of study the student should attain 100% on the examination – which is highly unlikely.
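The least squares estimators can be sketched directly from the formulas above; the study-time/test-score pairs below are hypothetical stand-ins for the table data.

```python
from statistics import mean

# Least-squares estimates: b1 = Sxy / Sxx and b0 = ybar - b1 * xbar.
def least_squares(xs, ys):
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx          # slope
    b0 = ybar - b1 * xbar   # intercept
    return b0, b1

hours = [10, 20, 30, 40, 50]   # hypothetical study times
score = [40, 50, 55, 65, 70]   # hypothetical test results (%)
b0, b1 = least_squares(hours, score)
print(round(b0, 1), round(b1, 2))  # prints 33.5 0.75
```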

Calculating Sε2 , an Estimator of σε2

Recall, the model for y assumes that y is related to x by the equation:

Y = β0 + β1x+ε

If the least squares line is used:


A random error, ε, enters into the calculations of β0 and β1. The random errors affect the error of prediction. Consequently, the variability of the random errors (measured by σε2) plays an important role when predicting by the least squares line.

The first step toward acquiring a boundary on a prediction error requires that one estimates σε2. It is reasonable to use SSE (sum of squares for error) based on (n – 2) degrees of freedom, one for each variable (x and y).

An Estimator for σε2

Sε2 = SSE/(n – 2), where SSE = sum of squared errors, SSE = Σ(yi – ŷi)2

SSE may also be written:

SSE = Syy – β̂1Sxy


Example: Calculate an estimate of σε2 for the data in the table given above. The existence of a significant relationship between y and x can be tested by testing whether β1 is equal to 0. If β1 ≠ 0, there is a linear relationship. The null hypothesis and alternative hypothesis are H0: β1 = 0 and H1: β1 ≠ 0. The test statistic is a t distribution with n – 2 degrees of freedom:

t = β̂1/(Sε/√Sxx)


Example: From the data in Table above, determine if the slope results are significant at a 95% confidence level.
For a 95% confidence level, determine the critical values of t with α = 0.025 in each tail, using n – 2 = 8 degrees of freedom: -t0.025,8 = -2.306 and t0.025,8 = 2.306. Reject the null hypothesis if t > 2.306 or t < -2.306, depending on whether the slope is positive or negative. In this case, the null hypothesis is rejected and we conclude that β1 ≠ 0 and there is a linear relationship between y and x.
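The slope t statistic is a direct calculation, t = b1/(Sε/√Sxx); the b1, Sε, and Sxx values below are hypothetical.

```python
import math

# t statistic for testing H0: beta1 = 0 against H1: beta1 != 0.
def slope_t(b1, s_e, sxx):
    return b1 / (s_e / math.sqrt(sxx))

# Hypothetical slope estimate, residual standard deviation, and Sxx:
t = slope_t(b1=0.695, s_e=5.0, sxx=1110.0)
print(round(t, 2))  # compare with +/- t(0.025, n - 2)
```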

Confidence Interval Estimate for the Slope β1

The confidence interval estimate for the slope β1 is given by:

β̂1 ± tα/2,n–2(Sε/√Sxx)
For example, substituting the previous data into the above formula gives the confidence interval around the slope of the line.
Intervals constructed by this procedure will enclose the true value of β1 95% of the time. Hence, for every 10 hours of increased study, the expected increase in test scores would fall in the interval of 3.86 to 10.05 percentage points.

Correlation Coefficient

The population linear correlation coefficient, ρ, measures the strength of the linear relationship between the paired x and y values in a population. ρ is a population parameter. For the population, the Pearson product moment coefficient of correlation, ρx,y, is given by:

ρx,y = cov(x, y)/(σxσy)

Where cov means covariance. Note that -1 ≤ ρ ≤ +1.


The sample linear correlation coefficient, r, measures the strength of the linear relationship between the paired x and y values in a sample. r is a sample statistic. For a sample, the Pearson product moment coefficient of correlation, rx,y, is given by:

rx,y = Sxy/√(SxxSyy)
For example, using the study time and test score data reviewed earlier, determine the correlation coefficient, given Sxy = 772, Sxx = 1,110, and Syy = 696.9:

r = 772/√((1,110)(696.9)) = 0.878
The numerator used in calculating r is identical to the numerator of the formula for the slope β1. Thus, the coefficient of correlation r will assume exactly the same sign as β1 and will equal zero when β1 = 0.

  • A positive value for r implies that the line slopes upward to the right.
  • A negative value for r implies that the line slopes downward to the right.
  • Note that r = 0 implies no linear correlation, not simply “no correlation.” A pronounced curvilinear pattern may exist.

When r = 1 or r = -1, all points fall on a straight line; when r = 0, they are scattered and give no evidence of a linear relationship. Any other value of r suggests the degree to which the points tend to be linearly related. If x is of any value in predicting y, then SSE can never be larger than SST = Σ(yi – ȳ)2, the total sum of squares of the deviations of y about their mean.

Coefficient of Determination (R2)

The coefficient of determination is R2. The square of the linear correlation coefficient is r2. It can be shown that: R2 = r2 .


The coefficient of determination is the proportion of the explained variation divided by the total variation, when a linear regression is performed. r2 lies in the interval 0 ≤ r2 ≤ 1. r2 will equal +1 only when all the points fall exactly on the fitted line, that is, when SSE equals zero.


For example, using the data from the example above, determine the coefficient of determination.

r2 = (0.878)2 = 0.77

One can say that 77% of the variation in test scores can be explained by variation in study hours.
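Using the sums of squares quoted above (Sxy = 772, Sxx = 1,110, Syy = 696.9), r and r2 can be checked in a few lines:

```python
import math

# r = Sxy / sqrt(Sxx * Syy), from the study-time sums of squares in the text.
sxy, sxx, syy = 772.0, 1110.0, 696.9
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2
print(round(r, 3), round(r_squared, 2))  # about 0.878 and 0.77
```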


r2 = (SST – SSE)/SST = 1 – SSE/SST, where SST = total sum of squares (from the experimental average) and SSE = total sum of squared errors (from the best fit). Note that when SSE is zero, r2 equals one, and when SSE equals SST, r2 equals zero.

Correlation Versus Causation

In the above example, there is strong evidence of a correlation between car weight and gas mileage. The student should be aware that a number of other factors (carburetor type, car design, air conditioning, passenger weights, speed, etc.) could also be important. The most important cause may be a different or a collinear variable. For example, car and passenger weight may be collinear. There can also be such a thing as a nonsensical correlation, i.e. it rains after my car is washed.

Simple Linear Regression in a Nutshell

  1. Determine which relationship will be studied.
  2. Collect data on the x and y variables.
  3. Set up a fitted line plot by charting the independent variable on the x axis and the dependent variable on the y axis.
  4. Create the fitted line. If creating the fitted line plot by hand, draw a straight line through the values that keeps the least amount of total space between the line and the individual plotted points (a “best fit”). If using a computer program, compute and plot this line via the “least squares method.”
  5. Compute the correlation coefficient r.
  6. Determine the slope or y intercept of the line by using the equation y = mx + b. The y intercept (b) is the point on the y axis through which the “best fitted line” passes (at this point, x = 0). The slope of the line (m) is computed as the change in y divided by the change in x (m = Δy/ Δx). The slope, m, is also known as the coefficient of the predictor variable, x.
  7. Calculate the residuals. The difference between the predicted response variable for any given x and the experimental value or actual response (y) is called the residual. The residual is used to determine if the model is a good one to use. The estimated standard deviation of the residuals is a measure of the error term about the regression line.
  8. To determine significance, perform a t-test (with the help of a computer) and calculate a p-value for each factor. A p-value less than α (usually 0.05) will indicate a statistically significant relationship.
  9. Analyze the entire model for significance using ANOVA, which displays the results of an F-test with an associated p-value.
  10.  Calculate R2 and R2 adj. R2, the coefficient of determination, is the square of the correlation coefficient and measures the proportion of variation that is explained by the model. Ideally, R2 should be equal to one, which would indicate zero error.

    R2 = SSregression / SStotal
    = (SStotal – SSerror ) / SStotal
    = 1-[SSerror / SStotal ]
    Where SS = the sum of the squares.
    R2 adj is a modified measure of R2 that takes into account the number of terms in the model and the number of data points.
    R2 adj = 1- [SSerror / (n-p)] / [SStotal / (n-1)]
    Where n = number of data points and p = number of terms in the model. The number of terms in the model also includes the constant.
    Note: Unlike R2, R2 adj can become smaller when added terms provide little new information and as the number of model terms gets closer to the total sample size. Ideally, R2 adj should be maximized and as close to R2 as possible. Conclusions should be validated, especially when historical data has been used.
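The R2 and R2 adj formulas above can be sketched as follows; the sums of squares, n, and p below are hypothetical.

```python
# R^2 and adjusted R^2 from sums of squares:
# R^2 = 1 - SSerror/SStotal
# R^2 adj = 1 - [SSerror/(n - p)] / [SStotal/(n - 1)]
def r2_and_r2adj(ss_error, ss_total, n, p):
    r2 = 1 - ss_error / ss_total
    r2_adj = 1 - (ss_error / (n - p)) / (ss_total / (n - 1))
    return r2, r2_adj

# Hypothetical values: 20 data points, 2 model terms (constant + slope):
r2, r2_adj = r2_and_r2adj(ss_error=120.0, ss_total=500.0, n=20, p=2)
print(round(r2, 2), round(r2_adj, 3))  # R^2 adj is always <= R^2
```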

Multiple Linear Regression

Multiple linear regression is an extension of the methodology for linear regression to more than one independent variable. By including more than one independent variable, a higher proportion of the variation in y may be explained.

First-Order Linear Model

Y = β0 + β1x1 + β2x2 + … + βkxk + ε

A Second-Order Linear Model (Two Predictor Variables)

Y = β0 + β1x1 + β2x2 + β3x1x2 + β4x12 + β5x22 + ε


Just like r2 (the linear coefficient of determination), R2 (the multiple coefficient of determination) takes values in the interval 0 ≤ R2 ≤ 1.

Attributes Data Analysis

In the analysis of attribute data, the data is organized into dichotomous values, categories, or groups. Applications involve decisions such as yes/no, pass/fail, good/bad, poor/fair/good/super/excellent, etc. Some of the techniques used in nonlinear regression models include: logistic regression analysis, logit regression analysis, and probit regression analysis. A description of the three models follows:

  •  Logistic regression relates categorical, independent variables to a single dependent variable. The three models described within Minitab are binary, ordinal, and nominal.
  • Logit analysis is a subset of the log-linear model. It deals with only one dependent variable, using odds and odds ratio determinations.
  • Probit analysis is similar to accelerated life testing. A unit has a stress imposed on it with the response being pass/fail, good/bad, etc. The response is binary (good/bad) versus an actual failure time.

Log-linear models are nonlinear regression models similar to linear regression equations. Since they are nonlinear, it is necessary to take the logs of both sides of the equation in order to produce a linear equation. This produces a log-linear model. Logit models are subsets of this model.

Logistic Regression

Logistic regression is used to establish a y = f(x) relationship when the dependent variable (y) is binomial or dichotomous. Similar to regression, it explores the relationships between one or more predictor variables and a binary response. Logistic regression helps us to predict the probability of future events belonging to one group or another (i.e., pass/fail, profitable/nonprofitable, or purchase/not purchase). Logistic regression relates one or more independent variables to a single dependent variable. The independent variables are described as predictor variables and the response is a dependent variable. Logistic regression is similar to regular linear regression, since both have regression coefficients, predicted values, and residuals. Linear regression assumes that the response variable is continuous, but for logistic regression, the response variable is binary. The regression coefficients for linear regression are determined by the ordinary least squares approach, while logistic regression coefficients are based on a maximum likelihood estimation.
Logistic regression can provide analysis of the two values of interest: yes/no, pass/fail, good/bad, enlist/not enlist, vote/no vote, etc. A logistic regression can also be described as a binary regression model. It is nonlinear and has a S-shaped form. The values are never below 0 and never above 1. The general logistic regression equation can be shown as:
y = b0 + b1x1 + e,   where y = 0, 1


The probability of results being in a certain category is given by:

p = e^(b0 + b1x)/(1 + e^(b0 + b1x))
The predictor variables (x’s) can be either continuous or discrete, just as for any problem using regression. However, the response variable has only two possible values (e.g., pass/fail, etc.). Because regression analysis requires a continuous response variable that is not bounded, this must be corrected. This is accomplished by first converting the response from events (e.g., pass/fail) to the probability of one of the events, or p. Thus if p = Probability (pass), then p can take on any value from 0 to 1. This conversion results in a continuous response, but one that is still bounded. An additional transformation is required to make the response both continuous and unbounded. This is called the link function. The most common link function is the “logit,” which is explained below.


Y = β0 + β1x
We need a continuous, unbounded Y; the logit transformation provides one.



Logistic regression, also known as binary logistic regression (BLR), fits sample data to an S-shaped logistic curve. The curve represents the probability of the event. At low levels of the independent variable (x), the probability approaches zero. As the predictor variable increases, the probability increases to a point where the slope decreases. At high levels of the independent variable, the probability approaches 1. The following two examples fit probability curves to actual data. The curve on the top represents the “best fit.” The curve through the data on the bottom contains a zone of uncertainty where events and non-events (1’s and 0’s) overlap.

If the probability of an event, p, is greater than 0.5, binary logistic regression would predict a “yes” for the event to occur. The probability of an event not occurring is (1 − p). The odds, or p/(1 − p), compare the probability of an event occurring to the probability of it not occurring. The logit, or “link” function, represents the relationship between x and y.

Steps for Logistic Regression

  1. Define the problem and the question(s) to be answered.
  2. Collect the appropriate data in the right quantity.
  3. Hypothesize a model.
  4. Analyze the data. Many statistical software packages are available to help analyze data.
  5. Check the model for goodness of fit.
  6. Check the residuals for violations of assumptions.
  7. Modify the model, if required, and repeat.
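The fitting in step 4 is normally done with statistical software; as an illustration of the maximum likelihood idea it relies on, here is a minimal gradient-ascent fit for a single predictor. The data values are hypothetical, chosen only so that higher x tends toward "pass":

```python
import math

def fit_logistic(xs, ys, lr=0.01, iters=5000):
    """Fit p = 1/(1 + e^-(b0 + b1*x)) by gradient ascent on the
    log-likelihood -- a stand-in for the maximum-likelihood fitting
    a statistics package performs."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            g0 += y - p            # gradient w.r.t. intercept
            g1 += (y - p) * x      # gradient w.r.t. slope
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

# Hypothetical pass/fail data: longer study times tend to pass.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(xs, ys)
print(b1 > 0)  # the fitted slope should be positive
```

The fitted curve can then be checked for goodness of fit (step 5) before use.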

An example will be used to compare the number of hours studied for an exam versus pass/fail responses. The data is provided for 50 students. The number of hours a student spends studying is recorded. In addition, the end result, the dependent (pass/fail) variable, is noted. In logistic regression, because of the use of attribute data, there should be roughly 50 data points per variable. An analysis will be made for the regular linear regression model. This result will then be compared to the logistic regression model. Logistic regression can be used to predict the probability that an observation belongs to one of two groups.


For the logistic regression example, Minitab is used to determine the regression coefficients. Using Excel to calculate the probabilities, the logistic regression curve is displayed in the figure below.


An S-shaped curve can be used to smooth out the data points. The curve moves from the zero probability point up to the 1.0 line. The probabilities in the logistic curve were calculated from the equation p = e^(b0 + b1x)/(1 + e^(b0 + b1x)). Using Minitab, the regression coefficients and the equation can be determined. After determining the regression coefficients, the probability of a student passing the exam, after studying 80 hours, can be calculated.
It appears that there is a 54.5% probability of passing after 80 hours of study. At 100 hours or more, the probability of passing increases to more than 90%.
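Assuming illustrative coefficients consistent with the quoted results (b1 = 0.10821, the value used in the odds-ratio discussion later, and b0 back-solved to about −8.476 so that 80 hours gives the quoted logit), the probability calculation looks like:

```python
import math

# Illustrative coefficients only -- b1 matches the later odds-ratio example,
# b0 is back-solved from the 80-hour result, not taken from the Minitab output.
b0, b1 = -8.476, 0.10821

def p_pass(hours):
    L = b0 + b1 * hours                  # the logit
    return math.exp(L) / (1 + math.exp(L))

print(round(p_pass(80), 3))   # ≈ 0.545
print(p_pass(100) > 0.90)     # True: > 90% at 100 hours
```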
Minitab 13 has three logistic regression procedures: binary, ordinal, and nominal logistic regression.
Grimm provides the following logistic regression assumptions:

  • There are only two values (pass/fail) with only one outcome per event
  • The outcomes are statistically independent
  • All relevant predictors are in the model
  • The categories are mutually exclusive and collectively exhaustive (each observation falls into exactly one category)
  • Sample sizes are larger than for linear regression

The individual regression coefficients can be tested for significance by comparing each coefficient to its standard error. The resulting z value is compared to the value obtained from the standard normal distribution.

z = b1 / SE(b1)
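A sketch of this significance test, using a hypothetical coefficient and standard error (the two-sided p-value comes from the standard normal CDF):

```python
import math

def coef_z_test(b1, se):
    """z statistic for a regression coefficient and its two-sided p-value,
    computed from the standard normal CDF via the error function."""
    z = b1 / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Hypothetical coefficient and standard error:
z, p = coef_z_test(0.10821, 0.045)
print(round(z, 2))   # 2.4
print(p < 0.05)      # True: significant at the 5% level
```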


The logistic regression model can be tested via several goodness-of-fit tests. Minitab will automatically test three different methods: Pearson, Deviance, and Hosmer-Lemeshow. The simple logistic regression model can be extended to include several other predictors (called multiple logistic regression). If the model contains only categorical variables, it can be classified as a log-linear model.

Logit Analysis

Logit analysis uses odds to determine how much more likely an observation will be a member of one group versus another group (pass/fail, etc.). A probability of p = 0.80 of being in group A (passing) can be expressed in odds terms as 4:1. There are 4 chances to pass versus 1 chance to fail, or odds of 4:1. The ratio 4/1 or p/(1-p) is called the odds, and the log of the odds, L=ln(p/(1-p)) is called the Logit.
The probability p ranges from 0 to 1 (0 < p < 1), while the logit L itself is unbounded. The probability for a given L value is provided by the equation:

p = e^L / (1 + e^L)

Example: From the previous data, there were 50 students who took the exam, but only 27 passed. What are the odds of passing?
p = 27/50 = 0.54, so Odds = p/(1 − p) = 0.54/0.46 = 1.17, or 1.17:1

Example: From the previous data, a student studying 80 hours has a 54.5% chance of passing. What are the odds and the accompanying logit probability?
Odds = p/(1 − p) = 0.545/(1 − 0.545) = 0.545/0.455 = 1.198, or 1.198:1
Logit = ln(p/(1 − p)) = ln(1.198) = 0.1809
To find the probability, use the logit equation:
p = e^L/(1 + e^L) = e^0.1809/(1 + e^0.1809) = 1.198/2.198 = 0.545
If the student studies 80 hours, the probability of passing is 54.5%. This is the same result as before, but represents another way to calculate it.
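The odds/logit round trip in this example can be verified numerically:

```python
import math

p = 0.545                  # probability of passing after 80 hours of study
odds = p / (1 - p)         # odds of passing versus failing
L = math.log(odds)         # the logit
p_back = math.exp(L) / (1 + math.exp(L))  # convert the logit back to a probability

print(round(odds, 3))      # 1.198
print(round(p_back, 3))    # 0.545 -- the round trip recovers p
```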

The odds ratio is the change in the odds of moving up or down a level in a group for a one-unit increase or decrease in the predictor. The exponent, e, and the slope coefficient, b1, are used to determine the odds ratio. If b1 = 0.10821, then the odds of moving to another level are multiplied by: e^b1 = e^0.10821 ≈ 1.11
Positive effects are greater than 1, while negative effects are between 0 and 1.

Logit Regression Model

In cases where the values for each category are continually increasing or continually decreasing, a log transform should be performed to obtain a near straight line. Expanding the logit formula to obtain a straight linear model results in the following formula:
L = logit = ln(p/(1 − p)) = ln(e^(b0 + b1x1)) = b0 + b1x1
The equation expanded for multiple predictor variables, x1, x2, …, xn, is:   L = b0 + b1x1 + b2x2 + … + bnxn

Logit Regression Example

A medical researcher, with an interest in physical fitness, conducted a long-term walking program for weight loss. She was able to enroll 1,120 patients in a 2-year walking program. The results were positive, and weight loss appeared to accelerate as the number of steps walked increased. The data is presented in Table below. A linear regression of the data produced a good R2 value of 93.8%. However, the graph indicated nonlinear results.
A logit transformation of the data values was performed. The logit value was obtained by dividing the “number lost > 30 lb” by “number lost < 30 lb”, and then taking the natural log. A regression was performed on the steps walked and logit to obtain a new equation.

The resulting R2 was 98.7%. The equation is: L = -5.307 + 0.00067053 x1
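Using the fitted equation, the model can be inverted to ask, for example, at how many steps the probability of losing more than 30 lb reaches 50% (i.e., where the logit equals zero):

```python
import math

b0, b1 = -5.307, 0.00067053   # fitted logit equation from the walking study

def p_lost_30lb(steps):
    """Probability of losing > 30 lb at a given number of steps walked."""
    L = b0 + b1 * steps
    return math.exp(L) / (1 + math.exp(L))

# Steps at which losing > 30 lb becomes a 50/50 proposition (logit = 0):
steps_50 = -b0 / b1
print(round(steps_50))                   # ≈ 7915 steps
print(round(p_lost_30lb(steps_50), 2))   # 0.5
```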
Probit Analysis

Probit analysis is similar to accelerated life testing and survivability analysis. An item has a stress imposed upon it to see if it fails or survives. The probit model has an expected variance of 1 and a mean of zero. The logit model has an expected variance of π²/3 = 3.29 and an expected mean of zero. The probit model is close to the logit model; extremely large sample sizes are required to realize a difference between the two.

The probit model is: Φ⁻¹(p) = α + βx = b0 + b1x

Where b0 = −μ/σ and b1 = −1/σ, or σ = 1/|b1|

In comparing the logit to the probit models, the b coefficients of the logit and the probit differ by a factor of about 1.814. That is: bL ≈ 1.814 bp


Example: A circular plate is welded to a larger plate to form a supporting structure. There is a need to validate the structure’s capability to resist a torque force. This is a destructive test of the weldment, with a binary response consisting of success or failure. A torque wrench will be used to twist the structure. The levels of applied force are 50, 100, 150, and 200 lbf-in. A total of 100 samples will be tested at each level of force.
The probit analysis, using Minitab, indicates the normal model was nonlinear and the logistic model would be a better choice. The coefficients are:
b0 = 4.0058, b1 = -0.031368
The probit model has 7 life distributions to choose from: normal, lognormal (base e), lognormal (base 10), logistic, loglogistic, Weibull, and extreme value. Using Minitab, the best fit for the above data was the logistic distribution. The percentiles, survival probabilities, and plots can also be obtained using Minitab.
For the weldment example, a table of percentiles (Minitab) provides the percentage of surviving parts at various levels of torque. The plot in the figure shows the 50% survival level to be about 130 lbf-in. The 5% survival level would be about 220 lbf-in.
Refer to Table below for a listing of low survival percentages.

The 5% success level (or 95% failure level) indicates the torque force to be 221.5752 lbf-in. The 95% confidence interval is also included.
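The fitted logistic model reproduces these survival levels directly, using the coefficients from the example above:

```python
import math

b0, b1 = 4.0058, -0.031368   # logistic fit coefficients from the weldment test

def p_survive(torque):
    """Probability a weldment survives a given torque (lbf-in)."""
    L = b0 + b1 * torque
    return math.exp(L) / (1 + math.exp(L))

def torque_at(p):
    """Invert the model: torque at which survival probability equals p."""
    return (math.log(p / (1 - p)) - b0) / b1

print(round(torque_at(0.50), 1))   # ≈ 127.7 lbf-in (the ~130 read from the plot)
print(round(torque_at(0.05), 1))   # ≈ 221.6 lbf-in (the 5% survival level)
```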


Process Capability

Process capability refers to the capability of a process to consistently make a product that meets a customer-specified tolerance. Capability indices are used to predict the performance of a process by comparing the width of process variation to the width of the specified tolerance. Capability analysis is used extensively in many industries and only has meaning if the process being studied is stable (in statistical control). Capability indices allow calculations for short-term (Cp and Cpk) and/or long-term (Pp and Ppk) performance for a process whose output is measured using variable data at a specific opportunity for a defect.

The determination of process capability requires a predictable pattern of statistically stable behavior (most frequently a bell-shaped curve) where the chance causes of variation are compared to the engineering specifications. A capable process is a process whose spread on the bell-shaped curve is narrower than the tolerance range or specification limits. USL is the upper specification limit and LSL is the lower specification limit.
It is often necessary to compare the process variation with the engineering or specification tolerances to judge the suitability of the process. Process capability analysis addresses this issue. A process capability study includes three steps:

  • Planning for data collection
  • Collecting data
  • Plotting and analyzing the results

The objective of process quality control is to establish a state of control over the manufacturing process and then maintain that state of control through time. Actions that change or adjust the process are frequently the result of some form of capability study. When the natural process limits are compared with the specification range, any of the following possible courses of action may result:

  • Do nothing. If the process limits fall well within the specification limits, no action may be required.
  • Change the specifications. The specification limits may be unrealistic. In some cases, specifications may be set tighter than necessary. Discuss the situation with the final customer to see if the specifications may be relaxed or modified.
  • Center the process. When the process spread is approximately the same as the specification spread, an adjustment to the centering of the process may bring the bulk of the product within specifications.
  • Reduce variability. This is often the most difficult option to achieve. It may be possible to partition the variation (stream-to-stream, within piece, batch-to-batch, etc.) and work on the largest offender first. For a complicated process, an experimental design may be used to identify the leading source of variation.
  • Accept the losses. In some cases, management must be content with a high loss rate (at least temporarily). Some centering and reduction in variation may be possible, but the principal emphasis is on handling the scrap and rework efficiently.

Other capability applications:

  •  Providing a basis for setting up a variables control chart
  •  Evaluating new equipment
  • Reviewing tolerances based on the inherent variability of a process
  • Assigning more capable equipment to tougher jobs
  • Performing routine process performance audits
  • Determining the effects of adjustments during processing

Identifying Characteristics

The identification of characteristics to be measured in a process capability study should meet the following requirements:

  • The characteristic should be indicative of a key factor in the quality of the product or process.
  • It should be possible to adjust the value of the characteristic.
  • The operating conditions that affect the measured characteristic should be defined and controlled

If a part has ten different dimensions, process capability would not normally be performed for all of these dimensions. Selecting one, or possibly two, key dimensions provides a more manageable method of evaluating the process capability. For example, in the case of a machined part, the overall length or the diameter of a hole might be the critical dimension. The characteristic selected may also be determined by the history of the part and the parameter that has been the most difficult to control or has created problems in the next higher level of assembly. Customer purchase order requirements or industry standards may also determine the characteristics that are required to be measured. In the automotive industry, the Production Part Approval Process (PPAP) states: “An acceptable level of preliminary process capability must be determined prior to submission for all characteristics designated by the customer or supplier as safety, key, critical, or significant, that can be evaluated using variables (measured) data.” Chrysler, Ford and General Motors use symbols to designate safety and/or government regulated characteristics and important performance, fit, or appearance characteristics.

Identifying Specifications/Tolerances

The process specifications or tolerances, are determined either by customer requirements, industry standards, or the organization’s engineering department. The process capability study is used to demonstrate that the process is centered within the specification limits and that the process variation predicts the process is capable of producing parts within the tolerance requirements. When the process capability study indicates the process is not capable, the information is used to evaluate and improve the process in order to meet the tolerance requirements. There may be situations where the specifications or tolerances are set too tight in relation to the achievable process capability. In these circumstances, the specification must be reevaluated. If the specification cannot be opened, then the action plan is to perform 100% inspection of the process, unless inspection testing is destructive.

Developing Sampling Plans

The appropriate sampling plan for conducting process capability studies depends upon the purpose and whether there are customer or standards requirements for the study. Ford and General Motors specify that process capability studies for PPAP submissions be based on data taken from a significant production run of a minimum of 300 consecutive pieces.
If the process is currently running and is in control, control chart data may be used to calculate the process capability indices. If the process fits a normal distribution and is in statistical control, then the standard deviation can be estimated from:

σ̂ = R̄/d2

For new processes, for example for a project proposal, a pilot run may be used to estimate the process capability. The disadvantage of using a pilot run is that the estimated process variability is most likely less than the process variability expected from an ongoing process. Process capability studies conducted for the purpose of improving the process may be performed using a design of experiments (DOE) approach, in which the objective is to find the values of the process variables that yield the lowest process variation.

Verifying Stability and Normality

If only common causes of variation are present in a process, then the output of the process forms a distribution that is stable over time and is predictable. If special causes of variation are present, the process output is not stable over time.

The figure below depicts an unstable process with both the process average and variation out-of-control. Note, the process may also be unstable if either the process average or variation is out-of-control. Common causes of variation refer to the many sources of variation within a process that has a stable and repeatable distribution over time. This is called a state of statistical control and the output of the process is predictable. Special causes refer to any factors causing variation that are not always acting on the process. If special causes of variation are present, the process distribution changes and the process output is not stable over time. When plotting a process on a control chart, lack of process stability can be shown by several types of patterns including: points outside the control limits, trends, points on one side of the center line, cycles, etc. The validity of the normality assumption may be tested using the chi square hypothesis test. To perform this test, the data is partitioned into data ranges. The number of data points in each range is then compared with the number predicted from a normal distribution. Using the hypothesis test with a selected confidence level, a conclusion can be made as to whether the data follows a normal distribution.
The chi square hypothesis test is:
Ho: The data follows a specified distribution
H1: The data does not follow a specified distribution
and is tested using the following test statistic:

χ² = Σ (Oi − Ei)² / Ei

where Oi is the observed count and Ei is the expected count of data points in each range. Continuous data may be tested using the Kolmogorov-Smirnov goodness-of-fit test. It has the same hypothesis test as the chi square test, and the test statistic is given by:

D = max |F(xi) − S(xi)|

Where D is the test statistic, F is the theoretical cumulative distribution of the continuous distribution being tested, and S is the empirical cumulative distribution of the sample. An attractive feature of this test is that the distribution of the test statistic does not depend on the underlying cumulative distribution function being tested. Limitations of this test are that it only applies to continuous distributions and that the distribution must be fully specified. The location, scale, and shape parameters must be specified and not estimated from the data. The Anderson-Darling test is a modification of the Kolmogorov-Smirnov test and gives more weight to the tails of the distribution. If the data does not fit a normal distribution, the chi square hypothesis test may also be used to test the fit to other distributions such as the exponential or binomial distributions.
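The chi square goodness-of-fit calculation can be sketched as follows, using hypothetical observed counts per range and counts expected from a normal fit:

```python
# Partition the data into ranges, then compare observed counts with the
# counts expected under the hypothesized distribution (values hypothetical).
observed = [4, 11, 18, 11, 6]            # data points counted in each range
expected = [5.0, 12.0, 16.0, 12.0, 5.0]  # counts predicted from a normal fit

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 3))  # 0.817 -- compare against the chi-square critical value
```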

Capability Index Failure Rates

There is a direct link between calculated Cp (and Pp) values and the standard normal (Z value) table. A Cp of 1.0 corresponds to a Z value of 3.0, or about 2,700 ppm, where ppm equals parts per million of nonconformance (or failure), when the process:

  • Is centered between the specification limits
  • Has a two-tailed specification
  • Is normally distributed
  • Has no significant shifts in average or dispersion

When the Cp, Cpk, Pp, and Ppk values are 1.0 or less, Z values and the standard normal table can be used to determine failure rates. With the drive for increasingly dependable products, there is a need for failure rates in the Cp range of 1.5 to 2.0.

Process Capability Indices

To determine process capability, an estimation of sigma is necessary:

σ̂R = R̄/d2

σ̂R is an estimate of process capability sigma and comes from a control chart.
The capability index is defined as:

Cp = (USL − LSL)/(6σ̂R)

As a rule of thumb:

  • Cp > 1.33 Capable
  • Cp = 1.00 to 1.33 Capable with tight control
  • Cp < 1.00 Incapable

The capability ratio is defined as:

CR = (6σ̂R)/(USL − LSL) = 1/Cp

As a rule of thumb:

  • CR < 0.75 Capable
  • CR = 0.75 to 1.00 Capable with tight control
  • CR > 1.00 incapable

Note, this rule of thumb logic is somewhat out of step with the six sigma assumption of a ±1.5 sigma shift. The above formulas only apply if the process is centered, stays centered within the specifications, and Cp = Cpk.

Cpk is the ratio giving the smallest answer between:

Cpk = min[ (USL − X̄)/(3σ̂R), (X̄ − LSL)/(3σ̂R) ]

Example: For a process with X̄ = 12, σ̂R = 2, USL = 16 and LSL = 4, determine Cp and Cpk:

Cp = (16 − 4)/(6 × 2) = 12/12 = 1.0
Cpk = min[ (16 − 12)/(3 × 2), (12 − 4)/(3 × 2) ] = min[0.67, 1.33] = 0.67
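The Cp and Cpk computations for this example can be expressed compactly in code (values from the example above):

```python
def cp(usl, lsl, sigma):
    """Process potential: tolerance width over six standard deviations."""
    return (usl - lsl) / (6 * sigma)

def cpk(usl, lsl, mean, sigma):
    """Process performance: distance to the nearest limit over three sigma."""
    return min((usl - mean) / (3 * sigma), (mean - lsl) / (3 * sigma))

# The worked example: Xbar = 12, sigma = 2, USL = 16, LSL = 4
print(cp(16, 4, 2))                 # 1.0
print(round(cpk(16, 4, 12, 2), 2))  # 0.67
```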

Cpm index

The Cpm index is defined as:

Cpm = (USL − LSL) / ( 6 √( σ² + (μ − T)² ) )

Where: USL = upper specification limit
LSL = lower specification limit
μ = process mean
T = target value
σ = process standard deviation
Cpm is based on the Taguchi index, which places more emphasis on process centering on the target.

For example, for a process with μ = 12, σ = 2, T = 10, USL = 16 and LSL = 4, determine Cpm:

Cpm = (16 − 4) / ( 6 √( 2² + (12 − 10)² ) ) = 12/(6 √8) = 12/16.97 = 0.71
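A sketch of the Cpm calculation for this example:

```python
import math

def cpm(usl, lsl, mu, sigma, target):
    """Taguchi-style capability index penalizing distance from the target."""
    return (usl - lsl) / (6 * math.sqrt(sigma ** 2 + (mu - target) ** 2))

print(round(cpm(16, 4, 12, 2, 10), 3))  # ≈ 0.707
```

Note that even though the process here has Cp = 1.0, the off-target mean (12 versus a target of 10) pulls Cpm down to about 0.71.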

Process Performance indices

To determine process performance, an estimation of sigma is necessary:

σ̂i = s = √( Σ (xi − x̄)² / (n − 1) )

σ̂i is a measure of total data sigma and generally comes from a calculator or computer.
The performance index is defined as:

Pp = (USL − LSL)/(6σ̂i)

The performance ratio is defined as:

PR = (6σ̂i)/(USL − LSL)

Ppk is the ratio giving the smallest answer between:

Ppk = min[ (USL − X̄)/(3σ̂i), (X̄ − LSL)/(3σ̂i) ]

Short-Term and Long-Term Capability

Up to this point, process capability has been discussed in terms of stable processes, with assignable causes removed. In fact, the process average and spread are dependent upon the number of units measured or the duration over which the process is measured.
When a process capability is determined using one operator on one shift, with one piece of equipment, and a homogeneous supply of materials, the process variation is relatively small. As factors for time, multiple operators, various lots of material, environmental changes, etc. are added, each of these contributes to increasing the process variation. Control limits based on a short-term process evaluation are closer together than control limits based on the long-term process.

A short run can be described with respect to time, and a small run is one where a small number of pieces is produced. When a small amount of data is available, there is generally less variation than is found with a larger amount of data. Control limits based on the smaller number of samples will be narrower than they should be, and control charts will produce false out-of-control patterns. Smith suggests a modified X(bar) and R chart for short runs, running an initial 3 to 10 pieces without adjustment. A calculated value is compared with a critical value and either the process is adjusted or an initial number of subgroups is run. Inflated D4 and A2 values are used to establish control limits. Control limits are recalculated after additional groups are run. For small runs, with a limited amount of data, an X and MR chart can be used. The X represents individual data values, not an average, and the MR is the moving range, a measure of piece-to-piece variability. Process capability or Cpk values determined from either of these methods must be considered preliminary information. As the number of data points increases, the calculated process capability will approach the true capability.

When comparing attribute with variable data, variable data generally provides more information about the process for a given number of data points. Using variables data, a reasonable estimate of the process mean and variation can be made with 25 to 30 groups of five samples each, whereas a comparable estimate using attribute data may require 25 groups of 50 samples each. Using variables data is preferable to using attribute data for estimating process capability.

Short-Term Capability Indices

The short-term capability indices Cp and Cpk are measures calculated using the short-term process standard deviation. Because the short-term process variation is used, these measures are free of subgroup drift in the data and take into account only the within subgroup variation. Cp is a ratio of the customer-specified tolerance to six standard deviations of the short-term process variation. Cp is calculated without regard to location of the data mean within the tolerance, so it gives an indication of what the process could perform to if the mean of the data was centered between the specification limits. Because of this assumption, Cp is sometimes referred to as the process potential. Cpk is a ratio of the distance between the process average and the closest specification limit, to three standard deviations of the short-term process variation. Because Cpk takes into account location of the data mean within the tolerance, it is a more realistic measure of the process capability. Cpk is sometimes referred to as the process performance.

Long-Term Capability Indices

The long-term capability indices Pp and Ppk are measures calculated using the long-term process standard deviation. Because the long-term process variation is used, these measures take into account subgroup drift in the data as well as the within subgroup variation. Pp is a ratio of the customer-specified tolerance to six standard deviations of the long-term process variation. Like Cp, Pp is calculated without regard to location of the data mean within the tolerance. Ppk is a ratio of the distance between the process average and the closest specification limit, to three standard deviations of the long-term process variation. Like Cpk, Ppk takes into account the location of the data mean within the tolerance. Because Ppk uses the long-term variation in the process and takes into account the process centering within the specified tolerance, it is a good indicator of the process performance the customer is seeing.

Because both Cp and Cpk are ratios of the tolerance width to the process variation, larger values of Cp and Cpk are better. The larger the Cp and Cpk, the wider the tolerance width relative to the process variation. The same is also true for Pp and Ppk. What determines a “good” value depends on the definition of “good.” A Cp of 1.33 is approximately equivalent to a short-term Z of 4. A Ppk of 1.33 is approximately equivalent to a long-term Z of 4. However, a Six Sigma process typically has a short-term Z of 6 or a long-term Z of 4.5.

Cp = (USL − LSL)/(6σst)   Cpk = min[ (USL − X̄)/(3σst), (X̄ − LSL)/(3σst) ]
Pp = (USL − LSL)/(6σlt)   Ppk = min[ (USL − X̄)/(3σlt), (X̄ − LSL)/(3σlt) ]

Where σst = short-term pooled standard deviation.
And σlt = long-term standard deviation.

Manufacturing Example:

Suppose the diameter of a spark plug is a critical dimension that needs to conform to lower and upper customer specification limits of 0.480″ and 0.490″, respectively. Five randomly selected spark plugs are measured in every work shift. Each of the five samples on each work shift is called a subgroup. Subgroups have been collected for three months on a stable process. The average of all the data was 0.487″. The short-term standard deviation has been calculated and was determined to be 0.0013″. The long-term standard deviation was determined to be 0.019″.

To Calculate Cp and Cpk:
Cp = (0.490 – 0.480)/(6 x 0.0013) = 0.010/0.0078 = 1.28
Cpl = (0.487 – 0.480)/(3 x 0.0013) = 0.007/0.0039 = 1.79
Cpu = (0.490 – 0.487)/(3 x 0.0013) = 0.003/0.0039 = 0.77
Cpk = min (Cpl, Cpu)
Cpk = min (1.79, 0.77) = 0.77

To Calculate Pp and Ppk:
Pp = (0.490″ – 0.480″)/(6 x 0.019) = 0.0100/0.114 = 0.09
Ppl = (0.487 – 0.480)/(3 x 0.019) = 0.007/0.057 = 0.12
Ppu = (0.490 – 0.487)/(3 x 0.019) = 0.003/0.057 = 0.05
Ppk = min (Ppl, Ppu)
Ppk = min (0.12, 0.05) = 0.05

In this example, Cp is 1.28. Because Cp is the ratio of the specified tolerance to the process variation, a Cp value of 1.28 indicates that the process is capable of delivering product that meets the specified tolerance (if the process is centered). (A Cp greater than 1 indicates the process can deliver a product that meets the specifications at least 99.73% of the time.) Any improvements to the process to increase our value of 1.28 would require a reduction in the variability within our subgroups. Cp, however, is calculated without regard to the process centering within the specified tolerance. A centered process is rarely the case so a Cpk value must be calculated.
Cpk considers the location of the process data average. In this calculation, we are comparing the average of our process to the closest specification limit and dividing by three short-term standard deviations. In our example, Cpk is 0.77. In contrast to the Cp measurement, the Cpk measurement clearly shows that the process is incapable of producing product that meets the specified tolerance.
Any improvements to our process to increase our value of 0.77 would require a mean shift in the data towards the center of the tolerance and/or a reduction in the within subgroup variation. (Note: For centered processes, Cp and Cpk will be the same.) Our Pp is 0.09. Because Pp is the ratio of the specified tolerance to the process variation, a Pp value of 0.09 indicates that the process is incapable of delivering product that meets the specified tolerance. Any improvements to the process to increase our value of 0.09 would require a reduction in the variability within and/or between subgroups. Pp, however, is calculated without regard to the process centering within the specified tolerance. A centered process is rarely the case, so a Ppk value, which accounts for lack of process centering, will surely indicate poor capability for our process as well. (Note: For both Pp and Cp, we assume no drifting of the subgroup averages.) Ppk represents the actual long-term performance of the process and is the index that most likely represents what customers receive. In the example, Ppk is 0.05, confirming our Pp result of poor process performance. Any improvements to the process to increase our value of 0.05 would require a mean shift in the data towards the center of the tolerance and/or a reduction in the within subgroup and between subgroup variations.

Business Process Example:

Suppose a call center reports to its customers that it will resolve their issue within fifteen minutes. This fifteen minute time limit is the upper specification limit. It is desirable to resolve the issue as soon as possible; therefore, there is no lower specification limit. The call center operates twenty-four hours a day in eight-hour shifts. Six calls are randomly measured every shift and recorded for two months. An SPC chart shows the process is stable. The average of the data is 11.7 minutes, the short-term pooled standard deviation is 1.2 minutes, and the long-term standard deviation is 2.8 minutes.

To Calculate Cp and Cpk:

Cp = cannot be calculated as there is no LSL
Cpl = undefined
Cpu = (15 – 11.7)/(3 x 1.2) = 3.3/3.6 = 0.92
Cpk = min (Cpl, Cpu) = 0.92

To Calculate Pp and Ppk:
Pp = cannot be calculated as there is no LSL
Ppl = undefined
Ppu = (15 – 11.7)/(3 x 2.8) = 3.3/8.4 = 0.39
Ppk = min (Ppl, Ppu) = 0.39

In this example, we can only evaluate Cpk and Ppk as there is no lower limit. These numbers indicate that if we can eliminate between subgroup variation, we could achieve a process capability (Ppk) of 0.92, which is our current Cpk.

Process Capability for Non-Normal Data

In the real world, data does not always fit a normal distribution, and when it does not, the standard capability indices do not give valid information because they are based on the normal distribution. The first step is a visual inspection of a histogram of the data. If all data values are well within the specification limits, the process would appear to be capable. One additional strategy is to make non-normal data resemble normal data by using a transformation. The question is which transformation to select for the specific situation. Unfortunately, the choice of the “best” transformation is generally not obvious.
The Box-Cox power transformations are given by:

x(λ) = (x^λ − 1)/λ   for λ ≠ 0
x(λ) = ln x          for λ = 0

Given data observations x1, x2, …, xn, select the power λ that maximizes the logarithm of the likelihood function:

f(x, λ) = −(n/2) ln[ (1/n) Σ (xi(λ) − x̄(λ))² ] + (λ − 1) Σ ln xi

Where the arithmetic mean of the transformed data is:

x̄(λ) = (1/n) Σ xi(λ)

Process capability indices and formulas described elsewhere in this Post are based on the assumption that the data are normally distributed. The validity of the normality assumption may be tested using the chi square hypothesis test. One approach to address the non-normal distribution is to make transformations to “normalize” the data. This may be done with statistical software that performs the Box-Cox transformation. As an alternative approach, when the data can be represented by a probability plot (i.e. a Weibull distribution) one should use the 0.135 and 99.865 percentiles to describe the spread of the data.

It is often necessary to identify non-normal data distributions and to transform them into near-normal distributions to determine process capabilities or failure rates. Assume that a process capability study has been conducted. Some 30 data points from a non-normal distribution are shown in the Table below. An investigator can check the data for normality using techniques such as the dot plot, histogram, and normal probability plot.

A histogram displaying the above non-normal data indicates a distribution that is skewed to the right.


A probability plot can also be used to display the non-normal data. The data points are clustered to the left with some extreme points to the right. Since this is a non-normal distribution, a traditional process capability index is meaningless.

If the investigator has some awareness of the history of the data, and knows it to follow a Poisson distribution, then a square root transformation is a possibility. The standard deviation is the square root of the mean. Some typical data transformations include:

  • Log transformation (log x)
  • Square root or power transformation (x^y)
  • Exponential (e^x)
  • Reciprocal (1/x)

In order to find the right transformation, some exploratory data analysis may be required. Among the useful power transformation techniques is the Box-Cox procedure. The applicable formula is:
y′ = y^λ
Where lambda, λ, is the power or parameter that must be determined to transform the data. For λ = 2, the data is squared. For λ = 0.5, a square root is needed.

One can also use Excel or Minitab to handle the data calculations and to draw the normal probability plot. With Minitab, an investigator can let the Box-Cox tool automatically find a suitable power transform. In this example, a power transform of 0.337 is indicated. All 30 transformed data points from the Table above, using y′ = y^0.337, are shown in the Table below.

A probability plot of the newly transformed data will show a near-normal distribution.
Now, a process capability index can be determined for the data. However, the investigator must remember to also transform the specifications. If the original specifications were 1 and 10,000, the new limits would be 1 and 22.28.
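As a rough illustration of what a Box-Cox tool does behind the scenes, the sketch below grid-searches the power λ that maximizes the log-likelihood over a small *hypothetical* right-skewed data set (the 30 table points are not reproduced here), then transforms the upper specification limit with the post's λ = 0.337:

```python
# Minimal Box-Cox sketch: grid-search lambda maximizing the log-likelihood,
# then transform a spec limit. Data values are hypothetical, not the table's.
import math

def boxcox_loglik(data, lam):
    n = len(data)
    # Transformed values: (y^lam - 1)/lam, or ln(y) when lam == 0.
    if lam == 0:
        t = [math.log(y) for y in data]
    else:
        t = [(y**lam - 1) / lam for y in data]
    mean = sum(t) / n
    var = sum((v - mean) ** 2 for v in t) / n
    return -n / 2 * math.log(var) + (lam - 1) * sum(math.log(y) for y in data)

data = [1.2, 1.5, 2.0, 2.3, 3.1, 4.8, 5.5, 7.9, 12.4, 30.0]  # hypothetical
lams = [i / 100 for i in range(-200, 201)]  # search lambda in [-2, 2]
best = max(lams, key=lambda lam: boxcox_loglik(data, lam))
print(f"best lambda ~ {best:.2f}")

# Remember to transform the spec limits too; with lambda = 0.337,
# an upper spec of 10,000 becomes:
print(round(10_000 ** 0.337, 2))  # 22.28
```

Statistical packages (e.g. Minitab's Box-Cox tool or SciPy's `boxcox`) automate this search with a proper optimizer rather than a grid.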

Process Capability for Attribute Data

The control chart represents the process capability, once special causes have been identified and removed from the process. For attribute charts, capability is defined as the average proportion or rate of nonconforming product.

  • For p charts, the process capability is the process average nonconforming, p̄, and is preferably based on 25 or more in-control periods. If desired, the proportion conforming to specification, 1 – p̄, may be used.
  • For np charts, the process capability is the process average nonconforming, p̄, and is preferably based on 25 or more in-control periods.
  • For c charts, the process capability is the process average number of nonconformities, c̄, in a sample of fixed size n.
  • For u charts, the process capability is the process average number of nonconformities per reporting unit, ū.

The average proportion nonconforming may be reported on a defects per million opportunities scale by multiplying p̄ by 1,000,000.

Process Performance Metrics

  • A defect is defined as something that does not conform to a known and accepted customer standard.
  • A unit is the product, information, or service used or purchased by a customer.
  • An opportunity for a defect is a measured characteristic on a unit that needs to conform to a customer standard (e.g., the ohms of an electrical resistor, the diameter of a pen, the time it takes to deliver a package, or the address field on a form).
  • Defective is when the entire unit is deemed unacceptable because of the nonconformance of any one of the opportunities for a defect.
  • Defects = D
  • Opportunities (for a defect) = O
  • Units = U
  • Yield = Y

Defect Relationships

Defects per million opportunities (DPMO) helps to determine the capability of a process. DPMO allows for the calculation of capability at one or more opportunities and ultimately, if desired, for the entire organization.

Calculating DPMO depends on whether the data is variable or attribute, and if there is one or more than one opportunity for a defect. If there is:

  • One opportunity with variable data, use the Z transform to determine the probability of observing a defect, then multiply by 1 million.
  • One opportunity with attribute data, calculate the percent defects, then multiply by 1 million.
  • More than one opportunity with both variable and/or attribute data, use one of two methods to determine DPMO.
  • To calculate DPO, sum the defects and sum the total opportunities for a defect, then divide the defects by the total opportunities; multiplying by 1 million gives DPMO. For example, if there are eight defects and thirty total opportunities for a defect, then
    DPMO = (8/30) x 1,000,000 = 266,667
  • When using this method to evaluate multiple-opportunity variable data, convert the calculated DPMO into defects and opportunities for each variable, then sum them to get total defects and opportunities. For example, if one step in a process has a DPMO of 50,000 and another step has a DPMO of 100,000, there are 150,000 total defects for 2 million opportunities, or 75,000 DPMO overall.
  1. Total opportunities: TOP = U x O
  2. Defects per unit: DPU = D/U = –ln(Y)
  3. Defects per normalized unit: –ln(Ynorm)
  4. Defects per opportunity: DPO = DPU/O = D/(U x O)
  5. Defects per million opportunities: DPMO = DPO x 10^6

For example, a matrix chart indicates the following information for 100 production units. Determine DPU. Assume that each unit had six opportunities for a defect (i.e., characteristics A, B, C, D, E, and F). Determine DPO and DPMO.
One would expect to find an average of 0.47 defects per unit.
DPO = DPU/O = 0.47/6 = 0.078333
DPMO = DPO x 10^6 = 78,333
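The defect relationships above can be checked with a short script; the totals (47 defects over 100 production units, six opportunities each) are implied by the example's DPU of 0.47:

```python
# Sketch of the DPU / DPO / DPMO relationships, using the worked example:
# 47 defects over 100 units, 6 opportunities for a defect per unit (implied
# by DPU = 0.47 in the example above).
defects, units, opportunities_per_unit = 47, 100, 6

dpu = defects / units               # defects per unit
dpo = dpu / opportunities_per_unit  # defects per opportunity
dpmo = dpo * 1_000_000              # defects per million opportunities

print(dpu)            # 0.47
print(round(dpo, 6))  # 0.078333
print(round(dpmo))    # 78333
```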

Rolled Throughput Yield

Rolled Throughput Yield (RTY) is used to assess the true yield of a process that includes a hidden factory. A hidden factory adds no value to the customer and involves fixing things that weren’t done right the first time. RTY determines the probability of a product or service making it through a multistep process without being scrapped or ever reworked.

There are two methods to measure RTY:
Method 1 assesses defects per unit (dpu), when all that is known is the final number of units produced and the number of defects. Shown in the following diagram are six units, each containing five opportunities for a defect.

Given that any one defect can cause a unit to be defective, it appears the yield of this process is 50%. This, however, is not the whole story. Assuming that defects are randomly distributed, the special form of the Poisson distribution formula
RTY = e^(–dpu)
can be used to estimate the number of units with zero defects (i.e., the RTY). The previous figure showed eight defects over six units, resulting in 1.33 dpu. Entering this into our formula:
RTY = e^(–1.33)
RTY = 0.264

According to this calculation, this process can expect an average of 26.4% defect-free units that have not been reworked (which is much different than the assumed 50%).

Method 2 determines throughput yield (Ytp), when the specific yields at each opportunity for a defect are known. If, on a unit, the yield at each opportunity for a defect is known (i.e., the five yields at each opportunity in the previous figure), then these yields can be multiplied together to determine the RTY. The yields at each opportunity for a defect are known as the throughput yields, which can be calculated as
Ytp = e^(–dpu)
for that specific opportunity for a defect for attribute data, and
Ytp = 1- P(defect)
for variable data, where P(defect) is the probability of a defect based on the normal distribution. Shown in the following figure is one unit from the previous figure, in which the associated Ytp’s at each opportunity were measured for many units.

Multiplying these yields together results in the RTY:
RTY = Ytp1 x Ytp2 x Ytp3 x Ytp4 x Ytp5
RTY = 0.536 x 0.976 x 0.875 x 0.981 x 0.699
RTY = 0.314
According to this calculation, an average of 31.4% defect-free units that have not been reworked can be expected.
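Both RTY methods from this example can be reproduced directly:

```python
# Sketch of both RTY methods from the example above.
# Method 1: Poisson estimate e^(-dpu) from 8 defects over 6 units.
# Method 2: product of the measured throughput yields at each opportunity.
import math

# Method 1
dpu = 8 / 6
rty_1 = math.exp(-dpu)
print(round(rty_1, 3))  # 0.264

# Method 2: the five throughput yields from the figure
yields = [0.536, 0.976, 0.875, 0.981, 0.699]
rty_2 = math.prod(yields)
print(round(rty_2, 3))  # 0.314
```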

 Yield Relationships

Note, the Poisson equation is normally used to model defect occurrences. If there is a historic defects per unit (DPU) level for a process, the probability that an item contains X flaws, P(X), is described mathematically by the equation:

P(X) = (DPU^X x e^(–DPU)) / X!

Where: X is an integer greater than or equal to 0
DPU is greater than 0
Note that 0! (zero factorial) = 1 by definition.

If one is interested in the probability of having a defect free unit (as most of us are), then X = 0 in the Poisson formula and the math is simplified:
P(0) = e^(–dpu)
Therefore, the following common yield formulas follow:
Yield or first pass yield: Y = FPY = e^(–dpu)
Defects per unit: DPU = –ln(Y) (ln means natural logarithm)

Total defects per unit: TDPU = -ln (Ynorm)

For example, the yield for a process with a DPU of 0.47 is:
Y = e^(–dpu) = e^(–0.47) = 0.625 = 62.5%
For example, the DPU for a process with a first pass yield of 0.625 is:
DPU = –ln(Y) = –ln(0.625) = 0.47
Example: A process consists of 4 sequential steps: 1, 2, 3, and 4. The yield of each step is as follows: Y1 = 99%, Y2 = 98%, Y3 = 97%, Y4 = 96%. Determine the rolled throughput yield and the total defects per unit.
Yrt = (0.99)(0.98)(0.97)(0.96) = 0.90345 = 90.345%
TDPU = –ln(RTY) = –ln(0.90345) = 0.1015
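The yield relationships and the four-step example can be verified as follows:

```python
# Sketch of the yield relationships: first pass yield from DPU, DPU back
# from yield, and rolled throughput yield for the four-step example.
import math

print(round(math.exp(-0.47), 3))   # 0.625   (Y = e^(-dpu))
print(round(-math.log(0.625), 2))  # 0.47    (DPU = -ln Y)

rty = 0.99 * 0.98 * 0.97 * 0.96
print(round(rty, 5))               # 0.90345 (rolled throughput yield)
print(round(-math.log(rty), 4))    # 0.1015  (TDPU = -ln RTY)
```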

Rolled throughput yield is defined as the cumulative calculation of yield or defects through multiple process steps. The determination of the rolled throughput yield (RTY) can help a team focus on serious improvements.

  • Calculate the yield for each step and the resulting RTY
  • The RTY for a process will be the baseline metric
  • Revisit the project scope
  • Significant yield differences can suggest improvement opportunities

Sigma Relationships

Probability of a defect = P(d)
P(d)=1-Y or 1 – FPY
also P(d) = 1 – Yrt (for a series of operations)

P(d) can be looked up in a Z table (using the table in reverse to determine Z).

The Z value determined is called Z long-term or Z equivalent.
Z short-term is defined as: Zst = Zlt + 1.5 (the 1.5 sigma shift)

For example, the Z short-term for Z long-term = 1.645 is:
Zst = Zlt + 1.5 = 1.645 + 1.5 = 3.145

Schmidt and Launsby report that the six sigma quality level (with the 1.5 sigma shift) can be approximated by:
6 Sigma Quality Level = 0.8406 + SQRT(29.37 – 2.221 x ln(ppm))

Example: If a process were producing 80 defectives/million, what would be the 6 sigma quality level?
6σ = 0.8406 + SQRT(29.37 – 2.221 x ln(80))
6σ = 0.8406 + SQRT(29.37 – 2.221 x (4.3820))
6σ = 0.8406 + 4.4314 = 5.272 (about 5.3)
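The Schmidt and Launsby approximation is easy to wrap in a small function:

```python
# Sketch of the Schmidt & Launsby approximation for the sigma quality
# level (including the 1.5 sigma shift) at a given ppm defect rate.
import math

def sigma_level(ppm):
    return 0.8406 + math.sqrt(29.37 - 2.221 * math.log(ppm))

print(round(sigma_level(80), 3))  # 5.272
```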


One-piece flow

One-Piece Flow is a fundamental element of becoming lean. The thought of processing one unit at a time usually sends a shudder through an organization that has batch manufacturing as its lifeblood. The word “one” does not necessarily have a literal meaning; it should be related to the customers’ requirements and could be one unit of order. What it does mean is that the organization should process only what the customer wants, in the quantity the customer wants, and when the customer wants it.

One-piece flow (also commonly referred to as continuous flow manufacturing) is a technique used to manufacture components in a cellular environment. The cell is an area where everything that is needed to process the part is within easy reach, and no part is allowed to go to the next operation until the previous operation has been completed. The goals of one-piece flow are: to make one part at a time, correctly, all the time; to achieve this without unplanned interruptions; and to achieve this without lengthy queue times. One-piece flow describes the sequence of product or transactional activities through a process one unit at a time. In contrast, batch processing creates a large number of products, or works on a large number of transactions at one time, sending them together as a group through each operational step. One-piece flow focuses employees’ efforts on the manufacturing process itself rather than on waiting, transporting products, and storing inventory. It also makes the production process flow smoothly, one piece at a time, creating a steady workload for all employees involved. One-piece flow methods need short changeover times and are conducive to a pull system.

There are many advantages to incorporating the one-piece flow method into your work processes. These include the following:

  • It reduces the time that elapses between a customer order and shipment of the finished product.
  • It prevents the wait times and production delays that can occur during batch processing.
  • By reducing excess inventory, one-piece flow reduces the labour, energy, and space that employees must devote to storing and transporting large lots or batches.
  • It reduces the damage that can occur to product units during batch processing.
  • It reveals any defects or problems in product units early in the production process.
  • It gives your organization the flexibility to meet customer demands for a specific product at a specific time.
  • It reduces your operating costs by making non-value-added work more evident. This enables you to eliminate waste.

Difference between a push system and a pull system

“Fat” organizations use a push system. In such a system, goods are produced and handed off to a downstream process, where they are stored until needed. This type of system creates excess inventory. Lean organizations, on the other hand, use a pull system, in which goods are built only when a downstream process requests them. The customer then “pulls” the product from the organization. The final operation in a production process drives a pull system. Customer-order information goes only to the product’s final assembly area. As a result, nothing is produced until it is needed or wanted downstream, so the organization produces only what is needed. A pull system streamlines the flow of materials through your production process. This greatly improves your organization’s productivity by doing the following:

  • It reduces the time that employees spend in nonvalue-added steps, such as waiting and transporting product units.
  • It reduces downtime caused by product changeovers and equipment adjustments.
  • It reduces the distances that materials or works in progress must travel between assembly steps.
  • It eliminates the need for inspection or reworking of materials.
  • It bases your equipment usage on your cycle time.

Achieving one-piece flow

While many are familiar with the terminology, there is still a significant amount of confusion regarding what one-piece flow means and, more importantly, how to achieve it. Let us begin by stepping back and attempting to understand the concept of “connected flow.” Achieving connected flow means implementing a means of connecting each process step within a value stream. In a typical MRP batch-and-queue manufacturing environment as illustrated below, parts move from functional area to functional area in batches, and each processing step or set of processing steps is controlled independently by a schedule.

There is little relationship between each manufacturing step and the steps immediately upstream or downstream. This results in:

  • Large amounts of scrap when a defect is found because of large batches of WIP,
  • Long manufacturing lead time,
  • Poor on-time delivery and/or lots of finished goods inventory to compensate,
  • Large amounts of WIP.

When we achieve connected flow, there is a relationship between processing steps: That relationship is either a pull system such as a supermarket or FIFO lane or a direct link (one-piece flow). As illustrated below, one-piece flow is the ideal method for creating connected flow because the product is moved from step to step with essentially no waiting (zero WIP).

One-piece flow works best when your production process and products meet certain requirements. To be good candidates for one-piece flow, the following conditions must be in place:

  • Processes must be able to consistently produce a good product. If there are many quality issues, one-piece flow is impossible.
  • Your product changeover times must be very short; almost instantaneous is best. One-piece flow is impractical when many time-consuming changeover operations are needed during the production process.
  • Another requirement is that the products you make must be suitable for one-piece flow. Very small product units are usually not suitable because too much time is required for their setup, positioning, and removal from production equipment. The one-piece flow might be possible for the production of very small product units if you can completely automate their movement through your production process and if your cycle time is short.
  • Process times must be repeatable as well. If there is much variation, one-piece flow is impossible.
  • Equipment must have very high (near 100 percent) uptime. Equipment must always be available to run. If equipment within a manufacturing cell is plagued with downtime, one-piece flow will be impossible.
  • Processes must be able to be scaled to takt time, the rate of customer demand. For example, if takt time is 10 minutes, processes should be able to run at one unit every 10 minutes.
    Without the above conditions in place, some other form of connected flow must be used. This means that there will be a buffer of inventory, typically in the form of a supermarket or FIFO lane, between processes; the goal would be to eventually achieve one-piece flow (no buffer) by improving the processes. If a set of processes is determined to be a candidate for one-piece flow, then the next step is to begin implementation of a one-piece flow cell.

Implementing one-piece flow

The number of units you produce should equal the number of units your customers order. In other words, your selling cycle time should equal your manufacturing cycle time. The first step in implementing a one-piece flow cell is to decide which products or product families will go into the cells, and to determine the type of cell: product-focused or mixed model. For product-focused cells to work correctly, demand needs to be high enough for an individual product. For mixed model cells to work, changeover times must be kept short; a general rule of thumb is that changeover time must be less than one takt time. The next step is to calculate takt time for the set of products that will go into the cell. Takt time is a measure of customer demand expressed in units of time and is calculated as follows:
Takt time = Available work time per shift / Customer demand per shift
Next, determine the work elements and time required for making one piece. List each step and its associated time in detail. Time each step separately several times and use the lowest repeatable time. Then, determine whether the equipment to be used within the cell can meet takt time. Considerations here include changeover times, load and unload times, and downtime. The next step is to create a lean layout. Using the principles of 5-S (eliminating those items that are not needed and locating all items, equipment, and materials that are needed at their points of use in the proper sequence), design a layout. Space between processes within a one-piece flow cell must be limited to eliminate motion waste and to prevent unwanted WIP accumulation. U-shaped cells are generally best; however, if this is impossible due to factory floor limitations, other shapes will do. For example, I have implemented S-shaped cells in areas where a large U shape is physically impossible. Finally, balance the cell and create standardized work for each operator within the cell. Determine how many operators are needed to meet takt time and then split the work between operators. Use the following equation:
Number of operators = Total work content / Takt time
In most cases, an “inconvenient” remainder term will result (e.g., you will end up with Number of Operators = 4.4 or 2.3 or 3.6 instead of 2.0, 3.0, or 4.0). If there is a remainder term, it may be necessary to kaizen the process and reduce the work content. Other possibilities include moving operations to the supplying process to balance the line.
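The balancing arithmetic above can be sketched quickly; the shift and work-content figures below are hypothetical, chosen so that takt time comes out to a round 10 minutes:

```python
# Sketch of takt time and operator-count arithmetic. All input figures
# are hypothetical, not from the post.
available_minutes = 450   # work time per shift, e.g. a 7.5-hour shift (assumed)
demand_per_shift = 45     # units the customer requires per shift (assumed)

takt = available_minutes / demand_per_shift
print(takt)  # 10.0 minutes per unit

total_work_content = 44.0  # minutes of manual work per unit (assumed)
operators = total_work_content / takt
print(operators)  # 4.4 -> the "inconvenient" remainder case
```

A result of 4.4 operators is exactly the remainder case described above: either kaizen the process to reduce the work content, or move operations upstream to balance the line.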

One-Piece Flow in production

The following illustration shows the impact of batch size reduction when comparing batch-and-queue and one-piece flow.

The difference between these two flow systems is enormous. The one-piece flow system saved 18 minutes on the same batch of 10 pieces; roughly 3 times more can be produced than with a batch-and-queue system. In addition, the first piece was in process for only 3 minutes, which means the operator can check a part immediately at every process (A, B, and C). The batch-and-queue system only allows parts to be checked after each process has run the whole batch, so if a failure occurs in the system, it is detected too late and many parts will be damaged.

 Equipment for one-piece flow

To accommodate one-piece flow, equipment should be correctly sized to meet customer demand. Machines designed for batch production might not be easy to adapt to one-piece-flow cycle times. One-piece flow works best with machines that are smaller and somewhat slower than equipment that is suited for batch processing. Equipment used for one-piece flow also needs to be easy to set up quickly so that you can use it to produce a wide mix of products. Because the volume, capacity, and force requirements are often lower for one-piece-flow production, machines that are suited for it can be smaller. Smaller machines save space and leave little opportunity for waste, such as inventory and defective parts, to accumulate. They are also less expensive to purchase. Slower machines are often sufficient for one-piece flow because the aim is to produce goods according to the manufacturing cycle time. Automated and semi-automated machines work well in one-piece-flow production. They stop and give the operator a signal when a cycle is complete or if any problems occur. They are sometimes also capable of notifying the next operation when to begin processing. And they often unload automatically after processing is done. Synchronize your equipment’s production operations by delaying the start of faster operations rather than speeding up or slowing down the machines. Running production equipment outside of its specified range can reduce product quality or tool life.

To achieve a one-piece-flow method’s full potential, it is important to follow five points with regard to your work-cell layout and employee training. These points are outlined below.

  1. Simplify the flow of your materials and parts. Below are several guidelines to follow:
    • Keep all goods flowing in the same direction.
    • Make sure all parts flow from storage through the factory according to the processing sequence.
    • Use first-in, first-out, or FIFO stocking.
    • Arrange parts for easy feeding into the production line.
    • Eliminate any non-value-added space in your work cells.
    • Keep all pathways in work areas clear; leave aisles open along walls and windows.
    • Make sure that material input and production output are separate operations.
    • Position your equipment to allow easy maintenance access.
    • Make sure separate work processes are located as close together as possible.
  2. Set up your production lines to maximize the equipment operators’ productivity. Review the feasibility of both straight-line and U-shaped work cells and their impact on both operator movement and productivity and the flow of work materials. Remember that a U-shaped work cell brings the ending point of a work process close to the beginning point, which minimizes the distance an operator has to move before beginning a new production cycle. This setup is better for some work processes than a straight-line work cell.
  3. Allot space in the layout of your work cells for regular equipment and product inspection. Remember that the employees working in each cell must be able to easily conduct a full-lot inspection. Such inspections prevent defects by catching any errors and non-standard conditions. This ensures that only defect-free parts are fed to the next step in your production process.
  4. Minimize your in-process inventory. Predetermine the stock that employees will have on hand for the entire production line. Arrange your work cells to enable an easy flow of materials into and out of all work areas.
  5. When your equipment is arranged to enable a smooth process flow, equipment operators might need to learn how to run different types of equipment. Such operators usually need to work standing up, instead of sitting down, so they can easily run a number of machines in sequence. Keep this in mind when designing your work cells. Cross-train your employees so that they know how to perform different work functions. Equipment operators are then able to go to other work cells if production is not required at their normal work areas. This also enables an entire work team to take full responsibility for the production process.

Tools  to implement a one-piece-flow process

Four tools are necessary for assessing and planning for a one-piece-flow process:

  1. PQ analysis table
  2. Process route table
  3. Standard Operation
  4. Quick Changeover
  1. PQ analysis table

    A PQ analysis table is a tool that helps employees understand the types of products your organization produces and the volume that your customers demand. It also shows whether the majority of your production volume is made up of a small or wide variety of parts. The PQ analysis table enables employees to identify what products are suitable for one-piece-flow production. The P in PQ stands for products; the Q stands for the quantity of production output.
    Case example: Quick-Lite’s PQ analysis. Quick-Lite conducts a PQ analysis of its spark-plug final-assembly part numbers to see if a wide or limited variety of spark plugs makes up most of the volume. They find that six spark plugs make up 53.3% of the total volume. The manufacturing processes for these six spark plugs are likely candidates for one-piece-flow operations.

    Once the Quick-Lite team identifies these products in a PQ analysis table, they create a process route table to determine whether a similar technology is used to manufacture all six types of spark plugs.

  2. A process route table

    A process route table shows the machines and equipment required for processing a component or completing an assembly process. Such a table helps you to arrange your equipment in production lines according to product type and to group related manufacturing tasks into work cells. You can also use a process route table to analyze process, function, or task-level activities. The steps for creating a process route table are as follows:
    1. Somewhere above the top of the table, write the following:
    a. The name or number of the department whose activity is being analyzed.
    b. The operation or product that is being analyzed.
    c. The name of the person completing the form.
    d. The date on which the form is completed.
    2. Use the “No.” column on the left for the sequential numbering of the products or operations being analyzed.
    3. For each product or operation you are analyzing, enter the item name, machine number, or function.
    4. For each product or operation, enter circled numbers in the various resource columns that correspond to the sequence in which the resources are used for that product or operation.
    5. Connect the circled numbers with lines or arrows to indicate the sequence of operations. Once you have completed the table, look for items or products that follow the same, or nearly the same, sequence of machine and/or resource usage. You might be able to group these machines and/or resources together in the same work cells to improve the efficiency of your operations.
    Once your work team a) collects all the data necessary for selecting the products that are suitable for one-piece flow, b) verifies the operations needed and the available capacity, and c) understands the specific task in detail, you can implement the layout of your improved work cells and make one-piece flow a reality in your organization.

  3. Standard Operations

    A work combination is a mixture of people, processes, materials, and technology that comes together to enable the completion of a work process. The term standard operations refers to the most efficient work combination that a company can put together. When you apply all your knowledge of lean principles to a particular work process to make it as efficient as possible, the result is a standard operation. Employees then use this documented process as a guide to consistently perform the tasks in that work process. In addition, once you prepare standard operations for your work processes, they serve as the basis for all your organization’s training, performance monitoring, and continuous improvement activities.

    A big part of making your organization a lean enterprise is identifying different types of waste and finding ways to eliminate them. Ultimately, however, it is the correct combination of people, processes, materials, and technology that enables your organization to create quality products and services at the lowest possible operational cost. Putting together standard operations forces you to break down each of your work processes into definable elements. This enables you to readily identify waste, develop solutions to problems, and provide all employees with guidance about the best way to get things done. Many organizations that have used standard operations report that this lean initiative is the one that has had the biggest impact on their ability to produce better-quality products and services, make their workflow smoother, and make their training process more productive. In addition, standard operations enable employees to actually see waste that they previously didn’t see. The process for developing standard operations involves eight steps.

    1. Establish improvement teams.
    2. Determine your takt time.
    3. Determine your cycle time.
    4. Determine your work sequence.
    5. Determine the standard quantity of your work in progress.
    6. Prepare a standard workflow diagram.
    7. Prepare a standard operations sheet.
    8. Continuously improve your standard operations.

    Step 1: Establish improvement teams

    Some organizations take a top-down approach to the development of standard operations: supervisors alone determine what work tasks are to be performed, by whom, and when. Other organizations believe that only front-line workers should develop standard operations because these employees have keen insight into how things are done. But due to the nature of the steps required to establish standard operations, a team-based approach is best. It is best to have all employees who are impacted by a work process involved in the development of standard operations for that process. Lean organizations understand the need for complete buy-in and support of all work tasks by all the employees involved. It’s also important to coordinate this team effort with your organization’s other lean initiatives.

    Step 2: Determine your takt time

    Takt time is the total available work time per day (or shift), divided by customer-demand requirements per day (or shift). Takt time enables your organization to balance the pace of its production outputs to match the rate of customer demand. The mathematical formula for determining your takt time is as follows:
    takt time = available daily production time / required daily quantity of output
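
    As a rough sketch, the formula can be applied directly. The shift length, break times, and demand below are illustrative figures, not values from the text:

```python
def takt_time(available_time_s: float, required_output: int) -> float:
    """Takt time = available daily production time / required daily output."""
    return available_time_s / required_output

# Hypothetical shift: 8 hours minus two 15-minute breaks = 27,000 s available
available = 8 * 3600 - 2 * 15 * 60
demand = 450  # units required per shift
print(takt_time(available, demand))  # 60.0 -> one unit every 60 seconds
```

    If production paces itself to this number, output matches customer demand: faster means overproduction, slower means shortfall.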

    Step 3: Determine your cycle time

    Cycle time is the time it takes to successfully complete the tasks required for a work process. It is important to note that a work process’s cycle time may or may not equal its takt time. A process capacity table is a helpful tool for gathering information about the sequence of operations that make up a work process and the time required to complete each operation. Ultimately, the process capacity table can help you determine machine and operator capacity. Complete a process capacity table before you begin making changes such as moving equipment, changing the sequence of your operations, or moving employees’ positions and/or changing their job responsibilities. It is important to first know what your current capacity is and what it will be in the new process configuration that you plan.

    Steps for Creating a Process Capacity Table

    1. Enter the line/cell name.
    2. Record the total work time per shift.
    3. Enter the number of shifts.
    4. Record the maximum output per shift.
    5. Enter the sequence number of each processing step being performed on the part or product.
    6. Record the operation description, which is the process being performed on the part or product.
    7. Enter the number (if applicable) of the machine performing the process.
    8. Record the walk time, the approximate time required between the end of one process and the beginning of the next process.
    9. Enter the manual time, the time an operator must take to manually operate a machine when an automatic cycle is not activated. The manual time includes the time required to unload a finished part from the machine; load a new, unfinished part; and restart the machine.
    10. Record the automated time, the time required for a machine’s automatic cycle to perform an operation, from the point when the start button is activated to the point when the finished part is ready to be unloaded.
    11. Calculate the total cycle time by adding the manual time and the automated time.
    12. Enter the pieces per change, the total number of parts or products that a machine typically produces before its tool bits must be changed due to wear.
    13. Record the change time, the amount of time required to physically change a machine’s tool bits or perform a sample inspection. This is the time required to change tooling due to normal wear during a production run— not the changeover time required to go from making one part or product to making another.
    14. Calculate the time per piece, the change time divided by the pieces per change.
    15. Enter the production capacity per shift (also known as the total capacity). This is the total number of units that can be produced during the available hours per shift or per day.
    16. Record the takt time for the work process in the Takt Time box, using the mathematical formula shown earlier in this chapter.
    17. Calculate the total capacity of the process by adding the total cycle time (step 11) and the time per piece (step 14).
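
    The arithmetic behind steps 11, 14, and 15 can be sketched as follows. All of the figures are hypothetical and the variable names are illustrative:

```python
# Hypothetical figures for one operation in a process capacity table.
manual_time = 12.0       # s to unload, load, and restart the machine (step 9)
automated_time = 33.0    # s for the machine's automatic cycle (step 10)
pieces_per_change = 500  # parts produced before tool bits must be changed (step 12)
change_time = 300.0      # s required to change the tool bits (step 13)
shift_time = 27000.0     # s of available work time per shift (step 2)

total_cycle_time = manual_time + automated_time   # step 11
time_per_piece = change_time / pieces_per_change  # step 14
# Step 15: units producible per shift, spreading tool-change time over each piece.
capacity_per_shift = shift_time / (total_cycle_time + time_per_piece)

print(total_cycle_time, time_per_piece, int(capacity_per_shift))  # 45.0 0.6 592
```

    Comparing the resulting 592 units per shift against the required output shows at a glance whether this machine can meet takt time.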

    Step 4: Determine your work sequence

    A work sequence is the sequential order in which the tasks that make up a work process are performed. A work sequence provides employees with the correct order in which to perform their duties. This is especially important for multifunction operators who must perform tasks at various workstations within the takt time. A standard operations combination chart enables your improvement team to study the work sequence for all your organization’s work processes. In such a chart, each task is listed sequentially and broken down into manual, automated, wait, and walk times. Wait time is not included in a process capacity table because worker idle time has no impact on automated activities or the capacity of a process. However, wait time is included in a standard operations combination chart to identify idle time during which a worker could instead be performing other tasks, such as external setup, materials handling, or inspection. The goal is to eliminate all worker idle time.

    The steps for completing a standard operations combination chart are described below.

    1.  At the top of a form indicate the following:
      1. The date that the work process is being mapped.
      2. The number of pages (if the chart is more than one page long).
      3. The name of the equipment operator.
      4. The name of the person entering data on the form (if different from the operator).
      5. The number and/or name of the part or product being produced.
      6. The name of the process or activity being mapped.
      7. The machine number and/or name.
      8. The work cell number and/or name.
      9. The required output per designated period (e.g., parts per shift or pounds per day).
      10. The takt time for the process.
      11. The total capacity for the process. Ideally, this should equal the takt time that you calculated in step 2.
    2. Indicate the difference between the takt time and the cycle time for the work process.
    3. It is often helpful to indicate the units in which the work activity is measured. Activities are normally measured in seconds, but some are measured in minutes or even longer intervals.
    4. Number every fifth or tenth line on the graph area to facilitate your recording of activity times. Choose convenient time intervals so that either the takt time or the actual cycle time—whichever is greater—is located near the right side of the graph area.
    5. Draw a line that represents the activity’s takt time. Trace the line with red so it stands out.
    6. Sequentially number each operational step in the appropriate column. Steps can include any or all of the following:
      • Manual operations.
      • Automated operations.
      • Time spent walking from one location to another.
      • Time spent waiting.
    7. Provide a brief name and description for each step.
    8. Note the time required for the completion of each step in the appropriate column.
    9.  Draw a horizontal line on the graph representing each step, using the following guidelines:
      • The length of the line should equal the duration of the step.
      • The line type should match the action type (see the line key at the top of the sample chart).
      • Each line type should be in a different colour, which will make your chart much easier to read.
      • Each line you draw should begin at the point on the vertical timeline that corresponds to the actual time the activity begins. It should end at the actual time the activity ends.

    For example, if the first step of a work activity is an automatic hopper fill that takes fifteen seconds to complete, and the operator assembles a carton for ten seconds during that fifteen seconds, both steps would start at time zero, with the carton assembly ending at time ten and the automatic fill ending at time fifteen. However, if the operator waits until the automatic hopper fill is completed before assembling the carton, the fill would start at time zero and end at time fifteen, and the carton assembly would start at time fifteen and end at time twenty-five. Your completed standard operations combination chart should provide you with some useful insights, including the following:

    • If the total time to complete the process or activity equals the red takt-time line, you already have an efficient work combination in place.
    • If the total time required to complete the process or activity falls short of the red takt-time line, you might be able to add other operations to the activity to use your resources more effectively.
    • If the total time required to complete the process or activity is longer than the red takt-time line, there is waste in your process.

    Use the following steps to identify where this waste occurs:
    1. Look over the steps in your process to see if any of them can be compressed or eliminated. Perhaps one or more steps can be completed during periods when the equipment operator is waiting for automated operations to be completed.
    2. Look at the movement of employees and materials. Can you reduce or eliminate any of it by relocating supplies or equipment?
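
    The hopper-fill example above can be sketched as a small timing calculation. The helper function below is illustrative, not part of any standard chart format:

```python
def schedule(steps, parallel):
    """Compute (start, end) times for combination-chart steps.

    steps: list of (name, duration) pairs.
    parallel=True lets the manual step run during the automated cycle,
    as a standard operations combination chart is meant to reveal.
    """
    times = {}
    clock = 0
    for name, duration in steps:
        start = 0 if parallel else clock
        times[name] = (start, start + duration)
        clock = max(clock, start + duration)
    return times

steps = [("automatic hopper fill", 15), ("carton assembly", 10)]
print(schedule(steps, parallel=True))   # both start at 0; done at 15 s
print(schedule(steps, parallel=False))  # fill runs 0-15, assembly runs 15-25
```

    The parallel combination finishes in fifteen seconds; the sequential one takes twenty-five, exposing ten seconds of avoidable worker wait time.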

    Step 5: Determine the standard quantity of your work in progress

    The standard quantity of your work in progress (WIP) is the minimum amount of WIP inventory that must be held at or between your work processes. Without having this quantity of completed work on hand, it is impossible to synchronize your work operations.
    When determining the best standard quantity of WIP you should have, consider the following points:

    • Try to keep the quantity as small as possible.
    • Ensure that the quantity you choose is suitable to cover the time required for your error-proofing and quality-assurance activities.
    • Make sure that the quantity enables all employees to easily and safely handle parts and materials between work operations.

    Step 6: Prepare a standard workflow diagram

    A workflow diagram shows your organization’s current equipment layout and the movement of materials and workers during work processes. Such a diagram helps your improvement team plan future improvements to your organization, such as one-piece flow. The information in your workflow diagram supplements the information in your process capacity table and standard operations combination chart. When combined, the data in these three charts serve as a good basis for developing your standard operations sheet. The steps for completing a workflow diagram are described below.

    1. At the top of the diagram, indicate the following:
      a. The beginning and endpoints of the activity you are mapping.
      b. The date the activity is being mapped. The name of the person completing the diagram should also be included.
      c. The name and/or a number of the part or product being produced.
    2. Sketch the work location for the work process you are mapping, showing all of the facilities directly involved with the process.
    3. Indicate the work sequence by numbering the facilities in the order in which they are used during the activity.
    4. Connect the facility numbers with solid arrows and number them, starting with 1 and continuing to the highest number needed. Use solid arrows to indicate the direction of the workflow.
    5. Using a dashed arrow, connect the highest-numbered facility to facility number 1. This arrow indicates a return to the beginning of the production cycle.
    6. Place a diamond symbol (✧) at each facility that requires a quality check.
    7. Place a cross symbol (✝) at each facility where safety precautions or checks are required. Pay particular attention to facilities that include rotating parts, blades, or pinch points.
    8. Place an asterisk (* ) at each location where it is normal to accumulate standard WIP inventory. Adjacent to the asterisk, indicate the magnitude of the inventory— measured in number, weight, volume, and so on.
    9. Also, enter the total magnitude of the inventory in the “Number of WIP Pieces” box.
    10. Enter the takt time for the operation in the “Takt Time” box, calculated using the formula shown earlier in this chapter.
    11. Enter the time required to complete a single cycle of the activity in the “Cycle Time” box. Ideally, this time should equal the takt time.

    The workflow diagram provides a visual map of workspace organization, movement of materials and workers, and distances travelled—information not included in either the process capacity table or the standard operations combination chart. You can use this information to improve your workspace organization, re-sequence your work steps, and reposition your equipment, materials, and workers to shorten your cycle time and the overall travel distance. This will help you to achieve your takt time.

    Step 7: Prepare a standard operations sheet

    Numerous formats exist for standard operations sheets. In general, the layout for your sheet should include the components listed below:

    1. The header section should contain the following:
      • Process name
      • Part or product name
      • Takt time
      • Cycle time
      • Sign-offs
      • Approval date
      • Revision level
    2. The work sequence section should contain the following:
      • Sequence number
      • Description of task
      • Manual time
      • Automated time
      • Walk time
      • Inventory requirements
      • Key points
      • Safety precautions
      • Related job procedures
    3. The workflow diagram section should contain a pictorial representation of the work area.
    4. The footer section should contain the following:
      • Lean enterprise tools applied to the work process
      • Safety equipment required
      • Page indicator (for multiple-page standard operations sheets)

    Step 8: Continuously improve your standard operations

    After you complete your standard operations sheet, you should train all employees who are affected by your changes to the work process in question. Don’t be surprised if, during this training, employees discover potential opportunities for even greater improvement. It is through the continuous improvement of your standard operations that your organization can systematically drive out waste and reduce costs. You should review your organization’s standard operations sheet(s) on a periodic basis to ensure all employees are accurately complying with them.

  4. Quick Changeover

    Quick changeover is a method of analyzing your organization’s manufacturing processes and then reducing the materials, skilled resources, and time required for equipment setup, including the exchange of tools and dies. Using the quick-changeover method helps your production teams reduce downtime by improving the setup process for new product launches and product changeovers, as well as improving associated maintenance activities. There are many advantages to using the quick-changeover method. These include the following:

    • Members of your team can respond to changes in product demand more quickly.
    • Machine capacity is increased, which allows for greater production capacity.
    • Manufacturing errors are reduced.
    • Changeovers are made more safely.
    • You can reduce your inventory (and its associated costs) because it is no longer needed for extended downtimes.
    • Once you can make changeovers according to an established procedure, you can train additional operators to perform these tasks, which increases the flexibility of your organization.
    • Lead times are shortened, improving your organization’s competitive position in the marketplace.

    You use the PDCA cycle to make improvements to your setup and changeover processes. The procedure for implementing quick changeover involves the following steps:

    1. Evaluate your current processes. (Plan)

      a. Conduct an overview of your current production process to identify all equipment and processes that require downtime for changeover. Include all processes that require tooling replacement or new dies, patterns, moulds, paints, test equipment, filtration media, and so on.
      b. Collect data using a check sheet for each process. Make sure the check sheet includes information about the following:

      • Duration of the changeover. This is the time it takes from the start of the changeover process to its completion, including preparation and cleanup.
      • The amount of production typically lost during the changeover, including the number of units not produced, the number of hours of lost production time during which operators are not engaged in productive activities, and rework (measured in hours and units).
      • Process events that are constraint operations: these are operations that are long in duration or are critical to completing the manufacturing process.

      c. Create a matrix diagram to display this data for each production process (categories might include setup time, resources and materials required, and changeover time).

      d. Select a process as your target for improvement. A good process to choose is one that has a long downtime, setup time, and/or changeover time; is a frequent source of errors or safety concerns; or is critical to process output. A constraint operation that requires a changeover during your production operations is often a good first target to select. Choose no more than three targets to work on at one time.

    2. Document all the current changeover activities for the process you have selected. (Plan)

      a. Make a checklist of all the parts and steps required in the current changeover, including the following:

      • Names
      • Specifications
      • Numeric values for all measurements and dimensions
      • Part numbers
      • Special settings

      b. Identify any waste or problems associated with your current changeover activities.

      c. Record the duration of each activity. See the sample data sheet below.

      d. Create a graph of your current changeover time (in seconds) to establish a baseline for improvement.
      e. Set your improvement target. A target of a 50% reduction is recommended.

    3. Identify internal and external process activities. (Plan)

      a. Create two categories on your checklist: one for internal processes, and one for external processes.
      b. List each task under the appropriate category, making sure to keep the tasks in the correct sequence.

    4. Turn as many internal processes as possible into external processes. (Plan)

      Using your checklist, complete the following steps:
      a. Identify the activities that employees currently perform while the line or process is idle that can be performed while it is still running.
      b. Identify ways to prepare in advance any operating conditions that must be in place while the line is running (e.g., preheating equipment).
      c. Standardize parts and tools that are required for the changeover process, including the following:

      • Dimensions.
      • Securing devices used.
      • Methods of locating and centring objects.
      • Methods of expelling and clamping objects.
    5. Streamline the process. (Plan)

      a. Use visual management techniques to organize your workplace.
      b. Consider ways to error-proof the process.
      c. Consider ways to eliminate unnecessary delays in your internal processes by doing the following:

      • Identifying the activities that can be done concurrently by multiple employees.
      • Using signals, such as buzzers or whistles, to cue operators.
      • Using one-turn, one-motion, or interlocking methods.

      d. Consider ways to eliminate unnecessary delays in your external processes by making improvements in the following:

      • Storage and transportation of parts and tools.
      • Automation methods.
      • Accessibility of resources.

      e. Create a new process map showing your proposed changes to the setup process.

    6. Test your proposed changes to the process. (Do)

      a. Consider the feasibility of each proposed change.
      b. Prepare and check all materials and tools required for the changeover. Make sure they are where they should be and that they are in good working order.
      c. Perform your revised setup activities for the parts and tools. Adjust settings, calibrate equipment, set checkpoints, and so on, as required.
      d. Perform a trial run of your proposed changes.
      e. Collect data on the duration of the setup time, and update your changeover improvement chart.

    7. Evaluate the results of your changes. (Check)

      Take a look at the results of the changes you have made. Did the results meet your target goal? If so, go on to step 8. If not, make adjustments or consider other ways in which you can streamline your changeover activities and make the process external.

    8. Implement your new quick-changeover process and continue to work to improve it. (Act)

      • Document the new procedures and train all involved employees on the new procedures.
      • Continue to collect data for continuous improvement of the changeover process.
      • Create a revised matrix diagram of the change processes and begin the quick changeover process again.

Cellular Manufacturing

Cellular manufacturing is a method of producing similar products using cells, or groups of team members, workstations, or equipment, to facilitate operations by eliminating setup and unneeded costs between operations. Cells might be designed for a specific process, part, or complete product. They are well suited to single-piece and one-touch production methods, in the office or on the factory floor. Because of increased speed and the minimal handling of materials, cells can result in great cost and time savings and reduced inventory. Cellular design often uses group technology, which studies a large number of components and separates them into groups with like characteristics, sometimes with a computer’s help, and which requires the coding and classification of parts and operations. Cellular design also uses families-of-parts processing, which groups components by shape and size to be manufactured by the same people, tools, and machines with little change to process or setup. Regardless of the cell design (straight line, U-shape, or other), the equipment in the cell is placed very close together to save space and time. The handling of materials can be by hand, conveyor, or robot. When robots or conveyors are used, a cell supervisory computer must control movement between the pieces of equipment and the conveyor.

The Definition of a Cell

A cell is a combination of people, equipment, and workstations organized in the order of process flow, to manufacture all or part of a production unit. I make little distinction between a cell and what is sometimes called a flow line. However, the implication of a cell is that it:

  • Has one-piece, or a very small lot, flow
  • Is often used for a family of products
  • Has equipment that is right-sized and very specific for this cell
  • Is usually arranged in a C or U shape so the incoming raw materials and outgoing finished goods are easily monitored
  • Has cross-trained people for flexibility

Objectives of cellular manufacturing:

  • To shorten manufacturing lead times by reducing setup, work part handling, waiting times, and batch sizes.
  • To reduce Work in Process (WIP) inventory. Smaller batch sizes and shorter lead times reduce work-in-process.
  • To improve quality. Accomplished by allowing each cell to specialize in producing a smaller number of different parts. This reduces process variability.
  • To simplify production scheduling. Instead of scheduling parts through a sequence of machines in a process-type shop layout, the system simply schedules the parts through the cell.
  • To reduce setup times. Accomplished by using group tooling (cutting tools, jigs, and fixtures) that has been designed to process the part family, rather than part tooling, which is designed for an individual part. This reduces the number of individual tools required as well as the time to change tooling between parts.

Steps to Implement Cell Manufacturing

After you’ve mapped your value streams, you are ready to set up continuous flow manufacturing cells. Most cells that have been set up in the past ten years do not have continuous flow; most changes to cells have been a layout change only. That is, machines were moved into a cellular arrangement and nothing more was changed. A change in layout alone does not create continuous flow. This article will discuss seven steps to creating continuous flow manufacturing cells.

  1. Decide which products or product families will go into your cells, and determine the type of cell: Product-focused or Group Technology (mixed model). For product-focused cells to work correctly, demand needs to be high enough for an individual product. For mixed model or group technology cells to work, changeover times must be kept short.
  2. Calculate takt time. Takt time, often mistaken for cycle time, is not dependent on your productivity; it is a measure of customer demand expressed in units of time:

Takt Time = Available work-time per shift / Customer demand per shift

Ex: Work time/Shift = 27,600 seconds

Demand/Shift = 690 units

Takt Time = 27,600/690 = 40 sec.

The customer demands one unit every 40 seconds. What if your demand is unpredictable and relatively low volume? Typically, demand is unpredictable; however, aggregate demand (that is, the demand for a group of products that would run through a cell) is much more predictable. Takt time should generally not be adjusted more than monthly. Furthermore, holding finished goods inventory will help in handling fluctuating demand.

  3. Determine the work elements and time required for making one piece. Document in detail all of the actual work that goes into making one unit. Time each element separately several times and use the lowest repeatable time. Do not include wasteful elements such as walking and waiting time.
  4. Determine if your equipment can meet takt time. Using a spreadsheet, determine whether each piece of equipment that will be required for the cell you are setting up is capable of meeting takt time.
  5. Create a lean layout. More than likely, you will have more than one person working in your cell (this depends on takt time); however, you should arrange the cell such that one person can do it. This will ensure that the least possible space is consumed. Less space translates to less walking, movement of parts, and waste. U-shaped cells are generally best; however, if this is impossible due to factory floor limitations, other shapes will do. For example, I have implemented S-shaped cells in areas where a large U-shape is physically impossible.
  6. Balance the cell. This involves determining how many operators are needed to meet takt time.

Number of Operators = Total Work content / Takt time

Ex.: Total work content: 49 minutes

Takt time: 12 minutes

Number of operators: 49/12 = 4.08 (4 operators)

If there is a remainder term, it may be necessary to kaizen the process and reduce the work content. Other possibilities include moving operations to the supplying process to balance the line. For example, one of my clients moved simple assembly operations from their assembly line to their injection moulding operation to reduce work content and balance the line.
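
The operator-balancing formula above can be sketched as a short calculation. The figures are the ones from the worked example; the function name is illustrative:

```python
import math

def balance_cell(total_work_content: float, takt_time: float):
    """Number of operators = total work content / takt time.

    Returns the raw ratio, the whole number of operators, and the
    fractional remainder that kaizen should try to eliminate.
    """
    raw = total_work_content / takt_time
    operators = math.floor(raw)
    remainder = raw - operators
    return raw, operators, remainder

# Worked example from the text: 49 minutes of work, 12-minute takt time.
raw, operators, remainder = balance_cell(49, 12)
print(round(raw, 2), operators)  # 4.08 4
```

A non-zero remainder signals an unbalanced line: either reduce the work content through kaizen or, as in the injection-moulding example, shift operations to a supplying process.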

  7. Determine how the work will be divided among the operators. There are several approaches. Some include:
  • Splitting the work evenly between operators
  • Having one operator perform all the elements to make a complete circuit of the cell in the direction of material flow
  • Reversing the above
  • Combinations of the above

After you’ve worked through the seven steps above, you will have gathered much of the data required to begin drawing and laying out your continuous flow manufacturing cell.



Introduction to Kaizen

Kaizen is a Japanese management strategy that means “change for the better” or “continuous slow improvement,” a belief that all aspects of life should be constantly improved. It comes from the Japanese words “kai,” meaning change, and “zen,” meaning better. The Japanese way encourages small improvements day after day, continuously. The key aspect of kaizen is that it is an ongoing, never-ending improvement process. It is a soft and gradual method, as opposed to the more usual Western habit of scrapping everything and starting anew. In Japan, where the concept originated, kaizen applies to all aspects of life, not just the workplace.

Kaizen was originally used to describe a key element of the Toyota Production System that means “making things the way they should be” according to the basic, sensible principles of profitable industrial engineering. It means creating an atmosphere of continuous improvement by changing your view, your methods, and your way of thinking to make something better. In use, kaizen describes an environment where companies and individuals proactively work to improve the manufacturing process. The kaizen system is based on incremental innovation: employees are encouraged to make small changes in their work area on an ongoing basis. The cumulative effect of all these little changes over time can be quite significant, especially if all of the employees within a company and its leaders are committed to this philosophy. Improvements are usually accomplished at little or no expense, without sophisticated techniques or expensive equipment. Instead of sinking more money into buying machinery, kaizen steers an organization toward paying attention to small but significant details. Managers are encouraged to improve the efficiency of existing infrastructure instead of investing in more of the same. Kaizen focuses on simplification by breaking down complex processes into their subprocesses and then improving them.
The driving force behind kaizen is dissatisfaction with the status quo, no matter how good the firm is perceived to be. Standing still will allow the competition to overtake and pass any complacent firm. The act of being creative to solve a problem or make an improvement not only educates people but also inspires them to go further. The fundamental idea behind kaizen comes straight from Deming’s PDCA cycle:

  • Someone has an idea for doing the job better (Plan)
  • Experiments are conducted to investigate the idea (Do)
  • The results are evaluated to determine whether the idea produced the desired result (Check)
  • If so, the standard operating procedures are changed (Act)

Kaizen is a system that involves every employee, from upper management to the cleaning crew. Everyone is encouraged to come up with small improvement suggestions on a regular basis. In the first stage, management should make every effort to help the workers provide suggestions, no matter how primitive, for the improvement of the worker’s job and the workshop. This will help the workers look at the way they are doing their jobs. In the second stage, management should stress employee education so that employees can provide better suggestions. To enable workers to provide better suggestions, they should be equipped to analyze problems and the environment. This requires education. Main subjects for suggestions are, in order of importance:

  • Improvement in one’s own work
  • Savings in energy, material, and other resources
  • Improvement in the working environment
  • Improvements in machines and processes
  • Improvements in tools
  • Improvements in office work
  • Improvements in product quality
  • Ideas for new products
  • Customer services and customers relations
  • Others

Kaizen is based on making changes anywhere improvements can be made. Western philosophy may be summarized as, “if it ain’t broke, don’t fix it.” The kaizen philosophy is to “do it better, make it better, improve it even if it isn’t broken, because if we don’t, we can’t compete with those who do.” For example, Toyota is well known as one of the leaders in using kaizen. In 1999 at one U.S. plant, 7,000 Toyota employees submitted over 75,000 suggestions, 99% of which were implemented.

Philosophy of kaizen:

Kaizen is one of the most commonly used words in Japan. It is used not only in the workplace but in popular culture as well. Kaizen is a foundation on which companies are built. Kaizen is such a natural way for people in Japan to think that managers and workers often do not make a conscious effort to think “kaizen.” They just think the way they think, and that way happens to be kaizen! If you are aware of the kaizen philosophy and strive to implement it, not a day should go by without some kind of improvement being made somewhere in the company. After WWII most Japanese companies had to start over. Every day brought new challenges, and rising to those challenges resulted in progress. Simply staying in business required a step forward every day, and this made kaizen a way of life.

  1. Constant Improvement

    In any business, management creates standards that employees must follow to perform the job. In Japan, maintaining and improving standards is the main goal of management. If you improve standards, it means you then establish higher standards which you observe, maintain, and later try to improve upon. This is an unending process. If you do not maintain the standard, it is bound to slip back, giving it the “two steps forward, one step back” effect. Lasting improvement is achieved only when people work to higher standards. For this reason, maintenance and improvement go hand in hand for Japanese managers. Generally speaking, the higher up the manager is, the more he should be concerned with improvement. At the bottom level, an unskilled laborer may spend the day simply following instructions. However, as he becomes better at his job, he begins to think about ways to improve, or make his job easier. In doing this, he finds ways to make his work more efficient, thus adding to overall improvement within the company. The value of improvement is obvious. In business, whenever improvements are made, they will eventually lead to better quality and productivity. Improvement is a process. The process starts with recognizing a need, and the need becomes apparent when you recognize a problem. Kaizen puts an emphasis on problem-awareness and will lead you to the identification of problems.
    According to Bicheno, kaizen or CI can be classified into five different improvement types: passive incremental, passive breakthrough, enforced incremental, enforced breakthrough, and blitz.

    1. Passive Incremental
      Passive Incremental improvements can come from a suggestion scheme, with or without rewards and with or without a team emphasis. A team-based example of passive incremental improvement is the quality circle. According to Bicheno, non-acknowledgement and non-recognition have probably been the major reasons for suggestion schemes producing poor results and being abandoned.

    2. Passive Breakthrough
      Passive Breakthroughs normally spring from traditional industrial engineering and work study projects, particularly if the initiative is left to the industrial engineering or work study department (Bicheno). According to Bicheno, passive breakthroughs have probably been the greatest source of productivity improvement over the past 100 years; he describes this as the classic improvement method of industrial engineering, one that has been around for many years.
    3. Enforced Incremental
      Enforced Incremental improvement is driven waste elimination, and is thereby not left solely to the chance of operator initiative. Examples of drivers are response analysis, line stop, inventory withdrawal, waste checklists, and the stage 1/stage 2 cycle. It is about setting up a culture that drives improvement, which constantly opens up new opportunities for another improvement activity (Bicheno).
    4. Enforced Breakthrough
      Enforced Breakthroughs can be industrial engineering activities, for example initiated by management or by crisis. They are driven by active value stream current- and future-state mapping, which generally targets the complete value stream and is followed up by action review cycles and an action plan or master schedule (Bicheno).
    5. Blitz
      Blitz or kaizen events are a combination of Enforced Incremental and Enforced Breakthrough. They are breakthrough because typical blitz events achieve between 25% and 70% improvement within a week, or within a month at most. On the other hand, they are incremental because blitz events typically relate to small areas, so they are more point kaizen (local area) than flow kaizen (full value stream). They are enforced because the expectations and opportunities are in place (Bicheno). According to Bicheno, a blitz event is not necessarily continuous improvement if seen as an isolated event; blitz events should be repeated in the same area at regular intervals, as products, priorities, people, and technology change.
  2. Problem Solving

    Where there are no problems, there is no potential for improvement. When you recognize that a problem exists, Kaizen is already working. The real issue is that the people who create the problem are often not directly inconvenienced by it, and thus tend not to be sensitive to the problem. In day-to-day management situations, the first instinct is to hide or ignore the problem rather than to correct it. This happens because a problem is … well, a problem! By nature, nobody wants to be accused of having created a problem. However, if you think positively, you can turn each problem into a valuable opportunity for improvement. So, according to the Kaizen philosophy, when you identify a problem, you must solve that problem. Once you solve a problem, you, in essence, surpass a previously set standard. This results in the need to set a new, higher standard, and is the basis of the Kaizen concept.

  3. Standardization

    If you don’t first set a standard, you can never improve upon that standard. There must be a precise standard of measurement for every worker, every machine, every process and even every manager. To follow the Kaizen strategy means to make constant efforts to improve upon a standard. For Kaizen, standards exist only to be surpassed by better standards. Kaizen is really based on constant upgrading and revision. Not everything in a process or work environment needs to be measurable and standardized. Sometimes, Japanese factories use a one-point standardization. Each worker performs many tasks, but only one of those tasks needs to be standardized. This one-point standard is often displayed in the workplace so that the worker is always mindful of it. After the standard is followed for a while, it becomes second nature to perform the task to meet the standard. At that point, another standard can be added. Standardization is a way of spreading the benefits of improvement throughout the organization. In a disciplined environment, everyone, including management, is mindful of those standards.

  4. The Suggestion System

    Kaizen covers every part of a business. From the tasks of laborers to the maintenance of machinery and facilities, Kaizen has a role to play. All improvements will eventually have a positive effect on systems and procedures. Many top Japanese executives believe that Kaizen is 50 percent of management’s job, and really, Kaizen is everybody’s job! It is important for management to understand the workers’ role in Kaizen, and to support it completely. One of the main vehicles for involving all employees in Kaizen is the suggestion system. The suggestion system does not always provide immediate economic payback, but is looked at as more of a morale booster. Morale can be improved through Kaizen activities because they get everyone involved in solving problems. In many Japanese companies, the number of suggestions made by each worker is looked at as a reflection of the supervisor’s Kaizen efforts. It is a goal of managers and supervisors to come up with ways to help generate more suggestions by the workers. Management is willing to give recognition to employees for making efforts to improve, and they try to make this recognition visible. Often, the number of suggestions is posted individually on the wall of the workplace in order to encourage competition among workers and among groups. A typical Japanese plant has a space reserved in the corner of each workshop for publicizing activities going on in the workplace. Some of the space might be reserved for signs indicating the number of suggestions made by workers or groups, or even for posting the actual suggestions. Another example would be to display a tool that has been improved as a result of a worker’s suggestion. By displaying these sorts of improvements, workers in other work areas can adopt the same improvement ideas. Displaying goals, recognition, and suggestions helps to improve communication and boost morale.
Kaizen begins when the worker adopts a positive attitude toward changing and improving the way he works. Each suggestion leads to a revised standard, and since the new standard has been set by a worker’s own volition, he takes pride in the new standard and is willing to follow it. If, on the contrary, he is told to follow a standard imposed by management, he may not be as willing to follow it. Thus, through suggestions, employees can participate in Kaizen in the workplace and play an important role in upgrading standards. Japanese managers are more willing to go along with a change if it contributes to any of the following goals:

    • Making the job easier
    • Making the job more productive
    • Removing drudgery from the job
    •  Improving product quality
    • Removing nuisance from the job
    • Saving time and cost
    • Making the job safer
  5. Process-Oriented Thinking

    Another change you will notice with Kaizen is that it generates a process oriented way of thinking. This happens because processes must be improved before you get improved results. In addition to being process oriented, Kaizen is also people-oriented, since it is directed at people’s efforts.  In Japan, the process is considered to be just as important as the intended result.  A process-oriented manager should be people-oriented and have a reward system based on the following factors:

    • Discipline
    • Participation and involvement
    • Time management
    • Morale
    • Skill development
    • Communication
  6. Kaizen vs. Innovation

    Kaizen vs. innovation could be referred to as the gradualist approach vs. the great-leap-forward approach. Japanese companies generally favor the gradualist approach, and Western companies favor the great-leap approach, which is an approach epitomized by the term innovation. Innovation is characterized by major changes in the wake of technological breakthroughs, or the introduction of the latest management concepts or production techniques. Kaizen, on the other hand, is undramatic and subtle, and its results are seldom immediately visible. Kaizen is continuous while innovation is a one-shot phenomenon. Further, innovation is technology- and money-oriented whereas Kaizen is people-oriented. Kaizen does not call for a large investment to implement it, but it does call for a great deal of continuous effort and commitment. To implement Kaizen, you need only simple, conventional techniques. Often, common sense is all that is needed. On the other hand, innovation usually requires highly sophisticated technology, as well as a huge investment. Often, innovation does not bring the staircase effect, however, because it lacks the Kaizen strategy to go along with it. Once a new system has been installed as a result of innovation, it is subject to steady deterioration unless continuing efforts are made to first maintain it and then improve on it. There is no such thing as static or constant. The worst companies are those that do nothing but maintenance (no internal drive for Kaizen OR innovation). Improvement by definition is slow, gradual, and often invisible, with effects that are felt over the long run. In a slow-growth economy, Kaizen often has a better payoff than innovation does. For example: it’s difficult to increase sales by 10%, but it’s not so difficult to cut manufacturing costs by 10%. Kaizen requires virtually everyone’s personal efforts and the knowledge that with that effort and time, improvements will be made.
Management must make a conscious and continuous effort to support it. It requires a substantial management commitment of time and effort. Investing in Kaizen means investing people, not capital.

  7. Management Support of Kaizen

    If the benefits of Kaizen come gradually, and its effects are felt only on a long-term basis, it is obvious that Kaizen can thrive only under top management that has a genuine concern for the long-term health of the company. One of the major differences between Japanese and Western management styles is their time frames. Japanese management has a long-term perspective and Western managers tend to look for shorter-term results. Unless top management is determined to introduce Kaizen as a top priority, any effort to introduce Kaizen to the company will be short lived. Kaizen starts with the identification of problems. In the Western hire-and -fire environment, identification of a problem often means a negative performance review and may even carry the risk of dismissal. Superiors are busy finding fault with subordinates, and subordinates are busy covering up problems. Changing the corporate culture to accommodate and foster Kaizen – to encourage everybody to admit problems and to work out plans for their solution – will require sweeping changes in personnel practices and the way people work with each other. Kaizen’s introduction and direction must be top-down, but the suggestions for Kaizen should be bottom up, since the best suggestions for improvement usually come from those closest to the problem. Western Management will be required to introduce process-oriented criteria at every level, which will necessitate company-wide retraining programs as well as restructuring of the planning and control systems. The benefits of Kaizen are obvious to those who have introduced it. Kaizen leads to improved quality and greater productivity. Where Kaizen is introduced for the first time, management may easily see productivity increase by 30 percent, 50 percent and even 100 percent and more, all without any major capital investments. Kaizen helps lower the breakeven point. 
It helps management to become more attentive to customer needs and build a system that takes customer requirements into account. The Kaizen strategy strives to give undivided attention to both process and result. It is the effort that counts when we are talking about process improvement, and management should develop a system that rewards the efforts of both workers and managers, not just the recognition of results. Kaizen does not replace or preclude innovation. Rather, the two are complementary. Ideally, innovation should take off after Kaizen has been exhausted, and Kaizen should follow as soon as innovation is initiated. Kaizen and innovation are inseparable ingredients in progress. The Kaizen concept is valid not only in Japan, but in other countries. All people have an instinctive desire to improve themselves. Although it is true that cultural factors affect an individual’s behavior, it is also true that the individual’s behavior can be measured and affected through a series of factors or processes. Thus, it is always possible, regardless of the culture, to break behavior down into processes and to establish control points and check points. This is why such management tools as decision-making and problem-solving have a universal validity.

Kaizen -The three pillars

According to M. Imai, a guru in these management philosophies and practices, the three pillars of kaizen are summarized as follows:

  1. Housekeeping
  2. Waste elimination
  3. Standardization

As he states, management and employees must work together to fulfill the requirements of each category. To ensure the success of activities on these three pillars, three factors also have to be taken into account:

  1. Visual management,
  2. The role of the supervisor,
  3. The importance of training and creating a learning organization.

Each pillar of Kaizen in more detail:

  1. Housekeeping

    This is a process of managing the workplace for improvement purposes. Imai introduced the Japanese word “gemba,” which means “real place”: the place where value is added to the products or services before they pass to the next process.
    For proper housekeeping, a valuable tool or methodology is used: the 5S methodology. The term “Five S” is derived from the first letters of Japanese words referring to five practices leading to a clean and manageable work area: seiri (organization), seiton (tidiness), seiso (purity), seiketsu (cleanliness), and shitsuke (discipline). The English equivalents of the 5S’s are sort, straighten, sweep, sanitize, and sustain. 5S evaluations provide measurable insight into the orderliness of a work area, and there are checklists for manufacturing and nonmanufacturing areas that cover an array of criteria such as cleanliness, safety, and ergonomics. A Five S evaluation contributes to how employees feel about the product, the company, and themselves, and today it has become essential for any company engaged in manufacturing to practice the 5S’s in order to be recognized as a manufacturer of world-class status.

    1. Seiri: SORT what is not needed. Use the red tag system of tagging items considered not needed, then give everyone a chance to indicate if the items really are needed. Any red-tagged item for which no one identifies a need is eliminated (sell to an employee, sell to a scrap dealer, give away, put into the trash).
    2. Seiton: STRAIGHTEN what must be kept. Make things visible. Put tools on a peg board and outline each tool so its location can be readily identified. Apply the saying “a place for everything, and everything in its place.”
    3. Seiso: SCRUB everything that remains. Clean and paint to provide a pleasing appearance.
    4. Seiketsu: SPREAD the clean/check routine. When others see the improvements in the Kaizen area, give them the training and the time to improve their work area.
    5. Shitsuke: SUSTAIN through self-discipline. Establish a cleaning schedule. Use downtime to clean and straighten the area.

    Some of the benefits to employees of practicing the five S’s are as follows:
    It creates clean, sanitary, pleasant, and safe working environments; it revitalizes Gemba and greatly improves employee morale and motivation; it eliminates various kinds of waste by minimizing the need to search for tools, making the operators’ jobs easier, reducing physically strenuous work, and freeing up space; and it creates a sense of belonging and love for the place of work among the employees.
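The 5S evaluations mentioned above rate a work area against checklists for each practice. A minimal, hypothetical sketch of such a scorer is shown below; the 0-4 rating scale, the checklist items, and the averaging scheme are all assumptions for illustration, not a standard audit format.

```python
# Hypothetical sketch of a 5S audit scorer: each practice is rated with a
# list of 0-4 checklist scores, and an average score is reported per S.
FIVE_S = ["sort", "straighten", "sweep", "sanitize", "sustain"]

def score_area(ratings):
    """ratings: dict mapping each S to a list of 0-4 checklist scores."""
    summary = {}
    for s in FIVE_S:
        checks = ratings.get(s, [])
        summary[s] = round(sum(checks) / len(checks), 2) if checks else 0.0
    summary["overall"] = round(sum(summary[s] for s in FIVE_S) / len(FIVE_S), 2)
    return summary

# Example audit of one work area (scores are invented for illustration).
audit = {
    "sort": [4, 3, 4],        # red-tagged items removed, only needed tools present
    "straighten": [3, 3],     # tools outlined on peg board, locations labeled
    "sweep": [4, 4, 3],       # floors and machines cleaned
    "sanitize": [2, 3],       # clean/check routine spread to adjacent areas
    "sustain": [3],           # cleaning schedule followed
}
print(score_area(audit))
```

Posting such per-S scores in the workshop corner, alongside suggestion counts, fits the visual-management practice the text describes.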

  2. Waste (Muda) elimination

    Muda in Japanese means waste. The resources at each process — people and machines — either add value or do not, and any non-value-adding activity is classified as muda in Japan. Work is a series of value-adding activities, starting from raw materials and ending with a final product. Muda is any non-value-added task. To give some examples, muda in both manufacturing and office settings is described below:

    Muda in Manufacturing

    • Shipping defective parts
    • Waiting for inspection
    • Walking and transporting parts
    • Overproduction
    • Excess inventory, which hides problems

    Muda in Office

    • Passing on work that contains errors
    • Signature approvals, bureaucracy
    • Walking or routing documents
    • Copies, files, a lot of papers
    • Excess documentation

    The aim is to eliminate the seven types of waste caused by overproduction, waiting, transportation, unnecessary stock, overprocessing, motion, and defective parts, presented as follows:

    1. Overproduction – Producing more than the production schedule requires
    2. Inventory – Too much material ahead of process hides problems
    3. Defects – Material and labor are wasted; capacity is lost at bottleneck
    4. Motion – Walking to get parts because of space taken by high WIP
    5. Processing – Protecting parts for transport to another process
    6. Waiting – Poor balance of work; operator attention time
    7. Transportation – Long moves; re-stacking; pick up/put down

    So muda (waste) elimination will cover the categories described as follows:

    1. Muda of overproduction. Overproduction may arise from fear of a machine’s failure, rejects, or employee absenteeism. Unfortunately, trying to get ahead of production can result in tremendous waste: consumption of raw materials before they are needed, wasteful input of manpower and utilities, additions of machinery, increased interest burdens, additional space to store excess inventory, and added transportation and administrative costs.
    2. Muda of inventory. Final products, semi-finished products, or part supplies kept in inventory do not add any value. Rather, they add to the cost of operations by occupying space and requiring additional equipment and facilities such as warehouses, forklifts, and computerized conveyor systems. Also, the products deteriorate in quality and may even become obsolete overnight when the market changes, competitors introduce a new product, or customers change their tastes and needs. Warehouses further require additional manpower for operation and administration. Excess items stay in inventory and gather dust (no value added), and their quality deteriorates over time. They are even at risk of damage through fire or disaster. A just-in-time (JIT) production system helps to solve this problem.
    3. Muda of defects (repair or rejects). Rejects interrupt production, require rework, and cause a great waste of resources and effort. Rejects increase inspection work, require additional time to repair, require workers to always stand by to stop the machines, and, of course, increase paperwork.
    4. Muda of motion. Any motion of a person not directly related to adding value is unproductive. Workers should avoid walking, lifting, or carrying heavy objects that require great physical exertion, because it is difficult, risky, and represents non-value-added activity. Rearranging the workplace would eliminate unnecessary human movement and eliminate the requirement for another operator to lift the heavy objects. Analysis of operators’ leg and hand motions in performing their work will help companies understand what needs to be done.
    5. Muda of processing. There are many ways that muda can happen in processing. For example, failure to synchronize processes and bottlenecks create muda, which can be eliminated by redesigning the assembly lines to utilize less input to produce the same output. Input here refers to resources, utilities, and materials. Output means items such as products, services, yield, and added value. Reduce the number of people on the line; the fewer line employees, the better. Fewer employees will reduce potential mistakes, and thus create fewer quality problems. This does not mean that we need to dismiss our employees. There are many ways to use former line employees on Kaizen activities, i.e., on value-adding activities. When productivity goes up, costs will go down. In manufacturing, a longer production line requires more workers, more work-in-process, and a longer lead time. More workers also means a higher possibility of making mistakes, which leads to quality problems. More workers and a longer lead time will also increase the cost of operations. Machines that go down interrupt production. Unreliable machinery necessitates batch production, extra work-in-process, extra inventory, and extra repair efforts. A newly hired employee without proper training to handle the equipment can delay operations just as much as if the equipment were down. Eventually, quality will suffer, and all these factors increase operating costs.
    6. Muda of waiting. Muda of waiting occurs when the hands of the operator are idle; when an operator’s work is put on hold because of line imbalances, a lack of parts, or machine downtime; or when the operator is simply monitoring a machine as the machine performs a value-adding job. Watching the machine and waiting for parts to arrive are both muda, and they waste seconds and minutes. Lead time begins when the company pays for its raw materials and supplies, and ends when the company receives payment from customers for products sold. Thus, lead time represents the turnover of money. A shorter lead time means better use of resources, more flexibility in meeting customer needs, and a lower cost of operations. Muda elimination in this area presents a golden opportunity for Kaizen. There are many ways to cut lead time: improving and speeding up feedback on customer orders, having closer communications with suppliers, and streamlining and increasing the flexibility of Gemba operations. Another common type of muda in this category is time: materials, products, information, and documentation sit in one place without adding value. On the production floor, temporary muda takes the form of inventory. In office work, it happens when documents or pieces of information sit on a desk, in trays, or on computer disks waiting to be analysed, or for a decision or a signature.
    7. Muda of transportation. In the workplace (gemba), one notices all sorts of transport by such means as trucks, forklifts, and conveyors. Transportation is an essential part of operations, but moving materials or products adds no value. Even worse, damage often occurs during transport. To avoid muda, any process that is physically distant from the main line should be incorporated into the line as much as possible. Because eliminating muda costs nothing, muda elimination is one of the easiest ways for a company to improve its Gemba operations.
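A first step in eliminating these seven wastes is simply tallying where they occur. The sketch below is a hypothetical illustration of such a tally from a shop-floor observation log; the log entries and the minutes-lost measure are invented for the example, not a prescribed method.

```python
# Hypothetical sketch: tallying observed waste by the seven muda categories
# from a shop-floor observation log, to show where elimination effort pays off.
from collections import Counter

MUDA = {"overproduction", "inventory", "defects", "motion",
        "processing", "waiting", "transportation"}

def tally_waste(observations):
    """observations: list of (category, minutes_lost) tuples."""
    totals = Counter()
    for category, minutes in observations:
        if category not in MUDA:
            raise ValueError(f"unknown muda category: {category}")
        totals[category] += minutes
    return totals.most_common()  # worst offenders first, Pareto-style

# Invented observation log for one shift.
log = [("waiting", 25), ("transportation", 10), ("waiting", 15),
       ("defects", 30), ("motion", 5)]
print(tally_waste(log))
```

Sorting the totals largest-first mirrors the Pareto Principle used in the PDCA steps later in this section: attack the biggest waste category first.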
  3. Standardization

    Standards are set by management, but they must be able to change when the environment changes. Companies can achieve dramatic improvement by reviewing the standards periodically, collecting and analysing data on defects, and encouraging teams to conduct problem-solving activities. Once the standards are in place and are being followed, then if there are deviations, the workers know that there is a problem. The employees then review the standards and either correct the deviation or advise management on changing and improving the standard. It is a never-ending process, and is better explained and presented by the PDCA cycle (plan-do-check-act), also known as the Deming cycle, shown below:
    Pick a project (Pareto Principle)
    Gather data (Histogram and Control Charts)
    Find cause (Process Flow Diagram and Cause/Effect Diagram)
    Pick likely causes (Pareto Principle and Scatter Diagrams)
    Try solution (Cause/Effect, “5W and 1H” methodology: who, what, why, when, where, how)
    Implement solution
    Monitor results (Pareto, Histograms, and Control Charts)
    Standardize on new process (Write standards, Train, Foolproof, Quality-At-The-Source[QUATS])
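The steps above can be grouped into the four PDCA phases. The sketch below is purely illustrative: the mapping of steps to phases is an assumption based on the list in the text, not a canonical assignment.

```python
# Minimal sketch grouping the problem-solving steps above into PDCA phases.
# The step names mirror the list in the text; the phase mapping is illustrative.
PDCA = {
    "Plan":  ["Pick a project", "Gather data", "Find cause", "Pick likely causes"],
    "Do":    ["Try solution", "Implement solution"],
    "Check": ["Monitor results"],
    "Act":   ["Standardize on new process"],
}

def run_cycle(cycle):
    """Yield (phase, step) pairs in order; one pass represents one PDCA turn."""
    for phase, steps in cycle.items():
        for step in steps:
            yield phase, step

for phase, step in run_cycle(PDCA):
    print(f"{phase:5s} -> {step}")
```

In practice the cycle repeats: after "Act" standardizes the new process, the next turn begins with a new "Plan" against the higher standard.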

    A successful PDCA cycle is then followed by the SDCA cycle, where ‘S’ stands for standardization and maintenance of the new situation. So, PDCA stands for improvement and SDCA stands for maintenance. The two cycles are combined and presented in the following figure.¹
    The standardization process is a very important one, with a few key features, presented below:

    • Represent the best, easiest, and safest way to do the job.
    • Offer the best way to preserve know-how and expertise.
    • Provide a way to measure performance.
    • Show the relationship between cause and effect.
    • Provide a basis for both maintenance and improvement.
    • Provide objectives and indicate training goals.
    • Provide a basis for training.
    • Create a basis for auditing or diagnosis.
    • Provide a means for preventing recurrence of errors and minimizing variability.

Types of Kaizen:

Types of Kaizen are based on the degree of the problem or issue. If you do not know the degree of the problem, you may take the wrong approach in implementing Kaizen, take unnecessary action, and waste time. Let’s look at the different types of Kaizen and how each is implemented.

  1. Small Kaizen

    Small Kaizen, or simple, quick Kaizen, is useful for solving small issues that exist in the workplace. Small Kaizen does not need many resources or much time to improve the situation. Many small issues that exist in the workplace are often ignored, as staff are used to working in such an environment and fail to recognize small problems/issues as “problems.” Note that hospitals that practice 5S very well and sustain their 5S activities are often unknowingly practicing small Kaizen. One of the effective ways of practicing small Kaizen is using a “Kaizen suggestion board.” Kaizen topics are usually discussed among Work Improvement Team (WIT) members.


    Kaizen activity starts with sensing and recognizing small issues/problems in your workplace. It is recommended to keep a “Kaizen Memo” as a record of small Kaizen activities: record the problems, the countermeasures taken, and the improvement achieved, together with pictures.
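A Kaizen Memo record like the one described above might be sketched as a simple data structure. The field names and the hospital example below are hypothetical, chosen to match the small-Kaizen description; the text prescribes no particular record format.

```python
# Hypothetical sketch of a "Kaizen Memo" record for small Kaizen activities:
# the problem found, the countermeasure taken, and the improvement achieved.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class KaizenMemo:
    problem: str
    countermeasure: str
    improvement: str
    recorded_on: date = field(default_factory=date.today)
    photos: list = field(default_factory=list)  # before/after picture file paths

# Invented example entry from a hospital ward practicing small Kaizen.
memo = KaizenMemo(
    problem="Gauze stock kept far from the dressing station",
    countermeasure="Moved stock shelf next to the station; labeled the bins",
    improvement="Walking time per dressing cut from ~3 min to under 1 min",
    photos=["before.jpg", "after.jpg"],
)
print(memo.problem)
```

Keeping such records lets a Work Improvement Team review what was tried, what worked, and what became the new standard.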

  2. Large Kaizen

    The large Kaizen approach is applied to solve complicated problems that need inputs and other resources. Large Kaizen requires adequate time to analyze the problem carefully, solve it, and prevent recurrences. One cycle of large Kaizen is usually 6 months, as shown in the diagram.

¹ Time spent on each step is dependent on data collection methods, the number of countermeasures to implement, and monitoring of progress.

Kaizen Events

Montabon’s definition of a kaizen event: “Kaizen events are essentially well structured, multi-day problem solving sessions involving a cross-functional team, who is empowered to use experimentation as they see fit to derive a solution”.
Van et al.’s definition of a kaizen event: “A kaizen event is a focused and structured improvement project, using a dedicated cross-functional team to improve a targeted work area, with specific goals, in an accelerated timeframe”.
First and foremost, CI, lean, and kaizen events are carried out by organizations through groups and individuals, so it is important to categorize the different ways of working in order to find the best approach for improving synergy levels. The framework for kaizen events is based on four areas: plan, implement, sustain, and support. Furthermore, it is constructed so it can be self-assessed, both to improve specific topics and to improve itself. The article by Van et al. concludes that “Use of the framework as a design and assessment tool appeared to make the kaizen events program more effective in the case study organization”.