Commonly Used Distribution  in Six Sigma

Some of the Commonly Used Distribution  used in Six Sigma are

1. Normal distribution:

The normal distribution has numerous applications. It is useful when it is equally likely that readings will fall above or below the average. When a sample of several random measurements is averaged, the distribution of such repeated sample averages tends to be normally distributed regardless of the distribution of the measurements being averaged. Mathematically, if


the distribution of (X) bar becomes normal as n increases. If the set of samples being averaged have the same mean and variance, then the mean of the (X) bar is equal to the mean (µ) of the individual measurements, and the variance of the(X) bar is:


Where σx2, is the variance of the individual variables being averaged. The tendency of sums and averages of independent observations, from populations with finite variances, to become normally distributed as the number of variables being summed or averaged becomes large is known as the central limit theorem. For distributions with little skewness, summing or averaging as few as 3 or 4 variables will result in a normal distribution. For highly skewed distributions, more than 30 variables may have to be summed or averaged to obtain a normal distribution. The normal probability density function is:


The density function shown  is the standard normal probability density  function. The normal probability density function is not skewed The standard normal probability density function has a mean of O and a standard deviation of 1. The normal probability density function cannot be integrated implicitly. Because of this, a transformation to the standard normal distribution is made, and the normal cumulative distribution function or reliability function is read from a table.  If x is a normal random variable, it can be transformed to standard normal using the expression:


Example  A battery is produced with an average voltage of 60 and a standard deviation of 4 volts. If 9 batteries are selected at random, what is the probability that the total voltage of the 9 batteries is greater than 530? What is the probability that 1 the average voltage of the 9 batteries is less than 62?
Solution Part A: The expected total voltage for nine batteries is 540. The expected standard deviation of the voltage of the total of nine batteries is:

STOTAL2 = 9 x (4)2=144                                                           STOTAL=12

Transforming to standard normal: Z = (530-540) /12= -0.833

From the standard normal table, the area to the right of z is 0.7976.

Solution Part B: The expected value is 60. The standard deviation is :


From table P = 0.0668, The area to the left of z is 1 – 0.0668 = 0.9332

The probability density function of the voltage of the individual batteries and of the average of nine batteries is shown in Figure. The distribution of the averages has less variance because the standard deviation of the averages is equal to the standard deviation of the individuals divided by the square root of the sample size.


When all special causes of variation are eliminated, many variable data processes, when sampled and plotted, produce a bell-shaped distribution. If the base of the histogram is divided into six (6) equal lengths (three on each side of the average), the amount of data in each interval exhibits the following percentages:

The area outside of specification for a normal curve can be determined by a Z value.


The Z transformation formula is:


Where: x = data value (the value of concern)
µ = mean
σ = standard deviation

This transformation will convert the original values to the number of standard deviations away from the mean. The result allows one to use a single standard normal table to describe areas under the curve (probability of occurrence).

There are several ways to display the normal (standardized) distribution:

 1. As a number under the curve up to the Z value:


2. As a number beyond the Z value:


3. As a number under the curve and at a distance from the mean:


2. Binomial Distribution

A binomial distribution is useful when there are only two results in a random experiment (e.g., pass or failure, compliance or noncompliance, yes or no, present or absent). The tool is frequently applicable to attribute data. Altering the first scenario discussed under the normal distribution section to a binomial distribution scenario yields the following:
A dimension on a part is critical . This critical dimension is measured daily on a random sample of parts from a large production process . To expedite the inspection process, a tool is designed to either pass or fail a part that is tested. The output now is no longer continuous. The output is now binary (pass or fail for each part) ; hence, the binomial distribution can be used to develop an attribute sampling plan .
Other application examples are as follows :

  • Product either passes or fails test; determine the number of defective units.
  • Light bulbs work or do not work; determine the number of defective light bulbs.
  • People respond yes or no to a survey question; determine the proportion of people who answer yes to the question.
  • Purchase order forms are either filled out incorrectly or correctly; determine the number of transactional errors .
  • The appearance of a car door is acceptable or unacceptable ; determine the number of parts of unacceptable appearance .

The following binomial equation could be expressed using either of the following two expressions:

  • The probability of exactly x defects in n binomial trials with probability of defect equal to p is [see P(X = x) relationship]
  • For a random experiment of sample size n where there are two categories of events, the probability of success of the condition x in one category (where there is n – x in the other category) is

where (q = 1 – p) is the probability that the event will not occur. Also, the binomial coefficient gives the number of possible combinations respect to the number of occurrences, which equates to


From the binomial equation it is noted that the shape of a binomial distribution is dependent on the sample size (n) and the proportion of the population having a characteristic (p) (e.g., proportion of the population that is not in compliance). For an n of 8 and various p values (i.e., 0.1, 0.5, 0.7, and 0.9), the four binomial distributions and corresponding cumulative distributions  (for the probability of an occurrence P), is as:


Example: Consider now that the probability of having the number “2” appear exactly three times in seven rolls of a six-sided die is


we could calculate the probability of “2” occurring for other frequencies besides three out of seven. A summary of these probabilities (e.g., the probability of rolling a “2” one time is 0.390557) is as follows:

P(X =O) = 0.278301
P(X = 1) = 0.390557
P(X = 2) = 0.234897
P(X = 3) = 0.078487
P(X = 4) = 0.015735
P(X = 5) = 0.001893
P(X = 6) = 0.000126
P(X = 7) = 3.62E-06

The probabilities from this table sum to one. From this summary we note that the probability, for example, of rolling a “2” three, four, five, six, or seven times is 0.096245 (i.e., 0.078487 + 0.015735 + 0.001893 + 0.000126 + 3 .62256 x 10-6).

Example:  A part is said to be defective if a hole that is drilled into it is less or greater than specifications . A supplier claims a failure rate of 1 in 100. If this failure rate were true, the probability of observing one defective part in 10 samples is


The probability of the test having a defect is only 0.091 . This exercise has other implications. An organization might choose a sample of 10 to assess a criterion failure rate of 1 in 100. The effectiveness of this test is questionable because the failure rate of the population would need to be a lot larger than 1 / 100 for their to be a good chance of having a defective test sample . That is, the test sample size is not large enough to do an effective job.

 The binomial distribution mean and standard deviation, sigma, can be obtained from the following calculations when the event of interest is the count of defined occurrences in the population, e.g., the number of defectives or effectives.

The binomial mean = µ = np


3. Poisson Distribution

A random experiment of a discrete variable can have several events and the probability of each event is low. This random experiment can follow the Poisson distribution . The following two scenarios exemplify data that can follow a Poisson distribution .
There are a large number of dimensions on a part that are critical. Dimensions are measured on a random sample of parts from a large production process. The number of “out-of-specification conditions” are noted on each sample. This collective “number-of-failures” information from the samples can often be modeled using a Poisson distribution. A repairable system is known to have a constant failure rate as a function of usage (i.e., follows an HPP). In a test a number of systems are exercised and the number of failures are noted for the systems. The Poisson distribution can be used to design /analyze this test. Other application examples are estimating the number of cosmetic nonconformance when painting an automobile, projecting the number of industrial accidents for next year, and estimating the number of unpopped kernels in a batch of popcorn. The Poisson distribution is one of several distributions used to model discrete data and has numerous applications in industry. The Poisson distribution can be an approximation to the binomial when p is equal to or less than 0.1, and the sample size n is fairly larger


where e is a constant of 2.71828, x is the number of occurrences, and λ can equate to a sample size multiplied by the probability of occurrence (i.e., np).nP(X = 0) has application as a Six Sigma metric for yield, which equates to Y = P(X = 0) = e = e-D/U = e-DPU, where D is defects, U is unit, and DPU is defects per unit.


The probability of observing a or fewer events is


The Poisson distribution is dependent only on one parameter, the mean (µ) of the distribution. Figure below shows Poisson distributions (for the probability of an occurrence P) for the mean values of 1, 5, 8, and 10, and corresponding cumulative distributions.


EXAMPLE A company observed that over several years they had a mean manufacturing line shutdown rate of 0.10 per day. Assuming a Poisson distribution, determine the probability of two shutdowns occurring on the same day.
For the Poisson distribution, X. = 0.10 occurrences/day and x = 2 results in the probability P(X = 2) =eλx/x!=e-0.10.12/2!= 0.004524

The Poisson is used as a distribution for defect counts and can be used as an approximation to the binomial. For np<5, the binomial is better approximated by the Poisson than the normal distribution. When the normalized Poisson is used to model defects, the sample size should be large enough for the Poisson mean to have a value of at least 4 or 5. However, whatever the mean value, the cumulative Poisson distribution provides both the individual and cumulative terms. The Poisson distribution is used to model rates, such as rabbits per acre, defects per unit, or arrivals per hour. The Poisson distribution is closely related to the exponential distribution. If x is a Poisson distributed random variable, then 1/x is an exponential random variable. If x is an exponential random variable, then 1/x is a Poisson random variable. For a random variable to be Poisson distributed, the probability of an occurrence in an interval must be proportional to the length of the interval, and the number of occurrences per interval must be independent.

The Poisson distribution average and standard deviation can be obtained from the following calculations:


4. Chi Square Distribution

The chi-square distribution is an important sampling distribution. One application of this distribution is where the chi-square distribution  is used to determine the confidence interval for the standard deviation of a population. The chi square, t, and F distributions are formed from combinations of random variables. Because of this, they are generally not used to model physical phenomena, like time to fail, but are used to make decisions and construct confidence intervals. These three distributions are considered sampling distributions.
The chi square distribution is formed by summing the squares of standard normal random variables. For example, if z is a standard normal random variable, then:


is a chi square random variable (statistic) with n degrees of freedom. A chi square statistic is also created by summing two or more chi square statistics and dividing by the sum of the degrees of freedom. A distribution having this property is regenerative. The chi square distribution is a special case of the gamma distribution with a failure rate of 2, and degrees of freedom equal to 2 divided by the number of degrees of freedom for the corresponding chi square distribution. The chi square probability density function is:


Example: A chi square random variable has 7 degrees of freedom, what is the critical value if 5% of the area under the chi square probability density is desired in the right tail?
Solution : When hypothesis testing, this is commonly referred to as the critical value with 5% significance, or a = 0.05. From the chi square table, this value is 14.067.

5. F Distribution

If X is a chi square random variable with ϒ1 degrees of freedom, and Y is a chi square random variable with ϒ2 degrees of freedom, and if X and Y are independent, then:


is an F distribution with ϒ1 and ϒ2 degrees of freedom. The F distribution is used extensively to test for equality of variances from two normal populations.

The F probability density function is:


6. Student’s t Distribution

The student’s t distribution is formed by combining a standard normal random variable and a chi square random variable. If z is a standard normal random variable, and X2 is a chi square random variable with ϒ degrees of freedom, then a random variable with a t distribution is:


Like the normal distribution, when random variables are averaged, the distribution of the average tends to be normal, regardless of the distribution of the individuals. The t distribution is equivalent to the F distribution with 1 and ϒ degrees of freedom. The t distribution is commonly used for hypothesis testing and constructing  confidence intervals for means. It is used in place of the normal distribution when the standard deviation is unknown. The t distribution compensates for the error in the estimated standard deviation. If the sample size is large, n>100, the error in the estimated standard deviation is small, and the t distribution is approximately normal.
The t probability density function is:


Where ϒ is the degrees of freedom. The t probability density function is as shown


The mean and variance of the t distribution are:


From a random sample of n items, the probability that:


falls between any two specified values is equal to the area under the t probability density function between the corresponding values on the x-axis with n-1 degrees of freedom.


Example: The burst strength of 15 randomly selected seals is given below.
What is the probability that the burst strength of the population is greater than 500?
480    489    491    508    501
500    486    499    479    496
499    504    501    496    498
Solution: The mean of these 15 data points is 495.13. The sample standard deviation of these 15 data points is 8.467. The probability that the population mean is greater than 500 is equal to the area under the t probability density function, with 14 degrees of freedom, to the left of:


From the t table, the area under the t probability density function,
with 14 degrees of freedom, to the left of -2.227 is 0.0214. This value must be interpolated (2.227 falls between the 0.025 value of 2.145 and the 0.010 value of 2.624) but can be computed directly using electronic spreadsheets, or calculators. Simply stated, making an inference from the sample of 15 data points, there is a 2.14% possibility that the true population mean is greater than 500.

7. Bivariate Normal Distribution

The joint distribution of two variables is called a bivariate distribution. Bivariate distributions may be discrete or continuous. There may be total independence of the two independent variables, or there may be a covariance between them. The graphical representation of a bivariate distribution is a three dimensional plot,
with the x and y-axis representing the independent variables and the z-axis representing the frequency for discrete data or the probability for continuous data. A special case of the bivariate distribution is the bivariate normal distribution, in which there are two random variables. For this case, the bivariate normal density is

Where: µ1 and µ2 are the two means
σ1 and σ2 are the two variances and are each > 0
ρ is the correlation coefficient of the two random variables
The bivariate normal distribution surface is shown in Figure


Note that the maximum occurs at x1 =µ1 and x2 =µ2.

8. Exponential  Distribution

The exponential distribution applies to the useful life cycle of many products. The exponential distribution is used to model items with a constant failure rate. The exponential distribution is closely related to the Poisson distribution. If a random variable, x, is exponentially distributed, then the reciprocal of x, y = 1/x follows a Poisson distribution. Likewise, if x is Poisson distributed, then y = 1/x is exponentially distributed. Because of this behavior, the exponential distribution is usually used to model the mean time between occurrences, such as arrivals or I failures, and the Poisson distribution is used to model occurrences per interval, such as arrivals, failures, or defects.
Where: A is the failure rate and 6 is the mean
From the equation above, it can be seen that A = 1/9.

The following scenario exemplifies a situation that follows an exponential distribution :
A repairable system is known to have a constant failure rate as a function of usage. The time between failures will be distributed exponentially . The failures will have a rate of occurrence that is described by an HPP The Poisson distribution can be used to design a test where sampled systems are tested for the purpose of determining a confidence interval for the failure rate of the system.

The PDF for the exponential distribution is simply f (x) = (1/θ)e-x/θ
Also f (x) = λe-λx
From the above equation it can be seen  λ= 1/θ, where λ is the failure rate and θ is the mean

Integration of this equation yields the CDF for the exponential distribution F(x) = 1 – e-x/θ

The exponential distribution is only dependent on one parameter (θ), which is the mean of the distribution (i.e., mean time between failures). The instantaneous failure rate (i.e., hazard rate) of an exponential distribution is constant and equals 1 /θ. Figure below illustrates the characteristic shape of the PDF, and the corresponding shape for the CDF. The curves were generated for a θ value of 1000.


The variance of the exponential distribution is equal to the mean squared.
σ2= λ2= 1/θ2                               therefore       σ= λ= 1/θ

The exponential distribution is characterized by its hazard function which is constant. Because of this, the exponential distribution exhibits a lack of memory. That is, the probability of survival for a time interval, given survival to the beginning of the interval, is dependent only on the length of the interval.

9. Lognormal Distribution

If a data set is known to follow a lognormal distribution, transforming the data by taking a logarithm yields a data set that is approximately normally distributed.


The most common transformation is made by taking the natural logarithm, but any base logarithm, also yields an approximate normal distribution. When random variables are summed, as the sample size increases, the distribution of the sum becomes a normal distribution, regardless of the distribution of the individuals. Since lognormal random variables are transformed to normal random variables by taking the logarithm, when random variables are multiplied, as the sample size increases, the distribution of the product becomes a lognormal distribution regardless of the distribution of the individuals. This is because the logarithm of the product of several variables is equal to the sum of the logarithms of the individuals. This is shown below:

y = x1  x2  x3 .
ln y=ln x1+ln x2+ln x3

The standard lognormal probability density function is:


Where: µ is the location parameter or mean of the natural logarithms of the individual values
σ is the scale parameter or standard deviation of natural logarithms of the individual values.

The following scenario exemplifies a situation that can follow a lognormal distribution :
A nonrepairable device experiences failures through metal fatigue . Time of failure data from this source often follows the lognormal distribution .

The lognormal distribution exhibits many PDF shapes. This distribution is often useful in the analysis of economic, biological, life data (e.g., metal fatigue and electrical insulation life), and the repair times of equipment. The distribution can often be used to fit data that has a large range of values . The logarithm of data from this distribution is normally distributed; hence, with this transformation, data can be analyzed as if they came from a normal distribution. The lognormal distribution takes on several shapes depending on the value of the shape parameter. The lognormal distribution is skewed right, and the skewness increases as the value of  σ increases.


The mean of the lognormal distribution can be computed from its parameters:

mean = e(µ +σ2/2)

The variance of the lognormal distribution is:

variance = (e(2µ+σ2))(eσ2 -1)

Where µ and σ2 are the mean and variance of natural log values.

10. Weibull Distribution

The Weibull distribution is one of the most widely used distributions in reliability and  statistical applications. It is commonly used to model time to fail, time to repair, and material strength. There are two common versions of the Weibull distribution, the two parameter Weibull and the three parameter Weibull. The difference is the three parameter Weibull distribution has a location parameter when there is some non- zero time to first failure.
The three parameter Weibull probability density function is:


Where: β is the shape parameter
Θ is the scale parameter
δ is the  location parameter

The three parameter Weibull distribution can also be expressed as:


Where: β is the shape parameter
η is the scale parameter (determines the width of the distribution)
γ is the non-zero location parameter (the point below which there are no failures)

The following scenario exemplifies a situation that can follow a two-parameter Weibull distribution:
A nonrepairable device experiences failures through either early-life, intrinsic, or wear-out phenomena. Failure data of this type often follow the Weibull distribution.

The following scenario exemplifies a situation where a three-parameter Weibull distribution is applicable:

 A dimension on a part is critical . This critical dimension is measured daily on a random sample of parts from a large production process . Information is desired about the “tails” of the distribution . A plot of the measurements indicate that they follow a three-parameter Weibull distribution better than they follow a normal distribution .

The shape parameter is what gives the Weibull distribution its flexibility. By changing the value of the shape parameter, the Weibull distribution can model an, wide variety of data. If β = 1 the Weibull distribution is identical to the exponential distribution, if β = 2, the Weibull distribution is identical to the Rayleigh distribution; if β is between 3 and 4, the Weibull distribution approximates the normal distribution.
The Weibull distribution approximates the lognormal distribution for several values of β. For most populations, more than fifty samples are required to differentiate between the Weibull and lognormal distributions. The effect of the shape parameter on the Weibull distribution is  as shown, the Weibull distribution has shape
flexibility ; hence, this distribution can be used to describe many types of data.


The scale parameter determines the range of the distribution. The scale parameter is also known as the characteristic life if the location parameter is equal to zero. If δ does not equal zero, the characteristic life is equal to Θ+δ; 63.2% of all values fall below the characteristic life regardless of the value of the shape parameter.


The location parameter is used to define a failure-free zone. The probability of failure when x is less than δ is zero. When δ>0, there is a period when no failures can occur. When δ


The mean and variance of the Weibull distribution are computed using the gamma distribution. The mean of the Weibull distribution is equal to the characteristic life if the shape parameter is equal to one.
The mean of the Weibull distribution is:


The variance of the Weibull distribution is:


The variance of the Weibull distribution decreases as the value of the shape parameter increases.

11. Hypergeometric Distribution

The hypergeometric distribution is used to model discrete data. The hypergeometric distribution applies when the population size, N, is small compared to the sample size, or stated another way, when the sample, n, is a relatively large proportion of the population (n >0.1N). Sampling is done without replacement. The hypergeometric distribution is a complex combination calculation and is used when the defined occurrences are known or can be calculated. The number of successes, r, in the sample follows the hypergeometric function:


N = population size
n = sample size
d = number of occurrences in the population
N – d = number of non occurrences in the population
r = number of occurrences in the sample .
The term x is used instead of r in many texts.

The hypergeometric distribution is similar to the binomial distribution. Both are used to model the number of successes given a fixed number of trials and two possible outcomes on each trial. The difference is that the binomial distribution requires the probability of success to be the same for all trials, while the hypergeometric distribution does not. The hypergeometric distribution using different r values is shown


n = sample size
r = number of occurrences

d = occurrences in population
N = population size


The mean and the variance of the hypergeometric distribution are


Example  From a group of 20 products, 10 are selected at random for testing. What is the probability that the 10 selected contain the 5 best units?
N=20,  n=10, d=5, (N-d)=15 and r=5


Back to Home Page

If you need assistance or have any doubt and need to ask any question  contact at: You can also contribute to this discussion and we shall be very happy to publish them in this blog. Your comment and suggestion is also welcome.

Leave a Reply