Once facts or data have been classified and summarized, they must be interpreted, presented, or communicated efficiently to drive data-based decisions. Statistical problem-solving methods are used to determine whether processes are on target, whether the total variability is small compared to specifications, and whether the process is stable over time. Businesses can no longer afford to make decisions based on averages alone; process variations and their sources must be identified and eliminated.
Descriptive statistics are used to describe or summarize a specific collection of data (typically samples of data). Descriptive statistics encompass both numerical and graphical techniques and are used to determine the:
 Central tendency of the data.
 Spread or dispersion of the data.
 Symmetry and skewness of the data.
Inferential statistics is the method of collecting samples of data and making inferences about population parameters from the sample data. Before reviewing basic statistics, the different types of data must be identified. The type of data that has been collected as process inputs (x’s) and/or outputs (y’s) will determine the type of statistics or analysis that can be performed.
Types of Data
Data is objective information that everyone can agree on. Measurability is important in collecting data. The three types of data are attribute data, variables data, and locational data. Of these three, attribute and variables data are more widely used.
Attribute data is discrete. This means that the data values can only be integers, for example, 3, 48, 1029. Counted data or attribute data are answers to questions like “how many”, “how often”, or “what kind.” Examples include:
 How many of the final products are defective?
 How many people are absent each day?
 How many days did it rain last month?
 What kind of performance was achieved?
Variables data is continuous. This means that the data values can be any real number, for example, 1.037, 4.69, 84.35. Measured data (variables data) are answers to questions like “how long,” “what volume,” “how much time” and “how far.” This data is generally measured with some instrument or device. Examples include:
 How long is each item?
 How long did it take to complete the task?
 What is the weight of the product?
Measured data is regarded as being better than counted data. It is more precise and contains more information. For example, one would certainly know much more about the climate of an area if one knew how much it rained each day rather than how many days it rained. Collecting measured data is often difficult or expensive, so counted data must be used. In some situations, data will only occur as counted data. For example, a food producer may measure the performance of microwave popcorn by counting the number of unpopped kernels of corn in each bag tested. For information that can be obtained as either attribute or variables data, it is generally preferable to collect variables data.
The third type of data, which fits neither the attribute nor the variables category, is known as locational data, which simply answers the question “where.” Charts that utilize locational data are often called “measles charts” or concentration charts. Examples are a drawing showing locations of paint blemishes on an automobile or a map of Pune with sales and distribution offices indicated.
Another way to classify data is as discrete or continuous.
Continuous data:
 Has no boundaries between adjoining values.
 Includes most non-counting intervals and ratios (e.g., time).
Discrete data:
 Has clear boundaries.
 Includes nominals, counts, and rank-orders (e.g., Monday vs. Friday, an electrical circuit with or without a short).
Conversion of Attributes Data to Variables Measures
Some data may only have discrete values, such as this part is good or bad, or I like or dislike the quality of this product. Since variables data provides more information than does attribute data, for a given sample size, it is desirable to use variables data whenever possible. When collecting data, there are opportunities for some types of data to be either attributes or variables. Instead of a good or bad part, the data can be stated as to how far out of tolerance or within tolerance it is. The like or dislike of product quality can be converted to a scale of how much do I like or dislike it.
Referring back to the Table above, two of the data examples could easily be presented as variables data: 10 scratches could be reported as the total scratch length of 8.37 inches, and 25 paint runs as 3.2 sq. in. surface area of paint runs. Consideration of the cost of collecting variables versus attributes data should also be given when choosing the method. Typically, the measuring instruments are more costly for performing variables measurements, and the cost to organize, analyze and store variables data is higher as well. A go/no-go ring gage can be used to quickly check outside diameter threads. To determine the actual pitch diameter is a slower and more costly process. Variables data requires storing of individual values and computations for the mean, standard deviation, and other estimates of the population. Attributes data requires minimal counts of each category and hence requires very little data storage space. For manual data collection, the required skill level of the technician is higher for variables data than for attribute data. Likewise, the cost of automated equipment for variables data is higher than for attributes data. The ultimate purpose for the data collection and the type of data are the most significant factors in the decision to collect attribute or variables data.
The table details the four measurement scales (nominal, ordinal, interval, and ratio) in increasing order of statistical desirability.
Many of the interval measures may be useful for ratio data as well.
Examples of continuous data, discrete data, and measurement scales:
 Continuous data: A Wagon weighs 478.61 Kg
 Discrete data: Of a lot, 400 pieces failed
 Ordinal scale: Defects are categorized as critical, major A, major B, and minor
 Nominal scale: A printout of all shipping codes for last week’s orders
 Ratio scale: The individual weights of a sample of widgets
 Interval scale: The temperatures of steel rods (°F) after one hour of cooling
Ensuring Data Accuracy and Integrity
Bad data is not only costly to capture but corrupts the decision-making process. Some considerations include:
 Avoid emotional bias relative to targets or tolerances when counting, measuring, or recording digital or analogue displays.
 Avoid unnecessary rounding. Rounding often reduces measurement sensitivity. Averages should be calculated to at least one more decimal position than individual readings.
 If data occurs in time sequence, record the order of its capture.
 If an item characteristic changes over time, record the measurement or classification as soon as possible after its manufacture, as well as after a stabilization period.
 To apply statistics which assume a normal population, determine whether the expected dispersion of data can be represented by at least 8 to 10 resolution increments. If not, the default statistic may be the count of observations which do or do not meet specification criteria.
 Screen or filter data to detect and remove data entry errors such as digital transposition and magnitude shifts due to a misplaced decimal point.
 Avoid removal by hunch. Use objective statistical tests to identify outliers.
 Each important classification identification should be recorded along with the data. This information can include time, machine, auditor, operator, gage, lab, material, target, process change and conditions, etc.
It is important to select a sampling plan appropriate for the intended use of the data. There are no standards as to which plan is to be used for data collection and analysis; therefore, the analyst makes a decision based upon experience and the specific needs. Many other sampling techniques have been developed for specific needs.
Population vs. Sample
A population is every possible observation or census, but it is very rare to capture the entire population in data collection. Instead, samples, or subsets of populations as illustrated in the following figure, are captured. A statistic, by definition, is a number that describes a sample characteristic. Information from samples can be used to “infer” or approximate a population characteristic called a parameter.

Random Sampling
Sampling is often undertaken because of time and economic advantages. The use of a sampling plan requires randomness in sample selection. Obviously, true random sampling requires giving every part an equal chance of being selected for the sample. The sample must be representative of the lot and not just the product that is easy to obtain. Thus, the selection of samples requires some upfront thought and planning. Often, the emphasis is placed on the mechanics of sampling plan usage and not on sample identification and selection. Sampling without randomness ruins the effectiveness of any plan. The product to be sampled may take many forms: in a layer, on a conveyor, in sequential order, etc. The sampling sequence must be based on an independent random plan. The sample is determined by selecting an appropriate number from a hat or random number table.
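The random selection described above can be sketched in Python; the lot of serialized part numbers, the sample size, and the seed are illustrative assumptions:

```python
import random

# Hypothetical lot of 500 serialized parts (identifiers are assumptions)
lot = [f"part-{i:03d}" for i in range(500)]

random.seed(42)                    # fixed seed only to make the illustration repeatable
sample = random.sample(lot, k=10)  # every part has an equal chance of selection

# Sampling is without replacement, so no part appears twice
print(len(sample), len(set(sample)))  # 10 10
```

Using `random.sample` (rather than picking the most accessible parts) mirrors the requirement that the sample be representative of the lot, not just the product that is easy to obtain.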

Sequential Sampling
Sequential sampling plans are similar to multiple sampling plans except that sequential sampling can theoretically continue indefinitely. Usually, these plans are ended after the number inspected has exceeded three times the sample size of a corresponding single sampling plan. Sequential testing is used for costly or destructive testing with sample sizes of one and is based on a probability ratio test developed by Wald.

Stratified Sampling
One of the basic assumptions made in sampling is that the sample is randomly selected from a homogeneous lot. When sampling, the “lot” may not be homogeneous. For example, parts may have been produced on different lines, by different machines, or under different conditions. One product line may have well-maintained equipment, while another product line may be older or have poorly maintained equipment. The concept behind stratified sampling is to attempt to select random samples from each group or process that is different from other similar groups or processes. The resulting mix of samples can be biased if the proportion of the samples does not reflect the relative frequency of the groups. To the person using the sample data, the implication is that they must first be aware of the possibility of stratified groups and, second, phrase the data report such that the observations are relevant only to the sample drawn and may not necessarily reflect the overall system.
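The proportional allocation idea can be sketched as follows; the production lines and lot sizes are hypothetical:

```python
import random

# Hypothetical strata: parts produced on three different lines
strata = {
    "line_A": list(range(600)),
    "line_B": list(range(300)),
    "line_C": list(range(100)),
}
total = sum(len(parts) for parts in strata.values())
n = 50  # desired overall sample size

random.seed(7)
sample = {}
for name, parts in strata.items():
    # Allocate sample size in proportion to each stratum's relative frequency
    k = round(n * len(parts) / total)
    sample[name] = random.sample(parts, k)

print({name: len(s) for name, s in sample.items()})
# {'line_A': 30, 'line_B': 15, 'line_C': 5}
```

Sampling each stratum in proportion to its relative frequency avoids the bias described above.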
Data Collection Methods
Collecting information is expensive. To ensure that the collected data is relevant to the problem, some prior thought must be given to what is expected. Manual data collection requires a data form. Check sheets, tally sheets, and checklists are data collection methods that are widely used. Other data collection methods include automatic measurement and data coding.
Some data collection guidelines are:
 Formulate a clear statement of the problem
 Define precisely what is to be measured
 List all the important characteristics to be measured
 Carefully select the right measurement technique
 Construct an uncomplicated data form
 Decide who will collect the data
 Arrange for an appropriate sampling method
 Decide who will analyze and interpret the results
 Decide who will report the results
Without an operational definition, most data is meaningless. Both attribute and variable specifications must be defined. Data collection includes both manual and automatic methods. Data collected manually may be done using printed forms or by data entry, at the time the measurements are taken. Manual systems are labour-intensive and subject to human errors in measuring and recording the correct values. Automatic data collection includes electronic chart recorders and digital storage. The data collection frequency may be synchronous, based on a set time interval, or asynchronous, based on events. Automatic systems have higher initial costs than manual systems and have the disadvantage of collecting both “good” and “erroneous” data. Advantages to using automatic data collection systems include high accuracy rates and the ability to operate unattended.
Automatic Measurement
Automatic sorting gages are widely used to sort parts by dimension. They are normally accurate within 0.0001″. When computers are used as part of an automated measurement process, there are several important issues. Most of these stem from the requirements of software quality engineering but have important consequences in terms of ensuring that automated procedures get answers at least as “correct” as those that arise from manual measurements. Computer-controlled measurement systems may offer distinct advantages over their human counterparts. (Examples include improved test quality, shorter inspection times, lower operating costs, automatic report generation, improved accuracy, and automatic calibration.) Automated measurement systems have the capacity and speed to be used in high-volume operations. Automated systems have the disadvantages of higher initial costs, and a lack of mobility and flexibility compared to humans. Automated systems may require technical malfunction diagnostics. When used properly, they can be a powerful tool to aid in the improvement of product quality. Applications for automatic measurement and digital vision systems are quite extensive. The following incomplete list is intended to show examples:
 Error proofing a process
 Avoiding human boredom and errors
 Sorting acceptable from defective parts
 Detecting flaws, surface defects, or foreign material
 Creating CAD drawings from an object
 Building prototypes by duplicating a model
 Making dimensional measurements
 Performing highspeed inspection of critical parameters
 Machining, using either laser or mechanical methods
 Marking and identifying parts
 Inspecting solder joints on circuit boards
 Verifying and inspecting the packaging
 Providing optical character and bar code recognition
 Identifying missing components
 Controlling motion
 Assembling components
 Verifying colour
Data Coding
The efficiency of data entry and analysis is frequently improved by data coding. Problems due to not coding include:
 Inspectors trying to squeeze too many digits into small blocks on a form
 Reduced throughput and increased errors by clerks at keyboards reading and entering large sequences of digits for a single observation
 Insensitivity of analytic results due to rounding large sequences of digits
Coding by adding or subtracting a constant or by multiplying or dividing by a factor:
Let the subscript, lowercase c, represent a coded statistic; the absence of a subscript represents raw data; uppercase C indicates a constant, and lowercase f represents a factor. Then, coding each observation as x_{c} = (x − C)f, the raw statistics are recovered by reversing the operations: the mean decodes as X̄ = X̄_{c}/f + C, while the standard deviation, which is unaffected by the added constant, decodes as s = s_{c}/f.
Coding by substitution:
Consider a dimensional inspection procedure in which the specification is nominal plus and minus 1.25″. The measurement resolution is 1/8 of an inch and inspectors, using a ruler, record plus and minus deviations from nominal. A typically recorded observation might be −3/8″ crammed in a space that was designed to accommodate three characters. The data can be coded as integers expressing the number of 1/8 inch increments deviating from nominal. The suggestion that check sheet blocks could be made larger could be countered by the objection that there would be fewer data points per page.
Coding by truncation of repetitive place values:
Measurements such as 0.55303, 0.55310, 0.55308, in which the digits 0.553 repeat in all observations, can be recorded as just the last two digits expressed as integers. Depending on the objectives of the analysis, it may or may not be necessary to decode the measurements.
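A brief Python sketch of truncation coding and decoding for the measurements above:

```python
# The leading digits 0.553 repeat in every observation, so record only the
# last two places, expressed as integers (units of 0.00001)
measurements = [0.55303, 0.55310, 0.55308]

coded = [round((m - 0.553) * 100000) for m in measurements]
print(coded)  # [3, 10, 8]

# Decoding reverses the arithmetic when the original values are needed
decoded = [0.553 + c / 100000 for c in coded]
```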
Probability
Most quality theories use statistics to make inferences about a population based on information contained in samples. The mechanism one uses to make these inferences is probability.
Conditions for Probability
The probability of any event, E, lies between 0 and 1. The sum of the probabilities of all possible events in a sample space, S, equals 1.
Simple Events
An event that cannot be decomposed is called a simple event, E. The set of all sample points for an experiment is called the sample space, S.
If an experiment is repeated a large number of times, N, and the event, E, is observed n_{E} times, the probability of E is approximately:
P(E) ≈ n_{E}/N
For example, the probability of observing a 3 on the toss of a single die is:
P(3) = 1/6
What is the probability of getting a 1, 2, 3, 4, 5, or 6 by throwing a die? Since the six outcomes are equally likely and cover the whole sample space:
P(1) + P(2) + … + P(6) = 6 × 1/6 = 1
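These simple-event calculations can be verified with exact fractions in Python:

```python
from fractions import Fraction

S = [1, 2, 3, 4, 5, 6]           # sample space for one toss of a fair die
p = Fraction(1, len(S))          # each simple event has probability 1/6

p_three = p                      # P(observing a 3)
p_any = sum(p for _ in S)        # P(1 or 2 or ... or 6): the whole sample space

print(p_three)  # 1/6
print(p_any)    # 1
```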
Use of Venn (Circle) Diagrams
A Venn diagram or set diagram is a diagram that shows all possible logical relations between a finite collection of different sets. Venn diagrams were conceived around 1880 by John Venn. They are used to teach elementary set theory, as well as illustrate simple set relationships in probability, logic, statistics, linguistics, and computer science. On occasion, a circle diagram can help conceptualize the relationship between work elements in order to optimize work activities. Shown below is a hypothetical analysis of the workload for a shipping employee using a Venn (or circle) diagram.
A Venn (circle) diagram illustrates relationships between events. In this case, there is an overlap between packing and data entry, as well as packing and pulling stock. Making CDs is exclusive of other activities. If the sample space equals 1.0 or 100%, then one can determine both the busy time and idle time in an 8-hour shift.
Busy time = Packing + Data entry + Pulling stock + Making CDs – Overlap
= 0.30 + 0.20 + 0.25 + 0.10 – 0.06 – 0.04
= 0.85 – 0.10
= 0.75
In an 8-hour shift, there are 6.0 hours of activity. By the same logic, there are 2.0 idle hours. After deducting customary lunch and break times, one can consider whether additional duties can be assumed by this individual. Venn diagrams are normally used to explain probability theory. In the above diagram, making CDs and packing are mutually exclusive, but packing and pulling stock are not. The final calculation reflects the additive law of probability.
Compound Events
Compound events are formed by a composition of two or more events. They consist of more than one point in the sample space. For example, if two dice are tossed, what is the probability of getting an 8? A die and a coin are tossed. What is the probability of getting a 4 and a tail? The two most important probability theorems are the additive and multiplicative laws. For the following discussion, E_{A} = A and E_{B} = B.
I. Composition
Consists of two possibilities: a union or an intersection.
A. Union of A and B
If A and B are two events in a sample space, S, the union of A and B (A U B) contains all sample points in events A, B, or both.
Example: In the die toss, let E_{1}, E_{2}, E_{3}, E_{4}, E_{5} and E_{6} represent the events of getting a 1, 2, 3, 4, 5, or 6 respectively, and consider the following:
If A = E_{1}, E_{2} and E_{3} (numbers less than 4)
and B = E_{1}, E_{3} and E_{5} (odd numbers),
then A U B = E_{1}, E_{2}, E_{3} and E_{5}.
B. Intersection of A and B
If A and B are two events in a sample space, S, the intersection of A and B (A ∩ B) is composed of all sample points that are in both A and B.
From the above example, A ∩ B = E_{1 } and E_{3}
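Python's set operations mirror the union and intersection directly:

```python
# Die-toss events from the example: A = numbers less than 4, B = odd numbers
A = {1, 2, 3}
B = {1, 3, 5}

print(sorted(A | B))  # [1, 2, 3, 5] -- union: points in A, B, or both
print(sorted(A & B))  # [1, 3] -- intersection: points in both A and B
```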
II. Event Relationships
There are three relationships involved in finding the probability of an event: complementary, conditional, and mutually exclusive.
A. Complement of an Event
The complement of event A is all sample points in the sample space, S, but not in A. The probability of the complement of A is 1 − P(A).
For example, if P(A) (cloudy days) is 0.3, the complement of A would be 1 − P(A) = 0.7 (clear days).
B. Conditional Probabilities
The conditional probability of event A occurring, given that event B has occurred, is:
P(A|B) = P(A ∩ B)/P(B)
For example, if event A (rain) = 0.2 and event B (cloudiness) = 0.3, what is the probability of rain on a cloudy day? (Note, it will not rain without clouds, so A ∩ B = A.)
P(A|B) = P(A ∩ B)/P(B) = 0.2/0.3 = 0.67
Two events A and B are said to be independent if either:
P(A|B) = P(A) or P(B|A) = P(B)
However, P(A|B) = 0.67 and P(A) = 0.2 (no equality), and
P(B|A) = 1.00 and P(B) = 0.3 (no equality).
Therefore, the events are said to be dependent.
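The rain-and-clouds arithmetic can be checked in a few lines of Python:

```python
# P(A) = P(rain) = 0.2, P(B) = P(cloudy) = 0.3.
# Rain never occurs without clouds, so P(A and B) = P(A) = 0.2.
p_a, p_b = 0.2, 0.3
p_a_and_b = 0.2

p_a_given_b = p_a_and_b / p_b   # P(A|B) = P(A n B) / P(B)
p_b_given_a = p_a_and_b / p_a   # P(B|A) = P(A n B) / P(A)

print(round(p_a_given_b, 2))    # 0.67, which differs from P(A) = 0.2
print(p_b_given_a)              # 1.0, which differs from P(B) = 0.3
```

Since neither equality holds, the events are dependent.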
C. Mutually Exclusive Events
If event A contains no sample points in common with event B, then they are said to be mutually exclusive.
For example, obtaining a 3 and a 2 on the toss of a single die are mutually exclusive events. The probability of observing both events simultaneously is zero.
The probability of obtaining either a 3 or a 2 is:
P(E_{2}) + P(E_{3}) = 1/6 + 1/6 = 1/3

The Additive Law
 If the two events are not mutually exclusive:
P(A U B) = P(A) + P(B) − P(A ∩ B)
Note that P(A U B) is shown in many texts as P(A + B) and is read as the probability of A or B. For example, if one owns two cars and the probability of each car starting on a cold morning is 0.7, what is the probability of getting to work in one of the cars?
 If the two events are not mutually exclusive:
P (A U B) = 0.7 + 0.7 – (0.7×0.7)
=1.4 – 0.49 =0.91 or 91%
 If the two events are mutually exclusive, the law reduces to:
P(A U B) = P(A) + P(B), also written P(A + B) = P(A) + P(B). For example, if the probability of finding a black sock in a dark room is 0.4 and the probability of finding a blue sock is 0.3, what is the chance of finding a blue or black sock?
P (A U B) = 0.4 + 0.3 = 0.7 or 70%
Note: The problem statements centre around the word “or”
Will car A or B start?
Will one get a black or blue sock?
For any two events, A and B, such that P(B) ≠ 0:
P(A|B) = P(A ∩ B)/P(B) and P(A ∩ B) = P(A|B)P(B)
Note: The problem statements center around the word “and”
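Both forms of the additive law can be captured in one small Python helper:

```python
def p_union(p_a, p_b, p_a_and_b=0.0):
    """Additive law: P(A or B) = P(A) + P(B) - P(A and B).
    For mutually exclusive events, P(A and B) = 0."""
    return p_a + p_b - p_a_and_b

# Two independent cars, each with a 0.7 chance of starting:
# for independent events, P(A and B) = 0.7 * 0.7
print(round(p_union(0.7, 0.7, 0.7 * 0.7), 2))  # 0.91

# Mutually exclusive: a black sock (0.4) or a blue sock (0.3)
print(round(p_union(0.4, 0.3), 2))             # 0.7
```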
Descriptive Statistics
Descriptive statistics include measures of central tendency, measures of dispersion, probability density function, frequency distributions, and cumulative distribution functions.
Measures of Central Tendency
Measures of central tendency represent different ways of characterizing the central value of a collection of data. Three of these measures will be addressed here: mean, mode, and median.
The Mean, X̄ (X bar)
The mean is the total of all data values divided by the number of data points.
For example, the mean of the following 9 numbers: 5 3 7 9 8 5 4 5 8, is 6.
The arithmetic mean is the most widely used measure of central tendency.
Advantages of using the mean:
 It is the centre of gravity of the data
 It uses all data
 No sorting is needed
Disadvantages of using the mean:
 Extreme data values may distort the picture
 It can be time-consuming
 The mean may not be the actual value of any data point
The Mode
The mode is the most frequently occurring number in a data set.
For example, the mode of the following data set: 5 3 7 9 8 5 4 5 8, is 5.
Note: It is possible for groups of data to have more than one mode.
Advantages of using the mode:
 No calculations or sorting are necessary
 It is not influenced by extreme values
 It is an actual value
 It can be detected visually in distribution plots
The disadvantage of using the mode:
 The data may not have a mode or may have more than one mode
The Median (Midpoint)
The median is the middle value when the data is arranged in ascending or descending order. For an even set of data, the median is the average of the middle two values.
Examples: Find the median of the following data set:
(10 Numbers) 2 2 2 3 4 6 7 7 8 9
(9 Numbers) 2 2 3 4 5 7 8 8 9
Answer: 5 for both examples
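The three measures of central tendency are available in Python's standard library; the data set below is the one used in the mean and mode examples:

```python
import statistics

data = [5, 3, 7, 9, 8, 5, 4, 5, 8]

print(statistics.mean(data))    # arithmetic mean: 54 / 9 = 6
print(statistics.mode(data))    # most frequent value: 5
print(statistics.median(data))  # middle of the sorted values: 5
```

When a data set has more than one mode, `statistics.multimode` returns them all.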
Advantages of using the median:
 Provides an idea of where most data is located
 Little calculation required
 Insensitive to extreme values
Disadvantages of using the median:
 The data must be sorted and arranged
 Extreme values may be important.
 Two medians cannot be averaged to obtain a combined distribution median
 The median will have more variation (between samples) than the average
Measures of Dispersion
Other than the central tendency, the other important parameter to describe a set of data is spread or dispersion. Three main measures of dispersion will be reviewed: range, variance, and standard deviation.
Range (R)
The range of a set of data is the difference between the largest and smallest values.
Example: Find the range of the following data set: 5 3 7 9 3 5 4 5 3
Answer: 9 – 3 = 6
Variance (σ², s²)
The variance, σ² or s², is equal to the sum of the squared deviations from the mean, divided by the number of observations (N for a population, n − 1 for a sample). The formula for the sample variance is:
s² = Σ(x_{i} − X̄)²/(n − 1)
The variance is equal to the standard deviation squared.
Standard Deviation (σ, s)
The standard deviation is the square root of the variance:
s = √[Σ(x_{i} − X̄)²/(n − 1)]
Alternatively, a computational form avoids calculating the mean first:
s = √[(Σx_{i}² − (Σx_{i})²/n)/(n − 1)]
N is used for a population and n − 1 for a sample to remove potential bias in relatively small samples (fewer than 30).
Example: Calculate the standard deviation of the following data set using the formula:
Coefficient of Variation (COV)
The coefficient of variation equals the standard deviation divided by the mean and is expressed as a percentage:
COV = (s/X̄) × 100%
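A short Python sketch computes the sample variance, standard deviation, and coefficient of variation; the data set is the one used in the mean example:

```python
import math

data = [5, 3, 7, 9, 8, 5, 4, 5, 8]
n = len(data)
mean = sum(data) / n                       # 6.0

ss = sum((x - mean) ** 2 for x in data)    # sum of squared deviations = 34.0
var_sample = ss / (n - 1)                  # s^2 = 34 / 8 = 4.25
std_sample = math.sqrt(var_sample)         # s, approximately 2.06
cov = std_sample / mean * 100              # COV, approximately 34.4%

print(var_sample, round(std_sample, 2), round(cov, 1))
```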
Probability Density Function
In probability theory, a probability density function (PDF), or density of a continuous random variable, is a function that describes the relative likelihood for this random variable to take on a given value. The probability of the random variable falling within a particular range of values is given by the integral of this variable’s density over that range—that is, it is given by the area under the density function but above the horizontal axis and between the lowest and greatest values of the range. The probability density function is non-negative everywhere, and its integral over the entire space is equal to one.
Suppose a species of bacteria typically lives 4 to 6 hours. What is the probability that a bacterium lives exactly 5 hours? The answer is actually 0%. Lots of bacteria live for approximately 5 hours, but there is a negligible chance that any given bacterium dies at exactly 5.000… hours. Instead, we might ask: What is the probability that the bacterium dies between 5 hours and 5.01 hours? Let’s say the answer is 0.02 (i.e., 2%). Next: What is the probability that the bacterium dies between 5 hours and 5.001 hours? The answer is probably around 0.002, since this is 1/10th of the previous interval. The probability that the bacterium dies between 5 hours and 5.0001 hours is probably about 0.0002, and so on. The ratio (probability of dying during an interval)/(duration of the interval) is approximately constant and equal to 2 per hour (2/hour). For example, there is 0.02 probability of dying in the 0.01-hour interval between 5 and 5.01 hours, and (0.02 probability/0.01 hours) = 2/hour. This quantity 2/hour is called the probability density for dying at around 5 hours.
Therefore, in response to the question “What is the probability that the bacterium dies at 5 hours?”, a literally correct but unhelpful answer is “0”, but a better answer can be written as (2/hour) dt. This is the probability that the bacterium dies within a small (infinitesimal) window of time around 5 hours, where dt is the duration of this window. For example, the probability that it lives longer than 5 hours but shorter than (5 hours + 1 nanosecond) is (2/hour) × (1 nanosecond) ≈ 6×10^{−13} (using the unit conversion 3.6×10^{12} nanoseconds = 1 hour). There is a probability density function f with f(5 hours) = 2/hour. The integral of f over any window of time (not only infinitesimal windows but also large windows) is the probability that the bacterium dies in that window.
The probability density function, f(x), describes the behaviour of a random variable. Typically, the probability density function is viewed as the “shape” of the distribution. It is normally a grouped frequency distribution. Consider the histogram for the length of a product shown in Figure below.
A histogram is an approximation of the distribution’s shape. The histogram shown appears symmetrical. It shows this histogram with a smooth curve overlaying the data. The smooth curve is the statistical model that describes the population; in this case, the normal distribution.
When using statistics, the smooth curve represents the population. The differences between the sample data represented by the histogram and the population data represented by the smooth curve are assumed to be due to sampling error. In reality, the difference could also be caused by a lack of randomness in the sample or an incorrect model. The probability density function is similar to the overlaid model. The area below the probability density function to the left of a given value, x, is equal to the probability that the random variable represented on the x-axis is less than the given value x. Since the probability density function represents the entire sample space, the area under the probability density function must equal one. Since negative probabilities are impossible, the probability density function, f(x), must be non-negative for all values of x.
Stating these two requirements mathematically for continuous distributions:
f(x) ≥ 0 for all x, and ∫f(x)dx = 1 over the entire sample space
The figure below demonstrates how the probability density function is used to compute probabilities. The area of the shaded region represents the probability of a single product drawn randomly from the population having a length less than 185. This probability is 15.9% and can be determined by using the standard normal table.
Cumulative Distribution Function
The cumulative distribution function, F(x), denotes the area beneath the probability density function to the left of x.
The area of the shaded region of the probability density function is 0.2525, which corresponds to the cumulative distribution function at x = 190. Mathematically, the cumulative distribution function is equal to the integral of the probability density function to the left of x:
F(x) = ∫_{−∞}^{x} f(t)dt
For example, a random variable has the probability density function f(x) = 0.125x, where x is valid from 0 to 4. The probability of x being less than or equal to 2 is:
P(x ≤ 2) = ∫_{0}^{2} 0.125x dx = 0.125(2²)/2 = 0.25
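The integral can be confirmed numerically; the function below is the antiderivative of f(x) = 0.125x:

```python
def F(x):
    """Cumulative distribution function for f(x) = 0.125x on [0, 4]."""
    return 0.125 * x ** 2 / 2

print(F(2))  # 0.25 -> P(x <= 2)
print(F(4))  # 1.0  -> total area under f(x) is 1, as required of a PDF
```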
Properties of a Normal Distribution
A normal distribution can be described by its mean and standard deviation. The standard normal distribution is a special case of the normal distribution and has a mean of zero and a standard deviation of one. The tails of the distribution extend to ± infinity. The area under the curve represents 100% of the possible observations. The curve is symmetrical such that each side of the mean has the same shape and contains 50% of the total area. Theoretically, about 95% of the population is contained within ± 2 standard deviations.
If a data set is normally distributed, then the standard deviation and mean can be used to determine the percentage (or probability) of observations within a selected range. Any normally distributed scale can be transformed to its equivalent Z scale or score using the formula: Z= (xμ)/σ
x will often represent a lower specification limit (LSL) or upper specification limit (USL). Z, the “sigma value,” is a measure of standard deviations from the mean. Any normal data distribution can be transformed to a standard normal curve using the Z transformation. The area under the curve is used to predict the probability of an event occurring.
Example: If the mean is 85 days and the standard deviation is five days, what would be the yield if the USL is 90 days?
A standard Z table is used to determine the area under the curve. The area under the curve represents probability.
Because the curve is symmetric, the area shown as yield would be 1 − P(z > 1) = 0.841 or 84.1%.
In accordance with the equation, Z can be calculated for any “point of interest,” x.
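The table lookup can be reproduced with the error function from Python's math module; the mean, standard deviation, and USL are taken from the example above:

```python
import math

def phi(z):
    """Standard normal cumulative probability (area to the left of z)."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean, sd, usl = 85, 5, 90
z = (usl - mean) / sd       # Z = (x - mu) / sigma = 1.0

yield_fraction = phi(z)     # area to the left of z = 1
print(z, round(yield_fraction, 3))  # 1.0 0.841
```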
Variation
The following figure shows three normal distributions with the same mean. What differs between the distributions is the variation.
The first distribution displays less variation or dispersion about the mean. The second distribution displays more variation and would have a greater standard deviation. The third distribution displays even more variation.
Short-term vs. Long-term Variation
The duration over which data is collected will determine whether short-term or long-term variation has been captured within the subgroup.
There are two types of variation in every process:
common cause variation and special cause variation. Common cause variation is completely random (i.e., the next data point’s specific value cannot be predicted). It is the natural variation of the process. Special cause variation is the nonrandom variation in the process. It is the result of an event, an action, or a series of events or actions. The nature and causes of special cause variation are different for every process. Short-term data is data that is collected from the process in subgroups. Each subgroup is collected over a short length of time to capture common cause variation only (i.e., data is not collected across different shifts because variation can exist from operator to operator).
Thus, the subgroup consists of “like” things collected over a narrow time frame and is considered a “snapshot in time” of the process. For example, a process may use several raw material lots per shift. A representative short-term sample may consist of CTQ measurements within one lot. Long-term data is considered to contain both special and common causes of variation that are typically observed when all of the input variables have varied over their full range. To continue with the same example, long-term data would consist of several raw material lots measured across several short-term samples.
Processes tend to exhibit more variation in the long term than in the short term. Long-term variability is made up of short-term variability and process drift. The shift from short term to long term can be quantified by taking both short-term and long-term samples.
On average, short-term process means tend to shift and drift by 1.5 sigmas.
Z_{lt} = Z_{st} − 1.5
The short-term Z (Z_{st}) is also known as the benchmark sigma value. A Six Sigma process would have six standard deviations between the mean and the closest specification limit in a short-term capability study. The following figure illustrates the Z-score relationship to the Six Sigma philosophy:
In a Six Sigma process, customer satisfaction and business objectives are robust to shifts caused by process or product variation.
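The 1.5-sigma shift is what produces the well-known long-term defect rate of a Six Sigma process. A short sketch using the standard library (the function name is illustrative):

```python
from statistics import NormalDist

def long_term_dpmo(z_st, shift=1.5):
    """Defects per million opportunities after the assumed 1.5-sigma shift."""
    z_lt = z_st - shift                              # Z_lt = Z_st - 1.5
    return (1 - NormalDist().cdf(z_lt)) * 1_000_000  # tail area beyond Z_lt

# A Six Sigma process (Z_st = 6) yields the familiar 3.4 DPMO long term.
print(round(long_term_dpmo(6), 1))  # → 3.4
```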
Drawing Valid Statistical Conclusions
The objective of statistical inference is to draw conclusions about population characteristics based on the information contained in a sample. Statistical inference in a practical situation contains two elements: (1) the inference and (2) a measure of its validity. The steps involved in statistical inference are:
 Define the problem objective precisely
Decide if the problem will be evaluated by a one-tailed or two-tailed test
 Formulate a null hypothesis and an alternate hypothesis
 Select a test distribution and a critical value of the test statistic reflecting the degree of uncertainty that can be tolerated (the alpha, α, risk)
 Calculate a test statistic value from the sample information
 Make an inference about the population by comparing the calculated value to the critical value. This step determines if the null hypothesis is to be rejected. If the null is rejected, the alternate must be accepted.
 Communicate the findings to interested parties
Every day, in our personal and professional lives, individuals are faced with decisions between choice A or choice B. In most situations, the relevant information is available, but it may be presented in a form that is difficult to digest. Quite often, the data seems inconsistent or contradictory. In these situations, an intuitive decision may be little more than an outright guess. While most people feel their intuitive powers are quite good, the fact is that decisions made on gutfeeling are often wrong.
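The inference steps listed above can be sketched end-to-end with a one-sample Z test; all data values here are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical problem: is the process mean greater than 85 days?
# H0: mu = 85  vs  H1: mu > 85 (one-tailed), known sigma = 5,
# a sample of n = 25 with sample mean 88, and alpha = 0.05.
mu0, sigma, n, xbar, alpha = 85, 5, 25, 88, 0.05

z_calc = (xbar - mu0) / (sigma / sqrt(n))   # test statistic from the sample
z_crit = NormalDist().inv_cdf(1 - alpha)    # one-tailed critical value (≈ 1.645)

# Compare the calculated value to the critical value and infer.
decision = "reject H0" if z_calc > z_crit else "fail to reject H0"
print(round(z_calc, 2), decision)  # → 3.0 reject H0
```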
Null Hypothesis and Alternative Hypothesis
The null hypothesis is the hypothesis to be tested. The null hypothesis directly stems from the problem statement and is denoted as H_{o}.
The alternate hypothesis must include all possibilities which are not included in the null hypothesis and is designated H_{1}.
Examples of null and alternate hypotheses:
Null hypothesis: H_{o}: Y_{a} = Y_{b}   H_{o}: A ≤ B
Alternate hypothesis: H_{1}: Y_{a} ≠ Y_{b}   H_{1}: A > B
A null hypothesis can only be rejected or fail to be rejected; it cannot be accepted, because failing to reject it reflects only a lack of evidence against it.
Test Statistic
In order to test a null hypothesis, a test calculation must be made from sample information. This calculated value is called a test statistic and is compared to an appropriate critical value. A decision can then be made to reject or not reject the null hypothesis.
Types of Errors
When formulating a conclusion regarding a population based on observations from a small sample, two types of errors are possible:
 Type I error: This error results when the null hypothesis is rejected when it is, in fact, true.
 Type II error: This error results when the null hypothesis is not rejected when it should be rejected.
The degree of risk (α) is normally chosen by the concerned parties (α is typically taken as 5%) when arriving at the critical value of the test statistic.
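The two error types can be made concrete with a short calculation; the population values here are hypothetical:

```python
from math import sqrt
from statistics import NormalDist

# For H0: mu = 85 vs H1: mu > 85, known sigma = 5, n = 25, alpha = 0.05,
# suppose the true mean is actually 87. Alpha fixes the rejection cutoff;
# beta is then the chance of failing to reject H0 even though it is false
# (the Type II error).
mu0, mu1, sigma, n, alpha = 85, 87, 5, 25, 0.05

se = sigma / sqrt(n)                                 # standard error of the mean
cutoff = mu0 + NormalDist().inv_cdf(1 - alpha) * se  # reject H0 above this value
beta = NormalDist(mu1, se).cdf(cutoff)               # P(fail to reject | mu = 87)

print(round(cutoff, 2))  # → 86.64
print(round(beta, 2))    # → 0.36
```

Note the trade-off: making α smaller pushes the cutoff higher, which raises β for the same true mean.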
Enumerative (Descriptive) Studies
Enumerative data is data that can be counted: for example, the classification of things, or the classification of people into intervals of income, age, or health. A census is an enumerative collection and study. Useful tools for tests of hypotheses conducted on enumerative data are the chi-square, binomial, and Poisson distributions. Deming, in 1975, defined the contrast between enumerative and analytical studies:
 Enumerative study: A study in which action will be taken on the universe.
Analytical study: A study in which action will be taken on a process to improve performance in the future.
Numerical descriptive measures create a mental picture of a set of data. The measures calculated from a sample are called statistics. When these measures describe a population, they are called parameters.
The table shows examples of statistics and parameters for the mean and standard deviation. These two important types of measures are known as measures of central tendency and dispersion.
Summary of Analytical and Enumerative Studies
Analytical studies start with a hypothesis statement about population parameters. A sample statistic is then used to test the hypothesis and either reject or fail to reject the null hypothesis. At a stated level of confidence, one is then able to make inferences about the population.
If you need assistance or have any questions, contact me at preteshbiswas@gmail.com. You can also contribute to this discussion, and I shall be happy to publish your contributions. Your comments and suggestions are also welcome.