Generative AI & Artificial General Intelligence (AGI)

Friday, February 09, 2007

Mean, Median, and Mode

All three are measures of central tendency: each gives a central score for a set of scores. The mean is heavily influenced by extreme values and hence is not suitable for measuring process performance.

[The mean is also called the average; the median is the middle value in a sorted set of data; the mode is the value that occurs most often.]

An illustration of the fallibility of the mean and the merit of the median is given below:

These are the marks obtained by students in mathematics in a particular class.

Marks
95
45
34
67
78
99
87
89
67
56
45
65
65
67
87
84
96

Here, the mean is about 72.1, and the median is 67. If the mathematics teacher is asked to raise the mean marks to 80, it would be a comparatively easy task. Since the mean can be boosted by inflating the extreme values, the teacher might pick the brightest students of the class (students who have already scored quite high) and improve their performance. For example, a student at 87% can be easily trained to perform at 100%. The weak students would be neglected in the process, as training them well enough to boost the overall mean is time-consuming and comes without a promise of success.

If, on the other hand, the teacher had been asked to raise the median to 80, it would not have been easy. For the median to increase, the performance of at least half of the class needs to improve: half of the class has to score at or above the set target.

In a customer satisfaction index, for example, it is better to focus on the median than on the mean.

So, the median is a better representation of a set of data than the mean.
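The figures for the marks listed above can be checked with a short Python sketch. The coaching scenario in it (every student already at 84 or above is lifted to 100) is an assumption made purely for illustration; it is not part of the original data.

```python
from statistics import mean, median, mode

# Marks from the table above
marks = [95, 45, 34, 67, 78, 99, 87, 89, 67, 56, 45, 65, 65, 67, 87, 84, 96]


def summarize(scores):
    """Return the three measures of central tendency for a list of scores."""
    return {
        "mean": round(mean(scores), 2),
        "median": median(scores),
        "mode": mode(scores),
    }


# Hypothetical scenario: only the students already at 84 or above are coached to 100
boosted = [100 if m >= 84 else m for m in marks]

before = summarize(marks)
after = summarize(boosted)
print("before:", before)
print("after: ", after)
```

With these marks the mean moves from 72.12 to 75.82 while the median stays at 67: coaching only the top of the class shifts the mean but leaves the median untouched.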

Formula for median in Excel =median(a2:a12)

Formula for quartiles:

Q1 = Quartile(a2:a12, 1)
Q2 = Quartile(a2:a12, 2)
Q3 = Quartile(a2:a12, 3)
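For readers without a spreadsheet at hand, the same quartiles can be sketched in Python. This assumes that Excel's QUARTILE interpolates inclusively between the smallest and largest value, which is what `method="inclusive"` does in the standard library; treat that correspondence as an assumption and verify it against your own worksheet.

```python
from statistics import quantiles

# Marks from the table above
marks = [95, 45, 34, 67, 78, 99, 87, 89, 67, 56, 45, 65, 65, 67, 87, 84, 96]

# n=4 splits the data into four equal groups and returns the three cut points.
# method="inclusive" treats the data as including its own minimum and maximum.
q1, q2, q3 = quantiles(marks, n=4, method="inclusive")

print("Q1 =", q1)
print("Q2 =", q2, "(same as the median)")
print("Q3 =", q3)
```

For the 17 marks above this gives Q1 = 65.0, Q2 = 67.0 and Q3 = 87.0.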

Standard deviation is a measure of dispersion (the extent to which values vary from the mean). It is therefore a better measure of process performance than the mean, median, or mode.
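A small sketch shows why dispersion says more about a process than its centre. The two sets of cycle times below are invented for illustration: they share the same mean but differ widely in spread.

```python
from statistics import mean, pstdev

# Hypothetical cycle times in minutes for two processes (not from the original post)
process_a = [48, 50, 52, 50, 50]
process_b = [30, 70, 50, 40, 60]

for name, data in (("A", process_a), ("B", process_b)):
    # pstdev treats the data as the whole population; use stdev for a sample
    print(f"process {name}: mean = {mean(data)}, std dev = {pstdev(data):.2f}")
```

Both processes average 50 minutes, yet process B is far less predictable, which only the standard deviation reveals.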

2. Measure

This is the second phase in the DMAIC phases of Six Sigma. A measurement system is created, which helps in quantifying the Ys and identifying potential Xs for the Six Sigma initiative. The measurement system is established to ensure that the data collected for the Six Sigma project is accurate.

In the Define phase (the phase prior to Measure), the potential projects (problems/opportunities, i.e. the Ys) are identified. Approximations of the size of the Six Sigma project are made in order to draft a schedule. In the Measure phase, the actual indicators are identified and the quantum of work is determined. This gives a correct estimate of the volume of work at hand, which helps in making accurate estimates.

The data collected for the Six Sigma Green Belt project should have the following characteristics:

Accurate: The observed value should be equal to the actual value, no matter how many times the task is performed.
Repeatable: When one person performs the task twice, he or she should get the same result.
Reproducible: When two persons measure the same item, the results should be identical. An example of reproducibility is software estimates: no matter who prepares them, the estimates should be close to one another (i.e. they should not vary much).
Stable: The results should remain stable over a period of time.
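The first three characteristics can be illustrated with a deliberately simplified sketch. The part, the operators and the readings are all hypothetical, and a real measurement system study would use a formal gauge R&R method rather than these three summary numbers.

```python
from statistics import mean, stdev

# Hypothetical data: a part whose actual length is 10.0 mm,
# measured three times each by two operators.
reference = 10.0
readings = {
    "operator_a": [10.1, 9.9, 10.0],
    "operator_b": [10.4, 10.5, 10.3],
}

all_values = [v for values in readings.values() for v in values]

# Accuracy: how far the average observed value is from the actual value
bias = mean(all_values) - reference

# Repeatability: spread of each person's own repeated readings
repeatability = {op: stdev(values) for op, values in readings.items()}

# Reproducibility: disagreement between the operators' averages
operator_means = [mean(values) for values in readings.values()]
reproducibility_gap = max(operator_means) - min(operator_means)

print(f"bias: {bias:.2f} mm")
print("repeatability:", {op: round(s, 2) for op, s in repeatability.items()})
print(f"reproducibility gap: {reproducibility_gap:.2f} mm")
```

In this made-up data each operator is consistent with himself (good repeatability), but the two disagree with each other by about 0.4 mm (poor reproducibility).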

The roadmap for "Measuring" is as shown in the diagram above.

Note that the first three steps could have been done in the Define phase itself. In the Define phase, an approximation of Y’s volume is taken, while in the Measure phase, the actual volume of Y is calculated. If only an approximate idea of the size of Y is known at the end of the Define phase, the first three steps are required in the Measure phase; if the size is already known, they can be skipped.

To summarize, we carry out the following under the Measure phase:

1. To select the appropriate Y, we use the following:

a. Sigma Level (Performance of Y)
b. RTY (Rolled Throughput Yield)
c. Cp (inherent process capability) and Cpk (resultant process capability)

2. To identify and prioritize the Xs, we use the following:

a. Process mapping
b. Fish bone diagrams
c. Pareto analysis
d. FDM (Function Deployment Method)

At the end of step 2, we have the list of prioritized Xs.
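Two of the metrics named under step 1, RTY and sigma level, reduce to short formulas. The sketch below assumes the usual conventions: RTY is the product of the individual step yields, and the sigma level is derived from defects per million opportunities (DPMO) with the customary 1.5 sigma shift. The function names and sample numbers are illustrative, not taken from the post.

```python
from math import prod
from statistics import NormalDist


def rolled_throughput_yield(step_yields):
    """Chance that a unit passes every step defect-free: product of step yields."""
    return prod(step_yields)


def sigma_level(defects, units, opportunities_per_unit, shift=1.5):
    """Sigma level from DPMO, using the conventional 1.5 sigma shift.

    Needs at least one defect: a DPMO of zero has no finite sigma level.
    """
    dpmo = defects / (units * opportunities_per_unit) * 1_000_000
    return NormalDist().inv_cdf(1 - dpmo / 1_000_000) + shift


# Hypothetical three-step process with first-pass yields of 98%, 95% and 99%
print(f"RTY = {rolled_throughput_yield([0.98, 0.95, 0.99]):.4f}")

# Hypothetical: 34 defects in 10 million opportunities, i.e. 3.4 DPMO
print(f"sigma level = {sigma_level(34, 10_000_000, 1):.2f}")
```

The second call reproduces the familiar figure of 3.4 defects per million opportunities for a six sigma process.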

Y = f(X)

An alternate way to look at Six Sigma

Y = f(X)

Here we shall talk about what this function is all about and how it leads us to DMAIC.

Y is a function of X: its value depends on the value assigned to X. Y is thus the dependent variable, while X is independent. Y is called the KPOV (Key Process Output Variable). X is called the KPIV (Key Process Input Variable).

Y is the output. X is the input. To get results, we should focus on inputs (Xs), not on outputs (Ys). For example, companies commonly focus on the sales target, but not on the variables and processes that affect it. When the variables and processes that control the sales target are identified and fine-tuned, the sales target is automatically brought under control.

In terms of software defects, if all causes of bugs (all Xs) are identified and addressed, then there is no need to test the final product. The final testing can be skipped. Though this is an idealistic statement, it is what Six Sigma tries to achieve: reduce the causes of errors so that final inspection can be skipped.

Inspections, manual ones in particular, are never error-free. No matter how many cycles of review a piece of code undergoes, the possibility of an error being overlooked still remains. Therefore, inspections don't really help.

Dell, for example, packages its computer components such that there is no chance of fitting parts wrongly: incompatible system elements simply do not fit. Error proofing is built in. (Dell call center handlers are therefore confident in letting their customers open the system and repair it by following instructions given online...)

In the software industry, this means modular programming, which yields good benefits. Modules are pretested, self-contained entities that just need to be integrated, followed by a final system integration test.

Let us say, for example, that to improve process performance, Eureka Forbes has several Ys to choose from: Sales, Number of products sold per month, etc. Of them, let's consider Sales.

Y = Sales
For this Y, following are the possible Xs
X1 = Product Quality
X2 = Product Features
X3 = Price
X4 = Advertisement Effectiveness
X5 = Sales Force Effectiveness

Of these Xs, let's pick X5 (Sales Force Effectiveness) and consider it as Y. Now, for this Y, the possible Xs are:

X1 = Training Effectiveness
X2 = Recruitment and Selection Effectiveness
X3 = Attrition

Next, let's take X1 (Training Effectiveness) as Y. For this, the possible Xs are:

X1 = Trainer Competence
X2 = Duration of Training
X3 = Training Content

Thus, Y = f(X) helps us drill down from output to input in order to select Green Belt projects. Green Belt projects usually have a fixed time frame, and have to be chosen such that they can be completed well within it. Y = f(X) helps in choosing the Xs, and the corresponding Ys that depend on those Xs.
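The drill-down above is naturally a tree, and writing it down as one makes the candidate project scopes easy to list. The sketch below encodes the Eureka Forbes example as nested dictionaries; the representation and the helper name `leaf_inputs` are illustrative choices, not part of any Six Sigma toolset.

```python
# Each key is a Y; its value maps every contributing X to that X's own drivers.
# An empty dict marks an X that has not been drilled down any further.
drilldown = {
    "Sales": {
        "Product Quality": {},
        "Product Features": {},
        "Price": {},
        "Advertisement Effectiveness": {},
        "Sales Force Effectiveness": {
            "Training Effectiveness": {
                "Trainer Competence": {},
                "Duration of Training": {},
                "Training Content": {},
            },
            "Recruitment and Selection Effectiveness": {},
            "Attrition": {},
        },
    }
}


def leaf_inputs(tree, path=()):
    """Yield the path from the top-level Y down to every undrilled X."""
    for name, children in tree.items():
        here = path + (name,)
        if children:
            yield from leaf_inputs(children, here)
        else:
            yield here


for p in leaf_inputs(drilldown):
    print(" -> ".join(p))
```

Each printed path, for instance Sales -> Sales Force Effectiveness -> Training Effectiveness -> Trainer Competence, is a chain of Y = f(X) relationships ending in an input small enough to consider as a project.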

The challenges faced while drilling down for Xs are:

1. Identification of Ys (which Ys to choose)

2. a. Measurability of Y
- Current Y
- Target Y

b. Identification of Xs

3. Identification of the vital Xs among the identified Xs: Focusing on all Xs may not be fruitful. There could be a few vital Xs whose fine-tuning would give most of the results.

4. Improve vital Xs and verify their impact on Y

5. Sustaining the improvements

The above five points are nothing but D-M-A-I-C.

1 = D
2 = M
3 = A
4 = I
5 = C
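Step 3, picking the vital Xs, is commonly done with the Pareto analysis mentioned under the Measure phase. A minimal sketch follows, assuming the common 80% cut-off; the complaint counts are invented for illustration.

```python
def vital_few(counts, threshold=0.8):
    """Return causes, largest first, until their cumulative share reaches threshold."""
    total = sum(counts.values())
    selected, cumulative = [], 0
    for cause, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        selected.append(cause)
        cumulative += count
        if cumulative / total >= threshold:
            break
    return selected


# Hypothetical number of training complaints attributed to each X
complaints = {
    "Trainer Competence": 50,
    "Training Content": 30,
    "Duration of Training": 12,
    "Other": 8,
}

print(vital_few(complaints))
```

Here two of the four causes account for 80% of the complaints, so they would be the vital Xs to improve first.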

Thursday, February 08, 2007

Process performance and sigma levels...

Some processes might have to operate at levels above six sigma, for example mission-critical applications such as satellite launches. So, the expected performance (and also the tolerance for defects) depends on the task at hand. Mission-critical applications cannot afford even a single defect.

Same mean with different sigma, different mean with same sigma: examples

Same mean different sigma

In a normal curve, most of the values tend to be crowded around the mean. Curves with the same mean and different standard deviations are as shown in the first diagram. (would upload the diagram later). Since the mean for all these curves coincides, the resultant figure looks like one curve mounted on top of the other.

The process represented by the curve with the steepest incline is the most capable, because fewer values fall outside the USL and LSL (the upper and lower specification limits). So, the sharper the normal curve, the better the process it represents.
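The link between the sharpness of the curve and capability can be put into numbers with Cp and Cpk. The sketch below uses the standard definitions, Cp = (USL - LSL) / 6σ and Cpk = min(USL - μ, μ - LSL) / 3σ, and assumes normally distributed output; the specification limits and sigmas are hypothetical.

```python
from statistics import NormalDist


def cp(usl, lsl, sigma):
    """Inherent capability: width of the spec window relative to 6 sigma."""
    return (usl - lsl) / (6 * sigma)


def cpk(usl, lsl, mu, sigma):
    """Resultant capability: also penalises a mean that is off-centre."""
    return min(usl - mu, mu - lsl) / (3 * sigma)


def fraction_out_of_spec(usl, lsl, mu, sigma):
    """Share of a normal process's output falling below LSL or above USL."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(lsl) + (1 - dist.cdf(usl))


# Hypothetical spec limits and a common mean; only sigma differs between curves
LSL, USL, MEAN = 94.0, 106.0, 100.0
for sigma in (1.0, 2.0, 4.0):
    print(
        f"sigma = {sigma}: Cp = {cp(USL, LSL, sigma):.2f}, "
        f"Cpk = {cpk(USL, LSL, MEAN, sigma):.2f}, "
        f"out of spec = {fraction_out_of_spec(USL, LSL, MEAN, sigma):.4%}"
    )
```

The smallest sigma (the sharpest curve) gives the highest Cp and the smallest share of output outside the limits. Because the mean is centred in this example, Cp and Cpk coincide.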

Different mean, but same standard deviation

A good example of this is the heights of defense recruits in Russia, Japan, and India. The Russians are the tallest, the Indians average, and the Japanese the shortest. When plotted, the curves (will upload the figure later) look like mountain ranges of the same height. The curve representing the Japanese recruits will be to the extreme left, the one for the Indians in the middle, and the one representing the Russians to the extreme right.

* Note that in this example, the standard deviation in all the divisions would be the same.
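Why the curves come out equally tall can be seen from the normal density itself: its peak value is 1 / (σ√(2π)), which depends only on the standard deviation, while the mean merely slides the curve left or right. The group means and the common sigma below are made-up numbers, not real height statistics.

```python
from statistics import NormalDist

# Hypothetical mean heights in cm for three groups sharing one standard deviation
sigma = 7.0
group_means = {
    "shortest group": 165.0,
    "middle group": 170.0,
    "tallest group": 178.0,
}

# The density at the mean is the height of the peak of each curve
peaks = {name: NormalDist(mu, sigma).pdf(mu) for name, mu in group_means.items()}

for name, peak in peaks.items():
    print(f"{name}: peak at {group_means[name]} cm, peak height {peak:.4f}")
```

All three peaks have the same height; only their positions along the axis differ.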