C# Implementation of the Error Function and Standard Deviation Algorithm to Calculate Yield
Weston Turner 2008
Analoglogic.net
weston@analoglogic.net

The error function is used in manufacturing statistics to calculate yield, the fraction of a normally distributed process output that falls within the specified limits, using the following identity:

\begin{align}&\Phi_{\mu,\sigma^2}(\mu+n\sigma)-\Phi_{\mu,\sigma^2}(\mu-n\sigma)\\
&=\Phi(n)-\Phi(-n)=2\Phi(n)-1=\mathrm{erf}\bigl(n/\sqrt{2}\,\bigr),\end{align}
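So erf(n / sqrt(2)) is the yield, where n is sigma: the number of standard deviations between the process mean and the nearest specified limit. The identity assumes normally distributed data with limits placed symmetrically about the mean; when the mean is off-center, using the nearest limit gives a conservative estimate.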
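
For example, evaluating the expression at whole-number values of n gives the familiar coverage figures of the normal distribution (values rounded to four places):

\begin{align}n=1:&\quad\mathrm{erf}\bigl(1/\sqrt{2}\,\bigr)\approx 0.6827\\
n=2:&\quad\mathrm{erf}\bigl(2/\sqrt{2}\,\bigr)\approx 0.9545\\
n=3:&\quad\mathrm{erf}\bigl(3/\sqrt{2}\,\bigr)\approx 0.9973\end{align}
So a process whose limits sit three standard deviations either side of the mean has a yield of roughly 99.73%.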

\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^x e^{-t^2} dt.
This is the definition of the error function as an integral. The integral has no closed form in elementary functions, and most programming languages offer no built-in operation for it, so an approximation built from elementary functions will have to suffice.

\operatorname{erf}(x)^2\approx 1-\exp\left(-x^2\frac{4/\pi+ax^2}{1+ax^2}\right)
where
 a = -\frac{8(\pi-3)}{3\pi(\pi-4)}.
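
As a minimal sketch of the approximation on its own, separate from the yield calculation, the helper below evaluates the formula and then restores the sign of x, since the formula gives erf(x)^2 and erf is an odd function. The names ErfSketch and ErfApprox are illustrative assumptions and are not part of the original code.

```csharp
using System;

public static class ErfSketch
{
    // Constant "a" from the approximation above; roughly 0.140012.
    private const double A = -(8 * (Math.PI - 3)) / (3 * Math.PI * (Math.PI - 4));

    // Hypothetical helper: approximates erf(x) using only elementary functions.
    public static double ErfApprox(double x)
    {
        double x2 = x * x;
        double erfSquared = 1 - Math.Exp(-x2 * (4 / Math.PI + A * x2) / (1 + A * x2));
        // The formula yields erf(x)^2, so take the root and restore the sign of x.
        // Math.Sign throws for NaN input, so callers should pass a real number.
        return Math.Sign(x) * Math.Sqrt(erfSquared);
    }

    public static void Main()
    {
        // Approximate yield for limits at 1, 2 and 3 standard deviations from the mean.
        foreach (double n in new[] { 1.0, 2.0, 3.0 })
            Console.WriteLine("{0}: {1:P3}", n, ErfApprox(n / Math.Sqrt(2)));
    }
}
```

Keeping the approximation in its own function makes it easy to compare against tabulated erf values before it is used for yield.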

Finally, as a function in code, the above yield equation looks like:

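        // Estimates yield as erf(sigma / sqrt(2)) using the approximation above.
        public static double yield(double lowerLimit, double upperLimit, List<double> dL) {
            const double a = -((8 * (Math.PI - 3)) / (3 * Math.PI * (Math.PI - 4)));
            double x = sigma(lowerLimit, upperLimit, dL) / Math.Sqrt(2);
            double x2 = x * x;
            // The approximation gives erf(x)^2, so take the square root at the end.
            // Note: the root discards the sign of x; this assumes the mean lies within the limits.
            return Math.Sqrt(1 - Math.Exp(-x2 * ((4 / Math.PI + a * x2) / (1 + a * x2))));
        }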

Some explanation is in order. First, ignore public and static; they are language-specific syntax. The leading double means the function returns a double-precision (64-bit) floating-point number, so rounding error is minimal. The lower and upper specified limits are passed in so that sigma can be calculated, along with "dL", a list of doubles holding the data to be analyzed (measurements from some sort of manufacturing process). The first line inside the function block sets the value of the constant "a". Then "x" is calculated using the sigma function, which appears below. Once again, sigma in this context is the number of standard deviations that fit between the mean of a data set and the nearest specified limit of the process; a data point outside the limits is a failure and counts against yield.

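        // Number of standard deviations between the mean and the nearest limit.
        // Average() is a LINQ extension method and requires "using System.Linq;".
        public static double sigma(double lowerLimit, double upperLimit, List<double> dL) {
            double mean = dL.Average(); // computed once and reused
            return Math.Min(upperLimit - mean, mean - lowerLimit)
                / standardDeviation(dL);
        }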

where standard deviation is calculated using the following function:

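        // Population standard deviation (divides by N, not N - 1).
        public static double standardDeviation(List<double> dL) {
            double mean = dL.Average(); // hoisted out of the loop: computed once, not per element
            double variance = 0;
            foreach (double d in dL)
                variance += Math.Pow(d - mean, 2);
            return Math.Sqrt(variance / dL.Count);
        }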

The above algorithm was developed following the steps below:
  1. Find the mean, \scriptstyle\overline{x}, of the values.
  2. For each value x_i, calculate its deviation (\scriptstyle x_i - \overline{x}) from the mean.
  3. Calculate the squares of these deviations.
  4. Find the mean of the squared deviations. This quantity is the variance \sigma^2.
  5. Take the square root of the variance.
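
The five steps can be traced one line at a time on a small data set. The sketch below is illustrative only: the names StdDevSteps and PopulationStdDev and the sample values are assumptions chosen so the arithmetic is easy to check by hand (mean 5, squared deviations summing to 32, variance 4).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class StdDevSteps
{
    // Population standard deviation, following the five steps literally.
    public static double PopulationStdDev(List<double> values)
    {
        double mean = values.Average();                                 // step 1
        IEnumerable<double> deviations = values.Select(v => v - mean);  // step 2
        IEnumerable<double> squares = deviations.Select(d => d * d);    // step 3
        double variance = squares.Average();                            // step 4
        return Math.Sqrt(variance);                                     // step 5
    }

    public static void Main()
    {
        // Hypothetical measurements, not taken from any real process.
        var data = new List<double> { 2, 4, 4, 4, 5, 5, 7, 9 };
        Console.WriteLine(PopulationStdDev(data)); // prints 2
    }
}
```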

The algorithm computes the standard deviation of a discrete random variable or data set (the population form, which divides by N). In sigma notation it looks like the formula below; once again, most programming languages have no summation operator, so an iterative loop takes its place.

\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^N (x_i - \overline{x})^2}\,,
where \scriptstyle \overline{x} is the arithmetic mean of the values x_i, defined as:
\overline{x} = \frac{x_1+x_2+\cdots+x_N}{N} = \frac{1}{N}\sum_{i=1}^N x_i\,.

So, the gist of this example is that, using elementary functions and simple algorithms, a programmer can approximate or substitute for calculus operations. None of the functions shown here is complicated, yet together they express complicated mathematics. None of them uses exotic operations, yet they approximate the pure math closely.

Using the statistics library in the references section below, you can perform detailed analysis on process measurements that conform to a normal distribution. An example of output from such an analysis appears below.

CurrentDC {
    Data Points:    218
    Data Min:       5.1100001335144
    Data Max:       378.660003662109
    Mean:           317.085046846932
    Std Dev:        45.0348075856894
    Lower Limit:    100
    Upper Limit:    510
    Cp:             1.60679452542626
    CpK:            1.42789517335009
    Sigma:          4.28368552005027
    Yield:          99.9982711132954%
    DPM:            17.2888670464745
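
The derived figures in this output are consistent with the definitions used in this article and with the usual definitions of CpK and DPM (defects per million). The arithmetic below is a hand cross-check of the numbers shown, with inputs rounded, and is not a description of how the library computes them:

\begin{align}\text{Sigma}&=\frac{\min(510-317.085,\;317.085-100)}{45.0348}=\frac{192.915}{45.0348}\approx 4.2837\\
C_{pk}&=\frac{\text{Sigma}}{3}\approx 1.4279\\
\text{DPM}&=(1-\text{Yield})\times 10^{6}=(1-0.999982711)\times 10^{6}\approx 17.29\end{align}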


References