In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Basic Statistics
Michael Koch
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Statistics
Are a tool to answer questions: How accurate is the analyses? How many analyses do I have to make to
overcome problems of inhomogeneity or imprecision?
Does the product meet the specification? With what confidence is the limit
exceeded?
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Analytical results vary
Because of unavoidable deviation during measurement
Because of inhomogeneity between the subsamples
It is very important to differentiate between these two cases
Because the measures to be taken can be different
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Variations
Are always present
If there seems to be none, the resolution is not high enough
322322322322322322322322322322322322322322322322322322322322
322,3322,5322,3322,2321,7321,8321,7322,2321,6322,4322,4321,5322,4322,1322,2322,5321,7321,8321,9322,4
322,29322,49322,28322,17321,67321,76321,75322,17321,58322,36322,40321,53322,42322,05322,16322,47321,73321,76321,85322,41
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Histogram
For the description of the variability Divide range of possible measurements
into a number of groups Count observations in each group Draw a boxplot of the frequencies
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Histogram
0
2
4
6
8
10
12
14
16
18
20
45,75 46,25 46,75 47,25 47,75 48,25 48,75 49,25 49,75 50,25 50,75
mg/g
fre
qu
en
cy
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Distribution
If the number of values rises and the groups have a smaller range, the histogramm changes into a distribution curve
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Cumulative Distribution
The cumulative distribution is obtained by summing up all frequencies
It can be used to find the proportion of values that fall above or below a certain value
For distributions (or histograms) with a maximum, the cumulative distribution curve shows the typical S-shape
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Cumulative Distribution
0
20
40
60
80
100
45,75 46,75 47,75 48,75 49,75 50,75 mg/g
cu
mu
lati
ve f
req
ue
nc
y [%
]
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Population – Sample
It is important to know, whether all data (the population) or only a subset (a sample) are known.
PopulationSampleAll inhabitants of a town
A selection of 1000 inhabitants of a townAny number of measurements of copper in a soil
Not possible
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Distribution Descriptives
The data distribution can be described with four characteristics: Measures of location Measures of dispersion Skewness Kurtosis
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Location – Arithmetic Mean
Best estimation of the population mean µ for random samples from a population
xx
n
ii
n
1
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Location – Median
Sort data by value Median is the central value of this series
The median is robust, i.e. it is not affected by extreme values or outliers
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Location – Mode
The most frequent value
Large number of values necessary It is possible to have two or more
modes
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Location – Other Means
Geometric mean Harmonic Mean Robust Mean (e.g. Huber mean) ...
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Dispersion – Variance of a population
The mean of the squares of the deviations of the individual values from the population mean.
2
2
1
( )x
n
ii
n
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Dispersion – Variance of a sample
Variance of population with (n-1) replacing n and the arithmetic mean of the data (estimated mean) replacing the population mean µ
s
x x
n
ii
n
2
2
1
1
( )
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Dispersion – standard deviation
The positive square root of the variance
Population
( )x
n
ii
n2
1
Sample
s
x x
n
ii
n
( ) 21
1
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Dispersion – standard deviation of the mean
possible means of a data set are more closely clustered than the data
sdmn
where n is the number of
measurements for the mean
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Dispersion – relative standard deviation
measure of the spread of data compared with the mean
RSDs
x
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Skewness
The skewness of a distribution represents the degree of symmetry Analytical data close to the detection
limit often are skewed because negative concentrations are not possible
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Kurtosis
The peakedness of a distribution. Flat platykurtic Very peaked leptokurtic In between mesokurtic
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Normal distribution - I
First studied in the 18th century by Carl Friedrich Gauss
He found that the distribution of errors could be closely approximated by a curve called the „normal curve of errors“
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Normal distribution - II
bell shaped completely determined by µ and σ
µ
y ex
1
2
2
22
( )
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Normal distribution – important properties
the curve is symmetrical about µ the greater the value of σ the greater the spread of the curve approximately 68% (68,27%) of the data lies within µ±1σ approximately 95 % (95,45%) of the data lies within µ±2σ approximately 99,7 % (99,73%) of the data lies within µ±3σ
µµµµ
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Normal distribution – a useful model
With the mean and the standard deviation area under the curve can be defined
These areas can be interpreted as proportions of observations falling within these ranges defined by µ and σ
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
One-tailed / two-tailed questions
One-tailed e.g. a limit for the
specification of a product
It is only of interest, whether a certain limit is exceeded or not
Two-tailed e.g. the analysis of a
reference material we want to know,
whether a measured value lies within or outside the certified value ± a certain range
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
One-sided probabilities
µ-k µ µ+k
Less than µ + k , P(<)Greater than µ + k , P(>)
k P(<) P(>)1 0.841 0.1591.64 0.950 0.0501.96 0.975 0.0252 0.977 0.0233 0.998 0.002
One-sided Probabilities
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Two-sided probabilities
µ-k µ µ+k
Area within the range µ ± k , P(in)Area outside the range µ ± k , P(out)
k P(in) P(out)1 0.683 0.3171.64 0.900 0.1001.96 0.950 0.0502 0.954 0.0463 0.997 0.003
Two-sided Probabilities
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Population / sample small random samples of results from a
normal distribution may not look normal at first sight e.g. samples from a population of 996 data
sample of 10 values
0
0,5
1
1,5
2
2,5
3
3,5
42 50 58 66 74 82 90 98 106
114
122
130
138
146
154
value
fre
qu
en
cy
sample of 20 values
0
0,5
1
1,5
2
2,5
3
3,5
42 50 58 66 74 82 90 98 106
114
122
130
138
146
154
value
fre
qu
en
cy
sample of 50 values
0
1
2
3
4
5
6
42 50 58 66 74 82 90 98 106
114
122
130
138
146
154
value
fre
qu
en
cy
sample of 100 values
0
1
2
3
4
5
6
7
8
value
fre
qu
en
cy
sample of 249 values
0
2
4
6
8
10
12
value
fre
qu
en
cy
sample of 498 values
0
5
10
15
20
25
value
fre
qu
en
cy
all 996 values
05
1015202530354045
value
freq
uenc
y
Population: µ=100, σ=19,75Sample size mean s
10 99,6 20,48
20 101,8 18,14
50 97,76 19,13
100 96,52 20,14
249 98,63 20,10
498 97,81 20,49
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Confidence limits / intervals
The confidence limits describe the range within which we expect with given confidence the true value to lie.
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Confidence limits for the population mean
If the population standard deviation σ and the estimated mean are known:
CL for the population mean, estimated from n measurements on a confidence level of 95%
CL xn
1 96.
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Confidence limits for the population mean
If the standard deviation has to be estimated from a sample standard deviation s (this is normally the case):
Additional uncertainty use of Student t distribution
CL x ts
n
Where t is the two-tailed tabulated value for (n-1) degrees of freedom and the requested confidence limit
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
t-valuesCritical values for the Student t-Testd.o.f. \ 1T 60% 75% 80% 85% 90% 95% 97,50% 99% 99,50% 99,90% 99,95%d.o.f. \ 2T 20% 50% 60% 70% 80% 90% 95% 98% 99% 99,80% 99,90%
1 0,3249 1,000 1,376 1,963 3,078 6,314 12,71 31,82 63,66 318,3 636,62 0,2887 0,8165 1,061 1,386 1,886 2,920 4,303 6,965 9,925 22,33 31,603 0,2767 0,7649 0,9785 1,250 1,638 2,353 3,182 4,541 5,841 10,21 12,944 0,2707 0,7407 0,9410 1,190 1,533 2,132 2,776 3,747 4,604 7,173 8,6105 0,2672 0,7267 0,9195 1,156 1,476 2,015 2,571 3,365 4,032 5,893 6,8696 0,2648 0,7176 0,9057 1,134 1,440 1,943 2,447 3,143 3,707 5,208 5,9597 0,2632 0,7111 0,8960 1,119 1,415 1,895 2,365 2,998 3,499 4,785 5,4088 0,2619 0,7064 0,8889 1,108 1,397 1,860 2,306 2,896 3,355 4,501 5,0419 0,2610 0,7027 0,8834 1,100 1,383 1,833 2,262 2,821 3,250 4,297 4,781
10 0,2602 0,6998 0,8791 1,093 1,372 1,812 2,228 2,764 3,169 4,144 4,58711 0,2596 0,6974 0,8755 1,088 1,363 1,796 2,201 2,718 3,106 4,025 4,43712 0,2590 0,6955 0,8726 1,083 1,356 1,782 2,179 2,681 3,055 3,930 4,31813 0,2586 0,6938 0,8702 1,079 1,350 1,771 2,160 2,650 3,012 3,852 4,22114 0,2582 0,6924 0,8681 1,079 1,345 1,761 2,145 2,624 2,977 3,787 4,14015 0,2579 0,6912 0,8662 1,074 1,341 1,753 2,131 2,602 2,947 3,733 4,07316 0,2576 0,6901 0,8647 1,071 1,337 1,746 2,120 2,583 2,921 3,686 4,01517 0,2573 0,6892 0,8633 1,069 1,333 1,740 2,110 2,567 2,898 3,646 3,96518 0,2571 0,6884 0,8620 1,067 1,330 1,734 2,201 2,552 2,878 3,610 3,92219 0,2569 0,6876 0,8610 1,066 1,328 1,729 2,093 2,539 2,861 3,579 3,88320 0,2567 0,6870 0,8600 1,064 1,325 1,725 2,086 2,528 2,845 3,552 3,850
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Confidence limits – depending on n
Confidence Limits are strongly dependent on the number of measurements
µ
CLn=1
µ
CLn=1
CLn=3
µ
CLn=1
CLn=3
CLn=10
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Trueness – Precision – Accuracy - Error Trueness
the closeness of agreement between the true value and the sample mean
Precision the degree of agreement between the results of
repeated measurements Accuracy
the sum of both, trueness and precision Error
the difference between one measurement and the true value
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Linkage
Improvingaccuracy
Improving precision
Error
Impr
ovin
g tr
uene
ss
from: LGC, vamstat II
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Significance Testing - ?
Typical Situation: We have analyzed a reference material
and we want to know, if the bias between the mean of measurements and the certified value of a reference material is „significant“ or just due to random errors.
This is a typical task for significance testing
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Significance testing
a decision is made about a population based on observations made from a sample of the
population on a certain level of confidence
there can never be any guarantee that the decision is absolutely correct
on a 95% confidence level there is a probability of 5% that the decision is wrong
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Before significance testing
inspect a graphical display of the data before using any statistics
are there any data that are not suitable? is a significance test really necessary? compare the statistical result with a
commonsense visual appraisal a difference may be statistically significant,
but not of any importance, because it is small
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
The most important thing in statistics
No blind use
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
The stages of significance testing - I
Null hypothesis a statement about the population, e.g.:
There is no bias (µ = xtrue)
Alternative hypothesis H1
the opposite of the null hypothesis, e.g.: there is a bias (µ ≠ xtrue)
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
The stages of significance testing - II
Critical value the tabulated value of the test statistics,
generally at the 95% confidence level. If the calculated value is greater than the
tabulated value than the test result indicates a significant difference
Evaluation of the test statistics
Decisions and conclusions
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Significance test example – the question
Is the bias between the mean of measurements and the certified value of a reference material significant or just due to random errors?
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Significance test example – graphical explanation
x ref
95% confidence interval for the population mean µ
x
significant
x ref
95% confidence interval for the population mean µ
x
not significant
the reference value is outside the 95% confidence limits of the data set
the bias is unlikely to result from random variations in the results
the bias is significant
the reference value is inside the 95% confidence limits of the data set
the bias is likely to result from random variations in the results
the bias is not significant
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Significance test example - calculation
The Null hypothesis we assume there is no bias: x=xref
Calculation of confidence limits x
t s
nx
t s
n
With the Null hypothesis x
t s
nx x
t s
nre f
t t
x x
s
n
critica l observed
ref
or
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
t-valuesCritical values for the Student t-Testd.o.f. \ 1T 60% 75% 80% 85% 90% 95% 97,50% 99% 99,50% 99,90% 99,95%d.o.f. \ 2T 20% 50% 60% 70% 80% 90% 95% 98% 99% 99,80% 99,90%
1 0,3249 1,000 1,376 1,963 3,078 6,314 12,71 31,82 63,66 318,3 636,62 0,2887 0,8165 1,061 1,386 1,886 2,920 4,303 6,965 9,925 22,33 31,603 0,2767 0,7649 0,9785 1,250 1,638 2,353 3,182 4,541 5,841 10,21 12,944 0,2707 0,7407 0,9410 1,190 1,533 2,132 2,776 3,747 4,604 7,173 8,6105 0,2672 0,7267 0,9195 1,156 1,476 2,015 2,571 3,365 4,032 5,893 6,8696 0,2648 0,7176 0,9057 1,134 1,440 1,943 2,447 3,143 3,707 5,208 5,9597 0,2632 0,7111 0,8960 1,119 1,415 1,895 2,365 2,998 3,499 4,785 5,4088 0,2619 0,7064 0,8889 1,108 1,397 1,860 2,306 2,896 3,355 4,501 5,0419 0,2610 0,7027 0,8834 1,100 1,383 1,833 2,262 2,821 3,250 4,297 4,781
10 0,2602 0,6998 0,8791 1,093 1,372 1,812 2,228 2,764 3,169 4,144 4,58711 0,2596 0,6974 0,8755 1,088 1,363 1,796 2,201 2,718 3,106 4,025 4,43712 0,2590 0,6955 0,8726 1,083 1,356 1,782 2,179 2,681 3,055 3,930 4,31813 0,2586 0,6938 0,8702 1,079 1,350 1,771 2,160 2,650 3,012 3,852 4,22114 0,2582 0,6924 0,8681 1,079 1,345 1,761 2,145 2,624 2,977 3,787 4,14015 0,2579 0,6912 0,8662 1,074 1,341 1,753 2,131 2,602 2,947 3,733 4,07316 0,2576 0,6901 0,8647 1,071 1,337 1,746 2,120 2,583 2,921 3,686 4,01517 0,2573 0,6892 0,8633 1,069 1,333 1,740 2,110 2,567 2,898 3,646 3,96518 0,2571 0,6884 0,8620 1,067 1,330 1,734 2,201 2,552 2,878 3,610 3,92219 0,2569 0,6876 0,8610 1,066 1,328 1,729 2,093 2,539 2,861 3,579 3,88320 0,2567 0,6870 0,8600 1,064 1,325 1,725 2,086 2,528 2,845 3,552 3,850
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Significance test example – conclusions I
If we find tcritical > tobserved
conforming to our Null hypothesis
we can say: we cannot reject the Null hypothesis we accept the Null hypothesis we find no significant bias at the 95% level we find no measurable bias (under the given
experimental conditions) but not: there is no bias
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Significance test example – conclusions II
If we find tcritical < tobserved
not conforming to our Null hypothesis
we can say: we reject the null hypothesis we find significant bias at the 95% level but not: there is a bias, because there is
roughly a 5% chance of rejecting the Null hypothesis when it is true.
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
One-sample t-test
This test compares the mean of analytical results with a stated value
The question may be two-tailed as in the previous example
or one-tailed e.g.: is the copper content of an alloy below the
specification? for one-tailed questions other t-values are used
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
t-valuesCritical values for the Student t-Testd.o.f. \ 1T 60% 75% 80% 85% 90% 95% 97,50% 99% 99,50% 99,90% 99,95%d.o.f. \ 2T 20% 50% 60% 70% 80% 90% 95% 98% 99% 99,80% 99,90%
1 0,3249 1,000 1,376 1,963 3,078 6,314 12,71 31,82 63,66 318,3 636,62 0,2887 0,8165 1,061 1,386 1,886 2,920 4,303 6,965 9,925 22,33 31,603 0,2767 0,7649 0,9785 1,250 1,638 2,353 3,182 4,541 5,841 10,21 12,944 0,2707 0,7407 0,9410 1,190 1,533 2,132 2,776 3,747 4,604 7,173 8,6105 0,2672 0,7267 0,9195 1,156 1,476 2,015 2,571 3,365 4,032 5,893 6,8696 0,2648 0,7176 0,9057 1,134 1,440 1,943 2,447 3,143 3,707 5,208 5,9597 0,2632 0,7111 0,8960 1,119 1,415 1,895 2,365 2,998 3,499 4,785 5,4088 0,2619 0,7064 0,8889 1,108 1,397 1,860 2,306 2,896 3,355 4,501 5,0419 0,2610 0,7027 0,8834 1,100 1,383 1,833 2,262 2,821 3,250 4,297 4,781
10 0,2602 0,6998 0,8791 1,093 1,372 1,812 2,228 2,764 3,169 4,144 4,58711 0,2596 0,6974 0,8755 1,088 1,363 1,796 2,201 2,718 3,106 4,025 4,43712 0,2590 0,6955 0,8726 1,083 1,356 1,782 2,179 2,681 3,055 3,930 4,31813 0,2586 0,6938 0,8702 1,079 1,350 1,771 2,160 2,650 3,012 3,852 4,22114 0,2582 0,6924 0,8681 1,079 1,345 1,761 2,145 2,624 2,977 3,787 4,14015 0,2579 0,6912 0,8662 1,074 1,341 1,753 2,131 2,602 2,947 3,733 4,07316 0,2576 0,6901 0,8647 1,071 1,337 1,746 2,120 2,583 2,921 3,686 4,01517 0,2573 0,6892 0,8633 1,069 1,333 1,740 2,110 2,567 2,898 3,646 3,96518 0,2571 0,6884 0,8620 1,067 1,330 1,734 2,201 2,552 2,878 3,610 3,92219 0,2569 0,6876 0,8610 1,066 1,328 1,729 2,093 2,539 2,861 3,579 3,88320 0,2567 0,6870 0,8600 1,064 1,325 1,725 2,086 2,528 2,845 3,552 3,850
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Two-sample t-test
This test compares two independent sets of data (analytical results)
The question may be two-tailed or one-tailed as well two-tailed: are the results of two methods
significantly different one-tailed: are the results of method A
significantly lower than the results of method B
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
two-sample t-test – t-value
The formula for the calculation of the t-value is somewhat more complex:
21111
)(
21
2221
21
21
21
nnnsns
nn
xxtobserved
The degree of freedom for the tabulated value tcritical is
d.o.f = n1+n2-2
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
two-sample t-test - validity
The two-sample t-test is valid only if there is no significant difference between the both standard deviations s1 and s2.
This can (and should) be tested with the F-test
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
F-test
The F-test is a method of testing whether two independent estimates of variance are significantly different.
The F-value is calculated according to Fobserved=sA
2/sB2, with sA
2 > sB2
The critical value of F depends on the degrees of freedom dofA=nA-1 and dofB=nB-1 and can be taken from a table
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
nA=10
nB=10
F-values
1 2 3 4 5 6 7 8 9 10
1 647,79 799,50 864,16 899,58 921,85 937,11 948,22 956,66 963,29 968,63
2 38,51 39,00 39,17 39,25 39,30 39,33 39,36 39,37 39,39 39,40
3 17,44 16,04 15,44 15,10 14,89 14,74 14,62 14,54 14,47 14,42
4 12,22 10,65 9,98 9,61 9,36 9,20 9,07 8,98 8,91 8,84
5 10,01 8,43 7,76 7,39 7,15 6,98 6,85 6,76 6,68 6,62
6 8,81 7,26 6,60 6,23 5,99 5,82 5,70 5,60 5,52 5,46
7 8,07 6,54 5,89 5,52 5,29 5,12 5,00 4,90 4,82 4,76
8 7,57 6,06 5,42 5,05 4,82 4,65 4,53 4,43 4,36 4,30
9 7,21 5,72 5,08 4,72 4,48 4,32 4,20 4,10 4,03 3,96
10 6,94 5,46 4,83 4,47 4,24 4,07 3,95 3,86 3,78 3,72
degrees of freedom for numerator (A)
deg
rees
of
free
dom
fo
r de
num
erat
or (
B)
F statistics for P=0.05
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
F-test - decision
If Fobserved < Fcritical than the variances sA2
and sB2 are not significantly different
on the chosen confidence level
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Paired t-test
Two different methods are applied for measuring different samples
sample 1sample 2
sample 3sample 4
sample 5sample 6
0
0,5
1
1,5
2
2,5
3
Method A
Method B
Two-sample t-test is inappropriate, because differences between samples ar much greater than a bias between the two methods
paired t-test
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Paired t-test - principle
The differences between both methods are calculated
A one-sample t-test is applied on the differences to decide whether the mean difference is significantly different from zero.
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Paired t-test – test statistics
The t-value is calculated according to:
n
s
x
observed diff
diff
t
The critical t-value is taken from the t-table for the desired confidence level
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Statistics
There are many statistical tools for the analyst to interpret analytical data
Statistics are not a „cure-all“ that can answer all questions
If the analyst uses statistics with a commonsense, this is a powerful tool to help him to answer his customers or his own questions.
In: Wenclawiak, Koch, Hadjicostas (eds.) Quality Assurance in Analytical Chemistry – Training and Teaching
Koch, M.: Basic Statistics © Springer-Verlag Berlin Heidelberg 2003
Recommended literature and training software
LGC, VAMSTAT II, Statistics Training for Valid Analytical Measurements. CD-ROM, Laboratory of the Government Chemist, Teddington 2000.
H. Lohninger: Teach/Me - Data Analysis. Multimedia teachware. Springer-Verlag 1999.
J.C. Miller and J.N. Miller: Statistics for Analytical Chemistry. Ellis Horwood Ltd., Chicester 1988
T.J. Farrant: Practical Statistics for the Analytical Scientist – a Bench Guide. Royal Society of Chemistry for the Laboratory of the Government Chemist (LGC), Teddington 1997