Upload
others
View
7
Download
0
Embed Size (px)
Citation preview
Mixed Models
Symposium Agricultural field trials10.10.2007
University of Hohenheim
Dr. sc. agr. Andreas Büchse07/2002 - 04/2007 Fachgebiet Bioinformatik. Universität Hohenheimsince 05/2007 Agrarzentrum Limburgerhof. BASF Aktiengesellschaft
Outline
1. Definition of mixed models
2. Examples for use (usefulness) of mixed models2.1 Series of experiments
2.2 Hierarchical Designs
2.3 Repeated measurement
2.4 Split-Plot, multifactorial trials
2.5 Random coefficient models
2.6 Spatial correlation
3. Summary, Conclusions, rules of thumb
First...
I will put the slides on the symposium homepage
www.uni-hohenheim.de/bioinformatik
next week
Or send an eMail to
Literature
Littell, Milliken, Stroup, Wolfinger (1996): SAS System for Mixed Models. SAS Institute, Cary, NC. 2nd ed. 2006!
Verbeke, G & G. Molenberghs, 2000: Linear Mixed Models for Longitudinal Data. Springer-Verlag, New York.
Piepho, H.P., Büchse, A., and Emrich, K. (2003): A hitchhiker's guide to the mixed model analysis of randomized experiments. Journal of Agronomy and Crop Science, 189, 310-322
Piepho, H.P., Büchse, A., Richter, C. (2004): A mixed modelling approach to randomized experiments with repeated measures. Journal of Agronomy and Crop Science, 190, 230-247
Zimmerman, D. L. & D. A. Harville, 1991: A Random Field Approach to the Analysis of Field-Plot Experiments and Other Spatial Experiments. Biometrics 47: 223-23.
Longford, N. T., 1993: Random Coefficient Models. Oxford University Press.
Definition
What does this mean, „mixed“ model?
a variety trial
One location, several varieties, e.g. 4 replications
Randomized Complete Block Design
fixed effects model
Model of a randomized complete block design
yik = μ + bk + αi + eik
yij = yield of the i-th variety in the k-th blockμ = constant (fix)bk = effect of the k-th block (fix)αi = effect of the i-th variety (fix)eik = residual error ~N (0, σ²e)
Only one random effect ⇒ no mixed model
a variety trial (again)
One location, several varieties, maybe from a cross of twolines, e.g. 4 replicationsRandomized Complete Block Design
a variety trial (again)
If the breeder is interested in estimating response to selection
R = i h σG
R = response to selectioni = selection intensityh = sqrt of heritabilityσG = sqrt of genetic variance
Then, it is reasonable to treat variety effects as randomeffects (sampled from a Gaussian distribution)
mixed model
Model of a randomized complete block design, withrandom variety effects
yik = μ + bk + αi + eik
yik = yield of the i-th variety in the k-th blockμ = constant (fix)bk = effect of the k-th block (fix)αi = effect of the i-th variety ~N (0, σ²G)eik = residual error ~N (0, σ²e)
more than one random effect ⇒ mixed model !
Definition
A mixed linear model is a model that have morethan one random effect (not only the residual error).
It has random and fixed effects.
Random = random sample from a (normal) distribution
Fixed effects versus random effects
Interest in linear models [...] lies mainly in estimating (and testing hypotheses about) linear functions of the effects in the models. These effects are what we call fixed effects, and the models are correspondingly called fixed effectsmodels. There are, however, situations where we have no interestin linear functions of effects but where, by the nature of the data and their derivation, the things of prime interestconcerning the effects are variances. Effects of this nature are called random effects [...] and the models are calledrandom effects models.Models involving a mixture of fixed and random effects arecalled mixed models.
SEARLE, Linear models, 1971
Fixed or random?
Are inferences going to be drawn from these dataabout just these levels of the factor?
„Yes“ – then the effects are to be considered as fixed effects
„No“ – then, presumably, inferences will be madenot just about the levels occurring in the data butabout some population of levels [...] the effectsare considered as being random.
SEARLE, Linear models, 1971
Fix or random?
Levels of factor aresample from a randomdistribution?
no fix factor estimate fixedeffects (BLUE)
yes
randomfactor
What isthe maininterest?
only distribution of random effects
Estimate variancecomponents (REML)
distribution of randomeffects as well as effectsthemselves
Estimate variancecomponents (REML) and realized random effects(BLUP)
Are years random? (SEARLE, Linear models, 1971)
[...] take the case of year effects, for example in studying wheat yields: are the effects of years on yield to be considered fixed or random?
The years themselves are unlikely to be random, for they will probably be a group of consecutiveyears over which the data have been gathered orthe experiments run.
But the effects on yield may reasonably beconsidered random - unless, perhaps, one isinterested in comparing specific years for somepurpose.
Are blocks fix or random?
Everlasting story...
• With fix blocks s.e.m. is underestimated
• For s.e.d. it does not matter!
• With random blocks, in small experiments, the variancecomponent is sometimes estimated <0, which results in model reduction
• > my opinion: blocks fix is OK.
Some examples
• Genotypic lines in a breeding experiment (random)• Varieties in a VCU-trial (fix, or random with Shrinkage ->
BLUP, see Michel & Piepho)• Different doses of plant protection agents (fix)• Different agents at same level (fix)• Locations (fix or random)• Regions (fix)• Years (random, mostly)• Complete blocks (both is possible)• Incomplete blocks (both is possible, random allows use of
interblock information (see Williams, Proceedings p. 266)
some theory
General form of a mixed model in matrix notation
y = Xβ + Zu + e
y = a vector with n observations
X, Z = Designmatrices
β = a vector of fixed effectsu = a vector of random effectse = a vector of residuals
e ∼ N(0,R) u ∼ N(0,G)
R, G = variance-covariance matrices
General form of a mixed model in matrix style
y = Xβ + Zu + e e ∼ N(0,R) u ∼ N(0,G)
⎟⎟⎟⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜⎜⎜⎜
⎝
⎛
+⎟⎟⎟
⎠
⎞
⎜⎜⎜
⎝
⎛⋅
⎟⎟⎟⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜⎜⎜⎜
⎝
⎛
+⎟⎟⎟
⎠
⎞
⎜⎜⎜
⎝
⎛⋅
⎟⎟⎟⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜⎜⎜⎜
⎝
⎛
=
⎟⎟⎟⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜⎜⎜⎜
⎝
⎛
6
5
4
3
2
1
3
2
1
2
1
100001010100010001
101101101011011011
141515151612
eeeeee
bb
αααμ
6 plots, 2 fixed block effects, 3 random variety effects
G: the variance-covariance-matrix of random effects
A 3 by 3 matrix because we have 3 varieties
Genetic variance on the diagonal
No covariance between genotypic effects
IG 22
2
2
2
100010001
000000
GG
G
G
G
σσσ
σσ
=⎟⎟⎟
⎠
⎞
⎜⎜⎜
⎝
⎛=
⎟⎟⎟
⎠
⎞
⎜⎜⎜
⎝
⎛
=
R: the residual variance-covariance-matrix (iid)
6 by 6 matrix because we have 6 plots.Residual variance on the diagonalNo covariance (in case of independent errors)This is in every case the form of R in a fixed effects model. But don‘t have to be in a mixed model. In a mixed model alternative forms of R are possible, we will see later.
IR 22
2
2
2
2
2
2
100000010000001000000100000010000001
000000000000000000000000000000
ee
e
e
e
e
e
e
σσ
σσ
σσ
σσ
=
⎟⎟⎟⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜⎜⎜⎜
⎝
⎛
=
⎟⎟⎟⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜⎜⎜⎜
⎝
⎛
=
Estimation
The fixed and random effects are estimated withthe method of WEIGHTED LEAST SQUARES
Weights depend on variance components (G, R)
Normally the variance components are unknownand have to be estimated themselves
An accepted method for the estimation is REML(Patterson & Thompson 1971)
Estimation of random and fixed effects
( ) yVX'XVX'β 11 −−−=ˆ
Weighted least square mean estimation of:
V = ZGZ' + R
For more details see e.g. Littell et al. „SAS-System for MIXED Models“and cited literature
Difficult matrix equations, but the computer does thisfor us ☺
Software for Linear Mixed Models
SAS PROC MIXED
MIXED Procedure in SPSS
GenStat
ASReml
S-Plus
R
Several other packages (Statistica, Systat, ...)
Some examples for using mixed models
Example 1: Trial series
A cultivar performance trial14 oat varieties, 18 sites
Model for a single siteyik = μ + bk + αi + eik
yik = yield of the i-th variety in the k-th blockμ = constant (fix)bk = effect of the k-th block (fix)α i = effect of the i-th variety (fix)eik = residual error ~N (0, σ²e)
From single trial to a series
yik = μ + αi + bk + eik add index j for sites
yijk = μj + αij + bjk + eijk
yijk = yield of the i-th variety in the k-th block at site jμj = mean yield at site j (fix or random?)αij = effect of the i-th variety at site j (fix?) bjk = effect of the k-th block at site j (fix?)eijk = residual error ~N (0, σ²e)
Main effects for site and variety
yijk = μj + αij + bjk + eijk = μ + αi + βj + (αβ)ij + bjk + eijk
yijk = yield of the i-th variety in the k-th block at site jμ = global meanαi = effect of the i-th varietyβj = effect of the j-th site(αβ)ij = interaction of the i-th variety by the j-th sitebjk = effect of the k-th block at site jeijk = residual error ~N (0, σ²e)
Which effects should be random?
have sites fixed or random effects?
Aim of the series?
want to make inferences for a region or a state (in thisexample whole Germany) -> random
Are the sites a random sample from „a population“ of sitesin Germany?
The sites are mostly no random sample, but maybe thesite effects are.
(for single years site effects are confounded with site byyear interaction!)
Are variety effects fixed or random?
Aim of the series?
want to know about performance of the testedvarieties
-> fix
The varieties are no random sample from „a population“ of varieties
mixed model
yijk = μ + αi + βj + (αβ)ij + bjk + eijk
sites: randomvarieties: fixedsite × variety: random (sites are random)blocks within site: random (sites are random)
βj ∼ N(0, σ²β)(αβ)ij ∼ N(0, σ²αβ)bjk ∼ N(0, σ²b)eijk ∼ N(0, σ²e)
SAS Code for trial series
data t; input variety site block yield; datalines; 1 1 1 58.1300 1 1 2 60.8000 <more data> 14 18 4 60.0000 ; proc mixed data=t; class site variety block; model yield= variety / ddfm=kenwardroger; random site variety*site block(site);lsmeans variety/pdiff; run;
Denominator degrees of freedom according to themethod of Kenward & Roger
Output
Cov Parm Estimate
site 82.1917site*variety 5.9844block(site) 8.0897Residual 9.4411
Type 3 Tests of Fixed Effects
Num DenEffect DF DF F Value Pr > F
variety 13 221 8.67 <.0001
This are thevariancecomponents
OutputLeast Squares Means
StandardEffect variety Estimate Error DF t Value Pr > |t|
variety 1 53.6514 2.2676 20.2 23.66 <.0001variety 2 51.0719 2.2676 20.2 22.52 <.0001variety 3 55.6722 2.2676 20.2 24.55 <.0001variety 4 56.2379 2.2676 20.2 24.80 <.0001variety 5 55.9819 2.2676 20.2 24.69 <.0001variety 6 53.6103 2.2676 20.2 23.64 <.0001variety 7 53.8129 2.2676 20.2 23.73 <.0001variety 8 52.3715 2.2676 20.2 23.10 <.0001variety 9 58.4689 2.2676 20.2 25.78 <.0001variety 10 56.5832 2.2676 20.2 24.95 <.0001variety 11 53.8617 2.2676 20.2 23.75 <.0001variety 12 57.2378 2.2676 20.2 25.24 <.0001variety 13 54.7246 2.2676 20.2 24.13 <.0001variety 14 53.7700 2.2676 20.2 23.71 <.0001
Output
Differences of Least Squares Means
StandardEffect variety _variety Estimate Error DF t Value Pr > |t|
variety 1 2 2.5794 0.9629 221 2.68 0.0079variety 1 3 -2.0208 0.9629 221 -2.10 0.0370variety 1 4 -2.5865 0.9629 221 -2.69 0.0078variety 1 5 -2.3306 0.9629 221 -2.42 0.0163variety 1 6 0.04111 0.9629 221 0.04 0.9660variety 1 7 -0.1615 0.9629 221 -0.17 0.8669variety 1 8 1.2799 0.9629 221 1.33 0.1852variety 1 9 -4.8175 0.9629 221 -5.00 <.0001variety 1 10 -2.9318 0.9629 221 -3.04 0.0026variety 1 11 -0.2103 0.9629 221 -0.22 0.8273variety 1 12 -3.5864 0.9629 221 -3.72 0.0002variety 1 13 -1.0732 0.9629 221 -1.11 0.2663variety 1 14 -0.1186 0.9629 221 -0.12 0.9021...variety 13 14 0.9546 0.9629 221 0.99 0.3226
allow heterogenous error for each site
proc mixed data=t lognote; class site variety block; model yield= variety / ddfm=kenwardroger; random site variety*site block(site);
repeated / group=site;lsmeans variety/pdiff; run;
CPU-time effect of different syntax
/*heterogenous errors - long cpu time*/proc mixed data=t lognote; class site variety block; model yield = variety / ddfm=kenwardroger; random site variety*site block(site); large G-matrixrepeated / group=site;run;
/*heterogenous errors - short cpu time*/proc mixed data=t lognote; class site variety block; model yield = variety / ddfm=kenwardroger; random int variety block / subject=site;repeated / group=site;run;
allow heterogenous error for each siteCov Parm Subject Group Estimate
Intercept site 82.3446variety site 6.0686block site 7.7449Residual site 1 3.2903Residual site 2 5.3894Residual site 3 11.0351Residual site 4 5.9991Residual site 5 3.5523Residual site 6 9.7576Residual site 7 5.9514Residual site 8 4.7286Residual site 9 9.7070Residual site 10 15.2087Residual site 11 6.3460Residual site 12 5.3596Residual site 13 7.2015Residual site 14 7.9454Residual site 15 44.0650Residual site 16 4.9044Residual site 17 7.6436Residual site 18 11.1075
Model selection
Homogenous errors or heterogenous errors?
Comparison of AIC (Akaike criterion)
Fit Statistics heterogeous homogenous
-2 Res Log Likelihood 5471.4 5635.7AIC (smaller is better) 5513.4 5643.7AICC (smaller is better) 5514.3 5643.7BIC (smaller is better) 5532.1 5647.2
Model selection
Likelihood Ratio test
Fit Statistics heterogeous homogenous
-2 Res Log Likelihood 5471.4 5635.7
The log-Likelihood of heterogenous is higher
T = 164.3
Compare to a Chi²-distribution at 17 df.
Chi²(5%, 17) = 27.6 ⇒ significantly better
outlook
Several different structures for genotype-environmentinteraction possible
Piepho, H.-P., 1997: Analyzing genotype-environment data by mixedmodels with multiplicative effects. Biometrics 53: 761-766.
F. v. Eeuwijk: Statistical methods for the analysis of seriesof plant breeding experiments. (Proceed. of this meeting, 242 ff.)
2-step-analysis: Estimate means „variety×site“ in first step. Then analysethis means (not the plot values) over all sites in 2nd step
Michel et al. (Proceedings 136-141)
Example 2: Hierarchical designs
(analogous to a example from Snedecor & Cochran, Statistical methods, 1967)
plant calcium content [%] of single leaves
1 3.28 3.09 3.03 3.03
2 3.52 3.48 3.38 3.38
3 2.88 2.80 2.81 2.76
4 3.34 3.38 3.23 3.26
Calcium content of the four plants
calcium
2.7
2.8
2.9
3.0
3.1
3.2
3.3
3.4
3.5
3.6
plant
1 2 3 4
Statistic inference
What is the best estimator for average Ca-content?
1) Simple arithmetic mean from 16 leaves
2) Calculate the average per plant first and thencalculate the average over four plant means
Does this matter?
Calculate plant means
data beet;input plant @@;do leaf=1 to 4;input calcium @@; output;end;
datalines;1 3.28 3.09 3.03 3.032 3.52 3.48 3.38 3.383 2.88 2.80 2.81 2.764 3.34 3.38 3.23 3.26;proc means data = beet noprint;
by plant;var calcium;output out = plantmean mean=plantmean;
proc print;run;
Plant means
plant _FREQ_ plantmean
1 4 3.1075
2 4 3.4400
3 4 2.8125
4 4 3.3025
Global mean =
(3.1075 + 3.4400 + 2.8125 + 3.3025) / 4 = 3.166
Simple arithmetic mean from 16 values
proc means data = beet noprint;*by plant;var calcium;output out = plantmeanmean=plantmean;proc print;run;
_TYPE_ _FREQ_ plantmean
0 16 3.166
Statistic inference
3.166 is only an estimator for the real mean
> We want to calculate standard errors
1) Calculate the simple arithmetic mean and its standarderror from 16 leaves
2) Calculate the average per plant first and thencalculate the average over four plant means and itsstandard error2 a) With fixed plant effects
2 b) With random plant effects
Does this matter?
SAS PROC MIXED
Proc mixed data=beet; /*independent observations*/class plant;model Calcium = /ddfm=kr;estimate "independent" intercept 1 /cl;
run;
Proc mixed data=beet; /*fix plant effect*/class plant; model Calcium = plant / ddfm=kr;estimate "fix plants" intercept 1 plant 0.25 0.25 0.250.25 /cl;
run;
Proc mixed data=beet; /*random plant effect*/class plant;model Calcium = /ddfm=kr;random plant;estimate "random plants" intercept 1 /cl ;
run;
Three different results
StandardLabel Estimate Error DF Lower Upper
independent 3.166 0.064 15 3.030 3.301
fix plants 3.166 0.020 12 3.121 3.210
random plants 3.166 0.136 3 2.733 3.599
Three different results
StandardLabel Estimate Error DF Lower Upper
independent 3.166 0.064 15 3.030 3.301
fix plants 3.166 0.020 12 3.121 3.210
random plants 3.166 0.136 3 2.733 3.599
Independent: wrong! Because of correlation within plants
Fix plants: narrow inference space, only valid for the sampledplants
Random plants: broad inference space. Inference possible forpopulation of all plants (that have had a chance to be part of thesample)
different sources of random variation
StandardLabel Estimate Error DF Lower Upper
random plants 3.166 0.136 3 2.733 3.599
Covariance Parameter Estimates
plant 0.072380residual 0.006602
calcium
2.7
2.8
2.9
3.0
3.1
3.2
3.3
3.4
3.5
3.6
plant
1 2 3 4
different sources of random variation
StandardLabel Estimate Error DF Lower Upper
random plants 3.166 0.136 3 2.733 3.599
Covariance Parameter Estimates
plant 0.072380residual 0.006602
01851,016
006602,04
07238,0)(22
=+=+=rtr
Var leavesplant σσμ
136,001851,0)(.. === μVarES
Two ways to specify the correlation
y = Xβ + Zu + e
G and R Matrix
y = Xβ + e
only R Matrix
1)Proc mixed data=beet;class plant;model Calcium = /ddfm=kr;random plant;run;
2)Proc mixed data=beet;class plant;model Calcium = /ddfm=kr;repeated /subject=plant type=cs;run;
1) G and R matrix
Covariance Parameter Estimates
plant 0.072380residual 0.006602
IG 2
2
2
2
2
1000010000100001
07238.0
000000000000
G
G
G
G
G
σ
σσ
σσ
=
⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜
⎝
⎛
⋅=
⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜
⎝
⎛
=
Covariance between leavesfrom the same plant
IR 2
2
2
2
2
1000010000100001
0066.0
000000000000
e
e
e
e
e
σ
σσ
σσ
=
⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜
⎝
⎛
⋅=
⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜
⎝
⎛
=
This is the R-matrix for 4 leaves of 1 plant. The total R is a 16 by 16 matrix
2) Only R matrix
Covariance Parameter Estimates
Cov Parm Subject Estimate
CS plant 0.07238Residual 0.006602
Covariancebetween leavesfrom the sameplant
JIR 22
22222
22222
22222
22222
plante
planteplantplantplant
plantplanteplantplant
plantplantplanteplant
plantplantplantplante
σσ
σσσσσσσσσσσσσσσσσσσσ
+=
⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜
⎝
⎛
++
++
=
This structure is called „compound-symmetry“
Estimated R Matrix for plant 1
Row Col1 Col2 Col3 Col41 0.07898 0.07238 0.07238 0.072382 0.07238 0.07898 0.07238 0.072383 0.07238 0.07238 0.07898 0.072384 0.07238 0.07238 0.07238 0.07898
⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜
⎝
⎛
++
++
=
2222
2222
2222
2222
²²
²²
plantplantplantplant
plantplantplantplant
plantplantplantplant
plantplantplantplant
σσσσσσσσσσσσσσσσσσσσ
R
The complete V-Matrix, it‘s block diagonal
Row Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9 Col10 Col11 Col12 Col13 Col14 Col15 Col161 0.07898 0.07238 0.07238 0.07238 0 0 0 0 0 0 0 0 0 0 0 02 0.07238 0.07898 0.07238 0.07238 0 0 0 0 0 0 0 0 0 0 0 03 0.07238 0.07238 0.07898 0.07238 0 0 0 0 0 0 0 0 0 0 0 04 0.07238 0.07238 0.07238 0.07898 0 0 0 0 0 0 0 0 0 0 0 05 0 0 0 0 0.07898 0.07238 0.07238 0.07238 0 0 0 0 0 0 0 06 0 0 0 0 0.07238 0.07898 0.07238 0.07238 0 0 0 0 0 0 0 07 0 0 0 0 0.07238 0.07238 0.07898 0.07238 0 0 0 0 0 0 0 08 0 0 0 0 0.07238 0.07238 0.07238 0.07898 0 0 0 0 0 0 0 09 0 0 0 0 0 0 0 0 0.07898 0.07238 0.07238 0.07238 0 0 0 0
10 0 0 0 0 0 0 0 0 0.07238 0.07898 0.07238 0.07238 0 0 0 011 0 0 0 0 0 0 0 0 0.07238 0.07238 0.07898 0.07238 0 0 0 012 0 0 0 0 0 0 0 0 0.07238 0.07238 0.07238 0.07898 0 0 0 013 0 0 0 0 0 0 0 0 0 0 0 0 0.07898 0.07238 0.07238 0.0723814 0 0 0 0 0 0 0 0 0 0 0 0 0.07238 0.07898 0.07238 0.0723815 0 0 0 0 0 0 0 0 0 0 0 0 0.07238 0.07238 0.07898 0.0723816 0 0 0 0 0 0 0 0 0 0 0 0 0.07238 0.07238 0.07238 0.07898
Every row or column represents one leaf, every 4 by 4 block represents one plant. The values on the diagonal are thevariance of single observations, added up from residual and plant variance. Beside the diagonal are the covariances of leaves from the same plant (= the plant variance).
Repeated measurements
Each plant is measured four times (4 leaves)
The plant is the „subject“ in repeated measurementsyntax and is measured „repeatedly“
Proc mixed data=beet;class plant;model Calcium = /ddfm=kr;repeated /subject=plant type=cs r rcorr;run;
covariance and correlation
Covariance Parameter Estimates
plant 0.072380residual 0.006602
Covariance between leavesfrom the same plant
92,0078982,007238,0
006602,007238,007238,0),( 22
2
' ==+
=+
=leavesplants
plantsjiji yycorr
σσσ
The higher the correlation, the lower the informationget from an additional observation from the same plant
Estimated R Correlation Matrix for plant 1
Row Col1 Col2 Col3 Col4
1 1.0000 0.9164 0.9164 0.91642 0.9164 1.0000 0.9164 0.91643 0.9164 0.9164 1.0000 0.91644 0.9164 0.9164 0.9164 1.0000
⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜
⎝
⎛
=
44434241
43333231
42322221
41312111
ρρρρρρρρρρρρρρρρ
corrR
Example 3: Repeated measurements
Mineralisation in wheat after ploughing of grass-clover
Mineral N measured in 4 depths
3 different soil tillages
randomized complete blocks
no description
1 rotortiller (2 times), ploughing2 rotortiller (1 time), ploughing3 only ploughing
4 depths in the soil under a block design
Block I Block II Block III Block IV
Tiefe 1 Tiefe 2 Tiefe 3 Tiefe 4
How can we model repeated mesurements?
How to work out the correct linear model
See Piepho et al. (2004): A mixed modelapproach for randomized experiments withrepeated measures. JACS 190, 230-247.
1. step: set up model for single depth
yij = μ + bj + τi + eij
yij = Nmin-content of i-th tillage in j-th block (log-transformed!)
μ = global meanbj = effect of j-th blockτi = effect of i-th treatmenteij = error of ij-h plot
model over 4 depths
Block I Block II Block III Block IV
yij1 = μ1 + bj1 + τi1 + eij1
yij2 = μ2 + bj2 + τi2 + eij2
yij3 = μ3 + bj3 + τi3 + eij3
yij4 = μ4 + bj4 + τi4 + eij4
2. step: combine depths
yijt = μt + bjt + τit + eijt t =1,2,3,4: index for depth
factorial split-up of treatment effects:
μt + τit = μ + αi + βt + (αβ)it
αi = main effect of i-th tillage
βt = main effect of t-th depth
(αβ)it = interaction tillage × depth
complete model
yijt = μ + bj + (bβ)jt + αi + βt + (αβ)it + eijt
bj = block effect(bβ)jt = depth specific block effectαi = main effect of i-th tillageβt = main effect of t-th depth(αβ)it = interaction tillage × depth
Set up the linear model
y = Xβ + e
Observations from same plot are not independent, theyare correlated!
IR 2
2
2
2
2
000000000000
e
e
e
e
e
σ
σσ
σσ
=
⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜
⎝
⎛
=
possible correlation structures
Compound Symmetry (CS):corr(eijs, eijt) = ρ
Autoregressiv [AR(1)]:corr(eijs, eijt) = ρ|s-t|
Unstructured (UN):Each depth has its own variance and covariances
⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜
⎝
⎛
++
++
=
22222
22222
22222
22222
ploteplotplotplot
plotploteplotplot
plotplotploteplot
plotplotplotplote
σσσσσσσσσσσσσσσσσσσσ
R
⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜
⎝
⎛
=
11
11
23
2
2
32
2
ρρρρρρρρρρρρ
σ eR
⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜
⎝
⎛
=
24434241
34233231
24232221
14131221
e
e
e
e
σσσσσσσσσσσσσσσσ
R
compound symmetry
eijt = fij + gijt (Split-Plot model)
fij = ij-th plot effect
gijt = residual = depth specific deviation of plot effect
( ) 2g
2f
2f
ijtijs σσσρe,ecorr+
==
AR(1) model
0.00.10.20.30.40.50.60.70.80.91.0
0 1 2 3 4 5 6
Korrelation
Abstand |s-t|
ρ = 0,9
ρ = 0,3
ρ = 0,6
model selection
Modell -2 log L§ AIC$ (smaller is better)
homogeneous variances:Unabhängig 21,9 23,9CS 15,7 19,7AR(1) 13,9 17,9
Heterogeneous variances:independent 15,1 23,1CS 6,7 16,7AR(1) 6,4 16,4Unstructured -2,5 17,5
§ log L = log-Likelihood; p = Zahl der Varianzparameter$ AIC = Akaike Information Criterion = -2 log L + 2p
best model
MIXED Code
proc mixed;class depth block tillage;model log_nmin= depth block block*depth
tillage tillage*depth /ddfm=kr;repeated depth /sub=block*tillage type=cs;lsmeans tillage*depth /pdiff;run;
Observations from different plots („subjects“) are independent
„subject“ = individual unit, which is measured repeatedly
MIXED Code
proc mixed;class depth block tillage;model log_nmin= depth block block*depth
tillage tillage*depth /ddfm=kr;repeated depth /sub=block*tillage type=cs;/*repeated depth /sub=block*tillage type=ar(1);repeated depth /sub=block*tillage type=csh;repeated depth /sub=block*tillage type=arh(1);repeated depth /sub=block*tillage type=un; */lsmeans tillage*depth /pdiff;run;
Observations from different plots („subjects“) are independent
„subject“ = individual unit, which is measured repeatedly
Model selection
We don‘t know the true model!
There are no good or bad models, but somemodels are useful – to make inferences
Akaike's Information Criterion (AIC), AICC
Based on logLikelihood and penalized for numberof parameters
Selects good models with low complexity
Example 4: Split-Plot designs
Effect of different crops on yield of following cornvarieties.
PreCrops in mainplots
Corn varieties in subplots
Data from Dr. C. Böhmel
(Univ. Hohenheim)
PreCrop A PreCrop Bvariety 1 variety 2variety 2 block 1 variety 4variety 3 variety 1variety 4 variety 3
PreCrop B PreCrop Avariety 3 variety 4variety 2 block 2 variety 3variety 1 variety 2variety 4 variety 1
Complete trial design
A B C D E FS1 S6S2 S2S3 S8S4 S3S5 S1S6 S7S7 S9S8 S4S9 S5
D F A E C B
E D C B F A
bloc
k 1
bloc
k 2
bloc
k 3
3 blocks, 6 crops, 9 varieties
Linear Model for RCB
yijk = µ + ai + bj + rk + abij + eijk
yij yield of the j-th variety after i-th crop in k-th block
µ global mean or constant (fix)
ai effect of the i-th crop (fix)
bj effect of the j-th variety (fix)
rk effect of the k-th block (fix)
abij interaction between i-th crop and j-th variety (fix)
eijk residual error of the ijk-th plot ~N(0,σ²e)
Linear Model for Split-plot-design
yijk = µ + ai + bj + rk + abij + fik + eijk
yij yield of the j-th variety after i-th crop in k-th blockµ global mean or constant (fix)ai effect of the i-th crop (fix)bj effect of the j-th variety (fix)rk effect of the k-th block (fix)abij interaction between i-th crop and j-th variety (fix)fik error of the ik-th mainplot ~N(0, σ ²ra)eijk residual error of the ijk-th plot ~N(0,σ²e)
Why do we need the mainplot error?
1: the observations are not independent. Plots in same mainplot are closer to another, should bemore equal to another. -> Covariance betweenplots from same mainplot
2: Repeated measurement design/ mainplots aremeasured 9 times (9 varieties)
3: „Analyze as randomized“: Two steps of randomization. Each step is connected to oneeffect. Randomization units that not contain all treatments should be random.
SAS-Code (MIXED)
proc mixed data = corn;class block variety crop;model yield = block variety crop
variety*crop /ddfm=kr;random crop*block ;lsmeans variety*crop/pdiff;run;
alternative SAS-Code (MIXED)
proc mixed data = corn;class block variety crop;model yield = block variety crop
variety*crop /ddfm=kr;repeated/subject=crop*block type=cs;lsmeans variety*crop/pdiff;run;
Output (MIXED)
Covariance ParameterEstimates
Cov Parm Estimate
block*crop 22.47Residual 275.67
Type 3 Tests of Fixed Effects
Num DenEffect DF DF F Value Pr > F
Block 2 10 2.88 0.1029Variety 8 96 15.10 <.0001Crop 5 10 1.17 0.3878Variety*Crop 40 96 1.22 0.2185
Covariance between yieldsfrom the same mainplot
Analysis with software for fixed effects model (Proc GLM)
proc glm data=corn; class block variety crop;model yield = block variety crop
variety*crop crop*block;random crop*block / test;lsmeans variety*crop;run;
Output GLM “Tests of Hypotheses for Mixed Model Analysis of Variance”
Same F-values as with MIXED
source df SS MS Fvalue ProbFblock 2 2752 1376.0 2.88 0.103variety 8 33293 4161.6 15.10 <.0001crop 5 2796 559.2 1.17 0.388variety*crop 40 13407 335.2 1.22 0.219block*crop 10 4779 477.9 1.73 0.084error 96 26465 275.7
Drawback of Proc GLM
F-tests are correct (when using the „/test“ – option),
but standard errors of difference for meancomparisons mostly not!
Standard error of difference for variety comparisonsafter different crops:
( ) ( ) 76,1983
67,27547,2222)(
22
'' =+⋅
=+⋅
=−r
yyVar erajiji
σσ
sed = sqrt(198.76) = 14.09
Approximation of denominator df
Satterthwaite-approximation
²)(²)1()1(]²)1[()1()1(
rae
rae
MQaMQbaMQMQbaraDDF
⋅+⋅−⋅−+⋅−⋅−⋅−⋅
=
proc mixed data = corn;class block variety crop;model yield = block variety
crop variety*crop /ddfm=kr;random crop*block ;lsmeans variety*crop/pdiff;run; Method of
Kenward & Roger
correct sed with MIXED
Differences of Least Squares MeansStandard
Effect V Crop V Crop Estimate Error DF tValue Pr>|t|
Crop 1 2 0.0846 5.9499 10 0.01 0.9889Crop 1 3 -3.7664 5.9499 10 -0.63 0.5409V 1 2 -19.3639 5.5345 96 -3.50 0.0007V 1 3 -33.9237 5.5345 96 -6.13 <.0001V*Crop 1 1 1 2 -13.8173 14.0983 98 -0.98 0.3295V*Crop 1 1 1 3 -4.5292 14.0983 98 -0.32 0.7487V*Crop 1 1 2 1 -34.3912 13.5566 96 -2.54 0.0128V*Crop 1 1 3 1 -28.6052 13.5566 96 -2.11 0.0375
Split plot with repeated measurements
At four different dates some plants out of each plotwere harvested
How can we model this?
Repeated measurement for mainplots and subplots!
First step
Take the model for one single date:
proc mixed data = corn;class block variety crop;model yield = block variety crop
variety*crop /ddfm=kr;random crop*block ;lsmeans variety*crop/pdiff;run;
Second step
Add a fixed time effect
proc mixed data = corn;class block variety crop date;model yield = block variety crop datevariety*crop variety*datevariety*crop*date/ddfm=kr;random crop*block ;lsmeans variety*crop*date/pdiff;run;
third step
Add a covariance structure for the main- and subplots-The structure for mainplots in the G-matrix
-The structure for subplots in the R-matrix
proc mixed data = corn;class block variety crop date;model yield = block variety crop date variety*crop variety*date variety*crop*date/ddfm=kr;random date / subject=crop*block type=cs;repeated date / subject=crop*block*variety
type=cs;lsmeans variety*crop*date/pdiff;run;
Output
Covariance Parameter Estimates
Cov Parm Subject Estimate
Variance WDH*ZF 70.2625CS WDH*ZF 72.0857CS WDH*Sorte*ZF 81.8311Residual 326.37
Output G-matrix
Estimated G Matrix
Row block crop date Col1 Col2 Col3 Col4
1 date 1 1 167 142.35 72.0857 72.0857 72.08572 date 1 1 182 72.0857 142.35 72.0857 72.08573 date 1 1 193 72.0857 72.0857 142.35 72.08574 date 1 1 207 72.0857 72.0857 72.0857 142.35
Estimated G Correlation Matrix
Row block crop date Col1 Col2 Col3 Col4
1 date 1 1 167 1.0000 0.5064 0.5064 0.50642 date 1 1 182 0.5064 1.0000 0.5064 0.50643 date 1 1 193 0.5064 0.5064 1.0000 0.50644 date 1 1 207 0.5064 0.5064 0.5064 1.0000
Output R-matrix
Estimated R Matrix for block*crop 1 1
Row Col1 Col2 Col3 Col41 408.20 81.8311 81.8311 81.83112 81.8311 408.20 81.8311 81.83113 81.8311 81.8311 408.20 81.83114 81.8311 81.8311 81.8311 408.20
Estimated R Correlation Matrix for block*crop 1 1
Row Col1 Col2 Col3 Col41 1.0000 0.2005 0.2005 0.20052 0.2005 1.0000 0.2005 0.20053 0.2005 0.2005 1.0000 0.20054 0.2005 0.2005 0.2005 1.0000
fourth step
Compound Symmetry: type = CS
Select the best covariance structure
⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜
⎝
⎛
++
++
1111
1111
1111
1111
²²
²²
σσσσσσσσσσσσσσσσσσσσ
Select the best covariance structure
⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜
⎝
⎛
11
11
²
23
2
2
32
ρρρρρρρρρρρρ
σ
Autoregressiv: type = AR(1)
Select the best covariance structure
⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜
⎝
⎛
24342414
43232313
42322212
41312121
σρσσρσσρσσρσσσρσσρσσρσσρσσσρσσρσσρσσρσσσ
Heterogenous CS: type = CSH
Select the best covariance structure
⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜
⎝
⎛
2434
224
314
432323
213
24232
2212
341
23121
21
σρσσρσσρσσρσσσρσσρσσρσσρσσσρσσρσσρσσρσσσ
Heterogenous AR(1): type = ARH(1)
Select the best covariance structure
⎟⎟⎟⎟⎟
⎠
⎞
⎜⎜⎜⎜⎜
⎝
⎛
24342414
43232313
42322212
41312121
σσσσσσσσσσσσσσσσ
Unstrukturiert: type = UN
Unstructured showed best AIC
CS AIC (smaller is better) 5570.1
CSH AIC (smaller is better) 5538.7
AR(1) AIC (smaller is better) 5561.2
ARH(1)AIC (smaller is better) 5526.1
UN AIC (smaller is better) 5516.4
last step
Perform hypotheses tests, calculate means and standard errors with the chosen covariancestructure
Example 5: Random coefficient models
A sugar beet variety trial series underinfestation with Rhizoctonia solani
Infestation measured in % necrotic rootsurface
4 tolerant varieties, 1 susceptible varietiy
17 sites in 2002
Data provided from German Institute of sugar beetresearch.
Ladewig & Lukashyk (proceedings, p. 98-103)
Relation between infestation and yield, one site
0
2
4
6
8
10
12
14
16
0 10 20 30 40 50 60 70 80
Befallsdruck (% verfaulte Rübenkörperoberfläche Indikatorsorte)
Ber
eini
gter
Zuc
kere
rtrag
(t/h
a)
"Fabiola""Premiere""Syncro""Solea""Impuls"
BZE von vier rhizoctonia-toleranten Zuckerrübensorten in Abhängigkeitvom Rhizoctonia-Befallsdruck. Standort: Gangelt 2002 (LIZ Jülich)
variety 1variety 2variety 3variety 4variety 5 (susceptible)
% infestation susceptible variety
yield
ANCOVA-Model
infestation (x)
yield (y)
µ
ai
αi{
xyßδδ
=
regression of variety ixyßiδδ
=
generalregression
xµxayE iiii )()( γβαβ +++=+=
[ ] ijkjkijkijjiijjiijk ebxvvuuy +++++++++= )()( γγβααμ
µ = interceptαi = effect of i-th varietyuj = effect of j-h site (random)(αu)ij = interaction variety by site (random)xijk = infestation of ijk-th plotß = general regression between infestation and yieldγi = deviation of i-th variety from regressionvj = devitation of j-th site from regression (random)(γv)ij = deviation from regression due to interaction (random)bjk = effect of k-th block at j-th site (random)eijk= residual error
Complete model for the trial series
„Random Coefficient Model“
Random intercept (site effects) and randomdeviations from mean regression
Regressions are random sample from a population of regression lines
• Laird & Ware, Biometrics 1982; • Longford 1993, Random
Coefficient Models; • Wolfinger 1996, JABES; • Littell et al. 1996, SAS System for
Mixed Models
Σ1 und Σ2 unstructured variance-covariance-matrices
⎟⎟⎠
⎞⎜⎜⎝
⎛==⎟⎟
⎠
⎞⎜⎜⎝
⎛2
2
varvuv
uvu
j
j
vu
σσσσ
1Σ
⎟⎟⎠
⎞⎜⎜⎝
⎛==⎟⎟
⎠
⎞⎜⎜⎝
⎛2
,
,2
)()(
varvvu
vuu
ij
ij
vu
γγα
γαα
σσσσ
γα
2Σ
Model
• allowing for covariance between intercept and slope makes parameter estimates „resistent“against scale effects
Variance of site-intercepts(σ²u) and site-slopes (σ²ν); σuν = covariance
SAS-Code (1)
Proc mixed data=gesamt method=reml maxiter=100;class site variety block;model yield = variety variety*infestation/ solutionnoint ddfm=kr;random int infestation / subject= site type=unr ;random int infestation / subject= site*variety
type=unr ;random block/subject= site;
lsmeans variety /pdiff at infestation =0;lsmeans variety /pdiff at infestation =0.1;lsmeans variety /pdiff at infestation =0.3;lsmeans variety /pdiff at infestation =0.5;lsmeans variety /pdiff at infestation =1;ods output lsmeans=estimates_yield ;ods output diffs=diffs_yield ;
run;
SAS-Output (1)
WARNING: Did not converge.
Covariance Parameter ValuesAt Last Iteration
Cov Parm Subject Estimate
UN(1,1) site 1.5704UN(2,1) site -0.5173UN(2,2) site -0.4490Var(1) site*variety 0.2140Var(2) site*variety 1.4670Corr(2,1) site*variety 8.13E-34Block site 0.02270Residual 1.4028
• Change scaling of covariate so (additition, subtraction, division, multiplication) that variances on diagonal of variance-cov-matrices have equal dimension
• Reduction of model
• try models with heterogenous errors
Model selection to reach convergence
SAS-Code (2)
data gesamt; set Gesamt;infestation_v=((infestation)-0)/100; run;Proc mixed data=gesamt method=reml maxiter=100;
class site variety block;model yield = variety variety*infestation/ solution nointddfm=kr;
* random int infestation/ subject=site type=unr ;random int infestation/ subject=site*variety type=unr ;random block/subject=site;parms (2)(2)(0.2)(0.03)(1.3);
lsmeans variety/pdiff at infestation=0;lsmeans variety/pdiff at infestation=0.1;lsmeans variety/pdiff at infestation=0.3;lsmeans variety/pdiff at infestation=0.5;lsmeans variety/pdiff at infestation=1;ods output lsmeans=estimates_yield ;ods output diffs=diffs_yield ;
run;
Covariance Parameter Estimates
Cov Parm Subject Estimate
Var(1) site*variety 2.8899Var(2) site*variety 2.3892Corr(2,1) site*variety -0.6637Block site 0.03468Residual 1.3555
StandardEffect variety Estimate Error DF t Value Pr > |t|
variety Fabiola 9.2547 0.4779 59.6 19.36 <.0001variety Impuls 10.2814 0.4558 56.7 22.56 <.0001variety Premiere 10.0701 0.4797 60.5 20.99 <.0001variety Solea 10.1740 0.5997 60.9 16.97 <.0001variety Syncro 9.8342 0.4972 60.2 19.78 <.0001infestation*variety Fabiola -3.5664 0.7648 20 -4.66 0.0001infestation*variety Impuls -7.9021 0.7013 19 -11.27 <.0001infestation*variety Premiere -1.6013 0.8433 23.1 -1.90 0.0701infestation*variety Solea -1.8531 1.0612 32.6 -1.75 0.0902infestation*variety Syncro -0.6893 0.8913 21.7 -0.77 0.4476
Num DenEffect DF DF F Value Pr > F
variety 5 59.5 397.51 <.0001infestation*variety 5 22.5 30.53 <.0001
site x varietysite x variety x infest.Corr between interceptand slope
SAS-Output (2): lsmeans
variety 1sus.varietyvariety 2variety 4variety 3variety 1sus.varietyvariety 2variety 4variety 3
expected values, 17 sites, 2002
0
2
4
6
8
10
12
0 10 20 30 40 50 60 70 80 90 100
Befallsdruck [% verfaulte Rübenkörperoberfläche Indikatorsorte]
Ber
eini
gter
Zuc
kere
rtrag
[t/h
a]
FabiolaImpulsPremiereSoleaSyncro
Fabiola
Impuls
PremiereSyncro
Solea
error bar = mean standard error of difference
susceptible
variety 3variety 2variety 4
variety 1variety 1suscep. varietyvariety 2variety 4variety 3
data 2005, centred for site effects
High residual error, due to small plots (30 plants)
Example 6: Spatial models
Several examples for spatial models in thissymposium (e.g. SCHNEIDER et al. page 209 Proceedings)
What are thebasic methodsof spatialstatistics?
Starting point
Spatial statistics can be seen as an multidimensional extension of repeatedmeasurements. [...] Observations are correlatedin two spatial (mostly) dimensions.
LITTELL et al. 1996
• The nearer two points are together, the higherthe correlation should be
• Correlation depends on distance
• Several models possible to describe thisdistance-correlation dependency
Semivariance: a measure of similarity
h = a distance „class“
νi, νj = measurements at single observations
∑=
−=hhji
jiij
vvhN
h)|,(
)²()(2
1)(γ
The higher the correlation, the lower the semi-variance
Anisotropic Variogram
dotted = East west, solid = north-south
sill
nugget
Some models to describe the empirical variogram
)/exp()( ρijij ddf −=
0)(;(22
31)( 3
3
=≤⎥⎥⎦
⎤
⎢⎢⎣
⎡+−= ijij
ijijij dfelsed
dddf ρ
ρρ
)/exp()( 22 ρij
ddf ij −=
dij = distance between two points i and j
ρ = an estimable parameter (the correlation between points)
Exponential, Spherical and Gaussian model
How to fit spatial models with SAS: A uniformity trial
data spatvar;input rep bloc row col yield;datalines;1 4 1 1 10.54111 4 1 2 8.58061 2 1 3 11.27901 2 1 4 12.43441 4 2 1 10.34161 4 2 2 11.3103...4 15 7 7 12.02534 15 7 8 10.42984 14 8 5 7.32204 14 8 6 10.51044 15 8 7 12.68084 15 8 8 10.4482;
The spherical model
proc mixed scoring=5;
model yield= ;
parms (0 to 10 by 2.5) (1 to 10 by 3);
repeated / subject=intercept type=sp(sph)(row col);
run;
Output for spherical model
Covariance Parameter Estimates
Cov Parm Subject Estimate
SP(SPH) Intercept 2.7120Residual 3.2607
Fit Statistics
-2 Res Log Likelihood 227.5AIC (smaller is better) 231.5AICC (smaller is better) 231.7BIC (smaller is better) 235.8
Output for spherical model
Covariance Parameter Estimates
Cov Parm Subject Estimate
SP(SPH) Intercept 2.7120Residual 3.2607
Fit Statistics
-2 Res Log Likelihood 227.5AIC (smaller is better) 231.5AICC (smaller is better) 231.7BIC (smaller is better) 235.8
0)(;(22
31)( 3
3
=≤⎥⎥⎦
⎤
⎢⎢⎣
⎡+−= ijij
ijijij dfelsed
dddf ρ
ρρ
Spherical model with nugget effect
proc mixed scoring=50 convh=1e-06;model yield= ;parms (0 to 10 by 2.5)
(1 to 10 by 3) (0.05 to 1.05 by 0.25);repeated / subject=intercept local
type=sp(sph)(row col);run;
Output for spherical model with nugget
Covariance Parameter Estimates
Cov Parm Subject Estimate
Variance Intercept 3.2189SP(SPH) Intercept 2.7162Residual 0.02144
Fit Statistics
-2 Res Log Likelihood 227.5AIC (smaller is better) 233.5AICC (smaller is better) 233.9BIC (smaller is better) 240.0
Some take home messages
An effect is (should be) random when it is ...
... related to a randomization step
... a random sample from a population (of effects)
... models the covariance between observations
Adress the mixed model structure in R or G matrix
Different software available (not only SAS)
Model selection important (the mixed world is morecomplex than the fixed effects models)