Software for Linear Mixed Models

Mixed Models

Symposium Agricultural field trials10.10.2007

University of Hohenheim

Dr. sc. agr. Andreas Büchse07/2002 - 04/2007 Fachgebiet Bioinformatik. Universität Hohenheimsince 05/2007 Agrarzentrum Limburgerhof. BASF Aktiengesellschaft

[email protected]

Outline

1. Definition of mixed models

2. Examples for use (usefulness) of mixed models2.1 Series of experiments

2.2 Hierarchical Designs

2.3 Repeated measurement

2.4 Split-Plot, multifactorial trials

2.5 Random coefficient models

2.6 Spatial correlation

3. Summary, Conclusions, rules of thumb

First...

I will put the slides on the symposium homepage

www.uni-hohenheim.de/bioinformatik

next week

Or send an eMail to

[email protected]

Literature

Littell, Milliken, Stroup, Wolfinger (1996): SAS System for Mixed Models. SAS Institute, Cary, NC. 2nd ed. 2006!

Verbeke, G & G. Molenberghs, 2000: Linear Mixed Models for Longitudinal Data. Springer-Verlag, New York.

Piepho, H.P., Büchse, A., and Emrich, K. (2003): A hitchhiker's guide to the mixed model analysis of randomized experiments. Journal of Agronomy and Crop Science, 189, 310-322

Piepho, H.P., Büchse, A., Richter, C. (2004): A mixed modelling approach to randomized experiments with repeated measures. Journal of Agronomy and Crop Science, 190, 230-247

Zimmerman, D. L. & D. A. Harville, 1991: A Random Field Approach to the Analysis of Field-Plot Experiments and Other Spatial Experiments. Biometrics 47: 223-23.

Longford, N. T., 1993: Random Coefficient Models. Oxford University Press.

Definition

What does this mean, „mixed“ model?

a variety trial

One location, several varieties, e.g. 4 replications

Randomized Complete Block Design

fixed effects model

Model of a randomized complete block design

yik = μ + bk + αi + eik

yij = yield of the i-th variety in the k-th blockμ = constant (fix)bk = effect of the k-th block (fix)αi = effect of the i-th variety (fix)eik = residual error ~N (0, σ²e)

Only one random effect ⇒ no mixed model

a variety trial (again)

One location, several varieties, maybe from a cross of twolines, e.g. 4 replicationsRandomized Complete Block Design

a variety trial (again)

If the breeder is interested in estimating response to selection

R = i h σG

R = response to selectioni = selection intensityh = sqrt of heritabilityσG = sqrt of genetic variance

Then, it is reasonable to treat variety effects as randomeffects (sampled from a Gaussian distribution)

mixed model

Model of a randomized complete block design, withrandom variety effects

yik = μ + bk + αi + eik

yik = yield of the i-th variety in the k-th blockμ = constant (fix)bk = effect of the k-th block (fix)αi = effect of the i-th variety ~N (0, σ²G)eik = residual error ~N (0, σ²e)

more than one random effect ⇒ mixed model !

Definition

A mixed linear model is a model that have morethan one random effect (not only the residual error).

It has random and fixed effects.

Random = random sample from a (normal) distribution

Fixed effects versus random effects

Interest in linear models [...] lies mainly in estimating (and testing hypotheses about) linear functions of the effects in the models. These effects are what we call fixed effects, and the models are correspondingly called fixed effectsmodels. There are, however, situations where we have no interestin linear functions of effects but where, by the nature of the data and their derivation, the things of prime interestconcerning the effects are variances. Effects of this nature are called random effects [...] and the models are calledrandom effects models.Models involving a mixture of fixed and random effects arecalled mixed models.

SEARLE, Linear models, 1971

Fixed or random?

Are inferences going to be drawn from these dataabout just these levels of the factor?

„Yes“ – then the effects are to be considered as fixed effects

„No“ – then, presumably, inferences will be madenot just about the levels occurring in the data butabout some population of levels [...] the effectsare considered as being random.

SEARLE, Linear models, 1971

Fix or random?

Levels of factor aresample from a randomdistribution?

no fix factor estimate fixedeffects (BLUE)

yes

randomfactor

What isthe maininterest?

only distribution of random effects

Estimate variancecomponents (REML)

distribution of randomeffects as well as effectsthemselves

Estimate variancecomponents (REML) and realized random effects(BLUP)

Are years random? (SEARLE, Linear models, 1971)

[...] take the case of year effects, for example in studying wheat yields: are the effects of years on yield to be considered fixed or random?

The years themselves are unlikely to be random, for they will probably be a group of consecutiveyears over which the data have been gathered orthe experiments run.

But the effects on yield may reasonably beconsidered random - unless, perhaps, one isinterested in comparing specific years for somepurpose.

Are blocks fix or random?

Everlasting story...

• With fix blocks s.e.m. is underestimated

• For s.e.d. it does not matter!

• With random blocks, in small experiments, the variancecomponent is sometimes estimated <0, which results in model reduction

• > my opinion: blocks fix is OK.

Some examples

• Genotypic lines in a breeding experiment (random)• Varieties in a VCU-trial (fix, or random with Shrinkage ->

BLUP, see Michel & Piepho)• Different doses of plant protection agents (fix)• Different agents at same level (fix)• Locations (fix or random)• Regions (fix)• Years (random, mostly)• Complete blocks (both is possible)• Incomplete blocks (both is possible, random allows use of

interblock information (see Williams, Proceedings p. 266)

some theory

General form of a mixed model in matrix notation

y = Xβ + Zu + e

y = a vector with n observations

X, Z = Designmatrices

β = a vector of fixed effectsu = a vector of random effectse = a vector of residuals

e ∼ N(0,R) u ∼ N(0,G)

R, G = variance-covariance matrices

General form of a mixed model in matrix style

y = Xβ + Zu + e e ∼ N(0,R) u ∼ N(0,G)

⎟⎟⎟⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜⎜⎜⎜

⎝

⎛

+⎟⎟⎟

⎠

⎞

⎜⎜⎜

⎝

⎛⋅

⎟⎟⎟⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜⎜⎜⎜

⎝

⎛

+⎟⎟⎟

⎠

⎞

⎜⎜⎜

⎝

⎛⋅

⎟⎟⎟⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜⎜⎜⎜

⎝

⎛

=

⎟⎟⎟⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜⎜⎜⎜

⎝

⎛

6

5

4

3

2

1

3

2

1

2

1

100001010100010001

101101101011011011

141515151612

eeeeee

bb

αααμ

6 plots, 2 fixed block effects, 3 random variety effects

G: the variance-covariance-matrix of random effects

A 3 by 3 matrix because we have 3 varieties

Genetic variance on the diagonal

No covariance between genotypic effects

IG 22

2

2

2

100010001

000000

GG

G

G

G

σσσ

σσ

=⎟⎟⎟

⎠

⎞

⎜⎜⎜

⎝

⎛=

⎟⎟⎟

⎠

⎞

⎜⎜⎜

⎝

⎛

=

R: the residual variance-covariance-matrix (iid)

6 by 6 matrix because we have 6 plots.Residual variance on the diagonalNo covariance (in case of independent errors)This is in every case the form of R in a fixed effects model. But don‘t have to be in a mixed model. In a mixed model alternative forms of R are possible, we will see later.

IR 22

2

2

2

2

2

2

100000010000001000000100000010000001

000000000000000000000000000000

ee

e

e

e

e

e

e

σσ

σσ

σσ

σσ

=

⎟⎟⎟⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜⎜⎜⎜

⎝

⎛

=

⎟⎟⎟⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜⎜⎜⎜

⎝

⎛

=

Estimation

The fixed and random effects are estimated withthe method of WEIGHTED LEAST SQUARES

Weights depend on variance components (G, R)

Normally the variance components are unknownand have to be estimated themselves

An accepted method for the estimation is REML(Patterson & Thompson 1971)

Estimation of random and fixed effects

( ) yVX'XVX'β 11 −−−=ˆ

Weighted least square mean estimation of:

V = ZGZ' + R

For more details see e.g. Littell et al. „SAS-System for MIXED Models“and cited literature

Difficult matrix equations, but the computer does thisfor us ☺

Software for Linear Mixed Models

SAS PROC MIXED

MIXED Procedure in SPSS

GenStat

ASReml

S-Plus

R

Several other packages (Statistica, Systat, ...)

Some examples for using mixed models

Example 1: Trial series

A cultivar performance trial14 oat varieties, 18 sites

Model for a single siteyik = μ + bk + αi + eik

yik = yield of the i-th variety in the k-th blockμ = constant (fix)bk = effect of the k-th block (fix)α i = effect of the i-th variety (fix)eik = residual error ~N (0, σ²e)

From single trial to a series

yik = μ + αi + bk + eik add index j for sites

yijk = μj + αij + bjk + eijk

yijk = yield of the i-th variety in the k-th block at site jμj = mean yield at site j (fix or random?)αij = effect of the i-th variety at site j (fix?) bjk = effect of the k-th block at site j (fix?)eijk = residual error ~N (0, σ²e)

Main effects for site and variety

yijk = μj + αij + bjk + eijk = μ + αi + βj + (αβ)ij + bjk + eijk

yijk = yield of the i-th variety in the k-th block at site jμ = global meanαi = effect of the i-th varietyβj = effect of the j-th site(αβ)ij = interaction of the i-th variety by the j-th sitebjk = effect of the k-th block at site jeijk = residual error ~N (0, σ²e)

Which effects should be random?

have sites fixed or random effects?

Aim of the series?

want to make inferences for a region or a state (in thisexample whole Germany) -> random

Are the sites a random sample from „a population“ of sitesin Germany?

The sites are mostly no random sample, but maybe thesite effects are.

(for single years site effects are confounded with site byyear interaction!)

Are variety effects fixed or random?

Aim of the series?

want to know about performance of the testedvarieties

-> fix

The varieties are no random sample from „a population“ of varieties

mixed model

yijk = μ + αi + βj + (αβ)ij + bjk + eijk

sites: randomvarieties: fixedsite × variety: random (sites are random)blocks within site: random (sites are random)

βj ∼ N(0, σ²β)(αβ)ij ∼ N(0, σ²αβ)bjk ∼ N(0, σ²b)eijk ∼ N(0, σ²e)

SAS Code for trial series

data t; input variety site block yield; datalines; 1 1 1 58.1300 1 1 2 60.8000 <more data> 14 18 4 60.0000 ; proc mixed data=t; class site variety block; model yield= variety / ddfm=kenwardroger; random site variety*site block(site);lsmeans variety/pdiff; run;

Denominator degrees of freedom according to themethod of Kenward & Roger

Output

Cov Parm Estimate

site 82.1917site*variety 5.9844block(site) 8.0897Residual 9.4411

Type 3 Tests of Fixed Effects

Num DenEffect DF DF F Value Pr > F

variety 13 221 8.67 <.0001

This are thevariancecomponents

OutputLeast Squares Means

StandardEffect variety Estimate Error DF t Value Pr > |t|

variety 1 53.6514 2.2676 20.2 23.66 <.0001variety 2 51.0719 2.2676 20.2 22.52 <.0001variety 3 55.6722 2.2676 20.2 24.55 <.0001variety 4 56.2379 2.2676 20.2 24.80 <.0001variety 5 55.9819 2.2676 20.2 24.69 <.0001variety 6 53.6103 2.2676 20.2 23.64 <.0001variety 7 53.8129 2.2676 20.2 23.73 <.0001variety 8 52.3715 2.2676 20.2 23.10 <.0001variety 9 58.4689 2.2676 20.2 25.78 <.0001variety 10 56.5832 2.2676 20.2 24.95 <.0001variety 11 53.8617 2.2676 20.2 23.75 <.0001variety 12 57.2378 2.2676 20.2 25.24 <.0001variety 13 54.7246 2.2676 20.2 24.13 <.0001variety 14 53.7700 2.2676 20.2 23.71 <.0001

Output

Differences of Least Squares Means

StandardEffect variety _variety Estimate Error DF t Value Pr > |t|

variety 1 2 2.5794 0.9629 221 2.68 0.0079variety 1 3 -2.0208 0.9629 221 -2.10 0.0370variety 1 4 -2.5865 0.9629 221 -2.69 0.0078variety 1 5 -2.3306 0.9629 221 -2.42 0.0163variety 1 6 0.04111 0.9629 221 0.04 0.9660variety 1 7 -0.1615 0.9629 221 -0.17 0.8669variety 1 8 1.2799 0.9629 221 1.33 0.1852variety 1 9 -4.8175 0.9629 221 -5.00 <.0001variety 1 10 -2.9318 0.9629 221 -3.04 0.0026variety 1 11 -0.2103 0.9629 221 -0.22 0.8273variety 1 12 -3.5864 0.9629 221 -3.72 0.0002variety 1 13 -1.0732 0.9629 221 -1.11 0.2663variety 1 14 -0.1186 0.9629 221 -0.12 0.9021...variety 13 14 0.9546 0.9629 221 0.99 0.3226

allow heterogenous error for each site

proc mixed data=t lognote; class site variety block; model yield= variety / ddfm=kenwardroger; random site variety*site block(site);

repeated / group=site;lsmeans variety/pdiff; run;

CPU-time effect of different syntax

/*heterogenous errors - long cpu time*/proc mixed data=t lognote; class site variety block; model yield = variety / ddfm=kenwardroger; random site variety*site block(site); large G-matrixrepeated / group=site;run;

/*heterogenous errors - short cpu time*/proc mixed data=t lognote; class site variety block; model yield = variety / ddfm=kenwardroger; random int variety block / subject=site;repeated / group=site;run;

allow heterogenous error for each siteCov Parm Subject Group Estimate

Intercept site 82.3446variety site 6.0686block site 7.7449Residual site 1 3.2903Residual site 2 5.3894Residual site 3 11.0351Residual site 4 5.9991Residual site 5 3.5523Residual site 6 9.7576Residual site 7 5.9514Residual site 8 4.7286Residual site 9 9.7070Residual site 10 15.2087Residual site 11 6.3460Residual site 12 5.3596Residual site 13 7.2015Residual site 14 7.9454Residual site 15 44.0650Residual site 16 4.9044Residual site 17 7.6436Residual site 18 11.1075

Model selection

Homogenous errors or heterogenous errors?

Comparison of AIC (Akaike criterion)

Fit Statistics heterogeous homogenous

-2 Res Log Likelihood 5471.4 5635.7AIC (smaller is better) 5513.4 5643.7AICC (smaller is better) 5514.3 5643.7BIC (smaller is better) 5532.1 5647.2

Model selection

Likelihood Ratio test

Fit Statistics heterogeous homogenous

-2 Res Log Likelihood 5471.4 5635.7

The log-Likelihood of heterogenous is higher

T = 164.3

Compare to a Chi²-distribution at 17 df.

Chi²(5%, 17) = 27.6 ⇒ significantly better

outlook

Several different structures for genotype-environmentinteraction possible

Piepho, H.-P., 1997: Analyzing genotype-environment data by mixedmodels with multiplicative effects. Biometrics 53: 761-766.

F. v. Eeuwijk: Statistical methods for the analysis of seriesof plant breeding experiments. (Proceed. of this meeting, 242 ff.)

2-step-analysis: Estimate means „variety×site“ in first step. Then analysethis means (not the plot values) over all sites in 2nd step

Michel et al. (Proceedings 136-141)

Example 2: Hierarchical designs

(analogous to a example from Snedecor & Cochran, Statistical methods, 1967)

plant calcium content [%] of single leaves

1 3.28 3.09 3.03 3.03

2 3.52 3.48 3.38 3.38

3 2.88 2.80 2.81 2.76

4 3.34 3.38 3.23 3.26

Calcium content of the four plants

calcium

2.7

2.8

2.9

3.0

3.1

3.2

3.3

3.4

3.5

3.6

plant

1 2 3 4

Statistic inference

What is the best estimator for average Ca-content?

1) Simple arithmetic mean from 16 leaves

2) Calculate the average per plant first and thencalculate the average over four plant means

Does this matter?

Calculate plant means

data beet;input plant @@;do leaf=1 to 4;input calcium @@; output;end;

datalines;1 3.28 3.09 3.03 3.032 3.52 3.48 3.38 3.383 2.88 2.80 2.81 2.764 3.34 3.38 3.23 3.26;proc means data = beet noprint;

by plant;var calcium;output out = plantmean mean=plantmean;

proc print;run;

Plant means

plant _FREQ_ plantmean

1 4 3.1075

2 4 3.4400

3 4 2.8125

4 4 3.3025

Global mean =

(3.1075 + 3.4400 + 2.8125 + 3.3025) / 4 = 3.166

Simple arithmetic mean from 16 values

proc means data = beet noprint;*by plant;var calcium;output out = plantmeanmean=plantmean;proc print;run;

_TYPE_ _FREQ_ plantmean

0 16 3.166

Statistic inference

3.166 is only an estimator for the real mean

> We want to calculate standard errors

1) Calculate the simple arithmetic mean and its standarderror from 16 leaves

2) Calculate the average per plant first and thencalculate the average over four plant means and itsstandard error2 a) With fixed plant effects

2 b) With random plant effects

Does this matter?

SAS PROC MIXED

Proc mixed data=beet; /*independent observations*/class plant;model Calcium = /ddfm=kr;estimate "independent" intercept 1 /cl;

run;

Proc mixed data=beet; /*fix plant effect*/class plant; model Calcium = plant / ddfm=kr;estimate "fix plants" intercept 1 plant 0.25 0.25 0.250.25 /cl;

run;

Proc mixed data=beet; /*random plant effect*/class plant;model Calcium = /ddfm=kr;random plant;estimate "random plants" intercept 1 /cl ;

run;

Three different results

StandardLabel Estimate Error DF Lower Upper

independent 3.166 0.064 15 3.030 3.301

fix plants 3.166 0.020 12 3.121 3.210

random plants 3.166 0.136 3 2.733 3.599

Three different results


independent 3.166 0.064 15 3.030 3.301

fix plants 3.166 0.020 12 3.121 3.210

random plants 3.166 0.136 3 2.733 3.599

Independent: wrong! Because of correlation within plants

Fix plants: narrow inference space, only valid for the sampledplants

Random plants: broad inference space. Inference possible forpopulation of all plants (that have had a chance to be part of thesample)

different sources of random variation


random plants 3.166 0.136 3 2.733 3.599

Covariance Parameter Estimates

plant 0.072380residual 0.006602

calcium

2.7

2.8

2.9

3.0

3.1

3.2

3.3

3.4

3.5

3.6

plant

1 2 3 4

different sources of random variation


random plants 3.166 0.136 3 2.733 3.599



01851,016

006602,04

07238,0)(22

=+=+=rtr

Var leavesplant σσμ

136,001851,0)(.. === μVarES

Two ways to specify the correlation

y = Xβ + Zu + e

G and R Matrix

y = Xβ + e

only R Matrix

1)Proc mixed data=beet;class plant;model Calcium = /ddfm=kr;random plant;run;

2)Proc mixed data=beet;class plant;model Calcium = /ddfm=kr;repeated /subject=plant type=cs;run;

1) G and R matrix



IG 2

2

2

2

2

1000010000100001

07238.0

000000000000

G

G

G

G

G

σ

σσ

σσ

=

⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜

⎝

⎛

⋅=

⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜

⎝

⎛

=

Covariance between leavesfrom the same plant

IR 2

2

2

2

2

1000010000100001

0066.0

000000000000

e

e

e

e

e

σ

σσ

σσ

=

⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜

⎝

⎛

⋅=

⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜

⎝

⎛

=

This is the R-matrix for 4 leaves of 1 plant. The total R is a 16 by 16 matrix

2) Only R matrix


Cov Parm Subject Estimate

CS plant 0.07238Residual 0.006602

Covariancebetween leavesfrom the sameplant

JIR 22

22222

22222

22222

22222

plante

planteplantplantplant

plantplanteplantplant

plantplantplanteplant

plantplantplantplante

σσ

σσσσσσσσσσσσσσσσσσσσ

+=

⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜

⎝

⎛

++

++

=

This structure is called „compound-symmetry“

Estimated R Matrix for plant 1

Row Col1 Col2 Col3 Col41 0.07898 0.07238 0.07238 0.072382 0.07238 0.07898 0.07238 0.072383 0.07238 0.07238 0.07898 0.072384 0.07238 0.07238 0.07238 0.07898

⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜

⎝

⎛

++

++

=

2222

2222

2222

2222

²²

²²

plantplantplantplant





R

The complete V-Matrix, it‘s block diagonal

Row Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9 Col10 Col11 Col12 Col13 Col14 Col15 Col161 0.07898 0.07238 0.07238 0.07238 0 0 0 0 0 0 0 0 0 0 0 02 0.07238 0.07898 0.07238 0.07238 0 0 0 0 0 0 0 0 0 0 0 03 0.07238 0.07238 0.07898 0.07238 0 0 0 0 0 0 0 0 0 0 0 04 0.07238 0.07238 0.07238 0.07898 0 0 0 0 0 0 0 0 0 0 0 05 0 0 0 0 0.07898 0.07238 0.07238 0.07238 0 0 0 0 0 0 0 06 0 0 0 0 0.07238 0.07898 0.07238 0.07238 0 0 0 0 0 0 0 07 0 0 0 0 0.07238 0.07238 0.07898 0.07238 0 0 0 0 0 0 0 08 0 0 0 0 0.07238 0.07238 0.07238 0.07898 0 0 0 0 0 0 0 09 0 0 0 0 0 0 0 0 0.07898 0.07238 0.07238 0.07238 0 0 0 0

10 0 0 0 0 0 0 0 0 0.07238 0.07898 0.07238 0.07238 0 0 0 011 0 0 0 0 0 0 0 0 0.07238 0.07238 0.07898 0.07238 0 0 0 012 0 0 0 0 0 0 0 0 0.07238 0.07238 0.07238 0.07898 0 0 0 013 0 0 0 0 0 0 0 0 0 0 0 0 0.07898 0.07238 0.07238 0.0723814 0 0 0 0 0 0 0 0 0 0 0 0 0.07238 0.07898 0.07238 0.0723815 0 0 0 0 0 0 0 0 0 0 0 0 0.07238 0.07238 0.07898 0.0723816 0 0 0 0 0 0 0 0 0 0 0 0 0.07238 0.07238 0.07238 0.07898

Every row or column represents one leaf, every 4 by 4 block represents one plant. The values on the diagonal are thevariance of single observations, added up from residual and plant variance. Beside the diagonal are the covariances of leaves from the same plant (= the plant variance).

Repeated measurements

Each plant is measured four times (4 leaves)

The plant is the „subject“ in repeated measurementsyntax and is measured „repeatedly“

Proc mixed data=beet;class plant;model Calcium = /ddfm=kr;repeated /subject=plant type=cs r rcorr;run;

covariance and correlation



Covariance between leavesfrom the same plant

92,0078982,007238,0

006602,007238,007238,0),( 22

2

' ==+

=+

=leavesplants

plantsjiji yycorr

σσσ

The higher the correlation, the lower the informationget from an additional observation from the same plant

Estimated R Correlation Matrix for plant 1

Row Col1 Col2 Col3 Col4

1 1.0000 0.9164 0.9164 0.91642 0.9164 1.0000 0.9164 0.91643 0.9164 0.9164 1.0000 0.91644 0.9164 0.9164 0.9164 1.0000

⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜

⎝

⎛

=

44434241

43333231

42322221

41312111

ρρρρρρρρρρρρρρρρ

corrR

Example 3: Repeated measurements

Mineralisation in wheat after ploughing of grass-clover

Mineral N measured in 4 depths

3 different soil tillages

randomized complete blocks

no description

1 rotortiller (2 times), ploughing2 rotortiller (1 time), ploughing3 only ploughing

4 depths in the soil under a block design

Block I Block II Block III Block IV

Tiefe 1 Tiefe 2 Tiefe 3 Tiefe 4

How can we model repeated mesurements?

How to work out the correct linear model

See Piepho et al. (2004): A mixed modelapproach for randomized experiments withrepeated measures. JACS 190, 230-247.

1. step: set up model for single depth

yij = μ + bj + τi + eij

yij = Nmin-content of i-th tillage in j-th block (log-transformed!)

μ = global meanbj = effect of j-th blockτi = effect of i-th treatmenteij = error of ij-h plot

model over 4 depths

Block I Block II Block III Block IV

yij1 = μ1 + bj1 + τi1 + eij1

yij2 = μ2 + bj2 + τi2 + eij2

yij3 = μ3 + bj3 + τi3 + eij3

yij4 = μ4 + bj4 + τi4 + eij4

2. step: combine depths

yijt = μt + bjt + τit + eijt t =1,2,3,4: index for depth

factorial split-up of treatment effects:

μt + τit = μ + αi + βt + (αβ)it

αi = main effect of i-th tillage

βt = main effect of t-th depth

(αβ)it = interaction tillage × depth

complete model

yijt = μ + bj + (bβ)jt + αi + βt + (αβ)it + eijt

bj = block effect(bβ)jt = depth specific block effectαi = main effect of i-th tillageβt = main effect of t-th depth(αβ)it = interaction tillage × depth

Set up the linear model

y = Xβ + e

Observations from same plot are not independent, theyare correlated!

IR 2

2

2

2

2

000000000000

e

e

e

e

e

σ

σσ

σσ

=

⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜

⎝

⎛

=

possible correlation structures

Compound Symmetry (CS):corr(eijs, eijt) = ρ

Autoregressiv [AR(1)]:corr(eijs, eijt) = ρ|s-t|

Unstructured (UN):Each depth has its own variance and covariances

⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜

⎝

⎛

++

++

=

22222

22222

22222

22222

ploteplotplotplot

plotploteplotplot

plotplotploteplot

plotplotplotplote


R

⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜

⎝

⎛

=

11

11

23

2

2

32

2

ρρρρρρρρρρρρ

σ eR

⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜

⎝

⎛

=

24434241

34233231

24232221

14131221

e

e

e

e

σσσσσσσσσσσσσσσσ

R

compound symmetry

eijt = fij + gijt (Split-Plot model)

fij = ij-th plot effect

gijt = residual = depth specific deviation of plot effect

( ) 2g

2f

2f

ijtijs σσσρe,ecorr+

==

AR(1) model

0.00.10.20.30.40.50.60.70.80.91.0

0 1 2 3 4 5 6

Korrelation

Abstand |s-t|

ρ = 0,9

ρ = 0,3

ρ = 0,6

model selection

Modell -2 log L§ AIC$ (smaller is better)

homogeneous variances:Unabhängig 21,9 23,9CS 15,7 19,7AR(1) 13,9 17,9

Heterogeneous variances:independent 15,1 23,1CS 6,7 16,7AR(1) 6,4 16,4Unstructured -2,5 17,5

§ log L = log-Likelihood; p = Zahl der Varianzparameter$ AIC = Akaike Information Criterion = -2 log L + 2p

best model

MIXED Code

proc mixed;class depth block tillage;model log_nmin= depth block block*depth

tillage tillage*depth /ddfm=kr;repeated depth /sub=block*tillage type=cs;lsmeans tillage*depth /pdiff;run;

Observations from different plots („subjects“) are independent

„subject“ = individual unit, which is measured repeatedly

MIXED Code

proc mixed;class depth block tillage;model log_nmin= depth block block*depth

tillage tillage*depth /ddfm=kr;repeated depth /sub=block*tillage type=cs;/*repeated depth /sub=block*tillage type=ar(1);repeated depth /sub=block*tillage type=csh;repeated depth /sub=block*tillage type=arh(1);repeated depth /sub=block*tillage type=un; */lsmeans tillage*depth /pdiff;run;

Observations from different plots („subjects“) are independent

„subject“ = individual unit, which is measured repeatedly

Model selection

We don‘t know the true model!

There are no good or bad models, but somemodels are useful – to make inferences

Akaike's Information Criterion (AIC), AICC

Based on logLikelihood and penalized for numberof parameters

Selects good models with low complexity

Example 4: Split-Plot designs

Effect of different crops on yield of following cornvarieties.

PreCrops in mainplots

Corn varieties in subplots

Data from Dr. C. Böhmel

(Univ. Hohenheim)

PreCrop A PreCrop Bvariety 1 variety 2variety 2 block 1 variety 4variety 3 variety 1variety 4 variety 3

PreCrop B PreCrop Avariety 3 variety 4variety 2 block 2 variety 3variety 1 variety 2variety 4 variety 1

Complete trial design

A B C D E FS1 S6S2 S2S3 S8S4 S3S5 S1S6 S7S7 S9S8 S4S9 S5

D F A E C B

E D C B F A

bloc

k 1

bloc

k 2

bloc

k 3

3 blocks, 6 crops, 9 varieties

Linear Model for RCB

yijk = µ + ai + bj + rk + abij + eijk

yij yield of the j-th variety after i-th crop in k-th block

µ global mean or constant (fix)

ai effect of the i-th crop (fix)

bj effect of the j-th variety (fix)

rk effect of the k-th block (fix)

abij interaction between i-th crop and j-th variety (fix)

eijk residual error of the ijk-th plot ~N(0,σ²e)

Linear Model for Split-plot-design

yijk = µ + ai + bj + rk + abij + fik + eijk

yij yield of the j-th variety after i-th crop in k-th blockµ global mean or constant (fix)ai effect of the i-th crop (fix)bj effect of the j-th variety (fix)rk effect of the k-th block (fix)abij interaction between i-th crop and j-th variety (fix)fik error of the ik-th mainplot ~N(0, σ ²ra)eijk residual error of the ijk-th plot ~N(0,σ²e)

Why do we need the mainplot error?

1: the observations are not independent. Plots in same mainplot are closer to another, should bemore equal to another. -> Covariance betweenplots from same mainplot

2: Repeated measurement design/ mainplots aremeasured 9 times (9 varieties)

3: „Analyze as randomized“: Two steps of randomization. Each step is connected to oneeffect. Randomization units that not contain all treatments should be random.

SAS-Code (MIXED)

proc mixed data = corn;class block variety crop;model yield = block variety crop

variety*crop /ddfm=kr;random crop*block ;lsmeans variety*crop/pdiff;run;

alternative SAS-Code (MIXED)


variety*crop /ddfm=kr;repeated/subject=crop*block type=cs;lsmeans variety*crop/pdiff;run;

Output (MIXED)

Covariance ParameterEstimates

Cov Parm Estimate

block*crop 22.47Residual 275.67

Type 3 Tests of Fixed Effects


Block 2 10 2.88 0.1029Variety 8 96 15.10 <.0001Crop 5 10 1.17 0.3878Variety*Crop 40 96 1.22 0.2185

Covariance between yieldsfrom the same mainplot

Analysis with software for fixed effects model (Proc GLM)

proc glm data=corn; class block variety crop;model yield = block variety crop

variety*crop crop*block;random crop*block / test;lsmeans variety*crop;run;

Output GLM “Tests of Hypotheses for Mixed Model Analysis of Variance”

Same F-values as with MIXED

source df SS MS Fvalue ProbFblock 2 2752 1376.0 2.88 0.103variety 8 33293 4161.6 15.10 <.0001crop 5 2796 559.2 1.17 0.388variety*crop 40 13407 335.2 1.22 0.219block*crop 10 4779 477.9 1.73 0.084error 96 26465 275.7

Drawback of Proc GLM

F-tests are correct (when using the „/test“ – option),

but standard errors of difference for meancomparisons mostly not!

Standard error of difference for variety comparisonsafter different crops:

( ) ( ) 76,1983

67,27547,2222)(

22

'' =+⋅

=+⋅

=−r

yyVar erajiji

σσ

sed = sqrt(198.76) = 14.09

Approximation of denominator df

Satterthwaite-approximation

²)(²)1()1(]²)1[()1()1(

rae

rae

MQaMQbaMQMQbaraDDF

⋅+⋅−⋅−+⋅−⋅−⋅−⋅

=

proc mixed data = corn;class block variety crop;model yield = block variety

crop variety*crop /ddfm=kr;random crop*block ;lsmeans variety*crop/pdiff;run; Method of

Kenward & Roger

correct sed with MIXED

Differences of Least Squares MeansStandard

Effect V Crop V Crop Estimate Error DF tValue Pr>|t|

Crop 1 2 0.0846 5.9499 10 0.01 0.9889Crop 1 3 -3.7664 5.9499 10 -0.63 0.5409V 1 2 -19.3639 5.5345 96 -3.50 0.0007V 1 3 -33.9237 5.5345 96 -6.13 <.0001V*Crop 1 1 1 2 -13.8173 14.0983 98 -0.98 0.3295V*Crop 1 1 1 3 -4.5292 14.0983 98 -0.32 0.7487V*Crop 1 1 2 1 -34.3912 13.5566 96 -2.54 0.0128V*Crop 1 1 3 1 -28.6052 13.5566 96 -2.11 0.0375

Split plot with repeated measurements

At four different dates some plants out of each plotwere harvested

How can we model this?

Repeated measurement for mainplots and subplots!

First step

Take the model for one single date:


variety*crop /ddfm=kr;random crop*block ;lsmeans variety*crop/pdiff;run;

Second step

Add a fixed time effect

proc mixed data = corn;class block variety crop date;model yield = block variety crop datevariety*crop variety*datevariety*crop*date/ddfm=kr;random crop*block ;lsmeans variety*crop*date/pdiff;run;

third step

Add a covariance structure for the main- and subplots-The structure for mainplots in the G-matrix

-The structure for subplots in the R-matrix

proc mixed data = corn;class block variety crop date;model yield = block variety crop date variety*crop variety*date variety*crop*date/ddfm=kr;random date / subject=crop*block type=cs;repeated date / subject=crop*block*variety

type=cs;lsmeans variety*crop*date/pdiff;run;

Output



Variance WDH*ZF 70.2625CS WDH*ZF 72.0857CS WDH*Sorte*ZF 81.8311Residual 326.37

Output G-matrix

Estimated G Matrix

Row block crop date Col1 Col2 Col3 Col4

1 date 1 1 167 142.35 72.0857 72.0857 72.08572 date 1 1 182 72.0857 142.35 72.0857 72.08573 date 1 1 193 72.0857 72.0857 142.35 72.08574 date 1 1 207 72.0857 72.0857 72.0857 142.35

Estimated G Correlation Matrix

Row block crop date Col1 Col2 Col3 Col4

1 date 1 1 167 1.0000 0.5064 0.5064 0.50642 date 1 1 182 0.5064 1.0000 0.5064 0.50643 date 1 1 193 0.5064 0.5064 1.0000 0.50644 date 1 1 207 0.5064 0.5064 0.5064 1.0000

Output R-matrix

Estimated R Matrix for block*crop 1 1


Estimated R Correlation Matrix for block*crop 1 1


fourth step

Compound Symmetry: type = CS

Select the best covariance structure

⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜

⎝

⎛

++

++

1111

1111

1111

1111

²²

²²



⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜

⎝

⎛

11

11

²

23

2

2

32

ρρρρρρρρρρρρ

σ

Autoregressiv: type = AR(1)


⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜

⎝

⎛

24342414

43232313

42322212

41312121

σρσσρσσρσσρσσσρσσρσσρσσρσσσρσσρσσρσσρσσσ

Heterogenous CS: type = CSH


⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜

⎝

⎛

2434

224

314

432323

213

24232

2212

341

23121

21

σρσσρσσρσσρσσσρσσρσσρσσρσσσρσσρσσρσσρσσσ

Heterogenous AR(1): type = ARH(1)


⎟⎟⎟⎟⎟

⎠

⎞

⎜⎜⎜⎜⎜

⎝

⎛

24342414

43232313

42322212

41312121

σσσσσσσσσσσσσσσσ

Unstrukturiert: type = UN

Unstructured showed best AIC

CS AIC (smaller is better) 5570.1

CSH AIC (smaller is better) 5538.7

AR(1) AIC (smaller is better) 5561.2

ARH(1)AIC (smaller is better) 5526.1

UN AIC (smaller is better) 5516.4

last step

Perform hypotheses tests, calculate means and standard errors with the chosen covariancestructure

Example 5: Random coefficient models

A sugar beet variety trial series underinfestation with Rhizoctonia solani

Infestation measured in % necrotic rootsurface

4 tolerant varieties, 1 susceptible varietiy

17 sites in 2002

Data provided from German Institute of sugar beetresearch.

Ladewig & Lukashyk (proceedings, p. 98-103)

Relation between infestation and yield, one site

0

2

4

6

8

10

12

14

16

0 10 20 30 40 50 60 70 80

Befallsdruck (% verfaulte Rübenkörperoberfläche Indikatorsorte)

Ber

eini

gter

Zuc

kere

rtrag

(t/h

a)

"Fabiola""Premiere""Syncro""Solea""Impuls"

BZE von vier rhizoctonia-toleranten Zuckerrübensorten in Abhängigkeitvom Rhizoctonia-Befallsdruck. Standort: Gangelt 2002 (LIZ Jülich)

variety 1variety 2variety 3variety 4variety 5 (susceptible)

% infestation susceptible variety

yield

ANCOVA-Model

infestation (x)

yield (y)

µ

ai

αi{

xyßδδ

=

regression of variety ixyßiδδ

=

generalregression

xµxayE iiii )()( γβαβ +++=+=

[ ] ijkjkijkijjiijjiijk ebxvvuuy +++++++++= )()( γγβααμ

µ = interceptαi = effect of i-th varietyuj = effect of j-h site (random)(αu)ij = interaction variety by site (random)xijk = infestation of ijk-th plotß = general regression between infestation and yieldγi = deviation of i-th variety from regressionvj = devitation of j-th site from regression (random)(γv)ij = deviation from regression due to interaction (random)bjk = effect of k-th block at j-th site (random)eijk= residual error

Complete model for the trial series

„Random Coefficient Model“

Random intercept (site effects) and randomdeviations from mean regression

Regressions are random sample from a population of regression lines

• Laird & Ware, Biometrics 1982; • Longford 1993, Random

Coefficient Models; • Wolfinger 1996, JABES; • Littell et al. 1996, SAS System for

Mixed Models

Σ1 und Σ2 unstructured variance-covariance-matrices

⎟⎟⎠

⎞⎜⎜⎝

⎛==⎟⎟

⎠

⎞⎜⎜⎝

⎛2

2

varvuv

uvu

j

j

vu

σσσσ

1Σ

⎟⎟⎠

⎞⎜⎜⎝

⎛==⎟⎟

⎠

⎞⎜⎜⎝

⎛2

,

,2

)()(

varvvu

vuu

ij

ij

vu

γγα

γαα

σσσσ

γα

2Σ

Model

• allowing for covariance between intercept and slope makes parameter estimates „resistent“against scale effects

Variance of site-intercepts(σ²u) and site-slopes (σ²ν); σuν = covariance

SAS-Code (1)

Proc mixed data=gesamt method=reml maxiter=100;class site variety block;model yield = variety variety*infestation/ solutionnoint ddfm=kr;random int infestation / subject= site type=unr ;random int infestation / subject= site*variety

type=unr ;random block/subject= site;

lsmeans variety /pdiff at infestation =0;lsmeans variety /pdiff at infestation =0.1;lsmeans variety /pdiff at infestation =0.3;lsmeans variety /pdiff at infestation =0.5;lsmeans variety /pdiff at infestation =1;ods output lsmeans=estimates_yield ;ods output diffs=diffs_yield ;

run;

SAS-Output (1)

WARNING: Did not converge.

Covariance Parameter ValuesAt Last Iteration


UN(1,1) site 1.5704UN(2,1) site -0.5173UN(2,2) site -0.4490Var(1) site*variety 0.2140Var(2) site*variety 1.4670Corr(2,1) site*variety 8.13E-34Block site 0.02270Residual 1.4028

• Change scaling of covariate so (additition, subtraction, division, multiplication) that variances on diagonal of variance-cov-matrices have equal dimension

• Reduction of model

• try models with heterogenous errors

Model selection to reach convergence

SAS-Code (2)

data gesamt; set Gesamt;infestation_v=((infestation)-0)/100; run;Proc mixed data=gesamt method=reml maxiter=100;

class site variety block;model yield = variety variety*infestation/ solution nointddfm=kr;

* random int infestation/ subject=site type=unr ;random int infestation/ subject=site*variety type=unr ;random block/subject=site;parms (2)(2)(0.2)(0.03)(1.3);

lsmeans variety/pdiff at infestation=0;lsmeans variety/pdiff at infestation=0.1;lsmeans variety/pdiff at infestation=0.3;lsmeans variety/pdiff at infestation=0.5;lsmeans variety/pdiff at infestation=1;ods output lsmeans=estimates_yield ;ods output diffs=diffs_yield ;

run;



Var(1) site*variety 2.8899Var(2) site*variety 2.3892Corr(2,1) site*variety -0.6637Block site 0.03468Residual 1.3555

StandardEffect variety Estimate Error DF t Value Pr > |t|

variety Fabiola 9.2547 0.4779 59.6 19.36 <.0001variety Impuls 10.2814 0.4558 56.7 22.56 <.0001variety Premiere 10.0701 0.4797 60.5 20.99 <.0001variety Solea 10.1740 0.5997 60.9 16.97 <.0001variety Syncro 9.8342 0.4972 60.2 19.78 <.0001infestation*variety Fabiola -3.5664 0.7648 20 -4.66 0.0001infestation*variety Impuls -7.9021 0.7013 19 -11.27 <.0001infestation*variety Premiere -1.6013 0.8433 23.1 -1.90 0.0701infestation*variety Solea -1.8531 1.0612 32.6 -1.75 0.0902infestation*variety Syncro -0.6893 0.8913 21.7 -0.77 0.4476


variety 5 59.5 397.51 <.0001infestation*variety 5 22.5 30.53 <.0001

site x varietysite x variety x infest.Corr between interceptand slope

SAS-Output (2): lsmeans

variety 1sus.varietyvariety 2variety 4variety 3variety 1sus.varietyvariety 2variety 4variety 3

expected values, 17 sites, 2002

0

2

4

6

8

10

12

0 10 20 30 40 50 60 70 80 90 100

Befallsdruck [% verfaulte Rübenkörperoberfläche Indikatorsorte]

Ber

eini

gter

Zuc

kere

rtrag

[t/h

a]

FabiolaImpulsPremiereSoleaSyncro

Fabiola

Impuls

PremiereSyncro

Solea

error bar = mean standard error of difference

susceptible

variety 3variety 2variety 4

variety 1variety 1suscep. varietyvariety 2variety 4variety 3

data 2005, centred for site effects

High residual error, due to small plots (30 plants)

Example 6: Spatial models

Several examples for spatial models in thissymposium (e.g. SCHNEIDER et al. page 209 Proceedings)

What are thebasic methodsof spatialstatistics?

Starting point

Spatial statistics can be seen as an multidimensional extension of repeatedmeasurements. [...] Observations are correlatedin two spatial (mostly) dimensions.

LITTELL et al. 1996

• The nearer two points are together, the higherthe correlation should be

• Correlation depends on distance

• Several models possible to describe thisdistance-correlation dependency

Semivariance: a measure of similarity

h = a distance „class“

νi, νj = measurements at single observations

∑=

−=hhji

jiij

vvhN

h)|,(

)²()(2

1)(γ

The higher the correlation, the lower the semi-variance

Anisotropic Variogram

dotted = East west, solid = north-south

sill

nugget

Some models to describe the empirical variogram

)/exp()( ρijij ddf −=

0)(;(22

31)( 3

3

=≤⎥⎥⎦

⎤

⎢⎢⎣

⎡+−= ijij

ijijij dfelsed

dddf ρ

ρρ

)/exp()( 22 ρij

ddf ij −=

dij = distance between two points i and j

ρ = an estimable parameter (the correlation between points)

Exponential, Spherical and Gaussian model

How to fit spatial models with SAS: A uniformity trial

data spatvar;input rep bloc row col yield;datalines;1 4 1 1 10.54111 4 1 2 8.58061 2 1 3 11.27901 2 1 4 12.43441 4 2 1 10.34161 4 2 2 11.3103...4 15 7 7 12.02534 15 7 8 10.42984 14 8 5 7.32204 14 8 6 10.51044 15 8 7 12.68084 15 8 8 10.4482;

The spherical model

proc mixed scoring=5;

model yield= ;

parms (0 to 10 by 2.5) (1 to 10 by 3);

repeated / subject=intercept type=sp(sph)(row col);

run;

Output for spherical model



SP(SPH) Intercept 2.7120Residual 3.2607

Fit Statistics

-2 Res Log Likelihood 227.5AIC (smaller is better) 231.5AICC (smaller is better) 231.7BIC (smaller is better) 235.8

Output for spherical model



SP(SPH) Intercept 2.7120Residual 3.2607

Fit Statistics


0)(;(22

31)( 3

3

=≤⎥⎥⎦

⎤

⎢⎢⎣

⎡+−= ijij

ijijij dfelsed

dddf ρ

ρρ

Spherical model with nugget effect

proc mixed scoring=50 convh=1e-06;model yield= ;parms (0 to 10 by 2.5)

(1 to 10 by 3) (0.05 to 1.05 by 0.25);repeated / subject=intercept local

type=sp(sph)(row col);run;

Output for spherical model with nugget



Variance Intercept 3.2189SP(SPH) Intercept 2.7162Residual 0.02144

Fit Statistics


Some take home messages

An effect is (should be) random when it is ...

... related to a randomization step

... a random sample from a population (of effects)

... models the covariance between observations

Adress the mixed model structure in R or G matrix

Different software available (not only SAS)

Model selection important (the mixed world is morecomplex than the fixed effects models)

Documents

Software for Linear Mixed Models