
Discriminant Analysis
(Course script, stat.ethz.ch/~stahel/courses/multivariate/script/se-mu-discrim.pdf)


5 Discriminant Analysis

5.1 Introduction

a Example: Iris species. Goal: a rule to identify for any plant the correct species. Example bank bills: identify forged bills!

b General Model: class k_i, variables (characteristics) X_i^(j); X_i ∼ F_{k_i}; F_k parametric, usually a normal distribution.

c True class k_i fixed, unknown → incidental parameter

d ... or random variable K_i. Model includes P⟨K_i = k⟩ = π_k; F_k = conditional distr. of X_i given K_i = k.

e F_k, π_k given → rule x ↦ K̂⟨x⟩, "Identification Analysis". In applications: estimate F_k from training data.

f Simplest model: X_i ∼ N_m⟨µ_{k_i}, Σ⟩

µ̂_k = X̄_k = (1/n_k) ∑_{i: k_i = k} X_i

Σ̂ = (1/(n − g)) ∑_{k=1}^{g} ∑_{i: k_i = k} (X_i − X̄_k)(X_i − X̄_k)ᵀ = (1/(n − g)) ∑_i (X_i − X̄_{k_i})(X_i − X̄_{k_i})ᵀ
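The course's examples use R; as an illustrative sketch only, here are these estimates for one-dimensional data, where Σ̂ reduces to the pooled variance with denominator n − g (the data below are made up):

```python
# Class means and pooled variance estimate (1-D case of the slide's
# formulas): mu_k = mean of class k, pooled variance uses n - g.
def class_means(xs, ks):
    return {k: sum(x for x, ki in zip(xs, ks) if ki == k) / ks.count(k)
            for k in set(ks)}

def pooled_variance(xs, ks):
    m = class_means(xs, ks)
    g = len(m)                                    # number of classes
    ss = sum((x - m[k]) ** 2 for x, k in zip(xs, ks))
    return ss / (len(xs) - g)
```

For example, with classes {0, 2} and {10, 14}, the class means are 1 and 12 and the pooled variance is 10 / (4 − 2) = 5.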

5.2 Classification According to Known Distributions

a New observation x_0 → "estimate" k_0! Decision between g possibilities based on data x_0.

b Normal data with equal Σ, X_0 ∼ N⟨µ_k, Σ⟩. Begin with case Σ = I.

k̂ = argmin_k d⟨x_0, µ_k; Σ⟩

c General case? → Maximum likelihood → k̂_0 = argmax_k f_k⟨x_0⟩

X_0 ∼ N⟨µ_k, Σ⟩ →

d²⟨x_0, µ_1; Σ⟩ − d²⟨x_0, µ_2; Σ⟩ > 0 ⟹ k̂_0 = 2

h⟨x_0⟩ = d²⟨x_0, µ_1; Σ⟩ − d²⟨x_0, µ_2; Σ⟩
  = (x_0 − µ_1)ᵀ Σ⁻¹ (x_0 − µ_1) − (x_0 − µ_2)ᵀ Σ⁻¹ (x_0 − µ_2)
  = 2 (µ_2 − µ_1)ᵀ Σ⁻¹ x_0 + µ_1ᵀ Σ⁻¹ µ_1 − µ_2ᵀ Σ⁻¹ µ_2
  = α + βᵀ x_0

k̂⟨x_0⟩ = 1 if h⟨x_0⟩ < 0, 2 if h⟨x_0⟩ > 0

h: linear discriminant function of Fisher
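As a minimal sketch (not the course's R code), the two-class rule can be written out for Σ = I, with the signs arranged so that h > 0 means "closer to µ_2"; the class centers are made up:

```python
# Fisher's linear discriminant rule for two classes with Sigma = I:
# h(x) = alpha + beta^T x, classify as class 2 when h(x) > 0,
# where beta = 2 (mu2 - mu1) and alpha = mu1^T mu1 - mu2^T mu2.
def fisher_rule(mu1, mu2):
    beta = [2 * (b - a) for a, b in zip(mu1, mu2)]
    alpha = sum(a * a for a in mu1) - sum(b * b for b in mu2)
    return lambda x: alpha + sum(bj * xj for bj, xj in zip(beta, x))

h = fisher_rule([0.0, 0.0], [3.0, 1.0])      # made-up class centers
classify = lambda x: 2 if h(x) > 0 else 1
```

The decision boundary h(x) = 0 passes through the midpoint between the two centers, perpendicular (in the Mahalanobis metric) to their difference.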

d Estimate parameters!

R> library(MASS); lda(Species ~ ., data = iris)

[Figure: scatter plot of l.Petal.Length vs. l.Petal.Width for versicolor and virginica, and a histogram of the discriminant function by species (axis labels "Diskriminanz-Funktion" = discriminant function, "Häufigkeit" = frequency).]

e Logistic Regression. Linear discr. fn. ↔ linear regression function.

Regression: "predict" target variable Y from explanatory variables x! Here: Y = class number = binary variable.

Random: Y, characterized by π = P⟨Y = 1⟩. π = function of x.

Y = 1: observation belongs to class 2.

Probabilities of Y = 1 and Y = 0 are proportional to f_k⟨x⟩!

log⟨ P⟨Y = 1⟩ / P⟨Y = 0⟩ ⟩ = log⟨ f_2⟨x⟩ / f_1⟨x⟩ ⟩ = h⟨x⟩ = α + βᵀx.

= "prob. / complementary prob." = odds.

Logistic Regression: log odds = linear function in the X^(j).
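The identity above can be checked numerically in one dimension; this sketch (means and σ are made up) shows that the log density ratio of two equal-variance normals is linear in x:

```python
# log(f2(x)/f1(x)) for two normal densities with equal variance:
# analytically h(x) = (mu2-mu1)/sigma^2 * x + (mu1^2-mu2^2)/(2 sigma^2),
# i.e. linear in x.
import math

def log_density_ratio(x, mu1=0.0, mu2=2.0, sigma=1.0):
    f = lambda x, mu: math.exp(-(x - mu) ** 2 / (2 * sigma ** 2))
    return math.log(f(x, mu2) / f(x, mu1))
```

With µ_1 = 0, µ_2 = 2, σ = 1 this gives h(x) = 2x − 2: zero at the midpoint x = 1, and equal increments for equal steps in x.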

f → Direct estimation of α and β.

No assumptions about the explanatory variables X^(j) → as flexible as multiple regression!

Allow for factors, transformations, interactions!

R> glm(Species ~ ., data = d.iris, family = binomial)

g > 2 classes, equal Σ → minimal Mahalanobis distance d²⟨x_0, µ_k; Σ⟩

h Σ = I → Mahalanobis distance = ordinary (Euclidean) distance.

3 classes → the plane through the class centers suffices for classification.
(Distance from this plane: if large, the observation does not fit into any class!)

g classes: (g − 1)-dimensional space → g − 1 discriminant functions. Σ ≠ I analogous.


i [Figure: discriminant function 1 vs. discriminant function 2; setosa, versicolor, virginica.]

Coefficients β̂ of the discriminant functions:

            Sepals              Petals
D.f. 1    8.70    9.07    −20.779   −3.529
D.f. 2   −9.85  −15.18     −0.713    0.313

k Unequal covariance matrices

d²⟨x, µ_2; Σ_2⟩ − d²⟨x, µ_1; Σ_1⟩ − c =
(x − µ_2)ᵀ Σ_2⁻¹ (x − µ_2) − (x − µ_1)ᵀ Σ_1⁻¹ (x − µ_1) − c = 0

quadratic eq. in x → quadratic discriminant analysis.
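A one-dimensional sketch of the resulting quadratic rule (parameters made up): with unequal variances the log density ratio is quadratic in x, so the wider class is chosen in both far tails:

```python
# Quadratic rule in 1-D: compare log densities of N(mu1, s1^2) and
# N(mu2, s2^2); their difference is quadratic in x when s1 != s2,
# giving two cutpoints instead of one.
import math

def qda_classify(x, mu1=0.0, s1=1.0, mu2=4.0, s2=2.0):
    logf = lambda x, mu, s: -math.log(s) - (x - mu) ** 2 / (2 * s ** 2)
    return 2 if logf(x, mu2, s2) > logf(x, mu1, s1) else 1
```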


l Model Selection.

• Check assumptions (multivariate normal distribution!)
• Select explanatory variables
• Model non-linear relationships and interactions.
  (Interactions contradict the normal distribution)

5.3 Error Rates

a Diagnostic tests in medicine → diseased and healthy individuals.

2 kinds of error:

• healthy subjects classified as ill
  → loss of confidence, useless treatment
• diseased subjects classified as healthy
  → may miss lifesaving treatment!

b Example: Vessel constriction, predict from volume and rate of heartbeat.

[Figure: log(Vol) vs. log(Rate); groups "verengt" (constricted) and "gesund" (healthy).]

c Types of Error. Test "positive" → classified as ill.

                    test result
truth          healthy, "negative"              ill, "positive"
healthy        o.k.                             wrong positive → false alarm
ill            wrong negative → missed alarm    o.k.

d Sensitivity and Specificity.

Sensitivity = # (ill & positive) / # ill
Specificity = # (healthy & negative) / # healthy

Sensitivity = (condit.) prob. of a diseased subject to be classified as such
Specificity = (condit.) prob. of a healthy subject to be classified as such
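In code, with made-up counts (95 of 100 diseased test positive, 90 of 100 healthy test negative):

```python
# Sensitivity = # (ill & positive) / # ill
# Specificity = # (healthy & negative) / # healthy
def sensitivity(n_ill_positive, n_ill):
    return n_ill_positive / n_ill

def specificity(n_healthy_negative, n_healthy):
    return n_healthy_negative / n_healthy
```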

e Variable threshold.

Sensitivity = P⟨k̂ = 2 | K = 2⟩ = P⟨h⟨X⟩ > c | K = 2⟩
Specificity = P⟨k̂ = 1 | K = 1⟩ = P⟨h⟨X⟩ < c | K = 1⟩

[Figure: sensitivity and specificity as functions of the threshold c on the discriminant function.]

f Choose the threshold pragmatically! Refine the decision: allow for a region of indeterminacy!

g Assume you get a "positive" test result. How do you react?

rate of wrong positives = # wrong positives / # positives
rate of wrong negatives = # wrong negatives / # negatives

Conditional prob. of being healthy, given a positive test
→ "false alarm". Prob. P⟨k = 1 | k̂ = 2⟩

Conditional prob. of having the disease, given a negative test
→ "missed alarm". Prob. P⟨k = 2 | k̂ = 1⟩

These probabilities are only well defined if k is random!

h → New model, K instead of k.

Distr. of K given by π_1 = P⟨K = 1⟩ and "prevalence" π_2 = P⟨K = 2⟩ = 1 − π_1.

P⟨K = 1 | k̂ = 2⟩ to be calculated by Bayes' theorem:

P⟨K = 1 | k̂ = 2⟩ = P⟨K = 1 and k̂ = 2⟩ / P⟨k̂ = 2⟩
  = P⟨k̂ = 2 | K = 1⟩ π_1 / ( P⟨k̂ = 2 | K = 1⟩ π_1 + P⟨k̂ = 2 | K = 2⟩ π_2 )

i Example: Given:

• the sensitivity P⟨k̂ = 2 | K = 2⟩ = 0.95
• the specificity P⟨k̂ = 1 | K = 1⟩ = 0.9
• the prevalence π_2 = 0.01.

P⟨K = 1 | k̂ = 2⟩ = (0.1 · 0.99) / (0.1 · 0.99 + 0.95 · 0.01) = 0.099 / (0.099 + 0.0095) = 0.912

For rare diseases the false alarm rate is high!
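The computation on this slide as a sketch:

```python
# False-alarm probability P(K=1 | khat=2) via Bayes' theorem, from
# sensitivity, specificity and prevalence pi_2.
def false_alarm_prob(sens, spec, prevalence):
    p_pos_healthy = (1.0 - spec) * (1.0 - prevalence)  # P(khat=2|K=1) pi_1
    p_pos_ill = sens * prevalence                      # P(khat=2|K=2) pi_2
    return p_pos_healthy / (p_pos_healthy + p_pos_ill)
```

With the slide's values, false_alarm_prob(0.95, 0.9, 0.01) ≈ 0.912.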

j Error rates. "Theoretical":

Q = π_1 P⟨k̂⟨X⟩ = 2 | X ∼ F⟨θ_1⟩⟩ + π_2 P⟨k̂⟨X⟩ = 1 | X ∼ F⟨θ_2⟩⟩

Error rate to be estimated! (General consideration ...)

k X_i ∼ N_m⟨µ_{K_i}, I⟩ with µ_1 = −Δ/2 · [1, 0]ᵀ, µ_2 = +Δ/2 · [1, 0]ᵀ

[Figure: two simulated point clouds in the plane with centers µ_1 and µ_2 on the first axis.]

π_1 = π_2 = 1/2: k̂ = 1 if X^(1) < 0, k̂ = 2 otherwise.

→ Q = Φ⟨−Δ/2⟩. General: Δ² = (µ_2 − µ_1)ᵀ Σ⁻¹ (µ_2 − µ_1)
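Numerically, using the standard normal CDF via math.erf (a sketch; Δ = 2 is a made-up separation):

```python
# Theoretical error rate Q = Phi(-Delta/2) for two normal classes
# with equal priors and common covariance.
import math

def phi(z):                       # standard normal CDF
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def error_rate(delta):
    return phi(-delta / 2.0)
```

Δ = 0 gives Q = 0.5 (no separation); Δ = 2 gives Q = Φ(−1) ≈ 0.159.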

l Estimate the parameters of the model
→ parametrically estimated error rate Q̂ = Φ⟨−Δ̂/2⟩.

m Apparent Error Rate: relative frequency in the training data,

Q_app = (#{i | k̂⟨X_1i⟩ = 2} + #{i | k̂⟨X_2i⟩ = 1}) / n.

Too optimistic!

n Determine the error rate using new data ("test data")!
or: split the dataset randomly into training and test data.
Adequate if the data source is "endless" (data mining).

o Cross validation

Estimate the decision rule without using observation i
→ the prob. of a misclassification of i is estimated without bias.
Obs. i correctly classified?
Repeat this for all obs. → rates of classification errors.

Q_cv = (#{i | k̂[−i]⟨X_1i⟩ = 2} + #{i | k̂[−i]⟨X_2i⟩ = 1}) / n

→ Compare to resampling, jackknife!
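A sketch of the leave-one-out scheme with a toy base rule (nearest class mean in one dimension; the data are made up):

```python
# Leave-one-out cross-validation: refit the rule without observation i,
# then check whether i is classified correctly.
def nearest_mean_rule(xs, ks):
    m = {k: sum(x for x, ki in zip(xs, ks) if ki == k) / ks.count(k)
         for k in set(ks)}
    return lambda x: min(m, key=lambda k: abs(x - m[k]))

def loo_error_rate(xs, ks):
    errors = 0
    for i in range(len(xs)):
        rule = nearest_mean_rule(xs[:i] + xs[i+1:], ks[:i] + ks[i+1:])
        errors += rule(xs[i]) != ks[i]
    return errors / len(xs)
```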

Messages: Discriminant Analysis

• Classical methods of discriminant analysis:
  • 2 classes, equal Σ → linear DA, one discr. function
    → logistic regression
  • g ≥ 3 classes, equal Σ → linear DA, g − 1 discr. functions
    → multinomial regression
  • unequal Σ → quadratic discr. analysis
• 2 classes, variable threshold c for the lin. discr. function
  → sensitivity and specificity used for choosing c.
• Error rates: the naive apparent error rate is too optimistic.
  → cross validation!

5.4 Further Methods

a The assumption was X ∼ N_m⟨µ_h, Σ_h⟩.
Inappropriate for large datasets! Use more detailed information!

b Nearest Neighbors. For a new observation X_0 to be classified,
find ℓ ≥ 1 nearest neighbors (in the training data).
Simple rule: majority vote of the class numbers among the nearest neighbors.
Problem: which metric is appropriate?

R> library(class); knn(...); knn1(...)
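A sketch of the majority-vote rule with Euclidean distance (the choice of metric remains the open question noted above; the training points are made up):

```python
# l-nearest-neighbour classification by majority vote.
from collections import Counter

def knn_classify(x0, train, labels, l=3):
    dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    order = sorted(range(len(train)), key=lambda i: dist(x0, train[i]))
    votes = Counter(labels[i] for i in order[:l])   # l nearest neighbours
    return votes.most_common(1)[0][0]
```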

[Figure: nearest-neighbour example with vegetation data; variables Caluvulg, Nardstri, Festrubr.]

[Figure: discriminant function 1 (LD1) vs. discriminant function 2 (LD2); observations plotted by class number (2, 3, 4).]

c Neural Networks

General regression problem: input X_i, output Y_i.
One-hidden-layer feed-forward neural network:

Y^(k) = g_k⟨ α_k + ∑_ℓ w_{ℓk} g̃_ℓ⟨ α̃_ℓ + ∑_j w̃_{jℓ} X^(j) ⟩ ⟩

Use the logistic function for g and g̃!

For discriminant analysis: need a rule to convert the output Y into k̂,
e.g. K̂ = argmax_k⟨Y^(k)⟩

R> library(nnet); nnet(...)
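A sketch of this forward pass with the argmax rule (the weights are made up; one hidden unit, two output classes):

```python
# Forward pass of a one-hidden-layer network with logistic activations,
# then khat = argmax_k Y^(k).
import math

logistic = lambda z: 1.0 / (1.0 + math.exp(-z))

def forward(x, W_hidden, a_hidden, w_out, a_out):
    # hidden units: g~_l( a~_l + sum_j w~_jl x_j )
    h = [logistic(a + sum(w * xj for w, xj in zip(ws, x)))
         for ws, a in zip(W_hidden, a_hidden)]
    # output units: g_k( a_k + sum_l w_lk h_l )
    y = [logistic(a + sum(w * hl for w, hl in zip(ws, h)))
         for ws, a in zip(w_out, a_out)]
    return y.index(max(y)) + 1          # class number khat
```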

[Figure: diagram of a feed-forward network: input neurons x^(j), weights α_j^(k), hidden neurons z^(k), weights β_k, output neuron y.]

d Flexible Discriminant Analysis. Preliminary remark: for 2 groups, consider logistic regression.

Less appropriate: use Least Squares even with binary Y.
→ estimated regression function = lin. discriminant function!

Several classes: the same, for multinomial regression.
→ The subspace of βᵀX equals the subspace of the discr. functions.

Flexible DA: use any more flexible regression method instead of the linear one (with the original X's).

R> library(mda); fda(..., method = mars) or method = bruto

Literature: Hastie, Tibshirani and Buja (1994)

e Mixture Discriminant Analysis. R> library(mda); mda(...). Hastie and Tibshirani (1994)

f Classification and Regression Trees (CART) (2 groups)

Split the observations on the basis of the most discriminating single variable into 2 groups.
Split each group again (if successful) into 2 groups using a single variable. Repeat!

→ Decision Tree

R> library(tree); tree(...) or
R> library(rpart); rpart(...)
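One split step as a sketch for two groups labelled 1 and 2 (exhaustive search over variables and cutpoints; the data are made up):

```python
# Find the single variable and cutpoint whose split minimizes the
# number of misclassified observations (majority class per side).
def best_split(X, ks):
    best = None
    for j in range(len(X[0])):                       # each variable
        for c in sorted(set(x[j] for x in X)):       # each cutpoint
            left = [k for x, k in zip(X, ks) if x[j] <= c]
            right = [k for x, k in zip(X, ks) if x[j] > c]
            err = sum(min(side.count(1), side.count(2))
                      for side in (left, right) if side)
            if best is None or err < best[0]:
                best = (err, j, c)
    return best   # (misclassified count, variable index, cutpoint)
```

Applying this recursively to each resulting group yields the decision tree.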

g General Remark

Neural Networks and others are general and flexible. BUT:

• Danger of over-fitting the data, except in large datasets.
  Trick to avoid that: select the model on the basis of cross validation.
• No direct graphical display and interpretation, "black box".

h Boosting. Idea: a (too) simple classification method can be improved by "recycling":

1. Estimate the rule as given → classification k̂^(0)⟨x_i⟩.
2. Determine the wrongly classified observations.
   Re-estimate the rule using increased weights for the misclassified obs.
   → classification k̂^(1)⟨x_i⟩.

Repeat step 2 as often as useful.

Boosted rule: (weighted) majority vote among the classifications k̂^(ℓ). Friedman, Hastie and Tibshirani (2000)

i Bagging: bootstrap aggregating.
Determine the rule many times by bootstrapping the training data.
→ Majority vote.
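A sketch of bagging with a toy base rule (nearest class mean on 1-D data; B, the data, and the seed are all made up):

```python
# Bagging: refit a simple rule on bootstrap samples of the training
# data, then classify by majority vote over the fitted rules.
import random
from collections import Counter

def nearest_mean_rule(xs, ks):
    m = {k: sum(x for x, ki in zip(xs, ks) if ki == k) / ks.count(k)
         for k in set(ks)}
    return lambda x: min(m, key=lambda k: abs(x - m[k]))

def bagged_rule(xs, ks, B=25, seed=0):
    rng = random.Random(seed)
    rules = []
    for _ in range(B):
        idx = [rng.randrange(len(xs)) for _ in xs]   # bootstrap sample
        rules.append(nearest_mean_rule([xs[i] for i in idx],
                                       [ks[i] for i in idx]))
    return lambda x: Counter(r(x) for r in rules).most_common(1)[0][0]
```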

L Literature: Ripley (1996) treats these methods, except the last two, with a focus on applications. Sometimes lacks precision.