Linear regression with interval data: the LIR approach · 2020-01-06 · Linear regression with...

Linear regression with interval data:the LIR approach

Andrea Wiencierz and Marco E. G. V. CattaneoDepartment of Statistics, LMU Munich

Statistische Woche, Vienna, AustriaSeptember 20, 2012

A. Wiencierz and M. Cattaneo (LMU Munich) linLIR Sep. 20, 2012 1 / 16

Likelihood-based Imprecise Regression (LIR)

� (X1,Y1), . . . , (Xn,Yn)

with (Xi ,Yi )i.i.d.∼ P

� simple linear regression:

Y = f (X ) = a + b X

●●

−2 0 2 4 6

� (X1,Y1), . . . , (Xn,Yn)

Y = f (X ) = a + b X

●●

−2 0 2 4 6

� (X1,Y1), . . . , (Xn,Yn)

Y = f (X ) = a + b X

●●

−2 0 2 4 6

(Simple) linear LIR with interval data

� (X ∗1 ,Y∗1 ), . . . , (X ∗n ,Y

∗n )

where X ∗i =[X i ,X i

]and Y ∗i =

[Y i ,Y i

� with V ∗i = X ∗i × Y ∗i((Xi ,Yi ),V

∗i )

i.i.d.∼ P

such that for ε ∈ [0, 1]

P((Xi ,Yi ) /∈ V ∗i ) ≤ ε� simple linear regression:

Y = f (X ) = a + b X

� p-quantile QRf ,p, withp ∈ (0, 1), of thedistribution of theresiduals

Rf ,i = |Yi − f (Xi )|

−2 0 2 4 6

●●●●

●●

� (X ∗1 ,Y∗1 ), . . . , (X ∗n ,Y

∗n )

]and Y ∗i =

[Y i ,Y i

]� with V ∗i = X ∗i × Y ∗i

((Xi ,Yi ),V∗i )

i.i.d.∼ P

P((Xi ,Yi ) /∈ V ∗i ) ≤ ε

Y = f (X ) = a + b X

Rf ,i = |Yi − f (Xi )|

−2 0 2 4 6

●●●●

●●

� (X ∗1 ,Y∗1 ), . . . , (X ∗n ,Y

∗n )

]and Y ∗i =

[Y i ,Y i

]� with V ∗i = X ∗i × Y ∗i

((Xi ,Yi ),V∗i )

i.i.d.∼ P

Y = f (X ) = a + b X

Rf ,i = |Yi − f (Xi )|

−2 0 2 4 6

●●●●

●●

� (X ∗1 ,Y∗1 ), . . . , (X ∗n ,Y

∗n )

]and Y ∗i =

[Y i ,Y i

]� with V ∗i = X ∗i × Y ∗i

((Xi ,Yi ),V∗i )

i.i.d.∼ P

Y = f (X ) = a + b X

Rf ,i = |Yi − f (Xi )|

−2 0 2 4 6

●●●●

●●

� imprecise residuals:

r f ,i = min(x,y)∈v∗i

|y − f (x)|

r f ,i = sup(x,y)∈v∗i

|y − f (x)|

� uncertainty about f :

data imprecision andstatistical uncertainty

� consider Cf ,p,β,ε:likelihood-basedconfidence region forQRf ,p with cutoff pointβ ∈ (0, 1)

−2 0 2 4 6

●●●●

●●

|y − f (x)|

−2 0 2 4 6

●●●●

●●

|y − f (x)|

−2 0 2 4 6

●●●●

●●

|y − f (x)|

� result U : set of allplausible functions

−2 0 2 4 6

●●●●

●●

Recapitulation: (simple) linear LIR with interval data

� ((Xi ,Yi ),V∗i )

i.i.d.∼ P , P ∈ Pε = {P : P((Xi ,Yi ) /∈ V ∗i ) ≤ ε} , ε ∈ [0, 1]

� Yi = f (Xi ) , f ∈ F =

{fa,b :

R → RX 7→ a + b X

, a, b ∈ R}

� observations v∗1 , . . . , v∗n induce (normalized) profile likelihood function likQRf

of the p-quantile of the distribution of Rf for each f ∈ F� likQRf

is a stepwise constant function with points of discontinuity at:

0 = r f ,(0), . . . , r f ,(dn(p−ε)e), r f ,(bn(p+ε)c+1), . . . , r f ,(n+1) = +∞

� Cf = [r f ,(k+1), r f ,(k)] , values of k , k ∈ N ∪ {0} depend on n, p, β, ε

� LIR result U = {f ∈ F : r f ,(k+1) ≤ qLRM} , where qLRM = inff∈F

r f ,(k)

� if there is a unique f with r f ,(k) = qLRM , it is optimal according to the LRMcriterion and called fLRM ; LRM means Likelihood-based Region Minimax

� further details in: M. Cattaneo, A. Wiencierz (2012). Likelihood-basedImprecise Regression. Int. J. Approx. Reasoning 53. 1137-1154.

� ((Xi ,Yi ),V∗i )

� Yi = f (Xi ) , f ∈ F =

{fa,b :

R → RX 7→ a + b X

, a, b ∈ R}

r f ,(k)

� ((Xi ,Yi ),V∗i )

� Yi = f (Xi ) , f ∈ F =

{fa,b :

R → RX 7→ a + b X

, a, b ∈ R}

r f ,(k)

� ((Xi ,Yi ),V∗i )

� Yi = f (Xi ) , f ∈ F =

{fa,b :

R → RX 7→ a + b X

, a, b ∈ R}

of the p-quantile of the distribution of Rf for each f ∈ F

� likQRfis a stepwise constant function with points of discontinuity at:

r f ,(k)

� ((Xi ,Yi ),V∗i )

� Yi = f (Xi ) , f ∈ F =

{fa,b :

R → RX 7→ a + b X

, a, b ∈ R}

r f ,(k)

� ((Xi ,Yi ),V∗i )

� Yi = f (Xi ) , f ∈ F =

{fa,b :

R → RX 7→ a + b X

, a, b ∈ R}

r f ,(k)

� ((Xi ,Yi ),V∗i )

� Yi = f (Xi ) , f ∈ F =

{fa,b :

R → RX 7→ a + b X

, a, b ∈ R}

r f ,(k)

� ((Xi ,Yi ),V∗i )

� Yi = f (Xi ) , f ∈ F =

{fa,b :

R → RX 7→ a + b X

, a, b ∈ R}

r f ,(k)

� ((Xi ,Yi ),V∗i )

� Yi = f (Xi ) , f ∈ F =

{fa,b :

R → RX 7→ a + b X

, a, b ∈ R}

r f ,(k)

Statistical properties of the LIR method

� the presented method for linear LIR generalizes Least Quantile of Squares(LQS) regression to imprecise data and to accounting directly for statisticaluncertainty

� very robust due to nonparametric probability model and quantiles,

breakdown-point ε∗ = min{k,n−k}n

n→∞−→ min{p, 1− p} − ε

� exact confidence level of Cf :

infP∈Pε

P(Cf 3 QRf) =

k∑k=k+1

)pk (1− p)n−k ε = 0

k∑k=k+1

)(p + ε)k (1− (p + ε))n−k ε > 0, p ≤ 0.5

k∑k=k+1

)(p − ε)k (1− (p − ε))n−k ε > 0, p > 0.5

n→∞−→ min{p, 1− p} − ε

infP∈Pε

P(Cf 3 QRf) =

k∑k=k+1

)pk (1− p)n−k ε = 0

k∑k=k+1

)(p + ε)k (1− (p + ε))n−k ε > 0, p ≤ 0.5

k∑k=k+1

)(p − ε)k (1− (p − ε))n−k ε > 0, p > 0.5

n→∞−→ min{p, 1− p} − ε� exact confidence level of Cf :

infP∈Pε

P(Cf 3 QRf) =

k∑k=k+1

)pk (1− p)n−k ε = 0

k∑k=k+1

)(p + ε)k (1− (p + ε))n−k ε > 0, p ≤ 0.5

k∑k=k+1

)(p − ε)k (1− (p − ε))n−k ε > 0, p > 0.5

n→∞−→ min{p, 1− p} − ε

infP∈Pε

P(Cf 3 QRf) =

k∑k=k+1

)pk (1− p)n−k ε = 0

k∑k=k+1

)(p + ε)k (1− (p + ε))n−k ε > 0, p ≤ 0.5

k∑k=k+1

)(p − ε)k (1− (p − ε))n−k ε > 0, p > 0.5

infP∈Pε

P(Cf 3 QRf) =

k∑k=k+1

)pk (1− p)n−k ε = 0

k∑k=k+1

)(p + ε)k (1− (p + ε))n−k ε > 0, p ≤ 0.5

k∑k=k+1

)(p − ε)k (1− (p − ε))n−k ε > 0, p > 0.5

infP∈Pε

P(Cf 3 QRf) =

k∑k=k+1

)pk (1− p)n−k ε = 0

k∑k=k+1

)(p + ε)k (1− (p + ε))n−k ε > 0, p ≤ 0.5

k∑k=k+1

)(p − ε)k (1− (p − ε))n−k ε > 0, p > 0.5

Implementation: Exact algorithm for simple linear LIR

� aim: determine the setof undominatedfunctions U = {f ∈ F :r f ,(k+1) ≤ qLRM}

� 1st step: find qLRM

� B fLRM ,qLRM(blue dashed

lines) is the thinnestband containing at leastk imprecise data

� here β = 0.8, p = 0.6,n = 17 , and k = 12

−2 0 2 4 6

●●●●

●●

� here β = 0.8, p = 0.6,n = 17 , and k = 12

−2 0 2 4 6

●●●●

●●

� here β = 0.8, p = 0.6,n = 17 , and k = 12

−2 0 2 4 6

●●●●

●●

� here β = 0.8, p = 0.6,n = 17 , and k = 12

−2 0 2 4 6

●●●●

●●

Implementation: Exact algorithm - Part 1

� some of the included kimprecise data touchthe border of B fLRM ,qLRM

in 3 different points

� bLRM can be any slopedetermined by thecorresponding cornerpoints of 2 imprecisedata or 0

� B: set of all 4(n2

possible values for bLRM

� for each b ∈ B findab ∈ R for whichr fab,b,(k)

is minimal

� qLRM = minb∈B

r fab,b

−2 0 2 4 6

●●●●

●●

is minimal

� qLRM = minb∈B

r fab,b

−2 0 2 4 6

●●●●

●●

is minimal

� qLRM = minb∈B

r fab,b

−2 0 2 4 6

●●●●

●●

is minimal

� qLRM = minb∈B

r fab,b

−2 0 2 4 6

●●●●

●●

is minimal

� qLRM = minb∈B

r fab,b

−2 0 2 4 6

●●●●

●●

� step 2: determine U

� if f ∈ U , then B f ,qLRM

intersects at least k + 1imprecise data

� here k = 8

� for each b ∈ R find theof intercept valuesa ∈ R, for whichr fa,b,(k+1) ≤ qLRM

� U can also berepresented by thecorresponding subset ofthe parameter space

−2 0 2 4 6

●●●●

●●

� step 2: determine U� if f ∈ U , then B f ,qLRM

� here k = 8

−2 0 2 4 6

●●●●

●●

� here k = 8

−2 0 2 4 6

●●●●

●●

� here k = 8

−2 0 2 4 6

●●●●

●●

Set of undominated parameters

−2.5 −2.0 −1.5 −1.0 −0.5 0.0 0.5 1.0

Implementation of the algorithm in R

Recapitulation: Exact algorithm for simple linear LIR with interval data

� the 1st part of the algorithm generalizes the exact algorithm for LQSregression

� the presented algorithm has computational complexity O(n3 log n)

� further details in: M. Cattaneo, A. Wiencierz (2012). On the implementationof LIR: the case of simple linear regression with interval data. TechnicalReport 127. Department of Statistics. LMU Munich.

linLIR package

� linLIR: linear Likelihood-based Imprecise Regression, available at CRAN:http://cran.r-project.org/

� function to plot 2-dimensional interval data set

� s.linlir function implements the exact algorithm

� further tools to summarize and visualize results

linLIR package

Example

� 2-dimensional intervaldata set of n = 514observations

� LIR analysis with p =0.5, β = 0.26, ε = 0

� k = 238 , k = 276

−10 0 10 20 30

●●●● ●

●●●●

●●

●●●●

●●

●●●●

●●

● ●

●●

●●●●

●●

●●●●

●●

●●●●

●●

●●●●

●●

●●●●

●●

●●●●

● ●

●●●●

●●●●●

●●

●●●●

●●

● ●●●●

●●

●●●●

●●

●●●●

●●

Example

� k = 238 , k = 276

−10 0 10 20 30

●●●● ●

●●●●

●●

●●●●

●●

●●●●

●●

● ●

●●

●●●●

●●

●●●●

●●

●●●●

●●

●●●●

●●

●●●●

●●

●●●●

● ●

●●●●

●●●●●

●●

●●●●

●●

● ●●●●

●●

●●●●

●●

●●●●

●●

Example

� k = 238 , k = 276

−10 0 10 20 30

●●●● ●

●●●●

●●

●●●●

●●

●●●●

●●

● ●

●●

●●●●

●●

●●●●

●●

●●●●

●●

●●●●

●●

●●●●

●●

●●●●

● ●

●●●●

●●●●●

●●

●●●●

●●

● ●●●●

●●

●●●●

●●

●●●●

●●

Example

� k = 238 , k = 276

� obtained set ofundominated functions

−10 0 10 20 30

●●●● ●

●●●●

●●

●●●●

●●

●●●●

●●

● ●

●●

●●●●

●●

●●●●

●●

●●●●

●●

●●●●

●●

●●●●

●●

●●●●

● ●

●●●●

●●●●●

●●

●●●●

●●

● ●●●●

●●

●●●●

●●

●●●●

●●

Example

� k = 238 , k = 276

� obtained set ofundominated functions

� obtained set ofparameters

0.0 0.5 1.0 1.5 2.0

Summary and Outlook

� the LIR approach provides a very robust regression method for impreciselyobserved variables

� the imprecise result of the LIR analysis is the set of all functions that areplausible relations of X and Y in the light of the imprecise observations

� for the special case of simple linear regression with interval data, wedeveloped an exact algorithm to determine the set of all undominatedfunctions

� the exact algorithm is implemented in R as part of the linLIR package

� current / future work:

� further investigate statistical properties, in particular the confidence level of U� generalize algorithm to multiple linear regression

Summary and Outlook

� current / future work:� further investigate statistical properties, in particular the confidence level of U

� generalize algorithm to multiple linear regression

Summary and Outlook

� current / future work:� further investigate statistical properties, in particular the confidence level of U� generalize algorithm to multiple linear regression

Linear regression with interval data: the LIR approach · 2020-01-06 · Linear regression with...

Documents

Mehrfache und polynomiale Regression

Regression - semwiso.userweb.mwn.desemwiso.userweb.mwn.de/schaetzentesten1-ws0910/material/regression-kap... · Vorwort Regression ist die wohl am h ¨auﬁgsten eingesetzte statistische

05/06 - Verkehrserzeugung und Verkehrsverteilung - ethz.ch · Aggregat Klassifizierung Regression Regression Parzelle/Adresse Regression Regression 29 Abschätzung der Menge der angezogenen

Stückweise lineare Regression - nadirpoint.denadirpoint.de/Stueckweise_lineare_Regression.pdf · Herleitung und Durchführung der – diskreten, stückweisen, linearen Regression

Ausgleichungsrechnung II Gerhard Navratil Regression und Kollokation Regression –Lineare Regression Kovarianzfunktion Kollokation –Ansatz –Schätzung der

Wägen an der Kasse - mt.com · Dual-Interval-Waage Ariva-S Die Dual-Interval-Waage Ariva-S für das Wägen an der Kasse ist die ideale Lösung für Einzelhändler, die eine hohe

stat.ethz.chstat.ethz.ch/education/semesters/ss2012/regression/Regression.pdf · Inhaltsverzeichnis 1 Lineare Regression 5 1.1 Einf uhrung: Fragestellung . . . . . . . . . . . .

Die einfache/multiple lineare Regression

dT% lir ffiH frR6u %;Ln GAS TRANSMISSION COMPANY LTf). … · 2019-10-07 · dT% %;Ln GAS TRANSMISSION COMPANY LIMITED?1l lir ffi"H cqpr'ttft frR6u @ (celGr{Rqr{ q+E r+t=ttft) GAS

4 Regression - Uni Kiel · 4 Regression 4.1 Univariate multiple Regression Dieses Kapitel behandelt ein Thema, das aus der Elementarstatistik weitgehend bekannt ist, n˜amlich die

Visual regression test

Lineare Regression - ChristianHerta

Teil: lineare Regression - TU Dresden · Teil: lineare Regression . 1 Einführung 2 Prüfung der Regressionsfunktion 3 Die Modellannahmen zur Durchführung einer linearen Regression

Polynomiale Regression - Künstliche neuronale Netze · Inhaltsverzeichnis 1 Polynomiale Regression 2 Die Stufenfunktion 3 Die Basisfunktion 4 Spline-Regression 5 Literatur Marina

Logistische Regression - - - - - 24. Juni 2011 - bibb.de · PDF fileLogistische Regression • Die logistische Regression ist ein Verfahren zur multivariaten Analyse nicht-metrischer

Lineare Regression - SfSstat.ethz.ch/~stahel/courses/regression/reg1-script.pdf · Lineare Regression Werner Stahel Seminar f ur Statistik, ETH Z urich Mai 2008 / Sept. 2013 Unterlagen

Bivariate Und Multiple Lineare Regression

Angewandte statistische Regression II Vorlesung 3 · Gesamttest für die Regression Null-Devianz Residuen-Devianz. Gesamttest für die Regression (Beispiel: Aderverengung) Null-Devianz

3. Regression · Multiple lineare Regression § Multiple lineare Regression nimmt an, dass sich das abhängige Merkmal y als lineare Kombination der unabhängigen Merkmale x (,j)

Regression – ein kleiner Rückblick