Multiway-SIR for Longitudinal Multi-table Data Integration

Nathalie Villa-Vialaneix, Valérie Sautron, Marie Chavent & Nathalie Viguerie

[email protected]
http://www.nathalievilla.org

COMPSTAT - August 25th, 2016


Outline

1 Background and motivation

2 Presentation of Multiway-SIR

3 Application


1 Background and motivation


A longitudinal study on weight loss induced by a low-calorie diet

Data collected during an EU project: 8 centers, 450 families.

Purpose: effect of glycemic index, protein content... on weight maintenance after a diet for obese people.

Data description: at each Clinical Intervention Day (CID), measure of 15 clinical variables (weight, HDL, ...) on 135 obese women.

Targeted problem: exploratory data analysis aimed at explaining the success/failure of the diet (in terms of weight loss/regain).


Data description and notations

$X_{..t}$: $n \times p$ matrix ($n > p$), for $t = 1, \dots, T$: exploratory variables (clinical)

$y$: numeric vector of length $n$: target variable (evolution of weight: $\frac{\text{weight}_3 - \text{weight}_1}{\text{weight}_1}$)


Method presented in this talk

Features:

1 longitudinal data integration (with T small; here T = 3)

2 multidimensional data analysis with simple graphical representations (similar to PCA)

3 designed to highlight differences in the numeric target variable

4 designed to highlight differences/commonalities in variable structure (rather than individual structure) across the different time steps


2 Presentation of Multiway-SIR


Factor analysis methods for multi-table data integration

Standard methods to analyze data such as $(X_{..t})_{t=1,\dots,T}$:

Multiple Factor Analysis (MFA) [Escofier and Pagès, 2008]: weights the tables $X_{..t}$ according to a measure of their variability (using the first eigenvalue of the PCA of each table);

STATIS and DUAL STATIS [Lavit et al., 1994, Abdi et al., 2012]: give larger importance to the most consensual tables ⇒ common representation which is the best overall compromise. STATIS builds the compromise according to individuals; DUAL STATIS builds it according to correlations between variables.


Overall description of DUAL STATIS

Data: $X = \begin{pmatrix} X_{..1} \\ \vdots \\ X_{..T} \end{pmatrix}$: same number of variables, potentially different number of individuals; data are supposed centered and scaled for all $t$

inter-structure analysis: analysis of $\tilde{C}$ ($T \times T$ similarity matrix)

→ table weights $(\alpha_1, \dots, \alpha_T)$

→ compromise matrix: $\Gamma^c = \sum_{t=1}^T \alpha_t \Gamma_t$, in which $\Gamma_t$ is the covariance matrix of $X_{..t}$

→ compromise analysis (eigendecomposition, GSVD): representations of variables and of individuals, at compromise positions or time-dependent positions


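Inter-structure analysis

Notations: $\forall\, t = 1, \dots, T$, $\Gamma_t = \frac{1}{n} X_{..t}^\top X_{..t}$ and $\tilde{\Gamma}_t = \frac{\Gamma_t}{\|\Gamma_t\|_F}$

Similarity between time steps: $\tilde{C} = (c_{tt'})_{t,t'=1,\dots,T}$ with

\[ c_{tt'} = \langle \tilde{\Gamma}_t, \tilde{\Gamma}_{t'} \rangle_F \]

(cosine between $\Gamma_t$ and $\Gamma_{t'}$).

Inter-structure analysis: solve

\[ u^1 = \arg\max_{u = (u_1, \dots, u_T)^\top \in \mathbb{R}^T,\ \|u\| = 1} \left\| \sum_{t=1}^T u_t \tilde{\Gamma}_t \right\|_F^2 \]

Solution: $u^1$ is the first eigenvector of $\tilde{C}$ ⇒ compromise weights:

\[ \forall\, t = 1, \dots, T, \quad \alpha_t = \frac{u^1_t}{\sum_{t'} u^1_{t'}}, \qquad \Gamma^c = \sum_{t=1}^T \alpha_t \tilde{\Gamma}_t \]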
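The inter-structure step can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' implementation: the function name `interstructure_weights` is hypothetical, and the tables are assumed to be already centered (and scaled), as stated in the overview of DUAL STATIS.

```python
import numpy as np

def interstructure_weights(tables):
    """Inter-structure analysis from a list of centered (n_t x p) tables.

    Hypothetical helper: returns the similarity matrix C, the compromise
    weights alpha and the compromise matrix Gamma^c.
    """
    # Covariance matrix of each table, then scaling to unit Frobenius norm.
    gammas = [X.T @ X / X.shape[0] for X in tables]
    gammas_tilde = [G / np.linalg.norm(G, "fro") for G in gammas]

    # C[t, s] is the Frobenius inner product of the normalized matrices,
    # i.e. the cosine between Gamma_t and Gamma_s.
    C = np.array([[np.sum(A * B) for B in gammas_tilde] for A in gammas_tilde])

    # First eigenvector of the symmetric matrix C
    # (np.linalg.eigh returns eigenvalues in ascending order).
    u1 = np.linalg.eigh(C)[1][:, -1]
    # An eigenvector is defined up to its sign: pick the nonnegative one.
    u1 = u1 if u1.sum() >= 0 else -u1

    # Compromise weights sum to one; the compromise is the weighted sum.
    alpha = u1 / u1.sum()
    gamma_c = sum(a * G for a, G in zip(alpha, gammas_tilde))
    return C, alpha, gamma_c
```

Because the Frobenius inner product of two positive semi-definite matrices is nonnegative, the entries of $\tilde{C}$ are nonnegative and the first eigenvector can be chosen with nonnegative entries, which is why the sign fix above is enough to obtain valid weights.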


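Compromise analysis

GPCA of $(\tilde{X}, I_p, D)$ with $\tilde{X}_{..t} = \frac{X_{..t}}{\sqrt{\|\Gamma_t\|_F}}$ and $D = \frac{1}{n} \mathrm{Diag}(\alpha_1, \dots, \alpha_T) \otimes I_n$ ⇔ eigendecomposition of $\Gamma^c$

Solution writes:

\[ \tilde{X} = P \Lambda Q^\top \quad \text{with} \quad P^\top D P = Q^\top Q = I_r \quad \text{and} \quad \Lambda = \mathrm{Diag}\left(\sqrt{\lambda_k}\right)_{k=1,\dots,r} \]

Representations of:

observations: time-specific positions on the $k$-th principal component: $k$-th column of $F = P\Lambda = \begin{pmatrix} F_{..1} \\ \vdots \\ F_{..T} \end{pmatrix}$; compromise positions on the $k$-th principal component: $k$-th column of $F^c = \sum_{t=1}^T \alpha_t F_{..t}$

variables: compromise positions on the circle of correlations for variable $j$ on the $k$-th axis: $\frac{\sqrt{\lambda_k}}{\hat{\sigma}_j} Q_{jk}$ (time-specific positions can also be derived)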
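The compromise analysis can be sketched through the eigendecomposition of $\Gamma^c$ rather than an explicit GSVD. The code below is a hedged illustration under several assumptions: the function name `compromise_analysis` is hypothetical, all tables share the same $n$ individuals, the weights `alpha` are supplied by the caller, and $\hat{\sigma}_j$ is taken to be the square root of the $j$-th diagonal entry of $\Gamma^c$ (the slides do not define it).

```python
import numpy as np

def compromise_analysis(tables, alpha, n_components=2):
    """Compromise analysis from centered (n x p) tables and weights alpha."""
    n = tables[0].shape[0]

    # Rescale each table by the square root of the Frobenius norm of its
    # covariance matrix, so that X_tilde_t' X_tilde_t / n = Gamma_tilde_t.
    scaled = []
    for X in tables:
        gamma_t = X.T @ X / n
        scaled.append(X / np.sqrt(np.linalg.norm(gamma_t, "fro")))

    # Compromise matrix: Gamma^c = X_tilde' D X_tilde.
    gamma_c = sum(a * (Xs.T @ Xs) / n for a, Xs in zip(alpha, scaled))

    # Eigendecomposition, keeping the largest eigenvalues first.
    eigvals, eigvecs = np.linalg.eigh(gamma_c)
    order = np.argsort(eigvals)[::-1][:n_components]
    lam, Q = eigvals[order], eigvecs[:, order]

    # Time-specific positions F_t = X_tilde_t Q (blocks of F = P Lambda),
    # and compromise positions F^c = sum_t alpha_t F_t.
    F_t = [Xs @ Q for Xs in scaled]
    F_c = sum(a * F for a, F in zip(alpha, F_t))

    # Variable coordinates on the circle of correlations (assumption:
    # sigma_j is the standard deviation read from the diagonal of Gamma^c).
    sigma = np.sqrt(np.diag(gamma_c))
    var_coords = Q * np.sqrt(lam) / sigma[:, None]
    return lam, Q, F_t, F_c, var_coords
```

With this choice of $\hat{\sigma}_j$, each entry of `var_coords` is the correlation (under the metric $D$) between variable $j$ and the $k$-th component, so all coordinates fall inside the unit circle.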


SIR Regression framework

[Li, 1991]

\[ Y = f(X^\top a_1, \dots, X^\top a_d, \varepsilon) \]

for $d < p$ and $f : \mathbb{R}^{d+1} \to \mathbb{R}$, an arbitrary (nonlinear) function.

$S_{Y|X} = \mathrm{Span}\{a_1, \dots, a_d\}$ is the Effective Dimension Reduction (EDR) space.

Equivalence between SIR and eigendecomposition: $S_{Y|X}$ is included in the space spanned by the first $d$ $\Gamma$-orthogonal eigenvectors of $\Gamma^e$, with $\Gamma = E\left[(X - E(X)) X^\top\right]$ and $\Gamma^e = E\left(E(Z|Y)\, E(Z|Y)^\top\right)$ for $Z = \Gamma^{-1/2}(X - E(X))$.


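SIR in practice

Estimation (when $n > p$)

compute $\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i$ and $\Gamma = \frac{1}{n} X^\top (X - 1_n \bar{x}^\top)$

split the range of $Y$ into $H$ different slices $\tau_1, \dots, \tau_H$ and estimate

\[ G = \left( \frac{1}{n_h} \sum_{i:\, y_i \in \tau_h} z_i \right)_{h=1,\dots,H} \]

with $n_h = |\{i : y_i \in \tau_h\}|$, $z_i$ the $i$-th row of $Z = (X - 1_n \bar{x}^\top)\Gamma^{-1/2}$, and

\[ \Gamma^e = G^\top M G \quad \text{with} \quad M = \mathrm{Diag}\left(\frac{n_1}{n}, \dots, \frac{n_H}{n}\right) \]

generalized eigendecomposition of $\Gamma^e$ with norm $\Gamma$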
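The estimation steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `sir` is hypothetical, slices are built with (roughly) equal counts from the sorted values of $y$, which is only one possible way to split the range of $Y$, and the directions are mapped back with $\Gamma^{-1/2}$ so that they are $\Gamma$-orthonormal.

```python
import numpy as np

def sir(X, y, n_slices=5, n_directions=2):
    """Sliced inverse regression sketch for an (n x p) matrix X with n > p."""
    n, p = X.shape

    # Empirical mean and covariance; X'(X - 1 x_bar')/n equals Xc'Xc/n.
    Xc = X - X.mean(axis=0)
    gamma = Xc.T @ Xc / n

    # Gamma^{-1/2} from the eigendecomposition of the covariance matrix.
    w, V = np.linalg.eigh(gamma)
    gamma_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T
    Z = Xc @ gamma_inv_sqrt

    # Assumption: slices of (roughly) equal size along the sorted y.
    slices = np.array_split(np.argsort(y), n_slices)

    # Row h of G is the center of gravity of slice h; M holds n_h / n.
    G = np.vstack([Z[idx].mean(axis=0) for idx in slices])
    M = np.diag([len(idx) / n for idx in slices])
    gamma_e = G.T @ M @ G

    # Leading eigenvectors of Gamma^e, mapped back to the scale of X.
    lam, eta = np.linalg.eigh(gamma_e)
    top = np.argsort(lam)[::-1][:n_directions]
    A = gamma_inv_sqrt @ eta[:, top]
    return lam[top], A
```

As a usage example, with `X` drawn from a standard normal distribution and `y = X[:, 0] ** 3` plus small noise, the first column of `A` is expected to be close to the first coordinate axis, up to sign and scale.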


Back to our problem: multiway SIR

Basic ideas:

perform DUAL STATIS analysis on the centers of gravity of the slices ($G_{..t}$) instead of the original variables

compromise analysis is similar to finding a compromise EDR space

using slices makes the method similar to FDA, but other estimates of $\Gamma^e_t$ (not based on slices) could be used


Overview of multiway SIR

Data: $\tilde{X} = \begin{pmatrix} X_{..1} \\ \vdots \\ X_{..T} \end{pmatrix}$ and $y = (y_1, \dots, y_n)^\top$: same number of variables, potentially different number of individuals

inter-structure analysis: analysis of $\tilde{C}$ ($T \times T$ similarity matrix between the $\Gamma^e_t$, the covariance matrices of the $G_{..t}$)

→ table weights $(\alpha_1, \dots, \alpha_T)$

→ compromise matrix: $\Gamma^{e,c} = \sum_{t=1}^T \alpha_t \Gamma^e_t$

→ compromise analysis (eigendecomposition, GSVD): representations of variables and of individuals, at compromise positions or time-dependent positions


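Inter-structure analysis

Notations: $\forall\, t = 1, \dots, T$, $\Gamma_t = \frac{1}{n} X_{..t}^\top (X_{..t} - 1_n \bar{x}_t^\top)$, $Z_{..t} = (X_{..t} - 1_n \bar{x}_t^\top)\, \Gamma_t^{-1/2}$,

\[ G_{..t} = \left( \frac{1}{n_h} \sum_{i:\, y_i \in \tau_h} z_i \right)_{h=1,\dots,H}, \qquad \Gamma^e_t = G_{..t}^\top M\, G_{..t}, \qquad \tilde{\Gamma}^e_t = \frac{\Gamma^e_t}{\|\Gamma^e_t\|_F}. \]

Similarity between time steps: $\tilde{C} = (c_{tt'})_{t,t'=1,\dots,T}$ with

\[ c_{tt'} = \langle \tilde{\Gamma}^e_t, \tilde{\Gamma}^e_{t'} \rangle_F \]

(cosine between $\Gamma^e_t$ and $\Gamma^e_{t'}$).

Inter-structure analysis: solve

\[ u^1 = \arg\max_{u = (u_1, \dots, u_T)^\top \in \mathbb{R}^T,\ \|u\| = 1} \left\| \sum_{t=1}^T u_t \tilde{\Gamma}^e_t \right\|_F^2 \]

Solution: $u^1$ is the first eigenvector of $\tilde{C}$ ⇒ compromise weights:

\[ \forall\, t = 1, \dots, T, \quad \alpha_t = \frac{u^1_t}{\sum_{t'} u^1_{t'}}, \qquad \Gamma^{e,c} = \sum_{t=1}^T \alpha_t \tilde{\Gamma}^e_t \]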
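Putting the notations together, the inter-structure step of multiway SIR can be sketched as below. The names `slice_covariance` and `multiway_sir_weights` are hypothetical, and the sketch assumes that the same individuals are observed at every time step and that slices have (roughly) equal counts; neither assumption is imposed by the slides.

```python
import numpy as np

def slice_covariance(X, y, n_slices=5):
    """Gamma^e_t = G' M G for one (n x p) table X and target y."""
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    # Standardize with Gamma_t^{-1/2} (requires n > p).
    w, V = np.linalg.eigh(Xc.T @ Xc / n)
    Z = Xc @ (V @ np.diag(w ** -0.5) @ V.T)
    # Assumption: equal-count slices along the sorted target.
    slices = np.array_split(np.argsort(y), n_slices)
    G = np.vstack([Z[idx].mean(axis=0) for idx in slices])
    M = np.diag([len(idx) / n for idx in slices])
    return G.T @ M @ G

def multiway_sir_weights(tables, y, n_slices=5):
    """Similarity matrix, compromise weights and Gamma^{e,c}."""
    gammas_e = [slice_covariance(X, y, n_slices) for X in tables]
    gammas_e_tilde = [G / np.linalg.norm(G, "fro") for G in gammas_e]
    # Cosines between the slice covariance matrices of the time steps.
    C = np.array([[np.sum(A * B) for B in gammas_e_tilde]
                  for A in gammas_e_tilde])
    # First eigenvector of C, with the sign chosen to be nonnegative.
    u1 = np.linalg.eigh(C)[1][:, -1]
    u1 = u1 if u1.sum() >= 0 else -u1
    alpha = u1 / u1.sum()
    gamma_ec = sum(a * G for a, G in zip(alpha, gammas_e_tilde))
    return C, alpha, gamma_ec
```

The only difference with the DUAL STATIS inter-structure step is the matrix fed to the cosine: the covariance of the slice centers of gravity replaces the covariance of the raw variables, which is what ties the compromise to the target variable.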


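Compromise analysis

GPCA of $(\tilde{G}, I_p, D)$ with $\tilde{G}_{..t} = \frac{G_{..t}}{\sqrt{\|\Gamma^e_t\|_F}}$ and $D = \mathrm{Diag}(\alpha_1, \dots, \alpha_T) \otimes M$ ⇔ eigendecomposition of $\Gamma^{e,c}$

Solution writes:

\[ \tilde{G} = P \Lambda Q^\top \quad \text{with} \quad P^\top D P = Q^\top Q = I_r \quad \text{and} \quad \Lambda = \mathrm{Diag}\left(\sqrt{\lambda_k}\right)_{k=1,\dots,r} \]

Representations of:

slices: time-specific positions on the $k$-th principal component: $k$-th column of $F = P\Lambda = \begin{pmatrix} F_{..1} \\ \vdots \\ F_{..T} \end{pmatrix}$; compromise positions on the $k$-th principal component: $k$-th column of $F^c = \sum_{t=1}^T \alpha_t F_{..t}$

variables: compromise positions on the circle of correlations for variable $j$ on the $k$-th axis: $\frac{\sqrt{\lambda_k}}{\hat{\sigma}_j} Q_{jk}$ (time-specific positions can also be derived)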
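The equivalence between the GPCA of $(\tilde{G}, I_p, D)$ and the eigendecomposition of $\Gamma^{e,c}$ can be checked in two lines, using only the definitions given above (a short verification, with $\tilde{G}$ the row-wise stacking of the $\tilde{G}_{..t}$):

```latex
\begin{align*}
\tilde{G}^\top D\, \tilde{G}
  &= \sum_{t=1}^T \alpha_t\, \tilde{G}_{..t}^\top M\, \tilde{G}_{..t}
   = \sum_{t=1}^T \alpha_t\, \frac{G_{..t}^\top M\, G_{..t}}{\|\Gamma^e_t\|_F}
   = \sum_{t=1}^T \alpha_t\, \tilde{\Gamma}^e_t
   = \Gamma^{e,c}, \\
\tilde{G}^\top D\, \tilde{G}
  &= Q \Lambda P^\top D\, P \Lambda Q^\top
   = Q \Lambda^2 Q^\top .
\end{align*}
```

The columns of $Q$ are therefore eigenvectors of $\Gamma^{e,c}$ with eigenvalues $\lambda_k$, and $F = P\Lambda = \tilde{G} Q$ gives the positions of the slices without computing $P$ explicitly.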

Nathalie Villa-Vialaneix | SIR-STATIS 17/24

Page 45: Multiway-SIR for Longitudinal Multi-table Data Integration · 2020-07-23 · Multiway-SIR for Longitudinal Multi-table Data Integration Nathalie Villa-Vialaneix, Valérie Sautron,

Compromise analysisGPCA of (G̃, Ip ,D) with G̃..t = G..t√

‖Γet ‖F

and D = Diag(α1, . . . , αT ) ⊗M

⇔ eigendecomposition of Γe,c

Solution writes:

G̃ = PΛQ> with P>DP = Q>Q = Ir and Λ = Diag(√λk )k=1,...,r

Representations of:slices: time-specific positions on the k -th principal component: k -th

column of F = PΛ =

F1...

FT

compromise positions on the k -th principal component: k -th columnof Fc =

∑Tt=1 αtF..t

variables: compromise positions on circle of correlations for variable jon k -th axis:

√λkσ̂j

Qjk (time-specific positions can also be derived)




Application of multiway SIR on Diogenes dataset I

H = 5, interstructure analysis:

CID1 and CID2 have more similar $\Gamma_e^t$ than CID3.
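For readers unfamiliar with the interstructure step, the sketch below shows how it is usually computed in STATIS-type methods: tables are compared through normalized Frobenius inner products, and the leading eigenvector of the resulting similarity matrix gives the table weights. It is an assumption that the analysis here follows this standard recipe; the function name `interstructure` and the three toy matrices are hypothetical and are not the Diogenes data.

```python
import numpy as np

def interstructure(gammas):
    """Cosine similarities between matrices and weights from the leading eigenvector."""
    # Frobenius inner products <A, B>_F = sum_ij A_ij B_ij
    S = np.array([[np.sum(a * b) for b in gammas] for a in gammas])
    norms = np.sqrt(np.diag(S))
    C = S / np.outer(norms, norms)      # normalized similarities (1 on the diagonal)
    eigval, eigvec = np.linalg.eigh(C)  # eigenvalues in ascending order
    u = np.abs(eigvec[:, -1])           # leading eigenvector, sign fixed to be positive
    alpha = u / u.sum()                 # weights rescaled to sum to 1
    return C, alpha

# Toy covariance-like matrices: the first two are close, the third differs
gamma_1 = np.diag([3.0, 1.0, 0.5])
gamma_2 = np.diag([2.8, 1.2, 0.5])
gamma_3 = np.diag([0.5, 1.0, 3.0])

C, alpha = interstructure([gamma_1, gamma_2, gamma_3])
```

In this toy case the first two matrices are far more similar to each other than to the third, so the third receives the smallest weight.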


Application of multiway SIR on Diogenes dataset II

H = 5, compromise analysis: number of components

2 components are enough, but we will analyze 4 for a deeper understanding of the data.


Application of multiway SIR on Diogenes dataset III

H = 5, compromise analysis: analysis of the slices (compromise)


Application of multiway SIR on Diogenes dataset IV

H = 5, compromise analysis: analysis of the variables (compromise)


Application of multiway SIR on Diogenes dataset V

H = 5, compromise analysis: longitudinal analysis


Conclusion

We have presented an exploratory analysis method

based on DUAL STATIS

able to focus on a numeric variable of interest similarly to SIR

able to explain longitudinal evolutions when the number of time steps is small

Future work

an R package is under development

the method is only valid for n ≥ p: a regularization approach is under study to allow for the analysis of n < p cases


Abdi, H., Williams, L., Valentin, D., and Bennani-Dosse, M. (2012). STATIS and DISTATIS: optimum multitable principal component analysis and three way metric multidimensional scaling. Wiley Interdisciplinary Reviews: Computational Statistics, 4(2):124–167.

Escofier, B. and Pagès, J. (2008). Analyses Factorielles Simples et Multiples. Dunod.

Lavit, C., Escoufier, Y., Sabatier, R., and Traissac, P. (1994). The ACT (STATIS method). Computational Statistics and Data Analysis, 18(1):97–119.

Li, K. (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86(414):316–342.
