49
Scalable infrastructures for personalization Anne-Marie Kermarrec

2706 system 1 googlemobii bx dbb

Tags:

Embed Size (px)

DESCRIPTION

Jssj bcb x nsns

Citation preview

Page 1: 2706 system 1 googlemobii bx dbb

Scalable infrastructures for personalization

Anne-Marie Kermarrec

Page 2: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

Inria, France: 8 research centers, 150 research teams

June 2014

Page 3: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria) June 2014

Page 4: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

A cry for personalization

June 2014

Page 5: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

Why is personalization so difficult?

• Huge volume of data: small portion of interest

• Dynamic interests

• Interesting stuff does not come always from friends

• Classical notification systems do not filter enough or too much

Scalable personalization infrastructures

June 2014

Page 6: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

KNN computation over large data

Basic building block for many applications

• Similarity search

• Machine learning

• Data mining

• Image processing

• Collaborative filtering

June 2014

Page 7: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

KNN-based user-centric collaborative filtering

Provide each user with her k closest neighbors

(Users owns a profile, the system has its favorite similarity metric)

Use this topology for

• personalized notifications

• recommendation

AliceBob

Carl

Dave

Ellie

June 2014

Page 8: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

Dealing with truly big data

Want to scale? Think P2P

June 2014

Page 9: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

Do not look exhaustively

June 2014

Page 10: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

The key to scalability in KNN graph construction

Consider a partial set of candidates

Sampling-based approach

June 2014

Page 11: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

P2P KNN graph construction

Which nodes are close?

How to discover them?

Similarity metric

Sampling

June 2014

Page 12: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

Which nodes are close?

Model

U(sers) × I(tems) (items)

Profile(u) = vector of liked/shared/viewed items

Cosine similarity metric Jaccard metric

Minimal information: no tag, no user’s input, generic June 2014

Page 13: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

Each node maintains a set of neighbors (c entries)

Peer exchange

ShuffleP Q

How to discover them: Gossip-based computing

Result random graph

Highly resilient against churn, partition

Small diameter

[JGKVV, ACM TOCS 2007]

June 2014

Page 14: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

KNN construction

Similarity computation

exchange ofneighbors lists

neighborhoodoptimization1 2

Alice Bob

Carl

DaveEllie

Frank

June 2014

Page 15: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

Decentralized KNN selection

[FGKL Middleware 2010]

RPS layer providing random sampling

KPS clustering layer gossip-based topology clustering

Interest-based linkRandom link

AliceBob

Carl

Dave

Ellie

AliceBob

Carl

Dave

Ellie

June 2014

Page 16: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

Convergence

Cycles

c current neighbors versus the c closest

Biasedsampling

Random sampling

June 2014

Page 17: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

Applications

- Decentralized news recommendation [BFGJK, IPDPS

2013]

- Top-K [BGKL, ACM TODS 2011] [BGK, ACM TOIT 2014]

- Geo recommendation [BKKT, ICDCS 2012]

June 2014

Page 18: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

DECENTRALIZED NEWS RECOMMENDER

Notification is taking over

June 2014

Page 19: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

What’s wrong with news feed

Interest are dynamic

Wrong granularity for filtering of classical notification systems

Small portion of the available information is of interest

Interesting stuff does not come always from friends

An implicit notification system

based on collaborative filtering

June 2014

Page 20: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

WhatsUp in a nutshell

KNN selection

Dissemination

June 2014

Page 21: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

Dissemination: orientation and amplification

Orientation: to whom?

Exploit: ForwardTo friends

Explore: Forward to random users

Amplification: to how many?

Increase Fanout(Log(n))

DecreaseFanout(1)

June 2014

Page 22: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

WhatsUp in action on the survey (480 users)

Precision Recall F1-Score Messages

Gossip (f=4) 0.34 0.99 0.51 2.3 M

Cosine-CF 0.50 0.65 0.57 5,9k

Whatsup (f=10)

0.471 0.83 0.60 2,4k

June 2014

Page 23: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

Orientation (survey)

News items received through a dislike forward

Number of dislikes

0 1 2 3 4

Fraction of liked news

54% 31% 10% 3% 2%

June 2014

Page 24: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

WhatsUp versus Pub/Sub

Approach Precision Recall F1-Score

Pub/Sub 0.40 1.0 0.58

WhatsUp 0.47 0.83 0.60

June 2014

Page 25: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

WhatsUp versus cascading

Approach Precision Recall F1-ScoreCascading 0.57 0.09 0.16WhatsUp 0.56 0.57 0.57

June 2014

Page 26: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

PRIVACY MATTERS

June 2014

Page 27: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

Privacy issues

During user clustering• Exchange of profile in clear

During item dissemination• Predictive nature of the protocol

Profile Obfuscation

Randomized dissemination

June 2014

Page 28: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

Privacy

Obfuscation• Does not reveal the exact profile• Does not reveal the least sensitive information

Randomized dissemination• Flips the opinion with a given probability (pf)

June 2014

Page 29: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

Structure profiles

Private Profile

Compact profile

In clear: Full information about the interests

Aggregate signatures of liked items

June 2014

Page 30: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

Structure profiles

Private Profile

Compact profile

Filter profile

Item profile

Obfuscated profile

In clear: Full information about the interests

Aggregate signatures of liked items

Interests of users that like similar items

Least sensitive information about interests

Aggregate interests of users that liked it

June 2014

Page 31: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

Obfuscation mechanism

News item(received)

Private profile

Profiles kept locally

June 2014

Page 32: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

Obfuscation mechanism

News item(received)

Private profile Compact profile

News item(forwarded)

+

Profiles kept locally

Profiles exchanged with others

signatureitem

profile

June 2014

Page 33: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

Obfuscation mechanism

News item(received)

Private Profile Compact Profile Filter Profile

Obfuscated ProfileNews item(forwarded)

x+

Profiles kept locally

Profiles exchanged with others

signatureitem

profileitem

profile

mask ofpopularity

System parameter

June 2014

Page 34: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

Randomized dissemination

Flips the opinion with a given probability (pf)• Attacker could still learn from the profile• Private profile contains a field with the result of the randomized

decision

Generate Randomized compact profile• Users still use locally their non randomized profile for clustering• Differentially private protocol

June 2014

Page 35: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

Experimental setup

Simulations and PlanetlabAlternatives• Cleartext profile (CT); • 2DP (DP dissemination and randomized profile for clustering)Metrics• Recommendation: recall/precision• Privacy: Distance between obfuscated profile and real profile; Dataset: Real survey, 120 users on 200 news items (4 instances)

June 2014

Page 36: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

Impact of randomization

June 2014

Page 37: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

Impact of randomization

Decrease of precision with increasing pf

June 2014

Page 38: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

http://131.254.213.98:8080/wup/

Operational prototype

Tested on 500 users @ TrentoRise last year

TRY IT

Take away message

Personalization is needed

Decentralization is healthy

Gossip-based computing is one (the) way to go

June 2014

Page 39: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

For those who are afraid of P2P

June 2014

Page 40: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

Hybrid recommendation engine

June 2014

Page 41: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria) June 2014

Page 42: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

HyRec: Taking the best of both worlds

Online KNN selectionRestricted andidate set (k)No data stored at the client

HyRec client: Javascript (widget) running in the browser

June 2014

Page 43: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria) June 2014

Page 44: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

View similarity

June 2014

Dataset Users Items RatingsMovieLens1 943 1700 100,000

MovieLens2 6,040 4000 1,000,000

MovieLens3 69,878 10,000 10,000,000

Digg 59,167 7724 782,807

Page 45: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

Recommendation quality

June 2014

Page 46: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

HyRec versus the client load

Impact of HyRec Impact of the client load

Negligible disruption of HyRec 50% load<60ms on smartphone<10ms on laptop

June 2014

Page 47: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

HyRec versus a centralized recommender

Impact of the request stressImpact of the profile size

June 2014

Page 48: 2706 system 1 googlemobii bx dbb

A.-M. Kermarrec (Inria)

To take away

Personalization is crucial

P2P in a design mindset

Randomization and obfuscation provides a good tradeoff between privacy and quality

June 2014

Page 49: 2706 system 1 googlemobii bx dbb

Thank you