2706 system 1 googlemobii bx dbb

Tags:

Preview:

DESCRIPTION

Jssj bcb x nsns

Citation preview

Scalable infrastructures for personalization

Anne-Marie Kermarrec

A.-M. Kermarrec (Inria)

Inria, France: 8 research centers, 150 research teams

June 2014

A.-M. Kermarrec (Inria) June 2014

A.-M. Kermarrec (Inria)

A cry for personalization

June 2014

A.-M. Kermarrec (Inria)

Why is personalization so difficult?

• Huge volume of data: small portion of interest

• Dynamic interests

• Interesting stuff does not come always from friends

• Classical notification systems do not filter enough or too much

Scalable personalization infrastructures

June 2014

A.-M. Kermarrec (Inria)

KNN computation over large data

Basic building block for many applications

• Similarity search

• Machine learning

• Data mining

• Image processing

• Collaborative filtering

June 2014

A.-M. Kermarrec (Inria)

KNN-based user-centric collaborative filtering

Provide each user with her k closest neighbors

(Users owns a profile, the system has its favorite similarity metric)

Use this topology for

• personalized notifications

• recommendation

AliceBob

Carl

Dave

Ellie

June 2014

A.-M. Kermarrec (Inria)

Dealing with truly big data

Want to scale? Think P2P

June 2014

A.-M. Kermarrec (Inria)

Do not look exhaustively

June 2014

A.-M. Kermarrec (Inria)

The key to scalability in KNN graph construction

Consider a partial set of candidates

Sampling-based approach

June 2014

A.-M. Kermarrec (Inria)

P2P KNN graph construction

Which nodes are close?

How to discover them?

Similarity metric

Sampling

June 2014

A.-M. Kermarrec (Inria)

Which nodes are close?

Model

U(sers) × I(tems) (items)

Profile(u) = vector of liked/shared/viewed items

Cosine similarity metric Jaccard metric

Minimal information: no tag, no user’s input, generic June 2014

A.-M. Kermarrec (Inria)

Each node maintains a set of neighbors (c entries)

Peer exchange

ShuffleP Q

How to discover them: Gossip-based computing

Result random graph

Highly resilient against churn, partition

Small diameter

[JGKVV, ACM TOCS 2007]

June 2014

A.-M. Kermarrec (Inria)

KNN construction

Similarity computation

exchange ofneighbors lists

neighborhoodoptimization1 2

Alice Bob

Carl

DaveEllie

Frank

June 2014

A.-M. Kermarrec (Inria)

Decentralized KNN selection

[FGKL Middleware 2010]

RPS layer providing random sampling

KPS clustering layer gossip-based topology clustering

Interest-based linkRandom link

AliceBob

Carl

Dave

Ellie

AliceBob

Carl

Dave

Ellie

June 2014

A.-M. Kermarrec (Inria)

Convergence

Cycles

c current neighbors versus the c closest

Biasedsampling

Random sampling

June 2014

A.-M. Kermarrec (Inria)

Applications

- Decentralized news recommendation [BFGJK, IPDPS

2013]

- Top-K [BGKL, ACM TODS 2011] [BGK, ACM TOIT 2014]

- Geo recommendation [BKKT, ICDCS 2012]

June 2014

A.-M. Kermarrec (Inria)

DECENTRALIZED NEWS RECOMMENDER

Notification is taking over

June 2014

A.-M. Kermarrec (Inria)

What’s wrong with news feed

Interest are dynamic

Wrong granularity for filtering of classical notification systems

Small portion of the available information is of interest

Interesting stuff does not come always from friends

An implicit notification system

based on collaborative filtering

June 2014

A.-M. Kermarrec (Inria)

WhatsUp in a nutshell

KNN selection

Dissemination

June 2014

A.-M. Kermarrec (Inria)

Dissemination: orientation and amplification

Orientation: to whom?

Exploit: ForwardTo friends

Explore: Forward to random users

Amplification: to how many?

Increase Fanout(Log(n))

DecreaseFanout(1)

June 2014

A.-M. Kermarrec (Inria)

WhatsUp in action on the survey (480 users)

Precision Recall F1-Score Messages

Gossip (f=4) 0.34 0.99 0.51 2.3 M

Cosine-CF 0.50 0.65 0.57 5,9k

Whatsup (f=10)

0.471 0.83 0.60 2,4k

June 2014

A.-M. Kermarrec (Inria)

Orientation (survey)

News items received through a dislike forward

Number of dislikes

0 1 2 3 4

Fraction of liked news

54% 31% 10% 3% 2%

June 2014

A.-M. Kermarrec (Inria)

WhatsUp versus Pub/Sub

Approach Precision Recall F1-Score

Pub/Sub 0.40 1.0 0.58

WhatsUp 0.47 0.83 0.60

June 2014

A.-M. Kermarrec (Inria)

WhatsUp versus cascading

Approach Precision Recall F1-ScoreCascading 0.57 0.09 0.16WhatsUp 0.56 0.57 0.57

June 2014

A.-M. Kermarrec (Inria)

PRIVACY MATTERS

June 2014

A.-M. Kermarrec (Inria)

Privacy issues

During user clustering• Exchange of profile in clear

During item dissemination• Predictive nature of the protocol

Profile Obfuscation

Randomized dissemination

June 2014

A.-M. Kermarrec (Inria)

Privacy

Obfuscation• Does not reveal the exact profile• Does not reveal the least sensitive information

Randomized dissemination• Flips the opinion with a given probability (pf)

June 2014

A.-M. Kermarrec (Inria)

Structure profiles

Private Profile

Compact profile

In clear: Full information about the interests

Aggregate signatures of liked items

June 2014

A.-M. Kermarrec (Inria)

Structure profiles

Private Profile

Compact profile

Filter profile

Item profile

Obfuscated profile

In clear: Full information about the interests

Aggregate signatures of liked items

Interests of users that like similar items

Least sensitive information about interests

Aggregate interests of users that liked it

June 2014

A.-M. Kermarrec (Inria)

Obfuscation mechanism

News item(received)

Private profile

Profiles kept locally

June 2014

A.-M. Kermarrec (Inria)

Obfuscation mechanism

News item(received)

Private profile Compact profile

News item(forwarded)

+

Profiles kept locally

Profiles exchanged with others

signatureitem

profile

June 2014

A.-M. Kermarrec (Inria)

Obfuscation mechanism

News item(received)

Private Profile Compact Profile Filter Profile

Obfuscated ProfileNews item(forwarded)

x+

Profiles kept locally

Profiles exchanged with others

signatureitem

profileitem

profile

mask ofpopularity

System parameter

June 2014

A.-M. Kermarrec (Inria)

Randomized dissemination

Flips the opinion with a given probability (pf)• Attacker could still learn from the profile• Private profile contains a field with the result of the randomized

decision

Generate Randomized compact profile• Users still use locally their non randomized profile for clustering• Differentially private protocol

June 2014

A.-M. Kermarrec (Inria)

Experimental setup

Simulations and PlanetlabAlternatives• Cleartext profile (CT); • 2DP (DP dissemination and randomized profile for clustering)Metrics• Recommendation: recall/precision• Privacy: Distance between obfuscated profile and real profile; Dataset: Real survey, 120 users on 200 news items (4 instances)

June 2014

A.-M. Kermarrec (Inria)

Impact of randomization

June 2014

A.-M. Kermarrec (Inria)

Impact of randomization

Decrease of precision with increasing pf

June 2014

A.-M. Kermarrec (Inria)

http://131.254.213.98:8080/wup/

Operational prototype

Tested on 500 users @ TrentoRise last year

TRY IT

Take away message

Personalization is needed

Decentralization is healthy

Gossip-based computing is one (the) way to go

June 2014

A.-M. Kermarrec (Inria)

For those who are afraid of P2P

June 2014

A.-M. Kermarrec (Inria)

Hybrid recommendation engine

June 2014

A.-M. Kermarrec (Inria) June 2014

A.-M. Kermarrec (Inria)

HyRec: Taking the best of both worlds

Online KNN selectionRestricted andidate set (k)No data stored at the client

HyRec client: Javascript (widget) running in the browser

June 2014

A.-M. Kermarrec (Inria) June 2014

A.-M. Kermarrec (Inria)

View similarity

June 2014

Dataset Users Items RatingsMovieLens1 943 1700 100,000

MovieLens2 6,040 4000 1,000,000

MovieLens3 69,878 10,000 10,000,000

Digg 59,167 7724 782,807

A.-M. Kermarrec (Inria)

Recommendation quality

June 2014

A.-M. Kermarrec (Inria)

HyRec versus the client load

Impact of HyRec Impact of the client load

Negligible disruption of HyRec 50% load<60ms on smartphone<10ms on laptop

June 2014

A.-M. Kermarrec (Inria)

HyRec versus a centralized recommender

Impact of the request stressImpact of the profile size

June 2014

A.-M. Kermarrec (Inria)

To take away

Personalization is crucial

P2P in a design mindset

Randomization and obfuscation provides a good tradeoff between privacy and quality

June 2014

Thank you

Recommended