Sergej Zerr, Stefan Siersdorfer, Jonathon Hare, Elena Demidova
35th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Portland, USA

• People today produce large amounts of visual content

• Photo sharing is one of the most popular activities in social applications

35th SIGIR Conference, Portland, USA, 12/08/12

Such images can be highly sensitive and disclose many details of the users' private sphere: for example, photos of weddings, family holidays, and private parties.

• Privacy-directed search and diversification

• Support for sharing decisions

Technical challenge: automatic privacy-directed image detection and search

[Figure: example photos labeled Private and Public, with the terms Work, Sea, Winter, Water]

Outline

• INTRODUCTION: Related Work

• DATA: Selection & Annotation

• FEATURES: Textual & Visual

• EVALUATION: Classification Model

• PRIVACY EXPLORER: Detection & Search

• FUTURE WORK: Ideas & Directions

Overview: Sensitive Information on the Web

• Colleges keep track of students' online activities, and posting personal information can have consequences for students [1,2]

• Only a small percentage of users change their highly permissive privacy settings (study of 4,000 students) [3]: ~90% of profiles contain an image, birthday, and real name; 40% contain a phone number

• Even people who have not published any compromising information can leave discoverable footprints (e.g., "mark-a-friend" tagging on Facebook)

[1] V. Schleswig-Holstein. Statistische Erfassung zum Internetverhalten Jugendlicher und Heranwachsender [Statistical survey of the Internet behavior of adolescents and young adults]. A study by the consumer organization in Schleswig-Holstein, Germany, March 2010.

[2] S. B. Barnes. A privacy paradox: Social networking in the United States. First Monday, 11(9), Sept. 2006.

[3] R. Gross and A. Acquisti. Information revelation and privacy in online social networks. In WPES '05.

Overview: State of the Art

• Privacy prediction: based on tags and manually defined user privacy profiles (Vyas et al. 2009, Ahern et al. 2007)

• Access control policies: access to parts of the social graph; use of tags and FOAF relations (Felt et al. 2008, Au Yeung et al. 2009)

• Image analysis:
  – Textual features in Web 2.0 (Figueiredo et al. 2009, San Pedro et al. 2009)
  – Visual features for photo quality (Yeh et al. 2010)

DATA

• Gathering the average community notion of privacy

• We crawled “most recently uploaded” Flickr photos over 2 months

• We started a social annotation game that ran for 2 weeks

• 81 users (colleagues, social network contacts, forum users) in 6 teams

Annotators were given the following instruction:

“Private are photos which have to do with the private sphere (like self portraits, family, friends, your home) or contain objects that you would not share with the entire world (like a private email). The rest is public. In case no decision can be made, the picture should be marked as undecidable.”

DATA: Inter-Rater Agreement

• 37,535 images were judged, each by at least two annotators

• 70% were labeled public or undecidable by all annotators

• 13% were labeled private by all annotators; 28% by at least one annotator

• 4,701 private and 27,405 public labels were assigned

• Inter-rater agreement on a subset of 100 photos rated by 36 users: Fleiss' kappa = 0.6
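Fleiss' kappa compares the observed agreement between annotators with the agreement expected by chance. The following is a minimal sketch of the standard formula. It assumes every photo in the subset was rated by the same number of annotators. The function name and the item-by-category count table are illustrative; the categories could be private, public, and undecidable.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss' kappa for N items, each rated by the same number n of raters.

    counts[i][j] is the number of raters who put item i into category j.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")
    n_categories = len(counts[0])

    # Share of all assignments that went to each category.
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters)
           for j in range(n_categories)]

    # Per-item agreement: fraction of rater pairs that agree on item i.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]

    p_bar = sum(p_i) / n_items        # mean observed agreement
    p_e = sum(p * p for p in p_j)     # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)
```

For example, three photos that three annotators each label unanimously, but in different categories (`[[3, 0, 0], [0, 3, 0], [0, 0, 3]]`), give a kappa of 1.0.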

Features

• Frontal face detection: faces are associated with higher privacy

• Edges: long coherent edges correspond to artificial environments

• Colors: fewer dominant colors correspond to professional photos

• SIFT (Scale-Invariant Feature Transform): object/region detection

• Text: tags, image title

• Brightness, sharpness, and profile faces did not show strong discriminative properties

Features: Colors

We determined the most discriminative colors for each class using mutual information.

[Figure (panels: Public, Private): example of a public photo with a few dominant colors and a private photo]
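Mutual information measures how much knowing whether a color is dominant in a photo reduces uncertainty about the photo's class. The slides do not describe the color binning, so this is a minimal sketch under stated assumptions. Each photo is summarized by a normalized color histogram, and a bin counts as dominant above a hypothetical threshold. All names are illustrative.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def mutual_information(feature: Sequence[bool], labels: Sequence[str]) -> float:
    """Mutual information I(F; C) in bits between a binary feature and the class."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    mi = 0.0
    for (f, c), count in joint.items():
        p_fc = count / n
        p_f = feature_counts[f] / n
        p_c = label_counts[c] / n
        mi += p_fc * math.log2(p_fc / (p_f * p_c))
    return mi


def rank_color_bins(histograms: Sequence[Sequence[float]], labels: Sequence[str],
                    threshold: float = 0.1) -> List[Tuple[float, int]]:
    """Rank color bins by MI between 'bin is dominant' and the class label.

    A bin counts as dominant in an image when its share of the normalized
    histogram exceeds `threshold` (a hypothetical cut-off for illustration).
    Returns (mi, bin_index) pairs, most informative first.
    """
    n_bins = len(histograms[0])
    scores = []
    for b in range(n_bins):
        dominant = [h[b] > threshold for h in histograms]
        scores.append((mutual_information(dominant, labels), b))
    return sorted(scores, reverse=True)
```

A bin that is dominant in exactly the public photos of a balanced sample scores 1 bit. A bin whose dominance is independent of the label scores 0.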

Features: Edges

[Figure: example of a public photo dominated by incoherent edges and a private photo of a workplace with a mix of coherent and incoherent edges]
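The slides do not spell out how edge-direction coherence is computed. This pure-Python sketch implements one common formulation, which is an assumption here:

1. Edge pixels are found from central-difference gradients.
2. Each edge pixel's direction (sign ignored) is quantized into bins.
3. Pixels that belong to a large 8-connected group of same-direction edge pixels count as coherent. All other edge pixels count as incoherent.

The bin count and thresholds (`n_bins`, `mag_threshold`, `min_size`) are hypothetical. A production version would use a proper edge detector such as Canny.

```python
import math
from collections import deque
from typing import List, Sequence, Tuple


def edge_direction_coherence(img: Sequence[Sequence[float]], n_bins: int = 8,
                             mag_threshold: float = 0.2,
                             min_size: int = 5) -> Tuple[List[int], List[int]]:
    """Return (coherent, incoherent) edge-pixel counts per direction bin."""
    h, w = len(img), len(img[0])
    bins = [[-1] * w for _ in range(h)]   # -1 marks non-edge pixels
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y][x + 1] - img[y][x - 1]) / 2.0
            gy = (img[y + 1][x] - img[y - 1][x]) / 2.0
            if math.hypot(gx, gy) > mag_threshold:
                theta = math.atan2(gy, gx) % math.pi   # direction, sign ignored
                bins[y][x] = min(int(theta / math.pi * n_bins), n_bins - 1)

    coherent = [0] * n_bins
    incoherent = [0] * n_bins
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            b = bins[y][x]
            if b < 0 or seen[y][x]:
                continue
            # Flood-fill the 8-connected group of edge pixels in the same bin.
            size = 0
            queue = deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                size += 1
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                                and bins[ny][nx] == b):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if size >= min_size:
                coherent[b] += size
            else:
                incoherent[b] += size
    return coherent, incoherent
```

Two test images illustrate the behavior. On a 20×20 image with a vertical step edge, all 36 detected edge pixels form one coherent group. A single bright dot yields four isolated, incoherent edge pixels.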

Features: SIFT
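The slides state only that each SIFT "object" becomes one feature dimension. A common way to get there, assumed here, is a bag-of-visual-words representation:

1. Learn a codebook of visual words, typically with k-means.
2. Assign each local descriptor to the nearest codebook entry.
3. Count how many descriptors fall into each entry.

Real SIFT descriptors are 128-dimensional; the example uses 2-D toy vectors. The function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def nearest_codeword(descriptor: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codebook centroid with the smallest squared Euclidean distance."""
    best_index, best_dist = 0, math.inf
    for index, centroid in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(descriptor, centroid))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bag_of_visual_words(descriptors: Sequence[Vector],
                        codebook: Sequence[Vector]) -> List[int]:
    """Histogram with one dimension per visual word (codebook entry)."""
    histogram = [0] * len(codebook)
    for descriptor in descriptors:
        histogram[nearest_codeword(descriptor, codebook)] += 1
    return histogram
```

With the toy codebook `[[0, 0], [10, 10]]`, the descriptors `[1, 1]`, `[9, 9]`, `[11, 10]`, `[0, 2]` yield the histogram `[2, 2]`.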

Features: Text

[Figure: term groups shown include Family, Emotions, Sentiment and Nature, Inanimate]
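For the text features, each term from a photo's tags and title becomes one dimension. This is a minimal sketch of such a term vector. It assumes simple lower-casing and alphanumeric tokenization, which drops non-ASCII letters and so suits the English tags used here. `photo_tokens` and the other names are hypothetical.

```python
import re
from typing import Dict, Iterable, List


def tokenize(text: str) -> List[str]:
    """Lower-case a title or tag string and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def photo_tokens(title: str, tags: List[str]) -> List[str]:
    """Hypothetical helper: pool the title and tags of one photo."""
    return tokenize(" ".join([title, *tags]))


def build_vocabulary(documents: Iterable[List[str]]) -> Dict[str, int]:
    """Assign each distinct term one dimension, in order of first appearance."""
    vocab: Dict[str, int] = {}
    for tokens in documents:
        for term in tokens:
            vocab.setdefault(term, len(vocab))
    return vocab


def term_vector(tokens: List[str], vocab: Dict[str, int]) -> List[int]:
    """Term-frequency vector; terms outside the vocabulary are ignored."""
    vec = [0] * len(vocab)
    for term in tokens:
        index = vocab.get(term)
        if index is not None:
            vec[index] += 1
    return vec
```

Suppose the vocabulary is built from a photo titled "Family dinner" (tags: family, home) and one titled "Winter sea" (tag: water). A new photo titled "Sea view" with the tag "family" then maps to `[1, 0, 0, 0, 1, 0]`; the unseen term "view" is ignored.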

Classification

• We used the SVM classifier from the SVMlight library

• We converted the edge and color histograms into feature vectors

• For SIFT and text features, each object or term is one dimension

• We normalized the values in each dimension into the range [0,1] using Platt's sigmoid method
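Platt's method fits a sigmoid that maps real-valued scores, most often SVM decision values, into [0,1]. The slide does not state which values were fitted, so this is a minimal sketch of the sigmoid fit itself. `platt_probability` and `fit_platt` are hypothetical names. The plain gradient-descent solver is a simplification: Platt's original procedure also smooths the 0/1 targets and uses a Newton-style optimizer.

```python
import math
from typing import Sequence, Tuple


def platt_probability(a: float, b: float, score: float) -> float:
    """Platt's sigmoid 1 / (1 + exp(a*score + b)), evaluated without overflow."""
    z = a * score + b
    if z >= 0:
        ez = math.exp(-z)
        return ez / (1.0 + ez)
    return 1.0 / (1.0 + math.exp(z))


def fit_platt(scores: Sequence[float], labels: Sequence[int],
              lr: float = 0.1, epochs: int = 2000) -> Tuple[float, float]:
    """Fit (a, b) by batch gradient descent on the log loss.

    With z = a*score + b and p = 1 / (1 + exp(z)), the derivative of the
    log loss with respect to z is (y - p), which gives the gradients below.
    """
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            residual = y - platt_probability(a, b, s)
            grad_a += residual * s
            grad_b += residual
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b
```

Because p = 1 / (1 + exp(a·score + b)), a fitted `a` is negative whenever higher scores indicate the positive class.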

Classification

• Labeled images: 4,701 private, 27,405 public

• Balanced set of 4,701 private and 4,701 randomly selected public images

• We used 60% as training data and 40% as test data

• We used precision-recall curves and break-even points (BEP) as quality measures

• We tested visual features, textual features, and their combinations
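The balanced setup can be sketched as two steps. First, undersample the public class to the size of the private class. Then shuffle the combined list and cut it 60/40. This minimal sketch assumes images are identified by string ids; `balanced_split` and the fixed seed are illustrative, not the authors' code.

```python
import random
from typing import List, Sequence, Tuple

Example = Tuple[str, int]   # (image id, label) with 1 = private, 0 = public


def balanced_split(private_ids: Sequence[str], public_ids: Sequence[str],
                   train_fraction: float = 0.6,
                   seed: int = 0) -> Tuple[List[Example], List[Example]]:
    """Undersample the larger class, shuffle, and cut into train/test parts."""
    rng = random.Random(seed)
    k = min(len(private_ids), len(public_ids))
    sample = ([(i, 1) for i in rng.sample(list(private_ids), k)] +
              [(i, 0) for i in rng.sample(list(public_ids), k)])
    rng.shuffle(sample)
    cut = round(train_fraction * len(sample))
    return sample[:cut], sample[cut:]
```

With 4,701 private and 27,405 public ids, this yields 9,402 images: 5,641 for training and 3,761 for testing. The shuffle is not stratified, so each part is only approximately balanced. Splitting each class 60/40 separately would make both parts exactly balanced.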

Textual Features P/R Curve

The pictures used in the classification experiments contained good-quality textual metadata (e.g., titles and at least three English tags). The text features could thus provide a short but concise summary of the image content, resulting in a BEP of 0.78.
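The BEP summarizes a precision-recall curve by the point where precision equals recall. This minimal sketch assumes the curve is traced by ranking test images by classifier score, with private as the positive class. Cutting the ranking after k images gives precision tp/k and recall tp/P, where P is the number of positives. The two coincide at k = P, so the BEP equals the precision of the top P images. Interpolated curves can give slightly different values. The function name is hypothetical.

```python
from typing import Sequence


def break_even_point(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Precision (= recall) of the top-P ranked items, P = number of positives."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    true_positives = sum(label for _, label in ranked[:n_pos])
    return true_positives / n_pos
```

For scores `[0.9, 0.8, 0.7, 0.6, 0.5, 0.4]` with labels `[1, 0, 1, 1, 0, 0]`, the top three contain two positives, so the BEP is 2/3.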

Visual Features P/R Curves

• The occurrence of faces in photos is an intuitive indicator of privacy, reflected by a BEP of 0.63 for the face feature

• The edge-direction coherence feature achieves a BEP of 0.65

• SIFT features outperform all other visual features (BEP = 0.70)

Feature Combinations P/R Curves

The combination of visual and textual features leads to a BEP of 0.80, showing that textual and visual features can complement each other in the privacy classification task.

However, classification with visual features alone also produces promising results and can be useful when textual annotations are missing or insufficient, as is the case for many photos on the Web.

Privacy Directed Search

PicAlert!

Conclusion and Future Work

• We applied privacy classification using various visual and textual features

• Classification models were trained on a large-scale dataset with privacy assignments obtained through a social annotation game

• Using only visual features yields usable results and can be applied in scenarios where no textual annotation is available (e.g., personal photo collections or mobile phone pictures)

Future Work:

• Using collaborative filtering for personalization

• Using other features such as Color-SIFT, and context (e.g., mobile sensors)

• Larger user studies and annotation games; study of temporal developments

• Integration into popular Web 2.0 applications

PicAlert: http://l3s.de/picalert/

Sergej Zerr, Stefan Siersdorfer, Jonathon Hare

[Summary graphic: Data, Features, Search & Diversification, Evaluation]

Thank you!

Special thanks to ACM SIGIR for providing the travel grant!