19
Word Storms: Multiples of Word Clouds for Visual Comparison of Documents Quim Castellá, Charles Sutton (WWW-2014) Zoltán Szabó Gatsby Unit, Tea Talk Decembert 18, 2014 Zoltán Szabó Words Storms

Word Storms: Multiples of Word Clouds for Visual ...szabo/talks/tea_talk_Gatsby/Zoltan_Szabo_te… · Motivation Vast number of documents on the web. Need for quick scanning. Word

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Word Storms: Multiples of Word Clouds for Visual ...szabo/talks/tea_talk_Gatsby/Zoltan_Szabo_te… · Motivation Vast number of documents on the web. Need for quick scanning. Word

Word Storms: Multiples of Word Clouds for

Visual Comparison of Documents

Quim Castellá, Charles Sutton (WWW-2014)

Zoltán Szabó

Gatsby Unit, Tea Talk

Decembert 18, 2014

Zoltán Szabó Words Storms

Page 2: Word Storms: Multiples of Word Clouds for Visual ...szabo/talks/tea_talk_Gatsby/Zoltan_Szabo_te… · Motivation Vast number of documents on the web. Need for quick scanning. Word

Motivation

Vast number of documents on the web.

Need for quick scanning.

Word clouds (Google: 963.000 hits; LDA - 172.000 hits):

One of the most popular generators: Wordle.

Font size = frequency of the word.

Zoltán Szabó Words Storms

Page 3: Word Storms: Multiples of Word Clouds for Visual ...szabo/talks/tea_talk_Gatsby/Zoltan_Szabo_te… · Motivation Vast number of documents on the web. Need for quick scanning. Word

Key Problem

Word clouds are difficult to compare visually.

Word storm:

made of word clouds,word cloud = subset of documents,

allows efficient contrasting, comparison of documents.

Goal: visualize an entire corpus.

Zoltán Szabó Words Storms

Page 4: Word Storms: Multiples of Word Clouds for Visual ...szabo/talks/tea_talk_Gatsby/Zoltan_Szabo_te… · Motivation Vast number of documents on the web. Need for quick scanning. Word

Cloud Examples

One cloud :=

one document: comparing individual docs,

one track of a conference: ∼ areas,

papers from a given period: ∼ time evolution,

one scientific field (+its subfield): ∼ hierarchical categories.

Zoltán Szabó Words Storms

Page 5: Word Storms: Multiples of Word Clouds for Visual ...szabo/talks/tea_talk_Gatsby/Zoltan_Szabo_te… · Motivation Vast number of documents on the web. Need for quick scanning. Word

Guiding Principles

1 Each cloud should represent its own document.

2 Clouds should be easy to compare/contrast.⇒ Co-occuring words: similar

font size, color,

position, orientation.

Zoltán Szabó Words Storms

Page 6: Word Storms: Multiples of Word Clouds for Visual ...szabo/talks/tea_talk_Gatsby/Zoltan_Szabo_te… · Motivation Vast number of documents on the web. Need for quick scanning. Word

Creating a Single Cloud: Notations

Word cloud = set of words: W = {w1, . . . ,wM}.

Each word w ∈ W has a

position: pw = (xw , yw),font size: sw , color: cw .

Importance of a word (=:its weight): tf.

W = words with the top M weights.

Zoltán Szabó Words Storms

Page 7: Word Storms: Multiples of Word Clouds for Visual ...szabo/talks/tea_talk_Gatsby/Zoltan_Szabo_te… · Motivation Vast number of documents on the web. Need for quick scanning. Word

Creating a Single Cloud

Font size ∝ word weight.

Color, orientation: random.

Position: spiral algorithm (next slide).

Zoltán Szabó Words Storms

Page 8: Word Storms: Multiples of Word Clouds for Visual ...szabo/talks/tea_talk_Gatsby/Zoltan_Szabo_te… · Motivation Vast number of documents on the web. Need for quick scanning. Word

Creating a Single Cloud: Spiral Algorithm

Given: word cloud with i − 1 words.

New word w to the desired/random location:If

no intersection with previous words, and

∈ frame, then goto next word.

Else: w is moved outward until a valid position.

Zoltán Szabó Words Storms

Page 9: Word Storms: Multiples of Word Clouds for Visual ...szabo/talks/tea_talk_Gatsby/Zoltan_Szabo_te… · Motivation Vast number of documents on the web. Need for quick scanning. Word

Spiral Algorithm: Formally

Zoltán Szabó Words Storms

Page 10: Word Storms: Multiples of Word Clouds for Visual ...szabo/talks/tea_talk_Gatsby/Zoltan_Szabo_te… · Motivation Vast number of documents on the web. Need for quick scanning. Word

Creating a Storm

i th document: ui = (uiw): count of word w in the i th doc.

i th word cloud: vi = (Wi , {piw}, {ciw}, {siw}).

Alg-1:

Color: α-channel = idf = log(

|docs||docs containing w |

)

.

⇒ transparent: the word appears in many docs.Locations:

Initialization: spiral method.

Iterate: desired locations := Eclouds[previous locations].

Zoltán Szabó Words Storms

Page 11: Word Storms: Multiples of Word Clouds for Visual ...szabo/talks/tea_talk_Gatsby/Zoltan_Szabo_te… · Motivation Vast number of documents on the web. Need for quick scanning. Word

Coordinated Layout: Alg-1

Problem: tends to move words far away from center.

Zoltán Szabó Words Storms

Page 12: Word Storms: Multiples of Word Clouds for Visual ...szabo/talks/tea_talk_Gatsby/Zoltan_Szabo_te… · Motivation Vast number of documents on the web. Need for quick scanning. Word

Coordinated Layout: Alg-2 – Objective

Set of documents: u1:N = {u1, . . . ,uN}. Storm: v1:N = {v1, . . . , vN}.

Objective (how well the storm fits the corpus):

fu1:N(v1:N) =

N∑

i ,j=1

[du(ui ,uj)− dv(vi , vj)]2

︸ ︷︷ ︸

similar docs are mapped to similar clouds

+N∑

i=1

c(ui , vi)

︸ ︷︷ ︸

faithful repr. of the own doc

.

First term: MDS. du: Euclidean distance. κ ≥ 0

dv (vi , vj) =∑

w∈Wi∪Wj

(siw − sjw )2 + κ

w∈Wi∩Wj

∥∥piw − pjw

∥∥2

2.

Second term:

c(ui , vi ) =∑

w∈Wi

(uiw − siw )2.

Zoltán Szabó Words Storms

Page 13: Word Storms: Multiples of Word Clouds for Visual ...szabo/talks/tea_talk_Gatsby/Zoltan_Szabo_te… · Motivation Vast number of documents on the web. Need for quick scanning. Word

Coordinated Layout: Alg-2 – Objective

Two more penalties (λ > 0, µ > 0):

r(v1:N ) = λ

N∑

i=1

w ,w ′∈Wi

O2i :w ,w ′

︸ ︷︷ ︸

words do not overlap

N∑

i=1

w∈Wi

‖piw‖22

︸ ︷︷ ︸

compact configuration

.

Oi :w ,w ′: minimum distance required to separate

overlapping words (w ,w ′).

Final objective: fu1:N(v1:N) + r(v1:N) → minv1:N

.

Optimization:

homotopy scheme in λ,

fixed subtask: gradient descent.

Zoltán Szabó Words Storms

Page 14: Word Storms: Multiples of Word Clouds for Visual ...szabo/talks/tea_talk_Gatsby/Zoltan_Szabo_te… · Motivation Vast number of documents on the web. Need for quick scanning. Word

Coordinated Layout: Combined Algorithm

Iterative algorithm: fast, but not compact.

Gradient method: compact storm, but slow.

In practise: combination gives decent results.

Zoltán Szabó Words Storms

Page 15: Word Storms: Multiples of Word Clouds for Visual ...szabo/talks/tea_talk_Gatsby/Zoltan_Szabo_te… · Motivation Vast number of documents on the web. Need for quick scanning. Word

Numerical Illustration

User study: users are better in

outlier document detection,

the discovery of the two most similar documents.

ICML-2012:

visualization of sessions,http://icml.cc/2012/whatson-all/.

Research grant abstract visualization (EPSRC):

1 − 5th = material sciences, 6th = maths.

independent vs. coordinated layout.

Zoltán Szabó Words Storms

Page 16: Word Storms: Multiples of Word Clouds for Visual ...szabo/talks/tea_talk_Gatsby/Zoltan_Szabo_te… · Motivation Vast number of documents on the web. Need for quick scanning. Word

EPSRC programmes: independent clouds

Zoltán Szabó Words Storms

Page 17: Word Storms: Multiples of Word Clouds for Visual ...szabo/talks/tea_talk_Gatsby/Zoltan_Szabo_te… · Motivation Vast number of documents on the web. Need for quick scanning. Word

EPSRC programmes: coordinated storm

Zoltán Szabó Words Storms

Page 18: Word Storms: Multiples of Word Clouds for Visual ...szabo/talks/tea_talk_Gatsby/Zoltan_Szabo_te… · Motivation Vast number of documents on the web. Need for quick scanning. Word

Coordinated Storm: Interpretation

(a)-(e) similar: ’material’, ’applications’, ’properties’.

Contrast, absence of words:

’coating’ only in (b) and (d),

no ’material’ in (f).

Informative words (transparency): ’electron’ (a), ’metal’ (b),

’light’ (c), ’crack’ (d), ’composite’ (e), ’problems’ (f).

Zoltán Szabó Words Storms

Page 19: Word Storms: Multiples of Word Clouds for Visual ...szabo/talks/tea_talk_Gatsby/Zoltan_Szabo_te… · Motivation Vast number of documents on the web. Need for quick scanning. Word

Summary

Independent word clouds are difficult to compare.

Word storm:

Similar clouds represent similar documents.Emphasizes the most informative words.

Useful in comparing/contrasting documents.

Source code: http://groups.inf.ed.ac.uk/cup/

wordstorm/wordstorm.html

Zoltán Szabó Words Storms