Christian Bizer: Fusing the Web of Data (12/08/2008). 3rd Asian Semantic Web Conference (ASWC 2008), DIST Workshop, Bangkok, Thailand, 8 December 2008. Christian Bizer, Freie Universität Berlin


Page 1: Christian Bizer, Freie Universität Berlin

Christian Bizer: Fusing the Web of Data (12/08/2008)

3rd Asian Semantic Web Conference (ASWC 2008), DIST Workshop, Bangkok, Thailand

8 December 2008 

Fusing the Web of Data

Christian Bizer, Freie Universität Berlin

Page 2: Christian Bizer, Freie Universität Berlin

Overview

1. The Web of Data
   - Linked Data Principles
   - Linked Data Deployment
   - Applications that consume Linked Data

2. Linked Data Fusion
   1. The Linking Process
   2. Inconsistency Resolution
   3. Provenance Tracking and Explanations

Page 3: Christian Bizer, Freie Universität Berlin

The Classic Web

[Figure: HTML documents A, B and C connected by hyperlinks and accessed by Web browsers and search engines]

Single global information space
1. URLs as globally unique IDs and retrieval mechanism
2. HTML as shared content format
3. Hyperlinks

Shortcomings
- Content is not well structured
- You cannot ask expressive queries
- You cannot process content within applications

Page 4: Christian Bizer, Freie Universität Berlin

Linked Data

[Figure: data sources A to E, each containing Things that are connected by typed links]

Use Semantic Web technologies to
1. publish structured data on the Web,
2. set links between data from one data source to data within other data sources.

Page 5: Christian Bizer, Freie Universität Berlin

Linked Data Principles

1. Use URIs as names for things.

2. Use HTTP URIs so that people can look up those names.

3. When someone looks up a URI, provide useful RDF information.

4. Include RDF statements that link to other URIs so that they can discover related things.

Tim Berners-Lee 2007

http://www.w3.org/DesignIssues/LinkedData.html

Page 6: Christian Bizer, Freie Universität Berlin

The RDF Data Model

[Figure: RDF graph with the following triples]

pd:cygri rdf:type foaf:Person .
pd:cygri foaf:name "Richard Cyganiak" .
pd:cygri foaf:based_near dbpedia:Berlin .

Page 7: Christian Bizer, Freie Universität Berlin

Data objects are identified with HTTP URIs

[Figure: RDF graph with the following triples]

pd:cygri rdf:type foaf:Person .
pd:cygri foaf:name "Richard Cyganiak" .
pd:cygri foaf:based_near dbpedia:Berlin .

pd:cygri = http://richard.cyganiak.de/foaf.rdf#cygri

dbpedia:Berlin = http://dbpedia.org/resource/Berlin
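
A minimal sketch of how the prefixed names on this slide expand into full HTTP URIs. The `pd` and `dbpedia` entries come from the slide; the `foaf` and `rdf` namespaces are the standard ones; the helper names (`expand`, `expand_term`) and the tuple layout of the triples are assumptions made for illustration.

```python
# Prefix table: "pd" and "dbpedia" are taken from the slide,
# "foaf" and "rdf" are the standard namespace URIs.
PREFIXES = {
    "pd": "http://richard.cyganiak.de/foaf.rdf#",
    "dbpedia": "http://dbpedia.org/resource/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(name: str) -> str:
    """Expand a prefixed name such as 'pd:cygri' into a full HTTP URI."""
    prefix, _, local = name.partition(":")
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix}")
    return PREFIXES[prefix] + local


def expand_term(term: str) -> str:
    """Leave quoted literals untouched, expand everything else."""
    return term if term.startswith('"') else expand(term)


# The triples as read from the slide diagram (subject, predicate, object).
TRIPLES = [
    ("pd:cygri", "rdf:type", "foaf:Person"),
    ("pd:cygri", "foaf:name", '"Richard Cyganiak"'),
    ("pd:cygri", "foaf:based_near", "dbpedia:Berlin"),
]

EXPANDED = [tuple(expand_term(t) for t in triple) for triple in TRIPLES]
```

After expansion every subject, predicate and non-literal object is an HTTP URI that a client could look up.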

Page 8: Christian Bizer, Freie Universität Berlin

Dereferencing URIs over the Web

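[Figure: looking up dbpedia:Berlin adds further triples to the graph]

pd:cygri rdf:type foaf:Person .
pd:cygri foaf:name "Richard Cyganiak" .
pd:cygri foaf:based_near dbpedia:Berlin .
dbpedia:Berlin dp:population 3,405,259 .
dbpedia:Berlin skos:subject dp:Cities_in_Germany .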

Page 9: Christian Bizer, Freie Universität Berlin

Dereferencing URIs over the Web

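[Figure: looking up dp:Cities_in_Germany adds further cities to the graph]

pd:cygri rdf:type foaf:Person .
pd:cygri foaf:name "Richard Cyganiak" .
pd:cygri foaf:based_near dbpedia:Berlin .
dbpedia:Berlin dp:population 3,405,259 .
dbpedia:Berlin skos:subject dp:Cities_in_Germany .
dbpedia:Hamburg skos:subject dp:Cities_in_Germany .
dbpedia:Muenchen skos:subject dp:Cities_in_Germany .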
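
Dereferencing can be sketched with the Python standard library alone. The snippet below only builds the HTTP request; the `Accept` header value and the function names are assumptions, and what a server actually returns depends on its content negotiation setup.

```python
import urllib.request


def build_rdf_request(uri: str) -> urllib.request.Request:
    """Build a GET request that asks for an RDF serialization of `uri`."""
    # Assumed preference order: RDF/XML first, then Turtle, then anything.
    accept = "application/rdf+xml, text/turtle;q=0.9, */*;q=0.1"
    return urllib.request.Request(uri, headers={"Accept": accept})


def dereference(uri: str, timeout: float = 10.0) -> bytes:
    """Fetch the description of `uri`. Requires network access.

    urlopen follows HTTP redirects, so a server that redirects from the
    URI of a thing to a document describing it is handled transparently.
    """
    with urllib.request.urlopen(build_rdf_request(uri), timeout=timeout) as resp:
        return resp.read()


# Building the request does not touch the network.
request = build_rdf_request("http://dbpedia.org/resource/Berlin")
```

Calling `dereference("http://dbpedia.org/resource/Berlin")` would then return the raw bytes of whatever description the server chooses to send; parsing them is left to an RDF library.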

Page 10: Christian Bizer, Freie Universität Berlin

The Disco – Hyperdata Browser

Page 11: Christian Bizer, Freie Universität Berlin

Page 12: Christian Bizer, Freie Universität Berlin

2. Linked Data Deployment on the Web

[Figure: data sources A to E, each containing Things that are connected by typed links]

Is this real?

Page 13: Christian Bizer, Freie Universität Berlin

W3C Linking Open Data Project

Community effort to
- publish existing open license datasets as Linked Data on the Web
- interlink things between different data sources

Page 14: Christian Bizer, Freie Universität Berlin

LOD Datasets on the Web: May 2007

Over 500 million RDF triples

Around 120,000 RDF links between data sources

Page 15: Christian Bizer, Freie Universität Berlin

Example RDF Links

RDF links from DBpedia to other data sources

<http://dbpedia.org/resource/Berlin> owl:sameAs <http://sws.geonames.org/2950159> .
<http://dbpedia.org/resource/Tim_Berners-Lee> owl:sameAs <http://www4.wiwiss.fu-berlin.de/dblp/resource/person/100007> .

RDF link from a FOAF profile to DBpedia

<http://richard.cyganiak.de/foaf.rdf#cygri> foaf:topic_interest <http://dbpedia.org/resource/Semantic_Web> .

Page 16: Christian Bizer, Freie Universität Berlin

LOD Datasets on the Web: February 2008

Page 17: Christian Bizer, Freie Universität Berlin

LOD Datasets on the Web: September 2008

> 2 billion RDF triples

> 6 million RDF links

Page 18: Christian Bizer, Freie Universität Berlin

The Bio2RDF Project

Goals
1. Make bioinformatics data available in RDF format on the Web.
2. Promote the linked data vision within the bioinformatics community.
3. Answer questions which were not possible or practical to ask before.

Participants
- Université Laval, Canada
- Queensland University of Technology, Australia

Page 19: Christian Bizer, Freie Universität Berlin

The Bio2RDF Cloud

27 data sources

260 million records

2.7 billion RDF triples

Page 20: Christian Bizer, Freie Universität Berlin

3. Applications

[Figure: data sources A to E with Things connected by typed links, consumed by search engines, Linked Data mashups and Linked Data browsers]

What can I do with this?

Page 21: Christian Bizer, Freie Universität Berlin

Linked Data Browsers

Tabulator Browser (MIT, USA)

Disco Hyperdata Browser (FU Berlin, DE)

OpenLink RDF Browser (OpenLink, UK)

Zitgist RDF Browser (Zitgist, USA)

Humboldt (HP Labs, UK)

Fenfire (DERI, Ireland)

Marbles (FU Berlin, DE)

Page 22: Christian Bizer, Freie Universität Berlin

Page 23: Christian Bizer, Freie Universität Berlin

Linked Data Mashups

Domain-specific applications using Linked Data from the Web

Page 24: Christian Bizer, Freie Universität Berlin

DBtune Slashfacet

Visualizes music-related Linked Data

Uses LastFM, MySpace, and BBC data

Page 25: Christian Bizer, Freie Universität Berlin

DBpedia Mobile

Geospatial entry point into the Web of Data

Starts with DBpedia, Revyu and Flickr data

Page 26: Christian Bizer, Freie Universität Berlin

DERI Semantic Web Pipes

Page 27: Christian Bizer, Freie Universität Berlin

Web of Data Search Engines

Falcons (IWS, China)

Sindice (DERI, Ireland)

MicroSearch (Yahoo, Spain)

Watson (Open University, UK)

SWSE (DERI, Ireland)

Swoogle (UMBC, USA)

Page 28: Christian Bizer, Freie Universität Berlin

Falcons

Page 29: Christian Bizer, Freie Universität Berlin

Page 30: Christian Bizer, Freie Universität Berlin

Is this good enough?

No.

Page 31: Christian Bizer, Freie Universität Berlin

2. Linked Data Fusion

[Figure: an application works on an integrated view over data objects 1 to 6, which come from data sources A, B and C and are connected by owl:sameAs links]

Users want an integrated view on all data that is available about a real-world entity!

Page 32: Christian Bizer, Freie Universität Berlin

Linked Data Fusion - Requirements

1. Map data into a single schema so that data can be rendered and queried properly.

2. Smush data from all sources about a single real-world entity while keeping track of information provenance.

3. Resolve inconsistencies in the data by applying different data fusion heuristics.

4. Be able to explain the fusion process (Tim Berners-Lee's "Oh, yeah?" button).
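
Requirement 2 can be sketched as follows, assuming statements arrive as `(subject, predicate, value, source)` tuples and that owl:sameAs links are given as URI pairs. The function name `smush`, the plain-string predicate `population`, and the GeoNames population figure are illustrative assumptions; only the DBpedia figure and the two URIs appear in the slides.

```python
from collections import defaultdict


def smush(statements, same_as):
    """Group statements about URIs that are connected by owl:sameAs.

    statements: iterable of (subject, predicate, value, source)
    same_as:    iterable of (uri_a, uri_b) pairs
    Returns {canonical_uri: {predicate: [(value, source), ...]}} so that
    every value keeps a pointer to the source it came from.
    """
    parent = {}

    def find(uri):
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving
            uri = parent[uri]
        return uri

    for a, b in same_as:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Arbitrary choice: the lexicographically smaller URI is canonical.
            keep, drop = sorted((root_a, root_b))
            parent[drop] = keep

    merged = defaultdict(lambda: defaultdict(list))
    for subject, predicate, value, source in statements:
        merged[find(subject)][predicate].append((value, source))
    return {uri: dict(props) for uri, props in merged.items()}


BERLIN_DBPEDIA = "http://dbpedia.org/resource/Berlin"
BERLIN_GEONAMES = "http://sws.geonames.org/2950159"

statements = [
    (BERLIN_DBPEDIA, "population", 3405259, "dbpedia"),
    # Made-up value, only here to show a conflict between two sources.
    (BERLIN_GEONAMES, "population", 3400000, "geonames"),
]
view = smush(statements, [(BERLIN_DBPEDIA, BERLIN_GEONAMES)])
```

The result holds both population values under one canonical URI, each tagged with its source, which is the input that the inconsistency resolution step needs.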

Page 33: Christian Bizer, Freie Universität Berlin

Roles in the Linked Data Scenario

Data Publisher
1. Publish data itself.
2. Set RDF links to other data items describing the same real-world entity.
3. Reuse terms from existing vocabularies or set links to related schemata.
4. Publish metadata about
   - provenance
   - timeliness
   - data license

Client Application
1. Map data into a single schema.
2. Smush data from different sources about a real-world entity.
3. Resolve inconsistencies in the data.
4. Keep track of information provenance and lineage.
5. Explain fusion process.

Page 34: Christian Bizer, Freie Universität Berlin

2.1 Setting RDF Links

Today:
- Simple pattern- and graph-matching based techniques are used to generate links.
- Usually proprietary code.

There is lots of existing work on identity resolution in the database and knowledge representation communities that can be used:
- Rule-based approaches
- Distance-based techniques
- Probabilistic matching
- Supervised and unsupervised learning
- Using a wide range of distance metrics

see: Elmagarmid et al.: Duplicate Record Detection: A Survey. TKDE, 2007.
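
As one example of a distance metric, a sketch of the Levenshtein edit distance, normalized into a similarity between 0 and 1. This is a standard textbook metric chosen for illustration; the slides do not say which metrics the existing link generators use.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions and substitutions
    needed to turn `a` into `b` (dynamic programming, two rows)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalize the edit distance into the range 0.0 to 1.0."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

Two labels such as "Muenchen" and "München" differ by two edits, so they score high without being identical, which is the kind of near-match a link generator has to decide about.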

Page 35: Christian Bizer, Freie Universität Berlin

Linking Frameworks

Goal: (Semi-)automatically generate RDF Links based on declarative rules.

Ongoing work
- Oktie Hassanzadeh (University of Toronto): ODDLinker
- Andriy Nikolov et al. (Open University): KnoFuss
- Julius Volz (Freie Universität Berlin): XXXX

seeAlso: http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/EquivalenceMining

CREATE LINKS owl:sameAs
BETWEEN a FROM dbpedia AND b FROM factbook
RESTRICT a TO { ?a rdf:type dbpedia-owl:Country }
METRIC {
    STRING_SIMILARITY(a/rdfs:label, b/rdfs:label),
    NUM_SIMILARITY(a/p:populationEstimate, b/factbook:population_total),
    NUM_SIMILARITY(a/p:areaKm, b/factbook:area_total)
}
THRESHOLDS MATCH 0.9 VERIFY 0.7;
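
One way to read the rule above, sketched in Python. The slide does not say how the three metric values are combined or how each metric is defined, so the equal-weight average, the `difflib`-based string similarity and the relative-difference numeric similarity are all assumptions, as are the record layout and the sample values.

```python
from difflib import SequenceMatcher


def string_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def num_similarity(a: float, b: float) -> float:
    """1.0 for equal values, approaching 0.0 as they diverge.
    Assumes non-negative inputs such as population or area."""
    if a == b:
        return 1.0
    return 1.0 - abs(a - b) / max(abs(a), abs(b))


def score(a: dict, b: dict) -> float:
    # Assumption: the three metrics are averaged with equal weight.
    parts = [
        string_similarity(a["label"], b["label"]),
        num_similarity(a["population"], b["population"]),
        num_similarity(a["area_km"], b["area_km"]),
    ]
    return sum(parts) / len(parts)


def classify(a: dict, b: dict, match: float = 0.9, verify: float = 0.7) -> str:
    """Return 'match' (emit owl:sameAs), 'verify' (ask a human) or 'no-link'."""
    s = score(a, b)
    if s >= match:
        return "match"
    if s >= verify:
        return "verify"
    return "no-link"


# Illustrative records with made-up values, not real dataset content.
A = {"label": "Germany", "population": 82_000_000, "area_km": 357_000}
B = {"label": "Germany", "population": 82_400_000, "area_km": 357_021}
C = {"label": "France", "population": 10, "area_km": 10}
```

Pairs scoring at or above the MATCH threshold would be emitted as owl:sameAs links, pairs between the two thresholds would be queued for manual verification, and the rest would be dropped.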

Page 36: Christian Bizer, Freie Universität Berlin

Schema Level RDF Links

Today: Simple mappings
- owl:equivalentClass
- owl:equivalentProperty
- rdfs:subClassOf
- rdfs:subPropertyOf

UMBEL effort:

Lots of existing work on schema/ontology matching to build on.

Missing: Agreed-upon way to publish more expressive mapping rules on the Web.
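
A simple mapping of the owl:equivalentProperty kind is enough to translate data into a single target schema, as this sketch shows. The mapping table is hypothetical: the property names are borrowed from the linking rule on the previous slide, and nothing in the slides states that these properties are actually declared equivalent.

```python
# Hypothetical mapping: source property -> property in the target schema.
EQUIVALENT_PROPERTY = {
    "factbook:population_total": "p:populationEstimate",
    "factbook:area_total": "p:areaKm",
}


def to_target_schema(triples, mapping=EQUIVALENT_PROPERTY):
    """Rewrite predicates that have a known equivalent; keep the rest."""
    return [(s, mapping.get(p, p), o) for s, p, o in triples]


# Illustrative input with placeholder subject and values.
source_triples = [
    ("ex:country1", "factbook:population_total", 100),
    ("ex:country1", "rdfs:label", "Country 1"),
]
mapped = to_target_schema(source_triples)
```

Anything beyond a one-to-one rename (unit conversion, splitting or merging values) falls outside what these simple mappings can express, which is the gap the last line of the slide points at.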

Page 37: Christian Bizer, Freie Universität Berlin

2.2 Publish Metadata

Document Metadata
- Dublin Core, Semantic Web Publishing Vocabulary

Licensing Metadata
- Creative Commons Licensing Framework
- Open Data Commons Public Domain Dedication & Licence (PDDL)

# Metadata and Licensing Information
<http://dbpedia.org/data/Alec_Empire>
    rdf:type foaf:Document ;
    dc:publisher <http://dbpedia.org/resource/DBpedia> ;
    dc:date "2007-07-13"^^xsd:date ;
    dc:rights <http://en.wikipedia.org/wiki/WP:GFDL> .

# The Document Content
<http://dbpedia.org/resource/Alec_Empire>
    rdf:type foaf:Person ;
    foaf:name "Empire, Alec" ;
    dbpedia-owl:associatedBand dbpedia:Atari_Teenage_Riot ;

Page 38: Christian Bizer, Freie Universität Berlin

2.3. Provenance and Lineage Tracking

Named Graphs data model
- part of W3C SPARQL Recommendation
- implemented by an increasing number of RDF stores

# TriG Representation of three Named Graphs
:G1 { :Monica ex:name "Monica Murphy" .
      :Monica ex:homepage <http://www.monicamurphy.org> .
      :Monica ex:email <mailto:[email protected]> . }

:G2 { :Monica rdf:type ex:Person .
      :Monica ex:hasSkill ex:Programming }

:G3 { :G1 swp:assertedBy _:w1 .
      _:w1 swp:authority :Chris .
      _:w1 dc:date "2003-10-02"^^xsd:date .
      :G2 swp:quotedBy _:w2 .
      _:w2 swp:authority :Chris .
      _:w2 dc:date "2003-09-03"^^xsd:date . }
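
How a client might use such named graphs for provenance tracking, sketched with plain tuples instead of an RDF store. The quads repeat a subset of the TriG example above; the function names and the `(graph, subject, predicate, object)` layout are assumptions.

```python
# (graph, subject, predicate, object), copied from the TriG example.
QUADS = [
    (":G1", ":Monica", "ex:name", '"Monica Murphy"'),
    (":G2", ":Monica", "rdf:type", "ex:Person"),
    (":G2", ":Monica", "ex:hasSkill", "ex:Programming"),
    (":G3", ":G1", "swp:assertedBy", "_:w1"),
    (":G3", "_:w1", "swp:authority", ":Chris"),
    (":G3", ":G2", "swp:quotedBy", "_:w2"),
    (":G3", "_:w2", "swp:authority", ":Chris"),
]


def graphs_containing(s, p, o, quads=QUADS):
    """Names of all graphs that contain the triple (s, p, o)."""
    return [g for g, qs, qp, qo in quads if (qs, qp, qo) == (s, p, o)]


def provenance(graph, quads=QUADS):
    """List (relation, authority) pairs recorded about a named graph."""
    result = []
    for _, subject, relation, warrant in quads:
        if subject == graph and relation in ("swp:assertedBy", "swp:quotedBy"):
            for _, w_subject, w_predicate, authority in quads:
                if w_subject == warrant and w_predicate == "swp:authority":
                    result.append((relation, authority))
    return result
```

Looking up a triple yields the graph it came from, and looking up that graph yields who asserted or merely quoted it, which is the chain an explanation component has to follow.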

Page 39: Christian Bizer, Freie Universität Berlin

2.4. Inconsistency Resolution

There is lots of overlap between LOD datasets
- Places: DBpedia, Geonames, Riese, …
- People: Freebase, LinkedMDB, DBLP, …
- Music: DBpedia, MusicBrainz, Jamendo, …

There are naturally lots of inconsistencies
- DBpedia: Person born at date X.
- Freebase: Person born at date Y.
- DBpedia: Band album X.
- MusicBrainz: Band album Y.
- Geonames: City has geo-coordinates
- Freebase: City has geo-coordinates

Page 40: Christian Bizer, Freie Universität Berlin

Inconsistency Resolution Strategies

- Pass it on: Pass conflicting values to the user and let the user decide.
- Take the information: If a value is missing in dataset 1, use the value from dataset 2.
- Trust your friends: Prefer information from certain sources.
- Cry with the wolves: Choose the most common value.
- Meet in the middle: Take the average of all values.
- Keep up to date: Use the newest value.

SeeAlso: Bleiholder and Naumann: Conflict Handling Strategies in an Integrated Information System. WWW2006.
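
The strategies above can be sketched as small functions over a list of candidate values. The `(value, source, date)` layout, the function names and the sample candidates are assumptions made for illustration; dates are ISO strings so that plain string comparison orders them correctly.

```python
from collections import Counter
from statistics import mean


def pass_it_on(candidates):
    """Return every value and leave the decision to the user."""
    return [value for value, _, _ in candidates]


def take_the_information(candidates):
    """Use the first dataset that has a value at all."""
    for value, _, _ in candidates:
        if value is not None:
            return value
    return None


def trust_your_friends(candidates, preferred_sources):
    """Walk the sources in order of trust and take the first value found."""
    for source in preferred_sources:
        for value, candidate_source, _ in candidates:
            if candidate_source == source and value is not None:
                return value
    return None


def cry_with_the_wolves(candidates):
    """Choose the most common value."""
    values = [v for v, _, _ in candidates if v is not None]
    return Counter(values).most_common(1)[0][0]


def meet_in_the_middle(candidates):
    """Take the average of all (numeric) values."""
    return mean([v for v, _, _ in candidates if v is not None])


def keep_up_to_date(candidates):
    """Use the value with the newest date."""
    return max(candidates, key=lambda c: c[2])[0]


# Made-up candidates for one property of one entity.
CANDIDATES = [
    (1972, "dbpedia", "2008-01-10"),
    (1973, "freebase", "2008-06-01"),
    (1972, "musicbrainz", "2007-11-30"),
]
```

Which strategy fits depends on the property: averaging can make sense for coordinates but not for a birth date, so a client would typically choose a strategy per property.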

Page 41: Christian Bizer, Freie Universität Berlin

2.5. Explain Data Provenance and Fusion Steps

Tim Berners-Lee's "Oh, yeah?" button.

Existing Work:
- Deborah McGuinness et al.: Inference Web: Portable Explanations for the Web.
- Chris Bizer: Web Information Quality Assessment Framework (WIQA)

Page 42: Christian Bizer, Freie Universität Berlin

Example WIQA Explanations

Page 43: Christian Bizer, Freie Universität Berlin

Outlook

Lots of exciting open issues to solve!
- DIST related technologies will be one of the hot topics for the next years (see for instance WWW2009)

Important for LOD
- Progress with Publishing Schema Mappings on the Web
- Progress with Data Fusion
- Linked Data client applications that address all issues mentioned

Please submit such solutions and client applications to
- the Semantic Web Challenge 2009
- Linked Data on the Web (LDOW2009) workshop at WWW2009
- IJSWIS Special Issue on Linked Data

Page 44: Christian Bizer, Freie Universität Berlin

Thanks!

References
- Linking Open Data Project Wiki: http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
- Tutorial on How to Publish Linked Data on the Web: http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/