The ZBW is a member of the Leibniz Association (Leibniz-Gemeinschaft)

A Data Restore Model for Reproducibility in Computational Statistics

Daniel Bahls, ZBW, I-Know 2013, Graz, Austria

Outline

1. Motivation – Repeatability in Empirical Research

2. Our Approach – The Data Restore Model

3. Outlook – Status of this Work / Next Steps

Repeatability in Science

• Fundamental criterion – verification is the job of the community

• Experiments must lead to the same findings
  • for different researchers
  • under certain constant parameters

• Further
  • Robustness (w.r.t. measuring errors, etc.)
  • Repeatability vs. Reproducibility vs. Verifiability

Repeatability in Economics and the infamous case of Rogoff and Reinhart

Improving Review Processes

- Justin Wolfers, Betsey Stevenson, economists at University of Michigan

If we try it all on our own and cannot reproduce the results, what does it mean?

... so we need access to the data

McCullough – Experiences & Recommendations

McCullough – Requirements & Experiences

Sweave – Literate Programming for Statistics

Data Publishing in Economics / Social Sciences

Different disciplines have different challenges

Characteristics of empirical research:

• sensitive / protected data

• distributed external data sources

Data Sharing

submit data bundles to 3rd-party repositories?

Data Management: The Black Box Approach

data review, curation, legal situation

re-use, transparency, repeatability

a data set copy (some resource bundle)

Statistical Data on the Semantic Web

Data Restore Model

Spreadsheet

obs data set

DataSet

UserDataSet

Data Items

Data Items from own survey

includesData

external dataset

buildScript

No gaps

Incentive
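The diagram labels above (DataSet, UserDataSet, includesData, buildScript) suggest that a researcher's data set is stored as references to external data sets, the researcher's own data items, and a script that rebuilds the final table. A minimal Python sketch of that reading follows. All class, field, and function names are hypothetical and chosen only to mirror the labels, and the rows are made-up placeholders. This is an illustration under those assumptions, not the author's implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass(frozen=True)
class DataSetRef:
    """Reference to an external data set that stays with its provider."""
    source: str
    name: str
    version: str


@dataclass
class UserDataSet:
    """External references, own survey items, and the script that joins them."""
    includes_data: List[DataSetRef]
    own_items: List[Row]
    build_script: Callable[[Dict[DataSetRef, List[Row]], List[Row]], List[Row]]

    def restore(self, fetch: Callable[[DataSetRef], List[Row]]) -> List[Row]:
        """Re-fetch every external data set, then rerun the build script."""
        external = {ref: fetch(ref) for ref in self.includes_data}
        return self.build_script(external, self.own_items)


def fake_fetch(ref: DataSetRef) -> List[Row]:
    # Stand-in for a request to the provider's archive (placeholder rows).
    return [{"country": "AT", "income": 100}, {"country": "DE", "income": 120}]


def join_on_country(external, own):
    # Hypothetical build script: attach external columns to own survey rows.
    lookup = {row["country"]: row for rows in external.values() for row in rows}
    return [{**lookup[item["country"]], **item}
            for item in own if item["country"] in lookup]


eurostat = DataSetRef("EuroStat", "Household XZ", "0.2")
user_data = UserDataSet([eurostat], [{"country": "AT", "answer": 3}], join_on_country)
restored = user_data.restore(fake_fetch)
```

Because only references and the build script are stored, no copy of the external data has to be redistributed; the table is rebuilt on demand from the original source.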

Source: EuroStat
Dataset: Household XZ
Version: 0.2
Published: Jan 2009
[read more]

Integration with Research Environments

Review and Re-use

Client

Source Code Repository

Archive A
Archive B
Archive C
Archive D

Code and Data Templates

Authenticate & Request Data

Data Infrastructure Concept

• One source per data set

transparency, curation by highest expertise

• Data protection

make data publishing possible for all scenarios

• Data and code integration

one-click-solution – no manual efforts for replication attempts

• Precise Citation

traceable data provenance
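The points above (one source per data set, data protection, an authenticated request, precise citation) can be illustrated with a small in-memory sketch. The `Archive` class, the token check, and the `restore` helper are assumptions made for illustration; the slides do not specify an API, and the rows are placeholders.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

Row = Dict[str, object]


@dataclass(frozen=True)
class DataRef:
    """Precise reference: which archive, which data set, which version."""
    source: str
    dataset: str
    version: str

    def cite(self) -> str:
        return f"{self.source}: {self.dataset}, version {self.version}"


class Archive:
    """Hypothetical archive that releases data only to authorised clients."""

    def __init__(self, holdings: Dict[Tuple[str, str], List[Row]], tokens: Set[str]):
        self._holdings = holdings
        self._tokens = tokens

    def request(self, ref: DataRef, token: str) -> List[Row]:
        if token not in self._tokens:
            raise PermissionError(f"not authorised for {ref.cite()}")
        return self._holdings[(ref.dataset, ref.version)]


def restore(ref: DataRef, archives: Dict[str, Archive], token: str) -> List[Row]:
    """Route the request to the single archive that owns the data set."""
    return archives[ref.source].request(ref, token)


archive_a = Archive({("Household XZ", "0.2"): [{"household": 1, "size": 3}]},
                    {"secret-token"})
ref = DataRef("Archive A", "Household XZ", "0.2")
rows = restore(ref, {"Archive A": archive_a}, "secret-token")
```

Keeping the version in the reference is what makes the citation traceable: the same `DataRef` always resolves to the same holdings, or fails loudly if access is not granted.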

Incentives for the Research Community

• Transparency increases trust:

no gaps – trust – incentive

• Easy re-use:

the research models applied live longer

• More impact:

more citation

• Material for tutorials:

Students learn computational research in practice

• Research is more efficient:

Easier to understand and pick up the research of others

• Secured Knowledge:

Replication attempts in different research environments and contexts

discussion, inspiration, innovation

“Non-Findings” may get more recognition

What we are currently working on

The Rogoff and Reinhart / Herndon case

• apply Data Restore Model

• add semantic data documentation (partly available as RDF already)

• model by Data and Code ontology

Data and Code Ontology

Data and Code

System Environment

Resources

Replication Attempts

ExperimentSetup

• Maven

• Make

• Build

• Virtualisation

• Emulation

• Linked Science

• Social M

Data References

• Semantic Coding?

The Koenker Zeileis case

• Model relations between Data and Code instances

protected

public use file

figures

data set

transformation by code

Data Access and Retrieval

Next Steps

1. Challenge, Goals, Requirements

2. The Data Restore Model

3. Semantic Linkup / Data Annotation

4. Data Retrieval and Reuse

5. System Architecture

6. Validation / Evaluation

Thank you

Daniel Bahls, ZBW
d.bahls@zbw.eu

So there are still gaps

Examples:

• The data set is titled “EU Unemployment statistics 2012, EuroStat”
  • age class? seasonal adjustments?

• Executing the code does not produce the results
  • wrong data? system environment? error?
  • cf. Herndon’s replication of the Rogoff/Reinhart research

• A DOI does not specify the file format
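One way to narrow down the "wrong data? system environment? error?" question is to record a fingerprint of the input data and a description of the environment alongside the published results, then compare both during a replication attempt. The sketch below uses only the Python standard library. The function names and the choice of SHA-256 are assumptions for illustration and are not taken from the slides.

```python
import hashlib
import platform
import sys
from typing import Dict, List


def data_fingerprint(raw: bytes) -> str:
    """Hash the exact bytes of the input file."""
    return hashlib.sha256(raw).hexdigest()


def environment_record() -> Dict[str, str]:
    """Coarse description of the system the code runs on."""
    return {"python": sys.version.split()[0], "platform": platform.platform()}


def diagnose(expected_fingerprint: str, raw: bytes,
             expected_env: Dict[str, str]) -> List[str]:
    """List which recorded conditions differ from the current run."""
    causes = []
    if data_fingerprint(raw) != expected_fingerprint:
        causes.append("data differs")
    if environment_record() != expected_env:
        causes.append("environment differs")
    return causes or ["data and environment match; inspect the code"]
```

A byte-level hash also exposes differences in file format or encoding, which a DOI alone does not pin down.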

Data and Code Ontology

observation string value

data ref

default value

for_stata

for_spss

Such relationships can be stated within the semantic model

Proxy Relations

Dataset for economic growth (GDP or the like)

Dataset for Aluminium Price Index

Describes the proxy relation:
- details on correlation
- best practices
- frequency of use

hasProxyRel
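A proxy relation of this kind can be pictured as a small record linking a target data set to its proxy, with the descriptive fields listed above. The Python sketch below is only an illustration: the record type, its field names, and the lookup helper are hypothetical, and the descriptive values are left as placeholders because the slides give none.

```python
from typing import List, NamedTuple, Optional


class ProxyRelation(NamedTuple):
    """One hasProxyRel link with its descriptive annotations."""
    target: str
    proxy: str
    correlation_details: Optional[str]
    best_practices: Optional[str]
    frequency_of_use: Optional[int]


def proxies_for(target: str, relations: List[ProxyRelation]) -> List[str]:
    """Return the names of all data sets recorded as proxies for the target."""
    return [rel.proxy for rel in relations if rel.target == target]


relations = [
    ProxyRelation(
        target="economic growth (GDP or the like)",
        proxy="Aluminium Price Index",
        correlation_details=None,  # placeholder: not given in the slides
        best_practices=None,       # placeholder: not given in the slides
        frequency_of_use=None,     # placeholder: not given in the slides
    )
]
found = proxies_for("economic growth (GDP or the like)", relations)
```

In the semantic model the same link would be stated as a relation between two data set resources; the record above only mirrors that structure in plain Python.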
