How many solutions does it take to change the face of research data? Todd Vision Dryad Digital Repository University of North Carolina at Chapel Hill KE Workshop 14-15 November 2011 Bonn, Germany

Knowledge Exchange, Nov 2011, Bonn


DESCRIPTION

Keynote presented to the KE workshop held in conjunction with the release of the report "A Surfboard for Riding the Wave: Towards a four country action programme on research data": http://www.knowledge-exchange.info/Default.aspx?ID=469

Citation preview

Page 1: Knowledge Exchange, Nov 2011, Bonn

How many solutions does it take to change the face of research data?

Todd Vision Dryad Digital Repository

University of North Carolina at Chapel Hill

KE Workshop 14-15 November 2011 Bonn, Germany

Page 2: Knowledge Exchange, Nov 2011, Bonn

“Es sollte nur ein Magazin der Kunst in der Welt sein wo der Künstler seine Kunstwerke nur hinzugeben hätte um zu nehmen was er brauchte”

“There ought to be in the world a repository of art, to which the artist need only bring his artworks in order to take what he needed”

Beethoven, letter to publisher F.A. Hoffmeister, 15 January 1801

Page 3: Knowledge Exchange, Nov 2011, Bonn

Open dissection of research: the Beethoven Repository

http://rockethub.com/projects/3755-open-dissection-of-research

Page 4: Knowledge Exchange, Nov 2011, Bonn
Page 5: Knowledge Exchange, Nov 2011, Bonn

Source: Publishing Research Consortium, http://publishingresearch.net

n=3824

Page 6: Knowledge Exchange, Nov 2011, Bonn

Figure: loss of Information Content over Time. Labels along the curve: Time of publication; Specific details; General details; Accident; Retirement or career change; Death.

(Michener et al. 1997)

Page 7: Knowledge Exchange, Nov 2011, Bonn

Henry Oldenburg

Page 8: Knowledge Exchange, Nov 2011, Bonn

Bumpus HC (1898) The Elimination of the Unfit as Illustrated by the Introduced Sparrow, Passer domesticus. Biological Lectures from the Marine Biological Laboratory: 209-226.

Page 9: Knowledge Exchange, Nov 2011, Bonn

Transparency

Page 10: Knowledge Exchange, Nov 2011, Bonn

Failure of peer-to-peer data sharing

Wicherts and colleagues requested data from 141 articles in American Psychological Association journals.

“6 months later, after … 400 emails, [sending] detailed descriptions of our study aims, approvals of our ethical committee, signed assurances not to share data with others, and even our full resumes…” only 27% of authors complied

Wicherts, J.M., Borsboom, D., Kats, J., & Molenaar, D. (2006). The poor availability of psychological research data for reanalysis. American Psychologist, 61, 726-728.

Page 11: Knowledge Exchange, Nov 2011, Bonn

News alert: scientists are human

“We related the reluctance to share research data for reanalysis to 1148 statistically significant results reported in 49 papers published in two major psychology journals. We found the reluctance to share data to be associated with weaker evidence (against the null hypothesis of no effect) and a higher prevalence of apparent errors in the reporting of statistical results. The unwillingness to share data was particularly clear when reporting errors had a bearing on statistical significance”.

Wicherts et al. (2011) doi:10.1371/journal.pone.0026828

Not shared Shared

Page 12: Knowledge Exchange, Nov 2011, Bonn

Lang GI, Botstein D (2011) PLoS ONE doi:10.1371/journal.pone.0025290

101 pages!

Page 13: Knowledge Exchange, Nov 2011, Bonn

Joint Data Archiving Policy (JDAP)

Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future.

As a condition for publication, data supporting the results in the article should be deposited in an appropriate public archive.

Authors may elect to embargo access to the data for a period up to a year after publication.

Exceptions may be granted at the discretion of the editor, especially for sensitive information.

Whitlock, M. C., M. A. McPeek, M. D. Rausher, L. Rieseberg, and A. J. Moore. 2010. Data Archiving. American Naturalist. 175(2):145-146.

Page 14: Knowledge Exchange, Nov 2011, Bonn

Infrastructure

Page 15: Knowledge Exchange, Nov 2011, Bonn
Page 16: Knowledge Exchange, Nov 2011, Bonn

Figure: Dryad and journal submission workflow.

Actors: author; editor (JOURNAL); data curator (DRYAD)

Labels: prepare manuscript and related data files; submit manuscript; manuscript review; accepted? (yes / no); send article description; upload data; curation; accepted?; Dryad data package; send data identifier (DOI)

Outputs: published article (with data citation); published data (with article citation)

Page 17: Knowledge Exchange, Nov 2011, Bonn

See poster from Brian Hole

Page 18: Knowledge Exchange, Nov 2011, Bonn

Heather Piwowar

Page 19: Knowledge Exchange, Nov 2011, Bonn

Survey of authors

What are the policies of your funder as they apply to online public archiving? (n=983)

1% Forbids

21% Recommends

9% Requires

40% No policy

26% I don’t know

3% Other

Page 20: Knowledge Exchange, Nov 2011, Bonn

Piwowar HA, Chapman WW (2008) A review of journal policies for sharing research data. Presented at ELPUB2008, Nature Precedings hdl:10101/npre.2008.1700.1

Data policies among bioscience journals

n=70

IF=3.6

IF=4.5

IF=6.0

Page 21: Knowledge Exchange, Nov 2011, Bonn

Reuse

Page 22: Knowledge Exchange, Nov 2011, Bonn
Page 23: Knowledge Exchange, Nov 2011, Bonn

Tracking data reuse

Piwowar, Carlson, Vision, unpublished

Page 24: Knowledge Exchange, Nov 2011, Bonn

H. Piwowar, J. Carlson, T. Vision, unpubl.

Page 25: Knowledge Exchange, Nov 2011, Bonn

H. Piwowar, unpubl.

Page 26: Knowledge Exchange, Nov 2011, Bonn

Incentives

Page 27: Knowledge Exchange, Nov 2011, Bonn

Does sharing imply that it need be altruistic?

Piwowar H, et al. (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308.

•  For a set of 85 cancer microarray clinical trials
   •  48% had publicly available data
   •  These received 85% of the article citations
   •  Independent of journal impact factor, publication date, author nationality
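The two percentages on this slide can be turned into a rough per-article comparison. The sketch below is only arithmetic on the slide's figures: it computes the raw ratio of mean citations per article between trials with and without public data. It is not the adjusted estimate from the paper, which controls for impact factor, publication date and author nationality, and the function name is hypothetical.

```python
# Figures from the slide (Piwowar et al. 2007); everything else is illustrative.
N_TRIALS = 85
SHARE_WITH_DATA = 0.48   # fraction of trials with publicly available data
CITATION_SHARE = 0.85    # fraction of all citations received by those trials


def raw_citation_ratio(share_articles: float, share_citations: float) -> float:
    """Ratio of mean citations per article, sharing group vs. non-sharing group.

    This is an unadjusted ratio derived from aggregate shares only.
    """
    mean_with = share_citations / share_articles
    mean_without = (1.0 - share_citations) / (1.0 - share_articles)
    return mean_with / mean_without


ratio = raw_citation_ratio(SHARE_WITH_DATA, CITATION_SHARE)
trials_with_data = round(N_TRIALS * SHARE_WITH_DATA)

print(f"Trials with public data (approx.): {trials_with_data}")
print(f"Raw citations-per-article ratio: {ratio:.1f}x")
```

Because a few highly cited trials can dominate a raw mean, this ratio should be read as an upper-level illustration of the slide's point rather than as an effect size.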

Page 28: Knowledge Exchange, Nov 2011, Bonn

Taxonomy of data archiving benefits

Modified from Beagrie et al. (2009) Keeping Research Data Safe 2

Direct
•  Verification of published research
•  Preserving accessibility to data
•  Allowing reuse and repurposing of data
•  Discoverability of data

Indirect (costs avoided)
•  Redundant data collection
•  Inefficient legacy data curation
•  Burden of sharing-upon-request
•  Opportunity cost of science not done

Near term
•  Protection against personnel turnover
•  Availability for review and validation

Long term
•  Secure long-term stewardship
•  Increased impact per publication

Private
•  Increased citations
•  New collaborations
•  New research opportunities
•  Fulfilling funding mandates

Public
•  More efficient use of research dollars
•  Public trust in science
•  Educational opportunities
•  Improved methodologies
•  More informed policy


Page 29: Knowledge Exchange, Nov 2011, Bonn

Galilei, Galileo (1638) Discorsi e dimostrazioni matematiche, intorno à due nuove scienze Attenenti alla Mecanica & I Movimenti Locali. Elsevier

Page 30: Knowledge Exchange, Nov 2011, Bonn

Funding

Page 31: Knowledge Exchange, Nov 2011, Bonn

Costs

•  Moderate economies of scale are required
   •  At 10K packages/yr, <$50/deposit, depending on curation
•  What are the costs for SOM?
   •  Journal of Clinical Investigation: $300 flat fee
   •  Ecological Archives: $250 <10Mb, more fees beyond that
   •  FASEB: $100 per file

Beagrie N, Eakin-Richards L, Vision TJ (2009) Business models and cost estimation: Dryad repository case study. iPRES 2010
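The cost figures above can be compared side by side with a small sketch. It uses only the numbers on the slide; the function name, the example deposit sizes, and the choice to return `None` where the slide gives no figure (Ecological Archives above 10 Mb) are assumptions made for illustration.

```python
from typing import Optional

# Figures from the slide; treated here as upper bounds / list prices.
DRYAD_COST_PER_DEPOSIT = 50.0   # "<$50/deposit" at scale
PACKAGES_PER_YEAR = 10_000      # "10K packages/yr"


def som_fee(venue: str, n_files: int, size_mb: float) -> Optional[float]:
    """Fee in USD for supplementary material at the venues named on the slide.

    Returns None where the slide does not state a figure.
    """
    if venue == "Journal of Clinical Investigation":
        return 300.0                      # flat fee
    if venue == "Ecological Archives":
        # The slide only says "more fees beyond that" above 10 Mb.
        return 250.0 if size_mb < 10 else None
    if venue == "FASEB":
        return 100.0 * n_files            # per file
    raise ValueError(f"no fee listed on the slide for {venue!r}")


# Implied ceiling on the annual operating budget at the stated scale.
annual_budget_ceiling = DRYAD_COST_PER_DEPOSIT * PACKAGES_PER_YEAR

# Hypothetical deposit: 3 files totalling 4 Mb.
for venue in ("Journal of Clinical Investigation", "Ecological Archives", "FASEB"):
    print(venue, som_fee(venue, n_files=3, size_mb=4.0))
print("Dryad annual budget ceiling:", annual_budget_ceiling)
```

For the hypothetical three-file deposit, every listed SOM fee is several times the per-deposit figure quoted for Dryad, which is the comparison the slide invites.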

Page 32: Knowledge Exchange, Nov 2011, Bonn

What is the return on investment?

•  A rigorous framework is lacking
   •  But we can look at comparators
•  Marginal cost of data archiving
   •  $50/article is <2% of publication costs (>$2.5K)
   •  And 0.2% of grant costs/article (~$25K)
•  Is the data worth 2% of the research investment?
   •  Using DNA microarray data in GEO as a model
   •  2,711 submissions in 2007
   •  Data reused by 3rd parties in >1,150 articles

Vision (2011) Open data and social contract of scientific publishing. BioScience, 60(5):330-330

Piwowar H, Vision TJ, Whitlock MC (2011) Data archiving is a good investment. Nature 473:285
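The percentages on this slide follow directly from the quoted dollar figures, and the GEO numbers give a rough reuse rate. The sketch below reproduces that arithmetic; the variable names are illustrative, and the publication cost and reuse count are lower bounds on the slide, so the derived values are bounds as well.

```python
# Figures from the slide.
ARCHIVING_COST = 50.0              # USD per article
PUBLICATION_COST = 2_500.0         # USD per article, stated as a lower bound
GRANT_COST_PER_ARTICLE = 25_000.0  # USD, approximate

GEO_SUBMISSIONS_2007 = 2_711
THIRD_PARTY_REUSE_ARTICLES = 1_150  # stated as a lower bound

# Archiving as a share of publication cost (at most 2%, since cost is >$2.5K).
share_of_publication = ARCHIVING_COST / PUBLICATION_COST

# Archiving as a share of the grant cost behind one article.
share_of_grant = ARCHIVING_COST / GRANT_COST_PER_ARTICLE

# Third-party reuse articles per GEO submission (at least this many).
reuse_per_submission = THIRD_PARTY_REUSE_ARTICLES / GEO_SUBMISSIONS_2007

print(f"Share of publication cost: {share_of_publication:.1%}")
print(f"Share of grant cost per article: {share_of_grant:.1%}")
print(f"Reuse articles per submission: {reuse_per_submission:.2f}")
```

On these figures, more than four third-party articles drew on the data for every ten datasets submitted in 2007, set against an archiving cost of a fraction of a percent of the underlying grant spending.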

Page 33: Knowledge Exchange, Nov 2011, Bonn

Training

Page 34: Knowledge Exchange, Nov 2011, Bonn

Building solutions

Page 35: Knowledge Exchange, Nov 2011, Bonn

DataONE network

Three major components for flexibility, scalability and sustainability

Member Nodes
•  diverse institutions
•  serve local community
•  provide resources for managing their data
•  retain copies of data

Coordinating Nodes
•  retain complete metadata catalog
•  indexing for search
•  network-wide services
•  ensure content availability (preservation)
•  replication services

Investigator Toolkit

Page 36: Knowledge Exchange, Nov 2011, Bonn

Concluding thoughts

•  Archiving is essential
•  Journals and learned societies will be at least as important as institutions
•  Funders cannot be shy about policy, and must drive the marketplace
•  We can leverage for data lots of things that work well for traditional publications
•  International cooperation is a must

Page 37: Knowledge Exchange, Nov 2011, Bonn

•  http://datadryad.org
•  http://blog.datadryad.org
•  http://datadryad.org/wiki
•  http://code.google.com/p/dryad
•  [email protected]
•  @datadryad
•  Dryad

Page 38: Knowledge Exchange, Nov 2011, Bonn

Images and sources

1. © Yael Fitzpatrick and AAAS, http://www.sciencemag.org/site/special/data/ScienceData-hi.pdf

2. Beethoven mit der Missa solemnis, by Joseph Stieler; photo CC BY-NC-SA 2.0 Taran Rampersad. Letter from Beethoven to Franz Anton Hoffmeister, © Beethoven-Haus Bonn

3. The Wikipedia Lesson of Dr Nicolaes Tulp, by Alasdair Forrest, http://alasdairforrest.posterous.com/the-curious-case-of-the-changing-citation

4. © National Evolutionary Synthesis Center (http://nescent.org)

5. © Publishing Research Consortium, source: http://publishingresearch.net

6. After Michener et al. (1997) Ecological Applications 7(1):330–342.

7. Title page of Philosophical Transactions of the Royal Society, Vol. 1, 1665, public domain; portrait of Henry Oldenburg, source: http://en.wikipedia.org/wiki/File:Henry_Oldenburg.jpg, public domain

8. Source: Bumpus HC (1898) The Elimination of the Unfit as Illustrated by the Introduced Sparrow, Passer domesticus. Biological Lectures from the Marine Biological Laboratory: 209-226, public domain.

9. CC BY-NC-ND 2.0 by lebatihem, source: http://www.flickr.com/photos/lebatihem/2154686107/

11. CC-BY Wicherts JM, Bakker M, Molenaar D, source: Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results. PLoS ONE 6(11): e26828, 2011. doi:10.1371/journal.pone.0026828

Page 39: Knowledge Exchange, Nov 2011, Bonn

12. CC-BY, Lang GI, Botstein D, source: A Test of the Coordinated Expression Hypothesis for the Origin and Maintenance of the GAL Cluster in Yeast. PLoS ONE 6(9): e25290, 2011. doi:10.1371/journal.pone.0025290

13. CC BY-SA 2.0 avlxyz, source: http://www.flickr.com/photos/avlxyz/4589977933/

16. Courtesy of Peggy Schaeffer

20. CC-BY Piwowar HA, Chapman WW, source: A review of journal policies for sharing research data, Nature Precedings, hdl:10101/npre.2008.1700.1

21. CC BY 2.0, sashafatcat, source: http://www.flickr.com/photos/sashafatcat/2381412445

23, 24. CC-BY H Piwowar, J Carlson, T Vision, unpublished

25. CC-BY H Piwowar, source: http://researchremix.wordpress.com/2011/05/28/dear-nsf-reviewers/

26. CC BY-ND 2.0 Sivaprakash Kannan, source: http://www.flickr.com/photos/sivaprakash/294755142/

28. After: Beagrie N, Lavoie B, Woollard M (2010) Keeping Research Data Safe 2, http://www.jisc.ac.uk/media/documents/publications/reports/2010/keepingresearchdatasafe2.pdf

29. Galilei, Galileo (1638) Discorsi e dimostrazioni matematiche, intorno à due nuove scienze Attenenti alla Mecanica & I Movimenti Locali. Elsevier. Source: original unknown.

30. CC BY-NC-SA 2.0 Coralie Mercer, source: http://www.flickr.com/photos/koalie/394934841/

33. CC BY-NC-ND 2.0 by www.english.school.nz, source: http://www.flickr.com/photos/iei/2904115612/

34. Liberty ship under construction, source: http://en.wikipedia.org/wiki/File:Liberty_ship_construction_04_bottom.jpg, public domain