Keynote presented at the Knowledge Exchange (KE) workshop held in conjunction with the release of the report "A Surfboard for Riding the Wave: Towards a four country action programme on research data": http://www.knowledge-exchange.info/Default.aspx?ID=469
How many solutions does it take to change the face of research data?
Todd Vision Dryad Digital Repository
University of North Carolina at Chapel Hill
KE Workshop 14-15 November 2011 Bonn, Germany
“Es sollte nur ein Magazin der Kunst in der Welt sein wo der Künstler seine Kunstwerke nur hinzugeben hätte um zu nehmen was er brauchte”
“There ought to be in the world a repository of art, to which the artist need only bring his artworks in order to take what he needed”
Beethoven, letter to publisher F.A. Hoffmeister, 15 January 1801
Open dissection of research: the Beethoven Repository
http://rockethub.com/projects/3755-open-dissection-of-research
Source: Publishing Research Consortium, http://publishingresearch.net
n=3824
[Figure: Information Content (vertical axis) versus Time (horizontal axis). Labels along the curve: Time of publication; Specific details; General details; Accident; Retirement or career change; Death. (Michener et al. 1997)]
Henry Oldenburg
Bumpus HC (1898) The Elimination of the Unfit as Illustrated by the Introduced Sparrow, Passer domesticus. Biological Lectures from the Marine Biological Laboratory: 209-226.
Transparency
Failure of peer-to-peer data sharing
Wicherts and colleagues requested data from 141 articles in American Psychological Association journals.
“6 months later, after … 400 emails, [sending] detailed descriptions of our study aims, approvals of our ethical committee, signed assurances not to share data with others, and even our full resumes…” only 27% of authors complied
Wicherts, J.M., Borsboom, D., Kats, J., & Molenaar, D. (2006). The poor availability of psychological research data for reanalysis. American Psychologist, 61, 726-728.
News alert: scientists are human
“We related the reluctance to share research data for reanalysis to 1148 statistically significant results reported in 49 papers published in two major psychology journals. We found the reluctance to share data to be associated with weaker evidence (against the null hypothesis of no effect) and a higher prevalence of apparent errors in the reporting of statistical results. The unwillingness to share data was particularly clear when reporting errors had a bearing on statistical significance”.
Wicherts et al. (2011) doi:10.1371/journal.pone.0026828
[Figure labels: Not shared; Shared]
Lang GI, Botstein D (2011) PLoS ONE doi:10.1371/journal.pone.0025290
101 pages!
Joint Data Archiving Policy (JDAP)
Data are important products of the scientific enterprise, and they should be preserved and usable for decades in the future.
As a condition for publication, data supporting the results in the article should be deposited in an appropriate public archive.
Authors may elect to embargo access to the data for a period up to a year after publication.
Exceptions may be granted at the discretion of the editor, especially for sensitive information.
Whitlock, M. C., M. A. McPeek, M. D. Rausher, L. Rieseberg, and A. J. Moore. 2010. Data Archiving. American Naturalist. 175(2):145-146.
Infrastructure
[Diagram: manuscript and data submission workflow linking JOURNAL and DRYAD.
Roles: author; editor; data curator.
Steps and decision points: prepare manuscript and related data files; submit manuscript; manuscript review; accepted? (yes / no); send article description; upload data; Dryad data package; curation; accepted?; send data identifier (DOI).
Outputs: published article (with data citation); published data (with article citation).]
See poster from Brian Hole
Heather Piwowar
Survey of authors
What are the policies of your funder as they apply to online public archiving? (n=983)
1% Forbids
21% Recommends
9% Requires
40% No policy
26% I don’t know
3% Other
Piwowar HA, Chapman WW (2008) A review of journal policies for sharing research data. Presented at ELPUB2008, Nature Precedings hdl:10101/npre.2008.1700.1
Data policies among bioscience journals
[Figure: n=70 journals; group labels IF=3.6, IF=4.5, IF=6.0]
Reuse
Tracking data reuse
Piwowar, Carlson, Vision, unpublished
H. Piwowar, J. Carlson, T. Vision, unpubl.
H. Piwowar, unpubl.
Incentives
Does sharing imply that it need be altruistic?
Piwowar H, et al. (2007) Sharing Detailed Research Data Is Associated with Increased Citation Rate. PLoS ONE 2(3): e308.
• For a set of 85 cancer microarray clinical trials, 48% had publicly available data
• These received 85% of the article citations
• Independent of journal impact factor, publication date, author nationality
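The two headline percentages above imply a large raw gap in citations per trial. The short Python sketch below works through that arithmetic using only the figures on the slide; the function name is illustrative. It gives an unadjusted ratio of averages, not the effect the paper estimates after controlling for journal impact factor, publication date and author nationality.

```python
def raw_citation_ratio(share_of_trials: float, share_of_citations: float) -> float:
    """Ratio of mean citations per trial: trials with public data vs. trials without.

    Both arguments are fractions of the whole set (0 < value < 1).
    """
    per_trial_shared = share_of_citations / share_of_trials
    per_trial_unshared = (1 - share_of_citations) / (1 - share_of_trials)
    return per_trial_shared / per_trial_unshared


# Slide figures: 48% of trials shared data and drew 85% of the citations.
ratio = raw_citation_ratio(share_of_trials=0.48, share_of_citations=0.85)
print(f"{ratio:.1f}x")  # prints "6.1x"
```

On these numbers, a trial with public data drew on average about six times as many citations as one without, before any adjustment for confounders.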
Taxonomy of data archiving benefits
Modified from Beagrie et al. (2009) Keeping Research Data Safe 2
Direct: Verification of published research; Preserving accessibility to data; Allowing reuse and repurposing of data; Discoverability of data
Indirect (costs avoided): Redundant data collection; Inefficient legacy data curation; Burden of sharing-upon-request; Opportunity cost of science not done
Near term: Protection against personnel turnover; Availability for review and validation
Long term: Secure long-term stewardship; Increased impact per publication
Private: Increased citations; New collaborations; New research opportunities; Fulfilling funding mandates
Public: More efficient use of research dollars; Public trust in science; Educational opportunities; Improved methodologies; More informed policy
Galilei, Galileo (1638) Discorsi e dimostrazioni matematiche, intorno à due nuove scienze Attenenti alla Mecanica & I Movimenti Locali. Elsevier
Funding
Costs
• Moderate economies of scale are required
  At 10K packages/yr, <$50/deposit, depending on curation
• What are the costs for SOM?
  Journal of Clinical Investigation: $300 flat fee
  Ecological Archives: $250 for <10 MB, more fees beyond that
  FASEB: $100 per file
Beagrie N, Eakin-Richards L, Vision TJ (2010) Business models and cost estimation: Dryad repository case study. iPRES 2010
What is the return on investment?
• A rigorous framework is lacking
  But we can look at comparators
• Marginal cost of data archiving
  $50/article is <2% of publication costs (>$2.5K)
  And 0.2% of grant costs/article (~$25K)
• Is the data worth 2% of the research investment?
  Using DNA microarray data in GEO as a model
  2,711 submissions in 2007
  Data reused by 3rd parties in >1,150 articles
Vision TJ (2010) Open data and the social contract of scientific publishing. BioScience 60(5):330-331
Piwowar H, Vision TJ, Whitlock MC (2011) Data archiving is a good investment. Nature 473:285
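The percentages on this slide follow directly from the quoted dollar figures. The Python sketch below reproduces them as a back-of-envelope check; the constant and function names are illustrative, the publication and grant costs are the slide's rounded bounds rather than measured values, and the package volume is the scale quoted on the Costs slide.

```python
# Rounded figures quoted on the slides (USD); names are illustrative.
ARCHIVING_COST = 50          # marginal archiving cost per article
PUBLICATION_COST = 2_500     # publication cost per article (slide gives this as a lower bound)
GRANT_COST = 25_000          # approximate grant funding per article
PACKAGES_PER_YEAR = 10_000   # deposit volume quoted on the Costs slide

GEO_SUBMISSIONS_2007 = 2_711
GEO_REUSE_ARTICLES = 1_150   # slide gives this as a lower bound


def share_of(part: float, whole: float) -> float:
    """Return part as a fraction of whole."""
    return part / whole


print(f"{share_of(ARCHIVING_COST, PUBLICATION_COST):.1%}")  # prints "2.0%"
print(f"{share_of(ARCHIVING_COST, GRANT_COST):.1%}")        # prints "0.2%"

# Upper bound on yearly spend at <$50 per deposit.
print(f"${ARCHIVING_COST * PACKAGES_PER_YEAR:,}")           # prints "$500,000"

# Reuse articles per GEO submission, using the slide's lower bound.
print(f"{share_of(GEO_REUSE_ARTICLES, GEO_SUBMISSIONS_2007):.2f}")  # prints "0.42"
```

Because publication costs are stated as more than $2.5K, the 2.0% result is a ceiling, which matches the slide's "<2%". The last line suggests that, in the GEO comparator, third-party reuse produced at least roughly four articles for every ten submissions.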
Training
Building solutions
DataONE network
Member Nodes
• diverse institutions
• serve local community
• provide resources for managing their data
• retain copies of data
Coordinating Nodes
• retain complete metadata catalog
• indexing for search
• network-wide services
• ensure content availability (preservation)
• replication services
Three major components for flexibility, scalability and sustainability
Investigator Toolkit
Concluding thoughts
• Archiving is essential
• Journals and learned societies will be at least as important as institutions
• Funders cannot be shy about policy, and must drive the marketplace
• We can leverage for data lots of things that work well for traditional publications
• International cooperation is a must
• http://datadryad.org
• http://blog.datadryad.org
• http://datadryad.org/wiki
• http://code.google.com/p/dryad
• [email protected]
• @datadryad
• Dryad
Images and sources
1. © Yael Fitzpatrick and AAAS, http://www.sciencemag.org/site/special/data/ScienceData-hi.pdf
2. Beethoven mit der Missa solemnis, by Joseph Stieler; photo CC BY-NC-SA 2.0 Taran Rampersad. Letter from Beethoven to Franz Anton Hoffmeister, © Beethoven-Haus Bonn
3. The Wikipedia Lesson of Dr Nicolaes Tulp, by Alasdair Forrest, http://alasdairforrest.posterous.com/the-curious-case-of-the-changing-citation
4. © National Evolutionary Synthesis Center (http://nescent.org)
5. © Publishing Research Consortium, source: http://publishingresearch.net
6. After Michener et al. (1997) Ecological Applications 7(1):330–342.
7. Title page of Philosophical Transactions of the Royal Society, Vol. 1, 1665, public domain; portrait of Henry Oldenburg, source: http://en.wikipedia.org/wiki/File:Henry_Oldenburg.jpg, public domain
8. source: Bumpus HC (1898) The Elimination of the Unfit as Illustrated by the Introduced Sparrow, Passer domesticus. Biological Lectures from the Marine Biological Laboratory: 209-226, public domain.
9. CC BY-NC-ND 2.0 by lebatihem, source: http://www.flickr.com/photos/lebatihem/2154686107/
11. CC-BY Wicherts JM, Bakker M, Molenaar D, source: Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results. PLoS ONE 6(11): e26828, 2011. doi:10.1371/journal.pone.0026828
12. CC-BY Lang GI, Botstein D, source: A Test of the Coordinated Expression Hypothesis for the Origin and Maintenance of the GAL Cluster in Yeast. PLoS ONE 6(9): e25290, 2011. doi:10.1371/journal.pone.0025290
13. CC BY-SA 2.0 avlxyz, source: http://www.flickr.com/photos/avlxyz/4589977933/
16. courtesy of Peggy Schaeffer
20. CC-BY Piwowar HA, Chapman WW, source: A review of journal policies for sharing research data, Nature Precedings, hdl:10101/npre.2008.1700.1
21. CC BY 2.0 sashafatcat, source: http://www.flickr.com/photos/sashafatcat/2381412445
23, 24. CC-BY H Piwowar, J Carlson, T Vision, unpublished
25. CC-BY H Piwowar, source: http://researchremix.wordpress.com/2011/05/28/dear-nsf-reviewers/
26. CC BY-ND 2.0 Sivaprakash Kannan, source: http://www.flickr.com/photos/sivaprakash/294755142/
28. After: Beagrie N, Lavoie B, Woollard M (2010) Keeping Research Data Safe 2, http://www.jisc.ac.uk/media/documents/publications/reports/2010/keepingresearchdatasafe2.pdf
29. Galilei, Galileo (1638) Discorsi e dimostrazioni matematiche, intorno à due nuove scienze Attenenti alla Mecanica & I Movimenti Locali. Elsevier. Source: original unknown.
30. CC BY-NC-SA 2.0 Coralie Mercer, source: http://www.flickr.com/photos/koalie/394934841/
33. CC BY-NC-ND 2.0 by www.english.school.nz, source: http://www.flickr.com/photos/iei/2904115612/
34. Liberty ship under construction, source: http://en.wikipedia.org/wiki/File:Liberty_ship_construction_04_bottom.jpg, public domain