Sharing Paleodata (Part 4): MorphoSource
Today I continue my series highlighting repositories for paleontological raw data. Previous posts in this series can be found here, here, and here.
MorphoSource is a data repository for 3D data – raw CT and microCT data, but also surface models, slice movies, and 2D images. It’s the brainchild of Doug Boyer, an Evolutionary Anthropology professor at Duke. While there are plenty of data repositories for 2D images (e.g., MorphoBank and MorphBank), there really isn’t an equivalent for raw 3D data. Yes, some sites like Dryad or Figshare can host CT datasets (as Andy and I did for our 2013 “Dinosaur Joe” paper), and the website DigiMorph hosts these data in a more limited format, but currently there’s no dedicated repository for raw CT data.
CT has been an increasingly common tool in paleontological research for the last two decades, but for most of that time, CT datasets were too large to host online. Early on, Tim Rowe and his UT Austin colleagues set up DigiMorph as a central online repository for hosting the data, albeit in smaller file formats. DigiMorph has long been the only site focused on paleo/morphology CT data; they currently host an impressive 1000+ datasets. I’m a big fan of DigiMorph and use it regularly in both teaching and research, but it has some limitations. First, nearly all of the DigiMorph data comes in video form (movies that cycle through CT slices or show spinning rendered volumes) or annotated 2D images. Only ~50 have associated downloadable STL files (surface meshes/models), and none of the raw data (projections or slices) are downloadable. There’s also a backlog of specimens that have yet to be uploaded, because they only have so many techs and they all have many other responsibilities. MorphoSource answers these concerns by allowing users to upload their own raw data and STL files, in addition to movies and 2D images.
Full Disclosure: The statements about MorphoSource in the “Nitty Gritty” section were checked for accuracy by Doug Boyer (site director). The impressions are my own. I’ve used the site for a couple months now, and have submitted, shared, viewed, and downloaded data.
Impressions: Two weeks ago, University of Texas sent me some fossils. I opened the box on Wednesday night, photographed and measured my specimens on Thursday, and on Friday morning at 7:30AM, I scanned all 8 bones. By noon, I had made some surface meshes in Avizo, created a MorphoSource project, uploaded all the DICOM and STL files (the raw data and my interpretations of it), and shared the whole shebang with the collections manager. In two days I went from a sealed box to having returned all the data to the lending institution. Yes, it’s highly unlikely that anyone will need to see this particular right femur of a juvenile false gharial in the few months I have these guys on loan, but if they do, Texas already has the data and a surface model they can share with people who need it. So, yeah, I really like MorphoSource and find it beneficial even prior to publishing anything publicly on the site. It’s now my go-to site for 3D data management.
Part of its appeal is that it sticks to functionality and doesn’t get bogged down in unnecessary features – no social media stuff; it’s all business and functionality. In several months of use, I’ve never had problems with the site; no crashes, stalled downloads, or errors. I was able to figure out the site without needing to consult the User Guide, although it took me a couple hours to become comfortable navigating and uploading. If I had read the directions first, I probably wouldn’t have found some aspects clunky (see below), because I would have anticipated what information I needed to have on hand before uploading.
To begin, you create a project, and then add taxa and specimens before uploading media files. It auto-searches to see if someone has previously entered a taxon or specimen number, which is pretty handy. You determine who can view/download the media before the project is published by adding collaborators. Collaborators can see all specimens within a project, but you can also choose to share a given specimen with people outside your collaboration group (if you didn’t want to give them access to every specimen). Outside of collaborators, nobody can see your data until you publish it. Upon publication, you can decide whether everyone can download it without contacting you, or if they need to email you for permission before access is granted. You can also elect to be notified passively when people download your data. After publication, people can view your surface meshes in the browser, but if they want to download data, they need to register to do so.
So far, I’ve included collections managers as collaborators on projects that involve their museum’s specimens. One nice feature: if I want to, I can give them sole access to the project/media/specimen (i.e., transfer the project to them) after publication. I like this idea a lot; most museums ask for the raw CT data upon publication anyway, and this is a good way to both give them the data and allow them to control and track its use down the road.
In addition to sharing the TIFF or DICOM stack, you can also share movies and surface meshes, duplicating a major and minor feature of DigiMorph, respectively. An especially helpful tool is the ability to view and rotate surfaces in-browser, without having to download the file or requiring sophisticated 3D software. This has already proved useful in sharing data with my colleagues; they can view and manipulate the specimen virtually while we chat about it over the phone. As a tool for real-time collaborative research, this is pretty great. It also has a lot of promise as an outreach and education tool – for example, the Florida Museum of Natural History has uploaded surface meshes for two K-12 educational projects on Horse Evolution and the megasnake Titanoboa. I can see MorphoSource as a great supplement for in-class activities or virtual lab assignments; for example, using this incredible dataset of primate skulls in the Harvard MCZ collection.
If you want “official” credit for these open-data efforts, you can request a DOI for projects or media items. The first time, you’ll need approval as a DOI user and then it’s a simple request each time you need a DOI after that. They’ve also included some pretty cool features that make downloading files and managing metadata easier. If a large dataset is available for immediate download, you can choose specific specimens to download using a shopping cart button. Once you get to the “check out” page, you can opt to download only part of the data, say, only the meshes or tiff stacks. You can also download all the metadata for every specimen in your cart as a CSV – all the specimen data and scan parameters in one file, so you don’t have to hunt for the information later.
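As a rough illustration of what you might do with that metadata CSV once it's on your machine, here is a minimal Python sketch that pulls out only the rows you care about. The column names (specimen, element, voxel_size_mm) and the sample rows are made up for the example; they are not MorphoSource's actual export headers, so swap in whatever the real file uses.

```python
import csv
import io


def select_rows(csv_file, column, wanted):
    """Return rows (as dicts) whose value in `column` is one of `wanted`."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if row.get(column) in wanted]


# Hypothetical stand-in for a downloaded metadata file.
# With a real download you would pass open("metadata.csv", newline="") instead.
sample = io.StringIO(
    "specimen,element,voxel_size_mm\n"
    "SPEC-001,femur,0.05\n"
    "SPEC-002,astragalus,0.02\n"
)

rows = select_rows(sample, "element", {"femur"})
for row in rows:
    print(row["specimen"], row["voxel_size_mm"])
```

Because every specimen and its scan parameters sit in one file, a filter like this is usually enough to build a table of accession numbers for a manuscript without retyping anything.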
Another strong positive is Doug Boyer himself. I’ve been testing the site for a couple months now, and have sent him several suggestions by email. Each time, he took the time to read and respond, indicating which were already planned, which weren’t practical, and which were good ideas. Some of these have already been incorporated. He’s really invested in the functionality of the site, and wants to ensure that every feature directly relates to the primary goals of storing and sharing data.
There aren’t many negatives. I find searching a little clunky (they’re working on improving this), but browsing is ok once you get used to using in-window forward and back arrows to navigate. Initially I thought it was a little annoying to have to enter and save taxonomy, specimen, and scan information before uploading can begin, but really, it’s no more cumbersome than the same process on MorphoBank. One advantage over MorphoBank is that basic museum, specimen, and taxonomic information are accessible to the whole site once one person enters it – nobody will ever have to go through the trouble of entering the higher taxonomy of Tomistoma schlegelii again. Currently, each specimen requires entering a genus and species, which for my critters is not always possible (my best workaround is entering g: Phytosauria, sp: indet, or g: Incertae, sp: sedis). Future improvements will include the ability to search using higher taxonomy, which will get around this problem to some extent.
Bottom Line: MorphoSource is functional and fairly easy to use. Currently, it’s probably the best option for sharing raw CT data. Grant writers should feel comfortable listing MorphoSource in their Data Management Plan. Reviewers should feel comfortable asking authors to post data to MorphoSource.
MORPHOSOURCE: THE NITTY GRITTY
What it is: 3D data repository for CT and microCT data
What it is, in their words: “a project-based data archive that allows researchers to store and organize, share, and distribute their own 3D data”; also “an image-sharing resource designed to allow users to upload, search, and browse high quality 3D images”
Who runs it: Doug Boyer, Duke University
Who funds it: Duke University (directly and by hosting the site), NSF (BCS 1317525; BCS 1304045)
Who uses it: Researchers who use, download, or submit 3D data.
Cost to submit: Free (registration required).
Cost to access: Free (registration required to download).
Data and file types supported: 2D images: JPEG, PNG, TIFF, DICOM and PSD. Video: MPEG-2, MPEG-4/H.264, FLV, QuickTime, and Windows Media. Surfaces: PLY and STL. You can also submit zipped archives of TIFF and DICOM files. I included Avizo files within my zipped archive with no uploading issues.
File sizes allowed: No limit
Copyright status: You choose whether or not the data is copyrighted and who owns the copyright. You select among Creative Commons licenses (CC0 and CC/BY/NC/SA/ND) or release the data for one-time use on MorphoSource. No option (yet) to upload a copyright permissions document, but you can copy text from those documents into the notes. Nifty feature: you can opt to receive a notification email when your data are downloaded, or require users to contact you for permission before data are downloaded.
Data available during peer review? Kinda, in that you can make the data live at submission rather than upon publication of the paper.
Allowed to post data from previous pubs? Yes, even publications published before MorphoSource existed, and they invite and encourage this practice.
Accession numbers provided? Every project, specimen, and media item gets its own accession number (similar to GenBank) as you upload it. You can also request a DOI.
Data goes live when: You choose to publish the project. Data can stay as an unpublished project forever, or you can publish it when you like – before or after the manuscript is accepted, when embargo is lifted, when the paper is published, etc.
Data is backed up? There are 4 copies (RAID1 drives in 2 separate physical locations). Currently applying for funding to expand physical storage, add additional cloud storage, and expand the archival system.
Stats provided? Number of views and downloads for each specimen and media item. You can also see which registered users have viewed/downloaded your files.
How to cite your data in your manuscript? Cite the MorphoSource software. Right now there are two extended abstracts (Boyer et al. 2014; Kaufman & Boyer 2014), but there’s also a formal paper in the works:
Boyer DM, Kaufman S, Gunnell GF, Gomes E, and Thostenson J. 2014. MorphoSource: A currently active project-based 3D digital web-accessible data archive for museums and individuals at Duke University USA. In: Mallison H, Vogel J, and Belvedere M, editors. Digital Specimen 2014 – Abstracts of Presentations. pp 8-12.
Kaufman S, and Boyer DM. 2014. Developing Platforms for Management and Distribution of Digital Specimen Data. In: Mallison H, Vogel J, and Belvedere M, editors. Digital Specimen 2014 – Abstracts of Presentations. p 47.
You should also cite your data, something like: “TIFF stacks and STL files are available on MorphoSource (http://www.morphosource.org): Project XXX (DOI: ). Media accession numbers are listed in Table X.”
How to cite data you download from someone else’s project? DOI, if they have one. Otherwise cite the site, project and media numbers.
Can update after publication? Yes. In their words: “You always have the option of removing materials from public view, and adding new media to longstanding projects.” *This may not be the case for DOI-assigned media.
Benefits in a nutshell:
- Secure cloud storage of 3D surfaces and raw microCT and CT scan data
- Accessible to your collaborators before publication
- View and rotate 3D surface meshes in your browser
- Add new citations to a project as you/others re-use the data
- Notification when people view/download your dataset
- You can require people to ask permission before downloading data
Three recent paleo papers using it:
Evans, SE, JR Groenke, MEH Jones, AH Turner, DW Krause. 2014. New material of Beelzebufo, a hyperossified frog (Amphibia: Anura) from the Late Cretaceous of Madagascar. PLoS ONE 9(1): e87236. MorphoSource data example: media S1312, a presacral vertebra. YOU CAN DOWNLOAD YOUR OWN BEELZEBUFO, PEOPLE!
Chester, SGB, JI Bloch, DM Boyer, WA Clemens. 2015. Oldest known euarchontan tarsals and affinities of Paleocene Purgatorius to Primates. PNAS 112(5): 1487-1492. MorphoSource data example: media S1405, an astragalus.
Boyer, DM and ER Seiffert. 2013. Patterns of astragalar fibular facet orientation and their evolutionary implications in extant and fossil primates. American Journal of Physical Anthropology 151(3): 420-447. MorphoSource data example: media S1083, a different astragalus.
Strictly speaking, DICOMs are NOT raw data. They are what you typically get handed by the scanner operating staff, but the slices are calculated from the real raw scan data. It would be nice to be able to archive the full data, as in the future we may be able to calculate better slice data from the original scans using better algorithms than what we have available right now. However, the raw data typically is much larger, and often comes in formats that are hard to read unless you buy expensive software.
I think you can include that in the zipped archives, if you want. I also included my Avizo files in one of the datasets I uploaded.
Not that I would want to save every monster dataset, but especially in cases where the original specimen is likely to be lost (due to e.g. destructive sampling or chemical decay over time) it is important to preserve as much data as possible.