What is the work plan for the SGP?
The SGP was established in 2015 and is working towards its first goal: the analysis of shale geochemical data from the Neoproterozoic through the Paleozoic. Our goal for this project was to assemble or generate multi-proxy sedimentary geochemical data (iron, carbon, sulfur, major and trace metal abundance, and trace metal isotopes) from multiple regions worldwide for every Paleozoic Epoch and roughly equivalent 25 Ma Neoproterozoic time slice. In addition to data compilation, this has involved a major effort by SGP members to generate new geochemical data from ‘background’ intervals in the Paleozoic. As might be expected, most geochemical data to date have come from intervals of biotic interest such as mass extinctions, so to fully understand long-term paleoenvironmental trends we are focusing efforts on under-sampled intervals. We reached our first data freeze in January 2019 and are currently working towards analyzing these data. Data analyses will be ongoing throughout the project, and results will be shared as they develop, first internally with SGP members and later with the broader geological community in conference presentations. Following the first set of group papers focused on the Neoproterozoic-Paleozoic, our goal is to expand to younger and older time intervals and to the carbonate geochemical record.
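As a rough illustration of what identifying under-sampled intervals can look like in practice, the sketch below bins sample ages into fixed 25 Ma slices and counts how many regions are represented in each. This is not SGP code: the function names, the bin edges (multiples of 25 Ma), the "fewer than two regions" cutoff, and the sample list are all assumptions invented for the example, and it covers only the fixed-width slices, not the Paleozoic Epoch bins.

```python
from collections import defaultdict

SLICE_WIDTH = 25  # slice width in Ma, from the text; the bin edges are an assumption


def slice_start(age_ma: float) -> int:
    """Return the younger edge (in Ma) of the 25 Ma slice containing age_ma."""
    return int(age_ma // SLICE_WIDTH) * SLICE_WIDTH


def regions_per_slice(samples):
    """Map each slice's younger edge to the set of regions sampled within it."""
    coverage = defaultdict(set)
    for age_ma, region in samples:
        coverage[slice_start(age_ma)].add(region)
    return dict(coverage)


# Hypothetical (age in Ma, region) pairs, invented for illustration only.
samples = [
    (635.0, "Region A"),
    (640.0, "Region B"),
    (720.0, "Region A"),
    (560.0, "Region C"),
]

coverage = regions_per_slice(samples)

# Flag slices represented by fewer than two regions (an arbitrary cutoff).
under_sampled = sorted(s for s, regions in coverage.items() if len(regions) < 2)
print(under_sampled)
```

A coverage table of this kind is one simple way to see which time slices would benefit most from new ‘background’ sampling.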
What are the Working Groups studying?
Working Groups are proposed by SGP Members. There are ten Working Groups analyzing Phase 1 data:
- Sedimentary provenance, basalt weathering, and trace metal enrichments in the Neoproterozoic–Early Paleozoic (Justin Strauss, Dartmouth College; Tianran Zhang, Dartmouth College; Brenhin Keller, Dartmouth College; Akshay Mehra, Princeton University; Nick Tosca, Oxford University)
- Using δ34S in sedimentary pyrite to track key changes in the Paleozoic marine system (David Johnston, Harvard University; Jordon Hemingway, Harvard University; Ben Gill, Virginia Tech)
- Exploring the barium cycle in deep time (Peter Crockford, Weizmann Institute and Princeton University; Itay Halevy, Weizmann Institute; Tristan Horner, WHOI)
- Using multiple trace elements to probe ancient euxinic conditions (Minming Cui, Johns Hopkins University; Maya Gomes, Johns Hopkins University)
- A statistical survey of variable relationships across the database — quantitatively testing and identifying geochemical relationships on a 'medium data' scale (Devon Cole, Georgia Tech; Erin Saupe, Oxford University)
- Tracking the ancient global redox landscape with redox-sensitive metals in anoxic shale (Erik Sperling, Stanford University; Noah Planavsky, Yale University)
- Reconstructing late Paleozoic weathering changes using novel geochemical modeling (Alex Lipp, Imperial College, London; Oliver Shorttle, University of Cambridge)
- Tracking the dynamics of early Paleozoic marine redox using iron speciation records (Matt LeRoy, Virginia Tech; Ben Gill, Virginia Tech)
- A statistical comparison of redox-sensitive geochemical signatures in outcrop and drill core samples (Noah Planavsky, Yale University; Erik Sperling, Stanford University)
- Organic carbon deposition in space and time (Erik Sperling, Stanford University)
- SGP Phase 1 data product (a paper to be submitted describing the SGP database structure, website, API, and details of the Phase 1 data) (Una Farrell, Trinity College Dublin)
Is there more information on the SGP database itself?
SGP uses a PostgreSQL relational database. It is based around a modified version of the British Geological Survey Geochemistry Data Model (http://www.bgs.ac.uk/services/dataModels/geochemistry.html), with additional tables added for geological and geographical context data. While the database is tailored to our particular lab and research needs, it is also designed with the larger community in mind: we use common vocabularies and standard community-approved terms whenever possible, and the database structure shares features (and identifiers) with existing databases such as EarthChem, Macrostrat and the Paleobiology Database. This will facilitate the transfer of data to larger public community databases in the future (see What is the future of these data?) and help incorporate existing datasets into our analyses. For further information on the database design, contact Una Farrell (firstname.lastname@example.org).
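For readers unfamiliar with relational designs, the sketch below shows the general idea of linking analytical results to samples and samples to geological and geographical context. It is not the SGP schema or the BGS data model: every table and column name is hypothetical, the inserted values are invented, and SQLite (via Python's standard library) stands in for PostgreSQL only so that the example runs without a server.

```python
import sqlite3

# In-memory SQLite database: a stand-in for PostgreSQL, for illustration only.
conn = sqlite3.connect(":memory:")

# Hypothetical tables: context (site) -> sample -> analytical result.
conn.executescript("""
CREATE TABLE site (
    site_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    country TEXT
);
CREATE TABLE sample (
    sample_id INTEGER PRIMARY KEY,
    site_id   INTEGER NOT NULL REFERENCES site(site_id),
    lithology TEXT,
    age_ma    REAL
);
CREATE TABLE analyte_result (
    result_id INTEGER PRIMARY KEY,
    sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
    analyte   TEXT NOT NULL,
    value     REAL NOT NULL,
    unit      TEXT NOT NULL
);
""")

# Invented example rows.
conn.execute("INSERT INTO site VALUES (1, 'Hypothetical section', 'Nowhere')")
conn.execute("INSERT INTO sample VALUES (1, 1, 'shale', 500.0)")
conn.executemany(
    "INSERT INTO analyte_result (sample_id, analyte, value, unit) VALUES (?, ?, ?, ?)",
    [(1, "Mo", 20.0, "ppm"), (1, "TOC", 4.0, "wt%")],
)

# Join each result back to its sample and site context.
rows = conn.execute("""
    SELECT site.name, sample.lithology, r.analyte, r.value, r.unit
    FROM analyte_result AS r
    JOIN sample ON sample.sample_id = r.sample_id
    JOIN site   ON site.site_id = sample.site_id
    ORDER BY r.analyte
""").fetchall()

for row in rows:
    print(row)
```

Storing one result per row, rather than one column per analyte, is a common choice because new analytes can be added without changing the table structure.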
Aren’t there other geochemical databases?
Yes. As with many other fields across science, in recent years there has been a welcome trend in geochemistry towards increased data storage and accessibility. There are now several databases dedicated to geochemical data. Members of the SGP are actively engaged in the discussion of how geochemical data will be catalogued in the long term, and we strongly encourage all researchers to accession their data in databases such as EarthChem (http://www.earthchem.org), the Geobiodiversity Database (http://www.geobiodiversity.com), etc.
While the field is working towards long-term data access solutions, one issue is that busy researchers are reluctant to spend time accessioning data on these sites. This is true for new studies, but particularly true for legacy studies; why spend time searching through field notebooks and old Excel files to get data onto a server on the off chance that someone else may use it someday? Considering this issue, and after discussions with leaders of research consortia in human statistical genetics such as the Psychiatric Genomics Consortium (PGC; https://www.med.unc.edu/pgc), we believe the best way to address our research questions is through a similar consortium framework.
Although research consortia and full community databases (GenBank, EarthChem, etc.) both aggregate data, the approaches and goals are different. Large community databases seek to store and make accessible essentially all data. The data resulting from research consortia are ultimately integrated with these community databases (see What is the future of these data?) but the fundamental purpose is to address specific research questions. The opportunity to collaborate on exciting questions within the Working Groups, of direct interest to the researchers, provides an obvious incentive to contribute data and metadata to the database.
The incentive to contribute within the group structure extends not only to new studies and legacy published data, but also to unpublished ‘orphan’ data. Many researchers have accumulated high-quality results that did not fit with publication plans for various reasons, but which are still very useful for larger-scale compilation studies. The SGP database contains thousands of unpublished data points provided by consortium members, which often complete the geochemical characterization of samples, allowing us to use those samples to address questions that would be impossible to address with the published data alone. Consequently, we see our project as a complementary but parallel strand to larger community database efforts.
What is the future of these data?
As discussed under Aren’t there other geochemical databases?, the data to be collected and analyzed here are for the purpose of specific research questions in Earth history, and ultimately the goal will be to migrate these data to permanent data repositories. Given that researchers are unlikely to accession metadata, legacy data, or unpublished data without the incentive of an interesting collaborative research opportunity, the end result of this project will be the accession of considerable unpublished data and metadata that would not otherwise be available. The use of common terms and dictionaries between our SGP database and other databases will facilitate this process.
Above and beyond accessibility in global databases, the datasets generated during this study will be of considerable interest to the Earth history community. Because our database contains unique geological context information not currently stored by other data repositories, we have built a web interface that will allow the community to search the entire dataset openly. This website is currently password-protected and only accessible to Working Group analysts. This website and the entire data product will be made available in 2020 (this was originally slated for spring 2020 but has been pushed back slightly due to coronavirus-related implementation issues).
Is this ‘big data’?
Not really. The term ‘big data’ has become a buzzword in science, business and modern life, and generally refers to datasets that are too large for traditional data management applications. The SGP is working to assemble an unprecedented amount of information for the scientific questions at hand, but in comparison to, for instance, an analysis of global Facebook interactions, it is relatively small potatoes. Consequently, we think of SGP research as ‘medium data’: it represents a considerable jump in scale from previous studies in this field but still relies on a straightforward (but custom) relational database, with on the order of millions of analytical results (see Is there more information on the SGP database itself?).
Furthermore, our work focuses not only on gathering a relatively large dataset, but also on building a well-vetted, high-quality and complete dataset. The direct involvement of SGP researchers who conducted the original projects gives us a level of insight into samples and their geological and geographical context not possible in previous studies. SGP researchers help code information that was not included in their published papers, and conduct and/or provide unpublished geochemical analyses. A major goal has been to complete geochemical measurements on sample sets with nearly complete data (for instance, trace metal measurements for samples with iron/carbon/sulfur data, or TOC measurements for samples that have been previously studied for trace metal content). This is critical: to accurately interpret proxy data, such as Mo/TOC ratios in euxinic shale, a full suite of trace metal, iron speciation and total organic carbon data is needed. Ultimately it is this complete geochemical and contextual data matrix that will allow for our most insightful analyses.
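To make the point about completeness concrete, here is a minimal sketch of why gaps matter: a Mo/TOC ratio can only be computed, and placed in its redox context, for samples that have every required measurement. This is not the SGP analysis pipeline. The field names, the choice of FeHR/FeT and FePy/FeHR as the iron speciation fields, and all values are assumptions made for illustration.

```python
# Hypothetical field names; a real dataset may label these differently.
REQUIRED = ("Mo_ppm", "TOC_wt_pct", "FeHR_FeT", "FePy_FeHR")


def is_complete(sample: dict) -> bool:
    """True only if every required measurement is present (not None)."""
    return all(sample.get(key) is not None for key in REQUIRED)


def mo_toc(sample: dict) -> float:
    """Mo/TOC in ppm per wt%; assumes TOC is present and non-zero."""
    return sample["Mo_ppm"] / sample["TOC_wt_pct"]


# Invented sample records, for illustration only.
samples = [
    {"id": "S1", "Mo_ppm": 20.0, "TOC_wt_pct": 4.0, "FeHR_FeT": 0.6, "FePy_FeHR": 0.9},
    {"id": "S2", "Mo_ppm": 15.0, "TOC_wt_pct": None, "FeHR_FeT": 0.5, "FePy_FeHR": 0.8},
    {"id": "S3", "Mo_ppm": 9.0, "TOC_wt_pct": 3.0, "FeHR_FeT": None, "FePy_FeHR": None},
]

# Only samples with the full suite of measurements contribute a ratio.
ratios = {s["id"]: mo_toc(s) for s in samples if is_complete(s)}
print(ratios)
```

In this toy example two of the three samples drop out of the analysis, which is exactly the kind of loss that filling in missing TOC or iron speciation measurements is meant to avoid.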
Respectful Community in SGP
1. We are a community that respects and values a multiplicity of differences. This includes but is not limited to gender, age, culture, ethnicity, sexual orientation, religious and political beliefs, and academic background.
2. We value collaboration and foster collegial and productive relationships among team members. We expect that team members will collaborate and discuss results in a respectful and collegial manner. We do not engage in ‘combat science’, in which not only are scientific ideas critiqued but the people behind them are criticized or disparaged.
3. We are committed to treating all members of the SGP community and broader scientific community with civility, courtesy and respect.
4. We take personal responsibility for our actions.
5. We support each individual and acknowledge the roles and contributions each person brings to the success of our Project.