Data Management Philosophy & Plans

GLAMR — a centralized, public repository of ecosystem metabolism estimates and related data for lakes, ponds, reservoirs, and other lentic water bodies around the world.

Photo by Bopaiah Biddanda

Philosophy

In keeping with ongoing changes in the culture and requirements of data sharing in ecology (Nelson 2022), and inspired by the catalytic impacts of well-curated compilations of high-quality data in aquatic ecology and other fields (e.g., Baldocchi 2008, 2020, Read et al. 2017, Soranno et al. 2017, Pollard et al. 2018), we argue that advancing understanding of lake metabolism requires more than just data – it requires linking data to the community in ways that provide both credit for contributions and opportunities to engage in team science. We will create analysis-ready data products that are fully FAIR (findable, accessible, interoperable, reusable), share these data products publicly, and encourage all interested researchers to use them.

Plans

Publish GLAMR via EDI

To build GLAMR, we will organize and harmonize previously published datasets that include lake metabolism data and/or the data necessary to generate lake metabolism estimates. The harmonized GLAMR dataset will be published as a data package through the Environmental Data Initiative (EDI). It will carry the full benefits of EDI data packages, including DOIs, provenance tracking, credit attribution, reproducible workflows, and programmatic access through EDI's application programming interface (API) from R and Python. We anticipate first building two beta versions of GLAMR: a small v0.1 for stress-testing, followed by a larger, analysis-ready v0.2 that will be available to data contributors for early exploration ahead of the first public release. The first public release (v1.0), and subsequent releases that add new data to previously published versions, will be licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license, with no further restrictions or proprietary software requirements, and will be available for public use subject to proper citation and recognition of the original sources.

Provide credit

We will take the following steps to provide credit to those who contribute data to GLAMR:

  • For all data included in GLAMR, we will cite the original published data package (“contributed package”) in the provenance section of the GLAMR data package description. The contributed package will also be identified in a column of the GLAMR data itself.
  • We will invite each author of a contributed package to be listed as an author of GLAMR. “Author of a contributed package” includes anyone listed in the data package citation. “Author of GLAMR” means a person who is listed as a “creator” of GLAMR and who therefore appears in the citation for the GLAMR data package. Data contributors who accept an invitation to be authors must take responsibility for the accuracy and integrity of the data, including a willingness to respond promptly to, and address, inquiries or issues raised by data users. Authors of a contributed package who do not wish to accept this responsibility can still be listed as “associated parties” or “contributors” to the GLAMR data package.
  • We will invite all authors of GLAMR to contribute ideas to, and join as coauthors on, a data paper that will accompany the first public release of the GLAMR data package, as well as several initial papers that we intend to write using the data.

Assist in publication of previously unpublished data sets

For researchers who hold previously unpublished data that they would like to contribute to GLAMR, we aim to provide assistance in publishing a researcher-led data package on EDI that can subsequently be integrated into GLAMR. We have some capacity to provide this assistance now, and we are working to secure funding that would increase this capacity.

Facilitate review

Data contributors will be invited to review their contribution to GLAMR, including the data themselves and the complete pipeline of code that brings those data into GLAMR, before the public release of the first GLAMR version that includes those data.

Publish software

All project-related software, including workflows for assembling GLAMR, will be written in non-proprietary languages (R and Python) and will be made available in publicly accessible GitHub repositories during and after development.

Encourage broad use and inclusive team science

To help encourage broad use and future growth of GLAMR, we plan to organize workshops at GLEON and/or other international meetings, coincident with the public release of GLAMR v1.0, to introduce new users to the data package and spark new collaborative efforts to use the data. In these and all other uses of the GLAMR data, we will ask project leads to follow GLEON’s norms of inclusive team science and invite meaningful contributions from data contributors and other interested researchers.

Literature cited

Baldocchi, D. 2008. Breathing of the terrestrial biosphere: lessons learned from a global network of carbon dioxide flux measurement systems. Australian Journal of Botany 56:1–26.

Baldocchi, D. D. 2020. How eddy covariance flux measurements have contributed to our understanding of Global Change Biology. Global Change Biology 26:242–260.

Nelson, A. 2022. Ensuring free, immediate, and equitable access to federally funded research. Office of Science and Technology Policy.

Pollard, A. I., S. E. Hampton, and D. M. Leech. 2018. The promise and potential of continental‐scale limnology using the US Environmental Protection Agency’s National Lakes Assessment. Limnology and Oceanography Bulletin 27:36–41.

Read, E. K., L. Carr, L. De Cicco, H. A. Dugan, P. C. Hanson, J. A. Hart, J. Kreft, J. S. Read, and L. A. Winslow. 2017. Water quality data for national‐scale aquatic research: The Water Quality Portal. Water Resources Research 53:1735–1745.

Soranno, P. A., L. C. Bacon, M. Beauchene, K. E. Bednar, E. G. Bissell, C. K. Boudreau, M. G. Boyer, M. T. Bremigan, S. R. Carpenter, and J. W. Carr. 2017. LAGOS-NE: a multi-scaled geospatial and temporal database of lake ecological context and water quality for thousands of US lakes. GigaScience 6:gix101.