For over 20 years, ESIP meetings have brought together the most innovative thinkers and leaders around Earth observation data, thus forming a community dedicated to making Earth observations more discoverable, accessible and useful to researchers, practitioners, policymakers, and the public. The theme of this year’s meeting is Leading Innovation in Earth Science Data Frontiers.
Tuesday, July 20 • 4:00pm - 5:30pm
Machine-Readable Descriptors for Heterogeneous Tabular Data

Many Earth science observation datasets are inherently tabular: rows and columns of numbers and text providing measurements of particular quantities at specified times and locations. Often these data are plain-text files whose values are separated by commas (CSV) or other delimiters. Such files are easy for humans to load into a spreadsheet or a pandas DataFrame, either interactively or using ad-hoc code that understands the structure of a particular file.
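As a minimal sketch of what such ad-hoc code tends to look like, the Python snippet below reads one hypothetical file layout using only the standard library. The sample text, the column names, and the `load_station_file` helper are illustrative assumptions, not taken from any real dataset. Note that the position of the header row, the column names, and the units are all hard-coded, so the loader works for this layout and no other.

```python
import csv
import io

# Hypothetical sample file: one title line, one header row, then data rows.
SAMPLE = """Station observations (illustrative data)
time,lat,lon,temp_C
2021-07-20T16:00:00Z,40.0,-105.3,24.5
2021-07-20T17:00:00Z,40.0,-105.3,25.1
"""

def load_station_file(text):
    """Parse one specific layout; every structural detail is hard-coded."""
    lines = io.StringIO(text)
    next(lines)  # skip the title line, which this loader simply assumes exists
    reader = csv.DictReader(lines)  # the following line is assumed to be the header
    rows = []
    for record in reader:
        rows.append({
            "time": record["time"],
            "lat": float(record["lat"]),
            "lon": float(record["lon"]),
            "temp_C": float(record["temp_C"]),  # unit known only from the column name
        })
    return rows

rows = load_station_file(SAMPLE)
```

A file from another provider with two title lines, a semicolon delimiter, or a column named `temperature` would need its own variant of this function.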

Unfortunately, tabular data files are heterogeneous. There are no mandatory standards or schemas for important characteristics such as the presence of header rows, the naming and ordering of columns, the units used, and so forth. Even if there were a standard approach, a data archive facility may be obligated to accept data as submitted rather than converting it to another format. As a result of this variety, a human must inspect and interpret the contents of each new file; automated data ingestion and verification are difficult.

To address this problem, several approaches have been proposed for machine-readable descriptors that provide metadata about the syntax and semantics of the rows of data. Examples include the World Wide Web Consortium (W3C) CSV on the Web (CSVW) Recommendation (which uses JSON), Table Schema (also JSON), NOAA ERDDAP's NCCSV and the British Atmospheric Data Centre's BADC-CSV (both of which use CSV text), CSV YAML (CSVY), the NASA Ames Format Specification (text), possibly NcML (an XML format not designed for this purpose but perhaps adaptable), and doubtless others. In each case the descriptor is either a separate sidecar file or a set of additional metadata lines in the data file itself, preceding the actual CSV-style rows of data values.
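The sidecar idea can be sketched in a few lines of Python. The descriptor below is a simplified, hypothetical JSON structure loosely modeled on the `fields` list (with `name` and `type`) used by Table Schema; the `skip_rows`, `delimiter`, and `unit` keys and the `read_with_descriptor` helper are assumptions made for illustration and do not follow any of the specifications named above.

```python
import csv
import io
import json

# Hypothetical data file: one title line, a header row, then data rows.
DATA = """Station observations (illustrative data)
time,lat,lon,temp
2021-07-20T16:00:00Z,40.0,-105.3,24.5
2021-07-20T17:00:00Z,40.0,-105.3,25.1
"""

# Hypothetical sidecar descriptor. "skip_rows", "delimiter" and "unit" are
# illustrative keys, not part of any standard.
DESCRIPTOR = json.loads("""
{
  "skip_rows": 1,
  "delimiter": ",",
  "fields": [
    {"name": "time", "type": "string", "unit": "ISO 8601 UTC"},
    {"name": "lat",  "type": "number", "unit": "degrees_north"},
    {"name": "lon",  "type": "number", "unit": "degrees_east"},
    {"name": "temp", "type": "number", "unit": "degC"}
  ]
}
""")

# Map descriptor type names to Python conversion functions.
CASTS = {"string": str, "integer": int, "number": float}

def read_with_descriptor(text, descriptor):
    """Read delimited text using structure taken from the descriptor, not the code."""
    lines = io.StringIO(text)
    for _ in range(descriptor["skip_rows"]):
        next(lines)  # skip leading non-tabular lines declared by the descriptor
    reader = csv.reader(lines, delimiter=descriptor["delimiter"])
    header = next(reader)
    expected = [field["name"] for field in descriptor["fields"]]
    if header != expected:
        # Verification step: the file must match what the descriptor declares.
        raise ValueError(f"header {header} does not match descriptor {expected}")
    rows = []
    for raw in reader:
        if not raw:
            continue  # ignore blank lines
        rows.append({
            field["name"]: CASTS[field["type"]](value)
            for field, value in zip(descriptor["fields"], raw)
        })
    return rows

rows = read_with_descriptor(DATA, DESCRIPTOR)
```

Because the layout lives in the descriptor rather than in the code, the same reader can ingest a differently structured file once it is given a matching descriptor, and a mismatch between file and descriptor is reported instead of silently misread.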

This session will invite discussion of various approaches and their benefits or limitations including ease of creation, actual machine-readability, level of standardization, availability of tools, and breadth of community adoption.

  • Welcome and overview - Jeff de La Beaujardière/NCAR (15 min)
  • W3C CSV on the Web (CSVW) at Italian Ministry of Transportation - Paolo Starace/SciamLab (15 min)
  • ERDDAP's datasets.xml as a File Description System - Bob Simons/NOAA NMFS (15 min)
  • CSV YAML (CSVY) at ICARUS - Tran Nguyen/UC Davis (15 min)
  • Open discussion (30 min)

Organizers & Speakers
Jeff de La Beaujardiere

Director, Information Systems Division, NCAR
I am the Director of the NCAR/CISL Information Systems Division. My focus is on the entire spectrum of geospatial data usability: ensuring that Earth observations and model outputs are open, discoverable, accessible, documented, interoperable, citable, curated for long-term preservation...

Bridget Thrasher

Data Stewardship Coordinator, NCAR

Eric Nienhouse

SE / Product Owner, UCAR
Bob Simons

IT Specialist, NMFS Environmental Research Division
I work on ERDDAP, a free and open source data server that gives you a simple, consistent way to download subsets of gridded and tabular scientific datasets in common file formats and make graphs and maps. ERDDAP has been installed and used by more than 70 organizations around the...

Paolo Starace

Solution Architect & Co-founder, SciamLab

Tuesday July 20, 2021 4:00pm - 5:30pm EDT