Open science is the ability to share and reproduce analysis without sharing a computer. We recognize that users have limited resources, such as network bandwidth and memory, and that these limits often constrain how they scale and share science. Open science presents a clear need to standardize on and deliver more cloud-friendly data formats and services. During this session, we highlight advances in cloud-friendly data and services and strive to answer some ongoing research questions about how these formats and services will support new scales of science, and do so openly.
Cloud-friendly data formats and services are central to innovation in Earth science. With cloud-optimized data formats and services, Earth scientists can achieve new scales of analysis and deliver reproducible research output and information products.
The conversation about data formats is not one that will be “closed” with a decision on “one format to rule them all”. We propose a session centered on discussions that surface new advances in data formats and standards that specifically support sharing and scaling science on the cloud. Such formats are often called “cloud-friendly” or “cloud-optimized”.
Putting data on the cloud in cloud-friendly formats is a starting point. The utility of this data also depends on the metadata, tools, and services that support users accessing these datasets. There have been new advances in cloud-friendly services as well, but there is still a lot of room for improvement. During this session, we focus not just on the data formats themselves, but on the usability of those formats made possible by the support system around them.
Agenda (150 minutes):
Part 1: Lightning Talks - Provide a "lay of the land" and fodder for discussion:
- Aimee Barciauskas, Welcome to this session: What do we mean by cloud-optimized and why does it matter?
- 60 minutes of 7-10 minute lightning talks
- Trevor Skaggs, Element84, will speak on Entwine Point Tile store generated for ATL06
- Joe Roberts, NASA JPL, will speak about the Metadata Raster Format (MRF) and how it supports the features of NASA GIBS
- Charles Stern will talk about motivations and progress made on pangeo-forge
- Stavros Papadopoulos, creator of TileDB, will present "Time to depart from file formats and focus on engines and APIs"
- Steve Olson and Shane Mill will talk about NOAA's EDR API and how it enables programmatic access to both conventional and cloud-optimized data formats
- Aaron Friesz will talk about the platform NASA LPDAAC has built to leverage cloud-optimized data formats.
- 15 minutes: Organize into 3-4 subgroups to continue the conversation on a specific topic or presentation.
15 minutes: Break
Part 2: Small Group Discussions - Attendees and speakers will use this time to dive into discussions, questions and expertise on a sub-topic or specific question.
- 30 minutes: Small groups meet in a virtual sub-space. A session organizer or ESIP coordinator will meet with each small group to facilitate conversation and take notes.
- 25 minutes: Small groups present back to the larger group what was discussed
- 5 minutes: Wrap-up