For over 20 years, ESIP meetings have brought together the most innovative thinkers and leaders around Earth observation data, thus forming a community dedicated to making Earth observations more discoverable, accessible and useful to researchers, practitioners, policymakers, and the public. The theme of this year’s meeting is Leading Innovation in Earth Science Data Frontiers.
ALL SESSION RECORDINGS CAN NOW BE FOUND ON THE ESIP YOUTUBE CHANNEL.
Monday, July 19
 

11:00am EDT

Opening Plenary: Innovate @ ESIP
This year's theme is Leading Innovation in Earth Science Data Frontiers, and there are competitive programs at NASA, USGS, and ESIP to support the development of innovative tools, services, data and other technology for the Earth science community. But these great ideas don’t always make it to the users that need them. This session therefore provides an opportunity for PIs to talk about their work and show how their big ideas can help solve common challenges in the scientific community.

Innovate@ESIP Speakers:
Agbeli Ameko
Aparna Bamzai
Ben Letcher
Hamed Alemohammad
Kelsey Breseman
Joe Hamman
Kathe Todd-Brown
Nga Chung


Organizers

Annie Burgess

Lab Director, ESIP

Leslie Hsu

physical scientist, U.S. Geological Survey
Coordinator of the USGS Community for Data Integration and member of the USGS Science Data Management branch.

Sara Lubkin

ESDIS Science Data Operations Manager, NASA

Susan Shingledecker

Executive Director, ESIP
Susan is Executive Director of ESIP, Earth Science Information Partners, a global community of Earth science data professionals who come together to find solutions and advance data management to enable and empower the use of data to solve some of our planet's greatest challenges...

Organizers & Speakers

Aparna Bamzai-Dodson

Acting Regional Administrator, USGS North Central Climate Adaptation Science Center
Aparna Bamzai-Dodson is the USGS Acting Regional Administrator for the NC CASC. In this role, she undertakes stakeholder and partner engagement to identify strategic science goals, outputs, and objectives. She is also responsible for tracking budget planning and expenditures, organizing...

Kelsey Breseman

Attendee, Head Weaver
Tlingit, forest person, engineer, and activist. Working on climate research & communication on tribal lands with Sealaska and The Nature Conservancy. Always interested in how tech tools and the stories we tell shift the balance of power.

Hamed Alemohammad

Executive Director and Chief Data Scientist, Radiant Earth Foundation

Ken Casey

Deputy Chief, Data Stewardship Division, NOAA/NCEI
I serve as the Deputy Chief of the Data Stewardship Division at NCEI and am working on a variety of efforts, from accelerating the ingest of data to the archive to supporting our migration to the NESDIS Common Cloud Framework (NCCF). I am also honored to serve as the ESIP President...

Kathe Todd-Brown

Assistant Professor, University of Florida
I'm a computational biogeochemist who uses data and mathematics to study how dirt breathes.

Ben Letcher

USGS
Population ecologist at USGS. Stream ecology, hydrology, data visualization, science communication


Monday July 19, 2021 11:00am - 12:30pm EDT
TBA

12:30pm EDT

Break
Join a discussion with your peers here!

Monday July 19, 2021 12:30pm - 1:30pm EDT
TBA

1:30pm EDT

Assessing the Research Data Management Landscape Through Practice and Education

As research methodology and scope have expanded with technological advances, the management of data, software, and other digital objects has positioned itself as a critical and distinct component of the research lifecycle. The scale and diversity of new scientific data has required a reexamination of the roles of data managers in the research enterprise. Emphasis on the FAIR Data Principles, data management plan mandates, and the evolving digital contexts of both conducting and sharing research have made the practice of data management more complex.

Through multiple studies funded by the Institute of Museum and Library Services (IMLS), faculty and graduate students examined the professional terrain of data managers through demographic, industrial, and practical contexts. New work from the Collaborative Analysis Liaison Librarian (CALL) Laura Bush 21st Century Librarian grant program at the University of Tennessee and the University of Denver investigates the knowledge, skills, and abilities needed as well as tasks performed in these roles. By investigating what and how data management is realized and who is involved, we aim to stimulate a discussion on the trajectory of Earth science data professionalism and assess potential educational needs of both those entering the workforce and those currently practicing. CALL asserts that an understanding of professional priorities must emerge through conversations with current data professionals and their collaborators. How do we teach digital collaboration? Who is facilitating data management education? These discussions are essential in advocating for research data management in its varied contexts as well as preparing future practitioners for its associated responsibilities.

To prepare for this session, we invite participants to think about the following questions:
  1. What would be the ideal curriculum to prepare you for your job?
  2. What knowledge, skills, and abilities do you expect from prospective employees?
  3. What do you expect are things you must learn on the job?

The panel seeks feedback from the ESIP community to inform future avenues of inquiry, provide context for current research, and promote further engagement with data management education.

An examination of workforce trends and development can open further discussions and help foster the professional alliance between information and research sciences. CALL has undertaken a job analysis, comprising interviews and a survey with data professionals, the majority of whom are ESIP members.


Organizers & Speakers

Matthew Mayernik

Project Scientist, National Center for Atmospheric Research
Matt is a Project Scientist and Research Data Services Specialist in the NCAR/UCAR Library. His work is focused on research and service development related to research data curation. His research interests include metadata practices and standards, data curation education, data citation...

Wade Bishop

Professor, University of Tennessee

Hannah Collier

Metadata Coordinator, ORNL (ARM Data Center)

Ashley Orehek

Instructional Librarian, Lindsey Wilson College
Early career librarian. I just graduated in May 2021 and started my first job in June 2021. I aspire to be an atmospheric science librarian someday.


Monday July 19, 2021 1:30pm - 3:00pm EDT
TBA

1:30pm EDT

Delivering Trusted Data to Real Users & Decision Makers
Since the ESIP Winter Meeting, the Disaster Lifecycle Cluster has further evolved its approach to Leading Innovation in Earth Science Data Frontiers to put more Earth science data to work in decision-making environments. The ‘ecosystem of innovation’ places ESIP clusters on the pathway of user engagement through real-time collaboration and an expanding user base. This session looks to attract all ESIP clusters whose mission is to put more data, processes, analytics and/or machine learning to work, and to add Operational Readiness Level (ORL) rankings to their data, in order to serve decision makers through improved data quality, data processing and data use.

Our session will show examples of incorporating data to support recovery efforts to ‘get back to business’ after disaster hits. A specific need is for tree canopy data to aid utility operators and enable them to assess the risk of power line breaks due to an approaching storm. We encourage participants to join the Disasters cluster to mature their products or services through the ecosystem of innovation for evaluation by end users such as the All Hazards Consortium and others.

This session will identify how the ecosystem of innovation works and how ESIP clusters and members can plug into this process by introducing their products, services and methods into the pathway for real users and decision makers to experience. To prepare for the session please consider what use cases and data and/or products you may have that could be valuable to the disasters community.


Organizers & Speakers

Dave Jones

CEO, StormCenter Communications
GeoCollaborate is an SBIR Phase III technology (yes, it's a big deal) that enables real-time data access through web services, sharing and collaboration across multiple platforms. We call GeoCollaborate a 'Collaborative Common Operating Picture' that empowers decision making, situational...

Karen Moe

Retired, NASA
Managing an air quality monitoring project for my town just outside of Washington DC and looking for free software!! Enjoying citizen science roles in environmental monitoring and sustainable practices in my town. Recipient of an ESIP 2022 Funding Friday grant with Dr Qian Huang to...


Monday July 19, 2021 1:30pm - 3:00pm EDT
TBA

1:30pm EDT

Emerging best-practices for publishing non-tabular, complex, and special-case ecological datasets
Over the past several decades, considerable progress has been made in the development of metadata standards and formats for publishing Earth and environmental data. Repositories for scientific data products have also proliferated. In the ecological research community, much of this development has centered on publishing tabular data. As data products become more complex, voluminous, and distributed, there is a clear need to develop new methods to handle the creation, publication, and discovery of datasets that don’t fit within the standards and workflows established for the most common cases. In this session we address several special cases that have become increasingly common in Earth and environmental datasets, such as drone data, sequencing and genomic data, image or document collections, and datasets distributed across multiple repositories. Presentations will focus on emerging best-practices, metadata standards, specialized repositories, provenance metadata, and methods for linking datasets with disparate structures, metadata, or published locations. A breakout group session will foster collaboration and coordination among the numerous groups that are leading efforts in this area.

Session goals and preparation:

To prepare for the session, please think about what types of Earth, environmental, or ecological datasets you work with in your research or data management activities, and consider where there are gaps in your knowledge of how to publish, access, or use these datasets. In general, our session goals are:
  1. Introduce participants to recent developments around publishing, discovering, and re-using modern ecological and environmental datasets.
  2. Answer the question: Should this community be doing more to develop best practices for publishing modern datasets as they become more complex and varied? (And what? how? where?)

During the session participants will enter notes and questions in a Collaborative Doc.

Session Agenda:

Organizers & Speakers

Renée F. Brown

Information Manager, McMurdo Dry Valleys LTER
dryland ecosystem ecology, biogeochemical cycles, global change, research data management, environmental sensor networks

Corinna Gries

Scientist, Environmental Data Initiative

Gregory Maurer

Data Scientist/Data Manager, New Mexico State University
An ecologist and the information manager for the Jornada Basin LTER, with research interests in global change, drylands, and data science.

Douglas Schuster

Manager, Data Engineering and Curation Section, NCAR/UCAR

Carl Boettiger

Assistant Professor, University of California, Berkeley

Jane Wyngaard

University of Notre Dame

Michael Barton

Director, CoMSES.Net
Professor in the School of Complex Adaptive Systems, in the School of Human Evolution & Social Change, and Director of the Center for Social Dynamics & Complexity at Arizona State University. My research centers around long-term human ecology and landscape dynamics, integrating computational...



Monday July 19, 2021 1:30pm - 3:00pm EDT
TBA

3:15pm EDT

Coffee Break Networking
You can find the link for coffee and networking events in Qiqochat. 


Monday July 19, 2021 3:15pm - 3:45pm EDT
TBA

4:00pm EDT

Best Practices for FAIR Research Software
Research software is a significant and vital component of research. It is integral to all stages of research from pre- to post- processing and analysis and modeling. Recognizing software roles in reproducibility and the need for recognition for those who develop software has energized conversations on what FAIR (Findable, Accessible, Interoperable, Reusable) means for research software. The FAIR For Research Software Working Group (FAIR4RS WG) is leading the research software community in the crucial step of agreeing how to apply the FAIR principles to research software by mid-2021, and is co-led by the Research Software Alliance (ReSA), Research Data Alliance (RDA) and FORCE11.

This workshop aims to:

- present updates on application of the FAIR principles to research software to advance community knowledge
- provide opportunities to become involved in this evolving work, enabling participants to become involved in policy development
- engage participants in identifying examples of best practice in earth sciences in developing FAIR/sustainable software for research
- identify areas for improvement
- enable participants to meet like-minded colleagues.


Organizers & Speakers

Michelle Barker

Director, Research Software Alliance

Sandra Gesing

University of Notre Dame

Lorraine Hwang

Assoc. Director, CIG @ UC Davis

Paula Martinez

Research Software Alliance
Data scientist with experience in bioinformatics, data wrangling, and data visualization. Study all about data and how to make it available and digestible/understandable to others. Open data and open education advocate, for better science and for development. An active member of open...


Monday July 19, 2021 4:00pm - 5:30pm EDT
TBA

4:00pm EDT

Innovations in EnviroSensing Technology and Practice
This session, sponsored by the ESIP EnviroSensing Cluster, will include a series of presentations on emerging and proven approaches that further the collection, management, and exchange of in situ environmental monitoring and observation data. The EnviroSensing Cluster fosters collaborative, across-discipline exchange on technology and practices that facilitate innovation in sensor-based science and data. All are welcome to attend!

Presentations:
  • SWEX: Sundowner Winds Experiment - Gert-Jan Duine 
  • Improving HydroMet data availability - Christel Valentine
  • Using GCE Toolbox in data processing workflows: An example from H.J. Andrews Experimental Forest - Stephanie Schmidt
  • Low cost environmental sensing using hobby-level technology, opportunities and trade-offs - John Porter
  • High Altitude Soil Testing (HAST): Increasing accessibility of data in remote locations - Justin Rubalcaba
  • Ultra-low power LoRa-embedded microcontrollers that simplify eco-enviro-sensing designs - Daniel Fuka

Organizers & Speakers

Renée F. Brown

Information Manager, McMurdo Dry Valleys LTER
dryland ecosystem ecology, biogeochemical cycles, global change, research data management, environmental sensor networks

Scotty Strachan

Director of Cyberinfrastructure, University of Nevada, Reno
Institutional cyberinfrastructure, sensor-based science, mountain climate observatories!

Joseph Bell

Hydrologist, USGS

Kristina Fauss

University of California, Santa Barbara

Stephanie Schmidt

Information Manager, US Forest Service/H.J. Andrews Experimental Forest & LTER

John Porter

Res. Assoc. Prof., University of Virginia

Dan Fuka

Scientist, Virginia Tech

Justin Rubalcaba

Montana Tech

Gert-Jan Duine

University of California, Santa Barbara



Monday July 19, 2021 4:00pm - 5:30pm EDT
TBA

4:00pm EDT

“Movie Credits” for Data and other Research Artifacts
Data and other research artifacts such as software, samples, and ontologies need to be recognized as first-class objects in scientific discourse. As such, they must be fairly and appropriately credited. The Research Artifact Citation Cluster has been exploring what roles should be credited for different artifacts and how those roles should be credited. Similar work has been done for literature (notably through CRediT), but we have found that those approaches are only partially useful for data and other artifacts. We have learned that there are critical roles that deserve greater recognition and that citation is only one, limited mechanism to do so. Often people say that we need something like film credits — the long list of people and roles listed at the end of a movie — to describe the work that goes into producing a useful dataset. What is sometimes lost in that analogy is how contested and highly negotiated film credits are. Defining roles and credit is complex and sensitive.

In this session, we will review the work of the group over the last year, the lessons learned, and initial conclusions on what roles are important and how they should be credited for five major research artifacts: data, software, samples, semantic objects, and complex learning objects. We will then have a set of breakouts working to address the specific issue of how credit for data should be characterized. We will explore multiple role taxonomies, including CRediT, ISO 19115, DataCite, and maybe Rescognito. The goal is to develop or adopt a defined and consistent set of roles that can be acknowledged and captured in a citation as well as in other places such as a dataset landing page, documentation, and the metadata.


Organizers & Speakers

Madison Langseth

Science Data Manager, U.S. Geological Survey
Madison develops tools and workflows to make the USGS data release process more efficient for researchers and data managers. She also promotes data management best practices through the USGS’s Community for Data Integration Data Management Working Group and the USGS Data Management...

Mark Parsons

Research Scientist, University of Alabama in Huntsville


Monday July 19, 2021 4:00pm - 5:30pm EDT
TBA

5:30pm EDT

Break
Monday July 19, 2021 5:30pm - 6:00pm EDT
TBA

6:00pm EDT

 
Tuesday, July 20
 

11:00am EDT

Building a primer for biological data standards
The ESIP Biological Data Standards Cluster was formed after the ESIP-IOOS Biological Data Standards Workshop held in July 2020. At that workshop a list of action items was created, and the cluster voted on the list to determine the highest priority ones for the cluster to work on. Building a decision tree was one of the most popular options. In the cluster's efforts to build a decision tree to guide data managers who are new to biological data standards, we have realized a decision tree would become a decision forest. Instead we have shifted to creating an infographic that will serve as a primer to data managers who are new to biological data standards as they select the one(s) most useful for their needs. For this session we will seek input from the ESIP community on building a primer for biological data standards.

Session Goals:
  • Get more terrestrial/aquatic folks involved in the cluster
  • Get feedback from the broader ESIP community on the primer
  • Build awareness about the cluster and the primer
  • Determine a pathway to a final product
How to Prepare for this Session: 
Think about any biological data standards that you are familiar with and repositories that require certain standards for data to be shared, especially terrestrial/aquatic focused data repositories.


Organizers & Speakers

Abby Benson

Biologist, U.S. Geological Survey

Diana LaScala-Gruenewald

Monterey Bay Aquarium Research Institute

Robert McGuinn

Conservation Biologist / Data Systems Manager, NOAA/NCEI/Northern Gulf Institute
Robert McGuinn is a Research Program Manager at the Northern Gulf Institute, a NOAA Cooperative Institute which is affiliated with the National Centers for Environmental Information in Stennis, Mississippi. He is also the Data Systems Manager for the National Marine Fisheries Service's...

Erin Satterthwaite

California Sea Grant & Scripps Institution of Oceanography
Marine ecology | International coordination | Ocean observations | Diverse engagement | Food | Surfing | Backpacking | Biking



Tuesday July 20, 2021 11:00am - 12:30pm EDT
TBA

11:00am EDT

CARE Principles for ESIP Data Repositories
The ESIP cluster “Sustainable Data Management” promotes mechanisms for repositories to collaborate to preserve their holdings (https://wiki.esipfed.org/Sustainable_Data_Management). Its current project is to produce recommendations for member repositories on implementing guidance principles within frameworks like FAIR (https://doi.org/10.1038/sdata.2016.18) and TRUST (https://doi.org/10.1038/s41597-020-0486-7). We are also including a third framework: the CARE Principles for Indigenous Data Governance (http://doi.org/10.5334/dsj-2020-043), where CARE stands for Collective Benefit, Authority to Control, Responsibility, and Ethics. These principles extend data management concerns to be more people- and purpose-oriented, and to respect Indigenous sovereignty. As stated in the Data Science Journal paper, “The ‘CARE Principles for Indigenous Data Governance’ empower Indigenous Peoples by shifting the focus from regulated consultation to value-based relationships that position data approaches within Indigenous cultures and knowledge systems to the benefit of Indigenous Peoples”. This session will present the cluster’s recent examination of the CARE principles, how they relate to repository activities, and how they extend FAIR and TRUST. Introductory material on CARE and the cluster’s work will be presented, followed by discussion.


Organizers & Speakers

Ruth Duerr

Research Scholar, Ronin Institute for Independent Scholarship

Margaret O'Brien

Data Specialist, University of California
My academic background is in biological oceanography. Today, I am a data specialist working with the Environmental Data Initiative (EDI) plus ecosystem-level projects conducting primary research, like the LTER network, and a marine Biodiversity Observation Network. My primary data...

Shelley Stall

Vice President, Open Science Leadership, American Geophysical Union
Shelley Stall is the Vice President of the American Geophysical Union’s Open Science Leadership Program. She works with AGU’s members, their organizations, and the broader research community to improve data and digital object practices with the ultimate goal of elevating how research...


Tuesday July 20, 2021 11:00am - 12:30pm EDT
TBA

11:00am EDT

OPeNDAP for Data Providers
Cloud systems have raised the visibility of online data and users are becoming (more) comfortable using data that they do not first download as files. OPeNDAP has been building web services to enable access to remote data for over two decades. In this session we will describe how data providers can make their data holdings accessible to tools like Python/Jupyter Notebook, Matlab, and ArcGIS Pro with little or no reformatting. At the same time, we will also show how those same data can be moved to the cloud and accessed both directly, using the zarr API, and using an OPeNDAP server.

This session will also cover different formats that can be served using OPeNDAP. We will show how the Hyrax OPeNDAP server is installed and configured, with a focus on the most important aspects of the service, including best practices for using OPeNDAP servers as a component in a larger data system design. We will also cover demonstrations of various clients accessing the server and discuss built-in interfaces in different clients, so that providers can point users to tutorials they are likely to find useful.


Organizers & Speakers

James Gallagher

EED3 Contractor, OPeNDAP

Patrick Quinn

Software Engineer, Element 84


Tuesday July 20, 2021 11:00am - 12:30pm EDT
TBA

12:30pm EDT

Break
Tuesday July 20, 2021 12:30pm - 1:30pm EDT
TBA

1:00pm EDT

Teacher Workshop: Exploring Earth, Wind and Fire via Earth Science Data
The Earth Science Information Partners (ESIP) Education Committee will host a virtual workshop for 50 educators on Tuesday, July 20 and Wednesday, July 21 (1:00 to 5:00pm EDT on both days). ESIP members will share an educational resource and lead participants through an activity using Earth science data to explore phenomena via different types of data. Tools and resources include:
  • The NOAA CrowdMag app,
  • NASA’s Earth System Data Explorer,
  • UNAVCO Velocity Viewer,
  • NOAA CIMSS satellite data activities,
  • NASA SEDAC Hazards Mapper and HazPop App,
  • En-ROADS Climate Decision Model,
  • The Concord Consortium Wildfire Module, and
  • The “Out 2 Lunch” archive: Earth Science webinar demonstrations of data tools and resources
Participating STEM educators will also be eligible to apply for $500 implementation grants!
What better way to inspire innovation in Earth science data frontiers than training the teachers who educate our youth?

Agenda/Teacher Road Map: https://docs.google.com/document/d/1FGACsWSHPTXS8nEAXaTjpHB_-201xkfYsJDuOkAY9Rc/edit?usp=sharing

Organizers & Speakers

Shelley Olds

Science Education Specialist, UNAVCO
Data visualization tools, Earth science education, human dimensions of natural hazards, disaster risk reduction (DRR), resilience building.

Elizabeth Joyner

Community Coordinator, SSAI, Goddard Space Flight Center, NASA
Elizabeth Joyner joined the Earth Science Data Systems (ESDS) Program Communications Team in 2022 as the Community Coordinator and works across the program to promote the use of NASA data and resources with end users. She previously served as the Senior Outreach Coordinator for NASA...

Trinity Foreman

Comms Consultant, ICMS LLC
Trinity Foreman supports the educational outreach and social media output of NOAA's NCEI. NCEI hosts and provides public access to one of the most significant archives for environmental data on Earth, and Trinity Foreman works to increase the accessibility of NCEI's data tools and...

Tamara Ledley

STEM Consultant & Adjunct Professor, Sustaining Science Consulting & Bentley University
I am interested in moving ESIP forward in broadening the reach of “making data matter” into communities and organizations for whom Earth science data and information is essential to their decision making processes. Much of my work has focused on making Earth and climate science...

Carla McAuliffe

Educational Researcher and Curriculum Developer, TERC

Margaret Mooney

Education Director, NOAA's Cooperative Institute for Meteorological Satellite Studies

Robert Downs

Senior Digital Archivist, Columbia University
Dr. Robert R. Downs serves as the senior digital archivist and acting head of cyberinfrastructure and informatics research and development at CIESIN, the Center for International Earth Science Information Network, a research and data center of the Columbia Climate School of Columbia...

Becky Reid

Faculty, Cuesta College
I discovered ESIP in the summer of 2009 when I was teaching science in Santa Barbara and attended the Summer meeting there. Ever since then, I have been volunteering with the ESIP Education Committee in various capacities, serving as Chair in 2013, 2019, and 2020.



Tuesday July 20, 2021 1:00pm - 5:00pm EDT
TBA

1:30pm EDT

AI Data Readiness: Designing A Community-Driven Road Map for Data Standards and Tools
As artificial intelligence transforms scientific discovery in the Earth and space sciences, there is an urgent need to ensure that Earth and space science data is ready for AI applications. As a collaborative community that bridges government agencies, academia, private industries, and international initiatives, ESIP is a natural space to advance the development of data standards and tools to support the transformation of Earth and space science data for AI applications. The Data Readiness Cluster, created after the 2021 ESIP Winter Meeting, aims to steward community efforts on the topic of AI-ready data, from definitions and standards to tools and capacity building.

As a new cluster, the Data Readiness Cluster invites all members of the Earth and space science community to design a road map to guide the development of data standards and tools for AI data readiness. This session will build on the landscape analysis of data standards, tools, and research that are relevant to AI ready data. The community will co-design a path forward and identify major milestones for developing data standards and tools for AI ready data. We invite individuals and groups who are interested in the topic to join the session and contribute to the design of a road map that guides the cluster activities for the next two years.

Session agenda:
13:30–13:35 Welcome & session overview
13:36–14:00 "Celebrity Interview" - Data Readiness Cluster Workplan & Community Feedback on the workplan
14:01–14:50 Breakout Room (two rounds) - Data AI-readiness Checklist 

Session materials:
- Data Readiness Cluster Problem Statement
- Data Readiness Cluster Workplan
- Sample checklist for data AI-readiness

Relevant sessions during ESIP Summer Meeting 2021


Organizers & Speakers

Tyler Christensen

Data Management Architect, NOAA / NESDIS

Eric A. Kihn

Chief, Coasts, Oceans, and Geophysics Division, NOAA National Centers for Environmental Information

Douglas Rao

Research Scientist, CISESS/NCICS/NCSU
I am currently a Research Scientist at North Carolina Institute for Climate Studies, affiliated with NOAA National Centers for Environmental Information. My current research at NCICS focuses on generating a blended near-surface air temperature dataset by integrating in situ measurements... Read More →

Rob Redmon

Scientist, NOAA Center for AI
Dr. Rob Redmon is a senior scientist with NOAA's National Centers for Environmental Information (NCEI). He is the Lead for NOAA's Center for Artificial Intelligence (NCAI, noaa.gov/ai), and the Space Weather Follow On (SWFO) Science Center.


Tuesday July 20, 2021 1:30pm - 3:00pm EDT
TBA

1:30pm EDT

Marine Data Cluster’s Controlled Vocabularies Decision Tree Development - Working Session
Bring a specific marine observation from a dataset and tell us how you choose an appropriate controlled vocabulary!

Attendees will help explore and develop a decision tree to help choose which controlled vocabulary/vocabularies a data manager/scientist should use to ensure their marine data follow the FAIR principles. We will be testing out an innovative approach to community source these decisions through a combination of Google Sheets, Python, and a decision tree generator (scikit-learn and dtreeviz). The session will start off with an overview of the effort followed by breakout groups, where participants will draft out their thought process for selecting a controlled vocabulary for a specific marine observation of their choosing. We will then reconvene and start consolidating the information from the breakout groups which will start to populate the marinedata-vocabulary-guidance repository.

 The goals of the session are to use an innovative approach to community source information to:
  • Identify the recommended vocabularies to choose from for marine observations.
  • Identify the entry points into the decision tree.
  • Identify the key questions to ask when selecting a vocabulary.
  • Start a vocabulary guidance for marine data document.
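
To make the ideas of entry points and key questions concrete, here is a minimal sketch of a vocabulary decision tree written as nested Python dictionaries. It deliberately does not use scikit-learn or dtreeviz, and every question, answer, and recommendation in it is a hypothetical placeholder rather than an outcome of this session.

```python
# Hypothetical decision tree: each internal node asks one question and maps
# each answer to either another node or a final recommendation (a string).
# The questions and recommendations are illustrative placeholders only.
TREE = {
    "question": "What kind of observation is it?",
    "answers": {
        "taxon occurrence": "a taxonomic register (placeholder)",
        "physical or chemical measurement": {
            "question": "Is the data stored in netCDF?",
            "answers": {
                "yes": "a netCDF-oriented standard-name list (placeholder)",
                "no": "a general parameter vocabulary (placeholder)",
            },
        },
    },
}


def recommend(tree, responses):
    """Walk the tree using a {question: answer} mapping; return (path, result)."""
    node, path = tree, []
    while isinstance(node, dict):
        question = node["question"]
        answer = responses[question]  # KeyError if a question is unanswered
        path.append((question, answer))
        node = node["answers"][answer]
    return path, node


def entry_points(tree):
    """List the answers accepted at the root, i.e. the tree's entry points."""
    return sorted(tree["answers"])


responses = {
    "What kind of observation is it?": "physical or chemical measurement",
    "Is the data stored in netCDF?": "no",
}
path, result = recommend(TREE, responses)
print(result)
```

Recording the path alongside the result keeps the reasoning behind each recommendation visible, which is the kind of thought process the breakout groups are asked to draft.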

Organizers & Speakers

Mathew Biddle

Physical Scientist, NOAA/NOS/IOOS

Stephen Diggs

Sr. Research Data Specialist, University of California Office of the President
ORCID: 0000-0003-3814-6104, https://cchdo.io

Carolina Berys-Gonzalez

Applications Programmer, CCHDO/SIO/UCSD

Chris Olson

Geological Data Center, Scripps Institution of Oceanography


Tuesday July 20, 2021 1:30pm - 3:00pm EDT
TBA

1:30pm EDT

Science-on-Schema.org - Gathering Feedback for ESIP Assembly Endorsement

Slides: https://docs.google.com/presentation/d/1voih_wYRgP9plkbu31WmYAdjeWj7Mu6stR9Mn1JExR0/edit#slide=id.gafa808a5c8_0_39

Across geoinformatics, many initiatives deserve careful consideration and adoption to promote interoperability across our shared missions and projects. Because there are so many specifications for our discipline to evaluate and adopt, documentation and guidelines that streamline this process are highly beneficial. The ESIP “Science on Schema.org” cluster has been developing and testing a set of guidelines that help data repositories and other content providers adopt the schema.org vocabulary for publishing metadata about data resources in their HTML web documents. The goal of these guidelines is to document shared conceptualizations surrounding the description of scientific datasets and their respective data repositories for the purpose of providing consistent, machine actionable metadata using web publishing standards. In doing so, adopters achieve greater discovery of scientific datasets across the web from large scale search providers to local, domain specific metadata aggregators.
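
As an illustration of what publishing schema.org metadata in an HTML document can look like, the sketch below builds a small `Dataset` description and wraps it in a JSON-LD script element. It is a minimal example under stated assumptions, not the cluster's recommended template: the dataset name, description, URL, and keywords are hypothetical, and the helper `to_script_tag` is invented for this example.

```python
import json

# All values below are hypothetical placeholders for illustration.
dataset = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example sea surface temperature observations",
    "description": "Placeholder description of a hypothetical dataset.",
    "url": "https://example.org/datasets/sst-001",
    "keywords": ["sea surface temperature", "oceanography"],
}


def to_script_tag(metadata):
    """Serialize metadata as a JSON-LD <script> element for an HTML page."""
    body = json.dumps(metadata, indent=2)
    return f'<script type="application/ld+json">\n{body}\n</script>'


tag = to_script_tag(dataset)
print(tag)
```

A repository would typically place such an element in the landing page of each dataset so that crawlers can read the description without parsing the visible page content.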

At the 2021 ESIP Winter Meeting, the community examined whether an ESIP Assembly Endorsement of Science-on-Schema.org guidelines documentation was useful and applicable. After discussion, it was decided that the Schema.org cluster would submit an updated version of the guidelines plus supporting materials for ESIP endorsement at or soon after the 2021 ESIP Summer Meeting. This session will: 1) present the ESIP endorsement package of the latest Science-on-Schema.org guidelines v.1.3; 2) host presentations from guidelines adopters regarding their experiences; and 3) gather feedback from attendees on ways to improve the submission package.

Guidelines: https://science-on-schema.org 

Endorsement Issues to Resolve:

Organizers & Speakers

Stephen Richard

Geoinformatics consultant, personal
Stephen Richard is an independent contractor working from Tucson Arizona. He is currently involved in projects to implement a Geoscience ontology for the Loop3D project, the Technical Team for the EarthCube Office, and applications of geoscience vocabularies in AI applications. Interests... Read More →

Ruth Duerr

Research Scholar, Ronin Institute for Independent Scholarship

Matt Jones

Director of Informatics R&D, NCEAS / DataONE / UC Santa Barbara
DataONE | Arctic Data Center | Open Science | Provenance and Semantics | Cyberinfrastructure

Mark Schildhauer

Senior Technology Fellow, NCEAS/UCSB
Data semantics, Ecoinformatics training, Arctic data, LTER data, Ecological synthesis

Adam Shepherd

Technical Director, BCO-DMO
Architecting adaptive and sustainable data infrastructures. Co-chair of the ESIP schema.org cluster. Knowledge Graphs | Data Containerization | Declarative Workflows | Provenance | schema.org

Dave Vieglais

Research Professor, University of Kansas


Tuesday July 20, 2021 1:30pm - 3:00pm EDT
TBA

3:15pm EDT

Coffee Break Networking
Tuesday July 20, 2021 3:15pm - 3:45pm EDT
TBA

4:00pm EDT

Coordinating the triangle between publishers, data repositories, and researchers
Increasing attention is being directed toward the activities surrounding the publication of research data and its connection to its related scholarly publications. Communication and coordination of the full data publication process across the 3 key stakeholders (Publisher, Repository, Researcher) need strengthening in order to address pressing challenges associated with data sharing, management, publication and citation. The ESIP Coalition for Publishing Data in the Earth and Space Sciences (COPDESS) cluster provides the necessary infrastructure and forum to address these challenges (https://copdess.org/).  

This session seeks to engage all stakeholders involved in the multifaceted process of publishing and linking geoscience research data to scholarly publications, in order to consider the next steps for the COPDESS community. Our agenda includes the opportunity to develop next steps for the COPDESS Cluster to further data, sample, and software sharing across the global Earth, space, and environmental sciences communities, along with a review and status update of current relevant activities, such as the publisher/repository workflow previously shared in both ESIP and RDA sessions.

We welcome and encourage all stakeholders in the data, samples and software publication landscape to participate.

For the workflow, outcomes from the ESIP 2021 Winter session, two sessions of the ESIP/RDA Earth, Space, and Environmental Sciences Interest Group, and a recent workshop dedicated to this topic will be presented for consideration as we move towards developing a final recommendation.

For the COPDESS Cluster next steps, please bring your suggestions for challenges that you think need attention and which ones need prioritization. We also welcome ideas on which communities you think should be involved and how we can effectively engage community participation and discussion.

Organizers & Speakers

Brooks Hanson

American Geophysical Union

Danie Kinkade

Director, Data Curator, BCO-DMO

Kerstin Lehnert

Doherty Senior Research Scientist, Columbia University, Lamont-Doherty Earth Observatory
Kerstin Lehnert is Doherty Senior Research Scientist at the Lamont-Doherty Earth Observatory of Columbia University and Director of the Interdisciplinary Earth Data Alliance that operates EarthChem, the System for Earth Sample Registration, and the Astromaterials Data System. Kerstin... Read More →

Shelley Stall

Vice President, Open Science Leadership, American Geophysical Union
Shelley Stall is the Vice President of the American Geophysical Union’s Open Science Leadership Program. She works with AGU’s members, their organizations, and the broader research community to improve data and digital object practices with the ultimate goal of elevating how research... Read More →

Lesley Wyborn

Honorary Professor, Australian National University


Tuesday July 20, 2021 4:00pm - 5:30pm EDT
TBA

4:00pm EDT

Machine-Readable Descriptors for Heterogeneous Tabular Data
Many Earth science observation datasets are inherently tabular in nature: rows and columns of numbers and text providing measurements of particular quantities at specified times and locations. Often these data are plain text files containing comma-separated values (CSV) or values delimited by other separators. Such files are easy for humans to load into a spreadsheet or pandas DataFrame, either interactively or using ad hoc code that understands the structure of a particular file.

Unfortunately, tabular data files are heterogeneous. There are no mandatory standards or schema for important characteristics such as the presence of header rows, the naming and ordering of columns, the units used, and so forth. Even if there were a standard approach, a data archive facility may be obligated to accept data as submitted rather than converting to another format. The end result of this file variety is that human intervention is required to inspect and understand the contents of any new instance; automated data ingestion and verification are not easily done.

To solve this problem, a number of approaches have been proposed for machine-readable descriptors that provide metadata about the syntax and semantics of the rows of data. Examples include the World Wide Web Consortium (W3C) CSV on the Web (CSVW) technical recommendation (which uses JSON format), Table Schema (also in JSON), NOAA ERDDAP's NCCSV and the British Atmospheric Data Centre's BADC-CSV (both of which use CSV text), CSV YAML (CSVY), NASA Ames Format Specification (text), possibly NcML (XML; not designed for this purpose but perhaps adaptable), and doubtless others. In each case the descriptor is either a separate sidecar file or comprises additional lines of metadata in the data file itself, prior to the actual CSV-style rows of data values.
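
The sidecar approach can be sketched as follows: a small JSON descriptor, loosely modeled on the CSVW `tableSchema` structure, tells a reader which columns to expect and how to cast them. This is an illustrative sketch and not a conformant CSVW processor; the file name, column names, and sample values are hypothetical, and only three datatypes are handled.

```python
import csv
import io
import json

# Hypothetical data file contents and sidecar descriptor, inlined as strings.
CSV_TEXT = (
    "time,station,temp_c\n"
    "2021-07-20T13:30:00Z,A1,18.4\n"
    "2021-07-20T14:00:00Z,A1,18.9\n"
)

DESCRIPTOR = json.loads("""
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "observations.csv",
  "tableSchema": {
    "columns": [
      {"name": "time", "datatype": "string"},
      {"name": "station", "datatype": "string"},
      {"name": "temp_c", "datatype": "number"}
    ]
  }
}
""")

# Only a small subset of datatypes is handled in this sketch.
CASTS = {"string": str, "number": float, "integer": int}


def read_with_descriptor(csv_text, descriptor):
    """Parse CSV rows and cast each column as the descriptor declares."""
    columns = descriptor["tableSchema"]["columns"]
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    expected = [col["name"] for col in columns]
    if header != expected:
        raise ValueError(f"header {header} does not match descriptor {expected}")
    rows = []
    for raw in reader:
        rows.append({col["name"]: CASTS[col["datatype"]](value)
                     for col, value in zip(columns, raw)})
    return rows


rows = read_with_descriptor(CSV_TEXT, DESCRIPTOR)
print(rows[0])
```

Because the header is checked against the descriptor before any row is read, a file whose structure has drifted is rejected up front, which is the kind of automated verification that is hard to do without a descriptor.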

This session will invite discussion of various approaches and their benefits or limitations including ease of creation, actual machine-readability, level of standardization, availability of tools, and breadth of community adoption.

Agenda:
  • Welcome and overview - Jeff de La Beaujardière/NCAR (15 min)
  • W3C CSV on the Web (CSVW) at Italian Ministry of Transportation - Paolo Starace/SciamLab (15 min)
  • ERDDAP's datasets.xml as a File Description System - Bob Simons/NOAA NMFS (15 min)
  • CSV YAML (CSVY) at ICARUS - Tran Nguyen/UC Davis (15 min)
  • Open discussion (30 min)

Organizers & Speakers

Jeff de La Beaujardiere

Director, Information Systems Division, NCAR
I am the Director of the NCAR/CISL Information Systems Division. My focus is on the entire spectrum of geospatial data usability: ensuring that Earth observations and model outputs are open, discoverable, accessible, documented, interoperable, citable, curated for long-term preservation... Read More →

Bridget Thrasher

Data Stewardship Coordinator, NCAR

Eric Nienhouse

SE / Product Owner, UCAR

Bob Simons

IT Specialist, NMFS Environmental Research Division
I work on ERDDAP, a free and open source data server that gives you a simple, consistent way to download subsets of gridded and tabular scientific datasets in common file formats and make graphs and maps. ERDDAP has been installed and used by more than 70 organizations around the... Read More →

Paolo Starace

Solution Architect & Co-founder, Sciamlab


Tuesday July 20, 2021 4:00pm - 5:30pm EDT
TBA

4:00pm EDT

SWEET Governance and Roadmapping working session
In this session we will openly discuss and strive to achieve consensus on how best to govern SWEET as a longstanding, domain level, semantic web resource. As such, there are several outstanding questions which either need to be addressed or past decisions confirmed and then documented (likely on the SWEET wiki). Examples, though not comprehensive, include:
1. How are SWEET issues or proposals raised?
2. What are the criteria used to evaluate SWEET issues or proposals?
3. Who evaluates the issues or proposals?
4. Is there a ‘statute of limitations’ for any such issues or proposals?
5. How does the community arrive at a decision?
6. How is that decision recorded and/or documented for the community?
7. What is put in place to help ensure every member abides by those decisions?
8. Based on discussion of previous items, is a SWEET manager required or can the community self manage under a more specific set of guidelines?

Organizers & Speakers

Bruce Caron

Executive Director, New Media Studio

Brandon Whitehead

Environmental Data Scientist, Manaaki Whenua - Landcare Research


Tuesday July 20, 2021 4:00pm - 5:30pm EDT
TBA

5:30pm EDT

Break
Tuesday July 20, 2021 5:30pm - 6:00pm EDT
TBA

6:00pm EDT

Plenary in honor of Dr. Peter Fox: X-informatics - Lessons Learned from Data and Information in Research
Informatics efforts emerged largely in isolation across a number of disciplines. This new discipline, generally cast as the science and engineering of information systems, originated in the middle of the last century and has undergone many adaptations, flourishing in the last two decades into discipline-specific fields such as geoinformatics, bioinformatics, astroinformatics, and more. Recently, certain core elements of informatics have been recognized as applicable across disciplines. Hence, efforts to systematize the common (or core, i.e. discipline-neutral) aspects of informatics have been successful; use cases, human-centered design, iterative approaches, and information models are some of the key elements. Dr. Peter Fox has been instrumental in convening the Earth science informatics community and in defining informatics and data science in the Earth sciences. He is known for his vision of “X-informatics” and of the evolution of these fields as interdisciplinary research becomes widely accepted and new challenges arise from the increased attention to data-intensive approaches in general. This includes creating or adapting informatics to address data that are high-dimensional, heterogeneous, sparse, or of uncertain quality. We would like to dedicate this session to Dr. Peter Fox, a visionary, champion, and avid explorer of boundaries when it comes to informatics and its benefits in scientific research. This session will showcase the field of informatics, its history, current research, and visions for the future, as well as the role Dr. Peter Fox has played in shaping these ideas and approaches.

Featured presentations:
  • Mineral Informatics: Analytics, Visualization, and the Legacy of Peter Fox (Robert Hazen)
  • X-informatics: making data science down to earth in the real world (Xiaogang (Marshall) Ma)

Organizers & Speakers

Robert Hazen

Senior Scientist, Carnegie Institution for Science
I am a mineralogist who, in 2015, was mesmerized by Peter Fox's vision of data-driven discovery. In the past 6 years, working closely with Peter and his students, we have been attempting to usher in an era of mineral informatics. We have been constructing large data resources and... Read More →

Xiaogang Ma

Associate Professor, University of Idaho
Xiaogang (Marshall) Ma is an associate professor of computer science at the University of Idaho. He received his Ph.D. degree in Earth Systems Science and GIScience from the University of Twente, Netherlands, in 2011, and then completed postdoctoral training in Data Science at Rensselaer... Read More →

Mark Parsons

Research Scientist, University of Alabama in Huntsville

Susan Shingledecker

Executive Director, ESIP
Susan is Executive Director of ESIP, Earth Science Information Partners, a global community of Earth science data professionals who come together to find solutions and advance data management to enable and empower the use of data to solve some of our planet's greatest challenges... Read More →


Tuesday July 20, 2021 6:00pm - 7:30pm EDT
TBA
 
Wednesday, July 21
 

11:00am EDT

ESIP Air Quality Cluster Hackathon
Since the ESIP Winter Meeting, the Air Quality Cluster has been hard at work developing several use cases for air quality data and tools. We are now ready to try developing some applications, and will be kicking off a competitive application development effort at our Summer Meeting session. Esri, an ESIP member and the company behind ArcGIS, has generously offered us the use of the ArcGIS Online (AGOL) framework for this hackathon competition, providing accounts and credits to hackathon participants to support development activities for a full month. The objective is to develop AQ data visualization applications that will aid both citizens and local governments in decision making related to air quality issues.

Using the AQ Cluster's use case as a guide, we will form three core project teams that will compete to develop the best AQ visualization application. We will conduct preparatory work during the May and June AQC meetings to help participants pre-register, set up AGOL accounts, and get briefed on using AGOL. At the Summer Meeting session, held remotely, we’ll break into teams. Each team will develop an application concept together with an implementation plan. We'll come together at the end of the session, and each team will provide a brief report. At the August 26 AQ Cluster meeting, teams will present their applications to a set of ESIP-selected judges, with prizes awarded for the most effective and most creative applications. A more complete description of the concept can be found at https://docs.google.com/document/d/19ajXwjepWzaXv0AKIXZqKnEW8nkJ7KmQC3rd1bwVuPk/edit?usp=sharing

Organizers & Speakers

Curt Hammill

Senior Account Manager, Esri
Esri Account Manager, responsible for helping NASA implement Geospatial Information Systems (GIS)-based workflows with Commercial-Off-the-Shelf software. Former U.S. Navy Captain, Nuclear Propulsion Engineer. 2011 MS in Geographic and Cartographic Sciences from George Mason University... Read More →

Beth Huffer

Information Systems Engineer, Lingua Logica

Mike Little

CISTO, NASA
Computational Technology to support scientific investigations



Wednesday July 21, 2021 11:00am - 1:30pm EDT
TBA

11:00am EDT

Graph-Based Data Science
This tutorial introduces graph-based data science work, where machine learning approaches can be combined with complementary knowledge graph work. The tutorial uses a popular library, `kglab` – an open source project that integrates RDFLib, OWL-RL, pySHACL, NetworkX, iGraph, pslpython, node2vec, PyVis, and more – to show how to use a wide range of graph-based approaches, blending smoothly into data science workflows and working efficiently with popular data engineering practices.

Within this space of open source graph libraries in Python, there are several camps: semantic graphs, probabilistic graphs, graph algorithms, graph ML, interactive visualization, etc. Previously these "camps" did not collaborate much and the libraries were difficult to integrate. We'll show how to write brief Python code to build complementary "Hybrid AI" workflows, which is ideal for strategies such as self-supervised learning. All of the training material is available as Jupyter notebooks.
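
To give a flavor of how semantic-graph and graph-algorithm work can sit side by side, here is a minimal sketch that uses only the Python standard library. It is not `kglab` code and does not reproduce the tutorial notebooks; the triples, identifiers, and helper functions are hypothetical and stand in for what the integrated libraries would provide.

```python
from collections import defaultdict, deque

# Hypothetical subject-predicate-object triples standing in for a knowledge graph.
TRIPLES = [
    ("dataset:sst", "measuredBy", "sensor:buoy42"),
    ("sensor:buoy42", "locatedIn", "region:pacific"),
    ("dataset:chl", "locatedIn", "region:pacific"),
    ("dataset:chl", "derivedFrom", "dataset:sst"),
]


def objects(triples, subject, predicate):
    """Semantic-style lookup: all objects for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def adjacency(triples):
    """Drop predicates and build an undirected adjacency map for graph algorithms."""
    graph = defaultdict(set)
    for s, _, o in triples:
        graph[s].add(o)
        graph[o].add(s)
    return graph


def hops(graph, start, goal):
    """Breadth-first search: number of edges on a shortest path, or None."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph[node] - seen:
            seen.add(nxt)
            queue.append((nxt, dist + 1))
    return None


graph = adjacency(TRIPLES)
print(objects(TRIPLES, "dataset:chl", "derivedFrom"))
print(hops(graph, "dataset:sst", "region:pacific"))
```

The same triples serve both views: the predicate-aware lookup answers a semantic question, while the predicate-free adjacency map feeds an ordinary graph algorithm.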

Organizers & Speakers

Paco Nathan

Managing Partner, Derwen, Inc.
Known as a "player/coach", with core expertise in data science, cloud computing, natural language, graph technologies; ~40 years tech industry experience, ranging from Bell Labs to early-stage start-ups. Advisor for Amplify Partners, Recognai, KUNGFU.AI. Lead committer Py... Read More →


Wednesday July 21, 2021 11:00am - 1:30pm EDT
TBA

11:00am EDT

Identifying technology capabilities that meet wildfire science and practitioner requirements
What.  This session is co-organized by the Agriculture and Climate Cluster and the Semantic Harmonization Cluster (hereafter collectively referred to as the “Clusters”).  The PDF poster on ESIP's figshare account gives the big-picture schematic of how this session relates to data-science topics like AI/ML, semantic technology, and graph database technology.

Why.  Environmental risks are increasingly resulting in disasters that cost the taxpayer dearly in terms of lives lost, incurred damages, and future liabilities. A recent study on the comprehensive cost of the 2018 California wildfires estimated damages at $150B and the loss of thousands of lives. In this session, the Clusters will lead transdisciplinary discussions focused on both science and technology topics for managing such environmental risks. Wildfire data and information should ideally be reusable and repurposable across different fire management phases (e.g. prediction, pre-fire planning, during fire, after fire, recovery). For example, infrastructure that is vulnerable to wildfire-induced floods and identified during the active fire-fighting phase should be easily discoverable by city managers weeks or even months later, when heavy rains on burn areas may trigger catastrophic debris flows that threaten lives.  Features (e.g. buildings, vegetation patches, ridgelines) identified by AI/ML algorithms from UAS imagery data and used for mitigation planning should be made discoverable for fire managers making tactical fire-fighting decisions.

How.  The proposed session addresses the following question: how can we apply data and knowledge management technologies to fulfill the needs of wildfire mitigation and response? 

In this session, you will be invited to contribute your expertise to sketch out technical solutions that can be deployed to meet the speakers' stated needs.  Your ideas will be openly accessible to individuals who may use those ideas to apply for ESIP Lab and ESIP FUNding Friday projects.

Agenda
  • [11 am] Workshop begins
  • Introduction
    • Big-picture schematic of how this session relates to data-science topics like AI/ML, semantic technology, graph database, etc.
  • Slido poll: Which of the following wildfire experiences apply to you?
  • [11:10 am] Wildfire problem statement, requirements, and some focus on planning by polygon
    • Everett Hinkley, US Forest Service, Geospatial Management Office National Remote Sensing Program Manager
      • Wildfire Mapping--Leveraging AI/ML for needed improvements: Faster delivery, improved consistency, reduced subjectivity
    • Dave Zader, Wildland Fire Administrator for The City of Boulder, CO Fire Department (retired); Wildland Fire Policy Committee member for the International Association of Fire Chiefs
      • Wildfire management and planning by polygon, a tool for improved decision-making and resources usage
    • Pier Buttigieg, Helmholtz Metadata Collaboration
      • Representing and aligning knowledge about wildfires - the need and challenge of semantic harmonization
  • [12:05 pm] Slido poll: Rank the following values-at-risk that are important to *YOUR* community: from most important (rank #1) to least important (rank #6)
  • [12:10 pm] Breakouts Part 1
    • Breakout group #1: Knowledge representation for wildfire planning and execution (Focus on Polygons)
    • Breakout group #2: Technological solutions for wildfire planning and execution
  • Short break / transition (10 min)
  • [~12:45 pm] Breakouts Part 2
    • Breakout group #1: Knowledge representation for wildfire planning and execution (Focus on Values-at-Risk)
    • Breakout group #2: Technological solutions for Wildfire Planning and Execution
  • [1:10 pm] Report out from breakout groups
  • [1:20 pm] Wrap up
  • [1:30 pm] Workshop ends

Organizers & Speakers

Brian Wee

Founder and Managing Director, Massive Connections, LLC
Transdisciplinary scientist invested in the use of environmental data and information for science, education, and decision-making for challenges at the nexus of global environmental change, natural resources, and society. Strategized and executed initiatives to engage the US Congress... Read More →

Bill Teng

NASA GES DISC (ADNET)

Ruth Duerr

Research Scholar, Ronin Institute for Independent Scholarship

Pier Luigi Buttigieg

Senior Data Scientist, Helmholtz Metadata Collaboration

Everett Hinckley

Geospatial Management Office National Remote Sensing Program Manager, US Forest Service

Dave Zader

International Association of Fire Chiefs


Wednesday July 21, 2021 11:00am - 1:30pm EDT
TBA

11:00am EDT

The Saga Continues: Cloud-Optimized Data Formats
Open science is the ability to share and reproduce analysis without sharing a computer. We recognize that users have limited resources, such as network bandwidth and memory, and that this often prevents them from thinking outside the box when it comes to scaling and sharing science. Open science presents a clear need to standardize on and deliver more cloud-friendly data formats and services. During this session, we highlight advances in cloud-friendly data and services and strive to answer some ongoing research questions about how these formats and services will support new scales of science, and do so openly.

Cloud-friendly data formats and services are central to delivering new innovation in Earth science. With cloud-optimized data formats and services, Earth scientists can achieve new scales of analyses and deliver reproducible research output and information products.
The conversation about data formats is not one that will be “closed” with a decision on “one format to rule them all”. We propose a session centered on discussions that surface new advances in data formats and standards that specifically support sharing and scaling science on the cloud. Many call these “cloud-friendly” or “cloud-optimized” formats.

Putting data on the cloud in cloud-friendly formats is a starting point. The metadata, tools, and services that support users accessing these datasets are necessary to the utility of this data. There have been new advances in cloud-friendly services as well; however, there is a lot of room for improvement. During this session, we focus not just on the data formats themselves, but on the usability of those formats made possible by the support system around them.
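
One reason chunked, cloud-friendly layouts help users with limited bandwidth and memory is that a reader only needs to fetch the chunks that overlap the region of interest. The sketch below illustrates that arithmetic in plain Python, assuming a regularly chunked array; it is not tied to any particular format discussed in the session, and the array shape, chunk shape, and function names are hypothetical.

```python
import math


def chunks_for_slice(shape, chunk_shape, start, stop):
    """Return indices of chunks overlapping the half-open box [start, stop)."""
    ranges = []
    for size, chunk, lo, hi in zip(shape, chunk_shape, start, stop):
        if not (0 <= lo < hi <= size):
            raise ValueError("slice must be non-empty and within the array")
        # First and last chunk touched along this axis.
        ranges.append(range(lo // chunk, (hi - 1) // chunk + 1))
    # Cartesian product of the per-axis chunk index ranges.
    result = [()]
    for r in ranges:
        result = [prefix + (i,) for prefix in result for i in r]
    return result


def total_chunks(shape, chunk_shape):
    """Number of chunks in the whole array (edge chunks may be partial)."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunk_shape))


# Hypothetical daily global grid: (time, lat, lon) with one time step per chunk.
shape = (365, 1800, 3600)
chunk_shape = (1, 900, 900)
needed = chunks_for_slice(shape, chunk_shape, (10, 800, 800), (11, 1000, 1000))
print(len(needed), "of", total_chunks(shape, chunk_shape), "chunks")
```

In this hypothetical layout, a small regional request for a single day touches only a handful of chunks, so only those byte ranges would need to travel over the network.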

Agenda (150 minutes):

Part 1: Lightning Talks - Provide a "lay of the land" and fodder for discussion:
  • Aimee Barciauskas, Welcome to this session: What do we mean by cloud-optimized and why does it matter?
  • 60 minutes of 7-10 minute lightning talks
    • Trevor Skaggs, Element84, will speak on Entwine Point Tile store generated for ATL06
    • Joe Roberts, NASA JPL, will speak about the Metadata Raster Format (MRF) and how it supports the features of NASA GIBS
    • Charles Stern will talk about motivations and progress made on pangeo-forge
    • Stavros Papadopoulos, creator of TileDB, will present "Time to depart from file formats and focus on engines and APIs"
    • Steve Olson and Shane Mill will talk about NOAA's EDR API and how it enables programmatic access to both conventional and cloud-optimized data formats
    • Aaron Friesz will talk about the platform NASA LPDAAC has built to leverage cloud-optimized data formats.
  • 15 minutes: organize into 3-4 sub groups for continuing the conversation on a specific topic or presentation.
15 minutes: Break 

Part 2: Small Group Discussions
- Attendees and speakers will use this time to dive into discussions, questions and expertise on a sub-topic or specific question.
  • 30 minutes: Small groups meet in a virtual sub-space. A session organizer or ESIP coordinator will meet with each small group to facilitate conversation and take notes.
  • 25 minutes: Small groups present back to the larger group what was discussed
  • 5 minutes: Wrap-up

Organizers & Speakers

Aimee Barciauskas

Data Engineer, Development Seed

Rich Signell

Research Oceanographer, USGS

Robert Casey

Deputy Director of Cyberinfrastructure, IRIS Data Services
Rob currently serves as Deputy Director of Cyberinfrastructure at the Incorporated Research Institutions for Seismology (IRIS) Data Management Center (DMC) in Seattle, WA. His responsibilities include management of software development and data services activities as well as leading... Read More →

Aaron Friesz

LP DAAC/USGS

Steve Olson

NOAA
I work for the National Weather Service (NWS) Meteorological Development Laboratory (MDL).  MDL conducts applied research and development for the improvement of diagnostic and prognostic weather information; data depiction and utilization; warning and forecast product preparation... Read More →

Shane Mill

Senior Web Developer, NOAA
Shane Mill has been an Application Developer within the Weather Information and Applications Division of the Meteorological Development Lab of the National Weather Service since September of 2018. Since joining MDL, Shane has prototyped ways that existing standards can enhance operational... Read More →

Joe Roberts

Science Data Visualization, Technical Lead, NASA JPL

Trevor Skaggs

Element 84

Charles Stern

Data Infrastructure Engineer, Lamont-Doherty Earth Observatory


Wednesday July 21, 2021 11:00am - 1:30pm EDT
TBA

1:00pm EDT

Teacher Workshop: Exploring Earth, Wind and Fire via Earth Science Data
The Earth Science Information Partners (ESIP) Education Committee will host a virtual workshop for 50 educators on Tuesday, July 20 and Wednesday, July 21 (1:00 to 5:00pm EDT on both days). ESIP members will share an educational resource and lead participants through an activity using Earth science data to explore phenomena via different types of data. Tools and resources include:
  • The NOAA CrowdMag app,
  • NASA’s Earth System Data Explorer,
  • UNAVCO Velocity Viewer,
  • NOAA CIMSS satellite data activities,
  • NASA SEDAC Hazards Mapper and HazPop App,
  • En-ROADS Climate Decision Model,
  • The Concord Consortium Wildfire Module, and
  • The “Out 2 Lunch” archive: Earth Science webinar demonstrations of data tools and resources
Participating STEM educators will also be eligible to apply for $500 implementation grants!
What better way to inspire innovation in Earth science data frontiers than training the teachers who educate our youth?

Agenda/Teacher Road Map: https://docs.google.com/document/d/1FGACsWSHPTXS8nEAXaTjpHB_-201xkfYsJDuOkAY9Rc/edit?usp=sharing

Organizers & Speakers

Shelley Olds

Science Education Specialist, UNAVCO
Data visualization tools, Earth science education, human dimensions of natural hazards, disaster risk reduction (DRR), resilience building.

Elizabeth Joyner

Community Coordinator, SSAI, Goddard Space Flight Center, NASA
Elizabeth Joyner joined the Earth Science Data Systems (ESDS) Program Communications Team in 2022 as the Community Coordinator and works across the program to promote the use of NASA data and resources with end users. She previously served as the Senior Outreach Coordinator for NASA... Read More →

Trinity Foreman

Comms Consultant, ICMS LLC
Trinity Foreman supports the educational outreach and social media output of NOAA's NCEI. NCEI hosts and provides public access to one of the most significant archives for environmental data on Earth, and Trinity Foreman works to increase the accessibility of NCEI's data tools and... Read More →

Tamara Ledley

STEM Consultant & Adjunct Professor, Sustaining Science Consulting & Bentley University
I am interested in moving ESIP forward in broadening the reach of “making data matter” into communities and organizations for whom Earth science data and information is essential to their decision making processes. Much of my work has focused on making Earth and climate science... Read More →

Carla McAuliffe

Educational Researcher and Curriculum Developer, TERC

Margaret Mooney

Education Director, NOAA's Cooperative Institute for Meteorological Satellite Studies

Robert Downs

Senior Digital Archivist, Columbia University
Dr. Robert R. Downs serves as the senior digital archivist and acting head of cyberinfrastructure and informatics research and development at CIESIN, the Center for International Earth Science Information Network, a research and data center of the Columbia Climate School of Columbia... Read More →

Becky Reid

Faculty, Cuesta College
I discovered ESIP in the summer of 2009 when I was teaching science in Santa Barbara and attended the Summer meeting there. Ever since then, I have been volunteering with the ESIP Education Committee in various capacities, serving as Chair in 2013, 2019, and 2020.



Wednesday July 21, 2021 1:00pm - 5:00pm EDT
TBA

1:30pm EDT

Break
Wednesday July 21, 2021 1:30pm - 2:30pm EDT
TBA

2:30pm EDT

ESIP Air Quality Cluster Hackathon (Continued)
Since the ESIP Winter meeting, the Air Quality Cluster has been hard at work developing several use cases for air quality data and tools. We are now ready to try developing some applications, and will be kicking off a competitive application development effort at our Summer Meeting session. Esri, an ESIP member, and the company behind ArcGIS, has generously offered us the use of the ArcGIS Online (AGOL) framework for this hackathon competition, providing accounts and credits to hackathon participants to support development activities for a full month. The objective is to develop AQ data visualization applications that will aid both citizens and local governments in decision making related to Air Quality issues.

Using the AQ Cluster's use case as a guide, we will form three core project teams that will compete to develop the best AQ visualization application. We will conduct preparatory work during the May and June AQC meetings to help participants pre-register, set up AGOL accounts, and get briefed on using AGOL. At the Summer Meeting session, held remotely, we’ll break into teams. Each team will develop an application concept together with an implementation plan. We'll come together at the end of the session, and each team will provide a brief report. At the August 26 AQ Cluster meeting, teams will present their applications to a set of ESIP-selected judges, with prizes awarded for most effective and most creative. A more complete description of the concept can be found at https://docs.google.com/document/d/19ajXwjepWzaXv0AKIXZqKnEW8nkJ7KmQC3rd1bwVuPk/edit?usp=sharing

View Notes

Organizers & Speakers

Curt Hammill

Senior Account Manager, Esri
Esri Account Manager, responsible for helping NASA implement Geospatial Information Systems (GIS)-based workflows with Commercial-Off-the-Shelf software. Former U.S. Navy Captain, Nuclear Propulsion Engineer. 2011 MS in Geographic and Cartographic Sciences from George Mason University... Read More →

Beth Huffer

Information Systems Engineer, Lingua Logica

Mike Little

CISTO, NASA
Computational Technology to support scientific investigations


Wednesday July 21, 2021 2:30pm - 5:00pm EDT
TBA

2:30pm EDT

Community Data Cluster Curate-a-thon
In this working session, we will work together to curate public but obscure data about Flint water quality and prepare it for deposit in a suitable repository. The data is currently stored in a series of spreadsheets. Our curation efforts will involve creating metadata, cleaning the datasets, and making them easier to understand. Our hope is to deposit this data in Open Data Flint for easier access by journalists and the general public, but other repositories are an option. https://www.icpsr.umich.edu/web/pages/odf/index.html

View Notes

Organizers & Speakers

Stephen Diggs

Sr. Research Data Specialist, University of California Office of the President
ORCID: 0000-0003-3814-6104 https://cchdo.io

Andrea Thomer

Assistant Professor, University of Michigan School of Information
I'm an information scientist interested in biodiversity and earth science informatics, natural history museum data, data curation, information organization, and computer-supported cooperative work! 


Wednesday July 21, 2021 2:30pm - 5:00pm EDT
TBA

2:30pm EDT

New Frontiers in AI for Earth and Space: Big Data and Parallel Computing
AI is lauded as a powerful tool for gaining insights and producing knowledge from the massive datasets we have access to today in the Earth sciences. One of the major challenges of integrating AI practices in the Earth and Space Sciences is the immense size of environmental and climate data. Intensive computational power is required for AI to efficiently learn from such massive amounts of data. The key question here, then, is what are the best strategies to make AI work and what kind of infrastructural constraints does the community face as a result? There are many parallel computing frameworks, e.g., GPU, Dask, Spark, Hadoop, CUDA, JobLib, ipyparallel, dispy, Ray, etc., to assist with this challenge today. But which one is suitable for different use cases in Earth and Space sciences? On various deployment platforms such as HPC, Azure, AWS, GCP, institutional clusters, individual servers, or even personal computers, what is the best way to configure the environment for carrying out AI tasks on large spatial datasets?

This series consists of two sessions. The first session will invite speakers with experience implementing AI at scale to share and communicate with the ESIP community working with parallel computing. We will collect a series of key strategies these speakers have used to move our research forward on AI4Earth&Space. In the second session, we will conduct a thorough step-by-step tutorial, from environment setup (e.g., Dask-ML) to training and testing AI using parallel computing on large datasets, to equip the Earth and Space science community with some hands-on experience.
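
As a rough illustration of the chunk-and-combine pattern that many of the parallel frameworks named above generalize, here is a minimal sketch using only the Python standard library. It is not taken from the session materials. The function names and the synthetic data are assumptions made for illustration, and a thread pool stands in for a real cluster scheduler such as Dask or Ray. For CPU-bound work, a process pool or one of those frameworks would normally be swapped in.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, chunk_size):
    """Yield consecutive slices of at most chunk_size items."""
    for start in range(0, len(values), chunk_size):
        yield values[start:start + chunk_size]


def chunk_summary(chunk):
    """Return a partial result (sum, count) so that means combine exactly."""
    return sum(chunk), len(chunk)


def parallel_mean(values, chunk_size=1000, workers=4):
    """Map chunk_summary over the chunks in a pool, then reduce the partials."""
    chunks = list(split_into_chunks(values, chunk_size))
    # Thread pool used for portability; it is a stand-in, not a Dask cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(chunk_summary, chunks))
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count


if __name__ == "__main__":
    # Synthetic stand-in for a large gridded variable (hypothetical data).
    temperatures = [280.0 + (i % 50) * 0.1 for i in range(10_000)]
    print(parallel_mean(temperatures))
```

Each worker returns a (sum, count) pair rather than a per-chunk mean, so the final reduction stays correct even when the last chunk is shorter than the others.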

Session 1: Talks (2.30 - 3.30pm)

1. Tom Augspurger, Microsoft
Title: Scalable Geospatial Analysis
Working with geospatial data can be challenging, regardless of the scale. We'll see how Microsoft's Planetary Computer is using STAC and Dask to facilitate large-scale geospatial data analysis. We'll use the Planetary Computer's STAC catalog to find the data matching some conditions, and a Dask cluster to process the data in parallel.

2. James Bednar, Director of Technical Consulting, Anaconda, Inc.
Title: How reproducible do you want your code to be?
Unless your simulation or analysis is reproducible, you can't be sure your results mean anything. But how reproducible does it need to be, across hardware, software environments, people, organizations, and time? I'll present a quick overview of the levels to choose from, along with a suggested way to achieve each one using Conda environments with Python.

3. Ryan McGranaghan, Data Scientist/Aerospace Engineering Scientist, ASTRA LLC
Title: A survey of Cloud solutions for the Earth and Space Sciences
The Cloud has the potential to transform the way we collaborate and share science and to push the boundaries of what is possible with scientific computing. Cloud-based data science platforms are now being used to address challenges in the field of AI. Indeed, the Earth and Space Sciences are in an intense period of experimentation applying these platforms to more capably use AI for prediction and discovery. We will explore selected existing Cloud-based environments for the Earth and Space Sciences, particularly for the myriad components of the AI project lifecycle. We will use the survey of solutions to surface the gaps and trends in this rapidly evolving landscape.

4. Ziheng Sun, Research Assistant Professor, George Mason University
Title: ESIP Geoweaver Update, Machine Learning Cluster Activity Overview & Future Plan
Automation of full-stack workflows has become popular as Earth data volume increases exponentially and the complexity of Earth system models and algorithms becomes more difficult to manage and facilitate. The latest developments in AI/ML techniques bring many new opportunities to significantly improve accuracy, increase model resilience and intelligence, and reduce overall cost. However, managing and automating AI experiments is a grand challenge for the entire Earth science community. Geoweaver is software developed to tackle this problem. We will show how to use Geoweaver to create AI workflows in one place and run the processes on various distributed platforms, separate code from computing resources for resilience, record the provenance of every workflow execution, and share and reuse workflows to boost knowledge accumulation and discovery.

5. Cindy Lin, Postdoctoral Fellow, Cornell University 
Title: AI Ethics in Context
It has been broadly established by computer scientists working on AI in the environmental sciences that physical and computer science researchers pay more attention to the performance of AI-based models and less to how end users trust AI models (McGovern 2020). Accordingly, a lot of what makes an AI model usable depends on its trustworthiness; what is considered trustworthy may differ according to the needs of end user groups such as private industry and government. In this talk, I will discuss how a conundrum of political and socioeconomic factors, apart from the needs of end users, enable the establishment of AI trustworthiness in Indonesia. In particular, I provide an ethnographic account of a public-private partnership between an American IT firm and one of Indonesia’s leading engineering agencies, where new AI technologies are developed to address one of the world’s largest environmental concerns: tropical peatland fires.

Session 2: Demos (3.45 - 5.00pm)
1. Tom Augspurger, Microsoft
Demo Title: Scalable Geospatial Machine Learning with Dask and STAC
Abstract: In this workshop, attendees will work through several exercises to train a deep learning model to predict crop types using satellite imagery. We’ll work on a JupyterHub deployed to Azure, and will access data from the Microsoft Planetary Computer data catalog.

Preparation: Attendees do not need to prepare anything ahead of time. They will be provided with credentials to log into a JupyterHub during the session. 

The materials will all be at https://github.com/TomAugspurger/esip-summer-2021-geospatial-ml

2. James Bednar, Director of Technical Consulting, Anaconda, Inc.

Demo title: Using hvPlot for interactive plotting of Xarray, Pandas, and Dask data in Jupyter
Xarray and Pandas support calling .plot() to get basic matplotlib plots, and here we'll show you how to use the same commands to explore even the largest cloud or remote datasets fully interactively. hvPlot makes it easy to get small multiples, overlays, layouts, and categorical plots, with dynamic regridding of large datasets so that you can explore them in any browser. New hvPlot features now also let you replace just about any number or string in an xarray or pandas method or expression with a widget, so that you can quickly try out the effect of various parameters or dynamically filter your data to help you understand it.

Preparation: Please follow the installation instructions at https://holoviz.org/installation.html

3. Ziheng Sun, Research Assistant Professor...

View Notes

Organizers & Speakers

James Bednar

Director of Technical Consulting, Anaconda, Inc.
I work on HoloViz.org and PyViz.org, and am happy to chat about anything to do with visualizing data in Python.

Tom Augspurger

Microsoft
Tom is a software engineer working at Microsoft on the Planetary Computer and is a member of the Pangeo Steering Council. Tom helps maintain several open-source libraries in the scientific Python ecosystem, including pandas and Dask.

Annie Burgess

Lab Director, ESIP

Julien Chastang

Software Engineer, UCAR - Unidata
Scientific software developer at UCAR-Unidata.

Douglas Rao

Research Scientist, CISESS/NCICS/NCSU
I am currently a Research Scientist at North Carolina Institute for Climate Studies, affiliated with NOAA National Centers for Environmental Information. My current research at NCICS focuses on generating a blended near-surface air temperature dataset by integrating in situ measurements... Read More →

Ziheng Sun

Research Assistant Professor, George Mason University
My research interests are mainly on geospatial cyberinfrastructure and machine learning in atmospheric and agricultural sciences.

Ryan McGranaghan

Data Scientist/Aerospace Engineering Scientist, ASTRA LLC
Space scientist, engineer, data scientist, designer, podcast host. Observer of beauty in liminal spaces. I believe in being led around by your curiosity.

Cindy Lin

Postdoctoral Fellow, Cornell University
Cindy Lin is a Postdoctoral Fellow at the Atkinson Center for Sustainability, affiliated with the Department of Information Science. In Fall 2022, she will be an assistant professor at Pennsylvania State University’s College of Information Sciences and Technology. Her current research... Read More →



Wednesday July 21, 2021 2:30pm - 5:00pm EDT
TBA

2:30pm EDT

Would This Work for You? Help Us Finalize the Look & Feel of the Innovative Resource Assessment Capability of the Data Management Training Clearinghouse
The Institute of Museum & Library Services (IMLS) National Leadership Grant-funded and ESIP-supported Data Management Training Clearinghouse (DMTC) has added a new Resource Assessment capability to the DMTC’s resource description pages! The Resource Assessment capability is an innovative addition to the DMTC, as very few, if any, existing registries offer the option for a user of an educational resource (whether a learner or a trainer/instructor) to provide feedback to the resource creator or other potential users. The user assessments will provide feedback on the user’s reaction to the resource and the perceived impact of its use upon their data skills capacity. Before finalizing the design and navigation of this capability, however, the DMTC team and Advisory Board members would like to test the usability of the pages where the Assessment option appears in the new DMTC user interface (e.g., resource description pages). During this working session, ESIP Community members are invited to try out the pages using structured usability testing procedures. Session attendees will have options for performing moderated tasks, documenting the paths taken to fulfill specific needs, and commenting upon the utility, design, and look and feel of the pages. The feedback offered by the session attendees will significantly inform final tweaks to the design of the DMTC (and be GREATLY appreciated) as we close out our IMLS-funded project and envision continued development and maintenance of the platform moving forward.

View Notes

Organizers & Speakers

Karl Benedict

Director of Research Data Services and IT Services, University of New Mexico, University Libraries
Since 1986 I have had parallel careers in Information Technology, Data Management and Analysis, and Archaeology. Since 1993 when I arrived at UNM I have worked as a Graduate Student in Anthropology, Research Scientist, Research Faculty, Applied Research Center Director, and currently... Read More →

Nancy Hoebelheinrich

Principal/Information Analyst, Knowledge Motifs LLC
See my LinkedIn profile at: https://www.linkedin.com/in/nancy-hoebelheinrich-0576ba3

Sophie Hou

Data & Usability Analyst, Apogee Engineering/USGS
user-centered design (UI/UX) and data management/curation/stewardship: including but not limited to data life cycle, policies, sustainability, education and training, data quality, and trusted repositories.


Wednesday July 21, 2021 2:30pm - 5:00pm EDT
TBA

5:00pm EDT

Break
Wednesday July 21, 2021 5:00pm - 6:00pm EDT
TBA

6:00pm EDT

Research Showcase
The Research Showcase features virtual posters and recorded demos and tutorials from ESIP Meeting Attendees. You can view all contributions at any time during the meeting and leave questions or comments for the contributors. LIVE Q&A: We encourage you to attend the live Research Showcase Session (Wednesday July 21st, 6-7:30 pm ET) to chat live with contributors.

Check out the List of Presentations.

Wednesday July 21, 2021 6:00pm - 7:30pm EDT
TBA
 
Thursday, July 22
 

11:00am EDT

Best Practices for Reusability of Machine Learning Models: Guideline and Specification
Machine Learning (ML) is the frontier in revolutionizing how we conduct research across all Earth Science disciplines. ML techniques including deep learning are increasingly being applied to problems in Earth science for classification, regression, or clustering applications. Development of ML models involves choices of model architecture, training data, hyperparameters, training techniques and so on. These steps complicate reproducibility of ML model results unless the developer has shared a detailed description and code of their application. In addition, many ML model development efforts rely on existing modeling architectures and “pre-trained” models on benchmark data. Therefore, it is necessary to develop a set of guidelines for researchers and developers in the Earth science community to follow for publishing their models to ensure they are reproducible and reusable. This would also require a model metadata specification to enable cataloging and discoverability of models.

In this session, speakers will present the latest developments for FAIR principles as they relate to ML models and the activities of the NASA ESDS WG on model reusability, and introduce the Geospatial ML Model Catalog (GMLMC), to familiarize participants with these efforts and engage them in advancing the guideline and specification. In the second part of the session, participants will join a set of breakout sessions and discuss questions related to FAIR principles and reusability of ML models.
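
To make the idea of a model metadata record more concrete, here is a minimal sketch of what one could look like in Python. It is not the GMLMC specification or any output of the working group. The class name, the field names, and the example values are all hypothetical, chosen only to mirror the choices listed above (architecture, training data, hyperparameters, metrics, code).

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ModelMetadata:
    """Hypothetical catalog record; field names are illustrative only."""
    name: str
    architecture: str
    training_data: str  # identifier or citation for the training set
    hyperparameters: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)
    code_url: str = ""
    license: str = ""

    def missing_fields(self):
        """Return the names of fields left empty, as a completeness check."""
        return [key for key, value in asdict(self).items()
                if value in ("", {}, None)]

    def to_json(self):
        """Serialize the record so it could be stored or exchanged."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Example values below are made up for illustration.
record = ModelMetadata(
    name="example-crop-classifier",
    architecture="U-Net",
    training_data="example-training-set-v1",
    hyperparameters={"learning_rate": 0.001, "epochs": 20},
)
print(record.missing_fields())
print(record.to_json())
```

A completeness check like `missing_fields` is one simple way a catalog could flag records that lack the information needed for reproduction, such as metrics or a code location.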

Agenda:
11:00 am - Welcome, Introduction & logistics
11:05 am - The road toward defining FAIR for Machine Learning, Fotis Psomopoulos 
11:15 am - ML Model Reusability ESDWG, Sanjay Purushotham, Hamed Alemohammad
11:25 am - Geospatial Machine Learning Model Catalog (GMLMC), Jon Duckworth
11:35 am - Breakout Sessions
Q&A for breakout sessions:
  1. How should FAIR be applied to ML? What changes/definitions are needed? Should this address only ML models, and/or also processes, and/or also platforms, etc.? 
  2. How much do you reuse your ML or other models? What are the good practices you follow for model reusability and/or reproducibility?
  3. What tools and specifications are needed to study if model reusability is feasible?  
  4. What metadata related to model metrics should be included in the GMLMC to communicate performance and possible bias?
  5. What properties should be considered as metadata to capture requirements of the training fragment in GMLMC? Should there be different requirements for reproducing the model results vs reusing the model?
12:10 pm - Breakout report-out and Q&A
12:30 pm - Closing remarks

View Notes

Organizers & Speakers

Hamed Alemohammad

Executive Director and Chief Data Scientist, Radiant Earth Foundation

Jon Duckworth

Tech Lead & Geospatial Software Engineer, Radiant Earth Foundation
Jon Duckworth is passionate about making geospatial data and tools accessible to people and organizations who are working to make the world a better place. He has extensive experience building scalable data pipelines to support machine learning on satellite imagery and bringing geospatial... Read More →

Sanjay Purushotham

University of Maryland Baltimore County

Fotis Psomopoulos

Researcher, Institute of Applied Biosciences, Centre for Research and Technology Hellas


Thursday July 22, 2021 11:00am - 12:30pm EDT
TBA

11:00am EDT

Foraging for Dataset-Usage Relationships
Over the last year, the Discovery Cluster has been developing an innovative search paradigm called Usage-Based Discovery (UBD). UBD allows users to examine the datasets used in applications and research similar to the user’s own purpose. The database underpinning UBD needs a robust population of dataset-usage relationships. Please join us in providing relationships in real-time that will serve as a core population. These high quality relationships will also serve as training data for further machine-learning-based harvesting. Training, examples, and coaching will be provided to participants.
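
For readers unfamiliar with the idea, a dataset-usage relationship can be pictured as a simple pair linking a dataset to a purpose it has served. The sketch below is a toy in-memory index written for illustration. It is not the Discovery Cluster's UBD database, and the class name, method names, and identifiers are all assumptions.

```python
from collections import defaultdict


class UsageIndex:
    """Toy in-memory store of (dataset, usage) relationships."""

    def __init__(self):
        self._datasets_by_usage = defaultdict(set)
        self._usages_by_dataset = defaultdict(set)

    def add(self, dataset_id, usage):
        """Record that a dataset was used for a given purpose."""
        key = usage.strip().lower()  # normalize so spelling variants match
        self._datasets_by_usage[key].add(dataset_id)
        self._usages_by_dataset[dataset_id].add(key)

    def datasets_for(self, usage):
        """List datasets recorded against a usage, sorted for stable output."""
        key = usage.strip().lower()
        return sorted(self._datasets_by_usage.get(key, set()))

    def related_datasets(self, dataset_id):
        """List other datasets that share at least one usage with dataset_id."""
        related = set()
        for usage in self._usages_by_dataset.get(dataset_id, set()):
            related |= self._datasets_by_usage[usage]
        related.discard(dataset_id)
        return sorted(related)


# Hypothetical identifiers and usages, for illustration only.
index = UsageIndex()
index.add("dataset-a", "Flood mapping")
index.add("dataset-b", "flood mapping ")
index.add("dataset-c", "crop yield")
print(index.datasets_for("FLOOD MAPPING"))
print(index.related_datasets("dataset-a"))
```

Searching by usage rather than by dataset keyword is the core of the paradigm described above: a user starts from a purpose similar to their own and works back to the datasets others relied on.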

View Notes

Organizers & Speakers

Christopher Lynnes

Researcher, Self
Christopher Lynnes recently retired from NASA as System Architect for NASA’s Earth Observing System Data and Information System, known as EOSDIS. He worked on EOSDIS for 30 years, over which time he worked on multiple generations of data archive systems, search engines and interfaces... Read More →

Doug Newman

Data Systems Deputy Technical Manager, NASA ESDIS


Thursday July 22, 2021 11:00am - 12:30pm EDT
TBA

11:00am EDT

Physical Samples Cluster Working Session
This session is a working session of the Physical Samples Curation Cluster. During this session, we will report on related activities in other ESIP Clusters, updates from the relevant external Sample Communities (iSamples, SESAR, ESS DIVE, RDA, IMLGS, etc.) and continue efforts from the Cluster monthly meetings.

To continue efforts from the monthly meetings, we will have focused discussions on three topics. First, identified at the 2021 ESIP Winter Meeting, the development of recommendations for samples for journals and publishers. Second, we will review the landscape of ongoing efforts related to sample metadata (including top level categorizations, controlled vocabularies, and registration workflows). Finally, we will provide space for discussing leading practices and community questions (e.g. recommendations for identifiers and barcodes to streamline sample analysis and publications workflows).

The Physical Samples Curation Cluster is a forum for the community supporting physical samples in the earth, space, and environmental sciences which includes but is not limited to geological and biological samples. The cluster’s goal is to enhance discoverability, access, and use of sample collections.

Meeting notes - https://bit.ly/36VARj7
Session slides - https://bit.ly/3zbz4Cx


Agenda
  • Introduction (5 minutes)
  • Presentations on related activities (30 minutes)
    • iSamples and Sampling Nature 
    • RDA  Physical Samples and Collections in the Research Data Ecosystem IG
    • Research Artifact Citation Cluster update
    • COPDESS Cluster update
    • Other
  • Breakout activity (45 minutes)
    • Recap/introduction to breakout activity (5 minutes)
    • Breakout group discussions (30 minutes)
    • Group reports (10 minutes)
  • Closing/Wrap Up (10 minutes)

Organizers & Speakers

Sarah Ramdeen

Data Curator, Columbia University

Joan Damerow

Research Scientist, Lawrence Berkeley National Lab

Val Stanley

Antarctic Core Curator, Oregon State University


Thursday July 22, 2021 11:00am - 12:30pm EDT
TBA

12:30pm EDT

Break
Thursday July 22, 2021 12:30pm - 1:30pm EDT
TBA

1:30pm EDT

Plenary: Frontiers of Exploration & Data Management on Mars
  • Frontiers in Mars Exploration (Hiro Ono)
  • Mars Sample Return Tube Documentation – or, There and Back Again (Sara Bond)

Organizers & Speakers

Masahiro (Hiro) Ono

Group Lead, Robotic Surface Mobility Group, NASA Jet Propulsion Laboratory
Masahiro (Hiro) Ono is the Group Leader of the Robotic Surface Mobility Group. His broad interest is centered around the application of autonomy for space missions, with an emphasis on enhancing the safety, efficiency, and performance of robotic mobility through the applications... Read More →

Sara Bond

Information Science Specialist, NASA Jet Propulsion Laboratory

Susan Shingledecker

Executive Director, ESIP
Susan is Executive Director of ESIP, Earth Science Information Partners, a global community of Earth science data professionals who come together to find solutions and advance data management to enable and empower the use of data to solve some of our planet's greatest challenges... Read More →


Thursday July 22, 2021 1:30pm - 3:00pm EDT
TBA

3:15pm EDT

Coffee Break Networking
Thursday July 22, 2021 3:15pm - 3:45pm EDT
TBA

3:15pm EDT

FUNding Friday Idea Sharing & Working Session
Join us HERE to ask questions, share your ideas, and get to work on your poster!  



Thursday July 22, 2021 3:15pm - 3:45pm EDT
TBA

4:00pm EDT

AI Data Readiness: What Does ML Training Data Interoperability Mean to You? Examples and Use Cases
The results from machine learning are only as good as their training data. At the same time, it’s difficult and time consuming to develop quality training data. It would be valuable if we could reuse training data in other contexts. What is necessary to make that happen? 

In this session we explore several examples of preparing and sharing ML training data and then consider whether there are certain attributes or processes that we can standardize in order to make training data more interoperable.

Presentations:
David Roy, Univ. Mich., on the reuse of burned area data from Landsat
Gabriel Tseng, Univ MD., on the collection and later sharing of training data on agricultural conditions
Christian Schroeder de Witt, Univ. Oxford, on a benchmark data set for precipitation prediction

Community exercise to refine most effective ways to enhance ML training data reusability and readiness.

View Notes

Organizers & Speakers

Mark Parsons

Research Scientist, University of Alabama in Huntsville

Aleksandar Jelenek

The HDF Group

Tyler Christensen

Data Management Architect, NOAA / NESDIS

Douglas Rao

Research Scientist, CISESS/NCICS/NCSU
I am currently a Research Scientist at North Carolina Institute for Climate Studies, affiliated with NOAA National Centers for Environmental Information. My current research at NCICS focuses on generating a blended near-surface air temperature dataset by integrating in situ measurements... Read More →


Thursday July 22, 2021 4:00pm - 5:30pm EDT
TBA

4:00pm EDT

Dynamic Soil Information Systems: where we are, where we need to go, and why
Soil observations are as old as agriculture and even more relevant under a carbon-dioxide-driven climate. With the promise of new AI and machine learning methods, accurate and timely data becomes even more valuable for scientific insight and data-driven policy work. Yet there remain substantial barriers to soil data discovery, access, integration, and reuse. Many of these challenges are driven by the diversity in measurements, methods, and scales inherent in soils. In this session we will hear about current efforts to address these challenges. From new ontologies and semantic tools, to data formatting, to what data measurements are poised to drive novel insights, this session will focus on efforts in the US and around the world to create a dynamic soil information system for the 21st century.

View Notes

Organizers & Speakers

Kathe Todd-Brown

Assistant Professor, University of Florida
I'm a computational biogeochemist who uses data and mathematics to study how dirt breathes.

Luís de Sousa

Federal University of Rio Grande do Sul



Thursday July 22, 2021 4:00pm - 5:30pm EDT
TBA

4:00pm EDT

GeoScience Ontology Landscape
Community adoption of some representation of geoscience concepts and relationships is an important step towards streamlining data integration and interoperability of geoscience data across domains. This session is intended to understand the requirements and applications that motivate several geoscience-related ontologies, and to promote conversation on the differences and similarities between them. The session will include overview presentations of some current geoscience and related ontologies: 
1) OntoGeonous ontology (Lombardo et al., 2018): an implementation of the GeoSciML conceptual model, with application to geologic maps.
2) GeoCore ontology (Garcia et al., 2020, https://doi.org/10.1016/j.cageo.2019.104387): a geoscience ontology with applications in petroleum exploration.
3) GeoScience Ontology (https://github.com/Loop3D/GKM): developed for the Loop3D project to provide background knowledge support for implicit generation of 3D geologic models (Brodaric, Richard, GSC OFR, https://doi.org/10.4095/328296).
4) SWEET and ENVO: two widely used ontologies with broad scope; SWEET is managed by an ESIP Cluster, and efforts have been under way to update this ontology and align it with ENVO.

The major discussion point is whether the use cases that have motivated these ontologies require different solutions, or whether some convergence is possible so that a more integrated and harmonized set of shared knowledge representations can be developed to promote interoperability.

View Notes

Organizers & Speakers

Stephen Richard

Geoinformatics consultant, personal
Stephen Richard is an independent contractor working from Tucson Arizona. He is currently involved in projects to implement a Geoscience ontology for the Loop3D project, the Technical Team for the EarthCube Office, and applications of geoscience vocabularies in AI applications. Interests... Read More →

Brandon Whitehead

environmental data scientist, Manaaki Whenua - Landcare Research

Luan Fonseca Garcia

Researcher, UFRGS
I'm a computer scientist focused on the development of ontologies for geosciences.

Alizia Mantovani

Consiglio Nazionale delle Ricerche -IGG



Thursday July 22, 2021 4:00pm - 5:30pm EDT
TBA

5:30pm EDT

Break
Thursday July 22, 2021 5:30pm - 6:00pm EDT
TBA

6:00pm EDT

Toward Improving Representation of Data Quality Information
Session Agenda:

    Invited Presentations (1 hour) (see abstracts below)
        1. “Making Data Decision Ready” by David Green, Program Manager of the NASA Disaster Applications program
        2. “Creating Trust in Earth Observation Data” by Jasmine Muir, FrontierSI, Australia
        3. “Evolving Operational Readiness Levels within ESIP’s Disaster Lifecycle Cluster” by Dave Jones/Karen Moe, ESIP Disasters Lifecycle Cluster
        4. “ESIP Information Quality Cluster Overview and Recent Efforts” by Yaxing Wei et al., ESIP Information Quality Cluster

    Panel Discussion (30 mins)


Session Description:

The ESIP Information Quality Cluster (IQC) has been collaborating across ESIP clusters and beyond ESIP with national and international domain experts on a number of fronts toward establishing a baseline of standards and best practices for Earth science data quality. These efforts include 1) describing state-of-the-art practices and establishing recommendations to further promote the quantification, characterization, communication, and use of uncertainty information for broad classes of Earth science data, including on-orbit, airborne, field, and assimilated/modeled data; 2) developing community guidelines for consistently curating and representing dataset-level quality information; and 3) identifying challenges and potential approaches for improving citizen science data quality. The development of uncertainty recommendations and quality information guidelines is driven by community needs, and the expected outcomes should have a high impact on the community. For example, the IQC is partnering with the ESIP Disasters Lifecycle Cluster (DLC) to mature a framework for determining the Operational Readiness Levels (ORLs) for data products driving disaster management decision-making. In this session, we will share updates with the ESIP community on the current status of those efforts and further strengthen the collaboration between the IQC and other clusters of ESIP to demonstrate the implications of recommendations and guidelines being developed by the IQC.

View Notes

Presentation Abstracts:

1. Making Data Decision Ready (David Green)
It is true that data drives decisions, but not just any data. The NASA Disasters program embraces moving toward improvements in capturing, representing, and enabling data quality for risk reduction. Relevant and diverse data types must be discoverable sooner, more useful in complex scenarios, and used by a wider range of actors. These aims have motivated improvements to lower latency for faster dissemination of warnings and forecasts as well as higher resolution to increase local awareness but have not addressed analysis and decision readiness. From a Disasters program perspective, accurate, precise, and fit-for-use hazard data is still not sufficient since disaster “risk” is the consequence of vulnerability, exposure, and coping capacity. The NASA Disasters program utilizes an earth systems perspective and a user-centric approach, which advocates for data interoperability and open geospatial standards to facilitate integration and analysis readiness. Similarly, there have been tremendous improvements in geospatial information systems and frontier technologies, including collaborative tools for sharing, cloud processing, artificial intelligence, visualization, and natural language. We are moving closer to having a portfolio of data capabilities and attributes that make data decision ready, but the issues of representing quality must also mirror the shift we have experienced in supporting the Sendai Framework for Disaster Risk Reduction, specifically data quality for improved understanding of systemic risk and risk management. Metrics of data quality must inform the choices people make throughout the disaster management cycle, the situational awareness support needed during response, and the confidence in guidance as actions evolve. Certainly, within an environmental and socio-economic context the Disasters Program is starting to see the feasibility of quality in decision ready data. 
Knowing the data quality can also incentivize and support collaborative decision making with the increasing variety and velocity of data needed to meet measurable thresholds for guiding smarter and more resilient actions.

2. Creating Trust in Earth Observation Data (Jasmine Muir)
The presentation will provide an introduction to the Australian and New Zealand Data Quality Interest Group and their work on community standards for FAIR and Quality data. The standards will be demonstrated through a use case on creating trust in satellite Earth observation data and derived products.

Organizers & Speakers

Robert Downs

Senior Digital Archivist, Columbia University
Dr. Robert R. Downs serves as the senior digital archivist and acting head of cyberinfrastructure and informatics research and development at CIESIN, the Center for International Earth Science Information Network, a research and data center of the Columbia Climate School of Columbia... Read More →

Dave Jones

CEO, StormCenter Communications
GeoCollaborate is an SBIR Phase III technology (yes, it's a big deal) that enables real-time data access through web services, sharing and collaboration across multiple platforms. We call GeoCollaborate a 'Collaborative Common Operating Picture' that empowers decision making, situational... Read More →

Karen Moe

Retired, NASA
Managing an air quality monitoring project for my town just outside of Washington DC and looking for free software!! Enjoying citizen science roles in environmental monitoring and sustainable practices in my town. Recipient of an ESIP 2022 Funding Friday grant with Dr Qian Huang to... Read More →

David Moroni

System Engineer, JPL PO.DAAC
David is an Applied Science Systems Engineer with nearly 15 years of experience at the Jet Propulsion Laboratory (JPL) working on a plethora of projects and tasks in the realm of cross-disciplinary Earth Science data, informatics and open science platforms. Relevant to this particular... Read More →

Ge Peng

Research Scholar, CISESS/NCEI
Dataset-centric scientific data stewardship, data quality management

H. K. “Rama” Ramapriyan

Research Scientist, Subject Matter Expert, Science Systems and Applications, Inc.

Yaxing Wei

Research Scientist, ORNL


Thursday July 22, 2021 6:00pm - 7:30pm EDT
TBA

6:00pm EDT

Vocabularies for rock type categories
The goal of this session is to analyze and compile use cases for a standardized rock type (lithology) vocabulary, and to learn about existing vocabularies in use. The session will start with presentations on existing vocabularies (e.g. CGI Simple Lithology, BGS Rock Names, EarthChem, MINDAT, Geological Survey of Queensland), focusing on their design requirements, how they are currently being used, and how they are accessed. This is an outstanding problem for data integration in geoscience and the time is ripe to look for convergence between the various activities.

Vocabularies to be discussed:

Discussion Questions:
  1. Do existing vocabularies meet requirements? If not, what is missing, and what do we have to do next?
  2. Who should govern a lithology vocabulary?
  3. How can the vocabulary be sustained? 

During the session, please keep an eye on the Rock Vocabulary session jam board, and post notes (use the sticky note tool) with your thoughts and questions. We'll review the board during the discussion time after presentations.

AGENDA:
  • 10 min: Welcome, overview, get organized.
  • 10 min (5 min each): Kerstin and Lesley on their interest in and experience with lithology vocabularies
  • 10 min: (Steve) CGI vocabularies: simple lithology, regional lithotectonic units, USGS GEMS ‘General Lithology’ and State Geologic Map Compilation (SGMC) vocabularies
  • 10 min: (Jolyon Ralph) MINDAT rock vocabulary 
  • 10 min: (Tim McCormick) BGS lithology SKOS resource, https://data.bgs.ac.uk/id/EarthMaterialClass/RockName/PA_RSD
  • 10 min: (Vance Kelly) Geological Survey of Queensland lithology vocabulary
  • 20 min: Q & A, Discussion
View Notes

Organizers & Speakers

Kerstin Lehnert

Doherty Senior Research Scientist, Columbia University, Lamont-Doherty Earth Observatory
Kerstin Lehnert is Doherty Senior Research Scientist at the Lamont-Doherty Earth Observatory of Columbia University and Director of the Interdisciplinary Earth Data Alliance that operates EarthChem, the System for Earth Sample Registration, and the Astromaterials Data System. Kerstin... Read More →

Stephen Richard

Geoinformatics consultant, personal
Stephen Richard is an independent contractor working from Tucson, Arizona. He is currently involved in projects to implement a Geoscience ontology for the Loop3D project, the Technical Team for the EarthCube Office, and applications of geoscience vocabularies in AI applications. Interests... Read More →

Lesley Wyborn

Honorary Professor, Australian National University

Vance Kelly

Principal Data Manager, Geological Survey of Queensland

Jolyon Ralph

MinDat.org and Hudson Institute of Mineralogy

Tim McCormick

British Geological Survey



Thursday July 22, 2021 6:00pm - 7:30pm EDT
TBA
 
Friday, July 23
 

11:00am EDT

FUNding Friday & Awards Celebration
Did you know we hold a mini-grant competition DURING the Summer Meeting? It’s called FUNding Friday and we award three $5000 awards to ESIP members and three $3000 awards to students/teachers. It is collaborative, FUN, and uniquely ESIP.

In addition to celebrating FUNding Friday ideas, this session will also feature:
  • Using Remote Sensing & GIS to Identify Magmatic Strain Accommodation: Marsabit Volcano, Kenya
    Speaker: Cora Van Hazinga (Graduate Student at Salem State University & 2021 Robert G. Raskin Scholarship Recipient)
  • New NASA Tech for our Solar System’s Oceans
    Speaker: Ved Chirayath (Director of the NASA Laboratory for Advanced Sensing at Ames Research Center & 2021 Charles S. Falkenberg Award Recipient)

Organizers & Speakers

Ved Chirayath

Director, Laboratory for Advanced Sensing, NASA Ames Research Center
Dr. Ved Chirayath is the director of the NASA Laboratory for Advanced Sensing at Ames Research Center in Silicon Valley. After earning his Ph.D. from Stanford University, Chirayath began inventing next-generation sensing technologies for NASA. His inventions are used to study and... Read More →

Cora Van Hazinga

Graduate Student, Salem State University
Cora Van Hazinga is a GIS graduate student at Salem State University. Her background is in Geology (Salem State University) and Computer Science (Bunker Hill Community College). Cora is passionate about furthering our understanding of geological processes using GIS and data science tools... Read More →

Annie Burgess

Lab Director, ESIP


Friday July 23, 2021 11:00am - 1:00pm EDT
TBA

1:00pm EDT

Break
Friday July 23, 2021 1:00pm - 1:30pm EDT
TBA

1:30pm EDT

Designing a Public Portal for Participatory Environmental Governance
Help us push the frontiers of democratic participation in environmental governance by joining this design workshop on a new data portal that enables members of environmental advocacy groups to ask geography-based questions about environmental enforcement!

Background: Vital data about federal enforcement actions against facilities that pollute the soil, air, and water is currently available but largely inaccessible in the U.S. Environmental Protection Agency’s (EPA) Enforcement and Compliance History Online (ECHO) database. We have been working for 1.5 years with data analysts, nonprofits, and community groups—and now with ESIP Lab funding—to develop well-documented and open source cloud-based Jupyter Notebooks that make ECHO data readily accessible and reportable by zip code, hydrologic unit code (to assess watersheds), state, and congressional district. However, we now have so many tools and reports that they can be hard to navigate and access!

What we’re making: We are now building a web portal to share our tools and reports. Our vision is an intuitive map-centric interface for three types of public interaction:
  1. Accessing already-generated reports
  2. Accessing our Jupyter Notebooks to generate custom reports (e.g. Clean Water Act violations in the Niagara River watershed)
  3. Sharing these custom reports and some context about why the findings are important or how they are surprising.
Where you come in: Are there best practices we should know about for displaying these kinds of reports and tools? What are similar projects we should look at during the design process? Examples on our radar include EPA’s How’s My Waterway tool, justicemap.org, and DataONE (is there potential for integration?).

This workshop will take place in two parts:
  • Part 1 is an introduction to the reports and tools. We will familiarize participants with the project through both a presentation and hands-on use of a Notebook.
  • Part 2 is a design workshop exploring ideas for the web portal: a structured, facilitated discussion focused on developing user scenarios to inform web development.

View Notes

Organizers & Speakers

Kelsey Breseman

Attendee, Head Weaver
Tlingit, forest person, engineer, and activist. Working on climate research & communication on tribal lands with Sealaska and The Nature Conservancy. Always interested in how tech tools and the stories we tell shift the balance of power.


Friday July 23, 2021 1:30pm - 3:00pm EDT
TBA

1:30pm EDT

Distributed Rapid Collaboration on Disaster related Information Products
An important goal of the 2021 OGC Disaster Pilot is to implement a (largely cloud-based) EO data flow ecosystem that facilitates rapid collaboration loops among data providers, analysts, and users in the field with regard to disaster situations. We would like to enlist ESIP and its members in envisioning and designing these collaboration loops, as well as developing and/or refining specific decision ready information products ("indicator recipes") that address the Pilot scenarios of flooding, landslide, and pandemic hazards.

View Notes

Organizers & Speakers

Joshua Lieberman

Director Innovation Programs, Open Geospatial Consortium
Josh Lieberman develops, leads, and manages OGC Collaborative Solutions and Innovation Program initiatives. Originally trained as a geologist and environmental scientist, Josh has been involved in OGC both as a member and as an initiative architect for almost two decades while serving... Read More →


Friday July 23, 2021 1:30pm - 3:00pm EDT
TBA

1:30pm EDT

Renewing Past Innovation: Taking the Next Step in the Innovation Lifecycle for the ESIP Data Management Short Course for Scientists.
In 2012, ESIP’s Data Stewardship Committee produced a set of 35 related educational modules called the Data Management Short Course for Scientists (Short Course) targeted to research scientists on RDM and data stewardship related topics. These succinct educational modules using the Khan Academy format were some of the first to be created in this topic area, and also the first put together to form a cohesive whole of modules under four key categories: The Case for Data Stewardship; Data Management Plans; Local Data Management; and Responsible Data Use. Supported by funds from NOAA and the Data Conservancy, the Short Course modules were created, peer reviewed and edited by ESIP Data Stewardship members, stored on the ESIP Commons, given proper DOIs and preferred citations, and ultimately added to the ESIP-hosted Data Management Training Clearinghouse (http://dmtclearinghouse.esipfed.org/). Now, as part of the innovation lifecycle, the Data Stewardship Committee is committed to the process of reviewing and renewing these Short Course modules, deprecating those no longer current, updating those still relevant with new examples, practices and guidelines, and identifying new topics needed based on the evolving practice of RDM and data stewardship for Earth Science data. This working session will present the plan proposed for this work, discuss the previous tools created for module creation and peer review and solicit feedback on their re-use or adaptation, identify priorities in module review, updating and new creation, and solicit the participation of current RDM and data stewardship experts.

View Notes

Organizers & Speakers

Amber Budden

Director for Learning and Outreach, NCEAS
Open science facilitator, community manager and data literacy trainer. I lead the NCEAS Learning Hub and short course activities and co-lead DataONE and the Arctic Data Center, with a focus on supporting the community in open science learning and practices... Read More →

Ruth Duerr

Research Scholar, Ronin Institute for Independent Scholarship

Nancy Hoebelheinrich

Principal/Information Analyst, Knowledge Motifs LLC
See my LinkedIn profile at: https://www.linkedin.com/in/nancy-hoebelheinrich-0576ba3


Friday July 23, 2021 1:30pm - 3:00pm EDT
TBA

3:00pm EDT

Break
Friday July 23, 2021 3:00pm - 3:30pm EDT
TBA

3:30pm EDT

ESIP in 2031: how we got here, from a pandemic to a bright new future
ESIP in 2031

Join our ESIP fellows as they venture to the future.
Imagine the stories that ESIP veterans will tell; tales from the beforetimes, when meetings were all in person, anecdotes of the long pandemic year when ESIP learned how to increase participation across the continent and the world, and chronicles over the next decade as ESIP became a global Earth data leader in the planetary response to climate change.

You can make your own imaginary leap into the future and add your ideas and insights to how, where, and what ESIP could do in the next ten years to become the ESIP that our planet needs.
ESIP needs your help to capture the best solutions emerging from this pandemic year, so join in and have some fun exploring the future with your ESIP colleagues.

View Notes

Organizers & Speakers

Bruce Caron

Executive Director, New Media Studio


Friday July 23, 2021 3:30pm - 5:00pm EDT
TBA

3:30pm EDT

HDF Town Hall
Several petabytes of Earth Science data are already in the HDF file formats and the collections are still growing. The HDF Group is strategically committed to support producers and users of these data as their access patterns vary throughout the data use lifecycle with evolving applications, computing frameworks, and backend storage systems. This is especially important for seamless transition from on-prem filesystem-based to cloud computing. This commitment is reflected in HDF Group’s long-time collaboration with the netCDF library developers, and more recent work to support efficient access to HDF5 files for the Zarr-based applications. Both of these data formats are popular in the ESIP community.

View Notes

• Elena Pourmal (NASA EED-2 / HDF Group): HDF - Current Status and Future Directions
• J. P. Swinski (NASA): H5Coro: The HDF5 Cloud-Optimized Read-Only Library

NASA’s migration of science data products and services to AWS has sparked a debate on the best way to access science data stored in the cloud. Given that a large portion of NASA’s science data is in the HDF5 format or one of its derivatives, a growing number of efforts are looking at ways to efficiently access H5 files residing in S3. This presentation describes one of those efforts, H5Coro, and argues for the creation of a standardized subset of the HDF5 specification targeting cloud environments. H5Coro is an open-source C++ module written from scratch that implements a performant HDF5 reader for H5 files that reside in S3. It targets high latency/high throughput environments by minimizing the number of I/O operations through caching and intelligent range GETs. H5Coro is currently available as a C library and includes Python bindings.
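The idea of minimizing I/O operations through range GETs can be illustrated with a small sketch. This is not H5Coro's code: the function names `coalesce_ranges` and `range_header` and the `max_gap` parameter are hypothetical, chosen here only to show how nearby byte ranges might be merged so that fewer HTTP range requests are issued against an object store.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start, end) byte offsets, end exclusive


def coalesce_ranges(ranges: List[Range], max_gap: int = 0) -> List[Range]:
    """Merge byte ranges that overlap or lie within max_gap bytes of each other.

    max_gap is an illustrative tuning knob: a larger value trades some
    unneeded bytes for fewer round trips in a high-latency environment.
    """
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start - merged[-1][1] <= max_gap:
            # Close enough to the previous range: extend it instead of
            # issuing a separate request.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


def range_header(r: Range) -> str:
    """Format one range as an HTTP Range header value (end is inclusive there)."""
    start, end = r
    return f"bytes={start}-{end - 1}"
```

With `max_gap=64`, for instance, three requested ranges `(0, 100)`, `(120, 200)` and `(1000, 1100)` would collapse into two requests, because the first two are only 20 bytes apart.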
• John Readey (The HDF Group): HDF for the Cloud - Serverless HDF

The HDF Server (HSDS) provides a convenient method for running HDF applications in the cloud (utilizing scalable compute and object-based storage), but sometimes setting up a server is just too much to deal with due to cost, time, or management concerns. In this talk we'll discuss two alternative ways to utilize HSDS technology while leaving the server aspect behind. The first, HSDS for AWS Lambda, supports the HDF REST API but runs entirely using Lambda functions. The second approach is "HSDS Direct Access", a client-side library that enables HSDS-like features exclusively on the client: object storage read and write, multi-threading support, and SQL-style queries.
• Ellen Johnson (MathWorks): MATLAB Modernization on HDF5 1.10, Support for SWMR and VDS, and Cloud Data Access

This talk presents our effort at MathWorks toward modernizing on HDF5 1.10.7 and adding support for the much-requested Single-Writer/Multiple-Reader and Virtual Dataset features. We will discuss our updated 1.10.7 HDF5 functionality available today for MATLAB users in the R2021b prerelease (with R2021b full release planned for September) and would like to hear early feedback from the community. We will also discuss MATLAB capabilities for working with HDF5 data hosted on S3, Azure, and Hadoop introduced in R2020b which we have now enabled for Virtual Datasets. We will wrap up with performance and compatibility considerations plus our tentative roadmap for future HDF5 enhancements.
• Joe Lee (NASA EED-2 / HDF Group): HDFEOS.org User Analysis, Updates, and Future

Organizers & Speakers

Aleksandar Jelenek

The HDF Group

Elena Pourmal

Engineering Director, HDF Group
HDF

John Readey

Developer, The HDF Group



Friday July 23, 2021 3:30pm - 5:00pm EDT
TBA

3:30pm EDT

Science-on-Schema.org - Submitting the Guidelines for ESIP Assembly Endorsement
This is a working session for members of the Schema.org Cluster and others with interest to finalize and submit the Science-on-Schema.org Guidelines v1.3 for ESIP Assembly Endorsement. There are many tasks involved with preparing a new version of the guidelines: reviewing Pull Requests, committing to GitHub, preparing DOI metadata, and coordinating all these changes for a successful release at GitHub and Zenodo. This short session helps us work in real time through those tasks to ensure all goes smoothly and nothing is missed. Finally, it gives cluster members an opportunity to celebrate their work together in real time.

View Notes

Organizers & Speakers

Stephen Richard

Geoinformatics consultant, personal
Stephen Richard is an independent contractor working from Tucson, Arizona. He is currently involved in projects to implement a Geoscience ontology for the Loop3D project, the Technical Team for the EarthCube Office, and applications of geoscience vocabularies in AI applications. Interests... Read More →

Ruth Duerr

Research Scholar, Ronin Institute for Independent Scholarship

Matt Jones

Director of Informatics R&D, NCEAS / DataONE / UC Santa Barbara
DataONE | Arctic Data Center | Open Science | Provenance and Semantics | Cyberinfrastructure

Mark Schildhauer

Senior Technology Fellow, NCEAS/UCSB
Data semantics, Ecoinformatics training, Arctic data, LTER data, Ecological synthesis

Adam Shepherd

Technical Director, BCO-DMO
Architecting adaptive and sustainable data infrastructures. Co-chair of the ESIP schema.org cluster. Knowledge Graphs | Data Containerization | Declarative Workflows | Provenance | schema.org

Dave Vieglais

Research Professor, University of Kansas


Friday July 23, 2021 3:30pm - 5:00pm EDT
TBA
 
2021 ESIP Summer Meeting, Jul 19-23, 2021