For over 20 years, ESIP meetings have brought together the most innovative thinkers and leaders around Earth observation data, thus forming a community dedicated to making Earth observations more discoverable, accessible and useful to researchers, practitioners, policymakers, and the public. The theme of this year’s meeting is Leading Innovation in Earth Science Data Frontiers.
Wednesday, July 21 • 2:30pm - 5:00pm
New Frontiers in AI for Earth and Space: Big Data and Parallel Computing

AI is lauded as a powerful tool for gaining insights and producing knowledge from the massive datasets we have access to today in the Earth sciences. One of the major challenges of integrating AI practices in the Earth and Space Sciences is the immense size of environmental and climate data. Intensive computational power is required for AI to learn efficiently from such massive amounts of data. The key questions, then, are: what are the best strategies to make AI work, and what infrastructural constraints does the community face as a result? Many parallel computing tools and frameworks are available today to assist with this challenge, e.g., GPU, Dask, Spark, Hadoop, CUDA, JobLib, ipyparallel, dispy, and Ray. But which ones are suitable for which use cases in the Earth and Space sciences? On deployment platforms such as HPC, Azure, AWS, GCP, institutional clusters, individual servers, or even personal computers, what is the best way to configure the environment for carrying out AI tasks on large spatial datasets?

This series consists of two sessions. In the first session, speakers with experience implementing AI at scale will share their work with members of the ESIP community who use parallel computing. We will collect the key strategies these speakers have used to move research forward on AI4Earth&Space. In the second session, we will conduct a thorough step-by-step tutorial, from environment setup (e.g., Dask-ML) to training and testing AI using parallel computing on large datasets, to give the Earth and Space science community some hands-on experience.
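To make the parallel computing question above more concrete, here is a minimal sketch of the chunked "map, then combine" pattern that several of the listed frameworks support in some form. It is not taken from the session materials. It uses only Python's standard library thread pool so that it runs anywhere, and the function names, the sample data, and the chunk size are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_ranges(n_items, chunk_size):
    """Yield (start, stop) index pairs that together cover range(n_items)."""
    for start in range(0, n_items, chunk_size):
        yield start, min(start + chunk_size, n_items)


def partial_stats(values, start, stop):
    """Map step: return (sum, count) for one chunk of the data."""
    chunk = values[start:stop]
    return sum(chunk), len(chunk)


def chunked_mean(values, chunk_size=1000, max_workers=4):
    """Combine step: merge per-chunk partial results into a global mean."""
    if not values:
        raise ValueError("values must not be empty")
    ranges = list(chunk_ranges(len(values), chunk_size))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(lambda r: partial_stats(values, *r), ranges))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


# Hypothetical stand-in for a large gridded variable flattened to 1-D.
temperatures = [float(i % 50) for i in range(10_000)]
mean_temperature = chunked_mean(temperatures, chunk_size=1024)
```

Because each chunk yields a small partial result, the chunks can be computed independently and merged at the end. A thread pool is used here only for portability; for CPU-bound work on real data, a process pool or a distributed scheduler would be the more likely choice.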

Session 1: Talks (2.30 - 3.30pm)

1. Tom Augspurger, Microsoft
Title: Scalable Geospatial Analysis
Working with geospatial data can be challenging, regardless of the scale. We'll see how Microsoft's Planetary Computer is using STAC and Dask to facilitate large-scale geospatial data analysis. We'll use the Planetary Computer's STAC catalog to find the data matching some conditions, and a Dask cluster to process the data in parallel.
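The "find matching data, then process it in parallel" workflow described in this abstract can be sketched roughly as follows. This is an illustration only: it does not call the Planetary Computer or Dask, the in-memory items and the `search` and `process` helpers are hypothetical, and a standard library thread pool stands in for a cluster. The item fields (`bbox`, `properties.datetime`, `eo:cloud_cover`) follow the STAC Item layout.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime

# Hypothetical STAC-style items; only `id`, `bbox`, and `properties` are modeled.
ITEMS = [
    {"id": "scene-a", "bbox": [-78.0, 38.0, -77.0, 39.0],
     "properties": {"datetime": "2021-06-01T15:30:00Z", "eo:cloud_cover": 5.0}},
    {"id": "scene-b", "bbox": [-78.0, 38.0, -77.0, 39.0],
     "properties": {"datetime": "2021-06-11T15:30:00Z", "eo:cloud_cover": 80.0}},
    {"id": "scene-c", "bbox": [10.0, 45.0, 11.0, 46.0],
     "properties": {"datetime": "2021-06-05T10:00:00Z", "eo:cloud_cover": 2.0}},
    {"id": "scene-d", "bbox": [-77.5, 38.5, -76.5, 39.5],
     "properties": {"datetime": "2021-07-02T15:30:00Z", "eo:cloud_cover": 10.0}},
]


def parse_time(text):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def bboxes_intersect(a, b):
    """True if two [west, south, east, north] boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(items, bbox, start, end, max_cloud_cover):
    """Keep items that match the spatial, temporal, and cloud-cover conditions."""
    start_t, end_t = parse_time(start), parse_time(end)
    matches = []
    for item in items:
        props = item["properties"]
        if (bboxes_intersect(item["bbox"], bbox)
                and start_t <= parse_time(props["datetime"]) <= end_t
                and props["eo:cloud_cover"] <= max_cloud_cover):
            matches.append(item)
    return matches


def process(item):
    """Placeholder for per-scene work; here it only computes the bbox area in square degrees."""
    west, south, east, north = item["bbox"]
    return item["id"], (east - west) * (north - south)


matches = search(
    ITEMS,
    bbox=[-77.8, 38.2, -77.2, 38.8],
    start="2021-06-01T00:00:00Z",
    end="2021-07-31T23:59:59Z",
    max_cloud_cover=20.0,
)

# Each matching scene is independent, so the per-scene work can run in parallel.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(process, matches))
```

In a real workflow the search would be sent to a STAC API and the per-scene function would load and analyze imagery on a cluster; the structure of filtering first and mapping over the matches afterward stays the same.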

2. James Bednar, Director of Technical Consulting, Anaconda, Inc.
Title: How reproducible do you want your code to be?
Unless your simulation or analysis is reproducible, you can't be sure your results mean anything. But how reproducible does it need to be, across hardware, software environments, people, organizations, and time? I'll present a quick overview of the levels to choose from, along with a suggested way to achieve each one using Conda environments with Python.

3. Ryan McGranaghan, Data Scientist/Aerospace Engineering Scientist, ASTRA LLC
Title: A survey of Cloud solutions for the Earth and Space Sciences
The Cloud has the potential to transform the way we collaborate and share science and to push the boundaries of what is possible with scientific computing. Cloud-based data science platforms are now being used to address challenges in the field of AI. Indeed, the Earth and Space Sciences are in an intense period of experimentation, applying these platforms to use AI more capably for prediction and discovery. We will explore selected existing Cloud-based environments for the Earth and Space Sciences, particularly for the myriad components of the AI project lifecycle. We will use the survey of solutions to surface the gaps and trends in this rapidly evolving landscape.

4. Ziheng Sun, Research Assistant Professor, George Mason University
Title: ESIP Geoweaver Update, Machine Learning Cluster Activity Overview & Future Plan
The automation of full-stack workflows has gone viral as Earth data volumes increase exponentially and Earth system models and algorithms become more complex and harder to manage. The latest developments in AI/ML techniques bring many new opportunities to significantly improve accuracy, increase model resilience and intelligence, and reduce overall cost. However, managing and automating AI experiments is a grand challenge for the entire Earth science community. Geoweaver is software developed to tackle this problem. We will show how to use Geoweaver to create AI workflows in one place and run the processes on various distributed platforms, separate code from computing resources for resilience, record the provenance of every workflow execution, and share and reuse workflows to boost knowledge accumulation and discovery.

5. Cindy Lin, Postdoctoral Fellow, Cornell University 
Title: AI Ethics in Context
It has been broadly established by computer scientists working on AI in the environmental sciences that physical and computer science researchers pay more attention to the performance of AI-based models and less to how end users trust AI models (McGovern 2020). Accordingly, a lot of what makes an AI model usable depends on its trustworthiness; what is considered trustworthy may differ according to the needs of end user groups such as private industry and government. In this talk, I will discuss how a conundrum of political and socioeconomic factors, apart from the needs of end users, enables the establishment of AI trustworthiness in Indonesia. In particular, I provide an ethnographic account of a public-private partnership between an American IT firm and one of Indonesia’s leading engineering agencies, where new AI technologies are developed to address one of the world’s largest environmental concerns: tropical peatland fires.

Session 2: Demos (3.45 - 5.00pm)
1. Tom Augspurger, Microsoft
Demo Title: Scalable Geospatial Machine Learning with Dask and STAC
Abstract: In this workshop, attendees will work through several exercises to train a deep learning model to predict crop types using satellite imagery. We’ll work on a JupyterHub deployed to Azure, and will access data from the data catalog of Microsoft’s Planetary Computer.

Preparation: Attendees do not need to prepare anything ahead of time. They will be provided with credentials to log into a JupyterHub during the session. 

The materials will all be at https://github.com/TomAugspurger/esip-summer-2021-geospatial-ml

2. James Bednar, Director of Technical Consulting, Anaconda, Inc.

Demo title: Using hvPlot for interactive plotting of Xarray, Pandas, and Dask data in Jupyter
Xarray and Pandas support calling .plot() to get basic matplotlib plots, and here we'll show you how to use the same commands to explore even the largest cloud or remote datasets fully interactively. hvPlot makes it easy to get small multiples, overlays, layouts, and categorical plots, with dynamic regridding of large datasets so that you can explore them in any browser. New hvPlot features now also let you replace just about any number or string in an xarray or pandas method or expression with a widget, so that you can quickly try out the effect of various parameters or dynamically filter your data to help you understand it.

Preparation: Please follow the installation instructions at https://holoviz.org/installation.html

3. Ziheng Sun, Research Assistant Professor...

Organizers & Speakers

James Bednar

Director of Technical Consulting, Anaconda, Inc.
I work on HoloViz.org and PyViz.org, and am happy to chat about anything to do with visualizing data in Python.

Tom Augspurger


Annie Burgess

Lab Director, ESIP

Julien Chastang

Software Engineer, UCAR - Unidata
Scientific software developer at UCAR-Unidata.

Yuhan "Douglas" Rao

I am currently a Postdoctoral Research Scholar at North Carolina Institute for Climate Studies, also affiliated with NOAA National Centers for Environmental Information. My current research at NCICS focuses on generating a blended near-surface air temperature dataset by integrating...

Ziheng Sun

Research Assistant Professor, George Mason University
My research interests are mainly on geospatial cyberinfrastructure and machine learning in atmospheric and agricultural sciences.

Ryan McGranaghan

Data Scientist/Aerospace Engineering Scientist, ASTRA LLC
Space scientist, engineer, data scientist, designer, podcast host. Observer of beauty in liminal spaces. I believe in being led around by your curiosity.

Cindy Lin

Postdoctoral Fellow, Cornell University
Cindy Lin is a Postdoctoral Fellow at the Atkinson Center for Sustainability, affiliated with the Department of Information Science. In Fall 2022, she will be an assistant professor at Pennsylvania State University’s College of Information Sciences and Technology. Her current research...

Wednesday July 21, 2021 2:30pm - 5:00pm EDT