Machine learning results are only as good as the data the models are trained on. At the same time, developing high-quality training data is difficult and time-consuming. It would be valuable if we could reuse training data in other contexts. What is necessary to make that happen?
In this session we look at several examples of preparing and sharing ML training data, and then explore whether there are attributes or processes that we can standardize to make training data more interoperable.
Presentations:
David Roy, Michigan State Univ., on the reuse of burned area data from Landsat
Gabriel Tseng, Univ. MD, on the collection and later sharing of training data on agricultural conditions
Christian Schroeder de Witt, Univ. Oxford, on a benchmark data set for precipitation prediction
Community exercise to refine the most effective ways to enhance ML training data reusability and readiness.