Earth Science and Big Data
NASA Explores Innovative Approaches to Large Datasets at High Velocity
We live in a time that recognizes both the necessity and the potential of using the information contained in big data. Although much attention is paid to big-data innovation in social media and traditional business information systems, in some ways the future has already arrived in the burgeoning field of Earth data science.
Consider the Earth Observing System Data and Information System (EOSDIS), which NASA uses to manage its large and growing archive of Earth science data. With sensors on board dozens of satellites and airborne platforms, augmented by ongoing in-situ measurements, the EOSDIS archives had exceeded 7.5 petabytes by 2013. Reflecting both the interdisciplinary nature of the Earth sciences and the variety of its archives, EOSDIS serves more than 1.5 million users across disciplines including atmospheric science, land processes, oceanography, and hydrology. The archives contain almost 7,000 unique data set types, although in many cases the objects are stored in structured files using common formats such as ASCII and GeoTIFF.
NASA Earth Science Division Operating Missions. Credit: NASA
The data maintained by EOSDIS are so varied and complex that discovering, accessing, and using them poses significant challenges. Data sets span multiple scientific disciplines, have diverse parameters, and exhibit a wide range of spectral, spatial, and temporal characteristics. EOSDIS manages networks and data centers distributed throughout the country to collect and distribute data and processed products. Non-satellite data, such as in-situ ground and ocean observations collected at towers and buoys or with hand-held instruments, can reach EOSDIS data centers over the Internet, on physical media, or by a variety of other means.
To support their diverse user groups, the EOSDIS data centers provide tools for common functions such as searching, filtering, mapping, and visualization. EOSDIS also offers a centralized search capability that lets users discover data relevant to their queries. The public-facing entry point to the full contents of EOSDIS is its website, available at http://earthdata.nasa.gov.
The Land and Atmosphere Near real-time Capability for EOS (LANCE) is an example of the kind of application EOSDIS supports. LANCE can deliver products from the MODIS, OMI, AIRS, and MLS instruments within three hours of observation. This capability shows how NASA has met the data velocity challenge, serving the time-sensitive needs of applications such as weather prediction, natural hazard monitoring, agriculture, disaster relief, and homeland security.
Meanwhile, new data sources are continually being added to the EOSDIS archives. NASA plans to use EOSDIS to archive and distribute data far into the future, continuing its legacy as a successful proving ground for innovative approaches to extremely large data sets and highly data-intensive activities.