Space-Based AI Shows the Promise of Big Data

This blog post was written by Elizabeth Howell, Ph.D as a guest author for Cloudera. 

At a distance of a million miles from Earth, the James Webb Space Telescope is pushing the edge of data transfer capabilities.

The observatory launched Dec. 25 2021 on a mission to look at the early universe, at exoplanets, and at other objects of celestial interest. But first it must pass a rigorous, months-long commissioning period to make sure that the data will get back to Earth properly.

Mission managers provided an update Feb. 11 noting that the primary mirror is aligning well, and that the instruments are starting to receive data from deep space.

“This is the first time we’re getting data on mirrors that are actually at zero gravity,”

said Lee Feinberg, Optical Telescope Element Manager for the James Webb Space Telescope at the NASA Goddard Space Flight Center, during the February press conference. 

“So far, our data is matching our models and expectations,” Feinberg added. Webb is continuing the alignment procedure for several more weeks and is expected to start sending back its first operational science data in the summer of 2022.

How to store and analyze data in space

But when the telescope is ready for work, a new problem will arise. Webb’s gimbaled antenna assembly, which includes the telescope’s high-data-rate dish antenna, must transmit about a Blu-ray’s worth of science data — that’s 28.6 gigabytes — down from the observatory, twice a day. The telescope’s storage ability is limited — 65 gigabytes — which requires regular sending back of data to keep from filling up the hard drive.

The problem is deciding where to look first through this richness. Luckily, Webb’s tools are largely available in Python and parts of the data may be shared with institutes around the world to get more help. That said, scientists have limited time. Although researchers can recruit “citizen scientists” to help look at images through crowdsourcing ventures such as Zooniverse, astronomy is turning to artificial intelligence (AI) to find the right data as quickly as possible.

AI requires good data and strong training algorithms, such as through machine learning, to make decisions about what data to send back to decision-makers. Happily, there’s a space industry Webb can borrow from; AI systems are getting more adept by the month in interpreting Earth observations from satellites. There are many companies and space agencies out there using AI to parse information quickly on fast-moving events such as climate-change related wildfires or flooding.

The process (in an ideal world) begins up in space, when the satellite makes decisions on board about what to send back to Earth. For example, the European Space Agency’s ɸ-sat-1 (“phi-sat-1”) satellite launched in 2020 to test this in-space filtering on images with too much cloud in them to be otherwise usable. To note, previous satellites had trouble with clouds and this satellite launched with technology to fix the issue. 

“To avoid downlinking these less than perfect images back to Earth, the ɸ-sat-1 artificial intelligence chip filters them out so that only usable data is returned,” ESA said in a blog post. “This will make the process of handling all this data more efficient, allowing users access to more timely information, ultimately benefiting society at large.”

This filtering is necessarily limited in space since only so much hardware will fit on a satellite. The images that make it down to Earth have a more robust set of techniques applied upon them with ground computers. It’s a process that some companies call geospatial intelligence (GI). 

GI, AI, and ML for all

GI is a quickly rising feature among Earth observation organizations. The goal is to use machine learning to extract features of relevance — such as browning crops, or rising waters — to plot change between images within seconds.

According to Ian Brooks, a principal solutions engineer at Cloudera, as information arrives on Earth it is useful to use parallel computing to sort out the data. Parallel computing allows several processors to break down a complex calculation into smaller, more bite-size jobs. The system could distribute the jobs among different machines in the same lab, or even in different zones around the world.

“You probably don’t even need to prioritize [data] at this stage, because of the level of computing power available,” Brooks pointed out. “Maybe you could have multiple destinations on Earth with the same dataset, doing different things.” 

Moreover, interpreting AI results from the data is not overly difficult. Beyond boot camps and computer science degrees, Brooks said that YouTube, massively open online courses (MOOCs), and other institutions have data science programs freely available online to assist with learning about the tools and techniques available. This e-learning allows lots of folks to assist with the AI.

Streaming analytics beyond Earth

The trend in machine learning is using streaming data — and attempting to perform analytics on that data as it flows back to Earth, rather than waiting for all of it to arrive before doing the processing. “You get faster sorts of alerts and dashboarding on that data coming in from the devices to [the programmer], who determines where the big trends are going,” Brooks said.

This theory of compressing and dealing with data will be particularly applicable for programs with Webb that seek a lot of information, such as those that seek signs of life. The Search for Extraterrestrial Intelligence (SETI) Institute, for example, sees a potential partnership Webb might engage in, with a ground-based radio telescope.

While the radio telescope on the ground looks for sky-based, narrow-band signals that move at the same rate as the Earth’s rotation — showing that the signal is coming from the sky — Webb might be able to send back information about oxygen, nitrogen, or other elements that indicate a planet may host life as we know it, said the institute’s senior astronomer, Seth Shostak. 

“It’s a clear case in which if you have machine learning, and you trained the software to recognize an actual signal and to reject all the ones you pick up that are not correct, that just speeds up the search,” he said. 

That of course assumes Webb might be able to see planets close to the size of our own, which is not a guarantee; most researchers say the telescope will be better situated to see huge, Jupiter-sized planets.

Cloudera’s Brooks points out that space-based AI has numerous applications for companies seeking to have organized information as quickly as possible, likening the process to having a “Star Wars”-like drone on a potentially habitable planet using AI to steer its way.

“You’re trying to pick a needle in a haystack. You’re just zeroing in on an object better … it’s a massive kind of a concept,” Brooks said of the filtering tools in place today. The right AI, he added, will assist telescope users with using the knowledge they have to move forward on the results turned up by machine learning, whether it’s an interesting black hole or a potential life-friendly world.

Back on Earth, it’s not just astronomers and astrophysicists who benefit from streaming data and AI. In healthcare, for example, doctors are starting to leverage ML for real-time analysis of data to improve medical care. As do many other industries, from retail and logistics to banking and insurance. 

No matter what industry, organizations like yours are likely to encounter large amounts of streaming data too. Learn how to tackle all of this data and use AI for your business

By Elizabeth Howell, Ph.D., a space writer and journalist based in Ottawa, Canada. You can read more of her work on her site or on Space.com

Cloudera Contributors
More by this author

Leave a comment

Your email address will not be published. Links are not permitted in comments.