Intro
One of the biggest challenges in gaining insight from streaming data is ensuring quick and secure transport while retaining clear control. Cloudera DataFlow (CDF) addresses this by capturing data at the edge and connecting it to the cloud, with visibility at each point in the data pipeline. Our goal is to showcase the process of building a self-driving car application using Cloudera technologies.
The driverless scale car used in this blog series is equipped with three cameras, a LiDAR and an Xbox controller, all connected to an Nvidia Jetson TX2 board. Robot Operating System (ROS) runs on the TX2 and enables us to control the car’s movement. When not in manual mode, the car is controlled by a Convolutional Neural Network (CNN) that has been trained to clone the behavior of a driver on a closed track. The CNN is trained on a public cloud hosting the Cloudera Distribution Hadoop (CDH) and Cloudera Data Science Workbench (CDSW).
As you can see above, we send the data collected from the car to an instance of Hadoop Distributed File System (HDFS) in the cloud and use CDSW to build and train a Keras model on top of TensorFlow. Finally, we save the trained model back to HDFS and deploy it onto the car for autonomous driving. The model is trained on all the collected data, essentially cloning a person’s driving behavior on a racetrack. Once deployed, the model predicts a steering angle from the center camera frames, and the car adjusts its steering accordingly while driving at a constant speed.
Cloudera Edge Management Overview
A Data Engineer working on a data pipeline may have to play the role of an Embedded Systems Engineer and handle edge computing when sensor data must be collected from an edge device. Cloudera Edge Management (CEM) can be used to create a data pipeline from the edge to the cloud.
CEM is an edge management solution composed of edge agents (C++ and Java agents) and an Edge Flow Manager. With it you can manage and monitor edge agents to collect data and store it back into Cloudera Distribution Hadoop (CDH). CEM also makes it possible to deploy intelligence back to the edge agents to make data collection even more efficient. For example, a smart car that has received an improved model through CEM can make better decisions, which enable it to drive autonomously for longer periods of time and therefore collect more data that can further improve the model.
CEM is primarily composed of an Edge Flow Manager (EFM) and Apache NiFi MiNiFi agents. The interaction between the two is designed so that an organization using CEM only needs to interact with the EFM UI (shown below).
ROS Embedded Application
ROS allows us to communicate with all the sensors connected to our Jetson TX2 board and combine the collected data before sending it to EFM. We also use ROS because it allows us to interface with our game controller and collect camera, steering, and speed data. Although we also have LiDAR and IMU sensors, the data from these sensors was not necessary for this project, since we have focused our efforts on building robust vision-based models.
The ROS application built for this project reads the camera, steering and speed data and saves it as a CSV file of image details together with the corresponding images. Data is collected when a user manually drives the car around our custom track and starts recording. The ROS embedded application then stores the data on the Jetson TX2’s local file system.
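As a rough illustration of the CSV layout described above, the following Python sketch writes one row per recorded frame and reads the rows back. The column names, the `DrivingSample` type and the image file names are assumptions made for illustration; the actual schema used by the ROS application is not shown in this post.

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical column names; the real ROS application may use different ones.
FIELDS = ["timestamp", "image_file", "steering_angle", "speed"]


@dataclass
class DrivingSample:
    """One recorded frame: when it was taken, which image, and the controls."""
    timestamp: float
    image_file: str
    steering_angle: float
    speed: float


def write_samples(samples, stream):
    """Write a header row followed by one CSV row per recorded sample."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    for s in samples:
        writer.writerow({
            "timestamp": s.timestamp,
            "image_file": s.image_file,
            "steering_angle": s.steering_angle,
            "speed": s.speed,
        })


def read_samples(stream):
    """Read the rows back into DrivingSample objects, e.g. before training."""
    return [
        DrivingSample(
            float(row["timestamp"]),
            row["image_file"],
            float(row["steering_angle"]),
            float(row["speed"]),
        )
        for row in csv.DictReader(stream)
    ]


if __name__ == "__main__":
    # An in-memory buffer stands in for the CSV file on the TX2's file system.
    buffer = io.StringIO()
    write_samples([DrivingSample(0.0, "center_0001.jpg", -0.25, 1.5)], buffer)
    print(buffer.getvalue())
```

Keeping the image file name in each row is what lets a later training step pair every camera frame with the steering and speed values recorded at the same moment.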
Enabling Edge Device to transport data to the cloud
To enable the transport of data to the cloud, we installed MiNiFi on the car. Because the car uses a Jetson TX2, which has an aarch64 architecture, MiNiFi is built from source on the car itself. The MiNiFi agent is then installed, and the appropriate configurations are changed to enable communication between the MiNiFi agent and NiFi.
On the cloud instance that is running CEM, one can choose which MiNiFi agent(s) to build the data flow for by selecting the agent’s class. Note that a class can be associated with one or more MiNiFi agents. The class name can be found, and changed, in the MiNiFi properties file.
Once the data flow has been built, the user can open the options dropdown and press publish, which deploys the data flow to the edge device where the MiNiFi agent is installed.
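To make the relationship between agents and classes concrete, here is a minimal Python sketch that reads the class name from the text of a properties file and groups agents by class. The key names `nifi.c2.agent.class` and `nifi.c2.agent.identifier`, as well as the sample values, are assumptions for illustration; check the properties file shipped with your MiNiFi agent for the exact keys.

```python
from collections import defaultdict

# Assumed key names; verify them against your agent's properties file.
CLASS_KEY = "nifi.c2.agent.class"
ID_KEY = "nifi.c2.agent.identifier"

# Hypothetical excerpt of a properties file for one car.
SAMPLE_PROPERTIES = """\
# C2 settings (illustrative values)
nifi.c2.agent.class=jetson-tx2-car
nifi.c2.agent.identifier=car-01
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:  # ignore lines that contain no '=' at all
            props[key.strip()] = value.strip()
    return props


def group_agents_by_class(property_texts):
    """Map each class name to the identifiers of the agents that declare it."""
    groups = defaultdict(list)
    for text in property_texts:
        props = parse_properties(text)
        groups[props.get(CLASS_KEY, "")].append(props.get(ID_KEY, ""))
    return dict(groups)


if __name__ == "__main__":
    print(group_agents_by_class([SAMPLE_PROPERTIES]))
```

Because a flow is built per class rather than per agent, every agent that declares the same class name would receive the same published flow.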
Building an Edge Data Pipeline
The EFM UI was used to build a data flow for the MiNiFi C++ agent running on the Jetson TX2, so that the agent can steward data from where it was collected and transmit it to the cloud. The pipeline begins with running a ROS application called train_mode.launch, which allows the user to start collecting driving behavior data while doing laps around the racetrack. The data is then extracted in the form of a CSV file and images that were saved to the Ubuntu local file system of the TX2. The extraction is done using two MiNiFi GetFile processors. Finally, a Remote Process Group (RPG) transmits this data to a remote NiFi data flow running in the cloud, such as on an AWS EC2 instance. When the data arrives in NiFi, it can be traced back to where it originated on the MiNiFi agent.
- GetCSV retrieves metadata associated with each image collected in the form of a CSV file.
- GetJPG retrieves all of the images collected while driving the car in train mode.
- RPG holds the public URL for the NiFi service on our CDF cluster.
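The split between the two GetFile processors can be pictured with a short Python sketch that routes file names by a regular expression, similar in spirit to a file-name filter on each processor. The filter patterns and file names below are assumptions for illustration, not the configuration used on the car, and the code only simulates the selection step rather than calling MiNiFi.

```python
import re

# Hypothetical file-name filters, one per processor in the flow.
FILTERS = {
    "GetCSV": re.compile(r".*\.csv"),
    "GetJPG": re.compile(r".*\.jpe?g"),
}


def route_files(file_names):
    """Assign each file name to the first processor whose filter matches it.

    Files that match no filter are left alone, just as a processor with a
    file-name filter would not pick them up.
    """
    routed = {name: [] for name in FILTERS}
    for file_name in file_names:
        for processor, pattern in FILTERS.items():
            if pattern.fullmatch(file_name):
                routed[processor].append(file_name)
                break
    return routed


if __name__ == "__main__":
    collected = ["driving_log.csv", "center_0001.jpg", "notes.txt"]
    for processor, files in route_files(collected).items():
        print(processor, files)
```

Using two narrowly filtered processors keeps the metadata and the images as separate streams on the agent, even though both are sent to the same remote NiFi flow.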
Conclusion
We’ve covered how our small-scale smart car collects data and briefly discussed how that data flows from the car to a data lake. We also hinted at how CEM enables us to gather data from several sources. In future blogs we will explore how to deposit collected data into CDH and train a model. Learn to build your own simulated edge to AI pipeline by completing the Edge2AI autonomous car tutorial.