Starship is building a fleet of robots to deliver packages locally on demand. To do this successfully, the robots need to be safe, polite, and fast. But how do you get there with limited computational resources and without expensive sensors such as LIDAR? This is the engineering reality you have to deal with, unless you live in a universe where customers gladly pay $100 for a delivery.
From the start, the robots have sensed the world through radar, numerous cameras, and ultrasonic sensors.
The challenge, however, is that most of this knowledge is low-level and non-semantic. For example, a robot may sense that an object is ten meters away, but without knowing what category the object belongs to, it is difficult to make a safe driving decision.
Machine learning with neural networks is surprisingly effective at transforming this unstructured, low-level data into high-level information.
Starship robots drive mostly on sidewalks and cross roads when needed. This creates a different set of challenges than those faced by self-driving cars. Traffic on roads is more structured and predictable: cars move along lanes and rarely change direction abruptly, whereas pedestrians often stop suddenly, turn around, walk a dog on a leash, and do not signal their intentions with turn signals.
To understand the surroundings in real time, a central component of the robot is an object detection module – a program that takes in images and outputs a list of object bounding boxes.
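To make the interface concrete, here is a minimal sketch of what such a module might look like. The class, field names, and categories are illustrative assumptions, not Starship's actual API:

```python
from dataclasses import dataclass
from typing import List

# Hypothetical interface for an object detection module; names and
# categories are illustrative, not Starship's real implementation.

@dataclass
class Detection:
    category: str        # e.g. "pedestrian", "car", "cyclist"
    confidence: float    # model confidence in [0.0, 1.0]
    x: float             # bounding-box top-left corner, normalized [0, 1]
    y: float
    width: float
    height: float

def detect_objects(image) -> List[Detection]:
    """Run the detection model on one camera frame (stub)."""
    # A real implementation would run a neural network forward pass here.
    return [Detection("pedestrian", 0.92, 0.41, 0.30, 0.08, 0.25)]

detections = detect_objects(None)
for d in detections:
    print(d.category, d.confidence)
```

Downstream components (path planning, crossing decisions) would consume this list rather than raw pixels.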
It’s all very well, but how do you write such a program?
An image is a large three-dimensional array consisting of a myriad of numbers representing pixel intensities. These values change significantly when the image is taken at night instead of during the day; when the color, scale, or position of an object changes; or when the object itself is truncated or occluded.
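A tiny NumPy sketch makes the scale of the problem visible – even a modest camera frame is nearly a million raw numbers, and a simple lighting change shifts all of them (the resolution here is an arbitrary example):

```python
import numpy as np

# A toy "image": height x width x 3 color channels of pixel intensities.
image = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)

print(image.shape)  # (480, 640, 3)
print(image.size)   # 921600 numbers for one modest frame

# The "same" scene at night: every raw value changes,
# but the semantic content (the objects) does not.
night = (image * 0.3).astype(np.uint8)
```

Hand-writing rules over arrays like these, robust to lighting, scale, and occlusion, is what makes the classical programming approach intractable.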
For some complex problems, teaching is more feasible than programming.
In the robot's software, we have a set of trainable units, mostly neural networks, where the code is learned rather than written by hand. The program is represented by a set of weights.
At first, these numbers are initialized randomly, and the output of the program is random as well. Engineers present the model with examples of what they would like it to predict and ask the network to do better the next time it sees a similar input. By iteratively changing the weights, the optimization algorithm searches for programs that predict bounding boxes more and more accurately.
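The idea can be sketched on a toy one-dimensional problem: the "program" is a single random weight, and plain gradient descent nudges it until its predictions match the examples. This is a deliberately minimal stand-in for training a real detector:

```python
# Minimal sketch of learning by example: a weight starts random and an
# optimizer repeatedly adjusts it to reduce prediction error.
import random

random.seed(0)
w = random.uniform(-1, 1)                        # the whole "program"
examples = [(x, 3.0 * x) for x in range(-5, 6)]  # target program: w == 3

lr = 0.01
for _ in range(200):
    for x, target in examples:
        pred = w * x
        grad = 2 * (pred - target) * x           # d(error^2)/dw
        w -= lr * grad                           # nudge toward the target

print(round(w, 3))  # converges close to 3.0
```

A real detector works the same way in principle, just with millions of weights and bounding-box losses instead of one weight and a squared error.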
However, the examples used for model training need to be thought through.
- Should the model be penalized or rewarded when it detects a car in a window reflection?
- What should it do when it detects a picture of a person on a poster?
- Should a car trailer full of cars be annotated as one entity, or should each car be annotated separately?
These are just some of the questions that need answering when setting the goals for our object detection module.
When teaching a machine, big data alone is not enough: the data collected must also be rich and varied. For example, using only uniformly sampled images and then annotating them would yield many pedestrians and cars, yet the model would lack the examples of motorcycles or skaters it needs to reliably detect these categories.
Hard examples and rare cases need to be specifically mined, otherwise the model stops progressing. Starship operates in several countries, which enriches the example set with different weather conditions. Many people were surprised when Starship delivery robots kept working during the blizzard Emma in the UK, while airports and schools were closed.
At the same time, annotating data takes time and resources. Ideally, models should be trained and improved with less data. This is where architecture engineering comes into play: we encode prior knowledge into the architecture and the optimization process to narrow the search space toward programs that are more likely to match the real world.
In some computer vision applications, such as pixel-wise segmentation, it is useful for the model to know whether the robot is on a sidewalk or crossing a road. To provide a hint, we encode a global image-level clue into the neural network architecture; the model then decides whether to use it, without having to learn it from scratch.
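One common way to wire in such a clue, sketched here in plain NumPy under assumed shapes, is to broadcast a small image-level context vector onto every spatial location of a feature map before the segmentation head. The shapes and the one-hot scene label are illustrative, not Starship's actual architecture:

```python
import numpy as np

def add_global_clue(features, clue):
    """Broadcast a global clue vector onto every spatial location.

    features: (H, W, C) per-pixel feature map
    clue:     (K,) image-level context, e.g. a one-hot scene label
    returns:  (H, W, C + K) feature map with the clue attached everywhere
    """
    h, w, _ = features.shape
    tiled = np.broadcast_to(clue, (h, w, clue.shape[0]))
    return np.concatenate([features, tiled], axis=-1)

feats = np.zeros((4, 4, 8), dtype=np.float32)
sidewalk_clue = np.array([1.0, 0.0], dtype=np.float32)  # [sidewalk, road]
out = add_global_clue(feats, sidewalk_clue)
print(out.shape)  # (4, 4, 10)
```

Subsequent layers can then learn weights that use or ignore the extra channels, which is exactly the "decide whether to use it" behavior described above.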
With data and architecture engineering, the model can be made to work better. However, deep learning models require significant amounts of computing power, and this is a big challenge for the team, because we cannot take advantage of the most powerful graphics cards in battery-powered, low-cost delivery robots.
Starship wants deliveries to be low cost, which means our hardware must be inexpensive. For the same reason, Starship does not use LIDARs (a detection system that works on the radar principle, but uses light from a laser), which would make understanding the world much easier – but we do not want our customers to pay more than they need to for delivery.
State-of-the-art object detection systems published in academic papers run at about 5 frames per second [MaskRCNN], and real-time object detection papers do not report rates significantly above 100 FPS [Light-Head R-CNN, tiny-YOLO, tiny-DSOD]. What is more, these numbers are reported for a single image; since we need a 360-degree understanding, that is roughly equivalent to processing 5 single images.
To put this in perspective, Starship's models run at over 2000 FPS when measured on a consumer-grade GPU, and process a full 360-degree panorama image in a single forward pass. That is equivalent to 10,000 FPS when processing 5 single images with batch size 1.
Neural networks outperform humans on many visual problems, although they may still have bugs. For example, a bounding box may be too wide, confidence may be too low, or an object may be hallucinated in a place that is actually empty.
Fixing these bugs is challenging.
Neural networks are often considered black boxes that are difficult to analyze and understand. Yet, in order to improve the model, engineers need to understand its failure cases and dive deep into the specifics of what the model has learned.
The model is represented by a set of weights, and one can visualize what each specific neuron is trying to detect. For example, the first layers of Starship's network activate on basic patterns such as horizontal and vertical edges. The next blocks of layers detect more complex textures, while higher layers detect vehicle parts and whole objects.
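The edge-detecting behavior of early layers can be reproduced with a hand-made kernel: convolving a simple [-1, 1] filter over an image responds strongly exactly where intensity boundaries are. Learned first-layer weights often end up looking much like this toy example:

```python
import numpy as np

# Toy image: left half dark, right half bright -> one vertical edge.
img = np.zeros((5, 6), dtype=np.float32)
img[:, 3:] = 1.0

kernel = np.array([[-1.0, 1.0]])  # simple vertical-edge detector

# Valid-mode 2-D cross-correlation, written out by hand for clarity.
h = img.shape[0] - kernel.shape[0] + 1
w = img.shape[1] - kernel.shape[1] + 1
response = np.zeros((h, w), dtype=np.float32)
for i in range(h):
    for j in range(w):
        patch = img[i:i + kernel.shape[0], j:j + kernel.shape[1]]
        response[i, j] = float((patch * kernel).sum())

print(response[0])  # [0. 0. 1. 0. 0.] -> peak exactly at the edge
```

Visualizing what maximally activates each learned filter is the same idea run in reverse, and it is one of the main tools for peeking inside the "black box".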
Technical debt takes on another meaning with machine learning models. Engineers constantly improve the architecture, the optimization process, and the datasets. As a result, the model becomes more accurate. However, changing the detection model does not necessarily guarantee success in the overall behavior of the robot.
There are dozens of components that consume the output of the object detection model, each requiring a different precision and recall level, tuned against the existing model. However, a new model may behave differently in subtle ways. For example, the output probability distribution may be skewed toward larger values or may be broader. Even though average performance improves, it may get worse for a specific group such as large cars. To avoid these pitfalls, the team calibrates the probabilities and verifies regressions across multiple stratified datasets.
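The stratified part of that check can be sketched simply: instead of one global score, compute recall per object group, so a model that is better on average but worse on, say, large cars gets caught before release. The data, group names, and threshold below are made up for illustration:

```python
# Hypothetical evaluation results: (object group, was it detected?).
detections = [
    ("pedestrian", True), ("pedestrian", True), ("pedestrian", False),
    ("large_car", True), ("large_car", False), ("large_car", False),
]

def recall_by_group(samples):
    """Compute recall separately for each stratum."""
    totals, hits = {}, {}
    for group, detected in samples:
        totals[group] = totals.get(group, 0) + 1
        hits[group] = hits.get(group, 0) + int(detected)
    return {g: hits[g] / totals[g] for g in totals}

per_group = recall_by_group(detections)
print(per_group)

# A release gate then checks every stratum, not just the average.
MIN_RECALL = 0.5
regressions = [g for g, r in per_group.items() if r < MIN_RECALL]
print(regressions)  # ['large_car']
```

Here the overall recall is 0.5, which might pass a naive gate, while the per-group view exposes that large cars are detected only a third of the time.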
Monitoring trainable software components poses different challenges than standard software monitoring. There is little concern about estimated time or memory usage, as these are mostly constant.
Instead, dataset shift becomes the primary concern – the distribution of the data used to train the model differs from the data the model currently operates on.
For example, electric scooters may suddenly start to ride on sidewalks. If the model does not account for this category, it will have trouble classifying them correctly. The information derived from the object detection module will then disagree with other sensory information, resulting in more requests for assistance from human operators and thus slower deliveries.
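One simple way to monitor this kind of shift, sketched here with made-up numbers, is to compare the category frequencies seen in training against those the model reports in the field, and alert when a divergence measure exceeds a threshold. The distributions, categories, and threshold are illustrative assumptions:

```python
import math

# Illustrative category frequency distributions.
train_freq = {"pedestrian": 0.6, "car": 0.3, "cyclist": 0.1}
field_freq = {"pedestrian": 0.4, "car": 0.2, "cyclist": 0.4}

def kl_divergence(p, q):
    """KL(p || q) over matching categories; larger means more shift."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p)

shift = kl_divergence(field_freq, train_freq)
ALERT_THRESHOLD = 0.2  # arbitrary for this sketch
if shift > ALERT_THRESHOLD:
    print(f"dataset shift alert: KL = {shift:.3f}")
```

A spike like the inflated "cyclist" share above could be exactly the scooter scenario: a new category being squeezed into the nearest existing one.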
Neural networks give Starship's robots the ability to stay safe at road crossings and on sidewalks, avoiding obstacles such as cars by understanding what pedestrians and other road users might do next.
Starship robots achieve this by using cheap hardware which creates many engineering challenges but robot delivery makes it a powerful reality today. Starship’s robots are making real deliveries seven days a week in multiple cities around the world, and it’s exciting to see how our technology continues to bring more benefits to people’s lives.