Autonomous Car Vision

[Cross post from Akshat Sharma, Former Intern at McKinsey & Co. | Computer Vision | Deep Learning | Machine Learning | Cassandra]

We are in the era of autonomous vehicles. Every year, autonomous cars are tested in various corners of the world. This amazing feat of engineering is achieved by combining different technologies.

This article describes sensor fusion for autonomous vehicles and how the different sensors in an autonomous vehicle are used.

Sensor fusion combines data from sensors such as radar and lidar to complement the data obtained by the camera, which makes it possible to estimate the positions and speeds of the objects the camera identifies.

Sensors

Autonomous vehicles use a large number of sensors to understand their environment, localize themselves, and navigate.


Different sensors on the self-driving car (source)

Camera

The camera reproduces the driver's vision. It is very often used to understand the environment with artificial intelligence, by classifying roads, pedestrians, signs, vehicles and more.

Here, we analyze the continuous stream of images from the camera in real time. In image analysis, we generally speak of deep learning in terms of Convolutional Neural Networks, or ConvNets.
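As a rough illustration of the idea, here is a minimal ConvNet sketch in PyTorch that scores a camera frame against a few classes; the class count, layer sizes and input resolution are illustrative assumptions, not the architecture of any production perception stack.

    import torch
    import torch.nn as nn

    # Minimal ConvNet sketch: scores a camera frame against a few illustrative
    # classes (e.g. road, pedestrian, sign, vehicle). Real perception networks
    # are far larger and typically perform detection or segmentation, not just
    # whole-frame classification.
    class TinyConvNet(nn.Module):
        def __init__(self, num_classes=4):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
            )
            self.classifier = nn.Linear(32 * 16 * 16, num_classes)

        def forward(self, x):          # x: (batch, 3, 64, 64) RGB frame
            x = self.features(x)
            return self.classifier(x.flatten(1))

    frame = torch.rand(1, 3, 64, 64)   # dummy 64x64 camera frame
    logits = TinyConvNet()(frame)      # one score per class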

Autonomous vehicles rely on cameras placed on every side (front, rear, left and right) to stitch together a 360-degree view of their environment. Some have a wide field of view, as much as 120 degrees, and a shorter range. Others focus on a narrower view to provide long-range visuals.

It’s also more difficult for camera-based sensors to detect objects in low visibility conditions, like fog, rain or nighttime.

Radar - Radio Detection and Ranging

The radar emits radio waves to detect objects at ranges from a few meters up to a couple of hundred meters, depending on the radar type. Radars have been in our cars for years, detecting vehicles in blind spots or helping to avoid collisions. They perform better on moving objects than on static objects.

Unlike sensors that calculate the difference in position between two measurements, the radar uses the Doppler effect (an increase or decrease in the frequency of sound, light, or other waves as the source and observer move towards or away from each other), measuring the change in frequency of the returning wave as the detected object moves towards us or away from us.

It lets us know the position and speed of a detected object, but it cannot determine what kind of object is being sensed.
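As a back-of-the-envelope sketch of the Doppler relationship (the carrier frequency and frequency shift below are made-up illustrative values, not taken from any specific radar):

    # Doppler sketch: recover radial velocity from the frequency shift of the
    # returned wave, v = c * delta_f / (2 * f0) for a reflection off a moving
    # target. Real automotive radars use FMCW processing on top of this idea.
    C = 3.0e8          # speed of light, m/s
    f0 = 77e9          # assumed carrier frequency, Hz (77 GHz band)
    delta_f = 5130.0   # assumed measured Doppler shift, Hz

    radial_velocity = C * delta_f / (2 * f0)   # about 10 m/s, positive = approaching
    print(f"radial velocity: {radial_velocity:.1f} m/s")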

While the data provided by surround radar and camera are sufficient for lower levels of autonomy, they don’t cover all situations without a human driver. That’s where lidar comes in.

Lidar - Light Detection and Ranging

The lidar uses infrared laser pulses to determine the distance to an object. A rotating system sends out these pulses and measures the time taken for each pulse to come back. This makes it possible to generate a point cloud of the environment around the sensor; a lidar can generate about 2 million points per second. Because this point cloud captures the 3D shapes of objects, classification of objects is also possible with a lidar.

It can estimate the position of objects around the vehicle at a great distance (100 to 300 m). Its size, however, is cumbersome, since it typically protrudes above the roof of the vehicle.
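The distance computation itself is a simple time-of-flight calculation; a minimal sketch with a made-up echo time:

    # Time-of-flight sketch: a lidar measures the round-trip time of a light
    # pulse and converts it to a distance (the echo time here is illustrative).
    C = 3.0e8                  # speed of light, m/s
    round_trip_time = 6.67e-7  # seconds, assumed echo time

    distance = C * round_trip_time / 2.0   # divide by 2: the pulse goes out and back
    print(f"distance to object: {distance:.1f} m")   # about 100 m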

According to Elon Musk, "Anyone relying on lidar is doomed. They are expensive sensors that are unnecessary." That is a different part of the story.

Ultrasonic sensors

Ultrasonic sensors are used to estimate the position of nearby static vehicles, for example for parking assistance. They are much cheaper but have a range of only a few meters.

Odometric sensors

They make it possible to estimate the speed of our vehicle by studying the displacement of its wheels.

Each of these sensors has advantages and disadvantages. The aim of sensor fusion is to use the strengths of each one to understand the environment precisely. The camera is very good at detecting roads, reading signs or recognizing a vehicle. The lidar is better at accurately estimating the position of that vehicle, while the radar is better at accurately estimating its speed.

Kalman Filter

The Kalman filter is one of the most popular algorithms in data fusion. Invented in 1960 by Rudolf Kálmán, it is now used in our phones and satellites for navigation and tracking. The most famous use of the filter was during the Apollo 11 mission, to take the crew to the Moon and bring them back.

When to use a Kalman Filter?

A Kalman filter can be used for data fusion to estimate the state of a dynamic system (one evolving with time) in the present (filtering), the past (smoothing) or the future (prediction). Sensors in autonomous vehicles produce measurements that are sometimes incomplete and noisy. This inaccuracy (noise) is critical, and the Kalman filter handles it well. It can also be used for tracking objects.

Kalman filter can benefit tracking in these ways:

  • Prediction of the object's future location.
  • Correction of the prediction based on new measurements.
  • Reduction of noise introduced by inaccurate detections.
  • Facilitating the association of multiple objects with their tracks.

The Kalman filter is used to estimate the state of a system, denoted x. This state vector is composed of a position p and a velocity v.

    x = [p, v]ᵀ

At each estimate, we associate a measure of uncertainty P.

By performing sensor fusion, we take into account different data about the same object. A radar can estimate that a pedestrian is 8 meters away while the lidar estimates 10 meters. The Kalman filter combines the two measurements and reduces the noise of both sensors, giving a more precise idea of how far away the pedestrian really is.
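A minimal sketch of the underlying idea, fusing the two range measurements as Gaussians (the sensor variances below are assumptions chosen for illustration):

    # Fuse two noisy range measurements of the same pedestrian, treated as
    # Gaussians. The result is an inverse-variance weighted average: the more
    # certain sensor (lidar here) pulls the estimate towards its measurement.
    radar_range, radar_var = 8.0, 4.0    # radar: 8 m, assumed variance 4 m^2
    lidar_range, lidar_var = 10.0, 0.25  # lidar: 10 m, assumed variance 0.25 m^2

    fused_var = 1.0 / (1.0 / radar_var + 1.0 / lidar_var)
    fused_range = fused_var * (radar_range / radar_var + lidar_range / lidar_var)
    print(f"fused range: {fused_range:.2f} m, variance: {fused_var:.2f}")  # ~9.88 m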

A Kalman filter can generate estimates of the state of the objects around the vehicle. To make an estimate, it only needs the current observation and the previous prediction; the full measurement history is not necessary. The tool is therefore lightweight, and its estimates improve over time.

In general, a Kalman filter is an implementation of a Bayesian filter, i.e. a sequence of alternations between prediction and update (correction).

How does it Predict?

Our prediction consists of estimating a state x' and an uncertainty P' at time t from the previous state x and uncertainty P at time t-1.

    x' = F·x + B·u
    P' = F·P·Fᵀ + Q

Prediction formulas

  • F: State transition matrix from t-1 to t
  • u: Control vector (often treated as added noise when no control input is known)
  • Q: Process noise covariance matrix
  • B: Control matrix
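A minimal sketch of the prediction step in NumPy, assuming a 1D constant-velocity model (the time step and noise values are illustrative assumptions):

    import numpy as np

    # Prediction step for a 1D constant-velocity model. State x = [position, velocity].
    dt = 0.1
    F = np.array([[1.0, dt],
                  [0.0, 1.0]])        # transition matrix from t-1 to t
    Q = np.array([[0.01, 0.0],
                  [0.0, 0.01]])       # process noise covariance

    def predict(x, P, F, Q):
        x_pred = F @ x                # x' = F·x (no control input in this sketch)
        P_pred = F @ P @ F.T + Q      # P' = F·P·Fᵀ + Q
        return x_pred, P_pred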

How does it Update?

The update phase consists of using a measurement z from a sensor to correct our prediction and thus obtain the corrected x and P.

    y = z - H·x'
    S = H·P'·Hᵀ + R
    K = P'·Hᵀ·S⁻¹
    x = x' + K·y
    P = (I - K·H)·P'

Update formulas

  • y: Difference between the actual measurement and the prediction, i.e. the error.
  • S: Innovation covariance, the estimated system error.
  • H: Measurement matrix mapping our state space to the sensor's measurement space.
  • R: Covariance matrix of the sensor noise (given by the sensor manufacturer).
  • K: Kalman gain. In the scalar case, a coefficient between 0 and 1 reflecting how much to correct our prediction.
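Continuing the prediction sketch above, a minimal update step, again assuming we only measure position and that the measurement noise value is illustrative:

    import numpy as np

    # Update step: correct the prediction with a position measurement z.
    H = np.array([[1.0, 0.0]])        # measurement matrix: we only observe position
    R = np.array([[0.09]])            # assumed sensor noise covariance

    def update(x_pred, P_pred, z, H, R):
        y = z - H @ x_pred                        # innovation (measurement error)
        S = H @ P_pred @ H.T + R                  # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)       # Kalman gain
        x_new = x_pred + K @ y                    # corrected state
        P_new = (np.eye(len(x_pred)) - K @ H) @ P_pred
        return x_new, P_new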

The update phase makes it possible to estimate an x and a P closer to reality than what the measurements provided.

A Kalman filter allows predictions in real time, without needing historical data beforehand. We use a mathematical model based on matrix multiplication, defining at each time step a state x (position, velocity) and an uncertainty P.
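Putting the two sketches above together, a toy filtering loop over simulated noisy position measurements (all values are illustrative):

    import numpy as np

    # Toy loop reusing the predict() and update() sketches, with F, Q, H, R as above:
    # filter a stream of noisy position measurements of an object moving at ~1 m/s.
    x = np.array([0.0, 0.0])          # initial state: position 0 m, velocity 0 m/s
    P = np.eye(2) * 500.0             # large initial uncertainty

    true_positions = np.arange(0.0, 2.0, 0.1)                       # 20 steps of 0.1 s
    measurements = true_positions + np.random.normal(0.0, 0.3, 20)  # add sensor noise

    for z in measurements:
        x, P = predict(x, P, F, Q)                # prior
        x, P = update(x, P, np.array([z]), H, R)  # posterior
    print(f"estimated position {x[0]:.2f} m, velocity {x[1]:.2f} m/s")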

This diagram shows what happens in a Kalman filter.


Kalman filter estimation (source)

  • Predicted state estimate represents our first estimate, from the prediction phase. We speak of the prior.
  • Measurement is the measurement from one of our sensors. Its uncertainty is lower, but sensor noise still makes it an imperfect estimate on its own. We speak of the likelihood.
  • Optimal state estimate is our update phase. The uncertainty is now at its lowest: by combining the prediction and the measurement, we obtain a value that is more certain than either source alone. This value is our best guess. We speak of the posterior.

What a Kalman filter implements is in fact Bayes' rule.

    P(A|B) = P(B|A) · P(A) / P(B)

Bayes rule

In a Kalman filter, we loop between predictions and measurements. Our predictions become more and more precise, since we keep a measure of uncertainty and regularly compute the error between our prediction and reality. From matrix multiplications and probability formulas, we are able to estimate the velocities and positions of the vehicles around us.

Extended filters and non-linearity

An essential problem arises: our mathematical formulas all assume linear functions of the form y = ax + b.

A Kalman filter only works with linear functions. However, when we use a radar, the data is not linear.

Radar measurement geometry (source)

  • ρ (rho): The distance to the tracked object.
  • φ (phi): The angle between the x-axis and our object.
  • ρ̇ (rho dot): The rate of change of ρ, i.e. the radial velocity.


These three values make our measurement nonlinear, because of the angle φ.

Our goal here is to convert the data ρ, φ, ρ̇ to Cartesian data (px, py, vx, vy).
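A minimal conversion sketch (note that ρ̇ only carries the radial part of the velocity, so vx and vy below are approximations that ignore the tangential component; the example values are made up):

    import numpy as np

    # Convert a radar measurement (rho, phi, rho_dot) to Cartesian coordinates.
    def radar_to_cartesian(rho, phi, rho_dot):
        px = rho * np.cos(phi)
        py = rho * np.sin(phi)
        vx = rho_dot * np.cos(phi)   # radial velocity projected on x
        vy = rho_dot * np.sin(phi)   # radial velocity projected on y
        return px, py, vx, vy

    print(radar_to_cartesian(rho=10.0, phi=np.pi / 6, rho_dot=-2.0))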

If we feed non-linear data into a Kalman filter, the result is no longer a unimodal Gaussian, and we can no longer estimate position and velocity reliably.

  • Extended Kalman filters use the Jacobian and a Taylor series expansion to linearize the model.
  • Unscented Kalman filters use a more precise approximation (the unscented transform, based on sigma points) instead of linearizing the model.

These techniques deal with the non-linearity introduced by the radar and allow our filters to estimate the position and velocity of the objects we wish to track.
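As an illustration of the Extended Kalman Filter linearization, here is a sketch of the Jacobian of the Cartesian-to-polar measurement function, using the standard formulas and assuming the state ordering (px, py, vx, vy) from above:

    import numpy as np

    # Jacobian of the radar measurement function h(x) = (rho, phi, rho_dot),
    # evaluated at the current Cartesian state (px, py, vx, vy). The EKF uses
    # this matrix in place of H to linearize around the current estimate.
    def radar_jacobian(px, py, vx, vy):
        rho2 = px ** 2 + py ** 2
        if rho2 < 1e-6:               # avoid division by zero near the sensor
            raise ValueError("state too close to the sensor to linearize")
        rho = np.sqrt(rho2)
        rho3 = rho2 * rho
        return np.array([
            [px / rho, py / rho, 0.0, 0.0],
            [-py / rho2, px / rho2, 0.0, 0.0],
            [py * (vx * py - vy * px) / rho3,
             px * (vy * px - vx * py) / rho3,
             px / rho, py / rho],
        ])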

Conclusion

Camera, radar, and lidar sensors provide rich data about the car's environment. Rather than relying on just one type of sensor data at specific moments, sensor fusion combines information from the whole sensor suite, such as shape, speed, and distance, to ensure reliability. Several algorithms can provide this kind of robustness; the Kalman filter is one of them.
