Introduction to the Basics of Autonomous Driving Technology
A self-driving car is a vehicle that can perceive its surrounding environment and navigate without human intervention. It uses radar, lidar, ultrasonic sensors, GPS, odometry, computer vision and other technologies to perceive its surroundings. Through advanced computation and control systems, it can identify obstacles and various signs, plan an appropriate path and control the vehicle.

SAE (the Society of Automotive Engineers) classifies autonomous driving into six levels, from 0 to 5.

Level 0: No automation.

The car has no automated driving functions; the human driver has absolute control over all of its functions and is responsible for steering, accelerating, braking and observing road conditions. Driver assistance technologies such as forward collision warning, lane departure warning, automatic wipers and automatic headlight control, although intelligent, still require the human to control the vehicle, so they still belong to Level 0.

Level 1: Driver assistance.

The driver is still responsible for driving safety but can delegate some control to the system, and certain functions can be performed automatically, such as adaptive cruise control (ACC), emergency brake assist (EBA) and lane keeping support (LKS). Level 1 is characterized by a single automated function: the driver cannot take both hands and feet off the controls at the same time.

Level 2: Partial automation.

The human driver and the car share control. Under certain preset conditions the driver may stop operating the car, taking both hands and feet off the controls at the same time, but must remain on standby, stay responsible for driving safety and be ready to take back control at short notice. An example is the car-following function formed by combining ACC and LKS. The essence of Level 2 is not having two or more functions, but that the driver no longer has to be the primary operator.

Level 3: Conditional automation.

Automatic control can be achieved in limited circumstances, such as on preset road sections (for example highways and less crowded urban roads). The automated driving system can take full responsibility for controlling the vehicle, but in an emergency the driver still needs to take over within a certain time, with sufficient warning time (for example, when conditions on the road ahead change). Level 3 relieves the driver, who is no longer responsible for driving safety and does not have to monitor road conditions.

Level 4: High automation.

Driving can be highly automated under certain road conditions, such as closed campuses, highways, urban roads or fixed routes. Under these limited conditions, the vehicle can operate without human intervention.

Level 5: Full automation.

There is no restriction on the driving environment: the car can automatically cope with all kinds of complicated traffic conditions and road environments and drive from the starting point to the destination without assistance. It only needs the start and end points; the car is responsible for driving safety at all times, completely independent of driver intervention and not limited to specific roads.

Note: DDT (Dynamic Driving Task) refers to all the real-time operational and tactical functions required to drive a car on the road, excluding strategic functions such as trip scheduling and the selection of destinations and routes.

The core of an unmanned driving system can be summarized as three parts: perception, planning and control. The figure below shows how these components interact with each other and with the vehicle hardware and other vehicles:
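To make the division of labor concrete, the following is a minimal, hypothetical sketch of the perception-planning-control loop in Python. All class and method names here are illustrative assumptions, not a real framework.

    # Hypothetical skeleton of the perception -> planning -> control loop.
    import time

    class Perception:
        def update(self, sensor_frame):
            """Fuse raw sensor data into obstacles, lanes and the vehicle pose."""
            return {"obstacles": [], "lanes": [], "pose": (0.0, 0.0, 0.0)}

    class Planner:
        def plan(self, world_model, destination):
            """Produce a short-horizon trajectory toward the destination."""
            return [world_model["pose"]]  # placeholder trajectory

    class Controller:
        def command(self, trajectory, pose):
            """Turn the trajectory into throttle/brake/steering commands."""
            return {"throttle": 0.0, "brake": 0.0, "steer": 0.0}

    def main_loop(sensors, actuators, destination, hz=10):
        perception, planner, controller = Perception(), Planner(), Controller()
        while True:
            frame = sensors.read()                         # raw lidar/camera/radar data
            world = perception.update(frame)               # perception
            traj = planner.plan(world, destination)        # planning
            cmd = controller.command(traj, world["pose"])  # control
            actuators.apply(cmd)
            time.sleep(1.0 / hz)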

Perception refers to the ability of the unmanned driving system to collect information from the environment and extract relevant knowledge from it. Environmental perception means understanding the environment, for example the location of obstacles, the detection of road signs and markings, the detection of pedestrians and vehicles, and the semantic classification of other data. Generally speaking, localization is also considered part of perception: it is the ability of the unmanned vehicle to determine its own position relative to the environment.

In order to understand and grasp the environment, the environmental perception part of an unmanned driving system usually needs to obtain a large amount of information about the surroundings, including the position, speed and likely behavior of obstacles, the drivable area, traffic rules and so on. Unmanned vehicles usually obtain this information by fusing data from multiple sensors such as lidar, cameras and millimeter-wave radar.

The functions, advantages and disadvantages of the various vehicle-mounted sensors differ; a comparison is given in the following table:

Lidar is a device that uses laser light for detection and ranging. It can emit millions of light pulses into the environment every second, and its rotating internal structure enables it to build a three-dimensional map of the surrounding environment in real time.

Generally speaking, a lidar scans the surrounding environment at about 10 Hz, and the result of one scan is a three-dimensional map composed of dense points, each of which carries (x, y, z) information. This map is called a point cloud. The figure below shows a point cloud produced by a Velodyne VLP-32C lidar:

Because of its reliability, lidar is still the most important sensor in unmanned systems. In practice, however, lidar is not perfect: point clouds are often too sparse, and some points may even be lost, making it difficult for lidar to recognize the shape of irregular surfaces. Another big challenge is that the sensing range of lidar is relatively short, averaging about 150 m, depending on the environment and the obstacles. Lidar is also far inferior to cameras in angular resolution, and it is sensitive to the environment; for example, on rainy days, water splashed up by vehicles appears as noise to the lidar.

Millimeter-wave radar detects the presence, distance, speed and bearing of targets by emitting electromagnetic waves and detecting the echoes. Because of its relatively mature technology, low cost and good performance in bad weather, millimeter-wave radar has become an important part of the sensor suite. However, because of its low resolution, it cannot replace lidar and instead serves as an important complement to it.

According to the lenses used and their arrangement, cameras fall into four kinds: monocular cameras, binocular (stereo) cameras, trinocular cameras and surround-view cameras.

A monocular camera module contains only one camera and one lens. Because much of the research on image algorithms is based on monocular cameras, the algorithm maturity of monocular cameras is higher than that of other camera types. But the monocular camera has two inherent defects. First, its field of view depends entirely on the lens: a short-focal-length lens gives a wide field of view but lacks long-range information, and vice versa, so monocular cameras generally use lenses with a moderate focal length. Second, monocular ranging accuracy is low. A camera image is a perspective projection, so the farther away an object is, the smaller it appears: a nearby object may be described by hundreds or even thousands of pixels, while the same object in the distance may need only a few. Each distant pixel therefore represents a larger distance, so the farther the object, the lower the monocular ranging accuracy.

Because of this defect of the single viewpoint, the binocular (stereo) camera came into being. When two cameras close to each other photograph the same object, a pixel offset (disparity) of that object appears between the two imaging planes. Using the disparity, the camera focal length and the actual distance between the two cameras (the baseline), the distance of the object can be obtained through a simple mathematical transformation. Although a binocular camera can obtain high-precision ranging results and supports image segmentation, like the monocular camera its field of view depends entirely on the lens. Moreover, the principle of binocular ranging places stricter requirements on the mounting positions and spacing of the two lenses, which complicates camera calibration.
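As a concrete illustration of the transformation mentioned above, the depth of a point follows the relation Z = f * B / d, where f is the focal length in pixels, B the baseline and d the disparity in pixels. A minimal sketch with made-up example numbers:

    def stereo_depth(focal_px, baseline_m, disparity_px):
        """Depth of a point from stereo disparity: Z = f * B / d."""
        if disparity_px <= 0:
            raise ValueError("disparity must be positive")
        return focal_px * baseline_m / disparity_px

    # Example with assumed values: 700-pixel focal length, 12 cm baseline,
    # 35-pixel disparity -> the point is about 2.4 m away.
    print(stereo_depth(700.0, 0.12, 35.0))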

Both the monocular and the binocular camera have shortcomings, so a camera scheme widely used in unmanned driving is the trinocular camera, which is actually composed of three monocular cameras with different focal lengths. Because the focal lengths differ, the sensing range of each camera also differs. For a single camera, extending the perception range means sacrificing either field of view or distance; the trinocular camera makes up for this, which is why it is widely used in the industry. Precisely because each of the three cameras has a different field of view, the near range is assigned to the wide-field camera, the middle range to the main camera and the far range to the narrow-field camera, so each camera can play to its strengths. The disadvantages of the trinocular camera are that three cameras need to be calibrated at the same time, which increases the workload, and that the software has to associate the data from the three cameras, which places high demands on the algorithms.

Surround-view camera: the lenses of the three camera types above are all non-fisheye, whereas the surround-view camera uses fisheye lenses mounted facing the ground. Some high-end models offer a "360-degree panoramic display" function that uses surround-view cameras: four fisheye lenses installed at the front of the vehicle, under the left and right rearview mirrors and at the rear collect images. To obtain a large enough field of view, a fisheye lens produces severe image distortion. The surround-view camera has a small sensing range and is mainly used for obstacle detection within 5 to 10 meters of the car body, parking-space line recognition during automated parking, and so on.

To make sense of point cloud information, we generally perform two operations on point cloud data: segmentation and classification. Segmentation clusters the discrete points in the point cloud into several wholes, while classification determines which category each of these wholes belongs to (such as pedestrian, vehicle or obstacle). Segmentation algorithms can be divided into the following categories:

After the point cloud targets have been segmented, the segmented targets need to be classified correctly. In this step, classification algorithms from machine learning, such as the support vector machine (SVM), are generally used to classify the features of each cluster. In recent years, with the development of deep learning, the industry has begun to use specially designed convolutional neural networks (CNNs) to classify 3D point cloud clusters.
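One simple way to realize the segmentation step is Euclidean clustering on the raw (x, y, z) points. The sketch below uses DBSCAN from scikit-learn as one possible implementation; the distance threshold is an assumed value.

    import numpy as np
    from sklearn.cluster import DBSCAN

    def segment_point_cloud(points, eps=0.5, min_samples=10):
        """Cluster an (N, 3) point cloud into objects by Euclidean distance.

        points: numpy array of shape (N, 3) with x, y, z in meters.
        Returns an array of cluster labels; -1 marks noise points.
        """
        return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)

    # Example: synthetic points forming two well-separated blobs.
    cloud = np.vstack([np.random.randn(100, 3) * 0.2,
                       np.random.randn(100, 3) * 0.2 + np.array([5.0, 0.0, 0.0])])
    labels = segment_point_cloud(cloud)
    print("clusters found:", len(set(labels)) - (1 if -1 in labels else 0))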

In practical applications, whether classification uses hand-crafted features with an SVM or raw point clouds with a CNN, the low resolution of lidar point clouds means that point-cloud-based classification is not reliable for targets with sparse reflection points (such as pedestrians). Therefore, in practice we often fuse lidar and camera sensors: the high resolution of the camera is used to classify targets, the reliability of the lidar is used to detect and measure obstacles, and their advantages are combined to complete environmental perception.

In unmanned systems, we usually use image-based vision to detect the road and the targets on it. Road detection includes lane line detection and drivable area detection. Detection of targets on the road includes the detection and classification of all traffic participants, such as vehicle detection, pedestrian detection and traffic sign detection.

Lane line detection involves two aspects: first, identifying the lane lines and, for curved lane lines, computing their curvature; second, determining the vehicle's offset from the lane lines (that is, the unmanned vehicle's position within the lane). One method is to extract lane features, including edge features (usually gradients, for example with the Sobel operator) and the color features of lane lines, fit a polynomial to the pixels that are likely to belong to lane lines, and then determine the curvature of the lane ahead and the deviation of the vehicle from the lane from the polynomial and the known mounting position of the camera on the vehicle.
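A minimal sketch of the polynomial-fitting step, assuming lane pixels have already been extracted (for example with a Sobel edge filter plus color thresholding) and converted to meters. For a quadratic x = A*y^2 + B*y + C, the radius of curvature is R = (1 + (2*A*y + B)^2)^(3/2) / |2*A|.

    import numpy as np

    def fit_lane(xs_m, ys_m, y_eval_m):
        """Fit x = A*y^2 + B*y + C to lane points (in meters) and return
        the radius of curvature at y_eval_m and the lateral offset at y = 0."""
        A, B, C = np.polyfit(ys_m, xs_m, 2)
        curvature_radius = (1 + (2 * A * y_eval_m + B) ** 2) ** 1.5 / abs(2 * A)
        lateral_offset = C  # lane position directly ahead of the camera (y = 0)
        return curvature_radius, lateral_offset

    # Example with assumed, synthetic lane points lying on a gentle curve.
    y = np.linspace(0, 30, 50)              # distance ahead, meters
    x = 0.002 * y ** 2 + 0.01 * y + 1.8     # lateral position, meters
    print(fit_lane(x, y, y_eval_m=30.0))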

At present, one method of drivable-area detection is to segment the scene directly with a deep neural network, that is, to train a pixel-wise deep neural network that cuts the drivable area out of the image.

At present, the detection and classification of traffic participants relies mainly on deep learning models, and the commonly used models fall into two types:

The sensor layer sends data downstream frame by frame at a fixed frequency, but the downstream modules cannot make decisions from individual frames taken in isolation. Because the sensors are not 100% reliable, it would be extremely irresponsible for downstream decision-making to judge whether there is an obstacle ahead based only on the signal of one frame (which may be a false detection). Therefore, the upstream needs to preprocess the information to ensure that an obstacle in front of the vehicle persists in the time dimension rather than flashing by.

Here we introduce an algorithm that is widely used in the field of intelligent driving: the Kalman filter.

The Kalman filter is an efficient recursive filter (autoregressive filter) that can estimate the state of a dynamic system from a series of incomplete and noisy measurements. The Kalman filter considers the joint distribution of the measurements at different times and then produces an estimate of the unknown variables, so it is more accurate than an estimate based on a single measurement alone.
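A minimal one-dimensional constant-velocity Kalman filter sketch (position and velocity state, noisy position measurements); the noise values and motion model are illustrative assumptions.

    import numpy as np

    def kalman_track(measurements, dt=0.1, meas_var=1.0, accel_var=0.5):
        """Track [position, velocity] from noisy position measurements."""
        F = np.array([[1.0, dt], [0.0, 1.0]])            # state transition
        H = np.array([[1.0, 0.0]])                       # measurement matrix
        Q = accel_var * np.array([[dt**4 / 4, dt**3 / 2],
                                  [dt**3 / 2, dt**2]])   # process noise
        R = np.array([[meas_var]])                       # measurement noise
        x = np.array([[measurements[0]], [0.0]])         # initial state
        P = np.eye(2)                                    # initial covariance
        estimates = []
        for z in measurements:
            # predict
            x = F @ x
            P = F @ P @ F.T + Q
            # update
            y = np.array([[z]]) - H @ x                  # innovation
            S = H @ P @ H.T + R
            K = P @ H.T @ np.linalg.inv(S)               # Kalman gain
            x = x + K @ y
            P = (np.eye(2) - K @ H) @ P
            estimates.append(x.flatten().copy())
        return np.array(estimates)

    # Example: a target moving at 2 m/s observed with noisy position readings.
    true_pos = np.arange(0, 10, 0.1) * 2.0
    noisy = true_pos + np.random.randn(true_pos.size)
    print(kalman_track(noisy)[-1])   # final [position, velocity] estimate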

The Kalman filter has many applications in engineering. Common ones are the guidance, navigation and control of aircraft and spacecraft. It is also widely used in time series analysis, such as signal processing and econometrics, and it is one of the important topics in robot motion planning and control, sometimes included in trajectory optimization. The Kalman filter has even been used to model motion control in the central nervous system: because there is a time delay between issuing motor commands and receiving sensory feedback, the Kalman filter helps build a realistic model, estimate the current state of the motor system and update the commands.

Information fusion refers to the operation of merging multiple pieces of information with the same attributes into one.

For example, the camera detects an obstacle in front of the vehicle, the millimeter-wave radar also detects an obstacle in front, and the lidar detects one as well, but in reality there is only one obstacle ahead. What we need to do is fuse the information about this vehicle from the multiple sensors and tell the downstream modules that there is one car ahead, not three.
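A toy sketch of this kind of fusion: detections from different sensors (already expressed in the vehicle coordinate system) that lie within an assumed distance threshold are merged into a single obstacle.

    import numpy as np

    def fuse_detections(detections, merge_dist=1.5):
        """Greedily merge (x, y) detections from different sensors that refer
        to the same physical obstacle. merge_dist is an assumed threshold in meters."""
        fused = []
        for det in detections:
            det = np.asarray(det, dtype=float)
            for group in fused:
                if np.linalg.norm(det - np.mean(group, axis=0)) < merge_dist:
                    group.append(det)
                    break
            else:
                fused.append([det])
        return [np.mean(group, axis=0) for group in fused]

    # Camera, millimeter-wave radar and lidar each report the same car ahead.
    print(fuse_detections([(10.2, 0.1), (10.0, 0.0), (9.8, -0.2)]))  # one obstacle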

Coordinate transformation is very important in the field of automatic driving.

Sensors are installed in different places. Take an ultrasonic radar as an example: if there is an obstacle on the right side of the vehicle, 3 meters away from the ultrasonic radar, do we then consider the obstacle to be 3 meters from the vehicle? Not necessarily, because the decision and control layer plans the vehicle's motion in the body coordinate system (whose origin is generally the center of the rear axle), so all sensor information needs to be transformed into the vehicle coordinate system. Therefore, after the perception layer obtains the obstacle position 3 meters from the sensor, it must transform that position into the vehicle coordinate system before it can be used for planning and decision-making. Similarly, the camera is usually installed under the windshield and its data are expressed in the camera coordinate system; for downstream use they must also be converted into the vehicle coordinate system.
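A minimal 2D sketch of this transformation: given an assumed mounting pose of a sensor in the vehicle frame, a point measured in the sensor frame is rotated and translated into the vehicle coordinate system.

    import numpy as np

    def sensor_to_vehicle(point_sensor, mount_xy, mount_yaw_rad):
        """Transform a 2D point from the sensor frame to the vehicle (body) frame.

        mount_xy: sensor position in the vehicle frame (origin at rear-axle center).
        mount_yaw_rad: sensor heading relative to the vehicle's x axis.
        """
        c, s = np.cos(mount_yaw_rad), np.sin(mount_yaw_rad)
        R = np.array([[c, -s], [s, c]])                  # rotation sensor -> vehicle
        return R @ np.asarray(point_sensor) + np.asarray(mount_xy)

    # Example with assumed extrinsics: an ultrasonic sensor on the right side of
    # the car, 1.0 m ahead of the rear axle and 0.9 m to the right, facing right.
    # An obstacle 3 m from the sensor ends up about 3.9 m to the vehicle's right.
    print(sensor_to_vehicle([3.0, 0.0], mount_xy=[1.0, -0.9], mount_yaw_rad=-np.pi / 2))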

At the perception level of the unmanned vehicle, the importance of localization is self-evident. The unmanned vehicle needs to know its exact position relative to the environment, and the localization error here should not exceed 10 cm. Imagine that if the localization error of our unmanned vehicle were 30 cm, it would be a very dangerous vehicle (for both pedestrians and passengers), because the planning and execution layers would not know about the 30 cm error and would still plan and act as if the position were exact. Clearly, unmanned vehicles need high-precision localization.

At present, the most widely used localization method for unmanned vehicles is the fusion of the Global Positioning System (GPS) with an inertial navigation system (IMU). The positioning accuracy of GPS ranges from tens of meters down to centimeters, and high-precision GPS sensors are relatively expensive. A GPS/IMU-based method cannot achieve high-precision localization when the GPS signal is missing or weak, for example in underground parking lots or in urban areas surrounded by tall buildings, so it is only applicable to unmanned driving tasks in certain scenarios.

Map-aided localization is another widely used class of localization algorithms for unmanned vehicles, represented by Simultaneous Localization and Mapping (SLAM). The goal of SLAM is to build a map while simultaneously using that map for localization. SLAM determines the current vehicle position from the currently observed environmental features; this is a process of estimating the current position using previous and current observations, usually implemented with Bayesian filters such as the Kalman filter, the extended Kalman filter and the particle filter. Although SLAM is a research hotspot in robot localization, using SLAM in the actual development of unmanned vehicles is problematic: unlike a robot, an unmanned vehicle travels long distances in open environments, and as the distance grows the SLAM localization drift gradually increases, eventually causing localization to fail.

In practice, an effective way to localize unmanned vehicles is to adapt the scan-matching part of the original SLAM. Specifically, instead of mapping while localizing, we use sensors such as lidar to construct a point cloud map of the area in advance and add some "semantics" to the map through software and manual processing (such as specific annotations of lane lines, the layout of the road network, the positions of traffic lights, the traffic rules of each road section, and so on). This map with semantics is the high-precision map used by our unmanned vehicle. For actual localization, the current lidar scan is matched against the pre-built high-precision point cloud map to determine the vehicle's position in the map. These methods are collectively called scan matching, and the most common scan-matching method is the Iterative Closest Point (ICP) algorithm, which performs point cloud registration based on the distances between the current scan and the target scan.
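A minimal 2D ICP sketch to illustrate the idea: in each iteration, every source point is matched to its nearest target point and the best rigid transform is computed with an SVD. The library calls (scipy's cKDTree, numpy's SVD) are standard, but the scan data here are made up.

    import numpy as np
    from scipy.spatial import cKDTree

    def icp_2d(source, target, iterations=20):
        """Estimate the rigid transform (R, t) aligning source onto target."""
        src = source.copy()
        tree = cKDTree(target)
        R_total, t_total = np.eye(2), np.zeros(2)
        for _ in range(iterations):
            _, idx = tree.query(src)                 # nearest-neighbor correspondences
            tgt = target[idx]
            src_c, tgt_c = src.mean(axis=0), tgt.mean(axis=0)
            H = (src - src_c).T @ (tgt - tgt_c)      # cross-covariance
            U, _, Vt = np.linalg.svd(H)
            R = Vt.T @ U.T
            if np.linalg.det(R) < 0:                 # avoid reflections
                Vt[-1, :] *= -1
                R = Vt.T @ U.T
            t = tgt_c - R @ src_c
            src = (R @ src.T).T + t
            R_total, t_total = R @ R_total, R @ t_total + t
        return R_total, t_total

    # Example: a synthetic "scan" rotated by 5 degrees and shifted, then re-aligned.
    target = np.random.rand(200, 2) * 10
    a = np.deg2rad(5.0)
    R_true = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    source = (R_true.T @ (target - np.array([0.5, -0.3])).T).T
    print(icp_2d(source, target)[1])   # recovered translation, roughly [0.5, -0.3]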

In addition, the Normal Distributions Transform (NDT) is another common point cloud registration method, which models the point cloud with local normal distributions over grid cells. Localization based on point cloud registration can also achieve an accuracy within 10 cm. Although point cloud registration can give the global position of the unmanned vehicle relative to the map, this approach relies heavily on the high-precision map built in advance and still needs to be used together with GPS on open roads. On road sections with relatively simple scenes (such as expressways), adding point cloud matching on top of GPS is comparatively expensive.

Extended reading: Challenges and Solutions of Perception Systems in L4 Autonomous Driving

Analysis of an important link in automatic driving: the development status and direction of perception systems

The planning module of an unmanned vehicle is divided into three layers: task planning, behavior planning and action planning. Task planning is usually also called route planning or path planning; it is responsible for top-level path planning, such as choosing a route from the starting point to the destination. We can regard the current road system as a directed graph network that can express information such as the connections between roads, traffic rules, road width and so on; this is essentially the "semantic" part of the high-precision map mentioned in the localization section above. This directed graph is called a route network graph, as shown in the following figure:

Every directed edge in such a road network graph has a weight, so the path planning problem of an unmanned vehicle becomes the problem of selecting, by some method, the optimal (that is, least-cost) path in the road network graph that takes the vehicle to its goal (usually from A to B). The problem thus becomes a directed graph search problem, which can be solved with Dijkstra's algorithm and other traditional algorithms.
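A minimal sketch of route planning with Dijkstra's algorithm on a toy road network graph; the node names and edge weights (for example, travel times) are made up.

    import heapq

    def dijkstra(graph, start, goal):
        """Shortest path in a weighted directed graph given as
        {node: [(neighbor, cost), ...]}. Returns (total_cost, path)."""
        queue = [(0.0, start, [start])]
        visited = set()
        while queue:
            cost, node, path = heapq.heappop(queue)
            if node == goal:
                return cost, path
            if node in visited:
                continue
            visited.add(node)
            for neighbor, edge_cost in graph.get(node, []):
                if neighbor not in visited:
                    heapq.heappush(queue, (cost + edge_cost, neighbor, path + [neighbor]))
        return float("inf"), []

    # Toy route network: edge weights could be travel time or distance.
    road_graph = {
        "A": [("B", 4.0), ("C", 2.0)],
        "B": [("D", 5.0)],
        "C": [("B", 1.0), ("D", 8.0)],
        "D": [],
    }
    print(dijkstra(road_graph, "A", "D"))   # -> (8.0, ['A', 'C', 'B', 'D'])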

Behavior planning is sometimes called the decision layer. Its main task is to decide what the unmanned vehicle should do next according to the goal of task planning and the current local situation (the positions and behavior of other vehicles and pedestrians, current traffic rules, and so on). This layer can be understood as the vehicle's co-driver: based on the goal and the current traffic situation, it tells the driver whether to follow the car ahead or overtake, whether to stop and wait for pedestrians to cross or go around them, and so on.

One method of behavior planning is to use a complex finite state machine (FSM) containing a large number of action states. Starting from a basic state, the FSM jumps to different action states according to different driving scenarios and passes the resulting action commands to the lower action planning layer. The figure below shows a simple finite state machine:

As shown in the figure above, each state is a decision about the vehicle's action, there are certain transition conditions between states, and some states can loop back to themselves (such as the tracking state and the waiting state in the figure). Although it is currently the mainstream behavior decision method used by unmanned vehicles, the finite state machine still has great limitations: to realize complex behavior decisions, a large number of states must be designed by hand; the vehicle may fall into a state the finite state machine did not anticipate; and if the finite state machine is not designed with deadlock protection, the vehicle may even fall into some kind of deadlock.
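A toy finite state machine for behavior decision, in the spirit described above; the states (follow, wait, overtake) and the triggering events are simplified assumptions.

    # Toy behavior-planning finite state machine. States and transition
    # conditions are simplified assumptions for illustration only.
    TRANSITIONS = {
        # (current state, observed event) -> next state
        ("FOLLOW",   "pedestrian_ahead"): "WAIT",
        ("FOLLOW",   "slow_car_ahead"):   "OVERTAKE",
        ("WAIT",     "path_clear"):       "FOLLOW",
        ("OVERTAKE", "maneuver_done"):    "FOLLOW",
    }

    def next_state(state, event):
        """Return the next behavior state; unknown events keep the current state
        (a self-loop, like the tracking and waiting states mentioned above)."""
        return TRANSITIONS.get((state, event), state)

    state = "FOLLOW"
    for event in ["slow_car_ahead", "maneuver_done", "pedestrian_ahead", "path_clear"]:
        state = next_state(state, event)
        print(event, "->", state)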

The process of planning a series of actions to achieve a certain goal (such as avoiding obstacles) is called action planning. Two metrics are generally used to evaluate the performance of an action planning algorithm: computational efficiency and completeness. Computational efficiency is how efficiently the algorithm completes one round of action planning, and it depends largely on the configuration space. An action planning algorithm is said to be complete if, whenever the problem has a solution, it returns a solution within a finite time, and when there is no solution it reports that no solution exists.

Configuration space: the set of all possible configurations of the robot, which defines the dimensions in which the robot can move. In the simplest two-dimensional discrete problem the configuration space is [x, y]; the configuration space of an unmanned vehicle can be very complex, depending on the motion planning algorithm used.

With the concept of configuration space, the action planning of an unmanned vehicle becomes: given an initial configuration, a target configuration and some constraints, search the configuration space for a sequence of actions whose execution transfers the unmanned vehicle from the initial configuration to the target configuration while satisfying the constraints. In the unmanned vehicle scenario, the initial configuration is usually the current state (current position, speed, angular velocity, etc.), the target configuration comes from the upper behavior planning layer, and the constraints are the vehicle's motion limits (maximum steering angle, maximum acceleration, etc.). Obviously, the computation of action planning in a high-dimensional configuration space is enormous: to guarantee the completeness of the planning algorithm we would have to search almost all possible paths, which creates the "curse of dimensionality" problem in continuous action planning. The core idea for solving this problem in current action planning is to transform the continuous space model into a discrete model, and the specific methods can be summarized into two categories: combinatorial planning and sampling-based planning.

Combinatorial methods for motion planning find paths through the continuous configuration space without approximation; because of this property they can be called exact algorithms. Combinatorial methods find a complete solution by building a discrete representation of the planning problem. For example, in the DARPA Urban Challenge, CMU's driverless car BOSS used this kind of action planning algorithm: it first used a path planner to generate alternative paths and target points (paths and target points that are dynamically feasible), and then chose the optimal path through an optimization algorithm. Another discretization method is grid decomposition: after gridding the configuration space, a discrete graph search algorithm (such as A*) can usually be used to find an optimal path.

Sampling-based methods are widely used because of their probabilistic completeness. The most common algorithms are PRM (Probabilistic Roadmap), RRT (Rapidly-exploring Random Tree) and FMT (Fast Marching Tree). In unmanned vehicle applications, state sampling methods need to consider the control constraints between two states and also need an efficient way to query whether a sampled state can be reached from its parent state.
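A minimal 2D RRT sketch without obstacles, to show the sampling idea only; the step size, bounds and goal tolerance are illustrative assumptions, and a real planner would also check collisions and kinematic constraints.

    import math
    import random

    def rrt(start, goal, bounds=(0.0, 100.0), step=2.0, goal_tol=3.0, max_iter=5000):
        """Grow a rapidly-exploring random tree from start toward goal in free 2D space.
        Returns the path from start to goal as a list of (x, y) points."""
        nodes = [start]
        parent = {0: None}
        for _ in range(max_iter):
            sample = goal if random.random() < 0.1 else (
                random.uniform(*bounds), random.uniform(*bounds))
            # nearest existing node
            i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], sample))
            nx, ny = nodes[i]
            d = math.dist((nx, ny), sample)
            if d == 0.0:
                continue
            new = (nx + step * (sample[0] - nx) / d, ny + step * (sample[1] - ny) / d)
            parent[len(nodes)] = i
            nodes.append(new)
            if math.dist(new, goal) < goal_tol:          # reached the goal region
                path, k = [], len(nodes) - 1
                while k is not None:
                    path.append(nodes[k])
                    k = parent[k]
                return path[::-1]
        return []

    print(len(rrt(start=(5.0, 5.0), goal=(90.0, 90.0))), "nodes on the found path")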

The vehicle control technology of a self-driving car aims to follow the target trajectory planned by the decision layer on the basis of environment perception. Through the cooperation of the longitudinal and lateral control systems, the car can follow the target trajectory accurately and stably, while performing basic operations such as speed adjustment, distance keeping, lane changing and overtaking.

Internet technology companies mainly do software and work mostly at the upper layers of the stack; the vehicle manufacturers mainly handle the assembly of the lower layers, that is, OEM production. In areas such as braking, throttle and steering, the say still largely rests with Tier 1 suppliers such as Bosch and Continental.

The core technologies of automatic driving control are the longitudinal and lateral control of the vehicle. Longitudinal control means driving and braking control of the vehicle; lateral control means steering wheel angle adjustment and tire force control. With both longitudinal and lateral automatic control in place, the vehicle's motion can be controlled automatically according to given goals and constraints. Therefore, from the point of view of the vehicle itself, autonomous driving is the integrated longitudinal and lateral control of the vehicle.

Vehicle longitudinal control is control along the direction of travel, that is, automatic control of the vehicle's speed and of the distance between the vehicle and the vehicles or obstacles in front of and behind it. Cruise control and emergency braking control are both typical cases of longitudinal control in automatic driving. This kind of control problem reduces to the control of the motor drive, engine, transmission and braking systems. Various motor-engine-transmission models, vehicle motion models and braking process models, combined with different controller algorithms, form the various longitudinal control schemes; a typical structure is shown in the figure.
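A minimal sketch of the speed-tracking part of longitudinal control using a PID controller; the gains, time step and the first-order vehicle response below are illustrative assumptions rather than a real powertrain model.

    class PIDSpeedController:
        """Simple PID controller that turns a speed error into a throttle/brake command."""

        def __init__(self, kp=0.8, ki=0.1, kd=0.05, dt=0.1):
            self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
            self.integral = 0.0
            self.prev_error = 0.0

        def step(self, target_speed, current_speed):
            error = target_speed - current_speed
            self.integral += error * self.dt
            derivative = (error - self.prev_error) / self.dt
            self.prev_error = error
            # positive output -> throttle, negative output -> brake
            return self.kp * error + self.ki * self.integral + self.kd * derivative

    # Toy simulation: the vehicle's speed responds proportionally to the command.
    controller, speed = PIDSpeedController(), 0.0
    for _ in range(100):
        command = controller.step(target_speed=20.0, current_speed=speed)
        speed += 0.5 * command * controller.dt     # assumed first-order response
    print(round(speed, 2), "m/s after 10 seconds")  # approaches the 20 m/s target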

In addition, slip ratio control of the tire forces is a key part of longitudinal stability control. A slip ratio control system can adjust the vehicle's longitudinal dynamics by controlling the wheel slip ratio, preventing excessive drive slip or brake lock-up and thus improving the stability and handling of the vehicle. The anti-lock braking system (ABS) automatically regulates the braking force during braking so that the wheels are not locked and remain in a rolling state (a slip ratio of about 20%), ensuring that the ground can provide the maximum braking force to the wheels. Some intelligent slip ratio control strategies use rich environment perception information to design a maximum-slip-ratio regulator that varies with the road environment, thereby improving how effectively the tire forces are used.

Intelligent control strategies, such as fuzzy control, neural network control and receding horizon optimal control, have also been widely studied and applied in longitudinal control and have achieved good results; they are considered among the most effective methods.

Traditional control methods, such as PID control and feedforward open-loop control, generally build approximately linear models of the engine and of the vehicle's motion and design controllers on that basis. Control realized in this way depends heavily on the model, the model error is large, and the accuracy and adaptability are poor. Judging from current papers and research projects, finding simple and accurate models of the motor-engine-transmission system, the braking process and the vehicle motion, together with controllers that are robust to random disturbances and adaptive to changes in vehicle performance, is still the main line of work.

Vehicle lateral control is control perpendicular to the direction of motion, that is, steering control of the vehicle. Its goal is to keep the car automatically on the desired driving route while maintaining good ride comfort and stability under different speeds, loads, wind resistance and road conditions.

There are two basic design approaches to vehicle lateral control: one is based on imitating the driver, the other is based on a mechanical model of the vehicle's lateral motion. In the driver-imitation approach, one strategy is to design the controller using a simple kinematics model together with the driver's steering rules; another is to train the controller on data recorded from the driver's operation so as to obtain a control algorithm. The model-based approach requires a reasonably accurate lateral motion model of the vehicle. The typical model is the so-called single-track or bicycle model, which assumes that the characteristics of the left and right sides of the car are identical. The basic structure of a lateral control system is as follows: the control target is generally the deviation between the center of the car and the center line of the road, subject to constraints such as comfort and other indicators.
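A minimal sketch of lateral control built on the kinematic bicycle (single-track) model mentioned above: the steering angle is chosen as a simple proportional feedback on the lateral deviation from the lane center line and the heading error. The gains and vehicle parameters are illustrative assumptions.

    import math

    def bicycle_step(x, y, yaw, speed, steer, wheelbase=2.7, dt=0.05):
        """One step of the kinematic bicycle (single-track) model."""
        x += speed * math.cos(yaw) * dt
        y += speed * math.sin(yaw) * dt
        yaw += speed / wheelbase * math.tan(steer) * dt
        return x, y, yaw

    def lateral_controller(lateral_error, heading_error, k_y=0.3, k_yaw=1.0, max_steer=0.5):
        """Proportional steering on lateral and heading error, clipped to the actuator limit."""
        steer = -k_y * lateral_error - k_yaw * heading_error
        return max(-max_steer, min(max_steer, steer))

    # Toy simulation: the lane center line is the x axis (y = 0); the car starts
    # 1.5 m to the left of it and gradually converges back to the center.
    x, y, yaw, speed = 0.0, 1.5, 0.0, 10.0
    for _ in range(400):                     # 20 seconds of simulated driving
        steer = lateral_controller(lateral_error=y, heading_error=yaw)
        x, y, yaw = bicycle_step(x, y, yaw, speed, steer)
    print(round(y, 3), "m lateral deviation after 20 s")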