DensePCR: a dense point cloud generation network

Paper: Dense 3D Point Cloud Reconstruction Using a Deep Pyramid Network

Abstract

In this paper, DensePCR, a deep pyramid network for point cloud reconstruction, is proposed. It first predicts a low-resolution point cloud and then increases the density of the point cloud by aggregating global and local features. One highlight of this model is that it processes the point cloud in stages, whereas previous point-cloud-based reconstruction methods predict the point cloud in a single stage, which leads to two problems:

Predicting a dense point cloud directly significantly increases the number of model parameters.

Loss functions such as EMD become very expensive to compute at high resolution.

This paper addresses two important questions in single-view reconstruction:

Given a two-dimensional image, is there enough information to predict a dense point cloud?

How can a sparse point cloud be up-sampled into a dense point cloud that lies closer to the object surface?

For the first question, this paper uses a network with a deep pyramid structure: it first predicts a low-density, sparse point cloud and then increases the density of the point cloud stage by stage. For the second question, a mechanism is proposed that "deforms" a local grid around each point using the neighborhood geometry and global shape attributes.

The contributions of this paper can be summarized as follows:

DensePCR, a deep pyramid network for point cloud reconstruction, is proposed; it increases the density of the point cloud gradually through hierarchical stages.

Compared with previous network models, this model uses roughly 3x fewer parameters.

The architecture of the model is shown in the figure below:

DensePCR architecture diagram. The training pipeline first predicts a low-resolution point cloud and then increases the point cloud density stage by stage. A multi-scale training strategy is used to constrain each intermediate output. From the intermediate point cloud (Xp), global (Xg) and local (Xl) point features are extracted and conditioned on a coordinate grid (Xc) around each point to generate the dense prediction, thus achieving super-resolution.

The following parts introduce multi-stage training, global feature learning, local feature learning, and feature aggregation with coordinate grid conditioning in turn.

Multi-stage training

The network is trained in several stages. First, the image is mapped to a low-density point cloud by a point prediction network, and then a high-density point cloud is obtained through a dense reconstruction network. Moreover, the distance loss function used at each stage is different.

As shown in part (a) of the figure above, the image is mapped to the sparse point cloud Xp by an encoder-decoder network. Because a point cloud is an unordered set, the loss function must be invariant to this lack of ordering, so that the ordering of points does not make the result ambiguous. Two such loss functions are introduced next: the Chamfer distance (CD) and the Earth Mover's Distance (EMD).

The definition of CD is as follows:
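Written out (in its standard form, with \hat{X} denoting the predicted point cloud and X the ground-truth point cloud), the Chamfer distance matching the description below is:

d_{CD}(\hat{X}, X) = \sum_{\hat{x} \in \hat{X}} \min_{x \in X} \lVert \hat{x} - x \rVert_2^2 + \sum_{x \in X} \min_{\hat{x} \in \hat{X}} \lVert x - \hat{x} \rVert_2^2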

For each predicted point, find its nearest point in the ground-truth point cloud and record that minimum distance; do this for every predicted point. Then, for each point in the ground-truth cloud, find its nearest point in the predicted cloud and record that minimum distance; do this for every ground-truth point. Summing up all of these distances gives the Chamfer distance. (Note: although this measure is simple and fast to compute, it clearly cannot guarantee that the two point cloud distributions are consistent.)
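As a concrete illustration, here is a minimal Chamfer distance sketch in PyTorch (my own sketch of the standard formula, not the authors' code); it follows the two-directional nearest-neighbor sum described above:

import torch

def chamfer_distance(pred, gt):
    """Symmetric Chamfer distance between two point clouds.

    pred: (N, 3) predicted points; gt: (M, 3) ground-truth points.
    Returns the sum of squared nearest-neighbor distances in both directions.
    """
    # Pairwise Euclidean distances between every predicted and ground-truth point.
    dists = torch.cdist(pred, gt)            # (N, M)
    pred_to_gt = dists.min(dim=1).values     # nearest ground-truth point for each prediction
    gt_to_pred = dists.min(dim=0).values     # nearest prediction for each ground-truth point
    return (pred_to_gt ** 2).sum() + (gt_to_pred ** 2).sum()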

EMD is defined as follows:
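Written out (again in its standard form for equal-sized point sets):

d_{EMD}(\hat{X}, X) = \min_{\phi: \hat{X} \to X} \sum_{\hat{x} \in \hat{X}} \lVert \hat{x} - \phi(\hat{x}) \rVert_2

where \phi is a bijection from \hat{X} to X.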

EMD alleviates the problem with CD: because it is computed over a bijection that maps one point set onto the other, it enforces consistency between the two point distributions. Its significant drawback, however, is its high time complexity. For this reason, this paper uses EMD and CD in different stages. EMD is applied only to the first, sparse point cloud, where it keeps the distribution of the generated points consistent with the real point cloud while remaining affordable to compute; the subsequent dense point clouds use CD for the loss.
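For intuition only, here is a brute-force EMD sketch for small, equal-sized point sets that finds the optimal bijection with the Hungarian algorithm from SciPy. This is not the approximation used in the paper, and it scales poorly, which is exactly why EMD is applied only to the sparse stage:

import numpy as np
from scipy.optimize import linear_sum_assignment

def earth_mover_distance(pred, gt):
    """Exact EMD between two equal-sized point sets via an optimal bijection.

    pred, gt: (N, 3) numpy arrays. Solves a linear assignment problem,
    so the cost grows roughly cubically with N, which is affordable
    only for the sparse stage, as the text explains.
    """
    # Pairwise Euclidean cost matrix between predicted and ground-truth points.
    cost = np.linalg.norm(pred[:, None, :] - gt[None, :, :], axis=-1)  # (N, N)
    row_ind, col_ind = linear_sum_assignment(cost)  # optimal one-to-one matching
    return cost[row_ind, col_ind].sum()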

The sparse point cloud here is generated by a series of convolution, batch-norm, and ReLU layers followed by fully connected layers. The generation of the dense point cloud is introduced through the analysis of the following components.
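A minimal sketch of such an encoder-decoder, assuming a 64x64 RGB input and a 1024-point sparse output (the exact layer sizes here are my assumptions, not the paper's):

import torch.nn as nn

class SparsePointCloudNet(nn.Module):
    """Image encoder (conv + batchnorm + relu) followed by fully connected
    layers that regress a sparse point cloud of shape (n_points, 3)."""

    def __init__(self, n_points=1024):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.BatchNorm2d(256), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Flatten(),
            nn.Linear(256 * 4 * 4, 1024), nn.ReLU(),
            nn.Linear(1024, n_points * 3),
        )
        self.n_points = n_points

    def forward(self, img):                      # img: (B, 3, 64, 64)
        feat = self.encoder(img)                 # (B, 256, 4, 4)
        pts = self.decoder(feat)                 # (B, n_points * 3)
        return pts.view(-1, self.n_points, 3)    # sparse point cloud (B, N, 3)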

Dense reconstruction network

Generating the dense point cloud requires global features, local features, feature aggregation, and coordinate grid conditioning.

Global feature learning

As is well known, global features are very important for 3D reconstruction. To extract them, DensePCR adopts the same approach as PointNet: shared MLPs followed by a symmetric pooling operation.

The MLP extracts global features of the point cloud through perceptron layers with shared parameters, and the max pool in the figure above is a symmetric operation. Although global features can be extracted by the MLP + max-pooling operation, the max pool inadvertently erases the local features of the object.
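A PointNet-style sketch of this global feature extraction, with the shared MLP implemented as kernel-size-1 1D convolutions and a max pool over points (the layer widths and n_g are illustrative assumptions):

import torch
import torch.nn as nn

class GlobalFeatureNet(nn.Module):
    """Shared MLP over points followed by a symmetric max pooling,
    producing one global feature vector per point cloud."""

    def __init__(self, n_g=512):
        super().__init__()
        # 1D convolutions with kernel size 1 act as a shared per-point MLP.
        self.mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, n_g, 1), nn.ReLU(),
        )

    def forward(self, pts):                 # pts: (B, N, 3)
        x = self.mlp(pts.transpose(1, 2))   # (B, n_g, N)
        x_g = torch.max(x, dim=2).values    # symmetric max pool over points -> (B, n_g)
        return x_g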

Local feature learning

In point cloud segmentation and classification, local features are extremely important, and to reconstruct the details of objects well they are equally important for reconstruction. DensePCR therefore uses the PointNet++ network model to extract the local structure of the object (please refer to my other article for an introduction to PointNet++).

Specifically, DensePCR constructs a neighborhood sphere around each point and applies an MLP within each neighborhood; the pooled result is a global feature of that neighborhood, which is a local feature relative to the whole point cloud. This produces an n x n_l matrix, where n is the number of input points and n_l is the number of output channels of the last MLP.
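A rough sketch of this local feature extraction in the spirit of PointNet++, not the authors' implementation; for brevity it uses a k-nearest-neighbor query as a stand-in for the radius ball query, and k, n_l, and the layer widths are assumptions:

import torch
import torch.nn as nn

class LocalFeatureNet(nn.Module):
    """For each point, gather its k nearest neighbors (a stand-in for the
    neighborhood sphere), run a shared MLP on the neighborhood coordinates,
    and max-pool inside the neighborhood to get an n_l-dim local feature."""

    def __init__(self, n_l=128, k=16):
        super().__init__()
        self.k = k
        self.mlp = nn.Sequential(
            nn.Conv2d(3, 64, 1), nn.ReLU(),
            nn.Conv2d(64, n_l, 1), nn.ReLU(),
        )

    def forward(self, pts):                                      # pts: (B, N, 3)
        dists = torch.cdist(pts, pts)                            # (B, N, N)
        _, idx = dists.topk(self.k, dim=2, largest=False)        # k nearest points per point
        nbrs = torch.gather(
            pts.unsqueeze(1).expand(-1, pts.shape[1], -1, -1),   # (B, N, N, 3)
            2, idx.unsqueeze(-1).expand(-1, -1, -1, 3))          # (B, N, k, 3)
        rel = nbrs - pts.unsqueeze(2)                            # coordinates relative to each center point
        feat = self.mlp(rel.permute(0, 3, 1, 2))                 # (B, n_l, N, k)
        return feat.max(dim=3).values.transpose(1, 2)            # per-point local features (B, N, n_l)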

Feature aggregation and coordinate grid conditioning

At this point we have both the global and local features, so we need a mechanism that generates a dense point cloud from the global features, the local features, and the sparse point cloud.

To achieve this, DensePCR builds a feature vector for each output point, consisting of its point coordinates, the global feature, and its local feature, expressed as [Xp, Xg, Xl], with dimension n x (3 + n_g + n_l).

How do n points become 4n points?

DensePCR tiles the n x (3 + n_g + n_l) matrix by the up-sampling factor (4 here) into a 4n x (3 + n_g + n_l) matrix, as sketched below.
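Putting the pieces together, here is a rough sketch of this up-sampling step: tile the per-point feature matrix by the up-sampling factor, condition each copy on a small 2D grid offset (the Xc grid from the architecture figure), and regress the dense coordinates with a shared MLP. The grid values and MLP widths are my assumptions, not the paper's exact choices:

import torch
import torch.nn as nn

class DenseReconstructionNet(nn.Module):
    """Turns n conditioned points into 4n points: tile the per-point
    feature [x_p, x_g, x_l], append a 2D grid coordinate to each copy,
    and regress new coordinates with a shared MLP."""

    def __init__(self, n_g=512, n_l=128, up_factor=4):
        super().__init__()
        self.up = up_factor
        in_dim = 3 + n_g + n_l + 2            # point coords + global + local + 2D grid coords
        self.mlp = nn.Sequential(
            nn.Conv1d(in_dim, 256, 1), nn.ReLU(),
            nn.Conv1d(256, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 3, 1),             # dense point coordinates
        )
        # A fixed 2x2 grid of offsets, one per copy of each point (assumed, for up_factor=4).
        self.register_buffer("grid", torch.tensor(
            [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]]))

    def forward(self, x_p, x_g, x_l):
        # x_p: (B, N, 3) sparse points, x_g: (B, n_g) global feature, x_l: (B, N, n_l) local features.
        B, N, _ = x_p.shape
        feat = torch.cat([x_p, x_g.unsqueeze(1).expand(-1, N, -1), x_l], dim=2)  # (B, N, 3+n_g+n_l)
        feat = feat.repeat_interleave(self.up, dim=1)                            # tile to (B, 4N, 3+n_g+n_l)
        grid = self.grid.repeat(N, 1).unsqueeze(0).expand(B, -1, -1)             # (B, 4N, 2)
        feat = torch.cat([feat, grid], dim=2)                                    # (B, 4N, 3+n_g+n_l+2)
        return self.mlp(feat.transpose(1, 2)).transpose(1, 2)                    # dense cloud (B, 4N, 3)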