Multi-scale image processing is an important tool in computer vision: interest points can be extracted automatically from the different levels of a scale space, and an invariant local descriptor can be computed for each detected point. Multi-scale feature algorithms of this kind are a cornerstone of modern computer vision frameworks.
The basic idea of a multi-scale representation is simple: filter the original image appropriately over increasing time or scale, so as to create a scale space of the image. For example, the Gaussian scale space is obtained by convolving the original image with Gaussian kernels of increasing standard deviation; for larger kernels we obtain simpler image representations. Using a multi-scale representation, we can detect and describe image features at different scales and resolutions. Some authors have also proved that, under some general assumptions, the Gaussian kernel and its set of partial derivatives are the only possible smoothing kernels for scale space analysis. Note, however, that the Gaussian scale space is just one instance of linear diffusion; other linear scale spaces are also possible.
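As a minimal illustration of a Gaussian scale space (our own sketch, not code from this paper), the following builds a stack of progressively smoothed images with SciPy; the function name and the choice of sigma values are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def gaussian_scale_space(image, sigmas):
    """Return Gaussian-smoothed versions of `image`, one per
    standard deviation in `sigmas` (larger sigma = simpler image)."""
    return [gaussian_filter(image.astype(float), sigma=s) for s in sigmas]

# Example: four scale levels with geometrically increasing sigma.
img = np.random.rand(64, 64)
levels = gaussian_scale_space(img, sigmas=[1.0, 2.0, 4.0, 8.0])
```

Each level is a coarser, smoother representation of the input; on a noise image this shows up as a steadily shrinking variance.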
The Gaussian kernel is perhaps the simplest way to build an image scale space, but it is not the only one, and it has some drawbacks. In the Gaussian scale space, the advantage of selecting coarser scales is that noise is reduced and structure is emphasized, but at the cost of local accuracy: Gaussian blurring does not respect the natural boundaries of objects, and it smooths noise and image details to the same degree at all scales. The stronger the Gaussian blur, the more localization accuracy is lost for features detected at coarse scales.
It seems more appropriate to adapt the blurring locally to the image data, so that noise is blurred but details and edges remain unaffected. To achieve this, different nonlinear scale space methods have been proposed to improve on the Gaussian scale space approach. In general, nonlinear diffusion methods perform much better than linear ones, and impressive results have been obtained in different applications such as image segmentation and denoising. However, to the best of our knowledge, this paper is the first to use efficient schemes for nonlinear diffusion filtering in the context of multi-scale feature detection and description. By means of nonlinear diffusion, we can detect and describe image regions at different scales through a nonlinear scale space, improving both repeatability and distinctiveness.
One reason why nonlinear diffusion filtering has not been widely used in practical computer vision components such as feature detection and description may be that most approaches are inefficient. These approaches normally discretize the underlying function by means of the forward Euler scheme. The Euler scheme requires very small step sizes for convergence, and hence many iterations to reach a desired scale, making the computational cost very high. Weickert et al. introduced efficient schemes for nonlinear diffusion filtering. One of the backbones of these schemes is the use of Additive Operator Splitting (AOS) techniques. With AOS, a stable nonlinear scale space can be obtained for any step size. The key step in an AOS scheme is solving a tridiagonal system of linear equations, which can be done efficiently with the Thomas algorithm, a special variant of the Gaussian elimination algorithm.
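The Thomas algorithm mentioned above fits in a few lines; this is a generic implementation (our own, not the authors' code), using the usual convention that a holds the sub-diagonal (a[0] unused), b the main diagonal, and c the super-diagonal (c[-1] unused).

```python
import numpy as np

def thomas(a, b, c, d):
    """Solve a tridiagonal system in O(n): sub-diagonal a, diagonal b,
    super-diagonal c, right-hand side d (all 1-D arrays of length n)."""
    n = len(d)
    cp = np.zeros(n)  # modified super-diagonal
    dp = np.zeros(n)  # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward sweep: eliminate the sub-diagonal.
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    # Back substitution.
    x = np.zeros(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

Because the system matrix arising from diffusion discretizations is diagonally dominant, no pivoting is needed and the solve is stable.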
In this paper we propose to perform automatic feature detection and description in nonlinear scale spaces. We describe how to build a nonlinear scale space using efficient AOS techniques and variable-conductance diffusion, and how to obtain features that exhibit high repeatability and distinctiveness under different image transformations. We evaluate our novel features in detail within a standard evaluation framework, as well as in a practical image matching application using deformable surfaces.
We name our features KAZE, in tribute to Iijima, the father of scale-space analysis. KAZE is a Japanese word meaning wind. In nature, wind is defined as the large-scale flow of air, a process normally governed by nonlinear phenomena; in this way we make an analogy with the nonlinear diffusion processes in the image domain. The rest of the paper is organized as follows: in Section 2 we describe related work; Section 3 briefly introduces the basic principles of nonlinear diffusion filtering; Section 4 describes the KAZE feature algorithm in detail; finally, detailed experimental results and conclusions are presented in Sections 5 and 6, respectively.
Feature detection and description is a very active research field in computer vision. Obtaining features that exhibit high repeatability and distinctiveness under different image transformations (e.g., viewpoint, blur, noise) is of key importance in many different applications. The most popular multi-scale feature detection and description algorithms are the Scale Invariant Feature Transform (SIFT) and Speeded-Up Robust Features (SURF).
SIFT features were a milestone in feature detection and image matching and are still widely used in mobile robotics and object recognition. In SIFT, feature locations are obtained as the maxima and minima of the result of a Difference-of-Gaussians (DoG) operator applied through a Gaussian scale space. To build the scale space, a pyramid of Gaussian-blurred versions of the original image is computed; the scale space is organized into a series of octaves and sublevels. For each detected feature, a descriptor is built based on the dominant gradient orientation over a local region of interest around the detected keypoint. Then a rectangular grid, normally of 4×4 subregions and oriented along the dominant direction, is defined, and in each subregion a histogram of gradient orientations weighted by gradient magnitude is built, yielding a descriptor vector of 128 elements.
Inspired by SIFT, Bay et al. proposed the SURF detector and descriptor. SURF features exhibit better results with respect to repeatability, distinctiveness, and robustness, and at the same time can be computed much faster thanks to the use of integral images, which means that Gaussian derivatives at different scales can be approximated by simple box filters without computing the whole Gaussian scale space. Similarly to SIFT, a rectangular grid of 4×4 subregions (oriented along the dominant direction) is defined, and for each subregion the sum of Haar wavelet responses (weighted by a Gaussian centered at the interest keypoint) is computed. The final descriptor dimension is normally 64, or 128 in the extended version. Agrawal and Konolige improved on SURF by means of Center Surround Extremas (CenSurE) detection and the Modified SURF (M-SURF) descriptor. M-SURF is a variant of the original SURF descriptor that handles descriptor boundary effects better and uses a more robust and intelligent two-stage Gaussian weighting scheme.
These methods, and many related subsequent algorithms, rely on the Gaussian scale space and the set of Gaussian derivatives as smoothing kernels for scale space analysis. However, to repeat: Gaussian scale space does not preserve the natural boundaries of objects, and it smooths details and noise equally at all scales. By means of nonlinear diffusion filtering we can obtain multi-scale features that exhibit much higher repeatability and distinctiveness than previous algorithms based on the Gaussian scale space. At the cost of a moderate increase in computation compared to SURF or CenSurE, our results show a substantial improvement in performance in both feature detection and description.
Nonlinear diffusion approaches describe the evolution of image brightness with increasing scale as the divergence of a flow function that controls the diffusion process. These methods are normally described by nonlinear partial differential equations (PDEs), and the nonlinear nature of the equations involved diffuses the image brightness through a nonlinear scale space. Equation 1 gives the classical nonlinear diffusion formulation:

∂L/∂t = div(c(x, y, t) · ∇L),    (1)

where div and ∇ denote the divergence and gradient operators, respectively.
Thanks to the introduction of the conductivity function c, it is possible to make the diffusion adaptive to the local image structure. The function c depends on the local differential structure of the image, and it can be either a scalar or a tensor. The time t is the scale parameter: larger values yield simpler image representations. In this paper we focus on variable-conductance diffusion, in which the image gradient magnitude controls the diffusion at each scale level.
In the computer vision literature, nonlinear diffusion filtering was first proposed by Perona and Malik, who made the function c dependent on the gradient magnitude in order to reduce the diffusion at the location of edges, encouraging smoothing within regions rather than across boundaries. In this way, the function c can be defined as:
c(x, y, t) = g(|∇Lσ(x, y, t)|),

where ∇Lσ is the gradient of a Gaussian-smoothed version of the original image. Perona and Malik described two different formulations for the conductivity function g:
g1 = exp(−|∇Lσ|² / λ²),    g2 = 1 / (1 + |∇Lσ|² / λ²),

where the parameter λ is the contrast factor that controls the level of diffusion. The function g1 promotes high-contrast edges, whereas g2 promotes wide regions over smaller ones. Weickert proposed a slightly different diffusion function, with rapidly decreasing diffusivity, for which smoothing on both sides of an edge is much stronger than smoothing across it; this selective smoothing prefers intra-region smoothing to inter-region blurring. The function, called g3, is defined as follows:

g3 = 1,                                if |∇Lσ|² = 0
g3 = 1 − exp(−3.315 / (|∇Lσ|/λ)⁸),     otherwise
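The three conductivity functions above can be written down directly; a small vectorized NumPy sketch (our own, with illustrative names):

```python
import numpy as np

def g1(grad_mag, lam):
    """Perona-Malik conductivity promoting high-contrast edges."""
    return np.exp(-(grad_mag / lam) ** 2)

def g2(grad_mag, lam):
    """Perona-Malik conductivity promoting wide regions."""
    return 1.0 / (1.0 + (grad_mag / lam) ** 2)

def g3(grad_mag, lam):
    """Weickert conductivity with rapidly decreasing diffusivity."""
    g = np.ones_like(grad_mag, dtype=float)
    nz = grad_mag > 0
    g[nz] = 1.0 - np.exp(-3.315 / (grad_mag[nz] / lam) ** 8)
    return g
```

All three return 1 (full diffusion) on flat regions and fall toward 0 as the gradient magnitude grows past the contrast factor; g3 drops off far more sharply than the other two.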
The contrast parameter λ can be either fixed by hand or estimated automatically from some statistics of the image gradient. The contrast parameter determines which gradients are enhanced and which are cancelled. In this paper we use an empirical value: the 70th percentile of the gradient histogram of a smoothed version of the original image. In our experiments this empirical value usually gives good results; however, for some images a more detailed analysis of the contrast parameter could yield better results. Figure 1 depicts the conductivity function for different values of λ in the Perona and Malik equation. In general, for higher λ values only larger gradients are taken into account.
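The automatic estimation of the contrast parameter can be sketched as follows. This is our own illustration: for brevity we use SciPy's Sobel operator for the gradients (the method itself computes derivatives with Scharr filters), and the function name and default smoothing are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def contrast_parameter(image, percentile=70, sigma=1.0):
    """Estimate the contrast factor as a percentile of the gradient
    magnitude histogram of a smoothed version of the image."""
    smoothed = gaussian_filter(image.astype(float), sigma)
    gx = sobel(smoothed, axis=1)
    gy = sobel(smoothed, axis=0)
    mag = np.hypot(gx, gy)
    # Ignore zero gradients (flat regions) when taking the percentile.
    return np.percentile(mag[mag > 0], percentile)
```

Raising the percentile makes λ larger, so only the strongest edges stop the diffusion.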
The partial differential equations involved in nonlinear diffusion filtering have no analytical solutions in general. Therefore, the equations must be approximated by numerical methods. One possible discretization of the diffusion equation is the so-called linearly implicit (or semi-implicit) scheme. In vector-matrix notation, the discretization of Equation 1 can be expressed as:

(L^{i+1} − L^i) / τ = Σ_{l=1..m} A_l(L^i) L^{i+1},

where A_l is a matrix that encodes the image conductivities for each dimension. In the semi-implicit scheme, computing the solution L^{i+1} requires solving a system of linear equations:

L^{i+1} = (I − τ Σ_{l=1..m} A_l(L^i))^{−1} L^i
The semi-implicit scheme is absolutely stable for any step size τ. In addition, it creates a discrete nonlinear diffusion scale space for arbitrarily large time steps. In the semi-implicit scheme we need to solve a linear system of equations whose system matrix is tridiagonal and diagonally dominant. Such systems can be solved very efficiently by means of the Thomas algorithm, a variant of the well-known Gaussian elimination algorithm specialized for tridiagonal systems.
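To make the semi-implicit idea concrete, here is a 1-D toy step (our own sketch, not the paper's implementation): one row of pixels is diffused by solving the tridiagonal system (I − τA(c))·L_next = L with a banded solver. In 2-D, AOS applies such 1-D solves along rows and columns and averages the results.

```python
import numpy as np
from scipy.linalg import solve_banded

def semi_implicit_step_1d(L, c, tau):
    """One semi-implicit diffusion step on a 1-D signal L with
    per-pixel conductivities c and step size tau (zero-flux borders)."""
    n = len(L)
    ch = 0.5 * (c[:-1] + c[1:])  # conductivity between pixels i and i+1
    diag = np.ones(n)
    diag[:-1] += tau * ch        # flux to the right neighbor
    diag[1:] += tau * ch         # flux to the left neighbor
    upper = np.zeros(n)
    lower = np.zeros(n)
    upper[1:] = -tau * ch        # super-diagonal of (I - tau*A)
    lower[:-1] = -tau * ch       # sub-diagonal of (I - tau*A)
    ab = np.vstack([upper, diag, lower])  # banded storage for solve_banded
    return solve_banded((1, 1), ab, L)
```

Because the system matrix is tridiagonal and diagonally dominant, the solve is stable for any τ; the zero-flux boundaries conserve the total brightness.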
In this section we describe our novel method for feature detection and description in nonlinear scale spaces. Given an input image, we build a nonlinear scale space up to a maximum evolution time using AOS techniques and variable-conductance diffusion. We then detect 2-D features of interest that exhibit maxima of the scale-normalized determinant of the Hessian response through the nonlinear scale space. Finally, we compute the dominant orientation of each keypoint and obtain a scale- and rotation-invariant descriptor that considers first-order image derivatives. We now describe each of these main steps.
Similarly to SIFT, we discretize the scale space in logarithmic steps arranged in a series of O octaves and S sublevels. Note, however, that we always work with the resolution of the original image, without performing any downsampling at each new octave as is done in SIFT. Octaves and sublevels are identified by the discrete indices o and s, and are mapped to their corresponding scale σ through the formula:

σ_i(o, s) = σ_0 · 2^(o + s/S),    o ∈ [0 … O−1],  s ∈ [0 … S−1],  i ∈ [0 … N]
where σ_0 is the base scale level and N is the total number of filtered images. We now need to convert this set of discrete scale levels, given in pixel units, into time units. The reason for this conversion is that nonlinear diffusion filtering is defined in terms of time. In the case of the Gaussian scale space, convolving an image with a Gaussian of standard deviation σ (in pixels) is equivalent to filtering the image for time t = σ²/2. We apply this conversion to obtain the set of evolution times, transforming the scale space σ_i into time units through the mapping:

t_i = σ_i² / 2,    i = {0 … N}
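The mapping from discrete levels to evolution times is simple to compute; a sketch (assuming, purely for illustration, a base scale σ_0 = 1.6 and 4 octaves with 4 sublevels, values not fixed by the text above):

```python
import numpy as np

def evolution_times(sigma0=1.6, octaves=4, sublevels=4):
    """Map discrete (octave, sublevel) indices to scales sigma_i
    and evolution times t_i = sigma_i**2 / 2."""
    sigmas = [sigma0 * 2.0 ** (o + s / sublevels)
              for o in range(octaves) for s in range(sublevels)]
    times = [0.5 * s ** 2 for s in sigmas]
    return np.array(sigmas), np.array(times)
```

The resulting times grow monotonically, so the scale space can be built by stepping the diffusion from each t_i to the next.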
It is important to mention that we use this mapping only to obtain the set of evolution times from which the nonlinear scale space is built. In general, each filtered image in the nonlinear scale space does not correspond to the convolution of the original image with a Gaussian of standard deviation σ_i. However, our framework is also compatible with the Gaussian scale space, in the sense that setting the diffusion function equal to 1 (i.e., a constant function) recovers the Gaussian scale space equations. In addition, as the evolution through the nonlinear scale space proceeds, the conductivity tends to become constant for most image pixels, except at the strong image edges that correspond to object boundaries.
Given an input image, we first convolve it with a Gaussian kernel of standard deviation σ_0 to reduce noise and possible image artifacts. From this base image we compute the image gradient histogram and obtain the contrast parameter λ with the automatic procedure described in Section 3.1. Then, given the contrast parameter and the set of evolution times t_i, it is straightforward to build the nonlinear scale space iteratively using the AOS scheme, which is absolutely stable for any step size:

L^{i+1} = (1/m) Σ_{l=1..m} (I − m (t_{i+1} − t_i) A_l(L^i))^{−1} L^i
Fig. 2 compares the Gaussian scale space and the nonlinear scale space (using the g3 conductivity function) at several evolution times, given the same reference image. It can be observed that Gaussian blurring smooths all the structures in the image equally, whereas in the nonlinear scale space the strong image edges remain unaffected.
To detect points of interest, we compute the response of the scale-normalized determinant of the Hessian at multiple scale levels. For multi-scale feature detection, the set of differential operators must be normalized with respect to scale, since in general the amplitude of spatial derivatives decreases with scale:

L_Hessian = σ² (L_xx L_yy − L_xy²)
where L_xx and L_yy are the second-order horizontal and vertical derivatives, respectively, and L_xy is the second-order cross derivative. Given the set of filtered images from the nonlinear scale space, we analyze the detector response at the different scale levels σ_i, looking for maxima in scale and spatial location. The extrema search is performed over all filtered images except the first and the last. Each extremum is searched over rectangular windows of size σ_i × σ_i on the current, upper, and lower filtered images. To speed up the search, we first check the response over a window of 3×3 pixels so as to quickly discard non-maximal responses. Finally, the position of the keypoint is estimated with sub-pixel accuracy.
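A toy version of the detector response and the fast 3×3 maximum check (our own sketch; the derivatives here use simple central differences rather than the Scharr filters of the actual method):

```python
import numpy as np
from scipy.ndimage import maximum_filter

def hessian_response(L, sigma):
    """Scale-normalized determinant-of-Hessian response for one level."""
    Lxx = np.gradient(np.gradient(L, axis=1), axis=1)
    Lyy = np.gradient(np.gradient(L, axis=0), axis=0)
    Lxy = np.gradient(np.gradient(L, axis=1), axis=0)
    return sigma ** 2 * (Lxx * Lyy - Lxy ** 2)

def local_maxima_3x3(response, threshold):
    """Keep pixels that are 3x3 spatial maxima above a threshold."""
    peaks = response == maximum_filter(response, size=3)
    return np.argwhere(peaks & (response > threshold))
```

On a synthetic Gaussian blob, the strongest response sits at the blob center, which the 3×3 check recovers as a candidate keypoint.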
The set of first- and second-order derivatives is approximated by means of 3×3 Scharr filters of different derivative step sizes σ_i. Second-order derivatives are approximated by applying Scharr filters consecutively in the desired derivative coordinates. These filters approximate rotation invariance significantly better than other popular filters such as the Sobel filter or the standard central differences operator. Note that although we need to compute multi-scale derivatives for every pixel, we save computational effort in the description step, since we reuse the same set of derivatives that is computed during detection.
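For reference, the 3×3 Scharr kernel and its use for first-order derivatives (a sketch with SciPy; the 1/32 normalization is our choice so that a unit-slope ramp yields a derivative of exactly 1):

```python
import numpy as np
from scipy.ndimage import correlate

# 3x3 Scharr kernel for d/dx; its transpose gives d/dy.
SCHARR_X = np.array([[-3.0, 0.0, 3.0],
                     [-10.0, 0.0, 10.0],
                     [-3.0, 0.0, 3.0]]) / 32.0

def scharr_derivatives(L):
    """First-order derivatives (Lx, Ly) using Scharr filters."""
    return correlate(L, SCHARR_X), correlate(L, SCHARR_X.T)
```

The 3/10/3 weighting is what gives Scharr its better rotational symmetry compared to the Sobel 1/2/1 weighting.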
Finding the dominant orientation. To obtain rotation-invariant descriptors, it is necessary to estimate the dominant orientation in a local neighborhood centered at the keypoint location. Similarly to SURF, we find the dominant orientation in a circular area of radius 6σ_i, with a sampling step of size σ_i. For each sample in the circular area, the first-order derivatives L_x and L_y are weighted with a Gaussian centered at the interest point. The derivative responses are then represented as points in vector space, and the dominant orientation is found by summing the responses within a sliding circle segment covering an angle of π/3. The dominant orientation is obtained from the longest resulting vector.
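The sliding-segment search can be sketched as follows (our own illustration: lx and ly stand for the Gaussian-weighted derivative samples from the circular area, and the number of candidate angles is arbitrary):

```python
import numpy as np

def dominant_orientation(lx, ly, window=np.pi / 3, steps=42):
    """Return the orientation whose sliding circle segment of angular
    width `window` maximizes the length of the summed response vector."""
    angles = np.arctan2(ly, lx)
    best_len2, best_angle = -1.0, 0.0
    for a in np.linspace(0, 2 * np.pi, steps, endpoint=False):
        # Angular distance to the segment center, wrapped to [-pi, pi].
        diff = np.angle(np.exp(1j * (angles - a)))
        m = np.abs(diff) < window / 2
        sx, sy = lx[m].sum(), ly[m].sum()
        if sx * sx + sy * sy > best_len2:
            best_len2 = sx * sx + sy * sy
            best_angle = np.arctan2(sy, sx)
    return best_angle
```

If all derivative samples point the same way, the winning segment is the one containing them, and the returned orientation matches that direction.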
Building the descriptor. We use the M-SURF descriptor, adapted to our nonlinear scale space framework. For a detected feature at scale σ_i, first-order derivatives L_x and L_y of size σ_i are computed over a 24σ_i × 24σ_i rectangular grid. This grid is divided into 4×4 subregions of size 9σ_i × 9σ_i, with an overlap of 2σ_i. The derivative responses in each subregion are weighted with a Gaussian (σ_1 = 2.5σ_i) centered at the subregion center and summed into a descriptor vector. Each subregion vector is then weighted by another Gaussian (σ_2 = 1.5σ_i), defined over a 4×4 mask and centered at the interest keypoint. When the dominant orientation of the keypoint is taken into account, each sample in the rectangular grid is rotated according to the dominant orientation, and the derivatives are also computed with respect to that orientation. Finally, the descriptor vector, of length 64, is normalized into a unit vector to achieve contrast invariance.