To better introduce the history and current state of face recognition research, this paper divides the research history of AFR into three stages according to the characteristics of the research content and technical methods, as shown in Table 1.
The table summarizes the development of face recognition research, the representative work of each historical stage, and its technical characteristics.
The three stages are briefly reviewed below:
The first stage (1964 ~ 1990)
In this stage, face recognition was usually studied only as a general pattern recognition problem, and the main technical approach was based on the geometric features of the face.
This is mainly reflected in the research on profiles: much work was devoted to extracting and analyzing the structural features of facial profile curves.
Artificial neural networks were also applied to face recognition by some researchers.
In addition to Bledsoe, other early researchers engaged in AFR research include Goldstein, Harmon, and Kanade.
Takeo Kanade completed the first doctoral dissertation on AFR at Kyoto University in 1973. Now a professor at the Robotics Institute of Carnegie Mellon University (CMU), he remains one of the active figures in the field of face recognition, and his research group is an important force in the field.
Generally speaking, this was the initial stage of face recognition research. It produced few major achievements, and the technology saw essentially no practical application.
The second stage (1991 ~ 1997)
Although this stage was relatively short, it was a peak period of face recognition research and can be described as fruitful: not only were a number of representative face recognition algorithms born, but the US military also organized the famous FERET evaluation of face recognition algorithms, and several commercially operated face recognition systems emerged, such as the most famous FaceIt system of Visionics (now Identix).
The "feature face" method proposed by Turk and Pentland of MIT Media Lab is undoubtedly the most famous face recognition method in this period.
Many subsequent face recognition technologies are more or less related to feature face, and now feature face has become the benchmark algorithm for face recognition performance test together with normalized correlation method.
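As a rough illustration of the Eigenface idea (PCA on vectorized face images followed by nearest-neighbor matching in the low-dimensional subspace), the following sketch uses random stand-in data; the array sizes and variable names are assumptions for illustration, not Turk and Pentland's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in training set: 8 "face images" of 16x16 pixels, flattened to vectors.
train = rng.random((8, 256))

# Center the data and obtain the principal axes ("eigenfaces") via SVD.
mean_face = train.mean(axis=0)
_, _, vt = np.linalg.svd(train - mean_face, full_matrices=False)
eigenfaces = vt[:4]                       # keep the top 4 components

def project(img):
    """Project a flattened image onto the eigenface subspace."""
    return eigenfaces @ (img - mean_face)

gallery = np.array([project(x) for x in train])

def identify(probe):
    """Nearest-neighbor matching in the low-dimensional subspace."""
    dists = np.linalg.norm(gallery - project(probe), axis=1)
    return int(np.argmin(dists))

# A slightly perturbed copy of training image 3 should match entry 3.
probe = train[3] + 0.01 * rng.random(256)
print(identify(probe))   # prints 3
```

The key point is that matching happens in a subspace of only a few dimensions rather than in the raw pixel space.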
Another important work of this period is the comparative experiment conducted around 1992 by Brunelli and Poggio of the MIT Artificial Intelligence Laboratory. They compared the recognition performance of a method based on structural features with that of a method based on template matching and reached a clear conclusion: the template matching method outperformed the feature-based method.
This guiding conclusion, together with the Eigenface method, essentially halted research on face recognition methods based purely on structural features, and greatly promoted the development of appearance-based methods that use statistical pattern recognition techniques and linear subspace modeling, which gradually became the mainstream of face recognition technology.
The Fisherface method proposed by Belhumeur et al. is another important achievement of this period.
It first uses principal component analysis (PCA) to reduce the dimensionality of the image appearance features, and then applies linear discriminant analysis (LDA) to transform the reduced principal components, so as to obtain a between-class scatter that is as large as possible and a within-class scatter that is as small as possible.
This method is still one of the mainstream face recognition methods, and it has produced many variants, such as the null-space method, the subspace discriminant model, the enhanced discriminant model, direct LDA, and some recent improvement strategies based on kernel learning.
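A minimal sketch of this PCA-then-LDA pipeline on toy data follows; the class counts, dimensions, and scatter-matrix construction are simplified assumptions for illustration, not Belhumeur's original implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: 3 classes ("persons"), 10 samples each, 50-dimensional vectors.
n_classes, per_class, dim = 3, 10, 50
means = rng.normal(0, 5, (n_classes, dim))
X = np.vstack([means[c] + rng.normal(0, 1, (per_class, dim))
               for c in range(n_classes)])
y = np.repeat(np.arange(n_classes), per_class)

# Step 1: PCA dimensionality reduction (also avoids a singular
# within-class scatter matrix in the LDA step).
mu = X.mean(axis=0)
_, _, vt = np.linalg.svd(X - mu, full_matrices=False)
W_pca = vt[:10].T                         # 50 -> 10 dimensions
Z = (X - mu) @ W_pca

# Step 2: LDA on the PCA coefficients: maximize between-class scatter
# while minimizing within-class scatter.
Sw = np.zeros((10, 10))                   # within-class scatter
Sb = np.zeros((10, 10))                   # between-class scatter
for c in range(n_classes):
    Zc = Z[y == c]
    mc = Zc.mean(axis=0)
    Sw += (Zc - mc).T @ (Zc - mc)
    d = (mc - Z.mean(axis=0))[:, None]
    Sb += per_class * (d @ d.T)

# Solve Sb w = lambda * Sw w; at most (n_classes - 1) useful directions.
evals, evecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
order = np.argsort(evals.real)[::-1]
W_lda = evecs.real[:, order[:n_classes - 1]]

features = Z @ W_lda                      # final discriminant features
print(features.shape)                     # prints (30, 2)
```

The final features have at most (number of classes - 1) dimensions, which is why the PCA step is needed first on high-dimensional images.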
On the other hand, Moghaddam of MIT proposed a face recognition method based on Bayesian probability estimation in dual subspaces.
Through a "difference method", this approach converts the similarity computation for a pair of face images into a two-class classification problem (intra-personal difference versus extra-personal difference). Both kinds of difference data are first reduced in dimensionality by principal component analysis, the class-conditional probability densities of the two classes are estimated, and face recognition is finally carried out by Bayesian decision (maximum likelihood or maximum a posteriori probability).
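The decision rule can be illustrated with a drastically simplified sketch: diagonal Gaussians fitted to raw difference vectors stand in for Moghaddam's PCA-based density estimates, and the data are random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in difference vectors (20-dim): intra-personal differences are
# small, extra-personal differences are large.
intra = rng.normal(0, 0.5, (100, 20))
extra = rng.normal(0, 3.0, (100, 20))

def fit_gaussian(diffs):
    """Fit a diagonal Gaussian to a set of difference vectors."""
    return diffs.mean(axis=0), diffs.var(axis=0) + 1e-9

def log_likelihood(d, mean, var):
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (d - mean) ** 2 / var)

intra_model = fit_gaussian(intra)
extra_model = fit_gaussian(extra)

def same_person(img_a, img_b):
    """Maximum-likelihood decision on the difference of two images."""
    d = img_a - img_b
    return log_likelihood(d, *intra_model) > log_likelihood(d, *extra_model)

a = rng.random(20)
print(same_person(a, a + rng.normal(0, 0.3, 20)))   # True: small difference
print(same_person(a, a + rng.normal(0, 4.0, 20)))   # False: large difference
```

The point of the "difference method" is that verification of any pair of people reduces to a single fixed two-class problem, no matter how many identities are enrolled.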
Elastic Graph Matching (EGM), another important face recognition method, was also put forward in this stage.
Its basic idea is to describe the face with an attribute graph: the vertices of the graph represent key facial feature points, and each vertex attribute is the multi-resolution, multi-orientation local feature (a Gabor transform [12] feature, called a "jet") at the corresponding point, while the edge attributes are the geometric relationships between different feature points.
For any input face image, elastic graph matching uses an optimized search strategy to locate a set of predefined key facial feature points and simultaneously extracts their jets, yielding the attribute graph of the input image.
Finally, recognition is completed by computing the similarity between this graph and the attribute graphs of known faces.
The advantage of this method is that it not only retains the global structural features of the face, but also models the key local features of the face.
Recently, this method has been extended.
Local feature analysis (LFA) was proposed by Atick and colleagues at Rockefeller University.
In essence, LFA is a statistics-based low-dimensional object description method. Compared with PCA, which extracts only global features and cannot preserve local topological structure, LFA extracts local features on the basis of a global PCA description while maintaining global topological information, and therefore has better descriptive and discriminative power.
LFA was commercialized as the famous FaceIt system, so little new academic progress on it was published afterwards.
The FERET project, funded by the Counterdrug Technology Development Program Office of the US Department of Defense, is undoubtedly a crucial event of this stage.
The goal of FERET project is to develop AFR technology that can be used by security, intelligence and law enforcement departments.
The project consisted of three parts: funding several face recognition research efforts, creating the FERET face image database, and organizing FERET performance evaluations of face recognition.
The project organized three face recognition evaluations, in 1994, 1995, and 1996; several of the most famous face recognition algorithms took part, which greatly promoted their refinement and practical application.
Another important contribution of these tests was to identify directions for further development: face recognition under non-ideal acquisition conditions, such as varying illumination and pose, gradually became a hot research topic.
Flexible models, including the active shape model (ASM) and the active appearance model (AAM), are important contributions to face modeling in this period.
ASM/AAM describe the face by two separate parts, 2D shape and texture, each modeled statistically by PCA; the two are then further combined by PCA to model the face statistically as a whole.
Flexible models have good face synthesis ability, so analysis-by-synthesis techniques can be used for feature extraction and modeling of face images.
They have been widely used in face alignment and recognition, and many improved models have appeared.
Generally speaking, face recognition technology developed very rapidly in this stage. The proposed algorithms achieved very good performance under ideal image acquisition conditions, with cooperative subjects, and on small and medium-sized frontal face databases, and several well-known commercial face recognition companies consequently emerged.
In terms of technical approach, linear subspace discriminant analysis, statistical appearance models, and statistical pattern recognition methods for 2D face images were the mainstream technologies of this stage.
The third stage (1998 ~ now)
The FERET'96 evaluation of face recognition algorithms showed that the mainstream face recognition techniques were not robust to the illumination and pose changes caused by non-ideal acquisition conditions or uncooperative subjects.
Therefore, the illumination and pose problems gradually became research hotspots.
At the same time, commercial face recognition systems developed further.
On the basis of the FERET tests, the US military therefore organized two evaluations of commercial systems, in 2000 and 2002.
Georghiades et al. proposed a face recognition method based on the illumination cone model for multi-pose, multi-illumination conditions, one of the important achievements of this period. They proved an important result: all images of the same face taken from a fixed viewpoint under different illumination conditions form a convex cone in image space, the illumination cone.
To compute the illumination cone from a small number of face images taken under unknown illumination, they also extended traditional photometric stereo. Under the assumptions of a Lambertian reflectance model, a convex surface, and a distant light source, their method recovers the 3D shape of the object and the reflectance of its surface points from seven images of the same viewpoint taken under unknown illumination conditions (traditional photometric stereo recovers the surface normal directions from three given images taken under known illumination conditions). Images under arbitrary illumination at that viewpoint can then easily be synthesized, completing the computation of the illumination cone.
Recognition is accomplished by computing the distance from the input image to each person's illumination cone.
In this period, statistical learning theory, represented by the support vector machine (SVM), was also applied to face recognition and verification.
An SVM is essentially a two-class classifier, whereas face recognition is a multi-class problem.
Three strategies are usually used to resolve this mismatch: the intra-class difference/inter-class difference method, the one-against-rest method, and the one-against-one method.
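The one-against-one decomposition can be illustrated independently of any particular SVM library: train one binary classifier per pair of classes and combine them by majority vote. Here a trivial nearest-class-mean rule stands in for the binary SVM; the labels and feature vectors are assumptions for illustration only:

```python
from itertools import combinations
from collections import Counter

def train_binary(samples_a, samples_b):
    """Stand-in for a binary SVM: a nearest-class-mean decision rule."""
    mean = lambda pts: [sum(xs) / len(xs) for xs in zip(*pts)]
    ma, mb = mean(samples_a), mean(samples_b)
    dist2 = lambda p, q: sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return lambda x: 0 if dist2(x, ma) <= dist2(x, mb) else 1

def train_one_vs_one(data):
    """data: {class_label: [feature vectors]} -> list of pairwise voters."""
    voters = []
    for a, b in combinations(sorted(data), 2):
        voters.append((a, b, train_binary(data[a], data[b])))
    return voters

def predict(voters, x):
    """Each pairwise classifier casts one vote; the majority wins."""
    votes = Counter()
    for a, b, clf in voters:
        votes[a if clf(x) == 0 else b] += 1
    return votes.most_common(1)[0][0]

# Three "persons" described by 2-D feature vectors.
data = {
    "alice": [(0.0, 0.0), (0.2, 0.1)],
    "bob":   [(5.0, 5.0), (5.1, 4.9)],
    "carol": [(0.0, 5.0), (0.1, 5.2)],
}
voters = train_one_vs_one(data)
print(predict(voters, (4.8, 5.2)))   # prints bob
```

For N enrolled identities this strategy trains N(N-1)/2 binary classifiers, whereas one-against-rest trains only N, each on a much more imbalanced problem.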
The face image analysis and recognition method based on a 3D morphable model, proposed by Blanz and Vetter, is a pioneering work of this stage.
The method is essentially an analysis-by-synthesis technique. Its main contribution is a statistical deformable model of 3D shape and texture (similar in spirit to the 2D AAM) combined with a graphics-style simulation of the perspective projection and illumination parameters of the imaging process. As a result, intrinsic attributes of the face, such as shape and texture, are completely separated from extrinsic parameters such as camera configuration and illumination, which benefits the analysis and recognition of face images.
Blanz's experiments show that this method achieves high recognition rates on the CMU-PIE (pose, illumination, and expression) face database and the FERET multi-pose face database, demonstrating its effectiveness.
At the 2001 International Conference on Computer Vision (ICCV), Viola and Jones of the Compaq Research Laboratory demonstrated their real-time face detection system based on simple rectangular features and AdaBoost, which detected near-frontal faces in CIF-format video at more than 15 frames per second.
The main contributions of this method are: 1) simple rectangular features that can be computed very quickly as face image features; 2) an AdaBoost-based learning method that combines a large number of weak classifiers into a strong classifier; and 3) a cascade structure that increases detection speed.
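The rectangular features in contribution 1 are fast because of the integral image: after one pass over the image, the pixel sum of any rectangle takes only four array lookups. The following is a minimal pure-Python sketch, not the original implementation; the toy image and feature layout are assumptions:

```python
def integral_image(img):
    """ii[y][x] = sum of all pixels above and to the left of (x, y),
    exclusive; an extra row/column of zeros simplifies the lookups."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, top, left, height, width):
    """Sum of pixels in a rectangle via four lookups (O(1) per feature)."""
    b, r = top + height, left + width
    return ii[b][r] - ii[top][r] - ii[b][left] + ii[top][left]

def two_rect_feature(ii, top, left, height, width):
    """A Haar-like feature: left half minus right half of a rectangle."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))

img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12]]
ii = integral_image(img)
print(rect_sum(ii, 1, 1, 2, 2))   # 6 + 7 + 10 + 11 = 34
```

Because every rectangle sum costs the same four lookups regardless of its size, tens of thousands of candidate features can be evaluated per window, which is what makes the AdaBoost feature selection and the cascade practical.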
Strategies based on this kind of face/non-face learning can now achieve quasi-real-time multi-pose face detection and tracking, which provides a good foundation for back-end face recognition.
In 2001, Shashua et al. proposed a face image recognition and rendering technique based on the quotient image [13].
It is a rendering technique learned from the image set of a specific class of objects: given a small number of differently illuminated images in the training set, it can synthesize images of any input face under various illumination conditions.
On this basis, Shashua et al. also defined an illumination-invariant face signature image, which can be used for illumination-invariant face recognition; experiments have demonstrated its effectiveness.
Basri and Jacobs used spherical harmonics to represent illumination and a convolution to describe Lambertian reflection, analytically proving an important result: the set of all Lambertian reflectance functions produced by arbitrary distant light sources lies close to a low-dimensional linear subspace.
This means that the set of images of a convex Lambertian object under various illumination conditions can be approximated by a low-dimensional linear subspace.
This is not only consistent with the earlier empirical results of statistical illumination modeling methods, but also advances the theory of linear subspace object recognition.
Moreover, the non-negativity of the illumination functions can be enforced by convex optimization, which provides an important idea for solving the illumination problem.
After the FERET project, several commercial face recognition systems appeared.
The relevant departments of the US Department of Defense went on to organize the FRVT evaluations of commercial face recognition systems, which have been held twice so far: FRVT 2000 and FRVT 2002.
On the one hand, these two tests compared the performance of the well-known face recognition systems; FRVT 2002, for example, showed that Cognitec, Identix, and Eyematic led the other systems by a clear margin, while the differences among the three were small.
On the other hand, they comprehensively surveyed the state of face recognition technology: under ideal conditions (frontal visa-type photographs), the best rank-1 identification rate was 73% (on a database of 37,437 people and 121,589 images), and the equal error rate (EER [14]) of face verification was about 6%.
Another important contribution of the FRVT tests was to further identify the problems that current face recognition algorithms urgently need to solve.
For example, FRVT 2002 showed that the performance of commercial face recognition systems is still very sensitive to indoor/outdoor illumination changes, pose, the time lapse between images, and other varying conditions, and that effective recognition on large-scale face databases also remains a serious problem. These issues still require further effort.
Generally speaking, face recognition on large-scale face databases under non-ideal imaging conditions (especially varying illumination and pose) and with uncooperative subjects has gradually become a hot research issue.
Nonlinear modeling methods, statistical learning theory, Boosting-based learning techniques [15], and face modeling and recognition methods based on 3D models have gradually become the development trends of the technology.
In a word, face recognition is a research topic with both scientific research value and broad application prospects.
Decades of work by a large number of researchers worldwide have produced fruitful results, and automatic face recognition technology has been applied successfully under certain constraints.
These achievements have deepened our understanding of automatic face recognition, and especially of its challenges.
Although existing automatic face recognition systems may already surpass humans in comparison speed, and even in accuracy, on massive face databases, for the general face recognition problem under complex, changing conditions they are far less robust and accurate than humans.
The essential reason for this gap is still unknown; after all, our understanding of the human visual system remains superficial.
From the perspective of pattern recognition and computer vision, however, the gap may mean that we have not yet found a sensor that samples facial information effectively (consider the difference between a monocular camera and the human binocular system); it may mean that we have adopted inappropriate face modeling methods (the internal representation of the face); and it may also mean that we do not yet know the ultimate accuracy that automatic face recognition technology can reach.
In any case, endowing computing devices with a human-like ability to recognize faces is the dream of many researchers in this field.
As research deepens, our understanding should come ever closer to the correct answers to these questions.