Cross-pose face recognition by integrating regression iteration and interactive subspace
EURASIP Journal on Wireless Communications and Networking, volume 2019, Article number: 105 (2019)
Abstract
Pose variation in test samples is currently a main factor limiting the accuracy of face recognition, and designing a cross-pose recognition algorithm remains an open technical problem. This paper proposes a cross-pose face recognition algorithm that integrates the regression iterative method (RIM) and the interactive subspace method (ISM). Regression iteration makes the objective function converge rapidly and extracts the important characteristics of the sample. The pose of the cross-pose face image is then estimated, and finally the interactive subspace method is applied to judge the similarity of the faces. Experimental results on the FERET face database (±45° and ±90° poses) and the MIT-CBCL face database (N-fold cross-validation) show that the proposed RIM-ISM algorithm achieves a higher recognition rate and better robustness, and that it can effectively address the difficulty of cross-pose face recognition.
Introduction
The cross-pose face recognition problem is caused by the texture changes that rotation of the face introduces. When the pose change is large enough that within-cluster variation exceeds inter-cluster variation, the recognition rate falls sharply. Cross-pose face recognition has therefore been an important research topic in recent years.
Two main families of methods address cross-pose recognition: algorithms based on pose correction and feature-based algorithms. Pose correction methods rotate the face image and fit it to a new image; this synthesis relies on either 3D or 2D face models. Y. Taigman et al. proposed DeepFace [1]: 3D face alignment is first used to correct the face pose and frontalize the face, and the result is then fed into a convolutional neural network for feature extraction, with a classifier performing face authentication. Masi et al. proposed pose-aware CNN models (PAMs) to deal with large pose changes [2]. The method uses convolutional neural networks to learn pose-specific models and maps a multi-pose image to the learned models. Zhu et al. designed the multi-view perceptron (MVP), which separates pose and identity information by using different neurons in the network and reconstructs images of various poses from a single 2D image [3]. M. Kan et al. designed the stacked progressive auto-encoders (SPAE) neural network model to achieve nonlinear modeling of pose changes with small-scale data [4]. J. Yim et al. designed a deep neural network based on multi-task learning that rotates a face of any pose and illumination to a designated pose under normal illumination [5].
Feature-based methods extract face features that are robust to pose. Y. Sun et al. proposed the DeepID network structure, which considers local and global characteristics simultaneously and introduces a larger data set for model training [6]. Florian Schroff et al. proposed the FaceNet framework, which directly learns an encoding from images to Euclidean space and then performs face recognition, face verification, and face clustering on the basis of this encoding [7]. Omkar M. Parkhi et al. applied the deep network VGG to the face recognition task. This network structure consists of a 16-layer convolutional neural network and three fully connected layers; it was tested on the LFW and YouTube Faces databases and achieved better recognition results [8]. Recently, cross-pose recognition methods based on features such as the scale-invariant feature transform (SIFT) [9], LBP [10], and Gabor [11] have achieved good recognition results under controlled and partially unconstrained conditions.
Building on these two kinds of methods, this study combines the regression iterative method with the interactive subspace method to design the RIM-ISM cross-pose recognition algorithm. First, the regression iterative method regresses the cross-pose face shape to a location close to the real shape and detects the important characteristics of the sample. Second, the deflection angles of the 2D face image relative to the 3D rectangular coordinate system are calculated and the pose is corrected. Finally, the similarity of the cross-pose face images is measured with the interactive subspace method. To verify the effectiveness of the RIM-ISM algorithm, large-scale experiments were conducted on the FERET and MIT-CBCL databases. The experimental results show that the proposed RIM-ISM algorithm outperforms the compared algorithms in both recognition rate and recognition time.
Methodology
Extraction of important characteristics
Cross-pose face recognition means recognizing or identifying faces of any pose in an image. The human face is a 3D structure, and the face pose can change in three dimensions (X, Y, and Z), as shown in Fig. 1: pitch θ_{x} is rotation about the X-axis, yaw θ_{y} is horizontal rotation about the Y-axis, and roll θ_{z} is in-plane rotation about the Z-axis.
The positions of key feature points change with pose, and this paper adopts the regression iterative method to handle the change. The basic principle is to provide an initial shape for a given face image and, through repeated iterations, regress it to a position close or equal to the real shape. The method learns a series of descent directions and the step scales along those directions, so that the objective function converges quickly to its minimum, which improves the running speed of the program.
Given a face image containing m pixels, x ∈ R^{m × 1}, let d(x) ∈ R^{p × 1} represent the p important characteristics of the sample and let h(·) be a nonlinear feature extraction function. Starting from a given initial shape x_{0}, the regression iterative method regresses x_{0} to the correct face shape x_{*}, that is, it solves for the Δx that minimizes f(x_{0} + Δx) in Formula (1):
where ϕ_{∗} = h(d(x_{∗})) denotes the SIFT features extracted at the real feature points of the face. Expanding the objective function in a Taylor series gives:
where J_{f}(x_{0}) is the Jacobian matrix and H(x_{0}) is the Hessian matrix of f at x_{0}. Differentiating Formula (2) with respect to Δx gives:
where Δx = x_{k} − x_{k − 1}, J_{h} is the Jacobian matrix of h, and ϕ_{k − 1} = h(d(x_{k − 1})) is the feature vector extracted at the face feature points x_{k − 1}. The regression iterative method is applied to solve the optimization problem of Formula (1). The optimal solution can be obtained by solving Formula (3), which can be rewritten as the regression iterative formula:
where R_{k − 1} is the gradient optimization (descent) direction and b_{k − 1} is the average deviation (bias) parameter; both can be obtained directly through training. The regression iterative method can detect the important characteristics of a sample in real time and locate the eyes, mouth, nose, and other important characteristic positions. In total, 45 pairs of important characteristics were detected, as shown in Fig. 2.
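The learned-descent idea can be illustrated with a small sketch. It assumes the regression iterative formula has the usual cascaded form x_{k} = x_{k − 1} + R_{k − 1}ϕ_{k − 1} + b_{k − 1}, with each pair (R, b) fitted by least squares on training shapes. Everything else is hypothetical: the function names, the ridge term, and the toy feature extractor, which replaces SIFT with a simple nonlinear response to the landmark offset. This is not the authors' implementation.

```python
import numpy as np

def extract_features(x, image):
    """Toy stand-in for h(d(x)): a nonlinear response to the landmark offset.

    Hypothetical: an "image" is represented here by its hidden true shape;
    a real extractor would compute SIFT descriptors from pixels around x.
    """
    return np.tanh(x - image)

def train_cascade(x0, x_true, images, n_stages=4, ridge=1e-6):
    """Fit one (R, b) pair per stage by ridge-regularized least squares."""
    x = x0.copy()
    stages = []
    for _ in range(n_stages):
        phi = extract_features(x, images)                       # shape (n, p)
        design = np.hstack([phi, np.ones((phi.shape[0], 1))])   # bias column
        gram = design.T @ design + ridge * np.eye(design.shape[1])
        weights = np.linalg.solve(gram, design.T @ (x_true - x))
        R, b = weights[:-1], weights[-1]
        stages.append((R, b))
        # Row-vector form of x_k = x_{k-1} + R_{k-1} phi_{k-1} + b_{k-1}
        x = x + phi @ R + b
    return stages

def apply_cascade(x0, images, stages):
    """Run the learned stages on new samples; no Jacobian or Hessian needed."""
    x = x0.copy()
    for R, b in stages:
        x = x + extract_features(x, images) @ R + b
    return x

rng = np.random.default_rng(0)
p = 6  # toy shape dimension (not the paper's 45 landmark pairs)
truth_train = rng.normal(size=(500, p))
start_train = truth_train + rng.normal(size=(500, p))
stages = train_cascade(start_train, truth_train, images=truth_train)

truth_test = rng.normal(size=(200, p))
start_test = truth_test + rng.normal(size=(200, p))
final_test = apply_cascade(start_test, images=truth_test, stages=stages)

err_before = np.mean(np.linalg.norm(start_test - truth_test, axis=1))
err_after = np.mean(np.linalg.norm(final_test - truth_test, axis=1))
print(f"mean shape error: {err_before:.3f} -> {err_after:.3f}")
```

At test time each stage costs one feature extraction and one matrix product, which is why learning R and b offline avoids computing the Jacobian and Hessian online.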
Estimation of pose deflection
Face pose deflection estimation calculates the deflection angles (θ_{x}, θ_{y}, θ_{z}) of the 2D face image relative to the 3D rectangular coordinate system. It extracts the important characteristics (eyes, nose, mouth, etc.) of the face image and then determines the feature correspondence between the 3D model and the 2D model.
The multi-pose sample image is denoted I_{Q}, and p_{i}(x_{i}, y_{i}), i = 1, 2, …, 45, are the important characteristics of the sample detected by the regression iterative method, as shown in Fig. 2c–d. According to the perspective projection principle of the camera, the correspondence between a key feature point P of the 3D model and the key feature point p of the current face image I_{Q} is obtained as follows:
The mapping of important characteristics from the 2D model to the 3D model is defined as C_{Q}, thus:
where A_{Q} is the camera’s intrinsic parameter matrix, t_{Q} is the translation matrix, and R_{Q} is the rotation matrix. R_{Q} contains the pose information of the image to be recognized, and its mapping relation is
Formulas (5), (6), and (7) determine the pose of the face image I_{Q} to be recognized, that is, the deflection angles (θ_{x}, θ_{y}, θ_{z}) of the 2D face image in the 3D coordinate system. The rotation matrix can be expressed as
The rotation mapping matrix R_{Q} can be expressed as
Formulas (8) to (11) can be used to calculate the deflection angles (θ_{x}, θ_{y}, θ_{z}). The calculation formulas are as follows:
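As an illustration of this step, the sketch below projects a 3D model point with a pinhole camera and recovers (θ_{x}, θ_{y}, θ_{z}) from a rotation matrix. The composition order R = R_z·R_y·R_x is an assumption, since the displayed formulas are not reproduced in this text, and a different axis convention would change the extraction formulas. The function names and the example intrinsics are hypothetical.

```python
import numpy as np

def rotation_from_euler(theta_x, theta_y, theta_z):
    """Compose R = Rz @ Ry @ Rx (assumed convention), angles in radians."""
    cx, sx = np.cos(theta_x), np.sin(theta_x)
    cy, sy = np.cos(theta_y), np.sin(theta_y)
    cz, sz = np.cos(theta_z), np.sin(theta_z)
    Rx = np.array([[1.0, 0.0, 0.0], [0.0, cx, -sx], [0.0, sx, cx]])
    Ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])
    Rz = np.array([[cz, -sz, 0.0], [sz, cz, 0.0], [0.0, 0.0, 1.0]])
    return Rz @ Ry @ Rx

def euler_from_rotation(R):
    """Invert rotation_from_euler; valid while |theta_y| < 90 degrees."""
    theta_x = np.arctan2(R[2, 1], R[2, 2])
    theta_y = np.arctan2(-R[2, 0], np.hypot(R[2, 1], R[2, 2]))
    theta_z = np.arctan2(R[1, 0], R[0, 0])
    return theta_x, theta_y, theta_z

def project(points_3d, A, R, t):
    """Pinhole projection p ~ A (R P + t) for points of shape (n, 3)."""
    cam = points_3d @ R.T + t
    uvw = cam @ A.T
    return uvw[:, :2] / uvw[:, 2:3]

# Example intrinsics (hypothetical focal length and principal point).
A = np.array([[800.0, 0.0, 320.0], [0.0, 800.0, 240.0], [0.0, 0.0, 1.0]])
angles = np.radians([10.0, 45.0, -5.0])      # pitch, yaw, roll
R = rotation_from_euler(*angles)
recovered = np.array(euler_from_rotation(R))
center_pixel = project(np.zeros((1, 3)), A, R, np.array([0.0, 0.0, 5.0]))
print(np.degrees(recovered), center_pixel)
```

Under this convention the pitch and roll angles are no longer separately identifiable when the yaw reaches exactly ±90°, so the extraction is numerically fragile near profile views.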
Determination of face similarity
After the face deflection angles are estimated, the cross-pose face image is rotated to correct the pose and approximate a frontal-pose image, and the subspace G is established from the pose-corrected images, with vector g ∈ G. The reference subspace of the frontal face is defined as D, with vector d ∈ D. Whether the two subspaces belong to the same category is determined by calculating the similarity between them. The subspace method usually takes the cosine of the angle θ between the two subspaces as the similarity criterion, namely,
where φ_{n} is the n-th eigenvector of D and N is the dimension of D. To avoid the interference introduced by manually selecting the reference subspace and to improve the stability of the similarity between samples, this paper adopts the interactive subspace method (ISM) to measure the similarity between cross-pose face images. One subspace is used as the input subspace and the other as the reference subspace; the subspace method calculation is carried out twice, yielding two new vectors, and the included angle θ between these two vectors is then calculated. The principle is shown in Fig. 3.
The calculation formula is as follows:
Formula (16) has a local maximum value. Matrices P and Q are defined as the orthogonal projection matrices onto subspaces D and G, and the value of cosθ can be obtained from the maximum eigenvalue λ_{max} of the matrix PQP, namely:
With the interactive subspace method, the eigenvalue problem of the high-dimensional matrix PQP is transformed into the eigenvalue problem of a low-dimensional matrix. Let the matrix X be
where ρ_{j} is the j-th eigenvector of G and L is the dimension of G (L ≤ N). The matrix X can be decomposed into
where the diagonal entries of the matrix Λ are the eigenvalues of X. The maximum eigenvalue λ_{max} is selected to evaluate the similarity between D and G:
According to Formula (17), 0 ≤ S_{ISM} ≤ 1. The closer S_{ISM} is to 1, the greater the similarity between the two samples and the more likely it is that the face images show the same person. Conversely, the closer S_{ISM} is to 0, the greater the difference between the two samples and the less likely it is that the face images show the same person.
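The dimensionality reduction can be checked numerically. The sketch below assumes that X has entries x_{ij} = Σ_{n} (ρ_{i}·φ_{n})(φ_{n}·ρ_{j}), the usual construction when two subspaces are compared through their orthonormal bases, and that S_{ISM} is the largest eigenvalue of X. The displayed definitions are not reproduced in this text, so this form is an assumption, as are the function names and the use of an uncentered SVD to build each basis.

```python
import numpy as np

def orthonormal_basis(samples, dim):
    """Leading `dim` left singular vectors of the sample matrix.

    samples: (n_samples, n_features). Returns (n_features, dim) with
    orthonormal columns spanning the dominant subspace (no mean removal).
    """
    U, _, _ = np.linalg.svd(samples.T, full_matrices=False)
    return U[:, :dim]

def ism_similarity(Phi, Rho):
    """Largest eigenvalue of the small L x L matrix X.

    Phi: (m, N) orthonormal basis of D; Rho: (m, L) orthonormal basis of G.
    """
    C = Rho.T @ Phi                  # C[j, n] = rho_j . phi_n
    X = C @ C.T                      # X[i, j] = sum_n (rho_i.phi_n)(phi_n.rho_j)
    return float(np.linalg.eigvalsh(X)[-1])   # eigvalsh sorts ascending

rng = np.random.default_rng(0)
m, N, L = 50, 5, 3                   # toy sizes, not the paper's settings
Phi = orthonormal_basis(rng.normal(size=(20, m)), N)
Rho = orthonormal_basis(rng.normal(size=(20, m)), L)

s_small = ism_similarity(Phi, Rho)

# Same quantity from the m x m matrix PQP, for comparison only.
P, Q = Phi @ Phi.T, Rho @ Rho.T
s_large = float(np.linalg.eigvalsh(P @ Q @ P)[-1])
print(s_small, s_large)
```

The nonzero eigenvalues of PQP and X coincide, so only an L × L eigenproblem has to be solved. The largest eigenvalue equals cos²θ for the smallest principal angle θ between the two subspaces, which keeps the score in [0, 1].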
Experiments
FERET database cross-pose recognition experiment
To verify the cross-pose face recognition performance of the proposed method, an experiment was conducted on the FERET database, which was created by the FERET project of the U.S. military. It contains 14,051 cross-pose, multi-illumination grayscale face images, organized into subsets for different orientations. Figure 4 shows example images from the FERET pose subset.
The RIM-ISM method was compared with three methods (DeepFace, SIFT, and Gabor) using leave-one-out cross-validation: one face image is removed from the database and used as test data, and the remaining images are used for training.
The experiment is repeated until every image has served once as the test sample. Although the calculation is tedious, the sample utilization rate is the highest, so the method is suitable for small samples.
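The leave-one-out protocol can be sketched as follows. The nearest-neighbor classifier and the synthetic two-class data are placeholders chosen only to keep the example self-contained; they are not the recognition methods or data used in the experiments, and the function names are hypothetical.

```python
import numpy as np

def nearest_neighbor_predict(train_x, train_y, query):
    """Placeholder classifier: label of the closest training sample."""
    distances = np.linalg.norm(train_x - query, axis=1)
    return train_y[np.argmin(distances)]

def leave_one_out_accuracy(x, y, predict=nearest_neighbor_predict):
    """Hold out each sample once, train on the rest, and average the hits."""
    n = len(x)
    correct = 0
    for i in range(n):
        keep = np.arange(n) != i          # every sample except the i-th
        correct += int(predict(x[keep], y[keep], x[i]) == y[i])
    return correct / n

# Synthetic, well-separated two-class data (illustrative only).
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0.0, 0.1, size=(5, 3)),
               rng.normal(10.0, 0.1, size=(5, 3))])
y = np.array([0] * 5 + [1] * 5)
accuracy = leave_one_out_accuracy(x, y)
print(accuracy)
```

With n samples the model is trained n times, which is the computational cost noted above.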
The more difficult ±45° and ±90° face images were chosen for testing, and the experimental data are shown in Figs. 5 and 6.
N-fold cross-validation
To test the robustness and recognition efficiency of the RIM-ISM algorithm, N-fold cross-validation was performed on the MIT-CBCL face database. N-fold cross-validation randomly divides the sample data into N parts, selects one part as the test set, and uses the remaining N − 1 parts as the training set. The procedure is repeated N times so that each part is used for testing exactly once, and the average of the N results is taken as the final single estimate. The advantage of this test method is that randomly generated subsamples are used repeatedly for training and verification, which verifies the robustness of a face recognition algorithm on random samples. The average recognition rate and average recognition time obtained in this way are statistically more meaningful [12].
The MIT-CBCL face database consists of 10 people with 200 images each, for a total of 2000 images. The sample images contain a large range of continuous pose changes as well as lighting and expression changes; the pose changes include horizontal pose changes and pitch angle changes. Figure 7 shows example pose-change images from the MIT-CBCL face database.
The MIT-CBCL database was randomly divided into 10 parts, each containing one pose-change image of each person. Each group therefore consisted of 10 images, each of a different person. A 10-fold cross-validation test was conducted: in each run, 9 groups (90 images in total) were used as training samples and 1 group (10 images in total) was reserved as test samples. The experimental data are shown in Table 1, which compares the efficiency of the RIM-ISM method with the DeepFace, SIFT, and Gabor methods.
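The fold construction described above, with one image of each person in every fold, can be sketched as follows. The text does not say how the 10 images per person were drawn from the 200 available, so random sampling without replacement is assumed here. The function names and the placeholder evaluator are hypothetical.

```python
import numpy as np

def make_person_balanced_folds(labels, n_folds, rng):
    """Build folds that each hold exactly one image index per person."""
    labels = np.asarray(labels)
    folds = [[] for _ in range(n_folds)]
    for person in np.unique(labels):
        candidates = np.flatnonzero(labels == person)
        chosen = rng.choice(candidates, size=n_folds, replace=False)
        for k, index in enumerate(chosen):
            folds[k].append(int(index))
    return [np.array(fold) for fold in folds]

def cross_validate(folds, evaluate):
    """Use each fold once as the test set and average the scores."""
    scores = []
    for k, test_idx in enumerate(folds):
        train_idx = np.concatenate(
            [fold for j, fold in enumerate(folds) if j != k])
        scores.append(evaluate(train_idx, test_idx))
    return float(np.mean(scores))

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 200)     # 10 people x 200 images
folds = make_person_balanced_folds(labels, n_folds=10, rng=rng)

def dummy_evaluate(train_idx, test_idx):
    # Placeholder score: a real evaluator would train on train_idx and
    # return the recognition rate measured on test_idx.
    return float(len(train_idx) == 90 and len(test_idx) == 10)

mean_score = cross_validate(folds, dummy_evaluate)
print(mean_score)
```

Swapping `dummy_evaluate` for a function that trains a recognizer and returns its recognition rate (or its recognition time) yields the averaged figures of the kind reported in Table 1.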
Results and discussion
The data in Figs. 5 and 6 show that when the characteristic dimension of the samples is relatively low, the recognition rate of the RIM-ISM algorithm is close to that of the DeepFace algorithm. As the characteristic dimension increases, the RIM-ISM algorithm achieves higher accuracy than the DeepFace, SIFT, and Gabor methods. In the ±45° pose-change experiment the recognition rate reached 97.1%, and in the ±90° experiment it reached 79.3%; both are higher than those of the other three methods under the same conditions. The recognition rate in Fig. 5 is significantly higher than that in Fig. 6, indicating that face recognition becomes more difficult as the pose deflection angle increases. When the pose of the test samples changes from ±45° to ±90°, too few key feature points can be extracted: the 45 pairs of feature points are reduced to 19 pairs, which introduces errors in the three deflection angles computed during pose deflection estimation. As a result, the accuracy of ISM in determining the similarity between cross-pose face images is greatly reduced.
As the data in Table 1 show, for random multi-pose face samples the proposed RIM-ISM method achieved an average recognition rate of 97.52% in the 10-fold cross-validation test, higher than the DeepFace, SIFT, and Gabor methods. The average recognition time for a single test sample was 36.28 ms, far lower than that of the other three methods. There are two main reasons for the high efficiency of the RIM-ISM method:

(1) Through repeated iterations, the method learns a series of descent directions and the scales along those directions, and regresses the initial shape to a position close or equal to the real shape, so that the objective function converges to its minimum very quickly. It therefore avoids solving for the Jacobian and Hessian matrices, which improves the running speed and reduces the complexity of the algorithm.
(2) With the interactive subspace method, the eigenvalue problem of a high-dimensional matrix is transformed into the eigenvalue problem of a low-dimensional matrix, which greatly reduces the computation and improves the recognition efficiency.
To sum up, the RIM-ISM algorithm has strong robustness and higher recognition efficiency when dealing with random cross-pose samples.
Conclusion
To address pose variation in face recognition, this paper proposed a cross-pose face recognition method based on the regression iterative method and the interactive subspace method. Regression iteration makes the objective function converge rapidly and extracts 45 pairs of important characteristic positions, such as the eyes, mouth, and nose, from the target face sample. Pose correction is then carried out by solving for the pose deflection angles of the cross-pose face image. Finally, the similarity of the cross-pose face images is measured with the interactive subspace method. The cross-pose recognition experiments on the FERET and MIT-CBCL databases indicate that the proposed RIM-ISM method achieves higher recognition accuracy while reducing computational complexity. It is also robust when dealing with face samples with large pose changes.
Abbreviations
ISM: Interactive subspace method
MVP: Multi-view perceptron
PAMs: Pose-aware CNN models
RIM: Regression iterative method
SIFT: Scale-invariant feature transform
SPAE: Stacked progressive auto-encoders
References
1. Y. Taigman, M. Yang, M. Ranzato, in Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference. DeepFace: Closing the gap to human-level performance in face verification (Columbus, 2014), pp. 1701–1708
2. I. Masi, S. Rawls, G. Medioni, in Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference. Pose-aware face recognition in the wild (Las Vegas, 2016), pp. 4838–4846
3. Z. Zhu, P. Luo, X. Wang, in the 27th International Conference on Neural Information Processing Systems. Multi-view perceptron: a deep model for learning face identity and view representations (Cambridge, 2014), pp. 217–225
4. M. Kan, S. Shan, H. Chang, in Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference. Stacked progressive auto-encoders (SPAE) for face recognition across poses (Columbus, 2014), pp. 1883–1890
5. J. Yim, H. Jung, B.I. Yoo, in Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference. Rotating your face using multi-task deep neural network (Boston, 2015), pp. 676–684
6. Y. Sun, X. Wang, X. Tang, in Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference. Deep learning face representation from predicting 10,000 classes (2014), pp. 1891–1898
7. F. Schroff, D. Kalenichenko, J. Philbin, in Computer Vision and Pattern Recognition (CVPR), 2015 IEEE Conference. FaceNet: A unified embedding for face recognition and clustering (Boston, 2015), pp. 815–823
8. O.M. Parkhi, A. Vedaldi, A. Zisserman, in British Machine Vision Conference. Deep face recognition, vol 171 (2015), pp. 1–171 12
9. L. Wu, Z. Peng, Y. Hou, et al., Complete pose binary SIFT for face recognition with pose variation. Chin. J. Sci. Instrum. 36(4), 736–742 (2015)
10. C. Li, W. Wei, J. Li, et al., A cloud-based monitoring system via face recognition using Gabor and CS-LBP features. J. Supercomput. 73(4), 1532–1546 (2017)
11. B.S. Oh, K.A. Toh, A. Teoh, et al., An analytic Gabor feedforward network for single-sample and pose-invariant face recognition. IEEE Trans. Image Process. 27(6), 2791–2805 (2018)
12. R.S. Babatunde, S.O. Olabiyisi, E.O. Omidiora, Assessing the performance of random partitioning and K-fold cross validation methods of evaluation of a face recognition system. Adv. Image Video Process. 3(6), 18–26 (2015)
Acknowledgements
This research received support from the Science and Technology Department of Henan Province.
Funding
The authors acknowledge the National Natural Science Foundation of China-Henan Joint Fund (Grant: U1804162) and the Henan Province Science and Technology Project (Grant: 192102310180).
Availability of data and materials
Not applicable.
Author information
Contributions
LKF is the main writer of this paper. He completed the derivation of the RIM-ISM cross-pose face recognition algorithm, carried out the simulations on the FERET and MIT-CBCL databases, and analyzed the experimental results. HQZ provided important suggestions on face pose estimation and the experimental methods. Both authors read and approved the final manuscript.
Corresponding author
Correspondence to Kefeng Li.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Keywords
 Regression iterative method (RIM)
 Cross-pose face recognition
 Pose estimation
 Interactive subspace method (ISM)
 N-fold cross-validation