Learning local embedding deep features for person re-identification in camera networks

Abstract

In this paper, we propose a novel feature learning method named local embedding deep features (LEDF) for person re-identification in camera networks. To learn the structural information of pedestrians, we first utilize a verification network, which does not require explicit identity labels, to obtain local summing maps. We then combine all local summing maps of a pedestrian image to form a holistic summing map, which carries the same identity label as the original pedestrian image. Finally, we take the holistic summing maps as the input to train an identification network and obtain the LEDF from its last fully connected layer. The proposed LEDF fully considers structural information by learning local features and meanwhile possesses strong discriminative ability by learning global features. Experimental results on two large-scale datasets (Market-1501 and CUHK03) demonstrate that the proposed LEDF achieves better results than state-of-the-art methods.

1 Introduction

Person re-identification in camera networks [1–4] aims at matching a query person against a large image gallery whose images are captured by different camera sensors in the network, as shown in Fig. 1. The performance of person re-identification in sensor networks is closely related to many other applications, such as person retrieval, behavior analysis, and long-term person tracking [5]. However, person re-identification in sensor networks is a very challenging problem for two reasons. First, the same person observed in different cameras often undergoes large variations in illumination, pose, viewpoint, and occlusion. Second, a surveillance camera in a public space captures hundreds of pedestrians within a day, and some of them have similar appearance. The core of person re-identification is to find a discriminative representation and a good metric to evaluate the similarity between pedestrian images.

Fig. 1

The definition of person re-identification in sensor networks. The green bounding box indicates a query image which is utilized to search images of the same pedestrian from the gallery. The red bounding boxes and the blue bounding boxes show the correctly and wrongly matched pedestrian images from the gallery, respectively

Recently, progress in person re-identification has mainly come from feature learning with convolutional neural networks (CNNs) on large-scale datasets [6–11]. There are two major kinds of CNN models for person re-identification, i.e., the verification network and the identification network. The verification network [12–14] treats person re-identification as a binary classification task, and its structure is shown in Fig. 2a. The input of the verification network is a pair of pedestrian images, and the output indicates whether the input image pair belongs to the same class or not. The verification network maps images into a discriminative space where two images of the same person are close to each other and two images of different persons are far apart. Unlike the verification network, the identification network [15–18] regards person re-identification as a multi-class classification task, which takes full advantage of the identity labels of pedestrian images. The identification network learns non-linear functions from the training images, and its structure is shown in Fig. 2b. Although the identification network has stronger discriminative ability than the verification network, most existing methods train the identification network on whole pedestrian images, which discards the structural information provided by the regions of a pedestrian. This structural information, which tends to be robust to changes in viewpoint and pose, could be fully discovered by learning from the regions of pedestrians. However, it is hard to define an identity label for a region of a pedestrian, and identity labels are essential for training the identification network.

Fig. 2

Two types of CNNs. a The verification network. b The identification network

In this paper, we propose a novel feature learning method named local embedding deep features (LEDF) for person re-identification in sensor networks, which trains an identification network in a local way. To learn the structural information, we first divide each pedestrian image into several regions. The corresponding regions of two images of the same pedestrian are considered local similar pairs, as shown in Fig. 3a, and the corresponding regions of two images of different pedestrians are considered local dissimilar pairs, as shown in Fig. 3b. We expect to force the regions into the same feature space, and therefore, we first utilize the verification network, which does not require explicit identity labels, to learn local features from the similar and dissimilar pairs. Then, we extract the local feature maps visualized in Fig. 4a and average them into the local summing map visualized in Fig. 4b for each region in the verification network. Since the discrimination of the identification network is stronger than that of the verification network, we utilize the identification network to learn the global information. However, the local summing maps do not have identity labels. To overcome this drawback, we connect all local summing maps of a pedestrian image to form the holistic summing map visualized in Fig. 4c. The holistic summing map is assigned the same identity label as the original image, so that we can train the identification network using the holistic summing maps and their identity labels. We extract the final feature representation for each pedestrian image from the fully connected layer of the identification network. The extracted features are denoted as LEDF; they fully consider the structural information of pedestrian images by analyzing local features and meanwhile possess strong discrimination through the identification network. We validate the proposed LEDF on two large-scale person re-identification datasets (Market-1501 [19] and CUHK03 [7]), and the experimental results demonstrate the effectiveness of the proposed LEDF.

Fig. 3

Two kinds of local image pairs. a The corresponding regions of two images belonging to the same pedestrian are considered as local similar pairs. b The corresponding regions of two images belonging to different pedestrians are considered as local dissimilar pairs

Fig. 4

The network structure of the proposed method. The verification network includes two CNNs, one square layer, one additional fully connected layer, and one loss function. The identification network consists of one CNN, one fully connected layer, and one loss function. a, b, and c indicate the local feature map, local summing map, and holistic summing map, respectively

The rest of this paper is organized as follows. In Section 2, we review and discuss the related works. In Section 3, we introduce the proposed LEDF in detail. Then, we show the experimental results in Section 4. Finally, we conclude this paper in Section 5.

2 Related work

With the development of sensor network theory [20, 21], many applications have emerged in sensor networks, and person re-identification is one of the most important. In this section, we review the related work on person re-identification in two aspects: traditional methods and CNN-based methods.

Traditional methods for person re-identification mainly involve two key steps, i.e., feature representation and metric learning. Many approaches propose effective features for describing the appearance of persons under various environments [22–28]. Cheng et al. [22] utilized the Custom Pictorial Structure (CPS) to extract an ensemble of features considering both part-based color information and color displacement for person re-identification. Gheissari et al. [23] employed a spatiotemporal segmentation algorithm to generate salient edges that are robust to the color of clothing. In [24], a novel descriptor based on a simple seven-dimensional feature vector was proposed for person re-identification. Liao et al. [25] proposed an effective feature representation called Local Maximal Occurrence (LOMO), which analyzes the horizontal occurrence of local features and maximizes the occurrence to make a representation stable against viewpoint changes. Besides robust feature representation, metric learning methods play an essential role in person re-identification [29–34]. Koestinger et al. [29] presented the KISSME method to learn a distance metric from equivalence constraints based on a statistical inference perspective. Ye et al. [30] proposed the Regularized Linear Discriminant Analysis (RLDA) metric, which provides a simple strategy to overcome the singularity problem by applying a regularization term. In [31], a discriminative Mahalanobis metric learning method was proposed to obtain a simplified formulation. Zheng et al. [32] formulated person re-identification as a distance learning problem, aiming to learn the optimal distance that maximizes the matching accuracy regardless of the choice of features.

Recently, convolutional neural networks (CNNs) have made great progress in person re-identification, mostly based on the verification network and the identification network. For the verification network, Ahmed et al. [35] added a neighborhood difference layer and a subsequent layer to compare convolutional features at the patch level and summarize the neighborhood differences of each patch. In [36], Ding et al. utilized a series of triplet samples to train networks that simultaneously consider the distances between pedestrian images of the same class and the distances between pedestrian images of different classes. To fully discover the structural information, some methods train the CNN in a part-based way. Varior et al. [9] combined the verification network with gating functions to selectively emphasize common local patterns by comparing mid-level features across pairs of images. In [37], pedestrian images were cropped into three overlapping parts, and three networks were trained jointly. Similarly, Cheng et al. [38] proposed learning both full-body and local-body features for pedestrian images. However, the verification network does not fully consider the identity information of images and cannot predict the identity labels of pedestrians. To take full advantage of the identity labels, the identification network, which has strong discrimination, is employed for person re-identification. Zheng et al. [39] fine-tuned the identification network on three large datasets and achieved competitive results. Wu et al. [40] proposed a feature extraction network named Feature Fusion Net (FFN) for pedestrian image representation, which combines CNN embeddings with hand-crafted features in a fully connected layer. Moreover, Zheng et al. [10] combined the verification network with the identification network to learn more discriminative pedestrian descriptors. However, the abovementioned methods learn the identification network in a holistic way, which ignores the structural information.

3 Method

3.1 Overall network

The architecture of the proposed network contains two parts, i.e., the verification network and the identification network, as shown in Fig. 4.

The verification network includes two CNNs, one square layer, one fully connected layer, and one loss function. The square layer combines the outputs of the two CNNs, i.e., f1 and f2, into a vector f′. Since the verification network regards person re-identification as a binary classification task, we add a fully connected layer to convert f′ into a 2-dim vector that determines whether the input local pair belongs to the same class or not. The softmax function is then applied to obtain the predicted probability that the input local pair belongs to the same class.
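
As a concrete illustration, the following PyTorch-style sketch shows how the square layer and the 2-way classification head could be wired together. This is a minimal sketch under our own assumptions (a 2048-dim CNN embedding, our naming), not the authors' implementation:

```python
import torch
import torch.nn as nn

class VerificationHead(nn.Module):
    """Square layer plus 2-way classifier on top of two shared-weight
    CNN embeddings f1 and f2 (feat_dim = 2048 is our assumption)."""

    def __init__(self, feat_dim=2048):
        super().__init__()
        self.fc = nn.Linear(feat_dim, 2)  # converts f' into a 2-dim score

    def forward(self, f1, f2):
        f_prime = (f1 - f2) ** 2          # the square layer
        return self.fc(f_prime)           # logits; softmax is applied below

head = VerificationHead()
f1, f2 = torch.randn(4, 2048), torch.randn(4, 2048)  # stand-in CNN outputs
probs = torch.softmax(head(f1, f2), dim=1)  # same/different probabilities
```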

The identification network consists of one CNN, one fully connected layer, and one loss function. We set the final fully connected layer of the CNN to N dimensions for fine-tuning the network on person re-identification datasets, where N denotes the number of identity labels. We regard the vector f from the fully connected layer as the final feature representation of a pedestrian image.

3.2 Learning the local summing maps

The identification network can enhance feature discrimination for person re-identification. However, most existing methods train the identification network on whole pedestrian images, which ignores the structural information of pedestrians. This structural information could be provided by local regions, but the identity label of a region is difficult to define, while an identity label is essential for training the identification network.

To solve this dilemma, we first utilize the verification network, which does not require explicit identity labels, to learn the structural information from regions. Concretely, the verification network takes the local similar and dissimilar pairs as input. We train the verification network with a weight-sharing strategy, which means that the parameters of the two CNNs are identical. The square layer is employed to connect the outputs of the two CNNs:

$$ f^{\prime} = \left(f_{1} - f_{2}\right)^{2} $$
(1)

where f′ is the output vector of the square layer. Then, f′ is fed into the fully connected layer and passes through the softmax function, resulting in a 2-dim vector \((\tilde{p}_{1}, \tilde{p}_{2})\) that indicates the predicted probability of the input local pair belonging to the same class, where \(\tilde{p}_{1}+\tilde{p}_{2}=1\). The verification network treats person re-identification as a binary classification task, and we utilize the cross-entropy as the loss function

$$ \text{Loss}_{vr} = -\sum_{i=1}^{2} p_{i} \log(\tilde{p}_{i}) $$
(2)

where

$$ \tilde{p}_{i} = \frac{e^{x_{i}}}{\sum_{j=1}^{2} e^{x_{j}}} $$
(3)

where \(\tilde{p}_{i}\) is the predicted probability, \(x_{i}\) is the output of the fully connected layer, and \(p_{i}\) is the true label. If the input is a local similar pair, then \(p_{1}=1\) and \(p_{2}=0\); otherwise, \(p_{1}=0\) and \(p_{2}=1\).
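
Under the same assumptions, one training step on a batch of local pairs could be sketched as follows, reusing the head from the previous listing; note that `F.cross_entropy` fuses the softmax of Eq. (3) with the log-loss of Eq. (2):

```python
import torch
import torch.nn.functional as F

def verification_step(cnn, head, region_a, region_b, pair_label):
    """One update on a batch of local pairs. Weight sharing means the
    same `cnn` embeds both regions; following Eq. (2), pair_label holds
    class 0 for local similar pairs and class 1 for dissimilar ones."""
    f1, f2 = cnn(region_a), cnn(region_b)   # shared parameters
    logits = head(f1, f2)                   # square layer + FC
    return F.cross_entropy(logits, pair_label)

# toy usage with an identity "CNN" standing in for the real backbone
loss = verification_step(torch.nn.Identity(), VerificationHead(),
                         torch.randn(4, 2048), torch.randn(4, 2048),
                         torch.tensor([0, 1, 0, 1]))
```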

After training the verification network, we extract the local summing map for each region, as shown in Fig. 4a and b. The local summing map preserves the complete local structural information of a pedestrian image. The local summing map of the k-th region is formulated as

$$ Z_{k} = \frac{1}{M} \sum_{m=1}^{M} z_{k}^{m} $$
(4)

where M is the number of local feature maps, and \(z_{k}^{m}\) denotes the m-th local feature map of the k-th region.
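
Since Eq. (4) is simply a channel-wise average, it reduces to one line in PyTorch; the map count and spatial size below are our assumptions for illustration:

```python
import torch

def local_summing_map(local_feature_maps):
    """Eq. (4): average the M local feature maps of one region into a
    single local summing map, Z_k = (1/M) * sum_m z_k^m."""
    return local_feature_maps.mean(dim=0)

# e.g. M = 256 feature maps of size 56x56 from an early conv layer
Z_k = local_summing_map(torch.randn(256, 56, 56))   # -> shape (56, 56)
```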

3.3 Local embedding deep features

To complement the verification network, we utilize the identification network to take full advantage of the identity labels and learn global features with strong discrimination. However, a single local summing map cannot be assigned an identity label. To deal with this problem, we concatenate all local summing maps of a pedestrian image to form the holistic summing map

$$ Z = \left[Z_{1}; Z_{2}; \cdots; Z_{k}; \cdots; Z_{K}\right] $$
(5)

where K is the number of cropped regions in each pedestrian image, and the holistic summing map Z has the same identity label as the original image. The holistic summing maps contain both the local and global information of pedestrians, and they make it possible to train the identification network in a local way. We take the holistic summing map as the input of the identification network, as shown in Fig. 4c. In the identification network, we also employ the cross-entropy loss

$$ \text{Loss}_{id} = -\sum_{n=1}^{N} q_{n} \log(\tilde{q}_{n}) $$
(6)

where

$$ \tilde{q}_{n} = \frac{e^{y_{n}}}{\sum_{j=1}^{N} e^{y_{j}}} $$
(7)

where \(\tilde{q}_{n} \in [0,1]\) is the predicted probability that the input holistic summing map belongs to the n-th class, \(y_{n}\) is the output of the fully connected layer, and \(q_{n}\) is the true class label. Assuming that t is the true identity label, \(q_{n}\) is defined as

$$ q_{n} = \begin{cases} 0 & n \neq t \\ 1 & n = t \end{cases} $$
(8)
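
Putting Eq. (5) and the label assignment together, a hypothetical sketch of how the holistic summing map is built and labeled (region count and map sizes are our assumptions):

```python
import torch

def holistic_summing_map(local_maps):
    """Eq. (5): stack the K local summing maps vertically. The result
    inherits the identity label of the original pedestrian image."""
    return torch.cat(local_maps, dim=0)   # Z = [Z_1; Z_2; ...; Z_K]

# K = 3 regions, each local summing map of size 56x56
Z = holistic_summing_map([torch.randn(56, 56) for _ in range(3)])
# Z (168x56) is paired with the original image's identity label and used
# to train the N-way identification network with the loss of Eqs. (6)-(8).
```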

In the test stage, we first obtain the holistic summing maps via the verification network and then feed them into the identification network. We extract the LEDF from the fully connected layer of the identification network; the dimension of the proposed LEDF is 2048 when ResNet-50 [41] is used as the CNN model. For each query, the Euclidean distances between its LEDF and those of the gallery images are sorted in ascending order to obtain the rank list.
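
A small sketch of this ranking step, with random stand-ins for the 2048-dim LEDF vectors:

```python
import torch

def rank_gallery(query_ledf, gallery_ledf):
    """Rank gallery images for each query by the Euclidean distance
    between LEDF vectors; a smaller distance means a better match."""
    dists = torch.cdist(query_ledf, gallery_ledf)  # (Q, G) distance matrix
    return torch.argsort(dists, dim=1)             # ranked gallery indices

rank_list = rank_gallery(torch.randn(5, 2048), torch.randn(100, 2048))
```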

4 Results and discussion

In this section, we evaluate the proposed LEDF on two large-scale datasets, Market-1501 [19] and CUHK03 [7]. In the experiments, we implement the verification network and the identification network using the MatConvNet [42] package and utilize the ResNet-50 [41] network as the CNN model for training and testing. For the verification network, we set the number of training epochs to 75 and set the learning rate to 0.001 for the first 70 epochs and 0.0001 for the final 5 epochs. The batch size is 24, and the ratio between local similar pairs and local dissimilar pairs is 1:1 in the verification network. The identification network uses the same number of epochs and learning rate schedule as the verification network, and its batch size is set to 48. Mini-batch stochastic gradient descent (SGD) is utilized to update the parameters, and the weight decay is set to 0.0005 for both networks. There are four residual blocks in the ResNet-50 network, and the first block contains nine convolution layers; we extract the local feature maps from the ninth convolution layer of this first block. Each pedestrian image is divided into the same number of regions, namely three. We resize each region to 224×224 as the input of the verification network in pairs.
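
The stated optimization schedule can be reproduced with standard tooling. The sketch below assumes PyTorch rather than the authors' MatConvNet setup, and the momentum value is our assumption (the paper does not report one):

```python
import torch.nn as nn
import torch.optim as optim

def build_training(model: nn.Module):
    """Mini-batch SGD: lr 0.001 for the first 70 epochs, 0.0001 for the
    final 5 epochs, and weight decay 0.0005, as stated in the text."""
    opt = optim.SGD(model.parameters(), lr=0.001,
                    momentum=0.9, weight_decay=0.0005)
    sched = optim.lr_scheduler.MultiStepLR(opt, milestones=[70], gamma=0.1)
    return opt, sched

opt, sched = build_training(nn.Linear(2048, 751))  # toy stand-in model
for epoch in range(75):
    # ... one epoch of training (batch size 24 or 48) goes here ...
    sched.step()
```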

4.1 Datasets

The Market-1501 dataset [19] is a large-scale person re-identification dataset. It contains 32,668 annotated bounding boxes of 1501 identities collected with six cameras. There are 12,936 pedestrian images of 751 identities in the training set and 19,732 pedestrian images of 750 identities in the test set. The Market-1501 dataset [19] also includes 3368 query images. For each query, we search for images of the same pedestrian in the test set. All images in this dataset are automatically detected by the Deformable Part Model (DPM) detector [43], which is closer to a realistic setting. Figure 5 shows that the same pedestrian under a camera network suffers from large variations in illumination, pose, and viewpoint.

Fig. 5

The same pedestrian under camera networks suffers from large variations in illumination, poses and viewpoints

The CUHK03 dataset [7] contains 14,097 images of 1467 identities. Each identity is captured by two cameras on the CUHK campus and has 9.6 images on average. There are two kinds of bounding boxes, i.e., labeled bounding boxes and detected bounding boxes. We evaluate the proposed LEDF using the detected bounding boxes, which exhibit misalignments and missing parts and are thus closer to realistic conditions. Following the experimental setting in [6, 7], we partition this dataset into a training set of 1367 pedestrians and a test set of 100 pedestrians, and the performance under the single-shot setting is reported over 20 random splits.

The rank-1 accuracy computed from the Euclidean distance and the mean average precision (mAP) are utilized to evaluate the performance of the proposed LEDF on the Market-1501 [19] and CUHK03 [7] datasets.
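
For reference, a NumPy sketch of the standard single-shot rank-1 and mAP computation from a query-gallery distance matrix; it omits the junk-image and same-camera filtering of the official Market-1501 protocol:

```python
import numpy as np

def rank1_and_map(dists, query_ids, gallery_ids):
    """Rank-1 accuracy and mAP given a (Q, G) Euclidean distance matrix
    and the identity labels of the queries and gallery images."""
    order = np.argsort(dists, axis=1)                # rank gallery per query
    matches = gallery_ids[order] == query_ids[:, None]
    rank1 = matches[:, 0].mean()
    aps = []
    for row in matches:
        hits = np.flatnonzero(row)                   # ranks of true matches
        if hits.size:                                # precision at each hit
            aps.append(np.mean((np.arange(hits.size) + 1) / (hits + 1)))
    return float(rank1), float(np.mean(aps))

r1, mAP = rank1_and_map(np.random.rand(5, 100),
                        np.arange(5), np.random.randint(0, 5, 100))
```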

4.2 Evaluation

4.2.1 Comparison with the state-of-the-art algorithms

We compare the proposed LEDF with other state-of-the-art algorithms on the Market-1501 dataset [19]. From Table 1, it can be seen that the proposed LEDF achieves 80.77% rank-1 accuracy and 61.34% mAP, outperforming previous works. In particular, the proposed LEDF is 1.26% and 1.47% higher than the state-of-the-art method Verif-identif [10] in rank-1 accuracy and mAP, respectively. Both methods utilize the verification network and the identification network to learn features, while the proposed LEDF explicitly considers the structural information of pedestrians.

Table 1 Comparison with state-of-the-art results on the Market-1501 dataset [19]

Table 2 shows the results reported on the CUHK03 dataset [7] under the single-shot setting, where there is only one correct image in the gallery; the rank-1, rank-5, and rank-10 accuracies and mAP are listed. As shown in Table 2, the proposed LEDF yields 84.6%, 96.8%, and 98.5% rank-1, rank-5, and rank-10 accuracy, respectively, which is comparable with the state-of-the-art methods. In addition, the mAP achieves a new state of the art.

Table 2 Comparison with state-of-the-art results on the CUHK03 dataset [7] using the single-shot setting

4.2.2 Evaluation with different inputs in the identification network

As shown in Table 3, we evaluate the performance of the identification network with different inputs: the original pedestrian images (ResNet-50 (O)), the whole feature maps extracted from the original pedestrian images (ResNet-50 (W)), and the holistic summing maps (LEDF). Experimental results show that ResNet-50 (W) outperforms ResNet-50 (O) because the whole feature maps learned by the CNN model contain high-level semantic information. From Table 3, it can also be seen that the proposed LEDF achieves better results than ResNet-50 (W). This is because the LEDF method fully considers the structural information of pedestrian images by analyzing the local features in the holistic summing maps.

Table 3 Results on Market-1501 [19] and CUHK03 [7] with different inputs in the identification network

4.3 Evaluation on different number of regions

The purpose of dividing each pedestrian image into several regions is to fully consider the structural information, so it is important to determine the number of regions. To choose the optimal number, we conduct experiments on the Market-1501 dataset with the above-mentioned experimental setting; the rank-1 accuracy and mAP for varying numbers of cropped regions are listed in Table 4. It can be observed that training the identification network in a local way plays an essential part in improving the rank-1 accuracy and mAP, and we achieve the best result when K=3.

Table 4 The rank-1 accuracy and mAP with varying number of cropped regions on the Market-1501 dataset

5 Conclusions

In this paper, we have proposed a novel feature learning method named LEDF for person re-identification in camera networks, which trains the identification network in a local way. The proposed LEDF has two characteristics. First, we learn local features by dividing each pedestrian image into several regions and taking pairs of regions as the input of the verification network to extract local summing maps. Second, all local summing maps of a pedestrian image are connected to form the holistic summing map; the holistic summing maps have the same labels as the original images and contain both local and global information. We learn the identification network by taking the holistic summing maps as input and then extract the final LEDF. The experimental results indicate that the proposed LEDF improves the performance of person re-identification in camera networks.

Abbreviations

CNN:

Convolutional neural network

LEDF:

Local embedding deep features

mAP:

Mean average precision

SGD:

Stochastic gradient descent

References

  1. T D’Orazio, G Cicirelli, in IEEE International Conference on Image Processing. People re-identification and tracking from multiple cameras: a review (IEEE, Orlando, 2012), pp. 1601–1604.

  2. F Zhao, X Sun, H Chen, R Bie, Outage performance of relay-assisted primary and secondary transmissions in cognitive relay networks. EURASIP J. Wirel. Commun. Netw. 2014(1), 60 (2014).

  3. MZ Hasan, H Al-Rizzo, M Günay, Lifetime maximization by partitioning approach in wireless sensor networks. EURASIP J. Wirel. Commun. Netw. 2017(1), 15 (2017).

  4. F Zhao, L Wei, H Chen, Optimal time allocation for wireless information and power transfer in wireless powered communication systems. IEEE Trans. Veh. Technol. 65(3), 1830–1835 (2016).

  5. WS Zheng, S Gong, T Xiang, in Person Re-Identification. Group association: assisting re-identification by visual context (Springer, London, 2014), pp. 183–201.

  6. L Zheng, Y Yang, AG Hauptmann, Person re-identification: past, present and future. arXiv preprint arXiv:1610.02984 (2016).

  7. W Li, R Zhao, T Xiao, X Wang, in IEEE Conference on Computer Vision and Pattern Recognition. DeepReID: deep filter pairing neural network for person re-identification (IEEE, Columbus, 2014), pp. 152–159.

  8. L Wu, C Shen, A Hengel, PersonNet: person re-identification with deep convolutional neural networks. arXiv preprint arXiv:1601.07255 (2016).

  9. RR Varior, M Haloi, G Wang, in European Conference on Computer Vision. Gated Siamese convolutional neural network architecture for human re-identification (Springer, Cham, 2016), pp. 791–808.

  10. Z Zheng, L Zheng, Y Yang, A discriminatively learned CNN embedding for person re-identification. ACM Trans. Multimed. Comput. Commun. Appl. 14(1), 13 (2017).

  11. T Xiao, S Li, B Wang, L Lin, X Wang, End-to-end deep learning for person search. arXiv preprint arXiv:1604.01850 (2016).

  12. J Bromley, I Guyon, Y LeCun, E Säckinger, R Shah, in Advances in Neural Information Processing Systems. Signature verification using a Siamese time delay neural network (NIPS, San Francisco, 1994), pp. 737–744.

  13. T Sikora, The MPEG-4 video standard verification model. IEEE Trans. Circ. Syst. Video Technol. 7(1), 19–31 (1997).

  14. KR Paap, SL Newsome, JE McDonald, RW Schvaneveldt, An activation–verification model for letter and word recognition: the word-superiority effect. Psychol. Rev. 89(5), 573 (1982).

  15. T Xiao, H Li, W Ouyang, X Wang, in IEEE Conference on Computer Vision and Pattern Recognition. Learning deep feature representations with domain guided dropout for person re-identification (IEEE, Las Vegas, 2016), pp. 1249–1258.

  16. R Girshick, in IEEE International Conference on Computer Vision. Fast R-CNN (IEEE, Santiago, 2015), pp. 1440–1448.

  17. D Heinke, GW Humphreys, Attention, spatial representation, and visual neglect: simulating emergent attention and spatial memory in the selective attention for identification model (SAIM). Psychol. Rev. 110(1), 29 (2003).

  18. A Abbott, D Collins, A theoretical and empirical analysis of a state of the art talent identification model. High Ability Stud. 13(2), 157–178 (2002).

  19. L Zheng, L Shen, L Tian, S Wang, J Wang, Q Tian, in IEEE International Conference on Computer Vision. Scalable person re-identification: a benchmark (IEEE, Santiago, 2015), pp. 1116–1124.

  20. F Zhao, W Wang, H Chen, Q Zhang, Interference alignment and game-theoretic power allocation in MIMO heterogeneous sensor networks communications. Signal Process. 126, 173–179 (2016).

  21. F Zhao, B Li, H Chen, X Lv, Joint beamforming and power allocation for cognitive MIMO systems under imperfect CSI based on game theory. Wirel. Pers. Commun. 73(3), 679–694 (2013).

  22. DS Cheng, M Cristani, M Stoppa, L Bazzani, V Murino, in British Machine Vision Conference. Custom pictorial structures for re-identification (2011).

  23. N Gheissari, TB Sebastian, R Hartley, in IEEE Conference on Computer Vision and Pattern Recognition. Person reidentification using spatiotemporal appearance (IEEE, New York, 2006), pp. 1528–1535.

  24. B Ma, Y Su, F Jurie, in European Conference on Computer Vision. Local descriptors encoded by Fisher vectors for person re-identification (Springer, Berlin, 2012), pp. 413–422.

  25. S Liao, Y Hu, X Zhu, SZ Li, in IEEE Conference on Computer Vision and Pattern Recognition. Person re-identification by local maximal occurrence representation and metric learning (IEEE, Boston, 2015), pp. 2197–2206.

  26. F Zhao, H Nie, H Chen, Group buying spectrum auction algorithm for fractional frequency reuse cognitive cellular systems. Ad Hoc Netw. 58, 239–246 (2017).

  27. Y Hu, S Liao, Z Lei, D Yi, S Li, in IEEE Conference on Computer Vision and Pattern Recognition. Exploring structural information and fusing multiple features for person re-identification (IEEE, Portland, 2013), pp. 794–799.

  28. O Hamdoun, F Moutarde, B Stanciulescu, B Steux, in IEEE International Conference on Distributed Smart Cameras. Person re-identification in multi-camera system by signature based on interest point descriptors collected on short video sequences (IEEE, Stanford, 2008), pp. 1–6.

  29. M Koestinger, M Hirzer, P Wohlhart, PM Roth, H Bischof, in IEEE Conference on Computer Vision and Pattern Recognition. Large scale metric learning from equivalence constraints (IEEE, Providence, 2012), pp. 2288–2295.

  30. J Ye, T Xiong, Q Li, R Janardan, J Bi, V Cherkassky, C Kambhamettu, in ACM International Conference on Information and Knowledge Management. Efficient model selection for regularized linear discriminant analysis (ACM, New York, 2006), pp. 532–539.

  31. M Hirzer, PM Roth, M Köstinger, H Bischof, in European Conference on Computer Vision. Relaxed pairwise learned metric for person re-identification (Springer, Berlin, 2012), pp. 780–793.

  32. WS Zheng, S Gong, T Xiang, in IEEE Conference on Computer Vision and Pattern Recognition. Person re-identification by probabilistic relative distance comparison (IEEE, Providence, 2011), pp. 649–656.

  33. KQ Weinberger, J Blitzer, LK Saul, in Advances in Neural Information Processing Systems. Distance metric learning for large margin nearest neighbor classification (NIPS, Vancouver, 2006), pp. 1473–1480.

  34. Z Li, S Chang, F Liang, TS Huang, L Cao, JR Smith, in IEEE Conference on Computer Vision and Pattern Recognition. Learning locally-adaptive decision functions for person verification (IEEE, Portland, 2013), pp. 3610–3617.

  35. E Ahmed, M Jones, TK Marks, in IEEE Conference on Computer Vision and Pattern Recognition. An improved deep learning architecture for person re-identification (IEEE, Boston, 2015), pp. 3908–3916.

  36. S Ding, L Lin, G Wang, H Chao, Deep feature learning with relative distance comparison for person re-identification. Pattern Recog. 48(10), 2993–3003 (2015).

  37. D Yi, Z Lei, S Liao, SZ Li, in International Conference on Pattern Recognition. Deep metric learning for person re-identification (IEEE, Stockholm, 2014), pp. 34–39.

  38. D Cheng, Y Gong, S Zhou, J Wang, N Zheng, in IEEE Conference on Computer Vision and Pattern Recognition. Person re-identification by multi-channel parts-based CNN with improved triplet loss function (IEEE, Las Vegas, 2016), pp. 1335–1344.

  39. L Zheng, Z Bie, Y Sun, J Wang, C Su, S Wang, Q Tian, in European Conference on Computer Vision. MARS: a video benchmark for large-scale person re-identification (Springer, Cham, 2016), pp. 868–884.

  40. S Wu, YC Chen, X Li, AC Wu, JJ You, WS Zheng, in IEEE Winter Conference on Applications of Computer Vision. An enhanced deep feature representation for person re-identification (IEEE, Lake Placid, 2016), pp. 1–8.

  41. K He, X Zhang, S Ren, J Sun, in IEEE Conference on Computer Vision and Pattern Recognition. Deep residual learning for image recognition (IEEE, Las Vegas, 2016), pp. 770–778.

  42. A Vedaldi, K Lenc, in ACM International Conference on Multimedia. MatConvNet: convolutional neural networks for MATLAB (ACM, New York, 2015), pp. 689–692.

  43. P Felzenszwalb, D McAllester, D Ramanan, in IEEE Conference on Computer Vision and Pattern Recognition. A discriminatively trained, multiscale, deformable part model (IEEE, Anchorage, 2008), pp. 1–8.

  44. C Su, S Zhang, J Xing, W Gao, Q Tian, in European Conference on Computer Vision. Deep attributes driven multi-camera person re-identification (Springer, Cham, 2016), pp. 475–491.

  45. E Ustinova, Y Ganin, V Lempitsky, in IEEE International Conference on Advanced Video and Signal Based Surveillance. Multi-region bilinear convolutional neural networks for person re-identification (IEEE, Lecce, 2017), pp. 1–6.

  46. L Wu, C Shen, VDA Hengel, Deep linear discriminant analysis on Fisher networks: a hybrid architecture for person re-identification. Pattern Recognit. 65, 238–250 (2017).

  47. H Liu, J Feng, M Qi, J Jiang, S Yan, End-to-end comparative attention networks for person re-identification. IEEE Trans. Image Process. 26(7), 3492–3506 (2017).

  48. D Chen, Z Yuan, B Chen, N Zheng, in IEEE Conference on Computer Vision and Pattern Recognition. Similarity learning with spatial constraints for person re-identification (IEEE, Las Vegas, 2016), pp. 1268–1277.

  49. L Zhang, T Xiang, S Gong, in IEEE Conference on Computer Vision and Pattern Recognition. Learning a discriminative null space for person re-identification (IEEE, Las Vegas, 2016), pp. 1239–1248.

  50. F Wang, W Zuo, L Lin, D Zhang, L Zhang, in IEEE Conference on Computer Vision and Pattern Recognition. Joint learning of single-image and cross-image representations for person re-identification (IEEE, Las Vegas, 2016), pp. 1288–1296.


Acknowledgements

The authors would like to thank the editor and anonymous reviewers for their helpful comments and suggestions in improving the quality of this paper.

Funding

This work was supported by National Natural Science Foundation of China under Grant No. 61711530240, Natural Science Foundation of Tianjin under Grant No. 17JCZDJC30600, the Fund of Tianjin Normal University under Grant No. 135202RC1703, the Open Projects Program of National Laboratory of Pattern Recognition under Grant No. 201700001, and the China Scholarship Council No. 201708120040.

Availability of data and materials

The databases are available online.

Author information

Contributions

ZZ contributed to the main idea and algorithm design of the study. MH conducted the experiments, and ZZ analyzed the results. ZZ and MH drafted and revised the manuscript. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Zhong Zhang.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional information

Authors’ information

Zhong Zhang received the Ph.D. degree from the Institute of Automation, Chinese Academy of Sciences. He is currently an Associate Professor at Tianjin Normal University.

Meiyan Huang is currently pursuing the M.S. degree at Tianjin Normal University.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Zhang, Z., Huang, M. Learning local embedding deep features for person re-identification in camera networks. J Wireless Com Network 2018, 85 (2018). https://doi.org/10.1186/s13638-018-1101-x
