A deep convolution neural network for facial expression recognition
Keywords:
convolution neural network, facial expression, global average pooling, hinge loss function, ReLU, SVM
Abstract
Facial expressions play an important role in non-verbal communication, helping to identify people and recognize emotions. This article proposes a deep convolution neural network (CNN) for recognizing seven basic emotional states (anger, disgust, fear, happiness, neutral, sadness, and surprise) and presents results comparing the accuracy of the proposed and existing methods. A support vector machine (SVM) was used in the convolutional layer to classify the input images. A rectified linear unit (ReLU) was then introduced to make training easier and to improve performance. To reduce the tendency to overfit, global average pooling was employed, and the rescaled hinge loss function was introduced to decrease noise in the classification system. Finally, to reduce the memory and time requirements, Nesterov-accelerated adaptive moment estimation was introduced. When implemented with different data sets, the proposed method demonstrated increased recognition accuracy compared to existing methods. The recognition accuracy of the proposed method improved significantly to 78.76% and 82.58%, respectively, for the Radboud Faces Database (RaFD) and the Karolinska Directed Emotional Faces (KDEF) database and BU-3DFE data sets, while the overall average accuracy was 62.3% for images randomly downloaded from the Internet.
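To make three of the building blocks named in the abstract more concrete, the following is a minimal sketch in plain Python of ReLU, global average pooling, and one common formulation of the rescaled hinge loss. It is an illustration only and not the article's implementation: the function names, the toy feature maps, and the value `eta=0.5` are assumptions introduced here, and the loss form (a hinge loss passed through a bounded exponential rescaling) is assumed rather than taken from the abstract.

```python
import math

def relu(x):
    """Rectified linear unit: keep positive values, map the rest to zero."""
    return max(0.0, x)

def global_average_pool(feature_maps):
    """Collapse each 2-D feature map (a list of rows) to its mean value."""
    return [
        sum(sum(row) for row in fmap) / sum(len(row) for row in fmap)
        for fmap in feature_maps
    ]

def hinge_loss(y, score):
    """Standard hinge loss for a label y in {-1, +1} and a real-valued score."""
    return max(0.0, 1.0 - y * score)

def rescaled_hinge_loss(y, score, eta=0.5):
    """Bounded hinge variant: beta * (1 - exp(-eta * hinge)).

    beta = 1 / (1 - exp(-eta)) normalizes the loss so that a hinge value
    of 1 maps to 1. eta=0.5 is an illustrative assumption.
    """
    beta = 1.0 / (1.0 - math.exp(-eta))
    return beta * (1.0 - math.exp(-eta * hinge_loss(y, score)))

# Two toy 2x2 feature maps (hypothetical values).
maps = [[[1.0, -2.0], [3.0, 4.0]], [[0.0, 0.0], [-1.0, 5.0]]]
activated = [[[relu(v) for v in row] for row in fmap] for fmap in maps]
pooled = global_average_pool(activated)  # one scalar per feature map

# A badly misclassified (possibly noisy) sample: the plain hinge loss grows
# linearly with the margin violation, the rescaled version stays below beta.
plain = hinge_loss(1, -100.0)
rescaled = rescaled_hinge_loss(1, -100.0)
```

Because the rescaled loss saturates, a single mislabeled or outlying sample contributes at most `beta` to the objective, which is the sense in which such a loss limits the influence of noise. Global average pooling, in turn, replaces a parameter-heavy fully connected layer with one mean per feature map, which leaves fewer weights to overfit.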
License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.