Facemask Detection
Deep Learning using Python and Tensorflow Keras
Abstract
The COVID-19 pandemic led to the creation of many new regulations in an effort to prevent or slow the spread of the virus. One of the most widespread regulations was the requirement to wear a facemask. Although many establishments have lifted facemask requirements, some places, such as hospitals and senior living facilities, still require them. This project creates a machine learning model to detect faces and determine whether or not someone is wearing a mask.
Background
Facemasks are intended to prevent the spread of the coronavirus by filtering droplets exhaled by the wearer while also potentially filtering droplets inhaled from another person (Steinbrook, 2020). Andrejko et al. (2022) conducted a study to see how effective facemasks are at preventing the spread of COVID-19. They concluded that wearing a respirator, such as an N95 mask, reduces the likelihood of testing positive for COVID-19 by 83%. Wearing a surgical mask lowers the odds by 66%, and a cloth mask by 56% (Andrejko et al., 2022).
A facemask detection model deployed in real time can be used to monitor areas where the risk of infection is high, such as gyms. It can also be used at the entrance of hospitals to detect whether staff or patients entering the building are wearing a facemask. If they are not, it can sound a friendly alert to put one on.
There are many facemask detection systems, but they all use some form of machine learning, mainly deep learning algorithms. They require labeled data to train the model to detect facemasks. The general methodology for creating a facemask detection model is to feed input images to a deep learning model, extract regions of interest, and then perform classification (Nowrin et al., 2021).
Literature review
Since the start of the COVID-19 pandemic, many computer vision programs have been created to detect face masks. Some projects started from scratch while others were built upon pretrained models. The benefit of using a general pretrained model and building on top of it is that it is simpler to execute and requires less training data.
Saravanan et al. (2022) created two different models using a pretrained VGG-16 deep learning model as the base layer. To train the last layer, which is the fully connected layer, they used a facemask dataset with 7,200 images. For the other model, they trained the fully connected layer on 1,484 images but this time used image augmentation to generate more data from the same images. The model trained on the dataset with 7,200 images had a testing accuracy of 91.25%, while the model trained on the dataset with 1,484 images plus augmented data had a testing accuracy of 96.50% (Saravanan et al., 2022).
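As a rough illustration of that kind of augmentation, the following Keras sketch generates additional training samples from a folder of labeled face images; the transform values and folder layout are assumptions for illustration, not the settings used by Saravanan et al. (2022).

```python
# Illustrative sketch (assumed parameters): augmenting a small facemask dataset
# with Keras' ImageDataGenerator to produce extra training samples.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=20,       # random rotations up to 20 degrees
    zoom_range=0.15,         # random zoom in or out
    width_shift_range=0.2,   # random horizontal shifts
    height_shift_range=0.2,  # random vertical shifts
    horizontal_flip=True,
    fill_mode="nearest",
)

# Hypothetical folder layout: data/with_mask and data/without_mask.
train_batches = augmenter.flow_from_directory(
    "data",
    target_size=(224, 224),
    batch_size=32,
    class_mode="categorical",
)
```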
Militante and Dionisio (2020) took a similar approach and used the VGG-16 architecture as the foundation. However, they trained their model on a much larger dataset of 25,000 images, also applying image augmentation. Training was done on a computer with a GPU, and they achieved a validation accuracy of 96%. The model was saved and loaded on a Raspberry Pi to perform detection on a live video stream (Militante and Dionisio, 2020).
Sandesara, D. Joshi and S. Joshi (2020) took a different approach and trained their model from scratch. They utilized the Stacked Conv-2D architecture, a series of convolutional layers with different filters, and achieved a testing accuracy of 95%. They also implemented a notification system that sends an email if someone is detected without a facemask (Sandesara, D. Joshi and S. Joshi, 2020).
Method
The method used for creating a facemask detector involves first identifying faces in an image and then extracting them. Smys, Bestak, Kotuliak and Palanisamy (2021) built a face detection model using the Haar cascade and Caffe model methods. The Caffe model was more accurate and was able to track most head movements (Smys et al., 2021).
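A minimal sketch of this face detection step using OpenCV's DNN module and a pretrained Caffe model is shown below; the file names and the 0.5 confidence threshold are assumptions based on the commonly distributed SSD face detector, not details taken from the cited work.

```python
# Assumed sketch: detecting and extracting faces from a single image with a
# pretrained Caffe SSD face detector loaded through OpenCV's DNN module.
import cv2
import numpy as np

# File names are assumptions for the commonly available face detector.
net = cv2.dnn.readNetFromCaffe("deploy.prototxt",
                               "res10_300x300_ssd_iter_140000.caffemodel")

image = cv2.imread("face.jpg")
h, w = image.shape[:2]

# The detector expects a 300x300 blob with the model's mean values subtracted.
blob = cv2.dnn.blobFromImage(cv2.resize(image, (300, 300)), 1.0,
                             (300, 300), (104.0, 177.0, 123.0))
net.setInput(blob)
detections = net.forward()

faces = []
for i in range(detections.shape[2]):
    confidence = detections[0, 0, i, 2]
    if confidence > 0.5:  # assumed confidence threshold
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        x1, y1, x2, y2 = box.astype("int")
        faces.append(image[y1:y2, x1:x2])  # cropped face region
```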
The next step is to build the model. MobileNetV2 is a deep neural network that can be used for classification, and Tensorflow Keras has a package to load the model already pretrained on the ImageNet dataset (Nagrath et al., 2021). Using MobileNetV2 as the base layer, only the fully connected (FC) layers need to be trained. To create the FC layers, the Tensorflow Keras library can be used to add different layer types such as convolution, pooling and dense (Sharma et al., 2019). The FC layers are trained on images of faces with and without masks. The strength of using transfer learning to build the model is that it requires less training data and little computational power. Its weakness is that the pretrained layers can negatively affect the outcome, which is known as negative transfer (Ge et al., 2014).
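A sketch of this transfer learning setup in Tensorflow Keras might look like the following; the head layer sizes, dropout rate, optimizer and input size are assumptions for illustration rather than the exact configuration used here.

```python
# Assumed sketch: MobileNetV2 pretrained on ImageNet as a frozen base, with a
# small fully connected head trained for the two-class mask / no-mask problem.
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import AveragePooling2D, Dense, Dropout, Flatten, Input
from tensorflow.keras.models import Model

base = MobileNetV2(weights="imagenet", include_top=False,
                   input_tensor=Input(shape=(224, 224, 3)))
base.trainable = False  # freeze the pretrained layers; only the FC head trains

x = base.output
x = AveragePooling2D(pool_size=(7, 7))(x)
x = Flatten()(x)
x = Dense(128, activation="relu")(x)   # assumed head size
x = Dropout(0.5)(x)                    # assumed dropout rate
outputs = Dense(2, activation="softmax")(x)  # with_mask / without_mask

model = Model(inputs=base.input, outputs=outputs)
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```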
The final step is to capture video and process it to find faces and determine whether they are masked. To capture and process the video, OpenCV can be used to read in a video stream frame by frame (Sharma et al., 2019). Each frame can then be fed into the face detection model, and the detected faces passed through the mask detection model.
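The frame-by-frame loop could look roughly like the sketch below; detect_faces() is a hypothetical helper wrapping the Caffe face detector shown earlier, model is the Keras mask classifier, and the class order is assumed.

```python
# Assumed sketch: classifying faces in a live webcam stream frame by frame.
# detect_faces() is a hypothetical wrapper around the Caffe face detector above.
import cv2
import numpy as np
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break

    for face in detect_faces(frame):  # hypothetical face-extraction helper
        face = cv2.cvtColor(face, cv2.COLOR_BGR2RGB)        # OpenCV gives BGR
        face = cv2.resize(face, (224, 224)).astype("float32")
        face = preprocess_input(face)                        # MobileNetV2 scaling
        probs = model.predict(np.expand_dims(face, axis=0))[0]
        label = "Mask" if probs[0] > probs[1] else "No mask"  # assumed class order
        cv2.putText(frame, label, (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)

    cv2.imshow("Facemask detection", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break

cap.release()
cv2.destroyAllWindows()
```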
Findings
The model was trained on 200 images of myself, 100 with a mask on and 100 without. Of those images, the Caffe model was able to detect faces in 196. Those 196 images were split 80/20 into training and test sets. The trained model achieved an overall accuracy of 100%. To further validate the model, an additional 70 images were captured, and the model achieved 100% accuracy on that dataset as well.
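A rough sketch of that split and evaluation is shown below; the array names, batch size and epoch count are assumptions, and model is the classifier built in the Method section.

```python
# Assumed sketch: 80/20 split of the 196 detected-face images and evaluation of
# the trained classifier (variable names and hyperparameters are illustrative).
from sklearn.model_selection import train_test_split

# faces: array of preprocessed face crops; labels: one-hot mask / no-mask labels
X_train, X_test, y_train, y_test = train_test_split(
    faces, labels, test_size=0.20, random_state=42)

model.fit(X_train, y_train,
          validation_data=(X_test, y_test),
          batch_size=32, epochs=20)  # assumed batch size and epoch count

loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test accuracy: {accuracy:.2%}")
```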
The model likely achieved such high accuracy because the images in the dataset were too similar to one another. To create a more diverse dataset, variations in people, facemasks, lighting and background should be included.
Conclusion
The model performed very well on video of myself. It was mediocre when shown people other than myself, wearing different facemasks, or with other variations it was never trained on. It was able to correctly classify most people wearing black or blue masks and some people without masks. The model struggled with people wearing white masks, people with a lot of facial hair, and crowded images where faces are partially hidden.
To improve the model, different deep learning models should be tested to see which perform better. Smys, Bestak, Kotuliak and Palanisamy (2021) tested a multi-task cascaded convolutional neural network (MTCNN), and it performed better than the Caffe model at face detection. Saravanan et al. (2022) achieved success in facemask detection using a VGG-16 model, so that could be tested and compared to this model, which used MobileNetV2. However, the biggest improvement would likely come from using a more diverse dataset during training.
References
Andrejko, K.L., Pry, J.M., Myers, J.F., et al. (2022). Effectiveness of Face Mask or Respirator Use in Indoor Public Settings for Prevention of SARS-CoV-2 Infection — California, February–December 2021. MMWR Morbidity and Mortality Weekly Report, 71, 212–216.
Ge, L., Gao, J., Ngo, H., Li, K. & Zhang, A. (2014). On handling negative transfer and imbalanced distributions in multiple source transfer learning. Statistical Analysis and Data Mining, 7, 254–271.
Militante, S.V. & Dionisio, N.V. (2020). Real-Time Facemask Recognition with Alarm System using Deep Learning. 2020 11th IEEE Control and System Graduate Research Colloquium (ICSGRC), 106-110.
Nagrath, P., Jain, R., Madan, A., Arora, R., Kataria, P. & Hemanth, J. (2021). SSDMNV2: A real time DNN-based face mask detection system using single shot multibox detector and MobileNetV2. Sustainable Cities and Society, 66, 102692.
Nowrin, A., Afroz, S., Rahman, M. S., Mahmud, I. & Cho, Y. (2021). Comprehensive Review on Facemask Detection Techniques in the Context of Covid-19. IEEE Access, 9, 106839–106864.
Sandesara, A., Joshi, D. & Joshi, S. (2020). Facial Mask Detection Using Stacked CNN Model. International Journal of Scientific Research in Computer Science, Engineering and Information Technology. 264-270.
Saravanan, T. M., Karthiha, K., Kavinkumar, R., Gokul, S. & Mishra, J. P. (2022). A novel machine learning scheme for face mask detection using pretrained convolutional neural network. Materials Today: Proceedings, 58, 150–156.
Sharma, A., Shrimali, V. R. & Beyeler, M. (2019). Machine Learning for OpenCV 4: Intelligent algorithms for building image processing apps using OpenCV 4, Python, and scikit-learn. Birmingham: Packt Publishing.
Smys, S., Bestak, R., Kotuliak, I. & Palanisamy, R. (eds.) (2021). Computer Networks and Inventive Communication Technologies: Proceedings of Fourth ICCNCT. Singapore: Springer.
Steinbrook, R. (2020). Filtration Efficiency of Face Masks Used by the Public During the COVID-19 Pandemic. JAMA Internal Medicine, 181(4), 470.