Partial face coverings such as sunglasses and face masks unintentionally obscure facial expressions, causing a loss of accuracy when humans and computer systems attempt to categorize emotion. With the rise of soft computing techniques that interact with humans, it is important to know not just their accuracy, but also the confusion errors being made: do humans make fewer random or damaging errors than soft computing systems? We analyzed the impact of sunglasses and different face masks on the ability to categorize emotional facial expressions in humans and computer systems. Computer systems, represented by the VGG19, ResNet50, and InceptionV3 deep learning algorithms, and humans assessed images of people with varying emotional facial expressions under four covering conditions: no covering, a mask covering the lower face, a partial mask with a transparent mouth window, and sunglasses. The first contribution of this work is that computer systems were found to be better classifiers (98.48%) than humans (82.72%) for faces without covering (a >15% difference). This difference is due to humans' significantly lower accuracy in categorizing anger, disgust, and fear expressions (all p < .001). However, the most novel aspect of the work is identifying how soft computing systems make different mistakes to humans on the same data. Humans mainly confuse unclear expressions with the neutral emotion, which minimizes the affective impact of their errors. Conversely, soft computing techniques often confuse unclear expressions with other emotion categories, which could lead to opposing decisions being made, e.g. a robot categorizing a fearful user as happy. Importantly, this misclassification behavior can be adjusted by varying the balance of categories in the training set.
Published Version: https://doi.org/10.1016/j.asoc.2022.109701
Reproducible Code Capsule: https://codeocean.com/capsule/2412241/tree/v1
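To make the error-pattern comparison concrete, the following is a minimal sketch (with an assumed label set and data layout, not the paper's released capsule code) of how the "neutral bias" of misclassifications can be quantified from true and predicted labels, for human responses and model predictions alike:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Emotion label set assumed for illustration; the order is arbitrary.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "neutral", "sadness", "surprise"]
NEUTRAL = EMOTIONS.index("neutral")

def neutral_error_share(y_true, y_pred):
    """Fraction of all misclassifications that land in the 'neutral' category.

    y_true, y_pred: sequences of integer labels indexing into EMOTIONS.
    """
    cm = confusion_matrix(y_true, y_pred, labels=range(len(EMOTIONS)))
    errors = cm.sum() - np.trace(cm)                          # all off-diagonal counts
    to_neutral = cm[:, NEUTRAL].sum() - cm[NEUTRAL, NEUTRAL]  # errors predicted as neutral
    return to_neutral / errors if errors else 0.0
```

Computed separately for humans and for each network, a higher score indicates the safer, human-like tendency to default to neutral when an expression is unclear, rather than committing to a wrong emotion.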
Emotions are considered to convey much meaning in communication. Hence, artificial methods for emotion categorization are being developed to meet the increasing demand to introduce intelligent systems, such as robots, into shared workspaces. Deep learning algorithms have demonstrated competency in categorizing emotions, although mainly on posed datasets in which the main features of the face are visible. However, the use of sunglasses and face masks is common in our daily lives, especially with the outbreak of communicable diseases such as the recent coronavirus. Anecdotally, partial coverings of the face reduce the effectiveness of human communication, so would they have a similarly hampering effect on computer vision, and if so, would different emotion categories be affected equally? Here, we analyze the performance of emotion classification systems when faces are partially covered with simulated sunglasses and face masks. Unlike what neuroscientific findings suggest about how humans recognize emotions, deep neural networks treat all pixels in an image as equally important. Hence, we propose a method that considers different constituent parts of the face (e.g. mouth, eyes, and jaw) separately, giving more attention to the relevant (uncovered) regions. The method is compared with three classes of methods: standard deep learning, partial-covering-based, and attention-based. We found that face coverings worsen emotion categorization by up to 74% for state-of-the-art methods, and that emotion categories are affected differently by different coverings, e.g. clear mouth coverings have little effect on categorizing happiness, but sadness is affected badly. The proposed method (60.43% on average) significantly improves performance over the standard deep learning (<46% on average), partial-covering-based (<47% on average), and attention-based (<51% on average) methods on the CK+, KDEF, and RAF-DB datasets when faces are partially covered with sunglasses or different face masks.
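As an illustration of the part-based idea, here is a minimal Keras sketch, an interpretation under stated assumptions rather than the authors' published architecture: eye, mouth, and jaw crops (region extraction not shown) are embedded by a shared encoder, and the part features are fused with learned attention weights so that uncovered parts can dominate the prediction. All layer sizes and the seven-category label set are illustrative assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

N_CLASSES = 7             # emotion categories (assumed)
PART_SHAPE = (96, 96, 3)  # each region crop resized to a common shape (assumed)

def part_encoder():
    """Small CNN that embeds one face region into a 64-dim feature vector."""
    inp = layers.Input(PART_SHAPE)
    x = layers.Conv2D(32, 3, activation="relu")(inp)
    x = layers.MaxPool2D()(x)
    x = layers.Conv2D(64, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    return Model(inp, x)

encoder = part_encoder()  # one set of weights, shared across all parts
part_inputs = [layers.Input(PART_SHAPE, name=n) for n in ("eyes", "mouth", "jaw")]
feats = [encoder(p) for p in part_inputs]                      # three (batch, 64) vectors
stacked = layers.Lambda(lambda f: tf.stack(f, axis=1))(feats)  # (batch, 3, 64)
scores = layers.Dense(1)(stacked)                              # relevance score per part
weights = layers.Softmax(axis=1)(scores)                       # attention over the 3 parts
fused = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([stacked, weights])
output = layers.Dense(N_CLASSES, activation="softmax")(fused)
model = Model(part_inputs, output)
```

In this sketch the attention weights are learned end-to-end from the part features themselves; the paper's emphasis on uncovered regions could equally be injected explicitly, e.g. by down-weighting the scores of parts detected as occluded before the softmax.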