LSCAformer: Long and short-term cross-attention-aware transformer for depression recognition from video sequences
2024 (English). In: Biomedical Signal Processing and Control, ISSN 1746-8094, E-ISSN 1746-8108, Vol. 98, p. 1-10, article id 106767. Article in journal (Refereed). Published.
Abstract [en]
Depression is projected to become the most prevalent mental disorder, with negative impacts on individuals and society globally, by 2030. Artificial intelligence (AI) algorithms have the potential to significantly advance depression treatment. Existing deep learning-based architectures for the automatic diagnosis of a patient's depression severity face two primary challenges: (1) how to effectively learn both long-term and short-term patterns of depression, and (2) how to efficiently merge long-term and short-term depressive features to achieve extended predictions from facial videos. To mitigate these challenges, a novel long short-term cross-attention-aware Transformer (LSCAformer) is proposed, engineered for video-based depression recognition. Within LSCAformer, two architectures are introduced: a long short-term feature extraction (LSTFE) module and a cross-attention-aware Transformer. Initially, LSTFE employs two separate branches to capture depression behaviors across long-term and short-term intervals. Subsequently, the cross-attention-aware Transformer is applied to identify complementary patterns within both long-term and short-term features, employing temporal-directed attention (TDA) to discern complementary temporal patterns across the long- and short-duration branches. On AVEC2013/AVEC2014, LSCAformer demonstrates superior performance, with a root mean square error (RMSE), mean absolute error (MAE), and concordance correlation coefficient (CCC) of 7.69/5.89/0.868 and 7.55/5.91/0.845, respectively. Additionally, cross-dataset experiments are performed to validate the generalization of LSCAformer, yielding an RMSE of 7.21, an MAE of 5.63, and a CCC of 0.874 (AVEC2013 for training, and the Northwind task of AVEC2014 for testing). Moreover, the proposed method can model the complementary behavioral patterns between long-term and short-term sequences for depression recognition. Code will be available at: https://github.com/helang818/LSCAformer/. © 2024
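The abstract describes cross-attention between a long-term and a short-term branch. The paper's actual TDA module is not reproduced here; the following is only a minimal numpy sketch of the generic idea, assuming identity query/key/value projections and single-head scaled dot-product attention, with hypothetical names (`cross_attention`, `long_feats`, `short_feats`).

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax along the given axis
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(long_feats, short_feats):
    """One cross-attention direction: queries come from the long-term
    branch and attend over keys/values from the short-term branch."""
    d = long_feats.shape[-1]
    Q, K, V = long_feats, short_feats, short_feats  # identity projections for brevity
    scores = Q @ K.T / np.sqrt(d)   # (T_long, T_short) attention logits
    return softmax(scores) @ V      # (T_long, d) fused features

# toy sequences: 8 long-term frames, 4 short-term frames, 16-dim features
rng = np.random.default_rng(0)
long_feats = rng.standard_normal((8, 16))
short_feats = rng.standard_normal((4, 16))
fused = cross_attention(long_feats, short_feats)
print(fused.shape)  # (8, 16)
```

Swapping the argument order would give the complementary direction (short-term queries attending to long-term features); a full model would learn the Q/K/V projections rather than using identities.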
Place, publisher, year, edition, pages
Amsterdam: Elsevier, 2024. Vol. 98, p. 1-10, article id 106767
Keywords [en]
Affective computing, Depression, Facial regions, Long and short-term, LSCAformer
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:hh:diva-54561
DOI: 10.1016/j.bspc.2024.106767
Scopus ID: 2-s2.0-85201464281
OAI: oai:DiVA.org:hh-54561
DiVA, id: diva2:1895477
Note
Funding: National Natural Science Foundation of China (grant 62376215), the Open Fund of National Engineering Laboratory for Big Data System Computing Technology (Grant No. SZU-BDSC-OF2024-16), the Humanities and Social Sciences Program of the Ministry of Education (22YJCZH048), the Key Research and Development Project of Shaanxi Province (2024GX-YBXM-137), the Open Fund of the Key Laboratory of Modern Teaching Technology, Ministry of Education, the National Natural Science Foundation of China (grant 62276210), the Shaanxi Provincial Social Science Foundation (grant 2021K015), and the Weinan Key Research and Development Plan Project (grant WSYKJ2022-5).
Available from: 2024-09-05. Created: 2024-09-05. Last updated: 2025-10-01. Bibliographically approved.