LSCAformer: Long and short-term cross-attention-aware transformer for depression recognition from video sequences
Xi'an University of Posts and Telecommunications, Xi'an, China.
Xi'an University of Technology, Xi'an, China.
Halmstad University, School of Information Technology. ORCID iD: 0000-0002-2851-4260
The First Affiliated Hospital of Xi'an Jiaotong University, Xi'an, China.
2024 (English). In: Biomedical Signal Processing and Control, ISSN 1746-8094, E-ISSN 1746-8108, Vol. 98, p. 1-10, article id 106767. Article in journal (Refereed). Published.
Abstract [en]

Depression is projected to become the most prevalent mental disorder, with a negative impact on individuals and society globally, by 2030. Artificial intelligence (AI) algorithms have the potential to significantly advance depression treatment. Existing deep learning architectures for the automatic diagnosis of a patient's depression severity face two primary challenges: (1) how to effectively learn both long-term and short-term patterns of depression, and (2) how to efficiently merge long-term and short-term depressive features to achieve extended predictions from facial videos. To address these challenges, a novel long short-term cross-attention-aware Transformer (LSCAformer) is proposed for video-based depression recognition. Within LSCAformer, two architectures are introduced: a long short-term feature extraction (LSTFE) module and a cross-attention-aware Transformer. First, LSTFE employs two separate branches to capture depressive behaviors across long-term and short-term intervals. Subsequently, the cross-attention-aware Transformer identifies complementary patterns within both long-term and short-term features, employing temporal-directed attention (TDA) to discern complementary temporal patterns across the long- and short-duration branches. On AVEC2013/AVEC2014, LSCAformer demonstrated superior performance, with a root mean square error (RMSE), mean absolute error (MAE), and concordance correlation coefficient (CCC) of 7.69/5.89/0.868 and 7.55/5.91/0.845, respectively. Additionally, cross-dataset experiments were performed to validate the generalization of LSCAformer, achieving an RMSE of 7.21, an MAE of 5.63, and a CCC of 0.874 (AVEC2013 for training, and the Northwind task of AVEC2014 for testing). Moreover, the proposed method can model the complementary behavioral patterns between long-term and short-term sequences for depression recognition. Code will be available at: https://github.com/helang818/LSCAformer/. © 2024
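The core idea in the abstract, letting features from a short-term branch attend to features from a long-term branch and vice versa, can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification, not the authors' implementation: it omits the learned query/key/value projections, multi-head structure, and the TDA mechanism of the actual LSCAformer, and the token counts and dimensions below are arbitrary.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, keys_values):
    # Scaled dot-product attention where one branch queries the other.
    # Real Transformers apply learned projections to form Q, K, V;
    # here raw features are used directly for brevity.
    d = queries.shape[-1]
    scores = queries @ keys_values.T / np.sqrt(d)   # (T_q, T_kv)
    weights = softmax(scores, axis=-1)              # rows sum to 1
    return weights @ keys_values                    # (T_q, d)

rng = np.random.default_rng(0)
d = 16
short_feats = rng.standard_normal((8, d))   # e.g. 8 short-window tokens
long_feats = rng.standard_normal((32, d))   # e.g. 32 long-window tokens

# Each branch is enriched with complementary context from the other.
short_enriched = cross_attention(short_feats, long_feats)
long_enriched = cross_attention(long_feats, short_feats)

# A simple fusion: pool each enriched stream and concatenate.
fused = np.concatenate([short_enriched.mean(axis=0),
                        long_enriched.mean(axis=0)])
```

The bidirectional querying is what makes the two branches complementary rather than redundant: each temporal scale weights the other's tokens by relevance before fusion.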

Place, publisher, year, edition, pages
Amsterdam: Elsevier, 2024. Vol. 98, p. 1-10, article id 106767
Keywords [en]
Affective computing, Depression, Facial regions, Long and short-term, LSCAformer
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:hh:diva-54561
DOI: 10.1016/j.bspc.2024.106767
Scopus ID: 2-s2.0-85201464281
OAI: oai:DiVA.org:hh-54561
DiVA id: diva2:1895477
Note

Funding: National Natural Science Foundation of China (grant 62376215), the Open Fund of National Engineering Laboratory for Big Data System Computing Technology (Grant No. SZU-BDSC-OF2024-16), the Humanities and Social Sciences Program of the Ministry of Education (22YJCZH048), the Key Research and Development Project of Shaanxi Province (2024GX-YBXM-137), the Open Fund of Key Laboratory of Modern Teaching Technology, Ministry of Education, the National Natural Science Foundation of China (grant 62276210), the Shaanxi Provincial Social Science Foundation (grant 2021K015), the Weinan Key Research and Development Plan Project (grant WSYKJ2022-5).

Available from: 2024-09-05. Created: 2024-09-05. Last updated: 2025-10-01. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Tiwari, Prayag
