2026 (English). In: Information Fusion, ISSN 1566-2535, E-ISSN 1872-6305, Vol. 126, no. B, p. 1-11, article id 103632. Article in journal (Refereed), in press.
Abstract [en]
Depression profoundly impacts multiple dimensions of an individual's life, including personal and social functioning, academic achievement, occupational productivity, and overall quality of life. With recent advancements in affective computing, deep learning technologies have been increasingly adopted to identify patterns indicative of depression. However, due to concerns over participant privacy, data in this domain remain scarce, posing significant challenges for the development of robust discriminative models for depression detection. To address this limitation, we build a Large-scale Multimodal Vlog Dataset (LMVD) for depression recognition in real-world settings. The LMVD dataset comprises 1,823 video samples, totaling approximately 214 h of content, collected from 1,475 participants across four major multimedia platforms: Sina Weibo, Bilibili, TikTok, and YouTube. In addition, we introduce a novel architecture, MDDformer, specifically designed to capture non-verbal behavioral cues associated with depressive states. Extensive experimental evaluations conducted on LMVD demonstrate the superior performance of MDDformer in depression detection tasks. We anticipate that LMVD will become a valuable benchmark resource for the research community, facilitating progress in multimodal, real-world depression recognition. The dataset and source code will be made publicly available at: https://github.com/helang818/LMVD. © 2025 Elsevier B.V., All rights reserved.
Place, publisher, year, edition, pages
Amsterdam: Elsevier, 2026
Keywords
Deep Learning, Depression Detection, Multimodal, Transformer, Vlog, Behavioral Research, Data Privacy, Human Computer Interaction, Interactive Computer Systems, Large Datasets, Learning Systems, Multimedia Systems, Academic Achievements, Large-scales, Multiple Dimensions, Overall Quality, Quality Of Life
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-57350 (URN)
10.1016/j.inffus.2025.103632 (DOI)
2-s2.0-105014021546 (Scopus ID)
2025-09-18; 2025-09-18; 2025-10-01. Bibliographically approved