Publications (10 of 118)
He, L., Chen, K., Zhao, J., Wang, Y., Pei, E., Chen, H., . . . Tiwari, P. (2026). LMVD: A large-scale multimodal vlog dataset for depression detection in the wild. Information Fusion, 126(B), 1-11, Article ID 103632.
LMVD: A large-scale multimodal vlog dataset for depression detection in the wild
2026 (English) In: Information Fusion, ISSN 1566-2535, E-ISSN 1872-6305, Vol. 126, no B, p. 1-11, article id 103632. Article in journal (Refereed), In press
Abstract [en]

Depression profoundly impacts multiple dimensions of an individual's life, including personal and social functioning, academic achievement, occupational productivity, and overall quality of life. With recent advancements in affective computing, deep learning technologies have been increasingly adopted to identify patterns indicative of depression. However, due to concerns over participant privacy, data in this domain remain scarce, posing significant challenges for the development of robust discriminative models for depression detection. To address this limitation, we build a Large-scale Multimodal Vlog Dataset (LMVD) for depression recognition in real-world settings. The LMVD dataset comprises 1,823 video samples, totaling approximately 214 h of content, collected from 1,475 participants across four major multimedia platforms: Sina Weibo, Bilibili, TikTok, and YouTube. In addition, we introduce a novel architecture, MDDformer, specifically designed to capture non-verbal behavioral cues associated with depressive states. Extensive experimental evaluations conducted on LMVD demonstrate the superior performance of MDDformer in depression detection tasks. We anticipate that LMVD will become a valuable benchmark resource for the research community, facilitating progress in multimodal, real-world depression recognition. The dataset and source code will be made publicly available at: https://github.com/helang818/LMVD. © 2025 Elsevier B.V. All rights reserved.
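The record does not reproduce MDDformer itself, so the following is a minimal sketch of the general pattern the abstract describes: per-modality non-verbal feature sequences are projected into a shared space, and a transformer encoder attends over them jointly. The two-modality setup, all dimensions, and the mean-pooling head are illustrative assumptions, not the paper's design.

```python
# Hedged sketch of multimodal transformer fusion for depression detection.
# NOT the paper's MDDformer: dimensions, modalities, and pooling are assumed.
import torch
import torch.nn as nn

class MultimodalFusionSketch(nn.Module):
    def __init__(self, visual_dim=136, audio_dim=128, d_model=256, n_heads=4, n_layers=2):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, d_model)  # e.g. facial features
        self.audio_proj = nn.Linear(audio_dim, d_model)    # e.g. acoustic features
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 2)                  # depressed / not depressed

    def forward(self, visual, audio):
        # visual: (B, Tv, visual_dim), audio: (B, Ta, audio_dim)
        tokens = torch.cat([self.visual_proj(visual), self.audio_proj(audio)], dim=1)
        encoded = self.encoder(tokens)          # joint attention across modalities
        return self.head(encoded.mean(dim=1))   # temporal pooling, then classify

model = MultimodalFusionSketch()
print(model(torch.randn(2, 50, 136), torch.randn(2, 80, 128)).shape)  # (2, 2)
```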

Place, publisher, year, edition, pages
Amsterdam: Elsevier, 2026
Keywords
Deep Learning, Depression Detection, Multimodal, Transformer, Vlog, Behavioral Research, Data Privacy, Human Computer Interaction, Interactive Computer Systems, Large Datasets, Learning Systems, Multimedia Systems, Academic Achievements, Large-scales, Multi-modal, Multiple Dimensions, Overall Quality, Quality Of Life
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-57350 (URN), 10.1016/j.inffus.2025.103632 (DOI), 2-s2.0-105014021546 (Scopus ID)
Available from: 2025-09-18 Created: 2025-09-18 Last updated: 2025-10-01. Bibliographically approved
Hashemi-Nazari, Y., Tajaddini, A., Saberi-Movahed, F., Alonso-Fernandez, F. & Tiwari, P. (2026). Robust oblique projection and weighted NMF for hyperspectral unmixing. Pattern Recognition, 170, 1-16, Article ID 112029.
Robust oblique projection and weighted NMF for hyperspectral unmixing
2026 (English) In: Pattern Recognition, ISSN 0031-3203, E-ISSN 1873-5142, Vol. 170, p. 1-16, article id 112029. Article in journal (Refereed), In press
Abstract [en]

Hyperspectral unmixing (HU) is a crucial method for interpreting remotely sensed hyperspectral images (HSIs), with the aim of splitting the image into pure spectral components (endmembers) and their abundance fractions in every pixel of the scene. However, the effectiveness of this procedure is hindered by the presence of noise and anomalies. These kinds of disruptions mainly arise from real-world factors such as atmospheric effects and endmember variability. To address this challenge, a novel approach called Graph-Regularized Oblique Projection Weighted NMF (GOP-WNMF) is introduced, which is grounded in a more precise separation of the signal and noise subspaces, aiming to enhance the accuracy and robustness of the analysis. GOP-WNMF achieves this by constructing an oblique projector that projects each pixel onto the signal subspace, i.e., the space formed by the signatures of the endmembers, parallel to the noise subspace. This approach effectively suppresses noise while preserving crucial spectral information. Furthermore, our new oblique NMF framework includes a unique residual-based weighting approach to detect and remove anomalies in pixels and spectral bands simultaneously. In addition, another weighting matrix is proposed by establishing a bipartite graph connecting endmembers and pixels to promote smoothness and sparsity in the resulting abundance maps. GOP-WNMF also enhances abundance map estimation accuracy by mitigating the negative effects of pixel outliers through the Laplacian eigenmaps technique, which maintains the manifold structure of the data. The effectiveness of GOP-WNMF is evaluated through comprehensive testing on synthetic and real HSIs, and its superiority is demonstrated over multiple state-of-the-art approaches. The source code is also available at https://github.com/yasinhashemi/GOP-WNMF. © 2025 The Authors
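As background for the weighting idea, here is a toy weighted NMF with multiplicative updates, where an elementwise weight matrix down-weights suspected anomalous entries. This is a simplified sketch only: the oblique projector, the bipartite-graph weighting, and the Laplacian-eigenmaps regularization of GOP-WNMF are all omitted, and the dimensions are illustrative.

```python
# Toy weighted NMF: minimize || W * (X - A S) ||_F^2 with A, S >= 0.
# A simplified illustration, not the paper's GOP-WNMF algorithm.
import numpy as np

def weighted_nmf(X, W, rank, n_iter=200, eps=1e-9):
    """Factor X ~= A @ S under elementwise weights W (low weight = suspected anomaly)."""
    rng = np.random.default_rng(0)
    A = rng.random((X.shape[0], rank))   # endmember-like factors
    S = rng.random((rank, X.shape[1]))   # abundance-like factors
    for _ in range(n_iter):
        A *= ((W * X) @ S.T) / ((W * (A @ S)) @ S.T + eps)
        S *= (A.T @ (W * X)) / (A.T @ (W * (A @ S)) + eps)
    return A, S

X = np.abs(np.random.default_rng(1).normal(size=(50, 100)))
W = np.ones_like(X)   # a residual-based scheme would down-weight outliers here
A, S = weighted_nmf(X, W, rank=4)
print(np.linalg.norm(W * (X - A @ S)) / np.linalg.norm(X))  # relative residual
```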

Place, publisher, year, edition, pages
Amsterdam: Elsevier, 2026
Keywords
Anomaly detection, Laplacian eigenmaps, Non-negative matrix factorization (NMF), Oblique projection, Sparse unmixing
National Category
Signal Processing
Identifiers
urn:nbn:se:hh:diva-57086 (URN)10.1016/j.patcog.2025.112029 (DOI)2-s2.0-105009880361 (Scopus ID)
Funder
Swedish Research Council, Vinnova
Available from: 2025-07-23 Created: 2025-07-23 Last updated: 2025-10-01. Bibliographically approved
Yao, C., Zhang, X., Liu, Y., Zhao, B., Tiwari, P. & Kumar, N. (2025). A Blockchain-Enabled Secure and Decentralized ADS-B System for Intelligent Vehicles With Robust Authentication. IEEE Transactions on Vehicular Technology, 74(7), 11294-11309
A Blockchain-Enabled Secure and Decentralized ADS-B System for Intelligent Vehicles With Robust Authentication
2025 (English) In: IEEE Transactions on Vehicular Technology, ISSN 0018-9545, E-ISSN 1939-9359, Vol. 74, no 7, p. 11294-11309. Article in journal (Refereed), Published
Abstract [en]

As the next-generation air traffic surveillance technology, the Automatic Dependent Surveillance-Broadcast (ADS-B) system plays a critical role in broadcast communication for intelligent vehicles such as unmanned aerial vehicles and aircraft. However, the unauthenticated channels expose the ADS-B system to a high risk of attacks. Moreover, a centralized intelligent vehicle management center is susceptible to single points of failure and performance bottlenecks. In this paper, we propose a robust and distributed intelligent vehicle management architecture leveraging blockchain technology to resolve the security and performance issues in the registration, update, and revocation of intelligent vehicles' identity public keys. Then, we leverage a certificate-less short signature (CLSS) to achieve dynamic identity verification of ADS-B system participants while ensuring message integrity, non-repudiation, and compatibility with low-data-bit message packets. Furthermore, to enhance the efficiency of intelligent vehicle transactions within the proposed architecture, we develop a Verifiable Random Function-based Byzantine Fault Tolerance protocol (VRBFT) combined with threshold signatures. Finally, through a comprehensive security analysis and feasibility assessment, the results indicate that authentication requires less than 30 ms, block generation occurs within 200 ms, and the system exhibits high throughput and low latency. These findings demonstrate that our solution significantly enhances the security of the ADS-B system and offers a practical and promising approach for real-time, large-scale intelligent vehicle management. © 2025 IEEE.
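To make the key-lifecycle bookkeeping concrete, here is a toy hash-chained ledger of registration and revocation events for vehicle public keys. It only illustrates the tamper-evidence idea behind the decentralized architecture; the CLSS signature scheme and the VRBFT consensus protocol are out of scope, and all field names are hypothetical.

```python
# Toy hash-chained ledger for identity public-key lifecycle events.
# Illustrates tamper evidence only; no signatures or consensus (see paper).
import hashlib
import json
import time

def make_block(prev_hash, event):
    body = {"prev": prev_hash, "time": time.time(), "event": event}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {"hash": digest, **body}

genesis = make_block("0" * 64, {"op": "register", "vehicle": "UAV-001", "pubkey": "<hex>"})
chain = [genesis]
chain.append(make_block(chain[-1]["hash"], {"op": "revoke", "vehicle": "UAV-001"}))

# Integrity check: every block must reference the previous block's hash.
for prev, blk in zip(chain, chain[1:]):
    assert blk["prev"] == prev["hash"]
print("chain OK:", len(chain), "blocks")
```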

Place, publisher, year, edition, pages
Piscataway: IEEE, 2025
Keywords
blockchain, byzantine fault tolerance consensus, certificate-less short signature, decentralized ADS-B system, intelligent vehicles, robust authentication for ADS-B
National Category
Communication Systems
Identifiers
urn:nbn:se:hh:diva-55720 (URN), 10.1109/TVT.2025.3546947 (DOI), 2-s2.0-86000464880 (Scopus ID)
Note

Funding: Ministry of Industry and Information Technology (Grant Number: 23100002022102001)

Available from: 2025-04-16 Created: 2025-04-16 Last updated: 2025-10-01. Bibliographically approved
Hao, M., Gu, Y., Dong, K., Tiwari, P., Lv, X. & Ning, X. (2025). A prompt regularization approach to enhance few-shot class-incremental learning with Two-Stage Classifier. Neural Networks, 188, 1-11, Article ID 107453.
A prompt regularization approach to enhance few-shot class-incremental learning with Two-Stage Classifier
2025 (English) In: Neural Networks, ISSN 0893-6080, E-ISSN 1879-2782, Vol. 188, p. 1-11, article id 107453. Article in journal (Refereed), Published
Abstract [en]

With a limited number of labeled samples, Few-Shot Class-Incremental Learning (FSCIL) seeks to efficiently train and update models without forgetting previously learned tasks. Because pre-trained models can learn extensive feature representations from large existing datasets, they offer strong knowledge foundations and transferability, which makes them useful in both few-shot and incremental learning scenarios. Additionally, Prompt Learning improves pre-trained deep learning models' performance on downstream tasks, particularly in large-scale language or vision models. In this paper, we propose a novel Prompt Regularization (PrRe) approach to maximize the fusion of prompts by embedding two different prompts, the Task Prompt and the Global Prompt, inside a pre-trained Vision Transformer (ViT). In the classification phase, we propose a Two-Stage Classifier (TSC), utilizing K-Nearest Neighbors for the base session and a Prototype Classifier for incremental sessions, integrated with a global self-attention module. Through experiments on multiple benchmarks, we demonstrate the effectiveness and superiority of our method. The code is available at https://github.com/gyzzzzzzzz/PrRe. © 2025 Elsevier Ltd
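A minimal sketch of the two-stage classification idea described above: K-Nearest Neighbors over base-session features, and a nearest-prototype rule for classes learned in incremental sessions. The prompt machinery and global self-attention module are omitted, and the feature dimensions and data are illustrative assumptions.

```python
# Sketch of a two-stage classifier: k-NN for base classes, prototypes for
# incremental classes. Dimensions and data are illustrative, not the paper's.
import numpy as np

def knn_predict(query, base_feats, base_labels, k=5):
    dists = np.linalg.norm(base_feats - query, axis=1)
    nearest = base_labels[np.argsort(dists)[:k]]
    return np.bincount(nearest).argmax()          # majority vote among neighbors

def prototype_predict(query, prototypes):
    # prototypes: {class_id: mean embedding of that class's few-shot samples}
    return min(prototypes, key=lambda c: np.linalg.norm(prototypes[c] - query))

rng = np.random.default_rng(0)
base_feats = rng.normal(size=(100, 16))            # embeddings from base session
base_labels = rng.integers(0, 5, size=100)         # 5 base classes
protos = {c: rng.normal(size=16) for c in (5, 6)}  # 2 incremental classes

q = rng.normal(size=16)
print("base-session prediction:", knn_predict(q, base_feats, base_labels))
print("incremental prediction:", prototype_predict(q, protos))
```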

Place, publisher, year, edition, pages
Oxford: Elsevier, 2025
Keywords
Few-shot class-incremental learning, Global self-attention module, Prompt regularization, Two-Stage Classifier
National Category
Natural Language Processing; Signal Processing
Identifiers
urn:nbn:se:hh:diva-55930 (URN), 10.1016/j.neunet.2025.107453 (DOI), 001469409400001 (ISI), 2-s2.0-105002288715 (Scopus ID)
Available from: 2025-04-30 Created: 2025-04-30 Last updated: 2025-10-01. Bibliographically approved
Özen, C., Nowaczyk, S., Tiwari, P. & Pashami, S. (2025). Assessing the Graph Structure Learning in Graph Deviation Networks. In: Georg Krempl; Kai Puolamäki; Ioanna Miliou (Ed.), Advances in Intelligent Data Analysis XXIII (IDA 2025): Proceedings. Paper presented at 23rd International Symposium on Intelligent Data Analysis, IDA 2025, Konstanz, Germany, 7-9 May, 2025 (pp. 97-109). Cham: Springer
Assessing the Graph Structure Learning in Graph Deviation Networks
2025 (English) In: Advances in Intelligent Data Analysis XXIII (IDA 2025): Proceedings / [ed] Georg Krempl; Kai Puolamäki; Ioanna Miliou, Cham: Springer, 2025, p. 97-109. Conference paper, Published paper (Refereed)
Abstract [en]

Statistical modeling of multivariate time-series data poses significant challenges due to their high dimensionality and complex inter-variable relationships. Reliable forecasts or anomaly detection on these datasets require capturing such relationships within and between the features. While traditional deep learning architectures are good at capturing temporal non-linear patterns within features, they are less efficient at modeling inter-variable relationships explicitly structured as graphs, a capability where Graph Neural Networks (GNNs) excel. Inspired by the success of GNNs, the Graph Deviation Network (GDN) was originally proposed for anomaly detection on industrial multivariate time-series data. After proving its merits through experiments with real-world data, GDN gained significant popularity in the research community, with the claim that it learns the hidden graph structure in any multivariate time-series data. Various modifications to GDN have been proposed over the years, but essentially all of them kept its Graph Structure Learning (GSL) module intact. However, until now, this module has never been rigorously evaluated. This work scrutinizes the contribution of the GSL module. Our experiments reveal that the graph learned by GSL is relatively ineffective, and the key to the overall performance achieved by GDN lies almost entirely in the downstream Graph Attention Network (GAT) module. We hope our findings will draw attention to further development of the GSL module of GDN, since improving its fidelity can improve the performance of GDN variants. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.
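For readers unfamiliar with the module under scrutiny, the sketch below shows the kind of top-k similarity graph GDN's GSL learns over per-sensor embeddings, together with a random-graph substitute of the sort such an assessment can compare against. The cosine similarity and k=3 are illustrative choices, not the exact configuration evaluated in the paper.

```python
# Sketch: GDN-style graph structure learning keeps, for each sensor, edges to
# its top-k most similar learned embeddings; an ablation swaps in a random graph.
import torch
import torch.nn.functional as F

def gsl_topk_graph(embeddings, k=3):
    sim = F.normalize(embeddings, dim=1)
    sim = sim @ sim.T                      # cosine similarity between sensors
    sim.fill_diagonal_(float("-inf"))      # exclude self-loops
    return sim.topk(k, dim=1).indices      # k neighbor indices per node

emb = torch.randn(10, 32)                  # 10 sensors, 32-dim learned embeddings
learned = gsl_topk_graph(emb)
# Random substitute (self-loops not filtered in this toy version).
random_graph = torch.stack([torch.randperm(10)[:3] for _ in range(10)])
print(learned.shape, random_graph.shape)   # both (10, 3); feed either to the GAT
```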

Place, publisher, year, edition, pages
Cham: Springer, 2025
Series
Lecture Notes in Computer Science; 15669
Keywords
GNNs for Time-Series Anomaly Detection, Graph Deviation Network, Graph Neural Networks, Graph Structure Learning
National Category
Computer Sciences
Research subject
Smart Cities and Communities, Future industry
Identifiers
urn:nbn:se:hh:diva-56289 (URN), 10.1007/978-3-031-91398-3_8 (DOI), 2-s2.0-105005282687 (Scopus ID), 978-3-031-91397-6 (ISBN), 978-3-031-91398-3 (ISBN)
Conference
23rd International Symposium on Intelligent Data Analysis, IDA 2025, Konstanz, Germany, 7-9 May, 2025
Funder
Knowledge Foundation, Vinnova
Available from: 2025-07-08 Created: 2025-07-08 Last updated: 2025-10-01. Bibliographically approved
Rong, L., Zhang, Y., Tiwari, P. & Yu, M. (2025). BegoniaGPT: Cultivating the large language model to be an exceptional K-12 English teacher. Neural Networks, 189, 1-15, Article ID 107488.
BegoniaGPT: Cultivating the large language model to be an exceptional K-12 English teacher
2025 (English) In: Neural Networks, ISSN 0893-6080, E-ISSN 1879-2782, Vol. 189, p. 1-15, article id 107488. Article in journal (Refereed), Published
Abstract [en]

Large language models (LLMs) have taken the natural language processing (NLP) domain by storm, and their transformative momentum has surged into the domain of education, giving rise to a nascent wave of education-tailored LLMs. Despite their potential to facilitate homework assistance, such LLMs fall short in the fine-grained domain of elementary and secondary school (i.e., K-12) education. They often indiscriminately incorporate broad knowledge across diverse disciplines, overlooking the stark disparities in cognitive demands and curricular content among the elementary, middle, and high school phases. To fill this gap, we propose a new English teaching LLM, called BegoniaGPT, which discards irrelevant knowledge from other disciplines and shapes a general LLM into an exceptional English teacher by emphasizing four key aspects: foundational English knowledge, professional proficiency, international vision, and psychological support. In particular, we build a large-scale English corpus named EngCorpus, including 35,000 instructions and conversations tailored towards three roles (students, teachers, and parents), as well as 30,000 emotional conversations. By continued pre-training and supervised fine-tuning of the general LLM on the carefully curated EngCorpus, and by aligning it with reinforcement learning with expert feedback, BegoniaGPT can provide refined, specialized, personalized, and compassionate English education. Through a comprehensive empirical comparison on four English benchmarks: E-EVAL, the 2023–2024 PEP edition of the entrance examination for middle school in China (EEM), the 2024 PEP edition of the entrance examination for high school (EEH), and the 2024 Gaokao National Paper I (Eng-Gaokao), we show that BegoniaGPT achieves state-of-the-art performance over 10 SOTA LLMs. Claude 3-opus and expert manual evaluations further validate BegoniaGPT's teaching advantages. © 2025 Elsevier Ltd
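Because EngCorpus is not included in this record, the snippet below only illustrates how role-tailored instruction data (student, teacher, parent) might be serialized into supervised fine-tuning examples. The field names and delimiter tokens are hypothetical, not the paper's format.

```python
# Hypothetical serialization of role-tailored instruction data for SFT.
# Field names and special tokens are assumptions, not the EngCorpus schema.
def format_example(role, instruction, response):
    return (
        f"<|role|>{role}\n"
        f"<|instruction|>{instruction}\n"
        f"<|response|>{response}"
    )

sample = {
    "role": "student",
    "instruction": "Explain the difference between 'affect' and 'effect'.",
    "response": "'Affect' is most often a verb meaning to influence; "
                "'effect' is most often a noun meaning the result.",
}
print(format_example(**sample))
```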

Place, publisher, year, edition, pages
Oxford: Elsevier, 2025
Keywords
English teaching, Large language model, K-12 education, Fine-tuning
National Category
Studies of Specific Languages; Didactics
Identifiers
urn:nbn:se:hh:diva-56147 (URN), 10.1016/j.neunet.2025.107488 (DOI), 001493193500001 (ISI), 40375418 (PubMedID), 2-s2.0-105004878325 (Scopus ID)
Note

Funding: This paper is partly supported by a grant under Hong Kong RGC Theme-based Research Scheme (project no. T45-401/22-N). This work is supported by National Science Foundation of China under grant No. 62006212, Fellowship from the China Postdoctoral Science Foundation, China (2023M733907), Natural Science Foundation of Hunan Province of China (242300421412), Foundation of Key Laboratory of Dependable Service Computing in Cyber–Physical-Society (Ministry of Education), Chongqing University, China (PJ.No: CPSDSC202103).

Available from: 2025-06-24 Created: 2025-06-24 Last updated: 2025-10-01. Bibliographically approved
Tan, Z., Zhang, G., Tan, Z., Tiwari, P., Wang, Y. & Yang, Y. (2025). CAM2Former: Fusion of Camera-specific Class Activation Map matters for occluded person re-identification. Information Fusion, 120, 1-11, Article ID 103011.
CAM2Former: Fusion of Camera-specific Class Activation Map matters for occluded person re-identification
2025 (English) In: Information Fusion, ISSN 1566-2535, E-ISSN 1872-6305, Vol. 120, p. 1-11, article id 103011. Article in journal (Refereed), Published
Abstract [en]

Occluded person re-identification (ReID) is challenging since persons are frequently perturbed by various occlusions. Existing mainstream schemes prioritize the alignment of fine-grained body parts using error-prone, computation-intensive auxiliary information, which can introduce high estimation error and substantial computation. To this end, we present the Camera-specific Class Activation Map (CAM2), designed to identify critical foreground components with interpretability and computational efficiency. Expanding on this foundation, we introduce the CAM2-guided Vision Transformer, termed CAM2Former, with three core designs. First, we develop the Fusion of Camera-specific Class Activation Maps, termed CAM2Fusion, which consists of positive and negative CAM2 that operate in synergy to capture visual patterns representative of the discriminative foreground components. Second, to enhance the representation ability of pivotal foreground components, we introduce a CAM2Fusion-attention mechanism. This strategy imposes sparse attention weights on identity-agnostic interference discerned by the positive and negative CAM2. Third, since the enhancement of foreground representations in CAM2Former depends on camera-specific classifiers, which are not available during inference, we introduce a consistent learning scheme. This design ensures that representations derived from the vanilla ViT align consistently with those obtained via CAM2Former. This facilitates the extraction of discriminative foreground representations, circumventing CAM2 dependencies during inference without additional complexity. Extensive experimental results demonstrate that the proposed method achieves state-of-the-art performance on two occluded datasets (Occluded-Duke and Occluded-REID) and two holistic datasets (Market1501 and MSMT17), achieving an R1 of 74.4% and a mAP of 64.8% on Occluded-Duke. © 2025 Published by Elsevier B.V.
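As background, a class activation map weights a spatial feature map by one class's classifier weights, highlighting the regions that drive that prediction; this is the standard building block behind the paper's camera-specific CAM2. The sketch below computes a plain CAM with assumed shapes; the camera-specific classifiers, positive/negative fusion, and attention mechanism are not reproduced.

```python
# Plain class activation map (CAM): weight feature channels by the classifier
# weights of one class. Shapes are illustrative; not the paper's CAM2Fusion.
import torch

def class_activation_map(features, classifier_weight, class_idx):
    # features: (C, H, W); classifier_weight: (num_classes, C)
    cam = torch.einsum("c,chw->hw", classifier_weight[class_idx], features)
    cam = torch.relu(cam)              # keep positively contributing regions
    return cam / (cam.max() + 1e-8)    # normalize to [0, 1]

feats = torch.randn(256, 16, 8)        # toy feature map from a backbone
w = torch.randn(751, 256)              # toy identity classifier weights
print(class_activation_map(feats, w, class_idx=42).shape)  # torch.Size([16, 8])
```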

Place, publisher, year, edition, pages
Amsterdam: Elsevier, 2025
Keywords
Fusion-attention mechanism, Camera-specific Class Activation Map, Occluded person ReID
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:hh:diva-55721 (URN), 10.1016/j.inffus.2025.103011 (DOI), 001449880400001 (ISI), 2-s2.0-86000785121 (Scopus ID)
Available from: 2025-04-16 Created: 2025-04-16 Last updated: 2025-10-01. Bibliographically approved
Alonso-Fernandez, F., Hernandez-Diaz, K., Buades Rubio, J. M., Tiwari, P. & Bigun, J. (2025). Deep network pruning: A comparative study on CNNs in face recognition. Pattern Recognition Letters, 189, 221-228
Deep network pruning: A comparative study on CNNs in face recognition
2025 (English) In: Pattern Recognition Letters, ISSN 0167-8655, E-ISSN 1872-7344, Vol. 189, p. 221-228. Article in journal (Refereed), Published
Abstract [en]

The widespread use of mobile devices for all kinds of transactions makes reliable, real-time identity authentication necessary, leading to the adoption of face recognition (FR) via the cameras embedded in such devices. Progress in deep Convolutional Neural Networks (CNNs) has provided substantial advances in FR. Nonetheless, the size of state-of-the-art architectures is unsuitable for mobile deployment, since they often encompass hundreds of megabytes and millions of parameters. We address this by studying methods for deep network compression applied to FR. In particular, we apply network pruning based on Taylor scores, where less important filters are removed iteratively. The method is tested on three networks based on the small SqueezeNet (1.24M parameters) and the popular MobileNetv2 (3.5M) and ResNet50 (23.5M) architectures. These have been selected to showcase the method on CNNs with different complexities and sizes. We observe that a substantial percentage of filters can be removed with minimal performance loss. Also, filters with the highest number of output channels tend to be removed first, suggesting that high-dimensional spaces within popular CNNs are over-dimensioned. The models of this paper are available at https://github.com/HalmstadUniversityBiometrics/CNN-pruning-for-face-recognition. © 2025.
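The Taylor criterion referenced here scores a filter by the estimated first-order change in loss if that filter were removed, commonly approximated as the accumulated |activation × gradient|. Below is a minimal sketch of that ranking step under an assumed toy layer and a stand-in loss; it is not the paper's full pipeline.

```python
# Sketch of Taylor-score filter ranking: importance ~ mean |a * dL/da| per filter.
# Toy convolution and loss; the paper prunes iteratively inside full FR networks.
import torch
import torch.nn as nn

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
x = torch.randn(4, 3, 32, 32)
out = conv(x)
out.retain_grad()                      # keep the gradient of the activation
loss = out.pow(2).mean()               # stand-in for the recognition loss
loss.backward()

scores = (out * out.grad).abs().mean(dim=(0, 2, 3))  # one score per filter
prune_order = scores.argsort()         # lowest-scoring filters are pruned first
print(scores)
print("prune first:", prune_order[:3].tolist())
```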

Place, publisher, year, edition, pages
Amsterdam: Elsevier, 2025
Keywords
Convolutional Neural Networks, Deep learning, Face recognition, Mobile biometrics, Network pruning, Taylor expansion
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:hh:diva-55571 (URN), 10.1016/j.patrec.2025.01.023 (DOI), 2-s2.0-85217214565 (Scopus ID)
Funder
Vinnova, PID2022-136779OB-C32; Swedish Research Council; European Commission
Note

This work was partly done while F. A.-F. was a visiting researcher at the University of the Balearic Islands. F. A.-F., K. H.-D., and J. B. thank the Swedish Research Council (VR) and the Swedish Innovation Agency (VINNOVA) for funding their research. This work is part of the Project PID2022-136779OB-C32 (PLEISAR) funded by MICIU/AEI/10.13039/501100011033 and FEDER, EU.

Available from: 2025-02-28 Created: 2025-02-28 Last updated: 2025-10-01. Bibliographically approved
Zhang, Y., Wang, M., Wu, Y., Tiwari, P., Li, Q., Wang, B. & Qin, J. (2025). DialogueLLM: Context and emotion knowledge-tuned large language models for emotion recognition in conversations. Neural Networks, 192, 1-15, Article ID 107901.
DialogueLLM: Context and emotion knowledge-tuned large language models for emotion recognition in conversations
2025 (English) In: Neural Networks, ISSN 0893-6080, E-ISSN 1879-2782, Vol. 192, p. 1-15, article id 107901. Article in journal (Refereed), In press
Abstract [en]

Large language models (LLMs) and their variants have shown extraordinary efficacy across numerous downstream natural language processing tasks. Despite their remarkable performance in natural language generation, LLMs lack a distinct focus on the emotion understanding domain. As a result, using LLMs for emotion recognition may lead to suboptimal and inadequate precision. Another limitation of current LLMs is that they are typically trained without leveraging multi-modal information. To overcome these limitations, we formally model emotion recognition as a text generation task, and thus propose DialogueLLM, a context and emotion knowledge tuned LLM that is obtained by fine-tuning foundation large language models. In particular, it is a context-aware model, which can accurately capture the dynamics of emotions throughout the dialogue. We also prompt ERNIE Bot with expert-designed prompts to generate the textual descriptions of the videos. To support the training of emotional LLMs, we create a large-scale dataset of over 24K utterances to serve as a knowledge corpus. Finally, we offer a comprehensive evaluation of DialogueLLM on three benchmarking datasets, where it significantly outperforms 15 state-of-the-art baselines and 3 state-of-the-art LLMs. An emotional intelligence test shows that DialogueLLM achieves a score of 109 and surpasses 72% of humans. Additionally, DialogueLLM-7B can be easily reproduced using LoRA on a 40GB A100 GPU in 5 hours. © 2025 Elsevier Ltd
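A small illustration of the core modeling choice, treating emotion recognition in conversations as text generation: each utterance is classified conditioned on its dialogue history. The prompt wording below is an assumption for illustration, not the paper's template.

```python
# Hypothetical context-aware prompt builder for emotion recognition in
# conversations (ERC) framed as generation; wording is illustrative only.
def build_erc_prompt(history, speaker, utterance, labels):
    context = "\n".join(f"{s}: {u}" for s, u in history)
    return (
        "Given the conversation so far:\n"
        f"{context}\n"
        f"Classify the emotion of {speaker}'s next utterance "
        f"(options: {', '.join(labels)}).\n"
        f"{speaker}: {utterance}\nEmotion:"
    )

prompt = build_erc_prompt(
    history=[("A", "I got the job!"), ("B", "That's wonderful news.")],
    speaker="A",
    utterance="I still can't believe it.",
    labels=["joy", "sadness", "anger", "neutral"],
)
print(prompt)
```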

Place, publisher, year, edition, pages
Oxford: Elsevier, 2025
Keywords
Context modeling, Emotion recognition, Large language models, Natural language processing
National Category
Natural Language Processing
Identifiers
urn:nbn:se:hh:diva-57201 (URN), 10.1016/j.neunet.2025.107901 (DOI), 001544939300003 (ISI), 2-s2.0-105012239350 (Scopus ID)
Available from: 2025-09-26 Created: 2025-09-26 Last updated: 2025-10-01. Bibliographically approved
Liang, G., Tiwari, P., Nowaczyk, S., Byttner, S. & Alonso-Fernandez, F. (2025). Dynamic Causal Explanation Based Diffusion-Variational Graph Neural Network for Spatiotemporal Forecasting. IEEE Transactions on Neural Networks and Learning Systems, 33(5), 9524-9537
Dynamic Causal Explanation Based Diffusion-Variational Graph Neural Network for Spatiotemporal Forecasting
2025 (English) In: IEEE Transactions on Neural Networks and Learning Systems, ISSN 2162-237X, E-ISSN 2162-2388, Vol. 33, no 5, p. 9524-9537. Article in journal (Refereed), Published
Abstract [en]

Graph neural networks (GNNs), especially dynamic GNNs, have become a research hotspot in spatiotemporal forecasting problems. While many dynamic graph construction methods have been developed, relatively few of them explore the causal relationship between neighbor nodes. Thus, the resulting models lack strong explainability for the causal relationships between the neighbor nodes of the dynamically generated graphs, which can easily introduce risk into subsequent decisions. Moreover, few of them consider the uncertainty and noise of dynamic graphs derived from time-series datasets, which are ubiquitous in real-world graph-structured networks. In this article, we propose a novel dynamic diffusion-variational GNN (DVGNN) for spatiotemporal forecasting. For dynamic graph construction, an unsupervised generative model is devised. Two layers of graph convolutional network (GCN) are applied to calculate the posterior distribution of the latent node embeddings in the encoder stage. Then, a diffusion model is used to infer the dynamic link probability and reconstruct causal graphs (CGs) adaptively in the decoder stage. The new loss function is derived theoretically, and the reparameterization trick is adopted to estimate the probability distribution of the dynamic graphs via the evidence lower bound (ELBO) during the backpropagation period. After obtaining the generated graphs, dynamic GCN and temporal attention are applied to predict future states. Experiments are conducted on four real-world datasets of different graph structures in different domains. The results demonstrate that the proposed DVGNN model outperforms state-of-the-art approaches and achieves outstanding root mean square error (RMSE) results while exhibiting higher robustness. Also, through F1-score and probability distribution analysis, we demonstrate that DVGNN better reflects the causal relationships and uncertainty of dynamic graphs. The code is available at https://github.com/gorgen2020/DVGNN.
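The reparameterization trick mentioned above is what keeps the sampling of latent node embeddings differentiable while optimizing the ELBO. A generic sketch follows, with dimensions chosen for illustration; it shows the trick itself, not DVGNN's encoder or diffusion decoder.

```python
# Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I), so
# gradients flow through mu and logvar. Generic sketch; shapes are illustrative.
import torch

def reparameterize(mu, logvar):
    std = torch.exp(0.5 * logvar)           # sigma from log-variance
    return mu + std * torch.randn_like(std)

mu = torch.zeros(10, 16, requires_grad=True)      # posterior means per node
logvar = torch.zeros(10, 16, requires_grad=True)  # posterior log-variances
z = reparameterize(mu, logvar)                    # sampled node embeddings
z.sum().backward()                                # gradients reach mu and logvar
print(mu.grad.shape, logvar.grad.shape)
```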

Place, publisher, year, edition, pages
Piscataway: IEEE, 2025
Keywords
Diffusion process, graph neural networks (GNNs), spatiotemporal forecasting, variational graph autoencoders (VGAEs)
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-55718 (URN), 10.1109/tnnls.2024.3415149 (DOI), 001271405600001 (ISI), 38980780 (PubMedID)
Funder
Vinnova, Swedish Research Council
Available from: 2025-03-31 Created: 2025-03-31 Last updated: 2025-10-01. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-2851-4260