Looking Clearer with Text: A Hierarchical Context Blending Network for Occluded Person Re-Identification
College of Computing and Data Science, Singapore, Singapore.ORCID iD: 0000-0002-4056-4922
Shanghai University of Finance and Economics, Shanghai, China.
College of Computing and Data Science, Singapore, Singapore.
College of Computing and Data Science, Singapore, Singapore.
2025 (English). In: IEEE Transactions on Information Forensics and Security, ISSN 1556-6013, E-ISSN 1556-6021, Vol. 20, p. 4296-4307. Article in journal (Refereed). Published.
Abstract [en]

Existing occluded person re-identification (re-ID) methods mainly learn limited visual information about occluded pedestrians from images. However, textual information, which can describe various human appearance attributes, is rarely fully utilized in this task. To address this issue, we propose a Text-guided Hierarchical Context Blending Network (THCB-Net) for occluded person re-ID. Specifically, at the data level, informative multi-modal inputs are first generated to make full use of the auxiliary role of textual information and give the image data a strong inductive bias for occluded environments. At the feature-expression level, we design a novel Hierarchical Context Blending (HCB) module that can adaptively integrate shallow appearance features obtained by CNNs with multi-scale semantic features from the visual transformer encoder. At the model-optimization level, a Multi-modal Feature Interaction (MFI) module is proposed to learn multi-modal information about pedestrians from texts and images, and then guide the visual transformer encoder and HCB module to further learn discriminative identity information for occluded pedestrians through Image-Multimodal Contrastive (IMC) learning. Extensive experiments on standard occluded person re-ID benchmarks demonstrate that the proposed THCB-Net outperforms state-of-the-art methods. © 2025 IEEE.
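The abstract describes the Image-Multimodal Contrastive (IMC) objective only at a high level. As an illustration only, a symmetric InfoNCE loss over paired image and multi-modal embeddings (the standard formulation popularized by CLIP) is one common way such an objective is realized; the function name, NumPy sketch, and all parameters below are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def info_nce_loss(image_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE loss over L2-normalized image and multi-modal
    (text-derived) embeddings; matching pairs share the same row index."""
    img = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    txt = text_feats / np.linalg.norm(text_feats, axis=1, keepdims=True)
    logits = img @ txt.T / temperature          # (B, B) similarity matrix
    labels = np.arange(len(logits))             # positives on the diagonal

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)    # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # average cross-entropy in both directions (image->text, text->image)
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
img = rng.standard_normal((4, 16))
# nearly matching pairs should yield a small loss
loss = info_nce_loss(img, img + 0.01 * rng.standard_normal((4, 16)))
```

Minimizing such a loss pulls each image embedding toward its paired multi-modal embedding and pushes it away from the other pairs in the batch, which matches the abstract's description of guiding the encoder toward discriminative identity information.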

Place, publisher, year, edition, pages
Piscataway, NJ: IEEE, 2025. Vol. 20, p. 4296-4307
Keywords [en]
Contrastive Learning, Feature Blending, Multi-modal Learning, Occluded Person Re-identification, Transformer
National Category
Computer graphics and computer vision; Computer Sciences; Natural Language Processing
Identifiers
URN: urn:nbn:se:hh:diva-55929
DOI: 10.1109/TIFS.2025.3558586
Scopus ID: 2-s2.0-105002302035
OAI: oai:DiVA.org:hh-55929
DiVA, id: diva2:1955447
Available from: 2025-04-30. Created: 2025-04-30. Last updated: 2025-04-30. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Tiwari, Prayag

Search in DiVA

By author/editor
Wang, Changshuo; Tiwari, Prayag
By organisation
School of Information Technology
In the same journal
IEEE Transactions on Information Forensics and Security
Computer graphics and computer vision; Computer Sciences; Natural Language Processing

Search outside of DiVA

Google
Google Scholar
