CAM2Former: Fusion of Camera-specific Class Activation Map matters for occluded person re-identification
Chinese Academy of Sciences, Shenzhen, China; Sangfor Technologies Inc., Shenzhen, China.
Beihang University, Beijing, China.
Wuhan Textile University, Wuhan, China.
Halmstad University, School of Information Technology. ORCID iD: 0000-0002-2851-4260
2025 (English) In: Information Fusion, ISSN 1566-2535, E-ISSN 1872-6305, Vol. 120, p. 1-11, article id 103011. Article in journal (Refereed). Epub ahead of print.
Abstract [en]

Occluded person re-identification (ReID) is challenging because persons are frequently perturbed by various occlusions. Existing mainstream schemes prioritize the alignment of fine-grained body parts using auxiliary cues that are error-prone and computation-intensive, incurring high estimation error and heavy computation. To address this, we present the Camera-specific Class Activation Map (CAM2), designed to identify critical foreground components with interpretability and computational efficiency. Building on this foundation, we introduce the CAM2-guided Vision Transformer, termed CAM2Former, with three core designs. First, we develop the Fusion of Camera-specific Class Activation Maps, termed CAM2Fusion, which combines positive and negative CAM2 that operate in synergy to capture visual patterns representative of discriminative foreground components. Second, to enhance the representation of pivotal foreground components, we introduce a CAM2Fusion-attention mechanism that imposes sparse attention weights on identity-agnostic interference discerned by the positive and negative CAM2. Third, since the enhancement of foreground representations in CAM2Former depends on camera-specific classifiers, which are not available during inference, we introduce a consistency learning scheme that aligns representations derived from a vanilla ViT with those obtained via CAM2Former. This facilitates the extraction of discriminative foreground representations while circumventing CAM2 dependencies during inference without additional complexity. Extensive experimental results demonstrate that the proposed method achieves state-of-the-art performance on two occluded datasets (Occluded-Duke and Occluded-REID) and two holistic datasets (Market1501 and MSMT17), achieving an R1 of 74.4% and an mAP of 64.8% on Occluded-Duke. © 2025 Published by Elsevier B.V.
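To make the mechanism described in the abstract concrete, the sketch below shows one plausible way to derive a class activation map from a camera-specific linear classifier and use it to suppress attention toward occluded or background patches in a ViT block. All names (camera_cam, cam_guided_attention) and hyperparameters (keep_ratio) are hypothetical illustrations under stated assumptions; no full text or code is available in DiVA, so this is not the authors' released implementation.

import torch

def camera_cam(patch_tokens, classifier_weight, target_class):
    # Class activation map over patch tokens from a camera-specific linear classifier.
    # patch_tokens:      (B, N, D) patch embeddings (CLS token excluded)
    # classifier_weight: (C, D) weights of the camera-specific identity classifier
    # target_class:      (B,) identity labels
    # returns:           (B, N) per-patch activation, normalized to [0, 1]
    w = classifier_weight[target_class]                 # (B, D)
    cam = torch.einsum("bnd,bd->bn", patch_tokens, w)   # class-specific response per patch
    cam = cam - cam.min(dim=1, keepdim=True).values
    cam = cam / (cam.max(dim=1, keepdim=True).values + 1e-6)
    return cam

def cam_guided_attention(q, k, v, cam, keep_ratio=0.5):
    # Self-attention where patches with low CAM response receive zero weight.
    # q, k, v: (B, H, N, Dh) queries/keys/values of one ViT block
    # cam:     (B, N) foreground activation map from camera_cam()
    b, h, n, dh = q.shape
    scores = (q @ k.transpose(-2, -1)) / dh ** 0.5      # (B, H, N, N)
    k_keep = max(1, int(keep_ratio * n))                # keep the most "foreground" patches
    thresh = cam.topk(k_keep, dim=1).values[:, -1:]     # (B, 1) k-th largest activation
    mask = (cam >= thresh).view(b, 1, 1, n)             # broadcast over heads and queries
    scores = scores.masked_fill(~mask, float("-inf"))   # drop attention to masked patches
    return scores.softmax(dim=-1) @ v

Because the camera-specific classifier is only available during training, a setup like this would apply only to the training branch; the consistency learning described in the abstract would then transfer the effect to a vanilla ViT used at inference.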

Place, publisher, year, edition, pages
Amsterdam: Elsevier, 2025. Vol. 120, p. 1-11, article id 103011
Keywords [en]
Fusion-attention mechanism, Camera-specific Class Activation Map, Occluded person ReID
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:hh:diva-55721
DOI: 10.1016/j.inffus.2025.103011
ISI: 001449880400001
Scopus ID: 2-s2.0-86000785121
OAI: oai:DiVA.org:hh-55721
DiVA, id: diva2:1952618
Available from: 2025-04-16. Created: 2025-04-16. Last updated: 2025-04-16. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus
