CAM2Former: Fusion of Camera-specific Class Activation Map matters for occluded person re-identification
2025 (English). In: Information Fusion, ISSN 1566-2535, E-ISSN 1872-6305, Vol. 120, p. 1-11, article id 103011. Article in journal (Refereed). Epub ahead of print.
Abstract [en]
Occluded person re-identification (ReID) is challenging because persons are frequently perturbed by various occlusions. Existing mainstream schemes prioritize aligning fine-grained body parts using auxiliary information that is error-prone and computation-intensive, and thus suffer from high estimation error and heavy computational cost. To this end, we present the Camera-specific Class Activation Map (CAM2), designed to identify critical foreground components with interpretability and computational efficiency. Building on this foundation, we propose the CAM2-guided Vision Transformer, termed CAM2Former, with three core designs. First, we develop the Fusion of Camera-specific Class Activation Maps, termed CAM2Fusion, which combines positive and negative CAM2 that operate in synergy to capture visual patterns representative of discriminative foreground components. Second, to enhance the representation of pivotal foreground components, we introduce a CAM2Fusion-attention mechanism that imposes sparse attention weights on identity-agnostic interference discerned by the positive and negative CAM2. Third, since the foreground enhancement in CAM2Former depends on camera-specific classifiers, which are unavailable during inference, we introduce a consistent learning scheme that aligns representations derived from the vanilla ViT with those obtained via CAM2Former. This facilitates the extraction of discriminative foreground representations while circumventing CAM2 dependencies during inference, adding no extra complexity. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance on two occluded datasets (Occluded-Duke and Occluded-REID) and two holistic datasets (Market1501 and MSMT17), reaching an R1 of 74.4% and a mAP of 64.8% on Occluded-Duke. © 2025 Published by Elsevier B.V.
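The mechanisms summarized in the abstract can be made concrete with a short sketch. The Python/PyTorch code below is illustrative only, not the authors' implementation: the function names, the top-k fusion rule, the keep_ratio value, and the MSE consistency term are all assumptions. The abstract specifies only that positive and negative camera-specific CAMs are fused to suppress identity-agnostic patches via sparse attention weights, and that a vanilla-ViT branch is trained to agree with the CAM2-guided branch so CAMs are not needed at inference.

    # Minimal sketch (NOT the paper's code): camera-specific CAMs fused into a
    # sparse attention bias over ViT patch tokens. Shapes, the fusion rule, and
    # keep_ratio are illustrative assumptions.
    import torch
    import torch.nn.functional as F

    def camera_cam(patch_feats, cam_classifier_weight, identity):
        """Class activation map over patch tokens from a camera-specific classifier.

        patch_feats: (B, N, D) patch-token features
        cam_classifier_weight: (C, D) weights of this camera's identity classifier
        identity: (B,) identity labels (ground truth for the positive CAM,
                  e.g. the most-confused wrong identity for the negative CAM)
        Returns: (B, N) activation of each patch for the given identity.
        """
        w = cam_classifier_weight[identity]               # (B, D)
        return torch.einsum('bnd,bd->bn', patch_feats, w)

    def cam2_fusion_mask(pos_cam, neg_cam, keep_ratio=0.7):
        """Fuse positive and negative CAMs into a binary foreground mask.

        Patches that score high on the positive CAM and low on the negative
        CAM are kept; the rest are treated as identity-agnostic interference.
        """
        score = F.softmax(pos_cam, dim=-1) - F.softmax(neg_cam, dim=-1)
        k = max(1, int(keep_ratio * score.size(-1)))
        thresh = score.topk(k, dim=-1).values[..., -1:]   # per-sample k-th value
        return (score >= thresh).float()                  # (B, N) in {0, 1}

    def masked_attention(q, k, v, patch_mask):
        """Scaled dot-product attention with an additive bias that drives the
        attention weights on masked-out (background) key patches toward zero."""
        d = q.size(-1)
        logits = q @ k.transpose(-2, -1) / d ** 0.5       # (B, N, N)
        bias = (1.0 - patch_mask).unsqueeze(1) * -1e4     # penalise background keys
        return F.softmax(logits + bias, dim=-1) @ v

    def consistency_loss(vanilla_cls, cam2_cls):
        """Alignment term: pull the vanilla-ViT representation toward the
        CAM2-guided one (one plausible reading of the paper's 'consistent
        learning scheme'); at test time only the vanilla branch is used."""
        return F.mse_loss(vanilla_cls, cam2_cls.detach())

    if __name__ == '__main__':
        B, N, D, C = 2, 16, 64, 10
        feats = torch.randn(B, N, D)
        w_cam = torch.randn(C, D)                         # one camera's classifier
        ids = torch.tensor([3, 7])                        # true identities
        wrong = torch.tensor([5, 1])                      # hypothetical negatives
        mask = cam2_fusion_mask(camera_cam(feats, w_cam, ids),
                                camera_cam(feats, w_cam, wrong))
        out = masked_attention(feats, feats, feats, mask)
        print(mask.sum(-1), out.shape)                    # kept patches per sample

Because the camera-specific classifiers exist only at training time, the consistency term is what lets the deployed model drop the CAM2 branch entirely, which matches the abstract's claim of no additional inference complexity.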
Place, publisher, year, edition, pages
Amsterdam: Elsevier, 2025. Vol. 120, p. 1-11, article id 103011
Keywords [en]
Fusion-attention mechanism, Camera-specific Class Activation Map, Occluded person ReID
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:hh:diva-55721
DOI: 10.1016/j.inffus.2025.103011
ISI: 001449880400001
Scopus ID: 2-s2.0-86000785121
OAI: oai:DiVA.org:hh-55721
DiVA, id: diva2:1952618
Available from: 2025-04-16. Created: 2025-04-16. Last updated: 2025-04-16. Bibliographically approved.