Swin-MGNet: Swin Transformer based Multi-view Grouping Network for 3D Object Recognition
Chinese Academy Of Sciences, Beijing, China; University Of Chinese Academy Of Sciences, Beijing, China. ORCID iD: 0000-0001-7897-1673
Chinese Academy Of Sciences, Beijing, China; University Of Chinese Academy Of Sciences, Beijing, China.
Chinese Academy Of Sciences, Beijing, China; University Of Chinese Academy Of Sciences, Beijing, China.
Chinese Academy Of Sciences, Beijing, China; University Of Chinese Academy Of Sciences, Beijing, China.
Show others and affiliations
2025 (English). In: IEEE Transactions on Artificial Intelligence, ISSN 2691-4581, Vol. 6, no. 3, p. 747-758. Article in journal (Refereed). Published.
Abstract [en]

Recent developments in Swin Transformer have shown its great potential in various computer vision tasks, including image classification, semantic segmentation, and object detection. However, it is challenging to achieve desired performance by directly employing the Swin Transformer in multi-view 3D object recognition since the Swin Transformer independently extracts the characteristics of each view and relies heavily on a subsequent fusion strategy to unify the multi-view information. This leads to the problem of the insufficient extraction of interdependencies between the multi-view images. To this end, we propose an aggregation strategy integrated into the Swin Transformer to reinforce the connections between internal features across multiple views, thus leading to a complete interpretation of isolated features extracted by the Swin Transformer. Specifically, we utilize Swin Transformer to learn view-level feature representations from multi-view images and then calculate their view discrimination scores. The scores are employed to assign the view-level features to different groups. Finally, a grouping and fusion network is proposed to aggregate the features from view and group levels. The experimental results indicate that our method attains state-of-the-art performance compared to prior approaches in multi-view 3D object recognition tasks. The source code is available at https://github.com/Qishaohua94/DEST. ©2024 IEEE.
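The grouping-and-fusion idea in the abstract (score each view, bucket views into groups by score, fuse first within and then across groups) can be sketched as follows. This is a hypothetical illustration only, not the authors' implementation: the norm-free softmax score, equal-width score bins, and weighted-mean fusion are all stand-in assumptions for the learned components described in the paper.

```python
import numpy as np

def group_and_fuse(view_feats, n_groups=2):
    """Hypothetical sketch: bucket per-view features into score-based
    groups, then fuse view-level and group-level descriptors."""
    # Stand-in for the learned view discrimination score: a softmax
    # over each view's mean activation (assumption, not the paper's scorer).
    raw = view_feats.mean(axis=1)
    scores = np.exp(raw - raw.max())
    scores /= scores.sum()
    # Assign views to groups via equal-width intervals over the scores.
    edges = np.linspace(scores.min(), scores.max(), n_groups + 1)
    group_ids = np.clip(np.digitize(scores, edges[1:-1]), 0, n_groups - 1)
    # View-level fusion: score-weighted mean inside each group.
    group_descs, group_mass = [], []
    for g in range(n_groups):
        mask = group_ids == g
        if not mask.any():
            continue  # skip empty groups
        w = scores[mask] / scores[mask].sum()
        group_descs.append((w[:, None] * view_feats[mask]).sum(axis=0))
        group_mass.append(scores[mask].sum())
    # Group-level fusion: weight each group by its total score mass.
    group_descs = np.stack(group_descs)
    gw = np.array(group_mass) / sum(group_mass)
    return (gw[:, None] * group_descs).sum(axis=0)

rng = np.random.default_rng(0)
feats = rng.normal(size=(12, 64))   # 12 rendered views, 64-D features each
desc = group_and_fuse(feats, n_groups=3)
print(desc.shape)  # (64,)
```

In the paper the view features come from a Swin Transformer backbone and the scorer and fusion weights are learned; here fixed heuristics stand in so the data flow of the two-level aggregation is visible end to end.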

Place, publisher, year, edition, pages
Piscataway, NJ: IEEE, 2025. Vol. 6, no 3, p. 747-758
Keywords [en]
3D Object Classification, 3D Object Retrieval, Feature Fusion, Grouping Mechanism, Multi-View Learning, Swin Transformer
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:hh:diva-54973
DOI: 10.1109/TAI.2024.3492163
Scopus ID: 2-s2.0-85208686299
OAI: oai:DiVA.org:hh-54973
DiVA, id: diva2:1915950
Funder
Vinnova; Swedish Research Council
Note

This work is supported by the National Natural Science Foundation of China No. 62373343, Beijing Natural Science Foundation No. L233036, Swedish Research Council (VR) and the Swedish Innovation Agency (VINNOVA).

Available from: 2024-11-26. Created: 2024-11-26. Last updated: 2025-10-01. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text; Scopus

Authority records

Tiwari, Prayag; Alonso-Fernandez, Fernando

Search in DiVA

By author/editor
Ning, Xin; Li, Lusi; Tiwari, Prayag; Alonso-Fernandez, Fernando
By organisation
School of Information Technology
Computer graphics and computer vision

Search outside of DiVA

Google; Google Scholar
