Semantics-aware LiDAR-Only Pseudo Point Cloud Generation for 3D Object Detection
Cortinhal, Tiago. Halmstad University, School of Information Technology. ORCID iD: 0000-0002-8067-9521
Gouigah, Idriss. Halmstad University, School of Information Technology.
Aksoy, Eren. Halmstad University, School of Information Technology. ORCID iD: 0000-0002-5712-6777
2024 (English). In: IEEE Intelligent Vehicles Symposium, Proceedings, Piscataway: Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 3220-3226. Conference paper, Published paper (Refereed)
Abstract [en]

Although LiDAR sensors are crucial for autonomous systems because they provide precise depth information, their sparse and non-uniform data make it difficult to capture fine object details, especially at a distance. Recent advances introduced pseudo-LiDAR, i.e., synthetic dense point clouds, using additional modalities such as cameras to enhance 3D object detection. We present a novel LiDAR-only framework that augments raw scans with denser pseudo point clouds, relying solely on LiDAR sensors and scene semantics and eliminating the need for cameras. Our framework first uses a segmentation model to extract scene semantics from the raw point cloud, and then employs a multi-modal domain translator to generate synthetic image segments and depth cues without real cameras. This yields a dense pseudo point cloud enriched with semantic information. We also introduce a new semantically guided projection method that enhances detection performance by retaining only relevant pseudo points. Applied to several advanced 3D object detection methods, our framework yields performance gains of up to 2.9%. We also obtain results on the KITTI 3D object detection test set that are comparable to other state-of-the-art LiDAR-only detectors. © 2024 IEEE.
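To make the pipeline concrete, here is a minimal Python sketch of the four steps the abstract describes. The `segmenter` and `translator` callables, the pinhole intrinsics, and the foreground class ids are all hypothetical stand-ins for illustration, not the paper's actual interfaces.

```python
import numpy as np

# Assumed label ids for detection-relevant classes (e.g. car,
# pedestrian, cyclist); the real label map is not specified here.
FOREGROUND_IDS = {0, 1, 2}

def generate_pseudo_points(scan, segmenter, translator, intrinsics):
    """Return semantically filtered pseudo points for one LiDAR scan."""
    # 1. Extract scene semantics directly from the raw point cloud.
    point_labels = segmenter(scan)                       # (N,) class ids

    # 2. Translate LiDAR + semantics into camera-domain outputs: a
    #    synthetic segment map and per-pixel depth cues, with no camera.
    seg_map, depth_map = translator(scan, point_labels)  # (H, W) each

    # 3. Back-project valid depth pixels into 3D (pinhole model assumed),
    #    yielding a dense pseudo point cloud in the camera frame.
    fx, fy, cx, cy = intrinsics
    v, u = np.nonzero(depth_map > 0)
    z = depth_map[v, u]
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pseudo = np.stack([x, y, z], axis=1)

    # 4. Semantically guided projection: keep only pseudo points whose
    #    predicted class is relevant for 3D object detection.
    keep = np.isin(seg_map[v, u], list(FOREGROUND_IDS))
    return pseudo[keep]
```

The filtered pseudo points would then be merged with the raw scan and passed to an off-the-shelf 3D detector, which is where the reported gains of up to 2.9% are measured.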

Place, publisher, year, edition, pages
Piscataway: Institute of Electrical and Electronics Engineers (IEEE), 2024. p. 3220-3226
Series
IEEE Intelligent Vehicles Symposium, ISSN 1931-0587, E-ISSN 2642-7214
Keywords [en]
Point cloud compression, Solid modeling, Three-dimensional displays, Laser radar, Semantics, Pipelines, Object detection
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:hh:diva-53114. DOI: 10.1109/IV55156.2024.10588782. ISI: 001275100903062. Scopus ID: 2-s2.0-85199793095. OAI: oai:DiVA.org:hh-53114. DiVA id: diva2:1849660
Conference
35th IEEE Intelligent Vehicles Symposium, IV 2024, Jeju Island, South Korea, 2-5 June, 2024
Part of project
ROADVIEW - Robust Automated Driving in Extreme Weather, European Commission
Funder
EU, Horizon Europe, 101069576
Note

As manuscript in thesis.

Available from: 2024-04-08. Created: 2024-04-08. Last updated: 2025-02-07. Bibliographically approved.
In thesis
1. Semantics-aware Multi-modal Scene Perception for Autonomous Vehicles
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Autonomous vehicles represent the pinnacle of modern technological innovation, navigating complex and unpredictable environments. To do so effectively, they rely on a sophisticated array of sensors. This thesis explores two of the most crucial sensors: LiDARs, known for their accuracy in generating detailed 3D maps of the environment, and RGB cameras, essential for processing visual cues critical for navigation. Together, these sensors form a comprehensive perception system that enables autonomous vehicles to operate safely and efficiently.

However, the reliability of these vehicles has yet to be tested when key sensors fail. The abrupt failure of a camera, for instance, disrupts the vehicle's perception system, creating a significant gap in sensory input. This thesis addresses this challenge by introducing a novel multi-modal domain translation framework that integrates LiDAR and RGB camera data while ensuring continuous functionality despite sensor failures. At the core of this framework is an innovative model capable of synthesizing RGB images and their corresponding segment maps from raw LiDAR data by exploiting scene semantics. The proposed framework is the first of its kind, demonstrating that scene semantics can bridge the gap between domains with distinct data structures, such as unorganized, sparse 3D LiDAR point clouds and structured 2D camera data. Thus, this thesis represents a significant leap forward in the field, offering a robust solution to the challenge of RGB data recovery without camera sensors. (A sketch of such a model's interface follows below.)
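As a rough illustration of such a model's interface, the sketch below pairs a shared encoder over a 2D range-image projection of the LiDAR scan with two decoder heads, one producing a synthetic RGB image and one a segment map. The architecture, layer sizes, and input encoding are assumptions for illustration only, not the thesis's actual design.

```python
import torch
import torch.nn as nn

class LidarToCameraTranslator(nn.Module):
    """Illustrative LiDAR-to-camera domain translation interface.

    Assumes the scan arrives as a 5-channel range image
    (x, y, z, intensity, range); everything here is a sketch.
    """
    def __init__(self, num_classes, in_channels=5):
        super().__init__()
        # Shared encoder over the projected LiDAR scan.
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
        )
        # Two heads: a synthetic RGB image and a semantic segment map.
        self.rgb_head = nn.Conv2d(128, 3, 1)
        self.seg_head = nn.Conv2d(128, num_classes, 1)

    def forward(self, range_image):
        feats = self.encoder(range_image)
        rgb = torch.sigmoid(self.rgb_head(feats))   # (B, 3, H, W)
        seg_logits = self.seg_head(feats)           # (B, C, H, W)
        return rgb, seg_logits
```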

The practical application of this model is thoroughly explored in the thesis. It involves testing the model’s capability to generate pseudo point clouds from RGB depth estimates, which, when combined with LiDAR data, create an enriched perception dataset. This enriched dataset is pivotal in enhancing object detection capabilities, a fundamental aspect of autonomous vehicle navigation. The quantitative and qualitative evidence reported in this thesis demonstrates that the synthetic generation of data not only compensates for the loss of sensory input but also considerably improves the performance of object detection systems compared to using raw LiDAR data only.
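A minimal sketch of how such an enriched input could be assembled, assuming a known camera-to-LiDAR extrinsic transform; the per-point source flag is our convention for illustration, not necessarily the thesis's exact feature encoding.

```python
import numpy as np

def enrich_scan(raw_lidar, pseudo_cam, T_cam_to_lidar):
    """Fuse a raw LiDAR scan with pseudo points from depth estimates.

    raw_lidar:      (N, 3) xyz points in the LiDAR frame.
    pseudo_cam:     (M, 3) back-projected depth points, camera frame.
    T_cam_to_lidar: (4, 4) homogeneous extrinsic transform (assumed known).
    """
    # Move pseudo points into the LiDAR coordinate frame.
    hom = np.hstack([pseudo_cam, np.ones((len(pseudo_cam), 1))])
    pseudo_lidar = (T_cam_to_lidar @ hom.T).T[:, :3]

    # Tag each point with its origin (0 = real, 1 = pseudo) so a
    # downstream detector can weight the two sources differently.
    raw = np.hstack([raw_lidar, np.zeros((len(raw_lidar), 1))])
    pseudo = np.hstack([pseudo_lidar, np.ones((len(pseudo_lidar), 1))])
    return np.vstack([raw, pseudo])
```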

By addressing the critical issue of sensor failure and presenting viable solutions, this thesis contributes to enhancing the safety, reliability, and efficiency of autonomous vehicles. It paves the way for further research and development, setting a new standard for autonomous vehicle technology in scenarios of sensor malfunctions or adverse environmental conditions.

Place, publisher, year, edition, pages
Halmstad: Halmstad University Press, 2024. p. 40
Series
Halmstad University Dissertations ; 117
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:hh:diva-53115. ISBN: 978-91-89587-50-2, 978-91-89587-51-9.
Public defence
2024-06-13, Wigforss, hus J, Kristian IV:s väg 3, Halmstad, 09:00 (English)
Available from: 2024-05-07. Created: 2024-04-08. Last updated: 2025-02-07. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text; Scopus

Authority records

Cortinhal, Tiago; Gouigah, Idriss; Aksoy, Eren
