Semantics-aware Multi-modal Scene Perception for Autonomous Vehicles
Cortinhal, Tiago. Halmstad University, School of Information Technology. ORCID iD: 0000-0002-8067-9521
2024 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Autonomous vehicles represent the pinnacle of modern technological innovation, navigating complex and unpredictable environments. To do so effectively, they rely on a sophisticated array of sensors. This thesis explores two of the most crucial sensors: LiDARs, known for their accuracy in generating detailed 3D maps of the environment, and RGB cameras, essential for processing visual cues critical for navigation. Together, these sensors form a comprehensive perception system that enables autonomous vehicles to operate safely and efficiently.

However, the reliability of these vehicles has yet to be tested when key sensors fail. The abrupt failure of a camera, for instance, disrupts the vehicle’s perception system, creating a significant gap in sensory input. This thesis addresses this challenge by introducing a novel multi-modal domain translation framework that integrates LiDAR and RGB camera data while ensuring continuous functionality despite sensor failures. At the core of this framework is an innovative model capable of synthesizing RGB images and their corresponding segment maps from raw LiDAR data by exploiting the scene semantics. The proposed framework is the first of its kind, demonstrating that scene semantics can bridge the gap across domains with distinct data structures, such as unorganized sparse 3D LiDAR point clouds and structured 2D camera data. Thus, this thesis represents a significant leap forward in the field, offering a robust solution to the challenge of RGB data recovery without camera sensors.
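
To make the two-stage idea above concrete, the following is a minimal, illustrative sketch (not the actual networks from the thesis): a tiny segmentation module predicts per-pixel semantics from a projected LiDAR range image, and a tiny translator module maps the resulting segment probabilities to an RGB image. All layer sizes, channel counts, and the random input are assumptions chosen only for demonstration.

```python
# Illustrative sketch only: tiny placeholder networks showing the two-stage
# "LiDAR -> scene semantics -> RGB" idea. Layer sizes, channel counts and the
# random input are assumptions; this is not the actual model from the thesis.
import torch
import torch.nn as nn

NUM_CLASSES = 20              # assumed SemanticKITTI-style label count
RANGE_H, RANGE_W = 64, 1024   # assumed LiDAR range-image resolution

class TinySegmenter(nn.Module):
    """Predicts per-pixel semantic logits from a projected LiDAR range image."""
    def __init__(self, in_ch=5, num_classes=NUM_CLASSES):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, num_classes, 1),
        )

    def forward(self, x):
        return self.net(x)

class TinyTranslator(nn.Module):
    """Maps a (soft) segment map to an RGB image; a stand-in for the
    conditional generative model described in the thesis."""
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(num_classes, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, 1), nn.Sigmoid(),
        )

    def forward(self, seg_probs):
        return self.net(seg_probs)

# A range image with 5 channels (x, y, z, intensity, range) is a common encoding.
lidar_range_image = torch.rand(1, 5, RANGE_H, RANGE_W)

segmenter, translator = TinySegmenter(), TinyTranslator()
seg_logits = segmenter(lidar_range_image)          # (1, C, H, W) scene semantics
rgb_pred = translator(seg_logits.softmax(dim=1))   # (1, 3, H, W) synthetic RGB image
print(rgb_pred.shape)
```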

The practical application of this model is thoroughly explored in the thesis. It involves testing the model’s capability to generate pseudo point clouds from RGB depth estimates, which, when combined with LiDAR data, create an enriched perception dataset. This enriched dataset is pivotal in enhancing object detection capabilities, a fundamental aspect of autonomous vehicle navigation. The quantitative and qualitative evidence reported in this thesis demonstrates that the synthetic generation of data not only compensates for the loss of sensory input but also considerably improves the performance of object detection systems compared to using raw LiDAR data only.
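
A hedged sketch of the pseudo point cloud idea described above: a dense depth map is back-projected through an assumed pinhole camera model into 3D points, which are then concatenated with the raw LiDAR scan to form the enriched input for a detector. The intrinsics, image size, and random inputs are placeholders, and both point sets are assumed to already share one coordinate frame.

```python
# Sketch with assumed camera intrinsics: back-project a dense depth estimate into
# a "pseudo point cloud" and merge it with the raw LiDAR scan. All numbers and the
# random inputs are placeholders; both point sets are assumed to share one frame.
import numpy as np

def depth_to_pseudo_points(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (N, 3) point cloud (camera frame)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    valid = depth > 0
    z = depth[valid]
    x = (u[valid] - cx) * z / fx
    y = (v[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)

# Hypothetical inputs: a synthetic depth map and a raw LiDAR scan.
depth = np.random.uniform(1.0, 60.0, size=(64, 1024)).astype(np.float32)
raw_lidar = np.random.uniform(-50.0, 50.0, size=(120000, 3)).astype(np.float32)

pseudo = depth_to_pseudo_points(depth, fx=721.5, fy=721.5, cx=512.0, cy=32.0)
enriched = np.concatenate([raw_lidar, pseudo], axis=0)  # denser cloud for the detector
print(raw_lidar.shape, pseudo.shape, enriched.shape)
```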

By addressing the critical issue of sensor failure and presenting viable solutions, this thesis contributes to enhancing the safety, reliability, and efficiency of autonomous vehicles. It paves the way for further research and development, setting a new standard for autonomous vehicle technology in scenarios of sensor malfunctions or adverse environmental conditions.

Place, publisher, year, edition, pages
Halmstad: Halmstad University Press, 2024, p. 40
Series
Halmstad University Dissertations ; 117
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:hh:diva-53115
ISBN: 978-91-89587-50-2 (electronic)
ISBN: 978-91-89587-51-9 (print)
OAI: oai:DiVA.org:hh-53115
DiVA, id: diva2:1849702
Public defence
2024-06-13, Wigforss, Building J, Kristian IV:s väg 3, Halmstad, 09:00 (English)
Available from: 2024-05-07 Created: 2024-04-08 Last updated: 2024-05-24 Bibliographically approved
List of papers
1. SalsaNext: Fast, Uncertainty-aware Semantic Segmentation of LiDAR Point Clouds for Autonomous Driving
2021 (English) In: Advances in Visual Computing: 15th International Symposium, ISVC 2020, San Diego, CA, USA, October 5–7, 2020, Proceedings, Part II / [ed] Bebis, G., Yin, Z., Kim, E., Bender, J., Subr, K., Kwon, B.C., Zhao, J., Kalkofen, D., Baciu, G., Cham: Springer, 2021, Vol. 12510, p. 207-222. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we introduce SalsaNext for the uncertainty-aware semantic segmentation of a full 3D LiDAR point cloud in real-time. SalsaNext is the next version of SalsaNet, which has an encoder-decoder architecture where the encoder unit has a set of ResNet blocks and the decoder part combines upsampled features from the residual blocks. In contrast to SalsaNet, we introduce a new context module, replace the ResNet encoder blocks with a new residual dilated convolution stack with gradually increasing receptive fields, and add the pixel-shuffle layer in the decoder. Additionally, we switch from stride convolution to average pooling and also apply central dropout treatment. To directly optimize the Jaccard index, we further combine the weighted cross-entropy loss with the Lovász-Softmax loss. We finally inject a Bayesian treatment to compute the epistemic and aleatoric uncertainties for each point in the cloud. We provide a thorough quantitative evaluation on the Semantic-KITTI dataset, which demonstrates that the proposed SalsaNext outperforms other published semantic segmentation networks and achieves 3.6% higher accuracy than the previous state-of-the-art method. We also release our source code [1]. © 2020, Springer Nature Switzerland AG.

[1] https://github.com/TiagoCortinhal/SalsaNext
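
The loss described above combines a weighted cross-entropy term with the Lovász-Softmax loss to directly target the Jaccard index. The sketch below illustrates the same idea with a simpler soft-Jaccard surrogate standing in for the Lovász-Softmax term; the class count, class weights, and random tensors are assumptions for demonstration only.

```python
# Sketch of a "weighted cross-entropy + IoU-oriented" training objective. The second
# term is a simple soft-Jaccard surrogate used here as a stand-in for the
# Lovász-Softmax loss that SalsaNext actually uses; class count, weights and the
# random tensors are assumptions for demonstration.
import torch
import torch.nn.functional as F

def soft_jaccard_loss(logits, targets, eps=1e-6):
    """1 - soft IoU averaged over classes. logits: (B, C, H, W), targets: (B, H, W)."""
    num_classes = logits.shape[1]
    probs = logits.softmax(dim=1)
    onehot = F.one_hot(targets, num_classes).permute(0, 3, 1, 2).float()
    dims = (0, 2, 3)
    intersection = (probs * onehot).sum(dims)
    union = probs.sum(dims) + onehot.sum(dims) - intersection
    return (1.0 - (intersection + eps) / (union + eps)).mean()

def combined_loss(logits, targets, class_weights):
    ce = F.cross_entropy(logits, targets, weight=class_weights)
    return ce + soft_jaccard_loss(logits, targets)

logits = torch.randn(2, 20, 64, 1024, requires_grad=True)   # dummy network output
targets = torch.randint(0, 20, (2, 64, 1024))               # dummy ground-truth labels
loss = combined_loss(logits, targets, class_weights=torch.ones(20))
loss.backward()
print(float(loss))
```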

Place, publisher, year, edition, pages
Cham: Springer, 2021
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 12510
Keywords
Semantic Segmentation, LiDAR Point Clouds, Deep Learning
National Category
Signal Processing
Identifiers
urn:nbn:se:hh:diva-43528 (URN)
10.1007/978-3-030-64559-5_16 (DOI)
2-s2.0-85098103699 (Scopus ID)
978-3-030-64559-5 (ISBN)
978-3-030-64558-8 (ISBN)
Conference
15th International Symposium, ISVC 2020, San Diego, CA, USA, October 5–7, 2020
Projects
SHARPEN
Funder
Vinnova
Available from: 2020-11-26 Created: 2020-11-26 Last updated: 2024-05-07 Bibliographically approved
2. Semantics-aware Multi-modal Domain Translation: From LiDAR Point Clouds to Panoramic Color Images
2021 (English) In: 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Los Alamitos: IEEE Computer Society, 2021, p. 3032-3041. Conference paper, Published paper (Refereed)
Abstract [en]

In this work, we present a simple yet effective framework to address the domain translation problem between different sensor modalities with unique data formats. By relying only on the semantics of the scene, our modular generative framework can, for the first time, synthesize a panoramic color image from a given full 3D LiDAR point cloud. The framework starts with semantic segmentation of the point cloud, which is initially projected onto a spherical surface. The same semantic segmentation is applied to the corresponding camera image. Next, our new conditional generative model adversarially learns to translate the predicted LiDAR segment maps to the camera image counterparts. Finally, generated image segments are processed to render the panoramic scene images. We provide a thorough quantitative evaluation on the Semantic-KITTI dataset and show that our proposed framework outperforms other strong baseline models. Our source code is available at https://github.com/halmstad-University/TITAN-NET. © 2021 IEEE.
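
The spherical projection step mentioned in the abstract is a standard way to turn an unorganized point cloud into a 2D range image that a segmentation network can consume. The sketch below shows one common formulation; the image resolution and vertical field of view are assumptions (typical values for a 64-beam sensor), not parameters taken from the paper.

```python
# Sketch of the spherical (range-image) projection step: each 3D LiDAR point is
# mapped to a 2D pixel via its azimuth and elevation angles. Image size and the
# vertical field of view are assumptions (typical 64-beam values), and points
# outside the field of view are simply clipped to the border rows.
import numpy as np

def spherical_projection(points, h=64, w=2048, fov_up_deg=3.0, fov_down_deg=-25.0):
    """points: (N, 3) xyz. Returns an (h, w) range image (0 where no point landed)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1) + 1e-8

    yaw = np.arctan2(y, x)                         # azimuth in [-pi, pi]
    pitch = np.arcsin(np.clip(z / r, -1.0, 1.0))   # elevation

    fov_up, fov_down = np.deg2rad(fov_up_deg), np.deg2rad(fov_down_deg)
    fov = fov_up - fov_down

    u = 0.5 * (1.0 - yaw / np.pi) * w              # column from azimuth
    v = (1.0 - (pitch - fov_down) / fov) * h       # row from elevation

    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)

    range_image = np.zeros((h, w), dtype=np.float32)
    order = np.argsort(r)[::-1]                    # far to near: nearest point wins
    range_image[v[order], u[order]] = r[order]
    return range_image

cloud = np.random.uniform(-50.0, 50.0, size=(100000, 3)).astype(np.float32)
print(spherical_projection(cloud).shape)  # (64, 2048)
```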

Place, publisher, year, edition, pages
Los Alamitos: IEEE Computer Society, 2021
Keywords
Computer vision, Image segmentation, Laser radar, Three-dimensional displays, Image synthesis, Computational modeling, Semantics, Color
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:hh:diva-45850 (URN)
10.1109/ICCVW54120.2021.00338 (DOI)
000739651103014 ()
2-s2.0-85122119016 (Scopus ID)
978-1-6654-0191-3 (ISBN)
978-1-6654-0192-0 (ISBN)
Conference
2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW 2021), Montreal, QC, Canada (virtual/online), 11-17 October 2021
Projects
SHARPEN
Funder
Vinnova, 2018-05001
Available from: 2021-11-08 Created: 2021-11-08 Last updated: 2024-05-07 Bibliographically approved
3. Depth- and semantics-aware multi-modal domain translation: Generating 3D panoramic color images from LiDAR point clouds
2024 (English) In: Robotics and Autonomous Systems, ISSN 0921-8890, E-ISSN 1872-793X, Vol. 171, p. 1-9, article id 104583. Article in journal (Refereed), Published
Abstract [en]

This work presents a new depth- and semantics-aware conditional generative model, named TITAN-Next, for cross-domain image-to-image translation in a multi-modal setup between LiDAR and camera sensors. The proposed model leverages scene semantics as a mid-level representation and is able to translate raw LiDAR point clouds to RGB-D camera images by solely relying on semantic scene segments. We claim that this is the first framework of its kind, and it has practical applications in autonomous vehicles such as providing a fail-safe mechanism and augmenting available data in the target image domain. The proposed model is evaluated on the large-scale and challenging Semantic-KITTI dataset, and experimental findings show that it considerably outperforms the original TITAN-Net and other strong baselines by a 23.7% margin in terms of IoU. © 2023 The Author(s).
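
The reported gains are measured in intersection-over-union (IoU). For reference, a short sketch of how per-class IoU and mean IoU are typically computed from a confusion matrix is given below; the dummy labels are for illustration only.

```python
# Sketch of the IoU metric: per-class intersection-over-union from a confusion
# matrix, with mIoU as the mean over classes. The random labels are dummies.
import numpy as np

def confusion_matrix(pred, gt, num_classes):
    idx = gt * num_classes + pred
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def iou_per_class(conf):
    tp = np.diag(conf).astype(np.float64)
    fp = conf.sum(axis=0) - tp   # predicted as class c but labelled otherwise
    fn = conf.sum(axis=1) - tp   # labelled as class c but predicted otherwise
    return tp / np.maximum(tp + fp + fn, 1)

gt = np.random.randint(0, 3, size=10000)
pred = np.random.randint(0, 3, size=10000)
iou = iou_per_class(confusion_matrix(pred, gt, num_classes=3))
print(iou, iou.mean())  # per-class IoU and mIoU
```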

Place, publisher, year, edition, pages
Amsterdam: Elsevier, 2024
Keywords
Multi-modal domain translation, Semantic perception, LiDAR
National Category
Computer Vision and Robotics (Autonomous Systems); Robotics
Identifiers
urn:nbn:se:hh:diva-52943 (URN)
10.1016/j.robot.2023.104583 (DOI)
001125648600001 ()
Funder
European Commission, 101069576
Available from: 2024-03-22 Created: 2024-03-22 Last updated: 2024-05-07 Bibliographically approved
4. Semantics-aware LiDAR-Only Pseudo Point Cloud Generation for 3D Object Detection
2024 (English) Conference paper, Published paper (Refereed)
Abstract [en]

Although LiDAR sensors are crucial for autonomous systems because they provide precise depth information, they struggle to capture fine object details, especially at a distance, due to sparse and non-uniform data. Recent advances introduced pseudo-LiDAR, i.e., synthetic dense point clouds, using additional modalities such as cameras to enhance 3D object detection. We present a novel LiDAR-only framework that augments raw scans with denser pseudo point clouds by solely relying on LiDAR sensors and scene semantics, omitting the need for cameras. Our framework first utilizes a segmentation model to extract scene semantics from raw point clouds, and then employs a multi-modal domain translator to generate synthetic image segments and depth cues without real cameras. This yields a dense pseudo point cloud enriched with semantic information. We also introduce a new semantically guided projection method, which enhances detection performance by retaining only relevant pseudo points. We applied our framework to different advanced 3D object detection methods and report a performance improvement of up to 2.9%. We also obtained results comparable to other state-of-the-art LiDAR-only detectors on the KITTI 3D object detection dataset.
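
A hedged sketch of the "retain only relevant pseudo points" idea described above: each generated pseudo point carries a predicted semantic label, and only points from detection-relevant classes are merged back into the raw scan before 3D detection. The class IDs and random inputs are placeholders, not values from the paper.

```python
# Sketch of the "retain only relevant pseudo points" idea: each pseudo point carries
# a predicted semantic label, and only points from detection-relevant classes are
# merged back into the raw scan. Class IDs and the random inputs are placeholders.
import numpy as np

RELEVANT_CLASSES = {1, 2, 3}   # hypothetical IDs for e.g. car, pedestrian, cyclist

def filter_pseudo_points(pseudo_xyz, pseudo_labels, relevant=RELEVANT_CLASSES):
    """pseudo_xyz: (N, 3) points, pseudo_labels: (N,) predicted class per point."""
    mask = np.isin(pseudo_labels, list(relevant))
    return pseudo_xyz[mask]

raw_lidar = np.random.uniform(-50.0, 50.0, size=(120000, 3)).astype(np.float32)
pseudo_xyz = np.random.uniform(-50.0, 50.0, size=(300000, 3)).astype(np.float32)
pseudo_labels = np.random.randint(0, 20, size=300000)

kept = filter_pseudo_points(pseudo_xyz, pseudo_labels)
augmented = np.concatenate([raw_lidar, kept], axis=0)   # input to the 3D detector
print(kept.shape, augmented.shape)
```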

National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:hh:diva-53114 (URN)
Conference
35th IEEE Intelligent Vehicles Symposium, Jeju Island, Korea, June 2-5, 2024
Projects
ROADVIEW
Note

As manuscript in thesis

Available from: 2024-04-08 Created: 2024-04-08 Last updated: 2024-05-07

Open Access in DiVA

fulltext (1242 kB)
