Publications (10 of 20)
Cortinhal, T. & Aksoy, E. (2024). Depth- and semantics-aware multi-modal domain translation: Generating 3D panoramic color images from LiDAR point clouds. Robotics and Autonomous Systems, 171, 1-9, Article ID 104583.
Depth- and semantics-aware multi-modal domain translation: Generating 3D panoramic color images from LiDAR point clouds
2024 (English). In: Robotics and Autonomous Systems, ISSN 0921-8890, E-ISSN 1872-793X, Vol. 171, p. 1-9, article id 104583. Article in journal (Refereed), Published
Abstract [en]

This work presents a new depth- and semantics-aware conditional generative model, named TITAN-Next, for cross-domain image-to-image translation in a multi-modal setup between LiDAR and camera sensors. The proposed model leverages scene semantics as a mid-level representation and is able to translate raw LiDAR point clouds to RGB-D camera images by relying solely on semantic scene segments. We claim that this is the first framework of its kind, and it has practical applications in autonomous vehicles, such as providing a fail-safe mechanism and augmenting available data in the target image domain. The proposed model is evaluated on the large-scale and challenging Semantic-KITTI dataset, and experimental findings show that it considerably outperforms the original TITAN-Net and other strong baselines by a 23.7% margin in terms of IoU. © 2023 The Author(s).
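The sketch below illustrates the core idea of using scene semantics as the mid-level representation: a segmentation head labels a projected LiDAR range image, and a conditional generator maps the resulting semantic map to RGB plus depth. All module names, layer sizes, and channel counts are illustrative assumptions, not the published TITAN-Next architecture.

```python
# Minimal, illustrative sketch: LiDAR range image -> semantics -> RGB-D.
# Module names and sizes are assumptions, not the TITAN-Next architecture.
import torch
import torch.nn as nn

NUM_CLASSES = 20  # e.g., a Semantic-KITTI-style label set (assumption)

class SegmentationHead(nn.Module):
    """Predicts per-pixel semantic logits from a projected LiDAR range image."""
    def __init__(self, in_ch=5):  # x, y, z, range, intensity channels (assumption)
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, NUM_CLASSES, 1),
        )
    def forward(self, range_img):
        return self.net(range_img)

class SemanticsToRGBD(nn.Module):
    """Conditional generator: one-hot semantic map -> 4-channel RGB-D image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(NUM_CLASSES, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 4, 1),  # RGB (3 channels) + depth (1 channel)
        )
    def forward(self, onehot_semantics):
        return self.net(onehot_semantics)

# Usage on a dummy 64x1024 range image
range_img = torch.randn(1, 5, 64, 1024)
logits = SegmentationHead()(range_img)
onehot = torch.nn.functional.one_hot(logits.argmax(1), NUM_CLASSES).permute(0, 3, 1, 2).float()
rgbd = SemanticsToRGBD()(onehot)  # shape (1, 4, 64, 1024)
```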

Place, publisher, year, edition, pages
Amsterdam: Elsevier, 2024
Keywords
Multi-modal domain translation, Semantic perception, LiDAR
National Category
Computer Vision and Robotics (Autonomous Systems); Robotics
Identifiers
urn:nbn:se:hh:diva-52943 (URN); 10.1016/j.robot.2023.104583 (DOI); 001125648600001
Funder
European Commission, 101069576
Available from: 2024-03-22. Created: 2024-03-22. Last updated: 2024-05-07. Bibliographically approved.
Inceoglu, A., Aksoy, E. & Sariel, S. (2024). Multimodal Detection and Classification of Robot Manipulation Failures. IEEE Robotics and Automation Letters, 9(2), 1396-1403
Multimodal Detection and Classification of Robot Manipulation Failures
2024 (English). In: IEEE Robotics and Automation Letters, E-ISSN 2377-3766, Vol. 9, no 2, p. 1396-1403. Article in journal (Refereed), Published
Abstract [en]

An autonomous service robot should be able to interact with its environment safely and robustly without requiring human assistance. Unstructured environments are challenging for robots since the exact prediction of outcomes is not always possible. Even when the robot behaviors are well-designed, the unpredictable nature of the physical robot-object interaction may lead to failures in object manipulation. In this letter, we focus on detecting and classifying both manipulation and post-manipulation phase failures using the same exteroception setup. We cover a diverse set of failure types for primary tabletop manipulation actions. In order to detect these failures, we propose FINO-Net (Inceoglu et al., 2021), a deep multimodal sensor fusion-based classifier network architecture. FINO-Net accurately detects and classifies failures from raw sensory data without any additional information on task description and scene state. In this work, we use our extended FAILURE dataset (Inceoglu et al., 2021) with 99 new multimodal manipulation recordings and annotate them with their corresponding failure types. FINO-Net achieves 0.87 failure detection and 0.80 failure classification F1 scores. Experimental results show that FINO-Net is also appropriate for real-time use. © 2016 IEEE.
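As a rough illustration of the multimodal fusion idea, the sketch below encodes each sensing modality separately, concatenates the features, and classifies the failure type. The chosen modalities, feature sizes, and class count are assumptions; this is not the published FINO-Net architecture.

```python
# Minimal multimodal fusion classifier sketch in the spirit of FINO-Net.
# Modalities, sizes, and class count are illustrative assumptions only.
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    def __init__(self, in_dim, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim))
    def forward(self, x):
        return self.net(x)

class FailureClassifier(nn.Module):
    def __init__(self, num_failure_types=5):  # assumed number of failure classes
        super().__init__()
        self.rgb_enc = ModalityEncoder(in_dim=512)    # e.g., pooled CNN features
        self.depth_enc = ModalityEncoder(in_dim=512)
        self.audio_enc = ModalityEncoder(in_dim=128)  # e.g., pooled spectrogram features
        self.head = nn.Sequential(
            nn.Linear(64 * 3, 64), nn.ReLU(),
            nn.Linear(64, num_failure_types),
        )
    def forward(self, rgb, depth, audio):
        # Concatenate per-modality features, then classify the failure type.
        fused = torch.cat([self.rgb_enc(rgb), self.depth_enc(depth), self.audio_enc(audio)], dim=-1)
        return self.head(fused)

logits = FailureClassifier()(torch.randn(2, 512), torch.randn(2, 512), torch.randn(2, 128))
```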

Place, publisher, year, edition, pages
Piscataway, NJ: IEEE, 2024
Keywords
Robot sensing systems, Robots, Task analysis, Monitoring, Hidden Markov models, Collision avoidance, Real-time systems, Deep learning methods, data sets for robot learning, failure detection and recovery, sensor fusion
National Category
Robotics; Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:hh:diva-52942 (URN); 10.1109/lra.2023.3346270 (DOI); 001136735400012; 2-s2.0-85181561810 (Scopus ID)
Note

Funding: The Scientific and Technological Research Council of Türkiye under Grant 119E-436.

Available from: 2024-03-22. Created: 2024-03-22. Last updated: 2024-03-28. Bibliographically approved.
Tzelepis, G., Aksoy, E., Borras, J. & Alenyà, G. (2024). Semantic State Estimation in Robot Cloth Manipulations Using Domain Adaptation from Human Demonstrations. In: Petia Radeva; Antonino Furnari; Kadi Bouatouch; A. Augusto Sousa (Ed.), Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP. Paper presented at 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2024, Rome, Italy, 27-29 February, 2024 (pp. 172-182). Setúbal: SciTePress, 4
Semantic State Estimation in Robot Cloth Manipulations Using Domain Adaptation from Human Demonstrations
2024 (English). In: Proceedings of the 19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications - Volume 4: VISAPP / [ed] Petia Radeva; Antonino Furnari; Kadi Bouatouch; A. Augusto Sousa, Setúbal: SciTePress, 2024, Vol. 4, p. 172-182. Conference paper, Published paper (Refereed)
Abstract [en]

Deformable object manipulations, such as those involving textiles, present a significant challenge due to their high dimensionality and complexity. In this paper, we propose a solution for estimating semantic states in cloth manipulation tasks. To this end, we introduce a new, large-scale, fully-annotated RGB image dataset of semantic states featuring a diverse range of human demonstrations of various complex cloth manipulations. This effectively transforms the problem of action recognition into a classification task. We then evaluate the generalizability of our approach by employing domain adaptation techniques to transfer knowledge from human demonstrations to two distinct robotic platforms: Kinova and UR robots. Additionally, we further improve performance by utilizing a semantic state graph learned from human manipulation data. © 2024 by SCITEPRESS – Science and Technology Publications, Lda.
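For readers unfamiliar with domain adaptation, the sketch below shows one common recipe (DANN-style adversarial feature alignment) for transferring a semantic-state classifier from human-demonstration images to robot images. It only illustrates the general technique under assumed shapes and class counts; the paper's adaptation methods may differ.

```python
# Hedged sketch of DANN-style adversarial domain adaptation: the feature extractor
# is trained to fool a domain classifier while the state head stays supervised on
# the human-demonstration domain. Shapes and class counts are assumptions.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None  # reverse gradients for the feature net

feature_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256), nn.ReLU())
state_head = nn.Linear(256, 8)    # assumed number of semantic cloth states
domain_head = nn.Linear(256, 2)   # human-demo domain vs. robot domain

def loss_fn(human_imgs, human_labels, robot_imgs, lamb=0.1):
    ce = nn.CrossEntropyLoss()
    f_h, f_r = feature_net(human_imgs), feature_net(robot_imgs)
    cls_loss = ce(state_head(f_h), human_labels)          # supervised on human demos
    dom_logits = domain_head(GradReverse.apply(torch.cat([f_h, f_r]), lamb))
    dom_labels = torch.cat([torch.zeros(len(f_h)), torch.ones(len(f_r))]).long()
    return cls_loss + ce(dom_logits, dom_labels)          # align the two domains

loss = loss_fn(torch.randn(4, 3, 64, 64), torch.randint(0, 8, (4,)), torch.randn(4, 3, 64, 64))
loss.backward()
```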

Place, publisher, year, edition, pages
Setúbal: SciTePress, 2024
Series
VISIGRAPP, E-ISSN 2184-4321
Keywords
Cloth, Domain Adaptation, Garment Manipulation, Robotic Perception, Semantics, Transfer Learning
National Category
Robotics
Identifiers
urn:nbn:se:hh:diva-53273 (URN); 10.5220/0012368200003660 (DOI); 2-s2.0-85190696583 (Scopus ID); 978-989-758-679-8 (ISBN)
Conference
19th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, VISIGRAPP 2024, Rome, Italy, 27-29 February, 2024
Projects
CHLOE-GRAPH; CO-HERENT; CLOTHILDE
Funder
EU, Horizon 2020, ERC–2016–ADG–741930
Note

Funding: The Spanish State Research Agency through the project CHLOE-GRAPH (PID2020-118649RB-I00); by MCIN/AEI/10.13039/501100011033 and by the European Union (EU) NextGenerationEU/PRTR under the project CO-HERENT (PCI2020-120718-2); and the EU H2020 Programme under grant agreement ERC-2016-ADG-741930 (CLOTHILDE).

Available from: 2024-05-31. Created: 2024-05-31. Last updated: 2024-05-31. Bibliographically approved.
Cortinhal, T., Gouigah, I. & Aksoy, E. (2024). Semantics-aware LiDAR-Only Pseudo Point Cloud Generation for 3D Object Detection. Paper presented at 35th IEEE Intelligent Vehicles Symposium, Jeju Island, Korea, June 2-5, 2024.
Semantics-aware LiDAR-Only Pseudo Point Cloud Generation for 3D Object Detection
2024 (English). Conference paper, Published paper (Refereed)
Abstract [en]

Although LiDAR sensors are crucial for autonomous systems because they provide precise depth information, they struggle to capture fine object details, especially at a distance, due to sparse and non-uniform data. Recent advances introduced pseudo-LiDAR, i.e., synthetic dense point clouds, using additional modalities such as cameras to enhance 3D object detection. We present a novel LiDAR-only framework that augments raw scans with denser pseudo point clouds by relying solely on LiDAR sensors and scene semantics, omitting the need for cameras. Our framework first utilizes a segmentation model to extract scene semantics from raw point clouds, and then employs a multi-modal domain translator to generate synthetic image segments and depth cues without real cameras. This yields a dense pseudo point cloud enriched with semantic information. We also introduce a new semantically guided projection method, which enhances detection performance by retaining only relevant pseudo points. We applied our framework to several advanced 3D object detection methods and report a performance improvement of up to 2.9%. We also obtained results comparable to other state-of-the-art LiDAR-only detectors on the KITTI 3D object detection dataset.
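A minimal sketch of the semantically guided filtering step: pseudo points generated from the synthetic depth are kept only if their predicted semantic class is considered relevant, and the survivors are merged with the raw scan. The class IDs and relevance set are assumptions for illustration, not the paper's exact projection method.

```python
# Keep only semantically relevant pseudo points before merging them with the raw scan.
# Class IDs and the relevance set are illustrative assumptions.
import numpy as np

RELEVANT_CLASSES = {0, 1, 2}  # e.g., car, pedestrian, cyclist (assumed IDs)

def merge_with_pseudo(raw_points, pseudo_points, pseudo_labels):
    """raw_points: (N, 3), pseudo_points: (M, 3), pseudo_labels: (M,) semantic class IDs."""
    keep = np.isin(pseudo_labels, list(RELEVANT_CLASSES))
    return np.concatenate([raw_points, pseudo_points[keep]], axis=0)

raw = np.random.randn(1000, 3)
pseudo = np.random.randn(5000, 3)
labels = np.random.randint(0, 10, size=5000)
augmented = merge_with_pseudo(raw, pseudo, labels)  # denser cloud around relevant objects
```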

National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:hh:diva-53114 (URN)
Conference
35th IEEE Intelligent Vehicles Symposium, Jeju Island, Korea, June 2-5, 2024
Projects
ROADVIEW
Note

As manuscript in thesis

Available from: 2024-04-08. Created: 2024-04-08. Last updated: 2024-05-07.
Rosberg, F., Aksoy, E., Alonso-Fernandez, F. & Englund, C. (2023). FaceDancer: Pose- and Occlusion-Aware High Fidelity Face Swapping. In: Proceedings - 2023 IEEE Winter Conference on Applications of Computer Vision, WACV 2023. Paper presented at 23rd IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2023, Waikoloa, Hawaii, USA, 3-7 January 2023 (pp. 3443-3452). Piscataway: IEEE
FaceDancer: Pose- and Occlusion-Aware High Fidelity Face Swapping
2023 (English). In: Proceedings - 2023 IEEE Winter Conference on Applications of Computer Vision, WACV 2023, Piscataway: IEEE, 2023, p. 3443-3452. Conference paper, Published paper (Refereed)
Abstract [en]

In this work, we present a new single-stage method for subject agnostic face swapping and identity transfer, named FaceDancer. We have two major contributions: Adaptive Feature Fusion Attention (AFFA) and Interpreted Feature Similarity Regularization (IFSR). The AFFA module is embedded in the decoder and adaptively learns to fuse attribute features and features conditioned on identity information without requiring any additional facial segmentation process. In IFSR, we leverage the intermediate features in an identity encoder to preserve important attributes such as head pose, facial expression, lighting, and occlusion in the target face, while still transferring the identity of the source face with high fidelity. We conduct extensive quantitative and qualitative experiments on various datasets and show that the proposed FaceDancer outperforms other state-of-the-art networks in terms of identity transfer, while having significantly better pose preservation than most of the previous methods. © 2023 IEEE.
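The sketch below illustrates the general flavour of adaptive feature fusion: a small network predicts a per-pixel gate from the concatenated attribute and identity-conditioned feature maps and blends them without any facial segmentation. The gating form and layer sizes are assumptions, not the published AFFA module.

```python
# Hedged sketch of learned, gated fusion between attribute features and
# identity-conditioned features. Not the published AFFA module.
import torch
import torch.nn as nn

class AdaptiveFeatureFusion(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        # Predict a per-pixel, per-channel gate from the concatenated feature maps.
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 1), nn.Sigmoid(),
        )
    def forward(self, attr_feat, id_feat):
        m = self.gate(torch.cat([attr_feat, id_feat], dim=1))
        return m * id_feat + (1.0 - m) * attr_feat  # blend without explicit segmentation

fused = AdaptiveFeatureFusion()(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```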

Place, publisher, year, edition, pages
Piscataway: IEEE, 2023
Keywords
Algorithms, Biometrics, and algorithms (including transfer, low-shot, semi-, self-, and un-supervised learning), body pose, face, formulations, gesture, Machine learning architectures
National Category
Signal Processing
Identifiers
urn:nbn:se:hh:diva-48618 (URN); 10.1109/WACV56688.2023.00345 (DOI); 000971500203054; 2-s2.0-85149000603 (Scopus ID); 9781665493468 (ISBN)
Conference
23rd IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2023, Waikoloa, Hawaii, USA, 3-7 January 2023
Available from: 2022-11-15. Created: 2022-11-15. Last updated: 2024-03-18. Bibliographically approved.
Rosberg, F., Aksoy, E., Englund, C. & Alonso-Fernandez, F. (2023). FIVA: Facial Image and Video Anonymization and Anonymization Defense. In: 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW). Paper presented at 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW 2023), Paris, France, 2-6 October, 2023 (pp. 362-371). Los Alamitos, CA: IEEE
FIVA: Facial Image and Video Anonymization and Anonymization Defense
2023 (English). In: 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Los Alamitos, CA: IEEE, 2023, p. 362-371. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we present a new approach for facial anonymization in images and videos, abbreviated as FIVA. With our suggested identity tracking, the proposed method is able to maintain the same face anonymization consistently over frames and guarantees a strong difference from the original face. FIVA allows for 0 true positives for a false acceptance rate of 0.001. Our work considers the important security issue of reconstruction attacks and investigates adversarial noise, uniform noise, and parameter noise to disrupt such attacks. In this regard, we apply different defense and protection methods against these privacy threats to demonstrate the scalability of FIVA. On top of this, we also show that reconstruction attack models can be used to detect deep fakes. Last but not least, we provide experimental results showing how FIVA can even enable face swapping trained purely on a single target image. © 2023 IEEE.
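As a toy illustration of one defense flavour mentioned above, the sketch below adds uniform noise to the anonymized output so that a reconstruction-attack model degrades. The shapes, noise scale, and stand-in attack model are assumptions, not the paper's evaluated configuration.

```python
# Toy sketch: uniform-noise protection applied to an anonymized face image,
# compared against a stand-in reconstruction-attack model. Assumed shapes only.
import torch
import torch.nn as nn

attack_decoder = nn.Sequential(  # stand-in for a reconstruction-attack model
    nn.Flatten(), nn.Linear(3 * 32 * 32, 256), nn.ReLU(), nn.Linear(256, 3 * 32 * 32)
)

def protect(anonymized_img, scale=0.05):
    """Add uniform noise in [-scale, scale] to the anonymized image."""
    return anonymized_img + torch.empty_like(anonymized_img).uniform_(-scale, scale)

anon = torch.rand(1, 3, 32, 32)                  # anonymized face (placeholder)
recon_clean = attack_decoder(anon)               # attack on the unprotected output
recon_protected = attack_decoder(protect(anon))  # attack on the protected output
print(torch.mean((recon_clean - recon_protected) ** 2).item())
```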

Place, publisher, year, edition, pages
Los Alamitos, CA: IEEE, 2023
Series
IEEE International Conference on Computer Vision Workshops, E-ISSN 2473-9944
Keywords
Anonymization, Deep Fakes, Facial Recognition, Identity Tracking, Reconstruction Attacks
National Category
Computer Sciences
Identifiers
urn:nbn:se:hh:diva-52592 (URN); 10.1109/ICCVW60793.2023.00043 (DOI); 2-s2.0-85182917356 (Scopus ID); 9798350307443 (ISBN)
Conference
2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW 2023), Paris, France, 2-6 October, 2023
Available from: 2024-02-08. Created: 2024-02-08. Last updated: 2024-03-18. Bibliographically approved.
Ak, A. C., Aksoy, E. & Sariel, S. (2023). Learning Failure Prevention Skills for Safe Robot Manipulation. IEEE Robotics and Automation Letters, 8(12), 7994-8001
Learning Failure Prevention Skills for Safe Robot Manipulation
2023 (English). In: IEEE Robotics and Automation Letters, E-ISSN 2377-3766, Vol. 8, no 12, p. 7994-8001. Article in journal (Refereed), Published
Abstract [en]

Robots are more capable than ever of performing manipulation tasks for everyday activities. However, the safety of the manipulation skills that robots employ is still an open problem. Considering all possible failures during skill learning increases the complexity of the process and restrains learning an optimal policy. Nonetheless, safety-focused modularity in the acquisition of skills has not been adequately addressed in previous works. For that purpose, we reformulate skills as base and failure prevention skills, where base skills aim at completing tasks and failure prevention skills aim at reducing the risk of failures. We then propose a modular and hierarchical method for safe robot manipulation that augments base skills by learning failure prevention skills with reinforcement learning and forming a skill library to address different safety risks. Furthermore, a skill selection policy that considers estimated risks is used for the robot to select the best control policy for safe manipulation. Our experiments show that the proposed method achieves the given goal while ensuring safety by preventing failures. We also show that, with the proposed method, skill learning is feasible and our safe manipulation tools can be transferred to the real environment. © 2023 IEEE
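The sketch below illustrates the risk-aware selection idea: a library holds a base skill plus failure prevention skills, and the controller picks the prevention skill for the riskiest estimated failure mode, falling back to the base skill otherwise. The skill names, risk values, and threshold are assumptions for illustration, not the paper's policy.

```python
# Toy sketch of risk-aware skill selection from a skill library.
# Skill names, risk estimates, and the threshold are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Skill:
    name: str
    policy: Callable[[dict], str]  # maps an observation to an action (placeholder)

def select_skill(library: Dict[str, Skill], estimated_risks: Dict[str, float],
                 base: str = "base_pick", threshold: float = 0.3) -> Skill:
    """Pick the prevention skill for the riskiest failure mode, else the base skill."""
    riskiest, risk = max(estimated_risks.items(), key=lambda kv: kv[1])
    return library[riskiest] if risk > threshold else library[base]

library = {
    "base_pick": Skill("base_pick", lambda obs: "grasp"),
    "slip": Skill("slip", lambda obs: "regrasp_with_higher_force"),
    "collision": Skill("collision", lambda obs: "retreat_and_replan"),
}
chosen = select_skill(library, {"slip": 0.6, "collision": 0.1})
print(chosen.name, chosen.policy({}))  # -> slip regrasp_with_higher_force
```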

Place, publisher, year, edition, pages
Piscataway, NJ: IEEE, 2023
Keywords
Estimation, Failure Prevention, Libraries, Mathematical models, Reinforcement Learning, Robot Safety, Robots, Robust/Adaptive Control, Safe Robot Manipulation, Safety, Task analysis
National Category
Control Engineering
Identifiers
urn:nbn:se:hh:diva-51939 (URN); 10.1109/LRA.2023.3324587 (DOI); 001089241600003; 2-s2.0-85174857924 (Scopus ID)
Note

This work was supported by the Scientific and Technological Research Council of Turkey (TUBITAK) under Grant 119E-436.

Available from: 2023-11-16. Created: 2023-11-16. Last updated: 2024-01-17. Bibliographically approved.
Rezk, N., Nordström, T., Stathis, D., Ul-Abdin, Z., Aksoy, E. & Hemani, A. (2022). MOHAQ: Multi-Objective Hardware-Aware Quantization of recurrent neural networks. Journal of systems architecture, 133, Article ID 102778.
MOHAQ: Multi-Objective Hardware-Aware Quantization of recurrent neural networks
2022 (English). In: Journal of systems architecture, ISSN 1383-7621, E-ISSN 1873-6165, Vol. 133, article id 102778. Article in journal (Refereed), Published
Abstract [en]

The compression of deep learning models is of fundamental importance in deploying such models to edge devices. The selection of compression parameters can be automated to meet changes in the hardware platform and application. This article introduces a Multi-Objective Hardware-Aware Quantization (MOHAQ) method, which considers hardware performance and inference error as objectives for mixed-precision quantization. The proposed method feasibly evaluates candidate solutions in a large search space by relying on two steps. First, post-training quantization is applied for fast solution evaluation (inference-only search). Second, we propose the "beacon-based search" to retrain only selected solutions and use them as beacons to estimate the effect of retraining on other solutions. We use speech recognition models on the TIMIT dataset. Experimental evaluations show that Simple Recurrent Unit (SRU)-based models can be compressed by up to 8x through post-training quantization without any significant increase in error. On SiLago, we found solutions that achieve 97% and 86% of the maximum possible speedup and energy saving, with a minor increase in error on an SRU-based model. On Bitfusion, the beacon-based search reduced the error gain of the inference-only search on SRU-based models and a Light Gated Recurrent Unit (LiGRU)-based model by up to 4.9 and 3.9 percentage points, respectively.
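As a toy illustration of the two-objective search setting, the sketch below assigns a bit-width per layer to each candidate and keeps the Pareto front over (estimated error, estimated speedup). The random objective functions and candidate sampling are stand-ins; they do not reproduce MOHAQ's evaluation or beacon-based retraining.

```python
# Toy mixed-precision search: candidates are per-layer bit-widths, kept if they
# are Pareto-optimal under (error, speedup). Objectives are placeholder stand-ins.
import random

BITWIDTHS = [2, 4, 8]
NUM_LAYERS = 6

def sample_candidate():
    return tuple(random.choice(BITWIDTHS) for _ in range(NUM_LAYERS))

def estimate_error(cand):     # placeholder: fewer bits -> higher error
    return sum(1.0 / b for b in cand) + random.uniform(0, 0.1)

def estimate_speedup(cand):   # placeholder: fewer bits -> higher speedup
    return sum(8.0 / b for b in cand)

def pareto_front(cands):
    scored = [(estimate_error(c), estimate_speedup(c), c) for c in cands]
    front = []
    for err, spd, c in scored:
        dominated = any(e2 <= err and s2 >= spd and (e2, s2) != (err, spd)
                        for e2, s2, _ in scored)
        if not dominated:
            front.append((err, spd, c))
    return front

candidates = {sample_candidate() for _ in range(200)}
for err, spd, cand in sorted(pareto_front(list(candidates))):
    print(f"bits={cand} error~{err:.2f} speedup~{spd:.1f}")
```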

Place, publisher, year, edition, pages
Amsterdam: Elsevier, 2022
Keywords
Simple recurrent unit, Light gated recurrent unit, Quantization, Multi-objective optimization, Genetic algorithms
National Category
Embedded Systems
Identifiers
urn:nbn:se:hh:diva-48679 (URN); 10.1016/j.sysarc.2022.102778 (DOI); 000892114100006; 2-s2.0-85141919627 (Scopus ID)
Available from: 2022-11-23. Created: 2022-11-23. Last updated: 2023-08-21. Bibliographically approved.
Nowaczyk, S., Resmini, A., Long, V., Fors, V., Cooney, M., Duarte, E. K., . . . Dougherty, M. (2022). Smaller is smarter: A case for small to medium-sized smart cities. Journal of Smart Cities and Society, 1(2), 95-117
Smaller is smarter: A case for small to medium-sized smart cities
2022 (English). In: Journal of Smart Cities and Society, ISSN 2772-3577, Vol. 1, no 2, p. 95-117. Article in journal (Refereed), Published
Abstract [en]

Smart Cities have been around as a concept for quite some time. However, most examples of Smart Cities (SCs) originate from megacities (MCs), despite the fact that most people live in Small and Medium-sized Cities (SMCs). This paper addresses the contextual setting for smart cities from the perspective of such small and medium-sized cities. It starts with an overview of the current trends in the research and development of SCs, highlighting the current bias and the challenges it brings. We follow with a few concrete examples of projects which introduced some form of "smartness" in the small and medium-sized city context, explaining what influence that context had and what specific effects it led to. Building on those experiences, we summarise the current understanding of Smart Cities, with a focus on their multi-faceted (e.g., smart economy, smart people, smart governance, smart mobility, smart environment and smart living) nature; we describe mainstream publications and highlight the (sometimes even subconscious) bias towards large and very large cities; we give examples of (often implicit) assumptions deriving from this bias; and finally, we define the need to contextualise SCs also for small and medium-sized cities. The aim of this paper is to establish and strengthen the discourse on the need for the SMC perspective in the Smart Cities literature. We hope to provide an initial formulation of the problem, mainly focusing on the unique needs and the specific requirements. We expect that the three example cases describing the effects of applying new solutions and studying SCs in small and medium-sized cities, together with the lessons learnt from these experiences, will encourage more research to consider the SMC perspective. To this end, the current paper aims to justify the need for this under-studied perspective, as well as to propose interesting challenges faced by SMCs that can serve as initial directions for such research.

Place, publisher, year, edition, pages
Amsterdam: IOS Press, 2022
Keywords
Smart cities, small- and medium-sized cities
National Category
Information Systems, Social aspects
Research subject
Smart Cities and Communities
Identifiers
urn:nbn:se:hh:diva-47260 (URN); 10.3233/scs-210116 (DOI)
Funder
Vinnova; Knowledge Foundation
Available from: 2022-06-21. Created: 2022-06-21. Last updated: 2022-09-06. Bibliographically approved.
Akyol, G., Sariel, S. & Aksoy, E. E. (2021). A Variational Graph Autoencoder for Manipulation Action Recognition and Prediction. Paper presented at 2021 20th International Conference on Advanced Robotics (ICAR), Ljubljana, Slovenia (Virtual Event), December 7-10, 2021 (pp. 968-973). IEEE
A Variational Graph Autoencoder for Manipulation Action Recognition and Prediction
2021 (English). Conference paper, Published paper (Refereed)
Abstract [en]

Despite decades of research, understanding human manipulation activities is, and has always been, one of the most attractive and challenging research topics in computer vision and robotics. Recognition and prediction of observed human manipulation actions have their roots in applications such as human-robot interaction and robot learning from demonstration. The current research trend heavily relies on advanced convolutional neural networks to process structured Euclidean data, such as RGB camera images. These networks, however, come with immense computational complexity in order to process high-dimensional raw data.

Different from the related works, we here introduce a deep graph autoencoder to jointly learn recognition and prediction of manipulation tasks from symbolic scene graphs, instead of relying on structured Euclidean data. Our network has a variational autoencoder structure with two branches: one for identifying the input graph type and one for predicting the future graphs. The input of the proposed network is a set of semantic graphs which store the spatial relations between subjects and objects in the scene. The network output is a label set representing the detected and predicted class types. We benchmark our new model against different state-of-the-art methods on two different datasets, MANIAC and MSRC-9, and show that our proposed model can achieve better performance. We also release our source code at https://github.com/gamzeakyol/GNet.
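The sketch below illustrates the two-branch idea on a dense scene-graph adjacency: a variational encoder produces a graph embedding, one head classifies the current manipulation class, and another predicts the class of the future graph. The dense-adjacency GCN and layer sizes are illustrative assumptions, not the released GNet code.

```python
# Hedged sketch of a two-branch variational graph autoencoder over a dense
# adjacency matrix plus node features. Sizes and layers are assumptions only.
import torch
import torch.nn as nn

class TwoBranchGraphVAE(nn.Module):
    def __init__(self, node_dim=16, hidden=32, latent=16, num_classes=8):
        super().__init__()
        self.gcn = nn.Linear(node_dim, hidden)            # weight for A_hat @ X @ W
        self.mu, self.logvar = nn.Linear(hidden, latent), nn.Linear(hidden, latent)
        self.recognize = nn.Linear(latent, num_classes)   # current action class
        self.predict = nn.Linear(latent, num_classes)     # future action class

    def forward(self, adj, x):
        a_hat = adj + torch.eye(adj.size(-1))             # add self-loops
        deg = a_hat.sum(-1, keepdim=True).clamp(min=1)
        h = torch.relu(self.gcn((a_hat / deg) @ x))       # one normalized GCN layer
        g = h.mean(dim=-2)                                # mean-pool nodes -> graph embedding
        mu, logvar = self.mu(g), self.logvar(g)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.recognize(z), self.predict(z), mu, logvar

adj = torch.randint(0, 2, (5, 5)).float()   # toy scene graph with 5 nodes
x = torch.randn(5, 16)                      # node features (object/relation embeddings)
rec_logits, pred_logits, mu, logvar = TwoBranchGraphVAE()(adj, x)
```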

Place, publisher, year, edition, pages
IEEE, 2021
National Category
Robotics
Identifiers
urn:nbn:se:hh:diva-46345 (URN); 10.1109/ICAR53236.2021.9659385 (DOI); 000766318900146; 2-s2.0-85124687976 (Scopus ID); 9781665436847 (ISBN)
Conference
2021 20th International Conference on Advanced Robotics (ICAR), Ljubljana, Slovenia (Virtual Event), December 7-10, 2021
Note

Funding: The Scientific and Technological Research Council of Turkey (TUBITAK), Grant No. 119E-436

Available from: 2022-02-14. Created: 2022-02-14. Last updated: 2023-10-05. Bibliographically approved.
Projects
ROADVIEW - Robust Automated Driving in Extreme Weather; Halmstad University
Identifiers
ORCID iD: orcid.org/0000-0002-5712-6777
