A Variational Graph Autoencoder for Manipulation Action Recognition and Prediction
Artificial Intelligence and Robotics Laboratory, Faculty of Computer and Informatics Engineering, Istanbul Technical University, Maslak, Turkey.
Artificial Intelligence and Robotics Laboratory, Faculty of Computer and Informatics Engineering, Istanbul Technical University, Maslak, Turkey.
Halmstad University, School of Information Technology. ORCID iD: 0000-0002-5712-6777
2021 (English) Conference paper, Published paper (Refereed)
Abstract [en]

Despite decades of research, understanding human manipulation activities remains one of the most attractive and challenging research topics in computer vision and robotics. Recognition and prediction of observed human manipulation actions underpin applications such as human-robot interaction and robot learning from demonstration. Current research relies heavily on advanced convolutional neural networks to process structured Euclidean data such as RGB camera images. These networks, however, incur immense computational complexity in order to process such high-dimensional raw data.

Unlike related works, we introduce a deep graph autoencoder that jointly learns recognition and prediction of manipulation tasks from symbolic scene graphs, instead of relying on structured Euclidean data. Our network has a variational autoencoder structure with two branches: one for identifying the input graph type and one for predicting future graphs. The input to the proposed network is a set of semantic graphs that store the spatial relations between subjects and objects in the scene. The network output is a label set representing the detected and predicted class types. We benchmark our new model against different state-of-the-art methods on two datasets, MANIAC and MSRC-9, and show that the proposed model achieves better performance. We also release our source code at https://github.com/gamzeakyol/GNet.
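
As a rough illustration of the two-branch idea described above (a minimal sketch, not the authors' implementation; see the linked GNet repository for that), the following PyTorch example encodes a scene graph with a variational graph encoder and attaches two heads: a classifier for the action label of the input graph and an inner-product decoder for a predicted graph adjacency. All layer sizes, names, and the dense-adjacency GCN are hypothetical choices for this sketch.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseGCNLayer(nn.Module):
    # One graph-convolution step on a normalized dense adjacency: H' = ReLU(A_hat H W).
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj_norm):
        return F.relu(adj_norm @ self.lin(x))

class TwoBranchVGAE(nn.Module):
    # Variational graph autoencoder with a recognition branch (label of the
    # current scene graph) and a prediction branch (future graph adjacency).
    def __init__(self, node_dim, hidden=64, latent=32, n_classes=8):
        super().__init__()
        self.gc = DenseGCNLayer(node_dim, hidden)
        self.mu = nn.Linear(hidden, latent)
        self.logvar = nn.Linear(hidden, latent)
        self.classifier = nn.Linear(latent, n_classes)  # branch 1: recognition

    def forward(self, x, adj_norm):
        h = self.gc(x, adj_norm)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization
        logits = self.classifier(z.mean(dim=0))  # graph-level label via mean pooling
        adj_pred = torch.sigmoid(z @ z.t())      # branch 2: inner-product decoder
        return logits, adj_pred, mu, logvar

# Toy usage: 5 nodes (e.g., hand and objects) with 16-d features.
x = torch.randn(5, 16)
a = (torch.rand(5, 5) > 0.5).float()      # random spatial-relation adjacency
a = ((a + a.t()) > 0).float()             # symmetrize
a.fill_diagonal_(1.0)                     # add self-loops
d = a.sum(1).pow(-0.5)
adj_norm = d[:, None] * a * d[None, :]    # symmetric normalization

model = TwoBranchVGAE(node_dim=16)
logits, adj_pred, mu, logvar = model(x, adj_norm)
kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())  # KL term of the VAE loss

In such a setup, the recognition branch would be trained with a cross-entropy loss on logits, and the prediction branch with a binary cross-entropy loss between adj_pred and the adjacency of the next observed scene graph, plus the KL regularizer above.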

Place, publisher, year, edition, pages
IEEE, 2021. p. 968-973
National Category
Robotics and automation
Identifiers
URN: urn:nbn:se:hh:diva-46345
DOI: 10.1109/ICAR53236.2021.9659385
ISI: 000766318900146
Scopus ID: 2-s2.0-85124687976
ISBN: 9781665436847 (print)
OAI: oai:DiVA.org:hh-46345
DiVA, id: diva2:1637404
Conference
2021 20th International Conference on Advanced Robotics (ICAR), Ljubljana, Slovenia (Virtual Event), December 7-10, 2021
Note

Funding: The Scientific and Technological Research Council of Turkey (TUBITAK), Grant No. 119E-436

Available from: 2022-02-14 Created: 2022-02-14 Last updated: 2025-02-09. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus
Full text at arXiv

Authority records

Aksoy, Eren Erdal
