Bootstrapped Representation Learning for Skeleton-Based Action Recognition

Date
2022
Academic Conference
CVPR L3D-IVU: Workshop on Learning with Limited Labelled Data for Image and Video Understanding
Authors
Olivier Moliner (Sony Europe, B.V.)
Sangxia Huang (Sony Europe, B.V.)
Kalle Åström (Lund University)
Research Areas
AI & Machine Learning

Abstract

In this work, we study self-supervised representation learning for 3D skeleton-based action recognition. We extend Bootstrap Your Own Latent (BYOL) to representation learning on skeleton sequence data and propose a new data augmentation strategy consisting of two asymmetric transformation pipelines. We also introduce a multi-viewpoint sampling method that leverages multiple viewing angles of the same action captured by different cameras. In the semi-supervised setting, we show that performance can be further improved by knowledge distillation from wider networks, leveraging the unlabeled samples once more. We conduct extensive experiments on the NTU-60, NTU-120 and PKU-MMD datasets to demonstrate the performance of our proposed method. Our method consistently outperforms the current state of the art on linear evaluation, semi-supervised, and transfer learning benchmarks.
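The asymmetric two-pipeline augmentation idea described above can be illustrated with a minimal sketch. The specific transforms (rotation about the vertical axis, joint jitter, temporal crop-and-resample) and all parameter values below are illustrative assumptions, not the paper's exact configuration; the point is only that the two views of one skeleton sequence pass through pipelines of different strength.

```python
import numpy as np

def random_rotation(seq, max_deg=30, rng=None):
    """Rotate all joints around the vertical (y) axis by a random angle."""
    rng = rng or np.random.default_rng()
    theta = np.deg2rad(rng.uniform(-max_deg, max_deg))
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return seq @ rot.T  # seq: (T, J, 3)

def joint_jitter(seq, scale=0.02, rng=None):
    """Add small Gaussian noise to every joint coordinate."""
    rng = rng or np.random.default_rng()
    return seq + rng.normal(0.0, scale, size=seq.shape)

def temporal_crop(seq, ratio=0.8, rng=None):
    """Crop a random contiguous window, then resample to the original length."""
    rng = rng or np.random.default_rng()
    t = seq.shape[0]
    crop_len = max(1, int(t * ratio))
    start = rng.integers(0, t - crop_len + 1)
    window = seq[start:start + crop_len]
    idx = np.linspace(0, crop_len - 1, t).round().astype(int)
    return window[idx]

def augment_pair(seq, rng=None):
    """Two views via asymmetric pipelines: view 1 gets the stronger
    transform set (rotation + jitter + crop), view 2 a weaker one
    (rotation only). Strengths here are assumptions for illustration."""
    rng = rng or np.random.default_rng()
    view1 = temporal_crop(joint_jitter(random_rotation(seq, rng=rng), rng=rng), rng=rng)
    view2 = random_rotation(seq, rng=rng)
    return view1, view2

# Example: a 64-frame sequence with 25 joints (the NTU skeleton joint count).
seq = np.zeros((64, 25, 3))
v1, v2 = augment_pair(seq)
print(v1.shape, v2.shape)  # both views keep the original (T, J, 3) shape
```

In a BYOL-style setup, `view1` would feed the online network and `view2` the target network (or vice versa), with the online branch trained to predict the target branch's representation.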