
Spatial Cross-Attention RGB-D Fusion Module for Object Detection

Date
2021
Academic Conference
IEEE International Workshop on Multimedia Signal Processing (MMSP)
Authors
Shangyin Gao (Sony Europe B.V.)
Lev Markhasin
Bi Wang (Sony Europe B.V.)
Research Areas
AI & Machine Learning

Abstract

We investigate different RGB and depth fusion techniques for object detection with the aim of improving detection accuracy compared to RGB-only systems. We consider recent proposal-free convolutional object detectors, which we modify for RGB-D data. We introduce a third, mixed branch in our network alongside the RGB and depth branches, and define a novel attention mechanism that extracts weighted features from the depth branch and applies them to the RGB feature map, thus fusing the branches adaptively. Our method, which we call the spatial Cross-Attention Fusion network (CAF-Net), yields a state-of-the-art mean average precision of 60.3% on the SUN RGB-D dataset, outperforming all previous techniques by a significant margin.
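
For readers who want a concrete picture of how such an attention-based fusion step might look, the PyTorch sketch below shows one plausible way to compute a spatial attention map from the two branches and use it to inject weighted depth features into the RGB feature map. The module name, layer widths, and gating form are illustrative assumptions and are not taken from the paper's actual CAF-Net implementation.

```python
import torch
import torch.nn as nn


class SpatialCrossAttentionFusion(nn.Module):
    """Illustrative sketch of a spatial cross-attention RGB-D fusion block.

    A per-pixel attention map, computed from both modalities, re-weights the
    depth features before they are added to the RGB feature map. The exact
    design (1x1 projections, sigmoid gate) is an assumption for illustration.
    """

    def __init__(self, channels: int):
        super().__init__()
        # Project both modalities into a shared feature space.
        self.rgb_proj = nn.Conv2d(channels, channels, kernel_size=1)
        self.depth_proj = nn.Conv2d(channels, channels, kernel_size=1)
        # Spatial attention map derived from the concatenated features.
        self.attn = nn.Sequential(
            nn.Conv2d(2 * channels, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, rgb_feat: torch.Tensor, depth_feat: torch.Tensor) -> torch.Tensor:
        rgb = self.rgb_proj(rgb_feat)
        depth = self.depth_proj(depth_feat)
        # Per-location weights in [0, 1] decide how much depth to inject.
        weights = self.attn(torch.cat([rgb, depth], dim=1))
        # Fused map would feed the mixed branch of the detector.
        return rgb_feat + weights * depth


if __name__ == "__main__":
    fusion = SpatialCrossAttentionFusion(channels=256)
    rgb = torch.randn(1, 256, 32, 32)
    depth = torch.randn(1, 256, 32, 32)
    print(fusion(rgb, depth).shape)  # torch.Size([1, 256, 32, 32])
```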
