Making Punctuation Restoration Robust and Fast with Multi-Task Learning and Knowledge Distillation

Date: 2021
Academic Conference: IEEE International Conference on Acoustics, Speech and Signal Processing
Authors: Michael Hentschel (Sony Corporation); Emiru Tsunoo (Sony Corporation); Takao Okuda (Sony Corporation)
Research Areas: AI & Machine Learning

Abstract

In punctuation restoration, we try to recover the missing punctuation from automatic speech recognition output to improve understandability. Currently, large pre-trained transformers such as BERT set the benchmark on this task, but these models have two main drawbacks. First, the pre-training data does not match speech recognition output, which contains errors. Second, the large number of model parameters increases inference time. To address the former, we use a multi-task learning framework with ELECTRA, a recently proposed improvement on BERT that has a generator-discriminator structure. The generator allows us to inject errors into the training data and, as our experiments show, this improves robustness against speech recognition errors during inference. To address the latter, we investigate knowledge distillation and parameter pruning of ELECTRA. In our experiments on the IWSLT 2012 benchmark data, a model with less than 11% of the size of BERT achieved better performance while having an 82% faster inference time.
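
The abstract does not state which distillation objective was used. As a rough illustration of how knowledge distillation can be applied to per-token punctuation classification, the following sketch implements the commonly used soft-target formulation: a temperature-scaled KL divergence between teacher and student distributions, mixed with the usual cross-entropy on gold labels. The label set, the function names, and the `temperature` and `alpha` values are assumptions made for this example and are not taken from the paper.

```python
import math
from typing import List, Sequence

# Assumed label set for illustration only (one label per input token).
PUNCT_LABELS = ["O", "COMMA", "PERIOD", "QUESTION"]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same labels."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def distillation_loss(
    student_logits: Sequence[Sequence[float]],
    teacher_logits: Sequence[Sequence[float]],
    gold_labels: Sequence[int],
    temperature: float = 2.0,  # hypothetical value
    alpha: float = 0.5,        # hypothetical weight of the soft-target term
) -> float:
    """Mean per-token loss: alpha * T^2 * KL(teacher || student) + (1 - alpha) * CE."""
    total = 0.0
    for s, t, y in zip(student_logits, teacher_logits, gold_labels):
        # Soft-target term: match the teacher's softened distribution.
        soft = kl_divergence(softmax(t, temperature), softmax(s, temperature))
        # Hard-target term: cross-entropy against the gold punctuation label.
        hard = -math.log(softmax(s)[y])
        total += alpha * temperature ** 2 * soft + (1.0 - alpha) * hard
    return total / len(gold_labels)


if __name__ == "__main__":
    # Two tokens, four punctuation classes each; the numbers are made up.
    teacher = [[2.0, 0.5, -1.0, -1.0], [0.1, 0.2, 3.0, -0.5]]
    student = [[1.0, 0.8, -0.5, -0.7], [0.0, 0.5, 1.5, -0.2]]
    gold = [PUNCT_LABELS.index("O"), PUNCT_LABELS.index("PERIOD")]
    print(distillation_loss(student, teacher, gold))
```

In this formulation the `temperature ** 2` factor keeps the gradient scale of the soft-target term comparable across temperatures. Setting `alpha` to 0 reduces the loss to plain cross-entropy on the gold labels.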