Natural Language Processing (NLP)

Jul 17, 2020

This section introduces some of the Natural Language Processing (NLP) papers presented at AAAI-20. NLP-related papers account for about 18% of all accepted papers, which shows that NLP makes up a substantial portion of AI technology. Comparing the AAAI-20 program with those of previous years, it is interesting to note that the number of papers in fields such as Dialogue, Semantics and Summarization, and Q&A has increased, and that new categories such as Generation and Machine Translation have appeared. The appearance of these two new categories is probably due to the growing popularity of deep generative models such as "GAN" and "Flow", and to the major impact of the Transformer*1, proposed by Google in 2017. Because the US restricted visitors from China due to COVID-19, many presentations were switched to video and many poster presentations at the venue were cancelled. Presentations given in person at the venue seemed to make up only 20-30% of the total.

Number of papers by subcategory in NLP

From here, I would like to introduce some of the papers that I found interesting.

Knowledge-Graph Augmented Word Representations For Named Entity Recognition *2

This paper deals with NER (Named Entity Recognition). NER is an information extraction task that identifies the boundaries of named entities in unstructured text and classifies them into predetermined categories such as locations, times, and people's names. In recent years, NER has often used techniques such as ELMo*3 and BERT*4, which take the context of sentences into account. However, these methods have the problem that NER accuracy drops significantly for word strings that are not included in the training data (we call them unknown words).
To solve this problem, the authors propose Knowledge-Graph Augmented Word Representation (KAWR), a method of learning word-string representations that include information such as the attributes and relationships of the word strings, using knowledge graph information as prior knowledge. Several other papers also proposed using knowledge such as knowledge graphs in addition to the training text.

Boundary Enhanced Neural Span Classification for Nested Named Entity Recognition

This paper also deals with NER and focuses on the problem that models do not learn well when the named entities to be extracted have a nested structure. To solve this problem, the authors propose a boundary enhanced neural span classification model (BENSC), which incorporates the boundary detection task into the category classification task under a multitask learning framework. I found it interesting that, for problems that should be solved at the same time, such as boundary detection and category classification, the paper uses a joint learning approach in which internal representations are shared between modules that would otherwise be trained separately.
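To illustrate why classifying spans suits nested entities, here is a minimal sketch. It is not the BENSC model: the boundary positions are hard-coded as an assumed output of a boundary detector, and `classify_span` is a hypothetical dictionary lookup standing in for a learned span classifier.

```python
from typing import Dict, List, Optional, Set, Tuple

Span = Tuple[int, int]  # inclusive token indices (start, end)


def candidate_spans(starts: Set[int], ends: Set[int], max_len: int) -> List[Span]:
    """Pair every predicted start with every predicted end that follows it."""
    return [
        (s, e)
        for s in sorted(starts)
        for e in sorted(ends)
        if s <= e and e - s + 1 <= max_len
    ]


def classify_span(tokens: List[str], span: Span,
                  gazetteer: Dict[str, str]) -> Optional[str]:
    """Hypothetical classifier: a lookup table stands in for a neural model."""
    text = " ".join(tokens[span[0]: span[1] + 1])
    return gazetteer.get(text)


tokens = ["Bank", "of", "China", "opened", "today"]
# Assumed boundary-detector output, hard-coded for illustration.
starts, ends = {0, 2}, {2}
gazetteer = {"Bank of China": "ORG", "China": "LOC"}

entities = {}
for span in candidate_spans(starts, ends, max_len=4):
    label = classify_span(tokens, span, gazetteer)
    if label is not None:
        entities[span] = label

print(entities)  # {(0, 2): 'ORG', (2, 2): 'LOC'}
```

Because each span is labeled independently, the inner entity "China" and the outer entity "Bank of China" can both be kept, which a one-label-per-token tagger cannot express directly.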

WinoGrande: An Adversarial Winograd Schema Challenge at Scale

This paper proposes a test dataset for measuring the "intelligence" of a computer. The Winograd Schema Challenge (WSC) was proposed as an alternative to the Turing Test and has been used as a benchmark for evaluating commonsense reasoning. It is a kind of reference resolution task, but some of its test sets contain spurious biases that lead to overestimation of the true capabilities of machine commonsense. To tackle this problem, the authors propose a new test dataset for the challenge. They collected a large amount of data from crowd workers and then refined it by filtering with their new algorithm, AFLITE, which systematically reduces biases using state-of-the-art contextual word representations. The fact that this paper won the AAAI-20 Outstanding Paper Award made me appreciate how important the method of creating data is in machine learning.
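The general idea of this kind of bias filtering can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the feature vectors stand in for precomputed contextual representations, a nearest-centroid classifier is swapped in for whatever classifier the paper uses, labels are assumed to be 0 or 1, and all names and parameters (`filter_biased`, `threshold`, `k`, `rounds`) are hypothetical. Instances that simple classifiers predict correctly too often are treated as carrying a spurious cue and are removed.

```python
import random
from typing import List, Sequence

Vector = Sequence[float]


def centroid(rows: List[Vector]) -> List[float]:
    """Mean vector of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b))


def predictability(X: List[Vector], y: List[int], rounds: int,
                   rng: random.Random) -> List[float]:
    """Share of held-out rounds in which each instance is classified correctly."""
    n = len(X)
    hits, seen = [0] * n, [0] * n
    for _ in range(rounds):
        order = list(range(n))
        rng.shuffle(order)
        train, held_out = order[: n // 2], order[n // 2:]
        by_label = {0: [], 1: []}
        for i in train:
            by_label[y[i]].append(X[i])
        if not by_label[0] or not by_label[1]:
            continue  # a class is missing from this split; skip the round
        c0, c1 = centroid(by_label[0]), centroid(by_label[1])
        for i in held_out:
            pred = 0 if sq_dist(X[i], c0) <= sq_dist(X[i], c1) else 1
            seen[i] += 1
            hits[i] += int(pred == y[i])
    return [h / s if s else 0.0 for h, s in zip(hits, seen)]


def filter_biased(X, y, threshold=0.75, k=2, rounds=30, seed=0) -> List[int]:
    """Repeatedly drop up to k instances whose score reaches the threshold."""
    rng = random.Random(seed)
    kept = list(range(len(X)))
    while len(kept) > k:
        scores = predictability([X[i] for i in kept], [y[i] for i in kept],
                                rounds, rng)
        ranked = sorted(zip(scores, kept), reverse=True)
        drop = {i for s, i in ranked[:k] if s >= threshold}
        kept = [i for i in kept if i not in drop]
        if len(drop) < k:
            break  # few easy instances remain, so stop filtering
    return kept


# Toy data: in the first four items the first feature leaks the label.
X = [[-1.0, 0.2], [-0.9, -0.1], [1.0, 0.1], [0.9, -0.2],
     [0.05, 0.3], [-0.05, -0.3], [0.02, -0.1], [-0.02, 0.2]]
y = [0, 0, 1, 1, 0, 1, 1, 0]
kept = filter_biased(X, y)
print(kept)  # indices of the instances that survive filtering
```

The design choice worth noting is that the filter never looks at the raw text: it judges each instance only by how easily its representation gives away the label.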

Towards Scalable Multi-Domain Conversational Agents: The Schema-Guided Dialogue Dataset

This paper explains how the dataset for DSTC8 (The Eighth Dialogue System Technology Challenge)*5 proposed by Google was created, together with a novel method of describing dialogue tasks. A task-oriented dialogue is a dialogue for achieving a certain purpose, such as a conversation with Google Home or Alexa. Examples are "Tell me about tomorrow's weather" and "I want to reserve a restaurant for tomorrow evening." The paper cites two challenges in task-oriented dialogue: 1) little data is publicly available for modeling such dialogues, and what exists covers only typical tasks such as finding a restaurant or booking a flight; 2) multiple services provide the same function (for example, restaurant reservations), and a system needs to support such services as their number grows. The authors therefore created and released a large dataset covering a variety of dialogue tasks, and proposed describing tasks in natural language so that new APIs need not be developed for each additional task. The key point of this paper is the method of creating the dialogue data: they generated dialogues with a dialogue simulator and had crowd workers correct expressions that would be unnatural in normal conversation. This paper also made me realize that "How do you create data?" is an extremely important issue.
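The idea of describing a service as data with natural-language descriptions can be sketched as follows. This is a simplified illustration, not the dataset's actual file format: the class and field names, the `ExampleRestaurants` service, and the `next_action` helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Intent:
    name: str
    description: str           # natural-language description of the task
    required_slots: List[str]


@dataclass
class ServiceSchema:
    service_name: str
    description: str
    slots: Dict[str, str]       # slot name -> natural-language description
    intents: List[Intent]


def missing_slots(intent: Intent, state: Dict[str, str]) -> List[str]:
    """Required slots that the dialogue state has not filled yet."""
    return [s for s in intent.required_slots if s not in state]


def next_action(schema: ServiceSchema, intent: Intent,
                state: Dict[str, str]) -> str:
    """Generic policy: ask for the first missing slot, else call the service."""
    missing = missing_slots(intent, state)
    if missing:
        return f"Please tell me the {schema.slots[missing[0]]}."
    return f"Call {schema.service_name}.{intent.name}"


reserve = Intent("ReserveTable", "Reserve a table at a restaurant",
                 ["restaurant_name", "time"])
restaurants = ServiceSchema(
    service_name="ExampleRestaurants",
    description="A hypothetical restaurant reservation service",
    slots={"restaurant_name": "name of the restaurant",
           "time": "time of the reservation"},
    intents=[reserve],
)

state = {"restaurant_name": "Sakura"}
first = next_action(restaurants, reserve, state)
state["time"] = "19:00"
second = next_action(restaurants, reserve, state)
print(first)   # Please tell me the time of the reservation.
print(second)  # Call ExampleRestaurants.ReserveTable
```

Since `next_action` reads everything it needs from the schema, supporting another service in this sketch means adding another `ServiceSchema` value rather than writing new code.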

Summary

Pre-trained language models such as BERT and ELMo have advanced the handling of context in sentences, and the focus seems to be shifting to fusion with "knowledge" in order to handle words that do not appear in the training text. It is also interesting that there is a paper reporting how much knowledge is embedded in BERT. I also noticed that many of the submitted papers deal with practical problems of machine learning in NLP, such as data creation and evaluation methods. Even in the dialogue area, practical issues such as scaling task-oriented systems and response variation are being discussed.

*1 Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin, "Attention Is All You Need," NIPS 2017.
*2 [video] AAAI-2020: Knowledge-Graph Augmented Word Representations For Named Entity Recognition
*3 Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer, "Deep contextualized word representations," NAACL 2018.
*4 Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," arXiv 2018.
*5 DSTC8 Official Site

Reporter

Name : Kazumi Aoyama
Education : Computer Science
Current job : AI Technology investigation & Technical development of natural language processing and spoken dialogue