Learning to speak and hear through multi-agent communication over a continuous acoustic channel

Date
2023-03
Publisher
Stellenbosch : Stellenbosch University
Abstract
ENGLISH ABSTRACT: Human infants acquire language in large part through continuous signalling with their caregivers. By interacting and communicating with their caregivers, infants can observe the consequences of their communicative attempts (e.g. through parental responses), and these consequences may guide the process of language acquisition. We find many similarities between human language acquisition and the intuition of intrinsic motivation, which serves as a basis for reinforcement learning. Current trends in natural language processing, in contrast, disregard this: they focus on ever larger models and more data to learn the statistical relationships between words, with none of the original goals of language in mind. Multi-agent reinforcement learning has proven effective for investigating emergent communication between social agents. Most of these studies, however, focus on communication with discrete symbols. Humans learn language over a continuous channel, and language has evolved through gestures and spoken communication, both of which are inherently continuous. This channel is also time-varying: interactions take place in unique settings with different channel acoustics and types of noise. These intricacies are lost when agents communicate directly with purely discrete symbols. We therefore ask: can we observe emergent language between agents that communicate over a continuous channel? And if so, how does learned continuous communication differ from discrete communication? Our objective is to provide a platform for studying emergent continuous signalling, in order to see how it relates to human language acquisition and evolution. We propose a messaging environment in which a Speaker agent needs to convey a set of attributes to a Listener over a noisy acoustic channel. This thesis makes two core contributions. Firstly, in contrast to recent studies on language emergence, we train our agents with deep Q-learning (DQN) rather than REINFORCE. With DQN, we show significant performance gains and improved compositionality. Secondly, we provide a platform for studying spoken emergent language between agents. To showcase this, we compare discrete and acoustic emergent languages. We show that, unlike the discrete Speaker, the acoustic Speaker learns redundancy to improve Listener coherency when longer sequences are allowed. We also find that the acoustic Speaker develops more compositional communication protocols, which implicitly compensate for transmission errors over the noisy channel. In addition, we present early experiments with promising results in language grounding (to English) and effective generalisation to real-world communication channels.
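The Speaker/Listener messaging game described in the abstract can be illustrated with a deliberately small sketch. It is not the thesis's implementation: tabular, single-step Q-learning stands in for DQN, and the acoustic channel is replaced by a hypothetical discrete channel that replaces each symbol with a random one with some probability. The attribute and vocabulary sizes, and all names (`noisy_channel`, `train`, `greedy_accuracy`), are illustrative assumptions.

```python
import itertools
import random

# Hypothetical sizes (not from the thesis): 2 binary attributes, and messages
# of 2 symbols from a vocabulary of 2, giving 4 attribute sets and 4 messages.
N_ATTRS, N_VALUES = 2, 2
MSG_LEN, VOCAB = 2, 2

STATES = list(itertools.product(range(N_VALUES), repeat=N_ATTRS))
MESSAGES = list(itertools.product(range(VOCAB), repeat=MSG_LEN))


def noisy_channel(message, p_noise, rng):
    """Stand-in for the acoustic channel (an assumption): each symbol is
    replaced by a uniformly random symbol with probability p_noise."""
    return tuple(rng.randrange(VOCAB) if rng.random() < p_noise else s
                 for s in message)


def epsilon_greedy(q_row, actions, epsilon, rng):
    """Explore uniformly with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q_row[a])


def train(episodes=50_000, alpha=0.1, epsilon=0.1, p_noise=0.05, seed=0):
    rng = random.Random(seed)
    # Speaker: attribute set -> message. Listener: received message -> guess.
    q_speaker = {s: {m: 0.0 for m in MESSAGES} for s in STATES}
    q_listener = {m: {g: 0.0 for g in STATES} for m in MESSAGES}
    for _ in range(episodes):
        target = rng.choice(STATES)
        message = epsilon_greedy(q_speaker[target], MESSAGES, epsilon, rng)
        received = noisy_channel(message, p_noise, rng)
        guess = epsilon_greedy(q_listener[received], STATES, epsilon, rng)
        reward = 1.0 if guess == target else 0.0  # shared, cooperative reward
        # Each episode is a single step, so the Q-learning target is just the
        # reward (there is no next state to bootstrap from).
        q_speaker[target][message] += alpha * (reward - q_speaker[target][message])
        q_listener[received][guess] += alpha * (reward - q_listener[received][guess])
    return q_speaker, q_listener


def greedy_accuracy(q_speaker, q_listener, p_noise=0.0, trials=2_000, seed=1):
    """Fraction of games won when both agents act greedily."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        target = rng.choice(STATES)
        message = max(MESSAGES, key=lambda m: q_speaker[target][m])
        received = noisy_channel(message, p_noise, rng)
        guess = max(STATES, key=lambda g: q_listener[received][g])
        wins += guess == target
    return wins / trials


if __name__ == "__main__":
    q_s, q_l = train()
    print("greedy accuracy (clean channel):", greedy_accuracy(q_s, q_l))
    print("greedy accuracy (noisy channel):", greedy_accuracy(q_s, q_l, p_noise=0.05))
```

Chance level here is 0.25 (four equally likely attribute sets). Because each game ends after one message, the update has no bootstrapped next-state term; the multi-symbol sequences, continuous waveforms and neural function approximation studied in the thesis would need a DQN-style agent instead of lookup tables.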
Description
Thesis (MEng)--Stellenbosch University, 2023.