A resourceful destination for academicians, corporate professionals, researchers & tech enthusiasts

Sunday, November 05, 2017

Conversational Systems


Somewhat utopian at the moment, the future in which AI surrounds us and mediates a seamless experience of interacting with the world may arrive sooner than expected. You say "Ok Google" and all your problems seem small; this kind of interaction is possible today only because of conversational systems. If you have read the latest Dan Brown novel, "Origin", you will get a vision of how AI and conversational systems might evolve to heights we never imagined.

A conversational system is a computer system intended to converse with a human in a coherent manner. Dialog systems have employed text, speech, graphics, haptics, gestures, and other modes of communication on both the input and output channels. Conversational systems are one of the strategic technology trends for 2017. As defined by Tata Consulting, enterprise conversational systems offer a messaging- or conversation-driven user experience and facilitate contextual conversations around business events. Through connected APIs, enterprises can build conversational systems that aggregate business events from every area of the enterprise to facilitate people-to-people, people-to-systems, and systems-to-systems interactions.

How Does It Work?

1. The user speaks, and the input is converted to plain text by the system's input recognizer/decoder, which may include an automatic speech recognizer (ASR), a gesture recognizer, or a handwriting recognizer.

2. The text is analyzed by a natural language understanding (NLU) unit, which may include proper-name identification, part-of-speech tagging, and a syntactic/semantic parser.

3. The semantic information is analyzed by the dialog manager, which keeps the history and state of the dialog and manages the general flow of the conversation.

4. The dialog manager usually contacts one or more task managers, which have knowledge of the specific task domain.

5. The dialog manager produces output using an output generator, which may include a natural language generator, a gesture generator, and a layout engine.

6. Finally, the output is rendered using an output renderer, which may include a text-to-speech (TTS) engine, a talking head, or a robot or avatar.

Moreover, dialog systems based on a text-only interface (e.g. text-based chat) contain only the four middle steps (NLU, dialog manager, task managers, and output generator), omitting speech recognition at the input and rendering at the output.
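The pipeline above can be pictured as a chain of pluggable components. The following is a minimal, illustrative Python sketch; every component here is a toy stand-in (the intent names, slot names, and weather lookup are all invented), not a real system.

```python
# Minimal sketch of the dialog-system pipeline described above.
# All component implementations are toy stand-ins for illustration.

def asr(audio):
    """Input decoder: a real system would run speech recognition here."""
    return audio  # in this toy, 'audio' is already transcribed text

def nlu(text):
    """Natural language understanding: extract an intent and slots."""
    text = text.lower()
    if "weather" in text:
        return {"intent": "get_weather",
                "slots": {"city": "Paris" if "paris" in text else None}}
    return {"intent": "unknown", "slots": {}}

class DialogManager:
    """Keeps the dialog state and decides the next system act."""
    def __init__(self):
        self.history = []

    def next_act(self, semantics):
        self.history.append(semantics)
        if semantics["intent"] == "get_weather":
            city = semantics["slots"].get("city")
            if city is None:
                return ("request", "city")            # ask a follow-up
            return ("inform", task_manager(city))     # consult the back-end
        return ("fallback", None)

def task_manager(city):
    """Domain back-end: a hard-coded lookup instead of a live API."""
    return {"Paris": "sunny"}.get(city, "unknown")

def nlg(act):
    """Output generator: turn a system act into a sentence."""
    kind, payload = act
    if kind == "request":
        return f"Which {payload} do you mean?"
    if kind == "inform":
        return f"The weather there is {payload}."
    return "Sorry, I didn't understand."

# One turn through the full pipeline: decode -> NLU -> DM -> NLG
dm = DialogManager()
reply = nlg(dm.next_act(nlu(asr("What's the weather in Paris?"))))
print(reply)  # The weather there is sunny.
```

A text-only chat system would simply drop the `asr` step and render `reply` as text instead of speech.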

The goal of addressee detection is to answer the question, "Are you talking to me?" When a dialogue system interacts with multiple users, it is crucial to detect when a user is speaking to the system as opposed to another person. This problem has been studied in a multimodal scenario using lexical, acoustic, visual, dialogue-state, and beamforming information. Using data from a multiparty dialogue system, the benefit of using multiple modalities over a single modality has been quantified.

Such studies find that energy-based acoustic features are by far the most important, that information from speech recognition and system state is useful as well, and that visual and beamforming features provide little additional benefit. While head pose is affected by whom the speaker is addressing, it yields little non-redundant information, because the system acts as a situational attractor. These findings are relevant to multiparty, open-world dialogue systems in which the agent plays an active, conversational role, such as an interactive assistant deployed in a public, open space.

For these scenarios, studies suggest that acoustic, lexical, and system-state information is an effective and practical combination of modalities to use for addressee detection. This shows how analyses might be affected by the ongoing development of more realistic, natural dialogue systems.
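One simple way to combine these modalities is late fusion: score each cue separately and add the weighted scores. The sketch below is a toy illustration of that idea; the feature names, weights, and threshold are invented and not taken from any study.

```python
# Toy late-fusion addressee detector: acoustic energy, a lexical cue,
# and system state are combined into one score. Weights are invented.

def addressee_score(features):
    """Return a score in [0, 1]; higher means 'the user is addressing the system'."""
    score = 0.0
    # Acoustic: system-directed speech tends to have higher energy.
    score += 0.5 * features.get("acoustic_energy", 0.0)  # expected in [0, 1]
    # Lexical: an explicit wake word is a strong cue.
    score += 0.3 * (1.0 if features.get("wake_word") else 0.0)
    # System state: the system just asked a question, so a reply is expected.
    score += 0.2 * (1.0 if features.get("system_expects_reply") else 0.0)
    return score

def is_addressed(features, threshold=0.5):
    return addressee_score(features) >= threshold

# A loud utterance with the wake word while the system awaits a reply:
print(is_addressed({"acoustic_energy": 0.9, "wake_word": True,
                    "system_expects_reply": True}))   # True
# Quiet side-talk to another person:
print(is_addressed({"acoustic_energy": 0.2, "wake_word": False,
                    "system_expects_reply": False}))  # False
```

In a real system the weights would be learned from labeled multiparty data rather than hand-set, but the fusion structure is the same.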

The Model

User Simulator

Training reinforcement learners is challenging because they need an environment to operate in. A user simulator is therefore developed for learning and evaluation.

The first end-to-end reinforcement-learning agent with differentiable knowledge-base access is then developed, along with the first end-to-end dialogue policy trained with both supervised and reinforcement learning.
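To make the role of a user simulator concrete, here is a minimal sketch: a simulated user with a fixed goal rolls out complete dialogues against an agent policy and returns a reward the learner can optimize. The goal structure, actions, and reward scheme are invented for illustration.

```python
import random

# Minimal sketch of a user simulator for dialogue-policy training.
# Goal, action set, and rewards are invented for illustration.

class UserSimulator:
    """Simulated user whose goal is to book a flight to a target city."""
    def __init__(self, goal_city, max_turns=10):
        self.goal_city = goal_city
        self.max_turns = max_turns

    def run_dialogue(self, policy, rng):
        """Roll out one dialogue; return the agent's total reward."""
        state = {"city_filled": False, "turn": 0}
        total_reward = 0.0
        while state["turn"] < self.max_turns:
            action = policy(state, rng)       # agent picks a system act
            total_reward -= 0.1               # small per-turn penalty
            if action == "request_city":
                state["city_filled"] = True   # the user answers the question
            elif action == "book" and state["city_filled"]:
                return total_reward + 1.0     # success bonus
            state["turn"] += 1
        return total_reward                   # ran out of turns: failure

def random_policy(state, rng):
    return rng.choice(["request_city", "book", "chitchat"])

rng = random.Random(0)
sim = UserSimulator("Paris")
rewards = [sim.run_dialogue(random_policy, rng) for _ in range(100)]
print(sum(rewards) / len(rewards))  # average return of the untrained policy
```

A reinforcement learner would replace `random_policy` with a trainable policy and use these rollout rewards as its learning signal, which is exactly why the simulator must exist before training can begin.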

Task-completion bot

An end-to-end learning framework is created for task-completion neural dialogue systems, along with BBQ Networks (Bayes-by-Backprop Q-Networks), which perform efficient exploration for dialogue policy learning, and efficient actor-critic methods, which substantially reduce the sample complexity of end-to-end learning.
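Exploration is the crux that BBQ Networks address with Bayesian uncertainty estimates. For contrast, the sketch below shows only the simpler epsilon-greedy baseline, on a toy single-state problem with invented action rewards; it is not an implementation of BBQ Networks.

```python
import random

# Epsilon-greedy exploration on a toy one-state dialogue "bandit".
# Action values are invented; BBQ Networks would instead explore via
# Bayesian uncertainty over Q-values.

ACTIONS = ["request", "inform", "confirm"]
TRUE_REWARD = {"request": 0.2, "inform": 0.8, "confirm": 0.4}  # invented

def epsilon_greedy(q, epsilon, rng):
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)              # explore
    return max(ACTIONS, key=lambda a: q[a])     # exploit

rng = random.Random(0)
q = {a: 0.0 for a in ACTIONS}
counts = {a: 0 for a in ACTIONS}
for step in range(2000):
    a = epsilon_greedy(q, epsilon=0.1, rng=rng)
    r = TRUE_REWARD[a] + rng.gauss(0, 0.1)      # noisy observed reward
    counts[a] += 1
    q[a] += (r - q[a]) / counts[a]              # running-average update

best = max(q, key=lambda a: q[a])
print(best)  # with enough steps, this converges to "inform"
```

Epsilon-greedy wastes samples by exploring uniformly at random; uncertainty-directed exploration of the kind BBQ Networks provide is what makes dialogue-policy learning sample-efficient.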

Composite Task-completion bot

A composite task-completion dialogue system is then set up, based on hierarchical reinforcement learning, to learn dialogue policies that operate at different temporal scales; it demonstrates a significant improvement over flat deep reinforcement learning in both simulation and human evaluation.
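The hierarchical decomposition can be sketched in a few lines: a top-level policy picks the next subtask, and a low-level policy picks primitive dialogue acts inside that subtask. The subtask names and slot lists below are invented for illustration, and both policies are hand-coded rather than learned.

```python
# Sketch of the two-level structure behind composite task completion:
# a top-level policy chooses a subtask, a low-level policy acts within it.
# Subtasks and slots are invented; real systems learn both policies.

SUBTASK_SLOTS = {
    "book_flight": ["depart_city", "arrive_city", "date"],
    "book_hotel": ["city", "checkin_date"],
}

def top_level_policy(state):
    """Pick the first subtask whose slots are not all filled yet."""
    for subtask, slots in SUBTASK_SLOTS.items():
        if not all(s in state["filled"] for s in slots):
            return subtask
    return None  # composite task complete

def low_level_policy(subtask, state):
    """Inside a subtask, request the first missing slot."""
    for slot in SUBTASK_SLOTS[subtask]:
        if slot not in state["filled"]:
            return ("request", slot)

# Simulated dialogue: the user answers every request immediately.
state = {"filled": set()}
trace = []
while (subtask := top_level_policy(state)) is not None:
    act, slot = low_level_policy(subtask, state)
    trace.append((subtask, act, slot))
    state["filled"].add(slot)  # simulated user supplies the value

print(len(trace))  # 5 requests: 3 flight slots, then 2 hotel slots
```

Operating at two temporal scales like this lets the learner get credit for finishing a whole subtask, which is why the hierarchical policy trains faster than a flat one on long composite tasks.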

What the Future Holds

Conversational systems of the future will not be limited to text and voice. They are expected to enable people and machines to use multiple modalities (e.g., sight, sound, touch) to communicate across the digital device mesh (e.g., sensors, appliances, IoT systems). The "conversation" between the human and the machine will use all of these modalities to create a comprehensive conversational experience.

Moreover, IBM has introduced Watson Virtual Agent, a cognitive conversational technology that allows businesses to build and deploy conversational agents with ease. Watson Virtual Agent allows users – from startups and small businesses to large enterprises – to quickly and easily build and train engagement bots in the cloud, harnessing the power of cognitive technologies.

Companies like Staples and Autodesk are moving beyond simple, narrowly focused tools to sophisticated, full-blown virtual agents that rely on deep natural language processing to assist consumers. This clearly signifies how companies are investing in conversational systems, with every available tool, to bring about an automation revolution powered by artificial intelligence.

Note: The views expressed here are those of the authors and do not necessarily represent those of the DoT Club as a whole.
