Connecting Language and Vision talk Thursday 3 Dec 1-2pm

TIME AND DATE: Thursday 3 December, 1-2pm

VENUE: School of Computing Active Learning Lab, EC Stoner room 9.30a (off Staircase 2).

SPEAKER: Muhannad Al-Omari, School of Computing, Leeds

**Connecting Language and Vision: Autonomous, Incremental Learning and Grounding of Natural Language Descriptions to Perceptual Features**

Language is a powerful modality that enables humans to interact with each other more efficiently by symbolically representing the physical world around them. Understanding how humans learn and represent languages, and how word meanings are mapped to sensory stimuli from the external environment (the language bootstrapping problem), is a long-standing objective of AI research.

This work introduces a unified approach for connecting language and vision in both static and dynamic scenes. A loosely supervised learning technique anchors object descriptions, relations, and verbs to their perceptual manifestations. The system combines online incremental Gaussian mixture models with probabilistic grammar rules to learn language and its connection to perceptual features under very few assumptions. Our system demonstrates how, for the first time in a computational setting, it is possible to autonomously acquire three kinds of knowledge that children master during the early stages of their lives: 1) a number of visual concepts that allow it to differentiate between, e.g., different colours and shapes; 2) the mapping between terms in language and what they denote in the perceptual world; 3) the grammar rules that govern the processing and generation of correct sentences in natural language. Further, we show that the system is capable of incrementally learning these kinds of knowledge in multiple languages.
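To make the grounding idea concrete, here is a minimal sketch of how words might be anchored to perceptual features with incremental Gaussian statistics. This is not the speaker's implementation: it simplifies the described online incremental Gaussian mixture models to a single Gaussian per word, and all names, features, and parameters are illustrative assumptions.

```python
import numpy as np

class IncrementalGaussian:
    """Running Gaussian over perceptual feature vectors.

    A deliberately simplified stand-in for the online incremental GMMs
    mentioned in the abstract: one Gaussian per word rather than a full
    mixture, updated with Welford-style running statistics.
    """

    def __init__(self, dim):
        self.n = 0
        self.mean = np.zeros(dim)
        self.m2 = np.zeros((dim, dim))  # sum of outer-product deviations

    def update(self, x):
        # Online update: no need to store past observations.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += np.outer(delta, x - self.mean)

    def cov(self):
        # Regularised covariance; an identity fallback keeps early estimates sane.
        if self.n < 2:
            return np.eye(len(self.mean))
        return self.m2 / (self.n - 1) + 1e-6 * np.eye(len(self.mean))

    def log_likelihood(self, x):
        # How well a new percept matches this word's learned concept.
        cov = self.cov()
        delta = x - self.mean
        _, logdet = np.linalg.slogdet(cov)
        return -0.5 * (delta @ np.linalg.solve(cov, delta)
                       + logdet + len(x) * np.log(2 * np.pi))

def ground(descriptions, features, models, dim=3):
    """Loosely supervised cross-situational update: every word in a scene's
    description is associated with that scene's feature vector, so terms
    that co-occur consistently with a percept converge on it."""
    for sentence, x in zip(descriptions, features):
        for word in sentence.lower().split():
            models.setdefault(word, IncrementalGaussian(dim)).update(x)

# Toy usage: RGB colour features paired with loose descriptions.
models = {}
scenes = [("the red ball", np.array([0.9, 0.1, 0.1])),
          ("a red cube", np.array([0.85, 0.15, 0.1])),
          ("the green ball", np.array([0.1, 0.8, 0.2]))]
ground([s for s, _ in scenes], [x for _, x in scenes], models)
print(models["red"].mean)  # drifts toward high-red feature values
```

Because every statistic is updated online, new words and new observations can be folded in one scene at a time, which is the incremental property the abstract emphasises; the actual system additionally learns relations, verbs, and grammar rules, which this sketch omits.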