
Morphology for Related Languages

Date
Thursday 26 April 2018
Michael Sadler Building, room B.37, 12-1pm

Continuing the CTS tradition of pre-conference presentations, Serge Sharoff will present the outcomes of his recently completed research leave, together with his paper for the forthcoming LREC 2018 conference: http://lrec2018.lrec-conf.org/en/
Abstract:
This paper proposes a linguistically informed framework for generating large dictionaries with morphosyntactic annotations for under-resourced languages. It combines data from annotated corpora of better-resourced (donor) languages with raw text data for under-resourced (recipient) languages. The framework is based on (1) developing a monolingual embedding space that takes into account some morphosyntactic properties of words, (2) creating a cross-lingual embedding space using bilingual dictionaries and orthographic similarity between words in the donor and recipient languages, and (3) detecting morphological similarity between annotated and unannotated words in the cross-lingual space. The quality of dictionary induction can be measured within each language, and also by integrating the produced dictionaries into tagging and parsing frameworks.
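To make step (3) more concrete, here is a minimal toy sketch, not the method from the paper. It assumes that donor and recipient words already share a cross-lingual vector space, and it transfers the tag of the single most similar annotated donor word to each unannotated recipient word. The similarity score is a simple interpolation (weight `alpha`) of cosine similarity and normalised edit-distance similarity. This is an assumed scoring scheme chosen for illustration. The function names, `alpha`, the transliterated Russian/Ukrainian words, the tags and the vectors are all hypothetical.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute = 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def orth_sim(a: str, b: str) -> float:
    """Orthographic similarity in [0, 1]: 1 minus normalised edit distance."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def cosine(u, v) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def project_tags(donor, recipient, alpha=0.5):
    """Hypothetical tag projection by nearest donor neighbour.

    donor:     {word: (vector, tag)} annotated words of the donor language
    recipient: {word: vector} unannotated words of the recipient language
    Both are ASSUMED to live in one shared cross-lingual space.
    Returns {recipient_word: (nearest_donor_word, projected_tag)}.
    """
    result = {}
    for r_word, r_vec in recipient.items():
        best = max(
            donor,
            key=lambda d: alpha * cosine(r_vec, donor[d][0])
            + (1 - alpha) * orth_sim(r_word, d),
        )
        result[r_word] = (best, donor[best][1])
    return result


# Invented toy data: transliterated Russian (donor) and Ukrainian (recipient).
donor = {
    "knigu": ([0.9, 0.1, 0.0], "NOUN|Case=Acc|Number=Sing"),
    "knigi": ([0.8, 0.3, 0.0], "NOUN|Case=Gen|Number=Sing"),
    "chitaet": ([0.0, 0.1, 0.9], "VERB|Number=Sing|Person=3|Tense=Pres"),
}
recipient = {
    "knyhu": [0.88, 0.12, 0.02],
    "chytaie": [0.05, 0.1, 0.95],
}

result = project_tags(donor, recipient)
for word, (source, tag) in result.items():
    print(word, "<-", source, tag)
```

With this toy data, "knyhu" picks up the accusative noun tag from "knigu" rather than the genitive from "knigi", because the two signals agree: its vector is closer to "knigu" and it differs from "knigu" by two characters rather than three. Raising or lowering `alpha` shifts the balance between distributional and orthographic evidence.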