Parsing Poorly Standardized Language Dependency on Old French

Abstract

This paper presents results of dependency parsing of Old French, a language that is poorly standardized at the lexical level and displays relatively free word order. The work is carried out on five distinct sample texts extracted from the Syntactic Reference Corpus of Medieval French (SRCMF), a dependency treebank. Following Achim Stein’s previous work, we trained the Mate parser on each sub-corpus and cross-validated the results. We show that the greater lexical variation of Old French lowers parsing performance compared with parsing results on modern French. To improve the part-of-speech (POS) tagging step of the parsing process, we applied a preprocessing step to the data and compared two strategies: one using a slightly post-processed version of the TreeTagger trained on Old French by Stein, and the other using a CRF trained on the texts and enriched with external resources. The CRF version outperforms every other approach.
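
The abstract does not list the features given to the CRF tagger. As a purely illustrative sketch, the Python below shows one way token features for a CRF POS tagger on Old French could tolerate spelling variation and draw on an external lexicon. Everything in it is an assumption, not the paper's method: the LEXICON entries, the placeholder tags (not the SRCMF tagset), and the y/i and u/v folding rules in normalize are hypothetical, and the one-dict-per-token output is simply the input format accepted by CRF wrappers such as sklearn-crfsuite.

```python
import unicodedata

# Hypothetical external lexicon (form -> possible POS tags).
# The tags are placeholders and do not reproduce the SRCMF tagset.
LEXICON = {
    "li": {"DETdef", "PROper"},
    "roi": {"NOMcom"},
    "vint": {"VERcjg"},
}


def normalize(form: str) -> str:
    """Lowercase, drop diacritics and fold two illustrative graphic variants."""
    decomposed = unicodedata.normalize("NFD", form.lower())
    base = "".join(c for c in decomposed if unicodedata.category(c) != "Mn")
    # Hypothetical rules: treat y as i and v as u, which alternate in medieval spelling.
    return base.replace("y", "i").replace("v", "u")


# Index the lexicon by normalized form so spelling variants share entries.
NORM_LEXICON: dict[str, set[str]] = {}
for entry, tags in LEXICON.items():
    NORM_LEXICON.setdefault(normalize(entry), set()).update(tags)


def token_features(tokens: list[str], i: int) -> dict[str, object]:
    """Feature dict for the i-th token of a tokenized sentence."""
    word = tokens[i]
    norm = normalize(word)
    feats: dict[str, object] = {
        "lower": word.lower(),
        "norm": norm,
        "suffix3": norm[-3:],
        "suffix2": norm[-2:],
        "prefix2": norm[:2],
        "is_title": word.istitle(),
        "is_punct": all(not c.isalnum() for c in word),
    }
    # External-resource features: one boolean per candidate tag found in the lexicon.
    for tag in sorted(NORM_LEXICON.get(norm, ())):
        feats["lex=" + tag] = True
    # One-token context window on normalized forms, with sentence-boundary markers.
    feats["prev_norm"] = normalize(tokens[i - 1]) if i > 0 else "<BOS>"
    feats["next_norm"] = normalize(tokens[i + 1]) if i < len(tokens) - 1 else "<EOS>"
    return feats


def sentence_features(tokens: list[str]) -> list[dict[str, object]]:
    """One feature dict per token, in sentence order."""
    return [token_features(tokens, i) for i in range(len(tokens))]


feats = sentence_features(["Li", "roy", "vint", "."])
```

Here the variant spelling roy is normalized to roi and therefore picks up the lexicon feature lex=NOMcom, while suffix and context features are computed on normalized forms. Passing lexicon lookups to the CRF as extra features, rather than as hard constraints, lets the model learn how far to trust the external resource.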

Publication
In the 13th International Workshop on Treebanks and Linguistic Theories (TLT13)
Gaël Guibon