AI for Historical Linguistics

Explore the repository

This project allowed me to discover AI through low-resource Natural Language Processing problems. It was supported by a researcher at Inria, Paris (ALMAnaCH team).

Program

Designed an experiment to assess the reliability of a state-of-the-art unsupervised model on a linguistic reconstruction task.

Implemented a model with PyTorch
Handled vectorized computation for string generation and for probability calculations based on dynamic programming and graph traversal
Coded training and inference stages of language models (n-grams + LSTM)
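
As an illustration of what training and inference mean for the n-gram part, here is a minimal sketch of a character bigram model with add-one smoothing. It is not the project's code: the function names, the boundary symbols, and the three-word toy corpus are all hypothetical, and the smoothing scheme is an assumption chosen for brevity.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # hypothetical word-boundary symbols


def train_bigram(words):
    """Training: count character bigrams over a list of words."""
    bigrams, contexts, vocab = Counter(), Counter(), {EOS}
    for word in words:
        symbols = [BOS, *word, EOS]
        vocab.update(word)
        for prev, cur in zip(symbols, symbols[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, vocab


def cond_prob(prev, cur, model):
    """P(cur | prev) with add-one (Laplace) smoothing."""
    bigrams, contexts, vocab = model
    return (bigrams[(prev, cur)] + 1) / (contexts[prev] + len(vocab))


def log_prob(word, model):
    """Inference: log-probability of a whole word under the model."""
    symbols = [BOS, *word, EOS]
    return sum(
        math.log(cond_prob(prev, cur, model))
        for prev, cur in zip(symbols, symbols[1:])
    )


# Toy corpus, invented for this example only.
model = train_bigram(["pater", "mater", "frater"])
seen = log_prob("pater", model)
scrambled = log_prob("retap", model)
```

With this toy corpus, a word made of attested bigrams ("pater") scores higher than a scramble of the same letters ("retap"). Characters never seen in training are not handled here; a fuller model would reserve probability mass for them.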

A bottleneck arose in the expectation-maximization sampling stage: the number of generated samples grows exponentially, so the program could not run in practical time without parallelization, further optimization, or more powerful hardware.
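
To give a feel for this kind of growth, the sketch below counts the distinct alignments (sequences of substitutions, insertions, and deletions) between two strings of given lengths, using a small dynamic program. This is only an analogy: treating the samples as string alignments is an assumption made here, not a description of the project's sampler, and `count_alignments` is a hypothetical helper.

```python
def count_alignments(m, n):
    """Count alignments between strings of lengths m and n.

    Each alignment is a path through the edit lattice made of
    deletions, insertions, and substitutions/matches, so the
    count follows the Delannoy recurrence.
    """
    # Aligning anything with an empty string can be done in one way only.
    table = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = (
                table[i - 1][j]        # deletion
                + table[i][j - 1]      # insertion
                + table[i - 1][j - 1]  # substitution or match
            )
    return table[m][n]


# Growth for equal-length strings of length 1 to 10.
counts = [count_alignments(k, k) for k in range(1, 11)]
```

For two strings of length 10 there are already 8,097,453 alignments, while the table that counts them has only 121 cells. Enumerating or sampling such objects one by one is what becomes expensive.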

N.B.: Jupyter was used to debug and log parts of the experiment’s overall pipeline.

Dissertation and Defense

Produced a dissertation and defended it with the congratulations of the jury.