This project was my introduction to AI, through low-resource Natural Language Processing (NLP) problems. It was supported by a researcher at Inria, Paris (ALMAnaCH team).
Program
Designed an experiment to assess the reliability of a state-of-the-art unsupervised model on a linguistic reconstruction task.
- Implemented a model with PyTorch
- Implemented vectorized algorithms for string generation and probability computation, using dynamic programming and graph traversal
- Coded training and inference stages of language models (n-grams + LSTM)
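As an illustration of the training and inference stages of an n-gram language model, here is a minimal sketch of a character-level bigram model with add-one smoothing. It is not the project's code: the function names `train_bigram` and `log_prob`, the boundary markers, the smoothing choice, and the toy word list are all assumptions made for the example, and the actual experiment used PyTorch.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # hypothetical start/end-of-word markers


def train_bigram(words):
    """Training stage: count character bigrams and their left contexts."""
    bigrams, contexts, vocab = Counter(), Counter(), {EOS}
    for word in words:
        symbols = [BOS] + list(word) + [EOS]
        vocab.update(symbols[1:])  # every symbol that can be predicted
        for prev, cur in zip(symbols, symbols[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, vocab


def log_prob(word, model):
    """Inference stage: log-probability of a word under the bigram model.

    Add-one smoothing keeps unseen bigrams from getting zero probability.
    Characters never seen in training are not handled specially here.
    """
    bigrams, contexts, vocab = model
    symbols = [BOS] + list(word) + [EOS]
    total = 0.0
    for prev, cur in zip(symbols, symbols[1:]):
        numerator = bigrams[(prev, cur)] + 1
        denominator = contexts[prev] + len(vocab)
        total += math.log(numerator / denominator)
    return total


model = train_bigram(["pater", "mater", "frater"])
print(log_prob("pater", model))  # a word seen in training
print(log_prob("retap", model))  # same letters, unseen order: lower score
```

The counting step stands in for training, and scoring a candidate string stands in for inference. An LSTM language model has the same two stages but replaces the count table with learned parameters.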
A bottleneck occurred during the expectation-maximization sampling stage because the number of generated samples was inherently exponential. As a result, the program could not run effectively without parallelization, more advanced optimizations, or better hardware.
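To give a sense of scale, the sketch below counts candidate strings under one assumed setting: every string up to a maximum length over a fixed alphabet. The text does not state what the sample count was exponential in, so the alphabet, the length bound, and the function names `enumerate_candidates` and `count_candidates` are hypothetical.

```python
from itertools import product


def enumerate_candidates(alphabet, max_len):
    """Yield every string of length 0..max_len over the alphabet."""
    for length in range(max_len + 1):
        for symbols in product(alphabet, repeat=length):
            yield "".join(symbols)


def count_candidates(alphabet_size, max_len):
    """Closed-form count: the sum of alphabet_size**length over all lengths."""
    return sum(alphabet_size ** length for length in range(max_len + 1))


# Small case: brute-force enumeration agrees with the formula.
print(len(list(enumerate_candidates("ab", 3))), count_candidates(2, 3))

# Realistic case: enumeration is out of reach, only the count is computed.
print(count_candidates(26, 10))
```

With a 26-letter alphabet and words of up to 10 characters, the count already exceeds 10^14, which is why exhaustive sampling in this setting would call for pruning, parallelization, or approximation.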
N.B.: Jupyter was used to debug and log parts of the experiment’s overall pipeline.
Dissertation and Defense
Produced a dissertation and defended it with the congratulations of the jury.