Autocomplete AI in Scala

March 2025 - April 2025

An autocomplete AI that predicts the next words of a given text.

Contributors:

Clément Garro, Giovanni Gozzo

Scala

Natural Language Processing (NLP)

Machine Learning

SBT

Git

Big Data

Project Overview

The project consisted of developing an autocomplete AI in Scala. The model was trained on a text corpus to learn the contextual relationships between words, that is, which words tend to follow which. When a user entered text, the model chose the n-gram order dynamically, based on the length of the input or its context, and adjusted its training accordingly to improve prediction relevance. The algorithms relied on statistical and probabilistic models to predict the most coherent word or sequence of words following the input. The goal was to deliver smooth, accurate suggestions while keeping execution time and resource consumption low.
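To make the idea concrete, here is a minimal sketch of n-gram training with dynamic order selection. It is not the project's actual code: the names NGramSketch, tokenize, train and predict are hypothetical, tokenization by whitespace is an assumption, and the "longest available context, then back off to shorter ones" rule is one plausible way to pick the n-gram order from the input length.

```scala
object NGramSketch {
  // Maps a context (the preceding words) to the counts of the words that followed it.
  type Counts = Map[Vector[String], Map[String, Int]]

  // Assumption: lowercase, whitespace-separated tokens are good enough for the sketch.
  def tokenize(text: String): Vector[String] =
    text.toLowerCase.split("\\s+").filter(_.nonEmpty).toVector

  // Count every (context, next word) pair for all orders from 1 to maxOrder.
  // Order 1 uses the empty context, so it holds plain word frequencies.
  def train(corpus: String, maxOrder: Int): Counts = {
    val tokens = tokenize(corpus)
    val pairs = for {
      n <- 1 to maxOrder
      window <- tokens.sliding(n)
      if window.length == n // skip the short window produced by tiny corpora
    } yield (window.init, window.last)
    pairs.groupBy(_._1).map { case (context, ps) =>
      context -> ps.groupBy(_._2).map { case (word, occ) => word -> occ.size }
    }
  }

  // Pick the longest context the input allows (at most maxOrder - 1 words),
  // then back off to shorter contexts until one was seen during training.
  def predict(counts: Counts, input: String, maxOrder: Int): Option[String] = {
    val words = tokenize(input)
    val longest = math.min(maxOrder - 1, words.length)
    (longest to 0 by -1)
      .map(k => words.takeRight(k))
      .collectFirst {
        case context if counts.contains(context) =>
          // Most frequent follower; ties are broken alphabetically.
          counts(context).toSeq.sortBy { case (w, c) => (-c, w) }.head._1
      }
  }
}
```

With a model trained by NGramSketch.train(corpus, 3), a one-word input is matched against bigram contexts, a longer input against trigram contexts, and an unseen input falls back to the most frequent word of the corpus.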

Key Features

Training a model on custom text datasets
Dynamic n-gram determination based on user input
Probabilistic algorithms for evaluating prediction coherence
Scala implementation, with SBT for build and dependency management
Performance optimization for processing large datasets
Simple user interface for testing autocomplete capabilities
Technical documentation and usage examples
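
The probabilistic evaluation mentioned in the list above can be illustrated with relative frequencies: the probability of a word given a context is estimated as the number of times the word followed that context, divided by the number of times the context was seen. The sketch below is an assumption about how such a ranking could look, not the project's implementation; SuggestionRanking and topK are hypothetical names.

```scala
object SuggestionRanking {
  // followers: how often each word was observed after one given context.
  // Returns the k most likely next words with their estimated probability
  // count(context, word) / count(context), highest first.
  def topK(followers: Map[String, Int], k: Int): Seq[(String, Double)] = {
    val total = followers.values.sum.toDouble
    followers.toSeq
      .sortBy { case (word, count) => (-count, word) } // ties broken alphabetically
      .take(k)
      .map { case (word, count) => word -> count / total }
  }
}
```

For example, if "cat" followed a context twice and "mat" and "fish" once each, topK with k = 2 yields "cat" with probability 0.5 and "fish" with probability 0.25.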

Context

This project was carried out in a pair to explore what Scala and machine learning can offer for natural language processing (NLP). The AI was designed to analyze texts and predict words smoothly by learning contextual relationships from a given corpus. Dynamic n-gram determination made it possible to adjust training so that predictions fit each user's context. The project helped us consolidate our skills in machine learning, Scala, and data manipulation.

Lessons Learned from this Experience

Mastery of n-grams and probabilistic models for NLP.
Development and debugging of machine learning algorithms in Scala.
Dataset management for efficient learning and prediction.
Performance optimization for fast prediction execution.
Deeper understanding of dynamic processing logic driven by user input.
Collaboration and task distribution in a team project.