
Autocomplete AI in Scala

March 2025 - April 2025

Creation of an autocomplete artificial intelligence capable of predicting the next words of a given text.

Contributors:

Clément Garro, Giovanni Gozzo

Scala
AI
Natural Language Processing (NLP)
Machine Learning
SBT
Git
Big Data

Project Overview

The project consisted of developing an autocomplete artificial intelligence in Scala. The model was trained on a text dataset to learn the contextual relationships between words. When a user entered text, the model dynamically chose the n-gram order based on the length and context of the input, adapting its predictions to that input to make them more relevant. Statistical and probabilistic models were used to predict the most coherent next word or sequence of words. The goal was to produce smooth and accurate suggestions while keeping execution time and resource consumption low.
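As an illustration of the approach described above, here is a minimal Scala sketch of an n-gram autocomplete with dynamic order selection. It is not the project's actual code: the object `NGramAutocomplete`, its `train` and `predict` methods, the tokenization rule, and the "use the longest context seen in training, otherwise back off to a shorter one" strategy are all assumptions made for illustration.

```scala
// Hypothetical sketch of an n-gram autocomplete with dynamic order selection.
// Names and the backoff strategy are illustrative assumptions, not the project's code.
object NGramAutocomplete {

  // For each context (the previous n-1 words), how often each next word followed it.
  // Orders 1 (empty context, i.e. unigrams) up to maxOrder are stored in the same map.
  final case class Model(maxOrder: Int, counts: Map[Seq[String], Map[String, Int]])

  // Lowercase and split on anything that is not a letter or digit (keeps accented letters).
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("[^\\p{L}\\p{N}]+").toSeq.filter(_.nonEmpty)

  def train(corpus: String, maxOrder: Int): Model = {
    val words = tokenize(corpus)
    val pairs = for {
      n      <- 1 to maxOrder
      window <- words.sliding(n) if window.size == n // skip short windows from tiny corpora
    } yield (window.init, window.last)
    val counts = pairs.groupBy(_._1).map { case (context, ps) =>
      context -> ps.groupBy(_._2).map { case (word, hits) => word -> hits.size }
    }
    Model(maxOrder, counts)
  }

  // Dynamic n-gram choice: start from the longest context the input and the model allow,
  // then back off to shorter contexts until one was seen during training.
  // Returns up to k next words, most frequent first (ties broken alphabetically).
  def predict(model: Model, input: String, k: Int = 3): Seq[String] = {
    val words = tokenize(input)
    val longest = math.min(model.maxOrder - 1, words.size)
    (longest to 0 by -1).iterator
      .flatMap(len => model.counts.get(words.takeRight(len)))
      .nextOption()
      .map(_.toSeq.sortBy { case (word, count) => (-count, word) }.take(k).map(_._1))
      .getOrElse(Seq.empty)
  }
}
```

For example, after `train("the cat sat on the mat. the cat ate the fish.", 3)`, calling `predict(model, "the cat")` uses trigram counts and returns `Seq("ate", "sat")`, while `predict(model, "I like the")` finds no trigram context `like the`, backs off to the bigram context `the`, and returns `Seq("cat", "fish", "mat")`. Backoff keeps the sketch simple; smoothed interpolation between orders would be a common alternative.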

Key Features

  • Training a model on custom text datasets
  • Dynamic n-gram determination based on user input
  • Probabilistic algorithms for evaluating prediction coherence
  • Implementation in Scala, with SBT for build and dependency management
  • Performance optimization for processing large datasets
  • Simple user interface for testing autocomplete capabilities
  • Technical documentation and usage examples
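The probabilistic evaluation of prediction coherence listed above can be illustrated with a log-probability score over a candidate continuation. The sketch below is an assumption rather than the project's implementation: it uses a bigram model with add-one (Laplace) smoothing, and the names `CoherenceScorer`, `train`, `prob` and `score` are hypothetical.

```scala
// Hypothetical sketch: scoring how coherent a candidate continuation is
// with an add-one (Laplace) smoothed bigram model. Not the project's actual code.
object CoherenceScorer {

  // Lowercase and split on anything that is not a letter or digit.
  def tokenize(text: String): Seq[String] =
    text.toLowerCase.split("[^\\p{L}\\p{N}]+").toSeq.filter(_.nonEmpty)

  final case class Bigrams(unigrams: Map[String, Int], bigrams: Map[(String, String), Int]) {
    val vocabSize: Int = unigrams.size // V: number of distinct words in the corpus
  }

  def train(corpus: String): Bigrams = {
    val words = tokenize(corpus)
    val uni = words.groupBy(identity).map { case (w, ws) => w -> ws.size }
    val bi = words.zip(words.drop(1)).groupBy(identity).map { case (p, ps) => p -> ps.size }
    Bigrams(uni, bi)
  }

  // P(w | prev) = (count(prev, w) + 1) / (count(prev) + V),
  // where count(prev) is approximated by the unigram count of prev.
  def prob(m: Bigrams, prev: String, w: String): Double =
    (m.bigrams.getOrElse((prev, w), 0) + 1).toDouble /
      (m.unigrams.getOrElse(prev, 0) + m.vocabSize)

  // Sum of log-probabilities of each continuation word given the word before it.
  // The last word of the input is used as context for the first continuation word;
  // with an empty input, the first continuation word is not scored.
  def score(m: Bigrams, input: String, continuation: String): Double = {
    val seq = tokenize(input).takeRight(1) ++ tokenize(continuation)
    seq.zip(seq.drop(1)).map { case (a, b) => math.log(prob(m, a, b)) }.sum
  }
}
```

On the toy corpus "the cat sat on the mat. the cat ate the fish.", `score(model, "the", "cat sat")` is higher than `score(model, "the", "cat fish")`, because "cat sat" was observed in training while "cat fish" only receives the smoothing mass. Working in log space avoids numeric underflow when several word probabilities are multiplied.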

Context

This project was carried out in pairs to explore the capabilities of Scala and machine learning for natural language processing (NLP). The AI was designed to analyze texts and predict words smoothly, learning from the contextual relationships in a given corpus. Dynamic n-gram determination made it possible to adapt predictions to each user's context. The project helped consolidate our skills in machine learning, Scala, and data manipulation.

Lessons Learned from this Experience

  • Mastery of n-grams and probabilistic models for NLP.
  • Development and debugging of machine learning algorithms in Scala.
  • Dataset management for efficient learning and prediction.
  • Performance optimization for fast prediction execution.
  • Deeper understanding of dynamic processing logic driven by user input.
  • Collaboration and task distribution in a team project.