Embedded AI: Predictive Maintenance on STM32
End-to-end deployment of a deep neural network for predictive maintenance on an STM32L4R9 microcontroller using STM32Cube.AI.
Project Overview
Designed, trained, and deployed a Deep Neural Network (DNN) for predictive maintenance directly on an STM32L4R9 microcontroller. The project covers the full embedded ML pipeline, from raw data preprocessing to live on-device inference, using the AI4I 2020 Predictive Maintenance Dataset and the STM32Cube.AI toolchain.
Pipeline
- Data Preprocessing: Cleaned and normalized the AI4I 2020 dataset, handling class imbalance and engineering discriminative features from sensor readings (temperature, torque, rotational speed, tool wear).
- Model Design & Training: Built and trained a DNN in Python (Google Colab), iterating on architecture depth, activation functions, and regularization to maximize classification performance while keeping the model footprint small enough for deployment.
- Performance Evaluation: Assessed accuracy, precision, recall, and F1-score using held-out test splits; analyzed the confusion matrix to characterize failure-mode detection capability.
- Model Conversion: Exported the trained model via STM32Cube.AI, analyzing network size, RAM/Flash footprint, and computational cost on the target MCU.
- STM32CubeIDE Integration: Integrated the converted model into a bare-metal STM32CubeIDE project, wiring the inference engine to sensor input buffers and live output reporting.
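The preprocessing and evaluation stages of the pipeline can be sketched in plain numpy. This is an illustrative sketch, not the project's actual code: the AI4I 2020 column order is assumed, and the function names and engineered power feature are examples of the kind of feature engineering described above.

```python
import numpy as np

def preprocess(X):
    """Z-score normalize sensor columns and append an engineered
    mechanical-power feature (torque x angular velocity).

    Column order assumed (AI4I 2020): air temperature [K],
    process temperature [K], rotational speed [rpm],
    torque [Nm], tool wear [min].
    """
    rpm, torque = X[:, 2], X[:, 3]
    power = torque * rpm * 2.0 * np.pi / 60.0   # mechanical power [W]
    X = np.column_stack([X, power])
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    return (X - mu) / np.where(sigma == 0, 1.0, sigma)

def f1_from_predictions(y_true, y_pred):
    """Precision, recall, and F1 for the positive (failure) class,
    computed from confusion-matrix counts."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-9)
    return precision, recall, f1
```

Because failures are rare in AI4I 2020, per-class precision/recall/F1 (rather than raw accuracy) is what makes the imbalance handling measurable.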
Key Features
- Full Embedded ML Workflow: Covers every stage from data science to microcontroller firmware, following industrial embedded AI practices.
- Resource Analysis: Used STM32Cube.AI’s profiler to evaluate the trade-off between model complexity and on-device memory constraints.
- Professional Git Workflow: Developed collaboratively on GitLab with feature branches, incremental commits, and descriptive commit messages reflecting real project evolution.
- MCUNet Exploration (Bonus): Investigated MCUNet as an alternative ultra-compact architecture for even tighter memory budgets.
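The complexity-versus-memory trade-off can be roughed out before ever running the STM32Cube.AI analyzer. The sketch below is a back-of-the-envelope estimate, not the tool's output: the `[6, 32, 16, 2]` architecture is hypothetical, and it assumes a fully connected network whose Flash footprint is dominated by weight storage (the STM32L4R9 offers on the order of 2 MB Flash and 640 KB SRAM).

```python
def dense_param_count(layer_sizes):
    """Weights + biases for a fully connected network,
    e.g. [6, 32, 16, 2] describes three dense layers."""
    return sum(i * o + o for i, o in zip(layer_sizes, layer_sizes[1:]))

def flash_bytes(params, bytes_per_weight=4):
    """Rough Flash cost of the weights alone:
    4 bytes/weight for float32, 1 for int8-quantized."""
    return params * bytes_per_weight

layers = [6, 32, 16, 2]            # hypothetical DNN for 6 input features
params = dense_param_count(layers)
float32_flash = flash_bytes(params)      # float32 deployment
int8_flash = flash_bytes(params, 1)      # after int8 quantization
```

Even a crude estimate like this shows why small dense networks fit comfortably, and why the profiler's RAM figure (activations, not weights) is usually the tighter constraint.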
Technical Challenges
Fitting a meaningful DNN within the strict RAM and Flash limits of a Cortex-M4 required iterative pruning and quantization analysis. Ensuring that classification latency remained acceptable for real-time predictive maintenance decisions, without an OS or runtime, demanded careful profiling at every step of the conversion process.
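The core idea behind the quantization analysis mentioned above can be illustrated with symmetric per-tensor int8 quantization in numpy. This is a minimal sketch of the principle only; STM32Cube.AI and TensorFlow Lite implement more sophisticated schemes (per-channel scales, zero points, calibration), and the random weights here are stand-ins.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: one float scale,
    weights stored as int8, i.e. 4x smaller than float32."""
    scale = max(float(np.abs(w).max()) / 127.0, 1e-12)
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights for error analysis."""
    return q.astype(np.float32) * scale

# Worked example on random stand-in weights: the round-trip error
# is bounded by half the quantization step (scale / 2).
w = np.random.default_rng(0).normal(size=(32, 16)).astype(np.float32)
q, scale = quantize_int8(w)
max_err = float(np.abs(dequantize(q, scale) - w).max())
```

The 4x storage reduction is what makes a given architecture fit in Flash; the per-weight error bound (half the quantization step) is what the accuracy re-evaluation after quantization has to absorb.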