MLOps Pipeline: CI 2 Computer Science Project

Phase 1: Design & Scoping

Define the business problem, set the success metrics, and assess feasibility.

🎯

Problem Definition

Project name

ML problem type

Business problem description

📊

Success Metrics

BUSINESS METRICS (KPI)

ML METRICS

Target F1-score: ≥ 85%

Target recall: ≥ 80%

Max RMSE: ≤ 0.15

Inference latency: ≤ 200 ms
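The ML targets above can double as an automated acceptance gate before a model is promoted. The following is a minimal sketch, assuming the metrics arrive as a plain dictionary. The key names (`f1`, `recall`, `rmse`, `latency_ms`) and the function `failed_targets` are hypothetical and do not come from the project.

```python
import operator

# Thresholds taken from the metric tiles above; the key names are illustrative.
TARGETS = {
    "f1": (operator.ge, 0.85),           # F1-score >= 85%
    "recall": (operator.ge, 0.80),       # recall >= 80%
    "rmse": (operator.le, 0.15),         # RMSE <= 0.15
    "latency_ms": (operator.le, 200.0),  # inference latency <= 200 ms
}


def failed_targets(metrics: dict[str, float]) -> list[str]:
    """Return the names of the targets that the given metrics miss.

    Only metrics present in `metrics` are checked, because a classifier
    reports F1 and recall while a regressor reports RMSE.
    """
    return [
        name
        for name, (compare, threshold) in TARGETS.items()
        if name in metrics and not compare(metrics[name], threshold)
    ]


# Hypothetical run: the recall value is invented to show a failing gate.
print(failed_targets({"f1": 0.873, "recall": 0.78, "latency_ms": 142}))
# prints ['recall']
```

Returning the list of missed targets, instead of a single boolean, lets a CI job report exactly which threshold blocked the promotion.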

⚡

MLOps Architecture: Overview

Data Sources → Ingestion (Airflow) → Preprocessing (Pandas) → Training (Sklearn) → Registry (MLflow) → Docker (Deploy) → FastAPI (Serving) → Grafana (Monitor)

Legend: ● Completed ● In progress ● Pending

Phase 2: Data Collection & Management

Ingestion, EDA, cleaning, feature engineering, and DVC versioning.

🔗

Load a Dataset from a URL

ℹ️ Paste a direct link to a CSV file (raw GitHub, Kaggle API, UCI, etc.) or upload a local file.

Dataset URL (CSV)

Dataset summary (empty until a dataset is loaded):

Rows: n/a

Columns: n/a

Missing values: n/a

Types: n/a

Preview

Schema

EDA Stats

DVC Versioning
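The four summary figures shown for a loaded dataset (rows, columns, missing values, types) can be derived with pandas in a few lines. The following is a minimal sketch, not the page's actual implementation: the function name `summarize_csv` and the sample columns `customer_id` and `churn` are assumptions, and an in-memory CSV stands in for a real URL so the snippet runs offline.

```python
import io

import pandas as pd


def summarize_csv(source) -> dict:
    """Load a CSV and return the four summary figures.

    `source` may be a URL, a local path, or a file-like object;
    pandas.read_csv accepts all three.
    """
    df = pd.read_csv(source)
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "missing": int(df.isna().sum().sum()),
        # Number of columns per dtype, e.g. how many float64 columns.
        "types": df.dtypes.astype(str).value_counts().to_dict(),
    }


# In-memory sample so the sketch runs without network access.
sample = io.StringIO(
    "customer_id,monthly_charges,churn\n"
    "1,29.85,No\n"
    "2,,Yes\n"
    "3,53.40,No\n"
)
summary = summarize_csv(sample)
print(summary["rows"], summary["columns"], summary["missing"])  # prints 3 3 1
```

Passing a raw GitHub or UCI link instead of `sample` would work the same way, provided the link points directly at the CSV file.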

Phase 3: Model Development

Experimentation, hyperparameter tuning, MLflow tracking.

⚙️ Training Configuration

Algorithm

Test size (%)

Random seed

📈 Results: Current Run

Accuracy: n/a

F1-score: n/a

Precision: n/a

Recall: n/a
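For reference, the four reported metrics all derive from the confusion-matrix counts. The sketch below computes them in plain Python for a binary problem, to make the formulas explicit. A real run would more likely rely on scikit-learn's metric functions. The function name and the toy labels are invented for illustration.

```python
def classification_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels, invented for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

With these toy labels there are 3 true positives, 1 false positive, 1 false negative and 3 true negatives, so all four metrics come out at 0.75.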

🧪 Experiment Comparison Table (MLflow)

Run ID	Algorithm	Parameters	Accuracy	F1	Precision	Recall	Duration	Status
No experiments yet. Start a training run.

Phase 4: Pipeline Orchestration

Apache Airflow DAG: end-to-end automation of the ML workflow.

🌊 Apache Airflow DAG

data_ingestion (✓ done) → data_validation (✓ done) → preprocessing (✓ done) → feature_eng (⟳ running) → model_train (pending) → evaluation (pending) → registry (pending)

📄 DAG Code: airflow_mlops.py

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

# Assumption: the task callables are not defined in this file. The module path
# below is hypothetical; import them from wherever the project defines them.
from src.pipeline_tasks import (
    engineer_features,
    evaluate_model,
    ingest_data,
    preprocess_data,
    register_model,
    train_model,
    validate_data,
)

# DAG configuration
default_args = {
    'owner': 'mlops-team',
    'retries': 3,
    'retry_delay': timedelta(minutes=5),
    'email_on_failure': True,  # Needs an 'email' recipient and SMTP settings to take effect
}

with DAG(
    dag_id='mlops_churn_pipeline',
    default_args=default_args,
    schedule_interval='@daily',  # Airflow 2.4+ prefers `schedule=`; this name is deprecated there
    start_date=datetime(2026, 1, 1),
    catchup=False,
    tags=['mlops', 'churn', 'production'],
) as dag:
    ingest = PythonOperator(task_id='data_ingestion', python_callable=ingest_data)
    validate = PythonOperator(task_id='data_validation', python_callable=validate_data)
    preprocess = PythonOperator(task_id='preprocessing', python_callable=preprocess_data)
    feature_eng = PythonOperator(task_id='feature_engineering', python_callable=engineer_features)
    train = PythonOperator(task_id='model_training', python_callable=train_model)
    evaluate = PythonOperator(task_id='evaluation', python_callable=evaluate_model)
    register = PythonOperator(task_id='registry', python_callable=register_model)

    # Run the tasks strictly in sequence
    ingest >> validate >> preprocess >> feature_eng >> train >> evaluate >> register

Phase 5: Validation & Quality

Pytest tests, Great Expectations, MLflow Model Registry, Docker containerization.

🧪 Pytest Tests: TDD for ML

test_preprocessing.py 8/8 passed

test_feature_engineering.py 5/5 passed

test_model_pipeline.py 6/7 passed

test_data_leakage.py 3/3 passed

22/23 tests passed. Coverage: 94%

✅ Great Expectations: Data Validation

expect_column_values_to_not_be_null OK

expect_column_values_to_be_between OK

expect_column_to_exist OK

expect_table_row_count_to_be_between FAIL

⚠️ 1 expectation failed (expect_table_row_count_to_be_between): check the row count and schema of the input data.
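To make the four checks concrete, here is what each one verifies, written as plain pandas equivalents. This is deliberately not the Great Expectations API, only a sketch of the logic behind the expectation names. The column `monthly_charges` appears elsewhere on this page, while the 0 to 500 value range and the row-count bounds are arbitrary assumptions.

```python
import pandas as pd


def validate(df: pd.DataFrame, min_rows: int, max_rows: int) -> dict[str, bool]:
    """Run four checks named after the expectations listed above."""
    has_column = "monthly_charges" in df.columns
    charges = df["monthly_charges"] if has_column else pd.Series(dtype=float)
    return {
        "expect_column_to_exist": has_column,
        "expect_column_values_to_not_be_null": has_column and bool(charges.notna().all()),
        # The 0-500 range is an arbitrary illustrative bound.
        "expect_column_values_to_be_between": has_column
        and bool(charges.between(0, 500).all()),
        "expect_table_row_count_to_be_between": min_rows <= len(df) <= max_rows,
    }


# A tiny frame passes the column checks but is far below the expected volume.
df = pd.DataFrame({"monthly_charges": [29.85, 53.40, 70.10]})
results = validate(df, min_rows=1000, max_rows=100000)
failed = [name for name, ok in results.items() if not ok]
print(failed)  # prints ['expect_table_row_count_to_be_between']
```

The example reproduces the situation reported above: the column-level checks pass while the row-count check fails, which usually points at a truncated or partial extract.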

🐳 Dockerfile: Containerization

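# Dockerfile: MLOps Churn Prediction Service
FROM python:3.11-slim

WORKDIR /app

# Install dependencies first so this layer is cached between code changes
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Pull the model from the DVC remote (assumes dvc is listed in requirements.txt
# and that the remote is reachable with credentials at build time)
RUN dvc pull

EXPOSE 8000

ENV MLFLOW_TRACKING_URI=http://mlflow-server:5000

CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]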

Phase 6: Deployment & Serving

CI/CD with GitHub Actions, FastAPI, Docker, hybrid inference.

🚀 Active Deployments

churn-model-v2.1

Production · RandomForest · F1=0.873

LIVE port 8000

churn-model-v2.2-beta

Staging · XGBoost · F1=0.891

STAGING port 8001

⚡ FastAPI: Inference Test

Endpoint

JSON payload

🔁 CI/CD Pipeline: GitHub Actions

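# .github/workflows/mlops-ci.yml
name: MLOps CI/CD Pipeline

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Quality Pipeline
        run: |
          pip install -r requirements.txt
          black --check .  # Formatting
          isort --check-only .  # Import order
          flake8 .  # Linting
          mypy src/  # Type checking
          pytest tests/ --cov=src --cov-report=xml

  train_and_deploy:
    needs: test
    runs-on: ubuntu-latest
    steps:
      # Each job starts on a fresh runner, so the repository must be checked out again
      - uses: actions/checkout@v4
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: DVC Pull Data
        run: dvc pull
      - name: Train Model
        run: python src/train.py
      - name: Build & Push Docker
        # docker push needs an explicit image name; pushing also assumes a prior
        # registry login and a registry-qualified tag, neither of which is shown here
        run: docker build -t churn-model:$GITHUB_SHA . && docker push churn-model:$GITHUB_SHA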

Phase 7: Monitoring & Observability

Prometheus, Grafana, data drift detection with Evidently AI.

P99 latency: 142 ms

Requests/sec: 847

Error rate: 0.3%

Drift score: 0.12

📡 Grafana Dashboard: Technical Metrics

[Chart: latency and throughput over time]

🔍 Evidently AI: Data Drift Detection

⚠️ Drift detected on monthly_charges: PSI = 0.23 (alert threshold: PSI > 0.20)
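The Population Stability Index (PSI) behind this alert compares the binned distribution of a feature at training time with its current distribution: PSI = Σ (cur_i − ref_i) · ln(cur_i / ref_i) over the bins. The following is a minimal NumPy sketch of that formula, not Evidently's implementation. The decile binning, the epsilon floor, and the synthetic data are all assumptions of this example.

```python
import numpy as np


def population_stability_index(
    reference: np.ndarray, current: np.ndarray, bins: int = 10
) -> float:
    """PSI between two samples, using quantile bins of the reference sample."""
    # Interior bin edges from reference quantiles: each reference bin holds ~1/bins.
    interior = np.quantile(reference, np.linspace(0.0, 1.0, bins + 1))[1:-1]

    # digitize maps every value to a bin index in 0..bins-1; values outside the
    # reference range fall into the first or last bin.
    ref_counts = np.bincount(np.digitize(reference, interior), minlength=bins)
    cur_counts = np.bincount(np.digitize(current, interior), minlength=bins)

    # A small floor avoids log(0) and division by zero for empty bins.
    eps = 1e-6
    ref_pct = np.clip(ref_counts / len(reference), eps, None)
    cur_pct = np.clip(cur_counts / len(current), eps, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))


# Synthetic stand-in for monthly_charges; the numbers are invented.
rng = np.random.default_rng(0)
reference = rng.normal(loc=65.0, scale=30.0, size=5000)
shifted = reference + 30.0  # same values shifted upward by one standard deviation

psi_same = population_stability_index(reference, reference)
psi_shifted = population_stability_index(reference, shifted)
print(psi_same, psi_shifted > 0.20)
```

Identical samples give a PSI of exactly 0, while the shifted sample lands well above the 0.20 threshold used on this page.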

📋 Active Alerts

Data Drift: monthly_charges

PSI = 0.23 · Detected 2 h ago · Evidently AI

WARNING

Degraded performance: F1 score

0.873 → 0.851 (-2.5%) · Past week

INFO
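The -2.5% shown in the F1 alert is a relative change, not a difference in percentage points. A short worked example (the helper name `relative_change` is hypothetical):

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, expressed as a percentage."""
    return (after - before) / before * 100.0


drop = relative_change(0.873, 0.851)
print(f"{drop:.1f}%")  # prints -2.5%
```

In absolute terms the F1 score fell by 0.022, that is 2.2 percentage points. Dividing by the starting value of 0.873 gives the 2.5% relative drop reported in the alert.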

Phase 8: Governance & Continuous Improvement

Regulatory compliance, ethics, feedback loop.

📜 Governance & Traceability

Version	Date	Owner	Status
v2.1.0	2026-05-01	Team A	Production
v2.0.3	2026-04-15	Team A	Archived
v1.9.0	2026-03-01	Team B	Retired

🔄 Feedback Loop

📊 Monitoring detected drift: retraining is recommended

New data available: +12,450 rows

Label quality: 97.3%

Retraining scheduled: tomorrow at 02:00

MLflow Model Registry

Model lifecycle: None → Staging → Production → Archived.

📋 Registered Models

Model	Version	Algorithm	F1	Accuracy	Stage	Date	Action
churn-predictor	v3	XGBoost	0.891	0.912	Staging	2026-05-15
churn-predictor	v2	RandomForest	0.873	0.895	Production	2026-05-01
churn-predictor	v1	LogisticRegression	0.812	0.843	Archived	2026-04-10	—
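The lifecycle None → Staging → Production → Archived can be read as a small state machine. The sketch below enforces that linear order in plain Python. The strictness is an assumption of this example, taken from the lifecycle as stated on this page, and not a claim about what the MLflow registry itself enforces. `Stage`, `ALLOWED` and `transition` are hypothetical names.

```python
from enum import Enum


class Stage(Enum):
    NONE = "None"
    STAGING = "Staging"
    PRODUCTION = "Production"
    ARCHIVED = "Archived"


# Allowed forward transitions, following the linear lifecycle stated above.
ALLOWED = {
    Stage.NONE: {Stage.STAGING},
    Stage.STAGING: {Stage.PRODUCTION},
    Stage.PRODUCTION: {Stage.ARCHIVED},
    Stage.ARCHIVED: set(),
}


def transition(current: Stage, target: Stage) -> Stage:
    """Return the new stage, or raise if the move skips or reverses a step."""
    if target not in ALLOWED[current]:
        raise ValueError(f"Illegal transition: {current.value} -> {target.value}")
    return target


# A model walks through the lifecycle one step at a time.
stage = Stage.NONE
for nxt in (Stage.STAGING, Stage.PRODUCTION, Stage.ARCHIVED):
    stage = transition(stage, nxt)
print(stage.value)  # prints Archived
```

In the table above, v3 (Staging) could move to Production under this rule, whereas promoting a freshly registered version straight to Production would raise an error.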

Code & Tests: Quality

Type hints, Google-style docstrings, PEP 8, pre-commit hooks.

🛠️ Pre-commit Pipeline

black ✓ isort ✓ flake8 ✓ mypy ✓ pytest ✓

# src/preprocessing.py: PEP 8 code + type hints + docstrings
from typing import Optional

import pandas as pd


def preprocess_data(
    df: pd.DataFrame,
    target_col: str,
    drop_cols: Optional[list[str]] = None,
) -> tuple[pd.DataFrame, pd.Series]:
    """Preprocess the dataset for ML training.

    Args:
        df: Raw DataFrame loaded from the source.
        target_col: Name of the target column.
        drop_cols: Columns to drop before processing.

    Returns:
        Tuple (X features, y target) ready for Sklearn.

    Raises:
        ValueError: If target_col does not exist in df.
    """
    if target_col not in df.columns:
        raise ValueError(f"Column '{target_col}' not found")
    # drop() and dropna() return new objects, so the caller's df is not mutated
    df = df.drop(columns=drop_cols or [])
    df = df.dropna()  # Drop rows containing NaN
    y = df.pop(target_col)
    # One-hot encode the remaining categorical columns
    return pd.get_dummies(df), y
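A short usage example may help show what the function returns. The sketch restates the function compactly so that it runs on its own. The sample columns `customer_id`, `contract` and `churn` are invented for illustration, and the checks are only in the spirit of test_preprocessing.py, whose actual contents are not shown here.

```python
import pandas as pd


def preprocess_data(df, target_col, drop_cols=None):
    """Compact restatement of the function above, so the sketch runs standalone."""
    if target_col not in df.columns:
        raise ValueError(f"Column '{target_col}' not found")
    df = df.drop(columns=drop_cols or []).dropna()
    y = df.pop(target_col)
    return pd.get_dummies(df), y


# Invented sample data: the last row has a missing contract type.
raw = pd.DataFrame(
    {
        "customer_id": [1, 2, 3, 4],
        "contract": ["monthly", "yearly", "monthly", None],
        "monthly_charges": [29.85, 53.40, 70.10, 42.00],
        "churn": [0, 1, 0, 1],
    }
)

X, y = preprocess_data(raw, target_col="churn", drop_cols=["customer_id"])

print(sorted(X.columns))  # contract_monthly, contract_yearly, monthly_charges
print(y.tolist())         # prints [0, 1, 0]
print("churn" in raw.columns)  # prints True: the input frame is left untouched
```

The row with the missing contract is dropped, `contract` is one-hot encoded into two columns, and the target is separated from the features. Note that `dropna()` discards a row if any column is missing, which can remove a lot of data on sparse datasets.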