MLOps & Production Pipelines · Airflow-based Pipeline Orchestration

AirflowTFX: End-to-End MLOps Pipeline

An end-to-end TFX-based ML pipeline for medical insurance cost prediction — orchestrated via Apache Airflow DAGs for automated, reproducible, and modular workflow execution.

Architecture TFX · Apache Airflow
Tech Stack
TFX Apache Airflow TFDV TFT TFMA pandas Python

DAG

Airflow-Orchestrated Task Dependencies

TFX

Standardized ML Pipeline Framework

8 Stages

Ingestion to Deployment — Fully Automated

The Problem

Notebook-based ML workflows are not reproducible, not auditable, and not production-ready

Most ML development happens in notebooks — a format optimized for exploration but fundamentally unsuited to production. Notebooks have no task dependency management, no schema validation to catch data quality issues before training, no consistent preprocessing contract between training and serving, no standardized evaluation framework, and no orchestration layer that ensures every step runs in the correct order with full logging and traceability. Every manual re-run is a potential source of inconsistency. Every ad hoc preprocessing step is a source of training-serving skew. Without a structured pipeline replacing notebook-based workflows, ML systems cannot be reliably reproduced, audited, or scaled — regardless of how good the model itself is.

The Solution

A TFX pipeline orchestrated via Apache Airflow DAGs — automated, modular, and fully reproducible from ingestion to evaluation

AirflowTFX replaces manual notebook-based ML development with a structured, automated pipeline for medical insurance cost prediction. The TFX pipeline covers eight stages — data ingestion via ExampleGen; schema inference and anomaly detection via StatisticsGen, SchemaGen, and ExampleValidator; consistent feature preprocessing via Transform; hyperparameter search via Tuner; model training via Trainer; baseline comparison via the Model Resolver; post-training evaluation via the TFMA-powered Evaluator; and blessing-gated deployment via Pusher. Apache Airflow orchestrates the entire pipeline as a DAG, managing task dependencies, scheduling, logging, and run history through its web UI. A key engineering challenge surfaced during implementation: Airflow and TFX had conflicting environment requirements, resolved by running the compiled TFX pipeline in a dedicated virtual environment invoked directly from within the Airflow DAG definition in TFX_Pipeline_DAG.py.

Key Outcome

A fully automated, DAG-orchestrated TFX pipeline that replaces manual notebook workflows with a reproducible, modular ML system — covering every stage from data ingestion through schema validation, feature transformation, model training, evaluation, and deployment, with full task dependency management and run logging via Apache Airflow.

Technical Deep Dive

Architecture & Design

TFX Pipeline & Airflow DAG

Apache Airflow — DAG Orchestration Layer

Stage 1 · ExampleGen

Data Ingestion

Ingests medical insurance cost CSV · Splits into train/eval sets · Produces TFRecord artifacts

Stage 2a · StatisticsGen

Statistics

Computes dataset stats for drift detection

Stage 2b · SchemaGen

Schema

Infers feature schema from training data

Stage 2c · ExampleValidator

Validation

Detects anomalies against inferred schema

Stage 3 · Transform

Feature Engineering

TFT preprocessing graph · Consistent transforms for training + serving · Eliminates training-serving skew

Stage 4 · Tuner

Hyperparameter Tuning

Keras Tuner integration · Searches optimal hyperparameter configuration · Best trial passed to Trainer

Stage 5 · Trainer

Model Training

Regression model on insurance cost data · module.py defines training logic · Artifacts saved to pipeline store

Stage 6 · Model Resolver

Model Comparison

Compares newly trained model against production baseline · Auto-promotes if performance improves

Stage 7 · Evaluator

Model Evaluation

TFMA post-training evaluation · Configurable metrics · Evaluation artifacts logged for comparison

Stage 8 · Pusher

Model Deployment

Blessed models pushed to serving infrastructure · Deployment gated by Evaluator blessing

Airflow DAG — TFX_Pipeline_DAG.py

Task dependency management Scheduled execution Run logging & history Airflow UI at :8080 Dedicated TFX venv invocation

Stage 1

Data Ingestion

ExampleGen ingests the medical insurance cost CSV dataset and partitions it into training and evaluation splits, producing standardized TFRecord artifacts. This ensures all downstream components receive data in a consistent, versioned format — eliminating the ad hoc data loading that makes notebook-based workflows non-reproducible across runs.
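The property that matters here can be pictured with a small stdlib sketch — not the TFX implementation, just the idea of a deterministic, hash-based split, so the same row lands in the same partition on every run (the example rows and key function are illustrative assumptions):

```python
import hashlib

def split_examples(rows, key_fn, eval_fraction=0.2):
    """Deterministically partition rows into train/eval splits.

    Hashing a stable key instead of random sampling means a given row
    always lands in the same split across runs -- the reproducibility
    property that ExampleGen's partitioning gives the pipeline."""
    train, eval_split = [], []
    for row in rows:
        digest = hashlib.sha256(key_fn(row).encode()).digest()
        bucket = digest[0] / 256.0          # first byte -> [0, 1)
        (eval_split if bucket < eval_fraction else train).append(row)
    return train, eval_split

# Illustrative rows shaped like the insurance CSV (column names assumed)
rows = [{"age": 19, "bmi": 27.9, "charges": 16884.92},
        {"age": 33, "bmi": 22.7, "charges": 1725.55},
        {"age": 62, "bmi": 26.3, "charges": 27808.73}]
key = lambda r: f'{r["age"]}|{r["bmi"]}'
train, eval_split = split_examples(rows, key)
```

Re-running the split on the same rows always reproduces the same partition, which is what makes downstream artifacts comparable across runs.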

Stage 2

Schema Validation

Three components run in sequence — StatisticsGen computes descriptive statistics across all features, SchemaGen infers a formal data schema from the training set, and ExampleValidator checks every example against that schema to detect anomalies such as missing values, unexpected types, or out-of-range values. Data quality issues are caught before they reach training rather than after they silently degrade model performance.
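The three-component gate can be sketched in plain Python — a simplified stand-in for TFDV, not its API — by inferring per-feature types and value ranges from the training split and checking later examples against them (the slack margin and example values are assumptions):

```python
def infer_schema(rows):
    """Infer per-feature type and observed range from the training split
    (a simplified stand-in for SchemaGen working over StatisticsGen's
    computed statistics). Numeric features only in this sketch."""
    schema = {}
    for row in rows:
        for name, value in row.items():
            entry = schema.setdefault(
                name, {"type": type(value), "min": value, "max": value})
            entry["min"] = min(entry["min"], value)
            entry["max"] = max(entry["max"], value)
    return schema

def validate(rows, schema, slack=0.1):
    """Flag anomalies in the spirit of ExampleValidator: missing features,
    type mismatches, and values outside the slack-widened training range."""
    anomalies = []
    for i, row in enumerate(rows):
        for name, entry in schema.items():
            if name not in row:
                anomalies.append((i, name, "missing"))
            elif not isinstance(row[name], entry["type"]):
                anomalies.append((i, name, "type mismatch"))
            else:
                span = entry["max"] - entry["min"]
                lo = entry["min"] - slack * span
                hi = entry["max"] + slack * span
                if not lo <= row[name] <= hi:
                    anomalies.append((i, name, "out of range"))
    return anomalies

train_rows = [{"age": 19, "bmi": 27.9}, {"age": 62, "bmi": 26.3}]
schema = infer_schema(train_rows)
bad = validate([{"age": 250, "bmi": 27.0}], schema)  # age far outside training range
```

The key property is the ordering: the schema is frozen from training data first, so anomalous eval or serving data is rejected before it can influence any downstream artifact.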

Stage 3

Feature Engineering

The Transform component applies TF Transform to preprocess features using a TensorFlow graph that is saved alongside the model. The same preprocessing graph is applied at both training and serving time — eliminating training-serving skew entirely. Feature engineering logic is defined once and guaranteed to be identical in every context where the model is used.
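The mechanism is easiest to see stripped of the TFT API: an analyze phase computes statistics over the full training split once, and a single transform function reuses those frozen statistics in both the training and serving paths. A dependency-free sketch (the z-score mirrors what tft.scale_to_z_score computes, but this is an illustration, not the TFT implementation):

```python
def analyze(rows, feature):
    """Training-time 'analyze' phase: compute statistics over the full
    training split, to be frozen alongside the model."""
    values = [r[feature] for r in rows]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return {"mean": mean, "std": var ** 0.5}

def transform(row, feature, stats):
    """The single transform applied in BOTH training and serving paths.
    Because the frozen stats travel with the model, the two paths
    cannot diverge -- skew is ruled out by construction."""
    return (row[feature] - stats["mean"]) / stats["std"]

train_rows = [{"bmi": 20.0}, {"bmi": 30.0}]
stats = analyze(train_rows, "bmi")                  # frozen at training time
train_x = [transform(r, "bmi", stats) for r in train_rows]
serving_x = transform({"bmi": 30.0}, "bmi", stats)  # identical math at inference
```

In real TFT the analyze output is baked into a TensorFlow graph saved with the model, so even the "reuse the same function" step is automatic rather than a team convention.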

Stage 4

Hyperparameter Tuning

The Tuner component integrates Keras Tuner to search across model hyperparameters using the transformed features. The best trial is automatically passed to the Trainer — removing manual tuning from the pipeline and ensuring every run uses an optimized configuration without human intervention.
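A minimal stand-in for the search loop — the real component delegates to Keras Tuner, but the contract is the same: evaluate candidate configurations against an objective and hand the best one downstream (the search space and objective here are invented for illustration):

```python
import random

def random_search(objective, space, trials=20, seed=0):
    """Minimal random-search loop standing in for the Keras Tuner search
    the TFX Tuner component runs. Returns the best configuration, which
    the real pipeline would pass to the Trainer as a best-trial artifact."""
    rng = random.Random(seed)                # fixed seed: reproducible search
    best_cfg, best_score = None, float("inf")
    for _ in range(trials):
        cfg = {name: rng.choice(choices) for name, choices in space.items()}
        score = objective(cfg)               # lower is better (e.g. val loss)
        if score < best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Hypothetical search space and validation-loss proxy
space = {"learning_rate": [1e-3, 1e-2, 1e-1], "units": [16, 32, 64]}
best_cfg, best_score = random_search(
    lambda cfg: abs(cfg["learning_rate"] - 1e-2) + abs(cfg["units"] - 32) / 100,
    space)
```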

Stage 5

Model Training

The Trainer component runs the regression model defined in module.py using the tuned hyperparameters and transformed features. Training artifacts — model weights, saved model, and run metadata — are written to the pipeline artifact store, versioned and accessible to the Evaluator. The training logic is fully decoupled from the orchestration layer, making it independently testable and replaceable.
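The module.py contract is the point here: the Trainer imports a training function and calls it with data and hyperparameters, so the logic is testable without any orchestration running. A toy sketch with a closed-form one-feature fit standing in for the real Keras regression model (column names and hparams are assumptions):

```python
def run_fn(train_data, hparams):
    """Sketch of the module.py contract: the Trainer imports and calls a
    training function that is pure model logic, independent of the
    orchestration layer. A one-feature least-squares fit stands in for
    the real regression model."""
    xs = [r["bmi"] for r in train_data]
    ys = [r["charges"] for r in train_data]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    # The real run_fn writes a SavedModel to the artifact store; the
    # sketch just returns the fitted parameters as the "model".
    return {"slope": slope, "intercept": intercept, "hparams": hparams}

model = run_fn([{"bmi": 20.0, "charges": 2000.0},
                {"bmi": 30.0, "charges": 4000.0}],
               hparams={"learning_rate": 1e-2})
```

Because run_fn takes plain data and hyperparameters and returns artifacts, it can be unit-tested or swapped for a different model without touching the DAG.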

Stage 6

Model Comparison

The Model Resolver compares the newly trained model against the current production baseline. If the new model improves on the baseline metrics it is automatically promoted for evaluation and deployment — preventing a model that performs worse than its predecessor from silently replacing it in the serving pipeline.
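The promotion gate itself is simple; a hedged sketch of the comparison logic (the metric name and lower-is-better convention are assumptions for a regression pipeline):

```python
def resolve(candidate_metrics, baseline_metrics, metric="mae"):
    """Promote the candidate only if it beats the current baseline on the
    chosen metric (lower is better for a regression error). This is the
    gate that stops a worse model from silently replacing its predecessor."""
    if baseline_metrics is None:          # first ever run: nothing to beat
        return True
    return candidate_metrics[metric] < baseline_metrics[metric]

promote = resolve({"mae": 900.0}, {"mae": 1000.0})
```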

Stage 7

Model Evaluation

The Evaluator uses TensorFlow Model Analysis (TFMA) to compute post-training metrics against the held-out evaluation split. Evaluation results are stored as versioned artifacts alongside the model — enabling metric comparison across pipeline runs and providing the audit trail needed to justify model promotion and deployment decisions.
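A simplified stand-in for what the Evaluator produces — per-run metrics plus a blessing decision against configured thresholds. In the real pipeline this is expressed as a TFMA EvalConfig; the metric names, threshold value, and toy model below are assumptions:

```python
def evaluate(model, eval_rows, thresholds):
    """Compute regression metrics on the held-out split and decide the
    'blessing': all configured thresholds must pass for the model to be
    eligible for deployment."""
    errors = [abs(model(r) - r["charges"]) for r in eval_rows]
    mae = sum(errors) / len(errors)
    rmse = (sum(e ** 2 for e in errors) / len(errors)) ** 0.5
    metrics = {"mae": mae, "rmse": rmse}
    blessed = all(metrics[m] <= bound for m, bound in thresholds.items())
    return metrics, blessed

# Toy linear model and eval split, invented for illustration
eval_rows = [{"bmi": 25.0, "charges": 3000.0},
             {"bmi": 35.0, "charges": 5200.0}]
metrics, blessed = evaluate(lambda r: 200.0 * r["bmi"] - 2000.0,
                            eval_rows, thresholds={"mae": 500.0})
```

Storing `metrics` as a versioned artifact per run is what makes cross-run comparison and the deployment audit trail possible.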

Stage 8

Model Deployment

The Pusher component deploys blessed models to the serving infrastructure — deployment is gated by the Evaluator blessing, ensuring that only models that pass all evaluation thresholds are promoted to production. This closes the pipeline from training to serving in a fully automated, auditable sequence with no manual deployment steps.
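The gating behavior can be sketched with the stdlib — paths and directory layout are illustrative, not the real Pusher destination format:

```python
import pathlib
import shutil
import tempfile

def push(model_dir, serving_dir, blessed):
    """Copy the model into the serving directory only when the Evaluator
    blessing is present -- the gate the Pusher enforces."""
    if not blessed:
        return None                        # unblessed models never deploy
    dest = pathlib.Path(serving_dir) / pathlib.Path(model_dir).name
    shutil.copytree(model_dir, dest)
    return dest

with tempfile.TemporaryDirectory() as tmp:
    model_dir = pathlib.Path(tmp) / "model_001"
    model_dir.mkdir()
    (model_dir / "saved_model.txt").write_text("weights")
    serving = pathlib.Path(tmp) / "serving"
    serving.mkdir()
    skipped = push(model_dir, serving, blessed=False)   # gate holds
    deployed = push(model_dir, serving, blessed=True)   # gate opens
    deployed_ok = deployed is not None and (deployed / "saved_model.txt").exists()
```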

Orchestration

Apache Airflow DAG

TFX_Pipeline_DAG.py defines the pipeline as an Airflow DAG — managing task dependencies, execution order, scheduling, and run logging through the Airflow web UI at port 8080. A key engineering challenge surfaced during implementation: Airflow and TFX had environment compatibility conflicts, solved by invoking the compiled TFX pipeline from a dedicated virtual environment within the DAG definition — keeping both tools isolated without sacrificing integration.
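The DAG source isn't reproduced in this write-up, so the following is a hedged sketch of the isolation pattern only: a callable that shells out to the dedicated venv's interpreter, which in TFX_Pipeline_DAG.py would be wrapped in an Airflow operator. The venv path and script name are assumptions:

```python
import subprocess
import sys

# Assumed location of the dedicated TFX venv's interpreter
TFX_PYTHON = "/opt/tfx-venv/bin/python"

def run_tfx_pipeline(interpreter=TFX_PYTHON, script="run_pipeline.py"):
    """Invoke the compiled TFX pipeline in its own interpreter, so
    Airflow's and TFX's conflicting dependencies never share an
    environment. Inside the DAG this call sits in an operator, and a
    non-zero exit code marks the Airflow task as failed."""
    result = subprocess.run([interpreter, script],
                            capture_output=True, text=True)
    result.check_returncode()              # propagate failure to Airflow
    return result.stdout

# Demonstration only: the current interpreter stands in for the venv,
# and --version stands in for the pipeline script.
out = run_tfx_pipeline(interpreter=sys.executable, script="--version")
```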

Key Design Decisions

Airflow DAG orchestration replaces manual execution order

Without an orchestration layer, pipeline stages must be run manually in the correct order — and any execution error leaves the system in an undefined state. By defining the TFX pipeline as an Airflow DAG, task dependencies are declared explicitly and enforced automatically. A failed stage stops downstream tasks from running on stale or missing artifacts, and the full run history is logged and searchable through the Airflow UI.
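The enforcement guarantee can be illustrated with a small stdlib scheduler — not Airflow's implementation, just the property the DAG provides: a failed task marks everything downstream of it as blocked instead of letting it run on stale or missing artifacts. Task names mirror the pipeline stages:

```python
def run_dag(tasks, deps):
    """Execute tasks in dependency order; anything downstream of a
    failure is blocked rather than run."""
    done, failed, order = set(), set(), []
    pending = dict(deps)
    while pending:
        blocked = [t for t, ups in pending.items()
                   if any(u in failed for u in ups)]
        if blocked:
            for t in blocked:              # upstream failed: never run
                failed.add(t)
                pending.pop(t)
            continue
        ready = [t for t, ups in pending.items()
                 if all(u in done for u in ups)]
        if not ready:
            break                          # cycle or unresolvable deps
        for t in ready:
            pending.pop(t)
            try:
                tasks[t]()
                done.add(t)
                order.append(t)
            except Exception:
                failed.add(t)              # downstream will be blocked

    return order, failed

log = []

def fail():
    raise ValueError("schema anomaly detected")

tasks = {"ingest": lambda: log.append("ingest"),
         "validate": fail,
         "transform": lambda: log.append("transform"),
         "train": lambda: log.append("train")}
deps = {"ingest": [], "validate": ["ingest"],
        "transform": ["validate"], "train": ["transform"]}
order, failed = run_dag(tasks, deps)
```

Here `validate` fails, so `transform` and `train` never execute — the same behavior Airflow surfaces as upstream-failed task states in the UI.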

Dedicated TFX virtual environment resolves Airflow compatibility

Airflow and TFX have conflicting dependency requirements that prevent them from coexisting in the same Python environment. Rather than downgrading either tool, the pipeline compiles TFX separately and invokes it from a dedicated virtual environment within the Airflow DAG task. This isolation pattern — running TFX in its own environment and calling it from Airflow — is a practical, production-relevant solution to a real dependency management problem that practitioners encounter when building MLOps systems with heterogeneous toolchains.

TFT Transform graph eliminates training-serving skew by design

Training-serving skew — where preprocessing at training time differs from preprocessing at inference time — is one of the most common and hardest-to-debug production ML failures. TF Transform generates a preprocessing TensorFlow graph that is saved alongside the model and applied identically in both contexts. The skew is eliminated architecturally rather than by discipline, making it impossible for the two preprocessing paths to diverge across pipeline runs.

Tech Stack

Technology Purpose
TensorFlow Extended (TFX) Standardized ML pipeline framework — ExampleGen through Pusher
Apache Airflow DAG-based workflow orchestration, task dependency management, and scheduling
TF Data Validation (TFDV) Dataset statistics, schema inference, and anomaly detection
TF Transform (TFT) Consistent preprocessing graph for training and serving
TF Model Analysis (TFMA) Post-training model evaluation with versioned metric artifacts
Keras Tuner Automated hyperparameter search within TFX Tuner component
pandas Data manipulation and feature engineering for the insurance cost dataset
Python Core language and pipeline orchestration

Results & Metrics

What the system delivers

DAG

Airflow-Orchestrated Pipeline

Task dependencies declared and enforced automatically — full run history and logging via Airflow UI

TFX

Standardized ML Framework

TFDV, TFT, and TFMA integrated — reproducible, auditable pipeline from ingestion to evaluation

8 Stages

Fully Automated Workflow

Ingestion through deployment — every stage automated, logged, and reproducible across runs

🔁

Fully reproducible pipeline execution across every run

Every pipeline run produces identical results from the same inputs — data ingestion, schema validation, feature transformation, training, and evaluation all execute in a fixed, dependency-enforced order. TFX artifacts are versioned at every stage, making any run fully reproducible and any result traceable back to the exact data and code that produced it.

Data quality issues caught before training — not after

StatisticsGen, SchemaGen, and ExampleValidator form a three-component data quality gate that runs before any training begins. Schema violations, anomalous values, and missing features are detected and flagged at ingestion time — preventing silent data quality degradation from propagating into model weights and corrupting downstream evaluation metrics.

Single DAG trigger runs the entire pipeline end-to-end

Triggering the pipeline with airflow dags trigger tfx_pipeline_dag executes all eight stages in the correct order without any manual intervention. Airflow manages the execution graph, retries failed tasks, and surfaces the full run log in the UI — replacing a sequence of manual script executions with a single, auditable, repeatable workflow trigger.

🔧

Environment isolation solves real-world toolchain compatibility

The Airflow-TFX compatibility conflict is a genuine engineering challenge, not a contrived exercise. The solution — compiling TFX in a dedicated virtual environment and invoking it from within the Airflow DAG — demonstrates the kind of dependency-isolation work that production MLOps systems with heterogeneous toolchains actually require, going beyond academic pipeline implementations.

📊

Improved reproducibility and scalability over notebook workflows

The pipeline successfully automated all eight stages from data ingestion through deployment — a clear improvement in reproducibility and scalability over manual notebook-based workflows. Every run produces versioned artifacts at each stage, evaluation metrics are stored and comparable across runs, and the full execution history is available through the Airflow UI for audit and debugging.