AirflowTFX: End-to-End MLOps Pipeline
An end-to-end TFX-based ML pipeline for medical insurance cost prediction — orchestrated via Apache Airflow DAGs for automated, reproducible, and modular workflow execution.
DAG
Airflow-Scheduled Pipeline Runs
TFX
Standardized ML Pipeline Framework
8 Stages
Ingestion to Deployment, Fully Automated
The Problem
Notebook-based ML workflows are not reproducible, not auditable, and not production-ready
Most ML development happens in notebooks — a format optimized for exploration but fundamentally unsuited to production. Notebooks have no task dependency management, no schema validation to catch data quality issues before training, no consistent preprocessing contract between training and serving, no standardized evaluation framework, and no orchestration layer that ensures every step runs in the correct order with full logging and traceability. Every manual re-run is a potential source of inconsistency. Every ad hoc preprocessing step is a source of training-serving skew. Without a structured pipeline replacing notebook-based workflows, ML systems cannot be reliably reproduced, audited, or scaled — regardless of how good the model itself is.
The Solution
A TFX pipeline orchestrated via Apache Airflow DAGs: automated, modular, and fully reproducible from ingestion to deployment
AirflowTFX replaces manual notebook-based ML development with a structured, automated pipeline for medical insurance cost prediction. The TFX pipeline covers ten components: data ingestion via ExampleGen; statistics, schema inference, and anomaly detection via StatisticsGen, SchemaGen, and ExampleValidator; consistent feature preprocessing via Transform; automated hyperparameter search via Tuner; model training via Trainer; production baseline resolution via Model Resolver; post-training evaluation via the TFMA-powered Evaluator; and model deployment via Pusher. Apache Airflow wraps the entire pipeline in a DAG and provides scheduling, triggering, logging, and run history through its web UI. One engineering challenge surfaced during implementation: Airflow and TFX had environment compatibility conflicts, which were solved by running the compiled TFX pipeline in a dedicated virtual environment invoked directly from the Airflow DAG definition in TFX_Pipeline_DAG.py.
Key Outcome
A fully automated, DAG-orchestrated TFX pipeline that replaces manual notebook workflows with a reproducible, modular ML system — covering every stage from data ingestion through schema validation, feature transformation, hyperparameter tuning, model training, evaluation, and deployment, with component ordering enforced by TFX's LocalDagRunner and run scheduling and logging via Apache Airflow.
Technical Deep Dive
Architecture & Design
TFX Pipeline & Airflow DAG
Apache Airflow — DAG Orchestration Layer
Stage 1 · ExampleGen
Data Ingestion
Ingests medical insurance cost CSV · Splits into train/eval sets · Produces TFRecord artifacts
Stage 2a · StatisticsGen
Statistics
Computes dataset statistics on the training split · Output feeds SchemaGen
Stage 2b · SchemaGen
Schema
Infers feature schema from training statistics · Output feeds ExampleValidator
Stage 2c · ExampleValidator
Validation
Validates examples against schema using both statistics and schema outputs
Stage 3 · Transform
Feature Engineering
TFT preprocessing graph · Consistent transforms for training + serving · Eliminates training-serving skew
Stage 4 · Tuner
Hyperparameter Tuning
Keras Tuner integration · Searches optimal hyperparameter configuration · Best trial passed to Trainer
Stage 5 · Trainer
Model Training
Regression model on insurance cost data · module.py defines training logic · Artifacts saved to pipeline store
Stage 6 · Model Resolver
Baseline Resolution
Fetches the latest blessed model from ML Metadata · Provides the production baseline to the Evaluator for comparison
Stage 7 · Evaluator
Model Evaluation
TFMA post-training evaluation · Configurable metrics · Evaluation artifacts logged for comparison
Stage 8 · Pusher
Model Deployment
Blessed models pushed to serving infrastructure · Deployment gated by Evaluator blessing
Airflow DAG — TFX_Pipeline_DAG.py
Stage 1
Data Ingestion
ExampleGen ingests the medical insurance cost CSV dataset and partitions it into training and evaluation splits, producing standardized TFRecord artifacts. This ensures all downstream components receive data in a consistent, versioned format — eliminating the ad hoc data loading that makes notebook-based workflows non-reproducible across runs.
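The reproducible split described above can be illustrated with a small standard-library sketch. It is an analogy for hash-bucket partitioning, not TFX's own implementation: the function name, the 2:1 bucket ratio, and the sample rows are assumptions made for illustration.

```python
import hashlib

def assign_split(record: str, train_buckets: int = 2, eval_buckets: int = 1) -> str:
    """Deterministically map a raw CSV row to 'train' or 'eval' by hashing it."""
    total = train_buckets + eval_buckets
    digest = hashlib.sha256(record.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % total
    return "train" if bucket < train_buckets else "eval"

# Hypothetical rows; the column layout is illustrative only.
rows = [f"{20 + i},{22.0 + i * 0.3:.1f},no" for i in range(40)]

splits = {"train": [], "eval": []}
for row in rows:
    splits[assign_split(row)].append(row)

print({name: len(members) for name, members in splits.items()})
```

Because the assignment depends only on the row content, re-running the ingestion on the same file places every row in the same split, which is the property that ad hoc random splitting in a notebook tends to lose.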
Stage 2
Schema Validation
Three components run in sequence: StatisticsGen computes descriptive statistics across all features, SchemaGen infers a formal data schema from the training statistics, and ExampleValidator compares the computed statistics against that schema to detect anomalies such as missing values, unexpected types, or out-of-range values. Data quality issues are caught before they reach training rather than after they silently degrade model performance.
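The infer-then-validate idea can be sketched in plain Python. This is a simplified analogy rather than the TFDV API: it checks rows one by one for readability, whereas ExampleValidator works from aggregate statistics, and the feature names, the range rule, and the helper names are all assumptions.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional

@dataclass
class FeatureSchema:
    dtype: type
    min_value: Optional[float] = None
    max_value: Optional[float] = None

def infer_schema(train_rows: List[Dict[str, Any]]) -> Dict[str, FeatureSchema]:
    """Infer a type (and a numeric range) for each feature from training data."""
    schema = {}
    names = sorted(set().union(*(row.keys() for row in train_rows)))
    for name in names:
        values = [row[name] for row in train_rows if row.get(name) is not None]
        if not values:
            continue  # nothing to infer from
        dtype = type(values[0])
        numeric = dtype in (int, float)
        schema[name] = FeatureSchema(
            dtype=dtype,
            min_value=min(values) if numeric else None,
            max_value=max(values) if numeric else None,
        )
    return schema

def find_anomalies(rows: List[Dict[str, Any]], schema: Dict[str, FeatureSchema]) -> List[str]:
    """Report missing values, unexpected types, and out-of-range values."""
    anomalies = []
    for i, row in enumerate(rows):
        for name, spec in schema.items():
            value = row.get(name)
            if value is None:
                anomalies.append(f"row {i}: missing '{name}'")
            elif not isinstance(value, spec.dtype):
                anomalies.append(f"row {i}: '{name}' has unexpected type {type(value).__name__}")
            elif spec.min_value is not None and not (spec.min_value <= value <= spec.max_value):
                anomalies.append(f"row {i}: '{name}'={value} is out of range")
    return anomalies

# Hypothetical insurance-style records.
train = [
    {"age": 19, "bmi": 27.9, "smoker": "yes"},
    {"age": 45, "bmi": 31.2, "smoker": "no"},
]
incoming = [
    {"age": 130, "bmi": 30.0, "smoker": "no"},     # age outside the training range
    {"age": 30, "bmi": None, "smoker": "no"},      # missing value
    {"age": 30, "bmi": "high", "smoker": "no"},    # wrong type
]
schema = infer_schema(train)
issues = find_anomalies(incoming, schema)
```

The design point is the ordering: the schema is derived once from trusted training data, and every later batch is checked against it before any training code runs.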
Stage 3
Feature Engineering
The Transform component applies TF Transform to preprocess features using a TensorFlow graph that is saved alongside the model. The same preprocessing graph is applied at both training and serving time, which removes divergent preprocessing code as a source of training-serving skew. Feature engineering logic is defined once and is identical in every context where the model is used.
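The analyze-once, apply-everywhere pattern behind this stage can be sketched without TensorFlow. The snippet below is a plain-Python analogy, not the project's preprocessing code: the feature (BMI), the z-score transform, and the JSON persistence step are assumptions standing in for the saved transform graph.

```python
import json
import statistics

def analyze(train_values):
    """Full pass over training data: compute the constants the transform needs."""
    return {
        "mean": statistics.fmean(train_values),
        "std": statistics.pstdev(train_values),
    }

def apply_transform(value, constants):
    """Per-example transform that uses only the frozen constants."""
    return (value - constants["mean"]) / constants["std"]

# Hypothetical training values for one numeric feature.
train_bmi = [22.0, 27.5, 31.0, 35.5]
constants = analyze(train_bmi)

# Persist the constants next to the model, then reload them at serving time.
saved = json.dumps(constants)
loaded = json.loads(saved)

training_value = apply_transform(30.0, constants)
serving_value = apply_transform(30.0, loaded)
```

Serving never recomputes the mean or standard deviation from live traffic; it reuses what was frozen at training time, which is the same guarantee the saved transform graph provides.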
Stage 4
Hyperparameter Tuning
The Tuner component integrates Keras Tuner to search across model hyperparameters using the transformed features. The best trial is automatically passed to the Trainer — removing manual tuning from the pipeline and ensuring every run uses an optimized configuration without human intervention.
Stage 5
Model Training
The Trainer component runs the regression model defined in module.py using the tuned hyperparameters and transformed features. Training artifacts — model weights, saved model, and run metadata — are written to the pipeline artifact store, versioned and accessible to the Evaluator. The training logic is fully decoupled from the orchestration layer, making it independently testable and replaceable.
Stage 6
Baseline Resolution
The Model Resolver queries ML Metadata for the latest blessed model and surfaces it as the production baseline — it does not perform any comparison or make any promotion decision. The actual metric comparison is performed by the Evaluator using TFMA, and promotion to the serving directory is executed by the Pusher only when the Evaluator issues a blessing. Without a previously blessed model, the Evaluator skips the baseline comparison and evaluates the candidate model in isolation.
Stage 7
Model Evaluation
The Evaluator uses TensorFlow Model Analysis (TFMA) to compute post-training metrics against the held-out evaluation split. Evaluation results are stored as versioned artifacts alongside the model — enabling metric comparison across pipeline runs and providing the audit trail needed to justify model promotion and deployment decisions.
Stage 8
Model Deployment
The Pusher component deploys blessed models to the serving infrastructure — deployment is gated by the Evaluator blessing, ensuring that only models that pass all evaluation thresholds are promoted to production. This closes the pipeline from training to serving in a fully automated, auditable sequence with no manual deployment steps.
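The division of labor across Stages 6 to 8 can be summarized in a short sketch. It models the control flow only and does not use the TFX or TFMA APIs: the metric (mean absolute error), the threshold value, and all names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Candidate:
    name: str
    mae: float  # assumed metric: mean absolute error on the eval split, lower is better

def resolve_baseline(blessed: List[Candidate]) -> Optional[Candidate]:
    """Model Resolver role: surface the latest blessed model, or None on a first run."""
    return blessed[-1] if blessed else None

def is_blessed(candidate: Candidate, baseline: Optional[Candidate], max_mae: float) -> bool:
    """Evaluator role: absolute threshold first, then comparison against the baseline."""
    if candidate.mae > max_mae:
        return False
    if baseline is None:
        return True  # no baseline yet, so the candidate is judged in isolation
    return candidate.mae <= baseline.mae

def maybe_push(candidate: Candidate, blessed: List[Candidate],
               serving: List[str], max_mae: float) -> bool:
    """Pusher role: deploy only when the Evaluator blesses the candidate."""
    baseline = resolve_baseline(blessed)
    if not is_blessed(candidate, baseline, max_mae):
        return False
    blessed.append(candidate)
    serving.append(candidate.name)
    return True

# Three hypothetical pipeline runs.
blessed, serving, results = [], [], []
for name, mae in [("run-1", 4200.0), ("run-2", 4500.0), ("run-3", 3900.0)]:
    results.append(maybe_push(Candidate(name, mae), blessed, serving, max_mae=5000.0))
```

In this sketch the second run is rejected because it is worse than the blessed baseline, so the serving list only ever contains models that passed the gate.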
Orchestration
Apache Airflow DAG
TFX_Pipeline_DAG.py defines an Airflow DAG containing a single BashOperator task that activates the TFX virtual environment and runs the full pipeline via TFX's LocalDagRunner. Airflow handles scheduling, triggering, and run logging through the web UI at port 8080 — the internal component execution order (ExampleGen → Pusher) is managed entirely by TFX's LocalDagRunner, not by Airflow task dependencies. This architecture was necessary because Airflow and TFX have hard dependency conflicts that prevent them from sharing a Python environment.
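A DAG file along these lines might look like the sketch below. It is not the project's actual TFX_Pipeline_DAG.py: the virtual environment path, the script location, the task_id, and the start date are hypothetical, and it assumes Airflow 2.4 or later, where DAG accepts a schedule argument.

```python
import shlex
from datetime import datetime

# Hypothetical locations; adjust to wherever the TFX environment and script live.
TFX_VENV = "/opt/venvs/tfx"
PIPELINE_SCRIPT = "/opt/airflow/dags/pipeline_run.py"

def build_tfx_command(venv_dir: str, script_path: str) -> str:
    """Shell command that activates the TFX environment, then runs the pipeline."""
    activate = shlex.quote(f"{venv_dir}/bin/activate")
    return f"source {activate} && python {shlex.quote(script_path)}"

def build_dag():
    # Imported lazily so the command builder can be exercised without Airflow.
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="tfx_pipeline_dag",
        start_date=datetime(2024, 1, 1),
        schedule=None,   # manual trigger only; set a cron string to schedule runs
        catchup=False,
    ) as dag:
        # One task: Airflow sees the whole TFX run as a single unit of work.
        BashOperator(
            task_id="run_tfx_pipeline",
            bash_command=build_tfx_command(TFX_VENV, PIPELINE_SCRIPT),
        )
    return dag

try:
    dag = build_dag()  # module-level name so the Airflow scheduler can discover it
except ImportError:
    dag = None  # Airflow is not installed in this interpreter
```

Keeping the command in a small helper makes the one piece of logic in the file testable on its own, while the DAG itself stays a thin wrapper around a single task.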
Key Design Decisions
Airflow DAG orchestration replaces manual pipeline execution
Without an orchestration layer, the pipeline must be triggered manually by running pipeline_run.py directly. By wrapping the TFX pipeline in an Airflow DAG, execution becomes schedulable, triggerable via the UI or CLI, and fully logged with run history. The component execution order within the pipeline (ExampleGen through Pusher) is enforced by TFX's LocalDagRunner — Airflow's role is to trigger, schedule, and monitor the overall pipeline run as a single atomic task.
Dedicated TFX virtual environment resolves Airflow compatibility
Airflow and TFX have conflicting dependency requirements that prevent them from coexisting in the same Python environment. Rather than downgrading either tool, the pipeline compiles TFX separately and invokes it from a dedicated virtual environment within the Airflow DAG task. This isolation pattern — running TFX in its own environment and calling it from Airflow — is a practical, production-relevant solution to a real dependency management problem that practitioners encounter when building MLOps systems with heterogeneous toolchains.
TFT Transform graph eliminates training-serving skew by design
Training-serving skew — where preprocessing at training time differs from preprocessing at inference time — is one of the most common and hardest-to-debug production ML failures. TF Transform generates a preprocessing TensorFlow graph that is saved alongside the model and applied identically in both contexts. The skew is eliminated architecturally rather than by discipline, making it impossible for the two preprocessing paths to diverge across pipeline runs.
Tech Stack
| Technology | Purpose |
|---|---|
| TensorFlow Extended (TFX) | Standardized ML pipeline framework — ExampleGen through Pusher |
| Apache Airflow | DAG-based scheduling, triggering, and run logging for the pipeline task |
| TF Data Validation (TFDV) | Dataset statistics, schema inference, and anomaly detection |
| TF Transform (TFT) | Consistent preprocessing graph for training and serving |
| TF Model Analysis (TFMA) | Post-training model evaluation with versioned metric artifacts |
| Keras Tuner | Automated hyperparameter search within TFX Tuner component |
| pandas | Data manipulation and feature engineering for the insurance cost dataset |
| Python | Core language and pipeline orchestration |
Results & Metrics
What the system delivers
DAG
Airflow-Orchestrated Pipeline
Pipeline runs scheduled and triggered as a single DAG task, with full run history and logging via the Airflow UI
TFX
Standardized ML Framework
TFDV, TFT, and TFMA integrated — reproducible, auditable pipeline from ingestion to evaluation
8 Stages
Fully Automated Workflow
Ingestion through deployment, with every stage automated, logged, and reproducible across runs
Fully reproducible pipeline execution across every run
Every pipeline run produces identical results from the same inputs — data ingestion, schema validation, feature transformation, training, and evaluation all execute in a fixed, dependency-enforced order. TFX artifacts are versioned at every stage, making any run fully reproducible and any result traceable back to the exact data and code that produced it.
Data quality issues caught before training — not after
StatisticsGen, SchemaGen, and ExampleValidator form a three-component data quality gate that runs before any training begins. Schema violations, anomalous values, and missing features are detected and flagged at ingestion time — preventing silent data quality degradation from propagating into model weights and corrupting downstream evaluation metrics.
Single DAG trigger runs the entire pipeline end-to-end
Triggering the pipeline with airflow dags trigger tfx_pipeline_dag executes every stage in the correct order without any manual intervention. Airflow launches the pipeline as a single task, can retry that task if it fails, and surfaces the full run log in the UI, while TFX's LocalDagRunner enforces the component order inside the run. A sequence of manual script executions becomes a single, auditable, repeatable workflow trigger.
Environment isolation solves real-world toolchain compatibility
The Airflow-TFX compatibility conflict is a genuine engineering challenge that practitioners encounter when building MLOps systems with heterogeneous toolchains. The solution — compiling TFX in a dedicated virtual environment and invoking it from within the Airflow DAG — demonstrates practical production ML engineering that goes beyond academic pipeline implementations.
Improved reproducibility and scalability over notebook workflows
The pipeline automates every stage from data ingestion through deployment, a clear improvement in reproducibility and scalability over manual notebook-based workflows. Every run produces versioned artifacts at each stage, evaluation metrics are stored and comparable across runs, and the full execution history is available through the Airflow UI for audit and debugging.