
AirflowTFX: End-to-End MLOps Pipeline

An end-to-end TFX-based ML pipeline for medical insurance cost prediction — orchestrated via Apache Airflow DAGs for automated, reproducible, and modular workflow execution.

Architecture: TFX · Apache Airflow
Tech Stack: TFX · Apache Airflow · TFDV · TFT · TFMA · pandas · Python

DAG

Airflow-Scheduled Pipeline Runs

TFX

Standardized ML Pipeline Framework

8 Stages

Ingestion to Deployment, Fully Automated

The Problem

Notebook-based ML workflows are not reproducible, not auditable, and not production-ready

Most ML development happens in notebooks — a format optimized for exploration but fundamentally unsuited to production. Notebooks have no task dependency management, no schema validation to catch data quality issues before training, no consistent preprocessing contract between training and serving, no standardized evaluation framework, and no orchestration layer that ensures every step runs in the correct order with full logging and traceability. Every manual re-run is a potential source of inconsistency. Every ad hoc preprocessing step is a source of training-serving skew. Without a structured pipeline replacing notebook-based workflows, ML systems cannot be reliably reproduced, audited, or scaled — regardless of how good the model itself is.

The Solution

A TFX pipeline orchestrated via Apache Airflow DAGs — automated, modular, and fully reproducible from ingestion to evaluation

AirflowTFX replaces manual notebook-based ML development with a structured, automated pipeline for medical insurance cost prediction. The TFX pipeline comprises ten components: data ingestion via ExampleGen; statistics, schema inference, and anomaly detection via StatisticsGen, SchemaGen, and ExampleValidator; consistent feature preprocessing via Transform; automated hyperparameter search via Tuner; model training via Trainer; production baseline resolution via Model Resolver; post-training evaluation via the TFMA-powered Evaluator; and model deployment via Pusher. Apache Airflow wraps the entire pipeline in a DAG and handles scheduling, triggering, logging, and run history through its web UI. One engineering challenge surfaced during implementation: Airflow and TFX have conflicting dependencies, so the TFX pipeline runs in a dedicated virtual environment that the DAG defined in TFX_Pipeline_DAG.py invokes.

Key Outcome

A fully automated, DAG-orchestrated TFX pipeline that replaces manual notebook workflows with a reproducible, modular ML system — covering every stage from data ingestion through schema validation, feature transformation, hyperparameter tuning, model training, evaluation, and deployment, with component ordering enforced by TFX's LocalDagRunner and run scheduling and logging via Apache Airflow.

Technical Deep Dive

Architecture & Design

TFX Pipeline & Airflow DAG

Apache Airflow — DAG Orchestration Layer

Stage 1 · ExampleGen

Data Ingestion

Ingests medical insurance cost CSV · Splits into train/eval sets · Produces TFRecord artifacts

Stage 2a · StatisticsGen

Statistics

Computes dataset statistics on the training split · Output feeds SchemaGen

Stage 2b · SchemaGen

Schema

Infers feature schema from training statistics · Output feeds ExampleValidator

Stage 2c · ExampleValidator

Validation

Validates examples against schema using both statistics and schema outputs

Stage 3 · Transform

Feature Engineering

TFT preprocessing graph · Consistent transforms for training + serving · Eliminates training-serving skew

Stage 4 · Tuner

Hyperparameter Tuning

Keras Tuner integration · Searches optimal hyperparameter configuration · Best trial passed to Trainer

Stage 5 · Trainer

Model Training

Regression model on insurance cost data · module.py defines training logic · Artifacts saved to pipeline store

Stage 6 · Model Resolver

Model Comparison

Fetches the latest blessed model from ML Metadata · Provides the production baseline to the Evaluator for comparison

Stage 7 · Evaluator

Model Evaluation

TFMA post-training evaluation · Configurable metrics · Evaluation artifacts logged for comparison

Stage 8 · Pusher

Model Deployment

Blessed models pushed to serving infrastructure · Deployment gated by Evaluator blessing

Airflow DAG — TFX_Pipeline_DAG.py

Single BashOperator task · Scheduled execution & triggering · Run logging & history · Airflow UI at :8080 · Dedicated TFX venv invocation · Component order managed by LocalDagRunner

Stage 1

Data Ingestion

ExampleGen ingests the medical insurance cost CSV dataset and partitions it into training and evaluation splits, producing standardized TFRecord artifacts. This ensures all downstream components receive data in a consistent, versioned format — eliminating the ad hoc data loading that makes notebook-based workflows non-reproducible across runs.
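The idea of a repeatable train/eval partition can be sketched in plain Python. This is an illustration of hash-based splitting, not ExampleGen's own code: the CRC32 hash, the helper names, and the CSV columns are assumptions, and the 2:1 bucket ratio mirrors ExampleGen's documented default rather than a setting confirmed by this project.

```python
import zlib


def assign_split(record: str, train_buckets: int = 2, eval_buckets: int = 1) -> str:
    """Route a record to 'train' or 'eval' using a stable hash of its content."""
    total = train_buckets + eval_buckets
    bucket = zlib.crc32(record.encode("utf-8")) % total
    return "train" if bucket < train_buckets else "eval"


def split_rows(rows):
    """Partition rows into the two splits; the same input always yields the same output."""
    splits = {"train": [], "eval": []}
    for row in rows:
        splits[assign_split(row)].append(row)
    return splits


# Hypothetical CSV rows (age, bmi, smoker, charges); the real columns are not given in the text.
rows = [
    f"{20 + i},{22.5 + i},{'yes' if i % 3 == 0 else 'no'},{1000 + 250 * i}"
    for i in range(30)
]
first = split_rows(rows)
second = split_rows(rows)
```

Because the split depends only on the record content, re-running ingestion on the same file reproduces the same partition, which is the property that ad hoc notebook loading with an unseeded shuffle lacks.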

Stage 2

Schema Validation

Three components run in sequence: StatisticsGen computes descriptive statistics across all features, SchemaGen infers a formal data schema from the training statistics, and ExampleValidator compares the statistics against that schema to detect anomalies such as missing features, unexpected types, or values outside the expected domain. Data quality issues are caught before they reach training rather than after they silently degrade model performance.
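A toy version of the infer-then-validate pattern is sketched below in plain Python. It is not TFDV: the helper names and the feature names (age, bmi, smoker) are assumptions, and the sketch checks individual rows for brevity, whereas the TFX components work from aggregate statistics.

```python
def infer_schema(rows):
    """Infer each feature's type and, for string features, the set of observed values."""
    schema = {}
    for row in rows:
        for name, value in row.items():
            spec = schema.setdefault(name, {"type": type(value), "domain": set()})
            if isinstance(value, str):
                spec["domain"].add(value)
    return schema


def validate(row, schema):
    """Return a list of human-readable anomalies for one row (empty if the row is clean)."""
    anomalies = []
    for name, spec in schema.items():
        if name not in row or row[name] is None:
            anomalies.append(f"{name}: missing")
        elif not isinstance(row[name], spec["type"]):
            anomalies.append(
                f"{name}: expected {spec['type'].__name__}, got {type(row[name]).__name__}"
            )
        elif spec["type"] is str and row[name] not in spec["domain"]:
            anomalies.append(f"{name}: unexpected value {row[name]!r}")
    return anomalies


# Hypothetical training rows; the schema is inferred from these only.
train_rows = [
    {"age": 19, "bmi": 27.9, "smoker": "yes"},
    {"age": 33, "bmi": 22.7, "smoker": "no"},
]
schema = infer_schema(train_rows)

clean = validate({"age": 45, "bmi": 30.1, "smoker": "no"}, schema)
bad_types = validate({"age": "45", "bmi": 30.1, "smoker": "maybe"}, schema)
missing = validate({"age": 45, "smoker": "no"}, schema)
```

The point of the pattern is that the schema is derived once from trusted training data and then acts as a contract for everything that arrives later.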

Stage 3

Feature Engineering

The Transform component applies TF Transform to preprocess features using a TensorFlow graph that is saved alongside the model. The same preprocessing graph is applied at both training and serving time, which removes the usual source of training-serving skew: two separately maintained preprocessing code paths. Feature engineering logic is defined once and reused unchanged in every context where the model is used.
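The "analyze once, apply everywhere" contract can be shown with a minimal z-score example in plain Python. This is a sketch of the idea, not TF Transform itself: the function names, the JSON file standing in for the saved transform graph, and the bmi values are all assumptions.

```python
import json
import math


def analyze(values):
    """Analysis pass over the training split: compute mean and standard deviation once."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return {"mean": mean, "std": math.sqrt(variance)}


def transform(value, params):
    """Apply the frozen statistics; the same function is used for training and serving."""
    return (value - params["mean"]) / params["std"]


# Hypothetical training values for a numeric feature such as bmi.
train_bmi = [22.0, 26.0, 30.0, 34.0]
params = analyze(train_bmi)

# Persist the parameters with the model, then reload them on the serving side.
saved = json.dumps(params)
serving_params = json.loads(saved)

train_scaled = [transform(v, params) for v in train_bmi]
serving_scaled = transform(30.0, serving_params)
```

Serving never recomputes statistics from live traffic; it only replays what the training-time analysis produced, so the two paths agree on every input.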

Stage 4

Hyperparameter Tuning

The Tuner component integrates Keras Tuner to search across model hyperparameters using the transformed features. The best trial is automatically passed to the Trainer — removing manual tuning from the pipeline and ensuring every run uses an optimized configuration without human intervention.

Stage 5

Model Training

The Trainer component runs the regression model defined in module.py using the tuned hyperparameters and transformed features. Training artifacts — model weights, saved model, and run metadata — are written to the pipeline artifact store, versioned and accessible to the Evaluator. The training logic is fully decoupled from the orchestration layer, making it independently testable and replaceable.

Stage 6

Model Comparison

The Model Resolver queries ML Metadata for the latest blessed model and surfaces it as the production baseline — it does not perform any comparison or make any promotion decision. The actual metric comparison is performed by the Evaluator using TFMA, and promotion to the serving directory is executed by the Pusher only when the Evaluator issues a blessing. Without a previously blessed model, the Evaluator skips the baseline comparison and evaluates the candidate model in isolation.
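The division of labour between the three components can be sketched as plain Python. This illustrates the gating logic described above and is not the TFX or TFMA implementation: the function names, the use of mean absolute error as the metric, and the threshold value are assumptions.

```python
from typing import Optional

MAX_MAE = 3000.0  # hypothetical absolute threshold on mean absolute error


def resolve_latest_blessed(history) -> Optional[float]:
    """Resolver role: return the metric of the most recent blessed run, or None."""
    for run in reversed(history):
        if run["blessed"]:
            return run["mae"]
    return None


def evaluator_blesses(candidate_mae: float, baseline_mae: Optional[float]) -> bool:
    """Evaluator role: apply the absolute threshold, then compare to the baseline if one exists."""
    if candidate_mae > MAX_MAE:
        return False
    if baseline_mae is None:  # no previously blessed model: evaluate in isolation
        return True
    return candidate_mae <= baseline_mae  # lower error is better


def run_pipeline(history, candidate_mae: float) -> bool:
    """Return True when the Pusher would deploy the candidate."""
    baseline = resolve_latest_blessed(history)
    blessed = evaluator_blesses(candidate_mae, baseline)
    history.append({"mae": candidate_mae, "blessed": blessed})
    return blessed  # Pusher role: push only when blessed


history = []
pushed = [run_pipeline(history, mae) for mae in (2800.0, 2900.0, 2600.0)]
```

In this run the first candidate is blessed without a baseline, the second is rejected because it is worse than the blessed baseline, and the third is blessed because it beats that same baseline.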

Stage 7

Model Evaluation

The Evaluator uses TensorFlow Model Analysis (TFMA) to compute post-training metrics against the held-out evaluation split. Evaluation results are stored as versioned artifacts alongside the model — enabling metric comparison across pipeline runs and providing the audit trail needed to justify model promotion and deployment decisions.

Stage 8

Model Deployment

The Pusher component deploys blessed models to the serving infrastructure — deployment is gated by the Evaluator blessing, ensuring that only models that pass all evaluation thresholds are promoted to production. This closes the pipeline from training to serving in a fully automated, auditable sequence with no manual deployment steps.

Orchestration

Apache Airflow DAG

TFX_Pipeline_DAG.py defines an Airflow DAG containing a single BashOperator task that activates the TFX virtual environment and runs the full pipeline via TFX's LocalDagRunner. Airflow handles scheduling, triggering, and run logging through the web UI at port 8080 — the internal component execution order (ExampleGen → Pusher) is managed entirely by TFX's LocalDagRunner, not by Airflow task dependencies. This architecture was necessary because Airflow and TFX have hard dependency conflicts that prevent them from sharing a Python environment.
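A DAG file along these lines might look like the sketch below. The virtual environment and script paths, the task id, and the start date are hypothetical, and the source does not show the actual contents of TFX_Pipeline_DAG.py. The `schedule` argument assumes Airflow 2.4 or later; older 2.x releases use `schedule_interval` instead. Airflow is imported inside `build_dag` so the command builder can be tested without Airflow installed.

```python
import shlex
from datetime import datetime

# Hypothetical locations; the real paths are not given in the text.
TFX_VENV = "/opt/venvs/tfx"
PIPELINE_SCRIPT = "/opt/pipelines/pipeline_run.py"


def build_bash_command(venv_dir: str, script_path: str) -> str:
    """Activate the TFX virtual environment, then run the pipeline script with its Python."""
    activate = shlex.quote(f"{venv_dir}/bin/activate")
    return f"source {activate} && python {shlex.quote(script_path)}"


def build_dag():
    """Create a DAG with one BashOperator task that runs the whole TFX pipeline."""
    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="tfx_pipeline_dag",
        start_date=datetime(2024, 1, 1),  # arbitrary start date for the sketch
        schedule=None,  # run only when triggered manually
        catchup=False,
    ) as dag:
        BashOperator(
            task_id="run_tfx_pipeline",  # hypothetical task id
            bash_command=build_bash_command(TFX_VENV, PIPELINE_SCRIPT),
        )
    return dag


# In the DAG file itself, assign the result at module level so the scheduler
# can discover it, for example:  dag = build_dag()
```

Because the task shells out to a different interpreter, the Airflow environment never imports TFX, which is what keeps the two dependency sets apart.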

Key Design Decisions

Airflow DAG orchestration replaces manual pipeline execution

Without an orchestration layer, the pipeline must be triggered manually by running pipeline_run.py directly. By wrapping the TFX pipeline in an Airflow DAG, execution becomes schedulable, triggerable via the UI or CLI, and fully logged with run history. The component execution order within the pipeline (ExampleGen through Pusher) is enforced by TFX's LocalDagRunner — Airflow's role is to trigger, schedule, and monitor the overall pipeline run as a single atomic task.
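TFX derives component order from the artifacts each component consumes, not from a hand-written sequence. The sketch below reproduces that idea with the standard library's `graphlib` (Python 3.9+). The dependency edges are read from the stage descriptions in this document and are an approximation of the real pipeline wiring, not an export of it.

```python
from graphlib import TopologicalSorter

# Each component maps to the components whose outputs it consumes.
UPSTREAM = {
    "ExampleGen": set(),
    "StatisticsGen": {"ExampleGen"},
    "SchemaGen": {"StatisticsGen"},
    "ExampleValidator": {"StatisticsGen", "SchemaGen"},
    "Transform": {"ExampleGen", "SchemaGen"},
    "Tuner": {"Transform"},
    "Trainer": {"Transform", "Tuner"},
    "ModelResolver": set(),  # reads ML Metadata, not another component's output
    "Evaluator": {"ExampleGen", "Trainer", "ModelResolver"},
    "Pusher": {"Trainer", "Evaluator"},
}

# static_order() yields every component after all of its upstream components.
order = list(TopologicalSorter(UPSTREAM).static_order())

for position, component in enumerate(order, start=1):
    print(position, component)
```

Any order produced this way is valid for execution, which is why Airflow only needs to see one task: the internal graph is resolved inside the pipeline run.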

Dedicated TFX virtual environment resolves Airflow compatibility

Airflow and TFX have conflicting dependency requirements that prevent them from coexisting in the same Python environment. Rather than downgrading either tool, TFX is installed in its own dedicated virtual environment, and the Airflow DAG task invokes the pipeline from that environment. This isolation pattern is a practical, production-relevant answer to a dependency management problem that practitioners regularly encounter when building MLOps systems with heterogeneous toolchains.

TFT Transform graph eliminates training-serving skew by design

Training-serving skew, where preprocessing at training time differs from preprocessing at inference time, is one of the most common and hardest-to-debug production ML failures. TF Transform generates a preprocessing TensorFlow graph that is saved alongside the model and applied identically in both contexts. The skew is addressed architecturally rather than by discipline: because both paths execute the same saved graph, there is no second preprocessing implementation that could drift out of sync.

Tech Stack

Technology | Purpose
TensorFlow Extended (TFX) | Standardized ML pipeline framework, ExampleGen through Pusher
Apache Airflow | DAG-based scheduling, triggering, and run logging for the pipeline task
TF Data Validation (TFDV) | Dataset statistics, schema inference, and anomaly detection
TF Transform (TFT) | Consistent preprocessing graph for training and serving
TF Model Analysis (TFMA) | Post-training model evaluation with versioned metric artifacts
Keras Tuner | Automated hyperparameter search within the TFX Tuner component
pandas | Data manipulation and feature engineering for the insurance cost dataset
Python | Core language and pipeline orchestration

Results & Metrics

What the system delivers

DAG

Airflow-Orchestrated Pipeline

Pipeline runs scheduled and triggered by Airflow, with full run history and logging in the Airflow UI; component order enforced by TFX's LocalDagRunner

TFX

Standardized ML Framework

TFDV, TFT, and TFMA integrated — reproducible, auditable pipeline from ingestion to evaluation

8 Stages

Fully Automated Workflow

Ingestion through deployment, with every stage automated, logged, and reproducible across runs

🔁

Fully reproducible pipeline execution across every run

Every pipeline run executes the same steps in the same order: data ingestion, schema validation, feature transformation, training, and evaluation run in a fixed, dependency-enforced sequence. TFX artifacts are versioned at every stage, so any result can be traced back to the exact data and code that produced it, and a run can be repeated from the same inputs.

Data quality issues caught before training — not after

StatisticsGen, SchemaGen, and ExampleValidator form a three-component data quality gate that runs before any training begins. Schema violations, anomalous values, and missing features are detected and flagged at ingestion time — preventing silent data quality degradation from propagating into model weights and corrupting downstream evaluation metrics.

Single DAG trigger runs the entire pipeline end-to-end

Triggering the pipeline with airflow dags trigger tfx_pipeline_dag runs all eight stages in the correct order without manual intervention. Airflow launches the single pipeline task, can retry it when retries are configured on the task, and surfaces the full run log in the UI, while TFX's LocalDagRunner manages the component execution graph inside that task. A sequence of manual script executions is replaced by a single, auditable, repeatable workflow trigger.

🔧

Environment isolation solves real-world toolchain compatibility

The Airflow-TFX compatibility conflict is a genuine engineering challenge that practitioners encounter when building MLOps systems with heterogeneous toolchains. The solution, running TFX in a dedicated virtual environment and invoking it from within the Airflow DAG, demonstrates practical production ML engineering that goes beyond academic pipeline implementations.

📊

Improved reproducibility and scalability over notebook workflows

The pipeline automates all eight stages from data ingestion through deployment, improving reproducibility and scalability relative to manual notebook-based workflows. Every run produces versioned artifacts at each stage, evaluation metrics are stored and comparable across runs, and the full execution history is available through the Airflow UI for audit and debugging.