Applied ML Systems · Climate Resilience & Infrastructure

Alberta Wildfire Prediction System

A production-grade ConvLSTM deep learning system that predicts wildfire ignition likelihood across the Province of Alberta from geospatial and meteorological time-series data. It is deployed as a full TFX pipeline on GCP Vertex AI and achieves 91% accuracy and an AUC of 90%.

Architecture CNN+LSTM · Deep Learning
Tech Stack
TensorFlow · TFX · GCP Vertex AI · Keras · GIS · Python

91%

Model Accuracy

AUC 90%

Predictive Power

Vertex AI

Production Deployment on GCP

The Problem

Wildfire ignition is spatially complex, temporally dynamic, and increasingly unpredictable under climate change

Alberta is one of Canada's most wildfire-prone provinces, with fire seasons growing longer, more intense, and more geographically widespread as climate change accelerates. Predicting where and when ignitions are likely to occur requires modeling the interaction of dozens of variables — vegetation type and dryness, topography, wind speed and direction, temperature, humidity, and historical fire patterns — across a province-wide spatial domain that spans hundreds of thousands of square kilometers. Traditional rule-based fire risk indices cannot capture the nonlinear spatial and temporal dynamics that drive ignition likelihood, and conventional ML approaches treat each location independently, missing the spatial propagation patterns that are critical for accurate province-wide prediction. The result is a chronic gap between what fire management agencies need — precise, forward-looking ignition risk maps — and what existing tools can reliably deliver.

The Solution

A ConvLSTM system that models wildfire ignition as a spatiotemporal prediction problem — deployed as a production TFX pipeline on GCP Vertex AI

The Alberta Wildfire Prediction System frames ignition likelihood as a spatiotemporal sequence prediction problem. It addresses that problem with a ConvLSTM architecture that simultaneously captures spatial patterns across the landscape and temporal dynamics across time. The model ingests geospatial and meteorological time-series inputs (weather variables, vegetation indices, topographic features, and historical fire data) and produces province-wide ignition likelihood predictions across Alberta. Trained and validated on historical wildfire records, the system achieved 91% accuracy, an AUC of 90%, and a recall of 87%. Together these reflect both overall predictive performance and the system's ability to identify true ignition events. The full system was productionized as a TFX pipeline on GCP Vertex AI Pipelines, following the same production architecture as the Calgary Flood Prediction System, with automated retraining, model evaluation gating, and a Vertex AI serving endpoint for real-time inference.

Key Outcome

A province-wide wildfire ignition prediction system that delivers spatially explicit risk maps across Alberta at 91% accuracy — built on a ConvLSTM architecture that captures both spatial landscape patterns and temporal weather dynamics, and deployed as a fully automated production pipeline on GCP Vertex AI.

Technical Deep Dive

Architecture & Design

ConvLSTM Architecture & TFX Pipeline

ConvLSTM Architecture — Spatiotemporal Ignition Prediction

Input Layer 1

Weather Variables

Temp, humidity, wind speed & direction, precipitation

Input Layer 2

Vegetation Indices

NDVI, fuel moisture, dryness indicators from GEE

Input Layer 3

Topographic Features

Elevation, slope, aspect — geospatial rasters

Input Layer 4

Historical Fire Data

Past ignition locations, fire perimeters, burn scars

Core Architecture · ConvLSTM

Spatiotemporal Encoder

Convolutional filters capture spatial ignition patterns across the Alberta landscape · LSTM gates model temporal weather dynamics across time steps · Combined in a single ConvLSTM unit

Output

Province-Wide Ignition Likelihood Map

Per-grid-cell ignition probability across Alberta · Accuracy 91% · AUC 90% · Recall 87%

TFX Pipeline — GCP Vertex AI Pipelines

Stage 1 · ExampleGen

Data Ingestion

Ingests geospatial + meteorological time-series from GCS · Train/eval splits

Stage 2 · StatisticsGen

Statistics

Dataset stats & drift detection

SchemaGen

Schema

Schema inference from training data

ExampleValidator

Validation

Anomaly detection against schema

Stage 3 · Transform

Feature Engineering

Spatial normalization · Temporal sequence construction · Consistent train/serve preprocessing

Stage 4 · Tuner

Hyperparameter Tuning

Keras Tuner · Searches ConvLSTM filters, hidden units, learning rate, dropout · Best trial passed to Trainer

Stage 5 · Trainer

ConvLSTM Model Training

Spatiotemporal ignition model · Accuracy 91% · AUC 90% · Recall 87%

Stage 6 · Evaluator

Model Evaluation

TFMA · Accuracy, AUC, Recall thresholds · Blessing gate blocks underperforming models

Stage 6 · Resolver

Baseline Resolution

Supplies latest blessed model to Evaluator as champion baseline · Challenger blessed only on improvement

Stage 7 · Pusher

Model Serving — Vertex AI Endpoint

Blessed models pushed to Vertex AI · Real-time ignition likelihood inference endpoint

MLOps & CI/CD Layer

Pipeline Orchestration

Vertex AI Pipelines

Kubeflow DAG execution · Automated runs on new data · Full lineage tracking

Continuous Training

Automated Retraining

Triggered on new fire season data · Drift detection gates retraining

Model Registry

Vertex AI Model Registry

Versioned artifacts · Champion tracking · Rollback support

Metadata & Lineage

ML Metadata Store

Full artifact and execution logging · Reproducibility across runs

GCP Infrastructure

Cloud Storage (GCS) · Vertex AI Pipelines · Vertex AI Training · Vertex AI Serving

Model Inputs

Geospatial & Meteorological Features

The model ingests four input streams:
- meteorological variables (temperature, humidity, wind speed and direction, precipitation)
- vegetation indices, including NDVI and fuel moisture, sourced via Google Earth Engine
- topographic rasters (elevation, slope, aspect)
- historical fire records (ignition locations, perimeters, burn scars)

All streams are spatially aligned to a province-wide grid covering Alberta before being passed to the ConvLSTM encoder.
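As a minimal sketch of what "spatially aligned to a province-wide grid" can mean in practice, the function below stacks rasters that already share one grid into a time-major tensor of shape (T, H, W, C). It makes several assumptions. It uses plain nested Python lists. Static streams such as topography are repeated at every time step. The helper name `stack_streams` and the stream names are hypothetical. The project's actual tensor layout is not documented here.

```python
from typing import Dict, List

Grid = List[List[float]]  # one H x W raster


def stack_streams(
    dynamic: Dict[str, List[Grid]],  # stream name -> one raster per time step
    static: Dict[str, Grid],         # stream name -> one raster reused at every step
) -> List[List[List[List[float]]]]:
    """Hypothetical helper: return a (T, H, W, C) nested list.

    Channel order is sorted dynamic names, then sorted static names.
    """
    names_dyn = sorted(dynamic)
    names_sta = sorted(static)
    t_steps = {len(v) for v in dynamic.values()}
    if len(t_steps) != 1:
        raise ValueError("all dynamic streams need the same number of time steps")
    (T,) = t_steps
    # Every raster must share one H x W grid, otherwise cells would not line up.
    all_rasters = [r for v in dynamic.values() for r in v] + list(static.values())
    shapes = {(len(r), len(r[0])) for r in all_rasters}
    if len(shapes) != 1:
        raise ValueError(f"rasters are not on one grid: {shapes}")
    (H, W) = shapes.pop()
    out = []
    for t in range(T):
        frame = []
        for i in range(H):
            row = []
            for j in range(W):
                cell = [dynamic[n][t][i][j] for n in names_dyn]
                cell += [static[n][i][j] for n in names_sta]
                row.append(cell)
            frame.append(row)
        out.append(frame)
    return out
```

Each grid cell at each time step then carries one value per input stream. This channel-stacked layout is the form a convolutional recurrent layer consumes.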

Core Architecture

ConvLSTM Spatiotemporal Encoder

The ConvLSTM architecture natively combines spatial and temporal modeling in a single unit. Convolutional filters capture spatial ignition patterns across the Alberta landscape, while LSTM gates model how weather and fuel conditions evolve over time. Every gate is computed by convolution over the grid-aligned input streams and the previous hidden state. As a result, one layer learns their joint spatial and temporal dependencies, with no separate layer needed to fuse the output of a spatial encoder with a temporal model.
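For reference, the standard ConvLSTM update makes the "single unit" point concrete. It is shown here in the Shi et al. (2015) formulation, omitting the optional peephole terms. Every gate is computed with convolutions over the input frame X_t and the previous hidden state H_{t-1}, so the cell state C_t is itself a spatial map. The source does not state whether this project's model uses peepholes or a particular kernel size.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
H_t &= o_t \circ \tanh\left(C_t\right)
\end{aligned}
```

Here * denotes 2-D convolution and ∘ the element-wise (Hadamard) product. Replacing each convolution with a matrix multiplication recovers the ordinary fully connected LSTM.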

Data Ingestion & Validation

ExampleGen + Schema Validation

ExampleGen ingests geospatial and meteorological time-series from GCS and partitions the data into training and evaluation splits. StatisticsGen, SchemaGen, and ExampleValidator then compute statistics, infer the data schema, and detect anomalies. This way, drift introduced by seasonal variation in weather data and fire records is caught before it silently degrades model performance.
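TFX's SchemaGen and ExampleValidator are far richer than this, but the core idea can be sketched in plain Python. The sketch learns the expected feature set and value ranges from the training split, then flags an incoming batch that is missing features or falls outside those ranges. The feature names, the min/max schema, the 10% tolerance, and the helpers `infer_schema` and `validate` are all hypothetical illustrations. They are not the TFDV schema format or API.

```python
from typing import Dict, List, Tuple

Row = Dict[str, float]
Schema = Dict[str, Tuple[float, float]]  # feature -> (min, max) seen in training


def infer_schema(train_rows: List[Row]) -> Schema:
    """Hypothetical helper: record every training feature and its observed min/max."""
    schema: Schema = {}
    for row in train_rows:
        for name, value in row.items():
            lo, hi = schema.get(name, (value, value))
            schema[name] = (min(lo, value), max(hi, value))
    return schema


def validate(rows: List[Row], schema: Schema, tolerance: float = 0.1) -> List[str]:
    """Hypothetical helper: return anomalies; an empty list means the batch passes."""
    anomalies = []
    for idx, row in enumerate(rows):
        missing = set(schema) - set(row)
        if missing:
            anomalies.append(f"row {idx}: missing {sorted(missing)}")
        for name, value in row.items():
            if name not in schema:
                anomalies.append(f"row {idx}: unexpected feature {name!r}")
                continue
            lo, hi = schema[name]
            slack = tolerance * (hi - lo)  # allow small excursions beyond the training range
            if not (lo - slack <= value <= hi + slack):
                anomalies.append(f"row {idx}: {name}={value} outside [{lo}, {hi}]")
    return anomalies
```

If this check were run on each new fire season's data, a non-empty result would stop the pipeline before Transform. That is the gating role the paragraph above attributes to ExampleValidator.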

Feature Engineering

Transform — Spatial & Temporal Preprocessing

The Transform component applies spatial normalization across the province-wide grid and constructs temporal sequences from the multi-stream input data. The same transformation graph is used at both training and serving time, eliminating training-serving skew and ensuring the model receives identically preprocessed inputs whether it is training on historical fire seasons or serving real-time ignition predictions.
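The key property described here is one preprocessing definition shared by training and serving. It can be illustrated without TensorFlow Transform. Statistics are computed once on training data and then frozen. The same function is applied to both training frames and live frames. Finally, sliding windows of consecutive time steps are cut into sequences.

The per-channel z-score, the window length, and the helper names are assumptions for illustration. In TFX this role is played by a `preprocessing_fn` whose analyzers (for example `tft.scale_to_z_score`) are baked into the exported transform graph.

```python
import math
from typing import List, Tuple

Frame = List[List[List[float]]]  # H x W x C for a single time step


def fit_channel_stats(frames: List[Frame]) -> List[Tuple[float, float]]:
    """Hypothetical helper: per-channel (mean, std) over all cells and steps of training frames."""
    n_ch = len(frames[0][0][0])
    stats = []
    for c in range(n_ch):
        vals = [cell[c] for f in frames for row in f for cell in row]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        stats.append((mean, math.sqrt(var) or 1.0))  # guard against constant channels
    return stats


def normalize(frame: Frame, stats: List[Tuple[float, float]]) -> Frame:
    """Apply the frozen training statistics; called identically at train and serve time."""
    return [[[(cell[c] - m) / s for c, (m, s) in enumerate(stats)] for cell in row] for row in frame]


def make_sequences(frames: List[Frame], seq_len: int) -> List[List[Frame]]:
    """Slide a window of seq_len consecutive frames over the time axis."""
    return [frames[t:t + seq_len] for t in range(len(frames) - seq_len + 1)]
```

Because `normalize` only reads the stored statistics, a frame arriving at the serving endpoint is scaled exactly as the training frames were. Avoiding that mismatch is the point of sharing one transform graph.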

Tuning & Training

Tuner + Trainer — Optimized ConvLSTM

The Tuner component uses Keras Tuner to search across ConvLSTM filter counts, hidden units, learning rate, and dropout — passing the best trial directly to the Trainer. The Trainer then trains the full ConvLSTM model using the tuned configuration and transformed features, producing a model that achieved 91% accuracy, AUC of 90%, and recall of 87% on held-out Alberta fire season data.
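Conceptually, the Tuner stage is a search loop over the four hyperparameters named here, with the winning configuration handed to the Trainer. The sketch below shows that control flow as an exhaustive grid search in plain Python. The candidate values and the `grid_search` helper are hypothetical. The `evaluate` callback stands in for building and fitting a ConvLSTM and returning a validation score. Keras Tuner itself also offers strategies such as random search, Hyperband, and Bayesian optimization.

```python
import itertools
from typing import Callable, Dict, Tuple

# Hypothetical search space; the project's actual ranges are not documented.
SEARCH_SPACE = {
    "filters": [16, 32, 64],
    "hidden_units": [64, 128],
    "learning_rate": [1e-3, 3e-4],
    "dropout": [0.1, 0.3],
}


def grid_search(
    space: Dict[str, list],
    evaluate: Callable[[Dict[str, float]], float],
) -> Tuple[Dict[str, float], float]:
    """Hypothetical helper: try every combination, return (best_config, best_score).

    Higher scores win; on ties the first configuration encountered is kept.
    """
    names = sorted(space)
    best_cfg, best_score = None, float("-inf")
    for values in itertools.product(*(space[n] for n in names)):
        cfg = dict(zip(names, values))
        score = evaluate(cfg)  # in practice: build, fit, and score a model
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

The returned `best_cfg` plays the role of the "best trial" that the Tuner passes to the Trainer.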

Evaluation & Serving

Evaluator + Pusher — Gated Deployment

The Evaluator uses TFMA to enforce per-metric thresholds on accuracy, AUC, and recall, and only models that pass all three gates receive a blessing. The Resolver supplies the latest blessed production model as the Evaluator's baseline, so each challenger is also compared against the current champion before it is blessed. The Pusher then deploys only blessed models to the Vertex AI serving endpoint, preventing silent degradation across fire seasons.
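The gating logic described here reduces to a small decision function. It combines absolute floors on every metric with a requirement to beat the current champion. In a real TFX pipeline this is configured declaratively through TFMA metric thresholds (value and change thresholds) rather than hand-written.

In the sketch below, the recall floor of 0.87 follows the figure quoted in the text. The accuracy and AUC floors are placeholders, because the source does not state them. The "no metric regresses, at least one improves" rule is one assumed policy. The helper names are hypothetical.

```python
from typing import Dict, Optional

# Recall floor taken from the text; accuracy and AUC floors are placeholders.
VALUE_THRESHOLDS = {"accuracy": 0.85, "auc": 0.85, "recall": 0.87}


def bless(
    challenger: Dict[str, float],
    champion: Optional[Dict[str, float]],
    min_gain: float = 0.0,
) -> bool:
    """Hypothetical gate: every metric must clear its floor and, when a champion
    exists, no metric may regress and at least one must improve by > min_gain."""
    for name, floor in VALUE_THRESHOLDS.items():
        if challenger[name] < floor:
            return False
    if champion is None:  # first run: no baseline to compare against
        return True
    deltas = [challenger[n] - champion[n] for n in VALUE_THRESHOLDS]
    return all(d >= 0 for d in deltas) and any(d > min_gain for d in deltas)


def push_decision(challenger: Dict[str, float], champion: Optional[Dict[str, float]]) -> str:
    """Only a blessed challenger replaces the model behind the serving endpoint."""
    return "push to endpoint" if bless(challenger, champion) else "keep current champion"
```

A challenger that merely ties the champion is not pushed. This rule is what keeps an unchanged or slightly worse retrain from displacing the production model.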

Key Design Decisions

ConvLSTM natively models spatial propagation — not just point predictions

Many wildfire risk models treat each grid cell independently, missing the spatial propagation dynamics that drive ignition clustering across landscapes. ConvLSTM processes the entire spatial domain at each time step, capturing how weather patterns, fuel conditions, and historical fire presence interact across neighboring cells. This is the core architectural choice that enables province-wide prediction rather than point-level risk scoring.
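To make "interact across neighboring cells" concrete, the sketch below implements one single-channel ConvLSTM step in plain Python. It uses 3x3 kernels, zero padding, and no peephole terms, and it is fed a single ignition "spark". The all-ones kernels and zero biases are arbitrary illustrative weights, not trained ones, and the helper names are hypothetical. With these weights, the nonzero footprint of the hidden state grows by one cell per time step. Locality comes from the kernel; longer-range context comes from repeated steps or stacked layers.

```python
import math
from typing import List

Grid = List[List[float]]


def conv3x3(x: Grid, k: Grid) -> Grid:
    """'Same'-padded 3x3 cross-correlation (the 'convolution' of deep-learning libraries)."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            s = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:  # zero padding outside the grid
                        s += k[di + 1][dj + 1] * x[ii][jj]
            out[i][j] = s
    return out


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def convlstm_step(x: Grid, h: Grid, c: Grid, kx: dict, kh: dict, b: dict):
    """Hypothetical single-channel ConvLSTM update; gate keys are 'i', 'f', 'o', 'g'."""
    pre = {g: conv3x3(x, kx[g]) for g in "ifog"}  # input-to-gate convolutions
    rec = {g: conv3x3(h, kh[g]) for g in "ifog"}  # hidden-to-gate convolutions
    rows, cols = len(x), len(x[0])
    h_new = [[0.0] * cols for _ in range(rows)]
    c_new = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for q in range(cols):
            z = {g: pre[g][r][q] + rec[g][r][q] + b[g] for g in "ifog"}
            i_g, f_g, o_g = sigmoid(z["i"]), sigmoid(z["f"]), sigmoid(z["o"])
            c_new[r][q] = f_g * c[r][q] + i_g * math.tanh(z["g"])
            h_new[r][q] = o_g * math.tanh(c_new[r][q])
    return h_new, c_new
```

Starting from a zero state with a spark at the center of a 7x7 grid, the first step activates only the 3x3 neighborhood. A second step with no new input extends the activation to the 5x5 neighborhood through the recurrent convolution.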

Recall optimized alongside accuracy to minimize missed ignitions

In wildfire prediction, a missed ignition (false negative) carries far greater operational cost than a false alarm. The evaluation gate enforces a recall threshold of 87% alongside accuracy and AUC — ensuring the system is tuned to identify true ignition events even under class imbalance, where non-ignition locations vastly outnumber actual fire events in the training data.
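A few lines show why accuracy alone is not enough here. On an imbalanced grid, a model that never predicts ignition scores high accuracy but zero recall, so the decision threshold has to be chosen with recall in mind.

The sketch below computes the confusion-matrix metrics and picks the highest probability threshold that still meets a recall target. The toy labels and scores are invented for illustration, the 0.87 target mirrors the figure quoted in the text, and the helper names are hypothetical.

```python
from typing import List, Tuple


def confusion(labels: List[int], preds: List[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy_recall(labels: List[int], preds: List[int]) -> Tuple[float, float]:
    tp, fp, fn, tn = confusion(labels, preds)
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (tp + tn) / len(labels), recall


def threshold_for_recall(labels: List[int], scores: List[float], target: float) -> float:
    """Hypothetical helper: highest threshold (score >= t is positive) whose recall >= target."""
    for t in sorted(set(scores), reverse=True):  # recall can only rise as t falls
        _, rec = accuracy_recall(labels, [int(s >= t) for s in scores])
        if rec >= target:
            return t
    return min(scores)
```

With 95 non-ignition cells and 5 ignitions, predicting "no fire" everywhere gives 0.95 accuracy and 0.0 recall. Lowering the threshold until recall reaches the target trades some false alarms for fewer missed ignitions, which is the trade-off the paragraph above argues for.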

Google Earth Engine enables scalable province-wide vegetation data

Sourcing vegetation indices like NDVI and fuel moisture for a province the size of Alberta at the resolution required for meaningful spatial modeling would be prohibitive through conventional data pipelines. Google Earth Engine's planetary-scale geospatial compute allows these layers to be extracted, processed, and aligned to the model grid without local storage or processing constraints — a critical infrastructure decision for making province-wide prediction tractable.
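NDVI itself is a simple per-pixel band ratio, NDVI = (NIR - Red) / (NIR + Red). What Earth Engine contributes is running that ratio, and the aggregation to the model grid, over every scene covering the province. In Earth Engine the ratio is typically a single `normalizedDifference` call on an image.

The sketch below shows both operations in plain Python on a toy tile: the NDVI formula, then block-averaging fine pixels into coarser model-grid cells. The block size, band values, and helper names are hypothetical.

```python
from typing import List

Raster = List[List[float]]


def ndvi(nir: Raster, red: Raster) -> Raster:
    """Per-pixel NDVI; 0.0 where both bands are zero, to avoid division by zero."""
    return [
        [(n - r) / (n + r) if (n + r) != 0 else 0.0 for n, r in zip(nrow, rrow)]
        for nrow, rrow in zip(nir, red)
    ]


def block_mean(raster: Raster, factor: int) -> Raster:
    """Hypothetical helper: average non-overlapping factor x factor blocks into one grid cell."""
    h, w = len(raster), len(raster[0])
    if h % factor or w % factor:
        raise ValueError("raster size must be a multiple of the block factor")
    out = []
    for i in range(0, h, factor):
        row = []
        for j in range(0, w, factor):
            block = [raster[i + di][j + dj] for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out
```

The aggregation step is what puts vegetation pixels onto the same grid as the weather and topographic layers before they are stacked as model inputs.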

Tech Stack

Technology | Purpose
TensorFlow / Keras | ConvLSTM model architecture, training, and evaluation
TensorFlow Extended (TFX) | End-to-end ML pipeline: ExampleGen, StatisticsGen, SchemaGen, ExampleValidator, Transform, Tuner, Trainer, Evaluator, Pusher
GCP Vertex AI Pipelines | Pipeline orchestration, automated retraining, and DAG execution
Vertex AI Training & Serving | Scalable model training and real-time inference endpoint
Vertex AI Model Registry | Versioned model artifacts, champion-challenger tracking, rollback
TensorFlow Model Analysis (TFMA) | Multi-metric model evaluation with blessing gate
Keras Tuner | Automated hyperparameter search within the TFX Tuner component
Google Earth Engine | Province-wide vegetation indices and fuel moisture extraction
Google Cloud Storage | Raw data storage and TFX artifact repository
Python | Core language and pipeline orchestration

Results & Metrics

What the system delivers

91%

Model Accuracy

Validated on held-out Alberta fire season data — province-wide ignition likelihood prediction

90%

AUC Score

Area under the ROC curve — strong discriminative power between ignition and non-ignition zones

Vertex AI

Production Deployment

Full TFX pipeline on GCP — automated retraining, model registry, and real-time serving endpoint
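The 90% AUC reported above has a concrete reading. It is the probability that a randomly chosen ignition cell receives a higher predicted score than a randomly chosen non-ignition cell, with ties counting half.

The pairwise sketch below computes exactly that definition. It is O(P x N) and meant only for illustration. Library implementations such as Keras's `tf.keras.metrics.AUC` instead approximate the curve over a fixed set of thresholds. The helper name is hypothetical.

```python
from typing import List


def auc_pairwise(labels: List[int], scores: List[float]) -> float:
    """ROC AUC as P(score of a positive > score of a negative), ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, with labels [1, 0, 1, 0] and scores [0.9, 0.8, 0.3, 0.1], three of the four positive/negative pairs are ranked correctly, giving an AUC of 0.75.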

🔥

Province-wide ignition likelihood maps at 91% accuracy

The system produces per-grid-cell ignition probability maps spanning the entire Province of Alberta, not point predictions or regional aggregates. At 91% accuracy and an AUC of 90%, the maps give fire management agencies spatially explicit risk information for planning pre-season resource deployment and real-time response.

🎯

87% recall — minimizing missed ignition events

In wildfire prediction, failing to identify a true ignition event carries far greater operational cost than a false alarm. The model was optimized and evaluated with an explicit recall threshold of 87% — ensuring the system identifies the vast majority of actual ignition events even under the severe class imbalance that characterizes provincial fire datasets.

⚙️

Fully automated production pipeline on GCP Vertex AI

The TFX pipeline runs end-to-end without manual intervention — ingesting new fire season data, validating schema, engineering features, tuning hyperparameters, training, evaluating against all three metric gates, and deploying only genuinely improved models to the serving endpoint. New data triggers a full retraining run automatically.
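The retraining trigger combines two signals mentioned in this write-up: new data arriving, and drift detection gating the run. The sketch below makes several assumptions. Drift is measured on binned histograms of a single feature. The measure is Jensen-Shannon divergence, one common choice for numeric drift. The bin edges, the 0.1 threshold, and the helpers `histogram`, `js_divergence`, and `should_retrain` are hypothetical rather than the pipeline's actual configuration.

```python
import math
from typing import List, Sequence


def histogram(values: Sequence[float], edges: Sequence[float]) -> List[float]:
    """Normalized counts over half-open bins [e_k, e_k+1); the last bin also includes its right edge.
    Values outside the edges are ignored."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for k in range(len(edges) - 1):
            last = k == len(edges) - 2
            if edges[k] <= v < edges[k + 1] or (last and v == edges[-1]):
                counts[k] += 1
                break
    total = sum(counts) or 1
    return [c / total for c in counts]


def js_divergence(p: List[float], q: List[float]) -> float:
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a: List[float], b: List[float]) -> float:
        # b > 0 wherever a > 0 because b is the mixture m
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def should_retrain(train_vals, new_vals, edges, has_new_data: bool, threshold: float = 0.1) -> bool:
    """Hypothetical trigger: retrain only when new data exists and its distribution has drifted."""
    if not has_new_data:
        return False
    drift = js_divergence(histogram(train_vals, edges), histogram(new_vals, edges))
    return drift > threshold
```

A season whose temperatures pile into the hottest bin would trip the trigger, while a season matching the training distribution would not.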

🛰️

Scalable geospatial feature pipeline via Google Earth Engine

Province-wide vegetation indices, fuel moisture estimates, and land cover data are sourced and preprocessed at scale via Google Earth Engine — enabling the model to ingest high-resolution geospatial features across Alberta without local storage or processing constraints. This infrastructure decision makes provincial-scale real-time inference tractable.

📄

Manuscript under peer review: Canadian Journal of Forest Research

The system and its methodology are the subject of a submitted paper, "Development and Implementation of Wildfire Prediction System: Application for the Province of Alberta", currently under review at the Canadian Journal of Forest Research. There, the approach is being assessed through academic peer review alongside its operational deployment.