Applied ML Systems · Infrastructure Risk & Project Controls

Infrastructure Risk Intelligence System

A two-phase ML and optimization system that predicts how cascading risk interactions degrade infrastructure project KPIs, then generates optimized schedules and risk registers using NSGA-II. The best GA-tuned model, an ANN, achieves R² of 0.97.

Architecture: Graph-Based ML · Multi-Objective Optimization
Tech Stack: ANNs · Random Forest · Decision Tree · Genetic Algorithm · NSGA-II · Graph Theory · Python

R² 0.97

Model Accuracy — Train & Test

GA-Tuned

Genetic Algorithm Hyperparameter Optimization

NSGA-II

Multi-Objective Schedule & Risk Optimization

The Problem

Infrastructure projects fail not because of individual risks — but because risks interact, cascade, and compound in ways that conventional tools cannot model

Large infrastructure projects — construction, rail, utilities, civil works — consistently exceed budget and schedule not because any single risk materializes unexpectedly, but because risks interact. One delay triggers a resource conflict, which amplifies a procurement risk, which cascades into a schedule deviation that compounds cost overruns across interdependent teams and activities. Conventional risk management tools treat risks as independent line items in a register — they quantify individual probabilities and impacts but cannot model how risks interact, how disruptions propagate through complex project networks, or how a project manager should optimally respond once cascading effects are underway. The result is a systematic blind spot at the heart of infrastructure project controls: the system-level risk behavior that actually drives project failures is invisible to the tools designed to prevent them.

The Solution

A two-phase predict-then-optimize platform that models cascading risk behavior and generates optimized project controls

The Infrastructure Risk Intelligence System addresses this gap through a two-phase pipeline. In Phase 1, project risk registers and schedules are converted into directed network representations — a risk interactions network capturing how risks trigger and amplify each other, and a team/equipment interdependence network derived from task dependencies and resource proximity. Network centrality features are engineered from these graphs and combined with project-level attributes to train three ML regression models — Decision Tree, Random Forest, and ANN — each tuned using a Genetic Algorithm rather than conventional grid search. The best-performing model (ANN, R² 0.97) predicts how combined risk interactions and systemic risks degrade project KPIs. In Phase 2, this predictive model is integrated with NSGA-II multi-objective optimization to generate optimized schedules and risk registers — actionable project controls that improve KPIs while respecting real operational constraints. The result is a platform that transforms passive risk data into prescriptive project decisions.

Key Outcome

An intelligent project risk platform that moves infrastructure teams from reactive risk logging to proactive risk optimization: it predicts how cascading disruptions degrade project performance with R² of 0.97 using a GA-tuned ANN, and generates NSGA-II-optimized schedules and risk registers that give project managers actionable controls before cost and schedule problems escalate.

Technical Deep Dive

Architecture & Design

Two-Phase Pipeline

Project Data Inputs

Input 1

Risk Register

Risk identities, magnitudes, interaction logic · 25 risks (demo project)

Input 2

Project Schedule

Activities, durations, task dependencies · 32 activities, 23-month horizon

Phase 1 Network Feature Engineering + ML Prediction

Network 1 · Graph Theory

Risk Interactions Network

Directed graph of risk-to-risk triggers · Centrality features: total, betweenness, closeness

Network 2 · Graph Theory

Team/Equipment Interdependence

Resource proximity + task dependency graph · Centrality features: total, betweenness, closeness

Feature Engineering

Network + Project Feature Matrix

Risk centralities · Team centralities · Project duration · No. risks · Avg. risk magnitude · No. teams · No. equipment · Standard scaling · Correlation filtering · Permutation importance

Model A

Decision Tree

GA-tuned hyperparameters

Model B

Random Forest

GA-tuned hyperparameters

Model C · Best

ANN ✓ Selected

GA-tuned · R² 0.97 train & test

Genetic Algorithm Tuning — Outperforms grid search in efficiency · K-fold + nested cross-validation for generalization

Prediction Output

Predicted KPI Degradation

Cost deviation, schedule deviation, and other KPIs under combined risk interactions + systemic risk effects

Phase 2 Multi-Objective Optimization — Prescriptive Controls

NSGA-II · Multi-Objective Optimization

Non-Dominated Sorting Genetic Algorithm II

Integrates ANN predictions · Optimizes project KPIs under practical constraints · Generates Pareto-optimal solutions

Output 1

Optimized Schedule

Reordered activities that reduce systemic risk exposure while meeting operational constraints

Output 2

Optimized Risk Register

Revised risk response plan that improves KPIs — cost, schedule, quality, safety, productivity

Data Inputs

Risk Register & Project Schedule

The system ingests two structured project documents — the risk register, which captures risk identities, magnitudes, and interaction logic, and the project schedule, which defines activities, durations, task dependencies, and resource assignments. These two inputs are the foundation for constructing the directed network representations that drive the feature engineering layer.
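As a rough illustration of what these two inputs might look like once parsed, the sketch below models them as simple records. The class names, field names, and the `unknown_references` helper are hypothetical and not taken from the system itself; the magnitude scale is also an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class Risk:
    risk_id: str
    magnitude: float  # assumed scale, e.g. probability x impact
    triggers: list[str] = field(default_factory=list)  # ids of risks this one can trigger

@dataclass
class Activity:
    activity_id: str
    duration_months: float
    predecessors: list[str] = field(default_factory=list)  # task dependencies
    team: str = ""
    equipment: list[str] = field(default_factory=list)

def unknown_references(risks, activities):
    """Return (source id, missing id) pairs that point at undefined entries."""
    risk_ids = {r.risk_id for r in risks}
    activity_ids = {a.activity_id for a in activities}
    bad = [(r.risk_id, t) for r in risks for t in r.triggers if t not in risk_ids]
    bad += [(a.activity_id, p) for a in activities
            for p in a.predecessors if p not in activity_ids]
    return bad

# Illustrative data only; "R9" is deliberately undefined.
risks = [Risk("R1", 0.6, ["R2"]), Risk("R2", 0.3, ["R9"])]
activities = [Activity("A1", 2.0, [], "civil"), Activity("A2", 3.0, ["A1"], "rail")]
dangling = unknown_references(risks, activities)
```

Checking for dangling references before building the networks is one way to catch register or schedule entries that would otherwise silently drop edges.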

Phase 1 · Step 1

Network Construction

The risk register is converted into a directed Risk Interactions Network (RIN), where nodes represent risks and edges capture how one risk can trigger or amplify another. The project schedule is converted into a Team/Equipment Interdependence Network, derived from task dependencies and the spatio-temporal proximity of resources. Both networks are the structural basis for capturing how local risk events propagate into systemic project disruptions.
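A minimal sketch of this conversion with NetworkX is shown below. The row layouts, the function names, and the rule that a team edge runs from the predecessor activity's team to the successor's team are assumptions made for illustration; spatio-temporal proximity of resources is not modeled here.

```python
import networkx as nx

# Hypothetical register rows: (risk id, magnitude, ids of risks it can trigger).
register = [
    ("R1", 0.6, ["R2", "R3"]),
    ("R2", 0.4, ["R3"]),
    ("R3", 0.8, []),
]
# Hypothetical schedule rows: (activity id, team, predecessor activity ids).
schedule = [
    ("A1", "civil", []),
    ("A2", "rail", ["A1"]),
    ("A3", "utilities", ["A1", "A2"]),
]

def build_risk_network(rows):
    """Directed graph: an edge u -> v means risk u can trigger or amplify risk v."""
    g = nx.DiGraph()
    for risk_id, magnitude, _ in rows:
        g.add_node(risk_id, magnitude=magnitude)
    for risk_id, _, triggers in rows:
        for target in triggers:
            g.add_edge(risk_id, target)
    return g

def build_team_network(rows):
    """Directed graph: an edge u -> v means team v depends on work done by team u."""
    team_of = {activity: team for activity, team, _ in rows}
    g = nx.DiGraph()
    g.add_nodes_from(team_of.values())
    for _, team, predecessors in rows:
        for pred in predecessors:
            upstream = team_of[pred]
            if upstream != team:  # skip dependencies inside a single team
                g.add_edge(upstream, team)
    return g

rin = build_risk_network(register)
teams = build_team_network(schedule)
```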

Phase 1 · Step 2

Network Feature Engineering

Centrality measures — total, betweenness, and closeness — are computed for both networks, quantifying each risk's and each team's structural influence on the system. These network features are combined with project-level attributes (duration, number of risks, average risk magnitude, number of teams and equipment) to form the ML feature matrix. Standard scaling, correlation filtering, and permutation importance analysis ensure feature quality before model training.
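The centrality step can be sketched with NetworkX as follows. Two assumptions are made here: "total" centrality is read as normalized in-degree plus out-degree, and per-node scores are summarized per network by their mean and maximum. The feature names and the `centrality_features` helper are hypothetical.

```python
import networkx as nx

def centrality_features(graph, prefix):
    """Summarize node centralities of a directed graph as mean and max features."""
    measures = {
        # Assumption: "total" means in-degree + out-degree, normalized by n - 1.
        "total": nx.degree_centrality(graph),
        "betweenness": nx.betweenness_centrality(graph),
        "closeness": nx.closeness_centrality(graph),
    }
    features = {}
    for name, values in measures.items():
        scores = list(values.values())
        features[f"{prefix}_{name}_mean"] = sum(scores) / len(scores)
        features[f"{prefix}_{name}_max"] = max(scores)
    return features

# Toy risk chain R1 -> R2 -> R3; R2 sits on the only path from R1 to R3.
rin = nx.DiGraph([("R1", "R2"), ("R2", "R3")])
row = centrality_features(rin, "risk")

# Project-level attributes are appended to the same feature row.
row.update({"project_duration_months": 23, "n_risks": rin.number_of_nodes()})
```

One such row per project scenario would then be scaled and filtered before training.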

Phase 1 · Step 3

GA-Tuned Model Training & Selection

Three regression models (Decision Tree, Random Forest, and ANN) are trained and compared. Rather than conventional grid search, hyperparameters for all three models are tuned using a Genetic Algorithm, which reaches strong configurations with fewer model evaluations than an exhaustive grid and therefore lower computational overhead. K-fold and nested cross-validation are used to check generalization performance. The ANN, achieving R² of 0.97 on both training and test sets, is selected as the prediction engine for Phase 2.
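The evolutionary search can be sketched in plain Python as below. This is an illustration under stated assumptions, not the system's tuner: the search space, the operators (uniform crossover, random-reset mutation, tournament selection, one elite), and `stand_in_fitness` are all hypothetical. In a real pipeline the fitness function would return a cross-validated R² for a model trained with the given configuration.

```python
import random

# Hypothetical ANN search space: (low, high) integer bounds per hyperparameter.
SPACE = {"hidden_units": (4, 128), "n_layers": (1, 4), "lr_exponent": (-4, -1)}

def random_config(rng):
    return {name: rng.randint(lo, hi) for name, (lo, hi) in SPACE.items()}

def crossover(a, b, rng):
    # Uniform crossover: each gene is copied from one parent at random.
    return {name: (a[name] if rng.random() < 0.5 else b[name]) for name in SPACE}

def mutate(config, rng, rate=0.2):
    # Random-reset mutation: resample a gene within its bounds.
    return {name: (rng.randint(*SPACE[name]) if rng.random() < rate else value)
            for name, value in config.items()}

def tournament(population, scores, rng, k=3):
    contenders = rng.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]

def ga_tune(fitness, pop_size=12, generations=10, seed=0):
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    history = []  # best fitness seen in each generation
    for _ in range(generations):
        scores = [fitness(c) for c in population]
        best = max(range(pop_size), key=lambda i: scores[i])
        history.append(scores[best])
        children = [population[best]]  # elitism: keep the best unchanged
        while len(children) < pop_size:
            child = crossover(tournament(population, scores, rng),
                              tournament(population, scores, rng), rng)
            children.append(mutate(child, rng))
        population = children
    scores = [fitness(c) for c in population]
    best = max(range(pop_size), key=lambda i: scores[i])
    history.append(scores[best])
    return population[best], history

def stand_in_fitness(config):
    # Placeholder for a cross-validated score; peaks at 64 units, 2 layers, lr 1e-3.
    return (-((config["hidden_units"] - 64) / 64) ** 2
            - (config["n_layers"] - 2) ** 2
            - (config["lr_exponent"] + 3) ** 2)

best_config, history = ga_tune(stand_in_fitness)
```

Because one elite is carried forward and the stand-in fitness is deterministic, the best score never decreases from one generation to the next.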

Phase 2

NSGA-II Multi-Objective Optimization

The selected ANN is integrated with NSGA-II, the Non-Dominated Sorting Genetic Algorithm II, to optimize project KPIs under real operational constraints. NSGA-II searches the solution space for Pareto-optimal trade-offs between competing objectives (cost deviation, schedule deviation, quality, safety, and productivity), using the ANN to score each candidate. The output is a set of optimized project schedules and risk registers in which no KPI can be improved further without worsening another. This is what converts prediction into prescriptive action.
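The core ranking step of NSGA-II, non-dominated sorting, can be sketched as follows. The sketch assumes every objective is minimized and uses made-up two-objective points (cost deviation, schedule deviation); in the pipeline described above these values would come from the ANN's predictions for each candidate schedule or risk register.

```python
def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def non_dominated_sort(points):
    """Group point indices into fronts; front 0 is the Pareto-optimal set."""
    n = len(points)
    dominated = [[] for _ in range(n)]  # indices that point i dominates
    counts = [0] * n                    # number of points that dominate i
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if dominates(points[i], points[j]):
                dominated[i].append(j)
            elif dominates(points[j], points[i]):
                counts[i] += 1
    fronts = [[i for i in range(n) if counts[i] == 0]]
    while fronts[-1]:
        next_front = []
        for i in fronts[-1]:
            for j in dominated[i]:
                counts[j] -= 1
                if counts[j] == 0:
                    next_front.append(j)
        fronts.append(next_front)
    return fronts[:-1]  # drop the trailing empty front

# Illustrative (cost deviation, schedule deviation) pairs, lower is better.
candidates = [(1, 5), (2, 3), (4, 1), (3, 4), (5, 5)]
fronts = non_dominated_sort(candidates)
```

Here the first three candidates trade cost against schedule and none beats another outright, so they form the first front; the remaining two are dominated and fall into later fronts.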

Key Design Decisions

Genetic Algorithm tuning is more efficient than grid search on large hyperparameter spaces

Grid search exhaustively evaluates every hyperparameter combination — computationally expensive and impractical for models with large search spaces. The Genetic Algorithm treats hyperparameter optimization as an evolutionary search problem, iteratively selecting and recombining the best-performing configurations across generations. This produces competitive or superior results in a fraction of the compute time, and is a more realistic approach for production ML systems where retraining budgets are constrained.
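The cost difference can be made concrete with a small count of model evaluations. All numbers below are hypothetical: the option counts per hyperparameter, the population size, and the number of generations are chosen only to illustrate the arithmetic.

```python
import math

# Hypothetical ANN search space: number of candidate values per hyperparameter.
grid_options = {
    "hidden_layers": 4,
    "units_per_layer": 6,
    "learning_rate": 5,
    "batch_size": 3,
    "activation": 3,
}

def grid_evaluations(options):
    # Grid search trains one model per combination, so the counts multiply.
    return math.prod(options.values())

def ga_evaluations(population_size, generations):
    # A GA trains one model per individual per generation, whatever the space size.
    return population_size * generations

grid_total = grid_evaluations(grid_options)  # 4 * 6 * 5 * 3 * 3 = 1080
ga_total = ga_evaluations(20, 15)            # 20 * 15 = 300
```

The GA's budget is set by the population and generation counts rather than by the size of the space, which is where the saving comes from; the trade-off is that it is not guaranteed to find the best configuration on the grid.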

Network features capture system-level risk behavior that tabular features cannot

Standard project risk models use tabular features — probability, impact, cost, duration. These capture individual risk attributes but miss the structural behavior of risk networks: which risks are most connected, which teams are most critical to the project's risk propagation path, and where cascading disruptions are likely to originate. Graph centrality features encode this structural information, giving the ML models inputs that reflect how risk actually behaves in complex project environments.

Predict-then-optimize separates concerns and enables scalable decision support

Combining prediction and optimization into a single model would produce a system that is difficult to update, validate, or explain. By separating Phase 1 (ML prediction of KPI degradation) from Phase 2 (NSGA-II optimization of project controls), each component can be independently validated, retrained, or replaced. The ML model can be updated as new project data arrives without rebuilding the optimization layer — and the optimization can be run with different objective weights without retraining the prediction model.
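One way to picture this separation is an optimizer that only sees a prediction function. The sketch below is a simplified stand-in: it uses a weighted sum over a fixed candidate list instead of NSGA-II, and `toy_predict`, `pick_best`, and the KPI names are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

Candidate = Tuple[float, ...]
Predictor = Callable[[Candidate], Dict[str, float]]

def pick_best(candidates: Sequence[Candidate], predict: Predictor,
              weights: Dict[str, float]) -> Candidate:
    """Return the candidate with the lowest weighted sum of predicted KPI deviations."""
    def score(candidate: Candidate) -> float:
        kpis = predict(candidate)
        return sum(weights[name] * kpis[name] for name in weights)
    return min(candidates, key=score)

def toy_predict(candidate: Candidate) -> Dict[str, float]:
    # Stand-in for the trained model: any callable with this shape can be swapped in.
    return {"cost_deviation": candidate[0], "schedule_deviation": candidate[1]}

candidates = [(1.0, 5.0), (4.0, 1.0)]

# Same predictor, different objective weights: no retraining involved.
cost_first = pick_best(candidates, toy_predict,
                       {"cost_deviation": 1.0, "schedule_deviation": 0.1})
schedule_first = pick_best(candidates, toy_predict,
                           {"cost_deviation": 0.1, "schedule_deviation": 1.0})
```

Replacing `toy_predict` with a retrained model would leave `pick_best` untouched, and changing the weights leaves the model untouched, which is the decoupling the design relies on.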

Tech Stack

Technology | Purpose
Artificial Neural Network (ANN) | Best-performing regression model for KPI degradation prediction (R² 0.97)
Random Forest | Comparative regression model, GA-tuned and evaluated against ANN
Decision Tree | Interpretable baseline model, GA-tuned for fair comparison
Genetic Algorithm (GA) | Hyperparameter optimization for all three ML models; more efficient than grid search
NSGA-II | Multi-objective optimization of project schedules and risk registers
Graph Theory / NetworkX | Construction of risk interaction and team interdependence directed networks
Centrality Analysis | Total, betweenness, and closeness centrality feature engineering from both networks
K-Fold & Nested Cross-Validation | Robust model generalization evaluation across all three ML models
Permutation Importance | Feature selection and importance ranking for network and project features
Python | Core language and system implementation

Results & Metrics

What the system delivers

R² 0.97

Model Accuracy

ANN achieves R² of 0.97 on both training and test sets — confirmed by nested cross-validation at 0.969

GA-Tuned

Hyperparameter Optimization

Genetic Algorithm tuning across all three models — outperforms grid search in efficiency and computational cost

NSGA-II

Multi-Objective Optimization

Pareto-optimal schedules and risk registers generated from ANN predictions — turning forecasts into actionable project controls

📈

R² 0.97 — consistent across train, test, and nested cross-validation

The ANN model achieved R² of 0.97 on both the training and test sets, and nested cross-validation gave a mean R² of 0.969. The close agreement between these figures indicates that the model generalizes beyond the training data and is not overfitting to the project scenarios used during development. This consistency across evaluation methods supports the model's use for project risk prediction.
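For readers less familiar with the metric, R² is one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The sketch below computes it by hand; the KPI values are made up for illustration and are not results from the system.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted KPI deviations.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.5, 3.5, 6.5, 7.5]

score = r2_score(observed, predicted)        # SS_res = 1.0, SS_tot = 20.0
baseline = r2_score(observed, [5.0] * 4)     # always predicting the mean
```

A model that always predicts the mean scores 0, so an R² of 0.97 means the model accounts for roughly 97% of the variance in the target that the mean alone leaves unexplained.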

🧬

GA tuning delivers competitive accuracy at lower computational cost than grid search

All three models (Decision Tree, Random Forest, and ANN) were tuned using a Genetic Algorithm rather than exhaustive grid search. The GA treats the search as an evolutionary problem, iteratively selecting and recombining top-performing configurations, and so reaches strong hyperparameter configurations with fewer model evaluations. This matters for organizations with constrained retraining budgets or large hyperparameter search spaces.

🏗️

NSGA-II converts predictions into Pareto-optimal project controls

Phase 2 integrates the ANN with NSGA-II to generate optimized project schedules and risk registers. Rather than producing a single solution, NSGA-II explores the Pareto frontier, the set of trade-offs between competing KPI objectives, giving project managers a menu of optimized options rather than a single prescribed answer. Each solution on the frontier is non-dominated: no other candidate is at least as good on cost deviation, schedule deviation, quality, safety, and productivity and strictly better on one of them. Every solution also respects real operational constraints.
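NSGA-II keeps that menu of options spread out along the frontier by using a crowding distance, which favors solutions in sparsely populated regions. A minimal sketch follows, assuming minimized objectives and an illustrative three-point frontier of (cost deviation, schedule deviation) pairs.

```python
def crowding_distance(front):
    """Crowding distance per point; boundary points get infinity so they are kept."""
    n = len(front)
    distance = [0.0] * n
    if n == 0:
        return distance
    for k in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][k])
        low, high = front[order[0]][k], front[order[-1]][k]
        distance[order[0]] = distance[order[-1]] = float("inf")
        if high == low:
            continue  # objective is constant on this front, no spread to measure
        for rank in range(1, n - 1):
            i = order[rank]
            gap = front[order[rank + 1]][k] - front[order[rank - 1]][k]
            distance[i] += gap / (high - low)
    return distance

# Hypothetical frontier: (cost deviation, schedule deviation), lower is better.
frontier = [(1, 5), (2, 3), (4, 1)]
distances = crowding_distance(frontier)
```

When the frontier holds more solutions than can be carried forward, the ones with the largest crowding distance are preferred, so the extremes and the well-separated compromises survive.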

🕸️

Network features expose cascading risk behavior invisible to tabular models

By encoding risk registers and project schedules as directed networks and computing centrality measures, the feature engineering layer gives the ML models structural information that standard tabular approaches cannot capture — which risks are most connected, which teams are most critical to cascading propagation, and where systemic disruptions are most likely to originate. This graph-theoretic foundation is what enables the system to model project risk as a system-level phenomenon rather than a collection of independent events.

🎯

Prescriptive analytics — from passive risk logging to actionable project controls

The system moves infrastructure project teams beyond descriptive risk registers and reactive cost tracking toward prescriptive decision support. Project managers receive not just a prediction of how risks will degrade performance, but optimized schedules and risk response plans they can act on — before cost and schedule problems escalate. This predict-then-optimize architecture is what distinguishes the platform from a standalone ML model and makes it an operational project controls tool.