Infrastructure Risk Intelligence System
A two-phase ML and optimization system that predicts how cascading risk interactions degrade infrastructure project KPIs, then generates optimized schedules and risk registers using NSGA-II. The best GA-tuned model (an ANN) achieves an R² of 0.97.
R² 0.97
Model Accuracy — Train & Test
GA-Tuned
Genetic Algorithm Hyperparameter Optimization
NSGA-II
Multi-Objective Schedule & Risk Optimization
The Problem
Infrastructure projects fail not because of individual risks — but because risks interact, cascade, and compound in ways that conventional tools cannot model
Large infrastructure projects — construction, rail, utilities, civil works — consistently exceed budget and schedule not because any single risk materializes unexpectedly, but because risks interact. One delay triggers a resource conflict, which amplifies a procurement risk, which cascades into a schedule deviation that compounds cost overruns across interdependent teams and activities. Conventional risk management tools treat risks as independent line items in a register — they quantify individual probabilities and impacts but cannot model how risks interact, how disruptions propagate through complex project networks, or how a project manager should optimally respond once cascading effects are underway. The result is a systematic blind spot at the heart of infrastructure project controls: the system-level risk behavior that actually drives project failures is invisible to the tools designed to prevent them.
The Solution
A two-phase predict-then-optimize platform that models cascading risk behavior and generates optimized project controls
The Infrastructure Risk Intelligence System addresses this gap through a two-phase pipeline. In Phase 1, project risk registers and schedules are converted into directed network representations — a risk interactions network capturing how risks trigger and amplify each other, and a team/equipment interdependence network derived from task dependencies and resource proximity. Network centrality features are engineered from these graphs and combined with project-level attributes to train three ML regression models — Decision Tree, Random Forest, and ANN — each tuned using a Genetic Algorithm rather than conventional grid search. The best-performing model (ANN, R² 0.97) predicts how combined risk interactions and systemic risks degrade project KPIs. In Phase 2, this predictive model is integrated with NSGA-II multi-objective optimization to generate optimized schedules and risk registers — actionable project controls that improve KPIs while respecting real operational constraints. The result is a platform that transforms passive risk data into prescriptive project decisions.
Key Outcome
An intelligent project risk platform that moves infrastructure teams from reactive risk logging to proactive risk optimization. A GA-tuned ANN predicts how cascading disruptions degrade project performance with an R² of 0.97, and NSGA-II uses those predictions to generate optimized schedules and risk registers that give project managers actionable controls before cost and schedule problems escalate.
Technical Deep Dive
Architecture & Design
Two-Phase Pipeline
Project Data Inputs
Input 1
Risk Register
Risk identities, magnitudes, interaction logic · 25 risks (demo project)
Input 2
Project Schedule
Activities, durations, task dependencies · 32 activities, 23-month horizon
Network 1 · Graph Theory
Risk Interactions Network
Directed graph of risk-to-risk triggers · Centrality features: total, betweenness, closeness
Network 2 · Graph Theory
Team/Equipment Interdependence
Resource proximity + task dependency graph · Centrality features: total, betweenness, closeness
Feature Engineering
Network + Project Feature Matrix
Risk centralities · Team centralities · Project duration · No. risks · Avg. risk magnitude · No. teams · No. equipment · Standard scaling · Correlation filtering · Permutation importance
Model A
Decision Tree
GA-tuned hyperparameters
Model B
Random Forest
GA-tuned hyperparameters
Model C · Best
ANN ✓ Selected
GA-tuned · R² 0.97 train & test
Prediction Output
Predicted KPI Degradation
Cost deviation, schedule deviation, and other KPIs under combined risk interactions + systemic risk effects
NSGA-II · Multi-Objective Optimization
Non-Dominated Sorting Genetic Algorithm II
Integrates ANN predictions · Optimizes project KPIs under practical constraints · Generates Pareto-optimal solutions
Output 1
Optimized Schedule
Reordered activities that reduce systemic risk exposure while meeting operational constraints
Output 2
Optimized Risk Register
Revised risk response plan that improves KPIs — cost, schedule, quality, safety, productivity
Data Inputs
Risk Register & Project Schedule
The system ingests two structured project documents — the risk register, which captures risk identities, magnitudes, and interaction logic, and the project schedule, which defines activities, durations, task dependencies, and resource assignments. These two inputs are the foundation for constructing the directed network representations that drive the feature engineering layer.
Phase 1 · Step 1
Network Construction
The risk register is converted into a directed Risk Interactions Network (RIN), where nodes represent risks and edges capture how one risk can trigger or amplify another. The project schedule is converted into a Team/Equipment Interdependence Network, derived from task dependencies and the spatio-temporal proximity of resources. Both networks are the structural basis for capturing how local risk events propagate into systemic project disruptions.
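As a rough illustration of the conversion described above, the sketch below builds a directed risk network from a toy register and lists the risks a single event could cascade into. The risk names, magnitudes, and helper functions are hypothetical, and a plain adjacency dictionary stands in for the graph library a real implementation would use.

```python
from collections import defaultdict

# Hypothetical register rows: (risk_id, magnitude, ids of the risks it triggers).
RISK_REGISTER = [
    ("R1_design_change", 0.6, ["R2_procurement_delay", "R3_rework"]),
    ("R2_procurement_delay", 0.8, ["R4_schedule_slip"]),
    ("R3_rework", 0.5, ["R4_schedule_slip"]),
    ("R4_schedule_slip", 0.9, []),
]

def build_risk_network(register):
    """Return (successors, magnitudes) for a directed risk interactions network."""
    successors = defaultdict(set)
    magnitudes = {}
    for risk_id, magnitude, triggers in register:
        magnitudes[risk_id] = magnitude
        successors[risk_id]  # make sure isolated risks still appear as nodes
        for target in triggers:
            successors[risk_id].add(target)  # edge: risk_id triggers target
            successors[target]
    return dict(successors), magnitudes

def downstream(successors, start):
    """All risks reachable from `start`, i.e. its potential cascade."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in successors[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

network, magnitudes = build_risk_network(RISK_REGISTER)
cascade = downstream(network, "R1_design_change")
```

Here `cascade` contains the three risks that a design change can reach through one or more trigger edges. That reachability is the kind of structural information a flat register does not expose.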
Phase 1 · Step 2
Network Feature Engineering
Centrality measures — total, betweenness, and closeness — are computed for both networks, quantifying each risk's and each team's structural influence on the system. These network features are combined with project-level attributes (duration, number of risks, average risk magnitude, number of teams and equipment) to form the ML feature matrix. Standard scaling, correlation filtering, and permutation importance analysis ensure feature quality before model training.
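A minimal sketch of this step, under stated assumptions: it computes total degree and a simple outward closeness for each node of a toy directed network, then standard-scales the two columns. The graph, the exact closeness definition, and all function names are illustrative assumptions. Betweenness is omitted for brevity; a library routine such as NetworkX's `betweenness_centrality` would normally supply it.

```python
from collections import deque
from statistics import mean, pstdev

# Hypothetical directed risk network: risk -> risks it triggers.
EDGES = {
    "R1": ["R2"],
    "R2": ["R3", "R4"],
    "R3": ["R4"],
    "R4": [],
}

def total_degree(edges):
    """In-degree plus out-degree for every node."""
    degree = {node: len(targets) for node, targets in edges.items()}
    for targets in edges.values():
        for target in targets:
            degree[target] += 1
    return degree

def out_closeness(edges, source):
    """Number of reachable nodes divided by the summed BFS distances to them."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in edges[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def standard_scale(values):
    """Z-score scaling; a constant column maps to zeros."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]

nodes = sorted(EDGES)
degree = total_degree(EDGES)
closeness = {n: out_closeness(EDGES, n) for n in nodes}

# One row per risk: (scaled total degree, scaled closeness).
feature_matrix = list(zip(
    standard_scale([degree[n] for n in nodes]),
    standard_scale([closeness[n] for n in nodes]),
))
```

In the full pipeline these per-node values would be aggregated and joined with the project-level attributes before correlation filtering and permutation importance are applied.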
Phase 1 · Step 3
GA-Tuned Model Training & Selection
Three regression models (Decision Tree, Random Forest, and ANN) are trained and compared. Rather than conventional grid search, hyperparameters for all three models are tuned using a Genetic Algorithm, which evaluates far fewer configurations than an exhaustive grid and so reaches high-performing configurations with lower computational overhead. K-fold and nested cross-validation confirm generalization performance. The ANN, achieving R² of 0.97 on both training and test sets, is selected as the prediction engine for Phase 2.
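The evolutionary search can be sketched roughly as follows. This is not the project's tuning code: the search space, the GA settings, and especially the `fitness` function are assumptions made for illustration. In a real run, `fitness` would train the model with the given configuration and return its cross-validated R².

```python
import random

# Hypothetical search space; the actual hyperparameters are not listed in the text.
SEARCH_SPACE = {
    "hidden_units": [8, 16, 32, 64, 128],
    "learning_rate": [0.0001, 0.001, 0.01, 0.1],
    "dropout": [0.0, 0.1, 0.2, 0.3],
}

def fitness(config):
    """Stand-in for a cross-validated score (higher is better)."""
    return -(
        abs(config["hidden_units"] - 32) / 128
        + abs(config["learning_rate"] - 0.01)
        + abs(config["dropout"] - 0.1)
    )

def random_config(rng):
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}

def crossover(a, b, rng):
    # Uniform crossover: each gene is copied from one of the two parents.
    return {name: rng.choice([a[name], b[name]]) for name in SEARCH_SPACE}

def mutate(config, rng, rate=0.2):
    # Resample each gene from the search space with probability `rate`.
    return {
        name: rng.choice(SEARCH_SPACE[name]) if rng.random() < rate else value
        for name, value in config.items()
    }

def tournament(population, rng, k=3):
    # Pick the fittest of k randomly drawn individuals.
    return max(rng.sample(population, k), key=fitness)

def ga_tune(pop_size=12, generations=15, seed=0):
    rng = random.Random(seed)
    population = [random_config(rng) for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(population, key=fitness)  # elitism: the best survives unchanged
        children = []
        for _ in range(pop_size - 1):
            parent_a = tournament(population, rng)
            parent_b = tournament(population, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = [elite] + children
    return max(population, key=fitness)

best = ga_tune()
```

Because the best individual is carried over each generation, the best score never decreases, but the search is stochastic and is not guaranteed to find the global optimum.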
Phase 2
NSGA-II Multi-Objective Optimization
The selected ANN is integrated with NSGA-II (the Non-Dominated Sorting Genetic Algorithm II) to optimize project KPIs under real operational constraints. NSGA-II explores the solution space for Pareto-optimal trade-offs between competing objectives: cost deviation, schedule deviation, quality, safety, and productivity. The outputs are optimized project schedules and risk registers, each representing a different balance among those KPIs. This is what converts prediction into prescriptive action.
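The core idea behind the Pareto ranking can be illustrated with a simplified non-dominated sort. The candidate scores below are made up, both objectives are assumed to be minimized, and the sketch leaves out the crowding-distance and genetic operators that a full NSGA-II implementation (for example in libraries such as pymoo or DEAP) would add.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` on every objective and better on at least one.

    All objectives are minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_sort(points):
    """Split points into Pareto fronts; front 0 is the non-dominated set."""
    remaining = set(range(len(points)))
    fronts = []
    while remaining:
        front = [
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        ]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts

# Hypothetical candidate schedules scored as (cost deviation, schedule deviation),
# e.g. as predicted by the Phase 1 model.
candidates = [(0.10, 0.30), (0.20, 0.15), (0.25, 0.25), (0.30, 0.10), (0.35, 0.35)]
fronts = non_dominated_sort(candidates)
# fronts == [[0, 1, 3], [2], [4]]
```

Candidates 0, 1, and 3 form the first front: each is better than the others on one objective and worse on the other, so none can be discarded without a preference from the project manager.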
Key Design Decisions
Genetic Algorithm tuning is more compute-efficient than grid search on large hyperparameter spaces
Grid search exhaustively evaluates every hyperparameter combination, which is computationally expensive and impractical for models with large search spaces. The Genetic Algorithm treats hyperparameter optimization as an evolutionary search problem, iteratively selecting and recombining the best-performing configurations across generations. This can produce competitive or superior results in a fraction of the compute time, and is a more realistic approach for production ML systems where retraining budgets are constrained.
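To make the cost argument concrete, the short calculation below compares the number of model trainings for an exhaustive grid against a fixed GA budget. The grid dimensions, population size, and generation count are hypothetical and chosen only for illustration.

```python
from math import prod

# Hypothetical grid: number of candidate values per hyperparameter.
grid_sizes = {
    "hidden_layers": 4,
    "units": 6,
    "learning_rate": 5,
    "batch_size": 4,
    "dropout": 5,
}
# Grid search trains one model per combination.
grid_evaluations = prod(grid_sizes.values())      # 4 * 6 * 5 * 4 * 5 = 2400

# Hypothetical GA budget: each individual is trained once per generation.
population_size, generations = 20, 15
ga_evaluations = population_size * generations    # 300

ratio = grid_evaluations / ga_evaluations         # 8.0
```

Under these assumed numbers the GA trains one eighth as many models. The trade-off is that it samples only part of the space, so it may miss the configuration an exhaustive grid would have found.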
Network features capture system-level risk behavior that tabular features cannot
Standard project risk models use tabular features — probability, impact, cost, duration. These capture individual risk attributes but miss the structural behavior of risk networks: which risks are most connected, which teams are most critical to the project's risk propagation path, and where cascading disruptions are likely to originate. Graph centrality features encode this structural information, giving the ML models inputs that reflect how risk actually behaves in complex project environments.
Predict-then-optimize separates concerns and enables scalable decision support
Combining prediction and optimization into a single model would produce a system that is difficult to update, validate, or explain. By separating Phase 1 (ML prediction of KPI degradation) from Phase 2 (NSGA-II optimization of project controls), each component can be independently validated, retrained, or replaced. The ML model can be updated as new project data arrives without rebuilding the optimization layer, and the optimization can be rerun with different objectives or constraints without retraining the prediction model.
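One way to picture this separation is an optimizer that depends only on a prediction function, never on a specific model. The sketch below is a toy, single-objective stand-in for the Phase 2 search, and the type alias, function names, plans, and coefficients are all hypothetical.

```python
from typing import Callable, Mapping, Sequence

# A predictor maps a candidate plan's feature vector to a predicted KPI degradation.
Predictor = Callable[[Sequence[float]], float]

def pick_best_plan(plans: Mapping[str, Sequence[float]], predict: Predictor) -> str:
    """Return the plan with the lowest predicted degradation.

    The function only calls `predict`, so any model exposing that signature
    can be swapped in without touching the optimization code.
    """
    return min(plans, key=lambda name: predict(plans[name]))

# Two hypothetical, interchangeable predictors (e.g. before and after retraining).
def predictor_v1(features: Sequence[float]) -> float:
    return 0.5 * features[0] + 0.1 * features[1]

def predictor_v2(features: Sequence[float]) -> float:
    return 0.1 * features[0] + 0.5 * features[1]

plans = {"baseline": [0.8, 0.2], "resequenced": [0.3, 0.6]}

choice_v1 = pick_best_plan(plans, predictor_v1)  # "resequenced"
choice_v2 = pick_best_plan(plans, predictor_v2)  # "baseline"
```

Swapping the predictor changes the recommendation without any change to `pick_best_plan`, which is the property the two-phase design relies on.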
Tech Stack
| Technology | Purpose |
|---|---|
| Artificial Neural Network (ANN) | Best-performing regression model for KPI degradation prediction — R² 0.97 |
| Random Forest | Comparative regression model — GA-tuned and evaluated against ANN |
| Decision Tree | Interpretable baseline model — GA-tuned for fair comparison |
| Genetic Algorithm (GA) | Hyperparameter optimization for all three ML models, at lower computational cost than grid search |
| NSGA-II | Multi-objective optimization of project schedules and risk registers |
| Graph Theory / NetworkX | Construction of risk interaction and team interdependence directed networks |
| Centrality Analysis | Total, betweenness, and closeness centrality feature engineering from both networks |
| K-Fold & Nested Cross-Validation | Robust model generalization evaluation across all three ML models |
| Permutation Importance | Feature selection and importance ranking for network and project features |
| Python | Core language and system implementation |
Results & Metrics
What the system delivers
R² 0.97
Model Accuracy
ANN achieves R² of 0.97 on both training and test sets — confirmed by nested cross-validation at 0.969
GA-Tuned
Hyperparameter Optimization
Genetic Algorithm tuning across all three models — outperforms grid search in efficiency and computational cost
NSGA-II
Multi-Objective Optimization
Pareto-optimal schedules and risk registers generated from ANN predictions — turning forecasts into actionable project controls
R² 0.97 — consistent across train, test, and nested cross-validation
The ANN model achieved R² of 0.97 on both the training and test sets, with nested cross-validation confirming a mean R² of 0.969. The close agreement between these figures indicates that the model generalizes beyond the training data rather than overfitting to the project scenarios used during development. This consistency across evaluation methods supports the model's reliability for project risk prediction.
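For reference, R² is the coefficient of determination, computed as one minus the ratio of the residual sum of squares to the total sum of squares. The snippet below shows that calculation on made-up values. The numbers are illustrative only and are not the project's data.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) when every value in y_true is identical.
    """
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total variation
    return 1.0 - ss_res / ss_tot

# Hypothetical KPI deviations and model predictions.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.9, 5.1]
score = r2_score(y_true, y_pred)  # 1 - 0.08 / 10 = 0.992
```

A score of 1.0 means perfect prediction, and 0.0 means the model does no better than always predicting the mean. Reporting the same metric on training data, held-out test data, and nested cross-validation folds is what lets the three figures be compared directly.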
GA tuning delivers competitive accuracy at lower computational cost than grid search
All three models (Decision Tree, Random Forest, and ANN) were tuned using a Genetic Algorithm rather than exhaustive grid search. The GA reached high-performing hyperparameter configurations more efficiently by treating the search as an evolutionary problem, iteratively selecting and recombining top-performing configurations. This is a practically significant result for organizations with constrained retraining budgets or large hyperparameter search spaces.
NSGA-II converts predictions into Pareto-optimal project controls
Phase 2 integrates the ANN with NSGA-II to generate optimized project schedules and risk registers. Rather than producing a single solution, NSGA-II explores the Pareto frontier, the set of best available trade-offs between competing KPI objectives, giving project managers a menu of optimized options rather than a single prescribed answer. Each solution on the frontier is non-dominated: no alternative does better on one of cost deviation, schedule deviation, quality, safety, or productivity without doing worse on another, and every solution respects real operational constraints.
Network features expose cascading risk behavior invisible to tabular models
By encoding risk registers and project schedules as directed networks and computing centrality measures, the feature engineering layer gives the ML models structural information that standard tabular approaches cannot capture — which risks are most connected, which teams are most critical to cascading propagation, and where systemic disruptions are most likely to originate. This graph-theoretic foundation is what enables the system to model project risk as a system-level phenomenon rather than a collection of independent events.
Prescriptive analytics — from passive risk logging to actionable project controls
The system moves infrastructure project teams beyond descriptive risk registers and reactive cost tracking toward prescriptive decision support. Project managers receive not just a prediction of how risks will degrade performance, but optimized schedules and risk response plans they can act on — before cost and schedule problems escalate. This predict-then-optimize architecture is what distinguishes the platform from a standalone ML model and makes it an operational project controls tool.