Applied ML · Materials Science

Material Property Prediction

Benchmarking Bagging, Random Forests, and Boosting across 1,030 laboratory-tested concrete specimens — using tree ensemble methods to predict compressive strength, enabling faster and more accurate structural design decisions.

Methods: Bagging · Random Forests · Boosting
Tech Stack: R · randomForest · gbm · ipred

1,030

Concrete Specimens · 8 Predictors

3

Tree Ensemble Methods Benchmarked

17.14

Best MSE — Boosting (λ=0.01, d=10)

The Problem

Determining concrete compressive strength through laboratory testing is time-intensive and costly — structural engineering needs a reliable predictive model that can estimate strength from mix ingredients and curing age alone

Concrete compressive strength, the capacity of a standard specimen to resist compressive stress before failure, is one of the most critical parameters in reinforced concrete structural design. It determines load-bearing capacity, safety factors, and material specifications across infrastructure, buildings, and civil works.

Traditionally, compressive strength is determined by testing standard specimens in a laboratory at fixed curing ages, typically 28 days after pouring. This process requires physical samples, controlled curing conditions, and lab equipment, making it time-consuming, expensive, and incompatible with the need for rapid strength estimates during the early stages of structural design.

Concrete strength depends on the proportions of seven mix ingredients (cement, blast furnace slag, fly ash, water, superplasticizer, coarse aggregate, and fine aggregate) as well as the age of the specimen at testing. These relationships are non-linear and interact in complex ways that simple parametric models cannot reliably capture across the full range of mix configurations encountered in practice.

The Solution

A three-method tree ensemble benchmark — Bagging, Random Forests, and Boosting — trained on 1,030 laboratory-tested specimens with iterative tree count optimization and variable importance analysis across all methods

The analysis applies three tree ensemble methods to 1,030 concrete specimens split 70/30 into training and testing sets. Each method is iteratively tuned to minimize MSE on the test set. Bagging is optimized at 900 trees, leveraging bootstrap aggregation to reduce variance from individual regression trees. Random Forests are evaluated across three variable-per-split configurations (3, 4, and 5 variables), with the optimal number of trees determined iteratively for each — identifying 5 variables per split and 150 trees as the best Random Forests configuration. Boosting is evaluated across two shrinkage rates (λ=0.01 and λ=0.10) and three interaction depths (d=1, 5, and 10), with 5,000 trees at low shrinkage and fewer at higher shrinkage. Variable importance is analyzed for all three methods — using MSE increase for Bagging and Random Forests, and relative influence for Boosting — providing a cross-method view of which concrete mix ingredients most determine compressive strength.

Key Outcome

Boosting substantially outperforms Bagging and Random Forests — achieving MSE of 17.14 at λ=0.01 and interaction depth d=10 versus 27.57 and 27.78 respectively — with Age and Cement consistently identified as the dominant predictors of compressive strength across all three methods, in alignment with established civil engineering theory and practice.

Technical Deep Dive

Methodology & Analysis

Analytical Workflow

Stage 1 — Data Setup

Dataset

1,030 Specimens · 8 Predictors

No missing data · No outliers · One moderate correlation: Water–Superplasticizer (−0.67)

Split

70% Train / 30% Test

721 training · 309 testing · MSE evaluated on held-out test set throughout

Target

Compressive Strength (MPa)

Range: 2.3–82.6 MPa · Mean: 35.82 MPa · Regression problem

Stage 2 — Three-Method Ensemble Training

Bagging

900 Trees · All 8 Vars/Split

Bootstrap aggregation · Trees grown to full depth · MSE 27.57

Random Forests

5 Vars/Split · 150 Trees

Tested 3, 4 & 5 vars/split · Decorrelated trees · MSE 27.78 at best config

Boosting

λ=0.01 · d=10 · 5,000 Trees

Tested λ=0.01 & 0.10 · d=1, 5, 10 · Sequential residual fitting · MSE 17.14 (best)

Stage 3 — Evaluation & Variable Importance

Performance

MSE Comparison — Boosting Wins

Boosting 17.14 · Bagging 27.57 · RF 27.78 · Boosting 38% lower MSE than Bagging

Variable Importance

Age & Cement Dominant Across All Methods

%IncMSE & IncNodePurity for Bagging/RF · Relative influence for Boosting · Coarse Aggregate & Fly Ash least important

Stage 1

Data Setup & Preparation

The dataset contains 1,030 concrete specimens from laboratory testing, with 8 predictors: the weight of each of seven mix ingredients per cubic meter of concrete (cement, blast furnace slag, fly ash, water, superplasticizer, coarse aggregate, fine aggregate) plus the age of the specimen at testing in days. The response variable is compressive strength in MPa. The dataset is clean, with no missing values and no outliers, and the only notable correlation between predictors is between water and superplasticizer at −0.67, which is moderate and not severe enough to distort variable importance interpretation. The dataset is split 70/30 into training (721 specimens) and testing (309 specimens) sets. The test set is never used to fit a model; it is used only to compute MSE, both when comparing tree counts and when reporting final results.
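The 70/30 split described above can be sketched as follows. The project itself is implemented in R; this is a minimal Python illustration, and the function name and the seed value are assumptions made for the example, not details from the original analysis.

```python
import random


def train_test_split_indices(n_rows, train_fraction=0.7, seed=1):
    """Return (train_idx, test_idx) for a random split of row indices."""
    indices = list(range(n_rows))
    # A fixed seed (hypothetical value) keeps the split reproducible.
    random.Random(seed).shuffle(indices)
    n_train = round(n_rows * train_fraction)
    return sorted(indices[:n_train]), sorted(indices[n_train:])


train_idx, test_idx = train_test_split_indices(1030)
print(len(train_idx), len(test_idx))  # 721 309
```

Fixing the seed matters here because every method is compared on the same 309 held-out specimens.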

Stage 2

Three-Method Ensemble Training

Bagging is trained with 900 trees using all 8 predictors at each split. Bootstrap aggregation reduces variance, but because every split can choose the strongest predictor (Age), the top splits of most trees look alike and the trees are correlated. Random Forests decorrelate the trees by randomly selecting m predictors at each split; configurations of m=3, 4, and 5 are evaluated with iteratively tuned tree counts (450, 300, and 150 respectively), with m=5 and 150 trees yielding the best Random Forests MSE of 27.78. Boosting grows trees sequentially on residuals, with each tree fitted to what the current ensemble gets wrong; two shrinkage rates (λ=0.01 and 0.10) and three interaction depths (d=1, 5, 10) are evaluated, with λ=0.01 and d=10 achieving MSE of 17.14 using 5,000 trees.
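To make the "sequential residual fitting" idea concrete, here is a minimal sketch of boosting with stumps (the d=1 case) and a shrinkage factor. It is not the gbm implementation used in the project: the data are synthetic, there is a single predictor, and all function names are illustrative assumptions.

```python
import math
import random


def fit_stump(x, residuals):
    """Return (threshold, left_mean, right_mean) for the best single split on x."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])
    total = sum(residuals)
    left_sum = 0.0
    best_gain, best_split = -1.0, None
    for k in range(1, n):
        left_sum += residuals[order[k - 1]]
        if x[order[k - 1]] == x[order[k]]:
            continue  # cannot split between identical x values
        right_sum = total - left_sum
        # A larger gain means a smaller squared error after the split.
        gain = left_sum ** 2 / k + right_sum ** 2 / (n - k)
        if gain > best_gain:
            threshold = (x[order[k - 1]] + x[order[k]]) / 2
            best_gain = gain
            best_split = (threshold, left_sum / k, right_sum / (n - k))
    return best_split


def boost(x, y, n_trees, shrinkage):
    """Fit stumps one after another, each on the residuals of the ensemble so far."""
    base = sum(y) / len(y)
    fitted = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        residuals = [yi - fi for yi, fi in zip(y, fitted)]
        threshold, left_mean, right_mean = fit_stump(x, residuals)
        stumps.append((threshold, left_mean, right_mean))
        # Only a fraction (the shrinkage) of each stump's correction is applied.
        fitted = [fi + shrinkage * (left_mean if xi <= threshold else right_mean)
                  for xi, fi in zip(x, fitted)]
    return base, stumps


def predict(model, xi, shrinkage):
    base, stumps = model
    return base + shrinkage * sum(l if xi <= t else r for t, l, r in stumps)


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Synthetic one-predictor data, an assumption for illustration only.
rng = random.Random(0)
x = [rng.uniform(1, 365) for _ in range(200)]
y = [10 * math.log(xi) + rng.gauss(0, 2) for xi in x]

baseline_mse = mse(y, [sum(y) / len(y)] * len(y))
model = boost(x, y, n_trees=200, shrinkage=0.1)
boosted_mse = mse(y, [predict(model, xi, 0.1) for xi in x])
print(baseline_mse, boosted_mse)
```

The printed values are training errors on the synthetic data, so they only show the mechanism: each stump removes part of what the previous ones left unexplained. They say nothing about the test-set figures reported for the concrete dataset.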

Stage 3

Evaluation & Variable Importance

Models are compared on test-set MSE. Variable importance is computed for Bagging and Random Forests using two measures: the increase in out-of-bag MSE when each variable's values are permuted (%IncMSE), and the total decrease in node impurity, measured by residual sum of squares, from splits on each variable (IncNodePurity). For Boosting, relative influence is used. Across all three methods, Age and Cement are consistently the two most important predictors, with Age substantially more important than Cement and Cement substantially more important than the third-ranked predictor. Superplasticizer and Water form a second tier. Coarse Aggregate and Fly Ash are consistently the least important variables. The variable importance rankings align with established civil engineering knowledge about the drivers of concrete compressive strength.
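The permutation idea behind %IncMSE can be sketched in a few lines. This is a simplified illustration, not the randomForest procedure (which works tree by tree on out-of-bag samples and scales the result): it shuffles one column once on a single evaluation set. The toy model, data, and function names are hypothetical.

```python
import random


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y, column, seed=0):
    """Increase in MSE after shuffling the values of one predictor column."""
    baseline = mse(y, [predict(row) for row in rows])
    shuffled = [row[column] for row in rows]
    random.Random(seed).shuffle(shuffled)
    # Rebuild each row with the shuffled value in place of the original one.
    permuted_rows = [row[:column] + [value] + row[column + 1:]
                     for row, value in zip(rows, shuffled)]
    return mse(y, [predict(row) for row in permuted_rows]) - baseline


# Toy setup (hypothetical): the "model" uses column 0 and ignores column 1.
rng = random.Random(1)
rows = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(100)]


def toy_model(row):
    return 3.0 * row[0]


y = [toy_model(row) for row in rows]

importance = [permutation_importance(toy_model, rows, y, c) for c in range(2)]
print(importance)
```

Shuffling the ignored column leaves the error unchanged, while shuffling the column the model relies on raises it. That contrast is what the importance ranking measures.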

Key Methodological Choices

Random Forests versus Bagging: decorrelation limits strong-variable dominance

When all 8 predictors are available at every split, as in Bagging, Age dominates the topmost split in nearly every bootstrapped tree, making the trees highly correlated with each other. Averaging correlated predictions reduces variance only modestly. Random Forests address this by restricting each split to a random subset of m predictors, which prevents Age from dominating every split and gives other informative predictors (Cement, Superplasticizer) a chance to drive tree structure. In principle this yields a more diverse ensemble whose averaged predictions are more reliable. On this dataset, however, the best Random Forests configuration does not improve on Bagging: its test MSE is marginally higher (27.78 vs 27.57). The best m was also the largest value tested (m=5), which suggests that Age's dominance is so strong that restricting access to it costs about as much accuracy as decorrelation gains.
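The limited benefit of averaging correlated trees follows from a standard variance identity. As a sketch, assume B identically distributed tree predictions T_b(x), each with variance σ² and positive pairwise correlation ρ. These symbols are introduced here for illustration and do not appear in the original analysis.

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \rho\,\sigma^{2} + \frac{1-\rho}{B}\,\sigma^{2}
```

Adding trees shrinks only the second term. The first term, ρσ², stays fixed however large B becomes, which is why lowering ρ through random predictor subsets can help where adding more bagged trees cannot.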

Boosting's interaction depth d=10 — deep trees capture complex ingredient interactions

Interaction depth d sets the number of splits per tree in Boosting: d=1 produces stumps (a single split, no interactions), while d=10 allows each tree to chain up to ten splits and so capture interactions among several predictors at once. At d=1, MSE is 29.02 (λ=0.01), slightly worse than Bagging's 27.57. At d=10, MSE drops to 17.14, indicating that the relationship between concrete ingredients and compressive strength involves non-trivial interaction effects that shallow trees cannot capture. Concrete strength is determined by the combined proportions of ingredients, not by any single ingredient in isolation, so deep trees that model joint effects between cement, water, and age are essential for achieving the model's best performance.

Tree count optimized iteratively — not fixed a priori for any method

Rather than using a fixed tree count for all methods, the number of trees is determined iteratively for each model configuration by tracking test-set MSE as tree count increases. In Boosting, adding trees keeps lowering training error but can eventually raise test-set error once the extra trees begin to overfit the residuals. Iterative tuning ensures each configuration is evaluated at its optimal tree count (900 for Bagging, 150 for Random Forests at m=5, and 5,000 for Boosting at λ=0.01), making MSE comparisons across methods fair. This approach also reveals that lower shrinkage (λ=0.01) requires more trees than higher shrinkage (λ=0.10) to reach a comparable MSE, a known trade-off in gradient boosting.

Tech Stack

| Technology | Purpose |
| --- | --- |
| R | Statistical modeling environment and primary implementation language |
| randomForest (R) | Bagging and Random Forests implementation with OOB error and variable importance (%IncMSE, IncNodePurity) |
| gbm (R) | Gradient Boosting Machine: configurable shrinkage, interaction depth, and tree count with relative influence output |
| Bagging | Bootstrap aggregation of full-depth regression trees: variance reduction baseline using all 8 predictors at each split |
| Random Forests | Decorrelated bootstrap ensemble: random feature subset (m=3, 4, 5) at each split to reduce tree correlation |
| Gradient Boosting | Sequential residual-fitting ensemble: shrinkage λ and interaction depth d tuned across 6 configurations |

Results & Metrics

What the analysis reveals

17.14

Best MSE — Boosting

At λ=0.01, d=10, 5,000 trees — 38% lower MSE than the best Bagging configuration

38%

Boosting MSE Advantage

Boosting (17.14) versus Bagging (27.57); the gap to Random Forests (27.78) is about the same size, while Bagging and Random Forests differ by only 0.21

2

Dominant Predictors

Age and Cement — consistently the top two predictors across Bagging, Random Forests, and Boosting

🏆

Boosting substantially outperforms both averaging-based methods

Boosting at its optimal configuration (λ=0.01, d=10, 5,000 trees) achieves MSE of 17.14, about 38% lower than both Bagging's 27.57 and Random Forests' best of 27.78. This gap is meaningful for structural design: in root-mean-square terms, the typical prediction error falls from roughly 5.3 MPa to roughly 4.1 MPa on a target ranging from 2.3 to 82.6 MPa. Boosting's advantage stems from its sequential residual-fitting mechanism: each new tree targets the prediction errors that the ensemble so far leaves unresolved, particularly in the complex interaction space between curing age, cement content, and water-to-binder ratio.
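The percentages quoted above can be checked directly from the three reported test-set MSE values. The short sketch below also converts each MSE to an RMSE, which is in MPa and therefore easier to compare with the range of the target; the helper names are illustrative.

```python
import math

# Test-set MSE values as reported in this write-up.
reported_mse = {"Boosting": 17.14, "Bagging": 27.57, "Random Forests": 27.78}


def relative_reduction(reference, improved):
    """Percentage by which `improved` is lower than `reference`."""
    return 100 * (reference - improved) / reference


def rmse(mse_value):
    """Root-mean-square error, in the units of the target (MPa here)."""
    return math.sqrt(mse_value)


best = reported_mse["Boosting"]
for method, value in reported_mse.items():
    print(f"{method:15s} MSE={value:5.2f}  RMSE={rmse(value):4.2f} MPa  "
          f"Boosting lower by {relative_reduction(value, best):4.1f}%")
```

Run as written, this gives reductions of 37.8% against Bagging and 38.3% against Random Forests, with RMSE values of about 4.14, 5.25, and 5.27 MPa for Boosting, Bagging, and Random Forests respectively.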

Age is the strongest predictor — curing time determines strength more than any ingredient

Age is the most important predictor in all three methods, and by a wide margin in Bagging and Random Forests: permuting Age's values gives a %IncMSE of over 200 in the Bagging model, against around 100 in Random Forests. The difference in sensitivity reflects Random Forests' decorrelation: because Age is not always available at the topmost split, other predictors take over part of its contribution and the model depends on it less. The result is consistent with concrete engineering theory: hydration of cement compounds continues for months after pouring, and curing time is the single largest driver of compressive strength development under standard conditions.

🧱

Cement is the second most important ingredient — primary binder drives strength

Cement ranks second in variable importance across all three methods. In Bagging and Random Forests, the gap between Cement and the third-ranked predictor (Blast Furnace Slag or Superplasticizer) is substantial — Cement is much more important than any other ingredient. In Boosting, Age and Cement have comparable relative influence, with neither dominating the other as starkly as in the averaging-based methods. This aligns with materials science: cement is the primary binding agent whose hydration products form the strength-bearing matrix of concrete, making its content the most direct mix-design lever for targeting a given compressive strength.

📉

Interaction depth d=10 critical for Boosting — concrete strength involves complex joint effects

At d=1 (stumps), Boosting achieves MSE of 29.02 (λ=0.01), slightly worse than Bagging's 27.57. At d=5, MSE falls to 18.53. At d=10, MSE reaches 17.14. The consistent improvement with interaction depth indicates that compressive strength is determined by the joint configuration of ingredients, not by their individual contributions in isolation. The water-to-cement ratio, the balance of supplementary cementitious materials, and the aggregate-to-paste volume all involve interactions between multiple predictors, effects that deep trees can model and that stumps, by construction, cannot.

Variable importance rankings validate against civil engineering theory

The consistency of variable importance rankings across all three methods, and their alignment with established concrete engineering knowledge, provides a strong validation signal. Age and Cement are theoretically the primary strength drivers; Superplasticizer and Water influence the water-to-binder ratio and workability; Blast Furnace Slag and Fly Ash are supplementary cementitious materials with secondary strength contribution; Coarse and Fine Aggregate primarily affect volume and workability rather than strength directly. The models' rankings reproduce this theoretical hierarchy without any domain-specific engineering constraints being imposed, which suggests that the data-driven learning process is capturing genuine physical relationships rather than dataset artifacts.