Applied ML · Engineering & Materials

Fuel Economy Prediction

Comparative analysis of Lasso regression and Multivariate Adaptive Regression Splines in a multicollinear dataset where MARS's piecewise non-parametric approach substantially outperforms L1-penalized linear regression.

Methods: Lasso Regression · MARS
Tech Stack: R · glmnet · earth · 10-Fold Cross-Validation
Source Code: available on GitHub

392

Vehicles · 7 Predictors · Target: MPG

7.35

Best MSE — MARS (deg=2, nk=20)

35%

Lower MSE for MARS Than Lasso

The Problem

Predicting vehicle fuel economy from engine and design characteristics is complicated by strong multicollinearity between predictors. Deciding whether variable interactions meaningfully influence predictions also demands a method that can both handle collinearity and test interactions explicitly.

Vehicle fuel economy (measured in miles per gallon) is determined by a cluster of interrelated design parameters: cylinder count, engine displacement, horsepower, weight, acceleration, model year, and country of origin. Many of these predictors are not independent. Cylinders, displacement, horsepower, and weight are all highly intercorrelated (correlations above 0.75). This multicollinearity structure inflates coefficient variance in ordinary least-squares regression and can produce misleading variable importance estimates. Vehicle performance parameters also interact: a heavy vehicle with high horsepower behaves differently from a light vehicle with the same horsepower, and the fuel economy penalty for weight likely depends on model year as manufacturing technology improves. Standard linear regression, and Lasso (which adds an L1 penalty to it), cannot capture these interaction effects unless product terms are specified by hand. The analytic challenge is therefore twofold: selecting the most informative predictors from a correlated feature set, and determining whether variable interactions are present and materially influence fuel economy predictions.
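The coefficient-variance inflation described above is usually quantified with the variance inflation factor, VIF_j = 1 / (1 - R_j^2). Here R_j^2 comes from regressing predictor j on the remaining predictors, and the sampling variance of an ordinary least-squares coefficient is multiplied by this factor relative to the uncorrelated case. The sketch below covers the simplest case, a single companion predictor. It is written in Python only as a self-contained illustration (the project itself is implemented in R), and the correlations are illustrative, not computed from the dataset.

```python
def vif_two_predictors(r: float) -> float:
    """Variance inflation factor for a predictor whose only companion has correlation r.

    With one other predictor, R_j^2 equals r^2, so VIF = 1 / (1 - r^2).
    With several correlated predictors, R_j^2 comes from a multiple regression
    on all of them and can only be larger, so this value is a lower bound.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)


# Illustrative correlations only; 0.75 is the threshold quoted in the text.
for r in (0.0, 0.75, 0.9, 0.95):
    print(f"r = {r:.2f} -> VIF = {vif_two_predictors(r):.2f}")
```

At the 0.75 threshold, a single companion predictor already inflates the coefficient variance by a factor of about 2.3. For a predictor with three highly correlated companions, as in the cylinders, disp, horsepower, and weight cluster, the inflation is at least as large.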

The Solution

A direct benchmark of Lasso and MARS, each with parameters selected by 10-fold cross-validation, comparing variable selection, interaction detection, and predictive MSE to identify which approach better characterizes the drivers of vehicle fuel economy.

Lasso regression is fitted using the glmnet package with λ selected by 10-fold cross-validation across 12 values from 0.0001 to 5.0, with the optimal λ=0.001 achieving MSE of 11.391 while retaining all 7 predictors. MARS is fitted using the earth package, with cross-validation over degree (1, 2, 3) and nk (10 to 70) — testing no interactions, pairwise interactions, and three-way interactions. The optimal MARS configuration is degree=2 and nk=20, achieving MSE of 7.350 using 6 of 7 predictors with pairwise interaction terms. Both methods are compared on MSE, number of retained variables, variable importance rankings, and ability to detect and quantify interaction effects. The analysis also includes degree=1 MARS (no interactions) as an intermediate comparison point — allowing the contribution of interactions to be isolated from the algorithmic difference between the two methods.
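Both methods share the same tuning scaffolding: split the rows into 10 folds, score each candidate setting by its out-of-fold MSE, and keep the best. The Python sketch below shows that scaffolding under a few assumptions:

- The nk grid is assumed to be 10, 20, ..., 70. Seven values across three degrees gives the 21 MARS configurations reported below.
- `fit_predict` is a hypothetical callable signature, and `mean_model` is a placeholder baseline.
- The actual fitting in the project is done in R with glmnet and earth, which this sketch does not reproduce.

```python
import random
from itertools import product
from typing import Callable, List, Sequence

# Hypothetical signature: fit_predict(X_train, y_train, X_test) -> predictions for X_test.
FitPredict = Callable[[Sequence, Sequence[float], Sequence], List[float]]


def kfold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle row indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cv_mse(fit_predict: FitPredict, X: Sequence, y: Sequence[float],
           k: int = 10, seed: int = 0) -> float:
    """Out-of-fold MSE, pooled over every held-out observation."""
    sq_err, count = 0.0, 0
    for test in kfold_indices(len(y), k, seed):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        sq_err += sum((y[i] - p) ** 2 for i, p in zip(test, preds))
        count += len(test)
    return sq_err / count


def mean_model(X_train, y_train, X_test):
    """Placeholder baseline that ignores the predictors and predicts the training mean."""
    m = sum(y_train) / len(y_train)
    return [m] * len(X_test)


# Assumed MARS grid: degree in {1, 2, 3} x nk in {10, 20, ..., 70} = 21 configurations.
MARS_GRID = list(product((1, 2, 3), range(10, 71, 10)))
```

In use, each (degree, nk) pair in `MARS_GRID`, or each λ value, would be wrapped in a `fit_predict` callable, and the setting with the smallest `cv_mse` kept. Reusing one seed means every candidate is scored on identical folds, so differences in `cv_mse` reflect the settings rather than the random split.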

Key Outcome

MARS substantially outperforms Lasso, with an MSE of 7.35 versus 11.39 (35% lower). Weight and model_year emerge as the dominant predictors across all MARS configurations. Variable interactions exist within the dataset but are weak: the MSE improvement from degree=1 to degree=2 MARS is modest. This suggests that the performance gap between MARS and Lasso is driven primarily by MARS's non-parametric piecewise flexibility rather than by interaction modeling.

Technical Deep Dive

Methodology & Analysis

Analytical Workflow

Stage 1 — Data Preparation & Multicollinearity Diagnosis

Dataset

392 Vehicles · 7 Predictors

398 original · 6 removed (missing horsepower) · Target: mpg (miles per gallon)

Multicollinearity

High Correlations Detected

cylinders, disp, horsepower & weight all correlated >0.75 · Motivates regularization & non-parametric approach

Tuning Strategy

10-Fold Cross-Validation

Applied to both methods · λ for Lasso · degree & nk for MARS

Stage 2 — Lasso Regression · glmnet

Optimal Configuration

λ = 0.001 · MSE 11.391 · 7 Variables

All 7 predictors retained · No variable selection at this λ · 12 λ values tested (0.0001–5.0)

Parsimonious Alternative

λ = 0.5 · MSE 11.824 · 4 Variables

Retains horsepower, weight, model_year, origin · Removes cylinders, disp, acc · Slight MSE cost

Stage 3 — MARS · earth · 21 Configurations (degree × nk)

No Interactions

Degree=1 · nk=30 · MSE 7.816

6 of 7 variables · origin unused · weight & model_year top predictors

Pairwise Interactions

Degree=2 · nk=20 · MSE 7.350

6 of 7 variables · disp unused · Interactions: weight×model_year, horsepower×model_year, acc×model_year, acc×origin

Three-Way Interactions

Degree=3 · nk=10 · MSE 7.696

Worse than degree=2 · Higher-order interactions overfit · MSE degrades with increasing nk

Stage 4 — Comparative Evaluation & Variable Importance

Conclusion

MARS Best (MSE 7.35) · 35% Lower MSE Than Lasso (MSE 11.39) · weight & model_year Dominant

Interactions present but weak · disp consistently least important in MARS · Lasso gives no reliable importance ranking under collinearity

Stage 1

Data Preparation & Multicollinearity Diagnosis

The Auto MPG dataset contains 398 vehicle records. Six instances with missing horsepower values are removed, leaving 392 complete observations. The 7 predictors span engine characteristics (cylinders, displacement, horsepower), physical attributes (weight, acceleration), and contextual variables (model year, origin). The name variable is excluded as non-predictive. Correlation analysis reveals strong multicollinearity: cylinders, displacement, horsepower, and weight are all intercorrelated above 0.75. They form a cluster of engine power proxies that can distort ordinary least-squares coefficients. This collinearity structure motivates two choices: regularization (Lasso), and a non-parametric alternative (MARS) that handles correlated predictors through its forward-backward selection procedure rather than through global coefficient shrinkage alone.
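The two preparation steps described above are dropping incomplete rows and screening predictor pairs for correlations above 0.75. They can be sketched in Python as follows. Assumptions: rows are dictionaries, missing horsepower arrives as `None` or the string `"?"`, and the toy columns are made up rather than taken from the Auto MPG data. The helper names are hypothetical; the project performs these steps in R.

```python
from itertools import combinations
from math import sqrt


def drop_missing_horsepower(rows):
    """Keep only rows with a usable horsepower value.

    Assumption: missing values arrive as None or as the string "?".
    """
    return [r for r in rows if r.get("horsepower") not in (None, "?")]


def pearson(a, b):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def high_correlation_pairs(columns, threshold=0.75):
    """Return (name_a, name_b, r) for every predictor pair with |r| above threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Made-up toy columns, not values from the Auto MPG data.
toy = {
    "weight": [2000, 2500, 3000, 3500, 4000],
    "disp": [100, 140, 200, 260, 330],
    "acc": [15.0, 17.5, 14.0, 16.5, 15.5],
}
print(high_correlation_pairs(toy))
```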

Stage 2

Lasso Regression

Lasso is fitted using glmnet with 10-fold CV across 12 λ values from 0.0001 to 5.0. At the optimal λ=0.001, all 7 predictors are retained with MSE 11.391: the L1 penalty is small enough that no coefficient reaches zero. In the coefficient path plot, t is the L1-norm budget on the coefficients. As t decreases (equivalently, as λ increases), model_year and origin are the last coefficients to shrink to zero, identifying them as the most robustly retained predictors. At λ=0.5, three predictors (cylinders, disp, acc) are zeroed out. The remaining predictors are horsepower, weight, model_year, and origin, with a slightly higher MSE of 11.824. Because the model contains main effects only, Lasso cannot represent variable interactions here. Its view of variable importance is therefore limited to individual coefficient magnitudes rather than interaction-adjusted rankings.
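glmnet minimizes (1/(2n))·RSS + λ·‖β‖₁ for the Gaussian Lasso, using cyclical coordinate descent in which each update is a soft-threshold. It standardizes predictors by default. The Python sketch below implements that core update. It is much simpler than glmnet: no intercept, no warm starts along a λ path, no convergence check. It assumes the columns are already standardized (mean 0, mean square 1) and y is centered. The toy design is made up so that the effect of λ is easy to see.

```python
def soft_threshold(z: float, lam: float) -> float:
    """S(z, lam) = sign(z) * max(|z| - lam, 0), the lasso shrinkage operator."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    Assumes each column of X has mean 0 and mean square 1, and y has mean 0,
    so no intercept is fitted and each coordinate update is one soft-threshold.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta for the starting beta of all zeros
    for _ in range(n_iter):
        for j in range(p):
            # correlation of column j with the partial residual that excludes feature j
            rho = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new = soft_threshold(rho, lam)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Toy standardized, orthogonal design: y = 2.0 * x1 + 0.5 * x2 exactly.
X_toy = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
y_toy = [2.5, 1.5, -1.5, -2.5]
for lam in (0.001, 1.0, 5.0):
    print(lam, [round(b, 3) for b in lasso_cd(X_toy, y_toy, lam)])
```

Because this design is orthogonal, each coefficient is simply its least-squares value soft-thresholded by λ:

- At λ = 0.001, both coefficients survive, each shrunk by 0.001, mirroring "all 7 retained" at the optimal λ.
- At λ = 1.0, the weaker coefficient is removed.
- At λ = 5.0, both coefficients are removed.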

Stage 3

MARS — Forward Selection & Backward Deletion

MARS builds a model of piecewise linear hinge functions via forward selection — iteratively adding basis function pairs that most reduce RSS — then backward deletion removes redundant terms using GCV to avoid overfitting. Cross-validation over degree (1, 2, 3) and nk (10 to 70) selects degree=2 and nk=20 as optimal. At degree=2, MARS fits interaction terms including weight×model_year, horsepower×model_year, acc×model_year, and acc×origin — capturing how the fuel economy effect of weight, horsepower, and acceleration changes with vehicle era and market origin. Degree=3 performs worse (MSE deteriorates with nk), confirming that three-way interactions overfit this dataset. Disp is unused across degree=2 configurations despite being a primary power driver, displaced by its highly correlated counterparts.
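The forward pass can be illustrated with a single step for a single predictor, starting from an intercept-only model. Try each interior observed value as a knot, fit the reflected hinge pair max(0, x - c) and max(0, c - x) by least squares, and keep the knot with the lowest RSS. The Python sketch below shows that step. earth does considerably more: it searches every predictor and every eligible parent term, uses fast updating formulas, and then prunes with GCV. None of that is reproduced here, and the toy data are made up.

```python
def hinge_pair(x, knot):
    """Reflected pair of MARS hinge functions: max(0, x - knot) and max(0, knot - x)."""
    return max(0.0, x - knot), max(0.0, knot - x)


def solve(A, b):
    """Solve the small dense system A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def fit_hinge_pair(xs, ys, knot):
    """Least-squares fit of y ~ 1 + max(0, x - knot) + max(0, knot - x); returns (coefs, RSS)."""
    rows = [(1.0, *hinge_pair(x, knot)) for x in xs]
    A = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    b = [sum(row[i] * y for row, y in zip(rows, ys)) for i in range(3)]
    coefs = solve(A, b)
    rss = sum((y - sum(c * v for c, v in zip(coefs, row))) ** 2 for row, y in zip(rows, ys))
    return coefs, rss


def best_knot(xs, ys):
    """One forward step for a single predictor: try each interior observed value as a knot."""
    candidates = sorted(set(xs))[1:-1]  # end points would make one hinge identically zero
    fits = [(knot, *fit_hinge_pair(xs, ys, knot)) for knot in candidates]
    return min(fits, key=lambda fit: fit[2])  # (knot, coefs, rss) with the lowest RSS


# Toy data with a slope change at x = 5 (not fuel-economy data).
xs = [float(x) for x in range(11)]
ys = [5.0 + 2.0 * max(0.0, x - 5.0) + 1.0 * max(0.0, 5.0 - x) for x in xs]
knot, coefs, rss = best_knot(xs, ys)
print(knot, [round(c, 3) for c in coefs], round(rss, 6))
```

The two hinge coefficients give different slopes on either side of the knot. This is the piecewise flexibility a single global Lasso coefficient lacks.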

Stage 4

Variable Importance & Comparative Evaluation

MARS provides quantitative variable importance through GCV- and RSS-based criteria computed over the nested subsets of basis functions generated during the backward pass. At degree=2 and nk=20, the scores are:

- weight 100.0 (most important)
- model_year 52.2
- acc 24.3
- origin 15.7
- cylinders 11.8
- horsepower 8.9
- disp 0.0 (unused)

Compared with the no-interaction model, the interaction-adjusted ranking moves acc ahead of horsepower. This suggests that acceleration's interactions with model_year and origin amplify its contribution. Lasso produces coefficient magnitudes but no reliable importance ranking under multicollinearity. MARS is therefore the more informative method for understanding which predictors drive fuel economy and in what order.
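The earth documentation for `evimp` describes these criteria roughly as follows, based on our reading:

1. For each subset in the backward-pass sequence, compute the decrease in RSS (or GCV) relative to the previous, smaller subset.
2. Credit that decrease to every variable appearing in the subset.
3. Scale the totals so the largest is 100.

The Python sketch below follows that description. Treat it as an approximation of earth's behavior, not its implementation. The path and criterion values are invented for illustration.

```python
def subset_importance(path, variables):
    """Importance in the style of earth's evimp 'rss'/'gcv' criteria (approximation).

    path: list of (set_of_variables_in_subset, criterion_value) ordered by subset size,
          starting with the intercept-only subset (an empty set).
    Each step's decrease in the criterion is credited to every variable in the larger
    subset; totals are then scaled so the largest is 100.
    """
    totals = {v: 0.0 for v in variables}
    for (_, prev_crit), (vars_now, crit) in zip(path, path[1:]):
        decrease = prev_crit - crit
        for v in vars_now:
            totals[v] += decrease
    top = max(totals.values())
    if top <= 0:
        raise ValueError("no variable reduced the criterion")
    return {v: 100.0 * t / top for v, t in totals.items()}


# Invented criterion values for a four-subset path (not results from this analysis).
toy_path = [
    (set(), 100.0),
    ({"weight"}, 40.0),
    ({"weight", "model_year"}, 25.0),
    ({"weight", "model_year", "acc"}, 20.0),
]
print(subset_importance(toy_path, ["weight", "model_year", "acc", "disp"]))
```

A variable that never enters a retained subset, like `disp` in the toy path, scores 0. This matches how an unused predictor shows up in the ranking above.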

Key Methodological Choices

Degree=2 MARS over degree=3: pairwise interactions are optimal, three-way interactions overfit

Cross-validation over all three degree values makes the interaction order a data-driven decision rather than a modeling assumption. Degree=3 achieves an MSE of 7.696 at nk=10, already worse than degree=2, and deteriorates further as nk increases. This indicates that three-way interactions are not supported by the data and lead to overfitting as more basis function terms are added. Degree=2 achieves the best MSE (7.350) with the stable plateau behavior that indicates a well-regularized model. The result indicates that fuel economy is shaped by pairwise relationships (how weight interacts with model_year, how horsepower changes meaning across eras) rather than by higher-order joint effects.
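The degree setting caps how many distinct predictors one basis-function product may involve. The sketch below assumes the rule from Friedman's original MARS formulation, which we take earth to follow as well: a new term is an existing parent term multiplied by a hinge in a predictor not already in that parent. The helper names are hypothetical, and the knots in the product term are invented for illustration.

```python
def allowed_new_factors(parent_vars, candidate_vars, degree):
    """Predictors that may multiply an existing term without breaking the degree limit.

    A new term is a parent term times a hinge in one more predictor; the product may
    involve at most `degree` distinct predictors, and a predictor is not reused in a term.
    """
    if len(parent_vars) >= degree:
        return []
    return [v for v in candidate_vars if v not in parent_vars]


def hinge(x, knot):
    return max(0.0, x - knot)


def weight_x_year_term(weight, model_year, weight_knot, year_knot):
    """Degree-2 product term h(weight - weight_knot) * h(model_year - year_knot)."""
    return hinge(weight, weight_knot) * hinge(model_year, year_knot)


predictors = ["cylinders", "disp", "horsepower", "weight", "acc", "model_year", "origin"]
print(allowed_new_factors({"weight"}, predictors, degree=1))  # []: additive model only
print(allowed_new_factors({"weight"}, predictors, degree=2))  # pairwise partners of weight
print(allowed_new_factors({"weight", "model_year"}, predictors, degree=2))  # []: cap reached
```

Above the weight knot, the product term changes by h(model_year - year_knot) per unit of weight. A weight×model_year term therefore lets the weight slope differ by era, which is the kind of pairwise effect the degree=2 model keeps. Degree=3 would additionally admit products of three hinges.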

Lasso at λ=0.001 retains all variables: the CV-optimal penalty is too small to zero out any coefficient

The optimal λ=0.001 is small enough that no coefficient reaches zero: the L1 penalty shrinks coefficients slightly but does not eliminate any predictor. This suggests that, despite the collinearity in the dataset, all 7 variables jointly contribute some marginal predictive information. In practice, Lasso's variable selection benefit (zeroing out redundant predictors) does not activate at the MSE-optimal regularization level for this dataset. A practitioner who needs a parsimonious model must accept an MSE increase from 11.391 to 11.824 (about 3.8%) to reduce the model to 4 predictors at λ=0.5.

MARS performance gap over Lasso driven by flexibility, not interactions alone

The MSE gap between MARS degree=1 (no interactions, MSE 7.816) and degree=2 (with interactions, MSE 7.350) is 0.466, which is modest. The gap between Lasso (MSE 11.391) and MARS degree=1 is 3.575, which is much larger. This arithmetic isolates the source of MARS's advantage. Most of the 35% MSE reduction relative to Lasso comes from MARS's piecewise non-parametric flexibility, its ability to fit different linear slopes in different regions of the predictor space, rather than from interaction modeling. Variable interactions contribute additional improvement but are not the primary driver of the performance difference. This is consistent with fuel economy having non-linear relationships with individual predictors that Lasso's global linear coefficients cannot represent.
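The decomposition above is a telescoping split of the total gap: (Lasso minus additive MARS) plus (additive MARS minus pairwise MARS). The short Python calculation below reproduces it from the three MSE values reported in this write-up. It also separates the two ways of quoting the headline: MARS MSE about 35% lower than Lasso, and Lasso MSE about 55% higher than MARS. No fold-to-fold standard errors are reported, so the split is descriptive rather than a formal test.

```python
# MSE values reported in this write-up.
mse = {"lasso": 11.391, "mars_degree1": 7.816, "mars_degree2": 7.350}

total_gap = mse["lasso"] - mse["mars_degree2"]                # Lasso -> best MARS
flexibility_gap = mse["lasso"] - mse["mars_degree1"]          # Lasso -> additive MARS
interaction_gap = mse["mars_degree1"] - mse["mars_degree2"]   # additive -> pairwise MARS

print(f"total gap        {total_gap:.3f}")
print(f"flexibility part {flexibility_gap:.3f} ({flexibility_gap / total_gap:.1%} of total)")
print(f"interaction part {interaction_gap:.3f} ({interaction_gap / total_gap:.1%} of total)")
print(f"MARS MSE is {total_gap / mse['lasso']:.1%} lower than Lasso")
print(f"Lasso MSE is {mse['lasso'] / mse['mars_degree2'] - 1:.1%} higher than MARS")
```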

Tech Stack

| Technology | Purpose |
| --- | --- |
| R | Statistical modeling environment and primary implementation language |
| glmnet (R package) | Lasso regression with L1 penalty; 10-fold CV over 12 λ values for optimal regularization |
| earth (R package) | MARS implementation: forward selection of hinge basis functions, backward deletion via GCV, CV over degree & nk |
| 10-Fold Cross-Validation | Parameter selection for both methods: λ for Lasso; degree (1, 2, 3) and nk (10–70) for MARS |
| GCV (Generalized Cross-Validation) | MARS internal criterion for backward deletion; penalizes model complexity during term pruning |

Results & Metrics

What the analysis reveals

7.35

Best MSE — MARS

At degree=2, nk=20 — pairwise interactions with 6 of 7 predictors retained

11.39

Lasso MSE at λ=0.001

55% higher than MARS (equivalently, MARS MSE is 35% lower); all 7 predictors retained, no variable selection at optimal λ

weight

Dominant Predictor

GCV-based importance of 100.0 in MARS; most important predictor across every configuration tested

🏆

MARS achieves 35% lower MSE than Lasso: piecewise flexibility beats L1 regularization on this dataset

MARS at degree=2 and nk=20 achieves an MSE of 7.350 versus Lasso's 11.391. The gap persists even when MARS is restricted to no interactions: degree=1 MARS (MSE 7.816) still substantially outperforms Lasso. The performance difference is therefore not primarily explained by MARS's ability to model interactions. The advantage is structural. MARS's piecewise hinge functions adapt the slope of each predictor's relationship with fuel economy across different value ranges. They capture non-linearities that Lasso's single global coefficient per predictor cannot represent, however well λ is tuned.

⚖️

Weight and model_year are the two dominant predictors across all MARS configurations

In every MARS configuration (degree=1 and degree=2, nk=10 and nk=20), weight ranks first (GCV-based importance 100.0) and model_year ranks second (51–52). The consistency is striking. Other predictors enter and leave the model depending on the interactions allowed, and the MARS formula changes substantially across configurations, yet the top two rankings are invariant. This is physically interpretable. Vehicle weight largely determines the energy needed to accelerate the vehicle and to overcome rolling resistance. Model_year captures the compound effect of technological improvements in engine efficiency, aerodynamics, and fuel systems over time.

🔗

Variable interactions exist but are weak — MSE improvement from degree=1 to degree=2 is modest

The optimal degree=2 MARS model (MSE 7.350) outperforms the optimal degree=1 model (MSE 7.816) by 0.466 (in squared mpg). This indicates that interactions involving weight, horsepower, acc, model_year, and origin are present and detectable. However, the gap is small relative to the total MSE reduction achieved by switching from Lasso to MARS. The interaction terms in Eq. 10 are non-trivial: horsepower×model_year and acc×origin capture meaningful cross-predictor effects. Their incremental contribution to predictive accuracy is nonetheless modest. For this dataset, the non-linear marginal effects of individual predictors matter more than their joint interactions.

🚗

Disp is excluded from the degree=2 MARS models despite being a primary engine power metric

Engine displacement (disp) is unused in the optimal degree=2 MARS model, receiving a GCV-based importance score of 0.0, and it is not selected across the degree=2 configurations. This is not because displacement is physically unimportant: larger-displacement engines consume more fuel. Rather, displacement is so highly correlated with cylinders, horsepower, and weight (correlations above 0.75) that it adds little independent predictive signal once those variables are included. MARS's forward-backward selection identifies this redundancy and leaves displacement out in favor of the correlated predictors that carry the same information. Lasso, by contrast, retains disp at the optimal λ=0.001. Such a small L1 penalty shrinks every coefficient without setting any to zero; disp is dropped only at larger penalties such as λ=0.5.

📊

MARS provides ranked variable importance; Lasso cannot do so reliably under multicollinearity

MARS's GCV- and RSS-based variable importance measures provide a quantitative, ranked ordering of predictor contributions for the optimal degree=2 configuration: weight (100.0), model_year (52.2), acc (24.3), origin (15.7), cylinders (11.8), horsepower (8.9). Lasso's coefficient magnitudes, while interpretable in isolation, are distorted by multicollinearity. When cylinders, disp, horsepower, and weight share variance, Lasso's coefficients reflect a largely arbitrary split of the shared explanatory power rather than independent contributions. For practitioners seeking to understand the engineering drivers of fuel economy, not just prediction accuracy, MARS's interaction-adjusted importance rankings are the more reliable and actionable output.
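The "arbitrary split" is easiest to see in the extreme case of exactly collinear predictors. There, the Lasso objective cannot distinguish how a shared coefficient is divided among them, so every same-sign split is an equally good solution. With strong but imperfect correlation, the solution becomes unique but can shift noticeably with small changes in the data. The toy Python illustration below uses made-up data. `LAM` is an illustrative penalty, not the tuned λ, and the objective follows glmnet's Gaussian form.

```python
# Two perfectly collinear toy predictors (x2 is an exact copy of x1).
x1 = [-1.5, -0.5, 0.5, 1.5]
x2 = list(x1)
y = [-3.0, -1.0, 1.0, 3.0]
LAM = 0.1  # illustrative penalty


def lasso_objective(b1, b2):
    """Gaussian lasso objective in glmnet's form: RSS / (2n) + LAM * (|b1| + |b2|)."""
    n = len(y)
    rss = sum((yi - b1 * a - b2 * b) ** 2 for yi, a, b in zip(y, x1, x2))
    return rss / (2 * n) + LAM * (abs(b1) + abs(b2))


# Every same-sign split of a total coefficient of 1.9 gives identical fits and penalties.
splits = [(1.9, 0.0), (1.4, 0.5), (0.95, 0.95), (0.0, 1.9)]
for b1, b2 in splits:
    print(f"b1={b1:<5} b2={b2:<5} objective={lasso_objective(b1, b2):.6f}")
```

Any importance ranking read off these coefficients would depend on which split the solver happened to return, not on the data.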