Applied ML · Real Estate

Boston House Value Prediction

Comparative analysis of Least-Squares Linear Regression, Lasso, and Ridge across 506 Boston housing records — using 10-fold cross-validation to benchmark regularization strategies in the presence of multicollinearity.

Methods: Ridge · Lasso · LSLR
Tech Stack: R · glmnet · 10-Fold Cross-Validation

506

Housing Records · 13 Predictors

3

Regression Methods Compared

23.63

Best MSE — Lasso (λ=0.0337)

The Problem

Predicting Boston house values from 13 socioeconomic and structural predictors is complicated by multicollinearity, which inflates coefficient variance and makes standard linear regression unstable and hard to interpret.

The Boston housing dataset presents a prediction task where the target, median home value, is influenced by a diverse set of 13 regressors spanning safety (crime rate), location (residential land ratio, proximity to employment centers), environmental quality (nitric oxide concentration), house-specific attributes (number of rooms, age, tax rate), accessibility (highway index), and social factors (student-teacher ratio, population demographics). Many of these predictors are not independent: the correlation matrix shows pairwise correlations of roughly 0.75 or more in absolute value among INDUS, NOX, AGE, DIS, RAD, and TAX, with RAD and TAX correlating at 0.91. In standard least-squares linear regression, multicollinearity inflates the variance of the coefficient estimates without biasing them, producing a model whose coefficients are unstable across samples and difficult to interpret. The question is not just which model predicts best, but which regularization strategy, if any, is warranted given the collinearity structure of this dataset.
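To give a sense of scale for the variance inflation described above, the sketch below computes the variance inflation factor for the simplified case of a single correlated partner, where VIF = 1/(1 − r²). It is written in Python purely for illustration (the analysis itself was done in R), and the function name is hypothetical. The source does not report VIFs; the full 13-predictor VIF for RAD or TAX would be at least as large as this pairwise value.

```python
def pairwise_vif(r: float) -> float:
    """Variance inflation factor for a predictor that has correlation r
    with one other predictor (two-predictor simplification)."""
    if not -1.0 < r < 1.0:
        raise ValueError("correlation must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)

# RAD-TAX correlation reported in the text
vif_rad_tax = pairwise_vif(0.91)
print(round(vif_rad_tax, 2))  # roughly 5.8
```

Under this simplification, a correlation of 0.91 alone multiplies the sampling variance of the affected coefficients by a factor of about 5.8 relative to uncorrelated predictors.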

The Solution

A three-method comparison of LSLR, Lasso, and Ridge, with 10-fold cross-validated λ selection for the two penalized methods, to quantify the bias-variance tradeoff and determine whether regularization improves prediction and interpretability over the baseline.

Three regression techniques are fitted to the 506-instance Boston housing dataset and compared on MSE and coefficient magnitude. Least-Squares Linear Regression provides the unregularized baseline — all 13 predictors retained with no penalty. Lasso regression adds an L1 penalty (λ·Σ|β|) that shrinks some coefficients exactly to zero, enabling automatic variable selection; the optimal λ=0.0337 is identified via 10-fold cross-validation and results in 11 of 13 predictors being retained, with INDUS and AGE dropped. Ridge regression adds an L2 penalty (λ·Σβ²) that shrinks all coefficients toward zero without zeroing any; the optimal λ=5 is selected by cross-validation, yielding 12.58 effective degrees of freedom. All three models are evaluated by their 10-fold cross-validated MSE and by the sum of absolute and squared coefficient values — providing a complete picture of prediction accuracy, model complexity, and regularization strength across the three approaches.
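Written out, the two penalized objectives described above take the following textbook form. The notation is introduced here for illustration: y_i is the median home value of record i, x_ij is predictor j for that record, β_0 is the intercept, n = 506 and p = 13. Note that glmnet scales the squared-error term by 1/(2n) and the ridge penalty by 1/2, so the reported λ values apply to that scaled version rather than to the exact form below.

```latex
\begin{align*}
\hat{\beta}^{\text{lasso}} &= \arg\min_{\beta_0,\,\beta}\;
  \sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^{2}
  + \lambda \sum_{j=1}^{p} \lvert\beta_j\rvert \\
\hat{\beta}^{\text{ridge}} &= \arg\min_{\beta_0,\,\beta}\;
  \sum_{i=1}^{n}\Bigl(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Bigr)^{2}
  + \lambda \sum_{j=1}^{p} \beta_j^{2}
\end{align*}
```

Setting λ = 0 in either objective recovers the LSLR baseline.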

Key Outcome

All three methods achieve comparable MSE: Lasso (23.63), LSLR (23.71), and Ridge (23.89). LSLR is therefore an acceptable baseline for this dataset. Lasso is nevertheless the preferred method when model parsimony is desired, as it produces the smallest coefficient magnitudes and automatically excludes INDUS and AGE, the two variables that add the least independent signal to median house value prediction.

Technical Deep Dive

Methodology & Analysis

Analytical Workflow

Stage 1 — Data Characterization & Multicollinearity Diagnosis

Dataset

506 Records · 13 Predictors

No missing data · Target: MEDV (median home value in $1,000s)

Multicollinearity

High Correlations Detected

INDUS, NOX, AGE, DIS, RAD, TAX correlated at |r| ≥ 0.75 · RAD–TAX: 0.91 · Motivates regularization

Tuning Strategy

10-Fold Cross-Validation

Applied to both Lasso and Ridge · λ selected at minimum CV-MSE

Stage 2 — Three-Method Regression Fitting

Baseline

LSLR — No Regularization

All 13 predictors retained · MSE 23.71 · Largest coefficient magnitudes

L1 Penalty

Lasso — λ=0.0337

11/13 predictors retained · INDUS & AGE zeroed · MSE 23.63 (best)

L2 Penalty

Ridge — λ=5

All 13 predictors shrunk · Eff. DOF 12.58 · MSE 23.89

Stage 3 — Comparative Evaluation

Predictive Performance

MSE Comparison Across Methods

Lasso 23.63 · LSLR 23.71 · Ridge 23.89 · All within 0.27 of each other

Model Complexity

Coefficient Magnitude Analysis

Sum of squared coefficients: Lasso 284 · Ridge 302 · LSLR 341 · Lasso most parsimonious

Conclusion

LSLR Acceptable — Lasso Preferred for Parsimony · INDUS and AGE Excluded

All three methods comparable on MSE · Lasso achieves lowest MSE with fewest predictors and smallest coefficient magnitudes

Stage 1

Data Characterization & Multicollinearity Diagnosis

The dataset contains 506 Boston-area housing records with no missing values. The 13 predictors cover six categories: safety (CRIM), location (ZN, INDUS, CHAS, DIS), environment (NOX), house-specific attributes (RM, AGE, TAX), accessibility (RAD), and social factors (PTRATIO, B, LSTAT). The correlation matrix reveals a cluster of highly correlated predictors: INDUS, NOX, AGE, DIS, RAD, and TAX are linked by pairwise correlations of roughly 0.75 or more in absolute value (DIS correlates negatively with the others), with RAD and TAX reaching 0.91. This collinearity structure inflates variance in the LSLR coefficient estimates, which motivates evaluating both Ridge (which shrinks all coefficients) and Lasso (which zeros out the least informative ones).
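The screening step described above can be sketched as a scan over all predictor pairs that flags those whose Pearson correlation reaches a threshold in absolute value. This is an illustrative Python sketch, not the project's R code; the function names and the toy columns x1, x2, x3 are hypothetical stand-ins for the real predictors.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def high_corr_pairs(columns, threshold=0.75):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:  # absolute value so negative links are caught
            flagged.append((name_a, name_b, round(r, 2)))
    return flagged

# Hypothetical toy data: x2 moves against x1, x3 is only weakly related
toy = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [5.0, 3.0, 4.0, 1.0, 2.0],
    "x3": [3.0, 1.0, 4.0, 1.0, 5.0],
}
flagged = high_corr_pairs(toy)
print(flagged)  # [('x1', 'x2', -0.8)]
```

Comparing on the absolute value matters here because a strongly negative correlation causes the same instability in coefficient estimates as a strongly positive one.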

Stage 2

Three-Method Regression Fitting

LSLR is fitted as the unregularized baseline, retaining all 13 predictors. Lasso is fitted using the glmnet package with λ selected via 10-fold cross-validation; the optimal λ=0.0337 zeroes out INDUS and AGE while retaining 11 predictors with the smallest overall coefficient magnitudes (sum of squared coefficients: 284.3). Ridge is also fitted via glmnet with 10-fold CV; the optimal λ=5 shrinks all 13 coefficients without eliminating any, resulting in 12.58 effective degrees of freedom and a sum of squared coefficients of 302.5 — larger than Lasso but smaller than LSLR's 340.9.

Stage 3

Comparative Evaluation

All three models are compared on 10-fold cross-validated MSE and on the sum of absolute and squared coefficient values. Lasso achieves the lowest MSE (23.63) with the smallest coefficient magnitudes and the fewest predictors. LSLR achieves an MSE of 23.71, only 0.08 worse than Lasso, with the largest coefficient magnitudes. Ridge has the highest MSE (23.89), although its coefficient magnitudes fall between those of Lasso and LSLR. The near-identical MSE values indicate that regularization provides minimal predictive benefit on this dataset, but Lasso's automatic variable selection and coefficient shrinkage make it the methodologically preferred choice when parsimony and interpretability are priorities.
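The coefficient-magnitude summaries used in this comparison are simple to compute once a coefficient vector is available. The Python sketch below is illustrative only (the project used R), and the helper name is hypothetical. It uses the two Lasso coefficients quoted in the results (NOX and RM) plus the two zeroed predictors; the other nine coefficients are not listed in the source and are left out. It also assumes the intercept is excluded from the sums, which the source does not state.

```python
def coefficient_magnitudes(coefs):
    """Return (sum of |b|, sum of b^2, count of nonzero b) for a coefficient vector."""
    abs_sum = sum(abs(b) for b in coefs)
    sq_sum = sum(b * b for b in coefs)
    n_nonzero = sum(1 for b in coefs if b != 0.0)
    return abs_sum, sq_sum, n_nonzero

# Partial Lasso coefficient vector: only the values quoted in the text
partial_lasso = {"NOX": -16.09, "RM": 3.88, "INDUS": 0.0, "AGE": 0.0}
abs_sum, sq_sum, n_nonzero = coefficient_magnitudes(partial_lasso.values())
print(round(abs_sum, 2), round(sq_sum, 2), n_nonzero)
```

On these four values the squared sum is about 273.9, of which NOX alone contributes about 258.9. The reported Lasso total of 284.3 is therefore driven mostly by the NOX coefficient.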

Key Methodological Choices

Regularization evaluated against LSLR — not assumed necessary

Multicollinearity motivates regularization but does not by itself guarantee that regularization will improve predictive performance. High collinearity between predictors inflates coefficient variance in LSLR, but if the variance inflation is modest relative to the dataset size, the regularization penalty may introduce more bias than the variance it removes. By evaluating all three methods on the same 10-fold cross-validated MSE, the analysis lets the data determine whether regularization helps, rather than assuming it does because multicollinearity was detected. The result, near-identical MSE across all three methods, is itself a finding: this dataset is large enough relative to its collinearity that LSLR remains competitive.
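The tradeoff in this paragraph is the standard decomposition of expected squared prediction error at a point x_0, shown here as a reminder. The symbols are notation introduced for this sketch: the data are assumed to follow y = f(x) + ε with Var(ε) = σ², and f̂ is the fitted model.

```latex
\mathbb{E}\bigl[(y_0 - \hat{f}(x_0))^{2}\bigr]
  = \sigma^{2}
  + \bigl(\mathbb{E}[\hat{f}(x_0)] - f(x_0)\bigr)^{2}
  + \operatorname{Var}\bigl(\hat{f}(x_0)\bigr)
```

Regularization raises the squared-bias term and lowers the variance term, so it improves MSE only when the variance reduction is the larger of the two effects.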

Lasso over Ridge when variable selection is the goal

Ridge regression shrinks all coefficients toward zero but cannot set any exactly to zero — it retains all 13 predictors in the final model regardless of their individual contribution. This makes Ridge appropriate for bias-variance tradeoff correction when all predictors are believed to contribute, but less useful when the goal is a parsimonious, interpretable model. Lasso's L1 penalty creates exact zeros for sufficiently uninformative predictors, enabling automatic variable selection. The exclusion of INDUS (non-retail business ratio) and AGE (proportion of old housing) from the Lasso model is interpretable: both are highly correlated with other retained predictors and add limited independent signal to median home value prediction.
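The difference between the two penalties is easiest to see in the idealized case of orthonormal predictors, where both estimators have closed forms in terms of the least-squares coefficient b. The Python sketch below assumes that setting, with the loss written as ½·RSS, the Lasso penalty as λ|β| and the Ridge penalty as (λ/2)β². The Boston predictors are far from orthonormal, so this illustrates the mechanism rather than reproducing the fitted models. The function names and example coefficients are hypothetical.

```python
def soft_threshold(b: float, lam: float) -> float:
    """Lasso solution for one coefficient under an orthonormal design:
    shrink toward zero by lam, and set to exactly zero if |b| <= lam."""
    if b > lam:
        return b - lam
    if b < -lam:
        return b + lam
    return 0.0

def ridge_shrink(b: float, lam: float) -> float:
    """Ridge solution for one coefficient under an orthonormal design:
    proportional shrinkage that never reaches exactly zero for b != 0."""
    return b / (1.0 + lam)

ols_coefs = [3.0, -0.4, 0.1]   # hypothetical least-squares coefficients
lam = 0.5
lasso_coefs = [soft_threshold(b, lam) for b in ols_coefs]
ridge_coefs = [ridge_shrink(b, lam) for b in ols_coefs]
print(lasso_coefs)  # [2.5, 0.0, 0.0]
print([round(b, 3) for b in ridge_coefs])
```

Lasso removes the two small coefficients entirely, while Ridge keeps all three at reduced size. This is the same behavior that leads Lasso to drop INDUS and AGE while Ridge retains all 13 predictors.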

10-fold cross-validation for λ selection — balancing reliability and cost

The penalty parameter λ controls the strength of regularization in both Lasso and Ridge: too small and the model behaves like LSLR; too large and the coefficients are over-shrunk toward zero. Cross-validation provides an empirical estimate of the test error as a function of λ, selecting the value that minimizes out-of-sample prediction error without requiring a separate holdout set. With 506 observations, 10-fold CV provides a reasonable balance: each fold holds out about 50 records for validation and trains on the remaining 455 or so, which is enough for a stable error estimate while keeping computation tractable across the range of λ values evaluated.
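The selection procedure described above can be sketched as follows. This is an illustrative Python version, not the glmnet implementation used in the project. It swaps in a one-predictor ridge model with no intercept so that the fit has a closed form, and it runs on synthetic data. All function names, the λ grid and the data-generating values are assumptions made for the example.

```python
import random

def kfold_indices(n, k=10, seed=0):
    """Shuffle record indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def ridge_slope(xs, ys, lam):
    """Closed-form ridge slope for a single predictor with no intercept."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

def cv_mse(xs, ys, lam, folds):
    """Average held-out MSE across folds for one value of the penalty."""
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        beta = ridge_slope([xs[i] for i in train], [ys[i] for i in train], lam)
        sq_err = [(ys[i] - beta * xs[i]) ** 2 for i in held_out]
        fold_errors.append(sum(sq_err) / len(sq_err))
    return sum(fold_errors) / len(fold_errors)

# Synthetic stand-in data with the same number of records as the Boston set
rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(506)]
ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

folds = kfold_indices(len(xs), k=10)
fold_sizes = sorted(len(f) for f in folds)  # four folds of 50, six of 51

# Pick the penalty with the lowest cross-validated MSE
lambda_grid = [0.0, 0.1, 1.0, 10.0, 100.0]
best_lambda = min(lambda_grid, key=lambda lam: cv_mse(xs, ys, lam, folds))
print(fold_sizes, best_lambda)
```

Because 506 is not divisible by 10, six folds hold out 51 records and four hold out 50. Every record is used for validation exactly once.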

Tech Stack

Technology | Purpose
R | Statistical modeling environment and primary implementation language
glmnet (R package) | Lasso and Ridge regression with cross-validated λ selection
LSLR | Ordinary least-squares baseline with no regularization; all 13 predictors retained
10-Fold Cross-Validation | Empirical λ selection for both Lasso and Ridge; MSE evaluation for all three methods

Results & Metrics

What the analysis reveals

23.63

Best MSE — Lasso

At λ=0.0337 with 11 of 13 predictors — lowest MSE and most parsimonious model

0.27

MSE Spread Across Methods

Lasso 23.63 · LSLR 23.71 · Ridge 23.89 — all three methods within 0.27 MSE

2

Predictors Excluded by Lasso

INDUS and AGE zeroed out: the predictors adding the least independent signal for median home value

🏆

Lasso achieves the best MSE with the fewest predictors and smallest coefficients

At λ=0.0337, Lasso achieves an MSE of 23.63, the lowest of the three methods, while retaining only 11 of 13 predictors and producing the smallest coefficient magnitudes (sum of squared coefficients: 284.3, versus 302.5 for Ridge and 340.9 for LSLR). This triple advantage (best MSE, fewest variables, smallest coefficients) makes Lasso the preferred choice whenever both predictive performance and model interpretability are evaluation criteria, even though its MSE margin over LSLR is only 0.08. The coefficient path plot confirms that as the constraint on absolute coefficient values tightens, more predictors are progressively zeroed out.

🏠

INDUS and AGE contribute negligible independent signal to median house value

Lasso's exclusion of INDUS (non-retail business ratio) and AGE (percentage of pre-1940 housing units) is interpretable given the correlation structure: INDUS is highly correlated with NOX and with other retained predictors, and AGE is correlated with DIS and other location-based predictors. Once those correlated predictors are included, INDUS and AGE add largely redundant information. Their removal does not increase MSE, which indicates that the 11-predictor Lasso model captures essentially all the predictive signal available in the full 13-predictor set.

📐

NOX and RM are the strongest predictors across all three models

Across LSLR, Lasso, and Ridge, the NOX coefficient (nitric oxide concentration) and the RM coefficient (average number of rooms) are consistently among the largest in absolute magnitude. In the Lasso model, NOX carries a coefficient of −16.09 and RM a coefficient of +3.88, reflecting the strong negative impact of pollution exposure and the strong positive impact of space on median home value. Because these coefficients are on the original measurement scales, their size also depends on each predictor's units, so they indicate direction and per-unit effect rather than a strict importance ranking. The findings align with housing economics theory: air quality and living space are among the most strongly capitalized amenities in residential property markets.

⚖️

All three methods are within 0.27 MSE — LSLR is acceptable for this dataset

The near-identical MSE values across all three methods (23.63 to 23.89) confirm that regularization provides minimal predictive benefit for a 506-instance dataset with moderate collinearity. The 506 observations are large enough relative to 13 predictors that LSLR does not severely overfit, and the collinearity — while present — does not push coefficient variance high enough to meaningfully degrade out-of-sample performance. This is an important nuance: multicollinearity motivates regularization as a precaution, but does not guarantee it will produce a meaningfully better model.

🔍

Ridge shrinks all coefficients but cannot match Lasso's interpretability advantage

Ridge at λ=5 yields 12.58 effective degrees of freedom, close to the 13 of the unpenalized model, indicating that the L2 penalty provides only light shrinkage at this optimal λ. The sum of squared coefficients (302.5) is lower than LSLR's (340.9) but higher than Lasso's (284.3), and its MSE (23.89) is the highest of the three methods. Ridge is most useful when all predictors are genuinely informative and none should be excluded, a condition not well supported by this dataset, where Lasso drops INDUS and AGE without loss of accuracy. For this specific problem, Ridge offers neither a predictive advantage over LSLR nor the parsimony advantage of Lasso.
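Effective degrees of freedom for Ridge are commonly defined through the singular values d_j of the centered design matrix as the sum of d_j² / (d_j² + λ). The Python sketch below implements that textbook formula on hypothetical singular values. It does not reproduce the reported 12.58, which depends on the actual Boston design matrix and on how glmnet scales λ. The function name and the numbers in `hypothetical_d` are assumptions for illustration.

```python
def ridge_effective_df(singular_values, lam):
    """Effective degrees of freedom of a ridge fit: sum of d^2 / (d^2 + lam)."""
    if lam < 0:
        raise ValueError("lam must be non-negative")
    return sum(d * d / (d * d + lam) for d in singular_values)

# Hypothetical singular values for a centered 13-column design matrix
hypothetical_d = [40.0, 31.0, 25.0, 22.0, 20.0, 18.0, 15.0,
                  12.0, 10.0, 8.0, 6.0, 4.0, 2.0]

for lam in (0.0, 5.0, 50.0):
    print(lam, round(ridge_effective_df(hypothetical_d, lam), 2))
```

At λ = 0 the value equals the number of predictors (13), and it decreases as λ grows. A value close to 13, such as the reported 12.58, therefore signals that the penalty is doing little work.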