Boston House Value Prediction
Comparative analysis of Least-Squares Linear Regression, Lasso, and Ridge across 506 Boston housing records — using 10-fold cross-validation to benchmark regularization strategies in the presence of multicollinearity.
506
Housing Records · 13 Predictors
3
Regression Methods Compared
23.63
Best MSE — Lasso (λ=0.0337)
The Problem
Predicting Boston house values from 13 socioeconomic and structural predictors is complicated by multicollinearity — inflating coefficient variance and making standard linear regression unstable and hard to interpret
The Boston housing dataset presents a prediction task where the target, median home value, is influenced by a diverse set of 13 regressors spanning safety (crime rate), location (residential land ratio, proximity to employment centers), environmental quality (nitric oxide concentration), house-specific attributes (number of rooms, age, tax rate), accessibility (highway index), and social factors (student-teacher ratio, population demographics). Many of these predictors are not independent. The correlation matrix shows strong multicollinearity among INDUS, NOX, AGE, DIS, RAD, and TAX: several pairs reach an absolute correlation of 0.75 or more, and RAD and TAX correlate at 0.91. Under the usual linear-model assumptions, multicollinearity leaves least-squares coefficient estimates unbiased but inflates their variance, producing a model whose coefficients are unstable across samples and difficult to interpret. The question is not just which model predicts best, but which regularization strategy, if any, is warranted given the collinearity structure of this dataset.
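One standard way to quantify the variance inflation described above is the variance inflation factor, VIF_j = 1/(1 − R_j²), where R_j² comes from regressing predictor j on all the other predictors. The analysis reports pairwise correlations rather than VIFs, and it was carried out in R; the following is only a minimal Python/NumPy sketch on synthetic data (not the Boston records), and `vif` is a hypothetical helper name.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n x p, no intercept column).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 is from regressing column j
    on all other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + remaining predictors
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Synthetic illustration (hypothetical data): x2 is nearly a copy of x1, x3 is independent.
rng = np.random.default_rng(0)
x1 = rng.normal(size=506)
x2 = x1 + 0.3 * rng.normal(size=506)
x3 = rng.normal(size=506)
print(np.round(vif(np.column_stack([x1, x2, x3])), 1))  # large VIFs for x1, x2; about 1 for x3
```

With exactly two predictors the VIF reduces to 1/(1 − r²). In the full 13-predictor model, R_j² can only be at least as large as the squared correlation with any single other predictor, so the RAD–TAX correlation of 0.91 alone implies a VIF of at least 1/(1 − 0.91²) ≈ 5.8 for both RAD and TAX.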
The Solution
A three-method comparison of LSLR, Lasso, and Ridge — each with 10-fold cross-validated λ selection — to quantify the bias-variance tradeoff and determine whether regularization improves prediction and interpretability over the baseline
Three regression techniques are fitted to the 506-instance Boston housing dataset and compared on MSE and coefficient magnitude. Least-Squares Linear Regression provides the unregularized baseline — all 13 predictors retained with no penalty. Lasso regression adds an L1 penalty (λ·Σ|β|) that shrinks some coefficients exactly to zero, enabling automatic variable selection; the optimal λ=0.0337 is identified via 10-fold cross-validation and results in 11 of 13 predictors being retained, with INDUS and AGE dropped. Ridge regression adds an L2 penalty (λ·Σβ²) that shrinks all coefficients toward zero without zeroing any; the optimal λ=5 is selected by cross-validation, yielding 12.58 effective degrees of freedom. All three models are evaluated by their 10-fold cross-validated MSE and by the sum of absolute and squared coefficient values — providing a complete picture of prediction accuracy, model complexity, and regularization strength across the three approaches.
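In textbook form, the three fits minimize the objectives below over the intercept β₀ and slopes β₁,…,β_p (here p = 13 and n = 506), with the intercept left unpenalized. Ridge's effective degrees of freedom is the trace of its hat matrix, written in terms of the singular values d_j of the centered predictor matrix X; it equals p at λ = 0 and falls toward 0 as λ grows. Note that glmnet, which the analysis used, divides the residual sum of squares by 2n and standardizes predictors by default, so the quoted λ values (0.0337 and 5) are on glmnet's scale rather than the scale of these formulas.

$$
\hat{\beta}^{\text{LSLR}} = \arg\min_{\beta_0,\beta}\ \sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}x_{ij}\beta_j\Big)^2
$$

$$
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta_0,\beta}\ \sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}x_{ij}\beta_j\Big)^2+\lambda\sum_{j=1}^{p}\lvert\beta_j\rvert
$$

$$
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta_0,\beta}\ \sum_{i=1}^{n}\Big(y_i-\beta_0-\sum_{j=1}^{p}x_{ij}\beta_j\Big)^2+\lambda\sum_{j=1}^{p}\beta_j^2
$$

$$
\mathrm{df}(\lambda)=\operatorname{tr}\!\big[\mathbf{X}(\mathbf{X}^\top\mathbf{X}+\lambda\mathbf{I})^{-1}\mathbf{X}^\top\big]=\sum_{j=1}^{p}\frac{d_j^2}{d_j^2+\lambda}
$$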
Key Outcome
All three methods achieve comparable MSE: Lasso (23.63), LSLR (23.71), and Ridge (23.89), which suggests that LSLR is an acceptable baseline for this dataset. Lasso is nonetheless the preferred method when model parsimony is desired, because it produces the smallest coefficient magnitudes and automatically excludes INDUS and AGE, the two variables that add the least independent predictive signal for median house value.
Technical Deep Dive
Methodology & Analysis
Analytical Workflow
Stage 1 — Data Characterization & Multicollinearity Diagnosis
Dataset
506 Records · 13 Predictors
No missing data · Target: MEDV (median home value in $1,000s)
Multicollinearity
High Correlations Detected
Several pairs among INDUS, NOX, AGE, DIS, RAD, TAX with |r| ≥ 0.75 · RAD–TAX: 0.91 · Motivates regularization
Tuning Strategy
10-Fold Cross-Validation
Applied to both Lasso and Ridge · λ selected at minimum CV-MSE
Stage 2 — Three-Method Regression Fitting
Baseline
LSLR — No Regularization
All 13 predictors retained · MSE 23.71 · Largest coefficient magnitudes
L1 Penalty
Lasso — λ=0.0337
11/13 predictors retained · INDUS & AGE zeroed · MSE 23.63 (best)
L2 Penalty
Ridge — λ=5
All 13 predictors shrunk · Eff. DOF 12.58 · MSE 23.89
Stage 3 — Comparative Evaluation
Predictive Performance
MSE Comparison Across Methods
Lasso 23.63 · LSLR 23.71 · Ridge 23.89 · All within 0.26 of each other
Model Complexity
Coefficient Magnitude Analysis
Sum of squared coefficients: Lasso 284 · Ridge 302 · LSLR 341 · Lasso most parsimonious
Conclusion
LSLR Acceptable — Lasso Preferred for Parsimony · INDUS and AGE Excluded
All three methods comparable on MSE · Lasso achieves lowest MSE with fewest predictors and smallest coefficient magnitudes
Stage 1
Data Characterization & Multicollinearity Diagnosis
The dataset contains 506 Boston-area housing records with no missing values. The 13 predictors cover five categories: safety (CRIM), location (ZN, INDUS, CHAS, DIS, RAD), environment (NOX), house-specific attributes (RM, AGE, TAX, PTRATIO), and social factors (B, LSTAT). The correlation matrix reveals a cluster of highly correlated predictors: several pairs drawn from INDUS, NOX, AGE, DIS, RAD, and TAX have absolute correlations of 0.75 or more, with RAD and TAX reaching 0.91. This collinearity structure inflates the variance of LSLR coefficient estimates, motivating the evaluation of both Ridge (which shrinks all coefficients) and Lasso (which zeros out the least informative ones).
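The correlation screen above can be sketched in Python/NumPy as follows (the analysis itself was done in R). The sketch flags every predictor pair whose absolute correlation meets the 0.75 threshold, so strong negative correlations are caught as well as positive ones. The function name and the synthetic columns are illustrative assumptions, not the Boston data.

```python
import itertools
import numpy as np

def high_corr_pairs(X, names, threshold=0.75):
    """List (name_a, name_b, r) for predictor pairs with |r| >= threshold, strongest first."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)  # p x p correlation matrix
    pairs = [
        (names[i], names[j], float(R[i, j]))
        for i, j in itertools.combinations(range(len(names)), 2)
        if abs(R[i, j]) >= threshold  # absolute value: negative correlations count too
    ]
    return sorted(pairs, key=lambda t: abs(t[2]), reverse=True)

# Synthetic example: x2 tracks x1 positively, x3 tracks x1 negatively, x4 is unrelated.
rng = np.random.default_rng(0)
x1 = rng.normal(size=506)
x2 = x1 + 0.3 * rng.normal(size=506)
x3 = -x1 + 0.3 * rng.normal(size=506)
x4 = rng.normal(size=506)
for a, b, r in high_corr_pairs(np.column_stack([x1, x2, x3, x4]), ["x1", "x2", "x3", "x4"]):
    print(f"{a}-{b}: {r:+.2f}")
```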
Stage 2
Three-Method Regression Fitting
LSLR is fitted as the unregularized baseline, retaining all 13 predictors. Lasso is fitted using the glmnet package with λ selected via 10-fold cross-validation; the optimal λ=0.0337 zeroes out INDUS and AGE, and the resulting 11-predictor model has the smallest overall coefficient magnitudes of the three methods (sum of squared coefficients: 284.3). Ridge is also fitted via glmnet with 10-fold CV; the optimal λ=5 shrinks all 13 coefficients without eliminating any, resulting in 12.58 effective degrees of freedom and a sum of squared coefficients of 302.5, larger than Lasso's but smaller than LSLR's 340.9.
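glmnet fits the lasso by coordinate descent. To show how an L1 fit produces exact zeros, here is a minimal Python/NumPy sketch of cyclic coordinate descent with soft-thresholding for the objective (1/2n)·RSS + λ·Σ|β|. It is not glmnet's implementation (no warm starts, active sets, or intercept handling). It assumes standardized predictors and a centered response, and it runs on synthetic data in which one predictor is pure noise.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit (population) variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0): the one-variable lasso solution."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=1000, tol=1e-10):
    """Minimize (1/(2n))||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes standardized columns of X (mean 0, mean square 1) and centered y,
    so no intercept is needed and each coordinate update has a closed form.
    """
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float).copy()            # current residual y - X b
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            rho = X[:, j] @ r / n + b[j]  # fit of column j to its partial residual
            new = soft_threshold(rho, lam)
            delta = new - b[j]
            if delta != 0.0:
                r -= delta * X[:, j]      # keep the residual in sync with b
                b[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b

# Synthetic data: x1 and x2 are correlated and informative; x3 is pure noise.
rng = np.random.default_rng(1)
n = 506
x1 = rng.normal(size=n)
x2 = 0.9 * x1 + 0.4 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = standardize(np.column_stack([x1, x2, x3]))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=n)
y = y - y.mean()
print(lasso_cd(X, y, lam=0.2))  # the noise predictor's coefficient is exactly 0
```

At λ = 0 the same routine converges to the ordinary least-squares solution, which is why a very small λ makes the lasso behave like LSLR.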
Stage 3
Comparative Evaluation
All three models are compared on 10-fold cross-validated MSE and on the sum of absolute and squared coefficient values. Lasso achieves the lowest MSE (23.63) with the smallest coefficient magnitudes and the fewest predictors. LSLR achieves an MSE of 23.71, only 0.08 worse than Lasso, with the largest coefficient magnitudes. Ridge has the highest MSE of the three at 23.89, 0.26 worse than Lasso. The near-identical MSE values suggest that regularization provides minimal predictive benefit on this dataset (differences this small may not persist under a different random assignment of folds), but Lasso's automatic variable selection and coefficient shrinkage make it the methodologically preferred choice when parsimony and interpretability are priorities.
Key Methodological Choices
Regularization evaluated against LSLR — not assumed necessary
Multicollinearity is a reason to consider regularization, but not a guarantee that regularization will improve predictive performance. High collinearity inflates coefficient variance in LSLR, but when that inflation is modest relative to the amount of data, the regularization penalty may introduce more bias than the variance it removes. By evaluating all three methods on the same 10-fold cross-validated MSE, the analysis lets the data determine whether regularization helps, rather than assuming it does because multicollinearity was detected. The result, near-identical MSE across all three methods, is itself a finding: this dataset has enough observations relative to its 13 predictors and its degree of collinearity that LSLR remains competitive.
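The interplay between collinearity and sample size can be illustrated with a small Monte Carlo experiment. For two unit-variance predictors with correlation ρ, the OLS slope variance is roughly σ²/(n(1 − ρ²)): collinearity multiplies it by 1/(1 − ρ²), while more data divides it by n. The following is a minimal Python/NumPy sketch on synthetic data, with ρ = 0.9 chosen to be similar in size to the RAD–TAX correlation; `ols_slope_sd` is a hypothetical helper, not part of the analysis.

```python
import numpy as np

def ols_slope_sd(n, rho, sigma=1.0, reps=500, seed=0):
    """Monte Carlo standard deviation of the OLS slope on x1 when corr(x1, x2) = rho."""
    rng = np.random.default_rng(seed)
    est = np.empty(reps)
    for k in range(reps):
        x1 = rng.normal(size=n)
        x2 = rho * x1 + np.sqrt(1.0 - rho ** 2) * rng.normal(size=n)  # unit variance
        y = 1.0 * x1 + 1.0 * x2 + sigma * rng.normal(size=n)
        A = np.column_stack([np.ones(n), x1, x2])
        est[k] = np.linalg.lstsq(A, y, rcond=None)[0][1]              # slope on x1
    return est.std()

for n, rho in [(506, 0.0), (506, 0.9), (50, 0.9)]:
    approx = 1.0 / np.sqrt(n * (1.0 - rho ** 2))  # sigma / sqrt(n (1 - rho^2)), sigma = 1
    print(f"n={n:4d} rho={rho:.1f}  simulated sd={ols_slope_sd(n, rho):.3f}  approx={approx:.3f}")
```

At n = 506 the collinear slope is noisier than the independent one by about a factor of 1/√(1 − 0.81) ≈ 2.3, yet still far more stable than the same collinear slope at n = 50.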
Lasso over Ridge when variable selection is the goal
Ridge regression shrinks all coefficients toward zero but cannot set any exactly to zero, so it retains all 13 predictors in the final model regardless of their individual contribution. This makes Ridge appropriate for trading a small increase in bias for a reduction in variance when all predictors are believed to contribute, but less useful when the goal is a parsimonious, interpretable model. Lasso's L1 penalty sets the coefficients of sufficiently uninformative predictors exactly to zero, enabling automatic variable selection. The exclusion of INDUS (proportion of non-retail business acres) and AGE (proportion of owner-occupied units built before 1940) from the Lasso model is interpretable: both are highly correlated with other retained predictors and add limited independent signal to median home value prediction.
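The difference between the two penalties is easiest to see in the special case of an orthonormal design (XᵀX = I), where both estimates have closed forms in terms of the least-squares coefficients. Under the objectives RSS + λΣ|β| and RSS + λΣβ², the lasso soft-thresholds each coefficient at λ/2, while ridge scales every coefficient by 1/(1 + λ). This is a simplification: with correlated predictors such as Boston's, neither estimate reduces to these formulas. The following minimal Python/NumPy sketch uses hypothetical coefficient values.

```python
import numpy as np

def lasso_orthonormal(beta_ls, lam):
    """Lasso estimate when X^T X = I, for RSS + lam * sum|beta|.
    Soft-thresholding: any coefficient with |beta_ls| <= lam/2 becomes exactly 0."""
    beta_ls = np.asarray(beta_ls, dtype=float)
    return np.sign(beta_ls) * np.maximum(np.abs(beta_ls) - lam / 2.0, 0.0)

def ridge_orthonormal(beta_ls, lam):
    """Ridge estimate when X^T X = I, for RSS + lam * sum(beta^2).
    Proportional shrinkage: every coefficient is scaled by 1/(1 + lam); none reaches 0."""
    return np.asarray(beta_ls, dtype=float) / (1.0 + lam)

beta_ls = np.array([3.0, -1.5, 0.4, -0.1])   # hypothetical least-squares coefficients
print(lasso_orthonormal(beta_ls, lam=1.0))   # values 2.5, -1.0, 0, 0
print(ridge_orthonormal(beta_ls, lam=1.0))   # values 1.5, -0.75, 0.2, -0.05
```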
10-fold cross-validation for λ selection — balancing reliability and cost
The penalty parameter λ controls the strength of regularization in both Lasso and Ridge: too small and the model behaves like LSLR; too large and the coefficients are over-shrunk toward zero. Cross-validation provides an empirical estimate of test error as a function of λ and selects the value that minimizes out-of-sample prediction error without requiring a separate holdout set. With 506 observations, 10-fold CV provides a reasonable balance: each held-out fold contains 50 or 51 records, enough for a stable error estimate, while keeping computation tractable across the range of λ values evaluated.
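The selection procedure can be sketched in Python/NumPy, with ridge as the example model because its closed-form solution keeps the loop short; the same loop applies to the lasso with a different inner fit. This is a minimal sketch, not `cv.glmnet`: it uses the textbook penalty scale RSS + λ·‖β‖², reuses one random fold assignment for every λ so that the candidates are compared on identical splits, and runs on synthetic data. All function names are hypothetical.

```python
import numpy as np

def kfold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and split into k nearly equal folds (sizes differ by at most 1)."""
    perm = np.random.default_rng(seed).permutation(n)
    return np.array_split(perm, k)

def ridge_fit(X, y, lam):
    """Intercept and slopes minimizing RSS + lam * ||b||^2 (intercept unpenalized)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    b = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ b, b

def cv_mse(X, y, lam, folds):
    """Mean squared prediction error on held-out rows, pooled over all folds."""
    sq_err = np.empty(len(y))
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        b0, b = ridge_fit(X[train], y[train], lam)
        sq_err[test_idx] = (y[test_idx] - b0 - X[test_idx] @ b) ** 2
    return sq_err.mean()

def select_lambda(X, y, lambdas, k=10, seed=0):
    """Return (lambda with minimum CV-MSE, CV-MSE for every lambda)."""
    folds = kfold_indices(len(y), k, seed)
    errs = np.array([cv_mse(X, y, lam, folds) for lam in lambdas])
    return lambdas[int(np.argmin(errs))], errs

# Synthetic stand-in with the same shape as the Boston data: 506 rows, 13 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(506, 13))
y = X @ rng.normal(size=13) + 5.0 * rng.normal(size=506)
lambdas = np.array([0.0, 1.0, 5.0, 25.0, 100.0, 500.0])
best, errs = select_lambda(X, y, lambdas)
print(best, np.round(errs, 2))
```

With n = 506 and k = 10, `np.array_split` produces six folds of 51 rows and four of 50.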
Tech Stack
| Technology | Purpose |
|---|---|
| R | Statistical modeling environment and primary implementation language |
| glmnet (R package) | Lasso and Ridge regression with cross-validated λ selection |
| LSLR | Ordinary least-squares baseline — no regularization, all 13 predictors retained |
| 10-Fold Cross-Validation | Empirical λ selection for both Lasso and Ridge; MSE evaluation for all three methods |
Results & Metrics
What the analysis reveals
23.63
Best MSE — Lasso
At λ=0.0337 with 11 of 13 predictors — lowest MSE and most parsimonious model
0.26
MSE Spread Across Methods
Lasso 23.63 · LSLR 23.71 · Ridge 23.89 · all three methods within 0.26 MSE
2
Predictors Excluded by Lasso
INDUS and AGE zeroed out — least discriminative predictors of median home value
Lasso achieves the best MSE with the fewest predictors and smallest coefficients
At λ=0.0337, Lasso achieves an MSE of 23.63, the lowest of the three methods, while retaining only 11 of 13 predictors and producing the smallest coefficient magnitudes (sum of squared coefficients: 284.3, versus 302.5 for Ridge and 340.9 for LSLR). This combination of best MSE, fewest variables, and smallest coefficients makes Lasso the preferred choice whenever both predictive performance and model interpretability are evaluation criteria, even though its MSE margin over LSLR is only 0.08. The coefficient path plot confirms that as the constraint on absolute coefficient values tightens, more predictors are progressively zeroed out.
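A coefficient path like the one described can be traced by solving the lasso over a decreasing grid of λ values and reusing each solution as the starting point for the next, which is the warm-start strategy glmnet is built around. The following is a minimal Python/NumPy sketch on synthetic data (not the Boston records), assuming standardized predictors, a centered response, and the (1/2n)·RSS + λ·Σ|β| scaling; `lasso_path` is a hypothetical name. The grid starts at λ_max = max_j |x_jᵀy|/n, the smallest λ at which every coefficient is zero.

```python
import numpy as np

def lasso_path(X, y, n_lambdas=20, eps=1e-3, n_iter=1000, tol=1e-9):
    """Warm-started lasso path for (1/(2n))||y - Xb||^2 + lam * ||b||_1.

    Assumes standardized X columns (mean 0, mean square 1) and centered y.
    Returns the decreasing lambda grid and one coefficient vector per lambda.
    """
    n, p = X.shape
    lam_max = np.max(np.abs(X.T @ y)) / n                          # all coefficients zero here
    lambdas = lam_max * np.logspace(0, np.log10(eps), n_lambdas)  # lam_max down to eps*lam_max
    b = np.zeros(p)
    path = []
    for lam in lambdas:
        r = y - X @ b                      # warm start from the previous solution
        for _ in range(n_iter):
            max_change = 0.0
            for j in range(p):
                rho = X[:, j] @ r / n + b[j]
                new = np.sign(rho) * max(abs(rho) - lam, 0.0)  # soft-thresholding
                delta = new - b[j]
                if delta != 0.0:
                    r -= delta * X[:, j]
                    b[j] = new
                    max_change = max(max_change, abs(delta))
            if max_change < tol:
                break
        path.append(b.copy())
    return lambdas, np.array(path)

# Synthetic data: five predictors with decreasing true effects.
rng = np.random.default_rng(3)
n = 506
X = rng.normal(size=(n, 5))
X = (X - X.mean(axis=0)) / X.std(axis=0)
y = X @ np.array([3.0, 1.5, 0.8, 0.4, 0.2]) + rng.normal(size=n)
y = y - y.mean()
lambdas, path = lasso_path(X, y)
nonzero = (path != 0).sum(axis=1)
for lam, k in zip(lambdas, nonzero):
    print(f"lambda={lam:.4f}  nonzero coefficients={k}")
```

Read from the bottom of the printout up, the number of nonzero coefficients falls as λ grows, which is the pattern the coefficient path plot shows.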
INDUS and AGE contribute negligible independent signal to median house value
Lasso's exclusion of INDUS (proportion of non-retail business acres) and AGE (proportion of owner-occupied units built before 1940) is interpretable given the correlation structure: INDUS is highly correlated with NOX and TAX, both retained in the model, and AGE is strongly correlated with NOX and, negatively, with DIS. Once those correlated predictors are included, INDUS and AGE add largely redundant information. Their removal does not increase cross-validated MSE, which suggests that the 11-predictor Lasso model captures essentially all the predictive signal available in the full 13-predictor set.
NOX and RM are the strongest predictors across all three models
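Across LSLR, Lasso, and Ridge, the NOX coefficient (nitric oxide concentration) and the RM coefficient (average number of rooms per dwelling) are consistently among the largest in absolute magnitude. In the Lasso model, NOX carries a coefficient of −16.09 and RM a coefficient of +3.88, reflecting the strong negative association of pollution exposure and the strong positive association of living space with median home value. Because these coefficients are expressed in each predictor's original units, their raw magnitudes are not directly comparable across predictors. NOX is measured in parts per 10 million and varies over a narrow range, which inflates its raw coefficient: NOX alone contributes 16.09² ≈ 258.9 of the Lasso model's 284.3 sum of squared coefficients. These findings align with housing economics theory: air quality and living space are among the most strongly capitalized amenities in residential property markets.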
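One common way to compare effect sizes across predictors measured in different units is to multiply each slope by its predictor's standard deviation, giving the change in predicted value for a one-standard-deviation increase. The analysis does not report such per-SD effects, so this is an assumption about a useful follow-up rather than a description of what was done. The following minimal Python/NumPy sketch, with hypothetical helper names and synthetic data, shows that re-expressing a predictor in different units rescales its raw coefficient but leaves its per-SD effect unchanged.

```python
import numpy as np

def ols_coefs(X, y):
    """OLS slopes (an intercept is fitted, then dropped)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta[1:]

def per_sd_effects(X, y):
    """Change in predicted y for a one-standard-deviation increase in each predictor."""
    return ols_coefs(X, y) * X.std(axis=0)

# Synthetic data: two predictors with equal true effects.
rng = np.random.default_rng(2)
X = rng.normal(size=(506, 2))
y = 1.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=506)

# Same first variable, expressed in units 10x larger (so its values are 10x smaller).
X_rescaled = X.copy()
X_rescaled[:, 0] /= 10.0

print("raw slopes:      ", ols_coefs(X, y), ols_coefs(X_rescaled, y))   # first slope grows 10x
print("per-SD effects:  ", per_sd_effects(X, y), per_sd_effects(X_rescaled, y))  # unchanged
```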
All three methods are within 0.26 MSE: LSLR is acceptable for this dataset
The near-identical MSE values across all three methods (23.63 to 23.89) suggest that regularization provides minimal predictive benefit for a 506-instance dataset with moderate collinearity. With 506 observations for 13 predictors (roughly 39 per predictor), LSLR does not severely overfit, and the collinearity, while present, does not push coefficient variance high enough to meaningfully degrade out-of-sample performance. This is an important nuance: multicollinearity motivates regularization as a precaution, but does not guarantee it will produce a meaningfully better model.
Ridge shrinks all coefficients but cannot match Lasso's interpretability advantage
Ridge at λ=5 has 12.58 effective degrees of freedom, nearly the full 13-predictor model, indicating that the L2 penalty provides only light shrinkage at this optimal λ. Its sum of squared coefficients (302.5) is lower than LSLR's (340.9) but higher than Lasso's (284.3), and its MSE (23.89) is the worst of the three methods. Ridge is most useful when all predictors are genuinely informative and none should be excluded, a condition not well supported by this dataset, where INDUS and AGE appear redundant. For this specific problem, Ridge offers neither a predictive advantage over LSLR nor the parsimony advantage of Lasso.
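Ridge's effective degrees of freedom can be computed from the singular values of the centered predictor matrix, df(λ) = Σ d_j²/(d_j² + λ), which equals the trace of the ridge hat matrix. The following minimal Python/NumPy sketch uses synthetic data and the textbook penalty RSS + λ·‖β‖². Because glmnet uses a different λ scale and standardizes predictors, plugging λ = 5 into this sketch is not expected to reproduce the reported 12.58; the sketch only illustrates how the quantity behaves. `ridge_effective_df` is a hypothetical name.

```python
import numpy as np

def ridge_effective_df(X, lam):
    """Effective degrees of freedom of ridge regression (textbook penalty scale).

    df(lam) = sum_j d_j^2 / (d_j^2 + lam), where d_j are the singular values of
    the column-centered predictor matrix (the intercept is not penalized).
    """
    Xc = X - X.mean(axis=0)
    d = np.linalg.svd(Xc, compute_uv=False)
    return float(np.sum(d ** 2 / (d ** 2 + lam)))

rng = np.random.default_rng(0)
X = rng.normal(size=(506, 13))   # synthetic stand-in, not the Boston predictors
for lam in [0.0, 5.0, 50.0, 500.0, 5000.0]:
    print(lam, round(ridge_effective_df(X, lam), 2))   # 13 at lam = 0, then decreasing
```

A value just below 13, like the 12.58 reported above, means the penalty removes well under one degree of freedom's worth of flexibility.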