Applied ML · ML Tooling

OptiTree: Tree Model Benchmarking

A model benchmarking and optimization framework that replaces grid and random search with Particle Swarm Optimization — comparing seven tree-based classifiers under automated hyperparameter tuning.

Architecture: PSO · Multi-Model Benchmarking
Tech Stack: Python · scikit-learn · MLflow · hyperparameter-optimizer · XGBoost · LightGBM · CatBoost

7

Tree-Based Classifiers Benchmarked

PSO

Particle Swarm Optimization — Replaces Grid & Random Search

MLflow

Full Experiment Tracking — Parameters, Metrics & Artifacts

The Problem

Selecting and tuning the right tree-based classifier for a classification task requires searching a large, irregular hyperparameter space — a process that grid search handles poorly and random search handles inconsistently

Tree-based models, from single Decision Trees to gradient boosting ensembles such as XGBoost, LightGBM, and CatBoost, each expose hyperparameter spaces that are high-dimensional, irregularly shaped, and full of non-obvious interactions between parameters. The number of trees, learning rate, maximum depth, subsampling ratio, and regularization terms jointly determine performance in ways that no exhaustive grid can efficiently cover. Grid search scales exponentially with the number of parameters and spends evaluations on unpromising regions; random search improves coverage but has no convergence mechanism, because it samples points independently without learning from previous results. As a result, practitioners either under-optimize their models by relying on defaults or spend substantial compute on search strategies that do not move toward the best configurations. There is also no standard framework for comparing multiple model families under equally rigorous tuning, which makes it hard to know whether the chosen model won a fair benchmark or simply had defaults that happened to suit the data.

The Solution

A unified benchmarking and optimization suite that applies PSO-based hyperparameter search to seven tree-based classifiers in a single pipeline — with MLflow experiment tracking and joblib serialization for fully reproducible, comparable results

OptiTree integrates the custom hyperparameter-optimizer library — which implements Particle Swarm Optimization (PSO) and Pattern Search — as a drop-in replacement for scikit-learn's GridSearchCV or RandomizedSearchCV. Each of the seven classifiers (Decision Tree, Random Forest, AdaBoost, Gradient Boosting, LightGBM, XGBoost, CatBoost) is tuned under the same PSO-based search, ensuring that hyperparameter optimization quality is consistent across model families. PSO maintains a swarm of candidate configurations that converge iteratively toward optimal regions of the search space — learning from past evaluations rather than sampling blindly. Every experiment run is automatically logged to MLflow: parameters, metrics, and model artifacts are captured for each configuration, enabling post-hoc comparison across models and runs without manual bookkeeping. The best-performing model from each run is serialized with joblib for immediate downstream use or deployment, completing a pipeline from raw data through tuned, production-ready model in a single reproducible execution.

Key Outcome

A production-ready benchmarking suite that identifies the best-performing tree-based classifier for any structured classification task — with PSO-tuned hyperparameters consistently outperforming default settings, all experiments logged to MLflow for reproducible comparison, and the winning model serialized with joblib for immediate deployment.

Technical Deep Dive

Architecture & Design

Benchmarking Pipeline

Stage 1 — Data Ingestion & Preprocessing

Input

Structured Classification Dataset

pandas loading · Feature engineering · Train/test split · Preprocessing applied consistently across all models

Design Principle

Unified Preprocessing for Fair Comparison

All 7 classifiers trained on identical preprocessed data · No model-specific data treatment
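A minimal sketch of the shared-data principle, assuming a synthetic dataset in place of the real pandas-loaded table and only two of the seven classifiers for brevity. The variable names and the 75/25 split are illustrative choices, not taken from the OptiTree source.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the structured dataset (assumption: the real
# pipeline loads and preprocesses a table with pandas instead).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# One stratified split, made once, with a fixed seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Every model receives exactly the same arrays; no model-specific treatment.
models = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}
test_accuracy = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}
```

Because the split happens before the model loop, any difference in `test_accuracy` comes from the models themselves rather than from the data each one saw.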

Stage 2 — PSO Hyperparameter Optimization · hyperparameter-optimizer

Optimizer

Particle Swarm Optimization

Swarm of candidate configs · Iterative convergence · Faster than grid search · More consistent than random search

Compatibility

scikit-learn Drop-In API

Replaces GridSearchCV or RandomizedSearchCV · Works with any scikit-learn-compatible estimator

Models Tuned

7 Classifiers · Same Search Quality

Decision Tree · Random Forest · AdaBoost · Gradient Boosting · LightGBM · XGBoost · CatBoost

Stage 3 — Experiment Tracking · MLflow

Logged Per Run

Parameters · Metrics · Artifacts

Every PSO configuration evaluated is tracked · Full audit trail for post-hoc comparison across all 7 models

Reproducibility

Consistent Re-run & Comparison

Any experiment can be re-run or compared · No manual logging required · MLflow UI for visual inspection

Stage 4 — Model Serialization & Output · joblib

Output

Best Model Serialized — Ready for Downstream Use or Deployment

PSO-tuned winner persisted with joblib · Loadable for inference, further evaluation, or pipeline integration · Versioned alongside MLflow run

Stage 1 & 2

PSO-Driven Hyperparameter Optimization

The hyperparameter-optimizer library provides PSO and Pattern Search as drop-in replacements for scikit-learn's search utilities. PSO initializes a swarm of candidate hyperparameter configurations and iterates: each particle moves toward both its own historical best position and the swarm's global best, so the swarm converges on high-performing regions of the search space without evaluating every point. Because the number of evaluations is set by swarm size and iteration count rather than by the size of a grid, this search reaches competitive configurations with far fewer evaluations than grid search in the high-dimensional hyperparameter spaces of gradient boosting models (which expose learning rate, max depth, subsample, column sample, regularization terms, and tree count simultaneously). All seven classifiers are tuned with the same PSO procedure, so no model is disadvantaged by weaker tuning and the benchmark reflects algorithmic differences rather than differences in tuning effort.
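The particle update described above can be sketched in a few lines of standard-library Python. This is a generic global-best PSO written for illustration, not the hyperparameter-optimizer library's implementation; the function name, the coefficient values, and the toy objective (a stand-in for a cross-validation score) are all assumptions.

```python
import random


def pso_maximize(objective, bounds, n_particles=12, n_iterations=30,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Maximize objective(position) inside box bounds with a basic global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Random starting positions inside the bounds, zero starting velocity.
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    personal_best = [p[:] for p in positions]
    personal_score = [objective(p) for p in positions]
    best_idx = max(range(n_particles), key=personal_score.__getitem__)
    global_best = personal_best[best_idx][:]
    global_score = personal_score[best_idx]

    for _ in range(n_iterations):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Pull toward the particle's own best and the swarm's best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_personal * r1 * (personal_best[i][d] - positions[i][d])
                    + c_global * r2 * (global_best[d] - positions[i][d])
                )
                # Move, then clamp to the search box.
                positions[i][d] = min(hi, max(lo, positions[i][d] + velocities[i][d]))
            score = objective(positions[i])
            if score > personal_score[i]:
                personal_best[i], personal_score[i] = positions[i][:], score
                if score > global_score:
                    global_best, global_score = positions[i][:], score
    return global_best, global_score


def toy_score(position):
    """Toy stand-in for a CV score; peaks at learning_rate=0.1, max_depth=6."""
    learning_rate, max_depth = position
    return -((learning_rate - 0.1) ** 2) - ((max_depth - 6.0) / 10.0) ** 2


best_position, best_score = pso_maximize(
    toy_score, bounds=[(0.01, 0.5), (1.0, 12.0)]
)
```

In a real tuning run the objective would fit the estimator with the decoded hyperparameters and return its mean cross-validation accuracy, and integer parameters such as depth would be rounded before use.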

Stage 3

MLflow Experiment Tracking

Every PSO evaluation, not just the final best configuration, is logged to MLflow as a named run that captures the full hyperparameter vector, the cross-validation accuracy, and any additional metrics. This creates a complete audit trail of the search for each of the seven model families and supports post-hoc analysis: which hyperparameter regions produced the best results, how performance varied across the search, and which models were consistently strong versus sensitive to tuning. The MLflow UI allows visual comparison across runs without manual spreadsheet management, and the MLflow experiment serves as a single source of truth for all benchmarking results that any team member can share and reproduce.
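One way to log each evaluation as its own run is sketched below. The helper names, the experiment name `optitree-benchmark`, and the example scores are hypothetical; only the `mlflow.set_experiment`, `mlflow.start_run`, `mlflow.log_params`, and `mlflow.log_metrics` calls are MLflow's own API.

```python
from statistics import mean


def build_run_record(model_name, params, cv_scores):
    """Collect what one PSO evaluation would send to MLflow."""
    return {
        "run_name": f"{model_name}-pso",
        "params": {"model": model_name, **params},
        "metrics": {
            "cv_accuracy_mean": mean(cv_scores),
            "cv_accuracy_min": min(cv_scores),
        },
    }


def log_run(record, experiment="optitree-benchmark"):
    """Write one record to MLflow as its own named run."""
    # Imported here so build_run_record stays usable without MLflow installed.
    import mlflow

    mlflow.set_experiment(experiment)
    with mlflow.start_run(run_name=record["run_name"]):
        mlflow.log_params(record["params"])
        mlflow.log_metrics(record["metrics"])


# Illustrative values only; these are not benchmark results.
record = build_run_record(
    "xgboost", {"learning_rate": 0.1, "max_depth": 6}, cv_scores=[0.90, 0.92, 0.94]
)
# log_run(record)  # call where an MLflow tracking store is available
```

Keeping the record construction separate from the logging call makes the logged fields easy to inspect or unit-test without touching a tracking store.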

Stage 4

Model Serialization & Downstream Use

The best-performing model identified by PSO is serialized with joblib at the end of each run, creating a deployment-ready artifact that can be loaded directly for inference, integrated into a larger ML pipeline, or versioned alongside the MLflow run that produced it. joblib builds on pickle and is optimized for Python objects that carry large NumPy arrays, which is how scikit-learn stores the trees of ensembles such as Random Forest; it also supports optional compression, which helps when a model contains hundreds or thousands of trees. XGBoost and LightGBM models can be saved the same way through their scikit-learn wrappers. The combination of MLflow tracking and joblib serialization means every OptiTree run produces both a traceable experimental record and a production-usable model, closing the gap between benchmarking and deployment without additional engineering effort.
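A short round-trip sketch of the serialization step, assuming a small Random Forest on synthetic data as the "winning" model. The file name `best_model.joblib` and the compression level are illustrative choices.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in for the tuned winner of a benchmarking run.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
model = RandomForestClassifier(n_estimators=25, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "best_model.joblib")
    joblib.dump(model, path, compress=3)  # compress trades speed for file size
    restored = joblib.load(path)

# The restored model should predict exactly as the original did.
same_predictions = bool((restored.predict(X) == model.predict(X)).all())
```

A joblib file is best loaded with the same library versions that wrote it, so recording those versions next to the artifact is a sensible habit.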

Key Design Decisions

PSO over grid and random search — convergence-driven exploration outperforms blind sampling

Grid search scales exponentially with the number of hyperparameters: a 5-parameter grid with 5 values each contains 5^5 = 3,125 configurations, every one of which must be evaluated (and each evaluation is repeated per cross-validation fold) before the search is complete. Random search improves coverage but samples independently without learning from previous evaluations, so it can keep spending compute on regions that earlier samples already showed to be unproductive. PSO treats each configuration as a particle that moves through the search space guided by both its own best historical result and the swarm's best, converging toward high-performing regions iteratively. For the complex, correlated hyperparameter landscapes of gradient boosting models (where learning rate and tree count interact, and regularization terms interact with depth), PSO's guided convergence reaches competitive configurations in a fraction of the evaluations that grid search would require.
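The evaluation counts can be worked through directly. The grid dimensions come from the paragraph above; the fold count, swarm size, and iteration count are hypothetical values chosen only to make the comparison concrete.

```python
from math import prod

# Grid from the text: 5 hyperparameters with 5 candidate values each.
values_per_parameter = [5, 5, 5, 5, 5]
cv_folds = 5  # assumption: 5-fold cross-validation

grid_configs = prod(values_per_parameter)   # 5**5 = 3125 configurations
grid_fits = grid_configs * cv_folds         # 15625 model fits

# Hypothetical PSO budget: one initial evaluation per particle,
# then one evaluation per particle per iteration.
n_particles, n_iterations = 20, 15
pso_configs = n_particles * (n_iterations + 1)  # 320 configurations
pso_fits = pso_configs * cv_folds               # 1600 model fits

print(f"grid: {grid_configs} configs, {grid_fits} fits")
print(f"pso:  {pso_configs} configs, {pso_fits} fits")
```

The PSO budget is fixed by the user rather than by the size of the space, which is where the savings come from; a smaller budget does not guarantee that the grid's best point is found.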

Seven model families benchmarked under equal tuning conditions

A common failure mode in model comparison is that one algorithm is tuned carefully while the others run on defaults, so the winner's advantage is an artifact of tuning investment rather than genuine algorithmic superiority. OptiTree applies the same PSO search procedure to all seven classifiers, so Decision Trees, AdaBoost, and scikit-learn's Gradient Boosting receive the same hyperparameter search effort as XGBoost, LightGBM, and CatBoost. The result is a benchmark where performance differences reflect how each algorithm behaves on the data, not accidental advantages from better defaults or more careful manual tuning of some models over others.

MLflow + joblib together close the loop from benchmarking to deployment

Experiment tracking and model persistence are often treated as separate concerns: a tracking system logs metrics but doesn't produce deployment artifacts, and a serialization step produces the artifact but doesn't connect it back to the experiment that generated it. OptiTree integrates both in the same pipeline: MLflow captures the full experimental context (which hyperparameters, which cross-validation scores, which data split), and joblib serializes the winning model from that exact context. Any serialized model can be traced back to its MLflow run, and any MLflow run can be re-executed with the logged parameters and random state to regenerate the model, giving a reproducible, auditable pipeline from search through deployment.

Tech Stack

Technology | Purpose
scikit-learn | Decision Tree, Random Forest, AdaBoost, Gradient Boosting; base classifiers and cross-validation utilities
LightGBM | Gradient boosting with histogram-based tree building; optimized for large datasets and fast training
XGBoost | Extreme gradient boosting with L1/L2 regularization and column subsampling for improved generalization
CatBoost | Gradient boosting with native categorical feature support; no manual encoding required
hyperparameter-optimizer | Custom PSO and Pattern Search library; drop-in replacement for GridSearchCV and RandomizedSearchCV
MLflow | Experiment tracking; logs parameters, metrics, and model artifacts for every PSO evaluation across all runs
joblib | Model serialization; persists the PSO-optimized winning model for downstream deployment or pipeline integration
pandas | Data loading, preprocessing, and feature transformation prior to model training

Results & Metrics

What the system delivers

7

Classifiers Benchmarked

Decision Tree, Random Forest, AdaBoost, Gradient Boosting, LightGBM, XGBoost, CatBoost — all PSO-tuned

PSO

Smarter Search

Convergence-driven tuning consistently outperforms default hyperparameters across all seven model families

Full

Experiment Traceability

Every PSO run logged to MLflow — parameters, metrics, and artifacts for every configuration evaluated

🎯

PSO-tuned hyperparameters consistently outperform defaults across all seven classifiers

Default hyperparameters are chosen by library authors to work reasonably across a broad range of problems; they are not optimized for any specific dataset. PSO search targets the hyperparameter configuration that maximizes cross-validation accuracy for the specific data at hand. Across all seven model families, the PSO-tuned configurations outperform their respective defaults, with gradient boosting variants (LightGBM, XGBoost, CatBoost) typically showing the largest gains, as their larger hyperparameter spaces provide more room for optimization. The improvement is consistent because PSO's search is directed: the swarm concentrates evaluations around configurations that have already scored well instead of relying on an independent random sample landing on a better setting.

📊

MLflow provides a complete, queryable record of every experiment without manual bookkeeping

Without experiment tracking, benchmarking results live in notebooks, spreadsheets, or memory — making it impossible to reliably reproduce a result, compare a new model against historical runs, or audit which configuration produced a given deployed model. MLflow eliminates this problem by automatically capturing the full context of every OptiTree run: the model class, the PSO-found hyperparameters, the cross-validation accuracy, and the serialized model artifact. The MLflow UI renders these as a sortable, filterable table — enabling instant identification of the best-performing configuration across all seven models and all historical runs without any manual record-keeping.

🔁

Any run is fully reproducible — same data, same PSO seed, same logged result

Reproducibility is a systemic challenge in ML benchmarking: different random seeds, different library versions, or undocumented preprocessing steps can cause results to drift between runs. OptiTree addresses this by logging all parameters that affect the result — including PSO configuration, random state, and model hyperparameters — as MLflow run metadata. Any experiment can be re-executed with identical parameters to verify or extend the result, and the logged artifacts allow the exact model from any historical run to be loaded without retraining. This makes OptiTree suitable not just for one-time benchmarking but as a repeatable, auditable component of a model selection workflow.
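The role of the logged seed can be illustrated with a small standard-library sketch. The `init_swarm` helper and the `run_config` keys are hypothetical names; the point is only that a swarm drawn from a seeded generator can be replayed exactly from the logged metadata.

```python
import platform
import random


def init_swarm(bounds, n_particles, seed):
    """Draw initial particle positions from a generator seeded only by `seed`."""
    rng = random.Random(seed)
    return [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]


# Everything that influences the result goes into one record that can be logged.
run_config = {
    "pso_seed": 42,
    "n_particles": 4,
    "bounds": [(0.01, 0.5), (1.0, 12.0)],
    "python_version": platform.python_version(),
}

first = init_swarm(run_config["bounds"], run_config["n_particles"], run_config["pso_seed"])
replay = init_swarm(run_config["bounds"], run_config["n_particles"], run_config["pso_seed"])
other_seed = init_swarm(run_config["bounds"], run_config["n_particles"], seed=43)

print(first == replay)      # same seed, same starting swarm
print(first == other_seed)  # a different seed starts elsewhere
```

Using a dedicated `random.Random(seed)` instance instead of the module-level generator keeps the search independent of any other code that draws random numbers in the same process.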

🚀

Best model is immediately deployment-ready — no gap between benchmarking and production use

The joblib-serialized winning model produced by each OptiTree run is a complete inference artifact: any Python process with the matching dependencies installed (ideally the same library versions used for training) can load it and generate predictions immediately, without retraining or reconfiguration. This removes the common gap between the benchmarking environment (a Jupyter notebook with manually chosen settings) and the production environment (a service or pipeline that needs a stable, versioned model). The serialized model is linked to its MLflow run, providing a full chain of custody from data through hyperparameter search through deployment-ready artifact.

🔧

Extensible to any scikit-learn-compatible classifier — not limited to the seven included models

The hyperparameter-optimizer library is designed as a drop-in replacement for GridSearchCV and RandomizedSearchCV — it works with any estimator that follows the scikit-learn fit/predict API. OptiTree's pipeline structure means that adding a new model to the benchmark requires only defining the hyperparameter search space and the estimator class; the PSO search, MLflow logging, and joblib serialization all apply automatically. This makes the framework extensible to logistic regression, SVMs, neural network wrappers, or any future tree-based algorithm that follows the scikit-learn interface — without changes to the benchmarking infrastructure.
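The "estimator class plus search space" idea could look like the registry below. `ModelSpec`, `REGISTRY`, and `build_estimator` are hypothetical names for illustration, and the bounds are arbitrary; OptiTree's actual structure may differ.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

from sklearn.base import BaseEstimator
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier


@dataclass(frozen=True)
class ModelSpec:
    """Hypothetical registry entry: how to build a model and where to search."""
    estimator_class: Callable[..., BaseEstimator]
    search_space: Dict[str, Tuple[float, float]]  # parameter name -> (low, high)


REGISTRY: Dict[str, ModelSpec] = {
    "decision_tree": ModelSpec(
        DecisionTreeClassifier, {"max_depth": (1, 20), "min_samples_leaf": (1, 50)}
    ),
}

# Extending the benchmark is one new entry; the rest of the pipeline is untouched.
REGISTRY["logistic_regression"] = ModelSpec(LogisticRegression, {"C": (0.01, 100.0)})


def build_estimator(name: str, **params) -> BaseEstimator:
    """Instantiate a registered model, rejecting parameter names it does not declare."""
    spec = REGISTRY[name]
    unknown = set(params) - set(spec.search_space)
    if unknown:
        raise ValueError(f"{name}: parameters not in search space: {sorted(unknown)}")
    return spec.estimator_class(**params)


candidate = build_estimator("logistic_regression", C=1.0)
```

A search loop only needs `spec.search_space` to propose configurations and `build_estimator` to turn one into a model, so tracking and serialization code never has to know which model family it is handling.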