AgriGrain: Grain Type Classification
A multiclass classification pipeline benchmarking five algorithm families — Bagging, Random Forests, Boosting, MDA, and ANNs — across 13,611 grain records and 16 geometric shape features to automate grain type identification.
13,611
Grain Records · 7 Classes · 16 Shape Features
5
ML Algorithm Families Benchmarked
91.1%
Best Test Accuracy (RF & ANN)
The Problem
Manual grain type identification is slow, inconsistent, and unscalable — agricultural quality control needs a data-driven classification system that can reliably distinguish between grain varieties from geometric image features alone
Grain type identification is a critical step in agricultural quality control and food production — different grain varieties have different nutritional profiles, market values, and processing requirements. Traditional inspection relies on manual visual assessment, which is time-intensive, subject to human error, and impossible to scale across the volumes of grain processed in modern agricultural operations. Computer vision systems can extract rich geometric and shape-based features from grain images — perimeter, roundness, compactness, axis lengths, shape factors — but converting those features into reliable multiclass predictions across seven grain varieties requires a classification model that can handle the structural overlap and within-class variability that makes this a genuinely difficult problem. No single algorithm is guaranteed to handle multiclass grain classification optimally, and the right choice depends on which families of decision boundaries best match the feature geometry of each grain type.
The Solution
A five-algorithm multiclass classification benchmark in R, with 5-fold cross-validated hyperparameter tuning across Bagging, Random Forests, Boosting, MDA, and ANNs — evaluated on misclassification rate and ARI
AgriGrain implements a comprehensive classification pipeline across five algorithm families using 16 geometric shape features extracted from 13,611 grain images across 7 varieties. A stratified random sample of 1,364 observations is used for computational feasibility, split 75/25 into training and testing sets. All hyperparameters (number of trees, variables per split, interaction depth, shrinkage, hidden layer size, and weight decay) are tuned using 5-fold cross-validation, so each algorithm is evaluated at the best configuration found in its search rather than at package defaults. Bagging is tuned to 2,100 trees, Random Forests to 6 variables per split with 500 trees, Boosting to depth 3 with λ=0.05 and 200 trees, and ANNs to size 11 with decay 4. MDA fits a BIC-selected Gaussian mixture model per class. All five models are evaluated on misclassification rate and Adjusted Rand Index (ARI) across both training and testing sets, with variable importance analysis conducted for the tree-based methods.
Key Outcome
Random Forests and ANNs jointly achieve the best test accuracy at 91.1% (8.9% misclassification), and all five algorithms perform within 1.5 percentage points of each other on the test set. This suggests that shape-based geometric features provide a consistently strong signal for automated grain classification, largely independent of the modeling approach.
Technical Deep Dive
Architecture & Design
Classification Pipeline
Stage 1 — Data Setup & Sampling
Dataset
13,611 Grain Records
7 grain classes · 16 geometric shape features · No missing data
Sampling
Stratified Sample — 1,364 Records
Class proportions preserved · 75% train / 25% test split · 5-fold CV for tuning
Features
16 Geometric Shape Predictors
Area, Perimeter, Axis Lengths, Roundness, Compactness, Shape Factors · High inter-feature correlations noted
Stage 2 — Multi-Model Training & CV Tuning
Bagging
2,100 Trees
OOB error 8.87% · Bootstrap aggregation · adabag
Random Forests
500 Trees · 6 Vars/Split
OOB error 8.77% · Decorrelated trees · randomForest
Boosting
200 Trees · λ=0.05 · Depth 3
Sequential residual fitting · adabag
MDA
BIC-Selected Gaussian Mixture
Mixture model per class · Scaled inputs · mclust
ANN
Size 11 · Decay 4
Single hidden layer · Scaled inputs · Softmax output · nnet
Stage 3 — Evaluation & Variable Importance
Metrics
Misclassification Rate & ARI
Reported on both training and testing sets · Standard deviation across models computed to quantify consistency
Variable Importance
Mean Decrease Accuracy & Gini
Computed for Bagging, RF & Boosting · Heatmap visualization across methods
Result
RF & ANN Best on Test — 91.1% Accuracy · All Models Within 1.5%
Perimeter, Shape Factor 1, Compactness, Minor Axis Length, Major Axis Length identified as top predictors · Extent consistently least important
Stage 1
Data Setup & Stratified Sampling
The full dataset contains 13,611 grain records across 7 classes, with 16 continuous geometric shape features extracted from high-resolution grain images using computer vision feature extraction. Features include area, perimeter, major and minor axis lengths, aspect ratio, eccentricity, extent, roundness, compactness, and four shape factors, capturing both the size and the morphology of each grain. Because fitting five tuned models to all 13,611 observations is computationally expensive, a stratified random sample of 1,364 records (roughly 10%) is drawn, preserving the class proportions of each grain variety. The sample is split 75/25 into training and testing sets, and 5-fold cross-validation within the training set is used for all hyperparameter tuning decisions.
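The sampling and split can be sketched in Python. The project itself is written in R, so this is an illustration rather than its code. The sketch assumes proportional allocation with per-class rounding, followed by a second stratified 75/25 split; `stratified_sample`, `stratified_split`, and the class sizes in the usage lines are hypothetical.

```python
import random
from collections import defaultdict


def _group_by_class(indices, labels):
    """Map each class label to the list of indices that carry it."""
    groups = defaultdict(list)
    for i in indices:
        groups[labels[i]].append(i)
    return groups


def stratified_sample(labels, n_total, seed=0):
    """Draw about n_total indices so each class keeps its share of the data.

    Each class contributes round(n_total * class_share) items, sampled
    without replacement, so the total can differ from n_total by rounding.
    """
    rng = random.Random(seed)
    groups = _group_by_class(range(len(labels)), labels)
    chosen = []
    for label in sorted(groups):
        idx = groups[label]
        k = round(n_total * len(idx) / len(labels))
        chosen.extend(rng.sample(idx, k))
    return sorted(chosen)


def stratified_split(indices, labels, train_frac=0.75, seed=0):
    """Split sampled indices into train/test, stratifying by class again."""
    rng = random.Random(seed)
    groups = _group_by_class(indices, labels)
    train, test = [], []
    for label in sorted(groups):
        idx = list(groups[label])
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


# Hypothetical class sizes that sum to 13,611 (not the real class counts).
labels = ["A"] * 6000 + ["B"] * 4000 + ["C"] * 3611
sample = stratified_sample(labels, 1364, seed=1)
train_idx, test_idx = stratified_split(sample, labels, train_frac=0.75, seed=2)
print(len(sample), len(train_idx), len(test_idx))
```

Stratifying the split as well as the sample keeps each class's share fixed in both the training and testing sets, instead of leaving it to chance in the second step.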
Stage 2
Multi-Model Training & Hyperparameter Tuning
All five algorithms are trained with CV-optimized hyperparameters. Bagging (2,100 trees) and Random Forests (500 trees, 6 variables per split) both average trees grown on bootstrap samples to reduce variance; Random Forests additionally decorrelate the trees by considering only a random subset of features at each split. Boosting grows 200 shallow trees (depth 3) sequentially, each fitted to the current residuals and scaled by shrinkage λ=0.05, building a strong learner incrementally. MDA fits a BIC-selected Gaussian mixture model within each class, allowing each class density to take non-elliptical or even multimodal shapes and yielding nonlinear decision boundaries. The ANN uses a single hidden layer of 11 nodes with weight decay 4 and a Softmax output layer for 7-class prediction, trained on scaled inputs so that features with large magnitudes do not dominate the fit.
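The residual-fitting-with-shrinkage loop described for Boosting is the gradient boosting recipe. The Python sketch below shows it for squared-error regression on a single feature, with depth-1 stumps standing in for depth-3 trees. Multiclass boosting fits trees to the gradients of a softmax loss instead, so this is a simplified illustration under those assumptions, not the project's 7-class R model; `fit_stump` and `gradient_boost` are hypothetical helper names.

```python
def fit_stump(x, r):
    """Least-squares depth-1 tree: best single threshold on x for targets r."""
    xs = sorted(set(x))
    if len(xs) < 2:  # no possible split: predict the mean everywhere
        mean = sum(r) / len(r)
        return lambda v: mean
    best = None
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((ri - lm) ** 2 for ri in left) + sum((ri - rm) ** 2 for ri in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda v: lm if v <= t else rm


def gradient_boost(x, y, n_trees=200, shrinkage=0.05):
    """Squared-error gradient boosting: each stump is fitted to the current residuals."""
    base = sum(y) / len(y)  # start from the constant mean prediction
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_trees):
        # Residuals are the negative gradient of half the squared error.
        resid = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, resid)
        stumps.append(stump)
        # Shrinkage: add only a fraction of each new tree's prediction.
        pred = [pi + shrinkage * stump(xi) for pi, xi in zip(pred, x)]

    def predict(v):
        return base + shrinkage * sum(s(v) for s in stumps)

    return predict


# Toy step function: the target jumps from 0 to 10 halfway along x.
x = list(range(10))
y = [0.0] * 5 + [10.0] * 5
model = gradient_boost(x, y, n_trees=200, shrinkage=0.05)
print(round(model(0), 3), round(model(9), 3))
```

On this toy step function each round removes only 5% of the remaining residual, which illustrates the shrinkage trade-off: a small λ needs more trees but takes smaller, less greedy steps.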
Stage 3
Evaluation & Variable Importance Analysis
All models are evaluated on misclassification rate and ARI across both training and testing sets. The train/test gap is the key diagnostic for overfitting: Boosting shows the largest gap (0.2% training vs 9.5% test), indicating more variance than the averaging-based ensembles or the ANN. Variable importance is analyzed for Bagging, Random Forests, and Boosting using mean decrease in accuracy and mean decrease in Gini impurity, and visualized as a cross-model heatmap. Perimeter, Shape Factor 1, Compactness, Minor Axis Length, and Major Axis Length emerge as the most consistently important predictors. Extent is the least important variable across all three methods, a result confirmed by both the accuracy-based and Gini-based importance measures.
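Mean decrease in accuracy is a permutation importance: shuffle one feature, re-score the model, and record how much accuracy falls. randomForest computes it per tree on out-of-bag samples and averages over trees; the Python sketch below is a simplified holdout version. `permutation_importance` is a hypothetical helper that accepts any `predict(row)` function, with rows stored as lists. Gini (impurity-based) importance is not shown.

```python
import random


def accuracy(predict, X, y):
    """Fraction of rows whose prediction matches the label."""
    return sum(predict(row) == yi for row, yi in zip(X, y)) / len(y)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled.

    Shuffling breaks the link between one feature and the labels while
    keeping that feature's marginal distribution; a large drop means the
    model relies on the feature.
    """
    rng = random.Random(seed)
    base = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            # Rebuild each row with column j replaced by the shuffled values.
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            drops.append(base - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A feature the model never uses scores exactly zero. With strongly correlated inputs, such as the size-related shape features here, importance can be spread across, or masked by, the correlated columns.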
Key Design Decisions
Stratified sampling preserves class distribution at scale
Simple random sampling preserves class proportions only on average; with unequal class sizes, any single draw can underrepresent the smaller classes, leaving too few examples of rarer grain varieties for training and tuning. Stratified sampling ensures that each of the 7 grain classes appears in the 1,364-record sample in the same proportions (up to rounding) as in the full 13,611-record dataset. Provided the 75/25 split is stratified as well, the training and testing sets reflect the true class distribution of the problem, which makes the evaluation metrics meaningful and ensures every class is represented during hyperparameter tuning. Stratification preserves the original class imbalance rather than correcting it.
5-fold cross-validation applied consistently across all five algorithms
Using the same cross-validation strategy across all five models ensures that hyperparameter selection is conducted on a level playing field: no algorithm benefits from more favorable tuning conditions than another. 5-fold CV balances the statistical reliability of the error estimate against the computational cost of running five complex models over multiple hyperparameter configurations. Because cross-validation is run only on the training portion, the final test set remains a genuine holdout, untouched during any tuning step, so the reported misclassification rates and ARI values are honest estimates of out-of-sample performance.
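A Python sketch of tuning with k-fold CV confined to the training set follows. It assumes a generic `fit(params, X, y)` that returns a `predict(row)` function; `k_fold_indices`, `cv_error`, and `tune` are hypothetical helpers, not caret functions. Reusing one seed gives every candidate, and every algorithm, the same folds.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cv_error(fit, X, y, params, k=5, seed=0):
    """Average validation misclassification rate of fit(params, ...) over k folds."""
    folds = k_fold_indices(len(y), k, seed)
    errors = []
    for fold in folds:
        val = set(fold)
        tr = [i for i in range(len(y)) if i not in val]
        model = fit(params, [X[i] for i in tr], [y[i] for i in tr])
        wrong = sum(model(X[i]) != y[i] for i in val)
        errors.append(wrong / len(val))
    return sum(errors) / k


def tune(fit, X_train, y_train, grid, k=5, seed=0):
    """Return the grid point with the lowest CV error, using training data only."""
    scores = {p: cv_error(fit, X_train, y_train, p, k, seed) for p in grid}
    best = min(scores, key=scores.get)
    return best, scores
```

The test set is never passed to `tune`; only the selected configuration is refit on the full training set and then scored once on the holdout.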
Five algorithm families span a broad range of inductive biases
Bagging reduces variance by averaging trees grown on bootstrap samples, and Random Forests reduce it further by decorrelating those trees. Boosting reduces bias through sequential residual fitting. MDA uses a probabilistic generative model that can capture non-elliptical class shapes. ANNs learn nonlinear combinations of the input features through their hidden layer. By benchmarking all five families rather than selecting a single approach upfront, the pipeline avoids betting on one assumption about which decision boundary suits this feature geometry, and the close performance across all five models suggests that the geometric features are discriminative regardless of which boundary type is used.
Tech Stack
| Technology | Purpose |
|---|---|
| R | Primary modeling environment for all five classification algorithms |
| caret | Unified training, cross-validation, and preprocessing interface across all models |
| randomForest | Random Forest implementation with OOB error estimation and variable importance |
| adabag | Bagging and Boosting implementations for tree ensemble methods |
| nnet | Single hidden layer feedforward neural network with Softmax output for multiclass prediction |
| mclust | Mixture Discriminant Analysis with BIC-based Gaussian mixture model selection per class |
Results & Metrics
What the system delivers
91.1%
Best Test Accuracy
Random Forests and ANN tied — 8.9% misclassification across 7 grain classes
<1.5%
Performance Spread
All five algorithms within 1.5% of each other on test misclassification — std dev 0.6%
5
Top Shape Features
Perimeter, Shape Factor 1, Compactness, Minor & Major Axis Length, consistent across the three tree-based methods
Random Forests and ANNs jointly achieve the best test performance
Random Forests (500 trees, 6 variables per split) and the ANN (size 11, decay 4) both achieve 8.9% test misclassification, with ARIs of 0.785 and 0.779 respectively, which is effectively a tie. Random Forests also achieves a lower OOB error than Bagging (8.77% vs. 8.87%) and shows the smallest gap between training and testing performance, indicating the most stable generalization. The ANN reaches 7.9% training misclassification, a 1.0-point train/test gap that indicates slightly more variance than RF but no practical difference at test time.
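Both reported metrics can be computed from true and predicted labels alone. The Python sketch below uses the pair-counting formula for the ARI with the Hubert and Arabie chance adjustment; the function names are hypothetical, and the R pipeline may have used a library implementation instead.

```python
from collections import Counter
from math import comb


def misclassification_rate(y_true, y_pred):
    """Share of items whose predicted label differs from the true label."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def adjusted_rand_index(y_true, y_pred):
    """Pair-counting agreement between two labelings, adjusted for chance."""
    n = len(y_true)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(y_true, y_pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(y_true).values())
    sum_b = sum(comb(c, 2) for c in Counter(y_pred).values())
    expected = sum_a * sum_b / comb(n, 2)  # expected index under random labeling
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate partitions (e.g. a single cluster)
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

ARI compares partitions, so it is unchanged if the class names are permuted: predicting `[1, 1, 2, 2, 0, 0]` for true labels `[0, 0, 1, 1, 2, 2]` gives an ARI of 1.0 but a misclassification rate of 1.0. It complements the misclassification rate rather than replacing it.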
Boosting achieves near-perfect training accuracy but shows the highest overfitting
Boosting achieves 0.2% training misclassification (ARI 0.994), the best training performance of all five models by a wide margin. However, its test misclassification rises to 9.5% (ARI 0.768), the largest train-to-test gap. Driving training error this close to zero means the sequential residual fitting has largely memorized the training set: even with the CV-selected 200-tree, depth-3, λ=0.05 configuration, Boosting shows more variance than the averaging-based ensembles, although its test error is only 0.6 points above the best models.
All five algorithms perform within 1.5% — geometric features provide robust signal
The standard deviation of test misclassification across all five models is just 0.6%, and the ARI standard deviation is 0.009, suggesting that with these geometric shape features the choice of algorithm matters far less than the features themselves. This is a meaningful result for deployment: even a model with a somewhat higher test error, such as Bagging (10.1% test misclassification), could serve as a viable production classifier when the alternative is slow, inconsistent manual inspection.
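Whether a 1.5-point spread is meaningful depends on the test-set size. A rough yardstick is the binomial standard error of a misclassification rate, sqrt(p(1 − p)/n). The sketch below assumes a test set of about 341 records (25% of the 1,364-record sample; the exact count is not stated) and p = 0.089; `misclassification_se` is a hypothetical helper.

```python
from math import sqrt


def misclassification_se(error_rate, n_test):
    """Binomial standard error of a misclassification rate estimated on n_test items."""
    return sqrt(error_rate * (1 - error_rate) / n_test)


# Assumed test-set size: 25% of the 1,364-record sample.
n_test = round(0.25 * 1364)
se = misclassification_se(0.089, n_test)
print(f"n_test={n_test}, SE={se:.4f}")  # prints n_test=341, SE=0.0154
```

At roughly 1.5 points, one standard error is about as large as the entire spread between models, which supports reading the results as a near tie rather than a strict ranking. Because all models are scored on the same test records, a paired comparison such as McNemar's test would be a sharper check.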
Perimeter, compactness, and axis lengths are the most important shape features
Variable importance analysis across Bagging, Random Forests, and Boosting consistently identifies Perimeter, Shape Factor 1, Compactness, Minor Axis Length, and Major Axis Length as the top predictors of grain type. These features capture the boundary geometry and proportions of the grain — the characteristics that most reliably distinguish between varieties with different elongation, roundness, and edge profiles. Extent is consistently the least important variable across all three methods, suggesting that the ratio of grain pixels to bounding box area adds little discriminative power beyond what the other shape descriptors already capture.
The pipeline replaces manual inspection with a scalable, automated classification system
The full pipeline, from image-derived geometric features through five independently benchmarked classifiers to a final accuracy evaluation, provides a solid foundation for automated grain quality control. At 91.1% accuracy across seven grain varieties, an automated classifier offers the speed and consistency that manual visual inspection lacks. The modular R implementation allows any of the five models to be swapped into production depending on the computational constraints of the deployment environment; on this evaluation, test performance stayed within a narrow 1.5-point band regardless of which model was selected.