Hybrid99
Rice quality grading on the edge. A hybrid Classical CV + ML pipeline that fits in under 2MB — no cloud, no latency, no compromise.
“Classical CV handles 80–90% of grains for free. ML only fires for the hard 10–20%. The result is a system that's fast enough for a $35 edge device.”
Why pay the ML tax when math is free?
The 80/20 efficiency rule: classical geometry handles the straightforward cases. ML only activates for the genuinely ambiguous ones.
Free & Instantaneous
Thresholding + contour detection computes area, perimeter, solidity, and circularity. Stones are round → geometry catches them. Single grains get sent straight to the classifier. Zero ML inference cost.
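The circularity test is just the isoperimetric ratio. A minimal sketch, assuming area and perimeter have already been measured from a contour (in the real pipeline they would come from something like OpenCV's `contourArea`/`arcLength`; the example numbers here are illustrative, not measured values — only the 0.7 threshold comes from the pipeline itself):

```python
import math

def circularity(area: float, perimeter: float) -> float:
    """4*pi*A / P^2 -- exactly 1.0 for a perfect circle, lower for
    elongated shapes like rice grains."""
    return 4.0 * math.pi * area / (perimeter ** 2)

# A stone is near-round, a rice grain is elongated (hypothetical numbers):
stone = circularity(area=math.pi * 20**2, perimeter=2 * math.pi * 20)  # circle
grain = circularity(area=1200.0, perimeter=160.0)                      # elongated

def is_foreign(c: float) -> bool:
    """The pipeline's geometric stone filter: circularity > 0.7."""
    return c > 0.7
```

No model, no inference: one multiply-divide per contour is why this tier of the pipeline is "free."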
Precision Where It Counts
MicroUNet activates only for touching grain clusters. Input: 256×256 crop. Output: instance masks that feed back into the same classifier path as single grains.
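The routing decision itself can be sketched as a few comparisons. This is an illustrative sketch, not the project's actual dispatcher: the solidity threshold is a placeholder I chose (a concave outline is a reasonable proxy for a touching cluster); only the circularity rule is stated by the pipeline.

```python
def route(solidity: float, circularity: float) -> str:
    """Toy triage sketch: decide which path a detected blob takes.
    Thresholds other than circularity > 0.7 are illustrative."""
    if circularity > 0.7:
        return "foreign"      # round stone -- caught by geometry, zero ML cost
    if solidity < 0.9:        # concave outline -> likely touching cluster
        return "microunet"    # 256x256 crop -> instance masks -> classifier
    return "classifier"       # single grain goes straight to the classifier
```

The key property: the expensive branch (`"microunet"`) is only reachable for blobs that geometry cannot resolve.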
| Metric | Standard ML Pipeline | Hybrid99 |
|---|---|---|
| Model Size | YOLOv8n ~6MB+ | ~1.56MB combined (INT8 target) |
| Inference Path | Every grain through ML | 80% via free geometry |
| Foreign Detection | ML classification | Math: circularity > 0.7 |
| Cluster Handling | Single model | Dedicated MicroUNet segmenter |
| Edge Viable | Marginal | Yes — <100ms target |
When the algorithm fails, you build the tool.
Automatic watershed annotation was unreliable for touching grains. The pivot: synthetic data generation and a custom masking tool for real annotations.
Watershed Annotation
Automatic watershed-based segmentation produced unreliable masks for touching rice grain clusters. Over-segmentation of overlapping grains and sensitivity to lighting made the approach untenable for production-quality labels.
Synthetic + Manual Masking Tool
Built a custom masking tool for precise manual COCO-format annotations with RLE mask encoding. Paired with synthetic cluster generation (compositing individual grains) for initial prototyping. Result: 1,517 production-quality annotated image pairs.
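Synthetic cluster generation can be approximated by pasting single-grain crops with a small overlap so they read as "touching," while building the instance mask alongside. A toy sketch under assumed parameters (canvas size, overlap, and placement ranges are all illustrative, and `composite_cluster` is a hypothetical helper, not the project's generator):

```python
import random

def composite_cluster(grains, canvas_size=256, overlap=6, seed=0):
    """Paste single-grain crops left-to-right with a small overlap to mimic
    touching grains; record a per-pixel instance id mask for training."""
    rng = random.Random(seed)
    image = [[0] * canvas_size for _ in range(canvas_size)]
    mask = [[0] * canvas_size for _ in range(canvas_size)]
    x = rng.randrange(20, 40)
    for idx, grain in enumerate(grains, start=1):
        gy = rng.randrange(80, 120)
        for r, row in enumerate(grain):
            for c, px in enumerate(row):
                if px:  # copy only grain pixels, keep the background
                    image[gy + r][x + c] = px
                    mask[gy + r][x + c] = idx  # instance id, not class id
        x += len(grain[0]) - overlap  # shift right, leaving a touching overlap
    return image, mask

# Two stand-in 10x10 grain crops composited into one touching pair:
grain = [[255] * 10 for _ in range(10)]
image, mask = composite_cluster([grain, grain])
```

Because the mask is constructed at paste time, every synthetic pair is perfectly labeled by construction — exactly what watershed failed to deliver.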
Custom Masking Tool
A purpose-built annotation interface for pixel-level segmentation of rice grain clusters. Outputs individual COCO JSON files with RLE-encoded masks, directly consumable by the MicroUNet training pipeline. Enforces data quality through an accept/reject workflow — only clean annotations make it to masking_tool/dataset/accepted/.
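For reference, COCO's uncompressed RLE is simple: run lengths over the mask flattened in column-major (Fortran) order, alternating counts that start with the zero runs. A minimal sketch (the tool presumably uses a library encoder such as pycocotools; this pure-Python version just shows the format):

```python
def coco_rle_encode(mask, height, width):
    """Uncompressed COCO RLE for a binary mask given as nested lists.
    Counts alternate 0-runs and 1-runs, column-major, starting with zeros."""
    flat = [mask[r][c] for c in range(width) for r in range(height)]
    counts, prev, run = [], 0, 0
    for v in flat:
        if v == prev:
            run += 1
        else:
            counts.append(run)
            prev, run = v, 1
    counts.append(run)
    return {"counts": counts, "size": [height, width]}

rle = coco_rle_encode([[0, 1], [0, 1]], height=2, width=2)
```

A mask whose first column-major pixel is 1 simply begins with a zero-length 0-run, which is why `counts` can legitimately start with `0`.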
One model segments. One model classifies.
Each model solves exactly one problem — making size optimization and iteration independent.
MicroUNet
2-level U-Net with skip connections. Pixel-level instance segmentation of touching grain clusters. Fixed a tensor dimension mismatch (ValueError: target [B,H,W] != input [B,1,H,W]) by adding target.unsqueeze(1) in loss computation.
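The shape fix is worth seeing concretely. A minimal sketch, not the project's training loop: BCE-style losses require target and input shapes to match, the segmenter emits [B, 1, H, W] logits, and a typical mask loader yields [B, H, W] targets.

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()

logits = torch.randn(4, 1, 256, 256)               # model output: [B, 1, H, W]
target = (torch.rand(4, 256, 256) > 0.5).float()   # loader masks:  [B, H, W]

# Passing `target` directly raises:
#   ValueError: Target size (torch.Size([4, 256, 256])) must be the same
#   as input size (torch.Size([4, 1, 256, 256]))
loss = criterion(logits, target.unsqueeze(1))      # align to [B, 1, H, W]
```

One `unsqueeze` in the loss computation, rather than reshaping in the data loader, keeps the stored masks 2-D and the fix local to where the mismatch matters.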
debug_epoch_100.png — Input · Ground Truth · Prediction
TinyMobileClassifier
MobileNet-style depthwise separable CNN. Handles 9 grain classes including rare defects (weevil, stone, shell_dust). Border rejection excludes partial grains touching image edges from training — a subtle but critical data quality decision.
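Border rejection reduces to a bounding-box check against the image edges. A minimal sketch with hypothetical helper and example boxes (the margin parameter is my assumption; the principle — a clipped grain teaches the classifier wrong shape cues — is the pipeline's):

```python
def touches_border(bbox, img_w, img_h, margin=0):
    """bbox = (x, y, w, h). True if the box is clipped by the image edge."""
    x, y, w, h = bbox
    return (x <= margin or y <= margin
            or x + w >= img_w - margin or y + h >= img_h - margin)

# Illustrative detections in a 256x256 frame; the middle one hugs x = 0:
grains = [(10, 10, 30, 60), (0, 40, 20, 50), (200, 180, 25, 55)]
kept = [b for b in grains if not touches_border(b, img_w=256, img_h=256)]
```

The rejected grain is not mislabeled — it is simply unknowable from a partial view, so excluding it is cheaper than teaching the model to hedge.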
sample_predictions_epoch_60.png — True Label vs. Prediction
Every failure is a data point.
Training curves, confusion matrices, and misclassification logs — raw evidence of engineering depth, not cherry-picked accuracy numbers.
Early stopping triggered at epoch 61. Note the train/val accuracy gap (~17%) — the primary driver is the intrinsic visual similarity between healthy ↔ broken partial grains. Plan: test-time augmentation (TTA) plus additional borderline samples.
The healthy↔broken confusion boundary drives the 82.45% accuracy floor. Weevil, stone, and red grains show high confidence due to distinctive visual features.
From research to the edge.
~6.0MB FP32 combined → ~1.56MB INT8 via QAT. Then ONNX and TFLite for cross-platform edge deployment.
INT8 QAT
Quantization-Aware Training for both models. Expected 3.8× size reduction on segmenter. Target: segmenter <0.5MB, classifier <0.2MB. Verify accuracy drop <2%.
ONNX Export
Cross-platform inference. Enables deployment on hardware without PyTorch. Benchmark inference speed on target edge device.
TFLite Export
Mobile and microcontroller deployment. Required for Android-based handheld graders or embedded systems in grain processing facilities.
The pattern is domain-agnostic.
Any domain with small-object detection and a clear size budget can use this exact hybrid pattern. Classical CV filters the easy 80%. ML handles the rest. Swap the classifier labels, retrain on domain data. Pharmaceutical tablet inspection, seed quality grading, PCB defect detection — all viable targets.
Interested in the full pipeline?
Step 06 integration is active. QAT quantization and ONNX export coming next. Let’s connect if you’re working on edge inference, agricultural tech, or small-object detection.