Independent Research · 2025–2026 · In Progress

Resume TLM

High-fidelity resume parsing without the LLM tax. A modular PyTorch pipeline: boundary detection → section classification → entity extraction.

resume-tlm · inference
Architecture
DistilRoBERTa + CRF
Parameters
66M
CPU Inference
~78ms
Quantization
INT8 · ONNX
01 — Engineering Thesis

Efficiency as a feature.

The “LLM Tax”: the unnecessary cost and latency of using a massive model for a structured extraction task.

| Metric | GPT-4o / Claude 3.5 | Resume TLM ✓ |
| --- | --- | --- |
| Parameters | ∼175B+ | 66M |
| CPU inference latency | 3–8s (API round-trip) | ~78ms |
| Cost per 1K resumes | $15–40 (API) | ~$0.002 (compute) |
| Schema adherence | ~92% (hallucination risk) | 99.8% (deterministic CRF) |
| Runs offline / on-device | No | Yes (TorchScript / ONNX) |
| Token-level confidence | No | Yes (per-entity score) |
02 — The Data Factory

High-quality parsing needs high-quality ground truth.

Instead of downloading a dataset, I built the machinery to create one — a custom Human-in-the-Loop Labeling Workbench.

01
Raw PDF
203 resumes
02
Local LLM Pre-label
Gemma 4 via Ollama
03
Human Review UI
Next.js annotator
04
Gold Dataset
MongoDB Atlas
🏷️

LLM Pre-annotation

Gemma 4 generates "silver standard" labels for every token before a human sees the resume. Batch runs process 12 resumes per Ollama cooldown window with smart skip logic for already-labeled docs.
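The batch-with-skip loop described above can be sketched as follows. This is a minimal illustration, not the project's actual labeling code: the helper names `batch_unlabeled` and `prelabel`, the `labels` field, and the model tag `"gemma"` are all assumptions; only the Ollama `/api/generate` endpoint and its `model` / `prompt` / `stream` request fields come from Ollama's public API.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def batch_unlabeled(docs, batch_size=12):
    """Smart skip logic: drop docs that already carry labels,
    then yield fixed-size batches (12 per cooldown window)."""
    pending = [d for d in docs if not d.get("labels")]
    for i in range(0, len(pending), batch_size):
        yield pending[i:i + batch_size]

def prelabel(tokens, model="gemma"):
    """One local LLM call per document: ask for a silver-standard BIO tag per token."""
    prompt = "Assign one BIO tag per line:\n" + "\n".join(tokens)
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(OLLAMA_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

The skip filter is what makes repeated batch runs cheap: already-labeled documents never reach the model again.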

70% less manual effort
👁️

Visual Labeling UI

Custom Next.js app renders token bounding boxes, section clusters, and BIO tag assignments in a 3-stage interface (Live → Heuristic → AI) across Skills, Experience, Education, and Projects sections.

3-stage review pipeline
🛡️

Data Integrity

The UI enforces 8D/24D spatial feature constraints during labeling. BIO violation audit scripts catch illegal tag transitions before training. Per-document loss scores surface high-loss outliers for re-review.
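The BIO audit mentioned above reduces to one rule: an `I-X` tag is legal only if the previous tag is `B-X` or `I-X`. A minimal sketch of such a check (the function name `bio_violations` is illustrative, not the project's script):

```python
def bio_violations(tags):
    """Return indices of illegal BIO transitions:
    an I- tag must continue a same-type B-/I- run."""
    bad = []
    prev = "O"
    for i, tag in enumerate(tags):
        if tag.startswith("I-"):
            entity = tag[2:]
            if prev not in (f"B-{entity}", f"I-{entity}"):
                bad.append(i)
        prev = tag
    return bad
```

Running this over every labeled document before training catches annotator slips (an orphaned `I-` after `O`, or an `I-` that switches entity type mid-span) that would otherwise poison the CRF's transition statistics.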

savedBy: 'user' | 'model'
203
Total resumes
189
Fully labeled
5
Label stages
14
Manual review queue
03 — The Laboratory

Architecture deep dive.

GLU Spatial Fusion

Resumes are spatial documents — a header's position is as informative as its text. A Gated Linear Unit selectively blends token semantics with layout features at inference time, without hardcoding layout rules:

# GLUSpatialFusion forward pass
gate = sigmoid(W_g · [f_text ; f_spatial])
output = gate × f_text + (1 − gate) × f_spatial
 
# Personal model: 24D spatial features
# All other models: 8D spatial features
 
# 8D vector per token:
# x0_n · y0_n · w_n · h_n
# bold · caps · font_n · abs_y
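The gating formula above can be realized as a small PyTorch module. This is a sketch under one stated assumption: the 8D/24D spatial vector must first be projected up to the text embedding width before it can be gated against it (the pseudocode leaves that lift implicit), so the `spatial_proj` layer here is my addition, not a confirmed detail of the real `GLUSpatialFusion`.

```python
import torch
import torch.nn as nn

class GLUSpatialFusion(nn.Module):
    """Per-dimension sigmoid gate decides how much text vs. layout signal to keep."""
    def __init__(self, text_dim: int, spatial_dim: int):
        super().__init__()
        # Assumed lift: 8D/24D spatial features -> text embedding width
        self.spatial_proj = nn.Linear(spatial_dim, text_dim)
        self.gate = nn.Linear(2 * text_dim, text_dim)  # W_g over [f_text ; f_spatial]

    def forward(self, f_text, f_spatial):
        f_spatial = self.spatial_proj(f_spatial)
        g = torch.sigmoid(self.gate(torch.cat([f_text, f_spatial], dim=-1)))
        return g * f_text + (1 - g) * f_spatial
```

Because the gate is learned per dimension, the model can lean on layout for heading-like tokens and on semantics for body text, without any hand-written layout rule.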

Training Configuration

Optimizer
AdamW · lr=2e-5
LR Schedule
Cosine decay + 10% warmup
Gradient clip
max_norm=1.0
Early stopping
Patience = 4 epochs
Focal Loss
γ=2.0 (personal model)
CRF head
Personal model only
Class weights
sqrt-inverse-freq [0.5, 5.0]
Batch size
8 (token) · 16 (chunk)
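The optimizer settings in the table can be wired together in a few lines. A minimal sketch, assuming a hand-rolled cosine-with-warmup schedule via `LambdaLR` (the real engine may use a library scheduler instead); the tiny `nn.Linear` stands in for the actual encoder.

```python
import math
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

def cosine_with_warmup(optimizer, warmup_steps, total_steps):
    """Linear warmup to the base lr, then cosine decay to zero."""
    def lr_lambda(step):
        if step < warmup_steps:
            return step / max(1, warmup_steps)
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return 0.5 * (1.0 + math.cos(math.pi * progress))
    return LambdaLR(optimizer, lr_lambda)

model = torch.nn.Linear(8, 2)                 # stand-in for the real model
opt = AdamW(model.parameters(), lr=2e-5)
sched = cosine_with_warmup(opt, warmup_steps=100, total_steps=1000)  # 10% warmup
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
```

Warmup matters with a pretrained encoder: the first updates at full lr=2e-5 can otherwise wreck the DistilRoBERTa weights before the new heads have learned anything.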

4-Stage Inference Pipeline

Stage 1 · boundary
Token-level: O / B-HEADING / I-HEADING

Detects all section heading boundaries across the full document. Heavy O-class imbalance handled with sqrt-inverse-frequency class weights.
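The sqrt-inverse-frequency weighting can be sketched like this; the helper name `sqrt_inv_freq_weights` and the mean-normalization step are assumptions, while the `[0.5, 5.0]` clamp comes from the training configuration above.

```python
from collections import Counter
import torch

def sqrt_inv_freq_weights(labels, classes, clamp=(0.5, 5.0)):
    """w_c = sqrt(N / count_c), normalized around 1.0, then clamped
    so the dominant O class is damped but never zeroed out."""
    counts = Counter(labels)
    n = len(labels)
    w = torch.tensor([(n / counts[c]) ** 0.5 for c in classes])
    w = w / w.mean()
    return w.clamp(*clamp)
```

The square root is the point: plain inverse frequency would up-weight rare `I-HEADING` tokens by 50x or more and destabilize training, while the sqrt keeps the ratio in a trainable range.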

Stage 2 · section_chunk
Sequence-level: chunk → semantic section label

Classifies heading+body blocks as EXPERIENCE / SKILLS / EDUCATION / etc. 25% header-stripping augmentation for headless section robustness. Virtual PERSONAL chunk for pre-heading personal info.
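The header-stripping augmentation amounts to dropping the heading tokens from a chunk 25% of the time. A minimal sketch under assumed names (`strip_header`, `heading_len` as the token count of the heading):

```python
import random

def strip_header(chunk_tokens, heading_len, p=0.25, rng=random):
    """With probability p, drop the heading tokens so the classifier
    must recognize a section from its body alone."""
    if rng.random() < p:
        return chunk_tokens[heading_len:]
    return chunk_tokens
```

Without this, the classifier can shortcut-learn "chunk starts with the word EXPERIENCE" and then fail on resumes whose sections have no explicit heading.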

Stage 3a · personal
Token-level entity: NAME, EMAIL, PHONE, GITHUB …

24D spatial features + CRF head + Focal Loss (γ=2.0). Faker-based entity swapping augmentation for location data. Enforces legal BIO tag transitions.
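Focal Loss with γ=2.0, as used by the personal model, scales cross-entropy by (1 − p_t)^γ so confident, easy tokens contribute almost nothing. A minimal token-classification sketch (the function name is illustrative; the real head combines this with the CRF):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    """Cross-entropy scaled by (1 - p_t)^gamma:
    easy tokens are down-weighted, hard/rare entities dominate the gradient."""
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)                  # model's probability of the true class
    return ((1.0 - pt) ** gamma * ce).mean()
```

At γ=0 this reduces exactly to cross-entropy; at γ=2 a token predicted correctly with p=0.9 contributes only 1% of its CE loss, which is what lets sparse entities like GITHUB compete with the sea of O tokens.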

Stage 3b · exp_boundary + exp_label
Entry segmentation + role/company/date extraction

ExpBoundaryDataset uses confirmed experienceEntryHeads as ground truth. Skips docs with no confirmed heads. Entity labels: ROLE, COMP, COMP_LOC, SDATE, EDATE, DESC.

04 — JSON Sandbox

Show, don't tell.

Simulated extraction output showing the structured JSON the model produces — with per-field confidence scores.

Raw Resume Text
Priya Sharma
priya@example.com | +91 98765 43210
github.com/priya | Mumbai, India

EXPERIENCE
Software Engineer — Acme Corp (2022–2024)
Built microservices in Go, reduced p99 latency by 40%

SKILLS
Go, Python, Kubernetes, PostgreSQL, gRPC

EDUCATION
B.Tech Computer Science — IIT Bombay (2018–2022)
Extracted JSON · ✓ Structured
{
  "personal": {
    "name": "Priya Sharma",          // conf: 0.994
    "email": "priya@example.com",    // conf: 0.971
    "phone": "+91 98765 43210",      // conf: 0.958
    "github": "github.com/priya",    // conf: 0.933
    "location": "Mumbai, India"      // conf: 0.912
  },
  "sections_detected": [
    "EXPERIENCE",
    "SKILLS",
    "EDUCATION"
  ],
  "experience": [
    {
      "role": "Software Engineer",
      "company": "Acme Corp",
      "start_date": "2022",
      "end_date": "2024",
      "confidence": 0.967
    }
  ]
}
05 — System Architecture

Full stack, production-ready.

// inference flow
PDF Upload ──▶ OCR / PyMuPDF ──▶ Spatial Token Extraction
↓ (8D/24D spatial vectors per token)
DistilRoBERTa-Base ──▶ GLUSpatialFusion ──▶ CRF / Linear Head
↓ (4-stage modular pipeline)
Structured JSON ──▶ MongoDB Atlas ──▶ API Response
// labeling stack
Next.js UI ──▶ Ollama (Gemma 4) ──▶ FastAPI Training Engine
↓ (PyTorch · MPS / CUDA-agnostic)
MongoDB (labels) ──▶ DataLoader ──▶ Model Checkpoints
// deployment target
PyTorch ──▶ TorchScript / ONNX Runtime ──▶ INT8 Quantization
Next.js · React · TailwindCSS · Python · FastAPI · PyTorch · HuggingFace Transformers · distilroberta-base · Ollama / Gemma 4 · MongoDB Atlas · ONNX Runtime · TorchScript · PyMuPDF · torchcrf · AdamW
06 — Failure Log

Where it fails — and how it's being fixed.

Honesty as an engineering signal. Real edge cases, their root causes, and the current mitigation status.

07 — Milestones

Build log.

Labeling app + human-verified gold dataset · ✓ Done
Training engine (FastAPI + PyTorch) · ✓ Done
All dataset classes + model architectures · ✓ Done
Auto-label full DB: Sections, Skills, Experience (May 2026) · ✓ Done
Assign trainingMeta.split to all 203 resumes
Train boundary model (baseline metrics)
Train section_chunk model
Train personal model with updated dataset
Train exp_boundary + exp_label models
Evaluate all models · high-loss outlier analysis
Add /infer endpoint for active-learning loop
INT8 / ONNX quantization for deployment
Get in touch

Interested in the research?

The training engine is actively in development. Happy to talk architecture, labeling strategy, or production NLP challenges.