We shipped our tenth production LoRA fine-tuning pipeline last month.
The pattern never changes: teams spend weeks debating architecture, then throw garbage data at the model and wonder why it hallucinates.
Here is what actually matters.
1. Your data is worse than you think
We have never seen a “ready” dataset.
Before any training run, we spend 60-80% of the project on data work:
- Deduplication — Near-duplicates silently poison your loss curves
- Quality filtering — Outliers and formatting errors amplify during training
- Distribution analysis — Biased data produces biased models, end of story
- Synthetic augmentation — When done right, 10% synthetic + 90% real beats 100% real. When done wrong, it is poison.
Data curation is not preprocessing. It is the core engineering task.
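Most of that work is mechanical once you commit to it. A minimal sketch of a dedup and quality-filter pass in plain Python (shingle size, thresholds, and the minimum-length cutoff are illustrative; at scale the pairwise Jaccard check below would be replaced by MinHash/LSH):

```python
import re
from typing import Iterable

SHINGLE_SIZE = 5          # word n-gram size for near-dup detection (illustrative)
NEAR_DUP_THRESHOLD = 0.8  # Jaccard similarity above this counts as a near-duplicate

def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so formatting differences don't hide duplicates.
    return re.sub(r"\s+", " ", text.lower()).strip()

def shingles(text: str, n: int = SHINGLE_SIZE) -> set:
    words = text.split()
    return {" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def dedup_and_filter(docs: Iterable[str], min_words: int = 20) -> list[str]:
    kept, seen_exact, kept_shingles = [], set(), []
    for doc in docs:
        norm = normalize(doc)
        # Quality filter: drop fragments too short to carry a training signal.
        if len(norm.split()) < min_words:
            continue
        # Exact-duplicate check on the normalized text.
        h = hash(norm)
        if h in seen_exact:
            continue
        # Near-duplicate check: O(n^2) pairwise Jaccard, fine for a sketch;
        # production pipelines use MinHash/LSH for this step.
        sh = shingles(norm)
        if any(jaccard(sh, other) >= NEAR_DUP_THRESHOLD for other in kept_shingles):
            continue
        seen_exact.add(h)
        kept_shingles.append(sh)
        kept.append(doc)
    return kept
```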
2. Architecture decisions matter less than eval decisions
Every team asks: “LoRA or full fine-tune? QLoRA? What rank?”
These are optimization questions, not success questions.
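For scale: the adapter setup itself is a handful of config lines you can revisit on any later run. A minimal sketch with Hugging Face peft and transformers (base model, rank, and target modules are illustrative, not a recommendation):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

BASE = "meta-llama/Llama-3.1-8B-Instruct"  # illustrative base model

# Optional 4-bit quantization (the "QLoRA" variant of the same question);
# requires the bitsandbytes package.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)

model = AutoModelForCausalLM.from_pretrained(BASE, quantization_config=bnb,
                                             device_map="auto")

lora = LoraConfig(
    r=16,                     # rank: the knob everyone debates
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # typically well under 1% of weights are trainable
```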
The question that really matters: “How will we know if this model is better?”
We build custom eval harnesses before the first training run. Not standard benchmarks — custom ones tied to the actual task. If we cannot measure it, we cannot improve it.
Our standard pipeline: hold-out task-specific evals, regression checks against the base model, and human-in-the-loop spot checks. Anything less is guessing.
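A minimal sketch of that harness shape, assuming each model is wrapped in a generate(prompt) -> str callable and each case carries its own task-specific check (both are hypothetical names, not a library API):

```python
import re
from typing import Callable

# One eval case: a prompt plus a task-specific pass/fail check on the output.
EvalCase = tuple[str, Callable[[str], bool]]

def run_evals(generate: Callable[[str], str], cases: list[EvalCase]) -> float:
    passed = sum(check(generate(prompt)) for prompt, check in cases)
    return passed / len(cases)

def regression_report(base_generate: Callable[[str], str],
                      tuned_generate: Callable[[str], str],
                      cases: list[EvalCase]) -> dict:
    # Score fine-tuned vs. base model on the same hold-out set;
    # a tuned model that loses to its own base is a red flag before deployment.
    base, tuned = run_evals(base_generate, cases), run_evals(tuned_generate, cases)
    return {"base": base, "tuned": tuned, "delta": tuned - base,
            "regression": tuned < base}

# Example task-specific check: the answer must contain the invoice total.
cases: list[EvalCase] = [
    ("Extract the invoice total from: 'Total due: $1,250.00'",
     lambda out: bool(re.search(r"\$?1,?250(\.00)?", out))),
]
```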
3. Open-weight is not a compromise — it is a capability
We run Llama, Mistral, Qwen, and DeepSeek in production. Not because clients can’t afford APIs. Because open weights give you:
- Cost transparency — No per-token surprises at scale
- Latency control — Inference on your hardware, your network
- Data sovereignty — No data leaves your environment
- Iteration speed — Train, test, deploy without vendor approval cycles
The “open vs. closed” debate is a false binary. We use both. The right model for the job, not ideology.
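For reference, once the weights are local, inference on your own hardware is a short script. A minimal sketch using vLLM (the model name and sampling settings are illustrative):

```python
from vllm import LLM, SamplingParams

# Weights load from a local path or the local cache; nothing leaves the box.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.3")  # illustrative open-weight model

params = SamplingParams(temperature=0.2, max_tokens=256)
outputs = llm.generate(["Summarize this support ticket: ..."], params)
print(outputs[0].outputs[0].text)
```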
The single pattern
Every successful fine-tuning project follows the same sequence:
- Define the task precisely (what does “better” mean?)
- Build the eval harness first
- Curate the data obsessively
- Train with disciplined logging (run-manifest sketch after this list)
- Evaluate against baseline + regressions
- Deploy with monitoring
- Iterate
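On the logging step: a simple pattern is to write a per-run manifest that ties config, data snapshot, and eval scores together, so any later regression can be traced to a specific run. A minimal sketch (field names are illustrative):

```python
import hashlib
import json
import time
from pathlib import Path

def log_run(config: dict, data_path: str, eval_scores: dict, out_dir: str = "runs") -> Path:
    # Hash the training file so the exact data snapshot is pinned to the run.
    data_hash = hashlib.sha256(Path(data_path).read_bytes()).hexdigest()[:12]
    run_id = time.strftime("%Y%m%d-%H%M%S")
    manifest = {
        "run_id": run_id,
        "config": config,        # LoRA rank, lr, epochs, base model, ...
        "data_sha256": data_hash,
        "eval": eval_scores,     # eval harness output, incl. delta vs. baseline
    }
    path = Path(out_dir) / f"run_{run_id}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(manifest, indent=2))
    return path
```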
Skip any step and you are gambling.