Qwen-2.5-1.5B-KCC-LiteRT-LM

This is an on-device farmer advisory language model fine-tuned on cleaned Kisan Call Centre (KCC) question–answer pairs from Indian smallholder farmers, then converted and packaged for efficient offline inference using Google's LiteRT / LiteRT-LM stack.

It is intended for low-connectivity edge scenarios, such as mobile advisory apps for Indian farmers.

Model Details

  • Base model: unsloth/Qwen2.5-1.5B-Instruct
  • Fine-tuning method: LoRA (parameter-efficient) via Unsloth + TRL SFTTrainer
  • Dataset: Cleaned "Farmers Call Query Data" by Das Koushik, based on data from data.gov.in
    → Only null/empty rows removed; no synthetic data, paraphrasing, or external augmentation
  • Training regime: Very short, step-limited runs (max_steps=60, warmup_steps=5) on a Colab Tesla T4 due to free-tier constraints; see the fine-tuning sketch after this list
    → A pilot for pipeline validation, not full convergence
  • Conversion: PyTorch → LiteRT (.tflite) using Google AI Edge Torch (v0.7.1); see the conversion sketch after this list
    → Static KV cache: 4096 tokens
    → Result: ~1.6 GB .tflite artifact
  • Packaging: .tflite → .litertlm using LiteRT-LM (v0.8.1) tools
  • Quantization: Quantized graph as produced by the AI Edge Torch conversion
  • Context length: 4096 tokens (fixed/static KV cache)
  • Intended use: Offline, interactive agricultural advisory in low-resource settings
  • Out-of-scope: General-purpose chat, high-precision agronomy, multi-turn memory beyond context limit, production-grade fluency
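
As a concrete reference for the regime above, here is a minimal sketch of the Unsloth + TRL fine-tuning run. The max_steps=60 and warmup_steps=5 values are the documented ones; the dataset file, text field, LoRA rank, batch size, and learning rate are illustrative assumptions, and the SFTTrainer signature shown matches the older TRL releases used in Unsloth's Colab notebooks.

```python
# Minimal sketch of the LoRA fine-tuning run (Unsloth + TRL SFTTrainer).
# max_steps=60 and warmup_steps=5 match the documented pilot regime;
# dataset path, text field, LoRA rank, and optimizer settings are assumptions.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-1.5B-Instruct",
    max_seq_length=4096,
    load_in_4bit=True,  # assumption: 4-bit loading to fit a free-tier Colab T4
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # LoRA rank: illustrative, not a documented value
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
)

# Cleaned KCC Q&A pairs: only null/empty rows were removed upstream.
dataset = load_dataset("json", data_files="kcc_cleaned.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",  # assumes pre-formatted chat-template strings
    max_seq_length=4096,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,   # documented
        max_steps=60,     # documented: step-limited pilot run
        learning_rate=2e-4,
        fp16=True,
        optim="adamw_8bit",
        output_dir="outputs",
    ),
)
trainer.train()
```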
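
The LiteRT conversion follows AI Edge Torch's generative-model workflow. The sketch below uses the Qwen example module and converter utility bundled with ai_edge_torch; exact function names and argument signatures have shifted between releases (this project pins v0.7.1), so treat it as an outline with placeholder paths rather than verbatim commands.

```python
# Sketch of the PyTorch -> LiteRT (.tflite) conversion with AI Edge Torch.
# Module/function names follow ai_edge_torch's bundled Qwen example; exact
# signatures differ across releases, so treat this as an outline.
from ai_edge_torch.generative.examples.qwen import qwen
from ai_edge_torch.generative.utilities import converter

# Build the re-authored Qwen 2.5 1.5B graph from the merged checkpoint,
# with the static 4096-token KV cache documented above.
pytorch_model = qwen.build_1_5b_model(
    "merged-qwen2.5-1.5b-kcc",  # placeholder path to the merged checkpoint
    kv_cache_max_len=4096,
)

# Export a quantized .tflite with separate prefill/decode signatures.
converter.convert_to_tflite(
    pytorch_model,
    output_path="qwen2.5-1.5b-kcc.tflite",  # ~1.6 GB artifact in this project
    prefill_seq_len=1024,  # illustrative; not documented in this card
    quantize=True,
)
```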

Performance on Consumer Hardware

Tested on a Mac mini (Apple M4, 16 GB unified memory) using LiteRT-LM with the GPU backend:

  • Time-to-first-token (TTFT): < 1 second
  • End-to-end response time (50–150 token advisory answers): ~2.5–4 seconds
  • Throughput: Stable incremental decoding (steady token emission throughout the response)

Suitable for real-time, offline farmer-facing tools.

Important Limitations & Known Behaviors

This is an early engineering-validation release, not a production model.

Due to the extremely short training run (Colab free-tier constraints):

  • Strong mirroring of the KCC dataset's terse, bullet-list style → outputs often lack natural conversational flow
  • Occasional near-verbatim reuse of training phrases with limited adaptation to query variations
  • Mild repetition / incomplete reasoning (undertraining artifact)

LiteRT-LM-specific observations (compared to PyTorch inference):

  • Noticeably reduced coherence
  • Increased repetition, fragmentation, or looping in some generations
  • Responses sometimes feel more generic / less tightly grounded

→ These are runtime-specific behaviors (not present in the original PyTorch checkpoint).
The root cause has not yet been isolated (no controlled ablation yet); likely contributors include decoding configuration, stop-token alignment, fixed KV-cache constraints, and runtime-specific sampling behavior.

The project deliberately prioritizes a reproducible deployment path and transparent failure modes over peak quality. Full multi-epoch training and runtime debugging are expected to improve results significantly.

Comparison with Prior Gemma-3n Effort

Compared to earlier Gemma-3n-E2B fine-tuning on the same task:

  • Qwen2.5-1.5B wins on: conversion success, long-context stability, deployment reliability
  • Gemma-3n-E2B wins on: more natural dialogue style, broader multilingual starting point
  • Deciding factor: Gemma-3n could not be reliably converted to LiteRT-LM with public tooling → a hard dead end

See full comparison in the project documentation.

Usage

This model is packaged in the .litertlm format for use with the LiteRT-LM runtime (preview stage as of December 2025).

Refer to:

  • LiteRT-LM GitHub
  • Google AI Edge Gallery app (Android) for quick testing
  • LiteRT documentation for integration into Android/iOS/macOS/Linux apps
  • UAI.LiteRTLM β€” a Unity package wrapping LiteRT-LM inference, useful for building Android/Quest apps with this model
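
For a quick smoke test outside an app, the LiteRT-LM repository also ships a litert_lm_main command-line binary; an invocation along the lines of `litert_lm_main --backend=gpu --model_path=<path to the .litertlm file>` (flag names follow the LiteRT-LM README and may change while the runtime is in preview) loads the package and runs generation locally.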

Reproducibility

Full pipeline (data cleaning → LoRA fine-tuning → merge → LiteRT conversion → LiteRT-LM packaging) is documented with scripts and exact commands in the associated repository.
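
For the merge step specifically, Unsloth provides a helper that folds the LoRA adapters back into the base weights before conversion; a minimal sketch, continuing from the fine-tuned model object in the sketch above (the output directory name is a placeholder):

```python
# Merge the LoRA adapters into the base weights so AI Edge Torch can
# consume a plain checkpoint. save_pretrained_merged is Unsloth's helper;
# the output directory name is a placeholder.
model.save_pretrained_merged(
    "merged-qwen2.5-1.5b-kcc",
    tokenizer,
    save_method="merged_16bit",  # full 16-bit merge, not a GGUF/4-bit export
)
```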

Conversion requires a high-RAM CPU instance (~128 GB recommended). No GPUs are needed for conversion or packaging.

See the project documentation for step-by-step instructions (AWS EC2 r6i instances used in original work):
https://uralstech.github.io/Qwen-KCC-On-Device-Pipeline
