Qwen-2.5-1.5B-KCC-LiteRT-LM
This is an on-device farmer advisory language model fine-tuned on cleaned Kisan Call Centre (KCC) question-answer pairs from Indian smallholder farmers, then converted and packaged for efficient offline inference using Google's LiteRT / LiteRT-LM stack.
It is intended for low-connectivity edge scenarios, such as mobile advisory apps for Indian farmers.
Model Details
- Base model: unsloth/Qwen2.5-1.5B-Instruct
- Fine-tuning method: LoRA (parameter-efficient) via Unsloth + TRL SFTTrainer
- Dataset: Cleaned "Farmers Call Query Data" by Das Koushik, based on data from data.gov.in
  - Only null/empty rows removed; no synthetic data, paraphrasing, or external augmentation
- Training regime: Very short, step-limited runs (max_steps=60, warmup_steps=5) on a Colab Tesla T4 due to free-tier constraints (a training sketch follows this list)
  - Pilot for pipeline validation, not full convergence
- Conversion: PyTorch → LiteRT (.tflite) using Google AI Edge Torch (v0.7.1)
  - Static KV cache: 4096 tokens
  - Result: ~1.6 GB .tflite artifact
- Packaging: .tflite → .litertlm using LiteRT-LM (v0.8.1) tools
- Quantization: Quantized graph as produced by AI Edge Torch conversion
- Context length: 4096 tokens (fixed/static KV cache)
- Intended use: Offline, interactive agricultural advisory in low-resource settings
- Out-of-scope: General-purpose chat, high-precision agronomy, multi-turn memory beyond context limit, production-grade fluency
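For orientation, a minimal sketch of the fine-tuning step with Unsloth and TRL's SFTTrainer is shown below. Only max_steps=60 and warmup_steps=5 are taken from the actual run; the dataset file, column names, LoRA rank, and remaining hyperparameters are illustrative assumptions, and exact SFTTrainer arguments vary across TRL versions, so treat the project repository as the authoritative reference.

```python
# Sketch of the LoRA fine-tuning step (Unsloth + TRL SFTTrainer).
# Only max_steps=60 and warmup_steps=5 come from the model card; the
# dataset file, column names, LoRA rank, and remaining hyperparameters
# are illustrative assumptions.
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer
from unsloth import FastLanguageModel

MAX_SEQ_LENGTH = 4096

# Load the base model with Unsloth (4-bit to fit a free-tier Colab T4).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-1.5B-Instruct",
    max_seq_length=MAX_SEQ_LENGTH,
    load_in_4bit=True,
)

# Attach LoRA adapters for parameter-efficient fine-tuning.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)

# Cleaned KCC question-answer pairs (null/empty rows already dropped).
dataset = load_dataset("csv", data_files="kcc_cleaned.csv", split="train")

def to_chat_text(example):
    # Wrap each Q&A pair in the model's chat template.
    messages = [
        {"role": "user", "content": example["question"]},
        {"role": "assistant", "content": example["answer"]},
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

dataset = dataset.map(to_chat_text)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=MAX_SEQ_LENGTH,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,   # as in the pilot run
        max_steps=60,     # step-limited pilot run
        learning_rate=2e-4,
        logging_steps=1,
        output_dir="outputs",
    ),
)
trainer.train()
```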
Performance on Consumer Hardware
Tested on a Mac mini (Apple M4, 16 GB unified memory) using LiteRT-LM with the GPU backend:
- Time-to-first-token (TTFT): < 1 second
- End-to-end response time (50–150 token advisory answers): ~2.5–4 seconds
- Throughput: Stable incremental decoding
Suitable for real-time, offline farmer-facing tools.
Important Limitations & Known Behaviors
This is an early engineering validation release, not a production model.
Due to extremely short training (Colab constraints):
- Strong mirroring of the original KCC dataset's terse, bullet-list style; outputs often lack natural conversational flow
- Occasional near-verbatim reuse of training phrases with limited adaptation to query variations
- Mild repetition / incomplete reasoning (undertraining artifact)
LiteRT-LM specific observations (compared to PyTorch inference):
- Noticeably reduced coherence
- Increased repetition, fragmentation, or looping in some generations
- Responses sometimes feel more generic / less tightly grounded
These are runtime-specific behaviors (not present in the original PyTorch checkpoint).
Root cause not yet isolated due to lack of controlled ablation; likely contributors include decoding configuration, stop-token alignment, fixed KV-cache constraints, and runtime-specific sampling behavior.
The project deliberately prioritizes a reproducible deployment path and failure-mode transparency over peak quality. Full multi-epoch training and runtime debugging are expected to improve results significantly.
Comparison with Prior Gemma-3n Effort
Compared to earlier Gemma-3n-E2B fine-tuning on the same task:
- Qwen2.5-1.5B wins on: conversion success, long-context stability, deployment reliability
- Gemma-3n-E2B wins on: more natural dialogue style, broader multilingual starting point
- Deciding factor: Gemma-3n could not be reliably converted to LiteRT-LM with public tooling, making it a hard dead end
See full comparison in the project documentation.
Usage
This model is packaged in the .litertlm format for use with the LiteRT-LM runtime (preview stage as of December 2025); a sample command-line invocation is shown after the list below.
Refer to:
- LiteRT-LM GitHub
- Google AI Edge Gallery app (Android) for quick testing
- LiteRT documentation for integration into Android/iOS/macOS/Linux apps
- UAI.LiteRTLM, a Unity package wrapping LiteRT-LM inference, useful for building Android/Quest apps with this model
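For a quick local test, the LiteRT-LM repository provides a litert_lm_main command-line demo, with prebuilt binaries for several platforms. A hedged example invocation follows; the binary name, flag set, and model file name are assumptions that may differ between releases, so check the LiteRT-LM README for the exact command on your platform.

```bash
# Assumed invocation of the LiteRT-LM CLI demo; flags and file names are
# illustrative and may differ between LiteRT-LM releases.
./litert_lm_main \
  --backend=gpu \
  --model_path=Qwen-2.5-1.5B-KCC-LiteRT-LM.litertlm
```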
Reproducibility
Full pipeline (data cleaning → LoRA fine-tuning → merge → LiteRT conversion → LiteRT-LM packaging) is documented with scripts and exact commands in the associated repository.
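As one illustration of the merge step, the sketch below uses the standard PEFT merge-and-save pattern; the adapter path and output directory are placeholders, and the repository scripts remain the authoritative reference.

```python
# Sketch of merging the LoRA adapter back into the base model before
# LiteRT conversion. Paths are placeholders; see the project repository
# for the exact scripts and commands used.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "unsloth/Qwen2.5-1.5B-Instruct"

base = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype=torch.float32)
merged = PeftModel.from_pretrained(base, "path/to/lora-adapter")
merged = merged.merge_and_unload()  # fold LoRA weights into the base weights

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
merged.save_pretrained("qwen2.5-1.5b-kcc-merged")
tokenizer.save_pretrained("qwen2.5-1.5b-kcc-merged")

# The merged checkpoint is then converted to .tflite with Google AI Edge
# Torch and packaged into .litertlm with the LiteRT-LM tools, as described
# in the project documentation.
```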
Conversion requires a high-RAM CPU instance (~128 GB recommended); no GPUs are needed for conversion or packaging.
See the project documentation for step-by-step instructions (AWS EC2 r6i instances were used in the original work):
https://uralstech.github.io/Qwen-KCC-On-Device-Pipeline