GenRL

Model collection trained with our framework: https://github.com/ModelTC/GenRL
This is a LoRA adapter for Wan2.1-T2V-1.3B fine-tuned using Group Relative Policy Optimization (GRPO) with multi-reward optimization.
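The core of GRPO is that advantages are computed by normalizing each sample's reward against the statistics of its own group of rollouts, rather than with a learned value function. A minimal illustrative sketch of that normalization (variable names and reward values are placeholders, not the GenRL implementation):

```python
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group: (r - mean) / (std + eps).

    The videos generated for the same prompt form one group; samples
    scoring above the group mean receive positive advantages.
    """
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four rollouts for one prompt, with hypothetical reward scores
advs = group_relative_advantages([0.2, 0.5, 0.8, 0.5])
```

The group-mean subtraction guarantees the advantages sum to zero, so the policy is pushed toward the better samples in each group and away from the worse ones.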
This model was optimized using a weighted combination of four reward functions:
| Reward Function | Weight | Purpose |
|---|---|---|
| HPSv3 General | 1.0 | General aesthetic quality assessment |
| HPSv3 Percentile | 1.0 | Percentile-based aesthetic normalization |
| VideoAlign Motion Quality | 1.0 | Video motion coherence and quality |
| VideoAlign Text Alignment | 1.0 | Text-to-video semantic alignment |
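With all weights equal to 1.0, the training signal reduces to the sum of the four reward scores. A hedged sketch of the weighted combination (the dictionary keys mirror the table above; the score values are placeholders):

```python
REWARD_WEIGHTS = {
    "hpsv3_general": 1.0,     # general aesthetic quality
    "hpsv3_percentile": 1.0,  # percentile-normalized aesthetics
    "videoalign_motion": 1.0, # motion coherence and quality
    "videoalign_text": 1.0,   # text-to-video semantic alignment
}

def combined_reward(scores: dict) -> float:
    """Weighted sum of per-metric reward scores for one video."""
    return sum(REWARD_WEIGHTS[name] * scores[name] for name in REWARD_WEIGHTS)

# Placeholder scores for a single generated video
r = combined_reward({
    "hpsv3_general": 0.7,
    "hpsv3_percentile": 0.6,
    "videoalign_motion": 0.8,
    "videoalign_text": 0.9,
})  # 3.0 with equal weights
```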
Training configuration: `full_shard` (FSDP), `bfloat16`, `flow_sde` sampling.

LoRA configuration:

```json
{
  "r": 128,
  "lora_alpha": 64,
  "target_modules": [
    "to_k",
    "to_q",
    "to_v",
    "to_out.0",
    "net.0.proj",
    "net.2"
  ],
  "lora_dropout": 0.0,
  "bias": "none",
  "init_lora_weights": "gaussian"
}
```
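In the standard PEFT convention, the low-rank update is scaled by `lora_alpha / r`, so with `r=128` and `lora_alpha=64` this adapter applies its update at half strength. A quick check:

```python
lora_cfg = {"r": 128, "lora_alpha": 64}

# PEFT (non-rsLoRA) scales the LoRA update by alpha / r
scaling = lora_cfg["lora_alpha"] / lora_cfg["r"]
print(scaling)  # 0.5
```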
Install the required packages:

```bash
pip install diffusers transformers accelerate torch
```
```python
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

# Load the base model
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Load the LoRA weights
pipe.load_lora_weights("YOUR_USERNAME/longcat-step1000")

# Generate a video
prompt = "A golden retriever playing in a sunny park, high quality, detailed"
video = pipe(
    prompt=prompt,
    height=480,
    width=832,
    num_frames=81,
    num_inference_steps=50,
    guidance_scale=4.5,
    generator=torch.Generator().manual_seed(42),
).frames[0]

# Save the video
export_to_video(video, "output.mp4", fps=16)
```
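The clip length follows directly from the frame count and export rate: 81 frames written at 16 fps yields roughly a five-second video. A quick sanity check:

```python
num_frames, fps = 81, 16

# Duration in seconds of the exported clip
duration_s = num_frames / fps
print(f"{duration_s:.2f} s")
```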
```python
import torch
from diffusers import WanPipeline

pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
    torch_dtype=torch.bfloat16,
).to("cuda")

# Load this LoRA
pipe.load_lora_weights("YOUR_USERNAME/longcat-step1000")

# Generate
video = pipe(
    "A cat walking on the street",
    height=480,
    width=832,
    num_frames=81,
    num_inference_steps=50,
    guidance_scale=4.5,
).frames[0]
```
This checkpoint at 1000 training steps shows significant improvements over the base model across the reward dimensions listed above: aesthetic quality, motion coherence, and text-to-video alignment.
Note: This mid-training checkpoint offers a good balance between quality and training time. For the best performance, consider checkpoint-1500.
This model was trained using GenRL, a scalable reinforcement learning framework for visual generation.
This model is released under the MIT License.
If you use this model in your research, please cite:
```bibtex
@misc{genrl,
  author       = {GenRL Contributors},
  title        = {GenRL: Reinforcement Learning Framework for Visual Generation},
  year         = {2026},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/ModelTC/GenRL}},
}
```
Base model: Wan-AI/Wan2.1-T2V-1.3B-Diffusers