Wanwei He
Grocery
AI & ML interests
LLM
Recent Activity
upvoted a paper 2 days ago
Learning Ordinal Probabilistic Reward from Preferences
liked a model 12 days ago
Qwen/Qwen3.5-35B-A3B
commented on a paper 6 months ago
Implicit Actor Critic Coupling via a Supervised Learning Framework for RLVR