Yamata Zen
yamatazen
AI & ML interests
None yet
Recent Activity
updated a collection about 15 hours ago
AI for anime
upvoted a paper about 15 hours ago
Illustrious: an Open Advanced Illustration Model
liked a model about 23 hours ago
Abhiray/gemma-4-E4B-it-heretic-GGUF
Organizations
None yet
Optimizers
- CAME: Confidence-guided Adaptive Memory Efficient Optimization
  Paper • 2307.02047 • Published • 2
- Practical Efficiency of Muon for Pretraining
  Paper • 2505.02222 • Published • 40
- AdaMuon: Adaptive Muon Optimizer
  Paper • 2507.11005 • Published • 2
- Muon is Scalable for LLM Training
  Paper • 2502.16982 • Published • 12
GGUF tools
Model merging
- Arcee's MergeKit: A Toolkit for Merging Large Language Models
  Paper • 2403.13257 • Published • 21
- Model Stock: All we need is just a few fine-tuned models
  Paper • 2403.19522 • Published • 14
- Mergenetic: a Simple Evolutionary Model Merging Library
  Paper • 2505.11427 • Published • 14
- Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
  Paper • 2410.01335 • Published • 5
Japanese LLMs
- mradermacher/Himeyuri-v0.1-12B-i1-GGUF
  12B • Updated • 1 • 2
- spow12/ChatWaifu_12B_v2.0
  Text Generation • 12B • Updated • 21 • 22
- Local-Novel-LLM-project/Vecteus-v1
  Text Generation • 7B • Updated • 42 • 34
- Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs
  Paper • 2412.14471 • Published
LLM leaderboards
- UGI Leaderboard
  📢 Running • 1.65k • Uncensored General Intelligence Leaderboard
- Open Japanese LLM Leaderboard
  🌸 Runtime error • 108 • Explore and compare LLM models with interactive filters and visualizations
- Open LLM Leaderboard
  🏆 Running on CPU Upgrade • 13.9k • Track, rank and evaluate open LLMs and chatbots
- Arena Leaderboard
  🏆 Running • 4.82k • View the LMArena language model leaderboard
Genshin Impact
Autoregressive image generation
- Hyperspherical Latents Improve Continuous-Token Autoregressive Generation
  Paper • 2509.24335 • Published • 9
- NextFlow: Unified Sequential Modeling Activates Multimodal Understanding and Generation
  Paper • 2601.02204 • Published • 63
- VAR RL Done Right: Tackling Asynchronous Policy Conflicts in Visual Autoregressive Generation
  Paper • 2601.02256 • Published • 33
- Autoregressive Image Generation with Masked Bit Modeling
  Paper • 2602.09024 • Published • 7
AGI
- Thinking Beyond Tokens: From Brain-Inspired Intelligence to Cognitive Foundations for Artificial General Intelligence and its Societal Impact
  Paper • 2507.00951 • Published • 24
- On Path to Multimodal Generalist: General-Level and General-Bench
  Paper • 2505.04620 • Published • 83
- What Has a Foundation Model Found? Using Inductive Bias to Probe for World Models
  Paper • 2507.06952 • Published • 7
Multilingual LLMs
- Language Versatilists vs. Specialists: An Empirical Revisiting on Multilingual Transfer Ability
  Paper • 2306.06688 • Published
- Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs
  Paper • 2412.14471 • Published
- Language Models' Factuality Depends on the Language of Inquiry
  Paper • 2502.17955 • Published • 32
- Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models
  Paper • 2410.01335 • Published • 5
AI censorship
- GuardReasoner: Towards Reasoning-based LLM Safeguards
  Paper • 2501.18492 • Published • 88
- Safeguard Fine-Tuned LLMs Through Pre- and Post-Tuning Model Merging
  Paper • 2412.19512 • Published • 9
- Course-Correction: Safety Alignment Using Synthetic Preferences
  Paper • 2407.16637 • Published • 26
- Refusal in Language Models Is Mediated by a Single Direction
  Paper • 2406.11717 • Published • 9
Grokking
- Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
  Paper • 2405.15071 • Published • 42
- Grokking at the Edge of Numerical Stability
  Paper • 2501.04697 • Published • 2
- Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test
  Paper • 2506.21551 • Published • 28
AI for anime
- DAF:re: A Challenging, Crowd-Sourced, Large-Scale, Long-Tailed Dataset For Anime Character Recognition
  Paper • 2101.08674 • Published • 2
- Aligning Anime Video Generation with Human Feedback
  Paper • 2504.10044 • Published • 1
- Illustrious: an Open Advanced Illustration Model
  Paper • 2409.19946 • Published • 15