Michael Goin
mgoin
AI & ML interests
LLM inference optimization, compression, quantization, pruning, distillation
Recent Activity
new activity 5 days ago: poolside/Laguna-XS.2-INT4 (Add base_model)
new activity 5 days ago: poolside/Laguna-XS.2-NVFP4 (Add base_model)
new activity 5 days ago: poolside/Laguna-XS.2-FP8 (Add base_model)
Add base_model
#1 opened 5 days ago by mgoin
Add base_model
#1 opened 5 days ago by mgoin
Add base_model
#1 opened 5 days ago by mgoin
Update MXFP4 format to compressed-tensors
1 comment · #3 opened 3 months ago by mgoin
Support for B200s?
👀 5 · 3 comments · #7 opened 8 months ago by shriramc
Quantization recipe?
2 comments · #3 opened 10 months ago by veden
Not working with vLLM 0.9.1
5 comments · #1 opened 10 months ago by zacksiri
Update config.json with the correct state
#1 opened 10 months ago by dsikka
Make model config compatible with Hugging Face MiniMax implementation
5 comments · #39 opened 11 months ago by geetu040
Missing Tokenizer/Processor for use with Transformers
👍 1 · 5 comments · #3 opened 11 months ago by mgoin
How should I input the image?
1 comment · #3 opened about 1 year ago by CyberWolf0
Cannot start with vllm serve
1 comment · #2 opened about 1 year ago by VenomEY
Fix processor_class to match upstream
#4 opened about 1 year ago by zifeitong
Remove image_processor_type
#1 opened about 1 year ago by pooya-davoodi-parasail
OSError: nm-testing/Llama-3_1-Nemotron-Ultra-253B-v1-FP8-dynamic does not appear to have a file named decilm.py
2 comments · #2 opened about 1 year ago by TheDrummer
How to deploy this model without an internet connection
1 comment · #1 opened about 1 year ago by superahn
Why not FP8 with static and per-tensor quantization?
👍 1 · 2 comments · #2 opened about 1 year ago by wanzhenchn
Address discrepancies in the languages supported by the Mistral Small 3.1 2503
🔥 1 · 3 comments · #54 opened about 1 year ago by fpaupier
Please update the chat template
1 comment · #1 opened about 1 year ago by stelterlab
FP8 Dynamic/W8A16 Quants Please
6 comments · #44 opened about 1 year ago by rjmehta