tokyotech-llm

university

Recent Activity

s-mizuki-nlp updated a Space about 18 hours ago

tokyotech-llm/README

s-mizuki-nlp published a model 1 day ago

tokyotech-llm/GPT-OSS-Swallow-120B-RL-v0.1-MXFP4

s-mizuki-nlp published a model 1 day ago

tokyotech-llm/GPT-OSS-Swallow-20B-RL-v0.1-MXFP4

updated a Space about 18 hours ago

README

published 2 models 1 day ago

tokyotech-llm/GPT-OSS-Swallow-120B-RL-v0.1-MXFP4

Text Generation • 120B • Updated 1 day ago • 259 • 1

tokyotech-llm/GPT-OSS-Swallow-20B-RL-v0.1-MXFP4

Text Generation • 22B • Updated 1 day ago • 260

updated a collection 1 day ago

GPT-OSS-Swallow-v0.1

6 items • Updated 1 day ago • 13

updated 2 models 1 day ago

tokyotech-llm/GPT-OSS-Swallow-20B-RL-v0.1-MXFP4

Text Generation • 22B • Updated 1 day ago • 260

tokyotech-llm/GPT-OSS-Swallow-120B-RL-v0.1-MXFP4

Text Generation • 120B • Updated 1 day ago • 259 • 1

updated a Space 7 days ago

README

authored a paper 4 months ago

On the Optimal Reasoning Length for RL-Trained Language Models

Paper • 2602.09591 • Published Feb 10, 2026 • 6

authored a paper 8 months ago

MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources

Paper • 2509.25531 • Published Sep 29, 2025 • 10

authored a paper 9 months ago

Balancing Speed and Stability: The Trade-offs of FP8 vs. BF16 Training in LLMs

Paper • 2411.08719 • Published Nov 10, 2024 • 1

authored a paper 9 months ago

Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks

Paper • 2508.18672 • Published Aug 26, 2025 • 10

authored a paper 11 months ago

Rewriting Pre-Training Data Boosts LLM Performance in Math and Code

Paper • 2505.02881 • Published May 5, 2025 • 7

authored 4 papers about 1 year ago

Building Instruction-Tuning Datasets from Human-Written Instructions with Open-Weight Large Language Models

Paper • 2503.23714 • Published Mar 31, 2025 • 2

Balancing Speed and Stability: The Trade-offs of FP8 vs. BF16 Training in LLMs

Paper • 2411.08719 • Published Nov 10, 2024 • 1

Why We Build Local Large Language Models: An Observational Analysis from 35 Japanese and Multilingual LLMs

Paper • 2412.14471 • Published Dec 19, 2024

Wider or Deeper? Scaling LLM Inference-Time Compute with Adaptive Branching Tree Search

Paper • 2503.04412 • Published Mar 6, 2025 • 6