Adds a config entry to enable YaRN scaling with a factor of 3.2 (the maximum recommended by Qwen), so the model can be deployed to inference providers with YaRN scaling enabled by pinning to this PR's revision
(in vLLM: `--revision refs/pr/28`)
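
For reference, YaRN scaling for Qwen-family models is configured through the `rope_scaling` block in the model's `config.json`. A sketch of what a change like this typically looks like, assuming the standard YaRN fields; the `original_max_position_embeddings` value below is illustrative and not taken from this PR:

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 3.2,
    "original_max_position_embeddings": 40960
  }
}
```

With this in place, serving the pinned revision (e.g. via vLLM's `--revision refs/pr/28`) picks up the scaled context window without changing the base repo's default config.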
