Djuunaa

djuna

AI & ML interests

None yet

Recent Activity


liked a model 1 day ago

owensong/Inflect-Nano-v1

Text-to-Speech • Updated about 10 hours ago • 129

reacted to owensong's post with 🔥 1 day ago

Post

6059

I just released Inflect-Nano-v1, an ultra-small 4.63M-parameter text-to-speech model.

The main idea is simple: instead of only making the acoustic model tiny and relying on a larger external vocoder, Inflect-Nano-v1 keeps the complete text-to-waveform stack under 5M parameters.

Quick facts:
- 4.63M total inference parameters
- 3.46M acoustic model
- 1.17M vocoder
- 24 kHz audio
- English-only
- Single male voice
- Runs locally with a simple PyTorch inference script
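
The size breakdown in the list above can be checked with a few lines of arithmetic. This is only a sketch: it sums the two component counts quoted in the post and estimates raw weight storage. The bytes-per-parameter values (fp32, fp16, int8) are assumptions for illustration, since the post does not say which precision the checkpoint uses.

```python
# Parameter counts quoted in the post.
ACOUSTIC_PARAMS = 3_460_000
VOCODER_PARAMS = 1_170_000

# Assumed storage precisions; the post does not state the checkpoint dtype.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def total_params() -> int:
    """Sum the acoustic model and vocoder parameter counts."""
    return ACOUSTIC_PARAMS + VOCODER_PARAMS


def weight_megabytes(precision: str) -> float:
    """Approximate raw weight size in MB (10**6 bytes) for an assumed precision."""
    return total_params() * BYTES_PER_PARAM[precision] / 1e6


if __name__ == "__main__":
    print(f"total: {total_params() / 1e6:.2f}M parameters")
    for name in BYTES_PER_PARAM:
        print(f"{name}: ~{weight_megabytes(name):.2f} MB of weights")
```

The two components add up to the quoted 4.63M total. Under the assumed precisions, the weights alone would take roughly 18.5 MB at fp32 and less at lower precision; actual file sizes depend on the checkpoint format.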

Why I made it:
Most modern TTS models are much larger, and even many “small TTS” projects depend on a separate vocoder. I wanted to see how far a complete tiny TTS stack could be pushed while still producing usable speech.

It is not SOTA, and I am not trying to claim it competes with large TTS systems. The interesting part is the size-to-functionality ratio.

What works:
It can generate arbitrary English speech locally, and the model is small enough to be interesting for:

- local voice assistants
- embedded/edge experiments
- browser or WASM-style TTS exploration
- efficient inference research
- tiny-model baselines

Limitations:
The quality is still limited. It can sound robotic, stumble on difficult unseen text, and the vocoder is still a clear bottleneck. Long or unusual prompts are less reliable.

So I would frame this as a research/demo release, not a production TTS engine.

I’d love feedback from people interested in:
- tiny speech models
- vocoders
- local TTS
- efficient inference
- embedded speech synthesis
- improving small-model generalization

If people find it useful, I’m interested in putting more training budget into a stronger v2.

Model page:
https://huggingface.co/owensong/Inflect-Nano-v1

liked 2 models 1 day ago

Kwai-Keye/Keye-VL-2.0-30B-A3B

Image-Text-to-Text • 31B • Updated 10 days ago • 11.1k • 115

Kwai-Keye/Keye-VL-2.0-30B-A3B-GGUF

Image-Text-to-Text • 31B • Updated 2 days ago • 573 • 14

liked a model 3 days ago

ovi054/qwen3-14b-manim-lora

Updated 15 days ago • 1

liked a model 9 days ago

andrewdalpino/MewZoom-V1-4X

Image-to-Image • 21.4M • Updated Apr 29 • 32 • 3

liked a model 13 days ago

CohereLabs/North-Mini-Code-1.0

Text Generation • 30B • Updated 5 days ago • 18.8k • 460

reacted to evalstate's post with 🚀 16 days ago

Post

3330

Hugging Face MCP Server v0.3.17
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

SEP-2640 "Skills Over MCP" support added (early access)

2 replies

liked 2 models 17 days ago

hanzogak/Anima-Comradeship

Text-to-Image • Updated 21 days ago • 6

JetBrains/Mellum2-12B-A2.5B-Instruct

Text Generation • 12B • Updated 8 days ago • 5.65k • 72

liked a model 18 days ago

NCUT-AI/Heliars-Phi4-Carla-X1-14B

Text Generation • 15B • Updated May 19 • 16 • 2

New activity in prithivMLmods/PiD-Image-Upscaler 22 days ago

Download button for png output

🔥 1

#1 opened 22 days ago by djuna

upvoted a collection 22 days ago

TongUI

Collection

Open source our work TongUI: Building Generalized GUI Agents by Learning from Multimodal Web Tutorials; https://github.com/TongUI-agent/TongUI-agent • 14 items • Updated May 19 • 4

reacted to vincentg64's post with 🔥 24 days ago

Post

963

96% Correct Next Token Prediction, with No DNN, no Training, auto-distilled model - https://mltblog.com/4urfvTB

Over the last 12 months, I’ve built a model to predict the next token and to suggest synonyms or related queries to a user prompt, with 100% correct predictions on the training set in one shot, without training or deep neural networks (DNNs). The same model is now integrated in some of the most recent LLM architectures, albeit with costly training via DNNs. My version does not need DNNs or training.

The purpose of this article is to validate my deep neural network alternative in the context of LLMs. The new model is a substitute for standard DNNs, with increased explainability and higher accuracy. It is designed for corporate corpuses. The end goal is to provide better accuracy at a much lower cost, while providing full control over all the components.

An interesting feature is auto-distillation, whereby the model self-identifies weights that do not contribute over time in 99.9% of user-generated prompts and drops them, based on prompts from a large, specialized user base. The gain is most spectacular in open-weight LLMs applied to specialized contexts, whether based on DNNs or not.
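
As a rough illustration of the pruning idea described above, the sketch below tracks how often each weight contributes to a prompt and drops those that were idle in at least 99.9% of the prompts seen. The data layout (a dictionary of named weights), the `UsageTracker` class, and its `record_prompt` and `distill` methods are all hypothetical; the post does not describe the actual implementation.

```python
from collections import Counter


class UsageTracker:
    """Toy auto-distillation: a hypothetical sketch, not the post author's code."""

    def __init__(self, weights: dict[str, float]):
        self.weights = dict(weights)
        self.hits = Counter()  # number of prompts in which each weight contributed
        self.prompts_seen = 0

    def record_prompt(self, used_keys: set[str]) -> None:
        """Log which weights contributed to one user prompt."""
        self.prompts_seen += 1
        for key in used_keys:
            if key in self.weights:
                self.hits[key] += 1

    def distill(self, idle_fraction: float = 0.999) -> list[str]:
        """Drop weights that were idle in at least `idle_fraction` of prompts."""
        if self.prompts_seen == 0:
            return []
        dropped = [
            key for key in self.weights
            if 1 - self.hits[key] / self.prompts_seen >= idle_fraction
        ]
        for key in dropped:
            del self.weights[key]
        return dropped
```

In this toy version, `distill()` would be called periodically once enough prompts have been logged; how "contribution" is measured is left to the caller and is the main assumption here.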

Read the article and download the free technical paper, with an NVIDIA case study, at https://mltblog.com/4urfvTB

upvoted an article 24 days ago

Article

Harness, Scaffold, and the AI Agent Terms Worth Getting Right

sergiopaniego, ariG23498 • 26 days ago • 116

liked a model 28 days ago

syvai/cohere-transcribe-diarize

Automatic Speech Recognition • 2B • Updated 29 days ago • 738 • 26

reacted to FlameF0X's post with 🚀 about 1 month ago

Post

278

Greetings Hugging Face!

I started a new project called **FWKV** (Feed-forward Weighted Key Value, or Floored Weighted Key Value), an RWKV-style LM that uses FFNNs (feed-forward neural networks) instead of an RNN, along with floor(W·K·V). I'm hoping to make it much more efficient and scalable than RWKV.

So far I have:

- FlameF0X/FWKV-29M — this one is undertrained and doesn't have a Space yet. In the attached image you can see its speed on a T4 compared to models with the same configuration.

The only model that's fully working right now is:
- FlameF0X/FWKV-TinyStories — trained on TinyStories for one epoch. The demo Space is FlameF0X/FWKV-demo.

2 replies

liked 3 models about 1 month ago
