
Using multiple quality tags (NovelAI/Pony/etc) is a good idea

#47 opened by Shinku

I think using multiple quality tags is actually a good idea. What is good or bad is subjective, so you can't really automate this. No matter how you train a model, it won't inherently understand what is "good". However, by using several different rating models, the image model can more or less learn what each of them considers a good or bad image, without getting too biased toward the style a single rating model favors.
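For what it's worth, the idea is easy to sketch: run each image through several aesthetic scorers and emit one coarse tag per scorer, so the model can associate each rater's taste with its own tag instead of averaging everything into one bias. A minimal sketch in Python; the rater names, score range, and bucket thresholds below are made up for illustration, not taken from any actual pipeline:

```python
# Sketch of multi-rater quality tagging: one coarse tag per scorer,
# so captions carry each rater's opinion separately.

def bucket(score: float, rater: str) -> str:
    """Map a normalized 0..1 aesthetic score to a coarse per-rater tag."""
    if score >= 0.8:
        return f"{rater}_high"
    if score >= 0.5:
        return f"{rater}_mid"
    return f"{rater}_low"

def quality_tags(image_scores: dict[str, float]) -> list[str]:
    """image_scores maps rater name -> normalized score for one image."""
    return [bucket(score, rater) for rater, score in image_scores.items()]

# Example: three hypothetical raters disagree about the same image,
# and the caption records all three opinions as separate tags.
print(quality_tags({"rater_a": 0.91, "rater_b": 0.45, "rater_c": 0.62}))
# -> ['rater_a_high', 'rater_b_low', 'rater_c_mid']
```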

Pony tags make gens look like slop.

The best models often have the most accurate anatomy, which in return gives high prompt adherence and reduces morphing; can't go wrong with that.

No quality tags is better: don't train on random "quality" assessments, just train on good data.

> using multiple quality tags is actually a good idea. What is good or bad is subjective

No.

Pony v6 is a perfect illustration of the opposite: it relies so heavily on [score_this, score_that_up, score_abracadabra_whatever] mantras in both the positive and negative prompt that it doesn't generate anything half-decent without them. Not only is that input mandatory, it also occupies a good half of the untruncated CLIP token budget.
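You can check the token cost yourself. A minimal sketch, assuming the commonly recommended Pony v6 positive score prefix and the standard CLIP tokenizer from transformers (neither is quoted from this thread):

```python
from transformers import CLIPTokenizer

# Count how much of CLIP's 77-token window the usual Pony v6 score
# prefix consumes before any actual image description begins.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

score_prefix = "score_9, score_8_up, score_7_up, score_6_up, score_5_up, score_4_up"
token_ids = tokenizer(score_prefix)["input_ids"]

print(f"{len(token_ids)} of 77 tokens spent on scoring tags alone")
```

And per the post above, the negative prompt carries its own score block, so the same cost is paid a second time there.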

If that's not OBJECTIVELY bad, I don't know what is: either you're down to train-wrecked images, because the model doesn't generate anything correctly without the scoring slop, or you can't prompt the model normally due to the tiny window and truncation, with ALL truncated chunks requiring the scoring slop too! We can literally quantify how bad that is. And before you start arguing about it, the model's own authors acknowledged this flaw as something they had to eliminate in their future models.

Granted, Qwen 3 has a much bigger window than CLIP. But it's still going to be an issue, since the tags will still be mandatory input, and that shifts the model's attention from the actual image description towards the quality tags. Even the best models work better with fewer concepts in the prompt.

And what exactly are we communicating about the image to the model with quality tags? "Bro, don't mess that up please"? That carries absolutely no information about the actual image. If you want a [bad image, worst quality, score_1, shitpiece] specimen, you might as well use SD3; there is no need to train a new model for that from scratch.

For these reasons, none of the SOTA models need quality tags at all: Gemini 3 Pro/Flash Image doesn't need them, Flux 2 doesn't, Z-Image doesn't, Qwen 3 doesn't. If you use them anyway, they get treated as filler tokens and ignored. All good community fine-tunes either don't need them or require minimal mandatory input, usually just [best quality, masterpiece], and that's it. I don't see any reason we should ignore this experience here.
