Update inference_tagger_standalone.py
Fix backbone state-dict loading: remap backbone.model.layer.* → backbone.layer.*
The checkpoint stores the 32 transformer blocks under backbone.model.layer.N.* (HF-style, with an intermediate model wrapper), but DINOv3ViTH in this script declares them at backbone.layer.N.*. Combined with strict=False, assign=True in load_state_dict, all 608 block parameters (32 layers × 19 tensors per block) were silently failing to load: the backbone ran on default nn.Linear / nn.LayerNorm initializations while only the head loaded correctly. The only hint was a printed [Tagger] Missing keys (608): ['backbone.layer.0.layer_scale1', ...] line that was easy to miss, and the model produced plausible-looking but essentially random tag predictions, making it feel like undertraining.
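The mismatch can be reproduced with plain set arithmetic (per-block tensor names below are hypothetical placeholders; the real DINOv3 block has 19 named tensors per layer):

```python
# Hypothetical key layout mirroring the bug: checkpoint keys carry an
# extra "model." segment that the in-script module names lack.
N_LAYERS, N_PER_BLOCK = 32, 19

ckpt_keys = {f"backbone.model.layer.{n}.p{t}"
             for n in range(N_LAYERS) for t in range(N_PER_BLOCK)}
model_keys = {f"backbone.layer.{n}.p{t}"
              for n in range(N_LAYERS) for t in range(N_PER_BLOCK)}

# With strict=False these keys are merely *reported* as missing;
# loading still "succeeds" and the blocks keep their random init.
missing = model_keys - ckpt_keys
assert len(missing) == 608            # 32 layers x 19 tensors
assert not (model_keys & ckpt_keys)   # zero block tensors actually load
```

Every single block tensor misses, which is exactly the 608-key warning line quoted above.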
Confirmed by dumping tagger_proto.safetensors keys: they're all under backbone.model.layer.N.*, and the head is a single projection.weight of shape (74625, 6400).
Changes:
Strip the intermediate "model." segment from backbone keys during loading, so backbone.model.layer.N.* maps to self.layer[N].* correctly.
Load both backbone and head with strict=True, so any future name/shape drift fails loudly at load time instead of silently returning noise.
Auto-detect head layout (currently a single Linear) so this class of silent mis-load can't recur if the head changes later.
Minor: preserve aspect ratio consistently in preprocessing, use torch.zeros instead of torch.empty for embedding parameters, and drop the redundant torch.autocast wrapper (the backbone is explicitly cast to bf16; the head stays fp32 per the training recipe).
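The remap in the first change is a straightforward prefix rewrite. A minimal sketch (function name is mine, not from the script):

```python
def remap_backbone_keys(state_dict):
    """Strip the intermediate 'model.' segment from backbone keys so
    backbone.model.layer.N.* lines up with backbone.layer.N.*.
    Non-backbone keys (e.g. the head) pass through untouched."""
    prefix = "backbone.model."
    remapped = {}
    for key, value in state_dict.items():
        if key.startswith(prefix):
            remapped["backbone." + key[len(prefix):]] = value
        else:
            remapped[key] = value
    return remapped

# After remapping, load with strict=True so any future name/shape
# drift raises at load time instead of silently returning noise:
#   model.load_state_dict(remap_backbone_keys(sd), strict=True)
```

The pass-through branch matters: only backbone keys carry the extra wrapper, so head keys must not be rewritten.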
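The head auto-detection in the third change can be as simple as classifying the head's state-dict key names; a sketch under the assumption that the current checkpoint stores only projection.weight (and possibly a bias), with the function name being mine:

```python
def detect_head_layout(head_keys):
    """Classify the head from its state-dict key names.

    Returns "single_linear" for the current layout (one projection
    tensor plus optional bias); anything else is "unknown" so the
    caller can fail loudly instead of loading partially.
    """
    if set(head_keys) <= {"projection.weight", "projection.bias"}:
        return "single_linear"
    return "unknown"
```

Combined with strict=True loading, an "unknown" result turns a future head redesign into an immediate, explicit error rather than another silent mis-load.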
Verified by running the loader against a synthesized state dict matching the real key layout (616 keys: 5 embedding + 608 block + 2 final norm + 1 head); strict load passes and a forward returns the right logit shape. Also confirmed by another user who hit the same bug and fixed it by remapping the keys, reporting that outputs went "from horrifically bad to pretty much perfect."