[Query-ISSUE] tokenizer.vocab_size is 128000, however len(tokenizer) is 128256, which prevents me from using those other tokens.

#34

by HV-Khurdula - opened Oct 30, 2024

@HV-Khurdula The extra 256 entries are special tokens, with token IDs ranging from 128000 to 128255.

You can find the complete list in the tokenizer_config.json file.
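To check this locally, one option is to read the `added_tokens_decoder` section of tokenizer_config.json and confirm that the added IDs sit directly above the base vocabulary. The snippet below is a minimal sketch, not an official utility. It assumes the file follows the usual Hugging Face layout, where `added_tokens_decoder` maps string token IDs to objects with a `content` field. The helper names (`load_config`, `special_token_ids_from_config`, `summarize`) are hypothetical.

```python
import json
from pathlib import Path


def load_config(path):
    """Load a tokenizer_config.json file into a dict."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def special_token_ids_from_config(config):
    """Map token_id -> token text for each entry in added_tokens_decoder.

    Assumes the Hugging Face layout: string IDs as keys, and dicts with a
    "content" key as values. Returns an empty dict if the section is missing.
    """
    decoder = config.get("added_tokens_decoder", {})
    return {int(tid): entry["content"] for tid, entry in decoder.items()}


def summarize(ids_to_content, base_vocab_size):
    """Report whether the added IDs form one block starting at base_vocab_size."""
    ids = sorted(ids_to_content)
    if not ids:
        raise ValueError("no added tokens found")
    expected = list(range(base_vocab_size, base_vocab_size + len(ids)))
    return {
        "count": len(ids),
        "first": ids[0],
        "last": ids[-1],
        "contiguous": ids == expected,
        # vocab_size plus the added tokens is what len(tokenizer) counts
        "total": base_vocab_size + len(ids),
    }
```

Run against this model's file, `summarize(special_token_ids_from_config(load_config("tokenizer_config.json")), 128000)` should report 256 contiguous IDs from 128000 to 128255, for a total of 128256. That total matches `len(tokenizer)`, while `vocab_size` counts only the base vocabulary.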
