Training BertForMaskedLM

I have a question.
When training with BertForMaskedLM, is the training data below correct?

  • token2idx: `<pad>`: 0, `<mask>`: 1, `<cls>`: 2, `<sep>`: 3
  • max len: 8
  • input tokens: `<cls> hello i <mask> cats <sep>`
  • input_ids: `[2, 34, 45, 1, 56, 3, 0, 0]`
  • attention_mask: `[1, 1, 1, 1, 1, 1, 0, 0]`
  • labels: `[-100, -100, -100, 64, -100, -100, -100, -100]`

I wonder if I should also assign -100 to the labels for the padding tokens.
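For what it's worth, the layout above can be reproduced with a few lines of plain Python (using the example's hypothetical vocabulary, where `64` is assumed to be the original id of the masked word):

```python
# Hypothetical special-token ids from the example above.
PAD, MASK, CLS, SEP = 0, 1, 2, 3
IGNORE = -100  # PyTorch's CrossEntropyLoss skips positions labeled -100

input_ids = [2, 34, 45, 1, 56, 3, 0, 0]  # <cls> hello i <mask> cats <sep> <pad> <pad>

# Attention mask: 1 for real tokens, 0 for padding.
attention_mask = [1 if t != PAD else 0 for t in input_ids]

# Labels: the original token id at each masked position, -100 everywhere else.
# Position 3 held the masked word, whose original id is 64 in the example.
masked_originals = {3: 64}  # position -> original token id (hypothetical)
labels = [masked_originals.get(i, IGNORE) for i in range(len(input_ids))]

print(attention_mask)  # [1, 1, 1, 1, 1, 1, 0, 0]
print(labels)          # [-100, -100, -100, 64, -100, -100, -100, -100]
```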

Hi,
Were you able to figure it out? I’m also trying to do the same thing.

Thanks,
Ayala

You should replace all tokens (including padding) in `labels` with -100 except the masked tokens, so the loss is calculated only for the masked tokens.
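A minimal sketch of that rule, given both the masked and the unmasked sequence (the token ids are the question's hypothetical ones; in practice, `DataCollatorForLanguageModeling` from `transformers` applies this same -100 convention for you when `mlm=True`):

```python
IGNORE_INDEX = -100  # ignored by PyTorch's CrossEntropyLoss

def make_mlm_labels(input_ids, original_ids, mask_token_id=1):
    """Keep the original id only at masked positions; everything else
    (real tokens, special tokens, and padding) becomes -100."""
    return [orig if tok == mask_token_id else IGNORE_INDEX
            for tok, orig in zip(input_ids, original_ids)]

# Example from the question: "hello i <mask> cats", original word id 64.
masked   = [2, 34, 45, 1, 56, 3, 0, 0]
original = [2, 34, 45, 64, 56, 3, 0, 0]
print(make_mlm_labels(masked, original))
# [-100, -100, -100, 64, -100, -100, -100, -100]
```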