Instructions for using FacebookAI/roberta-base with libraries, inference providers, notebooks, and local apps.
How to use FacebookAI/roberta-base with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="FacebookAI/roberta-base")

# Or load the model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("FacebookAI/roberta-base")
model = AutoModelForMaskedLM.from_pretrained("FacebookAI/roberta-base")
```
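A small usage sketch for the fill-mask pipeline (the sample sentence is my own): note that RoBERTa's mask token is `<mask>`, not BERT's `[MASK]`.

```python
# RoBERTa's mask token is "<mask>"; a BERT-style "[MASK]" will not be recognized.
mask = "<mask>"
prompt = f"The capital of France is {mask}."

# With the pipeline loaded as above, pipe(prompt) returns a list of
# candidate fills, each a dict with "score", "token", "token_str",
# and "sequence" keys, sorted by descending score.
print(prompt)
```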
Replicating RoBERTa-base GLUE results
Original issue here: https://github.com/huggingface/transformers/issues/17885
Hello! I had originally posted this on the forums but it seems like there's not much foot traffic there, so hoping to get more visibility here.
I'm trying to replicate RoBERTa-base GLUE results as reported in the model card. The numbers in the model card look like they were copied from the paper. Has anyone made an attempt to actually match these numbers with run_glue.py? If so, what configuration was used for the trainer?
If I follow the original configs from fairseq, I am unable to match the reported numbers for RTE, CoLA, STS-B, and MRPC.
Any pointers would be much appreciated, thanks!
single card:

```
CUDA_VISIBLE_DEVICES=0
```

hyperparameters:

```
--max_seq_length 128 \
--per_device_train_batch_size 64 \
--learning_rate 1e-4 \
--use_lora True \
--r 8 \
--num_train_epochs 20
```
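For comparison with the LoRA settings above, here is a sketch of a paper-style full-finetuning command. The hyperparameter ranges come from the RoBERTa paper's GLUE setup (learning rate in {1e-5, 2e-5, 3e-5}, batch size in {16, 32}, 10 epochs, 6% linear warmup, weight decay 0.1); the flags are assumed from the `run_glue.py` script in the transformers text-classification examples, and the concrete values shown are one point in that grid, not a verified-to-reproduce configuration. Also note the paper reports finetuning RTE, STS-B, and MRPC starting from an MNLI-finetuned checkpoint rather than from the pretrained model, which `run_glue.py` does not do by default.

```shell
# One point in the RoBERTa paper's GLUE hyperparameter grid, for the RTE task.
python run_glue.py \
  --model_name_or_path FacebookAI/roberta-base \
  --task_name rte \
  --do_train --do_eval \
  --max_seq_length 512 \
  --per_device_train_batch_size 16 \
  --learning_rate 2e-5 \
  --num_train_epochs 10 \
  --warmup_ratio 0.06 \
  --weight_decay 0.1 \
  --seed 42 \
  --output_dir ./rte-roberta-base
```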