Yesterday it was 48%. I mean, sure, we'll see, but it's going to take fundamentally new changes before we see further gains in performance.
Yesterday it was 48% with 57k RLHF pairs. Today it's 50% with zero. This is notable because the previous paper tried using less data and found that scale was necessary:
"Training data from math train 7.5k to Open-Reasoner-Zero 57k, we observe a consistent increase in both training reward and response length for training and evaluation set, indicating that data scale plays a crucial role in training performance."
This leads me to conclude that, at zero pairs, the previous record was close to 0%. Maybe that isn't strictly true, but I expect that framing to be more predictive than reading this as a mere 2% improvement.