Thread - Nostr Hypermedia

Jonathan _@jonathansm.com npub1uqee...jckg

https://arxiv.org/html/2501.12599v2 Finally found the paper at least. I had to go to the wasteland most have never traveled to: the rumored second page of the search results. Some interesting techniques that I haven't seen before. They made sure the problems they did the RL on really did require reasoning to solve by having a non-reasoning model guess the answer, and removing the problem if the answer was guessable without reasoning. They also modified the reward function to train the model to get the right answer with the least possible amount of thinking. That might help avoid some of those annoying times with DeepSeek R1 and other models where it thinks of the right answer and then says "Wait" and goes off in the completely wrong direction. Interesting stuff. Still can't figure out why this model release stayed completely under the radar. You would think the Twitter AI influencers would be talking about this because Americans are still worried about China having better AI than us. nostr:nevent1qvzqqqqqqypzpcpnjdyv5m9vjuyvmx8xx830fw4d2dxle6rs3qdkt2jh6v8lwff7qythwumn8ghj7ct5d3shxtnwdaehgu3wd3skuep0qyt8wumn8ghj7etyv4hzumn0wd68ytnvv9hxgtcqyqr9x5prljey4s858rd0few6tnp6nzrwrezzpknse2qjulvz67smv2ppmfu
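
A rough Python sketch of the two ideas described above, as I read them from the post rather than from the paper's code. Every name here (`filter_guessable`, `guess_fn`, `length_penalized_reward`, `weight`) is hypothetical, and the exact reward shape is an assumption chosen only to illustrate "correct answers with less thinking score higher".

```python
def filter_guessable(problems, guess_fn, attempts=8):
    """Keep only problems that a non-reasoning guesser never answers correctly.

    `guess_fn` stands in for a model answering without any chain of thought
    (an assumption; any callable mapping question -> answer works here).
    """
    kept = []
    for problem in problems:
        guessed = any(
            guess_fn(problem["question"]) == problem["answer"]
            for _ in range(attempts)
        )
        if not guessed:
            kept.append(problem)
    return kept


def length_penalized_reward(is_correct, n_tokens, min_len, max_len, weight=0.5):
    """Correctness reward plus a bonus or penalty based on response length.

    `min_len` and `max_len` are the shortest and longest sampled responses
    for the same problem. Shorter responses get a positive length term,
    longer ones a negative term. Wrong answers never gain from being short.
    """
    base = 1.0 if is_correct else 0.0
    span = max_len - min_len
    if span == 0:
        return base  # all samples have equal length, so nothing to compare
    length_term = 0.5 - (n_tokens - min_len) / span  # in [-0.5, 0.5]
    if not is_correct:
        length_term = min(0.0, length_term)
    return base + weight * length_term
```

With this shape, a correct answer that is the shortest in its group scores higher than a correct answer that is the longest, which is the pressure that would discourage the "Wait" detours after the model already has the right answer.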

2025-04-09 16:09:14

Replies (1)