https://livecodebench.github.io/leaderboard.html
Currently the best model by far for coding is Kimi by the Chinese company Moonshot. It's trouncing every American model. Even though it was released weeks ago, I had never heard of it. I can't find literally a single website in English mentioning it. This feels like DeepSeek R1 again, but no one has any idea.
https://arxiv.org/html/2501.12599v2
Finally found the paper at least. I had to go to the wasteland most have never traveled to: the rumored second page of the search results.
Some interesting techniques that I haven't seen before. They made sure the problems they did the RL on really did require reasoning to solve: they had a non-reasoning model guess the answer and removed the problem if the answer was guessable without reasoning. They also modified the reward function to train the model to reach the right answer with the least possible amount of thinking. That might help avoid some of those annoying moments with DeepSeek R1 and other models where the model lands on the right answer, then says "Wait" and goes off in a completely wrong direction.
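A rough Python sketch of the two tricks, just to make them concrete. This is my own guess at the idea, not code from the paper. `guess` is a hypothetical stand-in for the non-reasoning model, `attempts=8` is an arbitrary cutoff I picked, and the length term is one simple way to do it: a linear bonus/penalty scaled between the shortest and longest answers sampled for the same problem. It would be added on top of the normal correctness reward.

```python
from typing import Callable, List, Sequence, Tuple

def is_guessable(question: str, answer: str,
                 guess: Callable[[str], str], attempts: int = 8) -> bool:
    # Hypothetical filter: if a model answering with no reasoning hits the
    # correct answer within `attempts` tries, the problem is too easy to use.
    return any(guess(question) == answer for _ in range(attempts))

def filter_problems(problems: Sequence[Tuple[str, str]],
                    guess: Callable[[str], str],
                    attempts: int = 8) -> List[Tuple[str, str]]:
    # Keep only (question, answer) pairs that could not be guessed.
    return [(q, a) for q, a in problems
            if not is_guessable(q, a, guess, attempts)]

def length_rewards(lengths: Sequence[int],
                   correct: Sequence[bool]) -> List[float]:
    # Assumed length term for a group of responses to one problem.
    # lam runs from +0.5 (shortest response) to -0.5 (longest response).
    lo, hi = min(lengths), max(lengths)
    rewards = []
    for n, ok in zip(lengths, correct):
        lam = 0.0 if hi == lo else 0.5 - (n - lo) / (hi - lo)
        # Correct answers get the full term; wrong answers can only be
        # penalized, so being short and wrong never earns a bonus.
        rewards.append(lam if ok else min(0.0, lam))
    return rewards

if __name__ == "__main__":
    always_four = lambda q: "4"  # toy "non-reasoning model"
    print(filter_problems([("2+2?", "4"), ("hard proof?", "42")], always_four))
    print(length_rewards([100, 300, 500], [True, True, False]))
```

With the toy guesser, "2+2?" gets filtered out and only the hard problem survives. The reward call gives the short correct answer +0.5, the medium correct one 0.0, and the long wrong one -0.5, so among correct answers the model is pushed toward the shortest chain of thought.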
Interesting stuff. Still can't figure out why this model release stayed completely under the radar. You would think the Twitter AI influencers would be talking about this, since Americans are still worried about China having better AI than us.
nostr:nevent1qvzqqqqqqypzpcpnjdyv5m9vjuyvmx8xx830fw4d2dxle6rs3qdkt2jh6v8lwff7qythwumn8ghj7ct5d3shxtnwdaehgu3wd3skuep0qyt8wumn8ghj7etyv4hzumn0wd68ytnvv9hxgtcqyqr9x5prljey4s858rd0few6tnp6nzrwrezzpknse2qjulvz67smv2ppmfu