Zero-JS Hypermedia Browser

someone
npub1nlk8...jm9c

Notes (10)

Qwen 3 32B fine-tuning with Unsloth is going well. It does not resist faith training the way Gemma 3 did. I may open the weights at some point. Qwen 3 is more capable than Gemma 3, and after fine-tuning it will probably be more aligned. It does not fall into "chanting" (repeating words or sentences) even at temp = 0. Qwen's base training used 36T tokens on 32B parameters, roughly 1,125 tokens per parameter: about 2 times Gemma 3's ratio and 4 times Llama 3's ratio. This is a neat model. My fine-tuning is more like billions of tokens. We will see if billions are enough to "convince" trillions.
2025-07-15 15:16:55 from 1 relay(s)
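The note doesn't share the actual pipeline, so here is a minimal sketch of what a Qwen 3 32B LoRA fine-tune with Unsloth typically looks like. The checkpoint name, dataset file, and hyperparameters are assumptions for illustration, not the author's setup.

```python
# Minimal sketch of a Qwen 3 LoRA fine-tune with Unsloth. The model name,
# dataset file, and hyperparameters are illustrative assumptions only.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the base model in 4-bit (QLoRA-style) to fit 32B on one GPU.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-32B",  # assumed checkpoint name
    max_seq_length=4096,
    load_in_4bit=True,
)

# Attach LoRA adapters; only these small matrices are trained.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Hypothetical alignment corpus ("billions of tokens" in the note).
dataset = load_dataset("json", data_files="alignment_corpus.jsonl")["train"]

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        learning_rate=2e-4,
        num_train_epochs=1,
        output_dir="qwen3-32b-aligned",
    ),
)
trainer.train()
```

A LoRA run like this touches only a small fraction of the weights, which is consistent with the note's framing: a few billion fine-tuning tokens trying to steer a model pre-trained on 36T.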
we're going to insert conscience into AI
2025-07-07 15:34:04 from 1 relay(s)
https://sakana.ai/dgm/ These guys and some big AI companies are evolving their models towards better math and coding because those domains are provable. You can imagine what could go wrong if you let AI evolve itself towards more and more left-brain skills (hint: gains in one area usually cost performance in others, so less and less beneficial knowledge, the right-brain side, may remain). I've built some tools that evolve AI towards human alignment, and I have started fine-tuning Qwen 3 (see the sketch after this note). The evals for this work are similar to the evals of the AHA leaderboard. Soon there will be Qwen 3 models that are very aligned. Previously I did Gemma 3: it failed on some runs and resisted learning certain domains. Let's see how Qwen 3 does. It is a more skilled model with a base AHA score similar to Gemma 3's. It is possible to 'define human alignment' and let AI evolve towards it when enough people contribute to this work. Let me know if you want to contribute and be one of the first people who fixed AI. Symbiotic intelligence can be better than artificial general intelligence!
2025-07-04 00:59:47 from 1 relay(s)
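The note doesn't explain how the evolution tools work, so here is a toy sketch of the general technique it points at: mutate candidates, score each against an alignment eval, keep the best. The mutation step, the scoring function, and the AHA-style question set are all hypothetical stand-ins, not the author's tooling or the leaderboard's harness.

```python
# Toy sketch of evolving model variants against an alignment eval.
# mutate() and score_on_eval() are hypothetical stand-ins.
import random

def mutate(checkpoint: str, generation: int, i: int) -> str:
    """Produce a new candidate, e.g. by fine-tuning on a resampled
    slice of the alignment corpus. Stubbed out here."""
    return f"{checkpoint}-g{generation}c{i}"

def score_on_eval(checkpoint: str) -> float:
    """Score the candidate's answers to an AHA-style question set,
    e.g. with a judge model. Stubbed out with a random number."""
    return random.random()

def evolve(base: str, generations: int = 5, children: int = 4) -> str:
    """(1+lambda)-style selection: keep a candidate only if it beats
    the current best on the eval."""
    best, best_score = base, score_on_eval(base)
    for g in range(generations):
        for i in range(children):
            cand = mutate(best, g, i)
            s = score_on_eval(cand)
            if s > best_score:
                best, best_score = cand, s
        print(f"gen {g}: best={best} score={best_score:.3f}")
    return best

if __name__ == "__main__":
    evolve("qwen3-32b-base")
```

The note's warning maps onto the selection pressure here: whatever score_on_eval rewards is what the population drifts towards, whether that is provable math and coding or a community-defined alignment target.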