not sure how well the benchmarks reflect everyday use, but the fact that a model as small as 4B can score higher than GPT-4o (ChatGPT’s “main” model for a while, in a way) is CRAZY!
gonna try to squeeze this onto my Mac mini (M2, 8/256 😅)
[alt 1] Tweet from Simon Willison (@simonw) “Qwen3.5 4B apparently out-scores GPT-4o on some of the classic benchmarks (!)” quoting a tweet showing that 5 of 7 benchmark results score Qwen 3.5 4 billion parameter model as higher than ChatGPT 4o.
[alt 1] Screenshot of blog.j4ck.xyz in Dia browser showing a new font on my blog post about "Anna's Archive Just Backed Up All of Spotify"
[alt 1] two separate twitter mirror accounts in a bluesky feed with automated bot icons
[alt 1] iPhone 16 home screen on iOS 26 with a red floral background. Top widgets display 3.72k Bluesky followers for jack and ChatGPT shortcuts. Below is a grid of apps with notification badges. In the bottom dock, the Skyscraper app, a new Bluesky client, is circled in red.
[alt 1] Bluesky app update version 1.118.0 on iOS:
“- Translate posts without leaving the app
- Fresh new look thanks to iOS 26
- Re-arrange your pinned feeds by dragging and dropping them
- Many small bug fixes and improvements”
[alt 1] Windows 11 desktop featuring a dark Raycast wallpaper with abstract, flowing black metallic ribbons. A centered taskbar displays icons for Start, Search, Edge, Mail, Explorer, Spotify, NVIDIA, Notepad, Terminal, LastPass, Steam, and Discord. System tray shows 98% battery and time 17:00:26.