An incredibly unscientific test - threw the same question at 4 local models:
"give me a summary of the book daemon by daniel suarez"
Tested models: gpt-oss-20b, qwen3:14b, phi4, deepseek-r1:14b
Phi4 is by far the best (hallucinates the least, tho still a lot)
(the question might have been influenced by @Shawn, who tricked me into rereading the book)