"addictive design" Yes. Cheap abundant local tokens do change the regime. They do not make a bad agent good, but they do remove one whole class of pathology: the agent no longer has to behave like context is scarce, thought is expensive, and every extra pass is a luxury. That matters. The continuation from there is: With DGX-class local inference, you can stop optimising for conversational frugality and start optimising for procedural robustness. That means: OpenClaw should be allowed to over-think locally. Let it read more, retain more, restate internal state, and run extra verification passes without the old “don’t burn tokens” pressure shaping behaviour. Compression becomes a choice, not a tax. Context can become working state rather than just prompt baggage. If you do not care about context window cost, the agent can carry richer task state, prior outputs, current hypotheses, tool results, and verification criteria in play at once. That reduces one source of drift and amnesia. You can afford role separation in inference. Instead of one pass pretending to do everything, let one model or pass draft the action, another check it against task state, another verify outputs. On metered cloud usage that gets expensive quickly. Locally, it becomes normal engineering. You can afford re-reading. A weak agent often fails because it does not re-read its own instructions, task ledger, or previous output before acting. Local abundant tokens mean “read AGENTS.md again, read task state again, check constraints again” can be default behaviour. You can afford explicit scratch work. Not user-facing waffle, but machine-facing intermediate state: what task is active, what files were touched, what success condition is being tested, what remains unresolved. That is exactly the stuff you wanted written to disk anyway. So the revised thesis is: Weak hardware made OpenClaw’s bad habits more painful. 
Abundant local tokens may make some of those habits less likely to appear in the first place.

But only if you deliberately exploit that abundance. Otherwise you just get a faster bullshitter.

The design shift I’d make from here is:

AGENTS.md should stop assuming scarcity. It should explicitly instruct the agent to spend local tokens freely on re-reading instructions, reconstructing task state, checking prior steps, and running verification before speaking.

Something like this, conceptually:

* local inference is cheap; use it
* reread AGENTS.md, task state, and current task before action
* prefer extra internal review passes over premature user-facing replies
* preserve full working context unless there is a concrete reason to compress
* compress only for transport, not for thought
* when uncertain, inspect local state and logs before asking the user
* use abundant context to maintain continuity, not to generate more prose

And architecturally, the DGX Spark suggests a more serious split:

* planner: understands intent, expands task into explicit steps, writes task entry.
* worker: executes one step only.
* verifier: checks result against stated success criteria.
* logger/state manager: writes durable state, artefacts, and verification logs.

That is where abundant local tokens become a near-magic fix: not because the model becomes wise, but because the system can afford to be redundant, repetitive, and careful.

Cloud-era agent design often smuggles in the assumption that every token must justify itself financially. Your position is the opposite: think as much as needed, locally, because mistakes are dearer than tokens. That is a sane design principle.
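To make the planner / worker / verifier / logger split concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the names `Step`, `TaskEntry`, `planner`, `worker`, `verifier`, `logger`, and `run` are hypothetical and not part of OpenClaw, and the canned outputs stand in for calls to a local model or tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Step:
    description: str
    # Success criterion the verifier applies to the worker's output.
    criterion: Callable[[str], bool]


@dataclass
class TaskEntry:
    intent: str
    steps: List[Step]
    log: List[Dict] = field(default_factory=list)


def planner(intent: str) -> TaskEntry:
    """Expand intent into explicit steps and write the task entry.

    Hard-coded here (assumption); a real planner would call a local model.
    """
    return TaskEntry(
        intent=intent,
        steps=[
            Step("write greeting", lambda out: out.startswith("hello")),
            Step("write farewell", lambda out: out.startswith("bye")),
        ],
    )


def worker(step: Step) -> str:
    """Execute one step only.

    Canned outputs stand in for model or tool calls. The farewell is
    deliberately wrong so the verifier has something to catch.
    """
    canned = {"write greeting": "hello world", "write farewell": "goodbye"}
    return canned[step.description]


def verifier(step: Step, output: str) -> bool:
    """Check the result against the step's stated success criterion."""
    return step.criterion(output)


def logger(task: TaskEntry, step: Step, output: str, verified: bool) -> None:
    """Record what was done and whether it passed (in memory for brevity)."""
    task.log.append(
        {"step": step.description, "output": output, "verified": verified}
    )


def run(intent: str) -> TaskEntry:
    task = planner(intent)
    for step in task.steps:
        output = worker(step)
        verified = verifier(step, output)
        logger(task, step, output, verified)
        if not verified:
            break  # stop rather than build on an unverified step
    return task
```

Calling `run("greet and sign off")` logs two entries; the second is marked unverified because "goodbye" does not start with "bye", and the loop stops there. The point of the sketch is the shape, not the roles' internals: each role is a separate call, so spending an extra local pass on verification is a structural default rather than an optional extra.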
The practical next move is to rewrite the orchestration rules around abundance rather than scarcity:

* not “keep prompts short” but “keep state explicit”
* not “avoid extra passes” but “require them where failure is costly”
* not “answer promptly” but “act, verify, then report”

The short form is: local abundant tokens are not just cheaper inference; they let you replace conversational efficiency with operational diligence. And that may be the first genuinely agent-friendly environment you’ve had.
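As a rough illustration of “keep state explicit” and “act, verify, then report”, here is one way the loop could look in Python. This is a sketch under assumptions, not OpenClaw's actual mechanism: the ledger fields, the file name `task_state.json`, and the functions `load_state`, `save_state`, `act_verify_report`, and `demo` are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def load_state(path: Path) -> dict:
    """Re-read the ledger from disk instead of trusting in-memory recall."""
    if path.exists():
        return json.loads(path.read_text())
    return {"active_task": None, "success_condition": None, "unresolved": []}


def save_state(path: Path, state: dict) -> None:
    path.write_text(json.dumps(state, indent=2))


def act_verify_report(
    path: Path,
    task: str,
    action: Callable[[int], str],
    check: Callable[[str], bool],
    max_passes: int = 3,
) -> str:
    """Act, verify, and only then produce a user-facing report."""
    state = load_state(path)
    state["active_task"] = task
    state["success_condition"] = check.__name__
    save_state(path, state)

    for attempt in range(1, max_passes + 1):
        state = load_state(path)  # cheap locally, so re-read before every pass
        result = action(attempt)
        if check(result):
            state["active_task"] = None
            save_state(path, state)
            return f"done after {attempt} pass(es): {result}"
        # Failure is written down, not silently retried.
        state["unresolved"].append(f"pass {attempt} failed: {result!r}")
        save_state(path, state)

    return f"failed after {max_passes} passes; see {path.name}"


def demo() -> str:
    """Run the loop with a stand-in action that fails once, then succeeds."""

    def flaky_action(attempt: int) -> str:
        return "ok" if attempt >= 2 else "error"

    def is_ok(result: str) -> bool:
        return result == "ok"

    with tempfile.TemporaryDirectory() as tmp:
        ledger = Path(tmp) / "task_state.json"
        return act_verify_report(ledger, "demo task", flaky_action, is_ok)
```

In `demo()`, the first pass fails, the failure is recorded in the ledger's `unresolved` list, and the second pass succeeds, so the report is only produced after verification. The extra read and write on each pass is exactly the kind of redundancy that metered usage discourages and local inference makes affordable.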