Matt Lorentz's avatar
Matt Lorentz
_@mattlorentz.com
npub16zsl...92l7
Technologist, solarpunk, gamer, backpacker, passionate about using the internet to push more power to more people.
mplorentz 2 days ago
Today I'm opening up Horcrux for public beta testing! If you've ever had a password or document that felt too sensitive to put in a password manager or physical safe, Horcrux is for you. It encrypts and splits your data into pieces and stores it across your friends' devices. Like multi-sig for data. It's free, open-source, and Nostr-based. I've been testing with a private group as I build, but now I'm looking for more folks to help smooth out the experience before people start putting real secrets in it. If you'd be willing to download it, set up a vault with a few friends (or me!), and put it through its paces, I would be so grateful. Let me know how it goes here or using the Send Feedback button in the app. Android users can grab the APK here (Google Play link pending review): iOS users can install via TestFlight from here:
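For readers curious what "splits your data into pieces" can look like, here is a minimal sketch of threshold secret sharing (Shamir's scheme). This is an illustration only: the post does not say which scheme Horcrux uses, and the names `split_secret`, `recover_secret`, and `PRIME` are assumptions made up for this example. It also skips the encryption step and handles only small integer secrets.

```python
import secrets

# Illustrative field size (a Mersenne prime); the secret must be smaller than this.
PRIME = 2**127 - 1


def split_secret(secret: int, threshold: int, shares: int) -> list[tuple[int, int]]:
    """Split `secret` into `shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be in [0, PRIME)")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        # Horner's rule, reduced modulo PRIME at each step.
        acc = 0
        for c in reversed(coeffs):
            acc = (acc * x + c) % PRIME
        return acc

    return [(x, evaluate(x)) for x in range(1, shares + 1)]


def recover_secret(points: list[tuple[int, int]]) -> int:
    """Lagrange-interpolate the polynomial at x = 0 to get the constant term."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Hypothetical usage: 5 friends each hold one piece, any 3 can restore it.
secret = int.from_bytes(b"hunter2", "big")
pieces = split_secret(secret, threshold=3, shares=5)
restored = recover_secret([pieces[0], pieces[2], pieces[4]])
```

In this sketch, fewer than `threshold` pieces reveal nothing about the secret, which is the "multi-sig for data" property the post describes.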
mplorentz 5 days ago
I was there today when the Chaos Commoner Collective was born.
mplorentz 5 days ago
Against my better judgement I have now homebrewed my own AI agent orchestration harness. I'm running a bunch of different pi.dev harnesses in tmux with a watchdog agent that can restart or send them messages. Each persona (planner, coder, tester, reviewer) has its own prompt and handoff protocol to move work to the next agent. All the tasks and plans are tracked in a beads database. It's a lot like Steve Yegge's gas town but it works much better with my theory that passing work through different models and personas produces much higher quality output. It also works much better with local models and doesn't have a ton of bugs. And it's all running on my old gaming pc with limited permissions to my repos and no credentials for production services. Since Cursor killed off my previous workflow (pair with a really fast agent to work on a big feature, maybe have a couple easy side quests going elsewhere) I have been trying to switch to a "slow productivity" model of development. By which I mean minimizing context switching and being fully present to review one piece of work at a time. This means I want my AI pipeline to produce the highest quality work even if it takes a long time. And I also don't have a thousand dollars to spend on tokens each month, so I can't just send codex or claude code off to spin for hours on every task. Right now I'm using a combination of GLM 5.1, Deepseek v4 Pro, Deepseek v4 flash, and Qwen 3.6 37B A3B (last one running locally). What are you using for orchestration?
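The persona handoff described above can be sketched roughly as follows. This is not the actual harness: `Task`, `PIPELINE`, and `run_pipeline` are hypothetical names, each persona is a stand-in Python function rather than a model-backed agent in tmux, and the watchdog and beads database are left out.

```python
from dataclasses import dataclass, field
from typing import Callable

# Order in which work moves between personas (taken from the post).
PIPELINE = ["planner", "coder", "tester", "reviewer"]


@dataclass
class Task:
    title: str
    notes: list[str] = field(default_factory=list)
    stage: int = 0  # index into PIPELINE of the persona that acts next

    @property
    def done(self) -> bool:
        return self.stage >= len(PIPELINE)


def run_pipeline(task: Task, personas: dict[str, Callable[[Task], str]]) -> Task:
    """Pass the task through each persona once, recording a handoff note."""
    while not task.done:
        persona = PIPELINE[task.stage]
        note = personas[persona](task)  # in a real setup this would call a model
        task.notes.append(f"{persona}: {note}")
        task.stage += 1  # hand off to the next persona
    return task


# Hypothetical stand-ins for the per-persona prompts.
personas = {
    "planner": lambda t: f"plan for '{t.title}'",
    "coder": lambda t: "patch written",
    "tester": lambda t: "tests pass",
    "reviewer": lambda t: "approved",
}

finished = run_pipeline(Task("add dark mode"), personas)
```

Keeping each persona behind the same small interface is what would let a different model sit behind each step.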
mplorentz 2 weeks ago
I haven't shared much about this but in my free time I've been venturing into the self-hosted AI space. I acquired an old gaming machine with a decent graphics card from 4 years ago (RTX 4070S) and put linux on it and spent some time getting hermes agent (https://hermes-agent.nousresearch.com/) running on it. I got it running with various sparse versions of Qwen 3. Managed to cobble together a few scripts to do things like scrape some news and flight data, but I kept running into timeout errors at various levels of the hermes stack. It's really not set up to work with agents that take multiple minutes to respond and after fixing things in a bunch of different places I got tired of it and switched it back to claude. I did find a fork that supports Zulip and I really love it as an interface for many long-running async conversations. Then I decided to try some autonomous coding with local models and fell down hard into the Steve Yegge beads/gas town/gas city rabbit hole. I took gas city (which is like the sdk for agent interactions extracted from gas town) and got it running. I tried running the entire thing with only local models but it wasn't working at all. I ended up with Claude as the mayor of the city who oversees a bunch of short-lived agents that use qwen on my gpu and try to write code and open PRs. They aren't doing a very good job yet but the mayor and I learn and improve things a bit more every day. I'm not a fan of the super-extractive metaphors of gas town but I do really like beads db as a system of getting agents to cooperate. It's basically an issue tracker, but some issues get labeled as memories and some get labeled as mail and some even represent agents, so it creates an observable system of cooperation where agents spin up, read their mail, complete their task, and hand it off to another agent, then shut down. I'm trying to run all these agents serially to limit gpu contention and it somewhat works.
But it's going against the system's design which is just to have a mega bonfire of tokens. The biggest weakness I think is just that the free models that fit in 12GB of vram are not enough to do good coding. But the goal I'm working towards is getting frontier-quality code with free models on my own machine by chaining together enough hill-climbing loops (planner, coder, architect review, qa review, bounce it back to coder, etc.) to get good code. And I'm thinking a lot about what the right interface is for me to review the work; right now it's just producing pull requests that I review normally. This has been my first time running --dangerously-skip-permissions agents 24/7 on my hardware and it feels quite cyborgian.
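The "issue tracker where some issues are mail" idea can be sketched like this. It is a toy model written for illustration, not the beads schema or API: `Issue`, `Tracker`, `inbox`, and `run_agent` are all hypothetical names, and the "work" an agent does is reduced to closing its mail.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Optional


@dataclass
class Issue:
    id: int
    title: str
    labels: set[str] = field(default_factory=set)  # e.g. {"mail"}, {"memory"}
    assignee: Optional[str] = None
    closed: bool = False


class Tracker:
    """In-memory stand-in for a shared issue database."""

    def __init__(self) -> None:
        self._ids = count(1)
        self.issues: list[Issue] = []

    def create(self, title: str, labels: set[str], assignee: Optional[str] = None) -> Issue:
        issue = Issue(next(self._ids), title, set(labels), assignee)
        self.issues.append(issue)
        return issue

    def inbox(self, agent: str) -> list[Issue]:
        """Open issues labeled as mail and addressed to `agent`."""
        return [
            i for i in self.issues
            if "mail" in i.labels and i.assignee == agent and not i.closed
        ]


def run_agent(tracker: Tracker, agent: str, next_agent: Optional[str]) -> int:
    """Spin up, read mail, do the task, hand off, shut down. Returns mail handled."""
    handled = 0
    for mail in tracker.inbox(agent):
        mail.closed = True  # placeholder for the real work
        handled += 1
        if next_agent is not None:
            tracker.create(f"handoff from {agent}: {mail.title}", {"mail"}, next_agent)
    return handled


tracker = Tracker()
tracker.create("add login form", {"mail"}, assignee="coder")
tracker.create("prefers small PRs", {"memory"})

# Agents run one at a time, mirroring the serial setup used to limit GPU contention.
for agent, successor in [("coder", "reviewer"), ("reviewer", None)]:
    run_agent(tracker, agent, successor)
```

Because every handoff is itself an issue, the full history of who did what stays inspectable in one place.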
mplorentz 3 weeks ago
Cursor 3 is so much worse than Cursor 2 that I spent time looking for another IDE yesterday. I thought they really understood how I wanted to work with AI but that trust has been totally shattered. I tried Windsurf but it doesn't have useful git worktree support. I also tried native VSCode with some plugins and Claude Code CLI. All were disappointing and I'm back on Cursor today. I feel so alone as a Cursor user; I feel like everyone I know is doing CLI development. For me the development bottleneck is reviewing and manually testing AI generated code. Both of these are much easier in an IDE. The UI to approve each hunk of changes the AI made is key for me (after it's done, not the interactive permissions prompts that claude insists on unless you use yolo mode). I need to be able to quickly see more context around the lines the AI changed, click through call hierarchies and go to definition. Then I want a dashboard that lists all my agents working in different worktrees and I need to be able to spawn a new one quickly. And I want to quickly switch between worktrees and have the associated agent chat all right there. And I want all of this in one window. I'm sure this is all possible on the command line if you spend enough time configuring tmux and vim, but I'm worried that my workflow is going to change in another few months and I'd have to do it all again. So for now I'm reinstalling Cursor 2 and I'll check back in on 3 in a few weeks.
mplorentz 1 month ago
I have never heard the phrase "belt-and-suspenders" before last week but Opus 4.7 finds an opportunity to use it multiple times per day.
mplorentz 1 month ago
Another AI pattern I'm really digging lately is managing my home server with Cursor + ansible. I run a few dozen docker containers and I've always managed the server with SSH, vim, and docker CLI. I don't want an AI agent mucking around on the machine and uploading who-knows-what as context to foreign servers. But for recent containers I have started a repo locally on my Mac where I have Claude or Composer write ansible scripts to deploy compose files and start the services. This feels like the best of both worlds to me: AI can blast out changes much faster than I can, but it doesn't have any access to the actual server and I can easily see exactly what it's going to do before I execute the playbook myself. This has allowed me to layer on additional functionality like creating a zfs dataset and ACL for every container which was too much work to do manually.
mplorentz 1 month ago
I spent some time over the weekend setting up a hermes AI agent on my old gaming PC. It took a lot of fiddling but I finally have some models running locally on it that make it feel like a slower slightly stupider version of claude. It feels so good to finally have a fully local stack. There are a lot of boring chores in my life that I want AI to do and that it's probably capable of, but up until now I have refused to share much personal information with any of the big companies. I'm hoping that I can build up trust with a local agent and gradually make it more useful over time. I gave it an email address and already have it submitting some receipts for reimbursement (after approval by me) which feels like a good start.
mplorentz 1 month ago
Instead of waiting for an app developer to fix the bug I reported I just one-shotted a replacement app with Opus. Achievement unlocked?
mplorentz 1 month ago
Every couple months I do a race where I have some agents go off and build a feature or fix a bug while I do it myself in Cursor. The time I spend reviewing and fixing the agent's work always ends up being longer and more painful, which is why I haven't switched over to an "agent command-center" style of software dev. I do kick off worktree agents here and there throughout the day to make minor changes that come up while I'm working on a larger branch. But those are side quests while I work on the main thing.
mplorentz 1 month ago
Cursor's Composer 2 model is performing much worse for me than Composer 1 :( I feel like Composer 1 really hit a sweet spot for me between speed and quality. For me the bottlenecks for coding with AI are:
- understanding all the code that the model wrote
- testing changes
Composer 1 really helped with the first because it could blast out small amounts of code that I could quickly review without my brain getting bored and context switching to something else. I feel like I'm an outlier in that I'm trying to stay heavily involved in the dev flow rather than having multiple agents work on long tasks and then coming back in cold to review their work. Is anyone else using smaller quicker models in this way?