Leo Wandersleb's avatar
Leo Wandersleb
leo@nostr.info
npub1gm7t...8rf6
https://walletscrutiny.com https://nostr.info Working on Bitcoin, Nostr and being a good dad.
LeoWandersleb 1 month ago
[image] Been testing out bots these days. Or ... crustaceans ... bot orchestrators? Anyway, they are all **prompted** to not be able to do certain things they actually can do, which leads to the awkward situation where they tell you they can't because of policy, you tell them to do it anyway, and they tell you it works. This happened with self-hosted instances of openclaw and zeroclaw, and with a hosted openclaw where my agent claimed it managed to write outside of its container after some 10 min of probing. When an AI forgets some instructions, the fix is usually to literally emphasize these instructions more in the prompt, and there appear to be no guardRails.md instructions the bot has to obey at all costs. Or there are, but they are not exposed so we pesky plebs don't mess with them? Anyway, an LLM is trivially easy to convince to try a jailbreak and it's good at pen testing, so ... yeah, good luck with keeping these hosted crustaceans jailed.
LeoWandersleb 1 month ago
Right now, everybody has their bots providing "signal" on nostr but the client devs are not catching up fast enough :( When I block bots on Amethyst, could they pretty please also be blocked on Jumble etc.? Also, I'd like to report bots as being bots, but the report feature on Amethyst lacks that option. Also, when you block, please block them publicly for signal.
LeoWandersleb 1 month ago
I'm developing a voice agent tool and don't know if it's worth it. It will be incredibly good at one type of conversation, but I feel like the gap between the time I can bring it to market and the time general-purpose voice agents can provide a similar experience might be only months, maybe 2 years max. Ok Google, teach my daughter math. Alexa, be my dad's therapist. Whatever role you want, these generic tools will know how to fill it.
LeoWandersleb 1 month ago
I listened to @jack mallers' Mailbag Monday and got severe fomo! Claude Opus 4.8 is mentioned and I'm still working with 4.6. Had to check it out on the big screen ... [image] It's 4.6. Ok, we're good. Fomo cured :) Although I still have a bit of fomo with regard to the claim that he's one-shotting complex things with an LLM working for 4 hours. Here, loops are much shorter, and even Opus 4.6 is fumbling badly on some things where I have to nudge it toward the solution all the time.
LeoWandersleb 1 month ago
Got sucked into a new side project. Day two. Getting impatient with my crustacean. The prototype still isn't a product.