Working on an embedded database custom-built for nostr. Think an embeddable strfry. It will probably be the fastest special-purpose database on the planet once it’s done. I’ve built a custom in-memory note representation with O(1), zero-copy access, so you can memory-map the data from inside lmdb directly into your data types without serializing anything in or out. If that didn’t make sense, the TL;DR: shit is about to get real fast.
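For anyone wondering what "zero-copy" looks like in practice, here is a minimal sketch in C. It is not nostrdb's actual format: `struct flat_note`, its fields, and the `flat_note_*` helpers are hypothetical names made up for illustration. The idea is that the stored bytes already have the in-memory layout, so reading a note is a bounds check plus a pointer cast, and a heap buffer stands in for a memory-mapped lmdb value. It assumes the buffer is 8-byte aligned and written by the same architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flat layout: fixed header, then string bytes in data[]. */
struct flat_note {
    uint8_t  id[32];
    uint8_t  pubkey[32];
    uint64_t created_at;
    uint32_t kind;
    uint32_t content_off;  /* byte offset of the content inside data[] */
    uint32_t content_len;  /* length without the trailing NUL */
    char     data[];       /* C99 flexible array member */
};

/* Write a note into buf in its final layout. Returns bytes used, 0 on error. */
size_t flat_note_write(void *buf, size_t cap, uint32_t kind,
                       uint64_t created_at, const char *content)
{
    size_t clen = strlen(content);
    size_t total = sizeof(struct flat_note) + clen + 1;
    if (clen > UINT32_MAX || total > cap)
        return 0;
    struct flat_note *n = buf;
    memset(n, 0, sizeof *n);           /* id and pubkey left as zeros here */
    n->created_at = created_at;
    n->kind = kind;
    n->content_off = 0;
    n->content_len = (uint32_t)clen;
    memcpy(n->data, content, clen + 1);
    return total;
}

/* Zero-copy "deserialization": validate bounds, then just cast. */
const struct flat_note *flat_note_view(const void *buf, size_t len)
{
    const struct flat_note *n = buf;
    if (len < sizeof *n)
        return NULL;
    size_t avail = len - offsetof(struct flat_note, data);
    size_t end = (size_t)n->content_off + n->content_len;
    if (end >= avail || n->data[end] != '\0')
        return NULL;
    return n;
}

/* O(1) field access: a pointer into the same buffer, nothing is copied. */
const char *flat_note_content(const struct flat_note *n)
{
    return n->data + n->content_off;
}

/* Usage: heap memory stands in for a mapped database page. */
int flat_note_selftest(void)
{
    void *page = malloc(256);
    assert(page != NULL);
    size_t used = flat_note_write(page, 256, 1, 1700000000u, "hello nostr");
    const struct flat_note *note = flat_note_view(page, used);
    int ok = note != NULL && note->kind == 1 &&
             strcmp(flat_note_content(note), "hello nostr") == 0;
    free(page);
    return ok;
}
```

The trade-off in a layout like this is that the on-disk bytes are tied to the struct's padding and byte order, which is usually acceptable for a local embedded database but not for a wire format.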

Replies (46)

Also, embedded has to be a good thing for performance. Right? Nice fucking work Will!
simulx 2 years ago
now how do we shard things across a cluster of pods... and still get a coherent subscription.
> I've built a custom in-memory note representation...

That is quite the trick, much easier to code in C than in Rust. I love this idea. This will be faster than anything I code for some time. I'm serializing/deserializing, and even though I'm using a fast library (speedy) it still requires an allocation and a copy at the minimum.
Thanks for pushing me in this direction. I was debating whether to go the simple route with sqlite, but since nostr is so restricted in its data format I feel like a custom DB with a dedicated mapping would work well.
Pedro Vicente 2 years ago
Thanks. I’m trying to port nostrdb to Windows using CMake; it seems very easy because the code is so simple. Fixed 2 build errors and the test works; it only fails because I have to allocate memory in ndb_tag (declared as a C pointer instead of that zero-size array declaration).
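To make the pointer-versus-array difference concrete, here is a small sketch in C. The structs `tag_flex` and `tag_ptr` are hypothetical stand-ins, not the real `ndb_tag` definition. With a trailing array the header and its elements live in one contiguous block, so the whole thing can sit inside a mapped buffer. With a pointer the elements need their own allocation, and the stored address means nothing once the data is written out and mapped back in.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Trailing-array form: elements follow the header in the same block. */
struct tag_flex {
    uint16_t count;
    uint32_t strs[];   /* C99 flexible array member */
};

/* Pointer form: elements live in a second allocation. */
struct tag_ptr {
    uint16_t  count;
    uint32_t *strs;
};

/* One allocation covers the header and all elements. src must hold count items. */
struct tag_flex *tag_flex_new(uint16_t count, const uint32_t *src)
{
    size_t bytes = (size_t)count * sizeof(uint32_t);
    struct tag_flex *t = malloc(sizeof *t + bytes);
    if (!t)
        return NULL;
    t->count = count;
    if (count)
        memcpy(t->strs, src, bytes);
    return t;
}

/* Two allocations: this is the extra memory the pointer form has to manage. */
struct tag_ptr *tag_ptr_new(uint16_t count, const uint32_t *src)
{
    size_t bytes = (size_t)count * sizeof(uint32_t);
    struct tag_ptr *t = malloc(sizeof *t);
    if (!t)
        return NULL;
    t->count = count;
    t->strs = malloc(bytes ? bytes : 1);
    if (!t->strs) {
        free(t);
        return NULL;
    }
    if (count)
        memcpy(t->strs, src, bytes);
    return t;
}

void tag_ptr_free(struct tag_ptr *t)
{
    if (t) {
        free(t->strs);
        free(t);
    }
}

/* Usage: both forms hold the same values, only the memory shape differs. */
int tag_demo(void)
{
    const uint32_t offs[3] = {10, 20, 30};
    struct tag_flex *f = tag_flex_new(3, offs);
    struct tag_ptr *p = tag_ptr_new(3, offs);
    assert(f != NULL && p != NULL);
    int same = f->count == p->count && f->strs[2] == p->strs[2];
    free(f);
    tag_ptr_free(p);
    return same;
}
```

If the goal is to keep the struct mappable, keeping a trailing array (and adjusting compiler settings for it) is probably a better fit than switching to a pointer, but that is a judgment call for the port.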
How much faster will it be than sqlite? And how much value will it add to your most important projects? It's tempting to do cool stuff but ultimately it always takes (me at least) much more time and effort than anticipated
I think about 20-100 times faster. Not for the whole program, just for the data access part. People love the idea of 'zero copy' but 'zero allocation' is where the real gains are. Memory allocators are complex and sometimes slowish things. I've imagined a library that manages normally cloned data (mostly strings) by owning them in some global singleton object and handing out Arc references to them (actually has to be offsets, not actual references, if you want to persist and restart your code). Mainly so that I don't have to do .clone() and .to_owned() all the fricking time feeling like I'm slowing down the code, and so I can impl Copy on all the structs and as a side effect you could access them directly in LMDB without serialization. That system also has to persist to LMDB.
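The offsets-instead-of-references idea above can be sketched in a few lines. This is in C rather than Rust to match the rest of the thread's code, and every name here (`str_pool`, `pool_intern`, `pool_get`, `profile`) is hypothetical. Strings are stored once in a single byte pool, callers hold 32-bit offsets, and the structs that carry those offsets are plain data that can be copied by assignment. Because offsets are relative to the pool, the pool bytes can be written out and reloaded elsewhere and the offsets still resolve. Deduplication here is a naive linear scan; a real version would likely use a hash table.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define POOL_CAP  4096u
#define POOL_NONE UINT32_MAX

/* All strings live back to back, NUL-terminated, in one buffer. */
struct str_pool {
    uint32_t used;
    char     bytes[POOL_CAP];
};

/* Return the offset of s in the pool, adding it if it is not there yet. */
uint32_t pool_intern(struct str_pool *p, const char *s)
{
    size_t len = strlen(s) + 1;
    uint32_t scan = 0;
    while (scan < p->used) {           /* walk existing strings start to start */
        if (strcmp(p->bytes + scan, s) == 0)
            return scan;
        scan += (uint32_t)strlen(p->bytes + scan) + 1;
    }
    if (len > POOL_CAP - p->used)
        return POOL_NONE;              /* pool is full */
    uint32_t off = p->used;
    memcpy(p->bytes + off, s, len);
    p->used += (uint32_t)len;
    return off;
}

/* Resolve an offset back to a string without copying it. */
const char *pool_get(const struct str_pool *p, uint32_t off)
{
    return off < p->used ? p->bytes + off : NULL;
}

/* Plain data: no owned strings, so copying is a simple assignment. */
struct profile {
    uint32_t name;
    uint32_t relay;
};

/* Usage: intern, copy the struct, "persist" the pool, resolve from the copy. */
int pool_demo(void)
{
    static struct str_pool pool;       /* static storage starts zeroed */
    static struct str_pool reloaded;
    struct profile a;
    a.name = pool_intern(&pool, "alice");
    a.relay = pool_intern(&pool, "wss://relay.example");
    struct profile b = a;              /* no clone, no allocation */
    assert(pool_intern(&pool, "alice") == a.name);   /* deduplicated */
    memcpy(&reloaded, &pool, sizeof pool);           /* stands in for save + load */
    return strcmp(pool_get(&reloaded, b.name), "alice") == 0 &&
           strcmp(pool_get(&reloaded, b.relay), "wss://relay.example") == 0;
}
```

The offsets play the role of the shared references described above, with the difference that nothing is reference-counted: strings are never freed individually, which is what keeps the structs trivially copyable.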
vasily 2 years ago
Is this something all relays will use by default or what is the intended use case(s)?
I already have a fast C content parser 😅. I’m using C code in Damus. Since it’s depending on lmdb, it made more sense because lmdb is a C lib as well. This will also work in rust and notedeck, I’m planning on making a rust interface.
Pedro Vicente 2 years ago
*meant to say C is the way to go, but the “way to code” works too… probably some Damus autocorrect 🙃
Don't worry, I'm not a Rust proselytizer lol, just curious and glad to see anyone writing low-level, efficient code these days. Excited to try out notedeck 👍
The old-timers have been doing this for a long time. Every generation thinks they are such geniuses for reinventing a wheel that a previous generation already built.