lol
i can tell that you have a strong grasp of the implementation requirements for a relational or graph database. they ALL use KV stores under the hood.
the only way real world database engines tolerate this kind of structure without degrading iteration performance is by partitioning the large data from the small. or you could just ask: why would you store a jpeg in an event instead of the raw binary in a blossom record? it's trivial to store blobs as files with the hash as the filename, and the filesystem is already optimized for traversing its metadata to find them efficiently.
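to make the "hash as filename" idea concrete, here is a minimal sketch in python. the function names (`blob_path`, `put_blob`, `get_blob`) and the two-character directory sharding are my own assumptions for illustration, not anything defined by blossom or nostr.

```python
import hashlib
import tempfile
from pathlib import Path


def blob_path(root: Path, digest: str) -> Path:
    # Hypothetical layout: shard by the first two hex characters so that
    # no single directory ends up holding every blob.
    return root / digest[:2] / digest


def put_blob(root: Path, data: bytes) -> str:
    """Store data under its SHA-256 hex digest and return the digest."""
    digest = hashlib.sha256(data).hexdigest()
    path = blob_path(root, digest)
    if not path.exists():  # identical content is only written once
        path.parent.mkdir(parents=True, exist_ok=True)
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_bytes(data)
        tmp.replace(path)  # rename into place so readers never see a partial file
    return digest


def get_blob(root: Path, digest: str) -> bytes:
    """Read a blob back and check that it still matches its name."""
    data = blob_path(root, digest).read_bytes()
    if hashlib.sha256(data).hexdigest() != digest:
        raise ValueError("blob content does not match its hash")
    return data


# Small demo in a throwaway directory.
with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    h = put_blob(root, b"\xff\xd8\xff not really a jpeg")
    assert get_blob(root, h) == b"\xff\xd8\xff not really a jpeg"
```

the event then only needs to carry the hex digest, and the lookup is a single path computation plus whatever the filesystem does to resolve it.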
ok, i'm wrong about that. there are such things as column stores, which use a different storage model than key/value
after a few questions drilling down to the point, this is what GPT gives me:
Short answer to your core question
Is a flat file of blobs addressable by SHA256 hash the best solution?
No, not as a single monolithic flat file.
But:
Content-addressable blobs by hash = excellent idea.
The best implementation in practice is usually:
Append-only segment files (or many files), with a hash → location index, and possibly hash-sharded directories or an embedded KV store.
This will:
Be faster and more scalable than a single flat file.
Be more efficient than a naive “one file per blob” once you hit large counts.
Play very nicely with your immutable tables.
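a rough sketch of that "append-only segment file with a hash → location index" layout, in python. everything here is an assumption for illustration: the `SegmentStore` name, the single segment file, and the in-memory dict index (a real store would persist the index, roll over to new segments, and rebuild the index on startup).

```python
import hashlib
import os
import tempfile


class SegmentStore:
    """One append-only segment file plus a hash -> (offset, length) index."""

    def __init__(self, path: str):
        self.path = path
        self.index = {}  # hex digest -> (offset, length); kept in memory only
        # Create the segment if it is missing. Existing contents are NOT
        # re-indexed in this sketch.
        open(path, "ab").close()

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.index:  # already stored, nothing to append
            return digest
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # current end of the segment
            f.write(data)
        self.index[digest] = (offset, len(data))
        return digest

    def get(self, digest: str) -> bytes:
        offset, length = self.index[digest]
        with open(self.path, "rb") as f:
            f.seek(offset)
            return f.read(length)


# Small demo in a throwaway directory.
with tempfile.TemporaryDirectory() as d:
    store = SegmentStore(os.path.join(d, "segment-0.dat"))
    small = store.put(b"small")
    big = store.put(b"x" * 1000)
    assert store.get(small) == b"small"
    assert store.index[big] == (5, 1000)
```

the point of the design is that finding a blob is one index lookup and one seek, no matter how many blobs share the segment.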
storing events with an embedded KV store like lmdb (a B+tree) or badger (an LSM tree) and storing binary blobs on your regular ext4 filesystem (maybe f2fs is better for SSD) is actually the most efficient way to store and fetch large blobs. not only that, as it points out, you need to partition the blob storage to have at least a small and a larger blob index, because the more you...
you know, mix up small and big things in one storage system
the slower you can search it, especially for the small objects.
or in other words
just put a blossom server on it. regardless of anything else, base64 makes the payload about 33% larger (4 characters for every 3 bytes), and compression only claws that back at extra CPU cost. encoded forms also have to be decoded before you can actually pick through the content. simple blob files named after their hash and indexed in a table is the simplest, yet nearly the most efficient, way for nostr to handle hosting blobs.
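for the base64 overhead, the arithmetic is easy to check: every 3 raw bytes become 4 output characters, so the encoded form comes out about a third larger than the binary. a quick python check (the `base64_len` helper is just a name i made up for the formula):

```python
import base64
import os


def base64_len(n: int) -> int:
    # Padded base64 emits 4 characters for every started group of 3 bytes.
    return 4 * ((n + 2) // 3)


raw = os.urandom(30_000)          # stand-in for an already-compressed image
encoded = base64.b64encode(raw)

assert len(encoded) == base64_len(len(raw))
overhead = len(encoded) / len(raw) - 1
print(f"{len(raw)} raw bytes -> {len(encoded)} base64 bytes (+{overhead:.1%})")
# prints: 30000 raw bytes -> 40000 base64 bytes (+33.3%)
```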
So.. ALL dbs use KV except for the real world databases that are doing it right? I know most DBs are just a glorified KV. But I also know of those who actually fixed this a while back instead of just deferring to the developer to manually save shit in files.