This is one of the most important developments in nostr. We're getting rsync for nostr queries. Incredible work, thank you Doug! Time to start implementing this in nostrdb.
npub1yxpr...qud4
Hi all! I have just pushed a major update to the negentropy project. It implements protocol version 1 (the previous version was 0). The protocol has been simplified: the ID size is no longer configurable, there are no more continuation ranges, and the output can be constructed in the same input-scan pass.

There is a new fingerprint function, based on an incremental hash. This allows fingerprints to be pre-computed and stored in a tree structure. The C++ implementation includes a B+Tree implementation that allows fingerprints to be computed without collecting IDs from the entire DB.

I have written a comprehensive article that goes over the theory of RBSR and the negentropy implementation: https://logperiodic.com/rbsr.html Comments are appreciated!

Finally, I have integrated the new version of negentropy and the B+Tree with strfry. It's in a development branch and not quite ready for production, but my testing indicates this will be a massive improvement for full-DB syncs, especially on relays with very large DBs. In-sync or nearly in-sync relays should sync almost instantly with negligible resource usage. Relays will also use the B+Tree for filters that contain only until and/or since, meaning that date-range full-DB syncs are also efficient. Syncing arbitrary filters works as before (but I have begun work to make these more efficient as well).

Unfortunately, this is a breaking change for the negentropy protocol (this really should be the last one!) and it will also require a new DB version. I'm going to take this opportunity to make a few more breaking DB changes, and I plan to release strfry 1.0 after a beta testing period.
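To illustrate the incremental-hash idea behind the fingerprint function, here is a minimal Python sketch. It assumes a sum-then-hash construction (IDs added as 256-bit integers modulo 2^256, then hashed down to 16 bytes), which is my reading of the RBSR article rather than negentropy's exact wire-level definition:

```python
import hashlib

MOD = 2 ** 256

def combine(acc: int, event_id: bytes) -> int:
    # Addition mod 2^256 is commutative and invertible, so per-subtree
    # accumulators can be merged, and an ID removed by subtracting.
    return (acc + int.from_bytes(event_id, "big")) % MOD

def fingerprint(acc: int, count: int) -> bytes:
    # Hash the accumulator (plus element count) down to a short fingerprint.
    data = acc.to_bytes(32, "big") + count.to_bytes(8, "big")
    return hashlib.sha256(data).digest()[:16]

ids = [hashlib.sha256(bytes([i])).digest() for i in range(5)]
acc = 0
for i in ids:
    acc = combine(acc, i)
print(fingerprint(acc, len(ids)).hex())

# Dropping an ID is just modular subtraction: no rescan of the range,
# which is what makes storing fingerprints in a B+Tree practical.
acc = (acc - int.from_bytes(ids[0], "big")) % MOD
print(fingerprint(acc, len(ids) - 1).hex())
```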

Replies (18)

Relay A has some set of notes; relay B has another set of notes. How do you sync these relays efficiently? Also, from a client perspective: how do I query and download only the notes I don't already have? No longer do we need to download tons of redundant data.
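For intuition, here is a toy sketch of range-based set reconciliation: fingerprint a range on both sides, skip it entirely if the fingerprints match, otherwise split and recurse, falling back to exchanging raw IDs for small ranges. The fingerprint and function names here are illustrative, not negentropy's actual protocol:

```python
import hashlib

def fingerprint(ids):
    # Illustrative stand-in: hash of the sorted IDs in a range.
    h = hashlib.sha256()
    for i in sorted(ids):
        h.update(i.to_bytes(32, "big"))
    return h.digest()[:16]

def reconcile(ours, theirs, lo, hi, threshold=4):
    """Return (we_need, they_need) for IDs in the half-open range [lo, hi)."""
    a = {i for i in ours if lo <= i < hi}
    b = {i for i in theirs if lo <= i < hi}
    if fingerprint(a) == fingerprint(b):
        return set(), set()            # range already in sync: skip it
    if len(a) <= threshold or len(b) <= threshold:
        return b - a, a - b            # small range: exchange raw IDs
    mid = (lo + hi) // 2               # otherwise split and recurse
    l = reconcile(ours, theirs, lo, mid, threshold)
    r = reconcile(ours, theirs, mid, hi, threshold)
    return l[0] | r[0], l[1] | r[1]

ours = {1, 2, 3, 5, 8}
theirs = {1, 2, 4, 5, 9}
print(reconcile(ours, theirs, 0, 2 ** 32))  # ({4, 9}, {3, 8})
```

When the sets are identical or nearly identical, almost every range's fingerprint matches, which is why in-sync relays can reconcile almost instantly.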
You wouldn't even bother. Randomly sample a smaller set and average it for a quick answer that's good enough for the accuracy you want at the moment. If you want to know more exactly, take a larger sample, based on how much time and resources you have. No?
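As a purely hypothetical illustration of that sampling idea: estimate how much of another relay's set you already hold by checking a random sample instead of the whole thing.

```python
import random

def estimate_overlap(ours, theirs, sample_size=100):
    """Estimate the fraction of `theirs` we already hold via random sampling."""
    sample = random.sample(sorted(theirs), min(sample_size, len(theirs)))
    hits = sum(1 for i in sample if i in ours)
    return hits / len(sample)

ours = set(range(0, 10_000))
theirs = set(range(5_000, 15_000))
print(round(estimate_overlap(ours, theirs), 2))  # roughly 0.5
```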
Why not? That's all they will ever be anyway. Everything is an estimate except the information itself, which in this case you can never have all in one place simultaneously anyway, due to the rapidly growing decentralization. Think on it, or don't; just my 2 sats.
It's not capable of guaranteeing complete synchronisation of things. I'm even working on (in my spare time) a filter plugin that quickly recognises whether a reply references a whitelisted identity, by building filters out of the posts made by the user, for the purpose of caching first-level replies to posts so the relay has them immediately. I'm going to be using bloom filters to build the recogniser. I'm not sure how the state is going to be stored, but probably as replaceable private events for the filter itself, under a virtual user.
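For reference, a minimal bloom-filter sketch (hypothetical, not the plugin's actual code): each item sets k bit positions, giving fast membership tests with false positives but no false negatives.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=8192, num_hashes=4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: bytes):
        # Derive k bit positions from one SHA-256 digest of the item.
        digest = hashlib.sha256(item).digest()
        for i in range(self.k):
            yield int.from_bytes(digest[4 * i : 4 * i + 4], "big") % self.size

    def add(self, item: bytes):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

bf = BloomFilter()
bf.add(b"npub1whitelisted")
print(b"npub1whitelisted" in bf)  # True (no false negatives)
print(b"npub1random" in bf)       # almost certainly False
```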
Almost forgot my second point: compact, addressable merkle trees let you guarantee full set-membership discovery, assuming the user didn't post to a relay not known by the one trying to find it. But it can know the ID, and it can know the set of a given batch of them that has been bundled into a tree.
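A minimal merkle-root sketch for intuition (hypothetical, not tied to any nostr spec): leaf hashes of event IDs are pairwise hashed up to a single root that commits to the whole batch.

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(ids: list[bytes]) -> bytes:
    """Hash a batch of IDs up to a single root (duplicating a lone tail)."""
    level = [sha256(i) for i in ids]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])        # pad odd-sized levels
        level = [sha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

batch = [b"event-id-1", b"event-id-2", b"event-id-3"]
print(merkle_root(batch).hex())
```

Anyone holding the root can then verify that a given ID belongs to the batch from a logarithmic-size proof, which is what makes the membership guarantee cheap to check.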
I use it regularly. It's an extension of cp/scp that does lots of stuff like scanning the directory tree and reading file sizes and datestamps; I'm not sure, but I don't think it includes generating hashes. IMO, filesystems need several advancements, but we are still stuck with late-70s unix filesystem designs. A few technologies work with the base and take it further, but sadly not enough. A number of things that people associate with Macintosh relate to this: Apple put a lot of effort into making better filesystems, but sadly their user-oriented advances have not been widely adopted. It seems to me, for example, that all files should have a hash in their record that can be quickly accessed. But no. At best, such systems only create mirror images of the filesystem inside a database; no filesystem exists, afaik, that keeps a current hash of the file contents.
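As a toy illustration of the per-file hash idea (hypothetical; ordinary filesystems don't maintain this for you): walk a tree and compute a content hash per file, the kind of record the filesystem could in principle keep up to date itself.

```python
import hashlib
from pathlib import Path

def hash_tree(root: str) -> dict[str, str]:
    """Walk a directory tree and compute a SHA-256 content hash per file."""
    hashes = {}
    for path in Path(root).rglob("*"):
        if path.is_file():
            h = hashlib.sha256()
            with path.open("rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    h.update(chunk)
            hashes[str(path)] = h.hexdigest()
    return hashes

# Two hosts could diff these maps and transfer only mismatched files.
# print(hash_tree("/tmp/example"))
```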
I should also add that there has been quite a lot of rigidity here. A lot of "filetypes" actually have filesystems inside them; hell, HTML files encode a tree, and hypertext links are like filesystem links across the network. The technology is still very immature. The benefits of extending the filesystem have yet to be realised, and I'm already 40 years in on this, from my early experiences with simple tape-recorder sound-modulated recording systems; better ways have existed for at least 30 years but just have not been implemented. There aren't enough computer programmers, and partly this is because programming languages are overly complex and build systems are unfriendly and labor-intensive.