Sometimes I stumble upon this kind of post on Twitter and wonder how it would scale on Nostr: 9k comments, 29k reposts, 422k likes. Each of these actions is an event on Nostr. Likes can be quite small (say less than a kB each), but reposts carry the whole original event, and comments vary in size, though we can assume around 2 kB per event. Nostr also fetches the same event many times if outbox is implemented. So this post on Nostr would require ~0.75 GB of metadata even if fetched from a single relay. I don't have a good answer to this scaling problem, but if we are lucky, we'll need to think hard about a solution here.

Replies (30)

The events aren't all posted to the recipient's read relays. I would imagine that by the time we get to that level of engagement, users would have their relays set up correctly.
489 × 29,000 + 489 × 422,000 + 2,000 × 9,000 ≈ 240 MB.

If we move to binary encoding (assuming the 2 kB kind 1111 estimate stays the same, because it was a guess anyway) we drop almost 100 MB, down to ~150 MB.

But for simple like counts, like what Twitter's UI returns, you can run a COUNT query, which returns about 1 kB :) Run one for likes and one for quotes, that's 2 kB, and then the 9k comments fit neatly in 18 MB.
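The same back-of-envelope arithmetic as a runnable snippet (the per-event sizes are the guesses from this thread, not measurements):

```go
package main

import "fmt"

func main() {
	// Assumed average event sizes in bytes; guesses from the thread, not measured values.
	const likeSize, repostSize, commentSize = 489, 489, 2000

	// Engagement numbers from the screenshot in the original post.
	const likes, reposts, comments = 422_000, 29_000, 9_000

	total := likes*likeSize + reposts*repostSize + comments*commentSize
	fmt.Printf("total ≈ %.0f MB\n", float64(total)/1e6) // ≈ 239 MB
}
```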
Yeah we really just need to query a relay for counts. And relays should generally implement caching, which is used to very good effect and lowers bandwidth when using an aggregator relay.
Yes, the count query seems like the obvious fix here. I just haven't seen it implemented in any client yet, and I have no idea if any relay even supports it. A count query has its limitations too: you have to blindly trust the relay, and you have to pick only one, because you can't deduplicate two different count responses. You obviously can't verify event signatures either. But I agree the count query will become mandatory at some point. It just seems like this kind of query is not a relay job but more a @Vertex one, just like follow counts.
There is NIP-45 for counts, but it's not widely supported, and it's quite cumbersome for relays as it probably requires a separate tags table and some indices as well. I know because I've built my own. Anyway, I kinda agree: adding too many features to relays erodes one of the best features of nostr, simplicity.
Nice, will have a look at Nastro and NIP-45. I wonder if there is any kind of cryptographic proof involved in a count response; I doubt it.
There is NIP-45, which is used to count how many events match a certain filter. So the proof would ideally show that X (the returned count) events with valid signatures actually matched the filter(s).
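For reference, the NIP-45 exchange is just another websocket frame: the client sends ["COUNT", <subscription_id>, <filters...>] and the relay replies ["COUNT", <subscription_id>, {"count": <n>}]. A minimal Go sketch building the request for the like count of a single note (the event id is a placeholder and the websocket plumbing is omitted):

```go
package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Hypothetical note we want the like count for.
	noteID := "replace-with-a-real-event-id"

	// NIP-01 filter: kind 7 reactions that tag this event.
	filter := map[string]any{
		"kinds": []int{7},
		"#e":    []string{noteID},
	}

	// NIP-45 request frame: ["COUNT", <subscription_id>, <filter>]
	req, _ := json.Marshal([]any{"COUNT", "likes-count", filter})
	fmt.Println(string(req))

	// The relay is expected to reply with something like:
	// ["COUNT", "likes-count", {"count": 422000}]
}
```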
i must make a note to implement count on #orly - the database is designed to avoid needing to decode events before it is certain they are result candidates, so it can actually do this, at least for author/kind/timestamp filters, without decoding a single event stored in the database. probably i need to add a new index that covers the gap with tags, like pubkey/tag/timestamp, so the count function never has to decode any events. my main objection to count was the excessive work required, which was basically the same as if you just fetched the events and did the processing locally. but with the necessary indexes that doesn't have to happen. trading off database size for the speed of search and count is what i'm talking about. there are definitely algorithms that would run a lot faster if they could do counts.
> my main objection to count was the excessive work required, which was basically the same as if you just fetched the events and did the processing locally

For a client these aren't even comparable: 100k events + signature checks vs 1 COUNT.
yes but if the relay has to decode all 100k of those events, there's not really any gain. count searches need to be as efficient as possible because of extreme numbers like 100k. you simply wouldn't even request that much if you wanted the whole thing so it's mad if the relay's database literally has to parse the whole damn thing, just for a headcount.
yeah, i built my DB from scratch using a key/value store, so such a search index has to be added. part of it is there already, but it doesn't cover the full filter query space, only pubkey/id/timestamp; it also needs tags and kinds. the extra index eliminates the need to decode events by letting queries search by those criteria directly instead of sifting through inordinately large volumes of data. i've made a mental note to implement count properly in the near future, it's not a big task really.
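A sketch of that index idea, assuming a hypothetical composite key layout (namespace byte | kind | pubkey | ...) written next to each stored event; the count is then a keys-only prefix scan, so no event ever gets decoded. The layout and paths here are illustrative, not orly's actual schema:

```go
package main

import (
	"encoding/binary"
	"log"

	badger "github.com/dgraph-io/badger/v4"
)

// indexKey builds a hypothetical composite index key: namespace byte, kind, pubkey.
// A real schema would also append timestamp and event id to keep keys unique.
func indexKey(kind uint16, pubkey []byte) []byte {
	k := []byte{'k'} // index namespace prefix
	k = binary.BigEndian.AppendUint16(k, kind)
	return append(k, pubkey...)
}

// countByPrefix counts index entries under a prefix without touching values,
// so no stored event is ever decoded.
func countByPrefix(db *badger.DB, prefix []byte) (n uint64, err error) {
	err = db.View(func(txn *badger.Txn) error {
		opts := badger.DefaultIteratorOptions
		opts.PrefetchValues = false // keys only
		it := txn.NewIterator(opts)
		defer it.Close()
		for it.Seek(prefix); it.ValidForPrefix(prefix); it.Next() {
			n++
		}
		return nil
	})
	return n, err
}

func main() {
	db, err := badger.Open(badger.DefaultOptions("/tmp/nostr-index"))
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Count kind-7 reactions by one (hypothetical) author.
	author := make([]byte, 32)
	n, err := countByPrefix(db, indexKey(7, author))
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("count: %d", n)
}
```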
ah got it. Yeah ofc in a key-value store you have to implement that from scratch. Btw, if you ever need a SQLite package for nostr events, I wrote this: it provides bare-bones tables, but in the constructor you are free to add other schemas or triggers, and you can even replace or modify the default query builder to take advantage of new tables, indices and so on.
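For comparison, in a SQL-backed store the count is a single query. A sketch assuming a simple illustrative schema (an events table plus a tags(event_id, name, value) table; not nastro's actual one):

```go
package main

import (
	"database/sql"
	"log"

	_ "github.com/mattn/go-sqlite3"
)

func main() {
	db, err := sql.Open("sqlite3", "events.db")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// Count kind-7 reactions that tag a given event, without loading any of them.
	const q = `
		SELECT COUNT(*)
		FROM events e
		JOIN tags t ON t.event_id = e.id
		WHERE e.kind = 7 AND t.name = 'e' AND t.value = ?`

	var n int64
	if err := db.QueryRow(q, "replace-with-a-real-event-id").Scan(&n); err != nil {
		log.Fatal(err)
	}
	log.Printf("likes: %d", n)
}
```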
well, i'm so used to building badger based databases that i have a bunch of libraries for it, to manage building keys and generating indexes. it doesn't take me that much longer to make an embedded, dedicated database with some particular set of tables this way compared to learning yet another new thing to work with antiquated sql query engines. i mean, it's just my thing, building things like this, so yeah. i write the encoders and the database engines and the network schedulers. offloading stuff to inherently slower, less efficient, less scalable general-purpose database engines just doesn't make sense for me, and when you see how fast it returns results the reason is clear. IMO, nostr is a unique enough type of network-connected database protocol, so open ended, that using generic general-purpose parts to build it is a big sacrifice in performance.
If you are interested, you can open a PR to nastro adding your own badger-based DB. I would actually really appreciate it. The prerequisites are just to satisfy the store interface, expose the internals (so people can make queries directly if needed), and allow for customisation. I think it would be quite beneficial to the ecosystem to move away from fiatjaf's eventstore package, which is quite crappy.
yeah, the event store interface is kinda wrong too. it should be sorting the results before sending them, not depending on the insertion time of the record matching its timestamp (this breaks when someone uploads a bunch of old events to a relay). i mean, at least, where sqlite or postgresql or others are used, the results are sorted in the engine, but the badger engine that fiatjaf made does not, and it runs WAY too many threads. create an issue on the repo and tag me as assigned to the issue of adding a badger based DB engine implementation. i'll get to it in the next month. it's not a difficult task: just looking at the interface, creating all the stub implementations of the methods and then putting the right bits into them based on the existing engine i have built. probably a day or so of work.
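A minimal sketch of that sorting contract, i.e. ordering results by the event's own created_at (newest first) rather than by insertion time, using a stand-in Event struct:

```go
package main

import (
	"fmt"
	"sort"
)

// Event is a stripped-down stand-in for a nostr event.
type Event struct {
	ID        string
	CreatedAt int64 // unix seconds, taken from the event itself, not insertion time
}

// sortNewestFirst orders results by created_at descending, with id as a
// tie-breaker so the order is deterministic.
func sortNewestFirst(events []Event) {
	sort.Slice(events, func(i, j int) bool {
		if events[i].CreatedAt != events[j].CreatedAt {
			return events[i].CreatedAt > events[j].CreatedAt
		}
		return events[i].ID < events[j].ID
	})
}

func main() {
	evs := []Event{
		{ID: "b", CreatedAt: 1700000000},
		{ID: "a", CreatedAt: 1800000000}, // an "old" event uploaded late still sorts by its own timestamp
		{ID: "c", CreatedAt: 1800000000},
	}
	sortNewestFirst(evs)
	fmt.Println(evs) // [{a 1800000000} {c 1800000000} {b 1700000000}]
}
```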
> i mean, at least, where sqlite or postgresql or others are used, the results are sorted in the engine, but the badger engine that fiatjaf made does not

oooof, not good.

> and it runs WAY too many threads

typical...

> create an issue on the repo and tag me as assigned to the issue of adding a badger based DB engine implementation

That's great, thank you sir, I'll do it ASAP!
It's definitely doable but can take quite some time and resources, so it might be a fit only for archiving old data. E.g. for 422k likes it would be several hours and up to 1 TB of RAM, which would cost $6-8 per hour. The cost can be justified if the events are quite sensitive, but definitely not for ordinary posts :) We do expect 10-100x perf improvements in the near future though, so it might become viable at some point.
Looking forward to it guys. Maybe it doesn't help, but I'll throw it out there. HyperLogLogs give an estimate of the cardinality of a set, and are one of the most efficient ways to reduce the memory/storage needed for counting unique elements. If you don't know them, they work by keeping track of the maximum number of trailing consecutive 1s (or leading 0s, the convention doesn't matter) in the hashes of the elements. Example:

"test1" --> hash --> 01001111 (4 trailing 1s)
"test2" --> hash --> 10101101 (1 trailing 1)

Maybe there is a ZK version of an HLL, where I can prove that the maximum number of trailing 1s was X, which would allow me to estimate the cardinality of the set.
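A toy version of that intuition in Go, using a single register: hash each element, track the longest run of trailing 1s, and estimate roughly 2^max. A real HyperLogLog splits the hash space into many buckets and averages with bias correction, so this is only an illustration:

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math"
	"math/bits"
)

// trailingOnes counts the run of consecutive 1 bits at the end of the hash,
// the quantity whose maximum the post above keeps track of.
func trailingOnes(h uint64) int {
	return bits.TrailingZeros64(^h) // flipping bits turns trailing 1s into trailing 0s
}

func main() {
	items := []string{"alice", "bob", "carol", "alice", "bob"} // duplicates don't affect the estimate

	maxRun := 0
	for _, it := range items {
		sum := sha256.Sum256([]byte(it))
		h := binary.BigEndian.Uint64(sum[:8])
		if r := trailingOnes(h); r > maxRun {
			maxRun = r
		}
	}

	// If the longest observed run is r, roughly 2^r distinct items were needed to see it;
	// a real HLL refines this with many registers and bias correction.
	fmt.Printf("max run: %d, crude cardinality estimate: %.0f\n", maxRun, math.Exp2(float64(maxRun)))
}
```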
Relay protocol needs updates to support:
- aggregate counts
- content-based search filtering
- requesting events whose author is contained in a list identified by id or a-tag

Clients continue to federate requests and collate responses.