ok, now I understand it.
It's about escaping the content field when creating the JSON string. Some implementations escape the newline character into "\n" (2 chars) but not other characters, like LSEP (U+2028) into "\u2028" (6 chars), before hashing.
Everything looks fine in user interfaces because they automatically parse and unescape things.
But the hash is sensitive to those differences.
The issue is that some implementations escape SOME characters and not others.
I am not sure if there is a standard for these choices.
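A minimal Python sketch of the problem (the variable names are illustrative, not from any Nostr implementation): two serializations that decode back to the exact same content string, but hash differently because the escaping policy differs.

```python
import hashlib
import json

# A content string containing a newline and U+2028 (LINE SEPARATOR).
content = "line one\nline two\u2028line three"

# Policy A: escape everything non-ASCII (Python's json.dumps default,
# ensure_ascii=True) -> U+2028 becomes the 6-char sequence \u2028.
serialized_a = json.dumps(content)

# Policy B: escape only what JSON requires (quotes, backslash, control
# chars) and emit U+2028 as raw UTF-8 (ensure_ascii=False).
serialized_b = json.dumps(content, ensure_ascii=False)

hash_a = hashlib.sha256(serialized_a.encode("utf-8")).hexdigest()
hash_b = hashlib.sha256(serialized_b.encode("utf-8")).hexdigest()

# Both serializations parse back to identical content...
assert json.loads(serialized_a) == json.loads(serialized_b)
# ...but the bytes that were hashed differ, so the hashes differ.
assert hash_a != hash_b
```

Both forms are valid JSON for the same string, which is exactly why the UI looks fine while the hashes disagree.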
Replies (5)
Turns out we hit something as new as last week:
JSON library devs are still struggling to agree on a standard for whether or not to escape \u2028 and \u2029.
#[4] we need to clarify in NIP-01 how the content string should be escaped before it is converted to bytes using UTF-8 and then hashed: which characters get escaped and which ones don't.
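For context, NIP-01 derives the event id as the SHA-256 of the UTF-8 bytes of a JSON array serialized with no extra whitespace. A sketch (not a reference implementation; the escaping choice flagged in the comment is precisely the unsettled part):

```python
import hashlib
import json

def event_id(pubkey: str, created_at: int, kind: int,
             tags: list, content: str) -> str:
    # NIP-01: id = sha256 of the UTF-8 bytes of this JSON array,
    # serialized without whitespace.  Which escaping rule applies to
    # `content` is exactly the ambiguity under discussion here;
    # ensure_ascii=False is one possible choice, not a settled standard.
    serialized = json.dumps(
        [0, pubkey, created_at, kind, tags, content],
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()

eid = event_id("ab" * 32, 1672531200, 1, [], "hello\u2028world")
assert len(eid) == 64
```

Swap `ensure_ascii=False` for `True` and the same event gets a different id, which is the interoperability bug being described.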
GitHub: Strict mode for JSON parsing · Issue #2295 · google/gson
Mixing JSON and cryptographic data structures is a terrible idea. Nostr should have chosen a binary serialization format, same as Bitcoin and OpenTimestamps. Parsing JSON is much more complex than you think it is; parsing binary is much simpler.
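To illustrate the point about binary formats (a toy length-prefixed layout, not Bitcoin's or OpenTimestamps' actual format): with no escaping step there is exactly one byte sequence for a given event, which is the property that JSON escaping choices break.

```python
import hashlib
import struct

def serialize_binary(kind: int, created_at: int, content: str) -> bytes:
    # Fixed-width integers plus a length-prefixed UTF-8 payload.
    # No escaping happens anywhere, so two implementations cannot
    # disagree on the bytes for the same logical event.
    body = content.encode("utf-8")
    return (struct.pack(">IQ", kind, created_at)
            + struct.pack(">I", len(body))
            + body)

blob = serialize_binary(1, 1672531200, "hello\u2028world")
digest = hashlib.sha256(blob).hexdigest()
assert len(digest) == 64
```

The U+2028 here is just three payload bytes; there is no "escaped vs. raw" fork for implementations to diverge on.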
#[0]
That is concerning.
I wonder whether there are any emoji that can be encoded as different UTF-8 byte sequences depending on the implementation, too.
I think the underlying problem is that JSON escaping is mostly concerned with how non-ASCII characters can be packed into ASCII text, which is what JSON is. The escaped form can vary as long as decoding restores the original correctly. Nostr can't allow varying encodings of the same character, otherwise the id will differ. 😔
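On the emoji/UTF-8 question: for a given sequence of code points, UTF-8 itself is unambiguous, but visually identical text can be built from different code-point sequences (Unicode normalization forms), and that does change the bytes and therefore the hash. A Python sketch using "é":

```python
import hashlib
import unicodedata

# "é" can be one code point (U+00E9, NFC) or two (U+0065 + combining
# U+0301, NFD).  UTF-8 encodes each sequence uniquely, but the two
# sequences are canonically-equivalent text that renders identically.
nfc = unicodedata.normalize("NFC", "\u00e9")
nfd = unicodedata.normalize("NFD", "\u00e9")

assert nfc != nfd                      # different code points
assert len(nfc.encode("utf-8")) == 2   # c3 a9
assert len(nfd.encode("utf-8")) == 3   # 65 cc 81
assert (hashlib.sha256(nfc.encode("utf-8")).hexdigest()
        != hashlib.sha256(nfd.encode("utf-8")).hexdigest())
```

The same applies to emoji built from ZWJ sequences versus precomposed forms: whichever code points the input method produces is what gets hashed, so the id is sensitive to it even though the rendering is identical.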
Thank you! I thought I was going to be the only one posting about limitations and issues in the Nostr protocol.
💯 agree with you on this.
BTW I had been looking for you on this thing 😂😂😂😂