NIP 44 Update

A summary of Cure53's audit of the NIP 44 encryption standard. A report on a report, if you will.

By this time, many of you may have either given up on waiting for a new standard for encrypting notes, or entirely forgotten that one was in the works. Well, I have good news: we've received the audit report from Cure53 on the NIP 44 encryption spec, and the results are good! This post is a summary of the findings, along with some hints about what lies ahead.

The results

The assets in scope for the audit can be found in Paul Miller's NIP 44 repository. Included are the full text of the spec and implementations in TypeScript, Rust, and Go.

Overall, here's what Cure53 had to say:

The Cure53 team succeeded in achieving very good coverage of the WP1-WP4 targets. All ten findings spotted by the testers were classified as general weaknesses with limited exploitation potential. In other words, no vulnerabilities were detected within the inspected components.

In short: no vulnerabilities were found, but there are things we can do to harden the implementations. Cure53 elaborates:

The fact that Cure53 was not able to identify any exploitable vulnerabilities can be interpreted as a positive sign in regard to the security of the NIP44 specification and implementations. Nevertheless, even though the spotted problems can all be seen as general weaknesses and hardening advice, they should not be ignored. It is known that weaknesses may serve as entry points for more severe vulnerabilities in the future. Cure53 strongly recommends swift resolution of all reported flaws.

These weaknesses and recommendations each have their own code, prefixed by "NOS-01", which is our audit number. Here's a quick summary of the individual points:

  • NOS-01-001 describes a weakness related to naive secp256k1 implementations, which may accept public keys with both x and y coordinates instead of just x and a sign-bit. This can result in the compromise of a sender's private key if the sender can be tricked into encrypting a message with an invalid public key. This is known as a "twist attack". Cure53 recommends some additional test vectors to make sure that uncompressed keys are not accepted (the first sketch after this list shows the kind of check these vectors would enforce).
  • NOS-01-002 includes a handful of minor recommendations, including specifying the initial counter for ChaCha20, nailing down key formats, and a few other things which they go into more depth on elsewhere.
  • NOS-01-003 provides some additional test vectors to prevent twist attacks and out-of-bounds errors.
  • NOS-01-004 suggests using a constant-time equality function to prevent timing side-channel attacks, along with some code samples (a minimal version appears after this list).
  • NOS-01-005 identifies missing range checks in the Go implementation which should be addressed in order to avoid crashes.
  • NOS-01-006 addresses the lack of provisions for forward secrecy in the NIP 44 spec. They suggest introducing sender-side forward secrecy by deriving the session key from an ephemeral key using a KDF (roughly sketched after this list).
  • NOS-01-007 explains that reusing the same key for multiple purposes can lead to unexpected compromises. Best practice is to use different keys for signing and encryption, which nostr violates by using one key for both. While the particular curves we use aren't vulnerable to this exploit, Cure53 recommends deriving an encryption key from a user's main key using a KDF in order to reduce the impact of a leaked encryption key.
  • NOS-01-008 identifies an incorrect use of the salt parameter in calls to HKDF (the HMAC-based key derivation function used to derive the message keys). Currently, the salt is generated randomly every time, but HKDF's security definition calls for a static salt. This is likely not exploitable, but should be fixed. Cure53 suggests swapping the nonce and salt when calling HKDF, and deriving a static salt using something like SHA256(pubA, pubB, "nip-44-v2") (the final sketch after this list).
  • NOS-01-009 demonstrates that the Message Authentication Code (MAC) tag is currently computed without a nonce. This doesn't compromise the encryption, and authentication is covered by event signatures as specified in NIP 01, but it does prevent the MAC from being useful on its own.
  • NOS-01-010 suggests using clearer boundaries when constructing the HMAC payload.
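
To make a few of these concrete, here are some minimal TypeScript sketches; all names and signatures below are mine, not the audited implementations'. First, the kind of input check NOS-01-001's test vectors would enforce: nostr public keys are 32-byte x-only values, so anything shaped like a 33-byte compressed or 65-byte uncompressed SEC1 key should be rejected before it ever reaches the curve math.

```typescript
// Hypothetical guard for NOS-01-001: accept only 32-byte x-only public
// keys (64 lowercase hex characters), which rules out compressed and
// uncompressed SEC1 encodings entirely.
function assertXOnlyPubkey(pubkeyHex: string): void {
  if (!/^[0-9a-f]{64}$/.test(pubkeyHex)) {
    throw new Error('expected a 32-byte x-only public key as lowercase hex');
  }
}
```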
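
Second, the constant-time comparison from NOS-01-004. MAC verification should take the same amount of time whether the first byte differs or the last one does; a minimal version (the audit's own samples may differ) looks like this:

```typescript
// Constant-time byte comparison: XOR-accumulate over every byte so the
// running time doesn't depend on where the first mismatch occurs.
function constantTimeEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) diff |= a[i] ^ b[i];
  return diff === 0;
}
```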
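
Third, the flavor of sender-side forward secrecy NOS-01-006 gestures at: a fresh ephemeral key per message, fed through a KDF, so that a later leak of the sender's long-term key doesn't expose old messages. A rough sketch using @noble/curves and @noble/hashes, where the info label and the compressed-key assumption are mine:

```typescript
import { secp256k1 } from '@noble/curves/secp256k1';
import { hkdf } from '@noble/hashes/hkdf';
import { sha256 } from '@noble/hashes/sha256';

// One ephemeral keypair per message; only its public half is transmitted.
// recipientPubkey is assumed to be a 33-byte compressed secp256k1 key.
function ephemeralSessionKey(recipientPubkey: Uint8Array) {
  const ephemeralSecret = secp256k1.utils.randomPrivateKey();
  const ephemeralPublic = secp256k1.getPublicKey(ephemeralSecret);
  const shared = secp256k1.getSharedSecret(ephemeralSecret, recipientPubkey);
  // Derive the session key from the ECDH output, then discard the
  // ephemeral secret so the long-term key alone can't recover it.
  const sessionKey = hkdf(sha256, shared, undefined, 'nip44-fs-sketch', 32);
  return { ephemeralPublic, sessionKey };
}
```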
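
And finally, the static salt from NOS-01-008, sketched with the @noble/hashes utilities (the exact construction is up to the spec authors):

```typescript
import { sha256 } from '@noble/hashes/sha256';
import { concatBytes, utf8ToBytes } from '@noble/hashes/utils';

// Sketch of Cure53's suggestion: derive a static HKDF salt from both
// parties' public keys and a protocol label, rather than generating a
// fresh random salt for every message.
function deriveStaticSalt(pubA: Uint8Array, pubB: Uint8Array): Uint8Array {
  return sha256(concatBytes(pubA, pubB, utf8ToBytes('nip-44-v2')));
}
```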

The takeaway here is that with a few minor changes, the spec itself is sufficient to "provide a simple way for the users to communicate privately". There are, however, a number of edge cases which can compromise implementations, so for any developers out there porting the spec to new languages, be sure to take a look at Paul's test vectors.

In the end, the NIP44 specification and implementations appear to have already achieved the security goal of providing users with a way to communicate privately. Miscellaneous issues identified by Cure53 during NOS-01 correspond to further hardening of the current version and/or suggestions for future versions. Following these suggestions could provide more advanced security guarantees.

They also clarify that:

Importantly, the lack of certain security guarantees, such as forward secrecy, post-compromise security, deniability, and post-quantum, had been known to the development team prior to NOS-01.

Since this is the case, it's important that client developers be absolutely clear with their users about what NIP 44 is good for, and what it's not. This is a "simple" way to communicate privately — users with more advanced privacy requirements should use something like MLS or SimpleX.

What's next?

NIP 44 isn't quite ready for widespread use just yet, but it shouldn't take long to incorporate the adjustments mentioned in the audit. Then there are several NIPs that rely on NIP 04's encryption, each of which will eventually need to be migrated. Beyond that, a whole world of affordances for encrypted data becomes available on nostr.

First and most anticipated, we can finally deprecate NIP 04 and stop leaking DM metadata. Vitor's Sealed Gift Wraps are an excellent alternative to NIP 04 DMs, and add support for small group chats, which will be a big improvement to client UX.

Other more ambitious specifications targeting larger groups exist as well, including my Closed Communities draft spec. This NIP makes it possible to extend NIP 72 communities with a private component, which can be useful for publishers or real-life communities.

There are lots of other things NIP 44 might be used for as well, for example:

  • The ability to share your location only with certain people
  • Private calendar events, product listings, or streams
  • Private tags: instead of encrypting an entire event, it should be possible to encrypt only a single tag's value. This allows senders to attach private data to public events (a sketch follows this list).
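
As a sketch of that last idea, assuming a nip44 module shaped like Paul's reference TypeScript implementation (the real function names and signatures may differ):

```typescript
// Hypothetical helper: encrypt a single tag's value while the rest of the
// event stays public. './nip44' and its exports are assumptions here.
import { getConversationKey, encrypt } from './nip44';

function privateTag(
  senderSecretKey: Uint8Array,
  recipientPubkey: string,
  tagName: string,
  value: string
): string[] {
  // One shared conversation key per sender/recipient pair, as in NIP 44
  const conversationKey = getConversationKey(senderSecretKey, recipientPubkey);
  // The tag name stays in the clear; only the value becomes ciphertext
  return [tagName, encrypt(value, conversationKey)];
}
```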

Conclusion

There's lots of work still to be done, but I think this marks a huge milestone for nostr. So thank you to Paul Miller for his work on the spec and the audit, and to OpenSats for funding the audit!

Replies (14)

The salt thing is fascinating. It turns out that allowing a random salt allows an attacker to supply a specially crafted "random" salt. Generally, salts add randomness to an input so that an attacker can't look up the output from a known set of inputs, and that usually means picking a new random number every time. In fact, the PlayStation attack happened because a developer hardcoded a single number he had chosen randomly, instead of writing code to pick a new random number each time. The fact that this situation is the opposite of that just shows how complex cryptography is, and why you need to leave it to the experts.
I don't think it's complicated at all. Also, I really, really hate the word "expert". It has been known for a long time, and is "recommended" in the AES standard, that you should use a strong random value for each message and avoid reusing them. This is also a very solid reason not to use piddly 12-byte nonces like AES-GCM does, and I personally think the salts should be 256-bit values, not 128. The effective bitwise security of a nonce changes the odds of a secret/nonce pair recurring and opening up the possibility of a plaintext attack. If the obviousness of the problem isn't transparent to you, you just need to learn a little more about how it actually works (the difference between counter mode and Galois counter mode, and so on), but in principle, once you get that the secret is the seed of a specially crafted string of random values that you can regenerate if you have the secret, the rest is just details. This is why I hate the word "expert". Most of the time they are only expert at making up complicated names and explaining things in excessively complicated ways to protect their rarity and pay rates.
I agree that nothing is complicated once you know it. I define an expert as a person who has walked that road many more times than you have. That doesn't preclude you from walking it, or even noticing things he never noticed. But I do see that I stepped too far and misspoke by saying "you need to leave it to the experts." That is never true, so, point taken. In this case, what is surprising is that the salt should NOT be chosen randomly by the person encrypting the message, but should be a well-known set of bytes, the same every time. I still don't understand precisely why, but I inferred from reading some web pages that there might be an attack that depends on being able to tweak the input, and the salt lets an attacker do exactly that. So how do you add randomness to the input? You do it on the input itself. IMHO we shouldn't be calling this field a salt.
I think it's more accurate to call this known string a protocol namespace separator. It ensures that another protocol can't create a colliding signature on the same message. "Salt" has long been used to mean this, but not with precise semantics, since it has overlapped with another concept usually called a "nonce". The reason the "salt", or as I prefer, "protocol namespace separator", is critical is that it enlarges the field in which secrets can be used for a given protocol, instead of competing in the same space as another. A finite field is a set of counting numbers that ultimately maps to the simplest one, the index of the position, or the scalar. A field is based on one specific number that you define as the zero, and this generates the field's zero value after you hash it. When you take any other given input, you transform that number into a member of the field with a hash function; by including this "salt" or "protocol namespace separator" in the base value, you derive the field member, which can be said to be the member of that finite field's counting number set corresponding to the value from which the hash is derived. If you don't fix a specific base value as the zero of your field, the risk of leaking the secret rises with the number of competing users of the same hash function. As with mining bitcoin, finding these solutions comes down to sheer numbers, so the more numbers you make available to be distinctive, the lower the chance of a collision, because more attempts must be made; the counts escalate by pure permutation, so they are factorial, which is also why people talk about "bits of security" in encryption. On the other hand, the in-message random value, the nonce, has to be in cleartext, while the salt can merely be a protocol feature, assumed to have been part of the construction of the encryption secret. I'm not sure why it got that name, given it's slang for pedophile in British English; it always makes me squirm a bit to use it. The confusion comes from the vagueness with which the word "salt" has been used, and IMO it helps a lot to call the known value that is not in the message the "protocol namespace separator", and the random value carried with the encrypted message the "nonce". Then there is no confusion. Salt is an old, old name for adding a value prior to a hash function; it dates back to before the concept of a hash function even became clear, when such functions were primarily used as data integrity mechanisms, i.e. CRC32 and the like. Hash functions became important after the invention of the hash map, a mechanism for accelerating record lookups in a database.
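A minimal illustration of that domain-separation idea, assuming the @noble/hashes utilities (the label is made up):

```typescript
import { sha256 } from '@noble/hashes/sha256';
import { concatBytes, utf8ToBytes } from '@noble/hashes/utils';

// Mixing a fixed protocol label into every hash keeps two protocols from
// producing colliding digests over the same message bytes.
const taggedHash = (label: string, message: Uint8Array): Uint8Array =>
  sha256(concatBytes(utf8ToBytes(label), message));
```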
Not more than ONCE :D Yes, it makes sense. I sort of had an inkling that uniqueness was part of the concept, but didn't connect it to the word "once". On a side note, since malice and error are indistinguishable without evidence of intent, you could say that pedophiles represent high-entropy humans who replicate their entropy onto immature other humans, that replication being the most fundamental element of their intent (normalization). Random values in messages are definitely needed for optimal security, unless there is inherent entropy in the message. 32 bits of timestamp don't constitute strong entropy; maybe if it were nanoseconds and 64 bits we'd be starting to talk about something with low collision potential. Keep in mind that this encryption is not just for the relatively low volume of human-readable text. It will inevitably be needed to encrypt massive amounts of data, and the bigger the amount of data, the more likely a collision. For this reason GCM, for example, should not use the same secret on more than 4 GB of data. This is directly related to the permutations created by the entropy of the data you are encrypting and the size of the nonce, which for GCM is 12 bytes, or 96 bits. SSL uses 128- and 256-bit secrets because reversing them becomes unlikely within the time window over which the security of the protocol must hold. The cheaper that computation gets, and the more effective parallelisation becomes, the more likely we are to see a need to scale up to 384 and 512 bits. For now, it is very safe to bet on 256. There have been several recent instances of people messing around with dice-roll entropy to generate bitcoin addresses and having them almost immediately hacked; computing a private key by iterating through the counting number space will pick up a short seed in very few iterations. The purpose of password-based key derivation functions (is HKDF a term used in papers now?) is to increase the amount of time required to test each candidate. It is like raising the difficulty on proof of work; it functions on the same basis as everything I've discussed previously, except for the number of repetitions and the expansion of the output values (PBKDFs like Argon/Argon2 include a memory usage factor that adds memory hardness to the derivation, similar to how the old Ethereum PoW used a huge key, essentially a type of salt, to lower the optimization potential).
FYI, Argon2d won that contest, but I've heard from cryptographers I trust that the older scrypt is better ("Scrypt is Maximally Memory-Hard"), and that the contest happened while the field was in flux, when we weren't yet as good at evaluating these kinds of systems. So I chose scrypt for encrypting your nostr nsec under a password in the gossip client.
Yeah, I used argon2 for my `signr` https://mleku.online/git/signr but its parameters are pretty hardcore: about 3 seconds on my 2021 Ryzen 5 laptop, roughly the same delay as my dm-crypt passwords (the grub unlock typically takes around 8 seconds). I wrote a memory-, cache-, and processor-hard hash function for my work on parallelcoin. I designed it to do about 10 iterations per second, and that rate was pretty flat across several different CPUs, because it's based on very large integer long division. Makes me think I should use it in signr: add a config option and set the long-division-based one as the default, so that anyone actually using it doesn't get locked out of their keychain. A job for the weekend. Large integer long division is the hardest operation with the least variation across all possible hardware: CPUs have 64 bits, and division circuitry takes up about a quarter of the die; ARM and GPUs can do it too, but GPUs devote less die area to it, and both need two cycles to process 64 bits. I'm too busy scratching out a living to have the luxury of solving what I consider the big problems, but maybe in a year or two I'll really have some freedom to play.
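For reference, parameters in that spirit, sketched with @noble/hashes' argon2id (signr itself is Go, and these numbers are illustrative, not its actual settings):

```typescript
import { argon2id } from '@noble/hashes/argon2';
import { randomBytes } from '@noble/hashes/utils';

// Deliberately slow settings: several passes over a large memory area so
// each password guess costs the attacker real time and RAM.
const salt = randomBytes(32);
const key = argon2id('a long, memorable passphrase', salt, {
  t: 6,       // passes over memory
  m: 1 << 17, // memory cost in KiB (128 MiB)
  p: 1,       // parallelism (lanes)
  dkLen: 32,  // 256-bit derived key
});
```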