SEJeff
The firedancer team at one of the better HFT firms wrote an AVX512 optimized implementation of ed25519 and X25519 that’s significantly faster than OpenSSL.

https://github.com/firedancer-io/firedancer/pull/716

Ditto for sha256: https://github.com/firedancer-io/firedancer/pull/778

And sha512: https://github.com/firedancer-io/firedancer/pull/760

If you’re an optimization nerd, this codebase is wild.

nanolith
The formal methods nerd in me is happy to see HOL Light being used to formally verify this implementation. I'm curious to see how closely their abstract machine models follow specific machine implementations. OOO, speculation, and deep pipelining have non-trivial impacts on potential side channels, and these vary quite a bit by stepping and architecture.
notfed
> The x25519 algorithm also plays a role in post-quantum safe cryptographic solutions, having been included as the classical algorithm in the TLS 1.3 and SSH hybrid scheme specifications for post-quantum key agreement.

Really though? This mostly-untrue statement is the line that warrants adding hashtag #post-quantum-cryptography to the blogpost?

saghm
My (probably naive) understanding is that 25519 already provided better performance than other algorithms used for similar purposes (e.g. RSA) when tuned for a roughly similar level of security; anecdotally, generating 2048-bit or larger RSA keys for me tends to be a lot slower than ed25519. At times I've run into places that require me to use RSA keys though (ironically, I seem to remember first experiencing this with AWS years back, although I honestly can't recall if this is still the case or not).

If this further improvement becomes widely used, it would be interesting to see if it's enough to tip the scales towards ed25519 being more of the de facto "default" ssh key algorithm. My experience is that a decent number of people still use RSA keys most of the time, but I don't feel like I have nearly enough of a sample size to conclude anything significant from that.

aseipp
I was aware of s2n-bignum which is a very cool project, but apparently there is a larger sister project, aws-lc, that aims for broader set of APIs including OpenSSL compatibility, while retaining the general approach and vibe (lots of formal verification + performance work): https://github.com/aws/aws-lc

That's pretty sweet. I'm currently using BoringSSL in a project as a supplement to OpenSSL (mostly because it is much easier to build for Windows users than requiring them to fiddle with msys2/vcpkg etc; the alternative is to rely on the Windows CNG API, but it lacks features like ed25519 support.) I wonder how much effort it would take to use aws-lc instead... Not that I'm that interested, BSSL is pretty good, but free performance and heavy automated verification is always nice :)

Related: one of the authors of this post, John Harrison, wrote a really good book about automated theorm proving about 15 years ago while working on floating point verification at Intel -- there's still no other book quite like this one, I think https://www.cl.cam.ac.uk/~jrh13/

fefe23
Holy shit these claims are wild! It's not just a percent more performance here and there, the graphs look more like 50% more throughput on the same hardware (depending on the cpu architecture).

My immediate fear was that they optimized away the security features like absence of timing side channels, but they say they still have those.

They also claim to have formal proof of correctness, which is even more amazing, because they are not doing it on a symbolic level but on a machine instruction level. Apparently they tought their reasoning system the semantics of all the CPU instructions used in the assembler implementation.

I'll still wait what djb has to say about this, but it looks freaking amazing to me.

jonmon6691
I'm assuming when they say that this improves user experience, that it implies the use case is primarily TLS. In which case store-now-decrypt-later attacks are already considered an urgent threat with regard to post quantum crypto. With FIPS 203 being released and Chrome is already using an implementation based on the draft standard, this seems like this algo (at least for TLS) should be on its way out.
webXL
Why don't they just focus on making a Gravitron variant with those algorithms in the circuitry?
londons_explore
Does 25519 suffer from key/data-dependant execution time?

Is this implementation resistant to that?

If it isn't, it's kinda a footgun which shouldn't be published for general use.