How I got 7 times faster PBKDF2 in .NET

As developers we often ask ourselves, "What is the fastest way to do X in Y?"

The answer may be interesting, but is usually not worth the extra effort because you seldom write code that scales to the point where micro optimizations become significantly relevant. But we sometimes still wonder if we implemented a certain task so it operates in the fastest way.

In some cases, the answer is simple. "The fastest way to do X in python" is to find an implementation in C and just call that library function. This is how some math libraries operate for example. Python is one of if not the best language to call functions written in faster languages.

.NET is already reasonably fast, safe and quite elegant, so using C libraries (read: "unmanaged code") is usually not a big improvement. Your standard I/O and maths tasks will complete sufficiently fast. And yet, CPU bound tasks can be accelerated a lot by using unmanaged code. And PBKDF2 just happens to be a lot of CPU work. The fastest way to compare two byte arrays for equality is to call into memcmp from the msvcrt dll. .NET even supports renaming the function for managed use so you won't even notice you're calling into unmanaged code except for the 3rd parameter that would not be needed for a managed implementation.

[DllImport("msvcrt.dll", EntryPoint = "memcmp")]
public static extern int CompareBytes(byte[] b1, byte[] b2, int count);

How I got here

.onion Authentication

Did you know that Tor hidden services (.onion addresses) can limit access to a select few people?

This is normally not done and most people that use Tor have never experienced this. If you host a .onion service yourself, Tor creates a directory for it where it stores keys and other information. You will find an "authorized_clients" directory inside of it. This directory can be filled with public key files. The next time Tor restarts, it encrypts the .onion descriptor with all keys found in that directory. The encryption is done in a way that any single key can decrypt the data. Without a key, you can't access the onion even if you know it's alive. The decryption process happens on the user side and passively. A .onion operator doesn't knows who is accessing their onion service.

.onion authentication failed

The client can paste the private key into Tor browser when asked. You can also specify a directory for Tor to look for private keys. This is useful if you use other applications besides the Tor Browser.

What has any of this to do with PBKDF2 or .NET?

Nothing. The keys are using Curve25519.

It's common to name curves after some of the numbers that define them. Curve25519 uses a prime field of 2255-19.

Keys of course have a weakness. This weakness is you. Sooner or later some people will lose their keys. And some people may not want to store a key because they use an OS like Tails that resets after every use. It would be really nice if you could deterministically derive keys from a passphrase. Some programs allow you to do this. Some crypto currency wallets have a mode where they use a passphrase to generate the addresses.

Passphrases of course have a weakness. This weakness is you. (spot the pattern yet?). Most people still don't use a password manager and will happily use the same password everywhere. This is where a key derivation scheme comes in. It converts unsafe human input into safe(r) machine usable output in a way that annoys people that try to break passwords.

PBKDF2

This is a very common key derivation algorithm. It's quite old but when properly used, still in the "good enough" category. It takes 5 parameters:

In our case, we use a deterministically generated salt because we're only interested in the key part. If we were to use a salt, we could use the .onion name for example.

PBKDF2 was convenient to use because I wrote my first onion key generator in JavaScript where this algorithm is available in all modern browsers, desktop and mobile.

The full key generator is available as an onion service itself here, and as a clearweb website here. The JS files are neither combined nor minified so feel free to explore.

The implementation in JS is actually quite fast because it uses the browsers crypto module which is written in native code instead of JS. And not just desktops, it's also very fast on mobile devices. An increasing number of processors contain instructions specifically for common cryptographic operations.

You can benchmark your browser with the button below. It will start with a small iteration count and then increases it in set levels up to 3 million iterations. It stops when either all tests are performed, or when a test takes longer than 3 seconds.

From PBKDF2 to Curve25519

Curve25519 accepts pretty much any sequence of 32 bytes as valid private key. There are 5 bits that are hardcoded.

In C#, this can be expressed as follows:

KeyBytes[0]  &= 0xF8; //Rightmost 3 bits always zero
KeyBytes[31] &= 0x7F; //Leftmost bit always zero
KeyBytes[31] |= 0x40; //Bit next to leftmost always one.

.NET PBKDF2

A .NET application is nice to have, because a browser based key generator cannot directly modify files and directories on your computer, which means you will end up copy-pasting and this is not really user friendly. It's enough for a basic key generator, but not a key manager.

Because of the fast JS implementation, I went with 1 million iterations by default. Because not every device is as powerful as mine (AMD Ryzen 9 5950X) my generator offers a faster mode with only 100'000 iterations. Interesting is that Firefox on my Samsung Galaxy phone actually beats Firefox on my Windows.

1 million iterations may take a few seconds, but this is acceptable in this case because you're not constantly generating new keys using this method. Also this only slows down the key generation, not the usage of the key itself.

Now finally comes the .NET part. PBKDF2 in .NET is so much easier than it is in JS:

//Rfc2898 is the PBKDF2 standard. 
using (var PBKDF = new Rfc2898DeriveBytes(Password, Salt, 1000000, new HashAlgorithmName("SHA512")))
{
    return PBKDF.GetBytes(32);
}

Easy enough. Run it and wait.

And wait.

And wait.

Did I add to many zeroes to the iteration value?

And wait.

WTF?

And wait.

And finally, you get your 32 bytes.

This took way too long, but the bytes are identical to those generated in the browser, so the algoritm is definitely correct. Running it without the debugger has no significant impact either. There's also no difference between debug and release mode. Release mode optimizes code, but only our code, not methods from the framework itself.

.NET has a Stopwatch class to measure time. We can surround the GetBytes call with it:

var SW = System.Diagnostics.Stopwatch.StartNew();
PBKDF.GetBytes(32);
SW.Stop();
Console.WriteLine(SW.ElapsedMilliseconds);

It prints 18887 on my machine. That is 10 times slower than the JS version.

This is unacceptable. I didn't dig deeper as to why it is so much slower, but it's very likely that the implementation is entirely in managed code, and remember how I said that for CPU bound work .NET may be slow? This is probably it.

Now I also provided you with a solution to that question: Unmanaged code.

Why stay in the safety of the .NET when we can trivially descend into the madness that is the world of "how-to-set-your-pc-on-fire" languages.

The Windows crypto API

If you have to work with unmanaged code, you normally go to pinvoke.net, search for the function you need, and blindly copy-paste I mean carefully evaluate the correct solution and integrate it into your code.

Nobody has added the BCrypt API yet, so we have to do the unthinkable: Reading documentation.

Luckily, Microsoft is really good at writing documentation. Their documentation is likely one of the first results when you search for <thing> windows api and indeed when looking for PBKDF2, you likely land on this documentation page.

It tells you the function layout, below it describes the parameters, and finally the files where the function is declared.

The types may be confusing but some cleverly crafted search queries will quickly yield results.

Figuring out .NET types

The basic types are declared in windows.h.

If you struggle to find an API specific type, look what the header file is called at the bottom of the page. The first function parameter in this case is BCRYPT_ALG_HANDLE but nowhere will it say what native type this actually is. The solution is to search for the type in the header file.

A good query is "typedef BCRYPT_ALG_HANDLE" inurl:bcrypt.h and it will likely land you at a github repository of the Windows 10 SDK where you find that this handle is defined as PVOID. Using the same trick to search for PVOID in windows.h tells you it's void* in C. This is simply IntPtr in C#.

Tip: A windows type that starts with a P or LP is likely a pointer.

Using unmanaged PBKDF2

It's actually fairly simple:

  1. Use BCryptOpenAlgorithmProvider to get a handle to the SHA512 function.
  2. Use BCryptDeriveKeyPBKDF2 to derive the number of bytes we need
  3. Use BCryptCloseAlgorithmProvider to close the handle from step 1 again.

And that's it. The function arguments are fairly easy to implement on the .NET side. We don't have to deal with any structs with inline data or functions as arguments. They're all byte arrays, strings and numbers only. And .NET converts a byte array into an unmanaged memory region and back again automatically.

I will not add code here. The fully functional code file can be found at the end

Time difference

We can use the Stopwatch class again to measure the time. First we do the .NET method, followed by our umanaged method. We use a different password for both to make sure the crypto API can't cache anything.

Result

This is approximately 1 second slower than the web browser crypto API, but still fast enough for the occasional key generation.

Tested browser was Firefox v93 on Windows 10 x64.

Can we go faster?

Yes. We could use a different unmanaged PBKDF2 implementation that is optimized for speed, and there are implementations out there that beat OpenSSL.

Location of the Prefer 32 bit checkbox

But there's another way we can improve performance by a lot. It involves the extremely complicated step of disabling a single checkbox. You find it labeled "Prefer 32-bit" in your project properties in the "Build" section (see image). Do not forget to uncheck this for "Debug" as well as "Release"

Disabling this option gets us these results:

The .NET version is now 3 times faster, and the BCrypt API version is 2 times faster. Our unmanaged .NET solution is now faster than the browser crypto API.

So if your .NET code seems to perform less than optimal, uncheck that checkbox to make use of the 64 bit architecture if it's available. The checkbox is not available for all .NET project types.

If you have trouble with other libraries when enabling this option, see here on how to dynamically select 32 or 64 bit DLL files at runtime.

Finalizing

Whenever we use unmanaged APIs, it's important to properly clean up after using them. The easiest way is to build SafeHandle wrappers around anything that needs a "close" call of some sort.

If your application is supposed to run on other systems besides Windows, you should implement OS detection (See Environment.OSVersion.Platform == PlatformID.Win32NT) and fall back to the internal .NET mechanism if necessary. If your application needs to run on older NT versions (pre Vista), you should do additional version checks as the ummanaged API in question was not available back there.

Or be lazy and just do try{/*unmanaged*/}catch{/*managed*/}

Possible exceptions:

Final code

The final code for a working unmanaged PBKDF2 implementation is shown below. At the very end is a link to open the raw code file for easier copy/paste.

The code can soon be seen in action in the .NET onion authentication generator. The first and only release was just a generator, but it's currently being rewritten to also act as a key manager, which will support deterministic keys.

using Microsoft.Win32.SafeHandles;
using System;
using System.ComponentModel;
using System.Diagnostics;
using System.Runtime.ConstrainedExecution;
using System.Runtime.InteropServices;
using System.Text;

namespace SpeedyGonzales
{
    /// <summary>
    /// Provides PBKDF2 key derivation function from unmanaged windows API
    /// </summary>
    /// <remarks>
    /// This is massively faster than the .NET internal mechanism
    /// </remarks>
    public static class UnmanagedPBKDF2
    {
        /// <summary>
        /// Flags for BCrypt functions
        /// </summary>
        private enum BCryptFlags : int
        {
            /// <summary>
            /// No flags
            /// </summary>
            NO_FLAGS = 0,
            /// <summary>
            /// Hash provider operates in HMAC mode
            /// </summary>
            BCRYPT_ALG_HANDLE_HMAC_FLAG = 0x8
        }

        /// <summary>
        /// Derive a key using PBKDF2
        /// </summary>
        /// <param name="hAlgorithm">Algorithm opened using <see cref="BCryptOpenAlgorithmProvider"/></param>
        /// <param name="pbPassword">Private piece of key material</param>
        /// <param name="cbPassword">Byte length of <paramref name="pbPassword"/></param>
        /// <param name="pbSalt">Public piece of key material</param>
        /// <param name="cbSalt">Byte length of <paramref name="pbSalt"/></param>
        /// <param name="cIterations">Number of iterations</param>
        /// <param name="pbDerivedKey">Memory to hold derived bytes</param>
        /// <param name="cbDerivedKey">Length of <paramref name="pbDerivedKey"/></param>
        /// <param name="dwFlags">Reserved. Must be <see cref="BCryptFlags.NO_FLAGS"/></param>
        /// <returns>NT status</returns>
        /// <seealso cref="https://docs.microsoft.com/en-us/windows/win32/api/bcrypt/nf-bcrypt-bcryptderivekeypbkdf2"/>
        [DllImport("Bcrypt.dll")]
        private static extern int BCryptDeriveKeyPBKDF2(
            IntPtr hAlgorithm,
            byte[] pbPassword,
            int cbPassword,
            byte[] pbSalt,
            int cbSalt,
            long cIterations,
            [Out]
            byte[] pbDerivedKey,
            int cbDerivedKey,
            BCryptFlags dwFlags
            );

        /// <summary>
        /// Opens a BCrypt algorithm
        /// </summary>
        /// <param name="phAlgorithm">Pointer to algorithm</param>
        /// <param name="pszAlgId">Algorithm name</param>
        /// <param name="pszImplementation">Implementation name. <see cref="null"/> for default</param>
        /// <param name="dwFlags">Algorithm flags</param>
        /// <returns>NT status</returns>
        /// <seealso cref="https://docs.microsoft.com/en-us/windows/win32/api/bcrypt/nf-bcrypt-bcryptopenalgorithmprovider"/>
        [DllImport("Bcrypt.dll")]
        private static extern int BCryptOpenAlgorithmProvider(
            out IntPtr phAlgorithm,
            [MarshalAs(UnmanagedType.LPWStr)]
            string pszAlgId,
            [MarshalAs(UnmanagedType.LPWStr)]
            string pszImplementation,
            BCryptFlags dwFlags
            );

        /// <summary>
        /// Closes algorithm that was opened with <see cref="BCryptOpenAlgorithmProvider(out IntPtr, string, string, BCryptFlags)"/>
        /// </summary>
        /// <param name="phAlgorithm">Algorithm handle</param>
        /// <param name="dwFlags">Must be <see cref="BCryptFlags.NO_FLAGS"/></param>
        /// <returns>NT status</returns>
        /// <seealso cref="https://docs.microsoft.com/en-us/windows/win32/api/bcrypt/nf-bcrypt-bcryptclosealgorithmprovider"/>
        [DllImport("Bcrypt.dll")]
        private static extern int BCryptCloseAlgorithmProvider(
            IntPtr phAlgorithm,
            BCryptFlags dwFlags
            );

        /// <summary>
        /// Derives a key using PBKDF2
        /// </summary>
        /// <param name="Key">Password</param>
        /// <param name="Salt">Salt</param>
        /// <param name="Iterations">Iteration count</param>
        /// <param name="ByteCount">Number of bytes to derive</param>
        /// <returns>Derived bytes</returns>
        public static byte[] PBKDF2(string Key, byte[] Salt, int Iterations, int ByteCount)
        {
            return PBKDF2(Encoding.UTF8.GetBytes(Key), Salt, Iterations, ByteCount);
        }

        /// <summary>
        /// Derives a key using PBKDF2
        /// </summary>
        /// <param name="Key">Password</param>
        /// <param name="Salt">Salt</param>
        /// <param name="Iterations">Iteration count</param>
        /// <param name="ByteCount">Number of bytes to derive</param>
        /// <returns>Derived bytes</returns>
        public static byte[] PBKDF2(byte[] Key, byte[] Salt, int Iterations, int ByteCount)
        {
            if (Key == null)
            {
                throw new ArgumentNullException(nameof(Key));
            }

            if (Salt == null)
            {
                throw new ArgumentNullException(nameof(Salt));
            }
            if (Iterations < KeyDerivation.MinIterations)
            {
                throw new ArgumentOutOfRangeException(nameof(Iterations), $"Value must be at least {KeyDerivation.MinIterations}");
            }

            if (ByteCount < 1)
            {
                throw new ArgumentOutOfRangeException(nameof(ByteCount), "Value must be at least 1");
            }

            int Ret = BCryptOpenAlgorithmProvider(out IntPtr Alg, "SHA512", null, BCryptFlags.BCRYPT_ALG_HANDLE_HMAC_FLAG);
            using (var Handle = new SafeBcryptAlgorithmHandle(Alg))
            {
                if (Alg != IntPtr.Zero)
                {
                    Debug.Print($"Opened: Ptr: {Alg}");

                    byte[] Result = new byte[ByteCount];

                    Ret = BCryptDeriveKeyPBKDF2(
                        Alg,
                        Key, Key.Length,
                        Salt, Salt.Length,
                        Iterations,
                        Result, Result.Length,
                        BCryptFlags.NO_FLAGS);
                    if (Ret != 0)
                    {
                        throw new Win32Exception($"Key generator failed with NT Status {Ret}");
                    }
                    return Result;
                }
                else
                {
                    throw new Win32Exception($"Open failed with NT Status {Ret}");
                }
            }
        }

        /// <summary>
        /// Safe handle for BCrypt algorithm
        /// </summary>
        private class SafeBcryptAlgorithmHandle : SafeHandleZeroOrMinusOneIsInvalid
        {
            public SafeBcryptAlgorithmHandle(IntPtr Handle) : base(true)
            {
                SetHandle(Handle);
            }

            [ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)]
            protected override bool ReleaseHandle()
            {
                BCryptCloseAlgorithmProvider(handle, BCryptFlags.NO_FLAGS);
                SetHandleAsInvalid();
                return true;
            }
        }
    }
}

Open UnmanagedPBKDF2.cs

Copyright © 2021 by Kevin Gut 📧 | More services | Generated for 3.15.208.126