An open source, header-only library that provides fast, immutable, constexpr-compatible implementation of std::set, std::map, std::unordered_map and std::unordered_set to C++14 users. It can be used as an alternative to gperf.

Introduction

I've always been frustrated, when initializing a const std::set<int> keys = {1, 2, 3}, to see that the structure is initialized at runtime. Gast, it's marked as const! Couldn't this be a simple binary search in a sorted, constant array of ints?

And what about this one: const std::unordered_set<int> keys = {4, 5, 6}? By the eyes of my Bonnie Mary, one could find a hash function without collision for these three little pigs and use it, instead of doing the whole stuff at runtime!

This should even make lookup faster, right?

Actually, there already exists solutions to the second problem, known as perfect minimal hashing. You may have heard of gperf. It does exactly what we want: take a list of values, find a collision-free hash function for these values and generate the appropriate C code. It slightly complicates the build but it gets the jobs done.

This article uses C++14-powered constexpr to solve both issues, initializing containers at compile time, resulting in 0-cost initialization and fast lookup. The whole stuff is bundled in the frozen library.

It's available under the Apache license 2.0 on github.

std::set<> and std::map<>

std::set<> and std::map<> are standard containers that only require their keys to be comparable. The lookup is required to be in O(log(n)) \def\pelican{\textrm{pelican}^2} \mathcal{O}(log(n)) and the implementation is usually based on a red-black tree to keep a balanced tree. The lookup is done using a binary search.

For an immutable set, there's no need for a red-black tree, a plain sorted array is enough. The constexpr constructor is in charge of sorting the initializer list, storing it into the array and then, the lookup is implemented using a standard binary search.

From the user point of view, the structure declaration only requires an extra size template parameter:

#include <frozen/set.h>

constexpr frozen::set<int, 3> keys = {1, 2, 3};

int some_user(int key) {
    return keys.count(key);
}

Because keys is used in a non-constexpr context by some_user, it does not only live in the constexpr world. Let's have a look at the generated assembly to find out why:

0000000000000000 <_Z9some_useri>:
   0:    83 ff 02                 cmp    edi,0x2
   3:    7e 10                    jle    15 <_Z9some_useri+0x15>
   5:    b8 03 00 00 00           mov    eax,0x3
   a:    83 ff 03                 cmp    edi,0x3
   d:    74 0e                    je     1d <_Z9some_useri+0x1d>
   f:    31 c0                    xor    eax,eax
  11:    0f b6 c0                 movzx  eax,al
  14:    c3                       ret
  15:    0f 94 c0                 sete   al
  18:    0f b6 c0                 movzx  eax,al
  1b:    ff c0                    inc    eax
  1d:    39 f8                    cmp    eax,edi
  1f:    0f 9e c0                 setle  al
  22:    0f b6 c0                 movzx  eax,al
  25:    c3                       ret

The binary search is actually unrolled, and the keys are set as immediate operands. Brest!

When trying with a larger set of elements, the unrolling + inlining still works nicely and the body of the lookup ends in a very large function, without any function call.

Implementation detail

The binary search is implemented as a recursive function, specialized for the size of the current range. This takes advantage of the compile-time information on the set size, and ends up with a fully unrolled lookup. The text-book iterative algorithm uses a while loop which performs very badly in that case, while this version plays very well with the compiler, or at least with Clang 5. Note that fully unrolling the binary search may be a bad choice, especially if it happens to stress the instruction cache too much. Yet, it works flawlessly for small sets!

template <class T, class Compare> struct LowerBound {
  T const &value;
  Compare const &compare;
  constexpr LowerBound(T const &value, Compare const &compare)
      : value(value), compare(compare) {}

  template <class ForwardIt>
  inline constexpr ForwardIt doit(ForwardIt first,
                                  std::integral_constant<std::size_t, 0>) {
    return first;
  }

  template <class ForwardIt, std::size_t N>
  inline constexpr ForwardIt doit(ForwardIt first,
                                  std::integral_constant<std::size_t, N>) {
    auto constexpr step = N / 2;
    auto it = first + step;
    if (compare(*it, value)) {
      auto constexpr next_count = N - (step + 1);
      return doit(it + 1, std::integral_constant<std::size_t, next_count>{});
    } else {
      auto constexpr next_count = step;
      return doit(first, std::integral_constant<std::size_t, next_count>{});
    }
  }
};
/*...*/

std::unordered_set<> and std::unordered_map<>

The main feature of frozen is a static version of std::unordered_set<> and std::unordered_map<>. Finding a collision-free hashing function for a given set of keys is a well-studied problem, known as perfect (eventually minimal) hashing problem. I did not invent anything here, just read this great blogpost that provides a simple algorithm in Python, converted into a constexpr version and voilà. For the lazy ones, the whole idea is to use a first regular hashing function and fill the buckets. For those with collisions, order by number of collisions, we iteratively try another hashing function parametrized by a seed. Once we find a seed parameter that makes all keys fall into an empty slot, we move to the next bucket and so on until only collision-free entries are left, then we put them into the empty slots. This uses an auxiliary table to hold the extra seeds and collision-free displacements.

Using this algorithm, it's possible to declare a frozen unordered set like this:

#include <frozen/unordered_set.h>

constexpr frozen::unordered_set<int, 3> keys = {1,2,4};

int some_user(int key) {
    return keys.count(key);
}

with the guarantee that the frozen::unordered_set<int, 3>::count(int) call is collision-free.

The assembly dump of the some_user function tells us more about the implementation:

0000000000000000 <_Z9some_useri>:
   0:    89 f8                    mov    eax,edi
   2:    83 e0 03                 and    eax,0x3
   5:    48 8b 04 c5 00 00 00     mov    rax,QWORD PTR [rax*8+0x0]
   c:    00
   d:    48 85 c0                 test   rax,rax
  10:    78 45                    js     57 <_Z9some_useri+0x57>
  12:    48 63 cf                 movsxd rcx,edi
  15:    48 31 c8                 xor    rax,rcx
  18:    48 89 c1                 mov    rcx,rax
  1b:    48 f7 d1                 not    rcx
  1e:    48 c1 e0 15              shl    rax,0x15
  22:    48 01 c8                 add    rax,rcx
  25:    48 89 c1                 mov    rcx,rax
  28:    48 c1 e9 18              shr    rcx,0x18
  2c:    48 31 c1                 xor    rcx,rax
  2f:    48 69 c1 09 01 00 00     imul   rax,rcx,0x109
  36:    48 89 c1                 mov    rcx,rax
  39:    48 c1 e9 0e              shr    rcx,0xe
  3d:    31 c1                    xor    ecx,eax
  3f:    6b c1 15                 imul   eax,ecx,0x15
  42:    89 c1                    mov    ecx,eax
  44:    c1 e9 1c                 shr    ecx,0x1c
  47:    31 c1                    xor    ecx,eax
  49:    89 c8                    mov    eax,ecx
  4b:    c1 e0 1f                 shl    eax,0x1f
  4e:    29 c8                    sub    eax,ecx
  50:    f7 d8                    neg    eax
  52:    83 e0 03                 and    eax,0x3
  55:    eb 03                    jmp    5a <_Z9some_useri+0x5a>
  57:    48 f7 d0                 not    rax
  5a:    48 8b 0c c5 00 00 00     mov    rcx,QWORD PTR [rax*8+0x0]
  61:    00
  62:    31 c0                    xor    eax,eax
  64:    39 3c 8d 00 00 00 00     cmp    DWORD PTR [rcx*4+0x0],edi
  6b:    0f 94 c0                 sete   al
  6e:    c3                       ret

The first and performs the first (simplistic) hashing. Then it performs a lookup in the auxiliary table and based on the result, it either computes the auxiliary hash (based on the extra seed from the auxiliary table), or directly picks the index. Either way, it performs the comparison between the argument and the candidate and returns the result.

The string Case

A typical candidate for use with frozen are... strings. And because std::string cannot be used in a constexpr environment and std::string_view is only available in C++17 (with a constexpr constructor!), frozen provides a frozen::string class that acts as a view on existing data. The operator""_s can be used to easily convert C-style strings to this class.

Bonus: Usage in Constant Context

That's anecdotal, but the following is possible with frozen:

constexpr frozen::unordered_set<int, 4> UnluckyNumbers = {
    4, // from china
    9, // from japan
    17, // from Italy
    13, // Triskaidekaphobia
};

constexpr int value = ...;
static_assert(!UnluckyNumbers.count(value), "you're program is ill-blessed in some geographical location!");

Comparison to gperf

The gperf tool proposes an ahead-of-time compiler that generates perfect, minimal hash.

Consider this list of words, listed in the gperf input file format:

%%
Coeus
Crius
Cronus
Hyperion
Iapetus
Mnemosyne
Oceanus
Phoebe
Rhea
Tethys
Theia
Themis
Asteria
Astraeus
Atlas
Aura
Clymene
Dione
Helios
Selene
Eos
Epimetheus
Eurybia
Eurynome
Lelantos
Leto
Menoetius
Metis
Ophion
Pallas
Perses
Prometheus
Styx
%%

Running gperf titan.in > titan.c produces a C file that contains a const char * in_word_set(const char *str, unsigned int len) function to perform the check.

The equivalent C++ code based on frozen is the following, using frozen::string:

#include <frozen/unordered_set.h>
#include <frozen/string.h>

using namespace frozen::string_literals;

constexpr frozen::unordered_set<frozen::string> Titans = {
    "Coeus"_s, "Crius"_s, "Cronus"_s, "Hyperion"_s,
    "Iapetus"_s, "Mnemosyne"_s, "Oceanus"_s, "Phoebe"_s,
    "Rhea"_s, "Tethys"_s, "Theia"_s, "Themis"_s,
    "Asteria"_s, "Astraeus"_s, "Atlas"_s, "Aura"_s,
    "Clymene"_s, "Dione"_s, "Helios"_s, "Selene"_s,
    "Eos"_s, "Epimetheus"_s, "Eurybia"_s, "Eurynome"_s,
    "Lelantos"_s, "Leto"_s, "Menoetius"_s, "Metis"_s,
    "Ophion"_s, "Pallas"_s, "Perses"_s, "Prometheus"_s,
    "Styx"_s,
};

The benchmarking program is the following:

#include <iostream>
#include <string>
#include <unordered_set>
#include <chrono>

int main() {
    const std::unordered_set<std::string> Titans = {
        "Coeus", "Crius", "Cronus", "Hyperion",
        "Iapetus", "Mnemosyne", "Oceanus", "Phoebe",
        "Rhea", "Tethys", "Theia", "Themis",
        "Asteria", "Astraeus", "Atlas", "Aura",
        "Clymene", "Dione", "Helios", "Selene",
        "Eos", "Epimetheus", "Eurybia", "Eurynome",
        "Lelantos", "Leto", "Menoetius", "Metis",
        "Ophion", "Pallas", "Perses", "Prometheus",
        "Styx",
    };

  std::string line;
  unsigned count = 0;

  std::chrono::duration<double, std::milli> duration{0};

  while(std::cin) {
    std::getline(std::cin, line);
    auto start = std::chrono::steady_clock::now();
    if(Titans.count(line))
      count += 1;
    auto stop = std::chrono::steady_clock::now();
    duration += stop - start;
  }
  std::cout << "duration: " << duration.count() << " ms" << std::endl;
  std::cout << count << std::endl;

  return 0;
}

Yes, it's probably I/O bound, but as we only measure the hashing time, it still gives a hint on the hashing time when the key is not in the set.

And here is the result, using /usr/share/dict/british-english-large as input (fun fact: 13 titans are present in this dictionary):

Engine Timing (ms)
std 57
gperf 5
frozen 9

So frozen is within the reach of gperf, and both are way faster than the standard solution.

That's still frustrating. Fortunately, we could change the hashing function (djb2 and FNV-1a) generator used by frozen to improve this situation. The default is:

template <> struct elsa<string> {
  constexpr std::size_t operator()(string value) const {
    std::size_t d = 5381;
    for (std::size_t i = 0; i < value.size; ++i)
      d = (d * 33) + value.data[i];
    return d;
  }
  constexpr std::size_t operator()(string value, std::size_t seed) const {
    std::size_t d = seed;
    for (std::size_t i = 0; i < value.size; ++i)
      d = (d * 0x01000193) ^ value.data[i];
    return d;
  }
};

But using the much simpler:

struct olaf {
  constexpr std::size_t operator()(frozen::string value) const {
    return value.size;
  }
  constexpr std::size_t operator()(frozen::string value, std::size_t seed) const {
    std::size_t d = seed;
    std::size_t bound = std::min(value.size, (std::size_t)2);
    for (std::size_t i = 0; i < bound; ++i)
      d = (d * 0x01000193) ^ value.data[i];
    return d;
  }
};

constexpr frozen::unordered_set<frozen::string, 33, olaf> Titans = {...};

We get the same timings as gperf \o/. This generator is less robust though, so not suitable as a default.

Conclusion

Programming with constexpr in the post C++14 world is great. It makes it possible to achieve the same job as code generator while only relying on a standard C++ compiler. And in the case of perfect hashing, it gives very interesting result! One note though: on my laptop, using frozen with GCC is currently a pain, because the compiler hangs for no apparent good reason; Let's report this!

Thanks

First, thanks to Steve Hanov for his great blog.

Then thanks to Romain and Adrien for their feedback, and all the Quarkslab proof readers, especially w1gz!


If you would like to learn more about our security audits and explore how we can help you, get in touch with us!