An open source, header-only library that provides fast, immutable, constexpr-compatible implementation of std::set, std::map, std::unordered_map and std::unordered_set to C++14 users. It can be used as an alternative to gperf.
Introduction
I've always been frustrated, when initializing a const std::set<int> keys = {1, 2, 3}, to see that the structure is initialized at runtime. Gast, it's marked as const! Couldn't this be a simple binary search in a sorted, constant array of ints?
And what about this one: const std::unordered_set<int> keys = {4, 5, 6}? By the eyes of my Bonnie Mary, one could find a hash function without collision for these three little pigs and use it, instead of doing the whole stuff at runtime!
This should even make lookup faster, right?
Actually, there already exists solutions to the second problem, known as perfect minimal hashing. You may have heard of gperf. It does exactly what we want: take a list of values, find a collision-free hash function for these values and generate the appropriate C code. It slightly complicates the build but it gets the jobs done.
This article uses C++14-powered constexpr to solve both issues, initializing containers at compile time, resulting in 0-cost initialization and fast lookup. The whole stuff is bundled in the frozen library.
It's available under the Apache license 2.0 on github.
std::set<> and std::map<>
std::set<> and std::map<> are standard containers that only require their keys to be comparable. The lookup is required to be in and the implementation is usually based on a red-black tree to keep a balanced tree. The lookup is done using a binary search.
For an immutable set, there's no need for a red-black tree, a plain sorted array is enough. The constexpr constructor is in charge of sorting the initializer list, storing it into the array and then, the lookup is implemented using a standard binary search.
From the user point of view, the structure declaration only requires an extra size template parameter:
#include <frozen/set.h>
constexpr frozen::set<int, 3> keys = {1, 2, 3};
int some_user(int key) {
return keys.count(key);
}
Because keys is used in a non-constexpr context by some_user, it does not only live in the constexpr world. Let's have a look at the generated assembly to find out why:
0000000000000000 <_Z9some_useri>:
0: 83 ff 02 cmp edi,0x2
3: 7e 10 jle 15 <_Z9some_useri+0x15>
5: b8 03 00 00 00 mov eax,0x3
a: 83 ff 03 cmp edi,0x3
d: 74 0e je 1d <_Z9some_useri+0x1d>
f: 31 c0 xor eax,eax
11: 0f b6 c0 movzx eax,al
14: c3 ret
15: 0f 94 c0 sete al
18: 0f b6 c0 movzx eax,al
1b: ff c0 inc eax
1d: 39 f8 cmp eax,edi
1f: 0f 9e c0 setle al
22: 0f b6 c0 movzx eax,al
25: c3 ret
The binary search is actually unrolled, and the keys are set as immediate operands. Brest!
When trying with a larger set of elements, the unrolling + inlining still works nicely and the body of the lookup ends in a very large function, without any function call.
Implementation detail
The binary search is implemented as a recursive function, specialized for the size of the current range. This takes advantage of the compile-time information on the set size, and ends up with a fully unrolled lookup. The text-book iterative algorithm uses a while loop which performs very badly in that case, while this version plays very well with the compiler, or at least with Clang 5. Note that fully unrolling the binary search may be a bad choice, especially if it happens to stress the instruction cache too much. Yet, it works flawlessly for small sets!
template <class T, class Compare> struct LowerBound {
T const &value;
Compare const &compare;
constexpr LowerBound(T const &value, Compare const &compare)
: value(value), compare(compare) {}
template <class ForwardIt>
inline constexpr ForwardIt doit(ForwardIt first,
std::integral_constant<std::size_t, 0>) {
return first;
}
template <class ForwardIt, std::size_t N>
inline constexpr ForwardIt doit(ForwardIt first,
std::integral_constant<std::size_t, N>) {
auto constexpr step = N / 2;
auto it = first + step;
if (compare(*it, value)) {
auto constexpr next_count = N - (step + 1);
return doit(it + 1, std::integral_constant<std::size_t, next_count>{});
} else {
auto constexpr next_count = step;
return doit(first, std::integral_constant<std::size_t, next_count>{});
}
}
};
/*...*/
std::unordered_set<> and std::unordered_map<>
The main feature of frozen is a static version of std::unordered_set<> and std::unordered_map<>. Finding a collision-free hashing function for a given set of keys is a well-studied problem, known as perfect (eventually minimal) hashing problem. I did not invent anything here, just read this great blogpost that provides a simple algorithm in Python, converted into a constexpr version and voilà. For the lazy ones, the whole idea is to use a first regular hashing function and fill the buckets. For those with collisions, order by number of collisions, we iteratively try another hashing function parametrized by a seed. Once we find a seed parameter that makes all keys fall into an empty slot, we move to the next bucket and so on until only collision-free entries are left, then we put them into the empty slots. This uses an auxiliary table to hold the extra seeds and collision-free displacements.
Using this algorithm, it's possible to declare a frozen unordered set like this:
#include <frozen/unordered_set.h>
constexpr frozen::unordered_set<int, 3> keys = {1,2,4};
int some_user(int key) {
return keys.count(key);
}
with the guarantee that the frozen::unordered_set<int, 3>::count(int) call is collision-free.
The assembly dump of the some_user function tells us more about the implementation:
0000000000000000 <_Z9some_useri>:
0: 89 f8 mov eax,edi
2: 83 e0 03 and eax,0x3
5: 48 8b 04 c5 00 00 00 mov rax,QWORD PTR [rax*8+0x0]
c: 00
d: 48 85 c0 test rax,rax
10: 78 45 js 57 <_Z9some_useri+0x57>
12: 48 63 cf movsxd rcx,edi
15: 48 31 c8 xor rax,rcx
18: 48 89 c1 mov rcx,rax
1b: 48 f7 d1 not rcx
1e: 48 c1 e0 15 shl rax,0x15
22: 48 01 c8 add rax,rcx
25: 48 89 c1 mov rcx,rax
28: 48 c1 e9 18 shr rcx,0x18
2c: 48 31 c1 xor rcx,rax
2f: 48 69 c1 09 01 00 00 imul rax,rcx,0x109
36: 48 89 c1 mov rcx,rax
39: 48 c1 e9 0e shr rcx,0xe
3d: 31 c1 xor ecx,eax
3f: 6b c1 15 imul eax,ecx,0x15
42: 89 c1 mov ecx,eax
44: c1 e9 1c shr ecx,0x1c
47: 31 c1 xor ecx,eax
49: 89 c8 mov eax,ecx
4b: c1 e0 1f shl eax,0x1f
4e: 29 c8 sub eax,ecx
50: f7 d8 neg eax
52: 83 e0 03 and eax,0x3
55: eb 03 jmp 5a <_Z9some_useri+0x5a>
57: 48 f7 d0 not rax
5a: 48 8b 0c c5 00 00 00 mov rcx,QWORD PTR [rax*8+0x0]
61: 00
62: 31 c0 xor eax,eax
64: 39 3c 8d 00 00 00 00 cmp DWORD PTR [rcx*4+0x0],edi
6b: 0f 94 c0 sete al
6e: c3 ret
The first and performs the first (simplistic) hashing. Then it performs a lookup in the auxiliary table and based on the result, it either computes the auxiliary hash (based on the extra seed from the auxiliary table), or directly picks the index. Either way, it performs the comparison between the argument and the candidate and returns the result.
The string Case
A typical candidate for use with frozen are... strings. And because std::string cannot be used in a constexpr environment and std::string_view is only available in C++17 (with a constexpr constructor!), frozen provides a frozen::string class that acts as a view on existing data. The operator""_s can be used to easily convert C-style strings to this class.
Bonus: Usage in Constant Context
That's anecdotal, but the following is possible with frozen:
constexpr frozen::unordered_set<int, 4> UnluckyNumbers = {
4, // from china
9, // from japan
17, // from Italy
13, // Triskaidekaphobia
};
constexpr int value = ...;
static_assert(!UnluckyNumbers.count(value), "you're program is ill-blessed in some geographical location!");
Comparison to gperf
The gperf tool proposes an ahead-of-time compiler that generates perfect, minimal hash.
Consider this list of words, listed in the gperf input file format:
%% Coeus Crius Cronus Hyperion Iapetus Mnemosyne Oceanus Phoebe Rhea Tethys Theia Themis Asteria Astraeus Atlas Aura Clymene Dione Helios Selene Eos Epimetheus Eurybia Eurynome Lelantos Leto Menoetius Metis Ophion Pallas Perses Prometheus Styx %%
Running gperf titan.in > titan.c produces a C file that contains a const char * in_word_set(const char *str, unsigned int len) function to perform the check.
The equivalent C++ code based on frozen is the following, using frozen::string:
#include <frozen/unordered_set.h>
#include <frozen/string.h>
using namespace frozen::string_literals;
constexpr frozen::unordered_set<frozen::string> Titans = {
"Coeus"_s, "Crius"_s, "Cronus"_s, "Hyperion"_s,
"Iapetus"_s, "Mnemosyne"_s, "Oceanus"_s, "Phoebe"_s,
"Rhea"_s, "Tethys"_s, "Theia"_s, "Themis"_s,
"Asteria"_s, "Astraeus"_s, "Atlas"_s, "Aura"_s,
"Clymene"_s, "Dione"_s, "Helios"_s, "Selene"_s,
"Eos"_s, "Epimetheus"_s, "Eurybia"_s, "Eurynome"_s,
"Lelantos"_s, "Leto"_s, "Menoetius"_s, "Metis"_s,
"Ophion"_s, "Pallas"_s, "Perses"_s, "Prometheus"_s,
"Styx"_s,
};
The benchmarking program is the following:
#include <iostream>
#include <string>
#include <unordered_set>
#include <chrono>
int main() {
const std::unordered_set<std::string> Titans = {
"Coeus", "Crius", "Cronus", "Hyperion",
"Iapetus", "Mnemosyne", "Oceanus", "Phoebe",
"Rhea", "Tethys", "Theia", "Themis",
"Asteria", "Astraeus", "Atlas", "Aura",
"Clymene", "Dione", "Helios", "Selene",
"Eos", "Epimetheus", "Eurybia", "Eurynome",
"Lelantos", "Leto", "Menoetius", "Metis",
"Ophion", "Pallas", "Perses", "Prometheus",
"Styx",
};
std::string line;
unsigned count = 0;
std::chrono::duration<double, std::milli> duration{0};
while(std::cin) {
std::getline(std::cin, line);
auto start = std::chrono::steady_clock::now();
if(Titans.count(line))
count += 1;
auto stop = std::chrono::steady_clock::now();
duration += stop - start;
}
std::cout << "duration: " << duration.count() << " ms" << std::endl;
std::cout << count << std::endl;
return 0;
}
Yes, it's probably I/O bound, but as we only measure the hashing time, it still gives a hint on the hashing time when the key is not in the set.
And here is the result, using /usr/share/dict/british-english-large as input (fun fact: 13 titans are present in this dictionary):
Engine | Timing (ms) |
---|---|
std | 57 |
gperf | 5 |
frozen | 9 |
So frozen is within the reach of gperf, and both are way faster than the standard solution.
That's still frustrating. Fortunately, we could change the hashing function (djb2 and FNV-1a) generator used by frozen to improve this situation. The default is:
template <> struct elsa<string> {
constexpr std::size_t operator()(string value) const {
std::size_t d = 5381;
for (std::size_t i = 0; i < value.size; ++i)
d = (d * 33) + value.data[i];
return d;
}
constexpr std::size_t operator()(string value, std::size_t seed) const {
std::size_t d = seed;
for (std::size_t i = 0; i < value.size; ++i)
d = (d * 0x01000193) ^ value.data[i];
return d;
}
};
But using the much simpler:
struct olaf {
constexpr std::size_t operator()(frozen::string value) const {
return value.size;
}
constexpr std::size_t operator()(frozen::string value, std::size_t seed) const {
std::size_t d = seed;
std::size_t bound = std::min(value.size, (std::size_t)2);
for (std::size_t i = 0; i < bound; ++i)
d = (d * 0x01000193) ^ value.data[i];
return d;
}
};
constexpr frozen::unordered_set<frozen::string, 33, olaf> Titans = {...};
We get the same timings as gperf \o/. This generator is less robust though, so not suitable as a default.
Conclusion
Programming with constexpr in the post C++14 world is great. It makes it possible to achieve the same job as code generator while only relying on a standard C++ compiler. And in the case of perfect hashing, it gives very interesting result! One note though: on my laptop, using frozen with GCC is currently a pain, because the compiler hangs for no apparent good reason; Let's report this!
Thanks
First, thanks to Steve Hanov for his great blog.
Then thanks to Romain and Adrien for their feedback, and all the Quarkslab proof readers, especially w1gz!