Binmap: a system scanner - Quarkslab's blog

Posted Wed 09 March 2016
Authors Fred Raynal, Serge Guelton
Category Programming
Tags tool, programming, 2016

We are open sourcing binmap, a tool that scans a filesystem and gathers intel on which binaries are there, what their dependencies are, which symbols they use, and more. This yields a global view of a system and provides a basic building block for other tools!

TL;DR: we have open sourced a system scanner on GitHub.

Vulnerability research is not only about luck, it's also about strategy

Vulnerability research has changed a lot since grep was the researcher's best friend. Time has flown these last years: fuzzers have been the new black for quite some time, compilers and OSes enforce mitigations, and exploitation has become as much craft as engineering.

A growing difficulty for everyone interested in vulnerabilities is the increasing complexity of systems. Kernels and applications are a mess when one wants to start digging. Nowadays, researchers have to come up with a real strategy to find vulnerabilities: where to look, for how long, with which tool, to what end, and so on.

An important part of the strategy is deciding where to look. Many parameters come into play, e.g. is it an old codebase? Is it complex (like a parser)? Is it easily reachable? What is the impact of a vulnerability in this component?

We designed binmap to help measure the impact of a vulnerability. The purpose is to get an overview of a system, which means finding the binaries, the libraries they use, and the symbols defined inside each of them.

binmap has been designed to be very modular, to support different file formats, and to extract several kinds of information from the system. It must be very easy to add a new file format and to extract its information. With this map, it is easier to know where to look for a vulnerability, or to measure the consequences of a CVE such as CVE-2015-0235 (the glibc gethostbyname bug known as GHOST).

The following session illustrates one possible way to list the binaries impacted by a CVE with binmap, using an old vanilla debchroot:

$ binmap scan /path/to/debchroot -o chroot.dat
$ python
>>> import blobmap
>>> db = blobmap.BlobMap('chroot.dat').last()  # load last scan database
>>> for key in db.keys():  # look for the libc
...   if 'libc' in key:
...     print key
/lib/x86_64-linux-gnu/libc.so.6
/lib/x86_64-linux-gnu/libc-2.17.so
[...]
>>> libc_users = db.predecessors('/lib/x86_64-linux-gnu/libc.so.6')  # get all binaries using the libc
>>> for libc_user in libc_users:
...   metadata = db[libc_user]
...   if 'gethostbyname' in metadata.imported_symbols:
...     print libc_user, 'uses gethostbyname and may be vulnerable'
...   if 'gethostbyname2' in metadata.imported_symbols:
...     print libc_user, 'uses gethostbyname2 and may be vulnerable'
/usr/lib/gnupg/gpgkeys_finger uses gethostbyname and may be vulnerable
/bin/hostname uses gethostbyname and may be vulnerable
/usr/lib/libxapian.so.22.6.3 uses gethostbyname and may be vulnerable
/usr/lib/perl/5.18.1/auto/Socket/Socket.so uses gethostbyname and may be vulnerable
/usr/bin/logger uses gethostbyname and may be vulnerable
/sbin/agetty uses gethostbyname and may be vulnerable
/sbin/getty uses gethostbyname and may be vulnerable
/bin/tar uses gethostbyname and may be vulnerable
/usr/bin/getent uses gethostbyname2 and may be vulnerable
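
The session above can be folded into a small helper. The sketch below is not part of binmap: `find_importers` is a hypothetical name, and `FakeDb` and `FakeMeta` are stand-ins so that the example runs without a scan database. It assumes only the two things the session already relies on, `db.predecessors(path)` and `db[path].imported_symbols`, and it is written for Python 3.

```python
from collections import namedtuple

# Stand-in for the metadata object; only imported_symbols is modeled.
FakeMeta = namedtuple("FakeMeta", ["imported_symbols"])


class FakeDb:
    """Hypothetical stand-in mimicking the two calls used in the session."""

    def __init__(self, users, metadata):
        self._users = users        # library path -> set of paths using it
        self._metadata = metadata  # binary path -> FakeMeta

    def predecessors(self, path):
        return self._users.get(path, set())

    def __getitem__(self, path):
        return self._metadata[path]


def find_importers(db, library, symbols):
    """Map each direct user of `library` to the watched symbols it imports."""
    hits = {}
    for user in db.predecessors(library):
        used = set(symbols) & set(db[user].imported_symbols)
        if used:
            hits[user] = used
    return hits


# Made-up data for illustration only.
libc = "/lib/x86_64-linux-gnu/libc.so.6"
db = FakeDb(
    users={libc: {"/bin/hostname", "/usr/bin/getent", "/bin/true"}},
    metadata={
        "/bin/hostname": FakeMeta({"gethostbyname", "puts"}),
        "/usr/bin/getent": FakeMeta({"gethostbyname2"}),
        "/bin/true": FakeMeta({"exit"}),
    },
)
hits = find_importers(db, libc, ["gethostbyname", "gethostbyname2"])
for path in sorted(hits):
    print(path, "imports", ", ".join(sorted(hits[path])))
```

With a real database, the same function should accept the object returned by `last()`, provided it behaves as in the session.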

`binmap`: an Overview

In order to find the relationship between the binaries and the libraries, there are 3 situations to handle:

A binary loads a library because they are dynamically linked: the dependency is recorded at build time, and loading is then performed by the system's loader.

A binary loads a library during its execution: a function is explicitly called to load a library at runtime (e.g. dlopen or LoadLibrary).

A library is statically linked in the binary: neither the OS nor the binary itself does anything, since the library is embedded in the binary.

Dealing with the first case is quite simple, and binmap supports it. The second scenario requires runtime analysis and is not supported by binmap. The third scenario could be handled statically, but it is much more complex and not supported yet. This is one of the reasons why we are open sourcing binmap.

What do we expect from open sourcing `binmap`?

We put a lot of effort into providing well-engineered software. We have tested it on real systems, but certainly not exhaustively enough. Now it is time for binmap to reach its full potential. Many side projects rely on it, and we expect the community to help us achieve these goals.

System map warehouse

binmap builds a database of hashes and information for systems. One of our goals is to provide a kind of warehouse holding the databases for several systems, and to update those databases to track the systems as they evolve.

This is very useful when one wants to diff not only a binary but systems as a whole, to see what binaries have changed, which are new or removed.
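
The idea of diffing whole systems can be illustrated without binmap at all. The sketch below is an assumption-laden toy, not binmap's implementation: `hash_tree` and `diff_snapshots` are hypothetical names, SHA-1 is an arbitrary choice, and a snapshot is reduced to a plain `{path: hash}` dictionary.

```python
import hashlib
import os


def hash_tree(root):
    """Return {relative path: SHA-1 hex digest} for regular files under root."""
    hashes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            with open(full, "rb") as handle:
                digest = hashlib.sha1(handle.read()).hexdigest()
            hashes[os.path.relpath(full, root)] = digest
    return hashes


def diff_snapshots(old, new):
    """Compare two {path: hash} snapshots of the same system."""
    added = set(new) - set(old)
    removed = set(old) - set(new)
    updated = {p for p in set(old) & set(new) if old[p] != new[p]}
    return added, removed, updated


# Made-up snapshots; real ones would come from hash_tree() on two dates.
old = {"bin/myprog": "aa11", "lib/libmy0.so": "bb22"}
new = {"bin/myprog": "cc33", "lib/libmy1.so": "dd44"}
added, removed, updated = diff_snapshots(old, new)
print(sorted(added), sorted(removed), sorted(updated))
```

A real warehouse would store far more than hashes (dependencies, symbols, hardening features), but the three-way split into added, removed, and updated files stays the same.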

We intend to provide a warehouse for databases, either created by us or sent by contributors. The files produced by binmap could be piped to gpg to ensure some kind of authentication and integrity of the various databases.

This needs to be automated, so we need to set up a backend that downloads and installs an OS, runs binmap to create the database, and updates it when needed. Some tools from the devops community could certainly help there.

A website exposing hashes and some more information about clean files is something the security community needs. We see this need with IRMA, our asynchronous and customizable analysis platform for suspicious files, when quickly sorting good files from bad ones.

Supporting static libraries

As explained, that is the biggest challenge. We did not want to go into that before having a proper piece of code. We believe that is ok for now, but this feature will require big brains and manpower, like for instance Silvio Cesare's work on binary similarities.

People working on malware face the same issue: recognizing small pieces of code (from the malware) in other binaries. The problem is the same and known to be hard. By open sourcing binmap, we hope people working on binary similarities will use it for their research and tests.

Searching the information

Never ask a security guy to design a user interface. Security researchers tend to like it the hard way, but at some point, being smart is the right choice.

Gathering information is not the most difficult part here. Presenting it and being able to retrieve the right information is. We have started to develop a web interface for binmap: it loads a database and can be used to explore the graph of relations or search for some functions. We also have an API to query the database and automate some actions. We hope that people interested in UX will also join to improve it.

Some eye candy with the first draft of our prototype UI :-)

Risk map

When analyzing a system, it is really useful to know where previous bugs are located. By mixing the information about the binaries and libraries with reported vulnerabilities (e.g. CVEs) that affect them, one can draw a risk map, see which components are more prone to attacks, and take action to build a proper defense.

We have started to combine the information gathered by binmap with external sources. We focus on two kinds of information:

The version of the files: we attempt to extract the version of a file either from the file itself or from its name. Then, we try to compare it with the latest known version from the official website. That way, we can quickly see how up-to-date a system is.

Many vulnerabilities are reported, but not all of them are given a CVE. We can still map the name of a file to the known CVEs, for instance using the great CIRCL CVE Search.

In the end, the information gathered on the system is compared with external information to learn more about its risks.
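
As an illustration of the first point, here is a minimal sketch of extracting a version from a file name and comparing it with a reference version. It is not how binmap does it: the two regular expressions, `version_from_name`, and `is_outdated` are assumptions made for this example, and they only cover shared-object names shaped like the ones shown earlier in this post.

```python
import re

# Assumed naming schemes; real systems have many more.
_PATTERNS = [
    re.compile(r"\.so\.(\d+(?:\.\d+)*)$"),  # e.g. libxapian.so.22.6.3
    re.compile(r"-(\d+(?:\.\d+)*)\.so$"),   # e.g. libc-2.17.so
]


def version_from_name(path):
    """Return the version as a tuple of ints, or None if no pattern matches."""
    name = path.rsplit("/", 1)[-1]
    for pattern in _PATTERNS:
        match = pattern.search(name)
        if match:
            return tuple(int(part) for part in match.group(1).split("."))
    return None


def is_outdated(path, latest):
    """True if the version found in `path` is older than `latest`."""
    found = version_from_name(path)
    return found is not None and found < latest


for candidate in ("/lib/x86_64-linux-gnu/libc-2.17.so",
                  "/usr/lib/libxapian.so.22.6.3",
                  "/bin/tar"):
    print(candidate, version_from_name(candidate))
```

Note that a suffix such as `.so.6` is an ABI number rather than a release number, so a serious implementation would have to treat sonames separately or read the version from the file content.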

How to use it?

First, download the code from GitHub.

Installation

Debian/Ubuntu

The following packages are needed:

cmake

g++

libboost-python1.55-dev

libboost-system1.55-dev

libboost-program-options1.55-dev

libboost-filesystem1.55-dev

libboost-regex1.55-dev

libboost-serialization1.55-dev

zlib1g-dev

libssl-dev

libelfg0-dev

Then run:

$ mkdir _build
$ cd _build
$ cmake ..
$ make

Then, optionally, as root:

$ make install

Windows

You need Visual Studio installed & ready. Then:

Install cmake and make sure it's in your path.

Get zlib

Get boost (get precompiled binaries)

Then run something like the following:

$ cmake -DBoost_DEBUG=ON -G "Visual Studio 12" -DBoost_USE_STATIC_LIBS=ON -DBOOST_ROOT=D:\Programming\Libraries\boost_1_55_0 -DBOOST_LIBRARYDIR=D:\Programming\Libraries\boost_1_55_0\lib32-msvc-12.0 -DZLIB_LIBRARY=D:\Programming\Libraries\zlib-1.2.8 -DZLIB_INCLUDE_DIR=D:\Programming\Libraries\zlib-1.2.8

Usage

Using binmap is a two-step process:

Scan a directory (or a file), for instance:
```
$ ./binmap scan -v1 /usr/local -o local.dat
```
This creates a database containing information about the binaries that lie in this directory.

Dump the database to the dot format:
```
$ ./binmap view -i local.dat -o local.dot
```
or inspect the database using the Python API described below.

Python API

The blobmap module gives read-only access to the content of a binmap database:

>>> import blobmap

First thing to do is to load a database:

>>> blobs = blobmap.BlobMap('local.dat')

A BlobMap is an ordered container of blobs, in chronological order, last being the most recent entry:

>>> blob = blobs.last()

A blob is basically a directed graph where nodes are binaries and edges represent a use dependency, meaning something like "this program depends on this library". It can be indexed by path, as in:

>>> clang_metadata = blob['/usr/local/bin/clang']
>>> print(str(clang_metadata))
clang: 8fcffc4a97cd4aaa1a32938a9e95d3b253476121(13223 exported symbols)(1303 imported symbols)(1 hardening features)

One can access the metadata for each node independently:

>>> clang_metadata.hash
8fcffc4a97cd4aaa1a32938a9e95d3b253476121
>>> clang_metadata.hardening_features
{'fortified'}
>>> help(clang_metadata)
[...]
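
Per-node metadata lends itself to quick audits. The sketch below lists the paths whose metadata lacks a given hardening feature. `missing_feature` is a hypothetical helper, and a plain dictionary of `SimpleNamespace` objects stands in for a blob so the example runs without binmap. It assumes only `keys()`, indexing by path, and the `hardening_features` attribute, all of which appear in the sessions in this post. `'fortified'` is the only feature name shown here, so no other name is assumed.

```python
from types import SimpleNamespace


def missing_feature(blob, feature):
    """Return the sorted paths whose metadata lacks `feature`."""
    return sorted(
        path for path in blob.keys()
        if feature not in blob[path].hardening_features
    )


# Stand-in for a scan result; /usr/local/bin/tool is a made-up entry.
blob = {
    "/usr/local/bin/clang": SimpleNamespace(hardening_features={"fortified"}),
    "/usr/local/bin/tool": SimpleNamespace(hardening_features=set()),
}
print(missing_feature(blob, "fortified"))
```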

The graph can be navigated using the successors and predecessors methods:

>>> blob.successors('/usr/local/bin/clang')
{'/lib/x86_64-linux-gnu/libtinfo.so.5',
 '/lib/x86_64-linux-gnu/libz.so.1',
 '/lib32/libc.so.6',
 ...}
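
Because `predecessors` only returns direct users, measuring the full reach of a library means walking the graph. The following sketch does a breadth-first traversal using nothing but a `predecessors(path)` method. `transitive_users` is a hypothetical helper and `FakeGraph` is a stand-in with made-up paths, so the example runs without a binmap database.

```python
from collections import deque


def transitive_users(graph, path):
    """Return every node that reaches `path` through use-dependency edges."""
    seen = set()
    queue = deque([path])
    while queue:
        current = queue.popleft()
        for user in graph.predecessors(current):
            # The seen set also protects against dependency cycles.
            if user not in seen and user != path:
                seen.add(user)
                queue.append(user)
    return seen


class FakeGraph:
    """Hypothetical stand-in exposing only predecessors()."""

    def __init__(self, edges):
        # edges: iterable of (user, dependency) pairs
        self._preds = {}
        for user, dep in edges:
            self._preds.setdefault(dep, set()).add(user)

    def predecessors(self, path):
        return self._preds.get(path, set())


graph = FakeGraph([
    ("/usr/bin/app", "/usr/lib/libfoo.so"),
    ("/usr/lib/libfoo.so", "/lib/libz.so.1"),
    ("/usr/bin/zcat", "/lib/libz.so.1"),
])
print(sorted(transitive_users(graph, "/lib/libz.so.1")))
```

Here `/usr/bin/app` never links against `libz` directly, yet it is reported because it depends on a library that does.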

It's also possible to diff two blobs, in order to gather intel on how the state of a system has changed:

>>> from blobmap import BlobMap as BM
>>> b = BM('mynewprog.dat')
>>> g1, g0 = [b[k] for k in b.keys()][-2:]
>>> diff = g0.diff(g1)
>>> diff.added
{'/.../libmy1.so'}
>>> diff.removed
{'/.../libmy0.so'}
>>> diff.updated
{'/.../myprog'}

Final words

From a simple idea, binmap leads to many side projects. Unfortunately, we are not big enough yet to develop everything by ourselves. We'd like you to use it, test it, improve it, and in the end, make all systems safer (or at least, know where the risks are).

We believe security must be simple. binmap is a simple tool: it does a few things, not everything. But based on these small building blocks, greater projects can be achieved. Let's move one step at a time. The projects we want to keep working on can be built on solid ground now, hopefully with your help.

Contact: binmap-dev(AT)quarkslab.com

