Tools
=====
We describe here the different tools used by SeaCoral, and how it intergates them as
part of its generation process. When a tool operates on the inputs that have
been prepared by SeaCoral, it uses external files that can be found in `sc_drivers`
at the root of the project.
SeaCoral works in three successive phases: preprocessing, test generation, and post
processing.
Preprocessing
-------------
LAnnotate
~~~~~~~~~
SeaCoral involves a tool for preprocessing files: `frama-c-lannotate`. This is a
`frama-c `_ plug-in that belongs to the `LTest
`_ suite.
Its purpose is to add `labels` to the analyzed functions (check :ref:`the
Usage section` for an example). Several annotation modes are available;
use `frama-c -lannot-list` to list them.
Test runner
~~~~~~~~~~~
You can give SeaCoral your own set of tests with the test runner extension,
configurable with the `[test_runner] custom_tests = [file1.c,...]` in the toml
:ref:`configuration` table. If custom tests are provided, the
test runner will compile the project (annotated by lannotate), execute each
input and check which labels are covered.
Generating tests
----------------
There are four main tools for generating test cases or proving
unreachability of `pclabels`: `klee`, `libfuzzer`, `cbmc` and
`luncov`.
Klee
~~~~
`Klee `_ is a dynamic symbolic execution engine built
on top of the LLVM compiler infrastructure.
Two external files are associated with `klee`: `klee.h` and `klee_driver.h`.
.. We should remove this file from the project and add the proper include
directory.
- `klee.h` gathers the definitions of `klee` primitives used by the project.
It comes from the main `klee` repository.
- `klee_driver.h` gathers the definitions for `pc_label` as well as memory
initialization macros.
Klee operates on a harness that SeaCoral generates based on the input source code.
The purpose of the harness is to properly initialize each function arguments and
uninitialized global using `klee_make_symbolic` (with proper treatment of
pointers), and then call the studied function (aka. "entrypoint"). The
reachability of a `pclabel` at location :math:`l` and with condition expression
:math:`e` is encoded as follows: if :math:`l` is reached and :math:`e` holds,
then `klee_assert(0)` is called. This forces klee to raise an error stating
that the assertion has been violated, and to return the test corresponding to
the failed assertion. SeaCoral retreives the test that covers the `pclabel`, and
registers it as a new test; at the same time, SeaCoral also replays the test using a
process that does not fail when the assertion is reached: this allows SeaCoral to
check whether further labels are covered by the newly registered test.
NB: `klee` only generates new tests: it cannot prove the unreachability of
a `pclabel`.
Libfuzzer
~~~~~~~~~
`libfuzzer `_ is a fuzzing utility
that ships with any recent version of `clang `_. A
fuzzer searches the space of all test inputs for "interesting" items. In
general, the search is performed via selection, random mutations, and crossovers
of given or generated inputs. An input :math:`x` is deemed "interesting", and
thus kept and more susceptible to be selected for mutations, when the execution
of the tested function on :math:`x` increases a specific coverage criterion.
The latter is typically measured by means of an instrumentation of the tested
code, and does not directly correspond the kind of criteria considered in SeaCoral.
NB: `libfuzzer` only generates new tests: it cannot prove the unreachability of
a `pclabel`.
Test Harness & Structured Data
..............................
In SeaCoral, `libfuzzer` operates by linking an automatically generated test harness
with an instrumented version of the tested code. Each input given and generated
by the fuzzer solely consists of an array of bytes, that must then be fed to a
deterministic decoder to produce the structured data that the tested function
expects for each one of its arguments (and global variables). The decoding
algorithm employed allows SeaCoral to interpret generated tests and reuse them as
seeds for any other tool that may be able to exploit them, such as `klee`.
Note that, whereas every possible input data-structure (currently subject to the
assumption that there is no sharing of pointers, i.e. there can be no aliasing
between pointer arguments, and they may only point to tree-like data-structures)
can be encoded into an array of bytes, some byte vectors may lead to
inconsistent data-structures and thus need to be filtered out. SeaCoral's use of
`libfuzzer` integrates dedicated filtering phases for this purpose.
Corpus Initialization
.....................
Fuzzers require an initial corpus of *test inputs* to operate properly. In SeaCoral,
these first inputs can either be randomly generated, or come from tests already
generated in the project. The size of this initial input corpus is determined
by the `libfuzzer.init_corpus_size` configuration entry, and defaults to 1.
When an insufficient amount of existing tests exists when libfuzzer starts, this
corpus is completed by using a vector of null bytes (if the configuration entry
`libfuzzer.init_corpus_with_nulls` holds), and then a set of vectors of bytes
chosen uniformly at random.
Fuzzing Process
...............
The initial corpus is first filtered to deal with sets of inputs that induce
similar coverage, and eliminate inputs that correspond to inconsistent
data-structures or do not satisfy pre-conditions.
Then the search for new inputs happens via the fuzzing loop, for some given
total time (`libfuzzer.timeout`) and/or number of iterations (`libfuzzer.runs`).
Focus on Labels
...............
Eventually, a second filtering phase is used to filter out uneeded inputs. At
this stage, an optional tweak is available, that forces `libfuzzer` to only
consider the coverage of `pclabels` when deciding whether to keep an input in
the corpus. Indeed, as mentioned above, coverage criteria that are directly
measured by fuzzers do not directly corresond to coverage of `pclabels`. One
consequence of this mismatch is that `libfuzzer` may generate more test inputs
than necessary to satisfy SeaCoral's `pclabel`-based coverage criterion. To counter
this effect, SeaCoral offers the Boolean option `labels_only`, that makes this last
filtering phase keep only those inputs that cover at least one `pclabels` that
is not already covered by another known or generated input.
CBMC
~~~~
`CBMC `_ stands for "C Bounded Model Checking".
This is a static analysis tool for analyzing C programs.
There are three different execution modes for `cbmc`. They all generate similar
harnesses for initializing variables through the
`goto-cc `_ and
`goto-harness `_ tools.
The three modes only differ in how the `pclabel` s are defined and how `cbmc` is
called.
- CBMC cover (cbmc)
This options works with the `cover` mode of `cbmc`.
It uses the external file `cbmc_cover_driver.h`.
Each `pc_label` is defined as `__CPROVER_cover(expr)` and `CBMC` called with the `--cover` option.
`cbmc` then attempts to cover as many `pclabel` s as possible and returns a list of test cases.
NB: this mode only generates new tests: it cannot prove the unreachability of
a `pclabel`.
- CBMC with assertions (cbmc-assert)
This options works with the classic execution mode of `cbmc`.
It uses the external file `cbmc_assert_driver.h`.
Each `pc_label(p)` is defined as an assertion of the negation of `p` (`__CPROVER_assert (!p)`).
- If there exist a counter example to this assertion, it means that `p` is reachable.
- If the assertion `!p` is proven by `cbmc`, then `p` never is true: the `pclabel` is unreachable.
- CBMC with C labels (cbmc-clabel)
This has the same behaviour than `cbmc-assert` except it replaces `assertions` by
conditions leading to C labels; `cbmc` is then called to check if the C labels are
reachable or not.
LUncov
~~~~~~
Preamble: the current version of the LUncov plug-in does not allow it to handle
tasks involving complex pointer manipulation.
Part of the `LTest `_ suite, `luncov` is a `frama-c` plug-in
that perform static analyzes on a project and try to deduce uncoverability of `pclabel` s.
The analyzes are done by `EVA`, an abstract intepreter, and `WP`, a weakest precondition
calculator.
SeaCoral does not perform preprocessing for `luncov`, as `lannot` already prepares the project
for it to work as is.
NB: `luncov` does not generate new tests: it can only prove the unreachability of
`pclabel` s.
Postprocessing
--------------
Once tests are generated, SeaCoral provides two coverage report utilities: `lreplay` and
`e-acsl`.
LReplay
~~~~~~~
Part of the `LTest `_ suite, `lreplay` is a
standalone executable that will compile and execute a set of tests, and register
which labels are reached in practice.
E-ACSL
~~~~~~
`E-ACSL `_ is a `frama-c` plug-in
that translates an annotated C program into another program that detects the
violated annotations at runtime. The annotations language used by `e-acsl` is
`ACSL `_.
SeaCoral can use `e-acsl` to verify that `ACSL` annotations associated to a function
are satisfied by the testcases generated during the previous phase.