Proof assumptions
A proof assumption is anything that limits the applicability of a statement that has been otherwise proved by CBMC. An example of a proof assumptions is the assumption that packets coming off a network are of size at most 1024 bytes. This proof assumption would arise if the proof harness for a function modeled an incoming packet as any character buffer of size at most 1024 characters. Suppose that we run CBMC with this proof harness, and CBMC is unable to find any memory safety errors. We might be tempted to say that CBMC has proved the function is memory safe. But the truth is that CBMC has proved the function is memory safe only when the function is invoked on a packet of size at most 1024 characters. We believe that such a proof has value, and that hard-to-find software issues will be discovered in the course of writing such a proof, but the result is not absolute proof of memory safety.
Just to clear, every proof in mathematics depends on proof assumptions.
In number theory, a theorem may hold for all integers n
, assuming n
is nonzero. The theorem is powerful because it is applicable to nearly
every integer. In cryptography, a theorem may hold only if n
is the product
of two prime numbers p
and q
of roughly the same size. The theorem
may still have powerful applications, but it applies only to a much smaller
set of integers.
Proof assumptions must be clearly and accurately communicated to any proof reader and to any consumer of the proof. Doing this right may be the hardest part of proof. It is particularly important in the context of software proof, however, because the credibility of the proof tool and the methodology for using the tool depends on getting this right. Trust is hard to earn and easy to lose.
The two most common sources of proof assumptions are memory allocation and function abstraction.
Memory allocation
There are two common proof assumptions about memory.
Size
One class of assumption is size. Some are obvious. If the proof harness includes
size_t size;
__CPROVER_assume(size < 100);
char *buffer = (char *) malloc(size);
then the buffer is assumed to be of size smaller than 100. Some are implicit. If the proof harness includes
size_t size;
char *buffer = (char *) malloc(size);
then the buffer is implicitly bounded to the size of the maximum object that CBMC can model (which is probably smaller than the largest size representable in a 64-bit integer, see our discussion of the memory model for more information).
Invariants
One class of assumption is well-formedness or implicit data structure invariants. Suppose the proof harness includes
typedef struct {
size_t size;
char *buffer;
} buffer_t;
buffer_t *buffer = (buffer_t *) malloc(sizeof(*buffer));
buffer->buffer = (char *) malloc(buffer->size);
then the proof harness is implicitly assuming that the pointer buffer->buffer
is either NULL or a valid pointer to a buffer of size buffer-size
. This
is implicitly assuming an data structure invariant about buffer
.
Is this really something you want to assume about a buffer you receive as
input from an untrusted adversary. Is this really something want to assume
even when buffer->size == 0
?
Function abstraction
There are two common kinds of function abstraction with CBMC.
Missing functions
One is implicit function abstraction done automatically by CBMC.
When CBMC encounters
the invocation of a function missing
and cannot find a function definition
for missing
, CBMC issues a warning "no function body for missing
" and
models the function missing
as a function that
- has no side-effects (makes no change to the state), and
- returns an arbitrary, unconstrained value of the appropriate return type.
This assumption of no side-effects is a serious assumption. Almost no
functions in C are purely functional. On the other hand, it is
an easy way to model something like a random number generator returning
a unint64_t
used in some cryptographic code under test.
Including the actual function definition of the random number generator
will probably be too much for CBMC, but by leaving the function
undefined (omitting the code defining the random number generator
from the build), CBMC will consider the impact on the property being
testing of every unsigned, 64-bit integer value that can possibly
be returned by the number generator.
Stubs
One is explicit function abstraction written by the proof author for the proof.
Consider a cryptographic key generator
int key_generator(void *context, public_key_t *public, private_key_t *private)
As with the random number generator, including the code for a cryptographic key generator may be too much for CBMC. Reconstructing cryptographic keys is, after all, designed to be intractable!
A reasonable stub for the key generator is:
int key_generator(void *context, public_key_t *public, private_key_t *private)
{
__CPROVER_havoc_object(public);
__CPROVER_havoc_object(private);
int rc;
return rc;
}
The __CPROVER_havoc_object
function fills the objects pointed to by public
and private
with arbitrary, unconstrained data.
This stub has the pleasing effect of considering possible every value
(well-formed or not) that could be written to public
and private
by the generator,
and considering possible every integer value that could be used as a
return code.
This stub is, however, making the implicit assumption that the pointers are valid. A better stub would check that the function is being called correctly:
int key_generator(void *context, public_key_t *public, private_key_t *private)
{
__CPROVER_assert(context, "context is nonnull");
__CPROVER_assert(public, "public is nonnull");
__CPROVER_assert(private, "private is nonnull");
__CPROVER_havoc_object(public);
__CPROVER_havoc_object(private);
int rc;
return rc;
}