Verifying Raw Pointer Arithmetic Operations
Authors: Surya Togaru, Yifei Wang, Szu-Yu Lee, Mayuresh Joshi
Introduction
Rust is known for its strong memory safety guarantees, which have made it popular for building reliable and secure systems such as operating systems. However, Rust also permits unsafe code blocks for tasks like pointer arithmetic. These operations are useful, but they bypass Rust’s safety checks and can lead to bugs or security issues.
Our challenge focused on verifying pointer arithmetic in Rust. Verifying these operations helps ensure that even unsafe code is used correctly, which prevents vulnerabilities and makes Rust applications more reliable and secure.
Problem statement:
AWS is collaborating with the Rust Foundation to ensure the safety of Rust’s unsafe constructs using formal verification and automated reasoning. The problem our team selected was to verify the safety of standard library code that handles pointer arithmetic operations using the Kani verifier (a formal verification tool). With this work, we aimed to strengthen trust in Rust’s safety guarantees in large-scale, safety-critical systems.
What is pointer arithmetic?
Pointer arithmetic operations include addition, subtraction, and offset. They manipulate raw pointers in order to reach specific memory locations. Pointer arithmetic is commonly used in applications that require precise control over memory, such as operating systems, embedded systems, and performance-critical systems. If these operations are implemented incorrectly, they can cause serious problems such as out-of-bounds memory access, data corruption, and crashes.
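As a small illustration of these operations, the sketch below walks a slice with a raw pointer. The function name sum_with_pointers is made up for this example and does not come from the challenge code.

```rust
/// Sums a slice by stepping through it with raw pointer arithmetic.
/// `sum_with_pointers` is a hypothetical helper used only for illustration.
fn sum_with_pointers(data: &[u32]) -> u32 {
    let start: *const u32 = data.as_ptr();
    let mut total: u32 = 0;
    for i in 0..data.len() {
        // SAFETY: `i < data.len()`, so `start.add(i)` points at a live element
        // of the slice and is properly aligned for `u32`.
        let value = unsafe { *start.add(i) };
        total = total.wrapping_add(value);
    }
    total
}
```

Changing the loop bound to 0..=data.len() would read one element past the end of the slice, which is the kind of out-of-bounds access described above.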
What has our team done?
Our team used Kani, a formal verification tool, to check the safety of raw pointer arithmetic in Rust. Our achievements are:
- We implemented and verified function contracts for 16 pointer operations, such as add(), sub(), and offset().
- We validated these contracts using Kani proofs across five different pointee types: integers, slices, unit, composite types, and dynamic trait objects.
This post details our approach, highlights the implementation process, and discusses the challenges encountered while ensuring the safety of pointer arithmetic in Rust.
Challenge Overview
The challenge focuses on formally verifying raw pointer arithmetic functions in Rust’s standard library. It is structured into two parts:
- Safety of pointer arithmetic functions: All unsafe functions provided in the challenge (e.g., offset, byte_add, offset_from, etc.) must be annotated with safety contracts, which must be formally verified.
  - The verification must be done for the following pointee types:
    - All integer types
    - At least one dyn Trait
    - At least one slice
    - For unit type
    - At least one composite type with multiple non-ZST fields.
- Safety of usages: Some functions that utilize raw pointer arithmetic methods in their implementation must be proven safe.
Any proofs written for these functions must ensure the absence of the following undefined behaviors:
- Accessing dangling or misaligned pointers
- Invoking undefined behavior via compiler intrinsics
- Producing an invalid value, even in private fields and locals.
- Performing a place projection that violates the requirements of in-bounds pointer arithmetic.
Implementation
Approach
The implementation addressed the two parts of the challenge as follows:
- Verification of Pointer Arithmetic Functions: First, we identified offset and offset_from as foundational, since many other pointer arithmetic functions are built on them. We then specified safety contracts for the raw pointer arithmetic functions (offset, offset_from, etc.) and formally verified them. These contracts capture preconditions and postconditions that rule out undefined behavior. Harnesses were written to verify the contracts for five distinct pointee types, as required by the challenge.
- Verification of Usages: The goal was to ensure that the contracts for these methods were sufficient and that the usage of pointer arithmetic in other functions adhered to the defined safety guarantees. Documentation and analysis of function behavior were essential in defining the input space and verifying that the functions operated safely across all expected inputs.
Function Analysis
The analysis began with a thorough review of the functions listed for verification. This involved examining their implementation, studying the official Rust documentation, and understanding their safety requirements. This process was instrumental in identifying potential sources of undefined behavior and clarifying their intended usage. These insights were critical for defining precise and robust safety contracts.
Among the functions analyzed, two stood out as foundational, offset and offset_from, due to their central role in enabling other pointer operations:
- offset(self, count: isize) -> *const T: Adds a signed offset to a pointer. The offset is specified by the argument count, expressed in units of T; e.g., a count of 3 represents a pointer offset of 3 * size_of::<T>() bytes.
- offset_from(self, origin: *const T) -> isize: Calculates the distance between two pointers. The returned value is in units of T: the distance in bytes divided by mem::size_of::<T>().
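To make the units concrete, here is a minimal sketch on a four-element u32 array. The helper name offset_distances is an assumption made for this example.

```rust
/// Returns (distance in bytes, distance in elements) between the first and
/// the fourth element of a small array. Hypothetical helper for illustration.
fn offset_distances() -> (usize, isize) {
    let data = [10u32, 20, 30, 40];
    let base: *const u32 = data.as_ptr();
    // SAFETY: index 3 is inside the 4-element array.
    let fourth = unsafe { base.offset(3) };
    // Distance measured in bytes, using the raw addresses.
    let bytes = fourth as usize - base as usize;
    // SAFETY: both pointers are derived from `data` and are a whole number
    // of `u32` elements apart.
    let elements = unsafe { fourth.offset_from(base) };
    (bytes, elements)
}
```

A count of 3 moves the pointer by 3 * size_of::<u32>() bytes, and offset_from recovers the count of 3 from the two pointers.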
Functions like add and byte_offset_from rely heavily on these operations. For instance:
- add internally calls offset to increment a pointer by a certain offset.
- byte_offset_from casts both pointers to pointers to u8 before invoking offset_from.
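The relationship between these functions can be observed from the outside. The sketch below only compares results on one concrete array and says nothing about how the standard library is implemented internally. The helper name derived_ops_agree is hypothetical, and byte_offset_from assumes a reasonably recent stable toolchain.

```rust
/// Checks, on one concrete array, that `add` agrees with `offset` and that
/// `byte_offset_from` agrees with `offset_from` on `u8`-cast pointers.
/// Hypothetical helper for illustration.
fn derived_ops_agree() -> bool {
    let data = [1u16, 2, 3, 4, 5];
    let base: *const u16 = data.as_ptr();
    // SAFETY: index 4 is inside the 5-element array for both calls.
    let via_add = unsafe { base.add(4) };
    let via_offset = unsafe { base.offset(4) };
    // SAFETY: both pointers are derived from `data`.
    let bytes_direct = unsafe { via_add.byte_offset_from(base) };
    // SAFETY: same pointers, viewed as byte pointers.
    let bytes_via_u8 = unsafe { via_add.cast::<u8>().offset_from(base.cast::<u8>()) };
    // Four `u16` elements are 8 bytes apart.
    via_add == via_offset && bytes_direct == bytes_via_u8 && bytes_direct == 8
}
```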
By starting with the safety of offset and offset_from, the verification effort targeted the fundamental components that underpin many related functions. This approach made it easier to define and extend contracts and proofs to dependent functions.
Function Contracts
The preconditions and postconditions for the functions were primarily derived from the safety requirements outlined in the official Rust documentation. For example, the safety requirements stated in the documentation for the offset function are as follows:
- The offset in bytes, count * size_of::<T>(), computed on mathematical integers (without “wrapping around”), must fit in an isize.
- If the computed offset is non-zero, then self must be derived from a pointer to some allocated object, and the entire memory range between self and the result must be in bounds of that allocated object. In particular, this range must not “wrap around” the edge of the address space.
Preconditions
The above safety requirements lead to the following preconditions:
- If T is a zero-sized type, i.e., size_of::<T>() == 0, then the computed offset (count * size_of::<T>()) will always be 0. Thus, both safety checks are inherently satisfied, and no additional validations are required.
- For non-zero-sized types:
  - The product of count and size_of::<T>() must not overflow isize (Safety Requirement #1).
  - Adding the computed offset (count * size_of::<T>()) to the original pointer (self) must not cause overflow (Safety Requirement #1).
  - Both the original pointer (self) and the result of self.wrapping_offset(count) must point to the same allocated object (Safety Requirement #2). To support reasoning about provenance of two pointers, the same_allocation API was introduced in Kani. This is discussed in detail in the Challenges section.
These preconditions can be translated into code using the #[requires] attribute as follows:
#[requires(
    // Precondition 1
    (core::mem::size_of::<T>() == 0) ||
    // Precondition 2.1
    (count.checked_mul(core::mem::size_of::<T>() as isize)
        // Precondition 2.2
        .map_or(false, |computed_offset| (self as isize).checked_add(computed_offset).is_some()) &&
    // Precondition 2.3
    core::ub_checks::same_allocation(self, self.wrapping_offset(count)))
)]
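To see how the arithmetic part of this precondition behaves on concrete numbers, the following plain Rust sketch mirrors the zero-size shortcut and the two overflow checks. It deliberately leaves out the same_allocation check, which needs a verifier. The function name offset_arithmetic_ok and its parameters are assumptions made for this example.

```rust
/// Models the overflow checks of the `offset` precondition on plain integers.
/// `addr` stands in for `self as isize` and `elem_size` for `size_of::<T>()`.
/// Hypothetical helper; it does not model the same-allocation requirement.
fn offset_arithmetic_ok(addr: isize, count: isize, elem_size: usize) -> bool {
    // Zero-sized pointee: the byte offset is always 0.
    if elem_size == 0 {
        return true;
    }
    // The byte offset must fit in an `isize` ...
    match count.checked_mul(elem_size as isize) {
        // ... and adding it to the address must not overflow either.
        Some(byte_offset) => addr.checked_add(byte_offset).is_some(),
        None => false,
    }
}
```

For example, a count of isize::MAX with a 4-byte pointee is rejected because the multiplication overflows, while the same count with a zero-sized pointee is accepted.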
Postconditions
Based on the safety requirements and the function behavior, the following postconditions can be specified:
- If the computed offset is 0, the resulting pointer will point to the same address as the original pointer (self).
- Otherwise, the resulting pointer will point to an address within the bounds of the allocated object from which the original pointer (self) was derived.
These postconditions can be translated into code using the #[ensures] attribute as follows:
#[ensures(|result|
    // Postcondition 1
    (self.addr() == (*result).addr()) ||
    // Postcondition 2
    core::ub_checks::same_allocation(self, *result as *const T)
)]
These preconditions and postconditions align with the safety requirements specified above. Writing contracts for offset first helped us lay a foundation for verifying other pointer arithmetic functions such as add, sub, and others.
Harnesses
Harnesses were written to validate the function contracts for various pointee types. Kani uses these harnesses to formally verify that the contracts hold for every input the harness can produce, rather than for a handful of hand-picked test cases.
Each harness is designed to check the contracts of a specific function. To achieve this, it follows two primary steps:
- Generate the input arguments non-deterministically: The inputs are created to represent various valid and edge-case scenarios without explicitly hardcoding them.
- Invoke the function with these arguments: The function under test is called using the generated inputs, allowing Kani to evaluate whether the preconditions and postconditions hold for all possible inputs.
Example Proof for the offset Function
#[kani::proof_for_contract(<*const u8>::offset)]
pub fn check_const_add_i8() {
    // 200 bytes are large enough to cover all pointee types used for testing
    const BUF_SIZE: usize = 200;
    let mut generator = kani::PointerGenerator::<BUF_SIZE>::new();
    let test_ptr: *const u8 = generator.any_in_bounds().ptr;
    let count: isize = kani::any();
    unsafe {
        test_ptr.offset(count);
    }
}
This harness validates the offset function for pointers of type *const u8. It ensures that the function adheres to the safety contracts defined in the previous section. The <*const T>::offset function accepts two arguments: a pointer (*const T) and an offset (isize). The proof generates these non-deterministically as follows:
- The count variable is of type isize, which implements the Arbitrary trait, so a non-deterministic value can be generated using kani::any().
- kani::PointerGenerator is used to create a pointer test_ptr that is guaranteed to lie within the bounds of the allocated buffer.
The offset function is called in an unsafe block with the generated test_ptr and count.
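Kani explores the input space symbolically. As a rough and much weaker analog in plain Rust, the sketch below enumerates every (start, count) pair for a tiny buffer, skips the pairs that would violate the in-bounds precondition, and asserts the postcondition for the rest. It is not Kani code, and the name exhaustive_offset_check and the buffer length are assumptions chosen for this example.

```rust
/// Brute-force analog of a contract harness for `offset` on a tiny buffer.
/// Returns how many (start, count) pairs satisfied the precondition.
/// Hypothetical helper; Kani does this symbolically for all inputs.
fn exhaustive_offset_check() -> usize {
    const LEN: isize = 8;
    let buf = [0u16; LEN as usize];
    let base: *const u16 = buf.as_ptr();
    let lo = base as usize;
    let hi = lo + std::mem::size_of_val(&buf);
    let mut checked = 0;
    for start in 0..=LEN {
        for count in -LEN..=LEN {
            let target = start + count;
            // "Precondition": the result must stay inside the buffer
            // (one past the end is allowed).
            if target < 0 || target > LEN {
                continue;
            }
            // SAFETY: `start` and `target` are both in 0..=LEN, so every
            // intermediate pointer is in bounds or one past the end.
            let result = unsafe { base.offset(start).offset(count) };
            // "Postcondition": the result lies within the buffer's address range.
            let addr = result as usize;
            assert!(lo <= addr && addr <= hi);
            checked += 1;
        }
    }
    checked
}
```

Enumeration like this only scales to toy input spaces, which is why the harness above relies on kani::any() and kani::PointerGenerator instead.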
Kani’s pointer generator can also create pointers with different allocation statuses, such as out-of-bounds, dangling, or null pointers. This was particularly useful in writing harnesses for offset_from, which requires testing pointers with varied allocation statuses (see here).
However, the PointerGenerator API only supports generating pointers whose pointee types implement the Arbitrary trait. In other words, any *const T can be generated as long as T implements Arbitrary and the generator’s buffer is large enough for T. Pointers with integer (*const u32) or tuple (*const (u16, bool)) pointee types can be generated, but pointers with slice (*const [T]) or dyn Trait pointee types cannot. To test slice pointers, one can generate a non-deterministic slice from an array and derive a pointer from it, as shown below:
let arr: [u32; 8] = kani::Arbitrary::any_array();
let slice: &[u32] = kani::slice::any_slice_of_array(&arr);
let ptr: *const [u32] = slice;
Currently, the Arbitrary trait is not implemented for pointers, so there is no way to non-deterministically generate pointers that cover the entire address space and different allocation statuses. An issue has been created to track this.
Panicking Proofs
Sometimes, negative verification is necessary. The #[kani::should_panic] attribute (https://model-checking.github.io/kani/reference/attributes.html#kanishould_panic) can be used to specify that a proof harness is expected to panic.
For instance:
// Proof for unit size will panic as offset_from needs the pointee size to be greater than 0
#[kani::proof_for_contract(<*const ()>::offset_from)]
#[kani::should_panic]
pub fn check_const_offset_from_unit() {
    let val: () = ();
    let src_ptr: *const () = &val;
    let dest_ptr: *const () = &val;
    unsafe {
        dest_ptr.offset_from(src_ptr);
    }
}
The offset_from function being verified in this harness panics if the pointee is a ZST. Since the unit type () is a ZST, the harness is expected to panic. The #[kani::should_panic] attribute ensures this behavior is correctly tested.
Verifying Usages
Vec::swap_remove
The Vec::swap_remove method removes the element at a specified index and replaces it with the last element in the vector. The Kani proof ensures:
- The vector’s length decreases by one after the operation.
- The removed element matches the original element at the specified index.
- If the removed index is not the last, the index now contains the last element of the original vector.
- All other elements remain unaffected.
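The four properties can be written down as ordinary Rust checks on a concrete vector. This is only a sketch of what the properties mean, not the Kani proof itself, and the helper name check_swap_remove is hypothetical.

```rust
/// Checks the four `swap_remove` properties for one concrete input.
/// Requires a non-empty `original` and `index < original.len()`.
/// Hypothetical helper for illustration.
fn check_swap_remove(original: &[i32], index: usize) -> bool {
    let mut v = original.to_vec();
    let last = *original.last().expect("input must be non-empty");
    let removed = v.swap_remove(index);

    // 1. The length decreases by one.
    let len_ok = v.len() == original.len() - 1;
    // 2. The removed element is the one originally at `index`.
    let removed_ok = removed == original[index];
    // 3. Unless the last element was removed, `index` now holds the old last element.
    let replaced_ok = index == original.len() - 1 || v[index] == last;
    // 4. Every other remaining element is unchanged.
    let others_ok = (0..v.len())
        .filter(|&i| i != index)
        .all(|i| v[i] == original[i]);

    len_ok && removed_ok && replaced_ok && others_ok
}
```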
Option::as_slice
The Option::as_slice method returns a slice over the contents of an Option: a one-element slice if the Option is Some, and an empty slice if it is None. The proof would validate:
- The result is a valid slice if the Option contains a value.
- The length of the resulting slice matches the contents of the Option: one for Some, zero for None.
VecDeque::swap
The VecDeque::swap method swaps two elements at specified indices in a VecDeque. The Kani proof verifies:
- The elements at the specified indices are swapped correctly.
- All other elements in the VecDeque remain unchanged.
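As with swap_remove, both properties can be stated as plain Rust checks on a concrete input. The helper name check_deque_swap is an assumption made for this sketch, and it is not the Kani proof.

```rust
use std::collections::VecDeque;

/// Checks the two `VecDeque::swap` properties for one concrete input.
/// Requires `i` and `j` to be valid indices into `items`.
/// Hypothetical helper for illustration.
fn check_deque_swap(items: &[i32], i: usize, j: usize) -> bool {
    let mut dq: VecDeque<i32> = items.iter().copied().collect();
    dq.swap(i, j);

    // 1. The two selected elements have traded places.
    let swapped_ok = dq[i] == items[j] && dq[j] == items[i];
    // 2. Every other element is unchanged.
    let others_ok = (0..items.len())
        .filter(|&k| k != i && k != j)
        .all(|k| dq[k] == items[k]);

    swapped_ok && others_ok
}
```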
Summary of Usage Proofs
These proofs ensure the following:
- For Vec::swap_remove, the vector maintains integrity during element removals and replacements.
- For Option::as_slice, the resulting slice is valid and matches the contained data.
- For VecDeque::swap, swapped and unaffected elements are verified for correctness.
Challenges
Throughout the project, we faced several technical challenges related to Rust’s specification, Kani’s limitations, and the complex nature of pointer arithmetic operations. Below are some of the key challenges and how we addressed them or how they could be addressed in the future.
1. Ensuring Pointer Stays Within Allocation Bounds
Overview
One of the critical requirements when verifying Rust’s pointer arithmetic operations is ensuring that the pointers involved in operations like add, sub, offset, and offset_from stay within the same memory allocation as the original pointer. This is essential for memory safety, as pointers crossing allocation boundaries can result in undefined behavior.
Why This Problem Matters
When pointer arithmetic crosses allocation boundaries, Rust’s guarantees about pointer provenance and memory safety no longer hold. This could allow pointers to access memory outside of their intended region, potentially leading to security vulnerabilities or crashes. Verifying that arithmetic stays within the same allocation is crucial for upholding Rust’s safety guarantees.
Example Issue:
Consider the following simplified function that performs pointer arithmetic:
unsafe fn add_offset<T>(ptr: *const T, count: usize) -> *const T {
    ptr.add(count)
}
Without explicit verification, there is no guarantee that the resulting pointer ptr.add(count) remains within the same allocation as ptr. If count is not checked against the size of the allocation that ptr points into, the result can be an out-of-bounds pointer.
The Solution: kani::mem::same_allocation
The kani::mem::same_allocation API was introduced to make it easier to ensure that pointer arithmetic stays within the same allocation. This API provides a simple and clear way to check if two pointers belong to the same memory allocation. It works with both sized and unsized types (e.g., slices, dyn Trait). An example usage is as follows:
kani::mem::same_allocation(ptr1, ptr2);
The introduction of kani::mem::same_allocation significantly simplified contract verification for pointer arithmetic functions. Instead of relying on custom assertions and manual tracking of allocation bounds, we could now define function contracts in a clean and expressive way.
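Outside of a verifier, the same requirement can only be approximated when the allocation’s size is known. The sketch below is a runtime analog for slices: it checks the target index before calling add, so the unsafe call’s in-bounds requirement is met by construction. The name add_within is hypothetical, and this is not how same_allocation is implemented in Kani.

```rust
/// Returns a pointer `count` elements past `start` within `slice`, or `None`
/// if the result would leave the slice (one past the end is allowed).
/// Hypothetical helper for illustration.
fn add_within<T>(slice: &[T], start: usize, count: usize) -> Option<*const T> {
    // Reject index arithmetic that overflows `usize`.
    let target = start.checked_add(count)?;
    if target > slice.len() {
        return None;
    }
    // SAFETY: `target <= slice.len()`, so the result is in bounds of the
    // slice's allocation or exactly one past its end.
    Some(unsafe { slice.as_ptr().add(target) })
}
```

A contract verified by Kani expresses the same condition statically, and without needing to know which slice the pointer came from.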
2. Determining the Necessity of Pointer Alignment in Function Contracts
Overview
When verifying pointer arithmetic functions like add, sub, offset, and offset_from, a key question arises:
Do the input and output pointers need to be aligned for these operations to be valid?
Initially, it seemed logical to require alignment checks in our function contracts. This assumption stemmed from Rust’s strict alignment rules for dereferencing pointers. However, through our verification efforts, we discovered that alignment is not a necessary requirement for pointer arithmetic itself. This insight was unexpected, as it is not explicitly stated in Rust’s documentation for pointer operations, and it significantly influenced how we structured our function contracts and defined preconditions.
Example
To illustrate this matter, consider the following example:
let vec: Vec<u32> = vec![1, 2, 3];
let ptr: *const u32 = vec.as_ptr();
let ptr_unaligned = ptr.wrapping_byte_offset(1);
unsafe { ptr_unaligned.add(1); }
Here, ptr_unaligned is not aligned for type u32, but the add operation is valid since it doesn’t dereference the pointer.
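The distinction between arithmetic and access can be shown in ordinary Rust. In the sketch below, a deliberately misaligned u32 pointer is advanced with add, and the value is then read with read_unaligned, since a plain dereference would require alignment. The helper name read_past_unaligned is an assumption made for this example.

```rust
/// Builds a misaligned `*const u32` into `words`, advances it by one element,
/// and reads the four bytes found there. Hypothetical helper for illustration.
fn read_past_unaligned(words: &[u32; 3]) -> u32 {
    let base: *const u8 = words.as_ptr().cast();
    // SAFETY: byte 1 is inside the 12-byte array.
    let unaligned: *const u32 = unsafe { base.add(1) }.cast();
    // The array is 4-byte aligned, so one byte further is not.
    assert!(unaligned as usize % std::mem::align_of::<u32>() != 0);
    // SAFETY: the result points at byte 5, which is in bounds. `add` has no
    // alignment requirement because it does not access memory.
    let next = unsafe { unaligned.add(1) };
    // SAFETY: bytes 5..9 are in bounds, and `read_unaligned` tolerates
    // misaligned pointers (a plain `*next` would not).
    unsafe { next.read_unaligned() }
}
```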
Impact on Verification
Once we realized that alignment checks were unnecessary, we refactored the function contracts by removing alignment checks from both preconditions and postconditions. It is now the caller’s responsibility to ensure pointer alignment when dereferencing raw pointers.
3. Handling Function Contract Stubbing with Pointer Return Types
Overview
One key benefit of function contracts is their reusability. Once verified, they can stub (replace) the underlying function in other proofs, significantly reducing verification complexity. For a detailed explanation of how this works, refer to this blog post.
However, Kani’s current function contract stubbing mechanism cannot correctly handle functions that return pointers. The issue arises during the function replacement process, where Kani uses kani::any() to generate a non-deterministic value representing the function’s return value. Since Kani does not support generating non-deterministic pointers (the Arbitrary trait is not implemented for pointers), this leads to a compilation error.
Example Issue
When we attempt to stub a pointer-returning function, the following code fails to compile:
#[kani::proof_for_contract(<*mut T>::offset)]
fn offset_proof() {
    let test_ptr: *mut T = kani::any();
    unsafe { test_ptr.offset(1); }
}
Error message:
Error: the trait `Arbitrary` is not implemented for `*mut T`
Solution: Currently, there is no clean workaround for this issue. The temporary solution is to avoid using Kani’s stubbing mechanism for functions with pointer return types. We raised this issue with the Kani development team, and they are tracking it for future support. Related Issue: Kani Issue #3732