Challenge 21: Verify the safety of substring-related functions in str::pattern
- Status: Open
- Tracking Issue: #278
- Start date: 2025-03-07
- End date: 2025-10-17
- Reward: 25000 USD
Goal
Verify the safety of StrSearcher implementation in str::pattern.
Motivation
String and str types are widely used in Rust programs, so it is important that their associated functions do not cause undefined behavior.
Description
The following str library functions are generic over the Pattern trait (https://doc.rust-lang.org/std/str/pattern/trait.Pattern.html):
containsstarts_withends_withfindrfindsplitsplit_inclusiversplitsplit_terminatorrsplit_terminatorsplitnrsplitnsplit_oncersplit_oncermatchesmatch_indicesrmatch_indicestrim_matchestrim_start_matchesstrip_prefixstrip_suffixtrim_end_matchesThese functions accept a pattern as input, then call into_searcher to create a Searcher for the pattern. They use thisSearcherto perform their desired operations (split, find, etc.). Those functions are implemented in (library/core/src/str/mod.rs), but the core of them are the searching algorithms which are implemented in (library/core/src/str/pattern.rs).
Assumptions
Important note: for this challenge, you can assume:
- The safety and functional correctness of all functions in
slicemodule. - That all functions in (library/core/src/str/validations.rs) are functionally correct (consistent with the UTF8 encoding description in https://en.wikipedia.org/wiki/UTF-8).
- That all the Searchers in (library/core/src/str/iter.rs) are created by the into_searcher(_, haystack) with haystack is a valid utf8 string (str). You can assume any utf8 string property of haystack.
Verify the safety of the functions in (library/core/src/str/pattern.rs) listed in the next section.
The safety properties we are targeting are:
- There is no UB happens when calling the functions after the Searcher is created.
- The impls of unsafe traits
SearcherandReverseSearchersatisfies the SAFETY condition stated in the file:
/// The trait is marked unsafe because the indices returned by the
/// [`next()`][Searcher::next] methods are required to lie on valid utf8
/// boundaries in the haystack. This enables consumers of this trait to
/// slice the haystack without additional runtime checks.
This property should hold for next_back() of ReverseSearcher too.
Success Criteria
Verify the safety of the following StrSearcher functions in (library/core/src/str/pattern.rs):
nextnext_matchnext_backnext_match_backnext_rejectnext_back_reject
The verification is considered successful if you can specify a condition (a "type invariant") C and prove that:
- If the
StrSearcheris created from any valid UTF-8 haystack, it satisfiesC. - If the
StrSearchersatisfiesC, it ensures the two safety properties mentioned in the previous section. - If the
StrSearchersatisfiesC, after it calls any function above and gets modified, it still satisfiesC.
Furthermore, you must prove the absence of undefined behaviors listed in the next section.
The verification must be unbounded---it must hold for inputs of arbitrary size.
List of UBs
All proofs must automatically ensure the absence of the following undefined behaviors ref:
- Accessing (loading from or storing to) a place that is dangling or based on a misaligned pointer.
- Reading from uninitialized memory except for padding or unions.
- Mutating immutable bytes.
- Producing an invalid value
Note: All solutions to verification challenges need to satisfy the criteria established in the challenge book in addition to the ones listed above.