# (Experimental) Testing with a Large Number of Repositories
This section explains how to run Kani on a large number of crates downloaded from git forges. You may want to do this to test Kani's ability to handle Rust features found in projects out in the wild.

The first half explains how to use data from crates.io to pick targets; the second half explains how to use a script to run Kani on a list of selected repositories.
## Picking Repositories
When picking repositories, you may want to select by metrics like popularity or by the presence of certain features. In this section, we will explain how to select the top repositories by download count.
We will use the db-dump method of getting data from crates.io, as it is zero-cost to their website and gives us SQL access. To start, have the following programs set up on your computer:
- docker
- docker-compose
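
A quick, optional way to confirm both tools are installed (output will vary by version):

```bash
docker --version
docker-compose --version
```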
- Start PostgreSQL. Paste the following YAML into a file named `docker-compose.yaml`. The `version: '3.3'` line may need to change to match your docker-compose installation.
  ```yaml
  version: '3.3'
  services:
    db:
      image: postgres:latest
      restart: always
      environment:
        - POSTGRES_USER=postgres
        - POSTGRES_PASSWORD=postgres
      volumes:
        - crates-data:/var/lib/postgresql/data
      logging:
        driver: "json-file"
        options:
          max-size: "50m"
  volumes:
    crates-data:
      driver: local
  ```
  Then, run the following to start the setup:

  ```bash
  docker-compose up -d
  ```
  Once set up, run `docker ps` to figure out the container's name. We will refer to the name as `$CONTAINER_NAME` from now on.
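
  If you would rather capture the name programmatically, a minimal sketch (assuming the `postgres:latest` container from the compose file above is the only one running from that image) is:

  ```bash
  CONTAINER_NAME=$(docker ps --filter ancestor=postgres:latest --format '{{.Names}}')
  ```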
- Download the actual data from crates.io. First, run the following command to get a shell in the container:

  ```bash
  docker exec -it --user postgres $CONTAINER_NAME bash
  ```

  Now, run the following to fetch the dump and import it into the database. Please note that this may take a while.

  ```bash
  wget https://static.crates.io/db-dump.tar.gz
  tar -xf db-dump.tar.gz
  psql postgres -f */schema.sql
  psql postgres -f */import.sql
  ```
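
  As a sanity check (our own suggestion, not part of the import itself), you can confirm the `crates` table was populated:

  ```bash
  psql postgres -c 'SELECT count(*) FROM crates;'
  ```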
- Extract the data. In the same docker shell, run the following to extract the top 1k repositories. Other SQL queries may be used if you want different criteria; a hypothetical variation is sketched after this block.

  ```
  \copy (SELECT name, repository, downloads FROM crates WHERE repository LIKE 'http%' ORDER BY downloads DESC LIMIT 1000) to 'top-1k.csv' csv header;
  ```
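
  For example, assuming you wanted a smaller list restricted to crates whose names match a pattern (the pattern and output file name here are purely illustrative), the same `\copy` form works with a modified query:

  ```
  \copy (SELECT name, repository, downloads FROM crates WHERE repository LIKE 'http%' AND name LIKE 'serde%' ORDER BY downloads DESC LIMIT 100) to 'serde-crates.csv' csv header;
  ```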
- Clean the data. The above query will capture duplicate URLs, as well as paths that point deeper than the repository root. You can clean these out with the following steps; a combined pipeline is sketched after this list.
  - Extract the URLs from the CSV: `cat top-1k.csv | awk -F ',' '{ print $2 }' | grep 'http.*'` (the `grep` drops the CSV header row)
  - Remove paths deeper than the repository root: `sed 's/tree\/master.*$//g'`
  - Once processed, you can dedup with `sort | uniq`
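
  Putting these steps together, a minimal sketch of the full cleaning pipeline (the output file name `url-list.txt` is our own choice) could be:

  ```bash
  cat top-1k.csv \
    | awk -F ',' '{ print $2 }' \
    | grep 'http.*' \
    | sed 's/tree\/master.*$//g' \
    | sort | uniq > url-list.txt
  ```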
## Running the List of Repositories
In this step, we will download the repositories in the list and run Kani on them using the script `assess-scan-on-repos.sh`.
Make sure to have Kani ready to run. For that, see the build instructions.
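
As a quick smoke test that Kani is built and available (the exact invocation may differ depending on how you built or installed it), something like this should print a version number:

```bash
cargo kani --version
```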
From the repository root, you can run the script with

```bash
./scripts/exps/assess-scan-on-repos.sh $URL_LIST_FILE
```

where `$URL_LIST_FILE` points to a line-delimited list of URLs you want to run Kani on. Repositories that give warnings or errors can be grepped for with "STDERR Warnings" and "Error exit in" respectively.