(Experimental) Testing with a Large Number of Repositories

This section explains how to run Kani on a large number of crates downloaded from git forges. You may want to do this if you are going to test Kani's ability to handle Rust features found in projects out in the wild.

The first half explains how to use data from crates.io to pick targets; the second half explains how to use a script to run Kani on the selected repositories.

Picking Repositories

When picking repositories, you may want to select by metrics like popularity or by the presence of certain features. In this section, we will explain how to select the top repositories by download count.

We will use the db-dump method of getting data from crates.io, since it imposes zero cost on their website and gives us SQL access. To start, make sure the following programs are set up on your computer.

  • docker
  • docker-compose
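
A quick way to confirm both are available before starting (a minimal check):

    docker --version
    docker-compose --version
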
  1. Start PostgreSQL. Save the following YAML as docker-compose.yaml; the version: '3.3' line may need changing to match your Docker Compose installation.
version: '3.3'
services:
  db:
    image: postgres:latest
    restart: always
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
    volumes:
      - crates-data:/var/lib/postgresql/data
    logging:
      driver: "json-file"
      options:
        max-size: "50m"
volumes:
  crates-data:
    driver: local

Then, run the following to start the setup.

docker-compose up -d

Once it is up, run docker ps to find the container's name. We will refer to this name as $CONTAINER_NAME from now on.
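
If other containers are running, you can capture the name into a shell variable instead of copying it by hand. This is a sketch that assumes the service is named db, as in the compose file above:

    # Grab the first running container whose name contains "db".
    CONTAINER_NAME=$(docker ps --filter 'name=db' --format '{{.Names}}' | head -n 1)
    echo "$CONTAINER_NAME"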

  2. Download the actual data from crates.io. First, run the following command to get a shell in the container: docker exec -it --user postgres $CONTAINER_NAME bash. Then run the following to fetch the dump and import it into the database. Please note that this may take a while.

    wget https://static.crates.io/db-dump.tar.gz
    tar -xf db-dump.tar.gz
    psql postgres -f */schema.sql
    psql postgres -f */import.sql
    
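Optionally, you can sanity-check the import before moving on; counting the rows of the crates table (in the same shell, against the default postgres database used above) should return a large number:

    psql postgres -c 'SELECT COUNT(*) FROM crates;'
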
  3. Extract the data. In the same Docker shell, run the following to extract the top 1,000 repositories; note that psql's \copy meta-command must be given on a single line. Other SQL queries may be used if you want different selection criteria.

    \copy (SELECT name, repository, downloads FROM crates WHERE repository LIKE 'http%' ORDER BY downloads DESC LIMIT 1000) to 'top-1k.csv' csv header;
    
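Note that \copy writes top-1k.csv on the client side, i.e. inside the container, relative to the directory psql was started from. To work on the file from the host, copy it out; the path below is an assumption, so adjust it to wherever the file actually landed:

    # Copy the CSV out of the container to the host's current directory.
    docker cp "$CONTAINER_NAME":/var/lib/postgresql/top-1k.csv .
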
  4. Clean the data. The above query will capture duplicates and paths that are deeper than the repository root. You can clean these out as follows; a combined pipeline is sketched after this list.

    • Extract the URL column from the CSV: cat top-1k.csv | awk -F ',' '{ print $2 }' | grep '^http'
    • Remove paths deeper than the repository root: sed 's/tree\/master.*$//g'
    • Once processed, you can deduplicate with sort | uniq
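
Putting the pieces together, one possible cleanup pipeline looks like this (a sketch; url-list.txt is a file name chosen here for the next section, not something the tooling expects):

    cat top-1k.csv |
      awk -F ',' '{ print $2 }' |   # take the repository URL column
      grep '^http' |                # drop the CSV header and non-URL fields
      sed 's/tree\/master.*$//g' |  # strip paths below the repository root
      sort | uniq > url-list.txt    # deduplicate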

Running the List of Repositories

In this step, we will use the script assess-scan-on-repos.sh to download the repositories in the list and run Kani on them.

Make sure to have Kani ready to run. For that, see the build instructions.

From the repository root, you can run the script with ./scripts/exps/assess-scan-on-repos.sh $URL_LIST_FILE, where $URL_LIST_FILE points to a line-delimited list of URLs you want to run Kani on. Repositories that give warnings or errors can be found by grepping the output for "STDERR Warnings" and "Error exit in", respectively.
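
For example, a minimal end-to-end invocation with output filtering might look like this (assess.log is a name assumed here, not something the script produces):

    ./scripts/exps/assess-scan-on-repos.sh url-list.txt | tee assess.log
    grep 'STDERR Warnings' assess.log   # repositories that produced warnings
    grep 'Error exit in' assess.log     # repositories where the run failed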