benchcomp command line

benchcomp is a single command that runs benchmarks, parses their results, combines these results, and emits visualizations. benchcomp also provides subcommands to run these steps individually. Most users will want to invoke benchcomp in one of two ways:

  • benchcomp without any subcommands, which runs the entire performance comparison process as depicted below
  • benchcomp visualize, which runs the visualization step on the results of a previous benchmark run without actually re-running the benchmarks. This is useful when tweaking the parameters of a visualization, for example, changing the threshold of what counts as a regression.
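For example, assuming a benchcomp.yaml in the current directory:

# Run the whole comparison: run the suites, collate their
# results, and emit the visualizations
benchcomp

# Re-emit the visualizations from the existing result.yaml,
# without re-running any benchmarks
benchcomp visualize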

The subcommands run and collate are also available. The diagram below depicts benchcomp's order of operations.

[Diagram: benchcomp's pipeline. Under benchcomp run, suite_1 is run with variants a and b, and suite_2 with variants c and d; each run's output files are parsed (by suite_1_parser.py or suite_2_parser.py) into suite_1a.yaml, suite_1b.yaml, suite_2c.yaml, and suite_2d.yaml. Under benchcomp collate, these files are combined into result.yaml. Under benchcomp visualize, result.yaml is turned into graph.svg, summary.md, and an exit-1-on-regression check.]

Running benchcomp invokes run, collate, and visualize behind the scenes. If you have previously run benchcomp, then running benchcomp visualize will emit the visualizations specified in the config file, using the previous result.yaml.
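A plain benchcomp invocation is therefore roughly equivalent to the following sequence (a sketch of the data flow rather than of exact flag-for-flag behavior):

benchcomp run        # run every suite under every configured variant
benchcomp collate    # combine the per-suite YAML files into result.yaml
benchcomp visualize  # read result.yaml and emit the configured visualizations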

In the diagram above, two different suites (1 and 2) are each run using two variants: combinations of command, working directory, and environment variables. Benchmark suite 2 requires a completely different command line from suite 1; for example, suite_1 might contain Kani harnesses invoked through cargo kani, while suite_2 might contain CBMC harnesses invoked through run_cbmc_proofs.py. Users would therefore define different variants (c and d) for invoking suite_2, and also specify a different parser to parse its results. No matter how different the benchmark suites are, the collate stage combines their results so that they can later be compared.
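Conceptually, running suite_1 under variant_b from the configuration below amounts to something like the following shell session (a sketch; in practice benchcomp also captures the output files so that the parser can read them):

cd kani_benchmarks
KANI_SOLVER=cadical scripts/kani-perf.sh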

Example config file

Users specify the suites to run, the parsers used to collect their results, and the visualizations to emit in a file called benchcomp.yaml, or in a file passed to the -c/--config flag. The next section describes the schema of this configuration file. A run similar to the one depicted above might be achieved using the following configuration file:

# Compare a range of Kani and CBMC benchmarks when
# using Cadical versus the default SAT solver

variants:
  variant_a:
    config:
      directory: kani_benchmarks
      command_line: scripts/kani-perf.sh
      env: {}

  variant_b:
    config:
      directory: kani_benchmarks
      command_line: scripts/kani-perf.sh
      # This variant uses a hypothetical environment variable that
      # forces Kani to use the cadical SAT solver
      env:
        KANI_SOLVER: cadical

  variant_c:
    config:
      directory: cbmc_benchmarks
      command_line: run_cbmc_proofs.py
      env: {}

  variant_d:
    config:
      directory: cbmc_benchmarks
      command_line: run_cbmc_proofs.py
      env:
        EXTERNAL_SAT_SOLVER: cadical

run:
  suites:
    suite_1:
      parser:
        module: kani_perf
      variants: [variant_a, variant_b]

    suite_2:
      parser:
        module: cbmc_litani_parser
      variants: [variant_c, variant_d]

visualize:
  - type: dump_graph
    out_file: graph.svg

  - type: dump_markdown_results_table
    out_file: summary.md
    extra_columns: []

  - type: error_on_regression
    variant_pairs:
    - [variant_a, variant_b]
    - [variant_c, variant_d]
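With this file saved as benchcomp.yaml, a complete comparison might look like the following (a sketch; graph.svg and summary.md are the out_file names configured above):

benchcomp        # run all four suite/variant combinations, collate, and visualize
echo $?          # non-zero if error_on_regression detected a regression
cat summary.md   # the markdown results table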