benchcomp configuration file

benchcomp's operation is controlled through a YAML file, either benchcomp.yaml by default or a file passed to the -c/--config option. This page describes how to configure variants and filters, and lists the different visualizations that are available.


Variants

A variant is a single invocation of a benchmark suite. Benchcomp runs several variants so that their performance can be compared later. A variant consists of a command line, a working directory, and an environment. Benchcomp invokes the command using the operating system environment, updated with the keys and values in env. If any values in env contain strings of the form ${var}, Benchcomp expands them to the value of the environment variable $var.

            command_line: echo "Hello, world"
            directory: /tmp
            env:
              PATH: /my/local/directory:${PATH}
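The environment handling described above can be sketched in Python. This is only an illustration, not benchcomp's own code: the function names build_env and run_variant are hypothetical, running the command through the shell is an assumption, and so is the choice to expand an unset variable to an empty string.

```python
import os
import re
import subprocess

# Matches strings of the form ${var}
_VAR = re.compile(r"\$\{(\w+)\}")


def build_env(overrides, base=None):
    """Return a copy of `base` updated with `overrides`, expanding ${var}.

    `base` defaults to the operating system environment. Assumption: a
    variable that is not set in `base` expands to the empty string.
    """
    base = dict(os.environ if base is None else base)
    expanded = {
        key: _VAR.sub(lambda m: base.get(m.group(1), ""), str(value))
        for key, value in overrides.items()
    }
    return {**base, **expanded}


def run_variant(config):
    """Run one variant given a dict with command_line, directory and env."""
    return subprocess.run(
        config["command_line"],
        shell=True,  # assumption: the command line is interpreted by a shell
        cwd=config.get("directory"),
        env=build_env(config.get("env", {})),
        check=False,
    )
```

With the sample values above, build_env({"PATH": "/my/local/directory:${PATH}"}) would return the current environment with /my/local/directory prepended to PATH.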


Filters

After benchcomp has finished parsing the results, it writes them to results.yaml by default. Before visualizing the results (see below), benchcomp can filter them by piping them into an external program.

To filter results before visualizing them, add filters to the configuration file.

filters:
    - command_line: ./scripts/
    - command_line: cat

The value of filters is a list of dicts. Currently the only legal key for each of the dicts is command_line. Benchcomp invokes each command_line in order, passing the results as a JSON file on stdin, and interprets the stdout as a YAML-formatted modified set of results. Filter scripts can emit either YAML (which might be more readable while developing the script), or JSON (which benchcomp will parse as a subset of YAML).
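A filter can be any program that follows this stdin/stdout contract. The sketch below is a hypothetical filter written in Python. It makes no assumption about the layout of the results document: it only rounds every floating-point value it finds and emits JSON, which benchcomp would then parse as YAML. The names round_floats and main are illustrative.

```python
import json
import sys


def round_floats(node, ndigits=2):
    """Recursively round every float in a JSON-like structure."""
    if isinstance(node, float):
        return round(node, ndigits)
    if isinstance(node, dict):
        return {key: round_floats(value, ndigits) for key, value in node.items()}
    if isinstance(node, list):
        return [round_floats(value, ndigits) for value in node]
    return node  # ints, bools, strings and None pass through unchanged


def main(in_stream=sys.stdin, out_stream=sys.stdout):
    """Read results as JSON, round the floats, and write JSON back out."""
    results = json.load(in_stream)
    json.dump(round_floats(results), out_stream, indent=2)
    out_stream.write("\n")
```

To use this as a filter, the script would end with a call to main() and be listed as a command_line entry. Because every filter reads and writes the same kind of document, several of them can be chained in the order they are listed.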

Built-in visualizations

The following visualizations are available: dump_markdown_results_table, dump_yaml, error_on_regression, and run_command. These can be added to the visualize list of benchcomp.yaml.

Detailed documentation for these visualizations follows.




dump_markdown_results_table

Print Markdown-formatted tables displaying benchmark results.

For each metric, this visualization prints out a table of benchmarks, showing the value of the metric for each variant, combined with an optional scatterplot.

The 'out_file' key is mandatory; specify '-' to print to stdout.

'extra_columns' can be an empty dict. The sample configuration below assumes that each benchmark result has a 'success' and a 'runtime' metric for both variants, 'variant_1' and 'variant_2'. It adds a 'ratio' column to the table for the 'runtime' metric, and a 'change' column to the table for the 'success' metric. The 'text' lambda is called once for each benchmark. It accepts a single argument, a dict that maps variant names to the value of that variant for a particular metric, and returns a string that is rendered in the benchmark's row in the new column. This allows you to emit arbitrary text or markdown formatting in response to particular combinations of values for different variants, such as regressions or performance improvements.

'scatterplot' takes the values 'off' (default), 'linear' (linearly scaled axes), or 'log' (logarithmically scaled axes).

Sample configuration:

- type: dump_markdown_results_table
  out_file: "-"
  scatterplot: linear
  extra_columns:
    runtime:
    - column_name: ratio
      text: >
        lambda b: str(b["variant_2"]/b["variant_1"])
        if b["variant_2"] < (1.5 * b["variant_1"])
        else "**" + str(b["variant_2"]/b["variant_1"]) + "**"
    success:
    - column_name: change
      text: >
        lambda b: "" if b["variant_2"] == b["variant_1"]
        else "newly passing" if b["variant_2"]
        else "regressed"

Example output:

## runtime

| Benchmark |  variant_1 | variant_2 | ratio |
| --- | --- | --- | --- |
| bench_1 | 5 | 10 | **2.0** |
| bench_2 | 10 | 5 | 0.5 |

## success

| Benchmark |  variant_1 | variant_2 | change |
| --- | --- | --- | --- |
| bench_1 | True | True |  |
| bench_2 | True | False | regressed |
| bench_3 | False | True | newly passing |
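To make the behavior of the 'text' lambdas concrete, the following Python sketch applies the two lambdas from the sample configuration to the values shown in the example tables. It is an illustration only: the helper render_column is hypothetical, and using eval to turn the lambda source into a callable is an assumption about one way to do it, not a description of benchcomp's internals.

```python
# Lambda source strings as they read once YAML has folded each
# multi-line scalar from the sample configuration into a single line.
RATIO = (
    'lambda b: str(b["variant_2"]/b["variant_1"]) '
    'if b["variant_2"] < (1.5 * b["variant_1"]) '
    'else "**" + str(b["variant_2"]/b["variant_1"]) + "**"'
)
CHANGE = (
    'lambda b: "" if b["variant_2"] == b["variant_1"] '
    'else "newly passing" if b["variant_2"] '
    'else "regressed"'
)


def render_column(text_source, rows):
    """Evaluate a 'text' lambda and call it once per benchmark.

    `rows` maps a benchmark name to a dict that maps variant names to
    the value of one metric.
    """
    text = eval(text_source)  # assumption: only evaluate trusted configuration
    return {bench: text(values) for bench, values in rows.items()}


# Values taken from the example output tables.
runtime = {
    "bench_1": {"variant_1": 5, "variant_2": 10},
    "bench_2": {"variant_1": 10, "variant_2": 5},
}
success = {
    "bench_1": {"variant_1": True, "variant_2": True},
    "bench_2": {"variant_1": True, "variant_2": False},
    "bench_3": {"variant_1": False, "variant_2": True},
}

ratio_column = render_column(RATIO, runtime)
change_column = render_column(CHANGE, success)
```

Here ratio_column maps bench_1 to "**2.0**" and bench_2 to "0.5", and change_column maps bench_2 to "regressed" and bench_3 to "newly passing", matching the extra columns in the tables.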


dump_yaml

Print the YAML-formatted results to a file.

The 'out_file' key is mandatory; specify '-' to print to stdout.

Sample configuration:

- type: dump_yaml
  out_file: '-'


error_on_regression

Terminate benchcomp with a return code of 1 if any benchmark regressed.

This visualization checks whether any benchmark regressed from one variant to another. Sample configuration:

- type: error_on_regression
  variant_pairs:
  - [variant_1, variant_2]
  - [variant_1, variant_3]
  checks:
  - metric: runtime
    test: "lambda old, new: new / old > 1.1"
  - metric: passed
    test: "lambda old, new: False if not old else not new"

This says to check whether any benchmark regressed when run under variant_2 compared to variant_1. A benchmark is considered to have regressed if the value of the 'runtime' metric under variant_2 is more than 10% higher than the value under variant_1. The benchmark is also considered to have regressed if it was previously passing but is now failing. The same checks are performed on all benchmarks run under variant_3 compared to variant_1. If any of those lambda functions returns True, benchcomp terminates with a return code of 1.
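The checking logic can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not benchcomp's implementation: the function find_regressions is hypothetical, the results are made-up numbers, and the layout of RESULTS (benchmark, then variant, then metric) is chosen for the example rather than taken from benchcomp's result format.

```python
# The variant pairs and tests from the sample configuration.
VARIANT_PAIRS = [("variant_1", "variant_2"), ("variant_1", "variant_3")]
CHECKS = [
    {"metric": "runtime", "test": "lambda old, new: new / old > 1.1"},
    {"metric": "passed", "test": "lambda old, new: False if not old else not new"},
]


def find_regressions(results, variant_pairs=VARIANT_PAIRS, checks=CHECKS):
    """Return (benchmark, old, new, metric) for every test that returns True."""
    found = []
    for bench, variants in results.items():
        for old_name, new_name in variant_pairs:
            for check in checks:
                test = eval(check["test"])  # assumption: trusted configuration
                old = variants[old_name][check["metric"]]
                new = variants[new_name][check["metric"]]
                if test(old, new):
                    found.append((bench, old_name, new_name, check["metric"]))
    return found


# Made-up results in a hypothetical benchmark -> variant -> metric layout.
RESULTS = {
    "bench_1": {
        "variant_1": {"runtime": 10.0, "passed": True},
        "variant_2": {"runtime": 10.5, "passed": True},
        "variant_3": {"runtime": 12.0, "passed": True},
    },
    "bench_2": {
        "variant_1": {"runtime": 4.0, "passed": True},
        "variant_2": {"runtime": 4.0, "passed": False},
        "variant_3": {"runtime": 3.0, "passed": True},
    },
}

regressions = find_regressions(RESULTS)
exit_code = 1 if regressions else 0
```

In this made-up data, bench_1 is 5% slower under variant_2, which stays within the threshold, but 20% slower under variant_3, which does not. bench_2 stops passing under variant_2. Either finding alone is enough to produce a return code of 1.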


run_command

Run an executable command, passing the performance metrics as JSON on stdin.

This allows you to write your own visualization, which reads a result file on stdin and does something with it, e.g. writing out a graph or other output file.

Sample configuration:

- type: run_command
  command: ./
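As an illustration of what such a command might look like, here is a hypothetical Python script body that reads the JSON document from stdin and prints one line per leaf value. It deliberately assumes nothing about how the results are organized. The names flatten and main are illustrative.

```python
import json
import sys


def flatten(node, prefix=""):
    """Yield (path, value) for every leaf of a nested JSON document."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten(value, f"{prefix}[{index}]")
    else:
        yield prefix, node


def main(in_stream=sys.stdin, out_stream=sys.stdout):
    """Print each leaf of the results as a tab-separated path and value."""
    for path, value in flatten(json.load(in_stream)):
        out_stream.write(f"{path}\t{value}\n")
```

A script built this way would end with a call to main(). The same structure works for a custom visualization that writes a graph or a report file instead of text.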