In this article, I will present an overview of benchmarking SAT solvers in Go. We will take a look at gini and gophersat on randomly selected SAT competition problems from previous years.
The SAT problem is the canonical problem for the easiest of the intractable complexity classes, the famous class of NP-complete problems. NP-complete problems are characterised as problems for which verifying whether or not a candidate solution is correct is efficient, but for which exponentially many candidate solutions exist, and no efficient way of finding a correct one is known.
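For concreteness, the competition problems (and the .cnf files used below) are propositional formulas in the standard DIMACS CNF format. Here is a tiny satisfiable instance, written out with a here-document just for illustration (the file name tiny.cnf is not from the suites):
# (x1 OR NOT x2) AND (x2 OR x3): 3 variables, 2 clauses, each clause ends in 0
cat > tiny.cnf <<'EOF'
p cnf 3 2
1 -2 0
2 3 0
EOF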
The Go tool suite comes with many performance aids, such as profiling and micro-benchmarks. However, we have found that these tools are, perhaps with good reason [1], inappropriate for benchmarking hard combinatorial problems; they serve well to benchmark a unit test or library call, but they do not attempt to measure the bottom line for solvers of hard problems.
These notions are completely different. In the solver paradigm, we are focused on how many problems out of a large set can be solved in a given time. In the micro-benchmarking paradigm, we are interested in the throughput of a benchmarked subroutine. Moreover, for competition problems, a majority of the problems will be sufficiently difficult to challenge a majority of the competitors, so the problems are notably harder than many application domain problems.
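For contrast, the per-call measurements the built-in Go tooling is organised around look roughly like this (BenchmarkSolve and the package path are placeholders, not names from either solver):
go test -bench=. -benchmem ./...
go test -bench=BenchmarkSolve -cpuprofile=cpu.out ./solver
These report nanoseconds and allocations per call, which says little about how many competition instances a solver will finish within a time budget.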
Gini comes with a tool for benchmarking hard combinatorial problems, called bench. Hard combinatorial problems are generally solved using heuristics and so they have enormous variance in runtime when applied to different problems.
Here is its usage:
bench -h
bench <cmd> [options] arg arg ...
<cmd> may be
sel
run
cmp
For help with a command, run bench <cmd> -h.
bench can be used to select benchmarks, to run different solvers on those benchmarks, and to compare the results.
We have collected 3 sets of SAT solver benchmarks using “bench sel” and placed them at https://github.com/go-air/satbenchmarks.
For example, the repository https://github.com/irifrance/suites contains some randomly selected SAT competition problems from previous years.
We cloned this repository and then used bench to randomly select 35 problems from it.
git clone https://github.com/irifrance/suites
bench sel -n 35 -name bench.hard -pattern "*.cnf*" suites
Similarly, we created a suite of 25 easy SAT problems in bench.easy and 100 medium problems in bench.medium.
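The selection commands were presumably of the same form as the one above, for example (a sketch only; the exact options may have differed):
bench sel -n 25 -name bench.easy -pattern "*.cnf*" suites
bench sel -n 100 -name bench.medium -pattern "*.cnf*" suites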
The bench tool runs and records the results of a benchmark for a benchmark suite.
bench run -h
run [runoptions] suite [ suite [ suite ... ] ]
run runs commands on benchmark suites enforcing timeouts and
recording results.
-cmd string
command to run on each instance
-commit string
commit id of command
-d string
delete the run
-desc string
description of run.
-dur duration
max per-instance duration (default 5s)
-gdur duration
max run duration (default 1h0m0s)
-name string
name of the run (default "run")
The bench tool is a bit tailored to SAT solvers, which exit with code 10 on satisfiable and code 20 on unsatisfiable, so that benchmarking tools don’t need to parse different outputs. The “cmd” flag expects a command which follows this (unconventional by unix standards) convention. Gini comes with a flag which implements this.
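For a solver that prints the usual “s SATISFIABLE” / “s UNSATISFIABLE” lines but does not use these exit codes, a small adapter script is enough. A sketch, with some-solver standing in for the real binary:
#!/bin/sh
# adapter sketch: translate the standard "s ..." result line into the
# exit codes bench expects (10 for satisfiable, 20 for unsatisfiable)
out=`some-solver "$@"`
echo "$out"
case "$out" in
  *"s UNSATISFIABLE"*) exit 20 ;;
  *"s SATISFIABLE"*)   exit 10 ;;
  *)                   exit 0  ;;
esac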
The “dur” flag gives a runtime duration per-instance. The “gdur” flag gives a total timeout, which can be useful if one expects a large proportion of a suite to be solved in a given time and does not want the global duration to equal the number of instances times the per-instance duration.
Gini was run with ‘bench run -cmd “gini -satcomp -timeout 3m”’. The timeout passed to -cmd needs to be greater than the per-instance timeout that bench imposes.
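So a full run on, say, the hard suite would have been along these lines (the run name matches the “gini.1m” label in the results below):
bench run -name gini.1m -dur 1m -cmd "gini -satcomp -timeout 3m" bench.hard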
We took Gophersat at commit 72b19f5 and modified it as follows to fit the SAT competition convention:
diff --git a/main.go b/main.go
index f08b30c..806662f 100644
--- a/main.go
+++ b/main.go
@@ -66,7 +66,21 @@ func main() {
} else if count {
countModels(pb, verbose)
} else {
- solve(pb, verbose, cert, printFn)
+ _ = printFn
+ solve(pb, verbose, cert, func(c chan solver.Result) {
+ res, ok := <-c
+ if !ok {
+ os.Exit(0)
+ }
+ switch res.Status {
+ case solver.Sat:
+ os.Exit(10)
+ case solver.Unsat:
+ os.Exit(20)
+ default:
+ os.Exit(0)
+ }
+ })
We used the following wrapper script to take care of the compressed .bz2 inputs:
#!/bin/sh
# decompress the .bz2 input to a temporary .cnf file and hand it to gophersat
f=`mktemp`
bzip2 -d -c "$@" > "$f.cnf"
exec gophersat "$f.cnf"
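With the wrapper saved as, say, gophersat.sh, the runs mirror the gini ones, for example:
bench run -name gophersat.1m -dur 1m -cmd ./gophersat.sh bench.hard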
The bench tool facilitates exploring results with the cmp verb:
bench cmp -h
cmp [cmp options] suite
-cactus
cactus plot
-list
list all instances in all runs.
-runs string
comma separated list of run globs (default "*")
-sat
list only sat instances in runs.
-scatter
scatter plot of run pairs
-sum
suite summary. (default true)
-unsat
list only unsat instances in runs.
Let’s look at our suites.
scott@pavillion benchblog % bench cmp bench.easy
Suite bench.easy
-----------------------------------------------------------------------------------------------------------
| Run | solved | sat | unsat | unknown | time | utime | stime |
-----------------------------------------------------------------------------------------------------------
| gini.1m | 25 | 6 | 19 | 0 | 60.97 s | 59.90 s | 0.00 s |
-----------------------------------------------------------------------------------------------------------
| gophersat.1m | 25 | 6 | 19 | 0 | 50.12 s | 49.73 s | 0.00 s |
-----------------------------------------------------------------------------------------------------------
Let’s take a look at a scatter plot (in the terminal :)
scott@pavillion benchblog % bench cmp -scatter bench.easy
Suite bench.easy
-----------------------------------------------------------------------------------------------------------
| Run | solved | sat | unsat | unknown | time | utime | stime |
-----------------------------------------------------------------------------------------------------------
| gini.1m | 25 | 6 | 19 | 0 | 60.97 s | 59.90 s | 0.00 s |
-----------------------------------------------------------------------------------------------------------
| gophersat.1m | 25 | 6 | 19 | 0 | 50.12 s | 49.73 s | 0.00 s |
-----------------------------------------------------------------------------------------------------------
bench.easy: gophersat.1m v gini.1m
★ /
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
★ /
/
/
/
/
/
/ ☆
/ ☆
/
/
/
★ /
/
★
★ / ☆
★
--------------------------------------------------------------------------------
★ - gophersat.1m wins
☆ - gini.1m wins
We can see that both sides had an outlier, but gophersat manages to finish earlier overall than gini.
scott@pavillion benchblog % bench cmp bench.medium
Suite bench.medium
-----------------------------------------------------------------------------------------------------------
| Run | solved | sat | unsat | unknown | time | utime | stime |
-----------------------------------------------------------------------------------------------------------
| gini.30s | 65 | 41 | 24 | 35 | 1353.65s | 1322.35s | 0.00 s |
-----------------------------------------------------------------------------------------------------------
| gophersat.30s | 55 | 32 | 23 | 45 | 1890.92s | 1593.96s | 0.00 s |
-----------------------------------------------------------------------------------------------------------
Let’s look at a scatter plot.
scott@pavillion benchblog % bench cmp -scatter bench.medium
Suite bench.medium
-----------------------------------------------------------------------------------------------------------
| Run | solved | sat | unsat | unknown | time | utime | stime |
-----------------------------------------------------------------------------------------------------------
| gini.30s | 65 | 41 | 24 | 35 | 1353.65s | 1322.35s | 0.00 s |
-----------------------------------------------------------------------------------------------------------
| gophersat.30s | 55 | 32 | 23 | 45 | 1890.92s | 1593.96s | 0.00 s |
-----------------------------------------------------------------------------------------------------------
bench.medium: gophersat.30s v gini.30s
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
/
★ ☆ ☆
/
/ ☆ ☆
★ ★ ☆ ☆
★ ☆ ☆ ☆
--------------------------------------------------------------------------------
★ - gophersat.30s wins
☆ - gini.30s wins
scott@pavillion benchblog % bench cmp bench.hard
Suite bench.hard
-----------------------------------------------------------------------------------------------------------
| Run | solved | sat | unsat | unknown | time | utime | stime |
-----------------------------------------------------------------------------------------------------------
| gini.1m | 7 | 3 | 4 | 28 | 1735.00s | 1719.91s | 0.00 s |
-----------------------------------------------------------------------------------------------------------
| gini.5s | 6 | 3 | 3 | 29 | 154.60 s | 149.90 s | 0.00 s |
-----------------------------------------------------------------------------------------------------------
| gophersat.1m | 7 | 3 | 4 | 28 | 1853.46s | 1821.00s | 0.00 s |
-----------------------------------------------------------------------------------------------------------
| gophersat.5s | 5 | 2 | 3 | 30 | 158.27 s | 157.01 s | 0.00 s |
-----------------------------------------------------------------------------------------------------------
Go has some SAT solvers which can solve some tough SAT problems. The benchmarks, as is often the case, don’t really show a huge trend: in some suites Gini does better, in others Gophersat does better. There is also a tradeoff with respect to the per-instance cutoff time.
It can be noted that in this data Gini has an edge on satisfiable (SAT) instances, which I (the author) believe is probably due to its unique (and unpublished) dynamic VSIDS heuristic.
Gophersat and gini are pretty complementary from an API perspective: Gophersat has more diverse command-line functionality in terms of different SAT-like formats and problems, while gini has a set of features designed to treat these problems, and others, by composing library usage. (It is also the basis of a model checker.)
Using the bench tool offers a way to address a very different flavor of performance problem than nanoseconds or allocations per op. This flavor of benchmarking is about maximising coverage over a large set of possible input problems, and it is characterised by highly variable and possibly long runtimes.
Benchmarking can give a requisite degree of confidence regarding problem difficulty in a particular application domain. Despite the fact that many problems time out here (as would probably be the case for any solver), SAT is used in industry, from Eclipse package management to Coverity false-path suppression to operator lifecycle management on Kubernetes. Real world problems are indeed within the reach of our SAT solvers; benchmarking can validate that.
[1] It is generally encouraged to avoid hard combinatorial problems where possible in software development. Such a practice guarantees that the resulting software is safe in the sense that it won’t take exponential time to complete some subtask.