Finding Minimal Unsatisfiable Subsets

Sometimes, when searching for solutions to constraint problems, no solution exists to the problem and the solver returns UNSAT. However, this bare answer is not especially informative on its own. We might like to have some idea why the problem is unsatisfiable, or have some likely candidates for constrats that could be relaxed to find satisfiable solutions.

One way to get more information out of unsatisfiable query results is to compute minimal unsatisfiable subsets of the original query. These are subsets of the assertions of the original query that are already unsatisfiable; and which are minimal, in the sense that removing any one assertion from the subset renders it satisfiable. The value of minimal unsatisfiable subsets (MUSs) is that they can be understood as an explanation for why the query is unsatisfiable. In good cases, a MUS may be significantly smaller than the original query, and thus provide significant value to a human operator trying to understand the query failure. Alternately, in a multi-stage constraint satisfaction setup, MUSs can be feed backward to prune search space in a previous search phase.

To allow the computation of MUSs in Grackle, we have implemented the CAMUS algorithm [1], which can be layered on top of any SAT or SMT solver that supports incremental solving. The CAMUS algorithm, invented by Liffiton and Sakallah [LS2008], it able to take an arbitrary collection of statements, the conjunction of which is unsatisfiable, and compute the set of all minimal unsatisfiable subsets of that collection. If desired, it can also be used to compute just one MUS, saving some work.

Compared to some techniques for computing unsatisfiable cores, the CAMUS method has the advantage that it works directly with the set of clauses as defined by the input, as opposed to, e.g., a set of clauses produced after conjunctive normal form (CNF) conversion. This means the unsatisfiable subsets returned are directly stated in terms of the inputs to the CAMUS algorithm. This makes it easy to map these unsatisfiable subsets back into semantically meaningful units, and also gives the caller significant control over the granularity of the unsatisfiable core search.

Note: In order to be feasible, the CAMUS algorithm requires incremental solver support. To make use of this ability, the CAMUS procedures in Grackle require the use of the “online” solver backend, which maintains a persistent connection to the Yices SMT solver. The solver backend may be selected from the command line, and must be set to the online backed, as the default is to use the noninteractive “offline” backend.

Example

Consider the following script. It establishes a number of symbolic integer variables, and posits a number of constraints on them (chosen arbitrarily for this example).

x = symbolic('x', 'integer');
y = symbolic('y', 'integer');
z = symbolic('z', 'integer');
w = symbolic('w', 'integer');

% (x,y) is a distinct point from (z,w)
background = not ( x == z & y == w );

clauses = ...
  [ x > z, y <= w, 10 < z, 15 > w, y == z, x + 4*w < 20, w == 10 ];

% Is the problem satisfiable?
[r s] = check_sat( background & all(clauses) );

if(r)
  % If so print the solution
  'Satisfiable'
  s
else
  % If not, find all the MUSs
  'Unsatisfiable'
  mus = find_all_mus( background, clauses )

  % Verify that each MUS is unsatisfiable
  for i = 1:size(mus,1)
    r = check_sat( background & all(clauses(mus(i,:))) );
    if(r)
      'Oops! MUS is actually satisfiable:'
      mus(i,:)
    end
  end
end

The constraints are broken into two parts: a collection of “background” constraints and a sequence of clauses. The CAMUS algorithm will treat statements in the background clause as fixed: it will consider only situations where the background constraints are all satisfied. The collection of statements in the “clauses” list should be considered as a conjunction of additional constraints. It is these clauses that CAMUS considers the non-divisible units when searching for unsatisfiable subsets.

This particular collection of constraints is unsatisfiable, as the first call to check_sat will tell us. Thus we call the find_all_mus function to compute the complete set of all minimal unsatisfiable subsets. For this particular constraint set, there happen to be three distinct minimal subsets. The output of this example script is shown below.

$ grackle --backend=online camus_ex.m

ans =

     Unsatisfiable

 mus =

     1      1      1      0      1      1      0
     0      1      1      0      1      0      1
     1      0      1      0      0      1      1

The output of find_all_mus is a matrix of boolean values. Each row in the matrix identifies a separate MUS. The elements in each row indicate if the corresponding clause is in the subset. Thus, the first MUS listed above consists of the following five clauses: x > z, y <= w, 10 < z, y == z and x + 4*w < 20. It is straightforward to use logical indexing to select this subset of clauses as clauses(mus(1,:)). The verification loop at the end of the script does exactly this to verify that each reported MUS is actually unsatisfiable.

If we wanted only a single MUS, we could instead call the find_mus function. Moreover, if we want to compute more than one MUS, but not necessarily all, we can call find_all_mus with an upper bound; then at most that many MUSs will be computed. For example, find_all_mus(background, clauses, 100 ) will compute at most 100 MUSs.

Notes on performance

The time required to compute MUSs can be highly unpredictable, but in the worst cases, it is very expensive. The algorithm works in two steps: in the first, the solver is repeatedly queried to discover a collection of maximally satisfiable sets; and in the second phase, this collection of maximally satisfiable subsets is examined to discover minimal unsatisfiable subsets. Both phases of this algorithm can take exponential time, in the worst case. Asking for all MUSs, in particular, can result in a number of answers exponential in the number of clauses. If the initial query takes an appreciable amount of time, calculating MUSs using CAMUS is unlikely to yield satisfactory results, as it involves repeatedly querying the solver with problems of approximately the same difficulty as the initial problem. Calculating a single MUS can be significantly cheaper than calculating all, or even a bounded number of MUSs.

To mitigate this problem, it can help to limit the number of clauses submitted to CAMUS, either by incorporating more fact into the background clause, or by combining clauses together into coarser (and thus, fewer) clauses.

[1]CAMUS is simultaneously a reference to the French author Albert Camus and also an acronym for Compute All Minimal Unsatisfiable Subsets.
[LS2008]“Algorithms for Computing All Minimal Unsatisfiable Subsets.” Mark H. Liffiton and Karem A. Sakallah. Journal of Automated Reasoning. 2008.