# October 10, 2008

## SAT and the quest for the perfect study subject…

This must be the oldest post in my drafts bucket. Its creation (the title, I mean, because the body has been empty up until today) dates from May 23rd, 2005.

It has always been my wish to write more about my research work, but for one reason or another I end up writing about the most uninteresting topics and failing to discuss what I think is one of the most interesting research topics out there (oh yeah, flame me…).

SAT is the short name for the SATisfiability problem (or Boolean Satisfiability Problem). The problem is: given a propositional formula… yes, one of those like ((x => y) /\ ~x)… is there an assignment to the variables of the formula such that it evaluates to true?

This is the problem. Simple, huh? Well, simple indeed to state, but not so easy to solve. Anyway, when there is such an assignment we say the formula is *sat* (for satisfiable), otherwise it is *unsat* (for unsatisfiable). In the example above the formula is *sat*, because assigning false to x and false to y makes the formula true. A trivially *unsat* formula would be (x /\ ~x).

You see, this was the first problem to be proved NP-Complete, by Cook in 1971. This is a very interesting class of problems, since every NP problem can be converted into SAT and solved using a SAT solver (a piece of software that solves the SAT problem). Sometimes such properties are interesting theoretical results with no practical application, but that is not the case for SAT. Many problems are indeed reduced to SAT in order to be solved by a SAT solver. This is because current state-of-the-art SAT solvers are so good that a problem which would have taken a near-infinite amount of time to solve a decade ago now takes a reasonable time to solve. We are, obviously, talking about problems where the number of variables amounts not to dozens, or even thousands, but to millions!

Understand that if you apply a trial-and-error search for an assignment in a formula with 1000000 (one million) variables and, in the worst case, the formula is *unsat*, you have to test 2^1000000 assignments (oh yeah, that’s a big number). However, it’s not uncommon for SAT solvers to prove the unsatisfiability of a formula that big.
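To make the brute-force idea concrete, here is a minimal sketch in Python. This is *not* how real solvers work (they prune the search with techniques like unit propagation and clause learning); it only illustrates the naive 2^n enumeration described above:

```python
from itertools import product

def brute_force_sat(formula, num_vars):
    """Try all 2^n assignments; `formula` is a function over a tuple of
    booleans. Returns a satisfying assignment, or None if unsat."""
    for assignment in product([False, True], repeat=num_vars):
        if formula(assignment):
            return assignment
    return None  # every assignment tested: the formula is unsat

# ((x => y) /\ ~x) from above: x => y is the same as (not x or y).
sat_example = lambda v: (not v[0] or v[1]) and not v[0]
print(brute_force_sat(sat_example, 2))   # x = False, y = False works

# (x /\ ~x) is trivially unsat.
unsat_example = lambda v: v[0] and not v[0]
print(brute_force_sat(unsat_example, 1))  # None
```

For a million variables the loop above would run 2^1000000 iterations in the worst case, which is exactly why the clever algorithms in modern solvers matter.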

It was by chance that I came to work on subjects related to SAT… it was also by chance that I came to work with my supervisor (Prof. Joao Marques-Silva), who has been among the few that know SAT solvers inside out for a very long time.

SAT is fascinating, not only because it is a problem like the 3n+1 problem (easy to state, hard to solve), but also because of the huge range of applications it has in both academic and industrial settings. Among a long list of applications, my favourites are:

- Model Checking, which includes Software and Hardware Verification
- Test Case Generation
- Haplotype Inference in Computational Biology
- Model Finding in Soft Formal Methods

Obviously, being such an interesting problem, several related problems exist which build on SAT:

- SMT – Can we solve the SAT problem on propositional logic + decidable theories?
- #SAT – How many solutions does a *sat* formula have?
- SAT Enumeration – Which solutions does a *sat* formula have?
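The last two problems can be illustrated with the same brute-force cartoon (purely illustrative; real #SAT solvers are far more sophisticated): enumerating every model of a formula answers SAT Enumeration, and counting them answers #SAT.

```python
from itertools import product

def all_models(formula, num_vars):
    """Enumerate every satisfying assignment of `formula`, given as a
    function over a tuple of booleans (SAT Enumeration). The length of
    the result answers the #SAT question."""
    return [v for v in product([False, True], repeat=num_vars)
            if formula(v)]

# (x \/ y): three of the four possible assignments satisfy it.
models = all_models(lambda v: v[0] or v[1], 2)
print(len(models))  # 3
```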

Besides the problems related to research and commercial application, programming is certainly a very important issue in SAT. Which data structures and algorithms to use, how to program it, which language? A yearly competition puts the best SAT solvers to the test, and it takes more than just reading a couple of papers to have the best solver. You need to be clever while programming, while choosing the data structures and the algorithms, all while still factoring in how extensible and usable you want it to be… How much more can you ask from a research subject?

# April 7, 2007

## SATSCheme… A new Project!

SATSCheme is a new project by myself and a colleague of mine to develop two SAT solvers, one in C and the other in PLT Scheme. I’ve set up a blog for the project. This way, if you’re interested in the developments, you just have to subscribe to the feed. From the Scheme perspective, it’ll be nice to see how Scheme’s GC will cope with the constant memory allocation usually performed by a SAT solver, and how easy it will be to implement. Moreover, it’ll be nice to compare it to the C version performance-wise and memory-wise. Other than that, the number of LOCs will also be an interesting analysis. Well, I’m sure there’ll be huge space for discussion, not only about the code itself but also about the SAT algorithms and structures. Hope you keep an eye on the page. I’ll also post major announcements around here.

# July 28, 2006

## Interconnected Scientific Domains

Today I was just browsing some interesting C++ articles (which will be the subject of a later post) and came face-to-face with the ACCU (Association of C and C++ Users) page. I went to see the benefits of membership, and not only the conferences but also the bi-monthly journals caught my attention (and the low membership fees).

The nice thing is that I was thinking to myself: “So many people in the SAT world program in C++; are they in any way connected to the more professional C++ programming community?” My immediate, intuitive answer was: no! Which made sense, since most of the time people in the research community are worried about publishing **the** paper, and as soon as it is sent for review, a long list of deadlines follows, which brings up a lot more work: more study, more developing, more benchmarks, more writing, more… researching! This is kind of an endless cycle, and no time is left in the middle for participating in out-of-the-study-subject community conferences.

Anyway, I went on to check where the conferences are, who the organizers are, and what kind of people speak at these conferences. Well, amazed I was to see a name: Oliver Kullmann, as a speaker at the latest ACCU’06. The name is familiar! The description confirms his identity:

Oliver Kullmann

I did my PhD at the University of Frankfurt (Germany) in Mathematics in 1997. From 1999 to 2001 I was a post-doc in the group of Stephen Cook at the University of Toronto (Canada). In 2001 then I was employed as a lecturer in the computer science department at the University of Wales Swansea, where I stayed since then. Like most researchers, I have given numerous conference talks, publish regularly in international journals and visit researchers at other places.

My research is centred around the understanding of hard computational problems, where “hard” means here something around exponential time, which is typical for most problems in hardware and software verification, in artificial intelligence and in planning and scheduling.

Especially I’m interested in the study of the so-called SAT problem, the problem of deciding whether a propositional formula (boolean variables, connected by “and”, “or” and “not”) is satisfiable or not. Several years ago I’ve written a SAT solver (which was successful at that time), and now for several years I’m developing a generative C++ library for SAT solving (supported by an EPSRC grant since February 2004).

A fellow researcher in SAT is a speaker at a C/C++ users conference. That’s great. Congrats to him!

The application of C/C++ in SAT is huge, since most people use it for performance-critical software like SAT solvers. The interaction of SAT researchers with the professional C/C++ programming community seems quite useful, since SAT researchers spend a great deal of time programming, most of the time in C/C++. So there’s no reason why they shouldn’t try to improve their programming skills and share their experiences with these communities. Way to go!

# July 3, 2006

## Parsing CNF with Scheme

CNF stands for Conjunctive Normal Form. In my particular case, I’m interested in it since I work in SAT, and since all SAT solvers use a CNF file as input, parsing it is a fundamental part of a SAT solver. SAT solvers **need** to be fast. A slow SAT solver is like an empty milk package: useless.

State-of-the-art SAT solvers have been using low-level languages due to optimization needs for a long time; most are written in C or C++ (MiniSAT, zChaff, siege, etc.). Daniel Le Berre and collaborators have, however, been developing SAT4J, a SAT solver written in Java which, against all odds, is competitive with other solvers.

CNF file parsing is essential for SAT solving, and although it is a trivial part of solving, it should be done in a split second. The syntax is quite simple and is defined by the DIMACS file format:

The first non-comment line of the ASCII .cnf file is:

`p cnf <num-variables> <num-clauses>`

where the first number is the number of variables and the second number is the number of clauses in the file. Comments span a whole line and start with the letter ‘c’.

The rest of the file is a list of numbers, which may be positive, negative, or zero. A positive number stands for the positive literal of a variable, a negative number for the negative literal of a variable, and zero terminates a clause.

For example, the formula:

`(1 or 2 or not 3) and (not 1 or 2) and (not 2)`

might be written as

```
c This is a comment of the example file.
p cnf 3 3
1 2 -3 0
-1 2 0
-2 0
```

I break the line after each zero for readability.
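As a cross-check of the format description, the example above can be read in a few lines of Python. This is only a sketch with no error handling (the Scheme versions below are the real subject of this post):

```python
def parse_dimacs(text):
    """Parse a DIMACS CNF string into ((num_vars, num_clauses), clauses)."""
    num_vars = num_clauses = 0
    clauses, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('c'):      # blank line or comment
            continue
        if line.startswith('p'):                  # problem line: p cnf V C
            _, _, v, c = line.split()
            num_vars, num_clauses = int(v), int(c)
            continue
        for token in line.split():                # literals; 0 ends a clause
            lit = int(token)
            if lit == 0:
                clauses.append(current)
                current = []
            else:
                current.append(lit)
    return (num_vars, num_clauses), clauses

example = """c This is a comment of the example file.
p cnf 3 3
1 2 -3 0
-1 2 0
-2 0"""
print(parse_dimacs(example))  # ((3, 3), [[1, 2, -3], [-1, 2], [-2]])
```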

I’ve implemented a very simple parser in PLT Scheme. My first version uses the SRFI libraries for string parsing, and the code is as follows:

```scheme
(module cnfreader mzscheme
  (require (prefix srfi13: (lib "13.ss" "srfi"))
           (prefix srfi1: (lib "1.ss" "srfi"))
           (lib "unitsig.ss")
           "cnfreader-sig.ss")

  (define cnfreader@
    (unit/sig cnfreader^
      (import)
      ;; Given a path string to a file, reads the file from disk and
      ;; returns two values: a list with the number of variables and the
      ;; number of clauses, and a list of lists where each inner list is
      ;; a clause and contains the literals in the clause.  clause-fn is
      ;; applied to all clauses and literal-fn is applied to all literals.
      (define (get-clauses-from-cnf path-str clause-fn literal-fn)
        (let ([fp (open-input-file path-str 'text)])
          (let loop ([line (read-line fp)]
                     [clauses '()]
                     [num-vars-clauses-list '()])
            (if (eof-object? line) ;; End of file?
                (begin
                  (close-input-port fp)
                  (values num-vars-clauses-list
                          (srfi1:map! clause-fn clauses)))
                ;; Line is a string, let's parse it!
                (let ([line-str (srfi13:string-trim line)])
                  (cond
                    ;; Empty string or comment.
                    [(or (srfi13:string-null? line-str)
                         (char=? (string-ref line-str 0) #\c))
                     (loop (read-line fp) clauses num-vars-clauses-list)]
                    ;; Problem line, starts with a p.
                    [(char=? (string-ref line-str 0) #\p)
                     (let ([tokens (srfi13:string-tokenize line-str)])
                       (if (not (= (length tokens) 4))
                           (error (format "CNF: error, line starting with p does not have 4 elements: ~a" line-str)))
                       (loop (read-line fp)
                             clauses
                             (srfi1:map! string->number (cdr (cdr tokens)))))]
                    [else
                     (let ([literals-list
                            (srfi1:map! literal-fn
                                        (cdr (srfi1:reverse!
                                              (srfi1:map! string->number
                                                          (srfi13:string-tokenize line-str)))))])
                       (loop (read-line fp)
                             (cons literals-list clauses)
                             num-vars-clauses-list))])))))))

  (provide cnfreader@))
```

This file is a unit which implements a reader function for a CNF file. It receives path-str, a string with the path to the CNF file; literal-fn, which is applied to every literal found; and clause-fn, which is applied to each list of literals. If you pass the identity function for both, you’ll get the clauses as lists of non-zero numbers (along with the number of variables and number of clauses as the first returned value).

I ran this code on every example from SATLIB and it performed very badly… I thought, “OK, it’s the string tokenizing; parser-tools will probably do better”… and I was right! Here’s the code of the same function using parser-tools:

```scheme
(module cnfreader2 mzscheme
  (require (lib "file.ss")
           (prefix srfi1: (lib "1.ss" "srfi"))
           (lib "unitsig.ss")
           "cnfreader-sig.ss"
           "cnf-parser.ss")

  (define cnfreader@
    (unit/sig cnfreader^
      (import)
      (define (get-clauses-from-cnf path-str clause-fn literal-fn)
        (call-with-input-file* path-str
          (lambda (fp)
            (let ([info/clauses-lst (cnf-parser (lambda () (cnf-lexer fp)))])
              (values (car info/clauses-lst)
                      (srfi1:map! (lambda (clause)
                                    (clause-fn (srfi1:map! literal-fn clause)))
                                  (cdr info/clauses-lst)))))))))

  (provide cnfreader@))
```

where cnf-parser is

```scheme
(module cnf-parser mzscheme
  (require (lib "yacc.ss" "parser-tools")
           (lib "lex.ss" "parser-tools")
           (prefix : (lib "lex-sre.ss" "parser-tools")))

  (define-tokens tok (NUM))
  (define-empty-tokens etok (EOF INITINFO ENDCLAUSE))

  (define-lex-abbrevs
    (integer (:: (:? #\-) (:/ #\1 #\9) (:* (:/ #\0 #\9))))
    (comment (:: (:? #\newline) #\c (:* (:~ #\newline)) #\newline))
    (white-space (:or #\newline #\return #\tab #\space #\vtab))
    (initinfo (:: #\p (:+ white-space) "cnf")))

  (define cnf-lexer
    (lexer
     [(eof) 'EOF]
     ;; Ignore comments and whitespace.
     [(:or comment white-space) (cnf-lexer input-port)]
     ;; A lone zero terminates a clause.
     [#\0 'ENDCLAUSE]
     [initinfo 'INITINFO]
     [integer (token-NUM (string->number lexeme))]
     [any-char (printf "None Matched!")]))

  (define cnf-parser
    (parser
     (start start)
     (end EOF)
     (tokens tok etok)
     (error (lambda (a b c)
              (printf "Token OK?: ~a~nToken Name: ~a~nToken Value: ~a~n" a b c)))
     (grammar
      (start [(INITINFO NUM NUM clause-list) (cons (list $2 $3) $4)])
      (clause-list [(clause) (list $1)]
                   [(clause-list clause) (cons $2 $1)])
      (clause [(literal-list ENDCLAUSE) $1])
      (literal-list [(NUM) (list $1)]
                    [(literal-list NUM) (cons $2 $1)]))))

  (provide cnf-parser cnf-lexer))
```

Nice, eh?

Well, I ran both on SATLIB and prepared some benchmarks (taken on the same CPU, running PLT Scheme 349, with code compiled with mzc) so I could see the difference between them. Here’s the graph of the accumulated time for the two parsers:

where the blue line is the SRFI-based parser and the reddish line is the parser-tools-based parser; the time is in milliseconds.

If you wish to have the complete benchmark file, with times for each SATLIB file, contact me. If you see that any optimization is possible, contact me too. :-)

Unfortunately, I have not yet benchmarked a C/C++ parser, but I will, and then I’ll post a comparison.

# June 20, 2006

## SMT for everyone…

**or** *SMT in 20 minutes*!

**The Beginning**

In fact, this is my attempt to create an SMT tutorial that everyone is able to grasp. The only requirements are basic knowledge of Mathematics and basic knowledge of the English language.

This tutorial will serve as an anchor for everyone who will, in the future, ask me what my research is all about. If you asked me that recently, that’s why I told you to come here. It will be brief and simple, but at least you’ll understand what I’m currently working on, from a non-specific point of view. From a specific point of view I’d have to write a tutorial on “*Architectural Design for Lazy SMT Multi-Theory Solvers in 2 years*” or something! (Yes, I tried to find a very complex title just for the fun of it…) :-)

**Propositional Logic**

OK, so: *yesterday I went to the beach* **and** *to the cinema*. Note that although yesterday I did go to the cinema, I did not go to the beach (which was a pity), so the sentence is **false**! For it to be true, I would have had to go both to the cinema and to the beach, i.e. **both** parts had to be true. This is intuitive, right?

If I had used an **or**, I would have said *tomorrow I will go to the beach* **or** *to the cinema*. If I happen to go to one, or both, of those places, the sentence is *true*; otherwise, i.e. if I go neither to the beach nor to the cinema tomorrow, it is *false*. Note that usually in everyday speech, when we say *this* **or** *that*, we mean one thing or the other but not both. In mathematics, this is not the case. An **or** is true whenever one or both things happen. The kind of **or** we usually use in everyday speech is known as **xor** (which stands for *exclusive or*).

Instead of using sentences, which have a *true* or *false* value, we can use letters as variables which have *true* or *false* values depending on their meaning. So, briefly, A **or** B is *false* if both A is *false* and B is *false*, and *true* otherwise. A **and** B is *true* if A is *true* and B is *true*, and *false* otherwise. Note that we can define the **and** in terms of the **or** and vice versa. We also have the negation (**not**) operator, **~**. **~**A is *false* if A is *true*, and *true* otherwise. We can define many other operators in terms of these, like implication (if this then that), equivalence (this if and only if that), exclusive or (this or that but not both), and so on. In practice, we can translate a sentence with a lot of operators into a sentence which maintains its truth value (its value of true or false) using only **and**s, **or**s and **not**s.
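The claim that **and** is definable from **or** and **not** (and vice versa; these are De Morgan’s laws) is easy to check exhaustively, since there are only four assignments; here is a quick Python check:

```python
from itertools import product

# De Morgan: A and B == not (not A or not B), for every assignment,
# and symmetrically for or.
for a, b in product([False, True], repeat=2):
    assert (a and b) == (not (not a or not b))
    assert (a or b) == (not (not a and not b))

# xor, the "or" of everyday speech, is or minus the both-true case.
for a, b in product([False, True], repeat=2):
    assert (a != b) == ((a or b) and not (a and b))

print("all identities hold")
```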

The truth value of a sentence usually depends on its variables:

(A **or** **~**(B **and** C)) cannot be said to be *true* or *false*. It depends on the values of the variables. If I tell you that A is *false*, B is *false* and C is *false*, then you can say that the sentence is *true*. Note that we do not always need to specify the values of all the variables in the sentence to know the sentence’s value. If I say that A is *true*, then no matter the values of B and C, the sentence is *true*. However, there are (evil) sentences which are *false* no matter what (kind’a like some people). For example, (A **and** **~**A). There are obviously also some which are *true* no matter what: (A **or** **~**A).

Now, imagine that I give you the formula:

**~**(A **or** (B **and** **~**A)) **and** **~**(**~**B **and** A)

Now, can you find values for the variables which make this sentence *true*, or is it always *false*?

(… trumpets …)

Ah, you’re trying to solve the SAT(isfiability) problem for that sentence. The formula is satisfiable if there are values for the variables in the sentence (which we usually call a formula) which make it true. If the formula is false no matter what, we say that the formula is unsatisfiable.
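For the curious, the little formula above can be settled by brute force; a quick Python check over all four assignments shows it is satisfiable:

```python
from itertools import product

def formula(A, B):
    # ~(A or (B and ~A)) and ~(~B and A)
    return (not (A or (B and not A))) and (not ((not B) and A))

models = [(A, B) for A, B in product([False, True], repeat=2) if formula(A, B)]
print(models)  # only A = False, B = False makes it true
```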

Who would imagine that this is one of the major problems in Computer Science? And it turns out to be pretty hard (which is fortunate since it turns out to employ a lot of colleagues of mine).

So, now, if someone tells you that they are working on SAT, you should say: “Ah, that’s nice!” instead of “Huh? What’s that?”, and imagine that same person, day after day, trying to find new ways to solve this problem as fast as possible for huge formulas. Humm, so, let me give you an idea… Imagine that I write a formula on one straight line on toilet paper, where each character is square and takes up the whole height of the paper. You would need about 180 km of toilet paper to write the whole formula… and that’s what I mean by huge.

Since a lot of development has been done in SAT recently, there are a lot of people translating their problems into SAT so that they can solve them using current efficient SAT solvers.

**Adding a Mathematical Component**

Problem is… there are problems which are not easily translated into SAT, since they have inherently mathematical constraints which are hard to translate to SAT under certain circumstances. So, people decided to extend the SAT problem to what is currently known as the SAT Modulo Theories (SMT) problem.

Now, in place of formulas like the ones in the last section, which are composed of variables which are true or false, you can also have mathematical constraints like 2x + 3y >= 4, or f(a) - f(b) = 2, or even a[2] > 5.

What happened is that we replaced the Boolean variables (variables which can only have true or false values) with more complex, but also more expressive, expressions which still have a Boolean value themselves. These expressions can belong to any decidable theory: linear arithmetic, uninterpreted functions and the theory of arrays, respectively (from the examples above).

So, the SMT formula syntax depends on the background theory (or theories) we are considering. If we consider Integer Linear Arithmetic, (x + 2y >= 3 and x < 3) or B is well-formed if x and y are integers, but it is not well-formed if we consider Integer Difference Logic.

Now, we can mix the theories and the expressions, generate arbitrarily large formulas to solve specific problems, and check their satisfiability as we did for SAT. In SAT, we would search for a solution and, if one existed, we could output a value for each of the variables which makes the formula true (satisfies the formula), and these values would be Boolean. Now, the values can be integer, real, Boolean, whatever.
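The usual “lazy” way of attacking such formulas can be sketched in miniature: abstract each theory atom into a Boolean variable, enumerate models of the Boolean skeleton, and ask a theory solver whether the chosen atoms are jointly consistent. The sketch below is only a cartoon of that idea under my own assumptions: the names are mine, and the “theory solver” is a toy brute-force search over small integers, not any real solver’s API. It uses the formula ((x + 2y >= 3) and (x < 3)) or B from above:

```python
from itertools import product

# Theory atoms, abstracted to Boolean variables p1, p2 (B stays Boolean).
atoms = {
    "p1": lambda x, y: x + 2 * y >= 3,
    "p2": lambda x, y: x < 3,
}

def skeleton(p1, p2, b):
    """The Boolean abstraction of ((x + 2y >= 3) and (x < 3)) or B."""
    return (p1 and p2) or b

def theory_consistent(assignment):
    """Toy theory solver: search small integers for x, y matching the
    chosen truth values of the arithmetic atoms."""
    for x, y in product(range(-5, 6), repeat=2):
        if all(atoms[name](x, y) == value for name, value in assignment.items()):
            return (x, y)
    return None

# Lazy SMT loop in miniature: propose a skeleton model, then check it.
for p1, p2, b in product([False, True], repeat=3):
    if skeleton(p1, p2, b):
        witness = theory_consistent({"p1": p1, "p2": p2})
        if witness is not None:
            print("sat:", {"p1": p1, "p2": p2, "B": b}, "with (x, y) =", witness)
            break
```

Real SMT solvers replace the outer enumeration with a SAT solver and the brute-force search with decision procedures for each theory, but the division of labour is the same.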

The requirement is that the main underlying logic of these formulas is a many-sorted version of first-order logic, which still leaves us the possibility of adding quantifiers, though I think there are still few who are working on quantified SMT formulas.

Guess what? You can easily translate a Sudoku problem to SMT (which can be done for SAT too, but not as easily or as intuitively)… you can also translate thousands of other (more practically interesting) problems to SMT. The translation, technically called an *encoding*, is also studied by some, since the way you encode the problem strongly influences how easy it’ll be to solve.
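To give a taste of encoding: the heart of a Sudoku-to-SAT translation is the “exactly one” constraint (each cell holds exactly one digit, each digit appears exactly once per row, and so on). A common, if naive, pairwise encoding into CNF clauses might look like this (the variable numbering and function are my own illustration, not a standard library):

```python
def exactly_one(variables):
    """CNF clauses forcing exactly one of the given DIMACS-style
    variables to be true: one at-least-one clause plus pairwise
    at-most-one clauses."""
    clauses = [list(variables)]                    # at least one is true
    for i in range(len(variables)):
        for j in range(i + 1, len(variables)):     # no two are both true
            clauses.append([-variables[i], -variables[j]])
    return clauses

# A 3-valued "cell" encoded with variables 1, 2, 3:
print(exactly_one([1, 2, 3]))
# [[1, 2, 3], [-1, -2], [-1, -3], [-2, -3]]
```

The pairwise scheme is quadratic in the number of variables; choosing a more compact encoding is exactly the kind of decision that influences how easy the resulting formula is to solve.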

Well, I think this is it for now… Hope you are SATisfied with this brief tutorial!

**Pointers:**

- SAT on MathWorld
- SAT on Wikipedia
- SAT Live! : Somewhere where you can find the latest news on SAT.
- SATLIB : Library of SAT benchmarks.
- Journal on Satisfiability
- SMTLIB : Library of SMT benchmarks.

# March 2, 2006

## ANN: Volume 2 of JSAT

I just received the announcement of the release of the second volume of JSAT from Hans van Maaren.

Dear all,

It is my pleasure to announce the release of the second volume of JSAT.

Further, IOS Press is going to publish hard copy issues of JSAT, starting soon, first with vol 2, followed by vol 1, no 1!

If you maintain a link to JSAT on one of your sites please use the URL above. (The URL used earlier will still work, but for reasons of “search” the above one is preferred.)

Hans van Maaren

# February 28, 2006

## ANN: SMT Competition 2006 rules are out!

The rules for the 2nd SMT Competition are out!

**What is SMT?**

SMT is an acronym for SAT Modulo Theories, where SAT is the well-known SATisfiability (NP-Complete) problem from computation. SMT uses SAT to solve formulas which may span multiple theories. Current logics range from Linear Integer Arithmetic (LIA), Linear Real Arithmetic (LRA), Integer Difference Logic (IDL) and Real Difference Logic (RDL) to Uninterpreted Functions (UF). More information can be found at the SMT-LIB site.

# September 6, 2005

## SAT & Sudoku

It seems SAT has been used once again … just for fun!!!

For those who do not know the SATisfiability problem: it is a very well-known Computer Science problem, which you might imagine simply as the problem of finding out whether a formula has a solution; and if there is at least one solution, you should find one, which we call a model. SAT traditionally works on propositional (Boolean) formulas, but it is not required to be that way.

Anyway, it seems someone decided it is time to finish off the publishing of those god-damn books which come with dozens of Sudoku problems… why solve a Sudoku problem if we can solve a problem that solves the Sudoku problem? And that problem is SAT… Check it out and play with it… it’s fun… and if, just by chance, you have a SAT solver, try it… you never thought your SAT solver could solve Sudoku, right???

(In theory… it can solve mostly anything… I say “mostly” because I can’t really see how it would be possible to save Portugal from a huge financial hole… even with a SAT solver at hand!)

# April 20, 2005

## Structured Lispy Syntax on SAT Competition

I am just amazed to see the benchmarks posted on the SMT-COMP’05 homepage! And you know why?

Well, take a look at a benchmark; don’t you just recognize the syntax? :D Of course… it’s Lisp. Parsing it with Lisp would be a walk in the park. However, mostly due to the fact that everyone thinks Lisp is not the fastest ‘language’… (and you can be sure we need the fastest in SMT-COMP) almost all the software is programmed in C or C++. Would this format provide an excuse for a team to use Lisp? I am curious about it.

Now, a question also comes to my mind. Even for those using C or C++, using Guile to parse the format and then passing the result to C would be really nice, I think. Oh well, let me just study this for the next few days and I’ll post more about it later…

**Note:** Note the quotes surrounding ‘language’ above. When I say most people think Lisp is not the fastest language, I mean most people think that current Lisp (Common Lisp, Scheme…) compilers do not generate the fastest executables.