July 3, 2006
Parsing CNF with Scheme
CNF stands for Conjunctive Normal Form. In my special case, I’m interested in it since I work in SAT and since all SAT solvers use as input a CNF file, its parsing is a fundamental part of a SAT solver. SAT solvers need to be fast. A slow SAT solver is the same as an empty milk package, useless.
State of the Art SAT solvers have been using low-level languages for due to optimization needs for a long time, most are written in C or C++ (MiniSAT, zChaff, siege, etc). Daniel LeBerre and collaborators have been developing, however, SAT4J, a SAT solver written in Java which against all odds is competitive with other solvers.
CNF file parsing, is essential for SAT solving and since it is vital, although a trivial part of solving should be done in a split second. The syntax is quite trivial and is defined in the DIMACS file format:
The first non-comment line of the ASCII .cnf file is:
where the first number is the number of clauses and the second number is the number of variables of the file. Comments start range through a line and start with the letter ‘c’.
The rest of the file is a list of numbers, which might be positive, negative or zero. A positive number stands for the positive literal of a variable. A negative number is the negative literal of a variable and zero separates different clauses.
For example, the formula:
(1 or 2 or not 3) and (not 1 or 2) and (not 2)
might be written as
c This is a comment of the example file.
p cnf 3 3
1 2 -3 0
-1 2 0
I change line after zero for readability.
I’ve implemented a very simple parser in PLT-Scheme. My first version uses the SRFI libraries for string parsing and the code is as follows:
(module cnfreader mzscheme (require (prefix srfi13: (lib "13.ss" "srfi")) (prefix srfi1: (lib "1.ss" "srfi")) (lib "unitsig.ss") "cnfreader-sig.ss") (define cnfreader@ (unit/sig cnfreader^ (import) ;; Given a path string to a file, it gets the file from ;; from the disk and returns two values: a list with number ;; of variables and number of clauses and a list of lists where each list ;; is a clause and contains the variables in the clause. ;; clause-fn is a function applied to all clauses and literal-fn is applied to all literals. (define (get-clauses-from-cnf path-str clause-fn literal-fn) (let ([fp (open-input-file path-str 'text)]) (let loop ([line (read-line fp)] [clauses ()] [num-vars-clauses-list ()]) (if (eof-object? line) ;; End of File??? (begin (close-input-port fp) (values num-vars-clauses-list (srfi1:map! clause-fn clauses))) (let ([line-str (srfi13:string-trim line)]) ;; Line is a string, let's parse it! (cond [(or (srfi13:string-null? line-str) (char=? (string-ref line-str 0) #\c)) ;; String empty or comment (loop (read-line fp) clauses num-vars-clauses-list)] [(char=? (string-ref line-str 0) #\p) ;; Starts with a p. (let ([tokens (srfi13:string-tokenize line-str)]) (if (not (= (length tokens) 4)) (error (format "CNF: error , line starting with p has more than 4 elements: ~a" line-str))) (loop (read-line fp) clauses (srfi1:map! string->number (cdr (cdr tokens)))))] [else (let ([literals-list (srfi1:map! literal-fn (cdr (srfi1:reverse! (srfi1:map! string->number (srfi13:string-tokenize line-str)))))]) (loop (read-line fp) (cons literals-list clauses) num-vars-clauses-list))])))))))) (provide cnfreader@))
This file is a unit which implements a reader function for a CNF file. Receives a path-str with a string for the path of the CNF file, a literal-fn which is applied for every literal found and a clause-fn which is applied to a list of literals. In case you pass the identity function as arguments, you’ll get a list of non-zero numbers (with the number of clauses and number of variables as values).
I’ve ran this code on every example of SATLIB and performed very badly… I though… “ok, it was the string tokenize”, probably parser-tools will do it better… and I was right! Here’s the code of the same function using parser tools:
(module cnfreader2 mzscheme (require (lib "file.ss") (prefix srfi1: (lib "1.ss" "srfi")) (lib "unitsig.ss") "cnfreader-sig.ss" "cnf-parser.ss") (define cnfreader@ (unit/sig cnfreader^ (import) (define (get-clauses-from-cnf path-str clause-fn literal-fn) (call-with-input-file* path-str (lambda (fp) (let ([info/clauses-lst (cnf-parser (lambda () (cnf-lexer fp)))]) (values (car info/clauses-lst) (srfi1:map! (lambda (clause) (clause-fn (srfi1:map! literal-fn clause))) (cdr info/clauses-lst))))))))) (provide cnfreader@))
where cnf-parser is
(module cnf-parser mzscheme (require (lib "yacc.ss" "parser-tools") (lib "lex.ss" "parser-tools") (prefix : (lib "lex-sre.ss" "parser-tools"))) (define-tokens tok (NUM)) (define-empty-tokens etok (EOF INITINFO ENDCLAUSE)) (define-lex-abbrevs (integer (:: (:? #\-) (:/ #\1 #\9) (:* # #\9))) (comment (:: (:? #\newline) #\c (:* (:~ #\newline)) #\newline)) (white-space (:or #\newline #\return #\tab #\space #\vtab)) (initinfo (:: #\p (:+ whitespace) "cnf"))) (define cnf-lexer (lexer [(eof) 'EOF] ;; Ignore comments and newlines [(:or comment white-space) (cnf-lexer input-port)] [# 'ENDCLAUSE] [initinfo 'INITINFO] [integer (token-NUM (string->number lexeme))] [any-char (printf "None Matched!")])) (define cnf-parser (parser (start start) (end EOF) (tokens tok etok) (error (lambda (a b c) (printf "Token OK?: ~a~nToken Name: ~a~nToken Value:~a~n" a b c))) (grammar (start [(INITINFO NUM NUM clause-list) (cons (list $2 $3) $4)]) (clause-list [(clause) (list $1)] [(clause-list clause) (cons $2 $1)]) (clause [(literal-list ENDCLAUSE) $1]) (literal-list [(NUM) (list $1)] [(literal-list NUM) (cons $2 $1)])))) (provide cnf-parser cnf-lexer) )
Well, I’ve ran both on SATLIB, prepared some benchmarks (taken in the same CPU, running PLT-Scheme 349, compiled code with mzc) so I could see the difference between them and here’s the graph of the accumulated time between the parsers:
where the blue line is for the SRFI-based parser and the reddish line is for the parser-tools based parser, the time is in mili-Sec.
If you wish to have the complete benchmark file with times for each SATLIB file contact me. If you see that any optimization is possible, contact me. :-)
Unfortunately, I have not taken benchmarks with C/C++ parser but I will and then I’ll post a comparison.