July 3, 2006

Parsing CNF with Scheme

Posted in Programming, SAT, Scheme at 1:51 pm by pmatos

CNF stands for Conjunctive Normal Form. In my special case, I’m interested in it since I work in SAT and since all SAT solvers use as input a CNF file, its parsing is a fundamental part of a SAT solver. SAT solvers need to be fast. A slow SAT solver is the same as an empty milk package, useless.

State of the Art SAT solvers have been using low-level languages for due to optimization needs for a long time, most are written in C or C++ (MiniSAT, zChaff, siege, etc). Daniel LeBerre and collaborators have been developing, however, SAT4J, a SAT solver written in Java which against all odds is competitive with other solvers.

CNF file parsing, is essential for SAT solving and since it is vital, although a trivial part of solving should be done in a split second. The syntax is quite trivial and is defined in the DIMACS file format:
The first non-comment line of the ASCII .cnf file is:
p cnf

where the first number is the number of clauses and the second number is the number of variables of the file. Comments start range through a line and start with the letter ‘c’.

The rest of the file is a list of numbers, which might be positive, negative or zero. A positive number stands for the positive literal of a variable. A negative number is the negative literal of a variable and zero separates different clauses.
For example, the formula:
(1 or 2 or not 3) and (not 1 or 2) and (not 2)
might be written as

c This is a comment of the example file.
p cnf 3 3
1 2 -3 0
-1 2 0
-2 0

I change line after zero for readability.

I’ve implemented a very simple parser in PLT-Scheme. My first version uses the SRFI libraries for string parsing and the code is as follows:

(module cnfreader mzscheme
  
  (require (prefix srfi13: (lib "13.ss" "srfi"))
           (prefix srfi1: (lib "1.ss" "srfi"))
           (lib "unitsig.ss")
           "cnfreader-sig.ss")
  
  (define cnfreader@ 
    (unit/sig cnfreader^
      (import)
      
      ;; Given a path string to a file, it gets the file from
      ;; from the disk and returns two values: a list with number
      ;; of variables and number of clauses and a list of lists where each list
      ;; is a clause and contains the variables in the clause.
      ;; clause-fn is a function applied to all clauses and literal-fn is applied to all literals.
      (define (get-clauses-from-cnf path-str clause-fn literal-fn)
        (let ([fp (open-input-file path-str 'text)])
          
          (let loop ([line (read-line fp)] 
                     [clauses ()] 
                     [num-vars-clauses-list ()])
            
            (if (eof-object? line) ;; End of File???
                (begin 
                  (close-input-port fp)
                  (values num-vars-clauses-list
                          (srfi1:map! clause-fn clauses)))
                
                (let ([line-str (srfi13:string-trim line)]) ;; Line is a string, let's parse it!
                  (cond [(or (srfi13:string-null? line-str) (char=? (string-ref line-str 0) #\c)) ;; String empty or comment
                         (loop (read-line fp) clauses num-vars-clauses-list)]
                        [(char=? (string-ref line-str 0) #\p) ;; Starts with a p.
                         (let ([tokens (srfi13:string-tokenize line-str)])
                           (if (not (= (length tokens) 4))
                               (error (format "CNF: error , line starting with p has more than 4 elements: ~a" line-str)))
                           (loop (read-line fp) clauses (srfi1:map! string->number (cdr (cdr tokens)))))]
                        [else
                         (let ([literals-list (srfi1:map! literal-fn
                                                          (cdr (srfi1:reverse! (srfi1:map! string->number (srfi13:string-tokenize line-str)))))])
                           (loop (read-line fp)
                                 (cons literals-list clauses)
                                 num-vars-clauses-list))]))))))))
  
  (provide cnfreader@))

This file is a unit which implements a reader function for a CNF file. Receives a path-str with a string for the path of the CNF file, a literal-fn which is applied for every literal found and a clause-fn which is applied to a list of literals. In case you pass the identity function as arguments, you’ll get a list of non-zero numbers (with the number of clauses and number of variables as values).

I’ve ran this code on every example of SATLIB and performed very badly… I though… “ok, it was the string tokenize”, probably parser-tools will do it better… and I was right! Here’s the code of the same function using parser tools:

(module cnfreader2 mzscheme
  
  (require (lib "file.ss")
           (prefix srfi1: (lib "1.ss" "srfi"))
           (lib "unitsig.ss")
           "cnfreader-sig.ss"
           "cnf-parser.ss")
  
  (define cnfreader@ 
    (unit/sig cnfreader^
      (import)
      
      (define (get-clauses-from-cnf path-str clause-fn literal-fn)
        (call-with-input-file* path-str
          (lambda (fp)
            (let ([info/clauses-lst (cnf-parser (lambda () (cnf-lexer fp)))])
              (values (car info/clauses-lst)
                      (srfi1:map! (lambda (clause)
                                    (clause-fn (srfi1:map! literal-fn
                                                           clause)))
                                  (cdr info/clauses-lst)))))))))
  
  (provide cnfreader@))

where cnf-parser is

(module cnf-parser mzscheme
  
  (require (lib "yacc.ss" "parser-tools")
           (lib "lex.ss" "parser-tools")
           (prefix : (lib "lex-sre.ss" "parser-tools")))
  
  (define-tokens tok (NUM))
  (define-empty-tokens etok (EOF INITINFO ENDCLAUSE))
  
  (define-lex-abbrevs 
    (integer (:: (:? #\-) (:/ #\1 #\9) (:* # #\9)))
    (comment (:: (:? #\newline) #\c (:* (:~ #\newline)) #\newline))
    (white-space (:or #\newline #\return #\tab #\space #\vtab))
    (initinfo (:: #\p (:+ whitespace) "cnf")))
  
  (define cnf-lexer
    (lexer
     [(eof) 'EOF]
     
     ;; Ignore comments and newlines
     [(:or comment white-space) (cnf-lexer input-port)]
    
     [# 'ENDCLAUSE]
     [initinfo 'INITINFO]
     [integer (token-NUM (string->number lexeme))]
     [any-char (printf "None Matched!")])) 
  
  (define cnf-parser
      (parser
       
       (start start)
       (end EOF)
       (tokens tok etok)
       (error (lambda (a b c) (printf "Token OK?: ~a~nToken Name: ~a~nToken Value:~a~n" a b c)))
       
       (grammar
        (start [(INITINFO NUM NUM clause-list)
                (cons (list $2 $3)
                      $4)])
       
        (clause-list [(clause) (list $1)]
                     [(clause-list clause) (cons $2 $1)])
        
        (clause [(literal-list ENDCLAUSE) $1])
        
        (literal-list [(NUM) (list $1)]
                      [(literal-list NUM) (cons $2 $1)]))))
  
  (provide cnf-parser cnf-lexer)
  )

Nice, eh?
Well, I’ve ran both on SATLIB, prepared some benchmarks (taken in the same CPU, running PLT-Scheme 349, compiled code with mzc) so I could see the difference between them and here’s the graph of the accumulated time between the parsers:

where the blue line is for the SRFI-based parser and the reddish line is for the parser-tools based parser, the time is in mili-Sec.

If you wish to have the complete benchmark file with times for each SATLIB file contact me. If you see that any optimization is possible, contact me. :-)

Unfortunately, I have not taken benchmarks with C/C++ parser but I will and then I’ll post a comparison.

4 Comments »

  1. That chart would look better without the line markers…

  2. pmatos said,

    Line Markers? Do you mean the dots for each benchmark? Probably it would but then you would not be able to see that benchmark file ~505 took a lot of time on both parsers. Probably they could be smaller but then again, I don’t know much about Gnumeric and it did the trick! :-)

    Thanks for the suggestion anyway.

  3. Khaza Anuarul Hoque said,

    hi brother,

    i really need a program that conferts input formula to CNF. i looked into ur code but i i have no idea abt PTL. do u have a C++ version. if not can u pls tell me how to run the code with the parser tool step by step.

    thnx

  4. pmatos said,

    Hello,

    The code I have to do this conversion converts s-expressions to CNF
    using the Tseitins procedure and is implemented in PLT Scheme. It’s
    available in http://planet.plt-scheme.org/display.ss?package=logic.plt&owner=pjmatos

    Have fun,

    Paulo Matos


Leave a reply to pmatos Cancel reply