July 3, 2006

Parsing CNF with Scheme

Posted in Programming, SAT, Scheme at 1:51 pm by pmatos

CNF stands for Conjunctive Normal Form. In my special case, I’m interested in it since I work in SAT and since all SAT solvers use as input a CNF file, its parsing is a fundamental part of a SAT solver. SAT solvers need to be fast. A slow SAT solver is the same as an empty milk package, useless.

State of the Art SAT solvers have been using low-level languages for due to optimization needs for a long time, most are written in C or C++ (MiniSAT, zChaff, siege, etc). Daniel LeBerre and collaborators have been developing, however, SAT4J, a SAT solver written in Java which against all odds is competitive with other solvers.

CNF file parsing, is essential for SAT solving and since it is vital, although a trivial part of solving should be done in a split second. The syntax is quite trivial and is defined in the DIMACS file format:
The first non-comment line of the ASCII .cnf file is:
p cnf

where the first number is the number of clauses and the second number is the number of variables of the file. Comments start range through a line and start with the letter ‘c’.

The rest of the file is a list of numbers, which might be positive, negative or zero. A positive number stands for the positive literal of a variable. A negative number is the negative literal of a variable and zero separates different clauses.
For example, the formula:
(1 or 2 or not 3) and (not 1 or 2) and (not 2)
might be written as
c This is a comment of the example file. p cnf 3 3 1 2 -3 0 -1 2 0 -2 0

I change line after zero for readability.

I’ve implemented a very simple parser in PLT-Scheme. My first version uses the SRFI libraries for string parsing and the code is as follows:

(module cnfreader mzscheme
  
  (require (prefix srfi13: (lib "13.ss" "srfi"))
           (prefix srfi1: (lib "1.ss" "srfi"))
           (lib "unitsig.ss")
           "cnfreader-sig.ss")
  
  (define cnfreader@ 
    (unit/sig cnfreader^
      (import)
      
      ;; Given a path string to a file, it gets the file from
      ;; from the disk and returns two values: a list with number
      ;; of variables and number of clauses and a list of lists where each list
      ;; is a clause and contains the variables in the clause.
      ;; clause-fn is a function applied to all clauses and literal-fn is applied to all literals.
      (define (get-clauses-from-cnf path-str clause-fn literal-fn)
        (let ([fp (open-input-file path-str 'text)])
          
          (let loop ([line (read-line fp)] 
                     [clauses ()] 
                     [num-vars-clauses-list ()])
            
            (if (eof-object? line) ;; End of File???
                (begin 
                  (close-input-port fp)
                  (values num-vars-clauses-list
                          (srfi1:map! clause-fn clauses)))
                
                (let ([line-str (srfi13:string-trim line)]) ;; Line is a string, let's parse it!
                  (cond [(or (srfi13:string-null? line-str) (char=? (string-ref line-str 0) #\c)) ;; String empty or comment
                         (loop (read-line fp) clauses num-vars-clauses-list)]
                        [(char=? (string-ref line-str 0) #\p) ;; Starts with a p.
                         (let ([tokens (srfi13:string-tokenize line-str)])
                           (if (not (= (length tokens) 4))
                               (error (format "CNF: error , line starting with p has more than 4 elements: ~a" line-str)))
                           (loop (read-line fp) clauses (srfi1:map! string->number (cdr (cdr tokens)))))]
                        [else
                         (let ([literals-list (srfi1:map! literal-fn
                                                          (cdr (srfi1:reverse! (srfi1:map! string->number (srfi13:string-tokenize line-str)))))])
                           (loop (read-line fp)
                                 (cons literals-list clauses)
                                 num-vars-clauses-list))]))))))))
  
  (provide cnfreader@))

This file is a unit which implements a reader function for a CNF file. Receives a path-str with a string for the path of the CNF file, a literal-fn which is applied for every literal found and a clause-fn which is applied to a list of literals. In case you pass the identity function as arguments, you’ll get a list of non-zero numbers (with the number of clauses and number of variables as values).

I’ve ran this code on every example of SATLIB and performed very badly… I though… “ok, it was the string tokenize”, probably parser-tools will do it better… and I was right! Here’s the code of the same function using parser tools:

(module cnfreader2 mzscheme
  
  (require (lib "file.ss")
           (prefix srfi1: (lib "1.ss" "srfi"))
           (lib "unitsig.ss")
           "cnfreader-sig.ss"
           "cnf-parser.ss")
  
  (define cnfreader@ 
    (unit/sig cnfreader^
      (import)
      
      (define (get-clauses-from-cnf path-str clause-fn literal-fn)
        (call-with-input-file* path-str
          (lambda (fp)
            (let ([info/clauses-lst (cnf-parser (lambda () (cnf-lexer fp)))])
              (values (car info/clauses-lst)
                      (srfi1:map! (lambda (clause)
                                    (clause-fn (srfi1:map! literal-fn
                                                           clause)))
                                  (cdr info/clauses-lst)))))))))
  
  (provide cnfreader@))

where cnf-parser is

(module cnf-parser mzscheme
  
  (require (lib "yacc.ss" "parser-tools")
           (lib "lex.ss" "parser-tools")
           (prefix : (lib "lex-sre.ss" "parser-tools")))
  
  (define-tokens tok (NUM))
  (define-empty-tokens etok (EOF INITINFO ENDCLAUSE))
  
  (define-lex-abbrevs 
    (integer (:: (:? #\-) (:/ #\1 #\9) (:* # #\9)))
    (comment (:: (:? #\newline) #\c (:* (:~ #\newline)) #\newline))
    (white-space (:or #\newline #\return #\tab #\space #\vtab))
    (initinfo (:: #\p (:+ whitespace) "cnf")))
  
  (define cnf-lexer
    (lexer
     [(eof) 'EOF]
     
     ;; Ignore comments and newlines
     [(:or comment white-space) (cnf-lexer input-port)]
    
     [# 'ENDCLAUSE]
     [initinfo 'INITINFO]
     [integer (token-NUM (string->number lexeme))]
     [any-char (printf "None Matched!")])) 
  
  (define cnf-parser
      (parser
       
       (start start)
       (end EOF)
       (tokens tok etok)
       (error (lambda (a b c) (printf "Token OK?: ~a~nToken Name: ~a~nToken Value:~a~n" a b c)))
       
       (grammar
        (start [(INITINFO NUM NUM clause-list)
                (cons (list $2 $3)
                      $4)])
       
        (clause-list [(clause) (list $1)]
                     [(clause-list clause) (cons $2 $1)])
        
        (clause [(literal-list ENDCLAUSE) $1])
        
        (literal-list [(NUM) (list $1)]
                      [(literal-list NUM) (cons $2 $1)]))))
  
  (provide cnf-parser cnf-lexer)
  )

Nice, eh?
Well, I’ve ran both on SATLIB, prepared some benchmarks (taken in the same CPU, running PLT-Scheme 349, compiled code with mzc) so I could see the difference between them and here’s the graph of the accumulated time between the parsers:

where the blue line is for the SRFI-based parser and the reddish line is for the parser-tools based parser, the time is in mili-Sec.

If you wish to have the complete benchmark file with times for each SATLIB file contact me. If you see that any optimization is possible, contact me. :-)

Unfortunately, I have not taken benchmarks with C/C++ parser but I will and then I’ll post a comparison.

Permalink

4 Comments »

Carlos Rodrigues said,

July 3, 2006 at 3:18 pm

That chart would look better without the line markers…

Reply
pmatos said,

July 3, 2006 at 3:28 pm

Line Markers? Do you mean the dots for each benchmark? Probably it would but then you would not be able to see that benchmark file ~505 took a lot of time on both parsers. Probably they could be smaller but then again, I don’t know much about Gnumeric and it did the trick! :-)

Thanks for the suggestion anyway.

Reply
Khaza Anuarul Hoque said,

May 14, 2009 at 4:03 am

hi brother,

i really need a program that conferts input formula to CNF. i looked into ur code but i i have no idea abt PTL. do u have a C++ version. if not can u pls tell me how to run the code with the parser tool step by step.

thnx

Reply
pmatos said,

May 14, 2009 at 9:53 am

Hello,

The code I have to do this conversion converts s-expressions to CNF
using the Tseitins procedure and is implemented in PLT Scheme. It’s
available in http://planet.plt-scheme.org/display.ss?package=logic.plt&owner=pjmatos

Have fun,

Paulo Matos

Reply

The Minority Blog

July 3, 2006

Parsing CNF with Scheme

4 Comments »

Leave a reply to pmatos Cancel reply

About Me

Archives

Categories

Top Posts

Meta

The Minority Blog

July 3, 2006

Parsing CNF with Scheme

Share this:

Related

4 Comments »

Leave a reply to pmatos Cancel reply

About Me

Archives

Categories

Top Posts

Meta