Algorithms for parsing and factoring boolean expressions
An interesting problem is to take a nested boolean logical expression, such as:
(A or B) and (C or A and D) or E or FAnd write an algorithm to factor out all the terms into a flat expression (no parentheses) in an efficient way. For the above expression, we would get 6 total conjunctions joined together with disjunctions (or
operators):
1  A and C 
In Python, we might want our final value to be a list of lists, where each sublist is a conjunction:
1  [['A', 'C'], ['A', 'D'], ['B', 'C'], ['B', 'A', 'D'], ['E'], ['F']] 
It’s important to notice is that we can have long chains of terms, such as A and B or C and D or E and F and G ...
as well as deep nesting, such as A or (B and (C or (D and (E or ...
.
Parsing
Before tacking on the challenge of factoring these expressions, let’s start by parsing them from plain strings into python lists. One powerful option is to use a library called parsec
, which originally comes from the Haskell community.
First we declare some basic elements
1  import parsec as p 
This defines all the basic keywords of the language, such as and
, or
, parentheses, and whitespace.
To parse whole nested expressions, we use a generator function with some recursiveness:
1 

For example, to parse a or (b and c)
, the following parsers would get called in order:
program
ona or (b and c)
expr
, thenatom
ona
, yielding [‘a’]expr
, thenatom
onor
, yielding [‘a’, ‘or’]expr
on(b and c)
compound_expr
on(b and c)
lparen
on(
(discarded)expr
, thenatom
onb
, yielding [‘a’, ‘or’, [‘b’]]expr
, thenatom
onand
, yielding [‘a’, ‘or’, [‘b’, ‘and’]]expr
, thenatom
onc
, yielding [‘a’, ‘or’, [‘b’, ‘and’, ‘c’]]rparen
on)
(discarded)
Calling program
as follows:
1  parsed = program("a or (b and c or d)", 0) 
Gives back a result that looks like:
1  ['a', 'or', ['b', 'and', 'c', 'or', 'd']] 
We can use this simple form to do our factoring work:
Factoring
Given an unfactored boolean expression, let’s consider our desired output. For an expression such as:
1  (A or B) and (C or A and D) or E or F 
We ultimately want a list of list of factored terms:
1  [['A', 'C'], ['A', 'D'], ['B', 'C'], ['B', 'A', 'D'], ['E'], ['F']] 
Where each sublist represents a conjunction (AND operator) of terms. The whole list is a disjunction (OR operator) of conjunctions (as an expression, the listoflists looks like A and C or A and D or B and C or B and A and D or E or F
).
This problem is a great example of a “divide and conquer algorithm”, where you can break the problem up into smaller and simpler parts. We can solve this first by:
 Combining a flat set of plain, atomic terms, like “
a and b or c
“ into[[a, b], [c]]
 Combining preconverted disjunctions like
[[a], [b]]
and[[c, d]]
into a single list using either an “and
“ or an “or
“ operator. Combining
[[a], [b]]
and[[c, d]]
using “and
“ gives us[[a, c, d], [b, c, d]]
 Think of it as factoring “
(a or b) and (c and d)
“, giving “(a and c and d) or (b and c and d)
“
 Think of it as factoring “
 Combining
[[a], [b]]
, and[[c, d]]
using “or
“ gives us[[a], [b], [c, d]]
 Think of it as factoring “
(a or b) or (c and d)
“, giving “a or b or (c and d)
“
 Think of it as factoring “
 Combining
Starting with an easier problem
We want to turn flat, simple expressions into lists of conjunctions. For example:
1  a or b > [['a'], ['b']] 
A few rules immediately come to mind. First, the “or
“ operator seems to append new terms as a new list to the end, creating separate sublists. Combining terms with an and
operator places the terms in the previous sublist.
 When adding a new term with an “
or
“ operation, add a new sublist with the new term as its only element  When adding a new term with an “
and
“ operation, append the new term to the last sublist  When adding the first term, add a new sublist with the term as the only element
For the or
combination case, we can write some Python to append the term as a new sublist:
1  def or_combination(disjunction, new_term): 
While the and
case appends the term to the last sublist:
1  def and_combination(disjunction, term): 
We combine the above to functions into a single function that iterates over all terms in a flat expression and converts them into a list of conjunctions
1  def expr_to_disjunction(expr): 
Combining disjunctions
A “disjunction” is a list of conjunctions. For example, the disjunction [[a], [b, c]]
represents the expression a or b and c
.
In order to collapse expressions with parentheses, we need to think about how to combine whole disjunctions. For example, say you had:
1  (a or b) and (c or d) 
Using the code above, we can tackle the two subexpressions:
1  (a or b) > [[a], [b]] 
Now we can consider how to combine [[a], [b]]
and [[c], [d]]
using the “and
“ operator. By using what we know intuitively about factoring expressions (think of or
as addition and and
as multiplication), our expected result looks like:
1  [[a, c], [a, d], [b, c], [b, d]] 
We’re doing a pairwise combination of each term from each disjunction, which can be generated with some nested looping:
1  [c1 + c2 for c1 in disj1 for c2 in disj2] 
Where disj1
in our case is [[a], [b]]
and disj2
is [[c], [d]]
. Incorporating the above code into our and_combination
function, we get:
1  def and_combination(disjunction, term): 
What about the or
case? Say we had:
1  (a and b) or (c and d) 
Each subexpression gives us:
1  (a and b) > [[a, b]] 
The two subexpressions get combined into disjunction as:
1  [[a, b], [c, d]] 
This is a simpler case than the and
case. We can combine two disjunctions with an or
operation by simply concatenating them together into one list:
1  disjunction + disjunction2 
Incorporating that into our or_combination
function, we get:
1  def or_combination(disjunction, term): 
Now our or_combination
and and_combination
functions can combine atomic lexemes (such as a
and b
) as well as fill subexpressions.
The only thing missing is a function that can actually apply this logic all the way up the tree, stitching everything together.
We want to process things bottom up, converting the most nested subexpressions into disjunction lists, and then combine those together as we travel up the tree
Traversing up the tree
In order to apply a function to every level of the expression tree bottomup, we can traverse the tree using recursion.
1  def traverse_expr(expr, fn): 
This function applies a given function fn
to every level of the tree, starting from the bottom and working up. For atomic lexemes, we don’t apply the function and simply return the lexeme. This is our base case.
For example, given the expression [['a', 'and', 'b'], 'or', ['c', 'and', 'd']]
, fn
is applied to every list of terms like so:
1  fn([fn(['a', 'and', 'b']), 'or', fn(['c', 'and', 'd'])]) 
We can pass our expr_to_disjunction
function as fn
:
1  disjunction = traverse_expr(expr, expr_to_disjunction) 
This gives us our final, correct result. To understand why, let’s take a full example, starting with a string expression:
1  string = "(a or b) and (c or d and e)" 
We first parse this into lists of lexemes:
1  expr = program(0, string) 
Then we traverse the tree bottomup and apply expr_to_disjunction
at every level:
1  result = traverse_expr(expr, expr_to_disjunction) 
result
will be [['a', 'c'], ['a', 'd', 'e'], ['b', 'c'], ['b', 'd', 'e']]
Every step of the way, starting with [['a', 'or', 'b'], 'and', ['c', 'or', 'd', 'and', 'e']]
 Apply
expr_to_disjunction
to each subexpression  >
[expr_to_disjunction(['a', 'or', 'b']), 'and', ['c', 'or', 'd', 'and', 'e']]
 >
[[['a'], ['b']], 'and', ['c', 'or', 'd', 'and', 'e']]
 >
[[['a'], ['b']], 'and', expr_to_disjunction(['c', 'or', 'd', 'and', 'e'])]
 >
[[['a'], ['b']], 'and', [['c'], ['d', 'e']]]
 Apply
expr_to_disjunction
to the whole expression  >
expr_to_disjunction([[['a'], ['b']], 'and', [['c'], ['d', 'e']]])
 >
[['a', 'c'], ['a', 'd', 'e'], ['b', 'c'], ['b', 'd', 'e']]