So, I'm mad about this.

The original paper about an implementation of Brzozowski's parsing algorithm is Matt Might's Parsing With Derivatives: A Functional Pearl. For three weeks I've been hung up on a problem with the implementation that wasn't making any sense to me, and when I finally realized my mistake, it came down to this pair of sentences:

  • The derivative of the empty parser is empty: Dc(∅) = ∅.
  • The derivative of the null parser is also empty: Dc(ε) = ∅.
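Those two rules are easiest to see in code. Here's a minimal sketch in Python for plain regular languages; the class names are mine, not Might's, and this leaves out the laziness, memoization, and fixed-points his paper needs to handle full context-free grammars:

```python
from dataclasses import dataclass

class Lang: pass

@dataclass(frozen=True)
class Empty(Lang): pass          # the empty parser, ∅: rejects everything

@dataclass(frozen=True)
class Eps(Lang): pass            # the null parser, ε: matches only ""

@dataclass(frozen=True)
class Char(Lang):
    c: str                       # matches exactly one character

@dataclass(frozen=True)
class Alt(Lang):
    l: Lang                      # alternation: l | r
    r: Lang

@dataclass(frozen=True)
class Cat(Lang):
    l: Lang                      # concatenation: l then r
    r: Lang

def nullable(p):
    """Does this parser accept the empty string?"""
    if isinstance(p, Eps): return True
    if isinstance(p, (Empty, Char)): return False
    if isinstance(p, Alt): return nullable(p.l) or nullable(p.r)
    if isinstance(p, Cat): return nullable(p.l) and nullable(p.r)

def derive(c, p):
    """The Brzozowski derivative of p with respect to character c."""
    if isinstance(p, Empty): return Empty()   # Dc(∅) = ∅
    if isinstance(p, Eps):   return Empty()   # Dc(ε) = ∅
    if isinstance(p, Char):  return Eps() if p.c == c else Empty()
    if isinstance(p, Alt):   return Alt(derive(c, p.l), derive(c, p.r))
    if isinstance(p, Cat):
        d = Cat(derive(c, p.l), p.r)
        return Alt(d, derive(c, p.r)) if nullable(p.l) else d

def matches(p, s):
    """Derive once per character, then ask if what's left is nullable."""
    for ch in s:
        p = derive(ch, p)
    return nullable(p)
```

So the empty parser stays empty under derivation because it never matched anything to begin with, and the null parser becomes empty because it only matched "", and "" has no first character to consume.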

The "empty parser" is called empty because the set of words it recognizes contains nothing: {}. It recognizes no words1 at all. Another name for the empty set is the null set, which is why the null symbol, ∅, is used there. The null parser is called that because it parses the null string, the empty string, written "" if you're using the Latin alphabet; in C, C++, and other early languages it was often marked by an actual symbol, NUL, referring to the ASCII byte 0.

If you're confused, good: so was I. It's especially unclear when the author writes, "The null string becomes the consume-nothing parser, while the empty set becomes the reject-everything parse." Note the change in terminology: here he means the null string parser and the empty parser, which elsewhere are called, respectively, the empty string parser and the null parser.

Once I sorted out that confusion, understanding how data preservation actually works in Might's formulation was much easier. But man, the people writing these math papers often do not mean for them to be read, do they?


1 "Words" in this case does not mean what it means in English or other natural languages; it just means sequences of letters from the alphabet that make up meaningful strings, and the separation between them is arbitrary, not necessarily whitespace or other punctuation.