"But what are they for?"
Do you remember the days in pre-college when you were taking geometry, or algebra, or calculus, and wondering when in your long and varied life you would ever have a need to use the quadratic equation? I remember having that thought lots of time. I had that thought once more, the other day, when I stumbled across Julie Moronoki's A Brief Guide to a Few Algebraic Structures.
The last six months of my life have been deeply engrossed in regular expression theory, and so it blew my mind a little when I stumbled upon this table:
Algebra | Logic | Types | Regular Expressions |
---|---|---|---|
a + b | a ∨ b | Either a b | (a|b) |
a ⨯ b | a ∧ b | (a, b) | ab |
ba | a ⇒ b | a -> b | ba |
a = b | a ⇔ b | _isomorphism_ | a ≅ b |
1 | ⊤ | Unit | ε |
0 | ⊥ | Void | ∅ |
This table is taken from Thinking with Types. That fourth column, "regular expressions," isn't in the original table, but it fits fairly well right there. The first two columns are known as the Curry-Howard Isomorphism, and is essentially a proof, in the mathematical sense, that every computer program is a mathematical equation and vice versa. Since every regular expression is a computer program and a mathematical equation, of course it fits in the Curry-Howard Isomorphism, but it does so with surprisingly direct elegance, and since type systems and regular expressions are set membership assertions with the exact same shape, every regular expression solver is a type system solver, and vice-versa.
This "algebraic structure" is known as a "semiring with closure," and I've been head deep into regular expressions as semirings for the past couple of months.
Here's the thing, though: I still wonder "What the hell are these things for?" I mean, I started to get part of it: if it fits into a semiring, you can replace it for a regular expression.
In fact, that's what regexs do: as the regex consumes a string, it replaces that point in the regex and the string with a true/false value (see the "Logic" column above): "keep going" or "fail". If you reach the end with "keep going" active, the you found the string you were looking for.
If instead of "keep going," what if you (a) collected the letter being analyzed, and (b) did something smart with it, like, had a set of empty matches and then, for the |
symbol, collected all the matches that passed, and for one expression following another, created the cartesian product set of matches. When you reached the end of the string, you'd either have an empty set ("fail"), or you'd have a set with all the legitimate matches — and that's how capture groups work in classic regular expressions.
Categorically, we say that a regular expression is a function that takes a string and a semiring and returns a new semiring. The set of the semiring may contain a boolean about recognizing the string, or it may contain the number of valid paths through the expression for that string, or it may contain the set of strings found by that expression within the string. That's it. That's the whole story.
Great. But what are semirings for?
This, I think, is where mathematics breaks down for most people. It's not enough to know "semirings" (or semigroups, or functors, or monads); you have to know not just what they are, but how they interact with other mathematical structures to produce results. It also requires a certain degree of cleverness, to be able to take those disparate elements and have that sudden insight.
Hao Huang recently did that with his Sensitivity Proof, in which he used an insight from a completely different section of mathematics to prove how sensitive a function might be, that is, how much change you would have to do to the inputs of a function in order to change the output. The problem has been well-known for 30 years, and Huang's proof is considered ridiculously elegant, the hypothesis of which fits into a friggin' tweet, and the total paper is only two pages long.
This is part of my love/hate relationship with Haskell. Haskell forces you to be a better programmer. You don't have a choice. But at the same time, there needs to be a better way of explaining: "This is what this is for."