Following on my adventures in job seeking, several weeks ago I landed a great new job with Splunk. They do big data for "internal intelligence": monitoring your toolkit and helping you run analytics on masses of what is generally thought of as unstructured data. I help customers visualize their data, and have been doing JavaScript, D3, WebGL, and a ton of fun stuff streamlining how third-party developers can visualize Splunk data with their favorite visualization tool.
But there are two things that, well... one, it's JavaScript. My Coffee and Haskell skills are dying on the vine. And two, it's older JavaScript. I'm not allowed to do anything wild. No Bacon. No FRP. No Node. This thing has to be solid and working and be usable by people who don't have time to master the intricacies of the cutting edge, people who want answers not fanciers (okay, that's a stretch), and whose JavaScript experience might just be a couple of O'Reilly books and an API.
Last week I released Tumble, a little hack for parsing Tumblr-like templates and rendering their contents. There's a line in the source code that reads "This code stinks!" And it did. It was the same logic repeated three times: Once in the parser when it was found, once in the parser/contexter API, and a third time in the contexter itself.
My first order of business was to reduce the parser to a single rule. There are two kinds of objects in Tumble: rendered objects and blocks. A rendered object looks like {this} and tells the renderer "replace this with whatever this name refers to." Usually, that's just a string from the database; sometimes, though, it can be the result of a function. A block looks like {key:something} ... a bunch of stuff ... {/key:something}. Blocks have three different rules: They can be conditional (render this block if this data is true), descendent (render this block with this other source of data), or iterative (render this block many times, once for each item in this list of data). I started out with rules like "just text", "variable", "ifblock", "descendblock", "manyblock", and corresponding rules to find things like {many:foo}{/many:foo}.
That turned out to stink. So I reduced the rule to a simple {blocktype:blockname}{/blocktype:blockname} finder rule.
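To make the single finder rule concrete, here is a rough hand-rolled sketch in plain JavaScript. It is not the PEG grammar Tumble actually uses: the function name parseTemplate and the regular expression are assumptions for illustration, and the node shape (unit, type, name, content) is inferred from the interpreter code further down. It also does not handle a block nested inside another block with the same type and name.

```javascript
// Hypothetical sketch of a one-rule parser; NOT Tumble's real PEG grammar.
function parseTemplate(src) {
  var nodes = [];
  // Alternative 1: {type:name} body {/type:name}  (groups 1-3)
  // Alternative 2: a bare {variable}              (group 4)
  var rule = /\{(\w+):(\w+)\}([\s\S]*?)\{\/\1:\2\}|\{(\w+)\}/g;
  var last = 0;
  var m;
  while ((m = rule.exec(src)) !== null) {
    // Anything between the previous match and this one is plain text.
    if (m.index > last) {
      nodes.push({ unit: 'text', content: src.slice(last, m.index) });
    }
    if (m[4] !== undefined) {
      nodes.push({ unit: 'variable', name: m[4] });
    } else {
      // The block type is just data here; the parser does not interpret it.
      nodes.push({
        unit: 'block',
        type: m[1],
        name: m[2],
        content: parseTemplate(m[3]).content
      });
    }
    last = rule.lastIndex;
  }
  if (last < src.length) {
    nodes.push({ unit: 'text', content: src.slice(last) });
  }
  return { unit: 'root', content: nodes };
}

var tree = parseTemplate('Hi {name}! {many:posts}<li>{title}</li>{/many:posts}');
// tree.content holds four nodes: text, variable, text, block
```

The point of the design is that the parser never needs to know what "if", "descend", or "many" mean. It only records the block type as a string and leaves the decision to whatever consumes the tree.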
The Contexter would do the right thing with each kind of blocktype, using data referenced by blockname.
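The real Contexter isn't shown in this post, so the following is only a guess at its general shape. The method names get, if, and descend appear in the code quoted here; the name many is inferred from the "manyblock" rule, and every method body is an assumption.

```javascript
// Hypothetical stand-in for Tumble's Contexter; the bodies are assumptions.
function Contexter(data) {
  this.data = data;
}

// Look up a name. Functions are called; missing values render as "".
Contexter.prototype.get = function (name) {
  var v = this.data[name];
  if (typeof v === 'function') { v = v.call(this.data); }
  return (v === undefined || v === null) ? '' : String(v);
};

// Conditional: render the block only when the named value is truthy.
Contexter.prototype['if'] = function (name, render) {
  return this.data[name] ? render(this) : '';
};

// Descendent: render the block against the named child object.
Contexter.prototype.descend = function (name, render) {
  var child = this.data[name];
  return child ? render(new Contexter(child)) : '';
};

// Iterative: render the block once per item in the named list.
Contexter.prototype.many = function (name, render) {
  var items = this.data[name] || [];
  return items.map(function (item) {
    return render(new Contexter(item));
  }).join('');
};

var ctx = new Contexter({
  draft: false,
  author: { name: 'Ann' },
  posts: [{ title: 'A' }, { title: 'B' }]
});
var list = ctx.many('posts', function (c) {
  return '<li>' + c.get('title') + '</li>';
});
// list === '<li>A</li><li>B</li>'
```

Because every block method has the same signature (a name plus a render callback), a caller can pick one by the block type string alone.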
The problem was the glue between contexter and parser. I was using the parser to derive the blocktype, then propagating that up to the contexter through a very crude set of manually written code paths. There had to be a better way.
Since I'd thrown out the various block types from the parser, the JavaScript for initial handling had also been thrown out. I found it in git. It looked like this:
    // TODO: Yeah, this code stinks.
    conditional = function(t, ps) {
        return function(content) {
            return content.if(t, function(c) {
                return sections(ps, content);
            });
        };
    };
    descendant = function(t, ps) {
        return function(content) {
            return content.descend(t, function(c) {
                return sections(ps, content);
            });
        };
    };
You can tell. It's the same damn thing, over and over. And there's a lot of visual clutter there.
I decided to rewrite the glue in CoffeeScript. Here's the whole of it now:
    module.exports = (ast, data) ->
        context = new Contexter(data)
        cmd = (o) ->
            switch o.unit
                when 'variable' then (context) -> context.get(o.name)
                when 'text' then (context) -> o.content
                when 'block' then (context) -> context[o.type] o.name, (context) ->
                    (cmd(p)(context) for p in o.content).join("")
        (cmd(o)(context) for o in ast.content).join("")
This is way smaller. And you can see where my Haskell learning has paid off. I'm actually reasoning about the kinds of data, the kinds of closures I want the interpreter to work on as it descends the parse tree. The last line is pure magic. It says "For the current point in the AST for our template, get a function that can handle the current context for our data." And it does. Perfectly. Even more beautiful is the way the 'block' handler automagically repeats that line in order to descend further down the AST. That last 'context' in that 'block' handler isn't the current context; it's whatever context the internal corresponding rule requires, and I don't have to care about what that rule is, because this is just code-as-data. Sweet.
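For anyone who doesn't read CoffeeScript, here is roughly the same interpreter sketched in plain JavaScript with a worked example. The Contexter below is a minimal stand-in (an assumption, since the real one isn't shown), the AST is hand-written rather than produced by the parser, and the thrown error for unknown node kinds is an addition the original doesn't have.

```javascript
// Minimal stand-in Contexter; an assumption, not Tumble's real class.
function Contexter(data) { this.data = data; }
Contexter.prototype.get = function (name) { return String(this.data[name]); };
Contexter.prototype['if'] = function (name, body) {
  return this.data[name] ? body(this) : '';
};
Contexter.prototype.many = function (name, body) {
  return (this.data[name] || []).map(function (item) {
    return body(new Contexter(item));
  }).join('');
};

// cmd turns one AST node into a function from a context to a string.
function cmd(node) {
  switch (node.unit) {
    case 'variable':
      return function (context) { return context.get(node.name); };
    case 'text':
      return function () { return node.content; };
    case 'block':
      return function (context) {
        // The Contexter method named by node.type decides which inner
        // context (and how many times) the block body is rendered with.
        return context[node.type](node.name, function (inner) {
          return node.content.map(function (p) {
            return cmd(p)(inner);
          }).join('');
        });
      };
    default:
      // Not in the original glue; added so bad input fails loudly.
      throw new Error('Unknown node unit: ' + node.unit);
  }
}

function render(ast, data) {
  var context = new Contexter(data);
  return ast.content.map(function (o) { return cmd(o)(context); }).join('');
}

// Hand-written AST for: Hi {name}!{many:posts}<li>{title}</li>{/many:posts}
var ast = { content: [
  { unit: 'text', content: 'Hi ' },
  { unit: 'variable', name: 'name' },
  { unit: 'text', content: '!' },
  { unit: 'block', type: 'many', name: 'posts', content: [
    { unit: 'text', content: '<li>' },
    { unit: 'variable', name: 'title' },
    { unit: 'text', content: '</li>' }
  ] }
] };

var html = render(ast, { name: 'Ann', posts: [{ title: 'A' }, { title: 'B' }] });
// html === 'Hi Ann!<li>A</li><li>B</li>'
```

Note that {title} is looked up in each post, not in the top-level data: the inner function receives whatever context the 'many' method hands it, which is the behavior described above.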
It's been years since I've worked with lex and yacc. Hell, I even worked in ANTLR briefly, at F5 Networks. But Haskell, CoffeeScript, and PEG have made writing something like this blazingly easy.
I pushed the latest rev up to GitHub at ElfSternberg/Tumble. It's proving to be an interesting experiment.