The other day I was working on a DSL (Domain Specific Language), for which I had a complete AST (Abstract Syntax Tree). The AST used nested arrays to represent the tree structure, and each node of the tree was itself an array: the first position held the node's type and the second held its payload, which could be either a leaf value or an array of more nodes. I was working in a dynamic language (and, given my description, you may be able to recognize which one), so I had no type safety; arrays were arrays, and interpreting whether one meant "part of the tree" or "part of a node" happened entirely in the developer's brain, not in any syntax-checking compilation pass.
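To make the ambiguity concrete, here is a minimal sketch in Python of what such a representation might look like. The node types, values, and the `is_leaf` helper are my own illustration, not the actual DSL from the story:

```python
# A hypothetical node is a two-element list: [type, payload].
# The payload is either a leaf value or a list of more nodes.
# Both "a node" and "a list of nodes" are plain lists, so nothing
# but convention tells them apart at runtime.
ast = ["add", [["num", 1],
               ["mul", [["num", 2], ["num", 3]]]]]

def is_leaf(node):
    # Heuristic only: a payload that is not a list is a leaf value.
    return not isinstance(node[1], list)
```

A state dump of `ast` is just brackets all the way down, which is exactly why hand-tracing it gets confusing.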

As a convention, the entry point for a function that determines what any given AST "means" has the following signature:

result = evaluate(expression, environment)

Pretty much since 1959, every interpreter that reads and processes a DSL has used those four names: "result," "evaluate," "expression," and "environment." The expression is the thing you're about to interpret; the environment is the current scope, the local and global variables, the context in which the expression is going to be interpreted and evaluated. I've seen this signature over and over, in every book on compilers and interpreters I've ever read.
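As a sketch of the convention, here is what that entry point typically looks like for the nested-array representation described above. The particular node types ("num", "var", "add") are assumptions for the example, not the original DSL:

```python
def evaluate(expression, environment):
    """The classic signature: result = evaluate(expression, environment)."""
    kind, payload = expression
    if kind == "num":
        return payload                      # leaf value: return it directly
    if kind == "var":
        return environment[payload]         # look the name up in the current scope
    if kind == "add":
        left, right = payload               # payload is a list of sub-expressions
        return evaluate(left, environment) + evaluate(right, environment)
    raise ValueError(f"unknown expression type: {kind}")

result = evaluate(["add", [["num", 1], ["var", "x"]]], {"x": 41})  # -> 42
```

Note that `expression` names both a single node and, inside the "add" branch, the things that make up its payload; the recursion keeps the same name throughout.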

Because it's a convention, I thought nothing of it. Because a list of expressions is itself an expression, the variable name for either state was expression. Because of the weird nested arrays thing (totally not my fault; I blame Christian Queinnec), when there was a bug in my program the state dumps were unreadable and hand-tracing the execution was mind-boggling. I was getting lost in "What does this mean?" over and over. It's a fairly small interpreter, so getting lost in so little code was a bad sign.

There is a principle in Python that a public variable name should reflect usage and not implementation. But the more I struggled with the issue, the more I realized that distinguishing between the two implemented types made my eyes cross, and that if I changed the names to distinguish which was which, I could make sense of what I was reading. I changed "expression" to "node," and where I was managing a child collection I changed it to "nodes".
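A sketch of what the renamed version might have looked like; the "seq" node type and the other details are my illustration of the idea, not the original code:

```python
def evaluate(node, environment):
    kind, payload = node
    if kind == "num":
        return payload
    if kind == "seq":                # a sequence of child expressions
        nodes = payload              # "nodes" (plural): a list of [type, payload] pairs
        result = None
        for child in nodes:          # each child is one "node" (singular)
            result = evaluate(child, environment)
        return result
    raise ValueError(f"unknown node type: {kind}")
```

With `node` and `nodes` in the code, a glance at any variable tells you whether you are holding one `[type, payload]` pair or a collection of them.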

That made all the difference. I was now able to see which array was of what, and was able to debug the broken operation.

I then went and changed "node" and "nodes" back to "expression" and "expressions". Because that was a historical convention, I felt obliged to honor it, and if someone else encountered my code I wanted them to see the "what" and not the "how."

This led me to three important realizations.

  1. There is no such thing as a good naming convention. There are only good naming _habits_. When implementing a well-known algorithm (in this case, the classical interpreter phase "reduction of expressions"), it may make sense to follow historical examples and continue the tradition. But above all, name things clearly. Naming things is supposedly one of the two hardest problems in computer science. Work hard on good names.
  2. Change your names to reflect your current issue. If you're implementing something difficult, feel free to use Hungarian Notation if it gets you through the problem.
  3. Don't leave it that way. Having done whatever was necessary to get the code done, now go back and _change the names back_ to something that reflects both the public API and the future maintainability of your code.