Elf Sternberg: Dear "Clean Code" fanboys: SQL is a programming language.

One thing that irks me beyond all reason is Robert Martin’s seething dislike for databases. In every presentation he’s ever given, the one thing he’s sneered at is people who “write their code around a database.” In one of his lectures he says, “I don’t want to see a database in your design. I want to see the objects you’ll use, and I want their names and locations in your project file to reflect how you’ll use them.”

This is probably the lousiest piece of advice he’s ever given. Because let me say this once and simply:

SQL is a programming language, not a storage mechanism.

In one of Uncle Bob’s presentations, he talks a lot about a payroll system, and in his one biggest example he talks about a function that calculates how much someone should get paid for travel expenses. The problem with the function is manyfold; it embodies in compiled code business rules about how much someone is allowed to be reimbursed (literally encoding dollar values into the code), it calculates the difference between expensed costs and reimbursed costs, and it finishes by printing out a report of one an employee’s expenses, with reimbursements and a flag that shows when the amount reimbursed is less than the amount expensed because the amount went over those hard coded maximums.

One of Uncle Bob’s mantras is that “the design of a system, and the layout of its code, should reflect the purpose and architecture of its domain, not its framework.” In short, a system about Payroll should talk about Payroll.

So take a look at the number one, top example on Google of Martin’s Payroll demo. Specifically, scroll down to the section labeled “My Implementation.”

I see “entities,” “use_cases,” “adapters,” “controllers,” “views,” “boundaries,” and so forth. You know what I don’t see?

I don’t see Payroll mentioned anywhere.

I see a framework. A framework in the dictionary sense: “a basic structure underlying a system, concept, or text.” That is exactly what Clean Code is. This example is in the “Uncle Bob’s Framework,” because it exists only in the minds of Uncle Bob’s minions, and is imposed ad-hoc by his loyal followers..

The Travel Expenses example is a precious one, because it exposes a lot of the tedious assumptions within Clean Code. The number one assumption is that, if you can encapsulate all of the functionality of a subsystem within a class declaration, this is somehow “better” than if the functionality is distributed across a couple of different files.

As I said in my previous example, an awful lot of code is nothing more than formatting data into human-readable results. So let me set out my thesis simply:

The difference between class Expenses { ... } and CREATE TABLE Expenses ( ... ) exists only in the minds of Clean Code fans.

Let's look closely at Uncle Bob's Travel Expenses example. In detail, the problem set is this:

Given a list of categorized expenses made by a user during a period of time, write a function that prints out a list of those expenses along with the category and type, the amount reimbursed, and highlighting those expenses which went over the maximum reimbursed amount.

So, we need the following objects in our system: User, Expense Category, Expense Type, and Expenses. And we can formulate this in the following way: “Expense types have an Expense category; Expenses have an Expense Type; Users have Expenses.” You can even diagram that out, but it’s the most basic Entity Relationship Diagram in the world.

What would that look like in SQL? Well:

create table users (
  id int primary key generated by default as identity,
  username text not null);

create table expense_categories (
  id int primary key generated by default as identity,
  expense_category text not null);

create table expense_types (
  id int primary key generated by default as identity,
  expense_type text not null,
  maximum money,
  category_id int references expense_categories);

create table expenses (
  id int primary key generated by default as identity,
  user_id int references users,
  event timestamp not null,
  type_id int references expense_types,
  amount money not null
);

Now, just look at the amount of domain-driven knowledge we get for free! All because we chose a language that has the relational algebra and its set-theoretic basis at its heart, we get: guaranteed uniqueness of users, guaranteed uniqueness of expenses, categories, and types; guaranteed non-empty strings for documentation purposes, and we get a built-in money type that’s absent from just about all damned languages. (Haskell and C# have money types, just to cut you off if you’re going to “actually…” at me.) No references to users can exist that point to non-existent users. An expense that references a non-existent expense type can never happen, and the expense types can be updated without having to recompile an entire module. In a general-purpose language like Java or C#, all of those guarantees would require hand-coding, and the writing of tests, and the checking of assertions. One of the best things about modern languages like Haskell or SQL, or like well-written Rust, is that not only do you get those guarantees for free, you can't even write tests for those guarantees because violations are impossible to generate in those languages.

A system like this requires a lot more to keep track of the changes made to the expense accounting system, but the history of changes and decisions made about how expenses are to be tracked and monitored is better done within the system, rather than being some developer’s casual annotation in the body of some version control system somewhere.

Given that this is what we have (and it emulates Uncle Bob’s code exactly), what would the expense report function look like? We have these relationships that we would like to assemble together into a report. That report is simple:

CREATE VIEW expense_report AS SELECT
  users.id AS user_id,
  expense_types.id AS expense_type_id,
  expense_categories.id AS expense_categories_id,
  username,
  event,
  expense_type,
  expense_category,
  amount,
  maximum,
  max_highlight(amount, maximum) AS over,
  max_reimbursed(amount, maximum) AS reimbursed
FROM expenses
   INNER JOIN expense_types
     ON type_id = expense_types.id
   LEFT JOIN expense_categories
     ON category_id = expense_categories.id
   JOIN users
     ON user_id = users.id;

A couple of things of note here: the two CASE statements are meant to fulfill strictly illustrative features of the report: if the maximum was exceeded, show that only the maximum will be reimbursed, and put a star next to that value. These two functions are independent of the SELECT and could be isolated into their own PSQL functions, and also note that they are independent of the values on which they operate: they do not need to be recompiled, or even touched, if the company decides to change the maximum values of a reimbursement.

Also note that this is a VIEW, that is, a separate function that only runs if we trigger it; this table does not exist, and the contents are only generated on-demand.

Want to know the expenses for a given user?

SELECT
	username,
	event,
	expense_type,
	expense_category,
	amount,
	maximum,
	reimbursed,
	over
FROM expense_report
WHERE user_id = 1
ORDER BY event;

And then there’s that expense summary:

WITH expense_summary AS
(
  SELECT
    user_id,
    username,
    expense_category,
    SUM(amount) AS total_amount,
    SUM(reimbursed) AS total_reimbursed
  FROM
    expense_report
  GROUP BY
    user_id,
    username,
    expense_category)
SELECT
  username, expense_category, total_amount, total_reimbursed
FROM
  expense_summary 
WHERE
 user_id = 1 AND
 total_reimbursed > '$0.0';

And that’s the whole thing. Four statements describing the shape of objects, and three functions describing how those objects should be processed to make it easy for humans to understand them. That’s all that matters.

A completely runnable version of this code, for Postgres, is available at Github. And the output is pretty!

username  |        event        | expense_type | expense_category | amount  | maximum | reimbursed | over 
----------+---------------------+--------------+------------------+---------+---------+------------+------
Uncle Bob | 2020-01-08 04:05:06 | Dinner       | Dining           |  $45.00 |  $50.00 |     $45.00 |  
Uncle Bob | 2020-01-09 09:05:06 | Breakfast    | Dining           |  $12.00 |  $10.00 |     $10.00 | *
Uncle Bob | 2020-01-09 16:05:06 | Air Travel   | Travel           | $250.00 |         |    $250.00 |  
Uncle Bob | 2020-01-09 19:05:06 | Taxi         | Travel           |  $27.00 |         |     $27.00 |  
Uncle Bob | 2020-01-09 21:05:06 | Dinner       | Dining           |  $53.00 |  $50.00 |     $50.00 | *
Uncle Bob | 2020-01-09 22:05:06 | Other        | Other            | $127.00 |   $0.00 |      $0.00 | *

I’ve met all of Bob’s requirements: every object has a single responsibility, and the messages it takes reflect that reality. Every object is OPEN for extension, but CLOSED for modifiability. We’ve avoided subtypes, going for composition instead. Our interfaces are appropriately segregated, and we are completely dependent upon the interfaces here, not the implementations.

We don’t even know what the implementations are. And we don’t care.

So we’re SOLID, as far as SOLID goes. The ‘D’ in SOLID is a deceit: interfaces are language-specific, and to claim that C++, or C#, or Java, or whatever languages, has “more pure interfaces” than SQL is to buy into that deceit. My UML is the only truly abstract description of the system, and SQL implements the full interface described in the UML because, again, SQL is a programming language and not a storage mechanism.

Use it as such.

Elf M. Sternberg

Full Stack Web Developer

Where one teaches, two learn.

Blog

DEAR "CLEAN CODE" FANBOYS: SQL IS A PROGRAMMING LANGUAGE.