Elf Sternberg: Monorepos with Microsoft Lage: Learnings This Week

I recently chose Lage to help refactor a monolith web application into a monorepo. Here's what I learned.

In my day job, I'm responsible for a fairly large web application of about 52,600 lines of TypeScript. In reality, that one web application is four independent major single-page applications and two minor ones, all of which rest on a classic collection of component library, server interface, and localization. It's getting a bit unwieldy to work in, and any edit cycle now takes about 25 seconds to rebuild the whole thing while working on it.

That's just a little too long in the era of hot module reloading. Plus it's hard to know where the boundaries are, and the guy I inherited it from cut a few corners here and there, stealing useful code across the hierarchy rather than re-arranging the hierarchy. I managed to disentangle the first-level dependencies a few months ago, so that no SPA is dependent upon something within another, and now the time has come to break it up into multiple workspaces. After looking through the list of possible tools, I picked Lage.

Lage

Lage is a "dependency graph-based taskrunner." And that's all it is. It's a tool for running all of the tasks necessary to build, test, and maintain a web application monorepo. Compared to many other monorepo management tools, Lage is straightforward: it cares only about a collection of tuples of (package, script) and how each task leads to a finished product. It doesn't supply its own version of Storybook, doesn't demand you use Yarn or pnpm, doesn't require you to use its internal code generators. It just cares about your dependency graph and the actions you want to perform on that graph.

If you want to make sure your code is consistent, you can always use manypkg. If you want code generators, JavaScript template literals or mustache are right there.

The problem, though, is that Lage is poorly documented. So let's start with the conceptual basics of working with Lage.

A (JavaScript or TypeScript) monorepo consists of two parts: the root and the packages. The root contains a package.json file that describes the workspaces of your project. Inside each workspace there is a separate JavaScript or TypeScript package, with its own package.json file.

A monorepo has tasks: to test the product, to build the product, to run a demo of the product, to lint the product, to automatically (if possible) fix minor linting and formatting errors, to localize the product. Some of these are dependent upon others; if working in TypeScript, you can't actually work with the code until it's translated into JavaScript. The context-aware applications can't be built until both the context shell and the underlying component library have been built. We may also have to extract text to be localized or convert Sass to a .css.ts format.

So we've identified some tasks:

build
watch
lint
fix
style
precommit (a combination of lint and fix)
test
storybook
i18n-extract (extract strings to be translated)
i18n-supply (supply translated strings for the application(s))

There are some task/task dependencies. For example, the most complicated task is precommit: we want to run the linters in parallel, but run the fixes in sequence at the end with prettier --write coming last.

And when we say that precommit is a complicated task, we mean that it's five tasks in one:

tsc (type-checking)
lit-analyze (check that the JS -> DOM -> JS types are correct, which tsc can't do)
codespell (check spelling in comments and documentation)
eslint --fix (standard linting)
prettier --write (code formatting; this always goes last to make sure the code pushed meets project standards)

Which in turn means that we have these independent tasks:

check:types
check:dom
check:spelling
check:code
check:format
fix:code
fix:format
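Here's a minimal sketch of the ordering I'm after, written as plain JavaScript rather than anything Lage-specific. The run function is a hypothetical stand-in that only records when a task starts and finishes; a real version would spawn tsc, lit-analyze, and so on.

```javascript
// Hypothetical stand-in for running a real command: it only records
// when a task starts and when it finishes.
async function run(name, log) {
  log.push(`start ${name}`);
  await new Promise((resolve) => setTimeout(resolve, 5));
  log.push(`done ${name}`);
}

async function precommit() {
  const log = [];
  // The checks only read the source, so they can all run at once.
  const checks = ["check:types", "check:dom", "check:spelling"];
  await Promise.all(checks.map((name) => run(name, log)));
  // The fixes rewrite files in place, so they run one at a time,
  // with formatting always last.
  for (const name of ["fix:code", "fix:format"]) {
    await run(name, log);
  }
  return log;
}

precommit().then((log) => console.log(log.join("\n")));
```

All three checks start before any of them finishes, and fix:format does not start until fix:code is done.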

There are also some impossible-to-resolve circular dependencies: i18n-extract should be run whenever one of its dependencies changes, but its dependency tree is every other package, which we don't want to worry about. We will have to rely on the developer to run i18n-extract by hand whenever new translatable strings are added to a package, since updating the localization files will result in, to use the vernacular, rebuilding the world.

Once you've identified this collection of tasks, you want to configure your task runner to run them in topological order with caching. "Topological order" means that if B depends on A, A will be built before B needs it. "With caching" means that if both packages are already built and you then change B but not A, A will not be rebuilt at all. Topological ordering also means that if D depends on A, B, and C, and B and C depend on A but not on each other, B and C can be built simultaneously.
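To make the A, B, C, D example concrete, here is a small sketch of both ideas. It is not how Lage is implemented; the deps object and both function names are mine, and real caching works from file hashes rather than a hand-supplied list of changed packages.

```javascript
// Hypothetical dependency graph: each key lists the packages it depends on.
const deps = {
  A: [],
  B: ["A"],
  C: ["A"],
  D: ["A", "B", "C"],
};

// Group packages into "waves". Everything in one wave can build in
// parallel, because all of its dependencies were built in an earlier wave.
function buildWaves(graph) {
  const done = new Set();
  const waves = [];
  while (done.size < Object.keys(graph).length) {
    const ready = Object.keys(graph)
      .filter((pkg) => !done.has(pkg) && graph[pkg].every((d) => done.has(d)))
      .sort();
    if (ready.length === 0) throw new Error("circular dependency");
    ready.forEach((pkg) => done.add(pkg));
    waves.push(ready);
  }
  return waves;
}

// Caching, reduced to its essence: a package needs rebuilding only if it
// changed, or if something it depends on needs rebuilding.
function needsRebuild(graph, changed) {
  const dirty = new Set(changed);
  for (const wave of buildWaves(graph)) {
    for (const pkg of wave) {
      if (graph[pkg].some((d) => dirty.has(d))) dirty.add(pkg);
    }
  }
  return [...dirty].sort();
}

console.log(buildWaves(deps)); // [ [ 'A' ], [ 'B', 'C' ], [ 'D' ] ]
console.log(needsRebuild(deps, ["B"])); // [ 'B', 'D' ]
```

Changing B leaves A and C untouched, but D has to be rebuilt because it depends on B.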

A Lage Project Configuration

A Lage-run project is defined by three things:

1. The presence of a workspaces clause in your root package.json, listing the packages that make up your monorepo.
2. A lage.config.js file in the project root, and calling lage from the command line.
3. The dependency graph of your packages.

Here's what the default lage.config.js looks like:

module.exports = {
  pipeline: {
    build: ["^build"],
    test: ["build"],
    lint: []
  }
};

It took me forever to figure out what this means, because the documentation, frankly, sucks. Each line is a "pipeline," meaning that when it's done running something will be produced at the end.

First, Lage looks in your root package.json file for the list of workspaces. For each workspace, it reads three things: the name entry, the scripts, and the dependencies. From the name and the scripts, it creates an internal list of (project name, script entry), and from the name and the dependencies, it creates the dependency graph, which it uses to make sure things get built in the right order.
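As a rough illustration of that reading step (my own sketch, not Lage's code), assume the workspace manifests have already been loaded into an array. The package names, scripts, and the describeWorkspace function are all hypothetical, and following the description above the sketch looks only at the dependencies field.

```javascript
// Hypothetical workspace manifests, as if read from each package.json.
const manifests = [
  {
    name: "components",
    scripts: { build: "tsc", test: "node --test" },
    dependencies: { lit: "^3.0.0" },
  },
  {
    name: "admin",
    scripts: { build: "tsc", lint: "eslint ." },
    dependencies: { components: "*", lit: "^3.0.0" },
  },
];

function describeWorkspace(manifests) {
  const local = new Set(manifests.map((m) => m.name));
  // Every (package, script) pair is a candidate task.
  const tasks = manifests.flatMap((m) =>
    Object.keys(m.scripts ?? {}).map((script) => [m.name, script])
  );
  // Only dependencies that are themselves workspaces become graph edges;
  // third-party packages such as "lit" are ignored.
  const graph = Object.fromEntries(
    manifests.map((m) => [
      m.name,
      Object.keys(m.dependencies ?? {}).filter((d) => local.has(d)),
    ])
  );
  return { tasks, graph };
}

console.log(describeWorkspace(manifests));
```

The result holds four (package, script) pairs and a graph in which admin depends on components and components depends on nothing local.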

The build pipeline in the example above has the dependency syntax of ^build. This means "run every (*, "build") script in the workspaces in dependency order." This is a subtype of dependency order, artifact dependency, because you need the artifact "a built widget library" before you can build your application. If, instead, you had written only "build", or worse, left the list empty, every package's build would be started up simultaneously with as many cores as you had and dependencies would be ignored.

The test pipeline has only build. This means "Before you run test, you must run the build pipeline." This is a task dependency rather than an artifact dependency because Lage can't read minds: it can't know that you need to build something before you can test it. This also means that every workspace's (*, "test") task will be run in parallel and in no particular order after the build is complete.
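Here is how I picture the difference between the two kinds of entry, as a sketch under my own reading of the syntax rather than Lage's actual algorithm. The packages, the scripts table, the expand function, and the "package#task" labels are assumptions made for illustration.

```javascript
// Hypothetical inputs: the default pipeline, a two-package graph, and the
// scripts each package actually defines.
const pipeline = { build: ["^build"], test: ["build"], lint: [] };
const graph = { components: [], admin: ["components"] };
const scripts = {
  components: ["build", "test"],
  admin: ["build", "test", "lint"],
};

const has = (pkg, task) => scripts[pkg].includes(task);

// For each (package, task) pair, list the pairs that must finish first.
function expand(pipeline, graph) {
  const edges = {};
  for (const pkg of Object.keys(graph)) {
    for (const [task, deps] of Object.entries(pipeline)) {
      if (!has(pkg, task)) continue; // no such script: nothing to run
      const before = [];
      for (const dep of deps) {
        if (dep.startsWith("^")) {
          // "^build": the same-named task in every package this one depends on.
          const name = dep.slice(1);
          for (const upstream of graph[pkg]) {
            if (has(upstream, name)) before.push(`${upstream}#${name}`);
          }
        } else if (has(pkg, dep)) {
          // "build": another task inside this same package.
          before.push(`${pkg}#${dep}`);
        }
      }
      edges[`${pkg}#${task}`] = before;
    }
  }
  return edges;
}

console.log(expand(pipeline, graph));
```

In this sketch admin#build waits for components#build, each test waits only for its own package's build, and components gets no lint task at all because it has no lint script.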

If a package has no test task, Lage will just ignore it. Not every package will have the full range of tasks. After all, textless widgets don't need internationalization and documentation rarely needs to be type-checked.

Normally, you would put these sorts of rules into each project's package.json almost instinctively. Lage wants you to move that thinking into Lage itself. It might be wise to have both, allowing you to work inside a single workspace without needing Lage for everything at first, but eventually you should hand over everything to Lage and let it orchestrate.

The other thing that took me forever is how do you define the topological dependencies? It's been a few years since I last had to maintain a monorepo, and I remembered the tooling being a nightmare. But now it's straightforward: if A and B are both packages in your workspace and B depends on A, then in ./packages/B/package.json, you have to add A to the dependencies:

  "dependencies": {
    "A": "*"
  ...
}

That's it. Now Lage knows (via ./package.json:workspaces) that A is a local package, and via ./packages/B/package.json that A is a dependency of B. Lage will now build them in the right order.

I guess if you've been working with monorepos for a while, the facts that "A and B are workspaces" and "A is a requirement of B" are common knowledge, but they weren't for me, and the Lage documentation does an awful job of helping you through that step.

Parallel and sequential

I said earlier that I'd identified several lint and fix options that I wanted to run in a specific order. I'm not running React, I'm running Lit, and Lit comes with its own analyzer as well as eslint.

In lage.config.js, therefore, I'm going to define the following tasks:

module.exports = {
  pipeline: {
    "lint:codespell": [],
    "lint:type-check": [],
    "lint:lit-analyze": [],
    "fix:eslint": [],
    "fix:prettier": [],
    "lint:standard": ["lint:codespell", "lint:type-check", "lint:lit-analyze"],
    "fix:standard": ["^fix:eslint", "^fix:prettier"],
    "lint:precommit": ["^lint:standard", "^fix:standard"]
  }
};

What this means, essentially, is that if you call lint:precommit, it can run all the lint:* elements on all the projects, in parallel. (Note that this might not be correct; the type-check pass, for example, may depend on your code building type files from other packages. I have only a few packages so far, so I haven't had a problem yet.) Once that's done, it will run the two fixes, which modify the source code in place, in sequence.

Conclusion

Lage is a Microsoft project. I don't know how well-backed or well-supported it is. It seems to be less popular than the tools that "do it all" for you. Compared to all the other taskrunners out there, it follows the Unix philosophy that a program should do just one thing and do it well. It's ironic that a project like this comes out of Microsoft, but I hope Microsoft continues to back it.

I'd like to thank Nicolaus Boulaine for reminding me I have a duty to document any hard-won knowledge. Consider this today's duty done.

Elf M. Sternberg
