Understanding monorepos

Tags:
  • technology
  • javascript

What is a monorepo?

A monorepo is a way of organizing your codebase where all the code for an entire organization or project is stored in a single repository, rather than in a separate repository for each project. The repository is split into modules, each of which can be versioned and published independently. To have a functional monorepo, you need a way to manage the dependencies between the modules and a way to build and test the code, so the following tools are required:

  1. Version control system: A way to manage the code. (e.g. Git)
  2. Dependency management: A way to manage the dependencies between the modules. (e.g. NPM, Yarn)
  3. Build system: A way to build the code. (e.g. Nx, NPM Scripts)
  4. Testing framework: A way to test the code. (e.g. Jest)

Fundamental concepts of a monorepo

1. Modules

A monorepo is split into modules. A module is a self-contained piece of code that can live independently of the rest of the codebase. Independent means that it can be developed, tested, and deployed on its own; it doesn't mean that it is standalone, because it can still depend on other modules. A module can be a library, a service, or an application, and it doesn't have to be written in JavaScript: modules can also be written in languages like Python, Go, or Rust.

For example, a real world monorepo could potentially contain the following modules:

| Module | Purpose | Tool / Technology |
|---|---|---|
| @module/core | Core functionality | Node |
| @module/common | Common functionality | Node |

2. Dependencies

A module can depend on other modules, and this is how code is shared between them. For example, the @module/core module can depend on the @module/common module or vice versa (ideally not both at once, since that would create a circular dependency). This is how you can avoid duplicating code and keep the code DRY (we will talk about this later).
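One practical consequence of module dependencies is build order: a module should be built after the modules it depends on. The sketch below assumes the dependencies are written as a plain object that maps each module name to the modules it depends on. This shape is hypothetical and is not a format any particular tool reads. The sketch orders the modules with a depth-first topological sort and throws an error when it finds a cycle.

```javascript
// Hypothetical dependency map: module name -> modules it depends on.
const deps = {
  "@module/common": [],
  "@module/core": ["@module/common"],
};

// Depth-first topological sort: dependencies come before dependents.
// Throws if two modules (directly or indirectly) depend on each other.
function buildOrder(graph) {
  const order = [];
  const state = new Map(); // name -> "visiting" | "done"

  function visit(name, path) {
    if (state.get(name) === "done") return;
    if (state.get(name) === "visiting") {
      throw new Error(`Dependency cycle: ${[...path, name].join(" -> ")}`);
    }
    state.set(name, "visiting");
    for (const dep of graph[name] ?? []) {
      visit(dep, [...path, name]);
    }
    state.set(name, "done");
    order.push(name);
  }

  for (const name of Object.keys(graph)) visit(name, []);
  return order;
}

console.log(buildOrder(deps)); // [ '@module/common', '@module/core' ]
```

If you add "@module/core" to the dependencies of "@module/common", the call throws instead of returning an order. That is a quick way to catch the "both directions at once" situation.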

3. Versioning

Each module can be independently versioned. This means that you can release a new version of a module without having to release a new version of the entire codebase. This is especially useful for larger organizations with many projects split across different teams. It allows each team to manage their own modules and release new versions independently. Large enterprises like Google, Facebook, and Microsoft use monorepos along with other specialized tools to manage their codebase.
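To make "independently versioned" concrete, here is a minimal sketch that assumes plain MAJOR.MINOR.PATCH version strings with no pre-release tags. The `bump` helper and the `versions` map are hypothetical. Release tools such as Lerna handle far more than this, including changelogs, tags and publishing.

```javascript
// Bump one part of a MAJOR.MINOR.PATCH version string.
// Pre-release/build suffixes (e.g. "1.0.0-beta.1") are not handled.
function bump(version, part) {
  const [major, minor, patch] = version.split(".").map(Number);
  if (part === "major") return `${major + 1}.0.0`;
  if (part === "minor") return `${major}.${minor + 1}.0`;
  if (part === "patch") return `${major}.${minor}.${patch + 1}`;
  throw new Error(`Unknown part: ${part}`);
}

// Hypothetical current versions of two modules in the same repository.
const versions = { "@module/core": "1.4.2", "@module/common": "2.0.0" };

// Release a new minor version of @module/core only; @module/common is untouched.
const next = { ...versions, "@module/core": bump(versions["@module/core"], "minor") };
console.log(next); // { '@module/core': '1.5.0', '@module/common': '2.0.0' }
```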

Is a monorepo only for JavaScript?

Nope, a monorepo is not only for JavaScript. The approach works with any programming language, and other ecosystems have their own monorepo tooling as well. For this article, we will focus on JavaScript since it's what I know best.

Benefits of a monorepo

1. Code sharing

A monorepo allows you to share code between modules. This means that you can avoid duplicating code and keep the code DRY.

DRY (Don't Repeat Yourself) is a software development principle that states that every piece of knowledge must have a single, unambiguous, authoritative representation within a system.

Copying and pasting code is known to be a bad practice in software development because it can lead to bugs and inconsistencies: a fix applied to one copy is easily missed in the others. A monorepo lets you avoid this by sharing code between modules. Another benefit is that once you get into the modular mindset, you start thinking about how to make your code more modular and reusable, which can lead to better code quality and more maintainable code.

2. Simplified dependency management

A monorepo allows you to manage the dependencies easily because all the code is in a single repository. This means that you can avoid the complexity of managing multiple repositories and their dependencies.

In JavaScript, you can use a tool like Nx or Yarn Workspaces to manage the dependencies between the modules. In my opinion, Nx is the most powerful tool for managing monorepos in JavaScript, but you can use any tool that fits your needs. Typically, I just use NPM scripts for small monorepos and Nx for larger ones (like the one I work on at Sitecore).

3. Simplified build and test process

If packages live in separate repositories, you need to build and test each package separately, which can be time-consuming and error-prone. A monorepo allows you to build and test the entire codebase with a single command. This is especially useful when you have a large codebase with many modules that depend on each other: you don't want to change one module and then have to go through every other repository by hand to make sure the change didn't break anything.
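In a monorepo, you don't always need to rebuild everything. You can rebuild only the modules that a change can affect. The sketch below assumes the same kind of hypothetical "module -> dependencies" map as a plain object. Starting from the changed module, it walks the dependents backwards with a breadth-first search. Nx offers an "affected" command built on a similar idea, but this is not Nx's implementation.

```javascript
// Hypothetical dependency map: module -> modules it depends on.
const deps = {
  "@module/common": [],
  "@module/core": ["@module/common"],
  "@module/web": ["@module/core"],
  "@module/docs": [],
};

// Return the changed module plus every module that depends on it,
// directly or transitively. These are the ones worth rebuilding/retesting.
function affected(graph, changed) {
  // Invert the map: module -> modules that depend on it.
  const dependents = new Map(Object.keys(graph).map((name) => [name, []]));
  for (const [name, list] of Object.entries(graph)) {
    for (const dep of list) dependents.get(dep)?.push(name);
  }

  const result = new Set([changed]);
  const queue = [changed];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const parent of dependents.get(current) ?? []) {
      if (!result.has(parent)) {
        result.add(parent);
        queue.push(parent);
      }
    }
  }
  return [...result];
}

console.log(affected(deps, "@module/common"));
// [ '@module/common', '@module/core', '@module/web' ]
```

Changing @module/docs would only affect @module/docs itself, so its tests can run without rebuilding the other modules.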

4. Tooling consistency

A monorepo allows you to use the same tooling across all the modules, so you avoid the complexity of managing different tools for different repositories. For example, you can use the same linter, formatter, and testing framework across all the modules. This leads to a more consistent codebase and a more consistent developer experience. Consistent codebases are easier to maintain and make it easier to onboard new team members.

5. Improved collaboration

Monorepos facilitate collaboration among developers by providing a single source of truth for all code. Team members can easily navigate between projects, review changes, and collaborate on shared components more effectively. Yes, you will sometimes run into Git conflicts, but how often depends on the architecture of the monorepo and the way each team works.

6. Simplified onboarding

A monorepo allows new team members to get up to speed more quickly because they only need to learn one codebase. This can be especially useful for large organizations with many projects. I cannot emphasize enough how important this is. It can save a lot of time and effort for both the new team members and the existing team members. Nobody wants to spend a whole day setting up their development environment (installing dependencies, configuring the build system, etc.) when they could be writing code.

Drawbacks of a monorepo

1. Complexity

A monorepo can be complex to set up and maintain, and you need a good understanding of the tools and technologies required to manage it. For this reason, it is important to choose the right tooling and to consult developers who have worked with monorepos before.

Shameless plug: you can consult with me if you want. I have worked with monorepos before and can help you set up and maintain one. Book a call with me here.

2. Performance challenges

Large monorepos may suffer from performance issues, such as slower clone times, longer CI/CD build times, and increased memory usage for development tools. These performance challenges can hurt developer productivity and increase infrastructure costs. Nx has capitalized on this with Nx Cloud, a service that helps with the performance of your monorepo (for example, by sharing cached build results across machines and CI runs). It's a paid service, but it's worth it if you have a large monorepo.
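Much of the speed-up that monorepo tools offer comes from caching. If a task's inputs have not changed, the tool reuses the previous result instead of running the task again. Below is a minimal in-memory sketch that treats a task's inputs as a map of file paths to file contents, which is a simplification. `runCached` and its cache are hypothetical. A service like Nx Cloud shares this kind of cache across machines instead of keeping it inside one process.

```javascript
const crypto = require("crypto");

// In-memory cache: input hash -> previous task output.
const cache = new Map();

// Hash the task name and its inputs (sorted so ordering doesn't matter).
// "\0" separators keep e.g. ("ab", "c") and ("a", "bc") from colliding.
function hashInputs(taskName, inputs) {
  const hash = crypto.createHash("sha256");
  hash.update(taskName + "\0");
  for (const file of Object.keys(inputs).sort()) {
    hash.update(file + "\0");
    hash.update(inputs[file] + "\0");
  }
  return hash.digest("hex");
}

// Run `task` only if these exact inputs have not been seen before.
function runCached(taskName, inputs, task) {
  const key = hashInputs(taskName, inputs);
  if (cache.has(key)) return { output: cache.get(key), fromCache: true };
  const output = task(inputs);
  cache.set(key, output);
  return { output, fromCache: false };
}

let runs = 0;
// Hypothetical "build" task that counts how many times it actually runs.
const build = (inputs) => {
  runs += 1;
  return Object.keys(inputs).length + " file(s) built";
};

runCached("build", { "index.js": "console.log(1)" }, build); // runs the task
runCached("build", { "index.js": "console.log(1)" }, build); // cache hit
runCached("build", { "index.js": "console.log(2)" }, build); // input changed, runs again
console.log(runs); // 2
```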

3. Risk of monolithic architecture

Monorepos can lead to tightly coupled dependencies between projects, similar to a monolithic architecture. Changes to shared components or libraries within the monorepo may have unintended consequences across multiple projects, which makes it harder to isolate and manage changes. For this reason, it's important to establish clear guidelines and best practices for managing dependencies and versioning within the monorepo. Also, some tests wouldn't hurt to make sure that your changes don't break anything.
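One way to turn such guidelines into something enforceable is an automated check on which modules may import which. The minimal sketch below gives each module a hypothetical layer tag and uses a hand-written rule table. Nx ships a lint rule for a similar purpose, but the code below is not that rule's configuration format.

```javascript
// Hypothetical layer tags for each module.
const layers = {
  "@module/common": "shared",
  "@module/core": "domain",
  "@module/web": "app",
};

// Which layers each layer is allowed to import from.
const allowed = {
  app: ["domain", "shared"],
  domain: ["shared"],
  shared: [], // shared code must not reach "up" into other layers
};

// Return an error message for a forbidden import, or null if it is fine.
// Imports within the same layer are allowed.
function checkImport(from, to) {
  const fromLayer = layers[from];
  const toLayer = layers[to];
  if (!fromLayer || !toLayer) return `Unknown module in ${from} -> ${to}`;
  if (fromLayer === toLayer || allowed[fromLayer].includes(toLayer)) return null;
  return `${from} (${fromLayer}) may not import ${to} (${toLayer})`;
}

console.log(checkImport("@module/core", "@module/common")); // null
console.log(checkImport("@module/common", "@module/core"));
// @module/common (shared) may not import @module/core (domain)
```

Running a check like this in CI stops accidental "upward" dependencies before they spread across projects.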

Is a monorepo right for you?

Depending on who you are and what you want to build, a monorepo can be a good idea or not.

| Who you are | What you want to build | Is it a good idea? |
|---|---|---|
| Freelancer | Website landing page | Probably not |
| Developer for a big company | Contribute to a specific part of the company's codebase | Probably yes |
| Junior developer | Something like Facebook, but better | Probably not |
| Freelancer | 5 admin panels for 5 clients that look & function similarly | Probably yes |

Diving deeper into JavaScript monorepos

By the way, you can skip this section if you are here for the basics. But if you want to dive deeper into JavaScript monorepos, keep reading.

You have many options for starting a monorepo in JavaScript, including:

  1. Nx: Nx is a set of extensible dev tools for monorepos. It is a great tool for managing monorepos in JavaScript.
  2. Yarn Workspaces: Yarn Workspaces is a feature of Yarn that allows you to manage multiple packages in a single repository.
  3. Lerna: Lerna is a tool for managing JavaScript projects with multiple packages. It can optimize the workflow around managing multi-package repositories with git and npm.
  4. PNPM: PNPM is a fast, disk space efficient package manager for JavaScript.

For the purpose of this article, though, we will stick to plain JavaScript, npm's built-in workspaces, and NPM scripts. (Yes, we will not use any of the tools mentioned above.)

1. Setting up a monorepo with pure JavaScript and NPM scripts

Imagine we have two modules, @module/a and @module/b, and they are part of a monorepo.

You can initialize a monorepo by creating a new directory and running npm init to create a package.json file.

mkdir monorepo && cd monorepo && npm init -y

Then you can create the modules by creating a directory for each module and running npm init in each directory to create a package.json file.

mkdir -p packages/a packages/b && cd packages/a && npm init -y && cd ../b && npm init -y && cd ../..

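IMPORTANT: make sure each module's package.json has a "type" field set to "module", because the examples below use ES module import syntax. You can do this by running npm pkg set type=module inside each module directory.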

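Then modify the package.json file in the root directory and add the following fields. The workspaces field tells npm where the modules live, and the trailing comma in the original snippet would make the file invalid JSON:

{
  "name": "monorepo",
  "private": true,
  "workspaces": [
    "packages/*"
  ]
}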
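To see which directories a "packages/*" pattern actually picks up, here is a small Node.js sketch. It assumes the root package.json lists its module patterns as an array in npm's workspaces field, and that every pattern has the simple "dir/*" form. npm itself accepts richer glob patterns. `listWorkspaces` is a hypothetical helper, not part of npm.

```javascript
const fs = require("fs");
const path = require("path");

// Resolve simple "dir/*" workspace patterns from a root package.json into
// the module directories that actually contain a package.json.
// Hypothetical helper: npm resolves workspaces itself, with full glob support.
function listWorkspaces(rootDir) {
  const rootPkg = JSON.parse(fs.readFileSync(path.join(rootDir, "package.json"), "utf8"));
  // Only the array form is handled (not Yarn's { packages: [...] } object form).
  const patterns = Array.isArray(rootPkg.workspaces) ? rootPkg.workspaces : [];
  const found = [];
  for (const pattern of patterns) {
    if (!pattern.endsWith("/*")) continue; // only the "dir/*" form is handled
    const parent = path.join(rootDir, pattern.slice(0, -2));
    if (!fs.existsSync(parent)) continue;
    for (const entry of fs.readdirSync(parent, { withFileTypes: true })) {
      const pkgFile = path.join(parent, entry.name, "package.json");
      if (entry.isDirectory() && fs.existsSync(pkgFile)) {
        const { name } = JSON.parse(fs.readFileSync(pkgFile, "utf8"));
        found.push({ name, dir: path.relative(rootDir, path.join(parent, entry.name)) });
      }
    }
  }
  // Sort for a stable, readable listing.
  return found.sort((x, y) => x.dir.localeCompare(y.dir));
}

// When run from the monorepo root, print the discovered modules.
const root = process.cwd();
if (fs.existsSync(path.join(root, "package.json"))) {
  console.log(listWorkspaces(root));
}
```

Directories under packages/ that lack a package.json are skipped, because npm only treats a directory as a module when it has its own manifest.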

Next, inside each module, you can create an index.js file by using the following command in the root directory:

echo "console.log('Hello from module A');" > packages/a/index.js && echo "console.log('Hello from module B');" > packages/b/index.js

Then you can add the following NPM scripts to the package.json file in the root directory:

{
  "scripts": {
    "start:a": "node packages/a/index.js",
    "start:b": "node packages/b/index.js"
  }
}
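Writing one start:* script per module gets repetitive as the number of modules grows. The small sketch below generates that scripts object from a list of module directory names. It follows the same pattern as the hand-written scripts above. `makeStartScripts` is a hypothetical helper, and you would still have to copy its output into the root package.json yourself.

```javascript
// Build a "scripts" object with one start:<name> entry per module,
// mirroring "node packages/<name>/index.js" from the hand-written version.
function makeStartScripts(moduleNames) {
  const scripts = {};
  for (const name of moduleNames) {
    scripts[`start:${name}`] = `node packages/${name}/index.js`;
  }
  return scripts;
}

// Print a JSON snippet ready to merge into the root package.json.
console.log(JSON.stringify({ scripts: makeStartScripts(["a", "b"]) }, null, 2));
```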

Now you can run the following command from the root directory to start module a:

npm run start:a

The output should be:

> monorepo@1.0.0 start:a
> node packages/a/index.js

Hello from module A

Now you can run the following command to start the b module:

npm run start:b

The output should be:

> monorepo@1.0.0 start:b
> node packages/b/index.js

Hello from module B

2. Managing dependencies between modules

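As mentioned earlier, modules can depend on other modules, but how do we manage those dependencies?

You can use the npm link command to create a symbolic link between the modules, which lets you use the code from one module in another. Because the root package.json declares workspaces, running npm install in the root directory also creates these links for you in the root node_modules directory.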

Symbolic links are a type of file that contains a reference to another file or directory in the form of an absolute or relative path and that affects pathname resolution.

For example, you can run the following command from the root directory to link packages/b into packages/a and then return to the root:

cd packages/a && npm link ../b && cd ../..

Then you can modify the index.js file in the packages/a directory to use the code from the packages/b directory. Its package name is b, because npm init -y names a package after its directory:

// packages/a/index.js
import 'b';
console.log('Hello from module A');

Then you can run the following command to start the packages/a module:

npm run start:a

As you can see, you just imported the b module into the a module and it worked. Because b is imported only for its side effect, "Hello from module B" is printed before "Hello from module A". This is how you can manage the dependencies between the modules. If you take a closer look at the node_modules directory, you will see that b is not a copied install but a symbolic link pointing back to packages/b.
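To confirm that the linked package is a symbolic link rather than a copy, you can inspect it with Node's `fs.lstatSync`. Unlike `fs.statSync`, it does not follow links. The sketch assumes you run it from the monorepo root. It checks two candidate locations because the link may end up in packages/a/node_modules/b (from the npm link ../b step) or in the root node_modules/b (from npm workspaces). `describeLink` is a hypothetical helper.

```javascript
const fs = require("fs");
const path = require("path");

// Describe whether `linkPath` is a symbolic link and, if so, its target.
function describeLink(linkPath) {
  let stats;
  try {
    stats = fs.lstatSync(linkPath); // lstat inspects the link itself, not its target
  } catch {
    return `${linkPath}: not found`;
  }
  if (!stats.isSymbolicLink()) return `${linkPath}: not a symbolic link`;
  return `${linkPath} -> ${fs.readlinkSync(linkPath)}`;
}

// Assumed locations, relative to the monorepo root (run this from there).
for (const candidate of ["packages/a/node_modules/b", "node_modules/b"]) {
  console.log(describeLink(path.resolve(candidate)));
}
```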

3. Hoisting dependencies

Assume you need a package called cowsay only in module a. From the root directory, you can run npm install cowsay --workspace=packages/a. You will notice that cowsay is added to the dependencies in packages/a/package.json, but the package itself is installed in the root node_modules directory rather than in packages/a/node_modules. This is called hoisting.

This is because NPM hoists the dependencies to the root node_modules directory to avoid installing the same package multiple times. This can save disk space and reduce the installation time.
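Hoisting works because of how Node.js resolves bare imports. It looks for a node_modules folder in the importing file's directory, then in each parent directory up to the filesystem root. The simplified sketch below shows that lookup order. It ignores NODE_PATH, global fallback folders and package "exports". `lookupPaths` is a hypothetical helper. Node exposes the real search list through `require.resolve.paths()`.

```javascript
const path = require("path");

// List the node_modules directories Node.js searches, nearest first, when a
// file in `fromDir` imports a bare specifier such as "cowsay".
// Like Node, it skips directories that are themselves named node_modules;
// unlike Node, it ignores NODE_PATH and the global fallback folders.
function lookupPaths(fromDir) {
  const dirs = [];
  let current = path.resolve(fromDir);
  while (true) {
    if (path.basename(current) !== "node_modules") {
      dirs.push(path.join(current, "node_modules"));
    }
    const parent = path.dirname(current);
    if (parent === current) break; // reached the filesystem root
    current = parent;
  }
  return dirs;
}

console.log(lookupPaths("/monorepo/packages/a").join("\n"));
```

On Linux or macOS this prints /monorepo/packages/a/node_modules, /monorepo/packages/node_modules, /monorepo/node_modules and /node_modules, one per line. Because cowsay was hoisted into the root node_modules, an import from packages/a finds it at the third step.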

If you want to disable hoisting for a specific package, npm does not offer a per-package option for that. See this issue for more information. I never had a serious reason to use nohoist, but if you need it, Yarn (classic) Workspaces supports a nohoist setting, and pnpm uses an isolated node_modules layout by default, so each package only sees the dependencies it declares. Don't worry about it too much; it's not a big deal, just something to keep in mind.

Conclusion

Depending on who you are and what you want to build, a monorepo can be a good idea or not. It can be a good idea if you are a developer for a big company and you want to contribute to a specific part in a company. It can also be a good idea if you are a freelancer and you want to build software for different clients that look and function in a similar manner, where you could share code between the modules but keep the business logic separate.

For my personal workflows (outside of the big organization I work for), 90% of the time, a monorepo is not needed. I never had a use case for it. I used it once, and it was definitely overkill.

The saying "if you need it, you will know" is true for monorepos.

At Sitecore, though, we use a monorepo for our Search project and it works great for us. We have a lot of shared code between the different parts of the project, and it's very easy to manage the dependencies between the modules. Credit to the team that set it up: it's a great monorepo, and I have learned a lot from it.

I hope this article helped you understand the concept of monorepos and their benefits. If you have any questions or need help setting up a monorepo, feel free to reach out to me. I would be happy to help you.

References