There is no shortage of articles about monorepos, so this one focuses on the practical aspects: performance and merging existing repositories.
This article will answer the following questions in detail:
- What is a monorepo?
- What are the advantages and disadvantages of a monorepo?
- What is the performance of a Git monorepo?
- How do you merge multiple Git repositories into one monorepo while keeping their history?
What is a Monorepo?
A monorepo (short for "monolithic repository") is a repository that contains the source code of several or all projects of a team or company.
Some of the largest Internet companies, such as Google, Facebook, Microsoft, and Twitter, work with monorepos.
Why Do I Know About Monorepos?
At AndroidPIT, we set up a new product in August 2016, consisting of 25 microservices and 23 libraries, as a Git monorepo. In the beginning, all developers were hesitant, but our team decided to try it together. We kept the option open to split the repository again if we were dissatisfied. Two years later – after a mostly positive experience – we decided to merge all other products into monorepos, too.
Advantages and Disadvantages of Monorepos
Two articles on this topic are worth reading:
- Monorepos: Please don’t! – an appeal against monorepos
- Monorepos and the Fallacy of Scale – a defense of monorepos in direct response to the first article
Below are the pros and cons as I have experienced them in day-to-day development, along with my view on the most common criticisms.
What Are the Advantages of a Monorepo?
Reducing effort/costs:
- Fewer repositories:
Developers only need to check out and update one (or a few) repositories. New projects are created in subdirectories and automatically distributed to the other developers without them having to check out additional repos.
- Consistency of the overall project:
Developers have a consistent status of the overall project at all times – not only on the main branch but also on feature branches and old commits. On the other hand, with many individual repositories, incompatibilities can occur quickly.
- Atomic commits:
Developers can commit related changes to multiple subprojects together.
- Easier merging:
Creating and merging branches for changes that involve multiple subprojects is time-consuming if the projects are stored in separate repositories. In a monorepo, you only have to merge once.
- Easier resolution of merge conflicts:
Handling of merge conflicts is simplified since, in a monorepo, the complete code base is in a consistent state while handling the conflict. In the case of several repositories, it may be necessary to bring them to a consistent state manually.
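To make the idea of an atomic commit concrete, here is a minimal sketch on a throwaway repository. The subproject names `lib-a` and `service-b` and their contents are made up for illustration; the point is only that a change to a library and the matching change to its caller land in one commit.

```shell
#!/bin/sh
set -eu

# Throwaway monorepo with two hypothetical subprojects.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

mkdir lib-a service-b
echo 'greet() { echo hello; }' > lib-a/greet.sh
echo 'greet' > service-b/main.sh
git add .
git commit -q -m "Add lib-a and service-b"

# Rename the function in the library and adapt its caller in the same commit,
# so there is no commit in which the two subprojects are out of sync.
echo 'say_hello() { echo hello; }' > lib-a/greet.sh
echo 'say_hello' > service-b/main.sh
git add lib-a service-b
git commit -q -m "Rename greet to say_hello in lib-a and service-b"

# List the files touched by the latest commit: one from each subproject.
changed_files="$(git diff-tree --no-commit-id --name-only -r HEAD)"
echo "$changed_files"
```

With separate repositories, the same change would need two commits, two pushes, and some agreement on the order in which they are deployed.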
Improved maintainability:
- Moving code:
Code can be moved between the directories of a monorepo much more smoothly than from one repository to another, allowing developers to easily adjust module boundaries when requirements change or during refactoring.
- Central documentation:
You can store cross-project documentation in the root folder of the monorepo.
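As a rough illustration of moving code across a module boundary, the following sketch moves a file from one directory to another and then checks that its history is still reachable. The module and file names are hypothetical, and the example assumes the file is moved without content changes so that Git recognizes the rename.

```shell
#!/bin/sh
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

# Two hypothetical modules; util.sh starts out in module-a.
mkdir module-a module-b
echo 'echo shared helper' > module-a/util.sh
echo 'echo module b' > module-b/main.sh
git add .
git commit -q -m "Add module-a and module-b"

# Move the helper across the module boundary in a single commit.
git mv module-a/util.sh module-b/util.sh
git commit -q -m "Move util.sh from module-a to module-b"

# --follow continues the file's history across the rename,
# so the commit that originally added it is still listed.
history="$(git log --follow --format=%s -- module-b/util.sh)"
echo "$history"
```

Between two separate repositories, the same move would either lose the file's history or require a history rewrite like the one shown later in this article.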
Organizational:
- Transparency:
Monorepos give all developers access to the entire code base. In most cases, this is desirable: it encourages cross-team collaboration and lets developers contribute to code from other teams.
- Reduced configuration:
Monorepos reduce the configuration effort of the source code management tools.
What Are the Disadvantages of a Monorepo?
Performance / scalability:
- Performance problems:
Critics complain that monorepos lead to performance problems. I haven't noticed any performance degradation in projects of the size most of us work with (see the section "Performance and Scalability of a Git Monorepo" below).
- Increased storage requirements:
Critics also point to increased storage requirements, since each developer must clone the entire code base. Storage is probably irrelevant in most projects nowadays: the source code of the Linux kernel, for example, occupies about 3.5 GB, and the repositories of most companies are probably smaller.
- More complex builds:
Monorepos supposedly make the build pipeline more complex. That's not my experience either. You can easily build individual subprojects using sparse checkouts and polling filters.
- Complicated versioning:
Allegedly, versioning becomes more complicated. I can't confirm this either. You can version the complete code base (with tags like "v1.0") as well as single subprojects (with tags like "project-a-v1.0").
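The two points about builds and versioning can be sketched in a few commands. The example below is only an illustration on a throwaway repository: the README files are placeholders, and the sparse-checkout part assumes Git 2.25 or newer, where the `git sparse-checkout` command is available.

```shell
#!/bin/sh
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

mkdir project-a project-b
echo 'a' > project-a/README.md
echo 'b' > project-b/README.md
git add .
git commit -q -m "Add project-a and project-b"

# Version the whole code base and a single subproject independently.
git tag v1.0
git tag project-a-v1.0

# List only the tags that belong to project-a.
project_a_tags="$(git tag --list 'project-a-*')"
echo "$project_a_tags"

# Restrict the working tree to project-a, for example in a build job
# that only needs this subproject (assumes Git 2.25 or newer).
git sparse-checkout init --cone
git sparse-checkout set project-a
ls
```

A build job for project B would run the same two sparse-checkout commands with `project-b`; `git sparse-checkout disable` restores the full working tree.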
Architecture:
- Poorer modularization:
Some people believe that a monorepo is an obstacle to clean and effective modularization since developers are not forced to split the code into separate repositories. I see this as an advantage rather than a drawback: in a monorepo, module boundaries are not carved in stone (see "Improved maintainability" above).
Organizational:
- Transparency:
The aspect listed under "advantages", that all developers have access to the entire code base, is not desired in all organizations. For example, a contract developer should not store projects for different customers in the same monorepo.
- Open-source libraries:
If subprojects are to be released as open-source components, a monorepo gets in the way.
Performance and Scalability of a Git Monorepo
A widespread criticism of monorepos is their supposedly poor performance and scalability.
From my experience of the last three years, I can say that there should be no performance problems in most teams. In most companies, the entire codebase is probably smaller than the Linux kernel (17.9 million lines of code in 47,000 C/C++ files according to cloc; and 5.7 million deltas), which is also managed by Git (and for which Git was originally developed).
Currently, I am working on a project at IONOS. The Java backend monorepo is smaller than the Linux kernel but has a comparable order of magnitude. Cloning the entire repository takes less than a minute, and switching branches takes between 0.2 and 1.5 seconds, depending on the age of the branch.
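If you want to see how your own repository compares, a few figures are easy to collect. The following is a rough sketch, not a benchmark: the helper names `repo_stats` and `clone_seconds` are made up for this example, the timing has only whole-second resolution, and a clone from a local path will be faster than one over the network.

```shell
#!/bin/sh
set -eu

# Print a few size figures for the Git repository at the given path.
repo_stats() {
  echo "tracked files: $(git -C "$1" ls-files | wc -l | tr -d ' ')"
  echo "commits: $(git -C "$1" rev-list --count HEAD)"
  # count-objects reports loose ("size") and packed ("size-pack") object sizes.
  git -C "$1" count-objects -v -H
}

# Measure how long a fresh clone takes, in whole seconds.
# --no-local makes Git use its regular transport instead of copying files.
clone_seconds() {
  target="$(mktemp -d)"
  start="$(date +%s)"
  git clone -q --no-local "$1" "$target/clone"
  end="$(date +%s)"
  echo "$((end - start))"
}

# Demo on a throwaway repository; point both functions at your own repository instead.
demo="$(mktemp -d)"
git -C "$demo" init -q
git -C "$demo" config user.name "Demo User"
git -C "$demo" config user.email "demo@example.com"
echo 'a' > "$demo/a.txt"
echo 'b' > "$demo/b.txt"
git -C "$demo" add .
git -C "$demo" commit -q -m "Add two files"

stats="$(repo_stats "$demo")"
echo "$stats"
echo "clone took $(clone_seconds "$demo") s"
```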
Most of us work in medium-sized companies, so I don't expect any practical problems with the scalability of monorepos in the foreseeable future.
Merging Multiple Git Repositories While Retaining Their History
Let's say we have two projects, "project-a" (created as an example at https://github.com/SvenWoltmann/project-a) and "project-b" (https://github.com/SvenWoltmann/project-b), which we want to combine into a repository named "sparse-checkout-demo". We want to place each project in a subdirectory named after the project.
First, we have to check out both projects (if they aren't checked out already), create the directory for the monorepo, and initialize a Git repository there (alternatively, create one via GitLab and clone it):
git clone git@github.com:SvenWoltmann/project-a.git
git clone git@github.com:SvenWoltmann/project-b.git
mkdir sparse-checkout-demo
git init sparse-checkout-demo
With the following commands, we merge project A into the monorepo:
cd project-a
git filter-branch -f --prune-empty --tag-name-filter cat --tree-filter '
mkdir -p project-a
git ls-tree --name-only $GIT_COMMIT | xargs -I{} mv {} project-a
'
cd ../sparse-checkout-demo
git remote add project-a ../project-a
git fetch project-a
git merge --allow-unrelated-histories project-a/main
git remote rm project-a
We repeat the same for project B:
cd ../project-b
git filter-branch -f --prune-empty --tag-name-filter cat --tree-filter '
mkdir -p project-b
git ls-tree --name-only $GIT_COMMIT | xargs -I{} mv {} project-b
'
cd ../sparse-checkout-demo
git remote add project-b ../project-b
git fetch project-b
git merge --allow-unrelated-histories project-b/main
git remote rm project-b
That's it!
Using dir and git log, you can see that both subprojects and their commits are included in the monorepo.
In this example, the merging went pretty fast. For larger projects, be prepared for the process to take a few hours. At AndroidPIT, when we combined the modules that make up the website, it took about three and a half hours.
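To try the procedure without touching real repositories, the steps above can be sketched as a script that works on throwaway local repositories. This is a dry run under a few assumptions: the helper functions `make_project` and `merge_project` are made up for this example, the stand-in projects contain a single README file, `--no-edit` is added so the merge does not open an editor, and `FILTER_BRANCH_SQUELCH_WARNING` is set to skip the notice that newer Git versions print before running filter-branch.

```shell
#!/bin/sh
set -eu

# Skip the deprecation notice (and pause) of newer Git versions.
export FILTER_BRANCH_SQUELCH_WARNING=1

work="$(mktemp -d)"
cd "$work"

# Create a throwaway stand-in for one of the project repositories.
make_project() {
  git init -q "$1"
  git -C "$1" symbolic-ref HEAD refs/heads/main
  git -C "$1" config user.name "Demo User"
  git -C "$1" config user.email "demo@example.com"
  echo "$1" > "$1/README.md"
  git -C "$1" add README.md
  git -C "$1" commit -q -m "Initial commit of $1"
}

# Rewrite a project's history into a subdirectory of the same name,
# then fetch and merge it into the monorepo (same commands as above).
merge_project() {
  (
    cd "$1"
    git filter-branch -f --prune-empty --tag-name-filter cat --tree-filter "
      mkdir -p $1
      git ls-tree --name-only \$GIT_COMMIT | xargs -I{} mv {} $1
    " > /dev/null
  )
  git -C sparse-checkout-demo remote add "$1" "../$1"
  git -C sparse-checkout-demo fetch -q "$1"
  git -C sparse-checkout-demo merge -q --no-edit --allow-unrelated-histories "$1/main"
  git -C sparse-checkout-demo remote rm "$1"
}

make_project project-a
make_project project-b

git init -q sparse-checkout-demo
git -C sparse-checkout-demo config user.name "Demo User"
git -C sparse-checkout-demo config user.email "demo@example.com"

merge_project project-a
merge_project project-b

# Both subdirectories and both original commits should now be present.
ls sparse-checkout-demo
git -C sparse-checkout-demo log --oneline --graph
```

The log should show the two original commits joined by one merge commit. Running the script against copies of your real repositories first is a cheap way to estimate how long the rewrite will take.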
Summary
In this article, I've compared the pros and cons of Git monorepos. In my opinion, the pros outweigh the cons, so monorepos are an excellent choice for most teams. The often criticized performance and scalability issues don't affect the teams of most organizations, and easier refactoring of code across module boundaries increases the maintainability of code.
I have shown, step by step, how to merge existing repositories while preserving their history.
If you don't want to have all projects in a single repository, you can limit monorepos to those projects that are closely interwoven and where shared branching and merging make sense. You can also convert your projects into monorepos one by one and, just as easily, run monorepos and single projects side by side.