Monorepos – advantages and disadvantages

Originally, I wanted to publish the third part of my article series about Static Code Analysis today. However, that part turned out to be more extensive than the first two, and I couldn’t finish it in two weeks. So today, I would like to share my experience with Git monorepos instead.

Since there are already many articles on this subject, I will focus on three practical aspects: performance, repository merging, and Jenkins configuration for projects within monorepos.

In this article, I will answer the following questions:

  • What is a monorepo?
  • What are the advantages and disadvantages of a monorepo?
  • What is the performance of a Git monorepo?
  • How can I merge multiple Git repositories into one monorepo while keeping their history?
  • How do I configure Jenkins to build individual modules of a monorepo?

What is a monorepo?

A monorepo (short for “monolithic repository”) is a repository that contains the source code of several or all projects of a team or company.

Some of the largest Internet companies, such as Google, Facebook, Microsoft, and Twitter, work with monorepos.

At AndroidPIT, we set up a new product as a Git monorepo in August 2016; today it consists of 25 microservices and 23 libraries. At the beginning, all developers were hesitant, but we decided to try it out together as a team. We kept the option open to split the repository again if we were dissatisfied. Two years later, after a mostly positive experience, we decided to merge all other products into monorepos as well.

Advantages and disadvantages of monorepos

Two articles on this topic are worth reading:

Below, you will find the pros and cons as I have experienced them in day-to-day development, along with my view on the most common criticisms.

What are the advantages of a monorepo?

Reducing effort/costs:

  • Developers only need to check out and update one (or a few) repositories. New projects are created in subdirectories and automatically distributed to the other developers without them having to check out additional repositories.
  • Developers have a consistent state of the overall project at all times, not only on master but also on branches and in old commits. With many individual repositories, by contrast, incompatibilities can creep in quickly.
  • Atomic Commits: Related changes to multiple subprojects can be checked in together.
  • Creating and merging branches for changes that involve multiple subprojects is time-consuming if the projects are stored in separate repositories. In a monorepo, you only have to merge once.
  • Handling of merge conflicts that refer to earlier states of the code base is simplified since in a monorepo the complete code base is in a consistent state while handling the conflict. In the case of several repositories, it may be necessary to bring them to a consistent state manually.

Improved maintainability:

  • Code can be moved between the directories of a monorepo much more smoothly than from one repository to another. This lets developers easily adjust module boundaries when requirements change or during refactorings.
  • Cross-project documentation can be stored in the root folder of the monorepo.
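
As a minimal sketch of the first point: inside a monorepo, moving code from one subproject to another is a single `git mv` followed by a single commit. The helper name `move_within_monorepo` and the paths in the usage note are hypothetical, and the function assumes it is called from within the monorepo's working tree.

```shell
# Hypothetical helper: move a file or directory from one subproject to
# another and record the move as a single commit.
# Usage: move_within_monorepo <source path> <destination path>
move_within_monorepo() {
    src=$1
    dst=$2
    # git mv does not create missing parent directories of the destination
    mkdir -p "$(dirname "$dst")" &&
    git mv "$src" "$dst" &&
    git commit -q -m "Move $src to $dst"
}
```

For example, `move_within_monorepo project-a/src/Util.java lib-common/src/Util.java` moves one class into a (made-up) shared library. Afterwards, `git log --follow -- lib-common/src/Util.java` should also list the commits from before the move. Git detects renames by content similarity, so this works best when the move commit does not change the file much.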

Organizational:

  • With a monorepo, all developers have access to the entire code base. In most cases, this is desirable: it encourages cross-team collaboration and allows developers to contribute to code from other teams.
  • Monorepos reduce the configuration effort of the source code management tools.

What are the (alleged) disadvantages of a monorepo?

Performance / scalability:

  • Critics complain that monorepos lead to performance problems. I haven’t noticed any performance degradation in projects of the size most of us work with (see the following section “Performance and scalability of a Git monorepo”).
  • Critics also point to increased disk space requirements, since each developer must check out the entire code base. Disk space is probably irrelevant in most projects today: the source code of the Linux kernel, for example, occupies about 3.5 GB, and the repositories of most companies are probably smaller.
  • Monorepos supposedly make the build pipeline more complex. That’s not my experience either. Using sparse checkouts and polling filters, you can easily build subprojects individually (see section “Jenkins configuration for a Git monorepo”).
  • Allegedly, versioning becomes more complicated. I can’t see that either. You can version the complete code base (with tags like “v1.0”) as well as individual subprojects (with tags like “project-a-v1.0”).

Architecture:

  • Some people believe that a monorepo is an obstacle to clean and effective modularization since developers are not forced to split the code into separate repositories. I see this as an advantage because module boundaries are not carved in stone by monorepos (see “Improved maintainability” above).

Organizational:

  • The advantage listed above, that all developers have access to the entire code base, is not desired in every organization. For example, a contract developer should not store projects for different customers in the same monorepo.
  • A monorepo also gets in the way if subprojects are to be released as open-source components.

Performance and scalability of a Git monorepo

A widespread criticism of monorepos is their allegedly poor performance and scalability.

From my experience over the last three years, I can say that most teams should not run into performance problems. In most companies, the entire codebase is probably smaller than the Linux kernel, which is also managed with Git (and for which Git was originally developed).

Our repository at AndroidPIT contains almost 5,000 Java and JavaScript files with about 400,000 lines of code (Linux kernel: 17.9 million lines of code in 47,000 C/C++ files according to cloc). Our repository contains about 500,000 objects (Linux: 6.9 million) and nearly 300,000 deltas (Linux: 5.7 million). Cloning the entire repository takes just under a minute. Switching between branches generally takes less than a second.

Currently, I am working on a project at 1&1 IONOS. The size of their Java backend monorepo is between AndroidPIT’s and that of the Linux kernel. Cloning the entire repository takes less than a minute. Switching branches takes between 0.2 and 1.5 seconds, depending on the age of the branch.
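
If you want to collect comparable figures for your own repository, the following sketch may help. It is only an approximation of how such numbers can be obtained: `git count-objects -v` reports loose and packed objects, and the helper name `object_count` is made up for this example.

```shell
# Hypothetical helper: print the total number of objects in a repository
# (loose objects plus objects stored in pack files).
# Usage: object_count [path to repository]
object_count() {
    git -C "${1:-.}" count-objects -v |
        awk '$1 == "count:" || $1 == "in-pack:" { total += $2 } END { print total + 0 }'
}
```

The timings can be measured by prefixing the respective commands with `time`, for example `time git clone <repository URL>` and `time git checkout <branch>`.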

Most of us work in medium-sized companies, so I don’t expect any practical problems with the scalability of monorepos in the foreseeable future.

Merging multiple Git repositories while retaining their history

Let’s say we have two projects, “project-a” (created as an example at https://gitlab.com/SvenWoltmann/project-a) and “project-b” (https://gitlab.com/SvenWoltmann/project-b), which we want to combine into a repository named “sparse-checkout-demo”. We want to place the two projects in subdirectories named after the respective projects.

First, we have to clone both projects (if we haven’t done so already), create the directory for the monorepo, and initialize a Git repository there (alternatively, create one via GitLab and clone it):

git clone git@gitlab.com:SvenWoltmann/project-a.git
git clone git@gitlab.com:SvenWoltmann/project-b.git

mkdir sparse-checkout-demo
git init sparse-checkout-demo

You should see the following output:

Merging Git repositories – step 1

With the following commands, we merge project A into the monorepo:

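# Rewrite the history of project A so that, in every commit,
# all files are located in the subdirectory project-a/
cd project-a
git filter-branch -f --prune-empty --tag-name-filter cat --tree-filter '
    mkdir -p project-a
    git ls-tree --name-only "$GIT_COMMIT" | xargs -I{} mv {} project-a
'

# Fetch the rewritten history into the monorepo and merge it
cd ../sparse-checkout-demo
git remote add project-a ../project-a
git fetch project-a
git merge --allow-unrelated-histories project-a/master
git remote rm project-a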

You should see the following output:

Merging Git repositories – step 2

We repeat the same for project B:

# Rewrite the history of project B so that all files are located in project-b/
cd ../project-b
git filter-branch -f --prune-empty --tag-name-filter cat --tree-filter '
    mkdir -p project-b
    git ls-tree --name-only "$GIT_COMMIT" | xargs -I{} mv {} project-b
'

# Fetch the rewritten history into the monorepo and merge it
cd ../sparse-checkout-demo
git remote add project-b ../project-b
git fetch project-b
git merge --allow-unrelated-histories project-b/master
git remote rm project-b

The following screenshot shows the corresponding output:

Merging Git repositories – step 3
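
If you have more than two repositories to merge, the steps above can be wrapped in a function. The following is a sketch under a few assumptions: the name `merge_into_monorepo`, the `SUBDIR` variable, and the explicit merge message are my additions for this example, the branch to merge is `master`, and the source repository does not already contain a top-level entry with the same name as the target subdirectory. `FILTER_BRANCH_SQUELCH_WARNING` suppresses the warning (and pause) that newer Git versions show before running `git filter-branch`.

```shell
# Hypothetical helper: move all files of a repository into a subdirectory
# (rewriting its history) and merge the result into a monorepo.
# Usage: merge_into_monorepo <absolute path to repo> <subdirectory> <path to monorepo>
# Note: this rewrites the history of the local clone at <absolute path to repo>.
merge_into_monorepo() {
    repo=$1
    name=$2
    monorepo=$3
    (
        cd "$repo" || exit 1
        # SUBDIR is exported so that the tree filter can read it
        export SUBDIR="$name" FILTER_BRANCH_SQUELCH_WARNING=1
        git filter-branch -f --prune-empty --tag-name-filter cat --tree-filter '
            mkdir -p "$SUBDIR"
            git ls-tree --name-only "$GIT_COMMIT" | xargs -I{} mv {} "$SUBDIR"
        '
    ) || return 1
    (
        cd "$monorepo" || exit 1
        git remote add "$name" "$repo" &&
        git fetch -q "$name" &&
        git merge -q --allow-unrelated-histories -m "Merge $name" "$name/master" &&
        git remote rm "$name"
    )
}
```

With the two example projects, the calls would look like `merge_into_monorepo "$PWD/project-a" project-a "$PWD/sparse-checkout-demo"` and the same for `project-b`.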

That’s it. Using dir and git log, you can see that both subprojects and their commits are included in the monorepo:

Merging Git repositories – verification
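
The same check can also be scripted. The following sketch (the function name `commits_per_project` is hypothetical) prints, for every top-level directory of the monorepo, how many commits in the history touched it. It has to be run from the root directory of the monorepo.

```shell
# Hypothetical helper: for every top-level directory of the current
# repository, print its name and the number of commits that touched it.
commits_per_project() {
    for dir in */; do
        [ -d "$dir" ] || continue   # the repository has no subdirectories
        printf '%s %s\n' "${dir%/}" "$(git rev-list --count HEAD -- "$dir")"
    done
}
```

If the history was retained, each subproject should show the number of commits it had in its original repository. To look at the commits themselves, `git log --oneline -- project-a` restricts the log to one subproject.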

In this example, the merge was quick. For larger projects, be prepared for the process to take a few hours. At AndroidPIT, combining the modules that make up the website took about three and a half hours.

Jenkins configuration for a Git monorepo

In this section, I’ll show you how to configure Jenkins to build a subproject of a Git monorepo. First, I use the Jenkins user interface. Then I’ll show you how to write the same Jenkins job as code (using the Jenkins Job DSL Plugin). I always recommend the code variant, because you can create uniform jobs for all your projects automatically and reproducibly with little effort.

At this point, I will limit myself to a rudimentary job that checks out and compiles a Maven project from a subdirectory of the master branch of the monorepo. Additional features like selecting a branch or creating and tagging a release are not monorepo-specific and would go beyond the scope of this article. For those, I refer to my Jenkins tutorial on build and release jobs.

Note that tags are global within a repository. We therefore use the format <project name>-<version number> for tags, so that subprojects can be released separately and each tag can be associated with its project.
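
As a sketch of this tagging scheme (the function names `tag_release` and `tags_for_project` are made up for this example, and using annotated tags is my assumption):

```shell
# Hypothetical helpers illustrating the <project name>-<version number> scheme.

# Create an annotated release tag for one subproject, e.g. "project-a-1.0".
tag_release() {
    project=$1
    version=$2
    git tag -a "$project-$version" -m "Release $project $version"
}

# List all release tags that belong to one subproject.
tags_for_project() {
    git tag -l "$1-*"
}
```

After `tag_release project-a 1.0`, the call `tags_for_project project-a` lists all release tags of that subproject. One caveat of the simple prefix match: if one project name is a prefix of another (say, “project-a” and “project-a-api”), the pattern matches the tags of both.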

Configuring the Jenkins Job via the user interface

First, I create a new Maven job and name it “Sparse Checkout Demo.” I enter a short description in the “General” tab:

Configuring the Git monorepo in Jenkins – General settings

Under “Source Code Management,” I select “Git” and specify the repository. The repository used in the example, https://gitlab.com/SvenWoltmann/sparse-checkout-demo, actually exists, so you are welcome to use it for testing. As Branch Specifier, I enter “master.”

Under “Additional Behaviours,” I click on “Add” and select “Sparse Checkout paths.” The path is “project-a/,” one of my two project directories in the demo monorepo.
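
To get a feeling for what a sparse checkout does, you can try it on the command line. The following is a sketch of the classic approach via `core.sparseCheckout`. I am assuming that it roughly corresponds to what Jenkins does internally; the function name `sparse_clone` is hypothetical. Newer Git versions also offer a dedicated `git sparse-checkout` command.

```shell
# Hypothetical helper: clone a repository, but materialize only one path
# of the master branch in the working tree.
# Usage: sparse_clone <repository URL or path> <sparse path> <target directory>
sparse_clone() {
    url=$1
    path=$2
    target=$3
    # Clone without populating the working tree
    git clone -q --no-checkout "$url" "$target" &&
    # Enable sparse checkout and declare which path should be checked out
    git -C "$target" config core.sparseCheckout true &&
    mkdir -p "$target/.git/info" &&
    printf '%s\n' "$path" > "$target/.git/info/sparse-checkout" &&
    # Populate the working tree; only the declared path is written to disk
    git -C "$target" checkout -q master
}
```

For the demo repository, `sparse_clone https://gitlab.com/SvenWoltmann/sparse-checkout-demo.git project-a/ demo` should result in a working tree that contains only the “project-a” directory.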

I click “Add” a second time and select “Polling ignores commits in certain paths.” Under “Included Regions,” I enter my project directory again. This setting causes Jenkins to execute the job only if files in the “project-a” directory are changed.

Configuring the Git monorepo in Jenkins – Source Code Management
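
The effect of the “Included Regions” setting can be approximated on the command line. The following sketch checks whether any file changed between two commits matches a region. It assumes that a region is a regular expression that has to match the complete file path, and it uses `grep -E`, whose syntax differs in some details from the Java regular expressions Jenkins uses. The function name `changed_in_region` is hypothetical.

```shell
# Hypothetical helper: succeed if any file changed between two commits
# matches the given region, e.g. "project-a/.*".
# Usage: changed_in_region <region regex> <old commit> <new commit>
changed_in_region() {
    region=$1
    old=$2
    new=$3
    # The region is anchored so that it has to match the whole path
    git diff --name-only "$old" "$new" | grep -Eq "^($region)\$"
}
```

For example, `changed_in_region 'project-a/.*' HEAD~1 HEAD && echo "build project-a"` prints the message only if the latest commit changed files in the “project-a” directory.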

Finally, under “Build,” I specify “project-a/pom.xml” as Root POM and “clean install” as Maven goal:

Configuring the Git monorepo in Jenkins – Build

The job configuration is now complete, and the job is ready to run.

Creating the Jenkins job as code (Jenkins Job DSL)

You can create the same job with the Jenkins Job DSL as follows (the Jenkins plugin Job DSL must be installed):

mavenJob('Sparse Checkout Demo') {
    description 'This is a demo for building a project in a sub-directory of a Git Monorepo.'

    def sparseCheckoutPath = 'project-a'

    scm {
        git {
            remote {
                name 'origin'
                url 'https://gitlab.com/SvenWoltmann/sparse-checkout-demo.git'
            }

            branch 'master'

            configure { git ->
                git / 'extensions' / 'hudson.plugins.git.extensions.impl.SparseCheckoutPaths' / 'sparseCheckoutPaths' {
                    'hudson.plugins.git.extensions.impl.SparseCheckoutPath' {
                        path "$sparseCheckoutPath/"
                    }
                }
                git / 'extensions' / 'hudson.plugins.git.extensions.impl.PathRestriction' {
                    includedRegions "$sparseCheckoutPath/.*"
                }
            }
        }
    }

    rootPOM "$sparseCheckoutPath/pom.xml"

    goals 'clean install'
}

You have to configure the sparse checkout path and the polling filter via extensions because the Jenkins Job DSL does not natively support these features (the Git plugin adds them).

Conclusion

In this article, I’ve compared the pros and cons of Git monorepos. In my opinion, the pros outweigh the cons, so monorepos are an excellent choice for most teams. The often criticized performance and scalability issues don’t affect the teams of most organizations, and easier refactoring of code across module boundaries increases the maintainability of code.

I’ve shown how to merge repositories while keeping their history and how to configure Jenkins jobs so that you can build subprojects of a monorepo.

If you don’t want to have all projects in a single repository, you can limit monorepos to those projects that are closely interwoven and where common branching and merging make sense. You can also convert your projects into monorepos one by one and, just as easily, run monorepos and single projects side by side.
