Hacker News

If you have many small repos for a large interconnected project, you simply move the complexity of managing a change that spans repos into another tool that can manage cross-repo changes and dependencies. With a single repo you can change something, build it, fix any breaks, and then commit it with just source control and the build system. In my experience, many small repos have been driven by either poor processes or tooling limitations.



a hundred small repos are a different kind of nightmare.

the truth is, having a large code base is just hard no matter which way you handle it. you'll end up with custom repo tooling for the monorepo or blown up CI/CD infrastructure for many small repos either way. complexity will be conserved; it can be transferred, but can't be removed.


I like your comment about commits spanning components. Having many repos forces my team to think about how to stage changes that span many repos.

I've worked with monolithic and project-based repositories, and in practice I've found the problems with monolithic repos are less than the problems with project-based repos, and the benefits of monolithic repos are more than the benefits of project-based repos. Certainly there can be issues at a very large organisation with very many extremely large projects—but most of us don't work at those organisations with that many projects that large.

I think that having one large repo helps identify cross-project dependency breakage faster (e.g. on a small team without fully automated integration tests) by increasing the likelihood that the person or team who broke the integration notices, rather than the person or team who maintains the affected components.

There's also the issue, as jacques_chester notes, of shared components, some of which are far too small to be their own projects and which don't necessarily make sense thrown into a pile with other projects.

Project-specific repos make a lot of sense from an organised, a-place-for-everything-and-everything-in-its-place perspective, but real life is often quite messy and mutable, and the proper organisation for a project can change frequently (as the article notes); there's no sense chiselling it into stone.


It very much is, really. The fact that it's easier doesn't really matter - a repo is about access to the source code and its history with some degree of convenience. The process and policy of how you control actual change is quite orthogonal. You can have a single repo and enforce inter-module interfaces very strongly. You can have 20 repos and not enforce them at all. Same goes for builds, tests, history, etc. The underlying technology can influence the process but it doesn't make it.

Also, there isn't really a lot of great tooling for making sweeping changes across repos.
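That gap usually gets filled with ad-hoc scripts. A minimal sketch of the pattern, assuming a flat directory of checkouts (the layout and helper name are hypothetical, not any particular company's tooling):

```python
"""Run one command across many repo checkouts -- a common stopgap
when there is no real tooling for sweeping cross-repo changes."""
import subprocess
from pathlib import Path


def run_everywhere(root: Path, cmd: list[str]) -> dict[str, int]:
    """Run `cmd` in every direct subdirectory of `root` that is a git
    checkout; return each repo's exit code so failures stay visible."""
    results: dict[str, int] = {}
    for repo in sorted(p for p in root.iterdir() if (p / ".git").exists()):
        proc = subprocess.run(cmd, cwd=repo)
        results[repo.name] = proc.returncode
    return results


# e.g. run_everywhere(Path.home() / "src", ["git", "pull", "--ff-only"])
```

Even with a helper like this, each repo still gets its own commit, review, and CI run; the script only removes the typing.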

You only create the repo once; whether that's easy to do is beside the point. Sure, it's easy. The problem is deploying related changes together across repos. Every way of doing that sucks.

Exactly. I frankly have no clue what this article is about. I work at a company with < 10 developers. We started with one repo and now have over 20 repositories for various bits and pieces of our code. Each one maps to its own releasable component. I don't recognize the 'I don't want to waste time dealing with multiple commands to manage multiple repositories' at all. The only time there is a difference on the CLI is when you clone a repo. If anything about this hurt, we'd change it: optimizing our workflow is something we pay a lot of attention to.

In fact, with hundreds of developers and everything in one repo, I don't see how you'd ever be able to get a commit through: you'd constantly be merging commits that others had just made, and would have to get lucky?


One size rarely fits all.

Moving away from a mono repo has been great for my team: it reduced build times and removed the overhead of understanding decades of cruft built up to make a huge mono repo manageable.

But with turnover of both projects and people, and many projects not requiring active development, there's an awful lot of orphaned repos, with less than one person dedicated to supporting them now.

In my environment at least, you can't just stop development. Packages need to be kept up to date as vulnerabilities are discovered, and using Azure DevOps means we need to move along with changes to the build process. Infrastructure and secrets policy changes come from outside, and require us to make changes.

Making these kinds of changes across a pile of repos is quite a bit of overhead. It's clear now that we went too small on the repo size, given the tools we have for the maintenance tasks we have to do.

And people advocate for even smaller repos...


Big monolithic repos actually increase mental complexity more, in our opinion. Mental complexity really comes down to how much of the system you're holding in your head at once. With smaller repos you might be dealing with 10-20 repos on any given day, but that's only 1-2% of our codebase/system. The other 98% is ignorable. So we end up holding less in our heads, assuming we've abstracted things correctly.

It did get annoying to deal with the mechanics of lots of repos. So we built tooling to make that easier. For example, CLI commands like "goto analytics.js" will clone and take us to the local copy of the repo. And "publish patch" handles all the mechanics of updating History.md from the git log, incrementing the version appropriately in package.json and component.json, tagging the commit and releasing to github and npm. Khaos, also mentioned in Sperandio's article, helps us template out new repos quickly. With a few pieces of tooling like that you can move pretty fast across lots of repos.
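The "goto" half of that tooling can be surprisingly small. As a sketch only - the cache directory, org URL, and exact behaviour here are invented stand-ins, not the commenter's actual implementation:

```python
"""Sketch of a `goto <repo>`-style helper: clone on first use, then
reuse the local checkout. The paths and org URL are invented."""
import subprocess
from pathlib import Path

CACHE = Path.home() / "repos"        # hypothetical local checkout cache
ORG = "git@github.com:example-org"   # hypothetical GitHub organisation


def goto(name: str) -> Path:
    """Return the local path for repo `name`, cloning it if missing."""
    dest = CACHE / name
    if not dest.exists():
        CACHE.mkdir(parents=True, exist_ok=True)
        subprocess.run(["git", "clone", f"{ORG}/{name}.git", str(dest)],
                       check=True)
    return dest
```

A shell wrapper would then `cd "$(goto <name>)"`, since a child process can't change its parent shell's directory.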


I've seen hardly any tools to manage dependencies across multiple repos. Modifying multiple repos at the same time isn't an issue I see many resources devoted to, and managing those cross repo versions is almost never done well. In comparison, both buck and bazel offer pretty mature monorepo management tooling. On the VCS front, you can take native git/HG a long way.

My current team managed to break a single "component" out into a separate repository. Then that repository broke into two, then those broke into other repositories, until we eventually have around 10 or so different repositories that we work on every day.

An average change touches 4 of them, and touching one of them triggers releases of 2 or 3 of them on average. Even building these locally is super tedious, because we don't have any automation in place (nor any formal plan for it) for chain-building these locally.

This is a nightmare scenario for me. A simple change can require 4 pull requests and reviews, half a day to test, and a couple of hours to release.

Yet my team keeps identifying small pieces that can be conceptually separated from the rest of the functionality, even if they are heavily coupled, and makes new repos for these!


The issue with multiple repositories has nothing to do with the number of commands you have to run. As you say, that's the kind of thing that can easily be automated.

The problems arise when you have to combine code from different repositories into a single deployable product. Most of us don't take Amazon's hard-line stance of making absolutely everything a microservice, so we end up with libraries of reusable code that are referenced by multiple projects. But when you store those libraries in separate repositories, it becomes impossible to describe the state of your deployed code without listing the version of every single dependency. That makes it easy for subtle inconsistencies and bugs to creep in, especially when the dependencies are multiple levels deep and are owned by different teams. If everything lives in the same tree, then a single commit ID reproducibly describes a complete system from top to bottom. And you can atomically make changes that cross module boundaries, which is difficult to do safely with separate repositories.
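The "subtle inconsistencies" point can be made concrete: with separate repos, the deployed system is a vector of versions, and nothing forces that vector to be consistent. A small sketch of a skew check (all service, library, and version names are invented for illustration):

```python
"""With libraries in separate repos, each deployable pins its own
dependency versions; version skew between deployables is where the
subtle bugs hide. All names and versions below are invented."""


def find_skew(deployments: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Return every shared library deployed at more than one version."""
    seen: dict[str, set[str]] = {}
    for deps in deployments.values():
        for lib, version in deps.items():
            seen.setdefault(lib, set()).add(version)
    return {lib: vs for lib, vs in seen.items() if len(vs) > 1}


deployments = {
    "webapp":  {"shared-utils": "1.4.2", "auth-lib": "2.0.1"},
    "billing": {"shared-utils": "1.3.9", "auth-lib": "2.0.1"},
}
# shared-utils is live at two versions at once. In a monorepo, the
# analogous question is answered by a single commit ID.
```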

I don't really follow your comment about merging. Pretty much every version control system since forever has been smart enough to realize that, if I make changes only to foo/src/ and you make changes to bar/src/, our changes don't conflict and can be merged automatically without user intervention. (There might be technical difficulties; for example, if you're using Git, I would imagine that trying to view the list of commits of a small subtree of a gigantic repo might not be terribly efficient. But just like the issue of managing multiple repos, that's something that you can solve with better tool support, if you really need to and are motivated enough.)


You assert that it doesn't work; however, we are in a thread about a major company like Uber switching to it, which makes your assertion ring hollow.

I do not understand your point about conflicts, as 3 commits across 3 projects with 3 different CI pathways are going to cause more conflicts than 1 commit in 1 repo. In my experience, managing one code change or project across 3 repos is a 10X difficulty increase in terms of repo management, conflicts, etc. It's not just 3X harder; it's 10X harder to me. The number of times I've seen a spelling mistake/naming difference/etc. in 1 out of 3 repos because the PRs were done separately and no one noticed is too damn high.

The simplicity of having it all together strongly outweighs the benefits of multi repo in most situations IMO. The number of projects/companies/etc that would benefit from some highly engineered microservice-based multi-repo monster is probably less than 100 in my country, and 1000 worldwide.


You're making a lot of assumptions about how such a move would be done which I don't feel are warranted. You're picking the hardest most painful option and then using it to claim the process is painful rather than that the option you chose is painful.

If I was moving many small repos into a single mono repo then I'd do it one repo at a time. Presumably your small repos are independent entities so there's no reason to do a single massive switch. Transition each repo to the new build system inside the existing repo. Once that works then you can transition that repo into the mono-repo and tie together the build systems. No need to stop releases, no massive chance of everything failing, no weeks of debugging while the world is stopped, etc. Rinse and repeat until everything is moved over. Process becomes more optimized and less error prone with each repo that is moved over.


Having to stage changes across multiple repos, and minimize dependencies between them, can definitely create more work up front. The point is that it lessens the recurring cost (not just across time but across many developers) of having those spurious dependencies in the code forever. Code spends more time being maintained than being written, so it's the recurring cost that dominates long-term productivity.

I think this is roughly true, but at a BigCo it’s not really feasible/easy unless you have a monorepo or otherwise extremely good build/integration tooling to deal with many repos (though Go can sort of deal with this)

The issue is that coordinating changes (and, god help you, library releases) across repos is often an utter nightmare with multiple PR/merge builds.


I'm sitting here running git gc and repack on about 50 repos right now, of varying sizes. We just actually combined two of our larger repos into one for productivity reasons, so this article resonates with me a little on that front.

I spend a lot more time on the build and administration side of things than the code side, and I personally prefer more smaller repos. Builds are faster and less error prone, less disk space is used overall (regardless of cloning scheme - I have used them all), and I do believe the separation and inherent difficulty aids quality at the expense of productivity. I'm about the only person in the company who does, though, and that tells me this discussion depends more on how you personally interact with source control than any abstract 'monolithic vs. not' ideal. Or that I'm crazy, but I refuse to accept that.


This is a good way to put it. I think the opposite is also true: going with many repos means you are committing to other teams being on their own to upgrade. It’s difficult to know if you’ve fixed all downstream code if everything isn’t in one repo. That model makes sense for OSS. Not so sure about within companies.

Sure, but then you only have some small portion of the total infrastructure, which adds its own layer of complexity for the people reviewing your changes :P It's all trade-offs, is all I'm saying - I honestly still can't decide between the two, although for all companies under 20 people, I'd for sure stick with a single repo.
