Optimizing the unoptimizable: a journey to faster C++ compile times (vitaut.net)
72 points by coffeeaddict1 | 2024-01-06 | 32 comments




[flagged]

Maybe so, but please don't post unsubstantive comments to HN.

The real solution to fast compilation is to have modules. These kinds of one-off hacks are interesting but will not scale.

Modules are one of the reasons Pascal is fast to compile, and I wish it were more popular than it is.

Even more so in Modula-2 (as the name suggests), where you can compile module definitions separately from their implementations, and the compiler can already perform syntax checks against the APIs defined therein before they are even implemented.

> The real solution to fast compilation is to have modules.

I don't agree. The real solution for fast compilation times is to not have to recompile things. This means onboarding tools like ccache, organizing your project around independent subprojects that eliminate or minimize compile-time dependencies, and leveraging incremental builds.

There's a C++ book somewhere that describes how the subprojects approach lowers build times to a small residue; if my memory doesn't fail me, it was dubbed horizontal architecture. It consists of designing every single subproject to be stand-alone and leaving any integration to the linking stage. I've used it in the past on a legacy C++ project that took slightly over 1h to pull off an end-to-end build, and peeling out 4 or 5 subprojects following the horizontal architecture approach allowed me to cut full end-to-end build times down to slightly over 8 minutes and incremental builds down to less than a minute, without bothering with compiler caches. I'm sure that if I bothered onboarding ccache I could drive incremental build times down to just a few seconds.

If you know what you are doing, you can do a lot without begging for magic.


Nah, C++ needs more than that. There's some ridiculous template-heavy code out there where the majority of time is spent linking. You can't even do a debug build without optimizations on Windows with these programs because the COFF file format can't handle it, even when compiling with "/bigobj".

> Nah, C++ needs more than that. There's some ridiculous template-heavy code out there where the majority of time is spent linking.

C++ does not "need" more than that. You personally might find it more convenient if you don't have to think through your software architecture, but the truth of the matter is that you only need to spend a few minutes looking at your project to speed everything up, which is exactly what everyone does the very moment they feel they have a problem.

It makes no sense at all to demand a whole tech stack to change around you when you can't even spend a few minutes looking for an answer to the problem you have.

> There's some ridiculous template-heavy code out there where the majority of time is spent linking.

That is not a problem. Templates are only special in a build because compilers spend time generating code. Again, you can work around that without any problem at all with the tools available to you for the past two or three decades.

Properly modularizing your app with explicit template instantiation already drives down the build time of any naive project structure to a fraction of the time, and basic stuff like onboarding a compiler cache tool and moving template code out of interface headers is enough to get the linking stage to be the most expensive step of a build.
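The pattern referred to looks roughly like this (names are hypothetical): an explicit instantiation *declaration* in the header stops every including TU from regenerating the code, and a single TU owns the one explicit instantiation *definition*.

```cpp
#include <cstdio>

// square.h: the template's declaration plus an explicit instantiation
// declaration. Includers may use square<int> but must not generate
// code for it themselves.
template <typename T> T square(T x);
extern template int square<int>(int);

// square.cpp: the definition and the single explicit instantiation
// definition. Only this TU spends compile time on codegen.
template <typename T> T square(T x) { return x * x; }
template int square<int>(int);

int main() { std::printf("%d\n", square(7)); }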

> You can't even do a debug build without optimizations on Windows with these programs because the COFF file format can't handle it, even when compiling with "/bigobj".

I'm sorry but this is simply not true. I suggest you start looking at your projects to break them down into subprojects and see where the critical path of your build is. There is absolutely no project in the world whose build would not take less than a minute (or even a few seconds) with incremental builds, even if it's code that uses template metaprogramming extensively.


I agree with the thrust of your argument that there are a lot of compile-time-encapsulation tools you can employ to reduce build times.

However, I will still challenge you on your build time/object size claim. Take a codebase structured around lazy tasks (i.e. every function is a coroutine), factor in runtime parallelism/SIMD dispatch, multiply with loop abstractions (think std::ranges) and you get frighteningly close to linker limits in no time (which, of course, also implies minutes of compilation per source file). Each of those pieces is templates through and through, and there's no explicit template instantiation way out of any of them.

Also, I have to (sadly) point out that explicit template instantiation declarations only take you so far. Yes, they reduce compiler time spent in codegen, but also cause the compiler to create all required transitive (!) declarations in every TU, which will quickly offset those gains when talking about template types that have lots of small member functions.


> However, I will still challenge you on your build time/object size claim. (...)

Nothing in your example suggests long compilation times are a hard requirement. Just move your components into submodules and remove template code from interfaces. Your build only needs to recompile what you tell it to recompile.

> Also, I have to (sadly) point out that explicit template instantiation declarations only take you so far.

It takes you as far as you need to go. You instantiate what you need, you move your template code into submodules and out of interface headers, and you're set. This is not sorcery. It's common sense.

I recommend you read "Large-Scale C++, Volume I: Process and Architecture" to get acquainted with basic C++ techniques for structuring your project so that builds don't waste time chewing through code they don't need to touch.


I am well aware of the techniques you mention, and intimately familiar with inspecting include trees under the light of "does it make sense for this code to be recompiled if this code changes?", particularly across module boundaries. To clarify, I'm talking about cases of multiplicative object code growth from layering templated abstractions (as per my reply) causing a single object file with a clear singular purpose to grow beyond linker limits. Not because it does too many things or needlessly compiles in a whole bunch of mistakenly-in-interface-headers templates from other modules all over the place; those would be easy to fix. But this case is not solved by "move code into submodules" (there is nothing to further sub-modularize) or "move templates out of headers" (that translation unit uses and instantiates precisely the templates that are reasonable to involve). Yes, those are the easiest fat to trim from compilation times, I know and agree, but I'm trying to point out to you that not every template in a header is a mistake, and that object code size can grow beyond expectations even if those basic techniques have been exhausted. (And to be clear, we of course use incremental builds, but that's not even relevant to object size limits.)

Again, I am not objecting to the general advice you give, but maybe you can take a step back and appreciate that not every code base and C++ use case falls into the range of what you have seen and interacted with so far. There is no doubt that a vast majority of C++ code bases would benefit immensely from improving compile-time encapsulation, but that is not the be-all-end-all of solving long compilation times, and it does not give you grounds for dismissing concerns of people who have done that and still face different problems.

> You instantiate what you need, you move your template code into submodules and out of interface headers, and you're set.

You seem to have completely missed my point. I implore you to consider for a second that I might be familiar with what you are trying to tell me, and that there are indeed complications beyond the basic techniques you advocate for. Let me illustrate with an example.

https://godbolt.org/z/5j43WrM68

Here we have a very simple class that uses a few std library types. It is a template, but say that we know that it is only valid to use it with the shown basic arithmetic types (note 1). The class implementation and according explicit template instantiation definitions have therefore been moved to a separate TU (not shown) and we only have the shown code in the header. This is a simple application of those "basic C++ techniques" you mention.

What happens to the compilation time if we enable the explicit template instantiation declarations in the header? Measuring on godbolt is noisy, but by repeatedly changing the source file (e.g. just adding spaces) you can quickly get a bunch of measurements. The lowest value of 10-20 measurements is going to be a reasonable proxy for real-world compilation performance. And if you perform these measurements with and without the ETIDeclarations enabled, you will notice that they cause this short piece of code to take 100-200 ms longer to compile. Mind you, these are explicit template instantiation declarations - they tell the compiler that it does not have to generate code for these template instantiations. However, they do still force the compiler to instantiate all the declarations of all involved transitive templates. Explicit-template-instantiation-declarating X<int> requires the compiler to (internally, in some abstract way) note down the existence of every last std::vector<int>::const_reverse_iterator::operator-=(size_t) and so on (note 2). That's a lot of nested templates that it needs to (abstractly) "declare", even if it never has to actually generate code for them, which explains why the mere presence of the ETIDeclarations slows down compilation measurably - in every single TU that includes this header!

Note 1: It is, in my experience, relatively rare that the set of valid template arguments is closed rather than open; in a way, an open set of permissible types is the whole point of writing templates. And such templates intentionally being provided across a module boundary is anything but rare for lower-level libraries in a bigger code base.

Note 2: It's easy to underestimate the amount of template nesting in e.g. the standard library too. Here's a single typedef in std::vector (with two further levels of typedefs expanded):

  using const_reverse_iterator = _STD reverse_iterator<_Vector_const_iterator<_Vector_val<conditional_t<_Is_simple_alloc_v<_Alty>, _Simple_types<_Ty>,
        _Vec_iter_types<_Ty, size_type, difference_type, pointer, const_pointer, _Ty&, const _Ty&>>>>>;

Somehow my Fedora laptop seemed to automagically use ccache, which made me happy.

Modules have been available since C++20, and C++23 brings the whole standard library in a single import std;. This is already available in VC++ and in clang 17 with CMake, with GCC 14 catching up.

[dead]

I agree that modules is the right long-term solution and in fact {fmt} is modularized: https://vitaut.net/posts/2023/cxx20-modules-in-clang/.

So why on earth is std::string so slow to compile, if it's possible to compile this full-featured formatting library in 1/4 of the time?

AFAICS the main problem is that <string> pulls in a lot of dependencies.

Surely this is an implementation detail that vendors can change

Implementations can be improved somewhat but there is a certain set of dependencies that must be included according to the standard and for string it's pretty big.

It isn't when using modules: import std; brings in the whole C++ standard library faster than that #include.

EDIT:

This compiles in 1 second on an i7 laptop.

    import std;

    int main()
    {
        std::cout << "Hello World!\n";
    }

Unfortunately the std module alone doesn't help much, because 1 second is 3x slower than before the optimization described in the post, but maybe more fine-grained modules will (or a faster import of the std module).

That 1 second is for the whole standard library, everything.

Also, the Microsoft Office team has been migrating to modules, with great build improvements.


Any place where I can read up more about this?

https://devblogs.microsoft.com/cppblog/integrating-c-header-...

https://devblogs.microsoft.com/cppblog/integrating-c-header-...

Key takeaway:

"Fortunately, we were able to show a build performance improvement great enough that the team agreed to adopt header units into the Office production build system alongside msvc 17.6.6!"

Unfortunately the performance-findings blog post is still WIP.


[dead]

Not to be "that guy", but... one second is a VERY VERY long time to compile a hello world program.

That includes compiling the complete C++ standard library from scratch on first use, which I should probably have mentioned.

What's the incremental compile time? If you just change that "Hello World" string to something else, does it compile faster the second time?

Looks like the stdio example includes compilation and linking, fmt example only times compilation.

Good catch, thanks! Fixed now. This explains why the difference was kinda low compared to another benchmark: https://github.com/fmtlib/fmt?tab=readme-ov-file#compile-tim....
