Hacker Read

arc776 · 2019-09-24 05:23:44

This is a bit like saying you wouldn't use LLVM languages because of the semantics of IR, or that you have to understand the guarantees IR provides, isn't it?

Ultimately if you're really interested in performance, regardless of the stages of compilation, the juice is the machine code output at the end.

In terms of guarantees, those should be satisfied higher up in the language itself, with C generation being output according to Nim's CGen spec. The compiler is fully open source though and easy to dig into.

Having said that, the CGen output is fairly readable if you're familiar with C and I must say I've investigated it when I wasn't sure how something was generated.

arc-in-space | karma 771 | avg karma 3.46 · | 2020-12-02 14:27:22

It's somewhat more constricting: certain types of compiler magic are hard to reproduce in plain C. For instance, dealing with things that need to be stack-aware, such as a GC or tail recursion, takes epic hacks like the Boehm GC or the "Cheney on the MTA" trick. You're also stuck with the C calling convention.
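
To make the tail-recursion point concrete: one common workaround in generated C is a trampoline, which is simpler than (and not the same as) the "Cheney on the MTA" trick. What follows is only a minimal sketch, assuming a generator that turns every tail call into a returned record. The names `thunk_t`, `is_even_step`, `is_odd_step`, `trampoline`, and `is_even` are hypothetical and not taken from any real compiler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A pending tail call: which function to run next and with what argument.
   next == NULL means the computation is finished and `result` is valid. */
typedef struct thunk thunk_t;
struct thunk {
    thunk_t (*next)(uint64_t n);
    uint64_t n;
    uint64_t result;
};

static thunk_t is_odd_step(uint64_t n);

/* Mutually recursive in the source language; here each "call" is
   returned to the driver loop instead of being made directly. */
static thunk_t is_even_step(uint64_t n) {
    if (n == 0) return (thunk_t){ NULL, 0, 1 };
    return (thunk_t){ is_odd_step, n - 1, 0 };
}

static thunk_t is_odd_step(uint64_t n) {
    if (n == 0) return (thunk_t){ NULL, 0, 0 };
    return (thunk_t){ is_even_step, n - 1, 0 };
}

/* The trampoline: keeps bouncing until no call is pending, so the
   C stack never grows no matter how deep the logical recursion is. */
uint64_t trampoline(thunk_t t) {
    while (t.next != NULL) {
        t = t.next(t.n);
    }
    return t.result;
}

bool is_even(uint64_t n) {
    uint64_t r = trampoline((thunk_t){ is_even_step, n, 0 });
    assert(r == 0 || r == 1); /* the step functions only produce 0 or 1 */
    return r == 1;
}
```

Something like `is_even(1000000)` then runs a million logical calls at constant C stack depth. The price is an indirect call per step, which is the sort of overhead a backend with real tail-call support would not have to pay.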

Plus, while generating C is simple, you're entirely on your own. With LLVM IR, you benefit from its infrastructure, the optimization passes and other LLVM-based tools, so you don't need to reinvent the wheel as much. In the end, you'll need some kind of IR anyway, and LLVM is a good place to start.

pizlonator | karma 4077 | avg karma 4.97 · | 2020-04-21 19:49:51+00:00

Still not as good as emitting C code in most cases? C code gets optimized using either LLVM or any other optimizer, so it’s a more portable compile target.

azakai | karma 7684 | avg karma 3.12 · | 2015-11-29 19:07:35

A long-time disagreement between us :)

> [in C] It's very difficult to avoid undefined behavior,

True, but LLVM IR and other IRs have undefined behavior as well.

> you won't be able to modify the compiler backend (and you will want to as you start optimizing),

Not all languages need backend changes. And you can still add IR optimizations or backend changes that help your language, if you do. Yes, this might not be as easy, but then if you compile to LLVM IR, you might need IR changes for your special things anyhow.

> you can't add new language intrinsics, good GC is pretty much incompatible with that,

Good point, for most GC languages compiling to C is not a good option, which makes sense since C has no GC support.

> and precise control over debug info is impossible.

Yes, but using the C preprocessor you can get pretty far.
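
One way to read the preprocessor remark is the standard `#line` directive, which lets generated C report positions in the original source file. This is a small sketch of that idea only. The file name `example.src`, the line number 42, and the names `source_loc` and `where_am_i` are made up for illustration.

```c
#include <assert.h>
#include <string.h>

typedef struct {
    const char *file;
    int         line;
} source_loc;

/* Hypothetical generated function. The directive tells the C compiler
   (and therefore its diagnostics and debug info) that the next line
   is line 42 of "example.src". */
source_loc where_am_i(void) {
#line 42 "example.src"
    source_loc loc = { __FILE__, __LINE__ };
    /* Self-check of the remapping; a real generator would not emit this. */
    assert(loc.line == 42);
    assert(strcmp(loc.file, "example.src") == 0);
    return loc;
}
```

This covers file and line mapping, but not variable names, types, or scopes, which is roughly where "pretty far" stops and "precise control" would begin.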

> Compile to LLVM IR and/or GCC IR instead.

More power that way, sure, but

1. It limits you to one compiler, while C has many.

2. LLVM IR changes over time (and recently had plenty of examples of this), so you'll need to track that. Whereas C is extremely stable, so long-term, it's much less work.

3. Emitting C is very easy (for those familiar with C), and also very easy to debug.

I think compiling to C is an excellent option. Sometimes better, sometimes worse.

moosingin3space | karma 1288 | avg karma 2.67 · | 2021-02-09 02:19:04+00:00

For one, respected community member pcwalton has argued against compiling to C, citing the difficulty in producing performant, memory-safe output.

For another, from personal experience, I can comment that compiling to C in a way that doesn't leak abstractions left and right is quite challenging. It's pretty hard to produce memory-safe C, so it's a ton of work, and the payoff is pretty marginal, as the most important platforms are already supported by LLVM.

ziotom78 | karma 1029 | avg karma 3.42 · | 2016-01-09 07:23:06+00:00

As camgunz said, emitting C code makes your compiler far more portable, as you can find C compilers for virtually every target (e.g., AFAIK, the current version of LLVM does not support SPARC architectures).

Another theoretical advantage of generating C code is that it should be easy to try adding bits of code written in these new languages to existing C/C++ projects. In this way you can test the new language on real life projects without the need of rewriting everything from scratch. (I have never followed this approach, although I have heard of people having done this [1].)

[1] http://roscidus.com/blog/blog/2014/06/06/python-to-ocaml-ret... The languages here were quite different, though (Python/OCaml), but the author followed the idea of slowly converting a project to a new language by first rewriting parts of it. Had the OCaml compiler emitted Python code, I bet the task would have been easier.

nickpsecurity | karma 14152 | avg karma 1.39 · | 2018-01-26 02:06:58+00:00

That's a good point. I doubt most languages generating C put a lot of thought into what subset is supported in proprietary compilers. If anyone tries one, then there could be profiles done for unusual compilers where the generated code has certain properties. As far as the past goes, I think people could've tested the output just to see what happened. Then, either ditch the high-level language per function, module, or project depending on how far the incompatibility effects went.

How does that sound?

pjmlp | karma 109153 | avg karma 1.76 · | 2018-02-16 08:33:58+00:00

Code generated by C compilers is fast in 2018.

The quality of code generated by C compilers for C64, Spectrum, Atari, Atari ST, Amiga, Mac, CP/M, MS-DOS, Windows 3.x, Nintendo, MegaDrive,... systems meant that, many times, 80% of the code would look like this:

    void some_func(/* params */) {
        asm {
            /* actual "C" code as inline Assembly */
        }
    }

Lots of Swift sugar also gets optimized away, and there is plenty of room for improvement.

The code that current C compilers don't generate is, many times, the result of taking advantage of UB.

They also generate extra code for handling stuff like floating point emulation though.

Just as an example, IBM did their whole RISC research using PL/8, including an OS and optimizing compiler using an architecture similar to what LLVM uses.

They only bothered with C after making the business case that RISC would be a good platform for UNIX workstations.

akireu | karma 125 | avg karma 5.21 · | 2022-01-23 10:44:24

Ah, I remember that one. Not a fond memory, either. The C-like language is a red herring: what you actually want is the codegen backend, and having any intermediaries between your AST and the codegen's IR will just add inefficiency and uncertainty. Ironically, there are parts of LLVM-like IR poking out of it: on page 43 of the spec PDF there's a table of instructions that have their counterparts in more or less any modern codegen.

ggrrhh_ta | karma 327 | avg karma 1.58 · | 2021-10-12 02:02:58

The C code looks really nice and very generic. Since it comes from a generator, would it be too difficult to make use of those CPU instructions during code generation? (Then it could even emit architecture-specific code, which looks like a plus to me.)

trajing | karma 7 | avg karma 1.17 · | 2016-01-10 21:33:41+00:00

Excuse me if I'm wrong, but isn't compiling to C instead of e.g. LLVM IR generally considered bad practice unless you have a very good reason due to all of C's undefined behavior?
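
For a sense of what the undefined-behavior concern looks like in practice, here is a sketch, assuming a source language whose 32-bit integer addition is defined to either wrap or be checked. A naive `a + b` on `int32_t` would be undefined on overflow in C, so a generator has to emit something more careful. The function names `wrapping_add_i32` and `checked_add_i32` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Wrapping add: do the arithmetic in unsigned, where overflow is
   defined to wrap modulo 2^32, then copy the bits back. int32_t is
   required to be two's complement with no padding, so the memcpy
   result is well defined. */
int32_t wrapping_add_i32(int32_t a, int32_t b) {
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    int32_t out;
    memcpy(&out, &sum, sizeof out);
    return out;
}

/* Checked add: test the bounds before adding, so the signed addition
   is only executed when it cannot overflow. Returns false on overflow. */
bool checked_add_i32(int32_t a, int32_t b, int32_t *out) {
    assert(out != NULL);
    if ((b > 0 && a > INT32_MAX - b) || (b < 0 && a < INT32_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}
```

Every arithmetic operation, shift, pointer access, and so on needs the same kind of care, which is why the concern comes up. Whether that is more work than learning the corresponding rules of another IR is the open question in this thread.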

Joker_vD | karma 4965 | avg karma 2.36 · | 2020-09-15 16:14:10+00:00

Well, emitting C is easier than emitting LLVM IR. Not just in the sense that you know C already and don't know LLVM IR yet, but if your source language is close enough to C semantics (basically, it's an imperative language), you can most of the time generate the code in one straightforward walk over your whole AST. In fact, translating it to some home-made lowered IR and then to C would probably produce a less efficient binary.
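
The "one straightforward walk over the AST" idea can be sketched for a toy expression language. Everything here is an assumption made for illustration: the `node` type, the `out_buf` buffer, and `emit_expr` are hypothetical, and a real generator would also handle statements, types, and name mangling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef enum { NODE_NUM, NODE_ADD, NODE_MUL } node_kind;

typedef struct node {
    node_kind          kind;
    int                value;   /* used by NODE_NUM */
    const struct node *lhs;     /* used by NODE_ADD / NODE_MUL */
    const struct node *rhs;
} node;

/* Fixed-size output buffer; a sketch, so it asserts instead of growing. */
typedef struct {
    char  *buf;
    size_t cap;
    size_t len;
} out_buf;

static void emit_str(out_buf *o, const char *s) {
    size_t n = strlen(s);
    assert(o->len + n < o->cap);
    memcpy(o->buf + o->len, s, n + 1); /* copies the terminating NUL too */
    o->len += n;
}

/* Single recursive walk: each node prints itself as C source.
   Fully parenthesizing sidesteps operator-precedence bookkeeping. */
void emit_expr(out_buf *o, const node *n) {
    char num[32];
    switch (n->kind) {
    case NODE_NUM:
        snprintf(num, sizeof num, "%d", n->value);
        emit_str(o, num);
        break;
    case NODE_ADD:
    case NODE_MUL:
        emit_str(o, "(");
        emit_expr(o, n->lhs);
        emit_str(o, n->kind == NODE_ADD ? " + " : " * ");
        emit_expr(o, n->rhs);
        emit_str(o, ")");
        break;
    }
}
```

Feeding it the tree for 2 + 3 * 4 yields the text `(2 + (3 * 4))`, which the C compiler is then free to constant-fold. There is no lowering step and no separate IR, which is the simplicity being claimed.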

mistercow | karma 10714 | avg karma 3.06 · | 2012-10-20 14:55:10+00:00

Sure, but that doesn't make it a good intermediate representation for a programming language. I really like CoffeeScript, but compiling to it would be insanity.

That said, there are some good reasons for compiling to C. Not every architecture has a backend for LLVM yet, after all.

gsg | karma 893 | avg karma 2.96 · | 2014-09-20 21:43:30+00:00

Looking at jit_codegen.c, it seems it compiles a low-level IR to C, throws that in a file and invokes cc on it (see cgen_freeze). Functions are pulled out with dlsym.

Pretty heavy machinery for a JIT... but there's a comment about an LLVM backend, so I suspect this is a temporary arrangement. It seems this project is in its very early days.

jokoon | karma 3945 | avg karma 1.05 · | 2017-12-02 14:24:55

Well I find it more interesting...

Also, C compilers can do a lot of micro-optimizations that are not trivial, so compiling something to C or maybe LLVM IR seems like a better choice.

rayiner | karma 121493 | avg karma 4.24 · | 2013-05-05 00:54:11+00:00

Generating C is quicker to write and easier to debug (you can more easily read the intermediate output), and it doesn't require accessing a C++ API.

magicalhippo | karma 12317 | avg karma 2.47 · | 2020-12-31 13:03:37+00:00

How much of C and the standard library does it use?

Like, if I wanted to make my own C compiler, how much would I have to implement for it to be usable with Nim-generated code?

Does it use a fairly constrained subset or does it use a lot of C and the standard library?

I imagine the latter but just curious.

Dewie | karma 2355 | avg karma 1.15 · | 2014-11-15 15:35:17+00:00

I won't echo this sentiment, but why use C for writing a compiler? It seems that languages in the ML family, among others, are nicer for that kind of domain.

I guess it at least has its uses if you want your compiler to be really fast.

zeckalpha | karma 3100 | avg karma 1.88 · | 2015-08-06 03:13:58+00:00

I think it goes to LLVM IR rather than C. It does its own type inference, so compiling to C after that would preclude many optimizations that it could take advantage of.

bobsmooth | karma 1688 | avg karma 1.96 · | 2022-03-24 23:24:16

Notice that the C used to generate the machine code via the compiler is just another very inefficient programming language.