I agree that the original statement encourages that interpretation, but I think it also admits the reading that the parser itself is written in C, and I believe that is what was intended.



> It's all very academic and I don't have a strong opinion on it either way but I would say C has a relatively simple syntax (no classes, no inheritance, few keywords, few symbols in its syntax etc.) but its semantics (i.e. how to use the simple syntax to produce correct software) are fairly complex.

I agree that C's syntax is visually simple, but it's actually pretty complex from a parsing perspective: it's both context sensitive and binding sensitive, meaning that a correct C parser needs to be aware of the current parsing context's "live" bindings (including other functions) in order to resolve ambiguities like this:

    int foo(int(bar));
(Whether `bar` names a typedef'd type changes how the parameter is parsed, and with it `foo`'s signature.)
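
For illustration (the typedef and the second declaration below are purely hypothetical additions, not from the parent comment), here is roughly how the same shape of declaration ends up meaning different things depending on what names are in scope:

    /* Reading 1: no typedef for `bar` in scope, so `int(bar)` declares a
       parameter named `bar` of type int. */
    int foo(int(bar));

    /* Reading 2: with a typedef in scope, `int(bar)` is an abstract
       declarator (a function taking `bar`, returning int), which adjusts
       to a function-pointer parameter, so the signature changes. */
    typedef double bar;
    int foo2(int(bar));   /* roughly: int foo2(int (*)(double)); */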

I can understand that ultimate mindset! Rust is certainly a much more complex language from a syntax and semantics perspective, and (I think) the community wouldn't want to claim the "C successor" label anyways.


The author said it takes us back to C, and you are saying it makes perfect sense that it gained dynamic-language converts? By your argument it is passed over by C programmers too, which makes the original statement not make sense either when framed that way. Or maybe you meant only the first part made perfect sense?

> People are terrified of parsers and parsing

As if parsing in C is going to make them less terrified.


> But as I type this comment I realize that other syntax parsing tools like IDEs would be happier if the unprocessed code is still valid C.

Yes, that’s the reason. The code is still close enough to standard C to work well with Emacs for instance. It would be a deal breaker for me to require a special editing mode.


> Instead, we'll be single-pass: code generation happens during parsing

IIRC, C was specifically designed to allow single-pass compilation, right? I.e. in many languages you don't know what needs to be output without building the full AST, but in C, syntax directly implies semantics. I think I remember hearing this was because early computers couldn't necessarily fit the AST for an entire code file in memory at once.
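
A rough sketch of what that buys you (the function names below are made up for illustration): because C requires declarations before use, a compiler can emit code for a call from the declaration alone, without ever holding the callee's body, or the whole file's AST, in memory:

    int add(int a, int b);      /* the declaration alone fixes the call's
                                   argument and return types */

    int twice(int x)
    {
        return add(x, x);       /* code for this call can be emitted right
                                   here, in a single pass */
    }

    int add(int a, int b)       /* the definition may come later, or live in
                                   another translation unit entirely */
    {
        return a + b;
    }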


> If you pass it some code in some third language which could parse as mal-formed A code, then how is to detect that?

This 'other language' approach doesn't strike me as a good way of thinking about it. It misses the point that C is pretty unique in its broad use of undefined behaviour. Unlike Java, where everything has an unambiguous definition. (Well, ignoring plenty of platform-specific variation points in the standard library, such as file-path syntax.)


> So when it comes down to it all you are really doing is string parsing and string transporting, thats really the last thing you want to leave to C.

Exactly. String parsing is the biggest shortcoming in C, and it always gives me second thoughts when I'm about to choose a language for a higher-level application (especially if it incorporates user input as strings). Even such a trivial thing as an AT command parser is a pain in C. Of course, there are parser generators such as Bison, but it's still a tedious amount of work and usually not worth it.
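
To make that concrete, a rough hypothetical sketch (the function name and buffer handling are mine, not from any real parser): even recognizing a line like `AT+CMD=args` in C is mostly manual pointer and length bookkeeping:

    #include <string.h>

    /* Hypothetical minimal AT-command recognizer: copy the command name
       out of a line like "AT+CMGS=..." into cmd. */
    static int parse_at(const char *line, char *cmd, size_t cmd_len)
    {
        if (strncmp(line, "AT+", 3) != 0)
            return -1;                      /* not an AT command */

        const char *p  = line + 3;
        const char *eq = strchr(p, '=');
        size_t n = eq ? (size_t)(eq - p) : strlen(p);

        if (n == 0 || n >= cmd_len)
            return -1;                      /* empty, or would overflow cmd */

        memcpy(cmd, p, n);
        cmd[n] = '\0';
        return 0;
    }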


> This applies to all languages.

Sure, but it's common for C programmers to think C is a low-level language with concepts that map straightforwardly to the target machine. You can't simultaneously think that C is a low-level language "close to the machine" and that your program can be freely rewritten into an eldritch horror. That's the only reason it would be a good "lie".

Contrast that with something like Perl, where people accept that an array is whatever Larry Wall wants it to be.
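
To make the "rewritten into an eldritch horror" point concrete, a small hedged example (the function is mine, not from the parent): because signed overflow is undefined, an optimizer is allowed to assume it never happens and may fold this check away entirely:

    /* The "close to the machine" reading expects a wraparound check here;
       a conforming optimizer may legally reduce the body to `return 0;`
       because signed integer overflow is undefined behaviour. */
    int will_overflow(int x)
    {
        return x + 1 < x;
    }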


>>>>> Why are people still using parsers for untrusted input in C?

No matter what the parser itself is written in, if you're writing in C you'll be using the parser in C.


Okay, but would phrasing it that way add ANYTHING to the point he's trying to make? Would it help explain his point? (And it's not factually wrong, you're just deliberately misreading it so you can flog the dead horse about the ancient lost civilizations of programming languages before C.)

> Additionally, and this is something I really didn’t get into, languages aren’t inherently compiled or interpreted. You could have a C interpreter.

Of course; I was only addressing the confusion that other posters mentioned. While one could have a C interpreter (or any interpreter), for the purposes of the article this seems to be only a marginal point. In practice, C is (almost) never run that way, but your choice of words could suggest a parallel between the C abstract machine and the Ruby or Java VMs, and I think the way usual C implementations differ from those is more informative than how they are alike.

You do explain later that the C abstract machine is a compile-time construct, but many people read selectively and react immediately, as evidenced by several posts on this page.


> You can almost build a yacc grammar that will read C and emit assembly.

I have looked at a lot of C parsers and compilers, and I have written a C-to-Lisp transpiler, and this statement is ludicrously wrong. You cannot even parse C code with yacc because of typedefs: the grammar of C is context-sensitive. And this is after the C code has been preprocessed, something that also cannot be done with yacc.
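
The classic illustration (the names below are placeholders): the same token sequence must be parsed as either a cast or a multiplication, and only the presence or absence of an earlier typedef tells you which, which is exactly what a context-free yacc grammar cannot see:

    typedef int T;

    void f(int *b)
    {
        (T)*b;      /* with the typedef above: a cast of *b to type T   */
                    /* without it: the expression T multiplied by b     */
    }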

C compilers are easy to port to different machines, but that is because C is a very poor language in terms of features and control flow constructs. For the limited amount of things that C gives you as a language it also comes with a huge amount of complicated baggage when it comes to implementation and corner cases.


> This argument does not make sense: if you fork GCC or LLVM and change the language frontend for C, the thing you are parsing is no longer standard C.

It doesn't make sense to you, certainly.

To the type of person who has used a dozen different C compilers, all of them containing extensions to the language, and all of them still advertised as "C compilers", the argument makes perfect sense.

I mean, the input to the GCC C compiler is, by default, C with extensions, and yet even you still called it a C frontend, not a "SomeOtherLanguage" frontend.

You may not agree that the argument is valid or sound, but you can't with a straight face say that it is an unreasonable position to take.


> And because it is in C

I'm curious: what difference would it make if it were written in, say, Rust or C++? Wouldn't you still be able to talk to it in a similar way? What's so special about C in this context?


> You cannot parse it properly without symbol table. No context free parser handles C properly, let alone C++. It gets progressively and exponentially worse from here.

C is actually pretty easy if you ignore all of the bad advice to use parser generators. I looked into the various YACC grammars for C I could find on the Internet, and all of them either had bugs or were incomplete. TCC[1] has a simple recursive-descent parser. With a recursive descent parser you also have the option of implementing the C pre-processor in the same step. Turns out I was able to implement a single-pass C parser and pre-processor as a bunch of Common Lisp read macros[2].
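
For what it's worth, a rough sketch of the trick a hand-written recursive-descent C parser typically relies on (the names and fixed-size table below are illustrative only): keep a table of typedef names and consult it whenever an identifier could start either a declaration or an expression:

    #include <string.h>

    /* Illustrative typedef table; a real parser would scope it per block. */
    static const char *typedef_names[256];
    static int typedef_count;

    static void register_typedef(const char *name)
    {
        typedef_names[typedef_count++] = name;  /* no bounds check: sketch only */
    }

    static int is_type_name(const char *ident)
    {
        for (int i = 0; i < typedef_count; i++)
            if (strcmp(typedef_names[i], ident) == 0)
                return 1;
        return 0;
    }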

I have not looked into it, but the approach for C++ looks like it would be very different because template instantiation needs its own step.

[1] https://bellard.org/tcc/ [2] https://github.com/vsedach/Vacietis/blob/master/compiler/rea...


> C isn't the Lingua Franca of fast software anymore.

What is it, then?

> The reason why C programs can sometimes be faster, today, however is not an intrinsic property of the language but rather its inability to allow incompetent programmers to hide their bad data structures.

I'm not sure I can parse that properly. What's wrong with not being able to hide bad data structures? And how does this make C faster?


> elegant, easy-to-learn, easy-to-read language.

Compared to C? Yes, obviously.

Compared to languages we have now? Not so much.


> I think your entire criteria for determining whether a language or feature is good boils down to its intuitiveness to experienced C programmers ...

I see this tired argument over and over. It is especially thrown out when some esoteric language syntax is criticized.

In this case it seems to miss the author's point entirely. My understanding of his argument is that the C language maps reasonably well to machine code. Adding new features to the language could hamper that intuitive mapping between the language syntax and the actual machine code generated.

A valid criticism (which others have made) is that the actual mapping between C code and machine code on modern machines is far less intuitive than one might expect. Another valid response would be a demonstration that "closures, pattern matching, and templates" can be intuitively mapped directly to machine code.

I would be happy to never see the "you are too blinded by your own language to understand" argument again. It borders on ad hominem and extends no benefit of the doubt to the original author.


> C (the language) does not cause people to write unmaintainable or hard to understand code

Well then it’s good that OP didn’t claim that C the language causes people to write such code. They said “The C ethos”, not “The C language.” It’s not about the language’s technical requirements, it’s about what’s idiomatic in a language, how it’s taught, and what style is used by the vast majority of the existing corpus of code written in that language.

Look at the C standard library’s function names, vsnprintf/strdupa/acosh/ftok… Compare it to something like Objective-C at the other extreme, where method and variable names tend to always have fully spelled out identifiers with no abbreviations and a full description of what’s being done (`-[NSString stringByAppendingString:]`, etc.)

Is it due to some technical requirement? Is stringByAppendingString illegal in C because it’s too long? Is strdup illegal in ObjC because it’s too short? Of course not! But why do we see this everywhere so consistently? Why does C have short indecipherable function names and ObjC have such long ones, if the language doesn’t require it?

Because idioms matter. If you’re learning C, you’re learning the way it is typically taught. You’re reading other C code. You’re encouraged to program the way other C programmers program. You’re likely using the standard library a lot. Likewise ObjC.

This means two things:

- Yes, in a very obvious sense, it’s not the language’s fault, it’s the programmers’ fault.

- But also, paradoxically, it is the language’s fault, because a language is not just a set of syntax in a vacuum, it’s also a corpus of existing code, a set of idioms, a community of people, and a way of thinking of things. C absolutely causes people to write hard-to-understand code when viewed through this lens.
