Building a C Compiler Type System – Part 1: The Formidable Declarator (blog.robertelder.org)
92 points by robertelder | karma 3325 | avg karma 10.17 2016-07-07 09:35:37 | 52 comments




If it doesn't work, you shouldn't be doing it.

I made a new language.

God says... frivolity's inestimably blooded girder wisp Polanski's Playboy's Becker's Wolfgang's boobing nascent Katina tarts Mohawks safeguard's trampoline's soliloquize proboscides assisting unbosomed expends tasted rejoinders chitchat's jute's musically secedes trawl's pounding Watts principle Dnieper


You should provide a link to your language for those unfamiliar with it.


dang, how is this guy not banned yet? Saying "nigger" in every other post is way worse than anything Mike Church has done.


I know who he is. What's your point?


Different point. That's a discussion on whether material about TempleOS is appropriate for HN, not whether Terry Davis's personal account should be banned.

When his comments show up it is because one or more of us have vouched for it, otherwise you'll only see them with showdead. Look at his comment history.

He gets this treatment because he occasionally posts something technically worthwhile, and because his outbursts are likely a result of mental illness. Even so, very, very little of what he posts is visible to most people because even a lot of otherwise good comments have offensive parts tacked on.

(In this case, the last bit is a reasonably un-offensive output from his random text generator, it would seem).


That is also true of Mike Church. Mike Church clearly has a mental illness as well. His level of paranoia is absolutely not normal. However, he's banned from HN for calling someone a "cunt" and various other things that are less bad than a typical Terry Davis post.

I think a clarification might be helpful: the TempleOS account is banned. When a banned account posts a comment, it's marked as dead. You can opt-in to seeing dead comments by going into your profile and turning on the "showdead" option.

HN semi-recently introduced a new feature called vouching. When you see a dead comment you feel is worthwhile and constructive, you can click on the comment's timestamp, then click "vouch." This will cause the comment to become visible to everyone. That's what happened here.

The "vouch" link will only appear for users with a certain amount of karma, and abusing this privilege can cause your vouching ability to be removed. It's one of the more impressive features of HN, since it gives some measure of moderation control to the community. Tangent: It's also the only way that people who create new HN accounts via Tor are able to post comments, because comments posted via Tor from new accounts are autokilled. I've only seen this matter once or twice, but the fact that vouching worked for those few cases made the HN experience feel quite special.

If you go to https://news.ycombinator.com/threads?id=TempleOS and haven't enabled your "showdead" option, you'll only see Terry's vouched comments. This may have been a source of confusion, since it could have given you the impression the account wasn't banned.

The HN mods are quite responsive via email. In general, if you have a question like the one you originally posed, you may want to try sending a short, clear email to hn@ycombinator.com.


That account is banned. That comment was unkilled because users vouched for it. Those were bad vouches.

What do you see as bad about it? It seems several of us saw that comment as acceptable. I don't want to argue the point, but I would like to understand why, given that despite the random stuff at the end, it otherwise raised a viewpoint that is related to the discussion (and I was one of the people who vouched for it).

It was an unsubstantive and distracting comment that could only lead to yet another argument about Terry, as indeed it did.

Can HolyC be used to write a network stack? Are there any plans to do this?

The key issue with C declarations is that "declarations mirror use", which means that you can interpret them this way:

    int *p; // dereferencing "p" gets you an "int"
    int **q; // dereferencing "q" twice gets you an "int"
    char a[10]; // subscripting "a" gets you a "char"
    void (*f)(void); // calling f() gives you "void"
People who write the asterisk next to the type often have a hard time understanding this.
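To make "declarations mirror use" concrete, here is a minimal sketch assuming a C11 compiler: it uses `_Generic` to report the static type of the expressions `*p`, `*q`, `**q` and `a[3]`. The `TYPE_ID` macro, the `type_id` enum and the helper functions are hypothetical names made up for this illustration.

```c
/* Hypothetical helpers for illustration: TYPE_ID uses C11 _Generic to map
   the static type of an expression to an enum value. The operand of
   _Generic is not evaluated. */
enum type_id { T_INT, T_INT_PTR, T_CHAR, T_OTHER };

#define TYPE_ID(expr) _Generic((expr), \
    int: T_INT,                        \
    int *: T_INT_PTR,                  \
    char: T_CHAR,                      \
    default: T_OTHER)

static int value = 42;
static int *p = &value;          /* dereferencing p gives an int */
static int **q = &p;             /* dereferencing q twice gives an int */
static char a[10] = "declared";  /* subscripting a gives a char */

static void do_nothing(void) {}
static void (*f)(void) = do_nothing;  /* calling f() gives void */

enum type_id type_of_deref_p(void)       { return TYPE_ID(*p); }   /* T_INT */
enum type_id type_of_deref_q(void)       { return TYPE_ID(*q); }   /* T_INT_PTR */
enum type_id type_of_deref_deref_q(void) { return TYPE_ID(**q); }  /* T_INT */
enum type_id type_of_subscript_a(void)   { return TYPE_ID(a[3]); } /* T_CHAR */

/* The call expression f() has type void, so it can only be used as a statement. */
void call_f(void) { f(); }
```

Note that dereferencing `q` only once yields `int *`, exactly what the declaration `int **q` predicts when read as an expression.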

I think you have it backwards. People who write the asterisk next to the type understand it just fine, we just don't understand how writing it next to the identifier makes sense.

At least, that's how I felt until I read your comment. Now at least I understand why using it that way doesn't drive you absolutely insane. So thanks!


absolutely wrong.

the 'type' is pointer-to-int, the 'name' is 'p', hence 'int* p'.

I understand pointers quite well, thank you...


And "int* p, q" declares two pointers to int, amirite?

no.

You should only ever declare one thing per line.

    int* p;
    int q;


Only if you don't know how the thing actually works.

Then how about cases like these:

    int* a, *b;
The language was intentionally designed so that the same operators with the same precedence levels are used in declarations as well as in normal expressions.
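A small sketch of that point (all names are hypothetical): `[]` binds tighter than `*` in a declarator just as it does in an expression, and in `int* a, *b;` the second star really is needed for `b` to be a pointer.

```c
#include <stddef.h>

/* [] binds tighter than *, exactly as in expressions, so: */
int *arr_of_ptrs[3];   /* array of 3 pointers to int:  *arr_of_ptrs[i] is an int */
int (*ptr_to_arr)[3];  /* pointer to array of 3 ints:  (*ptr_to_arr)[i] is an int */

/* Both a_ptr and b_ptr are pointers only because each declarator has its own star. */
int* a_ptr, *b_ptr;

size_t sizeof_arr_of_ptrs(void) { return sizeof arr_of_ptrs; } /* 3 * sizeof(int *) */
size_t sizeof_pointee(void)     { return sizeof *ptr_to_arr; } /* 3 * sizeof(int)   */

int read_through_all(void) {
    static int row[3] = {10, 20, 30};
    static int x = 7;
    arr_of_ptrs[0] = &x;
    ptr_to_arr = &row;
    a_ptr = &row[0];
    b_ptr = &row[2];
    /* 7 + 20 + 10 + 30 = 67 */
    return *arr_of_ptrs[0] + (*ptr_to_arr)[1] + *a_ptr + *b_ptr;
}
```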

> the 'type' is pointer-to-int, the 'name' is 'p', hence 'int* p'.

It would be great if the language actually worked that way, because that's the natural way to think about it. Unfortunately, it doesn't. In the simple case of a single pointer variable, we can put the star next to the type and pretend that C works the way we think, but it's a fiction.


I don't see how anything they said is incorrect. The type is int*, the name is p. C just has this weirdly optimized multiple-declaration syntax that for some reason really wants to make it easy to declare an array of int, a pointer to int, and a function returning int all at once, instead of making it easy to declare multiple pointers to int.

It seems to be a perfectly rational reaction to say "oh this was a stupid idea, let's not take advantage of it". (which is honestly a great way to approach a lot of C's features)
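For reference, this is roughly what that syntax allows: a sketch (function names hypothetical) of a single declaration whose three declarators share the base type `int` but produce an array, a pointer, and a block-scope function declaration.

```c
/* Hypothetical example: one declaration whose declarators give an array of
   int, a pointer to int, and a block-scope declaration of a function
   returning int, all sharing the base type "int". */
int give_five(void) { return 5; }

int mixed_declaration_demo(void) {
    int arr[3] = {1, 2, 3}, *ptr = &arr[1], give_five(void);
    return arr[0] + *ptr + give_five();  /* 1 + 2 + 5 = 8 */
}
```

`ptr` can already refer to `arr` in its initializer because an identifier's scope begins right after its own declarator.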


C's comma operator syntax is also weird and, in my experience, many people don't really understand it that well.

edit: In reference to the last line: "oh this was a stupid idea, let's not take advantage of it".

It's a weird feature that generally makes the code more difficult to understand but people still use it a lot, similar to multiple declarations on a single line with pointers mixed in.


Maybe you don't understand it that well because the described usage of a comma is not a comma operator (or any kind of operator).


Your point? The multiple declaration syntax which is being discussed here uses comma separators, not operators.

The comma acts as a separator in declarations, function calls and initializers.

In ordinary expressions it is a normal binary operator (just like / or &&), but has the lowest precedence and simply evaluates the left-hand side, discarding the result, then (in that order) evaluates the right-hand side, returning its value as the result.
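A short sketch contrasting the two roles (the function names are hypothetical): commas between declarators and between call arguments are separators, while a parenthesized comma expression uses the comma operator.

```c
/* Commas in declarations and argument lists are separators;
   a parenthesized comma expression uses the comma operator. */
static int add(int a, int b) { return a + b; }

int comma_operator_value(void) {
    int x = 1, y = 2;          /* separator: two declarators */
    int z = (x += 10, x * y);  /* operator: x += 10 runs first, then x * y is the value */
    return z;                  /* 11 * 2 = 22 */
}

int comma_in_call(void) {
    int b = 5, c = 7;
    /* Two arguments, not three: (b++, b + c) is a single comma expression. */
    return add(1, (b++, b + c));  /* add(1, 6 + 7) = 14 */
}

int comma_in_for(void) {
    int steps = 0;
    /* "int i = 0, j = 10" uses separators; "i++, j--" uses the operator. */
    for (int i = 0, j = 10; i < j; i++, j--)
        steps++;
    return steps;  /* (0,10) through (4,6): 5 iterations */
}
```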


I don't really disagree; just remember that you're abusing the syntax, because if you forget, it will abuse you back.

Also, we have

  int *p, q;
which says "p is a pointer to integer; q is an integer". That's an argument against writing

  int* p, q;
because that suggests that both p and q are pointers to integer.

And yes, you can write (IIRC)

  int q, *p;
(See also http://www.stroustrup.com/bs_faq2.html#whitespace)

YRC

That's why I declare pointer variables one to a line. Wish the language would work the other way with the asterisk associativity. Mainly because this just looks wrong to me:

  int *p, *q;

  typedef int *intp;
  intp p, q;
That just screams that the * belongs to the type and not the variable name.
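A sketch of the difference (the demo function is hypothetical): with a plain `int* p, q;` only `p` is a pointer, while a typedef folds the `*` into the type name so every declarator gets it.

```c
typedef int *intp;  /* the typedef binds the * into the type name */

/* Hypothetical demo: returns the sum of the values reached through each variable. */
int typedef_vs_star_demo(void) {
    int x = 3, y = 4;
    int* p, q;   /* despite the spacing, only p is a pointer; q is a plain int */
    intp r, s;   /* both r and s are int *, because intp already includes the * */
    p = &x;
    q = y;       /* q holds an int value, not an address */
    r = &x;
    s = &y;
    return *p + q + *r + *s;  /* 3 + 4 + 3 + 4 = 14 */
}
```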

    void (*fn)(int*); // fn takes a pointer-to-an-int and gives you void
Wouldn't this suggest that the type this function takes is `int*`?

I've always thought the pointer declaration syntax in C was just an oversight, and I get around its ambiguity by always declaring variables on their own line.


As the article says, int-star is an "abstract declarator". An abstract declarator simply omits the name; otherwise it is the same. Therefore, the "canonical" way to write that declaration would be:

    void (*fn)(int *);
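A minimal sketch of that declaration in use (function names are hypothetical), including a prototype where the parameter is written purely as the abstract declarator `void (*)(int *)`:

```c
/* "void (*fn)(int *)": fn is a pointer to a function that takes an int *
   (the parameter's abstract declarator is just "int *") and returns void. */
static void increment(int *n) { ++*n; }

/* In a prototype the parameter names can be omitted entirely: */
static void apply_twice(void (*)(int *), int *);

static void apply_twice(void (*fn)(int *), int *target) {
    fn(target);     /* calling through the pointer directly */
    (*fn)(target);  /* the spelled-out form that mirrors the declarator */
}

int function_pointer_demo(void) {
    int value = 40;
    void (*fn)(int *) = increment;
    fn(&value);               /* 41 */
    apply_twice(fn, &value);  /* 43 */
    return value;
}
```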

>People who write the asterisk next to the type often have a hard time understanding this.

Why do you think so?


I've observed similar... amongst early learners of the language. I doubt it persists.

Are you saying that early C learners who do not place the asterisk near the type have an easier time understanding the same?

What about function declarations? If a function returns a pointer, will you stick the asterisk to the function name? And in a declaration you may omit parameter names entirely, so there's nothing to stick the asterisk to:

    Foo* bar(Baz*, Qux*, int, int*, int**);
I myself do it differently depending on the context. In functions I always stick the asterisk to the type, but if I declare variables, I stick it to the name:

    Foo* bar(Baz* baz) {
       Foo *a, b; Baz c, *d;
       ...
    }

The same reasoning applies to function declarations returning pointers:

    int *f(void); // "*f()" returns "int" and "f()" returns "int *"
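A tiny sketch of that reading (the `storage` variable and demo function are hypothetical): since `f()` has type `int *`, the expression `*f()` is an `int` and can even be assigned through.

```c
static int storage;

/* f is a function returning int *, so *f() is an int (and an lvalue). */
static int *f(void) { return &storage; }

int deref_call_demo(void) {
    storage = 7;
    *f() += 1;    /* writes through the returned pointer */
    return *f();  /* 8 */
}
```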
My habit is to always put a space between the type and the asterisk, regardless of whether there is an identifier:

    sizeof(int *);
    x = (char *) y;

Avoiding this hairy problem was why I went[1] with a more verbose and more alien-looking -- but much more regular -- s-expression syntax for what is still exactly a C type system:

  number
  (address number)
  (address address number)
  (address array character)
  (map (address array character) (list number))
  (function number -> number)
etc. For simple types you can replace brackets with colons:

  address:array:character
But you have full expressiveness if you need it.

[1] https://github.com/akkartik/mu


IMHO, C's type system is quite orthogonal but is let down by the confusing declarator syntax. I designed my own experimental statically-typed language which basically inherits the C type system, but uses a "human-oriented" declaration syntax:

    a: int; // int a
    p: ptr(int); // int *p
    q: ptr(ptr(long)); // long **q
    arr: int[20]; // int arr[20]
    arrp: ptr[20](int); // int *arrp[20]
    parr: ptr(int[20]); // int (*parr)[20]
    fp: fptr(a: int, b: int): long; // long (*fp)(int a, int b)
https://github.com/bbu/quaint-lang

Ooh, Quaint looks very interesting. I think there's a lot of overlap with my Mu project, beyond these minor syntactic decisions. Lots of room for compare and contrast, and for sharing and stealing ideas :)

It's worth noting that two of the example programs in K&R are a simple parser and "unparser" for a subset (but still quite a complete one) of the declaration syntax, and each is only a few dozen lines.

IMHO the biggest difficulty that beginners face with the syntax is entirely because they attempt to parse it left-to-right and aren't following the precedence; once you realise that the operators (), [], and * in declarations have exactly the same precedence as they do in the rest of the language, and that expressions like 2 * (3 % foo(i + 4 / x[j])) - 1 are not read left-to-right either, it all comes together and makes perfect sense.

Starting at the identifier (or where it would go, if it were an abstract declarator) and reading outwards following the precedence rules (and recursively applying this to function parameter lists) is the only correct way to parse these declarations, and it is essentially what the example programs in K&R illustrate.
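A compact sketch of that inside-out reading, loosely modeled on the grammar used by K&R's `dcl` program (dcl: optional stars then a direct declarator; direct declarator: name, parenthesized dcl, or a direct declarator followed by `()` or `[size]`). The `explain` function and its helpers are hypothetical, and like the K&R version it handles only `*`, `()`, `[N]` and grouping parentheses: no qualifiers, no parameter lists, no abstract declarators.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Token kinds besides single characters such as '*', '(' and ')'. */
enum { NAME = 256, PARENS, BRACKETS };

static const char *src;  /* read cursor into the declaration text */
static int tokentype;    /* kind of the most recent token */
static char token[64];   /* text of the most recent NAME or BRACKETS token */
static char name[64];    /* identifier being declared */
static char desc[256];   /* English description built while parsing */
static int err;          /* set on a syntax error */

static void append(const char *s) { strncat(desc, s, sizeof desc - strlen(desc) - 1); }

static int gettoken(void) {
    size_t n = 0;
    while (*src == ' ' || *src == '\t')
        src++;
    if (*src == '(') {
        src++;
        if (*src == ')') { src++; return tokentype = PARENS; }
        return tokentype = '(';
    }
    if (*src == '[') {  /* copy "[...]" including both brackets */
        while (*src && *src != ']' && n < sizeof token - 2)
            token[n++] = *src++;
        if (*src == ']')
            token[n++] = *src++;
        token[n] = '\0';
        return tokentype = BRACKETS;
    }
    if (isalpha((unsigned char)*src) || *src == '_') {
        while ((isalnum((unsigned char)*src) || *src == '_') && n < sizeof token - 1)
            token[n++] = *src++;
        token[n] = '\0';
        return tokentype = NAME;
    }
    return tokentype = (*src ? *src++ : 0);  /* 0 marks end of input */
}

static void dcl(void);

/* dirdcl: name | ( dcl ) | dirdcl () | dirdcl [size] */
static void dirdcl(void) {
    int type;
    if (tokentype == '(') {
        dcl();
        if (tokentype != ')')
            err = 1;
    } else if (tokentype == NAME) {
        strcpy(name, token);
    } else {
        err = 1;
    }
    /* postfix () and [] bind tighter than *, so they are described first */
    while ((type = gettoken()) == PARENS || type == BRACKETS) {
        if (type == PARENS) {
            append(" function returning");
        } else {
            append(" array");
            append(token);
            append(" of");
        }
    }
}

/* dcl: optional '*'s followed by a direct declarator */
static void dcl(void) {
    int ns = 0;
    while (gettoken() == '*')
        ns++;
    dirdcl();
    while (ns-- > 0)  /* prefix stars are described after the postfix parts */
        append(" pointer to");
}

/* Explains e.g. "int (*daytab)[13]"; returns 0 on success, -1 on a syntax error. */
int explain(const char *decl, char *out, size_t outsz) {
    char datatype[64];
    src = decl;
    desc[0] = '\0';
    name[0] = '\0';
    err = 0;
    if (gettoken() != NAME)
        return -1;
    strcpy(datatype, token);
    dcl();
    if (err || tokentype != 0)
        return -1;
    snprintf(out, outsz, "%s:%s %s", name, desc, datatype);
    return 0;
}
```

With this sketch, `explain("char (*(*x())[])()", ...)` should produce "x: function returning pointer to array[] of pointer to function returning char", the same wording as the corresponding K&R example.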

Thus the "clockwise spiral rule" mentioned in the post is applicable only to certain cases and incorrect in general, as Linus Torvalds explains: https://plus.google.com/+gregkroahhartman/posts/1ZhdNwbjcYF

Edit: upon pondering the example

    int f((((((((((((((((((((((((((((((((((()))))))))))))))))))))))))))))))))));
I do not think it is legal in C89/90/99/11, since a parameter-declaration must begin with declaration-specifiers, and the declaration-specifiers cannot begin with an opening parenthesis. (Go to http://www.quut.com/c/ANSI-C-grammar-y.html and start following the rules via declaration->init_declarator_list->init_declarator->declarator->direct_declarator->parameter_type_list->parameter_list->parameter_declaration.)

On the other hand, I believe this:

    int f(int(((((((((((((((((((((((((((((((((()))))))))))))))))))))))))))))))))));
is legal and the type of the parameter is (pointer to) function returning int, with plenty of redundant parentheses around the abstract declarator.

Thanks for that comment, I'll read over that thread to see if I missed anything. Upon closer reading of the spiral rule, I don't think I have actually been following it as closely as I thought, so I don't think it affects the correctness of any other parts of the article. All this time, I had been kind of thinking that the spiral rule was a sort of 'gold standard' way to remember how to read declarations, but it doesn't clearly explain how to parse stuff like

   a[1][2][3][4];
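One way to check how `a[1][2][3][4]` reads, assuming it is declared as `int a[1][2][3][4];` (the element type and helper names are assumptions for this sketch): starting at `a` and applying the postfix `[]` operators left to right, the leftmost bound is the outermost array.

```c
#include <stddef.h>

/* Assumes the declaration is "int a[1][2][3][4];". Reading outward from a,
   each postfix [] is applied left to right, so a is an array[1] of array[2]
   of array[3] of array[4] of int: the leftmost bound is the outermost one. */
static int a[1][2][3][4];

size_t outer_extent(void)     { return sizeof a / sizeof a[0]; }             /* 1 */
size_t second_extent(void)    { return sizeof a[0] / sizeof a[0][0]; }       /* 2 */
size_t third_extent(void)     { return sizeof a[0][0] / sizeof a[0][0][0]; } /* 3 */
size_t innermost_extent(void) { return sizeof a[0][0][0] / sizeof(int); }    /* 4 */
```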

Fun fact: `typedef int I;` and `int typedef I;` are both valid and do the same thing.

For the curious on why this is: at the syntax level, typedef is treated like any other storage class specifier (static, extern, auto, register). So it is legal just like 'int static a' is legal.
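A small sketch of that (names are hypothetical). C11 lists placing a storage-class specifier anywhere other than the start of the declaration specifiers as an obsolescent feature, and compilers may warn about it (GCC, for example, under -Wextra), but it is still accepted.

```c
/* typedef is grammatically a storage-class specifier, like static,
   so it may follow the type specifier. */
typedef int I1;
int typedef I2;  /* same meaning as "typedef int I2;" */

static int s1;
int static s2;   /* same meaning as "static int s2;" */

int typedef_order_demo(void) {
    I1 a = 2;
    I2 b = 3;
    s1 = 10;
    s2 = 20;
    return a + b + s1 + s2;  /* 35 */
}
```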

And if you want to go weirder, implicit int (in C89; C99 removed it) means that you can leave out the `int`:

    typedef x;
Even "better", the declarator list in a C declaration is optional in the grammar, to allow struct/enum declarations that do not declare any variables. So the grammar lets you leave out the variable, add qualifiers, etc.:

    typedef;
    const typedef;
    typedef const;
It's a really fun syntax in some ways.
