Building a C Compiler Type System – Part 1: The Formidable Declarator

TempleOS | karma 20 | avg karma 0.08 · 2016-07-07 16:02:00+00:00

If it doesn't work, you shouldn't be doing it.

I made a new language.

God says... frivolity's inestimably blooded girder wisp Polanski's Playboy's Becker's Wolfgang's boobing nascent Katina tarts Mohawks safeguard's trampoline's soliloquize proboscides assisting unbosomed expends tasted rejoinders chitchat's jute's musically secedes trawl's pounding Watts principle Dnieper

marvy | karma 715 | avg karma 1.1 · 2016-07-07 17:20:14

You should provide a link to your language for those unfamiliar with it.

robertelder | karma 3325 | avg karma 10.17 · 2016-07-07 17:28:13+00:00

http://www.templeos.org/Wb/Doc/HolyC.html

umanwizard | karma 14757 | avg karma 2.5 · 2016-07-07 17:57:17+00:00

dang, how is this guy not banned yet? Saying "nigger" in every other post is way worse than anything Mike Church has done.

6581 | karma 3679 | avg karma 15.14 · 2016-07-07 19:05:03+00:00

http://motherboard.vice.com/read/gods-lonely-programmer

umanwizard | karma 14757 | avg karma 2.5 · 2016-07-08 01:36:40+00:00

I know who he is. What's your point?

tptacek | karma 394296 | avg karma 6.04 · 2016-07-07 19:11:48

https://news.ycombinator.com/item?id=10289205

umanwizard | karma 14757 | avg karma 2.5 · 2016-07-08 01:45:39+00:00

Different point. That's a discussion on whether material about TempleOS is appropriate for HN, not whether Terry Davis's personal account should be banned.

vidarh | karma 41717 | avg karma 2.6 · 2016-07-07 19:59:59+00:00

When his comments show up, it is because one or more of us have vouched for them; otherwise you'll only see them with showdead. Look at his comment history.

He gets this treatment because he occasionally posts something technically worthwhile, and because his outbursts are likely a result of mental illness. Even so, very, very little of what he posts is visible to most people because even a lot of otherwise good comments have offensive parts tacked on.

(In this case, the last bit is a reasonably un-offensive output from his random text generator, it would seem).

umanwizard | karma 14757 | avg karma 2.5 · 2016-07-08 01:37:26+00:00

That is also true of Mike Church. Mike Church clearly has a mental illness as well. His level of paranoia is absolutely not normal. However, he's banned from HN for calling someone a "cunt" and various other things that are less bad than a typical Terry Davis post.

sillysaurus3 | karma 15505 | avg karma 3.87 · 2016-07-08 02:29:23+00:00

I think a clarification might be helpful: the TempleOS account is banned. When a banned account posts a comment, it's marked as dead. You can opt-in to seeing dead comments by going into your profile and turning on the "showdead" option.

HN semi-recently introduced a new feature called vouching. When you see a dead comment you feel is worthwhile and constructive, you can click on the comment's timestamp, then click "vouch." This will cause the comment to become visible to everyone. That's what happened here.

The "vouch" link will only appear for users with a certain amount of karma, and abusing this privilege can cause your vouching ability to be removed. It's one of the more impressive features of HN, since it gives some measure of moderation control to the community. Tangent: It's also the only way that people who create new HN accounts via Tor are able to post comments, because comments posted via Tor from new accounts are autokilled. I've only seen this matter once or twice, but the fact that vouching worked for those few cases made the HN experience feel quite special.

If you go to https://news.ycombinator.com/threads?id=TempleOS and haven't enabled your "showdead" option, you'll only see Terry's vouched comments. This may have been a source of confusion, since it could have given you the impression the account wasn't banned.

The HN mods are quite responsive via email. In general, if you have a question like the one you originally posed, you may want to try sending a short, clear email to hn@ycombinator.com.

dang | karma 18142 | avg karma 0.25 · 2016-07-08 03:02:12+00:00

That account is banned. That comment was unkilled because users vouched for it. Those were bad vouches.

vidarh | karma 41717 | avg karma 2.6 · 2016-07-08 12:25:07+00:00

What do you see as bad about it? It seems several of us saw that comment as acceptable. I don't want to argue the point, but I would like to understand why, given that despite the random stuff at the end, it otherwise raised a viewpoint that is related to the discussion (and I was one of the people who vouched for it).

dang | karma 18142 | avg karma 0.25 · 2016-07-08 18:07:39+00:00

It was an unsubstantive and distracting comment that could only lead to yet another argument about Terry, as indeed it did.

sigcode | karma 23 | avg karma 0.88 · 2016-07-08 02:12:24+00:00

Can HolyC be used to write a network stack? Are there any plans to do this?

bluetomcat | karma 3230 | avg karma 3.81 · 2016-07-07 12:00:08

The key issue with C declarations is that "declarations mirror use", which means you can read each declaration as an example of how the name is used:

    int *p; // dereferencing "p" gets you an "int"
    int **q; // dereferencing "q" twice gets you an "int"
    char a[10]; // subscripting "a" gets you a "char"
    void (*f)(void); // calling f() gives you "void"

People who write the asterisk next to the type often have a hard time understanding this.
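To make "declarations mirror use" concrete, here is a minimal sketch that uses each of the four names in an expression shaped exactly like its declarator. The helper names (`forty_two`, `bump`, `declarations_mirror_use`) are made up for illustration and are not from the article.

```c
#include <assert.h>

static int forty_two = 42;
static int calls;

static void bump(void) { calls++; }

/* Returns 1 if every "use" below yields the type written on the
   left of the matching declaration. */
int declarations_mirror_use(void)
{
    int *p = &forty_two;        /* so the expression *p is an int */
    int **q = &p;               /* so the expression **q is an int */
    char a[10] = "mirror";      /* so the expression a[0] is a char */
    void (*f)(void) = bump;     /* so the expression (*f)() is void */

    calls = 0;
    int  from_p = *p;
    int  from_q = **q;
    char from_a = a[0];
    (*f)();                     /* same shape as the declarator */

    return from_p == 42 && from_q == 42 && from_a == 'm' && calls == 1;
}
```

Calling `declarations_mirror_use()` from `main` should return 1 on a conforming compiler.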

tinco | karma 11728 | avg karma 3.67 · 2016-07-07 18:05:22+00:00

I think you have it backwards. People who write the asterisk next to the type understand it just fine, we just don't understand how writing it next to the identifier makes sense.

At least, that's how I felt until I read your comment. Now at least I understand why using it that way doesn't drive you absolutely insane. So thanks!

TickleSteve | karma 1934 | avg karma 2.29 · 2016-07-07 18:07:24+00:00

absolutely wrong.

the 'type' is pointer-to-int, the 'name' is 'p', hence 'int* p'.

I understand pointers quite well, thank you....

bonzini | karma 8127 | avg karma 2.81 · 2016-07-07 13:11:10

And "int* p, q" declares two pointers to int, amirite?

TickleSteve | karma 1934 | avg karma 2.29 · 2016-07-08 07:49:31+00:00

no.

You should only ever declare one thing per line.

    int* p;
    int q;

bonzini | karma 8127 | avg karma 2.81 · 2016-07-11 09:14:11+00:00

Only if you don't know how the thing actually works.

bluetomcat | karma 3230 | avg karma 3.81 · 2016-07-07 18:15:08+00:00

Then how about cases like these:

    int* a, *b;

The language was intentionally designed so that the same operators with the same precedence levels are used in declarations as well as in normal expressions.

ScottBurson | karma 10295 | avg karma 2.77 · 2016-07-07 13:21:09

> the 'type' is pointer-to-int, the 'name' is 'p', hence 'int* p'.

It would be great if the language actually worked that way, because that's the natural way to think about it. Unfortunately, it doesn't. In the simple case of a single pointer variable, we can put the star next to the type and pretend that C works the way we think, but it's a fiction.

Gankro | karma 1476 | avg karma 6.36 · 2016-07-07 19:14:31+00:00

I don't see how anything they said is incorrect. The type is int*, the name is p. C just has this weirdly optimized multiple-value declaration syntax that for some reason really wants to make it easy to declare an array of int, a pointer to int, and a function returning int all at once, instead of making it easy to declare multiple pointers to int.

It seems to be a perfectly rational reaction to say "oh this was a stupid idea, let's not take advantage of it". (which is honestly a great way to approach a lot of C's features)

jotux | karma 2745 | avg karma 5.36 · 2016-07-07 19:20:01+00:00

C's comma operator syntax is also weird and, in my experience, many people don't really understand it that well.

edit: In reference to the last line: "oh this was a stupid idea, let's not take advantage of it".

It's a weird feature that generally makes the code more difficult to understand but people still use it a lot, similar to multiple declarations on a single line with pointers mixed in.

penguinduck | karma 36 | avg karma 1.71 · 2016-07-07 19:45:01+00:00

Maybe you don't understand it that well because the described usage of a comma is not a comma operator (or any kind of operator).

imtringued | karma 11098 | avg karma 0.8 · 2016-07-08 05:26:28+00:00

https://en.wikipedia.org/wiki/Comma_operator

penguinduck | karma 36 | avg karma 1.71 · 2016-07-08 05:54:21+00:00

Your point? The multiple declaration syntax which is being discussed here uses comma separators, not operators.

bluetomcat | karma 3230 | avg karma 3.81 · 2016-07-07 19:53:04+00:00

The comma acts as a separator in declarations, function calls and initializers.

In ordinary expressions it is a normal binary operator (just like / or &&), but has the lowest precedence and simply evaluates the left-hand side, discarding the result, then (in that order) evaluates the right-hand side, returning its value as the result.
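A small sketch of the difference, assuming a hypothetical helper `add` that is not part of the discussion: the first two commas are separators, the third is the operator.

```c
#include <assert.h>

static int add(int a, int b) { return a + b; }

int comma_demo(void)
{
    int x = 1, y = 2;              /* separator between two declarators */
    int sum = add(x, y);           /* separator between two arguments: 3 */
    int last = (x = 10, x + y);    /* operator: assign first, result is 12 */

    return sum * 100 + last;       /* 312 */
}
```

The parentheses around `x = 10, x + y` matter: without them, the comma in an initializer would be parsed as a declarator separator instead.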

ScottBurson | karma 10295 | avg karma 2.77 · 2016-07-07 22:05:54+00:00

I don't really disagree; just remember that you're abusing the syntax, because if you forget, it will abuse you back.

Someone | karma 30129 | avg karma 2.33 · 2016-07-07 18:21:02+00:00

Also, we have

  int *p, q;

which says "p is a pointer to integer; q is an integer". That's an argument against writing

  int* p, q;

because that suggests that both p and q are pointers to integer.

And yes, you can write (IIRC)

  int q, *p;

(See also http://www.stroustrup.com/bs_faq2.html#whitespace)
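If you want the compiler to confirm this, a rough sketch using C11 `_Generic` follows. The macro `IS_INT_PTR` and the function name are invented for this example.

```c
#include <assert.h>

/* Evaluates to 1 if x has type int *, 0 otherwise (needs C11). */
#define IS_INT_PTR(x) _Generic((x), int *: 1, default: 0)

int star_binds_to_declarator(void)
{
    int* p, q;      /* reads like "two pointers", but only p is one */
    int r, *s;      /* order does not matter: s is the pointer here */

    q = 0;
    r = 0;
    p = &q;
    s = &r;

    assert(IS_INT_PTR(p) == 1);
    assert(IS_INT_PTR(q) == 0);   /* q is a plain int */
    assert(IS_INT_PTR(r) == 0);
    assert(IS_INT_PTR(s) == 1);
    return *p + *s;               /* 0 */
}
```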

dllthomas | karma 14749 | avg karma 1.38 · 2016-07-08 00:31:06

Avernar | karma 397 | avg karma 1.42 · 2016-07-08 09:53:45+00:00

That's why I declare pointer variables one to a line. Wish the language would work the other way with the asterisk associativity. Mainly because this just looks wrong to me:

  int *p, *q;

  typedef int *intp;
  intp p, q;

That just screams that the * belongs to the type and not the variable name.
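For what it's worth, the typedef version really does make both names pointers, which a text macro does not. A minimal sketch, where `INTP_MACRO` and `IS_INT_PTR` are hypothetical names used only for the comparison:

```c
#include <assert.h>

#define IS_INT_PTR(x) _Generic((x), int *: 1, default: 0)

typedef int *intp;        /* the typedef name carries the whole type */
#define INTP_MACRO int *  /* plain text substitution, for contrast */

int typedef_vs_macro(void)
{
    int n = 5;
    intp p = &n, q = &n;        /* both p and q are int * */
    INTP_MACRO r = &n, s = n;   /* expands to: int * r = &n, s = n; */

    assert(IS_INT_PTR(p) && IS_INT_PTR(q));
    assert(IS_INT_PTR(r) && !IS_INT_PTR(s));   /* s is an int */
    return *p + *q + *r + s;    /* 20 */
}
```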

jotux | karma 2745 | avg karma 5.36 · 2016-07-07 19:09:03+00:00

    void (*fn)(int*); // fn takes a pointer-to-an-int and gives you void

Wouldn't this suggest that the type this function takes is `int*`?

I've always thought the pointer declaration syntax in C was just an oversight, and I get around its ambiguity by always declaring variables on their own line.

bluetomcat | karma 3230 | avg karma 3.81 · 2016-07-07 20:06:41+00:00

As the article says, int-star is an "abstract declarator". An abstract declarator simply omits the name; otherwise it is the same. Therefore, the "canonical" way to write that declaration would be:

    void (*fn)(int *);

paxcoder | karma 379 | avg karma 1.13 · 2016-07-07 20:53:49+00:00

>People who write the asterisk next to the type often have a hard time understanding this.

Why do you think so?

dllthomas | karma 14749 | avg karma 1.38 · 2016-07-08 00:31:54

I've observed similar... amongst early learners of the language. I doubt it persists.

paxcoder | karma 379 | avg karma 1.13 · 2016-07-16 16:18:25+00:00

Are you saying that early C learners who do not place the asterisk near the type have an easier time understanding the same?

Mikhail_Edoshin | karma 526 | avg karma 1.01 · 2016-07-08 07:33:56+00:00

What about function declarations? If a function returns a pointer, will you stick the asterisk to the function name? And in a declaration you may omit parameter names entirely, so there's nothing to stick the asterisk to:

    Foo* bar(Baz*, Qux*, int, int*, int**);

I myself do it differently depending on the context. In functions I always stick the asterisk to the type, but if I declare variables, I stick it to the name:

    Foo* bar(Baz* baz) {
       Foo *a, b; Baz c, *d;
       ...
    }

bluetomcat | karma 3230 | avg karma 3.81 · 2016-07-08 08:12:39+00:00

The same reasoning applies to function declarations returning pointers:

    int *f(void); // "*f()" returns "int" and "f()" returns "int *"

My habit is to always put a space between the type and the asterisk, regardless of whether there is an identifier:

    sizeof(int *);
    x = (char *) y;
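Here is a short sketch putting those pieces side by side: a function returning a pointer, a prototype with the parameter names left out, and the `int *` type name in a cast and in `sizeof`. All identifiers (`storage`, `pick`, `abstract_declarator_demo`) are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

static int storage = 7;

int *f(void);                 /* "*f()" is an int, so "f()" is an int * */
int *pick(int *, int *);      /* abstract declarators: names omitted */

int *f(void) { return &storage; }
int *pick(int *a, int *b) { return *a >= *b ? a : b; }

int abstract_declarator_demo(void)
{
    int other = 3;
    void *raw = &other;
    int *typed = (int *) raw;          /* a cast takes a type name: "int *" */
    size_t n = sizeof(int *);          /* and so does sizeof */

    assert(n == sizeof typed);
    return *f() + *pick(typed, f());   /* 7 + 7 */
}
```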

akkartik | karma 13864 | avg karma 3.99 · 2016-07-07 18:59:26+00:00

Avoiding this hairy problem was why I went[1] with a more verbose and more alien-looking -- but much more regular -- s-expression syntax for what is still exactly a C type system:

  number
  (address number)
  (address address number)
  (address array character)
  (map (address array character) (list number))
  (function number -> number)

etc. For simple types you can replace brackets with colons:

  address:array:character

But you have full expressiveness if you need it.

[1] https://github.com/akkartik/mu

bluetomcat | karma 3230 | avg karma 3.81 · 2016-07-07 19:14:40+00:00

IMHO, C's type system is quite orthogonal but is let down by the confusing declarator syntax. I designed my own experimental statically-typed language which basically inherits the C type system, but uses a "human-oriented" declaration syntax:

    a: int; // int a
    p: ptr(int); // int *p
    q: ptr(ptr(long)); // long **q
    arr: int[20]; // int arr[20]
    arrp: ptr[20](int); // int *arrp[20]
    parr: ptr(int[20]); // int (*parr)[20]
    fp: fptr(a: int, b: int): long; // long (*fp)(int a, int b)

https://github.com/bbu/quaint-lang
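For comparison, here are the C spellings from the comments above in compilable form, with a couple of `sizeof` checks that separate "array of pointers" from "pointer to array". This is only a sketch of the C side; `add_long` and the function name are invented, and nothing here says anything about how Quaint itself is implemented.

```c
#include <assert.h>

static long add_long(int a, int b) { return (long) a + b; }

int quaint_equivalents(void)
{
    int a = 1;                            /* a: int */
    int *p = &a;                          /* p: ptr(int) */
    long l = 2;
    long *lp = &l;
    long **q = &lp;                       /* q: ptr(ptr(long)) */
    int arr[20] = { 0 };                  /* arr: int[20] */
    int *arrp[20] = { 0 };                /* arrp: ptr[20](int) */
    int (*parr)[20] = &arr;               /* parr: ptr(int[20]) */
    long (*fp)(int a, int b) = add_long;  /* fp: fptr(a: int, b: int): long */

    /* arrp holds 20 pointers; parr points at one block of 20 ints. */
    assert(sizeof arrp == 20 * sizeof(int *));
    assert(sizeof *parr == 20 * sizeof(int));

    return (int) (*p + **q + (*parr)[0] + (arrp[0] == 0) + fp(3, 4)); /* 11 */
}
```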

akkartik | karma 13864 | avg karma 3.99 · 2016-07-07 19:18:21+00:00

Ooh, Quaint looks very interesting. I think there's a lot of overlap with my Mu project, beyond these minor syntactic decisions. Lots of room for compare and contrast, and for sharing and stealing ideas :)

userbinator | karma 78987 | avg karma 4.37 · 2016-07-07 21:04:17

It's worth noting that two of the example programs in K&R are a simple parser and "unparser" for (a subset of, but still quite complete) the declaration syntax, and they take only a few dozen lines.

IMHO the biggest difficulty that beginners face with the syntax is entirely because they attempt to parse it left-to-right and aren't following the precedence; once you realise that the operators (), [], and * in declarations have the exact same precedence they do in the rest of the language, and that expressions like 2 * (3 % foo(i + 4 / x[j])) - 1 are not read left-to-right either, it all comes together and makes perfect sense.

Starting at the identifier (or where it would go, if it were an abstract declarator) and reading outwards following the precedence rules (and recursively applying this to function calls) is the only correct way to parse these declarations, and it is basically what the example programs in K&R illustrate.

Thus the "clockwise spiral rule" mentioned in the post is applicable only to certain cases and incorrect in general, as Linus Torvalds explains: https://plus.google.com/+gregkroahhartman/posts/1ZhdNwbjcYF
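As an illustration of the outward reading described above, here is a sketch with one moderately hairy declaration, annotated step by step, and the same type rebuilt from typedefs so the compiler can check the reading. The names (`table`, `getter`, `read_outward_demo`) are hypothetical.

```c
#include <assert.h>

static int one = 1, two = 2, three = 3;
static int *get_one(void)   { return &one; }
static int *get_two(void)   { return &two; }
static int *get_three(void) { return &three; }

/* Read outward from the identifier; postfix [] and () bind before *:
     table                   table is
     table[3]                an array of 3
     *table[3]               pointers to
     (*table[3])(void)       functions taking no arguments, returning
     int *(*table[3])(void)  pointer to int                            */
static int *(*table[3])(void) = { get_one, get_two, get_three };

/* The same type built one step at a time. */
typedef int *getter(void);            /* function returning int * */
typedef getter *getter_ptr;           /* pointer to such a function */
typedef getter_ptr getter_table[3];   /* array of 3 such pointers */

int read_outward_demo(void)
{
    getter_table *same = &table;   /* accepted only if the types match */
    int total = 0;

    for (int i = 0; i < 3; i++)
        total += *(*same)[i]();    /* index, call, then dereference */
    return total;                  /* 6 */
}
```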

Edit: upon pondering the example

    int f((((((((((((((((((((((((((((((((((()))))))))))))))))))))))))))))))))));

I do not think it is legal in C89/90/99/11, since a parameter-declaration must begin with declaration-specifiers, and the declaration-specifiers cannot begin with an opening parenthesis. (Go to http://www.quut.com/c/ANSI-C-grammar-y.html and start following the rules via declaration->init_declarator_list->init_declarator->declarator->direct_declarator->parameter_type_list->parameter_list->parameter_declaration.)

On the other hand, I believe this:

    int f(int(((((((((((((((((((((((((((((((((()))))))))))))))))))))))))))))))))));

is legal and the type of the parameter is (pointer to) function returning int, with plenty of redundant parentheses around the abstract declarator.

robertelder | karma 3325 | avg karma 10.17 · 2016-07-08 02:31:25+00:00

Thanks for that comment, I'll read over that thread to see if I missed anything. Upon closer reading of the spiral rule, I don't think I have actually been following it as closely as I thought, so I don't think it affects the correctness of any other parts of the article. All this time, I had been kind of thinking that the spiral rule was a sort of 'gold standard' way to remember how to read declarations, but it doesn't clearly explain how to parse stuff like

   a[1][2][3][4];
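Reading outward from the name handles this case without any spiral: each `[]` is applied left to right, so with an `int` in front the declaration means "array of 1 array of 2 arrays of 3 arrays of 4 int". A quick sketch that checks this reading with `sizeof` (the function name is made up):

```c
#include <assert.h>

int nested_array_demo(void)
{
    int a[1][2][3][4];

    /* Each subscript peels off the leftmost remaining dimension. */
    assert(sizeof a          == 1 * 2 * 3 * 4 * sizeof(int));
    assert(sizeof a[0]       ==     2 * 3 * 4 * sizeof(int));
    assert(sizeof a[0][0]    ==         3 * 4 * sizeof(int));
    assert(sizeof a[0][0][0] ==             4 * sizeof(int));

    a[0][1][2][3] = 9;             /* the last element */
    return (int) (sizeof a / sizeof a[0][0][0][0]) + a[0][1][2][3]; /* 24 + 9 */
}
```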

bla2 | karma 1888 | avg karma 5.06 · 2016-07-07 22:47:13

Fun fact: `typedef int I;` and `int typedef I;` are both valid and do the same thing.

bluetomcat | karma 3230 | avg karma 3.81 · 2016-07-07 23:51:17

For the curious on why this is: at the syntax level, typedef is treated like any other storage class specifier (static, extern, auto, register). So it is legal just like 'int static a' is legal.
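A minimal sketch of both spellings, with a C11 `_Generic` check that the reordered typedef still names `int`. The identifiers are invented for the example. Note that the C standard marks storage-class specifiers placed anywhere but first as an obsolescent feature, and some compilers warn about it at higher warning levels.

```c
#include <assert.h>

typedef int I;        /* the usual order */
int typedef J;        /* specifiers reordered: J is still a name for int */

int static counter;   /* likewise the same as "static int counter;" */

#define SAME_AS_INT(x) _Generic((x), int: 1, default: 0)

int specifier_order_demo(void)
{
    I a = 1;
    J b = 2;

    counter = a + b;
    return SAME_AS_INT(a) && SAME_AS_INT(b) && counter == 3;
}
```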

gsg | karma 893 | avg karma 2.96 · 2016-07-08 05:57:31+00:00

And if you want to go weirder, implicit int means that you can leave out the `int`:

    typedef x;

Even "better", the declarator list in a C declaration is optional, in order to allow for struct/enum declarations that do not list any variables. So you can leave out the variable, add qualifiers, etc:

    typedef;
    const typedef;
    typedef const;

It's a really fun syntax in some ways.