On a first skim, this looks really nice; complaints that it's unreadable are unfounded. The background that makes it readable is Wirth's Compiler Construction http://www.ethoberon.ethz.ch/WirthPubl/CBEAll.pdf plus precedence climbing http://en.wikipedia.org/wiki/Operator-precedence_parser#Prec...


> complaints that it's unreadable are unfounded

Not exactly. You have to remember that language and compiler design require a LOT of work and experience to understand, and that many programmers will only see this as, frankly, spaghetti.

I think it could have used some more block comments, but that's just me.


I'm writing a chapter for AOSA on something like this: a self-hosting compiler of a subset of Python to Python bytecode. It'll present the full code (about the same length) and try to explain it well for people not into compilers yet, but in the meantime I recommend the Wirth book.

What's your favorite shorter intro? I'd especially like to reference other educational compilers that are self-hosting.


So, there is another version of Architecture of Open Source Applications in the works? I love the first two editions - they are unique books that really are gems. Are you able to divulge what projects are covered in the new version?

https://github.com/aosabook/500lines and I'm looking forward to reading it too. :)

I'm tempted to link to my draft chapter, but though the code is essentially done the text needs a lot of work.


I had the same first instinct, but given that a) it's very very tidy code and b) if you want to understand the inner workings of a compiler then you really do need to figure this out, I decided on review that it's basically self-documenting.

Of course figuring out what it's doing is one thing - understanding why it is done in this particular way is another, and while I was able to find my way around fairly quickly I'd cry if I had to re-implement it. I do love how small it is though, that gives it great educational value.


> understanding why it is done in this particular way is another

Isn't that the reason for comments in the first place?


I look to comments to tell me what a block of code is doing rather than why, eg 'Performs a Discrete Cosine Transform on the contents of the buffer' or 'Bubble sort algorithm rearranges the records in at least as much time as required to enjoy a nice cup of tea.'

The 'why' of very low-level tools like this is the sort of thing that needs to be explored at length in a paper or (in this case) a book, otherwise it'll swamp the actual code. Sometimes as a learning exercise I'll take something like this and comment the hell out of everything, but the value there is more in writing the comments than trying to read them again later. Of course this is very much a matter of personal taste.


I am a junior dev without a ton of experience so correct me if I'm wrong, but I strongly disagree. Comments should explain "why" something was written. Wouldn't the function name indicate what you are doing (and comments in the function)? This is especially true in business logic.

reverseNaturalSortOrder(listOfItems); // case sensitive sort of items by reverse alphabetical order

or

reverseNaturalSortOrder(listOfItems); // sort this way because the upper layer expects items in reverse order since user requested it

I think it is usually significantly easier to understand what something is doing rather than why it is doing that. To answer the former it usually requires a narrow scope of focus, but the latter requires a very broad scope.


Sure, I'm not arguing against ever saying why you'd do something :) I just like descriptive comments because they speed up the business of figuring out the high-level structure of the code - you're right that understanding exactly what's going on can be the most difficult part, but for me that answer usually falls out as I build a mental model of what it's doing. On the other hand some algorithms need in-depth explanation that's beyond the scope of a comment (hence my example of the transform).

Mind, I mostly do hobby/experimental programming, I've just been doing it for a long time. So I'm not commending this as a work practice or anything - your point about business logic made good sense.


I agree with you completely - The code explains what you're doing, comments explain why you did it that way. Ideally, any comments that explain what you're doing would end-up being redundant when looking at the code.

I think this code details a special case of the above though, in that it comments what the enums are instead of just naming the enums. I give that a pass strictly because this code needs to be able to compile itself, and I don't think it supports named enums, so the comment was necessary to make up for that.


It's not that simple though; error fixes and edge cases often obfuscate something that was understandable. A why comment is never bad, but a what comment is often as valuable as a test.

Like I noted with my special case, it's not always that simple, but I routinely find the best commented code to be code which was written with the comments explaining why and the code explaining what. There are definitely times where a what comment is warranted, but it's just not the general case.

That should almost never be the case. If you find this to be a frequent occurrence, then the code base in which you are working is not designed for the problem domain to which it is being applied.

Agreed - code should explain what code does. Duh.

There are three levels to consider:

1. Readability for modification

2. Readability for the "what"

3. Readability for the "why"

All human-readable description in code is there to make the difference between having a piece of documentation pointing you in the right direction, and having to reverse engineer your understanding. It's an optimization problem with definite tradeoffs. Although CS professors and many tutorials will tend to encourage you towards heavy description, over-description creates space for inconsistent, misleading documentation, which is worse than "not knowing what it does."

When you see code that is dense and full of short variables, it's written favorably towards modification. It is relying on a summary comment for "what" it does, and perhaps on namespaces, scope blocks, or equivalent guards to keep names from colliding. Such code lets you reach flow state quickly once you've grokked it and are ready to begin work, because more of it fits onscreen, and you're assured the smallest amount of typing and thus can quickly try and revise ideas. The summary often gets to stay the same even if your implementation is completely different. And if lines of code are small, you enjoy a high comments-to-code ratio, at least visually.

Code that builds in the "what" through more descriptive variable names pays a price through being a little harder to actually work in, with big payoffs coming mostly where the idea cannot be captured and summarized so easily through a comment.

In your example, one might instead rework the whole layout of the code so:

    var urq; /* user request type */

    ... (we build list "l") 
    
    /* adjustments for user request */
    {
      if (urq=="rns") { /* case sensitive sort by reverse alphabetical order */ ... }
    }

If you aren't reusing the algorithm, inline and summarize. And if you're writing comments about "what the upper layer expects", then you (or your team) spread and sliced up the meaning of the code and created that problem; that kind of comment isn't answering a "why," it's excusing something obtuse and unintuitive in the design, and is a code smell - the premise for that comment is hiding something far worse. If the sequence of events is intentionally meaningful and there's no danger of cut-and-paste modifications getting out of sync, it doesn't need to be split into tiny pieces. Big functions can be well-organized and easy to navigate just with scope blocks, comments, and a code-folding editor.

"Why" is a complicated thing. It's not really explainable through any one comment, unless the comment is an excuse like the example you gave. The whole program has to be written towards the purpose of explaining itself (e.g. "literate programming"), yet most code is grown organically in a way that even the original programmers can only partially explain, starting with a simple problem and simple data and then gradually expanding in complexity. Experience (and unfamiliar new problem spaces) eventually turns most programmers towards the organic model, even if they're aware of top-down architecture strategies. Ultimately, a "why" has to ask Big Questions about the overall purpose of the software; software is a form of automation, and the rationale for automating things has to be continuously interrogated.


Well, the goal is for it to be minimal, and C doesn't actually do a lot of work for you. On reading it, the algorithm it uses to parse was of less interest than the various tricks it uses to initialize and manage the state of the parser in a compact way.

Compilers are also a very well understood topic. If you have seen (and understood) a recursive descent parser with precedence climbing, the code looks as expected. It is a pretty straightforward implementation.

Basically, what you're saying is that any toy compiler example should be accompanied with a copy of the Dragon Book.

You seem to suggest that a background in compiler theory is somehow table stakes for commenting on HN. Since many here are not developers, and many developers don't have a CS degree, a few contextual comments seem appropriate.

You seem to suggest that a niche, minimalistic toy example should always be accessible to non-developers.

It's a false dichotomy that you either get it or you don't. People will access things according to their ability. Where to draw the line on being inclusive? If someone has genuine curiosity and motivation to ask a question and learn, then providing a few lines of overview doesn't clutter the board much and can be a positive contribution.

Exactly, where to draw the line? Explaining the concept of abstract and virtual machines may take a few pages of dense text, explaining how to parse expressions with precedence may require dozens of pages, explaining C types will add a few more.

So, yes, either you're curious enough to dig into the code and find the relevant explanations somewhere else (the said Dragon Book and the like), or you won't get it, regardless of how comprehensive the comments are.


If you're commenting about a remarkably clever example of an obscure topic which requires prolonged study to understand, then yes I'd suggest that a background in _______ is somehow table stakes for commenting on a focused discussion of _______ on HN.

"Toy examples" are often the result of long & deep study and practice of a subject, creating something profound which casual observers are not entitled to instantly understand. In this case, it's a very clever compiler: everybody understands this summary, and if you want "a few contextual comments" beyond the source code itself then you know where to get enough information to learn what you need to understand this.

If you don't "get it", and don't want to "get it" on your own, it's not for you.


It is unreadable. Lumping code together into big functions just so you can say "Look I've created a compiler in 4 functions" is pointless, unless your goal is to post it to HN to show everyone how clever you are.

This is not how you code when you work in a team or when you know some other poor soul has to come along and maintain it.

I suggest you take a look at [1] then go and read this excellent book by Martin Fowler: "Refactoring: Improving the Design of Existing Code" [2]

[1] https://en.wikipedia.org/wiki/Code_smell

[2] http://www.goodreads.com/book/show/44936.Refactoring


I have a feeling that the project is a mix of for-fun and to-see-if-its-possible.

Not all programming is enterprise quality, some programming is intentionally not.

Being so dismissive of that seems a little silly to me.

The demoscene doesn't exist because of a bunch of programmers trying to make the lives of every other programmer more difficult, it exists because people like the challenge.


Of course this occurred to me, my comment was more of an emotional response to "complaints that it's unreadable are unfounded".

If quite a few people are saying that it's unreadable, you don't just dismiss them as making "unfounded" statements.

> The demoscene doesn't exist because of a bunch of programmers trying to make the lives of every other programmer more difficult, it exists because people like the challenge.

I'm pretty sure that this doesn't need to be stated.


This also is not code that would need to be written by a team.

The functions may be big, but they also don't have all that much duplicate code inside them. The choice of 4 functions isn't arbitrary either - they nicely divide the problem into:

- next() - splits the source code into a series of tokens

- expr() - parses expressions

- stmt() - parses statements

- main() - starts the processing of the source, and also contains the main interpreter VM's execution loop

Code generation is integrated into the parsing, since it's generating code for a stack-based machine and that also very nicely follows the sequence of actions performed when parsing.

In fact I'm of the opinion that the obsession with breaking up code into tiny pieces (usually accompanied by the overuse of OOP) is harmful to the understanding of the program as a whole since it encourages looking at each piece independently and misses "seeing the forest for the trees".

In contrast, this is code that is designed to be easily read and understood by a single person, showing how very simple an entire compiler and interpreter/VM can be. It doesn't attempt to hide anything with thick layers upon layers of abstraction and deep chains of function calls, but instead is the "naked essence" of the solution to the problem.

Someone used to e.g. enterprise Java may find this style of code quite jarring to their senses, but that's only because they've grown accustomed to an environment in which everything is highly-abstracted and indirect, hiding the true nature of the solution. Personally, I think the simplicity and "nakedness" of this code has a great beauty to it --- it's a functional work of art.


Breaking up code is more than just removing redundancy - it's about exactly what you have written: "Encouraging looking at each piece independently". That actually has advantages, the main being that it is easy to understand how each piece operates independently of the other pieces - simply cohesion & coupling which is applicable to any language, not just "Java". I don't believe in my experience that I miss "seeing the forest for the trees" - I encourage you to try it, perhaps by learning some functional programming.

One obsession that I am tired of is people posting "X awesome thing in 120 lines of JavaScript" or "Y in 4 functions". Just because the problem is reduced to as small a metric as possible, it doesn't make it good.

PS: Also you mention negatively connotated terms like "OOP", "Java", "thick layers of abstraction" and "deep chains of function calls" as if you've ascertained that I'm some enterprise developer that doesn't have any C experience and wouldn't know simplicity if it bit me in the ass.


I suggest you consider how someone could know of, and understand, your suggestions very well, while still having the opinion he does. Don't underestimate other people.

Most of it was readable, but the printf on lines 57--59 made me retch. I see what it's doing, but it's not what I'd call easily maintainable:

  printf("%8.4s", &"LEA ,IMM ,JMP ,JSR ,BZ  ,BNZ ,ENT ,ADJ ,LEV ,LI  ,LC  ,SI  ,SC  ,PSH ,"
         "OR  ,XOR ,AND ,EQ  ,NE  ,LT  ,GT  ,LE  ,GE  ,SHL ,SHR ,ADD ,SUB ,MUL ,DIV ,MOD ,"
         "OPEN,READ,CLOS,PRTF,MALC,MSET,MCMP,EXIT,"[*++le * 5]);

I'd like to know why this was downvoted, and can the people who downvoted it please explain what the code is doing?

It's taking an integer (representing an operation) and printing out the name of that operation.

First thing to say is that "* ++le" is the integer representing the operation to perform. This basically walks through the array of instructions returning each integer in turn.

Starting at the beginning of the line, we have "printf" with a format string of "%8.4s". This means print out the first 4 characters of the string that I pass next (padded to 8 characters). There then follows a string containing all of the operation names, in numerical order, padded to 4 characters and separated by commas (so the start of each is 5 apart). Finally, we do a lookup into this string (treating it as an array) at offset "* ++le * 5", i.e. the integer representing the operation multiplied by 5 (5 being the number of characters between the start of each operation name). Doing this lookup gives us a char, but actually we wanted the pointer to this char (as we want printf to print out this char and the following 3 chars), so we take the address of this char (the & at the beginning of the whole expression).

It's concise, but not exactly self-documenting.

Does that make sense?

(I didn't downvote.)


How is that not self-documenting if one knows C?

I think you and I might disagree on the meaning of self-documenting. ;)

I don't think we really do.

It is impenetrable black magick if one "knows" C -- but quite clear if one /actually/ knows C.


Ah, the No True Scotsman finally arrives to the party.

No. The printf() requires that one has read K&R. That's not a high barrier to clear. Pointers are chapter 5.

> On a first skim, this looks really nice; complaints that it's unreadable are unfounded.

Man, I can't even tell what this is supposed to be. My confusion is entirely founded. My thought process with articles like this goes something like "C in four functions, huh? Sounds like it could be clever. I'll just click and read the explanation... Oh, there isn't an explanation. Well, maybe this file will explain things! ...Nope, it's 500 lines of mostly-uncommented if-else statements. Maybe it's a compiler? I dunno!"

I'm sure there's a subset of the programming community for whom this is crystal clear on first sight, and that's great; but there's a lot more of us who could probably get the joke with a few hints, so it would be nice if you'd help out instead of declaring that if you understand it, it must be easy.


In the sense that some people won't have an idea of what's going on, this community altogether isn't particularly inclusive at all. Personally, I really don't want the topics this site covers to cater to a lowest common denominator, and I'm sure that isn't what you had in mind either, but that's the effect of taking "more of us" to mean more than you personally.

Actually, a lot of us probably don't understand what this is doing. I sure don't, but I really don't consider myself "lowest common denominator" either. I come here to learn, to be honest!

My point isn't that not knowing what this does makes you a lowest common denominator. It's that most stuff that is shared on this site at least borders on being what I'd call esoteric, and if making each individual submission more approachable or catering to a larger general audience is a goal of this community, it isn't really going that way. If it was going that way, I doubt the community would be particularly interested in this site.

This submission in particular has dubious practical use, and the description, "an exercise in minimalism", is telling of a sort of artistic intent. If you don't understand what it does, how it does it, or if you don't like it, it won't lower my opinion of you in any way, but its inclusion on this site is part of why I like to come here every now and then. I get to discuss subjects that relate to my work and hobbies, but I also get to look at weird alien code and think hard in unfamiliar terms. From what you are saying, I think you can relate.

Personally, I could glance over it and get the idea that it is a C compiler, but if you were to show me some code written with the latest JS MVC or FRP framework, don't hold your breath for me to tell you what it does. I can't say that I fully understand this, and that's why I enjoy the rich discussion the submission spawned here.


Fair enough :-) thanks for clarifying.

The logical leap from adding a "few hints" to everything becomes "lowest common denominator" is the size of the Grand Canyon.

Personally I don't see anything wrong with suggesting to add a few hints, but the basis of that suggestion in this case was that "there's a lot more of us who could probably get the joke" with the hints. If making it approachable to more people is inherently a good thing, the logical conclusion is to make it approachable to everyone.

With code like this, the readability obviously isn't a high priority consideration and sometimes the exact opposite of the goal, with the impenetrability sometimes being part of its charm. This is Hacker News after all, and if your reaction to a piece of code that describes itself as "an exercise in minimalism" is to leave a snarky comment about the lack of documentation, you should probably check your news elsewhere.

If you have any interest in the subject, the initial comment "just enough features to allow self-compilation and a bit more" should give the purpose of the code away. If not, it ought to have been a clear sign of dragons.


Yes, I'm sorry: I meant to defend this code from the charge of being pointless code golf, and inadvertently disparaged people without the background to enjoy reading it. It really is hard to follow without that background, which lots of good programmers don't have.

I hope someone will write an explanation. I'm still working on one for my own (quite different) little compiler.


> complaints that it's unreadable are unfounded

    int *a, *b;
    int t, *d;

I can't really see how anyone can say that code that uses one-letter variable names (with the exception of the "standard ones", whose meaning is defined at the top) is readable.

OK, 'unfounded' was a little too strong. I'd change some things myself, including comments on those declarations; but if this looks like a code-golf game to you, it's not, it's a style you're not used to.
