Git Source Code Review (fabiensanglard.net)
143 points by laurent123456 | karma 6376 | avg karma 5.53 | 2014-03-30 03:51:03+00:00 | 40 comments




Somewhat related: I've been using an iOS app called NapCat for reading code from GitHub repos for pleasure and edification:

https://itunes.apple.com/app/napcat-github-client-for-open/i...

The 'trending' and keyword search features make it stand out from other GitHub clients on iOS. No affiliation.


I want this for Android :D

To address something a commenter said I'd just like to point out that whatever Linus may think, a "git" in British slang is an unpleasant person, not a stupid person.

From Wikipedia

> Torvalds has quipped about the name git, which is British English slang roughly equivalent to "unpleasant person". Torvalds said: "I'm an egotistical bastard, and I name all my projects after myself. First 'Linux', now 'git'." The man page describes git as "the stupid content tracker".


Oops ok, sorry Linus! I didn't mean to suggest that you had applied "git" inaccurately to yourself. I guess it's Merriam-Webster that is one source of the incorrect definition.

http://www.merriam-webster.com/dictionary/git


And the downvotes are because...the weak attempt at humor I suppose? HN moderators -- a useful feature would be being able to annotate a downvote with a reason, so that commenters can learn where they erred.

The Merriam-Webster reference is pertinent in a subthread which is about the meaning of the word "git" (yes, not germane to the article, but to criticize that you would downvote the comment at the root of the subthread.) So all these downvotes for one good-natured sentence?


I feel this description is Linus backtracking, where the original target of the "git" label was in fact Andrew Tridgell, who was at the center of the Linux/bitkeeper reverse engineering drama [0].

[0] http://www.theregister.co.uk/2005/04/14/torvalds_attacks_tri...


Is it not a corruption of the word 'get', meaning bastard/illegitimate child? As such it's more of a general insult than anything in particular.

I'm English and 'get' and 'git' are used in different contexts.

"This job will be a complete git to do" and "He's always been a bit of an old git"

vs

"He is a complete get"

As for your question, the only source I could find for git being a corruption of get is a Wikipedia entry which has a big [citation needed] next to it, and the sentence is incomplete.


I'm English, and I thought "get" was just some weird provincial pronunciation of "git".

As for etymology, the OED thinks it came from "get":

http://www.oed.com/view/Entry/78536?redirectedFrom=git#eid


Nice intro, I thought it was a bit short though. I was thinking, "OK, now we're going to get into it..." and then it ended.

There's a link to the next part under /Next Part/. Did you miss that (or am I misunderstanding you)?

I'm still not sure why it took an immediate digression into editors.

If the author is reading this, there's a typo in the article: wbout

I was surprised to see no mention of Perl, but looks like it's only used for tools like git-svn.

It's used for more than git-svn: git-add--interactive.perl git-cvsexportcommit.perl git-cvsserver.perl git-relink.perl git-svn.perl git-archimport.perl git-cvsimport.perl git-difftool.perl git-send-email.perl

There are actually more lines of Perl than shell if we exclude tests:

  -------------------------------------------------------------------------------
  Language                     files          blank        comment           code
  -------------------------------------------------------------------------------
  C                              340          20596          18649         135900
  Perl                            43           4698           4310          27503
  Bourne Shell                    77           2523           1843          18766
  C/C++ Header                   140           2636           4635          11132
  -------------------------------------------------------------------------------
  SUM:                           600          30453          29437         193301
  -------------------------------------------------------------------------------
If we don't exclude tests:

  -------------------------------------------------------------------------------
  Language                     files          blank        comment           code
  -------------------------------------------------------------------------------
  C                              341          20602          18656         135910
  Bourne Shell                   761          21736           6908         124222
  Perl                            47           4728           4325          27739
  C/C++ Header                   140           2636           4635          11132
  -------------------------------------------------------------------------------
  SUM:                          1289          49702          34524         299003
  -------------------------------------------------------------------------------

You missed a big one - gitweb.

Nice write up. I'm surprised there is so much use of shell scripts. I usually do my best to avoid writing them and use them for little more than "stringing commands together". I guess not everyone shares my dislike for shell scripting and its syntax.

Going back to the introduction, this comment:

"Linux kernel 3.10 release had 15,803,499 lines of code !"

This is probably a stupid question, but what's going to happen to the large low level projects in the future when these devs have retired? Is the industry creating enough young talent to take over the low level stuff?

I personally went through Uni without having to write a single line of C. Even with those that go through a Computer Science (instead of Development focused) degree, how many actually leave and choose C for their personal projects or use it in the industry?


C isn't popular at colleges... and why should be it? So many more interesting, fun, crazy languages to play with... lisp, haskell, rust, ocaml, etc.

That said, once you are in the real world and you bump into your first real hard problems, the answer is often C. Need to speed up some Python, use C. Need to pour rocket fuel on your Erlang, use C. Need to run on an embedded device, use C. C tends to be low on surprises and most of them are exceptionally well documented and understood at this point. C tends to have decent libraries for whatever you need.

In certain open source communities -- you will use C because the stuff you depend on uses it. It is that simple.

I suspect the number of C developers is still relatively high, and while it won't be a primary / favorite of many -- it is the lingua franca if you will... lots of times stuff can be framed as a comparison to C, because everyone understands the context.


"C++ and Java, say, are presumably growing faster than plain C, but I bet C will still be around." (Dennis Ritchie)

"and why should be it?"

I think you answer your own question. I was going to reply to the parent post, but I'll say it here. I find it barking mad that you can get a 4 year degree without using C or close equivalent. How else do you really understand how the machine works, how do you debug difficult hardware problems, how do you.. you get the idea.

The counter argument is that University is not trade school. Sure, but you need to understand the fundamentals. While I intensely dislike how Knuth uses a made up assembly language for his books, he has an important point - we really do need to understand how things are implemented under the hood, either because we are the ones implementing what is under the hood, or we have to understand which of many choices to use, and/or to debug problems when they come up. And that is not a detail, it is a fundamental aspect of being an engineer.

I witness people, when I ask them "why is X happening", respond with guesses. I don't mean they say "since Y and Y, I suspect Q", which is a reasonable utterance for an engineer. I just get "Q". You press as to why, and like startled birds they flush and you get "well, maybe R, or it could be P. Yes, it's P".

Whether it is a "science" degree or "engineering", we need to understand systems, and have experience with them. Reading a book about heaps only takes you so far. If you haven't implemented, say, a simple file allocation system, debugged some pointers, stepped through the allocation and deallocation of memory, looked at the assembly generated from your code, and so on (these can reasonably be replaced with other, equivalent experiences of course), how can you really understand computers?


The shell can be a disproportionately powerful programming environment.

    $> foo bar &
    $> foo baz &
parallelizes `foo` with a minimum of technical debt compared to a lower level approach. Perhaps the most famous example in all hackerdom illustrating the power of the shell was the Knuth-McIlroy affair.

http://www.leancrew.com/all-this/2011/12/more-shell-less-egg...
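A minimal sketch of the two-line example above as a script. The `foo` function here is a hypothetical stand-in for whatever command is being parallelized, and `run_parallel` is a name made up for illustration. The only additions to the original snippet are the `wait` call, which blocks until both background jobs have finished, and the one-second `sleep` that simulates work.

```shell
#!/bin/sh

# Hypothetical stand-in for the `foo` command in the comment above.
foo() {
    sleep 1              # simulate some work
    echo "done: $1"
}

# Start both jobs in the background, then block until both have finished.
run_parallel() {
    foo bar &
    foo baz &
    wait
}
```

Calling `run_parallel` should take roughly one second rather than two, since the two jobs overlap. The order of the two output lines is not guaranteed.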


"The shell can be a disproportionately powerful programming environment."

Then how do you explain why other languages are so much more popular?

Not only that, but how do you explain why the market for software developers compensates those with experience in these other languages more than those few developers who are highly competent in writing portable shell scripts?

Relative to what I have seen written by other developers, I consider myself a competent shell user.

Practice helps. On average I write or revise more than one new script per day.

And I've been doing this for years. The number of shell scripts I have written numbers in the thousands.

Is there a place where shell scripting is valued on par with the trendy languages like Python, Ruby, etc.?


Power is ill-defined and not related to popularity, which is in turn only lightly linked to market value.

(Arguably the most powerful language by weight is APL, which has a tiny but dedicated community)

Shell is very good at a narrow range of tasks involving text and file manipulation, but has a couple of crippling limitations: whitespace in filenames (especially, god help you, newlines) destroys many casually written scripts, and the only structure really supported by the shell utils is newline-delimited.
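To illustrate the whitespace problem, here is a small sketch contrasting a casually written loop with a quoted one. Both function names are hypothetical, and the sketch assumes a POSIX `sh` or `bash` with the default `IFS`.

```shell
#!/bin/sh

# Fragile: the unquoted $(ls ...) is word-split on whitespace,
# so a file named "my file.txt" is counted as two items.
count_fragile() {
    n=0
    for f in $(ls "$1"); do
        n=$((n + 1))
    done
    echo "$n"
}

# Safer: let the shell expand the glob and quote every expansion,
# so each filename stays one word whatever characters it contains.
count_safe() {
    n=0
    for f in "$1"/*; do
        [ -e "$f" ] || continue   # skip the literal pattern if nothing matched
        n=$((n + 1))
    done
    echo "$n"
}
```

With a directory holding `my file.txt` and `plain.txt`, the fragile version sees three words while the safe version sees two files. Like `ls`, the glob skips dotfiles by default.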


I love APL!

I am a student of k and now q.

The biggest problem of the shell is the rules around quoting. That includes whitespace but also many other snafus.

As for all the utilities being geared toward line-by-line (newline delimited), this is true.

But quoting can be learnt with practice; I rarely have problems because it is second nature.

And, there's a shell utility called lex. It lets you design your own utilities (filters).

And you can create filters that read multiple lines, easy as pie. (Easier than mastering awk.)

You can even create your own programming language by combining lex with another standard utility: yacc. This is how C was created.

Do they still force CS students to learn about these utilities?


That was good. Apart from anything else it reminds me I should use `tr` more.

You didn't have an exercise in building compilers using yacc/bison or similar, generating assembler code? It was a required exercise at my university (Vienna University of Technology). Though it was the only time I wrote assembler (and it wasn't x86 assembler but 64-bit Alpha assembler, because this instruction set is simpler to teach, being RISC).

Where's the code review... and did he really throw in an IDE circlejerk in this article? I can't even tell what it's about. Why is this on HN?

Sadly the source code review stays far away from the source code for most lines.

I would have liked to see examples on how algorithms are implemented in git and general notions on git's coding style, testing?? and other interesting stuff that one might find in one of the landmark open source projects.


I'm confused. He talks about how he reviews the code. I expect some thoughts about it. Then he talks about how he's switching to vim and it's weird? Then he links to Documentation?

You're supposed to click the link to take you to the next page.

Yes, I missed that link too the first time; he really should make it more obvious that there's a part 2 and part 3.

This is not a "source code" review. There are no comments on the overall practices that developers must deal with when writing C: resource management, string handling, general error handling and its coverage (e.g., is EVERY system call checked for errors?), possible cross-platform issues, unspoken assumptions about implementation-defined behavior, etc.

Usually Fabien gives a high level overview and dives into some of the pieces of the project that he finds interesting. I don't think this is the type of review that focuses on overall C practices unless it's something really unique to the project.

This is a series of articles, and is not yet complete. It starts from a high level description of the organization of the project, and is going into more detail as it goes, but it is not yet complete; notice that at the end of the third article there is a "Next: To be published: Git internal algorithms (Tree and Diffcore API)". I presume there will be more after that, though the author doesn't provide an overview of what he expects the full series to consist of.

Also, I don't believe this is intended to be a review looking at coding practices that may cause problems, but rather a review in the sense of a description of a real-world codebase that describes how it works. Think of it in the sense of a review of the literature, a summary of what a body of work is about, not in the sense of a code review that is looking for potential problems that may need to be fixed.

As far as the issues you bring up go, Git is pretty thoroughly cross-platform if you assume a reasonable Unixy/POSIXy platform; while it can be made to work on Windows, it's a bit more cumbersome there; you need to use something that provides at least somewhat of a Unix like environment on Windows, such as MSYS or Cygwin.

As far as resource management and so on go, one of the big problems with the Git code base is that it's designed pretty heavily around a one-shot command model, rather than something which is amenable to being used as a library or in a long-running process. This is the reason why the libgit2 project exists, to provide an implementation of Git that can be used as a library and integrated into other applications. Even Linus himself, who wrote the initial Git implementation, uses libgit2 when he wants something that can be used as a library[1].

[1]: https://plus.google.com/+LinusTorvalds/posts/X2XVf9Q7MfV


I built this tool a while ago to help me browse source code but haven't done anything with it yet:

http://sherlockcode.com/

Here is the demo - a now older version of the jQuery source code:

http://sherlockcode.com/demos/jquery/#!/src/core.js

Hover over a variable to see all instances of that variable highlighted and click on a variable to see all uses of that variable across all the files. You can also bookmark lines by clicking on the line number.


This is very nice. I wish GitHub had something like this built-in.

This comment has gathered more interest than the Show HN post I did about it months ago. I'm thinking of running the top 100 projects on GitHub through it this week and seeing if it has "legs".
