Git's initial commit

brandonbloom | karma 1949 | avg karma 4.07 · 2014-11-24 01:06:10+00:00

I love checking out very early versions of projects. You often get to see the essence before the real world came in and ruined the beauty of it.

akkartik | karma 13864 | avg karma 3.99 · 2014-11-23 19:39:05

I do this as well. It really should be more widely broadcast.

(I've also spent some time thinking about how it's kind of a hack, and what we can do to make it better: http://akkartik.name/post/wart-layers)

Monkeyget | karma 636 | avg karma 4.54 · 2014-11-24 12:57:05+00:00

There is the Architecture of Open Source Applications series of books (http://aosabook.org/en/index.html), where one of the authors of each piece of software explains the essence of the program.

RexRollman | karma 4158 | avg karma 2.73 · 2014-11-24 02:38:37+00:00

I know what you mean. The SystemD controversy motivated me to take a look at the initial version of NetBSD's init rc script, which was nicely simple.

Fizzadar | karma 1102 | avg karma 3.71 · 2014-11-23 19:12:07

Great to see the original command set, and the title of course: "GIT - the stupid content tracker"

derekp7 | karma 6524 | avg karma 2.68 · 2014-11-24 03:02:53+00:00

If I recall, Linus was highly pissed at the time he wrote GIT. Lots of his comments at the time were meant as a slam against the guy who was reverse-engineering the Bitkeeper protocol, which resulted in the license for Bitkeeper getting yanked for the kernel project. I wonder if Linus is still angry with Tridgell?

sjwright | karma 8739 | avg karma 2.55 · 2014-11-23 21:28:34

I've been in the situation where a combative party has spurred me on to do some of my best work. I doubt Linus holds a grudge... and considering the consequences I wouldn't be surprised if he wrote a tongue-in-cheek thank you letter!

byteCoder | karma 894 | avg karma 9.22 · 2014-11-24 01:18:03+00:00

Following the tradition of sports, I propose that commit id e83c5163316f89bfbde7d9ab23ca2e25604af290 be officially retired.

SwellJoe | karma 32744 | avg karma 4.79 · 2014-11-23 19:40:18

Every commit id is pretty much automatically retired. The odds of collisions in our lifetime are pretty small.

Or am I over-thinking it?

traek | karma 1596 | avg karma 4.04 · 2014-11-23 19:44:56

Yeah, I'm pretty sure it was a joke.

VieElm | karma 676 | avg karma 5.24 · 2014-11-23 19:45:46

I was under the impression that git hashes are generated based on the contents in a commit. If that's true you cannot retire them. They're not random.

Iburinoc | karma 42 | avg karma 2.21 · 2014-11-24 05:07:07+00:00

While that is true, part of the contents that is hashed is the time of commit. Therefore if you (somehow...) managed to get that hash, you could just reset and commit again. Or amend.

In any case, I believe he was joking. The odds of a sha1 collision are very very low.
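
To make the timestamp point concrete, here is a minimal sketch. It assumes git's current object hashing (SHA-1 over `<type> <size>\0<body>`); the author, email and message are made up for illustration, and `git_object_id` and `commit_body` are hypothetical helper names, not anything from git itself.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """Hash an object the way current git does: SHA-1 over '<type> <size>\\0<body>'."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, timestamp: int) -> bytes:
    # Hypothetical author and message, invented for this illustration.
    who = f"A U Thor <author@example.com> {timestamp} +0000"
    return (
        f"tree {tree}\n"
        f"author {who}\n"
        f"committer {who}\n"
        "\n"
        "Initial revision\n"
    ).encode()


# The id of the empty tree, so the example needs no real repository.
EMPTY_TREE = git_object_id("tree", b"")

# Same tree, same author, same message: only the commit time differs by one second.
first = git_object_id("commit", commit_body(EMPTY_TREE, 1112911993))
later = git_object_id("commit", commit_body(EMPTY_TREE, 1112911994))

print(first)
print(later)
print("ids differ:", first != later)
```

So committing the same content again a second later should already give a different id.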

sytelus | karma 16606 | avg karma 4.49 · 2014-11-24 09:06:26+00:00

Very low indeed. A SHA1 collision reportedly takes on the order of 2^52 operations to find. One person explained it this way: the chance of everyone currently on Earth winning a jackpot in their lifetime is actually higher than that of a single random SHA1 collision. It should be mind-boggling how many software systems and algorithms rely on hashes not colliding.

yuhong | karma 6263 | avg karma 1.05 · 2014-11-30 00:33:01+00:00

I think it is 2^61 now, the paper that once showed the 2^52 attack has removed the claims.

CUViper | karma 804 | avg karma 3.3 · 2014-11-24 01:46:56+00:00

Given that the only way to reuse it is to duplicate the tree and commit metadata exactly, or find an sha1 collision, I think it's pretty safe. :)

I wonder if there are any git sha1 collisions out there in aggregate, say across all of github. Would they even notice if there were?

meowface | karma 10977 | avg karma 2.45 · 2014-11-24 02:18:17+00:00

>I wonder if there are any git sha1 collisions out there in aggregate, say across all of github.

Despite the incredibly high number of all commits there must be, I think the chance of a collision is still very unlikely. 2^160 is a pretty big number.

MichaelGG | karma 17386 | avg karma 2.49 · 2014-11-23 20:29:33

The number of inputs before a likely collision is more on the order of 2^80. Which is still pretty large.

meowface | karma 10977 | avg karma 2.45 · 2014-11-23 20:33:01

True, the birthday paradox definitely makes it a lot more likely, but as you say the odds should still be too low.
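
For a rough feel, here is a sketch using the standard birthday-bound approximation p ≈ 1 - exp(-n(n-1) / 2^(bits+1)). The function name and the trillion-object figure are my own assumptions for illustration, not measured numbers.

```python
import math


def collision_probability(n_objects: int, hash_bits: int = 160) -> float:
    """Approximate chance that at least two of n random hashes collide."""
    exponent = n_objects * (n_objects - 1) / 2 ** (hash_bits + 1)
    # -expm1(-x) is 1 - exp(-x), but stays accurate when x is tiny.
    return -math.expm1(-exponent)


# Around 2^80 objects the chance of some collision becomes substantial.
print(collision_probability(2 ** 80))

# A hypothetical trillion objects is nowhere near that.
print(collision_probability(10 ** 12))
```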

thret | karma 1707 | avg karma 1.91 · 2014-11-24 07:20:11+00:00

This is comparable to the number of atoms in the universe. Pretty large! We will never see an accidental collision.

zxcdw | karma 1100 | avg karma 2.56 · 2014-11-24 11:36:05+00:00

Not quite, atoms in the universe is in the range of 10^80, which is a bit less than 2^266.

On the other hand, 2^80 is "only" approx. 1.2 * 10^24. Still, good luck colliding with that without big effort.
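
A quick way to check these magnitudes yourself (just arithmetic, nothing git-specific):

```python
import math

# log2(10^80) = 80 * log2(10), i.e. the power of two closest to 10^80.
atoms_log2 = 80 * math.log2(10)

print(f"10^80 ~= 2^{atoms_log2:.2f}")
print(f"2^80  ~= {2 ** 80:.3e}")
print(f"2^160 ~= {2 ** 160:.3e}")
```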

userbinator | karma 78987 | avg karma 4.37 · 2014-11-23 21:29:15

There's a table in http://en.wikipedia.org/wiki/Birthday_attack which gives some numbers, but it's missing the 160-bit entry. Nevertheless, even the number of 128-bit hashes required for a random collision is extremely high.

In hindsight, it's good that git didn't choose MD5, since collisions for MD5 can be generated almost trivially now. However, the decreasing security of SHA-1 could be a concern for the future.

keypusher | karma 2250 | avg karma 3.42 · 2014-11-24 02:03:02

I don't think commit hash was ever intended to be cryptographically secure. It's just a unique identifier.

> Source control management systems such as Git and Mercurial use SHA-1 not for security but for ensuring that the data has not changed due to accidental corruption. Linus Torvalds has said about Git: "If you have disk corruption, if you have DRAM corruption, if you have any kind of problems at all, Git will notice them. It's not a question of if, it's a guarantee. You can have people who try to be malicious. They won't succeed. [...] Nobody has been able to break SHA-1, but the point is the SHA-1, as far as Git is concerned, isn't even a security feature. It's purely a consistency check. The security parts are elsewhere, so a lot of people assume that since Git uses SHA-1 and SHA-1 is used for cryptographically secure stuff, they think that, OK, it's a huge security feature. It has nothing at all to do with security, it's just the best hash you can get."

http://en.wikipedia.org/wiki/SHA-1#Data_integrity

gpvos | karma 7664 | avg karma 2.34 · 2014-11-24 14:16:32+00:00

Too bad he didn't use SHA-256 though. It had been available for three years at that moment.

beagle3 | karma 16421 | avg karma 2.62 · 2014-11-24 03:28:00+00:00

According to the Wikipedia entry[0], "No actual collisions have yet been produced", github or otherwise. The NSA might have produced them, but publicly none have been found, and it's not for lack of trying.

[0] http://en.wikipedia.org/wiki/SHA-1

CUViper | karma 804 | avg karma 3.3 · 2014-11-23 21:48:47

I take that statement to imply "on purpose", or as part of an attack. You can't know whether there's a coincidental collision anywhere in github unless you bother to look. But I do understand that it's still extremely improbable.

hyp0 | karma 907 | avg karma 2.27 · 2014-11-24 03:52:26+00:00

In a thousand years, a git sha-1 collision is going to cause a lot of trouble.

6chars | karma 235 | avg karma 4.27 · 2014-11-24 02:52:47+00:00

Done. Don't ask me how I did it, but you will never see that hash come up again naturally during your lifetime.

kzrdude | karma 11414 | avg karma 2.35 · 2014-11-24 07:39:53+00:00

It is SHA-1 though, so it might still be broken during our lifetime.

bjcubsfan | karma 547 | avg karma 11.64 · 2014-11-24 16:01:48+00:00

Which is why he added the caveat "naturally".

fivedogit | karma 1298 | avg karma 5.85 · 2014-11-24 01:35:22+00:00

Thread from 829 days ago. https://news.ycombinator.com/item?id=4395014

Sevein | karma 249 | avg karma 9.96 · 2014-11-24 03:24:59+00:00

Good memory!

fivedogit | karma 1298 | avg karma 5.85 · 2014-11-24 05:08:57+00:00

Nah. I just use Hackbook.

https://chrome.google.com/webstore/detail/hackbook/logdfcelf...

benihana | karma 5051 | avg karma 5.64 · 2014-11-24 02:03:52+00:00

Is there a reason there aren't any braces around single-line if statements? Is that a C thing? It seems kind of inviting to bugs to me.

workingandtired | karma 40 | avg karma 2.22 · 2014-11-24 02:09:25+00:00

It's a C-style language syntax option. If it's only a single line in after the if, the braces are optional. I've also seen it in C++ and PHP.

Whether or not it's sloppy is up for debate and just a matter of personal preference.

solistice | karma 584 | avg karma 1.38 · 2014-11-24 02:20:04+00:00

C# also has it, and I've heard Java has it as well, but I've never tested it in the latter.

I personally like being able to do it since it lets me do away with the 2 extra lines auto-indent puts in if I add brackets. That's a 50% reduction for a 4-line if. Maybe I should just buy a bigger monitor.

workingandtired | karma 40 | avg karma 2.22 · 2014-11-24 02:24:02+00:00

The lead programmer at my last job encouraged us to use it as a way to make sure our conditionals & foreach loops (PHP programmer) weren't doing too much. If we had to use braces, it was a sign to check it out and see if it could stand some refactoring.

tenementfunster | karma -15 | avg karma -0.79 · 2014-11-24 03:47:53

Sounds a tad inane.

seanmcdirmid | karma 34701 | avg karma 1.77 · 2014-11-24 02:24:24+00:00

I use a plugin called littlebrace for visual studio, which reduces brace lines to like 3 pt font. My C# code has started looking like Python :)

TheEzEzz | karma 1048 | avg karma 3.24 · 2014-11-23 21:41:09

This sounds really neat. Can you drop a link? I browsed around but couldn't find anything.

vacri | karma 17701 | avg karma 1.83 · 2014-11-23 22:46:22

https://github.com/lukesdm/little-braces looks like the one.

seanmcdirmid | karma 34701 | avg karma 1.77 · 2014-11-24 04:58:34+00:00

Also see this fork/port to VS2013:

https://github.com/owen2/little-braces

Also, it is available from the online add-in manager if you search for "little braces." The VS2013 community edition is just in time :)

I couple it with the indent guideline plugin for best effect (braces are super small, light lines to track indent level, 2 space indent...).

MichaelGG | karma 17386 | avg karma 2.49 · 2014-11-24 02:27:46+00:00

Or use a better bracing style. Or move to significant whitespace.

Someone | karma 30129 | avg karma 2.33 · 2014-11-24 07:17:34+00:00

"If it's only a single line in after the if, the braces are optional."

Not _line_, _statement_. Consider

  if(flag)
    foo(); bar();

and

  if(flag)
    foo =
      bar +
      baz;

That first example always calls bar().

Warning: I haven't tested this, and am beginning to doubt a bit. It must be correct, but why, then, don't I remember seeing this in underhanded C contests? Combining that with macros allows you to hide the semicolon.

rgbrgb | karma 4183 | avg karma 3.2 · 2014-11-24 02:10:55+00:00

So as to have more code on the page. It's official style of the Linux kernel [1].

[1]: https://www.kernel.org/doc/Documentation/CodingStyle

kevin_thibedeau | karma 19088 | avg karma 2.16 · 2014-11-24 02:15:03+00:00

In the C grammar, braces denote compound statements. Control flow statements can take any type of statement as their body rather than just the compound variety.

PurplePanda | karma 79 | avg karma 1.93 · 2014-11-23 23:43:43

It confuses me that it doesn't work for functions, like:

    int main() return 0;

pdw | karma 2753 | avg karma 6.13 · 2014-11-24 08:43:15+00:00

In K&R C, the function braces serve to separate parameter declarations and local variables:

    int main(argc, argv)
      int argc;
      char **argv;
    {
      int local;
    }

PurplePanda | karma 79 | avg karma 1.93 · 2014-11-25 03:51:10+00:00

Thanks, I haven't seen that syntax before.

desdiv | karma 4685 | avg karma 4.56 · 2014-11-24 02:16:58+00:00

It's pointless to argue over these kind of things. Every major project/company has their own codified code style guide, and if you want to contribute/earn your salary then you must follow that style guide to the T. Here's the relevant quote from the Linux kernel coding style[0]:

    Do not unnecessarily use braces where a single statement will do.

    if (condition)
	    action();

[0] https://www.kernel.org/doc/Documentation/CodingStyle

guelo | karma 25003 | avg karma 4.71 · 2014-11-23 21:56:56

I consider that a bug in their style spec. Single line if statements are known to cause bugs.

sytelus | karma 16606 | avg karma 4.49 · 2014-11-24 08:48:06+00:00

That's what I'd thought for maybe over a decade. About 3 years ago I revamped my personal coding style to eliminate as much unnecessary baggage as possible. As part of that I stopped using braces for single-line ifs, and I have yet to hit a bug because of it. Overall I find the code looks more compact and cleaner, maybe even with less friction to read. Nowadays when I see braces around a single-line if, I get that "oh, that's clunky code" feeling in my stomach. Things are worse with C# and a lot of Java code, where people insist not only on having braces around a single-line if but also on putting { on its own separate line.

I think a good language shouldn't have braces to mark blocks in the first place. Given indentation, they are redundant most of the time and just add clunk. This is exactly the case with Python, where it is essentially the default style, and people haven't been complaining about it causing bugs.

scott_s | karma 34069 | avg karma 3.96 · 2014-11-24 09:00:50

There's an enormous difference with Python, because the indentation is syntax. These two code snippets, one in Python, the other in C, do not mean the same thing:

C:

  if (condition)
    statement_1();
    statement_2();

Python:

  if condition:
    statement_1()
    statement_2()

Personally, I always use braces in C and C++, even though it is more clunky. I want the assurance. I also frequently have to make changes to code that does not use braces, and then I have to add the braces in because I am adding statements to a conditional. To me, that is more clunky.

mushishi | karma 590 | avg karma 3.21 · 2014-11-24 14:41:46

I nowadays resist the urge to make syntactically beautiful code if that means that it is a little bit brittle or vulnerable to mistakes.

Wrt. using { }, I omit them if I put the block to be executed on the same line, and usually I do that only with special cases, e.g.

  if(expr1) continue;

  if(expr2) throw new RuntimeException();

chinpokomon | karma 303 | avg karma 0.87 · 2014-11-23 23:55:56

I see merit in both sides of the debate. As long as the style is consistent, it should be legible. It is an argument similar to vi versus Emacs -- both can be right. It is more important to be consistent through a project. Adopt one style.

From this day forward, let us proclaim to always use brackets; so that intent is more obvious to the reader.

elwell | karma 4299 | avg karma 1.76 · 2014-11-24 08:35:14+00:00

You can sidestep the braces debate by using a lisp.

qznc | karma 8930 | avg karma 2.79 · 2014-11-24 10:11:10+00:00

To introduce Lisp you need to fight the parens debate instead ...

dom96 | karma 6422 | avg karma 3.49 · 2014-11-24 05:05:31

or you could just use Python

exDM69 | karma 10126 | avg karma 3.75 · 2014-11-24 12:39:46+00:00

... if you want to get in to the whitespace debate instead

gknoy | karma 3726 | avg karma 2.22 · 2014-11-24 11:27:42

Easily solved by pretending the parens are whitespace. :-D

elwell | karma 4299 | avg karma 1.76 · 2014-11-24 19:54:05+00:00

The answer to the whitespace debate: http://en.wikipedia.org/wiki/Whitespace_(programming_languag...

infinitone | karma 337 | avg karma 1.2 · 2014-11-24 02:23:53+00:00

It's common in Java too...

alexvr | karma 460 | avg karma 2.22 · 2014-11-23 21:14:29

Most C-inspired languages (most popular programming languages) allow this. The only ones I've used that don't are Go and Swift. And I find it a little annoying.

If you're worried about bugs, there are other things in C/C++ to criticize first ;)

desdiv | karma 4685 | avg karma 4.56 · 2014-11-24 05:00:03+00:00

It's true for Rust as well. What these three languages have in common is that they don't require parentheses around the switching boolean expression, and when you have things like:

    if expression1 expression2

it can be fiendishly hard to determine the boundary between expression1 and expression2.

jmount | karma 4050 | avg karma 3.07 · 2014-11-24 03:18:10+00:00

C has potentially ambiguous association on nested "else"s. The habit of too many braces is a safety.

jeffreyrogers | karma 10945 | avg karma 3.94 · 2014-11-24 02:20:59+00:00

Interesting fact about Git is that it was self hosting in two weeks, IIRC.

TazeTSchnitzel | karma 26116 | avg karma 2.81 · 2014-11-24 03:01:16+00:00

How can something that isn't a programming language be self-hosting?

jonesetc | karma 528 | avg karma 2.9 · 2014-11-24 03:06:39+00:00

Overloading the term. The OP presumably meant that the source for git was under git source control.

jeffreyrogers | karma 10945 | avg karma 3.94 · 2014-11-24 03:16:59+00:00

Yes, that's what I meant.

vacri | karma 17701 | avg karma 1.83 · 2014-11-24 04:43:54+00:00

'Hosting' means 'contain', 'serve'. A building can host a department or a convention, and a married couple can host a dinner party, with neither being required to be a webserver or programming language.

tripa | karma 252 | avg karma 1.63 · 2014-11-24 08:50:33+00:00

To add to that, IMHO self-hosting for VCSs is closer to the original meaning of the phrase than for compilers.

kazinator | karma 30751 | avg karma 1.78 · 2014-11-24 03:24:08+00:00

Version control systems are self-hosting when they are used to manage the primary repository of their own source code. This shows confidence because if the program breaks, then it breaks its own configuration management, which could be a headache to unravel. For example if the repository format changes, then the change has to be managed so that the old versions remain accessible through the new version of the software. If this is not managed, and old compiled binaries of the version control system disappear from existence, then it may become impossible to recover the old sources.

Thus, successfully self-hosting a version control system is some measure of evidence that the developers know what they are doing and can manage the changes. (And thus they understand change management and we can trust them to be working on version control software.)

http://en.wikipedia.org/wiki/Self-hosting

"Other programs [than compilers] that are typically self-hosting include kernels, assemblers, command-line interpreters and revision control software."

zabcik | karma 153 | avg karma 5.67 · 2014-11-24 02:25:13+00:00

Why are there multiple main() functions? I've never seen this style before. Is it multi-process?

GauntletWizard | karma 3614 | avg karma 1.61 · 2014-11-24 02:28:12+00:00

There's a bunch of different utilities in there. Each has its own main() function, and they're compiled into a bunch of binaries.

taurath | karma 11745 | avg karma 3.44 · 2014-11-24 02:28:45+00:00

Could just be initializers for different modules maybe?

Svenstaro | karma 613 | avg karma 2.71 · 2014-11-24 02:29:26+00:00

There are multiple main() functions because there are multiple programs! Check out https://github.com/git/git/commit/e83c5163316f89bfbde7d9ab23...

andrewchambers | karma 2271 | avg karma 2.28 · 2014-11-24 15:25:13

Look at the makefile, it just has a command line program for each basic operation.

jordigh | karma 18367 | avg karma 5.71 · 2014-11-23 21:10:46

Well, while we're looking at FIRST POSTS, here's Mercurial's, self-hosting a month after git, and like git, also created to replace bitkeeper:

http://selenic.com/hg/rev/0#l10.1

The revlog data structure from then is still around, slightly tweaked, but essentially unchanged in almost a decade.

coldpie | karma 19803 | avg karma 4.39 · 2014-11-24 12:37:57

Mercurial is impressive for making Git's UI look intuitive.

EGreg | karma 7296 | avg karma 0.72 · 2014-11-25 06:10:43+00:00

Other way around

coldpie | karma 19803 | avg karma 4.39 · 2014-11-25 10:40:41

C'mon, man, you make branches by cloning the repository[1]. That's insanity.

[1] http://hginit.com/05.html

jordigh | karma 18367 | avg karma 5.71 · 2014-11-25 17:23:40+00:00

git at revision 0 worked the same way. You can see that there are no references in git at that time either. They're both copying bitkeeper, which worked the same way.

Nowadays git has references (branches), and hg has bookmarks which are the same, plus hg also has the option to label every commit with a permanent branch name. They also still have branching-by-cloning, and if you listen to Linus's original Google code talk about Git, you can see that he conflates "branch" and "clone" because that's what he originally envisioned! Even in 2007 he was still thinking in bitkeeper terms too. I bet that branching with references was Junio Hamano's idea, after Linus did the code hand-off.

I find branching-by-cloning a bit more natural in hg, because you can push to any repo. It's useful for quick, throwaway, local, easy testing out of ideas. In git, you can only push if your push doesn't modify HEAD, which typically translates into only being able to push to bare repos.

coldpie | karma 19803 | avg karma 4.39 · 2014-11-25 12:24:02

Interesting, thanks for the info. I've only been using Git since 2009 or so. I love Git's model of commits being objects in their own right, allowing you to cherry-pick them across branches, or rebase them to reorder or squash several commits together, for example.

My usual development routine is to make a ton of small commits that add up to a small set of good commits, to promote bisect-ability. I do dozens of rebases, squashes and amends when working on a topic branch. I have to use Mercurial for one of my clients, and it's a nightmare doing my development model in an SCM where I can't toss commits around willy-nilly like I can in Git.

jordigh | karma 18367 | avg karma 5.71 · 2014-11-25 12:53:18

> I have to use Mercurial for one of my clients, and it's a nightmare doing my development model in an SCM where I can't toss commits around willy-nilly like I can in Git.

Yes you can. `hg histedit` is a lot like `git rebase -i`, and `hg rebase` is like `git rebase` without -i and `hg commit --amend` is a lot like `git commit --amend`.

There are also some really cool things that we're working on with hg:

https://www.youtube.com/watch?v=4OlDm3akbqg

EGreg | karma 7296 | avg karma 0.72 · 2014-11-25 22:53:24+00:00

hg and git have feature parity at this point

hg just starts out more user friendly, and puts the rest in extensions. I like it more!

ok, hg is a bit slower

hyp0 | karma 907 | avg karma 2.27 · 2014-11-24 03:53:00+00:00

It's so short.

The readme is the best explanation of git I've seen.

jastanton | karma 636 | avg karma 3.12 · 2014-11-23 21:56:57

Does anyone know if the structure of git has changed much? I would like to read this thinking this is pretty close to the current implementation but I would have no idea. anyone?

asdfaoeu | karma 1037 | avg karma 2.58 · 2014-11-24 04:13:49+00:00

You can just see the structure with git cat-file

    -> % git cat-file -p 8c48d1a36c3d11db44c75a431d4f09cb0035222f
    tree 288c2d5379768f685f391bdbffd31b8965318c63
    parent 002ae35061beef02453b7fb1045a50fa2f7f30f8
    author Denis Bilenko <denis.bilenko@gmail.com> 1246939605 +0700
    committer Denis Bilenko <denis.bilenko@gmail.com> 1246939605 +0700

    MANIFEST.in: include libevent.h and libevent-internal.h
    -> % git cat-file -p 288c2d5379768f685f391bdbffd31b8965318c63
    100644 blob 6e543dc13df1b556fd95530061ac0c77a9178309    .hgignore
    100644 blob 79c7beb2227ce149c7a71e58e2f7379071b7a189    MANIFEST.in
    100644 blob 0d05178544942a035a82599900bec27fbac1c9c5    README.eventlet
    040000 tree edb8f37fa622315dcf7bf4f7316d5e85c48cfdbd    examples
    040000 tree 64cf252d77a4162099442bb0153985fc20ed5ba3    gevent
    040000 tree 261052e04b4aece469b2e767e394aafbc9d88a32    greentest
    100644 blob 488e805c563dfeeb6af5e7a1a8953b706d9676e3    setup.py
    -> % git cat-file -p 6e543dc13df1b556fd95530061ac0c77a9178309
    syntax: glob
    *~
    *.pyc
    *.orig
    dist
    gevent.egg-info
    build
    htmlreports
    results.*.db
    gevent/core.so

And yeah it's still very similar though it currently doesn't store the objects individually but rather packs them together.
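
For the "objects stored individually" part, here is a minimal sketch of the layout current git uses for loose (not yet packed) objects: the header plus body is zlib-compressed and written to `objects/<first 2 hex chars>/<remaining 38>`. The helper names are hypothetical and it writes into a temporary directory, not a real repository.

```python
import hashlib
import tempfile
import zlib
from pathlib import Path
from typing import Tuple


def write_loose_object(objects_dir: Path, obj_type: str, body: bytes) -> str:
    """Store one object under objects/xx/yyyy... and return its id."""
    store = f"{obj_type} {len(body)}".encode() + b"\x00" + body
    oid = hashlib.sha1(store).hexdigest()
    path = objects_dir / oid[:2] / oid[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(zlib.compress(store))
    return oid


def read_loose_object(objects_dir: Path, oid: str) -> Tuple[str, bytes]:
    """Roughly what `git cat-file` has to do for a loose object."""
    raw = zlib.decompress((objects_dir / oid[:2] / oid[2:]).read_bytes())
    header, _, body = raw.partition(b"\x00")
    obj_type, size = header.decode().split(" ")
    assert int(size) == len(body), "size in header must match the body"
    return obj_type, body


with tempfile.TemporaryDirectory() as tmp:
    objects = Path(tmp) / "objects"
    oid = write_loose_object(objects, "blob", b"syntax: glob\n*~\n*.pyc\n")
    kind, data = read_loose_object(objects, oid)
    print(oid, kind, len(data))
```

Pack files are a separate format layered on top of this and are not covered by the sketch.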

alblue | karma 1569 | avg karma 3.56 · 2014-11-24 01:51:12

I wrote about the format of git trees (and other object types) here:

http://alblue.bandlem.com/2011/08/git-tip-of-week-trees.html

coldpie | karma 19803 | avg karma 4.39 · 2014-11-24 18:40:54+00:00

While it looks arcane, this comes in handy enough when grepping through history that I actually have "cat-file -p" aliased to 'cf'.

teraflop | karma 15299 | avg karma 6.52 · 2014-11-23 23:56:12

One noteworthy difference is that in the original repository format, a tree object was just a list of named blobs. Nowadays each subdirectory of a tree is its own nested tree object, which means that when you're comparing two trees, you can skip over the directories that are identical.

I'm not sure when that change was made but it must have been very early on, because the repository format has been basically stable for many years now.
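
A toy sketch of why nested trees help when comparing. The hashing scheme below is a simplification I made up (it is not git's real tree encoding), and `store` and `diff` are hypothetical names; the point is only that equal ids let you skip a whole subdirectory without loading it.

```python
import hashlib
from typing import Dict, List, Tuple

Entries = Dict[str, Tuple[str, str]]  # name -> (kind, object id)


def store(node, objects: Dict[str, Entries]) -> Tuple[str, str]:
    """Store nested dicts (directories) / bytes (files); return (kind, id)."""
    if isinstance(node, bytes):
        return "blob", hashlib.sha1(b"blob\x00" + node).hexdigest()
    entries = {name: store(child, objects) for name, child in node.items()}
    listing = "".join(
        f"{kind} {oid} {name}\n" for name, (kind, oid) in sorted(entries.items())
    )
    oid = hashlib.sha1(b"tree\x00" + listing.encode()).hexdigest()
    objects[oid] = entries
    return "tree", oid


def diff(old_id: str, new_id: str, objects: Dict[str, Entries], prefix: str = "") -> List[str]:
    """List changed paths, descending only into subtrees whose ids differ."""
    if old_id == new_id:
        return []  # same id means same content, so skip everything below
    old, new = objects[old_id], objects[new_id]
    changed: List[str] = []
    for name in sorted(old.keys() | new.keys()):
        a, b = old.get(name), new.get(name)
        if a == b:
            continue  # identical file or identical subdirectory
        path = prefix + name
        if a and b and a[0] == b[0] == "tree":
            changed += diff(a[1], b[1], objects, path + "/")
        else:
            # Simplification: added/removed/replaced entries are reported as one path.
            changed.append(path)
    return changed


v1 = {
    "README": b"hello\n",
    "docs": {"intro.txt": b"intro\n"},
    "src": {"main.c": b"int main(){}\n", "util.c": b"/* util */\n"},
}
v2 = {
    "Makefile": b"all:\n",
    "README": b"hello\n",
    "docs": {"intro.txt": b"intro\n"},
    "src": {"main.c": b"int main(){return 0;}\n", "util.c": b"/* util */\n"},
}

objects: Dict[str, Entries] = {}
_, root1 = store(v1, objects)
_, root2 = store(v2, objects)
result = diff(root1, root2, objects)
print(result)
```

Here `docs` is never opened during the diff because its id is the same in both versions.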

rakoo | karma 6746 | avg karma 2.2 · 2014-11-24 10:24:11+00:00

It seems to be mostly the same, except that "Changeset" is now called "Commit" and "Current directory cache" is now called "index", but they are functionally the same.

It's actually really great to see that the model hasn't changed much (there must have been a long phase of thinking before though)

If you want to go deeper, you can check out this page:

http://www.git-scm.com/book/en/v2/Git-Internals-Git-Objects

justintbassett | karma 36 | avg karma 1.2 · 2014-11-23 21:57:29

I wonder what the first commits for big sites/projects look like?

josephcooney | karma 1395 | avg karma 2.35 · 2014-11-24 12:12:25+00:00

I tried to compile a list of a few of them a while ago:

http://jcooney.net/post/2011/06/22/First-Check-in-Comments-f...

d0m | karma 3151 | avg karma 2.19 · 2014-11-24 04:27:31+00:00

I've read so many git tutorials, I wish I had seen that README file before.

danra | karma 430 | avg karma 3.91 · 2014-11-24 04:57:44+00:00

This. I find that learning from original documentation tends to be much more efficient than learning from third party blogs/tutorials which try to "simplify" things, and usually do the opposite.

hw | karma 1440 | avg karma 2.96 · 2014-11-23 23:21:06

Does Github offer an easy way to get to the first commit of a project? Traveling page by page back in time is time consuming (yeah, i did that)

isbadawi | karma 314 | avg karma 4.42 · 2014-11-23 23:37:42

You can go to the project's network graph (append /network to the url) then press shift ?. If the project has a lot of forks like git/git does it won't work though.

ChristianBundy | karma 1852 | avg karma 2.8 · 2014-11-24 06:20:17+00:00

No, but if you have the full history you can grab it with a shell command.

    echo https://github.com/git/git/commit/$(git log --pretty=format:%H | tail -1)

royragsdale | karma 9 | avg karma 2.25 · 2014-11-23 23:41:01

https://github.com/git/git/commits?page=1091

If you want to see the commits going forward from here.

JSno | karma 146 | avg karma 1.6 · 2014-11-24 06:20:33+00:00

I found so many JAVA _PROGRAMMERS_ here asking stupid c question. What a world! people don't know C are building software. That's why so many Indian java coders in US. shameful.

dirtyaura | karma 7574 | avg karma 6.09 · 2014-11-24 02:31:43

I only realised reading the README that git is a great lesson in branding.

DodgyEggplant | karma 606 | avg karma 2.3 · 2014-11-24 03:05:35

This is a great lesson in writing focused & succinct specs, when one clearly sees what his/her program is going to do.

afandian | karma 6812 | avg karma 3.55 · 2014-11-24 09:31:20+00:00

My god... the comments. Looks like the reddit culture (i.e. fun for in jokes but not particularly professional)

scintill76 | karma 1920 | avg karma 1.86 · 2014-11-24 04:05:57

"A marathon of clicking 'next page,' but the view is worth it." So, this commenter practically worships git, but apparently doesn't actually understand it well enough to know a better way to find the hash of the first commit and punch that into Github. Or, it was just a joke and they got there the quick way, but still felt obliged to post a dumb joke to inflate their own ego by "leaving their mark" on git. Maybe I'm being too mean, but yeah, I also think a lot of the comments are pointless.

nathanvanfleet | karma 1197 | avg karma 2.38 · 2014-11-24 12:46:01+00:00

It's probably the "I F*cking Love Computer Science" sub-reddits.

afandian | karma 6812 | avg karma 3.55 · 2014-11-24 07:40:56

It's lots of subreddits. There are some serious ones, but the main-stream ones all contain the usual memes, injokes etc.

I enjoy diving into reddit every now and again. But I use github for work (and code for fun, although it's 'serious' fun). Although open-source collaboration is a fundamentally social activity, I think that mixing source control with a social network inevitably leads to these kinds of comments. And I wouldn't dream of mixing that up with my professional identity.

Maybe it's just a marker of how versatile github is, and the community of people who write programs and put them in source control.

dyeje | karma 2354 | avg karma 2.17 · 2014-11-24 08:34:56

AFAIK you can't search by commit hash. You have to do some URL manipulation.

kachnuv_ocasek | karma 2068 | avg karma 5.96 · 2014-11-24 18:04:53+00:00

  git rev-list --max-parents=0 HEAD | tail -1

tinalumfoil | karma 2081 | avg karma 4.79 · 2014-11-24 18:24:44

Without having to pipe:

> git rev-list --reverse HEAD

dlitz | karma 890 | avg karma 2.2 · 2014-11-24 15:43:44+00:00

> Maybe I'm being too mean, but yeah, I also think a lot of the comments are pointless.

Yeah, I think you're being a little mean. If you browse to that user's GitHub page, it looks like it's just somebody new who's excited about software. Good for them.

The comments are pointless, sure, but also harmless. Similar comments might crowd out productive discussion if they were on (say) the head of the master branch, but I doubt that any serious development is happening on git's initial commit anyway. Let the new people have their fun.

As far as newbie disruptiveness goes, it could be far worse. When I was getting started with Linux, I posted this cringeworthy gem to LKML, now enshrined in the archives for all eternity: https://lkml.org/lkml/2000/10/22/69 If newbies today are merely posting "yay, git!" and "thank you!" to a secondary forum where it doesn't disrupt development, I'd say they're doing pretty well in comparison. :)

scintill76 | karma 1920 | avg karma 1.86 · 2014-11-25 00:59:57+00:00

Yeah, fair enough. Good on you for linking your own cringey post. I think a lot of developers have those early cringe moments, especially if they were young when they started.

As far as disruption, it did occur to me later that somebody may be getting notification emails about these comments. But it's not too bad, as I assume they could just send the emails to /dev/null, since Github is not the official host of git. (As a tangential note, I sort of wish Github would handle this better. So many Github-mirrored projects end up with something like "don't submit pull requests or open issues here, they will be ignored" in their repo description.)

EGreg | karma 7296 | avg karma 0.72 · 2014-11-24 10:03:21+00:00

Linus wrote:

> Side note on trees: since a "tree" object is a sorted list of "filename+content", you can create a diff between two trees without actually having to unpack two trees. Just ignore all common parts, and your diff will look right. In other words, you can effectively (and efficiently) tell the difference between any two random trees by O(n) where "n" is the size of the difference, rather than the size of the tree.

Um, What?

pja | karma 5680 | avg karma 3.74 · 2014-11-24 10:48:23+00:00

Since a git hash points to a sorted list of filenames and content hashes, to diff two git commits you look up the commit objects by their hash, run down the resultant list of filename/hash pairs, and then only look up and diff the content of those files that have differing hashes (if they have the same hash, they must have the same content according to the git data model, so they can be safely ignored).

Hence diffing arbitrary commits with git is always O(N) in the number of changed files, regardless of the number of interstitial commits.
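
A rough sketch of that walk, in Python rather than git's actual C. The function name `diff_entries` and the toy hash strings are made up for illustration. It assumes each side is a list of (filename, hash) pairs sorted by filename. Note that it still touches every entry, but only to compare two short hashes; file contents would only be opened for the names it reports.

```python
def diff_entries(old, new):
    """Merge-walk two (name, hash) lists, each sorted by name.

    Returns a list of (kind, name) tuples for entries that differ.
    Hypothetical helper for illustration, not part of git.
    """
    changes = []
    i = j = 0
    while i < len(old) and j < len(new):
        old_name, old_hash = old[i]
        new_name, new_hash = new[j]
        if old_name == new_name:
            # Same name on both sides: equal hashes mean equal content.
            if old_hash != new_hash:
                changes.append(("modified", old_name))
            i += 1
            j += 1
        elif old_name < new_name:
            # Name only exists on the old side.
            changes.append(("deleted", old_name))
            i += 1
        else:
            # Name only exists on the new side.
            changes.append(("added", new_name))
            j += 1
    # Whatever is left over on either side has no counterpart.
    changes.extend(("deleted", name) for name, _ in old[i:])
    changes.extend(("added", name) for name, _ in new[j:])
    return changes


old = [("Makefile", "a1"), ("README", "b2"), ("cache.h", "c3")]
new = [("Makefile", "a1"), ("README", "b9"), ("init-db.c", "d4")]
print(diff_entries(old, new))
```

Only README, cache.h and init-db.c come back; Makefile is skipped on the strength of its hash alone.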

throw_away | karma 1973 | avg karma 3.18 · 2014-11-25 05:41:23+00:00

In particular he's saying that for a tree, you can quickly skip sub-trees if they are the same, regardless of how deep they go. Kind of like a Merkle tree: http://en.m.wikipedia.org/wiki/Merkle_tree

I'm no git internals expert, but I suspect for a flat list of files the complexity is still O(n) where n is the number of files (not changes) because at very least you must check that n checksums are the same.
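
To make the sub-tree skipping concrete, here is a toy model in Python. It is only a sketch: `Blob`, `Tree`, `diff_trees` and `make_root` are hypothetical names, and the hashing scheme is simplified and is not git's real object format. The point is that a directory's hash covers everything below it, so an unchanged directory costs one comparison no matter how many files it holds.

```python
import hashlib


class Blob:
    """A file: its hash depends only on its content."""
    def __init__(self, data: bytes):
        self.hash = hashlib.sha1(b"blob" + data).hexdigest()


class Tree:
    """A directory: its hash depends on the names and hashes of its children."""
    def __init__(self, entries: dict):
        self.entries = dict(sorted(entries.items(), key=lambda kv: kv[0]))
        h = hashlib.sha1(b"tree")
        for name, child in self.entries.items():
            h.update(name.encode() + b"\0" + child.hash.encode())
        self.hash = h.hexdigest()


def diff_trees(old: Tree, new: Tree, prefix: str = ""):
    """Return (changed_paths, number_of_entries_compared)."""
    changed, compared = [], 0
    for name in sorted(old.entries.keys() | new.entries.keys()):
        a, b = old.entries.get(name), new.entries.get(name)
        compared += 1
        if a is not None and b is not None and a.hash == b.hash:
            continue  # identical file or identical whole sub-tree: skip it
        if isinstance(a, Tree) and isinstance(b, Tree):
            # Both sides are directories that differ somewhere: descend.
            sub_changed, sub_compared = diff_trees(a, b, prefix + name + "/")
            changed += sub_changed
            compared += sub_compared
        else:
            changed.append(prefix + name)
    return changed, compared


def make_root(sched_src: bytes) -> Tree:
    # 1000 driver files that never change, plus a small kernel directory.
    drivers = Tree({f"drv{i}.c": Blob(b"driver %d" % i) for i in range(1000)})
    kernel = Tree({"fork.c": Blob(b"fork"), "sched.c": Blob(sched_src)})
    return Tree({"README": Blob(b"readme"), "drivers": drivers, "kernel": kernel})


changed, compared = diff_trees(make_root(b"v1"), make_root(b"v2"))
print(changed, compared)  # ['kernel/sched.c'] 5
```

Five comparisons for a tree of 1003 files: three at the top level and two inside kernel/. With everything in one flat directory, the same change would need all 1003 hashes checked.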

pja | karma 5680 | avg karma 3.74 · 2014-11-25 03:52:28

> I'm no git internals expert, but I suspect for a flat list of files the complexity is still O(n) where n is the number of files (not changes) because at very least you must check that n checksums are the same.

Sure. The constant factors make a huge difference though - even if you've cached all the data in memory, walking all those structures and diffing the actual file data is going to be enormously slower than simply walking a list of hashes, so you're really saying that the total time is big * O(number of files changed) + small * O(number of files). If small*N ~ big then it's reasonable to just disregard that cost - it's going to be lost in the noise.

throw_away | karma 1973 | avg karma 3.18 · 2014-11-25 19:43:21+00:00

I'm not arguing that, but rather that this ability to skip unchanged trees because the hash of all contents is bubbled up is specifically what Linus is referring to in the comment, not simply the comparison of hashes in the flat-directory use-case.

EGreg | karma 7296 | avg karma 0.72 · 2014-11-25 06:10:23+00:00

Wouldn't it still be O(total)? Or at the very least O(log total)? You have to look at all the files even if it's just to compare the hash. The size of the file doesn't matter so I think what Linus should have said was it's O(number of files) still, and maybe O(log total) in the average case. But if there are 1,000,000 files and only 2 change then I don't see how you don't have to look at all the hashes.

Jackcor | karma 2 | avg karma 1.0 · 2014-11-24 11:26:53+00:00

Was all of the initial commit code written by Linus Torvalds?

gpvos | karma 7664 | avg karma 2.34 · 2014-11-24 13:31:43+00:00

Yes. The interesting thing is actually that it isn't that much code.

tempodox | karma 9431 | avg karma 2.58 · 2014-11-24 11:43:23+00:00

Code comment about git:

  stupid. contemptible and despicable.

That sums it up quite well. Every day I pay thanks to The One Who Programmed Me that my workflow doesn't put me in need of that shitload of crap that is git. I pity those who do need git.

stinos | karma 5798 | avg karma 2.46 · 2014-11-24 12:21:17+00:00

Maybe I've been drilled too hard by a couple of programming gurus, but I immediately noticed there are quite a lot of repeated yet unnamed magic constants in the (otherwise pretty clean) code. According to Wikipedia [1], the rule against using them is even one of the oldest in programming. Curious what kind of profanity Linus would come up with when confronted with this :]

[1] https://en.wikipedia.org/wiki/Magic_number_%28programming%29...

hnmcs | karma 159 | avg karma 3.7 · 2014-11-24 22:17:26+00:00

Gotta love the fact that there are open pull requests.

https://github.com/git/git/pulls

dbdr | karma 74 | avg karma 0.99 · 2014-11-26 06:55:45+00:00

Where are the tests?