
If a git commit hash was directly tied to a data hash at that state (IPFS), that would be trivial.



I don't understand your statement. If a git clone of a specific commit also pulls the data that is generated from that commit hash via IPFS simultaneously, then you can always have the data at whatever state you choose. How might this have a negative effect on conserving the data? Do you mean that there are additional structures required to manage keys if the data is encrypted?

Imagine git were built on top of IPFS, and aimed specifically at datasets. Qri uses IPFS to store & move data, so all versions are just normal IPFS hashes. eg this: https://app.qri.io/b5/world_bank_population is just referencing this IPFS hash: https://ipfs.io/ipfs/QmXwh5kNGsNAysRx66jcMiw1grtFf9j7zLFGbK9...

full disclosure: I work at Qri


Git commit hashes are pretty safe.

Hmmm, “dumb crypto stuff” you say?

At a high level, a git commit hash is a SHA1 hash of the state of the git repository at the time of the commit.

https://medium.com/@jonathan_finch/git-commit-hash-number-th...


I’m not familiar with the internal data structure of git, but couldn’t you add the new hash as a commit in a new format “on the side”, leaving the original commit as is?

I immediately looked at the length of this commit's hash to see if it was longer than 40 hex chars -- but no, it's just an SHA-1. It would have been cool if somehow the hash of this commit that added new hashes was a new hash.

Slightly similar: for a while I've wanted to recreate just enough of git's functionality to commit and push to GitHub. My guess is the commit part would be pretty trivial (as git's object and tree model is so simple) but the push/network/remote part a bunch harder.
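For what it's worth, the local object part can be sketched in a few lines. This is a minimal sketch, assuming the classic SHA-1 loose-object layout (zlib-compressed `<kind> <size>\0<body>` stored under `objects/<first 2 hex chars>/<remaining 38>`). The helper names `write_loose_object` and `read_loose_object` are made up for illustration, and it writes into a scratch directory rather than a real `.git`.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_object(objects_dir: str, kind: str, body: bytes) -> str:
    """Store one object the way a SHA-1 git repo stores loose objects."""
    store = f"{kind} {len(body)}".encode() + b"\x00" + body
    oid = hashlib.sha1(store).hexdigest()
    path = os.path.join(objects_dir, oid[:2], oid[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(store))
    return oid


def read_loose_object(objects_dir: str, oid: str):
    """Read an object back and check the size recorded in its header."""
    with open(os.path.join(objects_dir, oid[:2], oid[2:]), "rb") as f:
        store = zlib.decompress(f.read())
    header, _, body = store.partition(b"\x00")
    kind, size = header.decode().split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return kind, body


if __name__ == "__main__":
    scratch = tempfile.mkdtemp()  # stand-in for .git/objects
    oid = write_loose_object(scratch, "blob", b"hello world\n")
    print(oid, read_loose_object(scratch, oid))
```

Trees and commits are stored the same way with a different `kind` and body, which is why the commit side feels tractable. Nothing here touches the pack or transfer protocol, which is the part the comment expects to be harder.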


Git hashes can be inserted in reproducible builds because they are deterministic.

Brute-forcing a commit hash requires no gibberish binary files with random names; the data can be inserted into the commit metadata in ways that won't even show up in your git log. Forcing the entire commit hash is hard, but by default everyone only checks the first 6 digits anyway.
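To make the prefix search concrete, here is a rough sketch. It assumes SHA-1 object IDs and varies a counter in an extra commit header line. The header name `nonce`, the author line, and the function names are all hypothetical, and I have not checked how any particular git version treats an unknown header like this.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # id of git's empty tree
WHO = "A U Thor <author@example.com> 0 +0000"            # made-up identity


def commit_id(body: bytes) -> str:
    """SHA-1 over 'commit <size>\\0' + body."""
    return hashlib.sha1(b"commit %d\x00" % len(body) + body).hexdigest()


def find_prefix(message: str, prefix: str, limit: int = 5_000_000):
    """Try counter values in a hypothetical 'nonce' header until the id starts with prefix."""
    headers = f"tree {EMPTY_TREE}\nauthor {WHO}\ncommitter {WHO}"
    for n in range(limit):
        body = f"{headers}\nnonce {n}\n\n{message}".encode()
        cid = commit_id(body)
        if cid.startswith(prefix):
            return n, cid
    raise RuntimeError("no match within limit")


if __name__ == "__main__":
    n, cid = find_prefix("totally normal commit\n", "c0ffee")
    print(n, cid)
```

Each extra hex digit multiplies the expected work by 16, so a 6-digit prefix takes around 16 million attempts on average (you may need to raise `limit`), while all 40 digits is out of reach for this kind of loop.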

Actually, Git is technically a blockchain.

Commit hashes are a hash of the previous commit hash + new data. The structure (history) is replicated in a decentralized way on every user's node. The only missing thing is a proof-of-work, but that is optional in the definition of a blockchain.


I have been bringing up this point in conversations about blockchain for a while now. Git is essentially a chain of commits, and the hash of a commit will change based on what you're applying it on by including the previous commit's hash in the body that gets eventually hashed into HEAD.
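A small sketch of that dependence on the parent, assuming SHA-1 object IDs and a simplified commit body (the author line and the `commit_id` helper are made up for illustration). The same tree and message produce different commit hashes when applied on top of different parents.

```python
import hashlib

EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # id of git's empty tree
WHO = "A U Thor <author@example.com> 0 +0000"            # made-up identity


def commit_id(tree_id, parent_id, message):
    """Hash a simplified commit body; the parent id is part of the hashed bytes."""
    lines = [f"tree {tree_id}"]
    if parent_id is not None:
        lines.append(f"parent {parent_id}")
    lines += [f"author {WHO}", f"committer {WHO}", "", message]
    body = "\n".join(lines).encode()
    return hashlib.sha1(b"commit %d\x00" % len(body) + body).hexdigest()


root_a = commit_id(EMPTY_TREE, None, "start A\n")
root_b = commit_id(EMPTY_TREE, None, "start B\n")

# Identical tree and message, different parent: the ids differ.
on_a = commit_id(EMPTY_TREE, root_a, "same change\n")
on_b = commit_id(EMPTY_TREE, root_b, "same change\n")

if __name__ == "__main__":
    print(on_a)
    print(on_b)
```

This is also why rebasing or cherry-picking always yields new hashes: the content can be identical, but the parent line is not.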

Also reminds me of that time back when we were using 8-char truncated git hashes as deployment tags and we eventually began getting collisions. We also had an interesting bug when all 8 chars were numeric and Python (or the framework/library that was used) ingested what should have been a string as a number.
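Both problems are easy to estimate. Here is a back-of-the-envelope sketch, assuming truncated hashes behave like uniformly random hex strings and using the usual birthday approximation. The function names are made up, and the last lines only imitate the kind of coercion a loosely typed loader might perform.

```python
import math


def collision_probability(n_tags: int, hex_chars: int) -> float:
    """Birthday approximation: chance that at least two of n_tags random tags collide."""
    space = 16 ** hex_chars
    return 1.0 - math.exp(-n_tags * (n_tags - 1) / (2 * space))


def all_digit_fraction(hex_chars: int) -> float:
    """Fraction of random hex strings of this length made up only of 0-9."""
    return (10 / 16) ** hex_chars


if __name__ == "__main__":
    for n in (1_000, 10_000, 100_000):
        print(n, round(collision_probability(n, 8), 4))
    print("all-digit share:", round(all_digit_fraction(8), 4))

    # What a number-happy parser does to an all-digit tag:
    tag = "00123456"
    print(repr(tag), "->", int(tag))  # leading zeros are lost
```

With 8 hex characters, roughly 1 tag in 43 is all digits, so the numeric-coercion bug is not a rare corner case. Quoting tags explicitly or prefixing them with a letter avoids it.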


What I like about git is that it stores only the minimum amount of information, and this makes it easy to explain. A commit hash is a hash of canonical information, not of derived information.

It seems really ugly to store derived information in a commit (specifically, that the hash would be altered by it).

It seems that Jeff has said the same thing, but Linus disagrees. Vocally.

http://www.spinics.net/lists/git/msg161336.html


I think the reason the commit hash has to change is that a commit represents the entire state of a repository, not just the change made in the commit. Being able to take a sequence of commits and insert them into a repository just is not a thing that makes sense in git's model.

If you just hashed diffs, you would not get whole-repo integrity guarantees.

It is possible to go the other way with patch theory (see Darcs) but it's far from trivial to implement performantly.


That could've been possible if Git didn't use commit metadata hashes; (un)fortunately, it's smart enough to do that.

Funny, I always expected Git to transition by adding a stronger hash as a piece of metadata to each commit and continue using SHA-1 for the day-to-day identifier, seeing as most of the time Git doesn't actually go back and verify the whole commit chain unless you ask it to.

Correct, but git doesn't recompute the hashes locally, so it wouldn't know they are wrong.

git already is a hash linked datastore with the ability to sign your 'transactions'. The doc just points out that SHA1 is not a reliable hash to address objects anymore.

No, the commit hash is a hash of the commit object, which holds a hash of the tree object, which holds hashes of the file objects.
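To illustrate that chain of hashes, here is a minimal sketch, assuming SHA-1 object IDs and a flat tree that contains only regular files. The helper names and the author line are made up, and real trees also need git's directory-aware sort order, which this skips.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <size>\\0' + body."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries) -> bytes:
    """entries: (mode, name, hex_id) tuples; files only, sorted by name."""
    out = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1]):
        out += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_id)
    return out


def commit_body(tree_id: str, parents, who: str, message: str) -> bytes:
    lines = [f"tree {tree_id}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {who}", f"committer {who}", "", message]
    return "\n".join(lines).encode()


def snapshot_commit_id(file_bytes: bytes) -> str:
    """file -> blob id -> tree id -> commit id."""
    who = "A U Thor <author@example.com> 0 +0000"  # made-up identity
    blob = git_object_id("blob", file_bytes)
    tree = git_object_id("tree", tree_body([("100644", "README", blob)]))
    return git_object_id("commit", commit_body(tree, [], who, "initial\n"))


if __name__ == "__main__":
    print(git_object_id("blob", b"hello world\n"))
    print(snapshot_commit_id(b"hello world\n"))
    print(snapshot_commit_id(b"hello world!\n"))  # one byte changed, new commit id
```

The commit body never contains file contents directly, only the tree's ID, so a change in any file reaches the commit hash through the blob and tree hashes.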

How would git not be a clear refutation of this conjecture?

Sure, commit hashes are very high entropy identifiers, but we can still derive a lot of meaning from what they implicitly represent. Git is also a decentralized protocol. Perhaps the "authority" in these cases is whoever happens to be approving & merging a pull request? Has anyone reversed a SHA256 hash on a reliable basis yet? Does this count as secure & distributed?

Perhaps my argument here is that high entropy and human meaning are not at odds with each other. This seems like a very subjective point on the triangle.


> Commits are immutable (representing whole repo state, not a diff)

To make things more clear: Repo state here is the contents of all files, and some metadata including a pointer to the previous commit.

So a commit hash uniquely identifies not only a set of files but the unique history leading up to it! That's why some people like to call git the original blockchain (there's no proof of work involved of course so it can never be used for payments or anything like that, but the merkle tree bit is similar enough).
