In particular, he's saying that for a tree you can quickly skip entire sub-trees when their hashes match, regardless of how deep they go. Kind of like a Merkle tree: http://en.m.wikipedia.org/wiki/Merkle_tree
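A minimal sketch of that idea (the Node type and diff_trees are hypothetical stand-ins, not git's actual internals): because each node's hash covers its whole subtree, two nodes with equal hashes can be skipped without descending at all.

    from dataclasses import dataclass, field

    @dataclass
    class Node:                       # hypothetical stand-in for a tree/blob entry
        hash: str                     # covers the entire subtree, Merkle-style
        is_dir: bool = False
        children: dict = field(default_factory=dict)

    def diff_trees(a, b, path="", changes=None):
        if changes is None:
            changes = []
        if a.hash == b.hash:          # whole subtree identical: skip it,
            return changes            # no matter how deep it goes
        for name in sorted(a.children.keys() | b.children.keys()):
            ca, cb = a.children.get(name), b.children.get(name)
            child = f"{path}/{name}"
            if ca is None:
                changes.append(("added", child))
            elif cb is None:
                changes.append(("removed", child))
            elif ca.hash == cb.hash:
                continue              # identical child: never descended into
            elif ca.is_dir and cb.is_dir:
                diff_trees(ca, cb, child, changes)  # recurse only into changed dirs
            else:
                changes.append(("modified", child))
        return changes

    old = Node("r1", True, {"src":  Node("s1", True, {"a.c": Node("h1")}),
                            "docs": Node("d1", True)})
    new = Node("r2", True, {"src":  Node("s2", True, {"a.c": Node("h2")}),
                            "docs": Node("d1", True)})   # docs unchanged: skipped
    print(diff_trees(old, new))       # [('modified', '/src/a.c')]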
I'm no git internals expert, but I suspect that for a flat list of files the complexity is still O(n), where n is the number of files (not the number of changes), because at the very least you must check that all n checksums are the same.
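A sketch of why the flat case is O(n), assuming snapshots are plain {filename: checksum} dicts (my notation, not git's): even when nothing changed, every one of the n entries is touched once.

    # Even when the result is empty, all n entries are examined: O(n).
    def diff_flat(old, new):
        changed = []
        for name in old.keys() | new.keys():
            if old.get(name) != new.get(name):
                changed.append(name)
        return changed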
Sure. The constant factors make a huge difference, though: even if you've cached all the data in memory, walking all those structures and diffing the actual file data is going to be enormously slower than simply walking a list of hashes. So you're really saying that the total time is big * O(number of files changed) + small * O(number of files). If small*N ~ big, then it's reasonable to just disregard that cost; it's going to be lost in the noise.
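To make those constants concrete (all numbers below are made up purely for illustration):

    big   = 1e-3      # assumed seconds to actually diff one changed file
    small = 1e-8      # assumed seconds to compare one pair of hashes
    n_files, n_changed = 100_000, 10

    print(big * n_changed)    # 0.01  s spent diffing the changed files
    print(small * n_files)    # 0.001 s spent walking every hash

    # small*N is on the order of big here, so the O(number of files)
    # term really does get lost in the noise.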
I'm not disputing that; rather, I'm saying that this ability to skip unchanged sub-trees, because the hash of all their contents is bubbled up, is specifically what Linus is referring to in the comment, not simply the comparison of hashes in the flat-directory case.