Git commands I run before reading any code

1207 points • by grepsedawk • today at 8:53 AM • 274 comments • view on HN

Comments

Jujutsu equivalents, if anyone is curious:

What Changes the Most

    jj log --no-graph -r 'ancestors(trunk()) & committer_date(after:"1 year ago")' \
      -T 'self.diff().files().map(|f| f.path() ++ "\n").join("")' \
      | sort | uniq -c | sort -nr | head -20

Who Built This

    jj log --no-graph -r 'ancestors(trunk()) & ~merges()' \
      -T 'self.author().name() ++ "\n"' \
      | sort | uniq -c | sort -nr

Where Do Bugs Cluster

    jj log --no-graph -r 'ancestors(trunk()) & description(regex:"(?i)fix|bug|broken")' \
      -T 'self.diff().files().map(|f| f.path() ++ "\n").join("")' \
      | sort | uniq -c | sort -nr | head -20

Is This Project Accelerating or Dying

    jj log --no-graph -r 'ancestors(trunk())' \
      -T 'self.committer().timestamp().format("%Y-%m") ++ "\n"' \
      | sort | uniq -c

How Often Is the Team Firefighting

    jj log --no-graph \
      -r 'ancestors(trunk()) & committer_date(after:"1 year ago") & description(regex:"(?i)revert|hotfix|emergency|rollback")'

Much more verbose, closer to programming than shell scripting. But less flags to remember.

➕ show 2 replies

bsuvc • today at 12:20 PM

I love how the author thinks developers write commit messages.

All joking aside, it really is a chronic problem in the corporate world. Most codebases I encounter just have "changed stuff" or "hope this works now".

It's a small minority of developers (myself included) who consider the git commit log to be important enough to spend time writing something meaningful.

AI generated commit messages helps this a lot, if developers would actually use it (I hope they will).

➕ show 15 replies

joshstrange • today at 1:41 PM

I ran these commands on a number of codebases I work on and I have to say they paint a very different picture than the reality I know to be true.

> git shortlog -sn --no-merges

Is the most egregious. In one codebase there is a developer's name at the top of the list who outpaced the number 2 by almost 3x the number of commits. That developer no longer works at the company? Crisis? Nope, the opposite. The developer was a net-negative to the team in more ways than one, didn't understand the codebase very well at all, and just happened to commit every time they turned around for some reason.

➕ show 4 replies

ML0037 • today at 5:47 PM

i’ll try to use the in an hook and test them with Claude. Thank you !

giancarlostoro • today at 5:25 PM

> One caveat: squash-merge workflows compress authorship. If the team squashes every PR into a single commit, this output reflects who merged, not who wrote. Worth asking about the merge strategy before drawing conclusions.

I abhor squash merging for this and a few other reasons. I literally have to go out of my way to re-check out a branch. Someone who wants to use my current branch cannot do so if I merge my changes a month later, because the squash rewrites history, and now git is very confused. I don't get the obsession with "cleaning up the history" as if we're all always constantly running out of storage over 2 more commits.

➕ show 1 reply

ramon156 • today at 10:24 AM

> The 20 most-changed files in the last year. The file at the top is almost always the one people warn me about. “Oh yeah, that file. Everyone’s afraid to touch it.”

The most changed file is the one people are afraid of touching?

➕ show 6 replies

mattrighetti • today at 11:00 AM

I have a summary alias that kind of does similar things

  # summary: print a helpful summary of some typical metrics
  summary = "!f() { \
    printf \"Summary of this branch...\n\"; \
    printf \"%s\n\" $(git rev-parse --abbrev-ref HEAD); \
    printf \"%s first commit timestamp\n\" $(git log --date-order --format=%cI | tail -1); \
    printf \"%s latest commit timestamp\n\" $(git log -1 --date-order --format=%cI); \
    printf \"%d commit count\n\" $(git rev-list --count HEAD); \
    printf \"%d date count\n\" $(git log --format=oneline --format=\"%ad\" --date=format:\"%Y-%m-%d\" | awk '{a[$0]=1}END{for(i in a){n++;} print n}'); \
    printf \"%d tag count\n\" $(git tag | wc -l); \
    printf \"%d author count\n\" $(git log --format=oneline --format=\"%aE\" | awk '{a[$0]=1}END{for(i in a){n++;} print n}'); \
    printf \"%d committer count\n\" $(git log --format=oneline --format=\"%cE\" | awk '{a[$0]=1}END{for(i in a){n++;} print n}'); \
    printf \"%d local branch count\n\" $(git branch | grep -v \" -> \" | wc -l); \
    printf \"%d remote branch count\n\" $(git branch -r | grep -v \" -> \" | wc -l); \
    printf \"\nSummary of this directory...\n\"; \
    printf \"%s\n\" $(pwd); \
    printf \"%d file count via git ls-files\n\" $(git ls-files | wc -l); \
    printf \"%d file count via find command\n\" $(find . | wc -l); \
    printf \"%d disk usage\n\" $(du -s | awk '{print $1}'); \
    printf \"\nMost-active authors, with commit count and %%...\n\"; git log-of-count-and-email | head -7; \
    printf \"\nMost-active dates, with commit count and %%...\n\"; git log-of-count-and-day | head -7; \
    printf \"\nMost-active files, with churn count\n\"; git churn | head -7; \
  }; f"

➕ show 2 replies

JetSetIlly • today at 10:34 AM

Some nice ideas but the regexes should include word boundaries. For example:

I have a project with a large package named "debugger". The presence of "bug" within "debugger" causes the original command to go crazy.

icedchai • today at 1:20 PM

I wouldn't trust "commit counts." The quality and content of a "commit" can vary widely between developers. I have one guy on my team who commits only working code that has been thoroughly tested locally, another guy who commits one line changes that often don't work, only to be followed by fixes, and more fixes. His "commits" have about 1/100th of the value of the first guy.

➕ show 1 reply

whstl • today at 12:02 PM

In my experience, when the team doesn't squash, this will reflect the messiest members of the team.

The top committer on the repository I maintain has 8x more commits than the second one. They were fired before I joined and nobody even remembers what they did. Git itself says: not much, just changing the same few files over and over.

Of course if nobody is making a mess in their own commits, this is not an issue. But if they are, squash can be quite more truthful.

moritzwarhier • today at 5:17 PM

Interesting ideas, but some to me seem very overgeneralizef, e.g.:

> How Often Is the Team Firefighting

> git log --oneline --since="1 year ago" | grep -iE 'revert|hotfix|emergency|rollback

> Crisis patterns are easy to read. Either they’re there or they’re not.

I disagree with the last two quoted sentences, and also, they sound like an LLM.

blenderob • today at 1:36 PM

> Is This Project Accelerating or Dying > > git log --format='%ad' --date=format:'%Y-%m' | sort | uniq -c

If the commit frequency goes down, does it really mean that the project is dying? Maybe it is just becoming stable?

➕ show 7 replies

pwr1 • today at 5:22 PM

Solid list. I'd add git log --all --oneline --graph pretty early on — gives you a quick sense of how active different branches are and whether this is a "one person commits everything" project or actually distributed. Helped me a ton on a job where I inheritied a monolith with like 4 years of history.

The git blame tip is underrated. People treat it like a gotcha tool but its maybe the fastest way to find the PR/ticket that explains a weird decision.

croemer • today at 11:02 AM

Rather than using an LLM to write fluffy paragraphs explaining what each command does and what it tells them, the author should have shown their output (truncated if necessary)

➕ show 2 replies

siva7 • today at 5:40 PM

Thanks. What a great Skill for my Claude

konovalov-nk • today at 4:49 PM

To me all of these are symptoms of the problem that I outlined in my recent blog post: https://news.ycombinator.com/item?id=47606192

and it touches in detail what exactly commit standards should be, and even how to automate this on CI level.

And then I also have idea/vision how to connect commits to actual product/technical/infra specs, and how to make it all granular and maintainable, and also IDE support.

I would love to see any feedback on my efforts. If you decide to go through my entire 3 posts I wrote, thank you

nextlevelwizard • today at 5:16 PM

These are actually fun to run. Just checked from work who makes most commits and found I have as many commits in past 2 years as 3 next people.

That probably isn’t a good sign

Ultcyber • today at 5:12 PM

Nice set of commands! I would suggest using --all flag with git log though - scans through all branches and not just the current one

bullen • today at 12:32 PM

Dying or stabilizing?

Most good projects end up solving a problem permanently and if there is no salary to protect with bogus new features it is then to be considered final?

fzaninotto • today at 12:08 PM

Instead of focusing on the top 20 files, you can map the entire codebase with data taken from git log using ArcheoloGit [1].

[1]: https://github.com/marmelab/ArcheoloGit

zdkaster • today at 4:35 PM

Can't resist making it as a git command https://github.com/zdk/git-critique

arthurjj • today at 2:23 PM

These were interesting but I don't know if they'd work on most or any of the places I've worked. Most places and teams I've worked at have 2-3 small repos per project. Are most places working with monorepos these days?

➕ show 1 reply

Cthulhu_ • today at 12:41 PM

For "what changes the most", in my project it's package.json / lock (because of automatic dependency updates) and translation / localization files; I'd argue that's pretty normal and healthy.

For the "bus factor", there's one guy and then there's me, but I stopped being a primary contributor to this project nearly two years ago, lol.

seba_dos1 • today at 10:32 AM

> If the team squashes every PR into a single commit, this output reflects who merged, not who wrote.

Squash-merge workflows are stupid (you lose information without gaining anything in return as it was easily filterable at retrieval anyway) and only useful as a workaround for people not knowing how to use git, but git stores the author and committer names separately, so it doesn't matter who merged, but rather whether the squashed patchset consisted of commits with multiple authors (and even then you could store it with Co-authored-by trailers, but that's harder to use in such oneliners).

➕ show 3 replies

StableAlkyne • today at 2:20 PM

Biggest life changer for me has been:

git clone --depth 1 --branch $SomeReleaseTag $SomeRepoURL

If you only want to build something, it only downloads what you need to build it. I've probably saved a few terabytes at this point!

alaudet • today at 2:06 PM

This is good stuff. Why I never think of things like this is beyond me. Thanks

gherkinnn • today at 10:10 AM

These are some helpful heuristics, thanks.

This list is also one of many arguments for maintaining good Git discipline.

md224 • today at 4:04 PM

The last sentence of the article is "Here’s what the rest of the week looks like." and then it just stops. Am I missing something?

➕ show 2 replies

pscanf • today at 12:45 PM

I just finished¹ building an experimental tool that tries to figure out if a repo is slopware or not just by looking at it's git history (plus some GitHub activity data).

The takeaway from my experiment is that you can really tell a lot by how / when / what people commit, but conclusions are very hard to generalize.

For example, I've also stumbled upon the "merge vs squash" issue, where squashes compress and mostly hide big chunks of history, so drawing conclusions from a squashed commit is basically just wild guessing.

(The author of course has also flagged this. But I just wanted to add my voice: yeah, careful to generalize.)

¹ Nothing is ever finished.

tetromino_ • today at 2:56 PM

Out of curiosity, I ran the 5 command on my project's public git tree. The only informative one was #4 ("Is This Project Accelerating or Dying") - it showed cliffs when significant pieces of logic were decoupled and moved to other repos.

niedbalski • today at 12:01 PM

Ages ago, google released an algorithm to identify hotspots in code by using commit messages. https://github.com/niedbalski/python-bugspots

niedbalski • today at 12:01 PM

Ages ago google wrote an algorithm to detect hotspots by using commit messages, https://github.com/niedbalski/python-bugspots

alkonaut • today at 11:33 AM

Trusting the messages to contain specific keywords seems optimistic. I don't think I used "emergency" or "hotfix" ever. "Revert" is some times automatically created by some tools (E.g. un-merging a PR).

➕ show 1 reply

guilhermeasper • today at 2:11 PM

These commands are very useful, but adapting them to the codebase makes a huge difference.

For most, I added some filters and slightly changed the regex, and it showed the reality of the codebase (I already knew the reality, I just wanted to see if it matched, and it did).

nola-a • today at 12:06 PM

For more insights on Git, check out https://github.com/nolasoft/okgit

yonatan8070 • today at 2:18 PM

My team usually uses "Squash and merge" when we finish PRs, so I feel that would skew the results significantly as it hides 99% of the commit messages inside the long description of the single squashed merge commit.

mikaoelitiana • today at 2:07 PM

I created a small TUI based on the article https://github.com/mikaoelitiana/git-audit

➕ show 1 reply

traceroute66 • today at 10:33 AM

> The 20 most-changed files in the last year. The file at the top is almost always the one people warn me about.

What a weird check and assumption.

I mean, surely most of the "20 most-changed files" will be README and docs, plus language-specific lock-files etc. ?

So if you're not accounting for those in your git/jj syntax you're going to end up with an awful lot of false-positive noise.

➕ show 1 reply

progx • today at 3:43 PM

Before, I ask AI "is this project maintained" done.

baquero • today at 12:27 PM

I put it into a gist :)

https://gist.github.com/aeimer/8edc0b25f3197c0986d3f2618f036...

drob518 • today at 3:55 PM

Nice timing. I was just today needing some of the info that these commands surface. Serendipitous!

aidenn0 • today at 2:22 PM

What's the subversion equivalents to these commands?

kittikitti • today at 4:42 PM

This is a great list of commands to quickly understand a repository. Thank you for sharing.

therealdeal2020 • today at 1:03 PM

superficial. If I have to unfuck the backend 10 times a week in our API adapter, then these commands will show me constantly changing the API adapter, although it's the backend team constantly fixing their own bugs

aa-jv • today at 10:56 AM

Great tips, added to notes.txt for future use ..

Another one I do, is:

    $alias gss='git for-each-ref --sort=-committerdate'

    $gss

    ce652ca83817e83f6041f7e5cd177f2d023a5489 commit refs/heads/project-feature-development
    ce652ca83817e83f6041f7e5cd177f2d023a5489 commit refs/remotes/origin/project-feature-development
    1ef272ea1d3552b59c3d22478afa9819d90dfb39 commit refs/remotes/origin/feature/feature-removal-from-good-state
    c30b4c67298a5fa944d0b387119c1e5ddaf551f1 commit refs/remotes/origin/feature/feature-removal
    eda340eb2c9e75eeb650b5a8850b1879b6b1f704 commit refs/remotes/origin/HEAD
    eda340eb2c9e75eeb650b5a8850b1879b6b1f704 commit refs/remotes/origin/main
    3f874b24fd49c1011e6866c8ec0f259991a24c94 commit refs/heads/project-bugfix-emergency
    ...

This way I can see right away which branches are 'ahead' of the pack, what 'the pack' looks like, and what is up and coming for future reference ... in fact I use the 'gss' alias to find out whats going on, regularly, i.e. "git fetch --all && gss" - doing this regularly, and even historically logging it to a file on login, helps see activity in the repo without too much digging. I just watch the hashes.

heliumtera • today at 3:30 PM

So you value more rushed descriptions of changes than actual changes. Nice

tom-blk • today at 1:01 PM

Nice! Will probably adopt this, seems to give a great overview!

jayd16 • today at 2:46 PM

No searching the codebase/commits for "fuck" and shit"? That will give you an idea what what was put in under stressful circumstances like a late night during a crunch.

alt Hacker News

Git commands I run before reading any code

Comments

🔗 View 16 more comments