Matheus Tavares

09 Jun 2019

Week 4: A different approach

As I documented in the last post, we’ve been rethinking my initially proposed schedule, as things weren’t turning out exactly how I previously thought. I was probably working on functions that didn’t really need to be thread-safe and kind of walking in the dark.

Talking with my mentors, we reached the conclusion that it would be more beneficial to the project if I switched approaches. Starting at git-grep, which is already threaded, and refining its parallelism while making the necessary pack access functions thread-safe seemed the way to go. I felt more comfortable with this idea, but it also meant I had to study the codebase more, so that’s what I’ve been doing.

Studying git-grep

I spent most of the week studying git-grep’s code. Thankfully, it wasn’t as complex as the pack access’s or git-blame’s. To implement threading, git-grep uses a producer-consumer mechanism, which I’m already familiar with. The idea is to have a master thread listing the files in a repository and adding tasks to a FIFO, while N worker threads take and perform the tasks.

I also discovered that git-grep has two “operation modes”:

  • grep the tracked files in the working tree (default)
  • grep blobs registered in the index (--cached or when a ref is given)

The first mode runs with the producer-consumer mechanism and gets a very good (although not linear) speedup. But the second, which uses the pack access code to read the blobs, is performed sequentially, despite sharing the same machinery with the first. This very nice thread on the mailing list explains why (it also gave me some insights on possible improvements to git-grep’s performance).

Basically, the speedup observed when grepping the working tree with threads was not present when grepping blobs with threads. In fact, the developers observed a slowdown as the number of threads increased! To avoid that, threading was disabled for this case in commit 53b8d93 (“grep: disable threading in non-worktree case”, 2011-12-12). I reverted this commit to check for myself and got the same results, at least for simple regexes.

The commit’s message says:

Measurements by various people have shown that grepping in parallel is not beneficial when the object store is involved.
So until the pack machinery allows unthreaded access, we disable grep’s threading in all but the worktree case.

This last sentence is quite motivating for my project :) But to gain more certainty about the pack access’s weight in git-grep, I decided to run some tests. In particular, I compiled Git with -g -pg -O0 and ran git grep -E '(static|extern) (int|double) \*' HEAD [1] on chromium’s repository. After that, using gprof and gprof2dot, I got the following graph (click it for a larger visualization):
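For reference, the workflow boils down to something like the following sketch (paths are illustrative, and the exact Makefile invocation may vary between Git versions):

```shell
# Build an instrumented git: -pg makes gcc emit gprof profiling hooks
make CFLAGS="-g -pg -O0"

# Run the workload inside the repository being tested; the profile
# is written to gmon.out in the current directory
cd /path/to/chromium
/path/to/git/git grep -E '(static|extern) (int|double) \*' HEAD

# Turn the profile into a call graph image
gprof /path/to/git/git gmon.out | gprof2dot | dot -Tpng -o callgraph.png
```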

git-grep on chromium

As you can see, read_object_file_extended() and its children take a significant slice of the execution time, so trying to perform it in parallel seems promising. Another eye-catcher is grep_source_is_binary(). I’m not yet sure why it is so expensive, but it’s definitely something I want to investigate further.

I also reverted the commit disabling threads on cached grep and made a plot using 8 threads. You can see the results here.

Christian suggested this nice set of scripts for rendering perf’s output as a flamegraph. For the same test as before (just removing -pg), we get this image (click it to enlarge): flamegraph of git-grep on chromium (the 8-thread version can be found here).

Here we can see, again, the object store calls distributed across the recursive calls to grep_tree(). But what draws the most attention is the massive time spent in libz.so (which should be the decompression routines?). If so, it’s another good indication that parallelization could pay off.

Another possible improvement suggested in that ML thread is to try replicating in git-grep a nice git-diff feature: diff avoids retrieving blobs from the object store when it knows the files in the working tree are stat-clean (i.e. up to date).

Out of curiosity, I re-ran the test a few more times with different sets of parameters. One of the interesting results came when I re-executed the above experiment with a simpler “abc” regex: 76% of the time was spent in kwsexec() alone! So it may be a nice hot spot for improvements.

It’s also curious how git-grep’s CPU occupancy stays quite low, with or without threads… Perhaps much of the time is spent on lock contention or waiting for work to be assigned by the master thread. I surely want to inspect this further.

This week Renato and I hosted a seminar on “Object-Oriented Programming in C: A case study on Git and Linux”. In it, we showed some of the OOP techniques that Git and Linux use to help organize code and solve problems. Here you can find the slides of our presentation: oop_git_and_kernel.pdf.

I’ve also been trying to form a group of local Git contributors at my college. More specifically, I’m helping interested students on the “Git front” of FLUSP, a group I participate in, composed of students passionate about FLOSS projects. This week, we had some new faces at FLUSP’s Git meeting :) Here is a picture of it, taken while we jointly downloaded Git’s source code, inspected the codebase, and more:

FLUSP's Git Meeting


I surely underestimated the complexity of this project. And not knowing the codebase well enough makes it harder to propose ways of parallelizing/optimizing code, especially for heavy sections such as the pack access or blame. I’m a bit sad not to have written much code yet :( But I’ve been feeling kind of stuck, not knowing where to start. That’s why I’ve dedicated myself to studying the code, and why I proposed the change of approach. I’m happy, though, to start digging into git-grep and gradually understand its internals. I’m not sure yet whether this new approach will work, but I’m anxious to start getting my hands on code.

Next steps

The idea now is to focus on git-grep’s calls to the pack access code (mainly read_object_file_extended() and read_object_with_reference()) in an effort to make them thread-safe. Then we’ll be able to eliminate git-grep’s lock around these functions (or at least use finer-grained locks). But this is more of a long-term goal. I’m not exactly sure yet how to break it into small tasks or what, concretely, I’ll work on during the following week. What I have planned is to inspect these functions further and take a look at the other optimization options I mentioned in this post. I’m also curious to further inspect kwsexec(), grep_source_is_binary() and git-grep’s low CPU usage.


  1. I tried to choose a regex for the tests that was neither too simple nor too complex, in order to get a more realistic result for the average case. Too simple or too complicated regexes could push the execution time to extremes, not letting us see how the object store code performs in git-grep. 

Til next time,

Matheus Tavares