Matheus Tavares

05 Aug 2019

GSoC Week 11: [WIP] grep: protecting textconv and submodules

Tags: gsoc, git

The git-grep –recurse-submodules fix

This week I finished and sent the patch fixing the worktree case in git-grep –recurse-submodules. After the reviews I also made some revisions. The last version can be seen here.

“Fixed” race conditions on textconv and submodules

Back to the parallel inflation patchset, the major leftover task was to protect the textconv and submodules operations on git-grep. My first approach to solve this was to incrementally navigate through the call stack looking for unprotected global states. As might be expected, this solution has proved too laborious, so I decided to try a more empirical one:

  1. Firstly, I prepared a big (but not too big) repository with lots of textconv operations and a submodule of reasonable size as well. Basically I added Git’s repository as a submodule of the Linux repo. And to force git-grep to perform a lot of textconv’s, I added these lines to .gitattributes:
    *.c diff=cat-textconv
    *.h diff=cat-textconv

    And this to my ~/.gitconfig:

    [diff "cat-textconv"]
  2. Then I executed the following command looking for helgrind’s complains:
    valgrind --tool=helgrind ~/bin/git -C my-prepared-repo \
    --no-pager grep --recurse-submodule --color=never \
    --textconv --threads=8 abc[0-9] HEAD
  3. For each “Possible data race” message, I tried to examine the respective code snippet and protect that. To perform this, I used a lot of the code Duy used on the hacky patch he sent me as a testing example.

Being an empirical process, this is not the perfect solution as we may miss some possible racy situations. But I did manage to get to a point with no warnings from helgrind. The resulting patchset can be seen here. Please note that it still needs a lot of cleaning.

There are still two problems with it (at least):

  • When using --textconv the execution time is hugely affected (like 2s to 5m!)
  • The thread protection at use_pack() must be refactored. Currently I’m using a single use_pack_mutex arround the function’s code, but we should use the windows_lock added to the struct packed_git to increase the parallel performance. The problem with using it now is that helgrind warns us of possible race conditions on this line: win->last_used = pack_used_ctr++;.

For the first one, I need to profile the code to find where I may have introduced a bottleneck. The second one is weird to me as it seems a little counterintuitive. As we have one mutex per packed_git, the only possible way of getting race conditions on that line would be if two calls to use_pack(), with different packed_git parameters, were using the same window. But, by what I understood so far, packed_git is a representation of one packfile and each window belong to an specific packfile. So the race condition doesn’t make sense to me yet.

Other activities

This weekend I’ve attended linuxdev-br, an amazing conference that brings together developers of the linux kernel and other FLOSS projects. It was just wonderful! There were many great talks and welcoming people! I also got the chance to co-present two speeches: the first on “How to create a local FLOSS study group”; and the second on “Object-Oriented techniques in C: a case study on Linux and Git”. I’m very happy to be able to talk a little about the Git codebase and share some techniques used there :)

The conference also had a “workshop day” on Friday which we (FLUSP) helped co-organize at our university. In total, there were 5 workshops and almost 80 people. We also hosted one of the sessions, helping people who wanted to start contributing to the linux kernel. It was a lot of work, but very rewarding!

Next steps

  • Write and run a battery of tests to:
    • Ensure the current version still produces only correct results.
    • Check the execution time in comparison to the previous version (with the unprotected textconv and submodule operations). Also compare how --recurse-submodules affects the time in this new version.
  • Profile the code from my patches and check why --textconv is running so slow.
  • Try to figure out why we cannot use the windows_lock of struct packed_git in use_pack() without race conditions.
  • Refactor and clean the code. The second commit on my patchset is very inflated and, thus, should be splitted. I also need to consider destroying the introduced mutexes after use, if possible.
  • Try to remove the obj_read_lock. Now that we have locks for the windows at packed_git and the packed_git_mru list of raw_object_store we may be closer to the possibility of removing obj_read_lock. But, that won’t be so simple as we still need to protect the delta_base_cache and other structures.

Til next time,