The git-grep –recurse-submodules fix
This week I finished and sent the patch fixing the worktree case in git-grep –recurse-submodules. After the reviews I also made some revisions. The last version can be seen here.
“Fixed” race conditions on textconv and submodules
Back to the parallel inflation patchset, the major leftover task was to protect the textconv and submodules operations on git-grep. My first approach to solve this was to incrementally navigate through the call stack looking for unprotected global states. As might be expected, this solution has proved too laborious, so I decided to try a more empirical one:
- Firstly, I prepared a big (but not too big) repository with lots of textconv
operations and a submodule of reasonable size as well. Basically I added Git’s
repository as a submodule of the Linux repo. And to force
git-grepto perform a lot of textconv’s, I added these lines to
*.c diff=cat-textconv *.h diff=cat-textconv
And this to my
[diff "cat-textconv"] textconv=cat
- Then I executed the following command looking for
valgrind --tool=helgrind ~/bin/git -C my-prepared-repo \ --no-pager grep --recurse-submodule --color=never \ --textconv --threads=8 abc[0-9] HEAD
- For each “
Possible data race” message, I tried to examine the respective code snippet and protect that. To perform this, I used a lot of the code Duy used on the hacky patch he sent me as a testing example.
Being an empirical process, this is not the perfect solution as we may miss some
possible racy situations. But I did manage to get to a point with no warnings
helgrind. The resulting patchset can be seen here.
Please note that it still needs a lot of cleaning.
There are still two problems with it (at least):
- When using
--textconvthe execution time is hugely affected (like 2s to 5m!)
- The thread protection at
use_pack()must be refactored. Currently I’m using a single
use_pack_mutexarround the function’s code, but we should use the
windows_lockadded to the
struct packed_gitto increase the parallel performance. The problem with using it now is that
helgrindwarns us of possible race conditions on this line:
win->last_used = pack_used_ctr++;.
For the first one, I need to profile the code to find where I may have
introduced a bottleneck. The second one is weird to me as it seems a little
counterintuitive. As we have one mutex per
packed_git, the only possible way
of getting race conditions on that line would be if two calls to
packed_git parameters, were using the same window. But, by what
I understood so far,
packed_git is a representation of one packfile and each
window belong to an specific packfile. So the race condition doesn’t make sense
to me yet.
This weekend I’ve attended linuxdev-br, an amazing conference that brings together developers of the linux kernel and other FLOSS projects. It was just wonderful! There were many great talks and welcoming people! I also got the chance to co-present two speeches: the first on “How to create a local FLOSS study group”; and the second on “Object-Oriented techniques in C: a case study on Linux and Git”. I’m very happy to be able to talk a little about the Git codebase and share some techniques used there :)
The conference also had a “workshop day” on Friday which we (FLUSP) helped co-organize at our university. In total, there were 5 workshops and almost 80 people. We also hosted one of the sessions, helping people who wanted to start contributing to the linux kernel. It was a lot of work, but very rewarding!
- Write and run a battery of tests to:
- Ensure the current version still produces only correct results.
- Check the execution time in comparison to the previous version
(with the unprotected textconv and submodule operations). Also compare how
--recurse-submodulesaffects the time in this new version.
- Profile the code from my patches and check why
--textconvis running so slow.
- Try to figure out why we cannot use the
use_pack()without race conditions.
- Refactor and clean the code. The second commit on my patchset is very inflated and, thus, should be splitted. I also need to consider destroying the introduced mutexes after use, if possible.
- Try to remove the
obj_read_lock. Now that we have locks for the windows at
raw_object_storewe may be closer to the possibility of removing
obj_read_lock. But, that won’t be so simple as we still need to protect the
delta_base_cacheand other structures.
Til next time,