Matheus Tavares

29 Jul 2019

GSoC Week 10: A bug in git grep submodules

Tags: gsoc, git

Yesterday I got home after 10 days of DebConf. It was an amazing experience! I thought I would be able to work more in my project during the conference, but the schedule was very intense. Unfortunatelly, I also got a sore throat and fever for a few days :( I’m still recovering from it, but I could still enjoy most of the speeches, nevertheless.

Regarding the parallel zlib RFC, I couldn’t advance much during the conference. But I managed to repeat the tests in an HDD machine, as we’ll discuss later in this report. I also took some time to work on a potential bug at git-grep. It was reported earlier this month at the mailing list. And since it’s located in the code I’m currently working on, I thought I should take a look and try to fix it during GSoC, as well. Without further ado, let’s get to those topics:

Repeating tests using HDD

I finally repeated the tests from week-8 in an machine1 with HDD to check if we have similar gains to those seen in SSD. Here you can see the means of 30 executions after 2 warmups:

Times of git-grep at chromium repo (with HDD)
  Threads 1
                        Regex 1            Regex 2
    Original:           18.4512s           22.5032s

  Threads 2
                        Regex 1            Regex 2
    Original:           19.3351s           19.9966s
    Parallel inflate:   12.4539s           14.6662s
    Reduction:          35.59%             26.66%

  Threads 8
                        Regex 1            Regex 2
    Original:           22.7605s           20.8892s
    Parallel inflate:   10.7737s           12.6165s
    Reduction:          52.66%             39.60%

As we can see, the time reductions here are pretty good, as well. So it seems that we are in the right direction :)

Grep submodules bug: ignoring worktree

On July 8th, Daniel reported on the mailing list a weird git-grep behavior regarding submodules. Even when --cached was omitted, git-grep --recurse-submodules was only greping the submodules’ objects and not the worktree. As a simple test, I did the following:

cd /tmp
mkdir sub parent
cd sub
git init
echo "A" > A
git add A && git commit -m "Add A"
cd ../parent
git init
echo "parent-A" > parent-A
git add parent-A && git commit -m "Add parent-A"
git submodule add ../sub
git add . && git commit -m "Add submodule 'sub'"
echo "Adding a string containing 'parent'" >> sub/A
git grep --recurse-submodules 'parent'

The output was:


And it should have been:

sub/A:2:Adding a string containing 'parent'

It seems that git-grep was taking the worktree modifications into account before f9ee2fc (“grep: recurse in-process using ‘struct repository’”, 02-08-2017). So it could have been either a project decision to change the behavior or a problem during the conversion to recurse in-process. Analysing the code and commit history, it seems, thought, that the second option is what happened. So I focused on trying to find a way to re-implement the old behavior.

Althought not obvious to me at first look, the fix wasn’t complex at all. Basically, the problem was at this line inside grep_submodule():

hit = grep_cache(&subopt, pathspec, 1);

grep_cache()’s third parameter is called cached. It tells the function whether greping must be performed in the cache or worktree. And at the code above, grep_cache() is always been called with cached=1, thus, ignoring the worktree. What I did to fix that is add an aditional cached parameter to grep_submodule() so that it may known the desired behavior and call grep_cache() properly. I also added a test to detect possible behavior changes regarding this case in the future. The final commits can be seen here. I will ask for a review from my mentors and send it to the mailing list when ready :)

This small patchset was also a nice learning opportunity for me to get to know more about the index file and how it’s maintained. Here are some references that may be useful to others as well:

Other activities

At DebConf I had the opportunity to pass on some of what I learned about Git and Linux so far. In particular, I helped two guys from India who got interested in contributing to those projects. We set the environments and discussed a little about the codebases and tests. For the kernel, there was even one patch already sent! You can check it here.

Next steps

For this week, I plan to send the patchset fixing the submodules’ worktree case on git-grep and continue the work on parallel inflation. The main tasks for that are the ones listed on the previous post.


  1. An Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz, with 8GB of RAM, HDD and running Manjaro Linux (the same OS where previous tests were runned). 

Til next time,