Week 10: A bug in git grep submodulesTweet
Yesterday I got home after 10 days of DebConf. It was an amazing experience! I thought I would be able to work more in my project during the conference, but the schedule was very intense. Unfortunatelly, I also got a sore throat and fever for a few days :( I’m still recovering from it, but I could still enjoy most of the speeches, nevertheless.
Regarding the parallel zlib RFC, I couldn’t advance much during the conference.
But I managed to repeat the tests in an HDD machine, as we’ll discuss later in
this report. I also took some time to work on a potential bug at
was reported earlier this month at the mailing list. And since it’s located in
the code I’m currently working on, I thought I should take a look and try to fix
it during GSoC, as well. Without further ado, let’s get to those topics:
Repeating tests using HDD
Times of git-grep at chromium repo (with HDD) ============================== Threads 1 Regex 1 Regex 2 Original: 18.4512s 22.5032s Threads 2 Regex 1 Regex 2 Original: 19.3351s 19.9966s Parallel inflate: 12.4539s 14.6662s Reduction: 35.59% 26.66% Threads 8 Regex 1 Regex 2 Original: 22.7605s 20.8892s Parallel inflate: 10.7737s 12.6165s Reduction: 52.66% 39.60%
As we can see, the time reductions here are pretty good, as well. So it seems that we are in the right direction :)
Grep submodules bug: ignoring worktree
On July 8th, Daniel reported
on the mailing list a weird
git-grep behavior regarding submodules. Even when
--cached was omitted,
git-grep --recurse-submodules was only greping the
submodules’ objects and not the worktree. As a simple test, I did the following:
cd /tmp mkdir sub parent cd sub git init echo "A" > A git add A && git commit -m "Add A" cd ../parent git init echo "parent-A" > parent-A git add parent-A && git commit -m "Add parent-A" git submodule add ../sub git add . && git commit -m "Add submodule 'sub'" echo "Adding a string containing 'parent'" >> sub/A git grep --recurse-submodules 'parent'
The output was:
And it should have been:
parent-A:1:parent-A sub/A:2:Adding a string containing 'parent'
It seems that git-grep was taking the worktree modifications into account before f9ee2fc (“grep: recurse in-process using ‘struct repository’”, 02-08-2017). So it could have been either a project decision to change the behavior or a problem during the conversion to recurse in-process. Analysing the code and commit history, it seems, thought, that the second option is what happened. So I focused on trying to find a way to re-implement the old behavior.
Althought not obvious to me at first look, the fix wasn’t complex at all.
Basically, the problem was at this line inside
hit = grep_cache(&subopt, pathspec, 1);
grep_cache()’s third parameter is called
cached. It tells the function
whether greping must be performed in the cache or worktree. And at the code
grep_cache() is always been called with
cached=1, thus, ignoring the
worktree. What I did to fix that is add an aditional
cached parameter to
grep_submodule() so that it may known the desired behavior and call
grep_cache() properly. I also added a test to detect possible behavior changes
regarding this case in the future. The final commits can be seen here.
I will ask for a review from my mentors and send it to the mailing list when
This small patchset was also a nice learning opportunity for me to get to know more about the index file and how it’s maintained. Here are some references that may be useful to others as well:
- About the index: https://hackernoon.com/understanding-git-index-4821a0765cf
- About general git’s internals and others: https://www.youtube.com/watch?v=xbLVvrb2-fY (really good talk!)
- It’s possible to inspect the index with
git ls-files --stage.
At DebConf I had the opportunity to pass on some of what I learned about Git and Linux so far. In particular, I helped two guys from India who got interested in contributing to those projects. We set the environments and discussed a little about the codebases and tests. For the kernel, there was even one patch already sent! You can check it here.
For this week, I plan to send the patchset fixing the submodules’ worktree case on git-grep and continue the work on parallel inflation. The main tasks for that are the ones listed on the previous post.
An Intel(R) Core(TM) i5-4210U CPU @ 1.70GHz, with 8GB of RAM, HDD and running Manjaro Linux (the same OS where previous tests were runned). ↩
Til next time,