
Matheus Tavares

03 Jun 2019

GSoC Week 3: Some Rethinking

Tags: gsoc, git

Unfortunately, this was an unproductive week :( As I’m approaching the end of the semester, I have many college assignments and couldn’t do everything I wanted to on Git this week…

The static conversions on sha1-file

I’ve continued working on my patch to remove static variables inside functions of sha1-file.c. I finally have an initial version of the patch, which can be seen here.

I still don’t love the idea of having all of this in a single patch, but I struggled a bit to figure out how to split the changes. (In fact, any suggestions would be highly appreciated.)
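To illustrate the kind of conversion I mean, here is a minimal sketch. It is not code from sha1-file.c: the function names, the tiny HASH_LEN and the hex formatting are all hypothetical, chosen only to show the pattern of moving a function-local static buffer to a buffer owned by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define HASH_LEN 4               /* hypothetical short hash, for illustration only */
#define HEX_LEN (2 * HASH_LEN)

/*
 * Before: the result lives in a static buffer inside the function.
 * Every caller (and every thread) shares that one buffer, so a second
 * call overwrites the result of the first.
 */
const char *hash_to_hex_static(const unsigned char *hash)
{
	static char buf[HEX_LEN + 1];

	for (size_t i = 0; i < HASH_LEN; i++)
		snprintf(buf + 2 * i, 3, "%02x", hash[i]);
	return buf;
}

/*
 * After: the caller provides the buffer, so concurrent calls that use
 * distinct buffers no longer share any state.
 */
char *hash_to_hex_r(char *out, size_t outlen, const unsigned char *hash)
{
	assert(out != NULL && outlen >= HEX_LEN + 1);

	for (size_t i = 0; i < HASH_LEN; i++)
		snprintf(out + 2 * i, 3, "%02x", hash[i]);
	return out;
}
```

The trade-off is that every call site has to change to pass a buffer, which is part of why this kind of patch tends to touch many places at once.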

Some Rethinking

I’ve also been reconsidering my proposed agenda and the path to follow during the project.

Until now, I’ve been looking at the functions in sha1-file.c and analysing which ones are not thread-safe, as well as how I could make them so. The problem with this approach is that I may be working on snippets that won’t necessarily be called by multiple threads and, thus, don’t really need to be thread-safe. Also, we are kind of walking in the dark, as the benefits of the changes can’t really be seen yet (and our goal is too far away).

I think we should first try to evaluate who could really benefit from threading in the pack access code, and how, before doing the conversions. Then, it should be easier to know where and how to attack. In addition, this way we can be certain earlier that the work being developed is worthwhile and will provide improvements. Even better: if we manage to have already-parallel code to test, we could incrementally convert functions to be thread-safe, refine the locks and evaluate the performance impact as we go! This way, instead of one big faraway goal, we would have many small, closer goals.

Two ideas have popped into my mind as I’ve been considering the above:

  1. We could focus on git-grep, which is already parallel and uses a wide lock around read_object_file. Starting at this function, we could try making the call graph more thread-safe and refine the big lock along the way. The key benefits of this approach are that we would be able to quickly evaluate the performance impact at each step, and that it would be easier to take small steps with clear improvements at each of them. Also, if there’s time left, we could use the same idea to parallelize other commands, such as git-blame.

  2. We could focus on git-blame and how it could be made parallel, making the necessary functions thread-safe along the way (maybe starting with a big lock, as in git-grep, and refining it from there). This seems to be a much harder path, but the pros include targeting a command for which there is known demand for higher performance (see footnote 1).
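Both ideas rely on the “big lock” pattern, so here is a rough sketch of it with pthreads. This is not git-grep’s actual code: read_object_unsafe, read_object_locked, read_mutex and run_workers are hypothetical names, and the shared counter merely stands in for whatever unprotected state a real object reader would touch.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_THREADS 16

/* Hypothetical shared state that the reader touches without protection. */
static unsigned long objects_read;

/* Stand-in for a non-thread-safe reader: the increment is not atomic. */
static void read_object_unsafe(void)
{
	objects_read++;
}

/* The "big lock": a single mutex serializing every call to the reader. */
static pthread_mutex_t read_mutex = PTHREAD_MUTEX_INITIALIZER;

static void read_object_locked(void)
{
	pthread_mutex_lock(&read_mutex);
	read_object_unsafe();
	pthread_mutex_unlock(&read_mutex);
}

struct worker_args {
	int iterations;
};

static void *worker(void *arg)
{
	const struct worker_args *args = arg;

	for (int i = 0; i < args->iterations; i++)
		read_object_locked();
	return NULL;
}

/* Start nr_threads workers, wait for them, and return the total read count. */
unsigned long run_workers(int nr_threads, int iterations)
{
	pthread_t threads[MAX_THREADS];
	struct worker_args args = { iterations };

	assert(nr_threads > 0 && nr_threads <= MAX_THREADS);
	objects_read = 0;

	for (int i = 0; i < nr_threads; i++)
		if (pthread_create(&threads[i], NULL, worker, &args) != 0)
			abort();
	for (int i = 0; i < nr_threads; i++)
		pthread_join(threads[i], NULL);

	return objects_read;
}
```

With the lock held around the whole call, the result is correct but the reads run one at a time. “Refining” the lock would mean making parts of the call graph thread-safe and shrinking the locked region step by step, measuring the effect after each step.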

As the goal of speeding up git-blame is really compelling to me, I also spent part of the past week researching it. The code wasn’t so easy for me to follow, but this old explanation from Junio helped me a lot to better understand how blame works. (Even so, if we decide to go this way, some help from blame developers would reeeeeally be appreciated.)

Anyway, as much as I’d love to target blame, the first approach seems better for someone who is just starting to work on the pack access code. I still need to talk with my mentors about these ideas and possible changes to the plan. I really want to hear their opinions on this.

Others: the patch on rebase.verboseCommit

Although we were rethinking my initial patch concept, Dscho seemed to like the original idea as a way to enable verbosity in git-rebase but not git-commit. So I’m not sure yet whether I should make the changes Ævar suggested and re-send the patch, or go with the “document that rebase obeys commit.* options” approach.

Footnotes

  1. Some developers are experiencing slow git-blame performance on huge repositories. Take a look at the following links, for example:

Til next time,
Matheus