Home

Matheus Tavares

25 May 2019

GSoC Week 2: Let's Get Started

Tags: gsoc, git

This week I discussed about my previous patch on the mailing list and started working on the first tasks for my GSoC project!

Last week’s follow-ups

Christian gave me some good advices on the patch I was working last week. What I implemented with a rebase option, he wiselly pointed out to me that could already be achieved throught the commit.verbose option. So the only advantage my patch brings could be for those who would which to see diffs when rebasing but not when commiting… As also suggested by Christian, I replied to Ævar’s email which originally mentioned the feature, attaching my patch and asking for comments.

We are coming to the conclusion that it is worth making a new patch to document that “rebase obeys all commit.* options”, together with some tests to assert that. As Ævar said, these tests could be the ones I developed in the first patch, so that we can reuse the work :D I’m planning to make and send this new version next week.

First GSoC task: cached_object

Following my initial agenda, this week I started working to protect sha1-file.c’s global states. As my first GSoC task, I took some time to investigate the cached_objects array and how I could protect it, if needed. This is a set of mocked in-memory objects that don’t existent on disk but read_object_file() is able to return.

The array is global and non thread-safe, but it’s only being used by git-blame until this day and wouldn’t lead to race conditions if blame was made parallel today. Nevertheless, I asked for comments in the mailing list on whether I should try to protect this array now. The conversation is still going, but it seems that it’s better to leave this to a future time.

While I was waiting for replies, I started working on other tasks.

Second task: static variables inside functions

For my next task, I set off hunting for static variables inside functions in the pack access call chain. These variables are shared between threads and their lifetime last throughout the whole execution. So, in order to achieve thread-safety, we must make them local or, when not possible, protect them.

Tip: cflow is a really good tool to inspect call chains. With the following command1 I was able to check for the presence of specific functions in assign_blame()’s call chain:
cflow --main assign_blame **/*.h *.h **/*.c *.c

Some of this static to local conversions are very trivial. Let’s see an example:

static int write_loose_object(const struct object_id *oid, char *hdr,
                              int hdrlen, const void *buf, unsigned long len,
                              time_t mtime)
{
        int fd, ret;
        [...]
        static struct strbuf tmp_file = STRBUF_INIT;
        static struct strbuf filename = STRBUF_INIT;

        loose_object_path(the_repository, &filename, oid);

        fd = create_tmpfile(&tmp_file, filename.buf);

        [...]

        if (mtime) {
                [...]
                if (utime(tmp_file.buf, &utb) < 0)
                        warning_errno(_("failed utime() on %s"), tmp_file.buf);
        }

        return finalize_object_file(tmp_file.buf, filename.buf);
}

The call to create_tmpfile() always resets the tmp_file variable, overriding any content it previously held. Also, utime() and finalize_object_file(), the other functions which tmp_file is passed to, don’t hold references to it. So tmp_file doesn’t really need to be a static variable. Besides making this function one step closer to being thread-safe, removing the static qualifier also saves RAM as the variable will only be loaded to memory when the function is called.

The same logic applies to filename, which can also be made local.

Note: Most of the time static variables are declared within functions, the objective is to free the caller from the necessity of freeing resouces. But as we are making them local, we need to release any dinamic allocated memory they hold or pass this responsability to the caller. Otherwise, we could be creating memory leeks.

Difficulties

My main dificulty this week was to get started. The pack access code can be quite big and complex, so I got a little overwhelmed/lost in the beginning, not knowing where to start.

Also, in recent conversations on the mailing list, I got a little uncertain if I should make the thread-safety conversion bottom-up, as I started doing this week, or begin in a higher level (with a couple wide mutexes), refining it down. I may have missunderstood the suggested idea, though. So let’s wait for more comments on it :)

Next steps

Continue the work on sha1-file.c’s global states, and re-send last week’s patch, refactored.

Footnotes

  1. The command complained about redefinitions. I guess the problem is with #ifdef macros that didn’t get processed. I tried using --preprocess="gcc -E" but it got even worse :( Anyway, the current output, even if slightly wrong, is sufficient for me, now. 

Til next time,
Matheus