Reformatting 100k Files at Google in 2011

rsc

My notes say it was 193k at the start. The final dashboard when we stopped said "216,626 / 216,890 = 99.8%; 264 to go".

The other correction I would make is that this post does not mention Nilton Volpato, who had written an earlier Buildifier and graciously accepted replacing his implementation with a new one and then taking over ownership for that new implementation as well. (Eventually ownership moved to Laurent's team.)

It looks like it was just under 2,000 commits. We did pretty extensive testing, by having Blaze load a BUILD file and its transitive closure and then dump that parsed form back out to a binary format. Any automated commit had to preserve that parsed-and-dumped binary format bit for bit. The slowest part of the testing was waiting for Blaze to do all the loads.

Every day I would prepare and test as many files as I could, break them into CLs (think PRs), mail Rob a shell script he could run to approve them all, and go to bed. Then I'd get up early in the morning (5am ET) to submit the changes, because there were various cached indexes that got updated when BUILD files got submitted, and it seemed better to send them when not many people would be working.

That scheme worked until a system did fall over and someone got paged, and then after that I agreed to only submit the large changes during business hours. :-)

wonger_

Autoformatting is so nice. Crazy to think that formatters only became popular after `gofmt`.

I also found this related quote from Russ Cox intriguing: "Most people think that we format Go code with gofmt to make code look nicer or to end debates among team members about program layout. But the most important reason for gofmt is that if an algorithm defines how Go source code is formatted, then programs, like goimports or gorename or go fix, can edit the source code more easily, without introducing spurious formatting changes when writing the code back. This helps you maintain code over time."

bsmith0

I'd be curious what role "global approvers" at Google/Google's scale typically have/how many are there/what's the process?

vitiral

I'm going to take a contrarian view here. Code formatting is amazing in a corporate environment where nobody truly cares about their code -- it's just a means to get a paycheck. It's also great for beginners to a language who are still trying to get a handle of the syntax.

But where you are nearly the sole owner of a small library and you are crafting that library to be beautiful and understandable... there is something pleasurable about structuring concepts so you have each on a single line, or creating similar functions so the concepts are structured by column.

I know not everyone will hold this view and that is fine, but when you are writing your own hobby library in your favorite language for your own purposes I recommend you try it out.

yegle

There are 3 tools that that makes maintaining BUILD files enjoyable: buildifier, buildozer and build_cleaner (internal only unfortunately).

jerf

The term "bikeshedding" comes up a lot on HN, when people spend a lot of time spinning wheels in endless debates about the things that are easy to debate while indefinitely deferring the tough conversations, but I think an underappreciated aspect of this meme is that the unimportant conversations quite frequently really are just that; unimportant, to everybody involved. Often (but not always) nobody actually cares. Caring is not necessarily represented by the amount of time or text spewed forth on a topic. A lot of things other than passion or importance factor into how much verbiage comes through and it is a mistake to overinterpret mere volume.

Unfortunately, the flip side of this coin is harder to deal with; sometimes truly important issues or things that people do deeply care about have disproportionately too little verbiage. Finding those can be very difficult.

vasco

Sounds like a quick way to get Golang installed on every Google developer machine and drive adoption.

rapfaria

After reformatting, did git blame always pointed to those commits and the most recent author, or were they added to --ignore-rev?

aiuto

I want to start by saying that I do not want to diminish or disparage the work that Russ, Rob, Laurent, and others have done. It has made the Google code base better. That is an inarguable fact. Nor do I want to pick on buildifier or gofmt or any other tool as a singleton problem. I'll talk about buildifier because that is what I personally fight with. Others may have different demons. (YDMV - "your daemon may vary". I'm taking the authorship of that one now, in case it ever takes off).

But back to the point... formatting rules without firm, incredibly strict enforcement ends up being a tax on the janitors - the people who clean the code base and do large scale changes. That makes me sad. These are the people who care a great deal about code health, and their work is hindered by the lint checks that we have imposed.

Let me give an example.

I'm trying to eliminate a constraint in the build system. It's a "small" large change - only O(30K) instances. (Yeah, Google scale is different). I have an incredible wealth of tools available to me to automate the process. For the benefit of the Googlers, I can identify Blaze targets to change, use buildozer to fix them, and ship off CLs to review. But the changes I want to make are often ones which should be reviewed by the code owners, and not globally approved. So possibly O(10K) individuals might be involved in reviews.

Let's explore the problem. First, shouts to y2mango for bringing up incremental formatters. This should be the default for all tools. And another to flymasterv for raising the question of "why not just format as each person touches a BUILD file". Here's the situation.

1. buildozer is really good at rewriting BUILD files syntactically correctly. 2. It has an unfortunate side effect of not being incremental. It calls buildifier to rewrite the entire file. 3. We update the formatting rules to make them stricter over time. That means that a "correct" BUILD file on January 1, might require changes on March 1. 4. Buildifier findings are advisory, rather than mandatory. 5. No team is staffed with repeating the monumental work this post started with.

The reality on the ground is that little touched BUILD files become stale, and would require a formatting update over time. It is actually worse than that, because many teams take the path of ignoring buildifier warnings and committing their working code anyway. Without continual BUILD file reformatting there is a lot of stale floating around. [Root cause: We could fix this by promoting people for doing that repeat work. But we don't. We promote for the initial sprint.]

And then a janitor comes along.

I use bulldozer to fix a problem. It reformats an ancient BUILD file completely (not incremental). I send it to the code owner. They see changes far beyond my 2 line fix. They reject it, or ask for a change to only the two lines that actually mattered. Sure. I can hand build the change once or twice. But not for a few hundred, or thousands files. So.... I have to hack up an incremental format. Or, it turns out that users are very happy if I don't bother with formatting at all, and just change single lines. It's not that any individual is right or wrong. It is that they all have a choice and a preference and Google created a policy that allowed individual teams to have a choice of strict compliance or not. That is the failure.

If you are going to have a policy about code formatting: - make it hard mandatory for everything except a "break glass" situation - if the policy can evolve, staff a team with enforcing it globally

The fact that Google, as a company, does not reward this behavior does not take away from any individual's accomplishments. This post may sound grumpy to an outsider, but I am constantly amazed at the tools I have available to fix things on an enormous scale. The friction is usually only where we have good intentions, without the policy teeth to enforce alignment with the intentions. That's a management problem, not a technical one.

troad

The use of light grey text on a darker grey background strains my eyes and makes this unnecessarily unpleasant to read. I'd respectfully suggest increasing the contrast dramatically.

I keep a quick little scriptlet in my bookmark bar for cases like this:

  javascript:(function(){ $('head').append('<style>*{color:#101010 !important; background:#f0f0f0 !important;}</style>'); }());

(A ten second hack job; suggested improvements from front-end friends are welcome.)

ragall

The real lesson here should be that source code shouldn't be stored in a text format, but in a well-defined strict binary format that stores the parse tree directly, which completely eliminates the need for formatters.

flymasterv

I don’t understand why they had to format 100k files. You enforce the format with a presubmit and let the code get formatted in the next change.

I have long felt that Google’s strength has always been making a bad architectural choice and then executing on it flawlessly. So many systems are designed in ways that require incredible technical execution to make them workable, and they do it.