But you do not get good art by early stopping, nor by injecting noise, nor by regularization. We have better proxies than FID, but they all have major problems, and none comes close, even when combined.
We've gotten very good at AI art, but we've still got a long way to go. Everyone can take a photo, but not everyone is a photographer; producing a masterpiece takes great skill and expertise. Yet there are masters of the craft. Sure, AI might be better than you at art, but that doesn't mean it's close to a master, as unintuitive as that sounds. This is because skill isn't linear: the details start to dominate as you become an expert. A few things might be necessary to be good, but a million things need to be considered in mastery, because mastery is the art of subtlety. But in this article, it sounds like everything is a nail. We don't have the methods yet, and my fear is that we don't want to look (there are of course many pursuing this course, but it is very unpopular and not well received; "scale is all you need" is quite exciting, but lacks sufficient complexity, which even Sutton admits is necessary).
Invented the first generative diffusion model in 2015. https://arxiv.org/abs/1503.03585
The subtle difference between the two being exactly what the author describes: Goodhart's law states that metrics eventually stop working; Campbell's law states that, worse still, they eventually tend to backfire.
I'll also quibble with the example of obesity: the proxy isn't nutrient-rich food, but rather the evaluation function of human taste buds (e.g. sugar detection). The problem is the abundance of food that is very nutrient-poor but stimulating to taste buds. If the food that's widely available were nutrient-rich, it's questionable whether we would have an obesity epidemic.
All of the examples involve a bad proxy metric, or the flawed assumption that spending less improves the ratio of price to performance.
I don’t know if this phenomenon is aptly characterized as “too much efficiency”.
"Ahh, if only I hyperoptimize all aspects of my existence, then I will achieve inner peace. I just need to be more efficient with my time and goals. Just one more meditation. One more gratitude exercise. If only I could be consistent with my habits, then I would be happy."
I've come to see these things as a hindrance to true emotional processing, which is what I think many of us actually need. Or at least it's what I need - maybe I'm just projecting onto everyone else.
The author seems to be discussing optimizing for the wrong metric. That's not a problem of too much efficiency.
Excessive efficiency problems are different. They come from optimizing real output at the expense of robustness. Just-in-time systems have that flaw. Price/performance is great until there's some disruption, then it's terrible for a while.
Overfitting is another real problem, but again, a different one. Overfitting is when you try to model something with too complex a model and end up just encoding the original data in the model, which then has no predictive power.
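To make that concrete, here is a minimal sketch (the data and polynomial degrees are invented for illustration): a degree-9 polynomial has enough parameters to encode ten noisy samples of a linear trend exactly, but between the samples it has far less predictive power than a plain line fit to the same data.

```python
# Overfitting in miniature: ten samples of y = 2x plus a small
# alternating "noise" term. A degree-9 polynomial (10 coefficients)
# encodes the training data exactly; a degree-1 fit does not, but it
# generalizes far better to points between the samples.
import numpy as np

x_train = np.linspace(0.0, 1.0, 10)
noise = 0.1 * (-1.0) ** np.arange(10)   # deterministic stand-in for noise
y_train = 2.0 * x_train + noise

overfit = np.polynomial.Polynomial.fit(x_train, y_train, deg=9)
linear = np.polynomial.Polynomial.fit(x_train, y_train, deg=1)

# The complex model reproduces the training data almost exactly...
train_err = np.max(np.abs(overfit(x_train) - y_train))

# ...but on fresh points between the samples it oscillates wildly,
# while the simple line stays close to the true trend y = 2x.
x_new = np.linspace(0.05, 0.95, 200)
err_overfit = np.max(np.abs(overfit(x_new) - 2.0 * x_new))
err_linear = np.max(np.abs(linear(x_new) - 2.0 * x_new))
print(f"train error (degree 9): {train_err:.1e}")
print(f"new-point error: degree 9 = {err_overfit:.2f}, degree 1 = {err_linear:.2f}")
```

The degree-9 model wins on every measure you can compute from the training data alone, and loses everywhere else, which is the whole problem.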
Optimizing for the wrong metric, and what to do about it, is an important issue. This note calls out that problem but then goes off in another direction.
That said, the post is still valuable and would work much better with a framing closer to "some analogies between statistical analysis and public policy" -- the rest of the post (all the political recommendations) is honestly really solid, even if I don't see a lot of the particular examples' connections to their analogous ML approaches. The creativity is impressive, and overall I think it's a productive, thought-provoking exercise. Thanks for posting OP!
Now, for any fellow pedants, the philosophical critique:
> more efficient centralized tracking of student progress by standardized testing
The bad part of standardized testing isn't at all that it's "too efficient"; it's that it doesn't measure all the educational outcomes we desire. That's just regular ol' flawed metrics.

> This same counterintuitive relationship between efficiency and outcome occurs in machine learning, where it is called overfitting.
Again, overfitting isn't an example of a model being too efficacious, much less too efficient (which IMO is, in technical contexts, a measure of speed/resource consumption and not related to accuracy in the first place). Overfitting on your dataset just means that you built a (virtual/non-actual) model that doesn't express the underlying (virtual) pattern you're concerned with, but rather a subset of that pattern. That's not even a problem necessarily, if you know what subset you've expressed -- words like "under"/"too close" come into play when it's a random or otherwise meaningless subset.
> I'm not allowed to train my model on the test dataset though (that would be cheating), so I instead train the model on a proxy dataset, called the training dataset.
I'd say that both the training and test sets are actualized expressions of your targeted virtual pattern. 100% training accuracy means little if it breaks in online, real-world use.

> When a measure becomes a target, if it is effectively optimized, then the thing it is designed to measure will grow worse.
I'd take this as proof that what we're really talking about here is efficacy, not efficiency. This is cute and much better than the opening/title, but my critique above tells me that this is just a wordy rephrasing of "different things have differences". That certainly backs up their claim that the proposed law is universal, at least!
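As a concrete aside on the point that 100% training accuracy means little: here is a small made-up sketch (synthetic data, invented noise rate) of a 1-nearest-neighbor "memorizer" that saturates the measure (training accuracy) while the target (accuracy on fresh, clean data) stays stuck near the label-noise ceiling.

```python
# A memorizer perfectly optimizes the measure (training accuracy)
# without improving the target (accuracy on fresh data).
# All data here is synthetic and purely illustrative.
import random

random.seed(0)

def true_label(x):
    # The real pattern we care about: label 1 iff x > 0.5.
    return 1 if x > 0.5 else 0

# Training labels are corrupted 30% of the time; test labels are clean.
train = [(x := random.random(),
          true_label(x) if random.random() > 0.3 else 1 - true_label(x))
         for _ in range(200)]
test = [(x := random.random(), true_label(x)) for _ in range(200)]

def predict_1nn(x):
    # Memorize: answer with the label of the closest training point.
    return min(train, key=lambda p: abs(p[0] - x))[1]

train_acc = sum(predict_1nn(x) == y for x, y in train) / len(train)
test_acc = sum(predict_1nn(x) == y for x, y in test) / len(test)
print(f"training accuracy: {train_acc:.2f}")  # 1.00 -- measure saturated
print(f"held-out accuracy: {test_acc:.2f}")   # well below 1.00
```

Each training point is its own nearest neighbor, so training accuracy is exactly 100% by construction; held-out accuracy hovers around the 70% fraction of uncorrupted labels, because the memorizer faithfully reproduces the noise instead of the pattern.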
Among his notable accomplishments, he and coauthors mathematically characterized the propagation of signals through deep neural networks via techniques from physics and statistics (mean field theory and free probability theory), leading to arguably some of the most profound yet under-appreciated theoretical and experimental results in ML of the past decade. For example, see "dynamical isometry" [1] and the evolution of those ideas, which were instrumental in achieving convergence in very deep transformer models [2].
After reading this post and the examples given, in my eyes there is no question that this guy has an extraordinary intuition for optimization, spanning beyond the boundaries of ML and across the fabric of modern society.
We ought to recognize his technical background and raise this discussion above quibbles about semantics and definitions.
Let’s address the heart of his message, the very human and empathetic call to action that stands in the shadow of rapid technological progress:
> If you are a scientist looking for research ideas which are pro-social, and have the potential to create a whole new field, you should consider building formal (mathematical) bridges between results on overfitting in machine learning, and problems in economics, political science, management science, operations research, and elsewhere.
[1] Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
http://proceedings.mlr.press/v80/xiao18a/xiao18a.pdf
[2] ReZero is All You Need: Fast Convergence at Large Depth
https://arxiv.org/pdf/2003.04887