But you do not get good art by early stopping, nor by injecting noise, nor by regularization. We have better proxies than FID, but they all have major problems, and none comes close, even when combined.
We've gotten very good at AI art, but we've still got a long way to go. Everyone can take a photo, but not everyone is a photographer; producing a masterpiece takes great skill and expertise. Yet there are masters of the craft. Sure, AI might be better than you at art, but that doesn't mean it's close to a master, as unintuitive as that sounds. This is because skill isn't linear: the details start to dominate as you become an expert. A few things might be necessary to be good, but a million things need to be considered in mastery, because mastery is the art of subtlety. But in this article, it sounds like everything is a nail. We don't have the methods yet, and my fear is that we don't want to look (there are of course many pursuing this course, but it is very unpopular and not well received; "scale is all you need" is quite exciting, but lacks sufficient complexity, which even Sutton admits is necessary).
Invented the first generative diffusion model in 2015. https://arxiv.org/abs/1503.03585
The subtle difference between the two being exactly what the author describes: Goodhart's law states that metrics eventually stop working; Campbell's law states that, worse still, they eventually tend to backfire.
I'll also quibble with the example of obesity: the proxy isn't nutrient-rich food, but rather the evaluation function of human taste buds (e.g. sugar detection). The problem is the abundance of food that is very nutrient-poor but stimulating to taste buds. If the food that's widely available were nutrient-rich, it's questionable whether we would have an obesity epidemic.
All of the examples involve a bad proxy metric, or the flawed assumption that spending less improves the ratio of price to performance.
I don’t know if this phenomenon is aptly characterized as “too much efficiency”.
"Ahh, if only I hyperoptimize all aspects of my existence, then I will achieve inner peace. I just need to be more efficient with my time and goals. Just one more meditation. One more gratitude exercise. If only I could be consistent with my habits, then I would be happy."
I've come to see these things as a hindrance to true emotional processing, which is what I think many of us actually need. Or at least it's what I need - maybe I'm just projecting onto everyone else.
The author seems to be discussing optimizing for the wrong metric. That's not a problem of too much efficiency.
Excessive efficiency problems are different. They come from optimizing real output at the expense of robustness. Just-in-time systems have that flaw. Price/performance is great until there's some disruption, then it's terrible for a while.
Overfitting is another real problem, but again, a different one. Overfitting is when you try to model something with too complex a model and end up just encoding the original data in the model, which then has no predictive power.
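To make that concrete, here is a minimal sketch (the data and polynomial degrees are invented for illustration): a degree-9 polynomial has enough parameters to encode ten noisy samples of a linear trend exactly, but between the samples it has far less predictive power than a plain line fit to the same data.

```python
# Overfitting in miniature: ten samples of y = 2x plus a small
# alternating "noise" term. A degree-9 polynomial (10 coefficients)
# encodes the training data exactly; a degree-1 fit does not, but it
# generalizes far better to points between the samples.
import numpy as np

x_train = np.linspace(0.0, 1.0, 10)
noise = 0.1 * (-1.0) ** np.arange(10)   # deterministic stand-in for noise
y_train = 2.0 * x_train + noise

overfit = np.polynomial.Polynomial.fit(x_train, y_train, deg=9)
linear = np.polynomial.Polynomial.fit(x_train, y_train, deg=1)

# The complex model reproduces the training data almost exactly...
train_err = np.max(np.abs(overfit(x_train) - y_train))

# ...but on fresh points between the samples it oscillates wildly,
# while the simple line stays close to the true trend y = 2x.
x_new = np.linspace(0.05, 0.95, 200)
err_overfit = np.max(np.abs(overfit(x_new) - 2.0 * x_new))
err_linear = np.max(np.abs(linear(x_new) - 2.0 * x_new))
print(f"train error (degree 9): {train_err:.1e}")
print(f"new-point error: degree 9 = {err_overfit:.2f}, degree 1 = {err_linear:.2f}")
```

The degree-9 model wins on every measure you can compute from the training data alone, and loses everywhere else, which is the whole problem.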
Optimizing for the wrong metric, and what to do about it, is an important issue. This note calls out that problem but then goes off in another direction.
That said, the post is still valuable and would work much better with a framing closer to "some analogies between statistical analysis and public policy" -- the rest of the post (all the political recommendations) is honestly really solid, even if I don't see a lot of the particular examples' connections to their analogous ML approaches. The creativity is impressive, and overall I think it's a productive, thought-provoking exercise. Thanks for posting OP!
Now, for any fellow pedants, the philosophical critique:
> more efficient centralized tracking of student progress by standardized testing
The bad part of standardized testing isn't at all that it's "too efficient"; it's that it doesn't measure all the educational outcomes we desire. That's just regular ol' flawed metrics.

> This same counterintuitive relationship between efficiency and outcome occurs in machine learning, where it is called overfitting.
Again, overfitting isn't an example of a model being too efficacious, much less too efficient (which IMO is, in technical contexts, a measure of speed/resource consumption and not related to accuracy in the first place). Overfitting on your dataset just means that you built a (virtual/non-actual) model that doesn't express the underlying (virtual) pattern you're concerned with, but rather a subset of that pattern. That's not even a problem necessarily, if you know what subset you've expressed -- words like "under"/"too close" come into play when it's a random or otherwise meaningless subset.
> I'm not allowed to train my model on the test dataset though (that would be cheating), so I instead train the model on a proxy dataset, called the training dataset.
I'd say that both the training and test sets are actualized expressions of your targeted virtual pattern. 100% training accuracy means little if it breaks in online, real-world use.

> When a measure becomes a target, if it is effectively optimized, then the thing it is designed to measure will grow worse.
I'd take this as proof that what we're really talking about here is efficacy, not efficiency. This is cute and much better than the opening/title, but my critique above tells me that this is just a wordy rephrasing of "different things have differences". That certainly backs up their claim that the proposed law is universal, at least!
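As a concrete aside on the point that 100% training accuracy means little: here is a small made-up sketch (synthetic data, invented noise rate) of a 1-nearest-neighbor "memorizer" that saturates the measure (training accuracy) while the target (accuracy on fresh, clean data) stays stuck near the label-noise ceiling.

```python
# A memorizer perfectly optimizes the measure (training accuracy)
# without improving the target (accuracy on fresh data).
# All data here is synthetic and purely illustrative.
import random

random.seed(0)

def true_label(x):
    # The real pattern we care about: label 1 iff x > 0.5.
    return 1 if x > 0.5 else 0

# Training labels are corrupted 30% of the time; test labels are clean.
train = [(x := random.random(),
          true_label(x) if random.random() > 0.3 else 1 - true_label(x))
         for _ in range(200)]
test = [(x := random.random(), true_label(x)) for _ in range(200)]

def predict_1nn(x):
    # Memorize: answer with the label of the closest training point.
    return min(train, key=lambda p: abs(p[0] - x))[1]

train_acc = sum(predict_1nn(x) == y for x, y in train) / len(train)
test_acc = sum(predict_1nn(x) == y for x, y in test) / len(test)
print(f"training accuracy: {train_acc:.2f}")  # 1.00 -- measure saturated
print(f"held-out accuracy: {test_acc:.2f}")   # well below 1.00
```

Each training point is its own nearest neighbor, so training accuracy is exactly 100% by construction; held-out accuracy hovers around the 70% fraction of uncorrupted labels, because the memorizer faithfully reproduces the noise instead of the pattern.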
Among his notable accomplishments, he and coauthors mathematically characterized the propagation of signals through deep neural networks via techniques from physics and statistics (mean field theory and free probability theory), leading to arguably some of the most profound yet under-appreciated theoretical and experimental results in ML of the past decade. For example, see "dynamical isometry" [1] and the evolution of those ideas, which were instrumental in achieving convergence in very deep transformer models [2].
After reading this post and the examples given, in my eyes there is no question that this guy has an extraordinary intuition for optimization, spanning beyond the boundaries of ML and across the fabric of modern society.
We ought to recognize his technical background and raise this discussion above quibbles about semantics and definitions.
Let’s address the heart of his message, the very human and empathetic call to action that stands in the shadow of rapid technological progress:
> If you are a scientist looking for research ideas which are pro-social, and have the potential to create a whole new field, you should consider building formal (mathematical) bridges between results on overfitting in machine learning, and problems in economics, political science, management science, operations research, and elsewhere.
[1] Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
http://proceedings.mlr.press/v80/xiao18a/xiao18a.pdf
[2] ReZero is All You Need: Fast Convergence at Large Depth
https://arxiv.org/pdf/2003.04887