Have you looked into these two?
- Trustworthy Online Controlled Experiments by Kohavi, Tang, and Xu
- Statistical Methods in Online A/B Testing by Georgi Georgiev
Recommended by stats stackexchange (https://stats.stackexchange.com/questions/546617/how-can-i-l...)
There's a bunch of other books/courses/videos on O'Reilly.
Another potential way to approach this learning goal is to look at Evan Miller's tools (https://www.evanmiller.org/ab-testing/), go into each one, and then look at the JS code that runs each tool online.
See if you can go through and comment/write out your thoughts on why it's written that way. Of course, you'll have to know some JS for that, but it might be helpful to go through a file like https://www.evanmiller.org/ab-testing/sample-size.js and figure out what math is being done.
The arithmetic is simple and cheap. Understanding basic intro stats principles, priceless.
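For reference, the math in that sample-size calculator boils down to the standard two-proportion sample-size formula. Here's a rough Python sketch of it; I haven't checked it line by line against sample-size.js, so treat the function name and defaults as mine, not Evan's:

```python
# Standard two-proportion sample-size formula (normal approximation).
# A sketch of the underlying math, not necessarily byte-for-byte what
# sample-size.js computes.
from math import sqrt
from scipy.stats import norm

def sample_size_per_group(p_base, mde, alpha=0.05, power=0.8):
    """Visitors needed per variant to detect an absolute lift of `mde`
    over a baseline conversion rate `p_base`, two-sided test."""
    p_alt = p_base + mde
    z_alpha = norm.ppf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_beta = norm.ppf(power)            # ~0.84 for 80% power
    top = (z_alpha * sqrt(2 * p_base * (1 - p_base))
           + z_beta * sqrt(p_base * (1 - p_base) + p_alt * (1 - p_alt))) ** 2
    return top / mde ** 2

# 5% baseline, looking for a 1-point absolute lift: roughly 7,700 per group
print(round(sample_size_per_group(0.05, 0.01)))
```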
This book is really great and I highly recommend it; it goes broader than A/B testing, but covers everything quite well from a first-principles perspective.
https://bytepawn.com/five-ways-to-reduce-variance-in-ab-test...
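If you want to try one of these yourself, CUPED-style covariate adjustment (using each user's pre-experiment metric) is the variance-reduction trick I'd reach for first. I won't swear it's one of the five in the post, but the core of it is a one-line adjustment; the data below is simulated just to show the variance drop:

```python
# Minimal CUPED sketch: subtract theta * (pre-experiment metric) from each
# user's in-experiment metric. This shrinks variance without biasing the
# A/B delta, because the adjustment has the same expectation in both groups.
import numpy as np

def cuped_adjust(y, x):
    """y: in-experiment metric per user, x: the same metric from before the test."""
    theta = np.cov(y, x)[0, 1] / np.var(x, ddof=1)
    return y - theta * (x - x.mean())

rng = np.random.default_rng(0)
x = rng.normal(10, 3, 10_000)                  # pre-experiment spend per user
y = 0.8 * x + rng.normal(0, 1, 10_000) + 0.1   # correlated in-experiment spend
print(np.var(y), np.var(cuped_adjust(y, x)))   # adjusted variance is far smaller
```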
I was lucky to get trained well by 100m+ users over the years. If you have a problem you are trying to solve, I’m happy to go over my approach to repeatedly designing winning optimizations.
Alex, I will shoot you an email shortly. Also, sebg’s comment is good if you are looking for the more academic route to learning.
An interactive look at Thompson sampling
Other than that, Evan's stuff is great, and the Ron Kohavi book gets a +1, though it is definitely dense.
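If you want to watch Thompson sampling converge, the Bernoulli/Beta version fits in a few lines. The "true" rates below are made up purely so the simulation has something to find:

```python
# Tiny Thompson sampling sketch for two variants. Each round: sample a
# conversion rate from every arm's Beta posterior, show the arm whose
# sample is highest, then update that arm's win/loss counts.
import random

true_rates = {"A": 0.05, "B": 0.06}   # unknown in real life; invented here
wins = {"A": 0, "B": 0}               # conversions observed per arm
losses = {"A": 0, "B": 0}             # non-conversions observed per arm

for _ in range(100_000):
    draws = {arm: random.betavariate(wins[arm] + 1, losses[arm] + 1)
             for arm in true_rates}
    arm = max(draws, key=draws.get)   # play the arm with the best posterior draw
    if random.random() < true_rates[arm]:
        wins[arm] += 1
    else:
        losses[arm] += 1

# Most of the traffic should have drifted to B, the (secretly) better arm.
print(wins, losses)
```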
Only at test completion were financial projections attributed to test results. Don’t sugar coat it. Let people know up front just how damaging their wonderful business ideas are.
The biggest learning from this is that the financial projections from the tests were always far too optimistic compared to what later happened in production. The tests were always correct; the cause of the discrepancies was shitty development. If a new initiative shipped to production is defective or slow, it will not perform as well as the tests projected. Web development is full of shitty developers who cannot program for the web, and our tests were generally ideal in their execution.
A/B tests are just a narrow special case of these.
The harsh reality is A/B testing is only an optimization technique. It’s not going to fix fundamental problems with your product or app. In nearly everything I’ve done, it’s been a far better investment to focus on delivering more features and more value. It’s much easier to build a new feature that moves the needle by 1% than it is to polish a turd for 0.5% improvement.
That being said, there are massive exceptions to this. When you’re at scale, fractions of percents can mean multiple millions of dollars of improvements.
This is not a foolproof method; I'd call it only ±5 dB of evidence, so it would shift a 50% prior that they know what they're talking about to something like 75% if present or 25% if absent. But obviously look at the rest of it and see whether that's borne out. And to be clear: even mentioning it, if only to dismiss it, counts!
So e.g. I remember reading a whitepaper about “A/B Tests are Leading You Astray” and thinking “hey, that's a fun idea, yeah, effect size is too often accidentally conditioned on whether the result was judged statistically significant, which would be a source of bias” ...and sure enough a sentence came up, just innocently, like, “you might even have a bandit algorithm! But you had to use your judgment to discern that that was appropriate in context.” And it’s like “OK, you know about bandits but you are explicitly interested in human discernment and human decision making, great.” So, +5 dB to you.
And on the flip side, if it makes reference to A/B testing but it's decently long and never mentions bandits, then there's only maybe a 25% chance they know what they are talking about. It can still happen: you might see e.g. χ² instead of the t-test [because usually you don't have just “converted” vs “did not convert”... can your analytics grab “thought about it for more than 10s but did not convert” etc.?] or something else that piques your interest. Or it's a very short article where it just didn't come up, but that's fine, because when reading we are performing a secret cost-benefit analysis, and short articles have very low cost.
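To make the χ² point concrete, here's a rough sketch with invented counts, where "hesitated" stands in for that "thought about it for more than 10s but did not convert" bucket:

```python
# Chi-squared test on a richer-than-binary outcome: rows are variants,
# columns are outcome buckets. All counts are made up for illustration.
from scipy.stats import chi2_contingency

table = [
    # converted, hesitated >10s, bounced
    [120, 340, 9540],   # variant A
    [150, 310, 9540],   # variant B
]
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"chi2={chi2:.2f}, p={p_value:.3f}, dof={dof}")
```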
For a non-technical thing you can give to your coworkers, consider https://medium.com/jonathans-musings/ab-testing-101-5576de64...
Researching this comment led me to this video, which looks interesting and which I’ll need to watch later, about how you have to pin down the time needed to properly make the choices in A/B testing: https://youtu.be/Fs8mTrkNpfM?si=ghsOgDEpp43yRmd8
Some more academic-looking discussions of bandit algorithms that I can't vouch for personally, but which would be my first stops:
- https://courses.cs.washington.edu/courses/cse599i/21wi/resou...
- https://tor-lattimore.com/downloads/book/book.pdf
- http://proceedings.mlr.press/v35/kaufmann14.pdf
What gets people are incorrect procedures. To get a sense of all the ways in which an experiment can go wrong, I'd recommend reading more traditional texts on experimental design, survey research, etc.
- Donald Wheeler's Understanding Variation should be mandatory reading for almost everyone working professionally.
- Deming's Some Theory of Sampling is really good and covers more ground than the title lets on.
- Deming's Sample Design in Business Research I remember being formative for me as well, although it has been a while since I read it.
- Efron and Tibshirani's Introduction to the Bootstrap gives an intuitive sense of some experimental errors from a different perspective.
I know there's one book covering survey design I really liked but I forget which one it was. Sorry!
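On the Efron and Tibshirani recommendation: the core bootstrap move is small enough to try on your own metric before you even open the book. A rough sketch with made-up revenue data:

```python
# Percentile-bootstrap sketch: resample users with replacement and see how
# much the mean of a skewed metric wobbles. The data here is simulated.
import numpy as np

rng = np.random.default_rng(42)
revenue_per_user = rng.exponential(scale=3.0, size=2_000)   # skewed, like real revenue

boot_means = [rng.choice(revenue_per_user, size=revenue_per_user.size).mean()
              for _ in range(5_000)]
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean={revenue_per_user.mean():.2f}, 95% bootstrap CI=({lo:.2f}, {hi:.2f})")
```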