If binarization saves 32x memory, why not project to a larger dimension first, say 4096? Could that actually improve performance while still reducing memory?
Have people found a practical number of neighbors to retrieve that makes binary retrieval plus float32 reranking efficient?
Does this mean that a discrete representation is enough to capture high-level semantic information?
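For anyone trying to picture the two-stage setup these questions refer to, here's a minimal sketch. It uses random toy vectors in place of real model embeddings, and the shapes (1000 docs, 256 dims) and the `binarize` helper are illustrative assumptions, not anyone's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy float32 embeddings; in practice these come from an embedding model.
docs = rng.standard_normal((1000, 256)).astype(np.float32)
query = rng.standard_normal(256).astype(np.float32)

def binarize(x):
    """Threshold at 0 and pack 8 bits per byte: 256 float32s -> 32 bytes (32x smaller)."""
    return np.packbits(x > 0, axis=-1)

doc_bits = binarize(docs)      # shape (1000, 32), dtype uint8
query_bits = binarize(query)   # shape (32,)

# Hamming distance = number of differing bits, computed via XOR then a bit count.
xor = np.bitwise_xor(doc_bits, query_bits)
hamming = np.unpackbits(xor, axis=-1).sum(axis=-1)

# Stage 1: cheap binary retrieval of the top-k candidates by Hamming distance.
k = 50
candidates = np.argsort(hamming)[:k]

# Stage 2: rerank only those k candidates with full float32 dot products.
scores = docs[candidates] @ query
reranked = candidates[np.argsort(-scores)]
```

The "practical number of neighbors" question is about choosing `k`: large enough that the binary stage rarely drops the true best matches, small enough that the float32 rerank stays cheap.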
I went through the post and I have absolutely no clue what this person is talking about. But I want to be in a place where I can understand what the person is saying.

How can I reach that point? I was lost at "quantized," could follow the bit packing, and was even more lost when the author started talking about things like Hamming distance.

Please help me out. I want to grow my career in this direction.

This article screams LLM-generated...