GPT-4 can answer questions given to it in Base64. I would imagine it suffers some degradation in ability from the extra decoding workload, but I haven't seen any measurements on this.
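For anyone who wants to try this themselves, here's a minimal sketch of preparing such a prompt (the question and the wrapper instruction are just placeholder examples):

    import base64

    # Encode a plain-English question as Base64 before pasting it into the chat.
    question = "What is the capital of France?"
    encoded = base64.b64encode(question.encode("utf-8")).decode("ascii")
    print(encoded)  # V2hhdCBpcyB0aGUgY2FwaXRhbCBvZiBGcmFuY2U/

    # Then send something like:
    # "The following is a Base64-encoded question. Decode it and answer it: <encoded>"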
I have wondered about other architectures that could help. What happens when a little subnet encodes the (16 or 32?) characters in the neighborhood of each token into an embedding that gets attached to the top-level token embedding?
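Something like this, say (a rough PyTorch sketch; the layer shapes and window size are guesses, not from any paper):

    import torch
    import torch.nn as nn

    class CharAugmentedEmbedding(nn.Module):
        """Token embedding with a small char-level subnet bolted on."""

        def __init__(self, vocab_size, n_chars=256, d_token=768,
                     d_char=32, d_char_out=64, window=16):
            super().__init__()
            self.token_emb = nn.Embedding(vocab_size, d_token)
            self.char_emb = nn.Embedding(n_chars, d_char)
            # Tiny subnet: pool the character window around each token
            # into a single vector.
            self.char_net = nn.Sequential(
                nn.Linear(window * d_char, d_char_out),
                nn.GELU(),
            )

        def forward(self, token_ids, char_ids):
            # token_ids: (batch, seq); char_ids: (batch, seq, window)
            tok = self.token_emb(token_ids)               # (B, S, d_token)
            ch = self.char_emb(char_ids)                  # (B, S, W, d_char)
            ch = self.char_net(ch.flatten(start_dim=-2))  # (B, S, d_char_out)
            # Attach the character summary to the token embedding.
            return torch.cat([tok, ch], dim=-1)           # (B, S, d_token + d_char_out)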
Part of what makes AI interesting is that it can understand data phrased in a huge number of different ways. It seems like different token encodings would add only minor complexity compared to the variety of human language.
It doesn't really explain anything beyond discussing tokenization at seemingly random levels.
You need a certain amount of data to even recognize that "once upon a time" might be a higher-level concept.
I think the future is a small word-encoder model that replaces the token-embedding codebook.
And here's the reason: you can still build a codebook after training and then use the encoder model only for out-of-vocabulary (OOV) words. I'm not sure there's an excuse not to be doing this, but I'm open to suggestions.
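Concretely, something like this (a hypothetical PyTorch sketch; the encoder architecture and sizes are made up):

    import torch
    import torch.nn as nn

    class WordEncoderEmbedding(nn.Module):
        """Char-level word encoder in place of a learned embedding table."""

        def __init__(self, n_chars=256, d_char=32, d_model=768, max_len=24):
            super().__init__()
            self.max_len = max_len
            self.char_emb = nn.Embedding(n_chars, d_char)
            self.encoder = nn.GRU(d_char, d_model, batch_first=True)
            self.codebook = {}  # word -> cached embedding, built after training

        def encode_word(self, word):
            ids = torch.tensor([[min(ord(c), 255) for c in word[: self.max_len]]])
            _, h = self.encoder(self.char_emb(ids))  # h: (1, 1, d_model)
            return h[0, 0]

        def build_codebook(self, vocab):
            # One-time pass after training: cache embeddings for known words,
            # so inference is a plain table lookup.
            with torch.no_grad():
                self.codebook = {w: self.encode_word(w) for w in vocab}

        def embed(self, word):
            cached = self.codebook.get(word)
            # Known words hit the cache; only OOV words run the encoder.
            return cached if cached is not None else self.encode_word(word)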
TechCrunch, I respect that you have a style guide vis-à-vis punctuation and quotation marks, but please understand when it's appropriate to break the rules. :P