EDIT: A question about FSST. Let's say I build a strings table like this:
  struct Strings {
      compressor: fsst::Compressor,
      compressed: Vec<Vec<u8>>,
  }
Is there an optimal number of entries in `compressed` (i.e., strings sharing a single compressor), given the 255-symbol limit?

[1] https://github.com/spiraldb/vortex
[2] https://github.com/apache/datafusion
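For context on where the 255 limit comes from: in the FSST paper every output code is one byte, so a symbol table holds at most 255 symbols (each 1 to 8 bytes), and code 255 is reserved as an escape meaning "the next byte is a literal". Every string in `compressed` is encoded against that one shared table. Below is a minimal sketch of greedy encoding and decoding, assuming a naive linear-scan lookup (the real implementation uses hashing and is far faster). `SymbolTable`, `encode` and `decode` are hypothetical names for illustration, not the `fsst` crate's API.

```rust
/// Code 255 is reserved as the escape marker, so at most 255 symbols fit.
const ESCAPE: u8 = 255;
const MAX_SYMBOLS: usize = 255;
const MAX_SYMBOL_LEN: usize = 8;

/// Hypothetical symbol table: symbol i is emitted as the single byte code i.
struct SymbolTable {
    symbols: Vec<Vec<u8>>,
}

impl SymbolTable {
    fn new(symbols: Vec<Vec<u8>>) -> Self {
        assert!(symbols.len() <= MAX_SYMBOLS, "at most 255 symbols");
        assert!(symbols.iter().all(|s| !s.is_empty() && s.len() <= MAX_SYMBOL_LEN));
        SymbolTable { symbols }
    }

    /// Greedy longest match: code and length of the longest symbol prefixing `input`.
    fn longest_match(&self, input: &[u8]) -> Option<(u8, usize)> {
        self.symbols
            .iter()
            .enumerate()
            .filter(|(_, s)| input.starts_with(s.as_slice()))
            .max_by_key(|(_, s)| s.len())
            .map(|(i, s)| (i as u8, s.len()))
    }

    fn encode(&self, input: &[u8]) -> Vec<u8> {
        let mut out = Vec::new();
        let mut pos = 0;
        while pos < input.len() {
            match self.longest_match(&input[pos..]) {
                Some((code, len)) => {
                    out.push(code);
                    pos += len;
                }
                None => {
                    // No symbol matches: escape and copy the raw byte (2 bytes out for 1 in).
                    out.push(ESCAPE);
                    out.push(input[pos]);
                    pos += 1;
                }
            }
        }
        out
    }

    fn decode(&self, codes: &[u8]) -> Vec<u8> {
        let mut out = Vec::new();
        let mut i = 0;
        while i < codes.len() {
            if codes[i] == ESCAPE {
                out.push(codes[i + 1]);
                i += 2;
            } else {
                out.extend_from_slice(&self.symbols[codes[i] as usize]);
                i += 1;
            }
        }
        out
    }
}

fn main() {
    let table = SymbolTable::new(vec![
        b"http".to_vec(),
        b"://".to_vec(),
        b"www.".to_vec(),
        b".com".to_vec(),
    ]);
    let input = b"http://www.example.com";
    let encoded = table.encode(input);
    assert_eq!(table.decode(&encoded), input.to_vec());
    println!("{} bytes -> {} bytes", input.len(), encoded.len());
}
```

Escapes cost two bytes per input byte, so how well one table fits all the strings in `compressed` is what drives the ratio.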
It’s quite a neat algorithm. I saw compression ratios in the 2-3x range. However, as I remember it, the algorithm for building the dictionary (the symbol table) was somewhat unclear, and I wasn’t convinced that the procedure described in the paper finds the “optimal” dictionary: with some slight tweaks I got wildly different results. I wonder whether this implementation improves on that.
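For reference, the construction the paper describes is, roughly, an iterative bottom-up heuristic: run several generations (the paper uses 5), greedily tokenize the sample with the current table, count each token and each concatenation of adjacent tokens (up to 8 bytes), score candidates by gain = count × length, and keep the top 255. Below is a simplified sketch under those assumptions, with no sampling and a naive lookup; `tokenize` and `train` are hypothetical names. Because each generation commits greedily to the top-scoring candidates, nothing guarantees an optimal table, which fits the sensitivity to small tweaks described above.

```rust
use std::collections::HashMap;

const MAX_SYMBOLS: usize = 255;
const MAX_SYMBOL_LEN: usize = 8;
const GENERATIONS: usize = 5;

/// Greedily split `input` into symbols from `table`; bytes with no match
/// come out as 1-byte pseudo-symbols (what FSST would escape).
fn tokenize(table: &[Vec<u8>], input: &[u8]) -> Vec<Vec<u8>> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < input.len() {
        let best = table
            .iter()
            .filter(|s| input[pos..].starts_with(s.as_slice()))
            .max_by_key(|s| s.len());
        let sym = match best {
            Some(s) => s.clone(),
            None => vec![input[pos]],
        };
        pos += sym.len();
        out.push(sym);
    }
    out
}

/// Simplified sketch of FSST-style bottom-up table construction.
fn train(sample: &[&[u8]]) -> Vec<Vec<u8>> {
    let mut table: Vec<Vec<u8>> = Vec::new();
    for _ in 0..GENERATIONS {
        let mut counts: HashMap<Vec<u8>, usize> = HashMap::new();
        for s in sample {
            let tokens = tokenize(&table, s);
            for (i, tok) in tokens.iter().enumerate() {
                *counts.entry(tok.clone()).or_insert(0) += 1;
                // Candidate: this token concatenated with the next, if short enough.
                if let Some(next) = tokens.get(i + 1) {
                    if tok.len() + next.len() <= MAX_SYMBOL_LEN {
                        let mut merged = tok.clone();
                        merged.extend_from_slice(next);
                        *counts.entry(merged).or_insert(0) += 1;
                    }
                }
            }
        }
        // Gain = bytes covered = count * length.
        let mut candidates: Vec<(Vec<u8>, usize)> = counts
            .into_iter()
            .map(|(sym, c)| {
                let gain = c * sym.len();
                (sym, gain)
            })
            .collect();
        // Highest gain first; break ties on the bytes so the result is deterministic.
        candidates.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
        candidates.truncate(MAX_SYMBOLS);
        table = candidates.into_iter().map(|(sym, _)| sym).collect();
    }
    table
}

fn main() {
    let sample: [&[u8]; 3] = [b"abcabcabc", b"abcabc", b"xyzabc"];
    let table = train(&sample);
    for sym in &table {
        println!("{:?}", String::from_utf8_lossy(sym));
    }
}
```

Changing the number of generations, the gain formula, or the tie-breaking in this sketch can change which symbols survive the cut to 255.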