The problem is: it seems their costs are 1x to 2x what they're charging.
There doesn't seem to be much flux in the low-level architectures used for inference at this point, so you may as well commit to an ASIC, as is already happening with Apple, Qualcomm, etc. building NPUs into their SoCs.
But when things plateau, this, followed by ASICs, would probably be the most efficient way forward for "stable" versions of AI models during inference.
Not that I was expecting GPU-like efficiency from a fairly small-scale FPGA project. Nvidia engineers have spent thousands of man-years making sure that stuff works well on GPUs.