dzaima
uops.info's measurements show 'inc r64', interleaved with 'movsxd' instructions, still having zero latency [0], so it can't just be merging the immediates of successive increments (unless some additional fusion is happening). Plain unrolled 'inc r64' shows an average latency of 0.2 cycles, i.e. 5 dependent ops per cycle, and uses only 0.2 ports per instruction [1].

The same holds for 'lea r64, [r64+8]' (disp8), 'lea r64, [r64+128]' (disp32) and 'add r64, 2' (imm8), but not for 'add r64, 0x1000000' (imm32).
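For a quick check of the 8-/32-bit field sizes listed above, here's a small Python sketch. It assumes the assembler picks the shortest encoding (common assemblers do, but that's an assumption about the test harness, not something uops.info states), and the helper name `min_signed_width` is just illustrative. x86-64 sign-extends both 8-bit and 32-bit immediates/displacements, so an 8-bit field only covers -128..127. That's why 128 already needs a 32-bit field, and why the 32-bit 'lea' case and the 32-bit 'add' case land on opposite sides of the measurements.

```python
def min_signed_width(value: int) -> int:
    """Smallest sign-extended x86 immediate/displacement field (8 or 32 bits) holding `value`.

    Assumption: the assembler always chooses the shortest encoding.
    """
    if -128 <= value <= 127:           # fits a sign-extended 8-bit field
        return 8
    if -2**31 <= value <= 2**31 - 1:   # fits a sign-extended 32-bit field
        return 32
    raise ValueError(f"{value:#x} does not fit in a sign-extended 32-bit field")


# Constants taken from the instructions discussed above.
cases = {
    "lea r64, [r64+8]": 8,
    "lea r64, [r64+128]": 128,
    "add r64, 2": 2,
    "add r64, 0x1000000": 0x1000000,
}

for insn, const in cases.items():
    print(f"{insn:<22} {min_signed_width(const):>2}-bit field")
```

Going only by encoding width, the eliminated 'lea r64, [r64+128]' and the non-eliminated 'add r64, 0x1000000' both use 32-bit fields, so the field size alone doesn't seem to be what decides it.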

[0]: https://uops.info/html-lat/ADL-P/INC_R64-Measurements.html

[1]: https://uops.info/html-tp/ADL-P/INC_R64-Measurements.html