On June 8, Xiaomi's MiMo team, working with inference engine vendor TileRT, officially launched MiMo-V2.5-Pro-UltraSpeed mode. For the first time, a 1-trillion-parameter large model reaches an inference speed of 1000 tokens/s on a single standard 8-GPU general-purpose server, with a peak of 1200 tokens/s. That is 15 times faster than GPT-5.5 and 14 times faster than Claude Opus. In one example, a complex visualization dashboard generation task dropped from 6 minutes 15 seconds to 13 seconds, a speedup of about 28 times and the largest cited. The gains come from FP4 quantization, DFlash speculative decoding, and TileRT custom kernel optimization, with no custom chips required. The mode is currently available by application only during a limited access period (June 9-23), priced at 3 times the standard version while delivering roughly 10 times the speed.
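
As a quick sanity check on the figures above, the following minimal Python sketch recomputes the ratios. The function names are illustrative only. The implied baseline speeds assume that the 15x and 14x comparisons are measured against the 1000 tokens/s figure rather than the 1200 tokens/s peak, which the announcement does not specify.

```python
def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """Return how many times faster the accelerated run is than the baseline."""
    return baseline_seconds / accelerated_seconds


def implied_baseline_tps(fast_tps: float, ratio: float) -> float:
    """Return the baseline tokens/s implied by a stated 'N times faster' ratio."""
    return fast_tps / ratio


# Dashboard generation task: 6 minutes 15 seconds down to 13 seconds.
baseline_s = 6 * 60 + 15          # 375 seconds
ultraspeed_s = 13
dashboard_speedup = speedup(baseline_s, ultraspeed_s)

# Assumption: the comparisons use the 1000 tokens/s headline figure.
headline_tps = 1000
gpt_implied = implied_baseline_tps(headline_tps, 15)
claude_implied = implied_baseline_tps(headline_tps, 14)

print(f"Dashboard speedup: {dashboard_speedup:.1f}x")        # 28.8x
print(f"Implied GPT-5.5 speed: {gpt_implied:.1f} tokens/s")  # 66.7 tokens/s
print(f"Implied Claude Opus speed: {claude_implied:.1f} tokens/s")  # 71.4 tokens/s
```

The dashboard ratio works out to about 28.8, consistent with the stated speedup of roughly 28 times.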
#Xiaomi #MiMo #LargeModelInference #AISpeed #TileRT
