TechFlow News: On March 17, Paolo Ardoino, CEO of Tether, announced that the Tether AI team has released a new version of QVAC Fabric, integrating the cross-platform BitNet LoRA framework to enable training and inference of billion-parameter large language models (LLMs) on consumer-grade GPUs and smartphones.
The new QVAC Fabric LLM is the first implementation of BitNet LoRA fine-tuning and inference across AMD, Intel, Apple Metal, and mobile GPUs. On flagship devices, GPU inference runs 2x to 11x faster than CPU inference, while memory usage drops by up to 90% versus full-precision models. The Tether team has fine-tuned models of up to 3.8 billion parameters on flagship smartphones, including the Pixel 9, Galaxy S25, and iPhone 16, and has pushed fine-tuning as far as 13-billion-parameter models on the iPhone 16. The related code has been open-sourced on GitHub.
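As a back-of-the-envelope illustration (not Tether's actual implementation), the "up to 90%" memory figure is consistent with BitNet-style ternary weights, which take roughly log2(3) ≈ 1.58 bits each, compared with 16 bits per weight in an FP16 model. The sketch below assumes weight storage dominates and ignores packing overheads, activation memory, and layers (such as embeddings) that are often kept at higher precision:

```python
import math

def model_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-storage size in GB (decimal) for a model."""
    return n_params * bits_per_weight / 8 / 1e9

# FP16 vs. BitNet-style ternary weights ({-1, 0, +1} -> log2(3) bits).
n = 3.8e9  # a 3.8B-parameter model, as in the article
fp16_gb = model_memory_gb(n, 16)
ternary_gb = model_memory_gb(n, math.log2(3))
reduction = 1 - ternary_gb / fp16_gb

print(f"FP16 weights:    {fp16_gb:.1f} GB")
print(f"Ternary weights: {ternary_gb:.2f} GB")
print(f"Reduction:       {reduction:.0%}")
```

Under these simplifying assumptions, a 3.8B-parameter model shrinks from about 7.6 GB of FP16 weights to under 1 GB, a roughly 90% reduction, which is what makes on-device fine-tuning and inference on smartphones plausible.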




