Petals: Unlocking the Power of 100B+ Language Models with On-Device Offloading

Meet Petals, an open-source artificial intelligence system that can run 100B+ language models at home, BitTorrent-style.

Recent work in the NLP community has shown that pretrained language models can perform real-world tasks with only minor adjustments or assistance. Performance usually improves as model size increases, and the trend toward models with hundreds of billions of parameters continues. Several research groups have published LLMs with more than 100B parameters. The BigScience project, for example, released BLOOM, a 176-billion-parameter model that supports 46 natural languages and 13 programming languages. Although 100B+ parameter models are now publicly available, many academics and practitioners still find them difficult to use because of their memory and computation costs. OPT-175B and BLOOM-176B each require more than 350GB of accelerator memory for inference, and even more for fine-tuning.
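The 350GB figure follows from simple arithmetic. The sketch below assumes the weights are stored in 16-bit precision (2 bytes per parameter), which is an assumption on our part, and the function name is purely illustrative. It counts weights only; activations, attention caches, and the gradients and optimizer state needed for fine-tuning would come on top.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed for the model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Parameter counts taken from the model names; 2 bytes/param is assumed.
for name, params in [("OPT-175B", 175e9), ("BLOOM-176B", 176e9)]:
    print(f"{name}: about {weight_memory_gb(params):.0f} GB for weights")
```

Passing bytes_per_param=1 models 8-bit storage and halves the estimate, which still exceeds the memory of any single consumer GPU.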

Running these LLMs requires several high-end GPUs or a multi-node cluster. Both options are expensive, which limits the research topics and applications open to language models. Recent efforts have sought to democratize LLMs by "offloading" model parameters to slower but more affordable memory, such as RAM or SSD, and then executing the model layer by layer on the accelerator. This technique lets a low-end accelerator run an LLM by loading each layer's parameters just before they are needed in the forward pass. Offloading has high latency, but it can process several tokens in parallel. Generating a single token with BLOOM-176B takes at least 5.5 seconds with the fastest RAM offloading setup and 22 seconds with the fastest SSD offloading setup.
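The latency figures can be read as a bandwidth bound: every generated token needs a full forward pass, so all of the weights must cross the link to the accelerator once per token. The sketch below uses assumed numbers that happen to reproduce the quoted values (weights compressed to 8 bits, roughly 32 GB/s from RAM to the GPU, roughly 8 GB/s from SSD); the function name and these bandwidths are illustrative, not measurements.

```python
def min_seconds_per_token(model_gb: float, bandwidth_gb_per_s: float) -> float:
    """Lower bound on latency: time to stream all weights to the accelerator once."""
    return model_gb / bandwidth_gb_per_s


MODEL_GB = 176.0        # BLOOM-176B at 1 byte per parameter (assumed 8-bit weights)
RAM_TO_GPU_GB_S = 32.0  # assumed RAM-to-GPU bandwidth
SSD_TO_GPU_GB_S = 8.0   # assumed SSD read bandwidth

print(f"RAM offloading: {min_seconds_per_token(MODEL_GB, RAM_TO_GPU_GB_S):.1f} s per token")
print(f"SSD offloading: {min_seconds_per_token(MODEL_GB, SSD_TO_GPU_GB_S):.1f} s per token")
```

Because the weights are streamed once per forward pass regardless of batch size, processing many tokens in one pass spreads this cost across them. That is why offloading suits batch workloads better than interactive generation.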

In addition, many machines do not have enough RAM to offload 175B parameters. Public inference APIs are another way to make LLMs more accessible: one party hosts the model, and others query it over the internet. This is a convenient option, since the API owner handles most of the engineering work. However, APIs can be too rigid for research, because they do not let users change the model's control flow or access its internal states. Current API pricing may also make some research projects prohibitively expensive. In this work, the Petals researchers explore a different approach, inspired by the crowdsourced training of neural networks.

