Pathways Language Model (PaLM): Scaling to 540 billion parameters for breakthrough performance

In recent years, large neural networks trained for language understanding and generation have achieved impressive results across a wide range of tasks. GPT-3 first demonstrated that large language models (LLMs) can be used for few-shot learning, achieving impressive results without large-scale task-specific data collection or model parameter updates. More recent LLMs, such as GLaM and LaMDA, have achieved state-of-the-art few-shot results on many tasks by scaling model size, using sparsely activated modules, and training on larger datasets. Yet much work remains in fully understanding the capabilities that emerge with few-shot learning as we push the limits of model scale.
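The key property of few-shot learning is that the task is specified entirely in the prompt: a handful of worked examples are shown to the model, and it must infer the pattern with no gradient updates. A minimal sketch of how such a prompt is assembled (the task and example pairs here are hypothetical, not drawn from any PaLM evaluation):

```python
# Minimal sketch of few-shot prompt construction: worked examples are
# concatenated ahead of a new query, and the model is asked to continue
# the pattern. No model parameters are updated.
def build_few_shot_prompt(examples, query):
    """Format (input, output) example pairs plus a new query into one prompt."""
    blocks = [f"Input: {text}\nOutput: {label}" for text, label in examples]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)

examples = [
    ("The movie was wonderful", "positive"),
    ("I hated every minute of it", "negative"),
]
prompt = build_few_shot_prompt(examples, "A delightful surprise")
print(prompt)
```

The completion the model produces after the final `Output:` is taken as its answer; "zero-shot" is the same setup with no examples at all.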

Last year, Google Research announced its vision for Pathways, a single model that could generalize across domains and tasks while being highly efficient. An important milestone toward realizing this vision was developing the new Pathways system, which orchestrates distributed computation for accelerators. In "PaLM: Scaling Language Modeling with Pathways", we introduce the Pathways Language Model (PaLM), a 540-billion parameter, dense decoder-only Transformer model trained with the Pathways system, which enabled us to efficiently train a single model across multiple TPU v4 Pods. We evaluated PaLM on hundreds of language understanding and generation tasks, and found that it achieves state-of-the-art performance across most of them, in many cases by significant margins.
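To give a feel for what "dense" means at this scale, a back-of-the-envelope parameter count for a decoder-only Transformer can be sketched as below. In a dense model every parameter is used for every token, so the count follows directly from the layer count, model width, and vocabulary size; the configuration in the example is purely illustrative, not PaLM's published architecture.

```python
# Rough parameter count for a dense decoder-only Transformer.
# The configuration values below are illustrative only.
def transformer_params(n_layers, d_model, vocab_size, ffw_mult=4):
    embed = vocab_size * d_model            # token embedding table
    attn = 4 * d_model * d_model            # Q, K, V, and output projections
    ffw = 2 * ffw_mult * d_model * d_model  # feed-forward up/down projections
    return embed + n_layers * (attn + ffw)

# e.g. a hypothetical 96-layer model with width 16384 and a 256k vocabulary:
n = transformer_params(n_layers=96, d_model=16384, vocab_size=256_000)
print(f"{n / 1e9:.0f}B parameters")  # → 313B parameters
```

Because the cost per layer grows with the square of the width, most of the parameters sit in the repeated attention and feed-forward blocks rather than the embedding table, which is why scaling depth and width dominates the totals at hundreds of billions of parameters.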