From the course: Introduction to Large Language Models

PaLM and PaLM 2

- [Instructor] We're finally at one of the largest language models to date. Google released PaLM, or to give it its full name, the Pathways Language Model, in April 2022. There are a couple of key takeaways from the PaLM model. PaLM is the largest model to date, with 540 billion parameters. Google used the Pathways system, which meant more hardware accelerators could be used for model training, and the model could be trained more efficiently. PaLM was trained on an enormous 780-billion-token multilingual corpus with text from over 100 languages. Now, just over three-quarters of this training data was in English.

Another really interesting phenomenon that the Google team picked up on was scaling. It looked like the model could only perform certain tasks once a certain scale was reached. Here, the 8-billion-parameter model could perform certain tasks such as question answering, language understanding, and arithmetic. It was only when the model was scaled up to 62 billion parameters that more tasks, such as translation, summarization, and common-sense reasoning, were possible. But it then required a much bigger jump, to 540 billion parameters, for the model to be able to perform tasks such as general knowledge, reading comprehension, and joke explanation, amongst others.

PaLM is the best-performing model of the lot, outdoing Gopher and Chinchilla on the standard benchmarks that measure natural language understanding and natural language generation tasks. Let's add PaLM to our list of models. You can see that it's the largest model to date, and although it wasn't trained on as many training tokens as Chinchilla, it performs the best. If we take the lessons from the Chinchilla paper, then although the PaLM model was trained on 780 billion training tokens, which is more data than the 380 billion that was used for other models, it's likely that it has still been undertrained and could perform better if trained on more data; a rough back-of-the-envelope sketch of that calculation follows at the end of this section.

In May of 2023, Google released PaLM 2, which surpassed the capabilities of PaLM. Now, because of the increasingly competitive nature of the large language model space, Google has not released information on the size of the model nor the amount of data it was trained on. PaLM 2 was trained on more than 100 languages, meaning it's able to understand and generate text that is more nuanced. It has also passed advanced language proficiency exams at the mastery level. Scientific papers and pages with mathematical expressions were part of the vast training dataset used to train PaLM 2, and as a result, it's even better at logic, common-sense reasoning, and mathematical tasks compared to the original PaLM model.

PaLM 2 was also trained on a large quantity of publicly available programming code. This means it's not only excellent at generating or correcting code in popular modern languages like Python and JavaScript, but it can also generate code in languages like Prolog, Fortran, and Verilog. And what's remarkable is that PaLM 2's smallest model will be able to run on mobile devices. PaLM 2 will be used to power features in other Google products like Gmail and Google Docs, and also to expand Bard, Google's conversational AI chatbot, which is powered by a large language model. One of the other benefits of PaLM 2 is that it has been used as the foundation for Med-PaLM 2, which was trained by Google's health research teams. This is the first large language model to perform at expert level on US medical licensing exam-style questions. The Google team are adding multimodal capabilities.
This means you could feed it an X-ray image, ask the model to analyze the image and report on its findings, and then potentially query the model further about it. Let's add PaLM 2 to our list of models. It looks like PaLM 2 is Google's best-performing model, and it'll be powering many of its products.
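To make the earlier Chinchilla point concrete, here is a minimal sketch of the rule of thumb often drawn from the Chinchilla paper, roughly 20 training tokens per model parameter, applied to the publicly reported figures for Gopher, Chinchilla, and PaLM. The 20-to-1 ratio is an approximation rather than an exact prescription, so treat this as an illustration of the scaling argument, not a reproduction of either paper's analysis.

```python
# Back-of-the-envelope check of the Chinchilla "compute-optimal" rule of thumb:
# roughly 20 training tokens per model parameter (an approximation, not an exact rule).
CHINCHILLA_TOKENS_PER_PARAM = 20

models = {
    # name: (parameters, training tokens actually used), publicly reported figures
    "Gopher":     (280e9, 300e9),
    "Chinchilla": (70e9, 1.4e12),
    "PaLM":       (540e9, 780e9),
}

for name, (params, tokens_used) in models.items():
    optimal_tokens = params * CHINCHILLA_TOKENS_PER_PARAM
    ratio = tokens_used / optimal_tokens
    print(f"{name:>10}: {params / 1e9:.0f}B params, "
          f"trained on {tokens_used / 1e9:,.0f}B tokens, "
          f"Chinchilla-optimal ~ {optimal_tokens / 1e12:.1f}T tokens "
          f"({ratio:.0%} of optimal)")

# By this rule, a 540-billion-parameter model like PaLM would want roughly
# 10.8 trillion training tokens, so 780 billion tokens suggests it was
# substantially undertrained, which is the point made in the transcript.
```

By the same rule of thumb, Chinchilla itself sits almost exactly at the compute-optimal ratio, which is why a 70-billion-parameter model trained on 1.4 trillion tokens can outperform much larger models trained on far less data.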
