How far can you go in training modern LLMs with a single Nvidia DGX A100 station?
As it turns out, quite far!
Today we're releasing **phi-1.5**, a 1.3B-parameter LLM exhibiting emergent behaviors surprisingly close to those of much larger LLMs, despite being trained for less than 2 weeks on only 8 A100s.
To get a first feel for what the model is capable of, see the picture below with an example completion and a comparison to Falcon-7B and Llama 2-7B.
How did we achieve such a model? Well, we used the same recipe as for phi-1 (a 1.3B-parameter model that achieved >50% on HumanEval just 3 months ago), which we call "Textbooks Are All You Need". I made an explainer video for the method, recalling the results on phi-1 as well as discussing the new phi-1.5: https://lnkd.in/gTievxiE
Moreover, we have decided to open-source phi-1.5, as we believe it can be a great tool in the community's quest to better understand LLMs, particularly towards mitigating their shortcomings (hallucinations, toxic/biased content generation, ...). See the report for much more discussion: https://lnkd.in/gkDU7m3U
Our models phi-1.5 and phi-1 are available right now on Hugging Face and Azure ML: phi-1.5 at https://lnkd.in/g9fTfDBP and phi-1 at https://lnkd.in/gStZ-tcD. We can't wait to see what the community will discover with them.
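If you want to try the model right away, here is a minimal sketch of loading it through the Hugging Face `transformers` library. The model ID `microsoft/phi-1_5` and the `trust_remote_code=True` flag are assumptions based on typical Hugging Face releases, so check the model card for the exact details:

```python
# Minimal sketch: load phi-1.5 via Hugging Face transformers and generate text.
# Assumes `pip install transformers torch` and the model ID "microsoft/phi-1_5";
# consult the model card for the authoritative ID and loading flags.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1_5"  # assumed model ID; verify on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "Write a short story about a robot learning to paint."
inputs = tokenizer(prompt, return_tensors="pt")

# Generate up to 200 new tokens with default (greedy) decoding; tune to taste.
outputs = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```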
Finally, the relatively low cost of developing the phi models has all sorts of deep implications, foremost for the environmental impact of AI, but it also means that this type of exploration can be vastly democratized. I personally can't wait to see what all of you will do with this technique, and where the frontier lies at the one-billion-parameter scale!