Code for the curation of The Stack v2 and StarCoder2 training data

bigcode-project/the-stack-v2

The Stack v2 & StarCoder2Data

This repository contains the code for building The Stack v2 dataset, as well as the extra sources used to build StarCoder2Data, the training corpus of the StarCoder2 family of models.

This repository is a follow-up to the work in bigcode-dataset, which was used for The Stack v1 and StarCoderData.
