So a while ago I made a post asking about the ethical validity of making an AI do my homework for me. I got a ton of super supportive responses that helped me see both perspectives.
Now, a step up from that: I am trying to make my own language models. As good as GPT-2 is, it is not good enough for most tasks, and OpenAI is not giving access to GPT-3. So I am trying to make a model that normal people without access to supercomputers can run, yet that beats GPT-2's performance. It has 2x the context size and attention heads of GPT-2, and more layers as well. Since I have access to significantly more data than was used to train GPT-2, and the size is comparable to the smaller GPT-3 models, I am fairly confident the model will outperform GPT-2 and be on par with the smaller GPT-3 models. My goal for this project is to train a variety of models ranging from 500M parameters all the way up to 10B (the size of the second-largest GPT-3 model). I promise that unlike OpenAI's GPT-3, my models will truly be open for anyone to download and use.
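For anyone curious how sizes like 500M or 10B relate to the architecture, here's a rough back-of-the-envelope count for GPT-style decoder-only models. This is just a sketch (the helper name is mine, and it only approximates real implementations), but plugging in GPT-2's second-largest published config (36 layers, d_model = 1280) lands around 774M, in the same ballpark as the 762M figure quoted in the GPT-2 paper:

```python
def gpt_param_count(n_layer, d_model, vocab_size=50257, n_ctx=1024):
    """Rough parameter count for a GPT-2-style decoder-only transformer."""
    # token embeddings (weight-tied with the output head) + learned position embeddings
    embeddings = vocab_size * d_model + n_ctx * d_model
    # per block: QKV + output projections (4*d^2) and the 4x-wide MLP (8*d^2),
    # plus biases and the two layer norms (roughly 13*d)
    per_block = 12 * d_model ** 2 + 13 * d_model
    # final layer norm (gain + bias)
    return embeddings + n_layer * per_block + 2 * d_model

# GPT-2's 36-layer, d_model=1280 config comes out near 774M parameters.
print(f"{gpt_param_count(36, 1280):,}")
```

Note that doubling n_ctx only adds n_ctx * d_model position-embedding parameters, so a 2x context window barely changes model size; the cost shows up in attention compute and memory, not parameter count.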
Currently I am in the process of training the very first model. It has 590M parameters and will be the smallest of them all. The model should be completely trained in a week, provided my compute resources don't run out (I'm using Kaggle Notebooks, which limit me to 30 hrs a week). Training from scratch is a lot more intensive than finetuning, and this is my first time making a model this big, so fingers crossed. I'm aiming for this model to have a lower perplexity than its GPT-2 counterpart (the 762M model) on standard benchmarks like LAMBADA and PTB. Also, since the context length is already 2x that of GPT-2 (the same as GPT-3), few-shot prompting can be used instead of finetuning for certain tasks. I am also looking into training a model with an even larger context window (which would let it generate long pieces of text, like entire stories), but that is definitely a long way off.
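For reference, the perplexity number I'm chasing is just the exponential of the average per-token negative log-likelihood, so lower is better. A minimal sketch (the function name and toy inputs are my own, not from any eval harness):

```python
import math

def perplexity(token_nlls):
    """Perplexity from a list of per-token negative log-likelihoods (in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

# A model that assigns every token probability 1/2 (NLL = ln 2) has perplexity 2,
# and a perfect model (NLL = 0 everywhere) has perplexity 1.
print(perplexity([math.log(2)] * 4))
print(perplexity([0.0] * 3))
```

Real evaluations like LAMBADA or PTB just do this over the whole test set, with per-token NLLs coming from the model's softmax outputs.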
Now on to my request: I would really appreciate it if anyone could help me out with compute. 30 hrs of GPU/TPU time a week is not enough to train the models I want to make. The free GPUs and TPUs from Colab and Kaggle can't handle the larger models; they can't even fit them in memory. I can't afford to train these models on Google Cloud or AWS, they are just way too expensive for a student such as myself. If you have some compute resources available and are willing to share, please message me. As I mentioned before, I will be releasing the models to the public, and I believe models like these can genuinely help out researchers and developers.
What about Colab TPU/GPUs?
The quota isn't enough to train this model all the way through. The runtime dies before even a small portion of the training can complete.