Context

Seinfeld is my favorite TV show. I wrote a script that scrapes the scripts of all Seinfeld episodes from [seinology.com](http://www.seinology.com) and merges them into a single text corpus, so that I could train a language model on it. The source code is available [here](https://github.com/luonglearnstocode/Seinfeld-text-corpus). I hope you find it useful; any feedback would be appreciated.

Content

corpus.txt: a corpus of 717576 words across 64919 lines of Seinfeld scripts, ready for training language models.
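
As a starting point, here is a minimal Python sketch of loading the corpus and getting simple statistics before training. It assumes `corpus.txt` sits in the working directory and is UTF-8 text; the helper names `corpus_stats` and `bigram_counts` are made up for illustration and are not part of the linked repository. Whitespace splitting is also an assumption, so the counts it reports may not match the figures above exactly.

```python
from collections import Counter
from pathlib import Path
from typing import Tuple


def corpus_stats(text: str) -> Tuple[int, int]:
    """Return (line_count, word_count) using plain whitespace splitting."""
    return len(text.splitlines()), len(text.split())


def bigram_counts(text: str) -> Counter:
    """Count adjacent lowercased word pairs across the whole text."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


if __name__ == "__main__":
    # Assumed location of the dataset file; adjust the path as needed.
    path = Path("corpus.txt")
    if path.exists():
        text = path.read_text(encoding="utf-8", errors="replace")
        n_lines, n_words = corpus_stats(text)
        print(f"{n_lines} lines, {n_words} words")
        # The ten most frequent word pairs, as a quick sanity check.
        for pair, count in bigram_counts(text).most_common(10):
            print(pair, count)
    else:
        print("corpus.txt not found")
```

Bigram counts like these are the raw material for a simple count-based language model; a neural model would instead need a proper tokenizer in place of `str.split`.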