Context

This is the processed data for the Kaggle competition Jigsaw Unintended Bias in Toxicity Classification. The following preprocessing has been applied:

1. cleaned contractions
2. cleaned spaces
3. removed unknown characters
4. generated more features with feature engineering
5. saved the embedding matrix for the Keras embedding layer

The goal is to save limited training resources by doing this preprocessing once, ahead of time.
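
The text-cleaning steps (1 to 3) could look roughly like the sketch below. It is only an illustration of the idea, not the code used to build this dataset: the contraction map `CONTRACTIONS`, the allowed character set `ALLOWED`, and all function names are hypothetical, and the real mappings are not given in this description.

```python
import re

# Hypothetical contraction map (assumption: the dataset's real map is not published here).
CONTRACTIONS = {
    "can't": "cannot",
    "won't": "will not",
    "don't": "do not",
    "i'm": "i am",
    "it's": "it is",
}

# Hypothetical set of "known" characters (assumption); everything else is dropped.
ALLOWED = set("abcdefghijklmnopqrstuvwxyz0123456789.,!?'")

# One regex matching any contraction key as a whole word, case-insensitively.
_CONTRACTION_RE = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in CONTRACTIONS) + r")\b",
    re.IGNORECASE,
)


def clean_contractions(text):
    """Expand contractions; replacements are always lowercase."""
    text = text.replace("\u2019", "'")  # normalize curly apostrophes first
    return _CONTRACTION_RE.sub(lambda m: CONTRACTIONS[m.group(0).lower()], text)


def remove_unknown_characters(text, allowed=ALLOWED):
    """Drop characters outside the allowed set, keeping whitespace."""
    return "".join(ch for ch in text if ch.isspace() or ch.lower() in allowed)


def clean_spaces(text):
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def preprocess(text):
    """Apply the three cleaning steps in sequence."""
    text = clean_contractions(text)
    text = remove_unknown_characters(text)
    # Spaces are cleaned last so gaps left by removed characters are collapsed too.
    return clean_spaces(text)


print(preprocess("I can't  believe\tit\u2019s \u2603 here!"))
```

Running the cleaning once and saving the result means each training run can read the cleaned text directly instead of repeating these passes.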