Who's Tweeting - Trump or Trudeau?
- Loaded a corpus of tweets from November 2017 into a Pandas DataFrame and passed it to scikit-learn for further processing
- After splitting the data into training and test sets, created vectorized representations of the tweets using the ‘CountVectorizer’ and ‘TfidfVectorizer’ classes so that machine learning models could be applied
- Trained a Multinomial Naive Bayes model on both the CountVectorizer and TfidfVectorizer data to check which would perform better. Result: the TF-IDF model performs better, with a score of 0.803, than the count-based approach, with a score of 0.795
- Evaluated both models using confusion matrices without normalization, that is, as raw counts of True Positives, False Positives, False Negatives, and True Negatives
- Implemented a Linear SVC model and checked whether using it with the TF-IDF vectors improves the accuracy of the classifier. Result: the Linear SVC model performs better, with a score of 0.841, than the Multinomial Naive Bayes model
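
The steps above can be sketched end to end as follows. This is a minimal illustration, not the project's actual code: the tiny DataFrame is made-up placeholder text standing in for the tweet corpus, the column names `author` and `status`, the `evaluate` helper, and the split parameters are all assumptions, and the scores it prints will not match the 0.795, 0.803, and 0.841 reported above.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Hypothetical placeholder corpus (NOT real tweets); column names are assumptions.
tweets = pd.DataFrame({
    "author": ["author_a"] * 6 + ["author_b"] * 6,
    "status": [
        "great rally tonight huge crowd",
        "tax cuts are coming very soon",
        "huge jobs numbers great economy",
        "great meeting on tax reform today",
        "crowd was huge and great energy",
        "economy is strong jobs are back",
        "proud to celebrate diversity together",
        "thank you to our canadian friends",
        "working together for the middle class",
        "celebrate with canadian families today",
        "together we grow the middle class",
        "proud of our friends and families",
    ],
})

# Split first, so the vectorizers only learn vocabulary from the training data.
X_train, X_test, y_train, y_test = train_test_split(
    tweets["status"], tweets["author"],
    test_size=4, random_state=53, stratify=tweets["author"],
)
labels = ["author_a", "author_b"]


def evaluate(vectorizer, model):
    """Fit vectorizer and model on the training set, then score on the test set."""
    train_vec = vectorizer.fit_transform(X_train)
    test_vec = vectorizer.transform(X_test)
    model.fit(train_vec, y_train)
    pred = model.predict(test_vec)
    # confusion_matrix returns raw counts by default (no normalization).
    return accuracy_score(y_test, pred), confusion_matrix(y_test, pred, labels=labels)


results = {
    "count_nb": evaluate(CountVectorizer(stop_words="english"), MultinomialNB()),
    "tfidf_nb": evaluate(TfidfVectorizer(stop_words="english"), MultinomialNB()),
    "tfidf_svc": evaluate(TfidfVectorizer(stop_words="english"), LinearSVC()),
}

for name, (accuracy, matrix) in results.items():
    print(f"{name}: accuracy={accuracy:.3f}")
    print(matrix)
```

Each entry in `results` pairs an accuracy score with a 2x2 matrix of raw counts, so the three vectorizer and classifier combinations can be compared side by side in the same way the bullets describe.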