Who's Tweeting - Trump or Trudeau?

May 23, 2020 - 1 minute read - 139 words

Loaded a corpus of tweets from November 2017 into a pandas DataFrame and passed it to scikit-learn for further processing
Split the data into training and test sets, then created vectorized representations of the tweets using the ‘CountVectorizer’ and ‘TfidfVectorizer’ classes so that machine learning models could be applied
Trained a Multinomial Naive Bayes model on both the CountVectorizer and TfidfVectorizer data to check which performs better. Result: the TF-IDF model scored 0.803, slightly ahead of the count-based model at 0.795
Evaluated both models using confusion matrices without normalization, which report the raw counts of True Positives, False Positives, False Negatives, and True Negatives
Trained a Linear SVC model on the TF-IDF vectors to check whether it improves the accuracy of the classifier. Result: the Linear SVC model scored 0.841, outperforming the Multinomial Naive Bayes model
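
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. The tiny DataFrame, its column names (status, author), the placeholder labels (author_a, author_b), the evaluate helper, and the split settings (test_size, random_state) are all assumptions made for the example. Since the toy data is made up, the scores it prints will not match the 0.803, 0.795, and 0.841 reported above.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Made-up placeholder texts and labels (assumption): the real tweet corpus
# is not reproduced here, and the column names are hypothetical.
LABELS = ["author_a", "author_b"]
tweets = pd.DataFrame({
    "status": [
        "trade deal talks went well today",
        "new jobs report shows strong trade numbers",
        "strong numbers on jobs and trade",
        "trade talks continue with strong support",
        "jobs numbers look strong this month",
        "deal on trade and jobs announced today",
        "climate action matters for every community",
        "investing in community programs and climate",
        "proud of this community and its climate work",
        "climate programs help families in every community",
        "families benefit from community investment",
        "community leaders met on climate action",
    ],
    "author": ["author_a"] * 6 + ["author_b"] * 6,
})

# Split first so the vectorizers never see the test tweets while fitting.
# test_size and random_state are arbitrary choices for this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    tweets["status"],
    tweets["author"],
    test_size=0.33,
    random_state=53,
    stratify=tweets["author"],
)


def evaluate(vectorizer, model):
    """Vectorize the text, train the model, and score it on the test set."""
    train_vectors = vectorizer.fit_transform(X_train)  # learn vocabulary on train only
    test_vectors = vectorizer.transform(X_test)        # reuse that vocabulary
    model.fit(train_vectors, y_train)
    predictions = model.predict(test_vectors)
    accuracy = accuracy_score(y_test, predictions)
    # confusion_matrix returns raw counts by default (no normalization).
    matrix = confusion_matrix(y_test, predictions, labels=LABELS)
    return accuracy, matrix


results = {
    "count_nb": evaluate(CountVectorizer(), MultinomialNB()),
    "tfidf_nb": evaluate(TfidfVectorizer(), MultinomialNB()),
    "tfidf_svc": evaluate(TfidfVectorizer(), LinearSVC()),
}

for name, (accuracy, matrix) in results.items():
    print(f"{name}: accuracy={accuracy:.3f}")
    print(matrix)
```

Fitting each vectorizer on the training split only, and merely transforming the test split, keeps test vocabulary from leaking into the model. With labels=LABELS, each matrix has the true authors as rows and the predicted authors as columns, in the order given.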