Post

HR Analytics: Job Change of Data Scientists

The deployed web app is live now: HR Analytics Web-App. Developed a web app that predicts whether an applicant is looking for a new data science job, with an accuracy of 84.0%. Preprocessed the data into a form suitable for machine learning models. Performed exploratory data analysis to visualize, summarize, and interpret the hidden information in the dataset. Applied dummy encoding to all categorical features.
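As a minimal sketch of the dummy-encoding step, assuming the data sits in a pandas DataFrame: the column names and values below are hypothetical stand-ins, not the project's actual dataset, and `drop_first=True` is one common choice rather than a confirmed detail of the original pipeline.

```python
import pandas as pd

# Hypothetical applicant records; column names and values are illustrative only.
applicants = pd.DataFrame({
    "city_development_index": [0.92, 0.62, 0.78],
    "education_level": ["Graduate", "Masters", "Graduate"],
    "company_type": ["Pvt Ltd", "Public Sector", "Pvt Ltd"],
})

# Columns treated as categorical (listed explicitly for this sketch).
categorical_cols = ["education_level", "company_type"]

# Dummy encoding: one indicator column per category level, dropping the first
# level of each feature so the indicators are not perfectly collinear.
encoded = pd.get_dummies(applicants, columns=categorical_cols, drop_first=True)

print(encoded.columns.tolist())
```

Each categorical column is replaced by indicator columns, while numeric columns pass through unchanged.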

Digit Recognizer

Developed a convolutional neural network to classify handwritten digits with an accuracy of 99.12%. Reshaped the training, validation (15% of the training set), and test datasets into 28 x 28 x 1 pixel arrays. Normalized pixel values from [0, 255] to [0, 1] to help the model learn. Built a Sequential model with 3 convolutional blocks; each block consists of 2 Conv2D layers with LeakyReLU activations, followed by a MaxPool2D layer (to reduce the spatial size of the feature maps) and a Dropout layer (to randomly drop a fraction of activations during training).
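The reshaping, normalization, and 15% validation split can be sketched with NumPy alone. This is an illustration under stated assumptions: the random arrays stand in for the real digit images, and the variable names are hypothetical. The model itself is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in data: 200 flattened 28x28 grayscale images with pixels in [0, 255].
# A real run would load the digit images and labels instead.
flat_images = rng.integers(0, 256, size=(200, 784), dtype=np.uint8)
labels = rng.integers(0, 10, size=200)

# Reshape to (samples, height, width, channels) and scale pixels to [0, 1].
images = flat_images.reshape(-1, 28, 28, 1).astype("float32") / 255.0

# Hold out 15% of the shuffled training set for validation.
order = rng.permutation(len(images))
n_val = len(images) * 15 // 100
val_idx, train_idx = order[:n_val], order[n_val:]
x_train, y_train = images[train_idx], labels[train_idx]
x_val, y_val = images[val_idx], labels[val_idx]

print(x_train.shape, x_val.shape)
```

The trailing channel dimension of 1 is what Conv2D layers expect for grayscale input.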

Distinguish between a honey bee and a bumble bee using Deep Learning

This is a series of projects that walk through working with image data, building classifiers using traditional techniques, and leveraging the power of deep learning for computer vision. Image Loading and Processing: the FIRST part of the series, in which I use the Python imaging library Pillow to load and manipulate image data. Predict Species from Images: the SECOND part of the series, in which I build a model to identify honey bees and bumble bees from an image of these insects.
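A rough sketch of the kind of Pillow operations the first part covers. The image is generated in memory so the example needs no file on disk. With a real bee photo, `Image.open` on the file path would replace that step. The specific sizes, crop box, and rotation angle are arbitrary choices for illustration.

```python
import numpy as np
from PIL import Image

# Build a small synthetic RGB image in memory (stand-in for a real photo).
gradient = np.linspace(0, 255, 100 * 100 * 3).reshape(100, 100, 3).astype(np.uint8)
img = Image.fromarray(gradient)

# Typical manipulations: resize, crop, rotate, convert to grayscale.
small = img.resize((50, 50))
cropped = img.crop((25, 25, 75, 75))  # box is (left, upper, right, lower)
rotated = img.rotate(45)
gray = img.convert("L")

# Convert to NumPy arrays, the form most classifiers expect.
rgb_array = np.array(small)
gray_array = np.array(gray)

print(rgb_array.shape, gray_array.shape)
```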

Who's Tweeting - Trump or Trudeau?

Loaded a corpus of tweets from November 2017 into a Pandas DataFrame and passed it to scikit-learn for further processing. Split the data into training and test sets, then created vectorized representations of the tweets using the ‘CountVectorizer’ and ‘TfidfVectorizer’ classes. Trained a Multinomial Naive Bayes model on both the CountVectorizer and TfidfVectorizer data to check which performs better. Result: the TF-IDF model performs better with 0.
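The comparison described above could look roughly like the following. This is a sketch only: the texts and author labels are made-up placeholders rather than the real tweet corpus, the column names are hypothetical, and the vectorizers use default settings, which may differ from the original project.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Placeholder corpus: invented texts with two generic author labels.
tweets = pd.DataFrame({
    "status": [
        "great rally tonight thank you all",
        "we will win big believe me",
        "tremendous crowd tonight thank you",
        "big win coming very soon",
        "thank you for the great support",
        "believe me it will be tremendous",
        "proud to stand with our partners today",
        "working together for stronger communities",
        "today we celebrate diversity and inclusion",
        "stronger together with our partners",
        "proud of our communities today",
        "working for inclusion and diversity",
    ],
    "author": ["A"] * 6 + ["B"] * 6,
})

# Split first so the vectorizers only learn vocabulary from training text.
train_text, test_text, train_author, test_author = train_test_split(
    tweets["status"], tweets["author"],
    test_size=4, stratify=tweets["author"], random_state=53,
)

scores = {}
for name, vectorizer in [("count", CountVectorizer()), ("tfidf", TfidfVectorizer())]:
    X_train = vectorizer.fit_transform(train_text)
    X_test = vectorizer.transform(test_text)
    model = MultinomialNB().fit(X_train, train_author)
    scores[name] = accuracy_score(test_author, model.predict(X_test))

print(scores)
```

On a corpus this small the two scores say nothing about which representation is better. The point is only the shape of the workflow.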

Predicting Credit Card Approvals

Loaded and viewed the confidential dataset, whose feature names have been anonymized by the contributor. Read this blog to better understand the anonymized features. Handled all missing values, since they degrade the performance of a machine learning model if left unchanged. Preprocessed the data in three main steps: converted the non-numeric data to numeric, split the data into train and test sets, and scaled the feature values to a uniform range. Fitted a Logistic Regression model (a generalized linear model), evaluated it on the test set in terms of classification accuracy, and summarized its performance with a confusion matrix. Performed GridSearchCV over a grid of values for two hyperparameters, ‘tol’ and ‘max_iter’, to improve the model’s ability to predict credit card approvals. Summarized the best achieved model score of 85% and the corresponding best parameters. Link to GitHub Repository
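A minimal sketch of the scaling, logistic regression, and grid search steps, under clearly stated assumptions: the data is synthetic (generated with `make_classification`), the grid values for `tol` and `max_iter` are illustrative guesses, and wrapping the scaler and model in a pipeline is a choice made for this example rather than a confirmed detail of the project.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the already-numeric, imputed credit card data.
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Scale features to [0, 1], then fit logistic regression.
pipeline = make_pipeline(MinMaxScaler(), LogisticRegression())

# Illustrative grid over the two hyperparameters named in the text.
param_grid = {
    "logisticregression__tol": [0.01, 0.001, 0.0001],
    "logisticregression__max_iter": [100, 150, 200],
}
search = GridSearchCV(pipeline, param_grid, cv=5)
search.fit(X_train, y_train)

# Confusion matrix of the best model on the held-out test set.
cm = confusion_matrix(y_test, search.predict(X_test))

print(search.best_params_)
print(cm)
```

Keeping the scaler inside the pipeline means each cross-validation fold fits its own scaling on the training portion only, which avoids leaking information from the validation fold.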