May 18, 2020
- Explored the books to be used in the recommendation system and loaded the contents of each book.
- Pre-processed the data to facilitate the downstream analysis.
- Used Darwin’s most famous book, “On the Origin of Species”, as the reference book to keep the analysis consistent.
- Transformed the corpus (the collection of book texts) into a format that is easier to handle in the downstream analyses, i.e., turned each text into a list of its individual words (called tokens).
- Applied stemming to group together the inflected forms of a word so they can be analyzed as a single item: the stem.
- Loaded the stemmed result from a pickle file to speed up the process, since the stemming algorithm takes several minutes to run.
- Created the universe of all words, i.
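The tokenization step can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the two-book `corpus` dictionary, the `tokenize` helper, and the short `STOP_WORDS` set are all hypothetical, and a real pipeline would read each book from its own file and use a fuller stop-word list.

```python
import re

# Assumption: a tiny hand-picked stop-word list, for illustration only
STOP_WORDS = {"a", "an", "and", "by", "in", "of", "on", "the", "to"}


def tokenize(text: str) -> list[str]:
    """Turn raw text into a list of lowercase word tokens (hypothetical helper)."""
    # Lowercase, then replace every run of characters that are not ASCII letters/digits with a space
    cleaned = re.sub(r"[^a-z0-9]+", " ", text.lower())
    # Split on whitespace and drop stop words
    return [word for word in cleaned.split() if word not in STOP_WORDS]


# Hypothetical mini-corpus: book title -> raw text
corpus = {
    "OriginofSpecies": "On the Origin of Species, by Means of Natural Selection.",
    "DescentofMan": "The Descent of Man, and Selection in Relation to Sex.",
}

tokens_by_book = {title: tokenize(text) for title, text in corpus.items()}
print(tokens_by_book["OriginofSpecies"])
# ['origin', 'species', 'means', 'natural', 'selection']
```

Each book becomes a plain list of strings, which is the shape the later stemming and word-counting steps work on.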
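To show what grouping inflected forms under one stem means, here is a deliberately crude suffix-stripping stemmer. It is a swapped-in toy, not the Porter algorithm or whatever the project actually ran; a real pipeline would typically use an established stemmer such as NLTK's `nltk.stem.PorterStemmer`. The `SUFFIXES` tuple, `crude_stem`, and `group_by_stem` are hypothetical names.

```python
from collections import defaultdict

# Toy suffix list, checked longest-first (assumption: NOT the Porter rules)
SUFFIXES = ("ations", "ation", "ions", "ion", "ing", "ed", "es", "s")


def crude_stem(word: str, min_stem: int = 3) -> str:
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


def group_by_stem(tokens: list[str]) -> dict[str, set[str]]:
    """Map each stem to the set of surface forms that reduce to it."""
    groups: dict[str, set[str]] = defaultdict(set)
    for token in tokens:
        groups[crude_stem(token)].add(token)
    return dict(groups)


words = ["selection", "selected", "selecting", "selects", "natural"]
print(group_by_stem(words))
```

All four forms of "select" collapse into one key, so later word counts treat them as a single item; "natural" matches none of the toy suffixes and stays unchanged.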
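The pickle shortcut amounts to caching an expensive result on disk. A minimal sketch of that pattern with the standard `pickle` module follows; `load_or_compute`, `expensive_stemming`, and the cache file name are hypothetical, and the "stemming" function is only a placeholder that counts how often it runs.

```python
import os
import pickle
import tempfile


def load_or_compute(cache_path, compute, *args):
    """Return the pickled result if the cache file exists; otherwise compute and save it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    result = compute(*args)
    with open(cache_path, "wb") as f:
        pickle.dump(result, f)
    return result


calls = 0


def expensive_stemming(tokens_by_book):
    """Placeholder for the slow stemming step; counts how often it actually runs."""
    global calls
    calls += 1
    return {title: [w.rstrip("s") for w in words] for title, words in tokens_by_book.items()}


tokens_by_book = {"OriginofSpecies": ["species", "selections"]}
cache_path = os.path.join(tempfile.mkdtemp(), "stemmed_tokens.p")

first = load_or_compute(cache_path, expensive_stemming, tokens_by_book)
second = load_or_compute(cache_path, expensive_stemming, tokens_by_book)
print(first == second, calls)  # True 1
```

The second call reads the file instead of recomputing. Note that `pickle.load` can execute arbitrary code, so this shortcut is only appropriate for cache files you created yourself.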