Analysis of Word Choice in Different Genres
We begin our analysis of word selection in song lyrics by dividing the lyrics according to song genres, counting the most frequently used...
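The per-genre counting step described here can be sketched as follows. This is a minimal illustration, not the post's actual code: the toy corpus `SONGS`, the short `STOP_WORDS` list, and the helper names are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical toy corpus: (genre, lyrics) pairs invented for illustration only.
SONGS = [
    ("rock", "We will rock the night, rock the night away"),
    ("rock", "Night after night the guitar screams"),
    ("pop", "Baby baby, dance with me tonight"),
    ("pop", "Dance all night, baby"),
]

# A tiny assumed stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "we", "will", "with", "me", "all", "after", "away"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_words_by_genre(songs, n=3):
    """Return the n most frequent non-stop-words for each genre."""
    counts = defaultdict(Counter)
    for genre, lyrics in songs:
        counts[genre].update(w for w in tokenize(lyrics) if w not in STOP_WORDS)
    return {genre: counter.most_common(n) for genre, counter in counts.items()}


top = top_words_by_genre(SONGS)
for genre, words in top.items():
    print(genre, words)
```

Grouping with one `Counter` per genre keeps the counts separate, so the most frequent words can be compared across genres directly.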
Advanced Model SVC [3] - Change the key parameters of SVC
a. Kernel. First, we changed the kernel of the SVC to see whether other kernel functions, such as ‘poly’, ‘rbf’, and ‘sigmoid’, are more suitable for this...
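A kernel comparison of this kind might look like the sketch below, assuming scikit-learn and TF-IDF features. The toy `lyrics` and `genres` data, the `compare_kernels` helper, the choice of TF-IDF, and the inclusion of ‘linear’ as a reference point are all assumptions for illustration; the post's actual features and settings are not shown in the excerpt.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy data, invented for illustration only.
lyrics = [
    "loud guitar riff and heavy drums",
    "guitar solo screams all night",
    "heavy drums and a loud amplifier",
    "electric guitar riff on stage",
    "dance with me baby tonight",
    "baby love on the dance floor",
    "sweet love song for my baby",
    "dance all night with my love",
]
genres = ["rock"] * 4 + ["pop"] * 4


def compare_kernels(texts, labels, kernels=("linear", "poly", "rbf", "sigmoid"), cv=2):
    """Return the mean cross-validated accuracy for each SVC kernel."""
    scores = {}
    for kernel in kernels:
        # Vectorizer and classifier are refit inside each fold by the pipeline.
        model = make_pipeline(TfidfVectorizer(), SVC(kernel=kernel))
        scores[kernel] = cross_val_score(model, texts, labels, cv=cv).mean()
    return scores


kernel_scores = compare_kernels(lyrics, genres)
for kernel, score in kernel_scores.items():
    print(f"{kernel}: {score:.2f}")
```

Putting the vectorizer inside the pipeline means the vocabulary is learned only from each training fold, which avoids leaking test-fold words into the features.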
Advanced Model SVC [2] - Add new features
2. Add new features. Add several new features and give them relatively high weight: word count, word length, repetition count, and profanity...
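As a rough sketch of how such hand-crafted features could be computed per song: the exact definitions used in the post are not visible in the excerpt, so the formulas below (average word length, share of repeated tokens, profanity count) and the placeholder `PROFANITY` set are assumptions. The weighting step is not shown.

```python
import re
from collections import Counter

# Placeholder set: hypothetical, stands in for a real profanity lexicon.
PROFANITY = {"damn", "hell"}


def lyric_features(lyrics, profanity=PROFANITY):
    """Compute simple per-song features from raw lyrics text."""
    words = re.findall(r"[a-z']+", lyrics.lower())
    if not words:
        return {"word_count": 0, "avg_word_length": 0.0,
                "repeat_ratio": 0.0, "profanity_count": 0}
    counts = Counter(words)
    # Occurrences beyond the first appearance of each word count as repeats.
    repeated = sum(c - 1 for c in counts.values())
    return {
        "word_count": len(words),
        "avg_word_length": sum(len(w) for w in words) / len(words),
        "repeat_ratio": repeated / len(words),
        "profanity_count": sum(counts[w] for w in profanity),
    }


features = lyric_features("Damn damn the night is long")
print(features)
```

Features like these would typically be scaled and concatenated with the word-count matrix before being passed to the classifier.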
Advanced Model SVC [1] - Decrease the features
Decrease the features. Usually, when the number of features is much larger than the number of samples, the performance of SVC will ...
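One common way to shrink a bag-of-words feature space is univariate selection. The sketch below uses scikit-learn's `SelectKBest` with the chi-squared score; this is an assumed technique chosen for illustration, since the excerpt does not say which reduction method the post used. The toy data and the value `K = 5` are likewise hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.svm import SVC

# Hypothetical toy data, invented for illustration only.
lyrics = [
    "loud guitar riff and heavy drums",
    "guitar solo screams all night",
    "heavy drums and a loud amplifier",
    "electric guitar riff on stage",
    "dance with me baby tonight",
    "baby love on the dance floor",
    "sweet love song for my baby",
    "dance all night with my love",
]
genres = ["rock"] * 4 + ["pop"] * 4

K = 5  # hypothetical number of features to keep

# Build the full word-count matrix: one column per vocabulary word.
vectorizer = CountVectorizer()
X_full = vectorizer.fit_transform(lyrics)

# Keep only the K words whose counts are most associated with the genre label.
selector = SelectKBest(chi2, k=K)
X_small = selector.fit_transform(X_full, genres)
print(X_full.shape, "->", X_small.shape)

# Train the SVC on the reduced feature matrix.
model = SVC(kernel="linear").fit(X_small, genres)
```

In a real evaluation the selector should be fit on the training split only, for example by placing it in a pipeline with the vectorizer and the classifier.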
Advanced Model SVC [0]
SVC has the best accuracy score among the baseline models, so in the following steps we improve it in three ways: 1. Decrease the...
Baseline Model-Classical Classification Models
Besides Naive Bayes, we approach the classification problem with classical classification models: SVM, Logistic Regression, and KNN. We...
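A baseline comparison of the three classifiers could be set up as in the sketch below, assuming scikit-learn with TF-IDF features. The toy train/test split, the `MODELS` dictionary, and the hyperparameters (linear kernel, `max_iter=1000`, `n_neighbors=3`) are assumptions for illustration, not the post's settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy split, invented for illustration only.
train_lyrics = [
    "loud guitar riff and heavy drums",
    "guitar solo screams all night",
    "heavy drums and a loud amplifier",
    "dance with me baby tonight",
    "baby love on the dance floor",
    "sweet love song for my baby",
]
train_genres = ["rock"] * 3 + ["pop"] * 3
test_lyrics = ["electric guitar and drums", "dance with my baby"]
test_genres = ["rock", "pop"]

# Assumed hyperparameters; the post's actual settings are not shown.
MODELS = {
    "SVM": SVC(kernel="linear"),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=3),
}

accuracy = {}
for name, clf in MODELS.items():
    # Each model gets its own pipeline so the same features feed every classifier.
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    pipeline.fit(train_lyrics, train_genres)
    accuracy[name] = pipeline.score(test_lyrics, test_genres)

for name, score in accuracy.items():
    print(f"{name}: {score:.2f}")
```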
Baseline Model-Naive Bayes
We constructed our own Naive Bayes model. We take the lyrics of about 350 songs per genre for training. Note that some songs have...
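For readers unfamiliar with the approach, a hand-written Naive Bayes classifier over word counts can be sketched as below. This is a generic multinomial variant with add-one (Laplace) smoothing, written as an assumption about what such a model might look like; it is not the post's implementation, and the class name and toy training data are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # songs per genre
        self.word_counts = defaultdict(Counter)  # word counts per genre
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training data, invented for illustration only.
nb = NaiveBayes().fit(
    ["guitar riff loud drums", "loud guitar solo",
     "dance baby dance", "baby love dance floor"],
    ["rock", "rock", "pop", "pop"],
)
print(nb.predict("loud guitar"), nb.predict("baby dance"))
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied together for long lyrics.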
Exploratory Statistics
Our cleaned dataset is separated into two parts: tags and lyrics. Dataset introduction. Tag dataset: we have 21167 rows × 9 columns. The...
Data Acquisition and Cleaning
We explored many possible datasets and finally decided to use the Million Song Dataset produced by Columbia University. This dataset...
Project Introduction
After searching through datasets that are available online, we decided to narrow our research topic down to song genre classification....





