
Advanced Model SVC [2] - Add new features

  • Writer: Genre Oracle
  • Nov 2, 2018
  • 1 min read

2. Add new features.

--Add several new features and give them relatively high weight: word count, average word length, repeat times and profanity percentage.



a. Based on word format

In this part, we consider only the form of the words, not their meaning.

According to the exploratory data analysis, the "sum of words in each song", the "average word length in each song" and the "repeat times (= sum of words / number of words)" vary by genre (please see the plots below). So we expect these three features to be good predictors when classifying genre from lyrics.
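As a rough illustration, the three word-format features could be computed per song as in the sketch below. The function name `word_format_features` and the tokenizing regex are assumptions, and "number of words" in the repeat-times formula is read here as the number of distinct words, which the post does not state explicitly.

```python
import re


def word_format_features(lyrics):
    """Return (total_words, avg_word_length, repeat_times) for one song.

    Assumption: repeat_times = total words / distinct words.
    """
    # Lowercase and keep runs of letters and apostrophes as words
    # (a simplifying tokenization, not necessarily the one used in the post).
    words = re.findall(r"[a-z']+", lyrics.lower())
    if not words:
        return 0, 0.0, 0.0
    total = len(words)
    avg_len = sum(len(w) for w in words) / total
    repeat = total / len(set(words))
    return total, avg_len, repeat


# Made-up lyric line: 7 words, 3 of them distinct.
total, avg_len, repeat = word_format_features("la la la love me love me")
print(total, round(avg_len, 2), round(repeat, 2))
```

A song that repeats a short chorus many times gets a high repeat value, while a wordy song with little repetition stays close to 1.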





We add these three features to our linearSVC model and get an accuracy score of 86.62%.



b. Based on word meaning

By exploring the data, we also find that the percentage of profanity words differs greatly between genres.


So we added this feature to our model.
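A minimal sketch of how such a profanity percentage might be computed is shown here. The post does not say which profanity lexicon was used, so the `PROFANITY` set below is only a tiny placeholder, and `profanity_percentage` is a hypothetical helper name.

```python
import re

# Placeholder lexicon for illustration only; the real word list is not given in the post.
PROFANITY = {"damn", "hell"}


def profanity_percentage(lyrics, lexicon=PROFANITY):
    """Percentage of words in the lyrics that appear in the lexicon."""
    words = re.findall(r"[a-z']+", lyrics.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in lexicon)
    return 100.0 * hits / len(words)


# Made-up example: 2 of the 5 words are in the placeholder lexicon.
print(profanity_percentage("damn it all to hell"))
```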

By now, our SVM dataset has 2012 features in all (3 word-format features + 1 word-meaning feature + 1998 word-count features).
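One way to picture the combined feature row is to put the hand-made features in front of the word-count columns. The sketch below does this for a toy vocabulary. The post does not explain how the "relatively high weight" was applied, so the `extra_weight` multiplier is an assumption: scaling a column up is one common way to make it count more in a regularized linear model. `build_feature_row` is a hypothetical helper name.

```python
from collections import Counter


def build_feature_row(tokens, vocabulary, extra_features, extra_weight=1.0):
    """Concatenate scaled hand-made features with vocabulary word counts."""
    counts = Counter(tokens)
    count_part = [counts[w] for w in vocabulary]         # one column per vocabulary word
    extra_part = [extra_weight * v for v in extra_features]
    return extra_part + count_part


# Toy stand-ins: a 3-word vocabulary instead of the full word-count vocabulary.
vocabulary = ["love", "baby", "night"]
tokens = ["love", "love", "night", "road"]
# total words, average word length, repeat times, profanity percentage
extras = [4, 4.25, 4 / 3, 0.0]

row = build_feature_row(tokens, vocabulary, extras, extra_weight=2.0)
print(len(row), row)
```

Rows built this way for every song form the matrix X, and the genre labels form Y; both could then be passed to a linear SVM such as scikit-learn's `LinearSVC`.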

X:


Y:












With linearSVC, we get an accuracy score of 86.63%. The new profanity-percentage feature adds very little accuracy, perhaps because it does the same job as the first three features: it largely separates hip-hop and rap from the other genres, but the remaining 12 genres are still hard to tell apart.
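A per-genre breakdown is one way to check this explanation, since a single overall accuracy number hides which genres benefit. The sketch below computes the share of correctly predicted songs for each true genre. The labels are made up for illustration and are not results from the post, and `per_genre_accuracy` is a hypothetical helper name.

```python
from collections import defaultdict


def per_genre_accuracy(y_true, y_pred):
    """Fraction of songs of each true genre that were predicted correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {genre: correct[genre] / total[genre] for genre in total}


# Made-up labels for illustration only.
y_true = ["hiphop", "hiphop", "rock", "rock", "pop"]
y_pred = ["hiphop", "hiphop", "pop", "rock", "rock"]
print(per_genre_accuracy(y_true, y_pred))
```

If the explanation holds, the hip-hop and rap entries would be high with or without the profanity feature, and the other genres would barely move.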

 
 
 


©2018 by Love and Hate. Proudly created with Wix.com
