
Advanced Model SVC [1] - Decrease the Features

  • Writer: Genre Oracle
  • Nov 2, 2018
  1. Decrease the features.

When the number of features is much larger than the number of samples, an SVC usually does not perform well.

Our baseline model used every word we had, and many of those words occur so rarely that they mean nothing to the model, so we decided to delete them.
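To make the idea concrete, here is a minimal sketch of dropping low-frequency words by keeping only the k most frequent ones. It assumes whitespace tokenization and raw word counts as features; the function names `top_k_vocabulary` and `count_vector` are hypothetical, and the post does not say how the original columns were built.

```python
from collections import Counter


def top_k_vocabulary(documents, k):
    """Return the k most frequent words across all documents."""
    counts = Counter(word for doc in documents for word in doc.lower().split())
    return [word for word, _ in counts.most_common(k)]


def count_vector(document, vocabulary):
    """Turn one document into a list of word counts, one column per vocabulary word."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]


# Toy corpus (made up for illustration, not the post's data).
docs = ["the cat sat", "the cat ran", "the dog"]
vocab = top_k_vocabulary(docs, 2)          # rare words such as "sat" are dropped
row = count_vector("the the dog", vocab)   # "dog" is not in the vocabulary, so it is ignored
```

Every word outside the top k simply loses its column, which is what shrinks the feature matrix.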

First, we decreased the number of columns in steps of 500. That is, we tried 10 feature counts, from 4800 columns down to 300 columns (4800, 4300, 3800, 3300, 2800, 2300, 1800, 1300, 800, 300).
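A sweep like this could be sketched as follows. This assumes scikit-learn, with `CountVectorizer(max_features=k)` keeping the k most frequent words and an `SVC` left at its default settings; the post does not say which library, kernel, or evaluation split was used, so the cross-validation here and the name `sweep_feature_counts` are assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def sweep_feature_counts(texts, labels, feature_counts, cv=3):
    """Return {k: mean cross-validated accuracy} for each vocabulary size k."""
    scores = {}
    for k in feature_counts:
        # max_features=k keeps only the k most frequent words in the corpus.
        # SVC() uses default hyperparameters (an assumption, not the post's setup).
        model = make_pipeline(CountVectorizer(max_features=k), SVC())
        fold_scores = cross_val_score(model, texts, labels, cv=cv, scoring="accuracy")
        scores[k] = float(fold_scores.mean())
    return scores


# Coarse grid from the post: 4800 down to 300 in steps of 500.
coarse_grid = list(range(4800, 0, -500))
```

Calling `sweep_feature_counts(texts, labels, coarse_grid)` on a labeled corpus would give one accuracy per feature count, ready to compare or plot.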


With around 1800 features, we got the best score: 72.6%. We also found that above 1800 features the accuracy did not increase anymore, and below 1800 it dropped a lot.

So there is no point in training the model with too many features: they only add running time and lower the accuracy.

But we were still curious about what happens around 1800. Is 1800 really the best number of features? How does the model perform with 1900 or 1700 features?

To answer these questions, we narrowed the step down to 100 and took a closer look around 1800.
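The finer search can be sketched as a second grid centered on the coarse winner. The window of 500 features on each side is an assumption made for illustration (the post does not state how far around 1800 the search went), and `fine_grid` and `best_feature_count` are hypothetical helper names.

```python
def fine_grid(center, half_width, step):
    """Feature counts from center - half_width to center + half_width, inclusive."""
    return list(range(center - half_width, center + half_width + 1, step))


def best_feature_count(scores):
    """Return the (feature_count, accuracy) pair with the highest accuracy."""
    return max(scores.items(), key=lambda item: item[1])


# Step of 100 around the coarse optimum of 1800 (window size is assumed).
grid = fine_grid(center=1800, half_width=500, step=100)
```

Each value in `grid` would then be evaluated the same way as in the coarse pass, and `best_feature_count` picks the winner from the resulting scores.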




With 2100 features, we got the best accuracy: 72.71%.

 
 
 
