Dimensionality reduction is a well-known technique for limiting the size of the feature space and for discovering latent, meaningful variables in the input data. It is particularly valuable when the raw data is sparse and its processing by machine learning algorithms becomes computationally expensive. Sentiment analysis, in turn, refers to a family of text classification methods that identify the polarity of user opinions in blog posts, reviews, tweets, etc. Since text is naturally very sparse, training classification models is often intractable, which makes dimensionality reduction all the more important. In this paper we study the impact of dimensionality reduction on sentiment analysis classification tasks. Through extensive experimentation with traditional algorithms and benchmark datasets, we verify the general intuition that dimensionality reduction methods significantly reduce data preprocessing times and model training durations, while sacrificing only small amounts of accuracy. At the same time, we highlight several exceptions to this rule, where training times actually increase and the accuracy losses are significant.
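The workflow described above can be illustrated with a minimal sketch. This is not the authors' experimental setup: the toy dataset, the choice of TF-IDF features, TruncatedSVD as the dimensionality reduction method, and logistic regression as the classifier are all assumptions made for illustration, using scikit-learn.

```python
# Minimal sketch: dimensionality reduction before sentiment classification.
# TF-IDF vectors over text are sparse and high-dimensional; TruncatedSVD
# projects them to a small dense space before the classifier is trained.
# The six-document dataset below is a toy stand-in, not a benchmark corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "great movie, loved it", "terrible plot, waste of time",
    "wonderful acting and story", "boring and disappointing",
    "an absolute delight", "awful, would not recommend",
]
labels = [1, 0, 1, 0, 1, 0]  # 1 = positive polarity, 0 = negative

# Pipeline: sparse TF-IDF features -> 2-dimensional SVD projection -> classifier
clf = make_pipeline(
    TfidfVectorizer(),
    TruncatedSVD(n_components=2, random_state=0),
    LogisticRegression(),
)
clf.fit(texts, labels)
print(clf.predict(["loved the wonderful story"]))
```

Varying `n_components` trades accuracy against preprocessing and training time, which is exactly the trade-off the paper quantifies on real benchmark datasets.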