A feature selection and scoring scheme for dimensionality reduction in a machine learning task
Blog Article
Selecting important features is vital in machine learning tasks involving high-dimensional datasets with many features. It reduces the dimensionality of a dataset and improves model performance. Most existing feature selection techniques are restricted in the kind of dataset they can be applied to. This study proposes a feature selection technique based on the statistical lift measure to select important features from a dataset. The proposed technique is a generic approach that can be applied to any binary classification dataset.
The technique successfully determined the most important feature subset and outperformed existing techniques. It was tested on a lung cancer dataset and a happiness classification dataset. Its effectiveness in selecting an important feature subset was evaluated and compared against three existing techniques, namely Chi-Square, Pearson Correlation, and Information Gain. Both the proposed and the existing techniques were evaluated on five machine learning models using four standard evaluation metrics: accuracy, precision, recall, and F1-score. On the lung cancer dataset, logistic regression, decision tree, AdaBoost, gradient boost, and random forest achieved predictive accuracies of 0.919, 0.935, 0.919, 0.935, and 0.935 respectively; on the happiness classification dataset, random forest, k-nearest neighbor, decision tree, gradient boost, and CatBoost achieved 0.758, 0.689, 0.724, 0.655, and 0.689 respectively, outperforming the existing techniques.
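The abstract does not spell out the exact scoring formula, but the statistical lift of a feature with respect to the positive class is conventionally defined as lift = P(y = 1 | feature = 1) / P(y = 1), with values well above 1 indicating an informative feature. A minimal sketch of lift-based feature scoring for a binary classification dataset, assuming binary (0/1) features and using a `lift_scores` helper name chosen for illustration, might look like:

```python
import numpy as np
import pandas as pd

def lift_scores(X, y):
    """Score each binary feature by statistical lift w.r.t. the positive
    class: lift = P(y=1 | feature=1) / P(y=1). (Illustrative sketch; the
    paper's actual scoring scheme may differ in detail.)
    X: DataFrame of 0/1 features; y: array of 0/1 class labels.
    """
    y = np.asarray(y)
    p_class = np.mean(y == 1)  # prior probability of the positive class
    scores = {}
    for col in X.columns:
        mask = X[col].to_numpy() == 1
        if mask.sum() == 0:
            scores[col] = 0.0  # feature never active: no evidence
            continue
        p_cond = np.mean(y[mask] == 1)  # P(y=1 | feature=1)
        scores[col] = p_cond / p_class
    return pd.Series(scores).sort_values(ascending=False)

# Toy example: one feature correlated with the label, one pure noise
rng = np.random.default_rng(0)
y = rng.integers(0, 2, 200)
X = pd.DataFrame({
    "informative": np.where(rng.random(200) < 0.8, y, 1 - y),  # 80% agrees with y
    "noise": rng.integers(0, 2, 200),
})
print(lift_scores(X, y))
```

Features can then be ranked by score and the top-k subset passed to the downstream classifiers, mirroring how the abstract compares the selected subsets across the five models.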