| Skills: Machine Learning | Classification | SMOTE | Model Comparison | Healthcare Analytics |
Migraines affect over one billion people worldwide and are a leading cause of disability. However, migraines are frequently underdiagnosed or misclassified, which can lead to ineffective treatment. This project explores whether machine learning models can accurately classify migraine types using patient characteristics and associated neurological symptoms.
The dataset was sourced from Kaggle and contains 400 patient records with 23 features describing migraine characteristics and symptoms.
Key variables include:
The target variable was Migraine Type, consisting of seven migraine categories.
Data preprocessing included:
Three supervised machine learning models were evaluated:
Model performance was compared using:
Because the classes were imbalanced, models were compared on macro F1-score, which averages the per-class F1 scores with equal weight so that rare migraine categories count as much as common ones.
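To illustrate why macro F1 is more informative than accuracy under class imbalance, here is a minimal pure-Python sketch. The labels `"common"` and `"rare"` and the helper names are hypothetical and are not taken from the project's data or code; in practice scikit-learn's `f1_score(..., average="macro")` computes the same quantity.

```python
def per_class_f1(y_true, y_pred):
    """Return {label: F1}, treating each label one-vs-rest."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class is never hit
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels: nine majority-class cases and one minority-class case
y_true = ["common"] * 9 + ["rare"]
# A degenerate classifier that always predicts the majority class
y_pred = ["common"] * 10

print(f"accuracy: {accuracy(y_true, y_pred):.2f}")   # 0.90
print(f"macro F1: {macro_f1(y_true, y_pred):.2f}")   # 0.47
```

The always-majority classifier looks strong on accuracy but scores an F1 of zero on the rare class, which pulls the macro average down to about 0.47.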
| Model | Accuracy | Macro F1 |
|---|---|---|
| Logistic Regression | 80% | 0.76 |
| Random Forest | 90% | 0.80 |
| XGBoost | 91% | 0.84 |
Both tree-based models outperformed logistic regression, raising accuracy from 80% to 90% or higher and macro F1 from 0.76 to 0.80 or higher. XGBoost achieved the best overall performance (91% accuracy, 0.84 macro F1); its higher macro F1 suggests it identified the minority migraine classes better than the other two models.
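A comparison of this kind can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the project's actual pipeline: the data is synthetic (generated with `make_classification`, using three classes for brevity rather than seven), the split ratio and hyperparameters are arbitrary, and only two of the three models are shown. XGBoost's `XGBClassifier` follows the same `fit`/`predict` interface and could be added to the `models` dictionary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the Kaggle migraine dataset):
# 400 rows, 23 features, 3 imbalanced classes.
X, y = make_classification(
    n_samples=400,
    n_features=23,
    n_informative=10,
    n_redundant=0,
    n_classes=3,
    weights=[0.6, 0.3, 0.1],
    random_state=0,
)

# Stratify so every class appears in both the train and test splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Assumed hyperparameters, chosen only for illustration
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "macro_f1": f1_score(y_test, pred, average="macro"),
    }

for name, scores in results.items():
    print(f"{name}: accuracy={scores['accuracy']:.2f}, macro F1={scores['macro_f1']:.2f}")
```

Scores from this synthetic data will not match the table above; the sketch only shows how accuracy and macro F1 can be collected side by side for each model.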
Feature importance analysis from the Random Forest model identified key predictors including:
These predictors align closely with clinical migraine symptom patterns.
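As a rough sketch of how such a ranking can be obtained, a fitted scikit-learn `RandomForestClassifier` exposes impurity-based scores through its `feature_importances_` attribute. The data and the `feature_i` column names below are synthetic placeholders, not the project's variables.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with placeholder feature names (hypothetical)
X, y = make_classification(
    n_samples=400, n_features=6, n_informative=3, n_redundant=0, random_state=0
)
feature_names = [f"feature_{i}" for i in range(X.shape[1])]

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, y)

# Pair each feature with its importance and sort from most to least important
ranked = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances sum to 1 and can favor features with many distinct values, so `sklearn.inspection.permutation_importance` on held-out data is a common cross-check.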
On this dataset, machine learning models classified migraine types from symptom patterns with up to 91% accuracy. XGBoost provided the strongest performance, suggesting that ensemble models are well suited for multi-class clinical classification tasks.
Python, pandas, scikit-learn, XGBoost, SMOTE, matplotlib