| Skills: Machine Learning | Classification | Feature Engineering | Model Comparison | Healthcare Analytics | Python (pandas, scikit-learn) |
Maternal health complications remain a significant global health concern. Early detection of high-risk pregnancies can improve prenatal care, enable early intervention, and reduce maternal and infant mortality.
This project develops machine learning models to classify pregnancy risk levels using maternal health indicators. Risk levels were categorized as:
The objective was to determine whether physiological measurements routinely monitored during pregnancy could accurately predict maternal risk levels.
The dataset was obtained from Kaggle and contains maternal health indicators commonly monitored during pregnancy.
Key variables included:
The target variable was RiskLevel, representing pregnancy risk categories.
Several preprocessing steps were performed:
Exploratory data analysis included visualizations examining relationships between maternal age, blood pressure, glucose levels, and pregnancy risk.
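A per-class summary is one quick way to look at these relationships before plotting. The snippet below is a minimal sketch on a tiny made-up sample, not the project's data. Only `RiskLevel` is named in this write-up; the other column names (`Age`, `SystolicBP`, `BloodGlucose`) and the label strings are assumptions for illustration.

```python
import pandas as pd

# Tiny made-up sample. Only "RiskLevel" comes from the write-up;
# the other column names and all values are illustrative assumptions.
df = pd.DataFrame({
    "Age":          [20, 25, 30, 35, 40, 45],
    "SystolicBP":   [110, 120, 130, 140, 150, 160],
    "BloodGlucose": [6.0, 7.0, 8.0, 12.0, 14.0, 16.0],
    "RiskLevel":    ["mid", "mid", "mid", "high", "high", "high"],
})

# Mean of each indicator within each risk category.
indicator_cols = ["Age", "SystolicBP", "BloodGlucose"]
summary = df.groupby("RiskLevel")[indicator_cols].mean()
print(summary)
```

Large gaps between the per-class means hint at which indicators separate the categories; the same grouped frame can be passed to matplotlib for box plots or bar charts.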
A Random Forest model was used to evaluate feature importance. The most influential predictors were:
Heart rate was removed due to weak correlation with the target variable.
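The feature-selection step can be sketched as follows. This is an illustration on synthetic data, not the project's code: the column names are assumptions, the target is binary for simplicity, and the labels are built so that `HeartRate` carries no signal, which mimics the weak relationship described above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 300

# Synthetic stand-in features (assumed names, made-up distributions).
X = pd.DataFrame({
    "SystolicBP":   rng.normal(120, 15, n),
    "BloodGlucose": rng.normal(8, 3, n),
    "HeartRate":    rng.normal(75, 8, n),
})

# The label depends only on blood pressure and glucose,
# so HeartRate is pure noise in this sketch.
y = ((X["SystolicBP"] > 125) | (X["BloodGlucose"] > 10)).astype(int)

# Fit a forest and rank features by impurity-based importance.
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
importances = pd.Series(
    forest.feature_importances_, index=X.columns
).sort_values(ascending=False)
print(importances)

# Absolute correlation of each feature with the target.
correlations = X.corrwith(y).abs()
print(correlations)

# Drop the weakest feature before model training.
X_reduced = X.drop(columns=["HeartRate"])
```

With a multi-class target such as `RiskLevel`, the labels would need an ordinal encoding before a correlation like this is meaningful.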
Four machine learning models were evaluated on this multi-class classification problem: Random Forest, Support Vector Machine (SVM), XGBoost, and Logistic Regression.
Model performance was compared using classification accuracy.
| Model | Accuracy |
|---|---|
| Random Forest | 70% |
| SVM | 68% |
| XGBoost | 65% |
| Logistic Regression | 64% |
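
A comparison of this kind can be sketched with scikit-learn as shown below. It is a hedged illustration, not the project's pipeline: the data comes from `make_classification` rather than the Kaggle dataset, the split and hyperparameters are assumptions, and scikit-learn's `GradientBoostingClassifier` is swapped in for XGBoost to avoid the extra dependency. The accuracies it prints will not match the table.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic three-class data standing in for the maternal health dataset.
X, y = make_classification(
    n_samples=600, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# SVM and logistic regression are scale-sensitive, so they get a scaler.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    # Stand-in for XGBoost (assumption made to keep the sketch dependency-free).
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

# Fit each model on the training split and score it on the held-out split.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {acc:.2%}")
```

Stratifying the split keeps the class proportions the same in the training and test sets, which matters when one risk category is much rarer than the others.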
The Random Forest model achieved the highest accuracy (70%), narrowly ahead of SVM (68%), and was the most effective model for identifying high-risk pregnancies.
However, all models struggled to accurately classify mid-risk pregnancies, suggesting that additional clinical variables may be necessary for improved classification.
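Overall accuracy can hide this kind of per-class weakness, so a confusion matrix and per-class recall are useful complements. The example below uses hand-made labels rather than the project's predictions, and the label strings `low`, `mid`, and `high` are assumptions.

```python
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Hand-made labels for illustration only (not the project's predictions).
labels = ["low", "mid", "high"]
y_true = ["low"] * 4 + ["mid"] * 4 + ["high"] * 4
y_pred = [
    "low", "low", "low", "mid",      # true low
    "low", "mid", "high", "mid",     # true mid
    "high", "high", "high", "mid",   # true high
]

# Rows are true classes, columns are predicted classes, in `labels` order.
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)

# Recall per class: the fraction of each true class predicted correctly.
recalls = recall_score(y_true, y_pred, labels=labels, average=None)
per_class_recall = dict(zip(labels, recalls))
print(per_class_recall)

accuracy = accuracy_score(y_true, y_pred)
print(f"accuracy: {accuracy:.3f}")
```

In this toy case the overall accuracy is 8 out of 12, yet only half of the mid-risk cases are recovered, and the misclassified ones are split between the two neighboring categories.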
Machine learning models show promise for predicting maternal health risk levels using physiological indicators collected during pregnancy. While the Random Forest model performed best in this analysis, additional clinical data and further validation would be required before real-world deployment.
Python, pandas, scikit-learn, XGBoost, matplotlib