| Skills: Machine Learning | Regression Modeling | Healthcare Analytics | Feature Engineering | Model Evaluation |
This project investigates whether clinical and demographic variables available at NICU admission can predict neonatal length of stay (LOS). Accurate LOS predictions could help hospitals anticipate bed utilization and staffing needs, and support operational planning in neonatal intensive care units.
The analysis used the Neonatal Sepsis Care dataset, which contains neonatal clinical and demographic information collected at NICU admission.
Key variables included:
The target variable was length of stay in days.
Three regression models were evaluated:
Data preparation included:
Models were evaluated using:
All models performed similarly, with an average prediction error of approximately 4 days. However, the R² values were negative, meaning the models fit the evaluation data worse than a baseline that always predicts the mean LOS. This indicates that admission-only variables did not capture enough information to explain variation in NICU length of stay.
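To make the negative R² result concrete, the following minimal sketch computes a mean absolute error and R² by hand and compares a model's predictions against a constant mean baseline. The LOS values are made up for illustration and are not taken from the project's dataset. The helper names `mae` and `r_squared` are hypothetical, and treating the project's "average prediction error" as a mean absolute error is an assumption.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction error, in days."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative (made-up) lengths of stay in days, not project data.
y_true = [3, 5, 8, 12, 20]
# Hypothetical model predictions that track the true values poorly.
y_pred = [10, 9, 11, 8, 10]
# Baseline: always predict the mean observed LOS.
mean_los = sum(y_true) / len(y_true)
y_baseline = [mean_los] * len(y_true)

model_mae = mae(y_true, y_pred)
model_r2 = r_squared(y_true, y_pred)
baseline_r2 = r_squared(y_true, y_baseline)

print(f"model MAE: {model_mae:.2f} days")
print(f"model R^2: {model_r2:.3f}")
print(f"baseline R^2: {baseline_r2:.3f}")
```

Because the baseline's residual sum of squares equals the total sum of squares, its R² is zero by construction. Any model whose squared errors exceed that total therefore scores below zero, even when its average error in days looks moderate.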
Feature importance analysis showed that:
were the most influential predictors.
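As a rough illustration of how such a ranking can be produced, the sketch below fits a tree ensemble on synthetic data and sorts the features by importance. Everything here is an assumption for demonstration: the source does not say which model produced the importances, so scikit-learn's `RandomForestRegressor` is used as a stand-in, and the data and the feature names (`feature_a`, `feature_b`, `feature_c`) are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: only the first column drives the target (assumption for demo).
rng = np.random.default_rng(0)
n_samples = 300
X = rng.normal(size=(n_samples, 3))
y = 5.0 * X[:, 0] + rng.normal(scale=0.5, size=n_samples)

# Hypothetical feature names, not the project's actual variables.
feature_names = ["feature_a", "feature_b", "feature_c"]

# Stand-in model; the project's actual model choice is not specified here.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Impurity-based importances, normalized by scikit-learn to sum to 1.
importances = dict(zip(feature_names, model.feature_importances_))
ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)

for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Impurity-based importances describe how the fitted model splits on each feature, not how well it predicts. When R² is negative, a ranking like this is best read as describing the model rather than the underlying clinical relationships.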
Python, pandas, scikit-learn, XGBoost, matplotlib