How CardioCare-AI Works
A clear, honest look at the dataset, algorithm, methodology, and measured performance powering every prediction.
Gradient Boosting Classifier
We evaluated Logistic Regression, Random Forest, SVM, and Neural Networks against the Gradient Boosting Classifier. Gradient Boosting achieved the best balance of accuracy (73%) and interpretability: every prediction comes with feature importance scores.
Built for Explainability
Hundreds of small trees are trained sequentially, each correcting the errors of the ones before it, which typically yields higher accuracy than any single tree.
Each prediction exposes ranked feature importances, showing which factors (BP, age, cholesterol) contributed most to the risk score.
Handles numeric vitals alongside categorical lifestyle flags directly, without the loss in quality that one-hot encoding can cause for tree-based models.
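To make the "each tree corrects the previous one" idea concrete, here is a minimal sketch of boosting with one-split trees (stumps) on a single feature. It is an illustration only, not the CardioCare-AI training code: it uses squared-error loss rather than the log-loss a classifier would use, and the toy ages and labels, the function names, and the `n_rounds` and `learning_rate` defaults are all assumptions.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimises squared error on the residuals."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue  # a split must leave samples on both sides
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Fit stumps one after another, each on the residuals left by the ensemble so far."""
    base = sum(ys) / len(ys)          # start from the mean label
    predictions = [base] * len(xs)
    stumps, mse_history = [], []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
        mse_history.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    return base, stumps, mse_history


# Toy data (made up): age in years and a 0/1 outcome.
ages = [35, 40, 45, 50, 55, 60, 65]
labels = [0, 0, 0, 1, 0, 1, 1]
base, stumps, mse_history = boost(ages, labels)
```

The training error recorded in `mse_history` falls from round to round, which is the behaviour the description above refers to. A production model would add more features, deeper trees, and held-out validation to avoid overfitting.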
11 Clinical Features
How a Prediction Is Made
You fill in 11 clinical and lifestyle fields in the prediction form.
Each field is range-checked; systolic must exceed diastolic.
Values are transformed using the StandardScaler fitted on the training split of the 70k-record dataset.
GBM outputs a probability score and 3-tier risk classification.
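The steps above can be sketched in a few lines. This is a hedged illustration, not the application's actual code: the field names, the plausibility ranges, the training means and standard deviations, the tier labels other than "Low", and the 0.33 / 0.66 cut-offs are all hypothetical, and the model call itself is omitted.

```python
# Hypothetical field names and plausibility ranges (assumptions, not the real form's limits).
RANGES = {
    "age_years": (18, 100),
    "systolic": (70, 250),
    "diastolic": (40, 150),
}

# Made-up training statistics standing in for a fitted StandardScaler.
MEANS = {"age_years": 53.0, "systolic": 127.0, "diastolic": 81.0}
STDS = {"age_years": 7.0, "systolic": 17.0, "diastolic": 10.0}


def validate(record):
    """Return a list of validation errors; an empty list means the record is acceptable."""
    errors = []
    for field, (low, high) in RANGES.items():
        if not low <= record[field] <= high:
            errors.append(f"{field} must be between {low} and {high}")
    if record["systolic"] <= record["diastolic"]:
        errors.append("systolic must exceed diastolic")
    return errors


def standardize(record, means, stds):
    """Apply z = (x - mean) / std per field, the transform a fitted StandardScaler performs."""
    return {field: (record[field] - means[field]) / stds[field] for field in means}


def risk_tier(probability, low_cut=0.33, high_cut=0.66):
    """Map a probability to one of three tiers (cut-offs are assumed, not the app's)."""
    if probability < low_cut:
        return "Low"
    if probability < high_cut:
        return "Moderate"
    return "High"


patient = {"age_years": 60, "systolic": 144, "diastolic": 91}
errors = validate(patient)
scaled = standardize(patient, MEANS, STDS)
```

The key design point is that the scaler statistics come from the training data and are reused unchanged at prediction time, so a new patient's values are expressed on the same scale the model was trained on.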
Model Performance
Evaluated on roughly 20,000 unseen records (the classification report below lists a total support of 20,267). All charts use static values computed from the test set.
📋 Classification Report
Per-class precision, recall, and F1-score from the test set evaluation.
| Metric | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| Class 0 (No Risk) | 0.71 | 0.81 | 0.76 | 10,353 |
| Class 1 (Risk) | 0.77 | 0.65 | 0.70 | 9,914 |
| Accuracy | | | 0.73 | 20,267 |
| Macro Avg | 0.74 | 0.73 | 0.73 | 20,267 |
| Weighted Avg | 0.74 | 0.73 | 0.73 | 20,267 |
🔲 Confusion Matrix
Shows how predictions align with actual outcomes. Green = correct, Red = errors.
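The classification report and the confusion matrix carry the same information, and one can be derived from the other. The sketch below shows the arithmetic. The four cell counts are back-calculated from the rounded recall and support values in the report, so they are approximations for illustration and not the published confusion matrix; the function and variable names are assumptions.

```python
def class_metrics(tp, fp, fn):
    """Precision, recall, and F1 for one class, from its confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Support values come from the classification report.
support_no_risk, support_risk = 10_353, 9_914

# Approximate cells, reconstructed from the rounded recalls (0.81 and 0.65).
tn = round(0.81 * support_no_risk)   # no-risk records predicted as no risk
tp = round(0.65 * support_risk)      # at-risk records predicted as at risk
fp = support_no_risk - tn            # no-risk records flagged as at risk
fn = support_risk - tp               # at-risk records that were missed

risk = class_metrics(tp, fp, fn)         # Class 1 treated as the positive class
no_risk = class_metrics(tn, fn, fp)      # Class 0 treated as the positive class
accuracy = (tp + tn) / (support_no_risk + support_risk)
```

Rounded to two decimals, the reconstructed counts reproduce the reported precision, recall, F1, and accuracy figures, which is a useful consistency check on the table.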
📉 ROC Curve
Receiver Operating Characteristic: measures the model's ability to distinguish between the two classes. An Area Under Curve (AUC) closer to 1.0 is better; 0.5 is no better than chance.
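AUC has a direct interpretation: it is the probability that a randomly chosen at-risk record receives a higher score than a randomly chosen no-risk record, with ties counted as half. A minimal sketch of that definition follows; the four labels and scores are toy values, not model output, and the function name is an assumption.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 2 no-risk and 2 at-risk records.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(toy_labels, toy_scores)  # 3 of the 4 pairs are ordered correctly -> 0.75
```

This pairwise form is quadratic in the number of records, so it suits small examples; libraries compute the same quantity from sorted scores.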
📊 Feature Importance
Which input features most influenced the model's predictions — ranked by Gini importance from the Gradient Boosting ensemble.
🎯 Precision-Recall Curve
Critical for medical models — the tradeoff between detecting all real cases (recall) and avoiding false alarms (precision). Average Precision (AP) = 0.8.
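Average Precision summarises the precision-recall curve as the mean of the precision values measured at each rank where a true case is found. The sketch below illustrates that calculation on toy data; it assumes all scores are distinct (tied scores need extra handling), and the labels, scores, and function name are made up for the example.

```python
def average_precision(labels, scores):
    """Mean of precision-at-rank over the ranks where a positive record appears."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(labels)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision among the top `rank` records
    return precision_sum / total_positives


# Toy example: positives land at ranks 1 and 3, so AP = (1/1 + 2/3) / 2.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
ap = average_precision(toy_labels, toy_scores)
```

Unlike accuracy, this metric ignores true negatives, which is why it is informative when the cost of missing a real case matters most.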
📦 Risk Score Distribution
How the model spreads its confidence scores across the test set. A bimodal shape (peaks at both ends) suggests the model makes decisive predictions rather than hedging near 50%.
👥 CVD Risk by Age Group
Prevalence of cardiovascular disease by age group in the training dataset. This helps explain why age ranks as the #2 most important feature: risk more than doubles every decade after 40.
🔥 Feature Correlation
Shows Pearson correlation between all input features. Red = negative, Blue = positive. Hovering reveals exact values.
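Each cell of the correlation chart is a Pearson coefficient between two feature columns. As a minimal sketch of what that number means, the function below computes it from scratch; the function name and the toy columns are assumptions, not values from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Toy columns: one pair rises together, the other moves in opposite directions.
rising = pearson([1, 2, 3, 4], [2, 4, 6, 8])
opposed = pearson([1, 2, 3, 4], [8, 6, 4, 2])
```

Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate little linear relationship. Pearson correlation does not capture non-linear dependence, so a weakly correlated feature can still matter to a tree-based model.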
Limitations & Disclaimer
Not a medical diagnosis. CardioCare-AI is an educational tool. Results are probabilistic estimates and must not replace advice from a qualified healthcare professional.
Dataset scope. Trained on a single dataset of 70,000 records. May not generalise equally across all ethnicities, geographies, or clinical settings.
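Known accuracy ceiling. At 73% accuracy, roughly 27% of test-set predictions were incorrect, and recall for the at-risk class was 0.65, so about one in three true at-risk cases was missed. A "Low Risk" result does not mean you are free of cardiovascular disease.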