System Transparency

How CardioCare-AI Works

A clear, honest look at the dataset, algorithm, methodology, and measured performance powering every prediction.

Algorithm

Gradient Boosting Classifier

We evaluated Logistic Regression, Random Forest, SVM, and Neural Networks. The Gradient Boosting Classifier achieved the best balance of accuracy (73%) and interpretability — every prediction comes with feature importance scores.

Training Samples: 49,000+
Test Samples: 21,000
Input Features: 11
Model Version: v1.0
Why This Algorithm

Built for Explainability

🌲
Ensemble of Decision Trees

Hundreds of small trees are trained sequentially, each one fitted to the errors left by the trees before it. The combined ensemble is typically more accurate than any single tree.
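To make the "each tree corrects the previous ones" idea concrete, here is a minimal from-scratch sketch of gradient boosting for binary classification on a single feature, using one-split trees (stumps). It is illustrative only and not the CardioCare-AI implementation: the toy data, the number of rounds, and the learning rate are hypothetical, and each leaf simply takes the mean of the residuals, whereas libraries such as scikit-learn refine leaf values further.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, residuals):
    """Fit a one-split tree to the residuals by minimizing squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: no split is possible
        mean = sum(residuals) / len(residuals)
        return xs[0], mean, mean
    return best[1], best[2], best[3]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosting(xs, ys, n_rounds=50, learning_rate=0.3):
    """Gradient boosting with log-loss; assumes both classes appear in ys."""
    base_rate = sum(ys) / len(ys)
    base_score = math.log(base_rate / (1 - base_rate))  # log-odds of the positive class
    scores = [base_score] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of log-loss w.r.t. the raw score is y - p.
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink each correction by the learning rate before adding it.
        scores = [s + learning_rate * predict_stump(stump, x) for s, x in zip(scores, xs)]
    return base_score, learning_rate, stumps


def predict_proba(model, x):
    base_score, learning_rate, stumps = model
    raw = base_score + learning_rate * sum(predict_stump(s, x) for s in stumps)
    return sigmoid(raw)


# Hypothetical toy data: systolic BP readings and a 0/1 risk label.
model = fit_boosting([110, 115, 120, 125, 130, 140, 150, 160], [0, 0, 0, 0, 1, 1, 1, 1])
print(round(predict_proba(model, 160), 3), round(predict_proba(model, 110), 3))
```

Each round fits a new stump to whatever error the current ensemble still makes, so later stumps add progressively smaller corrections as the predictions approach the labels.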

⚖️
Feature Importance Scores

The model exposes ranked feature importance scores, showing which factors (BP, Age, Cholesterol) carry the most weight in its risk scores.

🧮
Handles Mixed Data

Works with numeric vitals alongside categorical lifestyle flags (Smoking, Alcohol, Activity): tree splits use these flags directly, so they need no one-hot encoding.

Inputs

11 Clinical Features

🎂Age
⚧️Gender
📏Height
⚖️Weight
💓Systolic BP
💗Diastolic BP
🩸Cholesterol
🍬Glucose
🚭Smoking
🍷Alcohol
🏃Activity
Process

How a Prediction Is Made

01
📝
Input

You fill in 11 clinical and lifestyle fields in the prediction form.

02
Validate

Each field is range-checked; systolic must exceed diastolic.

03
⚙️
Scale

Values are transformed using the StandardScaler fitted on 70k training records.

04
🤖
Predict

The Gradient Boosting model outputs a probability score, which is mapped to a 3-tier risk classification.
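The four steps above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the app's code: the field ranges in `FIELD_RANGES`, the means and standard deviations in `SCALER_STATS`, the `toy_probability` scoring function (a stand-in logistic score, not the trained Gradient Boosting model), and the tier cut-offs and labels in `risk_tier` are all hypothetical, and only three of the 11 fields are shown.

```python
import math

# Hypothetical plausible ranges; the app's real limits are not documented here.
FIELD_RANGES = {
    "age_years": (18, 100),
    "systolic_bp": (80, 250),
    "diastolic_bp": (40, 150),
}

# Hypothetical (mean, std) pairs standing in for a fitted StandardScaler.
SCALER_STATS = {
    "age_years": (50.0, 10.0),
    "systolic_bp": (125.0, 15.0),
    "diastolic_bp": (80.0, 10.0),
}


def validate(record):
    """Step 02: range-check each field and require systolic > diastolic."""
    errors = []
    for field, (low, high) in FIELD_RANGES.items():
        value = record[field]
        if not low <= value <= high:
            errors.append(f"{field}={value} outside [{low}, {high}]")
    if record["systolic_bp"] <= record["diastolic_bp"]:
        errors.append("systolic_bp must exceed diastolic_bp")
    return errors


def standardize(record):
    """Step 03: z = (x - mean) / std, the per-feature transform StandardScaler applies."""
    return {f: (record[f] - mean) / std for f, (mean, std) in SCALER_STATS.items()}


def toy_probability(z):
    """Hypothetical stand-in for the trained model (NOT the real GBM)."""
    score = 0.8 * z["systolic_bp"] + 0.5 * z["age_years"] + 0.2 * z["diastolic_bp"]
    return 1.0 / (1.0 + math.exp(-score))


def risk_tier(p, low_cut=0.35, high_cut=0.65):
    """Step 04: map a probability to one of three tiers (hypothetical cut-offs)."""
    if p < low_cut:
        return "Low"
    if p < high_cut:
        return "Moderate"
    return "High"


def predict(record):
    errors = validate(record)
    if errors:
        raise ValueError("; ".join(errors))
    p = toy_probability(standardize(record))
    return p, risk_tier(p)


print(predict({"age_years": 62, "systolic_bp": 150, "diastolic_bp": 95}))
```

Validation runs before scaling so that out-of-range inputs are rejected rather than silently turned into extreme z-scores.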

Analytics

Model Performance

Evaluated on 20,267 held-out test records. All charts use static values computed from the test set.

Accuracy: 73% (on 20,267 held-out test records)
AUC-ROC: 0.80 (area under the ROC curve)
Precision (Risk): 77% (of cases predicted as risk, 77% were confirmed)
Recall (Risk): 65% (of actual risk cases, 65% were detected)

📋 Classification Report

Per-class precision, recall, and F1-score from the test set evaluation.

Metric | Precision | Recall | F1-Score | Support
Class 0 (No Risk) | 0.71 | 0.81 | 0.76 | 10,353
Class 1 (Risk) | 0.77 | 0.65 | 0.70 | 9,914
Accuracy | | | 0.73 | 20,267
Macro Avg | 0.74 | 0.73 | 0.73 | 20,267
Weighted Avg | 0.74 | 0.73 | 0.73 | 20,267
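The two averaged rows differ only in weighting: Macro Avg is the plain mean of the per-class scores, while Weighted Avg weights each class by its support. A short sketch using the per-class values from the table; because those values are already rounded to two decimals, the recomputed averages are approximate, but they match the Macro Avg and Weighted Avg rows at two decimals.

```python
def macro_and_weighted(per_class, supports):
    """Return (unweighted mean, support-weighted mean) of per-class scores."""
    total = sum(supports)
    macro = sum(per_class) / len(per_class)
    weighted = sum(v * n for v, n in zip(per_class, supports)) / total
    return macro, weighted


supports = [10_353, 9_914]  # Class 0 (No Risk), Class 1 (Risk)
per_metric = {
    "precision": [0.71, 0.77],
    "recall": [0.81, 0.65],
    "f1": [0.76, 0.70],
}
for name, values in per_metric.items():
    macro, weighted = macro_and_weighted(values, supports)
    print(f"{name}: macro={macro:.2f} weighted={weighted:.2f}")
```

Because the two classes are nearly balanced (10,353 vs 9,914), macro and weighted averages land close together here.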

🔲 Confusion Matrix

Shows how predictions align with actual outcomes. Green = correct, Red = errors.

Actual \ Predicted | No Risk | Risk
No Risk | 8,180 (TN) | 2,173 (FP)
Risk | 3,185 (FN) | 6,729 (TP)
Total: 20,267 | Correct: 73.6%
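The headline metrics are simple ratios of the four confusion-matrix counts, as the plain-Python sketch below shows. Applied to the counts in the matrix, it gives accuracy of about 0.736 (the 73.6% shown), Risk-class precision of about 0.756, and Risk-class recall of about 0.679; the classification report lists 0.77 and 0.65 for those two, so the matrix and the report do not agree exactly.

```python
def metrics_from_counts(tn, fp, fn, tp):
    """Derive standard binary-classification metrics from confusion-matrix counts."""
    total = tn + fp + fn + tp
    precision = tp / (tp + fp)  # share of predicted Risk that is actually Risk
    recall = tp / (tp + fn)     # share of actual Risk that was detected
    return {
        "accuracy": (tp + tn) / total,
        "precision_risk": precision,
        "recall_risk": recall,
        "specificity": tn / (tn + fp),  # recall of Class 0 (No Risk)
        "f1_risk": 2 * precision * recall / (precision + recall),
    }


metrics = metrics_from_counts(tn=8_180, fp=2_173, fn=3_185, tp=6_729)
print({name: round(value, 3) for name, value in metrics.items()})
```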

📉 ROC Curve

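Receiver Operating Characteristic: plots the true positive rate against the false positive rate across all decision thresholds, measuring the model's ability to distinguish between the two classes. An area under the curve (AUC) closer to 1.0 is better; 0.5 corresponds to random guessing.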
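AUC has a useful equivalent reading: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as half. The stdlib-only sketch below computes it that way on hypothetical toy scores. It compares every positive with every negative, so for a test set of about 20,000 records a sort-based version (or scikit-learn's `roc_auc_score`) would be the practical choice.

```python
def roc_auc(labels, scores):
    """AUC as the probability a random positive outscores a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = Risk) and model scores.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```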

📊 Feature Importance

Which input features most influenced the model's predictions, ranked by impurity-based importance (mean decrease in impurity, often called Gini importance) from the Gradient Boosting ensemble.
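Impurity-based importance credits each feature with the sample-weighted impurity decrease of every split that uses it. To our understanding, scikit-learn's gradient boosting computes these unnormalized totals per tree, averages them across trees, and then normalizes the result to sum to 1; treat that as an assumption to check against the library version in use. The sketch below reproduces that aggregation on hand-written split records. The feature names, sample counts, and impurity values are hypothetical; a real fitted model would supply them.

```python
from collections import defaultdict

# Each split: (feature, n_node, impurity_node, n_left, impurity_left, n_right, impurity_right).
# The first split in each tree's list is assumed to be the root.


def tree_impurity_decrease(splits):
    """Weighted impurity decrease per feature for one tree, scaled by the root's sample count."""
    root_samples = splits[0][1]
    decrease = defaultdict(float)
    for feature, n, imp, n_left, imp_left, n_right, imp_right in splits:
        decrease[feature] += n * imp - n_left * imp_left - n_right * imp_right
    return {f: v / root_samples for f, v in decrease.items()}


def ensemble_importances(trees):
    """Average per-tree decreases, normalize to sum to 1, and rank in descending order."""
    summed = defaultdict(float)
    for splits in trees:
        for feature, value in tree_impurity_decrease(splits).items():
            summed[feature] += value
    averaged = {f: v / len(trees) for f, v in summed.items()}
    total = sum(averaged.values())
    normalized = {f: v / total for f, v in averaged.items()}
    return dict(sorted(normalized.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical split records for two small regression trees (impurity = MSE of residuals).
trees = [
    [("systolic_bp", 100, 0.25, 60, 0.10, 40, 0.05), ("age", 60, 0.10, 30, 0.04, 30, 0.02)],
    [("systolic_bp", 100, 0.20, 50, 0.12, 50, 0.08), ("cholesterol", 50, 0.12, 25, 0.06, 25, 0.05)],
]
print(ensemble_importances(trees))
```

These scores are global: they describe the ensemble as a whole, not the contribution of each feature to one individual prediction.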

🎯 Precision-Recall Curve

Critical for medical models: shows the tradeoff between detecting all real cases (recall) and avoiding false alarms (precision). Average Precision (AP) = 0.8.
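Average Precision summarizes the precision-recall curve as the precision at each threshold weighted by the gain in recall, AP = Σ (R_n − R_{n−1}) P_n, which is the definition scikit-learn's `average_precision_score` uses. A minimal stdlib sketch on hypothetical toy scores, assuming no tied scores (ties would have to be grouped into a single threshold):

```python
def average_precision(labels, scores):
    """AP = sum of (recall gain) x (precision) over thresholds, highest score first."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    total_positives = sum(labels)
    true_positives = 0
    ap = 0.0
    prev_recall = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:  # recall only increases when a positive is reached
            true_positives += 1
            precision = true_positives / rank
            recall = true_positives / total_positives
            ap += (recall - prev_recall) * precision
            prev_recall = recall
    return ap


# Hypothetical labels (1 = Risk) and model scores.
print(round(average_precision([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]), 4))
```

Unlike AUC, AP ignores true negatives, so it focuses on how cleanly the model ranks the Risk class.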

📦 Risk Score Distribution

How the model spreads its probability scores across the held-out test set. A bimodal shape (peaks at both ends) indicates that the model makes decisive predictions rather than hedging near 50%.
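One simple way to check the shape of a score distribution is to bin the predicted probabilities and measure how much mass sits near 0 or 1. A sketch on hypothetical scores; the 10-bin layout and the 0.2/0.8 cut-offs are arbitrary choices, not values from the CardioCare-AI pipeline.

```python
def score_histogram(probabilities, n_bins=10):
    """Count probabilities in equal-width bins over [0, 1]."""
    counts = [0] * n_bins
    for p in probabilities:
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes into the last bin
        counts[index] += 1
    return counts


def share_in_extremes(probabilities, low=0.2, high=0.8):
    """Fraction of scores that are confidently low or confidently high."""
    return sum(1 for p in probabilities if p < low or p > high) / len(probabilities)


scores = [0.05, 0.1, 0.15, 0.5, 0.85, 0.9, 0.95, 1.0]  # hypothetical model outputs
print(score_histogram(scores), share_in_extremes(scores))
```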

👥 CVD Risk by Age Group

Prevalence of cardiovascular disease by age group in the training dataset. This helps explain why age ranks as the second most important feature: risk more than doubles every decade after 40.
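Prevalence per age group is the share of positive labels among the records in each bucket. A stdlib sketch on hypothetical records, assuming ages are stored as whole years and groups are decades:

```python
from collections import defaultdict


def prevalence_by_decade(ages, has_cvd):
    """Return {"40-49": share_of_cases, ...} for each decade present in the data."""
    counts = defaultdict(lambda: [0, 0])  # decade start -> [cases, total]
    for age, label in zip(ages, has_cvd):
        decade = (age // 10) * 10
        counts[decade][0] += label
        counts[decade][1] += 1
    return {f"{d}-{d + 9}": cases / total for d, (cases, total) in sorted(counts.items())}


# Hypothetical records: age in years and a 0/1 CVD label.
ages = [34, 38, 45, 47, 49, 52, 55, 58, 61, 63]
labels = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]
print(prevalence_by_decade(ages, labels))
```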

🔥 Feature Correlation

Shows the Pearson correlation between every pair of input features. Red = negative, Blue = positive. Hovering reveals exact values.
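Pearson's r is the covariance of two features divided by the product of their standard deviations, so it ranges from −1 to 1. A stdlib sketch that builds a full correlation matrix from hypothetical columns (Python 3.10+ also ships `statistics.correlation` for the pairwise case):

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Nested dict of pairwise Pearson correlations between named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical feature columns for four patients.
columns = {
    "systolic_bp": [110, 120, 130, 140],
    "diastolic_bp": [70, 78, 85, 92],
    "activity": [1, 1, 0, 0],
}
for name, row in correlation_matrix(columns).items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```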

⚠️

Limitations & Disclaimer

Not a medical diagnosis. CardioCare-AI is an educational tool. Results are probabilistic estimates and must not replace advice from a qualified healthcare professional.

Dataset scope. Trained on a single dataset of 70,000 records. May not generalise equally across all ethnicities, geographies, or clinical settings.

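Known accuracy ceiling. At 73% test accuracy, roughly 27% of predictions can be expected to be wrong. With 65% recall, about 35% of actual risk cases in the test set were classified as no risk, so a "Low Risk" result does not mean you are free of cardiovascular disease.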
