Solution to Assignment 5

%pip install pgmpy
# Q1) Build your Bayesian Network using pgmpy as shown in class. Make use of the Variable Elimination method.

from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Define the structure of the Bayesian Network

model_diabetes = BayesianNetwork([
    ('Age', 'Diabetes'),
    ('Lifestyle', 'Diabetes'),
    ('FamilyHistory', 'Diabetes'),
    ('Diabetes', 'BloodTests'),
    ('Diabetes', 'CardioRisk'),
    ('Cholesterol', 'CardioRisk'),
    ('Hypertension', 'CardioRisk')
])
# Define the Conditional Probability Distributions (CPDs)
cpd_age = TabularCPD(variable='Age', variable_card=3,
                     values=[[0.4], [0.4], [0.2]], state_names={'Age': ['Young', 'Middle-aged', 'Elderly']})

cpd_lifestyle = TabularCPD(variable='Lifestyle', variable_card=2,
                           values=[[0.7], [0.3]], state_names={'Lifestyle': ['Unhealthy', 'Healthy']})

cpd_familyhistory = TabularCPD(variable='FamilyHistory', variable_card=2,
                               values=[[0.85], [0.15]], state_names={'FamilyHistory': ['Absent', 'Present']})


cpd_cholesterol = TabularCPD(variable='Cholesterol', variable_card=2,
                             values=[[0.5], [0.5]],
                             state_names={'Cholesterol': ['Normal', 'High']})

cpd_hypertension = TabularCPD(variable='Hypertension', variable_card=2,
                              values=[[0.6], [0.4]],
                              state_names={'Hypertension': ['No', 'Yes']})


cpd_diabetes = TabularCPD(variable='Diabetes', variable_card=2,
                          # Columns follow the evidence order: Age varies slowest, FamilyHistory fastest, i.e.
                          # (Young, Unhealthy, Absent), (Young, Unhealthy, Present), (Young, Healthy, Absent), ...
                          values=[
                            [0.97, 0.93, 0.99, 0.95, 0.81, 0.77, 0.88, 0.81, 0.65, 0.61, 0.78, 0.70], # No Diabetes
                            [0.03, 0.07, 0.01, 0.05, 0.19, 0.23, 0.12, 0.19, 0.35, 0.39, 0.22, 0.30]  # Yes Diabetes
                          ],
                          evidence=['Age', 'Lifestyle', 'FamilyHistory'],
                          evidence_card=[3, 2, 2],
                          state_names={'Diabetes': ['No', 'Yes'],
                                       'Age': ['Young', 'Middle-aged', 'Elderly'],
                                       'Lifestyle': ['Unhealthy', 'Healthy'],
                                       'FamilyHistory': ['Absent', 'Present']})

cpd_bloodtests = TabularCPD(variable='BloodTests', variable_card=2,
                            values=[[0.7, 0.3],
                                    [0.3, 0.7]],
                            evidence=['Diabetes'],
                            evidence_card=[2],
                            state_names={'BloodTests': ['Normal', 'Abnormal'], 'Diabetes': ['No', 'Yes']}
                            )


cpd_cardiorisk = TabularCPD(variable='CardioRisk', variable_card=2,
                            # Columns follow the evidence order: Diabetes varies slowest, Hypertension fastest, i.e.
                            # (No, Normal, No), (No, Normal, Yes), (No, High, No), (No, High, Yes), (Yes, Normal, No), ...
                            values=[
                                [0.9, 0.7, 0.8, 0.6, 0.85, 0.65, 0.75, 0.55],  # Low Risk
                                [0.1, 0.3, 0.2, 0.4, 0.15, 0.35, 0.25, 0.45]   # High Risk
                            ],
                            evidence=['Diabetes', 'Cholesterol', 'Hypertension'],
                            evidence_card=[2, 2, 2],
                            state_names={'CardioRisk': ['Low', 'High'],
                                         'Diabetes': ['No', 'Yes'],
                                         'Cholesterol': ['Normal', 'High'],
                                         'Hypertension': ['No', 'Yes']}
                            )


# Add all CPDs to the model
model_diabetes.add_cpds(cpd_age, cpd_lifestyle, cpd_familyhistory, cpd_cholesterol, cpd_hypertension, cpd_diabetes, cpd_bloodtests, cpd_cardiorisk)


# Validate the model
assert model_diabetes.check_model()
# Initialize the inference object
infer = VariableElimination(model_diabetes)
# Q2a) What is the probability of diabetes given being elderly, unhealthy lifestyle, and with family history present?
query_result = infer.query(variables=['Diabetes'], evidence={'Age': 'Elderly', 'Lifestyle': 'Unhealthy', 'FamilyHistory': 'Present'})
print(query_result)
+---------------+-----------------+
| Diabetes      |   phi(Diabetes) |
+===============+=================+
| Diabetes(No)  |          0.6100 |
+---------------+-----------------+
| Diabetes(Yes) |          0.3900 |
+---------------+-----------------+
# Q2b) What is the probability of diabetes given being elderly, unhealthy lifestyle, family history present and cardio risk being high?
query_result = infer.query(variables=['Diabetes'], evidence={'Age': 'Elderly', 'Lifestyle': 'Unhealthy', 'FamilyHistory': 'Present', 'CardioRisk': 'High'})
print(query_result)
+---------------+-----------------+
| Diabetes      |   phi(Diabetes) |
+===============+=================+
| Diabetes(No)  |          0.5623 |
+---------------+-----------------+
| Diabetes(Yes) |          0.4377 |
+---------------+-----------------+
# Q2c) What is the probability of diabetes given being elderly, unhealthy lifestyle, family history present, cardio risk being high and knowing that cholesterol is high and hypertension is present?
query_result = infer.query(variables=['Diabetes'], evidence={'Age': 'Elderly', 'Lifestyle': 'Unhealthy', 'FamilyHistory': 'Present', 'CardioRisk': 'High', 'Cholesterol': 'High', 'Hypertension':'Yes'})
print(query_result)
+---------------+-----------------+
| Diabetes      |   phi(Diabetes) |
+===============+=================+
| Diabetes(No)  |          0.5816 |
+---------------+-----------------+
| Diabetes(Yes) |          0.4184 |
+---------------+-----------------+
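The posteriors printed for Q2b and Q2c can be cross-checked by hand with Bayes' rule. The sketch below is plain Python and does not use pgmpy. All probabilities are copied from the CPDs defined earlier. The dictionary names and the `normalize` helper are illustrative, not part of the assignment code.

```python
# Hand check of Q2b and Q2c with Bayes' rule; all numbers are copied from the CPDs above.
prior = {"No": 0.61, "Yes": 0.39}      # P(Diabetes | Elderly, Unhealthy, Present)
p_chol = {"Normal": 0.5, "High": 0.5}  # P(Cholesterol)
p_hyp = {"No": 0.6, "Yes": 0.4}        # P(Hypertension)

# P(CardioRisk=High | Diabetes, Cholesterol, Hypertension)
p_high = {
    ("No", "Normal", "No"): 0.10, ("No", "Normal", "Yes"): 0.30,
    ("No", "High", "No"): 0.20, ("No", "High", "Yes"): 0.40,
    ("Yes", "Normal", "No"): 0.15, ("Yes", "Normal", "Yes"): 0.35,
    ("Yes", "High", "No"): 0.25, ("Yes", "High", "Yes"): 0.45,
}


def normalize(weights):
    """Rescale non-negative weights so that they sum to 1."""
    total = sum(weights.values())
    return {key: value / total for key, value in weights.items()}


# Q2b: Cholesterol and Hypertension are unobserved, so sum them out.
likelihood_q2b = {
    d: sum(p_chol[c] * p_hyp[h] * p_high[(d, c, h)] for c in p_chol for h in p_hyp)
    for d in prior
}
posterior_q2b = normalize({d: prior[d] * likelihood_q2b[d] for d in prior})

# Q2c: Cholesterol=High and Hypertension=Yes are observed, so read a single CPD column.
posterior_q2c = normalize({d: prior[d] * p_high[(d, "High", "Yes")] for d in prior})

print(round(posterior_q2b["Yes"], 4))  # 0.4377
print(round(posterior_q2c["Yes"], 4))  # 0.4184
```

Observing high cardio risk raises P(Diabetes=Yes) from 0.39 to 0.4377. Once high cholesterol and hypertension are also known, they account for part of that risk and the posterior falls back to 0.4184 (the "explaining away" effect).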
# Bonus 1) What is the probability of diabetes given being elderly, unhealthy lifestyle, family history present, knowing that cholesterol is high and hypertension is present? Does it change compared to Question 2c?
query_result = infer.query(variables=['Diabetes'], evidence={'Age': 'Elderly', 'Lifestyle': 'Unhealthy', 'FamilyHistory': 'Present', 'Cholesterol': 'High', 'Hypertension':'Yes'})
print(query_result)
+---------------+-----------------+
| Diabetes      |   phi(Diabetes) |
+===============+=================+
| Diabetes(No)  |          0.6100 |
+---------------+-----------------+
| Diabetes(Yes) |          0.3900 |
+---------------+-----------------+
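The result equals the Q2a answer (0.61 / 0.39) and differs from Q2c (0.4184 for Diabetes=Yes). A likely explanation is that CardioRisk is a common child of Diabetes, Cholesterol and Hypertension, so while it is unobserved the latter two carry no information about Diabetes. The short plain-Python sketch below illustrates why: summing over both CardioRisk states contributes a factor of 1. The variable names are illustrative and the numbers are copied from the CPDs above.

```python
prior = {"No": 0.61, "Yes": 0.39}   # P(Diabetes | Elderly, Unhealthy, Present)
p_chol_high = 0.5                   # P(Cholesterol=High)
p_hyp_yes = 0.4                     # P(Hypertension=Yes)
p_high = {"No": 0.40, "Yes": 0.45}  # P(CardioRisk=High | Diabetes, Cholesterol=High, Hypertension=Yes)

unnormalized = {}
for d in prior:
    # CardioRisk is unobserved: add up its High and Low states, which always gives 1.
    sum_over_cardiorisk = p_high[d] + (1.0 - p_high[d])
    # The Cholesterol and Hypertension priors are the same constant for both Diabetes states.
    unnormalized[d] = prior[d] * p_chol_high * p_hyp_yes * sum_over_cardiorisk

total = sum(unnormalized.values())
posterior = {d: value / total for d, value in unnormalized.items()}
print({d: round(p, 4) for d, p in posterior.items()})  # {'No': 0.61, 'Yes': 0.39}
```

The constant factors cancel during normalization, leaving the column of the Diabetes CPD unchanged.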
# Bonus 2) Is the probability of Blood Tests conditioned by Hypertension? Please motivate your answer.
query_result = infer.query(variables=['BloodTests'], evidence={'Hypertension':'Yes'})
print(query_result)
+----------------------+-------------------+
| BloodTests           |   phi(BloodTests) |
+======================+===================+
| BloodTests(Normal)   |            0.6415 |
+----------------------+-------------------+
| BloodTests(Abnormal) |            0.3585 |
+----------------------+-------------------+
query_result = infer.query(variables=['BloodTests'], evidence={'Hypertension':'No'})
print(query_result)
+----------------------+-------------------+
| BloodTests           |   phi(BloodTests) |
+======================+===================+
| BloodTests(Normal)   |            0.6415 |
+----------------------+-------------------+
| BloodTests(Abnormal) |            0.3585 |
+----------------------+-------------------+

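Both queries return the same distribution, so in this network the blood test result is independent of hypertension. This follows from the structure: the only path between the two nodes is BloodTests <- Diabetes -> CardioRisk <- Hypertension, and CardioRisk is a collider on that path. As long as CardioRisk (which has no descendants) is not observed, the path is blocked and Hypertension carries no information about BloodTests. Observing CardioRisk would open the path and make the two variables dependent.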
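As a cross-check, the value 0.3585 can be reproduced by hand without Hypertension appearing anywhere in the calculation. The sketch below is plain Python with illustrative variable names. It assumes the Diabetes CPD columns are ordered with the last evidence variable changing fastest, which is consistent with the Q2a result above.

```python
from itertools import product

p_age = {"Young": 0.4, "Middle-aged": 0.4, "Elderly": 0.2}
p_lifestyle = {"Unhealthy": 0.7, "Healthy": 0.3}
p_family = {"Absent": 0.85, "Present": 0.15}

# "Yes" row of cpd_diabetes, one entry per (Age, Lifestyle, FamilyHistory) column.
diabetes_yes_row = [0.03, 0.07, 0.01, 0.05, 0.19, 0.23, 0.12, 0.19, 0.35, 0.39, 0.22, 0.30]

# product() iterates with the last factor changing fastest, matching the assumed column order.
parent_states = list(product(p_age, p_lifestyle, p_family))
p_diabetes_yes = dict(zip(parent_states, diabetes_yes_row))

# Marginal P(Diabetes=Yes): sum over all parent configurations.
marginal_yes = sum(
    p_age[a] * p_lifestyle[l] * p_family[f] * p_diabetes_yes[(a, l, f)]
    for a, l, f in parent_states
)

# P(BloodTests=Abnormal) = sum over Diabetes of P(Abnormal | Diabetes) * P(Diabetes)
p_abnormal = 0.3 * (1.0 - marginal_yes) + 0.7 * marginal_yes

print(round(marginal_yes, 4), round(p_abnormal, 4))  # 0.1463 0.3585
```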

# Bonus 3) What sanity checks can you perform to ensure that your network is properly encoded? Please expand upon and justify your strategies.
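One possible set of checks, offered as a sketch rather than a complete answer:

- Structural checks: every CPD column sums to 1 and the number of columns equals the product of the parent cardinalities. `model_diabetes.check_model()` above already covers this.
- Reproducing a CPD column by querying with all parents observed, as Q2a does.
- Comparing a few queries against hand calculations.
- Confirming the independences implied by the graph, as in Bonus 2.
- Checking that the numbers follow the expected domain trend.

The plain-Python code below illustrates the first and last of these. The helper `check_cpd` and the "risk increases with age" expectation are assumptions made for illustration.

```python
from math import isclose, prod


def check_cpd(values, variable_card, evidence_card=()):
    """Return a list of problems found in one CPD table (an empty list means OK)."""
    problems = []
    n_cols = prod(evidence_card)  # prod(()) == 1, which covers root nodes
    if len(values) != variable_card:
        problems.append(f"expected {variable_card} rows, got {len(values)}")
    for row in values:
        if len(row) != n_cols:
            problems.append(f"expected {n_cols} columns, got {len(row)}")
            return problems
    for j in range(n_cols):
        column_sum = sum(row[j] for row in values)
        if not isclose(column_sum, 1.0, abs_tol=1e-9):
            problems.append(f"column {j} sums to {column_sum}")
    return problems


# Tables copied from cpd_diabetes and cpd_cardiorisk above.
diabetes_values = [
    [0.97, 0.93, 0.99, 0.95, 0.81, 0.77, 0.88, 0.81, 0.65, 0.61, 0.78, 0.70],
    [0.03, 0.07, 0.01, 0.05, 0.19, 0.23, 0.12, 0.19, 0.35, 0.39, 0.22, 0.30],
]
cardiorisk_values = [
    [0.9, 0.7, 0.8, 0.6, 0.85, 0.65, 0.75, 0.55],
    [0.1, 0.3, 0.2, 0.4, 0.15, 0.35, 0.25, 0.45],
]

print(check_cpd(diabetes_values, 2, (3, 2, 2)))    # []
print(check_cpd(cardiorisk_values, 2, (2, 2, 2)))  # []

# Domain check (assumed expectation): for each Lifestyle/FamilyHistory combination,
# P(Diabetes=Yes) should rise from Young to Middle-aged to Elderly.
yes_row = diabetes_values[1]
young, middle, elderly = yes_row[0:4], yes_row[4:8], yes_row[8:12]
risk_rises_with_age = all(y < m < e for y, m, e in zip(young, middle, elderly))
print(risk_rises_with_age)  # True
```

A deliberately broken table, for example `check_cpd([[0.7], [0.4]], 2)`, returns a non-empty list, which is a quick way to confirm that the checker itself works.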