
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Work assignment SW11 - PYTHON BASICS\n",
    "\n",
    "These are the self-study tasks of the semester week, which you will solve within one week in your JupyterHub environment. After completing your work, download a copy of the Jupyter notebook file locally to your laptop (Menu: File->Download).\n",
    "\n",
    "On ILIAS you will find the weekly assignment, where you upload your solved Jupyter notebook file. After submitting, you will receive a sample solution to the assignment. Your submission will not be corrected. Although the assignments are marked “mandatory”, they do not count towards your semester grade; only the tests taken during the semester do.\n",
    "\n",
    "We wish you every success!\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "---\n",
    "## Exercise 1\n",
    "\n",
    "At a (hypothetical) Swiss university, 80% of the students are from the canton ($K$) and 20% from outside the canton ($\\overline{K}$). The distribution of cantonal students across the degree programs is as follows: \n",
    "\n",
    "- Electrical engineering ($E$): $30\\%$\n",
    "- Mechanical engineering ($M$): $30\\%$\n",
    "- Computer science ($I$): $40\\%$\n",
    "\n",
    "For students from outside the canton, the distribution is:\n",
    "\n",
    "- Electrical engineering ($E$): $5\\%$\n",
    "- Mechanical engineering ($M$): $15\\%$\n",
    "- Computer science ($I$): $80\\%$\n",
    "\n",
    "a) Create a probability tree, starting at the root with the decision $K \\leftrightarrow \\overline{K}$ (cantonal versus extra-cantonal), and enter the corresponding probabilities (symbolic and numerical) on all branches and leaves.\n",
    "\n",
    "### Solution\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=img/auftrag_tree.png alt=Drawing width=550 />"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "b) Now use the [numpy](https://numpy.org) random generator method [numpy.random.Generator.choice](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.choice.html#numpy.random.Generator.choice) to simulate the above probabilities. Proceed as in the [theory part](SW11_Theorie_en.ipynb) or as in the [in-class template](SW11_InClass_Vorlage_en.ipynb).\n",
    "\n",
    "### Solution \n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of local:  828 in percent:  82.8 %\n",
      "Number of non-local:  172 in percent:  17.2 %\n",
      "\n",
      "Number of local electrical:  301 in percent:  30.1 %\n",
      "Number of local mechanical:  310 in percent:  31.0 %\n",
      "Number of local IT:  389 in percent:  38.9 %\n",
      "\n",
      "Number of non-local electrical:  39 in percent:  3.9 %\n",
      "Number of non-local mechanical:  152 in percent:  15.2 %\n",
      "Number of non-local IT:  809 in percent:  80.9 %\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "PK = 0.8\n",
    "PN = 1 - PK\n",
    "\n",
    "sampleSize = 1000\n",
    "\n",
    "rng = np.random.default_rng()\n",
    "randArr_1 = rng.choice([0,1], sampleSize, p=[PK,PN])\n",
    "\n",
    "nK = np.sum(randArr_1 == 0)\n",
    "nNK = np.sum(randArr_1 == 1)\n",
    "\n",
    "print(\"Number of local: \", nK, \"in percent: \", 100*nK/sampleSize, \"%\")\n",
    "print(\"Number of non-local: \", nNK, \"in percent: \", 100*nNK/sampleSize, \"%\")\n",
    "\n",
    "PEK = 0.3\n",
    "PMK = 0.3\n",
    "PIK = 1 - PEK - PMK\n",
    "\n",
    "randArr_2 = rng.choice([0,1,2], sampleSize, p=[PEK,PMK, PIK])\n",
    "\n",
    "nEK = np.sum(randArr_2 == 0)\n",
    "nMK = np.sum(randArr_2 == 1)\n",
    "nIK = np.sum(randArr_2 == 2)\n",
    "\n",
    "print(\"\\nNumber of local electrical: \", nEK, \"in percent: \", 100*nEK/sampleSize, \"%\")\n",
    "print(\"Number of local mechanical: \", nMK, \"in percent: \", 100*nMK/sampleSize, \"%\")\n",
    "print(\"Number of local IT: \", nIK, \"in percent: \", 100*nIK/sampleSize, \"%\")\n",
    "\n",
    "PEN = 0.05\n",
    "PMN = 0.15\n",
    "PIN = 1 - PEN - PMN\n",
    "\n",
    "randArr_3 = rng.choice([0,1,2], sampleSize, p=[PEN,PMN,PIN])\n",
    "\n",
    "nEN = np.sum(randArr_3 == 0)\n",
    "nMN = np.sum(randArr_3 == 1)\n",
    "nIN = np.sum(randArr_3 == 2)\n",
    "\n",
    "print(\"\\nNumber of non-local electrical: \", nEN, \"in percent: \", 100*nEN/sampleSize, \"%\")\n",
    "print(\"Number of non-local mechanical: \", nMN, \"in percent: \", 100*nMN/sampleSize, \"%\")\n",
    "print(\"Number of non-local IT: \", nIN, \"in percent: \", 100*nIN/sampleSize, \"%\")\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "c) Determine all probabilities in the tree below.\n",
    "\n",
    "<img src=img/Wahrscheinlichkeitsbaum_invertiert_full2_Auftrag.png alt=Drawing width=550 />\n",
    "\n",
    "### Solution"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$P(E) = P(E|K)P(K) + P(E|\\overline{K})P(\\overline{K})= 0.3\\cdot 0.8 + 0.05\\cdot0.2 = 0.25$\n",
    "\n",
    "$P(M) = P(M|K)P(K) + P(M|\\overline{K})P(\\overline{K})= 0.3\\cdot0.8 + 0.15\\cdot0.2 = 0.27$\n",
    "\n",
    "$P(I) = P(I|K)P(K) + P(I|\\overline{K})P(\\overline{K})= 0.4\\cdot0.8 + 0.8\\cdot0.2 = 0.48$\n",
    "\n",
    "---\n",
    "\n",
    "$P(K|E) = \\dfrac{P(E|K)P(K)}{P(E)} = \\dfrac{0.3 \\cdot 0.8}{0.25} = 0.96$\n",
    "\n",
    "$P(\\overline{K}|E) = 1 - P(K|E) = 1-0.96 = 0.04$\n",
    "\n",
    "$P(E \\cap K) = P(E|K)P(K) = 0.3 \\cdot 0.8 = 0.24$\n",
    "\n",
    "$P(E \\cap \\overline{K}) = P(E|\\overline{K})P(\\overline{K}) = P(E|\\overline{K}) \\cdot (1 - P(K)) = 0.05 \\cdot 0.2 = 0.01$\n",
    "\n",
    "---\n",
    "\n",
    "$P(K|M) = \\dfrac{P(M|K)P(K)}{P(M)} = \\dfrac{0.3 \\cdot 0.8}{0.27} \\approx 0.89$\n",
    "\n",
    "$P(\\overline{K}|M) = 1 - P(K|M) \\approx 1-0.89 = 0.11$\n",
    "\n",
    "$P(M \\cap K) = P(M|K)P(K) = 0.3 \\cdot 0.8 = 0.24$\n",
    "\n",
    "$P(M \\cap \\overline{K}) = P(M|\\overline{K})P(\\overline{K}) = P(M|\\overline{K}) \\cdot (1 - P(K)) = 0.15 \\cdot 0.2 = 0.03$\n",
    "\n",
    "---\n",
    "\n",
    "$P(K|I) = \\dfrac{P(I|K)P(K)}{P(I)} = \\dfrac{0.4 \\cdot 0.8}{0.48} \\approx 0.67$\n",
    "\n",
    "$P(\\overline{K}|I) = 1 - P(K|I) \\approx 1-0.67 = 0.33$\n",
    "\n",
    "$P(I \\cap K) = P(I|K)P(K) = 0.4 \\cdot 0.8 = 0.32$\n",
    "\n",
    "$P(I \\cap \\overline{K}) = P(I|\\overline{K})P(\\overline{K}) = P(I|\\overline{K}) \\cdot (1 - P(K)) = 0.8 \\cdot 0.2 = 0.16$"
   ]
  },
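  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick plausibility check (a sketch that only uses the values computed in c)), the three degree-program probabilities and the six leaf probabilities should each sum to 1:\n",
    "\n",
    "$P(E) + P(M) + P(I) = 0.25 + 0.27 + 0.48 = 1$\n",
    "\n",
    "$P(E \\cap K) + P(E \\cap \\overline{K}) + P(M \\cap K) + P(M \\cap \\overline{K}) + P(I \\cap K) + P(I \\cap \\overline{K}) = 0.24 + 0.01 + 0.24 + 0.03 + 0.32 + 0.16 = 1$"
   ]
  },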
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "---\n",
    "\n",
    "d) Now determine the simulated values for the probabilities on the branches of the tree (leaves already known) from c) and compare again with the theory.\n",
    "\n",
    "### Solution"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "Number of local:  796 in percent:  79.6 %\n",
      "Number of non-local:  204 in percent:  20.4 %\n",
      "\n",
      "Number of local electrical:  239 in percent:  23.9 %\n",
      "Number of local mechanical:  257 in percent:  25.7 %\n",
      "Number of local IT:  300 in percent:  30.0 %\n",
      "\n",
      "Number of non-local electrical:  8 in percent:  0.8 %\n",
      "Number of non-local mechanical:  36 in percent:  3.6 %\n",
      "Number of non-local IT:  160 in percent:  16.0 %\n",
      "\n",
      "P(E):  24.7 %\n",
      "P(M):  29.3 %\n",
      "P(I):  46.0 %\n",
      "\n",
      "P(K|E):  96.76 %\n",
      "P(1-K|E):  3.24 %\n",
      "\n",
      "P(K|M):  87.71 %\n",
      "P(1-K|M):  12.29 %\n",
      "\n",
      "P(K|I):  65.22 %\n",
      "P(1-K|I):  34.78 %\n",
      "\n",
      "P(E∩K):  23.9 %\n",
      "P(E∩(1-K)):  0.8 %\n",
      "\n",
      "P(M∩K):  25.7 %\n",
      "P(M∩(1-K)):  3.6 %\n",
      "\n",
      "P(I∩K):  30.0 %\n",
      "P(I∩(1-K)):  16.0 %\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "PK = 0.8\n",
    "PN = 1 - PK\n",
    "\n",
    "PEK = 0.3\n",
    "PMK = 0.3\n",
    "PIK = 1 - PEK - PMK\n",
    "\n",
    "PEN = 0.05\n",
    "PMN = 0.15\n",
    "PIN = 1 - PEN - PMN\n",
    "\n",
    "sampleSize = 1000\n",
    "rng = np.random.default_rng()\n",
    "\n",
    "randArr_1 = rng.choice([0, 1], sampleSize, p=[PK, PN])  # 0 = K, 1 = ¬K\n",
    "nK = np.sum(randArr_1 == 0)\n",
    "nNK = np.sum(randArr_1 == 1)\n",
    "\n",
    "print(\"\\nNumber of local: \", nK, \"in percent: \", round(100 * nK / sampleSize, 2), \"%\")\n",
    "print(\"Number of non-local: \", nNK, \"in percent: \", round(100 * nNK / sampleSize, 2), \"%\")\n",
    "\n",
    "randArr_2 = rng.choice([0, 1, 2], nK, p=[PEK, PMK, PIK])  # 0 = E, 1 = M, 2 = I\n",
    "nEK = np.sum(randArr_2 == 0)\n",
    "nMK = np.sum(randArr_2 == 1)\n",
    "nIK = np.sum(randArr_2 == 2)\n",
    "\n",
    "print(\"\\nNumber of local electrical: \", nEK, \"in percent: \", round(100 * nEK / sampleSize, 2), \"%\")\n",
    "print(\"Number of local mechanical: \", nMK, \"in percent: \", round(100 * nMK / sampleSize, 2), \"%\")\n",
    "print(\"Number of local IT: \", nIK, \"in percent: \", round(100 * nIK / sampleSize, 2), \"%\")\n",
    "\n",
    "randArr_3 = rng.choice([0, 1, 2], nNK, p=[PEN, PMN, PIN])  # 0 = E, 1 = M, 2 = I\n",
    "nEN = np.sum(randArr_3 == 0)\n",
    "nMN = np.sum(randArr_3 == 1)\n",
    "nIN = np.sum(randArr_3 == 2)\n",
    "\n",
    "print(\"\\nNumber of non-local electrical: \", nEN, \"in percent: \", round(100 * nEN / sampleSize, 2), \"%\")\n",
    "print(\"Number of non-local mechanical: \", nMN, \"in percent: \", round(100 * nMN / sampleSize, 2), \"%\")\n",
    "print(\"Number of non-local IT: \", nIN, \"in percent: \", round(100 * nIN / sampleSize, 2), \"%\")\n",
    "\n",
    "PE = (nEK + nEN) / sampleSize\n",
    "PM = (nMK + nMN) / sampleSize\n",
    "PI = (nIK + nIN) / sampleSize\n",
    "\n",
    "print(\"\\nP(E): \", round(PE * 100, 2), \"%\")\n",
    "print(\"P(M): \", round(PM * 100, 2), \"%\")\n",
    "print(\"P(I): \", round(PI * 100, 2), \"%\")\n",
    "\n",
    "PKE = nEK / (nEK + nEN)\n",
    "PNE = 1 - PKE\n",
    "\n",
    "print(\"\\nP(K|E): \", round(PKE * 100, 2), \"%\")\n",
    "print(\"P(1-K|E): \", round(PNE * 100, 2), \"%\")\n",
    "\n",
    "PKM = nMK / (nMK + nMN)\n",
    "PNM = 1 - PKM\n",
    "\n",
    "print(\"\\nP(K|M): \", round(PKM * 100, 2), \"%\")\n",
    "print(\"P(1-K|M): \", round(PNM * 100, 2), \"%\")\n",
    "\n",
    "PKI = nIK / (nIK + nIN)\n",
    "PNI = 1 - PKI\n",
    "\n",
    "print(\"\\nP(K|I): \", round(PKI * 100, 2), \"%\")\n",
    "print(\"P(1-K|I): \", round(PNI * 100, 2), \"%\")\n",
    "\n",
    "PEandK = nEK / sampleSize\n",
    "PEandN = nEN / sampleSize\n",
    "\n",
    "print(\"\\nP(E∩K): \", round(PEandK * 100, 2), \"%\")\n",
    "print(\"P(E∩(1-K)): \", round(PEandN * 100, 2), \"%\")\n",
    "\n",
    "PMandK = nMK / sampleSize\n",
    "PMandN = nMN / sampleSize\n",
    "\n",
    "print(\"\\nP(M∩K): \", round(PMandK * 100, 2), \"%\")\n",
    "print(\"P(M∩(1-K)): \", round(PMandN * 100, 2), \"%\")\n",
    "\n",
    "PIandK = nIK / sampleSize\n",
    "PIandN = nIN / sampleSize\n",
    "\n",
    "print(\"\\nP(I∩K): \", round(PIandK * 100, 2), \"%\")\n",
    "print(\"P(I∩(1-K)): \", round(PIandN * 100, 2), \"%\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "\n",
    "e) Using a counterexample, prove that the choice of degree program and cantonal affiliation are not independent of each other.\n",
    "\n",
    "### Solution"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Independence between E and K:\n",
    "\n",
    "$P(E \\cap K) \\stackrel{?}{=} P(E)P(K)$\n",
    "\n",
    "Independence between M and K:\n",
    "\n",
    "$P(M \\cap K) \\stackrel{?}{=} P(M)P(K)$\n",
    "\n",
    "Independence between I and K:\n",
    "\n",
    "$P(I \\cap K) \\stackrel{?}{=} P(I)P(K)$"
   ]
  },
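  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A possible counterexample, sketched here with the theoretical values from c) rather than the simulated ones: a single violated product rule is already enough to disprove independence.\n",
    "\n",
    "$P(E \\cap K) = 0.24 \\neq P(E)P(K) = 0.25 \\cdot 0.8 = 0.20$\n",
    "\n",
    "The other two pairs deviate as well:\n",
    "\n",
    "$P(M \\cap K) = 0.24 \\neq P(M)P(K) = 0.27 \\cdot 0.8 = 0.216$\n",
    "\n",
    "$P(I \\cap K) = 0.32 \\neq P(I)P(K) = 0.48 \\cdot 0.8 = 0.384$"
   ]
  },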
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "P(E ∩ K):  0.239\n",
      "P(E)P(K):  0.198\n",
      "Independence test for E and K:  False \n",
      "\n",
      "P(M ∩ K):  0.257\n",
      "P(M)P(K):  0.234\n",
      "Independence test for M and K:  False \n",
      "\n",
      "P(I ∩ K):  0.3\n",
      "P(I)P(K):  0.368\n",
      "Independence test for I and K:  False \n",
      "\n"
     ]
    }
   ],
   "source": [
    "# Simulated frequencies are noisy, so compare with a tolerance instead of exact float equality.\n",
    "tol = 0.01  # assumed tolerance for sampling noise, not a formal significance test\n",
    "\n",
    "independence_EK = PE * PK\n",
    "print(\"P(E ∩ K): \", round(PEandK, 3))\n",
    "print(\"P(E)P(K): \", round(independence_EK, 3))\n",
    "print(\"Independence test for E and K: \", np.isclose(PEandK, independence_EK, atol=tol), \"\\n\")\n",
    "\n",
    "independence_MK = PM * PK\n",
    "print(\"P(M ∩ K): \", round(PMandK, 3))\n",
    "print(\"P(M)P(K): \", round(independence_MK, 3))\n",
    "print(\"Independence test for M and K: \", np.isclose(PMandK, independence_MK, atol=tol), \"\\n\")\n",
    "\n",
    "independence_IK = PI * PK\n",
    "print(\"P(I ∩ K): \", round(PIandK, 3))\n",
    "print(\"P(I)P(K): \", round(independence_IK, 3))\n",
    "print(\"Independence test for I and K: \", np.isclose(PIandK, independence_IK, atol=tol), \"\\n\")"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
