Lunarnotes - SW11_Auftrag_MuLoe

{
 "cells": [
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "# Work assignment SW11 - PYTHON BASICS - Solution\n",
 "\n",
 "These are the self-study tasks of the semester week, which you will solve within one week in your JupyterHub environment. After completing your work, download a copy of the Jupyter notebook file locally to your laptop (Menu: File->Download).\n",
 "\n",
 "On ILIAS you will find the weekly scheduled assignment where you will upload your solved Jupyter notebook file. After your submission, you will receive a corresponding sample solution to the assignment. Your submission will not be corrected. Although the assignments are marked “mandatory”, they do not count towards your semester grades. Only the grades of the tests during the semester are relevant for this. \n",
 "\n",
 "We wish you every success!\n"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "---\n",
 "---\n",
 "## Exercise 1\n",
 "\n",
 "At a (hypothetical) Swiss university, 80% of the students are from the canton ($K$) and 20% from outside the canton ($\\overline{K}$). The distribution of cantonal students across the degree programs is as follows: \n",
 "\n",
 "- Electrical engineering ($E$): $30\\%$\n",
 "- Mechanical engineering ($M$): $30\\%$\n",
 "- Computer science ($I$): $40\\%$\n",
 "\n",
 "For those outside the canton, this is: \n",
 "\n",
 "- Electrical engineering ($E$): $5\\%$\n",
 "- Mechanical engineering ($M$): $15\\%$\n",
 "- Computer science ($I$): $80\\%$\n",
 "\n",
 "a) Create a probability tree, starting at the root with the decision $K \\leftrightarrow \\overline{K}$ - cantonal to extra-cantonal - and enter the corresponding probabilities (symbolic and numerical) on all branches and leaves.\n",
 "\n",
 "### Solution\n"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "<img src=img/Wahrscheinlichkeitsbaum_full2.png alt=Drawing width=650 />"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "---\n",
 "\n",
 "b) Now use the [numpy](https://numpy.org) random generator [numpy.random.Generator](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.choice.html#numpy.random.Generator.choice) to simulate the above probabilities. Proceed as in [Theorieteil](SW11_Theorie_en.ipynb) or as in [Class](SW11_InClass_Vorlage_en.ipynb). \n",
 "\n",
 "### Solution \n"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "import numpy as np\n",
 "\n",
 "#Definition of the probabilities \n",
 "#(we use the underscore '_' as a placeholder for “not”)\n",
 "#P for cantonal students\n",
 "PK = 0.8\n",
 "P_K = 0.2\n",
 "#P for degree program E/M/I for cantonal students\n",
 "PEK = 0.3\n",
 "PMK = 0.3\n",
 "PIK = 0.4\n",
 "#P for degree program E/M/I for students from outside the canton\n",
 "PE_K = 0.05\n",
 "PM_K = 0.15\n",
 "PI_K = 0.8\n",
 "#Number of sample size (max. 10^6 !!)\n",
 "sampleSize = 1000\n",
 "\n",
 "#Random generator (set seed for reproducible values)\n",
 "rng = np.random.default_rng()\n",
 "\n",
 "#since we need to generate multiple series, we use a function\n",
 "def randArray(randVal, length, prob=None):\n",
 " \"\"\"\n",
 " function to generate a random vector with the values randVal and length length, where the \n",
 " probabilities in prob (list of the same length!) are used for the distribution. If prob is 'None'\n",
 " a uniform distribution is used.\n",
 " \"\"\"\n",
 " randArray = rng.choice(randVal, length, p=prob)\n",
 " return randArray\n",
 "\n",
 "#Array of students with non-cantonal (0) and cantonal (1) students\n",
 "length = sampleSize\n",
 "randArr = randArray([0,1], length, prob=[P_K, PK])\n",
 "\n",
 "#Number of non-cantonal (n_K) and cantonal (nK) students\n",
 "n_K = np.sum(randArr==0)\n",
 "nK = np.sum(randArr==1)\n",
 "\n",
 "#Array of the length of the cantonal students with degree program E/M/I (=0,1,2)\n",
 "length = nK\n",
 "randArr = randArray([0,1,2], length, prob=[PEK, PMK, PIK])\n",
 "\n",
 "#Number of cantonal students with degree program E/M/I\n",
 "nKE = np.sum(randArr==0)\n",
 "nKM = np.sum(randArr==1)\n",
 "nKI = np.sum(randArr==2)\n",
 "\n",
 "#Array of the length of out-of-canton students with degree program E/M/I (=0,1,2)\n",
 "length = n_K\n",
 "randArr = randArray([0,1,2], length, prob=[PE_K, PM_K, PI_K])\n",
 "\n",
 "#Number of out-of-canton students with degree program E/M/I\n",
 "n_KE = np.sum(randArr==0)\n",
 "n_KM = np.sum(randArr==1)\n",
 "n_KI = np.sum(randArr==2)\n",
 "\n",
 "print('Number of students outside the canton: ', n_K, 'in percent: ', 100*n_K/sampleSize, '%; Theory: ', 100*(1-PK), '%')\n",
 "print('Number of cantonal students: ', nK, 'in percent: ', 100*nK/sampleSize, '%; Theory: ', 100*PK, '%\\n')\n",
 "\n",
 "print('Number of out-of-canton students in degree program E: ', n_KE, 'in percent: ', 100*n_KE/sampleSize, '%; Theory: '\\\n",
 " , 100*P_K*PE_K, '%')\n",
 "print('Number of out-of-canton students in degree program M: ', n_KE, 'in percent: ', 100*n_KM/sampleSize, '%; Theory: '\\\n",
 " , 100*P_K*PM_K, '%')\n",
 "print('Number of out-of-canton students in degree program I: ', n_KE, 'in percent: ', 100*n_KI/sampleSize, '%; Theory: '\\\n",
 " , 100*P_K*PI_K, '%')\n",
 "\n",
 "print('Number of cantonal students in degree program E: ', nKE, 'in percent: ', 100*nKE/sampleSize, '%; Theory: '\\\n",
 " , 100*PK*PEK, '%')\n",
 "print('Number of cantonal students in degree program M: ', nKE, 'in percent: ', 100*nKM/sampleSize, '%; Theory: '\\\n",
 " , 100*PK*PMK, '%')\n",
 "print('Number of cantonal students in study program I: ', nKE, 'in percent: ', 100*nKI/sampleSize, '%; theory: '\\\n",
 " , 100*PK*PIK, '%\\n')\n",
 "\n",
 "total = n_KE + n_KM + n_KI + nKE + nKM + nKI\n",
 "print('Test for total number: ', total, 'in percent: ', 100*total/sampleSize, '%\\n')"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "---\n",
 "\n",
 "c) Determine all probabilities in the tree below.\n",
 "\n",
 "<img src=img/Wahrscheinlichkeitsbaum_invertiert_full2_Auftrag.png alt=Drawing width=550 />\n",
 "\n",
 "### Solution"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "- The probabilities at the roots are obtained using the theorem on total probability as follows (example for $P(E)$, remainder analogous). We have $K \\cup \\overline{K} = \\Omega$ and $K \\cap \\overline{K} = \\emptyset$, therefore $K, \\overline{K}$ is a partition and we can apply: \n",
 "$P(E) = P(E|K)\\cdot P(K) + P(E|\\overline{K})\\cdot P(\\overline{K}) = 0.24 + 0.01 = 0.25$ \n",
 "\n",
 "- For the probabilities on the branches, we use Bayes' theorem: $P(\\overline{K}|E) = P(E|\\overline{K}) \\cdot \\frac{P(\\overline{K})}{P(E))} = 0.05 \\cdot \\frac{0. 2}{0.25}=0.04$ For $P(E|K)$ we can then directly conclude: $P(K|E) = 1 - P(\\overline{K}|E) = 1 - 0.04 = 0.96$ The other probabilities result in the same way.\n",
 " \n",
 "- The probabilities on the leaves are equal to the corresponding values of the first probability tree, as $P(A \\cap B) = P(B \\cap A)$ generally applies. This means that we do not have to calculate them but only identify them correctly.\n",
 "\n",
 "<img src=img/Wahrscheinlichkeitsbaum_invertiert_full2.png alt=Drawing width=650 />"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "\n",
 "---\n",
 "\n",
 "d) Now determine the simulated values for the probabilities on the branches of the tree (leaves already known) from c) and compare again with the theory.\n",
 "\n",
 "### Solution"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": [
 "#Cell [1] above must be executed first\n",
 "\n",
 "#Number of all E/M/I students\n",
 "nE = nKE + n_KE\n",
 "nM = nKM + n_KM\n",
 "nI = nKI + n_KI\n",
 "\n",
 "print('Probability for students in degree program E: ', nE/sampleSize, '; Theory: 0.25')\n",
 "print('Probability for students in degree program M: ', nM/sampleSize, '; Theory: 0.27')\n",
 "print('Probability for students in study program I: ', nI/sampleSize, '; Theory: 0.48\\n')\n",
 "\n",
 "print('P(non-K|E): ', n_KE/nE, '; Theory: 0.04')\n",
 "print('P(K|E): ', nKE/nE, '; Theory: 0.96\\n')\n",
 "\n",
 "print('P(non-K|M): ', n_KM/nM, '; Theory: 1/9')\n",
 "print('P(K|M): ', nKM/nM, '; Theory: 8/9\\n')\n",
 "\n",
 "print('P(non-K|I): ', n_KI/nI, '; Theory: 1/3')\n",
 "print('P(K|I): ', nKI/nI, '; Theory: 2/3\\n')\n",
 "\n",
 "total = nE + nM + nI\n",
 "print('Test for total number: ', total, 'in percent: ', 100*total/sampleSize, '%\\n')"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "---\n",
 "\n",
 "e) Using a counterexample, prove that the choice of degree program and cantonal affiliation are not independent of each other.\n",
 "\n",
 "### Solution"
 ]
 },
 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "We use the case of the degree program $E$ because the difference is greatest there. If $(P(E)$ and $P(\\overline{K})$ were independent, the following would apply:\n",
 "\n",
 "$P(E \\cap \\overline{K}) = P(E) \\cdot P(\\overline{K})$\n",
 "\n",
 "Furthermore we know from above:\n",
 "\n",
 "- $P(E)=0.25$\n",
 "- $P(\\overline{K})=0.2$\n",
 "- $P(E \\cap \\overline{K})=0.01$\n",
 "\n",
 "This results in:\n",
 "\n",
 "$P(E)\\cdot P(\\overline{K}) = 0.25 \\cdot 0.2 = 0.05$\n",
 "\n",
 "This is obviously not equal to $P(E \\cap \\overline{K})=0.01$ which proves the dependency."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {},
 "outputs": [],
 "source": []
 }
 ],
 "metadata": {
 "kernelspec": {
 "display_name": "Python 3 (ipykernel)",
 "language": "python",
 "name": "python3"
 },
 "language_info": {
 "codemirror_mode": {
 "name": "ipython",
 "version": 3
 },
 "file_extension": ".py",
 "mimetype": "text/x-python",
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
 "version": "3.11.9"
 }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}

SW11_Auftrag_MuLoe_en