# SW11 - Flipped Classroom

*This Jupyter notebook is intended for practicing the theory accompanying the slides and is used as a “practical review of the theory input”. There are no sample solutions and the file is not corrected and does not have to be handed in.*

In SW11, the theory script includes various simulations to illustrate the theoretical concepts of the lecture notes [SW11_Probability](SW11_Probability.pdf).


# Conditional probability

A survey has shown that the women in a country's population ($52\%$) indicated the following preferences for their favorite color: 

- Red: $36\%$
- Blue: $16\%$
- Green: $48\%$

For men ($48\%$), the results were as follows: 

- Red: $32\%$
- Blue: $53\%$
- Green: $15\%$

We use the following labels:

- $F$: Person is female
- $M$: Person is male
- $R$: Favorite color is red
- $B$: Favorite color is blue
- $G$: Favorite color is green

We will now create a probability tree, starting at the root with the split $F \leftrightarrow M$, i.e. whether the person is a woman or a man. We annotate the branches of the tree with the following probabilities: 

- $P(F)=0.52$: Probability that the person is a woman.
- $P(M)=0.48$: Probability that the person is a man.
- $P(R|F)=0.36$: Probability that the favorite color is red under the condition that the person is a woman (analogously for $B$ and $G$).
- $P(R|M)=0.32$: Probability that the favorite color is red under the condition that the person is a man (analogously for $B$ and $G$).

This results in the following probability tree:

<img src=img/Wahrscheinlichkeitsbaum1.png alt=Drawing width=400 />

---

Now we add the probabilities at the leaves of the tree. The following probabilities are required there:

- $P(F \cap R)$: Probability that the person is a woman *and* that the favorite color is red (analogously for $B$ and $G$).
- $P(M \cap R)$: Probability that the person is a man *and* that the favorite color is red (analogously for $B$ and $G$).

To determine the corresponding values, simply multiply the probabilities along the relevant path. For example:

$P(F \cap R) = P(F) \cdot P(R|F)=0.52 \cdot 0.36 = 0.1872$

This results in the following complete probability tree:

<img src=img/Wahrscheinlichkeitsbaum_full1.png alt=Drawing width=650 />
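
The leaf values of the tree can also be reproduced with a few lines of Python. The following is a minimal sketch that multiplies the branch probabilities given above along each path; the dictionary names `p_gender`, `p_color_given_gender` and `p_joint` are chosen here for illustration only and do not appear elsewhere in the notebook.

```python
# Branch probabilities taken from the tree above.
p_gender = {"F": 0.52, "M": 0.48}
p_color_given_gender = {
    "F": {"R": 0.36, "B": 0.16, "G": 0.48},
    "M": {"R": 0.32, "B": 0.53, "G": 0.15},
}

# Leaf probability = product of the probabilities along the path,
# e.g. P(F and R) = P(F) * P(R|F).
p_joint = {
    (gender, color): p_gender[gender] * p_cond
    for gender, cond in p_color_given_gender.items()
    for color, p_cond in cond.items()
}

for (gender, color), p in p_joint.items():
    print(f"P({gender} and {color}) = {p:.4f}")

# The six leaves cover all cases, so their probabilities must add up to 1.
print("Sum over all leaves:", round(sum(p_joint.values()), 10))
```

The final sum is a useful sanity check: if it differs from 1, at least one branch probability was entered incorrectly.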


---


We now use the [numpy](https://numpy.org) random generator method [numpy.random.Generator.choice](https://numpy.org/doc/stable/reference/random/generated/numpy.random.Generator.choice.html#numpy.random.Generator.choice) to simulate the above probabilities. As a first step, we create a random vector that represents the distribution of women ($52\%$) and men ($48\%$) in the population.

In [47]:
import numpy as np

#Definition of the probabilities 
#P for woman
PF = 0.52
#P for man
PM = 1 - PF

#Sample size (max. 10^6 !!)
sampleSize = 1000

#Random generator (pass a seed, e.g. default_rng(42), for reproducible values)
rng = np.random.default_rng()

#Random vector, 0 = woman, 1 = man
randArr = rng.choice([0,1], sampleSize, p=[PF, PM])

#for visual control of the random vector
#print(randArr, '\n')

#number of women and men
nF = np.sum(randArr==0)
nM = np.sum(randArr==1)

print('Number of women: ', nF, 'in percent: ', 100*nF/sampleSize, '%')
print('Number of men: ', nM, 'in percent: ', 100*nM/sampleSize, '%')

Number of women:  507 in percent:  50.7 %
Number of men:  493 in percent:  49.3 %


If we start different runs, the results vary due to randomness. As the length of the random vector increases (parameter `sampleSize`), the variations tend to decrease.
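
To make this effect visible, the simulation can be repeated for several sample sizes. The sketch below is an illustration under assumptions: the seed `12345` and the list of sample sizes are arbitrary choices, and the "expected spread" uses the standard deviation $\sqrt{p(1-p)/n}$ of a relative frequency in the binomial model, which is not introduced in the text above.

```python
import numpy as np

PF = 0.52  # theoretical P(F) from the text
rng = np.random.default_rng(12345)  # arbitrary seed, only for reproducibility

deviations = {}
for n in [100, 10_000, 1_000_000]:
    # Random vector, 0 = woman, 1 = man (same encoding as above)
    sample = rng.choice([0, 1], n, p=[PF, 1 - PF])
    freq_f = np.mean(sample == 0)

    # Standard deviation of a relative frequency (binomial model)
    std_err = np.sqrt(PF * (1 - PF) / n)

    deviations[n] = abs(freq_f - PF)
    print(f"n = {n:>9}: share of women = {freq_f:.4f}, "
          f"expected spread approx. {std_err:.4f}")
```

The expected spread shrinks by a factor of 10 whenever the sample size grows by a factor of 100, which is why large samples stay much closer to $52\%$.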

---

Now we generate - first for women - another random vector that represents the distribution of favorite colors.

In [48]:
#The cell above (In [47]) must be executed first!

#Definition of the probabilities 
#R, B, G for women
PFR = 0.36
PFB = 0.16
PFG = 1 - PFR - PFB

#Random vector, 0 = R, 1 = B, 2 = G, for women, therefore length of vector is 'nF' from above
randArr = rng.choice([0,1,2], nF, p=[PFR, PFB, PFG])

#for visual control of the random vector
#print(randArr, '\n')

#number of women with the respective favorite color
nFR = np.sum(randArr==0)
nFB = np.sum(randArr==1)
nFG = np.sum(randArr==2)

print('Number of women with favorite color red: ', nFR, 'in percent: ', 100*nFR/sampleSize, '%')
print('Number of women with favorite color blue: ', nFB, 'in percent: ', 100*nFB/sampleSize, '%')
print('Number of women with favorite color green: ', nFG, 'in percent: ', 100*nFG/sampleSize, '%\n')

#Control of the total number
print('Control (must be zero): ', nF - nFR - nFB - nFG)

Number of women with favorite color red:  190 in percent:  19.0 %
Number of women with favorite color blue:  89 in percent:  8.9 %
Number of women with favorite color green:  228 in percent:  22.8 %

Control (must be zero):  0


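We can see that the numbers correspond to the theoretical probabilities within the scope of random uncertainty. Note that the percentages refer to the whole sample, so they estimate the joint probabilities, e.g. $P(F \cap R) = 0.1872$.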

---

Now generate the corresponding random vector and the results for the men.

### Solution

In [49]:
# Your code here

PMR = 0.32
PMB = 0.53
PMG = 1 - PMR - PMB

randArr = rng.choice([0,1,2],  nM, p=[PMR, PMB, PMG])

nMR = np.sum(randArr==0)
nMB = np.sum(randArr==1)
nMG = np.sum(randArr==2)

print('Number of men with favorite color red: ', nMR, 'in percent: ', 100*nMR/sampleSize, '%')
print('Number of men with favorite color blue: ', nMB, 'in percent: ', 100*nMB/sampleSize, '%')
print('Number of men with favorite color green: ', nMG, 'in percent: ', 100*nMG/sampleSize, '%\n')

print('Control (must be zero): ', nM - nMR - nMB - nMG)

Number of men with favorite color red:  144 in percent:  14.4 %
Number of men with favorite color blue:  270 in percent:  27.0 %
Number of men with favorite color green:  79 in percent:  7.9 %

Control (must be zero):  0


---

Now we determine the simulated conditional probabilities. For this, we only need to normalize correctly, for example:

$P(R|F)= \frac{P(F \cap R)}{P(F)}$

In other words, we need to divide by the number of women instead of the total length of the random vector.

In [50]:
print('Probability for favorite color red, under the condition that the person is a woman: ', nFR/nF)
print('Probability for favorite color blue, under the condition that the person is a woman: ', nFB/nF)
print('Probability for favorite color green, under the condition that the person is a woman: ', nFG/nF)

Probability for favorite color red, under the condition that the person is a woman:  0.3747534516765286
Probability for favorite color blue, under the condition that the person is a woman:  0.1755424063116371
Probability for favorite color green, under the condition that the person is a woman:  0.44970414201183434


We can see that the numbers again match the theoretical probabilities within the scope of random uncertainty.

---

Now determine the corresponding results for the men.

### Solution

In [51]:
# Your code here

print(f'Probability for favorite color red, under the condition that the person is a man: {100*nMR/nM:.1f}%')
print(f'Probability for favorite color blue, under the condition that the person is a man: {100*nMB/nM:.1f}%')
print(f'Probability for favorite color green, under the condition that the person is a man: {100*nMG/nM:.1f}%')

Probability for favorite color red, under the condition that the person is a man: 29.2%
Probability for favorite color blue, under the condition that the person is a man: 54.8%
Probability for favorite color green, under the condition that the person is a man: 16.0%
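
As an alternative view on the same normalization, gender and color can be simulated per person and the conditional probability read off with boolean masks. This is only a sketch under assumptions, not the approach used in the cells above: the seed, the sample size `n` and all variable names are chosen here for illustration.

```python
import numpy as np

rng = np.random.default_rng(2024)  # arbitrary seed (assumption)
n = 100_000                        # arbitrary sample size (assumption)

# One entry per person: 0 = woman, 1 = man
gender = rng.choice([0, 1], n, p=[0.52, 0.48])

# 0 = R, 1 = B, 2 = G; draw with both distributions, then select by gender
color_if_f = rng.choice([0, 1, 2], n, p=[0.36, 0.16, 0.48])
color_if_m = rng.choice([0, 1, 2], n, p=[0.32, 0.53, 0.15])
color = np.where(gender == 0, color_if_f, color_if_m)

is_f = gender == 0
is_r = color == 0

p_f_and_r = np.mean(is_f & is_r)   # estimate of P(F and R), relative to all persons
p_f = np.mean(is_f)                # estimate of P(F)
p_r_given_f = p_f_and_r / p_f      # estimate of P(R|F), relative to women only

print(f"P(F and R) approx. {p_f_and_r:.4f}")
print(f"P(R|F)     approx. {p_r_given_f:.4f}")
```

Dividing by `p_f` is the same as dividing the count of women who prefer red by the number of women, which is exactly the normalization described above.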


---
---

### Independence

Finally, we want to examine whether the choice of color is independent of gender. If this were the case, the following would apply:

$P(F \cap R) = P(F) \cdot P(R)$

This means that the probability $P(F \cap R)$ that a person is a woman and has red as her favorite color is equal to the product of the probability $P(F)$ that any person is a woman and the probability $P(R)$ that the favorite color of any person, i.e. woman or man, is red. The same should then also apply to the other colors.

We first test this using the simulation results, but with the probabilities for the color green. Think about why. 

We have already simulated the two probabilities $P(F \cap G)$ and $P(F)$. We only need to determine the probability $P(G)$ that the favorite color for any person is green.

In [52]:
#the solution above for the men must be available

#from above
print('Probability for a woman: ', nF/sampleSize)
print('Probability that a person is a woman and her favorite color is green: ', nFG/sampleSize)

#Number of all persons with favorite color green
nG = nFG + nMG
print('Probability that the favorite color of any person is green: ', nG/sampleSize,'\n')

#Independence test
print('Product of P(F)*P(G): ', nF/sampleSize*nG/sampleSize)

Probability for a woman:  0.507
Probability that a person is a woman and her favorite color is green:  0.228
Probability that the favorite color of any person is green:  0.307 

Product of P(F)*P(G):  0.155649


The simulated $P(F \cap G) = 0.228$ differs clearly from the product $P(F)\cdot P(G) \approx 0.156$, so there is an evident dependency between color preference and gender. It is particularly pronounced for the colors blue and green. In the case of red, the simulated difference would hardly be recognizable within the uncertainties.

Let's determine the theoretical values as a control:

- $P(F)=0.52$
- $P(F \cap G)=0.2496$
- $P(G)=P(F \cap G) + P(M \cap G) = 0.2496 + 0.072 = 0.3216$

This results in:

$P(F)\cdot P(G) = 0.52 \cdot 0.3216 = 0.1672$

This is clearly not equal to $P(F \cap G)=0.2496$, which confirms the dependency on the basis of the theoretical values.
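
The same theoretical check can be carried out for all three colors at once. The following sketch uses only the probabilities given at the beginning of this section; the variable names (`p_given_f`, `p_given_m`, `gaps`) are illustrative assumptions.

```python
# Probabilities from the beginning of the section
p_f, p_m = 0.52, 0.48
p_given_f = {"R": 0.36, "B": 0.16, "G": 0.48}
p_given_m = {"R": 0.32, "B": 0.53, "G": 0.15}

gaps = {}
for color in "RBG":
    p_joint = p_f * p_given_f[color]            # P(F and color)
    p_color = p_joint + p_m * p_given_m[color]  # P(color), summed over both genders
    product = p_f * p_color                     # value expected under independence
    gaps[color] = p_joint - product
    print(f"{color}: P(F and {color}) = {p_joint:.4f}, "
          f"P(F)*P({color}) = {product:.4f}, difference = {gaps[color]:+.4f}")
```

The difference is small for red (about $0.01$) and much larger in magnitude for blue and green, which matches the observation that the dependency is hardly visible for red in a simulation with only $1000$ samples.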