class: center, middle, inverse, title-slide

# ETC2520

### Fin

### Semester 2 2018

---

## Week 1

**Probability** is defined as a measure of uncertainty or likelihood.

--

* **Random experiment**

A **random experiment** is a process that can result in two or more different outcomes, with uncertainty as to which will be observed.

--

* **Event**

An **event** is a possible outcome of a random experiment. An event can be

- Elementary: it includes only one particular outcome of the experiment
- Composite: it includes more than one elementary outcome from the set of all possible outcomes

---

* **Sample space**

The **sample space** is the complete listing of the elementary events that can occur in a random experiment. The sample space, the collection of all possible outcomes, will be denoted by S or C.

--

The sample space S of a random experiment is equivalent to the universal set of the random experiment, the space of all possible outcomes. An elementary event is an **element** of S. A compound event is a **subset** of S. When the random experiment is performed and the outcome belongs to, or is a member of, a compound event, we say that the compound event has occurred.

---

* **Set notation**

`$$\begin{array}{c|c} \hline \omega \in \mathrm{A} & \omega \text{ belongs to the set A}\\ \omega \notin \mathrm{A} & \omega \text{ does not belong to the set A}\\ \mathrm A \subset \mathrm B & \text{A is contained in B}\\ \mathrm A =\mathrm B \text{ if } \mathrm A \subset \mathrm B \text{ and } \mathrm A \supset \mathrm B & \text{A equals B}\\ \emptyset & \text{Empty set}\\ \mathrm A \cup \mathrm B & \text{the union of A and B}\\ \mathrm A \cap \mathrm B & \text{the intersection of A and B}\\ \mathrm A ^ C \text{ or } \bar {\mathrm A} \text{ or } \mathrm A' & \text{the complement of A} \\ \mathrm A -\mathrm B & \text{the set of points in A that are not in B} \end{array}$$`

---

* **Commutative laws**

`$$\begin{aligned}A\cup B &=B\cup A\\ A\cap B &=B\cap A \end{aligned}$$`

--

* **Associative laws**

`$$\begin{aligned}A\cup (B \cup D) &=(A\cup B) \cup D\\ A\cap (B \cap D) &=(A\cap B) \cap D \end{aligned}$$`

--

* **Distributive laws**

`$$\begin{aligned}A\cap(B\cup D) & = (A\cap B) \cup(A\cap D)\\ A\cup(B\cap D) & = (A\cup B) \cap(A\cup D)\end{aligned}$$`

---

* **Proposition** (here C denotes the sample space)

`$$A\cap C =A\qquad A\cap A^C = \emptyset\\ A\cup C = C\qquad A\cup A^C = C\\ A\cap\emptyset=\emptyset\qquad A\cap A =A\\ A\cup \emptyset = A\qquad A\cup A=A$$`

--

* **De Morgan's Laws: 1st law**

`$$(A\cap B)^C = A^C \cup B^C$$`

--

* **De Morgan's Laws: 2nd law**

`$$(A\cup B)^C = A^C \cap B^C$$`

---

* **Mutually exclusive events**

If `\(A_1,\ldots, A_n\)` are subsets of `\(S\)` and `\(A_1,\ldots, A_n\)` are disjoint sets, that is `\(A_i \cap A_j =\emptyset\)` for all `\(i \neq j\)`, then `\(A_1,\ldots, A_n\)` are said to be **mutually exclusive**.

--

* **Exhaustive events**

If `\(A_1,\ldots, A_n\)` are subsets of `\(S\)` such that `\(A_1\cup A_2\cup\cdots\cup A_n = S\)`, then `\(A_1,\ldots, A_n\)` are said to be **exhaustive events**.

--

"Probability" is defined by

1. `\(Pr(S) = 1\)`
2. `\(Pr(A)\)` is non-negative
3. the probability of the union of mutually exclusive events is the sum of the probabilities of the individual events.
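
The set identities and the equiprobable rule can be checked mechanically. Below is a minimal R sketch (an added illustration, not part of the original slides), using a single roll of a fair die as an assumed toy sample space; the event names `A`, `B` and `D` are arbitrary.

```r
# Toy sample space: one roll of a fair die (assumed example).
S <- 1:6
A <- c(1, 2, 3)   # "at most 3"
B <- c(2, 4, 6)   # "even"

# De Morgan's laws: (A n B)^C = A^C u B^C  and  (A u B)^C = A^C n B^C
setequal(setdiff(S, intersect(A, B)), union(setdiff(S, A), setdiff(S, B)))      # TRUE
setequal(setdiff(S, union(A, B)),     intersect(setdiff(S, A), setdiff(S, B)))  # TRUE

# Equiprobable allocation Pr(A) = N_A / N, and additivity for disjoint events
Pr <- function(E) length(E) / length(S)
D <- c(5, 6)                                         # disjoint from A
isTRUE(all.equal(Pr(union(A, D)), Pr(A) + Pr(D)))    # TRUE (rule 3)
```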
---

* **The Event Space**

The event space B generated by the sample space S is defined as the collection of all subsets of S satisfying:

i) `\(\emptyset \in B\)`

ii) if `\(A\in B\)`, then `\(A^C \in B\)`

iii) if `\(A_1 \in B\)` and `\(A_2 \in B\)`, then `\(A_1\cup A_2 \in B\)`

--

Implying

- `\(S \in B\)`
- if `\(A_1 \in B\)` and `\(A_2 \in B\)`, then `\(A_1\cap A_2 \in B\)`
- for a collection of events `\(\{A_1,\ldots,A_n\}\)` we have `$$A_1\cup\ldots\cup A_n=\bigcup^n_{i=1} A_i = A$$` is such that `\(A\in B\)`
- for a collection of events `\(\{A_1,\ldots,A_n\}\)` we have `$$A_1\cap\ldots\cap A_n=\bigcap^n_{i=1} A_i = A$$` is such that `\(A\in B\)`

---

* **Probability**

A probability measure is a set function with domain B and codomain [0,1], which satisfies the following axioms

1. `\(Pr(S) = 1\)`
2. `\(Pr(A)\geq 0\)` for every `\(A\in B\)`
3. If `\(A_1,\ldots, A_n\)` is a sequence of mutually exclusive events in B (namely `\(A_i\cap A_j=\emptyset\)`, for all `\(i\neq j,\ i,j=1,\ldots,n\)`), and such that `\(A=\bigcup^n_{i=1}A_i\)` and `\(A\in B\)`, then `$$P(A)=P\left(\bigcup^n_{i=1}A_i\right)=\sum^n_{i=1}P(A_i)$$`

--

* **Properties**

- `\(Pr(\emptyset)=0\)`
- for any event `\(A\in B\)`, `\(0\leq Pr(A) \leq 1\)`,
- if A' denotes the complement of an event `\(A\in B\)`, then `\(Pr(A')=1-Pr(A)\)`,
- if A and B are any two events in the event space then `$$Pr(A\cup B)=Pr(A) + Pr(B) - Pr(A\cap B)$$`
- if A and B are two events in the event space such that `\(A\subset B\)`, then `\(Pr(A)\leq Pr(B)\)`.

---

* Theorem **Boole's inequality**

If `\(A_1,A_2,\ldots A_n\in B\)` then `$$P(A_1\cup A_2\cup \ldots \cup A_n)\leq\sum^n_{i=1} P(A_i)$$`

--

**Proof** Consider `\(n=2\)`

`$$P(A_1\cup A_2)=P(A_1)+P(A_2)-P(A_1\cap A_2)\leq P(A_1) +P(A_2)$$`

---

## Week 2

* **Probability Space**

A probability space consists of a triple (S,B,Pr(.)) where:

- S is a sample space of a random experiment
- B is a collection of subsets of S, a Boolean algebra
- Pr(.) is a probability measure defined on B

--

* **Equiprobable Allocation** Probability Assignment

Let `\(\omega_1,\ldots,\omega_N\)` denote the N elementary and mutually exclusive events that make up S. Then the probability measure given by

- `\(Pr(\omega_1)=Pr(\omega_2)=\cdots=Pr(\omega_N)=1/N\)`
- `\(Pr(A)\)` where `\(A\in B\)` is given by `\(Pr(A)=N_A/N\)`, where `\(N_A\)` is the number of elementary events in A,

is called the **equiprobable or equally likely probability function**

---

* Derived via an argument based upon

- Symmetry, perhaps for obvious reasons, or
- The Principle of Insufficient Reason, meaning that we have no reason to suppose anything other than that the elementary events are equally likely.

--

* **Probability as Relative Frequency** Probability Assignment

Let `\(n_A\)` denote the number of times the event `\(A\in B\)` occurred in `\(n\)` repetitions of a random experiment with sample space S.
Then the ratio `\(n_A/n\)` is called the relative frequency of the event A, and `\(Pr(A)=p\)`, or `\(Pr(A)\approx p\)`, where `\(p\)` is the number about which the relative frequency of A stabilizes as `\(n\)` increases, that is

`$$Pr(A)=\underset{n\rightarrow\infty}{\text{plim}}\left(\frac{n_A}{n}\right)=p$$`

--

Which is conceptual:

- an _infinite_ sequence of experimental trials

`$$\underset{n\rightarrow\infty}{\text{plim}}\left(\frac{n_A}{n}\right)=p\ \ \text{if}\ \ \lim_{n\rightarrow \infty}Pr\left(\left|\frac{n_A}{n}-p \right|>\epsilon\right)=0\ \text{ for every }\epsilon>0$$`

---

If `\(S=\{\omega_1,\ldots,\omega_N\}\)` and `\(Pr(\omega_1)=Pr(\omega_2)=\cdots=Pr(\omega_N)=1/N\)` then

`$$\lim_{n\rightarrow \infty}Pr\left(\left|\frac{n_{\omega_i}}{n}-\frac1N \right|>\epsilon\right)=0$$`

The two assignment rules agree.

--

An **a priori** probability - a probability evaluated before the event.

An **a posteriori** probability - a probability evaluated after the event.

--

* **Continuous Spaces and Relative Frequency**

Consider a random experiment with an uncountably infinite number of outcomes, each given by a point in the interval [a,b], so that `\(S=\{x:a\leq x\leq b\}\)`. An argument based on symmetry or insufficient reason breaks down. The event space B can be **defined in terms of intervals**.

--

1. By specifying the probability measure mathematically.
2. By evaluating the relative frequency with which different intervals occur.

---

* **Conditional Probability**

Let A and B be two events in `\(\mathcal{B}\)` of a probability space `\((\mathbb{S},\mathcal{B},Pr(.))\)`. Then the conditional probability of event A given that event B has occurred is denoted by `\(Pr(A|B)\)` and is given by

`$$Pr(A|B)=\frac{Pr(A\cap B)}{Pr(B)},\ \text{if}\ Pr(B)\neq0$$`

and is undefined if `\(Pr(B)=0\)`

--

It satisfies the axioms of a probability measure:

- `\(Pr(B|B)=1\)`
- `\(Pr(A|B)\geq 0\)`
- If `\(A_1\)` and `\(A_2\)` are mutually exclusive events then `$$Pr(A_1\cup A_2|B)=Pr(A_1|B)+Pr(A_2|B)$$`

---

* **Independence**

Let A and B be two events in `\(\mathcal{B}\)` of a probability space `\((\mathbb{S},\mathcal{B},Pr(.))\)`. Then the event A is said to be independent of the event B if and only if

`$$Pr(A|B)=Pr(A)$$`

--

In probability and statistics, events are thought of as independent if the occurrence of one event has no effect on the probability of occurrence of the other event.

--

`$$Pr(A\cap B)=Pr(A)\cdot Pr(B)$$`

--

* **Inverse Probability Law (Bayes Theorem)**

`$$Pr(A|B)=\frac{Pr(B|A)Pr(A)}{Pr(B)}=\frac{Pr(B|A)Pr(A)}{Pr(B|A)Pr(A)+Pr(B|A')Pr(A')}$$`

???

A village has three thieves. Thief A attempts a theft with probability a and succeeds with probability a'; for B and C the corresponding probabilities are b, b' and c, c'. One night something is stolen; the probability that A was the thief is aa'/(aa'+bb'+cc').

--

`$$Pr(B)=Pr(B|A)Pr(A)+Pr(B|A')Pr(A')$$`

---

`$$\begin{array}{c|c|c|c|c} y\backslash x &1&2&3&\text{marginal } f_y \\ \hline 1 & 0.40 &0.24&0.04&\\ \hline 2 & 0 & 0.16& 0.16&\\ \hline \text{marginal } f_x &&&&\\ \end{array}$$`

`$$P(y=1|x=1)=\frac{P(AB)}{P(B)} =\frac{P(y=1\ \&\ x=1)}{P(x=1)}= \frac{0.4}{0.4} = 1$$`

`$$P(y=2|x=1)=\frac{P(AB)}{P(B)} =\frac{P(y=2\ \&\ x=1)}{P(x=1)}= \frac{0}{0.4} = 0$$`

--

`$$P(y=1|x=2)=$$`

`$$P(y=2|x=2)=$$`

`$$P(y=1|x=3)=$$`

`$$P(y=2|x=3)=$$`

--

`$$\begin{aligned}P(y=1|x=1)= 1\ \ \ \ & \ \ \ \ P(y=2|x=1)= 0\\ P(y=1|x=2)= 0.6\ \ \ \ & \ \ \ \ P(y=2|x=2)= 0.4\\ P(y=1|x=3)= 0.2\ \ \ \ & \ \ \ \ P(y=2|x=3)= 0.8 \end{aligned}$$`

---

## Week 3

* **Random variable**

A __random variable__ is a rule that assigns a numerical outcome to an event in each possible state of the world (a phenomenon that cannot be predicted with perfect accuracy).

--

To define a random variable, we need

1. a list of all possible numerical outcomes
2. the corresponding probability for each numerical outcome
---

Let `\(X(c)\)` be a function that takes an element `\(c\in\mathcal{C}\)` (the sample space) to a number `\(x\)`. Let `\(D\)` be the set of all values of `\(x\)` that can be obtained by `\(X(c)\)` for all `\(c\in\mathcal{C}\)`.

`\(D = \{x:x=X(c),\ c\in\mathcal{C}\}\)` is the list of all possible numbers `\(x\)` that can be obtained, and this is a sample space for `\(X\)`. It can be an interval (X is a continuous random variable) or discrete/countable (X is a discrete random variable).

`$$Pr\{X\in A\}=Pr \{c\in\mathcal{C}:X(c)\in A \}$$`

--

Satisfying the basic probability axioms:

1. `\(Pr\{A \}\geq0\)`
2. `\(Pr\{D\}=Pr\{c\in\mathcal{C}:X(c)\in D\} = Pr\{\mathcal{C}\} = 1\)`
3. if `\(A_i \bigcap A_j=\emptyset\)` for all `\(i\neq j\)`, then `\(Pr\{\bigcup^\infty_{i=1}A_i\}=\sum^\infty_{i=1} Pr\{A_i\}\)`

---

* **Discrete random variable**

A __discrete random variable__ `\(X\)` has a finite (or countably infinite) number of distinct outcomes. For example, the number shown when rolling a die is a random variable with 6 distinct outcomes.

--

* **Probability distribution**

For a discrete random variable, a table listing every possible value together with its (nonzero) probability gives the entire **probability distribution**

--

`$$\begin{array}{c|c} x_i & Pr(X= x_i)\\ \hline x_1 & p_1\\ x_2 & p_2\\ x_3 & p_3\\ \vdots & \vdots\\ x_n & p_n\\ \hline \text{Total} & 1 \end{array}$$`

---

* **Probability mass function (pmf)**

The __probability mass function__ for the random variable `\(X\)`, denoted `\(f(x)\)`, enumerates the probability that `\(X=x\)` for all elements in `\(R(X)\)`. That is

`$$f(x)=\Pr(X=x)\text{ for } x\in R(X), \text{ and } f(x)=0 \text{ for all } x\not\in R(X)$$`

--

* **The Cumulative distribution function (CDF)**

`$$F(x)=P\{X\leq x\},\ \ -\infty< x<\infty$$`

--

The **CDF** is a table listing the values that X can take, along with `\(Pr(X\le x_i)\)`

`$$\begin{array}{c|c} x_i & Pr(X\le x_i)\\ \hline x_1 & p_1\\ x_2 & p_1+p_2\\ x_3 & p_1+p_2+p_3\\ \vdots & \vdots\\ x_n & p_1+p_2+p_3+\cdots +p_n=1 \end{array}$$`

---

* **Expectation** a measure of location

If `\(X\)` is a discrete random variable with pmf `\(f(x)\)`, then the expected value of `\(X\)`, denoted `\(\mathbb{E}[X]\)`, is given by

`$$\mathbb{E}(X) = \sum_{x\in R(X)} xf(x)$$`

`$$\mathbb{E}(X) = p_1x_1+p_2x_2+p_3x_3+\cdots +p_nx_n=\sum^n_{i=1}p_ix_i$$`

---

* **Variance**

`$$\sigma^2_X=Var(X)=E[(X-\mu_X)^2]$$`

Variance is a measure of the spread of the distribution of X around its mean. If X is an action with different possible outcomes, then Var(X) gives an indication of the _riskiness_ of that action.

--

* **Standard deviation**

$$\sigma_X=sd(X)=\sqrt{E[(X-\mu_X)^2]} $$

In finance, the standard deviation is called the _volatility_ of X. The advantage of the standard deviation over the variance is that it has the same units as X.

---

* Properties of the Expected Value

--

1. For any constant `\(c\)`, `\(E(c)=c\)`.

--

2. For any constants `\(a\)` and `\(b\)`, `$$E(aX+b) = aE(X) +b$$`

???

`$$E(aX+b)=\int(ax+b)f(x)dx=\int bf(x)dx+a\int xf(x)dx=b+aE(X)$$`

--

3. Expected value is a linear operator: the expected value of a sum of several random variables is the sum of their expected values. `$$E(X+Y+Z)=E(X)+E(Y)+E(Z)$$`
--

`$$E(a +bX+cY+dZ)=a+bE(X)+cE(Y)+dE(Z)$$`

--

`$$E(X^2)\neq (E(X))^2$$`

--

`$$E(\log X)\neq \log{(E(X))}$$`

---

* Properties of the Variance

`$$Var(aX) = a^2Var(X)\\ Var(a+ X) = Var(X)$$`

--

`$$Var(X+Y)=Var(X) + Var(Y) + 2Cov(X,Y)$$`

--

`$$Var(X) = E(X^2) - \mu^2$$`

--

* **Independence**

If events `\(A\)` and `\(B\)` are independent

`$$\Pr(A\cap B)=\Pr(A)\Pr(B)$$`

Random variables `\(X_1\)` and `\(X_2\)` are independent if and only if

`$$f(x_1, x_2)=f_1(x_1)f_2(x_2)$$`

--

If `\(X\)` and `\(Y\)` are independent

$$Var(X+Y)=Var(X) + Var(Y) $$

---

* **Uniform distribution (discrete)**

<img src="ETC2520_slides_S2_2018_files/figure-html/unnamed-chunk-1-1.png" width="45%" height="45%" style="display: block; margin: auto;" />

--

* **Bernoulli Trials**

`$$\begin{array}{l|r}x & Pr\{X=x\}\\ \hline 1 & p\\ 0 & 1-p\end{array}$$`

`$$E[X]=1\times p+0\times(1-p)=p$$`

`$$Var(X)=(1-p)^2\times p +(0-p)^2\times(1-p)=p(1-p)$$`

---

* **Binomial Distribution** (multiple independent Bernoulli trials)

X is the number of successes occurring in n (Bernoulli) trials

`$$\begin{aligned} Pr(X=x) &=\left(\begin{array}{c}n\\ x\end{array}\right)p^x(1-p)^{n-x}\\ &= \frac{n!}{x!(n-x)!}p^x(1-p)^{n-x},\quad \text{for } x=0,1,2,\ldots,n \end{aligned}$$`

* Combinations

`$$\left(\begin{array}{c}a\\ b\end{array}\right)=\frac{a!}{b!(a-b)!}=C^a_b$$`

--

`$$\begin{aligned} E[X]&=\sum^n_{x=0}xPr(X=x)\\ &=\sum^n_{x=0}x\left(\begin{array}{c}n\\ x\end{array}\right)p^x(1-p)^{n-x} =np\end{aligned}$$`

`$$\begin{aligned}Var(X)&=\sum^n_{x=0}(x-np)^2Pr(X=x)\\ &=np(1-p) \end{aligned}$$`

---

<img src="ETC2520_slides_S2_2018_files/figure-html/unnamed-chunk-2-1.png" width="45%" height="45%" />

--

* **The Hypergeometric Distribution** (dependent trials; sampling without replacement)

`$$Pr(X=x) = \frac{\left(\begin{array}{c}k\\ x\end{array}\right)\left(\begin{array}{c}N-k\\ n-x\end{array}\right)}{\left(\begin{array}{c}N\\ n\end{array}\right)}$$`

`$$E[X]=\frac{nk}{N}$$`

`$$Var(X)=\frac{nk(N-k)(N-n)}{N^2(N-1)}$$`

---

* **The Negative Binomial Distribution**

Let `\(Y=\)` the total number of failures before the `\(r^{th}\)` success. `\(Pr(Y=y)\)` is the probability that in a sequence of `\(y+r\)` (Bernoulli) trials the last trial yields the `\(r^{th}\)` success; it equals the probability of `\(r-1\)` successes in the first `\(y+r-1\)` trials, times the probability of a success on the last trial.

`$$Pr(Y=y)=\left(\begin{array}{c}y+r-1\\ r-1\end{array}\right)p^r(1-p)^y$$`

`$$E[Y]=\frac{r(1-p)}{p}$$`

`$$Var(Y) = \frac{r(1-p)}{p^2}$$`

---

* **The Geometric Distribution**

The negative binomial distribution when `\(r=1\)`

`$$Pr(Y=y)=p(1-p)^y$$`

`$$E[Y]=\frac{(1-p)}{p}$$`

`$$Var(Y) = \frac{(1-p)}{p^2}$$`

---

## Week 4

* **Continuous random variable**

A __continuous random variable__ can take a continuum of values within some interval (infinitely many values). For example, rainfall in Melbourne in May can be any number in the range from 0.00 to 200.00 mm.
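
For a continuous random variable, interval probabilities come from integrating a density rather than summing a pmf. A minimal R sketch follows (an added illustration, not part of the original slides), assuming, purely for illustration, that the May rainfall amount is uniformly distributed on [0, 200] mm.

```r
# Hypothetical example: rainfall X in [0, 200] mm with an assumed uniform density.
f <- function(x) dunif(x, min = 0, max = 200)    # density: 1/200 on [0, 200]

integrate(f, lower = 0, upper = 50)$value        # Pr(0 <= X <= 50) = 0.25
punif(50, min = 0, max = 200)                    # same probability via the CDF
integrate(f, lower = 0, upper = 200)$value       # total probability integrates to 1
```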
--

* **Probability Distribution Function** (CDF), non-decreasing

`$$F_X(x)=Pr(X\le x)$$`

`$$F_X(-\infty)=0\qquad F_X(\infty)=1\qquad 0\le F_X(x)\le1$$`

---

* **Probability density function**

A random variable `\(X\)` is called continuous if its range is uncountably infinite and there exists a non-negative-valued function `\(f(x)\)` defined on `\(\mathbb{R}\)` such that for any event `\(B\subset R(X)\)`, we have

`$$\Pr(B)=\int_B f(x)dx\ge 0, f(x)=0 \text{ for all } x\not\in R(X)\\ \int_\Omega f(x)dx=1$$`

`$$F_X(x)=\int^x_{-\infty}f_X(t)dt =Pr(X\le x)\qquad f_X(x)=\frac{dF_X(x)}{dx}$$`

--

If `\(X\)` is a continuous random variable with pdf `\(f(x)\)`, then the **expected value** of `\(X\)`, denoted `\(\mathbb{E}[X]\)`, is given by

`$$\mathbb{E}(X) = \int^b_{a} xf(x)dx$$`

`$$\mathbb{E}(h(X)) = \int^b_{a} h(x)f(x)dx$$`

`$$Var(X)=\int^b_{a}(x-E[X])^2f(x)dx$$`

---

* **Uniform distribution (continuous)** `\(X\sim Unif(a,b)\)`

`$$\text{PDF: }f_X(x)=\left\{\begin{array}{cl}\frac{1}{b-a} & \text{for } a<x<b\\ 0 & \text{otherwise} \end{array}\right.$$`

`$$\text{CDF: }F_X(x)=\left\{\begin{array}{cl}0&x\le a\\ \frac{x-a}{b-a} & a<x\le b\\ 1 & b<x \end{array}\right.$$`

`$$E[X]=\int^b_a\frac x{b-a}dx=\frac{a+b}2\\ Var(X)=E[X^2]-E^2[X]=\frac{(b-a)^2}{12}$$`

--

<img src="ETC2520_slides_S2_2018_files/figure-html/unnamed-chunk-3-1.png" width="45%" height="45%" /><img src="ETC2520_slides_S2_2018_files/figure-html/unnamed-chunk-3-2.png" width="45%" height="45%" />

---

* **Normal distribution**

A random variable `\(X\)` has pdf

`$$\phi_{(\mu, \sigma)}(x)=\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$`

`\(\mu\)` is the location parameter; `\(\sigma\)` is the scale parameter. Often denoted as `\(X\sim N(\mu, \sigma^2)\)`.

`$$\Phi_{(\mu, \sigma)}(x)=\int^x_{-\infty}\frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(t-\mu)^2}{2\sigma^2}}dt$$`

--

A linear combination of independent normal random variables is again normally distributed.

---

Special case: the standard normal distribution, `\(\mu=0\)` and `\(\sigma = 1\)`

`$$X\sim N(\mu, \sigma^2) \Leftrightarrow Z=\frac{(X-\mu)}\sigma\sim N(0,1)$$`

--

`$$\phi(z)=\phi(-z)\qquad \Phi(-z)=1-\Phi(z)\\ \begin{aligned}Pr(x_1<X<x_2)&=Pr(z_1<Z<z_2)\text{ where }z_i=\frac{x_i-\mu}{\sigma}\\ &=\Phi(z_2)-\Phi(z_1)\end{aligned}$$`

<img src="ETC2520_slides_S2_2018_files/figure-html/norm-1.png" width="45%" height="45%" style="display: block; margin: auto;" />

---

* **The Chi-squared distribution** `\(X\sim \chi^2(n)\)`

If `\(Z_1, Z_2,\ldots, Z_n\)` are independent standard normal random variables, then

`$$Y=Z_1^2+Z_2^2+\cdots+Z_n^2$$`

has a **chi-squared** distribution with n degrees of freedom.

`$$E[X]=n\qquad Var(X)=2n$$`

If `\(X\sim \chi^2(v)\)` and `\(Y\sim \chi^2(w)\)` are _independent_ then `\(X+Y\sim \chi^2(v+w)\)`

<img src="ETC2520_slides_S2_2018_files/figure-html/chi-1.png" width="45%" height="45%" style="display: block; margin: auto;" />

---

* **The Student-t distribution** `\(T\sim t_v\)`

If `\(Z\sim N(0,1)\)` and `\(Y\sim \chi^2(v)\)` are _independent_ then

`$$T=\frac{Z}{\sqrt{Y/v}}$$`

has a **Student-t** distribution with `\(v\)` degrees of freedom.

`$$E[T]=0 \text{ for } v>1\qquad Var(T) =\frac v{v-2} \text{ for }v>2$$`

The t distribution has fatter tails than N(0,1). As the degrees of freedom increase, it becomes more similar to N(0,1).
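
A quick numerical check of the fat-tails claim (an added sketch, not part of the original slides): the upper-tail probability `\(Pr(T>2)\)` is larger for the t distribution than for N(0,1), and shrinks toward the normal value as the degrees of freedom grow.

```r
# Compare upper-tail probabilities Pr(T > 2) across degrees of freedom
# with the standard normal tail Pr(Z > 2).
df <- c(3, 10, 30, 100)
pt(2, df = df, lower.tail = FALSE)    # t tails: largest at df = 3, decreasing with df
pnorm(2, lower.tail = FALSE)          # standard normal tail, approx 0.0228
```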
<img src="ETC2520_slides_S2_2018_files/figure-html/unnamed-chunk-4-1.png" width="45%" height="45%" style="display: block; margin: auto;" />

---

* **The F distribution** `\(F\sim F_{v_1,v_2}\)`

If `\(X\sim \chi^2(v_1)\)` and `\(Y\sim \chi^2(v_2)\)` are _independent_, then

`$$F=\frac{\frac X{v_1}}{\frac Y{v_2}}$$`

has an **F distribution** with `\(v_1\)` numerator and `\(v_2\)` denominator degrees of freedom.

`$$E[F]=\frac{v_2}{v_2-2} \text{ for }v_2>2\qquad Var(F)=\frac{2v_2^2(v_1+v_2-2)}{v_1(v_2-2)^2(v_2-4)}\text{ for } v_2 >4$$`

<img src="ETC2520_slides_S2_2018_files/figure-html/unnamed-chunk-5-1.png" width="45%" height="45%" style="display: block; margin: auto;" />

---

* **The lognormal distribution**

`\(Y\)` has a **lognormal distribution** when `\(\ln(Y)=X\)` has a normal distribution, for example

`$$X=\ln(Y)\sim N(1,1)\qquad Y\sim lognormal(\mu=1, \sigma=1)$$`

<img src="ETC2520_slides_S2_2018_files/figure-html/unnamed-chunk-6-1.png" width="45%" height="45%" style="display: block; margin: auto;" />

---

* **Transformation of variable**

For `\(Y=\phi(X)\)` where `\(\phi\)` is a one-to-one (monotone) function and

`$$\phi^{-1}(\phi(x))=x\qquad \phi(\phi^{-1}(y))=y$$`

* Using the CDF

**Monotone increasing**

`$$\begin{aligned}F_Y(y)&=Pr(Y\le y)\\ &=Pr(\phi(X)\le y)\\ &=Pr(X\le \phi^{-1}(y))\\ &=F_X(\phi^{-1}(y)) \end{aligned}$$`

**Monotone decreasing**

`$$\begin{aligned}F_Y(y)&=Pr(Y\le y)\\ &=Pr(\phi(X)\le y)\\ &=Pr(X\ge \phi^{-1}(y))\\ &=1-F_X(\phi^{-1}(y)) \end{aligned}$$`

---

* Using the pdf

**Monotone increasing**

`$$\begin{aligned}f_Y(y) &= \frac{dF_Y(y)}{dy}=\frac{dF_X(\phi^{-1}(y))}{dy}\\ &=\frac{dF_X(x)}{dx}\times\frac{d\phi^{-1}(y)}{dy}\\ &=f_X(x)\times \frac{d\phi^{-1}(y)}{dy}\\ &=f_X(\phi^{-1}(y))\times\frac{d\phi^{-1}(y)}{dy}\end{aligned}$$`

**Monotone decreasing**

`$$\begin{aligned}f_Y(y) &= \frac{dF_Y(y)}{dy}=-\frac{dF_X(\phi^{-1}(y))}{dy}\\ &=-\frac{dF_X(x)}{dx}\times\frac{d\phi^{-1}(y)}{dy}\\ &=-f_X(x)\times \frac{d\phi^{-1}(y)}{dy}\\ &=f_X(\phi^{-1}(y))\times\left|\frac{d\phi^{-1}(y)}{dy}\right|\end{aligned}$$`

---

<!-- Super simplified version from here -->

## Week 5 & 6

<img src="ETC2520_slides_S2_2018_files/figure-html/unnamed-chunk-7-1.png" width="45%" height="45%" /><img src="ETC2520_slides_S2_2018_files/figure-html/unnamed-chunk-7-2.png" width="45%" height="45%" />

---

* **Poisson Process**

The number of arrivals `\(N_t\)` in a finite interval of length `\(t\)` obeys `\(Poisson(\lambda t)\)`

1. `\(N(0)=0\)` and `\(N(s)\le N(t)\)` when `\(s<t\)` (non-decreasing)
2. Independent increments: for any `\(0<t_1<t_2<\cdots<t_{n-1}<t_n\)`, the increments `\(N(t_2)-N(t_1), \ldots, N(t_{n})-N(t_{n-1})\)` are mutually independent
3. Stationary increments: for any `\(0<t_1<t_2\)` and `\(h>0\)`, `\(P(N(t_2+h)-N(t_1+h)=k)=P(N(t_2)-N(t_1)=k)\)`

For the number of arrivals `\(X=N_h\)` in an interval of length `\(h\)`:

`$$Pr(X=x)=\frac{(\lambda h)^xe^{-\lambda h}}{x!}$$`

---

Theorem: The Poisson process is a pure birth process, i.e., in an infinitesimal time interval `\(h\)` at most one arrival may occur.
This happens with probability `\(\lambda h\)`, independently of arrivals outside the interval

`$$P(N_{t+h}-N_t=0)=P(N_h=0)=e^{-\lambda h}=1-\lambda h +o(h)$$`

`$$P(N_{t+h}-N_t=1)=P(N_h=1)=\lambda he^{-\lambda h}=\lambda h +o(h)$$`

* The time between claims (inter-arrival time)

`$$\begin{aligned}P(W_i >t) &=P(W_i>t|T_{i-1}=s)\quad\text{(by independent increments)}\\ &=P(T_i>t+s|T_{i-1}=s)\\ &=P(N_{t+s}=i-1|N_s=i-1)\\ &=P(N_{t+s}-N_s=0)\\ &=P(N_t=0)\\ &=e^{-\lambda t} \end{aligned}$$`

---

* Normal distribution approximates the Binomial distribution

`$$X\sim Binomial(n,p)\quad E(X)=np\quad Var(X)=np(1-p)\quad Y=\frac{X-np}{\sqrt{np(1-p)}}=\frac{X-np}{\sigma}$$`

Writing `\(q=1-p\)`,

`$$\begin{aligned}M_Y(t) &= E(e^{tY})=E(e^{\frac{t(X-np)}{\sqrt{np(1-p)}}})=E(e^{\frac{t(X-np)}{\sigma}})= e^{\frac{-npt}{\sigma}}M_X(t/\sigma)\\ &=e^{\frac{-npt}{\sigma}}((1-p)+pe^{t/\sigma})^n=((1-p)e^{-pt/\sigma}+pe^{(1-p)t/\sigma})^n\\ &=\left[(1-p)(1-\frac{pt}{\sigma}+\frac{p^2t^2}{2\sigma^2}+o(\frac{t^3}{\sigma^3}))+p(1+\frac{(1-p)t}{\sigma}+\frac{(1-p)^2t^2}{2\sigma^2}+o(\frac{t^3}{\sigma^3}))\right]^n\\ &=\left[q(1-\frac{pt}{\sigma}+\frac{p^2t^2}{2\sigma^2}+o(\frac{t^3}{\sigma^3}))+p(1+\frac{qt}{\sigma}+\frac{q^2t^2}{2\sigma^2}+o(\frac{t^3}{\sigma^3}))\right]^n\\ &=\left[ q-\frac{pqt}{\sigma}+\frac{qp^2t^2}{2\sigma^2}+p+\frac{pqt}{\sigma}+\frac{pq^2t^2}{2\sigma^2} +o(\frac{t^3}{\sigma^3}) \right]^n\\ &= \left[1+\frac{pqt^2}{2\sigma^2}(p+q)+o(\frac{t^3}{\sigma^3}) \right]^n\\ &=\left[1+\frac{pqt^2}{2npq}+o(\frac{t^3}{\sigma^3}) \right]^n\\ \lim_{n\rightarrow\infty}M_Y(t)&=\lim_{n\rightarrow\infty}(1+\frac{t^2}{2n})^n=e^{\frac{t^2}{2}} \end{aligned}$$`

---

* **L.I.E.** (Law of Iterated Expectation)

`$$E[h(X,Y)]=E[E[h(X,Y)|Y]]=E[E[h(X,Y)|X]]$$`

`$$E_{(X,Y)}[h(X,Y)]=E_{Y}[E_{X}[h(X,Y)|Y]]=E_{X}[E_{Y}[h(X,Y)|X]]$$`

`$$\begin{aligned}E(E(X|Y))&=E(\sum_x x\cdot P(X=x|Y) )\\ &= \sum_y(\sum_x x\cdot P(X=x|Y=y))\cdot P(Y=y)\\ &=\sum_y\sum_x x\cdot P(X=x, Y=y)\\ &=\sum_x x \sum_y P(X=x, Y=y)\\ &= \sum_xx\cdot P(X=x)=E(X) \end{aligned}$$`

`$$\begin{aligned}E(E(X|Y))&=E(\int_x x\cdot f(x|y)dx )\\ &= \int_y(\int_x x\cdot f(x|y)dx)\cdot f(y)dy\\ &= \int_y\int_x x\cdot f(x|y) f(y)dxdy\\ &=\int_y\int_x x\cdot f(x,y)dxdy\\ &=\int_x x \int_y f(x,y)dydx\\ &= \int_xx\cdot f(x)dx=E(X) \end{aligned}$$`

---

## Week 7 & 8

* WLLN and CLT

Refer to [here](https://fya.netlify.com/hist_asy.pdf)

--

* **Chebyshev's Inequality** another form

`$$E(\bar{Y}_n)=\mu\qquad Var(\bar{Y}_n)=\frac{\sigma^2}{n}$$`

`$$Pr(|\bar{Y}_n-\mu|>\epsilon)\leq \frac{Var(\bar{Y}_n)}{\epsilon^2}=\frac{\sigma^2}{n\epsilon^2}$$`

---

* Unbiasedness of `\(\hat{\sigma}^2\)`

`$$s_y^2=\hat{\sigma}_y^2=\frac{1}{n-1}\sum_{i=1}^n{(y_i-\bar{y})^2}$$`

--

`$$\begin{aligned}E[\hat{\sigma}_y^2]&=E \left[{\frac {1}{n-1}}\sum_{i=1}^{n}{\big (}y_{i}-{\bar{y}}{\big )}^{2}\right]\\ &=E {\bigg [}{\frac {1}{n-1}}\sum _{i=1}^{n}{\bigg (}(y_{i}-\mu )-({\bar{y}}-\mu ){\bigg )}^{2}{\bigg ]}\\ &=E {\bigg [}{\frac {1}{n-1}}\sum _{i=1}^{n}{\bigg (}(y_{i}-\mu )^{2}-2({\bar{y}}-\mu )(y_{i}-\mu )+({\bar{y}}-\mu )^{2}{\bigg )}{\bigg ]}\\ &=E {\bigg [}{\frac {1}{n-1}}\sum _{i=1}^{n}(y_{i}-\mu )^{2}-{\frac {2}{n-1}}({\bar{y}}-\mu )\sum _{i=1}^{n}(y_{i}-\mu )+{\frac {n}{n-1}}({\bar{y}}-\mu )^{2}{\bigg ]}\\ &=E{\bigg [}{\frac {1}{n-1}}\sum _{i=1}^{n}(y_{i}-\mu )^{2}-{\frac {2}{n-1}}({\bar{y}}-\mu )\cdot n\cdot(\bar{y}-\mu )+{\frac {n}{n-1}}({\bar{y}}-\mu )^{2}{\bigg ]}\\ &=E{\bigg [}{\frac {1}{n-1}}\sum _{i=1}^{n}(y_{i}-\mu )^{2}-{\frac {n}{n-1}}({\bar{y}}-\mu )^{2}{\bigg ]}\\ &={\frac {1}{n-1}}\sum _{i=1}^{n}E((y_{i}-\mu )^{2})-{\frac {n}{n-1}}E(({\bar{y}}-\mu )^{2})\\ &=\frac{n}{n-1}\sigma^2 - \frac{n}{n-1}\frac{1}{n}\sigma^2\\ &= \sigma^2 \end{aligned}$$`
---

If instead we divide by `\(n\)`, the estimator is biased:

`$$\begin{aligned}E[\hat{\sigma}_y^2]&=E \left[{\frac {1}{n}}\sum_{i=1}^{n}{\big (}y_{i}-{\bar{y}}{\big )}^{2}\right]\\ &=\frac{n}{n}\sigma^2 - \frac{n}{n}\frac{1}{n}\sigma^2\\ &= \frac{n-1}{n}\sigma^2 \end{aligned}$$`

---

* **Jensen's Inequality** (for a strictly concave function `\(g\)`, such as `\(\log\)`)

`$$E[g(X)]<g(E[X])$$`

--

Via Jensen's Inequality, we know `\(E_{\theta_0}[l(\theta_0)]\ge E_{\theta_0}[l(\theta)]\)`, where `\(l(\theta)\)` is the log-likelihood, `\(E_{\theta_0}[.]\)` denotes the expectation taken with respect to `\(\prod^n_{s=1}f(x_s|\theta_0)\)`, and `\(\theta_0\)` is the true, but unknown, parameter value.

--

* Proof:

`$$\begin{aligned}E\left\{\log[\frac{f(Y|\theta^\#)}{f(Y|\theta_0)}] \right\} &< \log\left\{E[\frac{f(Y|\theta^\#)}{f(Y|\theta_0)}] \right\}\\ \log\left\{E[\frac{f(Y|\theta^\#)}{f(Y|\theta_0)}] \right\} &=\log\int_y \frac{f(Y|\theta^\#)}{f(Y|\theta_0)} f(Y|\theta_0)dy\\ &=\log\int_y {f(Y|\theta^\#)}dy = \log1=0\\ E\left\{\log[\frac{f(Y|\theta^\#)}{f(Y|\theta_0)}] \right\} &<0\\ \Rightarrow E_{\theta_0}[l(\theta)] &< E_{\theta_0}[l(\theta_0)]\end{aligned}$$`

---

## Week 9

* **MSE Decomposition**

`$$MSE_T(\theta)=E[(T-\tau(\theta))^2]=Var(T)+[E[T]-\tau(\theta)]^2$$`

* Proof

`$$Bias=E[T]-\tau(\theta)\qquad Var(T)=E[(T-E(T))^2]$$`

`$$\begin{aligned}E[(T-\tau(\theta))^2]&=E[(T-E[T]+E[T]-\tau(\theta))^2]\\ &=E[(T-E[T])^2]+(E[T]-\tau(\theta))^2+2E[(T-E[T])(E[T]-\tau(\theta))] \end{aligned}$$`

`$$\begin{aligned}E[(T-E[T])(E[T]-\tau(\theta))]&=E[T]E[T]-E[T]E[T]-E[T]\tau(\theta)+E[T]\tau(\theta)\\ &=0 \end{aligned}$$`

---

* **Cramer-Rao Lower Bound**

`\(T(x_1,\ldots,x_n)\)` is an unbiased estimator of `\(\tau(\theta)\)`

`$$Var(T)\ge\frac{[\frac{\partial\tau(\theta)}{\partial\theta}]^2}{nE[(\frac{\partial \log f(X;\theta)}{\partial\theta})^2]}$$`

under **regularity conditions**:

1. `\(\frac{\partial \log f(X;\theta)}{\partial\theta}\)` exists for all `\(x\)` and all `\(\theta\)`
2. Integration of `\(f(X;\theta)\)` w.r.t. `\(x\)` and differentiation w.r.t. `\(\theta\)` are interchangeable
3. `\(0<E[(\frac{\partial \log f(X;\theta)}{\partial\theta})^2]<\infty\)`
---

* Proof

Cauchy-Schwarz Inequality

`$$(E[XY])^2\le E[X^2]E[Y^2]$$`

`$$(Cov(X,Y))^2\le Var(X)Var(Y)$$`

`$$Var(X)\ge \frac{(Cov(X,Y))^2}{Var(Y)}$$`

Take

`$$X=T\qquad Y=S(\theta)=\sum^n_{i=1}\frac{\partial\log f(x_i;\theta)}{\partial\theta}$$`

`$$Var(T)\ge \frac{(Cov(T,S(\theta)))^2}{Var(S(\theta))}$$`

---

`$$\begin{aligned}E\left[\frac{\partial\log f(X;\theta)}{\partial\theta}\right]&=\int\frac{\partial\log f(X;\theta)}{\partial\theta}f(X;\theta)dx\\ &=\int\frac{\partial f(X;\theta)}{\partial\theta} \frac{1}{f(X;\theta)}f(X;\theta)dx\\ &=\int\frac{\partial f(X;\theta)}{\partial\theta}dx\\ &=\frac{\partial}{\partial\theta}\int f(X;\theta)dx = 0 \end{aligned}$$`

`$$\begin{aligned}Var\left(\frac{\partial\log f(X;\theta)}{\partial\theta}\right)&=E\left[\left(\frac{\partial\log f(X;\theta)}{\partial\theta}\right)^2\right]-\left(E\left[\frac{\partial\log f(X;\theta)}{\partial\theta}\right]\right)^2\\ &=E\left[\left(\frac{\partial\log f(X;\theta)}{\partial\theta}\right)^2\right] \end{aligned}$$`

`$$E[S(\theta)]=n\times 0=0\qquad Var(S(\theta))=n\times Var\left(\frac{\partial\log f(X;\theta)}{\partial\theta}\right)=nE\left[\left(\frac{\partial\log f(X;\theta)}{\partial\theta}\right)^2\right]$$`

---

`$$\begin{aligned}Cov(T,S(\theta))&=E[T\cdot S(\theta)]-E[T]E[S(\theta)]=E[T\cdot S(\theta)]\\ &=\int\cdots\int T(x_1,\ldots,x_n)\sum_{i=1}^n\frac{\partial\log f(x_i;\theta)}{\partial\theta}\prod_{i=1}^n f(x_i;\theta)dx_1\ldots dx_n\\ &=\int\cdots\int T(x_1,\ldots,x_n)\frac{\partial\sum_{i=1}^n\log f(x_i;\theta)}{\partial\theta}\prod_{i=1}^n f(x_i;\theta)dx_1\ldots dx_n\\ &=\int\cdots\int T(x_1,\ldots,x_n)\frac{\partial\log \prod_{i=1}^n f(x_i;\theta)}{\partial\theta}\prod_{i=1}^n f(x_i;\theta)dx_1\ldots dx_n\\ &=\int\cdots\int T(x_1,\ldots,x_n)\frac{\partial \prod_{i=1}^n f(x_i;\theta)}{\partial\theta}\frac{1}{\prod_{i=1}^n f(x_i;\theta)}\prod_{i=1}^n f(x_i;\theta)dx_1\ldots dx_n\\ &=\int\cdots\int T(x_1,\ldots,x_n)\frac{\partial \prod_{i=1}^n f(x_i;\theta)}{\partial\theta}dx_1\ldots dx_n\\ &=\frac{\partial }{\partial\theta}\int\cdots\int T(x_1,\ldots,x_n) \prod_{i=1}^n f(x_i;\theta)dx_1\ldots dx_n\\ &=\frac{\partial }{\partial\theta}E[T]=\frac{\partial }{\partial\theta}\tau(\theta) \end{aligned}$$`

`$$Var(T)\ge\frac{[\frac{\partial\tau(\theta)}{\partial\theta}]^2}{nE[(\frac{\partial \log f(X;\theta)}{\partial\theta})^2]}$$`

---
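
As a numerical companion (an added sketch, not part of the original slides), consider the Bernoulli(p) model, where `\(\hat p=\bar X\)` is unbiased for `\(\tau(p)=p\)` and `\(E[(\partial \log f/\partial p)^2]=1/(p(1-p))\)`, so the bound is `\(p(1-p)/n\)`; a short simulation shows the variance of `\(\hat p\)` attaining it.

```r
# Check the Cramer-Rao bound for the Bernoulli(p) model by simulation.
# For this model the sample mean attains the bound p(1-p)/n exactly.
set.seed(2520)
p <- 0.3; n <- 50; reps <- 20000

phat <- replicate(reps, mean(rbinom(n, size = 1, prob = p)))

var(phat)         # simulated variance of the unbiased estimator p-hat
p * (1 - p) / n   # CRLB = [tau'(p)]^2 / (n * E[(d log f / dp)^2]) = 0.0042
```

---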