Muhammad Haris Rao
Our goal is a proof of the following theorem:
Theorem: (Weierstrass Approximation Theorem) The polynomials are dense in the space $\mathcal{C}\left( [0, 1], \mathbb{R} \right)$ of continuous real valued functions on $[0, 1]$ with respect to the uniform norm. That is, for any choice of continuous $f: [0, 1] \to \mathbb{R}$, there is a sequence of polynomials $f_n : [0, 1] \to \mathbb{R}$ such that \begin{align*} \lim_{n \to \infty} \left( \underset{x \in [0, 1]}{\sup} |f_n(x) - f(x)| \right) = 0 \end{align*}
The proof will be based on probabilistic methods. We will make use of the following result:
Theorem: (Markov and Chebyshev Inequalities) Let $X \ge 0$ be a non-negative real valued random variable with finite mean. Then, \begin{align*} \mathbf{P} \left( X \ge k \right) \le \frac{\mathbf{E}[X]}{k} \end{align*} for any real $k > 0$. Consequently, if $X$ is any square integrable random variable, then \begin{align*} \mathbf{P} \left( |X - \mathbf{E}[X]| \ge k \right) \le \frac{ \text{Var}[X]}{k^2} \end{align*} for any real $k > 0$.
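As a numerical aside (a Python sketch, not part of the proof; all names here are illustrative), Chebyshev's inequality for the sample mean $S_n/n$ of a binomial variable can be checked exactly by summing the probability mass function:

```python
from math import comb

def binom_pmf(n, k, p):
    # P(S_n = k) for S_n ~ Binomial(n, p)
    return comb(n, k) * p**k * (1 - p)**(n - k)

def tail_prob(n, p, delta):
    # P(|S_n/n - p| >= delta), computed exactly by summing the pmf
    return sum(binom_pmf(n, k, p) for k in range(n + 1)
               if abs(k / n - p) >= delta)

# Chebyshev's bound for X = S_n/n: Var[X]/delta^2 = p(1-p)/(n*delta^2)
n, p, delta = 100, 0.3, 0.1
exact = tail_prob(n, p, delta)
bound = p * (1 - p) / (n * delta**2)
assert exact <= bound  # the exact tail never exceeds the Chebyshev bound
```

The exact tail is typically far smaller than the Chebyshev bound; the inequality is crude but, as the proof below shows, crude suffices.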
Proof of Weierstrass Approximation Theorem. We wish to approximate some given (uniformly) continuous function $f: [0, 1] \to \mathbb{R}$. Given $p \in [0, 1]$, let $\{ X_n^{(p)} \}_{n \ge 1}$ be a sequence of independent Bernoulli random variables, each with success probability $p$, and set \begin{align*} S_n^{(p)} &= \sum_{k = 1}^n X_k^{(p)} \end{align*} so that $S_n^{(p)}$ is a binomial random variable with $n$ trials and success probability $p$. The average of the first $n$ of these Bernoulli random variables is $S_n^{(p)} / n$, so the weak law of large numbers tells us that for large values of $n$, the value of $S_n^{(p)}/n$ is very likely to be close to the mean $\mathbf{E}[X_k^{(p)}] = p$. Because $S_n^{(p)}/n$ is likely to be very close to $p$, we expect that $f(S_n^{(p)} / n)$ is close to $f(p)$ with high probability as well, so that presumably, \begin{align*} \mathbf{E} \left[ f\left( S_n^{(p)} / n \right) \right] \approx f(p) \end{align*} (note: this is where continuity of $f$ comes into play. To go from the statement that $x$ is close to $y$ to the statement that $f(x)$ is close to $f(y)$, we generally require that $f$ be continuous).
The larger the value of $n$, the closer we expect this approximation to be, so we suspect that $\mathbf{E}[f(S_n^{(p)}/n)]$ as a function of $p \in [0, 1]$ converges to the function $f(p)$ in some sense. It turns out this convergence will be in the uniform norm which is what we want. We will make this all rigorous now.
First, see that \begin{align*} \mathbf{E}\left[ f \left( S_n^{(p)} / n \right) \right] &= \sum_{k = 0}^n f \left( k / n \right) \mathbf{P} \left( S_n^{(p)} = k \right) = \sum_{k = 0}^n f \left( k / n \right) { n \choose k } p^k (1 - p)^{n - k} \end{align*} which is a (Bernstein) polynomial in $p \in [0, 1]$. So setting $f_n(p) = \mathbf{E}[f(S_n^{(p)}/n)]$, we have a sequence of polynomials $f_n : [0, 1] \to \mathbb{R}$. It remains to show that $f_n \longrightarrow f$ as $n \to \infty$ in the uniform norm.
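The expectation above is a finite sum, so the Bernstein polynomials are easy to evaluate numerically. The following Python sketch (with an arbitrary illustrative choice of $f$) shows the sup-norm error, estimated on a grid, shrinking as $n$ grows:

```python
from math import comb

def bernstein(f, n, p):
    # f_n(p) = E[f(S_n/n)] = sum_k f(k/n) C(n,k) p^k (1-p)^(n-k)
    return sum(f(k / n) * comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1))

f = lambda x: abs(x - 0.5)  # an arbitrary continuous test function

grid = [i / 200 for i in range(201)]
def sup_err(n):
    # sup-norm distance between f_n and f, estimated on the grid
    return max(abs(bernstein(f, n, p) - f(p)) for p in grid)

assert sup_err(200) < sup_err(20) < sup_err(2)
```

The convergence is slow (of order $1/\sqrt{n}$ for this non-smooth $f$), which is consistent with the Chebyshev-based error bound derived below.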
For this, choose a real number $M \ge 0$ such that \begin{align*} |f(x) - f(y)| \le M \end{align*} for all $x, y \in [0, 1]$; such an $M$ exists since $f$, being continuous on a compact set, is bounded. Next, let $\varepsilon > 0$ be arbitrary. It suffices to demonstrate that \begin{align*} \limsup_{n \to \infty} \left( \underset{p \in [0, 1]}{\sup} \left| f_n(p) - f(p) \right| \right) \le \varepsilon \end{align*} We have that \begin{align*} \left| f_n(p) - f(p) \right| &= \left| \mathbf{E}\left[ f \left( S_n^{(p)}/ n \right) \right] - \mathbf{E}\left[f(p)\right] \right| \\ &= \left| \mathbf{E}\left[ f \left( S_n^{(p)}/ n \right) - f(p) \right] \right| \\ &\le \mathbf{E} \left[ \left| f \left( S_n^{(p)}/ n \right) - f(p) \right| \right] \\ &= \mathbf{E} \left[ \left| f \left( S_n^{(p)}/ n \right) - f(p) \right| \mathbf{1}_A \right] + \mathbf{E} \left[ \left| f \left( S_n^{(p)}/ n \right) - f(p) \right|\mathbf{1}_{A^c} \right] \end{align*} where $\mathbf{1}_A$ is the indicator function on any event $A$ of the underlying probability space over which we are taking the expectation.
Using the uniform continuity of $f$, pick $\delta > 0$ such that whenever $|x - y| < \delta$, then $|f(x) - f(y)| < \varepsilon$, and then take $A$ to be the event \begin{align*} A &= \left\{ \left| \frac{S_n^{(p)}}{n} - p \right| < \delta \right\} \end{align*} Then the expectation over the set $A$ satisfies \begin{align*} \mathbf{E} \left[ \left| f \left( S_n^{(p)}/ n \right) - f(p) \right| \mathbf{1}_A \right] &\le \mathbf{E} \left[ \varepsilon \mathbf{1}_A \right] = \varepsilon \mathbf{P} \left( A \right) \le \varepsilon \end{align*} Bounding the integrand by $M$ and applying Chebyshev's inequality gives the following estimate for the expectation over $A^c$: \begin{align*} \mathbf{E} \left[ \left| f \left( S_n^{(p)}/ n \right) - f(p) \right|\mathbf{1}_{A^c} \right] &\le M \mathbf{E} \left[ \mathbf{1}_{A^c} \right] \\ &= M \mathbf{P} \left( \left| \frac{S_n^{(p)}}{n} - p \right| \ge \delta \right) \\ &\le \frac{M \text{Var} \left[ S_n^{(p)}/n \right]}{\delta^2} \\ &\le \frac{M}{4 n \delta^2} \end{align*} Here, we used the fact that the variance of a binomial random variable with parameters $n, p$ is $np(1 - p)$, and that $p(1 - p) \le 1/4$ for $p \in [0, 1]$. So now overall we have the inequality \begin{align*} \left| f_n(p) - f(p) \right| \le \varepsilon + \frac{M}{4 n \delta^2} \end{align*} Notice that the bound is independent of $p$. Hence, \begin{align*} \limsup_{n \to \infty} \left( \underset{p \in [0, 1]}{\sup} \left| f_n(p) - f(p) \right| \right) \le \lim_{n \to \infty} \left( \varepsilon + \frac{M}{4 n \delta^2} \right) = \varepsilon \end{align*} for all $\varepsilon > 0$. This can only mean \begin{align*} \lim_{n \to \infty} \left( \underset{p \in [0, 1]}{\sup} \left| f_n(p) - f(p) \right| \right) = 0 \end{align*} which completes the proof.
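The final bound can be exercised on a concrete instance (a Python sketch with illustrative choices: $f(x) = |x - 1/2|$ is 1-Lipschitz, so $\delta = \varepsilon$ is a valid modulus-of-continuity choice, and $M = 1/2$ bounds its oscillation on $[0,1]$):

```python
from math import comb

def bernstein(f, n, p):
    # f_n(p) = E[f(S_n/n)] for S_n ~ Binomial(n, p)
    return sum(f(k / n) * comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1))

f = lambda x: abs(x - 0.5)  # 1-Lipschitz: |f(x) - f(y)| <= |x - y|
M = 0.5                     # |f(x) - f(y)| <= M for x, y in [0, 1]
eps = 0.1
delta = eps                 # valid since f is 1-Lipschitz

n = 500
grid = [i / 200 for i in range(201)]
sup_err = max(abs(bernstein(f, n, p) - f(p)) for p in grid)
# the bound from the proof: sup_p |f_n(p) - f(p)| <= eps + M/(4 n delta^2)
assert sup_err <= eps + M / (4 * n * delta**2)
```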
Corollary: Let $K \subseteq \mathbb{R}$ be compact, and let $f : K \longrightarrow \mathbb{R}$ be continuous. Then there is a sequence of polynomials $f_n : \mathbb{R} \longrightarrow \mathbb{R}$ such that \begin{align*} \sup_{x \in K} \left| f_n(x) - f(x) \right| \longrightarrow 0 \text{ as $n \to \infty$} \end{align*}
Proof. By the Heine-Borel theorem, $K$ is bounded, so there is $M \in \mathbb{R}_{> 0}$ such that $K \subseteq [-M, M]$. By the Tietze extension theorem, $f$ extends to a continuous function $g : [-M, M] \longrightarrow \mathbb{R}$ such that $g \mid_K = f$. Define $\phi : [-M, M] \longrightarrow [0, 1]$ by \begin{align*} \phi \left( x \right) = \frac{x + M}{2M} \end{align*} If $x \in [-M, M]$, then $x \ge -M$ gives $x + M \ge 0$, so $\phi(x) \ge 0$; and $x \le M$ gives $x + M \le 2M$, so $\phi(x) \le 1$. Thus $\phi$ indeed maps $[-M, M]$ into the unit interval, and it is easily checked that $\phi$ maps $[-M, M]$ homeomorphically onto $[0, 1]$, with inverse $\phi^{-1}(x) = 2Mx - M$. Therefore, $g \circ \phi^{-1} : [0, 1] \longrightarrow \mathbb{R}$ is a continuous function on $[0, 1]$. By the Weierstrass approximation theorem, there exists a sequence of polynomials $g_n : [0, 1] \longrightarrow \mathbb{R}$ which converges uniformly to $g \circ \phi^{-1}$ as $n \to \infty$. Since $\phi$ is a polynomial, it follows that $g_n \circ \phi : [-M, M] \longrightarrow \mathbb{R}$ is a polynomial as well. Now to prove $g_n \circ \phi$ converges uniformly to $g : [-M, M] \longrightarrow \mathbb{R}$ on $[-M, M]$. This is simply due to the fact that \begin{align*} \sup_{x \in [-M, M]} \left| g(x) - (g_n \circ \phi) (x) \right| = \sup_{x \in [0, 1]} \left| g(\phi^{-1}(x)) - (g_n \circ \phi) (\phi^{-1}(x)) \right| = \sup_{x \in [0, 1]} \left| \left( g \circ \phi^{-1} \right) (x) - g_n \left( x \right) \right| \longrightarrow 0 \text{ as $n \to \infty$} \end{align*} since $g_n \longrightarrow g \circ \phi^{-1}$ uniformly on $[0, 1]$. Finally, set $f_n = g_n \circ \phi \mid_K$, which is a polynomial.
Then, \begin{align*} \sup_{x \in K} \left| f(x) - f_n \left( x \right) \right| = \sup_{x \in K} \left| \left( g \mid_K \right) (x) - \left( g_n \circ \phi \mid_K \right) \left( x \right) \right| \le \sup_{x \in [-M, M]} \left| g(x) - (g_n \circ \phi) (x) \right| \longrightarrow 0 \text{ as $n \to \infty$} \end{align*} Thus, the sequence of polynomials $f_n$ converges uniformly to $f$ on $K$.
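The change of variables in this proof is easy to exercise numerically. In the following Python sketch the interval bound $M$, the function $g$, and the grid are all illustrative choices:

```python
from math import comb

def bernstein(h, n, t):
    # n-th Bernstein polynomial of h, evaluated at t in [0, 1]
    return sum(h(k / n) * comb(n, k) * t**k * (1 - t)**(n - k)
               for k in range(n + 1))

M = 2.0
phi = lambda x: (x + M) / (2 * M)   # homeomorphism [-M, M] -> [0, 1]
phi_inv = lambda t: 2 * M * t - M   # its inverse

g = lambda x: x**3 - x              # continuous on [-M, M]
h = lambda t: g(phi_inv(t))         # g o phi^{-1}, a function on [0, 1]

# f_n = (Bernstein polynomial of h) o phi approximates g on [-M, M]
approx = lambda x, n: bernstein(h, n, phi(x))

grid = [-M + 2 * M * i / 200 for i in range(201)]
def sup_err(n):
    # sup-norm distance between g and its approximant, estimated on the grid
    return max(abs(approx(x, n) - g(x)) for x in grid)

assert sup_err(100) < sup_err(10)
```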
Lemma: Let $X$ be a compact Hausdorff space, and $\mathcal{A} \subseteq \mathcal{C} \left( X, \mathbb{R} \right)$ an $\mathbb{R}$-algebra of continuous functions which contains the constant functions. If for every pair of distinct $x, y \in X$ there exists $f \in \mathcal{A}$ such that $f(x) \ne f(y)$, then $\overline{\mathcal{A}} = \mathcal{C} \left( X, \mathbb{R} \right)$, where the closure is with respect to the uniform norm.
Proof. First, to prove that for every $f \in \mathcal{A}$, we have $|f| \in \overline{\mathcal{A}}$. Since $f$ is continuous and $X$ compact, its range $f(X) \subseteq \mathbb{R}$ is a compact set. Thus, by the corollary above, there exists a sequence of polynomials $p_n : f(X) \longrightarrow \mathbb{R}$ which converges uniformly to the absolute value function $| \cdot | : f(X) \longrightarrow \mathbb{R}_{\ge 0}$. It follows that $p_n \circ f \longrightarrow |f|$ uniformly on $X$. Because $\mathcal{A}$ is an algebra containing the constants and $f \in \mathcal{A}$, it follows that $p_n \circ f \in \mathcal{A}$, since $p_n \circ f$ is obtained from $f$ and constants using only pointwise addition, multiplication and scalar multiplication. Thus, $|f|$ is a limit point of $\mathcal{A}$, as was to be shown.
This implies that for any $f, g \in \mathcal{A}$, the pointwise maximum and minimum of $f$ and $g$ lie in the closure $\overline{\mathcal{A}}$. This is because \begin{align*} \max\{f, g \} &= \frac{f + g}{2} + \frac{|f - g|}{2} \\ \min\{f, g \} &= \frac{f + g}{2} - \frac{|f - g|}{2} \end{align*} and $\overline{\mathcal{A}}$, like $\mathcal{A}$, is closed under pointwise addition and scalar multiplication.
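The two identities are elementary; this Python sketch merely spot-checks them pointwise on random inputs:

```python
import random

random.seed(0)
for _ in range(1000):
    a = random.uniform(-10.0, 10.0)
    b = random.uniform(-10.0, 10.0)
    # max{f, g} = (f+g)/2 + |f-g|/2 and min{f, g} = (f+g)/2 - |f-g|/2,
    # checked pointwise at the values a = f(x), b = g(x)
    assert abs(max(a, b) - ((a + b) / 2 + abs(a - b) / 2)) < 1e-12
    assert abs(min(a, b) - ((a + b) / 2 - abs(a - b) / 2)) < 1e-12
```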
Now let $f \in \mathcal{C} \left( X, \mathbb{R} \right)$ be arbitrary. We will prove that $f \in \overline{\mathcal{A}}$. For this, instead of finding a sequence in $\mathcal{A}$ converging to $f$, it suffices to find a sequence in $\overline{\mathcal{A}}$ converging to $f$, since the closed set $\overline{\mathcal{A}}$ contains its limit points.
Let $\varepsilon > 0$. Choose $x_0 \in X$ arbitrarily. For every $y \in X - \{ x_0 \}$, since $\mathcal{A}$ separates points, there is $h_y \in \mathcal{A}$ with $h_y(x_0) \ne h_y(y)$; since $\mathcal{A}$ also contains the constants, the function \begin{align*} f_y &= f(x_0) + \left( f(y) - f(x_0) \right) \frac{h_y - h_y(x_0)}{h_y(y) - h_y(x_0)} \end{align*} lies in $\mathcal{A}$ and satisfies $f_y(x_0) = f(x_0)$ and $f_y(y) = f(y)$. For $y = x_0$, take $f_{x_0}$ to be the constant function $f(x_0)$. Let $V_y = f_y^{-1} \left( \left( f(y) - \varepsilon, f(y) + \varepsilon \right) \right)$. Since $f_y(y) = f(y)$, we have $y \in V_y$, so the collection of open sets $\{ V_y \mid y \in X \}$ covers $X$. By compactness of $X$, there is a finite subcover $\{ V_{y_i} \mid i \in \{ 1,2,\cdots, n\} \}$ for some choice of $\{ y_1, y_2, \cdots, y_n \} \subseteq X$. Now take \begin{align*} g = \min\{ f_{y_1}, f_{y_2}, \cdots, f_{y_n} \} \end{align*} Then for every $y \in X$, we have $y \in V_{y_i}$ for some $i \in \{ 1, 2, \cdots, n \}$, and so $f_{y_i} (y) \in \left( f(y_i) - \varepsilon, f(y_i) + \varepsilon \right)$. Thus, \begin{align*} g \left( y \right) &= \min\{ f_{y_1}, f_{y_2}, \cdots, f_{y_n} \} \left( y \right) \le f_{y_i} \left(y\right) < f(y_i) + \varepsilon \end{align*}