Markov’s inequality
For any non-negative random variable $X$ and any $t > 0$, we have
$$\Pr[X \ge t] \le \frac{\mathbb{E}[X]}{t}.$$
Proof
Fix any $t > 0$. We can represent any real number $x \ge 0$ as
$$x = x\,\mathbf{1}\{x \ge t\} + x\,\mathbf{1}\{x < t\} \ge t\,\mathbf{1}\{x \ge t\}.$$
Then we have
$$\mathbb{E}[X] \ge \mathbb{E}\left[t\,\mathbf{1}\{X \ge t\}\right] = t\,\Pr[X \ge t],$$
and dividing both sides by $t$ gives the claim.
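As a quick numerical sanity check (an addition to these notes, not part of the proof), the following Python sketch estimates $\Pr[X \ge t]$ by simulation and compares it with the Markov bound $\mathbb{E}[X]/t$; the choice of an exponential variable and the sample size are arbitrary.

```python
import numpy as np

# Illustrative check of Markov's inequality with X ~ Exponential(1),
# an arbitrary choice of non-negative random variable (E[X] = 1).
rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=1_000_000)

for t in [0.5, 1.0, 2.0, 5.0]:
    empirical = np.mean(x >= t)   # Monte Carlo estimate of P[X >= t]
    bound = x.mean() / t          # Markov bound E[X] / t
    print(f"t={t}: P[X >= t] ~ {empirical:.4f} <= {bound:.4f}")
```

Note that for $t \le \mathbb{E}[X]$ the bound exceeds $1$ and is vacuous; Markov's inequality is only informative for $t > \mathbb{E}[X]$.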
Chebyshev’s inequality
Let $X$ be any random variable. Then, for every $t > 0$ we have
$$\Pr\left[|X - \mathbb{E}[X]| \ge t\right] \le \frac{\mathrm{Var}[X]}{t^2}.$$
Proof
By using Markov’s inequality on the non-negative random variable $(X - \mathbb{E}[X])^2$ we have
$$\Pr\left[|X - \mathbb{E}[X]| \ge t\right] = \Pr\left[(X - \mathbb{E}[X])^2 \ge t^2\right] \le \frac{\mathbb{E}\left[(X - \mathbb{E}[X])^2\right]}{t^2} = \frac{\mathrm{Var}[X]}{t^2}.$$
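The same kind of hedged simulation works here (again my own illustration, with a uniform variable as an arbitrary example):

```python
import numpy as np

# Illustrative check of Chebyshev's inequality with X ~ Uniform(0, 1),
# for which E[X] = 1/2 and Var[X] = 1/12.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=1_000_000)
mean, var = x.mean(), x.var()

for t in [0.1, 0.2, 0.4]:
    empirical = np.mean(np.abs(x - mean) >= t)  # P[|X - E[X]| >= t]
    bound = var / t**2                          # Chebyshev bound
    print(f"t={t}: P[|X - E[X]| >= t] ~ {empirical:.4f} <= {bound:.4f}")
```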
Chernoff’s inequality
Let $X_1, \dots, X_n$ be independent Bernoulli random variables with parameters $p_1, \dots, p_n$. Let $S_n = \sum_{i=1}^{n} X_i$, with mean $\mu = \mathbb{E}[S_n] = \sum_{i=1}^{n} p_i$.
Then, for every value $t > \mu$, we have
$$\Pr[S_n \ge t] \le e^{-\mu} \left(\frac{e\mu}{t}\right)^{t}.$$
Proof
We use the same approach as for Chebyshev’s inequality, passing through Markov’s inequality. Fix any $s > 0$; then
$$\Pr[S_n \ge t] = \Pr\left[e^{s S_n} \ge e^{st}\right] \le e^{-st}\,\mathbb{E}\left[e^{s S_n}\right] = e^{-st} \prod_{i=1}^{n} \mathbb{E}\left[e^{s X_i}\right] = e^{-st} \prod_{i=1}^{n} \left(1 + p_i(e^{s} - 1)\right) \le e^{-st} \prod_{i=1}^{n} e^{p_i(e^{s} - 1)},$$
where the product factorizes by independence, and the last inequality holds since $1 + x \le e^{x}$ for every $x$.
Therefore
$$\Pr[S_n \ge t] \le e^{-st}\, e^{\mu(e^{s} - 1)} \quad \text{for every } s > 0.$$
Since this function is convex in $s$, we can minimize the bound by finding the value of $s$ s.t. the first derivative becomes $0$. By monotonicity of the exponential, it holds when the first derivative of the exponent $\mu(e^{s} - 1) - st$ is $0$, i.e., when
$$\mu e^{s} - t = 0 \iff s = \ln\frac{t}{\mu},$$
which is positive since $t > \mu$.
Finally
$$\Pr[S_n \ge t] \le e^{-t \ln(t/\mu)}\, e^{\mu(t/\mu - 1)} = e^{t - \mu} \left(\frac{\mu}{t}\right)^{t} = e^{-\mu} \left(\frac{e\mu}{t}\right)^{t}.$$
In a similar way, we can prove the lower tail bound: for every $t < \mu$,
$$\Pr[S_n \le t] \le e^{-\mu} \left(\frac{e\mu}{t}\right)^{t}.$$
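To see the multiplicative bound in action, here is a small simulation sketch (my addition; the values $n = 100$, $p = 0.1$ are arbitrary) comparing the empirical upper tail of $S_n \sim \mathrm{Binomial}(n, p)$, the equal-parameter instance of the theorem, with $e^{-\mu}(e\mu/t)^{t}$:

```python
import numpy as np

# Illustrative check of the multiplicative Chernoff bound in the
# equal-parameter case, where S_n ~ Binomial(n, p) and mu = n * p.
rng = np.random.default_rng(0)
n, p = 100, 0.1
mu = n * p
s = rng.binomial(n, p, size=1_000_000)  # samples of S_n

for t in [15, 20, 30]:                  # values t > mu = 10
    empirical = np.mean(s >= t)
    bound = np.exp(-mu) * (np.e * mu / t) ** t
    print(f"t={t}: P[S_n >= t] ~ {empirical:.2e} <= {bound:.2e}")
```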
Chernoff’s inequality: small deviations
Let $X_1, \dots, X_n$ be independent Bernoulli random variables with parameters $p_1, \dots, p_n$. Let $S_n = \sum_{i=1}^{n} X_i$, with mean $\mu = \mathbb{E}[S_n] = \sum_{i=1}^{n} p_i$.
Then, for every $\delta \in (0, 1]$ we have
$$\Pr\left[|S_n - \mu| \ge \delta\mu\right] \le 2 e^{-c\mu\delta^2},$$
where $c > 0$ is an absolute constant.
Proof
We can re-write the probability as the sum of the two tails:
$$\Pr\left[|S_n - \mu| \ge \delta\mu\right] = \Pr\left[S_n \ge (1+\delta)\mu\right] + \Pr\left[S_n \le (1-\delta)\mu\right].$$
We now bound the two tails separately. Applying Chernoff’s inequality, proved above, with $t = (1+\delta)\mu$ and $t = (1-\delta)\mu$, we obtain
$$\Pr\left[S_n \ge (1+\delta)\mu\right] \le \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}$$
and
$$\Pr\left[S_n \le (1-\delta)\mu\right] \le \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}.$$
Since $\ln(1+\delta) \ge \frac{2\delta}{2+\delta}$ for every $\delta \ge 0$, then
$$\delta - (1+\delta)\ln(1+\delta) \le \delta - \frac{2\delta(1+\delta)}{2+\delta} = -\frac{\delta^2}{2+\delta} \le -\frac{\delta^2}{3} \quad \text{for } \delta \le 1.$$
Since $(1-\delta)\ln(1-\delta) \ge -\delta + \frac{\delta^2}{2}$ (by the Taylor expansion of $\ln(1-\delta)$), then
$$-\delta - (1-\delta)\ln(1-\delta) \le -\frac{\delta^2}{2}.$$
Finally
$$\Pr\left[|S_n - \mu| \ge \delta\mu\right] \le e^{-\mu\delta^2/3} + e^{-\mu\delta^2/2} \le 2 e^{-c\mu\delta^2} \quad \text{with } c = \frac{1}{3}.$$
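The following sketch (an illustration of mine, assuming the constant $c = 1/3$ obtained in the proof and an arbitrary equal-parameter setup) checks the two-sided small-deviation bound by simulation:

```python
import numpy as np

# Illustrative check of the small-deviation bound with c = 1/3 (the constant
# from the proof above), in the equal-parameter case S_n ~ Binomial(n, p).
rng = np.random.default_rng(0)
n, p = 1000, 0.5
mu = n * p
s = rng.binomial(n, p, size=1_000_000)

for delta in [0.02, 0.05, 0.1]:
    empirical = np.mean(np.abs(s - mu) >= delta * mu)
    bound = 2 * np.exp(-mu * delta**2 / 3)
    print(f"delta={delta}: P[|S_n - mu| >= delta*mu] ~ {empirical:.4f} <= {bound:.4f}")
```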
Chernoff’s inequality: Additive form
Let $X_1, \dots, X_n$ be independent Bernoulli random variables with parameters $p_1, \dots, p_n$. Let $S_n = \sum_{i=1}^{n} X_i$, with mean $\mu = \mathbb{E}[S_n] = \sum_{i=1}^{n} p_i$.
Then, for every $t \ge 0$ we have
$$\Pr[S_n \ge \mu + t] \le e^{-2t^2/n} \quad \text{and} \quad \Pr[S_n \le \mu - t] \le e^{-2t^2/n}.$$
In the special case where all the parameters are equal, i.e. $p_i = p$ and $\mu = np$, we have
$$\Pr\left[\left|\frac{S_n}{n} - p\right| \ge \epsilon\right] \le 2 e^{-2n\epsilon^2} \quad \text{for every } \epsilon \ge 0.$$
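A quick simulation of this special case (again my own illustrative check, with arbitrary $n$ and $p$):

```python
import numpy as np

# Illustrative check of the additive form in the special case p_i = p:
# P[|S_n/n - p| >= eps] <= 2 * exp(-2 * n * eps^2).
rng = np.random.default_rng(0)
n, p = 500, 0.3
s = rng.binomial(n, p, size=1_000_000)

for eps in [0.02, 0.05, 0.08]:
    empirical = np.mean(np.abs(s / n - p) >= eps)
    bound = 2 * np.exp(-2 * n * eps**2)
    print(f"eps={eps}: P[|S_n/n - p| >= eps] ~ {empirical:.4f} <= {bound:.4f}")
```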
Hoeffding’s inequality
Let $X_1, \dots, X_n$ be independent symmetric Bernoulli random variables, i.e., random variables that assume value $-1$ or $+1$ with probability $1/2$ each. Let $S_n = \sum_{i=1}^{n} X_i$. Then, for every $t \ge 0$, we have
$$\Pr[S_n \ge t] \le e^{-t^2/(2n)}.$$
Proof
We start by multiplying both sides of the inequality $S_n \ge t$ by a value $s > 0$ and exponentiating them (as in the proof of Chernoff’s inequality), and then we apply Markov’s inequality:
$$\Pr[S_n \ge t] = \Pr\left[e^{s S_n} \ge e^{st}\right] \le e^{-st}\,\mathbb{E}\left[e^{s S_n}\right] = e^{-st} \prod_{i=1}^{n} \mathbb{E}\left[e^{s X_i}\right],$$
where the last equality holds by the independence of the random variables.
For every $i$, since $X_i$ assumes the values $-1$ and $+1$ with probability $1/2$ each, we have that
$$\mathbb{E}\left[e^{s X_i}\right] = \frac{e^{s} + e^{-s}}{2} = \cosh(s)$$
(see hyperbolic functions). It can be proved that $\cosh(x) \le e^{x^2/2}$ for every $x \in \mathbb{R}$, by comparing the two Taylor series term by term: $\cosh(x) = \sum_{k \ge 0} \frac{x^{2k}}{(2k)!}$, $e^{x^2/2} = \sum_{k \ge 0} \frac{x^{2k}}{2^k k!}$, and $(2k)! \ge 2^k k!$. Therefore,
$$\mathbb{E}\left[e^{s X_i}\right] \le e^{s^2/2}.$$
By combining the previous inequalities we have
$$\Pr[S_n \ge t] \le e^{-st} \prod_{i=1}^{n} e^{s^2/2} = e^{-st + n s^2/2}$$
for every $s > 0$.
Since the function $s \mapsto e^{-st + n s^2/2}$ is convex, we can minimize it simply by finding the value of $s$ such that the derivative is $0$ (thus providing the best upper bound of this form). The derivative of the exponent is $-t + ns$, and it’s equal to $0$ when $s = t/n$. Therefore,
$$\Pr[S_n \ge t] \le e^{-t^2/n + t^2/(2n)} = e^{-t^2/(2n)}.$$
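As before, a short simulation sketch (my addition, with arbitrary $n$) for Hoeffding’s bound; here $S_n$ is sampled via $2\,\mathrm{Binomial}(n, 1/2) - n$, which has the same distribution as a sum of $n$ independent $\pm 1$ variables:

```python
import numpy as np

# Illustrative check of Hoeffding's inequality for Rademacher sums:
# S_n = 2 * Binomial(n, 1/2) - n has the law of a sum of n symmetric
# Bernoulli (+/-1) variables.
rng = np.random.default_rng(0)
n = 200
s = 2 * rng.binomial(n, 0.5, size=1_000_000) - n

for t in [10, 20, 40]:
    empirical = np.mean(s >= t)
    bound = np.exp(-t**2 / (2 * n))
    print(f"t={t}: P[S_n >= t] ~ {empirical:.4e} <= {bound:.4e}")
```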
Hoeffding’s inequality, two-sided
Let $X_1, \dots, X_n$ be independent symmetric Bernoulli random variables, i.e., random variables that assume value $-1$ or $+1$ with probability $1/2$ each. Let $S_n = \sum_{i=1}^{n} X_i$. Then, for every $t \ge 0$, we have
$$\Pr\left[|S_n| \ge t\right] \le 2 e^{-t^2/(2n)}.$$
Proof
Let $Y_i = -X_i$. The variables $Y_1, \dots, Y_n$ are again independent symmetric Bernoulli random variables, so we can simply apply Hoeffding’s inequality to the $Y_i$ instead of the $X_i$, and obtain the same bound for $\Pr[-S_n \ge t] = \Pr[S_n \le -t]$. Then
$$\Pr\left[|S_n| \ge t\right] = \Pr[S_n \ge t] + \Pr[S_n \le -t] \le 2 e^{-t^2/(2n)}.$$
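And the two-sided version, checked the same way (an illustrative sketch on the same Rademacher sums as above):

```python
import numpy as np

# Illustrative check of the two-sided Hoeffding bound on Rademacher sums.
rng = np.random.default_rng(0)
n = 200
s = 2 * rng.binomial(n, 0.5, size=1_000_000) - n

for t in [10, 20, 40]:
    empirical = np.mean(np.abs(s) >= t)
    bound = 2 * np.exp(-t**2 / (2 * n))
    print(f"t={t}: P[|S_n| >= t] ~ {empirical:.4e} <= {bound:.4e}")
```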