Hypothesis testing

One-sample hypothesis tests

Null hypothesis H₀: already known, established, default, status quo, old, pre-existing, current practice, well-known, working assumption, nothing new, boring. The (generic) parameter φ equals some number a; there is no difference.
Alternative hypothesis H_A: new, exciting, hoped/wished, changed, different, research, challenger, the conjecture, claim. Either the parameter φ<a, or φ>a, or φ≠a; there is a difference, there is an effect.
Test whether the sample (i.e. its statistic and its size, n) provides enough evidence to overthrow ("warrant rejection of") the null hypothesis. Is the sample statistic extreme enough?
Either "reject" or "fail to reject" the null hypothesis; never "accept" it. Rejecting it ≡ "supporting" the alternative.
The alternative hypothesis is neither rejected nor accepted.
Nothing is ever "proven" (that would need the entire population).

8-3. T-Test for mean μ. Uses μ, s, x̄, and n. Test statistic is t.
Data is normal or n≥30.

8-3 (Obsolete). Z-Test for mean μ if σ is known (rare) [or with large n, using s for σ]. Uses μ, σ, x̄, and n. Test statistic is z.
Data is normal or n≥30.

8-2. 1-PropZTest for proportion p. Uses #yeses or p̂, p, and n. Test statistic is z.
Binary nominal data. Normal distribution is approximating a Binomial distribution.
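
A minimal Python sketch of the proportion test, assuming the usual Normal approximation with standard error √(pq)/√n. The function name one_prop_z_test and the sample numbers (60 yeses out of 100, H₀: p = 0.5) are hypothetical, chosen only for illustration:

```python
from math import sqrt
from statistics import NormalDist


def one_prop_z_test(x, n, p0, tail="two"):
    """Return (z, p_value) for H0: p = p0, given x yeses out of n trials."""
    q0 = 1 - p0
    # The Normal approximation to the Binomial needs np and nq both >= 5.
    if n * p0 < 5 or n * q0 < 5:
        raise ValueError("np and nq should both be >= 5")
    p_hat = x / n
    std_err = sqrt(p0 * q0 / n)
    z = (p_hat - p0) / std_err
    cdf = NormalDist().cdf(z)
    if tail == "left":
        p_value = cdf
    elif tail == "right":
        p_value = 1 - cdf
    else:  # two-tailed
        p_value = 2 * min(cdf, 1 - cdf)
    return z, p_value


# Hypothetical data: 60 yeses in 100 trials, H0: p = 0.5, H_A: p > 0.5.
z, p_value = one_prop_z_test(x=60, n=100, p0=0.5, tail="right")
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

Here z comes out at 2.0, so a right-tailed test rejects H₀ at α=0.05 but not at α=0.01.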

8-4. Χ²-test for standard deviation σ. Uses σ, s, and n. Test statistic is Χ².
Population must be normal.

The test statistic is a measure of discrepancy between a sample statistic and the H₀ claimed value of the population parameter.
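
For reference, the usual textbook forms of the four test statistics, written with this section's symbols (μ, σ, and p are the values claimed by H₀). These notes do not spell the formulas out, so treat the exact notation as an assumption:

```latex
\begin{align*}
t &= \frac{\bar{x} - \mu}{s/\sqrt{n}}, & df &= n - 1 \\
z &= \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \\
z &= \frac{\hat{p} - p}{\sqrt{pq/n}}, & q &= 1 - p \\
\chi^2 &= \frac{(n-1)\,s^2}{\sigma^2}, & df &= n - 1
\end{align*}
```

In the first three, the numerator is the discrepancy and the denominator is the standard error, so the statistic counts standard errors away from the H₀ value.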

Given null hypothesis H₀: parameter = a
Choose one:
H_A: parameter (stat) < a	"H_A < H₀"	Left-tailed
H_A: parameter (stat) > a	"H_A > H₀"	Right-tailed
H_A: parameter (stat) ≠ a	"H_A ≠ H₀"	Two-tailed

μ:
σ: OR (if σ unknown) s: OR both if doing Χ²-test
x̄:
n:
OR 8-3. OR 8-4.

OR    8-2. Proportion test: Uses #yeses or p̂, p, n
#Yeses, x: OR p̂:     and p:
np: nq: Both should be ≥5    Standard error=√(pq)/√n:

Power: specific value of p: α: = p̂:

Result:
     z:                                  Standard error=σ/√n:
   Critical value: One-tailed: α=0.1:±1.28    α=0.05:±1.645    α=0.01:±2.326    Two-tailed: α=0.1:±1.645    α=0.05:±1.96    α=0.01:±2.576
   If Left-tailed and (-)z≤-CritValue then Reject H₀ at that α level.
   If Right-tailed and z≥CritValue then Reject H₀ at that α level.
   If Two-tailed and |z|≥CritValue then Reject H₀ at that α level.
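
The z critical values can be recomputed from the inverse Normal CDF. A short sketch using only Python's standard library (the helper name z_critical is made up for this example):

```python
from statistics import NormalDist


def z_critical(alpha, two_tailed=False):
    """Positive z critical value for significance level alpha."""
    # A two-tailed test splits alpha equally between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


for alpha in (0.10, 0.05, 0.01):
    one = z_critical(alpha)
    two = z_critical(alpha, two_tailed=True)
    print(f"alpha={alpha}: one-tailed ±{one:.3f}, two-tailed ±{two:.3f}")
```

For a left-tailed test, use the negative of the one-tailed value.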

OR     t: df:       StdErr=s/√n:           set t and df, look at p-value below
   Critical value: α=0.05: α=0.01:
   If Left-tailed and (-)t≤-CritValue then Reject H₀ at that α level.
   If Right-tailed and t≥CritValue then Reject H₀ at that α level.
   If Two-tailed and |t|≥CritValue then Reject H₀ at that α level.

OR     Χ²: df:
   Critical value: α=0.05: α=0.01:
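   If Left-tailed and Χ²≤CritValue then Reject H₀ at that α level.
   If Right-tailed and Χ²≥CritValue then Reject H₀ at that α level.
   If Two-tailed and Χ²≤LowerCritValue or Χ²≥UpperCritValue (area α/2 in each tail) then Reject H₀ at that α level.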
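
A rough sketch of the Χ² test for σ in plain Python, assuming the standard statistic Χ² = (n−1)s²/σ² with df = n−1. The CDF is computed from a series for the incomplete gamma function so that no extra library is needed. The helper names and the sample numbers (σ=10, s=13, n=30) are hypothetical:

```python
from math import exp, lgamma, log


def chi2_cdf(x, df):
    """Chi-square CDF, via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 0.0
    a, half_x = df / 2, x / 2
    term = 1.0 / a
    total = term
    k = 0
    # Sum half_x**k / (a * (a+1) * ... * (a+k)) until the terms are negligible.
    while term > 1e-15 * total:
        k += 1
        term *= half_x / (a + k)
        total += term
    return total * exp(-half_x + a * log(half_x) - lgamma(a))


def chi2_sd_test(sigma0, s, n, tail="right"):
    """Return (chi2, df, p_value) for H0: sigma = sigma0 (one-tailed only)."""
    df = n - 1
    chi2 = df * s**2 / sigma0**2
    cdf = chi2_cdf(chi2, df)
    p_value = cdf if tail == "left" else 1 - cdf
    return chi2, df, p_value


# Hypothetical data: H0: sigma = 10, sample s = 13, n = 30, H_A: sigma > 10.
chi2, df, p_value = chi2_sd_test(sigma0=10, s=13, n=30, tail="right")
print(f"chi2 = {chi2:.2f}, df = {df}, p-value = {p_value:.4f}")
```

With these numbers Χ² = 49.01 on 29 degrees of freedom, which rejects H₀ at α=0.05 but not at α=0.01.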

*** p-value (CDF(z) or TCDF(t,df) or Chisqr_CDF(Χ²,df)):
Chance that the test statistic would be at least as extreme as the one observed, if H₀ were true.
"If the p is low, the null must go."
Typically the "level of significance", α (the area of the critical/rejection region), is chosen to be .05 or .01. If p is less than α, reject H₀; if p is not less than α, don't reject H₀ ("fail to reject").
Probability (area) in a tail (or two) of the test statistic's PDF curve.
If p is high (bigger than α), can't reject H₀.
Selecting the Two-tailed case doubles the p-value of the corresponding One-tailed case.
Mean and Proportion tests are "symmetric" (z and t curves). The SD test is not: the Χ² curve is right-skewed.
Tip: if the p-value is like .9, check that you selected the appropriate "tail" above before failing to reject.
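
To see how the choice of tail changes the p-value, here is a small sketch for a z statistic using Python's standard library. The function name z_p_value and the value z = 1.5 are arbitrary choices for illustration:

```python
from statistics import NormalDist


def z_p_value(z, tail):
    """p-value of a z statistic for a 'left', 'right', or 'two' tailed test."""
    cdf = NormalDist().cdf(z)
    if tail == "left":
        return cdf
    if tail == "right":
        return 1 - cdf
    if tail == "two":
        return 2 * min(cdf, 1 - cdf)
    raise ValueError("tail must be 'left', 'right', or 'two'")


z = 1.5
for tail in ("left", "right", "two"):
    print(f"{tail:>5}-tailed: p = {z_p_value(z, tail):.4f}")
```

The same z = 1.5 gives p ≈ .93 left-tailed, p ≈ .07 right-tailed, and p ≈ .13 two-tailed. A p-value near .9 usually means the wrong tail was selected.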

Exs.
T-test: μ=100, s=10, n=30. Try x̄ = 102, 103, 104, 105. H_A > H₀
T-test: μ=100, s=10, n=30. Try x̄ = 102, 103, 104, 105. H_A ≠ H₀

Effect of s:
T-test: μ=100, s=5, n=30. Try x̄ = 101, 102, 103. H_A > H₀
T-test: μ=100, s=5, n=30. Try x̄ = 101, 102, 103. H_A ≠ H₀

Effect of n:
T-test: μ=100, s=10, n=100. Try x̄ = 101, 102, 103. H_A > H₀
T-test: μ=100, s=10, n=100. Try x̄ = 101, 102, 103. H_A ≠ H₀
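
One way to run these examples without a calculator is the Python sketch below. It assumes t = (x̄−μ)/(s/√n) with df = n−1, and it approximates the t CDF by numerically integrating the density (Simpson's rule) so that only the standard library is needed. A statistics package's own t CDF would normally be used instead. All function names are made up for this sketch:

```python
from math import exp, lgamma, log, pi, sqrt


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_const - (df + 1) / 2 * log(1 + x * x / df))


def t_cdf(t, df, steps=2000):
    """P(T <= t), integrating the density from 0 to |t| with Simpson's rule."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def t_test(mu, s, n, xbar, tail):
    """Return (t, df, p_value) for H0: mean = mu, from summary statistics."""
    t = (xbar - mu) / (s / sqrt(n))
    df = n - 1
    cdf = t_cdf(t, df)
    if tail == "left":
        p_value = cdf
    elif tail == "right":
        p_value = 1 - cdf
    else:  # two-tailed
        p_value = 2 * min(cdf, 1 - cdf)
    return t, df, p_value


# (s, n, list of sample means to try), all with mu = 100 as in the examples.
scenarios = [
    (10, 30, (102, 103, 104, 105)),
    (5, 30, (101, 102, 103)),
    (10, 100, (101, 102, 103)),
]
for s, n, xbars in scenarios:
    for xbar in xbars:
        t, df, p_right = t_test(100, s, n, xbar, "right")
        _, _, p_two = t_test(100, s, n, xbar, "two")
        print(f"s={s} n={n} xbar={xbar}: t={t:.3f} df={df} "
              f"p(right)={p_right:.4f} p(two)={p_two:.4f}")
```

With s=10 and n=30, x̄=103 gives t≈1.643, just short of the right-tailed α=0.05 critical value (about 1.699 for df=29), while x̄=104 is rejected. Halving s or raising n to 100 shrinks the standard error, so smaller differences become significant.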

NB. p-hacking: there are great pressures (professional, monetary, publication bias, ideological) to get a positive result.
This leads to cheating and lying by:
stopping data collection as soon as p≤.05
discarding data that prevents p≤.05
repeating the experiment until p≤.05
testing for different effects until one has p≤.05
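
A small simulation sketch of the first trick (stopping as soon as the result is significant). It assumes H₀ is true: data are drawn from a Normal population with μ=0 and known σ=1, and a two-tailed z-test at α=0.05 is used. The sample sizes, number of trials, and seed are arbitrary choices:

```python
import random
from math import sqrt
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-tailed critical value, alpha = 0.05


def rejects_with_peeking(rng, n_min=10, n_max=100):
    """Test after every new observation; stop at the first 'significant' z."""
    total = 0.0
    for n in range(1, n_max + 1):
        total += rng.gauss(0, 1)
        # With mu = 0 and sigma = 1, z = mean / (1 / sqrt(n)) = total / sqrt(n).
        if n >= n_min and abs(total / sqrt(n)) >= Z_CRIT:
            return True
    return False


def rejects_fixed_n(rng, n=100):
    """Collect all n observations first, then test once."""
    total = sum(rng.gauss(0, 1) for _ in range(n))
    return abs(total / sqrt(n)) >= Z_CRIT


rng = random.Random(0)
trials = 2000
peeking_rate = sum(rejects_with_peeking(rng) for _ in range(trials)) / trials
fixed_rate = sum(rejects_fixed_n(rng) for _ in range(trials)) / trials
print(f"False-positive rate, stop when significant: {peeking_rate:.3f}")
print(f"False-positive rate, fixed n = 100:         {fixed_rate:.3f}")
```

H₀ is true in every trial, so each rejection is a false positive. Testing once at a fixed n keeps the rate near α, while stopping at the first p≤.05 pushes it well above α.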

NB. Also possible to have:
H₀: φ≤a and H_A: φ>a
H₀: φ≥a and H_A: φ<a

NB. With a very large sample, a very small difference between x̄ and the claimed μ can be statistically "significant", even if it is too small to matter in practice.
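
A quick numeric illustration, assuming a z-test with known σ. The numbers (μ=100, σ=10, x̄=100.1) and the function name are hypothetical:

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_z_p(xbar, mu, sigma, n):
    """Return (z, p_value) for a two-tailed z-test of H0: mean = mu."""
    z = (xbar - mu) / (sigma / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# The same tiny difference (0.1) tested at three sample sizes.
for n in (100, 10_000, 100_000):
    z, p_value = two_tailed_z_p(xbar=100.1, mu=100, sigma=10, n=n)
    print(f"n={n:>7}: z={z:.2f}, p-value={p_value:.4f}")
```

The difference of 0.1 is nowhere near significant at n=100, but at n=100,000 it gives z≈3.16 and p<.01.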