Workshop offered by the Interfaculty Center Data processing and Statistics (icds.be).

Current draft (Apr 04, 2021) aims to introduce researchers to the key ideas in sample size calculation that would help them design their study (55 pages). Our target audience is primarily the research community at VUB / UZ Brussel.




We invite you to help us improve this document by sending us feedback
or anonymously at icds.be/consulting (right side, bottom)


01 Sample Size Calculation


02 Sample Size Calculation: demarcation



















03 Sample Size Calculation: a difficult design issue


04 Simple Example


05 Reference Example


06 A formula you could use


  • for this particular case:
    • sample size (n → ?)
    • difference (d = signal → 2)
    • uncertainty (\(\sigma\) = noise → 4)
    • type I error (\(\alpha\) = .05, so \(Z_{\alpha/2}\) → -1.96)
    • type II error (\(\beta\) = .2, so \(Z_\beta\) → -0.84)

  • sample size = 2 groups x 63 observations = 126 (62.79 rounded up to 63 per group)
  • note: formulas are test and statistic specific but the logic remains the same
  • this and other formulas are implemented in various tools
    our focus: GPower
\(n = \frac{(Z_{\alpha/2}+Z_\beta)^2 * 2 * \sigma^2}{d^2}\)
\(n = \frac{(-1.96-0.84)^2 * 2 * 4^2}{2^2} = 62.79\) (per group, using unrounded quantiles)
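The formula can be checked numerically. A minimal sketch in R, assuming the normal approximation behind the formula; the variable names are chosen here for illustration only.

```r
# Normal-approximation sample size per group for comparing two means
d     <- 2      # difference to detect (signal)
sigma <- 4      # standard deviation (noise)
alpha <- .05    # type I error probability, two-tailed
beta  <- .2     # type II error probability

z_alpha <- qnorm(alpha / 2)   # about -1.96
z_beta  <- qnorm(beta)        # about -0.84

n_per_group <- (z_alpha + z_beta)^2 * 2 * sigma^2 / d^2
round(n_per_group, 2)         # 62.79
ceiling(n_per_group) * 2      # 126 in total
```

Because the quantiles are squared, their negative sign does not matter; only their sum does.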








07 GPower: the building blocks in action









08 GPower: a useful tool


  • popular and well established
  • free @ http://www.gpower.hhu.de/
  • implements wide variety of tests
  • implements various visualizations
  • documented fairly well
  • note: not all tests are included !
  • note: not without flaws !
  • other tools exist (some paid)
  • for complex models: not feasible in GPower
    alternative: simulation (generate and analyze)







09 GPower input


  • ~ reference example
  • t-test : difference two indep. means
  • a priori: calculate sample size
  • effect size = standardized difference [Determine]
    • Cohen’s \(d\)
    • \(d\) = |difference| / SD_pooled
    • \(d\) = |0-2| / 4 = .5
  • \(\alpha\) = .05
    2 - tailed (\(\alpha\)/2 → .025 & .975)
  • \(power = 1-\beta\) = .8
  • allocation ratio = 1
    (equally sized groups)
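The a priori calculation for these inputs can be mimicked with a brute-force search: increase the group size until the power, computed from the non-central t distribution, reaches the target. This is a sketch under the assumption of equal group sizes; `power_two_sample` is a helper name introduced here, not a GPower or R function.

```r
# Power of a two-sided, two-sample t-test with n observations per group
power_two_sample <- function(n, d, alpha = .05) {
    df  <- 2 * n - 2                 # degrees of freedom
    ncp <- d * sqrt(n / 2)           # non-centrality parameter
    tc  <- qt(1 - alpha / 2, df)     # critical t under Ho
    1 - pt(tc, df, ncp) + pt(-tc, df, ncp)
}

d <- abs(0 - 2) / 4                  # Cohen's d = .5
n <- 2
while (power_two_sample(n, d) < .8) n <- n + 1
n                                    # 64 per group
round(power_two_sample(n, d), 4)     # 0.8015
```

The search stops at the first integer group size that reaches the target, which is why the actual power ends up slightly above .80.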







10 GPower output


  • sample size (\(n\)) = 64 x 2 = 128
  • degrees of freedom (\(df\)) = 126 (128 - 2)
  • critical t = 1.979
    • decision boundary given \(\alpha\) and \(df\)
      qt(.975,126)
  • non centrality parameter (\(\delta\)) = 2.8284
    • shift Ha (true) away from Ho (null)
      2/(4*sqrt(2))*sqrt(64)
  • distributions: central Ho and non-central Ha
  • power ≥ .80 (1-\(\beta\)) = 0.8015








11 Protocol: reference example


t tests - Means: Difference between two independent means (two groups)
Analysis: A priori: Compute required sample size

Input:
Tail(s) = Two
Effect size d = 0.5000000
α err prob = 0.05
Power (1-β err prob) = .8
Allocation ratio N2/N1 = 1

Output:
Noncentrality parameter δ = 2.8284271
Critical t = 1.9789706
Df = 126
Sample size group 1 = 64
Sample size group 2 = 64
Total sample size = 128
Actual power = 0.8014596








12 Building Blocks



13 GPower Statistical Tests


  • test family - statistical tests [in window]
    • Exact Tests (8)
    • \(t\)-tests (11) → reference
    • \(z\)-tests (2)
    • \(\chi^2\)-tests (7)
    • \(F\)-tests (16)
  • focus on the density functions
  • tests [in menu]
    • correlation & regression (15)
    • means (19) → reference
    • proportions (8)
    • variances (2)
  • focus on the type of parameters








14 Central Ho and Non-Central Ha Distributions


    • Ho acts as \(\color{red}{benchmark}\) → e.g., no difference
      • set \(\color{green}{cut off}\) on Ho ~ t(ncp=0,df) using \(\alpha\)
      • reject Ho if test returns implausible value
    • Ha acts as \(\color{blue}{truth}\) → e.g., difference of .5 SD
      • Ha ~ t(ncp!=0,df)
      • ncp as violation of Ho → shift (location/shape)
    • ncp : non-centrality parameter combines
      • assumed effect size (target or signal)
      • conditional on sample size (information)
    • ncp : determines overlap → power ↔︎ sample size
      • probability beyond \(\color{green}{cut off}\) at Ho evaluated on Ha
    https://apps.icds.be/shinyt/
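How the non-centrality parameter ties effect size and sample size to power can be made concrete for the reference example. A sketch, assuming two equally sized groups and a fixed effect size of .5 SD; the object names are illustrative.

```r
d     <- .5                        # assumed effect size (Cohen's d)
alpha <- .05
n     <- c(16, 32, 64, 128)        # observations per group

df     <- 2 * n - 2
ncp    <- d * sqrt(n / 2)          # shift of Ha away from Ho
cutoff <- qt(1 - alpha / 2, df)    # cut off set on Ho
# probability beyond the cut off, evaluated on Ha (upper tail only)
power  <- pt(cutoff, df, ncp, lower.tail = FALSE)

data.frame(n, df, ncp = round(ncp, 3), cutoff = round(cutoff, 3),
           power = round(power, 3))
```

With the effect size held fixed, a larger sample moves Ha further from Ho while the cut off barely changes, so the overlap shrinks and power grows.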







    15 Note: Divide by N Perspective as alternative


    • divide by n: sample size ~ standard deviation
    • non-centrality parameter: sample size ~ location

    \(n = \frac{(Z_{\alpha/2}+Z_\beta)^2 * 2 * \sigma^2}{d^2}\)
    \(n = \frac{(-1.96-0.84)^2 * 2 * 4^2}{2^2}\)
    \(n = 62.79\)
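Both perspectives lead to the same power. A sketch using the normal approximation (known \(\sigma\)) and the upper tail only: the first version keeps the standard deviation at 1 and shifts the location by the non-centrality parameter, the second keeps the location at the raw difference and shrinks the standard error with n.

```r
d <- 2; sigma <- 4; n <- 63; alpha <- .05

# non-centrality perspective: standardized scale, Ha shifted by ncp
ncp       <- d / (sigma * sqrt(2 / n))
power_ncp <- pnorm(qnorm(1 - alpha / 2), mean = ncp, sd = 1,
                   lower.tail = FALSE)

# divide-by-n perspective: raw scale, standard error shrinks with n
se        <- sigma * sqrt(2 / n)
power_se  <- pnorm(qnorm(1 - alpha / 2) * se, mean = d, sd = se,
                   lower.tail = FALSE)

c(power_ncp, power_se)   # identical, just above .80
```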








    16 Note: Ho and Ha, asymmetry in statistical testing


    17 Type I/II Error Probability


    • inference test based on cut-offs (density → AUC=1)
    • type I error: incorrectly reject Ho (false positive):
      • cut-off at Ho, error prob. \(\alpha\) controlled
      • one/two tailed → one/both sides informative ?
    • type II error: incorrectly fail to reject Ho (false negative):
      • cut-off at Ho, error prob. \(\beta\) depends on Ha
      • Ha assumed known in a power analysis
    • power = 1 - \(\beta\) = probability of correct rejection (true positive)
    • inference versus truth
      • infer: effect exists vs. unsure
      • truth: effect exists vs. does not

                infer=Ha         infer=Ho         sum
    truth=Ho    \(\alpha\)       1-\(\alpha\)     1
    truth=Ha    1-\(\beta\)      \(\beta\)        1





    18 Exercise on Errors, create plot


    • ~ reference example
    • create plot
      (X-Y plot for range of values)
    • plot sample size by type I error
    • set plot to 4 curves
      • for power .8 in steps of .05
    • set \(\alpha\) on x-axis
      • from .01 to .2 in steps of .01
    • use effect size .5
    • notice Table option
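The numbers behind such a plot can also be tabulated in R. A sketch using `power.t.test` from the base stats package, assuming a two-sided test with equal group sizes; `n_needed` is a name introduced here.

```r
alpha <- seq(.01, .2, by = .01)        # x-axis: type I error probability
power <- seq(.8, .95, by = .05)        # 4 curves

# required n per group for each combination (effect size d = .5)
n_needed <- sapply(power, function(p)
    sapply(alpha, function(a)
        ceiling(power.t.test(delta = .5, sd = 1,
                             sig.level = a, power = p)$n)))
dimnames(n_needed) <- list(alpha = alpha, power = power)

n_needed[5, 1]                         # alpha = .05, power = .8: 64 per group
2 * n_needed                           # total sample sizes, as in the Table option
```

Passing `alpha` and `2 * n_needed` to `matplot(..., type = "l")` would draw the four curves.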







    19 Exercise on Errors, interpret plot


    • where on the red curve (right)
      type II error = 4 * type I error ?
    • when smaller effect size (.25), what changes ?
    • switch power and sample size (32 in steps of 32)
      what is the relation between type I and II error ?

    • what would be difference between curves for \(\alpha\) = 0 ?





    20 Decide Type I/II Error Probability


    • popular choices
      • \(\alpha\) often in range .01 - .05 → 1/100 - 1/20
      • \(\beta\) often in range .2 to .1 → power = 80% to 90%
    • \(\alpha\) & \(\beta\) inversely related
      • \(\alpha\) & \(\beta\) often selected in 1/4 ratio
        type I error is considered 4 times worse !!
      • which error do you want to avoid most ?
        • cheap AIDS test ? → avoid type II
        • heavy cancer treatment ? → avoid type I
      • probability for errors always exists





    21 Control Type I Error






    22 for fun: P(effect exists | test says so)






    23 Effect Sizes, in principle






    24 Effect Sizes, in literature


    • Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155–159.

    • Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.).
    • famous Cohen conventions but beware, just rules of thumb
    • more than 70 different effect sizes… most of them related
    • Ellis, P. D. (2010). The essential guide to effect sizes: statistical power, meta-analysis, and the interpretation of research results.





    25 Effect Sizes, in GPower (Determine)


    • effect sizes are test specific
      • t-test → group means and sd’s
      • one-way anova →
        variance explained & error
      • regression →
        again other parameters
      • . . . .
    • GPower helps with Determine
      • sliding window
      • one or more effect size specifications





    26 Exercise on Effect Sizes, ingredients Cohen’s d


    For the reference example:

    • change mean values from 0 and 2 to 4 and 6, what changes ?
    • change sd values to 2 for each, what changes ?
      • effect size ?
      • total sample size ?
      • critical t ?
      • non-centrality ?
    • change sd values to 8 for each, what changes ?
    • change sd to 2 and 5.3, or 1 and 5.5,
      how does it compare to 4 and 4 ?
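The ingredients of Cohen's d can be explored outside GPower as well. A sketch assuming equally sized groups, so that the pooled variance is the plain average of the two variances; `cohens_d` is a helper defined here.

```r
# Cohen's d for two groups of equal size: |difference| / pooled SD
cohens_d <- function(m1, m2, s1, s2) {
    sd_pooled <- sqrt((s1^2 + s2^2) / 2)
    abs(m1 - m2) / sd_pooled
}

cohens_d(0, 2, 4, 4)      # 0.5, the reference example
cohens_d(4, 6, 4, 4)      # 0.5, only the difference matters
cohens_d(0, 2, 2, 2)      # 1
cohens_d(0, 2, 8, 8)      # 0.25
cohens_d(0, 2, 2, 5.3)    # close to 0.5
cohens_d(0, 2, 1, 5.5)    # close to 0.5
```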





    27 Exercise on Effect Sizes, plot


    • plot powercurve: power by effect size
    • compare 6 sample sizes: 34 in steps of 34
    • for a range of effect sizes in between .2 and 1.2
    • use \(\alpha\) equal to .05

    • pinpoint the situations from previous section on the plot (sd=4 and 2).
    • how does power change when doubling the effect size ?
    • powercurve → X-Y plot for range of values
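The values behind such power curves can be computed directly from the non-central t distribution. A sketch that assumes the six sample sizes are total sample sizes split equally over two groups; `power_total` is a helper introduced here.

```r
# Power of a two-sided, two-sample t-test for total sample size N (equal groups)
power_total <- function(d, N, alpha = .05) {
    df  <- N - 2
    ncp <- d * sqrt(N) / 2           # same as d * sqrt(n / 2) with n = N / 2
    tc  <- qt(1 - alpha / 2, df)
    pt(tc, df, ncp, lower.tail = FALSE) + pt(-tc, df, ncp)
}

d_grid <- seq(.2, 1.2, by = .1)      # effect sizes on the x-axis
N_grid <- seq(34, 204, by = 34)      # 6 total sample sizes

curves <- outer(d_grid, N_grid, power_total)
dimnames(curves) <- list(d = d_grid, N = N_grid)
round(curves, 2)
```

Each column of `curves` is one power curve; reading along a row shows the effect of sample size at a fixed effect size.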








    28 Exercise on Effect Size, imbalance


    For the reference example:

    • compare for allocation ratios 1, .5, 2, 10, 50

    • repeat for effect size 1, and compare

    • ? no idea why n1 \(\neq\) n2

    after calculate plot, to change allocation ratio
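The cost of imbalance can be illustrated by fixing the total sample size and varying the allocation ratio. This is a sketch, not what GPower computes in an a priori analysis: it holds the total at 128 and allows fractional group sizes for simplicity; `power_alloc` is a helper introduced here.

```r
# Power for a fixed total N when the groups are split by ratio = n2 / n1
power_alloc <- function(ratio, N = 128, d = .5, alpha = .05) {
    n1  <- N / (1 + ratio)
    n2  <- N - n1
    df  <- N - 2
    ncp <- d * sqrt(n1 * n2 / N)     # equals d * sqrt(n / 2) when n1 = n2 = n
    tc  <- qt(1 - alpha / 2, df)
    pt(tc, df, ncp, lower.tail = FALSE) + pt(-tc, df, ncp)
}

ratios <- c(1, .5, 2, 10, 50)
round(sapply(ratios, power_alloc), 3)
```

Ratios .5 and 2 give the same power because only the product of the two group sizes enters the non-centrality parameter.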





    29 Effect Sizes, how to determine them in theory


    30 Effect Sizes, how to determine them in practice


    31 Relation Sample & Effect Size, type I & II Errors


    • building blocks:
      • sample size (\(n\))
      • effect size (\(\Delta\))
      • alpha (\(\alpha\))
      • power (\(1-\beta\))
    • each parameter
      conditional on others
    • GPower → type of power analysis
      • Apriori: \(n\) ~ \(\alpha\), power, \(\Delta\)
      • Post Hoc: power ~ \(\alpha\), \(n\), \(\Delta\)
      • Compromise: power, \(\alpha\) ~ \(\beta\:/\:\alpha\), \(\Delta\), \(n\)
      • Criterion: \(\alpha\) ~ power, \(\Delta\), \(n\)
      • Sensitivity: \(\Delta\) ~ \(\alpha\), power, \(n\)
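Most of these analysis types have a counterpart in `power.t.test` from R's stats package, which solves for whichever single argument is left `NULL`. A sketch for the reference example; the compromise analysis has no equivalent in this function, and the object names are illustrative.

```r
# reference example: difference 2, sd 4, two-sided two-sample t-test

# a priori: solve for n per group
apriori <- power.t.test(delta = 2, sd = 4, sig.level = .05, power = .8)$n
# post hoc: solve for power
posthoc <- power.t.test(n = 64, delta = 2, sd = 4, sig.level = .05)$power
# sensitivity: solve for the detectable difference, expressed as Cohen's d
sensitivity <- power.t.test(n = 64, sd = 4, sig.level = .05, power = .8)$delta / 4
# criterion: solve for alpha
criterion <- power.t.test(n = 64, delta = 2, sd = 4,
                          sig.level = NULL, power = .8)$sig.level

c(apriori = apriori, posthoc = posthoc,
  sensitivity = sensitivity, criterion = criterion)
```

The a priori result is a fractional n that needs rounding up to 64; the sensitivity and criterion results fall just below .5 and .05 because 64 per group slightly overshoots a power of .80.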





    32 Exercise on Type of Power Analysis


    Solution for Type of Power Analysis






    33 getting your hands dirty


    # calculator
    m1=0;m2=2;s1=4;s2=4
    alpha=.025;N=128
    var=.5*s1^2+.5*s2^2
    d=abs(m1-m2)/sqrt(2*var)*sqrt(N/2)
    tc=tinv(1-alpha,N-2)
    power=1-nctcdf(tc,N-2,d)

    • in R
    • qt → get quantile on Ho (critical t at \(1-\alpha/2\))
    • pt → get probability on Ha (non-central)
    .n <- 64                                  # observations per group
    .df <- 2*.n-2                             # degrees of freedom
    .ncp <- 2 / (4 * sqrt(2)) * sqrt(.n)      # non-centrality parameter
    .power <- 1 -
        pt(
            qt(.975,df=.df),
            df=.df, ncp=.ncp
        ) +
        pt( qt(.025,df=.df), df=.df, ncp=.ncp)   # add the lower rejection region
    round(.power,4)
    ## [1] 0.8015





    34 GPower, beyond the independent t-test






    35 Dependence between groups






    Solution for dependence between groups






    36 Non-parametric distribution









    Solution for non-parametric distribution









    37 A relations perspective, regression analysis






    Solution on a relations perspective






    38 A variance ratio perspective, ANOVA






    Solution on a variance ratio perspective






    39 A variance ratio perspective on multiple groups


    • multiple groups → not one effect size d
    • F-test statistic & effect size f, with \(f^2\) the ratio of variances \(\sigma_{between}^2 / \sigma_{within}^2\)
    • difference between multiple groups summarized in variance \(\sigma_{between}^2\)

    • example: one control and two treatments
      • reference example + 1 group
      • sd within each group, for all groups (C,T1,T2) = 4
      • means C=0, T1=2 and for example T2=4
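For this three-group example the effect size f can be worked out by hand. A sketch assuming equal group sizes; `power.anova.test` is part of R's stats package and is used here as a stand-in for the GPower omnibus calculation, not as a reproduction of it.

```r
mu           <- c(C = 0, T1 = 2, T2 = 4)    # group means
sigma_within <- 4                            # sd within each group

# between-group sd: spread of the group means around the grand mean
sigma_between <- sqrt(mean((mu - mean(mu))^2))
f <- sigma_between / sigma_within
round(f, 3)                                  # 0.408

# per-group n for the omnibus F-test; note that between.var is the
# sample variance of the means (it divides by k - 1, not k)
res <- power.anova.test(groups = 3, between.var = var(mu),
                        within.var = sigma_within^2,
                        sig.level = .05, power = .8)
ceiling(res$n)
```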





    40 Multiple Groups: Omnibus






    Solution for multiple groups omnibus

    41 Multiple Groups: Pairwise






    Solution for multiple groups pairwise






    42 Multiple Groups: Contrasts


    • contrasts are linear combinations → planned comparison
      • e.g., 1 * T1 - 1 * C \(\neq\) 0 & 1 * T2 - 1 * C \(\neq\) 0
      • e.g., .5 * (1 * T1 + 1 * T2) - 1 * C \(\neq\) 0
    • effect sizes for planned comparisons must be calculated !!
      • variance ratios
      • standard deviation of contrasts → between variance
      • compare between variance for contrast with within variance
    • each contrast
      • requires 1 degree of freedom
      • combines a specific number of levels
    • multiple testing correction may be required

    group means \(\mu_i\)
    pre-specified coefficients \(c_i\)
    sample sizes \(n_i\)
    total sample size \(N\)


    \(\sigma_{contrast} = \frac{|\sum{\mu_i * c_i}|}{\sqrt{N \sum_i^k c_i^2 / n_i}}\)
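The formula for \(\sigma_{contrast}\) translates directly into a few lines of R. A sketch for the three-group example (means 0, 2, 4 and within sd 4), assuming hypothetical group sizes of 20 each and taking the effect size as \(\sigma_{contrast}\) divided by the within-group sd; `contrast_f` is a helper introduced here.

```r
mu    <- c(C = 0, T1 = 2, T2 = 4)     # group means
n     <- c(20, 20, 20)                 # group sizes (hypothetical)
sigma <- 4                             # within-group sd

contrast_f <- function(coefs, mu, n, sigma) {
    N <- sum(n)
    sigma_contrast <- abs(sum(mu * coefs)) / sqrt(N * sum(coefs^2 / n))
    sigma_contrast / sigma             # effect size f for this contrast
}

contrast_f(c(-1, 1, 0),   mu, n, sigma)   # T1 versus C
contrast_f(c(-1, 0, 1),   mu, n, sigma)   # T2 versus C
contrast_f(c(-1, .5, .5), mu, n, sigma)   # average of T1 and T2 versus C
```

With equal group sizes the result does not depend on the chosen n, because N / n reduces to the number of groups.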





    43 Multiple Groups: Contrasts (continued)






    Solution for multiple groups contrasts






    44 Multiple Factors


    Solution for multiple factors






    45 Repeated Measures






    46 Repeated Measures Within






    Solution for repeated measures within






    47 Repeated Measures Between





    Solution for repeated measures between






    48 Repeated Measures Interaction Within x Between









    Solution for repeated measures interaction within x between

    49 Correlations






    Solution for correlations









    50 Proportions






    Solution for proportions






    51 Exercise proportions


    Solution for proportions


    52 Dependent Proportions






    Solution for dependent proportions






    53 Not Included


    54 Simulation Example t-test


    gr <- rep(c('T','C'),64)                         # 64 observations per group
    y <- ifelse(gr=='C',0,2)                         # true means: C=0, T=2
    dta <- data.frame(y=y,X=gr)
    cutoff <- qt(.025,nrow(dta)-2)                   # lower critical t, df = 126

    my_sim_function <- function(){
        dta$y <- dta$y+rnorm(length(dta$X),0,4)     # generate (with sd=4)
        res <- t.test(y~X,data=dta)                 # analyze (C minus T, so t is negative)
        c(res$estimate %*% c(-1,1),res$statistic,res$p.value)
    }
    sims <- replicate(10000,my_sim_function())      # many iterations
    dimnames(sims)[[1]] <- c('diff','t.stat','p.val')

    # three ways to estimate power, each about .80 (values from one run, no seed set)
    mean(sims['p.val',] < .05)  # p-values  0.8029
    mean(sims['t.stat',] < cutoff)  # t-statistics 0.8029
    mean(sims['diff',] > sd(sims['diff',])*cutoff*(-1)) # differences 0.8024
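The same generate-and-analyze logic can be wrapped in a function to estimate power for several sample sizes. A sketch with an arbitrary seed and a modest number of iterations; `sim_power` and its arguments are names introduced here, and the pooled-variance t-test is chosen to match the reference example.

```r
set.seed(1)   # arbitrary seed, for reproducibility only

# estimate power by simulation for n observations per group
sim_power <- function(n, delta = 2, sigma = 4, nsim = 2000, alpha = .05) {
    p_values <- replicate(nsim, {
        control   <- rnorm(n, mean = 0,     sd = sigma)   # generate
        treatment <- rnorm(n, mean = delta, sd = sigma)
        t.test(treatment, control, var.equal = TRUE)$p.value   # analyze
    })
    mean(p_values < alpha)             # proportion of rejections
}

n_grid    <- c(32, 64, 96)
power_hat <- sapply(n_grid, sim_power)
data.frame(n = n_grid, power = power_hat)
```

The estimates carry simulation error (roughly .01 at 2000 iterations), so the value for n = 64 will only approximate the 0.8015 reported by GPower.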





    55 Focus / Simplify


    56 Conclusion








    Methodological and statistical support to help make a difference

    website: https://www.icds.be/ includes information on who we serve, and how

    booking: https://www.icds.be/consulting/ for individual consultations