Current draft (Mar 29, 2021) aims to introduce researchers to the key ideas in research methodology that help them plan their study and write a research proposal. Our target audience is primarily the research community at VUB / UZ Brussel, in particular those applying for funding at the WFWG.


Note that we present our view, suitable for communicating research at VUB / UZ Brussel, not necessarily beyond it. What we present should therefore be used only as guidance, not as an argument or proof of any kind.


We invite you to help us improve this document by sending us feedback,
anonymously if preferred, at icds.be/consulting (right side, bottom).


01 Methodology and Statistics: Research Proposal





02 Key Ingredients


outline: key ingredients and main components




03 Research Aim: first key ingredient


  • research aim: concisely express what the study intends to achieve
    • frame in terms of the essence, avoid unnecessary (technical) details
    • be specific, go beyond very general statements
    • operationalize, relate to empirical evidence

  • focus, highlight research questions of primary interest
    • argue what the results should be -at a minimum- for it to be a successful study
    • comment on additional gains the study could offer

  • example: The aim is to show that the new treatment P is not worse than the common treatment Q, and we consider our study a success when the average score on measurement Y, with values expected between 16 and 64, is at most 10% lower for P. We will further explore to what extent patient characteristics X could explain the scores Y under both treatments.
    • detailed statement instead of ‘investigate’ treatment
    • linked to the empirical evidence
    • focus, not a vague list of measurements

04 Categorizations of Research Aims


  • the type of research aim determines how to deal with it (properties and requirements)
  • various categorizations can be considered, for example:
    • confirmatory / exploratory / preparatory / technological
    • quantitative / qualitative
    • inferential / descriptive

  • note: type labels are informal, to be used for guidance only
  • typically, studies tend to be valued more if
    • they build on an informed expectation of the results (without being certain)
    • they offer understanding that extends beyond the study itself (wider / deeper)
    • thus: if you can, frame it as such

05 Descriptive versus Inferential Research


  • inferential, study a population using a sample, implies generalization
    • (ideally) uses representative samples, large enough, randomly sampled
    • more ambitious, thus difficult → guarantee inference is possible
      • relates to statistical testing, estimation, prediction
  • descriptive, study the observed data as such, no generalization
    • present data -as is- without reference to uncertainty or p-values
    • easy to perform → argue that data in itself is of interest
      • relates to summaries (average, median, …, correlation) and values
  • note: inferential studies typically imply descriptive preliminary analysis



06 Quantitative versus Qualitative Research


  • quantitative research addresses quantifiable empirical aspects
    • typically makes use of visualization and statistics to summarize and generalize
    • can be descriptive and/or inferential
    • typically aims to reduce complexity (operationalization before data analysis)
    • main focus is summarizing and generalization → determines how to argue for it
  • qualitative research addresses understanding
    • especially focused on reasons, opinions, motivations, …
    • is descriptive and can be hypothesis generating (inductive)
    • typically embraces complexity
    • main focus is interpretation / understanding ~ meaning → determines how to argue for it
  • mixed methods combines both

  • rarely pure one or other
  • note: leaving out statistics does not make it a qualitative study
  • note: asking respondents questions does not make it qualitative




07 Objectives: confirmatory, exploratory, preparatory, techn(olog)ical



08 Confirmatory Research Aims


  • goal:: confirm an expected difference, relation, …
  • means:: in advance specify results -at a minimum- to support claim
    • aim for
      • significant difference (superiority / non-inferiority) → statistical test
      • equivalence given a margin of error → statistical test
        • absence of evidence is not evidence of absence
      • accurate estimate → statistical estimation
      • expected observations → falsifying alternative hypotheses
      • …

  • focus::
    • explain the (statistical) link between the research design and (especially) the primary aim
    • for a statistical test or estimation → calculate sample size using standard error
      • make assumptions and conditions explicit: effect size, statistics, type I and II errors
      • conclude on required sample size: costs & availability
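For the ‘accurate estimate’ route, the standard-error reasoning can be sketched in a few lines of Python (standard library only). The standard deviation of 4 and the confidence-interval half-width of 1 are hypothetical values chosen for illustration:

```python
import math
from statistics import NormalDist

def n_for_margin(sd, margin, conf=0.95):
    """Smallest n with z * sd / sqrt(n) <= margin (normal approximation),
    i.e., the confidence interval for a mean is at most 2 * margin wide."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # 1.96 for 95%
    return math.ceil((z * sd / margin) ** 2)

# hypothetical numbers: sd = 4, desired 95% CI half-width = 1
print(n_for_margin(sd=4, margin=1))  # -> 62
```

Halving the margin quadruples the required sample size, because the standard error only shrinks with the square root of n.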



09 Exploratory Research Aims


  • goal:: explore observations, possible differences, relations, …
    • without any guarantee on what will be the results
  • means:: in advance specify results -at a minimum- to support merit
    (no reference to significance or accuracy)
    • interest in data as such (descriptive)
      • no -primary- interest in statistical testing or estimation
    • interest in parameter estimates (differences, relations, …)
      • aim for statistical testing or estimation but
        • no guarantee sufficient power and/or accuracy
        • maybe try ‘upgrading’ it to confirmatory research? → decide on the effect of interest
    • interest in prediction
      • aim for cross-validation (does not include standard errors)
        • open question how to argue sample size, refer to similar research / common practice
        • possible to justify afterwards using bootstrapping → discuss criteria
    • qualitative research

  • focus::
    • argue why the data or parameter estimates by themselves are of interest
      • even if results turn out non-significant or inaccurate
      • or why significance/accuracy is likely
        • based on similar existing research
      • put more weight on substantive arguments
    • sample size -justification- that balances information and cost
      • argue that merit outweighs cost
      • low cost of collection, or data already available (e.g., retrospective)
    • still explain the (statistical) link between research design and potential inferences
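The after-the-fact bootstrap mentioned above can be sketched as follows (Python, standard library only). The scores are hypothetical, and the percentile interval is the simplest of several bootstrap variants:

```python
import random
from statistics import mean

random.seed(1)  # fixed seed for reproducibility

def bootstrap_ci(data, stat=mean, reps=2000, conf=0.95):
    """Percentile bootstrap confidence interval: resample with replacement,
    recompute the statistic, and take the central conf share of the results."""
    stats = sorted(stat(random.choices(data, k=len(data))) for _ in range(reps))
    lower = stats[int(reps * (1 - conf) / 2)]
    upper = stats[int(reps * (1 + conf) / 2) - 1]
    return lower, upper

scores = [12, 15, 11, 14, 16, 13, 18, 12, 15, 14]  # hypothetical observations
lower, upper = bootstrap_ci(scores)
```

A wide interval signals that the exploratory estimate is too inaccurate to carry much weight on its own.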



10 Preparatory Research Aims


  • goal:: prepare for a future study… typically a small-scale set-up
  • means:: in advance specify what information is required for a future study and how it will be obtained
    • pilot study
      • aim to successfully set up future study
    • phase I and II clinical designs
      • aim to avoid risk, harm, … in the future study
      • requires decision criteria to proceed or not
    • database development, data collection procedures, …

  • focus::
    • argue based on information required for future (actual) study
      • explain how unavailable information is obtained
      • argument could be (partially) qualitative, descriptive, …
        • example: understand instructions, register observations, …
    • results are not by themselves of interest
      • no statistical testing is implied, that is for future (actual) studies
    • sample size -justification- based on an absolute minimal cost
      • for example, with animal experiments typically 3 animals per condition
        • to allow estimation of the variance
      • low cost of collection, or data already available (e.g., retrospective)



11 Techn(olog)ical advancements


  • goal:: to design, engineer, create, … not to extract information from the outside world
  • means:: argue merit of the final product, rarely any statistics involved
  • focus::
    • argument based on what the advancement offers, in balance with the costs
    • no statistical justification, and that is alright!



12 Research Design: second key ingredient


  • research design: strategy to achieve the research aim
    • effective (question can be answered) and efficient (with acceptable/minimal costs)
    • determines how (potential) observations provide information on research aim
      • a poor design makes a study inefficient at best, completely ineffective at worst
      • statistics can not solve design problems
      • example: studying treatment effects in a fixed order ~ treatment confounded with order


  • three types of design attributes
    • quantity of observations (sample size)
    • quality of observations, dependent on
      • what is observed
      • how it is observed (method)
      • under which conditions it is observed (~ variables)
    • generalizability (from sample to population), dependent on
      • selection/allocation of research units
      • missing data mechanism

  • focus, highlight the good choices when relevant
    • show what is done right, that you have it under control
    • discuss in relation to relevance, maybe name-drop
    • do not dwell on what you (may) fail to deal with


13 Quantity of Observations


  • collect enough relevant observations
    • more observations are more informative, ensure there are sufficient (effective)
      • more observations required if they are less informative by themselves (see quality)
    • more observations may be more costly, ensure not too many (efficient)
      • remember, costs in terms of time, money, risk, stress, …

  • justify quantity of observations
    • typically only required for the primary research questions, and/or costly observations
    • for confirmatory research, perform sample size -calculation-
      • specify statistical test(s) or estimation(s) in focus
        (implied by what -at a minimum- ensures a successful study).
      • specify effect size aimed for, justify both the effect and the uncertainty
        • effect: ideally justified substantively or at least referring to common practice or literature
        • uncertainty: variance of measurement, ideally based on earlier data/research or pilot
        • only use rules of thumb when all reasoning fails
      • specify operational characteristics (type I error \(\alpha\) and type II error \(\beta\) are related)
        • note: an \(\alpha\) of .05 and power of .8 (\(\beta\) = .2) implies treating a type I error as 4 times more severe than a type II error
    • for exploratory / preparatory research this includes a sample size -justification-
      • feasibility and/or low cost
      • minimum requirement for a first impression
      • similar / equivalent research
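To make the roles of effect size, \(\alpha\), and \(\beta\) concrete, a normal-approximation sketch in Python (standard library only; the standard deviation of 4 and the candidate effects are hypothetical values):

```python
import math
from statistics import NormalDist

def n_per_group(effect, sd, alpha=0.05, power=0.80):
    """Per-group n for a one-sided two-sample comparison of means:
    n = 2 * ((z_alpha + z_beta) * sd / effect) ** 2 (normal approximation)."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * ((z(1 - alpha) + z(power)) * sd / effect) ** 2)

for effect in (1, 2, 4):  # smaller effects demand many more observations
    print(effect, n_per_group(effect, sd=4))  # 1 -> 198, 2 -> 50, 4 -> 13
```

Halving the effect of interest quadruples the required sample size, which is why the effect size aimed for deserves careful justification.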



14 Quality of Observations


  • collect informative observations (especially important if relatively few)
    • observations made with most informative method
      • validity & reliability
      • example stress:
        • self rating vs. neuro-endocrine vs. …
        • memory vs. now vs. imagined situation vs. …
    • observations made under the most informative conditions
      • link with research question
      • include suspected confounders
      • example stress:
        • short/long term difference - evolution
        • lab - naturalistic?

  • general principle: isolate the effect, avoid / measure unwanted influences
    • control confounding variables (~ complete the model)
    • maximize systematic variability (~ explained variance)
    • minimize non-systematic variability (~ unexplained variance)

  • highlight how quality is ensured, lack of quality is avoided
    • show you have it under control, you know why you set up your study this way
    • be specific, relate to your research design and aim
    • do not highlight weaknesses (too strongly)



15 Quality of Observations — Confounders


  • explain control of confounding (unwanted outside influence)
    • balance out confounding
      • randomization (large enough sample size)
      • stratified randomization
    • measure confounding
      • repeated measures (~ conditioning)
      • cross-over designs (~ conditioning)
      • blocking, keep sources of variance stable and estimate them (~ mini-experiments)
    • avoid confounding
      • matching, create similar groups to compare
      • (double-) blinding, avoid influence experimenter/experimented
    • and more …

  • more complex designs are often more efficient but also harder to analyze
    • e.g., mixed models to deal with repeated measures
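A stratified randomization from the list above can be sketched as follows (Python, standard library only; the patient identifiers and the sex stratum are hypothetical):

```python
import random

random.seed(7)  # fixed seed so the allocation is reproducible

def stratified_randomize(units, stratum_of, arms=("treatment", "control")):
    """Randomize units to arms separately within each stratum: shuffle the
    stratum, then assign arms in turn, so the arms differ by at most one
    unit per stratum."""
    strata = {}
    for u in units:
        strata.setdefault(stratum_of(u), []).append(u)
    allocation = {}
    for members in strata.values():
        random.shuffle(members)
        for i, u in enumerate(members):
            allocation[u] = arms[i % len(arms)]
    return allocation

# hypothetical example: 12 patients stratified by sex
sex = {"p%02d" % i: ("F" if i % 2 else "M") for i in range(12)}
alloc = stratified_randomize(list(sex), sex.get)
```

Shuffling within each stratum before assigning arms in turn keeps the arms balanced per stratum, which plain randomization only guarantees on average.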



16 Quality of Observations — Non-Systematic Variability


  • variability that is not understood ~ uncertainty
  • explain minimization of non-systematic variability
    • use of proper measurement tools
      • ensure reliability / precision to avoid noisy measurements
      • combine measurement tools
      • repeat observations (replications) to average out noisy measurements
      • use tools that are well understood/studied
    • use of all information available
      • include all relevant predictors
        • not only focus on main variables of interest (~ model)
        • potentially consider combinations (interaction, polynomial, …)
      • avoid the use of categories when continuous registrations are available
      • consider (multiple) imputation when missingness is modest



17 Quality of Observations — Systematic Variability


  • variability that is understood ~ information
  • explain maximization of systematic variability
    • maximally differentiate conditions
      • DoE, design of experiments
      • example: detect change, effectiveness, …
    • largely implies minimization of non-systematic variability
      • ~ appropriate and sufficiently rich model for the observations



18 Quality of Observations — Experimental Control


  • safeguarding the quality of information is easier with (some) control over the conditions
    • experimental study, exerts control by definition
      • choose the conditions of observation
      • randomize the allocation to conditions
      • necessary condition for causal conclusions
    • observational study, does not exert control, efficiency loss at best
      • increase quantity to compensate loss in quality
      • includes naturalistic data collections, surveys, retrospective data collection, …

  • highlight what you can control, and how you use that control

19 Generalizability


  • collect a sample of observations with the aim to generalize (inference)
    • conclude upon more than just the observed sample → avoid biased sampling
    • type of sampling
      • probabilistic (sampling: random, stratified, multi-stage, …)
        • necessary for conclusive, unbiased, objective inferences
        • conclude upon population sampled from
      • non-probabilistic (sampling: diversity, expert, …)
        • potentially biasing, more subjective, best used only for exploratory / descriptive aims
          • argue why it is not biasing in your case
          • no issue for qualitative studies
  • missing data: safeguard against / remediate
    • generalization fails when missing data are substantial and non-random
      • depends on the mechanism of missingness
        • biased results if missing not at random
        • loss of precision even if missing at random
    • explain how it is avoided and dealt with
      • how many missing values to expect, for what reason
      • what to do to further minimize this number
        • improve data collection
        • obtain information to deal with missingness

20 Statistics


  • link between research design and research aim
    • highlight how, given the design, statistics can resolve the aims
      • focus on primary research questions
      • sketch of secondary research questions → maybe do some name-dropping
    • reflect on the type of data and challenges they offer
      • continuous/ordinal/nominal
      • skewed, outliers, boundary values, …
      • but avoid being vague or giving a statistics 101 lecture
  • introduce intended statistical analysis
    • include
      • statistical tests (evaluate whether effects exist; p-values)
      • statistical estimation (evaluate confidence interval)
      • prediction (evaluate model fit using individual observations)
    • specify the expected inferences, be specific
  • note: sample size calculations are not part of the statistical analysis itself
    • often based on simplified statistical testing/estimation
    • typically not discussed separately
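As one concrete way to be specific about an intended test, a permutation test sketch (Python, standard library only; the two groups of scores are hypothetical). It estimates how often a mean difference at least as large as the observed one arises under random relabeling of the groups:

```python
import random
from statistics import mean

random.seed(3)  # fixed seed for reproducibility

def permutation_p(a, b, reps=5000):
    """One-sided Monte Carlo permutation p-value for mean(a) > mean(b)."""
    observed = mean(a) - mean(b)
    pooled = a + b
    count = 0
    for _ in range(reps):
        random.shuffle(pooled)  # random relabeling of the pooled scores
        if mean(pooled[:len(a)]) - mean(pooled[len(a):]) >= observed:
            count += 1
    return (count + 1) / (reps + 1)

# hypothetical scores in two groups
p = permutation_p([14, 16, 15, 18, 17], [12, 13, 11, 14, 12])
```

Such a test makes few distributional assumptions, which helps with the skewed or bounded data mentioned above.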



21 Small Example


The aim is to show that the proposed treatment is an improvement over the current standard method, as reflected in higher scores on the measurement of interest.

Participants are randomized to either the treatment or the control group. The control group is given a dummy treatment; a post-experiment survey addresses whether participants were aware of this. Each participant is measured twice, once immediately before and once immediately after the (dummy) treatment is administered. Each measurement results in a continuous score on a 0-10 scale. A mixed model compares the change between the post- and pre-treatment measurements while accounting for pre-treatment scores. The following possible confounders will also be included: …

A sample size was derived for a t-test that focuses on the detection of a post-treatment difference of 2 in favor of the treatment. This minimal clinically relevant difference was decided upon by our expert panel. The literature indicates that the standard method has a population standard deviation of about 4. Because no information is available on the new treatment, it is assumed that the same population standard deviation applies. This leads to a required sample size of 51 patients in each group, for a one-sided test with a type I error of .05 and a power of .8. Earlier experiments showed a drop-out of about 10%, so 51 * (100/(100-10)) ≈ 57 patients are included per group. Because of the more advanced statistical test and the inclusion of various potential confounders, this number can be considered conservative but realistic. This number will also present no difficulties to collect in our center.
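The arithmetic above can be verified with a short script (Python, standard library only). The normal approximation yields 50 per group; the exact t-test adds one, giving the quoted 51:

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf
delta, sd, alpha, power = 2, 4, 0.05, 0.80  # values from the example above

# one-sided two-sample normal approximation, per group
n = 2 * ((z(1 - alpha) + z(power)) * sd / delta) ** 2
print(math.ceil(n))  # -> 50 (the exact t-test gives 51)

# inflate for an expected 10% drop-out
print(math.ceil(51 * 100 / (100 - 10)))  # -> 57
```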



22 Some Practical Suggestions


  • isolate methodological / statistical arguments from substantive reasoning

  • use concise and consistent labeling of data and conditions

  • visualize and structure wherever possible

    • data collection process

    • categories of observations and their relation

    • a text a statistical referee does not want to read

      We will compare the effect of [treatment] on [condition] by randomizing mice over groups with [description of procedure the statistician does not need to know about] and groups with [description of another procedure the statistician does not need to know about ]. Each procedure will be performed with 2 [substances] and each at 3 different doses, namely [dosages].
      We expect procedure A with substance 1, respectively substance 2 to achieve on average 13 units, respectively 15 units and a standard deviation of 1, respectively 1.5 at the lowest dosis, with a 10% increase in both mean and standard deviation at the second dosis and 20% increase for the third. For procedure B with substance 1, respectively substance 2 to achieve on average 11 units, respectively 13 units and a standard deviation of 1, respectively 1.3 at the lowest dosis, with a 10% increase in both mean and standard deviation at the second dosis and 20% increase for the third. We will also include a control group which achieves an average of 6 and standard deviation 0.5, where we expect dosis to have no effect.

    • could be turned into

    • experiments can also be visualized, e.g., with a time-line

    • design:
      tables to list conditions between (rows) and within (columns)




23 Conclusion


  • be clear on your aim and how to reach it with an appropriate design
  • highlight what is essential for your type of study
  • highlight the good choices that you have made
  • focus on what deserves focus
  • use a language understood by the relevant referee
    • statistical lingo
    • visualization / structure






Methodological and statistical support to help make a difference

website: https://www.icds.be/ includes information on who we serve, and how

booking: https://www.icds.be/consulting/ for individual consultations