Statistics
General information
These guidelines have been written for the benefit of sound scientific work and to help authors prepare their manuscripts in accordance with good statistical standards. The guidelines are applicable to retrospective clinical studies as well as to experimental studies, RCTs, and epidemiological studies. However, not all aspects are equally important for all types of studies. For instance, RCTs typically include a given number of patients based on calculations of statistical power. In exploratory studies, the number of units studied may be based on other considerations but may still be justified.
The following general principles should also be followed:
- The investigator should ensure that their data are of high quality.
- All data should be stored and retrievable on request.
- The use of a statistical method presupposes appropriate knowledge and understanding.
- Presentation of statistical results should focus on their clinical, not statistical, importance.
Statistical analyses are closely related to the design and activities of the research itself. However, the guidelines do not address the issues related to the design and conduct of research. Instead, we refer readers to the EQUATOR Network website (www.equator-network.org), where guidelines for reporting specific research designs can be found (e.g., CONSORT, STROBE, STARD, and PRISMA). These guidelines for reporting methodologies all include items on reporting statistics, but the guidelines presented here are more specific and complement, not duplicate, those in the methodology guidelines.
The following aspects of the results section need special attention:
- Participant flow (i.e., a Figure: a diagram illustrating study flow and attrition).
- Baseline characteristics (i.e., a Table reporting descriptive statistics for all participants in the intention-to-infer-from population).
- Main findings illustrated (i.e., illustration of the primary findings based on the prespecified objectives rather than chance findings [i.e., not based on significant “P values”]).
- Main analyses on the primary and key secondary objectives (i.e., Table(s) reporting statistical measures for each group and difference between them [with 95% confidence intervals]).
- Handling of missing data: Missing data is unavoidable in epidemiological and clinical research and must be handled and explained; otherwise, it could undermine the credibility and validity of the research results.
- Multiplicity issues: Most manuscripts include and rely on more than 1 set of 95% confidence intervals and P values. However, performing multiple statistical significance tests increases the chance of false-positive test results. When a single statistical test is performed at a 5% significance level, there is just a 5% chance of a false-positive result, but if repeated tests are performed, each at a 5% significance level, the chance of at least 1 false-positive result grows with the number of tests. Problems related to this inflation of the significance level are known as multiplicity issues, which need to be acknowledged in the interpretation of the research findings.
Statistical report
All authors considering submitting a manuscript to Acta Orthopaedica should consult our statistical guidance paper:
Christensen R, Ranstam J, Overgaard S, Wagner P. Guidelines for a structured manuscript: Statistical methods and reporting in biomedical research journals. Acta Orthop. 2023 May 10;94:243-249. doi: 10.2340/17453674.2023.11656. PMID: 37170796
Link: https://actaorthop.org/actao/article/view/11656
All statistical methods should be clearly specified and, when unusual methods are necessary, referenced. For every statistical result, the method used for deriving it should be clearly described. It is also important to address in sufficient detail the assumptions underlying the statistical methods used. No data should be removed, imputed, weighted, adjusted, or trimmed without clearly describing and justifying why and explaining the subsequent effects (see sensitivity analyses). Independent of research design, there are several important principles that apply when reporting results in a respected journal. Unfortunately, some authors choose which data to report and how to report them only after many analyses have been performed. This allows such “creatively minded” authors to deselect the outcomes and analyses that do not fit into what they feel constitutes a “significant manuscript.”
Descriptive Statistics: Descriptive statistics form an indispensable part of medical research manuscripts. Suitable tables should clearly describe the important features of the collected outcome variables and of the key prognostic and demographic variables. The results of the main analyses relating to the objectives of the study should be clearly described and presented, with descriptive statistics detailing both the central tendency and measures of dispersion (spread) of the data. We use means and standard deviations, or medians and interquartile ranges, as well as counts and proportions to inform the reader regarding the distribution of observations in variables for analysis and reporting.
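The summaries named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and example data are hypothetical, the standard deviation uses the sample (n - 1) denominator, and the quartiles follow the "inclusive" convention of Python's statistics module, which is one of several conventions in use.

```python
import statistics
from collections import Counter


def describe_continuous(values):
    """Summarize a continuous variable with mean/SD and median/IQR."""
    # Quartiles via the 'inclusive' method (an assumed convention).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (n - 1 denominator)
        "median": statistics.median(values),
        "iqr": (q1, q3),
    }


def describe_categorical(values):
    """Summarize a categorical variable as {level: (count, proportion)}."""
    counts = Counter(values)
    total = len(values)
    return {level: (n, n / total) for level, n in counts.items()}


# Hypothetical example data: ages in years and operated side.
ages = [61, 64, 66, 70, 73, 75, 82]
sides = ["left", "right", "right", "left", "right", "right", "right"]
print(describe_continuous(ages))
print(describe_categorical(sides))
```

Whether mean and SD or median and IQR is the better summary depends on the distribution of the variable; the sketch simply computes both.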
Statistical Tests: The relation between the studied hypothesis and the presented results from null hypothesis testing (P values) should be clearly explained in the manuscript. Tests should be accompanied by a defined effect size (e.g., an estimated treatment effect), and the estimation uncertainty (usually expressed as a confidence interval) should be considered in the presentation of results. Unless the use of 1-sided tests is specifically justified (and performed at half the alpha level), the tests should be 2-sided. Authors should report the actual P value if it is greater than 0.001, rounded to 1 digit, not counting leading zeros. Otherwise, they should use “P < 0.001”. Authors should not use “ns,” “P > 0.05,” or asterisks.
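The P value reporting rule can be illustrated with a small helper. This is a sketch, not an official formatter: the function name is hypothetical, the rule "one digit except zeros" is read here as one significant digit, and a value of exactly 0.001 is reported as a number, which is an assumption because the text does not cover that boundary.

```python
def format_p(p):
    """Format a P value: one significant digit, or 'P < 0.001'."""
    if not 0 <= p <= 1:
        raise ValueError("a P value must lie between 0 and 1")
    if p < 0.001:
        return "P < 0.001"
    # '.1g' keeps one significant digit (assumed reading of the rule).
    return f"P = {p:.1g}"


for value in (0.2, 0.079, 0.0012, 0.0004):
    print(format_p(value))
```

Labels such as “ns” or asterisks never appear in the output, in line with the guidance above.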
Confidence intervals are preferable to statistical tests: We recommend that authors present analysis results with 95% confidence intervals instead of P values. Authors who wish to publish a manuscript with statistical tests must comply with 2 Acta Orthopaedica principles for concluding whether scientifically important differences exist:
- A statistically non-significant test is not sufficient to claim “no difference.” To show “no difference,” a smallest clinically relevant size of the difference (it might be 0) must be defined. If all clinically relevant differences are excluded from the difference’s confidence interval, a “no difference” or similarity/comparability conclusion is reasonable.
- A statistically significant test does not necessarily imply a clinically important difference. The importance of the tested null hypothesis depends on the smallest clinically relevant difference that should be defined a priori. If the difference’s confidence interval excludes all clinically irrelevant differences, a conclusion concerning the existence of a clinically important difference is reasonable.
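The 2 principles above can be expressed as a simple decision rule on the confidence interval of the difference. The sketch below is illustrative only: the function name, the return labels, and the use of a symmetric margin around 0 are assumptions, and the margin stands for the smallest clinically relevant difference defined a priori.

```python
def interpret_ci(ci_low, ci_high, margin):
    """Classify a difference's confidence interval against a margin.

    margin: smallest clinically relevant difference (assumed symmetric,
    so differences with absolute value >= margin count as relevant).
    """
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    if margin < 0:
        raise ValueError("margin must be non-negative")
    # Interval lies entirely inside (-margin, margin): every clinically
    # relevant difference is excluded.
    if -margin < ci_low and ci_high < margin:
        return "no clinically relevant difference"
    # Interval lies entirely at or beyond the margin on one side: every
    # clinically irrelevant difference is excluded.
    if ci_low >= margin or ci_high <= -margin:
        return "clinically important difference"
    return "inconclusive"


# Hypothetical 95% confidence intervals with a margin of 5 points.
print(interpret_ci(-1.0, 1.5, 5.0))
print(interpret_ci(6.0, 12.0, 5.0))
print(interpret_ci(1.0, 8.0, 5.0))
```

An interval that covers both relevant and irrelevant differences supports neither conclusion, regardless of whether the corresponding test is statistically significant.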
Missing Data: We encourage authors to recognize the importance of missing data, to embrace this issue, and to discuss (as part of the Results and Discussion sections) how missing data affect the clinical findings. Missing data is unavoidable, but its potential to undermine the validity of research results is frequently ignored in the medical literature.
Multiple statistical tests: Most manuscripts include and rely on more than 1 set of 95% confidence intervals and P values. However, performing multiple statistical significance tests increases the chance of false-positive test results. When a single statistical test is performed at a 5% significance level, there is just a 5% chance of a false-positive result, but if repeated tests are performed, each at a 5% significance level, the chance of at least 1 false-positive result grows with the number of tests. Problems related to this inflation of the significance level are known as multiplicity issues, which need to be acknowledged in the interpretation of the research findings.
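The inflation can be made concrete with a short calculation. The sketch assumes that the tests are independent and that all null hypotheses are true, in which case the chance of at least 1 false-positive result among k tests is 1 - (1 - alpha)^k. The function name is hypothetical, and real analyses rarely satisfy the independence assumption exactly.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive among n independent tests,
    assuming every null hypothesis is true."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return 1 - (1 - alpha) ** n_tests


for k in (1, 5, 10, 20):
    print(f"{k:>2} tests: {familywise_error_rate(k):.2f}")
```

Under these assumptions, 10 tests at the 5% level give roughly a 40% chance of at least 1 false-positive result.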
Analyzing repeated measurements: Repeated measurements on the same participant are correlated and not statistically independent, so a statistical method allowing for correlated observations should be used (e.g., a mixed-effects model). A possible alternative is to summarize all values from each participant into an individual estimate of a clinically relevant entity (e.g., the magnitude of a peak value, area under a curve, or doubling time) and then use these estimates as input in an analysis with only 1 observation per participant. When multiple null hypotheses are tested with the aim of confirming a prespecified hypothesis, care should be taken to avoid spurious significance by using techniques for simultaneous inference. Pre-specification is, however, necessary for confirmation. The use of techniques for simultaneous inference without a prespecified null hypothesis should be explained and have a clear, valid purpose.
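The summary-measure alternative can be sketched as follows: each participant's series is reduced to a single number (here the area under the curve by the trapezoidal rule, and the peak value), so that the later analysis has 1 observation per participant. The function names and the example data are hypothetical, and the choice of summary measure should follow from the clinical question.

```python
def auc_trapezoid(times, values):
    """Area under the curve by the trapezoidal rule."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least 2 paired time/value points")
    points = list(zip(times, values))
    return sum(
        (t1 - t0) * (v0 + v1) / 2
        for (t0, v0), (t1, v1) in zip(points, points[1:])
    )


def summarize_participants(series_by_id):
    """Reduce each participant's repeated measurements to one row."""
    return {
        pid: {"auc": auc_trapezoid(times, values), "peak": max(values)}
        for pid, (times, values) in series_by_id.items()
    }


# Hypothetical pain scores at weeks 0, 2, 6, and 12 for 2 participants.
data = {
    "p01": ([0, 2, 6, 12], [7, 5, 3, 2]),
    "p02": ([0, 2, 6, 12], [6, 6, 4, 1]),
}
print(summarize_participants(data))
```

The resulting per-participant values can then be compared between groups with methods that assume independent observations.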
Publications on Biostatistics in Acta Orthopaedica
Problems in orthopedic research: dependent observations. Ranstam J. Acta Orthop Scand. 2002 Aug;73(4):447-50.
P-values in research reports. Ranstam J. Acta Orthop. 2005 Jun;76(3):289-90.
Statistical analysis of arthroplasty data. I. Introduction and background. Ranstam J, Kärrholm J, Pulkkinen P, Mäkelä K, Espehaug B, Pedersen AB, Mehnert F, Furnes O; NARA study group. Acta Orthop. 2011 Jun;82(3):253-7.
Statistical analysis of arthroplasty data. II. Guidelines. Ranstam J, Kärrholm J, Pulkkinen P, Mäkelä K, Espehaug B, Pedersen AB, Mehnert F, Furnes O; NARA study group. Acta Orthop. 2011 Jun;82(3):258-67.
The importance of clear language. Ranstam J. Acta Orthop. 2013 Oct;84(5):443.
The Cox model is better than the Fine and Gray model when estimating relative revision risks from arthroplasty register data. Ranstam J, Robertsson O. Acta Orthop. 2017 Dec;88(6):578-580.
Are competing risks models appropriate to describe implant failure? Ranstam J. Acta Orthop. 2018 Jun;89(3):253.
Hypothesis-generating and confirmatory studies, Bonferroni correction, and pre-specification of trial endpoints. Ranstam J. Acta Orthop. 2019 Aug;90(4):297.
Time to restrict the use of p-values in Acta Orthopaedica. Ranstam J. Acta Orthop. 2019 Feb;90(1):1-2.
Systematic reviews, meta-analyses, randomized trials, and observational studies. Ranstam J, Wagner P. Acta Orthop. 2022 Jan 3;93:1-2.
"There was no difference (p = 0.079)". Ranstam J. Acta Orthop. 2021 Aug;92(4):371-372.