
The Beginner's Guide to Statistical Analysis | 5 Steps & Examples

Statistical analysis means investigating trends, patterns, and relationships using quantitative data. It is an important research tool used by scientists, governments, businesses, and other organizations.

To draw valid conclusions, statistical analysis requires careful planning from the very start of the research process. You need to specify your hypotheses and make decisions about your research design, sample size, and sampling procedure.

After collecting data from your sample, you can organize and summarize the data using descriptive statistics. Then, you can use inferential statistics to formally test hypotheses and make estimates about the population. Finally, you can interpret and generalize your findings.

This article is a practical introduction to statistical analysis for students and researchers. We’ll walk you through the steps using two research examples. The first investigates a potential cause-and-effect relationship, while the second investigates a potential correlation between variables.

Table of contents

  • Step 1: Write your hypotheses and plan your research design
  • Step 2: Collect data from a sample
  • Step 3: Summarize your data with descriptive statistics
  • Step 4: Test hypotheses or make estimates with inferential statistics
  • Step 5: Interpret your results

To collect valid data for statistical analysis, you first need to specify your hypotheses and plan out your research design.

Writing statistical hypotheses

The goal of research is often to investigate a relationship between variables within a population. You start with a prediction, and use statistical analysis to test that prediction.

A statistical hypothesis is a formal way of writing a prediction about a population. Every research prediction is rephrased into null and alternative hypotheses that can be tested using sample data.

While the null hypothesis always predicts no effect or no relationship between variables, the alternative hypothesis states your research prediction of an effect or relationship.

Planning your research design

A research design is your overall strategy for data collection and analysis. It determines the statistical tests you can use to test your hypothesis later on.

First, decide whether your research will use a descriptive, correlational, or experimental design. Experiments directly influence variables, whereas descriptive and correlational studies only measure variables.

Your research design also concerns whether you’ll compare participants at the group level or individual level, or both.

Example: Experimental research design
You design an experiment to test whether a 5-minute meditation exercise can improve math test scores. First, you'll take baseline test scores from participants. Then, your participants will undergo a 5-minute meditation exercise. Finally, you'll record participants' scores from a second math test.

In this experiment, the independent variable is the 5-minute meditation exercise, and the dependent variable is the math test score from before and after the intervention.

Example: Correlational research design
In a correlational study, you test whether there is a relationship between parental income and GPA in graduating college students. To collect your data, you will ask participants to fill in a survey and self-report their parents' incomes and their own GPA.

Measuring variables

When planning a research design, you should operationalize your variables and decide exactly how you will measure them.

For statistical analysis, it's important to consider the level of measurement of your variables, which tells you what kind of data they contain: categorical data represent groupings (e.g., treatment group), while quantitative data represent amounts (e.g., test score).

Many variables can be measured at different levels of precision. For example, age data can be quantitative (8 years old) or categorical (young). If a variable is coded numerically (e.g., level of agreement from 1–5), it doesn’t automatically mean that it’s quantitative instead of categorical.

Identifying the measurement level is important for choosing appropriate statistics and hypothesis tests. For example, you can calculate a mean score with quantitative data, but not with categorical data.

In a research study, along with measures of your variables of interest, you’ll often collect data on relevant participant characteristics.

Population vs sample

In most cases, it’s too difficult or expensive to collect data from every member of the population you’re interested in studying. Instead, you’ll collect data from a sample.

Statistical analysis allows you to apply your findings beyond your own sample as long as you use appropriate sampling procedures. You should aim for a sample that is representative of the population.

Sampling for statistical analysis

There are two main approaches to selecting a sample: probability sampling, which relies on random selection, and non-probability sampling, which relies on convenience or other non-random criteria.

In theory, for highly generalizable findings, you should use a probability sampling method. Random selection reduces several types of research bias, like sampling bias, and ensures that data from your sample are actually typical of the population. Parametric tests can be used to make strong statistical inferences when data are collected using probability sampling.

But in practice, it's rarely possible to gather the ideal sample. While non-probability samples are at higher risk for biases like self-selection bias, they are much easier to recruit and collect data from. Non-parametric tests are more appropriate for non-probability samples, but they result in weaker inferences about the population.

If you want to use parametric tests for non-probability samples, you have to make the case that your sample is nonetheless representative of the population you want to generalize to and is not systematically biased.

Keep in mind that external validity means that you can only generalize your conclusions to others who share the characteristics of your sample. For instance, results from Western, Educated, Industrialized, Rich, and Democratic (WEIRD) samples (e.g., college students in the US) aren't automatically applicable to all non-WEIRD populations.

If you apply parametric tests to data from non-probability samples, be sure to elaborate on the limitations of how far your results can be generalized in your discussion section.

Create an appropriate sampling procedure

Based on the resources available for your research, decide on how you’ll recruit participants.

Example: Sampling (experimental study)
Your participants are self-selected by their schools. Although you're using a non-probability sample, you aim for a diverse and representative sample.

Example: Sampling (correlational study)
Your main population of interest is male college students in the US. Using social media advertising, you recruit senior-year male college students from a smaller subpopulation: seven universities in the Boston area.

Calculate sufficient sample size

Before recruiting participants, decide on your sample size either by looking at other studies in your field or by using statistics. A sample that's too small may be unrepresentative of the population, while a sample that's too large will be more costly than necessary.

There are many sample size calculators online. Different formulas are used depending on whether you have subgroups or how rigorous your study should be (e.g., in clinical research). As a rule of thumb, a minimum of 30 units per subgroup is usually necessary.

To use these calculators, you have to understand and input these key components: the significance level, statistical power, expected effect size, and the population standard deviation. A minimal calculation sketch follows.
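The sketch below shows one way such a calculation can be done in Python with the statsmodels package, here for a two-group comparison of means; the effect size, significance level, and power values are illustrative assumptions, not recommendations.

```python
# A minimal sketch of an a priori sample size calculation for comparing the
# means of two independent groups (assumes the statsmodels package).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,          # expected standardized difference (Cohen's d), assumed
    alpha=0.05,               # significance level
    power=0.80,               # desired statistical power
    alternative="two-sided",
)
print(f"Required sample size per group: {n_per_group:.0f}")
```

With these inputs the calculation suggests roughly 64 participants per group; changing any of the components changes the required sample size.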


Once you’ve collected all of your data, you can inspect them and calculate descriptive statistics that summarize them.

Inspect your data

There are various ways to inspect your data, including the following:

By visualizing your data in tables and graphs, you can assess whether your data follow a skewed or normal distribution and whether there are any outliers or missing data.

A normal distribution means that your data are symmetrically distributed around a center where most values lie, with the values tapering off at the tail ends.

Mean, median, mode, and standard deviation in a normal distribution

In contrast, a skewed distribution is asymmetric and has more values on one end than the other. The shape of the distribution is important to keep in mind because only some descriptive statistics should be used with skewed distributions.

Extreme outliers can also produce misleading statistics, so you may need a systematic approach to dealing with these values.
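As a concrete illustration, the short sketch below inspects a small, hypothetical set of scores with pandas: it prints summary statistics, checks skewness and missing values, and flags potential outliers with the common 1.5 × IQR rule.

```python
# A small sketch for inspecting a variable before analysis
# (the scores below are hypothetical).
import pandas as pd

scores = pd.Series([71, 68, 75, 80, 66, 73, 90, 69, 74, 72])

print(scores.describe())                  # count, mean, SD, quartiles
print("skewness:", scores.skew())         # near 0 for a symmetric distribution
print("missing values:", scores.isna().sum())

# Flag potential outliers with the 1.5 * IQR rule
q1, q3 = scores.quantile([0.25, 0.75])
iqr = q3 - q1
outliers = scores[(scores < q1 - 1.5 * iqr) | (scores > q3 + 1.5 * iqr)]
print("potential outliers:")
print(outliers)
```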

Calculate measures of central tendency

Measures of central tendency describe where most of the values in a data set lie. Three main measures of central tendency are often reported: the mode (the most frequent value), the median (the middle value when the data are ordered), and the mean (the arithmetic average).

However, depending on the shape of the distribution and level of measurement, only one or two of these measures may be appropriate. For example, many demographic characteristics can only be described using the mode or proportions, while a variable like reaction time may not have a mode at all.

Calculate measures of variability

Measures of variability tell you how spread out the values in a data set are. Four main measures of variability are often reported: the range, the interquartile range, the standard deviation, and the variance.

Once again, the shape of the distribution and level of measurement should guide your choice of variability statistics. The interquartile range is the best measure for skewed distributions, while standard deviation and variance provide the best information for normal distributions.
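The short sketch below computes these descriptive statistics for a small, hypothetical sample using pandas; it illustrates the measures above rather than any particular study's data.

```python
# A brief sketch of the descriptive statistics discussed above
# (the values are hypothetical).
import pandas as pd

data = pd.Series([12, 15, 15, 18, 20, 22, 22, 22, 25, 30])

# Measures of central tendency
print("mean:  ", data.mean())
print("median:", data.median())
print("mode:  ", data.mode().tolist())   # may contain more than one value

# Measures of variability
print("range:              ", data.max() - data.min())
print("interquartile range:", data.quantile(0.75) - data.quantile(0.25))
print("variance:           ", data.var())   # sample variance (ddof = 1)
print("standard deviation: ", data.std())   # sample standard deviation
```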

Using your table, you should check whether the units of the descriptive statistics are comparable for pretest and posttest scores. For example, are the variance levels similar across the groups? Are there any extreme values? If there are, you may need to identify and remove extreme outliers in your data set or transform your data before performing a statistical test.

From this table, we can see that the mean score increased after the meditation exercise, and the variances of the two scores are comparable. Next, we can perform a statistical test to find out if this improvement in test scores is statistically significant in the population.

Example: Descriptive statistics (correlational study)
After collecting data from 653 students, you tabulate descriptive statistics for annual parental income and GPA.

It’s important to check whether you have a broad range of data points. If you don’t, your data may be skewed towards some groups more than others (e.g., high academic achievers), and only limited inferences can be made about a relationship.

A number that describes a sample is called a statistic, while a number describing a population is called a parameter. Using inferential statistics, you can draw conclusions about population parameters based on sample statistics.

Researchers often use two main methods (often simultaneously) to make inferences in statistics: estimation and hypothesis testing.

You can make two types of estimates of population parameters from sample statistics: point estimates, which give a single best-guess value (e.g., a sample mean), and interval estimates, which give a range of plausible values (e.g., a confidence interval).

If your aim is to infer and report population characteristics from sample data, it’s best to use both point and interval estimates in your paper.

You can consider a sample statistic a point estimate for the population parameter when you have a representative sample (e.g., in a wide public opinion poll, the proportion of a sample that supports the current government is taken as the population proportion of government supporters).

There’s always error involved in estimation, so you should also provide a confidence interval as an interval estimate to show the variability around a point estimate.

A confidence interval uses the standard error and the z score from the standard normal distribution to convey where you’d generally expect to find the population parameter most of the time.
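As a brief illustration, the sketch below computes a 95% confidence interval around a hypothetical sample mean using the point estimate ± z × standard error approach described above (for small samples, a t critical value is more commonly used in place of z).

```python
# A minimal sketch of a 95% confidence interval for a mean
# (the sample values are hypothetical).
import numpy as np
from scipy import stats

sample = np.array([5.1, 4.8, 5.6, 5.0, 4.9, 5.3, 5.2, 4.7, 5.4, 5.0])

mean = sample.mean()
se = stats.sem(sample)              # standard error of the mean
z = stats.norm.ppf(0.975)           # z score for 95% confidence (about 1.96)

lower, upper = mean - z * se, mean + z * se
print(f"point estimate: {mean:.2f}, 95% CI: ({lower:.2f}, {upper:.2f})")
```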

Hypothesis testing

Using data from a sample, you can test hypotheses about relationships between variables in the population. Hypothesis testing starts with the assumption that the null hypothesis is true in the population, and you use statistical tests to assess whether the null hypothesis can be rejected or not.

Statistical tests determine where your sample data would lie on an expected distribution of sample data if the null hypothesis were true. These tests give two main outputs: a test statistic, which tells you how much your data differ from the null hypothesis, and a p value, which tells you how likely it is to obtain your result if the null hypothesis were true.

Statistical tests come in three main varieties: comparison tests, regression tests, and correlation tests.

Your choice of statistical test depends on your research questions, research design, sampling method, and data characteristics.

Parametric tests

Parametric tests make powerful inferences about the population based on sample data. But to use them, some assumptions must be met, and only some types of variables can be used. If your data violate these assumptions, you can perform appropriate data transformations or use alternative non-parametric tests instead.

A regression models the extent to which changes in a predictor variable result in changes in an outcome variable (or variables).

Comparison tests usually compare the means of groups. These may be the means of different groups within a sample (e.g., a treatment and control group), the means of one sample group taken at different times (e.g., pretest and posttest scores), or a sample mean and a population mean.

The z and t tests have subtypes based on the number and types of samples and the hypotheses: one-sample, dependent (paired) samples, and independent samples tests, each of which can be one- or two-tailed.

The only parametric correlation test is Pearson's r. The correlation coefficient (r) tells you the strength of a linear relationship between two quantitative variables.

However, to test whether the correlation in the sample is strong enough to be important in the population, you also need to perform a significance test of the correlation coefficient, usually a t test, to obtain a p value. This test uses your sample size to calculate how much the correlation coefficient differs from zero in the population.

You use a dependent-samples, one-tailed t test to assess whether the meditation exercise significantly improved math test scores. The test gives you a t value (the test statistic) and a p value.
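A sketch of this paired, one-tailed test in Python with SciPy is shown below; the pre- and post-meditation scores are hypothetical, and the one-tailed option (alternative="greater") requires a recent SciPy version.

```python
# A hedged sketch of a dependent-samples, one-tailed t test
# (pretest and posttest scores are hypothetical).
import numpy as np
from scipy import stats

pretest = np.array([62, 70, 65, 58, 74, 68, 61, 66])
posttest = np.array([68, 72, 70, 60, 79, 71, 66, 69])

# alternative="greater" tests whether posttest scores exceed pretest scores
t_stat, p_value = stats.ttest_rel(posttest, pretest, alternative="greater")
print(f"t = {t_stat:.2f}, one-tailed p = {p_value:.3f}")
```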

Although Pearson’s r is a test statistic, it doesn’t tell you anything about how significant the correlation is in the population. You also need to test whether this sample correlation coefficient is large enough to demonstrate a correlation in the population.

A t test can also determine how significantly a correlation coefficient differs from zero based on sample size. Since you expect a positive correlation between parental income and GPA, you use a one-sample, one-tailed t test. The t test gives you a t value and a p value.
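The sketch below illustrates this kind of test with SciPy's pearsonr, which returns the sample correlation and the p value for the test that it differs from zero; the income and GPA values are hypothetical, and the one-tailed option again requires a recent SciPy version.

```python
# A minimal sketch of testing the significance of a Pearson correlation
# (income and GPA values are hypothetical; income in thousands of dollars).
import numpy as np
from scipy import stats

parental_income = np.array([32, 45, 51, 60, 72, 38, 55, 80, 66, 49])
gpa = np.array([2.8, 3.1, 3.0, 3.4, 3.6, 2.9, 3.3, 3.7, 3.5, 3.2])

# alternative="greater" matches the one-tailed prediction of a positive correlation
r, p_value = stats.pearsonr(parental_income, gpa, alternative="greater")
print(f"r = {r:.2f}, one-tailed p = {p_value:.3f}")
```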

The final step of statistical analysis is interpreting your results.

Statistical significance

In hypothesis testing, statistical significance is the main criterion for forming conclusions. You compare your p value to a set significance level (usually 0.05) to decide whether your results are statistically significant or non-significant.

Statistically significant results are considered unlikely to have arisen solely due to chance. There is only a very low chance of such a result occurring if the null hypothesis is true in the population.

Example: Interpret your results (experimental study)
With a p value below your significance threshold, you can reject the null hypothesis. This means that you believe the meditation intervention, rather than random factors, directly caused the increase in test scores.

Example: Interpret your results (correlational study)
You compare your p value of 0.001 to your significance threshold of 0.05. With a p value under this threshold, you can reject the null hypothesis. This indicates a statistically significant correlation between parental income and GPA in male college students.

Note that correlation doesn’t always mean causation, because there are often many underlying factors contributing to a complex variable like GPA. Even if one variable is related to another, this may be because of a third variable influencing both of them, or indirect links between the two variables.

Effect size

A statistically significant result doesn’t necessarily mean that there are important real life applications or clinical outcomes for a finding.

In contrast, the effect size indicates the practical significance of your results. It's important to report effect sizes along with your inferential statistics for a complete picture of your results. You should also report interval estimates of effect sizes if you're writing an APA style paper.
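As an illustration, the sketch below computes Cohen's d for paired pre/post scores using the standard deviation of the difference scores, one common convention among several; the data are hypothetical.

```python
# A small sketch of Cohen's d for paired scores (hypothetical data),
# using the SD of the difference scores as the denominator.
import numpy as np

pretest = np.array([62, 70, 65, 58, 74, 68, 61, 66])
posttest = np.array([68, 72, 70, 60, 79, 71, 66, 69])

diff = posttest - pretest
cohens_d = diff.mean() / diff.std(ddof=1)
print(f"Cohen's d = {cohens_d:.2f}")
```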

Example: Effect size (experimental study)
With a Cohen's d of 0.72, there's medium to high practical significance to your finding that the meditation exercise improved test scores.

Example: Effect size (correlational study)
To determine the effect size of the correlation coefficient, you compare your Pearson's r value to Cohen's effect size criteria.

Decision errors

Type I and Type II errors are mistakes made in research conclusions. A Type I error means rejecting the null hypothesis when it’s actually true, while a Type II error means failing to reject the null hypothesis when it’s false.

You can aim to minimize the risk of these errors by selecting an optimal significance level and ensuring high power . However, there’s a trade-off between the two errors, so a fine balance is necessary.

Frequentist versus Bayesian statistics

Traditionally, frequentist statistics emphasizes null hypothesis significance testing and always starts with the assumption of a true null hypothesis.

However, Bayesian statistics has grown in popularity as an alternative approach in the last few decades. In this approach, you use previous research to continually update your hypotheses based on your expectations and observations.

A Bayes factor compares the relative strength of evidence for the null versus the alternative hypothesis, rather than leading to a binary conclusion about whether or not to reject the null hypothesis.



Practical recommendations for statistical analysis and data presentation in Biochemia Medica journal

The aim of this article is to highlight practical recommendations based on our experience as reviewers and journal editors and to point out some of the most common mistakes in manuscripts submitted to Biochemia Medica. One of the most important parts of an article is the Abstract. Authors quite often forget that the Abstract is sometimes the first (and only) part of the article read by readers. The Abstract must therefore be comprehensive and provide the key results of the work. Another problematic, often neglected, part of the article is the Statistical analysis subheading within Materials and methods, where authors must explain which statistical tests were used in their data analysis and the rationale for using those tests. They also need to make sure that all tests used are listed under the Statistical analysis section, and that all tests listed were indeed used in the study. When writing the Results section, there are several key points to keep in mind: are results presented accurately and with adequate precision; is the descriptive analysis appropriate; is a measure of confidence provided for all estimates; where necessary and applicable, were correct statistical tests used for the analysis; is a P value provided for all tests; and so on. It is especially important not to draw conclusions about causal relationships unless the study is an experiment or clinical trial. We believe that the use of the proposed checklist might increase the quality of submitted work and speed up the peer-review and publication process.


Introduction

The Editors at Biochemia Medica are committed to continuously improving the quality of the articles published in the Journal. This may be achieved by helping authors to improve their manuscripts through the peer-review process. One of the major problems in manuscripts submitted to Biochemia Medica is the quality of the data analysis and data presentation. The improper use of statistical methods is unethical because it leads to biased results and incorrect conclusions. Moreover, it is a substantial waste of time and money. The most common errors occurring in Biochemia Medica have already been reported in this Journal (1).

To improve the quality of data analysis and reporting in manuscripts submitted for possible publication, an increasing number of journals have issued statistical guidelines and have also introduced statistical editors who are responsible for statistical peer review (2–4).

The aim of this article is to provide practical recommendations for authors who wish to submit their work to Biochemia Medica. It should, however, be made clear that this article is by no means a substitute for a comprehensive textbook in biostatistics. On the contrary, readers are encouraged to take it only as a reminder and to consult a textbook for more comprehensive coverage of the issues mentioned here.

Are key results included in the Abstract?

One of the most important parts of the article is the Abstract. Authors quite often forget that the Abstract is sometimes the first (and only) part of the article read by readers. As stated in the Instructions to authors, the Abstract of all original articles must be structured into the following four headings: Introduction, Materials and methods, Results, and Conclusions. Furthermore, the Abstract must be comprehensive and provide the key results of the study. If this is not already done in the Materials and methods section of the Abstract, authors certainly need to make sure that readers are informed about the number and size of the studied groups. All estimates need to be presented with the appropriate summary measures, confidence intervals, and P values (where applicable). For all tested differences and associations, the level of significance must be provided.

Below is an example of a poorly written Results section of an Abstract:

The concentration of New BioMarker™ in patients with acute myocardial infarction was higher than in healthy controls (P < 0.05). There was a significant correlation of New BioMarker™ with serum copeptine concentrations.

The following is an example of a well-written Results section of an Abstract:

There were 250 patients with acute myocardial infarction and 232 healthy controls. The concentration of New BioMarker™ was higher in patients than in healthy controls (7.3 ± 0.6 mmol/L vs. 5.4 ± 0.5 mmol/L, respectively; P = 0.002). New BioMarker™ was associated with serum copeptine concentration (r = 0.67, P = 0.026).

Is the Statistical analysis section well written, accurate, and comprehensive?

A problematic section often neglected by authors is Statistical analysis, the subheading within Materials and methods. Under this subheading, authors need to explain which statistical tests were used in their data analysis and the rationale for using those tests. Care must be taken to ensure that: a) all tests used are listed in Materials and methods under Statistical analysis, and b) all tests listed are indeed applied in the study. From this section, every reader should be able to understand exactly which test was used for every comparison presented in the Results section. At the end of Statistical analysis, authors need to state the level of significance applied in their study and the statistical program used.

When writing the section Statistical analysis , authors need to make sure to address all issues listed below:

The following is an example of a poorly written Statistical analysis subheading of the Materials and methods section:

Statistical analysis.

Data were presented as mean ± standard deviation. Differences were tested by t-test. Pearson correlation was used to analyze the association between all studied parameters. Data analysis was done using MedCalc.

The following is an example of a well-written Statistical analysis subheading of the Materials and methods section:

The Kolmogorov-Smirnov test was used to assess the normality of distribution of investigated parameters. All parameters in our study were distributed normally. Data were expressed as mean ± standard deviation. Differences were tested by two-tailed t-test. Pearson’s correlation was used to analyze the association between all studied parameters. The values P < 0.05 were considered statistically significant. Statistical analysis was done using MedCalc 12.1.4.0 statistical software (MedCalc Software, Mariakerke, Belgium).

Key points to keep in mind when writing the Results section

The next section that should be carefully inspected prior to submission, to detect any possible flaws and errors in data analysis and presentation, is the Results section.

When results are reported, authors need to make sure that:

This is, unfortunately, not always the case. Authors quite often fail to describe their data with adequate precision and with the appropriate summary measures. Quite often, it is not clear from the text whether the assumptions for the tests were met and whether appropriate tests were used in the data analysis. This part of the manuscript is crucial and needs to be written with great attention and care. To help our readers avoid the most common mistakes, below we summarize some key points to keep in mind when writing the Results section of a manuscript.

Is the descriptive analysis adequate?

When describing numerical data, it is essential that proper measures of central tendency and dispersion are used. Before presenting the data, the normality of their distribution needs to be tested. Generally speaking, if the data are normally distributed and the sample size is ≥ 30, parametric summary measures (mean and standard deviation) may be used. However, if the sample size is small (N < 30) or the data are not normally distributed, authors are advised to use the median and interquartile range (IQR), from the first (Q1) to the third quartile (Q3), or some other measure, such as the range. We wish to point out that there is no uniformly accepted opinion about the cut-off number for the sample size, but according to Dawson and Trapp, samples under 30 subjects per group are considered small and require non-parametric statistics (5).
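The sketch below illustrates this decision rule in Python; it uses the Shapiro-Wilk test as one possible normality check (the worked Statistical analysis example above uses Kolmogorov-Smirnov instead), and the values are hypothetical.

```python
# A hedged sketch: test normality, then report mean ± SD for roughly normal
# data with N >= 30, otherwise median (IQR). The values are hypothetical.
import numpy as np
from scipy import stats

values = np.array([4.1, 4.5, 3.9, 5.2, 4.8, 4.4, 4.0, 5.0, 4.6, 4.3,
                   4.7, 4.2, 5.1, 4.9, 4.4, 4.6, 4.8, 4.1, 4.5, 4.7,
                   4.3, 4.9, 4.2, 5.0, 4.6, 4.4, 4.8, 4.5, 4.7, 4.3])

stat, p = stats.shapiro(values)          # Shapiro-Wilk normality test
if p > 0.05 and len(values) >= 30:
    print(f"mean ± SD: {values.mean():.1f} ± {values.std(ddof=1):.1f}")
else:
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    print(f"median (IQR): {median:.1f} ({q1:.1f}-{q3:.1f})")
```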

Since the SEM (standard error of the mean) is not a measure of dispersion, it should not be used when summarizing and describing data. Using the SEM instead of the standard deviation is one of the ten most common mistakes occurring in manuscripts submitted to biomedical journals (6).

More extensive review on the ways of summarizing and interpreting numerical data has recently been published in this journal within the section Lessons in biostatistics ( 7 ) and elsewhere ( 8 ).

Are results presented with adequate precision and accurately?

The golden rule is to present data with a precision that corresponds to the precision of the raw data obtained by the measurement. For instance, when reporting the number of cigarettes smoked in some studied period, it is completely unnecessary and wrong to state that the number of cigarettes was 10.21 ± 3.16. This is wrong because the reported precision does not correspond to the precision of the measurement. The number of cigarettes is obtained by counting, so it should be reported as a whole number, without any decimals: 10 ± 3.

An example of flawed presentation of observations is provided in Table 1a.

Example of erroneously presented results for observations in two groups (groups A and B).

WBC - white blood cells

The problem with the data presented in Table 1a is that all three parameters are reported with a precision that does not correspond to the way those data were measured.

General rules when reporting frequencies are listed below:

The correct way to present the data is provided in Table 1b.

Example of correctly presented results for observations in two groups (groups A and B).

When necessary and applicable, authors need to make sure to provide a measure of confidence and a P value for all of their estimates. This is especially important when presenting estimates of diagnostic accuracy, odds ratios, relative risks, regression analysis results, etc. In Tables 2a and 2b, we list some of the most common examples of flawed and correct presentation of estimates.

Examples for flawed presentation of results.

AUC - area under the curve

Examples for correct presentation of results.

Confidence intervals are important because they show how precise the corresponding estimate is. If a confidence interval is very wide, the precision of the estimate is low. Confidence intervals may also be used to assess the difference between two estimates. For example, if authors wish to compare the areas under the curve (AUC) for two parameters, they should check whether the confidence intervals overlap. If the two confidence intervals overlap, it may be concluded that there is no statistically significant difference between the two AUCs at the level of significance corresponding to the confidence interval. A more extensive review of the use and interpretation of confidence intervals has already been published in this Journal within the section Lessons in biostatistics (9).

Let us say that we wish to compare the AUC for parameter A and B. Their AUC and corresponding 95% confidence intervals are 0.78 (0.63–0.89) and 0.99 (0.80–0.99). The question is: is there a statistically significant difference in the AUC for parameters A and B? Since their 95% confidence intervals overlap (from 0.80 to 0.89) we may conclude that there is no statistically significant difference in those two parameters, for the significance level alpha = 0.05.

It is worth mentioning that the AUC, as well as its lower and upper 95% confidence limits, is always reported with two decimal places.

Were correct statistical tests used for the analysis?

The choice of statistical test is determined by the type of the data and the way they are measured. There are several assumptions that need to be checked prior to the choice of the test:

Depending on the answers to the questions listed above, the researcher chooses the statistical test. Common errors are that: i) authors did not check those assumptions before applying the statistical test; ii) they fail to describe how the test was selected; or iii) the reader is not informed at all about which test was used to analyze the data.

If data are not normally distributed and/or if sample size is small (N < 30), non-parametric tests should be used.

Tom Lang has reviewed the 20 most common statistical errors occurring in biomedical research articles and has provided statistical reporting guidelines to be followed by authors, editors, and reviewers who lack some knowledge of statistical analysis (6). Listed below are some of the most common errors occurring in manuscripts submitted to Biochemia Medica:

If there are three or more groups, authors should use ANOVA or its non-parametric analogue. When testing differences between three or more groups (with ANOVA or the Kruskal-Wallis test), authors need to make sure to give the P value for the overall test as well as for the post hoc comparisons.

Only if the P value for the ANOVA or Kruskal-Wallis test indicates that the difference among groups is significant may authors proceed with a post hoc test for multiple comparisons. A post hoc test is not done if P > 0.05 for the overall test.

Furthermore, what also needs to be stated is the name of the test used for post-hoc comparisons, because different tests have different uses, as well as advantages and disadvantages ( 10 ).
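The sketch below illustrates this workflow in Python: a one-way ANOVA across three hypothetical groups, followed by a named post hoc test (Tukey's HSD here, available as scipy.stats.tukey_hsd in SciPy 1.8 or later) only if the omnibus P value is significant.

```python
# A minimal sketch: one-way ANOVA, then a named post hoc test only if the
# omnibus test is significant. The group data are hypothetical.
import numpy as np
from scipy import stats

group_a = np.array([5.1, 5.4, 4.9, 5.6, 5.2])
group_b = np.array([6.0, 6.3, 5.8, 6.1, 6.4])
group_c = np.array([5.0, 5.2, 5.1, 4.8, 5.3])

f_stat, p_anova = stats.f_oneway(group_a, group_b, group_c)
print(f"ANOVA: F = {f_stat:.2f}, P = {p_anova:.3f}")

if p_anova < 0.05:
    # Tukey's HSD for pairwise post hoc comparisons (requires SciPy >= 1.8)
    posthoc = stats.tukey_hsd(group_a, group_b, group_c)
    print(posthoc)
```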

Authors need to make sure that all tests used in their work have met the assumptions for their use and this information needs to be provided to the readers in the sections Statistical analysis and Results . More comprehensive review on the choice of the right statistical test has been extensively elaborated in this Journal within the section Lessons in biostatistics ( 11 ).

Is P value provided for all tests done in the study?

The P value needs to be stated as an exact number with three decimal places (e.g., P = 0.027). The use of expressions like NS, P > 0.05, P < 0.05, and P = 0.0000 is strongly discouraged. P should be written as a capital letter and should not be italicized. P < 0.001 is the smallest P value that should be reported. There is no point in providing more than three decimals for P, with the exception of some studies of large samples and rare events (12).
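A small helper like the sketch below can enforce these reporting rules when preparing tables; it is an illustration of the conventions above, not a journal-supplied tool.

```python
# A short sketch of the P value reporting rules described above.
def format_p(p: float) -> str:
    """Return an exact P with three decimals, or 'P < 0.001' below that."""
    if p < 0.001:
        return "P < 0.001"
    return f"P = {p:.3f}"

print(format_p(0.0274))   # P = 0.027
print(format_p(0.00004))  # P < 0.001
```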

Data interpretation

Even if correct statistical test was used to analyze the data, mistakes can still occur when authors interpret their results. When interpreting the data and results, authors need to make sure to take into account the a priori stated level of significance. This means that differences may be interpreted as significant, only if P value is below the stated level of significance. Expressions like ‘ borderline significant’ are strongly discouraged and will not be accepted.

Furthermore, statements like this are also discouraged:

If statistical significance was not observed, data should not be reported or discussed as significant. Moreover, no matter how obvious it seems, a difference should not be discussed unless the authors have tested for its statistical significance. Unfortunately, this often occurs when differences between two or more measures of diagnostic accuracy (AUC, sensitivities and specificities), correlation coefficients, and odds ratios are being discussed.

Correlation analysis

Interpretation of the results of correlation analysis is frequently incorrect. When interpreting the results of a correlation analysis, authors first need to check the significance of the correlation coefficient. The correlation coefficient may be interpreted only if it is significant. If the obtained P value is > 0.05 (or above the predetermined level of significance), the correlation coefficient is not significant and should not be interpreted.

When interpreting the value of the correlation coefficient, authors should follow the generally accepted classification by Colton (1974) (5). There is no correlation between the data if r < 0.25, even if the P value is very low. The use and interpretation of correlation analysis is nicely reviewed by Udovicic et al. in Biochemia Medica (13).
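The sketch below illustrates this interpretation order in Python on hypothetical data: check the significance of the coefficient first, then judge its magnitude against the r < 0.25 cut-off mentioned above.

```python
# A hedged sketch of the interpretation order described above
# (the x and y values are hypothetical).
import numpy as np
from scipy import stats

x = np.array([1.2, 2.3, 3.1, 4.0, 5.2, 6.1, 7.3, 8.0])
y = np.array([2.0, 2.9, 3.5, 4.8, 5.1, 6.6, 7.0, 8.4])

r, p = stats.pearsonr(x, y)
if p >= 0.05:
    print("The correlation coefficient is not significant and should not be interpreted.")
elif abs(r) < 0.25:
    print(f"r = {r:.2f}: no correlation, even though P = {p:.3f}.")
else:
    print(f"r = {r:.2f}, P = {p:.3f}: interpret the strength using Colton's classification.")
```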

Conclusions on the causal relationship

When there is an association between measured parameters, authors often tend to make conclusions on the causal relationship of their observations. This is strongly discouraged. The existence of association does not prove the causal relationship of the data.

For example, the association of higher body mass index (BMI) with increased serum C-reactive protein (CRP) levels does not prove that CRP induces the increase in BMI, nor that BMI increase induces the increase in CRP. This only means that people with higher BMI tend to have higher concentrations of CRP.

Only if the study is an experiment or a clinical trial are authors allowed to draw conclusions about causality. Since most of the studies submitted to our Journal are observational (i.e., the researcher only observes differences and associations in the variables of interest, without any intervention on the study population), it is not acceptable to report any effect or induction of the measured parameters. If the study is observational but involves monitoring of some parameters over time, it is justifiable to report an increase or decrease of the monitored parameter. Otherwise, expressions like increase and decrease are not acceptable, and authors are encouraged to use expressions like higher and lower instead.

Listed below are several examples of incorrect statements which are strongly discouraged for all observational studies (which did not involve monitoring of parameters of interest over time). Each incorrect statement is followed by a suggestion for revised, correct expression.

Compared with the control group, ox-LDL levels were significantly increased in patients on hemodialysis (P = 0.001).

Compared with the control group, ox-LDL levels were significantly higher in patients on hemodialysis (P = 0.001).

We found a significantly decreased level of GPx in blood of asthmatic children as compared to age and sex matched controls (13.61 ± 5.73 vs. 15.22 ± 6.75, respectively; P = 0.036).

We found a significantly lower level of GPx in blood of asthmatic children as compared to age and sex matched controls (13.61 ± 5.73 vs. 15.22 ± 6.75, respectively; P = 0.036).

We observed that carrying AA genotype is significantly increased in healthy controls compared to patients (OR 2.5, 95% CI = 1.7–3.9; P = 0.012).

We observed that frequency of AA genotype is significantly higher in healthy controls compared to patients (OR 2.5, 95% CI = 1.7–3.9; P = 0.012).

Obstructive sleep apnea induced the increase in concentrations of hsCRP compared to healthy controls (P = 0.045).

Concentrations of hsCRP were higher in children with obstructive sleep apnea, compared to healthy controls (P = 0.045).

Logistic regression identified serum copeptin (OR 3.1; 95% CI = 1.7–12.4; P = 0.043) as an independent predictor of 1-month mortality of patients suffering from traumatic brain injury. We therefore conclude that copeptin induces mortality after traumatic brain injury.

Logistic regression identified serum copeptin (OR 3.1; 95% CI = 1.7–12.4; P = 0.043) as an independent predictor of 1-month mortality of patients suffering from traumatic brain injury. We therefore conclude that increased serum copeptin concentrations are associated with higher risk of mortality after traumatic brain injury.

Herein we provide a short checklist for all future authors who wish to submit their articles to Biochemia Medica (Table 3). We strongly encourage authors to check the items on the checklist before submitting their work for potential publication in our Journal. The aim of the checklist is to remind authors of some of the most important issues related to data analysis and presentation. A more extensive checklist for editing and reviewing statistical and epidemiological methodology in biomedical research papers has already been published in this Journal, with the aim of assisting statistical reviewers, editors, and authors when planning their studies and preparing their manuscripts (14).

Checklist for authors who submit their work to Biochemia Medica.

Conclusions

Authors are encouraged to browse through older issues of this Journal for more comprehensive coverage of specific statistical terms and related issues. The point of this article was to provide more general and basic guidance and to alert authors to some key factors that need to be remembered when writing an article. We invite all future authors to read this article and complete the checklist prior to submitting their work to our Journal. This will increase the quality of the submitted work and speed up the peer-review and publication process.

Potential conflict of interest

None declared.

An evaluation of the quality of statistical design and analysis of published medical research: results from a systematic survey of general orthopaedic journals

BMC Medical Research Methodology, volume 12, Article number 60 (2012)


The application of statistics in reported research in trauma and orthopaedic surgery has become ever more important and complex. Despite the extensive use of statistical analysis, it is still a subject which is often not conceptually well understood, resulting in clear methodological flaws and inadequate reporting in many papers.

A detailed statistical survey sampled 100 representative orthopaedic papers using a validated questionnaire that assessed the quality of the trial design and statistical analysis methods.

The survey found evidence of failings in study design, statistical methodology, and presentation of the results. Overall, in 17% (95% confidence interval: 10–26%) of the studies investigated, the conclusions were not clearly justified by the results; in 39% (30–49%) of studies a different analysis should have been undertaken; and in 17% (10–26%) a different analysis could have made a difference to the overall conclusions.

It is only by an improved dialogue between statistician, clinician, reviewer and journal editor that the failings in design methodology and analysis highlighted by this survey can be addressed.


Statistics is an essential component of medical research from design initiation to project reporting, and it influences all aspects of the research process from data collection and management to analysis and interpretation. The application of statistics to medical sciences, and particularly in our area of interest, trauma and orthopaedic surgery, has become more widespread and complex. However, there is considerable evidence, both anecdotal and in the literature [1], of poor reporting and use of statistical methods in orthopaedics papers. Our experience providing statistical support more widely in medicine, moreover, leads us to suspect that similar concerns about the quality of both design and statistical analysis exist within many other medical disciplines. Our selection of general orthopaedic journals is therefore not solely to highlight particularly bad practice in this discipline, as we suspect much of what we report here is generally applicable to research across all disciplines, and as such orthopaedic publications simply provide an exemplar of this larger population. In an attempt to quantify the extent of poor reporting and use of statistical methods, Parsons et al. [2] undertook a large survey of the orthopaedic literature to assess both the quality of reporting and the appropriate and correct use of statistical methods. The first part of this study found major deficiencies in reporting, with 59% (95% confidence interval: 56–62%) and 58% (56–60%) compliance with CONSORT [3] and STROBE [4] guidelines, and commented on differences between journals and paper types [2]. In the second part of the study, the quality of statistical analysis methods was assessed using a detailed questionnaire which was completed for a random sample of orthopaedics papers by two experienced statisticians. The results of this survey are discussed in detail here.

A random sample of 100 papers from the general orthopaedic literature was obtained and included 27 randomized controlled trials (RCTs), 30 case–control (CC) studies, 16 longitudinal (L) studies and 27 cross-sectional (CS) studies. The sample was stratified by study type to ensure accurate representation of each of the four types of study and additional inclusion criteria were as follows:

Published research papers from seven general orthopaedic journals [ 5 ] covering a range of impact factors [ 6 ]; Journal of Bone and Joint Surgery (American), Clinical Orthopaedics and Related Research, Journal of Bone and Joint Surgery (British), Acta Orthopaedica, Archives of Orthopaedic and Trauma Surgery, International Orthopaedics and BMC Musculoskeletal Disorders

Original research only – excluding trial protocols, reviews, meta-analyses, short reports, communications and letters

Published between 1st January 2005 and 1st March 2010 (study start date)

No more than one paper from any single research group

Papers published by research groups based at our own institutes were excluded to avoid assessment bias

Full details of the search strategy and methods used to collect the sample are provided by Parsons et al. [ 2 ].

The statistical quality of each paper was assessed using a validated questionnaire [7], which was adapted to reflect the specific application to orthopaedic research [2]. After randomly numbering the papers from 1 to 100, each paper was read and independently assessed using the questionnaire by two experienced statisticians (NP and CP). Even-numbered papers were read by NP and odd-numbered papers were read by CP. The questionnaire was divided into two parts. Part one captured data describing the type of study, population under study, design, outcome measures and the methods of statistical analysis; the results of this part were reported in Parsons et al. [2]. A random sample of 16 papers from the original 100, stratified by study type to ensure balance, was selected and read by both statisticians to assess the level of agreement between the two reviewers for individual items on part one of the questionnaire. Parsons et al. [2] reported kappa statistics in the range 0.76 to 1.00 with a mean of 0.96, suggesting good agreement between the reviewers for this more objective part of the survey. The second part of the questionnaire required generally more subjective assessments concerning the presentation of data and the quality and appropriateness of the statistical methods used (see Additional file 1 for questionnaire details). The results of this part are reported in detail here. The survey allowed a detailed investigation of issues such as the description of the sample size calculation, missing data, the use of blinding in trials, the experimental unit, multiple testing and presentation of results.

The correctness, robustness, efficiency and relevance [7] of the statistical methods reported in the sample papers were assessed using a yes or no assignment for each characteristic. Correctness refers to whether the statistical method was appropriate. For instance, it is not correct to use an unpaired t-test to compare an outcome from baseline to the trial endpoint for a single group of patients. Many statistical methods rely on a number of assumptions (e.g. normality, independence etc.); if those assumptions are incorrect, the selected method can produce misleading results. In this context we would describe the selected methods as lacking robustness. A statistical method was rated as inefficient if, for example, a nonparametric rather than a parametric method was used for an analysis where data conformed to a known distribution (e.g. using a Mann–Whitney test, rather than a t-test). Finally, an analysis was regarded as relevant if it answered the question posed in the study. For instance, a principal components analysis may be correct and efficient for summarising a multivariate dataset, but may have no bearing on the stated aim of a paper.

The majority of the survey items were objective assessments of quality, e.g. an incorrect method of analysis was used, with a small number of more subjective items, e.g. could a different analysis make a difference to the conclusions?

The outcomes of part two of the statistical questionnaire are summarized in the following three subsections covering study design, statistical methods and the presentation of results.

Study design

A number of key themes emerged from the analysis of the questionnaire data. Foremost amongst these were the description of the study design, identification of the experimental unit, details of the sample size calculation, the handling of missing data and blinding for subjective measures. These topics are discussed individually below.

Experimental unit

The experimental unit is a physical object which can be assigned to a treatment or intervention. In orthopaedics research, it is often an individual patient. However, other possibilities include things such as a surgeon, a hip or a knee. The experimental unit is the unit of statistical analysis and, for simple study designs, it is synonymous with the data values, i.e. there is a single outcome measure for each experimental unit. For more complex designs, such as repeated measures, there may be many data values for each experimental unit. Failure to correctly identify the experimental unit is a common error in medical research, and often leads to incorrect inferences from a study [ 1 , 8 ].

The experimental unit was not identified correctly in 23% (15–33%; 95% confidence interval based on normal approximation to binomial) of the sampled studies. Of the 77 papers that correctly identified the experimental unit, 86% (75–92%) correctly summarised the data by patient. By far the most common reason for incorrect identification of the experimental unit was confusion between limbs and individual patients when analysing and reporting results. For example, one paper reported data for 100 patients but summarised outcomes for 120 feet, whereas another reported patient pain scores after surgery for both left and right ankles on some patients and single ankles for other patients. Failure to identify the correct experimental unit can lead to ‘dependencies’ in data. For example, outcome measures made on left and right hips for the same patient will be correlated, but outcome measures between individual hips from two different patients will be uncorrelated. Only one paper, where data were available from one or both legs for patients, identified this as an important issue and the authors decided to use what we would regard as an inappropriate strategy by taking the mean of the two values as the outcome for bilateral patients. Almost all of the statistical analyses reported in these studies (e.g. t-tests, ANOVA, regression) are based on an assumption that outcome data (formally the residuals) are uncorrelated; if this is not the case then the reported inferences are unlikely to be valid.
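One way to respect the experimental unit in such bilateral data, rather than averaging or ignoring the pairing, is a mixed-effects model with a random intercept per patient; the hedged sketch below uses statsmodels on simulated, hypothetical data and illustrates the idea, not the analysis used by any of the surveyed papers.

```python
# A hedged sketch: model correlated outcomes from two hips per patient with
# a random intercept per patient (simulated, hypothetical data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_patients = 40
patient_id = np.repeat(np.arange(n_patients), 2)              # two hips per patient
treatment = np.repeat(rng.choice(["A", "B"], n_patients), 2)  # treatment assigned per patient
patient_effect = np.repeat(rng.normal(0, 5, n_patients), 2)   # shared within-patient effect
score = 70 + (treatment == "B") * 4 + patient_effect + rng.normal(0, 3, 2 * n_patients)

data = pd.DataFrame({"patient_id": patient_id, "treatment": treatment, "score": score})

# The random intercept accounts for the correlation between the two hips of
# the same patient instead of treating all 80 hips as independent units.
model = smf.mixedlm("score ~ treatment", data, groups=data["patient_id"])
print(model.fit().summary())
```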

Sample size

The size of the sample used in a study, i.e. the number of experimental units (usually patients), largely determines the precision of estimates of study population characteristics such as means and variances. That is, the number of patients in the study determines how confidently we can draw inferences from the results of that study and use them to inform decisions about the broader population of patients with that particular condition or problem. In clinical trials, a pre-study power analysis is usually used to estimate the sample size [ 9 ], although methods are available for many other study types [ 10 ]. It is particularly important for RCTs, where specific null hypotheses are tested, that a clear description of the methodology and rationale for choosing a sample size is given. For example, the outcome is assumed to be normally distributed, treatment group differences will be assessed using a t -test, the power to detect a defined clinically meaningful difference is set to at least 80% and the type I error rate, or significance level, is set to 5%.

The sample size was not justified in the Methods section for 19% (7–39%) of the 27 papers describing RCTs. A specific calculation, with sufficient details to allow the reader to judge the validity, was not given for 30% (14–50%) of RCTs. These studies often simply described the sample size in vague terms, for instance "…based on a priori sample size estimation, a total of 26 patients were recruited…" . For 3 papers reporting RCTs, the validity of the sample size calculation was questionable, for 3 papers there was a lack of clearly stated assumptions and in 2 papers the calculation was simply not credible. For example, one paper gave sparse details about the population variance, minimum clinically important difference and required power which resulted in a recruitment target of 27 patients for a two arm trial. For purely practical reasons one would always want an even number of patients in a two arm trial. In another paper, 400 patients were recruited to a study, based on a vague description about how this number was arrived at, and exactly 200 patients were randomly allocated to each of two treatment groups. A cynical reader might question the likelihood of such an exact split of patients between treatment groups; there is only a 1 in 25 chance of an exact split for a simple 1 to 1 randomization. However, this might simply be a case of poor reporting, where in reality blocking or minimization were used to equalise numbers in the treatment arms, thus giving more credence to the description of the design. For the 73 observational studies, only 34% (24–46%) justified the sample size, that is there was some discussion in the paper on how the sample size was arrived at; this was often minimal, for instance a simple statement that the number of patients required to answer the research question was the number of patients who were available at the time of study, or those who accepted an invitation to participate (e.g. "…all patients were invited to join the study…" ).

Missing data

Missing data are observations that were intended to be made but were not made [ 11 ]; the data may be missing for unexpected reasons (e.g. patient withdrawal from a study), or intentionally omitted or not collected. It is important to carefully document why data are missing in the study design when reporting. If data can be considered to be missing at random, then valid inferences can still be made. However, if values are missing systematically, then it is more dangerous to draw conclusions from that study. For example, if in a clinical trial comparing different types of hip replacement all of the missing data occurs in one particular arm of the trial, the remaining data is unlikely to be representative of the overall result in that group of patients; the missing data may be because those patients went to another hospital for their revision surgery.

Data were missing, either for a complete unit or for a single observation, in 34% (25–44%) of the papers; of these 34 papers, only 62% (44–77%) documented and explained the reasons. An audit of the data reported in each paper allowed the statistical assessors to identify 13 papers (13% of the total sample) where data were missing with no explanation. Data missingness was generally inferred from the numbers reported in the results being less than those reported in the methods, with no explanation or reason offered by the authors of the study. Of the 34 papers reporting missing data, 28 based the analysis on complete cases, 2 imputed missing data, and for the remaining 4 papers it was unclear what methodology was used.

Subjective assessments and blinding

Many orthopaedic studies report subjective assessments, such as a pain or a functional score after surgery or a radiological assessment of the quality of a scan. To reduce the risk of bias for these kinds of assessments it is desirable, where possible, to ‘blind’ the assessor to the treatment groups to which the patient was allocated.

Subjective assessments were undertaken in 16 of the 27 RCTs (59%; 95% CI 39–77%) and in 6 of these studies (38%; 95% CI 16–64%), the assessments were not done blind and no explanation was given as to why this was not possible.

Statistical methods

Statistical methods should always be fully described in the methods section of a paper and only the statistics described in the methods should be reported in the results section. In 20% (13–29%) of the papers in our survey, statistical methods not previously stated in the methods section were reported in the results section [ 2 ]. In addition to the poor reporting of the methods used, a number of specific issues were identified.

Analysis methods

The most commonly reported statistical methods were chi-squared (χ 2 ) and Fisher’s exact tests (47%; 95% CI 37–57%), t-tests (45%; 95% CI 35–55%), regression analysis (33%; 95% CI 24–43%) and Mann–Whitney tests (28%; 95% CI 20–38%). The selection of an appropriate method of analysis is crucial to making correct inferences from study data.

In 52% (32–71%) of papers where a Mann–Whitney, Wilcoxon rank sum or Wilcoxon signed rank test was used, the analysis was considered to be inefficient, and the reported analysis was only considered to be correct 70% (50–86%) of the time. The t-test was used inappropriately, with a lack of robustness, in 26% (14–41%) of papers, and in an equivalent proportion of papers (26%; 95% CI 14–41%) it was reported in such a way as to be irrelevant to the stated aims of the paper. This lack of relevance was, on occasion, due to method selection, such as the choice between a parametric and a non-parametric test, but more often was simply a result of poor reporting and a lack of clarity in the description. Many papers reported a list of the statistical tools used in the analysis, but in the results gave only short statements such as “A was better than B (p = 0.03)” with no details as to which test was used to obtain the p-value; so-called ‘orphan’ p-values [ 12 ]. It was therefore impossible to assess whether the correct test was used for the relevant comparison.

Seven papers (7%; 95% CI 3–14%) reported clear methodological errors in the analysis. Two papers wrongly used the Wilcoxon signed-rank test to compare independent samples, and another used an independent-samples t-test where a paired test should have been used. One paper committed the reverse error of using a paired t-test to compare cases and controls in an unpaired case–control study, and another used a t-test to compare differences in proportions rather than, for instance, a χ2 test. Another study calculated the arithmetic mean of a number of percentages, all based on different denominator populations. Finally, one study outlined reasons for conducting a non-parametric analysis in the methods, only to later report an analysis of covariance, a parametric method of analysis based on assumptions of normality.
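
The sketch below shows, on invented data, how the matching test is selected for each design in a standard statistics library (scipy is used here as an example); it is the mismatch between design and test that produced the errors described above.

```python
# Choosing the test that matches the study design; data are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pre = rng.normal(50, 10, size=20)          # scores before treatment
post = pre + rng.normal(5, 5, size=20)     # scores after treatment (same patients)
group_a = rng.normal(50, 10, size=20)      # independent group A
group_b = rng.normal(55, 10, size=20)      # independent group B

# Same patients measured twice -> paired test
print(stats.ttest_rel(pre, post))
# Two unrelated groups -> independent-samples test
print(stats.ttest_ind(group_a, group_b))
# Comparing proportions -> chi-squared test on the 2x2 table, not a t-test
table = np.array([[12, 8], [6, 14]])
print(stats.chi2_contingency(table)[1])    # p-value
```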

Parametric versus non-parametric tests

Parametric statistical tests assume that data come from a probability distribution with a known form. That is, the data from the study can be described by a known mathematical model; the most widely used being the normal distribution. Such tests make inferences about the parameters of the distribution based on estimates obtained from the data. For example, the arithmetic mean and variance are parameters of the normal distribution measuring location and spread respectively. Non-parametric tests are often used in place of parametric tests when the assumptions necessary for the parametric method do not hold; for instance the data might be more variable or more skewed than expected. However, if the assumptions are (approximately) correct, parametric methods should be used in preference to non-parametric methods as they provide more accurate and precise estimates, and greater statistical power [ 13 ].

Many of the papers in this survey showed no clear understanding of the distinction between these types of tests, evidenced by reporting that made no statistical sense: e.g. "…continuous variables were determined to be parametric using Kolmogorov-Smirnov tests…" , "…the t-test was used for parametric variances…" , "…non-parametric statistics were used to compare outcome measures between groups (one way ANOVA)…" and "…Student's t-test and the Mann–Whitney test were used to analyse continuous data with and without normal distribution…" . Continuous variables may be assumed to be approximately normal in an analysis, but it makes no sense to describe variables or variances as parametric. It is also incorrect to label an analysis of variance (ANOVA) as non-parametric. In at least 5 papers (5%; 95% CI 2–12%), the authors opted to use non-parametric statistical methods, but then summarised data in tables and figures using means and standard deviations, the parameters of the normal distribution, rather than correctly using medians and ranges or inter-quartile ranges.

The survey showed that 52% (42–62%) of papers used non-parametric tests inefficiently; that is, they reported the results of non-parametric tests for outcomes that evidence from the paper suggested were approximately normal. Three papers (3%; 95% CI 0–9%) compared the lengths of time to an outcome event between groups by using the non-parametric Mann–Whitney (M-W) test based on converting the times to ranks. By doing this, much of the information about real differences between individual records is lost; for example, outcomes of 1 day, 2 days and 100 days become 1, 2 and 3 when converted to ranks. Although times are often positively skewed, they are usually approximately normally distributed after logarithmic transformation [ 14 ]. A more efficient analysis can therefore usually be achieved by using a t-test on log-transformed times rather than applying a M-W test to untransformed data. This is not to say that non-parametric tests should never be used, but that for many variable types (e.g. times, areas, volumes, ratios or percentages) there are simple and well-known transformations that can be used to force the data to conform more closely to the assumptions required for parametric analysis, such as normality or equality of variances between treatment groups.
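
A small simulation (our own illustration, not data from the survey) makes the point: for positively skewed times with a genuine multiplicative difference between groups, a t-test on the log-transformed values is typically at least as sensitive as a Mann–Whitney test on the raw values.

```python
# Rank-based test on raw skewed times versus a t-test on log-transformed times.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Positively skewed (log-normal) times with a true multiplicative difference
times_a = rng.lognormal(mean=3.0, sigma=0.6, size=30)
times_b = rng.lognormal(mean=3.4, sigma=0.6, size=30)

print(stats.mannwhitneyu(times_a, times_b))                 # rank-based comparison
print(stats.ttest_ind(np.log(times_a), np.log(times_b)))    # parametric, on the log scale
```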

Multiple comparisons

Problems of multiple comparisons, or multiple testing, occur when considering the outcomes of more than one statistical inference simultaneously. In the context of this survey, the issue is best illustrated by considering a number of statistical tests reported for one study, all claiming evidence of significance at the 5% level. If one undertakes 20 hypothesis tests on data where there is no true difference, we expect to see, on average, one significant result at the 5% level by chance alone. Therefore, if we undertake multiple tests, we require a stronger level of evidence to compensate. For example, the Bonferroni correction preserves the ‘familywise error rate’ (α), the probability of making one or more false discoveries, by requiring that each of n tests be conducted at the α/n level of significance, i.e. it adjusts the significance level to account for multiple comparisons [ 15 ].

The questionnaire recorded the number of hypotheses tested in each paper, based on an approximate count of the number of p-values reported. Three papers did not report p-values, 31 papers (31%; 95% CI 22–41%) reported fewer than 5 p-values, 36 papers (36%; 95% CI 27–46%) reported between 5 and 20 p-values and 30 papers (30%; 95% CI 21–40%) reported more than 20 p-values. The relevance of, and need for, formal adjustment for multiple comparisons will clearly be very problem specific [ 16 ]. Whilst most statisticians would concede that formal adjustment of p-values may not be required when reporting a small number of hypothesis tests, authors reporting more than 20 p-values from separate analyses should provide some discussion of the rationale for so many statistical tests and consider formal adjustment for multiple comparisons. In an extreme case, one paper reported a total of 156 p-values without considering the effect of this on inferences from the study; a Bonferroni correction to the significance level would have resulted in at least 21 of the 35 reported significant p-values in this study being regarded as no longer significant. Where some adjustment was made for multiple comparisons (7 papers), the Bonferroni correction was the most common method (5 papers). One other paper used Tukey’s Honestly Significant Difference (HSD) test and another set the significance level to 1% (rather than 5%) in an ad-hoc manner to account for undertaking 10 tests.
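
The sketch below applies a Bonferroni correction to a handful of invented p-values; each test is simply judged against α/n rather than α.

```python
# A minimal Bonferroni correction; the p-values are invented.
p_values = [0.001, 0.012, 0.030, 0.047, 0.20]
alpha = 0.05
threshold = alpha / len(p_values)   # each test judged at alpha / n

for p in p_values:
    print(p, "significant" if p < threshold else "not significant")

# Packages such as statsmodels also provide ready-made procedures, e.g.
# multipletests(p_values, alpha=0.05, method="bonferroni").
```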

Presentation of results

The clear and concise presentation of results, be it the labelling of tables and graphs or the terminology used to describe a method of analysis or a p-value, is an important component of all research papers. The statistical assessment of the study papers identified two important presentational issues.

Graphs and tables

The statistical assessors were asked to comment on the quality of the data presentation in the papers, which included tables and graphs. Graphs and tables were clearly titled in only 29% (21–39%) of papers; typical examples of uninformative titles included “Table I: Details of Study” and “Table II: Surgical Information” . Furthermore, only 43% of graphs and tables were considered to be clearly labelled. In particular, a number of the papers included tables with data in parentheses without further explanation, leaving the reader to decide whether the numbers indicated, for example, 95% confidence intervals, inter-quartile ranges (IQRs) or ranges. Some tables also included p-values with no indication of the statistical test used. The description of graphical displays was occasionally confusing. One paper stated that the whiskers of a box-and-whisker plot represented the maximum and minimum values in the dataset, when there were clearly points outside the whiskers; by convention, the whiskers extend to at most 1.5 times the inter-quartile range beyond the box, with points outside the whiskers identified as ‘outliers’. Another paper claimed that the boxes showed standard deviations, rather than the correct IQR, so there is clearly a wider misunderstanding of these figures.

Raw data for individual patients (or experimental units) were displayed graphically or in tables in only 9% (4–17%) of papers. Raw data, as opposed to means, medians or other statistics, always provide the simplest and clearest summary of a study, and direct access to the data for the interested reader. Although we accept that there may be practical reasons why authors would not want to present such data, it is disappointing that such a small proportion of investigators decided to do so.

Terminology

The lack of appropriate statistical review, either prior to submission or at the review stage, was apparent in the catalogue of simple statistical reporting errors found in these papers. For instance, methods were reported that, to our knowledge, do not exist: e.g. "multiple variance analysis" or the “least squares difference" post-hoc test after ANOVA. Presumably the latter refers to a least significant difference test, but the former is ambiguous. Another class of reporting error comprised statements that simply made no statistical sense in the context in which they were reported: e.g. "…there was no difference in the incidence among the corresponding groups (chi-squared test, p = 0.05)…" and "…there were no significant differences in the mean T-score or Z-score between the patients and controls…" . The former remark was made in the context of rejecting the null hypothesis at the 5% significance level, and the latter presumably implied that mean t-statistics and z-scores were compared between groups, which makes no statistical sense. The inadequate or poor reporting of p-values was also widespread; typical errors included "p < 0.000009" , “p < 0.134” and, more generally, the use of “p = NS” or “p < 0.05” . P-values should generally be quoted exactly (as opposed to as an inequality, e.g. p < 0.05) and to no more than 3 decimal places, unless very small, when p < 0.001 is acceptable.
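
A tiny helper of our own devising (not taken from any library) captures the reporting convention described above: exact p-values to at most three decimal places, with very small values reported as p < 0.001.

```python
# Format p-values according to the convention described in the text.
def format_p(p):
    if p < 0.001:
        return "p < 0.001"
    return f"p = {p:.3f}"

for p in [0.000009, 0.134, 0.05, 0.0321]:
    print(format_p(p))
# -> p < 0.001, p = 0.134, p = 0.050, p = 0.032
```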

A number of key issues have emerged from our survey of 100 papers investigating the quality of statistical analysis and design in trauma and orthopaedic research. These points are summarised below with recommendations for improvement.

It is important that authors clearly identify the experimental unit when reporting. This was a source of confusion for 23% (95% CI 15–33%) of the papers in our survey and reflects a fundamental lack of understanding. If no attempt is made to modify the analysis to account for data dependencies, researchers should at least state that they are making an assumption of (approximate) independence between multiple observations for the same unit (e.g. functional outcomes from the left and right side for the same individual after a bilateral procedure). This then at least allows the reader to decide whether the assumption is reasonable in the context of the study.

Where specific hypotheses are being tested, for instance in an RCT, the sample size should be discussed, and usually a power calculation reported with sufficient details to allow one to verify the reported sample size. In this survey, 30% (14–50%) of the RCTs gave no such calculation and 19% (7–39%) of them provided no justification for the sample size in the methods section. A clear description of the methodology used for sample size determination (e.g. power level, significance) and the design used (e.g. randomization) is critical for judging the quality of research. However, in studies where a sample size calculation may not be relevant, for example if the research is exploratory or researchers have access to a limited number of participants, authors should provide an open discussion of this in the methods section.

The lack of a clear explanation for missing data in a number of the papers in this survey goes hand-in-hand with the poor reporting of patient numbers. Parsons et al. [ 2 ] showed that only 57% of these papers stated exact numbers of patients in both the methods and results sections. RCTs should report a (CONSORT-style [ 3 ]) flowchart documenting exactly what happened to all of the participants in the trial, and all studies should state how and why any patient data were missing or excluded from the analysis. Furthermore, all studies should state the size of the sample used to estimate parameters, as a way of explicitly stating whether all or only partial information was available for inference.

It is important for the credibility of reported research to take all practical steps to remove potential sources of bias from a study. Blinding the assessor to the treatment allocation in an RCT is a simple method of achieving this. Where subjective scores are used, we expect assessments to be blinded or, if blinding is not possible, an explanation to be offered as to why it was not possible or practical.

This survey has highlighted the common use of inefficient or irrelevant statistical methods, with 7 papers reporting clear methodological errors. Not only does this suggest that many of the studies reported in these papers had little or no expert statistical input, it also indicates that many of the papers did not undergo adequate statistical review prior to publication. The lack of a clear link between the description of the statistical methods and the reporting of the outcome (e.g. a p-value) was widespread. Such issues could easily be corrected by obtaining expert statistical review prior to submission. If a reviewer notes a statistical error, or does not understand a statistical analysis plan, they should recommend that an expert opinion be sought during the review process after submission to the journal.

Non-parametric tests were used widely in the studies in this survey in a manner that suggested a lack of understanding as to when they are appropriate. For instance, when selecting between a (parametric) t-test and a (non-parametric) Mann–Whitney test, the latter should only be used for outcomes that are not approximately normally distributed, and should be reported with medians and ranges (or interquartile ranges), not means and standard deviations [ 13 ]. However, where a natural transformation is available to make an outcome ‘more normal’, undertaking the analysis on the transformed scale using normal test statistics is usually a more efficient and powerful option than the non-parametric alternative. The widespread misuse of non-parametric tests in this survey suggests that this issue is not widely appreciated.

Carrying out multiple hypothesis tests was a common practice in many of the papers reviewed, with 30% (21–40%) of the papers reporting over 20 p-values and one study reporting a massive 156. Whilst we accept that the details of statistical methods to correct for multiple comparisons may not be common knowledge in the orthopaedic research community, and the circumstances when it is appropriate to adjust for multiple testing remain contentious amongst both statistical and clinical communities [ 16 ], we would expect most researchers to have some appreciation that carrying out large numbers of hypothesis tests and reporting significance at the standard 5% level is bad practice. We would advise that often the best way of dealing with this issue is to insist that a clear description and justification of all tests of significance that have been performed be included; this process of questioning will generally lead to a marked reduction in the number of tests reported.

Graphs and tables: The presentation of results was a particular weak point in the papers included in this survey. All graphs and tables should be clearly labelled and sufficiently detailed to allow at least some inference to be made in isolation from the rest of the paper. Authors should include a clear title so that readers can quickly understand the information on display without reference to the main body of the text. With the increasing availability of software for producing sophisticated plots, it is tempting for authors to indulge in novel ways to present results. However, this is often one area where clear and simple tends to be best.

Although not formally one of the items in the questionnaire, we noted that 5 of the 27 RCTs (19%; 95% CI 7–39%) tested for differences between baseline characteristics (e.g. age, gender ratio, BMI etc.) after recruitment and randomization of patients to treatment arms of a trial. Since the randomization process produces treatment groups that are random samples from the same population , a null hypothesis of no difference between two populations must, by definition, be true. As such, any significant difference observed between groups must have arisen by chance; i.e. it is a type I error. Despite the widespread appreciation of this argument within the statistics community, this is still a widely reported error in many medical disciplines that, with adequate statistical input during the course of a study and at review, could be avoided.
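
A short simulation (our own illustration) shows why such baseline tests are uninformative: when both arms really are random samples from the same population, roughly 5% of baseline comparisons will be 'significant' at the 5% level purely by chance.

```python
# Simulated type I error rate for baseline tests after randomization.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_sim = 2000
false_positives = 0
for _ in range(n_sim):
    arm_a = rng.normal(65, 8, size=50)   # e.g. ages in arm A
    arm_b = rng.normal(65, 8, size=50)   # arm B, drawn from the same population
    if stats.ttest_ind(arm_a, arm_b).pvalue < 0.05:
        false_positives += 1

print(false_positives / n_sim)   # close to 0.05
```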

The opinions expressed here are the result of independent assessments made by two statisticians using a sample of 100 representative orthopaedic papers and, as such, are limited by the experience and prejudices of the assessors and the size and nature of the survey. However, the carefully designed sampling strategy and the random selection methods ensured that the papers surveyed were indeed representative of the target literature [ 2 ]. Furthermore, the fact that many of the issues highlighted in this paper are familiar topics to those providing statistical reviews of medical research [ 17 ] suggests that the views expressed here are likely to be widely held within this community.

For those who are unfamiliar with good practice in research, others have provided guidance in the use of statistics in the orthopaedic setting [ 1 , 18 ] and also specifically in the design of RCTs [ 19 , 20 ]. More generally, the series of short articles on the use of statistics for medical researchers published in the British Medical Journal [ 21 ] provides a rich resource of information on good statistical practice. Although our focus here has been on research published in general orthopaedic journals, the nature and extent of the issues raised are clearly not exclusive to this discipline, and we expect the issues raised in the discussion and our recommendations for improvement to be applicable across all medical disciplines.

To the non-statistically trained reader, many of the criticisms reported here may seem niggling and unimportant relative to the clinical details of a study. However, it is troubling to report that the statistical assessors in this survey thought that in 17% (10–26%) of the studies the conclusions were not clearly justified by the results. For 39% (30–49%) of studies a different analysis should have been undertaken, and for 17% (10–26%) a different analysis could have made a difference to the conclusions. The results of this survey present challenges for us all, whether statistician, clinician, reviewer or journal editor, and it is only by greater dialogue between us all that these important issues can be addressed.

Petrie A: Statistics in orthopaedic papers. J Bone Joint Surg Br. 2006, 88: 1121-1136. 10.1302/0301-620X.88B9.17896.


Parsons N, Hiskens R, Price CL, Costa ML: A systematic survey of the quality of research reporting in general orthopaedic journals. J Bone Joint Surg Br. 2011, 93: 1154-1159. 10.1302/0301-620X.93B9.27193.

Moher D: CONSORT: an evolving tool to help improve the quality of reports of randomized controlled trials. Consolidated Standards of Reporting Trials. JAMA. 1998, 279: 1489-1491.


von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC: The strengthening the reporting of observational studies in epidemiology (STROBE) statement: Guidelines for reporting observational studies. Lancet. 2007, 370: 1453-1457. 10.1016/S0140-6736(07)61602-X.


Siebelt M, Siebelt T, Pilot P, Bloem RM, Bhandari M, Poolman RW: Citation analysis of orthopaedic literature: 18 major orthopaedic journals compared for Impact Factor and SCImago. BMC Musculoskelet Disord. 2010, 11: 4-10.1186/1471-2474-11-4.


Web of Knowledge. [ http://wok.mimas.ac.uk/ ]

Kilkenny C, Parsons N, Kadyszewski E, Festing MFW, Cuthill IC, Fry D, Hutton J, Altman D: Survey of the quality of experimental design, statistical analysis and reporting of research using animals. PLoS One. 2009, 4: e7824-10.1371/journal.pone.0007824.

Altman DG, Bland JM: Units of analysis. BMJ. 1997, 314: 1874.


Chow S-C, Shao J, Wang H: Sample size calculations in clinical research. 2008, New York: Chapman and Hall

Schlesselman JJ: Sample size requirements in cohort and case–control studies of disease. American J Epidemiol. 1974, 99: 381-384.


Missing data analysis. [ http://missingdata.lshtm.ac.uk/ ]

Oliver D, Hall JC: Usage of statistics in the surgical literature and the 'orphan P' phenomenon. Aust N Z J Surg. 1989, 59: 449-451. 10.1111/j.1445-2197.1989.tb01609.x.

Altman DG, Bland JM: Parametric v non-parametric methods for data analysis. BMJ. 2009, 338: a3167-10.1136/bmj.a3167.

Bland M: An introduction to medical statistics. 2003, Oxford: OUP

Bland JM, Altman DG: Multiple significance tests: the Bonferroni method. BMJ. 1995, 310: 170-10.1136/bmj.310.6973.170.


Perneger TV: What's wrong with Bonferroni adjustments. BMJ. 1998, 316: 1236-10.1136/bmj.316.7139.1236.

Bland M: How to upset the Statistical Referee. [ http://www-users.york.ac.uk/~mb55/talks/upset.htm ]

Petrie A: Statistical power in testing a hypothesis. J Bone Joint Surg Br. 2010, 92: 1192-1194. 10.1302/0301-620X.92B9.25069.

Simunovic N, Devereaux PJ, Bhandari M: Design considerations for randomised trials in orthopaedic fracture surgery. Injury. 2008, 39: 696-704. 10.1016/j.injury.2008.02.012.

Soucacos PN, Johnson EO, Babis G: Randomised controlled trials in orthopaedic surgery and traumatology: overview of parameters and pitfalls. Injury. 2008, 39: 636-642. 10.1016/j.injury.2008.02.011.

BMJ Statistics Notes Series. [ http://openwetware.org/wiki/BMJ_Statistics_Notes_series ]

Pre-publication history

The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2288/12/60/prepub


Author information

Authors and Affiliations

Warwick Medical School, University of Warwick, Coventry, CV2 2DX, UK

Nick R Parsons, Juul Achten & Matthew L Costa

Public Health, Epidemiology and Biostatistics Group, University of Birmingham, Birmingham, B15 2TT, UK

Charlotte L Price

University Hospitals Coventry and Warwickshire NHS Trust, Coventry, UK

Richard Hiskens


Corresponding author

Correspondence to Nick R Parsons .

Additional information

Competing interests.

The authors declare that they have no competing interests.

Authors’ contributions

NP, JA, MC, RH and CP designed the study and extracted the data. RH identified the papers for inclusion in the study. NP and CP reviewed the papers, extracted the data, conducted the analyses and created the first draft of the manuscript. All authors participated in editing the manuscript and approved the final manuscript for publication.

Electronic supplementary material

Additional file 1: Statistical questionnaire. (DOC)

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article.

Parsons, N.R., Price, C.L., Hiskens, R. et al. An evaluation of the quality of statistical design and analysis of published medical research: results from a systematic survey of general orthopaedic journals. BMC Med Res Methodol 12 , 60 (2012). https://doi.org/10.1186/1471-2288-12-60


Received : 01 November 2011

Accepted : 16 April 2012

Published : 25 April 2012

DOI : https://doi.org/10.1186/1471-2288-12-60



What Is Statistical Analysis? Types, Methods and Examples

What Is Statistical Analysis?


Statistical analysis is the process of collecting and analyzing data in order to discern patterns and trends. It is a method for removing bias from the evaluation of data by employing numerical analysis. The technique is useful for interpreting research results, developing statistical models, and planning surveys and studies.

Statistical analysis is a scientific tool in AI and ML that helps collect and analyze large amounts of data to identify common patterns and trends to convert them into meaningful information. In simple words, statistical analysis is a data analysis tool that helps draw meaningful conclusions from raw and unstructured data. 

Conclusions drawn using statistical analysis facilitate decision-making and help businesses make predictions about the future on the basis of past trends. Statistical analysis can be defined as the science of collecting and analyzing data to identify trends and patterns and present them in a useful form. It involves working with numbers and is used by businesses and other institutions to derive meaningful information from data.


Types of Statistical Analysis

Given below are the 6 types of statistical analysis:

Descriptive Analysis

Descriptive statistical analysis involves collecting, interpreting, analyzing, and summarizing data to present them in the form of charts, graphs, and tables. Rather than drawing conclusions, it simply makes the complex data easy to read and understand.

Inferential Analysis

The inferential statistical analysis focuses on drawing meaningful conclusions on the basis of the data analyzed. It studies the relationship between different variables or makes predictions for the whole population.

Predictive Analysis

Predictive statistical analysis is a type of statistical analysis that analyzes data to identify past trends and predict future events on the basis of them. It uses machine learning algorithms, data mining, data modelling, and artificial intelligence to conduct the statistical analysis of data.

Prescriptive Analysis

The prescriptive analysis conducts the analysis of data and prescribes the best course of action based on the results. It is a type of statistical analysis that helps you make an informed decision. 

Exploratory Data Analysis

Exploratory analysis is similar to inferential analysis, but the difference is that it involves exploring the unknown data associations. It analyzes the potential relationships within the data. 

Causal Analysis

The causal statistical analysis focuses on determining the cause and effect relationship between different variables within the raw data. In simple words, it determines why something happens and its effect on other variables. This methodology can be used by businesses to determine the reason for failure. 

Importance of Statistical Analysis

Statistical analysis eliminates unnecessary information and catalogs important data in an uncomplicated manner, greatly simplifying the work of organizing inputs. Once the data have been collected, statistical analysis may be utilized for a variety of purposes.


Benefits of Statistical Analysis

Statistical analysis offers many benefits to both individuals and organizations, and is well worth investing in.

Statistical Analysis Process

A statistical analysis typically follows a sequence of well-defined steps, from planning the study and collecting the data through to interpreting and reporting the results.

Statistical Analysis Methods

Although there are various methods used to perform data analysis, given below are the 5 most used and popular methods of statistical analysis:

Mean

The mean, or average, is one of the most popular methods of statistical analysis. The mean indicates the overall trend of the data and is very simple to calculate: sum the numbers in the data set and divide by the number of data points. Despite its ease of calculation and its benefits, it is not advisable to rely on the mean as the only statistical indicator, as doing so can result in inaccurate decision-making.

Standard Deviation

Standard deviation is another very widely used statistical tool or method. It measures how far individual data points deviate from the mean of the data set, that is, how the data are spread around the mean. You can use it to decide whether research outcomes can be generalized or not.

Regression

Regression is a statistical tool that helps determine the cause-and-effect relationship between variables. It models the relationship between a dependent variable and one or more independent variables, and is generally used to predict future trends and events.

Hypothesis Testing

Hypothesis testing is used to test the validity of a claim or conclusion against a data set. The hypothesis is an assumption made at the beginning of the research and may be supported or rejected based on the analysis results.

Sample Size Determination

Sample size determination or data sampling is a technique used to derive a sample from the entire population, which is representative of the population. This method is used when the size of the population is very large. You can choose from among the various data sampling techniques such as snowball sampling, convenience sampling, and random sampling. 

Statistical Analysis Software

Not everyone can perform very complex statistical calculations with accuracy, which can make statistical analysis a time-consuming and costly process. Statistical software has therefore become a very important tool for companies performing data analysis. Such software uses artificial intelligence and machine learning to perform complex calculations, identify trends and patterns, and create charts, graphs, and tables accurately within minutes.

Statistical Analysis Examples

Look at the standard deviation sample calculation given below to understand more about statistical analysis.

The measurements of 5 pizza bases (in cm) are as follows: 9, 2, 5, 4, 12

Calculation of Mean = (9+2+5+4+12)/5 = 32/5 = 6.4

Calculation of mean of squared mean deviation = (6.76+19.36+1.96+5.76+31.36)/5 = 13.04

Variance = 13.04

Standard deviation = √13.04 = 3.611
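
The worked example can be reproduced in a few lines of Python; note that the calculation above divides by n, which is the convention used by statistics.pstdev (the population standard deviation).

```python
# Reproducing the pizza-base example above.
import statistics

values = [9, 2, 5, 4, 12]
mean = sum(values) / len(values)                                  # 6.4
variance = sum((x - mean) ** 2 for x in values) / len(values)     # 13.04
print(mean, variance, variance ** 0.5)                            # 6.4 13.04 3.611...
print(statistics.pstdev(values))                                  # same result
```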

Career in Statistical Analysis

A Statistical Analyst's career path is determined by the industry in which they work. Anyone interested in becoming a Data Analyst may usually enter the profession and qualify for entry-level Data Analyst positions right out of high school or a certificate program — potentially with a Bachelor's degree in statistics, computer science, or mathematics. Some people go into data analysis from a similar sector such as business, economics, or even the social sciences, usually by updating their skills mid-career with a statistical analytics course.

Working as a Statistical Analyst is also a great way to get started in the normally more complex area of data science. A Data Scientist is generally a more senior role than a Data Analyst, since it is more strategic in nature and requires a more highly developed set of technical abilities, such as knowledge of multiple statistical tools, programming languages, and predictive analytics models.

Aspiring Data Scientists and Statistical Analysts generally begin their careers by learning a programming language such as R or SQL. Following that, they must learn how to create databases, do basic analysis, and make visuals using applications such as Tableau. However, not every Statistical Analyst will need to know how to do all of these things, but if you want to advance in your profession, you should be able to do them all.

Based on your industry and the sort of work you do, you may opt to study Python or R, become an expert at data cleaning, or focus on developing complicated statistical models.

You could also learn a little bit of everything, which might help you take on a leadership role and advance to the position of Senior Data Analyst. A Senior Statistical Analyst with vast and deep knowledge might take on a leadership role leading a team of other Statistical Analysts. Statistical Analysts with extra skill training may be able to advance to Data Scientists or other more senior data analytics positions.

Become Proficient in Statistics Today

Hope this article assisted you in understanding the importance of statistical analysis in every sphere of life. Artificial Intelligence (AI) can help you perform statistical analysis and data analysis very effectively and efficiently. 

If you are a science wizard and fascinated by the role of AI in statistical analysis, check out this amazing Caltech Post Graduate Program in AI & ML course in collaboration with Caltech. With a comprehensive syllabus and real-life projects, this course is one of the most popular courses and will help you with all that you need to know about Artificial Intelligence. 


About the author.

Simplilearn

Simplilearn is one of the world’s leading providers of online training for Digital Marketing, Cloud Computing, Project Management, Data Science, IT, Software Development, and many other emerging technologies.



Statistics Research Paper Writing Guide + Examples


A statistics research paper discusses and analyzes numerical data. It should cover all aspects of the data, including its distribution, frequency tables, and graphs.

A statistics research paper is similar to a survey research paper in many ways. Both focus on collecting information about a specific topic using surveys, and both use statistical methods to collect, analyze, and present this information.

To explain how they differ, consider two types of statistics: means and relationships. It may be easier to understand this by first explaining how they are related: the mean is (most often) calculated through addition, while relationships are typically found through multiplication. This also explains why you calculate means at the beginning of a statistics project (before any relationship has been discovered), while calculating relationships typically falls to the end of a statistics project. The next time you're in class, try to count how many times your instructor mentions "mean" as opposed to "relationship"!

A statistics paper is based on a relationship between two or more variables (often referred to as independent and dependent variables). You can think of these variables almost like social security numbers: each person has one SSN that distinguishes them from all other people; similarly, each data point (a unique combination of values for an independent variable) has its own set of values for one or more variables.

An example: if we are interested in whether there is a relationship between hospital beds per 1,000 residents and the reading scores of senior citizens (defined as ages 65 and over) in a given city or country, then we must gather data on both variables. We will need to find out how many hospital beds there are per 1,000 residents for each of the cities or countries represented in our sample.

This type of research paper writing can also be based only on means (instead of relationships). Consider two more examples: If a psychologist wants to see if age is related to memory loss, but doesn’t care whether this relationship is positive or negative (i.e., she simply wants to know if older people tend to have better memories than younger people), all she cares about are the mean memory scores for groups defined by age.

On the other hand, if the psychologist wants to know whether older people tend to have better memories than younger people, and she also cares about how large this difference is (e.g., whether their mean score is 10 points higher), then her research project will include calculations of relationships between age and memory scores.

How to write a statistics research paper

Getting started on your research paper is a difficult task. You could simply hop online and search for information on how to write one, but what are the necessary steps?

You can learn them by following this short guide:

Start by proper research:

Write an introduction, transition into your thesis statement, and finish off with a conclusion of sorts. This structure will not only help you build a strong foundation, but also give the reader clues about where you are going, particularly when writing a statistics essay.

The introduction should be broad enough to capture attention of the reader and yet narrow enough to indicate that it’s about something specific.

It can also serve as a springboard for later argumentation or present the central idea.

What makes a good research paper introduction, however, is a little bit of mystery or enigma that makes the reader think, "I want to know more about this; why do they think so?"

Finding journal articles on statistics research paper

The next step is to locate journals and magazines. It's best if you have some knowledge about what kind of scholarly work this will entail: statistics papers are not often found in tabloids but rather in peer-reviewed sources.

If you're struggling to compile a comprehensive list of online sources, try asking your school's librarian for help.

There are also other options, such as using an article database that contains articles from all over the world sorted by category; perhaps there is something useful there, and it doesn't even have to be a journal article.

There are also books out there that have statistics papers in them – in case you want to go a little bit old school.

Writing body paragraphs

The next step is writing your body or main part of the statistics research paper, so it makes sense first to decide on what kind of statistical test you’ll use and then look up relevant information about it.

Use this information as the building blocks for paragraphs going into details: why did the statistician choose it?

What are some common criticisms/counterpoints?

These points should be supported by specific examples and further justification of why they're relevant. The next section will deal with interpreting numerical results and drawing conclusions; this involves taking those numbers and making something meaningful out of them beyond just comparing them to each other. It can be a lot to take in, so it's best to break this down and look up specific information for the different parts.

Write your paper

Now that you have collected all the necessary material, writing the statistics research paper shouldn't be hard. Remember not to simply copy and paste information from somewhere else without citing your sources, and make sure your own work goes beyond what you have sourced.

The final step will involve polishing and proofreading your work – make sure there are no mistakes when submitting or publishing online.

You should always use correct grammar, spelling and punctuation as well as consistent referencing/citation style (MLA, APA etc.).

And there you have it: statistics research paper.

Write perfect conclusion

The next step is writing your conclusions. You’ve already done a lot of the work for this during the body part so it shouldn’t be anything out of the ordinary.

Make sure you reiterate what you said in the introduction, and add commentary on particular findings that could serve as relevant examples for future research or topics to investigate further.

One thing which is especially important here is presenting results/data clearly and making them easy to understand even for people who aren’t statisticians themselves. This can help greatly with potential critics.

A statistics research paper, then, should include a summary of the methods used to gather and analyze your data (usually presented in sections 2 & 3), followed by your findings (usually presented in sections 4 – 7). All of this material must be contained within a single document.

Your paper is organized differently from other types of research reports. The most common order has been presented here so as to make it easier for you to adapt these sections back into your own planning:

Note: The examples provided above are not intended to represent all possible formats that could be used but rather they provide information about what most researchers tend to do.

Also note: Depending on your course or assignment, you might be required to use certain formatting styles or these requirements may vary from one class or professor to another. But it’s always important to know how and when to cite (reference) sources within your paper.

The last few sections are optional, depending on the format guidelines established by the instructor for your assignment.

Good luck with this paper!

Statistics research paper outline template

The following is the general format and structure of a statistics research paper: introduction/background, problem statement, objectives, materials & methods, results and discussion sections. Use this table of contents as an outline when you are beginning your research.


As you begin each portion make sure to refer back to this outline.

The above is an idealized outline for a statistics research paper. Each of the outline items above is discussed below in depth.

Statistics research writing tips:

Make sure to explain your purpose for the study and also give some background information on the problem. This background information should be used as a way of showing why your statistics research paper is important and significant to your field.  Here are some ways to say it without coming off as too boring or unprofessional. They’re quite general but good enough:

State what your hypothesis was; how you came up with it and any problems you faced trying to prove it.

This is how simple it is to write a great statistics term paper or research paper. If you get stuck, you can ask for research paper writing help from expert tutors.

Statistics Research Topics

Wondering what to write a research paper on statistics and probability about?

Statistics is a branch of mathematics dealing with the collection, analysis and interpretation or data.

Statistics is used in many fields, including natural science, social sciences, business and engineering.

A statistician collects, computes and analyzes numerical data to summarize information.

If you are looking for a statistics research paper topic ideas you’ve come to the right place!


Check out this list below of major research paper topics in statistics and probability for college students:

Statistics Research Topics – Probability

Description: Probability deals with events that have uncertain outcomes. It involves mathematical calculations using random variables and related quantities such as probability density functions (pdf), probability distributions, expected values or moments E(X), and variance V(X). In other words, a probability distribution summarizes all possible outcomes in terms of probabilities based on theoretical assumptions or data collection.

A specific probability distribution can be approximated from the frequencies of events observed over the long run.

A related result, the Central Limit Theorem, says that if we take averages of independent random variables over very large numbers of samples, the distribution of those averages approaches a normal (i.e. bell-shaped) distribution regardless of the shape or other details of the underlying distribution.
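
The Central Limit Theorem is easy to see in a quick simulation (our own sketch, with invented parameters): averages of samples drawn from a strongly skewed distribution are themselves approximately normally distributed.

```python
# Sample means from a skewed (exponential) distribution become approximately normal.
import numpy as np

rng = np.random.default_rng(0)
draws = rng.exponential(scale=2.0, size=(10_000, 50))   # 10,000 samples of size 50
sample_means = draws.mean(axis=1)

print(sample_means.mean())   # close to the true mean, 2.0
print(sample_means.std())    # close to 2.0 / sqrt(50), about 0.28
# A histogram of sample_means would look roughly bell-shaped even though the
# underlying exponential distribution is strongly skewed.
```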

Conditional probability is another key idea: it measures how likely it is that one event A happens given that another event B has happened first.

Statistics Research Topics – Descriptive statistics

Description: descriptive statistics collects and interprets numerical data in terms of distributions, graphs, measures and relationships among variables.

For example, the mean, median and mode are measures of central tendency.

On the other hand, standard deviation measures dispersion. This is a statistical technique used to summarize data in terms of its most important features.

Descriptive statistics is also necessary for analyzing real-life situations.

It provides information that’s useful for making business decisions.

Statistics Research Topics – Testing significance of relationships (correlation)

Description: Correlation deals with measuring the strength, direction and stability of the relationship between two or more variables.

A positive correlation indicates that as the value of one variable increases, so does the value of the other variable.

For example, if employees in a call center perform better when they are seated close to their supervisors, this is an example of positive correlation: as one variable (proximity to supervisors) increases, the other variable (performance) also increases.

On the other hand, two variables are negatively correlated when, as one variable increases, the other decreases.

For example, if some countries have a high GDP per capita but a low population growth rate, this is an example of negative correlation: as income rises, population growth falls.
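
As a quick sketch, Pearson's correlation coefficient quantifies the strength and direction of such a relationship; the numbers below are invented purely for illustration.

```python
# Strength and direction of a linear relationship; illustrative data only.
from scipy import stats

income = [20, 35, 50, 65, 80]          # e.g. GDP per capita (thousands)
growth = [2.9, 2.1, 1.6, 1.0, 0.4]     # population growth rate (%)

r, p = stats.pearsonr(income, growth)
print(r, p)   # r close to -1 indicates a strong negative correlation
```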

Statistics – Research Paper – Sampling

Description: Sampling deals with determining an appropriate sample size for a study based on specific requirements.

For example, you might choose five people out of hundreds in order to conduct a survey or research study.

The main idea behind sampling is to reduce the cost and effort of data collection while minimizing the loss of information.

Sampling is also used to make inferences about a population, or to study it indirectly through a representative sample (a group of people) that is expected to be close enough to the whole.

Statistics Research Topics – Hypothesis testing

Description: Hypothesis testing deals with conducting statistical tests that determine whether or not the data provided support a certain claim or statistically significant conclusion.

For example, you might want to test whether the data show that girls earn higher grades than boys in math classes.

This would be done using a formal statistical procedure called the t-test. By calculating a summary statistic for each group and comparing the two, we can see whether there is a difference between their means.

If it turns out that one group's average is significantly different from the other's, the result is considered statistically significant.
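
A minimal sketch of such a two-sample t-test is shown below; the grades are invented for illustration.

```python
# Two-sample t-test comparing mean grades of two groups; illustrative data only.
from scipy import stats

girls = [78, 85, 90, 72, 88, 95, 81]
boys = [70, 75, 88, 65, 80, 72, 77]

t_stat, p_value = stats.ttest_ind(girls, boys)
print(t_stat, p_value)
# If p_value is below the chosen significance level (say 0.05), the difference
# in mean grades is considered statistically significant.
```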

Statistics – Research Paper – Analysis of variance (ANOVA)

Description: Analysis of variance (ANOVA) deals with comparing means across three or more groups to determine whether they are similar within a certain margin.

It can also be used for determining whether an overall mean is different from another overall mean across several groups.

ANOVA helps determine if there are significant differences in values that would affect results and conclusions drawn from two or more related populations.

For example, suppose you want to know whether there is any difference among four brands of your favorite soft drink. You could conduct an experiment by taking 10 people who all like this kind of drink, having them taste and rate each of the brands, and then comparing the mean ratings across brands.
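
A minimal one-way ANOVA sketch for this kind of comparison is shown below; the ratings are invented for illustration.

```python
# One-way ANOVA comparing mean ratings of four brands; illustrative data only.
from scipy import stats

brand_1 = [7, 8, 6, 9, 7, 8, 7, 8, 6, 9]
brand_2 = [6, 7, 6, 5, 7, 6, 7, 6, 5, 6]
brand_3 = [8, 9, 9, 8, 7, 9, 8, 9, 8, 8]
brand_4 = [6, 6, 7, 7, 6, 5, 6, 7, 6, 6]

f_stat, p_value = stats.f_oneway(brand_1, brand_2, brand_3, brand_4)
print(f_stat, p_value)   # a small p-value suggests at least one mean differs
```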

Statistics Research Topics – Confidence intervals

Description: Confidence intervals are used in statistical studies involving hypothesis testing, which involves making observations about a population based on the data collected.

A confidence interval is essentially a range of values meant to indicate the variability around an estimated parameter for a group or population.

It shows how much uncertainty exists in the estimate of a single population parameter.

The interval is constructed by adding and subtracting a margin of error from the original estimate.
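
The sketch below builds a 95% confidence interval for a mean in exactly this way, as the estimate plus and minus a margin of error based on the t distribution; the measurements are invented.

```python
# 95% confidence interval for a mean; illustrative data only.
import numpy as np
from scipy import stats

sample = np.array([12.1, 11.8, 12.6, 12.3, 11.9, 12.4, 12.0, 12.2])
mean = sample.mean()
sem = stats.sem(sample)                              # standard error of the mean
margin = stats.t.ppf(0.975, df=len(sample) - 1) * sem

print(mean - margin, mean + margin)                  # lower and upper 95% limits
```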

Statistics – Research Topics – Non-parametric tests

Description: Non-parametric tests are used in statistical studies to make comparisons between two or more samples when the data do not meet the assumptions of parametric tests (for example ordinal data or strongly skewed measurements).

This is usually done by converting scores (e.g., number grades or percentages) into ranks so that you can compare the samples more easily.

Examples of these tests include the Wilcoxon rank-sum test and the sign test.
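
The sketch below shows the two most common rank-based comparisons; the scores are invented for illustration.

```python
# Rank-based (non-parametric) comparisons; illustrative data only.
from scipy import stats

group_a = [3, 5, 4, 2, 6, 5, 4]    # e.g. ordinal pain scores, group A
group_b = [6, 7, 5, 7, 8, 6, 7]    # group B

# Two independent samples: Mann-Whitney / Wilcoxon rank-sum test
print(stats.mannwhitneyu(group_a, group_b, alternative="two-sided"))

# Paired measurements: Wilcoxon signed-rank test
before = [3, 5, 4, 2, 6, 5, 4]
after = [2, 4, 3, 1, 5, 3, 3]
print(stats.wilcoxon(before, after))
```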

If you want to find out how statistics can help your business, especially in terms of improving operational efficiency and productivity, then contact us today! We will be glad to guide you through this step-by-step process!

Join us today and we will help you develop a strategic plan and help you write a statistics paper fast.

In academia, for example, research papers are useful to gain knowledge, learning from other people’s mistakes: A good statistics research paper can add value to your coursework if it has been written by a professional essay writer who understands both the subject of the paper and how it is relevant to all fields of study.

Professional Statistics Research Paper Writers

Tutlance is a hub for the best statistics research paper writers from an array of academic disciplines and backgrounds. Our specialists work with you to understand your needs, then write a top-class statistics paper that fits the bill and exceeds your expectations in every way. You can be sure that our statistics research paper writers will deliver high-quality work that is 100% plagiarism free!

Why Choose Us?

Tutlance.com remains the best homework writing service to pay for a statistics research paper because of our outstanding writers and deliverables:

Other guides:

Statistics Research Paper Writing Help from Our Statistics Essay Experts

Get help writing any statistics paper from professional online statistics tutors . We make statistics writing and research easier. You can save time by ordering a paper written for you from our statistics tutors.

Statistics writing services are available for all college students who wish to engage in the study of data using numerical methods to analyze aspects such as distribution, central tendency, dispersion, and relationships between two or more sets of samples. Many people think that statistical analysis implies only numerical procedures, but it is much more than that. The practice has many applications in varied fields such as Customer Relationship Management (CRM), Business Intelligence (BI), Marketing and Sales Analysis, Data Mining, Quality Control (QC), Operations Research (OR), Finance and Economics. Statistics have been used in every part of human life for ages, because humans are curious about the environment around them.

How can Tutlance help me get my statistics essay done?

We understand how demanding writing a statistics essay can be given that it involves working with complex statistical data. Before we send your order to our writers, we will check for accuracy and precision. We provide professional statistics essay writing help to students in all academic levels: high school, college and university. Our experts can also write a research proposal or any other paper that involves interpreting data and drawing conclusions based on scientific research methods.

Can you help me come up with a good statistics research paper template?

Yes we can help you create a good statistics research paper template sample! Statistics research paper sample is a good guide to follow while writing your own term paper. It can be an essay, dissertation or thesis. Our writers provide free examples of statistics papers that you can use as a starting point when writing your own work.

We have plenty of experts with experience in various fields such as sociology, anthropology, economics and many more. Just place an order online or contact us by phone at any time. We will assign the most suitable expert to write your statistics research paper for money .

Can you help me write a business statistics research paper?

Yes we can help you write a business statistics research paper . We have professional writers who will prepare a creative and impactful paper based on your instructions. Our experts specialize in writing dissertations, term papers, essays and other academic papers for students across the globe. For college students busy with their final semester exams, this is the best time to get some assistance from our experts .

If you are struggling to do your statistics essay or research paper because of lack of topic ideas or need any kind of guidance regarding how to go about your work , take advantage of our services today by getting in touch with us at Tutlance.com . We’d be more than glad to assist you with any sort of statistical issue you might be experiencing in your academic life. Our company is renowned for providing high quality business statistics research paper writing assistance to all our college students at a very affordable price.

Read more about business statistics assignment help and psychology statistics homework help.

Can you provide an example of a probability and statistics research paper?

Yes, we can provide an example of a probability and statistics research paper. Such examples are useful for students who find it difficult to write their own term papers or essays. We will not only provide a sample essay but also supply the instructions needed to carry out the academic writing process.

We can also provide examples of research papers in other areas of probability and statistics.

We can provide sample statistics research papers on any topic. Contact our statistics paper writers for cheap probability theory homework help .

Can Tutlance actually do my statistics homework?

Absolutely! Tutlance employs only qualified and experienced statisticians who are highly proficient in statistical techniques such as probability sampling, regression analysis, and the chi-square test. They can interpret complex results, including measures such as the mean, median, and mode. We have access to different online channels through which we can get your homework done quickly and effectively.

Read more on our resource pages: how can I hire someone to do my statistics homework, and can you help me with my math.

Contact us today or fill out this no-obligation form now: ask a question online and get help with your statistics project or research paper.


Enago Academy

Effective Use of Statistics in Research – Methods and Tools for Data Analysis


Remember that sinking feeling you get when you are asked to analyze your data? Now that you have all the required raw data, you need to test your hypothesis statistically. Presenting your numerical data well will also help break the stereotype of the biology student who can't do math.

Statistical methods are essential for scientific research: they run through every stage of a study, from planning and design to data collection, analysis, interpretation, and reporting of findings. The results of a research project remain meaningless raw data unless they are analyzed with statistical tools, so a sound statistical approach is necessary to justify research findings. In this article, we discuss how statistical methods can help draw meaningful conclusions from biological studies.


Role of Statistics in Biological Research

Statistics is a branch of science that deals with the collection, organization, and analysis of data, and with drawing inferences from a sample about the whole population. It helps in designing a study more meticulously and provides logical grounds for concluding a hypothesis. Biology focuses on living organisms and their complex, dynamic pathways, which are difficult to explain by intuition alone; statistics defines and explains the patterns observed in a study based on the samples used. In short, statistics reveals the trends in the data a study has collected.

Biological researchers often disregard statistics during research planning and only turn to statistical tools at the end of the experiment, which produces complicated results that are difficult to analyze. Instead, statistics can help a researcher approach the study in a stepwise manner:

1. Establishing a Sample Size

A biological experiment usually starts with choosing samples and deciding how many replicates to run. Basic statistical principles, such as randomization and the law of large numbers, explain how drawing an adequately sized sample at random from a large pool lets you extrapolate your findings while reducing experimental bias and error.
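For illustration only, here is a minimal Python sketch of such a sample-size calculation using the statsmodels library; the effect size, significance level, and target power are assumed values chosen for the example, not figures from this article.

```python
# Hypothetical example: how many subjects per group are needed to detect a
# medium standardized effect (Cohen's d = 0.5) with a two-sample t-test?
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(
    effect_size=0.5,  # assumed standardized difference between groups
    alpha=0.05,       # acceptable false-positive rate
    power=0.8,        # desired chance of detecting a true effect
)
print(f"Subjects needed per group: {n_per_group:.0f}")  # roughly 64
```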

2. Testing of Hypothesis

When working with a large sample pool, biological researchers must make sure that their conclusions are statistically significant. To achieve this, a researcher states a hypothesis before examining the distribution of the data. Statistics then helps interpret whether the data cluster near the mean or spread widely across the distribution; these trends characterize the sample and indicate whether the hypothesis is supported.

3. Data Interpretation Through Analysis

With large datasets, statistics supports the analysis itself, helping researchers draw sound conclusions from their experiments and observations. Concluding a study manually or from visual inspection alone may give erroneous results; a thorough statistical analysis accounts for the relevant measures and the variance in the sample to provide a detailed interpretation of the data, and so yields solid evidence to support the conclusion.

Types of Statistical Research Methods That Aid in Data Analysis


Statistical analysis is the process of examining sample data for patterns or trends that help researchers anticipate situations and draw appropriate conclusions. Depending on the question and the data, statistical analyses fall into the following types:

1. Descriptive Analysis

Descriptive analysis organizes and summarizes large datasets into graphs and tables. It involves processes such as tabulation, measures of central tendency, measures of dispersion or variance, and measures of skewness.
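As a rough sketch (not part of the original article), the same descriptive summaries can be produced in a few lines of Python with pandas; the numbers below are made up purely for illustration.

```python
import pandas as pd

# Made-up measurements from a hypothetical experiment
data = pd.Series([4.1, 5.0, 4.7, 5.3, 4.9, 6.2, 4.4, 5.1])

print(data.describe())            # count, mean, std, min, quartiles, max
print("median:", data.median())   # central tendency
print("variance:", data.var())    # dispersion
print("skewness:", data.skew())   # asymmetry of the distribution
```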

2. Inferential Analysis

Inferential analysis extrapolates from a small sample to the complete population, helping researchers draw conclusions and make decisions about the whole population on the basis of sample data. It is the method of choice for projects that work with a smaller sample but aim to generalize their conclusions to a large population.
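A minimal sketch of one common inferential step, estimating a population mean from a small sample with a 95% confidence interval, is shown below; the sample values are invented for the example and SciPy is assumed to be available.

```python
import numpy as np
from scipy import stats

# Hypothetical sample drawn from a much larger population
sample = np.array([4.1, 5.0, 4.7, 5.3, 4.9, 6.2, 4.4, 5.1])

mean = sample.mean()
sem = stats.sem(sample)                          # standard error of the mean
t_crit = stats.t.ppf(0.975, df=len(sample) - 1)  # two-sided 95% critical value
low, high = mean - t_crit * sem, mean + t_crit * sem
print(f"95% CI for the population mean: ({low:.2f}, {high:.2f})")
```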

3. Predictive Analysis

Predictive analysis uses historical data to forecast future events. It is widely applied by marketing companies, insurance organizations, online service providers, data-driven marketers, and financial corporations.
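A very small sketch of the idea, assuming a simple linear trend and made-up monthly figures, is given below; real predictive work would involve far more careful model selection and validation.

```python
import numpy as np

# Hypothetical monthly sales used only to illustrate a one-step-ahead forecast
months = np.array([1, 2, 3, 4, 5, 6])
sales = np.array([102, 110, 118, 123, 131, 140])

slope, intercept = np.polyfit(months, sales, deg=1)  # least-squares line
next_month = 7
print(f"Forecast for month {next_month}: {slope * next_month + intercept:.1f}")
```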

4. Prescriptive Analysis

Prescriptive analysis examines data to determine what should be done next. It is widely used in business analysis to find the best possible course of action for a situation. It is closely related to descriptive and predictive analysis, but goes further by recommending the most appropriate option among the available choices.

5. Exploratory Data Analysis

Exploratory data analysis (EDA) is generally the first step of the data analysis process, conducted before any other statistical technique is applied. It focuses on examining patterns in the data to recognize potential relationships, discover unknown associations, inspect missing values, and extract as much insight as possible.
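The short Python sketch below illustrates a typical first pass at EDA on an invented data frame: checking for missing values, summarizing each variable, and looking at pairwise correlations as hints of possible relationships.

```python
import pandas as pd

# Made-up data frame standing in for a freshly collected dataset
df = pd.DataFrame({
    "age":    [23, 31, 45, None, 52, 38],
    "income": [32000, 41000, 58000, 47000, None, 52000],
    "score":  [67, 72, 80, 75, 88, 79],
})

print(df.isna().sum())   # where data are missing, per column
print(df.describe())     # quick numerical summary of each variable
print(df.corr())         # pairwise correlations worth exploring further
```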

6. Causal Analysis

Causal analysis helps determine why things happen the way they do. It is used to identify the root cause of a failure, or simply the underlying reason why something occurs; for example, to understand what will happen to one variable if another variable changes.

7. Mechanistic Analysis

This is the least common type of statistical analysis, used in big data analytics and the biological sciences. It seeks to understand exactly how changes in one variable cause corresponding changes in another, while excluding external influences.

Important Statistical Tools In Research

Researchers in the biological field often find statistical analysis the most intimidating part of completing a study. The right statistical tools, however, can help them understand what to do with the data and how to interpret the results, making the process as easy as possible.

1. Statistical Package for Social Science (SPSS)

It is a widely used software package for human behavior research. SPSS can compile descriptive statistics as well as graphical depictions of results. It also includes the option to create scripts that automate analyses or carry out more advanced statistical processing.

2. R Foundation for Statistical Computing

This software package is used in human behavior research and many other fields. R is a powerful tool with a steep learning curve, as it requires a certain amount of coding, but it comes with an active community that continually builds and enhances the software and its associated packages.

3. MATLAB (The Mathworks)

It is an analytical platform and programming language that researchers and engineers use to write their own code to answer their research questions. While MATLAB can be difficult for novices, it offers great flexibility in terms of what the researcher needs.

4. Microsoft Excel

Although not the best solution for rigorous statistical analysis, MS Excel offers a wide variety of tools for data visualization and simple statistics. It makes it easy to generate summaries and customizable graphs and figures, and it is the most accessible option for those starting out with statistics.

5. Statistical Analysis Software (SAS)

It is a statistical platform used in business, healthcare, and human behavior research alike. It can carry out advanced analyses and produce publication-quality figures, tables, and charts.

6. GraphPad Prism

It is premium software used primarily by biology researchers, though it is versatile enough for many other fields. Like SPSS, GraphPad offers scripting options to automate analyses and carry out complex statistical calculations.

7. Minitab

This software offers basic as well as advanced statistical tools for data analysis. Like GraphPad and SPSS, Minitab supports scripted, automated analyses, which calls for some command of its syntax.

Use of Statistical Tools In Research and Data Analysis

Statistical tools help manage large volumes of data. Many biological studies generate large datasets to analyze trends and patterns, so statistical tools become essential: they handle the large datasets and make data processing far more convenient.

Approaching a study in the stepwise manner described above helps biological researchers present their statistics in detail, develop accurate hypotheses, and choose the correct tools for testing them.

A range of statistical tools can help researchers manage their data and improve the outcome of their research through better interpretation. Which tool to use depends on the research question, your knowledge of statistics, and your experience with coding.

Have you faced challenges while using statistics in research? How did you manage it? Did you use any of the statistical tools to help you with your research data? Do write to us or comment below!



Quvae



How to write data analysis in a research paper?

Data from quantitative research can be analyzed using statistical methods. From sample data, predictions about a population can be tested with probabilities and models. The statistical analysis of quantitative data involves examining trends, patterns, and relationships. Statistical tools are widely used by researchers, scientists, academics, government agencies, businesses, and many other organizations. Data collected through an experiment or a probability sampling method can be subjected to statistical tests. The sample size must be large enough to accurately reflect the true distribution of the population being studied. When deciding which statistical test to use, you must be aware of whether the data are consistent with certain assumptions and which variables are involved.

The statistical analysis needs to be carefully planned at the very beginning of the research process to draw valid conclusions. A researcher must specify their hypotheses and determine the design of the study, the sample size, and the sampling procedure. Data collected from the sample can be organized and summarized using descriptive statistics. Using inferential statistics, it is possible to formally test hypotheses and make estimates about the population. Findings can then be interpreted and generalized. A descriptive statistic summarizes the characteristics of a set of data, while inferential statistics are used to test hypotheses and evaluate whether the findings can be generalized to a broader population.


Step 1: Plan your hypothesis and research design

Specifying your hypotheses and designing your research will help you collect valid data for statistical analysis.

Analyzing statistical hypotheses

Researchers often examine the relationship between variables within a population. A prediction is the starting point, and statistical analysis lets you test that prediction. A statistical hypothesis is a formal way of stating a prediction about a population, and the null and alternative hypotheses for each research prediction can be tested against the sample data. The null hypothesis always predicts no relationship between the variables, while the alternative hypothesis describes the relationship you predict from your research.

How to plan your research design

The research design is your method for gathering and analyzing data; it determines which statistical tests you can use to test your hypothesis. Decide whether you are taking an experimental, descriptive, or correlational approach. Descriptive and correlational studies do not directly manipulate variables but only measure them, whereas experimental studies influence variables directly. A correlational design investigates relationships between variables, while a descriptive design uses statistical tests to draw inferences from sample data about the characteristics of a population or phenomenon.

Measuring variables

Plan your research design by operationalizing your variables and deciding how they will be measured. The level of measurement of your variables is an important consideration when conducting statistical analysis: categorical data consist of groupings, while quantitative data represent amounts. Identifying the level of measurement is the basis for choosing the right statistical tests and hypothesis-testing procedures.

Step 2: Obtain data from a representative sample

If you use an appropriate sampling procedure, you can extend your conclusions beyond your sample. In probability sampling, participants are selected at random from the population; in non-probability sampling, some members of the population have a higher chance of being selected than others, due to factors such as convenience or self-selection.
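As a toy illustration of probability sampling (not from the original text), the snippet below draws a simple random sample from a hypothetical sampling frame so that every member has the same chance of selection.

```python
import random

# Hypothetical sampling frame of 500 population members
population = [f"participant_{i}" for i in range(1, 501)]

random.seed(42)                           # fixed seed for a reproducible draw
sample = random.sample(population, k=50)  # simple random sample, no replacement
print(sample[:5])
```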

Step 3: Analyze and summarize your data

Once you have collected all the data, you can examine and summarize it with descriptive statistics.

Step 4: Analyze hypotheses and make inferences using inferential statistics

Numbers that describe a sample are called statistics, while numbers that describe the entire population are called parameters. Using inferential statistics, you can draw conclusions about the characteristics of a population based on a sample. Statistical tests determine where the sampled data would fall in the distribution expected under the null hypothesis, and they produce two main outputs. The test statistic measures how much your data deviate from the null hypothesis, and the p-value tells you how likely you would be to obtain your results if the null hypothesis were true in the population.
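To make the two outputs concrete, here is a minimal sketch of an independent-samples t-test in Python; the scores are invented, and 0.05 is simply the conventional significance level discussed later in this article.

```python
from scipy import stats

# Made-up test scores for two hypothetical groups
group_a = [78, 85, 90, 72, 88, 84, 79, 91]
group_b = [82, 89, 94, 80, 92, 90, 85, 96]

t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"test statistic t = {t_stat:.2f}, p-value = {p_value:.3f}")

if p_value < 0.05:
    print("Reject the null hypothesis at the 5% significance level.")
else:
    print("Fail to reject the null hypothesis.")
```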

Step 5: Interpret your results

Interpreting your findings is the final step of your statistical analysis.

Statistical significance

In hypothesis testing, conclusions are drawn on the basis of statistical significance. The p-value of your results is compared with a preset significance level (commonly 0.05) to determine whether they are statistically significant. A statistically significant result is one that would be very unlikely to arise by chance alone; that is, very unlikely to occur if the null hypothesis were true.

Analysing data and interpreting the results requires practice and guidance


Organization

Organization is the key to writing a good report. An outline should include: 1) the problem overview, 2) the data analysis and model approach, 3) the results of the data analysis, and 4) the substantive conclusions.

Problem Overview

Data Analysis and Model Approach

A Data and Model section will sometimes include graphs or tables, and sometimes not. Include a plot if you believe it will assist the reader in understanding the problem or the data set itself, rather than your conclusions. While such figures and tables provide important information about the data and the approach, they do not present the results of the study.

The Results of the Data Analysis

In your results section, provide figures and tables that support your argument. Label the images, add informative captions, and refer to them in the text by their numbered labels. Items that might be included here are plots of the data, plots of the fitted model, a table of coefficients, and summaries of the model.

The Substantive Conclusions

Factors to consider when analyzing your research data

To demonstrate a high standard of research practice, researchers should have adequate data-analysis skills. To gain better insights from data, they should understand the rationale for selecting one statistical method over another. Research and data-analysis methods differ across scientific fields, so designing the survey questionnaire, choosing the data collection methods, and choosing the sample all play a crucial role at the outset of an analysis. Well-analyzed data yields accurate and reliable information. The most important thing researchers should remember when analyzing data is to remain open and unbiased toward unexpected patterns and results.


Statistical Analysis

The statistical analysis (ANOVA) revealed significant differences (p < 0.01) between the corresponding classes of substances belonging respectively to olive oils and seed oils.

From: Olives and Olive Oil in Health and Disease Prevention, 2010


Statistical Methods

Thomas D. Gauthier, Mark E. Hawley, in Introduction to Environmental Forensics (Third Edition), 2015

5.3.3.3 Estimation of Areal Averages

Environmental forensics projects frequently involve estimation of average values for areas that are defined by political or property boundaries. The simplest way to derive these estimates is to average all of the observations collected within the area of interest. This is consistent with the assumptions that underlie most classic statistical methods; that is, the observed values are independent observations that are all equally representative of the population of interest. In cases in which the data set is characterized by significant trends or spatial persistence, however, weighted averaging methods should be considered.

Block kriging is a statistical method of computing areal averages that can be used with data sets that exhibit both regional trends and spatial persistence. This method generally provides average values for rectangular areas and is appropriate for use with large data sets. When the data set is small, and especially when regional trends account for nearly all of the total variation, weighted averages calculated from isopleth maps are appropriate. The area is divided into subareas by the isopleths, each subarea is represented by the average of the isopleths that define it, and the weights are based on the proportion of the total area in each subarea. In cases where significant trends are not evident, the method of Thiessen polygons may be more appropriate. This method assumes that the value at each unsampled location is equal to the value at the nearest sampled location. The area of interest is divided into subareas by the perpendicular bisectors of the lines that connect the various sampling locations, and the proportion of the total area represented by each observation is used to assign weights in the averaging process. An example of a field of values divided into Thiessen polygons is shown in Figure 5.14 .
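As a simplified sketch of the weighting idea (not an implementation of block kriging), suppose each observation has already been assigned the fraction of the total area it represents, for example from Thiessen polygons; the areal average is then just the area-weighted mean, which can be compared with the plain average. The values and weights below are hypothetical.

```python
# Hypothetical observed values and the fraction of total area each represents
observations   = [12.4, 9.8, 15.1, 11.0]
area_fractions = [0.35, 0.20, 0.30, 0.15]   # e.g., derived from Thiessen polygons

assert abs(sum(area_fractions) - 1.0) < 1e-9  # weights must cover the whole area

weighted_average = sum(v * w for v, w in zip(observations, area_fractions))
simple_average = sum(observations) / len(observations)

print(f"Area-weighted average: {weighted_average:.2f}")
print(f"Unweighted average:    {simple_average:.2f}")
```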

Assessing Available Information

Edward E. Whang, Stanley W. Ashley, in Surgical Research, 2001

e. Statistics

Statistical methods are discussed in greater detail in a separate chapter in this book. Three of the most prevalent statistical errors about which to be vigilant are (1) statistical analysis methods and sample size determinations being made after data collection (a posteriori) rather than a priori; (2) lack of significance being interpreted to imply lack of difference [studies with negative (not statistically different) results must include power calculations so that the probability of type II errors (differences not detected when there are differences) can be assessed]; and (3) multiple outcome measurements, multiple comparisons, and subgroup comparisons [in the absence of appropriate multivariable procedures and clear a priori hypotheses, suspect the presence of type I errors (differences concluded when there are no differences)].

Speech Recognition: Statistical Methods

L.R. Rabiner, B.-H. Juang, in Encyclopedia of Language & Linguistics (Second Edition), 2006

Statistical methods for speech processing refer to a general methodology in which knowledge about both a speech signal and the language that it expresses, along with practical uses of that knowledge for specific tasks or services, is developed from actual realizations of speech data through a well-defined mathematical and statistical formalism. For more than 20 years, this basic methodology has produced many advances and new results, particularly for recognizing and understanding speech and natural language by machine. In this article, we focus on two important statistical methods, one based primarily on a hidden Markov model formulation that has gained widespread acceptance as the dominant technique in characterizing the variation in the acoustic signal representing speech, and one related to the use of statistics for characterizing word co-occurrences. This second model acts as a form of grammar or set of syntactical constraints on the language. In contrast to earlier systems that employed knowledge based on linguistic analyses, these data-driven statistical methods have proven to produce consistent and useful results and have become the underpinning technology of modern speech recognition and understanding systems. Such systems are used in a wide range of applications such as automatic telephone call routing and information retrieval.

Statistical Analysis for Experimental-Type Designs

Elizabeth DePoy PhD, MSW, OTR, Laura N. Gitlin PhD, in Introduction to Research (Fifth Edition), 2016

What Is Statistical Analysis?

Statistical analysis is concerned with the organization and interpretation of data according to well-defined, systematic, and mathematical procedures and rules. The term “data” refers to information obtained through data collection to answer such research questions as, “How much?” “How many?” “How long?” “How fast?” and “How related?” In statistical analysis, data are represented by numbers. The value of numerical representation lies largely in the asserted clarity of numbers. This property cannot always be exhibited in words. 1

 For example, assume you visit your physician, and she indicates that you need a surgical procedure. If the physician says that most patients survive the operation, you will want to know what is meant by “most.” Does it mean 58 out of 100 patients survive the operation, or 80 out of 100? 

Numerical data provide a precise standardized language to describe phenomena. As tools, statistical analyses provide a method for systematically analyzing and drawing conclusions to tell a quantitative story. 2 Statistical analyses can be viewed as the stepping stones used by the experimental-type researcher to cross a stream from one bank (the question) to the other (the answer).

You now can see that there are no surprises in the tradition of experimental-type research. Statistical analysis in this tradition is guided by and dependent on all the previous steps of the research process, including the level of knowledge development, research problem, research question, study design, number of study variables, level of measurement, sampling procedures, and sample size. Each of these steps logically leads to the selection of appropriate statistical actions. We discuss each of these later in the chapter.

First, it is important to understand three categories of analysis in the field of statistics: descriptive, inferential, and associational. Each level of statistical analysis corresponds to the particular level of knowledge about the topic, the specific type of question asked by the researcher, and whether the data are derived from the population as a whole or are a subset or sample. Recall that we briefly discussed the implications of boundary setting for statistical choice. This last point, population or sample, will become clear in this chapter. Experimental-type researchers aim to predict the cause of phenomena. Thus, the three levels of statistical analysis are hierarchical and consistent with the level of research questioning discussed in Chapter 8 , with description being the most basic level.

Descriptive statistics form the first level of statistical analysis and are used to reduce large sets of observations into more compact and interpretable forms. 1,2 If study subjects consist of the entire research population, descriptive statistics can be primarily used; however, descriptive statistics are also used to summarize the data derived from a sample. Description is the first step of any analytical process and typically involves counting occurrences, proportions, or distributions of phenomena. The investigator descriptively examines the data before proceeding to the next levels of analysis.

The second level of statistics involves making inferences. Inferential statistics are used to draw conclusions about population parameters based on findings from a sample. 3 The statistics in this category are concerned with tests of significance to generalize findings to the population from which the sample is drawn. Inferential statistics are also used to examine group differences within a sample. If the study subjects are a sample, both descriptive and inferential statistics can be used in concert with one another. There is no need to use inferential statistics when analyzing results from an entire population because the purpose of inferential statistics is to estimate population characteristics and phenomena from the study of a smaller group, a sample.

By their nature, inferential statistics account for errors that may occur when drawing conclusions about a large group based on a smaller segment of that group. You can therefore see, when studying a population in which every element is represented in the study, why no sampling error will occur and thus why there is no need to draw inferences.

Associational statistics are the third level of statistical analysis. 3,4 These statistics refer to a set of procedures designed to identify relationships between and among multiple variables and to determine whether knowledge of one set of data allows the investigator to infer or predict the characteristics of another set. The primary purpose of these multivariate types of statistical analyses is to make causal statements and predictions.
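As an illustrative sketch of an associational analysis (not taken from the chapter), the snippet below asks whether knowing hours of study allows a prediction of exam score; the data are made up purely for the example.

```python
import numpy as np
from scipy import stats

# Hypothetical data: does knowing study hours help predict exam scores?
hours = np.array([2, 4, 5, 7, 8, 10, 12])
score = np.array([55, 62, 66, 71, 75, 83, 90])

r, p_value = stats.pearsonr(hours, score)               # strength of association
slope, intercept, *_ = stats.linregress(hours, score)   # simple linear prediction

print(f"Pearson r = {r:.2f} (p = {p_value:.4f})")
print(f"Predicted score after 9 hours of study: {slope * 9 + intercept:.1f}")
```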

Table 20-1 summarizes the primary statistical procedures associated with each level of analysis. A summary of the relationship among the level of knowledge, type of question, and level of statistical analysis is presented in Table 20-2. Let us examine the purpose and logic of each level of statistical analysis in greater detail.

Writing a Protocol

Elizabeth A. Bartrum, Barbara I. Karp, in Principles and Practice of Clinical Research (Fourth Edition), 2018

The statistical analysis section provides crucial information on how the collected data and samples will be analyzed to achieve the primary and secondary study aims. The statistical analysis section should have sufficient information for reviewing committees to be able to determine that the methodology is sound and valid for the planned analyses.

The statistical analysis section also needs to provide information, such as a power analysis (see Chapter 25), to support the requested accrual number. The number of subjects planned for enrollment should be adequate to provide sufficient data for valid results, while also being the minimum needed to avoid unnecessary exposure of participants to research risks. The expertise of a statistician should be obtained when designing the study and again when writing the statistics section of the protocol.

Statistical Analysis for Security and Supervision

Whitney DeCamp, ... Robert A. Metscher, in Security Supervision and Management (Fourth Edition), 2015

The Collection of Data

Why does a security supervisor need to learn about the collection of data or research methods? The reasons are quite simple: the ability of managers and supervisors to sense, spot, and address problems before they become serious creates a tremendous advantage for the companies that employ them. Knowing about research and problem-solving processes assists supervisors in identifying problems and finding out more about the situation. By collecting and analyzing data, then displaying it as useful information, the security professional can answer fundamental operational and strategic planning questions.

Statistical analysis ultimately boils down to numerical results. With computer programs available to do the mathematical computations, most people using statistics can focus more on simply how to get the computer to do what they want and how to interpret the results. You do not have to be a mathematical wizard to do this—the average person can calculate and interpret meaningful statistics. There are three steps involved in statistical analysis:

The collection of data

The organization of data

The analysis of data

A raw dataset—that is, a spreadsheet or table with lots of numbers—cannot usually be understood by simply looking at it. To grasp the meaning of a vast amount of numerical data, its bulk must be reduced; that is, it must be made manageable. The process of abstracting the significant facts contained in the data and making clear and concise statements about the derived results constitutes a statistical analysis. Common sense and experience are key elements in the analysis phase of information gathering. The purpose is to give a summarized and comprehensible numerical description of large amounts of information.

Operational considerations include the following:

What programs, tasks, or actions consume our efforts?

Why are we engaged in these activities?

How can we show our activities are efficient and effective?

Strategic considerations include the following:

What goals represent the next horizon?

What must we do to support that direction?

If not, how must we change?

Imaging Physics

In Primer of Diagnostic Imaging (Fifth Edition), 2011

Statistical testing

Statistical Methods to Test Hypotheses

Preclinical Evaluation of Carcinogenicity Using Standard-Bred and Genetically Engineered Rodent Models

D.L. McCormick, in A Comprehensive Guide to Toxicology in Nonclinical Drug Development (Second Edition), 2017

The statistical analysis of tumor incidence data is a critical element of the interpretation of the results of carcinogenicity bioassays. Unfortunately, the complexity of the statistical analyses required, when considered with the number of different statistical approaches that may be used to evaluate carcinogenicity data, suggests that a comprehensive analysis and discussion of these analyses is beyond the scope of the current chapter.

Numerous approaches to the statistical analysis of carcinogenicity data have been proposed (e.g., see Refs. [4,38,55]), and the Center for Drug Evaluation and Research at the FDA has issued a draft guidance document in which statistical approaches to the analysis and interpretation of carcinogenicity data are discussed [58]. However, there is relatively little overall consensus on what constitutes the “optimal” approach to statistical analysis of tumor data from carcinogenesis bioassays, and no single approach to the analysis of these data has received broad support from the scientific community. As an example, a white paper in which members of the Society of Toxicologic Pathology [38] discussed approaches to statistical analysis of carcinogenicity data generated seven commentaries in which alternate approaches to this analysis were proposed [39].

In consideration of the complexity of the statistical analysis of carcinogenicity data, it is strongly recommended that the practicing toxicologist enlist the support of a statistician who is experienced in the analyses of such data.

Statistical Clustering

J.A. Hartigan, in International Encyclopedia of the Social & Behavioral Sciences, 2001

Statistical methods assist in classification in four ways: in devising probability models for data and classes so that probable classifications for a given set of data can be identified; in developing tests of validity of particular classes produced by a classification scheme; in comparing different classification schemes for effectiveness; and in expanding the search for optimal classifications by probabilistic search strategies. Standard hierarchical and partitioning algorithms are evaluated from a statistical point of view, and some improvements of these algorithms using density estimation methods and mixture models are proposed. Methods are given for identifying multimodality in high dimensional data.

Mechanisms and Risks of Decompression

Richard D. Vann, in Bove and Davis' Diving Medicine (Fourth Edition), 2004

CONCLUSIONS

Statistical methods used in probabilistic modeling are not wise in themselves and are simply data-fitting tools that compensate for ignorance regarding underlying mechanisms. Bubble formation, inert gas exchange, and pathophysiology cannot be assumed to be identical in the brain, spinal cord, and limbs. This is why decompression models should represent premorbid physiology as closely as possible and why understanding this physiology has practical importance. Relating physiology to decompression safety is an epidemiologic problem associated with finding the probability of injury in the context of the individual, the environment, and the exposure. Much will be gained by formalizing operational and clinical methods and by applying analytical techniques used widely in science and medicine.

