Methodology

Evaluation Framework for Assessing Clinical Studies on Probiotics in IBS

Evidence Inclusion

This database focused exclusively on randomized, placebo-controlled trials of probiotics in IBS populations to ensure the highest quality of evidence. Open-label trials, which lack a placebo control, were excluded because of the large placebo response often observed in IBS populations.


At this time, synbiotic formulations were not included in the database. We also generally excluded individual probiotics studied within a food or beverage matrix. However, we made exceptions for probiotics with multiple studies using both food/beverage and non-food delivery matrices. This approach helped us capture the totality of evidence for a given probiotic. In these cases, any potential confounding effects of the delivery matrix were noted in the study summaries.


Later stages of this project may include synbiotic formulations and a broader range of alternative delivery matrices.


The Overall Concept

Our evaluation system produced two independent groups of values: 1) Strength and Direction of Reported Effects on Specific Symptoms, and 2) Quality of Evidence.

Strength and Direction of Reported Effects on Specific Symptoms

We produced a value for each symptom based on the reported effects. Our list covered ten symptoms or parameters commonly measured in placebo-controlled trials of probiotics in IBS.

Symptoms not fitting within this framework were classified as “other symptoms” in our study findings. For further details, see our documentation of how symptoms reported across studies were grouped under the ten parameters.

Classification of Effect Sizes:

  • Strong
  • Moderate
  • Weak
  • No effects
  • Adverse effects
  • Not reported (missing or disqualified data)

Calculation Details:

Cohen’s d and Cohen’s h were calculated as effect sizes. Based on the distribution of our dataset, we interpreted them as follows: values up to 0.5 were classified as weak (the bottom 40% of beneficial effect sizes in our database), values between 0.5 and 1.0 as moderate (the middle 35%), and values over 1.0 as strong (the top 25%).
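As a minimal sketch, the banding above could be implemented as follows. The function name is ours, and the handling of the exact boundary values 0.5 and 1.0 is an assumption, since the text does not specify which band they fall into:

```python
def classify_effect_size(es):
    """Map a Cohen's d or Cohen's h value to the effect-size bands above.

    The 0.5 and 1.0 thresholds come from the methodology text; counting
    exactly 0.5 as weak and exactly 1.0 as moderate is our assumption.
    """
    if es <= 0:
        return "No effects / Adverse effects"
    if es <= 0.5:
        return "Weak"
    if es <= 1.0:
        return "Moderate"
    return "Strong"
```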

When the treatment and control groups had similar pretest values for a variable, effect sizes were calculated using the posttest values of the compared groups. If pretest values differed significantly between groups, the effect size was determined by the pretest-posttest differences between the group means. In both scenarios, the standard deviation at posttest was used in the calculations.

For proportions, Cohen’s h was calculated. Due to the similar interpretative guidelines for Cohen’s d and Cohen’s h, we treated them as comparable and used them jointly in our calculations.
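The two effect-size calculations can be sketched in Python using their standard definitions (the function names are ours; whether posttest values or pre-post change scores feed into the mean difference follows the rule described above):

```python
import math

def cohens_d(mean_treatment, mean_control, sd_posttest):
    # Difference in group means divided by the posttest standard deviation,
    # as described above. The means may be posttest values or, when baselines
    # differed, pre-post change scores.
    return (mean_treatment - mean_control) / sd_posttest

def cohens_h(p_treatment, p_control):
    # Standard definition of Cohen's h: the difference between the
    # arcsine-transformed proportions.
    return 2 * math.asin(math.sqrt(p_treatment)) - 2 * math.asin(math.sqrt(p_control))
```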


Quality of Evidence

We calculated a quality of evidence index by evaluating several key indicators of study validity, largely inspired by the methodology of Higgins et al. (1)

  • Randomization Process: Were participants randomized into groups?
  • Description of Randomization: Is there a description of the randomization procedure? Does the description indicate that the procedure was valid?
  • Baseline Comparison: Does the comparison between groups at baseline indicate potential issues with the randomization procedure?
  • Sample Size Sufficiency: Is the sample very small (e.g., less than 30 people per group)? Is the sample size sufficient to detect the expected effects?
  • Generalizability: Does the sample have characteristics that might limit the generalizability of the findings (e.g., participants with a narrow range of characteristics or with very specific traits different from the general population)? If not reported, we assumed generalizable characteristics.
  • Concealment from Participants: Was the group assignment concealed from study participants? If yes, did most or all participants become aware of their assigned intervention during the study? If not reported, we assumed participants were aware.
  • Concealment from Researchers: Were researchers and other individuals working with the participants aware of the intervention the participants were undergoing? If not reported, we assumed they were aware.
  • Identical Administration Vehicles: Were the treatment and placebo/control administration vehicles identical? If not reported, we assumed they were not identical.
  • Protocol Deviations: Were there deviations from the treatment, functioning of the treatment, or assignment protocols? Could these deviations have affected the results? If there were deviations, were they present equally in both groups or only in one group? If not reported, we assumed there were none.
  • Unplanned Interventions: Did participants undergo any interventions or treatments during the study that were not part of the protocol? Were these the same across all groups? If yes, could they have affected the intervention outcome? If not reported, we assumed there were none.
  • Attrition Rate: Was the attrition rate low (e.g., up to 5%)?
  • Group Differences in Attrition: Did attrition rates differ substantially between groups?
  • Impact of Attrition: Are there indications that attrition could have affected the outcome? Was attrition specific to people with certain values of the outcome or specific study-relevant characteristics? Not reporting was considered negative if the answer to the first question was no.
  • Validity of Outcome Assessment: Did the study use a valid/recognized method of assessing the outcomes?
  • Consistency in Measurement: Was the outcome assessment method the same in all study groups (at all time points and with different subgroups)?
  • Blinding in Assessment: Were the assessors (or participants in self-reports) aware of the intervention group they were assessing at the time of the assessment? Could this knowledge affect the assessment? If both questions were answered yes, this was considered negative. If no to the first question, the second was not considered (e.g., in the case of objective measures).
  • Completeness of Reporting: Did the authors report results for all measures they used, particularly if they used multiple measures of the same outcome variable?
  • Selective Reporting: Did they selectively report results?
  • Results Spin: Are there indications of result spinning? Is there inappropriate use of causal language? Are there conclusions and claims that do not stem from or contradict the study results? Is there sloppy reporting (e.g., obvious reporting errors that appear to be omissions or sloppiness)?

Scoring of Evidence Quality

See table to left


Note on Disqualifiers:

Our quality of evidence evaluation included a field for disqualifiers. Disqualifiers are significant issues that, when identified, reduce the quality of evidence to zero. These include major mistakes, omissions, or indicators of data tampering, result spinning, or misrepresentation of results, which are so severe that they completely undermine the validity of any reported findings.

Calculating the Evidence Quality Scores

Calculating evidence quality scores for specific studies involved summing the points for each indicator group and then summing the group scores to get an overall score. Currently, this is an unweighted composite score, meaning each group carries the same weight.
The disqualifier served as a gatekeeper item, meaning the entire evidence quality score was set to zero if the disqualifier had a value of “Yes.”

Calculation Methodology:


  • Summing Points for Each Group: Points from each indicator within a group were summed to get the group score.
  • Summing Group Scores: Group scores were then summed to get the overall evidence quality score for each study.
  • Disqualifier Impact: If any study had a disqualifier marked as “Yes,” the overall evidence quality score for that study was set to zero.
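The three steps above can be sketched as a single function (the data structure — a list of per-indicator point lists, one per group — is our assumption about how the scores are stored):

```python
def evidence_quality_score(indicator_groups, disqualified):
    """Overall evidence quality score for one study.

    indicator_groups: list of lists of per-indicator points (assumed layout).
    disqualified: True if the disqualifier field is "Yes"; this acts as a
    gatekeeper and forces the overall score to zero.
    """
    if disqualified:
        return 0
    # Sum points within each group, then sum the group scores (unweighted).
    return sum(sum(group) for group in indicator_groups)
```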

Combining Evidence Quality Scores Across Studies:


The combined evidence quality for a probiotic was calculated as a weighted mean of the evidence quality scores from all studies examining that probiotic included in the analysis.

Weights Used:


  • Number of Study Participants: The number of per-protocol participants in the probiotic arm of each study was used as a weight.
  • Evidence Quality: The evidence quality score of each study was also used as a weight.


For cross-over studies, the reported number of participants was multiplied by the number of study conditions each participant underwent. The weight for each study was then the product of its raw evidence quality score and its raw number of participants.
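A minimal sketch of this weighted mean, assuming each study is stored as a dict with `quality` (raw evidence quality score), `participants` (per-protocol participants in the probiotic arm), and an optional `conditions` count for cross-over designs:

```python
def combined_evidence_quality(studies):
    """Weighted mean of per-study evidence quality scores for one probiotic.

    The weight of each study is the product of its quality score and its
    participant count (multiplied by the number of conditions for
    cross-over studies), as described above.
    """
    weights = [
        s["quality"] * s["participants"] * s.get("conditions", 1)
        for s in studies
    ]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(s["quality"] * w for s, w in zip(studies, weights)) / total
```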

Presentation of the Evidence

Probiotic Listings Ordered Based On:


  • Specific Symptoms: Primarily ranked by weighted mean effect sizes for specific symptom(s).
  • Global IBS Symptoms: Secondarily sorted by weighted mean effect sizes for overall IBS symptoms.
  • Evidence Quality: Further sorted based on the weighted mean quality of evidence from the underlying studies.

Top Picks Selection Process


When selecting our top probiotic picks, we adhered to the following criteria:


  • Commercial Availability: Probiotics had to be readily available for purchase.
  • Quality of Evidence: The quality of evidence supporting their effectiveness had to average over 75% across studies that assessed the relevant symptom(s) or parameter(s).
  • Effect Size: Probiotics had to show a moderate to strong effect size (>0.5) for the specific parameter(s) assessed.


For categories with more than five candidates, we provided a comprehensive list but limited our top picks to the five best options. Rankings were determined based on the following:


  1. Effect size for the specific symptom(s).
  2. Effect size for global IBS symptoms.
  3. Quality of evidence.
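The three ranking criteria above amount to a descending multi-key sort. A sketch, assuming each candidate is stored as a dict with `symptom_es`, `global_es`, and `quality` (field and function names are ours):

```python
def rank_candidates(candidates, top_n=5):
    """Rank candidate probiotics by the three criteria above (all descending)
    and keep the top picks.

    symptom_es: weighted mean effect size for the specific symptom(s)
    global_es:  weighted mean effect size for global IBS symptoms
    quality:    weighted mean quality of evidence
    """
    ranked = sorted(
        candidates,
        key=lambda c: (c["symptom_es"], c["global_es"], c["quality"]),
        reverse=True,
    )
    return ranked[:top_n]
```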

References


  1. Higgins, J. P. T., Savović, J., Page, M. J., & Sterne, J. A. C. (2024). Revised Cochrane risk-of-bias tool for randomized trials (RoB 2).
