Abstract
Background
Network metaanalysis is becoming increasingly popular for establishing comparative effectiveness among multiple interventions for the same disease. Network metaanalysis inherits all methodological challenges of standard pairwise metaanalysis, but with increased complexity due to the multitude of intervention comparisons. One issue that is now widely recognized in pairwise metaanalysis is the issue of sample size and statistical power. This issue, however, has so far only received little attention in network metaanalysis. To date, no approaches have been proposed for evaluating the adequacy of the sample size, and thus power, in a treatment network.
Findings
In this article, we develop easy-to-use flexible methods for estimating the ‘effective sample size’ in indirect comparison metaanalysis and network metaanalysis. The effective sample size for a particular treatment comparison can be interpreted as the number of patients in a pairwise metaanalysis that would provide the same degree and strength of evidence as that which is provided in the indirect comparison or network metaanalysis. We further develop methods for retrospectively estimating the statistical power for each comparison in a network metaanalysis. We illustrate the performance of the proposed methods for estimating effective sample size and statistical power using data from a network metaanalysis on interventions for smoking cessation including over 100 trials.
Conclusion
The proposed methods are easy to use and will be of high value to regulatory agencies and decision makers who must assess the strength of the evidence supporting comparative effectiveness estimates.
Keywords:
Network metaanalysis; Indirect comparison; Sample size; Power; Strength of evidence
Background
Over the past 2 decades, metaanalysis has become increasingly accepted by clinicians, decision-makers and the public as providing high-quality assessments of evidence [1]. Network metaanalysis, a new expansion of metaanalysis that allows for simultaneous comparison of several treatments, is similarly becoming increasingly accepted in the clinical research community [2-11]. Having been available for more than 3 decades, metaanalysis has been studied extensively, and several hundred articles published in this period have identified and resolved a vast array of basic and advanced methodological issues [1,12]. Network metaanalysis inherits all the challenges present in a standard metaanalysis (e.g., issues of bias, heterogeneity and precision), but with increased complexity due to the multitude of comparisons involved [5,13]. Since network metaanalysis is still a relatively new technique, the number of publications addressing methodological challenges is still relatively small.
One important issue that has received much attention in individual trials and metaanalysis is the issue of sample size and statistical power [14-26]. Several studies have demonstrated the importance of interpreting pooled metaanalysis estimates and confidence intervals according to the statistical level of evidence (i.e., precision) [14,15,18-24,26,27], and sound recommendations have been provided [16,22,28]. So far, however, only a small number of studies have addressed the issue of power and precision in network metaanalysis [8,13,29], and no comprehensive guidance exists on the topic. Network metaanalyses typically include many more trials than standard metaanalyses because of the multitude of comparisons involved and for this reason may artificially appear to provide a stronger evidence base. Likewise, the accompanying graphical representation of a treatment network can provide a similarly compelling but potentially false impression of a strong evidence base.
Network metaanalysis utilizes evidence from direct (headtohead) comparisons (i.e., trials directly comparing treatment A and B) and indirect comparisons (e.g., the combination of trials comparing A with C and trials comparing B with C) [4,6,30]. The major challenge in interpreting the power and precision of a network metaanalysis stems from the fact that there are (typically) varying levels of power and precision across all comparisons. In addition, the power and precision of indirect evidence are more complex to assess than for direct evidence, and thus, without proper guidance, it will be difficult for most authors to evaluate the precision gain from use of indirect evidence as well as the strength of evidence in a treatment network.
In this article, we provide guidance on quantifying the power and precision in network metaanalysis using simple sample size considerations. We first describe how to quantify the precision in indirect comparison metaanalysis and subsequently in network metaanalysis with combinations of direct and indirect evidence. We then outline the concept of sample size requirements and power calculations in pairwise metaanalysis. Finally, we show how to combine these measures in order to quantify the power and strength of evidence available for all treatment comparisons in a network metaanalysis. We illustrate the described methods using data from a recent network metaanalysis on interventions for smoking cessation [31].
Methods
Basic methodological framework
Indirect comparisons
Indirect effect estimates are obtained from the effect estimates of two comparisons sharing a common comparator [32]. For example, when two treatments A and B have both been compared to some common comparator C (e.g., placebo) in a number of randomized clinical trials, an indirect effect estimate of treatment A versus B can be obtained using the metaanalysis effect estimate of A versus C and the metaanalysis effect estimate of B versus C [32]. In particular, the indirect effect estimate of A versus B (d_{AB}) is calculated as the estimated effect of A versus C (d_{AC}) minus the estimated effect of B versus C (d_{BC}). Mathematically, this corresponds to the equation d_{AB} = d_{AC} - d_{BC}.
(Note, when dealing with ratio effect measures, such as relative risks and odds ratios, all calculations are done on the log scale to preserve linearity and approximate normality). To produce confidence intervals for the indirect estimate, we first need to estimate its variance. The variance of the indirect estimate of A versus B (V_{AB}) is simply equal to the sum of the variance of the effect estimate of A versus C (V_{AC}) and the variance of the effect estimate of B versus C (V_{BC}). Mathematically this corresponds to the equation V_{AB} = V_{AC} + V_{BC}. It is therefore clear that the variance of a (direct) metaanalysis effect estimate based on some number of trials, say k, will always be smaller than the variance of an indirect metaanalysis based on the same number of trials, k (all trial sample sizes being equal). In other words, direct estimates come with higher precision and power (trial count and trial sample sizes being equal). In many situations, however, using indirect estimates can add considerable power and precision.
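As an illustration of the two relationships described above, the following Python sketch computes an indirect estimate, its variance and a 95% confidence interval on the log scale. The function name and the numerical inputs are hypothetical and serve only as an example; they are not taken from any of the trials discussed in this article.

```python
import math

def indirect_estimate(d_ac, v_ac, d_bc, v_bc, z=1.96):
    """Indirect A-vs-B estimate through common comparator C.

    Inputs are effect estimates and variances on the log scale
    when ratio measures (relative risks, odds ratios) are used.
    """
    d_ab = d_ac - d_bc          # difference of the two direct estimates
    v_ab = v_ac + v_bc          # variances add
    half_width = z * math.sqrt(v_ab)
    return d_ab, v_ab, (d_ab - half_width, d_ab + half_width)

# Hypothetical inputs: RR(A vs C) = 1.50, RR(B vs C) = 1.20
d_ab, v_ab, ci = indirect_estimate(math.log(1.50), 0.010, math.log(1.20), 0.020)
print("Indirect RR:", round(math.exp(d_ab), 2))
print("95% CI:", [round(math.exp(x), 2) for x in ci])
```

Note that the indirect variance (0.030 here) exceeds either of the two input variances, which is the loss of precision discussed in the text.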
Combining direct and indirect evidence
When both direct and indirect evidence are available, it may often be advantageous to combine the two statistically [2,4-7,30,33]. For example, if only two small trials have investigated two active interventions A and B head to head, but 20 trials have compared A or B with placebo, the indirect evidence will be able to add much power and precision to the comparative estimate of A and B. The combination of indirect and direct evidence requires advanced statistical regression techniques (i.e., network metaanalysis) that are beyond the scope of this article [4,6,30]. However, in the context of sample size and power considerations, it suffices to understand that indirect evidence, when combined with direct evidence, increases the power and precision of treatment effect estimates [4,6,7,9,30,33]. The extent to which it does so can be evaluated readily by using the methods we describe below.
Sample size in indirect comparisons
In this section we introduce three methods for gauging how much statistical precision an indirect estimate provides when no direct evidence is available. In particular, we describe how to approximate the amount of information required in a direct (head-to-head) metaanalysis to produce the same precision as that in the available indirect evidence. Simply put, what direct metaanalysis sample size would provide a similar degree of information? We dub this the effective sample size of the indirect evidence or, interchangeably, the effective indirect sample size. The three methods differ with respect to simplicity and validity (the simplest being the least valid), so we outline the simplicity-validity trade-offs at the end of the section.
Method 1: the effective number of trials
A simple approach to gauging the degree of power and precision available in indirect evidence is to approximate how many trials are required in an indirect comparison to produce a matching degree of power and precision from a single headtohead trial. This type of approximation is possible under the simple assumptions that the variances (of the mean) are equal for each trial and that no heterogeneity is present. Glenny et al. showed that when the number of trials is the same in both of two comparisons informing the indirect evidence (e.g., two trials of A vs. C and two trials of B vs. C), it takes four trials in the indirect evidence to produce the same precision as one direct headtohead trial [8]. In indirect comparisons, however, it is common that one comparison will include more trials than the other. When this happens, the above 1:4 precision ratio no longer holds true. For example, if the number of trials is twice as high in one comparison (i.e., a 1:2 trial count ratio), the indirect comparison will need exactly 4.5 trials to produce the same precision as one headtohead trial (see mathematical derivation in Appendix 1.a). In reality, however, to maintain a ratio of 1:2 in the trial count, one would need six trials (2:4) in the indirect comparison to produce the same precision as one headtohead trial. To produce the same precision as two headtohead trials, one would need 2 × 4.5 = 9 trials, which allows maintaining the 1:2 ratio with three trials in one comparison and six in the other (i.e., 3:6 as the trial count ratio). Table 1 presents the approximate number of trials required in an indirect comparison under different scenarios where the number of trials in the two comparisons is unbalanced. The mathematical derivations for all exact precision ratios are presented in Appendix 1.a.
Table 1. The number of indirect comparison trials required to produce the same precision as a given number of direct (head-to-head) trials
The cells in underlined italics indicate where the indirect evidence produces the exact precision of the corresponding number of trials. The remaining cells indicate where the indirect evidence produces precision slightly above that of the corresponding number of head-to-head trials.
In some cases the required number of trials in an indirect comparison for a specific trial count ratio produces a precision corresponding to more than that of the stated number of head-to-head trials. For example, with a trial count ratio of 1:3, one would require 2 × 5.33 = 10.66 indirect comparison trials to produce the precision of two head-to-head trials. However, since we cannot have fractions of trials, we take the closest integer above 10.66 where the trial count ratio is maintained: 12 trials with a trial count ratio of 3:9.
Table 1 can readily be used for quickly and easily checking how many headtohead trials the indirect evidence ‘effectively’ corresponds to. That is, if the indirect evidence produces the precision of, say, three trials, we can think of the evidence as being as strong as a metaanalysis of three headtohead trials. For example, if one has an indirect comparison with 4 trials comparing A with C, and 12 trials comparing B with C, the precision of the indirect comparison corresponds to a metaanalysis of 3 trials directly comparing A with B (Table 1). It should be noted that Table 1 is only valid to the extent that trial sample sizes and trial population variances are similar across trials, as well as the extent to which heterogeneity is absent or ignorable.
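The precision ratios quoted above (4 for a 1:1 trial count ratio, 4.5 for 1:2, 5.33 for 1:3) can be reproduced with a short Python sketch. The closed form used here is inferred from those quoted values under the stated assumptions of equal trial variances and no heterogeneity; it is not a reproduction of the derivation in Appendix 1.a, and the function names are hypothetical.

```python
def precision_ratio(k_ac, k_bc):
    """Indirect trials needed per head-to-head trial (equal variances, no heterogeneity)."""
    return (k_ac + k_bc) ** 2 / (k_ac * k_bc)

def effective_number_of_trials(k_ac, k_bc):
    """Number of head-to-head trials the indirect evidence corresponds to."""
    return (k_ac * k_bc) / (k_ac + k_bc)

for k_ac, k_bc in [(2, 2), (1, 2), (1, 3), (4, 12)]:
    print(k_ac, k_bc,
          round(precision_ratio(k_ac, k_bc), 2),
          round(effective_number_of_trials(k_ac, k_bc), 2))
```

For the 4:12 example in the text, the sketch returns an effective number of three head-to-head trials, in agreement with Table 1.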
Method 2: the effective sample size
Another relatively simple approach to gauging the degree of power and precision from an indirect comparison is to consider the collection of trials included in each comparison as one (large) clinical trial. From a sample size perspective, following similar mathematic derivations as the above trial count perspective, the relationship between the precision of an indirect comparison and the precision of a direct metaanalysis turns out the same (see Appendix 1.b for mathematical derivations). For example, to produce the same precision as a headtohead metaanalysis including 1,000 patients, one would need a total of 4,000 (4 × 1,000) patients in the indirect comparison, provided the number of patients is the same for the two comparisons (2,000:2,000). Taking the sample size perspective comes with the flexibility of a possible reversal of the calculation. For example, an indirect comparison with 500 patients in the A vs. C comparison and 500 patients in the B vs. C comparisons would produce the same precision as a direct comparison with 250 patients [(500 + 500)/4]. Likewise, in a scenario with 1,000 patients in comparison A vs. C, and 10,000 patients in comparison B vs. C, the exact precision ratio is 12.1 (see Table 1), and so the effective direct metaanalysis sample size would be (1,000 + 10,000)/12.1 = 909.
Often, the sample sizes in the two comparisons do not line up to produce the exact precision ratio presented in Table 1 and Table 2. Letting n_{AC} and n_{BC} denote the sample sizes for the comparisons of A vs. C and B vs. C, respectively, a more general formula for the effective indirect sample size is (see Appendix 1.b)
n_{indirect} = (n_{AC} × n_{BC})/(n_{AC} + n_{BC})
Table 2. Effective heterogeneity-corrected sample sizes of indirect comparison scenarios with varying degrees of patient count ratios and heterogeneity in each comparison (A vs. C and B vs. C), but with fixed total sample size of 10,000
For the above example, the effective sample size using this formula is therefore (1,000 × 10,000)/(1,000 + 10,000) = 909.
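The sample size calculation can be sketched in a few lines of Python. The sketch assumes the product-over-sum form n_{AC} × n_{BC}/(n_{AC} + n_{BC}), which reproduces both worked figures in the text (250 and 909); the function name is hypothetical.

```python
def effective_indirect_sample_size(n_ac, n_bc):
    """Head-to-head sample size giving the same precision as the indirect comparison.

    Assumes sample size is a reasonable proxy for precision and that
    no heterogeneity is present.
    """
    return n_ac * n_bc / (n_ac + n_bc)

print(effective_indirect_sample_size(500, 500))              # balanced example
print(round(effective_indirect_sample_size(1000, 10000)))    # unbalanced example
```

The unbalanced example shows that the smaller comparison caps the result: the effective sample size can never exceed the smaller of the two sample sizes.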
The above simple approach to calculate the effective indirect sample size does not consider the possibility that statistical heterogeneity exists across trials. When heterogeneity is present, the effect estimates that go into the indirect estimate incur a higher degree of variation, and so the effective indirect sample size corresponding to a headtohead metaanalysis will be smaller than with the above simple approach. In line with alreadyestablished heterogeneity corrections for metaanalysis required sample sizes [23,24,28], we put forward that the actual number of patients in each of the comparisons informing the indirect estimate can be penalized by the additional variation explained by heterogeneity. In line with previous proposals, we penalize for the ‘lack of homogeneity’ [23,24] using the popular measure of heterogeneity, I^{2}[34], as a basis for the penalization.
Consider the example where a metaanalysis of A vs. C includes 6,000 patients and a metaanalysis of B vs. C includes 8,000 patients, and assume the estimated degree of heterogeneity for A vs. C is 50% (I_{AC}^{2} = 50%) and 25% for B vs. C (I_{BC}^{2} = 25%). Then the lack of homogeneity is 100% - 50% = 50% for A vs. C and 100% - 25% = 75% for B vs. C. We penalize the actual sample size by multiplying the actual sample size by the lack of homogeneity, so that the penalized sample size of A vs. C is 50% × 6,000 = 3,000, and the penalized sample size for B vs. C is 75% × 8,000 = 6,000. The total penalized number of patients in the indirect comparison is then 3,000 + 6,000 = 9,000, the patient count ratio is 1:2 (same as 3,000:6,000), the precision ratio is 4.5 (see Table 1), and so the effective heterogeneity-corrected sample size in this indirect comparison is 9,000/4.5 = 2,000.
Following the above example, the general formula for a heterogeneity-corrected effective sample size for indirect evidence is
n_{indirectPen} = [n_{AC} × (1 - I_{AC}^{2}) + n_{BC} × (1 - I_{BC}^{2})]/Precision ratio
where n_{AC} and n_{BC} are the actual sample sizes (before correction) in the metaanalyses of A vs. C and B vs. C, respectively, and where the precision ratio is based on the heterogeneity-corrected sample sizes (see Table 1 and Table 2). In the appendix we provide a general formula for the precision ratio.
As with the above example of non-penalized sample sizes, the penalized sample sizes may not always line up to match the precision ratio given in Table 1 and Table 2. The more general formula for the heterogeneity-corrected effective sample size is (see Appendix 1.b)
n_{indirectPen} = [n_{AC} × (1 - I_{AC}^{2}) × n_{BC} × (1 - I_{BC}^{2})]/[n_{AC} × (1 - I_{AC}^{2}) + n_{BC} × (1 - I_{BC}^{2})]
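The heterogeneity penalization can be illustrated with the 6,000 and 8,000 patient example above. The following Python sketch assumes that each sample size is first multiplied by its lack of homogeneity (1 - I^{2}) and that the penalized sample sizes are then combined in product-over-sum form; the function name is hypothetical.

```python
def penalized_effective_sample_size(n_ac, i2_ac, n_bc, i2_bc):
    """Heterogeneity-corrected effective indirect sample size.

    i2_ac and i2_bc are I-squared values expressed as proportions (0 to 1).
    """
    pen_ac = n_ac * (1 - i2_ac)   # penalized sample size, A vs C
    pen_bc = n_bc * (1 - i2_bc)   # penalized sample size, B vs C
    return pen_ac * pen_bc / (pen_ac + pen_bc)

# Example from the text: 6,000 patients with I^2 = 50%, 8,000 with I^2 = 25%
print(penalized_effective_sample_size(6000, 0.50, 8000, 0.25))
```

With both I^{2} values set to zero, the function reduces to the uncorrected effective indirect sample size.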
One immediate limitation of the above-proposed sample size heterogeneity correction is the fact that I^{2} estimates are typically unreliable and unstable in metaanalyses including a limited number of trials and will depend on the effect metric used [34-36]. In most cases, it will therefore be preferable to simply assume some plausible degree (percentage) of heterogeneity derived from a mix of clinical considerations and the I^{2} estimate at hand. Typically, an assumption of 25% or 50% heterogeneity will be reasonable in the context of sample size considerations [22]. Table 2 illustrates the effective sample size for various indirect comparison scenarios including a total of 10,000 patients, under different combinations of heterogeneity corrections (one including no correction).
Another limitation of the sample size approach is the inherent assumption that sample size is a good proxy for precision. This may not be true if there are some important differences in event rates (for binary data) or counts (count data), or if there are population differences in trials that result in notably different standard deviations, but not necessarily different effect estimates. To curb this limitation for binary data, one may for example choose to focus on the effective number of events. A more universally applicable approach focuses on a measure called statistical information. We describe this below.
Method 3: the effective statistical information
Statistical information, also sometimes referred to as Fisher information, is a more complex statistical measure for gauging the degree of precision present in a data set. For pairwise metaanalyses, the statistical information is equal to the inverse of the pooled variance (i.e., one divided by the variance), which is also the measure of precision [37]. For indirect comparisons, the statistical information (precision) is equal to the inverse of the pooled indirect variance. That is, with variances V_{AC} and V_{BC} for comparisons A vs. C and B vs. C, the indirect variance is V_{AC} + V_{BC}, so the indirect statistical information is 1/(V_{AC} + V_{BC}). Because the variance incorporates heterogeneity and is dependent on the number of trials and the sample size, no further calculations or adjustments are needed.
A major disadvantage of statistical information in metaanalysis is that it operates on a scale that readers without statistical training cannot easily grasp [38]. The statistical information may, however, be useful in the context of sample size requirements, since it is possible to calculate the required statistical information (analogous to the required sample size) and compare the actual statistical information present in an indirect comparison with such a yardstick for sufficient power.
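A small Python sketch may make the scale of statistical information more concrete. The pooled variances below are hypothetical values on the log relative risk scale, and the function names are illustrative only.

```python
def statistical_information(variance):
    """Statistical information (precision) of a pooled estimate."""
    return 1.0 / variance

def indirect_information(v_ac, v_bc):
    """Information of the indirect A-vs-B estimate through comparator C."""
    return 1.0 / (v_ac + v_bc)

# Hypothetical pooled variances for A vs C and B vs C
v_ac, v_bc = 0.010, 0.040
info_ac = statistical_information(v_ac)
info_bc = statistical_information(v_bc)
info_ab = indirect_information(v_ac, v_bc)
print(info_ac, info_bc, info_ab)
```

In this example the indirect information is smaller than the information of either contributing comparison, mirroring the behavior of the effective sample size.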
Strengths and limitations of the approaches
Each of the above approaches comes with strengths and limitations. These are outlined in Table 3.
Table 3. Strengths and limitations of the three approaches for gauging the effective degree of power and precision in indirect comparisons
Weak links
So far we have only described situations where the indirect estimate is obtained through one common comparator. However, it is worth noting that indirect evidence may often come from scenarios where the link between two treatments of interest must go through two or more comparators. For example, if we wish to compare A with D and the following three comparisons are available (A vs. B, B vs. C and C vs. D), then the link between A and D goes through both B and C. The indirect variance is now a sum of three (rather than two) direct variances, V_{AD} = V_{AB} + V_{BC} + V_{CD}. We would therefore expect comparably smaller precision. Calculating the effective number of trials in the two-common-comparator example, we find that with an equal number of trials, say 3:3:3, the precision of the indirect comparison corresponds to that of only one direct trial, that is, an exact precision ratio of 1:9. With a more unbalanced number of trials, say 8:1:8, we get a precision ratio of approximately 1:21.
Such consistently large direct to indirect precision ratios indicate that indirect comparisons with two (or more) comparators in the link will typically add very little precision. For this reason we dub them ‘weak links.’ In the context of combining direct and indirect evidence as we address below, it seems that weak links will typically only add a negligibly small amount of precision to the final estimate and thus, for simplicity, may be ignored for sample size and power considerations in indirect comparisons and network metaanalysis.
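The weak-link ratios can be checked with a brief Python sketch. It assumes, as in Method 1, equal trial variances and no heterogeneity, so that the variance of each link is inversely proportional to its trial count; the function names are hypothetical.

```python
def effective_trials_through_chain(trial_counts):
    """Effective number of head-to-head trials for a chain of comparisons.

    trial_counts holds the number of trials in each link,
    e.g. [3, 3, 3] for A vs B, B vs C and C vs D.
    """
    return 1.0 / sum(1.0 / k for k in trial_counts)

def chain_precision_ratio(trial_counts):
    """Indirect trials needed per head-to-head trial for the whole chain."""
    return sum(trial_counts) / effective_trials_through_chain(trial_counts)

for chain in ([3, 3, 3], [8, 1, 8]):
    print(chain,
          round(effective_trials_through_chain(chain), 2),
          round(chain_precision_ratio(chain), 2))
```

Under these assumptions the 8:1:8 chain yields 21.25 indirect trials per head-to-head trial, close to the 1:21 ratio quoted in the text, and the single-trial link dominates the loss of precision.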
Effective sample size in treatment networks
The three-treatment loop
The simplest example of a combination of direct and indirect evidence is the three-treatment loop where precision is added to the comparison of A and B by borrowing strength from an indirect comparison based on some common comparator C. Whether our measure of precision (information) is the number of trials, the sample size or the statistical information, the total amount of precision available for a particular comparison in a three-treatment loop is conceptually the sum of information in the direct evidence and in the indirect evidence.
To calculate the effective number of trials informing a particular comparison (e.g., A vs. B) in the combined evidence, we simply add the effective number of trials in the indirect evidence to the number of trials in the head-to-head evidence.
To calculate the effective number of patients informing a particular comparison, we simply add the effective number of patients in the indirect evidence to the number of patients in the head-to-head evidence. If heterogeneity adjustments have been applied for the indirect evidence, a similar adjustment should be applied to the direct evidence (we illustrate this in our worked example below).
To calculate the statistical information informing a particular comparison, we simply take the sum of the inverse indirect variance and the inverse direct variance. Alternatively, the statistical information may be extracted directly from the statistical software package used to run the network metaanalysis. (Note that if multi-arm trials are included in the analysis, some adjustments for correlations are needed).
Multiple sources of evidence
In many treatment networks two or more sources of indirect evidence informing some (or all) of the comparisons may exist. For example, two active treatments A and B may both have been compared to standard-of-care and placebo. In this case, indirect evidence exists from two sources. Extending the three-treatment example above, estimating the total effective amount of information is simply a task of summing all indirect and direct evidence.
In a similar situation, multiple sources of indirect evidence may exist where no direct evidence exists. In this case, the effective number of trials, sample size or statistical information is simply obtained by summing all indirect information.
We previously discussed ‘weak links.’ Since these add a relatively small amount of information, one may choose to ignore them without notable loss of information (and with simpler calculations). This goes for both situations where direct evidence and indirect evidence are being combined, and where there are multiple sources of indirect evidence.
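The summation across sources can be sketched as follows in Python. The sketch assumes that each indirect source passes through a single common comparator (weak links are ignored) and that the effective indirect sample size takes the product-over-sum form; the function names and the patient counts are hypothetical.

```python
def effective_indirect_sample_size(n_ac, n_bc):
    """Effective sample size of one indirect source (single common comparator)."""
    return n_ac * n_bc / (n_ac + n_bc)

def total_effective_sample_size(n_direct, indirect_sources):
    """Direct patients plus the effective patients from every indirect source.

    indirect_sources is a list of (n_ac, n_bc) pairs, one per common comparator.
    Use n_direct = 0 when no head-to-head evidence exists.
    """
    return n_direct + sum(effective_indirect_sample_size(a, b)
                          for a, b in indirect_sources)

# Hypothetical network: 1,000 head-to-head patients and two common comparators
total = total_effective_sample_size(1000, [(500, 500), (1000, 10000)])
print(round(total))
```

Any heterogeneity penalization would be applied to the individual sample sizes before they are passed to these functions.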
Power and sample size requirements in network metaanalysis
Before getting into sample size and power considerations for indirect comparison metaanalysis and network metaanalysis, we first outline the already well-established framework for pairwise metaanalysis. We then extend the concept to indirect comparison metaanalysis and lastly to network metaanalysis.
Sample size requirements for direct metaanalysis
Several methodological studies have explored sample size and power considerations for direct (head-to-head) metaanalysis [14,15,19-24,26]. By now, it has been well established that the required sample size (i.e., the required number of patients) for a metaanalysis should be at least that of a large well-designed clinical trial [16]. Sample size calculations are derived from an a priori estimate of a treatment effect, d, that investigators wish to demonstrate; the associated variance around that treatment effect, V^{2}; and a maximum risk of type I error, α (i.e., maximum false-positive risk), and type II error, β (i.e., maximum false-negative risk). As a basis, one can use the required sample size corresponding to a large multicenter clinical trial as the required sample size for the head-to-head metaanalysis [16,19,20,23,24]:
N = C × (z_{1-α/2} + z_{1-β})^{2} × V^{2}/d^{2}
Here z_{1-α/2} and z_{1-β} are the (1-α/2)th and (1-β)th percentiles of a standard normal distribution, and C is a constant depending on the randomization ratio and number of treatment arms (C = 4 with a randomization ratio of 1:1 and two treatment arms).
If statistical heterogeneity exists across the included trials in a metaanalysis, one can adjust the calculated sample size to account for the additional variation (i.e., increased uncertainty) [21-24,28]. This is achieved by multiplying the required sample size, N, by a heterogeneity correction factor 1/(1 - H), where H has a similar interpretation as the well-known measure I^{2} (the percentage of variation in a metaanalysis explained by heterogeneity) and is the a priori or maximum acceptable degree of heterogeneity [21-24,28]. Empirical evidence and simulations have demonstrated that such adjustments perform well in maintaining the desired statistical power [21,22,24].
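The required sample size and its heterogeneity correction can be sketched in Python. The sketch assumes binary outcomes, with d taken as the difference between two event rates and V^{2} approximated by p(1 - p), where p is the average of the two rates; this variance choice is an assumption made here for illustration. The function name is hypothetical. With the event rates used in the worked example later in this article (22.5% and 26.0%), the sketch returns the 6,303 patients reported there.

```python
import math
from statistics import NormalDist

def required_sample_size(p_control, p_treatment, alpha=0.05, power=0.90, C=4, H=0.0):
    """Required metaanalysis sample size for a binary outcome.

    H is the assumed proportion of variation due to heterogeneity (0 to <1);
    the unadjusted sample size is multiplied by 1 / (1 - H).
    """
    d = p_treatment - p_control                    # effect to demonstrate
    p_bar = (p_control + p_treatment) / 2
    v2 = p_bar * (1 - p_bar)                       # assumed variance term
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = C * (z_alpha + z_beta) ** 2 * v2 / d ** 2
    return n / (1 - H)

print(math.ceil(required_sample_size(0.225, 0.26)))           # no heterogeneity
print(math.ceil(required_sample_size(0.225, 0.26, H=0.25)))   # 25% heterogeneity
```

Assuming 25% heterogeneity inflates the requirement by a factor of 1/0.75, that is, by one third.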
An alternative approach to dealing with heterogeneity is to calculate the required statistical information (also known as Fisher information) [17,25]. In pairwise metaanalysis the statistical information is simply the inverse of the pooled variance, that is, the pooled precision. The required statistical information resembles the required sample size.
A simulation study has demonstrated adequate performance of the required statistical information when the heterogeneity is modeled with certain Bayesian priors [17].
Information fractions and power in direct metaanalysis
At any point in a direct metaanalysis before the cumulative amount of evidence has surpassed the required sample size (or required statistical information), we can calculate two useful measures to gauge the strength of evidence. The first measure, the information fraction (IF), is the accrued number of patients, n (or statistical information), divided by the required sample size, N (or required statistical information) [19,23], that is, IF = n/N.
This measure gives us an idea of how far we have come and how far we still are from the yardstick, that is, our required sample size.
The second measure is a retrospective power calculation. Rearranging the expression of the required sample size, we can retrospectively estimate the power (1 - β) of the current data as
1 - β = Φ(-z_{1-α/2} + √[n × d^{2}/(C × V^{2})])
where Φ is the cumulative standard normal distribution function.
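Both measures can be computed with a few lines of Python. The sketch below assumes that the power expression is obtained by solving the required sample size formula for 1 - β, which neglects the very small probability of rejecting in the opposite direction. The function names and the inputs are hypothetical.

```python
import math
from statistics import NormalDist

def information_fraction(n_accrued, n_required):
    """Share of the required sample size that has been accrued."""
    return n_accrued / n_required

def retrospective_power(n_accrued, d, v2, alpha=0.05, C=4):
    """Power of the accrued evidence to detect effect d given variance term v2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(math.sqrt(n_accrued * d ** 2 / (C * v2)) - z_alpha)

# Hypothetical inputs: d = 0.05, V^2 = 0.20, 2,000 of 4,000 required patients accrued
print(information_fraction(2000, 4000))
print(round(retrospective_power(2000, 0.05, 0.20), 2))
```

When the accrued sample size equals the required sample size, the retrospective power equals the power used in the sample size calculation.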
Information fractions and power in indirect comparisons
So far we have described methods for estimating the effective sample size in indirect and combined evidence, as well as methods for estimating the required sample size and gauging the strength of evidence in pairwise metaanalyses. We now combine these measures to evaluate the strength of indirect comparisons and network metaanalyses. For the remainder of this article we concentrate on the number of patients, but our example and calculations can easily be carried out using the number of trials or statistical information.
We have introduced the effective sample size for indirect evidence. In line with its definition, we can derive the effective indirect information fraction and the effective power in an indirect comparison. The steps required to do this are as follows. First, calculate the effective indirect sample size. Second, calculate the required sample size for a direct metaanalysis. Third, to get the effective indirect information fraction, simply divide the effective number of patients in the indirect comparison by the required sample size for a direct comparison. Fourth, to calculate the power of the available indirect evidence, simply insert the effective indirect sample size as the n in the above formula.
Information fractions and power in network metaanalysis
The steps required to calculate the information fraction and power of treatment comparisons that are informed by both direct and indirect evidence or by multiple sources of indirect evidence are similar to the steps required for indirect comparisons. First, calculate the effective sample size for the comparison of interest by summing up the evidence from the available sources (we described how to do this above). Second, as before, calculate the required sample size for a direct metaanalysis. Third, as before, calculate the effective information fraction by dividing the effective number of patients by the required sample size. Fourth, calculate the power of the available evidence by inserting the effective sample size as the n in the above formula.
Results and discussion
Worked example – interventions for smoking cessation
A recent mixed treatment comparison (MTC) explored the efficacy of five different interventions for smoking cessation: low-dose nicotine replacement therapy (NRT; <22 mg nicotine patches), which is an over-the-counter intervention, and four newer and more expensive interventions, namely combination NRT (i.e., patch plus gum), high-dose NRT (>22 mg patches), bupropion and varenicline [31]. The published MTC provides a useful dataset to illustrate the issues we have raised in this article. Low-dose NRT is already well established and well known to yield about a 50% higher success rate than inert control interventions (i.e., relative risk of approximately 1.50). To consider any of the four newer treatments worthwhile, one could argue for the need to demonstrate at least an additional 20% efficacy. These considerations make up the basis for the required sample size considerations. Keep in mind that the published MTC has much higher rates of effective sample size than the assumptions we make in this example [31].
The median smoking cessation success rate at 6 months for patients in the inert control group is 15.0% across the 79 included trials reporting at this time point. Assuming a 1.5 relative risk, we would expect a 22.5% success rate with low-dose NRT. An additional 20% relative efficacy would suggest a 26.0% success rate (with any of the newer treatments). Accepting a maximum type I error of α = 5% and a power of (1 - β) = 90%, the required sample size to demonstrate that any of the newer interventions is at least 20% better than low-dose NRT is 6,303. To assess the strength of the evidence contained in the treatment network, we calculate the effective sample size (number of patients) for each of the newer treatments compared with low-dose NRT. We do this with and without taking heterogeneity into account. We subsequently calculate the information fraction and power based on the effective sample size. Figure 1 presents the number of trials, number of patients and degree of heterogeneity in the comparisons informed by head-to-head evidence. Figure 2 presents the sources of direct and indirect evidence for the four comparisons of interest (newer treatments vs. low-dose NRT).
Figure 1. The number of trials, number of patients and degree of heterogeneity (I²) for each comparison in the treatment network that is informed by head-to-head evidence.
Figure 2. The sources and strength of direct and indirect evidence (by crude sample size) for the four comparisons of newer treatments versus low-dose NRT. The thickened solid lines indicate the direct evidence for the comparison of interest. The thickened dashed lines indicate the indirect evidence that adds to the total effective sample size. The thickened dot-dashed lines indicate a (sparse) source of indirect evidence that can be ignored.
For low-dose NRT vs. combination NRT, the direct evidence includes 1,664 patients (and no heterogeneity). Indirect evidence exists with inert control as the common comparator. The comparison of low-dose NRT and inert control includes 19,929 patients, but with 63% heterogeneity, so the heterogeneity-penalized sample size is 19,929 × (1 − 0.63) = 7,374. The comparison of combination NRT vs. inert control includes 1,848 patients (and no heterogeneity). The effective indirect sample size without heterogeneity penalization (n_{indirect}) and the effective indirect sample size with heterogeneity penalization (n_{indirect-Pen}) are therefore
$$n_{\text{indirect}} = \frac{19{,}929 \times 1{,}848}{19{,}929 + 1{,}848} = 1{,}691$$
and
$$n_{\text{indirect-Pen}} = \frac{7{,}374 \times 1{,}848}{7{,}374 + 1{,}848} = 1{,}478$$
Adding these to the direct evidence sample size, we get effective total sample sizes of
$$n_{\text{total}} = 1{,}664 + 1{,}691 = 3{,}355$$
and
$$n_{\text{total-Pen}} = 1{,}664 + 1{,}478 = 3{,}142$$
These two total effective sample sizes correspond to information fractions of 53% and 50% (i.e., about half of the required sample size has been accumulated) and statistical power of 66% and 63%, respectively. Table 4 presents the direct, effective indirect and total effective sample sizes and the corresponding information fractions and statistical power for all four comparisons of the newer treatments vs. low-dose NRT. The calculations for the remaining three comparisons are presented in Appendix 2.
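The step from an effective sample size to an information fraction and a power estimate can be sketched as follows. The information fraction is the effective sample size divided by the required sample size (6,303). For the power, the sketch assumes the usual normal approximation, power = Φ(√IF × (z_{α/2} + z_{β}) − z_{α/2}), where z_{β} is the quantile for the 90% design power; this expression is an assumption, not a formula quoted from the text. The inputs 3,355 and 3,142 are the direct sample size (1,664) plus the effective indirect sample sizes, without and with the heterogeneity penalty.

```python
from math import sqrt
from statistics import NormalDist


def information_fraction(n_effective, n_required):
    """Share of the required sample size that has been accumulated."""
    return n_effective / n_required


def retrospective_power(n_effective, n_required, alpha=0.05, design_power=0.90):
    """Approximate power at n_effective, given that n_required yields design_power.

    Normal approximation; the negligible opposite-tail rejection probability is ignored.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(design_power)
    fraction = information_fraction(n_effective, n_required)
    return NormalDist().cdf(sqrt(fraction) * (z_alpha + z_beta) - z_alpha)


# Low-dose NRT vs. combination NRT, without and with heterogeneity penalization.
for n_total in (3355, 3142):
    frac = information_fraction(n_total, 6303)
    power = retrospective_power(n_total, 6303)
    print(f"n = {n_total}: information fraction {frac:.0%}, power {power:.0%}")
```

Under these assumptions the output agrees with the percentages reported for this comparison (53% and 50% information fraction, 66% and 63% power).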
Table 4. The effective sample sizes and corresponding information fractions and power estimates from the four comparisons of newer treatments vs. low-dose NRT
In the network meta-analysis of interventions for smoking cessation, high-dose NRT and varenicline were both significantly better than low-dose NRT and demonstrated effect estimates larger than the a priori considered minimally important difference. For these comparisons, high-dose NRT is supported by 88% power, and varenicline is supported by 76% power. Considering that the estimated effects of high-dose NRT and varenicline over low-dose NRT are much larger than the 20% increase that was assumed for these calculations, the true power of these comparisons is also much higher. For example, if the effect actually estimated in the MTC for varenicline (i.e., a 38% increase in smoking cessation compared with low-dose NRT) is used, the statistical power to detect this difference exceeds 99%. Combination NRT and bupropion both yielded effect estimates very similar to low-dose NRT. Considering that the two are supported by 66% and 95% power, respectively, and that neither of the two effect estimates appears superior, it should be reasonable to infer that the two interventions do not offer any noteworthy benefit over low-dose NRT.
Conclusions
In this article we have outlined available methods for gauging the strength of the evidence in a network meta-analysis using sample size and power considerations. We recommend sample size considerations in the context of the number of patients, as the required calculations are relatively straightforward and will resonate well with most clinicians and decision-makers. The methods we have outlined are of high value to regulatory agencies and decision-makers who must assess the strength of the evidence supporting comparative effectiveness estimates.
Appendix
1.a Calculating the effective number of trials
Consider the situation where three treatments, A, B and C, have been compared head to head in randomized clinical trials. For any one trial, assume that the estimated treatment effect has variance v. For a meta-analysis of 2k trials, the inverse variance approach would produce an estimated variance of the pooled treatment effect of v/2k. By the expected variance of an indirect comparison, if we have two comparisons each including k trials, we would expect an indirect variance estimate of v/k + v/k = 2v/k. Letting R denote the ratio of the precision of the indirect evidence to that of the direct evidence, we can derive R as follows:
$$R = \frac{v/2k}{2v/k} = \frac{1}{4}$$
That is, in the scenario where the number of trials is equal in the two comparisons informing the indirect comparison (and the other assumptions above are met), it would require four trials in the indirect evidence to produce the same precision as that corresponding to a single head-to-head trial. We can generalize this ratio to the situation where the number of trials is not equal in the two comparisons informing the indirect evidence. Let k_{AC} and k_{BC} be the number of trials informing the comparison of A vs. C and B vs. C, respectively. For a single meta-analysis with k_{AC} + k_{BC} trials we would expect a variance of the pooled effect of v/(k_{AC} + k_{BC}). Moreover, we would expect a variance from the indirect comparison of v/k_{AC} + v/k_{BC}. Proceeding as above we then have
$$R = \frac{v/(k_{AC} + k_{BC})}{v/k_{AC} + v/k_{BC}} = \frac{k_{AC}\,k_{BC}}{(k_{AC} + k_{BC})^{2}}$$
This formula creates the basis for the results presented in Table 1.
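As a minimal sketch, the precision ratio that follows from the two variances above, R = k_{AC} k_{BC} / (k_{AC} + k_{BC})², can be evaluated for any split of trials between the two comparisons. The function names are illustrative, and the reciprocal 1/R is read here as the number of indirect-evidence trials needed to match the precision of one head-to-head trial, which is an interpretation of the text rather than a quoted definition. No values from Table 1 are reproduced.

```python
from fractions import Fraction


def precision_ratio(k_ac, k_bc):
    """R: precision of the indirect estimate relative to a direct meta-analysis
    of k_ac + k_bc trials, assuming every trial has the same variance."""
    return Fraction(k_ac * k_bc, (k_ac + k_bc) ** 2)


def indirect_trials_per_direct_trial(k_ac, k_bc):
    """1/R: indirect-evidence trials needed to match one head-to-head trial."""
    return 1 / precision_ratio(k_ac, k_bc)


# Equal split: R = 1/4, so four indirect trials match one head-to-head trial.
print(precision_ratio(5, 5))                   # 1/4
# Unequal split in a 1:2 ratio: the indirect evidence is less efficient still.
print(indirect_trials_per_direct_trial(1, 2))  # 9/2
```

Exact fractions are used instead of floats so that the ratios print in the same form as the algebra.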
1.b Calculating the effective number of patients
Consider the situation where three treatments, A, B and C, have been compared head to head in randomized clinical trials. Assume that the population variance of comparative treatment effects is the same for A vs. B, A vs. C and B vs. C, and assume that a fixed-effect pairwise meta-analysis can be regarded as a large, well-designed clinical trial. Let n_{AB}, n_{AC} and n_{BC} denote the meta-analysis sample size (total number of patients) for the three comparisons A vs. B, A vs. C and B vs. C, respectively.
We are interested in finding the ratio between the variance of the direct meta-analysis pooled treatment effect estimate and the variance of the indirect meta-analysis pooled treatment effect estimate. Let R denote this ratio, and let σ_{AB}^{2}, σ_{AC}^{2} and σ_{BC}^{2} denote the population variances for the three comparisons (where we assume σ_{AB}^{2} = σ_{AC}^{2} = σ_{BC}^{2} = σ^{2}). For a direct meta-analysis including n_{AC} + n_{BC} patients we would expect a variance of σ^{2}/(n_{AC} + n_{BC}), whereas the indirect comparison has variance σ^{2}/n_{AC} + σ^{2}/n_{BC}. Then we have
$$R = \frac{\sigma^{2}/(n_{AC} + n_{BC})}{\sigma^{2}/n_{AC} + \sigma^{2}/n_{BC}} = \frac{n_{AC}\,n_{BC}}{(n_{AC} + n_{BC})^{2}}$$
Thus, by multiplying this ratio by the total indirect sample size (n_{AC} + n_{BC}), we find that the formula for the effective indirect sample size is
$$n_{\text{indirect}} = R \times (n_{AC} + n_{BC}) = \frac{n_{AC}\,n_{BC}}{n_{AC} + n_{BC}}$$
When heterogeneity exists for one or both of the comparisons in the indirect evidence, one can penalize the sample size by multiplying by the ‘lack of homogeneity,’ similar to what is done for a heterogeneity correction of a required meta-analysis sample size. With estimates of the percentage of variation in the meta-analysis due to between-trial heterogeneity for A vs. C, I_{AC}^{2}, and for B vs. C, I_{BC}^{2}, we can derive penalized sample sizes within each comparison
$$n_{AC\text{-Pen}} = n_{AC} \times (1 - I_{AC}^{2}), \qquad n_{BC\text{-Pen}} = n_{BC} \times (1 - I_{BC}^{2})$$
and subsequently use these penalized sample sizes in the formula for the effective indirect sample size.
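The two steps described in this section, penalizing each comparison by its heterogeneity and then combining the two sample sizes, can be sketched in a few lines. The sketch assumes the effective indirect sample size n_{AC} n_{BC} / (n_{AC} + n_{BC}) implied by the variance ratio above and a penalty factor of (1 − I²); the function names are illustrative. The usage example takes the low-dose NRT vs. combination NRT inputs from the main text.

```python
def penalized_sample_size(n, i_squared):
    """Shrink a sample size by the lack of homogeneity, n * (1 - I^2)."""
    if not 0.0 <= i_squared < 1.0:
        raise ValueError("I^2 must lie in [0, 1)")
    return n * (1 - i_squared)


def effective_indirect_sample_size(n_ac, n_bc, i2_ac=0.0, i2_bc=0.0):
    """Effective number of patients contributed by the indirect path A-C-B.

    With both I^2 values left at 0 this is the unpenalized n_ac * n_bc / (n_ac + n_bc).
    """
    n_ac_pen = penalized_sample_size(n_ac, i2_ac)
    n_bc_pen = penalized_sample_size(n_bc, i2_bc)
    return n_ac_pen * n_bc_pen / (n_ac_pen + n_bc_pen)


# Low-dose NRT vs. inert control: 19,929 patients, I^2 = 63%.
# Combination NRT vs. inert control: 1,848 patients, no heterogeneity.
print(round(effective_indirect_sample_size(19929, 1848)))              # 1691
print(round(effective_indirect_sample_size(19929, 1848, i2_ac=0.63)))  # 1478
```

Note that the effective indirect sample size can never exceed the smaller of the two contributing sample sizes, which is why a sparse comparison on either side of the common comparator limits the indirect evidence.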
2. Information fraction and power calculations – worked example
For low-dose NRT vs. high-dose NRT, the direct evidence includes 3,605 patients (and no heterogeneity). Indirect evidence exists with inert control as the common comparator. The comparison of low-dose NRT and inert control includes 19,929 patients, but with 63% heterogeneity, so the heterogeneity-penalized sample size is 19,929 × (1 − 0.63) = 7,373. The comparison of high-dose NRT vs. inert control includes 2,487 patients, but with 60% heterogeneity, so the heterogeneity-penalized sample size is 2,487 × (1 − 0.60) = 1,492. The effective sample sizes from this indirect comparison are therefore
$$n_{\text{indirect}} = \frac{19{,}929 \times 2{,}487}{19{,}929 + 2{,}487} = 2{,}211$$
and
$$n_{\text{indirect-Pen}} = \frac{7{,}373 \times 1{,}492}{7{,}373 + 1{,}492} = 1{,}241$$
A second indirect comparison with varenicline as the common comparator includes only 32 patients in one of the two involved comparisons. The effective sample size of this indirect comparison (n_{indirect} = 31) is so comparatively small that we choose to ignore it. Adding the above calculated indirect sample sizes to the direct evidence sample size, we get effective total sample sizes of
$$n_{\text{total}} = 3{,}605 + 2{,}211 = 5{,}816$$
and
$$n_{\text{total-Pen}} = 3{,}605 + 1{,}241 = 4{,}846$$
These total effective sample sizes correspond to information fractions of 92% and 60% and statistical power estimates of 88% and 72%, respectively.
For low-dose NRT vs. bupropion, no direct evidence exists. Indirect evidence exists through inert control as the common comparator. As above, the sample size for low-dose NRT vs. inert control is 19,929, or 7,373 if heterogeneity is penalized. The sample size for bupropion vs. inert control is 12,567, or 12,567 × (1 − 0.39) = 7,666 if heterogeneity is penalized. Therefore, the total effective sample sizes (which are equal to the effective indirect sample sizes) are
$$n_{\text{total}} = n_{\text{indirect}} = \frac{19{,}929 \times 12{,}567}{19{,}929 + 12{,}567} = 7{,}707$$
and
$$n_{\text{total-Pen}} = n_{\text{indirect-Pen}} = \frac{7{,}373 \times 7{,}666}{7{,}373 + 7{,}666} = 3{,}758$$
These total effective sample sizes correspond to information fractions of >100% and 60% and statistical power estimates of 95% and 71%, respectively.
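For low-dose NRT vs. varenicline, the direct evidence includes 740 patients (and no heterogeneity). As above, the sample size for low-dose NRT vs. inert control is 19,929, or 7,373 if heterogeneity is penalized. The sample size for varenicline vs. inert control is 4,331, or 4,331 × (1 − 0.69) = 1,343 if heterogeneity is penalized. Therefore, the effective indirect sample sizes are
$$n_{\text{indirect}} = \frac{19{,}929 \times 4{,}331}{19{,}929 + 4{,}331} = 3{,}558$$
and
$$n_{\text{indirect-Pen}} = \frac{7{,}373 \times 1{,}343}{7{,}373 + 1{,}343} = 1{,}136$$
and so the total effective sample sizes are
$$n_{\text{total}} = 740 + 3{,}558 = 4{,}298$$
and
$$n_{\text{total-Pen}} = 740 + 1{,}136 = 1{,}876$$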
All power and information fraction calculations above are geared to detect an assumed relative improvement in smoking cessation of 20%. All calculations are highly sensitive to the assumed relative improvement. In particular, assuming larger improvements would result in substantially larger power and information fraction estimates.
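The sensitivity to the assumed improvement can be illustrated with the varenicline comparison. The sketch below assumes the normal-approximation power for two proportions with a pooled variance, Φ(δ√(n / (4 p̄(1 − p̄))) − z_{α/2}), which is not a formula quoted from the text. It takes the unpenalized effective total sample size (740 direct patients plus the indirect evidence through inert control) and evaluates the power under two assumed success rates for the newer treatment: the 26.0% used in the worked example, and 22.5% × 1.38 = 31.05%, which is this sketch's own reading of the 38% increase estimated in the MTC.

```python
from math import sqrt
from statistics import NormalDist


def power_for_rates(n_total, p_control, p_experimental, alpha=0.05):
    """Approximate power of a two-sided test with n_total patients in total.

    Assumed normal approximation with pooled variance; the opposite-tail
    rejection probability is ignored.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    p_bar = (p_control + p_experimental) / 2
    delta = abs(p_experimental - p_control)
    z_effect = delta * sqrt(n_total / (4 * p_bar * (1 - p_bar)))
    return NormalDist().cdf(z_effect - z_alpha)


# Unpenalized effective total sample size for low-dose NRT vs. varenicline.
n_varenicline = 740 + 19929 * 4331 / (19929 + 4331)

power_assumed = power_for_rates(n_varenicline, 0.225, 0.260)
power_observed = power_for_rates(n_varenicline, 0.225, 0.225 * 1.38)
print(f"assumed improvement: {power_assumed:.0%}")
print(f"38% improvement:     {power_observed:.2%}")
```

Under these assumptions the first figure matches the 76% power reported for varenicline, and the second exceeds 99%.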
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
KT conceived the idea, mathematically derived the proposed formulas, drafted the first manuscript and performed all statistical analyses. EM contributed to the methodological development, writing of the manuscript and interpretation of results. Both authors read and approved the final manuscript.
References

1. Sutton AJ, Higgins JP: Recent developments in meta-analysis. Stat Med 2008, 27:625-650.
2. Ioannidis JP: Integration of evidence from multiple meta-analyses: a primer on umbrella reviews, treatment networks and multiple treatments meta-analyses. CMAJ 2009, 181:488-493.
3. Jansen JP, Crawford B, Bergman G, Stam W: Bayesian meta-analysis of multiple treatment comparisons: an introduction to mixed treatment comparisons. Value Health 2008, 11:956-964.
4. Lu G, Ades AE: Combination of direct and indirect evidence in mixed treatment comparisons. Stat Med 2004, 23:3105-3124.
5. Mills EJ, Bansback N, Ghement I, Thorlund K, Kelly S, Puhan MA, Wright J: Multiple treatment comparison meta-analyses: a step forward into complexity. Clin Epidemiol 2011, 3:193-202.
6. Salanti G, Higgins JP, Ades AE, Ioannidis JP: Evaluation of networks of randomized trials. Stat Methods Med Res 2008, 17:279-301.
7. Sutton A, Ades AE, Cooper N, Abrams K: Use of indirect and mixed treatment comparisons for technology assessment. Pharmacoeconomics 2008, 26:753-767.
8. Glenny AM, Altman DG, Song F, Sakarovitch C, Deeks JJ, D'Amico R, Bradburn M, Eastwood AJ: Indirect comparisons of competing interventions. Health Technol Assess 2005, 9:1-134, iii-iv.
9. Song F, Altman DG, Glenny AM, Deeks JJ: Validity of indirect comparison for estimating efficacy of competing interventions: empirical evidence from published meta-analyses. BMJ 2003, 326:472.
10. Song F, Loke YK, Walsh T, Glenny AM, Eastwood AJ, Altman DG: Methodological problems in the use of indirect comparisons for evaluating healthcare interventions: survey of published systematic reviews. BMJ 2009, 338:b1147.
11. Song F, Xiong T, Parekh-Bhurke S, Loke YK, Sutton AJ, Eastwood AJ, Holland R, Chen YF, Glenny AM, Deeks JJ, Altman DG: Inconsistency between direct and indirect comparisons of competing interventions: meta-epidemiological study. BMJ 2011, 343:d4909.
12. Higgins JPT, Green S (editors): Cochrane Handbook for Systematic Reviews of Interventions Version 5.0.2 [updated September 2009]. The Cochrane Collaboration; 2009.
13. Mills EJ, Ioannidis JPA, Thorlund K, Schünemann HJ, Puhan MA, Guyatt GH: How to use an article reporting a multiple treatment comparison meta-analysis. JAMA 2012, 308:1246-53.
14. Brok J, Thorlund K, Gluud C, Wetterslev J: Trial sequential analysis reveals insufficient information size and potentially false positive results in many meta-analyses. J Clin Epidemiol 2008, 61:763-769.
15. Brok J, Thorlund K, Wetterslev J, Gluud C: Apparently conclusive meta-analyses may be inconclusive–Trial sequential analysis adjustment of random error risk due to repetitive testing of accumulating data in apparently conclusive neonatal meta-analyses. Int J Epidemiol 2009, 38:287-298.
16. Guyatt GH, Oxman AD, Kunz R, Brozek J, Alonso-Coello P, Rind D, Devereaux PJ, Montori VM, Freyschuss B, Vist G, et al.: GRADE guidelines 6. Rating the quality of evidence-imprecision. J Clin Epidemiol 2011, 64:1283-1293.
17. Higgins JP, Whitehead A, Simmonds M: Sequential methods for random-effects meta-analysis. Stat Med 2011, 30:903-921.
18. Ioannidis J, Lau J: Evolution of treatment effects over time: empirical insight from recursive cumulative meta-analyses. Proc Natl Acad Sci USA 2001, 98:831-836.
19. Pogue J, Yusuf S: Overcoming the limitations of current meta-analysis of randomised controlled trials. Lancet 1998, 351:47-52.
20. Pogue JM, Yusuf S: Cumulating evidence from randomized trials: utilizing sequential monitoring boundaries for cumulative meta-analysis. Control Clin Trials 1997, 18:580-593; discussion 661-666.
21. Thorlund K, Devereaux PJ, Wetterslev J, Guyatt G, Ioannidis JP, Thabane L, Gluud LL, Als-Nielsen B, Gluud C: Can trial sequential monitoring boundaries reduce spurious inferences from meta-analyses? Int J Epidemiol 2009, 38:276-286.
22. Thorlund K, Imberger G, Walsh M, Chu R, Gluud C, Wetterslev J, Guyatt G, Devereaux PJ, Thabane L: The number of patients and events required to limit the risk of overestimation of intervention effects in meta-analysis–a simulation study. PLoS One 2011, 6:e25491.
23. Wetterslev J, Thorlund K, Brok J, Gluud C: Trial sequential analysis may establish when firm evidence is reached in cumulative meta-analysis. J Clin Epidemiol 2008, 61:64-75.
24. Wetterslev J, Thorlund K, Brok J, Gluud C: Estimating required information size by quantifying diversity in random-effects model meta-analyses. BMC Med Res Methodol 2009, 9:86.
25. van der Tweel I, Bollen C: Sequential meta-analysis: an efficient decision-making tool. Clin Trials 2010, 7:136-146.
26. Thorlund K, Anema A, Mills E: Interpreting meta-analysis according to the adequacy of sample size. An example using isoniazid chemoprophylaxis for tuberculosis in purified protein derivative negative HIV-infected individuals. Clin Epidemiol 2010, 2:57-66.
27. Pereira TV, Ioannidis JP: Statistically significant meta-analyses of clinical trials have modest credibility and inflated effects. J Clin Epidemiol 2011, 64:1060-1069.
28. Thorlund K, Engstrom J, Wetterslev J, Brok J, Imberger G, Gluud C: User manual for trial sequential analysis (TSA). Copenhagen Trial Unit; 2011.
29. Mills EJ, Ghement I, O'Regan C, Thorlund K: Estimating the power of indirect comparisons: a simulation study. PLoS One 2011, 6:e16237.
30. Lumley T: Network meta-analysis for indirect treatment comparisons. Stat Med 2002, 21:2313-2324.
31. Mills EJ, Wu P, Lockhart I, Thorlund K, Puhan MA, Ebbert JO: Comparison of high-dose and combination nicotine replacement therapy, varenicline, and bupropion for smoking cessation: A systematic review and multiple treatment meta-analysis. Ann Med 2012, 44:588-97.
32. Bucher HC, Guyatt GH, Griffith LE, Walter SD: The results of direct and indirect treatment comparisons in meta-analysis of randomized controlled trials. J Clin Epidemiol 1997, 50:683-691.
33. Higgins JP, Whitehead A: Borrowing strength from external trials in a meta-analysis. Stat Med 1996, 15:2733-2749.
34. Higgins JP, Thompson SG: Quantifying heterogeneity in a meta-analysis. Stat Med 2002, 21:1539-1558.
35. Borenstein M, Hedges L, Higgins JP: Introduction to meta-analysis. 1st edition. John Wiley and Sons, Chichester; 2009.
36. Thorlund K, Imberger G, Johnston B, Walsh M, Awad T, Thabane L, Gluud C, Devereaux PJ, Wetterslev J: Evolution of heterogeneity (I²) estimates and their 95% confidence intervals in large meta-analyses. PLoS One 2012, 7:e39471.
37. Whitehead A, Whitehead J: A general parametric approach to the meta-analysis of randomized clinical trials. Stat Med 1991, 10:1665-1677.
38. Thorlund K, Imberger G, Wetterslev J, Brok J, Gluud C: Comments on 'Sequential meta-analysis: an efficient decision-making tool' by I van der Tweel and C Bollen. Clin Trials 2010, 7:752-753; author reply 754.