spacepy.poppy.boots_ci
- spacepy.poppy.boots_ci(data, n, inter, func, seed=None, target=None, sample_size=None, usepy=False, nretvals=1)
Construct a bootstrap confidence interval.
The bootstrap is a statistical tool that uses many samples derived from the original data by resampling with replacement (called surrogates) to estimate a parameter of the population from which the data were drawn. This assumes that the sample is randomly drawn and is therefore representative of the underlying distribution. The benefit of the bootstrap is that it is non-parametric, so it can be applied where there is reasonable doubt about the characteristics of the underlying distribution. This routine uses the bootstrap for its most common application: the estimation of confidence intervals.
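The resampling idea can be sketched in plain NumPy. This is a minimal percentile bootstrap written for illustration, assuming that each surrogate is drawn with replacement and has the same length as the data, and that the interval is read off the percentiles of the surrogate statistics. The helper name percentile_bootstrap_ci is hypothetical; this is not poppy's own implementation.

```python
import numpy as np


def percentile_bootstrap_ci(data, n, inter, func, seed=None):
    """Hypothetical helper: percentile bootstrap CI (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    stats = np.empty(n)
    for i in range(n):
        # Each surrogate resamples the data with replacement,
        # keeping the original sample length.
        surrogate = rng.choice(data, size=data.size, replace=True)
        stats[i] = func(surrogate)
    # The central inter% of the surrogate statistics bounds the interval.
    tail = (100.0 - inter) / 2.0
    low, high = np.percentile(stats, [tail, 100.0 - tail])
    return low, high


sample = np.random.default_rng(42).lognormal(mean=5.1, sigma=0.3, size=3000)
ci_low, ci_high = percentile_bootstrap_ci(sample, 2000, 95, np.median, seed=1)
print(ci_low, np.median(sample), ci_high)
```

Because the surrogate statistics cluster around the statistic of the original sample, the printed median falls between the two bounds.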
- Parameters
- data : array_like
data to bootstrap
- n : int
number of surrogate series to draw, i.e. the number of bootstrap iterations
- inter : numerical
desired confidence level of the interval, in percent (e.g. 95)
- func : callable
Function to apply to each surrogate series
- sample_size : int
number of samples in each surrogate series; defaults to the length of data. Changing this alters the statistical properties of the bootstrap, so it should only be done for good reason.
- seed : int
Optional seed for the random number generator. If not specified, the numpy generator is not reseeded and the C generator is seeded from the clock.
- target : same as data
a ‘target’ value. If specified, the percentage confidence of being at or above this value is also calculated.
- nretvals : int
number of values returned by func
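The caution attached to sample_size can be illustrated with a small experiment. The sketch below uses hypothetical helpers, bootstrap_stats and ci_width, built on NumPy rather than on poppy itself, and compares the width of a 95% interval on the median when the surrogates are as long as the data and when they are ten times shorter. Shorter surrogates behave like a smaller sample, so the interval widens, here by roughly a factor of sqrt(10).

```python
import numpy as np


def bootstrap_stats(data, n, func, sample_size=None, seed=None):
    """Hypothetical helper: apply func to n surrogates drawn with replacement."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    # Default surrogate length is the length of the data.
    size = data.size if sample_size is None else sample_size
    return np.array([func(rng.choice(data, size=size, replace=True))
                     for _ in range(n)])


def ci_width(stats, inter=95.0):
    """Hypothetical helper: width of the central inter% percentile interval."""
    tail = (100.0 - inter) / 2.0
    low, high = np.percentile(stats, [tail, 100.0 - tail])
    return high - low


sample = np.random.default_rng(0).lognormal(mean=5.1, sigma=0.3, size=3000)
full = ci_width(bootstrap_stats(sample, 1000, np.median, seed=1))
small = ci_width(bootstrap_stats(sample, 1000, np.median, sample_size=300, seed=1))
print(f"width with sample_size=3000: {full:.2f}")
print(f"width with sample_size=300:  {small:.2f}")
```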
- Returns
- out : sequence of float
The inter percent confidence interval (lower and upper bounds) on the value obtained by applying func to the population sampled by data. If target is specified, the percentage confidence of being above that value is also returned.
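One plausible way to express "percentage confidence of being at or above target" is the percentage of surrogate statistics that are at or above the target value. The sketch below assumes that reading; the helper confidence_at_or_above is hypothetical and may not match poppy's exact computation.

```python
import numpy as np


def confidence_at_or_above(stats, target):
    """Hypothetical helper: percentage of surrogate statistics >= target."""
    stats = np.asarray(stats)
    return 100.0 * np.count_nonzero(stats >= target) / stats.size


rng = np.random.default_rng(3)
sample = rng.lognormal(mean=5.1, sigma=0.3, size=3000)
# Bootstrap distribution of the median (surrogates drawn with replacement).
medians = np.array([np.median(rng.choice(sample, size=sample.size))
                    for _ in range(2000)])
ci = np.percentile(medians, [2.5, 97.5])  # 95% interval
pct = confidence_at_or_above(medians, target=164.0)  # target chosen for illustration
print(ci, pct)
```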
Examples
Iteration 1:

>>> import numpy
>>> from spacepy import poppy
>>> data, n = numpy.random.lognormal(mean=5.1, sigma=0.3, size=3000), 4000
>>> myfunc = lambda x: numpy.median(x)
>>> ci_low, ci_high = poppy.boots_ci(data, n, 95, myfunc)
>>> ci_low, numpy.median(data), ci_high
(163.96354196633686, 165.2393331896551, 166.60491435416566)

Iteration 2 (the same commands, repeated):

(162.50379144492726, 164.15218265100233, 165.42840588032755)
For comparison, the median of a much larger sample:

>>> data = numpy.random.lognormal(mean=5.1, sigma=0.3, size=90000)
>>> numpy.median(data)
163.83888237895815
Note that the true value of the estimated quantity is expected to lie outside a 95% confidence interval in about one realization in 20. Here, the interval from the first iteration (163.96 to 166.60) does not contain the large-sample median of 163.84.
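The "one time in 20" statement can be checked by simulation: draw many independent samples, build a 95% percentile bootstrap interval on the median of each, and count how often the interval contains the known median exp(5.1) of this lognormal distribution. This sketch uses NumPy directly rather than poppy, and the sample size, number of surrogates and number of trials are assumptions chosen to keep it fast; the empirical coverage should come out close to 0.95.

```python
import numpy as np

rng = np.random.default_rng(7)
true_median = np.exp(5.1)  # median of lognormal(mean=5.1, sigma=0.3)
trials, n_boot, size = 200, 500, 200  # assumed, smaller than the example above
hits = 0
for _ in range(trials):
    sample = rng.lognormal(mean=5.1, sigma=0.3, size=size)
    # Build all surrogates at once: each row indexes the sample with replacement.
    idx = rng.integers(0, size, size=(n_boot, size))
    medians = np.median(sample[idx], axis=1)
    low, high = np.percentile(medians, [2.5, 97.5])
    if low <= true_median <= high:
        hits += 1
coverage = hits / trials
print(f"empirical coverage of the 95% interval: {coverage:.2f}")
```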
For the lognormal distribution, the median is exactly the exponential of the “mean” parameter (the mean of the underlying normal distribution). The theoretical median here is therefore exp(5.1) = 164.022 (to 6 significant figures), which is well captured by the bootstrap confidence intervals above.
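The median result follows because if X = exp(Y) with Y normally distributed with mean mu, then P(X <= exp(mu)) = P(Y <= mu) = 1/2. A quick numerical check, assuming only the standard library and NumPy, compares exp(mu) with the median of a large simulated sample:

```python
import math

import numpy as np

mu, sigma = 5.1, 0.3
theoretical_median = math.exp(mu)
print(f"{theoretical_median:.6g}")  # 164.022

# Median of a large simulated sample, for comparison.
sample = np.random.default_rng(11).lognormal(mean=mu, sigma=sigma, size=90000)
print(np.median(sample))
```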