spacepy.toolbox¶

Toolbox of various functions and generic utilities.

Authors: Steve Morley, Jon Niehof, Brian Larsen, Josef Koller, Dan Welling
Institution: Los Alamos National Laboratory
Contact: smorley@lanl.gov, jniehof@lanl.gov, balarsen@lanl.gov, jkoller@lanl.gov, dwelling@lanl.gov

Functions

`arraybin`(array, bins)	Split a sequence into subsequences based on value.
`assemble`(fln_pattern, outfln[, sortkey, verbose])	assembles all pickled files matching fln_pattern into single file and save as outfln.
`binHisto`(data[, verbose])	Calculates bin width and number of bins for a histogram using the Freedman-Diaconis rule; if the rule fails, defaults to the square-root method
`bin_center_to_edges`(centers)	Convert a list of bin centers to their edges
`bin_edges_to_center`(edges)	Convert a list of bin edges to their centers
`bootHisto`(data[, inter, n, seed, plot, ...])	Bootstrap confidence intervals for a histogram.
`dictree`(in_dict[, verbose, spaces, levels, ...])	pretty print a dictionary tree
`dist_to_list`(func, length[, min, max])	Convert a probability distribution function to a list of values
`do_with_timeout`(timeout, target, *args, **kwargs)	Execute a function (or method) with a timeout.
`eventTimer`(Event, Time1)	Times an event then prints out the time and the name of the event, nice for debugging and seeing that the code is progressing
`geomspace`(start[, ratio, stop, num])	Returns geometrically spaced numbers.
`getNamedPath`(name)	Return the full path of a parent directory with name as the leaf
`get_url`(url[, outfile, reporthook, cached, ...])	Read data from a URL
`human_sort`(l)	Sort the given list in the way that humans expect.
`hypot`(*args)	compute the N-dimensional hypot of an iterable or many arguments
`indsFromXrange`(inxrange)	return the start and end indices implied by a range, useful when range is zero-length
`interpol`(newx, x, y[, wrap])	1-D linear interpolation with interpolation of hours/longitude
`interweave`(a, b)	given two array-like variables interweave them together.
`intsolve`(func, value[, start, stop, maxit])	Find the function input such that definite integral is desired value.
`isview`(array1[, array2])	Returns if an object is a view of another object.
`linspace`(min, max, num, **kwargs)	Returns linear-spaced bins.
`loadpickle`(fln)	load a pickle and return content as dictionary
`logspace`(min, max, num, **kwargs)	Returns log-spaced bins.
`medAbsDev`(series[, scale])	Calculate median absolute deviation of a given input series
`mlt2rad`(mlt[, midnight])	Convert MLT values to radians for polar plotting; angles run from -pi to pi, referenced from noon by default
`normalize`(vec[, low, high])	Given an input vector normalize the vector to a given range
`pmm`(*args)	print min and max of input arrays
`poisson_fit`(data[, initial, method])	Fit a Poisson distribution to data using the method and initial guess provided.
`progressbar`(count, blocksize, totalsize[, text])	print a progress bar with urllib.request.urlretrieve reporthook functionality
`query_yes_no`(question[, default])	Ask a yes/no question via raw_input() and return their answer.
`rad2mlt`(rad[, midnight])	Convert radian values (from -pi to pi) to MLT, referenced from noon by default
`savepickle`(fln, dict[, compress])	save dictionary variable dict to a pickle with filename fln
`tCommon`(ts1, ts2[, mask_only])	Finds the elements in a list of datetime objects present in another
`tOverlap`(ts1, ts2, *args, **kwargs)	Finds the overlapping elements in two lists of datetime objects
`tOverlapHalf`(ts1, ts2[, presort])	Find overlapping elements in two lists of datetime objects
`thread_job`(job_size, thread_count, target, ...)	Split a job into subjobs and run a thread for each
`thread_map`(target, iterable[, thread_count])	Apply a function to every element of a list, in separate threads
`timeout_check_call`(timeout, *args, **kwargs)	Call a subprocess with a timeout.
`unique_columns`(inval[, axis])	Given a multidimensional input return the unique rows or columns along the given axis.
`update`([all, QDomni, omni, omni2, leapsecs, ...])	Download and update local database for omni, leapsecs etc
`windowMean`(data[, time, winsize, overlap, ...])	Windowing mean function, window overlap is user defined

Classes

LinkExtracter(*[, convert_charrefs])

Finds all links in a HTML page, useful for crawling.

Exceptions

TimeoutError

Raised when a time-limited process times out

spacepy.toolbox.arraybin(array, bins)[source]¶

Split a sequence into subsequences based on value.

Given a sequence of values and a sequence of values representing the division between bins, return the indices grouped by bin.

Parameters:

arrayarray_like

the input sequence to slice, must be sorted in ascending order

binsarray_like

dividing lines between bins. The number of bins is len(bins)+1; values that exactly equal a dividing value are assigned to the higher bin

Returns:

outlist: indices for each bin (list of lists)

Examples

>>> import spacepy.toolbox as tb
>>> tb.arraybin(range(10), [4.2])
[[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
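
The grouping rule described above can be sketched in pure Python. This is only an illustration of the documented behavior, not SpacePy's implementation; the name `arraybin_sketch` is hypothetical, and it assumes that both the input and the dividers are sorted in ascending order.

```python
from bisect import bisect_left


def arraybin_sketch(array, bins):
    """Group the indices of a sorted sequence by the dividers in ``bins``."""
    values = list(array)
    # bisect_left gives the first index whose value is >= the divider, so a
    # value exactly equal to a divider lands in the higher bin.
    splits = [0] + [bisect_left(values, edge) for edge in bins] + [len(values)]
    return [list(range(lo, hi)) for lo, hi in zip(splits[:-1], splits[1:])]


print(arraybin_sketch(range(10), [4.2]))
```

With one divider there are two bins, matching the len(bins)+1 rule stated under Parameters.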

spacepy.toolbox.assemble(fln_pattern, outfln, sortkey='ticks', verbose=True)[source]¶

Assembles all pickled files matching fln_pattern into a single file and saves it as outfln. The pattern may contain simple shell-style wildcards (* and ?), a la fnmatch. Files are assembled along the time axis given by the Ticktock (key: ‘ticks’) in the dictionary. If sortkey = None, nothing is sorted.

Parameters:

fln_patternstring: pattern to match filenames
outflnstring: filename to save combined files to

Returns:

outdict: dictionary with combined values

Examples

>>> import spacepy.toolbox as tb
>>> a, b, c = {'ticks':[1,2,3]}, {'ticks':[4,5,6]}, {'ticks':[7,8,9]}
>>> tb.savepickle('input_files_2001.pkl', a)
>>> tb.savepickle('input_files_2002.pkl', b)
>>> tb.savepickle('input_files_2004.pkl', c)
>>> a = tb.assemble('input_files_*.pkl', 'combined_input.pkl')
('adding ', 'input_files_2001.pkl')
('adding ', 'input_files_2002.pkl')
('adding ', 'input_files_2004.pkl')
('\n writing: ', 'combined_input.pkl')
>>> print(a)
{'ticks': array([1, 2, 3, 4, 5, 6, 7, 8, 9])}

spacepy.toolbox.binHisto(data, verbose=False)[source]¶

Calculates bin width and number of bins for a histogram using the Freedman-Diaconis rule; if the rule fails, defaults to the square-root method

The Freedman-Diaconis method is detailed in: Freedman, D., and P. Diaconis (1981), On the histogram as a density estimator: L2 theory, Z. Wahrscheinlichkeitstheor. Verw. Geb., 57, 453–476
and is also described by: Wilks, D. S. (2006), Statistical Methods in the Atmospheric Sciences, 2nd ed.

Parameters:

dataarray_like: list/array of data values
verboseboolean (optional): print out some more information

Returns:

outtuple: calculated width of bins using F-D rule, number of bins (nearest integer) to use for histogram

See also

matplotlib.pyplot.hist

Examples

>>> import numpy, spacepy
>>> import matplotlib.pyplot as plt
>>> numpy.random.seed(8675301)
>>> data = numpy.random.randn(1000)
>>> binw, nbins = spacepy.toolbox.binHisto(data)
>>> print(nbins)
19
>>> p = plt.hist(data, bins=nbins, histtype='step', density=True)
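
For orientation, the Freedman-Diaconis rule sets the bin width to 2 * IQR * n^(-1/3), where IQR is the inter-quartile range and n the number of points. The sketch below is a minimal stand-alone illustration, not the SpacePy source: the names `percentile_sketch` and `fd_bins_sketch` are hypothetical, and treating a zero IQR as the case where "the rule fails" is an assumption.

```python
import math


def percentile_sketch(sorted_vals, q):
    """Linearly interpolated percentile (q in 0..100) of pre-sorted data."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def fd_bins_sketch(data):
    """Return (bin width, number of bins) for a histogram of ``data``."""
    vals = sorted(data)
    n = len(vals)
    span = vals[-1] - vals[0]
    iqr = percentile_sketch(vals, 75) - percentile_sketch(vals, 25)
    if iqr > 0:
        # Freedman-Diaconis: width = 2 * IQR * n**(-1/3)
        width = 2.0 * iqr * n ** (-1.0 / 3.0)
        nbins = max(1, round(span / width))
    else:
        # Assumed fallback: square-root rule, nbins = sqrt(n)
        nbins = max(1, round(math.sqrt(n)))
        width = span / nbins
    return width, nbins


print(fd_bins_sketch(range(101)))
```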

spacepy.toolbox.bin_center_to_edges(centers)[source]¶

Convert a list of bin centers to their edges

Given a list of center values for a set of bins, finds the start and end value for each bin. (start of bin n+1 is assumed to be end of bin n). Useful for e.g. matplotlib.pyplot.pcolor.

Edge between bins n and n+1 is arithmetic mean of the center of n and n+1; edge below bin 0 and above last bin are established to make these bins symmetric about their center value.

Parameters:

centerslist: list of center values for bins

Returns:

outlist: list of edges for bins
note: returned list will be one element longer than centers

Examples

>>> import spacepy.toolbox as tb
>>> tb.bin_center_to_edges([1,2,3])
[0.5, 1.5, 2.5, 3.5]
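
The edge rule described above (interior edges at the midpoint of neighboring centers, outer edges mirrored so the end bins are symmetric) can be written out as a short sketch. The function name `centers_to_edges_sketch` is hypothetical, and the sketch assumes at least two centers.

```python
def centers_to_edges_sketch(centers):
    """Return len(centers) + 1 bin edges for a list of bin centers."""
    # Interior edges: arithmetic mean of adjacent centers.
    inner = [(a + b) / 2.0 for a, b in zip(centers[:-1], centers[1:])]
    # Outer edges: mirror the nearest interior edge about the end center.
    first = centers[0] - (inner[0] - centers[0])
    last = centers[-1] + (centers[-1] - inner[-1])
    return [first] + inner + [last]


print(centers_to_edges_sketch([1, 2, 3]))
```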

spacepy.toolbox.bin_edges_to_center(edges)[source]¶

Convert a list of bin edges to their centers

Given a list of edge values for a set of bins, finds the center of each bin. (start of bin n+1 is assumed to be end of bin n).

Center of bin n is the arithmetic mean of the lower and upper edges of that bin.

Parameters:

edgeslist: list of edge values for bins

Returns:

outnumpy.ndarray: array of centers for bins
note: returned array will be one element shorter than edges

Examples

>>> import spacepy.toolbox as tb
>>> tb.bin_edges_to_center([1,2,3])
array([1.5, 2.5])

spacepy.toolbox.bootHisto(data, inter=90.0, n=1000, seed=None, plot=False, target=None, figsize=None, loc=None, **kwargs)[source]¶

Bootstrap confidence intervals for a histogram.

All other keyword arguments are passed to numpy.histogram() or matplotlib.pyplot.bar().

Changed in version 0.2.3: This argument pass-through did not work in earlier versions of SpacePy.

Parameters:

dataarray_like: list/array of data values
interfloat (optional; default 90): percentage confidence interval to return. Default 90% (i.e. lower CI will be 5% and upper will be 95%)
nint (optional; default 1000): number of bootstrap iterations
seedint (optional): Optional seed for the random number generator. If not specified, the numpy generator will not be reseeded.
plotbool (optional): Plot the result. Plots if True or target, figsize, or loc specified.
target(optional): Target on which to plot the figure (figure or axes). See spacepy.plot.utils.set_target() for details.
figsizetuple (optional): Passed to spacepy.plot.utils.set_target().
locint (optional): Passed to spacepy.plot.utils.set_target().

Returns:

outtuple: tuple of bin_edges, low, high, sample[, bars]. Where bin_edges is the edges of the bins used; low is the histogram with the value for each bin from the bottom of that bin’s confidence interval; high similarly for the top; sample is the histogram of the input sample without resampling. If plotting, also returned is bars, the container object returned from matplotlib.

See also

binHisto
spacepy.plot.utils.set_target
numpy.histogram
matplotlib.pyplot.hist

Notes

Added in version 0.2.1.

The confidence intervals are calculated for each bin individually and thus the resulting low/high histograms may not have actually occurred in the calculation from the surrogates. If using a probability density histogram, this can have “interesting” implications for interpretation.
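
The per-bin procedure in the note above can be illustrated with a small standard-library sketch. It is not SpacePy's implementation: `histogram_sketch` and `boot_histo_sketch` are hypothetical names, the bin edges are supplied explicitly, and the interval bounds are picked by rounding to the nearest rank among the surrogates, which is an assumption.

```python
import random


def histogram_sketch(data, edges):
    """Count values per bin; the last bin includes its right edge."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= x < edges[i + 1] or (is_last and x == edges[-1]):
                counts[i] += 1
                break
    return counts


def boot_histo_sketch(data, edges, inter=90.0, n=1000, seed=None):
    """Return (low, high, sample) histograms from n bootstrap resamples."""
    rng = random.Random(seed)
    sample = histogram_sketch(data, edges)
    # Resample with replacement and histogram each surrogate data set.
    surrogates = [histogram_sketch(rng.choices(data, k=len(data)), edges)
                  for _ in range(n)]
    lo_idx = int(round((n - 1) * (50.0 - inter / 2.0) / 100.0))
    hi_idx = int(round((n - 1) * (50.0 + inter / 2.0) / 100.0))
    low, high = [], []
    for b in range(len(sample)):
        # Each bin is ranked independently of the others.
        column = sorted(s[b] for s in surrogates)
        low.append(column[lo_idx])
        high.append(column[hi_idx])
    return low, high, sample


data = [0.1, 0.2, 0.6, 0.7, 0.8, 1.5]
print(boot_histo_sketch(data, [0, 1, 2], n=200, seed=1))
```

Because each bin is ranked on its own, the low (or high) histogram is generally not one of the surrogate histograms, which is the caveat the note describes.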

Examples

>>> import numpy.random
>>> import spacepy.toolbox
>>> numpy.random.seed(0)
>>> data = numpy.random.randn(1000)
>>> bin_edges, low, high, sample, bars = spacepy.toolbox.bootHisto(
...     data, plot=True)


spacepy.toolbox.dictree(in_dict, verbose=False, spaces=None, levels=True, attrs=False, print_out=True, **kwargs)[source]¶

pretty print a dictionary tree

Parameters:

in_dictdict: a complex dictionary (with substructures)
verbosebool, default False: print more info
spacesstr (optional): string that will be added to every line
levelsint (optional): number of levels to recurse through (True, the default, means all)
attrsbool, default False: display information for attributes
print_outbool, default True: Print output (original behavior); if False, return the output. Added in version 0.5.0.

Raises:

TypeError: Input does not have keys or attrs, cannot build tree.

Examples

>>> import spacepy.toolbox as tb
>>> d = {'grade':{'level1':[4,5,6], 'level2':[2,3,4]}, 'name':['Mary', 'John', 'Chris']}
>>> tb.dictree(d)
+
|____grade
     |____level1
     |____level2
|____name

More complicated example using a datamodel:

>>> from spacepy import datamodel
>>> counts = datamodel.dmarray([2,4,6], attrs={'units': 'cts/s'})
>>> data = {'counts': counts, 'PI': 'Dr Zog'}
>>> tb.dictree(data)
+
|____PI
|____counts
>>> tb.dictree(data, attrs=True, verbose=True)
+
|____PI (str [6])
|____counts (spacepy.datamodel.dmarray (3,))
    :|____units (str [5])

Attributes of, e.g., a CDF or a datamodel type object (obj.attrs) are denoted by a colon.

spacepy.toolbox.dist_to_list(func, length, min=None, max=None)[source]¶

Convert a probability distribution function to a list of values

This is a deterministic way to produce a known-length list of values matching a certain probability distribution. It is likely to be a closer match to the distribution function than a random sampling from the distribution.

Parameters:

funccallable

function to call for each possible value, returning the probability density at that value (does not need to be normalized).

lengthint

number of elements to return

minfloat

minimum value to possibly include

maxfloat

maximum value to possibly include

Examples

>>> import math
>>> import matplotlib.pyplot
>>> import numpy
>>> import spacepy.toolbox as tb
>>> gauss = lambda x: math.exp(-(x ** 2) / (2 * 5 ** 2)) / (5 * math.sqrt(2 * math.pi))
>>> vals = tb.dist_to_list(gauss, 1000, -numpy.inf, numpy.inf)
>>> print(vals[0])
-16.45263...
>>> p1 = matplotlib.pyplot.hist(vals, bins=[i - 10 for i in range(21)], facecolor='green')
>>> x = [i / 100.0 - 10.0 for i in range(2001)]
>>> p2 = matplotlib.pyplot.plot(x, [gauss(i) * 1000 for i in x], 'red')
>>> matplotlib.pyplot.draw()
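
One way to build such a deterministic list is to place the values at evenly spaced quantiles of the distribution. The sketch below does this with a numerically integrated CDF on a finite interval. It is an assumption about the general idea, not the SpacePy algorithm; the name `dist_to_list_sketch`, the `grid` parameter, and the requirement of finite bounds are all hypothetical simplifications.

```python
def dist_to_list_sketch(func, length, lo, hi, grid=10001):
    """Return ``length`` values at the mid-quantiles of the density ``func``."""
    step = (hi - lo) / (grid - 1)
    xs = [lo + i * step for i in range(grid)]
    pdf = [func(x) for x in xs]
    # Cumulative trapezoid integral of the (possibly unnormalized) density.
    cdf = [0.0]
    for i in range(1, grid):
        cdf.append(cdf[-1] + 0.5 * (pdf[i] + pdf[i - 1]) * step)
    total = cdf[-1]
    out = []
    j = 0
    for k in range(length):
        # Target the middle of the k-th of ``length`` equal-probability slices.
        target = (k + 0.5) / length * total
        while cdf[j + 1] < target:
            j += 1
        frac = (target - cdf[j]) / (cdf[j + 1] - cdf[j])
        out.append(xs[j] + frac * step)
    return out


print(dist_to_list_sketch(lambda x: 1.0, 4, 0.0, 1.0))
```

For a uniform density on [0, 1] and four values, the result is approximately 0.125, 0.375, 0.625, 0.875.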

spacepy.toolbox.do_with_timeout(timeout, target, *args, **kwargs)[source]¶

Execute a function (or method) with a timeout.

Call the function (or method) target, with arguments args and keyword arguments kwargs. Normally return the return value from target, but if target takes more than timeout seconds to execute, raises TimeoutError.

Note

This is, at best, a blunt instrument. Exceptions from target may not propagate properly (tracebacks will be hard to follow.) The function which failed to time out may continue to execute until the interpreter exits; trapping the TimeoutError and continuing normally is not recommended.

Parameters:

timeoutfloat

Timeout, in seconds.

targetcallable

Python callable (generally a function, may also be an imported ctypes function) to run.

argssequence

Arguments to pass to target.

kwargsdict

keyword arguments to pass to target.

Returns:

out: return value of target

Raises:

TimeoutError: If target does not return in timeout seconds.

Examples

>>> import spacepy.toolbox as tb
>>> import time
>>> def time_me_out():
...     time.sleep(5)
>>> tb.do_with_timeout(0.5, time_me_out) #raises TimeoutError
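
A common way to get this behavior is to run the target in a worker thread and give up waiting after the timeout. The sketch below illustrates that pattern only; it is not claimed to be how SpacePy does it. `run_with_timeout_sketch` and `SketchTimeoutError` are hypothetical names. As the note above warns, the worker is not stopped when the timeout fires.

```python
import threading
import time


class SketchTimeoutError(Exception):
    """Hypothetical stand-in for the module's TimeoutError."""


def run_with_timeout_sketch(timeout, target, *args, **kwargs):
    """Run target(*args, **kwargs); raise if it takes longer than timeout."""
    result = {}

    def worker():
        try:
            result['value'] = target(*args, **kwargs)
        except Exception as exc:  # hand the exception back to the caller
            result['error'] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        # The worker keeps running in the background; we only stop waiting.
        raise SketchTimeoutError('no result after {0} s'.format(timeout))
    if 'error' in result:
        raise result['error']
    return result['value']


print(run_with_timeout_sketch(1.0, pow, 2, 5))
```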

spacepy.toolbox.eventTimer(Event, Time1)[source]¶

Times an event then prints out the time and the name of the event, nice for debugging and seeing that the code is progressing

Parameters:

Eventstr: Name of the event, string is printed out by function
Time1time.time: the time to difference in the function

Returns:

Time2time.time: the new time for the next call to eventTimer

Examples

>>> import spacepy.toolbox as tb
>>> import time
>>> t1 = time.time()
>>> t1 = tb.eventTimer('Test event finished', t1)
('4.40', 'Test event finished')

spacepy.toolbox.geomspace(start, ratio=None, stop=False, num=50)[source]¶

Returns geometrically spaced numbers.

Parameters:

startfloat: The starting value of the sequence.
ratiofloat (optional): The ratio between subsequent points
stopfloat (optional): End value, if this is selected num is overridden
numint (optional): Number of samples to generate. Default is 50.

Returns:

seqarray: geometrically spaced sequence

See also

linspace
logspace

Examples

To get a geometric progression between 0.01 and 3 in 10 steps

>>> import spacepy.toolbox as tb
>>> tb.geomspace(0.01, stop=3, num=10)
[0.01,
0.018846716378431192,
0.035519871824902655,
0.066943295008216955,
0.12616612944575134,
0.23778172582285118,
0.44814047465571644,
0.84459764235318191,
1.5917892219322083,
2.9999999999999996]

To get a geometric progression with a specified ratio, say 10

>>> import spacepy.toolbox as tb
>>> tb.geomspace(0.01, ratio=10, num=5)
[0.01, 0.10000000000000001, 1.0, 10.0, 100.0]
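
Both examples follow from the same relation: element i is start * ratio**i, and when only stop is given the ratio is (stop/start)**(1/(num-1)). A minimal sketch under those assumptions follows; `geomspace_sketch` is a hypothetical name, and the case where both ratio and stop are given (in which num is overridden) is deliberately not covered.

```python
def geomspace_sketch(start, ratio=None, stop=None, num=50):
    """Return ``num`` geometrically spaced values beginning at ``start``."""
    if ratio is None:
        if stop is None:
            raise ValueError('either ratio or stop must be given')
        # Choose the ratio so that the last of num points lands on stop.
        ratio = (stop / start) ** (1.0 / (num - 1))
    return [start * ratio ** i for i in range(num)]


print(geomspace_sketch(0.01, ratio=10, num=5))
print(geomspace_sketch(0.01, stop=3, num=10)[-1])
```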

spacepy.toolbox.getNamedPath(name)[source]¶

Return the full path of a parent directory with name as the leaf

Parameters:

namestring: the name of the parent directory to locate

Examples

Run from a directory /mnt/projects/dream/bin/Ephem with ‘dream’ as the name, this function would return ‘/mnt/projects/dream’

spacepy.toolbox.get_url(url, outfile=None, reporthook=None, cached=False, keepalive=False, conn=None)[source]¶

Read data from a URL

Open an HTTP URL, honoring the user agent as specified in the SpacePy config file. Returns the data, optionally also writing out to a file.

This is similar to the deprecated urlretrieve.

Changed in version 0.5.0: In earlier versions of SpacePy invalid combinations of cached and outfile raised RuntimeError, changed to ValueError.

Parameters:

urlstr: The URL to open
outfilestr (optional): Full path to file to write data to
reporthookcallable (optional): Function for reporting progress; takes arguments of block count, block size, and total size.
cachedbool (optional): Compare modification time of the URL to the modification time of outfile; do not retrieve (and return None) unless the URL is newer than the file. If set outfile is required.
keepalivebool (optional): Attempt to keep the connection open to retrieve more URLs. The return becomes a tuple of (data, conn) to return the connection used so it can be used again. This mode does not support proxies. Required to be True if conn is provided. (Default False)
connhttp.client.HTTPConnection (optional): An established http connection (HTTPS is also okay) to use with keepalive. If not provided, will attempt to make a connection.

Returns:

bytes: The HTTP data from the server.

See also

progressbar

Notes

This function honors proxy settings as described in urllib.request.getproxies(). Cryptic error messages (such as Network is unreachable) may indicate that proxy settings should be defined as appropriate for your environment (e.g. with HTTP_PROXY or HTTPS_PROXY environment variables).

spacepy.toolbox.human_sort(l)[source]¶

Sort the given list in the way that humans expect. http://www.codinghorror.com/blog/2007/12/sorting-for-humans-natural-sort-order.html

Parameters:

llist: list of objects to human sort

Returns:

outlist: sorted list

Examples

>>> import spacepy.toolbox as tb
>>> dat = ['r1.txt', 'r10.txt', 'r2.txt']
>>> dat.sort()
>>> print(dat)
['r1.txt', 'r10.txt', 'r2.txt']
>>> tb.human_sort(dat)
['r1.txt', 'r2.txt', 'r10.txt']
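
Natural ordering of this kind is usually obtained by splitting each string into digit and non-digit runs and comparing the digit runs as integers. The sketch below shows that general technique with the standard library. `natural_key_sketch` is a hypothetical helper, not a SpacePy function, and lower-casing the text parts is an assumption.

```python
import re


def natural_key_sketch(text):
    """Sort key that compares embedded numbers by value, not by character."""
    # re.split with a capturing group keeps the digit runs in the result,
    # so the parts alternate text, number, text, ...
    parts = re.split(r'(\d+)', text)
    return [int(p) if p.isdigit() else p.lower() for p in parts]


dat = ['r1.txt', 'r10.txt', 'r2.txt']
print(sorted(dat, key=natural_key_sketch))
```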

spacepy.toolbox.hypot(*args)[source]¶

compute the N-dimensional hypot of an iterable or many arguments

Parameters:

argsmany numbers or array-like: array like or many inputs to compute from

Returns:

outfloat: N-dimensional hypot of a number

Notes

The speed of this function depends on the form of the input:

if a numpy array of floats is input, it is passed off to C
if iterables are passed in, they are made into numpy arrays and the computation is done locally
if many scalar arguments are passed in, the calculation is done in a loop

For max speed:

For fewer than 20 elements, expand them into scalars:

>>> tb.hypot(*vals)
>>> tb.hypot(vals[0], vals[1]...) #alternate

For more than 20 elements, premake them into a numpy array of doubles.

Examples

>>> from spacepy import toolbox as tb
>>> print(tb.hypot([3,4]))
5.0
>>> print(tb.hypot(3,4))
5.0
>>> # Benchmark ####
>>> from spacepy import toolbox as tb
>>> import numpy as np
>>> import timeit
>>> num_list = []
>>> num_np = []
>>> num_scalar = []
>>> tot = 500
>>> # common setup code for every timed statement
>>> setup = 'from spacepy import toolbox as tb; import numpy as np; '
>>> for num in tb.logspace(1, tot, 10):
...     print(num)
...     num_list.append(timeit.timeit(stmt='tb.hypot(a)',
...                     setup=setup + 'a = [3]*{0}'.format(int(num)), number=10000))
...     num_np.append(timeit.timeit(stmt='tb.hypot(a)',
...                   setup=setup + 'a = np.asarray([3]*{0})'.format(int(num)), number=10000))
...     num_scalar.append(timeit.timeit(stmt='tb.hypot(*a)',
...                       setup=setup + 'a = [3]*{0}'.format(int(num)), number=10000))
>>> from pylab import *
>>> loglog(tb.logspace(1, tot, 10),  num_list, lw=2, label='list')
>>> loglog(tb.logspace(1, tot, 10),  num_np, lw=2, label='numpy->ctypes')
>>> loglog(tb.logspace(1, tot, 10),  num_scalar, lw=2, label='scalar')
>>> legend(shadow=True, fancybox=1, loc='upper left')
>>> title('Different hypot times for 10000 runs')
>>> ylabel('Time [s]')
>>> xlabel('Size')

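[Figure: hypot times for 10000 runs versus input size, for list, numpy->ctypes, and scalar inputs (../_images/hypot_no_extension_speeds_3cases.png)]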

spacepy.toolbox.indsFromXrange(inxrange)[source]¶

return the start and end indices implied by a range, useful when range is zero-length

Parameters:

inxrangerange: input range object to parse

Returns:

list of int: List of start, stop indices in the range. The return value is not defined if a stride is specified or if stop is before start (but will work when stop equals start).

Examples

>>> import spacepy.toolbox as tb
>>> foo = range(23, 39)
>>> foo[0]
23
>>> tb.indsFromXrange(foo)
[23, 39]
>>> foo1 = range(23, 23)
>>> tb.indsFromXrange(foo1) #indexing won't work in this case
[23, 23]
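
In Python 3 a range object exposes its bounds directly, so the same information can be read from its attributes even when the range is empty. This is a sketch of that idea only; `inds_from_range_sketch` is a hypothetical name, and like the function above it ignores any stride.

```python
def inds_from_range_sketch(inrange):
    """Return [start, stop] for a range, including zero-length ranges."""
    # range objects carry .start and .stop even when they contain no items,
    # whereas indexing (inrange[0]) raises IndexError on an empty range.
    return [inrange.start, inrange.stop]


print(inds_from_range_sketch(range(23, 39)))
print(inds_from_range_sketch(range(23, 23)))
```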

spacepy.toolbox.interpol(newx, x, y, wrap=None, **kwargs)[source]¶

1-D linear interpolation with interpolation of hours/longitude

Parameters:

newxarray_like: x values where we want the interpolated values
xarray_like: x values of the original data (must be monotonically increasing or wrapping)
yarray_like: y values of the original data
wrapstring, optional: for continuous x data that wraps in y at ‘hours’ (24), ‘longitude’ (360), or arbitrary value (int, float)
kwargsdict: additional keywords, currently accepts baddata that sets baddata for masked arrays

Returns:

outnumpy.masked_array: interpolated data values for new abscissa values

Examples

For a simple interpolation

>>> import spacepy.toolbox as tb
>>> import numpy
>>> x = numpy.arange(10)
>>> y = numpy.arange(10)
>>> tb.interpol(numpy.arange(5)+0.5, x, y)
array([ 0.5,  1.5,  2.5,  3.5,  4.5])

To interpolate data that wraps, use the wrap keyword; without it you get the wrong answer across the wrap point

>>> y = list(range(24))*2
>>> x = list(range(len(y)))
>>> tb.interpol([1.5, 10.5, 23.5], x, y, wrap='hour').compressed() # compressed() returns the unmasked values as a plain array
array([  1.5,  10.5,  23.5])
>>> tb.interpol([1.5, 10.5, 23.5], x, y)
array([  1.5,  10.5,  11.5])

spacepy.toolbox.interweave(a, b)[source]¶

given two array-like variables interweave them together. Discussed here: http://stackoverflow.com/questions/5347065/interweaving-two-numpy-arrays

Parameters:

aarray-like: first array
barray-like: second array

Returns:

outnumpy.ndarray: interweaved array
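
No example is given for this function, so here is a small sketch of what interweaving means: elements of a and b alternate in the output, starting with a. It uses plain lists and assumes both inputs have the same length; `interweave_sketch` is a hypothetical name, not the library function.

```python
def interweave_sketch(a, b):
    """Alternate the elements of two equal-length sequences."""
    out = [None] * (len(a) + len(b))
    out[0::2] = a  # even positions come from a
    out[1::2] = b  # odd positions come from b
    return out


print(interweave_sketch([1, 3, 5], [2, 4, 6]))
```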

spacepy.toolbox.intsolve(func, value, start=None, stop=None, maxit=1000)[source]¶

Find the function input such that definite integral is desired value.

Given a function, integrate from an (optional) start point until the integral reached a desired value, and return the end point of the integration.

Parameters:

funccallable: function to integrate, must take single parameter
valuefloat: desired final value of the integral
startfloat (optional): value at which to start integration, default -Infinity
stopfloat (optional): value at which to stop integration, default +Infinity
maxitinteger: maximum number of iterations

Returns:

outfloat: x such that the integral of func from start to x is value
Note: Assumes func is everywhere positive; otherwise the solution may be multi-valued.
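
Because the integrand is assumed positive, the integral grows monotonically with its upper limit, so the end point can be found by bisection. The sketch below illustrates this with a trapezoid rule on a finite interval. It is not the SpacePy algorithm; the names `integrate_sketch` and `intsolve_sketch`, the `steps` and `tol` parameters, and the requirement of finite start and stop are assumptions made for the illustration.

```python
def integrate_sketch(func, a, b, steps=2000):
    """Trapezoid-rule integral of func from a to b."""
    h = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    for i in range(1, steps):
        total += func(a + i * h)
    return total * h


def intsolve_sketch(func, value, start, stop, maxit=1000, tol=1e-9):
    """Find x in [start, stop] with integral(func, start, x) == value."""
    lo, hi = start, stop
    for _ in range(maxit):
        mid = 0.5 * (lo + hi)
        # The integral is monotonic in its upper limit for positive func.
        if integrate_sketch(func, start, mid) < value:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Integral of 2x from 0 to x is x**2, so value=1 gives x close to 1.
print(intsolve_sketch(lambda x: 2.0 * x, 1.0, 0.0, 5.0))
```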

spacepy.toolbox.isview(array1, array2=None)[source]¶

Returns whether an object is a view of another object. More precisely, if one array argument is specified, True is returned if the array is a view, i.e. it does not own its data. If two array arguments are specified, a tuple is returned: the first element says whether the first array is a view, and the second whether the two arrays point at the same memory location.

Parameters:

array1numpy.ndarray: array to query if it owns its data

Returns:

outbool or tuple: If one array is specified, a bool is returned: True if the array is a view (does not own its data). If two arrays are specified, a tuple is returned whose second element is a bool saying whether the arrays point at the same memory location.

Other Parameters:

array2object (optional): array to query if array1 is a view of this object at the specified memory location

Examples

>>> import numpy
>>> import spacepy.toolbox as tb
>>> a = numpy.arange(100)
>>> b = a[0:10]
>>> tb.isview(a)
False
>>> tb.isview(b)
True
>>> tb.isview(b, a)
(True, True)
>>> tb.isview(b, b) # the conditions are met and numpy cannot tell this
(True, True)

spacepy.toolbox.linspace(min, max, num, **kwargs)[source]¶

Returns linear-spaced bins. Same as numpy.linspace except works with datetime and is faster

Parameters:

minfloat, datetime: minimum value
maxfloat, datetime: maximum value
numinteger: number of linear spaced bins

Returns:

outarray: linear-spaced bins from min to max in a numpy array

Other Parameters:

kwargsdict: additional keywords passed into matplotlib.dates.num2date

See also

geomspace
logspace

Notes

This function works on both numbers and datetime objects. Not leapsecond aware.

Examples

>>> import spacepy.toolbox as tb
>>> tb.linspace(1, 10, 4)
array([  1.,   4.,   7.,  10.])
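
The same arithmetic works for datetimes, because the difference of two datetimes is a timedelta that can be divided and multiplied by numbers. The sketch below shows this with plain lists. `linspace_sketch` is a hypothetical name, it assumes num is at least 2, and like the function above it is not leap-second aware.

```python
import datetime as dt


def linspace_sketch(start, stop, num):
    """Return ``num`` evenly spaced values from start to stop inclusive."""
    step = (stop - start) / (num - 1)  # a float, or a timedelta for datetimes
    return [start + i * step for i in range(num)]


print(linspace_sketch(1, 10, 4))
print(linspace_sketch(dt.datetime(2000, 1, 1), dt.datetime(2000, 1, 2), 3))
```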

spacepy.toolbox.loadpickle(fln)[source]¶

load a pickle and return content as dictionary

Parameters:

flnstring: filename

Returns:

outdict: dictionary with content from file

See also

savepickle

Examples

Note: If fln is not found, but the same filename with ‘.gz’ is found, will attempt to open the .gz as a gzipped file.

>>> d = loadpickle('test.pbin')

spacepy.toolbox.logspace(min, max, num, **kwargs)[source]¶

Returns log-spaced bins. Same as numpy.logspace except the min and max are the min and max not log10(min) and log10(max)

Parameters:

minfloat: minimum value
maxfloat: maximum value
numinteger: number of log spaced bins

Returns:

outarray: log-spaced bins from min to max in a numpy array

Other Parameters:

kwargsdict: additional keywords passed into matplotlib.dates.num2date

See also

geomspace
linspace

Notes

This function works on both numbers and datetime objects. Not leapsecond aware.

Examples

>>> import spacepy.toolbox as tb
>>> tb.logspace(1, 100, 5)
array([   1.        ,    3.16227766,   10.        ,   31.6227766 ,  100.        ])
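
The difference from numpy.logspace is only in how the end points are specified: here they are the values themselves, and the logarithm is taken internally. A minimal sketch of that relation follows; `logspace_sketch` is a hypothetical name, and it assumes positive end points and num of at least 2.

```python
import math


def logspace_sketch(lo, hi, num):
    """Return ``num`` log-spaced values from lo to hi inclusive."""
    a, b = math.log10(lo), math.log10(hi)
    # Space the exponents evenly, then raise 10 to each exponent.
    return [10.0 ** (a + i * (b - a) / (num - 1)) for i in range(num)]


print(logspace_sketch(1, 100, 5))
```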

spacepy.toolbox.medAbsDev(series, scale=False)[source]¶

Calculate median absolute deviation of a given input series

Median absolute deviation (MAD) is a robust and resistant measure of the spread of a sample (same purpose as standard deviation). The MAD is preferred to the inter-quartile range as the inter-quartile range only shows 50% of the data whereas the MAD uses all data but remains robust and resistant. See e.g. Wilks, Statistical methods for the Atmospheric Sciences, 1995, Ch. 3. For additional details on the scaling, see Rousseeuw and Croux, J. Amer. Stat. Assoc., 88 (424), pp. 1273-1283, 1993.

Parameters:

seriesarray_like: the input data series

Returns:

outfloat: the median absolute deviation

Other Parameters:

scalebool: if True (default: False), scale to standard deviation of a normal distribution

Examples

Find the median absolute deviation of a data set. Here we use the log-normal distribution fitted to the population of sawtooth intervals, see Morley and Henderson, Comment, Geophysical Research Letters, 2009.

>>> import numpy
>>> import spacepy.toolbox as tb
>>> numpy.random.seed(8675301)
>>> data = numpy.random.lognormal(mean=5.1458, sigma=0.302313, size=30)
>>> data
array([ 181.28078923,  131.18152745, ... , 141.15455416, 160.88972791])
>>> tb.medAbsDev(data)
28.346646721370192

Note: This implementation is robust to the presence of NaNs.
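
The quantity itself is simple to write down: the median of the absolute deviations from the median. The sketch below uses only the standard library and is not SpacePy's implementation. `mad_sketch` is a hypothetical name, dropping NaNs before the calculation is one possible reading of "robust to NaNs", and the factor 1.4826 is the usual constant for matching the standard deviation of a normal distribution, assumed here rather than taken from the source.

```python
import math
import statistics


def mad_sketch(series, scale=False):
    """Median absolute deviation of ``series``, ignoring NaN entries."""
    vals = [v for v in series if not math.isnan(v)]
    med = statistics.median(vals)
    mad = statistics.median(abs(v - med) for v in vals)
    # Assumed scale factor for consistency with a normal distribution.
    return 1.4826 * mad if scale else mad


print(mad_sketch([1, 2, 3, 4, 100]))
```

The single outlier (100) leaves the result unchanged, which is the robustness property described above.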

spacepy.toolbox.mlt2rad(mlt, midnight=False)[source]¶

Convert MLT values to radians for polar plotting. MLT angles are transformed to radians from -pi to pi, referenced from noon by default.

Parameters:

mltnumpy array: array of mlt values
midnightboolean (optional): reference to midnight instead of noon

Returns:

outnumpy array: array of radians

See also

rad2mlt

Examples

>>> from numpy import array
>>> mlt2rad(array([3,6,9,14,22]))
array([-2.35619449, -1.57079633, -0.78539816,  0.52359878,  2.61799388])
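
The noon-referenced conversion in this example is consistent with rad = (mlt - 12) * pi / 12, with the inverse mlt = rad * 12 / pi + 12. The sketch below implements just that default case for scalars. The names `mlt_to_rad_sketch` and `rad_to_mlt_sketch` are hypothetical, and the midnight option is not covered.

```python
import math


def mlt_to_rad_sketch(mlt):
    """Noon-referenced MLT (hours) to radians in [-pi, pi]."""
    return (mlt - 12.0) * math.pi / 12.0


def rad_to_mlt_sketch(rad):
    """Radians in [-pi, pi] back to noon-referenced MLT (hours)."""
    return rad * 12.0 / math.pi + 12.0


print([mlt_to_rad_sketch(m) for m in (3, 6, 9, 14, 22)])
```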

spacepy.toolbox.normalize(vec, low=0.0, high=1.0)[source]¶

Given an input vector normalize the vector to a given range

Parameters:

vecarray_like: input vector to normalize
lowfloat: minimum value to scale to, default 0.0
highfloat: maximum value to scale to, default 1.0

Returns:

outarray_like: normalized vector

Examples

>>> import spacepy.toolbox as tb
>>> tb.normalize([1,2,3])
[0.0, 0.5, 1.0]
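
The scaling is a linear map that sends the smallest element to low and the largest to high. A minimal list-based sketch follows; `normalize_sketch` is a hypothetical name, and it assumes the input is not constant (otherwise the range would be zero).

```python
def normalize_sketch(vec, low=0.0, high=1.0):
    """Linearly rescale ``vec`` so that min -> low and max -> high."""
    vmin, vmax = min(vec), max(vec)
    return [(v - vmin) / (vmax - vmin) * (high - low) + low for v in vec]


print(normalize_sketch([1, 2, 3]))
print(normalize_sketch([1, 2, 3], low=-1.0, high=1.0))
```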

spacepy.toolbox.pmm(*args)[source]¶

print min and max of input arrays

Parameters:

aarray-like: arbitrary number of input arrays (or lists)

Returns:

outlist: list of min, max for each array

Examples

>>> import spacepy.toolbox as tb
>>> from numpy import arange
>>> tb.pmm(arange(10), arange(10)+3)
[[0, 9], [3, 12]]

spacepy.toolbox.poisson_fit(data, initial=None, method='Powell')[source]¶

Fit a Poisson distribution to data using the method and initial guess provided.

Parameters:

dataarray-like: Data to fit a Poisson distribution to.
initialint or None: initial guess for the fit, if None np.median(data) is used
methodstr: method passed to scipy.optimize.minimize, default=’Powell’

Returns:

resultscipy.optimize.optimize.OptimizeResult: Resulting fit results from scipy.optimize, answer is result.x, user should likely round.

Examples

>>> import spacepy.toolbox as tb
>>> from scipy.stats import poisson
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> data = poisson.rvs(20, size=1000)
>>> res = tb.poisson_fit(data)
>>> print(res.x)
19.718000038769095
>>> xvals = np.arange(0, np.max(data)+5)
>>> plt.hist(data, bins=xvals, density=True)
>>> plt.plot(xvals, poisson.pmf(xvals, np.round(res.x)))

spacepy.toolbox.progressbar(count, blocksize, totalsize, text='Download Progress')[source]¶

print a progress bar with urllib.request.urlretrieve reporthook functionality

Examples

>>> import spacepy.toolbox as tb
>>> import urllib.request
>>> urllib.request.urlretrieve(config['psddata_url'], PSDdata_fname, reporthook=tb.progressbar)

spacepy.toolbox.query_yes_no(question, default='yes')[source]¶

Ask a yes/no question via raw_input() and return their answer.

“question” is a string that is presented to the user. “default” is the presumed answer if the user just hits <Enter>. It must be “yes” (the default), “no” or None (meaning an answer is required of the user).

The “answer” return value is one of “yes” or “no”.

Parameters:

questionstr: the question to ask
defaultstr (optional)

Returns:

outstr: answer (‘yes’ or ‘no’)

Raises:

ValueError: The default answer is not in (None|”yes”|”no”)

Examples

>>> import spacepy.toolbox as tb
>>> tb.query_yes_no('Ready to go?')
Ready to go? [Y/n] y
'yes'

spacepy.toolbox.rad2mlt(rad, midnight=False)[source]¶

Convert radian values to MLT. Radians from -pi to pi are transformed to MLT, referenced from noon by default.

Parameters:

radnumpy array: array of radian values
midnightboolean (optional): reference to midnight instead of noon

Returns:

outnumpy array: array of mlt values

See also

mlt2rad

Examples

>>> from numpy import array, pi
>>> rad2mlt(array([0,pi, pi/2.]))
array([ 12.,  24.,  18.])

spacepy.toolbox.savepickle(fln, dict, compress=None)[source]¶

save dictionary variable dict to a pickle with filename fln

Parameters:

flnstring

filename

dictdict

container with stuff

compressbool

write as a gzip-compressed file (.gz will be added to fln). If not specified, defaults to uncompressed, unless the compressed file exists and the uncompressed does not.

See also

loadpickle

Examples

>>> d = {'grade':[1,2,3], 'name':['Mary', 'John', 'Chris']}
>>> savepickle('test.pbin', d)

spacepy.toolbox.tCommon(ts1, ts2, mask_only=True)[source]¶

Finds the elements in a list of datetime objects present in another

Parameters:

ts1list or array-like: first set of datetime objects
ts2list or array-like: second set of datetime objects

Returns:

outtuple: Two element tuple of truth tables (of 1 present in 2, & vice versa)

See also

tOverlapHalf
tOverlap

Examples

>>> import spacepy.toolbox as tb
>>> import numpy as np
>>> import datetime as dt
>>> ts1 = np.array([dt.datetime(2001,3,10)+dt.timedelta(hours=a) for a in range(20)])
>>> ts2 = np.array([dt.datetime(2001,3,10,2)+dt.timedelta(hours=a*0.5) for a in range(20)])
>>> common_inds = tb.tCommon(ts1, ts2)
>>> common_inds[0] #mask of values in ts1 common with ts2
array([False, False,  True,  True,  True,  True,  True,  True,  True,
        True,  True,  True, False, False, False, False, False, False,
       False, False], dtype=bool)
>>> ts2[common_inds[1]] #values of ts2 also in ts1

The latter can be found more simply by setting the mask_only keyword to False

>>> common_vals = tb.tCommon(ts1, ts2, mask_only=False)
>>> common_vals[1]
array([2001-03-10 02:00:00, 2001-03-10 03:00:00, 2001-03-10 04:00:00,
       2001-03-10 05:00:00, 2001-03-10 06:00:00, 2001-03-10 07:00:00,
       2001-03-10 08:00:00, 2001-03-10 09:00:00, 2001-03-10 10:00:00,
       2001-03-10 11:00:00], dtype=object)
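The idea behind the two masks can be sketched without numpy or spacepy. The helper `common_masks` below is hypothetical and uses plain sets for membership tests; it illustrates the concept and says nothing about how tCommon is implemented internally.

```python
import datetime as dt

def common_masks(ts1, ts2):
    """Hypothetical helper: boolean masks of the elements shared by ts1 and ts2."""
    set1, set2 = set(ts1), set(ts2)
    mask1 = [t in set2 for t in ts1]   # elements of ts1 that also occur in ts2
    mask2 = [t in set1 for t in ts2]   # elements of ts2 that also occur in ts1
    return mask1, mask2

start = dt.datetime(2001, 3, 10)
ts1 = [start + dt.timedelta(hours=a) for a in range(20)]
ts2 = [start + dt.timedelta(hours=2 + a * 0.5) for a in range(20)]
mask1, mask2 = common_masks(ts1, ts2)
# Apply the second mask to recover the shared values themselves.
common_from_ts2 = [t for t, keep in zip(ts2, mask2) if keep]
```

Here `mask1` marks the ten hourly values from 02:00 to 11:00, matching the mask shown in the example above.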

spacepy.toolbox.tOverlap(ts1, ts2, *args, **kwargs)[source]¶

Finds the overlapping elements in two lists of datetime objects

Parameters:

ts1 (datetime): first set of datetime objects
ts2 (datetime): second set of datetime objects
args: additional arguments passed to tOverlapHalf

Returns:

out (list): indices of ts1 within interval of ts2, and vice versa

See also

tOverlapHalf
tCommon

Examples

Given two series of datetime objects, event_dates and omni[‘ticks’]:

>>> import spacepy.toolbox as tb
>>> import spacepy.time as st
>>> from spacepy import omni
>>> import datetime
>>> event_dates = st.tickrange(datetime.datetime(2000, 1, 1), datetime.datetime(2000, 10, 1), deltadays=3)
>>> omni_dates = st.tickrange(datetime.datetime(2000, 1, 1), datetime.datetime(2000, 10, 1), deltadays=0.5)
>>> omni = omni.get_omni(omni_dates)
>>> [einds,oinds] = tb.tOverlap(event_dates, omni['ticks'])
>>> omni_time = omni['ticks'][oinds[0]:oinds[-1]+1]
>>> print(omni_time)
[datetime.datetime(2000, 1, 1, 0, 0), datetime.datetime(2000, 1, 1, 12, 0),
... , datetime.datetime(2000, 9, 30, 0, 0)]

spacepy.toolbox.tOverlapHalf(ts1, ts2, presort=False)[source]¶

Find overlapping elements in two lists of datetime objects

This is one-half of tOverlap, i.e. it finds only occurrences where ts2 exists within the bounds of ts1, or the second element returned by tOverlap.

Parameters:

ts1 (list)

first set of datetime objects

ts2 (list)

second set of datetime objects

presort (bool)

Set to use a faster algorithm which assumes ts1 and ts2 are both sorted in ascending order. This speeds up the overlap comparison by about 50x, so it is worth sorting the lists if one sort can be done for many calls to tOverlap.

Returns:

out (list)

indices of ts2 within interval of ts1

Note: returns an empty list if no overlap is found

See also

tOverlap
tCommon
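A minimal sketch of the "half overlap" idea follows: find the indices of ts2 that fall within the bounds of ts1. The helper name `overlap_half` is hypothetical, and treating both bounds as inclusive is an assumption of this sketch, not a statement about tOverlapHalf itself.

```python
import datetime as dt

def overlap_half(ts1, ts2):
    """Hypothetical helper: indices of ts2 lying within [min(ts1), max(ts1)]."""
    if not ts1 or not ts2:
        return []
    lower, upper = min(ts1), max(ts1)
    # Inclusive bounds are assumed here.
    return [i for i, t in enumerate(ts2) if lower <= t <= upper]

start = dt.datetime(2000, 1, 1)
events = [start + dt.timedelta(days=3 * n) for n in range(3, 6)]    # days 9, 12, 15
samples = [start + dt.timedelta(days=0.5 * n) for n in range(40)]   # days 0 to 19.5
inds = overlap_half(events, samples)
```

As in the documented behaviour, the sketch returns an empty list when nothing overlaps.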

spacepy.toolbox.thread_job(job_size, thread_count, target, *args, **kwargs)[source]¶

Split a job into subjobs and run a thread for each

Each thread spawned will call target to handle a slice of the job.

This is only useful if a job:

Can be split into completely independent subjobs
Relies heavily on code that does not use the Python GIL, e.g. numpy or ctypes code
Does not return a value. Either pass in a list/array to hold the result, or see thread_map

Parameters:

job_size (int)

Total size of the job. Often this is an array size.

thread_count (int)

Number of threads to spawn. If 0 or None, will spawn as many threads as there are cores available on the system. (Each hyperthreading core counts as 2.) Generally this is the Right Thing to do. If negative, will spawn abs(thread_count) threads, but will run them sequentially rather than in parallel; useful for debugging.

target (callable)

Python callable (generally a function, may also be an imported ctypes function) to run in each thread. The last two positional arguments passed in will be a “start” and a “subjob size,” respectively; frequently this will be the start index and the number of elements to process in an array.

args (sequence)

Arguments to pass to target. If target is an instance method, self must be explicitly passed in. start and subjob_size will be appended.

kwargs (dict)

keyword arguments to pass to target.

Examples

squaring 100 million numbers:

>>> import numpy
>>> import spacepy.toolbox as tb
>>> numpy.random.seed(8675301)
>>> a = numpy.random.randint(0, 100, [100000000])
>>> b = numpy.empty([100000000], dtype='int64')
>>> def targ(in_array, out_array, start, count):
...     out_array[start:start + count] = in_array[start:start + count] ** 2
>>> tb.thread_job(len(a), 0, targ, a, b)
>>> print(b[0:5])
[2704 7225  196 1521   36]

This example:

Defines a target function, which will be called for each thread. It is usually necessary to define a simple “wrapper” function like this to provide the correct call signature.
The target function receives inputs in_array and out_array, which are not touched directly by thread_job but are passed through in the call. In this case, a gets passed as in_array and b as out_array.
The target function also receives the start and number of elements it needs to process. For each thread where the target is called, these numbers are different.
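The splitting scheme described above, where each thread gets a start index and a subjob size as its last two positional arguments, can be sketched with the standard threading module. The function `run_in_threads` is a hypothetical stand-in, not spacepy's code; it assumes a positive thread count and does not handle the 0, None or negative cases.

```python
import threading

def run_in_threads(job_size, thread_count, target, *args, **kwargs):
    """Hypothetical sketch: split range(job_size) into contiguous slices, one thread each."""
    base, extra = divmod(job_size, thread_count)
    threads = []
    start = 0
    for i in range(thread_count):
        # Spread any remainder over the first few threads.
        count = base + (1 if i < extra else 0)
        # start and count are appended as the last two positional arguments.
        t = threading.Thread(target=target, args=args + (start, count), kwargs=kwargs)
        threads.append(t)
        start += count
    for t in threads:
        t.start()
    for t in threads:
        t.join()

def square_slice(in_list, out_list, start, count):
    # Each call only touches its own slice, so the subjobs are independent.
    for i in range(start, start + count):
        out_list[i] = in_list[i] ** 2

a = list(range(10))
b = [None] * len(a)
run_in_threads(len(a), 3, square_slice, a, b)
print(b)
```

Because this toy target is pure Python, it gains nothing from threading; the pattern pays off only when the target spends its time in code that releases the GIL.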

spacepy.toolbox.thread_map(target, iterable, thread_count=None, *args, **kwargs)[source]¶

Apply a function to every element of a list, in separate threads

Interface is similar to multiprocessing.Pool.map, except it runs in threads

This is made largely obsolete in Python 3 by concurrent.futures

Parameters:

target (callable)

Python callable to run on each element of iterable. For each call, an element of iterable is appended to args and both args and kwargs are passed through. Note that this means the iterable element is always the last positional argument; this allows the specification of self as the first argument for method calls.

iterable (iterable)

elements to pass to each call of target

args (sequence)

arguments to pass to target before each element of iterable

thread_count (integer)

Number of threads to spawn; see thread_job.

kwargs (dict)

keyword arguments to pass to target.

Returns:

out (list): return values of target for each item from iterable

Examples

find totals of several arrays

>>> import numpy
>>> from spacepy import toolbox
>>> inputs = range(100)
>>> totals = toolbox.thread_map(numpy.sum, inputs)
>>> print(totals[0], totals[50], totals[99])
0 50 99

>>> # in python3
>>> from concurrent import futures
>>> with futures.ThreadPoolExecutor(max_workers=4) as executor:
...     for ans in executor.map(numpy.sum, [0,50,99]):
...         print(ans)
0
50
99

spacepy.toolbox.timeout_check_call(timeout, *args, **kwargs)[source]¶

Call a subprocess with a timeout.

Deprecated since version 0.7.0: Use timeout argument of subprocess.check_call(), added in Python 3.3.

Like subprocess.check_call(), but will terminate the process and raise TimeoutError if it runs for too long.

This will only terminate the single process started; any child processes will remain running (this has implications for, say, spawning shells).

Parameters:

timeout (float): Timeout, in seconds. Fractions are acceptable but the resolution is of order 100ms.
args (sequence): Arguments passed through to subprocess.Popen
kwargs (dict): keyword arguments to pass to subprocess.Popen

Returns:

out (int): 0 on successful completion

Raises:

TimeoutError: If subprocess does not return in timeout seconds.
CalledProcessError: If command has non-zero exit status

Examples

>>> import spacepy.toolbox as tb
>>> tb.timeout_check_call(1, 'sleep 30', shell=True) #raises TimeoutError
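Since this function is deprecated in favour of the timeout argument of subprocess.check_call(), a short sketch of the replacement may help. Note one difference: the standard library raises subprocess.TimeoutExpired rather than spacepy.toolbox.TimeoutError. The child command below (a Python interpreter that sleeps) is chosen only to keep the example portable.

```python
import subprocess
import sys

# A child process that sleeps for 30 seconds, started without a shell.
cmd = [sys.executable, "-c", "import time; time.sleep(30)"]
try:
    subprocess.check_call(cmd, timeout=1)
    timed_out = False
except subprocess.TimeoutExpired:
    # The standard library signals the timeout with TimeoutExpired.
    timed_out = True
print(timed_out)
```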

spacepy.toolbox.unique_columns(inval, axis=0)[source]¶

Given a multidimensional input, return the unique rows or columns along the given axis: axis=0 gives unique rows, axis=1 gives unique columns. Based largely on http://stackoverflow.com/questions/16970982/find-unique-rows-in-numpy-array

Parameters:

inval (array-like): array to find unique columns or rows of

Returns:

out (array): N-dimensional array of the unique values along the axis

Other Parameters:

axis (int): The axis to find unique over, default: 0
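The axis semantics can be illustrated on a small 2-D list without numpy. The helper `unique_along_axis` is hypothetical, and returning the result in sorted order is an assumption of this sketch; it is not a claim about the ordering that unique_columns produces.

```python
def unique_along_axis(rows, axis=0):
    """Hypothetical helper: unique rows (axis=0) or unique columns (axis=1) of a 2-D list."""
    if axis == 1:
        # Work on the transpose so columns become rows, then transpose back.
        transposed = [list(col) for col in zip(*rows)]
        unique_cols = unique_along_axis(transposed, axis=0)
        return [list(row) for row in zip(*unique_cols)]
    # Sorted order is an assumption made for determinism.
    return [list(r) for r in sorted(set(tuple(r) for r in rows))]

data = [[1, 2, 1],
        [3, 4, 3],
        [1, 2, 1]]
print(unique_along_axis(data, axis=0))  # [[1, 2, 1], [3, 4, 3]]
print(unique_along_axis(data, axis=1))  # [[1, 2], [3, 4], [1, 2]]
```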

spacepy.toolbox.update(all=True, QDomni=False, omni=False, omni2=False, leapsecs=False, PSDdata=False, cached=True)[source]¶

Download and update local database for omni, leapsecs etc

Web access is via get_url(); notes there may be helpful in debugging errors. See also the keepalive configuration option.

Parameters:

all (boolean, optional): if True, update OMNI2, Qin-Denton and leapsecs
omni (boolean, optional): if True, update only omni (Qin-Denton)
omni2 (boolean, optional): if True, update only original OMNI2
QDomni (boolean, optional): if True, update OMNI2 and Qin-Denton
leapsecs (boolean, optional): if True, update only leapseconds
cached (boolean, optional): Only update files if timestamp on server is newer than timestamp on local file (default). Set False to always download files.

Returns:

out (string): data directory where things are saved

See also

get_url

Examples

>>> import spacepy.toolbox as tb
>>> tb.update(omni=True)

spacepy.toolbox.windowMean(data, time=[], winsize=0, overlap=0, st_time=None, op=<function mean>)[source]¶

Windowing mean function, window overlap is user defined

Parameters:

data (array_like): 1D series of points
time (list, optional): series of timestamps (numeric or datetime). Must be the same length as data.
winsize (integer or datetime.timedelta, optional): window size
overlap (integer or datetime.timedelta, optional): amount of window overlap. For non-overlapping windows set overlap to zero.
st_time (datetime.datetime, optional): for time-based averaging, a start time other than the first point can be specified
op (callable, optional): the operator to be called, default numpy.mean

Returns:

out (tuple): the windowed mean of the data, and an associated reference time vector

Examples

For non-overlapping windows, set overlap to zero. For example (time-based averaging), given a data set of 100 points at hourly resolution (with the time tick in the middle of the sample), the daily average with half-overlapping windows is calculated as follows:

>>> import spacepy.toolbox as tb
>>> import datetime
>>> wsize = datetime.timedelta(days=1)
>>> olap = datetime.timedelta(hours=12)
>>> data = [10, 20]*50
>>> time = [datetime.datetime(2001,1,1) + datetime.timedelta(hours=n, minutes = 30) for n in range(100)]
>>> outdata, outtime = tb.windowMean(data, time, winsize=wsize, overlap=olap, st_time=datetime.datetime(2001,1,1))
>>> outdata, outtime
([15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0],
 [datetime.datetime(2001, 1, 1, 12, 0),
  datetime.datetime(2001, 1, 2, 0, 0),
  datetime.datetime(2001, 1, 2, 12, 0),
  datetime.datetime(2001, 1, 3, 0, 0),
  datetime.datetime(2001, 1, 3, 12, 0),
  datetime.datetime(2001, 1, 4, 0, 0),
  datetime.datetime(2001, 1, 4, 12, 0)])

When using time-based averaging, ensure that the time tick corresponds to the middle of the time-bin to which the data apply. That is, if the data are hourly, say for 00:00-01:00, then the time applied should be 00:30. If this is not done, unexpected behaviour can result.

e.g. (pointwise averaging),

>>> outdata, outtime = tb.windowMean(data, winsize=24, overlap=12)
>>> outdata, outtime
([15.0, 15.0, 15.0, 15.0, 15.0, 15.0, 15.0], [12.0, 24.0, 36.0, 48.0, 60.0, 72.0, 84.0])

Here winsize and overlap are numeric: the window size is 24 points (as the data are hourly) and the overlap is 12 points (a half day). The output vectors start at winsize/2 and end at N-(winsize/2); the output time vector is a reference to the nth point in the original series.

Note: This is a quick and dirty function - it is NOT optimized, at all.
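The pointwise case can be reproduced with a short pure-Python sketch, which may help in reasoning about where the output reference points fall. The function `window_mean_pointwise` is hypothetical; dropping any trailing partial window is an assumption of this sketch and may not match windowMean in every case.

```python
def window_mean_pointwise(data, winsize, overlap=0):
    """Hypothetical sketch of pointwise windowed averaging with overlapping windows."""
    if overlap >= winsize:
        raise ValueError("overlap must be smaller than winsize")
    step = winsize - overlap          # how far each window advances
    means, centers = [], []
    start = 0
    while start + winsize <= len(data):
        window = data[start:start + winsize]
        means.append(sum(window) / float(len(window)))
        centers.append(start + winsize / 2.0)   # reference point at the window midpoint
        start += step
    return means, centers

data = [10, 20] * 50
outdata, outtime = window_mean_pointwise(data, winsize=24, overlap=12)
print(outdata)
print(outtime)
```

For this input the sketch yields seven windows with midpoints from 12 to 84, in line with the pointwise example above.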

exception spacepy.toolbox.TimeoutError[source]¶: Raised when a time-limited process times out
