spacepy.pycdf.Var

class spacepy.pycdf.Var(cdf_file, var_name, *args)

A CDF variable.

This object does not directly store the data from the CDF; rather, it provides access to the data in a format that looks much like a Python list or numpy ndarray. General information on lists is available in the Python docs.

The CDF user’s guide, section 2.3, provides background on variables.

Note

Not intended to be created directly; use methods of CDF to gain access to a variable.

A record-varying variable’s data are viewed as a hypercube of dimensions n_dims+1 (the extra dimension is the record number). They are indexed in row-major fashion, i.e. the last index changes most frequently / is contiguous in memory. If the CDF is column-major, the data are transformed to row-major before return.

Non record-varying variables are similar, but do not have the extra dimension of record number.

Variables can be subscripted by a multidimensional index to return the data. Indices are in row-major order with the first dimension representing the record number. If the CDF is column major, the data are reordered to row major. Each dimension is specified by standard Python slice notation, with dimensions separated by commas. The ellipsis fills in any missing dimensions with full slices. The returned data are lists; Python represents multidimensional arrays as nested lists. The innermost set of lists represents contiguous data.

Note

numpy ‘fancy indexing’ is not supported.

Degenerate dimensions are ‘collapsed’, i.e. no list of only one element will be returned if a single subscript is specified instead of a range. (To avoid this, specify a slice like 1:2, which starts with 1 and ends before 2).
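Ordinary Python lists collapse in the same way, which makes for a quick sanity check. The sketch below uses a plain nested list as a stand-in for a variable; it is not a pycdf Var, and the name records is made up for illustration:

```python
# Plain nested list standing in for three records of (x, y, z) data.
records = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

single = records[1]    # single subscript: the record itself, no wrapping list
kept = records[1:2]    # slice 1:2: a one-element list holding that record

print(single)  # [4, 5, 6]
print(kept)    # [[4, 5, 6]]
```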

Two special cases:

  1. Requesting a single-dimension slice for a record-varying variable will return all data for that record number (or those record numbers) for that variable.

  2. Requests for multi-dimensional variables may skip the record-number dimension and simply specify the slice on the array itself. In that case, the slice of the array will be returned for all records.

In the event of ambiguity (e.g., single-dimension slice on a one-dimensional variable), case 1 takes priority. Otherwise, mismatch between the number of dimensions specified in the slice and the number of dimensions in the variable will cause an IndexError to be thrown.

This all sounds very complicated but it is essentially attempting to do the ‘right thing’ for a range of slices.

An unusual case is scalar (zero-dimensional) non-record-varying variables. Clearly they cannot be subscripted normally. In this case, use the [...] syntax meaning ‘access all data.’:

>>> from spacepy import pycdf
>>> testcdf = pycdf.CDF('test.cdf', '')
>>> variable = testcdf.new('variable', recVary=False,
...     type=pycdf.const.CDF_INT4)
>>> variable[...] = 10
>>> variable
<Var:
CDF_INT4 [] NRV
>
>>> variable[...]
10

Reading any empty non-record-varying variable will return an empty array with the same number of dimensions, but all dimensions will be of zero length. The scalar is, again, a special case: due to the inability to have a numpy array which is both zero-dimensional and empty, reading an NRV scalar variable with no data will return an empty one-dimensional array. This is really not recommended.
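The numpy limitation behind the scalar special case can be seen directly. This minimal sketch uses only numpy, with no CDF file involved: a zero-dimensional array always holds exactly one element, so the closest empty stand-in is a one-dimensional array of length zero.

```python
import numpy as np

scalar = np.zeros(())        # zero-dimensional: shape (), but never empty
empty_1d = np.zeros((0,))    # empty, but necessarily one-dimensional
empty_2d = np.zeros((0, 0))  # empty, two dimensions, both of zero length

print(scalar.ndim, scalar.size)      # 0 1
print(empty_1d.ndim, empty_1d.size)  # 1 0
print(empty_2d.shape)                # (0, 0)
```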

As a list type, variables are also iterable; iterating over a variable returns a single complete record at a time.

This is all clearer with examples. Consider a variable B_GSM, with three elements per record (x, y, z components) and fifty records in the CDF. Then:

  1. B_GSM[0, 1] is the y component of the first record.

  2. B_GSM[10, :] is a three-element list, containing x, y, and z components of the 11th record. As a shortcut, if only one dimension is specified, it is assumed to be the record number, so this could also be written B_GSM[10].

  3. B_GSM[...] reads all data for B_GSM and returns it as a fifty-element list, each element itself being a three-element list of x, y, z components.
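These subscripts can be tried on a numpy array of the same shape. The array below is only a stand-in with made-up values, not a real pycdf variable; it is assumed here that, for these particular subscripts, numpy indexing gives the same result:

```python
import numpy as np

# Stand-in for B_GSM: 50 records, 3 components (x, y, z) each; values are made up.
b_gsm = np.arange(150.0).reshape(50, 3)

y_first = b_gsm[0, 1]      # y component of the first record
rec_11 = b_gsm[10, :]      # x, y, z of the 11th record
rec_11_short = b_gsm[10]   # same record, giving the record number only
everything = b_gsm[...]    # all fifty records

# Iterating yields one complete record at a time.
n_records = sum(1 for record in b_gsm)

print(everything.shape)  # (50, 3)
print(n_records)         # 50
```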

Multidimensional example: consider fluxes stored as a function of pitch angle and energy. Such a variable may be called Flux and stored as a two-dimensional array, with the first dimension representing (say) ten energy steps and the second, eighteen pitch angle bins (ten degrees wide, centered from 5 to 175 degrees). Assume 100 records stored in the CDF (i.e. 100 different times).

  1. Flux[4] is a list of ten elements, one per energy step, each element being a list of 18 fluxes, one per pitch bin. All are taken from the fifth record in the CDF.

  2. Flux[4, :, 0:4] is the same record, all energies, but only the first four pitch bins (roughly, field-aligned).

  3. Flux[..., 0:4] is a 100-element list (one per record), each element being a ten-element list (one per energy step), each containing fluxes for the first four pitch bins.
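The shapes resulting from these three subscripts can be checked against a numpy array with the same layout. This is a sketch on a zero-filled stand-in array, not a pycdf variable:

```python
import numpy as np

# Stand-in for Flux: 100 records, 10 energy steps, 18 pitch-angle bins.
flux = np.zeros((100, 10, 18))

print(flux[4].shape)          # (10, 18)
print(flux[4, :, 0:4].shape)  # (10, 4)
print(flux[..., 0:4].shape)   # (100, 10, 4)
```

The pycdf special cases described earlier, such as skipping the record dimension, are not numpy behavior and are not reproduced by this stand-in.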

This slicing notation is very flexible and allows reading specifically the desired data from the CDF.

Note

The C CDF library allows reading records which have not been written to a file, returning a pad value. pycdf checks the size of a variable and will raise IndexError for most attempts to read past the end. If these checks fail, a value is returned with a warning VIRTUAL_RECORD_DATA. Please open an issue if this occurs. See pg. 39 and following of the CDF User’s Guide for more on virtual records.

All data are, on read, converted to appropriate Python data types; EPOCH, EPOCH16, and TIME_TT2000 types are converted to datetime. Data are returned in numpy arrays.

Note

Although pycdf supports TIME_TT2000 variables, the Python datetime object does not support leap seconds. Thus, on read, any seconds past 59 are truncated to 59.999999 (59 seconds, 999 milliseconds, 999 microseconds).
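The datetime limitation is easy to reproduce with the standard library alone. In the sketch below, clamp_leap_second is a hypothetical helper written for illustration; it mimics the truncation described above but is not pycdf's implementation:

```python
import datetime

def clamp_leap_second(year, month, day, hour, minute, second, microsecond=0):
    """Hypothetical helper: build a datetime, truncating second 60 to 59.999999."""
    if second > 59:
        second, microsecond = 59, 999999
    return datetime.datetime(year, month, day, hour, minute, second, microsecond)

# datetime itself rejects a leap second outright.
try:
    datetime.datetime(2016, 12, 31, 23, 59, 60)
    accepted = True
except ValueError:
    accepted = False

clamped = clamp_leap_second(2016, 12, 31, 23, 59, 60, 500000)
print(accepted)  # False
print(clamped)   # 2016-12-31 23:59:59.999999
```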

Potentially useful list methods and related functions:

The topic of array majority can be very confusing; good background material is available at IDL Array Storage and Indexing. In brief, regardless of the majority stored in the CDF, pycdf will always present the data in the native Python majority, row-major order, also known as C order. This is the default order in NumPy. However, packages that render image data may expect it in column-major order. If the axes seem ‘swapped’ this is likely the reason.
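A small pure-Python sketch of the two orderings may help. The nested list a is made up for illustration and does not involve pycdf:

```python
# A 2 x 3 array as nested lists: two rows, three columns.
a = [[0, 1, 2],
     [3, 4, 5]]

# Row-major (C order): the last index varies fastest.
row_major = [a[r][c] for r in range(2) for c in range(3)]

# Column-major: the first index varies fastest.
col_major = [a[r][c] for c in range(3) for r in range(2)]

# Swapping the axes, as may be needed before handing data to an image package.
swapped = [list(col) for col in zip(*a)]

print(row_major)  # [0, 1, 2, 3, 4, 5]
print(col_major)  # [0, 3, 1, 4, 2, 5]
print(swapped)    # [[0, 3], [1, 4], [2, 5]]
```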

The attrs Python attribute acts as a dictionary referencing zAttributes (do not confuse the two); all the dictionary methods above also work on the attribute dictionary. See zAttrList for more on the dictionary of attributes.

With writing, as with reading, every attempt has been made to match the behavior of Python lists. You can write one record, many records, or even certain elements of all records. There is one restriction: only the record dimension (i.e. dimension 0) can be resized by write, as all records in a variable must have the same dimensions. Similarly, only whole records can be deleted.

Note

Unusual error messages on writing data usually mean that pycdf is unable to interpret the data as a regular array of a single type matching the type and shape of the variable being written. A 5x4 array is supported; an irregular array where one row has five columns and a different row has six columns is not. Error messages of this type include:

  • Data must be well-formed, regular array of number, string, or datetime

  • setting an array element with a sequence.

  • shape mismatch: objects cannot be broadcast to a single shape
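When one of these errors appears, it can help to check the input for ragged rows before writing. The function regular_shape below is a hypothetical helper for that purpose; it is not part of pycdf and does not reflect how pycdf validates data:

```python
def regular_shape(data):
    """Return the shape of a regular nested list, or None if it is ragged."""
    if not isinstance(data, (list, tuple)):
        return ()  # a scalar leaf has an empty shape
    shapes = [regular_shape(item) for item in data]
    if any(shape is None for shape in shapes):
        return None  # something deeper down is already ragged
    if len(set(shapes)) > 1:
        return None  # siblings disagree on shape
    inner = shapes[0] if shapes else ()
    return (len(data),) + inner

regular = [[0] * 4 for _ in range(5)]  # 5x4: supported
ragged = [[0] * 5, [0] * 6]            # rows of five and six columns: not supported

print(regular_shape(regular))  # (5, 4)
print(regular_shape(ragged))   # None
```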

For these examples, assume Flux has 100 records and dimensions [2, 3].

Rewrite the first record without changing the rest:

>>> Flux[0] = [[1, 2, 3], [4, 5, 6]]

Write a new first record and delete all the rest:

>>> Flux[...] = [[1, 2, 3], [4, 5, 6]]

Write a new record in the last position and add a new record after:

>>> Flux[99:] = [[[1, 2, 3], [4, 5, 6]],
...              [[11, 12, 13], [14, 15, 16]]]

Insert two new records between the current number 5 and 6:

>>> Flux[6:6] = [[[1, 2, 3], [4, 5, 6]],  [[11, 12, 13],
...               [14, 15, 16]]]

This operation can be quite slow, as it requires reading and rewriting the entire variable. (CDF does not directly support record insertion.)

Change the first element of the first two records but leave other elements alone:

>>> Flux[0:2, 0, 0] = [1, 2]

Remove the first record:

>>> del Flux[0]

Remove record 5 (the sixth):

>>> del Flux[5]

Due to the need to work around a bug in the CDF library, this operation can be quite slow.

Delete all data from Flux, but leave the variable definition intact:

>>> del Flux[...]
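Since record-level writes are meant to match Python list behavior, the record-dimension operations can be rehearsed on a plain list. The sketch below uses a nested list named records as a stand-in for a 100-record variable of dimensions [2, 3]; it is not a pycdf variable:

```python
# Stand-in for Flux: 100 records of shape [2, 3]; record i is filled with the value i.
records = [[[i] * 3, [i] * 3] for i in range(100)]

# Rewrite the first record without changing the rest.
records[0] = [[1, 2, 3], [4, 5, 6]]

# Replace the last record and append one more after it.
records[99:] = [[[1, 2, 3], [4, 5, 6]],
                [[11, 12, 13], [14, 15, 16]]]

# An empty slice inserts without replacing: two new records between 5 and 6.
records[6:6] = [[[1, 2, 3], [4, 5, 6]],
                [[11, 12, 13], [14, 15, 16]]]

# Remove the first record.
del records[0]

print(len(records))  # 102
```

Subscripts that span several dimensions, such as [0:2, 0, 0], and the [...] forms have no plain-list equivalent.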

Note

Although this interface only directly supports zVariables, zMode is set on opening the CDF so rVars appear as zVars. See p.24 of the CDF user’s guide; pycdf uses zMode 2.

attrs

zAttributes for this zVariable in a dict-like format.

compress([comptype, param])

Set or check the compression of this variable

copy()

Copies all data and attributes from this variable

dtype

Provide the numpy dtype equivalent to the CDF type of this variable.

dv([new_dv])

Gets or sets dimension variance of each dimension of variable.

insert(index, data)

Inserts a single record before an index

name()

Returns the name of this variable

nelems()

Number of elements for each value in this variable

rename(new_name)

Renames this variable

rv([new_rv])

Gets or sets whether this variable has record variance

shape

Provides the numpy array-like shape of this variable.

type([new_type])

Returns or sets the CDF type of this variable

attrs

zAttributes for this zVariable in a dict-like format. See zAttrList for details.

compress(comptype=None, param=None)

Set or check the compression of this variable

Compression may not be changeable on variables with data already written; even deleting the data may not permit the change.

See section 2.6 of the CDF user’s guide for more information on compression.

Returns
out : tuple

the (comptype, param) currently in effect

Other Parameters
comptype : ctypes.c_long

type of compression to change to, see CDF C reference manual section 4.10. Constants for this parameter are in const. If not specified, will not change compression.

param : ctypes.c_long

Compression parameter, see CDF CRM 4.10 and const. If not specified, will choose reasonable default (5 for gzip; other types have only one possible parameter.)

copy()

Copies all data and attributes from this variable

Returns
out : VarCopy

list of all data in record order

dtype

Provide the numpy dtype equivalent to the CDF type of this variable.

Data from this variable will be returned in numpy arrays of this type.

See also

type

dv(new_dv=None)

Gets or sets dimension variance of each dimension of variable.

If the variance is unknown, True is assumed (this replicates the apparent behavior of the CDF library on variable creation).

Parameters
new_dv : list of boolean

Each element True to change that dimension to dimension variance, False to change to not dimension variance. (Unspecified to simply check variance.)

Returns
out : list of boolean

True if that dimension has variance, else False.

insert(index, data)

Inserts a single record before an index

Parameters
index : int

index before which to insert the new record

data :

the record to insert

name()

Returns the name of this variable

Returns
out : str

variable’s name

nelems()

Number of elements for each value in this variable

This is the length of strings for CHAR and UCHAR; it should be 1 otherwise.

Returns
int

length of strings

rename(new_name)

Renames this variable

Parameters
new_name : str

the new name for this variable

rv(new_rv=None)

Gets or sets whether this variable has record variance

If the variance is unknown, True is assumed (this replicates the apparent behavior of the CDF library on variable creation).

Returns
out : Boolean

True if record varying, False if NRV

Other Parameters
new_rv : boolean

True to change to record variance, False to change to NRV, unspecified to simply check variance.

shape

Provides the numpy array-like shape of this variable.

Returns a tuple; the first element is the number of records (RV variables only), and the rest provide the dimensionality of the variable.

Note

Assigning to this attribute will not change the shape.

type(new_type=None)

Returns or sets the CDF type of this variable

Parameters
new_type : ctypes.c_long

the new type from const

Returns
out : int

CDF type