pycdf - Python interface to CDF files

This package provides a Python interface to the Common Data Format (CDF) library used for many NASA missions, available at http://cdf.gsfc.nasa.gov/. It is targeted at Python 2.6+ and should work without change on either Python 2 or Python 3.

The interface is intended to be ‘pythonic’ rather than reproducing the C interface. To open or close a CDF and access its variables, see the CDF class. Accessing data within the variables is via the Var class. The lib object provides access to some routines that affect the functionality of the library in general. The const module contains constants useful for accessing the underlying library.

The CDF C library must be properly installed in order to use this package. The CDF distribution provides scripts meant to be called in a user’s login scripts, definitions.B for bash and definitions.C for C-shell derivatives. (See the installation instructions which come with the CDF library.) These will set environment variables specifying the location of the library; pycdf will respect these variables if they are set. Otherwise it will search the standard system library path and the default installation locations for the CDF library.

If pycdf has trouble finding the library, try setting CDF_LIB before importing the module, e.g. if the library is in CDF/lib in the user’s home directory:

>>> import os
>>> os.environ["CDF_LIB"] = os.path.expanduser("~/CDF/lib")
>>> from spacepy import pycdf

If this works, make the environment setting permanent. Note that on OSX, using plists to set the environment may not carry over to Python terminal sessions; use .cshrc or .bashrc instead.

Authors: Jon Niehof

Institution: University of New Hampshire

Contact: Jonathan.Niehof@unh.edu

Copyright 2010-2015 Los Alamos National Security, LLC.

Create a CDF

This example presents the entire sequence of creating a CDF and populating it with some data; the parts are explained individually below.

>>> from spacepy import pycdf
>>> import datetime
>>> time = [datetime.datetime(2000, 10, 1, 1, val) for val in range(60)]
>>> import numpy as np
>>> data = np.random.random_sample(len(time))
>>> cdf = pycdf.CDF('MyCDF.cdf', '')
>>> cdf['Epoch'] = time
>>> cdf['data'] = data
>>> cdf.attrs['Author'] = 'John Doe'
>>> cdf.attrs['CreateDate'] = datetime.datetime.now()
>>> cdf['data'].attrs['units'] = 'MeV'
>>> cdf.close()

Import the pycdf module.

>>> from spacepy import pycdf

Make a data set of datetime objects. These will be converted into CDF_TIME_TT2000 types.

>>> import datetime
>>> # make a dataset every minute for an hour
>>> time = [datetime.datetime(2000, 10, 1, 1, val) for val in range(60)]

Warning

If you create a CDF in backwards compatibility mode (using set_backward()), then datetime objects are degraded to CDF_EPOCH (millisecond resolution), not CDF_EPOCH16 (microsecond resolution). Use new() to specify a data type.
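To see what millisecond resolution means for a timestamp, the following sketch truncates a datetime to whole milliseconds using only the standard library. It illustrates the loss of precision, not the conversion pycdf itself performs; truncate_to_ms is a hypothetical helper.

```python
import datetime

def truncate_to_ms(dt):
    """Drop sub-millisecond digits from a datetime (hypothetical helper)."""
    return dt.replace(microsecond=(dt.microsecond // 1000) * 1000)

original = datetime.datetime(2000, 10, 1, 1, 0, 0, 123456)
degraded = truncate_to_ms(original)
print(degraded.microsecond)   # 123000
print(original - degraded)    # 0:00:00.000456
```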

Create some random data.

>>> import numpy as np
>>> data = np.random.random_sample(len(time))

Create a new empty CDF. The empty string, ‘’, is the name of the CDF to use as a master; given an empty string, an empty CDF will be created, rather than copying from a master CDF. If a master is used, data in the master will be copied to the new CDF.

>>> cdf = pycdf.CDF('MyCDF.cdf', '')

Note

You cannot create a new CDF with a name that already exists on disk; attempting to do so raises a NameError.
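One way to avoid this error is to check for the file before creating it. The sketch below uses only the standard library and a scratch file in place of a real CDF; remove_if_exists is a hypothetical helper, and deleting a leftover file is an assumption about what the caller wants.

```python
import os
import tempfile

def remove_if_exists(path):
    """Delete path if present; return True if a file was removed."""
    if os.path.exists(path):
        os.remove(path)
        return True
    return False

# Demonstrate on a scratch file rather than a real CDF.
scratch_dir = tempfile.mkdtemp()
target = os.path.join(scratch_dir, 'MyCDF.cdf')
open(target, 'w').close()      # stand-in for a leftover file
removed = remove_if_exists(target)
# The name is now free, so pycdf.CDF(target, '') could create it.
```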

To put data into a CDF, assign it directly to an element of the CDF. CDF objects behave like Python dictionaries.

>>> # put time into CDF variable Epoch
>>> cdf['Epoch'] = time
>>> # and the same with data (the smallest data type that fits the data is used by default)
>>> cdf['data'] = data

Adding attributes is done similarly. CDF attributes are also treated as dictionaries.

>>> # add some attributes to the CDF and the data
>>> cdf.attrs['Author'] = 'John Doe'
>>> cdf.attrs['CreateDate'] = datetime.datetime.now()
>>> cdf['data'].attrs['units'] = 'MeV'

Closing the CDF ensures the new data are written to disk:

>>> cdf.close()

CDF files, like standard Python files, act as context managers:

>>> with pycdf.CDF('filename.cdf', '') as cdf_file:
...     pass  # do brilliant things with cdf_file
>>> # cdf_file is automatically closed here

Read a CDF

Reading a CDF is very similar: the CDF object behaves like a dictionary. The file is only accessed when data are requested. A full example using the above CDF:

>>> from spacepy import pycdf
>>> cdf = pycdf.CDF('MyCDF.cdf')
>>> print(cdf)
    Epoch: CDF_TIME_TT2000 [60]
    data: CDF_FLOAT [60]
>>> cdf['data'][4]
    0.8609974384307861
>>> data = cdf['data'][...] # don't forget the [...]
>>> cdf_dat = cdf.copy()
>>> cdf_dat.keys()
    ['Epoch', 'data']
>>> cdf.close()

Again, import the pycdf module.

>>> from spacepy import pycdf

Then open the CDF. This looks the same as creation, but without mention of a master CDF.

>>> cdf = pycdf.CDF('MyCDF.cdf')

The default __str__() and __repr__() behavior explains the contents, type, and size but not the data.

>>> print(cdf)
    Epoch: CDF_TIME_TT2000 [60]
    data: CDF_FLOAT [60]

To access the data one has to request specific elements of the variable, similar to a Python list.

>>> cdf['data'][4]
    0.8609974384307861
>>> data = cdf['data'][...] # don't forget the [...]

CDF.copy() will return the entire contents of a CDF, including attributes, as a SpaceData object:

>>> cdf_dat = cdf.copy()

Since CDF objects behave like dictionaries, they have a keys() method, and iteration is over the names in keys():

>>> cdf_dat.keys()
    ['Epoch', 'data']

Close the CDF when finished:

>>> cdf.close()

Modify a CDF

An example modifying the CDF created above:

>>> from spacepy import pycdf
>>> cdf = pycdf.CDF('MyCDF.cdf')
>>> cdf.readonly(False)
    False
>>> cdf['newVar'] = [1.0, 2.0]
>>> print(cdf)
    Epoch: CDF_TIME_TT2000 [60]
    data: CDF_FLOAT [60]
    newVar: CDF_FLOAT [2]
>>> cdf.close()

As before, each step in this example will now be individually explained. Existing CDF files are opened in read-only mode and must be set to read-write before modification:

>>> cdf.readonly(False)
    False

Then new variables can be added:

>>> cdf['newVar'] = [1.0, 2.0]

Or contents can be changed:

>>> cdf['data'][0] = 8675309

You can write all new data to an existing variable, leaving the variable type, dimensionality, and attributes unchanged:

>>> cdf['Epoch'][...] = [datetime.datetime(2010, 10, 1, 1, val)
...     for val in range(60)]

This is the common usage when using a CDF file containing all the variables and attributes but no data, sometimes called a “master CDF”. Although the [...] makes this explicit (writing new records not a new variable), the same syntax as for a new variable can also be used:

>>> # Either create a new variable or overwrite data in existing
>>> cdf['Epoch'] = [datetime.datetime(2010, 10, 1, 1, val)
...     for val in range(60)]

The new variables appear immediately:

>>> print(cdf)
    Epoch: CDF_TIME_TT2000 [60]
    data: CDF_FLOAT [60]
    newVar: CDF_FLOAT [2]

Closing the CDF ensures changes are written to disk:

>>> cdf.close()

Non record-varying

Non record-varying (NRV) variables are usually used for data that does not vary with time, such as the energy channels for an instrument.

NRV variables need to be created with CDF.new(), specifying the keyword ‘recVary’ as False.

>>> from spacepy import pycdf
>>> cdf = pycdf.CDF('MyCDF2.cdf', '')
>>> cdf.new('data2', [1], recVary=False)
    <Var:
    CDF_BYTE [1] NRV
    >
>>> cdf['data2'][...]
    [1]

Slicing and indexing

Subsets of data in a variable can be easily referenced with Python’s slicing and indexing notation.

This example uses bisect to read a subset of the data from the hour-long, one-minute-cadence data file created in earlier examples.

>>> import datetime
>>> from spacepy import pycdf
>>> cdf = pycdf.CDF('MyCDF.cdf')
>>> start = datetime.datetime(2000, 10, 1, 1, 9)
>>> stop = datetime.datetime(2000, 10, 1, 1, 35)
>>> import bisect
>>> start_ind = bisect.bisect_left(cdf['Epoch'], start)
>>> stop_ind = bisect.bisect_left(cdf['Epoch'], stop)
>>> # then grab the data we want
>>> time = cdf['Epoch'][start_ind:stop_ind]
>>> data = cdf['data'][start_ind:stop_ind]
>>> cdf.close()

The Var documentation has several additional examples.
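The index arithmetic in the bisect example can be checked without a CDF file. This sketch substitutes a plain Python list for cdf['Epoch'] (an assumption made so it runs without the CDF library) and shows that the slice includes the start time and excludes the stop time.

```python
import bisect
import datetime

# Same one-minute cadence as the Epoch variable created earlier.
epoch = [datetime.datetime(2000, 10, 1, 1, val) for val in range(60)]
start = datetime.datetime(2000, 10, 1, 1, 9)
stop = datetime.datetime(2000, 10, 1, 1, 35)

start_ind = bisect.bisect_left(epoch, start)
stop_ind = bisect.bisect_left(epoch, stop)
subset = epoch[start_ind:stop_ind]   # start included, stop excluded
print(start_ind, stop_ind, len(subset))   # 9 35 26
```

To include the stop time as well, bisect.bisect_right can be used for the upper index.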

String handling

Changed in version 0.3.0.

Prior to SpacePy 0.3.0, pycdf treated all strings as ASCII-encoded, and would raise errors when writing or reading strings that were not valid ASCII.

Per the NASA CDF library, variable and attribute names must be in ASCII. The contents of CDF_CHAR and CDF_UCHAR were redefined to be UTF-8 as of CDF 3.8.1. As of SpacePy 0.3.0, pycdf treats all CHAR variables with a default encoding of UTF-8. This is true regardless of the version of the underlying CDF library.

UTF-8 is a variable-length encoding, so the number of elements in the variable may not correspond to the number of characters if data are not restricted to the ASCII range.
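The difference between character count and encoded length can be seen in plain Python, independent of pycdf. The string below is only an illustration.

```python
text = '\u00c5ngstr\u00f6m'      # 'Ångström'
encoded = text.encode('utf-8')
print(len(text))      # 8 characters
print(len(encoded))   # 10 bytes: the two non-ASCII letters take two bytes each
```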

A different encoding can be specified with the encoding argument to open and this encoding will be used on all reads and writes to that file. Opening a CDF read-write with encoding other than utf-8 or ascii will issue a warning.

Writing strings which cannot be represented in the desired encoding will raise an error. When reading from a CDF, characters which cannot be decoded will be replaced with the Unicode “replacement character” U+FFFD, which usually displays as a question mark.
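Python's built-in codecs show the same two behaviors. This is a sketch of the general mechanism; whether pycdf uses exactly these calls internally is an assumption.

```python
# Writing: a character outside the target encoding raises an error.
try:
    '\u03bc'.encode('ascii')     # Greek small letter mu
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

# Reading: invalid bytes decode to U+FFFD when errors='replace'.
raw = b'caf\xe9'                 # Latin-1 bytes, not valid UTF-8
decoded = raw.decode('utf-8', errors='replace')
print(encode_failed)             # True
print(decoded == 'caf\ufffd')    # True
```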

It is always possible to write raw bytes to a variable, which is useful when a different encoding is wanted for a single write. For arrays of data, this will usually involve numpy.char.encode():

>>> # a single string: encode it to bytes directly
>>> cdf['Variable'] = data.encode('latin-1')
>>> # an array of strings: encode element by element
>>> cdf['Variable'] = numpy.char.encode(data, encoding='latin-1')
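As a self-contained check of the array case, the sketch below encodes a small NumPy string array to Latin-1 bytes without touching a CDF. The sample strings are illustrative only.

```python
import numpy as np

names = np.array(['caf\u00e9', 'na\u00efve'])   # 'café', 'naïve'
encoded = np.char.encode(names, encoding='latin-1')
print(encoded.dtype.kind)   # S (a bytes array)
print(encoded[0])           # b'caf\xe9'
```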

All encoding and decoding can also be skipped using the raw_var() method to access a variable; however, without encoding, only bytes can be written to string variables.

Troubleshooting

Cannot load CDF C library

pycdf requires the standard NASA CDF library; it can be installed after SpacePy. See specific instructions for Linux, Mac, and Windows.

The error Cannot load CDF C library indicates pycdf cannot find this library. pycdf searches in locations where the library is installed by default; if the library is not found, set the CDF_LIB environment variable to the directory containing the library file (.dll, .dylib, or .so) before importing pycdf.
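Before importing pycdf, it can help to confirm that CDF_LIB points at a directory that actually contains a shared library. The following is a hypothetical diagnostic using only the standard library; it is not pycdf's own search routine, and the filename patterns are assumptions based on the extensions listed above.

```python
import glob
import os

def find_library_files(directory):
    """List files in directory that have a shared-library extension."""
    directory = os.path.expanduser(directory)
    found = []
    for pattern in ('*.dll', '*.dylib', '*.so', '*.so.*'):
        found.extend(glob.glob(os.path.join(directory, pattern)))
    return sorted(found)

cdf_lib = os.environ.get('CDF_LIB')
if cdf_lib is None:
    print('CDF_LIB is not set')
else:
    print(find_library_files(cdf_lib))
```

An empty list means the directory is missing or holds no library file, so the variable should be corrected.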

ZLIB_ERROR when opening a CDF

The error message ZLIB_ERROR: Error during ZLIB decompression most commonly occurs when opening a CDF which has been compressed with whole-file compression. In this case, it must be unzipped into a temporary location (details are in the CDF User’s Guide).

The temporary location is specified by environment variables, most commonly CDF_TMP. It appears that, particularly on Windows, some installers of the library may set this to a location which is not writeable. In that case, the solution is to change the environment variable to a writeable location.

On Windows, environment variables are set in the System Properties control panel. Click the “Environment Variables” button on the Advanced tab. Usually a good value for CDF_TMP is C:\Users\USERNAME\AppData\Local\Temp. If CDF_TMP is not set, variables TMP and TEMP will be used, so those values are worth checking. Values starting with C:\WINDOWS\system32\config are unlikely to work.

On Unix, including macOS, CDF_TMP is used if set; otherwise TMPDIR is used.
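A quick way to check these variables from Python is sketched below. check_tmp_variables is a hypothetical helper that only reports whether each variable that is set names a writable directory; it does not reproduce the order in which the CDF library consults them.

```python
import os
import tempfile

def check_tmp_variables(names=('CDF_TMP', 'TMP', 'TEMP', 'TMPDIR')):
    """Map each variable that is set to True if it names a writable directory."""
    report = {}
    for name in names:
        value = os.environ.get(name)
        if value is not None:
            report[name] = os.path.isdir(value) and os.access(value, os.W_OK)
    return report

for name, writable in check_tmp_variables().items():
    print(name, 'writable' if writable else 'NOT writable')
# Python's own choice of temporary directory is usually a safe value.
print('suggested location:', tempfile.gettempdir())
```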

Access to CDF constants and the C library

Constants defined in cdf.h and occasionally useful in accessing CDFs are available in the const module.

The underlying C library is represented by the lib variable.

Classes

CDF(pathname[, masterpath, create, ...])

Python object representing a CDF file.

Var(cdf_file, var_name, *args)

A CDF variable.

gAttrList(cdf_file[, special_entry])

Object representing all the gAttributes in a CDF.

zAttrList(zvar)

Object representing all the zAttributes in a zVariable.

zAttr(cdf_file, attr_name[, create])

zAttribute for zVariables within a CDF.

gAttr(cdf_file, attr_name[, create])

Global Attribute for a CDF

AttrList(cdf_file[, special_entry])

Object representing a list of attributes.

Attr(cdf_file, attr_name[, create])

An attribute, g or z, for a CDF

Library([libpath, library])

Abstraction of the base CDF C library and its state.

CDFCopy(cdf)

A dictionary-like copy of all data and attributes in a CDF

VarCopy(zVar)

A list-like copy of the data and attributes in a Var

CDFError(status)

Raised for an error in the CDF library.

CDFException(status)

Base class for errors or warnings in the CDF library.

CDFWarning(status)

Used for a warning in the CDF library.

EpochError

Used for errors in epoch routines

Functions

concatCDF(cdfs[, varnames, raw])

Concatenate data from multiple CDFs

Submodules

const

Various constants defined in cdf.h and used in pycdf.

istp

Support for ISTP-compliant CDFs

Data

spacepy.pycdf.lib

Module global Library object.

Initialized at pycdf load time so all classes have ready access to the CDF library and a common state. E.g.:

>>> from spacepy import pycdf
>>> pycdf.lib.version
    (3, 3, 0, ' ')
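A version tuple of this shape can be compared against a minimum release with ordinary tuple comparison. The sketch below assumes the (major, minor, patch, suffix) layout shown above and uses literal tuples so that it runs without the CDF library; meets_minimum is a hypothetical helper.

```python
def meets_minimum(version, minimum=(3, 8, 1)):
    """True if the numeric part of a version tuple is at least minimum."""
    return tuple(version[:3]) >= tuple(minimum)

print(meets_minimum((3, 3, 0, ' ')))   # False
print(meets_minimum((3, 8, 1, ' ')))   # True
```

In practice the first argument would be pycdf.lib.version.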