pycdf - Python interface to CDF files
This package provides a Python interface to the Common Data Format (CDF) library used for many NASA missions, available at http://cdf.gsfc.nasa.gov/. It is targeted at Python 2.6+ and should work without change on either Python 2 or Python 3.
The interface is intended to be ‘pythonic’ rather than reproducing the
C interface. To open or close a CDF and access its variables, see the
CDF class. Accessing data within the variables is via the Var class. The
lib object provides access to some routines
that affect the functionality of the library in general. The
const module contains constants useful for accessing
the underlying library.
The CDF C library must be properly installed in order to use this package.
The CDF distribution provides scripts meant to be called in a user’s
login scripts: definitions.B for bash and
definitions.C for C-shell
derivatives. (See the installation instructions that come with the CDF library.)
These will set environment variables specifying the location
of the library; pycdf will respect these variables if they are set. Otherwise
it will search the standard system library path and the default installation
locations for the CDF library.
If pycdf has trouble finding the library, try setting
CDF_LIB before importing
the module, e.g. if the library is in
CDF/lib in the user’s home directory:
>>> import os
>>> os.environ["CDF_LIB"] = "~/CDF/lib"
>>> from spacepy import pycdf
If this works, make the environment setting permanent. Note that on OS X,
using plists to set the environment may not carry over to Python terminal sessions.
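Whether a leading ~ in CDF_LIB gets expanded is not stated here, so a cautious variant expands the home directory explicitly before setting the variable. The following is a minimal sketch using only the standard library; the CDF/lib location is the same illustrative path as in the example above, and the pycdf import is left as a comment so the snippet runs without the CDF library.

```python
import os

# Expand "~" ourselves rather than relying on anything downstream to do it.
cdf_lib_dir = os.path.expanduser(os.path.join("~", "CDF", "lib"))
os.environ["CDF_LIB"] = cdf_lib_dir

# Only after the variable is set would one run:
#     from spacepy import pycdf
print(os.environ["CDF_LIB"])
```

Setting the variable inside the Python process only affects that process, which is why a permanent setting still belongs in the shell or system configuration.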
Authors: Jon Niehof
Institution: University of New Hampshire
Copyright 2010-2015 Los Alamos National Security, LLC.
This example presents the entire sequence of creating a CDF and populating it with some data; the parts are explained individually below.
>>> from spacepy import pycdf
>>> import datetime
>>> time = [datetime.datetime(2000, 10, 1, 1, val) for val in range(60)]
>>> import numpy as np
>>> data = np.random.random_sample(len(time))
>>> cdf = pycdf.CDF('MyCDF.cdf', '')
>>> cdf['Epoch'] = time
>>> cdf['data'] = data
>>> cdf.attrs['Author'] = 'John Doe'
>>> cdf.attrs['CreateDate'] = datetime.datetime.now()
>>> cdf['data'].attrs['units'] = 'MeV'
>>> cdf.close()
Import the pycdf module.
>>> from spacepy import pycdf
Make a data set of datetime objects. These will be converted into
CDF_TIME_TT2000 types.
>>> import datetime
>>> # make a dataset every minute for an hour
>>> time = [datetime.datetime(2000, 10, 1, 1, val) for val in range(60)]
If you create a CDF in backwards compatibility mode,
datetime objects are degraded to CDF_EPOCH
(millisecond resolution), not CDF_EPOCH16 (microsecond resolution).
Use new() to specify a data type.
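To see what millisecond resolution means for a datetime value, the loss can be reproduced with the standard library alone. This sketch does not call pycdf; the helper truncate_to_milliseconds is hypothetical, and it assumes truncation (the text does not say whether the library truncates or rounds).

```python
import datetime


def truncate_to_milliseconds(dt):
    """Drop sub-millisecond precision (assumed truncation, not rounding)."""
    return dt.replace(microsecond=(dt.microsecond // 1000) * 1000)


original = datetime.datetime(2000, 10, 1, 1, 0, 0, 123456)
stored = truncate_to_milliseconds(original)

print(original)  # 2000-10-01 01:00:00.123456
print(stored)    # 2000-10-01 01:00:00.123000
```

Timestamps that differ only below the millisecond become indistinguishable after such a conversion, which matters for high-cadence data.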
Create some random data.
>>> import numpy as np
>>> data = np.random.random_sample(len(time))
Create a new empty CDF. The empty string, ‘’, is the name of the CDF to use as a master; given an empty string, an empty CDF will be created, rather than copying from a master CDF. If a master is used, data in the master will be copied to the new CDF.
>>> cdf = pycdf.CDF('MyCDF.cdf', '')
You cannot create a new CDF with a name that already exists on disk;
attempting to do so raises an error.
To put data into a CDF, assign it directly to an element of the CDF. CDF objects behave like Python dictionaries.
>>> # put time into CDF variable Epoch
>>> cdf['Epoch'] = time
>>> # and the same with data (the smallest data type that fits the data is used by default)
>>> cdf['data'] = data
Adding attributes is done similarly. CDF attributes are also treated as dictionaries.
>>> # add some attributes to the CDF and the data
>>> cdf.attrs['Author'] = 'John Doe'
>>> cdf.attrs['CreateDate'] = datetime.datetime.now()
>>> cdf['data'].attrs['units'] = 'MeV'
Closing the CDF ensures the new data are written to disk:
>>> cdf.close()
CDF files, like standard Python files, act as context managers:
>>> with pycdf.CDF('filename.cdf', '') as cdf_file:
...     # do brilliant things with cdf_file
...     pass
>>> # cdf_file is automatically closed here
Reading a CDF is very similar: the CDF object behaves like a dictionary. The file is only accessed when data are requested. A full example using the above CDF:
>>> from spacepy import pycdf
>>> cdf = pycdf.CDF('MyCDF.cdf')
>>> print(cdf)
Epoch: CDF_TIME_TT2000
data: CDF_FLOAT
>>> cdf['data'][4]
0.8609974384307861
>>> data = cdf['data'][...] # don't forget the [...]
>>> cdf_dat = cdf.copy()
>>> cdf_dat.keys()
['Epoch', 'data']
>>> cdf.close()
Again, import the pycdf module.
>>> from spacepy import pycdf
Then open the CDF. This looks the same as creation, but without mention of a master CDF.
>>> cdf = pycdf.CDF('MyCDF.cdf')
The default __repr__() output describes the contents, type, and size, but not the data.
>>> print(cdf)
Epoch: CDF_TIME_TT2000
data: CDF_FLOAT
To access the data one has to request specific elements of the variable, similar to a Python list.
>>> cdf['data'][4]
0.8609974384307861
>>> data = cdf['data'][...] # don't forget the [...]
>>> cdf_dat = cdf.copy()
Since CDF objects behave like dictionaries, they have a
keys() method, and iteration is over the names in
keys().
>>> cdf_dat.keys()
['Epoch', 'data']
Close the CDF when finished:
>>> cdf.close()
An example modifying the CDF created above:
>>> from spacepy import pycdf
>>> cdf = pycdf.CDF('MyCDF.cdf')
>>> cdf.readonly(False)
False
>>> cdf['newVar'] = [1.0, 2.0]
>>> print(cdf)
Epoch: CDF_TIME_TT2000
data: CDF_FLOAT
newVar: CDF_FLOAT
>>> cdf.close()
As before, each step in this example will now be individually explained. Existing CDF files are opened in read-only mode and must be set to read-write before modification:
>>> cdf.readonly(False)
False
Then new variables can be added:
>>> cdf['newVar'] = [1.0, 2.0]
Or contents can be changed:
>>> cdf['data'] = 8675309
You can write all new data to an existing variable, leaving the variable type, dimensionality, and attributes unchanged:
>>> cdf['Epoch'][...] = [datetime.datetime(2010, 10, 1, 1, val)
...                      for val in range(60)]
This is the common usage when using a CDF file containing all the
variables and attributes but no data, sometimes called a “master
CDF”. Although the
[...] makes this explicit (writing new records,
not a new variable), the same syntax as for a new variable can also be used:
>>> # Either create a new variable or overwrite data in existing
>>> cdf['Epoch'] = [datetime.datetime(2010, 10, 1, 1, val)
...                 for val in range(60)]
The new variables appear immediately:
>>> print(cdf)
Epoch: CDF_TIME_TT2000
data: CDF_FLOAT
newVar: CDF_FLOAT
Closing the CDF ensures changes are written to disk:
>>> cdf.close()
Non record-varying (NRV) variables are usually used for data that does not vary with time, such as the energy channels for an instrument.
NRV variables need to be created with
CDF.new(), specifying the keyword ‘recVary’ as False.
>>> from spacepy import pycdf
>>> cdf = pycdf.CDF('MyCDF2.cdf', '')
>>> cdf.new('data2', [1], recVary=False)
<Var: CDF_BYTE  NRV >
>>> cdf['data2'][...]
Subsets of data in a variable can be easily referenced with Python’s slicing and indexing notation.
This example uses
bisect to read a subset of the data from the
hourly data file created in earlier examples.
>>> import bisect
>>> import datetime
>>> from spacepy import pycdf
>>> cdf = pycdf.CDF('MyCDF.cdf')
>>> start = datetime.datetime(2000, 10, 1, 1, 9)
>>> stop = datetime.datetime(2000, 10, 1, 1, 35)
>>> start_ind = bisect.bisect_left(cdf['Epoch'], start)
>>> stop_ind = bisect.bisect_left(cdf['Epoch'], stop)
>>> # then grab the data we want
>>> time = cdf['Epoch'][start_ind:stop_ind]
>>> data = cdf['data'][start_ind:stop_ind]
>>> cdf.close()
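The index arithmetic in this example can be checked without a CDF file. The sketch below applies the same bisect calls to a plain list of datetime objects with the same one-minute cadence; the values list is stand-in data invented for the illustration.

```python
import bisect
import datetime

# Same one-minute cadence as the Epoch variable in the examples above.
epoch = [datetime.datetime(2000, 10, 1, 1, val) for val in range(60)]
values = list(range(60))  # stand-in data, one value per timestamp

start = datetime.datetime(2000, 10, 1, 1, 9)
stop = datetime.datetime(2000, 10, 1, 1, 35)

# bisect_left returns the first index whose timestamp is >= the target,
# so the slice includes `start` and excludes `stop`.
start_ind = bisect.bisect_left(epoch, start)
stop_ind = bisect.bisect_left(epoch, stop)

subset_time = epoch[start_ind:stop_ind]
subset_values = values[start_ind:stop_ind]
print(start_ind, stop_ind, len(subset_time))  # 9 35 26
```

Because bisect assumes sorted input, this approach only works when the time variable is monotonically increasing.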
The Var documentation has several additional examples.
Changed in version 0.3.0.
Prior to SpacePy 0.3.0, pycdf treated all strings as ASCII-encoded, and would raise errors when writing or reading strings that were not valid ASCII.
Per the NASA CDF library, variable and attribute names must be in
ASCII. The contents of
CDF_UCHAR were redefined
to be UTF-8 as of CDF 3.8.1. As of SpacePy 0.3.0, pycdf treats all
CHAR variables with a default encoding of UTF-8. This is true
regardless of the version of the underlying CDF library.
UTF-8 is a variable-length encoding, so the number of elements in the variable may not correspond to the number of characters if data are not restricted to the ASCII range.
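The difference between character count and encoded length is easy to demonstrate in plain Python. This sketch uses only built-in string methods and makes no pycdf calls; the sample word is arbitrary.

```python
text = "caf\u00e9"  # "café": four characters, one outside the ASCII range
encoded = text.encode("utf-8")

print(len(text))     # 4
print(len(encoded))  # 5
```

A string variable sized for four elements would therefore be too small to hold this four-character word once it is encoded.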
A different encoding can be specified when a file is
opened, and this encoding will be used on
all reads and writes to that file. Opening a CDF read-write with
encoding other than
ascii will issue a warning.
Writing strings which cannot be represented in the desired encoding will raise an error. When reading from a CDF, characters which cannot be decoded will be replaced with the Unicode “replacement character” U+FFFD, which usually displays as a question mark.
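Python's own codecs show the same two behaviors, which may help when anticipating what to expect. This is an analogy using built-in str and bytes methods, not a trace of pycdf's internal code path; the exact exception type pycdf raises is not stated here.

```python
# Writing analogue: a character outside the target encoding cannot be encoded.
try:
    "caf\u00e9".encode("ascii")
    encode_failed = False
except UnicodeEncodeError:
    encode_failed = True

# Reading analogue: bytes that are not valid UTF-8 decode to U+FFFD.
raw = b"caf\xe9"  # latin-1 bytes for "café", invalid as UTF-8
decoded = raw.decode("utf-8", errors="replace")

print(encode_failed)            # True
print(decoded == "caf\ufffd")   # True
```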
It is always possible to write raw bytes data to a variable, if it is
desired to use a different encoding for a single write. For arrays of data,
this will usually involve numpy.char.encode():
>>> cdf['Variable'] = data.encode('latin-1')
>>> cdf['Variable'] = numpy.char.encode(data, encoding='latin-1')
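The array case can be tried on its own to see what is actually handed to the variable. This sketch stops short of the CDF assignment so that it runs without the CDF library; the sample strings are arbitrary.

```python
import numpy as np

names = np.array(["caf\u00e9", "na\u00efve"])

# Encode every element; the result is a bytes array rather than a str array.
encoded = np.char.encode(names, encoding="latin-1")

print(encoded.dtype.kind)  # S
print(encoded[0])          # b'caf\xe9'

# Round trip: decoding with the same encoding recovers the original strings.
restored = np.char.decode(encoded, encoding="latin-1")
```

Anyone reading such a variable later needs to know which encoding was used, since the file itself will be interpreted with the encoding chosen at open time.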
The error message
Cannot load CDF C library indicates that pycdf cannot find
this library. pycdf searches in locations where the library is
installed by default; if the library is not found, set the
CDF_LIB environment variable to the directory containing the library file
(.dll, .dylib, or .so) before importing pycdf.
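Before setting the variable, it can help to confirm that the chosen directory really contains a shared library file. The helper below is a hypothetical sketch using only the standard library; it matches on the three extensions listed above and does not know the exact file name pycdf looks for. The demonstration uses a temporary directory with placeholder file names.

```python
import os
import tempfile

LIBRARY_SUFFIXES = (".dll", ".dylib", ".so")


def candidate_library_files(directory):
    """Return sorted file names in `directory` ending in a library suffix."""
    if not os.path.isdir(directory):
        return []
    return sorted(
        name for name in os.listdir(directory)
        if name.lower().endswith(LIBRARY_SUFFIXES)
    )


# Demonstration with placeholder files; no real library is involved.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ("libcdf.so", "README.txt"):
        open(os.path.join(demo_dir, name), "w").close()
    found = candidate_library_files(demo_dir)

print(found)  # ['libcdf.so']
```

An empty result for the intended directory suggests the path points at the wrong level of the installation, for example the installation root rather than its library subdirectory. Versioned names such as those ending in a number after .so are not matched by this simple check.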
The error message
ZLIB_ERROR: Error during ZLIB decompression most
commonly occurs when opening a CDF which has been compressed with
whole-file compression. In this case, it must be unzipped into a
temporary location (details are in the CDF User’s Guide).
The temporary location is specified by environment variables, most
commonly CDF_TMP. It appears that, particularly on Windows, some
installers of the library may set this to a location which is not
writeable. In that case, the solution is to change the environment
variable to a writeable location.
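Whether a location is writeable can be tested directly from Python before opening the file. The sketch below is a hypothetical diagnostic using only the standard library; it checks CDF_TMP and TEMP, the two variables named in this section, by attempting to create a temporary file in each.

```python
import os
import tempfile


def is_writable_dir(path):
    """Return True if a temporary file can actually be created in `path`."""
    try:
        with tempfile.TemporaryFile(dir=path):
            return True
    except OSError:
        return False


for name in ("CDF_TMP", "TEMP"):
    value = os.environ.get(name)
    if value is None:
        print(name, "is not set")
    else:
        print(name, "=", value, "writable:", is_writable_dir(value))
```

Creating a real file is more reliable than inspecting permissions, since it also catches directories that do not exist.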
On Windows, environment variables are set in the System Properties
control panel. Click the “Environment Variables” button on the
Advanced tab. Usually a good value for
CDF_TMP is not
TEMP will be used, so those values are
worth checking. Values starting with
are unlikely to work.
On Unix, including MacOS,
CDF_TMP is used if set; otherwise
Constants defined in cdf.h and occasionally useful in accessing CDFs are
available in the const module.
The underlying C library is represented by the lib object.
Python object representing a CDF file.
A CDF variable.
Object representing all the gAttributes in a CDF.
Object representing all the zAttributes in a zVariable.
zAttribute for zVariables within a CDF.
Global Attribute for a CDF
Object representing a list of attributes.
An attribute, g or z, for a CDF
Abstraction of the base CDF C library and its state.
A dictionary-like copy of all data and attributes in a
A list-like copy of the data and attributes in a
Raised for an error in the CDF library.
Base class for errors or warnings in the CDF library.
Used for a warning in the CDF library.
Used for errors in epoch routines
Concatenate data from multiple CDFs
Various constants defined in cdf.h and used in pycdf.
Support for ISTP-compliant CDFs