pycdf - Python interface to CDF files¶
Contents¶
See also the full API documentation
.
Create a CDF¶
This example presents the entire sequence of creating a CDF and populating it with some data; the parts are explained individually below.
>>> from spacepy import pycdf
>>> import datetime
>>> time = [datetime.datetime(2000, 10, 1, 1, val) for val in range(60)]
>>> import numpy as np
>>> data = np.random.random_sample(len(time))
>>> cdf = pycdf.CDF('MyCDF.cdf', '')
>>> cdf['Epoch'] = time
>>> cdf['data'] = data
>>> cdf.attrs['Author'] = 'John Doe'
>>> cdf.attrs['CreateDate'] = datetime.datetime.now()
>>> cdf['data'].attrs['units'] = 'MeV'
>>> cdf.close()
Import the pycdf module.
>>> from spacepy import pycdf
Make a data set of datetime
. These will be converted into
CDF_TIME_TT2000 types.
>>> import datetime
>>> # make a dataset every minute for a hour
>>> time = [datetime.datetime(2000, 10, 1, 1, val) for val in range(60)]
Warning
A CDF can be created in backward compatibility mode for reading with
CDF library before 3.0, using set_backward()
.
Then datetime
objects are degraded to CDF_EPOCH
(millisecond resolution), not CDF_EPOCH16 (microsecond resolution).
Use new()
to specify a data type.
Create some random data.
>>> import numpy as np
>>> data = np.random.random_sample(len(time))
Create a new empty CDF. The empty string, ‘’, is the name of the CDF to use as a master; given an empty string, an empty CDF will be created, rather than copying from a master CDF. If a master is used, data in the master will be copied to the new CDF.
>>> cdf = pycdf.CDF('MyCDF.cdf', '')
Note
You cannot create a new CDF with a name that already exists on disk.
It will throw a NameError
To put data into a CDF, assign it directly to an element of the CDF. CDF objects behave like Python dictionaries.
>>> # put time into CDF variable Epoch
>>> cdf['Epoch'] = time
>>> # and the same with data (the smallest data type that fits the data is used by default)
>>> cdf['data'] = data
Adding attributes is done similarly. CDF attributes are also treated as dictionaries.
>>> # add some attributes to the CDF and the data
>>> cdf.attrs['Author'] = 'John Doe'
>>> cdf.attrs['CreateDate'] = datetime.datetime.now()
>>> cdf['data'].attrs['units'] = 'MeV'
Closing the CDF ensures the new data are written to disk:
>>> cdf.close()
CDF files, like standard Python files, act as context managers
>>> with cdf.CDF('filename.cdf', '') as cdf_file:
... #do brilliant things with cdf_file
>>> #cdf_file is automatically closed here
Read a CDF¶
Reading a CDF is very similar: the CDF object behaves like a dictionary. The file is only accessed when data are requested. A full example using the above CDF:
>>> from spacepy import pycdf
>>> cdf = pycdf.CDF('MyCDF.cdf')
>>> print(cdf)
Epoch: CDF_TIME_TT2000 [60]
data: CDF_FLOAT [60]
>>> cdf['data'][4]
0.8609974384307861
>>> data = cdf['data'][...] # don't forget the [...]
>>> cdf_dat = cdf.copy()
>>> cdf_dat.keys()
['Epoch', 'data']
>>> cdf.close()
Again import the pycdf module
>>> from spacepy import pycdf
Then open the CDF, this looks the same and creation, but without mention of a master CDF.
>>> cdf = pycdf.CDF('MyCDF.cdf')
The default __str__()
and __repr__()
behavior explains the contents, type, and size but not the data.
>>> print(cdf)
Epoch: CDF_TIME_TT2000 [60]
data: CDF_FLOAT [60]
To access the data one has to request specific elements of the variable, similar to a Python list.
>>> cdf['data'][4]
0.8609974384307861
>>> data = cdf['data'][...] # don't forget the [...]
CDF.copy()
will return the entire contents of a CDF, including
attributes, as a SpaceData
object:
>>> cdf_dat = cdf.copy()
Since CDF objects behave like dictionaries they have a keys()
method and iterations are over the names in keys()
>>> cdf_dat.keys()
['Epoch', 'data']
Close the CDF when finished:
>>> cdf.close()
Modify a CDF¶
An example modifying the CDF created above:
>>> from spacepy import pycdf
>>> cdf = pycdf.CDF('MyCDF.cdf')
>>> cdf.readonly(False)
False
>>> cdf['newVar'] = [1.0, 2.0]
>>> print(cdf)
Epoch: CDF_TIME_TT2000 [60]
data: CDF_FLOAT [60]
newVar: CDF_FLOAT [2]
>>> cdf.close()
As before, each step in this example will now be individually explained. Existing CDF files are opened in read-only mode and must be set to read-write before modification:
>>> cdf.readonly(False)
False
Then new variables can be added:
>>> cdf['newVar'] = [1.0, 2.0]
Or contents can be changed:
>>> cdf['data'][0] = 8675309
You can write all new data to an existing variable, leaving the variable type, dimensionality, and attributes unchanged:
>>> cdf['Epoch'][...] = [datetime.datetime(2010, 10, 1, 1, val)
... for val in range(60)]
This is the common usage when using a CDF file containing all the
variables and attributes but no data, sometimes called a “master
CDF”. Although the [...]
makes this explicit (writing new records
not a new variable), the same syntax as for a new variable can also be
used:
>>> # Either create a new variable or overwrite data in existing
>>> cdf['Epoch'] = [datetime.datetime(2010, 10, 1, 1, val)
... for val in range(60)]
The new variables appear immediately:
>>> print(cdf)
Epoch: CDF_TIME_TT2000 [60]
data: CDF_FLOAT [60]
newVar: CDF_FLOAT [2]
Closing the CDF ensures changes are written to disk:
>>> cdf.close()
Non record-varying¶
Non record-varying (NRV) variables are usually used for data that does not vary with time, such as the energy channels for an instrument.
NRV variables need to be created with CDF.new()
, specifying the keyword ‘recVary’ as False.
>>> from spacepy import pycdf
>>> cdf = pycdf.CDF('MyCDF2.cdf', '')
>>> cdf.new('data2', [1], recVary=False)
<Var:
CDF_BYTE [1] NRV
>
>>> cdf['data2'][...]
[1]
Slicing and indexing¶
Subsets of data in a variable can be easily referenced with Python’s slicing and indexing notation.
This example uses bisect
to read a subset of the data from the
hourly data file created in earlier examples.
>>> from spacepy import pycdf
>>> cdf = pycdf.CDF('MyCDF.cdf')
>>> start = datetime.datetime(2000, 10, 1, 1, 9)
>>> stop = datetime.datetime(2000, 10, 1, 1, 35)
>>> import bisect
>>> start_ind = bisect.bisect_left(cdf['Epoch'], start)
>>> stop_ind = bisect.bisect_left(cdf['Epoch'], stop)
>>> # then grab the data we want
>>> time = cdf['Epoch'][start_ind:stop_ind]
>>> data = cdf['data'][start_ind:stop_ind]
>>> cdf.close()
The Var
documentation has several additional examples.
String handling¶
Changed in version 0.3.0.
Prior to SpacePy 0.3.0, pycdf treated all strings as ASCII-encoded, and would raise errors when writing or reading strings that were not valid ASCII.
Per the NASA CDF library, variable and attribute names must be in
ASCII. The contents of CDF_CHAR
and CDF_UCHAR
were redefined
to be UTF-8 as of CDF 3.8.1. As of SpacePy 0.3.0, pycdf treats all
CHAR
variables with a default encoding of UTF-8. This is true
regardless of the version of the underlying CDF library.
UTF-8 is a variable-length encoding, so the number of elements in the variable may not correspond to the number of characters if data are not restricted to the ASCII range.
A different encoding can be specified with the encoding
argument
to spacepy.pycdf.CDF.open()
and this encoding will be used on
all reads and writes to that file. Opening a CDF read-write with
encoding
other than utf-8
or ascii
will issue a warning.
Writing strings which cannot be represented in the desired encoding will raise an error. When reading from a CDF, characters which cannot be decoded will be replaced with the Unicode “replacement character” U+FFFD, which usually displays as a question mark.
It is always possible to write raw bytes data to a variable, if it is
desired to use a different encoding for one time. For arrays of data,
this will usually involve numpy.char.encode()
:
>>> cdf['Variable'] = data.encode('latin-1')
>>> cdf['Variable'] = numpy.char.encode(data, encoding='latin-1')
All encoding and decoding can also be skipped using the
raw_var()
method to access a variable;
however, without encoding, only bytes
can be written to
string variables.
Troubleshooting¶
Cannot load CDF C library¶
pycdf requires the standard NASA CDF library 3.5 or later; it can be installed after SpacePy. Generally this is only necessary if installing SpacePy from source. See specific instructions for Linux, Mac, and Windows.
The error Cannot load CDF C library
indicates pycdf cannot find
this library. pycdf searches in locations where the library is
installed by default; if the library is not found, set the CDF_LIB
environment variable to the directory containing the library file
(.dll, .dylib, or .so) before importing pycdf.
ZLIB_ERROR when opening a CDF¶
The error message ZLIB_ERROR: Error during ZLIB decompression
most
commonly occurs when opening a CDF which has been compressed with
whole-file compression. In this case, it must be unzipped into a
temporary location (details are in the CDF User’s Guide).
The temporary location is specified by environment variables, most
commonly CDF_TMP
. It appears that, particularly on Windows, some
installers of the library may set this to a location which is not
writeable. In that case, the solution is to change the environment
variable to a writeable location.
On Windows, environment variables are set in the System Properties
control panel. Click the “Environment Variables” button on the
Advanced tab. Usually a good value for CDF_TMP
is
C:\Users\USERNAME\AppData\Local\Temp
. If CDF_TMP
is not
set, variables TMP
and TEMP
will be used, so those values are
worth checking. Values starting with C:\WINDOWS\system32\config
are unlikely to work.
On Unix, including MacOS, CDF_TMP
is used if set; otherwise
TMPDIR
.
Access to CDF constants and the C library¶
Constants defined in cdf.h and occasionally useful in accessing CDFs are
available in the const
module.
The underlying C library is represented by the lib
variable.