************************************************* Paulikas and Blake revisited (Reeves et al. 2011) ************************************************* This case study reproduces the figures of Reeves et al. (2011), "On the relationship between relativistic electron flux and solar wind velocity: Paulikas and Blake revisited" (`doi:10.1029/2010JA015735 `_). Setup ===== Create a directory to hold files for this case study. Within this directory, create subdirectories ``code``, ``data``, and ``plots``. (Using version control on the code directory is recommended; the SpacePy team uses `git `_.) Obtaining energetic particle data ================================= We require the 1.8-3.5 MeV electron flux from the LANL-GEO ESP detector, available in the paper's auxiliary material (scroll down to "Supporting information" on the `paper's page `_. The ESP data are in Data Set S1. Save this file to the ``data`` directory; the filename is assumed to be ``jgra20797-sup-0003-ds01.txt``. The data file was corrupted on upload to AGU, and the code to fix it is non-trivial, so this is a good chance to learn how to run someone else's code. (:ref:`appendix` has step-by-step information on each portion of this process.) Copy all of the following and paste it into a file called ``fix_esp_data.py`` in the ``code`` directory. .. code-block:: python import os.path datadir = os.path.join('..', 'data') in_name = os.path.join(datadir, 'jgra20797-sup-0003-ds01.txt') out_name = os.path.join(datadir, 'jgra20797-sup-0003-ds01_FIXED.txt') infile = open(in_name, 'r') outfile = open(out_name, 'w') data = infile.read() infile.close() data = data.replace('\r', '\n') data = data.replace('\n\n', '\n') data = data.split('\n') for i in range(15): outfile.write(data.pop(0) + '\n') oldline = None for line in data: if line[0:2] in ['19', '20', '2']: if not oldline is None: outfile.write(oldline + '\n') oldline = line else: oldline += line outfile.write(oldline + '\n') outfile.close() Now this script can be run with ``python fix_esp_data.py``. It should create a file called ``jgra20797-sup-0003-ds01_FIXED.txt`` in the ``data`` directory. File fixed, we can load and begin examining the data. Change to the ``code`` directory and start your Python interpreter. (`IPython `_ is recommended, but not required.) In the following examples, do not type the leading ``>>>``; this is the Python interpreter prompt. IPython has a different prompt that looks like ``In [1]``. >>> import os.path >>> datadir = os.path.join('..', 'data') >>> print(datadir) ../data The first line imports the :py:mod:`os.path` module from the Python standard library. Python has a huge `standard library `_. To keep this code organized, it is divided into many modules, and a module must be imported before it can be used. (The `Python module of the week `_ is a great way to explore the standard library.) The second line makes a variable, ``datadir``, which will contain the path of the data directory. The :py:func:`os.path.join` function provides a portable way of "gluing" together directories in a path, and will use backslashes on Windows and forward slashes on Unix. The third line then prints out the value of this variable for confirmation; note this is a Unix system. Note that string constants in Python can use single or double quotes; we could just as well have written: >>> datadir = os.path.join("..", "data") or even: >>> datadir = os.path.join('..', "data") The full path can also be used (and this is a better case for using a variable.) For example, I am preparing this example in a directory ``reeves_morley_friedel_2011`` in my home directory, so I could use: >>> datadir = os.path.join('home', 'jniehof', 'reeves_morley_friedel_2011', ... 'data') This very long line can be typed across two lines in Python, and because the line break happens within parentheses, a line continuation character is not required. Returning to reading the ESP data file: >>> fname = os.path.join(datadir, 'jgra20797-sup-0003-ds01_FIXED.txt') creates a variable holding the full path to the fixed file. >>> import numpy The import statement imports any installed `module `_, just as if it were in the standard library. Here we import the very useful :mod:`numpy` module, which is a prerequisite for SpacePy and useful in its own right. >>> esp_fluxes = numpy.loadtxt(fname, skiprows=14, usecols=[1]) :func:`~numpy.loadtxt` makes it easy to load data from a file into a numpy :class:`~numpy.ndarray`, a very useful data container. ``skiprows`` skips the header information, and specifying only column 1 (first column is column 0) with ``usecols`` will only load the fluxes for 1.8-3.5MeV. We only load the fluxes at this point because they can be represented as floats, which numpy arrays store very efficiently. >>> import datetime The :mod:`datetime` module provides Python objects which can manipulate dates and times and have some understanding of the meanings of dates, making for easy comparisons between dates, date arithmetic, and other useful features. >>> convert = lambda x: datetime.datetime.strptime(x, '%Y-%m-%d') This line sets up a converter to be used later. :meth:`~datetime.datetime.strptime` creates a :class:`~datetime.datetime` from a string, given a format definition (here specified as year-month-day). So: >>> print(datetime.datetime.strptime('2010-01-02', '%Y-%m-%d')) 2010-01-02 00:00:00 `lambda `_ is a simple shortcut for a one-liner function; wherever ``convert(x)`` is used after the definition, it functions like ``datetime.datetime.strptime(x, '%Y-%m-%d')``. This makes it easier to parse a date string without specifying the format all the time: >>> print(convert('2010-01-02')) This converter can be used with :func:`~numpy.loadtxt`: >>> esp_times = numpy.loadtxt(fname, skiprows=14, usecols=[0], ... converters={0: convert}, dtype=numpy.object) The ``converters`` option takes a Python `dictionary `_. The default `dtype `_ is float, which cannot store datetimes; using ``numpy.object`` allows storage of any Python object. Since it would be useful to be able to load the data without typing so many lines, create a file called ``common.py`` in the ``code`` directory with the following contents: .. code-block:: python import datetime import os.path import numpy datadir = os.path.join('..', 'data') def load_esp(): fname = os.path.join(datadir, 'jgra20797-sup-0003-ds01_FIXED.txt') esp_fluxes = numpy.loadtxt(fname, skiprows=14, usecols=[1]) convert = lambda x: datetime.datetime.strptime(x, '%Y-%m-%d') esp_times = numpy.loadtxt(fname, skiprows=14, usecols=[0], converters={0: convert}, dtype=numpy.object) return (esp_times, esp_fluxes) All needed imports are at the top of the file, with one blank line between standard library imports and other imports and two blank lines after them. ``datadir`` is defined as a global variable, outside of the function (but notice that it is available to the ``load_esp`` function.) The rest of the file defines a `function `_ which returns the dates and fluxes in a `tuple `_. The next section shows how to use this function. Solar Wind data and averaging ============================= The top panel of figure 1 shows the ESP fluxes overplotted with the solar wind velocity. Fortunately, the :mod:`~spacepy.omni` module of SpacePy provides an interface to the hourly solar wind dataset, OMNI. :func:`~spacepy.omni.get_omni` returns data for a particular set of times. In this case, we want hourly data, covering 1989 through 2010 (we'll cut it down to size later). :func:`~spacepy.time.tickrange` allows us to specify a start time, stop time, and time step. >>> import spacepy.omni >>> import spacepy.time >>> times = spacepy.time.tickrange('1989-01-01', '2011-01-01', ... datetime.timedelta(hours=1)) >>> d = spacepy.omni.get_omni(times) >>> vsw = d['velo'] >>> vsw_times = d['UTC'] We'll also load the esp data: >>> import common >>> esp_times, esp_flux = common.load_esp() Even though we have not installed ``common.py``, the ``import`` statement finds it because it is in the current directory. ``load_esp`` returns a `tuple `_, which can be *unpacked* into separate variables. Now we need to produce 27-day running averages of both the flux and the solar wind speed. Fortunately there are no gaps in the time series: >>> import numpy >>> d = numpy.diff(vsw_times) >>> print(d.min()) 1:00:00 >>> print(d.max()) 1:00:00 >>> d = numpy.diff(esp_times) >>> print(d.min()) 1 day, 0:00:00 >>> print(d.max()) 1 day, 0:00:00 :func:`numpy.diff` returns the difference between every element of an array and the previous element. :meth:`~numpy.ndarray.min` and :meth:`~numpy.ndarray.max` do exactly what they sound like. So this code confirms that every time in the vsw data is on a continuous one hour cadence, and the ESP data is on a continuous one day cadence. >>> import scipy.stats >>> esp_flux_av = numpy.empty(shape=esp_flux.shape, dtype=esp_flux.dtype) >>> for i in range(len(esp_flux_av)): ... esp_flux_av[i] = scipy.stats.nanmean(esp_flux[max(i - 13, 0):i + 14]) :func:`numpy.empty` creates an empty array, taking the ``shape`` and ``dtype`` from the ``esp_flux`` array. ``empty`` does not initialize the data in the array, so it is essentially random junk; use :func:`~numpy.zeros` to create an array filled with zeros. :func:`len` returns the length of an array, and :func:`range` then iterates over each number from 0 to length minus 1, i.e. the entire array. Each element is then set to a 27-day average: from 13 days before a day's measurement through 13 days after. (Python slices do not include the last element listed; they are half-open). Note that these slices can happily run off the end of the ``esp_flux`` array, but we use :func:`max` to ensure the first index does not go negative. (Negative indices have special meaning in Python.) :func:`~scipy.stats.nanmean` takes the mean of a numpy array, but skips any elements with a value of "not a number" (nan), which is often used for fill. (This is our first exposure to the :mod:`scipy` module.) For the solar wind averaging, the times need to cover the 24 * 13.5 = 324 hours previous, and 324 hours following (non-inclusive). There is also a more efficient way than using an explicit loop: >>> vsw_av = numpy.fromiter((scipy.stats.nanmean(vsw[max(0, i - 324):i + 324]) ... for i in range(len(vsw))), ... count=len(vsw), dtype=vsw.dtype) :func:`~numpy.fromiter` makes a numpy array from an `iterator `_, which is like a list except that it holds information on generating each element in a sequence rather than creating the entire sequence. ``count`` provides numpy with the number of elements in the output (so it can make the entire array at once); ``dtype`` here is just copied from the input. The type of iterator used here is a `generator expression `_, closely related to a `list comprehension `_. These are among the most powerful and most difficult to understand concepts in Python. An illustrative, although not useful, example: >>> for i in (x + 1 for x in range(10)): ... print(i) Here ``(x + 1 for x in range(10))`` is a generator expression that creates an iterator, which will return the numbers 1 through 10. At no point is the complete list of all numbers constructed, saving memory. In our calculation of ``esp_flux_av``, we created an explicit loop in Python. The generator expression used to compute ``vsw_av`` has no explicit loop, and the actual looping is handled in (much faster) compiled C code. Making Figure 1 =============== To actually plot, we need access to the :mod:`~matplotlib.pyplot` module: >>> import matplotlib.pyplot as plt >>> plt.ion() This alternate form of the import statement shouldn't be overused (it can make code harder to read by masking the origin of functions), but is conventional for matplotlib. :func:`~matplotlib.pyplot.ion` turns on interactive mode so plots appear and are updated as they're created. >>> plt.semilogy(esp_times, 10 ** esp_flux_av, 'b') >>> plt.draw() >>> plt.draw() :func:`~matplotlib.pyplot.semilogy` creates a semilog plot, log on the Y axis. The first two arguments are a list of X and Y values; after that there are many options to specify formatting (such as the color, used here.) The ESP fluxes are stored as the log of the flux; ``**`` is the exponentiation operator so the (geometric!) average is plotted properly. :func:`~matplotlib.pyplot.draw` draws the updated plot; sometimes it needs to be called repeatedly. Use it whenever you want the plot updated; it will not be included from here on. >>> plt.xlabel('Year', weight='bold') >>> plt.ylabel('Electron Flux\n1.8-3.5 MeV', color='blue', weight='bold') >>> plt.ylim(1e-2, 10) (0.01, 10) :func:`~matplotlib.pyplot.xlabel` and :func:`~matplotlib.pyplot.ylabel` set the labels for the axes. Note the newline (``\n``) in the string for the Y label. :func:`~matplotlib.pyplot.ylim` sets the lower and upper limits for the Y axis; there is, of course, :func:`~matplotlib.pyplot.xlim` as well. These are the simplest, although not most flexible, ways to work with plots. To produce the full Figure 1, we'll move out of interactive mode: >>> plt.ioff() >>> plt.show() :func:`~matplotlib.pyplot.ioff` turns off interactive mode. Once interactive mode is off, :func:`~matplotlib.pyplot.show` displays the full plot, including controls for panning, zooming, etc. Until the plot is closed, nothing further can happen in the Python window. >>> fig = plt.figure(figsize=[11, 8.5]) :func:`~matplotlib.pyplot.figure` creates a new :class:`~matplotlib.figure.Figure`; the size specified here is US-letter paper, landscape orientation. >>> ax = fig.add_subplot(111) :meth:`~matplotlib.figure.Figure.add_subplot` creates an :class:`~matplotlib.axes.Axes` object, which can contain an actual plot. ``111`` here means that the figure will have 1 subplot and the new subplot should be in position (1, 1); more on this later. >>> fluxline = ax.plot(esp_times, 10 ** esp_flux_av, 'b') :meth:`~matplotlib.axes.Axes.plot` puts the relevant data into the plot; again specifying a blue line. It returns a list of :class:`~matplotlib.lines.Line2D` objects, which we save for later use. >>> ax.set_yscale('log') :meth:`~matplotlib.axes.Axes.set_yscale` switches the Y axis between log and linear (:meth:`~matplotlib.axes.Axes.set_xscale` for the X axis). >>> ax.set_ylim(1e-2, 10) >>> ax.set_xlabel('Year', weight='bold') >>> ax.set_ylabel('Electron Flux\n1.8-3.5 MeV', color='b', weight='bold') :meth:`~matplotlib.axes.Axes.set_ylim` (and :meth:`~matplotlib.axes.Axes.set_xlim`), :meth:`~matplotlib.axes.Axes.set_xlabel`, and :meth:`~matplotlib.axes.Axes.set_ylabel` function much as above, but operate on a particular :class:`~matplotlib.axes.Axes` object. >>> ax2 = ax.twinx() :meth:`~matplotlib.axes.Axes.twinx` establishes a second Y axis (two values twinned on one X axis) on the same plot. >>> vswline = ax2.plot(vsw_times, vsw_av, 'r') >>> ax2.set_ylim(300, 650) >>> ax2.set_ylabel('Solar Wind Speed', color='r', rotation=270, weight='bold') The resulting :class:`~matplotlib.axes.Axes` object has all the methods that we've used before. Note ``rotation`` on :meth:`~matplotlib.axes.Axes.set_ylabel` to make the text run top-to-bottom rather than bottom-to-top. >>> ax.set_xlim(esp_times[0], esp_times[-1]) Since the solar wind data extends beyond the ESP data, this sets the X axis to match the ESP data. Note ``-1`` to refer to the last element of the array. >>> leg = ax.legend([fluxline[0], vswline[0]], ['Flux', 'Vsw'], ... loc='upper left', frameon=False) :meth:`~matplotlib.axes.Axes.legend`, as may be expected, creates a :class:`~matplotlib.legend.Legend` on the axes. The first parameter is a list of the matplotlib objects to make a legend for; since the plotting commands return these, we can pass them back in. Each plotting command returns a *list*. In this case we just take the 0th element of each list since we know there's only one line from each plotting command. The second parameter is the text used to annotate each line. >>> fluxtext, vswtext = leg.get_texts() >>> fluxtext.set_color(fluxline[0].get_color()) >>> vswtext.set_color(vswline[0].get_color()) The default text color is black, so we use :meth:`~matplotlib.legend.Legend.get_texts` to get the :class:`~matplotlib.text.Text` objects for the annotations. Again, we know there are two (we just created the legend). Then :meth:`~matplotlib.text.Text.set_color` sets the color based on the the existing color for each line (:meth:`~matplotlib.lines.Line2D.get_color`). To see the results: >>> plt.show() Close the window when done. Now we want to save the output: >>> fig_fname = os.path.join('..', 'plots', 'fig1a.eps') >>> fig.savefig(fig_fname) :meth:`~matplotlib.figure.Figure.savefig` saves the figure, in this case as an encapsulated PostScript file (to the ``plots`` directory). Let's tweak a few things. For one, there's a lot of padding around the figure, which can make it difficult to properly scale for publication. The way around this is to specify a :class:`~matplotlib.transforms.Bbox` (bounding box), basically the lower left and upper right corners (in inches) to include in the saved figure. Getting this right tends to be a matter of trial and error. (:meth:`~matplotlib.figure.Figure.get_tightbbox` is supposed to help with this, but it doesn't quite work yet.) >>> import matplotlib.transforms >>> bob = matplotlib.transforms.Bbox([[0.52, 0.35], [10.5, 7.95]]) >>> fig.savefig(fig_fname, bbox_inches=bob, pad_inches=0.0) Better, but all the text is awfully small. Once the figure is fit in the paper it'll be really small. And the font isn't that great. >>> import matplotlib >>> matplotlib.rcParams['axes.unicode_minus'] = False >>> matplotlib.rcParams['text.usetex']= True >>> matplotlib.rcParams['font.family'] = 'serif' >>> matplotlib.rcParams['font.size'] = 14 >>> bob = matplotlib.transforms.Bbox([[0.4, 0.35], [10.7, 7.95]]) >>> fig.savefig(fig_fname, bbox_inches=bob, pad_inches=0.0) Now the font is bigger and it's rendered using TeX, which should match the body of the paper better (assuming the paper is in LaTeX). The larger font means tweaking the bounding box. ``unicode_minus`` fixes a problem where negative numbers on the axis don't render properly in TeX. Matplotlib has many more options for `customization `_. The end result is a nice figure that can be printed full-size, put in a PDF, or included directly in a paper. Now we need the bottom half of Figure 1. From `SIDC `_, download the "Monthly mean total sunspot number" (``monthssn.dat``). Put it in the ``data`` directory. >>> import bisect >>> import datetime >>> monthfile = os.path.join(common.datadir, 'monthssn.dat') >>> convert = lambda x: datetime.datetime.strptime(x, '%Y%m') >>> ssn_data = numpy.genfromtxt(monthfile, skip_header=2400, usecols=[0, 2, 3], ... converters={0: convert}, dtype=numpy.object, ... skip_footer=24) >>> idx = bisect.bisect_left(ssn_data[:, 0], datetime.datetime(1989, 1, 1)) >>> ssn_data = ssn_data[idx:] >>> ssn_times = ssn_data[:, 0] >>> ssn = numpy.asarray(ssn_data[:, 1], dtype=numpy.float64) >>> smooth_ssn = numpy.asarray(ssn_data[:, 2], dtype=numpy.float64) >>> ssn_times += datetime.timedelta(days=15) Much of this should be familiar. :func:`~numpy.genfromtxt` is a little more flexible than :func:`~numpy.loadtxt`; here it allows the skipping of lines at the end as well as the beginning (skipping 200 years at the start, 2 at the end, where data are provisional.) Here we load both times and the sunspot numbers in the same command so that if any lines don't load, they will not wind up in any of the arrays. :mod:`bisect` provides fast functions for searching in sorted data; :func:`~bisect.bisect_left` is roughly a find-the-position-of function. Having found the position of the start of 1989, we then keep times from then on (specifying a start index without a stop index in Python means "from start to end of the list.") Note that, although ``bisect`` is meant to work on lists, it also works fine on numpy arrays; this is a common feature of Python known as `duck typing `_. We then use :func:`~numpy.asarray` to convert the ``ssn`` and ``smooth_ssn`` columns to float arrays. Note the slice notation: ``[:, 0]`` means take all indices of the first dimension (line number) and only the 0th index of the second dimension (column in the line). Finally, we use :class:`~datetime.timedelta` to shift the date associated with a month from the beginning to roughly the middle of the month. Adding a scalar to an array does an element-wise addition. >>> import matplotlib.figure >>> fig = plt.figure(figsize=[11, 8.5], ... subplotpars=matplotlib.figure.SubplotParams(hspace=0.1)) >>> ax = fig.add_subplot(211) When creating the figure this time, we use :class:`~matplotlib.figure.SubplotParams` to choose a slightly smaller vertical spacing between adjacent subplots. Tweaking ``SubplotParams`` also provides an alternative to tweaking bounding boxes. Then we create a subplot with the information that there will be 2 rows, 1 column, and this is the first subplot. Now everything acting on ax, above, can be repeated, although we skip setting the xlabel since only the bottom axis will be labeled. >>> fluxline = ax.plot(esp_times, 10 ** esp_flux_av, 'b') >>> ax.set_yscale('log') >>> ax.set_ylim(1e-2, 10) >>> ax.set_ylabel('Electron Flux\n1.8-3.5 MeV', color='b', weight='bold') >>> ax2 = ax.twinx() >>> vswline = ax2.plot(vsw_times, vsw_av, 'r') >>> ax2.set_ylim(300, 650) >>> ax2.set_ylabel('Solar Wind Speed', color='r', rotation=270, weight='bold') >>> ax.set_xlim(esp_times[0], esp_times[-1]) >>> leg = ax.legend([fluxline[0], vswline[0]], ['Flux', 'Vsw'], ... loc='upper left', frameon=False) >>> fluxtext, vswtext = leg.get_texts() >>> fluxtext.set_color(fluxline[0].get_color()) >>> vswtext.set_color(vswline[0].get_color()) Then we move on to adding the solar wind: >>> ax3 = fig.add_subplot(212, sharex=ax) This adds another subplot, the second in the 2x1 array. Its x axis is shared with the existing ``ax``. (This is poorly documented; see this `example `_) >>> plt.setp(ax.get_xticklabels(), visible=False) >>> plt.setp(ax2.get_xticklabels(), visible=False) :func:`~matplotlib.pyplot.setp` sets a property. :meth:`~matplotlib.axes.Axes.get_xticklabels` returns all the tick labels (:class:`~matplotlib.text.Text`) for the x axis; ``setp`` then sets ``visible`` to ``False`` for all of them. This hides the labeling on the axis for the upper subfigure. >>> ax3.set_xlabel('Year', weight='bold') >>> ax3.set_ylabel('Sunspot Number', weight='bold') >>> smoothline = ax3.plot(ssn_times, smooth_ssn, lw=2.0, color='k') >>> ssnline = ax3.plot(ssn_times, ssn, color='k', linestyle='dotted') There is nothing new here except for the specifications of ``linewidth`` and ``linestyle``; see :meth:`~matplotlib.axes.Axes.plot` for details. Note ``k`` as the abbreviation for black (to avoid confusion with blue.) >>> leg2 = ax3.legend([ssnline[0], smoothline[0]], ... ['Sunspot Number', 'Smoothed SSN'], ... loc='upper right', frameon=False) >>> ax3.set_ylim(0, 200) >>> ax3.set_xlim(esp_times[0], esp_times[-1]) >>> fig_fname = os.path.join('..', 'plots', 'fig1.eps') >>> fig.savefig(fig_fname, bbox_inches=bob, pad_inches=0.0) All of this has been seen for the top half of figure 1. Following is the complete code to reproduce Figure 1. .. code-block:: python import bisect import datetime import os.path import common import matplotlib import matplotlib.figure import matplotlib.pyplot as plt import matplotlib.transforms import numpy import scipy import scipy.stats import spacepy.omni import spacepy.time matplotlib.rcParams['axes.unicode_minus'] = False matplotlib.rcParams['text.usetex']= True matplotlib.rcParams['font.family'] = 'serif' matplotlib.rcParams['font.size'] = 14 bob = matplotlib.transforms.Bbox([[0.4, 0.35], [10.7, 7.95]]) times = spacepy.time.tickrange('1989-01-01', '2011-01-01', datetime.timedelta(hours=1)) d = spacepy.omni.get_omni(times) vsw = d['velo'] vsw_times = d['UTC'] esp_times, esp_flux = common.load_esp() esp_flux_av = numpy.empty(shape=esp_flux.shape, dtype=esp_flux.dtype) for i in range(len(esp_flux_av)): esp_flux_av[i] = scipy.stats.nanmean(esp_flux[max(i - 13, 0):i + 14]) vsw_av = numpy.fromiter((scipy.stats.nanmean(vsw[max(0, i - 324):i + 324]) for i in range(len(vsw))), count=len(vsw), dtype=vsw.dtype) monthfile = os.path.join(common.datadir, 'monthssn.dat') convert = lambda x: datetime.datetime.strptime(x, '%Y%m') ssn_data = numpy.genfromtxt(monthfile, skip_header=2400, usecols=[0, 2, 3], converters={0: convert}, dtype=numpy.object, skip_footer=24) idx = bisect.bisect_left(ssn_data[:, 0], datetime.datetime(1989, 1, 1)) ssn_data = ssn_data[idx:] ssn_times = ssn_data[:, 0] ssn = numpy.asarray(ssn_data[:, 1], dtype=numpy.float64) smooth_ssn = numpy.asarray(ssn_data[:, 2], dtype=numpy.float64) ssn_times += datetime.timedelta(days=15) fig = plt.figure(figsize=[11, 8.5], subplotpars=matplotlib.figure.SubplotParams(hspace=0.1)) ax = fig.add_subplot(211) fluxline = ax.plot(esp_times, 10 ** esp_flux_av, 'b') ax.set_yscale('log') ax.set_ylim(1e-2, 10) ax.set_ylabel('Electron Flux\n1.8-3.5 MeV', color='b', weight='bold') ax2 = ax.twinx() vswline = ax2.plot(vsw_times, vsw_av, 'r') ax2.set_ylim(300, 650) ax2.set_ylabel('Solar Wind Speed', color='r', rotation=270, weight='bold') ax.set_xlim(esp_times[0], esp_times[-1]) leg = ax.legend([fluxline[0], vswline[0]], ['Flux', 'Vsw'], loc='upper left', frameon=False) fluxtext, vswtext = leg.get_texts() fluxtext.set_color(fluxline[0].get_color()) vswtext.set_color(vswline[0].get_color()) ax3 = fig.add_subplot(212, sharex=ax) plt.setp(ax.get_xticklabels(), visible=False) plt.setp(ax2.get_xticklabels(), visible=False) ax3.set_xlabel('Year', weight='bold') ax3.set_ylabel('Sunspot Number', weight='bold') smoothline = ax3.plot(ssn_times, smooth_ssn, lw=2.0, color='k') ssnline = ax3.plot(ssn_times, ssn, color='k', linestyle='dotted') leg2 = ax3.legend([ssnline[0], smoothline[0]], ['Sunspot Number', 'Smoothed SSN'], loc='upper right', frameon=False) ax3.set_ylim(0, 200) ax3.set_xlim(esp_times[0], esp_times[-1]) fig_fname = os.path.join('..', 'plots', 'fig1.eps') fig.savefig(fig_fname, bbox_inches=bob, pad_inches=0.0) .. _appendix: Appendix: Fixing the ESP data file ================================== This appendix provides a detailed explanation of the script that fixes the ESP data file. First set up a variable to hold the location of the data, as above: >>> import os.path >>> datadir = os.path.join('..', 'data') Examining the data file, it is clear that something is odd: lines appear to have been broken inappropriately; for example, the data for 1989-10-12 are split across two lines. So the first task is to fix this file, first opening the original (broken) file and an output (fixed) file: >>> in_name = os.path.join(datadir, 'jgra20797-sup-0003-ds01.txt') >>> out_name = os.path.join(datadir, 'jgra20797-sup-0003-ds01_FIXED.txt') >>> infile = open(in_name, 'r') >>> outfile = open(out_name, 'w') These lines :func:`open` the original file for reading (``r``), and a new file for writing (``w``). Note that opening a file for writing will destroy any existing contents. The file happens to contain a mixture of carriage returns and proper newlines, so to begin all the carriage returns need to be rewritten as newlines: >>> data = infile.read() >>> infile.close() >>> data = data.replace('\r', '\n') >>> data = data.replace('\n\n', '\n') :meth:`~file.read` reads *all* data from the file at once, so this is not recommended for large files. In this case it makes things easier. Once the data are read, :meth:`~file.close` the file. Calling the :meth:`~str.replace` method on ``data`` replaces all instances of the first parameter (``'\r'``) with the second (``'\n'``). ``\r`` is the special code indicating a carriage return; ``\n``, a newline. For a literal backslash, use ``\\``. Once the carriage returns have been replaced with newlines, a second round of replacement eliminates duplicates. Now that the line endings have been cleaned up, it's time to rejoin the erroneously split lines. First copy over the 15 lines of header verbatim: >>> data = data.split('\n') >>> for i in range(15): ... outfile.write(data.pop(0) + '\n') :meth:`~str.split` splits a string into a `list `_, with the split between elements happening wherever the provided parameter occurs. A simple example: >>> foo = 'a.b.c'.split('.') >>> print(foo) ['a', 'b', 'c'] The splitting character is not present in the output. The advantage of a list is that it makes it easy to access individual elements: >>> print(foo[1]) b The first element of a Python list is numbered zero. :func:`range` returns a list of numbers, starting from 0, with the parameter specifying how many elements are in the list: >>> print(range(5)) [0, 1, 2, 3, 4] The last number is 4 (not 5 as might be expected), but there are 5 elements in the list. The `for `_ executes the following indented statement once for every element in the ``in`` list: >>> for i in ['a', 'b', 'c']: ... print i a b c Indentation is significant in Python! Normally indents are four spaces and the tab key will do the job. (In the above example, you may need to hit enter twice after the print statement, the second to terminate the indentation.) `pop `_ returns one element from a list, and deletes it from the list. Using ``0`` pops off the first element, and :meth:`~file.write` writes a string to a file. ``+`` can be used to concatenate two strings together. Since :meth:`~str.split` removed the newlines, they need to be readded. So this little block of code splits the data into a list on newlines and, repeating fifteen times, takes the first element of that list and writes it, with a newline, to the output. Now ``data`` contains only the actual lines of data. >>> oldline = None >>> for line in data: ... if line[0:2] in ['19', '20', '2']: ... if not oldline is None: ... outfile.write(oldline + '\n') ... oldline = line ... else: ... oldline += line >>> outfile.write(oldline + '\n') >>> outfile.close() ``None`` is a special Python value specifically indicating nothing; it's used here to mark the first time around the loop. ``line[0:2]`` gets the first two characters in the string `line`, and the ``in`` operator compares the resulting string to see if it is present in the following list. This will return ``True`` if the line begins with ``19`` or ``20``. The `if `_ statement executes the following indented block if the condition is True. So, if this is True, the previous line probably ended properly and it can be written out. First there is an additional check that this isn't the first time around the loop, and then the *previous* line (which we know ended cleanly) is written out. The currently-read line then becomes the new "previous" line. The ``2`` is a special case: if the line is less than two characters long, ``line[0:2]`` will return the entire line, and it so happens that these cases always correspond to the previous line being whole. If this test fails, everything under ``else`` is executed. Here the assumption is that the previous line didn't end cleanly and the current line is actually a continuation of it, so the current line is appended to the previous. ``a += b`` is a shortcut for ``a = a + b``. Once the loop terminates, the last line is written out, and the file closed.