Tables¶
The SQL database tables used in dbprocessing, and their columns, are
summarized here. getEntry()
will,
in most cases, return a record from any table.
Executable data-producing codes |
|
A single data file |
|
Code used to create an output file |
|
Input files used to create an output file |
|
Codes that link files to products |
|
Instrument (for grouping related products) |
|
Connect instruments to products |
|
Log of ProcessQueue.py runs. |
|
Unused |
|
Directories for mission codes, files, etc. |
|
Process that converts inputs to output |
|
Files to be processed |
|
Generalization of file types |
|
Relates processes to their input products |
|
Record of files in a release |
|
Satellite (for grouping related products) |
|
Unix start/stop time for files |
- code¶
Table describing a single executable script, used for creating output files from input files (see Codes). Note each version of a code has its own entry (with no explicit connection between them), and a given script may be referred to/used by more than one code entry.
- filename¶
Filename (only) of executable; this is passed as part of the command line. May include substitution strings. (
String
,NOT NULL
)
- relative_path¶
Directory containing
filename
, relative tomission.codedir
. (String
,NOT NULL
)
- code_start_date¶
Code is valid for files dated on or after this date; matching is by
utc_file_date
. (Date
,NOT NULL
)
- code_stop_date¶
Code is valid for files dated on or after before date; matching is by
utc_file_date
. (Date
,NOT NULL
)
- code_description¶
Human-readable description of a code. May be considered a name but usually longer than a name and usually not used for lookup. (
Text
,NOT NULL
)
- process_id¶
The process implemented by this code. (
Integer
,NOT NULL
,FK
process.process_id
)
- interface_version¶
Version of the code. Full version is
interface.quality.revision
. (SmallInteger
,NOT NULL
)
- quality_version¶
Version of the code. (
SmallInteger
,NOT NULL
)
- revision_version¶
Version of the code. (
SmallInteger
,NOT NULL
)
- output_interface_version¶
Interface (i.e. major) version of the output product of this code. (
SmallInteger
,NOT NULL
)
- active_code¶
Whether the code is active; inactive codes are not used for processing files. (
Boolean
,NOT NULL
)
- newest_version¶
Whether this is the newest version of a particular code. This may somewhat conflict with having multiple versions of a code that are selected based on
code_start_date
andcode_stop_date
; in practice, to date only one version of a code has been markednewest_version
and it is also usually the only one markedactive_code
. (Boolean
,NOT NULL
)
- arguments¶
Command line arguments for building the code. The full command line is built from
mission.codedir
,relative_path
,filename
,process.extra_params
,arguments
, the input files, and then output files (in that order). (Text
)
- ram¶
A relative measure of how much memory this code consumes. Purely relative and used in calculating how many codes are currently running for purposes of
ProcessQueue.py -n
. Nominally 1, so e.g. making 2 indicates a process that takes up twice as much RAM as “typical”, and 0.5 indicates half as much as typical. (Float
)
- cpu¶
Analagous to
ram
, a relative measure of how much processor power it takes to run this code. More concretely, this should usually be set to the number of threads a code uses (thus being integral); a long-running single-threaded process should still be set to1
. (SmallInteger
)
- file¶
A single data file; conceptually maps to a single file on disk. Related, but not identical, to
Diskfile
andDBfile
. Much of this information is populated by theinspector
. See alsoaddFile()
.
- filename¶
Name of the file, without path. The pathing is determined from
product.relative_path
. (String
,NOT NULL
)
- utc_file_date¶
Single data “characterizing” the data within the file. For a file of a product on a
DAILY
timebase, this usually maps to the date of every timestamp within the file. However, a daily file may, due to conversions, include a small amount of data from the previous and following date, thus this is distinct fromutc_start_time
andutc_stop_time
. Semantics on other timebases are not yet defined. (Date
)
- utc_start_time¶
Timestamp of the first record in this file. The interpretation of this timestamp is not defined by dbprocessing. (
DateTime
)
- utc_stop_time¶
Timestamp of the last record in this file. The interpretation of this timestamp is not defined by dbprocessing. (
DateTime
)
- data_level¶
Numerical level of this file; somewhat redundant with
product.level
. (Float
,NOT NULL
)
- interface_version¶
Version of the file. Full version is
interface.quality.revision
. (SmallInteger
,NOT NULL
)
- quality_version¶
Version of the file. (
SmallInteger
,NOT NULL
)
- revision_version¶
Version of the file. (
SmallInteger
,NOT NULL
)
- verbose_provenance¶
Full command line which was used to build this file; in theory if the same codes and input files are in place, executing this command line will recreate the file. (
Text
)
- check_date¶
Date the file was quality checked; unused. (Was meant to support the QA loop). (
DateTime
)
- met_start_time¶
Mission Elapsed Time (MET) of first record in file, meant to correspond to
utc_start_time
. Not used by dbprocessing logic and interpretation is not defined by dbprocessing. (Float
)
- met_stop_time¶
MET of last record in file, corresponding to
utc_stop_time
. (Float
)
- exists_on_disk¶
Whether the file is believed to exist on disk, or is a historical record of a deleted file. (
Boolean
,NOT NULL
)
- product_id¶
This file is considered an instance of this product. (
Integer
,NOT NULL
,FK
product.product_id
)
- process_keywords¶
Warning
This explanation may not be completely correct; this is not commonly used.
When a product has keyword substitutions in the filename
format
that are not directly calculatable by dbprocessing (not, e.g. date or version), the values of those keywords for this file are stored, allowing calculation of the filename. (Text
)
- filecodelink¶
Connects a single data file to the (single) code used to create it. A many-to-one relationship: many files are made from a single code.
- resulting_file¶
ID of the file created. (
Integer
,NOT NULL
,FK
file.file_id
)
- source_code¶
ID of the code used to create
resulting_file
. (Integer
,NOT NULL
,FK
code.code_id
)
- filefilelink¶
Connects a single data file to the (potentially many) input files used to create it. A many-to-many relationship: each file may serve as input to multiple output files, and each output file may be created from multiple inputs. This table is expressed as pairs: each row links one output file to one of its input files.
- source_file¶
ID of the source (input) file for a particular pairing. (
Integer
,NOT NULL
,FK
file.file_id
)
- resulting_file¶
ID of the resulting (output) file for a particular pairing. (
Integer
,NOT NULL
,FK
file.file_id
)
- inspector¶
An inspector is a small piece of code which examines (inspects) a file to determine its product and various metadata for dbprocessing; this table describes the codes.
See also
- inspector_id¶
Auto-incremented ID for this inspector, mostly for cross-referencing. (
Integer
,PK
,NOT NULL
)
- relative_path¶
Directory containing
filename
, relative tomission.inspectordir
. (String
,NOT NULL
)
- description¶
Human-readable description of an inspector. May be considered a name but usually longer than a name and usually not used for lookup. (
Text
,NOT NULL
)
- interface_version¶
Version of the inspector. Full version is
interface.quality.revision
. (SmallInteger
,NOT NULL
)
- quality_version¶
Version of the inspector. (
SmallInteger
,NOT NULL
)
- revision_version¶
Version of the inspector. (
SmallInteger
,NOT NULL
)
- output_interface_version¶
Warning
The purpose of this column is unclear.
Usually 1. May exist simply from copying the
code
definition. (SmallInteger
,NOT NULL
)
- active_code¶
Whether this inspector is active, i.e. actually executed to determine potential matches between files and products. (
Boolean
,NOT NULL
)
- newest_version¶
Warning
This does not appear to be used, so its purpose is unclear.
Whether this is the newest version of a particular inspector. (
Boolean
,NOT NULL
)
- arguments¶
Space-separated list of
key=value
pairs, passed as keyword arguments toinspect
. This allows the same file to be used as an inspector for multiple products, by using different arguments. (Text
)
- product¶
ID of the product which this inspector identifies. Every inspector can identify on, and only one, product. (
Integer
,NOT NULL
,FK
product.product_id
)
- instrument¶
Describes an instrument. An instrument is primarily a means of connecting related products for convenience (e.g. in queries and reprocessing); generally speaking it corresponds to a physical instrument. The hierarchy of association is
instrument
,satellite
,mission
, where each relation is many-to-one.
- instrument_id¶
Auto-incremented ID for this instrument, mostly for cross-referencing. (
Integer
,PK
,NOT NULL
)
- instrument_name¶
Name of the instrument, normally short to make it easy to use in command line queries. (
String
,NOT NULL
)
- satellite_id¶
ID of the satellite of which this instrument is part. (
Integer
,NOT NULL
,FK
satellite.satellite_id
)
- instrumentproductlink¶
Connects each instrument to the products it is associated with. This is a many-to-many link: an instrument may have its data in several products, and a product may draw from several instruments. Most commonly a product is associated with only one instrument; having multiple instruments per product is not heavily used or tested.
This table is expressed as pairs: each row links one product with one instrument.
Warning
The existence of
product.instrument_id
suggests a different approach than this.
- instrument_id¶
ID of the instrument in a pairing. (
Integer
,NOT NULL
,FK
instrument.instrument_id
)
- product_id¶
ID of the product associated with the instrument in the same record. (
Integer
,NOT NULL
,FK
product.product_id
)
- logging¶
Log of the state of ProcessQueue.py invocations. Every run creates a single record in this table, recording the state of processing and how it terminated.
- logging_id¶
Auto-incremented ID for each log entry, to maintain unique rows. (
Integer
,PK
,NOT NULL
)
- currently_processing¶
Is this instance of ProcessQueue.py still running. There should only be one instance running at a time, so this is used as a lock (
currentlyProcessing()
). (Boolean
,NOT NULL
)
- pid¶
Process ID of ProcessQueue.py. (
Integer
)
- processing_start_time¶
When this instance of ProcessQueue.py started. (
DateTime
,NOT NULL
)
- processing_end_time¶
When this instance of ProcessQueue.py completed. (
DateTime
)
- comment¶
How ProcessQueue.py exited. In the event the processing flag was cleared manually with clearProcessingFlag.py, this includes the
message
. (Text
)
- mission_id¶
ID of the mission on which this is executing. (
Integer
,NOT NULL
,FK
mission.mission_id
)
- user¶
Username running the ProcessQueue.py process. (
String
,NOT NULL
)
- hostname¶
Name of the host on which ProcessQueue.py is running. (
String
,NOT NULL
)
- logging_file¶
Warning
This table appears to be unused.
Likely intended to provide some sort of dbprocessing-level support for logging from data processing codes, but not used.
- logging_id¶
(
Integer
,NOT NULL
,FK
logging.logging_id
)
- file_id¶
(
Integer
,NOT NULL
,FK
file.file_id
)
- code_id¶
(
Integer
,NOT NULL
,FK
code.code_id
)
- mission¶
The mission for the data held in this database. A mission may pertain to multiple satellites, e.g. the Van Allen Probes mission had RBSP-A and RBSP-B. The hierarchy of association is
instrument
,satellite
,mission
, where each relation is many-to-one.This is the top-level table determining where dbprocessing looks for files and codes.
Warning
In theory a single database can contain multiple missions; in practice, this has always been a one-to-one, and many parts of the codebase assume just one mission. In particular, most command line arguments to specify “mission” really specify the database.
A mission has many relevant directories; in older versions of the database, these were not all explicitly specified, and in newer versions they may often be null. In these cases a default is used; see
getDirectory()
.Note
Where specified in this table, directories are assumed to be absolute. If relative, they are relative to current directory, not any particular mission directory.
- mission_id¶
Auto-incremented ID for this code, mostly for cross-referencing. (
Integer
,PK
,NOT NULL
)
- mission_name¶
Human-readable name of this mission, should be short for easy use in command line queries. (
String
,NOT NULL
)
- rootdir¶
All data paths are specified relative to this directory. Code and related paths are not. (
String
,NOT NULL
)See also
- incoming_dir¶
Directory from which new files are ingested for this mission. (
String
,NOT NULL
)See also
- errordir¶
Outputs of failed data processing codes, both file outputs and stdout, are placed in this directory, by default
'errors'
incodedir
. (String
)See also
- process_id¶
Auto-incremented ID for this process, mostly for cross-referencing (
Integer
,PK
,NOT NULL
)
- process_name¶
Human-readable name of this process, normally short to make it easy to use in command line queries. (
String
,NOT NULL
)
- output_product¶
ID of the single output product. (
Integer
,FK
product.product_id
)
- extra_params¶
Arguments to add to the processing command line. These are added before the code’s
arguments
. (Text
)
- processqueue¶
Queue of files which are to be evaluated as potential inputs to processes see Process Queue.
ProcessQueue.py -p
will evaluate all products which can be built using these as inputs, and create any which are out of date.
- file_id¶
ID of a file in the queue. (
Integer
,PK
,NOT NULL
,FK
file.file_id
)
- version_bump¶
Requested approach to processing. By default, will only run processes if the outputs are out of date. If this is set, processes are forced to run, and the specified version component of the output is incremented (0 for interface version, 1 for quality, 2 for revision.) (
SmallInteger
)See also
- product_id¶
Auto-incremented ID for this product, mostly for cross-referencing. (
Integer
,PK
,NOT NULL
)
- product_name¶
Human-readable name of the product, normally short to make it easy to use in command line queries. (
String
,NOT NULL
)
- instrument_id¶
The instrument providing data for this product. (
Integer
,NOT NULL
,FK
instrument.instrument_id
)Warning
The existence of
instrumentproductlink
suggests a different approach than this.
- relative_path¶
Location where data files of this product are stored, relative to
rootdir
. May contain fields to be filled (e.g.{Y}
to have a by-year directory). (String
,NOT NULL
)
- level¶
Numerical level of this product; somewhat redundant with
file.data_level
. (Float
,NOT NULL
)
- format¶
Filename template for files of this product. Normally contains fields to be filled (e.g.
{Y}
to include the year). (Text
,NOT NULL
)
- product_description¶
Human-readable description of the product. Usually not used for queries. (
Text
)
- productprocesslink¶
Relates processes to the products that they need as inputs. Each record pairs a process with one of its input products and describes that relationship.
- process_id¶
ID of the process whose input product is described by this record. (
Integer
,NOT NULL
,FK
process.process_id
)
- input_product_id¶
ID of one input product for the process of this record. (
Integer
,NOT NULL
,FK
product.product_id
)
- optional¶
Whether
input_product_id
is an optional product, in which case the process can execute without it, or not. A product will only execute if all its required inputs are available. If all inputs are optional, it will only execute if at least one optional input is available. (Boolean
,NOT NULL
)
- yesterday¶
Number of days in the past of this product to include as inputs to the process. For instance, if this is 2, then in processing day
n
, daysn-1
andn-2
are also provided as inputs. Behavior is undefined for timebases other thanDAILY
. (Integer
,NOT NULL
)
- release¶
Tracks the files that are present in a public release. For every release, each file in that release has a record in this table. This is a many-to-many relationship: each file may be in multiple releases, which may contain multiple files.
- file_id¶
ID of the file which is included in the release. (
Integer
,NOT NULL
,FK
file.file_id
)
- satellite¶
Describes an satellite. A satellite is primarily a means of connecting related products for convenience (e.g. in queries and reprocessing); generally speaking it corresponds to a physical instrument. The hierarchy of association is
instrument
,satellite
,mission
, where each relation is many-to-one.
- satellite_id¶
Auto-incremented ID for this satellite, mostly for cross-referencing. (
Integer
,PK
,NOT NULL
)
- satellite_name¶
Name of the satellite, normally short to make it easy to use in command line queries. (
String
,NOT NULL
)
- mission_id¶
ID of the mission of which this satellite is part. (
Integer
,NOT NULL
,FK
mission.mission_id
)
- unixtime¶
Stores the start and stop time for each file as a count of seconds since the Unix epoch. This makes certain lookups faster.
- file_id¶
ID of the file for which this record stores the start/stop times. (
Integer
,PK
,NOT NULL
,FK
file.file_id
)
- unix_start¶
utc_start_time
for this file expressed as seconds since Unix epoch. (Integer
)
- unix_stop¶
utc_stop_time
for this file expressed as seconds since Unix epoch. (Integer
)
Release: 0.1.0 Doc generation date: Feb 10, 2022