API¶

Code documentation of Pydoas API.

Data import¶

class pydoas.dataimport.ResultImportSetup(base_dir=None, start=datetime.datetime(1900, 1, 1, 0, 0), stop=datetime.datetime(3000, 1, 1, 0, 0), meta_import_info='doasis', result_import_dict={}, default_dict={}, doas_fit_err_factors={}, dev_id='', lt_to_utc_offset=datetime.timedelta(0))[source]¶

Setup class for spectral result imports from text-like files

__init__(base_dir=None, start=datetime.datetime(1900, 1, 1, 0, 0), stop=datetime.datetime(3000, 1, 1, 0, 0), meta_import_info='doasis', result_import_dict={}, default_dict={}, doas_fit_err_factors={}, dev_id='', lt_to_utc_offset=datetime.timedelta(0))[source]¶

Parameters:

base_dir (str) – folder containing resultfiles
start (datetime) – time stamp of first spectrum
stop (datetime) – time stamp of last spectrum
meta_import_info – Specify the result file format and columns for meta information (see also file import_info.txt or example script 2). Input can be str or dict. If a string is provided, it is assumed that the specs are defined in import_info.txt, i.e. they can be imported (as dictionary) from this file (using get_import_info(), e.g. with arg = doasis). If a dictionary is provided, the information is set directly from that dictionary.
result_import_dict (dict) –
Specify file and header information for import. Keys define the abbreviations used after import. The value for each key is a list with 2 elements: the first specifies the UNIQUE string which is used to identify this species in the header of a given fit result file, the second is a list of arbitrary length containing the fit scenario IDs, defining from which fit scenario result files this species is to be extracted.

Example:
```
result_import_dict = {"so2" : ['SO2_Hermans', ['f01','f02']],
                      "o3"  : ['o3_burrows', ['f01']]}
```
Here “so2” and “o3” are imported. The data column in the result files is found by the header string 'SO2_Hermans' / 'o3_burrows'. In this example, so2 is imported from all fit scenario result files with fit IDs ["f01", "f02"] and o3 from those with fit ID ["f01"] (UNIQUE substrings in the fit scenario file names).

Exemplary file name:

D130909_S0628_i6_f19_r20_f01so2.dat

This (exemplary) filename convention is used for the example result files shipped with this package (see folder pydoas/data/doasis_resultfiles) which include fit result files from the software DOASIS.

The delimiter for retrieving info from these file names is “_”. The first substring provides the date (day), the second the start time of this time series (HH:MM), the 3rd, 4th and 5th the numbers of the first and last fitted spectrum and of the reference spectrum used for this time series, and the last the fit scenario (fitID).

Each resultfile must therefore include a unique ID in the file name by which it can be identified.

default_dict (dict) –

specify default species, e.g.:

dict_like = {"so2"     :   "f02",
             "o3"      :   "f01"}

doas_fit_err_factors (dict) –
fit correction factors (i.e. factors by which the DOAS fit error is increased):
```
dict_like = {"f01"     :   4.0,
             "f02"     :   2.0}
```
dev_id (str) – string ID for DOAS device (of minor importance)
lt_to_utc_offset (timedelta) – specify time zone offset (will be added on data import if applicable).
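
The file name convention described for result_import_dict can be unpacked with a few string operations. The following is a minimal sketch, not pydoas code: `parse_result_filename` is a hypothetical helper, and it assumes that the date substring is encoded as `yymmdd`, the start time as `HHMM`, and that the fit scenario ID is the first three characters of the last substring.

```python
from datetime import datetime


def parse_result_filename(name):
    """Split a DOASIS-style result file name into its parts.

    Hypothetical helper (not part of pydoas). Assumed layout:
    D<yymmdd>_S<HHMM>_i<first>_f<last>_r<ref>_<fitID><label>.dat
    """
    stem = name.rsplit(".", 1)[0]
    day, start, first, last, ref, scenario = stem.split("_")
    return {
        # date and start time of the time series (leading letters dropped)
        "start": datetime.strptime(day[1:] + start[1:], "%y%m%d%H%M"),
        "first_spec": int(first[1:]),
        "last_spec": int(last[1:]),
        "ref_spec": int(ref[1:]),
        # assumption: the fit scenario ID is the first three characters
        "fit_id": scenario[:3],
    }


info = parse_result_filename("D130909_S0628_i6_f19_r20_f01so2.dat")
print(info["start"], info["fit_id"])
```

Any other naming scheme would need its own parser; the only requirement stated above is that each file name contains a unique fit ID.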

start¶: Start time-stamp of data

stop¶: Stop time-stamp of data

base_path¶: Old name of base_dir for versions <= 1.0.1

set_start_time(dt)[source]¶

Set the current start time

Parameters:	dt (datetime) – start time of dataset

set_stop_time(dt)[source]¶

Set the current stop time

Parameters:	dt (datetime) – stop time of dataset

check_time_stamps()[source]¶: Check if time stamps are valid and if not, set

complete()[source]¶: Checks if basic information is available

set_defaults(dict_like)[source]¶

Update default fit IDs for fitted species

Scheme:

dict_like = {"so2"     :   "f02",
             "o3"      :   "f01"}

set_fitcorr_factors(dict_like)[source]¶

Set correction factors for uncertainty estimate from DOAS fit errors

Parameters:	dict_like (dict) – dictionary specifying correction factors for DOAS fit errors (which are usually underestimated, see e.g. Gliss et al. 2015) for individual fit scenarios, e.g.: dict_like = {"f01" : 4.0, "f02" : 2.0}

Default value is 3.0.
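
How such factors might be applied can be sketched in a few lines. This is an illustration only: `corrected_errors` is a hypothetical function, and it assumes that the default of 3.0 is used for every fit scenario that has no entry in the dictionary.

```python
DEFAULT_FACTOR = 3.0  # default value stated in the documentation above


def corrected_errors(fit_errs, fit_id, factors):
    """Scale raw DOAS fit errors by the factor of the given fit scenario.

    Hypothetical helper; unknown fit IDs fall back to DEFAULT_FACTOR.
    """
    factor = factors.get(fit_id, DEFAULT_FACTOR)
    return [err * factor for err in fit_errs]


factors = {"f01": 4.0, "f02": 2.0}
scaled_f01 = corrected_errors([1.0, 2.0], "f01", factors)
scaled_f03 = corrected_errors([1.0], "f03", factors)  # no entry for "f03"
print(scaled_f01, scaled_f03)
```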

xs¶: Returns list with xs names

get_xs_names()[source]¶: Set and return the string IDs of all fitted species

get_fit_ids_species(species_id)[source]¶

Find all fit scenarios which contain results of species

Parameters:	species_id (str) – string ID of fitted species (e.g. SO2)
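
The lookup can be pictured with the result_import_dict scheme documented for the constructor. The sketch below works on a plain dictionary and does not call pydoas; `fit_ids_for_species` and `all_fit_ids` are hypothetical names.

```python
result_import_dict = {
    "so2": ["SO2_Hermans", ["f01", "f02"]],
    "o3": ["o3_burrows", ["f01"]],
}


def fit_ids_for_species(import_dict, species_id):
    """Return the fit scenario IDs listed for one species (empty if unknown)."""
    entry = import_dict.get(species_id)
    return list(entry[1]) if entry else []


def all_fit_ids(import_dict):
    """Return every fit scenario ID that appears in the dictionary, sorted."""
    return sorted({fit_id for _, ids in import_dict.values() for fit_id in ids})


print(fit_ids_for_species(result_import_dict, "so2"))
print(all_fit_ids(result_import_dict))
```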

fit_ids¶: Returns list with all fit ids

access_type¶: Return the current setting for data access type

HEADER_ACCESS_OPT¶: Checks if current settings allow column identification from file header line

FIRST_DATA_ROW_INDEX¶

get_fit_ids()[source]¶

Get all fit id abbreviations

Gets all fit ids (i.e. keys of fit import dict self.import_info)

class pydoas.dataimport.DataImport(setup=None)[source]¶

A class providing reading routines of DOAS result files

Here, it is assumed that the results are stored in FitResultFiles, tab delimited, where the columns correspond to the different variables (i.e. fit results, metainfo, …) and the rows to the individual spectra.

__init__(setup=None)[source]¶: x.__init__(…) initializes x; see help(type(x)) for signature

get_data()[source]¶: Load all data

load_result_type_info()[source]¶

Load import information for result type specified in setup

The detailed import information is stored in the package data file import_info.txt. This file can also be used to create new file types.

base_dir¶: Returns current basepath of resultfiles

start¶: Returns start date and time of dataset

stop¶: Returns stop date and time of dataset

time_str_format¶

Returns datetime formatting info for string to datetime conversion

This information should be available in the resultfile type specification file (package data: data/import_info.txt)

fit_err_add_col¶: Return current value for relative column of fit errors

init_result_dict()[source]¶: Initiate the result dictionary

find_valid_indices_header(fileheader, dict)[source]¶

Find positions of species in header of result file

Parameters:

fileheader (list) – header row of resultfile
dict (dict) – dictionary containing species IDs (keys) and the corresponding (sub) strings (vals) to find them in the header
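
The substring search in the header can be sketched as follows. `find_header_indices` is a hypothetical stand-in that only mirrors the description of find_valid_indices_header; the header row is made up for illustration.

```python
def find_header_indices(fileheader, species_to_substr):
    """Map each species ID to the index of the column containing its substring.

    Hypothetical helper; species without a matching column are left out.
    """
    indices = {}
    for species_id, substr in species_to_substr.items():
        for col, title in enumerate(fileheader):
            if substr in title:
                indices[species_id] = col
                break  # the substring is expected to be unique
    return indices


# made-up header row of a result file
header = ["Time", "SO2_Hermans", "Err", "o3_burrows", "Err"]
cols = find_header_indices(header, {"so2": "SO2_Hermans", "o3": "o3_burrows"})
print(cols)
```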

find_all_indices(fileheader, fit_id)[source]¶

Find all relevant indices for a given result file (fit scenario)

Parameters:

fileheader (list) – list containing all header strings from result file (not required if data access mode is from columns, see also `HEADER_ACCESS_OPT()` in `ResultImportSetup`)
fit_id (str) – ID of fit scenario (required in order to find all fitted species supposed to be extracted, specified in `self.setup.import_info`)

load_results()[source]¶

Load all results

The results are loaded as specified in self.import_setup for all valid files which were detected in get_all_files() which writes self.file_paths

find_col_index(substr, header)[source]¶

Find the index of the column in data

Parameters:

substr (str) – substr identifying the column in header
header (list) – the header of the data in which index of substr is searched

check_time_match(data)[source]¶

Check if data is within time interval set by self.start and self.stop

Parameters:	data (list) – data as read by `read_text_file()`
Returns:	bool, Match or no match

first_file¶

Get filepath of first file match in self.base_dir

This can for instance be read with read_text_file()

init_filepaths()[source]¶: Initiate the file paths

get_all_files()[source]¶

Get all valid files based on current settings

Checks self.base_dir for files matching the specified file type, and which include one of the required fit IDs in their name. Files matching these 2 criteria are opened and the spectrum times are read and checked. If they match the time interval specified by self.start and self.stop the files are added to the dictionary self.file_paths where the keys specify the individual fit scenario IDs.

Note

This function does not load data but only assigns the individual result files to the fit IDs. The data is then loaded by calling load_results()
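
The name-matching step of get_all_files() can be sketched with the standard library. This is an assumption-laden illustration: `group_files_by_fit_id` is hypothetical, the `.dat` extension and the second file name are made up, and the time check on the spectra described above is left out.

```python
import os
import tempfile


def group_files_by_fit_id(base_dir, fit_ids, ext=".dat"):
    """Assign files in base_dir to fit IDs by substring match on the file name."""
    file_paths = {fit_id: [] for fit_id in fit_ids}
    for name in sorted(os.listdir(base_dir)):
        if not name.endswith(ext):
            continue  # wrong file type
        for fit_id in fit_ids:
            if fit_id in name:
                file_paths[fit_id].append(os.path.join(base_dir, name))
    return file_paths


with tempfile.TemporaryDirectory() as tmp:
    # create empty placeholder files to search through
    for name in ("D130909_S0628_i6_f19_r20_f01so2.dat",
                 "D130909_S0628_i6_f19_r20_f02bro.dat",
                 "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = group_files_by_fit_id(tmp, ["f01", "f02"])
    counts = {fit_id: len(paths) for fit_id, paths in found.items()}

print(counts)
```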

read_text_file(p)[source]¶

Read text file using csv.reader and return data as list

Parameters:	p (str) – file path
Returns:	list, data
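
Reading a tab-delimited file with csv.reader looks roughly like this. The sketch assumes a tab delimiter (as stated for FitResultFiles above); `read_tab_delimited` and the file content are made up and are not the pydoas implementation.

```python
import csv
import os
import tempfile


def read_tab_delimited(path):
    """Read a tab-delimited text file and return its rows as a list of lists."""
    with open(path, newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))


# write a tiny made-up result file and read it back
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.dat")
    with open(path, "w", newline="") as handle:
        handle.write("Time\tSO2_Hermans\n06:28:00\t1.2e17\n")
    rows = read_tab_delimited(path)

print(rows)
```

All values come back as strings, so numeric columns and time stamps still need to be converted after reading.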

Fit result analysis and plotting¶

class pydoas.analysis.DatasetDoasResults(setup=None, init=1, **kwargs)[source]¶

A Dataset for DOAS fit results import and processing

__init__(setup=None, init=1, **kwargs)[source]¶

Initialisation of object

Parameters:

setup (ResultImportSetup) – setup specifying all necessary import settings (please see documentation of `ResultImportSetup` for setup details)
**kwargs – alternative way to set up `self.setup` (`ResultImportSetup` object), which is only used in case the input parameter setup is invalid. Valid keyword arguments are the input parameters of the `ResultImportSetup` object.

load_input(setup=None, **kwargs)[source]¶

Process input information

Writes self.setup based on setup

Parameters:

setup – is set if valid (i.e. if input is `ResultImportSetup`)
**kwargs – keyword arguments for new `ResultImportSetup` (are used in case first parameter is invalid)

base_path¶: Returns current basepath of resultfiles (from self.setup)

start¶: Returns start date and time of dataset (from self.setup)

stop¶: Returns stop date and time of dataset (from self.setup)

dev_id¶: Returns device ID of dataset (from self.setup)

import_info¶: Returns information about result import details

change_time_ival(start, stop)[source]¶

Change the time interval for the considered dataset

Parameters:

start (datetime) – new start time
stop (datetime) – new stop time

Note

Previously loaded results will be deleted

load_raw_results()[source]¶: Try to load all results as specified in self.setup

has_data(fit_id, species_id, start=None, stop=None)[source]¶: Checks if specific data is available

get_spec_times(fit)[source]¶: Returns start time and stop time arrays for spectra to a given fit

set_start_stop_time()[source]¶: Get start/stop range of dataset

get_start_stop_mask(fit, start=None, stop=None)[source]¶: Creates boolean mask for data access only in a certain time interval
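
A boolean mask of this kind can be sketched without pydoas. `start_stop_mask` is a hypothetical function working on a plain list of datetimes; it assumes inclusive interval bounds and treats None as "no limit".

```python
from datetime import datetime


def start_stop_mask(times, start=None, stop=None):
    """Return a list of booleans marking time stamps inside [start, stop]."""
    return [
        (start is None or t >= start) and (stop is None or t <= stop)
        for t in times
    ]


times = [datetime(2013, 9, 9, 6, m) for m in (28, 30, 32, 34)]
mask = start_stop_mask(times,
                       start=datetime(2013, 9, 9, 6, 30),
                       stop=datetime(2013, 9, 9, 6, 32))
# apply the mask to keep only the time stamps inside the interval
selected = [t for t, keep in zip(times, mask) if keep]
print(mask)
```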

get_meta_info(fit, meta_id, start=None, stop=None)[source]¶

Get meta info array

Parameters:

meta_id (str) – string ID of meta information
boolMask (array) – boolean mask for data retrieval

Note

Bool mask must have same length as the meta data array

get_results(species_id, fit_id=None, start=None, stop=None)[source]¶

Get spectral results object

Parameters:

species_id (str) – string ID of species
fit_id (str) – string ID of fit scenario (if None, tries to load default fit_id)
start – if valid (i.e. datetime object) only data after that time stamp is considered
stop – if valid (i.e. datetime object) only data before that time stamp is considered

get_default_fit_id(species_id)[source]¶

Get default fit scenario id for species

Parameters:	species_id (str) – ID of species (e.g. “so2”)

set_default_fitscenarios(default_dict)[source]¶

Update default fit scenarios for species

Parameters:	default_dict (dict) – dictionary specifying new default fit scenarios, it could e.g. look like: default_dict = {"so2" : "f01", "o3" : "f01", "bro" : "f03"}

plot(species_id, fit_id=None, start=None, stop=None, **kwargs)[source]¶: Plot DOAS results

scatter_plot(species_id_xaxis, fit_id_xaxis, species_id_yaxis, fit_id_yaxis, lin_fit_opt=1, species_id_zaxis=None, fit_id_zaxis=None, start=None, stop=None, ax=None, **kwargs)[source]¶

Make a scatter plot of two species

Parameters:

species_id_xaxis (str) – string ID of x axis species (e.g. “so2”)
fit_id_xaxis (str) – fit scenario ID of x axis species (e.g. “f01”)
species_id_yaxis (str) – string ID of y axis species (e.g. “so2”)
fit_id_yaxis (str) – fit scenario ID of y axis species (e.g. “f02”)
species_id_zaxis (str) – string ID of z axis species (e.g. “o3”)
fit_id_zaxis (str) – fit scenario ID of z axis species (e.g. “f01”)

:param bool linF

linear_regression(x_data, y_data, mask=None, ax=None)[source]¶

Perform linear regression and return parameters

Parameters:

x_data (ndarray) – x data array
y_data (ndarray) – y data array
mask (ndarray) – mask specifying indices of input data supposed to be considered for regression (None)
ax – matplotlib axes object (None), if provided, then the result is plotted into the axes
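
A masked least-squares line fit can be written out in plain Python. This is a sketch and not the pydoas implementation: `fit_line` is a hypothetical function, the mask is assumed to be boolean, and the plotting step is omitted.

```python
def fit_line(x_data, y_data, mask=None):
    """Least-squares fit y = slope * x + offset over the masked points."""
    if mask is None:
        mask = [True] * len(x_data)
    xs = [x for x, keep in zip(x_data, mask) if keep]
    ys = [y for y, keep in zip(y_data, mask) if keep]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # sums of squared deviations and of cross deviations
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    offset = mean_y - slope * mean_x
    return slope, offset


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 100.0]  # last point is an outlier
slope, offset = fit_line(x, y, mask=[True, True, True, False])
print(slope, offset)
```

Masking the outlier leaves three points that lie exactly on a line, which makes the result easy to check by hand.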

get_fit_import_setup()[source]¶: Get the current fit import setup

class pydoas.analysis.DoasResults(data, index=None, start_acq=[], stop_acq=[], fit_errs=None, species_id='', fit_id='', fit_errs_corr_fac=1.0)[source]¶

Data time series object inheriting from pandas.Series for handling and analysing DOAS fit results

Parameters:

data (arraylike) – DOAS fit results (column densities)
index (arraylike) – Time stamps of data points
fit_errs (arraylike) – DOAS fit errors
species_id (string) – String specifying the fitted species
fit_id (string) – Unique string specifying the fit scenario used
fit_errs_corr_fac (int) – DOAS fit error correction factor

Todo

Finish magic methods, i.e. apply error propagation, think about time merging etc…

__init__(data, index=None, start_acq=[], stop_acq=[], fit_errs=None, species_id='', fit_id='', fit_errs_corr_fac=1.0)[source]¶: x.__init__(…) initializes x; see help(type(x)) for signature

fit_errs = None¶

fit_id = None¶

fit_errs_corr_fac = None¶

start_acq = []¶

stop_acq = []¶

start¶: Start time of data

stop¶: Stop time of data

species¶: Return name of current species

has_start_stop_acqtamps()[source]¶: Checks if start_time and stop_time arrays have valid data

merge_other(other, itp_method='linear', dropna=True)[source]¶

Merge with other time series sampled on different grid

Note

This object will not be changed. Instead, two new Series objects are created and returned

Parameters:

other (Series) – Other time series
itp_method (str) – String specifying interpolation method (e.g. linear, quadratic)
dropna (bool) – Drop indices containing NA after merging and interpolation

Returns:

2-element tuple containing

this Series (merged)
other Series (merged)

Return type:

tuple

get_data_above_detlim()[source]¶

Get fit results exceeding the detection limit

The detection limit is determined as follows:

self.fit_errs_corr_fac*self.data_err
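
The selection can be illustrated on plain lists. `above_detection_limit` is a hypothetical function; it assumes a strict "greater than" comparison of each value against the correction factor times its fit error.

```python
def above_detection_limit(values, fit_errs, corr_fac=1.0):
    """Keep (index, value) pairs whose value exceeds corr_fac * fit error."""
    return [
        (i, val)
        for i, (val, err) in enumerate(zip(values, fit_errs))
        if val > corr_fac * err
    ]


values = [5.0, 1.0, 9.0]
fit_errs = [1.0, 1.0, 2.0]
kept = above_detection_limit(values, fit_errs, corr_fac=3.0)
print(kept)
```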

plot(date_fmt=None, **kwargs)[source]¶

Plot time series

Uses plotting utility of Series object (pandas)

Parameters:	**kwargs – keyword arguments for pandas plot method

shift(timedelta=datetime.timedelta(0))[source]¶

Shift time stamps of object

Parameters:	timedelta (timedelta) – temporal shift
Returns:	shifted `DoasResults` object
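
Shifting time stamps amounts to adding a timedelta to each index entry. The sketch below uses a plain list instead of a `DoasResults` object; `shift_times` is a hypothetical function and the two-hour offset is an arbitrary example.

```python
from datetime import datetime, timedelta


def shift_times(times, delta=timedelta(0)):
    """Return a new list of time stamps shifted by delta; input is unchanged."""
    return [t + delta for t in times]


local = [datetime(2013, 9, 9, 8, 28), datetime(2013, 9, 9, 8, 30)]
utc = shift_times(local, timedelta(hours=-2))
print(utc[0])
```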

Supplemental / IO / Helpers¶

This module contains I/O routines for DOAS result files

pydoas.inout.get_data_dirs()[source]¶

Get directories containing example package data

Returns:	list of package subfolders containing data files

pydoas.inout.get_data_files(which=u'doasis')[source]¶: Get all example result files from package data

pydoas.inout.get_result_type_ids()[source]¶: Read file import_info.txt and find all valid import types

pydoas.inout.import_type_exists(type_id)[source]¶

Checks if data import type exists in import_info.txt

Parameters:	type_id (str) – string ID to be searched in import_info.txt

pydoas.inout.get_import_info(resulttype=u'doasis')[source]¶

Try to load DOAS result import specification for default type

Import specifications for a specified data type (see package data file “import_info.txt” for available types, use the instructions in this file to create your own import setup if necessary)

Parameters:	resulttype (str) – name of result type (field “type” in “import_info.txt” file)

pydoas.inout.import_info_file()[source]¶: Return path to supplementary file import_info.txt

pydoas.inout.write_import_info_to_default_file(import_dict)[source]¶