API

Code documentation of Pydoas API.

Data import

class pydoas.dataimport.ResultImportSetup(base_dir=None, start=datetime.datetime(1900, 1, 1, 0, 0), stop=datetime.datetime(3000, 1, 1, 0, 0), meta_import_info='doasis', result_import_dict={}, default_dict={}, doas_fit_err_factors={}, dev_id='', lt_to_utc_offset=datetime.timedelta(0))[source]

Setup class for spectral result imports from text-like files

__init__(base_dir=None, start=datetime.datetime(1900, 1, 1, 0, 0), stop=datetime.datetime(3000, 1, 1, 0, 0), meta_import_info='doasis', result_import_dict={}, default_dict={}, doas_fit_err_factors={}, dev_id='', lt_to_utc_offset=datetime.timedelta(0))[source]
Parameters:
  • base_dir (str) – folder containing resultfiles
  • start (datetime) – time stamp of first spectrum
  • stop (datetime) – time stamp of last spectrum
  • meta_import_info – Specifies the result file format and the columns for meta information (see also the file import_info.txt or example script 2). Input can be str or dict. If a string is provided, the specs are assumed to be defined in import_info.txt, i.e. they can be imported (as a dictionary) from this file using get_import_info(), e.g. with arg = doasis. If a dictionary is provided, the information is set directly from that dictionary.
  • result_import_dict (dict) –

    Specifies file and header information for the import. The keys define the abbreviations used after import. The value for each key is a list with 2 elements: the first is the UNIQUE string that identifies this species in the header of a given fit result file; the second is a list of arbitrary length containing the fit scenario IDs, i.e. it defines from which fit scenario result files this species is to be extracted.

    Example:

    result_import_dict = {"so2" : ['SO2_Hermans', ['f01','f02']],
                          "o3"  : ['o3_burrows', ['f01']]}
    

    Here “so2” and “o3” are imported. The data column in the result files is found by the header string 'SO2_Hermans' or 'o3_burrows', respectively. “so2” is imported from all fit scenario result files with fit IDs ["f01", "f02"], “o3” only from those with fit ID "f01" (the fit IDs are UNIQUE substrings in the FitScenario file names).

    Exemplary file name:

    D130909_S0628_i6_f19_r20_f01so2.dat

    This (exemplary) filename convention is used for the example result files shipped with this package (see folder pydoas/data/doasis_resultfiles) which include fit result files from the software DOASIS.

    The delimiter for retrieving info from these file names is “_”. The first substring provides the date (day), the second the start time of the time series (HH:MM), the 3rd, 4th and 5th the numbers of the first and last fitted spectrum and of the reference spectrum used for this time series, and the last one the fit scenario (fitID).

    Each resultfile must therefore include a unique ID in the file name by which it can be identified.

  • default_dict (dict) –

    specify the default fit scenario ID for each species, e.g.:

    dict_like = {"so2"     :   "f02",
                 "o3"      :   "f01"}
    
  • doas_fit_err_factors (dict) –

    fit correction factors (i.e. factors by which the DOAS fit error is increased):

    dict_like = {"f01"   :   4.0,
                 "f02"   :   2.0}
    
  • dev_id (str) – string ID for DOAS device (of minor importance)
  • lt_to_utc_offset (timedelta) – specify time zone offset (will be added on data import if applicable).
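
The file name convention and the layout of result_import_dict described above can be illustrated with a minimal sketch in plain Python. It does not call pydoas and is not its implementation; the helper names split_result_filename and species_per_fit_id are hypothetical, and the parsing only mirrors the “_”-delimited convention of the example file.

```python
import os

def split_result_filename(file_name):
    """Split an example result file name into its "_"-delimited parts."""
    stem = os.path.splitext(os.path.basename(file_name))[0]
    date, start, first, last, ref, fit = stem.split("_")
    return {"date": date, "start_time": start, "first_spec": first,
            "last_spec": last, "ref_spec": ref, "fit_part": fit}

def species_per_fit_id(import_dict):
    """Invert the import dict: fit ID -> list of (species, header string)."""
    out = {}
    for species, (header_str, fit_ids) in import_dict.items():
        for fit_id in fit_ids:
            out.setdefault(fit_id, []).append((species, header_str))
    return out

result_import_dict = {"so2": ["SO2_Hermans", ["f01", "f02"]],
                      "o3": ["o3_burrows", ["f01"]]}

parts = split_result_filename("D130909_S0628_i6_f19_r20_f01so2.dat")
by_fit = species_per_fit_id(result_import_dict)

# Fit IDs that occur as substring in the last part of the file name
matching = [fit_id for fit_id in by_fit if fit_id in parts["fit_part"]]
```

For the example file, only the fit ID "f01" matches, so both “so2” and “o3” would be looked up in its header.
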
start

Start time-stamp of data

stop

Stop time-stamp of data

base_path

Old name of base_dir for versions <= 1.0.1

set_start_time(dt)[source]

Set the current start time

Parameters:dt (datetime) – start time of dataset
set_stop_time(dt)[source]

Set the current stop time

Parameters:dt (datetime) – stop time of dataset
check_time_stamps()[source]

Check if time stamps are valid and if not, set

complete()[source]

Checks if basic information is available

set_defaults(dict_like)[source]

Update default fit IDs for fitted species

Scheme:

dict_like = {"so2"     :   "f02",
             "o3"      :   "f01"}
set_fitcorr_factors(dict_like)[source]

Set correction factors for uncertainty estimate from DOAS fit errors

Parameters:dict_like (dict) –

dictionary specifying correction factors for DOAS fit errors (which are usually underestimated, see e.g. Gliss et al. 2015) for individual fit scenarios, e.g.:

dict_like = {"f01"   :   4.0,
             "f02"   :   2.0}

Default value is 3.0.
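
As a rough illustration of how such factors might be applied, the sketch below scales a fit error by the factor of its fit scenario. It is plain Python, not pydoas code; the function name corrected_fit_error is hypothetical, and it assumes that the stated default of 3.0 is used for fit IDs without an explicit entry.

```python
DEFAULT_FACTOR = 3.0  # default value stated in the documentation above

def corrected_fit_error(fit_err, fit_id, factors):
    """Scale a DOAS fit error by the correction factor of its fit scenario."""
    # Assumption: fit IDs missing in `factors` fall back to the default
    return factors.get(fit_id, DEFAULT_FACTOR) * fit_err

factors = {"f01": 4.0, "f02": 2.0}

err_f01 = corrected_fit_error(1.0e17, "f01", factors)  # uses factor 4.0
err_f03 = corrected_fit_error(1.0e17, "f03", factors)  # falls back to 3.0
```
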

xs

Returns list with xs names

get_xs_names()[source]

Set and return the string IDs of all fitted species

get_fit_ids_species(species_id)[source]

Find all fit scenarios which contain results of species

Parameters:species_id (str) – string ID of fitted species (e.g. SO2)
fit_ids

Returns list with all fit ids

access_type

Return the current setting for data access type

HEADER_ACCESS_OPT

Checks if current settings allow column identification from file header line

FIRST_DATA_ROW_INDEX
get_fit_ids()[source]

Get all fit id abbreviations

Gets all fit ids (i.e. keys of fit import dict self.import_info)

class pydoas.dataimport.DataImport(setup=None)[source]

A class providing reading routines of DOAS result files

Here, it is assumed that the results are stored in tab-delimited FitResultFiles, in which the columns correspond to the different variables (i.e. fit results, metainfo, …) and the rows to the individual spectra.

__init__(setup=None)[source]

x.__init__(…) initializes x; see help(type(x)) for signature

get_data()[source]

Load all data

load_result_type_info()[source]

Load import information for result type specified in setup

The detailed import information is stored in the package data file import_info.txt. This file can also be used to create new file types.

base_dir

Returns current basepath of resultfiles

start

Returns start date and time of dataset

stop

Returns stop date and time of dataset

time_str_format

Returns datetime formatting info for string to datetime conversion

This information should be available in the resultfile type specification file (package data: data/import_info.txt)

fit_err_add_col

Return current value for relative column of fit errors

init_result_dict()[source]

Initiate the result dictionary

find_valid_indices_header(fileheader, dict)[source]

Find positions of species in header of result file

Parameters:
  • fileheader (list) – header row of resultfile
  • dict (dict) – dictionary containing species IDs (keys) and the corresponding (sub) strings (vals) to find them in the header
find_all_indices(fileheader, fit_id)[source]

Find all relevant indices for a given result file (fit scenario)

Parameters:
  • fileheader (list) – list containing all header strings from the result file (not required if the data access mode is from columns, see also HEADER_ACCESS_OPT() in ResultImportSetup)
  • fit_id (str) – ID of fit scenario (required in order to find all fitted species supposed to be extracted, specified in self.setup.import_info)
load_results()[source]

Load all results

The results are loaded as specified in self.import_setup for all valid files detected by get_all_files(), which writes self.file_paths.

find_col_index(substr, header)[source]

Find the index of the column in data

Parameters:
  • substr (str) – substr identifying the column in header
  • header (list) – the header of the data in which index of substr is searched
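
A substring lookup of this kind can be sketched as follows. This is a hedged, standalone illustration and not the pydoas implementation: the function name unique_col_index and the header row are hypothetical, and raising an error on ambiguous matches is an assumption motivated by the requirement that the identifying string be unique.

```python
def unique_col_index(substr, header):
    """Return the index of the single header entry containing substr."""
    matches = [idx for idx, name in enumerate(header) if substr in name]
    if len(matches) != 1:
        raise ValueError(
            "expected exactly one match for %r, found %d" % (substr, len(matches))
        )
    return matches[0]

# Hypothetical header row of a result file
header = ["Time", "SO2_Hermans_298", "FitErr", "o3_burrows_221", "FitErr"]

so2_idx = unique_col_index("SO2_Hermans", header)
o3_idx = unique_col_index("o3_burrows", header)
```

A non-unique substring such as "FitErr" raises ValueError in this sketch, which makes ambiguous header strings visible early.
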
check_time_match(data)[source]

Check if data is within time interval set by self.start and self.stop

Parameters:data (list) – data as read by read_text_file()
Returns:
  • bool, Match or no match
first_file

Get filepath of first file match in self.base_dir

This can for instance be read with read_text_file()

init_filepaths()[source]

Initiate the file paths

get_all_files()[source]

Get all valid files based on current settings

Checks self.base_dir for files matching the specified file type, and which include one of the required fit IDs in their name. Files matching these 2 criteria are opened and the spectrum times are read and checked. If they match the time interval specified by self.start and self.stop the files are added to the dictionary self.file_paths where the keys specify the individual fit scenario IDs.

Note

This function does not load data; it only assigns the individual result files to the fit IDs. The data is then loaded by calling load_results().
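
The selection logic described above (file type, fit ID in the name, time interval) can be sketched in plain Python. The names group_files_by_fit_id and overlaps are hypothetical, the second file name is made up for illustration, and treating a file as a match when at least one spectrum time lies in the interval is an assumption, not a statement about how pydoas decides.

```python
from datetime import datetime

def group_files_by_fit_id(file_names, fit_ids, file_type="dat"):
    """Map each fit ID to the files of the given type containing that ID."""
    grouped = {fit_id: [] for fit_id in fit_ids}
    for name in file_names:
        if not name.endswith("." + file_type):
            continue
        for fit_id in fit_ids:
            if fit_id in name:
                grouped[fit_id].append(name)
    return grouped

def overlaps(spec_times, start, stop):
    """True if at least one spectrum time lies within [start, stop]."""
    return any(start <= t <= stop for t in spec_times)

names = ["D130909_S0628_i6_f19_r20_f01so2.dat",
         "D130909_S0628_i6_f19_r20_f02bro.dat",  # hypothetical file name
         "notes.txt"]
grouped = group_files_by_fit_id(names, ["f01", "f02"])

in_interval = overlaps([datetime(2013, 9, 9, 6, 28)],
                       datetime(2013, 9, 9), datetime(2013, 9, 10))
```
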

read_text_file(p)[source]

Read text file using csv.reader and return data as list

Parameters:p (str) – file path
Returns:data (list)
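
Reading a tab-delimited result file with csv.reader might look like the sketch below. It is a standalone example rather than the pydoas code; the function name read_tab_delimited, the file name and the file content are hypothetical.

```python
import csv
import os
import tempfile

def read_tab_delimited(path):
    """Read a tab-delimited text file and return its rows as a list of lists."""
    with open(path, newline="") as f:
        return list(csv.reader(f, delimiter="\t"))

# Write a small hypothetical result file and read it back
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_f01so2.dat")
    with open(path, "w", newline="") as f:
        f.write("StartTime\tSO2_Hermans\tFitErr\n")
        f.write("09.09.2013 06:28:00\t1.2e17\t3.0e15\n")
    rows = read_tab_delimited(path)

header, data = rows[0], rows[1:]
```

All values are returned as strings, so numeric columns and time stamps still need to be converted after reading.
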

Fit result analysis and plotting

class pydoas.analysis.DatasetDoasResults(setup=None, init=1, **kwargs)[source]

A Dataset for DOAS fit results import and processing

__init__(setup=None, init=1, **kwargs)[source]

Initialisation of object

Parameters:
  • setup (ResultImportSetup) – setup specifying all necessary import settings (please see documentation of ResultImportSetup for setup details)
  • **kwargs

    Alternative way to set up self.setup (ResultImportSetup object), which is only used in case the input parameter setup is invalid. Valid keyword arguments are the input parameters of the ResultImportSetup object.

load_input(setup=None, **kwargs)[source]

Process input information

Writes self.setup based on setup

Parameters:
  • setup – is set if valid (i.e. if input is ResultImportSetup)
  • **kwargs – keyword arguments for new ResultImportSetup (used in case the first parameter is invalid)

base_path

Returns current basepath of resultfiles (from self.setup)

start

Returns start date and time of dataset (from self.setup)

stop

Returns stop date and time of dataset (from self.setup)

dev_id

Returns device ID of dataset (from self.setup)

import_info

Returns information about result import details

change_time_ival(start, stop)[source]

Change the time interval for the considered dataset

Parameters:
  • start (datetime) – new start time
  • stop (datetime) – new stop time

Note

Previously loaded results will be deleted

load_raw_results()[source]

Try to load all results as specified in self.setup

has_data(fit_id, species_id, start=None, stop=None)[source]

Checks if specific data is available

get_spec_times(fit)[source]

Returns start time and stop time arrays for spectra to a given fit

set_start_stop_time()[source]

Get start/stop range of dataset

get_start_stop_mask(fit, start=None, stop=None)[source]

Creates boolean mask for data access only in a certain time interval

get_meta_info(fit, meta_id, start=None, stop=None)[source]

Get meta info array

Parameters:
  • meta_id (str) – string ID of meta information
  • boolMask (array) – boolean mask for data retrieval

Note

Bool mask must have same length as the meta data array

get_results(species_id, fit_id=None, start=None, stop=None)[source]

Get spectral results object

Parameters:
  • species_id (str) – string ID of species
  • fit_id (str) – string ID of fit scenario (if None, tries to load default fit_id)
  • start – if valid (i.e. datetime object) only data after that time stamp is considered
  • stop – if valid (i.e. datetime object) only data before that time stamp is considered
get_default_fit_id(species_id)[source]

Get default fit scenario id for species

Parameters:species_id (str) – ID of species (e.g. “so2”)
set_default_fitscenarios(default_dict)[source]

Update default fit scenarios for species

Parameters:default_dict (dict) –

dictionary specifying new default fit scenarios, it could e.g. look like:

default_dict = {"so2"   :   "f01",
                "o3"    :   "f01",
                "bro"   :   "f03"}
plot(species_id, fit_id=None, start=None, stop=None, **kwargs)[source]

Plot DOAS results

scatter_plot(species_id_xaxis, fit_id_xaxis, species_id_yaxis, fit_id_yaxis, lin_fit_opt=1, species_id_zaxis=None, fit_id_zaxis=None, start=None, stop=None, ax=None, **kwargs)[source]

Make a scatter plot of two species

Parameters:
  • species_id_xaxis (str) – string ID of x axis species (e.g. “so2”)
  • fit_id_xaxis (str) – fit scenario ID of x axis species (e.g. “f01”)
  • species_id_yaxis (str) – string ID of y axis species (e.g. “so2”)
  • fit_id_yaxis (str) – fit scenario ID of y axis species (e.g. “f02”)
  • species_id_zaxis (str) – string ID of z axis species (e.g. “o3”)
  • fit_id_zaxis (str) – fit scenario ID of z axis species (e.g. “f01”)

:param bool linF

linear_regression(x_data, y_data, mask=None, ax=None)[source]

Perform linear regression and return parameters

Parameters:
  • x_data (ndarray) – x data array
  • y_data (ndarray) – y data array
  • mask (ndarray) – mask specifying indices of input data supposed to be considered for regression (None)
  • ax – matplotlib axes object (None), if provided, then the result is plotted into the axes
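
The idea of a masked regression can be sketched with an ordinary least squares fit in plain Python. This is not the pydoas implementation: the function name masked_linear_fit is hypothetical, the mask is assumed to be a boolean sequence, and returning slope and intercept is an assumption about which parameters are of interest.

```python
def masked_linear_fit(x_data, y_data, mask=None):
    """Least squares fit y = slope * x + intercept on the unmasked points."""
    if mask is None:
        mask = [True] * len(x_data)
    xs = [x for x, keep in zip(x_data, mask) if keep]
    ys = [y for y, keep in zip(y_data, mask) if keep]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 100.0]  # last point is an outlier

# Exclude the outlier via the mask
slope, intercept = masked_linear_fit(x, y, mask=[True, True, True, False])
```
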
get_fit_import_setup()[source]

Get the current fit import setup

class pydoas.analysis.DoasResults(data, index=None, start_acq=[], stop_acq=[], fit_errs=None, species_id='', fit_id='', fit_errs_corr_fac=1.0)[source]

Data time series object inheriting from pandas.Series for handling and analysing DOAS fit results

Parameters:
  • data (arraylike) – DOAS fit results (column densities)
  • index (arraylike) – Time stamps of data points
  • fit_errs (arraylike) – DOAS fit errors
  • species_id (string) – String specifying the fitted species
  • fit_id (string) – Unique string specifying the fit scenario used
  • fit_errs_corr_fac (float) – DOAS fit error correction factor

Todo

Finish magic methods, i.e. apply error propagation, think about time merging etc…

__init__(data, index=None, start_acq=[], stop_acq=[], fit_errs=None, species_id='', fit_id='', fit_errs_corr_fac=1.0)[source]

x.__init__(…) initializes x; see help(type(x)) for signature

fit_errs = None
fit_id = None
fit_errs_corr_fac = None
start_acq = []
stop_acq = []
start

Start time of data

stop

Stop time of data

species

Return name of current species

has_start_stop_acqtamps()[source]

Checks if start_time and stop_time arrays have valid data

merge_other(other, itp_method='linear', dropna=True)[source]

Merge with other time series sampled on different grid

Note

This object will not be changed, instead, two new Series objects will be created and returned

Parameters:
  • other (Series) – Other time series
  • itp_method (str) – String specifying interpolation method (e.g. linear, quadratic)
  • dropna (bool) – Drop indices containing NA after merging and interpolation
Returns:

2-element tuple containing

  • this Series (merged)
  • other Series (merged)

Return type:

tuple
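
One plausible reading of this merge is sketched below in plain Python: both series are linearly interpolated onto the union of their time stamps, and grid points without a value in either series are dropped. The function names are hypothetical, and whether pydoas uses the union grid in exactly this way is an assumption; the actual method operates on pandas Series objects.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

def interp_linear(times, values, t):
    """Linearly interpolate (times, values) at t; None outside the range."""
    if t < times[0] or t > times[-1]:
        return None
    i = bisect_left(times, t)
    if times[i] == t:
        return values[i]
    frac = (t - times[i - 1]) / (times[i] - times[i - 1])
    return values[i - 1] + frac * (values[i] - values[i - 1])

def merge_on_union_grid(times_a, vals_a, times_b, vals_b, dropna=True):
    """Resample both series on the union of their (sorted) time stamps."""
    grid = sorted(set(times_a) | set(times_b))
    a = [interp_linear(times_a, vals_a, t) for t in grid]
    b = [interp_linear(times_b, vals_b, t) for t in grid]
    if dropna:
        keep = [i for i in range(len(grid))
                if a[i] is not None and b[i] is not None]
        grid = [grid[i] for i in keep]
        a = [a[i] for i in keep]
        b = [b[i] for i in keep]
    return grid, a, b

t0 = datetime(2013, 9, 9, 6, 28)
minute = timedelta(minutes=1)
times_a = [t0, t0 + 2 * minute, t0 + 4 * minute]
times_b = [t0 + 1 * minute, t0 + 3 * minute]

grid, a, b = merge_on_union_grid(times_a, [0.0, 2.0, 4.0],
                                 times_b, [10.0, 30.0])
```

As in the method described above, the inputs are left unchanged and new sequences are returned.
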

get_data_above_detlim()[source]

Get fit results exceeding the detection limit

The detection limit is determined as follows:

self.fit_errs_corr_fac*self.data_err
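
The filter implied by this expression can be sketched in plain Python. The function name values_above_detlim is hypothetical, and using a strict comparison for “exceeding” is an assumption.

```python
def values_above_detlim(values, fit_errs, corr_fac=1.0):
    """Keep values that exceed corr_fac times their fit error."""
    return [v for v, err in zip(values, fit_errs) if v > corr_fac * err]

values = [5.0, 1.0, 9.0]
fit_errs = [1.0, 1.0, 2.0]

# Detection limits are 3.0, 3.0 and 6.0 for a correction factor of 3.0
above = values_above_detlim(values, fit_errs, corr_fac=3.0)
```
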
plot(date_fmt=None, **kwargs)[source]

Plot time series

Uses plotting utility of Series object (pandas)

Parameters:**kwargs – keyword arguments for pandas plot method
shift(timedelta=datetime.timedelta(0))[source]

Shift time stamps of object

Parameters:timedelta (timedelta) – temporal shift
Returns:shifted DoasResults object

Supplemental / IO / Helpers

This module contains I/O routines for DOAS result files

pydoas.inout.get_data_dirs()[source]

Get directories containing example package data

Returns:list of package subfolders containing data files
pydoas.inout.get_data_files(which=u'doasis')[source]

Get all example result files from package data

pydoas.inout.get_result_type_ids()[source]

Read file import_info.txt and find all valid import types

pydoas.inout.import_type_exists(type_id)[source]

Checks if data import type exists in import_info.txt

Parameters:type_id (str) – string ID to be searched in import_info.txt
pydoas.inout.get_import_info(resulttype=u'doasis')[source]

Try to load DOAS result import specification for default type

Import specifications for a specified data type (see package data file “import_info.txt” for available types, use the instructions in this file to create your own import setup if necessary)

Parameters:resulttype (str) – name of result type (field “type” in “import_info.txt” file)
pydoas.inout.import_info_file()[source]

Return path to supplementary file import_info.txt

pydoas.inout.write_import_info_to_default_file(import_dict)[source]