API

Code documentation of Pydoas API.

Data import

class pydoas.dataimport.DataImport(setup=None)[source]

Bases: object

A class providing reading routines of DOAS result files

The results are assumed to be stored in tab-delimited fit result files, in which the columns correspond to the different variables (i.e. fit results, meta info, …) and the rows to the individual spectra.

__init__(setup=None)[source]
property base_dir

Returns current basepath of resultfiles

check_time_match(data)[source]

Check if data is within time interval set by self.start and self.stop

Parameters:

data (list) – data as read by read_text_file()

Returns:

  • bool, Match or no match

find_all_indices(fileheader, fit_id)[source]

Find all relevant indices for a given result file (fit scenario)

Parameters:
  • fileheader (list) – list containing all header strings from the result file (not required if the data access mode is by column index; see also HEADER_ACCESS_OPT in ResultImportSetup)

  • fit_id (str) – ID of fit scenario (required in order to find all fitted species supposed to be extracted, specified in self.setup.import_info)

find_col_index(substr, header)[source]

Find the index of the column in data

Parameters:
  • substr (str) – substr identifying the column in header

  • header (list) – the header of the data in which index of substr is searched
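
The substring lookup described above can be sketched as follows. This is a minimal illustration, not the pydoas implementation: the helper name find_col_index_sketch, the header entries, and the choice to return the first match (or None if nothing matches) are all assumptions.

```python
def find_col_index_sketch(substr, header):
    """Return the index of the first header entry that contains substr.

    Hypothetical helper; returning None when nothing matches is an
    assumption, not documented pydoas behaviour.
    """
    for idx, name in enumerate(header):
        if substr in name:
            return idx
    return None


# Illustrative header row (column names are made up for this example)
header = ["SpecNo", "StartTime", "SO2_Hermans", "SO2_Hermans_err", "o3_burrows"]

so2_idx = find_col_index_sketch("SO2_Hermans", header)
o3_idx = find_col_index_sketch("o3_burrows", header)
missing_idx = find_col_index_sketch("bro", header)
```

Note that "SO2_Hermans" also occurs in the made-up error column; with a first-match rule the order of columns decides, which is one reason the identifying string should be unique.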

find_valid_indices_header(fileheader, dict)[source]

Find positions of species in header of result file

Parameters:
  • fileheader (list) – header row of resultfile

  • dict (dict) – dictionary containing species IDs (keys) and the corresponding (sub) strings (vals) to find them in the header

property first_file

Get filepath of first file match in self.base_dir

This can for instance be read with read_text_file()

property fit_err_add_col

Return current value for relative column of fit errors

get_all_files()[source]

Get all valid files based on current settings

Checks self.base_dir for files matching the specified file type, and which include one of the required fit IDs in their name. Files matching these 2 criteria are opened and the spectrum times are read and checked. If they match the time interval specified by self.start and self.stop the files are added to the dictionary self.file_paths where the keys specify the individual fit scenario IDs.

Note

This function does not load data; it only assigns the individual result files to the fit IDs. The data is then loaded by calling load_results().
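
The file selection step can be sketched roughly as below. The function group_files_by_fit_id, the ".dat" extension default, and the second file name are assumptions for illustration; the check of spectrum times against self.start and self.stop is left out.

```python
import os
import tempfile


def group_files_by_fit_id(base_dir, fit_ids, file_ext=".dat"):
    """Map each fit ID to the files in base_dir whose name contains it.

    Hypothetical helper: only the file type and fit ID criteria are
    applied here, the time interval check is omitted.
    """
    file_paths = {fit_id: [] for fit_id in fit_ids}
    for name in sorted(os.listdir(base_dir)):
        if not name.endswith(file_ext):
            continue  # wrong file type
        for fit_id in fit_ids:
            if fit_id in name:
                file_paths[fit_id].append(os.path.join(base_dir, name))
    return file_paths


with tempfile.TemporaryDirectory() as tmp:
    # First name follows the example in this documentation, the others are made up
    for name in ("D130909_S0628_i6_f19_r20_f01so2.dat",
                 "D130909_S0628_i6_f19_r20_f02bro.dat",
                 "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    matches = group_files_by_fit_id(tmp, ["f01", "f02"])
    matched_names = {fid: [os.path.basename(p) for p in paths]
                     for fid, paths in matches.items()}
```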

get_data()[source]

Load all data

init_filepaths()[source]

Initiate the file paths

init_result_dict()[source]

Initiate the result dictionary

load_result_type_info()[source]

Load import information for result type specified in setup

The detailed import information is stored in the package data file import_info.txt. This file can also be used to create new file types.

load_results()[source]

Load all results

The results are loaded as specified in self.import_setup for all valid files detected by get_all_files(), which writes self.file_paths.

read_text_file(p)[source]

Read text file using csv.reader and return data as list

Parameters:

p (str) – file path

Returns list:

data
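
A minimal sketch of reading such a file with csv.reader. The tab delimiter follows the class description above; the helper name read_tab_delimited and the file content are assumptions, and the real method may take its delimiter from the import specification.

```python
import csv
import os
import tempfile


def read_tab_delimited(path):
    """Read a tab-delimited text file and return its rows as a list of lists."""
    with open(path, newline="") as f:
        return list(csv.reader(f, delimiter="\t"))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_result.dat")
    # Write a tiny made-up result file: one header row, one data row
    with open(path, "w", newline="") as f:
        f.write("StartTime\tSO2_Hermans\n")
        f.write("09.09.2013 06:28:00\t1.5e17\n")
    rows = read_tab_delimited(path)
```

All values come back as strings; converting time stamps and column densities is a separate step.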

property start

Returns start date and time of dataset

property stop

Returns stop date and time of dataset

property time_str_format

Returns datetime formatting info for string to datetime conversion

This information should be available in the resultfile type specification file (package data: data/import_info.txt)

class pydoas.dataimport.ResultImportSetup(base_dir=None, start=datetime.datetime(1900, 1, 1, 0, 0), stop=datetime.datetime(3000, 1, 1, 0, 0), meta_import_info='doasis', result_import_dict={}, default_dict={}, doas_fit_err_factors={}, dev_id='')[source]

Bases: object

Setup class for spectral result imports from text like files

Parameters:
  • base_dir – folder containing resultfiles

  • start – time stamp of first spectrum

  • stop – time stamp of last spectrum

  • meta_import_info – Specify the result file format and the columns for meta information (see also file import_info.txt or example script 2). Input can be str or dict. If a string is provided, the specs are assumed to be defined in import_info.txt, i.e. they can be imported (as dictionary) from this file (using get_import_info(), e.g. with arg = doasis). If a dictionary is provided, the information is set directly from it.

  • result_import_dict

    specify file and header information for import. The keys define the abbreviations used after import. The value for each key is a list with 2 elements: the first is the UNIQUE string used to identify this species in the header of a given fit result file, the second is a list of arbitrary length containing the IDs of the fit scenarios from whose result files this species is to be extracted.

    Example:

    result_import_dict = {"so2" : ['SO2_Hermans', ['f01','f02']],
                          "o3"  : ['o3_burrows', ['f01']]}
    

    Here “so2” and “o3” are imported. The data column in the result files is found by the header string 'SO2_Hermans' / 'o3_burrows', and each species is imported from all fit scenario result files with the listed fit IDs, e.g. ["f01", "f02"] for “so2” (UNIQUE substrings in the fit scenario file names).

    Example file name:

    D130909_S0628_i6_f19_r20_f01so2.dat

    This (exemplary) filename convention is used for the example result files shipped with this package (see folder pydoas/data/doasis_resultfiles), which include fit result files from the software DOASIS.

    The delimiter for retrieving info from these file names is “_”. The first substring provides the date (day) and the second the start time of this time series (HH:MM). The 3rd, 4th and 5th substrings give the numbers of the first and last fitted spectrum and of the reference spectrum used for this time series. The last substring contains the fit scenario (fitID).

    Each resultfile must therefore include a unique ID in the file name by which it can be identified.

  • default_dict

    specify default species, e.g.:

    dict_like = {"so2"     :   "f02",
                 "o3"      :   "f01"}
    

  • doas_fit_err_factors

    fit correction factors (i.e. factors by which the DOAS fit error is increased):

    dict_like = {"f01"   :   4.0,
                 "f02"   :   2.0}
    

  • dev_id – string ID for DOAS device (of minor importance)
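
The file name convention and the structure of result_import_dict described above can be sketched as follows. The helper parse_result_filename, its field names, and the rule of matching fit IDs only against the last file name field are assumptions for illustration, not the pydoas implementation.

```python
def parse_result_filename(filename):
    """Split an example result file name into its "_"-delimited fields.

    Hypothetical helper; the field names are chosen for this sketch.
    """
    stem = filename.rsplit(".", 1)[0]
    date, start, first, last, ref, fit = stem.split("_")
    return {"date": date, "start_time": start, "first_spec": first,
            "last_spec": last, "ref_spec": ref, "fit_part": fit}


info = parse_result_filename("D130909_S0628_i6_f19_r20_f01so2.dat")

result_import_dict = {"so2": ["SO2_Hermans", ["f01", "f02"]],
                      "o3": ["o3_burrows", ["f01"]]}

# Species to extract from this file: those whose fit ID list contains
# a fit ID found in the last field of the file name
species_in_file = sorted(
    species
    for species, (_, fit_ids) in result_import_dict.items()
    if any(fit_id in info["fit_part"] for fit_id in fit_ids)
)
```

Matching against the last field only avoids confusing a fit ID with the "f19" field, which denotes the last fitted spectrum.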

property FIRST_DATA_ROW_INDEX
property HEADER_ACCESS_OPT

Checks if current settings allow column identification from file header line

__init__(base_dir=None, start=datetime.datetime(1900, 1, 1, 0, 0), stop=datetime.datetime(3000, 1, 1, 0, 0), meta_import_info='doasis', result_import_dict={}, default_dict={}, doas_fit_err_factors={}, dev_id='')[source]
property access_type

Return the current setting for data access type

property base_path

Old name of base_dir for versions <= 1.0.1

check_time_stamps()[source]

Check if time stamps are valid and if not, set

complete()[source]

Checks if basic information is available

property fit_ids

Returns list with all fit ids

get_fit_ids()[source]

Get all fit id abbreviations

Gets all fit ids (i.e. keys of fit import dict self.import_info)

get_fit_ids_species(species_id)[source]

Find all fit scenarios which contain results of species

Parameters:

species_id (str) – string ID of fitted species (e.g. SO2)

get_xs_names()[source]

Set and return the string IDs of all fitted species

set_defaults(dict_like)[source]

Update default fit IDs for fitted species

Scheme:

dict_like = {"so2"     :   "f02",
             "o3"      :   "f01"}

set_fitcorr_factors(dict_like)[source]

Set correction factors for uncertainty estimate from DOAS fit errors

Parameters:

dict_like (dict) –

dictionary specifying correction factors for DOAS fit errors (which are usually underestimated, see e.g. Gliss et al. 2015) for individual fit scenarios, e.g.:

dict_like = {"f01"   :   4.0,
             "f02"   :   2.0}

Default value is 3.0.
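
A small sketch of how such a factor dictionary might be queried. The helper get_corr_factor is hypothetical, and reading "default value is 3.0" as the fallback for fit scenarios missing from the dictionary is an assumption.

```python
DEFAULT_FACTOR = 3.0  # default value stated in the documentation above


def get_corr_factor(factors, fit_id, default=DEFAULT_FACTOR):
    """Return the correction factor for fit_id, falling back to the default.

    Hypothetical helper; the fallback behaviour is an assumption.
    """
    return factors.get(fit_id, default)


factors = {"f01": 4.0, "f02": 2.0}

factor_f01 = get_corr_factor(factors, "f01")  # explicitly specified
factor_f03 = get_corr_factor(factors, "f03")  # not specified, uses default
```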

set_start_time(dt)[source]

Set the current start time

Parameters:

dt (datetime) – start time of dataset

set_stop_time(dt)[source]

Set the current stop time

Parameters:

dt (datetime) – stop time of dataset

property start

Start time-stamp of data

property stop

Stop time-stamp of data

Fit result analysis and plotting

class pydoas.analysis.DatasetDoasResults(setup=None, init=1, **kwargs)[source]

Bases: object

A Dataset for DOAS fit results import and processing

setup

setup specifying all necessary import settings (please see documentation of ResultImportSetup for setup details)

Type:

ResultImportSetup

raw_results

dictionary containing the imported results

Type:

dict

Parameters:
  • setup (ResultImportSetup) – setup specifying all necessary import settings (please see documentation of ResultImportSetup for setup details)

  • init (int) – if 1, the raw results will be loaded immediately

  • **kwargs – alternative way to specify the setup (ResultImportSetup object); only used if input parameter setup is None.

__init__(setup=None, init=1, **kwargs)[source]
property base_path: str

Basepath of resultfiles

change_time_ival(start, stop)[source]

Change the time interval for the considered dataset

Note

Previously loaded results will be deleted

Parameters:
  • start (datetime) – new start time

  • stop (datetime) – new stop time

Return type:

None

property dev_id: str

Device ID of dataset

get_default_fit_id(species_id)[source]

Get default fit scenario id for species

Parameters:

species_id (str) – ID of species (e.g. “so2”)

get_fit_import_setup()[source]

Get the current fit import setup

get_meta_info(fit, meta_id, start=None, stop=None)[source]

Get meta info array

Parameters:
  • meta_id (str) – string ID of meta information

  • boolMask (array) – boolean mask for data retrieval

Note

Bool mask must have same length as the meta data array

get_results(species_id, fit_id=None, start=None, stop=None)[source]

Get spectral results object

Parameters:
  • species_id (str) – string ID of species

  • fit_id (str) – string ID of fit scenario (if None, tries to load default fit_id)

  • start – if valid (i.e. datetime object) only data after that time stamp is considered

  • stop – if valid (i.e. datetime object) only data before that time stamp is considered

get_spec_times(fit)[source]

Returns start time and stop time arrays for spectra to a given fit

Parameters:

fit (str) – ID of the fit scenario

Returns:

start and stop time arrays for the spectra

Return type:

tuple

get_start_stop_mask(fit, start=None, stop=None)[source]

Creates boolean mask for data access only in a certain time interval
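
A minimal sketch of such a mask, assuming a plain list of datetime objects, inclusive bounds, and None meaning "no limit". The function start_stop_mask is hypothetical and does not reproduce how the method obtains the spectrum times for a given fit.

```python
from datetime import datetime


def start_stop_mask(times, start=None, stop=None):
    """Return a list of bools, True where start <= t <= stop.

    Hypothetical helper; a bound of None is treated as unlimited.
    """
    return [
        (start is None or t >= start) and (stop is None or t <= stop)
        for t in times
    ]


times = [datetime(2013, 9, 9, 6, 0),
         datetime(2013, 9, 9, 6, 30),
         datetime(2013, 9, 9, 7, 0)]

mask_from = start_stop_mask(times, start=datetime(2013, 9, 9, 6, 15))
mask_between = start_stop_mask(times,
                               start=datetime(2013, 9, 9, 6, 15),
                               stop=datetime(2013, 9, 9, 6, 45))
```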

has_data(fit_id, species_id, start=None, stop=None)[source]

Checks if specific data is available

Parameters:
  • fit_id (str) – ID of the fit scenario

  • species_id (str) – ID of the species

  • start (datetime, optional) – Start datetime for the data check

  • stop (datetime, optional) – Stop datetime for the data check

Returns:

True if data is available, False otherwise

Return type:

bool

property import_info

Returns information about result import details

linear_regression(x_data, y_data, mask=None, ax=None)[source]

Perform linear regression and return parameters

Parameters:
  • x_data (ndarray) – x data array

  • y_data (ndarray) – y data array

  • mask (ndarray) – mask specifying the indices of the input data to be considered for the regression (default: None)

  • ax – matplotlib axes object (default: None); if provided, the result is plotted into these axes
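
The regression step can be illustrated with an ordinary least-squares line fit in plain Python. This is a sketch only: the function linear_fit is hypothetical, the mask is assumed to be boolean, and the parameters actually returned by the method are not specified here. Plotting is omitted.

```python
def linear_fit(x_data, y_data, mask=None):
    """Least-squares fit of y = slope * x + intercept.

    Hypothetical helper; mask is assumed to be a sequence of bools
    selecting the points that enter the fit.
    """
    if mask is not None:
        pairs = [(x, y) for x, y, keep in zip(x_data, y_data, mask) if keep]
    else:
        pairs = list(zip(x_data, y_data))
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 100.0]          # last point is an outlier
mask = [True, True, True, True, False]   # exclude the outlier

slope, intercept = linear_fit(x, y, mask)
```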

load_input(setup=None, **kwargs)[source]

Process input information

Parameters:
  • setup (ResultImportSetup) – setup specifying all necessary import settings (please see documentation of ResultImportSetup for setup details)

  • **kwargs – alternative way to specify the setup (ResultImportSetup object); only used if input parameter setup is None.

load_raw_results()[source]

Try to load all results as specified in the setup

This method will try to load all results as specified in the setup. If the import setup is not complete, an exception will be raised.

Returns:

True if data is loaded, False otherwise

Return type:

bool

Raises:

AttributeError – If the import setup is not complete

plot(species_id, fit_id=None, start=None, stop=None, **kwargs)[source]

Plot DOAS results

scatter_plot(species_id_xaxis, fit_id_xaxis, species_id_yaxis, fit_id_yaxis, lin_fit_opt=1, species_id_zaxis=None, fit_id_zaxis=None, start=None, stop=None, ax=None, **kwargs)[source]

Make a scatter plot of two species

Parameters:
  • species_id_xaxis (str) – string ID of x axis species (e.g. “so2”)

  • fit_id_xaxis (str) – fit scenario ID of x axis species (e.g. “f01”)

  • species_id_yaxis (str) – string ID of y axis species (e.g. “so2”)

  • fit_id_yaxis (str) – fit scenario ID of y axis species (e.g. “f02”)

  • species_id_zaxis (str) – string ID of z axis species (e.g. “o3”)

  • fit_id_zaxis (str) – fit scenario ID of z axis species (e.g. “f01”)

  • start (datetime) – start time stamp for data retrieval

  • stop (datetime) – stop time stamp for data retrieval

  • ax – matplotlib axes object (default: None); if provided, the result is plotted into these axes

  • kwargs – keyword arguments for matplotlib scatter plot (e.g. color, marker, edgecolor, etc.)

set_default_fitscenarios(default_dict)[source]

Update default fit scenarios for species

Parameters:

default_dict (dict) –

dictionary specifying new default fit scenarios, it could e.g. look like:

default_dict = {"so2"   :   "f01",
                "o3"    :   "f01",
                "bro"   :   "f03"}

set_start_stop_time()[source]

Get start/stop range of dataset

property start: datetime

Start date and time of dataset

property stop: datetime

Stop date and time of dataset

class pydoas.analysis.DoasResults(data, index=None, start_acq=None, stop_acq=None, fit_errs=None, species_id=None, fit_id=None, fit_errs_corr_fac=1.0)[source]

Bases: Series

Data time series for handling and analysing DOAS fit results

Parameters:
  • data (arraylike) – DOAS fit results (column densities)

  • index (arraylike) – Time stamps of data points

  • fit_errs (arraylike) – DOAS fit errors

  • species_id (string) – String specifying the fitted species

  • fit_id (string) – Unique string specifying the fit scenario used

  • fit_errs_corr_fac (float) – DOAS fit error correction factor

__init__(data, index=None, start_acq=None, stop_acq=None, fit_errs=None, species_id=None, fit_id=None, fit_errs_corr_fac=1.0)[source]
get_data_above_detlim()[source]

Get fit results exceeding the detection limit

The detection limit is determined as follows:

self.fit_errs_corr_fac*self.data_err
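
The detection limit criterion can be sketched on plain lists as follows. The function data_above_detlim and the numbers are made up for illustration; the strict "greater than" comparison is an assumption.

```python
def data_above_detlim(values, fit_errs, corr_fac=1.0):
    """Keep the values that exceed corr_fac * fit error.

    Hypothetical helper mirroring the formula above on plain lists.
    """
    return [v for v, err in zip(values, fit_errs) if v > corr_fac * err]


values = [5.0e17, 1.0e16, 2.0e17]     # made-up column densities
fit_errs = [1.0e16, 2.0e16, 1.0e17]   # made-up fit errors

kept = data_above_detlim(values, fit_errs, corr_fac=3.0)
```

With a correction factor of 3.0 only the first value exceeds its detection limit.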

has_start_stop_acqtamps()[source]

Checks if start_time and stop_time arrays have valid data

merge_other(other, itp_method='linear', dropna=True)[source]

Merge with other time series sampled on different grid

Note

This object will not be changed, instead, two new Series objects will be created and returned

Parameters:
  • other (Series) – Other time series

  • itp_method (str) – String specifying interpolation method (e.g. linear, quadratic)

  • dropna (bool) – Drop indices containing NA after merging and interpolation

Returns:

2-element tuple containing

  • this Series (merged)

  • other Series (merged)

Return type:

tuple
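
The idea of merging two time series sampled on different grids can be sketched in plain Python. This is not the pandas-based implementation: the functions interp_linear and merge_series are hypothetical, only linear interpolation is shown, and using the union of both time grids without extrapolation is an assumption.

```python
from datetime import datetime, timedelta


def interp_linear(times, values, t):
    """Linearly interpolate the series (times, values) at time t.

    Returns None outside the sampled range (no extrapolation).
    Assumes times are strictly increasing.
    """
    if t < times[0] or t > times[-1]:
        return None
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        if t0 <= t <= t1:
            frac = (t - t0) / (t1 - t0)  # timedelta / timedelta gives a float
            return values[i] + frac * (values[i + 1] - values[i])
    return values[-1]  # only reached for a single-point series


def merge_series(times_a, vals_a, times_b, vals_b, dropna=True):
    """Put both series on the union of their time grids.

    Returns two new lists of (time, value) pairs; the inputs are unchanged.
    """
    grid = sorted(set(times_a) | set(times_b))
    rows = [(t, interp_linear(times_a, vals_a, t),
             interp_linear(times_b, vals_b, t)) for t in grid]
    if dropna:
        rows = [r for r in rows if r[1] is not None and r[2] is not None]
    merged_a = [(t, a) for t, a, _ in rows]
    merged_b = [(t, b) for t, _, b in rows]
    return merged_a, merged_b


t_ref = datetime(2013, 9, 9, 6, 28)


def minutes(m):
    return t_ref + timedelta(minutes=m)


merged_a, merged_b = merge_series(
    [minutes(0), minutes(10)], [0.0, 10.0],
    [minutes(5), minutes(15)], [100.0, 200.0],
)
```

Only the time stamps covered by both series (minutes 5 and 10 here) survive when dropna is True.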

plot(date_fmt=None, **kwargs)[source]

Plot time series

Uses plotting utility of Series object (pandas)

Parameters:

**kwargs

  • keyword arguments for pandas plot method

shift(timedelta=datetime.timedelta(0))[source]

Shift time stamps of object

Parameters:

timedelta (timedelta) – temporal shift

Returns:

shifted DoasResults object
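
Shifting time stamps by a timedelta can be illustrated with the standard library alone. The function shift_times is a hypothetical stand-in operating on a plain list of datetime objects rather than on a DoasResults object.

```python
from datetime import datetime, timedelta


def shift_times(times, delta=timedelta(0)):
    """Return a new list with every time stamp shifted by delta."""
    return [t + delta for t in times]


times = [datetime(2013, 9, 9, 6, 28), datetime(2013, 9, 9, 6, 29)]

shifted = shift_times(times, timedelta(hours=2))
unchanged = shift_times(times)  # default shift of zero
```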

property species

Return name of current species

property start

Start time of data

property stop

Stop time of data

Supplemental / IO / Helpers

This module contains I/O routines for DOAS result files

pydoas.inout.get_data_files(which='doasis')[source]

Get all example result files from package data

pydoas.inout.get_import_info(resulttype='doasis')[source]

Try to load DOAS result import specification for default type

Import specifications for a specified data type (see package data file “import_info.txt” for available types, use the instructions in this file to create your own import setup if necessary)

Parameters:

resulttype (str) – name of result type (field “type” in “import_info.txt” file)

pydoas.inout.get_result_type_ids()[source]

Read file import_info.txt and find all valid import types

pydoas.inout.import_info_file()[source]

Return path to supplementary file import_info.txt

pydoas.inout.import_type_exists(type_id)[source]

Checks if data import type exists in import_info.txt

Parameters:

type_id (str) – string ID to be searched in import_info.txt

pydoas.inout.write_import_info_to_default_file(import_dict, file=None)[source]