rminstr.data_structures

Exceptions

ExptParametersReadError

Exception raised when reading ExptParameters fails.

Classes

`DataRecord`	The DataRecord class records experimental data from long measurements.
`TimeSeries`	Class for organizing 1D timeseries data output by DataRecords.
`ExistingRecord`	ExistingRecord is derived from DataRecord; it reads files created by ActiveRecord.
`ActiveRecord`	ActiveRecord is derived from DataRecord; it stores and continually saves data from measurements in progress.
`LegacyRecord`	LegacyRecord is derived from DataRecord; it reads arbitrary CSV files.
`ExptParameters`	The ExptParameters class keeps track of experimental settings.

Functions

`kendall_p`(→ float)	Determine whether the thermopile readings are stable.
`runs_statistic`(→ float)	Determine whether the thermopile readings are stable.
`get_config_as_dictionary`(config_files)
`save_dictionary_as_config`(dictionary, path)

Package Contents

class rminstr.data_structures.DataRecord(columns: list[str], maxlen: int, output_dir: str, minlen: int, auto_timestamps: bool = True, sep: str = ',', repo_path: str = None)

The DataRecord class records experimental data from long measurements.

There are three derived classes: ActiveRecord, ExistingRecord, and LegacyRecord. ActiveRecord is intended to store data from experiments in progress. ExistingRecord is intended to read data files created by ActiveRecord, and LegacyRecord reads arbitrary CSV files.

Comments:

A DataRecord object keeps track of an arbitrary number of scalar variables (“columns”).
Data are stored in fixed-length deques, so the amount of memory required does not grow over time during long measurements.
Data and metadata are automatically written out at regular intervals, so if the measurement is interrupted, minimal data is lost. Data files are automatically named to avoid overwriting data.
Each measurement is associated with an auto-generated timestamp. The auto-generated timestamp can be overridden by calling self.update with the timestamp argument. Timestamps can also be disabled by initializing with auto_timestamps = False.
The [] operator is overloaded and gives access to the most recent measurement.
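The bounded-memory idea described above can be illustrated with a small stand-alone sketch. `MiniRecord` is hypothetical and not part of rminstr; it only shows how per-column `collections.deque(maxlen=...)` buffers keep memory constant, how an explicit timestamp overrides the automatic one, and how `[]` can return the most recent value. Unlike the real DataRecord, the sketch simply drops the oldest samples instead of writing them to a file first.

```python
import time
from collections import deque


class MiniRecord:
    """Hypothetical stand-in for DataRecord's in-memory storage; not the rminstr API."""

    def __init__(self, columns, maxlen, auto_timestamps=True):
        self.auto_timestamps = auto_timestamps
        # One fixed-length deque per column: once maxlen samples are stored,
        # each append silently drops the oldest sample, so memory stays bounded.
        self.record = {c: deque(maxlen=maxlen) for c in columns}
        self.timestamps = {c: deque(maxlen=maxlen) for c in columns}

    def update(self, column, value, timestamp=None):
        # An explicit timestamp overrides the automatic one.
        if timestamp is None and self.auto_timestamps:
            timestamp = time.time()
        self.record[column].append(value)
        self.timestamps[column].append(timestamp)

    def __getitem__(self, column):
        # [] returns the most recent measurement of a column.
        return self.record[column][-1]


rec = MiniRecord(["voltage"], maxlen=3)
for i in range(5):
    rec.update("voltage", float(i), timestamp=float(i))
print(rec["voltage"])               # 4.0
print(list(rec.record["voltage"]))  # [2.0, 3.0, 4.0]
```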

Initialize a DataRecord object.

Parameters:

columns: list[str]: Names of variables tracked by the DataRecord object.
maxlen: int: Maximum length of the data record in samples.
output_dir: str: Directory where data are stored.
minlen: int: Minimum length of the data record. Used to ensure recent history is available to the measurement program.
auto_timestamps: bool, optional: If True, automatically assign timestamps to data points when update() is called. The default is True.
sep: str, optional: Delimiter for output CSV files. The default is “,”.
repo_path: str, optional: Path to the git repository. If None, use the default path.

Returns:

None.

columns: list[str]: List of column names.

output_dir: str: Directory where data are stored.

sep = ',': str: Delimiter for output CSV files.

repo_path = None: str: Path to the git repository.

time_zero = None: float: Timestamp corresponding to the start of the experiment.

auto_timestamps = True: bool: If True, automatically assign timestamps to data points when update() is called.

maxlen: int: Maximum length of the data record in samples.

minlen: int: Minimum length of the data record. Used to ensure recent history is available to the measurement program.

current_index = 0: int: Index of the current measurement.

timestamps: dict[str, deque[float]]: Keys correspond to the “columns” attribute. Each entry is a deque of timestamps associated with measurements.

counters: dict[str, deque[int]]: Keys correspond to the “columns” attribute. Each entry is a deque of indices associated with measurements.

record: dict[str, deque[float]]: Keys correspond to the “columns” attribute. Each entry is a deque of values associated with measurements.

indices: deque[int]: Contains the indices of all measurements.

file_counter = 0: int: Counts the number of files that have been written.

update_queue

update(column: str, value: float, timestamp: float = None)

Record measurement data.

Appends data to the specified column and writes data to file when the record reaches a length of self.maxlen.

Parameters:

column: str: Name of the column to append data to.
value: float: Data to append to the column.
timestamp: float, optional: If None, use the result of self.get_time(). Caution: when using this argument, the user is responsible for ensuring that the timestamps are ordered. The default is None.

Returns:

None.

array_update(column: str, values: iter, timestamps: iter)

Update a data column with an array.

Parameters:

column: str: Column to add to.
values: iter: Data values to add.
timestamps: iter: Timestamps of the values.

Returns:

None.

output(write_recent_data: bool = False)

Manage the length of the internal data structures.

This version of output does not save data to a file. It only maintains the length of the record while preserving the record's invariants.

Override this method in a subclass if you want to actually save data.

Parameters:

write_recent_data: bool, optional

If False, the most recent self.minlen samples are retained in the record and not written to file.

If True, all data in the record are written to file, so that the record is empty after this method is called.

The default is False.

Returns:

None.
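The write_recent_data rule above amounts to splitting each column at len - minlen. The helper `split_for_output` below is a hypothetical sketch of that split for one column, not rminstr code; it returns the samples that would be written and the samples that stay in memory.

```python
def split_for_output(samples, minlen, write_recent_data=False):
    """Hypothetical helper: return (to_write, retained) for one column."""
    samples = list(samples)
    if write_recent_data:
        # Everything is written, so the record is empty afterwards.
        return samples, []
    # Write all but the newest `minlen` samples; keep those in memory.
    cut = max(len(samples) - minlen, 0)
    return samples[:cut], samples[cut:]


print(split_for_output([1, 2, 3, 4, 5], minlen=2))                          # ([1, 2, 3], [4, 5])
print(split_for_output([1, 2, 3, 4, 5], minlen=2, write_recent_data=True))  # ([1, 2, 3, 4, 5], [])
```

With minlen = 0 (the ActiveRecord default), both settings write everything.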

stage_update(column: str, values: numpy.ndarray, timestamps: numpy.ndarray)

Add a timeseries to the batch update queue.

Parameters:

column: str: Name of the column to append data to.
values: np.ndarray: Values to append.
timestamps: np.ndarray: Timestamps associated with the values.

Returns:

None.

batch_update()

Apply the staged updates to the record and clear the batch update queue.

Returns:

None.
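One plausible way to apply several staged series while keeping the result time-ordered (which is what ActiveRecord's stage_everything option is described as ensuring) is to merge them by timestamp. `UpdateQueue` is hypothetical and not the rminstr implementation; it assumes each staged timestamp array is already sorted, and the `apply` callback stands in for DataRecord.update.

```python
import heapq


class UpdateQueue:
    """Hypothetical sketch of a staging queue; not the rminstr implementation."""

    def __init__(self):
        self._staged = []  # list of (column, values, timestamps)

    def stage_update(self, column, values, timestamps):
        self._staged.append((column, list(values), list(timestamps)))

    def batch_update(self, apply):
        # Turn each staged series into a sorted stream of (timestamp, column, value).
        streams = [
            [(t, col, v) for v, t in zip(vals, ts)]
            for col, vals, ts in self._staged
        ]
        # Merge all streams so `apply` sees samples in global time order.
        for t, col, v in heapq.merge(*streams, key=lambda item: item[0]):
            apply(col, v, t)
        self._staged.clear()  # the queue is empty afterwards


q = UpdateQueue()
q.stage_update("a", [1.0, 3.0], [0.0, 2.0])
q.stage_update("b", [2.0], [1.0])
applied = []
q.batch_update(lambda col, v, t: applied.append((t, col, v)))
print(applied)  # [(0.0, 'a', 1.0), (1.0, 'b', 2.0), (2.0, 'a', 3.0)]
```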

get_time_series(column: str, t_min: float = None, t_max: float = None, delta_t: float = None) → tuple

Make timeseries data for a given variable.

Parameters:

column: str: The column to convert to timeseries data.
t_min: float, optional: Only return data measured later than this time. If None, use the earliest available time. The default is None.
t_max: float, optional: Only return data measured earlier than this time. If None, use the latest available time. The default is None.
delta_t: float, optional: If not None, resample the data with time interval delta_t. The function does not interpolate; each resampled point takes the most recent past value. The default is None.

Returns:

tuple[DataArray, DataArray]

(times, values)

times: numpy array of floats: Timestamps of the samples.
values: numpy array of floats: Values from the column.

Raises:

record_empty: Raised if an index error occurs when self.timestamps[column][0] is accessed.
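Resampling with "the most recent past value" is a zero-order hold. The helper `resample_previous` below is a hypothetical NumPy sketch of that rule, not the rminstr code; it assumes the resampled times start at t_min and step by delta_t up to t_max, and returns NaN for times before the first sample.

```python
import numpy as np


def resample_previous(times, values, delta_t, t_min=None, t_max=None):
    """Hypothetical zero-order-hold resampler; not the rminstr API."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    t_min = times[0] if t_min is None else t_min
    t_max = times[-1] if t_max is None else t_max
    # Half a step of slack so t_max itself is included despite float rounding.
    new_t = np.arange(t_min, t_max + delta_t / 2, delta_t)
    # Index of the last original sample at or before each new time.
    idx = np.searchsorted(times, new_t, side="right") - 1
    valid = idx >= 0  # no past value exists before the first sample
    new_v = np.full(new_t.shape, np.nan)
    new_v[valid] = values[idx[valid]]
    return new_t, new_v


t, v = resample_previous([0.0, 1.5, 3.0], [10.0, 20.0, 30.0], delta_t=1.0)
print(t, v)  # [0. 1. 2. 3.] [10. 10. 20. 30.]
```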

get_conditional_data(columns: list, condition: collections.abc.Callable[[Index, DataArray, DataArray], bool]) → tuple[IndexArray, DataArray, DataArray]

Return the rows that satisfy a given condition.

Each row is evaluated independently.
Missing values are populated with NaN.

Parameters:

columns: list of str: Specifies which columns to export.
condition: Callable[[Index, DataArray, DataArray], bool]: Evaluated for each row; if it returns True, the row is appended to the output.

Returns:

tuple[IndexArray, DataArray, DataArray]

(indices, times, values)

indices: numpy array of ints: Indices of the samples.
times: dict of numpy arrays of floats: Timestamps of the samples, indexed by column.
values: dict of numpy arrays of floats: Values from each column, indexed by column.
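The row-wise filtering can be pictured with the hypothetical helper `conditional_rows` below. It is a sketch, not the rminstr implementation: it assumes the condition receives the row index, an array of that row's timestamps, and an array of its values (both ordered like the columns), an argument order inferred from the Callable[[Index, DataArray, DataArray], bool] annotation. Each column's data are given here as a dict mapping measurement index to (time, value).

```python
import numpy as np


def conditional_rows(indices, columns_data, condition):
    """Hypothetical sketch; columns_data maps column -> {index: (time, value)}."""
    names = list(columns_data)
    kept = []
    times = {c: [] for c in names}
    values = {c: [] for c in names}
    missing = (np.nan, np.nan)
    for i in indices:
        # Assemble one row; columns without a sample at index i get NaN.
        row_t = np.array([columns_data[c].get(i, missing)[0] for c in names])
        row_v = np.array([columns_data[c].get(i, missing)[1] for c in names])
        if condition(i, row_t, row_v):  # each row is evaluated independently
            kept.append(i)
            for c, t, v in zip(names, row_t, row_v):
                times[c].append(t)
                values[c].append(v)
    return (np.array(kept, dtype=int),
            {c: np.array(times[c]) for c in names},
            {c: np.array(values[c]) for c in names})


data = {"a": {0: (0.0, 1.0), 1: (1.0, 5.0)}, "b": {1: (1.0, 2.0)}}
idx, t, v = conditional_rows([0, 1], data, lambda i, tt, vv: np.nanmax(vv) > 2)
print(idx, v)  # [1] {'a': array([5.]), 'b': array([2.])}
```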

class rminstr.data_structures.TimeSeries

Bases: NamedTuple

Class for organizing 1D timeseries data output by DataRecords.

NamedTuple-like object. It can be indexed or iterated like a tuple, where index 0 is the t attribute and index 1 is the values attribute. It has additional functionality useful for time series data.

Attributes:

t: np.ndarray: Absolute timestamps, in seconds, of when the data were taken.
values: np.ndarray: Values of the timeseries data.

t: numpy.ndarray

values: numpy.ndarray

trel() → numpy.ndarray: Get the relative time of the timeseries.
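Because TimeSeries is NamedTuple-based, a stand-in with the same tuple behavior is easy to write. `TimeSeriesSketch` is hypothetical, and its trel() assumes "relative time" means time measured from the first sample; the docstring does not state the reference point.

```python
from typing import NamedTuple

import numpy as np


class TimeSeriesSketch(NamedTuple):
    """Hypothetical stand-in for rminstr's TimeSeries; not the real class."""
    t: np.ndarray       # absolute timestamps in seconds
    values: np.ndarray  # sample values

    def trel(self) -> np.ndarray:
        # Assumption: relative time is measured from the first sample.
        return self.t - self.t[0]


ts = TimeSeriesSketch(np.array([100.0, 100.5, 101.0]), np.array([1.0, 2.0, 3.0]))
t, v = ts          # unpacks like a tuple: index 0 is t, index 1 is values
print(ts.trel())   # [0.  0.5 1. ]
```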

class rminstr.data_structures.ExistingRecord(file_path: str, output_dir: str = None, maxlen: int = None, minlen: int = None)

Bases: DataRecord

ExistingRecord is derived from DataRecord.

An ExistingRecord object reads files created by ActiveRecord.

Initialize an ExistingRecord object.

Parameters:

file_path: str: Full path to the metadata file.
output_dir: str, optional: Directory where data are stored. If None, determine it from the metadata file. The default is None.
maxlen: int, optional: Maximum length of the record. If None, infer it from the metadata file. The default is None.
minlen: int, optional: Minimum length of the record. If None, set it to maxlen. The default is None.

Returns:

None.

time_zero = None: float: Timestamp corresponding to the start of the experiment.

auto_timestamps = None: bool: If True, automatically assign timestamps to data points when update() is called.

maxlen = None: int: Maximum length of the data record in samples.

current_file = None

metadata

metadata_filepath

file_index = 0

file

read_header()

Read the header of the current file and enumerate columns.

Returns:

None.

read_next_line() → bool

Read one line of data from the input file and add it to the record.

Returns:

bool: True if the line was successfully read.

batch_read(columns_list: list[str] = None) → dict[str, tuple]

Read the entire DataRecord at once.

The output is a dictionary of timeseries tuples. This method uses pandas as the backend for reading and parsing files, so it has no effect on the state of the object.

Parameters:

columns_list: list[str], optional: Either None (return every column in the record) or a list of the names of the columns to return. The default is None.

Returns:

dict[str, tuple]: {column: (timestamp,value)}

class rminstr.data_structures.ActiveRecord(columns: list[str], maxlen: int, output_dir: str, minlen: int = 0, auto_timestamps: bool = True, sep: str = ',', meas_name: str = None, instruments: dict = None, stage_everything: bool = False)

Bases: DataRecord

ActiveRecord is derived from DataRecord.

An ActiveRecord stores and continually saves experimental data from measurements in progress.

Additional comments about ActiveRecord:

Data files are automatically written. When the data is written to a file, it is automatically converted to a CSV format, where each column is a variable, each row is a measurement, and the rows are strictly time-ordered. It is not necessary to have the same number of entries for each column. Missing data is automatically populated with NaN.
A specified number of recent samples (self.minlen) are kept in memory for flow control. These samples can be written to a file and deleted from memory by calling self.output with write_recent_data = True.
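The file layout described above (one column per variable, strictly time-ordered rows, NaN where a column has no entry) can be sketched as follows. `rows_from_columns` is hypothetical, and the leading "timestamp" column is an assumption; the exact column layout of ActiveRecord files is not documented here.

```python
import csv
import io
import math


def rows_from_columns(series):
    """Hypothetical: series maps column -> list of (timestamp, value)."""
    columns = list(series)
    by_time = {}
    for col, samples in series.items():
        for t, v in samples:
            # A second sample with the same timestamp in one column overwrites the first.
            by_time.setdefault(t, {})[col] = v
    header = ["timestamp"] + columns  # assumed timestamp column
    rows = [
        [t] + [by_time[t].get(col, math.nan) for col in columns]  # NaN fills gaps
        for t in sorted(by_time)  # strictly time-ordered rows
    ]
    return header, rows


header, rows = rows_from_columns({
    "power": [(0.0, 1.2), (2.0, 1.3)],
    "temp": [(1.0, 22.5)],
})
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(header)
writer.writerows(rows)
print(buf.getvalue())
```

Columns with different sample counts are handled naturally, since each row only needs the columns that were updated at that time.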

Initialize an ActiveRecord object.

Parameters:

columns: list[str]

Names of variables that will be recorded.

maxlen: int

Maximum number of samples to retain in memory.

output_dir: str

Directory where output data is written. Ends in “\”.

minlen: int, optional

Number of samples to be retained in memory after writing data to a file. The default is 0.

auto_timestamps: bool, optional

If True, maintain a list of timestamps for each column. The default is True.

sep: str, optional

Delimiter in the output file. The default is “,”.

meas_name: str, optional

If specified, output files will be named meas_name_x_y.csv, where x is a number (1, 2, 3, …) that is appended if necessary to avoid overwriting existing data, and y is the index of the data file.

If not specified (None), meas_name defaults to the date in YYYYMMDD format.

The default is None.

instruments: dict, optional

Dictionary with columns as keys and instruments as values. Each instrument is expected to have an info_dict attribute. If not None, the column keys are linked to the corresponding instruments in the metadata.

The default is None.

stage_everythingbool, optional

If True, the setitem method adds updates to the update queue rather than directly updating the record. This behavior can be used to ensure updates are time-ordered in the output file. If True, batch_update must be called to update the record.

The default is False.

Returns:

None.

local_backups = None

session_str = None

metadata

time_zero = None: float: Timestamp corresponding to the start of the experiment.

auto_timestamps = True: bool: If True, automatically assign timestamps to data points when update() is called.

minlen = 0: int: Minimum length of the data record. Used to ensure recent history is available to the measurement program.

file_counter = 0: int: Counts the number of files that have been written.

stage_everything = False

metadata_file_name

output(write_recent_data: bool = False)

Write data to a file. Also writes metadata.

Parameters:

write_recent_data: bool, optional

If False, the most recent self.minlen samples are retained in the record and not written to file.

If True, all data in the record are written to file, so that the record is empty after this method is called.

The default is False.

Returns:

None.

get_time()

Get relative time.

Returns:

float: Current relative time.

init_timer(time_zero: float = None)

Set time = 0 for relative timestamps.

Parameters:

time_zero: float, optional: Timestamp of time = 0. If None, use the result of time.time(). The default is None.

Returns:

None.
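The relationship between init_timer() and get_time() can be shown with a minimal sketch. `RelativeTimer` is hypothetical and only mirrors the documented behavior: time_zero defaults to time.time(), and relative time is the elapsed time since then.

```python
import time


class RelativeTimer:
    """Hypothetical sketch of get_time()/init_timer(); not the rminstr class."""

    def __init__(self):
        self.time_zero = None

    def init_timer(self, time_zero=None):
        # time = 0 is either the supplied timestamp or "now".
        self.time_zero = time.time() if time_zero is None else time_zero

    def get_time(self):
        # Seconds elapsed since time_zero.
        return time.time() - self.time_zero


timer = RelativeTimer()
timer.init_timer(time.time() - 5.0)  # pretend the experiment started 5 s ago
elapsed = timer.get_time()
print(round(elapsed))  # 5
```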

init_metadata(meas_name: str = None)

Populate self.metadata with the default metadata.

Parameters:

meas_name: str, optional

If specified, output files will be named meas_name_x_y.csv, where x is a number (1, 2, 3, …) that is appended if necessary to avoid overwriting existing data, and y is the index of the data file.

If not specified (None), meas_name defaults to the date in YYYYMMDD format.

The default is None.

Returns:

None.

add_instruments(instr: dict)

Add instruments to metadata.

Parameters:

instr: dict: Dictionary with columns as keys and instruments as values. Each instrument is expected to have an info_dict attribute, which is a dictionary. If ‘range’ is not a key of info_dict, the range will not be saved.

add_file()

Increment the file counter and maintain metadata.

Determine the name of the next output file and add it to self.metadata with key “file_x”, where x is the current value of file_counter.

Returns:

file_name: str: Name of the next output file.
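The meas_name_x_y.csv naming rule can be sketched with pathlib. Both `choose_session_prefix` and `data_file_name` are hypothetical helpers, not rminstr functions. Unlike the documented rule, which appends x only "if necessary", this sketch always appends x, starting at 1, and increases it until no existing file uses that prefix.

```python
import datetime
import tempfile
from pathlib import Path


def choose_session_prefix(output_dir, meas_name=None):
    """Hypothetical: return the first "meas_name_x" (x = 1, 2, 3, ...) with no existing files."""
    if meas_name is None:
        # Default measurement name: today's date as YYYYMMDD.
        meas_name = datetime.date.today().strftime("%Y%m%d")
    out = Path(output_dir)
    x = 1
    # Bump x until no data file from an earlier session uses this prefix.
    while any(out.glob(f"{meas_name}_{x}_*.csv")):
        x += 1
    return f"{meas_name}_{x}"


def data_file_name(prefix, file_index):
    """Hypothetical: y in meas_name_x_y.csv is the index of the data file."""
    return f"{prefix}_{file_index}.csv"


with tempfile.TemporaryDirectory() as d:
    first = choose_session_prefix(d, "sweep")
    Path(d, data_file_name(first, 0)).touch()  # simulate an earlier session's file
    second = choose_session_prefix(d, "sweep")
print(first, second)  # sweep_1 sweep_2
```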

output_metadata(use_backup: bool = False)

Write metadata to a file.

Parameters:

use_backup: bool, optional: If True, save to the local backup (~/rminstr/self.session_str). The default is False.

Returns:

None.

check_missing_files() → list[pathlib.Path]

Check whether any files are missing from the target directory.

Returns:

list[Path]: List of paths expected to be in the target directory.

clean_backups()

Clean up backups at the end of an experiment.

Combines the backup directory with the target directory. If it cannot, it adds a note to the metadata recording that this step failed; this flag notifies users who read the data later.

Returns:

None.

class rminstr.data_structures.LegacyRecord(data_file_path: str, output_dir: str, maxlen: int, minlen: int, missing_data: str = '', sep: str = ',')

Bases: DataRecord

LegacyRecord is derived from DataRecord.

Similar to ExistingRecord, it reads data from CSV files. Unlike ExistingRecord, it can read any CSV file, not just ones written by an ActiveRecord object.

Initialize a LegacyRecord object.

Parameters:

data_file_path: str: Path to the first data file, relative to output_dir. The data columns are inferred from this file’s header.
output_dir: str: Working directory.
maxlen: int: Maximum length of the record.
minlen: int: Minimum length of the record.
missing_data: str, optional: String that represents missing data in the input file. The default is ‘’.
sep: str, optional: Delimiter in the input files. The default is ‘,’.

Returns:

None.
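How missing_data and sep might be applied to one input line is sketched below. `parse_line` is a hypothetical helper, not LegacyRecord's read_next_line; it assumes every non-missing field is numeric and maps the missing_data token to NaN.

```python
import math


def parse_line(line, columns, sep=",", missing_data=""):
    """Hypothetical: turn one CSV line into {column: float}, missing_data -> NaN."""
    fields = line.rstrip("\r\n").split(sep)
    if len(fields) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
    return {
        col: math.nan if field.strip() == missing_data else float(field)
        for col, field in zip(columns, fields)
    }


header = ["time", "voltage", "current"]
row = parse_line("0.5,,0.01\n", header)  # voltage is missing
print(row)  # {'time': 0.5, 'voltage': nan, 'current': 0.01}
```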

missing_data = ''

sep = ',': str: Delimiter for output CSV files.

output_dir: str: Directory where data are stored.

time_zero = None: float: Timestamp corresponding to the start of the experiment.

current_file = None

missing = ''

metadata

file_counter = 0: int: Counts the number of files that have been written.

files = []

file_index = 0

file

read_header(data_file_path: str) → list[str]

Read the header.

Parameters:

data_file_path: str: Full path to the data file.

Returns:

list[str]: List of column names.

add_file(file_name: str)

Append the data file file_name.

Parameters:

file_name: str: Full path to the data file.

Returns:

None.

read_next_line() → bool

Read one line of data from the input file and add it to the record.

Returns:

bool: True if the line was successfully read.

rminstr.data_structures.kendall_p(v: DataArray) → float

Determine whether the thermopile readings are stable.

See NIST Tech Note 1374 for more details.

Parameters:

v: DataArray: Vector of voltage measurements.

Returns:

float: kendall_p: A number between 0 and 1. Small numbers (less than 0.25) indicate a decreasing trend, while large numbers (greater than 0.75) indicate an increasing trend.
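The exact definition from NIST Tech Note 1374 is not reproduced here. As an illustration of a statistic with the same behavior (near 0 for a decreasing trend, near 1 for an increasing trend, about 0.5 for no trend), the hypothetical `kendall_trend_p` below computes the Mann-Kendall S statistic, normalizes it with the no-ties variance n(n-1)(2n+5)/18, and maps the result through the standard normal CDF. This is a swapped-in textbook construction, not necessarily rminstr's formula.

```python
import math


def kendall_trend_p(v):
    """Hypothetical Mann-Kendall trend statistic mapped to (0, 1)."""
    n = len(v)
    if n < 3:
        raise ValueError("need at least 3 points")
    # S: sum of the signs of all forward pairwise differences.
    s = sum(
        (v[j] > v[i]) - (v[j] < v[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18.0  # variance of S, ignoring ties
    z = s / math.sqrt(var_s)
    # Standard normal CDF of z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


print(kendall_trend_p([1.0, 1.1, 1.3, 1.2, 1.5, 1.6]) > 0.75)  # True (rising readings)
```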

rminstr.data_structures.runs_statistic(v: DataArray) → float

Determine whether the thermopile readings are stable.

See NIST Tech Note 1374 for more details.

Parameters:

v: DataArray: Vector of voltage measurements.

Returns:

float: Z-score based on the number of “runs” (consecutive points with the same trend). 0 corresponds to the expected number of runs in random data, and small numbers indicate long runs (not random).
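A standard statistic matching this description is the runs-up-and-down test: for n points, the number of runs of increasing or decreasing steps has mean (2n - 1)/3 and variance (16n - 29)/90 under randomness. The hypothetical `runs_up_down_z` below uses that textbook test. Whether rminstr or NIST Tech Note 1374 uses exactly this definition, including how ties are handled (here they are skipped), is an assumption.

```python
import math


def runs_up_down_z(v):
    """Hypothetical runs-up-and-down z-score; not necessarily rminstr's formula."""
    # Signs of successive differences; zero differences (ties) are skipped.
    signs = [1 if b > a else -1 for a, b in zip(v, v[1:]) if b != a]
    n = len(signs) + 1  # effective number of points
    if n < 3:
        raise ValueError("need at least 3 points with some variation")
    # A new run starts whenever the direction of change flips.
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    mean = (2 * n - 1) / 3.0    # expected number of runs for random data
    var = (16 * n - 29) / 90.0  # variance of the run count
    return (runs - mean) / math.sqrt(var)


drifting = runs_up_down_z(list(range(20)))  # one long run: strongly negative
alternating = runs_up_down_z([0, 1] * 10)   # many short runs: positive
```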

class rminstr.data_structures.ExptParameters(config_files: str | list[str], run_settings_file: str = None, initial_dict: dict = None, config_file_priority: list[int] = None, header: int = 0, columns: list[str] = None, column_types: list[str] = None)

The ExptParameters class keeps track of experimental settings.

Users specify two files that describe how a program will run.

config_file: contains information that does not change over the duration of an experiment, e.g., instrument GPIB addresses or the VNA IFBW. See below for formatting information.
run_settings_file: a spreadsheet of values specifying measurements that occur sequentially, e.g., frequencies or power levels. Each row represents a measurement. Each column represents a variable.

The ExptParameters object can be read like a dictionary. For example:

params = ExptParameters(r"C:\examples\config.csv", r"C:\examples\run.csv")
print(params["VNA_IFBW"])

These settings are read-only, so

params["VNA_IFBW"] = 10

is not allowed.

For run settings, the ExptParameters object keeps track of the index of the current data point. So, params["Frequency"] returns a number rather than a list. To move to the next Frequency point, call params.advance().
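The dictionary-like, read-only behavior and the advance()/complete() loop can be sketched with collections.abc.Mapping, which provides no __setitem__, so item assignment raises TypeError. `ParamsSketch` is hypothetical; it is not the ExptParameters implementation and assumes config keys and run-settings keys do not overlap.

```python
from collections.abc import Mapping


class ParamsSketch(Mapping):
    """Hypothetical read-only parameter store; not the rminstr class."""

    def __init__(self, config, run_settings):
        self._config = dict(config)  # constant settings
        self._run = {k: list(v) for k, v in run_settings.items()}  # one list per column
        self.num_steps = min((len(v) for v in self._run.values()), default=0)
        self.index = 0

    def __getitem__(self, key):
        if key in self._run:
            return self._run[key][self.index]  # value for the current row
        return self._config[key]

    def __iter__(self):
        yield from self._config
        yield from self._run

    def __len__(self):
        return len(self._config) + len(self._run)

    def advance(self):
        self.index += 1

    def complete(self):
        return self.index >= self.num_steps


params = ParamsSketch({"VNA_IFBW": 100}, {"Frequency": [1e9, 2e9, 3e9]})
seen = []
while not params.complete():
    seen.append(params["Frequency"])  # a number, not a list
    params.advance()

read_only = False
try:
    params["VNA_IFBW"] = 10
except TypeError:
    read_only = True  # Mapping defines no __setitem__
```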

Initialize an ExptParameters object.

Parameters:

config_files: Union[str, list[str]]

If the value is a string, it is interpreted as the path to the config file.

If a list of strings is given, each entry is interpreted as the path to a config file. Each config file is read in the order given in the list. If the same setting exists in multiple config files, config_file_priority determines which setting is used.

run_settings_file: str, optional

Full path to the run settings file.

initial_dict: dict, optional

A dictionary used to initialize the config before additional CSV files are added.

config_file_priority: list[int], optional

Indicates the priority of the corresponding (by order) config file. Lower numbers correspond to higher priority. If None, all files have equal priority. The default is None.

header: int, optional

Number of lines to discard when reading run_settings. The default is 0.

columns: list[str], optional

Names of columns in the run settings file. The default is None. If None, first check the config file; if that fails, check the run file for column names.

column_types: list[str], optional

Data types for columns in run settings. The default is None. If None, first check the config file; if that fails, assume everything is a float.

Returns:

None.
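The config_file_priority rule (lower numbers win) can be illustrated with a hypothetical merge helper, `merge_configs`, which is not rminstr code. What ExptParameters does when equal-priority files disagree is not documented; raising ValueError in that case is an assumption of this sketch. The setting name GPIB_VNA is likewise illustrative.

```python
def merge_configs(configs, priority=None):
    """Hypothetical: merge config dicts in file order; lower priority numbers win."""
    if priority is None:
        priority = [0] * len(configs)  # equal priority for every file
    if len(priority) != len(configs):
        raise ValueError("one priority value per config file is required")
    merged, source_rank = {}, {}
    for cfg, rank in zip(configs, priority):
        for key, value in cfg.items():
            if key not in merged or rank < source_rank[key]:
                merged[key], source_rank[key] = value, rank
            elif rank == source_rank[key] and value != merged[key]:
                # Assumed behavior for conflicts between equal-priority files.
                raise ValueError(f"conflicting values for {key!r} at equal priority")
    return merged


site = {"VNA_IFBW": 100, "GPIB_VNA": 16}
local = {"VNA_IFBW": 10}
merged = merge_configs([site, local], priority=[1, 0])  # local overrides site
print(merged)  # {'VNA_IFBW': 10, 'GPIB_VNA': 16}
```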

columns = None: list[str]: The keys of the run settings.

column_types = None: list[str]: The types of data stored in the columns of the run settings.

index = None: int: Tracks progress through the steps given in the run settings.

num_steps = None: int: Number of rows in the run settings file.

run_settings = None: dict[str, list]: Stores the run settings.

add_config_files(config_files)

Make a dictionary from config files.

Parameters:

config_files: str or list(str): Path or list of paths to the config files to load.

Returns:

dict: Dictionary of config files

load_run_settings(in_file, header: int = 0)

Load run settings.

Parameters:

in_file: str: Full path to the run file.
header: int, optional: Number of lines to ignore at the top of the file. The default is 0.

Returns:

None.

Raises:

ValueError: If a run settings column is duplicated in the config file.

save_run_settings(file_name: str)

Save run settings to a file as CSV.

Parameters:

file_name: str: Full path to the file.

Returns:

None.

save_config(file_name: str, output_format='CSV')

Save config to a file as CSV.

Parameters:

file_name: str: Full path to the file.
output_format: str, optional: The format of the file. Supported options: CSV. The default is CSV.

Returns:

None.

get_column(column)

Get a column from run_settings.

Parameters:

column: str: Column to return.

Returns:

None.

advance()

Advance to the next row in the run file.

Returns:

None.

complete()

Test whether the run is complete.

Returns True if advance() has been called enough times to iterate completely through the run settings file, so that no more rows are available.

Returns:

bool: True if complete.

keys(): Get keys.

exception rminstr.data_structures.ExptParametersReadError(message)

Bases: Exception

Exception raised when reading ExptParameters fails.

Attributes: message: Explanation of the error.

Initialize self. See help(type(self)) for accurate signature.

message

rminstr.data_structures.get_config_as_dictionary(config_files)

rminstr.data_structures.save_dictionary_as_config(dictionary, path)
