rminstr.data_structures

Exceptions

ExptParametersReadError

Exception raised for reading ExptParameters.

Classes

DataRecord

The DataRecord class records experimental data from long measurements.

TimeSeries

Class for organizing 1D timeseries data, output by DataRecords.

ExistingRecord

ExistingRecord is derived from DataRecord.

ActiveRecord

ActiveRecord is derived from DataRecord.

LegacyRecord

LegacyRecord is derived from DataRecord.

ExptParameters

The ExptParameters class keeps track of experimental settings.

Functions

kendall_p(→ float)

Determine whether the thermopile headings are stable.

runs_statistic(→ float)

Determine whether the thermopile headings are stable.

get_config_as_dictionary(config_files)

save_dictionary_as_config(dictionary, path)

Package Contents

class rminstr.data_structures.DataRecord(columns: list[str], maxlen: int, output_dir: str, minlen: int, auto_timestamps: bool = True, sep: str = ',', repo_path: str = None)

The DataRecord class records experimental data from long measurements.

There are three derived classes: ActiveRecord, ExistingRecord, and LegacyRecord. ActiveRecord is intended to store data from experiments in progress. ExistingRecord is intended to read data files created by ActiveRecord. LegacyRecord reads data from any CSV file.

Comments:

  • A DataRecord object will keep track of an arbitrary number of scalar variables (“columns”).

  • Data are stored in fixed-length deques, so that the amount of memory required does not increase over time during long measurements.

  • Data and metadata are automatically written out at regular intervals so that if the measurement is interrupted, minimal data is lost. Data files are automatically named to avoid overwriting data.

  • Each measurement is associated with an auto-generated timestamp. The auto-generated timestamp can be overridden by calling self.update with the timestamp argument. Timestamps can also be disabled by initializing with auto_timestamps = False.

  • The [] operator is overloaded, and allows access to the most recent measurement.
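The fixed-length storage idea in the comments above can be illustrated with the standard library alone. This is a minimal sketch and not the rminstr implementation; the names MAXLEN and history are hypothetical.

```python
from collections import deque

MAXLEN = 3  # hypothetical cap on the number of samples kept in memory

# A deque with maxlen discards its oldest entry once it is full,
# so memory use stays bounded however long the measurement runs.
history = deque(maxlen=MAXLEN)
for sample in [0.1, 0.2, 0.3, 0.4, 0.5]:
    history.append(sample)

latest = history[-1]  # most recent measurement
print(list(history), latest)
```

Note that this sketch simply drops old samples, whereas the documentation above describes data being written out to files at regular intervals.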

Initialize a DataRecord object.

Parameters:
columnslist[str]

Names of variables tracked by the DataRecord object. list of strings.

maxlenint

Maximum length of the data record in samples.

output_dirstr

Directory where data are stored.

minlenint

Minimum length of data record. Used to ensure recent history is available to measurement program.

auto_timestampsbool, optional

If True, automatically assign timestamps to data points when update() is called. The default is True.

sepstr, optional

Delimiter for output CSV files. The default is “,”.

repo_path: str, optional

Path to the git repository. If None, use default path.

Returns:
None.
columns

list[str]: List of column names.

output_dir

str: Directory where data are stored.

sep = ','

str: delimiter for output CSV files.

repo_path = None

str: path to git repository.

time_zero = None

float: Timestamp corresponding to the start of the experiment

auto_timestamps = True

bool: If True, automatically assign timestamps to data points when update() is called.

maxlen

int: Maximum length of the data record in samples.

minlen

int: Minimum length of data record. Used to ensure recent history is available to measurement program.

current_index = 0

int: Index of current measurement.

timestamps

dict[str, deque[float]]: Keys correspond to “columns” attribute. Each entry is a deque of timestamps associated with measurements.

counters

dict[str, deque[int]]: Keys correspond to “columns” attribute. Each entry is a deque of indices associated with measurements.

record

dict[str, deque[float]]: Keys correspond to “columns” attribute. Each entry is a deque of values associated with measurements.

indices

deque[int]: Contains indices of all measurements.

file_counter = 0

int: Counts the number of files that have been written.

update_queue
update(column: str, value: float, timestamp: float = None)

Record measurement data.

Appends data to specified column, writes data to file when the record reaches a length of self.maxlen.

Parameters:
columnstr

Name of column to append data to.

valuefloat

Data to append to column.

timestampfloat, optional

If None, use result of self.get_time(). Caution: when using this argument, the user is responsible for ensuring that these timestamps are ordered. The default is None.

Returns:
None.
array_update(column: str, values: iter, timestamps: iter)

Update a data column with an array.

Parameters:
columnstr

Column to add to.

valuesiter

Data values to add to the column.

timestampsiter

Timestamps of values.

Returns:
None.
output(write_recent_data: bool = False)

Manage length of the internal data structures.

This version of output does not save data to a file. It only maintains the length of the record while also maintaining the invariants.

Please overload this method if you want to actually save data.

Parameters:
write_recent_databool, optional

If False, the most recent self.minlen samples are retained in the record and not written to file.

If True, all data in the record are written to file, so that the record is empty after this method is called.

The default is False.

Returns:
None.
stage_update(column: str, values: numpy.ndarray, timestamps: numpy.ndarray)

Add a timeseries to the batch update queue.

Parameters:
columnstr

Name of column to append data to.

valuesnp.ndarray

Array of values to append.

timestampsnp.ndarray

Timestamps associated with values.

Returns:
None.
batch_update()

Clear batch update queue.

Returns:
None.
get_time_series(column: str, t_min: float = None, t_max: float = None, delta_t: float = None) tuple

Make timeseries data for a given variable.

Parameters:
columnstr

The column that you want to convert to timeseries data.

t_minfloat, optional

Only return data that was measured later than this time. If None, use the earliest available time. The default is None.

t_maxfloat, optional

Only return data that was measured earlier than this time. If None, use the latest available time. The default is None.

delta_tfloat, optional

If not None, resample data with time interval delta_t. The function does not interpolate, but rather uses the most recent past value. The default is None.

Returns:
tuple[DataArray, DataArray]

(times, values)

timesnumpy array of floats

timestamps of samples

valuesnumpy array of floats

values from column

Raises:
record_empty

Raised if an index error occurs when self.timestamps[column][0] is accessed.
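The delta_t behavior described above (resampling by holding the most recent past value rather than interpolating) can be sketched with NumPy. This is an illustrative assumption about the technique, not the rminstr code; resample_previous and its grid choice are hypothetical.

```python
import numpy as np

def resample_previous(times, values, delta_t):
    """Resample onto a regular grid, holding the most recent past value."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    # Regular grid starting at the first timestamp; the last timestamp is
    # included when it falls on the grid.
    grid = np.arange(times[0], times[-1] + delta_t / 2, delta_t)
    # Index of the last sample taken at or before each grid point.
    idx = np.searchsorted(times, grid, side="right") - 1
    return grid, values[idx]

t = [0.0, 1.0, 2.5, 4.0]
v = [10.0, 11.0, 12.0, 13.0]
grid, held = resample_previous(t, v, delta_t=1.0)
print(grid, held)
```

At t = 2.0 and t = 3.0 the sketch returns the values measured at 1.0 and 2.5 respectively, since no interpolation is performed.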

get_conditional_data(columns: list, condition: collections.abc.Callable[[Index, DataArray, DataArray], bool]) tuple[IndexArray, DataArray, DataArray]

Return the rows that satisfy a given condition.

  • Each row is evaluated independently

  • Missing values are populated with NaN

Parameters:
columnslist of str

Specifies which columns to export.

conditionCallable[[Index, DataArray, DataArray], bool]

If True, append data to output.

Returns:
tuple[IndexArray, DataArray, DataArray]

(indices, times, values)

indicesnumpy array of ints

indices of the samples.

timesdict of numpy arrays of floats

timestamps of samples, indexed by column

valuesdict of numpy array of floats

values from column, indexed by column
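The row-wise filtering described above can be sketched as follows. The storage layout (samples maps column to {index: (time, value)}) and the function name conditional_rows are assumptions made for illustration; only the idea of evaluating each row independently and filling missing values with NaN comes from the description.

```python
import numpy as np

def conditional_rows(columns, samples, condition):
    """samples maps column -> {index: (time, value)}; hypothetical layout."""
    indices = sorted({i for col in columns for i in samples[col]})
    keep = []
    times = {c: [] for c in columns}
    values = {c: [] for c in columns}
    missing = (np.nan, np.nan)
    for i in indices:
        # Build one row; columns with no sample at this index become NaN.
        t_row = np.array([samples[c].get(i, missing)[0] for c in columns])
        v_row = np.array([samples[c].get(i, missing)[1] for c in columns])
        if condition(i, t_row, v_row):  # each row is judged on its own
            keep.append(i)
            for c, t, v in zip(columns, t_row, v_row):
                times[c].append(t)
                values[c].append(v)
    return (np.array(keep, dtype=int),
            {c: np.array(times[c]) for c in columns},
            {c: np.array(values[c]) for c in columns})

samples = {
    "V": {0: (0.0, 1.0), 1: (1.0, 2.0), 2: (2.0, 3.0)},
    "T": {0: (0.1, 20.0), 2: (2.1, 21.0)},
}
idx, t, v = conditional_rows(["V", "T"], samples, lambda i, tr, vr: vr[0] >= 2.0)
print(idx, v)
```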

class rminstr.data_structures.TimeSeries

Bases: NamedTuple

Class for organizing 1D timeseries data, output by DataRecords.

NamedTuple-like object. Can be indexed or iterated like a tuple where index 0 is the t attribute and index 1 is the values attribute. Has additional functionality useful for time series data.

Attributes:
tnp.ndarray

Absolute time stamps in seconds of timeseries, from when data was taken.

valuesnp.ndarray

Values of timeseries data.

t: numpy.ndarray
values: numpy.ndarray
trel() numpy.ndarray

Get the relative time of.

class rminstr.data_structures.ExistingRecord(file_path: str, output_dir: str = None, maxlen: int = None, minlen: int = None)

Bases: DataRecord

ExistingRecord is derived from DataRecord.

An ExistingRecord object reads files created by ActiveRecord.

Initialize ExistingRecord object.

Parameters:
file_pathstr

Full path to metadata file.

output_dirstr, optional

Directory where data is stored. If None, determine from metadata file. The default is None.

maxlenint, optional

Maximum length of record. If None, infer from metadata file. The default is None.

minlenint, optional

Minimum length of record. If None, set to maxlen. The default is None.

Returns:
None.
time_zero = None

float: Timestamp corresponding to the start of the experiment

auto_timestamps = None

bool: If True, automatically assign timestamps to data points when update() is called.

maxlen = None

int: Maximum length of the data record in samples.

current_file = None
metadata
metadata_filepath
file_index = 0
file
read_header()

Read the header of the current file and enumerate columns.

Returns:
None.
read_next_line() bool

Read one line of data from the input file and add it to the record.

Returns:
bool

True if the line was successfully read.

batch_read(columns_list: list[str] = None) dict[str, tuple]

Read the entirety of DataRecord at once.

The output is a dictionary of tupled timeseries. This method uses pandas as a backend for reading and parsing files, so it has no effect on the state of the object.

Parameters:
columns_listlist[str], optional

Either None (returns every column in the record) or a list of column names to return. The default is None.

Returns:
dict[str, tuple]

{column: (timestamp,value)}

class rminstr.data_structures.ActiveRecord(columns: list[str], maxlen: int, output_dir: str, minlen: str = 0, auto_timestamps: bool = True, sep: str = ',', meas_name: str = None, instruments: dict = None, stage_everything: bool = False)

Bases: DataRecord

ActiveRecord is derived from DataRecord.

An ActiveRecord stores and continually saves experimental data from measurements in progress.

Additional comments about ActiveRecord:

  • Data files are automatically written. When the data is written to a file, it is automatically converted to a CSV format, where each column is a variable, each row is a measurement, and the rows are strictly time-ordered. It is not necessary to have the same number of entries for each column. Missing data is automatically populated with NaN.

  • A specified number of recent samples (self.minlen) are kept in memory for flow control. These samples can be written to a file and deleted from memory by calling self.output with write_recent_data = True.
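The CSV layout described above (one column per variable, time-ordered rows, NaN for missing entries) can be sketched with the standard library. The header names, the single shared time column, and the in-memory record layout are assumptions for illustration; the actual files written by ActiveRecord may differ.

```python
import csv
import io
import math

# Hypothetical in-memory samples: column -> list of (timestamp, value).
record = {
    "voltage": [(0.0, 1.5), (2.0, 1.7)],
    "temperature": [(1.0, 295.2)],
}
columns = list(record)

# One row per sample, sorted by timestamp; other columns are filled with NaN.
rows = sorted(
    (t, col, val) for col, samples in record.items() for t, val in samples
)

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["time"] + columns)
for t, col, val in rows:
    writer.writerow([t] + [val if c == col else math.nan for c in columns])

csv_text = buffer.getvalue()
print(csv_text)
```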

Initialize an ActiveRecord object.

Parameters:
columnslist[str]

Names of variables that will be recorded.

maxlenint

Maximum number of samples to retain in memory.

output_dirstr

Directory where output data is written. Ends in “\”.

minlenstr, optional

Number of samples to be retained in memory after writing data to a file. The default is 0.

auto_timestampsbool, optional

If True, maintain a list of timestamps for each column. The default is True.

sepstr, optional

Delimiter in output file. The default is “,”.

meas_namestr, optional

If specified, output files will be named meas_name_x_y.csv, where x is a number (1,2,3,…) that is appended if necessary to avoid overwriting existing data, and y is the index of the data file.

If not specified (None), the meas_name will default to the date in YYYYMMDD format.

The default is None.

instruments: dict, optional

Dictionary with columns as keys and instruments as values. Each instrument is expected to have an info_dict attribute. If not None, will connect those column keys to instrument values in the metadata.

The default is None.

stage_everythingbool, optional

If True, the setitem method adds updates to the update queue rather than directly updating the record. This behavior can be used to ensure updates are time-ordered in the output file. If True, batch_update must be called to update the record.

The default is False.

Returns:
None.
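The staging behavior described for stage_everything (queue updates first, then apply them in time order) can be sketched with a heap. This is an assumption about one way to achieve time-ordered updates, not the rminstr implementation; stage, batch_update, update_queue, and record here are hypothetical stand-ins.

```python
import heapq

update_queue = []  # hypothetical staging area: (timestamp, column, value)
record = []        # hypothetical time-ordered record

def stage(column, value, timestamp):
    # heapq keeps the entry with the earliest timestamp at the front.
    heapq.heappush(update_queue, (timestamp, column, value))

def batch_update():
    # Drain the queue in timestamp order, leaving it empty afterwards.
    while update_queue:
        record.append(heapq.heappop(update_queue))

stage("voltage", 1.7, timestamp=2.0)
stage("temperature", 295.2, timestamp=1.0)
stage("voltage", 1.5, timestamp=0.0)
batch_update()
print(record)
```

Updates staged out of order still reach the record sorted by timestamp, which is the property the parameter description relies on.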
local_backups = None
session_str = None
metadata
time_zero = None

float: Timestamp corresponding to the start of the experiment

auto_timestamps = True

bool: If True, automatically assign timestamps to data points when update() is called.

minlen = 0

int: Minimum length of data record. Used to ensure recent history is available to measurement program.

file_counter = 0

int: Counts the number of files that have been written.

stage_everything = False
metadata_file_name
output(write_recent_data: bool = False)

Write data to a file. Also writes metadata.

Parameters:
write_recent_databool, optional

If False, the most recent self.minlen samples are retained in the record and not written to file.

If True, all data in the record are written to file, so that the record is empty after this method is called.

The default is False.

Returns:
None.
get_time()

Get relative time.

Returns:
float

current relative time

init_timer(time_zero: float = None)

Set time = 0 for relative timestamps.

Parameters:
time_zerofloat, optional

Timestamp of time = 0. If None, use result of time.time(). The default is None.

Returns:
None.
init_metadata(meas_name: str = None)

Populate self.metadata with the default metadata.

Parameters:
meas_namestr, optional

If specified, output files will be named meas_name_x_y.csv, where x is a number (1,2,3,…) that is appended if necessary to avoid overwriting existing data, and y is the index of the data file.

If not specified (None), the meas_name will default to the date in YYYYMMDD format.

The default is None.

Returns:
None.
add_instruments(instr: dict)

Add instruments to metadata.

Parameters:
instr: dict

Dictionary with columns as keys and instruments as values. Each instrument is expected to have an info_dict attribute, which is itself a dictionary. If ‘range’ is not in info_dict, it will not be saved.

add_file()

Increments file counter and maintains metadata.

Determines name of next output file and adds the file to self.metadata with key “file_x”, where x is the current value of file_counter.

Returns:
file_namestr

name of next output file.

output_metadata(use_backup: bool = False)

Write metadata to a file.

Parameters:
use_backupbool, optional

If True, saves to local backup (~/rminstr/self.session_str). The default is False.

Returns:
None.
check_missing_files() list[pathlib.Path]

Check if any files are missing from the target directory.

Returns:
list[Path]

List of paths expected to be in target directory.

clean_backups()

Clean up backups at the end of an experiment.

Combines the backup directory with the target directory. If it cannot, adds a note to the metadata that this step failed, so that users who later read the data are notified by this flag.

Returns:
None.
class rminstr.data_structures.LegacyRecord(data_file_path: str, output_dir: str, maxlen: int, minlen: int, missing_data: str = '', sep: str = ',')

Bases: DataRecord

LegacyRecord is derived from DataRecord.

Similar to ExistingRecord, it reads data from csv files. Unlike ExistingRecord, it can read any csv, not just ones written by an ActiveRecord object.

Initialize LegacyRecord object.

Parameters:
data_file_pathstr

Path to first data file relative to output_dir. The data columns are inferred from this file’s header.

output_dirstr

Working directory.

maxlenint

Maximum length of record.

minlenint

Minimum length of record.

missing_datastr, optional

String that represents missing data in the input file. The default is ‘’.

sepstr, optional

Delimiter in input files. The default is ‘,’.

Returns:
None.
missing_data = ''
sep = ','

str: delimiter for output CSV files.

output_dir

str: Directory where data are stored.

time_zero = None

float: Timestamp corresponding to the start of the experiment

current_file = None
missing = ''
metadata
file_counter = 0

int: Counts the number of files that have been written.

files = []
file_index = 0
file
read_header(data_file_path: str) list[str]

Read header.

Parameters:
data_file_pathstr

Full path to data file.

Returns:
list[str]

List of column names.

add_file(file_name: str)

Append in_file.

Parameters:
file_namestr

Full path to data

Returns:
None.
read_next_line() bool

Read one line of data from the input file and add it to the record.

Returns:
bool

True if the line was successfully read.

rminstr.data_structures.kendall_p(v: DataArray) float

Determine whether the thermopile headings are stable.

See NIST Tech Note 1374 for more details.

Parameters:
vDataArray

vector of voltage measurements

Returns:
float

kendall_p: A number between 0 and 1. Small numbers (less than 0.25) show a decreasing trend, while large numbers (greater than 0.75) show an increasing trend.
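As a rough illustration of a trend statistic with the same range and reading, the sketch below computes the fraction of sample pairs that increase over time. This is not taken from NIST Tech Note 1374 or from rminstr, and kendall_p may be computed differently; increasing_pair_fraction is a hypothetical name.

```python
from itertools import combinations

def increasing_pair_fraction(v):
    """Fraction of pairs (i < j) with v[j] > v[i]; ties count as one half."""
    pairs = list(combinations(v, 2))  # preserves order, so a precedes b
    score = sum(1.0 if b > a else 0.5 if b == a else 0.0 for a, b in pairs)
    return score / len(pairs)

rising = increasing_pair_fraction([1.0, 2.0, 3.0, 4.0])
falling = increasing_pair_fraction([4.0, 3.0, 2.0, 1.0])
mixed = increasing_pair_fraction([1.0, 3.0, 2.0, 4.0, 0.0])
print(rising, falling, mixed)
```

A steadily rising series scores 1, a steadily falling series scores 0, and a series with no net trend scores near 0.5.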

rminstr.data_structures.runs_statistic(v: DataArray) float

Determine whether the thermopile headings are stable.

See NIST Tech Note 1374 for more details.

Parameters:
vDataArray

vector of voltage measurements

Returns:
float

Z score based on the number of “runs” (consecutive points with the same trend). 0 corresponds to the expected number of runs in random data, and small numbers mean long runs (not random).
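A Z score of this kind can be sketched with the textbook runs-up-and-down test, where a sequence of n points has an expected (2n - 1)/3 runs with variance (16n - 29)/90 under randomness. Whether runs_statistic uses exactly this definition is an assumption; runs_up_down_z is a hypothetical name.

```python
import math

def runs_up_down_z(v):
    """Z score for the number of runs up and down in a sequence.

    Assumes at least two successive points that differ.
    """
    # Sign of each successive difference; zero differences are dropped.
    signs = [1 if b > a else -1 for a, b in zip(v, v[1:]) if b != a]
    n = len(signs) + 1  # effective number of points
    # A new run starts every time the sign of the difference changes.
    runs = 1 + sum(s != t for s, t in zip(signs, signs[1:]))
    expected = (2 * n - 1) / 3
    variance = (16 * n - 29) / 90
    return (runs - expected) / math.sqrt(variance)

drift = runs_up_down_z([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
zigzag = runs_up_down_z([1, 3, 2, 4, 3, 5, 4, 6, 5, 7])
print(drift, zigzag)
```

A monotonic drift forms a single long run and gives a strongly negative score, while a strict zigzag has more runs than random data and gives a positive score.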

class rminstr.data_structures.ExptParameters(config_files: str | list[str], run_settings_file: str = None, initial_dict: dict = None, config_file_priority: list[int] = None, header: int = 0, columns: list[str] = None, column_types: list[str] = None)

The ExptParameters class keeps track of experimental settings.

Users specify two files used to describe how a program will run.

  • config_file: contains information that never changes over the duration of an experiment, e.g. instrument GPIB addresses, VNA IFBW. See below for formatting information.

  • run_settings_file: a spreadsheet of values specifying measurements that occur sequentially, e.g. frequencies, power levels. Each row represents a measurement. Each column represents a variable.

The ExptParameters object can be read like a dictionary. For example:

params = ExptParameters(r"C:\examples\config.csv", r"C:\examples\run.csv")
print(params["VNA_IFBW"])

These variables are read-only, so

params["VNA_IFBW"] = 10

does not work.

For run settings, the ExptParameters object keeps track of the index of the current data point. So, params[“Frequency”] will return a number, rather than a list. To get the next Frequency point, call params.advance().
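The access pattern described above (dictionary-style reads, no assignment, and a current-row index moved by advance()) can be sketched with a small stand-in class. RunTable is hypothetical and is not rminstr code; it takes dictionaries instead of files so that the example is self-contained.

```python
class RunTable:
    """Hypothetical stand-in showing the access pattern, not rminstr code."""

    def __init__(self, config, run_settings):
        self._config = dict(config)
        self._run = {k: list(v) for k, v in run_settings.items()}
        self.index = 0
        self.num_steps = len(next(iter(self._run.values()), []))

    def __getitem__(self, key):
        # Run settings resolve to the value for the current row only.
        if key in self._run:
            return self._run[key][self.index]
        return self._config[key]

    # No __setitem__ is defined, so item assignment raises TypeError.

    def advance(self):
        self.index += 1

    def complete(self):
        return self.index >= self.num_steps

params = RunTable({"VNA_IFBW": 100}, {"Frequency": [1e9, 2e9, 3e9]})
seen = []
while not params.complete():
    seen.append(params["Frequency"])  # a single number, not a list
    params.advance()
print(seen)
```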

Initialize an ExptParameters object.

Parameters:
config_filesUnion[str, list[str]]

If the value is a string, it is interpreted as the path to the config file.

If a list of strings is given, each entry is interpreted as the path to a config file. Each config file is read in the order given in the list. If the same setting exists in multiple config files, use config_file_priority to determine which setting to use.

run_settings_filestr

Full path to run settings file.

initial_dictdict

A dictionary used to initialize the config before adding additional CSV files.

config_file_prioritylist[int], optional

Indicates the priority of the corresponding (by order) config file. Higher priority corresponds to lower numbers. If None, all files have equal priority. The default is None.

headerint, optional

Number of lines to discard when reading run_settings. The default is 0.

columnslist[str], optional

Names of columns in run settings file. The default is None. If None, first check config file. If that fails, check run file for column names.

column_typeslist[str], optional

Data types for columns in run settings. The default is None. If None, first check config file. If that fails, assume everything is a float.

Returns:
None.
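The priority rule described for config_file_priority (lower number means higher priority) can be sketched as a dictionary merge. merge_configs is a hypothetical helper operating on already-loaded dictionaries; how rminstr resolves clashes between files of equal priority is not stated above, and this sketch simply lets later files win in that case.

```python
def merge_configs(configs, priorities=None):
    """Merge dicts; on key clashes the lowest priority number wins."""
    if priorities is None:
        priorities = [0] * len(configs)
    merged = {}
    # Apply low-priority (large number) configs first so that
    # high-priority (small number) configs overwrite them.
    order = sorted(range(len(configs)), key=lambda i: priorities[i], reverse=True)
    for i in order:
        merged.update(configs[i])
    return merged

site = {"VNA_IFBW": 100, "GPIB_ADDR": 16}
override = {"VNA_IFBW": 10}
merged = merge_configs([site, override], priorities=[2, 1])
print(merged)
```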
columns = None

list[str]: the keys of run settings

column_types = None

list[str]: the types of data stored in columns of run settings.

index = None

int: Tracks progress through the steps given in run settings.

num_steps = None

int: number of rows in run settings file

run_settings = None

dict[str, list]: stores run settings

add_config_files(config_files)

Make a dictionary from config files.

Parameters:
config_filesstr or list(str)

Path or a list of paths to the config files to load.

Returns:
dict

Dictionary of config files

load_run_settings(in_file, header: int = 0)

Load run settings.

Parameters:
in_filestr

Full path to run file

headerint, optional

Number of lines to ignore at top of file. The default is 0.

Returns:
None.
Raises:
ValueError

If a run settings column is duplicated in a config file.

save_run_settings(file_name: str)

Save run settings to file as csv.

Parameters:
file_namestr

Full path to file.

Returns:
None.
save_config(file_name: str, output_format='CSV')

Save config to file as csv.

Parameters:
file_namestr

Full path to file.

output_formatstr, optional

The format of the file. Options are CSV. The default is CSV.

Returns:
None.
get_column(column)

Get a column from run_settings.

Parameters:
columnstr

column to return

Returns:
None.
advance()

Advance to the next row in run file.

Returns:
None.
complete()

Test if run is complete.

Returns True if advance() has been called enough times to completely iterate through the run settings file, so that no more rows are available.

Returns:
bool

True if complete

keys()

Get keys.

exception rminstr.data_structures.ExptParametersReadError(message)

Bases: Exception

Exception raised for reading ExptParameters.

Attributes:

message – explanation of the error

Initialize self. See help(type(self)) for accurate signature.

message
rminstr.data_structures.get_config_as_dictionary(config_files)
rminstr.data_structures.save_dictionary_as_config(dictionary, path)