rminstr.data_structures¶
Exceptions¶
ExptParametersReadError | Exception raised for reading ExptParameters.
Classes¶
DataRecord | The DataRecord class records experimental data from long measurements.
TimeSeries | Class for organizing 1D timeseries data, output by DataRecords.
ExistingRecord | ExistingRecord is derived from DataRecord.
ActiveRecord | ActiveRecord is derived from DataRecord.
LegacyRecord | LegacyRecord is derived from DataRecord.
ExptParameters | The ExptParameters class keeps track of experimental settings.
Functions¶
kendall_p | Determine whether the thermopile readings are stable.
runs_statistic | Determine whether the thermopile readings are stable.
get_config_as_dictionary |
save_dictionary_as_config |
Package Contents¶
- class rminstr.data_structures.DataRecord(columns: list[str], maxlen: int, output_dir: str, minlen: int, auto_timestamps: bool = True, sep: str = ',', repo_path: str = None)¶
The DataRecord class records experimental data from long measurements.
There are three derived classes: ActiveRecord, ExistingRecord, and LegacyRecord. ActiveRecord is intended to store data from experiments in progress. ExistingRecord is intended to read data files created by ActiveRecord. LegacyRecord is intended to read arbitrary CSV files.
Comments:
A DataRecord object will keep track of an arbitrary number of scalar variables (“columns”).
Data are stored in fixed-length deques, so that the amount of memory required does not increase over time during long measurements.
Data and metadata are automatically written out at regular intervals so that if the measurement is interrupted, minimal data is lost. Data files are automatically named to avoid overwriting data.
Each measurement is associated with an auto-generated timestamp. The auto-generated timestamp can be overridden by calling self.update with the timestamp argument. Timestamps can also be disabled by initializing with auto_timestamps = False.
The [] operator is overloaded and gives access to the most recent measurement.
Initialize a DataRecord object.
- Parameters:
- columns: list[str]
Names of variables tracked by the DataRecord object.
- maxlen: int
Maximum length of the data record in samples.
- output_dir: str
Directory where data are stored.
- minlen: int
Minimum length of the data record. Used to ensure recent history is available to the measurement program.
- auto_timestamps: bool, optional
If True, automatically assign timestamps to data points when update() is called. The default is True.
- sep: str, optional
Delimiter for output CSV files. The default is “,”.
- repo_path: str, optional
Path to the git repository. If None, use the default path.
- Returns:
- None.
- columns¶
list[str]: List of column names.
- output_dir¶
str: Directory where data are stored.
- sep = ','¶
str: delimiter for output CSV files.
- repo_path = None¶
str: path to git repository.
- time_zero = None¶
float: Timestamp corresponding to the start of the experiment
- auto_timestamps = True¶
bool: If True, automatically assign timestamps to data points when update() is called.
- maxlen¶
int: Maximum length of the data record in samples.
- minlen¶
int: Minimum length of data record. Used to ensure recent history is available to measurement program.
- current_index = 0¶
int: Index of current measurement.
- timestamps¶
dict[str, deque[float]]: Keys correspond to “columns” attribute. Each entry is a deque of timestamps associated with measurements.
- counters¶
dict[str, deque[int]]: Keys correspond to “columns” attribute. Each entry is a deque of indices associated with measurements.
- record¶
dict[str, deque[float]]: Keys correspond to “columns” attribute. Each entry is a deque of values associated with measurements.
- indices¶
deque[int]: Contains indices of all measurements.
- file_counter = 0¶
int: Counts the number of files that have been written.
- update_queue¶
- update(column: str, value: float, timestamp: float = None)¶
Record measurement data.
Appends data to the specified column and writes data to file when the record reaches a length of self.maxlen.
- Parameters:
- column: str
Name of column to append data to.
- value: float
Data to append to the column.
- timestamp: float, optional
If None, use the result of self.get_time(). Caution: when using this argument, the user is responsible for ensuring that the timestamps are ordered. The default is None.
- Returns:
- None.
- array_update(column: str, values: iter, timestamps: iter)¶
Update a data column with an array.
- Parameters:
- column: str
Column to add to.
- values: iter
Data values to add.
- timestamps: iter
Timestamps of values.
- Returns:
- None.
- output(write_recent_data: bool = False)¶
Manage length of the internal data structures.
This version of output does not save data to a file. It only maintains the length of the record while preserving the invariants.
Override this method in a subclass if you want to actually save data.
- Parameters:
- write_recent_data: bool, optional
If False, the most recent self.minlen samples are retained in the record and not written to file.
If True, all data in the record are written to file, so that the record is empty after this method is called.
The default is False.
- Returns:
- None.
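The trimming behavior described above can be sketched with a plain deque. This is a minimal illustration, not the rminstr implementation: the class name TrimmedBuffer and the flushed list (which stands in for data written to a file) are hypothetical, and the trigger "flush when the length reaches maxlen" is taken from the description of update().

```python
from collections import deque


class TrimmedBuffer:
    """Hypothetical stand-in for a record that holds at most maxlen samples."""

    def __init__(self, maxlen: int, minlen: int = 0):
        self.maxlen = maxlen
        self.minlen = minlen
        self.samples = deque()
        self.flushed = []  # assumption: stands in for rows written to a file

    def update(self, value: float) -> None:
        """Append a sample and trim the buffer once it reaches maxlen."""
        self.samples.append(value)
        if len(self.samples) >= self.maxlen:
            self.output()

    def output(self, write_recent_data: bool = False) -> None:
        """Move old samples out of memory, keeping minlen unless told otherwise."""
        keep = 0 if write_recent_data else self.minlen
        while len(self.samples) > keep:
            self.flushed.append(self.samples.popleft())


buf = TrimmedBuffer(maxlen=5, minlen=2)
for i in range(7):
    buf.update(float(i))

print(list(buf.samples))  # the most recent samples still in memory
print(buf.flushed)        # samples that were moved out of memory
```

Calling buf.output(write_recent_data=True) afterwards would empty the in-memory buffer entirely, which mirrors the documented meaning of that flag.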
- stage_update(column: str, values: numpy.ndarray, timestamps: numpy.ndarray)¶
Add a timeseries to the batch update queue.
- Parameters:
- column: str
Name of column to append data to.
- values: np.ndarray
Values to append.
- timestamps: np.ndarray
Timestamps associated with values.
- Returns:
- None.
- batch_update()¶
Apply the staged updates to the record and clear the batch update queue.
- Returns:
- None.
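One way to picture staging followed by a batch update is a time-ordered merge of several per-column series. The sketch below is an assumption about the idea, not the library's code: the queue layout and the function merge_staged are hypothetical, and each staged series is assumed to be sorted by time already.

```python
import heapq

# Hypothetical staged updates: (column, timestamps, values).
queue = [
    ("voltage", [0.0, 2.0, 4.0], [1.0, 1.1, 1.2]),
    ("temperature", [1.0, 3.0], [295.0, 295.5]),
]


def merge_staged(staged):
    """Return (timestamp, column, value) tuples in time order."""
    streams = [
        [(t, column, v) for t, v in zip(timestamps, values)]
        for column, timestamps, values in staged
    ]
    # heapq.merge requires every input stream to be sorted already.
    return list(heapq.merge(*streams))


merged = merge_staged(queue)
queue.clear()  # the queue is emptied once its contents are applied

for timestamp, column, value in merged:
    print(timestamp, column, value)
```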
- get_time_series(column: str, t_min: float = None, t_max: float = None, delta_t: float = None) tuple¶
Make timeseries data for a given variable.
- Parameters:
- column: str
Column to convert to timeseries data.
- t_min: float, optional
Only return data that was measured later than this time. If None, use the earliest available time. The default is None.
- t_max: float, optional
Only return data that was measured earlier than this time. If None, use the latest available time. The default is None.
- delta_t: float, optional
If not None, resample the data with time interval delta_t. The function does not interpolate; each resampled point takes the most recent past value. The default is None.
- Returns:
- tuple[DataArray, DataArray]
(times, values)
- times: numpy array of floats
Timestamps of samples.
- values: numpy array of floats
Values from column.
- Raises:
- record_empty
Raised if an index error occurs when self.timestamps[column][0] is accessed.
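The delta_t behavior (no interpolation, most recent past value) is a zero-order hold. A minimal sketch of that technique, using only the standard library, is given here; the function name resample_previous and the choice to start the grid at t_min are assumptions, not details taken from rminstr.

```python
from bisect import bisect_right


def resample_previous(times, values, delta_t, t_min=None, t_max=None):
    """Resample onto a regular grid, holding the most recent past value."""
    t_min = times[0] if t_min is None else t_min
    t_max = times[-1] if t_max is None else t_max
    out_t, out_v = [], []
    n = 0
    while t_min + n * delta_t <= t_max:
        t = t_min + n * delta_t
        # Index of the last sample whose timestamp is <= t.
        idx = bisect_right(times, t) - 1
        if idx >= 0:
            out_t.append(t)
            out_v.append(values[idx])
        n += 1
    return out_t, out_v


times = [0.0, 1.0, 2.5, 4.0]
values = [10.0, 11.0, 12.0, 13.0]
grid_t, grid_v = resample_previous(times, values, delta_t=1.0)
print(grid_t)
print(grid_v)
```

At t = 2.0 the sample taken at 2.5 is not yet available, so the grid point repeats the value measured at 1.0 rather than interpolating between neighbors.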
- get_conditional_data(columns: list, condition: collections.abc.Callable[[Index, DataArray, DataArray], bool]) tuple[IndexArray, DataArray, DataArray]¶
Return the rows that satisfy a given condition.
Each row is evaluated independently.
Missing values are populated with NaN.
- Parameters:
- columns: list of str
Specifies which columns to export.
- condition: Callable[[Index, DataArray, DataArray], bool]
Called for each row. If it returns True, the row is appended to the output.
- Returns:
- tuple[IndexArray, DataArray, DataArray]
(indices, times, values)
- indices: numpy array of ints
Indices of the samples.
- times: dict of numpy arrays of floats
Timestamps of samples, indexed by column.
- values: dict of numpy arrays of floats
Values from column, indexed by column.
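A rough sketch of row-wise filtering with NaN filling follows. The storage layout (a per-column mapping from measurement index to a (time, value) pair), the function conditional_rows, and the shape of the arguments passed to the condition are all assumptions made for illustration; the real method works on the record's internal deques.

```python
import math

# Hypothetical layout: column -> {measurement index: (time, value)}.
data = {
    "power": {0: (0.0, 1.0), 1: (1.0, 5.0), 2: (2.0, 7.0)},
    "temp": {0: (0.1, 295.0), 2: (2.1, 296.0)},
}


def conditional_rows(data, columns, condition):
    """Return (indices, times, values) for rows where condition is True."""
    missing = (math.nan, math.nan)
    indices = sorted({i for c in columns for i in data[c]})
    out_idx = []
    out_t = {c: [] for c in columns}
    out_v = {c: [] for c in columns}
    for i in indices:
        # Build one row; columns with no sample at this index get NaN.
        t_row = {c: data[c].get(i, missing)[0] for c in columns}
        v_row = {c: data[c].get(i, missing)[1] for c in columns}
        if condition(i, t_row, v_row):
            out_idx.append(i)
            for c in columns:
                out_t[c].append(t_row[c])
                out_v[c].append(v_row[c])
    return out_idx, out_t, out_v


idx, t, v = conditional_rows(
    data, ["power", "temp"], lambda i, t_row, v_row: v_row["power"] > 2.0
)
print(idx, v)
```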
- class rminstr.data_structures.TimeSeries¶
Bases: NamedTuple
Class for organizing 1D timeseries data, output by DataRecords.
NamedTuple-like object. Can be indexed or iterated like a tuple, where index 0 is the t attribute and index 1 is the values attribute. Has additional functionality useful for timeseries data.
- Attributes:
- t: np.ndarray
Absolute timestamps of the timeseries in seconds, from when the data was taken.
- values: np.ndarray
Values of timeseries data.
- t: numpy.ndarray¶
- values: numpy.ndarray¶
- trel() numpy.ndarray¶
Get the relative time of the timeseries.
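The tuple-like behavior can be illustrated with typing.NamedTuple. This is a sketch only: SimpleSeries is a hypothetical class that uses plain lists instead of numpy arrays, and it assumes that relative time means time measured from the first sample, which the documentation above does not state.

```python
from typing import NamedTuple


class SimpleSeries(NamedTuple):
    """Hypothetical tuple-like container for a 1D timeseries."""

    t: list
    values: list

    def trel(self) -> list:
        """Times relative to the first sample (an assumption)."""
        t0 = self.t[0]
        return [ti - t0 for ti in self.t]


series = SimpleSeries(t=[100.0, 100.5, 102.0], values=[1.0, 2.0, 3.0])

t, values = series           # unpacks like a tuple
print(series[0] is series.t)  # index 0 is the t attribute
print(series.trel())
```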
- class rminstr.data_structures.ExistingRecord(file_path: str, output_dir: str = None, maxlen: int = None, minlen: int = None)¶
Bases: DataRecord
ExistingRecord is derived from DataRecord.
An ExistingRecord object reads files created by ActiveRecord.
Initialize ExistingRecord object.
- Parameters:
- file_path: str
Full path to metadata file.
- output_dir: str, optional
Directory where data is stored. If None, determine from metadata file. The default is None.
- maxlen: int, optional
Maximum length of record. If None, infer from metadata file. The default is None.
- minlen: int, optional
Minimum length of record. If None, set to maxlen. The default is None.
- Returns:
- None.
- time_zero = None¶
float: Timestamp corresponding to the start of the experiment
- auto_timestamps = None¶
bool: If True, automatically assign timestamps to data points when update() is called.
- maxlen = None¶
int: Maximum length of the data record in samples.
- current_file = None¶
- metadata¶
- metadata_filepath¶
- file_index = 0¶
- file¶
- read_header()¶
Read the header of the current file and enumerate columns.
- Returns:
- None.
- read_next_line() bool¶
Read one line of data from the input file and add it to the record.
- Returns:
- bool
True if the line was successfully read.
- batch_read(columns_list: list[str] = None) dict[str, tuple]¶
Read the entirety of DataRecord at once.
The output is a dictionary of (timestamp, value) tuples, one per column. This method uses pandas as the backend for reading and parsing files, so it has no effect on the state of the object.
- Parameters:
- columns_list: list[str], optional
Either None (returns every column in the record) or a list of column names to return. The default is None.
- Returns:
- dict[str, tuple]
{column: (timestamp,value)}
- class rminstr.data_structures.ActiveRecord(columns: list[str], maxlen: int, output_dir: str, minlen: str = 0, auto_timestamps: bool = True, sep: str = ',', meas_name: str = None, instruments: dict = None, stage_everything: bool = False)¶
Bases: DataRecord
ActiveRecord is derived from DataRecord.
An ActiveRecord stores and continually saves experimental data from measurements in progress.
Additional comments about ActiveRecord:
Data files are automatically written. When the data is written to a file, it is automatically converted to a CSV format, where each column is a variable, each row is a measurement, and the rows are strictly time-ordered. It is not necessary to have the same number of entries for each column. Missing data is automatically populated with NaN.
A specified number of recent samples (self.minlen) are kept in memory for flow control. These samples can be written to a file and deleted from memory by calling self.output with write_recent_data = True.
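The CSV layout described above (one column per variable, time-ordered rows, NaN for missing entries) can be sketched with the standard csv module. The exact column layout of real rminstr files is not given here, so the single leading "time" column, the samples dictionary, and the function to_csv are all assumptions.

```python
import csv
import io
import math

columns = ["voltage", "temperature"]

# Hypothetical samples: column -> list of (timestamp, value).
samples = {
    "voltage": [(0.0, 1.0), (2.0, 1.1)],
    "temperature": [(1.0, 295.0)],
}


def to_csv(samples, columns, sep=","):
    """Write one time-ordered row per measurement, NaN where data is missing."""
    rows = sorted((t, c, v) for c in columns for t, v in samples[c])
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=sep, lineterminator="\n")
    writer.writerow(["time"] + columns)
    for t, measured_column, value in rows:
        cells = [value if name == measured_column else math.nan for name in columns]
        writer.writerow([t] + cells)
    return buffer.getvalue()


text = to_csv(samples, columns)
print(text)
```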
Initialize an ActiveRecord object.
- Parameters:
- columns: list[str]
Names of variables that will be recorded.
- maxlen: int
Maximum number of samples to retain in memory.
- output_dir: str
Directory where output data is written. Ends in “\”.
- minlen: int, optional
Number of samples to be retained in memory after writing data to a file. The default is 0.
- auto_timestamps: bool, optional
If True, maintain a list of timestamps for each column. The default is True.
- sep: str, optional
Delimiter in output file. The default is “,”.
- meas_name: str, optional
If specified, output files will be named meas_name_x_y.csv, where x is a number (1,2,3,…) that is appended if necessary to avoid overwriting existing data, and y is the index of the data file.
If not specified (None), meas_name defaults to the date in YYYYMMDD format.
The default is None.
- instruments: dict, optional
Dictionary with columns as keys and instruments as values. Each instrument is expected to have an info_dict attribute. If not None, the column keys are connected to the instrument values in the metadata.
The default is None.
- stage_everything: bool, optional
If True, the setitem method adds updates to the update queue rather than directly updating the record. This behavior can be used to ensure updates are time-ordered in the output file. If True, batch_update must be called to update the record.
The default is False.
- Returns:
- None.
- local_backups = None¶
- session_str = None¶
- metadata¶
- time_zero = None¶
float: Timestamp corresponding to the start of the experiment
- auto_timestamps = True¶
bool: If True, automatically assign timestamps to data points when update() is called.
- minlen = 0¶
int: Minimum length of data record. Used to ensure recent history is available to measurement program.
- file_counter = 0¶
int: Counts the number of files that have been written.
- stage_everything = False¶
- metadata_file_name¶
- output(write_recent_data: bool = False)¶
Write data to a file. Also writes metadata.
- Parameters:
- write_recent_databool, optional
If False, the most recent self.minlen samples are retained in the record and not written to file.
If True, all data in the record are written to file, so that the record is empty after this method is called.
The default is False.
- Returns:
- None.
- get_time()¶
Get relative time.
- Returns:
- float
current relative time
- init_timer(time_zero: float = None)¶
Set time = 0 for relative timestamps.
- Parameters:
- time_zero: float, optional
Timestamp of time = 0. If None, use the result of time.time(). The default is None.
- Returns:
- None.
- init_metadata(meas_name: str = None)¶
Populate self.metadata with the default metadata.
- Parameters:
- meas_name: str, optional
If specified, output files will be named meas_name_x_y.csv, where x is a number (1,2,3,…) that is appended if necessary to avoid overwriting existing data, and y is the index of the data file.
If not specified (None), meas_name defaults to the date in YYYYMMDD format.
The default is None.
- Returns:
- None.
- add_instruments(instr: dict)¶
Add instruments to metadata.
- Parameters:
- instr: dict
Dictionary with columns as keys and instruments as values. Each instrument is expected to have an info_dict attribute that is itself a dictionary. If ‘range’ is not in info_dict, it will not be saved.
- add_file()¶
Increments file counter and maintains metadata.
Determines name of next output file and adds the file to self.metadata with key “file_x”, where x is the current value of file_counter.
- Returns:
- file_name: str
Name of next output file.
- output_metadata(use_backup: bool = False)¶
Write metadata to a file.
- Parameters:
- use_backup: bool, optional
If True, saves to local backup (~/rminstr/self.session_str). The default is False.
- Returns:
- None.
- check_missing_files() list[pathlib.Path]¶
Check if any files are missing from the target directory.
- Returns:
- list[Path]
List of paths expected to be in target directory.
- clean_backups()¶
Clean up backups at the end of an experiment.
Combines the backup directory with the target directory. If it can't, adds a note to the metadata that this step failed, which will notify users who later try to read the data.
- Returns:
- None.
- class rminstr.data_structures.LegacyRecord(data_file_path: str, output_dir: str, maxlen: int, minlen: int, missing_data: str = '', sep: str = ',')¶
Bases: DataRecord
LegacyRecord is derived from DataRecord.
Similar to ExistingRecord, it reads data from CSV files. Unlike ExistingRecord, it can read any CSV file, not just ones written by an ActiveRecord object.
Initialize LegacyRecord object.
- Parameters:
- data_file_path: str
Path to first data file relative to output_dir. The data columns are inferred from this file’s header.
- output_dir: str
Working directory.
- maxlen: int
Maximum length of record.
- minlen: int
Minimum length of record.
- missing_data: str, optional
String that represents missing data in the input file. The default is ‘’.
- sep: str, optional
Delimiter in input files. The default is ‘,’.
- Returns:
- None.
- missing_data = ''¶
- sep = ','¶
str: delimiter for output CSV files.
- output_dir¶
str: Directory where data are stored.
- time_zero = None¶
float: Timestamp corresponding to the start of the experiment
- current_file = None¶
- missing = ''¶
- metadata¶
- file_counter = 0¶
int: Counts the number of files that have been written.
- files = []¶
- file_index = 0¶
- file¶
- read_header(data_file_path: str) list[str]¶
Read header.
- Parameters:
- data_file_path: str
Full path to data file.
- Returns:
- list[str]
List of column names.
- add_file(file_name: str)¶
Append an input file.
- Parameters:
- file_name: str
Full path to data file.
- Returns:
- None.
- read_next_line() bool¶
Read one line of data from the input file and add it to the record.
- Returns:
- bool
True if the line was successfully read.
- rminstr.data_structures.kendall_p(v: DataArray) float¶
Determine whether the thermopile readings are stable.
See NIST Tech Note 1374 for more details.
- Parameters:
- v: DataArray
Vector of voltage measurements.
- Returns:
- float
kendall_p: A number between 0 and 1. Small numbers (less than 0.25) show a decreasing trend, while large numbers (greater than 0.75) show an increasing trend.
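A statistic with the properties described above can be built from pairwise comparisons, in the spirit of Kendall's tau. The sketch below is not necessarily the formula used in NIST Tech Note 1374 or in kendall_p; the function concordance_fraction is hypothetical and simply returns the fraction of ordered pairs in which the later value is larger.

```python
from itertools import combinations


def concordance_fraction(v):
    """Fraction of (earlier, later) pairs where the later value is larger.

    Ties are ignored. Without ties, Kendall's tau equals 2 * p - 1.
    """
    # combinations() keeps input order, so a always precedes b in time.
    pairs = [(a, b) for a, b in combinations(v, 2) if a != b]
    if not pairs:
        return 0.5  # no information about a trend
    increasing = sum(1 for a, b in pairs if b > a)
    return increasing / len(pairs)


print(concordance_fraction([1.0, 2.0, 3.0, 4.0]))  # steadily increasing
print(concordance_fraction([4.0, 3.0, 2.0, 1.0]))  # steadily decreasing
print(concordance_fraction([1.0, 3.0, 2.0, 4.0]))  # mostly increasing
```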
- rminstr.data_structures.runs_statistic(v: DataArray) float¶
Determine whether the thermopile readings are stable.
See NIST Tech Note 1374 for more details.
- Parameters:
- v: DataArray
Vector of voltage measurements.
- Returns:
- float
Z-score based on the number of “runs” (consecutive points with the same trend). A value of 0 corresponds to the expected number of runs in random data, and small numbers mean long runs (not random).
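A Z-score of this kind can be computed with the classical runs up-and-down test, where a run is a maximal stretch of consecutive increases or decreases. Whether runs_statistic uses exactly this test is an assumption; the function runs_z_score is hypothetical. For n observations of random data, the expected number of runs is (2n - 1)/3 with variance (16n - 29)/90.

```python
import math


def runs_z_score(v):
    """Z-score of the number of up/down runs (runs up-and-down test)."""
    # Sign of each step; equal neighbors carry no trend and are skipped.
    signs = [1 if b > a else -1 for a, b in zip(v, v[1:]) if b != a]
    if len(signs) < 2:
        raise ValueError("need at least three distinct consecutive points")
    n = len(signs) + 1
    # A new run starts every time the step direction changes.
    runs = 1 + sum(1 for s0, s1 in zip(signs, signs[1:]) if s0 != s1)
    expected = (2 * n - 1) / 3
    variance = (16 * n - 29) / 90
    return (runs - expected) / math.sqrt(variance)


drifting = [float(i) for i in range(10)]        # one long run
alternating = [float(i % 2) for i in range(10)]  # a new run at every step
print(runs_z_score(drifting))
print(runs_z_score(alternating))
```

A strongly negative score, as for the drifting series, indicates fewer and longer runs than random data would produce.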
- class rminstr.data_structures.ExptParameters(config_files: str | list[str], run_settings_file: str = None, initial_dict: dict = None, config_file_priority: list[int] = None, header: int = 0, columns: list[str] = None, column_types: list[str] = None)¶
The ExptParameters class keeps track of experimental settings.
Users specify two files used to describe how a program will run.
config_file: contains information that never changes over the duration of an experiment, e.g., instrument GPIB addresses or the VNA IFBW. See below for formatting information.
run_settings_file: a spreadsheet of values specifying measurements that occur sequentially, e.g., frequencies and power levels. Each row represents a measurement. Each column represents a variable.
The ExptParameters object can be read like a dictionary. For example:
params = ExptParameters(r"C:\examples\config.csv", "run.csv")
print(params["VNA_IFBW"])
These variables are read-only, so
params["VNA_IFBW"] = 10
does not work.
For run settings, the ExptParameters object keeps track of the index of the current data point. So, params[“Frequency”] will return a number, rather than a list. To get the next Frequency point, call params.advance().
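The dictionary-style, read-only access and the advancing row index can be sketched with a small class. ReadOnlyParams is a hypothetical stand-in that takes dictionaries instead of reading files; it illustrates the access pattern and is not how ExptParameters is implemented.

```python
class ReadOnlyParams:
    """Hypothetical stand-in: read-only lookup over config and run settings."""

    def __init__(self, config: dict, run_settings: dict):
        self._config = dict(config)
        self._run = {key: list(column) for key, column in run_settings.items()}
        self.index = 0
        first_column = next(iter(self._run.values()), [])
        self.num_steps = len(first_column)

    def __getitem__(self, key):
        # Run settings return the value for the current row only.
        if key in self._run:
            return self._run[key][self.index]
        return self._config[key]

    # No __setitem__ is defined, so item assignment raises TypeError.

    def advance(self) -> None:
        """Move to the next row of the run settings."""
        self.index += 1

    def complete(self) -> bool:
        """True once every row has been consumed."""
        return self.index >= self.num_steps


params = ReadOnlyParams({"VNA_IFBW": 100}, {"Frequency": [1e9, 2e9]})
first = params["Frequency"]
params.advance()
second = params["Frequency"]
params.advance()
done = params.complete()
print(first, second, done)
```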
Initialize an ExptParameters object.
- Parameters:
- config_files: Union[str, list[str]]
If the value is a string, it is interpreted as the path to the config file.
If a list of strings is given, each entry is interpreted as the path to a config file. The config files are read in the order given in the list. If the same setting exists in multiple config files, use config_file_priority to determine which setting to use.
- run_settings_file: str, optional
Full path to the run settings file. The default is None.
- initial_dict: dict, optional
A dictionary used to initialize the config before adding additional CSV files. The default is None.
- config_file_priority: list[int], optional
Indicates the priority of the corresponding (by order) config file. Lower numbers correspond to higher priority. If None, all files have equal priority. The default is None.
- header: int, optional
Number of lines to discard when reading run_settings. The default is 0.
- columns: list[str], optional
Names of columns in the run settings file. If None, first check the config file. If that fails, check the run file for column names. The default is None.
- column_types: list[str], optional
Data types for columns in run settings. If None, first check the config file. If that fails, assume everything is a float. The default is None.
- Returns:
- None.
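Priority-based merging of several config dictionaries might look like the sketch below. The function merge_configs is hypothetical, and the tie-breaking rule (among files of equal priority, a later file overrides an earlier one) is an assumption that the documentation above does not confirm.

```python
def merge_configs(configs, priorities=None):
    """Merge dictionaries so that lower priority numbers win conflicts."""
    if priorities is None:
        priorities = [0] * len(configs)  # all files have equal priority
    # Apply the lowest-priority (largest number) file first so that
    # higher-priority files overwrite it afterwards.
    order = sorted(range(len(configs)), key=lambda i: priorities[i], reverse=True)
    merged = {}
    for i in order:
        merged.update(configs[i])
    return merged


site = {"VNA_IFBW": 100, "GPIB_address": 16}
local = {"VNA_IFBW": 10}

print(merge_configs([site, local], priorities=[1, 2]))  # site wins
print(merge_configs([site, local], priorities=[2, 1]))  # local wins
```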
- columns = None¶
list[str]: the keys of run settings
- column_types = None¶
list[str]: the types of data stored in columns of run settings.
- index = None¶
int: Tracks progress through the steps given in run settings.
- num_steps = None¶
int: number of rows in run settings file
- run_settings = None¶
dict[str, list]: stores run settings
- add_config_files(config_files)¶
Make a dictionary from config files.
- Parameters:
- config_files: str or list(str)
Path or a list of paths to the config files to load.
- Returns:
- dict
Dictionary of config files
- load_run_settings(in_file, header: int = 0)¶
Load run settings.
- Parameters:
- in_file: str
Full path to the run file.
- header: int, optional
Number of lines to ignore at the top of the file. The default is 0.
- Returns:
- None.
- Raises:
- ValueError
If a run settings column is duplicated in the config file.
- save_run_settings(file_name: str)¶
Save run settings to file as csv.
- Parameters:
- file_name: str
Full path to file.
- Returns:
- None.
- save_config(file_name: str, output_format='CSV')¶
Save config to file as CSV.
- Parameters:
- file_name: str
Full path to file.
- output_format: str, optional
The format of the file. The only option is CSV. The default is CSV.
- Returns:
- None.
- get_column(column)¶
Get a column from run_settings.
- Parameters:
- column: str
Column to return.
- Returns:
- None.
- advance()¶
Advance to the next row in run file.
- Returns:
- None.
- complete()¶
Test if run is complete.
Returns True if advance() has been called enough times to completely iterate through the run settings file, so that no more rows are available.
- Returns:
- bool
True if complete
- keys()¶
Get keys.
- exception rminstr.data_structures.ExptParametersReadError(message)¶
Bases: Exception
Exception raised for reading ExptParameters.
- Attributes:
message – explanation of the error
- message¶
- rminstr.data_structures.get_config_as_dictionary(config_files)¶
- rminstr.data_structures.save_dictionary_as_config(dictionary, path)¶