Function Reference

This appendix contains links to all documented functions included as part of the DIMSpec toolkit. As is common in R packages, not all functions are documented, but most are. Functions referenced in the rest of this user guide are linked directly to their entry on this page. Click any function in this table of contents to open its documentation.

DIMSpec Help Index

R Documentation
activate_py_env Activate a python environment
active_connection Is a connection object still available?
add_help Attach a superscript icon with a bsTooltip to an HTML element
add_normalization_value Add value(s) to a normalization table
add_or_get_id Utility function to add a record
add_rdkit_aliases Add fragment or compound aliases generated by RDKit functions
adduct_formula Add Adduct to Formula
api_endpoint Build an API endpoint programmatically
api_open_doc Open Swagger API documentation
api_reload Reloads the plumber API
api_start Start the plumber API
api_stop Stop the plumber API
append_icon_to Create the JS to append an icon to an HTML element by its ID
bootstrap_compare_ms Calculate dot product match score using bootstrap data
build_db Build or rebuild the database from scratch
build_db_action Build an escaped SQL query
build_triggers Build pairs of INSERT/UPDATE triggers to resolve foreign key relationships
build_views Build SQL to create views on normalized tables in SQLite
calculate.monoisotope Calculate the monoisotopic mass of an elemental formula list
check_for_value Check for a value in a database table
check_fragments Determine number of matching fragments between unknown mass spectrum and specific peaks
check_isotopedist Compare Isotopic Pattern to simulated pattern
check_mzML_convert Check mzML file for specific MSConvert parameters
clause_where Build a WHERE clause for SQL statements
close_up_shop Conveniently close all database connections
compare_ms Calculate dot product match score
complete_form_entry Ensure complete form entry
create_fallback_build Create an SQL file for use without the SQLite CLI
create_peak_list Create peak list from SQL ms_data table
create_peak_table_ms1 Create peak table for MS1 data
create_peak_table_ms2 Create peak table for MS2 data
create_py_env Create a python environment for RDKit
create_search_df Create data.frame containing parameters for extraction and searching
create_search_ms Generate uncertainty mass spectrum for MS1 and MS2 data
data_dictionary Create a data dictionary
dataframe_match Match multiple values in a database table
dotprod Calculate dot product
dt_color_by Apply colors to DT objects by value in a column
dt_formatted Easily format multiple DT objects in a shiny project in the same manner
er_map Create a simple entity relationship map
export_msp Export to MSP
extend_suspect_list Extend the compounds and aliases tables
extract.elements Elemental Formula Functions
flush_dir Flush a directory with archive
fn_guide View an index of help documentation in your browser
fn_help Get function documentation for this project
format_id Format a file name as an HTML element ID
format_list_of_names Grammatically collapse a list of values
formulalize Generate standard chemical formula notation
full_import Import one or more files from the NIST Method Reporting Tool for NTA
gather_qc Quality Control Check of Import Data
get_annotated_fragments Get all annotated fragments that have matching masses
get_component Resolve components from a list or named vector
get_compound_fragments Get all fragments associated with compounds
get_compoundid Get compound ID and name for specific peaks
get_fkpk_relationships Extract foreign key relationships from a schema
get_massadj Calculate the mass adjustment for a specific adduct
get_msconvert_data Extract msconvert metadata
get_msdata Get all mass spectral data within the database
get_msdata_compound Get all mass spectral data for a specific compound
get_msdata_peakid Get all mass spectral data for a specific peak id
get_msdata_precursors Get all mass spectral data with a specific precursor ion
get_opt_params Get optimized uncertainty mass spectra parameters for a peak
get_peak_fragments Get annotated fragments for a specific peak
get_peak_precursor Get precursor ion m/z for a specific peak
get_sample_class Get sample class information for specific peaks
get_search_object Generate msdata object from input peak data
get_suspectlist Get the current NIST PFAS suspect list.
get_ums Generate consensus mass spectrum
get_uniques Get unique components of a nested list
getcharge Get polarity of a ms scan within mzML object
getmslevel Get MS Level of a ms scan within mzML object
getmzML Brings raw data file into environment
getprecursor Get precursor ion of a ms scan within mzML object
gettime Get time of a ms scan within mzML object
has_missing_elements Simple check for if an object is empty
is_elemental_match Checks if two elemental formulas match
is_elemental_subset Check if elemental formula is a subset of another formula
isotopic_distribution Isotopic distribution functions
lockmass_remove Remove lockmass scan from mzml object
log_as_dataframe Pull a log file into an R object
log_fn Simple logging convenience
log_it Conveniently log a message to the console
make_acronym Simple acronym generator
make_install_code Convenience function to set a new installation code
make_requirements Make import requirements file
manage_connection Check for, and optionally remove, a database connection object
map_import Map an import file to the database schema
mode_checks Get list of available functions
molecule_picture Picture a molecule from structural notation
monoisotope.list Calculate the monoisotopic mass of elemental formulas in a list
ms_plot_peak Plot a peak from database mass spectral data
ms_plot_peak_overview Create a patchwork plot of peak spectral properties
ms_plot_spectra Plot a fragment map from database mass spectral data
ms_plot_spectral_intensity Create a spectral intensity plot
ms_plot_titles Consistent titles for ms_plot_x functions
ms_spectra_separated Parse “Separated” MS Data
ms_spectra_zipped Parse “Zipped” MS Data
mzMLconvert Converts a raw file into an mzML
mzMLtoR Opens file of type mzML into R environment
nist_shinyalert Call [shinyalert::shinyalert] with specific styling
obj_name_check Sanity check for environment object names
open_env Convenience shortcut to open and edit session environment variables
open_proj_file Open and edit project files
optimal_ums Get the optimal uncertainty mass spectrum parameters for data
overlap Calculate overlap ranges
pair_ums Pairwise data.frame of two uncertainty mass spectra
peak_gather_json Extract peak data and metadata
plot_compare_ms Plot MS Comparison
plot_ms Generate consensus mass spectrum
pool.sd Pool standard deviations
pool.ums Pool uncertainty mass spectra
pragma_table_def Get table definition from SQLite
pragma_table_info Explore properties of an SQLite table
py_modules_available Are all conda modules available in the active environment
rdkit_active Sanity check on RDKit binding
rdkit_mol_aliases Create aliases for a molecule from RDKit
read_log Read a log from a log file
rebuild_helps Rebuild the help files as HTML with an index
rectify_null_from_env Rectify NULL values provided to functions
ref_table_from_map Get the name of a linked normalization table
remove_db Remove an existing database
remove_icon_from Remove the last icon attached to an HTML element
remove_sample Delete a sample
repair_xl_casrn_forced_to_date Repair CAS RNs forced to a date numeric by MSXL
repl_nan Replace NaN
report_qc Export QC result JSON file into PDF
reset_logger_settings Update logger settings
resolve_compound_aliases Resolve compound aliases provided as part of the import routine
resolve_compound_fragments Link together peaks, fragments, and compounds
resolve_compounds Resolve the compounds node during bulk import
resolve_description_NTAMRT Resolve the method description tables during import
resolve_fragments_NTAMRT Resolve the fragments node during database import
resolve_method Add an ms_method record via import
resolve_mobile_phase_NTAMRT Resolve the mobile phase node
resolve_ms_data Resolve and store mass spectral data during import
resolve_ms_spectra Unpack mass spectral data in compressed format
resolve_multiple_values Utility function to resolve multiple choices interactively
resolve_normalization_value Resolve a normalization value against the database
resolve_peak_ums_params Resolve and import optimal uncertain mass spectrum parameters
resolve_peaks Resolve the peaks node during import
resolve_qc_data_NTAMRT Resolve and import quality control data for import
resolve_qc_methods_NTAMRT Resolve and import quality control method information
resolve_sample Add a sample via import
resolve_sample_aliases Resolve and import sample aliases
resolve_software_settings_NTAMRT Import software settings
resolve_table_name Check presence of a database table
save_data_dictionary Save the current data dictionary to disk
search_all Search all mass spectra within database against unknown mass spectrum
search_precursor Search the database for all compounds with matching precursor ion m/z values
setup_rdkit Conveniently set up an RDKit python environment for use with R
sigtest Significance testing function
smilestoformula Convert SMILES string to Formula and other information
sql_to_msp Export SQL database to the MSP (NIST MS) format
sqlite_auto_trigger Create a basic SQL trigger for handling foreign key relationships
sqlite_auto_view Create a basic SQL view of a normalized table
sqlite_parse_build Parse SQL build statements
sqlite_parse_import Parse SQL import statements
start_api Start the plumber interface from a clean environment
start_app WIP Launch a shiny application
start_rdkit Start the RDKit integration
summarize_check_fragments Summarize results of check_fragments function
support_info R session information for support needs
suspectlist_at_NIST Open the NIST PDR entry for the current NIST PFAS suspect list
table_msdata Tabulate MS Data
tack_on Append additional named elements to a list
tidy_comments Tidy up table and field comments
tidy_ms_spectra Tidy Spectra
tidy_spectra Decompress Spectra
unzip Unzip binary data into vector
update_all Convenience function to rebuild all database related files
update_data_sources Dump current database contents
update_env_from_file Update a conda environment from a requirements file
update_logger_settings Update logger settings
user_guide Launch the User Guide for DIMSpec
valid_file_format Ensure files uploaded to a shiny app are of the required file type
validate_casrns Validate a CAS RN
validate_column_names Ensure database column presence
validate_tables Ensure database table presence
verify_args Verify arguments for a function
verify_import_columns Verify column names for import
verify_import_requirements Verify an import file’s properties
with_help Convenience application of ‘add_help’ using pipes directly in ‘UI.R’

activate_py_env R Documentation

Activate a python environment

Description

Programmatically setting up python bindings is a bit more convoluted than in a standard script. Given the name of a Python environment, it either (1) checks the provided ‘env_name’ against currently installed environments and binds the current session to it if found OR (2) installs a new environment with [create_py_env] and activates it by calling itself.

Usage

activate_py_env(
  env_name = NULL,
  required_libraries = NULL,
  required_modules = NULL,
  log_ns = NULL,
  conda_path = NULL
)

Arguments

env_name

CHR scalar of a python environment name to bind. The default, NULL, will look for an environment variable named ‘PYENV_NAME’

required_libraries

CHR vector of python libraries to include in the environment, if building a new environment. Ignored if ‘env_name’ is an existing environment. The default, NULL, will look for an environment variable named ‘PYENV_LIBRARIES’.

required_modules

CHR vector of modules to be checked for availability once the environment is activated. The default, NULL, will look for an environment variable named ‘PYENV_MODULES’.

log_ns

CHR scalar of the logging namespace to use, if any.

Details

It is recommended that project variables in ‘../config/env_py.R’ and ‘../config/env_glob.txt’ be used to control most of the behavior of this function. This works with both virtual and conda environments, though creation of new environments is done in conda.

Value

LGL scalar of whether or not activation was successful

Note

Where parameters are NULL, [rectify_null_from_env] will be used to get an associated value if one exists.
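
Examples

A minimal sketch; the environment name and module list shown here are hypothetical and would normally come from ‘PYENV_NAME’ and ‘PYENV_MODULES’.

# Bind the session to an existing environment, creating it if absent
activate_py_env(
  env_name = "my_rdkit_env",      # hypothetical environment name
  required_modules = c("rdkit")
)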


active_connection R Documentation

Is a connection object still available?

Description

This is a thin wrapper for [DBI::dbIsValid] with some error logging.

Usage

active_connection(db_conn = con)

Arguments

db_conn

connection object (default “con”)

Value

LGL scalar indicating whether the database is available
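
Examples

A usage sketch, assuming a connection object named "con" exists in the session.

# Check the default connection before issuing queries
if (active_connection(con)) {
  message("Database connection is live.")
}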


add_help R Documentation

Attach a superscript icon with a bsTooltip to an HTML element

Description

Attach a superscript icon with a bsTooltip to an HTML element

Usage

add_help(
  id,
  tooltip,
  icon_name = "question",
  size = "xs",
  icon_class = "info-tooltip primary",
  ...
)

Arguments

id

CHR scalar of the HTML ID to which to append the icon

tooltip

CHR scalar of the tooltip text

icon_name

CHR scalar of the icon name, which must be understandable by the ‘shiny::icon’ function; e.g. a font-awesome icon name (default: “question”).

size

CHR scalar of the general icon size as understandable by the font-awesome library (default: “xs”)

icon_class

CHR vector of classes to apply to the ‘<sup>’ container, as defined in the current CSS (default: “info-tooltip primary”)

...

Other named arguments to be passed to ‘shinyBS::bsTooltip’

Value

LIST of HTML tags for the desired help icon and its tooltip

Note

The following CSS is typically defined to go with this:

.info-tooltip { opacity: 30%; transition: opacity .25s; }

.info-tooltip:hover { opacity: 100%; }

.primary { color: #3c8dbc; }

Examples

add_help("example", "a tooltip")


add_normalization_value R Documentation

Add value(s) to a normalization table

Description

One of the most common database operations is to look up or add a value in a normalization table. This utility function adds a single value and returns its associated id by using [build_db_action]. This is only suitable for a single value. If you need to bulk add multiple new values, use this with something like [lapply].

Usage

add_normalization_value("norm_table", name = "new value", acronym = "NV")

Arguments

db_table

CHR scalar of the normalization table’s name

db_conn

connection object (default “con”)

log_ns

CHR scalar of the logging namespace to use during execution (default: “db”)

id_column

CHR scalar of the column to use as the primary key identifier for ‘db_table’ (default: “id”)

database_map

LIST of the database entity relationship map, typically from calling [er_map]. If NULL (default) the object “db_map” will be searched for and used if available, otherwise it will be created with [er_map]

...

CHR vector of additional named arguments to be added; names not appearing in the referenced table will be ignored

Value

NULL if unable to add the values, INT scalar of the new ID otherwise
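
Examples

A sketch of the bulk-addition pattern suggested in the description; the table name "norm_solvents" and the values are hypothetical.

# Add several values one at a time, collecting the resulting IDs
new_ids <- lapply(
  c("methanol", "acetonitrile"),
  function(x) add_normalization_value("norm_solvents", name = x)
)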


add_or_get_id R Documentation

Utility function to add a record

Description

Checks a table in the attached SQL connection for a primary key ID matching the provided ‘values’ and returns the ID. If none exists, adds a record and returns the resulting ID if successful. Values should be provided as a named vector of the values to add. No data coercion is performed, relying almost entirely on the database schema or preprocessing to ensure data integrity.

Usage

add_or_get_id(
  db_table,
  values,
  db_conn = con,
  ensure_unique = TRUE,
  require_all = TRUE,
  ignore = FALSE,
  log_ns = "db"
)

Arguments

db_table

CHR scalar name of the database table being modified

values

named vector of the values being added, passed to [build_db_action]

db_conn

connection object (default: con)

ensure_unique

LGL scalar of whether or not to first check that the values provided form a new unique record (default: TRUE)

require_all

LGL scalar of whether to require all columns (except the assumed primary key column of “id”) or only those defined as “NOT NULL” (default: TRUE requires the presence of all columns in the table)

ignore

LGL scalar on whether to treat the insert try as an “INSERT OR IGNORE” SQL statement (default: FALSE)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

Details

Provided values are checked against required columns in the table using [verify_import_columns].

Operations to add the record and get the resulting ID are both performed with [build_db_action] and are performed virtually back to back with the latest-added ID being given preference in cases where added values may match multiple extant records.

Value

INT scalar of the record identifier

Note

If this is used in high volume/traffic applications, ID conflicts may occur if the timing is such that another record containing identical values is added before the call getting the ID completes.
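
Examples

A sketch assuming a hypothetical "contributors" table with "first_name" and "last_name" columns.

# Returns the matching ID if the record exists, otherwise inserts it
id <- add_or_get_id(
  db_table = "contributors",
  values = c(first_name = "Jane", last_name = "Smith"),
  require_all = FALSE
)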


add_rdkit_aliases R Documentation

Add fragment or compound aliases generated by RDKit functions

Description

Aliases are stored for both compounds and fragments within the database to facilitate search and unambiguous identification. Given one molecular structure notation (SMILES is preferred), other machine-readable expressions can be generated quickly. Requested aliases as provided to ‘rdkit_aliases’ will be prefixed by ‘mol_to_prefix’ and checked against the namespace of available functions in RDKit, with the correct functions assigned automatically.

Usage

add_rdkit_aliases(
  identifiers,
  alias_category = c("compounds", "fragments"),
  compound_aliases_table = "compound_aliases",
  fragment_aliases_table = "fragment_aliases",
  inchi_prefix = "InChI=1S/",
  rdkit_name = ifelse(exists("PYENV_NAME"), PYENV_NAME, "rdkit"),
  rdkit_ref = ifelse(exists("PYENV_REF"), PYENV_REF, "rdk"),
  rdkit_ns = "rdk",
  rdkit_make_if_not = TRUE,
  rdkit_aliases = c("inchi", "inchikey"),
  mol_to_prefix = "MolTo",
  mol_from_prefix = "MolFrom",
  type = "smiles",
  as_object = TRUE,
  db_conn = con,
  log_ns = "rdk"
)

Arguments

identifiers

CHR vector of machine readable notations in ‘type’ format

alias_category

CHR scalar, one of “compounds” or “fragments” to determine where in the database to store the resulting aliases (default: “compounds”)

compound_aliases_table

CHR scalar name of the database table holding compound aliases (default: “compound_aliases”)

fragment_aliases_table

CHR scalar name of the database table holding fragment aliases (default: “fragment_aliases”)

inchi_prefix

CHR scalar prefix for the InChI code to use, if InChI is requested as part of ‘rdkit_aliases’

rdkit_name

CHR scalar name of the python environment at which RDKit is installed (default: is the session variable PYENV_NAME or “rdkit”)

rdkit_ref

CHR scalar name of the R pointer object to RDKit (default: is the session variable PYENV_REF or “rdk”)

rdkit_ns

CHR scalar name of the logging namespace to use (default: “rdk”); will be ignored if logging is off

rdkit_make_if_not

LGL scalar of whether to create an RDKit environment if it does not exist (default: TRUE)

rdkit_aliases

CHR vector of machine-readable aliases to generate, which must be recognizable as names in the RDKit namespace when prefixed by ‘mol_to_prefix’ (default: c(“inchi”, “inchikey”)); these are not case sensitive

mol_to_prefix

CHR scalar of the prefix identifying alias creation functions, which must be recognizable as names in the RDKit namespace when suffixed by ‘rdkit_aliases’ (default: “MolTo”); this is not case sensitive

mol_from_prefix

CHR scalar of the prefix identifying molecule expression creation functions, which must be recognizable as names in the RDKit namespace when suffixed by ‘type’ (default: “MolFrom”); this is not case sensitive

type

CHR scalar indicating the type of ‘identifiers’ to be converted to molecule notation (default: “smiles”); this is not case sensitive

as_object

LGL scalar indicating whether to return the alias list to the session as an object (default: TRUE) or write aliases to the database (FALSE)

db_conn

connection object (default: con)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

Value

If ‘as_object’ == TRUE, a data.frame of the generated aliases, otherwise no return and a database insertion will be performed

Note

It is not recommended to change the defaults here unless you are familiar with the naming conventions of RDKit.

Requires both INFORMATICS and USE_RDKIT set to TRUE in the session and a valid installation of the RDKit python environment to function.

See the RDKit Documentation for more details.
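
Examples

A sketch assuming an active RDKit environment; the SMILES string (perfluorooctanoic acid) is illustrative.

# Generate InChI and InChIKey aliases and return them as an object
aliases <- add_rdkit_aliases(
  identifiers = "OC(=O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F",
  alias_category = "compounds",
  as_object = TRUE
)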


adduct_formula R Documentation

Add Adduct to Formula

Description

Add Adduct to Formula

Usage

adduct_formula(elementalformula, adduct = "+H")

Arguments

elementalformula

character string elemental formula

adduct

character string adduct state to add to the elemental formula, must contain an element, options are ‘+H’, ‘-H’, ‘+Na’, ‘+K’

Value

character string containing elemental formula with adduct

Examples

adduct_formula("C2H5O", adduct = "+H")

api_endpoint R Documentation

Build an API endpoint programmatically

Description

This is a convenience function intended to support plumber endpoints. It only assists in constructing (and, if ‘execute’ == TRUE, executing) endpoints; you must still know which endpoints exist and what they expect. Validity checking, execution, and opening in a web browser are supported. Invalid endpoints will not be executed or opened for viewing.

Usage

api_endpoint(
  path,
  ...,
  server_addr = PLUMBER_URL,
  check_valid = TRUE,
  execute = TRUE,
  open_in_browser = FALSE,
  raw_result = FALSE,
  max_pings = 20L,
  return_type = c("text", "raw", "parsed"),
  return_format = c("vector", "data.frame", "list")
)

Arguments

path

CHR scalar of the endpoint path.

...

Additional named parameters added to the endpoint, most typically the query portion. If only one is provided, it can remain unnamed and a query is assumed. If more than one is provided, all must be named. Named elements must be components of the return from [httr::parse_url] (see https://tools.ietf.org/html/rfc3986 for details of the parsing algorithm); unrecognized elements will be ignored.

server_addr

CHR scalar uniform resource locator (URL) address of an API server (e.g. “https://myapi.com:8080”) (defaults to the current environment variable “PLUMBER_URL”)

check_valid

LGL scalar on whether or not to first check that an endpoint returns a valid status code (200-299) (default: TRUE).

execute

LGL scalar of whether or not to execute the constructed endpoint and return the result; will be defaulted to FALSE if ‘check_valid’ == TRUE and the endpoint returns anything other than a valid status code. (default: TRUE)

open_in_browser

LGL scalar of whether or not to open the resulting endpoint in the system’s default browser; will be defaulted to FALSE if ‘check_valid’ == TRUE and the endpoint returns anything other than a valid status code. (default: FALSE)

max_pings

INT scalar maximum number of pings to try before timeout if using endpoint “_ping”; this is only used for endpoint “_ping” (default: 20)

return_type

CHR scalar on which return type to use, which must be one of “text”, “raw”, or “parsed” which will be used to read the content of the response item (default: “text”)

return_format

CHR scalar on which form to return data, which must be one of “vector”, “data.frame”, or “list” (default: “vector” to support primarily single value responses)

Value

CHR scalar of the constructed endpoint, with messages regarding status checks, return from the endpoint (typically JSON) if valid and ‘execute’ == TRUE, or NONE if ‘open_in_browser’ == TRUE

Note

Special support is provided for the way in which the NIST Public Data Repository treats URL fragments

This only supports [httr::GET] requests.

Examples

api_endpoint("https://www.google.com/search", list(q = "something"), open_in_browser = TRUE)
api_endpoint("https://www.google.com/search", query = list(q = "NIST Public Data Repository"), open_in_browser = TRUE)

api_open_doc R Documentation

Open Swagger API documentation

Description

This will launch the Swagger UI in a browser tab. The URL suffix “docs” will be automatically added if not part of the host URL accepted as ‘url’.

Usage

api_open_doc(url = PLUMBER_URL)

Arguments

url

CHR URL/URI of the plumber documentation host (default: environment variable “PLUMBER_URL”)

Value

None, opens a browser to the requested URL


api_reload R Documentation

Reloads the plumber API

Description

Depending on system architecture, the plumber service may take some time to spin up and spin down. If ‘background’ is TRUE, this may mean the calling R thread runs ahead of the background process resulting in unexpected behavior (e.g. newly defined endpoints not being available), effectively binding it to the prior iteration. If the API does not appear to be reloading properly, it may be necessary to manually kill the process controlling it through your OS and to call this function again.

Usage

api_reload(
  pr = NULL,
  background = TRUE,
  plumber_file = NULL,
  on_host = NULL,
  on_port = NULL,
  log_ns = "api"
)

Arguments

pr

CHR scalar name of the plumber service object, typically only created as a background observer from [callr::r_bg] as a result of calling [api_reload] (default: NULL gets the environment setting for PLUMBER_OBJ_NAME)

background

LGL scalar of whether to load the plumber server as a background service (default: TRUE); set to FALSE for testing

plumber_file

CHR scalar of the path to a plumber API to launch (default: NULL)

on_host

CHR scalar of the host IP address (default: NULL)

on_port

CHR or INT scalar of the host port to use (default: NULL)

log_ns

CHR scalar namespace to use for logging (default: “api”)

Value

Launches the plumber API service on your local machine and returns the URL on which it can be accessed as a CHR scalar


api_start R Documentation

Start the plumber API

Description

This is a wrapper to [plumber::pr_run] pointing to a project’s opinionated plumber settings with some error trapping. The host, port, and plumber file are set in the “config/env_R.R” location as PLUMBER_HOST, PLUMBER_PORT, and PLUMBER_FILE respectively.

Usage

api_start(plumber_file = NULL, on_host = NULL, on_port = NULL)

Arguments

plumber_file

CHR scalar of the path to a plumber API to launch (default: NULL)

on_host

CHR scalar of the host IP address (default: NULL)

on_port

CHR or INT scalar of the host port to use (default: NULL)

Value

LGL scalar with success status

Note

If either of ‘on_host’ or ‘on_port’ are NULL they will default first to any existing environment values of PLUMBER_HOST and PLUMBER_PORT, then to getOption(“plumber.host”, “127.0.0.1”) and getOption(“plumber.port”, 8080)

This will fail if the requested port is in use.
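
Examples

A sketch; with no arguments the host, port, and plumber file fall back to the session environment values described above. The explicit port is illustrative.

# Start with project defaults
api_start()

# Or override the port directly
api_start(on_port = 8181)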


api_stop R Documentation

Stop the plumber API

Description

Stop the plumber API

Usage

api_stop(pr = NULL, flush = TRUE, db_conn = "con", remove_service_obj = TRUE)

Arguments

pr

CHR scalar name of the plumber service object, typically only created as a background observer from [callr::r_bg] as a result of calling [api_reload] (default: NULL gets the environment setting for PLUMBER_OBJ_NAME)

flush

LGL scalar of whether to disconnect and reconnect to a database connection named as ‘db_conn’ (default: TRUE)

db_conn

CHR scalar of the connection object name (default: “con”)

remove_service_obj

LGL scalar of whether to remove the reference to ‘pr’ from the current global environment (default: TRUE)

Value

None, stops the plumber server

Note

This will also kill and restart the connection object if ‘flush’ is TRUE to release connections with certain configurations such as SQLite in write ahead log mode.

This function assumes the object referenced by name ‘pr’ exists in the global environment, and ‘remove_service_obj’ will only remove it from .GlobalEnv.


append_icon_to R Documentation

Create the JS to append an icon to an HTML element by its ID

Description

Create the JS to append an icon to an HTML element by its ID

Usage

append_icon_to(id, icon_name, icon_class = NULL)

Arguments

id

CHR scalar of the HTML ID to which to append an icon

icon_name

CHR scalar of the icon name, which must be understandable by the ‘shiny::icon’ function; e.g. a font-awesome icon name.

icon_class

CHR vector of classes to apply

Value

CHR scalar suitable to execute with ‘shinyjs::runJS’

Examples

append_icon_to("example", "r-project", "fa-3x")


bootstrap_compare_ms R Documentation

Calculate dot product match score using bootstrap data

Description

Calculates the match score (based on dot product) of the two uncertainty mass spectra. To generate a distribution of match scores, bootstrapped data are drawn (using ‘rnorm’ for now) from the uncertainty of the two mass spectra.

Usage

bootstrap_compare_ms(
  ms1,
  ms2,
  error = c(5, 5),
  minerror = c(0.002, 0.002),
  m = 1,
  n = 0.5,
  runs = 10000
)

Arguments

ms1, ms2

the uncertainty mass spectra from function ‘get_ums’

error

a vector of the respective mass error (in ppm) for each mass spectrum or a single value representing the mass error for all m/z values

minerror

a two component vector of the respective minimum mass error (in Da) for each mass spectrum or a single value representing the minimum mass error of all m/z values

m, n

weighting values for mass (m) and intensity (n)

runs

INT scalar of the number of bootstrap iterations to perform (default: 10000)

build_db R Documentation

Build or rebuild the database from scratch

Description

This function will build or rebuild the NIST HRAMS database structure from scratch, removing the existing instance. By default, most parameters are set in the environment (at “./config/env_glob.txt”) but any values can be passed directly. This can be used to quickly spin up multiple copies with a clean slate using different build files, data files, or return to the last stable release.

Usage

build_db(db = "test_db.sqlite", db_conn_name = "test_conn")

Arguments

db

CHR scalar of the database name (default: session value DB_NAME)

build_from

CHR scalar of a SQL build script to use (default: environment value DB_BUILD_FILE)

populate

LGL scalar of whether to populate with data from the file in ‘populate_with’ (default: TRUE)

populate_with

CHR scalar for the populate script (e.g. “populate_demo.sql”) to run after the build is complete (default: session value DB_DATA); ignored if ‘populate = FALSE’

archive

LGL scalar of whether to create an archive of the current database (if it exists) matching the name supplied in argument ‘db’ (default: FALSE), passed to [‘remove_db()’]

sqlite_cli

CHR scalar to use to look for installed sqlite3 CLI tools in the current system environment (default: session value SQLITE_CLI)

connect

LGL scalar of whether or not to connect to the rebuilt database in the global environment as object “con” (default: FALSE)

Details

If sqlite3 and its command line interface are available on your platform, that will be used (preferred method) but, if not, this function will read in all the necessary files to directly create it using shell commands. The shell method may not be universally applicable to certain compute environments or may require elevated permissions.

Value

None, check console for details
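
Examples

A sketch; the file names are illustrative and would normally come from the session environment (DB_NAME, DB_BUILD_FILE, DB_DATA).

# Rebuild a test copy and populate it with the demo data set
build_db(
  db = "test_db.sqlite",
  populate = TRUE,
  populate_with = "populate_demo.sql",
  connect = FALSE
)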


build_db_action R Documentation

Build an escaped SQL query

Description

In most cases, issuing basic SQL queries is made easy by tidyverse compliant functions such as [dplyr::tbl]. Full interaction with an SQLite database is a bit more complicated and typically requires [DBI::dbExecute] and writing SQL directly; several helpers exist for that (e.g. [glue::glue_sql]) but aren’t as friendly or straightforward when writing more complicated actions, and still require directly writing SQL equivalents, routing through [DBI::dbQuoteIdentifier] and [DBI::dbQuoteLiteral] to prevent SQL injection attacks.

Usage

build_db_action("insert", "table", values = list(col1 = "a", col2 = 2,
  col3 = "describe"), execute = FALSE) build_db_action("insert", "table",
  values = list(col1 = "a", col2 = 2, col3 = "describe"))
  
  build_db_action("get_id", "table", match_criteria = list(id = 2))
  
  build_db_action("delete", "table", match_criteria = list(id = 2))
  
  build_db_action("select", "table", columns = c("col1", "col2", "col3"),
  match_criteria = list(id = 2)) build_db_action("select", "table",
  match_criteria = list(sample_name = "sample 123"))
  
  build_db_action("select", "table", match_criteria = list(sample_name =
  list(value = "sample 123", exclude = TRUE)) build_db_action("select",
  "table", match_criteria = list(sample_name = "sample 123",
  sample_contributor = "Smith"), and_or = "AND", limit = 5)

Arguments

action

CHR scalar, one of “INSERT”, “UPDATE”, “SELECT”, “GET_ID”, or “DELETE”

table_name

CHR scalar of the table name to which this query applies

column_names

CHR vector of column names to include (default NULL)

values

LIST of CHR vectors with values to INSERT or UPDATE (default NULL)

match_criteria

LIST of matching criteria with names matching columns against which to apply. In the simplest case, a direct value is given to the name (e.g. ‘list(last_name = “Smith”)’) for single matches. All match criteria must be their own list item. Values can also be provided as a nested list for more complicated WHERE clauses with names ‘values’, ‘exclude’, and ‘like’ that will be recognized. ‘values’ should be the actual search criteria, and if a vector of length greater than one is specified, the WHERE clause becomes an IN clause. ‘exclude’ (LGL scalar) determines whether to apply the NOT operator. ‘like’ (LGL scalar) determines whether this is an equality, list, or similarity. To reverse the example above by issuing a NOT statement, use ‘list(last_name = list(values = “Smith”, exclude = TRUE))’, or to look for all records LIKE (or NOT LIKE) “Smith”, set this as ‘list(last_name = list(values = “Smith”, exclude = FALSE, like = TRUE))’

case_sensitive

LGL scalar of whether to match on a case sensitive basis (the default TRUE searches for values as-provided) or whether to coerce value matches by upper, lower, sentence, and title case matches

and_or

LGL scalar of whether to use “AND” or “OR” for multiple criteria, which will be used to combine them all. More complicated WHERE clauses (including a mixture of AND and OR usage) should be built directly. (default: “OR”)

limit

INT scalar of the maximum number of rows to return (default NULL)

group_by

CHR vector of columns by which to group (default NULL)

order_by

named CHR vector of columns by which to order, with names matching columns and values indicating whether to sort ascending (default NULL)

distinct

LGL scalar of whether or not to apply the DISTINCT clause to all match criteria (default FALSE)

get_all_columns

LGL scalar of whether to return all columns; will be set to TRUE automatically if no column names are provided (default FALSE)

execute

LGL scalar of whether or not to immediately execute the build query statement (default TRUE)

single_column_as_vector

LGL scalar of whether to return results as a vector if they consist of only a single column (default TRUE)

log_ns

CHR scalar of the logging namespace to use during execution (default: “db”)

Details

This function is intended to ease that by taking care of most of the associated logic and enabling routing through other functions, or picking up arguments from within other function calls.

Value

CHR scalar of the constructed query


build_triggers R Documentation

Build pairs of INSERT/UPDATE triggers to resolve foreign key relationships

Description

When building schema by script, it is often handy to enforce certain behaviors on database transactions involving foreign keys, especially in SQLite. Given a properly structured list object describing the mappings between tables in a schema (e.g. one deriving from [er_map]), this function will parse those for foreign key relationships.

Usage

build_triggers(er_map(db_conn = con))

Arguments

db_map

LIST object containing descriptions of table mapping in an opinionated manner, generally generated by [er_map]. The expectation is a list of tables, with references in SQL form enumerated in a child element with a name matching ‘references_in’

references_in

CHR scalar naming the child element containing SQL references statements of the form “fk_column REFERENCES table(pk_column)” (default: “references” is provided by [er_map])

create_insert_trigger

LGL scalar indicating whether to build an insert trigger for each table (default: TRUE).

create_update_trigger

LGL scalar indicating whether to build an update trigger for each table (default: FALSE).

save_to_file

CHR scalar of a file in which to write the output, if any (default: NULL will return the resulting object to the R session)

Details

Primarily, this requires a list object referring to tables that contains in each element a child element with the name provided in ‘references_in’. The pre-pass parsing function [get_fkpk_relationships] is used to pull references from the full map.

Value

LIST object containing one element for each table in ‘db_map’ that contains foreign key references, with one child element per requested trigger

Note

Tables in ‘db_map’ that do not contain foreign key relationships will be dropped from the output list.

This is largely a convenience function to programmatically apply [make_sql_triggers] to an entire schema. To skip tables with defined foreign key relationships for which triggers are undesirable, remove those tables from ‘db_map’ prior to calling this function.


build_views R Documentation

Build SQL to create views on normalized tables in SQLite

Description

Build SQL to create views on normalized tables in SQLite

Usage

build_views(db_map = er_map(con), dictionary = data_dictionary(con))

Arguments

db_map

LIST object containing descriptions of table mapping in an opinionated manner, generally generated by [er_map]. The expectation is a list of tables, with references in SQL form enumerated in a child element with a name matching ‘references_in’

references_in

CHR scalar naming the child element containing SQL references statements of the form “fk_column REFERENCES table(pk_column)” (default: “references” is provided by [er_map])

dictionary

LIST object containing the schema dictionary produced by [data_dictionary] fully describing table entities

drop_if_exists

LGL scalar indicating whether to include a “DROP VIEW” prefix for the generated view statement; as this has an impact on schema, no default is set

save_to_file

CHR scalar name of a file path to save generated SQL (default: NULL will return a list object to the R session)

append

LGL scalar on whether to append to ‘save_to_file’ (default: FALSE)

Value

LIST if ‘save_to_file’ is NULL, otherwise none (the generated SQL is written to file)


calculate.monoisotope R Documentation

Calculate the monoisotopic mass of an elemental formula list

Description

Calculate the monoisotopic mass of an elemental formula list

Usage

calculate.monoisotope(
  elementlist,
  exactmasses = NULL,
  adduct = "neutral",
  db_conn = "con"
)

Arguments

elementlist

list of elemental formula from ‘extract.elements’ function

exactmasses

list of exact masses of elements

adduct

character string adduct/charge state to add to the elemental formula, options are ‘neutral’, ‘+H’, ‘-H’, ‘+Na’, ‘+K’, ‘+’, ‘-’, ‘-radical’, ‘+radical’

db_conn

database connection object, either a CHR scalar name (default: “con”) or the connection object itself (preferred)

Value

numeric monoisotopic exact mass

Examples

elementlist <- extract.elements("C2H5O")
calculate.monoisotope(elementlist, adduct = "neutral")


check_for_value R Documentation

Check for a value in a database table

Description

This convenience function simply checks whether a value exists in the distinct values of a given column. Only one column may be searched at a time; serialize it in other code to check multiple columns. It leverages the flexibility of [build_db_action] to do the searching. The ‘values’ parameter will be fed directly and can accept the nested list structure defined in [clause_where] for exclusions and like clauses.

Usage

con2 <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
alphabet <- dplyr::tibble(lower = letters, upper = LETTERS)
dplyr::copy_to(con2, alphabet)
check_for_value("A", "alphabet", "upper", db_conn = con2)
check_for_value("A", "alphabet", "lower", db_conn = con2)
check_for_value(letters[1:10], "alphabet", "lower", db_conn = con2)

Arguments

values

CHR vector of the values to search

db_table

CHR scalar of the database table to search

db_column

CHR scalar of the column to search

case_sensitive

LGL scalar of whether to match on a case sensitive basis (the default TRUE searches for values as-provided) or whether to coerce value matches by upper, lower, sentence, and title case matches

db_conn

connection object (default: con)

fuzzy

LGL scalar of whether to do a “fuzzy” match in the sense that values provided are wrapped in an SQL “LIKE ‘%value%’” clause, which overrides the ‘case_sensitive’ setting if TRUE (default: FALSE).

Value

LIST of length 1-2 containing “exists” as a LGL scalar for whether the values were found, and “values” containing the result of the database call, a data.frame object containing matching rows or NULL if exists == FALSE.


check_fragments R Documentation

Determine number of matching fragments between unknown mass spectrum and specific peaks

Description

Determine number of matching fragments between unknown mass spectrum and specific peaks

Usage

check_fragments(con, ums, peakid, masserror = 5, minerror = 0.001)

Arguments

con

SQLite database connection

ums

uncertainty mass spectrum of unknown compound

peakid

integer vector of primary keys for peaks table

masserror

numeric relative mass error (ppm)

minerror

numeric minimum mass error (Da)

Value

table of fragments with a TRUE/FALSE flag indicating whether each fragment is within the unknown mass spectrum
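
Examples

A sketch assuming "con" is a live database connection and "ums" is an uncertainty mass spectrum from ‘get_ums’; the peak IDs are hypothetical.

# Compare an unknown spectrum against fragments annotated for peaks 1 and 2
check_fragments(con, ums, peakid = c(1L, 2L), masserror = 5, minerror = 0.001)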


check_isotopedist R Documentation

Compare Isotopic Pattern to simulated pattern

Description

Calculates the isotopic distribution of the stated elemental formula and compares it against the empirical mass spectrum.

Usage

check_isotopedist(
  ms,
  elementalformula,
  exactmasschart,
  error,
  minerror = 0.002,
  remove.elements = c(),
  max.dist = 3,
  min.int = 0.001,
  charge = "neutral",
  m = 1,
  n = 0.5
)

Arguments

ms

data.frame mass spectrum containing pair-wise m/z and intensity values of empirical isotopic pattern

elementalformula

character string of elemental formula to simulate isotopic pattern

exactmasschart

exact mass chart

error

numeric relative mass error (in ppm) of mass spectrometer

minerror

numeric minimum mass error (in Da) of mass spectrometer

remove.elements

character vector of elements to remove from elemental formula

max.dist

numeric maximum mass distance (in Da) from exact mass to include in simulated isotopic pattern

min.int

numeric minimum relative intensity (maximum = 1, minimum = 0) to include in simulated isotopic pattern

charge

character string for the charge state of the simulated isotopic pattern, options are ‘neutral’, ‘positive’, and ‘negative’

m

numeric dot product mass weighting

n

numeric dot product intensity weighting

Value

numeric vector of match scores between the empirical and calculated isotopic distribution.


check_mzML_convert R Documentation

Check mzML file for specific MSConvert parameters

Description

Check mzML file for specific MSConvert parameters

Usage

check_mzML_convert(mzml)

Arguments

mzml

list of msdata from ‘mzMLtoR’ function

Value

data.frame object of conversion veracity checks


clause_where R Documentation

Build a WHERE clause for SQL statements

Description

Properly escaping SQL to prevent injection attacks can be difficult with more complicated queries. This clause constructor is intended to be specific to the WHERE clause of SELECT to UPDATE statements. The majority of construction is achieved with the ‘match_criteria’ parameter, which should always be a list with names for the columns to appear in the WHERE clause. A variety of conveniences are built in, from simple comparisons to more complicated ones including negation and similarity (see the description for argument ‘match_criteria’).

Usage

clause_where(ANSI(), "example", list(foo = "bar", cat = "dog"))
clause_where(ANSI(), "example", list(foo = list(values = "bar", like = TRUE)))
clause_where(ANSI(), "example", list(foo = list(values = "bar", exclude = TRUE)))

Arguments

table_names

CHR vector of tables to search

match_criteria

LIST of matching criteria with names matching columns against which to apply. In the simplest case, a direct value is given to the name (e.g. ‘list(last_name = “Smith”)’) for single matches. All match criteria must be their own list item. Values can also be provided as a nested list for more complicated WHERE clauses with names ‘values’, ‘exclude’, and ‘like’ that will be recognized. ‘values’ should be the actual search criteria, and if a vector of length greater than one is specified, the WHERE clause becomes an IN clause. ‘exclude’ (LGL scalar) determines whether to apply the NOT operator. ‘like’ (LGL scalar) determines whether this is an equality, list, or similarity. To reverse the example above by issuing a NOT statement, use ‘list(last_name = list(values = “Smith”, exclude = TRUE))’, or to look for all records LIKE (or NOT LIKE) “Smith”, set this as ‘list(last_name = list(values = “Smith”, exclude = FALSE, like = TRUE))’

case_sensitive

LGL scalar of whether to match on a case sensitive basis (the default TRUE searches for values as-provided) or whether to coerce value matches by upper, lower, sentence, and title case matches

and_or

LGL scalar of whether to use “AND” or “OR” for multiple criteria, which will be used to combine them all. More complicated WHERE clauses (including a mixture of AND and OR usage) should be built directly. (default: “OR”)

Value

CHR scalar of the constructed where clause for an SQL statement


close_up_shop R Documentation

Conveniently close all database connections

Description

This closes both the plumber service and all database connections from the current running environment. If outstanding promises to database tables or views were created as class ‘tbl_’ (e.g. with ‘tbl(con, “table”)’), set ‘back_up_connected_tbls’ to TRUE to collect data from those and preserve it in place in the current global environment.

Usage

close_up_shop(back_up_connected_tbls = TRUE)

Arguments

back_up_connected_tbls

LGL scalar of whether to clone currently promised tibble connections to database objects as data frames (default: FALSE).

Value

None, modifies the current global environment in place


compare_ms R Documentation

Calculate dot product match score

Description

Calculates the match score (based on dot product) of the two uncertainty mass spectra. Note: this is a static match score and does not include associated uncertainties.

Usage

compare_ms(
  ms1,
  ms2,
  error = c(5, 5),
  minerror = c(0.002, 0.002),
  m = 1,
  n = 0.5
)

Arguments

ms1, ms2

the uncertainty mass spectra from function ‘get_ums’

error

a vector of the respective mass error (in ppm) for each mass spectrum or a single value representing the mass error for all m/z values

minerror

a two component vector of the respective minimum mass error (in Da) for each mass spectrum or a single value representing the minimum mass error of all m/z values

m, n

weighting values for mass (m) and intensity (n)
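
Examples

A sketch assuming "ums1" and "ums2" are uncertainty mass spectra produced by ‘get_ums’.

# Static match score with 5 ppm mass error and a 0.002 Da minimum error
compare_ms(ums1, ums2, error = c(5, 5), minerror = c(0.002, 0.002))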


complete_form_entry R Documentation

Ensure complete form entry

Description

This input validation check ensures the current session’s input object includes non-NA, non-NULL, and non-blank values similarly to [shiny::req] and [shiny::validate] but can be called with a predefined list of input names to check. Typically this is used to validate form entry completion. Call this function prior to reading form entries to ensure that all values requested by name in ‘values’ are present. If they are not, a [nist_shinyalert] modal is displayed prompting the user to complete the form.

Usage

req(complete_form_entry(input, c("need1", "need2")))

Arguments

input

The session input object

values

CHR vector of input object names to require

show_alert

LGL scalar indicating whether or not to show an alert, set FALSE to return the status of the check

Value

Whether or not all required values are present.


create_fallback_build R Documentation

Create an SQL file for use without the SQLite CLI

Description

For cases where the SQLite Command Line Interface is not available, dot commands used to simplify the database build pipeline are not usable. Call this function to create a self-contained SQL build file that can be used in [build_db] to build the database. The self-contained file will include all “CREATE” and “INSERT” statements necessary by parsing lines including “.read” and “.import” commands and directly reading referenced files.

Usage

create_fallback_build(build_file = file.path("config", "build.sql"))

Arguments

build_file

CHR scalar name of the SQL build file to use. The default, NULL, will use the environment variable “DB_BUILD_FILE” if it is available.

populate

LGL scalar of whether to populate data (default: TRUE)

populate_with

CHR scalar name of the SQL population file to use. The default, NULL, will use the environment variable “DB_DATA” if it is available.

driver

CHR scalar of the database driver class to use to correctly interpolate SQL commands (default: “SQLite”)

comments

CHR scalar regex identifying SQLite comments

out_file

CHR scalar of the output file name and destination. The default, NULL, will write to a file named similarly to ‘build_file’ suffixed with “_full”.

Value

None: a file will be written at ‘out_file’ with the output.


create_peak_list R Documentation

Create peak list from SQL ms_data table

Description

The function extracts the relevant information and sorts it into nested lists for use in the uncertainty functions

Usage

create_peak_list(ms_data)

Arguments

ms_data

extraction of the ms_data from the SQL table for a specified peak

Value

nested list of all data


create_peak_table_ms1 R Documentation

Create peak table for MS1 data

Description

Takes a nested peak list and creates a peak table for easier determination of uncertainty of the measurement for MS1 data.

Usage

create_peak_table_ms1(peak, mass, masserror = 5, minerror = 0.002, int0 = NA)

Arguments

mass

the exact mass of the compound of interest

masserror

the mass accuracy (in ppm) of the instrument data

minerror

the minimum mass error (in Da) of the instrument data

int0

the default setting for intensity values for missing m/z values

peak

result of the ‘create_peak_list’ function

Value

nested list of dataframes containing all MS1 data for the peak
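
Examples

A sketch chaining from ‘create_peak_list’; "ms_data" is assumed to be an extraction from the ms_data table, and the exact mass shown is hypothetical.

# Build the nested peak list, then tabulate MS1 data for the first peak
peaks <- create_peak_list(ms_data)
ms1_table <- create_peak_table_ms1(peaks[[1]], mass = 412.9664, masserror = 5)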


create_peak_table_ms2 R Documentation

Create peak table for MS2 data

Description

Takes a nested peak list and creates a peak table for easier determination of uncertainty of the measurement for MS2 data.

Usage

create_peak_table_ms2(peak, mass, masserror = 5, minerror = 0.002, int0 = NA)

Arguments

mass

the exact mass of the compound of interest

masserror

the mass accuracy (in ppm) of the instrument data

minerror

the minimum mass error (in Da) of the instrument data

int0

the default setting for intensity values for missing m/z values

peak

result of the ‘create_peak_list’ function

Value

nested list of dataframes containing all MS2 data for the peak


create_py_env R Documentation

Create a python environment for RDKit

Description

This project offers a full integration of RDKit via [reticulate]. This function does the heavy lifting for setting up that environment, either from an environment specifications file or from the conda forge channel.

Usage

create_py_env("nist_hrms_db", c("reticulate", "rdkit"))

Arguments

env_name

CHR scalar of a python environment

Details

Preferred set up is to set variables in the ‘env_py.R’ file, which will be used over the internal defaults chosen here. The exception is if ‘INSTALL_FROM == “local”’ and no value is provided for ‘INSTALL_FROM_FILE’ which has no internal default.

Germane variables are ‘PYENV_NAME’ (default “reticulated_rdkit”), ‘CONDA_PATH’ (default “auto”), ‘CONDA_MODULES’ (default “rdkit”, “r-reticulate” will be added), ‘INSTALL_FROM’ (default “conda”), ‘INSTALL_FROM_FILE’ (default “rdkit/environment.yml”), ‘MIN_PY_VER’ (default 3.9).

Value

None


create_search_df R Documentation

Create data.frame containing parameters for extraction and searching

Description

Use this to create an intermediate data frame object used as part of the search routine.

Usage

create_search_df(
  filename,
  precursormz,
  rt,
  rt_start,
  rt_end,
  masserror,
  minerror,
  ms2exp,
  isowidth
)

Arguments

filename

CHR scalar path to the mzML file

precursormz

NUM scalar for the mass-to-charge ratio to examine

rt

NUM scalar for the retention time centroid to examine

rt_start

NUM scalar for the retention time start point of the feature

rt_end

NUM scalar for the retention time end point of the feature

masserror

NUM scalar of the instrument mass error value in parts per million

minerror

NUM scalar of the minimum mass error value to use in absolute terms

ms2exp

NUM scalar indicating the type of fragmentation experiment (e.g. MS1 or MS2)

isowidth

NUM scalar mass isolation width to use

Value

data.frame object collating provided values
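
Examples

A sketch with hypothetical values for a single feature of interest.

search_df <- create_search_df(
  filename = "example.mzML",   # hypothetical mzML file
  precursormz = 498.9302,
  rt = 6.8,
  rt_start = 6.6,
  rt_end = 7.0,
  masserror = 5,
  minerror = 0.002,
  ms2exp = 2,
  isowidth = 0.7
)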


create_search_ms R Documentation

Generate uncertainty mass spectrum for MS1 and MS2 data

Description

Generate uncertainty mass spectrum for MS1 and MS2 data

Usage

create_search_ms(
  searchobj,
  correl = NULL,
  ph = NULL,
  freq = NULL,
  normfn = "sum",
  cormethod = "pearson"
)

Arguments

searchobj

list object generated from ‘get_search_object’

correl

correlation limit for ions to MS1

ph

peak height to select scans for generating mass spectrum

freq

observational frequency minimum for ions to use for generating mass spectrum

normfn

normalization function, options are “sum” or “mean”

cormethod

correlation function, default is “pearson”

Value

list object containing the ms1 uncertainty mass spectrum ‘ums1’, ms2 uncertainty mass spectrum ‘ums2’ and respective uncertainty mass spectrum parameters ‘ms1params’ and ‘ms2params’


data_dictionary R Documentation

Create a data dictionary

Description

Get a list of tables and their defined columns with properties, including comments, suitable as a data dictionary from a connection object amenable to [odbc::dbListTables]. This function relies on [pragma_table_info].

Usage

data_dictionary(db_conn = con)

Arguments

db_conn

connection object (default: con)

Value

LIST of length equal to the number of tables in ‘con’ with attributes identifying which tables, if any, failed to render into the dictionary.
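
Examples

A sketch assuming "con" is a live connection to a DIMSpec database; "compounds" is one of its tables.

dict <- data_dictionary(con)
names(dict)       # one element per table
dict$compounds    # column definitions and comments for that table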


dataframe_match R Documentation

Match multiple values in a database table

Description

Complex queries are sometimes necessary to match against multiple varied conditions across multiple items in a list or data frame. Call this function to apply vectorization to all items in ‘match_criteria’, create a fully qualified SQL expression using [clause_where], and execute that query against the database connection in ‘db_conn’. Speed is not optimized during the call to [clause_where], as each clause is built independently and joined together with “OR” statements.

Usage

dataframe_match(
  match_criteria,
  table_names,
  and_or = "AND",
  db_conn = con,
  log_ns = "db"
)

Arguments

match_criteria

LIST of matching criteria with names matching columns against which to apply. In the simplest case, a direct value is given to the name (e.g. ‘list(last_name = “Smith”)’) for single matches. All match criteria must be their own list item. Values can also be provided as a nested list for more complicated WHERE clauses with names ‘values’, ‘exclude’, and ‘like’ that will be recognized. ‘values’ should be the actual search criteria, and if a vector of length greater than one is specified, the WHERE clause becomes an IN clause. ‘exclude’ (LGL scalar) determines whether to apply the NOT operator. ‘like’ (LGL scalar) determines whether this is an equality, list, or similarity. To reverse the example above by issuing a NOT statement, use ‘list(last_name = list(values = “Smith”, exclude = TRUE))’, or to look for all records LIKE (or NOT LIKE) “Smith”, set this as ‘list(last_name = list(values = “Smith”, exclude = FALSE, like = TRUE))’

table_names

CHR vector of tables to search

and_or

CHR scalar, either “AND” or “OR”, used to combine multiple criteria. More complicated WHERE clauses (including a mixture of AND and OR usage) should be built directly. (default: “AND”)

db_conn

connection object (default: con)

log_ns

CHR scalar of the logging namespace to use during execution (default: “db”)

Details

This is intended for use with a data frame object

Value

data.frame of the matching database rows
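
Examples

## A minimal sketch (assumes an open project connection 'con'; the table
## and column names are illustrative):
dataframe_match(
  match_criteria = list(last_name = list(values = "Smith", like = TRUE)),
  table_names = "contributors",
  db_conn = con
)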


dotprod R Documentation

Calculate dot product

Description

Internal function: calculates the dot product between paired m/z and intensity values

Usage

dotprod(m1, i1, m2, i2, m = 1, n = 0.5)

Arguments

m1, m2

paired vectors containing measured m/z values

i1, i2

paired vectors containing measured intensity values

m, n

weighting values for mass (m) and intensity (n)
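
Examples

## Illustration only: a conventional weighted spectral dot product.
## The weighting w = m^m * i^n shown here is an assumption; the exact
## form used internally is not given in this documentation.
m1 <- c(100.0, 150.0, 200.0); i1 <- c(1000, 500, 250)
m2 <- c(100.0, 150.0, 200.0); i2 <- c(900, 600, 300)
m <- 1; n <- 0.5
w1 <- m1^m * i1^n
w2 <- m2^m * i2^n
sum(w1 * w2)^2 / (sum(w1^2) * sum(w2^2))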


dt_color_by R Documentation

Apply colors to DT objects by value in a column

Description

Adds a class to each node meeting the criteria defined elsewhere as project object ‘table_bg_classes’, a list of colors with names matching values.

Usage

dt_color_by(table_names, look_for)

Arguments

table_names

CHR vector of the names going into a table

look_for

CHR vector of the column name to color by

Value

JS function to apply to a DT object by row


dt_formatted R Documentation

Easily format multiple DT objects in a shiny project in the same manner

Description

This serves solely to reduce the number of options fed into ‘DT::datatable’ by providing common defaults and transparent options. Parameters largely do exactly what they say and will create a list ‘column_defs’ suitable for use as ‘datatable(..., options = list(columnDefs = column_defs))’. Leave any aspect NULL to ignore it.

Usage

dt_formatted(
  dataframe,
  show_rownames = FALSE,
  hide_cols = NULL,
  center_cols = NULL,
  narrow_cols = NULL,
  narrow_col_width = "5%",
  medium_cols = NULL,
  medium_col_width = "10%",
  large_cols = NULL,
  large_col_width = "15%",
  truncate_cols = NULL,
  truncate_width = 20,
  date_cols = NULL,
  date_col_width = "10%",
  selection_mode = "single",
  callback = NULL,
  color_by_column = NULL,
  names_to = "title",
  filter_at = "top",
  chr_to_factor = TRUE,
  page_length = 10,
  page_length_menu = c(10, 25, 50),
  ...
)

Arguments

dataframe

data.frame to be converted to a DT::datatable object

hide_cols

CHR vector of column names to hide

center_cols

CHR vector of column names to center

narrow_cols

CHR vector of column names to make ‘narrow_col_width’ wide

narrow_col_width

CHR scalar defining column width (default: “5%”)

medium_cols

CHR vector of column names to make ‘medium_col_width’ wide

medium_col_width

CHR scalar defining column width (default: “10%”)

large_cols

CHR vector of column names to make ‘large_col_width’ wide

large_col_width

CHR scalar defining column width (default: “15%”)

truncate_cols

CHR vector of column names to truncate

truncate_width

INT scalar of the position at which to truncate

date_cols

CHR vector of column names identifying dates

date_col_width

CHR scalar defining column width (default: “10%”)

selection_mode

CHR scalar of the DT selection mode (default: “single”)

callback

JS custom callback to apply to the datatable widget

color_by_column

CHR scalar of the column name by which to color rows

names_to

CHR scalar of the name formatting modification to apply, as one of the options available in the ‘stringr’ package (default: “title” to apply ‘stringr::str_to_title’)

filter_at

CHR scalar of the position for the column filter as understood by ‘DT::datatable(…, filter = filter_at)’. (default: “top”)

chr_to_factor

BOOL scalar for whether or not to automatically convert character columns to factor columns (default: TRUE)

…

other named arguments to be passed to ‘DT::datatable’

Value

DT::datatable object formatted as requested

Note

Truncation applies a JS function to retain the underlying information as a hover tooltip and truncates using ellipses.

Column name formatting relies on being able to parse ‘names_to’ as a valid function of the form ‘sprintf(“str_to_%s”, names_to)’; recognized options include “lower”, “upper”, “title”, and “sentence”.

To apply a custom format, define these parameters as a list (e.g. “dt_format_options”) and pass it, along with your data frame, as ‘do.call(“dt_formatted”, c(list(dataframe = df), dt_format_options))’
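
Examples

## The custom-format pattern from the note above (assumes 'df' is any
## data.frame; the option values are illustrative):
df <- data.frame(a = 1:3, b = letters[1:3])
dt_format_options <- list(center_cols = "a", page_length = 25)
do.call("dt_formatted", c(list(dataframe = df), dt_format_options))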


er_map R Documentation

Create a simple entity relationship map

Description

This will poll the database connection and create an entity relationship map as a list directly from the defined SQL statements used to build each table or view. For each table object it returns a list of length three containing the entity names that the table (1) ‘references’ (i.e. has a foreign key to), (2) is ‘referenced_by’ (i.e. is a foreign key for), and (3) ‘used_in_view’ (views where it appears). All values are entity names. This is intended as a mapping shortcut when ER diagrams are unavailable, or for quick relationship reference within a project, much like a data dictionary.

Usage

er_map(db_conn = con)

Arguments

db_conn

connection object, specifically of class “SQLiteConnection” but not strictly enforced

Details

SQL is generated from [pragma_table_def()] with argument ‘get_sql’ = TRUE and ignores entities whose names start with “sqlite”.

Value

nested LIST object describing the database entity connections
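
Examples

## Build the map and inspect one table's relationships (assumes an open
## project connection 'con'; the table name is illustrative):
db_map <- er_map(db_conn = con)
db_map[["compounds"]]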


export_msp R Documentation

Export to MSP

Description

The function exports an uncertainty mass spectrum into a NIST MS Search .msp file

Usage

export_msp(
  ms,
  file,
  precursor = "",
  name = "Exported Mass Spectrum",
  headerdata = c(),
  append = FALSE
)

Arguments

ms

uncertainty mass spectrum from ‘get_ums’ function

file

CHR scalar path of the .msp file to which the mass spectrum will be saved

precursor

If available, the numeric precursor m/z for the designated mass spectrum

name

Text name to assign to the mass spectrum (not used in spectral searching)

headerdata

named CHR vector of additional values to put in the header

append

boolean (TRUE/FALSE) to append to .msp file (TRUE) or overwrite (FALSE)
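
Examples

## A usage sketch (assumes 'ums' was produced by the 'get_ums' function;
## the precursor m/z is illustrative):
export_msp(ums, file = "exported_spectrum.msp", precursor = 498.93, append = FALSE)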


extend_suspect_list R Documentation

Extend the compounds and aliases tables

Description

Suspect lists are occasionally updated. To keep the current database up to date, run this function by pointing it to the updated or current suspect list. That suspect list should be one of (1) a file in either comma-separated-value (CSV) or a Microsoft Excel format (XLS or XLSX), (2) a data frame containing the new compounds in the standard format of the suspect list, or (3) a URL pointing to the suspect list.

Usage

extend_suspect_list(suspect_list, db_conn = con, retain_current = TRUE)

Arguments

suspect_list

CHR scalar path to a file (CSV, XLS, or XLSX) or a URL pointing to an XLSX file.

db_conn

connection object (default: con)

retain_current

LGL scalar of whether to retain the current list by attempting to match new entries to older ones, or to append all entries (default: TRUE)

Details

If ‘suspect_list’ does not contain one of the expected file extensions, it will be assumed to be a URL pointing to a Microsoft Excel file with the suspect list in the first spreadsheet. The file for that URL will be downloaded temporarily, read in as a data frame, and then removed.

Required columns for the compounds table are first pulled and all other columns are treated as aliases. If ‘retain_current’ is TRUE, entries in the “name” column will be matched against current aliases and the compound id will be persisted for that compound.

Value

None
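
Examples

## Point at a local file or a URL (the path is illustrative; assumes an
## open project connection 'con'):
extend_suspect_list("suspectlist.xlsx", db_conn = con)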


extract.elements R Documentation

Elemental Formula Functions: extract elements from a formula

Description

Converts elemental formula into list of ‘elements’ and ‘counts’ corresponding to the composition

Usage

extract.elements(composition.str, remove.elements = c())

Arguments

composition.str

character string elemental formula

remove.elements

character vector containing elements to remove from the formula

Value

list with ‘elements’ and ‘counts’

Examples

extract.elements("C2H5O")

extract.elements("C2H5ONa", remove.elements = c("Na", "Cl"))

flush_dir R Documentation

Flush a directory with archive

Description

Clear a directory and archive those files if desired, in any directory matching any pattern.

Usage

flush_dir("logs", ".txt")

flush_dir(directory = "logs")

Arguments

archive

LGL scalar on whether to archive current logs

directory

CHR scalar path to the directory to flush

Value

None; removes files from the directory, archiving them first if requested.


fn_guide R Documentation

View an index of help documentation in your browser

Description

View an index of help documentation in your browser

Usage

fn_guide()

Value

None


fn_help R Documentation

Get function documentation for this project

Description

This function is analogous to “?”, “??”, and “help”. For now, this effort is distributed as a project instead of a package. This imposes certain limitations, particularly regarding function documentation. Use this function to see the documentation for functions in this project just as you would any installed package. The other limitation is that these help files will not populate directly as a pop up when using RStudio tab completion.

Usage

fn_help(fn_name)

Arguments

fn_name

Object or CHR string name of a function in this project.

Value

None, opens help file.

Note

This function will be deprecated if the project is moved to a package.

Examples

fn_help(fn_help)

format_html_id R Documentation

Format a file name as an HTML element ID

Description

This is often useful to provide feedback to the user about the files they’ve provided to a shiny application in a more informative manner, as IDs produced here are suitable to build dynamic UI around. This can serve as the base ID for tooltips, additional information, icons, etc. and produce everything necessary in one place for any number of files.

Usage

format_html_id(filename)

Arguments

filename

CHR vector of file names

Value

CHR vector of the same size as filename

Examples

format_html_id(list.files())


format_list_of_names R Documentation

Grammatically collapse a list of values

Description

Given a vector of arbitrary length that coerces properly to a human-readable character string, return it formatted as one of: “one”, “one and two”, or “one, two, …, and three” using glue::glue. This is functionally the same as a static version of [glue::glue_collapse] with parameters sep = “,”, width = Inf, and last = “, and”.

Usage

format_list_of_names(namelist, add_quotes = FALSE)

Arguments

namelist

vector of values to format

add_quotes

LGL scalar of whether to enclose individual values in quotation marks

Value

CHR vector of length one

Examples

format_list_of_names("test")
format_list_of_names(c("apples", "bananas"))
format_list_of_names(c(1:3))
format_list_of_names(seq.Date(Sys.Date(), Sys.Date() + 3, by = 1))

formulalize R Documentation

Generate standard chemical formula notation

Description

Generate standard chemical formula notation

Usage

formulalize(formula)

Arguments

formula

CHR string of an elemental formula

Value

string with a standard ordered formula

Examples


formula <- "C10H15S1O3"
formulalize(formula)

full_import R Documentation

Import one or more files from the NIST Method Reporting Tool for NTA

Description

This function serves as a single entry point for data imports. It is predicated upon the NIST import routine defined here and relies on several assumptions. It is intended ONLY as an interactive means of importing any number of data files from the NIST Method Reporting Tool for NTA (MRT NTA).

Usage

full_import(
  import_object = NULL,
  file_name = NULL,
  db_conn = con,
  exclude_missing_required = FALSE,
  stop_if_missing_required = TRUE,
  include_if_missing_recommended = FALSE,
  stop_if_missing_recommended = TRUE,
  ignore_extra = TRUE,
  ignore_insert_conflicts = TRUE,
  requirements_obj = "import_requirements",
  method_in = "massspectrometry",
  ms_methods_table = "ms_methods",
  instrument_properties_table = "instrument_properties",
  sample_info_in = "sample",
  sample_table = "samples",
  contributor_in = "data_generator",
  contributors_table = "contributors",
  sample_aliases = NULL,
  generation_type = NULL,
  generation_type_norm_table = ref_table_from_map(sample_table, "generation_type"),
  mass_spec_in = "massspectrometry",
  chrom_spec_in = "chromatography",
  mobile_phases_in = "chromatography",
  qc_method_in = "qcmethod",
  qc_method_table = "qc_methods",
  qc_method_norm_table = ref_table_from_map(qc_method_table, "name"),
  qc_references_in = "source",
  qc_data_in = "qc",
  qc_data_table = "qc_data",
  carrier_mix_names = NULL,
  id_mix_by = "^mp*[0-9]+",
  mix_collection_table = "carrier_mix_collections",
  mobile_phase_props = list(in_item = "chromatography", db_table = "mobile_phases", props
    = c(flow = "flow", flow_units = "flowunits", duration = "duration", duration_units =
    "durationunits")),
  carrier_props = list(db_table = "carrier_mixes", norm_by =
    ref_table_from_map("carrier_mixes", "component"), alias_in = "carrier_aliases", props
    = c(id_by = "solvent", fraction_by = "fraction")),
  additive_props = list(db_table = "carrier_additives", norm_by =
    ref_table_from_map("carrier_additives", "component"), alias_in = "additive_aliases",
    props = c(id_by = "add$", amount_by = "_amount", units_by = "_units")),
  exclude_values = c("none", "", NA),
  peaks_in = "peak",
  peaks_table = "peaks",
  software_timestamp = NULL,
  software_settings_in = "msconvertsettings",
  ms_data_in = "msdata",
  ms_data_table = "ms_data",
  unpack_spectra = FALSE,
  unpack_format = c("separated", "zipped"),
  ms_spectra_table = "ms_spectra",
  linkage_table = "conversion_software_peaks_linkage",
  settings_table = "conversion_software_settings",
  as_date_format = "%Y-%m-%d %H:%M:%S",
  format_checks = c("ymd_HMS", "ydm_HMS", "mdy_HMS", "dmy_HMS"),
  min_datetime = "2000-01-01 00:00:00",
  fragments_in = "annotation",
  fragments_table = "annotated_fragments",
  fragments_sources_table = "fragment_sources",
  fragments_norm_table = "norm_fragments",
  citation_info_in = "fragment_citation",
  inspection_info_in = "fragment_inspections",
  inspection_table = "fragment_inspections",
  generate_missing_aliases = TRUE,
  fragment_aliases_in = "fragment_aliases",
  fragment_aliases_table = "fragment_aliases",
  fragment_alias_type_norm_table = ref_table_from_map(fragment_aliases_table,
    "alias_type"),
  inchi_prefix = "InChI=1S/",
  rdkit_ref = ifelse(exists("PYENV_REF"), PYENV_REF, "rdk"),
  rdkit_ns = "rdk",
  rdkit_make_if_not = TRUE,
  rdkit_aliases = c("inchi", "inchikey"),
  mol_to_prefix = "MolTo",
  mol_from_prefix = "MolFrom",
  type = "smiles",
  compounds_in = "compounddata",
  compounds_table = "compounds",
  compound_category = NULL,
  compound_category_table = "compound_categories",
  compound_aliases_in = "compound_aliases",
  compound_aliases_table = "compound_aliases",
  compound_alias_type_norm_table = ref_table_from_map(compound_aliases_table,
    "alias_type"),
  fuzzy = FALSE,
  case_sensitive = TRUE,
  ensure_unique = TRUE,
  require_all = FALSE,
  import_map = IMPORT_MAP,
  log_ns = "db"
)

Arguments

import_object

nested LIST object of JSON data to import; this import routine was built around output from the NTA MRT (default: NULL); note that you may supply either ‘import_object’ or ‘file_name’

file_name

external file in JSON format of data to import; this import routine was built around output from the NTA MRT (default: NULL); note that you may supply either ‘import_object’ or ‘file_name’

db_conn

connection object (default: con)

exclude_missing_required

LGL scalar of whether or not to skip imports missing required information (default: FALSE); if set to TRUE, this will override the setting for ‘stop_if_missing_required’ and the import will continue with logging messages for which files were incomplete

stop_if_missing_required

LGL scalar of whether or not to stop the import routine if a file is missing required information (default: TRUE)

include_if_missing_recommended

LGL scalar of whether or not to include imports missing recommended information (default: FALSE)

stop_if_missing_recommended

LGL scalar of whether or not to stop the import routine if a file is missing recommended information (default: TRUE)

ignore_extra

LGL scalar of whether to ignore extraneous import elements or stop the import process (default: TRUE)

ignore_insert_conflicts

LGL scalar of whether to ignore insert conflicts during the qc methods and qc data import steps (default: TRUE)

requirements_obj

CHR scalar of the name of an R object holding import requirements; this is a convenience shorthand to prevent multiple imports from parameter ‘file_name’ (default: “import_requirements”)

method_in

CHR scalar name of the ‘obj’ list containing method information

ms_methods_table

CHR scalar name of the database table containing method information

instrument_properties_table

CHR scalar name of the database table holding instrument property information for a given method (default: “instrument_properties”)

sample_info_in

CHR scalar name of the element within ‘import_object’ containing samples information

sample_table

CHR scalar name of the database table holding sample information (default: “samples”)

contributor_in

CHR scalar name of the element within ‘import_object[[sample_info_in]]’ containing contributor information (default: “data_generator”)

contributors_table

CHR scalar name of the database table holding contributor information (default: “contributors”)

sample_aliases

named CHR vector of aliases with names matching the alias, and values of the alias reference, e.g. c(“ACU1234” = “NIST Biorepository GUAID”), which can be virtually any reference text; it is recommended that the reference be to a resolver service if connecting with external data sources (default: NULL)

generation_type

CHR scalar of the type of data generated for this sample (e.g. “empirical” or “in silico”). The default (NULL) will assign based on ‘generation_type_default’; any other value will override the default value and be checked against values in ‘generation_type_norm_table’

generation_type_norm_table

CHR scalar name of the database table normalizing sample generation type (default: the result of ref_table_from_map(sample_table, “generation_type”))

mass_spec_in

CHR scalar name of the element in ‘obj’ holding mass spectrometry information (default: “massspectrometry”)

chrom_spec_in

CHR scalar name of the element in ‘obj’ holding chromatographic information (default: “chromatography”)

mobile_phases_in

CHR scalar name of the database table holding mobile phase and chromatographic information (default: “chromatography”)

qc_method_in

CHR scalar name of the import object element containing QC method information (default: “qcmethod”)

qc_method_table

CHR scalar of the database table name holding QC method check information (default: “qc_methods”)

qc_method_norm_table

CHR scalar name of the database table normalizing QC methods type (default: “norm_qc_methods_name”)

qc_references_in

CHR scalar of the name in ‘obj[[qc_method_in]]’ that contains the reference or citation for the QC protocol (default: “source”)

carrier_mix_names

CHR vector (optional) of carrier mix collection names to assign, the length of which should equal 1 or the length of discrete carrier mixtures; the default, NULL, will automatically assign names as a function of the method and sample id.

id_mix_by

regex CHR to identify mobile phase mixtures (default: “^mp*[0-9]+” matches the generated mixture names)

mix_collection_table

CHR scalar name of the mix collections table (default: “carrier_mix_collections”)

mobile_phase_props

LIST object describing how to import the mobile phase table, containing: in_item: CHR scalar name of the ‘obj’ element containing chromatographic information (default: “chromatography”); db_table: CHR scalar name of the mobile phases table (default: “mobile_phases”); props: named CHR vector of name mappings with names equal to database columns in ‘mobile_phase_props$db_table’ and values giving regex to match names in ‘obj[[mobile_phase_props$in_item]]’

carrier_props

LIST object describing how to import the carrier mixes table, containing: db_table: CHR scalar name of the carrier mixes table (default: “carrier_mixes”); norm_by: CHR scalar name of the table used to normalize carriers (default: the result of ref_table_from_map(“carrier_mixes”, “component”)); alias_in: CHR scalar name of the table containing carrier aliases to search (default: “carrier_aliases”); props: named CHR vector of name mappings with names equal to database columns in ‘carrier_props$db_table’ and values giving regex to match names in ‘obj[[mobile_phase_props$in_item]]’, including an element named ‘id_by’ containing regex used to match names in the import object that indicate a carrier (e.g. “solvent”)

additive_props

LIST object describing how to import the carrier additives table, containing: db_table: CHR scalar name of the carrier additives table (default: “carrier_additives”); norm_by: CHR scalar name of the table used to normalize additives (default: the result of ref_table_from_map(“carrier_additives”, “component”)); alias_in: CHR scalar name of the table containing additive aliases to search (default: “additive_aliases”); props: named CHR vector of name mappings with names equal to database columns in ‘additive_props$db_table’ and values giving regex to match names in ‘obj[[mobile_phase_props$in_item]]’, including an element named ‘id_by’ containing regex used to match names in the import object that indicate an additive (e.g. “add$”)

exclude_values

CHR vector indicating which values to ignore in ‘obj’ (default: c(“none”, “”, NA))

peaks_in

CHR scalar name of the element within ‘import_object’ containing peak information

peaks_table

CHR scalar name of the database table holding peak information (default: “peaks”)

ms_data_in

CHR scalar of the named component of ‘obj’ holding mass spectral data (default: “msdata”)

ms_data_table

CHR scalar name of the table holding packed spectra in the database (default: “ms_data”)

unpack_spectra

LGL scalar indicating whether or not to unpack spectral data to a long format (i.e. all masses and intensities will become a single record) in the table defined by ‘ms_spectra_table’ (default: FALSE)

unpack_format

CHR scalar of the type of data packing for the spectra, one of “separated” (default) or “zipped”

ms_spectra_table

CHR scalar name of the table holding long form spectra in the database (default: “ms_spectra”)

fragments_in

CHR scalar name of the ‘obj’ component holding annotated fragment information (default: “annotation”)

fragments_table

CHR scalar name of the database table holding annotated fragment information (default: “annotated_fragments”)

fragments_sources_table

CHR scalar name of the database table holding fragment source (e.g. generation) information (default: “fragment_sources”)

fragments_norm_table

CHR scalar name of the database table holding normalized fragment identities (default: obtains this from the result of a call to [er_map] with the table name from ‘fragments_table’)

citation_info_in

CHR scalar name of the ‘obj’ component holding fragment citation information (default: “fragment_citation”)

inspection_info_in

CHR scalar name of the ‘obj’ component holding fragment inspection information (default: “fragment_inspections”)

inspection_table

CHR scalar name of the database table holding fragment inspection information (default: “fragment_inspections”)

generate_missing_aliases

LGL scalar determining whether or not to generate machine readable expressions (e.g. InChI) for fragment aliases from RDKit (requires RDKit activation; default: TRUE); see the formals list for [add_rdkit_aliases]

fragment_aliases_in

CHR scalar name of the ‘obj’ component holding fragment aliases (default: “fragment_aliases”)

fragment_aliases_table

CHR scalar name of the database table holding fragment aliases (default: “fragment_aliases”)

fragment_alias_type_norm_table

CHR scalar name of the alias reference normalization table, by default the return of ref_table_from_map(fragment_aliases_table, “alias_type”)

rdkit_ref

CHR scalar OR R object of an RDKit binding (default: PYENV_REF if it exists, otherwise “rdk”, for convenience with other pipelines in this project)

mol_to_prefix

CHR scalar of the prefix to identify an RDKit function to create an alias from a mol object (default: “MolTo”)

mol_from_prefix

CHR scalar of the prefix to identify an RDKit function to create a mol object from ‘identifiers’ (default: “MolFrom”)

type

The type of chemical structure notation (default: “smiles”)

compounds_in

CHR scalar name in ‘obj’ holding compound data (default: “compounddata”)

compounds_table

CHR scalar name of the database table holding compound data (default: “compounds”)

compound_category

CHR or INT scalar of the compound category (either a direct ID or a matching category label in ‘compound_category_table’) (default: NULL)

compound_category_table

CHR scalar name of the database table holding normalized compound categories (default: “compound_categories”)

compound_aliases_in

CHR scalar name of where compound aliases are located within the import (default: “compound_aliases”), passed to [resolve_compounds] as “norm_alias_table”

compound_aliases_table

CHR scalar name of the alias reference table to use when assigning compound aliases (default: “compound_aliases”) passed to [resolve_compounds] as “compounds_table”

compound_alias_type_norm_table

CHR scalar name of the alias reference normalization table, by default the return of ref_table_from_map(compound_aliases_table, “alias_type”)

fuzzy

LGL scalar of whether to do a “fuzzy” match in the sense that values provided are wrapped in an SQL “LIKE '%value%'” expression, overriding the ‘case_sensitive’ setting if TRUE (default: FALSE).

case_sensitive

LGL scalar of whether to match on a case sensitive basis (the default TRUE searches for values as-provided) or whether to coerce value matches by upper, lower, sentence, and title case matches

ensure_unique

LGL scalar of whether or not to first check that the values provided form a new unique record (default: TRUE)

require_all

LGL scalar of whether to require all columns (except the assumed primary key column of “id”) or only those defined as “NOT NULL” (default: FALSE; TRUE requires the presence of all columns in the table)

import_map

data.frame object of the import map (e.g. from a CSV)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

Details

Import files should be in JSON format as created by the MRT NTA. Examples are provided in the “example” directory of the project.

Defaults for this release are set throughout as of the latest database schema, but left here as arguments in case those should change, or slight changes are made to column and table names.

Value

Console logging if enabled and interactive prompts when user intervention is required. There is no formal return as it executes database actions.

Note

Many calls within this function are executed as do.call with a filtered argument list based on the names of formals for the called function. Several arguments to those functions are also left as the defaults set there; names must match exactly to be passed in this manner. See the list of inherited parameters.
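
Examples

## A minimal interactive sketch using the example file shipped in the
## project's "example" directory (all other arguments keep their defaults):
full_import(file_name = "example/PFAC30PAR_PFCA1_mzML_cmpd2627.JSON")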


gather_qc R Documentation

Quality Control Check of Import Data

Description

Performs the quality control check on the imported data from the peak gather function.

Usage

gather_qc(
  gather_peak,
  exactmasses,
  exactmasschart,
  ms1range = c(0.5, 3),
  ms1isomatchlimit = 0.5,
  minerror = 0.002,
  max_correl = 0.8,
  correl_bin = 0.1,
  max_ph = 10,
  ph_bin = 1,
  max_freq = 10,
  freq_bin = 1,
  min_n_peaks = 3,
  cormethod = "pearson"
)

Arguments

gather_peak

peak object generated from ‘peak_gather_json’ function

exactmasses

exactmasses list

ms1range

2-component vector stating the range over which to evaluate the isotopic pattern of the precursor ion, from mass - ms1range[1] to mass + ms1range[2]

ms1isomatchlimit

the reverse dot product minimum score for the isotopic pattern match

minerror

the minimum mass error (in Da) allowable for the instrument

max_correl

[TODO PLACEHOLDER]

correl_bin

[TODO PLACEHOLDER]

max_ph

[TODO PLACEHOLDER]

ph_bin

[TODO PLACEHOLDER]

max_freq

[TODO PLACEHOLDER]

freq_bin

[TODO PLACEHOLDER]

min_n_peaks

[TODO PLACEHOLDER]

cormethod

[TODO PLACEHOLDER]

Value

nested list of quality control check results


get_annotated_fragments R Documentation

Get all annotated fragments that have matching masses

Description

Get all annotated fragments that have matching masses

Usage

get_annotated_fragments(con, fragmentions, masserror, minerror)

Arguments

con

SQLite database connection

fragmentions

numeric vector containing m/z values for fragments to search

masserror

numeric relative mass error (ppm)

minerror

numeric minimum mass error (Da)

Value

data.frame of mass spectral data
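
Examples

## Search for fragments within 5 ppm (floor of 0.002 Da) of two
## illustrative m/z values (assumes an open connection 'con'):
get_annotated_fragments(con, c(68.9958, 118.9926), masserror = 5, minerror = 0.002)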


get_component R Documentation

Resolve components from a list or named vector

Description

Call this to pull a component named obj_component from a list or named vector provided as obj and optionally use [tack_on] to append to it. This is intended to ease the process of pulling specific components from a list for further treatment in the import process by isolating that component.

Usage

get_component(obj, obj_component, silence = TRUE, log_ns = "global", ...)

Arguments

obj

LIST or NAMED vector in which to find obj_component

obj_component

CHR vector of named elements to find in obj

silence

LGL scalar indicating whether to silence recursive messages, which may be the same for each element of obj (default: TRUE)

log_ns

CHR scalar of the logging namespace to use (default: “global”)

…

Additional arguments passed to/from the ellipsis parameter of calling functions. If named, names are preserved.

Details

This is similar in scope to [purrr::pluck] in many regards, but always returns items with names, and will search an entire list structure, including data frames, to return all values associated with that name in individual elements.

Value

LIST object containing the elements of obj

Note

This is a recursive function.

If ellipsis arguments are provided, they will be appended to each identified component via [tack_on]. Use with caution, but this can be useful for appending common data to an entire list (e.g. a datetime stamp for logging processing time or a processor name, human or software).

Examples

get_component(list(a = letters, b = 1:10), "a")
get_component(list(ex = list(a = letters, b = 1:10), ex2 = list(c = 1:5, a = LETTERS)), "a")
get_component(list(a = letters, b = 1:10), "a", c = 1:5)


get_compound_fragments R Documentation

Get all fragments associated with compounds

Description

Get all fragments associated with compounds

Usage

get_compound_fragments(con, fragmentions, masserror, minerror)

Arguments

con

SQLite database connection

fragmentions

numeric vector containing m/z values for fragments to search

masserror

numeric relative mass error (ppm)

minerror

numeric minimum mass error (Da)

Value

data.frame object describing known fragments in the database with known compound and peak references attached


get_compoundid R Documentation

Get compound ID and name for specific peaks

Description

Get compound ID and name for specific peaks

Usage

get_compoundid(con, peakid)

Arguments

con

SQLite database connection

peakid

integer vector of primary keys for peaks table

Value

table of compound IDs and names


get_fkpk_relationships R Documentation

Extract foreign key relationships from a schema

Description

This convenience function is part of the automatic generation of SQL commands building views and triggers from a defined schema. Its sole purpose is as a pre-pass extraction of foreign key relationships between tables from an object created by [db_map], which in turn relies on specific formatting in the schema SQL definitions.

Usage

get_fkpk_relationships(er_map(db_conn = con))

Arguments

db_map

LIST object containing descriptions of table mapping in an opinionated manner, generally generated by [er_map]. The expectation is a list of tables, with references in SQL form enumerated in a child element with a name matching ‘references_in’

references_in

CHR scalar naming the child element containing SQL references statements of the form “fk_column REFERENCES table(pk_column)” (default: “references” is provided by [er_map])

dictionary

LIST object containing the schema dictionary produced by [data_dictionary] fully describing table entities

Value

LIST of data frames with one element for each table with a foreign key defined

Note

This only functions for list objects formatted correctly. That is, each entry in [db_map] must contain an element with a name matching that provided to ‘references_in’ which contains a character vector formatted as “table1 REFERENCES table2(pk_column)”.


get_massadj R Documentation

Calculate the mass adjustment for a specific adduct

Description

Calculate the mass adjustment for a specific adduct

Usage

get_massadj(adduct = "+H", exactmasses = NULL, db_conn = "con")

Arguments

adduct

character string containing the + or - and the elemental formula of the adduct, note “2H” should be represented as “H2”

exactmasses

list of exact masses of elements, NULL pulls from the database

db_conn

database connection object, either a CHR scalar name (default: “con”) or the connection object itself (preferred)

Value

NUM scalar of the mass adjustment value
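
Examples

## A usage sketch; exact masses are pulled from the database when
## 'exactmasses' is NULL (assumes an open connection 'con'):
get_massadj(adduct = "-H", db_conn = con)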


get_msconvert_data R Documentation

Extract msconvert metadata

Description

Extracts relevant ProteoWizard MSConvert metadata from an mzML file. Used by the ‘peak_gather_json’ function.

Usage

get_msconvert_data(mzml)

Arguments

mzml

list of msdata from ‘mzMLtoR’ function

Value

list of msconvert parameters


get_msdata R Documentation

Get all mass spectral data within the database

Description

Get all mass spectral data within the database

Usage

get_msdata(con)

Arguments

con

SQLite database connection

Value

data.frame of mass spectral data


get_msdata_compound R Documentation

Get all mass spectral data for a specific compound

Description

Get all mass spectral data for a specific compound

Usage

get_msdata_compound(con, compoundid)

Arguments

con

SQLite database connection

compoundid

integer compound ID value

Value

data.frame of mass spectral data


get_msdata_peakid R Documentation

Get all mass spectral data for a specific peak id

Description

Get all mass spectral data for a specific peak id

Usage

get_msdata_peakid(con, peakid)

Arguments

con

SQLite database connection

peakid

integer vector of peak ids

Value

data.frame of mass spectral data


get_msdata_precursors R Documentation

Get all mass spectral data with a specific precursor ion

Description

Get all mass spectral data with a specific precursor ion

Usage

get_msdata_precursors(con, precursorion, masserror, minerror)

Arguments

con

SQLite database connection

precursorion

numeric precursor ion m/z value

masserror

numeric relative mass error (ppm)

minerror

numeric minimum mass error (Da)

Value

data.frame of mass spectral data


get_opt_params R Documentation

Get optimized uncertainty mass spectra parameters for a peak

Description

Get optimized uncertainty mass spectra parameters for a peak

Usage

get_opt_params(con, peak_ids)

Arguments

con

SQLite database connection

peak_ids

integer vector of primary keys for peaks table

Value

data.frame object of available optimized search parameters


get_peak_fragments R Documentation

Get annotated fragments for a specific peak

Description

Get annotated fragments for a specific peak

Usage

get_peak_fragments(con, peakid)

Arguments

con

SQLite database connection

peakid

integer vector of primary keys for peaks table

Value

data.frame of annotated fragments


get_peak_precursor R Documentation

Get precursor ion m/z for a specific peak

Description

Get precursor ion m/z for a specific peak

Usage

get_peak_precursor(con, peakid)

Arguments

con

SQLite database connection

peakid

integer primary key for peaks table

Value

numeric value of precursor ion m/z value


get_sample_class R Documentation

Get sample class information for specific peaks

Description

Get sample class information for specific peaks

Usage

get_sample_class(con, peakid)

Arguments

con

SQLite database connection

peakid

integer vector of primary keys for peaks table

Value

data.frame object of sample classes associated with a given peak


get_search_object R Documentation

Generate msdata object from input peak data

Description

Generate msdata object from input peak data

Usage

get_search_object(searchmzml, zoom = c(1, 4))

Arguments

searchmzml

mzML object with the searching data.frame attached, from the ‘getmzML’ function

zoom

vector of length 2 containing the +/- window around the MS1 precursor ion within which to collect data.

Value

LIST object of data.frames including MS1 and MS2 analytical data, and the search parameters used to generate them
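
Examples

## A sketch (assumes 'smz' is the output of the 'getmzML' function):
search_obj <- get_search_object(smz, zoom = c(1, 4))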


get_suspectlist R Documentation

Get the current NIST PFAS suspect list.

Description

Downloads the current NIST suspect list of PFAS from the NIST Public Data Repository to the current project directory.

Usage

get_suspectlist(
  destfile = file.path("R", "compoundlist", "suspectlist.xlsx"),
  url_file = file.path("config", "suspectlist_url.txt"),
  default_url = SUS_LIST_URL,
  save_local = FALSE
)

Arguments

destfile

CHR scalar file.path of location to save the downloaded file

url_file

CHR scalar file.path of the text file containing the download URL for the NIST PFAS Suspect List

save_local

LGL scalar of whether to retain an R expression in the current environment after download

Value

none

Examples

get_suspectlist()

get_ums R Documentation

Generate consensus mass spectrum

Description

The function calculates the uncertainty mass spectrum for a single peak table based on specific settings described in https://doi.org/10.1021/jasms.0c00423

Usage

get_ums(
  peaktable,
  correl = NULL,
  ph = NULL,
  freq = NULL,
  normfn = "sum",
  cormethod = "pearson"
)

Arguments

peaktable

result of the ‘create_peak_table_ms1’ or ‘create_peak_table_ms2’ function

correl

Minimum correlation coefficient between the target ions and the base ion intensity of the targeted m/z to be included in the mass spectrum

ph

Minimum chromatographic peak height from which to extract MS2 data for the mass spectrum

freq

minimum observational frequency of the target ions to be included in the mass spectrum

normfn

the normalization function, typically “mean” or “sum”, for normalizing the intensity values

cormethod

the correlation method used for calculating the correlation, see ‘cor’ function for methods

Value

nested list of dataframes containing all MS1 and MS2 data for the peak
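
Examples

## A sketch with illustrative thresholds (assumes 'pt' is the result of
## 'create_peak_table_ms1' or 'create_peak_table_ms2'):
get_ums(pt, correl = 0.8, ph = 10, freq = 10, normfn = "sum")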


get_uniques R Documentation

Get unique components of a nested list

Description

There are times when the concept of “samples” and “grouped data” may become intertwined and difficult to parse. The import process is one of those times depending on how the import file is generated. This function takes a nested list and compares a specific aspect of it, grouping the output based on that aspect and returning its characteristics.

Usage

get_uniques(objects, aspect)

Arguments

objects

LIST object

aspect

CHR scalar name of the aspect from which to generate unique combinations

Details

For example, the standard NIST import includes the “sample” aspect, which may be identical for multiple data import files. This provides a unique listing of those sample characteristics to reduce data manipulation and storage, and minimize database “chatter” during read/write. It returns a set of unique characteristics in a list, with an appended “import_object” element giving the index number and object name of entries matching those characteristics.

This is largely superseded by later developments to database operations that first check for a table primary key id given a comprehensive list of column values in those tables where only a single record should contain those values (e.g. a complete unique case, enforced or unenforced).

Value

Unnamed LIST of length equaling the number of unique combinations with their values and indices

Examples

tmp <- list(list(a = 1:10, b = 1:10), list(a = 1:5, b = 1:5), list(a = 1:10, b = 1:5))
get_uniques(tmp, "a")

getcharge R Documentation

Get polarity of a ms scan within mzML object

Description

Get polarity of a ms scan within mzML object

Usage

getcharge(mzml, i)

Arguments

mzml

list mzML object generated from ‘mzMLtoR’ function

i

integer scan number

Value

integer representing scan polarity (either 1 (positive) or -1 (negative))


getmslevel R Documentation

Get MS Level of a ms scan within mzML object

Description

Get MS Level of a ms scan within mzML object

Usage

getmslevel(mzml, i)

Arguments

mzml

list mzML object generated from ‘mzMLtoR’ function

i

integer scan number

Value

integer representing the MS Level (1, 2, … n)


getmzML R Documentation

Bring a raw data file into the environment

Description

If the file name does not have the .mzML extension, the raw file is converted.

Usage

getmzML(
  search_df,
  CONVERT = FALSE,
  CHECKCONVERT = TRUE,
  is_waters = FALSE,
  lockmass = NULL,
  lockmasswidth = NULL,
  correct = FALSE
)

Arguments

search_df

data.frame output of [create_search_df] or file name of a raw file to be converted

CONVERT

LGL scalar of whether or not to convert the search_df filename (default FALSE)

CHECKCONVERT

LGL scalar of whether or not to verify the conversion format (default TRUE)

Value

LIST value of the trimmed mzML file matching search criteria


getprecursor R Documentation

Get precursor ion of a ms scan within mzML object

Description

Get precursor ion of a ms scan within mzML object

Usage

getprecursor(mzml, i)

Arguments

mzml

list mzML object generated from ‘mzMLtoR’ function

i

integer scan number

Value

numeric designating the precursor ion (or the middle of the scan range for SWATH or DIA); returns NULL if no precursor was selected


gettime R Documentation

Get time of a ms scan within mzML object

Description

Get time of a ms scan within mzML object

Usage

gettime(mzml, i)

Arguments

mzml

list mzML object generated from ‘mzMLtoR’ function

i

integer scan number

Value

numeric of the scan time


has_missing_elements R Documentation

Simple check for if an object is empty

Description

Checks for empty vectors, a blank character string, NULL, and NA values. If fed a list object, returns TRUE if any element is in the “empty” set. For data.frames, checks that nrow is not 0. [rlang:::is_empty] only checks for length 0.

Usage

has_missing_elements(x, logging = TRUE)

Arguments

x

Object to be checked

logging

LGL scalar of whether or not to make log messages (default: TRUE)

Value

LGL scalar of whether x is empty

Note

Reminder that vectors created with NULL values will be automatically reduced by R.

Examples

has_missing_elements("a")
# FALSE
has_missing_elements(c(NULL, 1:5))
# FALSE
has_missing_elements(list(NULL, 1:5))
# TRUE
has_missing_elements(data.frame(a = character(0)))
# TRUE

is_elemental_match R Documentation

Checks if two elemental formulas match

Description

Checks if two elemental formulas match

Usage

is_elemental_match(testformula, trueformula)

Arguments

testformula

character string of elemental formula to test

trueformula

character string of elemental formula to check against (truth)

Value

logical


is_elemental_subset R Documentation

Check if elemental formula is a subset of another formula

Description

Check if elemental formula is a subset of another formula

Usage

is_elemental_subset(fragmentformula, parentformula)

Arguments

fragmentformula

character string of elemental formula subset to test

parentformula

character string of elemental formula to check for subset

Value

logical

Examples

is_elemental_subset("C2H2", "C2H5O")

is_elemental_subset("C2H2", "C2H1O")

isotopic_distribution R Documentation

Isotopic Distribution Functions: generate the isotopic distribution mass spectrum of an elemental formula

Description

Generate the isotopic distribution mass spectrum of an elemental formula.

Usage

isotopic_distribution(
  elementalformula,
  exactmasschart,
  remove.elements = c(),
  max.dist = 3,
  min.int = 0.001,
  charge = "neutral"
)

Arguments

elementalformula

character string of elemental formula to simulate isotopic pattern

exactmasschart

exact mass chart generated from function create_exactmasschart

remove.elements

character vector of elements to remove from elemental formula

max.dist

numeric maximum mass distance (in Da) from exact mass to include in simulated isotopic pattern

min.int

numeric minimum relative intensity (maximum = 1, minimum = 0) to include in simulated isotopic pattern

charge

character string for the charge state of the simulated isotopic pattern, options are ‘neutral’, ‘positive’, and ‘negative’

Value

data frame containing mz and int values of mass spectrum
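
Examples

## A sketch (assumes 'emc' was generated by 'create_exactmasschart'; the
## formula is illustrative):
isotopic_distribution("C8HF17O3S", emc, charge = "negative")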


lockmass_remove R Documentation

Remove lockmass scan from mzml object

Description

For Waters instruments only, identifies the scans that are due to a lock mass scan and removes them for easier processing.

Usage

lockmass_remove(
  mzml,
  lockmass = NULL,
  lockmasswidth = NULL,
  correct = FALSE,
  approach = "baseion"
)

Arguments

mzml

mzML object generated from mzMLtoR() function

lockmass

m/z value of the lockmass to remove

lockmasswidth

m/z value for the half-window of the lockmass scan

correct

logical indicating whether the subsequent spectra should be corrected

Value

A copy of the object provided to ‘mzml’ with the lock mass removed.


log_as_dataframe R Documentation

Pull a log file into an R object

Description

Log messages generated by logger with anything other than the standard formatting options can have multiple formatting tags to display in the R console. These “junk up” any resulting object. If you want to read a log directly in the console and preserve formatting, call [read_log] with the default ‘as_object’ argument (FALSE). For deeper inspection, a data frame works well, provided the formatting matches up. ‘env_logger.R’ provides an option to set formatting layouts; alongside those layouts, generate regex strings matching the desired format: ‘log_remove_color’ will remove the colors (the majority should be caught by the string provided as the default in this package) and ‘log_split_column’ will split the lines in your logging file into discrete categories named by ‘df_titles’.

Usage

log_as_dataframe("log.txt")

Arguments

file

CHR scalar file path to a log file (default NULL is translated as “log.txt”)

last_n

INT scalar of the last ‘n’ log entries to read.

condense

LGL scalar of whether to nest the resulting tibble by the nearest second.

regex_remove

CHR scalar regular expression of characters to REMOVE from log messages via [stringr::str_remove_all]

regex_split

CHR scalar regular expression of characters used to split the log entry into columns from log messages via [tidyr::separate]

df_titles

CHR vector of headers for the resulting data frame, passed as the “into” argument of [tidyr::separate]

Details

This will attempt to fail gracefully.

Value

tibble with one row per log entry (or groups)

Note

If “time” is included and ‘condense’ == TRUE, the log messages in the resulting tibble will be nested to the nearest second.

If “status” is included it will be a factor with levels including the valid statuses from logger (see [logger::log_levels]).

Use care to develop ‘regex_split’ in order to split the log entries into the appropriate columns as defined by ‘df_titles’; extra values will be merged into the messages column.


log_fn R Documentation

Simple logging convenience

Description

Conveniently add a log message at the trace level. Typically this would be called twice, bookending the body of a function along the lines of “Start fn()” and “End fn()”. This can help provide traceability for deeply nested function calls within a log.

Usage

log_fn(status = "start", log_ns = NA_character_, level = "trace")

Arguments

status

CHR scalar to prefix the log message; will be coerced to sentence case. Typically “start” or “end” but anything is accepted (default “start”).

log_ns

CHR scalar of the logger namespace to use (default NA_character_)

level

CHR scalar of the logging level to be passed to [log_it] (default “trace”)

Value

None, hands logging messages to [log_it]
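
Examples

fn <- function() { log_fn("start"); 1 + 1; log_fn("end") }
fn()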


log_it R Documentation

Conveniently log a message to the console

Description

Use this to log messages of arbitrary level and message. It works best with [logger] but will also print directly to the console to support setups where package [logger] may not be available or custom log levels are desired.

Usage

log_it(
  log_level,
  msg = NULL,
  log_ns = NULL,
  reset_logger_settings = FALSE,
  reload_all = FALSE,
  logger_settings = file.path("config", "env_logger.R"),
  add_unknown_ns = FALSE,
  clone_settings_from = NULL
)

Arguments

log_level

CHR scalar of the level at which to log a given statement. If using the [logger] package, must match one of [logger:::log_levels]

msg

CHR scalar of the message to accompany the log.

log_ns

CHR scalar of the logging namespace to use during execution (default: NULL prints to the global logging namespace)

reset_logger_settings

LGL scalar indicating whether or not to refresh the logger settings using the file identified in logger_settings (default: FALSE)

reload_all

LGL scalar indicating whether, during reset_logger_settings, to reload the R environment configuration file

logger_settings

CHR file path to the file containing logger settings (default: file.path(“config”, “env_logger.R”))

add_unknown_ns

LGL scalar indicating whether or not to add a new namespace if log_ns is not defined in logger_settings (default: FALSE)

clone_settings_from

CHR scalar indicating

Details

When using [logger], create settings for each namespace in file config/env_logger.R as a list (see examples there) and make sure it is sourced. If using with [logger] and “file” or “both” is selected for the namespace’s ‘LOGGING[[log_ns]]$to’ parameter in ‘env_logger.R’, logs will be written to disk at the file defined in ‘LOGGING[[log_ns]]$file’ as well as to the console.

Value

Adds to the logger file (if enabled) and/or prints to the console if enabled.

Examples

log_it("test", "a test message")
test_log <- function() {
  log_it("success", "a success message")
  log_it("warn", "a warning message")
}
test_log()
# Try it with and without logger loaded.

make_acronym R Documentation

Simple acronym generator

Description

At times it is useful for display purposes to generate acronyms for longer bits of text. This naively generates those by extracting the first letter as upper case from each word in text elements.

Usage

make_acronym(text)

Arguments

text

CHR vector of the text to acronym-ize

Value

CHR vector of length equal to that of text with the acronym

Examples

make_acronym("test me")
make_acronym(paste("department of ", c("commerce", "energy", "defense")))

make_install_code R Documentation

Convenience function to set a new installation code

Description

Convenience function to set a new installation code

Usage

make_install_code(db_conn = con, new_name = NULL, log_ns = "db")

Arguments

db_conn

connection object (default: con)

new_name

CHR scalar of the human readable name of the installation (e.g. your project name) (default: NULL)

log_ns

CHR scalar of the logging namespace to use during execution (default: “db”)

Value

None


make_requirements R Documentation

Make import requirements file

Description

Importing from the NIST contribution spreadsheet requires a certain format. In order to proceed smoothly, that format must be verified for gross integrity with regard to expectations about shape (i.e. class), names of elements, and whether they are required for import. This function creates a JSON expression of the expected import structure and saves it to the project directory.

Usage

make_requirements(
  example_import,
  file_name = "import_requirements.json",
  not_required = c("annotation", "chromatography", "opt_ums_params"),
  archive = TRUE,
  retain_in_R = TRUE,
  log_ns = "db"
)

Arguments

example_import

CHR or LIST object containing an example of the expected import format; this should include only a SINGLE compound contribution file

file_name

CHR scalar file name under which to save the resulting requirements, also used to search for an existing file to archive if ‘archive’ = TRUE (default: “import_requirements.json”)

not_required

CHR vector matching element names of ‘example_import’ which are not required; all others will be assumed to be required

archive

LGL indicating whether or not to archive an existing file matching ‘file_name’ by suffixing the file name with the current date. Only one archive per date is supported; if such a file already exists, it will be deleted. (default: TRUE)

retain_in_R

LGL indicating whether to retain a local copy of the requirements file generated (default: TRUE)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

Details

Either an existing JSON expression or an R list object may be used for ‘example_import’. If it is a character scalar, it will be assumed to be a file name, which will be loaded based on file extension. That file must be a JSON parseable text file, though raw text is acceptable.

An example file is located in the project directory at “example/PFAC30PAR_PFCA1_mzML_cmpd2627.JSON”

As with any file manipulation, use care with ‘file_name’.

Value

writes a file to the project directory (based on the found location of ‘file_name’) with the JSON structure
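
Examples

## A sketch using the example import file shipped with the project:
make_requirements("example/PFAC30PAR_PFCA1_mzML_cmpd2627.JSON")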


manage_connection R Documentation

Check for, and optionally remove, a database connection object

Description

This function abstracts database connection management to a degree, streamlining the process of connecting and disconnecting existing connections as defined by its parameters. This release has not been tested extensively with drivers other than SQLite.

Usage

manage_connection("test.sqlite", conn_name = "test_con")

Arguments

db

CHR scalar name of the database to check, defaults to the name supplied in config/env.R (default: session variable DB_NAME)

drv_pack

CHR scalar of the package used to connect to this database (default: session variable DB_DRIVER)

conn_class

CHR vector of connection object classes to check against. Note this may depend heavily on connection packages and must be present in the class names of the driver used. (default session variable DB_CLASS)

conn_name

CHR scalar of the R environment object name to use for this connection (default: “con”)

is_local

LGL scalar indicating whether or not the referenced database is a local file, if not it will be treated as though it is either a DSN or a database name on your host server, connecting as otherwise defined

rm_objects

LGL scalar indicating whether or not to remove objects identifiably connected to the database from the current environment. This is particularly useful if there are outstanding connections that need to be closed (default: TRUE)

reconnect

LGL scalar indicating whether or not to connect if a connection does not exist; if both this and ‘disconnect’ are true, it will first be disconnected before reconnecting. (default: TRUE)

disconnect

LGL scalar indicating whether or not to terminate and remove the connection from the current global environment (default: TRUE)

log_ns

CHR scalar of the namespace (if any) to use for logging

.environ

environment within which to place this connection object

…

named list of any other connection parameters required for your database driver (e.g. postgres username/password)

Value

None

Note

If you want to disconnect everything but retain tibble pointers to your data source as tibbles in this session, use [close_up_shop] instead.

For more complicated setups, it may be easier to use this function by storing parameters in a list and calling with [base::do.call()]
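
Examples

## The do.call pattern from the note above (parameter values are
## illustrative, taken from the usage line):
conn_args <- list(db = "test.sqlite", conn_name = "test_con", disconnect = FALSE)
do.call("manage_connection", conn_args)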


map_import R Documentation

Map an import file to the database schema

Description

This parses an import object and attempts to map it to database fields and tables as defined by an import map stored in an object of class data.frame, typically created during project compliance as “IMPORT_MAP”. This object is a list of all columns and their tables in the import file matched with the database table and column to which they should be imported.

Usage

map_import(
  import_obj,
  aspect,
  import_map,
  case_sensitive = TRUE,
  fuzzy = FALSE,
  ignore = TRUE,
  id_column = "_*id$",
  alias_column = "^alias$",
  resolve_normalization = TRUE,
  strip_na = FALSE,
  db_conn = con,
  log_ns = "db"
)

Arguments

import_obj

LIST object of values to import

aspect

CHR scalar of the import aspect (e.g. “sample”) to map

import_map

data.frame object of the import map (e.g. from a CSV)

case_sensitive

LGL scalar of whether to match on a case sensitive basis (the default TRUE searches for values as-provided) or whether to coerce value matches by upper, lower, sentence, and title case matches

fuzzy

LGL scalar of whether to do a “fuzzy” match in the sense that values provided are wrapped in an SQL “LIKE ‘%value%’” clause, ignoring the ‘case_sensitive’ setting if TRUE (default: FALSE).

db_conn

connection object (default: con)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

Value

LIST of final mapped values

Note

The object used for ‘import_map’ must be a data.frame that at minimum includes columns named import_category, import_parameter, alias_lookup, and sql_normalization
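
Examples

A minimal sketch, assuming ‘obj’ is a parsed import list and an ‘IMPORT_MAP’ data.frame exists in the session:

mapped_sample <- map_import(obj, aspect = "sample", import_map = IMPORT_MAP)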


mode_checks R Documentation

Get list of available functions

Description

Helper function for verify_args() that returns all the currently available functions matching a given prefix. This searches the entire library associated with the current R install.

Usage

mode_checks(prefix = "is", use_deprecated = FALSE)

Arguments

prefix

CHR scalar for the function prefix to search (default “is”)

use_deprecated

BOOL scalar indicating whether or not to include functions marked as deprecated (PLACEHOLDER default FALSE)

Details

Note: argument use_deprecated is not currently used but serves as a placeholder for future development to avoid or include deprecated functions

Value

CHR vector of functions matching prefix

Examples

mode_checks()

molecule_picture R Documentation

Picture a molecule from structural notation

Description

This is a thin wrapper to rdkit.Chem.MolFromX methods to generate molecular models from common structure notation such as InChI or SMILES. All picture files produced will be in portable network graphics (.png) format.

Usage

caffeine <- "C[n]1cnc2N(C)C(=O)N(C)C(=O)c12"
molecule_picture(caffeine, show = TRUE)

Arguments

mol

CHR scalar expression of molecular structure

mol_type

CHR scalar indicating the expression type of ‘mol’ (default: “smiles”)

file_name

CHR scalar of an intended file destination (default: NULL will produce a random 10 character file name). Note that any file extensions provided here will be ignored.

rdkit_name

CHR scalar indicating the name of the R object bound to RDKit, OR the R object directly (i.e. without quotes)

open_file

LGL scalar of whether to open the file after creation (default: FALSE)

show

LGL scalar of whether to return the image itself as an object (default: FALSE)

Value

None, or displays the resulting picture if ‘show == TRUE’

Note

Supported ‘mol’ expressions include FASTA, HELM, Inchi, Mol2Block, Mol2File, MolBlock, MolFile, PDBBlock, PDBFile, PNGFile, PNGString, RDKitSVG, Sequence, Smarts, Smiles, TPLBlock, and TPLFile


monoisotope.list R Documentation

Calculate the monoisotopic masses of elemental formulas in a data.frame column

Description

Calculate the monoisotopic masses of elemental formulas in a data.frame column.

Usage

monoisotope.list(
  df,
  column,
  exactmasses,
  remove.elements = c(),
  adduct = "neutral"
)

Arguments

df

data.frame with at least one column with elemental formulas

column

integer or CHR scalar indicating the column containing the elemental formulas, if CHR then regex match is used

exactmasses

list of exact masses

remove.elements

elements to remove from the elemental formulas

adduct

character string adduct/charge state to add to the elemental formula, options are ‘neutral’, ‘+H’, ‘-H’, ‘+Na’, ‘+K’, ‘+’, ‘-’, ‘-radical’, ‘+radical’

Value

data.frame with column of exact masses appended to it
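
Examples

A minimal sketch, assuming an ‘exactmasses’ reference list is loaded in the session:

df <- data.frame(formula = c("C2HF3O2", "C8HF15O2"))
monoisotope.list(df, column = "formula", exactmasses = exactmasses, adduct = "-H")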


ms_plot_peak R Documentation

Plot a peak from database mass spectral data

Description

Plots the intensity of ion traces over the scan period and annotates them with the mass to charge value. Several flexible plotting aspects are provided as data may become complicated.

Usage

ms_plot_peak(
  data,
  peak_type = c("area", "line", "segment"),
  peak_facet_by = "ms_n",
  peak_mz_resolution = 0,
  peak_drop_ratio = 0.01,
  peak_repel_labels = TRUE,
  peak_line_color = "black",
  peak_fill_color = "grey50",
  peak_fill_alpha = 0.2,
  peak_text_size = 3,
  peak_text_offset = 0.02,
  include_method = TRUE,
  db_conn = con
)

Arguments

data

data.frame of spectral data in the form of the ‘ms_data’ table

peak_type

CHR scalar of the plot type to draw, must be one of “area”, “line”, or “segment” (default: “area”)

peak_facet_by

CHR scalar name of a column by which to facet the resulting plot (default: “ms_n”)

peak_mz_resolution

INT scalar mass to charge ratio tolerance to group peaks, with at minimum columns for intensity (as “base_int”), ion m/z value (as “base_ion”), and scan time (as “scantime”) - (default: 0 goes to unit resolution)

peak_drop_ratio

NUM scalar threshold of the maximum intensity below which traces will be dropped (default: 1e-2 means any trace with a maximum intensity less than 1% of the maximum intensity in the plot will be dropped); if > 1 the inversion will be used (1e5 -> 1e-5)

peak_repel_labels

LGL scalar on whether to use the [ggrepel] package to space out m/z labels in the plot (default: TRUE). If [ggrepel] is not installed, it will default to FALSE rather than requiring an installation

peak_line_color

CHR scalar name of the color to use for the “color” aesthetic (only a single color is supported; default: “black”)

peak_fill_color

CHR scalar name of the color to use for the “fill” aesthetic (only a single color is supported; default: “grey50”)

peak_text_offset

NUM scalar y-axis offset as a fraction of the maximum intensity for trace annotation (default: 0.02 offsets labels in the positive direction by 2% of the maximum intensity)

db_conn

database connection (default: con) which must be live to pull sample and compound identification information

Details

The basic default plot will group all mass-to-charge ratio values by unit resolution (increase resolution with ‘peak_mz_resolution’) and plot them as an area trace over the scanning period. Traces are annotated with the grouping value. Values of ‘peak_mz_resolution’ greater than available data (e.g. 10 when data resolution is to the 5th decimal point) will default to maximum resolution.

Traces are filtered out completely if their maximum intensity is below the ratio set by ‘peak_drop_ratio’; only complete traces are filtered out this way, not individual data points within a retained trace. Set this as the fraction of the base peak (the peak of maximum intensity) to use to filter out low-intensity traces. The calculated intensity threshold will be printed to the caption.

Value

ggplot object

Note

Increasing ‘peak_mz_resolution’ will likely result in multiple separate traces.

Implicitly missing values are not interpolated, but lines are drawn through to the next point.

‘peak_type’ will accept abbreviations of its accepted values (e.g. “l” for “line”)
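
Examples

A minimal sketch; ‘peak_data’ stands in for a data.frame of rows from the ‘ms_data’ table:

ms_plot_peak(peak_data, peak_type = "area", peak_mz_resolution = 3)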


ms_plot_peak_overview R Documentation

Create a patchwork plot of peak spectral properties

Description

Call this function to generate a combined plot from [ms_plot_peak], [ms_plot_spectra], and [ms_plot_spectral_intensity] using the [patchwork] package, which must be installed. All arguments will be passed directly to the underlying functions to provide flexibility in the final display. The default settings match those of the called plotting functions, and the output can be further manipulated with the patchwork package.

Usage

ms_plot_peak_overview(
  plot_peak_id,
  peak_type = c("area", "line", "segment"),
  peak_facet_by = "ms_n",
  peak_mz_resolution = 0,
  peak_drop_ratio = 0.01,
  peak_repel_labels = TRUE,
  peak_line_color = "black",
  peak_fill_color = "grey50",
  peak_fill_alpha = 0.2,
  peak_text_size = 3,
  peak_text_offset = 0.02,
  spectra_mz_resolution = 3,
  spectra_drop_ratio = 0.01,
  spectra_repel_labels = TRUE,
  spectra_repel_line_color = "grey50",
  spectra_nudge_y_factor = 0.03,
  spectra_log_y = FALSE,
  spectra_text_size = 3,
  spectra_max_overlaps = 50,
  intensity_plot_resolution = c("spectra", "peak"),
  intensity_mz_resolution = 3,
  intensity_drop_ratio = 0,
  patchwork_design = c(area(1, 4, 7, 7), area(1, 1, 4, 2), area(6, 1, 7, 2)),
  as_individual_plots = FALSE,
  include_method = TRUE,
  db_conn = con,
  log_ns = "global"
)

Arguments

peak_type

CHR scalar of the plot type to draw, must be one of “area”, “line”, or “segment” (default: “area”)

peak_facet_by

CHR scalar name of a column by which to facet the resulting plot (default: “ms_n”)

peak_mz_resolution

INT scalar mass to charge ratio tolerance to group peaks, with at minimum columns for intensity (as “base_int”), ion m/z value (as “base_ion”), and scan time (as “scantime”) - (default: 0 goes to unit resolution)

peak_drop_ratio

NUM scalar threshold of the maximum intensity below which traces will be dropped (default: 1e-2 means any trace with a maximum intensity less than 1% of the maximum intensity in the plot will be dropped); if > 1 the inversion will be used (1e5 -> 1e-5)

peak_repel_labels

LGL scalar on whether to use the [ggrepel] package to space out m/z labels in the plot (default: TRUE). If [ggrepel] is not installed, it will default to FALSE rather than requiring an installation

peak_line_color

CHR scalar name of the color to use for the “color” aesthetic (only a single color is supported; default: “black”)

peak_fill_color

CHR scalar name of the color to use for the “fill” aesthetic (only a single color is supported; default: “grey50”)

peak_text_offset

NUM scalar y-axis offset as a fraction of the maximum intensity for trace annotation (default: 0.02 offsets labels in the positive direction by 2% of the maximum intensity)

spectra_mz_resolution

INT scalar mass to charge ratio tolerance to group peaks, with at minimum columns for intensity (as base_int), ion m/z value (as base_ion), and scan time (as scantime) - (default: 3)

spectra_drop_ratio

NUM scalar threshold of the maximum intensity below which traces will be dropped (default: 1e-2 means any trace with a maximum intensity less than 1% of the maximum intensity in the plot will be dropped); if > 1 the inversion will be used (1e5 -> 1e-5)

spectra_repel_labels

LGL scalar on whether to use the [ggrepel] package to space out m/z labels in the plot (default: TRUE). If [ggrepel] is not installed, it will default to FALSE rather than requiring an installation

spectra_repel_line_color

CHR scalar name of the color to use for the “color” aesthetic of the lines connecting repelled labels to their data points; passed to [ggrepel::geom_text_repel] as segment.color (only a single color is supported; default: “grey50”)

spectra_nudge_y_factor

NUM scalar y-axis offset as a fraction of the maximum intensity for trace annotation (default: 0.03 offsets labels in the positive direction by 3% of the maximum intensity)

spectra_log_y

LGL scalar of whether or not to apply a log10 scaling factor to the y-axis (default: FALSE)

spectra_text_size

NUM scalar of the text size to use for annotation labels (default: 3)

spectra_max_overlaps

INT scalar of the maximum number of text overlaps to allow (default: 50)

intensity_mz_resolution

INT scalar mass to charge ratio tolerance to group peaks, with at minimum columns for intensity (as “base_int” or “intensity”), ion m/z value (as “base_ion” or “mz”), and scan time (as “scantime”) - (default: 3)

intensity_drop_ratio

NUM scalar threshold of the maximum intensity below which traces will be dropped (default: 0 returns all); if > 1 the inversion will be used (1e5 -> 1e-5)

patchwork_design

the layout of the final plot; see [patchwork::design]

as_individual_plots

LGL scalar of whether to return the plots individually in a list (set TRUE) or as a patchwork plot (default: FALSE)

db_conn

database connection (default: con) which must be live to pull sample and compound identification information

Value

object of classes ‘gg’ and ‘ggplot’, as a patchwork unless ‘as_individual_plots’ is TRUE

Note

Requires a live connection to the database to pull all plots for a given peak_id.

Defaults match those of the called functions.
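
Examples

A minimal sketch for a hypothetical peak ID, using the default plot settings (requires a live database connection):

ms_plot_peak_overview(plot_peak_id = 1)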


ms_plot_spectra R Documentation

Plot a fragment map from database mass spectral data

Description

Especially for non-targeted analysis workflows, it is often necessary to examine annotated fragment data for spectra across a given peak of interest. Annotated fragments lend increasing confidence in the identification of the compound giving rise to a mass spectral peak. If a fragment has been annotated, that identification is displayed along with the mass to charge value in blue. Annotations of the mass to charge ratio for unannotated fragments are displayed in red.

Usage

ms_plot_spectra(
  data,
  spectra_type = c("separated", "zipped"),
  spectra_mz_resolution = 3,
  spectra_drop_ratio = 0.01,
  spectra_repel_labels = TRUE,
  spectra_repel_line_color = "grey50",
  spectra_nudge_y_factor = 0.03,
  spectra_log_y = FALSE,
  spectra_is_file = FALSE,
  spectra_from_JSON = FALSE,
  spectra_animate = FALSE,
  spectra_text_size = 3,
  spectra_max_overlaps = 50,
  include_method = TRUE,
  db_conn = con
)

Arguments

data

data.frame of spectral data in the form of the ‘ms_data’ table

spectra_mz_resolution

INT scalar mass to charge ratio tolerance to group peaks, with at minimum columns for intensity (as base_int), ion m/z value (as base_ion), and scan time (as scantime) - (default: 3)

spectra_drop_ratio

NUM scalar threshold of the maximum intensity below which traces will be dropped (default: 1e-2 means any trace with a maximum intensity less than 1% of the maximum intensity in the plot will be dropped); if > 1 the inversion will be used (1e5 -> 1e-5)

spectra_repel_labels

LGL scalar on whether to use the [ggrepel] package to space out m/z labels in the plot (default: TRUE). If [ggrepel] is not installed, it will default to FALSE rather than requiring an installation

spectra_repel_line_color

CHR scalar name of the color to use for the “color” aesthetic of the lines connecting repelled labels to their data points; passed to [ggrepel::geom_text_repel] as segment.color (only a single color is supported; default: “grey50”)

spectra_nudge_y_factor

NUM scalar y-axis offset as a fraction of the maximum intensity for trace annotation (default: 0.03 offsets labels in the positive direction by 3% of the maximum intensity)

spectra_log_y

LGL scalar of whether or not to apply a log10 scaling factor to the y-axis (default: FALSE)

spectra_is_file

LGL scalar of whether data are coming from a file (default: FALSE)

spectra_from_JSON

LGL scalar of whether data are in JSON format; other formats are not supported when ‘spectra_is_file = TRUE’ (default: FALSE)

spectra_animate

LGL scalar of whether to produce an animation across the scantime for these data (default: FALSE)

spectra_text_size

NUM scalar of the text size to use for annotation labels (default: 3)

spectra_max_overlaps

INT scalar of the maximum number of text overlaps to allow (default: 50)

db_conn

database connection (default: con) which must be live to pull sample and compound identification information

Value

ggplot object

Note

If ‘spectra_animate’ is set to true, it requires the [gganimate] package to be installed (and may also require the [gifski] package) and WILL take a large amount of time to complete, but results in an animation that will iterate through the scan period and display mass spectral data as they appear across the peak. Your mileage likely will vary.
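
Examples

A minimal sketch; ‘spectra_data’ stands in for a data.frame of rows from the ‘ms_data’ table:

ms_plot_spectra(spectra_data, spectra_mz_resolution = 3, spectra_log_y = TRUE)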


ms_plot_spectral_intensity R Documentation

Create a spectral intensity plot

Description

Often it is useful to get an overview of mass-to-charge intensity across the scanning time of a peak. Typically this is done with individual traces as in a peak plot, but large peaks can often mask smaller ones or wash out lower intensity signals. Use this to plot m/z against scan time, with intensity shown by color and size. It is intended as a complement to [ms_plot_peak] and may be called at the same levels of granularity, though generally at greater granularity than [ms_plot_peak], which is more of an overview.

Usage

ms_plot_spectral_intensity(
  data,
  intensity_mz_resolution = 5,
  intensity_drop_ratio = 0,
  intensity_facet_by = NULL,
  intensity_plot_resolution = c("spectra", "peak"),
  include_method = TRUE,
  db_conn = con
)

Arguments

data

tibble or pointer with data to plot, either at the peak level, in which case “base_ion” must be present, or at the spectral level, in which case “intensity” must be present

intensity_mz_resolution

INT scalar mass to charge ratio tolerance to group peaks, with at minimum columns for intensity (as “base_int” or “intensity”), ion m/z value (as “base_ion” or “mz”), and scan time (as “scantime”) - (default: 5)

intensity_drop_ratio

NUM scalar threshold of the maximum intensity below which traces will be dropped (default: 0 returns all); if > 1 the inversion will be used (1e5 -> 1e-5)

intensity_facet_by

CHR scalar of a column name in ‘data’ by which to facet the resulting plot (default: NULL)

db_conn

database connection (default: con) which must be live to pull sample and compound identification information

Value

object of classes ‘gg’ and ‘ggplot’
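
Examples

A minimal sketch; ‘spectra_data’ stands in for a data.frame of spectral data containing an “ms_n” column:

ms_plot_spectral_intensity(spectra_data, intensity_facet_by = "ms_n")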


ms_plot_titles R Documentation

Consistent title for ms_plot_x functions

Description

This helper function creates consistently formatted plot label elements in an opinionated manner. This is unlikely to be useful outside the direct context of [ms_plot_peak], [ms_plot_spectra], and [ms_plot_spectral_intensity].

Usage

ms_plot_titles(
  plot_data,
  mz_resolution,
  drop_ratio,
  include_method,
  db_conn = con
)

Arguments

plot_data

data.frame object passed from the plotting function

mz_resolution

NUM scalar passed from the plotting function

drop_ratio

NUM scalar passed from the plotting function

include_method

LGL scalar indicating whether or not to get the method narrative from the database

db_conn

database connection (default: con) which must be live to pull sample and compound identification information

Value

LIST of strings named for ggplot title elements “title”, “subtitle”, and “caption”


ms_spectra_separated R Documentation

Parse “Separated” MS Data

Description

The “separated” format includes spectra packed into two separate columns, one for mass and another for intensity. All values for a given scan time are packed into these columns, separated by spaces, with an unlimited number of discrete values; there must be a 1:1 ratio of values between the two columns.

Usage

ms_spectra_separated(df, ms_cols = c("mz", "intensity"))

Arguments

df

data.frame or json object containing spectra compressed in the “separated” format

ms_cols

CHR vector of length 2 identifying the column names to use for mass and intensity in the source data; must be of length 2, with the first value identifying the mass-to-charge ratio column and the second identifying the intensity column

Value

data.frame object of the unpacked spectra as a list column

Note

Entries in ‘ms_cols’ are treated as regex expressions, but it is safest to provide exact column names

Examples

### JSON Example
tmp <- jsonify::as.json('{
 "measured_mz": "712.9501 713.1851",
 "measured_intensity": "15094.41015625 34809.9765625"
}')
ms_spectra_separated(tmp)

### Example data.frame
tmp <- data.frame(
  measured_mz = "712.9501 713.1851",
  measured_intensity = "15094.41015625 34809.9765625"
)
ms_spectra_separated(tmp)

ms_spectra_zipped R Documentation

Parse “Zipped” MS Data

Description

The “zipped” format includes spectra packed into a single column containing alternating mass and intensity values for all observations. All values for a given scan time are packed into this column, separated by spaces, with an unlimited number of discrete values, and must follow an alternating 1:1 pattern of the form “mass intensity mass intensity”.

Usage

ms_spectra_zipped(df, spectra_col = "data")

Arguments

df

data.frame object containing spectra compressed in the “zipped” format

spectra_col

CHR scalar identifying the column containing the packed spectra in the source data (default: “data”)

Value

data.frame object containing unpacked spectra as a list column

Note

spectra_col is treated as a regex expression, but it is safest to provide a matching column name

Examples

### JSON Example
tmp <- jsonify::as.json('{"msdata": "712.9501 15094.41015625 713.1851 34809.9765625"}')
ms_spectra_zipped(tmp)

### Example data.frame
tmp <- data.frame(
  msdata = "712.9501 15094.41015625 713.1851 34809.9765625"
)
ms_spectra_zipped(tmp)

mzMLconvert R Documentation

Converts a raw file into an mzML

Description

Converts a raw file into an mzML

Usage

mzMLconvert(rawfile, msconvert = NULL, config = NULL, outdir = getwd())

Arguments

rawfile

file path of the MS raw file to be converted

msconvert

file path of the msconvert.exe file, if NULL retrieves information from config directory

config

configuration settings file for msconvert conversion to mzML, if NULL retrieves information from the config directory

outdir

directory path for the converted mzML file.

Value

CHR scalar path to the created file


mzMLtoR R Documentation

Opens file of type mzML into R environment

Description

Opens file of type mzML into R environment

Usage

mzMLtoR(
  mzmlfile = file.choose(),
  lockmass = NULL,
  lockmasswidth = NULL,
  correct = FALSE,
  approach = "hybrid"
)

Arguments

mzmlfile

the file path of the mzML file from which the data are to be read

lockmass

NUM scalar m/z value of the lockmass to remove (Waters instruments only) (default: NULL)

lockmasswidth

NUM scalar instrumental uncertainty associated with ‘lockmass’ (default: NULL)

correct

logical indicating whether the spectra should be corrected for the lockmass (Waters instruments only)

approach

character string defining the type of lockmass removal filter to use, default is ‘hybrid’

Value

list containing mzML data with unzipped masses and intensity information
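
Examples

A minimal sketch using a hypothetical file and the leucine enkephalin lockmass:

msdata <- mzMLtoR("example.mzML", lockmass = 556.2771, lockmasswidth = 0.1, correct = TRUE)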


nist_shinyalert R Documentation

Call [shinyalert::shinyalert] with specific styling

Description

This pass-through function serves only to call [shinyalert::shinyalert] with parameters defined by this function, and can be used for additional styling that may be necessary. It is used solely for consistency's sake.

Usage

nist_shinyalert("test", "info", shiny::h3("test"))

Arguments

title

The title of the modal.

type

The type of the modal. There are 4 built-in types which will show a corresponding icon: “warning”, “error”, “success” and “info”. You can also set type=“input” to get a prompt in the modal where the user can enter a response. By default, the modal has no type.

text

The modal’s text. Can either be simple text, or Shiny tags (including Shiny inputs and outputs). If using Shiny tags, then you must also set html=TRUE.

className

A custom CSS class name for the modal’s container.

html

If TRUE, the content of the title and text will not be escaped. By default, the content in the title and text are escaped, so any HTML tags will not render as HTML.

closeOnClickOutside

If TRUE, the user can dismiss the modal by clicking outside it.

immediate

If TRUE, close any previously opened alerts and display the current one immediately.

...

Additional named parameters to be passed to shinyalert. Unrecognized ones will be ignored.

Value

None, shows a shinyalert modal

See Also

shinyalert::shinyalert


obj_name_check R Documentation

Sanity check for environment object names

Description

Provides a sanity check on whether or not a name reference exists, returning its name if so and the default name defined by ‘default_name’ if not. This largely is used to prevent naming conflicts as part of managing the plumber service but can be used for any item in the current namespace.

Usage

if (exists("log_it")) {
    obj_name_check("test", "test")
    test <- letters
    obj_name_check(test)
  }

Arguments

obj

R object or CHR scalar in question to be resolved in the namespace

default_name

CHR scalar name to use for obj if it does not exist (default: NULL).

Value

CHR scalar of the resolved object name


open_env R Documentation

Convenience shortcut to open and edit session environment variables

Description

Calls [open_proj_file] for either the R, global, or logging environment settings containing the most common settings dictating project behavior.

Usage

open_env(name = c("R", "global", "logging", "rdkit", "shiny", "plumber"))

Arguments

name

CHR scalar, one of “R”, “global”, “logging”, “rdkit”, “shiny”, or “plumber”.

Value

None, opens a file for editing


open_proj_file R Documentation

Open and edit project files

Description

Project files are organized in several topical directories depending on their purpose as part of the package. For example, several project control variables are set to establish the session global environment in the “config” directory rather than the “R” directory.

Usage

open_proj_file(name, dir = NULL, create_new = FALSE)

Arguments

name

CHR scalar of the file name to open, accepts regex

dir

CHR scalar of a directory name to search within

create_new

LGL scalar of whether to create the file (similar functionality to [usethis]; default FALSE)

Details

If a direct file match to name is not found, it will be searched for using a recursive [list.files] allowing for regex matches (e.g. “.R$”). Directories are similarly sought out within the project. Reasonable feedback is provided.

This convenience function uses [usethis::edit_file] to open (or create if ‘create_new’ is TRUE) any given file in the project.

Value

None, opens a file for editing

Note

If the directory and file cannot be found, and ‘create_new’ is true, the directory will be placed within the project directory.


optimal_ums R Documentation

Get the optimal uncertainty mass spectrum parameters for data

Description

Get the optimal uncertainty mass spectrum parameters for data

Usage

optimal_ums(
  peaktable,
  max_correl = 0.75,
  correl_bin = 0.05,
  max_ph = 10,
  ph_bin = 1,
  max_freq = 10,
  freq_bin = 1,
  min_n_peaks = 3,
  cormethod = "pearson"
)

Arguments

peaktable

list generated from ‘create_peak_table_ms1’ or ‘create_peak_table_ms2’

max_correl

numeric maximum acceptable correlation

correl_bin

numeric sequence bin width from max_correl..0

max_ph

numeric maximum acceptable peak height (%)

ph_bin

numeric sequence bin width from max_ph..0

max_freq

numeric maximum acceptable observational frequency (%)

freq_bin

numeric sequence bin width from max_freq..0

min_n_peaks

integer ideal minimum number of scans for mass spectrum

cormethod

string indicating correlation function to use (see [cor()] for description)

Value

data.frame object containing optimized search parameters
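
Examples

A minimal sketch, assuming ‘pt’ is a peak table from ‘create_peak_table_ms1’:

optimal_ums(pt, max_correl = 0.75, max_ph = 10, min_n_peaks = 3)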


overlap R Documentation

Calculate overlap ranges

Description

Internal function: determines if two ranges (x1-e1 to x1+e1) and (x2-e2 to x2+e2) overlap (nonstatistical evaluation)

Usage

overlap(x1, e1, x2, e2)

Arguments

x1, x2

values containing mean values

e1, e2

values containing respective error values


pair_ums R Documentation

Pairwise data.frame of two uncertainty mass spectra

Description

The function stacks two uncertainty mass spectra together based on binned m/z values

Usage

pair_ums(ums1, ums2, error = 5, minerror = 0.002)

Arguments

ums1

uncertainty mass spectrum from ‘get_ums’ function

ums2

uncertainty mass spectrum from ‘get_ums’ function

minerror

the minimum mass error (in Da) of the instrument data

error

the mass accuracy (in ppm) of the instrument data


peak_gather_json R Documentation

Extract peak data and metadata

Description

Gathers metadata from ‘methodjson’ and extracts the MS1 and MS2 data from the ‘mzml’ object

Usage

peak_gather_json(
  methodjson,
  mzml,
  compoundtable,
  zoom = c(1, 5),
  minerror = 0.002
)

Arguments

methodjson

list of JSON generated from ‘parse_method_json’ function

mzml

list of msdata from ‘mzMLtoR’ function

compoundtable

data.frame containing compound identities [should be extractable from SQL later]

zoom

numeric vector specifying the range around the precursor ion to include, from m/z - zoom[1] to m/z + zoom[2]

minerror

numeric the minimum error (in Da) of the instrument

Value

list of peak objects


plot_compare_ms R Documentation

Plot MS Comparison

Description

Plots a butterfly plot for the comparison of two uncertainty mass spectra

Usage

plot_compare_ms(
  ums1,
  ums2,
  main = "Comparison Mass Spectrum",
  size = 1,
  c1 = "black",
  c2 = "red",
  ylim.exp = 1
)

Arguments

ums1, ums2

uncertainty mass spectrum from ‘get_ums’ function

main

Main Title of the Plot

size

line width of the mass spectra lines

c1

Color of the top (ums1) mass spectral lines

c2

Color of the bottom (ums2) mass spectral lines

ylim.exp

Expansion unit for the y-axis


plot_ms R Documentation

Generate consensus mass spectrum

Description

Extract relevant information from a mass spectrum and plot it as an uncertainty mass spectrum.

Usage

plot_ms(
  ms,
  xlim = NULL,
  ylim = NULL,
  main = "Mass Spectrum",
  color = "black",
  size = 1,
  removal = 0
)

Arguments

ms

result of the ‘create_peak_list’ function

Value

ggplot object


pool.sd R Documentation

Pool standard deviations

Description

Internal function: calculates a pooled standard deviation

Usage

pool.sd(sd, n)

Arguments

sd

A vector containing numeric values of standard deviations

n

A vector containing integers for the number of observations respective to the sd values


pool.ums R Documentation

Pool uncertainty mass spectra

Description

Calculates a pooled uncertainty mass spectrum that is a result of data from multiple uncertainty mass spectra.

Usage

pool.ums(umslist, error = 5, minerror = 0.002)

Arguments

umslist

A list where each item is an uncertainty mass spectrum from function ‘get_ums’

minerror

the minimum mass error (in Da) of the instrument data

error

the mass accuracy (in ppm) of the instrument data


pragma_table_def R Documentation

Get table definition from SQLite

Description

Given a database connection (‘con’), get more information about the properties of one or more database tables directly from ‘PRAGMA table_info()’ rather than e.g. [DBI::dbListFields()]. Set ‘get_sql’ to ‘TRUE’ to include the direct schema using sqlite_master; depending on formatting this may or may not be directly usable, though some effort has been made to remove formatting characters (e.g. line feeds, tabs, etc.) if stringr is available.

Usage

pragma_table_def(db_table, db_conn = con, get_sql = FALSE, pretty = TRUE)

Arguments

db_table

CHR vector name of the table(s) to inspect

db_conn

connection object (default: con)

get_sql

BOOL scalar of whether or not to return the schema sql (default FALSE)

pretty

BOOL scalar for whether to return “pretty” SQL that includes human readability enhancements; if this is set to TRUE (the default), it is recommended that the output be fed through ‘cat’, element by element in the case of multiple tables

Details

Note that the package ‘stringr’ is required for formatting returns that include either ‘get_sql’ or ‘pretty’ as TRUE.

Value

data.frame object representing the SQL PRAGMA expression
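
Examples

For example, to inspect the “compounds” table and include its schema SQL:

pragma_table_def("compounds", get_sql = TRUE)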


pragma_table_info R Documentation

Explore properties of an SQLite table

Description

Add functionality to ‘pragma_table_def’ by filtering on column properties such as required and primary key fields. This provides some flexibility to searching table properties without sacrificing the full details of table schema. Parameter ‘get_sql’ is forced to FALSE; only information available via PRAGMA is searched by this function.

Usage

pragma_table_info("compounds")

Arguments

db_table

CHR vector name of the table(s) to inspect

db_conn

connection object (default: con)

condition

CHR vector matching specific checks, must be one of c(“required”, “has_default”, “is_PK”) for constraints where a field must not be null, has a default value defined, and is a primary key field, respectively. (default: NULL)

name_like

CHR vector of character patterns to match against column names via grep. If length > 1, will be collapsed to a basic OR regex (e.g. c(“a”, “b”) becomes “a|b”). As regex, abbreviations and wildcards will typically work, but care should be used in that case. (default: NULL)

data_type

CHR vector of character patterns to match against column data types via grep. If length > 1 will be collapsed to a basic “OR” regex (e.g. c(“int”, “real”) becomes “int|real”). As regex, abbreviations and wildcards will typically work, but care should be used in that case. (default: NULL)

include_comments

LGL scalar of whether to include comments in the return data frame (default: FALSE)

names_only

LGL scalar of whether to include names meeting defined criteria as a vector return value (default: FALSE)

Details

This is intended to support validation during database communications with an SQLite connection, especially for application development (e.g. with ‘shiny’), by allowing for programmatic inspection of database columns by name and property.

Value

data.frame object describing the database entity


py_modules_available R Documentation

Are all conda modules available in the active environment

Description

Checks that all defined modules are available in the currently active python binding. Supports error logging.

Usage

py_modules_available("rdkit")

Arguments

required_modules

CHR vector of required modules

Value

LGL scalar of whether or not all modules are available. Check console for further details.


rdkit_active R Documentation

Sanity check on RDKit binding

Description

Given a name of an R object, performs a simple check on RDKit availability on that object, creating it if it does not exist. A basic structure conversion check is tried and a TRUE/FALSE result returned. Leave all arguments as their defaults of NULL to ensure they will honor the settings in ‘rdkit/env_py.R’.

Usage

rdkit_active(
  rdkit_ref = NULL,
  rdkit_name = NULL,
  log_ns = NULL,
  make_if_not = FALSE
)

Arguments

rdkit_ref

CHR scalar OR R object of an RDKit binding (default NULL goes to “rdk” for convenience with other pipelines in this project)

rdkit_name

CHR scalar the name of a python environment able to run rdkit (default NULL goes to “rdkit” for convenience with other pipelines in this project)

log_ns

CHR scalar of the logging namespace to use (default: NULL)

make_if_not

LGL scalar of whether or not to create a new python environment using [activate_py_env] if the binding is not active

Value

LGL scalar of whether or not the test of RDKit was successful
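
Examples

A minimal sketch that creates the binding with project defaults if it is not yet active:

if (rdkit_active(make_if_not = TRUE)) {
  molecule_picture("CCO", show = TRUE)
}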


rdkit_mol_aliases R Documentation

Create aliases for a molecule from RDKit

Description

Call this function to generate any number of machine-readable aliases from an identifier set. Given the ‘identifiers’ and their ‘type’, RDKit will be polled for conversion functions to create a mol object. That mol object is then used to create machine-readable aliases in any number of supported formats. See the RDKit Documentation for options. The ‘type’ argument is used to match against a “MolFromX” function, while the ‘get_aliases’ argument is used to match against a “MolToX” function.

Usage

rdkit_mol_aliases(
  identifiers,
  type = "smiles",
  mol_from_prefix = "MolFrom",
  get_aliases = c("inchi", "inchikey"),
  mol_to_prefix = "MolTo",
  rdkit_ref = "rdk",
  log_ns = "rdk",
  make_if_not = TRUE
)

Arguments

identifiers

CHR vector of machine-readable molecule identifiers in a format matching ‘type’

type

CHR scalar of the type of encoding to use for ‘identifiers’ (default: smiles)

mol_from_prefix

CHR scalar of the prefix to identify an RDKit function to create a mol object from ‘identifiers’ (default: “MolFrom”)

get_aliases

CHR vector of aliases to produce (default: c(“inchi”, “inchikey”))

mol_to_prefix

CHR scalar of the prefix to identify an RDKit function to create an alias from a mol object (default: “MolTo”)

rdkit_ref

CHR scalar OR R object of an RDKit binding (default NULL goes to “rdk” for convenience with other pipelines in this project)

log_ns

CHR scalar of the logging namespace to use (default: “rdk”)

make_if_not

LGL scalar of whether or not to create a new python environment using [activate_py_env] if the binding is not active

Details

At the time of authorship, RDK v2021.09.4 was in use, which contained the following options findable by this function: CMLBlock, CXSmarts, CXSmiles, FASTA, HELM, Inchi, InchiAndAuxInfo, InchiKey, JSON, MolBlock, PDBBlock, RandomSmilesVect, Sequence, Smarts, Smiles, TPLBlock, V3KMolBlock, XYZBlock.

Value

data.frame object containing the aliases and the original identifiers

Note

Both ‘type’ and ‘aliases’ are case insensitive.

If ‘aliases’ is set to NULL, all possible expressions (excluding those with “File” in the name) are returned from RDKit, which will likely produce NULL values and module ArgumentErrors.
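
Examples

A minimal sketch generating InChI and InChIKey aliases from a SMILES identifier:

rdkit_mol_aliases("CCO", type = "smiles", get_aliases = c("inchi", "inchikey"))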


read_log R Documentation

Read a log from a log file

Description

By default if ‘file’ does not exist (i.e. ‘file’ is not a fully defined path) this looks for log text files in the directory defined by ‘LOG_DIRECTORY’ in the session.

Usage

read_log("log.txt")

Arguments

file

CHR scalar file path to a log file (default NULL is translated to “log.txt”)

last_n

INT scalar of the last ‘n’ log entries to read.

as_object

LGL scalar of whether to return the log as an R object or just to print the log to the console.

Value

CHR vector of the requested log file entries if ‘as_object’ is TRUE, or none with a console print if ‘as_object’ is FALSE


rebuild_help_htmls R Documentation

Rebuild the help files as HTML with an index

Description

Rebuild the help files as HTML with an index

Usage

rebuild_help_htmls(rebuild_book = TRUE, book = "dimspec_user_guide")

Arguments

rebuild_book

LGL scalar of whether or not to rebuild an associated bookdown document

book

Path to folder containing the bookdown document to rebuild

Value

URL to the requested book


rectify_null_from_env R Documentation

Rectify NULL values provided to functions

Description

To support redirection of sensible parameter reads from an environment, either Global or System, functions in this package may include NULL as their default value. This returns values in order of precedence: ‘parameter’, then ‘env_parameter’, then ‘default’.

Usage

rectify_null_from_env(test, test, "test")

Arguments

parameter

the object being evaluated

env_parameter

the name or object of a value to use from the environment if parameter is NULL

default

the fallback value to use if parameter is NULL and env_parameter does not exist

log_ns

the namespace to use with [log_it] if available

Value

The requested value, either as-is, rectified from the environment, or the default

Note

log_ns is only applicable if logging is set up in this project (see project settings in env_glob.txt, env_R.R, and env_logger.R for details).

Both [base::.GlobalEnv] and [base::Sys.getenv] are checked, and can be provided as a character scalar or as an object reference
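
Examples

A minimal sketch resolving a directory setting with a fallback; “LOG_DIRECTORY” is assumed to be a session environment setting:

log_dir <- rectify_null_from_env(NULL, "LOG_DIRECTORY", "logs")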


ref_table_from_map R Documentation

Get the name of a linked normalization table

Description

Extract the name of a normalization table from the database given a table and column reference.

Usage

ref_table_from_map("table1", "fk_column1", er_map(con), "references")

Arguments

table_name

CHR scalar name of the database table

table_column

CHR scalar name of the foreign key table column

this_map

LIST object containing the schema representation from ‘er_map’ (default: an object named “db_map” created as part of the package spin up)

fk_refs_in

CHR scalar name of the item in ‘this_map’ containing the SQL “REFERENCES” statements extracted from the schema

Value

CHR scalar name of the table to which a FK column is linked or an empty character string if no match is located (i.e. ‘table_column’ is not a defined foreign key).

Note

This requires an object of the same shape and properties as those resulting from [er_map] as ‘this_map’.


remove_db R Documentation

Remove an existing database

Description

This is limited to only the current working directory and its subdirectories. If you wish to retain a copy of the prior database, ensure argument ‘archive = TRUE’ (note the default is FALSE) to create a copy of the requested database prior to rebuild; this is created in the same directory as the found database, appending the current date to the file name.

Usage

remove_db("test.sqlite", archive = TRUE)

Arguments

db

CHR scalar name of the database to remove (default: session value DB_NAME)

archive

LGL scalar of whether to create an archive of the current database (if it exists) matching the name supplied in argument ‘db’ (default: FALSE)

Value

None, check console for details


remove_icon_from R Documentation

Remove the last icon attached to an HTML element

Description

Remove the last icon attached to an HTML element

Usage

remove_icon_from(id)

Arguments

id

CHR scalar of the HTML ID from which to remove the last icon

Value

CHR scalar suitable to execute with ‘shinyjs::runJS’

Examples

append_icon_to("example", "r-project", "fa-3x")
remove_icon_from("example")

remove_sample R Documentation

Delete a sample

Description

Removes a sample from the database and associated records in ms_methods, conversion_software_settings, and conversion_software_linkage. Associated peak and mass spectrometric signals will also be removed.

Usage

remove_sample(sample_ids, db_conn = con, log_ns = "db")

Arguments

sample_ids

INT vector of IDs to remove from the samples table.

db_conn

connection object (default: con)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

Value

None, executes actions on the database


repair_xl_casrn_forced_to_date R Documentation

Repair CAS RNs forced to a date numeric by MSXL

Description

If a file is opened in Microsoft Excel(R), Chemical Abstract Service (CAS) Registry Numbers (RNs) can occasionally be read as a pseudodate (e.g. “1903-02-8”). Without tight controls over column formatting, this can result in CAS RNs that are not real entering a processing pipeline. This convenience function attempts to undo that automatic formatting. Vector members whose values are unchanged when coerced to numeric are converted to a properly formatted date, with an origin depending on the operating system platform (as read by ‘.Platform$OS.type’); Windows operating systems use the Windows MSXL origin date of “1899-12-30” while others use “1904-01-01”. Text entries of “NA” are coerced to NA.

Usage

repair_xl_casrn_forced_to_date(casrn_vec, output_format = "%Y-%m-%d")

Arguments

casrn_vec

CHR or NUM vector of what should be valid CAS RNs

output_format

CHR scalar of the output date format (default: “%Y-%m-%d”)

Value

CHR vector of length equal to that of ‘casrn_vec’ where numeric entries have been coerced to the assumed date

Examples

repair_xl_casrn_forced_to_date(c("64324-08-3", "12332"))

repl_nan R Documentation

Replace NaN

Description

Replace all NaN values with a specified value

Usage

repl_nan(x, repl = NULL)

Arguments

x

vector of values

repl

value to replace NaN contained in ‘x’

Value

vector with all NaN replaced with ‘repl’
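
Examples

repl_nan(c(1, NaN, 3), repl = 0)
# returns: 1 0 3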


report_qc R Documentation

Export a QC result JSON file to PDF

Description

Export a QC result JSON file to PDF.

Usage

report_qc(
  jsonfile = file.choose(),
  outputfile = gsub(".json", ".pdf", jsonfile, ignore.case = TRUE)
)

Arguments

jsonfile

file path of the QC result JSON file

outputfile

output pdf file path

Value

None, generates a PDF report at ‘outputfile’


reset_logger_settings R Documentation

Update logger settings

Description

This is a simple action wrapper to update any settings that may have been changed with regard to logger. If, for instance, something is not logging the way you expect it to, change the relevant setting and then run reset_logger_settings() to reflect the current environment.

Usage

reset_logger_settings()

Arguments

reload

LGL scalar indicating (if TRUE) whether or not to refresh from env_R.R or (if FALSE) to use the current environment settings (e.g. for testing purposes) (default: FALSE)

Value

None


resolve_compound_aliases R Documentation

Resolve compound aliases provided as part of the import routine

Description

Call this to add any aliases for a given ‘compound_id’ that may not be present in the database. Only those identifiable as part of the accepted types defined in ‘norm_alias_table’ will be mapped. If multiple items are provided in the import's NAME or ADDITIONAL elements, or in other items matching names in the ‘norm_alias_table’ name column, indicate the split character in ‘split_multiples_by’ and any separator between names and values (e.g. “CLASS:example”) in ‘identify_property_by’.

Usage

resolve_compound_aliases(
  obj,
  compound_id,
  compounds_in = "compounddata",
  compound_alias_table = "compound_aliases",
  norm_alias_table = "norm_analyte_alias_references",
  norm_alias_name_column = "name",
  headers_to_examine = c("ADDITIONAL", "NAME"),
  split_multiples_by = ";",
  identify_property_by = ":",
  out_file = "unknown_compound_aliases.csv",
  db_conn = con,
  log_ns = "db",
  ...
)

Arguments

obj

LIST object containing data formatted from the import generator

compound_id

INT scalar of the compound_id to use for these aliases

compounds_in

CHR scalar name in ‘obj’ holding compound data (default: “compounddata”)

norm_alias_table

CHR scalar name of the table normalizing analyte alias references (default: “norm_analyte_alias_references”)

norm_alias_name_column

CHR scalar name of the column in ‘norm_alias_table’ containing the human-readable expression of alias type classes (default: “name”)

...

Named list of any additional aliases to tack on that are not found in the import object, with names matching those found in ‘norm_alias_table’.‘norm_alias_name_column’

Value

None, though if unclassifiable aliases (those with alias types not present in the normalization table) are found, they will be written to a file (‘out_file’) in the project directory

Note

Existing aliases, and aliases for which there is no ‘compound_id’ will be ignored and not imported.

Compound IDs provided in ‘compound_id’ must be present in the compounds table and must be provided explicitly on a 1:1 basis for each element extracted from ‘obj’. If you provide an import object with 10 components for compound data, you must provide matching ‘compound_id’ identifiers for each. If all extracted components represent aliases for the same ‘compound_id’, then a single value may be provided.

Alias types (e.g. “InChI”) are case insensitive


resolve_compound_fragments R Documentation

Link together peaks, fragments, and compounds

Description

This function links together the peaks, annotated_fragments, and compounds tables. This serves as the main connection table conceptually tying together peaks, the fragments annotated within those peaks, and the compound identification associated with the peaks. The database supports flexible assignment wherein compounds may be related to either peaks or annotated fragments, or both, and vice versa. At least two IDs are required for linkage; i.e. compounds may not have an associated peak in the database, but may be known to produce fragments at a particular m/z value. Ideally, all three are provided, giving traceback from compounds, a complete list of their annotated fragments, and association with a peak object whose data contain unannotated fragments, which can be traced back to the sample from which it was drawn and the associated metrological method information.

Usage

resolve_compound_fragments(
  values = NULL,
  peak_id = NA,
  annotated_fragment_id = NA,
  compound_id = NA,
  linkage_table = "compound_fragments",
  peaks_table = "peaks",
  annotated_fragments_table = "annotated_fragments",
  compounds_table = "compounds",
  db_conn = con,
  log_ns = "db"
)

Arguments

values

LIST item containing items for ‘peak_id’, ‘annotated_fragment_id’, and ‘compound_id’ (default: NULL); used preferentially if provided

peak_id

INT vector (ideally of length 1) of the peak ID(s) to link; ignored if ‘values’ is provided (default: NA)

annotated_fragment_id

INT vector of fragment ID(s) to link; ignored if ‘values’ is provided (default: NA)

compound_id

INT vector of compound ID(s) to link; ignored if ‘values’ is provided (default: NA)

linkage_table

CHR scalar name of the database table containing linkages between peaks, fragments, and compounds (default: “compound_fragments”)

peaks_table

CHR scalar name of the database table containing peaks for look up (default: “peaks”)

compounds_table

CHR scalar name of the table holding compound information

db_conn

connection object (default: con)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

annotated_fragments_table

CHR scalar name of the table holding annotated fragment information

Value

None; validates entries and executes database actions


resolve_compounds R Documentation

Resolve the compounds node during bulk import

Description

Call this function as part of an import routine to resolve the compounds node.

Usage

resolve_compounds(
  obj,
  compounds_in = "compounddata",
  compounds_table = "compounds",
  compound_category = NULL,
  compound_category_table = "compound_categories",
  compound_alias_table = "compound_aliases",
  norm_alias_table = "norm_analyte_alias_references",
  norm_alias_name_column = "name",
  NIST_id_in = "id",
  require_all = FALSE,
  import_map = IMPORT_MAP,
  ensure_unique = TRUE,
  db_conn = con,
  log_ns = "db"
)

Arguments

obj

LIST object containing data formatted from the import generator

compounds_in

CHR scalar name in ‘obj’ holding compound data (default: “compounddata”)

compounds_table

CHR scalar name of the database table holding compound data (default: “compounds”)

compound_category

CHR or INT scalar of the compound category (either a direct ID or a matching category label in ‘compound_category_table’) (default: NULL)

compound_category_table

CHR scalar name of the database table holding normalized compound categories (default: “compound_categories”)

norm_alias_table

CHR scalar name of the table normalizing analyte alias references (default: “norm_analyte_alias_references”)

norm_alias_name_column

CHR scalar name of the column in ‘norm_alias_table’ containing the human-readable expression of alias type classes (default: “name”)

require_all

LGL scalar of whether to require all columns (except the assumed primary key column of “id”) or only those defined as “NOT NULL” (default: FALSE; TRUE requires the presence of all columns in the table)

import_map

data.frame object of the import map (e.g. from a CSV)

ensure_unique

LGL scalar of whether or not to first check that the values provided form a new unique record (default: TRUE)

db_conn

connection object (default: con)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

Value

INT scalar if successful, result of the call to [add_or_get_id] otherwise

Note

This function is called as part of [full_import()]


resolve_description_NTAMRT R Documentation

Resolve the method description tables during import

Description

Two tables (and their associated normalization tables) exist in the database to store additional information about mass spectrometric and chromatographic methods. These tables are “ms_descriptions” and “chromatography_descriptions” and cannot be easily mapped directly. This function serves to coerce values supplied during import into the form required by the database. Primarily, the issue rests in the need to support multiple descriptions of analytical instrumentation (e.g. multiple mass analyzer types, multiple vendors, multiple separation columns, etc.). Tables targeted by this function are “long” tables that may well have ‘n’ records for each mass spectrometric method.

Usage

resolve_description_NTAMRT(
  obj,
  method_id,
  type = c("massspec", "chromatography"),
  mass_spec_in = "massspectrometry",
  chrom_spec_in = "chromatography",
  db_conn = con,
  fuzzy = TRUE,
  log_ns = "db"
)

Arguments

obj

LIST object containing data formatted from the import generator

method_id

INT scalar of the ms_method.id record to associate

type

CHR scalar, one of “massspec” or “chromatography” depending on the type of description to add; much of the logic is shared, only details differ

mass_spec_in

CHR scalar name of the element in ‘obj’ holding mass spectrometry information (default: “massspectrometry”)

chrom_spec_in

CHR scalar name of the element in ‘obj’ holding chromatographic information (default: “chromatography”)

log_ns

CHR scalar of the logging namespace to use during execution (default: “db”)

Value

None, executes actions on the database

Note

This function is called as part of [full_import()]

This function is brittle; built specifically for the NIST NTA MRT import format. If using a different import format, customize to your needs using this function as a guide.


resolve_fragments_NTAMRT R Documentation

Resolve the fragments node during database import

Description

Call this function as part of an import routine to resolve the fragments node including fragment inspections and aliases. If the python connection to RDKit is available and no aliases are provided, aliases as defined in ‘rdkit_aliases’ will be generated and stored if ‘generate_missing_aliases’ is set to TRUE. Components of the import file will be collated, have their values normalized, and any new fragment identifiers will be added to the database.

Usage

resolve_fragments_NTAMRT(
  obj,
  sample_id = NULL,
  generation_type = NULL,
  fragments_in = "annotation",
  fragments_table = "annotated_fragments",
  fragments_norm_table = ref_table_from_map(fragments_table, "fragment_id"),
  fragments_sources_table = "fragment_sources",
  citation_info_in = "fragment_citation",
  inspection_info_in = "fragment_inspections",
  inspection_table = "fragment_inspections",
  generate_missing_aliases = FALSE,
  fragment_aliases_in = "fragment_aliases",
  fragment_aliases_table = "fragment_aliases",
  alias_type_norm_table = ref_table_from_map(fragment_aliases_table, "alias_type"),
  inchi_prefix = "InChI=1S/",
  rdkit_name = ifelse(exists("PYENV_NAME"), PYENV_NAME, "rdkit"),
  rdkit_ref = ifelse(exists("PYENV_REF"), PYENV_REF, "rdk"),
  rdkit_ns = "rdk",
  rdkit_make_if_not = TRUE,
  rdkit_aliases = c("Inchi", "InchiKey"),
  mol_to_prefix = "MolTo",
  mol_from_prefix = "MolFrom",
  type = "smiles",
  import_map = IMPORT_MAP,
  case_sensitive = FALSE,
  fuzzy = FALSE,
  strip_na = TRUE,
  db_conn = con,
  log_ns = "db"
)

Arguments

obj

LIST object containing data formatted from the import generator

sample_id

INT scalar matching a sample ID to which to tie these fragments (optional, default: NULL)

generation_type

CHR scalar containing the generation type as defined in the “norm_generation_type” table (default: NULL will obtain the generation type attached to the ‘sample_id’ by database lookup)

fragments_in

CHR scalar name of the ‘obj’ component holding annotated fragment information (default: “annotation”)

fragments_table

CHR scalar name of the database table holding annotated fragment information (default: “annotated_fragments”)

fragments_norm_table

CHR scalar name of the database table holding normalized fragment identities (default: obtains this from the result of a call to [er_map] with the table name from ‘fragments_table’)

fragments_sources_table

CHR scalar name of the database table holding fragment source (e.g. generation) information (default: “fragment_sources”)

citation_info_in

CHR scalar name of the ‘obj’ component holding fragment citation information (default: “fragment_citation”)

inspection_info_in

CHR scalar name of the ‘obj’ component holding fragment inspection information (default: “fragment_inspections”)

inspection_table

CHR scalar name of the database table holding fragment inspection information (default: “fragment_inspections”)

generate_missing_aliases

LGL scalar determining whether or not to generate machine readable expressions (e.g. InChI) for fragment aliases from RDKit (requires RDKit activation; default: FALSE); see formals list for [add_rdkit_aliases]

fragment_aliases_in

CHR scalar name of the ‘obj’ component holding fragment aliases (default: “fragment_aliases”)

fragment_aliases_table

CHR scalar name of the database table holding fragment aliases (default: “fragment_aliases”)

rdkit_ref

CHR scalar OR R object of an RDKit binding (default NULL goes to “rdk” for convenience with other pipelines in this project)

mol_to_prefix

CHR scalar of the prefix to identify an RDKit function to create an alias from a mol object (default: “MolTo”)

mol_from_prefix

CHR scalar of the prefix to identify an RDKit function to create a mol object from ‘identifiers’ (default: “MolFrom”)

type

CHR scalar of the type of encoding to use for ‘identifiers’ (default: smiles)

import_map

data.frame object of the import map (e.g. from a CSV)

case_sensitive

LGL scalar of whether to match on a case sensitive basis, where TRUE searches for values as provided and FALSE also attempts upper, lower, sentence, and title case matches (default: FALSE)

fuzzy

LGL scalar of whether to do a “fuzzy” match in the sense that values provided are wrapped in an SQL LIKE clause bookended with wildcards; overrides the ‘case_sensitive’ setting if TRUE (default: FALSE).

db_conn

connection object (default: con)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

alias_type_norm_table

CHR scalar name of the database table holding normalized fragment alias type identities (default: obtains this from the result of a call to [er_map] with the table name from ‘fragment_aliases_table’)

Details

Fragments missing structure annotation are supported (e.g. those with a formula but no SMILES notation provided).

For new fragments, the calculated molecular mass is generated by [calculate.monoisotope] from exact masses of each constituent atom. If RDKit is available and a SMILES notation is provided, the formal molecular net charge is also calculated via rdkit.Chem.GetFormalCharge.

Database tables affected by resolving the fragments node include: annotated_fragments, norm_fragments, fragment_inspections, fragment_aliases, and fragment_sources.

Value

INT vector of resolved annotated fragment IDs; executes database actions

Note

This function is called as part of [full_import()]

If components named in ‘citation_info_in’ and ‘inspection_info_in’ do not exist, that information will not be appended to the resulting database records.

Typical usage as part of the import workflow involves simply passing the import object and associated sample id: resolve_fragments_NTAMRT(obj = import_object, sample_id = 1), though wrapper functions like [full_import] also contain name-matched arguments to be passed in a [do.call] context.
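
Examples

A minimal sketch of typical usage during import, assuming ‘import_object’ is a hypothetical object produced by the import generator and ‘con’ is an open database connection:

## import_object (hypothetical) is the output of the import generator
if (interactive()) {
  fragment_ids <- resolve_fragments_NTAMRT(
    obj = import_object,
    sample_id = 1,
    db_conn = con
  )
}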


resolve_method R Documentation

Add an ms_method record via import

Description

Part of the data import routine. Adds a record to the “ms_methods” table with the values provided in the JSON import template. Makes extensive uses of [resolve_normalization_value] to parse foreign key relationships.

Usage

resolve_method(
  obj,
  method_in = "massspectrometry",
  ms_methods_table = "ms_methods",
  db_conn = con,
  ensure_unique = TRUE,
  log_ns = "db",
  qc_method_in = "qcmethod",
  qc_search_text = "QC Method Used",
  qc_value_in = "value",
  require_all = TRUE,
  import_map = IMPORT_MAP,
  ...
)

Arguments

obj

LIST object containing data formatted from the import generator

method_in

CHR scalar name of the ‘obj’ list containing method information

ms_methods_table

CHR scalar name of the database table containing method information

db_conn

connection object (default: con)

ensure_unique

LGL scalar of whether or not to first check that the values provided form a new unique record (default: TRUE)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

qc_method_in

CHR scalar name of the import object element containing QC method information (default: “qcmethod”)

qc_search_text

CHR scalar name of an element in the import object in part ‘qc_method_in’ identifying whether or not a QC method was used (default: “QC Method Used”)

qc_value_in

CHR scalar name of an element in the import object corresponding to ‘qc_method_in’ where the value of the metric named for ‘qc_search_text’ is located (default: “value”)

require_all

LGL scalar of whether to require all columns (except the assumed primary key column of “id”) or only those defined as “NOT NULL” (default: TRUE requires the presence of all columns in the table)

import_map

data.frame object of the import map (e.g. from a CSV)

…

Other named elements to be appended to “ms_methods” as necessary for workflow resolution; these can be used to pass defaults or additional values.

Value

INT scalar if successful, result of the call to [add_or_get_id] otherwise

Note

This function is called as part of [full_import()]
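
Examples

A minimal sketch, assuming ‘import_object’ is a hypothetical object from the import generator containing a “massspectrometry” component and ‘con’ is an open database connection:

## Add (or find) the ms_methods record described by the import object
if (interactive()) {
  method_id <- resolve_method(obj = import_object, db_conn = con)
}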


resolve_mobile_phase_NTAMRT R Documentation

Resolve the mobile phase node

Description

The database node containing chromatographic method information is able to handle any number of descriptive aspects regarding chromatography. It houses normalized and aliased data in a manner that maximizes flexibility, allowing any number of carrier agents (e.g. gases for GC, solvents for LC) to be described in increasing detail. To accommodate that, the structure itself may be unintuitive and may not map well to flat formats, as records may be heavily nested.

Usage

resolve_mobile_phase_NTAMRT(
  obj,
  method_id,
  sample_id,
  peak_id,
  carrier_mix_names = NULL,
  id_mix_by = "^mp*[0-9]+",
  ms_methods_table = "ms_methods",
  sample_table = "samples",
  peak_table = "peaks",
  db_conn = con,
  mix_collection_table = "carrier_mix_collections",
  mobile_phase_props = list(in_item = "chromatography", db_table = "mobile_phases", props
    = c(flow = "flow", flow_units = "flowunits", duration = "duration", duration_units =
    "durationunits")),
  carrier_props = list(db_table = "carrier_mixes", norm_by = "norm_carriers", alias_in =
    "carrier_aliases", props = c(id_by = "solvent", fraction_by = "fraction")),
  additive_props = list(db_table = "carrier_additives", norm_by = "norm_additives",
    alias_in = "additive_aliases", props = c(id_by = "add$", amount_by = "_amount",
    units_by = "_units")),
  exclude_values = c("none", "", NA),
  fuzzy = TRUE,
  clean_up = TRUE,
  log_ns = "db"
)

Arguments

obj

LIST object containing data formatted from the import generator

method_id

INT scalar of the method id (e.g. from the import workflow)

sample_id

INT scalar of the sample id (e.g. from the import workflow)

peak_id

INT scalar of the peak id (e.g. from the import workflow)

carrier_mix_names

CHR vector (optional) of carrier mix collection names to assign, the length of which should equal 1 or the length of discrete carrier mixtures; the default, NULL, will automatically assign names as a function of the method and sample id.

id_mix_by

CHR scalar regex used to identify the elements of ‘obj’ that define the grouping of carrier mix collections for the mobile phase node (default: “^mp*[0-9]+”); this is the main piece of connectivity pulling together the descriptions and should only be changed to match different import naming schemes

ms_methods_table

CHR scalar name of the methods table (default: “ms_methods”)

sample_table

CHR scalar name of the samples table (default: “samples”)

peak_table

CHR scalar name of the peaks table (default: “peaks”)

db_conn

existing connection object (e.g. of class “SQLiteConnection”)

mix_collection_table

CHR scalar name of the mix collections table (default: “carrier_mix_collections”)

mobile_phase_props

LIST object describing how to import the mobile phase table containing: in_item: CHR scalar name of the ‘obj’ component containing chromatographic information (default: “chromatography”); db_table: CHR scalar name of the mobile phases table (default: “mobile_phases”); props: named CHR vector of name mappings with names equal to database columns in ‘mobile_phase_props$db_table’ and values containing regex to match names in ‘obj[[mobile_phase_props$in_item]]’

carrier_props

LIST object describing how to import the carrier mixes table containing: db_table: CHR scalar name of the carrier mixes table (default: “carrier_mixes”); norm_by: CHR scalar name of the table used to normalize carriers (default: “norm_carriers”); alias_in: CHR scalar name of the table containing carrier aliases to search (default: “carrier_aliases”); props: named CHR vector of name mappings with names equal to database columns in ‘carrier_props$db_table’ and values containing regex to match names in ‘obj[[mobile_phase_props$in_item]]’, including an element named ‘id_by’ containing regex used to match names in the import object that indicate a carrier (e.g. “solvent”)

additive_props

LIST object describing how to import the carrier additives table containing: db_table: CHR scalar name of the carrier additives table (default: “carrier_additives”); norm_by: CHR scalar name of the table used to normalize additives (default: “norm_additives”); alias_in: CHR scalar name of the table containing additive aliases to search (default: “additive_aliases”); props: named CHR vector of name mappings with names equal to database columns in ‘additive_props$db_table’ and values containing regex to match names in ‘obj[[mobile_phase_props$in_item]]’, including an element named ‘id_by’ containing regex used to match names in the import object that indicate an additive (e.g. names terminating in “add”)

exclude_values

CHR vector indicating which values to ignore in ‘obj’ (default: c(“none”, “”, NA))

fuzzy

LGL scalar of whether to do a “fuzzy” match in the sense that values provided are wrapped in an SQL LIKE clause bookended with wildcards; overrides the ‘case_sensitive’ setting if TRUE (default: TRUE).

clean_up

LGL scalar determining whether or not to clean up the ‘mix_collection_table’ by removing just-added records if there are errors adding to ‘carrier_props$db_table’ (default: TRUE)

log_ns

CHR scalar of the logging namespace to use during execution (default: “db”)

Details

The mobile phase node contains one record in table “mobile_phases” for each method id, sample id, and carrier mix collection id, with its associated flow rate, normalized flow units, duration, and normalized duration units. Each carrier mix collection has a name and child tables containing records for each value-normalized carrier component and its unit fraction (e.g. in carrier_mixes: Helium, 1 would indicate pure helium as a carrier gas in GC work, while Water, 0.9 and Methanol, 0.1 would indicate a solvent mixture of 10% methanol in water), as well as value-normalized carrier additives, their amount, and the units for that amount (mostly for LC work; e.g. in carrier_additives: ammonium acetate, 5, mMol would indicate an additive to a solvent of 5 mMol ammonium acetate); these are linked through the carrier mix collection id.

Call this function to import the results of the NIST Non-Targeted Analysis Method Reporting Tool (NTA MRT), or feed it as ‘obj’ a flat list containing chromatography information.

Value

None, executes actions on the database

Note

This is a brittle function, and should only be used as part of the NTA MRT import process, or as a template for how to import data.

Some arguments are complicated by design to keep conceptual information together. These should be fed a structured list matching expectations. This applies to ‘mobile_phase_props’, ‘carrier_props’, and ‘additive_props’. See defaults in documentation for examples.

Database insertions are done in real time, so failures may result in hanging or orphaned records. Turn on ‘clean_up’ to roll back by removing entries from ‘mix_collection_table’ and relying on delete cascades built into the database. Additional names are provided here to match the schema.

This function is called as part of [full_import()]
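
Examples

A minimal sketch as one step of the import workflow, assuming ‘import_object’ came from the NTA MRT import generator and that ‘method_id’, ‘sample_id’, and ‘peak_id’ were resolved by earlier steps:

## Resolve the mobile phase node for an already-resolved method/sample/peak
if (interactive()) {
  resolve_mobile_phase_NTAMRT(
    obj = import_object,
    method_id = method_id,
    sample_id = sample_id,
    peak_id = peak_id,
    db_conn = con
  )
}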


resolve_ms_data R Documentation

Resolve and store mass spectral data during import

Description

Use peak IDs generated by the import workflow to assign and store mass spectral data (if coming from the NIST NTA Method Reporting Tool, these will all be in the “separated” format). Optionally also calls [resolve_ms_spectra] if unpack_spectra = TRUE. Mass spectral data are stored in either one (“zipped”) or two (“separated”) columns; see [resolve_ms_spectra] for details of the two formats.

Usage

resolve_ms_data(
  obj,
  peak_id = NULL,
  peaks_table = "peaks",
  ms_data_in = "msdata",
  ms_data_table = "ms_data",
  unpack_spectra = FALSE,
  ms_spectra_table = "ms_spectra",
  unpack_format = c("separated", "zipped"),
  as_object = FALSE,
  import_map = IMPORT_MAP,
  db_conn = con,
  log_ns = "db"
)

Arguments

obj

LIST object containing data formatted from the import generator

peak_id

INT scalar of the peak ID in question, which must be present

peaks_table

CHR scalar name of the peaks table in the database (default: “peaks”)

ms_data_in

CHR scalar of the named component of ‘obj’ holding mass spectral data (default: “msdata”)

ms_data_table

CHR scalar name of the table holding packed spectra in the database (default: “ms_data”)

unpack_spectra

LGL scalar indicating whether or not to unpack spectral data to a long format (i.e. all masses and intensities will become a single record) in the table defined by ‘ms_spectra_table’ (default: FALSE)

ms_spectra_table

CHR scalar name of the table holding long form spectra in the database (default: “ms_spectra”)

unpack_format

CHR scalar of the type of data packing for the spectra, one of “separated” (default) or “zipped”

as_object

LGL scalar indicating whether or not to return the result to the session as an object (TRUE) or to add it to the database (default: FALSE)

import_map

data.frame object of the import map (e.g. from a CSV)

db_conn

connection object (default: con)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

Value

If ‘as_object’ == TRUE, a data.frame object containing either packed (if ‘unpack_spectra’ == FALSE) or unpacked (if ‘unpack_spectra’ == TRUE) spectra, otherwise adds spectra to the database

Note

This function is called as part of [full_import()] during the call to [resolve_peaks]
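
Examples

A minimal sketch, assuming ‘import_object’ (hypothetical) holds an “msdata” component, peak id 1 exists, and ‘con’ is an open database connection:

## Store the packed spectra and also unpack them to the long-format table
if (interactive()) {
  resolve_ms_data(
    obj = import_object,
    peak_id = 1,
    unpack_spectra = TRUE,
    db_conn = con
  )
}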


resolve_ms_spectra R Documentation

Unpack mass spectral data in compressed format

Description

For some spectra, searching in a long form is much more performant. Use this function to unpack data already present in the ‘ms_data’ table into the ‘ms_spectra’ table. Data should be packed in one of two ways, either two columns for mass-to-charge ratio and intensity (“separated” - see [ms_spectra_separated]) or in a single column with interleaved data (“zipped” - see [ms_spectra_zipped]).

Usage

resolve_ms_spectra(
  peak_id,
  spectra_data = NULL,
  peaks_table = "peaks",
  ms_data_table = "ms_data",
  ms_spectra_table = "ms_spectra",
  unpack_format = c("separated", "zipped"),
  as_object = FALSE,
  db_conn = con,
  log_ns = "db"
)

Arguments

peak_id

INT scalar of the peak ID in question, which must be present

spectra_data

data.frame object containing spectral data

peaks_table

CHR scalar name of the peaks table in the database (default: “peaks”)

ms_data_table

CHR scalar name of the table holding packed spectra in the database (default: “ms_data”)

ms_spectra_table

CHR scalar name of the table holding long form spectra in the database (default: “ms_spectra”)

unpack_format

CHR scalar of the type of data packing for the spectra, one of “separated” (default) or “zipped”

as_object

LGL scalar of whether to return the unpacked spectra to the session (TRUE) or to insert them into the database (default: FALSE)

db_conn

database connection object (default: con)

log_ns

CHR scalar name of the logging namespace to use (default: “db”)

Value

If ‘as_object’ == TRUE, a data.frame of unpacked spectra, otherwise no return and a database insertion will be performed

Note

This function may be slow, especially with peaks containing a large number of scans or a large amount of data

References

ms_spectra_separated

ms_spectra_zipped
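
Examples

A minimal sketch, assuming peak id 1 exists in the “peaks” table and ‘con’ is an open database connection; here the unpacked spectrum is returned to the session rather than inserted:

## Unpack the stored spectrum for peak 1 into a long-form data.frame
if (interactive()) {
  spectrum <- resolve_ms_spectra(peak_id = 1, as_object = TRUE, db_conn = con)
}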


resolve_multiple_values R Documentation

Utility function to resolve multiple choices interactively

Description

This function is generally not called directly, but rather as a workflow component from within [resolve_normalization_value] during interactive sessions to get feedback from users during the normalization value resolution process.

Usage

resolve_multiple_values(values, search_value, as_regex = FALSE, db_table = "")

Arguments

values

CHR vector of possible values

search_value

CHR scalar of the value to search

as_regex

LGL scalar of whether to treat ‘search_value’ as a regular expression string (TRUE) or to use it directly (FALSE, default)

db_table

CHR scalar name of the database table to search, used for printing log messages only (default: ““)

Value

CHR scalar result of the user’s choice


resolve_normalization_value R Documentation

Resolve a normalization value against the database

Description

Normalized SQL databases often need to resolve primary keys. This function checks for a given value in a given table and either returns the matching index value or, if the value is not found and ‘interactive()’ is TRUE, adds that value to the table and returns the new index value. It looks for the first matching value in all columns of the requested table to support loose finding of identifiers and is meant to operate only on normalization tables (i.e. lookup tables).

Usage

resolve_normalization_value(
  this_value,
  db_table,
  id_column = "id",
  case_sensitive = FALSE,
  fuzzy = FALSE,
  db_conn = con,
  log_ns = "db",
  ...
)

Arguments

this_value

CHR (or coercible to) scalar value to look up

db_table

CHR scalar of the database table to search

id_column

CHR scalar name of the assumed primary key column in ‘db_table’ (default: “id”)

case_sensitive

LGL scalar of whether to match on a case sensitive basis, where TRUE searches for values as provided and FALSE also attempts upper, lower, sentence, and title case matches (default: FALSE)

fuzzy

LGL scalar of whether to do a “fuzzy” match in the sense that values provided are wrapped in an SQL LIKE clause bookended with wildcards; overrides the ‘case_sensitive’ setting if TRUE (default: FALSE).

db_conn

connection object (default: con)

log_ns

CHR scalar of the logging namespace to use during execution (default: “db”)

…

Other values to add to the normalization table, where names must match the table schema

Details

The search itself is done using [check_for_value].

Value

The database primary key (typically INT) of the normalized value

Note

This is mostly a DRY convenience function to avoid having to write the lookup and add logic each time.

Interactive sessions are required to add new values
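
Examples

A minimal sketch, assuming ‘con’ is an open database connection; the table name “norm_ionization” and value “ESI” are hypothetical:

## "norm_ionization" and "ESI" are hypothetical; returns the matching id,
## adding the value interactively if it is absent
if (interactive()) {
  ionization_id <- resolve_normalization_value(
    this_value = "ESI",
    db_table = "norm_ionization",
    db_conn = con
  )
}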


resolve_peak_ums_params R Documentation

Resolve and import optimal uncertain mass spectrum parameters

Description

This imports the defined object component containing parameters for the optimized uncertainty mass spectrum used to compare with new data. This function may be called at any time to add data for a given peak, but there is no unique row restriction on the underlying table, so it is best used in a “one pass” manner during the import routine. These parameters are calculated as part of NIST QA procedures and are added to the output of the NTA MRT after those JSONs have been created.

Usage

resolve_peak_ums_params(
  obj,
  peak_id,
  ums_params_in = "opt_ums_params",
  ums_params_table = "opt_ums_params",
  db_conn = con,
  log_ns = "db"
)

Arguments

obj

LIST object containing data formatted from the import generator

peak_id

INT scalar of the peak ID in question, which must be present (e.g. from the import workflow)

ums_params_in

CHR scalar name of the item in ‘obj’ containing optimized uncertainty mass spectrum parameters

ums_params_table

CHR scalar name of the database table holding optimized uncertainty mass spectrum parameters

log_ns

CHR scalar of the logging namespace to use during execution (default: “db”)

Value

Nothing if successful, a data frame object of the extracted parameters otherwise.

Note

This function is called as part of [resolve_peaks()]


resolve_peaks R Documentation

Resolve the peaks node during import

Description

Call this function to resolve and insert information for the “peaks” node in the database including software conversion settings (via [resolve_software_settings_NTAMRT]) and mass spectra data (via [resolve_ms_data] and, optionally, [resolve_ms_spectra]). This function relies on the import object being formatted appropriately.

Usage

resolve_peaks(
  obj,
  sample_id,
  peaks_table = "peaks",
  software_timestamp = NULL,
  software_settings_in = "msconvertsettings",
  ms_data_in = "msdata",
  ms_data_table = "ms_data",
  unpack_spectra = FALSE,
  unpack_format = c("separated", "zipped"),
  ms_spectra_table = "ms_spectra",
  linkage_table = "conversion_software_peaks_linkage",
  settings_table = "conversion_software_settings",
  as_date_format = "%Y-%m-%d %H:%M:%S",
  format_checks = c("ymd_HMS", "ydm_HMS", "mdy_HMS", "dmy_HMS"),
  min_datetime = "2000-01-01 00:00:00",
  import_map = IMPORT_MAP,
  ums_params_in = "opt_ums_params",
  ums_params_table = "opt_ums_params",
  db_conn = con,
  log_ns = "db"
)

Arguments

obj

LIST object containing data formatted from the import generator

sample_id

INT scalar of the sample id (e.g. from the import workflow)

peaks_table

CHR scalar name of the database table holding peak information (default: “peaks”)

ms_data_in

CHR scalar of the named component of ‘obj’ holding mass spectral data (default: “msdata”)

ms_data_table

CHR scalar name of the table holding packed spectra in the database (default: “ms_data”)

unpack_spectra

LGL scalar indicating whether or not to unpack spectral data to a long format (i.e. all masses and intensities will become a single record) in the table defined by ‘ms_spectra_table’ (default: FALSE)

unpack_format

CHR scalar of the type of data packing for the spectra, one of “separated” (default) or “zipped”

ms_spectra_table

CHR scalar name of the table holding long form spectra in the database (default: “ms_spectra”)

import_map

data.frame object of the import map (e.g. from a CSV)

ums_params_in

CHR scalar name of the item in ‘obj’ containing optimized uncertainty mass spectrum parameters

ums_params_table

CHR scalar name of the database table holding optimized uncertainty mass spectrum parameters

db_conn

Connection object (default: con)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

Value

INT scalar of the newly inserted or identified peak ID(s)

Note

This function is called as part of [full_import()]

This function relies on an import map
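
Examples

A minimal sketch, assuming ‘import_object’ (hypothetical) and ‘sample_id’ come from earlier steps of the import workflow, IMPORT_MAP is defined in the session, and ‘con’ is an open database connection:

## Resolve the peaks node and collect the resulting peak ID(s)
if (interactive()) {
  peak_id <- resolve_peaks(obj = import_object, sample_id = sample_id, db_conn = con)
}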


resolve_qc_data_NTAMRT R Documentation

Resolve and import quality control data for import

Description

This imports the defined object component containing QC data (i.e. a nested list of multiple quality control checks) from the NIST Non-Targeted Analysis Method Reporting Tool (NTA MRT).

Usage

resolve_qc_data_NTAMRT(
  obj,
  peak_id,
  qc_data_in = "qc",
  qc_data_table = "qc_data",
  peaks_table = "peaks",
  ignore = FALSE,
  db_conn = con,
  log_ns = "db"
)

Arguments

obj

LIST object containing data formatted from the import generator

peak_id

INT vector of the peak ids (e.g. from the import workflow)

qc_data_in

CHR scalar name of the component in ‘obj’ containing QC data (default: “qc”)

qc_data_table

CHR scalar name of the database table holding QC data (default: “qc_data”)

peaks_table

CHR scalar name of the database table holding peaks data (default: “peaks”)

log_ns

CHR scalar of the logging namespace to use during execution (default: “db”)

Value

None, executes actions on the database

Note

This function is called as part of [full_import()]


resolve_qc_methods_NTAMRT R Documentation

Resolve and import quality control method information

Description

This imports the defined object component containing QC method information (i.e. a data frame of multiple quality control checks) from the NIST Non-Targeted Analysis Method Reporting Tool (NTA MRT).

Usage

resolve_qc_methods_NTAMRT(
  obj,
  peak_id,
  qc_method_in = "qcmethod",
  qc_method_table = "qc_methods",
  qc_method_norm_table = "norm_qc_methods_name",
  qc_method_norm_reference = "norm_qc_methods_reference",
  qc_references_in = "source",
  peaks_table = "peaks",
  ignore = FALSE,
  db_conn = con,
  log_ns = "db"
)

Arguments

obj

LIST object containing data formatted from the import generator

peak_id

INT vector of the peak ids (e.g. from the import workflow)

qc_method_in

CHR scalar of the name in ‘obj’ that contains QC method check information (default: “qcmethod”)

qc_method_table

CHR scalar of the database table name holding QC method check information (default: “qc_methods”)

qc_method_norm_table

CHR scalar name of the database table normalizing QC methods type (default: “norm_qc_methods_name”)

qc_method_norm_reference

CHR scalar name of the database table normalizing QC methods reference type (default: “norm_qc_methods_reference”)

qc_references_in

CHR scalar of the name in ‘obj[[qc_method_in]]’ that contains the reference or citation for the QC protocol (default: “source”)

peaks_table

CHR scalar name of the database table holding peak information (default: “peaks”)

log_ns

CHR scalar of the logging namespace to use during execution (default: “db”)

Value

None, executes actions on the database

Note

This function is called as part of [full_import()]


resolve_sample R Documentation

Add a sample via import

Description

Part of the data import routine. Adds a record to the “samples” table with the values provided in the JSON import template. Uses [verify_sample_class] and [verify_contributor] to parse foreign key relationships, [resolve_method] to add a record to ms_methods to get the proper id, and [resolve_software_settings_NTAMRT] to insert records into and get the proper conversion software linkage id from tables “conversion_software_settings” and “conversion_software_linkage” if appropriate.

Usage

resolve_sample(
  obj,
  db_conn = con,
  method_id = NULL,
  sample_in = "sample",
  sample_table = "samples",
  generation_type = NULL,
  generation_type_default = "empirical",
  generation_type_norm_table = "norm_generation_type",
  import_map = IMPORT_MAP,
  ensure_unique = TRUE,
  require_all = TRUE,
  fuzzy = FALSE,
  case_sensitive = TRUE,
  log_ns = "db",
  ...
)

Arguments

obj

LIST object containing data formatted from the import generator

db_conn

connection object (default: con)

method_id

INT scalar of the associated ms_methods record id

sample_in

CHR scalar of the import object name storing sample data (default: “sample”)

sample_table

CHR scalar name of the database table holding sample information (default: “samples”)

generation_type

CHR scalar of the type of data generated for this sample (e.g. “empirical” or “in silico”). The default (NULL) will assign based on ‘generation_type_default’; any other value will override the default value and be checked against values in ‘generation_type_norm_table’

generation_type_default

CHR scalar naming the default data generation type (default: “empirical”)

generation_type_norm_table

CHR scalar name of the database table normalizing sample generation type (default: “norm_generation_type”)

import_map

data.frame object of the import map (e.g. from a CSV)

ensure_unique

LGL scalar of whether or not to first check that the values provided form a new unique record (default: TRUE)

require_all

LGL scalar of whether to require all columns (except the assumed primary key column of “id”) or only those defined as “NOT NULL” (default: TRUE requires the presence of all columns in the table)

fuzzy

LGL scalar of whether to do a “fuzzy” match in the sense that values provided are wrapped in an SQL LIKE clause bookended with wildcards; overrides the ‘case_sensitive’ setting if TRUE (default: FALSE).

case_sensitive

LGL scalar of whether to match on a case sensitive basis, where TRUE searches for values as provided and FALSE also attempts upper, lower, sentence, and title case matches (default: TRUE)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

…

Other named elements to be appended to “samples” as necessary for workflow resolution; these can be used to pass defaults or additional values.

Value

INT scalar if successful, result of the call to [add_or_get_id] otherwise

Note

This function is called as part of [full_import()]
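
Examples

A minimal sketch, assuming ‘import_object’ (hypothetical) contains a “sample” component and ‘method_id’ was returned by an earlier call to [resolve_method]:

## Add (or find) the sample record tied to the resolved method
if (interactive()) {
  sample_id <- resolve_sample(
    obj = import_object,
    method_id = method_id,
    db_conn = con
  )
}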


resolve_sample_aliases R Documentation

Resolve and import sample aliases

Description

Call this function to attach sample aliases to a sample record in the database. This can be done either through the import object with a name reference or directly by assigning additional values.

Usage

resolve_sample_aliases(
  sample_id,
  obj = NULL,
  aliases_in = NULL,
  values = NULL,
  db_table = "sample_aliases",
  db_conn = con,
  log_ns = "db"
)

Arguments

sample_id

INT scalar of the sample id (e.g. from the import workflow)

obj

(optional) LIST object containing data formatted from the import generator (default: NULL)

aliases_in

(optional) CHR scalar of the name in ‘obj’ containing the sample aliases in list format (default: NULL)

values

(optional) LIST containing the sample aliases with names as the alias name and values containing the reference (e.g. URI, link to a containing repository, or reference to the owner or project from which a sample is drawn) to that alias

db_table

CHR scalar name of the database table containing sample aliases (default: “sample_aliases”)

log_ns

CHR scalar of the logging namespace to use during execution (default: “db”)

Value

None, executes actions on the database

Note

This function is called as part of [full_import()]

One of ‘values’ or both of ‘obj’ and ‘aliases_in’ must be provided to add new sample aliases.
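
Examples

A minimal sketch of direct assignment, assuming sample id 1 exists and ‘con’ is an open database connection; the alias name and reference shown are hypothetical:

## The alias name and URL below are hypothetical
if (interactive()) {
  resolve_sample_aliases(
    sample_id = 1,
    values = list(external_repo = "https://example.org/samples/123"),
    db_conn = con
  )
}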


resolve_software_settings_NTAMRT R Documentation

Import software settings

Description

Part of the standard import pipeline, adding rows to the ‘conversion_software_settings’ table with a given sample id. Some argument names (specifically ‘obj’) are shared with other import functions but are formed differently here to resolve the node complexity correctly.

Usage

resolve_software_settings_NTAMRT(
  obj,
  software_timestamp = NULL,
  db_conn = con,
  software_settings_in = "msconvertsettings",
  settings_table = "conversion_software_settings",
  linkage_table = "conversion_software_peaks_linkage",
  as_date_format = "%Y-%m-%d %H:%M:%S",
  format_checks = c("ymd_HMS", "ydm_HMS", "mdy_HMS", "dmy_HMS"),
  min_datetime = "2000-01-01 00:00:00",
  log_ns = "db"
)

Arguments

obj

CHR vector describing settings or a named LIST with names matching column names in table conversion_software_settings.

software_timestamp

CHR scalar of the sample timestamp (e.g. sample$starttime) to use for linking software conversion settings with peak data, with a call back to the originating sample. If NULL (the default), the current system timestamp in UTC will be used from [lubridate::now()].

db_conn

connection object (default: con)

software_settings_in

CHR scalar name of the component in ‘obj’ containing software settings (default: “msconvertsettings”)

settings_table

CHR scalar name of the database table containing the software settings used for an imported data file (default: “conversion_software_settings”)

linkage_table

CHR scalar name of the database table containing the linkage between peaks and their software settings (default: “conversion_software_peaks_linkage”)

as_date_format

CHR scalar the format to use when storing timestamps that matches database column expectations (default: “%Y-%m-%d %H:%M:%S”)

format_checks

CHR vector of the [lubridate::parse_date_time()] format checks to execute in order of priority; these must match a lubridate function of the same name (default: c(“ymd_HMS”, “ydm_HMS”, “mdy_HMS”, “dmy_HMS”))

min_datetime

CHR scalar of the minimum reasonable timestamp used as a sanity check (default: “2000-01-01 00:00:00”)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

Value

NULL on errors, INT scalar of the inserted software linkage id if successful

Note

This function is called as part of [full_import()]


resolve_table_name R Documentation

Check presence of a database table

Description

This convenience function checks for the existence of one or more ‘db_table’ objects in a database.

Usage

resolve_table_name(db_table = "compounds", db_conn = "test_con")

Arguments

db_table

CHR vector of table names to check

db_conn

connection object (default: con)

log_ns

CHR scalar of the namespace (if any) to use for logging (default: “db”)

Value

CHR vector of existing tables


save_data_dictionary R Documentation

Save the current data dictionary to disk

Description

Executes [data_dictionary()] and saves the output to a local file. If ‘output_format’ is one of “data.frame” or “list”, the resulting file will be saved as an RDS. Parameter ‘output_file’ will be used during the save process; relative paths will be resolved against the current working directory.

Usage

save_data_dictionary(
  db_conn = con,
  output_format = "json",
  output_file = NULL,
  overwrite_existing = TRUE
)

Arguments

db_conn

connection object (default: con)

output_format

CHR scalar, one of (capitalization insensitive) “json”, “csv”, “data.frame”, or “list” (default “json”)

output_file

CHR scalar indicating where to save the resulting file; an appropriate file name will be constructed if left NULL (default: NULL)

overwrite_existing

LGL scalar indicating whether to overwrite an existing file whose name matches that determined from ‘output_file’ (default: TRUE); file names will be appended with “(x)” sequentially if this is FALSE and a file with matching name exists.

Value

None, saves a file to the current working directory


search_all R Documentation

Search all mass spectra within database against unknown mass spectrum

Description

Search all mass spectra within database against unknown mass spectrum

Usage

search_all(
  con,
  searchms,
  normfn = "sum",
  cormethod = "pearson",
  optimized_params = TRUE
)

Arguments

con

SQLite database connection

searchms

object generated from ‘create_search_ms’ function

normfn

the normalization function, typically “mean” or “sum”, used to normalize the intensity values (default: “sum”)

cormethod

the correlation method used for calculating the correlation; see the ‘cor’ function for available methods (default: “pearson”)

optimized_params

LGL scalar indicating whether or not to use the optimal search parameters stored in the database table ‘opt_ums_params’

Value

LIST of search results
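
Examples

A minimal sketch, assuming ‘con’ is an open SQLite connection and ‘search_object’ was produced by [create_search_ms] for the unknown spectrum of interest:

## search_object (hypothetical) comes from create_search_ms
if (interactive()) {
  results <- search_all(con, searchms = search_object, optimized_params = TRUE)
}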


search_precursor R Documentation

Search the database for all compounds with matching precursor ion m/z values

Description

Search the database for all compounds with matching precursor ion m/z values

Usage

search_precursor(
  con,
  searchms,
  normfn = "sum",
  cormethod = "pearson",
  optimized_params = TRUE
)

Arguments

con

SQLite database connection

searchms

object generated from ‘create_search_ms’ function

normfn

the normalization function, typically “mean” or “sum”, used to normalize the intensity values (default: “sum”)

cormethod

the correlation method used for calculating the correlation; see the ‘cor’ function for available methods (default: “pearson”)

optimized_params

LGL scalar indicating whether or not to use the optimal search parameters stored in the database table ‘opt_ums_params’

Value

table of match statistics for the compound of interest


setup_rdkit R Documentation

Conveniently set up an RDKit python environment for use with R

Description

Conveniently set up an RDKit python environment for use with R

Usage

setup_rdkit(env_name = "nist_hrms_db", required_libraries = c("reticulate", "rdkit"), env_ref = "rdk")

Arguments

env_name

CHR scalar of the name of a python environment

env_ref

CHR scalar of the name of an R expression bound to a python library OR an R object reference by name to an existing object that should be bound to RDKit (e.g. from [reticulate::import])

ns

CHR scalar of the namespace (if any) to use for logging

Value

None, though calls to utility functions will give their own returns


sigtest R Documentation

Significance testing function

Description

Internal function: enables significance testing between two values

Usage

sigtest(x1, x2, s1, s2, n1, n2, sig = 0.95)

Arguments

x1, x2

mean values to be compared

s1, s2

standard deviation of their respective values

n1, n2

number of observations of the respective values

sig

significance level to test (0.95 = 95%)


smilestoformula R Documentation

Convert SMILES string to Formula and other information

Description

The function converts SMILES strings into a data frame containing the molecular formula (FORMULA), fixed mass of the formula (FIXED MASS), and the net charge (NETCHARGE).

Usage

smilestoformula(SMILES)

Arguments

SMILES

vector of SMILES strings

Value

data.frame object with one row per input SMILES string containing the molecular formula, fixed mass, and net charge

Examples

smilestoformula(c("CCCC", "C(F)(F)F"))

smilestoformula("CCCC")

sql_to_msp R Documentation

Export SQL Database to a MSP NIST MS Format

Description

Export SQL Database to a MSP NIST MS Format

Usage

sql_to_msp(
  con,
  optimized_params = TRUE,
  outputfile = paste0("DimSpecExport", Sys.Date(), ".msp"),
  cormethod = "pearson",
  normfn = "sum"
)

Arguments

con

SQLite database connection

optimized_params

Boolean TRUE indicates that the optimized parameters for uncertainty mass spectra will be used.

outputfile

Text string file name and/or location to save MSP file format

cormethod

Text string type of correlation function to use (DEFAULT = ‘pearson’)

normfn

Text string type of normalization function to use (DEFAULT = ‘sum’)

Value

None, saves a *.msp file to the local file system.


sqlite_auto_trigger R Documentation

Create a basic SQL trigger for handling foreign key relationships

Description

This creates a simple trigger designed to streamline foreign key compliance for SQLite databases. Resulting triggers will fire during insert or update actions on tables that have one or more foreign key relationships defined as ‘target_table.fk_col = norm_table.pk_col’. It is primarily for use in controlled vocabulary lists where a single id is tied to a single value in the parent table, but more complicated relationships can be handled.

Usage

sqlite_auto_trigger(target_table = "test", fk_col = c("col1", "col2",
  "col3"), norm_table = c("norm_col1", "norm_col2", "norm_col3"), pk_col =
  "id", val_col = "value", action_occurs = "after", trigger_action =
  "insert", table_action = "update")

Arguments

target_table

CHR scalar name of a table with a foreign key constraint.

fk_col

CHR vector name(s) of the column(s) in ‘target_table’ with foreign key relationship(s) defined.

norm_table

CHR vector name(s) of the table(s) containing the primary key relationship(s).

pk_col

CHR vector name(s) of the column(s) in ‘norm_table’ containing the primary key(s) side of the relationship(s).

val_col

CHR vector name(s) of the column(s) in ‘norm_table’ containing values related to the primary key(s) of the relationship(s).

action_occurs

CHR scalar on when to run the trigger, must be one of ‘c(“before”, “after”, “instead”)’ (“instead” should only be used if ‘target_table’ is a view - this restriction is not enforced).

trigger_action

CHR scalar on what type of trigger this is (e.g. ‘action_occurs’ = “after” and ‘trigger_action’ = “insert” -> “AFTER INSERT”) and must be one of ‘c(“insert”, “update”, “delete”)’.

for_each

CHR scalar; for SQLite this must be “row”, which is translated into a “FOR EACH ROW” clause. Set to any given noun for other SQL engines supporting other trigger transaction types (e.g. “FOR EACH STATEMENT” triggers)

table_action

CHR scalar on what type of action to run when the trigger fires, must be one of ‘c(“insert”, “update”, “delete”)’.

filter_col

CHR scalar of a filter column to override the final WHERE clause in the trigger. This should almost always be left as the default “”.

filter_val

CHR scalar of a filter value to override the final WHERE clause in the trigger. This should almost always be left as the default “”.

or_ignore

LGL scalar on whether to ignore insertions to normalization tables if an error occurs (default: TRUE, which can under certain conditions raise exceptions during execution of the trigger if more than a single value column exists in the parent table)

addl_actions

CHR vector of additional target actions to add to ‘table_action’ statements, appended to the end of the resulting “insert” or “update” actions to ‘target_table’. If multiple tables are in use, use positional matching in the vector (e.g. with three normalization tables, and additional actions to only the second, use c(“”, “additional actions”, “”))

Details

These are intended as native database backup support for when connections do not change the default SQLite setting of PRAGMA foreign_keys = off. Theoretically any trigger could be created, but should only be used with care outside the intended purpose.

Triggers created by this function will check all new INSERT and UPDATE statements by checking provided values against their parent table keys. If an index match is found no action will be taken on the parent table. If no match is found, it is assumed this is a new normalized value and it will be added to the normalization table and the resulting new key will be replaced in the target table column.

Value

CHR scalar of class glue containing the SQL necessary to create a trigger. This is raw text; it is not escaped and should be further manipulated (e.g. via dbplyr::sql()) as your needs and database communication pipelines dictate.

Note

While this will work on any number of combinations, all triggers should be heavily inspected prior to use. The default case for this trigger is to set it for a single FK/PK relationship with a single normalization value. It will run on any number of normalized columns; however, trigger behavior may be unexpected for more complex relationships.

If ‘or_ignore’ is set to TRUE, errors in adding to the parent table will be ignored silently, possibly causing NULL values to be inserted into the target table foreign key column. For this reason it is recommended to set ‘or_ignore’ to TRUE only when expanding parent table entries, keeping in mind that it will supply only a single value to the new normalization table record. If additional columns in the parent table must be populated (e.g. the parent table has two required columns “value” and “acronym”), it is recommended to take care of those prior to any action that would activate these triggers.

Parameters are not checked against a schema (e.g. that tables and columns exist, or that a relationship exists between tables). This function processes only text provided to it.

Define individual relationships between ‘fk_col’, ‘norm_table’, ‘pk_col’, and ‘val_col’ as necessary. Lengths for these parameters should match in a 1:1:1:1 manner to fully describe the relationships. If the schema of all tables listed in ‘norm_table’ are close matches, e.g. all have two columns “id” and “value” then ‘pk_col’ and ‘val_col’ will be reused when only a single value is provided for them. That is, provided three ‘norm_table’(s) and one ‘pk_col’ and one ‘val_col’, the arguments for ‘pk_col’ and ‘val_col’ will apply to each ‘norm_table’.

The usage example is built on a hypothetical SQLite schema containing four tables, one of which (“test” - with columns “id”, “col1”, “col2”, and “col3”) defines foreign key relationships to the other three (“norm_col1”, “norm_col2”, and “norm_col3”).

See Also

build_triggers
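
Examples

A sketch using the hypothetical four-table schema described in the Note section; the returned glue string should be inspected before it is executed against a database:

## Build AFTER INSERT triggers for three normalized columns of table "test"
trigger_sql <- sqlite_auto_trigger(
  target_table = "test",
  fk_col = c("col1", "col2", "col3"),
  norm_table = c("norm_col1", "norm_col2", "norm_col3"),
  pk_col = "id",
  val_col = "value",
  action_occurs = "after",
  trigger_action = "insert",
  table_action = "update"
)
cat(trigger_sql)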


sqlite_auto_view R Documentation

Create a basic SQL view of a normalized table

Description

Many database viewers will follow links for normalization tables to display the human-readable value of a normalized column, but it is often preferable to build in views that automatically “denormalize” such tables for display or use in an application. This function seeks to script the process of creating those views. It examines the table definition from [pragma_table_info] and will extract the primary/foreign key relationships to build a “denormalized” view of the table using [get_fkpk_relationships], which requires a database map created from [er_map] and a data dictionary created from [data_dictionary].

Usage

sqlite_auto_view(table_pragma = pragma_table_info("contributors"),
  target_table = "contributors", relationships =
  get_fkpk_relationships(db_map = er_map(con), dictionary =
  data_dictionary(con)), drop_if_exists = FALSE)

Arguments

table_pragma

data.frame object from [pragma_table_info] for a given table name in the database

target_table

CHR scalar name of the database table to build for, which should be present in the relationship definition

relationships

data.frame object describing the foreign key relationships for ‘target_table’, which should generally be the result of a call to [get_fkpk_relationships]

drop_if_exists

LGL scalar indicating whether to include a “DROP VIEW” prefix for the generated view statement; as this has an impact on schema, no default is set

Details

TODO for v2: abstract the relationships call by looking for objects in the current session.

Value

CHR scalar of class glue containing the SQL necessary to create a “denormalized” view. This is raw text; it is not escaped and should be further manipulated (e.g. via dbplyr::sql()) as your needs and database communication pipelines dictate.

Note

No schema checking is performed by this function, but rather relies on definitions from other functions.

This example will run slowly if the database map [er_map] and dictionary [data_dictionary] haven’t yet been called. If they exist in your session, use those as arguments to get_fkpk_relationships.

See Also

build_views

pragma_table_info

get_fkpk_relationships

er_map

data_dictionary
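
Examples

A sketch mirroring the usage defaults, assuming ‘con’ is an open database connection; this can be slow if [er_map] and [data_dictionary] have not yet been run, so reuse session objects where available:

## Generate the SQL for a denormalized view of the "contributors" table
if (interactive()) {
  view_sql <- sqlite_auto_view(
    table_pragma = pragma_table_info("contributors"),
    target_table = "contributors",
    relationships = get_fkpk_relationships(
      db_map = er_map(con),
      dictionary = data_dictionary(con)
    ),
    drop_if_exists = FALSE
  )
  cat(view_sql)
}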


sqlite_parse_build R Documentation

Parse SQL build statements

Description

Reading SQL files directly into R can be problematic. This function is primarily called in [create_fallback_build]. To support multiline, human-readable SQL statements, ‘sql_statements’ must be of length 1.

Usage

sqlite_parse_build(sql_statements, magicsplit, header, section)

Arguments

sql_statements

CHR scalar of SQL build statements from an SQL file.

magicsplit

CHR scalar regex indicating some “magic” split point SQL comment to simplify the identification of discrete commands; will be used to split results (optional but highly recommended)

header

CHR scalar regex indicating the format of header comments SQL comment to remove (optional)

section

CHR scalar regex indicating the format of section comments SQL comment to remove (optional)

Details

All arguments ‘magicsplit’, ‘header’, and ‘section’ provide flexibility in the comment structure of the SQL file and accept regex for character matching purposes.

Value

LIST of parsed complete build commands as CHR vectors containing each line.

Examples

example_file <- "./config/sql_nodes/reference.sql"
if (file.exists(example_file)) {
  build_commands <- readr::read_file(example_file)
  sqlite_parse_build(build_commands)
}


sqlite_parse_import R Documentation

Parse SQL import statements

Description

In the absence of the sqlite command line interface (CLI), the [build_db] process needs a full set of SQL statements to execute directly rather than CLI dot commands. This utility function parses formatted SQL statements containing CLI “.import” commands to create SQL INSERT statements. This function is primarily called in [create_fallback_build].

Usage

if (file.exists("./config/data/elements.csv")) {
  sqlite_parse_import(".import --csv --skip 1 ./config/data/elements.csv elements")
}

Arguments

build_statements

CHR vector of SQL build statements from an SQL file.

Value

LIST of parsed .import statements as full “INSERT” statements.

Examples

if (file.exists("./config/data/elements.csv")) {
  sqlite_parse_import(".import --csv --skip 1 ./config/data/elements.csv elements")
}


start_api R Documentation

Start the plumber interface from a clean environment

Description

This convenience function launches the plumber instance if it was not set to launch during the session setup. It is a thin wrapper with a more intuitive name than [api_reload] and the default background setting turned off to test the server in the current session.

Usage

start_api()

Arguments

plumber_file

CHR scalar name of the plumber definition file, which should be in src_dir (default: NULL)

plumber_host

CHR scalar of the host server address (default: NULL)

plumber_port

INT scalar of the listening port on the host server (default: NULL)

background

LGL scalar of whether to launch the API in a background process (default: FALSE)

src_dir

CHR scalar file path to settings and functions enabling the plumber API (default: here::here(“inst”, “plumber”))

log_ns

CHR scalar name of the logging namespace to use for this function (default: “api”)

Value

None, launches the plumber instance

Note

This function is intended to pull from the environment variables identifying the plumber file, host, and port.


start_app R Documentation

WIP Launch a shiny application

Description

Call this function to launch an app either directly or in a background process. The name must be present in the app directory or as a named element of SHINY_APPS in the current environment.

Usage

start_app("table_explorer")

Arguments

app_name

CHR scalar name of the shiny app to run, this should be the name of a directory containing a shiny app that is located within the directory defined by app_dir or the name of an app as defined in your environment SHINY_APPS variable

app_dir

file path to a directory containing shiny apps (default: here::here(“inst”, “apps”))

background

LGL scalar of whether to launch the application in a background process (default: FALSE)

…

Other named parameters to be passed to [shiny::runApp]

Value

None, launches a browser with the requested shiny application

Note

Background launching of shiny apps is not yet supported.

Examples

start_app("table_explorer")


start_rdkit R Documentation

Start the RDKit integration

Description

If the session was started without RDKit integration, e.g. INFORMATICS or USE_RDKIT were FALSE in [config/env_R.R], start up RDKit in this session.

Usage

start_rdkit(src_dir = here::here("inst", "rdkit"), log_ns = "rdkit")

Arguments

src_dir

CHR scalar file path to settings and functions enabling rdkit (default: here::here(“inst”, “rdkit”))

log_ns

CHR scalar name of the logging namespace to use for this function (default: “rdkit”)

Value

LGL scalar indicating whether starting RDKit integration was successful

Note

RDKit and rcdk are incompatible. If the session was started with INFORMATICS = TRUE and USE_RDKIT = FALSE, ChemmineR was likely loaded. If this is the case, the session will need to be restarted due to Java conflicts between the two.


summarize_check_fragments R Documentation

Summarize results of check_fragments function

Description

Summarize results of check_fragments function

Usage

summarize_check_fragments(fragments_checked)

Arguments

fragments_checked

output of ‘check_fragments’ function

Value

table summary of check_fragments function


support_info R Documentation

R session information for support needs

Description

Returns several items of interest for support of this particular project, including DB_DATE, DB_VERSION, BUILD_FILE, LAST_DB_SCHEMA, LAST_MODIFIED, DEPENDS_ON, and EXCLUSIONS as defined in the project’s ../config/env_R.R file.

Usage

support_info()

Arguments

app_info

BOOL scalar on whether to return this application’s properties

Value

LIST of values


suspectlist_at_NIST R Documentation

Open the NIST PDR entry for the current NIST PFAS suspect list

Description

This simply points your browser to the NIST public data repository for the current NIST suspect list, where you can find additional information. Click the download button in the left column of any file to download it. Requires the file “suspectlist_url.txt” to be present in the ‘config’ subdirectory of the current working directory.

Usage

suspectlist_at_NIST(url_file = file.path("config", "suspectlist_url.txt"))

Value

none

Examples

suspectlist_at_NIST()

table_msdata R Documentation

Tabulate MS Data

Description

Pulls specified MS data from an mzML object and converts it into table format for further processing. Internal function used by the ‘peak_gather_json’ function.

Usage

table_msdata(mzml, scans, mz = NA, zoom = NA, masserror = NA, minerror = NA)

Arguments

mzml

list of msdata from ‘mzMLtoR’ function

scans

integer vector containing scan numbers to extract MS data

mz

numeric targeted m/z

zoom

numeric vector specifying the range around m/z, from m/z - zoom[1] to m/z + zoom[2]

masserror

numeric relative mass error (in ppm) of the instrument

minerror

numeric minimum mass error (in Da) of the instrument

Value

data.frame containing MS data


tack_on R Documentation

Append additional named elements to a list

Description

This does nothing more than use [base::append] to add ellipsis arguments directly to the end of an existing list object. It primarily supports additional property assignment during the import process for future development and refinement. Call this as part of any function with additional arguments; unrecognized named parameters may be ignored or may cause failures downstream. If no additional arguments are passed, ‘obj’ is returned as provided.

Usage

tack_on(obj, ..., log_ns = "db")

Arguments

obj

LIST of any length to be appended to

…

Additional arguments passed to/from the ellipsis parameter of calling functions. If named, names are preserved.

log_ns

CHR scalar of the logging namespace to use (default: “db”)

Value

LIST object of length equal to obj plus additional named arguments

Note

If duplicate names exists in obj and those provided as ellipsis arguments, those provided as part of the ellipsis will replace those in obj.

Examples

tack_on(list(a = 1:3), b = letters, c = rnorm(10))
tack_on(list(a = 1:3))

tidy_comments R Documentation

Tidy up table and field comments

Description

Creates more human-readable outputs after extracting the raw SQL used to build entities and parsing out the comments as identified with the /* … */ multi-line comment flag pair. Single line comments are not extracted. The first comment is assumed to be the table comment. See examples in the ‘config/sql_nodes’ directory.

Usage

tidy_comments(obj)

Arguments

obj

result of calling [pragma_table_def] with ‘get_sql’ = TRUE

Value

LIST of length equal to ‘obj’ containing extracted comments

Examples

tidy_comments(pragma_table_def("compounds", get_sql = TRUE))


tidy_ms_spectra R Documentation

Tidy Spectra

Description

A convenience function to take outputs from [ms_spectra_separated] and [ms_spectra_zipped] and return them as a tidy data frame by unpacking the list column “spectra”.

Usage

tidy_ms_spectra(df = packed_data)

Arguments

df

data.frame object containing nested spectra in a column

Value

data.frame object containing tidy spectra


tidy_spectra R Documentation

Decompress Spectra

Description

This convenience wrapper will automatically decompress ms spectra in the “separated” and “zipped” formats and return them as tidy data frames suitable for further manipulation or visualization.

Usage

tidy_spectra(
  target,
  is_file = FALSE,
  is_format = c("separated", "zipped"),
  spectra_set = "msdata",
  ms_col_sep = c("measured_mz", "measured_intensity"),
  ms_col_zip = "data",
  is_JSON = FALSE
)

Arguments

target

CHR scalar file path to use OR an R object containing compressed spectral data in the “separated” or “zipped” format

is_file

BOOL scalar of whether or not ‘target’ is a file. Set to FALSE to use an existing R object, which should contain an object with a named element matching parameter ‘spectra_set’ (default: FALSE)

is_format

CHR scalar of the compression format, which must be one of the supported compression forms (“separated” or “zipped”); ignored if the compression format can be inferred from the text in ‘target’ (default: “separated”)

spectra_set

CHR scalar of the object name holding a spectra data frame to decompress (default “msdata”)

ms_col_sep

CHR vector of the column names holding spectral masses and intensities in the “separated” format (default: c(“measured_mz”, “measured_intensity”))

ms_col_zip

CHR scalar of the name of the column holding interleaved spectral masses and intensities in the “zipped” format (default: “data”)

is_JSON

BOOL scalar of whether or not ‘target’ is a JSON expression needing conversion (default: FALSE)

Value

data.frame object containing unpacked spectra

Examples

tidy_spectra('{"msdata": "712.9501 15094.41015625 713.1851 34809.9765625"}', is_format = "zipped")
tidy_spectra('{"measured_mz":"712.9501 713.1851","measured_intensity":"15094.41015625 34809.9765625"}')

unzip R Documentation

Unzip binary data into vector

Description

Unzip binary data into vector

Usage

unzip(x, type = "gzip")

Arguments

x

String of binary data to convert

type

type of compression (see ‘base::memDecompress’). Default is ‘gzip’

Value

vector containing data from converted binary data


update_all R Documentation

Convenience function to rebuild all database related files

Description

This is a development and deployment function that should be used with caution. It is intended solely to assist with the development process of rebuilding a database schema from source files and producing the supporting data. It will create both the JSON expression of the data dictionary and the fallback SQL file.

Usage

update_all()

Arguments

rebuild

LGL scalar indicating whether to first rebuild from environment settings (default: FALSE for safety)

api_running

LGL scalar of whether or not the API service is currently running (default: TRUE)

api_monitor

process object pointing to the API service (default: NULL)

db

CHR scalar of the database name (default: session value DB_NAME)

build_from

CHR scalar of a SQL build script to use (default: environment value DB_BUILD_FILE)

populate

LGL scalar of whether to populate with data from the file in ‘populate_with’ (default: TRUE)

populate_with

CHR scalar for the populate script (e.g. “populate_demo.sql”) to run after the build is complete (default: session value DB_DATA); ignored if ‘populate = FALSE’

archive

LGL scalar of whether to create an archive of the current database (if it exists) matching the name supplied in argument ‘db’ (default: FALSE); passed to [remove_db]

sqlite_cli

CHR scalar to use to look for installed sqlite3 CLI tools in the current system environment (default: session value SQLITE_CLI)

connect

LGL scalar of whether or not to connect to the rebuilt database in the global environment as object ‘con’ (default: FALSE)

log_ns

CHR scalar of the logging namespace to use during execution (default: “db”)

Details

!! To preserve data, do not call this with both ‘rebuild’ = TRUE and ‘archive’ = FALSE !!

Value

Files for the new database, fallback build, and data dictionary will be created in the project directory and objects will be created in the global environment for the database map (LIST “db_map”) and current dictionary (LIST “db_dict”)

Note

This does not recast the views and triggers files created through [sqlite_autoview] and [sqlite_autotrigger] as the output of those may often need additional customization. Existing auto-views and -triggers will be created as defined. To exclude those, first modify the build file referenced by [build_db].

This requires the individual functions it references to be available in the current environment.
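
Examples

# A hedged sketch (not run): this rebuilds the database and should not be
# called casually. Pair rebuild = TRUE with archive = TRUE to preserve
# existing data, per the warning in Details; the arguments are assumed to
# be accepted as documented above.
update_all(rebuild = TRUE, archive = TRUE)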


update_data_sources R Documentation

Dump current database contents

Description

Perform one or both of two backup tasks for the NTA database: dumping table contents as comma-separated-value files and creating an SQL dump of schema and data.

Usage

update_data_sources(
  project,
  data_dir = file.path("config", "data"),
  create_backups = TRUE,
  dump_tables = TRUE,
  dump_sql = TRUE,
  db_conn = con,
  sqlite_cli = ifelse(exists("SQLITE_CLI"), SQLITE_CLI, NULL),
  db_name = ifelse(exists("DB_NAME"), DB_NAME, NULL)
)

Arguments

project

CHR scalar of the directory containing project-specific data (required, no default)

data_dir

CHR scalar of the directory containing project independent data sources used for population (default: ‘file.path(“config”, “data”)’)

create_backups

LGL scalar indicating whether to create backups prior to writing updated data files (default: TRUE)

dump_tables

LGL scalar indicating whether to dump contents of database tables as comma-separated-value files (default: TRUE)

dump_sql

LGL scalar indicating whether to create an SQL dump file containing both schema and data as a backup (default: TRUE)

db_conn

connection object (default: con)

sqlite_cli

CHR scalar system reference to your installation of the sqlite3 command line interface (default: session value SQLITE_CLI if it exists)

db_name

CHR scalar of the database name (default: session value DB_NAME if it exists)

Details

The main task is to update CSV files in the config/data directory with the current contents of the database. This is done on a table-by-table basis and results in flat files whose structures no longer interrelate except numerically. Primarily this would be used to migrate database contents to other systems or for further manipulation. Please specify a ‘project’ so that project-specific information can be maintained.

Backups created with this function are placed in a “backups” subdirectory of the directory defined by parameter ‘data_dir’. If ‘dump_sql = TRUE’, SQL dump files will be written to “backups/sqlite” with file names equal to the current database name prefixed by date.

Value

None, copies database information to the local file system
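
Examples

# Dump all tables as CSV and create an SQL backup for a hypothetical
# project directory; a live connection 'con' is assumed (the default
# for db_conn).
update_data_sources(project = "example_project")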


update_env_from_file R Documentation

Update a conda environment from a requirements file

Description

The ‘requirements_file’ can be any formatted file that contains a definition of python libraries to add to an environment (e.g. requirements.txt, environment.yml, etc.) that is understood by conda. Relative file paths are accepted, but the file will not be searched for (e.g. via ‘list.files’), so a specific path is always better.

Usage

update_env_from_file("nist_hrms_db")

Arguments

env_name

CHR scalar of a python environment

requirements_file

CHR scalar file path to a suitable requirements.txt or environment.yml file

conda_alias

CHR scalar of the command line interface alias for your conda tools (default: NULL is translated first to the environment variable CONDA_CLI and then to “conda”)

Details

This is a helper function, largely to support versions of reticulate prior to the introduction of the environment argument in version 1.24+.

Value

None, directly updates the referenced python environment

Note

This requires conda CLI tools to be installed.

A default installation alias of “conda” is assumed.

Set global variable ‘CONDA_CLI’ to your conda alias for better support.
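
Examples

# Update a named environment from a conda-readable definition file; the
# requirements file path here is hypothetical.
update_env_from_file("nist_hrms_db",
                     requirements_file = "rdkit/environment.yml")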


update_logger_settings R Documentation

Update logger settings

Description

This applies the internal routing and formatting for logger functions to the current value of the LOGGING object. If LOGGING is changed (i.e. a logging namespace is added or changed) this function should be run to update routing and formatting to be in line with the current settings.

Usage

update_logger_settings(log_all_warnings = FALSE, log_all_errors = FALSE)

Arguments

log_all_warnings

LGL scalar indicating whether or not to log all warnings (default: FALSE)

log_all_errors

LGL scalar indicating whether or not to log all errors (default: FALSE)

Value

None

Note

The calling stack for auto logging of warnings and errors does not work with background processes. These settings call [logger::log_warnings()] and [logger::log_errors()].

This function is used only for its side effects.
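
Examples

# Reapply routing and formatting after changing the LOGGING object, here
# also enabling automatic logging of warnings and errors.
update_logger_settings(log_all_warnings = TRUE, log_all_errors = TRUE)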


user_guide R Documentation

Launch the User Guide for DIMSpec

Description

Use this function to launch the bookdown version of the User Guide for the NIST Database Infrastructure for Mass Spectrometry (DIMSpec) toolkit.

Usage

user_guide()

Arguments

path

CHR scalar representing a valid file path to the local user guide

url_gh

CHR scalar pointing to the web resource, in this case the URL to the User Guide hosted on GitHub pages

view_on_github

LGL scalar of whether to use the hosted version of the User Guide on GitHub (default: TRUE, recommended), which will always display the most up-to-date version

Value

None, opens a browser to the index page of the User Guide

Note

This works ONLY when DIMSpec is used as a project with the defined directory structure


valid_file_format R Documentation

Ensure files uploaded to a shiny app are of the required file type

Description

This input validation check uses [tools::file_ext] to ensure that files uploaded to [shiny::fileInput] are among the acceptable file formats; users may sometimes attempt to upload a file outside the “accepts” format list by manually changing the filter during the upload process. If the file is not of an accepted format, a [nist_shinyalert] modal is displayed prompting the user to upload a file in one of the requested formats.

Usage

req(valid_file_format(input$file_upload, c(".csv", ".xls")))

Arguments

filename

CHR scalar name of the file uploaded to the shiny server

accepts

CHR vector of acceptable file formats

show_alert

LGL scalar indicating whether or not to show an alert, set FALSE to return the status of the check

Value

Whether or not the uploaded file matches an accepted format.
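
Examples

# With show_alert = FALSE the check simply returns its status; this
# sketch assumes a shiny server context providing input$file_upload.
if (valid_file_format(input$file_upload, c(".csv"), show_alert = FALSE)) {
  message("File format accepted.")
}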


validate_casrns R Documentation

Validate a CAS RN

Description

Chemical Abstract Service (CAS) Registry Numbers (RNs) follow a standard creation format. From [https://www.cas.org/support/documentation/chemical-substances/faqs], a CAS RN is a “numeric identifier that can contain up to 10 digits, divided by hyphens into three parts. The right-most digit is a check digit used to verify the validity and uniqueness of the entire number. For example, 58-08-2 is the CAS Registry Number for caffeine.”

Usage

validate_casrns(casrn_vec, strip_bad_cas = TRUE)

Arguments

casrn_vec

CHR vector of CAS RNs to validate

strip_bad_cas

LGL scalar of whether to strip out invalid CAS RNs (default: TRUE)

Details

Provided CAS RNs in ‘casrn_vec’ are validated for format and their checksum digit. Those failing will be printed to the console by default, and users have the option of stripping unverified entries from the return vector.

This only validates that a CAS RN is properly constructed; it does not indicate that the registry number exists in the CAS Registry.

See [repair_xl_casrn_forced_to_date] as one possible pre-processing step.

Value

CHR vector of length equal to that of ‘casrn_vec’

Examples

validate_casrns(c("64324-08-9", "64324-08-5", "12332"))
validate_casrns(c("64324-08-9", "64324-08-5", "12332"), strip_bad_cas = FALSE)
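
# Illustration of the published CAS check-digit arithmetic: reverse the
# digits preceding the check digit, weight them 1..n, sum, and take the
# result modulo 10. For caffeine (58-08-2):
sum(rev(c(5, 8, 0, 8)) * 1:4) %% 10  # 2, matching the check digit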

validate_column_names R Documentation

Ensure database column presence

Description

When working with SQL databases, this convenience function validates any number of column names by comparing against the list of column names in any number of tables. Typically it is called transparently inline to cause execution failure when column names are not present in referenced tables during build of SQL queries.

Usage

validate_column_names(con, "peaks", "id")

Arguments

db_conn

connection object (e.g. of class “SQLiteConnection”)

table_names

CHR vector of tables to search

column_names

CHR vector of column names to validate

Value

None


validate_tables R Documentation

Ensure database table presence

Description

When working with SQL databases, this convenience function validates any number of table names by comparing against the list of those present. Typically it is called transparently inline to cause execution failure when tables are not present during build of SQL queries.

Usage

validate_tables(con, "peaks")

Arguments

db_conn

connection object (e.g. of class “SQLiteConnection”)

table_names

CHR vector of table names to ensure are present

Value

None if all tables exist; execution fails if any do not.


verify_args R Documentation

Verify arguments for a function

Description

This helper function checks arguments against a list of expectations. It was inspired in part by the excellent testthat package and shares concepts with the checkmate package. However, this function performs many of the common checks without additional package dependencies, and can be inserted into other functions for a project easily with:

  arg_check <- verify_args(args = as.list(environment()),
    conditions = list(param1 = c("mode", "logical"), param2 = c("length", 1)))

and check the return with

  if (!arg_check$valid) cat(paste0(arg_check$messages, "\n"))

where argument ‘conditions’ describes the tests. This comes at some cost to readability, as the list items in ‘conditions’ do not have to be named, though naming them improves clarity. See the details below for argument ‘conditions’ to view which expectations are currently supported. As this is a nested list condition check, conditions can also originate from any source coercible to a list (e.g. JSON, XML, etc.); this feature, along with the return of human-meaningful evaluation strings, is particularly useful for development of shiny applications. Values from other sources MUST be coercible to a full list (e.g. if being parsed from JSON, use jsonlite::fromJSON(simplifyMatrix = FALSE)).

Usage

verify_args(args = list(character_length_2 = c("a", "b")),
            conditions = list(character_length_2 = list(c("mode", "character"),
                                                        c("length", 3)))
)
verify_args(args = list(boolean = c(TRUE, FALSE, TRUE)),
            conditions = list(list(c("mode", "logical"),
                                   c("length", 1)))
)
verify_args(args = list(foo = c(letters[1:3]),
                        bar = 1:10),
            conditions = list(foo = list(c("mode", "numeric"),
                                         c("n>", 5)),
                              bar = list(c("mode", "logical"),
                                         c("length", 5),
                                         c(">", 10),
                                         c("between", list(100, 200)),
                                         c("choices", list("a", "b"))))
)

Arguments

args

LIST of named arguments and their values, typically passed directly from a function definition in the form args = list(foo = 1:2, bar = c(“a”, “b”, “c”)) or directly by passing environment()

conditions

Nested LIST of conditions and values to check, with one list item for each element in args.

  • The first element of each list should be a character scalar in the supported list.

  • The second element of each list should be the check values themselves and may be of any type.

Multiple expectation conditions can be set for each element of args in the form

  • conditions = list(foo = list(c(“mode”, “numeric”), c(“length”, 2)), bar = list(c(“mode”, “character”), c(“n<”, 5)))

Currently supported expectations are:

  • class: checks strict class expectation by direct comparison with class to support object classes not supported with the is.x or is_x family of functions; much stricter than a “mode” check in that the requested check must be present in the return from a call to class e.g. “list” will fail if a “data.frame” object is passed

  • mode: checks class expectation by applying the is.X or the is_X family of functions either directly or flexibly depending on the value provided to conditions (e.g. c(“mode”, “character”), c(“mode”, “is.character”), and c(“mode”, “is_character”) all work equally well) and will default to the version you provide explicitly (e.g. if you wish to prioritize “is_character” over “is.character”, simply provide “is_character” as the condition). Only those modes able to be checked by this family of functions are supported. Run function mode_checks() for a complete sorted list for your current configuration.

  • length: length of values matches a pre-determined exact length, typically a single value expectation (e.g. c(“length”, 1))

  • no_na: no NA values are present

  • n>: length of values is greater than a given value

  • n<: length of values is lesser than a given value

  • n>=: length of values is greater than or equal to a given value

  • n<=: length of values is lesser than or equal to a given value

  • >: numeric or date value is greater than a given value

  • <: numeric or date value is lesser than a given value

  • >=: numeric or date value is greater than or equal to a given value

  • <=: numeric or date value is lesser than or equal to a given value

  • between: numeric or date values are bound within an INCLUSIVE range (e.g. c(“between”, list(1, 5)))

  • choices: provided values are part of a selected list of expectations (e.g. c(“choices”, list(letters[1:3])))

  • FUN: apply a function to the value and check that the result is valid or that the function can be executed without error; this evaluates the check condition using [tryCatch()] via [do.call()] and so can also accept a full named list of arg values. This is a strict check in the sense that a warning will also result in a failed result, passing the warning (or error if the function fails) message back to the user, but does not halt checks

from_fn

CHR scalar of the function from which this is called, used if logger is enabled and ignored if not; by default it will pull the calling function’s name from the call stack, but can be overwritten by a manual entry here for better tracing. (default NULL)

silent

LGL scalar of whether to silence warnings for individual failures, leaving them only as part of the output. (default: FALSE)

Value

LIST of the resulting values and checks, primarily useful for its $valid (TRUE if all checks pass or FALSE if any fail) and $messages values

Note

If logger is enabled, also provides some additional meaningful feedback.

At least one condition check is required for every element passed to args.
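
Examples

# A minimal sketch of the in-function pattern from the Description;
# 'example_fn' is hypothetical.
example_fn <- function(param1 = TRUE, param2 = "a") {
  arg_check <- verify_args(args = as.list(environment()),
    conditions = list(param1 = c("mode", "logical"),
                      param2 = c("mode", "character")))
  if (!arg_check$valid) cat(paste0(arg_check$messages, "\n"))
  arg_check$valid
}
example_fn()                        # TRUE if both checks pass
example_fn(param1 = "not logical")  # prints the failed check message(s)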


verify_import_columns R Documentation

Verify column names for import

Description

This function validates that all required columns are present prior to importing into a database table by examining provided values against the database schema. This is largely a sanity check on other functions, but it also strips extraneous columns to meet the needs of an INSERT action. The input to ‘values’ should be either a LIST or named CHR vector of values for insertion, or a CHR vector of the column names.

Usage

verify_import_columns(
  values,
  db_table,
  names_only = FALSE,
  require_all = TRUE,
  db_conn = con,
  log_ns = "db"
)

Arguments

values

LIST or CHR vector of values to add. If ‘names_only’ is TRUE, values are directly interpreted as column names. Otherwise, all values provided must be named.

db_table

CHR scalar of the table name

names_only

LGL scalar of whether to treat entries of ‘values’ as the column names rather than the column values (default: FALSE)

require_all

LGL scalar of whether to require all columns (except the assumed primary key column of “id”) or only those defined as “NOT NULL” (default: TRUE requires the presence of all columns in the table)

db_conn

connection object (default: con)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

Value

An object of the same type as ‘values’ with extraneous values (i.e. those not matching a database column header) stripped away.

Note

If columns are defined as required in the schema and are not present, this will fail with an informative message about which columns were missing.

If columns are provided that do not match the schema, they will be stripped away in the return value.
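
Examples

# Validate a set of column names only (no values); the table and column
# names here are hypothetical and a live connection 'con' is assumed.
verify_import_columns(c("name", "formula"),
                      db_table = "compounds",
                      names_only = TRUE,
                      require_all = FALSE)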


verify_import_requirements R Documentation

Verify an import file’s properties

Description

Checks an import file’s characteristics against expectations. This is mostly a sanity check against changing conditions from project to project. Import requirements should be defined at the environment level and enumerated as a JSON object, which can be created by calling [make_requirements] on an example import for simplicity. An example is provided in the ‘examples’ directory as “NIST_import_requirements.json”. If multiple requirements are in use (e.g. pulling from multiple locations), this can be run multiple times with different values of ‘requirements_obj’ or ‘file_name’.

Usage

verify_import_requirements(
  obj,
  ignore_extra = TRUE,
  requirements_obj = "import_requirements",
  file_name = "import_requirements",
  log_issues_as = "warn",
  log_ns = "db"
)

Arguments

obj

LIST of the object to import matching structure expectations, typically from a JSON file fed through [full_import]

ignore_extra

LGL scalar of whether to ignore extraneous import elements or stop the import process (default: TRUE)

requirements_obj

CHR scalar of the name of an R object holding import requirements; this is a convenience shorthand to prevent multiple imports from parameter ‘file_name’ (default: “import_requirements”)

file_name

CHR scalar of the name of a file holding import requirements; if this has already been added to the calling environment, ‘requirements_obj’ will be used preferentially as the name of that object

log_issues_as

CHR scalar of the log level to use (default: “warn”), which must be a valid log level as in [logger::FATAL]; will be ignored if the [logger] package isn’t available

log_ns

CHR scalar of the logging namespace to use (default: “db”)

Details

The return from this is a tibble with 9 columns. The first is the name of the import object member, typically the file name. If a single, unnested import object is provided this will be “import object”. The other columns include the following verification checks:

  1. has_all_required: Are all required names present in the sample? (TRUE/FALSE)

  2. missing_requirements: Character vectors naming any of the missing requirements

  3. has_full_detail: Is all expected detail present? (TRUE/FALSE)

  4. missing_detail: Character vectors naming any missing value sets

  5. has_extra: Are there unexpected values provided? (TRUE/FALSE)

  6. extra_cols: Character vectors naming any extra columns; these will be dropped from the import but are provided for information’s sake

  7. has_name_mismatches: Are there name differences between the import requirement elements and the import object? (TRUE/FALSE)

  8. mismatched_names: Named lists enumerating which named elements (if any) from the import object did not match name expectations in the requirements

All of this is defined by the ‘requirements_obj’ list. Do not provide that list directly; instead, pass this function the name of the requirements object for interoperability. If a ‘requirements_obj’ cannot be identified via [base::exists], then ‘file_name’ will take precedence and be imported. Initial use and set up may be easier in interactive sessions.

Value

A tibble object with 9 columns containing the results of the checks.

Note

If ‘file_name’ is provided, it need not be fully defined. The value provided will be used to search the project directory.
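
Examples

# A hedged sketch assuming requirements built by [make_requirements] are
# available as 'import_requirements'; "import.json" is a hypothetical file.
obj <- jsonlite::fromJSON("import.json", simplifyVector = FALSE)
verify_import_requirements(obj, requirements_obj = "import_requirements")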


with_help R Documentation

Convenience application of add_help using pipes directly in UI.R

Description

This may not work for certain widgets with heavily nested HTML. Note that classes may be CSS dependent.

Usage

actionButton("example", "With Help") %>%
  with_help("Now with a question mark icon hosting a tooltip")
actionButton("example", "With Help") %>%
  with_help("Large and green", size = "xl", class = "success")

Arguments

widget

shiny.tag widget

tooltip

CHR scalar of the tooltip text

...

Other named arguments to be passed to ‘add_help’

Value

The widget provided with a hover tooltip icon appended to it.

Note

Most standard Shiny widgets are supported, but maybe not all.