Function Reference

This appendix contains links to all documented functions included as part of the DIMSpec toolkit. As is common in R packages, not all functions are documented, but most are. Functions referenced in the rest of this user guide are linked directly to their entry on this page. Click any function in this table of contents to open its documentation.

DIMSpec Help Index

R Documentation
activate_py_env Activate a python environment
active_connection Is a connection object still available?
add_help Attach a superscript icon with a bsTooltip to an HTML element
add_normalization_value Add value(s) to a normalization table
add_or_get_id Utility function to add a record
add_rdkit_aliases Add fragment or compound aliases generated by RDKit functions
adduct_formula Add Adduct to Formula
api_endpoint Build an API endpoint programmatically
api_open_doc Open Swagger API documentation
api_reload Reloads the plumber API
api_start Start the plumber API
api_stop Stop the plumber API
append_icon_to Create the JS to append an icon to an HTML element by its ID
bootstrap_compare_ms Calculate dot product match score using bootstrap data
build_db Build or rebuild the database from scratch
build_db_action Build an escaped SQL query
build_triggers Build pairs of INSERT/UPDATE triggers to resolve foreign key relationships
build_views Build SQL to create views on normalized tables in SQLite
calculate.monoisotope Calculate the monoisotopic mass of an elemental formula list
check_for_value Check for a value in a database table
check_fragments Determine number of matching fragments between unknown mass spectrum and specific peaks
check_isotopedist Compare Isotopic Pattern to simulated pattern
check_mzML_convert Check mzML file for specific MSConvert parameters
clause_where Build a WHERE clause for SQL statements
close_up_shop Conveniently close all database connections
compare_ms Calculate dot product match score
complete_form_entry Ensure complete form entry
create_fallback_build Create an SQL file for use without the SQLite CLI
create_peak_list Create peak list from SQL ms_data table
create_peak_table_ms1 Create peak table for MS1 data
create_peak_table_ms2 Create peak table for MS2 data
create_py_env Create a python environment for RDKit
create_search_df Create data.frame containing parameters for extraction and searching
create_search_ms Generate uncertainty mass spectrum for MS1 and MS2 data
data_dictionary Create a data dictionary
dataframe_match Match multiple values in a database table
dotprod Calculate dot product
dt_color_by Apply colors to DT objects by value in a column
dt_formatted Easily format multiple DT objects in a shiny project in the same manner
er_map Create a simple entity relationship map
export_msp Export to MSP
extend_suspect_list Extend the compounds and aliases tables
extract.elements Elemental Formula Functions
flush_dir Flush a directory with archive
fn_guide View an index of help documentation in your browser
fn_help Get function documentation for this project
format_id Format a file name as an HTML element ID
format_list_of_names Grammatically collapse a list of values
formulalize Generate standard chemical formula notation
full_import Import one or more files from the NIST Method Reporting Tool for NTA
gather_qc Quality Control Check of Import Data
get_annotated_fragments Get all annotated fragments that have matching masses
get_component Resolve components from a list or named vector
get_compound_fragments Get all fragments associated with compounds
get_compoundid Get compound ID and name for specific peaks
get_fkpk_relationships Extract foreign key relationships from a schema
get_massadj Calculate the mass adjustment for a specific adduct
get_msconvert_data Extract msconvert metadata
get_msdata Get all mass spectral data within the database
get_msdata_compound Get all mass spectral data for a specific compound
get_msdata_peakid Get all mass spectral data for a specific peak id
get_msdata_precursors Get all mass spectral data with a specific precursor ion
get_opt_params Get optimized uncertainty mass spectra parameters for a peak
get_peak_fragments Get annotated fragments for a specific peak
get_peak_precursor Get precursor ion m/z for a specific peak
get_sample_class Get sample class information for specific peaks
get_search_object Generate msdata object from input peak data
get_suspectlist Get the current NIST PFAS suspect list.
get_ums Generate consensus mass spectrum
get_uniques Get unique components of a nested list
getcharge Get polarity of a ms scan within mzML object
getmslevel Get MS Level of a ms scan within mzML object
getmzML Brings raw data file into environment
getprecursor Get precursor ion of a ms scan within mzML object
gettime Get time of a ms scan within mzML object
has_missing_elements Simple check for if an object is empty
is_elemental_match Checks if two elemental formulas match
is_elemental_subset Check if elemental formula is a subset of another formula
isotopic_distribution Isotopic distribution functions
lockmass_remove Remove lockmass scan from mzml object
log_as_dataframe Pull a log file into an R object
log_fn Simple logging convenience
log_it Conveniently log a message to the console
make_acronym Simple acronym generator
make_install_code Convenience function to set a new installation code
make_requirements Make import requirements file
manage_connection Check for, and optionally remove, a database connection object
map_import Map an import file to the database schema
mode_checks Get list of available functions
molecule_picture Picture a molecule from structural notation
monoisotope.list Calculate the monoisotopic mass of elemental formulas in a list
ms_plot_peak Plot a peak from database mass spectral data
ms_plot_peak_overview Create a patchwork plot of peak spectral properties
ms_plot_spectra Plot a fragment map from database mass spectral data
ms_plot_spectral_intensity Create a spectral intensity plot
ms_plot_titles Consistent titles for ms_plot_x functions
ms_spectra_separated Parse “Separated” MS Data
ms_spectra_zipped Parse “Zipped” MS Data
mzMLconvert Converts a raw file into an mzML
mzMLtoR Opens file of type mzML into R environment
nist_shinyalert Call [shinyalert::shinyalert] with specific styling
obj_name_check Sanity check for environment object names
open_env Convenience shortcut to open and edit session environment variables
open_proj_file Open and edit project files
optimal_ums Get the optimal uncertainty mass spectrum parameters for data
overlap Calculate overlap ranges
pair_ums Pairwise data.frame of two uncertainty mass spectra
peak_gather_json Extract peak data and metadata
plot_compare_ms Plot MS Comparison
plot_ms Generate consensus mass spectrum
pool.sd Pool standard deviations
pool.ums Pool uncertainty mass spectra
pragma_table_def Get table definition from SQLite
pragma_table_info Explore properties of an SQLite table
py_modules_available Are all conda modules available in the active environment
rdkit_active Sanity check on RDKit binding
rdkit_mol_aliases Create aliases for a molecule from RDKit
read_log Read a log from a log file
rebuild_helps Rebuild the help files as HTML with an index
rectify_null_from_env Rectify NULL values provided to functions
ref_table_from_map Get the name of a linked normalization table
remove_db Remove an existing database
remove_icon_from Remove the last icon attached to an HTML element
remove_sample Delete a sample
repair_xl_casrn_forced_to_date Repair CAS RNs forced to a date numeric by MSXL
repl_nan Replace NaN
report_qc Export QC result JSON file into PDF
reset_logger_settings Update logger settings
resolve_compound_aliases Resolve compound aliases provided as part of the import routine
resolve_compound_fragments Link together peaks, fragments, and compounds
resolve_compounds Resolve the compounds node during bulk import
resolve_description_NTAMRT Resolve the method description tables during import
resolve_fragments_NTAMRT Resolve the fragments node during database import
resolve_method Add an ms_method record via import
resolve_mobile_phase_NTAMRT Resolve the mobile phase node
resolve_ms_data Resolve and store mass spectral data during import
resolve_ms_spectra Unpack mass spectral data in compressed format
resolve_multiple_values Utility function to resolve multiple choices interactively
resolve_normalization_value Resolve a normalization value against the database
resolve_peak_ums_params Resolve and import optimal uncertain mass spectrum parameters
resolve_peaks Resolve the peaks node during import
resolve_qc_data_NTAMRT Resolve and import quality control data for import
resolve_qc_methods_NTAMRT Resolve and import quality control method information
resolve_sample Add a sample via import
resolve_sample_aliases Resolve and import sample aliases
resolve_software_settings_NTAMRT Import software settings
resolve_table_name Check presence of a database table
save_data_dictionary Save the current data dictionary to disk
search_all Search all mass spectra within database against unknown mass spectrum
search_precursor Search the database for all compounds with matching precursor ion m/z values
setup_rdkit Conveniently set up an RDKit python environment for use with R
sigtest Significance testing function
smilestoformula Convert SMILES string to Formula and other information
sql_to_msp Export SQL database to the MSP (NIST MS) format
sqlite_auto_trigger Create a basic SQL trigger for handling foreign key relationships
sqlite_auto_view Create a basic SQL view of a normalized table
sqlite_parse_build Parse SQL build statements
sqlite_parse_import Parse SQL import statements
start_api Start the plumber interface from a clean environment
start_app WIP Launch a shiny application
start_rdkit Start the RDKit integration
summarize_check_fragments Summarize results of check_fragments function
support_info R session information for support needs
suspectlist_at_NIST Open the NIST PDR entry for the current NIST PFAS suspect list
table_msdata Tabulate MS Data
tack_on Append additional named elements to a list
tidy_comments Tidy up table and field comments
tidy_ms_spectra Tidy Spectra
tidy_spectra Decompress Spectra
unzip Unzip binary data into vector
update_all Convenience function to rebuild all database related files
update_data_sources Dump current database contents
update_env_from_file Update a conda environment from a requirements file
update_logger_settings Update logger settings
user_guide Launch the User Guide for DIMSpec
valid_file_format Ensure files uploaded to a shiny app are of the required file type
validate_casrns Validate a CAS RN
validate_column_names Ensure database column presence
validate_tables Ensure database table presence
verify_args Verify arguments for a function
verify_import_columns Verify column names for import
verify_import_requirements Verify an import file’s properties
with_help Convenience application of ‘add_help’ using pipes directly in ‘UI.R’

activate_py_env R Documentation

Activate a python environment

Description

Programmatically setting up python bindings is a bit more convoluted than in a standard script. Given the name of a Python environment, it either (1) checks the provided ‘env_name’ against currently installed environments and binds the current session to it if found OR (2) installs a new environment with [create_py_env] and activates it by calling itself.

Usage

activate_py_env(
  env_name = NULL,
  required_libraries = NULL,
  required_modules = NULL,
  log_ns = NULL,
  conda_path = NULL
)

Arguments

env_name

CHR scalar of a python environment name to bind. The default, NULL, will look for an environment variable named ‘PYENV_NAME’

required_libraries

CHR vector of python libraries to include in the environment, if building a new environment. Ignored if ‘env_name’ is an existing environment. The default, NULL, will look for an environment variable named ‘PYENV_LIBRARIES’.

required_modules

CHR vector of modules to be checked for availability once the environment is activated. The default, NULL, will look for an environment variable named ‘PYENV_MODULES’.

log_ns

CHR scalar of the logging namespace to use, if any.

Details

It is recommended that project variables in ‘../config/env_py.R’ and ‘../config/env_glob.txt’ be used to control most of the behavior of this function. This works with both virtual and conda environments, though creation of new environments is done in conda.

Value

LGL scalar of whether or not activation was successful

Note

Where parameters are NULL, [rectify_null_from_env] will be used to get an associated value if one exists.
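
Examples

A minimal sketch; the environment name and module list shown here are hypothetical and would normally come from ‘PYENV_NAME’ and ‘PYENV_MODULES’.

# Bind the session to an existing environment, creating it if absent
activate_py_env(
  env_name = "my_rdkit_env",      # hypothetical environment name
  required_modules = c("rdkit")
)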


active_connection R Documentation

Is a connection object still available?

Description

This is a thin wrapper for [DBI::dbIsValid] with some error logging.

Usage

active_connection(db_conn = con)

Arguments

db_conn

connection object (default “con”)

Value

LGL scalar indicating whether the database is available
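
Examples

A usage sketch, assuming a connection object named "con" exists in the session.

# Check the default connection before issuing queries
if (active_connection(con)) {
  message("Database connection is live.")
}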


add_help R Documentation

Attach a superscript icon with a bsTooltip to an HTML element

Description

Attach a superscript icon with a bsTooltip to an HTML element

Usage

add_help(
  id,
  tooltip,
  icon_name = "question",
  size = "xs",
  icon_class = "info-tooltip primary",
  ...
)

Arguments

id

CHR scalar of the HTML ID to which to append the icon

tooltip

CHR scalar of the tooltip text

icon_name

CHR scalar of the icon name, which must be understandable by the ‘shiny::icon’ function; e.g. a font-awesome icon name (default: “question”).

size

CHR scalar of the general icon size as understandable by the font-awesome library (default: “xs”)

icon_class

CHR vector of classes to apply to the ‘<sup>’ container, as defined in the current CSS (default: “info-tooltip primary”)

...

Other named arguments to be passed to ‘shinyBS::bsTooltip’

Value

LIST of HTML tags for the desired help icon and its tooltip

Note

The following CSS is typically defined to go with this:

.info-tooltip { opacity: 30%; transition: opacity .25s; }

.info-tooltip:hover { opacity: 100%; }

.primary { color: #3c8dbc; }

Examples

add_help("example", "a tooltip")


add_normalization_value R Documentation

Add value(s) to a normalization table

Description

One of the most common database operations is to look up or add a value in a normalization table. This utility function adds a single value and returns its associated id by using [build_db_action]. This is only suitable for a single value. If you need to bulk add multiple new values, use this with something like [lapply].

Usage

add_normalization_value("norm_table", name = "new value", acronym = "NV")

Arguments

db_table

CHR scalar of the normalization table’s name

db_conn

connection object (default “con”)

log_ns

CHR scalar of the logging namespace to use during execution (default: “db”)

id_column

CHR scalar of the column to use as the primary key identifier for ‘db_table’ (default: “id”)

database_map

LIST of the database entity relationship map, typically from calling [er_map]. If NULL (default) the object “db_map” will be searched for and used if available, otherwise it will be created with [er_map]

...

CHR vector of additional named arguments to be added; names not appearing in the referenced table will be ignored

Value

NULL if unable to add the values, INT scalar of the new ID otherwise
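
Examples

A sketch of the bulk-addition pattern suggested in the description; the table name "norm_solvents" and the values are hypothetical.

# Add several values one at a time, collecting the resulting IDs
new_ids <- lapply(
  c("methanol", "acetonitrile"),
  function(x) add_normalization_value("norm_solvents", name = x)
)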


add_or_get_id R Documentation

Utility function to add a record

Description

Checks a table in the attached SQL connection for a primary key ID matching the provided ‘values’ and returns the ID. If none exists, adds a record and returns the resulting ID if successful. Values should be provided as a named vector of the values to add. No data coercion is performed, relying almost entirely on the database schema or preprocessing to ensure data integrity.

Usage

add_or_get_id(
  db_table,
  values,
  db_conn = con,
  ensure_unique = TRUE,
  require_all = TRUE,
  ignore = FALSE,
  log_ns = "db"
)

Arguments

db_table

CHR scalar name of the database table being modified

values

named vector of the values being added, passed to [build_db_action]

db_conn

connection object (default: con)

ensure_unique

LGL scalar of whether or not to first check that the values provided form a new unique record (default: TRUE)

require_all

LGL scalar of whether to require all columns (except the assumed primary key column of “id”) or only those defined as “NOT NULL” (default: TRUE requires the presence of all columns in the table)

ignore

LGL scalar on whether to treat the insert try as an “INSERT OR IGNORE” SQL statement (default: FALSE)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

Details

Provided values are checked against required columns in the table using [verify_import_columns].

Operations to add the record and get the resulting ID are both performed with [build_db_action] and are performed virtually back to back with the latest-added ID being given preference in cases where added values may match multiple extant records.

Value

INT scalar of the record identifier

Note

If this is used in high volume/traffic applications, ID conflicts may occur if the timing is such that another record containing identical values is added before the call getting the ID completes.
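
Examples

A sketch assuming a hypothetical "contributors" table with "first_name" and "last_name" columns.

# Returns the matching ID if the record exists, otherwise inserts it
id <- add_or_get_id(
  db_table = "contributors",
  values = c(first_name = "Jane", last_name = "Smith"),
  require_all = FALSE
)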


add_rdkit_aliases R Documentation

Add fragment or compound aliases generated by RDKit functions

Description

Aliases are stored for both compounds and fragments within the database to facilitate search and unambiguous identification. Given one molecular structure notation (SMILES is preferred), other machine-readable expressions can be generated quickly. Requested aliases as provided to ‘rdkit_aliases’ will be prefixed by ‘mol_to_prefix’ and checked against the namespace of available functions in RDKit, with the correct functions assigned automatically.

Usage

add_rdkit_aliases(
  identifiers,
  alias_category = c("compounds", "fragments"),
  compound_aliases_table = "compound_aliases",
  fragment_aliases_table = "fragment_aliases",
  inchi_prefix = "InChI=1S/",
  rdkit_name = ifelse(exists("PYENV_NAME"), PYENV_NAME, "rdkit"),
  rdkit_ref = ifelse(exists("PYENV_REF"), PYENV_REF, "rdk"),
  rdkit_ns = "rdk",
  rdkit_make_if_not = TRUE,
  rdkit_aliases = c("inchi", "inchikey"),
  mol_to_prefix = "MolTo",
  mol_from_prefix = "MolFrom",
  type = "smiles",
  as_object = TRUE,
  db_conn = con,
  log_ns = "rdk"
)

Arguments

identifiers

CHR vector of machine readable notations in ‘type’ format

alias_category

CHR scalar, one of “compounds” or “fragments” to determine where in the database to store the resulting aliases (default: “compounds”)

compound_aliases_table

CHR scalar name of the database table holding compound aliases (default: “compound_aliases”)

fragment_aliases_table

CHR scalar name of the database table holding fragment aliases (default: “fragment_aliases”)

inchi_prefix

CHR scalar prefix for the InChI code to use, if InChI is requested as part of ‘rdkit_aliases’

rdkit_name

CHR scalar name of the python environment at which RDKit is installed (default: is the session variable PYENV_NAME or “rdkit”)

rdkit_ref

CHR scalar name of the R pointer object to RDKit (default: is the session variable PYENV_REF or “rdk”)

rdkit_ns

CHR scalar name of the logging namespace to use (default: “rdk”); will be ignored if logging is off

rdkit_make_if_not

LGL scalar of whether to create an RDKit environment if it does not exist (default: TRUE)

rdkit_aliases

CHR vector of machine-readable aliases to generate, which must be recognizable as names in the RDKit namespace when prefixed by ‘mol_to_prefix’ (default: c(“inchi”, “inchikey”)); these are not case sensitive

mol_to_prefix

CHR scalar of the prefix identifying alias creation functions, which must be recognizable as names in the RDKit namespace when suffixed by ‘rdkit_aliases’ (default: “MolTo”); this is not case sensitive

mol_from_prefix

CHR scalar of the prefix identifying molecule expression creation functions, which must be recognizable as names in the RDKit namespace when suffixed by ‘type’ (default: “MolFrom”); this is not case sensitive

type

CHR scalar indicating the type of ‘identifiers’ to be converted to molecule notation (default: “smiles”); this is not case sensitive

as_object

LGL scalar indicating whether to return the alias list to the session as an object (default: TRUE) or write aliases to the database (FALSE)

db_conn

connection object (default: con)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

Value

If ‘as_object’ == TRUE, a data.frame of the generated aliases, otherwise no return and a database insertion will be performed

Note

It is not recommended to change the defaults here unless you are familiar with the naming conventions of RDKit.

Requires both INFORMATICS and USE_RDKIT set to TRUE in the session and a valid installation of the RDKit python environment to function.

See the RDKit Documentation for more details.
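
Examples

A sketch assuming an active RDKit environment; the SMILES string (perfluorooctanoic acid) is illustrative.

# Generate InChI and InChIKey aliases and return them as an object
aliases <- add_rdkit_aliases(
  identifiers = "OC(=O)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)C(F)(F)F",
  alias_category = "compounds",
  as_object = TRUE
)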


adduct_formula R Documentation

Add Adduct to Formula

Description

Add Adduct to Formula

Usage

adduct_formula(elementalformula, adduct = "+H")

Arguments

elementalformula

character string elemental formula

adduct

character string adduct state to add to the elemental formula, must contain an element, options are ‘+H’, ‘-H’, ‘+Na’, ‘+K’

Value

character string containing elemental formula with adduct

Examples

adduct_formula("C2H5O", adduct = "+H")

api_endpoint R Documentation

Build an API endpoint programmatically

Description

This is a convenience function intended to support plumber endpoints. It only assists in constructing (and, if ‘execute’ == TRUE, executing) endpoints; you must still know which endpoints exist and what they expect. Validity checking, execution, and opening in a web browser are supported. Invalid endpoints will not be executed or opened for viewing.

Usage

api_endpoint(
  path,
  ...,
  server_addr = PLUMBER_URL,
  check_valid = TRUE,
  execute = TRUE,
  open_in_browser = FALSE,
  raw_result = FALSE,
  max_pings = 20L,
  return_type = c("text", "raw", "parsed"),
  return_format = c("vector", "data.frame", "list")
)

Arguments

path

CHR scalar of the endpoint path.

...

Additional named parameters added to the endpoint, most typically the query portion. If only one is provided, it can remain unnamed and a query is assumed. If more than one is provided, all must be named. Named elements must be components of the return from [httr::parse_url] (see https://tools.ietf.org/html/rfc3986 for details of the parsing algorithm); unrecognized elements will be ignored.

server_addr

CHR scalar uniform resource locator (URL) address of an API server (e.g. “https://myapi.com:8080”) (defaults to the current environment variable “PLUMBER_URL”)

check_valid

LGL scalar on whether or not to first check that an endpoint returns a valid status code (200-299) (default: TRUE).

execute

LGL scalar of whether or not to execute the constructed endpoint and return the result; will be defaulted to FALSE if ‘check_valid’ == TRUE and the endpoint returns anything other than a valid status code. (default: TRUE)

open_in_browser

LGL scalar of whether or not to open the resulting endpoint in the system’s default browser; will be defaulted to FALSE if ‘check_valid’ == TRUE and the endpoint returns anything other than a valid status code. (default: FALSE)

max_pings

INT scalar maximum number of pings to try before timeout if using endpoint “_ping”; this is only used for endpoint “_ping” (default: 20)

return_type

CHR scalar on which return type to use, which must be one of “text”, “raw”, or “parsed” which will be used to read the content of the response item (default: “text”)

return_format

CHR scalar on which form to return data, which must be one of “vector”, “data.frame”, or “list” (default: “vector” to support primarily single value responses)

Value

CHR scalar of the constructed endpoint, with messages regarding status checks, return from the endpoint (typically JSON) if valid and ‘execute’ == TRUE, or NONE if ‘open_in_browser’ == TRUE

Note

Special support is provided for the way in which the NIST Public Data Repository treats URL fragments

This only supports [httr::GET] requests.

Examples

api_endpoint("https://www.google.com/search", list(q = "something"), open_in_browser = TRUE)
api_endpoint("https://www.google.com/search", query = list(q = "NIST Public Data Repository"), open_in_browser = TRUE)

api_open_doc R Documentation

Open Swagger API documentation

Description

This will launch the Swagger UI in a browser tab. The URL suffix “docs” will be automatically added if not part of the host URL accepted as ‘url’.

Usage

api_open_doc(url = PLUMBER_URL)

Arguments

url

CHR URL/URI of the plumber documentation host (default: environment variable “PLUMBER_URL”)

Value

None, opens a browser to the requested URL


api_reload R Documentation

Reloads the plumber API

Description

Depending on system architecture, the plumber service may take some time to spin up and spin down. If ‘background’ is TRUE, this may mean the calling R thread runs ahead of the background process resulting in unexpected behavior (e.g. newly defined endpoints not being available), effectively binding it to the prior iteration. If the API does not appear to be reloading properly, it may be necessary to manually kill the process controlling it through your OS and to call this function again.

Usage

api_reload(
  pr = NULL,
  background = TRUE,
  plumber_file = NULL,
  on_host = NULL,
  on_port = NULL,
  log_ns = "api"
)

Arguments

pr

CHR scalar name of the plumber service object, typically only created as a background observer from [callr::r_bg] as a result of calling [api_reload] (default: NULL gets the environment setting for PLUMBER_OBJ_NAME)

background

LGL scalar of whether to load the plumber server as a background service (default: TRUE); set to FALSE for testing

plumber_file

CHR scalar of the path to a plumber API to launch (default: NULL)

on_host

CHR scalar of the host IP address (default: NULL)

on_port

CHR or INT scalar of the host port to use (default: NULL)

log_ns

CHR scalar namespace to use for logging (default: “api”)

Value

Launches the plumber API service on your local machine and returns the URL on which it can be accessed as a CHR scalar


api_start R Documentation

Start the plumber API

Description

This is a wrapper to [plumber::pr_run] pointing to a project’s opinionated plumber settings with some error trapping. The host, port, and plumber file are set in the “config/env_R.R” location as PLUMBER_HOST, PLUMBER_PORT, and PLUMBER_FILE respectively.

Usage

api_start(plumber_file = NULL, on_host = NULL, on_port = NULL)

Arguments

plumber_file

CHR scalar of the path to a plumber API to launch (default: NULL)

on_host

CHR scalar of the host IP address (default: NULL)

on_port

CHR or INT scalar of the host port to use (default: NULL)

Value

LGL scalar with success status

Note

If either of ‘on_host’ or ‘on_port’ are NULL they will default first to any existing environment values of PLUMBER_HOST and PLUMBER_PORT, then to getOption(“plumber.host”, “127.0.0.1”) and getOption(“plumber.port”, 8080)

This will fail if the requested port is in use.
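
Examples

A sketch; with no arguments the host, port, and plumber file fall back to the session environment values described above. The explicit port is illustrative.

# Start with project defaults
api_start()

# Or override the port directly
api_start(on_port = 8181)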


api_stop R Documentation

Stop the plumber API

Description

Stop the plumber API

Usage

api_stop(pr = NULL, flush = TRUE, db_conn = "con", remove_service_obj = TRUE)

Arguments

pr

CHR scalar name of the plumber service object, typically only created as a background observer from [callr::r_bg] as a result of calling [api_reload] (default: NULL gets the environment setting for PLUMBER_OBJ_NAME)

flush

LGL scalar of whether to disconnect and reconnect to a database connection named as ‘db_conn’ (default: TRUE)

db_conn

CHR scalar of the connection object name (default: “con”)

remove_service_obj

LGL scalar of whether to remove the reference to ‘pr’ from the current global environment (default: TRUE)

Value

None, stops the plumber server

Note

This will also kill and restart the connection object if ‘flush’ is TRUE to release connections with certain configurations such as SQLite in write ahead log mode.

This function assumes the object referenced by name ‘pr’ exists in the global environment, and ‘remove_service_obj’ will only remove it from .GlobalEnv.


append_icon_to R Documentation

Create the JS to append an icon to an HTML element by its ID

Description

Create the JS to append an icon to an HTML element by its ID

Usage

append_icon_to(id, icon_name, icon_class = NULL)

Arguments

id

CHR scalar of the HTML ID to which to append an icon

icon_name

CHR scalar of the icon name, which must be understandable by the ‘shiny::icon’ function; e.g. a font-awesome icon name.

icon_class

CHR vector of classes to apply

Value

CHR scalar suitable to execute with ‘shinyjs::runJS’

Examples

append_icon_to("example", "r-project", "fa-3x")


bootstrap_compare_ms R Documentation

Calculate dot product match score using bootstrap data

Description

Calculates the match score (based on dot product) of the two uncertainty mass spectra. To generate a distribution of match scores, bootstrapped data are drawn (using ‘rnorm’ for now) from the uncertainty of the two mass spectra.

Usage

bootstrap_compare_ms(
  ms1,
  ms2,
  error = c(5, 5),
  minerror = c(0.002, 0.002),
  m = 1,
  n = 0.5,
  runs = 10000
)

Arguments

ms1, ms2

the uncertainty mass spectra from function ‘get_ums’

error

a vector of the respective mass error (in ppm) for each mass spectrum or a single value representing the mass error for all m/z values

minerror

a two component vector of the respective minimum mass error (in Da) for each mass spectrum or a single value representing the minimum mass error of all m/z values

m, n

weighting values for mass (m) and intensity (n)

runs

INT scalar of the number of bootstrap iterations to perform (default: 10000)

build_db R Documentation

Build or rebuild the database from scratch

Description

This function will build or rebuild the NIST HRAMS database structure from scratch, removing the existing instance. By default, most parameters are set in the environment (at “./config/env_glob.txt”) but any values can be passed directly. This can be used to quickly spin up multiple copies with a clean slate using different build files, data files, or return to the last stable release.

Usage

build_db(db = "test_db.sqlite", db_conn_name = "test_conn")

Arguments

db

CHR scalar of the database name (default: session value DB_NAME)

build_from

CHR scalar of a SQL build script to use (default: environment value DB_BUILD_FILE)

populate

LGL scalar of whether to populate with data from the file in ‘populate_with’ (default: TRUE)

populate_with

CHR scalar for the populate script (e.g. “populate_demo.sql”) to run after the build is complete (default: session value DB_DATA); ignored if ‘populate = FALSE’

archive

LGL scalar of whether to create an archive of the current database (if it exists) matching the name supplied in argument ‘db’ (default: FALSE), passed to [‘remove_db()’]

sqlite_cli

CHR scalar to use to look for installed sqlite3 CLI tools in the current system environment (default: session value SQLITE_CLI)

connect

LGL scalar of whether or not to connect to the rebuilt database in the global environment as object “con” (default: FALSE)

Details

If sqlite3 and its command line interface are available on your platform, that will be used (preferred method) but, if not, this function will read in all the necessary files to directly create it using shell commands. The shell method may not be universally applicable to certain compute environments or may require elevated permissions.

Value

None, check console for details
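
Examples

A sketch; the file names are illustrative and would normally come from the session environment (DB_NAME, DB_BUILD_FILE, DB_DATA).

# Rebuild a test copy and populate it with the demo data set
build_db(
  db = "test_db.sqlite",
  populate = TRUE,
  populate_with = "populate_demo.sql",
  connect = FALSE
)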


build_db_action R Documentation

Build an escaped SQL query

Description

In most cases, issuing basic SQL queries is made easy by tidyverse compliant functions such as [dplyr::tbl]. Full interaction with an SQLite database is a bit more complicated and typically requires [DBI::dbExecute] and writing SQL directly; several helpers exist for that (e.g. [glue::glue_sql]) but aren’t as friendly or straightforward when writing more complicated actions, and still require directly writing SQL equivalents, routing through [DBI::dbQuoteIdentifier] and [DBI::dbQuoteLiteral] to prevent SQL injection attacks.

Usage

build_db_action("insert", "table", values = list(col1 = "a", col2 = 2,
  col3 = "describe"), execute = FALSE) build_db_action("insert", "table",
  values = list(col1 = "a", col2 = 2, col3 = "describe"))
  
  build_db_action("get_id", "table", match_criteria = list(id = 2))
  
  build_db_action("delete", "table", match_criteria = list(id = 2))
  
  build_db_action("select", "table", columns = c("col1", "col2", "col3"),
  match_criteria = list(id = 2)) build_db_action("select", "table",
  match_criteria = list(sample_name = "sample 123"))
  
  build_db_action("select", "table", match_criteria = list(sample_name =
  list(value = "sample 123", exclude = TRUE)) build_db_action("select",
  "table", match_criteria = list(sample_name = "sample 123",
  sample_contributor = "Smith"), and_or = "AND", limit = 5)

Arguments

action

CHR scalar, one of “INSERT”, “UPDATE”, “SELECT”, “GET_ID”, or “DELETE”

table_name

CHR scalar of the table name to which this query applies

column_names

CHR vector of column names to include (default NULL)

values

LIST of CHR vectors with values to INSERT or UPDATE (default NULL)

match_criteria

LIST of matching criteria with names matching columns against which to apply. In the simplest case, a direct value is given to the name (e.g. ‘list(last_name = “Smith”)’) for single matches. All match criteria must be their own list item. Values can also be provided as a nested list for more complicated WHERE clauses with names ‘values’, ‘exclude’, and ‘like’ that will be recognized. ‘values’ should be the actual search criteria, and if a vector of length greater than one is specified, the WHERE clause becomes an IN clause. ‘exclude’ (LGL scalar) determines whether to apply the NOT operator. ‘like’ (LGL scalar) determines whether this is an equality, list, or similarity. To reverse the example above by issuing a NOT statement, use ‘list(last_name = list(values = “Smith”, exclude = TRUE))’, or to look for all records LIKE (or NOT LIKE) “Smith”, set this as ‘list(last_name = list(values = “Smith”, exclude = FALSE, like = TRUE))’

case_sensitive

LGL scalar of whether to match on a case sensitive basis (the default TRUE searches for values as-provided) or whether to coerce value matches by upper, lower, sentence, and title case matches

and_or

LGL scalar of whether to use “AND” or “OR” for multiple criteria, which will be used to combine them all. More complicated WHERE clauses (including a mixture of AND and OR usage) should be built directly. (default: “OR”)

limit

INT scalar of the maximum number of rows to return (default NULL)

group_by

CHR vector of columns by which to group (default NULL)

order_by

named CHR vector of columns by which to order, with names matching columns and values indicating whether to sort ascending (default NULL)

distinct

LGL scalar of whether or not to apply the DISTINCT clause to all match criteria (default FALSE)

get_all_columns

LGL scalar of whether to return all columns; will be set to TRUE automatically if no column names are provided (default FALSE)

execute

LGL scalar of whether or not to immediately execute the build query statement (default TRUE)

single_column_as_vector

LGL scalar of whether to return results as a vector if they consist of only a single column (default TRUE)

log_ns

CHR scalar of the logging namespace to use during execution (default: “db”)

Details

This function is intended to ease that by taking care of most of the associated logic and enabling routing through other functions, or picking up arguments from within other function calls.

Value

CHR scalar of the constructed query


build_triggers R Documentation

Build pairs of INSERT/UPDATE triggers to resolve foreign key relationships

Description

When building schema by script, it is often handy to enforce certain behaviors on database transactions involving foreign keys, especially in SQLite. Given a properly structured list object describing the mappings between tables in a schema (e.g. one deriving from [er_map]), this function will parse those for foreign key relationships.

Usage

build_triggers(er_map(db_conn = con))

Arguments

db_map

LIST object containing descriptions of table mapping in an opinionated manner, generally generated by [er_map]. The expectation is a list of tables, with references in SQL form enumerated in a child element with a name matching ‘references_in’

references_in

CHR scalar naming the child element containing SQL references statements of the form “fk_column REFERENCES table(pk_column)” (default: “references” is provided by [er_map])

create_insert_trigger

LGL scalar indicating whether to build an insert trigger for each table (default: TRUE).

create_update_trigger

LGL scalar indicating whether to build an update trigger for each table (default: FALSE).

save_to_file

CHR scalar of a file in which to write the output, if any (default: NULL will return the resulting object to the R session)

Details

Primarily, this requires a list object referring to tables that contains in each element a child element with the name provided in ‘references_in’. The pre-pass parsing function [get_fkpk_relationships] is used to pull references from the full map.

Value

LIST object containing one element for each table in ‘db_map’ that contains foreign key references, with one child element per requested trigger

Note

Tables in ‘db_map’ that do not contain foreign key relationships will be dropped from the output list.

This is largely a convenience function to programmatically apply [make_sql_triggers] to an entire schema. To skip tables with defined foreign key relationships for which triggers are undesirable, remove those tables from ‘db_map’ prior to calling this function.


build_views R Documentation

Build SQL to create views on normalized tables in SQLite

Description

Build SQL to create views on normalized tables in SQLite

Usage

build_views(db_map = er_map(con), dictionary = data_dictionary(con))

Arguments

db_map

LIST object containing descriptions of table mapping in an opinionated manner, generally generated by [er_map]. The expectation is a list of tables, with references in SQL form enumerated in a child element with a name matching ‘references_in’

references_in

CHR scalar naming the child element containing SQL references statements of the form “fk_column REFERENCES table(pk_column)” (default: “references” is provided by [er_map])

dictionary

LIST object containing the schema dictionary produced by [data_dictionary] fully describing table entities

drop_if_exists

LGL scalar indicating whether to include a “DROP VIEW” prefix for the generated view statement; as this has an impact on schema, no default is set

save_to_file

CHR scalar name of a file path to save generated SQL (default: NULL will return a list object to the R session)

append

LGL scalar on whether to append to ‘save_to_file’ (default: FALSE)

Value

LIST if ‘save_to_file’ is NULL, otherwise none (the generated SQL is written to file)


calculate.monoisotope R Documentation

Calculate the monoisotopic mass of an elemental formula list

Description

Calculate the monoisotopic mass of an elemental formula list

Usage

calculate.monoisotope(
  elementlist,
  exactmasses = NULL,
  adduct = "neutral",
  db_conn = "con"
)

Arguments

elementlist

list of elemental formula from ‘extract.elements’ function

exactmasses

list of exact masses of elements

adduct

character string adduct/charge state to add to the elemental formula, options are ‘neutral’, ‘+H’, ‘-H’, ‘+Na’, ‘+K’, ‘+’, ‘-’, ‘-radical’, ‘+radical’

db_conn

database connection object, either a CHR scalar name (default: “con”) or the connection object itself (preferred)

Value

numeric monoisotopic exact mass

Examples

elementlist <- extract.elements("C2H5O")
calculate.monoisotope(elementlist, adduct = "neutral")


check_for_value R Documentation

Check for a value in a database table

Description

This convenience function simply checks whether a value exists in the distinct values of a given column. Only one column may be searched at a time; serialize it in other code to check multiple columns. It leverages the flexibility of [build_db_action] to do the searching. The ‘values’ parameter will be fed directly and can accept the nested list structure defined in [clause_where] for exclusions and like clauses.

Usage

con2 <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
alphabet <- dplyr::tibble(lower = letters, upper = LETTERS)
dplyr::copy_to(con2, alphabet)
check_for_value("A", "alphabet", "upper", db_conn = con2)
check_for_value("A", "alphabet", "lower", db_conn = con2)
check_for_value(letters[1:10], "alphabet", "lower", db_conn = con2)

Arguments

values

CHR vector of the values to search

db_table

CHR scalar of the database table to search

db_column

CHR scalar of the column to search

case_sensitive

LGL scalar of whether to match on a case sensitive basis (the default TRUE searches for values as-provided) or whether to coerce value matches by upper, lower, sentence, and title case matches

db_conn

connection object (default: con)

fuzzy

LGL scalar of whether to do a “fuzzy” match in the sense that values provided are wrapped in an SQL “LIKE ‘%value%’” clause, which overrides the ‘case_sensitive’ setting if TRUE (default: FALSE).

Value

LIST of length 1-2 containing “exists” as a LGL scalar for whether the values were found, and “values” containing the result of the database call, a data.frame object containing matching rows or NULL if exists == FALSE.


check_fragments R Documentation

Determine number of matching fragments between unknown mass spectrum and specific peaks

Description

Determine number of matching fragments between unknown mass spectrum and specific peaks

Usage

check_fragments(con, ums, peakid, masserror = 5, minerror = 0.001)

Arguments

con

SQLite database connection

ums

uncertainty mass spectrum of unknown compound

peakid

integer vector of primary keys for peaks table

masserror

numeric relative mass error (ppm)

minerror

numeric minimum mass error (Da)

Value

table of fragments with a TRUE/FALSE flag indicating whether each fragment is within the unknown mass spectrum
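
Examples

A sketch assuming "con" is a live database connection and "ums" is an uncertainty mass spectrum from ‘get_ums’; the peak IDs are hypothetical.

# Compare an unknown spectrum against fragments annotated for peaks 1 and 2
check_fragments(con, ums, peakid = c(1L, 2L), masserror = 5, minerror = 0.001)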


check_isotopedist R Documentation

Compare Isotopic Pattern to simulated pattern

Description

Calculates the isotopic distribution of the stated elemental formula and compares it against the empirical mass spectrum.

Usage

check_isotopedist(
  ms,
  elementalformula,
  exactmasschart,
  error,
  minerror = 0.002,
  remove.elements = c(),
  max.dist = 3,
  min.int = 0.001,
  charge = "neutral",
  m = 1,
  n = 0.5
)

Arguments

ms

data.frame mass spectrum containing pair-wise m/z and intensity values of empirical isotopic pattern

elementalformula

character string of elemental formula to simulate isotopic pattern

exactmasschart

exact mass chart

error

numeric relative mass error (in ppm) of mass spectrometer

minerror

numeric minimum mass error (in Da) of mass spectrometer

remove.elements

character vector of elements to remove from elemental formula

max.dist

numeric maximum mass distance (in Da) from exact mass to include in simulated isotopic pattern

min.int

numeric minimum relative intensity (maximum = 1, minimum = 0) to include in simulated isotopic pattern

charge

character string for the charge state of the simulated isotopic pattern, options are ‘neutral’, ‘positive’, and ‘negative’

m

numeric dot product mass weighting

n

numeric dot product intensity weighting

Value

numeric vector of match scores between the empirical and calculated isotopic distribution.


check_mzML_convert R Documentation

Check mzML file for specific MSConvert parameters

Description

Check mzML file for specific MSConvert parameters

Usage

check_mzML_convert(mzml)

Arguments

mzml

list of msdata from ‘mzMLtoR’ function

Value

data.frame object of conversion veracity checks


clause_where R Documentation

Build a WHERE clause for SQL statements

Description

Properly escaping SQL to prevent injection attacks can be difficult with more complicated queries. This clause constructor is intended to be specific to the WHERE clause of SELECT to UPDATE statements. The majority of construction is achieved with the ‘match_criteria’ parameter, which should always be a list with names for the columns to appear in the WHERE clause. A variety of conveniences are built in, from simple comparisons to more complicated ones including negation and similarity (see the description for argument ‘match_criteria’).

Usage

clause_where(ANSI(), "example", list(foo = "bar", cat = "dog"))
clause_where(ANSI(), "example", list(foo = list(values = "bar", like = TRUE)))
clause_where(ANSI(), "example", list(foo = list(values = "bar", exclude = TRUE)))

Arguments

table_names

CHR vector of tables to search

match_criteria

LIST of matching criteria with names matching columns against which to apply. In the simplest case, a direct value is given to the name (e.g. ‘list(last_name = “Smith”)’) for single matches. All match criteria must be their own list item. Values can also be provided as a nested list for more complicated WHERE clauses with names ‘values’, ‘exclude’, and ‘like’ that will be recognized. ‘values’ should be the actual search criteria, and if a vector of length greater than one is specified, the WHERE clause becomes an IN clause. ‘exclude’ (LGL scalar) determines whether to apply the NOT operator. ‘like’ (LGL scalar) determines whether this is an equality, list, or similarity. To reverse the example above by issuing a NOT statement, use ‘list(last_name = list(values = “Smith”, exclude = TRUE))’, or to look for all records LIKE (or NOT LIKE) “Smith”, set this as ‘list(last_name = list(values = “Smith”, exclude = FALSE, like = TRUE))’

case_sensitive

LGL scalar of whether to match on a case sensitive basis (the default TRUE searches for values as-provided) or whether to coerce value matches by upper, lower, sentence, and title case matches

and_or

LGL scalar of whether to use “AND” or “OR” for multiple criteria, which will be used to combine them all. More complicated WHERE clauses (including a mixture of AND and OR usage) should be built directly. (default: “OR”)

Value

CHR scalar of the constructed where clause for an SQL statement


close_up_shop R Documentation

Conveniently close all database connections

Description

This closes both the plumber service and all database connections from the current running environment. If outstanding promises to database tables or views were created as class ‘tbl_’ (e.g. with ‘tbl(con, “table”)’), set ‘back_up_connected_tbls’ to TRUE to collect data from those and preserve it in place in the current global environment.

Usage

close_up_shop(back_up_connected_tbls = TRUE)

Arguments

back_up_connected_tbls

LGL scalar of whether to clone currently promised tibble connections to database objects as data frames (default: FALSE).

Value

None, modifies the current global environment in place


compare_ms R Documentation

Calculate dot product match score

Description

Calculates the match score (based on dot product) of the two uncertainty mass spectra. Note: this is a static match score and does not include associated uncertainties.

Usage

compare_ms(
  ms1,
  ms2,
  error = c(5, 5),
  minerror = c(0.002, 0.002),
  m = 1,
  n = 0.5
)

Arguments

ms1, ms2

the uncertainty mass spectra from function ‘get_ums’

error

a vector of the respective mass error (in ppm) for each mass spectrum or a single value representing the mass error for all m/z values

minerror

a two component vector of the respective minimum mass error (in Da) for each mass spectrum or a single value representing the minimum mass error of all m/z values

m, n

weighting values for mass (m) and intensity (n)
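
Examples

A sketch assuming "ums1" and "ums2" are uncertainty mass spectra produced by ‘get_ums’.

# Static match score with 5 ppm mass error and a 0.002 Da minimum error
compare_ms(ums1, ums2, error = c(5, 5), minerror = c(0.002, 0.002))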


complete_form_entry R Documentation

Ensure complete form entry

Description

This input validation check ensures the current session’s input object includes non-NA, non-NULL, and non-blank values similarly to [shiny::req] and [shiny::validate] but can be called with a predefined list of input names to check. Typically this is used to validate form entry completion. Call this function prior to reading form entries to ensure that all values requested by name in ‘values’ are present. If they are not, a [nist_shinyalert] modal is displayed prompting the user to complete the form.

Usage

req(complete_form_entry(input, c("need1", "need2")))

Arguments

input

The session input object

values

CHR vector of input object names to require

show_alert

LGL scalar indicating whether or not to show an alert, set FALSE to return the status of the check

Value

Whether or not all required values are present.


create_fallback_build R Documentation

Create an SQL file for use without the SQLite CLI

Description

For cases where the SQLite Command Line Interface is not available, dot commands used to simplify the database build pipeline are not usable. Call this function to create a self-contained SQL build file that can be used in [build_db] to build the database. The self-contained file will include all “CREATE” and “INSERT” statements necessary by parsing lines including “.read” and “.import” commands and directly reading referenced files.

Usage

create_fallback_build(build_file = file.path("config", "build.sql"))

Arguments

build_file

CHR scalar name of the SQL build file to use. The default, NULL, will use the environment variable “DB_BUILD_FILE” if it is available.

populate

LGL scalar of whether to populate data (default: TRUE)

populate_with

CHR scalar name of the SQL population file to use. The default, NULL, will use the environment variable “DB_DATA” if it is available.

driver

CHR scalar of the database driver class to use to correctly interpolate SQL commands (default: “SQLite”)

comments

CHR scalar regex identifying SQLite comments

out_file

CHR scalar of the output file name and destination. The default, NULL, will write to a file named similarly to ‘build_file’ suffixed with “_full”.

Value

None: a file will be written at ‘out_file’ with the output.


create_peak_list R Documentation

Create peak list from SQL ms_data table

Description

The function extracts the relevant information and sorts it into nested lists for use in the uncertainty functions

Usage

create_peak_list(ms_data)

Arguments

ms_data

extraction of the ms_data from the SQL table for a specified peak

Value

nested list of all data


create_peak_table_ms1 R Documentation

Create peak table for MS1 data

Description

Takes a nested peak list and creates a peak table for easier determination of uncertainty of the measurement for MS1 data.

Usage

create_peak_table_ms1(peak, mass, masserror = 5, minerror = 0.002, int0 = NA)

Arguments

mass

the exact mass of the compound of interest

masserror

the mass accuracy (in ppm) of the instrument data

minerror

the minimum mass error (in Da) of the instrument data

int0

the default setting for intensity values for missing m/z values

peak

result of the ‘create_peak_list’ function

Value

nested list of dataframes containing all MS1 data for the peak
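
Examples

A sketch chaining from ‘create_peak_list’; "ms_data" is assumed to be an extraction from the ms_data table, and the exact mass shown is hypothetical.

# Build the nested peak list, then tabulate MS1 data for the first peak
peaks <- create_peak_list(ms_data)
ms1_table <- create_peak_table_ms1(peaks[[1]], mass = 412.9664, masserror = 5)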


create_peak_table_ms2 R Documentation

Create peak table for MS2 data

Description

Takes a nested peak list and creates a peak table for easier determination of uncertainty of the measurement for MS2 data.

Usage

create_peak_table_ms2(peak, mass, masserror = 5, minerror = 0.002, int0 = NA)

Arguments

mass

the exact mass of the compound of interest

masserror

the mass accuracy (in ppm) of the instrument data

minerror

the minimum mass error (in Da) of the instrument data

int0

the default setting for intensity values for missing m/z values

peak

result of the ‘create_peak_list’ function

Value

nested list of dataframes containing all MS2 data for the peak


create_py_env R Documentation

Create a python environment for RDKit

Description

This project offers a full integration of RDKit via [reticulate]. This function does the heavy lifting for setting up that environment, either from an environment specifications file or from the conda forge channel.

Usage

create_py_env("nist_hrms_db", c("reticulate", "rdkit"))

Arguments

env_name

CHR scalar of a python environment

Details

Preferred set up is to set variables in the ‘env_py.R’ file, which will be used over the internal defaults chosen here. The exception is if ‘INSTALL_FROM == “local”’ and no value is provided for ‘INSTALL_FROM_FILE’ which has no internal default.

Germane variables are ‘PYENV_NAME’ (default “reticulated_rdkit”), ‘CONDA_PATH’ (default “auto”), ‘CONDA_MODULES’ (default “rdkit”, “r-reticulate” will be added), ‘INSTALL_FROM’ (default “conda”), ‘INSTALL_FROM_FILE’ (default “rdkit/environment.yml”), ‘MIN_PY_VER’ (default 3.9).

Value

None


create_search_df R Documentation

Create data.frame containing parameters for extraction and searching

Description

Use this to create an intermediate data frame object used as part of the search routine.

Usage

create_search_df(
  filename,
  precursormz,
  rt,
  rt_start,
  rt_end,
  masserror,
  minerror,
  ms2exp,
  isowidth
)

Arguments

filename

CHR scalar path to the mzML file

precursormz

NUM scalar for the mass-to-charge ratio to examine

rt

NUM scalar for the retention time centroid to examine

rt_start

NUM scalar for the retention time start point of the feature

rt_end

NUM scalar for the retention time end point of the feature

masserror

NUM scalar of the instrument mass error value in parts per million

minerror

NUM scalar of the minimum mass error value to use in absolute terms

ms2exp

NUM scalar indicating the type of fragmentation experiment (e.g. MS1 or MS2)

isowidth

NUM scalar mass isolation width to use

Value

data.frame object collating provided values
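
Examples

A sketch with hypothetical values for a single feature of interest.

search_df <- create_search_df(
  filename = "example.mzML",   # hypothetical mzML file
  precursormz = 498.9302,
  rt = 6.8,
  rt_start = 6.6,
  rt_end = 7.0,
  masserror = 5,
  minerror = 0.002,
  ms2exp = 2,
  isowidth = 0.7
)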


create_search_ms R Documentation

Generate uncertainty mass spectrum for MS1 and MS2 data

Description

Generate uncertainty mass spectrum for MS1 and MS2 data

Usage

create_search_ms(
  searchobj,
  correl = NULL,
  ph = NULL,
  freq = NULL,
  normfn = "sum",
  cormethod = "pearson"
)

Arguments

searchobj

list object generated from ‘get_search_object’

correl

correlation limit for ions to MS1

ph

peak height to select scans for generating mass spectrum

freq

observational frequency minimum for ions to use for generating mass spectrum

normfn

normalization function, options are “sum” or “mean”

cormethod

correlation function, default is “pearson”

Value

list object containing the ms1 uncertainty mass spectrum ‘ums1’, ms2 uncertainty mass spectrum ‘ums2’ and respective uncertainty mass spectrum parameters ‘ms1params’ and ‘ms2params’


data_dictionary R Documentation

Create a data dictionary

Description

Get a list of tables and their defined columns with properties, including comments, suitable as a data dictionary from a connection object amenable to [odbc::dbListTables]. This function relies on [pragma_table_info].

Usage

data_dictionary(db_conn = con)

Arguments

db_conn

connection object (default: con)

Value

LIST of length equal to the number of tables in ‘con’ with attributes identifying which tables, if any, failed to render into the dictionary.
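
Examples

A sketch assuming "con" is a live connection to a DIMSpec database; "compounds" is one of its tables.

dict <- data_dictionary(con)
names(dict)       # one element per table
dict$compounds    # column definitions and comments for that table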


dataframe_match R Documentation

Match multiple values in a database table

Description

Complex queries are sometimes necessary to match against multiple varied conditions across multiple items in a list or data frame. Call this function to apply vectorization to all items in ‘match_criteria’, create a fully qualified SQL expression using [clause_where], and execute that query against the database connection in ‘db_conn’. Speed is not optimized during the call to [clause_where], as each clause is built independently and joined together with “OR” statements.

Usage

dataframe_match(
  match_criteria,
  table_names,
  and_or = "AND",
  db_conn = con,
  log_ns = "db"
)

Arguments

match_criteria

LIST of matching criteria with names matching columns against which to apply. In the simplest case, a direct value is given to the name (e.g. ‘list(last_name = “Smith”)’) for single matches. All match criteria must be their own list item. Values can also be provided as a nested list for more complicated WHERE clauses with names ‘values’, ‘exclude’, and ‘like’ that will be recognized. ‘values’ should be the actual search criteria, and if a vector of length greater than one is specified, the WHERE clause becomes an IN clause. ‘exclude’ (LGL scalar) determines whether to apply the NOT operator. ‘like’ (LGL scalar) determines whether this is an equality, list, or similarity. To reverse the example above by issuing a NOT statement, use ‘list(last_name = list(values = “Smith”, exclude = TRUE))’, or to look for all records LIKE (or NOT LIKE) “Smith”, set this as ‘list(last_name = list(values = “Smith”, exclude = FALSE, like = TRUE))’

table_names

CHR vector of tables to search

and_or

CHR scalar, either “AND” or “OR”, used to combine multiple criteria. More complicated WHERE clauses (including a mixture of AND and OR usage) should be built directly. (default: “AND”)

db_conn

connection object (default: con)

log_ns

CHR scalar of the logging namespace to use during execution (default: “db”)

Details

This is intended for use with a data frame object

Value

data.frame of the matching database rows
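
Examples

## A minimal sketch (assumes an open project connection 'con'; the table
## and column names are illustrative):
dataframe_match(
  match_criteria = list(last_name = list(values = "Smith", like = TRUE)),
  table_names = "contributors",
  db_conn = con
)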


dotprod R Documentation

Calculate dot product

Description

Internal function: calculates the dot product between paired m/z and intensity values

Usage

dotprod(m1, i1, m2, i2, m = 1, n = 0.5)

Arguments

m1, m2

paired vectors containing measured m/z values

i1, i2

paired vectors containing measured intensity values

m, n

weighting values for mass (m) and intensity (n)
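
Examples

## Illustration only: a conventional weighted spectral dot product.
## The weighting w = m^m * i^n shown here is an assumption; the exact
## form used internally is not given in this documentation.
m1 <- c(100.0, 150.0, 200.0); i1 <- c(1000, 500, 250)
m2 <- c(100.0, 150.0, 200.0); i2 <- c(900, 600, 300)
m <- 1; n <- 0.5
w1 <- m1^m * i1^n
w2 <- m2^m * i2^n
sum(w1 * w2)^2 / (sum(w1^2) * sum(w2^2))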


dt_color_by R Documentation

Apply colors to DT objects by value in a column

Description

Adds a class to each node meeting the criteria defined elsewhere as project object ‘table_bg_classes’, a list of colors with names matching values.

Usage

dt_color_by(table_names, look_for)

Arguments

table_names

CHR vector of the names going into a table

look_for

CHR vector of the column name to color by

Value

JS function to apply to a DT object by row


dt_formatted R Documentation

Easily format multiple DT objects in a shiny project in the same manner

Description

This serves solely to reduce the number of options fed into ‘DT::datatable’ by providing common defaults and transparent options. Parameters largely do exactly what they say and will create a list ‘column_defs’ suitable for use as ‘datatable(..., options = list(columnDefs = column_defs))’. Leave any aspect NULL to ignore it.

Usage

dt_formatted(
  dataframe,
  show_rownames = FALSE,
  hide_cols = NULL,
  center_cols = NULL,
  narrow_cols = NULL,
  narrow_col_width = "5%",
  medium_cols = NULL,
  medium_col_width = "10%",
  large_cols = NULL,
  large_col_width = "15%",
  truncate_cols = NULL,
  truncate_width = 20,
  date_cols = NULL,
  date_col_width = "10%",
  selection_mode = "single",
  callback = NULL,
  color_by_column = NULL,
  names_to = "title",
  filter_at = "top",
  chr_to_factor = TRUE,
  page_length = 10,
  page_length_menu = c(10, 25, 50),
  ...
)

Arguments

dataframe

data.frame to be converted to a DT::datatable object

hide_cols

CHR vector of column names to hide

center_cols

CHR vector of column names to center

narrow_cols

CHR vector of column names to make ‘narrow_col_width’ wide

narrow_col_width

CHR scalar defining column width (default: “5%”)

medium_cols

CHR vector of column names to make ‘medium_col_width’ wide

medium_col_width

CHR scalar defining column width (default: “10%”)

large_cols

CHR vector of column names to make ‘large_col_width’ wide

large_col_width

CHR scalar defining column width (default: “15%”)

truncate_cols

CHR vector of column names to truncate

truncate_width

INT scalar of the position at which to truncate

date_cols

CHR vector of column names identifying dates

date_col_width

CHR scalar defining column width (default: “10%”)

selection_mode

CHR scalar of the DT selection mode (default: “single”)

callback

JS custom callback to apply to the datatable widget

color_by_column

CHR scalar of the column name by which to color rows

names_to

CHR scalar of the name formatting modification to apply, as one of the options available in the ‘stringr’ package (default: “title” to apply ‘stringr::str_to_title’)

filter_at

CHR scalar of the position for the column filter as understood by ‘DT::datatable(…, filter = filter_at)’. (default: “top”)

chr_to_factor

BOOL scalar for whether or not to automatically convert character columns to factor columns (default: TRUE)

…

other named arguments to be passed to ‘DT::datatable’

Value

DT::datatable object formatted as requested

Note

Truncation applies a JS function to retain the underlying information as a hover tooltip and truncates using ellipses.

Column name formatting relies on being able to parse ‘names_to’ as a valid function of the form ‘sprintf(“str_to_%s”, names_to)’; recognized options include “lower”, “upper”, “title”, and “sentence”.

To apply a custom format, define these parameters as a list (e.g. “dt_format_options”) and pass it, along with your data frame, as ‘do.call(“dt_formatted”, c(list(dataframe = df), dt_format_options))’
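
Examples

## The custom-format pattern from the note above (assumes 'df' is any
## data.frame; the option values are illustrative):
df <- data.frame(a = 1:3, b = letters[1:3])
dt_format_options <- list(center_cols = "a", page_length = 25)
do.call("dt_formatted", c(list(dataframe = df), dt_format_options))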


er_map R Documentation

Create a simple entity relationship map

Description

This will poll the database connection and create an entity relationship map as a list directly from the defined SQL statements used to build each table or view. For each table object it returns a list of length three containing the entity names that the table (1) ‘references’ (i.e. has a foreign key to), (2) is ‘referenced_by’ (i.e. is a foreign key for), and (3) ‘used_in_view’ (views where it appears). All values are entity names. This is intended as a mapping shortcut when ER diagrams are unavailable, or for quick relationship reference within a project, much like a data dictionary.

Usage

er_map(db_conn = con)

Arguments

db_conn

connection object, specifically of class “SQLiteConnection” but not strictly enforced

Details

SQL is generated from [pragma_table_def()] with argument ‘get_sql’ = TRUE and ignores entities whose names start with “sqlite”.

Value

nested LIST object describing the database entity connections
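
Examples

## Build the map and inspect one table's relationships (assumes an open
## project connection 'con'; the table name is illustrative):
db_map <- er_map(db_conn = con)
db_map[["compounds"]]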


export_msp R Documentation

Export to MSP

Description

The function exports an uncertainty mass spectrum into a NIST MS Search .msp file

Usage

export_msp(
  ms,
  file,
  precursor = "",
  name = "Exported Mass Spectrum",
  headerdata = c(),
  append = FALSE
)

Arguments

ms

uncertainty mass spectrum from ‘get_ums’ function

file

CHR scalar path of the .msp file to which the mass spectrum will be saved

precursor

If available, the numeric precursor m/z for the designated mass spectrum

name

Text name to assign to the mass spectrum (not used in spectral searching)

headerdata

named CHR vector of additional values to put in the header

append

boolean (TRUE/FALSE) to append to .msp file (TRUE) or overwrite (FALSE)
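
Examples

## A usage sketch (assumes 'ums' was produced by the 'get_ums' function;
## the precursor m/z is illustrative):
export_msp(ums, file = "exported_spectrum.msp", precursor = 498.93, append = FALSE)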


extend_suspect_list R Documentation

Extend the compounds and aliases tables

Description

Suspect lists are occasionally updated. To keep the current database up to date, run this function by pointing it to the updated or current suspect list. That suspect list should be one of (1) a file in either comma-separated-value (CSV) or a Microsoft Excel format (XLS or XLSX), (2) a data frame containing the new compounds in the standard format of the suspect list, or (3) a URL pointing to the suspect list.

Usage

extend_suspect_list(suspect_list, db_conn = con, retain_current = TRUE)

Arguments

suspect_list

CHR scalar path to a file (CSV, XLS, or XLSX) or a URL pointing to an XLSX file.

db_conn

connection object (default: con)

retain_current

LGL scalar of whether to retain the current list by attempting to match new entries to older ones, or to append all entries (default: TRUE)

Details

If ‘suspect_list’ does not contain one of the expected file extensions, it will be assumed to be a URL pointing to a Microsoft Excel file with the suspect list in the first spreadsheet. The file for that URL will be downloaded temporarily, read in as a data frame, and then removed.

Required columns for the compounds table are first pulled and all other columns are treated as aliases. If ‘retain_current’ is TRUE, entries in the “name” column will be matched against current aliases and the compound id will be persisted for that compound.

Value

None
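
Examples

## Point at a local file or a URL (the path is illustrative; assumes an
## open project connection 'con'):
extend_suspect_list("suspectlist.xlsx", db_conn = con)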


extract.elements R Documentation

Elemental Formula Functions: extract elements from a formula

Description

Converts elemental formula into list of ‘elements’ and ‘counts’ corresponding to the composition

Usage

extract.elements(composition.str, remove.elements = c())

Arguments

composition.str

character string elemental formula

remove.elements

character vector containing elements to remove from the formula

Value

list with ‘elements’ and ‘counts’

Examples

extract.elements("C2H5O")

extract.elements("C2H5ONa", remove.elements = c("Na", "Cl"))

flush_dir R Documentation

Flush a directory with archive

Description

Clear a directory and archive those files if desired, in any directory matching any pattern.

Usage

flush_dir("logs", ".txt")

flush_dir(directory = "logs")

Arguments

archive

LGL scalar on whether to archive current logs

directory

CHR scalar path to the directory to flush

Value

None; removes files from the directory, archiving them first if requested.


fn_guide R Documentation

View an index of help documentation in your browser

Description

View an index of help documentation in your browser

Usage

fn_guide()

Value

None


fn_help R Documentation

Get function documentation for this project

Description

This function is analogous to “?”, “??”, and “help”. For now, this effort is distributed as a project instead of a package. This imposes certain limitations, particularly regarding function documentation. Use this function to see the documentation for functions in this project just as you would any installed package. The other limitation is that these help files will not populate directly as a pop up when using RStudio tab completion.

Usage

fn_help(fn_name)

Arguments

fn_name

Object or CHR string name of a function in this project.

Value

None, opens help file.

Note

This function will be deprecated if the project is moved to a package.

Examples

fn_help(fn_help)

format_html_id R Documentation

Format a file name as an HTML element ID

Description

This is often useful to provide feedback to the user about the files they’ve provided to a shiny application in a more informative manner, as IDs produced here are suitable to build dynamic UI around. This can serve as the base ID for tooltips, additional information, icons, etc. and produce everything necessary in one place for any number of files.

Usage

format_html_id(filename)

Arguments

filename

CHR vector of file names

Value

CHR vector of the same size as filename

Examples

format_html_id(list.files())


format_list_of_names R Documentation

Grammatically collapse a list of values

Description

Given a vector of arbitrary length that coerces properly to a human-readable character string, return it formatted as one of: “one”, “one and two”, or “one, two, …, and three” using glue::glue. This is functionally the same as a static version of [glue::glue_collapse] with parameters sep = “,”, width = Inf, and last = “, and”.

Usage

format_list_of_names(namelist, add_quotes = FALSE)

Arguments

namelist

vector of values to format

add_quotes

LGL scalar of whether to enclose individual values in quotation marks

Value

CHR vector of length one

Examples

format_list_of_names("test")
format_list_of_names(c("apples", "bananas"))
format_list_of_names(c(1:3))
format_list_of_names(seq.Date(Sys.Date(), Sys.Date() + 3, by = 1))

formulalize R Documentation

Generate standard chemical formula notation

Description

Generate standard chemical formula notation

Usage

formulalize(formula)

Arguments

formula

CHR string of an elemental formula

Value

string with a standard ordered formula

Examples


formula <- "C10H15S1O3"
formulalize(formula)

full_import R Documentation

Import one or more files from the NIST Method Reporting Tool for NTA

Description

This function serves as a single entry point for data imports. It is predicated upon the NIST import routine defined here and relies on several assumptions. It is intended ONLY as an interactive means of importing any number of data files from the NIST Method Reporting Tool for NTA (MRT NTA).

Usage

full_import(
  import_object = NULL,
  file_name = NULL,
  db_conn = con,
  exclude_missing_required = FALSE,
  stop_if_missing_required = TRUE,
  include_if_missing_recommended = FALSE,
  stop_if_missing_recommended = TRUE,
  ignore_extra = TRUE,
  ignore_insert_conflicts = TRUE,
  requirements_obj = "import_requirements",
  method_in = "massspectrometry",
  ms_methods_table = "ms_methods",
  instrument_properties_table = "instrument_properties",
  sample_info_in = "sample",
  sample_table = "samples",
  contributor_in = "data_generator",
  contributors_table = "contributors",
  sample_aliases = NULL,
  generation_type = NULL,
  generation_type_norm_table = ref_table_from_map(sample_table, "generation_type"),
  mass_spec_in = "massspectrometry",
  chrom_spec_in = "chromatography",
  mobile_phases_in = "chromatography",
  qc_method_in = "qcmethod",
  qc_method_table = "qc_methods",
  qc_method_norm_table = ref_table_from_map(qc_method_table, "name"),
  qc_references_in = "source",
  qc_data_in = "qc",
  qc_data_table = "qc_data",
  carrier_mix_names = NULL,
  id_mix_by = "^mp*[0-9]+",
  mix_collection_table = "carrier_mix_collections",
  mobile_phase_props = list(in_item = "chromatography", db_table = "mobile_phases", props
    = c(flow = "flow", flow_units = "flowunits", duration = "duration", duration_units =
    "durationunits")),
  carrier_props = list(db_table = "carrier_mixes", norm_by =
    ref_table_from_map("carrier_mixes", "component"), alias_in = "carrier_aliases", props
    = c(id_by = "solvent", fraction_by = "fraction")),
  additive_props = list(db_table = "carrier_additives", norm_by =
    ref_table_from_map("carrier_additives", "component"), alias_in = "additive_aliases",
    props = c(id_by = "add$", amount_by = "_amount", units_by = "_units")),
  exclude_values = c("none", "", NA),
  peaks_in = "peak",
  peaks_table = "peaks",
  software_timestamp = NULL,
  software_settings_in = "msconvertsettings",
  ms_data_in = "msdata",
  ms_data_table = "ms_data",
  unpack_spectra = FALSE,
  unpack_format = c("separated", "zipped"),
  ms_spectra_table = "ms_spectra",
  linkage_table = "conversion_software_peaks_linkage",
  settings_table = "conversion_software_settings",
  as_date_format = "%Y-%m-%d %H:%M:%S",
  format_checks = c("ymd_HMS", "ydm_HMS", "mdy_HMS", "dmy_HMS"),
  min_datetime = "2000-01-01 00:00:00",
  fragments_in = "annotation",
  fragments_table = "annotated_fragments",
  fragments_sources_table = "fragment_sources",
  fragments_norm_table = "norm_fragments",
  citation_info_in = "fragment_citation",
  inspection_info_in = "fragment_inspections",
  inspection_table = "fragment_inspections",
  generate_missing_aliases = TRUE,
  fragment_aliases_in = "fragment_aliases",
  fragment_aliases_table = "fragment_aliases",
  fragment_alias_type_norm_table = ref_table_from_map(fragment_aliases_table,
    "alias_type"),
  inchi_prefix = "InChI=1S/",
  rdkit_ref = ifelse(exists("PYENV_REF"), PYENV_REF, "rdk"),
  rdkit_ns = "rdk",
  rdkit_make_if_not = TRUE,
  rdkit_aliases = c("inchi", "inchikey"),
  mol_to_prefix = "MolTo",
  mol_from_prefix = "MolFrom",
  type = "smiles",
  compounds_in = "compounddata",
  compounds_table = "compounds",
  compound_category = NULL,
  compound_category_table = "compound_categories",
  compound_aliases_in = "compound_aliases",
  compound_aliases_table = "compound_aliases",
  compound_alias_type_norm_table = ref_table_from_map(compound_aliases_table,
    "alias_type"),
  fuzzy = FALSE,
  case_sensitive = TRUE,
  ensure_unique = TRUE,
  require_all = FALSE,
  import_map = IMPORT_MAP,
  log_ns = "db"
)

Arguments

import_object

nested LIST object of JSON data to import; this import routine was built around output from the NTA MRT (default: NULL); note that you may supply either ‘import_object’ or ‘file_name’

file_name

external file in JSON format of data to import; this import routine was built around output from the NTA MRT (default: NULL); note that you may supply either ‘import_object’ or ‘file_name’

db_conn

connection object (default: con)

exclude_missing_required

LGL scalar of whether or not to skip imports missing required information (default: FALSE); if set to TRUE, this will override the setting for ‘stop_if_missing_required’ and the import will continue with logging messages for which files were incomplete

stop_if_missing_required

LGL scalar of whether or not to stop the import routine if a file is missing required information (default: TRUE)

include_if_missing_recommended

LGL scalar of whether or not to include imports missing recommended information (default: FALSE)

stop_if_missing_recommended

LGL scalar of whether or not to stop the import routine if a file is missing recommended information (default: TRUE)

ignore_extra

LGL scalar of whether to ignore extraneous import elements or stop the import process (default: TRUE)

ignore_insert_conflicts

LGL scalar of whether to ignore insert conflicts during the qc methods and qc data import steps (default: TRUE)

requirements_obj

CHR scalar of the name of an R object holding import requirements; this is a convenience shorthand to prevent multiple imports from parameter ‘file_name’ (default: “import_requirements”)

method_in

CHR scalar name of the ‘obj’ list containing method information

ms_methods_table

CHR scalar name of the database table containing method information

instrument_properties_table

CHR scalar name of the database table holding instrument property information for a given method (default: “instrument_properties”)

sample_info_in

CHR scalar name of the element within ‘import_object’ containing samples information

sample_table

CHR scalar name of the database table holding sample information (default: “samples”)

contributor_in

CHR scalar name of the element within ‘import_object[[sample_info_in]]’ containing contributor information (default: “data_generator”)

contributors_table

CHR scalar name of the database table holding contributor information (default: “contributors”)

sample_aliases

named CHR vector of aliases with names matching the alias, and values of the alias reference, e.g. c(“ACU1234” = “NIST Biorepository GUAID”), which can be virtually any reference text; it is recommended that the reference be to a resolver service if connecting with external data sources (default: NULL)

generation_type

CHR scalar of the type of data generated for this sample (e.g. “empirical” or “in silico”). The default (NULL) will assign based on ‘generation_type_default’; any other value will override the default value and be checked against values in ‘generation_type_norm_table’

generation_type_norm_table

CHR scalar name of the database table normalizing sample generation type (default: the result of ref_table_from_map(sample_table, “generation_type”))

mass_spec_in

CHR scalar name of the element in ‘obj’ holding mass spectrometry information (default: “massspectrometry”)

chrom_spec_in

CHR scalar name of the element in ‘obj’ holding chromatographic information (default: “chromatography”)

mobile_phases_in

CHR scalar name of the database table holding mobile phase and chromatographic information (default: “chromatography”)

qc_method_in

CHR scalar name of the import object element containing QC method information (default: “qcmethod”)

qc_method_table

CHR scalar of the database table name holding QC method check information (default: “qc_methods”)

qc_method_norm_table

CHR scalar name of the database table normalizing QC methods type (default: “norm_qc_methods_name”)

qc_references_in

CHR scalar of the name in ‘obj[[qc_method_in]]’ that contains the reference or citation for the QC protocol (default: “source”)

carrier_mix_names

CHR vector (optional) of carrier mix collection names to assign, the length of which should equal 1 or the length of discrete carrier mixtures; the default, NULL, will automatically assign names as a function of the method and sample id.

id_mix_by

regex CHR to identify mobile phase mixtures (default: “^mp*[0-9]+” matches the generated mixture names)

mix_collection_table

CHR scalar name of the mix collections table (default: “carrier_mix_collections”)

mobile_phase_props

LIST object describing how to import the mobile phase table, containing: in_item: CHR scalar name of the ‘obj’ element containing chromatographic information (default: “chromatography”); db_table: CHR scalar name of the mobile phases table (default: “mobile_phases”); props: named CHR vector of name mappings with names equal to database columns in ‘mobile_phase_props$db_table’ and values giving regex to match names in ‘obj[[mobile_phase_props$in_item]]’

carrier_props

LIST object describing how to import the carrier mixes table, containing: db_table: CHR scalar name of the carrier mixes table (default: “carrier_mixes”); norm_by: CHR scalar name of the table used to normalize carriers (default: the result of ref_table_from_map(“carrier_mixes”, “component”)); alias_in: CHR scalar name of the table containing carrier aliases to search (default: “carrier_aliases”); props: named CHR vector of name mappings with names equal to database columns in ‘carrier_props$db_table’ and values giving regex to match names in ‘obj[[mobile_phase_props$in_item]]’, including an element named ‘id_by’ containing regex used to match names in the import object that indicate a carrier (e.g. “solvent”)

additive_props

LIST object describing how to import the carrier additives table, containing: db_table: CHR scalar name of the carrier additives table (default: “carrier_additives”); norm_by: CHR scalar name of the table used to normalize additives (default: the result of ref_table_from_map(“carrier_additives”, “component”)); alias_in: CHR scalar name of the table containing additive aliases to search (default: “additive_aliases”); props: named CHR vector of name mappings with names equal to database columns in ‘additive_props$db_table’ and values giving regex to match names in ‘obj[[mobile_phase_props$in_item]]’, including an element named ‘id_by’ containing regex used to match names in the import object that indicate an additive (e.g. “add$”)

exclude_values

CHR vector indicating which values to ignore in ‘obj’ (default: c(“none”, “”, NA))

peaks_in

CHR scalar name of the element within ‘import_object’ containing peak information

peaks_table

CHR scalar name of the database table holding peak information (default: “peaks”)

ms_data_in

CHR scalar of the named component of ‘obj’ holding mass spectral data (default: “msdata”)

ms_data_table

CHR scalar name of the table holding packed spectra in the database (default: “ms_data”)

unpack_spectra

LGL scalar indicating whether or not to unpack spectral data to a long format (i.e. all masses and intensities will become a single record) in the table defined by ‘ms_spectra_table’ (default: FALSE)

unpack_format

CHR scalar of the type of data packing for the spectra, one of “separated” (default) or “zipped”

ms_spectra_table

CHR scalar name of the table holding long form spectra in the database (default: “ms_spectra”)

fragments_in

CHR scalar name of the ‘obj’ component holding annotated fragment information (default: “annotation”)

fragments_table

CHR scalar name of the database table holding annotated fragment information (default: “annotated_fragments”)

fragments_sources_table

CHR scalar name of the database table holding fragment source (e.g. generation) information (default: “fragment_sources”)

fragments_norm_table

CHR scalar name of the database table holding normalized fragment identities (default: obtains this from the result of a call to [er_map] with the table name from ‘fragments_table’)

citation_info_in

CHR scalar name of the ‘obj’ component holding fragment citation information (default: “fragment_citation”)

inspection_info_in

CHR scalar name of the ‘obj’ component holding fragment inspection information (default: “fragment_inspections”)

inspection_table

CHR scalar name of the database table holding fragment inspection information (default: “fragment_inspections”)

generate_missing_aliases

LGL scalar determining whether or not to generate machine readable expressions (e.g. InChI) for fragment aliases from RDKit (requires RDKit activation; default: TRUE); see the formals list for [add_rdkit_aliases]

fragment_aliases_in

CHR scalar name of the ‘obj’ component holding fragment aliases (default: “fragment_aliases”)

fragment_aliases_table

CHR scalar name of the database table holding fragment aliases (default: “fragment_aliases”)

fragment_alias_type_norm_table

CHR scalar name of the alias reference normalization table, by default the return of ref_table_from_map(fragment_aliases_table, “alias_type”)

rdkit_ref

CHR scalar OR R object of an RDKit binding (default: PYENV_REF if it exists, otherwise “rdk”, for convenience with other pipelines in this project)

mol_to_prefix

CHR scalar of the prefix to identify an RDKit function to create an alias from a mol object (default: “MolTo”)

mol_from_prefix

CHR scalar of the prefix to identify an RDKit function to create a mol object from ‘identifiers’ (default: “MolFrom”)

type

The type of chemical structure notation (default: “smiles”)

compounds_in

CHR scalar name in ‘obj’ holding compound data (default: “compounddata”)

compounds_table

CHR scalar name of the database table holding compound data (default: “compounds”)

compound_category

CHR or INT scalar of the compound category (either a direct ID or a matching category label in ‘compound_category_table’) (default: NULL)

compound_category_table

CHR scalar name of the database table holding normalized compound categories (default: “compound_categories”)

compound_aliases_in

CHR scalar name of where compound aliases are located within the import (default: “compound_aliases”), passed to [resolve_compounds] as “norm_alias_table”

compound_aliases_table

CHR scalar name of the alias reference table to use when assigning compound aliases (default: “compound_aliases”) passed to [resolve_compounds] as “compounds_table”

compound_alias_type_norm_table

CHR scalar name of the alias reference normalization table, by default the return of ref_table_from_map(compound_aliases_table, “alias_type”)

fuzzy

LGL scalar of whether to do a “fuzzy” match in the sense that values provided are wrapped in an SQL “LIKE '%value%'” expression, overriding the ‘case_sensitive’ setting if TRUE (default: FALSE).

case_sensitive

LGL scalar of whether to match on a case sensitive basis (the default TRUE searches for values as-provided) or whether to coerce value matches by upper, lower, sentence, and title case matches

ensure_unique

LGL scalar of whether or not to first check that the values provided form a new unique record (default: TRUE)

require_all

LGL scalar of whether to require all columns (except the assumed primary key column of “id”) or only those defined as “NOT NULL” (default: FALSE; TRUE requires the presence of all columns in the table)

import_map

data.frame object of the import map (e.g. from a CSV)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

Details

Import files should be in JSON format as created by the MRT NTA. Examples are provided in the “example” directory of the project.

Defaults for this release are set throughout as of the latest database schema, but left here as arguments in case those should change, or slight changes are made to column and table names.

Value

Console logging if enabled and interactive prompts when user intervention is required. There is no formal return as it executes database actions.

Note

Many calls within this function are executed as do.call with a filtered argument list based on the names of formals for the called function. Several arguments to those functions are also left as the defaults set there; names must match exactly to be passed in this manner. See the list of inherited parameters.
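
Examples

## A minimal interactive sketch using the example file shipped in the
## project's "example" directory (all other arguments keep their defaults):
full_import(file_name = "example/PFAC30PAR_PFCA1_mzML_cmpd2627.JSON")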


gather_qc R Documentation

Quality Control Check of Import Data

Description

Performs the quality control check on the imported data from the peak gather function.

Usage

gather_qc(
  gather_peak,
  exactmasses,
  exactmasschart,
  ms1range = c(0.5, 3),
  ms1isomatchlimit = 0.5,
  minerror = 0.002,
  max_correl = 0.8,
  correl_bin = 0.1,
  max_ph = 10,
  ph_bin = 1,
  max_freq = 10,
  freq_bin = 1,
  min_n_peaks = 3,
  cormethod = "pearson"
)

Arguments

gather_peak

peak object generated from ‘peak_gather_json’ function

exactmasses

exactmasses list

ms1range

2-component vector stating the range over which to evaluate the isotopic pattern of the precursor ion, from mass - ms1range[1] to mass + ms1range[2]

ms1isomatchlimit

the reverse dot product minimum score for the isotopic pattern match

minerror

the minimum mass error (in Da) allowable for the instrument

max_correl

[TODO PLACEHOLDER]

correl_bin

[TODO PLACEHOLDER]

max_ph

[TODO PLACEHOLDER]

ph_bin

[TODO PLACEHOLDER]

max_freq

[TODO PLACEHOLDER]

freq_bin

[TODO PLACEHOLDER]

min_n_peaks

[TODO PLACEHOLDER]

cormethod

[TODO PLACEHOLDER]

Value

nested list of quality control check results


get_annotated_fragments R Documentation

Get all annotated fragments that have matching masses

Description

Get all annotated fragments that have matching masses

Usage

get_annotated_fragments(con, fragmentions, masserror, minerror)

Arguments

con

SQLite database connection

fragmentions

numeric vector containing m/z values for fragments to search

masserror

numeric relative mass error (ppm)

minerror

numeric minimum mass error (Da)

Value

data.frame of mass spectral data
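
Examples

## Search for fragments within 5 ppm (floor of 0.002 Da) of two
## illustrative m/z values (assumes an open connection 'con'):
get_annotated_fragments(con, c(68.9958, 118.9926), masserror = 5, minerror = 0.002)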


get_component R Documentation

Resolve components from a list or named vector

Description

Call this to pull a component named obj_component from a list or named vector provided as obj and optionally use [tack_on] to append to it. This is intended to ease the process of pulling specific components from a list for further treatment in the import process by isolating that component.

Usage

get_component(obj, obj_component, silence = TRUE, log_ns = "global", ...)

Arguments

obj

LIST or NAMED vector in which to find obj_component

obj_component

CHR vector of named elements to find in obj

silence

LGL scalar indicating whether to silence recursive messages, which may be the same for each element of obj (default: TRUE)

log_ns

CHR scalar of the logging namespace to use (default: “global”)

…

Additional arguments passed to/from the ellipsis parameter of calling functions. If named, names are preserved.

Details

This is similar in scope to [purrr::pluck] in many regards, but always returns items with names, and will search an entire list structure, including data frames, to return all values associated with that name in individual elements.

Value

LIST object containing the elements of obj

Note

This is a recursive function.

If ellipsis arguments are provided, they will be appended to each identified component via [tack_on]. Use with caution, but this can be useful for appending common data to an entire list (e.g. a datetime stamp for logging processing time or a processor name, human or software).

Examples

get_component(list(a = letters, b = 1:10), "a")
get_component(list(ex = list(a = letters, b = 1:10), ex2 = list(c = 1:5, a = LETTERS)), "a")
get_component(list(a = letters, b = 1:10), "a", c = 1:5)


get_compound_fragments R Documentation

Get all fragments associated with compounds

Description

Get all fragments associated with compounds

Usage

get_compound_fragments(con, fragmentions, masserror, minerror)

Arguments

con

SQLite database connection

fragmentions

numeric vector containing m/z values for fragments to search

masserror

numeric relative mass error (ppm)

minerror

numeric minimum mass error (Da)

Value

data.frame object describing known fragments in the database with known compound and peak references attached


get_compoundid R Documentation

Get compound ID and name for specific peaks

Description

Get compound ID and name for specific peaks

Usage

get_compoundid(con, peakid)

Arguments

con

SQLite database connection

peakid

integer vector of primary keys for peaks table

Value

table of compound IDs and names


get_fkpk_relationships R Documentation

Extract foreign key relationships from a schema

Description

This convenience function is part of the automatic generation of SQL commands building views and triggers from a defined schema. Its sole purpose is as a pre-pass extraction of foreign key relationships between tables from an object created by [db_map], which in turn relies on specific formatting in the schema SQL definitions.

Usage

get_fkpk_relationships(er_map(db_conn = con))

Arguments

db_map

LIST object containing descriptions of table mapping in an opinionated manner, generally generated by [er_map]. The expectation is a list of tables, with references in SQL form enumerated in a child element with a name matching ‘references_in’

references_in

CHR scalar naming the child element containing SQL references statements of the form “fk_column REFERENCES table(pk_column)” (default: “references” is provided by [er_map])

dictionary

LIST object containing the schema dictionary produced by [data_dictionary] fully describing table entities

Value

LIST of data frames with one element for each table with a foreign key defined

Note

This only functions for list objects formatted correctly. That is, each entry in [db_map] must contain an element with a name matching that provided to ‘references_in’ which contains a character vector formatted as “table1 REFERENCES table2(pk_column)”.


get_massadj R Documentation

Calculate the mass adjustment for a specific adduct

Description

Calculate the mass adjustment for a specific adduct

Usage

get_massadj(adduct = "+H", exactmasses = NULL, db_conn = "con")

Arguments

adduct

character string containing the + or - and the elemental formula of the adduct, note “2H” should be represented as “H2”

exactmasses

list of exact masses of elements, NULL pulls from the database

db_conn

database connection object, either a CHR scalar name (default: “con”) or the connection object itself (preferred)

Value

NUM scalar of the mass adjustment value
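
Examples

## A usage sketch; exact masses are pulled from the database when
## 'exactmasses' is NULL (assumes an open connection 'con'):
get_massadj(adduct = "-H", db_conn = con)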


get_msconvert_data R Documentation

Extract msconvert metadata

Description

Extracts relevant ProteoWizard MSConvert metadata from an mzML file. Used by the ‘peak_gather_json’ function.

Usage

get_msconvert_data(mzml)

Arguments

mzml

list of msdata from ‘mzMLtoR’ function

Value

list of msconvert parameters


get_msdata R Documentation

Get all mass spectral data within the database

Description

Get all mass spectral data within the database

Usage

get_msdata(con)

Arguments

con

SQLite database connection

Value

data.frame of mass spectral data


get_msdata_compound R Documentation

Get all mass spectral data for a specific compound

Description

Get all mass spectral data for a specific compound

Usage

get_msdata_compound(con, compoundid)

Arguments

con

SQLite database connection

compoundid

integer compound ID value

Value

data.frame of mass spectral data


get_msdata_peakid R Documentation

Get all mass spectral data for a specific peak id

Description

Get all mass spectral data for a specific peak id

Usage

get_msdata_peakid(con, peakid)

Arguments

con

SQLite database connection

peakid

integer vector of peak ids

Value

data.frame of mass spectral data


get_msdata_precursors R Documentation

Get all mass spectral data with a specific precursor ion

Description

Get all mass spectral data with a specific precursor ion

Usage

get_msdata_precursors(con, precursorion, masserror, minerror)

Arguments

con

SQLite database connection

precursorion

numeric precursor ion m/z value

masserror

numeric relative mass error (ppm)

minerror

numeric minimum mass error (Da)

Value

data.frame of mass spectral data


get_opt_params R Documentation

Get optimized uncertainty mass spectra parameters for a peak

Description

Get optimized uncertainty mass spectra parameters for a peak

Usage

get_opt_params(con, peak_ids)

Arguments

con

SQLite database connection

peak_ids

integer vector of primary keys for peaks table

Value

data.frame object of available optimized search parameters


get_peak_fragments R Documentation

Get annotated fragments for a specific peak

Description

Get annotated fragments for a specific peak

Usage

get_peak_fragments(con, peakid)

Arguments

con

SQLite database connection

peakid

integer vector of primary keys for peaks table

Value

data.frame of annotated fragments


get_peak_precursor R Documentation

Get precursor ion m/z for a specific peak

Description

Get precursor ion m/z for a specific peak

Usage

get_peak_precursor(con, peakid)

Arguments

con

SQLite database connection

peakid

integer primary key for peaks table

Value

numeric value of precursor ion m/z value


get_sample_class R Documentation

Get sample class information for specific peaks

Description

Get sample class information for specific peaks

Usage

get_sample_class(con, peakid)

Arguments

con

SQLite database connection

peakid

integer vector of primary keys for peaks table

Value

data.frame object of sample classes associated with a given peak


get_search_object R Documentation

Generate msdata object from input peak data

Description

Generate msdata object from input peak data

Usage

get_search_object(searchmzml, zoom = c(1, 4))

Arguments

searchmzml

mzML object with the searching data.frame attached, from the ‘getmzML’ function

zoom

vector of length 2 containing the +/- window around the MS1 precursor ion within which to collect data.

Value

LIST object of data.frames including MS1 and MS2 analytical data, and the search parameters used to generate them
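
Examples

## A sketch (assumes 'smz' is the output of the 'getmzML' function):
search_obj <- get_search_object(smz, zoom = c(1, 4))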


get_suspectlist R Documentation

Get the current NIST PFAS suspect list.

Description

Downloads the current NIST suspect list of PFAS from the NIST Public Data Repository to the current project directory.

Usage

get_suspectlist(
  destfile = file.path("R", "compoundlist", "suspectlist.xlsx"),
  url_file = file.path("config", "suspectlist_url.txt"),
  default_url = SUS_LIST_URL,
  save_local = FALSE
)

Arguments

destfile

CHR scalar file.path of location to save the downloaded file

url_file

CHR scalar file.path of the text file containing the download URL for the NIST PFAS Suspect List

save_local

LGL scalar of whether to retain an R expression in the current environment after download

Value

none

Examples

get_suspectlist()

get_ums R Documentation

Generate consensus mass spectrum

Description

The function calculates the uncertainty mass spectrum for a single peak table based on specific settings described in https://doi.org/10.1021/jasms.0c00423

Usage

get_ums(
  peaktable,
  correl = NULL,
  ph = NULL,
  freq = NULL,
  normfn = "sum",
  cormethod = "pearson"
)

Arguments

peaktable

result of the ‘create_peak_table_ms1’ or ‘create_peak_table_ms2’ function

correl

Minimum correlation coefficient between the target ions and the base ion intensity of the targeted m/z to be included in the mass spectrum

ph

Minimum chromatographic peak height from which to extract MS2 data for the mass spectrum

freq

minimum observational frequency of the target ions to be included in the mass spectrum

normfn

the normalization function, typically “mean” or “sum”, for normalizing the intensity values

cormethod

the correlation method used for calculating the correlation, see ‘cor’ function for methods

Value

nested list of dataframes containing all MS1 and MS2 data for the peak
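
Examples

## A sketch with illustrative thresholds (assumes 'pt' is the result of
## 'create_peak_table_ms1' or 'create_peak_table_ms2'):
get_ums(pt, correl = 0.8, ph = 10, freq = 10, normfn = "sum")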


get_uniques R Documentation

Get unique components of a nested list

Description

There are times when the concept of “samples” and “grouped data” may become intertwined and difficult to parse. The import process is one of those times depending on how the import file is generated. This function takes a nested list and compares a specific aspect of it, grouping the output based on that aspect and returning its characteristics.

Usage

get_uniques(objects, aspect)

Arguments

objects

LIST object

aspect

CHR scalar name of the aspect from which to generate unique combinations

Details

For example, the standard NIST import includes the “sample” aspect, which may be identical for multiple data import files. This provides a unique listing of those sample characteristics to reduce data manipulation and storage, and minimize database “chatter” during read/write. It returns a set of unique characteristics in a list, with an appended “import_object” element giving the index number and object name of entries matching those characteristics.

This is largely superseded by later developments to database operations that first check for a table primary key id given a comprehensive list of column values in those tables where only a single record should contain those values (e.g. a complete unique case, enforced or unenforced).

Value

Unnamed LIST of length equaling the number of unique combinations with their values and indices

Examples

tmp <- list(list(a = 1:10, b = 1:10), list(a = 1:5, b = 1:5), list(a = 1:10, b = 1:5))
get_uniques(tmp, "a")

getcharge R Documentation

Get polarity of a ms scan within mzML object

Description

Get polarity of a ms scan within mzML object

Usage

getcharge(mzml, i)

Arguments

mzml

list mzML object generated from ‘mzMLtoR’ function

i

integer scan number

Value

integer representing scan polarity (either 1 (positive) or -1 (negative))


getmslevel R Documentation

Get MS Level of a ms scan within mzML object

Description

Get MS Level of a ms scan within mzML object

Usage

getmslevel(mzml, i)

Arguments

mzml

list mzML object generated from ‘mzMLtoR’ function

i

integer scan number

Value

integer representing the MS Level (1, 2, … n)


getmzML R Documentation

Bring a raw data file into the environment

Description

If the file name does not have the .mzML extension, the raw file is converted.

Usage

getmzML(
  search_df,
  CONVERT = FALSE,
  CHECKCONVERT = TRUE,
  is_waters = FALSE,
  lockmass = NULL,
  lockmasswidth = NULL,
  correct = FALSE
)

Arguments

search_df

data.frame output of [create_search_df] or file name of a raw file to be converted

CONVERT

LGL scalar of whether or not to convert the search_df filename (default FALSE)

CHECKCONVERT

LGL scalar of whether or not to verify the conversion format (default TRUE)

Value

LIST value of the trimmed mzML file matching search criteria


getprecursor R Documentation

Get precursor ion of a ms scan within mzML object

Description

Get precursor ion of a ms scan within mzML object

Usage

getprecursor(mzml, i)

Arguments

mzml

list mzML object generated from ‘mzMLtoR’ function

i

integer scan number

Value

numeric designating the precursor ion (or the middle of the scan range for SWATH or DIA); returns NULL if no precursor was selected


gettime R Documentation

Get time of a ms scan within mzML object

Description

Get time of a ms scan within mzML object

Usage

gettime(mzml, i)

Arguments

mzml

list mzML object generated from ‘mzMLtoR’ function

i

integer scan number

Value

numeric of the scan time


has_missing_elements R Documentation

Simple check for if an object is empty

Description

Checks for empty vectors, a blank character string, NULL, and NA values. If fed a list object, returns TRUE if any element is in the “empty” set. For data.frames, checks that nrow is not 0. [rlang:::is_empty] only checks for length 0.

Usage

has_missing_elements(x, logging = TRUE)

Arguments

x

Object to be checked

logging

LGL scalar of whether or not to make log messages (default: TRUE)

Value

LGL scalar of whether x is empty

Note

Reminder that vectors created with NULL values will be automatically reduced by R.

Examples

has_missing_elements("a")
# FALSE
has_missing_elements(c(NULL, 1:5))
# FALSE
has_missing_elements(list(NULL, 1:5))
# TRUE
has_missing_elements(data.frame(a = character(0)))
# TRUE

is_elemental_match R Documentation

Checks if two elemental formulas match

Description

Checks if two elemental formulas match

Usage

is_elemental_match(testformula, trueformula)

Arguments

testformula

character string of elemental formula to test

trueformula

character string of elemental formula to check against (truth)

Value

logical


is_elemental_subset R Documentation

Check if elemental formula is a subset of another formula

Description

Check if elemental formula is a subset of another formula

Usage

is_elemental_subset(fragmentformula, parentformula)

Arguments

fragmentformula

character string of elemental formula subset to test

parentformula

character string of elemental formula to check for subset

Value

logical

Examples

is_elemental_subset("C2H2", "C2H5O")

is_elemental_subset("C2H2", "C2H1O")

isotopic_distribution R Documentation

Isotopic Distribution Functions: generate the isotopic distribution mass spectrum of an elemental formula

Description

Generate the isotopic distribution mass spectrum of an elemental formula.

Usage

isotopic_distribution(
  elementalformula,
  exactmasschart,
  remove.elements = c(),
  max.dist = 3,
  min.int = 0.001,
  charge = "neutral"
)

Arguments

elementalformula

character string of elemental formula to simulate isotopic pattern

exactmasschart

exact mass chart generated from function create_exactmasschart

remove.elements

character vector of elements to remove from elemental formula

max.dist

numeric maximum mass distance (in Da) from exact mass to include in simulated isotopic pattern

min.int

numeric minimum relative intensity (maximum = 1, minimum = 0) to include in simulated isotopic pattern

charge

character string for the charge state of the simulated isotopic pattern, options are ‘neutral’, ‘positive’, and ‘negative’

Value

data frame containing mz and int values of mass spectrum
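
Examples

## A sketch (assumes 'emc' was generated by 'create_exactmasschart'; the
## formula is illustrative):
isotopic_distribution("C8HF17O3S", emc, charge = "negative")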


lockmass_remove R Documentation

Remove lockmass scan from mzml object

Description

For Waters instruments only, identifies the scans that are due to a lock mass scan and removes them for easier processing.

Usage

lockmass_remove(
  mzml,
  lockmass = NULL,
  lockmasswidth = NULL,
  correct = FALSE,
  approach = "baseion"
)

Arguments

mzml

mzML object generated from mzMLtoR() function

lockmass

m/z value of the lockmass to remove

lockmasswidth

m/z value for the half-window of the lockmass scan

correct

logical indicating whether the subsequent spectra should be corrected

Value

A copy of the object provided to ‘mzml’ with the lock mass removed.


log_as_dataframe R Documentation

Pull a log file into an R object

Description

Log messages generated by logger with anything other than the standard formatting options can have multiple formatting tags to display in the R console. These “junk up” any resulting object. If you want to read a log directly in the console and preserve formatting, call [read_log] with the default ‘as_object’ argument (FALSE). For deeper inspection, a data frame works well, provided the formatting matches up. ‘env_logger.R’ provides an option to set formatting layouts; alongside those layouts, generate regex strings matching the desired format: ‘log_remove_color’ will remove the colors (the majority should be caught by the string provided as the default in this package) and ‘log_split_column’ will split the lines in your logging file into discrete categories named by ‘df_titles’.

Usage

log_as_dataframe("log.txt")

Arguments

file

CHR scalar file path to a log file (default NULL is translated as “log.txt”)

last_n

INT scalar of the last ‘n’ log entries to read.

condense

LGL scalar of whether to nest the resulting tibble by the nearest second.

regex_remove

CHR scalar regular expression of characters to REMOVE from log messages via [stringr::str_remove_all]

regex_split

CHR scalar regular expression of characters used to split the log entry into columns from log messages via [tidyr::separate]

df_titles

CHR vector of headers for the resulting data frame, passed as the “into” argument of [tidyr::separate]

Details

This will attempt to fail gracefully.

Value

tibble with one row per log entry (or groups)

Note

If “time” is included and ‘condense’ == TRUE, the log messages in the resulting tibble will be nested to the nearest second.

If “status” is included it will be a factor with levels including the valid statuses from logger (see [logger::log_levels]).

Use care to develop ‘regex_split’ in order to split the log entries into the appropriate columns as defined by ‘df_titles’; extra values will be merged into the messages column.


log_fn R Documentation

Simple logging convenience

Description

Conveniently add a log message at the trace level. Typically this would be called twice, bookending the body of a function along the lines of “Start fn()” and “End fn()”. This can help provide traceability for deeply nested function calls within a log.

Usage

log_fn(status = "start", log_ns = NA_character_, level = "trace")

Arguments

status

CHR scalar to prefix the log message; will be coerced to sentence case. Typically “start” or “end” but anything is accepted (default “start”).

log_ns

CHR scalar of the logger namespace to use (default NA_character_)

level

CHR scalar of the logging level to be passed to [log_it] (default “trace”)

Value

None, hands logging messages to [log_it]
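
Examples

fn <- function() { log_fn("start"); 1 + 1; log_fn("end") }
fn()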


log_it R Documentation

Conveniently log a message to the console

Description

Use this to log messages of arbitrary level and message. It works best with [logger] but will also print directly to the console to support setups where package [logger] may not be available or custom log levels are desired.

Usage

log_it(
  log_level,
  msg = NULL,
  log_ns = NULL,
  reset_logger_settings = FALSE,
  reload_all = FALSE,
  logger_settings = file.path("config", "env_logger.R"),
  add_unknown_ns = FALSE,
  clone_settings_from = NULL
)

Arguments

log_level

CHR scalar of the level at which to log a given statement. If using the [logger] package, must match one of [logger:::log_levels]

msg

CHR scalar of the message to accompany the log.

log_ns

CHR scalar of the logging namespace to use during execution (default: NULL prints to the global logging namespace)

reset_logger_settings

LGL scalar indicating whether or not to refresh the logger settings using the file identified in logger_settings (default: FALSE)

reload_all

LGL scalar indicating whether, during reset_logger_settings, to reload the R environment configuration file

logger_settings

CHR file path to the file containing logger settings (default: file.path(“config”, “env_logger.R”))

add_unknown_ns

LGL scalar indicating whether or not to add a new namespace if log_ns is not defined in logger_settings (default: FALSE)

clone_settings_from

CHR scalar indicating

Details

When using [logger], create settings for each namespace in file config/env_logger.R as a list (see examples there) and make sure it is sourced. If using with [logger] and “file” or “both” is selected for the namespace’s ‘LOGGING[[log_ns]]$to’ parameter in ‘env_logger.R’, logs will be written to disk at the file defined in ‘LOGGING[[log_ns]]$file’ as well as to the console.

Value

Adds to the logger file (if enabled) and/or prints to the console if enabled.

Examples

log_it("test", "a test message")
test_log <- function() {
  log_it("success", "a success message")
  log_it("warn", "a warning message")
}
test_log()
# Try it with and without logger loaded.

make_acronym R Documentation

Simple acronym generator

Description

At times it is useful for display purposes to generate acronyms for longer bits of text. This naively generates those by extracting the first letter as upper case from each word in text elements.

Usage

make_acronym(text)

Arguments

text

CHR vector of the text to acronym-ize

Value

CHR vector of length equal to that of text with the acronym

Examples

make_acronym("test me")
make_acronym(paste("department of ", c("commerce", "energy", "defense")))

make_install_code R Documentation

Convenience function to set a new installation code

Description

Convenience function to set a new installation code

Usage

make_install_code(db_conn = con, new_name = NULL, log_ns = "db")

Arguments

db_conn

connection object (default: con)

new_name

CHR scalar of the human readable name of the installation (e.g. your project name) (default: NULL)

log_ns

CHR scalar of the logging namespace to use during execution (default: “db”)

Value

None


make_requirements R Documentation

Make import requirements file

Description

Importing from the NIST contribution spreadsheet requires a certain format. In order to proceed smoothly, that format must be verified for gross integrity with regard to expectations about shape (i.e. class), names of elements, and whether they are required for import. This function creates a JSON expression of the expected import structure and saves it to the project directory.

Usage

make_requirements(
  example_import,
  file_name = "import_requirements.json",
  not_required = c("annotation", "chromatography", "opt_ums_params"),
  archive = TRUE,
  retain_in_R = TRUE,
  log_ns = "db"
)

Arguments

example_import

CHR or LIST object containing an example of the expected import format; this should include only a SINGLE compound contribution file

file_name

CHR scalar file name under which to save the resulting requirements, also used to search for an existing file to archive if ‘archive’ = TRUE (default: “import_requirements.json”)

not_required

CHR vector matching element names of ‘example_import’ which are not required; all others will be assumed to be required

archive

LGL indicating whether or not to archive an existing file matching ‘file_name’ by suffixing the file name with the current date. Only one archive per date is supported; if such a file already exists, it will be deleted. (default: TRUE)

retain_in_R

LGL indicating whether to retain a local copy of the requirements file generated (default: TRUE)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

Details

Either an existing JSON expression or an R list object may be used for ‘example_import’. If it is a character scalar, it will be assumed to be a file name, which will be loaded based on file extension. That file must be a JSON parseable text file, though raw text is acceptable.

An example file is located in the project directory at “example/PFAC30PAR_PFCA1_mzML_cmpd2627.JSON”

As with any file manipulation, use care with ‘file_name’.

Value

writes a file to the project directory (based on the found location of ‘file_name’) with the JSON structure
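
Examples

## A sketch using the example import file shipped with the project:
make_requirements("example/PFAC30PAR_PFCA1_mzML_cmpd2627.JSON")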


manage_connection R Documentation

Check for, and optionally remove, a database connection object

Description

This function abstracts database connection management to a degree, streamlining the process of connecting and disconnecting existing connections as defined by its parameters. This release has not been tested extensively with drivers other than SQLite.

Usage

manage_connection("test.sqlite", conn_name = "test_con")

Arguments

db

CHR scalar name of the database to check, defaults to the name supplied in config/env.R (default: session variable DB_NAME)

drv_pack

CHR scalar of the package used to connect to this database (default: session variable DB_DRIVER)

conn_class

CHR vector of connection object classes to check against. Note this may depend heavily on connection packages and must be present in the class names of the driver used. (default session variable DB_CLASS)

conn_name

CHR scalar of the R environment object name to use for this connection (default: “con”)

is_local

LGL scalar indicating whether or not the referenced database is a local file, if not it will be treated as though it is either a DSN or a database name on your host server, connecting as otherwise defined

rm_objects

LGL scalar indicating whether or not to remove objects identifiably connected to the database from the current environment. This is particularly useful if there are outstanding connections that need to be closed (default: TRUE)

reconnect

LGL scalar indicating whether or not to connect if a connection does not exist; if both this and ‘disconnect’ are true, it will first be disconnected before reconnecting. (default: TRUE)

disconnect

LGL scalar indicating whether or not to terminate and remove the connection from the current global environment (default: TRUE)

log_ns

CHR scalar of the namespace (if any) to use for logging

.environ

environment within which to place this connection object

…

named list of any other connection parameters required for your database driver (e.g. postgres username/password)

Value

None

Note

If you want to disconnect everything but retain tibble pointers to your data source as tibbles in this session, use [close_up_shop] instead.

For more complicated setups, it may be easier to use this function by storing parameters in a list and calling with [base::do.call()]
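
Examples

## The do.call pattern from the note above (parameter values are
## illustrative, taken from the usage line):
conn_args <- list(db = "test.sqlite", conn_name = "test_con", disconnect = FALSE)
do.call("manage_connection", conn_args)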


map_import R Documentation

Map an import file to the database schema

Description

This parses an import object and attempts to map it to database fields and tables as defined by an import map stored in an object of class data.frame, typically created during project compliance as “IMPORT_MAP”. This object is a list of all columns and their tables in the import file matched with the database table and column to which they should be imported.

Usage

map_import(
  import_obj,
  aspect,
  import_map,
  case_sensitive = TRUE,
  fuzzy = FALSE,
  ignore = TRUE,
  id_column = "_*id$",
  alias_column = "^alias$",
  resolve_normalization = TRUE,
  strip_na = FALSE,
  db_conn = con,
  log_ns = "db"
)

Arguments

import_obj

LIST object of values to import

aspect

CHR scalar of the import aspect (e.g. “sample”) to map

import_map

data.frame object of the import map (e.g. from a CSV)

case_sensitive

LGL scalar of whether to match on a case sensitive basis (the default TRUE searches for values as-provided) or whether to coerce value matches by upper, lower, sentence, and title case matches

fuzzy

LGL scalar of whether to do a “fuzzy” match in the sense that values provided are wrapped in an SQL “LIKE ‘%value%’” clause, ignoring the ‘case_sensitive’ setting if TRUE (default: FALSE).

db_conn

connection object (default: con)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

Value

LIST of final mapped values

Note

The object used for ‘import_map’ must be a data.frame that at minimum includes columns named import_category, import_parameter, alias_lookup, and sql_normalization
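
Examples

A minimal sketch, assuming ‘obj’ is a parsed import list and an ‘IMPORT_MAP’ data.frame exists in the session:

mapped_sample <- map_import(obj, aspect = "sample", import_map = IMPORT_MAP)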


mode_checks R Documentation

Get list of available functions

Description

Helper function for verify_args() that returns all the currently available functions matching a given prefix. This searches the entire library associated with the current R install.

Usage

mode_checks(prefix = "is", use_deprecated = FALSE)

Arguments

prefix

CHR scalar for the function prefix to search (default “is”)

use_deprecated

BOOL scalar indicating whether or not to include functions marked as deprecated (PLACEHOLDER default FALSE)

Details

Note: argument use_deprecated is not currently used but serves as a placeholder for future development to avoid or include deprecated functions

Value

CHR vector of functions matching prefix

Examples

mode_checks()

molecule_picture R Documentation

Picture a molecule from structural notation

Description

This is a thin wrapper to rdkit.Chem.MolFromX methods to generate molecular models from common structure notation such as InChI or SMILES. All picture files produced will be in portable network graphics (.png) format.

Usage

caffeine <- "C[n]1cnc2N(C)C(=O)N(C)C(=O)c12"
molecule_picture(caffeine, show = TRUE)

Arguments

mol

CHR scalar expression of molecular structure

mol_type

CHR scalar indicating the expression type of ‘mol’ (default: “smiles”)

file_name

CHR scalar of an intended file destination (default: NULL will produce a random 10 character file name). Note that any file extensions provided here will be ignored.

rdkit_name

CHR scalar indicating the name of the R object bound to RDKit, OR the R object directly (i.e. without quotes)

open_file

LGL scalar of whether to open the file after creation (default: FALSE)

show

LGL scalar of whether to return the image itself as an object (default: FALSE)

Value

None, or displays the resulting picture if ‘show == TRUE’

Note

Supported ‘mol’ expressions include FASTA, HELM, Inchi, Mol2Block, Mol2File, MolBlock, MolFile, PDBBlock, PDBFile, PNGFile, PNGString, RDKitSVG, Sequence, Smarts, Smiles, TPLBlock, and TPLFile


monoisotope.list R Documentation

Calculate the monoisotopic masses of elemental formulas in a data.frame column

Description

Calculate the monoisotopic masses of elemental formulas in a data.frame column.

Usage

monoisotope.list(
  df,
  column,
  exactmasses,
  remove.elements = c(),
  adduct = "neutral"
)

Arguments

df

data.frame with at least one column with elemental formulas

column

integer or CHR scalar indicating the column containing the elemental formulas, if CHR then regex match is used

exactmasses

list of exact masses

remove.elements

elements to remove from the elemental formulas

adduct

character string adduct/charge state to add to the elemental formula, options are ‘neutral’, ‘+H’, ‘-H’, ‘+Na’, ‘+K’, ‘+’, ‘-’, ‘-radical’, ‘+radical’

Value

data.frame with column of exact masses appended to it
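
Examples

A minimal sketch, assuming an ‘exactmasses’ reference list is loaded in the session:

df <- data.frame(formula = c("C2HF3O2", "C8HF15O2"))
monoisotope.list(df, column = "formula", exactmasses = exactmasses, adduct = "-H")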


ms_plot_peak R Documentation

Plot a peak from database mass spectral data

Description

Plots the intensity of ion traces over the scan period and annotates them with the mass to charge value. Several flexible plotting aspects are provided as data may become complicated.

Usage

ms_plot_peak(
  data,
  peak_type = c("area", "line", "segment"),
  peak_facet_by = "ms_n",
  peak_mz_resolution = 0,
  peak_drop_ratio = 0.01,
  peak_repel_labels = TRUE,
  peak_line_color = "black",
  peak_fill_color = "grey50",
  peak_fill_alpha = 0.2,
  peak_text_size = 3,
  peak_text_offset = 0.02,
  include_method = TRUE,
  db_conn = con
)

Arguments

data

data.frame of spectral data in the form of the ‘ms_data’ table

peak_type

CHR scalar of the plot type to draw, must be one of “area”, “line”, or “segment” (default: “area”)

peak_facet_by

CHR scalar name of a column by which to facet the resulting plot (default: “ms_n”)

peak_mz_resolution

INT scalar mass to charge ratio tolerance to group peaks, with at minimum columns for intensity (as “base_int”), ion m/z value (as “base_ion”), and scan time (as “scantime”) - (default: 0 goes to unit resolution)

peak_drop_ratio

NUM scalar threshold of the maximum intensity below which traces will be dropped (default: 1e-2 means any trace with a maximum intensity less than 1% of the maximum intensity in the plot will be dropped); if > 1 the inversion will be used (1e5 -> 1e-5)

peak_repel_labels

LGL scalar on whether to use the [ggrepel] package to space out m/z labels in the plot (default: TRUE). If [ggrepel] is not installed, it will default to FALSE rather than requiring an installation

peak_line_color

CHR scalar name of the color to use for the “color” aesthetic (only a single color is supported; default: “black”)

peak_fill_color

CHR scalar name of the color to use for the “fill” aesthetic (only a single color is supported; default: “grey50”)

peak_text_offset

NUM scalar y-axis offset as a fraction of the maximum intensity for trace annotation (default: 0.02 offsets labels in the positive direction by 2% of the maximum intensity)

db_conn

database connection (default: con) which must be live to pull sample and compound identification information

Details

The basic default plot will group all mass-to-charge ratio values by unit resolution (increase resolution with ‘peak_mz_resolution’) and plot them as an area trace over the scanning period. Traces are annotated with the grouping value. Values of ‘peak_mz_resolution’ greater than available data (e.g. 10 when data resolution is to the 5th decimal point) will default to maximum resolution.

Traces are filtered out completely if their maximum intensity is below the ratio set by ‘peak_drop_ratio’; only complete traces are filtered out this way, not individual data points within a retained trace. Set this as the fraction of the base peak (the peak of maximum intensity) to use to filter out low-intensity traces. The calculated intensity threshold will be printed to the caption.

Value

ggplot object

Note

Increasing ‘peak_mz_resolution’ will likely result in multiple separate traces.

Implicitly missing values are not interpolated, but lines are drawn through to the next point.

‘peak_type’ will accept abbreviations of its accepted values (e.g. “l” for “line”)
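
Examples

A minimal sketch; ‘peak_data’ stands in for a data.frame of rows from the ‘ms_data’ table:

ms_plot_peak(peak_data, peak_type = "area", peak_mz_resolution = 3)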


ms_plot_peak_overview R Documentation

Create a patchwork plot of peak spectral properties

Description

Call this function to generate a combined plot from [ms_plot_peak], [ms_plot_spectra], and [ms_plot_spectral_intensity] using the [patchwork] package, which must be installed. All arguments will be passed directly to the underlying functions to provide flexibility in the final display. The default settings match those of the called plotting functions, and the output can be further manipulated with the patchwork package.

Usage

ms_plot_peak_overview(
  plot_peak_id,
  peak_type = c("area", "line", "segment"),
  peak_facet_by = "ms_n",
  peak_mz_resolution = 0,
  peak_drop_ratio = 0.01,
  peak_repel_labels = TRUE,
  peak_line_color = "black",
  peak_fill_color = "grey50",
  peak_fill_alpha = 0.2,
  peak_text_size = 3,
  peak_text_offset = 0.02,
  spectra_mz_resolution = 3,
  spectra_drop_ratio = 0.01,
  spectra_repel_labels = TRUE,
  spectra_repel_line_color = "grey50",
  spectra_nudge_y_factor = 0.03,
  spectra_log_y = FALSE,
  spectra_text_size = 3,
  spectra_max_overlaps = 50,
  intensity_plot_resolution = c("spectra", "peak"),
  intensity_mz_resolution = 3,
  intensity_drop_ratio = 0,
  patchwork_design = c(area(1, 4, 7, 7), area(1, 1, 4, 2), area(6, 1, 7, 2)),
  as_individual_plots = FALSE,
  include_method = TRUE,
  db_conn = con,
  log_ns = "global"
)

Arguments

peak_type

CHR scalar of the plot type to draw, must be one of “area”, “line”, or “segment” (default: “area”)

peak_facet_by

CHR scalar name of a column by which to facet the resulting plot (default: “ms_n”)

peak_mz_resolution

INT scalar mass to charge ratio tolerance to group peaks, with at minimum columns for intensity (as “base_int”), ion m/z value (as “base_ion”), and scan time (as “scantime”) - (default: 0 goes to unit resolution)

peak_drop_ratio

NUM scalar threshold of the maximum intensity below which traces will be dropped (default: 1e-2 means any trace with a maximum intensity less than 1% of the maximum intensity in the plot will be dropped); if > 1 the inversion will be used (1e5 -> 1e-5)

peak_repel_labels

LGL scalar on whether to use the [ggrepel] package to space out m/z labels in the plot (default: TRUE). If [ggrepel] is not installed, it will default to FALSE rather than requiring an installation

peak_line_color

CHR scalar name of the color to use for the “color” aesthetic (only a single color is supported; default: “black”)

peak_fill_color

CHR scalar name of the color to use for the “fill” aesthetic (only a single color is supported; default: “grey50”)

peak_text_offset

NUM scalar y-axis offset as a fraction of the maximum intensity for trace annotation (default: 0.02 offsets labels in the positive direction by 2% of the maximum intensity)

spectra_mz_resolution

INT scalar mass to charge ratio tolerance to group peaks, with at minimum columns for intensity (as base_int), ion m/z value (as base_ion), and scan time (as scantime) - (default: 3)

spectra_drop_ratio

NUM scalar threshold of the maximum intensity below which traces will be dropped (default: 1e-2 means any trace with a maximum intensity less than 1% of the maximum intensity in the plot will be dropped); if > 1 the inversion will be used (1e5 -> 1e-5)

spectra_repel_labels

LGL scalar on whether to use the [ggrepel] package to space out m/z labels in the plot (default: TRUE). If [ggrepel] is not installed, it will default to FALSE rather than requiring an installation

spectra_repel_line_color

CHR scalar name of the color to use for the “color” aesthetic of the lines connecting repelled labels to their data points; passed to [ggrepel::geom_text_repel] as segment.color (only a single color is supported; default: “grey50”)

spectra_nudge_y_factor

NUM scalar y-axis offset as a fraction of the maximum intensity for trace annotation (default: 0.03 offsets labels in the positive direction by 3% of the maximum intensity)

spectra_log_y

LGL scalar of whether or not to apply a log10 scaling factor to the y-axis (default: FALSE)

spectra_text_size

NUM scalar of the text size to use for annotation labels (default: 3)

spectra_max_overlaps

INT scalar of the maximum number of text overlaps to allow (default: 50)

intensity_mz_resolution

INT scalar mass to charge ratio tolerance to group peaks, with at minimum columns for intensity (as “base_int” or “intensity”), ion m/z value (as “base_ion” or “mz”), and scan time (as “scantime”) - (default: 3)

intensity_drop_ratio

NUM scalar threshold of the maximum intensity below which traces will be dropped (default: 0 returns all); if > 1 the inversion will be used (1e5 -> 1e-5)

patchwork_design

the layout of the final plot; see [patchwork::design]

as_individual_plots

LGL scalar of whether to return the plots individually in a list (set TRUE) or as a patchwork plot (default: FALSE)

db_conn

database connection (default: con) which must be live to pull sample and compound identification information

Value

object of classes ‘gg’ and ‘ggplot’, as a patchwork unless ‘as_individual_plots’ is TRUE

Note

Requires a live connection to the database to pull all plots for a given peak_id.

Defaults match those of the called functions.
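
Examples

A minimal sketch for a hypothetical peak ID, using the default plot settings (requires a live database connection):

ms_plot_peak_overview(plot_peak_id = 1)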


ms_plot_spectra R Documentation

Plot a fragment map from database mass spectral data

Description

Especially for non-targeted analysis workflows, it is often necessary to examine annotated fragment data for spectra across a given peak of interest. Annotated fragments lend increasing confidence in the identification of the compound giving rise to a mass spectral peak. If a fragment has been annotated, that identification is displayed along with the mass to charge value in blue. Annotations of the mass to charge ratio for unannotated fragments are displayed in red.

Usage

ms_plot_spectra(
  data,
  spectra_type = c("separated", "zipped"),
  spectra_mz_resolution = 3,
  spectra_drop_ratio = 0.01,
  spectra_repel_labels = TRUE,
  spectra_repel_line_color = "grey50",
  spectra_nudge_y_factor = 0.03,
  spectra_log_y = FALSE,
  spectra_is_file = FALSE,
  spectra_from_JSON = FALSE,
  spectra_animate = FALSE,
  spectra_text_size = 3,
  spectra_max_overlaps = 50,
  include_method = TRUE,
  db_conn = con
)

Arguments

data

data.frame of spectral data in the form of the ‘ms_data’ table

spectra_mz_resolution

INT scalar mass to charge ratio tolerance to group peaks, with at minimum columns for intensity (as base_int), ion m/z value (as base_ion), and scan time (as scantime) - (default: 3)

spectra_drop_ratio

NUM scalar threshold of the maximum intensity below which traces will be dropped (default: 1e-2 means any trace with a maximum intensity less than 1% of the maximum intensity in the plot will be dropped); if > 1 the inversion will be used (1e5 -> 1e-5)

spectra_repel_labels

LGL scalar on whether to use the [ggrepel] package to space out m/z labels in the plot (default: TRUE). If [ggrepel] is not installed, it will default to FALSE rather than requiring an installation

spectra_repel_line_color

CHR scalar name of the color to use for the “color” aesthetic of the lines connecting repelled labels to their data points; passed to [ggrepel::geom_text_repel] as segment.color (only a single color is supported; default: “grey50”)

spectra_nudge_y_factor

NUM scalar y-axis offset as a fraction of the maximum intensity for trace annotation (default: 0.03 offsets labels in the positive direction by 3% of the maximum intensity)

spectra_log_y

LGL scalar of whether or not to apply a log10 scaling factor to the y-axis (default: FALSE)

spectra_is_file

LGL scalar of whether data are coming from a file (default: FALSE)

spectra_from_JSON

LGL scalar of whether data are in JSON format; other formats are not supported when ‘spectra_is_file = TRUE’ (default: FALSE)

spectra_animate

LGL scalar of whether to produce an animation across the scantime for these data (default: FALSE)

spectra_text_size

NUM scalar of the text size to use for annotation labels (default: 3)

spectra_max_overlaps

INT scalar of the maximum number of text overlaps to allow (default: 50)

db_conn

database connection (default: con) which must be live to pull sample and compound identification information

Value

ggplot object

Note

If ‘spectra_animate’ is set to true, it requires the [gganimate] package to be installed (and may also require the [gifski] package) and WILL take a large amount of time to complete, but results in an animation that will iterate through the scan period and display mass spectral data as they appear across the peak. Your mileage likely will vary.
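
Examples

A minimal sketch; ‘spectra_data’ stands in for a data.frame of rows from the ‘ms_data’ table:

ms_plot_spectra(spectra_data, spectra_mz_resolution = 3, spectra_log_y = TRUE)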


ms_plot_spectral_intensity R Documentation

Create a spectral intensity plot

Description

Often it is useful to get an overview of mass-to-charge intensity across the scanning time of a peak. Typically this is done with individual traces as in a peak plot, but large peaks can often mask smaller ones or wash out lower intensity signals. Use this to plot m/z against scan time, with intensity shown by color and size. It is intended as a complement to [ms_plot_peak] and may be called at the same levels of granularity, though generally at greater granularity than [ms_plot_peak], which is more of an overview.

Usage

ms_plot_spectral_intensity(
  data,
  intensity_mz_resolution = 5,
  intensity_drop_ratio = 0,
  intensity_facet_by = NULL,
  intensity_plot_resolution = c("spectra", "peak"),
  include_method = TRUE,
  db_conn = con
)

Arguments

data

tibble or pointer with data to plot, either at the peak level, in which case “base_ion” must be present, or at the spectral level, in which case “intensity” must be present

intensity_mz_resolution

INT scalar mass to charge ratio tolerance to group peaks, with at minimum columns for intensity (as “base_int” or “intensity”), ion m/z value (as “base_ion” or “mz”), and scan time (as “scantime”) - (default: 5)

intensity_drop_ratio

NUM scalar threshold of the maximum intensity below which traces will be dropped (default: 0 returns all); if > 1 the inversion will be used (1e5 -> 1e-5)

intensity_facet_by

CHR scalar of a column name in ‘data’ by which to facet the resulting plot (default: NULL)

db_conn

database connection (default: con) which must be live to pull sample and compound identification information

Value

object of classes ‘gg’ and ‘ggplot’
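
Examples

A minimal sketch; ‘spectra_data’ stands in for a data.frame of spectral data containing an “ms_n” column:

ms_plot_spectral_intensity(spectra_data, intensity_facet_by = "ms_n")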


ms_plot_titles R Documentation

Consistent title for ms_plot_x functions

Description

This helper function creates consistently formatted plot label elements in an opinionated manner. This is unlikely to be useful outside the direct context of [ms_plot_peak], [ms_plot_spectra], and [ms_plot_spectral_intensity].

Usage

ms_plot_titles(
  plot_data,
  mz_resolution,
  drop_ratio,
  include_method,
  db_conn = con
)

Arguments

plot_data

data.frame object passed from the plotting function

mz_resolution

NUM scalar passed from the plotting function

drop_ratio

NUM scalar passed from the plotting function

include_method

LGL scalar indicating whether or not to get the method narrative from the database

db_conn

database connection (default: con) which must be live to pull sample and compound identification information

Value

LIST of strings named for ggplot title elements “title”, “subtitle”, and “caption”


ms_spectra_separated R Documentation

Parse “Separated” MS Data

Description

The “separated” format includes spectra packed into two separate columns, one for mass and another for intensity. All values for a given scan time are packed into these columns, separated by spaces, with an unlimited number of discrete values; there must be a 1:1 ratio of values between the two columns.

Usage

ms_spectra_separated(df, ms_cols = c("mz", "intensity"))

Arguments

df

data.frame or json object containing spectra compressed in the “separated” format

ms_cols

CHR vector of length 2 identifying the column names to use for mass and intensity in the source data; must be of length 2, with the first value identifying the mass-to-charge ratio column and the second identifying the intensity column

Value

data.frame object of the unpacked spectra as a list column

Note

Entries in ‘ms_cols’ are treated as regex expressions, but it is safest to provide exact column names

Examples

### JSON Example
tmp <- jsonify::as.json('{
 "measured_mz": "712.9501 713.1851",
 "measured_intensity": "15094.41015625 34809.9765625"
}')
ms_spectra_separated(tmp)

### Example data.frame
tmp <- data.frame(
  measured_mz = "712.9501 713.1851",
  measured_intensity = "15094.41015625 34809.9765625"
)
ms_spectra_separated(tmp)

ms_spectra_zipped R Documentation

Parse “Zipped” MS Data

Description

The “zipped” format includes spectra packed into a single column containing alternating mass and intensity values for all observations. All values for a given scan time are packed into this column, separated by spaces, with an unlimited number of discrete values, and must follow an alternating 1:1 pattern of the form “mass intensity mass intensity”.

Usage

ms_spectra_zipped(df, spectra_col = "data")

Arguments

df

data.frame object containing spectra compressed in the “zipped” format

spectra_col

CHR scalar identifying the column containing the packed spectra in the source data (default: “data”)

Value

data.frame object containing unpacked spectra as a list column

Note

spectra_col is treated as a regex expression, but it is safest to provide a matching column name

Examples

### JSON Example
tmp <- jsonify::as.json('{"msdata": "712.9501 15094.41015625 713.1851 34809.9765625"}')
ms_spectra_zipped(tmp)

### Example data.frame
tmp <- data.frame(
  msdata = "712.9501 15094.41015625 713.1851 34809.9765625"
)
ms_spectra_zipped(tmp)

mzMLconvert R Documentation

Converts a raw file into an mzML

Description

Converts a raw file into an mzML

Usage

mzMLconvert(rawfile, msconvert = NULL, config = NULL, outdir = getwd())

Arguments

rawfile

file path of the MS raw file to be converted

msconvert

file path of the msconvert.exe file, if NULL retrieves information from config directory

config

configuration settings file for msconvert conversion to mzML, if NULL retrieves information from the config directory

outdir

directory path for the converted mzML file.

Value

CHR scalar path to the created file


mzMLtoR R Documentation

Opens file of type mzML into R environment

Description

Opens file of type mzML into R environment

Usage

mzMLtoR(
  mzmlfile = file.choose(),
  lockmass = NULL,
  lockmasswidth = NULL,
  correct = FALSE,
  approach = "hybrid"
)

Arguments

mzmlfile

the file path of the mzML file from which the data are to be read

lockmass

NUM scalar m/z value of the lockmass to remove (Waters instruments only) (default: NULL)

lockmasswidth

NUM scalar instrumental uncertainty associated with ‘lockmass’ (default: NULL)

correct

logical indicating whether the spectra should be corrected for the lockmass (Waters instruments only)

approach

character string defining the type of lockmass removal filter to use, default is ‘hybrid’

Value

list containing mzML data with unzipped masses and intensity information
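
Examples

A minimal sketch using a hypothetical file and the leucine enkephalin lockmass:

msdata <- mzMLtoR("example.mzML", lockmass = 556.2771, lockmasswidth = 0.1, correct = TRUE)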


nist_shinyalert R Documentation

Call [shinyalert::shinyalert] with specific styling

Description

This pass-through function serves only to call [shinyalert::shinyalert] with parameters defined by this function, and can be used for additional styling that may be necessary. It is used solely for consistency's sake.

Usage

nist_shinyalert("test", "info", shiny::h3("test"))

Arguments

title

The title of the modal.

type

The type of the modal. There are 4 built-in types which will show a corresponding icon: “warning”, “error”, “success” and “info”. You can also set type=“input” to get a prompt in the modal where the user can enter a response. By default, the modal has no type.

text

The modal’s text. Can either be simple text, or Shiny tags (including Shiny inputs and outputs). If using Shiny tags, then you must also set html=TRUE.

className

A custom CSS class name for the modal’s container.

html

If TRUE, the content of the title and text will not be escaped. By default, the content in the title and text are escaped, so any HTML tags will not render as HTML.

closeOnClickOutside

If TRUE, the user can dismiss the modal by clicking outside it.

immediate

If TRUE, close any previously opened alerts and display the current one immediately.

...

Additional named parameters to be passed to shinyalert. Unrecognized ones will be ignored.

Value

None, shows a shinyalert modal

See Also

shinyalert::shinyalert


obj_name_check R Documentation

Sanity check for environment object names

Description

Provides a sanity check on whether or not a name reference exists, returning its name if so and the default name defined by ‘default_name’ if not. This largely is used to prevent naming conflicts as part of managing the plumber service but can be used for any item in the current namespace.

Usage

if (exists("log_it")) {
    obj_name_check("test", "test")
    test <- letters
    obj_name_check(test)
  }

Arguments

obj

R object or CHR scalar in question to be resolved in the namespace

default_name

CHR scalar name to use for obj if it does not exist (default: NULL).

Value

CHR scalar of the resolved object name


open_env R Documentation

Convenience shortcut to open and edit session environment variables

Description

Calls [open_proj_file] for either the R, global, or logging environment settings containing the most common settings dictating project behavior.

Usage

open_env(name = c("R", "global", "logging", "rdkit", "shiny", "plumber"))

Arguments

name

CHR scalar, one of “R”, “global”, “logging”, “rdkit”, “shiny”, or “plumber”.

Value

None, opens a file for editing


open_proj_file R Documentation

Open and edit project files

Description

Project files are organized in several topical directories depending on their purpose as part of the package. For example, several project control variables are set to establish the session global environment in the “config” directory rather than the “R” directory.

Usage

open_proj_file(name, dir = NULL, create_new = FALSE)

Arguments

name

CHR scalar of the file name to open, accepts regex

dir

CHR scalar of a directory name to search within

create_new

LGL scalar of whether to create the file (similar functionality to [usethis]; default FALSE)

Details

If a direct file match to name is not found, it will be searched for using a recursive [list.files] allowing for regex matches (e.g. “.R$”). Directories are similarly sought out within the project. Reasonable feedback is provided.

This convenience function uses [usethis::edit_file] to open (or create if ‘create_new’ is TRUE) any given file in the project.

Value

None, opens a file for editing

Note

If the directory and file cannot be found, and ‘create_new’ is true, the directory will be placed within the project directory.


optimal_ums R Documentation

Get the optimal uncertainty mass spectrum parameters for data

Description

Get the optimal uncertainty mass spectrum parameters for data

Usage

optimal_ums(
  peaktable,
  max_correl = 0.75,
  correl_bin = 0.05,
  max_ph = 10,
  ph_bin = 1,
  max_freq = 10,
  freq_bin = 1,
  min_n_peaks = 3,
  cormethod = "pearson"
)

Arguments

peaktable

list generated from ‘create_peak_table_ms1’ or ‘create_peak_table_ms2’

max_correl

numeric maximum acceptable correlation

correl_bin

numeric sequence bin width from max_correl..0

max_ph

numeric maximum acceptable peak height (%)

ph_bin

numeric sequence bin width from max_ph..0

max_freq

numeric maximum acceptable observational frequency (%)

freq_bin

numeric sequence bin width from max_freq..0

min_n_peaks

integer ideal minimum number of scans for mass spectrum

cormethod

string indicating correlation function to use (see [cor()] for description)

Value

data.frame object containing optimized search parameters
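
Examples

A minimal sketch, assuming ‘pt’ is a peak table from ‘create_peak_table_ms1’:

optimal_ums(pt, max_correl = 0.75, max_ph = 10, min_n_peaks = 3)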


overlap R Documentation

Calculate overlap ranges

Description

Internal function: determines if two ranges (x1-e1 to x1+e1) and (x2-e2 to x2+e2) overlap (nonstatistical evaluation)

Usage

overlap(x1, e1, x2, e2)

Arguments

x1, x2

values containing mean values

e1, e2

values containing respective error values


pair_ums R Documentation

Pairwise data.frame of two uncertainty mass spectra

Description

The function stacks two uncertainty mass spectra together based on binned m/z values

Usage

pair_ums(ums1, ums2, error = 5, minerror = 0.002)

Arguments

ums1

uncertainty mass spectrum from ‘get_ums’ function

ums2

uncertainty mass spectrum from ‘get_ums’ function

minerror

the minimum mass error (in Da) of the instrument data

error

the mass accuracy (in ppm) of the instrument data


peak_gather_json R Documentation

Extract peak data and metadata

Description

Gathers metadata from ‘methodjson’ and extracts the MS1 and MS2 data from the ‘mzml’ object

Usage

peak_gather_json(
  methodjson,
  mzml,
  compoundtable,
  zoom = c(1, 5),
  minerror = 0.002
)

Arguments

methodjson

list of JSON generated from ‘parse_method_json’ function

mzml

list of msdata from ‘mzMLtoR’ function

compoundtable

data.frame containing compound identities [should be extractable from SQL later]

zoom

numeric vector specifying the range around the precursor ion to include, from m/z - zoom[1] to m/z + zoom[2]

minerror

numeric the minimum error (in Da) of the instrument

Value

list of peak objects


plot_compare_ms R Documentation

Plot MS Comparison

Description

Plots a butterfly plot for the comparison of two uncertainty mass spectra

Usage

plot_compare_ms(
  ums1,
  ums2,
  main = "Comparison Mass Spectrum",
  size = 1,
  c1 = "black",
  c2 = "red",
  ylim.exp = 1
)

Arguments

ums1, ums2

uncertainty mass spectrum from ‘get_ums’ function

main

Main Title of the Plot

size

line width of the mass spectra lines

c1

Color of the top (ums1) mass spectral lines

c2

Color of the bottom (ums2) mass spectral lines

ylim.exp

Expansion unit for the y-axis


plot_ms R Documentation

Generate consensus mass spectrum

Description

Extract relevant information from a mass spectrum and plot it as an uncertainty mass spectrum.

Usage

plot_ms(
  ms,
  xlim = NULL,
  ylim = NULL,
  main = "Mass Spectrum",
  color = "black",
  size = 1,
  removal = 0
)

Arguments

ms

result of the ‘create_peak_list’ function

Value

ggplot object


pool.sd R Documentation

Pool standard deviations

Description

Internal function: calculates a pooled standard deviation

Usage

pool.sd(sd, n)

Arguments

sd

A vector containing numeric values of standard deviations

n

A vector containing integers for the number of observations respective to the sd values


pool.ums R Documentation

Pool uncertainty mass spectra

Description

Calculates a pooled uncertainty mass spectrum that is a result of data from multiple uncertainty mass spectra.

Usage

pool.ums(umslist, error = 5, minerror = 0.002)

Arguments

umslist

A list where each item is an uncertainty mass spectrum from function ‘get_ums’

minerror

the minimum mass error (in Da) of the instrument data

error

the mass accuracy (in ppm) of the instrument data


pragma_table_def R Documentation

Get table definition from SQLite

Description

Given a database connection (‘con’), get more information about the properties of one or more database tables directly from ‘PRAGMA table_info()’ rather than e.g. [DBI::dbListFields()]. Set ‘get_sql’ to ‘TRUE’ to include the direct schema using sqlite_master; depending on formatting this may or may not be directly usable, though some effort has been made to remove formatting characters (e.g. line feeds, tabs, etc.) if stringr is available.

Usage

pragma_table_def(db_table, db_conn = con, get_sql = FALSE, pretty = TRUE)

Arguments

db_table

CHR vector name of the table(s) to inspect

db_conn

connection object (default: con)

get_sql

BOOL scalar of whether or not to return the schema sql (default FALSE)

pretty

BOOL scalar for whether to return “pretty” SQL that includes human readability enhancements; if this is set to TRUE (the default), it is recommended that the output be fed through ‘cat’, element by element in the case of multiple tables

Details

Note that the package ‘stringr’ is required for formatting returns that include either ‘get_sql’ or ‘pretty’ as TRUE.

Value

data.frame object representing the SQL PRAGMA expression
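
Examples

For example, to inspect the “compounds” table and include its schema SQL:

pragma_table_def("compounds", get_sql = TRUE)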


pragma_table_info R Documentation

Explore properties of an SQLite table

Description

Add functionality to ‘pragma_table_def’ by filtering on column properties such as required and primary key fields. This provides some flexibility to searching table properties without sacrificing the full details of table schema. Parameter ‘get_sql’ is forced to FALSE; only information available via PRAGMA is searched by this function.

Usage

pragma_table_info("compounds")

Arguments

db_table

CHR vector name of the table(s) to inspect

db_conn

connection object (default: con)

condition

CHR vector matching specific checks, must be one of c(“required”, “has_default”, “is_PK”) for constraints where a field must not be null, has a default value defined, and is a primary key field, respectively. (default: NULL)

name_like

CHR vector of character patterns to match against column names via grep. If length > 1, will be collapsed to a basic OR regex (e.g. c(“a”, “b”) becomes “a|b”). As regex, abbreviations and wildcards will typically work, but care should be used in that case. (default: NULL)

data_type

CHR vector of character patterns to match against column data types via grep. If length > 1 will be collapsed to a basic “OR” regex (e.g. c(“int”, “real”) becomes “int|real”). As regex, abbreviations and wildcards will typically work, but care should be used in that case. (default: NULL)

include_comments

LGL scalar of whether to include comments in the return data frame (default: FALSE)

names_only

LGL scalar of whether to include names meeting defined criteria as a vector return value (default: FALSE)

Details

This is intended to support validation during database communications with an SQLite connection, especially for application development (e.g. with ‘shiny’), by allowing for programmatic inspection of database columns by name and property.

Value

data.frame object describing the database entity


py_modules_available R Documentation

Are all conda modules available in the active environment

Description

Checks that all defined modules are available in the currently active python binding. Supports error logging.

Usage

py_modules_available("rdkit")

Arguments

required_modules

CHR vector of required modules

Value

LGL scalar of whether or not all modules are available. Check console for further details.


rdkit_active R Documentation

Sanity check on RDKit binding

Description

Given a name of an R object, performs a simple check on RDKit availability on that object, creating it if it does not exist. A basic structure conversion check is tried and a TRUE/FALSE result returned. Leave all arguments as their defaults of NULL to ensure they will honor the settings in ‘rdkit/env_py.R’.

Usage

rdkit_active(
  rdkit_ref = NULL,
  rdkit_name = NULL,
  log_ns = NULL,
  make_if_not = FALSE
)

Arguments

rdkit_ref

CHR scalar OR R object of an RDKit binding (default NULL goes to “rdk” for convenience with other pipelines in this project)

rdkit_name

CHR scalar the name of a python environment able to run rdkit (default NULL goes to “rdkit” for convenience with other pipelines in this project)

log_ns

CHR scalar of the logging namespace to use (default: NULL)

make_if_not

LGL scalar of whether or not to create a new python environment using [activate_py_env] if the binding is not active

Value

LGL scalar of whether or not the test of RDKit was successful
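
Examples

A minimal sketch that creates the binding with project defaults if it is not yet active:

if (rdkit_active(make_if_not = TRUE)) {
  molecule_picture("CCO", show = TRUE)
}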


rdkit_mol_aliases R Documentation

Create aliases for a molecule from RDKit

Description

Call this function to generate any number of machine-readable aliases from an identifier set. Given the ‘identifiers’ and their ‘type’, RDKit will be polled for conversion functions to create a mol object. That mol object is then used to create machine-readable aliases in any number of supported formats. See the RDKit Documentation for options. The ‘type’ argument is used to match against a “MolFromX” function, while the ‘get_aliases’ argument is used to match against a “MolToX” function.

Usage

rdkit_mol_aliases(
  identifiers,
  type = "smiles",
  mol_from_prefix = "MolFrom",
  get_aliases = c("inchi", "inchikey"),
  mol_to_prefix = "MolTo",
  rdkit_ref = "rdk",
  log_ns = "rdk",
  make_if_not = TRUE
)

Arguments

identifiers

CHR vector of machine-readable molecule identifiers in a format matching ‘type’

type

CHR scalar of the type of encoding to use for ‘identifiers’ (default: smiles)

mol_from_prefix

CHR scalar of the prefix to identify an RDKit function to create a mol object from ‘identifiers’ (default: “MolFrom”)

get_aliases

CHR vector of aliases to produce (default: c(“inchi”, “inchikey”))

mol_to_prefix

CHR scalar of the prefix to identify an RDKit function to create an alias from a mol object (default: “MolTo”)

rdkit_ref

CHR scalar OR R object of an RDKit binding (default NULL goes to “rdk” for convenience with other pipelines in this project)

log_ns

CHR scalar of the logging namespace to use (default: “rdk”)

make_if_not

LGL scalar of whether or not to create a new python environment using [activate_py_env] if the binding is not active

Details

At the time of authorship, RDK v2021.09.4 was in use, which contained the following options findable by this function: CMLBlock, CXSmarts, CXSmiles, FASTA, HELM, Inchi, InchiAndAuxInfo, InchiKey, JSON, MolBlock, PDBBlock, RandomSmilesVect, Sequence, Smarts, Smiles, TPLBlock, V3KMolBlock, XYZBlock.

Value

data.frame object containing the aliases and the original identifiers

Note

Both ‘type’ and ‘aliases’ are case insensitive.

If ‘aliases’ is set to NULL, all possible expressions (excluding those with “File” in the name) are returned from RDKit, which will likely produce NULL values and module ArgumentErrors.
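
Examples

A minimal sketch generating InChI and InChIKey aliases from a SMILES identifier:

rdkit_mol_aliases("CCO", type = "smiles", get_aliases = c("inchi", "inchikey"))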


read_log R Documentation

Read a log from a log file

Description

By default if ‘file’ does not exist (i.e. ‘file’ is not a fully defined path) this looks for log text files in the directory defined by ‘LOG_DIRECTORY’ in the session.

Usage

read_log("log.txt")

Arguments

file

CHR scalar file path to a log file (default NULL is translated to “log.txt”)

last_n

INT scalar of the last ‘n’ log entries to read.

as_object

LGL scalar of whether to return the log as an R object or just to print the log to the console.

Value

CHR vector of the requested log file entries if ‘as_object’ is TRUE, or none with a console print if ‘as_object’ is FALSE


rebuild_help_htmls R Documentation

Rebuild the help files as HTML with an index

Description

Rebuild the help files as HTML with an index

Usage

rebuild_help_htmls(rebuild_book = TRUE, book = "dimspec_user_guide")

Arguments

rebuild_book

LGL scalar of whether or not to rebuild an associated bookdown document

book

Path to folder containing the bookdown document to rebuild

Value

URL to the requested book


rectify_null_from_env R Documentation

Rectify NULL values provided to functions

Description

To support redirection of sensible parameter reads from an environment, either Global or System, functions in this package may include NULL as their default value. This returns values in order of precedence: ‘parameter’, then ‘env_parameter’, then ‘default’.

Usage

rectify_null_from_env(test, test, "test")

Arguments

parameter

the object being evaluated

env_parameter

the name or object of a value to use from the environment if parameter is NULL

default

the fallback value to use if parameter is NULL and env_parameter does not exist

log_ns

the namespace to use with [log_it] if available

Value

The requested value, either as-is, rectified from the environment, or the default

Note

log_ns is only applicable if logging is set up in this project (see project settings in env_glob.txt, env_R.R, and env_logger.R for details).

Both [base::.GlobalEnv] and [base::Sys.getenv] are checked, and can be provided as a character scalar or as an object reference
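
Examples

A minimal sketch resolving a directory setting with a fallback; “LOG_DIRECTORY” is assumed to be a session environment setting:

log_dir <- rectify_null_from_env(NULL, "LOG_DIRECTORY", "logs")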


ref_table_from_map R Documentation

Get the name of a linked normalization table

Description

Extract the name of a normalization table from the database given a table and column reference.

Usage

ref_table_from_map("table1", "fk_column1", er_map(con), "references")

Arguments

table_name

CHR scalar name of the database table

table_column

CHR scalar name of the foreign key table column

this_map

LIST object containing the schema representation from ‘er_map’ (default: an object named “db_map” created as part of the package spin up)

fk_refs_in

CHR scalar name of the item in ‘this_map’ containing the SQL “REFERENCES” statements extracted from the schema

Value

CHR scalar name of the table to which a FK column is linked or an empty character string if no match is located (i.e. ‘table_column’ is not a defined foreign key).

Note

This requires an object of the same shape and properties as those resulting from [er_map] as ‘this_map’.


remove_db R Documentation

Remove an existing database

Description

This is limited to only the current working directory and its subdirectories. If you wish to retain a copy of the prior database, ensure argument ‘archive = TRUE’ (note the default is FALSE) to create a copy of the requested database prior to rebuild; this is created in the same directory as the found database, appending the current date to the file name.

Usage

remove_db("test.sqlite", archive = TRUE)

Arguments

db

CHR scalar name of the database to remove (default: session value DB_NAME)

archive

LGL scalar of whether to create an archive of the current database (if it exists) matching the name supplied in argument ‘db’ (default: FALSE)

Value

None, check console for details


remove_icon_from R Documentation

Remove the last icon attached to an HTML element

Description

Remove the last icon attached to an HTML element

Usage

remove_icon_from(id)

Arguments

id

CHR scalar of the HTML ID from which to remove the last icon

Value

CHR scalar suitable to execute with ‘shinyjs::runJS’

Examples

append_icon_to("example", "r-project", "fa-3x")
remove_icon_from("example")

remove_sample R Documentation

Delete a sample

Description

Removes a sample from the database and associated records in ms_methods, conversion_software_settings, and conversion_software_linkage. Associated peak and mass spectrometric signals will also be removed.

Usage

remove_sample(sample_ids, db_conn = con, log_ns = "db")

Arguments

sample_ids

INT vector of IDs to remove from the samples table.

db_conn

connection object (default: con)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

Value

None, executes actions on the database


repair_xl_casrn_forced_to_date R Documentation

Repair CAS RNs forced to a date numeric by MSXL

Description

If a file is opened in Microsoft Excel(R), Chemical Abstract Service (CAS) Registry Numbers (RNs) can occasionally be read as a pseudodate (e.g. “1903-02-8”). Without tight controls over column formatting, this can result in CAS RNs that are not real entering a processing pipeline. This convenience function attempts to undo that automatic formatting. Vector members whose values are unchanged when coerced to numeric are converted to a properly formatted date, with an origin depending on the operating system platform (as read by ‘.Platform$OS.type’); Windows operating systems use the Windows MSXL origin date of “1899-12-30” while others use “1904-01-01”. Text entries of “NA” are coerced to NA.

Usage

repair_xl_casrn_forced_to_date(casrn_vec, output_format = "%Y-%m-%d")

Arguments

casrn_vec

CHR or NUM vector of what should be valid CAS RNs

output_format

CHR scalar of the output date format (default: “%Y-%m-%d”)

Value

CHR vector of length equal to that of ‘casrn_vec’ where numeric entries have been coerced to the assumed date

Examples

repair_xl_casrn_forced_to_date(c("64324-08-3", "12332"))

repl_nan R Documentation

Replace NaN

Description

Replace all NaN values with a specified value

Usage

repl_nan(x, repl = NULL)

Arguments

x

vector of values

repl

value to replace NaN contained in ‘x’

Value

vector with all NaN replaced with ‘repl’
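
Examples

repl_nan(c(1, NaN, 3), repl = 0)
# returns: 1 0 3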


report_qc R Documentation

Export a QC result JSON file to PDF

Description

Export a QC result JSON file to PDF.

Usage

report_qc(
  jsonfile = file.choose(),
  outputfile = gsub(".json", ".pdf", jsonfile, ignore.case = TRUE)
)

Arguments

jsonfile

file path of the QC result JSON file

outputfile

output pdf file path

Value

None, generates a PDF report at ‘outputfile’


reset_logger_settings R Documentation

Update logger settings

Description

This is a simple action wrapper to update any settings that may have been changed with regard to logger. If, for instance, something is not logging the way you expect it to, change the relevant setting and then run reset_logger_settings() to reflect the current environment.

Usage

reset_logger_settings()

Arguments

reload

LGL scalar indicating (if TRUE) whether or not to refresh from env_R.R or (if FALSE) to use the current environment settings (e.g. for testing purposes) (default: FALSE)

Value

None


resolve_compound_aliases R Documentation

Resolve compound aliases provided as part of the import routine

Description

Call this to add any aliases for a given ‘compound_id’ that may not be present in the database. Only those identifiable as part of the accepted types defined in ‘norm_alias_table’ will be mapped. If multiple items are provided in the import's NAME or ADDITIONAL elements, or in other items matching names in the ‘norm_alias_table’ name column, indicate the split character in ‘split_multiples_by’ and any separator between names and values (e.g. “CLASS:example”) in ‘identify_property_by’.

Usage

resolve_compound_aliases(
  obj,
  compound_id,
  compounds_in = "compounddata",
  compound_alias_table = "compound_aliases",
  norm_alias_table = "norm_analyte_alias_references",
  norm_alias_name_column = "name",
  headers_to_examine = c("ADDITIONAL", "NAME"),
  split_multiples_by = ";",
  identify_property_by = ":",
  out_file = "unknown_compound_aliases.csv",
  db_conn = con,
  log_ns = "db",
  ...
)

Arguments

obj

LIST object containing data formatted from the import generator

compound_id

INT scalar of the compound_id to use for these aliases

compounds_in

CHR scalar name in ‘obj’ holding compound data (default: “compounddata”)

norm_alias_table

CHR scalar name of the table normalizing analyte alias references (default: “norm_analyte_alias_references”)

norm_alias_name_column

CHR scalar name of the column in ‘norm_alias_table’ containing the human-readable expression of alias type classes (default: “name”)

...

Named list of any additional aliases to tack on that are not found in the import object, with names matching those found in ‘norm_alias_table’.‘norm_alias_name_column’

Value

None, though if unclassifiable aliases (those with alias types not present in the normalization table) are found, they will be written to a file (‘out_file’) in the project directory

Note

Existing aliases, and aliases for which there is no ‘compound_id’ will be ignored and not imported.

Compound IDs provided in ‘compound_id’ must be present in the compounds table and must be provided explicitly on a 1:1 basis for each element extracted from ‘obj’. If you provide an import object with 10 components for compound data, you must provide matching ‘compound_id’ identifiers for each. If all extracted components represent aliases for the same ‘compound_id’, then a single value may be provided.

Alias types (e.g. “InChI”) are case insensitive


resolve_compound_fragments R Documentation

Link together peaks, fragments, and compounds

Description

This function links together the peaks, annotated_fragments, and compounds tables. This serves as the main connection table conceptually tying together peaks, the fragments annotated within those peaks, and the compound identification associated with the peaks. The database supports flexible assignment wherein compounds may be related to either peaks or annotated fragments, or both, and vice versa. At least two IDs are required for linkage; i.e. compounds may not have an associated peak in the database, but may be known to produce fragments at a particular m/z value. Ideally, all three are provided, giving traceback from compounds, a complete list of their annotated fragments, and association with a peak object whose data contain unannotated fragments, which can be traced back to the sample from which it was drawn and the associated metrological method information.

Usage

resolve_compound_fragments(
  values = NULL,
  peak_id = NA,
  annotated_fragment_id = NA,
  compound_id = NA,
  linkage_table = "compound_fragments",
  peaks_table = "peaks",
  annotated_fragments_table = "annotated_fragments",
  compounds_table = "compounds",
  db_conn = con,
  log_ns = "db"
)

Arguments

values

LIST item containing items for ‘peak_id’, ‘annotated_fragment_id’, and ‘compound_id’ (default: NULL); used preferentially if provided

peak_id

INT vector (ideally of length 1) of the peak ID(s) to link; ignored if ‘values’ is provided (default: NA)

annotated_fragment_id

INT vector of fragment ID(s) to link; ignored if ‘values’ is provided (default: NA)

compound_id

INT vector of compound ID(s) to link; ignored if ‘values’ is provided (default: NA)

linkage_table

CHR scalar name of the database table containing linkages between peaks, fragments, and compounds (default: “compound_fragments”)

peaks_table

CHR scalar name of the database table containing peaks for look up (default: “peaks”)

compounds_table

CHR scalar name of the table holding compound information

db_conn

connection object (default: con)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

annotated_fragments_table

CHR scalar name of the table holding annotated fragment information

Value

None; validates entries and executes database actions


resolve_compounds R Documentation

Resolve the compounds node during bulk import

Description

Call this function as part of an import routine to resolve the compounds node.

Usage

resolve_compounds(
  obj,
  compounds_in = "compounddata",
  compounds_table = "compounds",
  compound_category = NULL,
  compound_category_table = "compound_categories",
  compound_alias_table = "compound_aliases",
  norm_alias_table = "norm_analyte_alias_references",
  norm_alias_name_column = "name",
  NIST_id_in = "id",
  require_all = FALSE,
  import_map = IMPORT_MAP,
  ensure_unique = TRUE,
  db_conn = con,
  log_ns = "db"
)

Arguments

obj

LIST object containing data formatted from the import generator

compounds_in

CHR scalar name in ‘obj’ holding compound data (default: “compounddata”)

compounds_table

CHR scalar name of the database table holding compound data (default: “compounds”)

compound_category

CHR or INT scalar of the compound category (either a direct ID or a matching category label in ‘compound_category_table’) (default: NULL)

compound_category_table

CHR scalar name of the database table holding normalized compound categories (default: “compound_categories”)

norm_alias_table

CHR scalar name of the table normalizing analyte alias references (default: “norm_analyte_alias_references”)

norm_alias_name_column

CHR scalar name of the column in ‘norm_alias_table’ containing the human-readable expression of alias type classes (default: “name”)

require_all

LGL scalar of whether to require all columns (except the assumed primary key column of “id”) or only those defined as “NOT NULL” (default: FALSE; TRUE requires the presence of all columns in the table)

import_map

data.frame object of the import map (e.g. from a CSV)

ensure_unique

LGL scalar of whether or not to first check that the values provided form a new unique record (default: TRUE)

db_conn

connection object (default: con)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

Value

INT scalar if successful, result of the call to [add_or_get_id] otherwise

Note

This function is called as part of [full_import()]


resolve_description_NTAMRT R Documentation

Resolve the method description tables during import

Description

Two tables (and their associated normalization tables) exist in the database to store additional information about mass spectrometric and chromatographic methods. These tables are “ms_descriptions” and “chromatography_descriptions” and cannot be easily mapped directly. This function serves to coerce values supplied during import into the form required by the database. Primarily, the issue rests in the need to support multiple descriptions of analytical instrumentation (e.g. multiple mass analyzer types, multiple vendors, multiple separation columns, etc.). Tables targeted by this function are “long” tables that may well have ‘n’ records for each mass spectrometric method.

Usage

resolve_description_NTAMRT(
  obj,
  method_id,
  type = c("massspec", "chromatography"),
  mass_spec_in = "massspectrometry",
  chrom_spec_in = "chromatography",
  db_conn = con,
  fuzzy = TRUE,
  log_ns = "db"
)

Arguments

obj

LIST object containing data formatted from the import generator

method_id

INT scalar of the ms_method.id record to associate

type

CHR scalar, one of “massspec” or “chromatography” depending on the type of description to add; much of the logic is shared, only details differ

mass_spec_in

CHR scalar name of the element in ‘obj’ holding mass spectrometry information (default: “massspectrometry”)

chrom_spec_in

CHR scalar name of the element in ‘obj’ holding chromatographic information (default: “chromatography”)

log_ns

CHR scalar of the logging namespace to use during execution (default: “db”)

Value

None, executes actions on the database

Note

This function is called as part of [full_import()]

This function is brittle; built specifically for the NIST NTA MRT import format. If using a different import format, customize to your needs using this function as a guide.


resolve_fragments_NTAMRT R Documentation

Resolve the fragments node during database import

Description

Call this function as part of an import routine to resolve the fragments node including fragment inspections and aliases. If the python connection to RDKit is available and no aliases are provided, aliases as defined in ‘rdkit_aliases’ will be generated and stored if ‘generate_missing_aliases’ is set to TRUE. Components of the import file will be collated, have their values normalized, and any new fragment identifiers will be added to the database.

Usage

resolve_fragments_NTAMRT(
  obj,
  sample_id = NULL,
  generation_type = NULL,
  fragments_in = "annotation",
  fragments_table = "annotated_fragments",
  fragments_norm_table = ref_table_from_map(fragments_table, "fragment_id"),
  fragments_sources_table = "fragment_sources",
  citation_info_in = "fragment_citation",
  inspection_info_in = "fragment_inspections",
  inspection_table = "fragment_inspections",
  generate_missing_aliases = FALSE,
  fragment_aliases_in = "fragment_aliases",
  fragment_aliases_table = "fragment_aliases",
  alias_type_norm_table = ref_table_from_map(fragment_aliases_table, "alias_type"),
  inchi_prefix = "InChI=1S/",
  rdkit_name = ifelse(exists("PYENV_NAME"), PYENV_NAME, "rdkit"),
  rdkit_ref = ifelse(exists("PYENV_REF"), PYENV_REF, "rdk"),
  rdkit_ns = "rdk",
  rdkit_make_if_not = TRUE,
  rdkit_aliases = c("Inchi", "InchiKey"),
  mol_to_prefix = "MolTo",
  mol_from_prefix = "MolFrom",
  type = "smiles",
  import_map = IMPORT_MAP,
  case_sensitive = FALSE,
  fuzzy = FALSE,
  strip_na = TRUE,
  db_conn = con,
  log_ns = "db"
)

Arguments

obj

LIST object containing data formatted from the import generator

sample_id

INT scalar matching a sample ID to which to tie these fragments (optional, default: NULL)

generation_type

CHR scalar containing the generation type as defined in the “norm_generation_type” table (default: NULL will obtain the generation type attached to the ‘sample_id’ by database lookup)

fragments_in

CHR scalar name of the ‘obj’ component holding annotated fragment information (default: “annotation”)

fragments_table

CHR scalar name of the database table holding annotated fragment information (default: “annotated_fragments”)

fragments_norm_table

CHR scalar name of the database table holding normalized fragment identities (default: obtains this from the result of a call to [er_map] with the table name from ‘fragments_table’)

fragments_sources_table

CHR scalar name of the database table holding fragment source (e.g. generation) information (default: “fragment_sources”)

citation_info_in

CHR scalar name of the ‘obj’ component holding fragment citation information (default: “fragment_citation”)

inspection_info_in

CHR scalar name of the ‘obj’ component holding fragment inspection information (default: “fragment_inspections”)

inspection_table

CHR scalar name of the database table holding fragment inspection information (default: “fragment_inspections”)

generate_missing_aliases

LGL scalar determining whether or not to generate machine readable expressions (e.g. InChI) for fragment aliases from RDKit (requires RDKit activation; default: FALSE); see formals list for [add_rdkit_aliases]

fragment_aliases_in

CHR scalar name of the ‘obj’ component holding fragment aliases (default: “fragment_aliases”)

fragment_aliases_table

CHR scalar name of the database table holding fragment aliases (default: “fragment_aliases”)

rdkit_ref

CHR scalar OR R object of an RDKit binding (default NULL goes to “rdk” for convenience with other pipelines in this project)

mol_to_prefix

CHR scalar of the prefix to identify an RDKit function to create an alias from a mol object (default: “MolTo”)

mol_from_prefix

CHR scalar of the prefix to identify an RDKit function to create a mol object from ‘identifiers’ (default: “MolFrom”)

type

CHR scalar of the type of encoding to use for ‘identifiers’ (default: smiles)

import_map

data.frame object of the import map (e.g. from a CSV)

case_sensitive

LGL scalar of whether to match on a case sensitive basis, where TRUE searches for values as provided and FALSE also attempts upper, lower, sentence, and title case matches (default: FALSE)

fuzzy

LGL scalar of whether to do a “fuzzy” match in the sense that values provided are wrapped in an SQL LIKE clause bookended with wildcards; overrides the ‘case_sensitive’ setting if TRUE (default: FALSE).

db_conn

connection object (default: con)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

alias_type_norm_table

CHR scalar name of the database table holding normalized fragment alias type identities (default: obtains this from the result of a call to [er_map] with the table name from ‘fragment_aliases_table’)

Details

Fragments missing structure annotation are supported (e.g. those with a formula but no SMILES notation provided).

For new fragments, the calculated molecular mass is generated by [calculate.monoisotope] from exact masses of each constituent atom. If RDKit is available and a SMILES notation is provided, the formal molecular net charge is also calculated via rdkit.Chem.GetFormalCharge.

Database tables affected by resolving the fragments node include: annotated_fragments, norm_fragments, fragment_inspections, fragment_aliases, and fragment_sources.

Value

INT vector of resolved annotated fragment IDs; executes database actions

Note

This function is called as part of [full_import()]

If components named in ‘citation_info_in’ and ‘inspection_info_in’ do not exist, that information will not be appended to the resulting database records.

Typical usage as part of the import workflow involves simply passing the import object and associated sample id: resolve_fragments_NTAMRT(obj = import_object, sample_id = 1), though wrapper functions like [full_import] also contain name-matched arguments to be passed in a [do.call] context.
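
Examples

A minimal sketch of typical usage during import, assuming ‘import_object’ is a hypothetical object produced by the import generator and ‘con’ is an open database connection:

## import_object (hypothetical) is the output of the import generator
if (interactive()) {
  fragment_ids <- resolve_fragments_NTAMRT(
    obj = import_object,
    sample_id = 1,
    db_conn = con
  )
}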


resolve_method R Documentation

Add an ms_method record via import

Description

Part of the data import routine. Adds a record to the “ms_methods” table with the values provided in the JSON import template. Makes extensive uses of [resolve_normalization_value] to parse foreign key relationships.

Usage

resolve_method(
  obj,
  method_in = "massspectrometry",
  ms_methods_table = "ms_methods",
  db_conn = con,
  ensure_unique = TRUE,
  log_ns = "db",
  qc_method_in = "qcmethod",
  qc_search_text = "QC Method Used",
  qc_value_in = "value",
  require_all = TRUE,
  import_map = IMPORT_MAP,
  ...
)

Arguments

obj

LIST object containing data formatted from the import generator

method_in

CHR scalar name of the ‘obj’ list containing method information

ms_methods_table

CHR scalar name of the database table containing method information

db_conn

connection object (default: con)

ensure_unique

LGL scalar of whether or not to first check that the values provided form a new unique record (default: TRUE)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

qc_method_in

CHR scalar name of the import object element containing QC method information (default: “qcmethod”)

qc_search_text

CHR scalar name of an element in the import object in part ‘qc_method_in’ identifying whether or not a QC method was used (default: “QC Method Used”)

qc_value_in

CHR scalar name of an element in the import object corresponding to ‘qc_method_in’ where the value of the metric named for ‘qc_search_text’ is located (default: “value”)

require_all

LGL scalar of whether to require all columns (except the assumed primary key column of “id”) or only those defined as “NOT NULL” (default: TRUE requires the presence of all columns in the table)

import_map

data.frame object of the import map (e.g. from a CSV)

…

Other named elements to be appended to “ms_methods” as necessary for workflow resolution; these can be used to pass defaults or additional values.

Value

INT scalar if successful, result of the call to [add_or_get_id] otherwise

Note

This function is called as part of [full_import()]
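
Examples

A minimal sketch, assuming ‘import_object’ is a hypothetical object from the import generator containing a “massspectrometry” component and ‘con’ is an open database connection:

## Add (or find) the ms_methods record described by the import object
if (interactive()) {
  method_id <- resolve_method(obj = import_object, db_conn = con)
}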


resolve_mobile_phase_NTAMRT R Documentation

Resolve the mobile phase node

Description

The database node containing chromatographic method information is able to handle any number of descriptive aspects regarding chromatography. It houses normalized and aliased data in a manner that maximizes flexibility, allowing any number of carrier agents (e.g. gases for GC, solvents for LC) to be described in increasing detail. To accommodate that, the structure itself may be unintuitive and may not map well to flat formats, as records may be heavily nested.

Usage

resolve_mobile_phase_NTAMRT(
  obj,
  method_id,
  sample_id,
  peak_id,
  carrier_mix_names = NULL,
  id_mix_by = "^mp*[0-9]+",
  ms_methods_table = "ms_methods",
  sample_table = "samples",
  peak_table = "peaks",
  db_conn = con,
  mix_collection_table = "carrier_mix_collections",
  mobile_phase_props = list(in_item = "chromatography", db_table = "mobile_phases", props
    = c(flow = "flow", flow_units = "flowunits", duration = "duration", duration_units =
    "durationunits")),
  carrier_props = list(db_table = "carrier_mixes", norm_by = "norm_carriers", alias_in =
    "carrier_aliases", props = c(id_by = "solvent", fraction_by = "fraction")),
  additive_props = list(db_table = "carrier_additives", norm_by = "norm_additives",
    alias_in = "additive_aliases", props = c(id_by = "add$", amount_by = "_amount",
    units_by = "_units")),
  exclude_values = c("none", "", NA),
  fuzzy = TRUE,
  clean_up = TRUE,
  log_ns = "db"
)

Arguments

obj

LIST object containing data formatted from the import generator

method_id

INT scalar of the method id (e.g. from the import workflow)

sample_id

INT scalar of the sample id (e.g. from the import workflow)

peak_id

INT scalar of the peak id (e.g. from the import workflow)

carrier_mix_names

CHR vector (optional) of carrier mix collection names to assign, the length of which should equal 1 or the length of discrete carrier mixtures; the default, NULL, will automatically assign names as a function of the method and sample id.

id_mix_by

CHR scalar regex used to identify the elements of ‘obj’ that define the grouping of carrier mix collections for the mobile phase node (default: “^mp*[0-9]+”); this is the main piece of connectivity pulling together the descriptions and should only be changed to match different import naming schemes

ms_methods_table

CHR scalar name of the methods table (default: “ms_methods”)

sample_table

CHR scalar name of the samples table (default: “samples”)

peak_table

CHR scalar name of the peaks table (default: “peaks”)

db_conn

existing connection object (e.g. of class “SQLiteConnection”)

mix_collection_table

CHR scalar name of the mix collections table (default: “carrier_mix_collections”)

mobile_phase_props

LIST object describing how to import the mobile phase table containing: in_item: CHR scalar name of the ‘obj’ component containing chromatographic information (default: “chromatography”); db_table: CHR scalar name of the mobile phases table (default: “mobile_phases”); props: named CHR vector of name mappings with names equal to database columns in ‘mobile_phase_props$db_table’ and values containing regex to match names in ‘obj[[mobile_phase_props$in_item]]’

carrier_props

LIST object describing how to import the carrier mixes table containing: db_table: CHR scalar name of the carrier mixes table (default: “carrier_mixes”); norm_by: CHR scalar name of the table used to normalize carriers (default: “norm_carriers”); alias_in: CHR scalar name of the table containing carrier aliases to search (default: “carrier_aliases”); props: named CHR vector of name mappings with names equal to database columns in ‘carrier_props$db_table’ and values containing regex to match names in ‘obj[[mobile_phase_props$in_item]]’, including an element named ‘id_by’ containing regex used to match names in the import object that indicate a carrier (e.g. “solvent”)

additive_props

LIST object describing how to import the carrier additives table containing: db_table: CHR scalar name of the carrier additives table (default: “carrier_additives”); norm_by: CHR scalar name of the table used to normalize additives (default: “norm_additives”); alias_in: CHR scalar name of the table containing additive aliases to search (default: “additive_aliases”); props: named CHR vector of name mappings with names equal to database columns in ‘additive_props$db_table’ and values containing regex to match names in ‘obj[[mobile_phase_props$in_item]]’, including an element named ‘id_by’ containing regex used to match names in the import object that indicate an additive (e.g. names terminating in “add”)

exclude_values

CHR vector indicating which values to ignore in ‘obj’ (default: c(“none”, “”, NA))

fuzzy

LGL scalar of whether to do a “fuzzy” match in the sense that values provided are wrapped in an SQL LIKE clause bookended with wildcards; overrides the ‘case_sensitive’ setting if TRUE (default: TRUE).

clean_up

LGL scalar determining whether or not to clean up the ‘mix_collection_table’ by removing just-added records if there are errors adding to ‘carrier_props$db_table’ (default: TRUE)

log_ns

CHR scalar of the logging namespace to use during execution (default: “db”)

Details

The mobile phase node contains one record in table “mobile_phases” for each method id, sample id, and carrier mix collection id, with its associated flow rate, normalized flow units, duration, and normalized duration units. Each carrier mix collection has a name and child tables containing records for each value-normalized carrier component and its unit fraction (e.g. in carrier_mixes: Helium, 1 would indicate pure helium as a carrier gas in GC work, while Water, 0.9 and Methanol, 0.1 would indicate a solvent mixture of 10% methanol in water), as well as value-normalized carrier additives, their amount, and the units for that amount (mostly for LC work; e.g. in carrier_additives: ammonium acetate, 5, mMol would indicate an additive to a solvent of 5 mMol ammonium acetate); these are linked through the carrier mix collection id.

Call this function to import the results of the NIST Non-Targeted Analysis Method Reporting Tool (NTA MRT), or feed it as ‘obj’ a flat list containing chromatography information.

Value

None, executes actions on the database

Note

This is a brittle function, and should only be used as part of the NTA MRT import process, or as a template for how to import data.

Some arguments are complicated by design to keep conceptual information together. These should be fed a structured list matching expectations. This applies to ‘mobile_phase_props’, ‘carrier_props’, and ‘additive_props’. See defaults in documentation for examples.

Database insertions are done in real time, so failures may result in hanging or orphaned records. Turn on ‘clean_up’ to roll back by removing entries from ‘mix_collection_table’ and relying on delete cascades built into the database. Additional names are provided here to match the schema.

This function is called as part of [full_import()]
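
Examples

A minimal sketch as one step of the import workflow, assuming ‘import_object’ came from the NTA MRT import generator and that ‘method_id’, ‘sample_id’, and ‘peak_id’ were resolved by earlier steps:

## Resolve the mobile phase node for an already-resolved method/sample/peak
if (interactive()) {
  resolve_mobile_phase_NTAMRT(
    obj = import_object,
    method_id = method_id,
    sample_id = sample_id,
    peak_id = peak_id,
    db_conn = con
  )
}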


resolve_ms_data R Documentation

Resolve and store mass spectral data during import

Description

Use peak IDs generated by the import workflow to assign and store mass spectral data (if coming from the NIST NTA Method Reporting Tool, these will all be in the “separated” format). Optionally also calls [resolve_ms_spectra] if unpack_spectra = TRUE. Mass spectral data are stored in either one (“zipped”) or two (“separated”) columns; see [resolve_ms_spectra] for details of the two formats.

Usage

resolve_ms_data(
  obj,
  peak_id = NULL,
  peaks_table = "peaks",
  ms_data_in = "msdata",
  ms_data_table = "ms_data",
  unpack_spectra = FALSE,
  ms_spectra_table = "ms_spectra",
  unpack_format = c("separated", "zipped"),
  as_object = FALSE,
  import_map = IMPORT_MAP,
  db_conn = con,
  log_ns = "db"
)

Arguments

obj

LIST object containing data formatted from the import generator

peak_id

INT scalar of the peak ID in question, which must be present

peaks_table

CHR scalar name of the peaks table in the database (default: “peaks”)

ms_data_in

CHR scalar of the named component of ‘obj’ holding mass spectral data (default: “msdata”)

ms_data_table

CHR scalar name of the table holding packed spectra in the database (default: “ms_data”)

unpack_spectra

LGL scalar indicating whether or not to unpack spectral data to a long format (i.e. all masses and intensities will become a single record) in the table defined by ‘ms_spectra_table’ (default: FALSE)

ms_spectra_table

CHR scalar name of the table holding long form spectra in the database (default: “ms_spectra”)

unpack_format

CHR scalar of the type of data packing for the spectra, one of “separated” (default) or “zipped”

as_object

LGL scalar indicating whether or not to return the result to the session as an object (TRUE) or to add it to the database (default: FALSE)

import_map

data.frame object of the import map (e.g. from a CSV)

db_conn

connection object (default: con)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

Value

If ‘as_object’ == TRUE, a data.frame object containing either packed (if ‘unpack_spectra’ == FALSE) or unpacked (if ‘unpack_spectra’ == TRUE) spectra, otherwise adds spectra to the database

Note

This function is called as part of [full_import()] during the call to [resolve_peaks]
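
Examples

A minimal sketch, assuming ‘import_object’ (hypothetical) holds an “msdata” component, peak id 1 exists, and ‘con’ is an open database connection:

## Store the packed spectra and also unpack them to the long-format table
if (interactive()) {
  resolve_ms_data(
    obj = import_object,
    peak_id = 1,
    unpack_spectra = TRUE,
    db_conn = con
  )
}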


resolve_ms_spectra R Documentation

Unpack mass spectral data in compressed format

Description

For some spectra, searching in a long form is much more performant. Use this function to unpack data already present in the ‘ms_data’ table into the ‘ms_spectra’ table. Data should be packed in one of two ways, either two columns for mass-to-charge ratio and intensity (“separated” - see [ms_spectra_separated]) or in a single column with interleaved data (“zipped” - see [ms_spectra_zipped]).

Usage

resolve_ms_spectra(
  peak_id,
  spectra_data = NULL,
  peaks_table = "peaks",
  ms_data_table = "ms_data",
  ms_spectra_table = "ms_spectra",
  unpack_format = c("separated", "zipped"),
  as_object = FALSE,
  db_conn = con,
  log_ns = "db"
)

Arguments

peak_id

INT scalar of the peak ID in question, which must be present

spectra_data

data.frame object containing spectral data

peaks_table

CHR scalar name of the peaks table in the database (default: “peaks”)

ms_data_table

CHR scalar name of the table holding packed spectra in the database (default: “ms_data”)

ms_spectra_table

CHR scalar name of the table holding long form spectra in the database (default: “ms_spectra”)

unpack_format

CHR scalar of the type of data packing for the spectra, one of “separated” (default) or “zipped”

as_object

LGL scalar of whether to return the unpacked spectra to the session (TRUE) or to insert them into the database (default: FALSE)

db_conn

database connection object (default: con)

log_ns

CHR scalar name of the logging namespace to use (default: “db”)

Value

If ‘as_object’ == TRUE, a data.frame of unpacked spectra, otherwise no return and a database insertion will be performed

Note

This function may be slow, especially with peaks containing a large number of scans or a large amount of data

References

ms_spectra_separated

ms_spectra_zipped
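
Examples

A minimal sketch, assuming peak id 1 exists in the “peaks” table and ‘con’ is an open database connection; here the unpacked spectrum is returned to the session rather than inserted:

## Unpack the stored spectrum for peak 1 into a long-form data.frame
if (interactive()) {
  spectrum <- resolve_ms_spectra(peak_id = 1, as_object = TRUE, db_conn = con)
}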


resolve_multiple_values R Documentation

Utility function to resolve multiple choices interactively

Description

This function is generally not called directly, but rather as a workflow component from within [resolve_normalization_value] during interactive sessions to get feedback from users during the normalization value resolution process.

Usage

resolve_multiple_values(values, search_value, as_regex = FALSE, db_table = "")

Arguments

values

CHR vector of possible values

search_value

CHR scalar of the value to search

as_regex

LGL scalar of whether to treat ‘search_value’ as a regular expression string (TRUE) or to use it directly (FALSE, default)

db_table

CHR scalar name of the database table to search, used for printing log messages only (default: ““)

Value

CHR scalar result of the user’s choice


resolve_normalization_value R Documentation

Resolve a normalization value against the database

Description

Normalized SQL databases often need to resolve primary keys. This function checks for a given value in a given table and either returns the matching index value or, if the value is not found and ‘interactive()’ is TRUE, adds that value to the table and returns the new index value. It looks for the first matching value in all columns of the requested table to support loose finding of identifiers and is meant to operate only on normalization tables (i.e. lookup tables).

Usage

resolve_normalization_value(
  this_value,
  db_table,
  id_column = "id",
  case_sensitive = FALSE,
  fuzzy = FALSE,
  db_conn = con,
  log_ns = "db",
  ...
)

Arguments

this_value

CHR (or coercible to) scalar value to look up

db_table

CHR scalar of the database table to search

id_column

CHR scalar name of the assumed primary key column in ‘db_table’ (default: “id”)

case_sensitive

LGL scalar of whether to match on a case sensitive basis, where TRUE searches for values as provided and FALSE also attempts upper, lower, sentence, and title case matches (default: FALSE)

fuzzy

LGL scalar of whether to do a “fuzzy” match in the sense that values provided are wrapped in an SQL LIKE clause bookended with wildcards; overrides the ‘case_sensitive’ setting if TRUE (default: FALSE).

db_conn

connection object (default: con)

log_ns

CHR scalar of the logging namespace to use during execution (default: “db”)

…

Other values to add to the normalization table, where names must match the table schema

Details

The search itself is done using [check_for_value].

Value

The database primary key (typically INT) of the normalized value

Note

This is mostly a DRY convenience function to avoid having to write the lookup and add logic each time.

Interactive sessions are required to add new values
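
Examples

A minimal sketch, assuming ‘con’ is an open database connection; the table name “norm_ionization” and value “ESI” are hypothetical:

## "norm_ionization" and "ESI" are hypothetical; returns the matching id,
## adding the value interactively if it is absent
if (interactive()) {
  ionization_id <- resolve_normalization_value(
    this_value = "ESI",
    db_table = "norm_ionization",
    db_conn = con
  )
}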


resolve_peak_ums_params R Documentation

Resolve and import optimal uncertain mass spectrum parameters

Description

This imports the defined object component containing parameters for the optimized uncertainty mass spectrum used to compare with new data. This function may be called at any time to add data for a given peak, but there is no unique row restriction on the underlying table, so it is best used in a “one pass” manner during the import routine. These parameters are calculated as part of NIST QA procedures and are added to the output of the NTA MRT after those JSONs have been created.

Usage

resolve_peak_ums_params(
  obj,
  peak_id,
  ums_params_in = "opt_ums_params",
  ums_params_table = "opt_ums_params",
  db_conn = con,
  log_ns = "db"
)

Arguments

obj

LIST object containing data formatted from the import generator

peak_id

INT scalar of the peak ID in question, which must be present (e.g. from the import workflow)

ums_params_in

CHR scalar name of the item in ‘obj’ containing optimized uncertainty mass spectrum parameters

ums_params_table

CHR scalar name of the database table holding optimized uncertainty mass spectrum parameters

log_ns

CHR scalar of the logging namespace to use during execution (default: “db”)

Value

Nothing if successful, a data frame object of the extracted parameters otherwise.

Note

This function is called as part of [resolve_peaks()]


resolve_peaks R Documentation

Resolve the peaks node during import

Description

Call this function to resolve and insert information for the “peaks” node in the database including software conversion settings (via [resolve_software_settings_NTAMRT]) and mass spectra data (via [resolve_ms_data] and, optionally, [resolve_ms_spectra]). This function relies on the import object being formatted appropriately.

Usage

resolve_peaks(
  obj,
  sample_id,
  peaks_table = "peaks",
  software_timestamp = NULL,
  software_settings_in = "msconvertsettings",
  ms_data_in = "msdata",
  ms_data_table = "ms_data",
  unpack_spectra = FALSE,
  unpack_format = c("separated", "zipped"),
  ms_spectra_table = "ms_spectra",
  linkage_table = "conversion_software_peaks_linkage",
  settings_table = "conversion_software_settings",
  as_date_format = "%Y-%m-%d %H:%M:%S",
  format_checks = c("ymd_HMS", "ydm_HMS", "mdy_HMS", "dmy_HMS"),
  min_datetime = "2000-01-01 00:00:00",
  import_map = IMPORT_MAP,
  ums_params_in = "opt_ums_params",
  ums_params_table = "opt_ums_params",
  db_conn = con,
  log_ns = "db"
)

Arguments

obj

LIST object containing data formatted from the import generator

sample_id

INT scalar of the sample id (e.g. from the import workflow)

peaks_table

CHR scalar name of the database table holding peak information (default: “peaks”)

ms_data_in

CHR scalar of the named component of ‘obj’ holding mass spectral data (default: “msdata”)

ms_data_table

CHR scalar name of the table holding packed spectra in the database (default: “ms_data”)

unpack_spectra

LGL scalar indicating whether or not to unpack spectral data to a long format (i.e. all masses and intensities will become a single record) in the table defined by ‘ms_spectra_table’ (default: FALSE)

unpack_format

CHR scalar of the type of data packing for the spectra, one of “separated” (default) or “zipped”

ms_spectra_table

CHR scalar name of the table holding long form spectra in the database (default: “ms_spectra”)

import_map

data.frame object of the import map (e.g. from a CSV)

ums_params_in

CHR scalar name of the item in ‘obj’ containing optimized uncertainty mass spectrum parameters

ums_params_table

CHR scalar name of the database table holding optimized uncertainty mass spectrum parameters

db_conn

Connection object (default: con)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

Value

INT scalar of the newly inserted or identified peak ID(s)

Note

This function is called as part of [full_import()]

This function relies on an import map
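
Examples

A minimal sketch, assuming ‘import_object’ (hypothetical) and ‘sample_id’ come from earlier steps of the import workflow, IMPORT_MAP is defined in the session, and ‘con’ is an open database connection:

## Resolve the peaks node and collect the resulting peak ID(s)
if (interactive()) {
  peak_id <- resolve_peaks(obj = import_object, sample_id = sample_id, db_conn = con)
}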


resolve_qc_data_NTAMRT R Documentation

Resolve and import quality control data for import

Description

This imports the defined object component containing QC data (i.e. a nested list of multiple quality control checks) from the NIST Non-Targeted Analysis Method Reporting Tool (NTA MRT).

Usage

resolve_qc_data_NTAMRT(
  obj,
  peak_id,
  qc_data_in = "qc",
  qc_data_table = "qc_data",
  peaks_table = "peaks",
  ignore = FALSE,
  db_conn = con,
  log_ns = "db"
)

Arguments

obj

LIST object containing data formatted from the import generator

peak_id

INT vector of the peak ids (e.g. from the import workflow)

qc_data_in

CHR scalar name of the component in ‘obj’ containing QC data (default: “qc”)

qc_data_table

CHR scalar name of the database table holding QC data (default: “qc_data”)

peaks_table

CHR scalar name of the database table holding peaks data (default: “peaks”)

log_ns

CHR scalar of the logging namespace to use during execution (default: “db”)

Value

None, executes actions on the database

Note

This function is called as part of [full_import()]


resolve_qc_methods_NTAMRT R Documentation

Resolve and import quality control method information

Description

This imports the defined object component containing QC method information (i.e. a data frame of multiple quality control checks) from the NIST Non-Targeted Analysis Method Reporting Tool (NTA MRT).

Usage

resolve_qc_methods_NTAMRT(
  obj,
  peak_id,
  qc_method_in = "qcmethod",
  qc_method_table = "qc_methods",
  qc_method_norm_table = "norm_qc_methods_name",
  qc_method_norm_reference = "norm_qc_methods_reference",
  qc_references_in = "source",
  peaks_table = "peaks",
  ignore = FALSE,
  db_conn = con,
  log_ns = "db"
)

Arguments

obj

LIST object containing data formatted from the import generator

peak_id

INT vector of the peak ids (e.g. from the import workflow)

qc_method_in

CHR scalar of the name in ‘obj’ that contains QC method check information (default: “qcmethod”)

qc_method_table

CHR scalar of the database table name holding QC method check information (default: “qc_methods”)

qc_method_norm_table

CHR scalar name of the database table normalizing QC methods type (default: “norm_qc_methods_name”)

qc_method_norm_reference

CHR scalar name of the database table normalizing QC methods reference type (default: “norm_qc_methods_reference”)

qc_references_in

CHR scalar of the name in ‘obj[[qc_method_in]]’ that contains the reference or citation for the QC protocol (default: “source”)

peaks_table

CHR scalar name of the database table holding peak information (default: “peaks”)

log_ns

CHR scalar of the logging namespace to use during execution (default: “db”)

Value

None, executes actions on the database

Note

This function is called as part of [full_import()]


resolve_sample R Documentation

Add a sample via import

Description

Part of the data import routine. Adds a record to the “samples” table with the values provided in the JSON import template. Uses [verify_sample_class] and [verify_contributor] to parse foreign key relationships, [resolve_method] to add a record to ms_methods to get the proper id, and [resolve_software_settings_NTAMRT] to insert records into and get the proper conversion software linkage id from tables “conversion_software_settings” and “conversion_software_linkage” if appropriate.

Usage

resolve_sample(
  obj,
  db_conn = con,
  method_id = NULL,
  sample_in = "sample",
  sample_table = "samples",
  generation_type = NULL,
  generation_type_default = "empirical",
  generation_type_norm_table = "norm_generation_type",
  import_map = IMPORT_MAP,
  ensure_unique = TRUE,
  require_all = TRUE,
  fuzzy = FALSE,
  case_sensitive = TRUE,
  log_ns = "db",
  ...
)

Arguments

obj

LIST object containing data formatted from the import generator

db_conn

connection object (default: con)

method_id

INT scalar of the associated ms_methods record id

sample_in

CHR scalar of the import object name storing sample data (default: “sample”)

sample_table

CHR scalar name of the database table holding sample information (default: “samples”)

generation_type

CHR scalar of the type of data generated for this sample (e.g. “empirical” or “in silico”). The default (NULL) will assign based on ‘generation_type_default’; any other value will override the default value and be checked against values in ‘generation_type_norm_table’

generation_type_default

CHR scalar naming the default data generation type (default: “empirical”)

generation_type_norm_table

CHR scalar name of the database table normalizing sample generation type (default: “norm_generation_type”)

import_map

data.frame object of the import map (e.g. from a CSV)

ensure_unique

LGL scalar of whether or not to first check that the values provided form a new unique record (default: TRUE)

require_all

LGL scalar of whether to require all columns (except the assumed primary key column of “id”) or only those defined as “NOT NULL” (default: TRUE requires the presence of all columns in the table)

fuzzy

LGL scalar of whether to do a “fuzzy” match in the sense that values provided are wrapped in an SQL LIKE clause bookended with wildcards; overrides the ‘case_sensitive’ setting if TRUE (default: FALSE).

case_sensitive

LGL scalar of whether to match on a case sensitive basis, where TRUE searches for values as provided and FALSE also attempts upper, lower, sentence, and title case matches (default: TRUE)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

…

Other named elements to be appended to “samples” as necessary for workflow resolution; these can be used to pass defaults or additional values.

Value

INT scalar if successful, result of the call to [add_or_get_id] otherwise

Note

This function is called as part of [full_import()]
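
Examples

A minimal sketch, assuming ‘import_object’ (hypothetical) contains a “sample” component and ‘method_id’ was returned by an earlier call to [resolve_method]:

## Add (or find) the sample record tied to the resolved method
if (interactive()) {
  sample_id <- resolve_sample(
    obj = import_object,
    method_id = method_id,
    db_conn = con
  )
}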


resolve_sample_aliases R Documentation

Resolve and import sample aliases

Description

Call this function to attach sample aliases to a sample record in the database. This can be done either through the import object with a name reference or directly by assigning additional values.

Usage

resolve_sample_aliases(
  sample_id,
  obj = NULL,
  aliases_in = NULL,
  values = NULL,
  db_table = "sample_aliases",
  db_conn = con,
  log_ns = "db"
)

Arguments

sample_id

INT scalar of the sample id (e.g. from the import workflow)

obj

(optional) LIST object containing data formatted from the import generator (default: NULL)

aliases_in

(optional) CHR scalar of the name in ‘obj’ containing the sample aliases in list format (default: NULL)

values

(optional) LIST containing the sample aliases with names as the alias name and values containing the reference (e.g. URI, link to a containing repository, or reference to the owner or project from which a sample is drawn) to that alias

db_table

CHR scalar name of the database table containing sample aliases (default: “sample_aliases”)

log_ns

CHR scalar of the logging namespace to use during execution (default: “db”)

Value

None, executes actions on the database

Note

This function is called as part of [full_import()]

One of ‘values’ or both of ‘obj’ and ‘aliases_in’ must be provided to add new sample aliases.
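
Examples

A minimal sketch of direct assignment, assuming sample id 1 exists and ‘con’ is an open database connection; the alias name and reference shown are hypothetical:

## The alias name and URL below are hypothetical
if (interactive()) {
  resolve_sample_aliases(
    sample_id = 1,
    values = list(external_repo = "https://example.org/samples/123"),
    db_conn = con
  )
}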


resolve_software_settings_NTAMRT R Documentation

Import software settings

Description

Part of the standard import pipeline, adding rows to the ‘conversion_software_settings’ table with a given sample id. Some argument names (specifically ‘obj’) are shared with other import functions but are formed differently here to resolve the node complexity correctly.

Usage

resolve_software_settings_NTAMRT(
  obj,
  software_timestamp = NULL,
  db_conn = con,
  software_settings_in = "msconvertsettings",
  settings_table = "conversion_software_settings",
  linkage_table = "conversion_software_peaks_linkage",
  as_date_format = "%Y-%m-%d %H:%M:%S",
  format_checks = c("ymd_HMS", "ydm_HMS", "mdy_HMS", "dmy_HMS"),
  min_datetime = "2000-01-01 00:00:00",
  log_ns = "db"
)

Arguments

obj

CHR vector describing settings or a named LIST with names matching column names in table conversion_software_settings.

software_timestamp

CHR scalar of the sample timestamp (e.g. sample$starttime) to use for linking software conversion settings with peak data, with a call back to the originating sample. If NULL (the default), the current system timestamp in UTC will be used from [lubridate::now()].

db_conn

connection object (default: con)

software_settings_in

CHR scalar name of the component in ‘obj’ containing software settings (default: “msconvertsettings”)

settings_table

CHR scalar name of the database table containing the software settings used for an imported data file (default: “conversion_software_settings”)

linkage_table

CHR scalar name of the database table containing the linkage between peaks and their software settings (default: “conversion_software_peaks_linkage”)

as_date_format

CHR scalar the format to use when storing timestamps that matches database column expectations (default: “%Y-%m-%d %H:%M:%S”)

format_checks

CHR vector of the [lubridate::parse_date_time()] format checks to execute in order of priority; these must match a lubridate function of the same name (default: c(“ymd_HMS”, “ydm_HMS”, “mdy_HMS”, “dmy_HMS”))

min_datetime

CHR scalar of the minimum reasonable timestamp used as a sanity check (default: “2000-01-01 00:00:00”)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

Value

NULL on errors, INT scalar of the inserted software linkage id if successful

Note

This function is called as part of [full_import()]


resolve_table_name R Documentation

Check presence of a database table

Description

This convenience function checks for the existence of one or more ‘db_table’ objects in a database.

Usage

resolve_table_name(db_table = "compounds", db_conn = "test_con")

Arguments

db_table

CHR vector of table names to check

db_conn

connection object (default: con)

log_ns

CHR scalar of the namespace (if any) to use for logging (default: “db”)

Value

CHR vector of existing tables


save_data_dictionary R Documentation

Save the current data dictionary to disk

Description

Executes [data_dictionary()] and saves the output to a local file. If ‘output_format’ is one of “data.frame” or “list”, the resulting file will be saved as an RDS. Parameter ‘output_file’ will be used during the save process; relative paths will be resolved against the current working directory.

Usage

save_data_dictionary(
  db_conn = con,
  output_format = "json",
  output_file = NULL,
  overwrite_existing = TRUE
)

Arguments

db_conn

connection object (default: con)

output_format

CHR scalar, one of (capitalization insensitive) “json”, “csv”, “data.frame”, or “list” (default “json”)

output_file

CHR scalar indicating where to save the resulting file; an appropriate file name will be constructed if left NULL (default: NULL)

overwrite_existing

LGL scalar indicating whether to overwrite an existing file whose name matches that determined from ‘output_file’ (default: TRUE); file names will be appended with “(x)” sequentially if this is FALSE and a file with matching name exists.

Value

None, saves a file to the current working directory


search_all R Documentation

Search all mass spectra within database against unknown mass spectrum

Description

Search all mass spectra within database against unknown mass spectrum

Usage

search_all(
  con,
  searchms,
  normfn = "sum",
  cormethod = "pearson",
  optimized_params = TRUE
)

Arguments

con

SQLite database connection

searchms

object generated from ‘create_search_ms’ function

normfn

the normalization function, typically “mean” or “sum”, used to normalize the intensity values (default: “sum”)

cormethod

the correlation method used for calculating the correlation; see the ‘cor’ function for available methods (default: “pearson”)

optimized_params

LGL scalar indicating whether or not to use the optimal search parameters stored in the database table ‘opt_ums_params’

Value

LIST of search results
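
Examples

A minimal sketch, assuming ‘con’ is an open SQLite connection and ‘search_object’ was produced by [create_search_ms] for the unknown spectrum of interest:

## search_object (hypothetical) comes from create_search_ms
if (interactive()) {
  results <- search_all(con, searchms = search_object, optimized_params = TRUE)
}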


search_precursor R Documentation

Search the database for all compounds with matching precursor ion m/z values

Description

Search the database for all compounds with matching precursor ion m/z values

Usage

search_precursor(
  con,
  searchms,
  normfn = "sum",
  cormethod = "pearson",
  optimized_params = TRUE
)

Arguments

con

SQLite database connection

searchms

object generated from ‘create_search_ms’ function

normfn

the normalization function, typically “mean” or “sum”, used to normalize the intensity values (default: “sum”)

cormethod

the correlation method used for calculating the correlation; see the ‘cor’ function for available methods (default: “pearson”)

optimized_params

LGL scalar indicating whether or not to use the optimal search parameters stored in the database table ‘opt_ums_params’

Value

table of match statistics for the compound of interest


setup_rdkit R Documentation

Conveniently set up an RDKit python environment for use with R

Description

Conveniently set up an RDKit python environment for use with R

Usage

setup_rdkit(env_name = "nist_hrms_db", required_libraries = c("reticulate", "rdkit"), env_ref = "rdk")

Arguments

env_name

CHR scalar of the name of a python environment

env_ref

CHR scalar of the name of an R expression bound to a python library OR an R object reference by name to an existing object that should be bound to RDKit (e.g. from [reticulate::import])

ns

CHR scalar of the namespace (if any) to use for logging

Value

None, though calls to utility functions will give their own returns


sigtest R Documentation

Significance testing function

Description

Internal function: enables significance testing between two values

Usage

sigtest(x1, x2, s1, s2, n1, n2, sig = 0.95)

Arguments

x1, x2

mean values to be compared

s1, s2

standard deviation of their respective values

n1, n2

number of observations of the respective values

sig

significance level to test (0.95 = 95%)


smilestoformula R Documentation

Convert SMILES string to Formula and other information

Description

The function converts SMILES strings into a data frame containing the molecular formula (FORMULA), fixed mass of the formula (FIXED MASS), and the net charge (NETCHARGE).

Usage

smilestoformula(SMILES)

Arguments

SMILES

vector of SMILES strings

Value

data.frame object with one row per input SMILES string containing the molecular formula, fixed mass, and net charge

Examples

smilestoformula(c("CCCC", "C(F)(F)F"))

smilestoformula("CCCC")

sql_to_msp R Documentation

Export SQL Database to a MSP NIST MS Format

Description

Export SQL Database to a MSP NIST MS Format

Usage

sql_to_msp(
  con,
  optimized_params = TRUE,
  outputfile = paste0("DimSpecExport", Sys.Date(), ".msp"),
  cormethod = "pearson",
  normfn = "sum"
)

Arguments

con

SQLite database connection

optimized_params

Boolean TRUE indicates that the optimized parameters for uncertainty mass spectra will be used.

outputfile

Text string file name and/or location to save MSP file format

cormethod

Text string type of correlation function to use (DEFAULT = ‘pearson’)

normfn

Text string type of normalization function to use (DEFAULT = ‘sum’)

Value

None, saves a *.msp file to the local file system.


sqlite_auto_trigger R Documentation

Create a basic SQL trigger for handling foreign key relationships

Description

This creates a simple trigger designed to streamline foreign key compliance for SQLite databases. Resulting triggers will fire during insert or update actions on tables that have one or more foreign key relationships defined as ‘target_table.fk_col = norm_table.pk_col’. It is primarily for use in controlled vocabulary lists where a single id is tied to a single value in the parent table, but more complicated relationships can be handled.

Usage

sqlite_auto_trigger(target_table = "test", fk_col = c("col1", "col2",
  "col3"), norm_table = c("norm_col1", "norm_col2", "norm_col3"), pk_col =
  "id", val_col = "value", action_occurs = "after", trigger_action =
  "insert", table_action = "update")

Arguments

target_table

CHR scalar name of a table with a foreign key constraint.

fk_col

CHR vector name(s) of the column(s) in ‘target_table’ with foreign key relationship(s) defined.

norm_table

CHR vector name(s) of the table(s) containing the primary key relationship(s).

pk_col

CHR vector name(s) of the column(s) in ‘norm_table’ containing the primary key(s) side of the relationship(s).

val_col

CHR vector name(s) of the column(s) in ‘norm_table’ containing values related to the primary key(s) of the relationship(s).

action_occurs

CHR scalar on when to run the trigger, must be one of ‘c(“before”, “after”, “instead”)’ (“instead” should only be used if ‘target_table’ is a view - this restriction is not enforced).

trigger_action

CHR scalar on what type of trigger this is (e.g. ‘action_occurs’ = “after” and ‘trigger_action’ = “insert” -> “AFTER INSERT”) and must be one of ‘c(“insert”, “update”, “delete”)’.

for_each

CHR scalar; for SQLite this must be “row”, which is translated into a “FOR EACH ROW” clause. Set to any given noun for other SQL engines supporting other trigger transaction types (e.g. “FOR EACH STATEMENT” triggers)

table_action

CHR scalar on what type of action to run when the trigger fires, must be one of ‘c(“insert”, “update”, “delete”)’.

filter_col

CHR scalar of a filter column to override the final WHERE clause in the trigger. This should almost always be left as the default “”.

filter_val

CHR scalar of a filter value to override the final WHERE clause in the trigger. This should almost always be left as the default “”.

or_ignore

LGL scalar on whether to ignore insertions to normalization tables if an error occurs (default: TRUE, which can under certain conditions raise exceptions during execution of the trigger if more than a single value column exists in the parent table)

addl_actions

CHR vector of additional target actions to add to ‘table_action’ statements, appended to the end of the resulting “insert” or “update” actions to ‘target_table’. If multiple tables are in use, use positional matching in the vector (e.g. with three normalization tables, and additional actions to only the second, use c(“”, “additional actions”, “”))

Details

These are intended as native database backup support for when connections do not change the default SQLite setting of PRAGMA foreign_keys = off. Theoretically any trigger could be created, but should only be used with care outside the intended purpose.

Triggers created by this function will check all new INSERT and UPDATE statements by checking provided values against their parent table keys. If an index match is found no action will be taken on the parent table. If no match is found, it is assumed this is a new normalized value and it will be added to the normalization table and the resulting new key will be replaced in the target table column.

Value

CHR scalar of class glue containing the SQL necessary to create a trigger. This is raw text; it is not escaped and should be further manipulated (e.g. via dbplyr::sql()) as your needs and database communication pipelines dictate.

Note

While this will work on any number of combinations, all triggers should be heavily inspected prior to use. The default case for this trigger is to set it for a single FK/PK relationship with a single normalization value. It will run on any number of normalized columns; however, trigger behavior may be unexpected for more complex relationships.

If ‘or_ignore’ is set to TRUE, errors in adding to the parent table will be ignored silently, possibly causing NULL values to be inserted into the target table foreign key column. For this reason it is recommended to set ‘or_ignore’ to TRUE only when expanding parent table entries, keeping in mind that it will supply only a single value to the new normalization table record. If additional columns in the parent table must be populated (e.g. the parent table has two required columns “value” and “acronym”), it is recommended to take care of those prior to any action that would activate these triggers.

Parameters are not checked against a schema (e.g. that tables and columns exist, or that a relationship exists between tables). This function processes only text provided to it.

Define individual relationships between ‘fk_col’, ‘norm_table’, ‘pk_col’, and ‘val_col’ as necessary. Lengths for these parameters should match in a 1:1:1:1 manner to fully describe the relationships. If the schema of all tables listed in ‘norm_table’ are close matches, e.g. all have two columns “id” and “value” then ‘pk_col’ and ‘val_col’ will be reused when only a single value is provided for them. That is, provided three ‘norm_table’(s) and one ‘pk_col’ and one ‘val_col’, the arguments for ‘pk_col’ and ‘val_col’ will apply to each ‘norm_table’.

The usage example is built on a hypothetical SQLite schema containing four tables, one of which (“test” - with columns “id”, “col1”, “col2”, and “col3”) defines foreign key relationships to the other three (“norm_col1”, “norm_col2”, and “norm_col3”).

See Also

build_triggers
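
Examples

A sketch using the hypothetical four-table schema described in the Note section; the returned glue string should be inspected before it is executed against a database:

## Build AFTER INSERT triggers for three normalized columns of table "test"
trigger_sql <- sqlite_auto_trigger(
  target_table = "test",
  fk_col = c("col1", "col2", "col3"),
  norm_table = c("norm_col1", "norm_col2", "norm_col3"),
  pk_col = "id",
  val_col = "value",
  action_occurs = "after",
  trigger_action = "insert",
  table_action = "update"
)
cat(trigger_sql)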


sqlite_auto_view R Documentation

Create a basic SQL view of a normalized table

Description

Many database viewers will follow links for normalization tables to display the human-readable value of a normalized column, but it is often preferable to build in views that automatically “denormalize” such tables for display or use in an application. This function seeks to script the process of creating those views. It examines the table definition from [pragma_table_info] and will extract the primary/foreign key relationships to build a “denormalized” view of the table using [get_fkpk_relationships], which requires a database map created from [er_map] and a data dictionary created from [data_dictionary].

Usage

sqlite_auto_view(table_pragma = pragma_table_info("contributors"),
  target_table = "contributors", relationships =
  get_fkpk_relationships(db_map = er_map(con), dictionary =
  data_dictionary(con)), drop_if_exists = FALSE)

Arguments

table_pragma

data.frame object from [pragma_table_info] for a given table name in the database

target_table

CHR scalar name of the database table to build for, which should be present in the relationship definition

relationships

data.frame object describing the foreign key relationships for ‘target_table’, which should generally be the result of a call to [get_fkpk_relationships]

drop_if_exists

LGL scalar indicating whether to include a “DROP VIEW” prefix for the generated view statement; as this has an impact on schema, no default is set

Details

TODO for v2: abstract the relationships call by looking for objects in the current session.

Value

CHR scalar of class glue containing the SQL necessary to create a “denormalized” view. This is raw text; it is not escaped and should be further manipulated (e.g. via dbplyr::sql()) as your needs and database communication pipelines dictate.

Note

No schema checking is performed by this function, but rather relies on definitions from other functions.

This example will run slowly if the database map [er_map] and dictionary [data_dictionary] haven’t yet been called. If they exist in your session, use those as arguments to get_fkpk_relationships.

See Also

build_views

pragma_table_info

get_fkpk_relationships

er_map

data_dictionary
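
Examples

A sketch mirroring the usage defaults, assuming ‘con’ is an open database connection; this can be slow if [er_map] and [data_dictionary] have not yet been run, so reuse session objects where available:

## Generate the SQL for a denormalized view of the "contributors" table
if (interactive()) {
  view_sql <- sqlite_auto_view(
    table_pragma = pragma_table_info("contributors"),
    target_table = "contributors",
    relationships = get_fkpk_relationships(
      db_map = er_map(con),
      dictionary = data_dictionary(con)
    ),
    drop_if_exists = FALSE
  )
  cat(view_sql)
}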


sqlite_parse_build R Documentation

Parse SQL build statements

Description

Reading SQL files directly into R can be problematic. This function is primarily called in [create_fallback_build]. To support multiline, human-readable SQL statements, ‘sql_statements’ must be of length 1.

Usage

sqlite_parse_build(sql_statements, magicsplit, header, section)

Arguments

sql_statements

CHR scalar of SQL build statements from an SQL file.

magicsplit

CHR scalar regex indicating some “magic” split point SQL comment to simplify the identification of discrete commands; will be used to split results (optional but highly recommended)

header

CHR scalar regex indicating the format of header comments SQL comment to remove (optional)

section

CHR scalar regex indicating the format of section comments SQL comment to remove (optional)

Details

All arguments ‘magicsplit’, ‘header’, and ‘section’ provide flexibility in the comment structure of the SQL file and accept regex for character matching purposes.

Value

LIST of parsed complete build commands as CHR vectors containing each line.

Examples

example_file <- "./config/sql_nodes/reference.sql"
if (file.exists(example_file)) {
  build_commands <- readr::read_file(example_file)
  sqlite_parse_build(build_commands)
}


sqlite_parse_import R Documentation

Parse SQL import statements

Description

In the absence of the sqlite command line interface (CLI), the [build_db] process needs a full set of SQL statements to execute directly rather than CLI dot commands. This utility function parses formatted SQL statements containing CLI “.import” commands to create SQL INSERT statements. This function is primarily called in [create_fallback_build].

Usage

if (file.exists("./config/data/elements.csv")) {
  sqlite_parse_import(".import --csv --skip 1 ./config/data/elements.csv elements")
}

Arguments

build_statements

CHR vector of SQL build statements from an SQL file.

Value

LIST of parsed .import statements as full “INSERT” statements.

Examples

if (file.exists("./config/data/elements.csv")) {
  sqlite_parse_import(".import --csv --skip 1 ./config/data/elements.csv elements")
}


start_api R Documentation

Start the plumber interface from a clean environment

Description

This convenience function launches the plumber instance if it was not set to launch during the session setup. It is a thin wrapper with a more intuitive name than [api_reload] and the default background setting turned off to test the server in the current session.

Usage

start_api()

Arguments

plumber_file

CHR scalar name of the plumber definition file, which should be in src_dir (default: NULL)

plumber_host

CHR scalar of the host server address (default: NULL)

plumber_port

INT scalar of the listening port on the host server (default: NULL)

background

LGL scalar of whether to launch the API in a background process (default: FALSE)

src_dir

CHR scalar file path to settings and functions enabling the plumber API (default: here::here(“inst”, “plumber”))

log_ns

CHR scalar name of the logging namespace to use for this function (default: “api”)

Value

None, launches the plumber instance

Note

This function is intended to pull from the environment variables identifying the plumber file, host, and port.


start_app R Documentation

WIP Launch a shiny application

Description

Call this function to launch an app either directly or in a background process. The name must be present in the app directory or as a named element of SHINY_APPS in the current environment.

Usage

start_app("table_explorer")

Arguments

app_name

CHR scalar name of the shiny app to run, this should be the name of a directory containing a shiny app that is located within the directory defined by app_dir or the name of an app as defined in your environment SHINY_APPS variable

app_dir

file path to a directory containing shiny apps (default: here::here(“inst”, “apps”))

background

LGL scalar of whether to launch the application in a background process (default: FALSE)

…

Other named parameters to be passed to [shiny::runApp]

Value

None, launches a browser with the requested shiny application

Note

Background launching of shiny apps is not yet supported.

Examples

start_app("table_explorer")


start_rdkit R Documentation

Start the RDKit integration

Description

If the session was started without RDKit integration, e.g. INFORMATICS or USE_RDKIT were FALSE in [config/env_R.R], start up RDKit in this session.

Usage

start_rdkit(src_dir = here::here("inst", "rdkit"), log_ns = "rdkit")

Arguments

src_dir

CHR scalar file path to settings and functions enabling rdkit (default: here::here(“inst”, “rdkit”))

log_ns

CHR scalar name of the logging namespace to use for this function (default: “rdkit”)

Value

LGL scalar indicating whether starting RDKit integration was successful

Note

RDKit and rcdk are incompatible. If the session was started with INFORMATICS = TRUE and USE_RDKIT = FALSE, ChemmineR was likely loaded. If this is the case, the session will need to be restarted due to Java conflicts between the two.


summarize_check_fragments R Documentation

Summarize results of check_fragments function

Description

Summarize results of check_fragments function

Usage

summarize_check_fragments(fragments_checked)

Arguments

fragments_checked

output of ‘check_fragments’ function

Value

table summary of check_fragments function


support_info R Documentation

R session information for support needs

Description

Returns several items of interest for support of this particular project, including DB_DATE, DB_VERSION, BUILD_FILE, LAST_DB_SCHEMA, LAST_MODIFIED, DEPENDS_ON, and EXCLUSIONS as defined in the project’s ../config/env_R.R file.

Usage

support_info()

Arguments

app_info

BOOL scalar on whether to return this application’s properties

Value

LIST of values


suspectlist_at_NIST R Documentation

Open the NIST PDR entry for the current NIST PFAS suspect list

Description

This simply points your browser to the NIST public data repository for the current NIST suspect list, where you can find additional information. Click the download button in the left column of any file to download it. Requires the file “suspectlist_url.txt” to be present in the ‘config’ subdirectory of the current working directory.

Usage

suspectlist_at_NIST(url_file = file.path("config", "suspectlist_url.txt"))

Value

none

Examples

suspectlist_at_NIST()

table_msdata R Documentation

Tabulate MS Data

Description

Pulls specified MS data from an mzML object and converts it into table format for further processing. Internal function used by the ‘peak_gather_json’ function.

Usage

table_msdata(mzml, scans, mz = NA, zoom = NA, masserror = NA, minerror = NA)

Arguments

mzml

list of msdata from ‘mzMLtoR’ function

scans

integer vector containing scan numbers to extract MS data

mz

numeric targeted m/z

zoom

numeric vector specifying the range around m/z, from m/z - zoom[1] to m/z + zoom[2]

masserror

numeric relative mass error (in ppm) of the instrument

minerror

numeric minimum mass error (in Da) of the instrument

Value

data.frame containing MS data


tack_on R Documentation

Append additional named elements to a list

Description

This does nothing more than use [base::append] to add ellipsis arguments directly to the end of an existing list object. It primarily supports additional property assignment during the import process for future development and refinement. Call this as part of any function with additional arguments; unrecognized named parameters may be ignored or may cause failures downstream. If no additional arguments are passed, ‘obj’ is returned as provided.

Usage

tack_on(obj, ..., log_ns = "db")

Arguments

obj

LIST of any length to be appended to

…

Additional arguments passed to/from the ellipsis parameter of calling functions. If named, names are preserved.

log_ns

CHR scalar of the logging namespace to use (default: “db”)

Value

LIST object of length equal to obj plus additional named arguments

Note

If duplicate names exists in obj and those provided as ellipsis arguments, those provided as part of the ellipsis will replace those in obj.

Examples

tack_on(list(a = 1:3), b = letters, c = rnorm(10))
tack_on(list(a = 1:3))

tidy_comments R Documentation

Tidy up table and field comments

Description

Creates more human-readable outputs after extracting the raw SQL used to build entities and parsing out the comments as identified with the /* … */ multi-line comment flag pair. Single line comments are not extracted. The first comment is assumed to be the table comment. See examples in the ‘config/sql_nodes’ directory.

Usage

tidy_comments(obj)

Arguments

obj

result of calling [pragma_table_def] with ‘get_sql’ = TRUE

Value

LIST of length equal to ‘obj’ containing extracted comments

Examples

tidy_comments(pragma_table_def("compounds", get_sql = TRUE))


tidy_ms_spectra R Documentation

Tidy Spectra

Description

A convenience function to take outputs from [ms_spectra_separated] and [ms_spectra_zipped] and return them as a tidy data frame by unpacking the list column “spectra”.

Usage

tidy_ms_spectra(df = packed_data)

Arguments

df

data.frame object containing nested spectra in a column

Value

data.frame object containing tidy spectra


tidy_spectra R Documentation

Decompress Spectra

Description

This convenience wrapper will automatically decompress ms spectra in the “separated” and “zipped” formats and return them as tidy data frames suitable for further manipulation or visualization.

Usage

tidy_spectra(
  target,
  is_file = FALSE,
  is_format = c("separated", "zipped"),
  spectra_set = "msdata",
  ms_col_sep = c("measured_mz", "measured_intensity"),
  ms_col_zip = "data",
  is_JSON = FALSE
)

Arguments

target

CHR scalar file path to use OR an R object containing compressed spectral data in the “separated” or “zipped” format

is_file

BOOL scalar of whether or not ‘target’ is a file. Set to FALSE to use an existing R object, which should contain an object with a named element matching parameter ‘spectra_set’ (default: FALSE)

is_format

CHR scalar of the compression format, which must be one of the supported compression forms (“separated” or “zipped”); ignored if the compression format can be inferred from the text in ‘target’ (default: “separated”)

spectra_set

CHR scalar of the object name holding a spectra data frame to decompress (default “msdata”)

ms_col_sep

CHR vector of the column names holding spectral masses and intensities in the “separated” format (default: c(“measured_mz”, “measured_intensity”))

ms_col_zip

CHR scalar of the name of the column holding interleaved spectral masses and intensities in the “zipped” format (default: “data”)

is_JSON

BOOL scalar of whether or not ‘target’ is a JSON expression needing conversion (default: FALSE)

Value

data.frame object containing unpacked spectra

Examples

tidy_spectra('{"msdata": "712.9501 15094.41015625 713.1851 34809.9765625"}', is_format = "zipped")
tidy_spectra('{"measured_mz":"712.9501 713.1851","measured_intensity":"15094.41015625 34809.9765625"}')

unzip R Documentation

Unzip binary data into vector

Description

Unzip binary data into vector

Usage

unzip(x, type = "gzip")

Arguments

x

String of binary data to convert

type

type of compression (see ‘base::memDecompress’). Default is ‘gzip’

Value

vector containing data from converted binary data


update_all R Documentation

Convenience function to rebuild all database related files

Description

This is a development and deployment function that should be used with caution. It is intended solely to assist with the development process of rebuilding a database schema from source files and producing the supporting data. It will create both the JSON expression of the data dictionary and the fallback SQL file.

Usage

update_all()

Arguments

rebuild

LGL scalar indicating whether to first rebuild from environment settings (default: FALSE for safety)

api_running

LGL scalar of whether or not the API service is currently running (default: TRUE)

api_monitor

process object pointing to the API service (default: NULL)

db

CHR scalar of the database name (default: session value DB_NAME)

build_from

CHR scalar of a SQL build script to use (default: environment value DB_BUILD_FILE)

populate

LGL scalar of whether to populate with data from the file in ‘populate_with’ (default: TRUE)

populate_with

CHR scalar for the populate script (e.g. “populate_demo.sql”) to run after the build is complete (default: session value DB_DATA); ignored if ‘populate = FALSE’

archive

LGL scalar of whether to create an archive of the current database (if it exists) matching the name supplied in argument ‘db’ (default: FALSE); passed to [remove_db]

sqlite_cli

CHR scalar to use to look for installed sqlite3 CLI tools in the current system environment (default: session value SQLITE_CLI)

connect

LGL scalar of whether or not to connect to the rebuilt database in the global environment as object ‘con’ (default: FALSE)

log_ns

CHR scalar of the logging namespace to use during execution (default: “db”)

Details

!! To preserve data, do not call this with both ‘rebuild’ = TRUE and ‘archive’ = FALSE !!

Value

Files for the new database, fallback build, and data dictionary will be created in the project directory and objects will be created in the global environment for the database map (LIST “db_map”) and current dictionary (LIST “db_dict”)

Note

This does not recast the views and triggers files created through [sqlite_autoview] and [sqlite_autotrigger] as the output of those may often need additional customization. Existing auto-views and -triggers will be created as defined. To exclude those, first modify the build file referenced by [build_db].

This requires the individual functions it references to be available in the current environment.
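
Examples

# A hedged sketch (not run): this rebuilds the database and should not be
# called casually. Pair rebuild = TRUE with archive = TRUE to preserve
# existing data, per the warning in Details; the arguments are assumed to
# be accepted as documented above.
update_all(rebuild = TRUE, archive = TRUE)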


update_data_sources R Documentation

Dump current database contents

Description

Perform one or both of two backup tasks for the NTA database: dumping table contents as comma-separated-value files and creating an SQL dump of schema and data.

Usage

update_data_sources(
  project,
  data_dir = file.path("config", "data"),
  create_backups = TRUE,
  dump_tables = TRUE,
  dump_sql = TRUE,
  db_conn = con,
  sqlite_cli = ifelse(exists("SQLITE_CLI"), SQLITE_CLI, NULL),
  db_name = ifelse(exists("DB_NAME"), DB_NAME, NULL)
)

Arguments

project

CHR scalar of the directory containing project-specific data (required, no default)

data_dir

CHR scalar of the directory containing project independent data sources used for population (default: ‘file.path(“config”, “data”)’)

create_backups

LGL scalar indicating whether to create backups prior to writing updated data files (default: TRUE)

dump_tables

LGL scalar indicating whether to dump contents of database tables as comma-separated-value files (default: TRUE)

dump_sql

LGL scalar indicating whether to create an SQL dump file containing both schema and data as a backup (default: TRUE)

db_conn

connection object (default: con)

sqlite_cli

CHR scalar system reference to your installation of the sqlite3 command line interface (default: session value SQLITE_CLI if it exists)

db_name

CHR scalar of the database name (default: session value DB_NAME if it exists)

Details

The main task is to update CSV files in the config/data directory with the current contents of the database. This is done on a table-by-table basis and results in flat files whose structures no longer interrelate except numerically. Primarily this would be used to migrate database contents to other systems or for further manipulation. Please specify a ‘project’ so that project-specific information can be maintained.

Backups created with this function are placed in a “backups” subdirectory of the directory defined by parameter ‘data_dir’. If ‘dump_sql = TRUE’, SQL dump files will be written to “backups/sqlite” with file names equal to the current database name prefixed by date.

Value

None, copies database information to the local file system
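
Examples

# Dump all tables as CSV and create an SQL backup for a hypothetical
# project directory; a live connection 'con' is assumed (the default
# for db_conn).
update_data_sources(project = "example_project")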


update_env_from_file R Documentation

Update a conda environment from a requirements file

Description

The ‘requirements_file’ can be any formatted file that contains a definition of python libraries to add to an environment (e.g. requirements.txt, environment.yml, etc.) that is understood by conda. Relative file paths are accepted, but the file will not be searched for (e.g. via ‘list.files’), so a specific path is always better.

Usage

update_env_from_file("nist_hrms_db")

Arguments

env_name

CHR scalar of a python environment

requirements_file

CHR scalar file path to a suitable requirements.txt or environment.yml file

conda_alias

CHR scalar of the command line interface alias for your conda tools (default: NULL is translated first to the environment variable CONDA_CLI and then to “conda”)

Details

This is a helper function, largely to support versions of reticulate prior to the introduction of the environment argument in version 1.24+.

Value

None, directly updates the referenced python environment

Note

This requires conda CLI tools to be installed.

A default installation alias of “conda” is assumed.

Set global variable ‘CONDA_CLI’ to your conda alias for better support.
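
Examples

# Update a named environment from a conda-readable definition file; the
# requirements file path here is hypothetical.
update_env_from_file("nist_hrms_db",
                     requirements_file = "rdkit/environment.yml")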


update_logger_settings R Documentation

Update logger settings

Description

This applies the internal routing and formatting for logger functions to the current value of the LOGGING object. If LOGGING is changed (i.e. a logging namespace is added or changed) this function should be run to update routing and formatting to be in line with the current settings.

Usage

update_logger_settings(log_all_warnings = FALSE, log_all_errors = FALSE)

Arguments

log_all_warnings

LGL scalar indicating whether or not to log all warnings (default: FALSE)

log_all_errors

LGL scalar indicating whether or not to log all errors (default: FALSE)

Value

None

Note

The calling stack for auto logging of warnings and errors does not work with background processes. These settings call [logger::log_warnings()] and [logger::log_errors()].

This function is used only for its side effects.
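
Examples

# Reapply routing and formatting after changing the LOGGING object, here
# also enabling automatic logging of warnings and errors.
update_logger_settings(log_all_warnings = TRUE, log_all_errors = TRUE)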


user_guide R Documentation

Launch the User Guide for DIMSpec

Description

Use this function to launch the bookdown version of the User Guide for the NIST Database Infrastructure for Mass Spectrometry (DIMSpec) toolkit.

Usage

user_guide()

Arguments

path

CHR scalar representing a valid file path to the local user guide

url_gh

CHR scalar pointing to the web resource, in this case the URL to the User Guide hosted on GitHub pages

view_on_github

LGL scalar of whether to use the hosted version of the User Guide on GitHub (default: TRUE, recommended), which will always display the most up-to-date version

Value

None, opens a browser to the index page of the User Guide

Note

This works ONLY when DIMSpec is used as a project with the defined directory structure


valid_file_format R Documentation

Ensure files uploaded to a shiny app are of the required file type

Description

This input validation check uses [tools::file_ext] to ensure that files uploaded to [shiny::fileInput] are among the acceptable file formats; users may sometimes attempt to upload a file outside the “accepts” format list by manually changing the filter during the upload process. If the file is not of an accepted format, a [nist_shinyalert] modal is displayed prompting the user to upload a file in one of the requested formats.

Usage

req(valid_file_format(input$file_upload, c(".csv", ".xls")))

Arguments

filename

CHR scalar name of the file uploaded to the shiny server

accepts

CHR vector of acceptable file formats

show_alert

LGL scalar indicating whether or not to show an alert, set FALSE to return the status of the check

Value

Whether or not the uploaded file matches an accepted format.
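
Examples

# With show_alert = FALSE the check simply returns its status; this
# sketch assumes a shiny server context providing input$file_upload.
if (valid_file_format(input$file_upload, c(".csv"), show_alert = FALSE)) {
  message("File format accepted.")
}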


validate_casrns R Documentation

Validate a CAS RN

Description

Chemical Abstract Service (CAS) Registry Numbers (RNs) follow a standard creation format. From [https://www.cas.org/support/documentation/chemical-substances/faqs], a CAS RN is a “numeric identifier that can contain up to 10 digits, divided by hyphens into three parts. The right-most digit is a check digit used to verify the validity and uniqueness of the entire number. For example, 58-08-2 is the CAS Registry Number for caffeine.”

Usage

validate_casrns(casrn_vec, strip_bad_cas = TRUE)

Arguments

casrn_vec

CHR vector of CAS RNs to validate

strip_bad_cas

LGL scalar of whether to strip out invalid CAS RNs (default: TRUE)

Details

Provided CAS RNs in ‘casrn_vec’ are validated for format and their checksum digit. Those failing will be printed to the console by default, and users have the option of stripping unverified entries from the return vector.

This only validates that a CAS RN is properly constructed; it does not indicate that the registry number exists in the CAS Registry.

See [repair_xl_casrn_forced_to_date] as one possible pre-processing step.

Value

CHR vector of length equal to that of ‘casrn_vec’

Examples

validate_casrns(c("64324-08-9", "64324-08-5", "12332"))
validate_casrns(c("64324-08-9", "64324-08-5", "12332"), strip_bad_cas = FALSE)
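
# Illustration of the published CAS check-digit arithmetic: reverse the
# digits preceding the check digit, weight them 1..n, sum, and take the
# result modulo 10. For caffeine (58-08-2):
sum(rev(c(5, 8, 0, 8)) * 1:4) %% 10  # 2, matching the check digit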

validate_column_names R Documentation

Ensure database column presence

Description

When working with SQL databases, this convenience function validates any number of column names by comparing against the list of column names in any number of tables. Typically it is called transparently inline to cause execution failure when column names are not present in referenced tables during build of SQL queries.

Usage

validate_column_names(con, "peaks", "id")

Arguments

db_conn

connection object (e.g. of class “SQLiteConnection”)

table_names

CHR vector of tables to search

column_names

CHR vector of column names to validate

Value

None


validate_tables R Documentation

Ensure database table presence

Description

When working with SQL databases, this convenience function validates any number of table names by comparing against the list of those present. Typically it is called transparently inline to cause execution failure when tables are not present during build of SQL queries.

Usage

validate_tables(con, "peaks")

Arguments

db_conn

connection object (e.g. of class “SQLiteConnection”)

table_names

CHR vector of table names to ensure are present

Value

None if all tables exist; execution fails if any do not.


verify_args R Documentation

Verify arguments for a function

Description

This helper function checks arguments against a list of expectations. It was inspired in part by the excellent testthat package and shares concepts with the checkmate package. However, this function performs many of the common checks without additional package dependencies, and can be inserted into other functions for a project easily with:

  arg_check <- verify_args(args = as.list(environment()),
    conditions = list(param1 = c("mode", "logical"), param2 = c("length", 1)))

and check the return with

  if (!arg_check$valid) cat(paste0(arg_check$messages, "\n"))

where argument ‘conditions’ describes the tests. This comes at some cost to readability, as the list items in ‘conditions’ do not have to be named, though naming them improves clarity. See the details below for argument ‘conditions’ to view which expectations are currently supported. As this is a nested list condition check, conditions can also originate from any source coercible to a list (e.g. JSON, XML, etc.); this feature, along with the return of human-meaningful evaluation strings, is particularly useful for development of shiny applications. Values from other sources MUST be coercible to a full list (e.g. if being parsed from JSON, use jsonlite::fromJSON(simplifyMatrix = FALSE)).

Usage

verify_args(args = list(character_length_2 = c("a", "b")),
            conditions = list(character_length_2 = list(c("mode", "character"),
                                                        c("length", 3)))
)
verify_args(args = list(boolean = c(TRUE, FALSE, TRUE)),
            conditions = list(list(c("mode", "logical"),
                                   c("length", 1)))
)
verify_args(args = list(foo = c(letters[1:3]),
                        bar = 1:10),
            conditions = list(foo = list(c("mode", "numeric"),
                                         c("n>", 5)),
                              bar = list(c("mode", "logical"),
                                         c("length", 5),
                                         c(">", 10),
                                         c("between", list(100, 200)),
                                         c("choices", list("a", "b"))))
)

Arguments

args

LIST of named arguments and their values, typically passed directly from a function definition in the form args = list(foo = 1:2, bar = c(“a”, “b”, “c”)) or directly by passing environment()

conditions

Nested LIST of conditions and values to check, with one list item for each element in args.

  • The first element of each list should be a character scalar in the supported list.

  • The second element of each list should be the check values themselves and may be of any type.

Multiple expectation conditions can be set for each element of args in the form

  • conditions = list(foo = list(c(“mode”, “numeric”), c(“length”, 2)), bar = list(c(“mode”, “character”), c(“n<”, 5)))

Currently supported expectations are:

  • class: checks strict class expectation by direct comparison with class to support object classes not supported with the is.x or is_x family of functions; much stricter than a “mode” check in that the requested check must be present in the return from a call to class e.g. “list” will fail if a “data.frame” object is passed

  • mode: checks class expectation by applying the is.X or the is_X family of functions either directly or flexibly depending on the value provided to conditions (e.g. c(“mode”, “character”), c(“mode”, “is.character”), and c(“mode”, “is_character”) all work equally well) and will default to the version you provide explicitly (e.g. if you wish to prioritize “is_character” over “is.character”, simply provide “is_character” as the condition). Only those modes able to be checked by this family of functions are supported. Run function mode_checks() for a complete sorted list for your current configuration.

  • length: length of values matches a pre-determined exact length, typically a single value expectation (e.g. c(“length”, 1))

  • no_na: no NA values are present

  • n>: length of values is greater than a given value

  • n<: length of values is lesser than a given value

  • n>=: length of values is greater than or equal to a given value

  • n<=: length of values is lesser than or equal to a given value

  • >: numeric or date value is greater than a given value

  • <: numeric or date value is lesser than a given value

  • >=: numeric or date value is greater than or equal to a given value

  • <=: numeric or date value is lesser than or equal to a given value

  • between: numeric or date values are bound within an INCLUSIVE range (e.g. c(“between”, list(1, 5)))

  • choices: provided values are part of a selected list of expectations (e.g. c(“choices”, list(letters[1:3])))

  • FUN: apply a function to the value and check that the result is valid or that the function can be executed without error; this evaluates the check condition using [tryCatch()] via [do.call()] and so can also accept a full named list of arg values. This is a strict check in the sense that a warning will also result in a failed result, passing the warning (or error if the function fails) message back to the user, but does not halt checks

from_fn

CHR scalar of the function from which this is called, used if logger is enabled and ignored if not; by default it will pull the calling function’s name from the call stack, but can be overwritten by a manual entry here for better tracing. (default NULL)

silent

LGL scalar of whether to silence warnings for individual failures, leaving them only as part of the output. (default: FALSE)

Value

LIST of the resulting values and checks, primarily useful for its $valid (TRUE if all checks pass or FALSE if any fail) and $messages values

Note

If logger is enabled, also provides some additional meaningful feedback.

At least one condition check is required for every element passed to args.
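
Examples

# A minimal sketch of the in-function pattern from the Description;
# 'example_fn' is hypothetical.
example_fn <- function(param1 = TRUE, param2 = "a") {
  arg_check <- verify_args(args = as.list(environment()),
    conditions = list(param1 = c("mode", "logical"),
                      param2 = c("mode", "character")))
  if (!arg_check$valid) cat(paste0(arg_check$messages, "\n"))
  arg_check$valid
}
example_fn()                        # TRUE if both checks pass
example_fn(param1 = "not logical")  # prints the failed check message(s)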


verify_import_columns R Documentation

Verify column names for import

Description

This function validates that all required columns are present prior to importing into a database table by examining provided values against the database schema. This is largely a sanity check on other functions, but it also strips extraneous columns to meet the needs of an INSERT action. The input to ‘values’ should be either a LIST or named CHR vector of values for insertion, or a CHR vector of the column names.

Usage

verify_import_columns(
  values,
  db_table,
  names_only = FALSE,
  require_all = TRUE,
  db_conn = con,
  log_ns = "db"
)

Arguments

values

LIST or CHR vector of values to add. If ‘names_only’ is TRUE, values are directly interpreted as column names. Otherwise, all values provided must be named.

db_table

CHR scalar of the table name

names_only

LGL scalar of whether to treat entries of ‘values’ as the column names rather than the column values (default: FALSE)

require_all

LGL scalar of whether to require all columns (except the assumed primary key column of “id”) or only those defined as “NOT NULL” (default: TRUE requires the presence of all columns in the table)

db_conn

connection object (default: con)

log_ns

CHR scalar of the logging namespace to use (default: “db”)

Value

An object of the same type as ‘values’ with extraneous values (i.e. those not matching a database column header) stripped away.

Note

If columns are defined as required in the schema and are not present, this will fail with an informative message about which columns were missing.

If columns are provided that do not match the schema, they will be stripped away in the return value.
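
Examples

# Validate a set of column names only (no values); the table and column
# names here are hypothetical and a live connection 'con' is assumed.
verify_import_columns(c("name", "formula"),
                      db_table = "compounds",
                      names_only = TRUE,
                      require_all = FALSE)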


verify_import_requirements R Documentation

Verify an import file’s properties

Description

Checks an import file’s characteristics against expectations. This is mostly a sanity check against changing conditions from project to project. Import requirements should be defined at the environment level and enumerated as a JSON object, which can be created by calling [make_requirements] on an example import for simplicity. An example is provided in the ‘examples’ directory as “NIST_import_requirements.json”. If multiple requirements are in use (e.g. pulling from multiple locations), this can be run multiple times with different values of ‘requirements_obj’ or ‘file_name’.

Usage

verify_import_requirements(
  obj,
  ignore_extra = TRUE,
  requirements_obj = "import_requirements",
  file_name = "import_requirements",
  log_issues_as = "warn",
  log_ns = "db"
)

Arguments

obj

LIST of the object to import matching structure expectations, typically from a JSON file fed through [full_import]

ignore_extra

LGL scalar of whether to ignore extraneous import elements or stop the import process (default: TRUE)

requirements_obj

CHR scalar of the name of an R object holding import requirements; this is a convenience shorthand to prevent multiple imports from parameter ‘file_name’ (default: “import_requirements”)

file_name

CHR scalar of the name of a file holding import requirements; if this has already been added to the calling environment, ‘requirements_obj’ will be used preferentially as the name of that object

log_issues_as

CHR scalar of the log level to use (default: “warn”), which must be a valid log level as in [logger::FATAL]; will be ignored if the [logger] package isn’t available

log_ns

CHR scalar of the logging namespace to use (default: “db”)

Details

The return from this is a tibble with 9 columns. The first is the name of the import object member, typically the file name. If a single, unnested import object is provided this will be “import object”. The other columns include the following verification checks:

  1. has_all_required: Are all required names present in the sample? (TRUE/FALSE)

  2. missing_requirements: Character vectors naming any of the missing requirements

  3. has_full_detail: Is all expected detail present? (TRUE/FALSE)

  4. missing_detail: Character vectors naming any missing value sets

  5. has_extra: Are there unexpected values provided? (TRUE/FALSE)

  6. extra_cols: Character vectors naming any extra columns; these will be dropped from the import but are provided for information’s sake

  7. has_name_mismatches: Are there name differences between the import requirement elements and the import object? (TRUE/FALSE)

  8. mismatched_names: Named lists enumerating which named elements (if any) from the import object did not match name expectations in the requirements

All of this is defined by the ‘requirements_obj’ list. Do not provide that list directly; instead, pass this function the name of the requirements object for interoperability. If a ‘requirements_obj’ cannot be identified via [base::exists], then ‘file_name’ will take precedence and be imported. Initial use and set up may be easier in interactive sessions.

Value

A tibble object with 9 columns containing the results of the checks.

Note

If ‘file_name’ is provided, it need not be fully defined. The value provided will be used to search the project directory.
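
Examples

# A hedged sketch assuming requirements built by [make_requirements] are
# available as 'import_requirements'; "import.json" is a hypothetical file.
obj <- jsonlite::fromJSON("import.json", simplifyVector = FALSE)
verify_import_requirements(obj, requirements_obj = "import_requirements")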


with_help R Documentation

Convenience application of add_help using pipes directly in UI.R

Description

This may not work for certain widgets with heavily nested HTML. Note that classes may be CSS dependent.

Usage

actionButton("example", "With Help") %>%
  with_help("Now with a question mark icon hosting a tooltip")
actionButton("example", "With Help") %>%
  with_help("Large and green", size = "xl", class = "success")

Arguments

widget

shiny.tag widget

tooltip

CHR scalar of the tooltip text

...

Other named arguments to be passed to ‘add_help’

Value

The widget provided with a hover tooltip icon appended to it.

Note

Most standard Shiny widgets are supported, but maybe not all.