Function Reference
This appendix contains links to all documented functions included as part of the DIMSpec toolkit. As is common in R packages, not all functions are documented, but most are. Functions referenced in the rest of this user guide are linked directly to their entry on this page. Click any function in this table of contents to open its documentation.
DIMSpec Help Index |
R Documentation |
activate_py_env | Activate a python environment |
active_connection | Is a connection object still available? |
add_help | Attach a superscript icon with a bsTooltip to an HTML element |
add_normalization_value | Add value(s) to a normalization table |
add_or_get_id | Utility function to add a record |
add_rdkit_aliases | Add fragment or compound aliases generated by RDKit functions |
adduct_formula | Add Adduct to Formula |
api_endpoint | Build an API endpoint programmatically |
api_open_doc | Open Swagger API documentation |
api_reload | Reloads the plumber API |
api_start | Start the plumber API |
api_stop | Stop the plumber API |
append_icon_to | Create the JS to append an icon to an HTML element by its ID |
bootstrap_compare_ms | Calculate dot product match score using bootstrap data |
build_db | Build or rebuild the database from scratch |
build_db_action | Build an escaped SQL query |
build_triggers | Build pairs of INSERT/UPDATE triggers to resolve foreign key relationships |
build_views | Build SQL to create views on normalized tables in SQLite |
calculate.monoisotope | Calculate the monoisotopic mass of an elemental formula list |
check_for_value | Check for a value in a database table |
check_fragments | Determine number of matching fragments between unknown mass spectrum and specific peaks |
check_isotopedist | Compare Isotopic Pattern to simulated pattern |
check_mzML_convert | Check mzML file for specific MSConvert parameters |
clause_where | Build a WHERE clause for SQL statements |
close_up_shop | Conveniently close all database connections |
compare_ms | Calculate dot product match score |
complete_form_entry | Ensure complete form entry |
create_fallback_build | Create an SQL file for use without the SQLite CLI |
create_peak_list | Create peak list from SQL ms_data table |
create_peak_table_ms1 | Create peak table for MS1 data |
create_peak_table_ms2 | Create peak table for MS2 data |
create_py_env | Create a python environment for RDKit |
create_search_df | Create data.frame containing parameters for extraction and searching |
create_search_ms | Generate uncertainty mass spectrum for MS1 and MS2 data |
data_dictionary | Create a data dictionary |
dataframe_match | Match multiple values in a database table |
dotprod | Calculate dot product |
dt_color_by | Apply colors to DT objects by value in a column |
dt_formatted | Easily format multiple DT objects in a shiny project in the same manner |
er_map | Create a simple entity relationship map |
export_msp | Export to MSP |
extend_suspect_list | Extend the compounds and aliases tables |
extract.elements | Elemental Formula Functions |
flush_dir | Flush a directory with archive |
fn_guide | View an index of help documentation in your browser |
fn_help | Get function documentation for this project |
format_id | Format a file name as an HTML element ID |
format_list_of_names | Grammatically collapse a list of values |
formulalize | Generate standard chemical formula notation |
full_import | Import one or more files from the NIST Method Reporting Tool for NTA |
gather_qc | Quality Control Check of Import Data |
get_annotated_fragments | Get all annotated fragments that have matching masses |
get_component | Resolve components from a list or named vector |
get_compound_fragments | Get all fragments associated with compounds |
get_compoundid | Get compound ID and name for specific peaks |
get_fkpk_relationships | Extract foreign key relationships from a schema |
get_massadj | Calculate the mass adjustment for a specific adduct |
get_msconvert_data | Extract msconvert metadata |
get_msdata | Get all mass spectral data within the database |
get_msdata_compound | Get all mass spectral data for a specific compound |
get_msdata_peakid | Get all mass spectral data for a specific peak id |
get_msdata_precursors | Get all mass spectral data with a specific precursor ion |
get_opt_params | Get optimized uncertainty mass spectra parameters for a peak |
get_peak_fragments | Get annotated fragments for a specific peak |
get_peak_precursor | Get precursor ion m/z for a specific peak |
get_sample_class | Get sample class information for specific peaks |
get_search_object | Generate msdata object from input peak data |
get_suspectlist | Get the current NIST PFAS suspect list. |
get_ums | Generate consensus mass spectrum |
get_uniques | Get unique components of a nested list |
getcharge | Get polarity of a ms scan within mzML object |
getmslevel | Get MS Level of a ms scan within mzML object |
getmzML | Brings raw data file into environment |
getprecursor | Get precursor ion of a ms scan within mzML object |
gettime | Get time of a ms scan within mzML object |
has_missing_elements | Simple check for whether an object is empty |
is_elemental_match | Checks if two elemental formulas match |
is_elemental_subset | Check if elemental formula is a subset of another formula |
isotopic_distribution | Isotopic distribution functions |
lockmass_remove | Remove lockmass scan from mzml object |
log_as_dataframe | Pull a log file into an R object |
log_fn | Simple logging convenience |
log_it | Conveniently log a message to the console |
make_acronym | Simple acronym generator |
make_install_code | Convenience function to set a new installation code |
make_requirements | Make import requirements file |
manage_connection | Check for, and optionally remove, a database connection object |
map_import | Map an import file to the database schema |
mode_checks | Get list of available functions |
molecule_picture | Picture a molecule from structural notation |
monoisotope.list | Calculate the monoisotopic masses of a list of elemental formulas |
ms_plot_peak | Plot a peak from database mass spectral data |
ms_plot_peak_overview | Create a patchwork plot of peak spectral properties |
ms_plot_spectra | Plot a fragment map from database mass spectral data |
ms_plot_spectral_intensity | Create a spectral intensity plot |
ms_plot_titles | Consistent titles for ms_plot_x functions |
ms_spectra_separated | Parse “Separated” MS Data |
ms_spectra_zipped | Parse “Zipped” MS Data |
mzMLconvert | Converts a raw file into an mzML |
mzMLtoR | Opens file of type mzML into R environment |
nist_shinyalert | Call [shinyalert::shinyalert] with specific styling |
obj_name_check | Sanity check for environment object names |
open_env | Convenience shortcut to open and edit session environment variables |
open_proj_file | Open and edit project files |
optimal_ums | Get the optimal uncertainty mass spectrum parameters for data |
overlap | Calculate overlap ranges |
pair_ums | Pairwise data.frame of two uncertainty mass spectra |
peak_gather_json | Extract peak data and metadata |
plot_compare_ms | Plot MS Comparison |
plot_ms | Generate consensus mass spectrum |
pool.sd | Pool standard deviations |
pool.ums | Pool uncertainty mass spectra |
pragma_table_def | Get table definition from SQLite |
pragma_table_info | Explore properties of an SQLite table |
py_modules_available | Are all conda modules available in the active environment |
rdkit_active | Sanity check on RDKit binding |
rdkit_mol_aliases | Create aliases for a molecule from RDKit |
read_log | Read a log from a log file |
rebuild_helps | Rebuild the help files as HTML with an index |
rectify_null_from_env | Rectify NULL values provided to functions |
ref_table_from_map | Get the name of a linked normalization table |
remove_db | Remove an existing database |
remove_icon_from | Remove the last icon attached to an HTML element |
remove_sample | Delete a sample |
repair_xl_casrn_forced_to_date | Repair CAS RNs forced to a date numeric by MSXL |
repl_nan | Replace NaN |
report_qc | Export QC result JSON file to PDF |
reset_logger_settings | Update logger settings |
resolve_compound_aliases | Resolve compound aliases provided as part of the import routine |
resolve_compound_fragments | Link together peaks, fragments, and compounds |
resolve_compounds | Resolve the compounds node during bulk import |
resolve_description_NTAMRT | Resolve the method description tables during import |
resolve_fragments_NTAMRT | Resolve the fragments node during database import |
resolve_method | Add an ms_method record via import |
resolve_mobile_phase_NTAMRT | Resolve the mobile phase node |
resolve_ms_data | Resolve and store mass spectral data during import |
resolve_ms_spectra | Unpack mass spectral data in compressed format |
resolve_multiple_values | Utility function to resolve multiple choices interactively |
resolve_normalization_value | Resolve a normalization value against the database |
resolve_peak_ums_params | Resolve and import optimal uncertain mass spectrum parameters |
resolve_peaks | Resolve the peaks node during import |
resolve_qc_data_NTAMRT | Resolve and import quality control data for import |
resolve_qc_methods_NTAMRT | Resolve and import quality control method information |
resolve_sample | Add a sample via import |
resolve_sample_aliases | Resolve and import sample aliases |
resolve_software_settings_NTAMRT | Import software settings |
resolve_table_name | Check presence of a database table |
save_data_dictionary | Save the current data dictionary to disk |
search_all | Search all mass spectra within database against unknown mass spectrum |
search_precursor | Search the database for all compounds with matching precursor ion m/z values |
setup_rdkit | Conveniently set up an RDKit python environment for use with R |
sigtest | Significance testing function |
smilestoformula | Convert SMILES string to Formula and other information |
sql_to_msp | Export SQL Database to a MSP NIST MS Format |
sqlite_auto_trigger | Create a basic SQL trigger for handling foreign key relationships |
sqlite_auto_view | Create a basic SQL view of a normalized table |
sqlite_parse_build | Parse SQL build statements |
sqlite_parse_import | Parse SQL import statements |
start_api | Start the plumber interface from a clean environment |
start_app | (WIP) Launch a shiny application |
start_rdkit | Start the RDKit integration |
summarize_check_fragments | Summarize results of check_fragments function |
support_info | R session information for support needs |
suspectlist_at_NIST | Open the NIST PDR entry for the current NIST PFAS suspect list |
table_msdata | Tabulate MS Data |
tack_on | Append additional named elements to a list |
tidy_comments | Tidy up table and field comments |
tidy_ms_spectra | Tidy Spectra |
tidy_spectra | Decompress Spectra |
unzip | Unzip binary data into vector |
update_all | Convenience function to rebuild all database related files |
update_data_sources | Dump current database contents |
update_env_from_file | Update a conda environment from a requirements file |
update_logger_settings | Update logger settings |
user_guide | Launch the User Guide for DIMSpec |
valid_file_format | Ensure files uploaded to a shiny app are of the required file type |
validate_casrns | Validate a CAS RN |
validate_column_names | Ensure database column presence |
validate_tables | Ensure database table presence |
verify_args | Verify arguments for a function |
verify_import_columns | Verify column names for import |
verify_import_requirements | Verify an import file’s properties |
with_help | Convenience application of ‘add_help’ using pipes directly in ‘UI.R’ |
activate_py_env | R Documentation |
Activate a python environment
Description
Programmatically setting up python bindings is a bit more convoluted than in a standard script. Given the name of a Python environment, it either (1) checks the provided ‘env_name’ against currently installed environments and binds the current session to it if found OR (2) installs a new environment with [create_py_env] and activates it by calling itself.
Usage
activate_py_env( env_name = NULL, required_libraries = NULL, required_modules = NULL, log_ns = NULL, conda_path = NULL )
Arguments
env_name
|
CHR scalar of a python environment name to bind. The default, NULL, will look for an environment variable named ‘PYENV_NAME’ |
required_libraries
|
CHR vector of python libraries to include in the environment, if building a new environment. Ignored if ‘env_name’ is an existing environment. The default, NULL, will look for an environment variable named ‘PYENV_LIBRARIES’. |
required_modules
|
CHR vector of modules to be checked for availability once the environment is activated. The default, NULL, will look for an environment variable named ‘PYENV_MODULES’. |
log_ns
|
CHR scalar of the logging namespace to use, if any. |
Details
It is recommended that project variables in ‘../config/env_py.R’ and ‘../config/env_glob.txt’ be used to control most of the behavior of this function. This works with both virtual and conda environments, though creation of new environments is done in conda.
Value
LGL scalar of whether or not activation was successful
Note
Where parameters are NULL, [rectify_null_from_env] will be used to get a value associated with it if they exist.
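A minimal sketch of a typical call (the environment name here is hypothetical; by default the function reads ‘PYENV_NAME’ from the session environment):
# Bind this session to a named python environment, creating it if needed.
activate_py_env(env_name = "reticulated_rdkit", required_modules = c("rdkit"))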
active_connection | R Documentation |
Is a connection object still available?
Description
This is a thin wrapper for [DBI::dbIsValid] with some error logging.
Usage
active_connection(db_conn = con)
Arguments
db_conn
|
connection object (default “con”) |
Value
LGL scalar indicating whether the database is available
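A common guard-clause pattern might look like the following sketch; pairing with [manage_connection] as the recovery step is an assumption about usage, not part of this function:
# Reconnect if the connection object has gone stale.
if (!active_connection(con)) manage_connection()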
add_help | R Documentation |
Attach a superscript icon with a bsTooltip to an HTML element
Description
Attach a superscript icon with a bsTooltip to an HTML element
Usage
add_help( id, tooltip, icon_name = "question", size = "xs", icon_class = "info-tooltip primary", ... )
Arguments
id
|
CHR scalar of the HTML ID to which to append the icon |
tooltip
|
CHR scalar of the tooltip text |
icon_name
|
CHR scalar of the icon name, which must be understandable by the ‘shiny::icon’ function; e.g. a font-awesome icon name (default: “question”). |
size
|
CHR scalar of the general icon size as understandable by the font-awesome library (default: “xs”) |
icon_class
|
CHR vector of classes to apply to the ‘<sup>’ container, as defined in the current CSS (default: “info-tooltip primary”) |
…
|
Other named arguments to be passed to ‘shinyBS::bsTooltip’ |
Value
LIST of HTML tags for the desired help icon and its tooltip
Note
The following CSS is typically defined to go with this:
.info-tooltip { opacity: 30%; transition: opacity .25s; }
.info-tooltip:hover { opacity: 100%; }
.primary { color: #3c8dbc; }
Examples
add_help("example", "a tooltip")
add_normalization_value | R Documentation |
Add value(s) to a normalization table
Description
One of the most common database operations is to look up or add a value in a normalization table. This utility function adds a single value and returns its associated id by using [build_db_action]. This is only suitable for a single value. If you need to bulk add multiple new values, use this with something like [lapply].
Usage
add_normalization_value("norm_table", name = "new value", acronym = "NV")
Arguments
db_table
|
CHR scalar of the normalization table’s name |
db_conn
|
connection object (default “con”) |
log_ns
|
CHR scalar of the logging namespace to use during execution (default: “db”) |
id_column
|
CHR scalar of the column to use as the primary key identifier for ‘db_table’ (default: “id”) |
database_map
|
LIST of the database entity relationship map, typically from calling [er_map]. If NULL (default) the object “db_map” will be searched for and used by default, otherwise it will be created with [er_map] |
…
|
CHR vector of additional named arguments to be added; names not appearing in the referenced table will be ignored |
Value
NULL if unable to add the values, INT scalar of the new ID otherwise
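Building on the [lapply] suggestion above, a hypothetical bulk addition (the table and values are illustrative only):
# Add several values to a normalization table, collecting their new IDs.
new_ids <- lapply(
  c("acetonitrile", "methanol"),
  function(v) add_normalization_value("norm_solvents", name = v)
)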
add_or_get_id | R Documentation |
Utility function to add a record
Description
Checks a table in the attached SQL connection for a primary key ID matching the provided ‘values’ and returns the ID. If none exists, adds a record and returns the resulting ID if successful. Values should be provided as a named vector of the values to add. No data coercion is performed, relying almost entirely on the database schema or preprocessing to ensure data integrity.
Usage
add_or_get_id( db_table, values, db_conn = con, ensure_unique = TRUE, require_all = TRUE, ignore = FALSE, log_ns = "db" )
Arguments
db_table
|
CHR scalar name of the database table being modified |
values
|
named vector of the values being added, passed to [build_db_action] |
db_conn
|
connection object (default: con) |
ensure_unique
|
LGL scalar of whether or not to first check that the values provided form a new unique record (default: TRUE) |
require_all
|
LGL scalar of whether to require all columns (except the assumed primary key column of “id”) or only those defined as “NOT NULL” (default: TRUE requires the presence of all columns in the table) |
ignore
|
LGL scalar on whether to treat the insert try as an “INSERT OR IGNORE” SQL statement (default: FALSE) |
log_ns
|
CHR scalar of the logging namespace to use (default: “db”) |
Details
Provided values are checked against required columns in the table using [verify_import_columns].
Operations to add the record and get the resulting ID are both performed with [build_db_action] and are performed virtually back to back with the latest-added ID being given preference in cases where added values may match multiple extant records.
Value
INT scalar of the record identifier
Note
If this is used in high volume/traffic applications, ID conflicts may occur if the timing is such that another record containing identical values is added before the call getting the ID completes.
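A hypothetical call (the table and column names are illustrative only):
# Look up or create a contributor record and return its primary key.
contributor_id <- add_or_get_id(
  "contributors",
  values = c(first_name = "Jane", last_name = "Doe"),
  require_all = FALSE
)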
add_rdkit_aliases | R Documentation |
Add fragment or compound aliases generated by RDKit functions
Description
Aliases are stored for both compounds and fragments within the database to facilitate search and unambiguous identification. Given one molecular structure notation (SMILES is preferred), other machine-readable expressions can be generated quickly. Requested aliases as provided to ‘rdkit_aliases’ will be prefixed by ‘mol_to_prefix’ and checked against the namespace of available functions in RDKit, and the correct functions will be automatically assigned.
Usage
add_rdkit_aliases( identifiers, alias_category = c("compounds", "fragments"), compound_aliases_table = "compound_aliases", fragment_aliases_table = "fragment_aliases", inchi_prefix = "InChI=1S/", rdkit_name = ifelse(exists("PYENV_NAME"), PYENV_NAME, "rdkit"), rdkit_ref = ifelse(exists("PYENV_REF"), PYENV_REF, "rdk"), rdkit_ns = "rdk", rdkit_make_if_not = TRUE, rdkit_aliases = c("inchi", "inchikey"), mol_to_prefix = "MolTo", mol_from_prefix = "MolFrom", type = "smiles", as_object = TRUE, db_conn = con, log_ns = "rdk" )
Arguments
identifiers
|
CHR vector of machine readable notations in ‘type’ format |
alias_category
|
CHR scalar, one of “compounds” or “fragments” to determine where in the database to store the resulting aliases (default: “compounds”) |
compound_aliases_table
|
CHR scalar name of the database table holding compound aliases (default: “compound_aliases”) |
fragment_aliases_table
|
CHR scalar name of the database table holding fragment aliases (default: “fragment_aliases”) |
inchi_prefix
|
CHR scalar prefix for the InChI code to use, if InChI is requested as part of ‘rdkit_aliases’ |
rdkit_name
|
CHR scalar name of the python environment in which RDKit is installed (default: the session variable PYENV_NAME, or “rdkit”) |
rdkit_ref
|
CHR scalar name of the R pointer object to RDKit (default: the session variable PYENV_REF, or “rdk”) |
rdkit_ns
|
CHR scalar name of the logging namespace to use (default: “rdk”); will be ignored if logging is off |
rdkit_make_if_not
|
LGL scalar of whether to create an RDKit environment if it does not exist (default: TRUE) |
rdkit_aliases
|
CHR vector of machine-readable aliases to generate, which must be recognizable as names in the RDKit namespace when prefixed by ‘mol_to_prefix’ (default: c(“inchi”, “inchikey”)); these are not case sensitive |
mol_to_prefix
|
CHR scalar of the prefix identifying alias creation functions, which must be recognizable as names in the RDKit namespace when suffixed by ‘rdkit_aliases’ (default: “MolTo”); this is not case sensitive |
mol_from_prefix
|
CHR scalar of the prefix identifying molecule expression creation functions, which must be recognizable as names in the RDKit namespace when suffixed by ‘type’ (default: “MolFrom”); this is not case sensitive |
type
|
CHR scalar indicating the type of ‘identifiers’ to be converted to molecule notation (default: “smiles”); this is not case sensitive |
as_object
|
LGL scalar indicating whether to return the alias list to the session as an object (default: TRUE) or write aliases to the database (FALSE) |
db_conn
|
connection object (default: con) |
log_ns
|
CHR scalar of the logging namespace to use (default: “db”) |
Value
If ‘as_object’ == TRUE, a data.frame of the generated aliases, otherwise no return and a database insertion will be performed
Note
It is not recommended to change the defaults here unless you are familiar with the naming conventions of RDKit.
Requires both INFORMATICS and USE_RDKIT set to TRUE in the session and a valid installation of the RDKIT python environment to function.
See the RDKit Documentation for more details.
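A hedged sketch of session use, assuming a working RDKit binding as noted above (the SMILES string is illustrative):
# Generate InChI and InChIKey aliases from a SMILES string, returned as an object.
aliases <- add_rdkit_aliases(
  identifiers = "OC(=O)C(F)(F)F",
  alias_category = "compounds",
  rdkit_aliases = c("inchi", "inchikey"),
  as_object = TRUE
)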
adduct_formula | R Documentation |
Add Adduct to Formula
Description
Add Adduct to Formula
Usage
adduct_formula(elementalformula, adduct = "+H")
Arguments
elementalformula
|
character string elemental formula |
adduct
|
character string adduct state to add to the elemental formula, must contain an element, options are ‘+H’, ‘-H’, ‘+Na’, ‘+K’ |
Value
character string containing elemental formula with adduct
Examples
adduct_formula("C2H5O", adduct = "+H")
api_endpoint | R Documentation |
Build an API endpoint programmatically
Description
This is a convenience function intended to support plumber endpoints. It only assists in the construction (and execution if ‘execute’ == TRUE) of endpoints; the endpoint itself must still be understood by the target server. Validity checking, execution, and opening in a web browser are supported. Invalid endpoints will not be executed or opened for viewing.
Usage
api_endpoint( path, ..., server_addr = PLUMBER_URL, check_valid = TRUE, execute = TRUE, open_in_browser = FALSE, raw_result = FALSE, max_pings = 20L, return_type = c("text", "raw", "parsed"), return_format = c("vector", "data.frame", "list") )
Arguments
path
|
CHR scalar of the endpoint path. |
…
|
Additional named parameters added to the endpoint, most typically the query portion. If only one is provided, it can remain unnamed and a query is assumed. If more than one is provided, all must be named. Named elements must be components of the return from [httr::parse_url] (see https://tools.ietf.org/html/rfc3986 for details of the parsing algorithm); unrecognized elements will be ignored. |
server_addr
|
CHR scalar uniform resource locator (URL) address of an API server (e.g. “https://myapi.com:8080”) (defaults to the current environment variable “PLUMBER_URL”) |
check_valid
|
LGL scalar on whether or not to first check that an endpoint returns a valid status code (200-299) (default: TRUE). |
execute
|
LGL scalar of whether or not to execute the constructed endpoint and return the result; will be defaulted to FALSE if ‘check_valid’ == TRUE and the endpoint returns anything other than a valid status code. (default: TRUE) |
open_in_browser
|
LGL scalar of whether or not to open the resulting endpoint in the system’s default browser; will be defaulted to FALSE if ‘check_valid’ == TRUE and the endpoint returns anything other than a valid status code. (default: FALSE) |
max_pings
|
INT scalar maximum number of pings to try before timeout if using endpoint “_ping”; this is only used for endpoint “_ping” (default: 20) |
return_type
|
CHR scalar on which return type to use, which must be one of “text”, “raw”, or “parsed” which will be used to read the content of the response item (default: “text”) |
return_format
|
CHR scalar on which form to return data, which must be one of “vector”, “data.frame”, or “list” (default: “vector” to support primarily single value responses) |
Value
CHR scalar of the constructed endpoint, with messages regarding status checks, return from the endpoint (typically JSON) if valid and ‘execute’ == TRUE, or NONE if ‘open_in_browser’ == TRUE
Note
Special support is provided for the way in which the NIST Public Data Repository treats URL fragments
This only supports [httr::GET] requests.
Examples
api_endpoint("https://www.google.com/search", list(q = "something"), open_in_browser = TRUE) api_endpoint("https://www.google.com/search", query = list(q = "NIST Public Data Repository"), open_in_browser = TRUE)
api_open_doc | R Documentation |
Open Swagger API documentation
Description
This will launch the Swagger UI in a browser tab. The URL suffix “docs” will be automatically added if not part of the host URL accepted as ‘url’.
Usage
api_open_doc(url = PLUMBER_URL)
Arguments
url
|
CHR URL/URI of the plumber documentation host (default: environment variable “PLUMBER_URL”) |
Value
None, opens a browser to the requested URL
api_reload | R Documentation |
Reloads the plumber API
Description
Depending on system architecture, the plumber service may take some time to spin up and spin down. If ‘background’ is TRUE, this may mean the calling R thread runs ahead of the background process resulting in unexpected behavior (e.g. newly defined endpoints not being available), effectively binding it to the prior iteration. If the API does not appear to be reloading properly, it may be necessary to manually kill the process controlling it through your OS and to call this function again.
Usage
api_reload( pr = NULL, background = TRUE, plumber_file = NULL, on_host = NULL, on_port = NULL, log_ns = "api" )
Arguments
pr
|
CHR scalar name of the plumber service object, typically only created as a background observer from [callr::r_bg] as a result of calling [api_reload] (default: NULL gets the environment setting for PLUMBER_OBJ_NAME) |
background
|
LGL scalar of whether to load the plumber server as a background service (default: TRUE); set to FALSE for testing |
plumber_file
|
CHR scalar of the path to a plumber API to launch (default: NULL) |
on_host
|
CHR scalar of the host IP address (default: NULL) |
on_port
|
CHR or INT scalar of the host port to use (default: NULL) |
log_ns
|
CHR scalar namespace to use for logging (default: “api”) |
Value
Launches the plumber API service on your local machine and returns the URL on which it can be accessed as a CHR scalar
api_start | R Documentation |
Start the plumber API
Description
This is a wrapper to [plumber::pr_run] pointing to a project’s opinionated plumber settings with some error trapping. The host, port, and plumber file are set in the “config/env_R.R” location as PLUMBER_HOST, PLUMBER_PORT, and PLUMBER_FILE respectively.
Usage
api_start(plumber_file = NULL, on_host = NULL, on_port = NULL)
Arguments
plumber_file
|
CHR scalar of the path to a plumber API to launch (default: NULL) |
on_host
|
CHR scalar of the host IP address (default: NULL) |
on_port
|
CHR or INT scalar of the host port to use (default: NULL) |
Value
LGL scalar with success status
Note
If either of ‘on_host’ or ‘on_port’ are NULL they will default first to any existing environment values of PLUMBER_HOST and PLUMBER_PORT, then to getOption(“plumber.host”, “127.0.0.1”) and getOption(“plumber.port”, 8080)
This will fail if the requested port is in use.
api_stop | R Documentation |
Stop the plumber API
Description
Stop the plumber API
Usage
api_stop(pr = NULL, flush = TRUE, db_conn = "con", remove_service_obj = TRUE)
Arguments
pr
|
CHR scalar name of the plumber service object, typically only created as a background observer from [callr::r_bg] as a result of calling [api_reload] (default: NULL gets the environment setting for PLUMBER_OBJ_NAME) |
flush
|
LGL scalar of whether to disconnect and reconnect to a database connection named as ‘db_conn’ (default: TRUE) |
db_conn
|
CHR scalar of the connection object name (default: “con”) |
remove_service_obj
|
LGL scalar of whether to remove the reference to ‘pr’ from the current global environment (default: TRUE) |
Value
None, stops the plumber server
Note
This will also kill and restart the connection object if ‘flush’ is TRUE to release connections with certain configurations such as SQLite in write ahead log mode.
This function assumes the object referenced by name ‘pr’ exists in the global environment, and ‘remove_service_obj’ will only remove it from .GlobalEnv.
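Taken together, a typical local API lifecycle might look like the following sketch (this assumes project environment variables such as PLUMBER_URL are set as described in api_start):
api_start()                     # launch using settings from config/env_R.R
api_open_doc()                  # open the Swagger UI in a browser
api_reload(background = TRUE)   # restart after editing endpoint definitions
api_stop(flush = TRUE)          # stop the service and refresh the connection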
append_icon_to | R Documentation |
Create the JS to append an icon to an HTML element by its ID
Description
Create the JS to append an icon to an HTML element by its ID
Usage
append_icon_to(id, icon_name, icon_class = NULL)
Arguments
id
|
CHR scalar of the HTML ID to which to append an icon |
icon_name
|
CHR scalar of the icon name, which must be understandable by the ‘shiny::icon’ function; e.g. a font-awesome icon name. |
icon_class
|
CHR vector of classes to apply |
Value
CHR scalar suitable to execute with ‘shinyjs::runJS’
Examples
append_icon_to("example", "r-project", "fa-3x")
bootstrap_compare_ms | R Documentation |
Calculate dot product match score using bootstrap data
Description
Calculates the match score (based on the dot product) of two uncertainty mass spectra. A distribution of match scores is generated by bootstrapping data from the uncertainties of the two mass spectra (using ‘rnorm’ for now).
Usage
bootstrap_compare_ms( ms1, ms2, error = c(5, 5), minerror = c(0.002, 0.002), m = 1, n = 0.5, runs = 10000 )
Arguments
ms1, ms2
|
the uncertainty mass spectra from function ‘get_ums’ |
error
|
a vector of the respective mass error (in ppm) for each mass spectrum or a single vector representing the mass error for all m/z values |
minerror
|
a two component vector of the respective minimum mass error (in Da) for each mass spectrum or a single value representing the minimum mass error of all m/z values |
m, n
|
weighting values for mass (m) and intensity (n) |
runs
|
INT scalar of the number of bootstrap iterations used to generate the distribution of match scores (default: 10000) |
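A hypothetical comparison of two consensus spectra produced by ‘get_ums’ (object names are assumptions; the return is presumably the distribution of bootstrapped match scores, as no Value section is documented):
scores <- bootstrap_compare_ms(ums_unknown, ums_reference, runs = 1000)
quantile(scores, c(0.025, 0.5, 0.975))  # summarize the score distribution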
build_db | R Documentation |
Build or rebuild the database from scratch
Description
This function will build or rebuild the NIST HRAMS database structure from scratch, removing the existing instance. By default, most parameters are set in the environment (at “./config/env_glob.txt”) but any values can be passed directly. This can be used to quickly spin up multiple copies with a clean slate using different build files, data files, or return to the last stable release.
Usage
build_db(db = "test_db.sqlite", db_conn_name = "test_conn")
Arguments
db
|
CHR scalar of the database name (default: session value DB_NAME) |
build_from
|
CHR scalar of a SQL build script to use (default: environment value DB_BUILD_FILE) |
populate
|
LGL scalar of whether to populate with data from the file in ‘populate_with’ (default: TRUE) |
populate_with
|
CHR scalar for the populate script (e.g. “populate_demo.sql”) to run after the build is complete; (default: session value DB_DATA); ignored if ‘populate = FALSE’ |
archive
|
LGL scalar of whether to create an archive of the current database (if it exists) matching the name supplied in argument ‘db’ (default: FALSE), passed to [‘remove_db()’] |
sqlite_cli
|
CHR scalar to use to look for installed sqlite3 CLI tools in the current system environment (default: session value SQLITE_CLI) |
connect
|
LGL scalar of whether or not to connect to the rebuilt database in the global environment as object ‘con’ (default: FALSE) |
Details
If sqlite3 and its command line interface are available on your platform, that will be used (preferred method) but, if not, this function will read in all the necessary files to directly create it using shell commands. The shell method may not be universally applicable to certain compute environments or may require elevated permissions.
Value
None, check console for details
build_db_action | R Documentation |
Build an escaped SQL query
Description
In most cases, issuing basic SQL queries is made easy by tidyverse compliant functions such as [dplyr::tbl]. Full interaction with an SQLite database is a bit more complicated and typically requires [DBI::dbExecute] and writing SQL directly; several helpers exist for that (e.g. [glue::glue_sql]) but aren’t as friendly or straightforward when writing more complicated actions, and still require directly writing SQL equivalents, routing through [DBI::dbQuoteIdentifier] and [DBI::dbQuoteLiteral] to prevent SQL injection attacks.
Usage
build_db_action("insert", "table", values = list(col1 = "a", col2 = 2, col3 = "describe"), execute = FALSE) build_db_action("insert", "table", values = list(col1 = "a", col2 = 2, col3 = "describe")) build_db_action("get_id", "table", match_criteria = list(id = 2)) build_db_action("delete", "table", match_criteria = list(id = 2)) build_db_action("select", "table", columns = c("col1", "col2", "col3"), match_criteria = list(id = 2)) build_db_action("select", "table", match_criteria = list(sample_name = "sample 123")) build_db_action("select", "table", match_criteria = list(sample_name = list(value = "sample 123", exclude = TRUE)) build_db_action("select", "table", match_criteria = list(sample_name = "sample 123", sample_contributor = "Smith"), and_or = "AND", limit = 5)
Arguments
action
|
CHR scalar, of one “INSERT”, “UPDATE”, “SELECT”, “GET_ID”, or “DELETE” |
table_name
|
CHR scalar of the table name to which this query applies |
column_names
|
CHR vector of column names to include (default NULL) |
values
|
LIST of CHR vectors with values to INSERT or UPDATE (default NULL) |
match_criteria
|
LIST of matching criteria with names matching columns against which to apply. In the simplest case, a direct value is given to the name (e.g. ‘list(last_name = “Smith”)’) for single matches. All match criteria must be their own list item. Values can also be provided as a nested list for more complicated WHERE clauses with names ‘values’, ‘exclude’, and ‘like’ that will be recognized. ‘values’ should be the actual search criteria, and if a vector of length greater than one is specified, the WHERE clause becomes an IN clause. ‘exclude’ (LGL scalar) determines whether to apply the NOT operator. ‘like’ (LGL scalar) determines whether this is an equality, list, or similarity. To reverse the example above by issuing a NOT statement, use ‘list(last_name = list(values = “Smith”, exclude = TRUE))’, or to look for all records LIKE (or NOT LIKE) “Smith”, set this as ‘list(last_name = list(values = “Smith”, exclude = FALSE, like = TRUE))’ |
case_sensitive
|
LGL scalar of whether to match on a case sensitive basis (the default TRUE searches for values as-provided) or whether to coerce value matches by upper, lower, sentence, and title case matches |
and_or
|
CHR scalar of whether to use “AND” or “OR” for multiple criteria, which will be used to combine them all. More complicated WHERE clauses (including a mixture of AND and OR usage) should be built directly. (default: “OR”) |
limit
|
INT scalar of the maximum number of rows to return (default NULL) |
group_by
|
CHR vector of columns by which to group (default NULL) |
order_by
|
named CHR vector of columns by which to order, with names matching columns and values indicating whether to sort ascending (default NULL) |
distinct
|
LGL scalar of whether or not to apply the DISTINCT clause to all match criteria (default FALSE) |
get_all_columns
|
LGL scalar of whether to return all columns; will be set to TRUE automatically if no column names are provided (default FALSE) |
execute
|
LGL scalar of whether or not to immediately execute the build query statement (default TRUE) |
single_column_as_vector
|
LGL scalar of whether to return results as a vector if they consist of only a single column (default TRUE) |
log_ns
|
CHR scalar of the logging namespace to use during execution (default: “db”) |
Details
This function is intended to ease that by taking care of most of the associated logic and enabling routing through other functions, or picking up arguments from within other function calls.
Value
CHR scalar of the constructed query
build_triggers | R Documentation |
Build pairs of INSERT/UPDATE triggers to resolve foreign key relationships
Description
When building schema by script, it is often handy to enforce certain behaviors on database transactions involving foreign keys, especially in SQLite. Given a properly structured list object describing the mappings between tables in a schema (e.g. one deriving from [er_map]), this function will parse those for foreign key relationships.
Usage
build_triggers(er_map(db_conn = con))
Arguments
db_map
|
LIST object containing descriptions of table mapping in an opinionated manner, generally generated by [er_map]. The expectation is a list of tables, with references in SQL form enumerated in a child element with a name matching ‘references_in’ |
references_in
|
CHR scalar naming the child element containing SQL references statements of the form “fk_column REFERENCES table(pk_column)” (default: “references” is provided by [er_map]) |
create_insert_trigger
|
LGL scalar indicating whether to build an insert trigger for each table (default: TRUE). |
create_update_trigger
|
LGL scalar indicating whether to build an update trigger for each table (default: FALSE). |
save_to_file
|
CHR scalar of a file in which to write the output, if any (default: NULL will return the resulting object to the R session) |
Details
Primarily, this requires a list object referring to tables that contains in each element a child element with the name provided in ‘references_in’. The pre-pass parsing function [get_fkpk_relationships] is used to pull references from the full map.
Value
LIST object containing one element for each table in ‘db_map’ containing foreign key references, with one child element for each generated trigger statement
Note
Tables in ‘db_map’ that do not contain foreign key relationships will be dropped from the output list.
This is largely a convenience function to programmatically apply [make_sql_triggers] to an entire schema. To skip tables with defined foreign key relationships for which triggers are undesirable, remove those tables from ‘db_map’ prior to calling this function.
build_views | R Documentation |
Build SQL to create views on normalized tables in SQLite
Description
Build SQL to create views on normalized tables in SQLite
Usage
build_views(db_map = er_map(con), dictionary = data_dictionary(con))
Arguments
db_map
|
LIST object containing descriptions of table mapping in an opinionated manner, generally generated by [er_map]. The expectation is a list of tables, with references in SQL form enumerated in a child element with a name matching ‘references_in’ |
references_in
|
CHR scalar naming the child element containing SQL references statements of the form “fk_column REFERENCES table(pk_column)” (default: “references” is provided by [er_map]) |
dictionary
|
LIST object containing the schema dictionary produced by [data_dictionary] fully describing table entities |
drop_if_exists
|
LGL scalar indicating whether to include a “DROP VIEW” prefix for the generated view statement; as this has an impact on schema, no default is set |
save_to_file
|
CHR scalar name of a file path to save generated SQL (default: NULL will return a list object to the R session) |
append
|
LGL scalar on whether to append to ‘save_to_file’ (default: FALSE) |
Value
LIST if ‘save_to_file’ is not provided, otherwise none
calculate.monoisotope | R Documentation |
Calculate the monoisotopic mass of an elemental formula list
Description
Calculate the monoisotopic mass of an elemental formula list
Usage
calculate.monoisotope( elementlist, exactmasses = NULL, adduct = "neutral", db_conn = "con" )
Arguments
elementlist
|
list of elemental formula from ‘extract.elements’ function |
exactmasses
|
list of exact masses of elements |
adduct
|
character string adduct/charge state to add to the elemental formula, options are ‘neutral’, ‘+H’, ‘-H’, ‘+Na’, ‘+K’, ‘+’, ‘-’, ‘-radical’, ‘+radical’ |
db_conn
|
database connection object, either a CHR scalar name (default: “con”) or the connection object itself (preferred) |
Value
numeric monoisotopic exact mass
Examples
elementlist <- extract.elements("C2H5O")
calculate.monoisotope(elementlist, adduct = "neutral")
check_for_value | R Documentation |
Check for a value in a database table
Description
This convenience function simply checks whether a value exists in the distinct values of a given column. Only one column may be searched at a time; serialize it in other code to check multiple columns. It leverages the flexibility of [build_db_action] to do the searching. The ‘values’ parameter will be fed directly and can accept the nested list structure defined in [clause_where] for exclusions and like clauses.
Usage
con2 <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
alphabet <- dplyr::tibble(lower = letters, upper = LETTERS)
dplyr::copy_to(con2, alphabet)
check_for_value("A", "alphabet", "upper", db_conn = con2)
check_for_value("A", "alphabet", "lower", db_conn = con2)
check_for_value(letters[1:10], "alphabet", "lower", db_conn = con2)
Arguments
values
|
CHR vector of the values to search |
db_table
|
CHR scalar of the database table to search |
db_column
|
CHR scalar of the column to search |
case_sensitive
|
LGL scalar of whether to match on a case sensitive basis (the default TRUE searches for values as-provided) or whether to coerce value matches by upper, lower, sentence, and title case matches |
db_conn
|
connection object (default: con) |
fuzzy
|
LGL scalar of whether to do a “fuzzy” match in the sense that values provided are wrapped in an SQL LIKE comparison as “%value%”, ignoring the ‘case_sensitive’ setting if TRUE (default: FALSE). |
Value
LIST of length 1-2 containing “exists” as a LGL scalar for whether the values were found, and “values” containing the result of the database call, a data.frame object containing matching rows or NULL if exists == FALSE.
check_fragments | R Documentation |
Determine number of matching fragments between unknown mass spectrum and specific peaks
Description
Determine number of matching fragments between unknown mass spectrum and specific peaks
Usage
check_fragments(con, ums, peakid, masserror = 5, minerror = 0.001)
Arguments
con
|
SQLite database connection |
ums
|
uncertainty mass spectrum of unknown compound |
peakid
|
integer vector of primary keys for peaks table |
masserror
|
numeric relative mass error (ppm) |
minerror
|
numeric minimum mass error (Da) |
Value
table of fragments with a TRUE/FALSE flag indicating whether each fragment is present within the unknown mass spectrum
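A hypothetical call against three peak IDs, with the unknown spectrum assumed to come from the uncertainty mass spectrum workflow; pairing with [summarize_check_fragments] is an assumption about usage:
frag_matches <- check_fragments(con, ums = unknown_ums, peakid = 1:3, masserror = 5, minerror = 0.001)
summarize_check_fragments(frag_matches)  # summarize the matching results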
check_isotopedist | R Documentation |
Compare Isotopic Pattern to simulated pattern
Description
Calculates the isotopic distribution of the stated elemental formula and compares it against the empirical mass spectrum.
Usage
check_isotopedist( ms, elementalformula, exactmasschart, error, minerror = 0.002, remove.elements = c(), max.dist = 3, min.int = 0.001, charge = "neutral", m = 1, n = 0.5 )
Arguments
ms
|
data.frame mass spectrum containing pair-wise m/z and intensity values of empirical isotopic pattern |
elementalformula
|
character string of elemental formula to simulate isotopic pattern |
exactmasschart
|
exact mass chart |
error
|
numeric relative mass error (in ppm) of mass spectrometer |
minerror
|
numeric minimum mass error (in Da) of mass spectrometer |
remove.elements
|
character vector of elements to remove from elemental formula |
max.dist
|
numeric maximum mass distance (in Da) from exact mass to include in simulated isotopic pattern |
min.int
|
numeric minimum relative intensity (maximum = 1, minimum = 0) to include in simulated isotopic pattern |
charge
|
character string for the charge state of the simulated isotopic pattern, options are ‘neutral’, ‘positive’, and ‘negative’ |
m
|
numeric dot product mass weighting |
n
|
numeric dot product intensity weighting |
Value
numeric vector of match scores between the empirical and calculated isotopic distribution.
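A hedged sketch (the observed spectrum object and exact mass chart are assumptions; the formula is illustrative):
# Compare an observed isotopic pattern against the simulated pattern for PFOS.
score <- check_isotopedist(
  ms = observed_ms,
  elementalformula = "C8HF17O3S",
  exactmasschart = exactmasschart,
  error = 5,
  charge = "negative"
)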
check_mzML_convert | R Documentation |
Check mzML file for specific MSConvert parameters
Description
Check mzML file for specific MSConvert parameters
Usage
check_mzML_convert(mzml)
Arguments
mzml
|
list of msdata from ‘mzMLtoR’ function |
Value
data.frame object of conversion veracity checks
clause_where | R Documentation |
Build a WHERE clause for SQL statements
Description
Properly escaping SQL to prevent injection attacks can be difficult with more complicated queries. This clause constructor is intended to be specific to the WHERE clause of SELECT and UPDATE statements. The majority of construction is achieved with the ‘match_criteria’ parameter, which should always be a list with names for the columns to appear in the WHERE clause. A variety of conveniences are built in, from simple comparisons to more complicated ones including negation and similarity (see the description for argument ‘match_criteria’).
Usage
clause_where(ANSI(), "example", list(foo = "bar", cat = "dog")) clause_where(ANSI(), "example", list(foo = list(values = "bar", like = TRUE))) clause_where(ANSI(), "example", list(foo = list(values = "bar", exclude = TRUE)))
Arguments
table_names
|
CHR vector of tables to search |
match_criteria
|
LIST of matching criteria with names matching columns against which to apply. In the simplest case, a direct value is given to the name (e.g. ‘list(last_name = “Smith”)’) for single matches. All match criteria must be their own list item. Values can also be provided as a nested list for more complicated WHERE clauses with names ‘values’, ‘exclude’, and ‘like’ that will be recognized. ‘values’ should be the actual search criteria, and if a vector of length greater than one is specified, the WHERE clause becomes an IN clause. ‘exclude’ (LGL scalar) determines whether to apply the NOT operator. ‘like’ (LGL scalar) determines whether this is an equality, list, or similarity. To reverse the example above by issuing a NOT statement, use ‘list(last_name = list(values = “Smith”, exclude = TRUE))’, or to look for all records LIKE (or NOT LIKE) “Smith”, set this as ‘list(last_name = list(values = “Smith”, exclude = FALSE, like = TRUE))’ |
case_sensitive
|
LGL scalar of whether to match on a case sensitive basis (the default TRUE searches for values as-provided) or whether to coerce value matches by upper, lower, sentence, and title case matches |
and_or
|
CHR scalar of whether to use “AND” or “OR” for multiple criteria, which will be used to combine them all. More complicated WHERE clauses (including a mixture of AND and OR usage) should be built directly. (default: “OR”) |
Value
CHR scalar of the constructed where clause for an SQL statement
close_up_shop | R Documentation |
Conveniently close all database connections
Description
This closes both the plumber service and all database connections from the current running environment. If outstanding promises to database tables or views exist as class ‘tbl_’ objects (e.g. created with ‘tbl(con, “table”)’), set ‘back_up_connected_tbls’ to TRUE to collect data from them and preserve it in place in the current global environment.
Usage
manage_connection()
close_up_shop(TRUE)
Arguments
back_up_connected_tbls
|
LGL scalar of whether to clone currently promised tibble connections to database objects as data frames (default: FALSE). |
Value
None, modifies the current global environment in place
compare_ms | R Documentation |
Calculate dot product match score
Description
Calculates the match score (based on the dot product) of two uncertainty mass spectra. Note: this is a static match score and does not include associated uncertainties.
Usage
compare_ms( ms1, ms2, error = c(5, 5), minerror = c(0.002, 0.002), m = 1, n = 0.5 )
Arguments
ms1, ms2
|
the uncertainty mass spectra from function ‘get_ums’ |
error
|
a vector of the respective mass error (in ppm) for each mass spectrum or a single vector representing the mass error for all m/z values |
minerror
|
a two component vector of the respective minimum mass error (in Da) for each mass spectrum or a single value representing the minimum mass error of all m/z values |
m, n
|
weighting values for mass (m) and intensity (n) |
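A minimal sketch comparing two consensus spectra from ‘get_ums’ (object names are assumptions):
score <- compare_ms(ums_unknown, ums_reference, error = c(5, 5), minerror = c(0.002, 0.002))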
complete_form_entry | R Documentation |
Ensure complete form entry
Description
This input validation check ensures the current session’s input object includes non-NA, non-NULL, and non-blank values similarly to [shiny::req] and [shiny::validate] but can be called with a predefined list of input names to check. Typically this is used to validate form entry completion. Call this function prior to reading form entries to ensure that all values requested by name in ‘values’ are present. If they are not, a [nist_shinyalert] modal is displayed prompting the user to complete the form.
Usage
req(complete_form_entry(input, c("need1", "need2")))
Arguments
input
|
The session input object |
values
|
CHR vector of input object names to require |
show_alert
|
LGL scalar indicating whether or not to show an alert, set FALSE to return the status of the check |
Value
Whether or not all required values are present.
create_fallback_build | R Documentation |
Create an SQL file for use without the SQLite CLI
Description
For cases where the SQLite Command Line Interface is not available, dot commands used to simplify the database build pipeline are not usable. Call this function to create a self-contained SQL build file that can be used in [build_db] to build the database. The self-contained file will include all “CREATE” and “INSERT” statements necessary by parsing lines including “.read” and “.import” commands and directly reading referenced files.
Usage
create_fallback_build(build_file = file.path("config", "build.sql"))
Arguments
build_file
|
CHR scalar name of the SQL build file to use. The default, NULL, will use the environment variable “DB_BUILD_FILE” if it is available. |
populate
|
LGL scalar of whether to populate data (default: TRUE) |
populate_with
|
CHR scalar name of the SQL population file to use. The default, NULL, will use the environment variable “DB_DATA” if it is available. |
driver
|
CHR scalar of the database driver class to use to correctly interpolate SQL commands (default: “SQLite”) |
comments
|
CHR scalar regex identifying SQLite comments |
out_file
|
CHR scalar of the output file name and destination. The default, NULL, will write to a file named similarly to ‘build_file’ suffixed with “_full”. |
Value
None: a file will be written at ‘out_file’ with the output.
create_peak_list | R Documentation |
Create peak list from SQL ms_data table
Description
The function extracts the relevant information and sorts it into nested lists for use in the uncertainty functions
Usage
create_peak_list(ms_data)
Arguments
ms_data
|
extraction of the ms_data from the SQL table for a specified peak |
Value
nested list of all data
create_peak_table_ms1 | R Documentation |
Create peak table for MS1 data
Description
Takes a nested peak list and creates a peak table for easier determination of uncertainty of the measurement for MS1 data.
Usage
create_peak_table_ms1(peak, mass, masserror = 5, minerror = 0.002, int0 = NA)
Arguments
mass
|
the exact mass of the compound of interest |
masserror
|
the mass accuracy (in ppm) of the instrument data |
minerror
|
the minimum mass error (in Da) of the instrument data |
int0
|
the default setting for intensity values for missing m/z values |
peaklist
|
result of the ‘create_peak_list’ function |
Value
nested list of dataframes containing all MS1 data for the peak
create_peak_table_ms2 | R Documentation |
Create peak table for MS2 data
Description
Takes a nested peak list and creates a peak table for easier determination of uncertainty of the measurement for MS2 data.
Usage
create_peak_table_ms2(peak, mass, masserror = 5, minerror = 0.002, int0 = NA)
Arguments
mass
|
the exact mass of the compound of interest |
masserror
|
the mass accuracy (in ppm) of the instrument data |
minerror
|
the minimum mass error (in Da) of the instrument data |
int0
|
the default setting for intensity values for missing m/z values |
peaklist
|
result of the ‘create_peak_list’ function |
Value
nested list of dataframes containing all MS2 data for the peak
create_py_env | R Documentation |
Create a python environment for RDKit
Description
This project offers a full integration of RDKit via [reticulate]. This function does the heavy lifting for setting up that environment, either from an environment specifications file or from the conda forge channel.
Usage
create_py_env("nist_hrms_db", c("reticulate", "rdkit"))
Arguments
env_name
|
CHR scalar of a python environment |
Details
Preferred set up is to set variables in the ‘env_py.R’ file, which will be used over the internal defaults chosen here. The exception is if ‘INSTALL_FROM == “local”’ and no value is provided for ‘INSTALL_FROM_FILE’ which has no internal default.
Germane variables are ‘PYENV_NAME’ (default “reticulated_rdkit”), ‘CONDA_PATH’ (default “auto”), ‘CONDA_MODULES’ (default “rdkit”, “r-reticulate” will be added), ‘INSTALL_FROM’ (default “conda”), ‘INSTALL_FROM_FILE’ (default “rdkit/environment.yml”), ‘MIN_PY_VER’ (default 3.9).
Value
None
create_search_df | R Documentation |
Create data.frame containing parameters for extraction and searching
Description
Use this to create an intermediate data frame object used as part of the search routine.
Usage
create_search_df( filename, precursormz, rt, rt_start, rt_end, masserror, minerror, ms2exp, isowidth )
Arguments
filename
|
CHR scalar path to the mzml file |
precursormz
|
NUM scalar for the mass-to-charge ratio to examine |
rt
|
NUM scalar for the retention time centroid to examine |
rt_start
|
NUM scalar for the retention time start point of the feature |
rt_end
|
NUM scalar for the retention time end point of the feature |
masserror
|
NUM scalar of the instrument mass error value in parts per million |
minerror
|
NUM scalar of the minimum mass error value to use in absolute terms |
ms2exp
|
NUM scalar type of the fragmentation experiment (e.g. MS1 or MS2) |
isowidth
|
NUM scalar mass isolation width to use |
Value
data.frame object collating provided values
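An illustrative call with hypothetical feature values:
search_df <- create_search_df(
  filename = "example.mzML",
  precursormz = 498.9302,
  rt = 7.25, rt_start = 7.0, rt_end = 7.5,
  masserror = 5, minerror = 0.002,
  ms2exp = 2, isowidth = 0.7
)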
create_search_ms | R Documentation |
Generate uncertainty mass spectrum for MS1 and MS2 data
Description
Generate uncertainty mass spectrum for MS1 and MS2 data
Usage
create_search_ms( searchobj, correl = NULL, ph = NULL, freq = NULL, normfn = "sum", cormethod = "pearson" )
Arguments
searchobj
|
list object generated from ‘get_search_object’ |
correl
|
correlation limit for ions to MS1 |
ph
|
peak height to select scans for generating mass spectrum |
freq
|
observational frequency minimum for ions to use for generating mass spectrum |
normfn
|
normalization function, options are “sum” or “mean” |
cormethod
|
correlation function, default is “pearson” |
Value
list object containing the ms1 uncertainty mass spectrum ‘ums1’, ms2 uncertainty mass spectrum ‘ums2’ and respective uncertainty mass spectrum parameters ‘ms1params’ and ‘ms2params’
data_dictionary | R Documentation |
Create a data dictionary
Description
Get a list of tables and their defined columns with properties, including comments, suitable as a data dictionary from a connection object amenable to [odbc::dbListTables]. This function relies on [pragma_table_info].
Usage
data_dictionary(db_conn = con)
Arguments
db_conn
|
connection object (default: con) |
Value
LIST of length equal to the number of tables in ‘con’ with attributes identifying which tables, if any, failed to render into the dictionary.
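For example, assuming an active connection object ‘con’ (the table name below is hypothetical):
dict <- data_dictionary(con)
names(dict)      # one element per database table
dict$compounds   # column definitions and comments for one table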
dataframe_match | R Documentation |
Match multiple values in a database table
Description
Complex queries are sometimes necessary to match against multiple varied conditions across multiple items in a list or data frame. Call this function to apply vectorization to all items in ‘match_criteria’ and create a fully qualified SQL expression using [clause_where], then execute that query against the database connection in ‘db_conn’. Speed is not optimized during the call to [clause_where], as each clause is built independently and joined together with “OR” statements.
Usage
dataframe_match( match_criteria, table_names, and_or = "AND", db_conn = con, log_ns = "db" )
Arguments
match_criteria
|
LIST of matching criteria with names matching columns against which to apply. In the simplest case, a direct value is given to the name (e.g. ‘list(last_name = “Smith”)’) for single matches. All match criteria must be their own list item. Values can also be provided as a nested list for more complicated WHERE clauses with names ‘values’, ‘exclude’, and ‘like’ that will be recognized. ‘values’ should be the actual search criteria, and if a vector of length greater than one is specified, the WHERE clause becomes an IN clause. ‘exclude’ (LGL scalar) determines whether to apply the NOT operator. ‘like’ (LGL scalar) determines whether this is an equality, list, or similarity. To reverse the example above by issuing a NOT statement, use ‘list(last_name = list(values = “Smith”, exclude = TRUE))’, or to look for all records LIKE (or NOT LIKE) “Smith”, set this as ‘list(last_name = list(values = “Smith”, exclude = FALSE, like = TRUE))’ |
table_names
|
CHR vector of tables to search |
and_or
|
CHR scalar of whether to use "AND" or "OR" to combine multiple criteria. More complicated WHERE clauses (including a mixture of AND and OR usage) should be built directly. (default: "AND") |
db_conn
|
connection object (default: con) |
log_ns
|
CHR scalar of the logging namespace to use during execution (default: “db”) |
Details
This is intended for use with a data frame object.
Value
data.frame of the matching database rows
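To illustrate the nested 'match_criteria' form described above (the column name 'last_name' is an assumption for the sketch):
# find contributors whose last name is LIKE "Smith"
matches <- dataframe_match(
  match_criteria = list(last_name = list(values = "Smith", like = TRUE)),
  table_names = "contributors",
  db_conn = con
)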
dotprod | R Documentation |
Calculate dot product
Description
Internal function: calculates the dot product between paired m/z and intensity values
Usage
dotprod(m1, i1, m2, i2, m = 1, n = 0.5)
Arguments
m1, m2
|
paired vectors containing measured m/z values |
i1, i2
|
paired vectors containing measured intensity values |
m, n
|
weighting values for mass (m) and intensity (n) |
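A minimal sketch of the calculation on two aligned spectra (all values are illustrative):
mz <- c(118.9920, 168.9888, 218.9856) # shared m/z axis
i1 <- c(999, 650, 120) # intensities, spectrum 1
i2 <- c(950, 700, 100) # intensities, spectrum 2
score <- dotprod(m1 = mz, i1 = i1, m2 = mz, i2 = i2, m = 1, n = 0.5)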
dt_color_by | R Documentation |
Apply colors to DT objects by value in a column
Description
Adds a class to each node meeting the criteria defined elsewhere as project object 'table_bg_classes', a list of colors whose names match values.
Usage
dt_color_by(names(DT_table_data), "color_by")
Arguments
table_names
|
CHR vector of the names going into a table |
look_for
|
CHR vector of the column name to color by |
Value
JS function to apply to a DT object by row
dt_formatted | R Documentation |
Easily format multiple DT objects in a shiny project in the same manner
Description
This serves solely to reduce the number of options fed into 'DT::datatable' by providing common defaults and transparent options. Parameters largely do exactly what they say and will create a list 'column_defs' suitable for use as 'datatable(..., options = list(columnDefs = column_defs))'. Leave any aspect NULL to ignore it.
Usage
dt_formatted( dataframe, show_rownames = FALSE, hide_cols = NULL, center_cols = NULL, narrow_cols = NULL, narrow_col_width = "5%", medium_cols = NULL, medium_col_width = "10%", large_cols = NULL, large_col_width = "15%", truncate_cols = NULL, truncate_width = 20, date_cols = NULL, date_col_width = "10%", selection_mode = "single", callback = NULL, color_by_column = NULL, names_to = "title", filter_at = "top", chr_to_factor = TRUE, page_length = 10, page_length_menu = c(10, 25, 50), ... )
Arguments
dataframe
|
data.frame to be converted to a DT::datatable object |
hide_cols
|
CHR vector of column names to hide |
center_cols
|
CHR vector of column names to center |
narrow_cols
|
CHR vector of column names to make ‘narrow_col_width’ wide |
narrow_col_width
|
CHR scalar defining column width (default: “5%”) |
medium_cols
|
CHR vector of column names to make ‘medium_col_width’ wide |
medium_col_width
|
CHR scalar defining column width (default: “10%”) |
large_cols
|
CHR vector of column names to make ‘large_col_width’ wide |
large_col_width
|
CHR scalar defining column width (default: “15%”) |
truncate_cols
|
CHR vector of column names to truncate |
truncate_width
|
INT scalar of the position at which to truncate |
date_cols
|
CHR vector of column names identifying dates |
date_col_width
|
CHR scalar defining column width (default: “10%”) |
selection_mode
|
CHR scalar of the DT selection mode (default: “single”) |
callback
|
JS custom callback to apply to the datatable widget |
color_by_column
|
CHR scalar of the column name by which to color rows |
names_to
|
CHR scalar of the name formatting modification to apply, as one of the options available in the ‘stringr’ package (default: “title” to apply ‘stringr::str_to_title’) |
filter_at
|
CHR scalar of the position for the column filter as understood by ‘DT::datatable(…, filter = filter_at)’. (default: “top”) |
chr_to_factor
|
BOOL scalar for whether or not to automatically convert character columns to factor columns (default: TRUE) |
…
|
other named arguments to be passed to ‘DT::datatable’ |
Value
DT::datatable object formatted as requested
Note
Truncation applies a JS function to retain the underlying information as a hover tooltip and truncates using ellipses.
Column name formatting relies on being able to parse 'names_to' as a valid function of the form 'sprintf("str_to_%s", names_to)'; recognized options include "lower", "upper", "title", and "sentence".
To apply a custom format, define these parameters as a list (e.g. "dt_format_options") and pass it, along with your dataframe, as 'do.call("dt_formatted", c(list(dataframe = df), dt_format_options))'; see the sketch below.
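A minimal sketch of that pattern (the data frame 'df' and the option values are assumptions):
dt_format_options <- list(
  hide_cols = "id", # hypothetical column names
  center_cols = "formula",
  page_length = 25
)
do.call("dt_formatted", c(list(dataframe = df), dt_format_options))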
er_map | R Documentation |
Create a simple entity relationship map
Description
This will poll the database connection and create an entity relationship map as a list directly from the defined SQL statements used to build each table or view. For each table object it returns a list of length three containing the entity names that the table (1) 'references' (i.e. has a foreign key to), (2) is 'referenced_by' (i.e. is a foreign key for), and (3) the views where it is 'used_in_view'. These are entity names only. This is intended as a mapping shortcut when ER diagrams are unavailable, or for quick reference within a project, similar to a data dictionary relationship reference.
Usage
er_map(db_conn = con)
Arguments
db_conn
|
connection object, specifically of class “SQLiteConnection” but not strictly enforced |
Details
SQL is generated from [pragma_table_def()] with argument ‘get_sql’ = TRUE and ignores entities whose names start with “sqlite”.
Value
nested LIST object describing the database entity connections
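As a sketch, assuming an open connection object 'con':
schema_map <- er_map(db_conn = con)
schema_map[["peaks"]]$references # tables the peaks table points to
schema_map[["peaks"]]$referenced_by # tables pointing to the peaks table
schema_map[["peaks"]]$used_in_view # views built on the peaks table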
export_msp | R Documentation |
Export to MSP
Description
The function exports an uncertainty mass spectrum into a NIST MS Search .msp file
Usage
export_msp( ms, file, precursor = "", name = "Exported Mass Spectrum", headerdata = c(), append = FALSE )
Arguments
ms
|
uncertainty mass spectrum from ‘get_ums’ function |
file
|
file path of the .msp file to which the mass spectrum will be saved |
precursor
|
If available, the numeric precursor m/z for the designated mass spectrum |
name
|
Text name to assign to the mass spectrum (not used in spectral searching) |
headerdata
|
character string containing named values for additional data to put in the header |
append
|
boolean (TRUE/FALSE) to append to .msp file (TRUE) or overwrite (FALSE) |
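A minimal sketch, assuming 'ums' holds an uncertainty mass spectrum from the 'get_ums' function; the precursor and name values are illustrative:
export_msp(
  ms = ums,
  file = "exported_spectrum.msp", # hypothetical output path
  precursor = 498.9302, # illustrative precursor m/z
  name = "Example Uncertainty Mass Spectrum",
  append = FALSE # overwrite rather than append
)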
extend_suspect_list | R Documentation |
Extend the compounds and aliases tables
Description
Suspect lists are occasionally updated. To keep the current database up to date, run this function by pointing it to the updated or current suspect list. That suspect list should be one of (1) a file in either comma-separated-value (CSV) or a Microsoft Excel format (XLS or XLSX), (2) a data frame containing the new compounds in the standard format of the suspect list, or (3) a URL pointing to the suspect list.
Usage
extend_suspect_list(suspect_list, db_conn = con, retain_current = TRUE)
Arguments
suspect_list
|
CHR scalar pointing either to a file (CSV, XLS, or XLSX) or URL pointing to an XLSX file. |
db_conn
|
connection object (default: con) |
retain_current
|
LGL scalar of whether to retain the current list by attempting to match new entries to older ones, or to append all entries (default: TRUE) |
Details
If ‘suspect_list’ does not contain one of the expected file extensions, it will be assumed to be a URL pointing to a Microsoft Excel file with the suspect list in the first spreadsheet. The file for that URL will be downloaded temporarily, read in as a data frame, and then removed.
Required columns for the compounds table are first pulled and all other columns are treated as aliases. If ‘retain_current’ is TRUE, entries in the “name” column will be matched against current aliases and the compound id will be persisted for that compound.
Value
None
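For example (the file name below is hypothetical):
extend_suspect_list(
  suspect_list = "updated_suspect_list.xlsx", # hypothetical local file
  db_conn = con,
  retain_current = TRUE # match new entries to existing compound ids
)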
extract.elements | R Documentation |
Elemental Formula Functions: extract elements from a formula
Description
Converts elemental formula into list of ‘elements’ and ‘counts’ corresponding to the composition
Usage
extract.elements(composition.str, remove.elements = c())
Arguments
composition.str
|
character string elemental formula |
remove.elements
|
character vector containing elements to remove from the formula |
Value
list with ‘elements’ and ‘counts’
Examples
extract.elements("C2H5O")
extract.elements("C2H5ONa", remove.elements = c("Na", "Cl"))
flush_dir | R Documentation |
Flush a directory with archive
Description
Clear a directory and archive those files if desired in any directory matching any pattern.
Usage
flush_dir("logs", ".txt")
flush_dir(directory = "logs")
Arguments
archive
|
LGL scalar on whether to archive current logs |
directory
|
CHR scalar path to the directory to flush |
Value
None; removes files from the directory, archiving them first if requested
fn_guide | R Documentation |
View an index of help documentation in your browser
Description
View an index of help documentation in your browser
Usage
fn_guide()
Value
None
fn_help | R Documentation |
Get function documentation for this project
Description
This function is analogous to “?”, “??”, and “help”. For now, this effort is distributed as a project instead of a package. This imposes certain limitations, particularly regarding function documentation. Use this function to see the documentation for functions in this project just as you would any installed package. The other limitation is that these help files will not populate directly as a pop up when using RStudio tab completion.
Usage
fn_help(fn_name)
Arguments
fn_name
|
Object or CHR string name of a function in this project. |
Value
None, opens help file.
Note
This function will be deprecated if the project is moved to a package.
Examples
fn_help(fn_help)
format_html_id | R Documentation |
Format a file name as an HTML element ID
Description
This is often useful to provide feedback to the user about the files they’ve provided to a shiny application in a more informative manner, as IDs produced here are suitable to build dynamic UI around. This can serve as the base ID for tooltips, additional information, icons, etc. and produce everything necessary in one place for any number of files.
Usage
format_html_id(filename)
Arguments
filename
|
CHR vector of file names |
Value
CHR vector of the same size as filename
Examples
format_html_id(list.files())
format_list_of_names | R Documentation |
Grammatically collapse a list of values
Description
Given a vector of arbitrary length that coerces properly to a human-readable character string, return it formatted as one of "one", "one and two", or "one, two, …, and three" using glue::glue. This is functionally the same as a static version of [glue::glue_collapse] with parameters sep = ",", width = Inf, and last = ", and".
Usage
format_list_of_names(namelist, add_quotes = FALSE)
Arguments
namelist
|
vector of values to format |
add_quotes
|
LGL scalar of whether to enclose individual values in quotation marks |
Value
CHR vector of length one
Examples
format_list_of_names("test")
format_list_of_names(c("apples", "bananas"))
format_list_of_names(c(1:3))
format_list_of_names(seq.Date(Sys.Date(), Sys.Date() + 3, by = 1))
formulalize | R Documentation |
Generate standard chemical formula notation
Description
Generate standard chemical formula notation
Usage
formulalize(formula)
Arguments
formula
|
CHR string of an elemental formula |
Value
string with a standard ordered formula
Examples
formula <- "C10H15S1O3"
formulalize(formula)
full_import | R Documentation |
Import one or more files from the NIST Method Reporting Tool for NTA
Description
This function serves as a single entry point for data imports. It is predicated upon the NIST import routine and relies on several assumptions. It is intended ONLY as an interactive manner of importing any number of data files from the NIST Method Reporting Tool for NTA (MRT NTA).
Usage
full_import( import_object = NULL, file_name = NULL, db_conn = con, exclude_missing_required = FALSE, stop_if_missing_required = TRUE, include_if_missing_recommended = FALSE, stop_if_missing_recommended = TRUE, ignore_extra = TRUE, ignore_insert_conflicts = TRUE, requirements_obj = "import_requirements", method_in = "massspectrometry", ms_methods_table = "ms_methods", instrument_properties_table = "instrument_properties", sample_info_in = "sample", sample_table = "samples", contributor_in = "data_generator", contributors_table = "contributors", sample_aliases = NULL, generation_type = NULL, generation_type_norm_table = ref_table_from_map(sample_table, "generation_type"), mass_spec_in = "massspectrometry", chrom_spec_in = "chromatography", mobile_phases_in = "chromatography", qc_method_in = "qcmethod", qc_method_table = "qc_methods", qc_method_norm_table = ref_table_from_map(qc_method_table, "name"), qc_references_in = "source", qc_data_in = "qc", qc_data_table = "qc_data", carrier_mix_names = NULL, id_mix_by = "^mp*[0-9]+", mix_collection_table = "carrier_mix_collections", mobile_phase_props = list(in_item = "chromatography", db_table = "mobile_phases", props = c(flow = "flow", flow_units = "flowunits", duration = "duration", duration_units = "durationunits")), carrier_props = list(db_table = "carrier_mixes", norm_by = ref_table_from_map("carrier_mixes", "component"), alias_in = "carrier_aliases", props = c(id_by = "solvent", fraction_by = "fraction")), additive_props = list(db_table = "carrier_additives", norm_by = ref_table_from_map("carrier_additives", "component"), alias_in = "additive_aliases", props = c(id_by = "add$", amount_by = "_amount", units_by = "_units")), exclude_values = c("none", "", NA), peaks_in = "peak", peaks_table = "peaks", software_timestamp = NULL, software_settings_in = "msconvertsettings", ms_data_in = "msdata", ms_data_table = "ms_data", unpack_spectra = FALSE, unpack_format = c("separated", "zipped"), ms_spectra_table = "ms_spectra", linkage_table = "conversion_software_peaks_linkage", settings_table = "conversion_software_settings", as_date_format = "%Y-%m-%d %H:%M:%S", format_checks = c("ymd_HMS", "ydm_HMS", "mdy_HMS", "dmy_HMS"), min_datetime = "2000-01-01 00:00:00", fragments_in = "annotation", fragments_table = "annotated_fragments", fragments_sources_table = "fragment_sources", fragments_norm_table = "norm_fragments", citation_info_in = "fragment_citation", inspection_info_in = "fragment_inspections", inspection_table = "fragment_inspections", generate_missing_aliases = TRUE, fragment_aliases_in = "fragment_aliases", fragment_aliases_table = "fragment_aliases", fragment_alias_type_norm_table = ref_table_from_map(fragment_aliases_table, "alias_type"), inchi_prefix = "InChI=1S/", rdkit_ref = ifelse(exists("PYENV_REF"), PYENV_REF, "rdk"), rdkit_ns = "rdk", rdkit_make_if_not = TRUE, rdkit_aliases = c("inchi", "inchikey"), mol_to_prefix = "MolTo", mol_from_prefix = "MolFrom", type = "smiles", compounds_in = "compounddata", compounds_table = "compounds", compound_category = NULL, compound_category_table = "compound_categories", compound_aliases_in = "compound_aliases", compound_aliases_table = "compound_aliases", compound_alias_type_norm_table = ref_table_from_map(compound_aliases_table, "alias_type"), fuzzy = FALSE, case_sensitive = TRUE, ensure_unique = TRUE, require_all = FALSE, import_map = IMPORT_MAP, log_ns = "db" )
Arguments
import_object
|
nested LIST object of JSON data to import; this import routine was built around output from the NTA MRT (default: NULL) - note you may supply either import object or file_name |
file_name
|
external file in JSON format of data to import; this import routine was built around output from the NTA MRT (default: NULL) - note you may supply either import object or file_name |
db_conn
|
connection object (default: con) |
exclude_missing_required
|
LGL scalar of whether or not to skip imports missing required information (default: FALSE); if set to TRUE, this will override the setting for ‘stop_if_missing_required’ and the import will continue with logging messages for which files were incomplete |
stop_if_missing_required
|
LGL scalar of whether or not to to stop the import routine if a file is missing required information (default: TRUE) |
include_if_missing_recommended
|
LGL scalar of whether or not to include imports missing recommended information (default: FALSE) |
stop_if_missing_recommended
|
LGL scalar of whether or not to to stop the import routine if a file is missing recommended information (default: TRUE) |
ignore_extra
|
LGL scalar of whether to ignore extraneous import elements or stop the import process (default: TRUE) |
ignore_insert_conflicts
|
LGL scalar of whether to ignore insert conflicts during the qc methods and qc data import steps (default: TRUE) |
requirements_obj
|
CHR scalar of the name of an R object holding import requirements; this is a convenience shorthand to prevent multiple imports from parameter ‘file_name’ (default: “import_requirements”) |
method_in
|
CHR scalar name of the ‘obj’ list containing method information |
ms_methods_table
|
CHR scalar name of the database table containing method information |
instrument_properties_table
|
CHR scalar name of the database table holding instrument property information for a given method (default: “instrument_properties”) |
sample_info_in
|
CHR scalar name of the element within ‘import_object’ containing samples information |
sample_table
|
CHR scalar name of the database table holding sample information (default: “samples”) |
contributor_in
|
CHR scalar name of the element within ‘import_object[[sample_info_in]]’ containing contributor information (default: “data_generator”) |
contributors_table
|
CHR scalar name of the database table holding contributor information (default: “contributors”) |
sample_aliases
|
named CHR vector of aliases with names matching the alias, and values of the alias reference, e.g. c("ACU1234" = "NIST Biorepository GUAID"), which can be virtually any reference text; it is recommended that the reference be to a resolver service if connecting with external data sources (default: NULL) |
generation_type
|
CHR scalar of the type of data generated for this sample (e.g. "empirical" or "in silico"). The default (NULL) will assign based on 'generation_type_default'; any other value will override the default value and be checked against values in 'generation_type_norm_table' |
generation_type_norm_table
|
CHR scalar name of the database table normalizing sample generation type (default: the return of ref_table_from_map(sample_table, "generation_type")) |
mass_spec_in
|
CHR scalar name of the element in ‘obj’ holding mass spectrometry information (default: “massspectrometry”) |
chrom_spec_in
|
CHR scalar name of the element in ‘obj’ holding chromatographic information (default: “chromatography”) |
mobile_phases_in
|
CHR scalar name of the database table holding mobile phase and chromatographic information (default: “chromatography”) |
qc_method_in
|
CHR scalar name of the import object element containing QC method information (default: “qcmethod”) |
qc_method_table
|
CHR scalar of the database table name holding QC method check information (default: “qc_methods”) |
qc_method_norm_table
|
CHR scalar name of the database table normalizing QC methods type (default: “norm_qc_methods_name”) |
qc_references_in
|
CHR scalar of the name in ‘obj[[qc_method_in]]’ that contains the reference or citation for the QC protocol (default: “source”) |
carrier_mix_names
|
CHR vector (optional) of carrier mix collection names to assign, the length of which should equal 1 or the length of discrete carrier mixtures; the default, NULL, will automatically assign names as a function of the method and sample id. |
id_mix_by
|
regex CHR to identify mobile phase mixtures (default: “^mp*[0-9]+” matches the generated mixture names) |
mix_collection_table
|
CHR scalar name of the mix collections table (default: “carrier_mix_collections”) |
mobile_phase_props
|
LIST object describing how to import the mobile phase table containing: in_item: CHR scalar name of the 'obj' name containing chromatographic information (default: "chromatography"); db_table: CHR scalar name of the mobile phases table (default: "mobile_phases"); props: named CHR vector of name mappings with names equal to database columns in 'mobile_phase_props$db_table' and values matching regex to match names in 'obj[[mobile_phase_props$in_item]]' |
carrier_props
|
LIST object describing how to import the carrier mixes table containing: db_table: CHR scalar name of the carrier mixes table (default: "carrier_mixes"); norm_by: CHR scalar name of the table used to normalize carriers (default: the return of ref_table_from_map("carrier_mixes", "component")); alias_in: CHR scalar name of the table containing carrier aliases to search (default: "carrier_aliases"); props: named CHR vector of name mappings with names equal to database columns in 'carrier_props$db_table' and values matching regex to match names in 'obj[[mobile_phase_props$in_item]]', plus an extra element named 'id_by' containing regex used to match names in the import object that indicate a carrier (e.g. "solvent") |
additive_props
|
LIST object describing how to import the carrier additives table containing: db_table: CHR scalar name of the carrier additives table (default: "carrier_additives"); norm_by: CHR scalar name of the table used to normalize additives (default: the return of ref_table_from_map("carrier_additives", "component")); alias_in: CHR scalar name of the table containing additive aliases to search (default: "additive_aliases"); props: named CHR vector of name mappings with names equal to database columns in 'additive_props$db_table' and values matching regex to match names in 'obj[[mobile_phase_props$in_item]]', plus an extra element named 'id_by' containing regex used to match names in the import object that indicate an additive (e.g. "add$") |
exclude_values
|
CHR vector indicating which values to ignore in 'obj' (default: c("none", "", NA)) |
peaks_in
|
CHR scalar name of the element within ‘import_object’ containing peak information |
peaks_table
|
CHR scalar name of the database table holding peak information (default: "peaks") |
ms_data_in
|
CHR scalar of the named component of ‘obj’ holding mass spectral data (default: “msdata”) |
ms_data_table
|
CHR scalar name of the table holding packed spectra in the database (default: “ms_data”) |
unpack_spectra
|
LGL scalar indicating whether or not to unpack spectral data to a long format (i.e. all masses and intensities will become a single record) in the table defined by ‘ms_spectra_table’ (default: FALSE) |
unpack_format
|
CHR scalar of the type of data packing for the spectra, one of “separated” (default) or “zipped” |
ms_spectra_table
|
CHR scalar name of the table holding long form spectra in the database (default: “ms_spectra”) |
fragments_in
|
CHR scalar name of the ‘obj’ component holding annotated fragment information (default: “annotation”) |
fragments_table
|
CHR scalar name of the database table holding annotated fragment information (default: “annotated_fragments”) |
fragments_sources_table
|
CHR scalar name of the database table holding fragment source (e.g. generation) information (default: “fragment_sources”) |
fragments_norm_table
|
CHR scalar name of the database table holding normalized fragment identities (default: obtains this from the result of a call to [er_map] with the table name from ‘fragments_table’) |
citation_info_in
|
CHR scalar name of the ‘obj’ component holding fragment citation information (default: “fragment_citation”) |
inspection_info_in
|
CHR scalar name of the ‘obj’ component holding fragment inspection information (default: “fragment_inspections”) |
inspection_table
|
CHR scalar name of the database table holding fragment inspection information (default: “fragment_inspections”) |
generate_missing_aliases
|
LGL scalar determining whether or not to generate machine readable expressions (e.g. InChI) for fragment aliases from RDKit (requires RDKit activation; default: TRUE); see formals list for [add_rdkit_aliases] |
fragment_aliases_in
|
CHR scalar name of the ‘obj’ component holding fragment aliases (default: “fragment_aliases”) |
fragment_aliases_table
|
CHR scalar name of the database table holding fragment aliases (default: “fragment_aliases”) |
fragment_alias_type_norm_table
|
CHR scalar name of the alias reference normalization table (default: the return of ref_table_from_map(fragment_aliases_table, "alias_type")) |
rdkit_ref
|
CHR scalar OR R object of an RDKit binding (default: PYENV_REF if defined, otherwise "rdk" for convenience with other pipelines in this project) |
mol_to_prefix
|
CHR scalar of the prefix to identify an RDKit function to create an alias from a mol object (default: “MolTo”) |
mol_from_prefix
|
CHR scalar of the prefix to identify an RDKit function to create a mol object from identifiers (default: "MolFrom") |
type
|
The type of chemical structure notation (default: "smiles") |
compounds_in
|
CHR scalar name in ‘obj’ holding compound data (default: “compounddata”) |
compounds_table
|
CHR scalar name of the database table holding compound data (default: "compounds") |
compound_category
|
CHR or INT scalar of the compound category (either a direct ID or a matching category label in ‘compound_category_table’) (default: NULL) |
compound_category_table
|
CHR scalar name of the database table holding normalized compound categories (default: "compound_categories") |
compound_aliases_in
|
CHR scalar name of where compound aliases are located within the import (default: “compound_aliases”), passed to [resolve_compounds] as “norm_alias_table” |
compound_aliases_table
|
CHR scalar name of the alias reference table to use when assigning compound aliases (default: “compound_aliases”) passed to [resolve_compounds] as “compounds_table” |
compound_alias_type_norm_table
|
CHR scalar name of the alias reference normalization table (default: the return of ref_table_from_map(compound_aliases_table, "alias_type")) |
fuzzy
|
LGL scalar of whether to do a "fuzzy" match in the sense that values provided are wrapped in an SQL "LIKE" clause, overriding the 'case_sensitive' setting if TRUE (default: FALSE) |
case_sensitive
|
LGL scalar of whether to match on a case sensitive basis (the default TRUE searches for values as-provided) or whether to coerce value matches by upper, lower, sentence, and title case matches |
ensure_unique
|
LGL scalar of whether or not to first check that the values provided form a new unique record (default: TRUE) |
require_all
|
LGL scalar of whether to require all columns (except the assumed primary key column of "id") or only those defined as "NOT NULL" (default: FALSE) |
import_map
|
data.frame object of the import map (e.g. from a CSV) |
log_ns
|
CHR scalar of the logging namespace to use (default: “db”) |
Details
Import files should be in JSON format as created by the MRT NTA. Examples are provided in the “example” directory of the project.
Defaults for this release are set throughout to match the latest database schema, but are left as arguments in case the schema changes or slight changes are made to column and table names.
Value
Console logging if enabled and interactive prompts when user intervention is required. There is no formal return as it executes database actions.
Note
Many calls within this function are executed as do.call with a filtered argument list based on the names of formals for the called function. Several arguments to those functions are also left as the defaults set there; names must match exactly to be passed in this manner. See the list of inherited parameters.
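In the simplest interactive case most defaults can be accepted as-is; as a sketch, the example file shipped with the project (see [make_requirements]) can be imported directly:
full_import(
  file_name = "example/PFAC30PAR_PFCA1_mzML_cmpd2627.JSON",
  db_conn = con
)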
gather_qc | R Documentation |
Quality Control Check of Import Data
Description
Performs the quality control check on the imported data from the peak gather function.
Usage
gather_qc( gather_peak, exactmasses, exactmasschart, ms1range = c(0.5, 3), ms1isomatchlimit = 0.5, minerror = 0.002, max_correl = 0.8, correl_bin = 0.1, max_ph = 10, ph_bin = 1, max_freq = 10, freq_bin = 1, min_n_peaks = 3, cormethod = "pearson" )
Arguments
gather_peak
|
peak object generated from ‘peak_gather_json’ function |
exactmasses
|
list of exact masses of elements |
exactmasschart
|
exact mass chart generated from the create_exactmasschart function |
ms1range
|
2-component vector stating the range over which to evaluate the isotopic pattern of the precursor ion, from mass - ms1range[1] to mass + ms1range[2] |
ms1isomatchlimit
|
the reverse dot product minimum score for the isotopic pattern match |
minerror
|
the minimum mass error (in Da) allowable for the instrument |
max_correl
|
[TODO PLACEHOLDER] |
correl_bin
|
[TODO PLACEHOLDER] |
max_ph
|
[TODO PLACEHOLDER] |
ph_bin
|
[TODO PLACEHOLDER] |
max_freq
|
[TODO PLACEHOLDER] |
freq_bin
|
[TODO PLACEHOLDER] |
min_n_peaks
|
[TODO PLACEHOLDER] |
cormethod
|
[TODO PLACEHOLDER] |
Value
nested list of quality control check results
get_annotated_fragments | R Documentation |
Get all annotated fragments that have matching masses
Description
Get all annotated fragments that have matching masses
Usage
get_annotated_fragments(con, fragmentions, masserror, minerror)
Arguments
con
|
SQLite database connection |
fragmentions
|
numeric vector containing m/z values for fragments to search |
masserror
|
numeric relative mass error (ppm) |
minerror
|
numeric minimum mass error (Da) |
Value
data.frame of mass spectral data
get_component | R Documentation |
Resolve components from a list or named vector
Description
Call this to pull a component named 'obj_component' from a list or named vector provided as 'obj', optionally using [tack_on] to append to it. This is intended to ease the process of pulling specific components from a list for further treatment in the import process by isolating that component.
Usage
get_component(obj, obj_component, silence = TRUE, log_ns = "global", ...)
Arguments
obj
|
LIST or NAMED vector in which to find 'obj_component' |
obj_component
|
CHR vector of named elements to find in 'obj' |
silence
|
LGL scalar indicating whether to silence recursive messages, which may be the same for each element of 'obj' |
log_ns
|
CHR scalar of the logging namespace to use (default: "global") |
…
|
Additional arguments passed to/from the ellipsis parameter of calling functions. If named, names are preserved. |
Details
This is similar in scope to [purrr::pluck] in many regards, but always returns items with names, and will search an entire list structure, including data frames, to return all values associated with that name in individual elements.
Value
LIST object containing the elements of 'obj' matching 'obj_component'
Note
This is a recursive function.
If ellipsis arguments are provided, they will be appended to each identified component via [tack_on]. Use with caution, but this can be useful for appending common data to an entire list (e.g. a datetime stamp for logging processing time or a processor name, human or software).
Examples
get_component(list(a = letters, b = 1:10), "a")
get_component(list(ex = list(a = letters, b = 1:10), ex2 = list(c = 1:5, a = LETTERS)), "a")
get_component(list(a = letters, b = 1:10), "a", c = 1:5)
get_compound_fragments | R Documentation |
Get all fragments associated with compounds
Description
Get all fragments associated with compounds
Usage
get_compound_fragments(con, fragmentions, masserror, minerror)
Arguments
con
|
SQLite database connection |
fragmentions
|
numeric vector containing m/z values for fragments to search |
masserror
|
numeric relative mass error (ppm) |
minerror
|
numeric minimum mass error (Da) |
Value
data.frame object describing known fragments in the database with known compound and peak references attached
get_compoundid | R Documentation |
Get compound ID and name for specific peaks
Description
Get compound ID and name for specific peaks
Usage
get_compoundid(con, peakid)
Arguments
con
|
SQLite database connection |
peakid
|
integer vector of primary keys for peaks table |
Value
table of compound IDs and names
get_fkpk_relationships | R Documentation |
Extract foreign key relationships from a schema
Description
This convenience function is part of the automatic generation of SQL commands building views and triggers from a defined schema. Its sole purpose is as a pre-pass extraction of foreign key relationships between tables from an object created by [db_map], which in turn relies on specific formatting in the schema SQL definitions.
Usage
get_fkpk_relationships(er_map(db_conn = con))
Arguments
db_map
|
LIST object containing descriptions of table mapping in an opinionated manner, generally generated by [er_map]. The expectation is a list of tables, with references in SQL form enumerated in a child element with a name matching ‘references_in’ |
references_in
|
CHR scalar naming the child element containing SQL references statements of the form “fk_column REFERENCES table(pk_column)” (default: “references” is provided by [er_map]) |
dictionary
|
LIST object containing the schema dictionary produced by [data_dictionary] fully describing table entities |
Value
LIST of data frames with one element for each table with a foreign key defined
Note
This only functions for list objects formatted correctly. That is, each entry in [db_map] must contain an element with a name matching that provided to ‘references_in’ which contains a character vector formatted as “table1 REFERENCES table2(pk_column)”.
get_massadj | R Documentation |
Calculate the mass adjustment for a specific adduct
Description
Calculate the mass adjustment for a specific adduct
Usage
get_massadj(adduct = "+H", exactmasses = NULL, db_conn = "con")
Arguments
adduct
|
character string containing the + or - and the elemental formula of the adduct, note “2H” should be represented as “H2” |
exactmasses
|
list of exact masses of elements, NULL pulls from the database |
db_conn
|
database connection object, either a CHR scalar name (default: “con”) or the connection object itself (preferred) |
Value
NUM scalar of the mass adjustment value
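For example, assuming an open connection object 'con':
# mass adjustment for a sodium adduct, pulling exact masses from the database
mass_adj <- get_massadj(adduct = "+Na", exactmasses = NULL, db_conn = con)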
get_msconvert_data | R Documentation |
Extract msconvert metadata
Description
Extracts relevant Proteowizard MSConvert metadata from mzml file. Used for ‘peak_gather_json’ function
Usage
get_msconvert_data(mzml)
Arguments
mzml
|
list of msdata from ‘mzMLtoR’ function |
Value
list of msconvert parameters
get_msdata | R Documentation |
Get all mass spectral data within the database
Description
Get all mass spectral data within the database
Usage
get_msdata(con)
Arguments
con
|
SQLite database connection |
Value
data.frame of mass spectral data
get_msdata_compound | R Documentation |
Get all mass spectral data for a specific compound
Description
Get all mass spectral data for a specific compound
Usage
get_msdata_compound(con, 15)
Arguments
con
|
SQLite database connection |
compoundid
|
integer compound ID value |
Value
data.frame of mass spectral data
get_msdata_peakid | R Documentation |
Get all mass spectral data for a specific peak id
Description
Get all mass spectral data for a specific peak id
Usage
get_msdata_peakid(con, 15)
Arguments
con
|
SQLite database connection |
peakid
|
integer vector of peak ids |
Value
data.frame of mass spectral data
get_msdata_precursors | R Documentation |
Get all mass spectral data with a specific precursor ion
Description
Get all mass spectral data with a specific precursor ion
Usage
get_msdata_precursors(con, precursorion, masserror, minerror)
Arguments
con
|
SQLite database connection |
precursorion
|
numeric precursor ion m/z value |
masserror
|
numeric relative mass error (ppm) |
minerror
|
numeric minimum mass error (Da) |
Value
data.frame of mass spectral data
get_opt_params | R Documentation |
Get optimized uncertainty mass spectra parameters for a peak
Description
Get optimized uncertainty mass spectra parameters for a peak
Usage
get_opt_params(con, peak_ids)
Arguments
con
|
SQLite database connection |
peak_ids
|
integer vector of primary keys for peaks table |
Value
data.frame object of available optimized search parameters
get_peak_fragments | R Documentation |
Get annotated fragments for a specific peak
Description
Get annotated fragments for a specific peak
Usage
get_peak_fragments(con, peakid)
Arguments
con
|
SQLite database connection |
peakid
|
integer vector of primary keys for peaks table |
Value
data.frame of annotated fragments
get_peak_precursor | R Documentation |
Get precursor ion m/z for a specific peak
Description
Get precursor ion m/z for a specific peak
Usage
get_peak_precursor(con, peakid)
Arguments
con
|
SQLite database connection |
peakid
|
integer primary key for peaks table |
Value
numeric value of precursor ion m/z value
get_sample_class | R Documentation |
Get sample class information for specific peaks
Description
Get sample class information for specific peaks
Usage
get_sample_class(con, peakid)
Arguments
con
|
SQLite database connection |
peakid
|
integer vector of primary keys for peaks table |
Value
data.frame object of sample classes associated with a given peak
get_search_object | R Documentation |
Generate msdata object from input peak data
Description
Generate msdata object from input peak data
Usage
get_search_object(searchmzml, zoom = c(1, 4))
Arguments
searchmzml
|
mzML object with an attached search data.frame from the 'getmzML' function |
zoom
|
vector of length 2 giving the +/- window around the MS1 precursor ion within which to collect data |
Value
LIST object of data.frames including MS1 and MS2 analytical data, and the search parameters used to generate them
get_suspectlist | R Documentation |
Get the current NIST PFAS suspect list.
Description
Downloads the current NIST suspect list of PFAS from the NIST Public Data Repository to the current project directory.
Usage
get_suspectlist( destfile = file.path("R", "compoundlist", "suspectlist.xlsx"), url_file = file.path("config", "suspectlist_url.txt"), default_url = SUS_LIST_URL, save_local = FALSE )
Arguments
destfile
|
CHR scalar file.path of location to save the downloaded file |
url_file
|
CHR scalar file.path of the text file containing the download URL for the NIST PFAS Suspect List |
save_local
|
LGL scalar of whether to retain an R expression in the current environment after download |
Value
none
Examples
get_suspectlist()
get_ums | R Documentation |
Generate consensus mass spectrum
Description
The function calculates the uncertainty mass spectrum for a single peak table based on specific settings described in https://doi.org/10.1021/jasms.0c00423
Usage
get_ums( peaktable, correl = NULL, ph = NULL, freq = NULL, normfn = "sum", cormethod = "pearson" )
Arguments
peaktable
|
result of the 'create_peak_table_ms1' or 'create_peak_table_ms2' function |
correl
|
Minimum correlation coefficient between the target ions and the base ion intensity of the targeted m/z to be included in the mass spectrum |
ph
|
Minimum chromatographic peak height from which to extract MS2 data for the mass spectrum |
freq
|
minimum observational frequency of the target ions to be included in the mass spectrum |
normfn
|
the normalization function, typically "mean" or "sum", for normalizing the intensity values |
cormethod
|
the correlation method used for calculating the correlation, see ‘cor’ function for methods |
Value
nested list of dataframes containing all MS1 and MS2 data for the peak
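A minimal sketch, assuming 'peaktable' came from 'create_peak_table_ms1'; the threshold values are illustrative:
ums <- get_ums(
  peaktable = peaktable,
  correl = 0.9, # minimum correlation to the base ion
  ph = 10, # minimum chromatographic peak height for MS2
  freq = 0.5, # minimum observational frequency
  normfn = "sum"
)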
get_uniques | R Documentation |
Get unique components of a nested list
Description
There are times when the concept of “samples” and “grouped data” may become intertwined and difficult to parse. The import process is one of those times depending on how the import file is generated. This function takes a nested list and compares a specific aspect of it, grouping the output based on that aspect and returning its characteristics.
Usage
get_uniques(objects, aspect)
Arguments
objects
|
LIST object |
aspect
|
CHR scalar name of the aspect from which to generate unique combinations |
Details
For example, the standard NIST import includes the “sample” aspect, which may be identical for multiple data import files. This provides a unique listing of those sample characteristics to reduce data manipulation and storage, and minimize database “chatter” during read/write. It returns a set of unique characteristics in a list, with appended characteristics “import_object” with the index number and object name of entries matching those characteristics.
This is largely superseded by later developments to database operations that first check for a table primary key id given a comprehensive list of column values in those tables where only a single record should contain those values (e.g. a complete unique case, enforced or unenforced).
Value
Unnamed LIST of length equaling the number of unique combinations with their values and indices
Examples
tmp <- list(list(a = 1:10, b = 1:10), list(a = 1:5, b = 1:5), list(a = 1:10, b = 1:5))
get_uniques(tmp)
getcharge | R Documentation |
Get polarity of an MS scan within an mzML object
Description
Get polarity of an MS scan within an mzML object
Usage
getcharge(mzml, i)
Arguments
mzml
|
list mzML object generated from ‘mzMLtoR’ function |
i
|
integer scan number |
Value
integer representing scan polarity (either 1 (positive) or -1 (negative))
getmslevel | R Documentation |
Get MS level of an MS scan within an mzML object
Description
Get MS level of an MS scan within an mzML object
Usage
getmslevel(mzml, i)
Arguments
mzml
|
list mzML object generated from ‘mzMLtoR’ function |
i
|
integer scan number |
Value
integer representing the MS Level (1, 2, … n)
getmzML | R Documentation |
Brings raw data file into environment
Description
If the file name does not have the .mzML extension, the raw file is converted first
Usage
getmzML( search_df, CONVERT = FALSE, CHECKCONVERT = TRUE, is_waters = FALSE, lockmass = NULL, lockmasswidth = NULL, correct = FALSE )
Arguments
search_df
|
data.frame output of [create_search_df] or file name of a raw file to be converted |
CONVERT
|
LGL scalar of whether or not to convert the search_df filename (default FALSE) |
CHECKCONVERT
|
LGL scalar of whether or not to verify the conversion format (default TRUE) |
Value
LIST value of the trimmed mzML file matching search criteria
getprecursor | R Documentation |
Get precursor ion of an MS scan within an mzML object
Description
Get precursor ion of an MS scan within an mzML object
Usage
getprecursor(mzml, i)
Arguments
mzml
|
list mzML object generated from ‘mzMLtoR’ function |
i
|
integer scan number |
Value
numeric designating the precursor ion (or the middle of the scan range for SWATH or DIA); returns NULL if no precursor was selected
gettime | R Documentation |
Get time of an MS scan within an mzML object
Description
Get time of an MS scan within an mzML object
Usage
gettime(mzml, i)
Arguments
mzml
|
list mzML object generated from ‘mzMLtoR’ function |
i
|
integer scan number |
Value
numeric of the scan time
has_missing_elements | R Documentation |
Simple check for if an object is empty
Description
Checks for empty vectors, a blank character string, NULL, and NA values. If fed a list object, returns TRUE if any element is the "empty" set. For data.frames, checks that nrow is not 0. [rlang:::is_empty] only checks for length 0.
Usage
has_missing_elements(x, logging = TRUE)
Arguments
x
|
Object to be checked |
logging
|
LGL scalar of whether or not to make log messages (default: TRUE) |
Value
LGL scalar of whether x
is empty
Note
Reminder that vectors created with NULL values will be automatically reduced by R.
Examples
has_missing_elements("a") # FALSE
has_missing_elements(c(NULL, 1:5)) # FALSE
has_missing_elements(list(NULL, 1:5)) # TRUE
has_missing_elements(data.frame(a = character(0))) # TRUE
is_elemental_match | R Documentation |
Checks if two elemental formulas match
Description
Checks if two elemental formulas match
Usage
is_elemental_match(testformula, trueformula)
Arguments
testformula
|
character string of elemental formula to test |
trueformula
|
character string of elemental formula to check against (truth) |
Value
logical
is_elemental_subset | R Documentation |
Check if elemental formula is a subset of another formula
Description
Check if elemental formula is a subset of another formula
Usage
is_elemental_subset(fragmentformula, parentformula)
Arguments
fragmentformula
|
character string of elemental formula subset to test |
parentformula
|
character string of elemental formula to check for subset |
Value
logical
Examples
is_elemental_subset("C2H2", "C2H5O")
is_elemental_subset("C2H2", "C2H1O")
isotopic_distribution | R Documentation |
Isotopic distribution functions: generate the isotopic distribution mass spectrum of an elemental formula
Description
Generate the isotopic distribution mass spectrum of an elemental formula
Usage
isotopic_distribution( elementalformula, exactmasschart, remove.elements = c(), max.dist = 3, min.int = 0.001, charge = "neutral" )
Arguments
elementalformula
|
character string of elemental formula to simulate isotopic pattern |
exactmasschart
|
exact mass chart generated from function create_exactmasschart |
remove.elements
|
character vector of elements to remove from elemental formula |
max.dist
|
numeric maximum mass distance (in Da) from exact mass to include in simulated isotopic pattern |
min.int
|
numeric minimum relative intensity (maximum = 1, minimum = 0) to include in simulated isotopic pattern |
charge
|
character string for the charge state of the simulated isotopic pattern, options are ‘neutral’, ‘positive’, and ‘negative’ |
Value
data frame containing mz and int values of mass spectrum
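For example, assuming 'exactmasschart' was generated by create_exactmasschart (the formula below is illustrative):
iso <- isotopic_distribution(
  elementalformula = "C8HF17O3S", # illustrative formula
  exactmasschart = exactmasschart,
  max.dist = 3,
  min.int = 0.001,
  charge = "negative"
)
plot(iso$mz, iso$int, type = "h") # quick look at the simulated pattern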
lockmass_remove | R Documentation |
Remove lockmass scan from mzml object
Description
For Waters instruments only, identifies the scans that are due to a lock mass scan and removes them for easier processing.
Usage
lockmass_remove( mzml, lockmass = NULL, lockmasswidth = NULL, correct = FALSE, approach = "baseion" )
Arguments
mzml
|
mzML object generated from mzMLtoR() function |
lockmass
|
m/z value of the lockmass to remove |
lockmasswidth
|
m/z value for the half-window of the lockmass scan |
correct
|
logical if the subsequent spectra should be corrected |
Value
A copy of the object provided to ‘mzml’ with the lock mass removed.
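A minimal sketch; the lock mass values below are illustrative assumptions for a Waters acquisition, and 'mzml' is assumed to come from mzMLtoR():
cleaned <- lockmass_remove(
  mzml,
  lockmass = 556.2771, # illustrative lock mass m/z
  lockmasswidth = 0.25, # half-window around the lock mass
  correct = TRUE # correct the subsequent spectra
)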
log_as_dataframe | R Documentation |
Pull a log file into an R object
Description
Log messages generated by logger with anything other than the standard formatting options can have multiple formatting tags to display in the R console. These "junk up" any resulting object. If you want to read a log directly in the console and preserve formatting, call [read_log] with the default 'as_object' argument (FALSE). For deeper inspection, a data frame works well, provided the formatting matches up. In 'env_logger.R' there is an option to set formatting layouts. In addition to setting formatting layouts, generate regex strings matching the desired format: 'log_remove_color' will remove the colors (the majority should be caught by the string provided as the default in this package) and 'log_split_column' will split the lines in your logging file into discrete categories named by 'df_titles'.
Usage
log_as_dataframe("log.txt")
Arguments
file
|
CHR scalar file path to a log file (default NULL is translated as “log.txt”) |
last_n
|
INT scalar of the last ‘n’ log entries to read. |
condense
|
LGL scalar of whether to nest the resulting tibble by the nearest second. |
regex_remove
|
CHR scalar regular expression of characters to REMOVE from log messages via [stringr::str_remove_all] |
regex_split
|
CHR scalar regular expression of characters used to split the log entry into columns from log messages via [tidyr::separate] |
df_titles
|
CHR vector of headers for the resulting data frame, passed as the “into” argument of [tidyr::separate] |
Details
This will attempt to fail gracefully.
Value
tibble with one row per log entry (or groups)
Note
If “time” is included and ‘condense’ == TRUE, the log messages in the resulting tibble will nested to the nearest second.
If “status” is included it will be a factor with levels including the valid statuses from logger (see [logger::log_levels]).
Use care to develop ‘regex_split’ in order to split the log entries into the appropriate columns as defined by ‘df_titles’; extra values will be merged into the messages column.
log_fn | R Documentation |
Simple logging convenience
Description
Conveniently add a log message at the trace level. Typically this would be called twice, bookending the body of a function along the lines of "Start fn()" and "End fn()". This can help provide traceability to deeply nested function calls within a log.
Usage
fn <- function() {log_fn("start"); 1+1; log_fn("end")}
fn()
Arguments
status
|
CHR scalar to prefix the log message; will be coerced to sentence case. Typically “start” or “end” but anything is accepted (default “start”). |
log_ns
|
CHR scalar of the logger namespace to use (default NA_character_) |
level
|
CHR scalar of the logging level to be passed to [log_it] (default “trace”) |
Value
None, hands logging messages to [log_it]
log_it | R Documentation |
Conveniently log a message to the console
Description
Use this to log messages of arbitrary level and message. It works best with [logger] but will also print directly to the console to support setups where package [logger] may not be available or custom log levels are desired.
Usage
log_it( log_level, msg = NULL, log_ns = NULL, reset_logger_settings = FALSE, reload_all = FALSE, logger_settings = file.path("config", "env_logger.R"), add_unknown_ns = FALSE, clone_settings_from = NULL )
Arguments
log_level
|
CHR scalar of the level at which to log a given statement. If using the [logger] package, must match one of [logger:::log_levels] |
msg
|
CHR scalar of the message to accompany the log. |
log_ns
|
CHR scalar of the logging namespace to use during execution (default: NULL prints to the global logging namespace) |
reset_logger_settings
|
LGL scalar indicating whether or not to refresh the logger settings using the file identified in 'logger_settings' |
reload_all
|
LGL scalar indicating whether to, during
|
logger_settings
|
CHR file path to the file containing logger settings (default: file.path(“config”, “env_logger.R”)) |
add_unknown_ns
|
LGL scalar indicating whether or not to add a new namespace if 'log_ns' is not already defined |
clone_settings_from
|
CHR scalar indicating an existing namespace from which to clone logger settings |
Details
When using [logger], create settings for each namespace in file 'config/env_logger.R' as a list (see examples there) and make sure it is sourced. If using with [logger] and "file" or "both" is selected for the namespace's 'LOGGING[[log_ns]]$to' parameter in 'env_logger.R', logs will be written to disk at the file defined in 'LOGGING[[log_ns]]$file' as well as to the console.
Value
Adds to the logger file (if enabled) and/or prints to the console if enabled.
Examples
log_it("test", "a test message")
test_log <- function() {
  log_it("success", "a success message")
  log_it("warn", "a warning message")
}
test_log() # Try it with and without logger loaded.
make_acronym | R Documentation |
Simple acronym generator
Description
At times it is useful for display purposes to generate acronyms for longer bits of text. This naively generates those by extracting the first letter as upper case from each word in 'text' elements.
Usage
make_acronym(text)
Arguments
text
|
CHR vector of the text to acronym-ize |
Value
CHR vector of length equal to that of 'text' with the acronym
Examples
make_acronym("test me")
make_acronym(paste("department of", c("commerce", "energy", "defense")))
make_install_code | R Documentation |
Convenience function to set a new installation code
Description
Convenience function to set a new installation code
Usage
make_install_code(db_conn = con, new_name = NULL, log_ns = "db")
Arguments
db_conn
|
connection object (default “con”) |
new_name
|
CHR scalar of the human readable name of the installation (e.g. your project name) (default: NULL) |
log_ns
|
CHR scalar of the logging namespace to use during execution (default: “db”) |
Value
None
make_requirements | R Documentation |
Make import requirements file
Description
Importing from the NIST contribution spreadsheet requires a certain format. In order to proceed smoothly, that format must be verified for gross integrity with regard to expectations about shape (i.e. class), names of elements, and whether they are required for import. This function creates a JSON expression of the expected import structure and saves it to the project directory.
Usage
make_requirements( example_import, file_name = "import_requirements.json", not_required = c("annotation", "chromatography", "opt_ums_params"), archive = TRUE, retain_in_R = TRUE, log_ns = "db" )
Arguments
example_import
|
CHR or LIST object containing an example of the expected import format; this should include only a SINGLE compound contribution file |
file_name
|
CHR scalar indicating the file name under which to save the resulting requirements, or to match against any existing file to archive if 'archive' = TRUE (default: "import_requirements.json") |
not_required
|
CHR vector matching element names of ‘example_import’ which are not required; all others will be assumed to be required |
archive
|
LGL indicating whether or not to archive an existing file matching 'file_name' by suffixing the file name with the current date. Only one archive per date is supported; if an archive for the current date already exists, it will be deleted. (default: TRUE) |
retain_in_R
|
LGL indicating whether to retain a local copy of the requirements file generated (default: TRUE) |
log_ns
|
CHR scalar of the logging namespace to use (default: “db”) |
Details
Either an existing JSON expression or an R list object may be used for ‘example_import’. If it is a character scalar, it will be assumed to be a file name, which will be loaded based on file extension. That file must be a JSON parseable text file, though raw text is acceptable.
An example file is located in the project directory at “example/PFAC30PAR_PFCA1_mzML_cmpd2627.JSON”
As with any file manipulation, use care with ‘file_name’.
Value
writes a file to the project directory (based on the found location of ‘file_name’) with the JSON structure
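Using the example file noted above, a sketch call looks like:
make_requirements(
  example_import = "example/PFAC30PAR_PFCA1_mzML_cmpd2627.JSON",
  archive = TRUE # archive any existing import_requirements.json first
)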
manage_connection | R Documentation |
Check for, and optionally remove, a database connection object
Description
This function abstracts connection management to a degree, streamlining the process of connecting and disconnecting existing connections as defined by function parameters. This release has not been tested extensively with drivers other than SQLite.
Usage
manage_connection("test.sqlite", conn_name = "test_con")
Arguments
db
|
CHR scalar name of the database to check, defaults to the name supplied in config/env.R (default: session variable DB_NAME) |
drv_pack
|
CHR scalar of the package used to connect to this database (default: session variable DB_DRIVER) |
conn_class
|
CHR vector of connection object classes to check against. Note this may depend heavily on connection packages and must be present in the class names of the driver used. (default session variable DB_CLASS) |
conn_name
|
CHR scalar of the R environment object name to use for this connection (default: “con”) |
is_local
|
LGL scalar indicating whether or not the referenced database is a local file, if not it will be treated as though it is either a DSN or a database name on your host server, connecting as otherwise defined |
rm_objects
|
LGL scalar indicating whether or not to remove objects identifiably connected to the database from the current environment. This is particularly useful if there are outstanding connections that need to be closed (default: TRUE) |
reconnect
|
LGL scalar indicating whether or not to connect if a connection does not exist; if both this and ‘disconnect’ are true, it will first be disconnected before reconnecting. (default: TRUE) |
disconnect
|
LGL scalar indicating whether or not to terminate and remove the connection from the current global environment (default: TRUE) |
log_ns
|
CHR scalar of the namespace (if any) to use for logging |
.environ
|
environment within which to place this connection object |
…
|
named list of any other connection parameters required for your database driver (e.g. postgres username/password) |
Value
None
Note
If you want to disconnect everything but retain tibble pointers to your data source as tibbles in this session, use [close_up_shop] instead.
For more complicated setups, it may be easier to use this function by storing parameters in a list and calling with [base::do.call()]
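A sketch of the do.call pattern mentioned above (all parameter values are assumptions):
conn_params <- list(
  db = "test.sqlite", # hypothetical database file
  conn_name = "test_con",
  reconnect = TRUE,
  disconnect = FALSE
)
do.call("manage_connection", conn_params)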
map_import | R Documentation |
Map an import file to the database schema
Description
This parses an import object and attempts to map it to database fields and tables as defined by an import map stored in an object of class data.frame, typically created during project compliance as “IMPORT_MAP”. This object is a list of all columns and their tables in the import file matched with the database table and column to which they should be imported.
Usage
map_import( import_obj, aspect, import_map, case_sensitive = TRUE, fuzzy = FALSE, ignore = TRUE, id_column = "_*id$", alias_column = "^alias$", resolve_normalization = TRUE, strip_na = FALSE, db_conn = con, log_ns = "db" )
Arguments
import_obj
|
LIST object of values to import |
aspect
|
CHR scalar of the import aspect (e.g. “sample”) to map |
import_map
|
data.frame object of the import map (e.g. from a CSV) |
case_sensitive
|
LGL scalar of whether to match on a case sensitive basis (the default TRUE searches for values as-provided) or whether to coerce value matches by upper, lower, sentence, and title case matches |
fuzzy
|
LGL scalar of whether to do a “fuzzy” match in the sense that values provided are wrapped in an SQL “LIKE ‘%value%’” expression, which also bypasses the ‘case_sensitive’ setting if TRUE (default: FALSE). |
db_conn
|
connection object (default: con) |
log_ns
|
CHR scalar of the logging namespace to use (default: “db”) |
Value
LIST of final mapped values
Note
The object used for ‘import_map’ must be a data.frame that at minimum includes columns named import_category, import_parameter, alias_lookup, and sql_normalization
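A hedged sketch of typical usage, assuming an import object and an “IMPORT_MAP” data.frame already exist in the session (both object names are assumptions based on the description above):

mapped <- map_import(
  import_obj = import_obj,   # list built by the import generator (assumed)
  aspect = "sample",         # which aspect of the import to map
  import_map = IMPORT_MAP,   # import map created during project compliance
  db_conn = con
)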
mode_checks | R Documentation |
Get list of available functions
Description
Helper function for verify_args() that returns all the currently available functions matching a given prefix. This searches the entire library associated with the current R install.
Usage
mode_checks(prefix = "is", use_deprecated = FALSE)
Arguments
prefix
|
CHR scalar for the function prefix to search (default “is”) |
use_deprecated
|
BOOL scalar indicating whether or not to include functions marked as deprecated (PLACEHOLDER default FALSE) |
Details
Note: argument use_deprecated is not currently used but serves as a placeholder for future development to avoid or include deprecated functions
Value
CHR vector of functions matching prefix
Examples
mode_checks()
molecule_picture | R Documentation |
Picture a molecule from structural notation
Description
This is a thin wrapper to rdkit.Chem.MolFromX methods to generate molecular models from common structure notation such as InChI or SMILES. All picture files produced will be in portable network graphics (.png) format.
Usage
caffeine <- "C[n]1cnc2N(C)C(=O)N(C)C(=O)c12"
molecule_picture(caffeine, show = TRUE)
Arguments
mol
|
CHR scalar expression of molecular structure |
mol_type
|
CHR scalar indicating the expression type of ‘mol’ (default: “smiles”) |
file_name
|
CHR scalar of an intended file destination (default: NULL will produce a random 10 character file name). Note that any file extensions provided here will be ignored. |
rdkit_name
|
CHR scalar indicating the name of the R object bound to RDKit OR the name of the R object directly (i.e. without quotes) |
open_file
|
LGL scalar of whether to open the file after creation (default: FALSE) |
show
|
LGL scalar of whether to return the image itself as an object (default: FALSE) |
Value
None, or displays the resulting picture if ‘show == TRUE’
Note
Supported ‘mol’ expressions include FASTA, HELM, Inchi, Mol2Block, Mol2File, MolBlock, MolFile, PDBBlock, PDBFile, PNGFile, PNGString, RDKitSVG, Sequence, Smarts, Smiles, TPLBlock, and TPLFile
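A brief sketch of a file-producing call, assuming an active RDKit binding (see [rdkit_active]); the output file name is illustrative:

molecule_picture(
  mol = "C[n]1cnc2N(C)C(=O)N(C)C(=O)c12",  # caffeine as SMILES
  mol_type = "smiles",
  file_name = "caffeine",                   # extension is added automatically
  open_file = TRUE
)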
monoisotope.list | R Documentation |
Calculate the monoisotopic masses of elemental formulas in a data.frame
Description
Calculate the monoisotopic masses of elemental formulas in a data.frame column.
Usage
monoisotope.list( df, column, exactmasses, remove.elements = c(), adduct = "neutral" )
Arguments
df
|
data.frame with at least one column with elemental formulas |
column
|
integer or CHR scalar indicating the column containing the elemental formulas, if CHR then regex match is used |
exactmasses
|
list of exact masses |
remove.elements
|
elements to remove from the elemental formulas |
adduct
|
character string adduct/charge state to add to the elemental formula, options are ‘neutral’, ‘+H’, ‘-H’, ‘+Na’, ‘+K’, ‘+’, ‘-’, ‘-radical’, ‘+radical’ |
Value
data.frame with column of exact masses appended to it
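A minimal sketch, assuming an exact mass list named “exactmasses” has been loaded with the project (the data.frame and column names are illustrative):

formulas <- data.frame(formula = c("C8H10N4O2", "C6H12O6"))
monoisotope.list(
  df = formulas,
  column = "formula",        # CHR scalar: matched by regex against names
  exactmasses = exactmasses, # exact mass list assumed loaded with the project
  adduct = "+H"              # calculate as the protonated species
)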
ms_plot_peak | R Documentation |
Plot a peak from database mass spectral data
Description
Plots the intensity of ion traces over the scan period and annotates them with the mass to charge value. Several flexible plotting aspects are provided as data may become complicated.
Usage
ms_plot_peak( data, peak_type = c("area", "line", "segment"), peak_facet_by = "ms_n", peak_mz_resolution = 0, peak_drop_ratio = 0.01, peak_repel_labels = TRUE, peak_line_color = "black", peak_fill_color = "grey50", peak_fill_alpha = 0.2, peak_text_size = 3, peak_text_offset = 0.02, include_method = TRUE, db_conn = con )
Arguments
data
|
data.frame of spectral data in the form of the ‘ms_data’ table |
peak_type
|
CHR scalar of the plot type to draw, must be one of “line”, “segment”, or “area” (default: “line”) |
peak_facet_by
|
CHR scalar name of a column by which to facet the resulting plot (default: “ms_n”) |
peak_mz_resolution
|
INT scalar mass to charge ratio tolerance to group peaks, with at minimum columns for intensity (as “base_int”), ion m/z value (as “base_ion”), and scan time (as “scantime”) - (default: 0 goes to unit resolution) |
peak_drop_ratio
|
NUM scalar threshold of the maximum intensity below which traces will be dropped (default: 1e-2 means any trace with a maximum intensity less than 1% of the maximum intensity in the plot will be dropped); if > 1 the inversion will be used (1e5 -> 1e-5) |
peak_repel_labels
|
LGL scalar on whether to use the [ggrepel] package to space out m/z labels in the plot (default: TRUE). If [ggrepel] is not installed, it will default to FALSE rather than requiring an installation |
peak_line_color
|
CHR scalar name of the color to use for the “color” aesthetic (only a single color is supported; default: “black”) |
peak_fill_color
|
CHR scalar name of the color to use for the “fill” aesthetic (only a single color is supported; default: “grey50”) |
peak_text_offset
|
NUM scalar y-axis offset as a fraction of the maximum intensity for trace annotation (default: 0.02 offsets labels in the positive direction by 2% of the maximum intensity) |
db_conn
|
database connection (default: con) which must be live to pull sample and compound identification information |
Details
The basic default plot will group all mass-to-charge ratio values by unit resolution (increase resolution with ‘peak_mz_resolution’) and plot them as an area trace over the scanning period. Traces are annotated with the grouping value. Values of ‘peak_mz_resolution’ greater than available data (e.g. 10 when data resolution is to the 5th decimal point) will default to maximum resolution.
Traces are filtered out completely if their maximum intensity is below the ratio set by ‘peak_drop_ratio’; only complete traces are filtered out this way, not individual data points within a retained trace. Set this as the fraction of the base peak (the peak of maximum intensity) to use to filter out low-intensity traces. The calculated intensity threshold will be printed to the caption.
Value
ggplot object
Note
Increasing ‘peak_mz_resolution’ will likely result in multiple separate traces.
Implicitly missing values are not interpolated, but lines are drawn through to the next point.
‘peak_type’ will accept abbreviations of its accepted values (e.g. “l” for “line”)
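A hedged sketch of a typical call, assuming “peak_data” is a data.frame shaped like the ‘ms_data’ table (an assumed object name) and “con” is a live database connection:

ms_plot_peak(
  data = peak_data,
  peak_type = "area",       # draw filled area traces
  peak_mz_resolution = 3,   # group m/z values to 3 decimal places
  peak_drop_ratio = 0.05,   # drop traces below 5% of the base peak
  db_conn = con
)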
ms_plot_peak_overview | R Documentation |
Create a patchwork plot of peak spectral properties
Description
Call this function to generate a combined plot from [ms_plot_peak], [ms_plot_spectra], and [ms_plot_spectral_intensity] using the [patchwork] package, which must be installed. All arguments will be passed directly to the underlying functions to provide flexibility in the final display. The default settings match those of the called plotting functions, and the output can be further manipulated with the patchwork package.
Usage
ms_plot_peak_overview( plot_peak_id, peak_type = c("area", "line", "segment"), peak_facet_by = "ms_n", peak_mz_resolution = 0, peak_drop_ratio = 0.01, peak_repel_labels = TRUE, peak_line_color = "black", peak_fill_color = "grey50", peak_fill_alpha = 0.2, peak_text_size = 3, peak_text_offset = 0.02, spectra_mz_resolution = 3, spectra_drop_ratio = 0.01, spectra_repel_labels = TRUE, spectra_repel_line_color = "grey50", spectra_nudge_y_factor = 0.03, spectra_log_y = FALSE, spectra_text_size = 3, spectra_max_overlaps = 50, intensity_plot_resolution = c("spectra", "peak"), intensity_mz_resolution = 3, intensity_drop_ratio = 0, patchwork_design = c(area(1, 4, 7, 7), area(1, 1, 4, 2), area(6, 1, 7, 2)), as_individual_plots = FALSE, include_method = TRUE, db_conn = con, log_ns = "global" )
Arguments
peak_type
|
CHR scalar of the plot type to draw, must be one of “line”, “segment”, or “area” (default: “line”) |
peak_facet_by
|
CHR scalar name of a column by which to facet the resulting plot (default: “ms_n”) |
peak_mz_resolution
|
INT scalar mass to charge ratio tolerance to group peaks, with at minimum columns for intensity (as “base_int”), ion m/z value (as “base_ion”), and scan time (as “scantime”) - (default: 0 goes to unit resolution) |
peak_drop_ratio
|
NUM scalar threshold of the maximum intensity below which traces will be dropped (default: 1e-2 means any trace with a maximum intensity less than 1% of the maximum intensity in the plot will be dropped); if > 1 the inversion will be used (1e5 -> 1e-5) |
peak_repel_labels
|
LGL scalar on whether to use the [ggrepel] package to space out m/z labels in the plot (default: TRUE). If [ggrepel] is not installed, it will default to FALSE rather than requiring an installation |
peak_line_color
|
CHR scalar name of the color to use for the “color” aesthetic (only a single color is supported; default: “black”) |
peak_fill_color
|
CHR scalar name of the color to use for the “fill” aesthetic (only a single color is supported; default: “grey50”) |
peak_text_offset
|
NUM scalar y-axis offset as a fraction of the maximum intensity for trace annotation (default: 0.02 offsets labels in the positive direction by 2% of the maximum intensity) |
spectra_mz_resolution
|
INT scalar mass to charge ratio tolerance to group peaks, with at minimum columns for intensity (as base_int), ion m/z value (as base_ion), and scan time (as scantime) - (default: 3) |
spectra_drop_ratio
|
NUM scalar threshold of the maximum intensity below which traces will be dropped (default: 1e-2 means any trace with a maximum intensity less than 1% of the maximum intensity in the plot will be dropped); if > 1 the inversion will be used (1e5 -> 1e-5) |
spectra_repel_labels
|
LGL scalar on whether to use the [ggrepel] package to space out m/z labels in the plot (default: TRUE). If [ggrepel] is not installed, it will default to FALSE rather than requiring an installation |
spectra_repel_line_color
|
CHR scalar name of the color to use for the “color” aesthetic of the lines connecting repelled labels to their data points; passed to [ggrepel::geom_text_repel] as segment.color (only a single color is supported; default: “grey50”) |
spectra_nudge_y_factor
|
NUM scalar y-axis offset as a fraction of the maximum intensity for trace annotation (default: 0.03 offsets labels in the positive direction by 3% of the maximum intensity) |
spectra_log_y
|
LGL scalar of whether or not to apply a log10 scaling factor to the y-axis (default: FALSE) |
spectra_text_size
|
NUM scalar of the text size to use for annotation labels (default: 3) |
spectra_max_overlaps
|
INT scalar of the maximum number of text overlaps to allow (default: 50) |
intensity_mz_resolution
|
INT scalar mass to charge ratio tolerance to group peaks, with at minimum columns for intensity (as “base_int” or “intensity”), ion m/z value (as “base_ion” or “mz”), and scan time (as “scantime”) - (default: 3) |
intensity_drop_ratio
|
NUM scalar threshold of the maximum intensity below which traces will be dropped (default: 0 returns all); if > 1 the inversion will be used (1e5 -> 1e-5) |
patchwork_design
|
the layout of the final plot; see [patchwork::design] |
as_individual_plots
|
LGL scalar of whether to return the plots individually in a list (set TRUE) or as a patchwork plot (default: FALSE) |
db_conn
|
database connection (default: con) which must be live to pull sample and compound identification information |
Value
object of classes ‘gg’ and ‘ggplot’, as a patchwork unless ‘as_individual_plots’ is TRUE
Note
Requires a live connection to the database to pull all plots for a given peak_id.
Defaults are as for called functions
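A sketch of a combined overview for a single peak, assuming peak ID 1 exists in the connected database (the ID is illustrative):

ms_plot_peak_overview(
  plot_peak_id = 1,
  peak_type = "area",
  spectra_mz_resolution = 3,
  as_individual_plots = FALSE,  # return a single patchwork plot
  db_conn = con
)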
ms_plot_spectra | R Documentation |
Plot a fragment map from database mass spectral data
Description
Especially for non-targeted analysis workflows, it is often necessary to examine annotated fragment data for spectra across a given peak of interest. Annotated fragments lend increasing confidence in the identification of the compound giving rise to a mass spectral peak. If a fragment has been annotated, that identification is displayed along with the mass to charge value in blue. Annotations of the mass to charge ratio for unannotated fragments are displayed in red.
Usage
ms_plot_spectra( data, spectra_type = c("separated", "zipped"), spectra_mz_resolution = 3, spectra_drop_ratio = 0.01, spectra_repel_labels = TRUE, spectra_repel_line_color = "grey50", spectra_nudge_y_factor = 0.03, spectra_log_y = FALSE, spectra_is_file = FALSE, spectra_from_JSON = FALSE, spectra_animate = FALSE, spectra_text_size = 3, spectra_max_overlaps = 50, include_method = TRUE, db_conn = con )
Arguments
data
|
data.frame of spectral data in the form of the ‘ms_data’ table |
spectra_mz_resolution
|
INT scalar mass to charge ratio tolerance to group peaks, with at minimum columns for intensity (as base_int), ion m/z value (as base_ion), and scan time (as scantime) - (default: 3) |
spectra_drop_ratio
|
NUM scalar threshold of the maximum intensity below which traces will be dropped (default: 1e-2 means any trace with a maximum intensity less than 1% of the maximum intensity in the plot will be dropped); if > 1 the inversion will be used (1e5 -> 1e-5) |
spectra_repel_labels
|
LGL scalar on whether to use the [ggrepel] package to space out m/z labels in the plot (default: TRUE). If [ggrepel] is not installed, it will default to FALSE rather than requiring an installation |
spectra_repel_line_color
|
CHR scalar name of the color to use for the “color” aesthetic of the lines connecting repelled labels to their data points; passed to [ggrepel::geom_text_repel] as segment.color (only a single color is supported; default: “grey50”) |
spectra_nudge_y_factor
|
NUM scalar y-axis offset as a fraction of the maximum intensity for trace annotation (default: 0.03 offsets labels in the positive direction by 3% of the maximum intensity) |
spectra_log_y
|
LGL scalar of whether or not to apply a log10 scaling factor to the y-axis (default: FALSE) |
spectra_is_file
|
LGL scalar of whether data are coming from a file (default: FALSE) |
spectra_from_JSON
|
LGL scalar of whether data are in JSON format; other formats are not supported when ‘spectra_is_file = TRUE’ (default: FALSE) |
spectra_animate
|
LGL scalar of whether to produce an animation across the scantime for these data (default: FALSE) |
spectra_text_size
|
NUM scalar of the text size to use for annotation labels (default: 3) |
spectra_max_overlaps
|
INT scalar of the maximum number of text overlaps to allow (default: 50) |
db_conn
|
database connection (default: con) which must be live to pull sample and compound identification information |
Value
ggplot object
Note
If ‘spectra_animate’ is set to true, it requires the [gganimate] package to be installed (and may also require the [gifski] package) and WILL take a large amount of time to complete, but results in an animation that will iterate through the scan period and display mass spectral data as they appear across the peak. Your mileage likely will vary.
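A minimal sketch, again assuming “peak_data” is shaped like the ‘ms_data’ table and “con” is live; animation is left off here to avoid the long run time noted above:

ms_plot_spectra(
  data = peak_data,
  spectra_mz_resolution = 3,
  spectra_log_y = TRUE,   # log10 scale the intensity axis
  db_conn = con
)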
ms_plot_spectral_intensity | R Documentation |
Create a spectral intensity plot
Description
Often it is useful to get an overview of mass-to-charge intensity across the scanning time of a peak. Typically this is done with individual ion traces as in a peak plot, but large peaks can often mask smaller ones, or wash out lower intensity signals. Use this to plot m/z against scan time with intensity shown by color and size. It is intended as a complement to [ms_plot_peak] and may be called at the same levels of granularity, though generally at greater granularity than [ms_plot_peak], which is more of an overview.
Usage
ms_plot_spectral_intensity( data, intensity_mz_resolution = 5, intensity_drop_ratio = 0, intensity_facet_by = NULL, intensity_plot_resolution = c("spectra", "peak"), include_method = TRUE, db_conn = con )
Arguments
data
|
tibble or pointer with data to plot, either at the peak level, in which case “base_ion” must be present, or at the spectral level, in which case “intensity” must be present |
intensity_mz_resolution
|
INT scalar mass to charge ratio tolerance to group peaks, with at minimum columns for intensity (as “base_int” or “intensity”), ion m/z value (as “base_ion” or “mz”), and scan time (as “scantime”) - (default: 5) |
intensity_drop_ratio
|
NUM scalar threshold of the maximum intensity below which traces will be dropped (default: 0 returns all); if > 1 the inversion will be used (1e5 -> 1e-5) |
intensity_facet_by
|
CHR scalar of a column name in ‘data’ by which to facet the resulting plot (default: NULL) |
db_conn
|
database connection (default: con) which must be live to pull sample and compound identification information |
Value
object of classes ‘gg’ and ‘ggplot’
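A hedged sketch at the spectral level, assuming “spectra_data” (an assumed object name) contains “intensity”, “mz”, and “scantime” columns as described above:

ms_plot_spectral_intensity(
  data = spectra_data,
  intensity_mz_resolution = 3,
  intensity_plot_resolution = "spectra",  # plot at the spectral level
  db_conn = con
)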
ms_plot_titles | R Documentation |
Consistent title for ms_plot_x functions
Description
This helper function creates consistently formatted plot label elements in an opinionated manner. This is unlikely to be useful outside the direct context of [ms_plot_peak], [ms_plot_spectra], and [ms_plot_spectral_intensity].
Usage
ms_plot_titles( plot_data, mz_resolution, drop_ratio, include_method, db_conn = con )
Arguments
plot_data
|
data.frame object passed from the plotting function |
mz_resolution
|
NUM scalar passed from the plotting function |
drop_ratio
|
NUM scalar passed from the plotting function |
include_method
|
LGL scalar indicating whether or not to get the method narrative from the database |
db_conn
|
database connection (default: con) which must be live to pull sample and compound identification information |
Value
LIST of strings named for ggplot title elements “title”, “subtitle”, and “caption”
ms_spectra_separated | R Documentation |
Parse “Separated” MS Data
Description
The “separated” format includes spectra packed into two separate columns, one for mass and another for intensity. All values for a given scan time are packed into these columns, separated by space, with an unlimited number of discrete values, and must be a 1:1 ratio of values between the two columns.
Usage
ms_spectra_separated(df, ms_cols = c("mz", "intensity"))
Arguments
df
|
data.frame or json object containing spectra compressed in the “separated” format |
ms_cols
|
CHR vector of length 2 identifying the column names to use for mass and intensity in the source data; must be of length 2, with the first value identifying the mass-to-charge ratio column and the second identifying the intensity column |
Value
data.frame object of the unpacked spectra as a list column
Note
ms_cols values are treated as regex expressions, but it is safest to provide matching column names
Examples
### JSON Example
tmp <- jsonify::as.json('{
  "measured_mz": "712.9501 713.1851",
  "measured_intensity": "15094.41015625 34809.9765625"
}')
ms_spectra_separated(tmp)
### Example data.frame
tmp <- data.frame(
  measured_mz = "712.9501 713.1851",
  measured_intensity = "15094.41015625 34809.9765625"
)
ms_spectra_separated(tmp)
ms_spectra_zipped | R Documentation |
Parse “Zipped” MS Data
Description
The “zipped” format includes spectra packed into one column containing alternating mass and intensity values for all observations. All values are packed into these columns for a given scan time, separated by spaces, with an unlimited number of discrete values, and must be in an alternating 1:1 pattern of values of the form “mass intensity mass intensity”.
Usage
ms_spectra_zipped(df, spectra_col = "data")
Arguments
df
|
data.frame object containing spectra compressed in the “zipped” format |
spectra_col
|
CHR scalar identifying the column containing the zipped spectra in the source data (default: “data”) |
Value
data.frame object containing unpacked spectra as a list column
Note
spectra_col is treated as a regex expression, but it is safest to provide a matching column name
Examples
### JSON Example
tmp <- jsonify::as.json('{"msdata": "712.9501 15094.41015625 713.1851 34809.9765625"}')
ms_spectra_zipped(tmp)
### Example data.frame
tmp <- data.frame(
  msdata = "712.9501 15094.41015625 713.1851 34809.9765625"
)
ms_spectra_zipped(tmp)
mzMLconvert | R Documentation |
Converts a raw file into an mzML
Description
Converts a raw file into an mzML
Usage
mzMLconvert(rawfile, msconvert = NULL, config = NULL, outdir = getwd())
Arguments
rawfile
|
file path of the MS raw file to be converted |
msconvert
|
file path of the msconvert.exe file, if NULL retrieves information from config directory |
config
|
configuration settings file for msconvert conversion to mzML, if NULL retrieves information from config directory |
outdir
|
directory path for the converted mzML file. |
Value
CHR scalar path to the created file
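A sketch under assumed paths (msconvert.exe ships with ProteoWizard; your install location and file names will differ):

converted <- mzMLconvert(
  rawfile = "data/raw/sample01.raw",
  msconvert = "C:/Program Files/ProteoWizard/msconvert.exe",
  outdir = "data/mzml"
)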
mzMLtoR | R Documentation |
Opens file of type mzML into R environment
Description
Opens file of type mzML into R environment
Usage
mzMLtoR( mzmlfile = file.choose(), lockmass = NULL, lockmasswidth = NULL, correct = FALSE, approach = "hybrid" )
Arguments
mzmlfile
|
the file path of the mzML file which the data are to be read from. |
lockmass
|
NUM scalar m/z value of the lockmass to remove (Waters instruments only) (default: NULL) |
lockmasswidth
|
NUM scalar instrumental uncertainty associated with ‘lockmass’ (default: NULL) |
correct
|
logical if the subsequent spectra should be corrected for the lockmass (Waters instruments only) |
approach
|
character string defining the type of lockmass removal filter to use, default is ‘hybrid’ |
Value
list containing mzML data with unzipped masses and intensity information
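A minimal sketch reading a converted file; the lockmass arguments apply to Waters instrument data only and are omitted here (the path is illustrative):

ms <- mzMLtoR("data/mzml/sample01.mzML")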
nist_shinyalert | R Documentation |
Call [shinyalert::shinyalert] with specific styling
Description
This pass-through function serves only to call [shinyalert::shinyalert] with parameters defined by this function, and can be used for additional styling that may be necessary. It is used solely for consistency's sake.
Usage
nist_shinyalert("test", "info", shiny::h3("test"))
Arguments
title
|
The title of the modal. |
type
|
The type of the modal. There are 4 built-in types which will show a corresponding icon: “warning”, “error”, “success”, and “info”. |
text
|
The modal’s text. Can either be simple text, or Shiny tags (including Shiny inputs and outputs). If using Shiny tags, then you must also set ‘html = TRUE’. |
className
|
A custom CSS class name for the modal’s container. |
html
|
If TRUE, the content of the modal’s title and text will not be escaped (i.e. they are treated as HTML). |
closeOnClickOutside
|
If TRUE, clicking outside the modal will close it. |
immediate
|
If TRUE, close any previously opened alerts and display the current one immediately. |
…
|
Additional named parameters to be passed to shinyalert. Unrecognized ones will be ignored. |
Value
None, shows a shinyalert modal
See Also
shinyalert::shinyalert
obj_name_check | R Documentation |
Sanity check for environment object names
Description
Provides a sanity check on whether or not a name reference exists and returns its name if so. If not, returns the default name defined by default_name. This largely is used to prevent naming conflicts as part of managing the plumber service but can be used for any item in the current namespace.
Usage
if (exists("log_it")) {
  obj_name_check("test", "test")
  test <- letters
  obj_name_check(test)
}
Arguments
obj
|
R object or CHR scalar in question to be resolved in the namespace |
default_name
|
CHR scalar name to use if ‘obj’ cannot be resolved in the namespace |
Value
CHR scalar of the resolved object name
open_env | R Documentation |
Convenience shortcut to open and edit session environment variables
Description
Calls [open_proj_file] for either the R, global, or logging environment settings containing the most common settings dictating project behavior.
Usage
open_env(name = c("R", "global", "logging", "rdkit", "shiny", "plumber"))
Arguments
name
|
CHR scalar, one of “R”, “global”, “logging”, “rdkit”, “shiny”, or “plumber”. |
Value
None, opens a file for editing
open_proj_file | R Documentation |
Open and edit project files
Description
Project files are organized in several topical directories depending on their purpose as part of the package. For example, several project control variables are set to establish the session global environment in the “config” directory rather than the “R” directory.
Usage
open_proj_file(name, dir = NULL, create_new = FALSE)
Arguments
name
|
CHR scalar of the file name to open, accepts regex |
dir
|
CHR scalar of a directory name to search within |
create_new
|
LGL scalar of whether to create the file (similar functionality to [usethis]; default FALSE) |
Details
If a direct file match to name is not found, it will be searched for using a recursive [list.files] allowing for regex matches (e.g. “.R$”). Directories are similarly sought out within the project. Reasonable feedback is provided.
This convenience function uses [usethis::edit_file] to open (or create if ‘create_new’ is TRUE) any given file in the project.
Value
None, opens a file for editing
Note
If the directory and file cannot be found, and ‘create_new’ is true, the directory will be placed within the project directory.
optimal_ums | R Documentation |
Get the optimal uncertainty mass spectrum parameters for data
Description
Get the optimal uncertainty mass spectrum parameters for data
Usage
optimal_ums( peaktable, max_correl = 0.75, correl_bin = 0.05, max_ph = 10, ph_bin = 1, max_freq = 10, freq_bin = 1, min_n_peaks = 3, cormethod = "pearson" )
Arguments
peaktable
|
list generated from ‘create_peak_table_ms1’ or ‘create_peak_table_ms2’ |
max_correl
|
numeric maximum acceptable correlation |
correl_bin
|
numeric sequence bin width from max_correl..0 |
max_ph
|
numeric maximum acceptable peak height (%) |
ph_bin
|
numeric sequence bin width from max_ph..0 |
max_freq
|
numeric maximum acceptable observational frequency (%) |
freq_bin
|
numeric sequence bin width from max_freq..0 |
min_n_peaks
|
integer ideal minimum number of scans for mass spectrum |
cormethod
|
string indicating correlation function to use (see [cor()] for description) |
Value
data.frame object containing optimized search parameters
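A hedged sketch of where this fits in the workflow, assuming “peaktable” was produced upstream by ‘create_peak_table_ms1’ or ‘create_peak_table_ms2’:

ums_params <- optimal_ums(
  peaktable,
  max_correl = 0.75,      # maximum acceptable correlation
  min_n_peaks = 3,        # ideal minimum number of scans
  cormethod = "pearson"
)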
overlap | R Documentation |
Calculate overlap ranges
Description
Internal function: determines if two ranges (x1-e1 to x1+e1) and (x2-e2 to x2+e2) overlap (nonstatistical evaluation)
Usage
overlap(x1, e1, x2, e2)
Arguments
x1, x2
|
values containing mean values |
e1, e2
|
values containing respective error values |
pair_ums | R Documentation |
Pairwise data.frame of two uncertainty mass spectra
Description
The function stacks two uncertainty mass spectra together based on binned m/z values
Usage
pair_ums(ums1, ums2, error = 5, minerror = 0.002)
Arguments
ums1
|
uncertainty mass spectrum from ‘get_ums’ function |
ums2
|
uncertainty mass spectrum from ‘get_ums’ function |
minerror
|
the minimum mass error (in Da) of the instrument data |
error
|
the mass accuracy (in ppm) of the instrument data |
peak_gather_json | R Documentation |
Extract peak data and metadata
Description
Gathers metadata from methodjson and extracts the MS1 and MS2 data from the mzML.
Usage
peak_gather_json( methodjson, mzml, compoundtable, zoom = c(1, 5), minerror = 0.002 )
Arguments
methodjson
|
list of JSON generated from ‘parse_method_json’ function |
mzml
|
list of msdata from ‘mzMLtoR’ function |
compoundtable
|
data.frame containing compound identities [should be extractable from SQL later] |
zoom
|
numeric vector specifying the range around the precursor ion to include, from m/z - zoom[1] to m/z + zoom[2] |
minerror
|
numeric the minimum error (in Da) of the instrument |
Value
list of peak objects
plot_compare_ms | R Documentation |
Plot MS Comparison
Description
Plots a butterfly plot for the comparison of two uncertainty mass spectra
Usage
plot_compare_ms( ums1, ums2, main = "Comparison Mass Spectrum", size = 1, c1 = "black", c2 = "red", ylim.exp = 1 )
Arguments
ums1, ums2
|
uncertainty mass spectrum from ‘get_ums’ function |
main
|
Main Title of the Plot |
size
|
line width of the mass spectra lines |
c1
|
Color of the top (ums1) mass spectral lines |
c2
|
Color of the bottom (ums2) mass spectral lines |
ylim.exp
|
Expansion unit for the y-axis |
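A brief sketch, assuming “ums1” and “ums2” were produced by the ‘get_ums’ function:

plot_compare_ms(
  ums1, ums2,
  main = "Comparison Mass Spectrum",
  c1 = "black",  # top spectrum color
  c2 = "red"     # bottom spectrum color
)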
plot_ms | R Documentation |
Generate consensus mass spectrum
Description
Extract relevant information from a mass spectrum and plot it as an uncertainty mass spectrum.
Usage
plot_ms( ms, xlim = NULL, ylim = NULL, main = "Mass Spectrum", color = "black", size = 1, removal = 0 )
Arguments
ms
|
result of the ‘create_peak_list’ function |
Details
Extract relevant information from a mass spectrum and plot it as an uncertainty mass spectrum.
Value
ggplot object
pool.sd | R Documentation |
Pool standard deviations
Description
Internal function: calculates a pooled standard deviation
Usage
pool.sd(sd, n)
Arguments
sd
|
A vector containing numeric values of standard deviations |
n
|
A vector containing integers for the number of observations respective to the sd values |
pool.ums | R Documentation |
Pool uncertainty mass spectra
Description
Calculates a pooled uncertainty mass spectrum that is a result of data from multiple uncertainty mass spectra.
Usage
pool.ums(umslist, error = 5, minerror = 0.002)
Arguments
umslist
|
A list where each item is an uncertainty mass spectrum from function ‘get_ums’ |
minerror
|
the minimum mass error (in Da) of the instrument data |
error
|
the mass accuracy (in ppm) of the instrument data |
pragma_table_def | R Documentation |
Get table definition from SQLite
Description
Given a database connection (‘con’), get more information about the properties of one or more database tables directly from ‘PRAGMA table_info()’ rather than e.g. [DBI::dbListFields()]. Set ‘get_sql’ to ‘TRUE’ to include the direct schema using sqlite_master; depending on formatting this may or may not be directly usable, though some effort has been made to remove formatting characters (e.g. line feeds, tabs, etc.) if stringr is available.
Usage
pragma_table_def(db_table, db_conn = con, get_sql = FALSE, pretty = TRUE)
Arguments
db_table
|
CHR vector name of the table(s) to inspect |
db_conn
|
connection object (default: con) |
get_sql
|
BOOL scalar of whether or not to return the schema sql (default FALSE) |
pretty
|
BOOL scalar for whether to return “pretty” SQL that includes human readability enhancements; if this is set to TRUE (the default), it is recommended that the output is fed through ‘cat’ and, in the case of multiple tables, printed one element at a time |
Details
Note that the package ‘stringr’ is required for formatting returns that include either ‘get_sql’ or ‘pretty’ as TRUE.
Value
data.frame object representing the SQL PRAGMA expression
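A minimal sketch of direct inspection (the “compounds” table appears elsewhere in this guide); when ‘get_sql = TRUE’ the returned schema SQL is best viewed through ‘cat’:

col_defs <- pragma_table_def("compounds", db_conn = con)
str(col_defs)  # examine the PRAGMA table_info() columns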
pragma_table_info | R Documentation |
Explore properties of an SQLite table
Description
Add functionality to ‘pragma_table_def’ by filtering on column properties such as required and primary key fields. This provides some flexibility to searching table properties without sacrificing the full details of table schema. Parameter ‘get_sql’ is forced to FALSE; only information available via PRAGMA is searched by this function.
Usage
pragma_table_info("compounds")
Arguments
db_table
|
CHR vector name of the table(s) to inspect |
db_conn
|
connection object (default: con) |
condition
|
CHR vector matching specific checks, must be one of c(“required”, “has_default”, “is_PK”) for constraints where a field must not be null, has a default value defined, and is a primary key field, respectively. (default: NULL) |
name_like
|
CHR vector of character patterns to match against column names via grep. If length > 1, will be collapsed to a basic OR regex (e.g. c(“a”, “b”) becomes “a|b”). As regex, abbreviations and wildcards will typically work, but care should be used in that case. (default: NULL) |
data_type
|
CHR vector of character patterns to match against column data types via grep. If length > 1 will be collapsed to a basic “OR” regex (e.g. c(“int”, “real”) becomes “int|real”). As regex, abbreviations and wildcards will typically work, but care should be used in that case. (default: NULL) |
include_comments
|
LGL scalar of whether to include comments in the return data frame (default: FALSE) |
names_only
|
LGL scalar of whether to include names meeting defined criteria as a vector return value (default: FALSE) |
Details
This is intended to support validation during database communications with an SQLite connection, especially for application development (e.g. ‘shiny’), by allowing for programmatic inspection of database columns by name and property.
Value
data.frame object describing the database entity
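A sketch of filtered inspection, building on the usage example above, e.g. listing only the names of required columns:

pragma_table_info(
  "compounds",
  condition = "required",  # only NOT NULL fields
  names_only = TRUE        # return a vector of column names
)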
py_modules_available | R Documentation |
Are all conda modules available in the active environment?
Description
Checks that all defined modules are available in the currently active python binding. Supports error logging.
Usage
py_modules_available("rdkit")
Arguments
required_modules
|
CHR vector of required modules |
Value
LGL scalar of whether or not all modules are available. Check console for further details.
rdkit_active | R Documentation |
Sanity check on RDKit binding
Description
Given a name of an R object, performs a simple check on RDKit availability on that object, creating it if it does not exist. A basic structure conversion check is tried and a TRUE/FALSE result returned. Leave all arguments as their defaults of NULL to ensure they will honor the settings in ‘rdkit/env_py.R’.
Usage
rdkit_active( rdkit_ref = NULL, rdkit_name = NULL, log_ns = NULL, make_if_not = FALSE )
Arguments
rdkit_ref
|
CHR scalar OR R object of an RDKit binding (default NULL goes to “rdk” for convenience with other pipelines in this project) |
rdkit_name
|
CHR scalar the name of a python environment able to run rdkit (default NULL goes to “rdkit” for convenience with other pipelines in this project) |
log_ns
|
CHR scalar of the namespace (if any) to use for logging |
make_if_not
|
LGL scalar of whether or not to create a new python environment using [activate_py_env] if the binding is not active |
Value
LGL scalar of whether or not the test of RDKit was successful
rdkit_mol_aliases | R Documentation |
Create aliases for a molecule from RDKit
Description
Call this function to generate any number of machine-readable aliases from an identifier set. Given the ‘identifiers’ and their ‘type’, RDKit will be polled for conversion functions to create a mol object. That mol object is then used to create machine-readable aliases in any number of supported formats. See the RDKit Documentation for options. The ‘type’ argument is used to match against a “MolFromX” function, while the ‘aliases’ argument is used to match against a “MolToX” function.
Usage
rdkit_mol_aliases( identifiers, type = "smiles", mol_from_prefix = "MolFrom", get_aliases = c("inchi", "inchikey"), mol_to_prefix = "MolTo", rdkit_ref = "rdk", log_ns = "rdk", make_if_not = TRUE )
Arguments
identifiers
|
CHR vector of machine-readable molecule identifiers in a format matching ‘type’ |
type
|
CHR scalar of the type of encoding to use for ‘identifiers’ (default: smiles) |
mol_from_prefix
|
CHR scalar of the prefix to identify an RDKit function to create a mol object from ‘identifiers’ (default: “MolFrom”) |
get_aliases
|
CHR vector of aliases to produce (default: c(“inchi”, “inchikey”)) |
mol_to_prefix
|
CHR scalar of the prefix to identify an RDKit function to create an alias from a mol object (default: “MolTo”) |
rdkit_ref
|
CHR scalar OR R object of an RDKit binding (default NULL goes to “rdk” for convenience with other pipelines in this project) |
log_ns
|
CHR scalar of the namespace (if any) to use for logging |
make_if_not
|
LGL scalar of whether or not to create a new python environment using [activate_py_env] if the binding is not active |
Details
At the time of authorship, RDK v2021.09.4 was in use, which contained the following options findable by this function: CMLBlock, CXSmarts, CXSmiles, FASTA, HELM, Inchi, InchiAndAuxInfo, InchiKey, JSON, MolBlock, PDBBlock, RandomSmilesVect, Sequence, Smarts, Smiles, TPLBlock, V3KMolBlock, XYZBlock.
Value
data.frame object containing the aliases and the original identifiers
Note
Both ‘type’ and ‘aliases’ are case insensitive.
If ‘aliases’ is set to NULL, all possible expressions (excluding those with “File” in the name) are returned from RDKit, which will likely produce NULL values and module ArgumentErrors.
read_log | R Documentation |
Read a log from a log file
Description
By default if ‘file’ does not exist (i.e. ‘file’ is not a fully defined path) this looks for log text files in the directory defined by ‘LOG_DIRECTORY’ in the session.
Usage
read_log("log.txt")
Arguments
file
|
CHR scalar file path to a log file (default NULL is translated to “log.txt”) |
last_n
|
INT scalar of the last ‘n’ log entries to read. |
as_object
|
LGL scalar of whether to return the log as an R object or just to print the log to the console. |
Value
CHR vector of the requested log file entries if ‘as_object’ is TRUE, or none with a console print if ‘as_object’ is FALSE
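A minimal sketch pulling the most recent entries into an object (the file name follows the default noted above):

recent <- read_log("log.txt", last_n = 10, as_object = TRUE)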
rebuild_help_htmls | R Documentation |
Rebuild the help files as HTML with an index
Description
Rebuild the help files as HTML with an index
Usage
rebuild_help_htmls(rebuild_book = TRUE, book = "dimspec_user_guide")
Arguments
rebuild_book
|
LGL scalar of whether or not to rebuild an associated bookdown document |
book
|
Path to folder containing the bookdown document to rebuild |
Value
URL to the requested book
rectify_null_from_env | R Documentation |
Rectify NULL values provided to functions
Description
To support redirection of sensible parameter reads from an environment, either Global or System, functions in this package may include NULL as their default value. This returns values in order of precedence: parameter, then env_parameter, then default.
Usage
rectify_null_from_env(test, test, "test")
Arguments
parameter
|
the object being evaluated |
env_parameter
|
the name or object of a value to use from the environment if ‘parameter’ is NULL |
default
|
the fallback value to use if ‘parameter’ is NULL and ‘env_parameter’ cannot be found |
log_ns
|
the namespace to use with [log_it] if available |
Value
The requested value, either as-is, rectified from the environment, or the default
Note
log_ns is only applicable if logging is set up in this project (see project settings in env_glob.txt, env_R.R, and env_logger.R for details).
Both [base::.GlobalEnv] and [base::Sys.getenv] are checked, and can be provided as a character scalar or as an object reference
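A hedged sketch of the resolution order, assuming a session variable DB_NAME may or may not be set (the fallback string is illustrative):

# Returns DB_NAME from the environment if set, otherwise "test.sqlite"
db_name <- rectify_null_from_env(NULL, "DB_NAME", "test.sqlite")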
ref_table_from_map | R Documentation |
Get the name of a linked normalization table
Description
Extract the name of a normalization table from the database given a table and column reference.
Usage
ref_table_from_map("table1", "fk_column1", er_map(con), "references")
Arguments
table_name
|
CHR scalar name of the database table |
table_column
|
CHR scalar name of the foreign key table column |
this_map
|
LIST object containing the schema representation from ‘er_map’ (default: an object named “db_map” created as part of the package spin up) |
fk_refs_in
|
CHR scalar name of the item in ‘this_map’ containing the SQL “REFERENCES” statements extracted from the schema |
Value
CHR scalar name of the table to which a FK column is linked or an empty character string if no match is located (i.e. ‘table_column’ is not a defined foreign key).
Note
This requires an object of the same shape and properties as those resulting from [er_map] as ‘this_map’.
remove_db | R Documentation |
Remove an existing database
Description
This is limited to only the current working directory and its subdirectories. If you wish to retain a copy of the prior database, ensure argument ‘archive = TRUE’ (note the default is FALSE) to create a copy of the requested database prior to rebuild; this is created in the same directory as the found database and appends a timestamp to the archived file name.
Usage
remove_db("test.sqlite", archive = TRUE)
Arguments
db
|
CHR scalar name of the database to build (default: session value DB_NAME) |
archive
|
LGL scalar of whether to create an archive of the current database (if it exists) matching the name supplied in argument ‘db’ (default: FALSE) |
Value
None, check console for details
remove_icon_from | R Documentation |
Remove the last icon attached to an HTML element
Description
Remove the last icon attached to an HTML element
Usage
remove_icon_from(id)
Arguments
id
|
CHR scalar of the HTML ID from which to remove the last icon |
Value
CHR scalar suitable to execute with ‘shinyjs::runJS’
Examples
append_icon_to("example", "r-project", "fa-3x") remove_icon_from("example")
remove_sample | R Documentation |
Delete a sample
Description
Removes a sample from the database and associated records in ms_methods, conversion_software_settings, and conversion_software_linkage. Associated peak and mass spectrometric signals will also be removed.
Usage
remove_sample(sample_ids, db_conn = con, log_ns = "db")
Arguments
sample_ids
|
INT vector of IDs to remove from the samples table. |
db_conn
|
connection object (default: con) |
log_ns
|
CHR scalar of the logging namespace to use (default: “db”) |
Value
None, executes actions on the database
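A brief sketch; the sample IDs are illustrative, and removal cascades to associated peaks and mass spectrometric data as described above:

remove_sample(sample_ids = c(1, 2), db_conn = con)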
repair_xl_casrn_forced_to_date | R Documentation |
Repair CAS RNs forced to a date numeric by MSXL
Description
If a file is opened in Microsoft Excel(R), Chemical Abstract Service (CAS) Registry Numbers (RNs) can occasionally be read as a pseudodate (e.g. “1903-02-8”). Without tight controls over column formatting, this can result in CAS RNs that are not real entering a processing pipeline. This convenience function attempts to undo that automatic formatting: vector members that coerce cleanly to numeric are treated as date serial numbers and converted to a properly formatted date, with an origin depending on the operating system platform (as read by ‘.Platform$OS.type’); Windows operating systems use the Windows MSXL origin date of “1899-12-30” while others use “1904-01-01”. Text entries of “NA” are coerced to NA.
Usage
repair_xl_casrn_forced_to_date(casrn_vec, output_format = "%Y-%m-%d")
Arguments
casrn_vec
|
CHR or NUM vector of what should be valid CAS RNs |
output_format
|
CHR scalar of the output format, which must be a valid date format string (default: “%Y-%m-%d”) |
Value
CHR vector of length equal to that of ‘casrn_vec’ where numeric entries have been coerced to the assumed date
Examples
repair_xl_casrn_forced_to_date(c("64324-08-3", "12332"))
repl_nan | R Documentation |
Replace NaN
Description
Replace all NaN values with a specified value
Usage
repl_nan(x, repl = NULL)
Arguments
x
|
vector of values |
repl
|
value to replace NaN contained in ‘x’ |
Value
vector with all NaN replaced with ‘repl’
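A minimal example:

repl_nan(c(1, NaN, 3), repl = 0)
# [1] 1 0 3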
report_qc | R Documentation |
Export QC result JSONfile into PDF
Description
Export QC result JSONfile into PDF
Usage
report_qc( jsonfile = file.choose(), outputfile = gsub(".json", ".pdf", jsonfile, ignore.case = TRUE) )
Arguments
jsonfile
|
jsonfile file path |
outputfile
|
output pdf file path |
Value
None, generates a reporting PDF at the path given by ‘outputfile’
reset_logger_settings | R Documentation |
Update logger settings
Description
This is a simple action wrapper to update any settings that may have been changed with regard to logger. If, for instance, something is not logging the way you expect it to, change the relevant setting and then run reset_logger_settings() to reflect the current environment.
Usage
reset_logger_settings()
Arguments
reload
|
LGL scalar indicating (if TRUE) whether or not to refresh settings from the current session environment |
Value
None
resolve_compound_aliases | R Documentation |
Resolve compound aliases provided as part of the import routine
Description
Call this to add any aliases for a given ‘compound_id’ that may not be present in the database. Only those identifiable as part of the accepted types defined in ‘norm_alias_table’ will be mapped. If multiple items are provided in the import elements NAME, ADDITIONAL, or other elements matching names in the ‘norm_alias_table’ name column, indicate the split character in ‘split_multiples_by’ and any separator between names and values (e.g. CLASS:example) in ‘identify_property_by’.
Usage
resolve_compound_aliases( obj, compound_id, compounds_in = "compounddata", compound_alias_table = "compound_aliases", norm_alias_table = "norm_analyte_alias_references", norm_alias_name_column = "name", headers_to_examine = c("ADDITIONAL", "NAME"), split_multiples_by = ";", identify_property_by = ":", out_file = "unknown_compound_aliases.csv", db_conn = con, log_ns = "db", ... )
Arguments
obj
|
LIST object containing data formatted from the import generator |
compound_id
|
INT scalar of the compound_id to use for these aliases |
compounds_in
|
CHR scalar name in ‘obj’ holding compound data (default: “compounddata”) |
norm_alias_table
|
CHR scalar name of the table normalizing analyte alias references (default: “norm_analyte_alias_references”) |
norm_alias_name_column
|
CHR scalar name of the column in ‘norm_alias_table’ containing the human-readable expression of alias type classes (default: “name”) |
…
|
Named list of any additional aliases to tack on that are not found in the import object, with names matching those found in ‘norm_alias_table’.’norm_alias_name_column’ |
Value
None, though if unclassifiable aliases (those with alias types not present in the normalization table) are found, they will be written to a file (‘out_file’) in the project directory
Note
Existing aliases, and aliases for which there is no ‘compound_id’ will be ignored and not imported.
Compound IDs provided in ‘compound_id’ must be present in the compounds table and must be provided explicitly on a 1:1 basis for each element extracted from ‘obj’. If you provide an import object with 10 components for compound data, you must provide tying ‘compound_id’ identifiers for each. If all extracted components represent aliases for the same ‘compound_id’ then one may be provided.
Alias types (e.g. “InChI”) are case insensitive.
resolve_compound_fragments | R Documentation |
Link together peaks, fragments, and compounds
Description
This function links together the peaks, annotated_fragments, and compounds tables. It serves as the main connection table conceptually tying together peaks, the fragments annotated within those peaks, and the compound identification associated with the peaks. The database supports flexible assignment wherein compounds may be related to either peaks or annotated fragments, or both, and vice versa. At least two IDs are required for linkage; i.e. compounds may not have an associated peak in the database, but may be known to produce fragments at a particular m/z value. Ideally, all three are provided to enable traceback from compounds to a complete list of their annotated fragments, and association with a peak object whose data contain unannotated fragments, which can be traced back to the sample from which it was drawn and the associated metrological method information.
Usage
resolve_compound_fragments( values = NULL, peak_id = NA, annotated_fragment_id = NA, compound_id = NA, linkage_table = "compound_fragments", peaks_table = "peaks", annotated_fragments_table = "annotated_fragments", compounds_table = "compounds", db_conn = con, log_ns = "db" )
Arguments
values
|
LIST item containing items for ‘peak_id’, ‘annotated_fragment_id’, and ‘compound_id’ (default: NULL); used preferentially if provided |
peak_id
|
INT vector (ideally of length 1) of the peak ID(s) to link; ignored if ‘values’ is provided (default: NA) |
annotated_fragment_id
|
INT vector of fragment ID(s) to link; ignored if ‘values’ is provided (default: NA) |
compound_id
|
INT vector of compound ID(s) to link; ignored if ‘values’ is provided (default: NA) |
linkage_table
|
CHR scalar name of the database table containing linkages between peaks, fragments, and compounds (default: “compound_fragments”) |
peaks_table
|
CHR scalar name of the database table containing peaks for look up (default: “peaks”) |
compounds_table
|
CHR scalar name of the table holding compound information |
db_conn
|
connection object (default: con) |
log_ns
|
CHR scalar of the logging namespace to use (default: “db”) |
fragments_table
|
CHR scalar name of the table holding annotated fragment information |
Value
None; checks entries and executes database actions
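A hedged sketch linking one peak, one annotated fragment, and one compound by ID (all IDs here are illustrative and must exist in the connected database):

resolve_compound_fragments(
  peak_id = 1,
  annotated_fragment_id = 10,
  compound_id = 1,
  db_conn = con
)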
resolve_compounds | R Documentation |
Resolve the compounds node during bulk import
Description
Call this function as part of an import routine to resolve the compounds node.
Usage
resolve_compounds( obj, compounds_in = "compounddata", compounds_table = "compounds", compound_category = NULL, compound_category_table = "compound_categories", compound_alias_table = "compound_aliases", norm_alias_table = "norm_analyte_alias_references", norm_alias_name_column = "name", NIST_id_in = "id", require_all = FALSE, import_map = IMPORT_MAP, ensure_unique = TRUE, db_conn = con, log_ns = "db" )
Arguments
obj
|
LIST object containing data formatted from the import generator |
compounds_in
|
CHR scalar name in ‘obj’ holding compound data (default: “compounddata”) |
compounds_table
|
CHR scalar name the database table holding compound data (default: “compounds”) |
compound_category
|
CHR or INT scalar of the compound category (either a direct ID or a matching category label in ‘compound_category_table’) (default: NULL) |
compound_category_table
|
CHR scalar name the database table holding normalized compound categories (default: “compound_categories”) |
norm_alias_table
|
CHR scalar name of the table normalizing analyte alias references (default: “norm_analyte_alias_references”) |
norm_alias_name_column
|
CHR scalar name of the column in ‘norm_alias_table’ containing the human-readable expression of alias type classes (default: “name”) |
require_all
|
LGL scalar of whether to require all columns (except the assumed primary key column of “id”) or only those defined as “NOT NULL” (default: TRUE requires the presence of all columns in the table) |
import_map
|
data.frame object of the import map (e.g. from a CSV) |
ensure_unique
|
LGL scalar of whether or not to first check that the values provided form a new unique record (default: TRUE) |
db_conn
|
connection object (default: con) |
log_ns
|
CHR scalar of the logging namespace to use (default: “db”) |
Value
INT scalar if successful, result of the call to [add_or_get_id] otherwise
Note
This function is called as part of [full_import()]
resolve_description_NTAMRT | R Documentation |
Resolve the method description tables during import
Description
Two tables (and their associated normalization tables) exist in the database to store additional information about mass spectrometric and chromatographic methods. These tables are “ms_descriptions” and “chromatography_descriptions” and cannot be easily mapped directly. This function serves to coerce values supplied during import into the form required by the database. Primarily, the issue rests in the need to support multiple descriptions of analytical instrumentation (e.g. multiple mass analyzer types, multiple vendors, multiple separation columns, etc.). Tables targeted by this function are “long” tables that may well have ‘n’ records for each mass spectrometric method.
Usage
resolve_description_NTAMRT( obj, method_id, type = c("massspec", "chromatography"), mass_spec_in = "massspectrometry", chrom_spec_in = "chromatography", db_conn = con, fuzzy = TRUE, log_ns = "db" )
Arguments
obj
|
LIST object containing data formatted from the import generator |
method_id
|
INT scalar of the ms_method.id record to associate |
type
|
CHR scalar, one of “massspec” or “chromatography” depending on the type of description to add; much of the logic is shared, only details differ |
mass_spec_in
|
CHR scalar name of the element in ‘obj’ holding mass spectrometry information (default: “massspectrometry”) |
chrom_spec_in
|
CHR scalar name of the element in ‘obj’ holding chromatographic information (default: “chromatography”) |
log_ns
|
CHR scalar of the logging namespace to use during execution (default: “db”) |
Value
None, executes actions on the database
Note
This function is called as part of [full_import()]
This function is brittle; built specifically for the NIST NTA MRT import format. If using a different import format, customize to your needs using this function as a guide.
resolve_fragments_NTAMRT | R Documentation |
Resolve the fragments node during database import
Description
Call this function as part of an import routine to resolve the fragments node including fragment inspections and aliases. If the python connection to RDKit is available and no aliases are provided, aliases as defined in ‘rdkit_aliases’ will be generated and stored if ‘generate_missing_aliases’ is set to TRUE. Components of the import file will be collated, have their values normalized, and any new fragment identifiers will be added to the database.
Usage
resolve_fragments_NTAMRT( obj, sample_id = NULL, generation_type = NULL, fragments_in = "annotation", fragments_table = "annotated_fragments", fragments_norm_table = ref_table_from_map(fragments_table, "fragment_id"), fragments_sources_table = "fragment_sources", citation_info_in = "fragment_citation", inspection_info_in = "fragment_inspections", inspection_table = "fragment_inspections", generate_missing_aliases = FALSE, fragment_aliases_in = "fragment_aliases", fragment_aliases_table = "fragment_aliases", alias_type_norm_table = ref_table_from_map(fragment_aliases_table, "alias_type"), inchi_prefix = "InChI=1S/", rdkit_name = ifelse(exists("PYENV_NAME"), PYENV_NAME, "rdkit"), rdkit_ref = ifelse(exists("PYENV_REF"), PYENV_REF, "rdk"), rdkit_ns = "rdk", rdkit_make_if_not = TRUE, rdkit_aliases = c("Inchi", "InchiKey"), mol_to_prefix = "MolTo", mol_from_prefix = "MolFrom", type = "smiles", import_map = IMPORT_MAP, case_sensitive = FALSE, fuzzy = FALSE, strip_na = TRUE, db_conn = con, log_ns = "db" )
Arguments
obj
|
LIST object containing data formatted from the import generator |
sample_id
|
INT scalar matching a sample ID to which to tie these fragments (optional, default: NULL) |
generation_type
|
CHR scalar containing the generation type as defined in the “norm_generation_type” table (default: NULL will obtain the generation type attached to the ‘sample_id’ by database lookup) |
fragments_in
|
CHR scalar name of the ‘obj’ component holding annotated fragment information (default: “annotation”) |
fragments_table
|
CHR scalar name of the database table holding annotated fragment information (default: “annotated_fragments”) |
fragments_norm_table
|
CHR scalar name of the database table holding normalized fragment identities (default: obtains this from the result of a call to [er_map] with the table name from ‘fragments_table’) |
fragments_sources_table
|
CHR scalar name of the database table holding fragment source (e.g. generation) information (default: “fragment_sources”) |
citation_info_in
|
CHR scalar name of the ‘obj’ component holding fragment citation information (default: “fragment_citation”) |
inspection_info_in
|
CHR scalar name of the ‘obj’ component holding fragment inspection information (default: “fragment_inspections”) |
inspection_table
|
CHR scalar name of the database table holding fragment inspection information (default: “fragment_inspections”) |
generate_missing_aliases
|
LGL scalar determining whether or not to generate machine readable expressions (e.g. InChI) for fragment aliases from RDKit (requires RDKit activation; default: FALSE); see formals list for [add_rdkit_aliases] |
fragment_aliases_in
|
CHR scalar name of the ‘obj’ component holding fragment aliases (default: “fragment_aliases”) |
fragment_aliases_table
|
CHR scalar name of the database table holding fragment aliases (default: “fragment_aliases”) |
rdkit_ref
|
CHR scalar OR R object of an RDKit binding (default NULL goes to “rdk” for convenience with other pipelines in this project) |
mol_to_prefix
|
CHR scalar of the prefix to identify an RDKit function to create an alias from a mol object (default: “MolTo”) |
mol_from_prefix
|
CHR scalar of the prefix to identify an RDKit function to create a mol object from ‘identifiers’ (default: “MolFrom”) |
type
|
CHR scalar of the type of encoding to use for ‘identifiers’ (default: smiles) |
import_map
|
data.frame object of the import map (e.g. from a CSV) |
case_sensitive
|
LGL scalar of whether to match on a case sensitive basis (the default TRUE searches for values as-provided) or whether to coerce value matches by upper, lower, sentence, and title case matches |
fuzzy
|
LGL scalar of whether to do a “fuzzy” match in the sense that values provided are wrapped in an SQL “LIKE ‘%value%’” expression, which also bypasses the ‘case_sensitive’ setting if TRUE (default: FALSE). |
db_conn
|
connection object (default: con) |
log_ns
|
CHR scalar of the logging namespace to use (default: “db”) |
fragment_alias_type_norm_table
|
CHR scalar name of the database table holding normalized fragment alias type identities (default: obtains this from the result of a call to [er_map] with the table name from ‘fragment_aliases_table’) |
Details
Fragments missing structure annotation are supported (e.g. those with a formula but no SMILES notation provided).
For new fragments, the calculated molecular mass is generated by [calculate.monoisotope] from exact masses of each constituent atom. If RDKit is available and a SMILES notation is provided, the formal molecular net charge is also calculated via rdkit.Chem.GetFormalCharge.
Database tables affected by resolving the fragments node include: annotated_fragments, norm_fragments, fragment_inspections, fragment_aliases, and fragment_sources.
Value
INT vector of resolved annotated fragment IDs; executes database actions
Note
This function is called as part of [full_import()]
If components named in ‘citation_info_in’ and ‘inspection_info_in’ do not exist, that information will not be appended to the resulting database records.
Typical usage as part of the import workflow involves simply passing the import object and associated sample id: resolve_fragments_NTAMRT(obj = import_object, sample_id = 1), though wrapper functions like [full_import] also contain name-matched arguments to be passed in a [do.call] context.
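For illustration, a minimal sketch of that typical call; ‘import_object’ is a hypothetical name standing in for a list parsed from an NTA MRT JSON export:
# Hypothetical import object from the NTA MRT import generator
fragment_ids <- resolve_fragments_NTAMRT(obj = import_object, sample_id = 1)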
resolve_method | R Documentation |
Add an ms_method record via import
Description
Part of the data import routine. Adds a record to the “ms_methods” table with the values provided in the JSON import template. Makes extensive use of [resolve_normalization_value] to parse foreign key relationships.
Usage
resolve_method( obj, method_in = "massspectrometry", ms_methods_table = "ms_methods", db_conn = con, ensure_unique = TRUE, log_ns = "db", qc_method_in = "qcmethod", qc_search_text = "QC Method Used", qc_value_in = "value", require_all = TRUE, import_map = IMPORT_MAP, ... )
Arguments
obj
|
LIST object containing data formatted from the import generator |
method_in
|
CHR scalar name of the ‘obj’ list containing method information |
ms_methods_table
|
CHR scalar name of the database table containing method information |
db_conn
|
connection object (default: con) |
ensure_unique
|
LGL scalar of whether or not to first check that the values provided form a new unique record (default: TRUE) |
log_ns
|
CHR scalar of the logging namespace to use (default: “db”) |
qc_method_in
|
CHR scalar name of the import object element containing QC method information (default: “qcmethod”) |
qc_search_text
|
CHR scalar name of an element in the import object in part ‘qc_method_in’ identifying whether or not a QC method was used (default: “QC Method Used”) |
qc_value_in
|
CHR scalar name of an element in the import object corresponding to ‘qc_method_in’ where the value of the metric named for ‘qc_search_text’ is located (default: “value”) |
require_all
|
LGL scalar of whether to require all columns (except the assumed primary key column of “id”) or only those defined as “NOT NULL” (default: TRUE requires the presence of all columns in the table) |
import_map
|
data.frame object of the import map (e.g. from a CSV) |
…
|
Other named elements to be appended to “ms_methods” as necessary for workflow resolution, can be used to pass defaults or additional values. |
Value
INT scalar if successful, result of the call to [add_or_get_id] otherwise
Note
This function is called as part of [full_import()]
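A minimal sketch of a direct call, assuming ‘import_object’ (hypothetical) holds the parsed JSON import template and ‘con’ is an open connection:
# Returns the id of the new or matching ms_methods record
method_id <- resolve_method(obj = import_object, db_conn = con)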
resolve_mobile_phase_NTAMRT | R Documentation |
Resolve the mobile phase node
Description
The database node containing chromatographic method information is able to handle any number of descriptive aspects regarding chromatography. It houses normalized and aliased data in a manner that maximizes flexibility, allowing any number of carrier agents (e.g. gasses for GC, solvents for LC) to be described in increasing detail. To accommodate that, the structure itself may be unintuitive and may not map well as records may be heavily nested.
Usage
resolve_mobile_phase_NTAMRT(
  obj,
  method_id,
  sample_id,
  peak_id,
  carrier_mix_names = NULL,
  id_mix_by = "^mp*[0-9]+",
  ms_methods_table = "ms_methods",
  sample_table = "samples",
  peak_table = "peaks",
  db_conn = con,
  mix_collection_table = "carrier_mix_collections",
  mobile_phase_props = list(
    in_item = "chromatography",
    db_table = "mobile_phases",
    props = c(flow = "flow", flow_units = "flowunits", duration = "duration", duration_units = "durationunits")
  ),
  carrier_props = list(
    db_table = "carrier_mixes",
    norm_by = "norm_carriers",
    alias_in = "carrier_aliases",
    props = c(id_by = "solvent", fraction_by = "fraction")
  ),
  additive_props = list(
    db_table = "carrier_additives",
    norm_by = "norm_additives",
    alias_in = "additive_aliases",
    props = c(id_by = "add$", amount_by = "_amount", units_by = "_units")
  ),
  exclude_values = c("none", "", NA),
  fuzzy = TRUE,
  clean_up = TRUE,
  log_ns = "db"
)
Arguments
obj
|
LIST object containing data formatted from the import generator |
method_id
|
INT scalar of the method id (e.g. from the import workflow) |
sample_id
|
INT scalar of the sample id (e.g. from the import workflow) |
peak_id
|
INT scalar of the peak id (e.g. from the import workflow) |
carrier_mix_names
|
CHR vector (optional) of carrier mix collection names to assign, the length of which should equal 1 or the length of discrete carrier mixtures; the default, NULL, will automatically assign names as a function of the method and sample id. |
id_mix_by
|
CHR scalar regex to identify the elements of ‘obj’ to use for the mobile phase node grouping of carrier mix collections (default: “^mp*[0-9]+”); this is the main piece of connectivity pulling together the descriptions and should only be changed to match different import naming schemes |
ms_methods_table
|
CHR scalar name of the methods table (default: “ms_methods”) |
sample_table
|
CHR scalar name of the samples table (default: “samples”) |
peak_table
|
CHR scalar name of the peaks table (default: “peaks”) |
db_conn
|
existing connection object (e.g. of class “SQLiteConnection”) |
mix_collection_table
|
CHR scalar name of the mix collections table (default: “carrier_mix_collections”) |
mobile_phase_props
|
LIST object describing how to import the mobile phase table containing: in_item: CHR scalar name of the ‘obj’ component containing chromatographic information (default: “chromatography”); db_table: CHR scalar name of the mobile phases table (default: “mobile_phases”); props: named CHR vector of name mappings with names equal to database columns in ‘mobile_phase_props$db_table’ and values matching regex to match names in ‘obj[[mobile_phase_props$in_item]]’ |
carrier_props
|
LIST object describing how to import the carrier mixes table containing: db_table: CHR scalar name of the carrier mixes table (default: “carrier_mixes”); norm_by: CHR scalar name of the table used to normalize carriers (default: “norm_carriers”); alias_in: CHR scalar name of the table containing carrier aliases to search (default: “carrier_aliases”); props: named CHR vector of name mappings with names equal to database columns in ‘carrier_props$db_table’ and values matching regex to match names in ‘obj[[mobile_phase_props$in_item]]’, plus an extra element named ‘id_by’ containing regex used to match names in the import object that indicate a carrier (e.g. “solvent”) |
additive_props
|
LIST object describing how to import the carrier additives table containing: db_table: CHR scalar name of the carrier additives table (default: “carrier_additives”); norm_by: CHR scalar name of the table used to normalize additives (default: “norm_additives”); alias_in: CHR scalar name of the table containing additive aliases to search (default: “additive_aliases”); props: named CHR vector of name mappings with names equal to database columns in ‘additive_props$db_table’ and values matching regex to match names in ‘obj[[mobile_phase_props$in_item]]’, plus an extra element named ‘id_by’ containing regex used to match names in the import object that indicate an additive (e.g. names terminating in “add”) |
exclude_values
|
CHR vector indicating which values to ignore in ‘obj’ (default: c(“none”, “”, NA)) |
fuzzy
|
LGL scalar of whether to do a “fuzzy” match in the sense that values provided are wrapped in an SQL LIKE clause bookended with wildcards; overrides the ‘case_sensitive’ setting if TRUE (default: TRUE) |
clean_up
|
LGL scalar determining whether or not to clean up the ‘mix_collection_table’ by removing just-added records if there are errors adding to ‘carrier_props$db_table’ (default: TRUE) |
log_ns
|
CHR scalar of the logging namespace to use during execution (default: “db”) |
Details
The mobile phase node contains one record in table “mobile_phases” for each method id, sample id, and carrier mix collection id with its associated flow rate, normalized flow units, duration, and normalized duration units. Each carrier mix collection has a name and child tables containing: records for each value-normalized carrier component and its unit fraction (e.g. in carrier_mixes: Helium, 1 would indicate pure helium as a carrier gas in GC work, while Water, 0.9 and Methanol, 0.1 would indicate a solvent mixture of 10% methanol in water), as well as value-normalized carrier additives, their amount, and the units for that amount (mostly for LC work; e.g. in carrier_additives: ammonium acetate, 5, mMol to indicate a 5 mMol ammonium acetate additive to a solvent); these are linked through the carrier mix collection id.
Call this function to import the results of the NIST Non-Targeted Analysis Method Reporting Tool (NTA MRT), or feed it as ‘obj’ a flat list containing chromatography information.
Value
None, executes actions on the database
Note
This is a brittle function, and should only be used as part of the NTA MRT import process, or as a template for how to import data.
Some arguments are complicated by design to keep conceptual information together. These should be fed a structured list matching expectations. This applies to ‘mobile_phase_props’, ‘carrier_props’, and ‘additive_props’. See defaults in documentation for examples.
Database insertions are done in real time, so failures may result in hanging or orphaned records. Turn on ‘clean_up’ to roll back by removing entries from ‘mix_collection_table’ and relying on delete cascades built into the database. Additional names are provided here to match the schema.
This function is called as part of [full_import()]
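As an illustration only, a sketch of a direct call where ‘import_object’ and the id values are hypothetical stand-ins for objects produced earlier in the import workflow:
resolve_mobile_phase_NTAMRT(
  obj = import_object,
  method_id = method_id,
  sample_id = sample_id,
  peak_id = peak_id
)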
resolve_ms_data | R Documentation |
Resolve and store mass spectral data during import
Description
Use peak IDs generated by the import workflow to assign and store mass spectral data (if coming from the NIST NTA Method Reporting Tool, these will all be in the “separated” format). Optionally also calls [resolve_ms_spectra] if unpack_spectra = TRUE. Mass spectral data are stored in either one (“zipped”) or two (“separated”) columns.
Usage
resolve_ms_data( obj, peak_id = NULL, peaks_table = "peaks", ms_data_in = "msdata", ms_data_table = "ms_data", unpack_spectra = FALSE, ms_spectra_table = "ms_spectra", unpack_format = c("separated", "zipped"), as_object = FALSE, import_map = IMPORT_MAP, db_conn = con, log_ns = "db" )
Arguments
obj
|
LIST object containing data formatted from the import generator |
peak_id
|
INT scalar of the peak ID in question, which must be present |
peaks_table
|
CHR scalar name of the peaks table in the database (default: “peaks”) |
ms_data_in
|
CHR scalar of the named component of ‘obj’ holding mass spectral data (default: “msdata”) |
ms_data_table
|
CHR scalar name of the table holding packed spectra in the database (default: “ms_data”) |
unpack_spectra
|
LGL scalar indicating whether or not to unpack spectral data to a long format (i.e. each mass and intensity pair becomes a single record) in the table defined by ‘ms_spectra_table’ (default: FALSE) |
ms_spectra_table
|
CHR scalar name of the table holding long form spectra in the database (default: “ms_spectra”) |
unpack_format
|
CHR scalar of the type of data packing for the spectra, one of “separated” (default) or “zipped” |
as_object
|
LGL scalar indicating whether or not to return the result to the session as an object (TRUE) or to add it to the database (default: FALSE) |
import_map
|
data.frame object of the import map (e.g. from a CSV) |
db_conn
|
connection object (default: con) |
log_ns
|
CHR scalar of the logging namespace to use (default: “db”) |
Value
If ‘as_object’ == TRUE, a data.frame object containing either packed (if ‘unpack_spectra’ == FALSE) or unpacked (if ‘unpack_spectra’ == TRUE) spectra, otherwise adds spectra to the database
Note
This function is called as part of [full_import()] during the call to [resolve_peaks]
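A sketch of a standalone call, assuming a peak record already exists (‘import_object’ and the peak id are hypothetical):
# Store spectra for peak 1 and also unpack them to long format
resolve_ms_data(obj = import_object, peak_id = 1, unpack_spectra = TRUE)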
resolve_ms_spectra | R Documentation |
Unpack mass spectral data in compressed format
Description
For some spectra, searching in a long form is much more performant. Use this function to unpack data already present in the ‘ms_data’ table into the ‘ms_spectra’ table. Data should be packed in one of two ways, either two columns for mass-to-charge ratio and intensity (“separated” - see [ms_spectra_separated]) or in a single column with interleaved data (“zipped” - see [ms_spectra_zipped]).
Usage
resolve_ms_spectra( peak_id, spectra_data = NULL, peaks_table = "peaks", ms_data_table = "ms_data", ms_spectra_table = "ms_spectra", unpack_format = c("separated", "zipped"), as_object = FALSE, db_conn = con, log_ns = "db" )
Arguments
peak_id
|
INT scalar of the peak ID in question, which must be present |
spectra_data
|
data.frame object containing spectral data |
peaks_table
|
CHR scalar name of the peaks table in the database (default: “peaks”) |
ms_data_table
|
CHR scalar name of the table holding packed spectra in the database (default: “ms_data”) |
ms_spectra_table
|
CHR scalar name of the table holding long form spectra in the database (default: “ms_spectra”) |
unpack_format
|
CHR scalar of the type of data packing for the spectra, one of “separated” (default) or “zipped” |
as_object
|
LGL scalar of whether to return the unpacked spectra to the session (TRUE) or to insert them into the database (default: FALSE) |
db_conn
|
database connection object (default: con) |
log_ns
|
CHR scalar name of the logging namespace to use |
Value
If ‘as_object’ == TRUE, a data.frame of unpacked spectra, otherwise no return and a database insertion will be performed
Note
This function may be slow, especially with peaks containing a large number of scans or a large amount of data
References
ms_spectra_separated
ms_spectra_zipped
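A sketch of unpacking previously stored data for a single peak (the peak id here is hypothetical):
# Return the unpacked spectra to the session instead of inserting them
spectra <- resolve_ms_spectra(peak_id = 1, as_object = TRUE)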
resolve_multiple_values | R Documentation |
Utility function to resolve multiple choices interactively
Description
This function is generally not called directly, but rather as a workflow component from within [resolve_normalization_value] during interactive sessions to get feedback from users during the normalization value resolution process.
Usage
resolve_multiple_values(values, search_value, as_regex = FALSE, db_table = "")
Arguments
values
|
CHR vector of possible values |
search_value
|
CHR scalar of the value to search |
as_regex
|
LGL scalar of whether to treat ‘search_value’ as a regular expression string (TRUE) or to use it directly (FALSE, default) |
db_table
|
CHR scalar name of the database table to search, used for printing log messages only (default: “”) |
Value
CHR scalar result of the user’s choice
resolve_normalization_value | R Documentation |
Resolve a normalization value against the database
Description
Normalized SQL databases often need to resolve primary keys. This function checks for a given value in a given table and either returns the matching index value or, if a value is not found and ‘interactive()’ is TRUE, it will add that value to the table and return the new index value. It will look for the first matching value in all columns of the requested table to support loose finding of identifiers and is meant to operate only on normalization tables (i.e. lookup tables).
Usage
resolve_normalization_value( this_value, db_table, id_column = "id", case_sensitive = FALSE, fuzzy = FALSE, db_conn = con, log_ns = "db", ... )
Arguments
this_value
|
CHR (or coercible to) scalar value to look up |
db_table
|
CHR scalar of the database table to search |
case_sensitive
|
LGL scalar of whether to match on a case sensitive basis; TRUE searches for values as-provided, while the default FALSE also coerces value matches by upper, lower, sentence, and title case matches |
fuzzy
|
LGL scalar of whether to do a “fuzzy” match in the sense that values provided are wrapped in an SQL LIKE clause bookended with wildcards; overrides the ‘case_sensitive’ setting if TRUE (default: FALSE) |
db_conn
|
connection object (default: con) |
log_ns
|
CHR scalar of the logging namespace to use during execution (default: “db”) |
…
|
other values to add to the normalization table, where names must match the table schema |
Details
The search itself is done using [check_for_value].
Value
The database primary key (typically INT) of the normalized value
Note
This is mostly a DRY convenience function to avoid having to write the lookup and add logic each time.
Interactive sessions are required to add new values
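A sketch of a lookup against a normalization table; the table name “norm_vendors” and the value searched are hypothetical:
# Returns the primary key of the matching (or interactively added) value
vendor_id <- resolve_normalization_value("Waters", db_table = "norm_vendors")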
resolve_peak_ums_params | R Documentation |
Resolve and import optimal uncertain mass spectrum parameters
Description
This imports the defined object component containing parameters for the optimized uncertainty mass spectrum used to compare with new data. This function may be called at any time to add data for a given peak, but because there is no row-uniqueness restriction on the underlying table it is best used in a “one pass” manner during the import routine. These parameters are calculated as part of NIST QA procedures and are added to the output of the NTA MRT after those JSONs have been created.
Usage
resolve_peak_ums_params( obj, peak_id, ums_params_in = "opt_ums_params", ums_params_table = "opt_ums_params", db_conn = con, log_ns = "db" )
Arguments
obj
|
LIST object containing data formatted from the import generator |
peak_id
|
INT scalar of the peak ID in question, which must be present (e.g. from the import workflow) |
ums_params_in
|
CHR scalar name of the item in ‘obj’ containing optimized uncertainty mass spectrum parameters |
ums_params_table
|
CHR scalar name of the database table holding optimized uncertainty mass spectrum parameters |
log_ns
|
CHR scalar of the logging namespace to use during execution (default: “db”) |
Value
Nothing if successful, a data frame object of the extracted parameters otherwise.
Note
This function is called as part of [resolve_peaks()]
resolve_peaks | R Documentation |
Resolve the peaks node during import
Description
Call this function to resolve and insert information for the “peaks” node in the database including software conversion settings (via [resolve_software_settings_NTAMRT]) and mass spectra data (via [resolve_ms_data] and, optionally, [resolve_ms_spectra]). This function relies on the import object being formatted appropriately.
Usage
resolve_peaks(
  obj,
  sample_id,
  peaks_table = "peaks",
  software_timestamp = NULL,
  software_settings_in = "msconvertsettings",
  ms_data_in = "msdata",
  ms_data_table = "ms_data",
  unpack_spectra = FALSE,
  unpack_format = c("separated", "zipped"),
  ms_spectra_table = "ms_spectra",
  linkage_table = "conversion_software_peaks_linkage",
  settings_table = "conversion_software_settings",
  as_date_format = "%Y-%m-%d %H:%M:%S",
  format_checks = c("ymd_HMS", "ydm_HMS", "mdy_HMS", "dmy_HMS"),
  min_datetime = "2000-01-01 00:00:00",
  import_map = IMPORT_MAP,
  ums_params_in = "opt_ums_params",
  ums_params_table = "opt_ums_params",
  db_conn = con,
  log_ns = "db"
)
Arguments
obj
|
LIST object containing data formatted from the import generator |
sample_id
|
INT scalar of the sample id (e.g. from the import workflow) |
peaks_table
|
CHR scalar name of the peaks table in the database (default: “peaks”) |
ms_data_in
|
CHR scalar of the named component of ‘obj’ holding mass spectral data (default: “msdata”) |
ms_data_table
|
CHR scalar name of the table holding packed spectra in the database (default: “ms_data”) |
unpack_spectra
|
LGL scalar indicating whether or not to unpack spectral data to a long format (i.e. each mass and intensity pair becomes a single record) in the table defined by ‘ms_spectra_table’ (default: FALSE) |
unpack_format
|
CHR scalar of the type of data packing for the spectra, one of “separated” (default) or “zipped” |
ms_spectra_table
|
CHR scalar name of the table holding long form spectra in the database (default: “ms_spectra”) |
import_map
|
data.frame object of the import map (e.g. from a CSV) |
ums_params_in
|
CHR scalar name of the item in ‘obj’ containing optimized uncertainty mass spectrum parameters |
ums_params_table
|
CHR scalar name of the database table holding optimized uncertainty mass spectrum parameters |
db_conn
|
Connection object (default: con) |
log_ns
|
CHR scalar of the logging namespace to use (default: “db”) |
Value
INT scalar of the newly inserted or identified peak ID(s)
Note
This function is called as part of [full_import()]
This function relies on an import map
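A sketch of a typical call in the import workflow (‘import_object’ and ‘sample_id’ are hypothetical objects from earlier steps):
# Returns the id(s) of the inserted or identified peak record(s)
peak_id <- resolve_peaks(obj = import_object, sample_id = sample_id)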
resolve_qc_data_NTAMRT | R Documentation |
Resolve and import quality control data for import
Description
This imports the defined object component containing QC data (i.e. a nested list of multiple quality control checks) from the NIST Non-Targeted Analysis Method Reporting Tool (NTA MRT).
Usage
resolve_qc_data_NTAMRT( obj, peak_id, qc_data_in = "qc", qc_data_table = "qc_data", peaks_table = "peaks", ignore = FALSE, db_conn = con, log_ns = "db" )
Arguments
obj
|
LIST object containing data formatted from the import generator |
peak_id
|
INT vector of the peak ids (e.g. from the import workflow) |
qc_data_in
|
CHR scalar name of the component in ‘obj’ containing QC data (default: “qc”) |
qc_data_table
|
CHR scalar name of the database table holding QC data (default: “qc_data”) |
peaks_table
|
CHR scalar name of the database table holding peaks data (default: “peaks”) |
log_ns
|
CHR scalar of the logging namespace to use during execution (default: “db”) |
Value
None, executes actions on the database
Note
This function is called as part of [full_import()]
resolve_qc_methods_NTAMRT | R Documentation |
Resolve and import quality control method information
Description
This imports the defined object component containing QC method information (i.e. a data frame of multiple quality control checks) from the NIST Non-Targeted Analysis Method Reporting Tool (NTA MRT).
Usage
resolve_qc_methods_NTAMRT( obj, peak_id, qc_method_in = "qcmethod", qc_method_table = "qc_methods", qc_method_norm_table = "norm_qc_methods_name", qc_method_norm_reference = "norm_qc_methods_reference", qc_references_in = "source", peaks_table = "peaks", ignore = FALSE, db_conn = con, log_ns = "db" )
Arguments
obj
|
LIST object containing data formatted from the import generator |
peak_id
|
INT vector of the peak ids (e.g. from the import workflow) |
qc_method_in
|
CHR scalar of the name in ‘obj’ that contains QC method check information (default: “qcmethod”) |
qc_method_table
|
CHR scalar of the database table name holding QC method check information (default: “qc_methods”) |
qc_method_norm_table
|
CHR scalar name of the database table normalizing QC methods type (default: “norm_qc_methods_name”) |
qc_method_norm_reference
|
CHR scalar name of the database table normalizing QC methods reference type (default: “norm_qc_methods_reference”) |
qc_references_in
|
CHR scalar of the name in ‘obj[[qc_method_in]]’ that contains the reference or citation for the QC protocol (default: “source”) |
peaks_table
|
CHR scalar name of the database table holding peaks data (default: “peaks”) |
log_ns
|
CHR scalar of the logging namespace to use during execution (default: “db”) |
Value
None, executes actions on the database
Note
This function is called as part of [full_import()]
resolve_sample | R Documentation |
Add a sample via import
Description
Part of the data import routine. Adds a record to the “samples” table with the values provided in the JSON import template. Uses [verify_sample_class] and [verify_contributor] to parse foreign key relationships, [resolve_method] to add a record to ms_methods to get the proper id, and [resolve_software_settings_NTAMRT] to insert records into and get the proper conversion software linkage id from tables “conversion_software_settings” and “conversion_software_linkage” if appropriate.
Usage
resolve_sample( obj, db_conn = con, method_id = NULL, sample_in = "sample", sample_table = "samples", generation_type = NULL, generation_type_default = "empirical", generation_type_norm_table = "norm_generation_type", import_map = IMPORT_MAP, ensure_unique = TRUE, require_all = TRUE, fuzzy = FALSE, case_sensitive = TRUE, log_ns = "db", ... )
Arguments
obj
|
LIST object containing data formatted from the import generator |
db_conn
|
connection object (default: con) |
method_id
|
INT scalar of the associated ms_methods record id |
sample_in
|
CHR scalar of the import object name storing sample data (default: “sample”) |
sample_table
|
CHR scalar name of the database table holding sample information (default: “samples”) |
generation_type
|
CHR scalar of the type of data generated for this sample (e.g. “empirical” or “in silico”). The default (NULL) will assign based on ‘generation_type_default’; any other value will override the default value and be checked against values in ‘generation_type_norm_table’ |
generation_type_default
|
CHR scalar naming the default data generation type (default: “empirical”) |
generation_type_norm_table
|
CHR scalar name of the database table normalizing sample generation type (default: “norm_generation_type”) |
import_map
|
data.frame object of the import map (e.g. from a CSV) |
ensure_unique
|
LGL scalar of whether or not to first check that the values provided form a new unique record (default: TRUE) |
require_all
|
LGL scalar of whether to require all columns (except the assumed primary key column of “id”) or only those defined as “NOT NULL” (default: TRUE requires the presence of all columns in the table) |
fuzzy
|
LGL scalar of whether to do a “fuzzy” match in the sense that values provided are wrapped in an SQL LIKE clause bookended with wildcards; overrides the ‘case_sensitive’ setting if TRUE (default: FALSE) |
case_sensitive
|
LGL scalar of whether to match on a case sensitive basis (the default TRUE searches for values as-provided) or whether to coerce value matches by upper, lower, sentence, and title case matches |
log_ns
|
CHR scalar of the logging namespace to use (default: “db”) |
…
|
Other named elements to be appended to samples as necessary for workflow resolution, can be used to pass defaults or additional values. |
Value
INT scalar if successful, result of the call to [add_or_get_id] otherwise
Note
This function is called as part of [full_import()]
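A sketch continuing the import chain (‘import_object’ and ‘method_id’ are hypothetical objects produced by earlier steps such as [resolve_method]):
# Ties the new sample record to its ms_methods record
sample_id <- resolve_sample(obj = import_object, method_id = method_id)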
resolve_sample_aliases | R Documentation |
Resolve and import sample aliases
Description
Call this function to attach sample aliases to a sample record in the database. This can be done either through the import object with a name reference or directly by assigning additional values.
Usage
resolve_sample_aliases( sample_id, obj = NULL, aliases_in = NULL, values = NULL, db_table = "sample_aliases", db_conn = con, log_ns = "db" )
Arguments
sample_id
|
INT scalar of the sample id (e.g. from the import workflow) |
obj
|
(optional) LIST object containing data formatted from the import generator (default: NULL) |
aliases_in
|
(optional) CHR scalar of the name in ‘obj’ containing the sample aliases in list format (default: NULL) |
values
|
(optional) LIST containing the sample aliases with names as the alias name and values containing the reference (e.g. URI, link to a containing repository, or reference to the owner or project from which a sample is drawn) to that alias |
db_table
|
CHR scalar name of the database table containing sample aliases (default: “sample_aliases”) |
log_ns
|
CHR scalar of the logging namespace to use during execution (default: “db”) |
Value
None, executes actions on the database
Note
This function is called as part of [full_import()]
One of ‘values’ or both of ‘obj’ and ‘aliases_in’ must be provided to add new sample aliases.
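A sketch of the direct-assignment form using ‘values’; the alias name and reference URI are hypothetical:
resolve_sample_aliases(
  sample_id = 1,
  values = list(EPA_sample_123 = "https://example.org/samples/123")
)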
resolve_software_settings_NTAMRT | R Documentation |
Import software settings
Description
Part of the standard import pipeline, adding rows to the ‘conversion_software_settings’ table with a given sample id. Some argument names are shared with other import functions (specifically ‘obj’) but are formed differently to correctly resolve the node complexity.
Usage
resolve_software_settings_NTAMRT( obj, software_timestamp = NULL, db_conn = con, software_settings_in = "msconvertsettings", settings_table = "conversion_software_settings", linkage_table = "conversion_software_peaks_linkage", as_date_format = "%Y-%m-%d %H:%M:%S", format_checks = c("ymd_HMS", "ydm_HMS", "mdy_HMS", "dmy_HMS"), min_datetime = "2000-01-01 00:00:00", log_ns = "db" )
Arguments
obj
|
CHR vector describing settings or a named LIST with names matching column names in table conversion_software_settings. |
software_timestamp
|
CHR scalar of the sample timestamp (e.g. sample$starttime) to use for linking software conversion settings with peak data, with a call back to the originating sample. If NULL (the default), the current system timestamp in UTC will be used from [lubridate::now()]. |
db_conn
|
connection object (default: con) |
software_settings_in
|
CHR scalar name of the component in ‘obj’ containing software settings (default: “msconvertsettings”) |
settings_table
|
CHR scalar name of the database table containing the software settings used for an imported data file (default: “conversion_software_settings”) |
linkage_table
|
CHR scalar name of the database table containing the linkage between peaks and their software settings (default: “conversion_software_peaks_linkage”) |
as_date_format
|
CHR scalar of the format to use when storing timestamps that matches database column expectations (default: “%Y-%m-%d %H:%M:%S”) |
format_checks
|
CHR vector of the [lubridate::parse_date_time()] format checks to execute in order of priority; these must match a lubridate function of the same name (default: c(“ymd_HMS”, “ydm_HMS”, “mdy_HMS”, “dmy_HMS”)) |
min_datetime
|
CHR scalar of the minimum reasonable timestamp used as a sanity check (default: “2000-01-01 00:00:00”) |
log_ns
|
CHR scalar of the logging namespace to use (default: “db”) |
Value
NULL on errors, INT scalar of the inserted software linkage id if successful
Note
This function is called as part of [full_import()]
resolve_table_name | R Documentation |
Check presence of a database table
Description
This convenience function checks for the existence of one or more ‘db_table’ objects in a database.
Usage
resolve_table_name(db_table = "compounds", db_conn = "test_con")
Arguments
db_table
|
CHR vector of table names to check |
db_conn
|
connection object (default: con) |
log_ns
|
CHR scalar of the namespace (if any) to use for logging (default: “db”) |
Value
CHR vector of existing tables
save_data_dictionary | R Documentation |
Save the current data dictionary to disk
Description
Executes [data_dictionary()] and saves the output to a local file. If ‘output_format’ is one of “data.frame” or “list”, the resulting file will be saved as an RDS. Parameter ‘output_file’ will be used during the save process; relative paths will be identified by the current working directory.
Usage
save_data_dictionary( db_conn = con, output_format = "json", output_file = NULL, overwrite_existing = TRUE )
Arguments
db_conn
|
connection object (default: con) |
output_format
|
CHR scalar, one of (capitalization insensitive) “json”, “csv”, “data.frame”, or “list” (default “json”) |
output_file
|
CHR scalar indicating where to save the resulting file; an appropriate file name will be constructed if left NULL (default: NULL) |
overwrite_existing
|
LGL scalar indicating whether to overwrite an existing file whose name matches that determined from ‘output_file’ (default: TRUE); file names will be appended with “(x)” sequentially if this is FALSE and a file with matching name exists. |
Value
None, saves a file to the current working directory
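A sketch writing the dictionary as CSV without clobbering an existing file (argument values here are illustrative only):
save_data_dictionary(db_conn = con, output_format = "csv", overwrite_existing = FALSE)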
search_all | R Documentation |
Search all mass spectra within database against unknown mass spectrum
Description
Search all mass spectra within database against unknown mass spectrum
Usage
search_all( con, searchms, normfn = "sum", cormethod = "pearson", optimized_params = TRUE )
Arguments
con
|
SQLite database connection |
searchms
|
object generated from ‘create_search_ms’ function |
normfn
|
the normalization function, typically “mean” or “sum”, for normalizing the intensity values |
cormethod
|
the correlation method used for calculating the correlation, see ‘cor’ function for methods |
optimized_params
|
LGL scalar indicating whether or not to use the optimal search parameters stored in the database table ‘opt_ums_params’ |
Value
LIST of search results
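A sketch of a typical search; ‘ums’ is a hypothetical object built by [create_search_ms] from an unknown spectrum:
# Returns a LIST of match results against all database spectra
results <- search_all(con, searchms = ums)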
search_precursor | R Documentation |
Search the database for all compounds with matching precursor ion m/z values
Description
Search the database for all compounds with matching precursor ion m/z values
Usage
search_precursor( con, searchms, normfn = "sum", cormethod = "pearson", optimized_params = TRUE )
Arguments
con
|
SQLite database connection |
searchms
|
object generated from ‘create_search_ms’ function |
normfn
|
the normalization function, typically “mean” or “sum”, for normalizing the intensity values |
cormethod
|
the correlation method used for calculating the correlation, see ‘cor’ function for methods |
optimized_params
|
LGL scalar indicating whether or not to use the optimal search parameters stored in the database table ‘opt_ums_params’ |
Value
table of match statistics for the compound of interest
setup_rdkit | R Documentation |
Conveniently set up an RDKit python environment for use with R
Description
Conveniently set up an RDKit python environment for use with R
Usage
setup_rdkit(env_name = "nist_hrms_db", required_libraries = c("reticulate", "rdkit"), env_ref = "rdk")
Arguments
env_name
|
CHR scalar of the name of a python environment |
env_ref
|
CHR scalar name of an R expression bound to a python library OR the name of an existing R object that should be bound to RDKit (e.g. from [reticulate::import]) |
ns
|
CHR scalar of the logging namespace to use |
Value
None, though calls to utility functions will give their own returns
sigtest | R Documentation |
Significance testing function
Description
Internal function: enables significance testing between two values
Usage
sigtest(x1, x2, s1, s2, n1, n2, sig = 0.95)
Arguments
x1, x2
|
mean values to be compared |
s1, s2
|
standard deviations of the respective values |
n1, n2
|
number of observations of the respective values |
sig
|
significance level to test (0.95 = 95%) |
smilestoformula | R Documentation |
Convert SMILES string to Formula and other information
Description
The function converts SMILES strings into a data frame containing the molecular formula (FORMULA), fixed mass of the formula (FIXED MASS), and the net charge (NETCHARGE).
Usage
smilestoformula(SMILES)
Arguments
SMILES
|
vector of SMILES strings |
Value
data frame
Examples
smilestoformula(c("CCCC", "C(F)(F)F")) smilestoformula("CCCC")
sql_to_msp | R Documentation |
Export SQL Database to a MSP NIST MS Format
Description
Export SQL Database to a MSP NIST MS Format
Usage
sql_to_msp( con, optimized_params = TRUE, outputfile = paste0("DimSpecExport", Sys.Date(), ".msp"), cormethod = "pearson", normfn = "sum" )
Arguments
con
|
SQLite database connection |
optimized_params
|
Boolean TRUE indicates that the optimized parameters for uncertainty mass spectra will be used. |
outputfile
|
Text string file name and/or location to save MSP file format |
cormethod
|
Text string type of correlation function to use (DEFAULT = ‘pearson’) |
normfn
|
Text string type of normalization function to use (DEFAULT = ‘sum’) |
Value
None, saves a *.msp file to the local file system.
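A sketch of an export with an explicit file name (the name is illustrative):
sql_to_msp(con, outputfile = "dimspec_library.msp")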
sqlite_auto_trigger | R Documentation |
Create a basic SQL trigger for handling foreign key relationships
Description
This creates a simple trigger designed to streamline foreign key compliance for SQLite databases. Resulting triggers will check during table insert or update actions that have one or more foreign key relationships defined as ‘target_table.fk_col = norm_table.pk_col’. It is primarily for use in controlled vocabulary lists where a single id is tied to a single value in the parent table, but more complicated relationships can be handled.
Usage
sqlite_auto_trigger(target_table = "test", fk_col = c("col1", "col2", "col3"), norm_table = c("norm_col1", "norm_col2", "norm_col3"), pk_col = "id", val_col = "value", action_occurs = "after", trigger_action = "insert", table_action = "update")
Arguments
target_table
|
CHR scalar name of a table with a foreign key constraint. |
fk_col
|
CHR vector name(s) of the column(s) in ‘target_table’ with foreign key relationship(s) defined. |
norm_table
|
CHR vector name(s) of the table(s) containing the primary key relationship(s). |
pk_col
|
CHR vector name(s) of the column(s) in ‘norm_table’ containing the primary key(s) side of the relationship(s). |
val_col
|
CHR vector name(s) of the column(s) in ‘norm_table’ containing values related to the primary key(s) of the relationship(s). |
action_occurs
|
CHR scalar on when to run the trigger, must be one of ‘c(“before”, “after”, “instead”)’ (“instead” should only be used if ‘target_table’ is a view - this restriction is not enforced). |
trigger_action
|
CHR scalar on what type of trigger this is (e.g. ‘action_occurs’ = “after” and ‘trigger_action’ = “insert” -> “AFTER INSERT INTO”) and must be one of ‘c(“insert”, “update”, “delete”)’. |
for_each
|
CHR scalar for SQLite this must be only ‘row’ - translated into a “FOR EACH ROW” clause. Set to any given noun for other SQL engines supporting other trigger transaction types (e.g. “FOR EACH STATEMENT” triggers) |
table_action
|
CHR scalar on what type of action to run when the trigger fires, must be one of ‘c(“insert”, “update”, “delete”)’. |
filter_col
|
CHR scalar of a filter column to override the final WHERE clause in the trigger. This should almost always be left as the default “”. |
filter_val
|
CHR scalar of a filter value to override the final WHERE clause in the trigger. This should almost always be left as the default “”. |
or_ignore
|
LGL scalar on whether to ignore insertions to normalization tables if an error occurs (default: TRUE, which can under certain conditions raise exceptions during execution of the trigger if more than a single value column exists in the parent table) |
addl_actions
|
CHR vector of additional target actions to add to ‘table_action’ statements, appended to the end of the resulting “insert” or “update” actions to ‘target_table’. If multiple tables are in use, use positional matching in the vector (e.g. with three normalization tables, and additional actions to only the second, use c(“”, “additional actions”, “”)) |
Details
These triggers are intended as native, in-database support for foreign key integrity when connections leave the default SQLite setting of PRAGMA foreign_keys = off in place. Theoretically any trigger could be created this way, but this function should only be used with care outside its intended purpose.
Triggers created by this function intercept new INSERT and UPDATE statements, checking provided values against their parent table keys. If an index match is found, no action is taken on the parent table. If no match is found, the value is assumed to be a new normalized value; it is added to the normalization table and the resulting new key replaces the value in the target table column.
Value
CHR scalar of class glue containing the SQL necessary to create a trigger. This is raw text; it is not escaped and should be further manipulated (e.g. via dbplyr::sql()) as your needs and database communication pipelines dictate.
Note
While this will work on any number of combinations, all triggers should be heavily inspected prior to use. The default case for this trigger is a single FK/PK relationship with a single normalization value. It will run on any number of normalized columns; however, trigger behavior may be unexpected for more complex relationships.
If ‘or_ignore’ is set to TRUE, errors in adding to the parent table will be ignored silently, possibly causing NULL values to be inserted into the target table foreign key column. For this reason it is recommended that ‘or_ignore’ be set to TRUE only to expand parent table entries; note that it will supply only a single value for the new normalization table record. If additional columns in the parent table must be populated (e.g. the parent table has two required columns “value” and “acronym”), it is recommended to take care of those prior to any action that would activate these triggers.
Parameters are not checked against a schema (e.g. tables and columns exist, or that a relationships exists between tables). This function processes only text provided to it.
Define individual relationships between ‘fk_col’, ‘norm_table’, ‘pk_col’, and ‘val_col’ as necessary. Lengths for these parameters should match in a 1:1:1:1 manner to fully describe the relationships. If the schema of all tables listed in ‘norm_table’ are close matches, e.g. all have two columns “id” and “value”, then ‘pk_col’ and ‘val_col’ will be reused when only a single value is provided for them. That is, provided three ‘norm_table’(s) and one ‘pk_col’ and one ‘val_col’, the arguments for ‘pk_col’ and ‘val_col’ will apply to each ‘norm_table’.
The usage example is built on a hypothetical SQLite schema containing four tables, one of which (“test” - with columns “id”, “col1”, “col2”, and “col3”) defines foreign key relationships to the other three (“norm_col1”, “norm_col2”, and “norm_col3”).
See Also
build_triggers
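A sketch applying the generated SQL for the hypothetical schema described above (assumes an open connection ‘con’; inspect the statement before executing):
trigger_sql <- sqlite_auto_trigger(
  target_table = "test",
  fk_col = "col1",
  norm_table = "norm_col1",
  pk_col = "id",
  val_col = "value",
  action_occurs = "after",
  trigger_action = "insert",
  table_action = "update"
)
# Raw text is returned; execute through your own pipeline
DBI::dbExecute(con, trigger_sql)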
sqlite_auto_view | R Documentation |
Create a basic SQL view of a normalized table
Description
Many database viewers allow following links from normalization tables to get the human-readable value of a normalized column, but it is often preferable to build in views that automatically “denormalize” such tables for display or use in an application. This function seeks to script the process of creating those views. It examines the table definition from [pragma_table_info] and extracts the primary/foreign key relationships to build a “denormalized” view of the table using [get_fkpk_relationships], which requires a database map created from [er_map] and a data dictionary created from [data_dictionary].
Usage
sqlite_auto_view(table_pragma = pragma_table_info("contributors"), target_table = "contributors", relationships = get_fkpk_relationships(db_map = er_map(con), dictionary = data_dictionary(con)), drop_if_exists = FALSE)
Arguments
table_pragma
|
data.frame object from [pragma_table_info] for a given table name in the database |
target_table
|
CHR scalar name of the database table to build for, which should be present in the relationship definition |
relationships
|
data.frame object describing the foreign key relationships for ‘target_table’, which should generally be the result of a call to [get_fkpk_relationships] |
drop_if_exists
|
LGL scalar indicating whether to include a “DROP VIEW” prefix for the generated view statement; as this has an impact on schema, no default is set |
Details
TODO for v2: abstract the relationships call by looking for objects in the current session.
Value
CHR scalar of class glue containing the SQL necessary to create a “denormalized” view. This is raw text; it is not escaped and should be further manipulated (e.g. via dbplyr::sql()) as your needs and database communication pipelines dictate.
Note
No schema checking is performed by this function; it relies on definitions from other functions.
This example will run slowly if the database map [er_map] and dictionary [data_dictionary] haven’t yet been called. If they exist in your session, use those as arguments to get_fkpk_relationships.
See Also
build_views
pragma_table_info
get_fkpk_relationships
er_map
data_dictionary
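A sketch reusing session objects so the relationship lookup is not recomputed (assumes an open connection ‘con’):
db_map <- er_map(con)
dict <- data_dictionary(con)
view_sql <- sqlite_auto_view(
  table_pragma = pragma_table_info("contributors"),
  target_table = "contributors",
  relationships = get_fkpk_relationships(db_map = db_map, dictionary = dict),
  drop_if_exists = FALSE
)
# Inspect the raw SQL before executing it through your pipeline
cat(view_sql)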
sqlite_parse_build | R Documentation |
Parse SQL build statements
Description
Reading SQL files directly into R can be problematic. This function is primarily called in [create_fallback_build]. To support multiline, human-readable SQL statements, ‘sql_statements’ must be of length 1.
Usage
example_file <- "./config/sql_nodes/reference.sql" if (file.exists(example_file)) { build_commands <- readr::read_file(example_file) sqlite_parse_build(build_commands) }
Arguments
sql_statements
|
CHR scalar of SQL build statements from an SQL file. |
magicsplit
|
CHR scalar regex indicating some “magic” split point SQL comment to simplify the identification of discrete commands; will be used to split results (optional but highly recommended) |
header
|
CHR scalar regex indicating the format of header comments SQL comment to remove (optional) |
section
|
CHR scalar regex indicating the format of section comments SQL comment to remove (optional) |
Details
All arguments ‘magicsplit’, ‘header’, and ‘section’ provide flexibility in the comment structure of the SQL file and accept regex for character matching purposes.
Value
LIST of parsed complete build commands as CHR vectors containing each line.
sqlite_parse_import | R Documentation |
Parse SQL import statements
Description
In the absence of the sqlite command line interface (CLI), the [build_db] process needs a full set of SQL statements to execute directly rather than CLI dot commands. This utility function parses formatted SQL statements containing CLI “.import” commands to create SQL INSERT statements. This function is primarily called in [create_fallback_build].
Usage
if (file.exists("./config/data/elements.csv")) { sqlite_parse_import(".import --csv --skip 1 ./config/data/elements.csv elements") }
Arguments
build_statements
|
CHR vector of SQL build statements from an SQL file. |
Value
LIST of parsed .import statements as full “INSERT” statements.
start_api | R Documentation |
Start the plumber interface from a clean environment
Description
This convenience function launches the plumber instance if it was not set to launch during the session setup. It is a thin wrapper with a more intuitive name than [api_reload] and the default background setting turned off to test the server in the current session.
Usage
start_api()
Arguments
plumber_file
|
CHR scalar name of the plumber definition file, which should be in the directory defined by ‘src_dir’ |
plumber_host
|
CHR scalar of the host server address (default: NULL) |
plumber_port
|
INT scalar of the listening port on the host server (default: NULL) |
background
|
LGL scalar of whether to launch the API in a background process (default: FALSE) |
src_dir
|
CHR scalar file path to settings and functions enabling the plumber API (default: here::here(“inst”, “plumber”)) |
log_ns
|
CHR scalar name of the logging namespace to use for this function (default: “api”) |
Value
None, launches the plumber instance
Note
This function is intended to pull from the environment variables identifying the plumber file, host, and port.
start_app | R Documentation |
WIP Launch a shiny application
Description
Call this function to launch an app either directly or in a background process. The name must be present in the app directory or as a named element of SHINY_APPS in the current environment.
Usage
start_app("table_explorer")
Arguments
app_name
|
CHR scalar name of the shiny app to run; this should be the name of a directory containing a shiny app that is located within the directory defined by ‘app_dir’ |
app_dir
|
file path to a directory containing shiny apps (default: here::here(“inst”, “apps”)) |
background
|
LGL scalar of whether to launch the application in a background process (default: FALSE) |
…
|
Other named parameters to be passed to [shiny::runApp] |
Value
None, launches a browser with the requested shiny application
Note
Background launching of shiny apps is not yet supported.
start_rdkit | R Documentation |
Start the RDKit integration
Description
If the session was started without RDKit integration, e.g. INFORMATICS or USE_RDKIT were FALSE in [config/env_R.R], start up RDKit in this session.
Usage
start_rdkit(src_dir = here::here("inst", "rdkit"), log_ns = "rdkit")
Arguments
src_dir
|
CHR scalar file path to settings and functions enabling rdkit (default: here::here(“inst”, “rdkit”)) |
log_ns
|
CHR scalar name of the logging namespace to use for this function (default: “rdkit”) |
Value
LGL scalar indicating whether starting RDKit integration was successful
Note
RDKit and rcdk are incompatible. If the session was started with INFORMATICS = TRUE and USE_RDKIT = FALSE, ChemmineR was likely loaded. If this is the case, the session will need to be restarted due to java conflicts between the two.
summarize_check_fragments | R Documentation |
Summarize results of check_fragments function
Description
Summarize results of check_fragments function
Usage
summarize_check_fragments(fragments_checked)
Arguments
fragments_checked
|
output of ‘check_fragments’ function |
Value
table summary of check_fragments function
support_info | R Documentation |
R session information for support needs
Description
Several items of interest for this particular project, including DB_DATE, DB_VERSION, BUILD_FILE, LAST_DB_SCHEMA, LAST_MODIFIED, DEPENDS_ON, and EXCLUSIONS as defined in the project’s ../config/env_R.R file.
Usage
support_info()
Arguments
app_info
|
BOOL scalar on whether to return this application’s properties |
Value
LIST of values
suspectlist_at_NIST | R Documentation |
Open the NIST PDR entry for the current NIST PFAS suspect list
Description
This simply points your browser to the NIST public data repository for the current NIST suspect list, where you can find additional information. Click the download button in the left column of any file to download it. Requires the file “suspectlist_url.txt” to be present in the ‘config’ subdirectory of the current working directory.
Usage
suspectlist_at_NIST(url_file = file.path("config", "suspectlist_url.txt"))
Value
none
Examples
suspectlist_at_NIST()
table_msdata | R Documentation |
Tabulate MS Data
Description
Pulls specified MS data from an mzML object and converts it into table format for further processing. Internal function for the ‘peak_gather_json’ function.
Usage
table_msdata(mzml, scans, mz = NA, zoom = NA, masserror = NA, minerror = NA)
Arguments
mzml
|
list of msdata from ‘mzMLtoR’ function |
scans
|
integer vector containing scan numbers to extract MS data |
mz
|
numeric targeted m/z |
zoom
|
numeric vector specifying the range around m/z, from m/z - zoom[1] to m/z + zoom[2] |
masserror
|
numeric relative mass error (in ppm) of the instrument |
minerror
|
numeric minimum mass error (in Da) of the instrument |
Value
data.frame containing MS data
tack_on | R Documentation |
Append additional named elements to a list
Description
This does nothing more than [base::append] ellipsis arguments directly to the end of an existing list object. This primarily supports additional property assignment during the import process for future development and refinement. Call this as part of any function with additional arguments; note that unrecognized named parameters may cause failures or be ignored. If no additional arguments are passed, ‘obj’ is returned as provided.
Usage
tack_on(obj, ..., log_ns = "db")
Arguments
obj
|
LIST of any length to be appended to |
…
|
Additional arguments passed to/from the ellipsis parameter of calling functions. If named, names are preserved. |
log_ns
|
CHR scalar of the logging namespace to use (default: “db”) |
Value
LIST object of length equal to ‘obj’ plus additional named arguments
Note
If duplicate names exist in ‘obj’ and those provided as ellipsis arguments, those provided as part of the ellipsis will replace those in ‘obj’.
Examples
tack_on(list(a = 1:3), b = letters, c = rnorm(10))
tack_on(list(a = 1:3))
tidy_comments | R Documentation |
Tidy up table and field comments
Description
Creates more human-readable outputs after extracting the raw SQL used to build entities and parsing out the comments as identified with the /* … */ multi-line comment flag pair. Single line comments are not extracted. The first comment is assumed to be the table comment. See examples in the ‘config/sql_nodes’ directory.
Usage
tidy_comments(pragma_table_def("compounds", get_sql = TRUE))
Arguments
obj
|
result of calling [pragma_table_def] with ‘get_sql’ = TRUE |
Value
LIST of length equal to ‘obj’ containing extracted comments
tidy_ms_spectra | R Documentation |
Tidy Spectra
Description
A convenience function to take outputs from [ms_spectra_separated] and [ms_spectra_zipped] and return them as a tidy data frame by unpacking the list column “spectra”.
Usage
tidy_ms_spectra(df = packed_data)
Arguments
df
|
data.frame object containing nested spectra in a column |
Value
data.frame object containing tidy spectra
tidy_spectra | R Documentation |
Decompress Spectra
Description
This convenience wrapper will automatically decompress ms spectra in the “separate” and “zipped” formats and return them as tidy data frames suitable for further manipulation or visualization.
Usage
tidy_spectra( target, is_file = FALSE, is_format = c("separated", "zipped"), spectra_set = "msdata", ms_col_sep = c("measured_mz", "measured_intensity"), ms_col_zip = "data", is_JSON = FALSE )
Arguments
target
|
CHR scalar file path to use OR an R object containing compressed spectral data in the “separated” or “zipped” format |
is_file
|
BOOL scalar of whether or not ‘target’ is a file. Set to FALSE to use an existing R object, which should contain an object with a named element matching parameter ‘spectra_set’ (default: FALSE) |
is_format
|
CHR scalar of the compression format, which must be one of the supported compression forms (“separated” or “zipped”); ignored if the compression format can be inferred from the text in ‘target’ (default: “separated”) |
spectra_set
|
CHR scalar of the object name holding a spectra data frame to decompress (default “msdata”) |
ms_col_sep
|
CHR vector of the column names holding spectral masses and intensities in the “separated” format (default: c(“measured_mz”, “measured_intensity”)) |
ms_col_zip
|
CHR scalar of the name of the column holding interleaved spectral masses and intensities in the “zipped” format (default: “data”) |
is_JSON
|
BOOL scalar of whether or not ‘target’ is a JSON expression needing conversion (default: FALSE) |
Value
data.frame object containing unpacked spectra
Examples
tidy_spectra('{"msdata": "712.9501 15094.41015625 713.1851 34809.9765625"}', is_format = "zipped") tidy_spectra('{"measured_mz":"712.9501 713.1851","measured_intensity":"15094.41015625 34809.9765625"}')
unzip | R Documentation |
Unzip binary data into vector
Description
Unzip binary data into vector
Usage
unzip(x, type = "gzip")
Arguments
x
|
String of binary data to convert |
type
|
type of compression (see ‘base::memDecompress’). Default is ‘gzip’ |
Value
vector containing data from converted binary data
update_all | R Documentation |
Convenience function to rebuild all database related files
Description
This is a development and deployment function that should be used with caution. It is intended solely to assist with the development process of rebuilding a database schema from source files and producing the supporting data. It will create both the JSON expression of the data dictionary and the fallback SQL file.
Usage
update_all()
Arguments
rebuild
|
LGL scalar indicating whether to first rebuild from environment settings (default: FALSE for safety) |
api_running
|
LGL scalar of whether or not the API service is currently running (default: TRUE) |
api_monitor
|
process object pointing to the API service (default: NULL) |
db
|
CHR scalar of the database name (default: session value DB_NAME) |
build_from
|
CHR scalar of a SQL build script to use (default: environment value DB_BUILD_FILE) |
populate
|
LGL scalar of whether to populate with data from the file in ‘populate_with’ (default: TRUE) |
populate_with
|
CHR scalar for the populate script (e.g. “populate_demo.sql”) to run after the build is complete (default: session value DB_DATA); ignored if ‘populate = FALSE’ |
archive
|
LGL scalar of whether to create an archive of the current database (if it exists) matching the name supplied in argument ‘db’ (default: FALSE), passed to [‘remove_db()’] |
sqlite_cli
|
CHR scalar to use to look for installed sqlite3 CLI tools in the current system environment (default: session value SQLITE_CLI) |
connect
|
LGL scalar of whether or not to connect to the rebuilt database in the global environment as object ‘con’ (default: FALSE) |
log_ns
|
CHR scalar of the logging namespace to use during execution (default: “db”) |
Details
!! To preserve data, do not call this with both ‘rebuild’ = TRUE and ‘archive’ = FALSE !!
Value
Files for the new database, fallback build, and data dictionary will be created in the project directory and objects will be created in the global environment for the database map (LIST “db_map”) and current dictionary (LIST “db_dict”)
Note
This does not recast the views and triggers files created through [sqlite_auto_view] and [sqlite_auto_trigger] as the output of those may often need additional customization. Existing auto-views and -triggers will be created as defined. To exclude those, first modify the build file referenced by [build_db].
This requires references to be in place to the individual functions in the current environment.
update_data_sources | R Documentation |
Dump current database contents
Description
Perform one or both of two main tasks for backing up the NTA database.
Usage
update_data_sources( project, data_dir = file.path("config", "data"), create_backups = TRUE, dump_tables = TRUE, dump_sql = TRUE, db_conn = con, sqlite_cli = ifelse(exists("SQLITE_CLI"), SQLITE_CLI, NULL), db_name = ifelse(exists("DB_NAME"), DB_NAME, NULL) )
Arguments
project
|
CHR scalar of the directory containing project specific data (required, no default) |
data_dir
|
CHR scalar of the directory containing project independent data sources used for population (default: ‘file.path(“config”, “data”)’) |
create_backups
|
LGL scalar indicating whether to create backups prior to writing updated data files (default: TRUE) |
dump_tables
|
LGL scalar indicating whether to dump contents of database tables as comma-separated-value files (default: TRUE) |
dump_sql
|
LGL scalar indicating whether to create an SQL dump file containing both schema and data as a backup (default: TRUE) |
db_conn
|
connection object (default: con) |
sqlite_cli
|
CHR scalar system reference to your installation of the sqlite command line interface (default: session value SQLITE_CLI if set) |
db_name
|
CHR scalar of the database name (default: session value DB_NAME if set) |
Details
The main task is to update CSV files in the config/data directory with the current contents of the database. This is done on a table-by-table basis and results in flat files whose structures no longer interrelate except numerically. Primarily this would be used to migrate database contents to other systems or for further manipulation. Please specify a ‘project’ so that project-specific information can be maintained.
Backups created with this function are placed in a “backups” subdirectory of the directory defined by parameter ‘data_dir’. If ‘dump_sql = TRUE’ SQL dump files will be written to “backups/sqlite” with file names equal to the current database name prefixed by date.
Value
None, copies database information to the local file system
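Examples
An illustrative call with a hypothetical project directory name, using session defaults to dump both CSV tables and an SQL backup:
update_data_sources(project = "my_project")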
update_env_from_file | R Documentation |
Update a conda environment from a requirements file
Description
The ‘requirements_file’ can be any formatted file that contains a definition of python libraries to add to an environment (e.g. requirements.txt, environment.yml, etc.) that is understood by conda. Relative file paths are fine, but the file will not be discovered automatically (e.g. by ‘list.files’), so specificity is always better.
Usage
update_env_from_file(env_name, requirements_file, conda_alias = NULL)
Arguments
env_name
|
CHR scalar of a python environment |
requirements_file
|
CHR scalar file path to a suitable requirements.txt or environment.yml file |
conda_alias
|
CHR scalar of the command line interface alias for your conda tools (default: NULL is translated first to the environment variable CONDA_CLI and then to “conda”) |
Details
This is a helper function, largely to support versions of reticulate prior to the introduction of the ‘environment’ argument in version 1.24.
Value
None, directly updates the referenced python environment
Note
This requires conda CLI tools to be installed.
A default installation alias of “conda” is assumed.
Set global variable ‘CONDA_CLI’ to your conda alias for better support.
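Examples
An illustrative call; the requirements file name here is assumed:
update_env_from_file("nist_hrms_db", "requirements.txt")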
update_logger_settings | R Documentation |
Update logger settings
Description
This applies the internal routing and formatting for logger functions to the current value of the LOGGING object. If LOGGING is changed (i.e. a logging namespace is added or changed) this function should be run to update routing and formatting to be in line with the current settings.
Usage
update_logger_settings(log_all_warnings = FALSE, log_all_errors = FALSE)
Arguments
log_all_warnings
|
LGL scalar indicating whether or not to log all warnings (default: FALSE) |
log_all_errors
|
LGL scalar indicating whether or not to log all errors (default: FALSE) |
Value
None
Note
The calling stack for auto logging of warnings and errors does not work with background processes. These settings call [logger::log_warnings()] and [logger::log_errors()].
This function is used only for its side effects.
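Examples
An illustrative call, rerun after changing the LOGGING object, here also capturing all warnings and errors:
update_logger_settings(log_all_warnings = TRUE, log_all_errors = TRUE)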
user_guide | R Documentation |
Launch the User Guide for DIMSpec
Description
Use this function to launch the bookdown version of the User Guide for the NIST Database Infrastructure for Mass Spectrometry (DIMSpec) Toolkit.
Usage
user_guide()
Arguments
path
|
CHR scalar representing a valid file path to the local user guide |
url_gh
|
CHR scalar pointing to the web resource, in this case the URL to the User Guide hosted on GitHub pages |
view_on_github
|
LGL scalar of whether to use the hosted version of the User Guide on GitHub (default: TRUE is recommended), which will always display the most up-to-date version |
Value
None, opens a browser to the index page of the User Guide
Note
This works ONLY when DIMSpec is used as a project with the defined directory structure.
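Examples
An illustrative call to open a locally built copy rather than the hosted version (assumes the project directory structure noted above):
user_guide(view_on_github = FALSE)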
valid_file_format | R Documentation |
Ensure files uploaded to a shiny app are of the required file type
Description
This input validation check uses [tools::file_ext] to ensure that files uploaded to [shiny::fileInput] are among the acceptable file formats. Users may sometimes try to load a file outside the “accepts” format list by manually changing it during the upload process. If the uploaded file is not of an accepted format, a [nist_shinyalert] modal is displayed prompting the user to upload a file in one of the requested formats.
Usage
valid_file_format(filename, accepts, show_alert)
Arguments
filename
|
CHR scalar name of the file uploaded to the shiny server |
accepts
|
CHR vector of acceptable file formats |
show_alert
|
LGL scalar indicating whether or not to show an alert; set FALSE to return the status of the check |
Value
Whether or not the uploaded file is among the accepted formats.
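Examples
A typical inline use within a shiny server function:
req(valid_file_format(input$file_upload, c(".csv", ".xls")))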
validate_casrns | R Documentation |
Validate a CAS RN
Description
Chemical Abstracts Service (CAS) Registry Numbers (RNs) follow a standard creation format. From [https://www.cas.org/support/documentation/chemical-substances/faqs], a CAS RN is a “numeric identifier that can contain up to 10 digits, divided by hyphens into three parts. The right-most digit is a check digit used to verify the validity and uniqueness of the entire number. For example, 58-08-2 is the CAS Registry Number for caffeine.”
Usage
validate_casrns(casrn_vec, strip_bad_cas = TRUE)
Arguments
casrn_vec
|
CHR vector of CAS RNs to validate |
strip_bad_cas
|
LGL scalar of whether to strip out invalid CAS RNs (default: TRUE) |
Details
Provided CAS RNs in ‘casrn_vec’ are validated for format and their checksum digit. Those failing will be printed to the console by default, and users have the option of stripping unverified entries from the return vector.
This only validates that a CAS RN is properly constructed; it does not indicate that the registry number exists in the CAS Registry.
See [repair_xl_casrn_forced_to_date] as one possible pre-processing step.
Value
CHR vector of CAS RNs; of length equal to that of ‘casrn_vec’ unless ‘strip_bad_cas = TRUE’ removes invalid entries |
Examples
validate_casrns(c("64324-08-9", "64324-08-5", "12332"))
validate_casrns(c("64324-08-9", "64324-08-5", "12332"), strip_bad_cas = FALSE)
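A standalone sketch of the check digit rule described above (helper name illustrative, not part of the toolkit): each digit, excluding the check digit, is weighted by its position counted from the right, and the weighted sum modulo 10 must equal the check digit.
cas_checksum_ok <- function(casrn) {
  digits <- as.integer(strsplit(gsub("-", "", casrn), "")[[1]])
  body <- rev(digits[-length(digits)])
  sum(body * seq_along(body)) %% 10 == digits[length(digits)]
}
cas_checksum_ok("58-08-2") # TRUE for caffeine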
validate_column_names | R Documentation |
Ensure database column presence
Description
When working with SQL databases, this convenience function validates any number of column names by comparing against the list of column names in any number of tables. Typically it is called transparently inline to cause execution failure when column names are not present in referenced tables during build of SQL queries.
Usage
validate_column_names(db_conn, table_names, column_names)
Arguments
db_conn
|
connection object (e.g. of class “SQLiteConnection”) |
table_names
|
CHR vector of tables to search |
column_names
|
CHR vector of column names to validate |
Value
None
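Examples
A typical inline check (assumes an open database connection ‘con’):
validate_column_names(con, "peaks", "id")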
validate_tables | R Documentation |
Ensure database table presence
Description
When working with SQL databases, this convenience function validates any number of table names by comparing against the list of those present. Typically it is called transparently inline to cause execution failure when tables are not present during build of SQL queries.
Usage
validate_tables(db_conn, table_names)
Arguments
db_conn
|
connection object (e.g. of class “SQLiteConnection”) |
table_names
|
CHR vector of table names to ensure are present |
Value
Failure if any table doesn’t exist; none if all do.
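Examples
A typical inline check (assumes an open database connection ‘con’):
validate_tables(con, "peaks")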
verify_args | R Documentation |
Verify arguments for a function
Description
This helper function checks arguments against a list of expectations. It was inspired in part by the excellent testthat package and shares concepts with the checkmate package; however, this function performs many of the common checks without additional package dependencies, and can be inserted into other functions for a project easily with:
arg_check <- verify_args(args = as.list(environment()), conditions = list(param1 = c("mode", "logical"), param2 = c("length", 1)))
and the return checked with:
if (!arg_check$valid) cat(paste0(arg_check$messages, "\n"))
where argument ‘conditions’ describes the tests. This comes at the price of readability, as the list items in ‘conditions’ do not have to be named, but can be to improve clarity. See the entry for argument ‘conditions’ below to view which expectations are currently supported.
As this is a nested list condition check, it can also originate from any source coercible to a list (e.g. JSON, XML, etc.); this feature, along with the return of human-meaningful evaluation strings, is particularly useful for development of shiny applications. Values from other sources MUST be coercible to a full list (e.g. if parsing from JSON, use ‘jsonlite::fromJSON(simplifyMatrix = FALSE)’).
Usage
verify_args(args = list(character_length_2 = c("a", "b")), conditions = list(character_length_2 = list(c("mode", "character"), c("length", 3))))
verify_args(args = list(boolean = c(TRUE, FALSE, TRUE)), conditions = list(list(c("mode", "logical"), c("length", 1))))
verify_args(args = list(foo = c(letters[1:3]), bar = 1:10), conditions = list(foo = list(c("mode", "numeric"), c("n>", 5)), bar = list(c("mode", "logical"), c("length", 5), c(">", 10), c("between", list(100, 200)), c("choices", list("a", "b")))))
Arguments
args
|
LIST of named arguments and their values, typically passed directly from a function definition in the form ‘as.list(environment())’ |
conditions
|
Nested LIST of conditions and values to check, with one list item for each element in ‘args’. Multiple expectation conditions can be set for each element of ‘args’. Currently supported expectations include those used in the examples (e.g. “mode”, “length”, “n>”, “>”, “between”, and “choices”). |
from_fn
|
CHR scalar of the function from which this is called, used if logger is enabled and ignored if not; by default it will pull the calling function’s name from the call stack, but it can be overwritten by a manual entry here for better tracing (default: the calling function’s name pulled from the call stack) |
silent
|
LGL scalar of whether to silence warnings for individual failures, leaving them only as part of the output |
Value
LIST of the resulting values and checks, primarily useful for its $valid (TRUE if all checks pass or FALSE if any fail) and $messages values.
Note
If logger is enabled, also provides some additional meaningful feedback.
At least one condition check is required for every element passed to ‘args’.
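Examples
An illustrative check-and-report pattern (argument values hypothetical):
arg_check <- verify_args(args = list(x = 1:3), conditions = list(x = list(c("mode", "numeric"), c("length", 3))))
if (!arg_check$valid) cat(paste0(arg_check$messages, "\n"))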
verify_import_columns | R Documentation |
Verify column names for import
Description
This function validates that all required columns are present prior to importing into a database table by examining provided values against the database schema. This is more of a sanity check on other functions than anything, but it also strips extraneous columns to meet the needs of an INSERT action. The input to ‘values’ should be either a LIST or named CHR vector of values for insertion, or a CHR vector of the column names.
Usage
verify_import_columns( values, db_table, names_only = FALSE, require_all = TRUE, db_conn = con, log_ns = "db" )
Arguments
values
|
LIST or CHR vector of values to add. If ‘names_only’ is TRUE, values are directly interpreted as column names. Otherwise, all values provided must be named. |
db_table
|
CHR scalar of the table name |
names_only
|
LGL scalar of whether to treat entries of ‘values’ as the column names rather than the column values (default: FALSE) |
require_all
|
LGL scalar of whether to require all columns (except the assumed primary key column of “id”) or only those defined as “NOT NULL” (default: TRUE requires the presence of all columns in the table) |
db_conn
|
connection object (default: con) |
log_ns
|
CHR scalar of the logging namespace to use (default: “db”) |
Value
An object of the same type as ‘values’ with extraneous values (i.e. those not matching a database column header) stripped away.
Note
If columns are defined as required in the schema and are not present, this will fail with an informative message about which columns were missing.
If columns are provided that do not match the schema, they will be stripped away in the return value.
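Examples
An illustrative names-only check (column names hypothetical; assumes an open database connection ‘con’):
verify_import_columns(values = c("id", "mz", "intensity"), db_table = "peaks", names_only = TRUE)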
verify_import_requirements | R Documentation |
Verify an import file’s properties
Description
Checks an import file’s characteristics against expectations. This is mostly a sanity check against changing conditions from project to project. Import requirements should be defined at the environment level and enumerated as a JSON object, which can be created by calling [make_requirements] on an example import for simplicity. An example is provided in the ‘examples’ directory as “NIST_import_requirements.json”. If multiple requirements are in use (e.g. pulling from multiple locations), this can be run multiple times with different values of ‘requirements_obj’ or ‘file_name’.
Usage
verify_import_requirements( obj, ignore_extra = TRUE, requirements_obj = "import_requirements", file_name = "import_requirements", log_issues_as = "warn", log_ns = "db" )
Arguments
obj
|
LIST of the object to import matching structure expectations, typically from a JSON file fed through [full_import] |
ignore_extra
|
LGL scalar of whether to ignore extraneous import elements or stop the import process (default: TRUE) |
requirements_obj
|
CHR scalar of the name of an R object holding import requirements; this is a convenience shorthand to prevent multiple imports from parameter ‘file_name’ (default: “import_requirements”) |
file_name
|
CHR scalar of the name of a file holding import requirements; if this has already been added to the calling environment, ‘requirements_obj’ will be used preferentially as the name of that object |
log_issues_as
|
CHR scalar of the log level to use (default: “warn”), which must be a valid log level as in [logger::FATAL]; will be ignored if the [logger] package isn’t available |
log_ns
|
CHR scalar of the logging namespace to use (default: “db”) |
Details
The return from this is a tibble with 9 columns. The first is the name of the import object member, typically the file name. If a single, unnested import object is provided this will be “import object”. The other columns include the following verification checks:
- has_all_required: Are all required names present in the sample? (TRUE/FALSE)
- missing_requirements: Character vectors naming any of the missing requirements
- has_full_detail: Is all expected detail present? (TRUE/FALSE)
- missing_detail: Character vectors naming any missing value sets
- has_extra: Are there unexpected values provided? (TRUE/FALSE)
- extra_cols: Character vectors naming any extra columns; these will be dropped from the import but are provided for information’s sake
- has_name_mismatches: Are there name differences between the import requirement elements and the import object? (TRUE/FALSE)
- mismatched_names: Named lists enumerating which named elements (if any) from the import object did not match name expectations in the requirements
All of this is defined by the ‘requirements_obj’ list. Do not provide that list directly; instead, pass this function the name of the requirements object for interoperability. If a ‘requirements_obj’ cannot be identified via [base::exists] then ‘file_name’ will take precedence and be imported. Initial use and set up may be easier in interactive sessions.
Value
A tibble object with 9 columns containing the results of the checks.
Note
If ‘file_name’ is provided, it need not be fully defined. The value provided will be used to search the project directory.
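Examples
An illustrative call, assuming ‘import_obj’ holds a parsed JSON import and using the example requirements file mentioned above:
verify_import_requirements(import_obj, file_name = "NIST_import_requirements.json")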
with_help | R Documentation |
Convenience application of ‘add_help’ using pipes directly in UI.R
Description
This may not work for certain widgets with heavily nested HTML. Note that classes may be CSS dependent.
Usage
actionButton("example", "With Help") %>% with_help("Now with a question mark icon hosting a tooltip")
actionButton("example", "With Help") %>% with_help("Large and green", size = "xl", class = "success")
Arguments
widget
|
shiny.tag widget |
tooltip
|
CHR scalar of the tooltip text |
…
|
Other named arguments to be passed to ‘add_help’ |
Value
The ‘widget’ provided with a hover tooltip icon appended to it.
Note
Most standard Shiny widgets are supported, but maybe not all.