masskit.apps.process.libraries package¶
Subpackages¶
Submodules¶
masskit.apps.process.libraries.batch_converter module¶
- masskit.apps.process.libraries.batch_converter.batch_converter_app(config: omegaconf.DictConfig) None ¶
- masskit.apps.process.libraries.batch_converter.write_batch(writers, table)¶
masskit.apps.process.libraries.converter module¶
- masskit.apps.process.libraries.converter.converter_app(config: omegaconf.DictConfig) None ¶
- masskit.apps.process.libraries.converter.disable_console_logging()¶
masskit.apps.process.libraries.fasta2peptides module¶
- class masskit.apps.process.libraries.fasta2peptides.PepTuple(nterm, pep, cterm)¶
Bases:
tuple
- cterm¶
Alias for field number 2
- nterm¶
Alias for field number 0
- pep¶
Alias for field number 1
- masskit.apps.process.libraries.fasta2peptides.count_rhk(peptide)¶
Return a count of the basic residues in a peptide
- Parameters:
peptide – a peptide string
- Returns:
count of basic residues
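As a rough illustration of what counting basic residues involves, the sketch below counts arginine (R), histidine (H), and lysine (K) in a one-letter peptide string. The helper name `count_basic_residues` is hypothetical and this is not the masskit implementation. It assumes the "rhk" in the documented function name refers to those three residues.

```python
def count_basic_residues(peptide: str) -> int:
    """Count R, H, and K residues in a one-letter peptide string.

    Hypothetical helper for illustration only; not part of masskit.
    """
    return sum(peptide.count(residue) for residue in "RHK")
```

For example, `count_basic_residues("PEPTIDEK")` counts the single lysine at the C-terminus.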
- masskit.apps.process.libraries.fasta2peptides.extract_peptides(cfg)¶
Break down all of the proteins in a FASTA file into peptides according to the specified digestion strategy.
- Parameters:
cfg – configuration parameters
- Returns:
non-redundant list of peptides.
- masskit.apps.process.libraries.fasta2peptides.fasta(filename)¶
Iterate over a FASTA file, yielding tuples of (header, sequence).
- Parameters:
filename – name of fasta file
- Returns:
yields (header, sequence) tuples
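A minimal sketch of this kind of FASTA iteration is shown below. It is not the masskit code: the name `iter_fasta` is hypothetical, it takes an iterable of lines rather than a filename so it can be exercised without a file, and it assumes the leading `>` is stripped from the header.

```python
from typing import Iterable, Iterator, List, Optional, Tuple


def iter_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) tuples from FASTA-formatted lines.

    Hypothetical helper for illustration only; not part of masskit.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; emit the previous one if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines are concatenated until the next header.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

With a real file, the same generator could be driven by `with open(filename) as handle: records = list(iter_fasta(handle))`.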
- masskit.apps.process.libraries.fasta2peptides.fasta2peptides_app(cfg: omegaconf.DictConfig) None ¶
- masskit.apps.process.libraries.fasta2peptides.fasta_parse_id(header)¶
Extract an accession from a UniProt FASTA header string. UniProtKB uses the following format: >db|UniqueIdentifier|EntryName ProteinName OS=OrganismName OX=OrganismIdentifier [GN=GeneName ]PE=ProteinExistence SV=SequenceVersion
- Parameters:
header – UniProt-formatted FASTA header string
- Returns:
accession string
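Given the header format above, the accession is the UniqueIdentifier field between the first two `|` characters. One way to pull it out is sketched below. The function name `parse_accession` and the choice to return `None` for headers that do not match are assumptions for illustration, not masskit behavior.

```python
import re
from typing import Optional

# db|UniqueIdentifier|EntryName ..., with an optional leading ">".
_UNIPROT_HEADER = re.compile(r"^>?[^|\s]+\|([^|\s]+)\|")


def parse_accession(header: str) -> Optional[str]:
    """Return the UniqueIdentifier field of a UniProt-style header.

    Hypothetical helper for illustration only; returns None when the
    header does not follow the db|UniqueIdentifier|EntryName layout.
    """
    match = _UNIPROT_HEADER.match(header)
    return match.group(1) if match else None
```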
- masskit.apps.process.libraries.fasta2peptides.nonspecific(residues, min, max, missed)¶
Yield nonspecific peptides, that is, peptides produced by allowing cleavage at every location.
- Parameters:
residues – protein string
min – minimum length for returned peptides
max – maximum length for returned peptides
missed – maximum number of possible missed cleavages
- Returns:
yield peptides in order
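One way to picture a nonspecific digest is as every contiguous substring of the protein whose length falls within the limits. The sketch below does only that. The name `nonspecific_peptides` is hypothetical, the length bounds are assumed to be inclusive, and the `missed` parameter of the documented function is not modeled, so the output may differ from masskit's.

```python
from typing import Iterator


def nonspecific_peptides(residues: str, min_len: int, max_len: int) -> Iterator[str]:
    """Yield every substring of `residues` with min_len <= length <= max_len.

    Hypothetical helper for illustration only; not part of masskit.
    """
    n = len(residues)
    for start in range(n):
        for length in range(min_len, max_len + 1):
            end = start + length
            if end > n:
                break  # longer peptides from this start would run off the end
            yield residues[start:end]
```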
- class masskit.apps.process.libraries.fasta2peptides.pepgen(peptides, pep2proteins, cfg)¶
Bases:
object
- add_row(row)¶
Add a row to the row cache
- Parameters:
row – the new row
- enumerate()¶
Perform the work to generate all of the decorated peptides
- Returns:
The completed pyarrow table
- finalize_table()¶
Retrieve the completed table
- Returns:
a pyarrow table
- permute_mods(pep, mods, max_mods=4)¶
Yield all of the possible permutations of the set of modifications applied to the given peptide
- Parameters:
pep – a peptide string
mods – list of mods
max_mods – maximum number of modifications to apply at one time
- Returns:
yield a peptide with modifications applied
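The idea of applying up to `max_mods` modifications at a time can be sketched with `itertools.combinations`. Everything about the data layout here is an assumption for illustration: modifications are represented as hypothetical `(position, name)` tuples, at most one modification is allowed per position, and each result is yielded as the peptide plus the tuple of applied modifications. The actual `pepgen.permute_mods` may represent and apply modifications differently.

```python
from itertools import combinations
from typing import Iterator, Sequence, Tuple

Site = Tuple[int, str]  # hypothetical (position, modification name) pair


def permute_site_mods(
    pep: str, sites: Sequence[Site], max_mods: int = 4
) -> Iterator[Tuple[str, Tuple[Site, ...]]]:
    """Yield (pep, applied_sites) for every choice of 0..max_mods sites.

    Hypothetical helper for illustration only; not part of masskit.
    """
    for k in range(min(max_mods, len(sites)) + 1):
        for combo in combinations(sites, k):
            positions = [position for position, _ in combo]
            # Skip combinations that place two modifications on one residue.
            if len(set(positions)) == len(positions):
                yield pep, combo
```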
- masskit.apps.process.libraries.fasta2peptides.semitryptic(residues, min, max, missed)¶
Yield semi-tryptic peptides: peptides cleaved by trypsin at the C-terminal side of arginine (R) or lysine (K) at one end but not the other.
- Parameters:
residues – protein string
min – minimum length for returned peptides
max – maximum length for returned peptides
missed – maximum number of possible missed cleavages
- Returns:
yield peptides in order
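To illustrate the semi-tryptic idea, the sketch below starts from a single fully tryptic fragment and trims residues from one end only, so that exactly one terminus remains a tryptic cleavage site. The name `semi_from_fragment`, the inclusive length bounds, and the output order are assumptions. The documented function works on a whole protein and also handles missed cleavages, which this sketch does not.

```python
from typing import Iterator, Set


def semi_from_fragment(fragment: str, min_len: int, max_len: int) -> Iterator[str]:
    """Yield peptides that share exactly one terminus with `fragment`.

    Hypothetical helper for illustration only; not part of masskit.
    """
    seen: Set[str] = set()
    n = len(fragment)
    for trimmed in range(1, n):
        # Trim from the N-terminus (keeps the tryptic C-terminus), then
        # from the C-terminus (keeps the tryptic N-terminus).
        for pep in (fragment[trimmed:], fragment[: n - trimmed]):
            if min_len <= len(pep) <= max_len and pep not in seen:
                seen.add(pep)
                yield pep
```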
- masskit.apps.process.libraries.fasta2peptides.trypsin(residues)¶
Apply the cleavage rule of the trypsin digestion enzyme to yield peptides from a given protein string. Trypsin cleaves after R or K, but not before P.
- Parameters:
residues – protein string
- Returns:
yield peptides in order
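The stated rule (cleave after R or K, but not before P) maps naturally onto a regular expression with a negative lookahead. The following is a minimal sketch under that reading. The name `cleave_trypsin` is hypothetical and the masskit function may be implemented differently.

```python
import re
from typing import Iterator

# K or R that is not immediately followed by P.
_CLEAVAGE_SITE = re.compile(r"[KR](?!P)")


def cleave_trypsin(residues: str) -> Iterator[str]:
    """Yield fully cleaved tryptic fragments in protein order.

    Hypothetical helper for illustration only; not part of masskit.
    """
    start = 0
    for match in _CLEAVAGE_SITE.finditer(residues):
        end = match.end()  # cut immediately after the K or R
        yield residues[start:end]
        start = end
    if start < len(residues):
        yield residues[start:]  # C-terminal remainder without K/R at its end
```

In the string `AAKPBBRCCKDD`, the K followed by P is not a cleavage site, so the first fragment runs through to the R.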
- masskit.apps.process.libraries.fasta2peptides.tryptic(residues, min, max, missed)¶
Control a tryptic digest by limiting the length and allowing for missed cleavages
- Parameters:
residues – protein string
min – minimum length for returned peptides
max – maximum length for returned peptides
missed – maximum number of possible missed cleavages
- Returns:
yield peptides in order
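Missed cleavages can be modeled by joining runs of adjacent fully cleaved fragments: a peptide with `missed` missed cleavages spans `missed + 1` consecutive fragments. The sketch below takes a precomputed fragment list and applies the length filter. The name `join_missed_cleavages`, the inclusive length bounds, and the output order are assumptions and may not match masskit.

```python
from typing import Iterator, Sequence


def join_missed_cleavages(
    fragments: Sequence[str], min_len: int, max_len: int, missed: int
) -> Iterator[str]:
    """Yield peptides spanning 1..missed+1 adjacent fragments within the length limits.

    Hypothetical helper for illustration only; not part of masskit.
    """
    count = len(fragments)
    for first in range(count):
        last_allowed = min(first + missed + 1, count)
        for stop in range(first + 1, last_allowed + 1):
            pep = "".join(fragments[first:stop])
            if min_len <= len(pep) <= max_len:
                yield pep
```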
masskit.apps.process.libraries.optimize_parquet module¶
masskit.apps.process.libraries.parquet_info module¶
masskit.apps.process.libraries.pepsearch2parquet module¶
- masskit.apps.process.libraries.pepsearch2parquet.read_pepsearch(file_in, df_in=None, filename_fields_in=None, title_fields_in=None)¶
Read in a file from mspepsearch and turn it into a standard arrow table
- Parameters:
file_in – stream or filename of tsv file to read in
df_in – the dataframe to concatenate to, otherwise a dataframe is created
filename_fields_in – regex for pulling fields from the filename
title_fields_in – regex for pulling fields from the Unknown name
- Returns:
the dataframe
masskit.apps.process.libraries.pubchem_links module¶
- class masskit.apps.process.libraries.pubchem_links.Analyze(cfg)¶
Bases:
object
- do_joins()¶
- load_cas()¶
- load_nist_data()¶
- patent_counts()¶
- pmid_counts()¶
- save_data()¶
- wikipedia_counts()¶
- class masskit.apps.process.libraries.pubchem_links.Download(urls: Iterable[str], dest_dir: str)¶
Bases:
object
- download_url(task_id: rich.progress.TaskID, url: str, path)¶
- class masskit.apps.process.libraries.pubchem_links.PubChemCAS(cfg)¶
Bases:
object
- get_pubchem_cas()¶
- parse_pubchem_json()¶
- pubchem_cas(session, page)¶
- use_pubchem_cache(filename)¶
- class masskit.apps.process.libraries.pubchem_links.PubChemFTP(cfg)¶
Bases:
object
- cache_pubchem_files(cfg: omegaconf.DictConfig)¶
- get_convert_options(col_types)¶
- get_csv_type(t)¶
- get_new_columns(old_columns, cfg: omegaconf.DictConfig)¶
- process(table: Table, cfg: omegaconf.DictConfig, column_names)¶
- class masskit.apps.process.libraries.pubchem_links.PubChemWiki(cfg)¶
Bases:
object
- fetch_counts()¶
- get_pubchem_wiki()¶
- parse_counts(wikijson)¶
- parse_pubchem_json()¶
- pubchem_wiki(session, page)¶
- use_pubchem_cache(filename)¶
- masskit.apps.process.libraries.pubchem_links.main(cfg: omegaconf.DictConfig) int ¶
masskit.apps.process.libraries.rewrite_sdf module¶
- masskit.apps.process.libraries.rewrite_sdf.rewrite_sdf_app(config)¶
masskit.apps.process.libraries.transform_table module¶
- masskit.apps.process.libraries.transform_table.cast_columns(table: Table, cfg: omegaconf.DictConfig)¶
- masskit.apps.process.libraries.transform_table.compress_start_stop(table: Table)¶
- masskit.apps.process.libraries.transform_table.concat_to_output(files: list, sort: omegaconf.DictConfig, output: omegaconf.DictConfig)¶
- masskit.apps.process.libraries.transform_table.get_sort_index(table: Table, cfg: omegaconf.DictConfig)¶
- masskit.apps.process.libraries.transform_table.misc_operations(table: Table, cfg: omegaconf.DictConfig)¶
- masskit.apps.process.libraries.transform_table.process_and_cache(infile: Path, outfile: Path, sort: omegaconf.DictConfig, batch_size=500, casts=None, operations=None) Path ¶
- masskit.apps.process.libraries.transform_table.str2pyarrow_type(s: str) DataType ¶
- masskit.apps.process.libraries.transform_table.transform_table_app(cfg: omegaconf.DictConfig) None ¶
masskit.apps.process.libraries.update_sets module¶
- masskit.apps.process.libraries.update_sets.update_sets_app(config: omegaconf.DictConfig) None ¶