masskit.apps.process.libraries package

Subpackages

masskit.apps.process.libraries.conf package
- Subpackages
  - masskit.apps.process.libraries.conf.hydra package
    - Subpackages
    - Module contents
- Module contents

Submodules

masskit.apps.process.libraries.batch_converter module

masskit.apps.process.libraries.batch_converter.batch_converter_app(config: omegaconf.DictConfig) → None

masskit.apps.process.libraries.batch_converter.write_batch(writers, table)

masskit.apps.process.libraries.converter module

masskit.apps.process.libraries.converter.converter_app(config: omegaconf.DictConfig) → None

masskit.apps.process.libraries.converter.disable_console_logging()

masskit.apps.process.libraries.fasta2peptides module

class masskit.apps.process.libraries.fasta2peptides.PepTuple(nterm, pep, cterm)

Bases: tuple

nterm: Alias for field number 0

pep: Alias for field number 1

cterm: Alias for field number 2
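The "Alias for field number N" entries are the docstrings that `collections.namedtuple` generates for each field, so PepTuple appears to be a namedtuple whose fields are, in order, nterm, pep and cterm. A minimal sketch reproducing that layout; the field values are placeholders, since what nterm and cterm actually hold is not documented here.

```python
from collections import namedtuple

# Sketch of the documented layout: nterm is field 0, pep is field 1, cterm is field 2.
PepTuple = namedtuple("PepTuple", ["nterm", "pep", "cterm"])

# Placeholder values; their real meaning in masskit is not documented here.
t = PepTuple("K", "LVNELTEFAK", "T")
print(t.pep)          # LVNELTEFAK
print(t[1] == t.pep)  # True: index 1 and .pep refer to the same field
```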

masskit.apps.process.libraries.fasta2peptides.count_rhk(peptide)

Return the number of basic residues (arginine R, histidine H, lysine K) in a peptide.

Parameters:: peptide – peptide sequence string
Returns:: count of basic residues
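A minimal sketch of the counting described above, assuming the peptide is a plain string of uppercase one-letter residue codes with no modification markup. The name `count_basic_residues` is hypothetical; this is an illustration, not masskit's implementation of count_rhk.

```python
def count_basic_residues(peptide: str) -> int:
    """Count arginine (R), histidine (H) and lysine (K) residues.

    Hypothetical sketch: assumes uppercase one-letter codes only.
    """
    return sum(peptide.count(residue) for residue in "RHK")

print(count_basic_residues("PEPTIDERK"))  # 2
```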

masskit.apps.process.libraries.fasta2peptides.extract_peptides(cfg)

Break down all of the proteins in a FASTA file into peptides according to the specified digestion strategy.

Parameters:: cfg – configuration parameters
Returns:: non-redundant list of peptides
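The non-redundant extraction step can be sketched as digesting each protein and collecting the peptides into a set. This sketch takes an in-memory `{accession: sequence}` dict instead of a configuration object, uses plain trypsin cleavage as the digestion strategy, and applies an arbitrary minimum length; all of these, along with the names `tryptic_fragments` and `nonredundant_peptides`, are hypothetical. It also keeps a peptide-to-protein map, on the assumption (suggested only by the `pep2proteins` argument of pepgen) that such a map is useful downstream.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple

def tryptic_fragments(sequence: str) -> List[str]:
    # Cut after K or R unless the next residue is P; drop empty pieces.
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]

def nonredundant_peptides(
    proteins: Dict[str, str],
    digest: Callable[[str], Iterable[str]] = tryptic_fragments,
    min_len: int = 6,
) -> Tuple[List[str], Dict[str, Set[str]]]:
    """Digest every protein; return sorted unique peptides and a peptide -> accessions map."""
    pep2proteins: Dict[str, Set[str]] = defaultdict(set)
    for accession, sequence in proteins.items():
        for pep in digest(sequence):
            if len(pep) >= min_len:
                pep2proteins[pep].add(accession)
    return sorted(pep2proteins), dict(pep2proteins)

peptides, pep2proteins = nonredundant_peptides(
    {"P1": "AAAAAAKGGGGGGR", "P2": "GGGGGGRLLLLLLK"}
)
print(peptides)                         # ['AAAAAAK', 'GGGGGGR', 'LLLLLLK']
print(sorted(pep2proteins["GGGGGGR"]))  # ['P1', 'P2']
```

The shared peptide GGGGGGR appears once in the list, while the map records that both proteins produce it.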

masskit.apps.process.libraries.fasta2peptides.fasta(filename)

Iterate over a FASTA file, yielding (header, sequence) tuples.

Parameters:: filename – name of the FASTA file
Returns:: generator of (header, sequence) tuples
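The iteration described above can be sketched as a small generator that accumulates sequence lines until the next `>` header. Two details are assumptions rather than documented behavior: the sketch reads from an iterable of lines (such as an open file) instead of a filename, so it can run on in-memory text, and it strips the leading `>` from headers. The name `iter_fasta` is hypothetical.

```python
from typing import Iterable, Iterator, Tuple

def iter_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines (hypothetical sketch)."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []  # assumption: '>' is not kept
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)

text = """>sp|P1|TEST1 first
MKTAYIAK
QRQISFVK
>sp|P2|TEST2 second
GGGGGGR
"""
records = list(iter_fasta(text.splitlines()))
print(records)
# [('sp|P1|TEST1 first', 'MKTAYIAKQRQISFVK'), ('sp|P2|TEST2 second', 'GGGGGGR')]
```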

masskit.apps.process.libraries.fasta2peptides.fasta2peptides_app(cfg: omegaconf.DictConfig) → None

masskit.apps.process.libraries.fasta2peptides.fasta_parse_id(header)

Extract an accession from a UniProt FASTA header string. UniProtKB headers use the following format: >db|UniqueIdentifier|EntryName ProteinName OS=OrganismName OX=OrganismIdentifier [GN=GeneName ]PE=ProteinExistence SV=SequenceVersion

Parameters:: header – UniProt-formatted FASTA header string
Returns:: accession string
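Given the header format above, the accession is the UniqueIdentifier between the first and second `|`. A minimal sketch of that extraction. It assumes the header may or may not still carry the leading `>` and raises on headers without at least three `|`-separated fields. How masskit's fasta_parse_id handles non-UniProt headers is not documented here. The name `parse_uniprot_accession` is hypothetical.

```python
def parse_uniprot_accession(header: str) -> str:
    """Return the UniqueIdentifier field of a UniProtKB FASTA header (hypothetical sketch)."""
    # >db|UniqueIdentifier|EntryName ... ; tolerate a missing leading '>'
    fields = header.lstrip(">").split("|")
    if len(fields) < 3:
        raise ValueError(f"not a UniProtKB-style header: {header!r}")
    return fields[1]

hdr = ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2"
print(parse_uniprot_accession(hdr))  # P69905
```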

masskit.apps.process.libraries.fasta2peptides.nonspecific(residues, min, max, missed)

Yield nonspecific peptides, i.e. peptides produced by cleavage at every possible location.

Parameters:

residues – protein string
min – minimum length for returned peptides
max – maximum length for returned peptides
missed – maximum number of possible missed cleavages

Returns:

generator of peptides, in order
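If every residue boundary is treated as a cleavage site, a nonspecific digest reduces to enumerating all substrings within the length bounds. The sketch below does only that and does not model the `missed` parameter, because how the documented function applies it to a nonspecific digest is not described here. The name `nonspecific_peptides` is hypothetical.

```python
from typing import Iterator

def nonspecific_peptides(residues: str, min_len: int, max_len: int) -> Iterator[str]:
    """Yield every substring whose length lies in [min_len, max_len].

    Hypothetical sketch: ordered by start position, then by length;
    missed cleavages are not modeled.
    """
    n = len(residues)
    for start in range(n):
        for length in range(min_len, max_len + 1):
            if start + length > n:
                break  # longer peptides would run past the protein end
            yield residues[start:start + length]

print(list(nonspecific_peptides("ACDEF", 3, 4)))
# ['ACD', 'ACDE', 'CDE', 'CDEF', 'DEF']
```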

class masskit.apps.process.libraries.fasta2peptides.pepgen(peptides, pep2proteins, cfg)

Bases: object

add_row(row)

Add a row to the row cache.

Parameters:: row – the new row

enumerate()

Perform the work to generate all of the decorated peptides.

Returns:: the completed pyarrow table

finalize_table()

Retrieve the completed table.

Returns:: a pyarrow table

permute_mods(pep, mods, max_mods=4)

Yield all possible permutations of the set of modifications applied to the given peptide.

Parameters:

pep – a peptide string
mods – list of mods
max_mods – maximum number of modifications to apply at one time

Returns:

generator of peptides with modifications applied
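The enumeration described above can be sketched with `itertools`: pick up to `max_mods` modifiable sites, then pick one modification for each chosen site. The representation here is entirely hypothetical. A mod is a `(residue, name)` pair, each site carries at most one mod, the unmodified peptide is yielded first, and a modified residue is written as `M[Ox]`. masskit's own mod format and ordering may differ. The name `modified_variants` is hypothetical.

```python
from itertools import combinations, product
from typing import Iterator, List, Tuple

def modified_variants(pep: str, mods: List[Tuple[str, str]], max_mods: int = 4) -> Iterator[str]:
    """Yield pep with every combination of up to max_mods modifications (hypothetical sketch)."""
    # Map each position to the names of the mods that can sit on that residue.
    names_at = {}
    for i, aa in enumerate(pep):
        names = [name for residue, name in mods if residue == aa]
        if names:
            names_at[i] = names
    sites = sorted(names_at)
    # k = 0 yields the unmodified peptide first.
    for k in range(min(max_mods, len(sites)) + 1):
        for chosen in combinations(sites, k):
            # Pick one modification for each chosen site.
            for picked in product(*(names_at[i] for i in chosen)):
                applied = dict(zip(chosen, picked))
                yield "".join(f"{aa}[{applied[i]}]" if i in applied else aa
                              for i, aa in enumerate(pep))

for variant in modified_variants("MSM", [("M", "Ox"), ("S", "Ph")], max_mods=1):
    print(variant)
# MSM
# M[Ox]SM
# MS[Ph]M
# MSM[Ox]
```

Capping the number of chosen sites, rather than filtering afterwards, keeps the number of generated variants bounded by `max_mods`.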

masskit.apps.process.libraries.fasta2peptides.semitryptic(residues, min, max, missed)

Yield semi-tryptic peptides, i.e. peptides that were cleaved by trypsin at the C-terminal side of arginine (R) or lysine (K) at one end but not at the other.

Parameters:

residues – protein string
min – minimum length for returned peptides
max – maximum length for returned peptides
missed – maximum number of possible missed cleavages

Returns:

generator of peptides, in order
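One way to sketch the semi-tryptic definition is by position. First compute the trypsin cut positions. Then keep `residues[i:j]` when exactly one of `i` and `j` is a cut position, and the number of cut positions strictly inside the span does not exceed `missed`. This sketch assumes both protein termini count as cleavage sites, excludes fully tryptic peptides, and orders results by start then end position. None of this is confirmed as masskit's exact behavior. The names are hypothetical.

```python
import re
from typing import Iterator, List

def cleavage_sites(residues: str) -> List[int]:
    """Trypsin cut positions (after K or R, not before P) plus both protein termini."""
    n = len(residues)
    internal = {m.end() for m in re.finditer(r"[KR](?!P)", residues) if m.end() < n}
    return sorted(internal | {0, n})

def semitryptic_peptides(residues: str, min_len: int, max_len: int, missed: int) -> Iterator[str]:
    """Yield peptides with exactly one tryptic end (hypothetical sketch)."""
    sites = cleavage_sites(residues)
    site_set = set(sites)
    n = len(residues)
    for i in range(n):
        for j in range(i + min_len, min(i + max_len, n) + 1):
            # Semi-specific: exactly one of the two ends is a cleavage site.
            if (i in site_set) == (j in site_set):
                continue
            # Missed cleavages = cut sites strictly inside residues[i:j].
            if sum(1 for s in sites if i < s < j) <= missed:
                yield residues[i:j]

print(list(semitryptic_peptides("GASKLLR", 3, 5, 1)))
# ['GAS', 'GASKL', 'ASK', 'SKLLR', 'KLLR']
```

The fully tryptic peptide LLR does not appear, because both of its ends are cleavage sites.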

masskit.apps.process.libraries.fasta2peptides.trypsin(residues)

Apply the trypsin cleavage rule to yield peptides from a given protein string. Trypsin cleaves after R or K, but not before P.

Parameters:: residues – protein string
Returns:: generator of peptides, in order
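The cleavage rule above maps directly onto a zero-width regular expression: split at any position preceded by K or R and not followed by P. Splitting on an empty match requires Python 3.7 or later. This is a sketch of the rule, not masskit's trypsin function. The name `trypsin_fragments` is hypothetical.

```python
import re
from typing import Iterator

def trypsin_fragments(residues: str) -> Iterator[str]:
    """Yield fully cleaved tryptic fragments in order (hypothetical sketch)."""
    # Zero-width split point: preceded by K/R and not followed by P.
    for fragment in re.split(r"(?<=[KR])(?!P)", residues):
        if fragment:  # a trailing K/R produces an empty final piece
            yield fragment

print(list(trypsin_fragments("MAGKPLRSTKAR")))
# ['MAGKPLR', 'STK', 'AR']
```

The K in KP is not cut, which is why MAGKPLR stays in one piece.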

masskit.apps.process.libraries.fasta2peptides.tryptic(residues, min, max, missed)

Control a tryptic digest by limiting peptide length and allowing for missed cleavages.

Parameters:

residues – protein string
min – minimum length for returned peptides
max – maximum length for returned peptides
missed – maximum number of possible missed cleavages

Returns:

generator of peptides, in order
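A controlled digest of this kind can be sketched by first cutting the protein into fully tryptic fragments and then joining runs of up to `missed + 1` consecutive fragments. A run of `k` fragments contains `k - 1` missed cleavages. Only joined peptides within the length bounds are kept. The ordering (by starting fragment, then by increasing number of missed cleavages) and the name `tryptic_peptides` are assumptions.

```python
import re
from typing import Iterator

def tryptic_peptides(residues: str, min_len: int, max_len: int, missed: int) -> Iterator[str]:
    """Yield tryptic peptides with up to `missed` missed cleavages (hypothetical sketch)."""
    # Fully cleaved fragments: cut after K/R unless followed by P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", residues) if f]
    for start in range(len(fragments)):
        # end - start - 1 is the number of missed cleavages in the joined peptide.
        for end in range(start + 1, min(start + missed + 1, len(fragments)) + 1):
            peptide = "".join(fragments[start:end])
            if min_len <= len(peptide) <= max_len:
                yield peptide

print(list(tryptic_peptides("MAGKPLRSTKAR", 2, 10, 1)))
# ['MAGKPLR', 'MAGKPLRSTK', 'STK', 'STKAR', 'AR']
```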

masskit.apps.process.libraries.optimize_parquet module

masskit.apps.process.libraries.parquet_info module

masskit.apps.process.libraries.pepsearch2parquet module

masskit.apps.process.libraries.pepsearch2parquet.read_pepsearch(file_in, df_in=None, filename_fields_in=None, title_fields_in=None)

Read in a file from mspepsearch and turn it into a standard arrow table.

Parameters:

file_in – stream or filename of the TSV file to read in
df_in – the dataframe to concatenate to; if not given, a new dataframe is created
filename_fields_in – regex for pulling fields from the filename
title_fields_in – regex for pulling fields from the Unknown name

Returns:: the dataframe
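The two regex parameters suggest that extra columns are pulled out of the filename and out of the Unknown (query title) column. The sketch below shows one plausible way to apply them using only the standard library. It returns a list of row dicts rather than a dataframe or arrow table. It assumes both regexes use named groups whose names become the new column names, and that the query title sits in a column literally called `Unknown`. The other column names in the sample (`Peptide`, `Score`) and the function name `read_search_tsv` are hypothetical.

```python
import csv
import io
import re
from typing import Dict, List, Optional

def read_search_tsv(stream, filename: str,
                    filename_fields: Optional[str] = None,
                    title_fields: Optional[str] = None) -> List[Dict[str, str]]:
    """Read tab-separated search results, adding regex-extracted fields (hypothetical sketch)."""
    file_extra: Dict[str, str] = {}
    if filename_fields:
        m = re.search(filename_fields, filename)
        if m:
            file_extra = m.groupdict()  # named groups become new fields
    rows = []
    for row in csv.DictReader(stream, delimiter="\t"):
        row.update(file_extra)
        if title_fields:
            m = re.search(title_fields, row.get("Unknown", ""))
            if m:
                row.update(m.groupdict())
        rows.append(row)
    return rows

tsv = "Unknown\tPeptide\tScore\nscan=17 z=2\tPEPTIDEK\t812\n"
rows = read_search_tsv(io.StringIO(tsv), "run01_hcd30.tsv",
                       filename_fields=r"(?P<run>run\d+)_(?P<method>\w+?)\.tsv",
                       title_fields=r"scan=(?P<scan>\d+)")
print(rows[0]["run"], rows[0]["method"], rows[0]["scan"])  # run01 hcd30 17
```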

masskit.apps.process.libraries.pubchem_links module

class masskit.apps.process.libraries.pubchem_links.Analyze(cfg)

Bases: object

do_joins()

load_cas()

load_nist_data()

patent_counts()

pmid_counts()

save_data()

wikipedia_counts()

class masskit.apps.process.libraries.pubchem_links.Download(urls: Iterable[str], dest_dir: str)

Bases: object

download_url(task_id: rich.progress.TaskID, url: str, path)

class masskit.apps.process.libraries.pubchem_links.PubChemCAS(cfg)

Bases: object

get_pubchem_cas()

parse_pubchem_json()

pubchem_cas(session, page)

use_pubchem_cache(filename)

class masskit.apps.process.libraries.pubchem_links.PubChemFTP(cfg)

Bases: object

cache_pubchem_files(cfg: omegaconf.DictConfig)

get_convert_options(col_types)

get_csv_type(t)

get_new_columns(old_columns, cfg: omegaconf.DictConfig)

process(table: Table, cfg: omegaconf.DictConfig, column_names)

class masskit.apps.process.libraries.pubchem_links.PubChemWiki(cfg)

Bases: object

fetch_counts()

get_pubchem_wiki()

parse_counts(wikijson)

parse_pubchem_json()

pubchem_wiki(session, page)

use_pubchem_cache(filename)

masskit.apps.process.libraries.pubchem_links.main(cfg: omegaconf.DictConfig) → int

masskit.apps.process.libraries.rewrite_sdf module

masskit.apps.process.libraries.rewrite_sdf.rewrite_sdf_app(config)

masskit.apps.process.libraries.transform_table module

masskit.apps.process.libraries.transform_table.cast_columns(table: Table, cfg: omegaconf.DictConfig)

masskit.apps.process.libraries.transform_table.compress_start_stop(table: Table)

masskit.apps.process.libraries.transform_table.concat_to_output(files: list, sort: omegaconf.DictConfig, output: omegaconf.DictConfig)

masskit.apps.process.libraries.transform_table.get_sort_index(table: Table, cfg: omegaconf.DictConfig)

masskit.apps.process.libraries.transform_table.misc_operations(table: Table, cfg: omegaconf.DictConfig)

masskit.apps.process.libraries.transform_table.process_and_cache(infile: Path, outfile: Path, sort: omegaconf.DictConfig, batch_size=500, casts=None, operations=None) → Path

masskit.apps.process.libraries.transform_table.str2pyarrow_type(s: str) → DataType

masskit.apps.process.libraries.transform_table.transform_table_app(cfg: omegaconf.DictConfig) → None

masskit.apps.process.libraries.update_sets module

masskit.apps.process.libraries.update_sets.update_sets_app(config: omegaconf.DictConfig) → None

Module contents
