masskit.apps.process.libraries package

Subpackages

Submodules

masskit.apps.process.libraries.batch_converter module

masskit.apps.process.libraries.batch_converter.batch_converter_app(config: omegaconf.DictConfig) → None
masskit.apps.process.libraries.batch_converter.write_batch(writers, table)

masskit.apps.process.libraries.converter module

masskit.apps.process.libraries.converter.converter_app(config: omegaconf.DictConfig) → None
masskit.apps.process.libraries.converter.disable_console_logging()

masskit.apps.process.libraries.fasta2peptides module

class masskit.apps.process.libraries.fasta2peptides.PepTuple(nterm, pep, cterm)

Bases: tuple

cterm

Alias for field number 2

nterm

Alias for field number 0

pep

Alias for field number 1
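The field aliases above are what `collections.namedtuple` produces. A minimal sketch of an equivalent definition follows; the example values, and the reading of nterm and cterm as the residues flanking the peptide, are assumptions rather than documented behavior.

```python
from collections import namedtuple

# Same shape as the documented class: three positional fields.
PepTuple = namedtuple("PepTuple", ["nterm", "pep", "cterm"])

# Illustrative values only; the meaning of nterm/cterm is an assumption.
example = PepTuple(nterm="K", pep="PEPTIDER", cterm="A")

# Fields can be read by name or by the field number listed above.
by_name = example.pep
by_index = example[1]
```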

masskit.apps.process.libraries.fasta2peptides.count_rhk(peptide)

Return a count of the basic residues in a peptide

Parameters:

peptide – peptide string

Returns:

count of basic residues
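A minimal sketch of such a count, assuming (from the function name) that the basic residues are arginine (R), histidine (H) and lysine (K). The helper name `count_basic_residues` is hypothetical and this is not necessarily how masskit implements it.

```python
def count_basic_residues(peptide: str) -> int:
    """Count R, H and K residues in a one-letter peptide string."""
    # Assumption: the peptide is an upper-case one-letter sequence.
    return sum(peptide.count(residue) for residue in "RHK")
```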

masskit.apps.process.libraries.fasta2peptides.extract_peptides(cfg)

Extract all peptides from all of the proteins in a fasta file according to the specified digestion strategy.

Parameters:

cfg – configuration parameters

Returns:

non-redundant list of peptides.

masskit.apps.process.libraries.fasta2peptides.fasta(filename)

Iterate over a fasta file, yielding tuples of (header, sequence).

Parameters:

filename – name of fasta file

Returns:

yield (header, sequence) tuples
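The general shape of such an iterator can be sketched as below. This is an illustration, not masskit's code: the hypothetical `iter_fasta` takes an open text handle instead of a filename so that it can be tried on an in-memory string, and it strips the leading ">" from the header, which the documented function may or may not do.

```python
import io


def iter_fasta(handle):
    """Yield (header, sequence) tuples from an open fasta text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


# Made-up records for illustration.
records = list(iter_fasta(io.StringIO(">a desc\nMKT\nLLV\n>b\nAAA\n")))
```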

masskit.apps.process.libraries.fasta2peptides.fasta2peptides_app(cfg: omegaconf.DictConfig) → None
masskit.apps.process.libraries.fasta2peptides.fasta_parse_id(header)

Extract an accession from a UniProt fasta file header string. UniProtKB uses the following format:

>db|UniqueIdentifier|EntryName ProteinName OS=OrganismName OX=OrganismIdentifier [GN=GeneName ]PE=ProteinExistence SV=SequenceVersion

Parameters:

header – UniProt formatted fasta header string

Returns:

accession string
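Given the format above, the accession is the second pipe-delimited field. A minimal sketch, assuming a well-formed header; the function name `parse_uniprot_accession` and the error handling are hypothetical, not masskit's.

```python
def parse_uniprot_accession(header: str) -> str:
    """Return the UniqueIdentifier field of a UniProt-style header."""
    # Layout: >db|UniqueIdentifier|EntryName ProteinName OS=...
    fields = header.lstrip(">").split("|")
    if len(fields) < 3:
        raise ValueError(f"not a UniProt-style header: {header!r}")
    return fields[1]


accession = parse_uniprot_accession(
    ">sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens "
    "OX=9606 GN=HBA1 PE=1 SV=2"
)
```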

masskit.apps.process.libraries.fasta2peptides.nonspecific(residues, min, max, missed)

Yield nonspecific peptides, which are peptides that may be cleaved at any location.

Parameters:
  • residues – protein string

  • min – minimum length for returned peptides

  • max – maximum length for returned peptides

  • missed – maximum number of possible missed cleavages

Returns:

yield peptides in order
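When every position is a potential cleavage site, the candidates are simply all substrings within the length limits. The sketch below illustrates that idea under stated assumptions: `nonspecific_peptides`, `min_len` and `max_len` are hypothetical names, and the `missed` parameter is left out because the documentation does not say how it applies to a nonspecific digest.

```python
def nonspecific_peptides(residues, min_len, max_len):
    """Yield every substring whose length is in [min_len, max_len]."""
    n = len(residues)
    for start in range(n):
        # The end index is exclusive and capped at the protein length.
        for end in range(start + min_len, min(start + max_len, n) + 1):
            yield residues[start:end]


peptides = list(nonspecific_peptides("ABCD", 2, 3))
```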

class masskit.apps.process.libraries.fasta2peptides.pepgen(peptides, pep2proteins, cfg)

Bases: object

add_row(row)

Add a row to the row cache

Parameters:

row – the new row

enumerate()

Perform the work to generate all of the decorated peptides

Returns:

The completed pyarrow table

finalize_table()

Retrieve the completed table

Returns:

a pyarrow table

permute_mods(pep, mods, max_mods=4)

Yield all of the possible permutations of the set of modifications applied to the given peptide

Parameters:
  • pep – a peptide string

  • mods – list of mods

  • max_mods – maximum number of modifications to apply at one time

Returns:

yield a peptide with modifications applied
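The enumeration idea can be sketched with `itertools.combinations`: collect the candidate sites, then yield every subset of at most max_mods sites. The representation below is hypothetical. Here `mods` is a dict from residue to modification name and each result is a (peptide, sites) pair, whereas the documented method takes a list of mods whose format is not described.

```python
from itertools import combinations


def enumerate_mod_sets(pep, mods, max_mods=4):
    """Yield (pep, sites); sites is a tuple of (position, mod name)."""
    # Every residue with an entry in `mods` is a candidate site.
    sites = [(i, mods[aa]) for i, aa in enumerate(pep) if aa in mods]
    # k = 0 yields the unmodified peptide first.
    for k in range(0, min(max_mods, len(sites)) + 1):
        for combo in combinations(sites, k):
            yield pep, combo


variants = list(enumerate_mod_sets("MAM", {"M": "Oxidation"}, max_mods=1))
```

Capping the subset size keeps the output manageable, since the number of subsets otherwise doubles with each additional candidate site.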

masskit.apps.process.libraries.fasta2peptides.semitryptic(residues, min, max, missed)

Yield semi-tryptic peptides, which are peptides cleaved by trypsin at the C-terminal side of arginine (R) or lysine (K) at one end but not the other.

Parameters:
  • residues – protein string

  • min – minimum length for returned peptides

  • max – maximum length for returned peptides

  • missed – maximum number of possible missed cleavages

Returns:

yield peptides in order
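One way to picture semi-tryptic peptides is to start from a fully tryptic peptide and trim one end: a proper prefix keeps the tryptic N-terminus, and a proper suffix keeps the tryptic C-terminus. The sketch below applies that idea to a single peptide. The name `semi_tryptic_from` and its parameters are hypothetical, and masskit's generator may order or filter its output differently.

```python
def semi_tryptic_from(peptide, min_len, max_len):
    """Yield proper prefixes and suffixes of a fully tryptic peptide."""
    seen = set()
    shortest = max(min_len, 1)                # avoid zero-length slices
    longest = min(max_len, len(peptide) - 1)  # proper sub-peptides only
    for length in range(shortest, longest + 1):
        # Prefix: tryptic N-terminus. Suffix: tryptic C-terminus.
        for sub in (peptide[:length], peptide[-length:]):
            if sub not in seen:
                seen.add(sub)
                yield sub


semi = list(semi_tryptic_from("PEPTIDEK", 6, 7))
```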

masskit.apps.process.libraries.fasta2peptides.trypsin(residues)

Follow the rules for a tryptic digestion enzyme to yield peptides from a given protein string. The cleavage rule for trypsin is: after R or K, but not before P

Parameters:

residues – protein string

Returns:

yield peptides in order
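The cleavage rule can be sketched as a single pass over the protein string. This is an illustration of the rule as stated, with a hypothetical name `trypsin_cleave`, not masskit's implementation.

```python
def trypsin_cleave(residues):
    """Yield fragments cut after R or K, except when P follows."""
    start = 0
    for i, aa in enumerate(residues):
        next_aa = residues[i + 1] if i + 1 < len(residues) else ""
        if aa in ("K", "R") and next_aa != "P":
            yield residues[start:i + 1]
            start = i + 1
    if start < len(residues):
        yield residues[start:]  # C-terminal remainder


fragments = list(trypsin_cleave("AKRPGKM"))
```

In the example, the cut after K at position 2 is made, the R followed by P is skipped, and the trailing M is emitted as the final fragment.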

masskit.apps.process.libraries.fasta2peptides.tryptic(residues, min, max, missed)

Control a tryptic digest by limiting the length and allowing for missed cleavages

Parameters:
  • residues – protein string

  • min – minimum length for returned peptides

  • max – maximum length for returned peptides

  • missed – maximum number of possible missed cleavages

Returns:

yield peptides in order
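Missed cleavages can be modeled by joining up to missed + 1 consecutive tryptic fragments and keeping the results that satisfy the length limits. The sketch below is a self-contained illustration under that assumption. The names `tryptic_peptides`, `min_len` and `max_len` are hypothetical, and splitting on a zero-width regex match requires Python 3.7 or later.

```python
import re


def tryptic_peptides(residues, min_len, max_len, missed):
    """Yield tryptic peptides with up to `missed` missed cleavages."""
    # Cut after K or R unless the next residue is P; drop empty pieces.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", residues) if f]
    for i in range(len(fragments)):
        # Joining j - i + 1 fragments implies j - i missed cleavages.
        for j in range(i, min(i + missed + 1, len(fragments))):
            peptide = "".join(fragments[i:j + 1])
            if min_len <= len(peptide) <= max_len:
                yield peptide


digest = list(tryptic_peptides("AKRPGKM", 2, 10, 1))
```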

masskit.apps.process.libraries.optimize_parquet module

masskit.apps.process.libraries.parquet_info module

masskit.apps.process.libraries.pepsearch2parquet module

masskit.apps.process.libraries.pepsearch2parquet.read_pepsearch(file_in, df_in=None, filename_fields_in=None, title_fields_in=None)

Read in a file from mspepsearch and turn it into a standard arrow table.

Parameters:
  • file_in – stream or filename of tsv file to read in

  • df_in – the dataframe to concatenate to, otherwise a dataframe is created

  • filename_fields_in – regex for pulling fields from the filename

  • title_fields_in – regex for pulling fields from the Unknown name

Returns:

the dataframe
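The role of a field-extraction regex can be illustrated with the standard library alone: named groups matched against the filename become extra columns on every row. Everything in this sketch is hypothetical, including the function name, the column names, the filename and the regex. It returns plain dicts, not the dataframe or arrow table that read_pepsearch works with.

```python
import csv
import io
import re


def read_tsv_with_fields(stream, filename, filename_fields=None):
    """Read tsv rows and add named regex groups taken from filename."""
    extra = {}
    if filename_fields:
        match = re.search(filename_fields, filename)
        if match:
            extra = match.groupdict()
    reader = csv.DictReader(stream, delimiter="\t")
    return [dict(row, **extra) for row in reader]


# Made-up input: two rows, and a collision energy in the filename.
tsv = "Unknown\tScore\nspec1\t512\nspec2\t433\n"
rows = read_tsv_with_fields(
    io.StringIO(tsv), "run_30eV.tsv", r"_(?P<ev>\d+)eV"
)
```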

masskit.apps.process.libraries.rewrite_sdf module

masskit.apps.process.libraries.rewrite_sdf.rewrite_sdf_app(config)

masskit.apps.process.libraries.transform_table module

masskit.apps.process.libraries.transform_table.cast_columns(table: Table, cfg: omegaconf.DictConfig)
masskit.apps.process.libraries.transform_table.compress_start_stop(table: Table)
masskit.apps.process.libraries.transform_table.concat_to_output(files: list, sort: omegaconf.DictConfig, output: omegaconf.DictConfig)
masskit.apps.process.libraries.transform_table.get_sort_index(table: Table, cfg: omegaconf.DictConfig)
masskit.apps.process.libraries.transform_table.misc_operations(table: Table, cfg: omegaconf.DictConfig)
masskit.apps.process.libraries.transform_table.process_and_cache(infile: Path, outfile: Path, sort: omegaconf.DictConfig, batch_size=500, casts=None, operations=None) → Path
masskit.apps.process.libraries.transform_table.str2pyarrow_type(s: str) → DataType
masskit.apps.process.libraries.transform_table.transform_table_app(cfg: omegaconf.DictConfig) → None

masskit.apps.process.libraries.update_sets module

masskit.apps.process.libraries.update_sets.update_sets_app(config: omegaconf.DictConfig) → None

Module contents