masskit.peptide package

Submodules

masskit.peptide.encoding module

masskit.peptide.encoding.calc_ion_series(ion_type, num_isotopes, cumulative_masses, arrays, peptide, mod_names, mod_positions, neutral_loss, charge_in, analysis, positions, start_offset=0, max_internal_size=7)
masskit.peptide.encoding.calc_ions_mz(peptide, ion_types_in, mod_names=None, mod_positions=None, analysis_annotations=False, precursor_charge=2, num_isotopes=2, max_internal_size=7)

Calculate the m/z values of an ion type. Default values are taken from the HCD values in https://pubs.acs.org/doi/full/10.1021/pr3007045

Parameters:
  • peptide – the peptide sequence

  • ion_types_in – a tuple, or an array of tuples, each giving an ion type and charge

  • mod_names – any modifications

  • mod_positions – the positions of the modifications

  • analysis_annotations – add additional annotations useful for analyzing spectra

  • precursor_charge – used to filter out ion types with charge greater than the precursor

  • num_isotopes – number of carbon-13 isotopes to calculate

Returns:

numpy arrays of the m/z values for the ion series, ion intensities, annotations as an arrow list, precursor mass, and fields used for analyzing ion peaks
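As a rough illustration of what an ion-series calculation involves, the sketch below computes b and y ion m/z values for an unmodified peptide from running sums of residue masses. It is not masskit's implementation: the helper name `simple_by_ions`, the residue mass table, and the constants are assumptions made for this example, and isotopes, neutral losses, internal ions, and modifications are all omitted.

```python
# Monoisotopic residue masses (Da) for a few amino acids; illustrative subset only.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "L": 113.08406, "K": 128.09496, "R": 156.10111,
}
WATER = 18.010565   # monoisotopic mass of H2O
PROTON = 1.007276   # mass of a proton


def simple_by_ions(peptide, charge=1):
    """Hypothetical helper: return (b_mz, y_mz) lists for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    n = len(masses)
    b_mz, y_mz = [], []
    prefix = 0.0  # running mass of the first i residues (b ions)
    suffix = 0.0  # running mass of the last i residues (y ions)
    for i in range(1, n):
        prefix += masses[i - 1]
        suffix += masses[n - i]
        b_mz.append((prefix + charge * PROTON) / charge)
        # y ions keep the C-terminal OH and an extra H, i.e. one water.
        y_mz.append((suffix + WATER + charge * PROTON) / charge)
    return b_mz, y_mz


b_ions, y_ions = simple_by_ions("GASK")
```

A peptide of length n yields n - 1 fragments per series in this sketch, so "GASK" gives three b ions and three y ions.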

masskit.peptide.encoding.calc_named_ions(arrays, analysis=None, named_ion=None, precursor_mass=None, precursor_charge=None, charge_in=None, neutral_loss=None, num_isotopes=2)
masskit.peptide.encoding.calc_precursor_mass(peptide, mod_names=None, mod_positions=None)

Calculate the mass of a modified peptide.

Parameters:
  • peptide – the peptide

  • mod_names – the modification ids

  • mod_positions – the positions of the modifications

Returns:

the mass
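The underlying arithmetic can be sketched as the sum of the residue masses, one water for the termini, and the mass shift of each modification. The function name `sketch_precursor_mass`, both lookup tables, and the use of string keys for modifications are assumptions for illustration; masskit documents `mod_names` as modification ids and may resolve masses differently.

```python
# Monoisotopic residue masses (Da); illustrative subset only.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "C": 103.00919,
    "M": 131.04049, "K": 128.09496,
}
# Hypothetical modification table keyed by name: monoisotopic mass shifts (Da).
MOD_MASS = {"Phospho": 79.96633, "Oxidation": 15.99491, "Carbamidomethyl": 57.02146}
WATER = 18.010565  # H on the N-terminus plus OH on the C-terminus


def sketch_precursor_mass(peptide, mod_names=None, mod_positions=None):
    """Neutral monoisotopic mass of a peptide plus its modifications."""
    mod_names = list(mod_names or [])
    mod_positions = list(mod_positions or [])
    if len(mod_names) != len(mod_positions):
        raise ValueError("mod_names and mod_positions must have the same length")
    if any(not 0 <= pos < len(peptide) for pos in mod_positions):
        raise ValueError("modification position outside the peptide")
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    # Positions are validated above but do not change the total mass.
    mass += sum(MOD_MASS[name] for name in mod_names)
    return mass


unmodified = sketch_precursor_mass("GAS")
phosphorylated = sketch_precursor_mass("GAS", ["Phospho"], [2])
```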

masskit.peptide.encoding.calc_precursor_mz(peptide, charge, mod_names=None, mod_positions=None)

Calculate the m/z of a modified peptide.

Parameters:
  • peptide – the peptide

  • charge – the charge of the peptide

  • mod_names – the modification ids

  • mod_positions – the positions of the modifications

Returns:

the m/z

masskit.peptide.encoding.expand_mod_string(mod_string)

Decode a modification string into site and position.

Parameters:

mod_string – the standard modification string, e.g. “A” or “A0” or “$”

Returns:

tuple of site, position
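The decoding can be sketched from the position codes described under parse_modification_encoding. This is a minimal sketch, not masskit's code: the name `sketch_expand_mod_string` is hypothetical, and returning an empty string as the position of a bare amino acid such as "A" is an assumption, since the documentation does not say what value is used in that case.

```python
POSITION_CODES = {"0", ".", "^", "$"}
# A position code used alone is shorthand for a two-character encoding.
BARE_POSITION = {"0": "00", ".": "..", "^": "0^", "$": ".$"}


def sketch_expand_mod_string(mod_string):
    """Split a modification string into (site, position)."""
    if not mod_string:
        raise ValueError("empty modification string")
    mod_string = BARE_POSITION.get(mod_string, mod_string)
    if len(mod_string) > 2:
        raise ValueError(f"unexpected modification string: {mod_string!r}")
    site, position = mod_string[0], mod_string[1:]
    if position and position not in POSITION_CODES:
        raise ValueError(f"unknown position code: {position!r}")
    # Assumption: "" means no position restriction for a bare amino acid.
    return site, position


examples = [sketch_expand_mod_string(s) for s in ("A", "A0", "$")]
```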

masskit.peptide.encoding.mod_mass_pos(mod_positions, mod_names, i)

At a given position in the sequence, find any matching modification positions in mod_positions and sum the masses of those modifications.

Parameters:
  • mod_positions – mod positions

  • mod_names – mod names

  • i – position in peptide

Returns:

masses of matching modifications
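A minimal sketch of this lookup, assuming modifications are identified by name and that their masses come from a small hypothetical table (`MOD_MASS`); how masskit actually maps modification ids to masses is not shown in this reference.

```python
# Hypothetical modification table: monoisotopic mass shifts in Da.
MOD_MASS = {"Phospho": 79.96633, "Oxidation": 15.99491}


def sketch_mod_mass_pos(mod_positions, mod_names, i):
    """Sum the masses of all modifications located at position i."""
    return sum(
        MOD_MASS[name]
        for pos, name in zip(mod_positions, mod_names)
        if pos == i
    )


# Two modifications sit at position 1, one at position 3.
total_at_1 = sketch_mod_mass_pos([1, 1, 3], ["Phospho", "Oxidation", "Phospho"], 1)
```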

masskit.peptide.encoding.parse_ion_type_tuple(tuple_in, precursor_charge)

Split an ion_type tuple into ion type and neutral loss, if specified.

Parameters:

tuple_in – ion type tuple

Raises:

ValueError – more than one neutral loss

Returns:

ion type, neutral loss

masskit.peptide.encoding.parse_modification_encoding(modification_encoding)

Takes a string containing a set of modification strings and creates a list of tuples. The tuples contain the modification name, the site, and the position of the modification. The string has the following format:

Site encoding of a modification: A-Y amino acid

which can be followed by a modification position encoding:
  • 0 – peptide N-terminus

  • . – peptide C-terminus

  • ^ – protein N-terminus

  • $ – protein C-terminus

For example, ‘K.’ means lysine at the C-terminus of the peptide. The position encoding can also be used on its own, e.g. ‘^’ means apply to any protein N-terminus, regardless of amino acid.

A list of modifications is separated by hashes: Phospho{S}#Methyl{0/I}#Carbamidomethyl#Deamidated{F^/Q/N}

An optional list of sites is specified within the {} for each modification. If there are no ‘{}’ then a default set of sites is used. Multiple sites are separated by a ‘/’.

Position codes used on their own are shorthand:
  • “0” by itself implies “00”

  • “.” by itself implies “..”

  • “^” by itself implies “0^”

  • “$” by itself implies “.$”

Parameters:

modification_encoding – a string containing the above format

Returns:

list of tuples, each tuple has modification name, site, and position
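The first stage of parsing this format, splitting the string into modification names and their raw site strings, can be sketched as follows. The function name `sketch_split_encoding` and the regular expression are assumptions for illustration. Expanding each site string into a site and position, and substituting the default sites when no ‘{}’ is given, are left out because the defaults are not listed here.

```python
import re

# A name, optionally followed by a brace-delimited list of sites.
_MOD_PATTERN = re.compile(r"^(?P<name>[^{}#]+)(?:\{(?P<sites>[^{}]*)\})?$")


def sketch_split_encoding(modification_encoding):
    """Return a list of (name, sites) pairs; sites is None when no {} is given."""
    result = []
    for entry in modification_encoding.split("#"):
        match = _MOD_PATTERN.match(entry.strip())
        if match is None:
            raise ValueError(f"cannot parse modification entry: {entry!r}")
        sites = match.group("sites")
        # Multiple sites inside the braces are separated by "/".
        site_list = sites.split("/") if sites is not None else None
        result.append((match.group("name"), site_list))
    return result


parsed = sketch_split_encoding(
    "Phospho{S}#Methyl{0/I}#Carbamidomethyl#Deamidated{F^/Q/N}"
)
```

Here `None` marks an entry such as Carbamidomethyl that would fall back to the default set of sites.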

masskit.peptide.encoding.protonate_mass(mass, z)

Given a neutral mass and charge of an ion, calculate the m/z of the ion

Parameters:
  • mass – mass

  • z – charge

Returns:

m/z
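For a positively charged ion the usual relation is m/z = (mass + z × proton mass) / z. The sketch below assumes that convention and a positive charge; the name `sketch_protonate_mass` and the constant are illustrative, and masskit may treat negative charges or use a different constant.

```python
PROTON_MASS = 1.007276466  # mass of a proton in Da


def sketch_protonate_mass(mass, z):
    """m/z of an ion formed by adding z protons to a neutral mass."""
    if z <= 0:
        raise ValueError("this sketch only handles positive charges")
    return (mass + z * PROTON_MASS) / z


mz_doubly_charged = sketch_protonate_mass(1000.0, 2)
```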

masskit.peptide.spectrum_generator module

masskit.peptide.spectrum_generator.add_theoretical_spectra(df, theoretical_spectrum_column=None, ion_types=None, num_isotopes=2)

Add theoretical spectra to a column.

Parameters:
  • df – dataframe containing spectra

  • theoretical_spectrum_column – name of the column to hold theoretical spectra

  • ion_types – ion types to generate. None is default for TheoreticalPeptideSpectrum

  • num_isotopes – number of carbon-13 isotopes to calculate

masskit.peptide.spectrum_generator.create_peptide_name(peptide, precursor_charge, mod_names=None, mod_positions=None, ev=None)

Create the name of a peptide spectrum.

Parameters:
  • peptide – the peptide string

  • precursor_charge – the precursor charge

  • mod_names – list of modification names (integer)

  • mod_positions – position of modifications, 0 based

  • ev – collision energy in eV

masskit.peptide.spectrum_generator.generate_mods(peptide, mod_list, n_peptide=False, c_peptide=False, mod_probability=None)

Given a peptide and a list of modifications expressed as tuples, place the allowable modifications on the peptide.

Parameters:
  • mod_list – the list of allowed modifications, expressed as a string (see encoding.py)

  • peptide – the peptide

  • n_peptide – is the peptide at the N terminus of the protein?

  • c_peptide – is the peptide at the C terminus of the protein?

  • mod_probability – the probability of a modification at a particular site. None=1.0

Returns:

list of modification names, list of modification positions
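One way such placement could work is sketched below, taking the allowed modifications as (name, site, position) tuples. Everything here is an assumption for illustration rather than masskit's behavior: the name `sketch_generate_mods`, the `rng` argument, treating the sites "0" and "." as matching any residue, using an empty position for no restriction, string modification names, and allowing at most one modification per residue.

```python
import random


def sketch_generate_mods(peptide, mod_tuples, n_peptide=False, c_peptide=False,
                         mod_probability=None, rng=None):
    """Return (names, positions) of modifications placed on the peptide."""
    rng = rng or random.Random()
    probability = 1.0 if mod_probability is None else mod_probability
    last = len(peptide) - 1
    names, positions = [], []
    for i, residue in enumerate(peptide):
        for name, site, position in mod_tuples:
            # Assumption: "0" and "." as a site match any residue.
            if site not in ("0", ".") and site != residue:
                continue
            allowed = (
                position == ""
                or (position == "0" and i == 0)
                or (position == "." and i == last)
                or (position == "^" and i == 0 and n_peptide)
                or (position == "$" and i == last and c_peptide)
            )
            if allowed and rng.random() < probability:
                names.append(name)
                positions.append(i)
                break  # at most one modification per residue in this sketch
    return names, positions


names, positions = sketch_generate_mods("ASKS", [("Phospho", "S", "")])
```

With `mod_probability=None` every eligible site is modified, so both serines in "ASKS" receive Phospho.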

masskit.peptide.spectrum_generator.generate_peptide_library(num=100, min_length=5, max_length=30, min_charge=1, max_charge=8, min_ev=10, max_ev=60, mod_list=None, set='train', mod_probability=0.1)

Generate a theoretical peptide library

Parameters:
  • set – which set to create, e.g. train, valid, test

  • num – the number of peptides

  • min_length – minimum length of the peptides

  • max_length – maximum length of the peptides

  • min_charge – the minimum charge of the peptides

  • max_charge – the maximum charge of the peptides

  • min_ev – the minimum eV (also used for nce)

  • max_ev – the maximum eV (also used for nce)

  • mod_list – the list of allowed modifications, expressed as a string (see encoding.py)

  • mod_probability – the probability of a modification at a particular site

Returns:

the dataframe
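To make the role of the range parameters concrete, the sketch below draws random peptides, charges, and collision energies within the given bounds. It is only an illustration: the name `sketch_peptide_library`, the `seed` argument, uniform sampling with inclusive bounds, and the record keys are assumptions, modifications and the train/valid/test split are ignored, and it returns a list of dicts rather than a dataframe.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def sketch_peptide_library(num=100, min_length=5, max_length=30,
                           min_charge=1, max_charge=8,
                           min_ev=10, max_ev=60, seed=None):
    """Return a list of records describing random peptides."""
    rng = random.Random(seed)
    records = []
    for _ in range(num):
        length = rng.randint(min_length, max_length)  # inclusive bounds
        records.append({
            "peptide": "".join(rng.choice(AMINO_ACIDS) for _ in range(length)),
            "charge": rng.randint(min_charge, max_charge),
            "ev": rng.uniform(min_ev, max_ev),
        })
    return records


library = sketch_peptide_library(num=10, seed=0)
```

A list of dicts like this can be passed to a dataframe constructor if a tabular form is needed.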

Module contents