masskit.peptide package
Submodules
masskit.peptide.encoding module
- masskit.peptide.encoding.calc_ion_series(ion_type, num_isotopes, cumulative_masses, arrays, peptide, mod_names, mod_positions, neutral_loss, charge_in, analysis, positions, start_offset=0, max_internal_size=7)
- masskit.peptide.encoding.calc_ions_mz(peptide, ion_types_in, mod_names=None, mod_positions=None, analysis_annotations=False, precursor_charge=2, num_isotopes=2, max_internal_size=7)
Calculate the m/z values of an ion type. Default values are taken from the HCD values in https://pubs.acs.org/doi/full/10.1021/pr3007045
- Parameters:
peptide – the peptide sequence
ion_types_in – a tuple, or an array of tuples, of ion type and charge
mod_names – the modification names
mod_positions – the positions of the modifications
analysis_annotations – add additional annotations useful for analyzing spectra
precursor_charge – used to filter out ion types with charge greater than the precursor
num_isotopes – number of carbon-13 isotopes to calculate
- Returns:
numpy arrays of the m/z values for the ion series and of the ion intensities, the annotations as an Arrow list, the precursor mass, and fields used for analysing ion peaks
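The internals of calc_ions_mz are not documented here. To illustrate what an ion series is, the sketch below computes b ions (N-terminal fragments plus protons) and y ions (C-terminal fragments plus water plus protons) from cumulative residue masses. It is not masskit's implementation: the residue table, the function name b_y_mz, and the restriction to unmodified peptides are assumptions, and it ignores isotopes, neutral losses, and internal ions.

```python
from itertools import accumulate

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O, monoisotopic
PROTON = 1.007276   # proton mass


def b_y_mz(peptide, charge=1):
    """Hypothetical helper: m/z lists for b1..b(n-1) and y1..y(n-1) of an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    # prefix[k - 1] is the summed residue mass of the first k residues
    prefix = list(accumulate(masses[:-1]))
    full = sum(masses)
    # b ions: N-terminal residues plus `charge` protons
    b = [(p + charge * PROTON) / charge for p in prefix]
    # y ions: C-terminal residues plus water plus `charge` protons; y1 comes first
    y = [(full - p + WATER + charge * PROTON) / charge for p in reversed(prefix)]
    return b, y
```

For example, b_y_mz("PEPTIDE") returns six b and six y ions; at charge 1, each complementary pair b_i and y_(n-i) sums to the neutral precursor mass plus two proton masses.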
- masskit.peptide.encoding.calc_named_ions(arrays, analysis=None, named_ion=None, precursor_mass=None, precursor_charge=None, charge_in=None, neutral_loss=None, num_isotopes=2)
- masskit.peptide.encoding.calc_precursor_mass(peptide, mod_names=None, mod_positions=None)
Calculate the mass of a modified peptide.
- Parameters:
peptide – the peptide
mod_names – the modification ids
mod_positions – the positions of the modifications
- Returns:
the mass
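A peptide's neutral monoisotopic mass is the sum of its residue masses, one water for the free termini, and the masses of any modifications. The sketch below shows that arithmetic; it is not masskit's code. The residue subset, the name-keyed MOD_MASS table, and the function name precursor_mass are hypothetical (masskit identifies modifications by its own ids).

```python
# Monoisotopic residue masses (Da); a subset of the 20 amino acids, for illustration.
RESIDUE_MASS = {"P": 97.05276, "E": 129.04259, "T": 101.04768,
                "I": 113.08406, "D": 115.02694, "S": 87.03203, "M": 131.04049}
# Hypothetical modification table keyed by name.
MOD_MASS = {"Phospho": 79.966331, "Oxidation": 15.994915}
WATER = 18.010565


def precursor_mass(peptide, mod_names=None, mod_positions=None):
    """Neutral monoisotopic mass: residues + terminal water + modifications."""
    mass = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    names = mod_names or []
    positions = mod_positions or []
    if len(names) != len(positions):
        raise ValueError("mod_names and mod_positions must have the same length")
    for name, pos in zip(names, positions):
        # the position does not change the precursor mass, but check it is in range
        if not 0 <= pos < len(peptide):
            raise ValueError(f"modification position {pos} out of range")
        mass += MOD_MASS[name]
    return mass
```

For example, precursor_mass("PEPTIDE") is about 799.360 Da, and phosphorylating the threonine at position 3 adds 79.966331 Da.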
- masskit.peptide.encoding.calc_precursor_mz(peptide, charge, mod_names=None, mod_positions=None)
Calculate the m/z of a modified peptide.
- Parameters:
peptide – the peptide
charge – the charge of the peptide
mod_names – the modification ids
mod_positions – the positions of the modifications
- Returns:
the m/z
- masskit.peptide.encoding.expand_mod_string(mod_string)
Decode a modification string into a site and a position.
- Parameters:
mod_string – the standard modification string, e.g. “A” or “A0” or “$”
- Returns:
tuple of site, position
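One plausible way to decode such strings is sketched below, applying the shorthand rules given under parse_modification_encoding ("0" means "00", "." means "..", "^" means "0^", "$" means ".$"). The function name expand_mod_string_sketch and the use of None for "no position given" are assumptions; masskit's actual return values may differ.

```python
# Shorthand position codes and their expansions.
SHORTHAND = {"0": "00", ".": "..", "^": "0^", "$": ".$"}
POSITION_CODES = set("0.^$")


def expand_mod_string_sketch(mod_string):
    """Hypothetical decoder: return (site, position); position is None if absent."""
    mod_string = SHORTHAND.get(mod_string, mod_string)
    site = mod_string[0]
    position = mod_string[1] if len(mod_string) > 1 else None
    if position is not None and position not in POSITION_CODES:
        raise ValueError(f"unknown position code in {mod_string!r}")
    return site, position
```

Under these assumptions, "A0" decodes to ("A", "0"), "K." to ("K", "."), "A" to ("A", None), and "$" to (".", "$").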
- masskit.peptide.encoding.mod_mass_pos(mod_positions, mod_names, i)
At a given position in the sequence, find any matching modification positions in mod_positions and sum the masses of those modifications.
- Parameters:
mod_positions – mod positions
mod_names – mod names
i – position in peptide
- Returns:
masses of matching modifications
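The summation can be sketched as a filter over the paired positions and names. The MOD_MASS table and the function name mod_mass_at are hypothetical; masskit uses integer modification ids and its own mass table.

```python
# Hypothetical modification masses (Da), keyed by name for readability.
MOD_MASS = {"Phospho": 79.966331, "Oxidation": 15.994915, "Methyl": 14.01565}


def mod_mass_at(mod_positions, mod_names, i):
    """Sum the masses of all modifications whose position equals i (0 if none)."""
    return sum(MOD_MASS[name] for pos, name in zip(mod_positions, mod_names) if pos == i)
```

For example, with modifications at positions [0, 3, 3], position 3 receives the sum of the last two masses and position 1 receives 0.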
- masskit.peptide.encoding.parse_ion_type_tuple(tuple_in, precursor_charge)
Split an ion_type tuple into the ion type and the neutral loss, if one is specified.
- Parameters:
tuple_in – ion type tuple
- Raises:
ValueError – more than one neutral loss
- Returns:
ion type, neutral loss
- masskit.peptide.encoding.parse_modification_encoding(modification_encoding)
Takes a string containing a set of modification strings and creates a list of tuples. The tuples contain the modification name, the site, and the position of the modification. The string has the following format:
Site encoding of a modification: an A-Y amino acid, which can be followed by a modification position encoding:
0 – peptide N-terminus
. – peptide C-terminus
^ – protein N-terminus
$ – protein C-terminus
For example, ‘K.’ means lysine at the C-terminus of the peptide. The position encoding can also be used on its own; e.g. ‘^’ means apply to any protein N-terminus, regardless of amino acid.
Modifications in a list are separated by hashes (#), e.g. Phospho{S}#Methyl{0/I}#Carbamidomethyl#Deamidated{F^/Q/N}
An optional list of sites can be given within braces ({}) after each modification name; if there are no braces, a default set of sites is used. Multiple sites are separated by ‘/’.
Shorthand position encodings:
“0” by itself implies “00”
“.” by itself implies “..”
“^” by itself implies “0^”
“$” by itself implies “.$”
- Parameters:
modification_encoding – a string containing the above format
- Returns:
list of tuples, each tuple has modification name, site, and position
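The format above can be parsed by splitting on '#', reading an optional {…} site list per entry, splitting that on '/', and expanding the shorthand codes. The sketch below is an illustration, not masskit's parser: the function name parse_mods_sketch is hypothetical, and where masskit would substitute its default sites for an entry without braces, the sketch emits a single (name, None, None) placeholder.

```python
import re

# Shorthand position codes and their expansions.
SHORTHAND = {"0": "00", ".": "..", "^": "0^", "$": ".$"}
# One entry: a name, optionally followed by {site/site/...}.
ENTRY = re.compile(r"^([^{}#]+)(?:\{([^{}]*)\})?$")


def parse_mods_sketch(encoding):
    """Return a list of (name, site, position) tuples; position None means unspecified."""
    result = []
    for entry in encoding.split("#"):
        match = ENTRY.match(entry)
        if match is None:
            raise ValueError(f"malformed modification entry {entry!r}")
        name, sites = match.group(1), match.group(2)
        if not sites:
            # no braces: placeholder where default sites would be substituted
            result.append((name, None, None))
            continue
        for site in sites.split("/"):
            site = SHORTHAND.get(site, site)
            result.append((name, site[0], site[1:] or None))
    return result
```

Applied to Phospho{S}#Methyl{0/I}#Carbamidomethyl#Deamidated{F^/Q/N}, this yields seven tuples, including ("Methyl", "0", "0") and ("Deamidated", "F", "^").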
- masskit.peptide.encoding.protonate_mass(mass, z)
Given the neutral mass and charge of an ion, calculate its m/z.
- Parameters:
mass – mass
z – charge
- Returns:
m/z
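For a positively charged ion formed by adding z protons to a neutral molecule of mass M, the standard relation is shown below, with m_p the proton mass. This is the textbook formula; that protonate_mass uses exactly this constant is an assumption.

```latex
\frac{m}{z} = \frac{M + z\,m_{\mathrm{p}}}{z}, \qquad m_{\mathrm{p}} \approx 1.007276\ \mathrm{Da}

% Worked example: M = 1000 Da, z = 2
\frac{m}{z} = \frac{1000 + 2 \times 1.007276}{2} = \frac{1002.014552}{2} = 501.007276
```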
masskit.peptide.spectrum_generator module
- masskit.peptide.spectrum_generator.add_theoretical_spectra(df, theoretical_spectrum_column=None, ion_types=None, num_isotopes=2)
Add theoretical spectra to a column of a dataframe.
- Parameters:
df – dataframe containing spectra
theoretical_spectrum_column – name of the column to hold theoretical spectra
ion_types – ion types to generate; None uses the TheoreticalPeptideSpectrum default
num_isotopes – number of carbon-13 isotopes to calculate
- masskit.peptide.spectrum_generator.create_peptide_name(peptide, precursor_charge, mod_names=None, mod_positions=None, ev=None)
Create the name of a peptide spectrum.
- Parameters:
peptide – the peptide string
precursor_charge – the precursor charge
mod_names – list of modification names (integer)
mod_positions – positions of the modifications, 0-based
ev – collision energy in eV
- masskit.peptide.spectrum_generator.generate_mods(peptide, mod_list, n_peptide=False, c_peptide=False, mod_probability=None)
Given a peptide and a list of modifications expressed as tuples, place the allowable modifications on the peptide.
- Parameters:
peptide – the peptide
mod_list – the list of allowed modifications, expressed as a string (see encoding.py)
n_peptide – whether the peptide is at the N-terminus of the protein
c_peptide – whether the peptide is at the C-terminus of the protein
mod_probability – the probability of a modification at a particular site; None means 1.0
- Returns:
list of modification names, list of modification positions
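The placement step can be sketched as: walk the residues and, for each allowed modification whose site matches the residue, accept it with probability mod_probability. This is an illustration, not masskit's algorithm. The function name generate_mods_sketch, the (name, residue) pair format for allowed modifications, the rng parameter, and the one-modification-per-residue rule are all assumptions, and terminal positions (n_peptide, c_peptide) are not handled.

```python
import random


def generate_mods_sketch(peptide, allowed, mod_probability=None, rng=None):
    """Place allowed (name, residue) modifications on matching residues.

    Returns (list of modification names, list of 0-based positions).
    """
    rng = rng or random.Random()
    p = 1.0 if mod_probability is None else mod_probability
    names, positions = [], []
    for pos, residue in enumerate(peptide):
        for name, site in allowed:
            # random() is in [0, 1), so p == 1.0 always accepts a matching site
            if site == residue and rng.random() < p:
                names.append(name)
                positions.append(pos)
                break  # at most one modification per residue (an assumption)
    return names, positions
```

With the default probability of 1.0, every matching residue is modified, e.g. "SAMPLES" with Phospho on S and Oxidation on M gives positions 0, 2, and 6.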
- masskit.peptide.spectrum_generator.generate_peptide_library(num=100, min_length=5, max_length=30, min_charge=1, max_charge=8, min_ev=10, max_ev=60, mod_list=None, set='train', mod_probability=0.1)
Generate a theoretical peptide library
- Parameters:
num – the number of peptides
min_length – minimum length of the peptides
max_length – maximum length of the peptides
min_charge – the minimum charge of the peptides
max_charge – the maximum charge of the peptides
min_ev – the minimum eV (also used for NCE)
max_ev – the maximum eV (also used for NCE)
mod_list – the list of allowed modifications, expressed as a string (see encoding.py)
set – which set to create, e.g. train, valid, test
mod_probability – the probability of a modification at a particular site
- Returns:
the dataframe
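The sampling implied by these parameters can be sketched with the standard library: draw a length, a random sequence, a charge, and a collision energy for each entry. This is not masskit's generator. The function name generate_library_sketch, inclusive integer bounds, uniform sampling of eV, and the seed parameter are assumptions; modifications and the set argument are omitted, and the records are returned as a list of dicts rather than a dataframe.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def generate_library_sketch(num=100, min_length=5, max_length=30, min_charge=1,
                            max_charge=8, min_ev=10, max_ev=60, seed=None):
    """Return a list of records with random peptides, charges, and energies."""
    rng = random.Random(seed)
    records = []
    for _ in range(num):
        length = rng.randint(min_length, max_length)  # randint bounds are inclusive
        records.append({
            "peptide": "".join(rng.choice(AMINO_ACIDS) for _ in range(length)),
            "charge": rng.randint(min_charge, max_charge),
            "ev": rng.uniform(min_ev, max_ev),
        })
    return records
```

If pandas is available, pandas.DataFrame(records) turns the result into a dataframe.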