masskit.spectra package¶

Submodules¶

masskit.spectra.ions module¶

class masskit.spectra.ions.HiResIons(*args, **kwargs)¶

Bases: Ions

for containing high mass resolution ions

static cast_intensity(intensity)¶: cast a single intensity value

static cast_mz(mz)¶: cast a single mz value

change_mass_info(mass_info, inplace=False, take_max=True)¶

given a new mass info, recalculate tolerance bins

Parameters:

mass_info – the MassInfo structure to change to
inplace – if true, change in place, otherwise return copy
take_max – for each bin take the maximum intensity ion, otherwise sum all ions mapping to the bin

create_tolerance()¶: create start and stop arrays

intersect(comparison_ions, tiebreaker=None)¶

find the intersections between two high resolution ion series. calls standalone function to allow use of numba

Parameters:

comparison_ions – the ion series to compare to
tiebreaker – how to deal with one to multiple matches to peaks in self. mz is closest mz value, intensity is closest intensity, None is report multiple matches

Returns:

matched peak indexes in self, matched peak indexes in comparison_ions

property starts¶

Returns:: the start positions for each peak bin

property stops¶

Returns:: the stop positions for each peak bin

property tolerance¶

Returns:: the mass tolerance for each peak bin
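
The start and stop positions described above can be pictured as one interval per peak. The following is a minimal sketch in plain Python, not masskit's implementation. It assumes the tolerance is a half-width given either in ppm of the peak m/z or in daltons; the function name and the strings "ppm" and "daltons" are illustrative only.

```python
def tolerance_bins(mz, tolerance, tolerance_type="ppm"):
    """Return (starts, stops), one interval per peak.

    Assumption: `tolerance` is a half-width, in ppm of the peak m/z or in
    daltons. The tolerance_type strings are hypothetical.
    """
    starts, stops = [], []
    for value in mz:
        if tolerance_type == "ppm":
            half = value * tolerance / 1e6
        elif tolerance_type == "daltons":
            half = tolerance
        else:
            raise ValueError(f"unknown tolerance_type: {tolerance_type}")
        starts.append(value - half)
        stops.append(value + half)
    return starts, stops


# 20 ppm around m/z 100 and m/z 500
starts, stops = tolerance_bins([100.0, 500.0], tolerance=20.0)
```

With a ppm tolerance the interval widens as m/z grows, which is why a per-peak array of starts and stops is useful for high resolution data.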

class masskit.spectra.ions.Ions(mz=None, intensity=None, stddev=None, annotations=None, mass_info: MassInfo = None, jitter=0, copy_arrays=True, tolerance=None)¶

Bases: ABC

base class for a series of ions

property annotations¶: per peak annotations

static cast_intensity(intensity)¶: cast a single intensity value

static cast_mz(mz)¶: cast a single mz value

clear_and_intersect(ion2, index1, index2, tiebreaker=None)¶

if indices are not provided, clear the ions of both self and ion2 of zero intensity peaks, then intersect

Parameters:

ion2 – comparison ions
index1 – intersection indices for self (can be None)
index2 – intersection indices for ion2 (can be None)
tiebreaker – how to deal with one to multiple matches to peaks in self. mz is closest mz value, intensity is closest intensity, None is report multiple matches

Returns:

intersection indices for self, intersection indices for ion2

clear_computed_properties()¶

clear properties that are lazy computed

Returns:: returns self

copy(min_mz=-1, max_mz=0, min_intensity=-1, max_intensity=0)¶

create filtered version of self. This is essentially a copy constructor

Parameters:

min_mz – minimum mz value
max_mz – maximum mz value. 0 = ignore
min_intensity – minimum intensity value
max_intensity – maximum intensity value. 0 = ignore

Returns:

copy

copy_annot(ion2, index1, index2)¶

copy annotations from ion2 to this set of ions using the matched ion indices

Parameters:

ion2 – the ions to compare against
index1 – matched ions in this set of ions
index2 – matched ions in ions2

cosine_score(ion2, index1=None, index2=None, mz_power=0.0, intensity_power=0.5, scale=999, skip_denom=False, tiebreaker=None)¶

calculate the cosine score between this set of ions and ions2

Parameters:

ion2 – the ions to compare against
index1 – matched ions in this set of ions
index2 – matched ions in ions2
mz_power – what power to raise the mz value for each peak
intensity_power – what power to raise the intensity for each peak
scale – what value to scale the score by
skip_denom – skip computing the denominator
tiebreaker – how to deal with one to multiple matches to peaks in self. mz is closest mz value, intensity is closest intensity, None is no tiebreaking

Returns:

cosine score

evenly_space(tolerance=None, take_max=True, max_mz=None, include_zeros=False, take_sqrt=False)¶

convert ions to product ions with evenly spaced m/z bins. The m/z bins are centered on multiples of tolerance * 2. Multiple ions that map to the same bin are either summed or the max taken of the ion intensities.

Parameters:

tolerance – the mass tolerance of the evenly spaced m/z bins (bin width is twice this value) in daltons
take_max – for each bin take the maximum intensity ion, otherwise sum all ions mapping to the bin
max_mz – maximum mz value, 2000 by default
include_zeros – fill out array including bins with zero intensity
take_sqrt – take the sqrt of the intensities
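
The binning rule above (bin centers at multiples of tolerance * 2, with colliding peaks either summed or reduced to their maximum) can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the function name is hypothetical, and the choice to round half-way values upward is an assumption.

```python
import math


def evenly_space_sketch(mz, intensity, tolerance=0.5, take_max=True):
    """Map peaks onto bins centered at multiples of 2 * tolerance."""
    bin_width = 2 * tolerance
    bins = {}
    for m, i in zip(mz, intensity):
        # index of the nearest bin center (ties rounded upward, an assumption)
        index = math.floor(m / bin_width + 0.5)
        if index not in bins:
            bins[index] = i
        elif take_max:
            bins[index] = max(bins[index], i)
        else:
            bins[index] += i
    indices = sorted(bins)
    return [k * bin_width for k in indices], [bins[k] for k in indices]


binned_mz, binned_max = evenly_space_sketch([100.2, 100.4, 101.6], [10, 30, 5])
_, binned_sum = evenly_space_sketch([100.2, 100.4, 101.6], [10, 30, 5], take_max=False)
```

Here the first two peaks fall into the bin centered at 100, so take_max keeps 30 while summing gives 40.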

filter(min_mz=-1, max_mz=0, min_intensity=-1, max_intensity=0, inplace=False)¶

filter ions by mz and/or intensity.

Parameters:

min_mz – minimum mz value, exclusive
max_mz – maximum mz value, inclusive. 0 = ignore
min_intensity – minimum intensity value, exclusive
max_intensity – maximum intensity value, inclusive. 0 = ignore
inplace – do operation on current ions, otherwise create copy

Returns:

filtered copy if not inplace, otherwise current ions
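
The bound semantics listed above (minimum exclusive, maximum inclusive, 0 meaning "no upper bound") are easy to get wrong, so here is a small plain-Python sketch of them. The function name is hypothetical and the code only mirrors the documented rules; it is not taken from masskit.

```python
def filter_peaks(mz, intensity, min_mz=-1, max_mz=0,
                 min_intensity=-1, max_intensity=0):
    """Keep peaks with min < value <= max; a max of 0 disables that bound."""
    kept = [
        (m, i)
        for m, i in zip(mz, intensity)
        if m > min_mz
        and (max_mz == 0 or m <= max_mz)
        and i > min_intensity
        and (max_intensity == 0 or i <= max_intensity)
    ]
    return [m for m, _ in kept], [i for _, i in kept]


# the peak at exactly min_mz is dropped, the peak at exactly max_mz is kept
kept_mz, kept_intensity = filter_peaks(
    [100, 200, 300], [5, 50, 500], min_mz=100, max_mz=300
)
```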

half_tolerance(mz)¶

calculate 1/2 of the tolerance interval

Parameters:: mz – mz value
Returns:: 1/2 tolerance interval

property intensity¶

Returns:: the intensity of the precursor. could be a numpy array

abstract intersect(comparison_ions, tiebreaker=None)¶

ions2array(array, channel, bin_size=1.0, precursor=0, intensity_norm=1.0, insert_mz=False, mz_norm=2000.0, rand_intensity=0.0, down_shift=0.0, channel_first=True, take_max=True, stddev_channel=None, take_sqrt=False)¶

fill out an array of fixed size with the ions. note that this func assumes spectra sorted by mz

Parameters:

array – the array to fill out
channel – which channel to fill out in the array
bin_size – the size of each bin in the array
precursor – if nonzero, use this value to invert the spectra by subtracting mz from this value
intensity_norm – value to norm the intensity
insert_mz – instead of putting the normalized intensity in the array, put in the normalized mz
mz_norm – the value to use to norm the mz values inserted
rand_intensity – if not 0, multiply each intensity by random value 1 +/- rand_intensity
down_shift – shift mz down by this value in Da
channel_first – channel before spectrum in input array (pytorch style). tensorflow is channel last.
take_max – take the maximum intensity in a bin rather than the sum of peaks in a bin
stddev_channel – which channel contains the std dev. None means no std dev
take_sqrt – take the square root of the intensity

property join¶: returns a dictionary with information on how to join this ion set to another ion set, such as for annotation

mask(indices, inplace=False)¶

mask out ions that are pointed to by the indices

Parameters:

indices – indices of ions to screen out or numpy boolean mask
inplace – do operation on current ions

Returns:

masked copy if not inplace, otherwise current ions

static mask_ions(mask, return_ions)¶

mask out a set of ions

Parameters:

mask – boolean mask
return_ions – ions to be masked

Returns:

masked ions

property mass_type¶

merge(merge_ions, inplace=False)¶

merge another set of ions into this one.

Parameters:

merge_ions – the ions to add in
inplace – do operation on current ions

Returns:

merged copy if not inplace, otherwise current ions

property mz¶

Returns:: the mz of the precursor. could be a numpy array. optionally adds a jitter to the m/z values

property neutral_loss¶

property neutral_loss_charge¶

norm(max_intensity_in=999, keep_type=True, inplace=False, ord=None)¶

norm the intensities

Parameters:

max_intensity_in – the intensity of the most intense peak
keep_type – keep the type of the intensity array
inplace – do operation on current ions, otherwise create copy
ord – if set, normalize using norm order as in np.linalg.norm. 2 = l2

Returns:

normed copy if not inplace, otherwise current ions
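
The two normalization modes described by the parameters can be sketched as follows. This is a plain-Python illustration with a hypothetical function name. It assumes that, when ord is set, the intensities are simply divided by their vector norm of that order and max_intensity_in is not applied; the source does not say how the two options interact.

```python
def norm_intensity(intensity, max_intensity_in=999, ord=None):
    """Scale to a maximum peak height, or to unit vector norm if ord is set."""
    if ord is None:
        largest = max(intensity)
        factor = max_intensity_in / largest if largest else 0.0
    else:
        # p-norm of the intensity vector (ord=2 gives the l2 norm)
        length = sum(abs(x) ** ord for x in intensity) ** (1.0 / ord)
        factor = 1.0 / length if length else 0.0
    return [x * factor for x in intensity]


scaled = norm_intensity([10, 20, 40])
unit = norm_intensity([3, 4], ord=2)
```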

num_ions()¶

Returns:: number of ions in spectrum

parent_filter(h2o=True, inplace=False, precursor_mz=0.0, charge=None)¶

filter parent ions, including water losses.

Parameters:

h2o – filter out water losses
inplace – do operation on current ions, otherwise create copy
precursor_mz – precursor m/z
charge – charge of precursor

Returns:

filtered copy if not inplace, otherwise current ions

property rank¶

return ranks of peaks by intensity 1=most intense, rank is integer over the size of the intensity matrix

Returns:: the rank of the ions by intensity. could be a numpy array.

rank_ions()¶: rank the ions. intensity rank, 1=most intense, rank is integer over the size of the intensity matrix
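
As an illustration of the ranking convention (1 = most intense), here is a short plain-Python sketch. The function name is hypothetical, and the tie handling (the earlier peak receives the better rank) is an assumption, since the source does not describe it.

```python
def rank_by_intensity(intensity):
    """Return integer ranks parallel to `intensity`, 1 = most intense."""
    # sorted() is stable, so equal intensities keep their original order
    order = sorted(range(len(intensity)), key=lambda k: -intensity[k])
    ranks = [0] * len(intensity)
    for rank, k in enumerate(order, start=1):
        ranks[k] = rank
    return ranks


ranks = rank_by_intensity([10, 999, 50])
```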

shift_mz(shift, inplace=False)¶

shift the mz values of all ions by the value of shift. Ions whose shifted mz is negative are masked out

Parameters:

shift – value to shift mz
inplace – do operation on current ions

Returns:

masked copy if not inplace, otherwise current ions

property stddev¶

Returns:: the std dev of the intensity per peak

property tolerance¶

property tolerance_type¶

total_intensity()¶

Returns:: total intensity of ions

windowed_filter(mz_window=7, num_ions=5, inplace=False)¶

filter ions by examining peaks in order of intensity and filtering peaks within a window

Parameters:

mz_window – half size of mz_window for filtering. 0 = no filtering
num_ions – number of ions allowed in full mz_window
inplace – do operation on current ions, otherwise create copy

Returns:

filtered copy if not inplace, otherwise current ions
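
One way to read the description above is as a greedy pass over the peaks from most to least intense, keeping a peak only while fewer than num_ions already-kept peaks lie within mz_window of it. The sketch below implements that reading in plain Python. It is an interpretation, not masskit's algorithm: the function name is hypothetical and the exact counting rule is an assumption.

```python
def windowed_filter_sketch(mz, intensity, mz_window=7, num_ions=5):
    """Return sorted indices of the peaks that survive the greedy filter."""
    if mz_window == 0:
        # documented convention: 0 means no filtering
        return list(range(len(mz)))
    kept = []
    for k in sorted(range(len(mz)), key=lambda k: -intensity[k]):
        # count more intense peaks already kept inside the window
        neighbors = sum(1 for j in kept if abs(mz[j] - mz[k]) <= mz_window)
        if neighbors < num_ions:
            kept.append(k)
    return sorted(kept)


survivors = windowed_filter_sketch(
    [100, 101, 102, 103, 150], [50, 40, 30, 20, 10], mz_window=2, num_ions=2
)
```

In this example the peak at 102 is dropped because two more intense peaks (100 and 101) already occupy its window.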

class masskit.spectra.ions.IonsIterator(ions)¶

Bases: object

iterator over ion mz and intensity

class masskit.spectra.ions.MassInfo(tolerance: float = None, tolerance_type: str = None, mass_type: str = None, neutral_loss: str = None, neutral_loss_charge: int = None, evenly_spaced=False, arrow_struct_accessor=None, arrow_struct_scalar=None)¶

Bases: object

information about mass measurements of an ion peak

masskit.spectra.ions.cosine_score_calc(spectrum1_mz, spectrum1_intensity, spectrum2_mz, spectrum2_intensity, index1, index2, mz_power=0.0, intensity_power=0.5, scale=999, skip_denom=False)¶

the Stein and Scott 94 cosine score. By convention, sqrt of score is taken and multiplied by 999. separated out from class and placed here so that it can be jit compiled by numba.

Parameters:

spectrum1_mz – query spectrum mz
spectrum1_intensity – query spectrum intensity
spectrum2_mz – the comparison spectrum2 mz
spectrum2_intensity – the comparison spectrum2 intensity
index1 – matched ions in spectrum1. may include duplicate matches
index2 – matched ions in spectrum2. may include duplicate matches
mz_power – what power to raise the mz value for each peak
intensity_power – what power to raise the intensity for each peak
scale – what value to scale the score by
skip_denom – skip computing the denominator

Returns:

the cosine score
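
To make the roles of mz_power, intensity_power, scale and skip_denom concrete, here is a plain-Python sketch of a weighted cosine score. It uses the squared dot product form, (sum of w1*w2)^2 / (sum of w1^2 * sum of w2^2), with weights mz^mz_power * intensity^intensity_power. Whether masskit squares the numerator, and how it treats duplicate matches or zero-intensity peaks, are assumptions here; the function name is hypothetical.

```python
def weighted_cosine(mz1, intensity1, mz2, intensity2, index1, index2,
                    mz_power=0.0, intensity_power=0.5, scale=999,
                    skip_denom=False):
    """Squared, weighted cosine similarity of two matched peak lists."""
    w1 = [m ** mz_power * i ** intensity_power for m, i in zip(mz1, intensity1)]
    w2 = [m ** mz_power * i ** intensity_power for m, i in zip(mz2, intensity2)]
    # only matched peaks contribute to the numerator
    numerator = sum(w1[a] * w2[b] for a, b in zip(index1, index2)) ** 2
    if skip_denom:
        return scale * numerator
    # every peak, matched or not, contributes to the denominator
    denominator = sum(w * w for w in w1) * sum(w * w for w in w2)
    if denominator == 0:
        return 0.0
    return scale * numerator / denominator


mz = [100.0, 200.0]
intensity = [999.0, 500.0]
identical = weighted_cosine(mz, intensity, mz, intensity, [0, 1], [0, 1])
no_match = weighted_cosine(mz, intensity, mz, intensity, [], [])
```

Identical spectra score at the value of scale, and spectra with no matched peaks score zero.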

masskit.spectra.ions.dedup_matches(products1, products2, index1, index2, tiebreaker='mz', skip_nomatch=True)¶

given a series of indices to matched peaks in two product ion sets, get rid of duplicate matches to peaks in the first product ion set, using a tiebreaker.

Parameters:

products1 – first set of product ions
products2 – second set of product ions
index1 – indices into first set of product ions
index2 – indices into the second set of product ions
tiebreaker – tiebreak by ‘intensity’ or ‘mz’ of duplicate matches. ‘delete’ means don’t match either. defaults to ‘mz’
skip_nomatch – in the return values, skip missing matches to the first set of product ions, defaults to True

Returns:

matches of the first product ion set, matches of the second product ion set
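
The ‘mz’ tiebreaker can be illustrated with a short plain-Python sketch: when one peak in the first set is matched to several peaks in the second set, keep the match with the closest m/z. The function name is hypothetical, and the sketch takes bare m/z lists rather than product ion objects, so it is a simplification rather than the library's routine.

```python
def dedup_by_mz(mz1, mz2, index1, index2):
    """Keep, for each peak in set 1, the matched set 2 peak closest in m/z."""
    best = {}
    for a, b in zip(index1, index2):
        if a not in best or abs(mz2[b] - mz1[a]) < abs(mz2[best[a]] - mz1[a]):
            best[a] = b
    keys = sorted(best)
    return keys, [best[a] for a in keys]


# peak 0 of the first set matches two peaks of the second set
dedup1, dedup2 = dedup_by_mz(
    [100.0, 200.0], [99.99, 100.002, 200.01], [0, 0, 1], [0, 1, 2]
)
```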

masskit.spectra.ions.intersect_hires(ions1_starts, ions1_stops, ions2_starts, ions2_stops)¶

find the intersections between two high resolution ion series

Parameters:

ions1_starts – start positions of the first ion series to compare
ions1_stops – stop positions of the first ion series to compare
ions2_starts – start positions of the second ion series to compare
ions2_stops – stop positions of the second ion series to compare

Returns:

matched peak indexes in ions1, matched peak indexes in ions2
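
Finding intersections between two series of start/stop intervals can be done with a single two-pointer sweep. The sketch below shows that technique in plain Python. It assumes both series are sorted by start position and that intervals within one series do not overlap each other; the function name is hypothetical and this is not the numba routine itself.

```python
def intersect_intervals(starts1, stops1, starts2, stops2):
    """Return (index1, index2) for every pair of overlapping intervals."""
    index1, index2 = [], []
    i = j = 0
    while i < len(starts1) and j < len(starts2):
        if stops1[i] < starts2[j]:
            i += 1  # interval i ends before interval j begins
        elif stops2[j] < starts1[i]:
            j += 1  # interval j ends before interval i begins
        else:
            index1.append(i)
            index2.append(j)
            # advance whichever interval ends first
            if stops1[i] < stops2[j]:
                i += 1
            else:
                j += 1
    return index1, index2


matched1, matched2 = intersect_intervals(
    [99.9, 199.9, 299.9], [100.1, 200.1, 300.1],
    [100.05, 250.0, 300.0], [100.25, 250.2, 300.2],
)
```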

masskit.spectra.ions.my_intersect1d(ar1, ar2)¶

simplified version of numpy intersect1d. Pulled outside of the class so it can be jit compiled by numba (numba has only experimental class support). Note: this function does not work if there are ions in each spectra with identical mz!

Parameters:

ar1 – mz values for one spectrum
ar2 – mz values for another spectrum

Returns:

index of matches into array 1, index of matches into array 2
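
The return convention (two parallel index lists, one per input array) can be shown with a plain-Python sketch of an exact-value intersection. The function name is hypothetical. Like the function documented above, the sketch is not meaningful when an array contains repeated mz values, because the lookup table keeps only one position per value.

```python
def intersect_indices(ar1, ar2):
    """Return indices into ar1 and ar2 of the values present in both."""
    position2 = {value: j for j, value in enumerate(ar2)}
    index1, index2 = [], []
    for i, value in enumerate(ar1):
        if value in position2:
            index1.append(i)
            index2.append(position2[value])
    return index1, index2


common1, common2 = intersect_indices([100, 150, 200], [150, 200, 250])
```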

masskit.spectra.ions.nce2ev(nce, precursor_mz, charge)¶

convert nce to ev. The equation is for the QE and is taken from http://proteomicsnews.blogspot.com/2014/06/normalized-collision-energy-calculation.html

Parameters:

nce – normalized collision energy
precursor_mz – precursor m/z
charge – charge

Returns:

ev
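
For orientation, the conversion is usually written as eV = NCE * (precursor m/z) / 500 * charge factor. The sketch below uses that form in plain Python. The charge factors are an assumption recalled from the write-up cited above and should be checked against it before use; the function and constant names are hypothetical and this is not masskit's code.

```python
# Charge factors: an assumption, verify against the cited source before use.
CHARGE_FACTOR = {1: 1.0, 2: 0.9, 3: 0.85, 4: 0.8}


def nce_to_ev_sketch(nce, precursor_mz, charge):
    """Convert normalized collision energy to eV (illustrative only)."""
    charge = abs(charge)
    if charge == 0:
        raise ValueError("charge must be nonzero")
    factor = CHARGE_FACTOR.get(charge, 0.75)  # charge 5 and above (assumed)
    return nce * precursor_mz / 500.0 * factor


ev = nce_to_ev_sketch(30, 500.0, 2)
```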

masskit.spectra.ipython module¶

masskit.spectra.ipython.is_notebook()¶

check to see if the code is running in a jupyter notebook

Returns:: true if it is

masskit.spectra.join module¶

class masskit.spectra.join.Join(*args, **kwargs)¶

Bases: ABC

abstract do_join(tiebreaker='mz')¶

do the join

Parameters:: tiebreaker – how to deal with one to multiple matches. mz is closest mz value, intensity is closest intensity, None is report multiple matches
Returns:: self

static join_2_spectra(spectra1, spectra2, tiebreaker='mz')¶

left join of two spectra. iterate through all peaks for spectra1 and return joined spectra2 peaks. results also include unmatched spectra1 (left) peaks

Parameters:

spectra1 – first spectra. all peaks included in result
spectra2 – second spectra
tiebreaker – how to deal with one to multiple matches to peaks in spectra1. mz is closest mz value, intensity is closest intensity, None is report multiple matches

Returns:

list of peak ids from spectra1, list of peak ids from spectra2
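
The left join behaviour (every spectra1 peak appears in the result, matched or not) can be sketched in plain Python. The sketch works on bare m/z lists with a fixed dalton tolerance, uses the closest-mz tiebreaker, and marks unmatched left peaks with None. The function name, the tolerance parameter, and the use of None for unmatched peaks in the two-spectrum case are assumptions.

```python
def left_join_sketch(mz1, mz2, tolerance=0.01):
    """Return parallel lists of peak ids; unmatched left peaks map to None."""
    ids1, ids2 = [], []
    for i, m in enumerate(mz1):
        candidates = [j for j, other in enumerate(mz2) if abs(other - m) <= tolerance]
        # tiebreaker 'mz': choose the candidate closest in m/z
        best = min(candidates, key=lambda j: abs(mz2[j] - m)) if candidates else None
        ids1.append(i)
        ids2.append(best)
    return ids1, ids2


left_ids, right_ids = left_join_sketch([100.0, 200.0, 300.0], [100.005, 300.004])
```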

static join_3_spectra(experimental_spectrum, predicted_spectrum, theoretical_spectrum, tiebreaker='mz')¶

Join the peaks in a single experimental spectrum to a predicted spectrum and a theoretical spectrum. The join lists returned include all experimental and predicted peaks, but only the theoretical peaks that match the experimental spectra (and not necessarily the predicted spectrum). Note that it is possible to get a join where the theoretical peak matches the experimental peak but not the predicted peak.

Parameters:

experimental_spectrum – the experimental spectrum
predicted_spectrum – the predicted spectrum
theoretical_spectrum – the annotated theoretical spectrum
tiebreaker – how to deal with one to multiple matches. mz is closest mz value, intensity is closest intensity, None is report multiple matches

Returns:

3 lists with the peak ids. first is experimental peaks matching the theoretical peaks. Second are the predicted peaks that match the experimental peaks. Third are the theoretical peaks that match the experimental peaks. A value of None indicates no join.

static list2float32(list_in)¶

static list2float64(list_in)¶

static list2int16(list_in)¶

static list2uint16(list_in)¶

static list2uint32(list_in)¶

static list2uint64(list_in)¶

to_pandas()¶: output the join results as a pandas dataframe

class masskit.spectra.join.PairwiseJoin(exp_lib_map, theo_lib_map, *args, **kwargs)¶

Bases: Join

Join 2 sets of spectra

do_join(tiebreaker='mz')¶

do the join

Parameters:: tiebreaker – how to deal with one to multiple matches. mz is closest mz value, intensity is closest intensity, None is report multiple matches
Returns:: self

class masskit.spectra.join.ThreewayJoin(exp_lib_map, pred_lib_map, theo_lib_map, *args, **kwargs)¶

Bases: Join

Join 3 sets of spectra. The join lists returned include all experimental and predicted peaks, but only the theoretical peaks that match the experimental spectra (and not the predicted spectrum). Note that it is possible to get a join where the theoretical peak matches the experimental peak but not the predicted peak.

do_join(tiebreaker='mz')¶

do the join

Parameters:: tiebreaker – how to deal with one to multiple matches. mz is closest mz value, intensity is closest intensity, None is report multiple matches
Returns:: self

masskit.spectra.spectrum_plotting module¶

class masskit.spectra.spectrum_plotting.AnimateSpectrumPlot¶

Bases: object

class used to create animated gifs of spectrum plots

add_figure(fig, close_fig=True)¶

use the provided figure to draw an image for one frame of an animation

Parameters:

fig – matplotlib figure
close_fig – if True, close the figure when done

create_animated_gif(file, fps=5, pause=7)¶

create the animated gif from the stored figures

Parameters:

file – the file to write the animated gif to
fps – how many frames per second the animation runs
pause – how many times to duplicate the last frame

masskit.spectra.spectrum_plotting.draw_spectrum(spectrum, fig_format, output, figsize=(4, 2))¶

spectrum thumbnail plotting code called by the spectrum object. writes to a stream

Parameters:

spectrum – spectrum to plot
fig_format – format of the figure e.g. ‘png’
output – output stream
figsize – the size of the figure in inches

masskit.spectra.spectrum_plotting.error_bar_plot(mz_in, intensity_in, stddev_in, color, linewidth=1)¶

plot spectra as colored error bars

Parameters:

mz_in – mz values in daltons
intensity_in – intensity value
stddev_in – standard deviations from intensity_in
color – color of spectrum
linewidth – width of plotted lines

Returns:

line collection

masskit.spectra.spectrum_plotting.error_bar_plot_lines(mz_in, intensity_in, stddev_in, color, vertical_cutoff=0.01, linewidth=1)¶

plot spectra as colored error bars using lines

Parameters:

mz_in – mz values in daltons
intensity_in – intensity value
stddev_in – standard deviations from intensity_in
color – color of spectrum
vertical_cutoff – if the intensity/max_intensity is below this value, don’t plot the vertical line

Returns:

line collection

masskit.spectra.spectrum_plotting.line_plot(mz_in, intensity_in, color, linewidth=1)¶

create a LineCollection for plotting a spectrum

Parameters:

mz_in – mz values
intensity_in – intensity value
color – color of spectrum

Returns:

line collection

masskit.spectra.spectrum_plotting.multiple_spectrum_plot(intensities, mz=None, mirror_intensities=None, dpi=100, min_mz=0, max_mz=2000, title='', subtitles=None, normalize=None, color=(0, 0, 1, 1), mirror_color=(1, 0, 0, 1))¶

create a spectrum plot. If mirror_intensities is specified, will draw a mirror plot

Parameters:

intensities – spectrum to be plotted. array-like
mz – mz values for plot. array-like parallel to intensities
mirror_intensities – intensities for mirror spectrum. array-like parallel to intensities.
dpi – dpi of the plot
min_mz – minimum mz of the plot
max_mz – maximum mz of the plot
title – title of the plot
subtitles – an array of strings, one title for each spectrum plot
normalize – norm the spectra intensities to this value
color – color of spectrum specified as RGBA tuple
mirror_color – color of mirrored spectrum specified as RGBA tuple

Returns:

matplotlib figure

masskit.spectra.spectrum_plotting.normalize_intensity(intensity, normalize=999.0)¶

norm the spectrum to the max peak

Parameters:

intensity –
normalize – value to norm the spectrum to

Returns:

masskit.spectra.spectrum_plotting.spectrum_plot(axis, mz, intensity, stddev=None, mirror_mz=None, mirror_intensity=None, mirror_stddev=None, mirror=True, title=None, xlabel='m/z', ylabel='Intensity', title_size=None, label_size=None, max_mz=None, min_mz=0, color=(0, 0, 1, 1), mirror_color=(1, 0, 0, 1), stddev_color=(0.3, 0.3, 0.3, 0.5), left_label_color=(1, 0, 0, 1), normalize=1000, vertical_cutoff=0.0, vertical_multiplier=1.1, right_label=None, left_label=None, right_label_size=None, left_label_size=None, no_xticks=False, no_yticks=False, linewidth=1)¶

make a spectrum plot using matplotlib. if mirror_intensity is specified, will do a mirror plot

Parameters:

axis – matplotlib axis
mz – mz values as array-like
intensity – intensity as array-like, parallel to mz array
stddev – standard deviation of the intensities
title – title of plot
xlabel – xlabel of plot
ylabel – ylabel of plot
title_size – size of title font
label_size – size of x and y axis label fonts
mirror_mz – mz values of mirror spectrum, corresponding to mirror_intensity. If none, uses mz
mirror_intensity – intensity of mirror spectrum as array-like, parallel to mz array. If none, don’t plot
mirror_stddev – standard deviation of the intensities
mirror – if true, mirror the plot if there are two spectra. Otherwise plot the two spectra together
max_mz – maximum mz to plot
min_mz – minimum mz to plot
normalize – if specified, norm the spectra to this value.
color – color of spectrum specified as RGBA tuple
mirror_color – color of mirrored spectrum specified as RGBA tuple
stddev_color – color of error bars
left_label_color – color of the left top label
vertical_cutoff – if the intensity/max_intensity is below this value, don’t plot the vertical line
vertical_multiplier – multiply the y max value by this factor to create white space
right_label – label for the top right of the figure
left_label – label for the top left of the figure
right_label_size – size of label for the top right of the figure
left_label_size – size of label for the top left of the figure
no_xticks – turn off x ticks and labels
no_yticks – turn off y ticks and labels
linewidth – width of plotted lines

Returns:

peak_collection, mirror_peak_collection sets of peaks for picking

masskit.spectra.spectrum_plotting.unsparsify_spectrum(spectrum, max_mz)¶

fill out array using spectrum values, placing zeros in missing mz values

Parameters:

spectrum – the spectrum to operate on
max_mz – maximum mz value

Returns:

the unsparsified array
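
Filling out a dense array from sparse peaks can be sketched as below. The sketch is plain Python and takes parallel m/z and intensity lists rather than a spectrum object. It assumes unit-width bins indexed by the integer part of the m/z and an array of length max_mz; the function name and these binning choices are assumptions, not details from the source.

```python
def unsparsify_sketch(mz, intensity, max_mz):
    """Return a dense list of length max_mz with zeros where no peak exists."""
    dense = [0.0] * max_mz
    for m, i in zip(mz, intensity):
        index = int(m)  # truncate to a unit-width bin (assumption)
        if 0 <= index < max_mz:
            dense[index] = i
    return dense


dense = unsparsify_sketch([1.0, 3.0], [10.0, 30.0], max_mz=5)
```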

masskit.spectra.theoretical_spectrum module¶

class masskit.spectra.theoretical_spectrum.TheoreticalPeptideSpectrum(peptide, ion_types=None, mod_names=None, mod_positions=None, analysis_annotations=False, num_isotopes=2, *args, **kwargs)¶

Bases: TheoreticalSpectrum

theoretical peptide spectrum

class masskit.spectra.theoretical_spectrum.TheoreticalSpectrum(*args, **kwargs)¶

Bases: Spectrum

base class to contain a theoretical spectrum

masskit.spectra.theoretical_spectrum.annotate_peptide_spectrum(spectrum, peptide=None, precursor_charge=None, ion_types=None, mod_names=None, mod_positions=None)¶: annotate a spectrum with theoretically calculated ions
