masskit.spectra package¶
Submodules¶
masskit.spectra.ions module¶
- class masskit.spectra.ions.HiResIons(*args, **kwargs)¶
Bases:
Ions
for containing high mass resolution ions
- static cast_intensity(intensity)¶
cast a single mz value
- static cast_mz(mz)¶
cast a single mz value
- change_mass_info(mass_info, inplace=False, take_max=True)¶
given a new mass info, recalculate tolerance bins
- Parameters:
mass_info – the MassInfo structure to change to
inplace – if true, change in place, otherwise return copy
take_max – for each bin take the maximum intensity ion, otherwise sum all ions mapping to the bin
- create_tolerance()¶
create start and stop arrays
- intersect(comparison_ions, tiebreaker=None)¶
find the intersections between two high resolution ion series. calls standalone function to allow use of numba
- Parameters:
comparison_ions – the ion series to compare to
tiebreaker – how to deal with one to multiple matches to peaks in spectra1. mz is closest mz value, intensity is closest intensity, None is report multiple matches
- Returns:
matched peak indexes in self, matched peak indexes in comparison_ions
- property starts¶
- Returns:
the start positions for each peak bin
- property stops¶
- Returns:
the stop positions for each peak bin
- property tolerance¶
- Returns:
the mass tolerance for each peak bin
- class masskit.spectra.ions.Ions(mz=None, intensity=None, stddev=None, annotations=None, mass_info: MassInfo = None, jitter=0, copy_arrays=True, tolerance=None)¶
Bases:
ABC
base class for a series of ions
- property annotations¶
per peak annotations
- static cast_intensity(intensity)¶
cast a single mz value
- static cast_mz(mz)¶
cast a single mz value
- clear_and_intersect(ion2, index1, index2, tiebreaker=None)¶
_ if indices are not provided, clear the ions of both self and ion2 of zero intensity peaks then intersect
- Parameters:
ion2 – comparison ions
index1 – intersection indices for self (can be None)
index2 – intersection indices for ion2 (can be None)
tiebreaker – how to deal with one to multiple matches to peaks in spectra1. mz is closest mz value, intensity is closest intensity, None is report multiple matches
- Returns:
_description_
- clear_computed_properties()¶
clear properties that are lazy computed
- Returns:
returns self
- copy(min_mz=-1, max_mz=0, min_intensity=-1, max_intensity=0)¶
create filtered version of self. This is essentially a copy constructor
- Parameters:
min_mz – minimum mz value
max_mz – maximum mz value. 0 = ignore
min_intensity – minimum intensity value
max_intensity – maximum intensity value. 0 = ignore
- Returns:
copy
- copy_annot(ion2, index1, index2)¶
copy annotations from ion2 to this set of ions using the matched ion indices
- Parameters:
ion2 – the ions to compare against
index1 – matched ions in this set of ions
index2 – matched ions in ions2
- cosine_score(ion2, index1=None, index2=None, mz_power=0.0, intensity_power=0.5, scale=999, skip_denom=False, tiebreaker=None)¶
calculate the cosine score between this set of ions and ions2
- Parameters:
ion2 – the ions to compare against
index1 – matched ions in this set of ions
index2 – matched ions in ions2
mz_power – what power to raise the mz value for each peak
intensity_power – what power to raise the intensity for each peak
scale – what value to scale the score by
skip_denom – skip computing the denominator
tiebreaker – how to deal with one to multiple matches to peaks in self. mz is closest mz value, intensity is closest intensity, None is no tiebreaking
- Returns:
cosine score
- evenly_space(tolerance=None, take_max=True, max_mz=None, include_zeros=False, take_sqrt=False)¶
convert ions to product ions with evenly spaced m/z bins. The m/z bins are centered on multiples of tolerance * 2. Multiple ions that map to the same bin are either summed or the max taken of the ion intensities.
- Parameters:
tolerance – the mass tolerance of the evenly spaced m/z bins (bin width is twice this value) in daltons
take_max – for each bin take the maximum intensity ion, otherwise sum all ions mapping to the bin
max_mz – maximum mz value, 2000 by default
include_zeros – fill out array including bins with zero intensity
take_sqrt – take the sqrt of the intensities
- filter(min_mz=-1, max_mz=0, min_intensity=-1, max_intensity=0, inplace=False)¶
filter ions by mz and/or intensity.
- Parameters:
min_mz – minimum mz value, exclusive
max_mz – maximum mz value, inclusive. 0 = ignore
min_intensity – minimum intensity value, exclusive
max_intensity – maximum intensity value, inclusive. 0 = ignore
inplace – do operation on current ions, otherwise create copy
- Returns:
filtered copy if not inplace, otherwise current ions
- half_tolerance(mz)¶
calculate 1/2 of the tolerance interval
- Parameters:
mz – mz value
- Returns:
1/2 tolerance interval
- property intensity¶
- Returns:
the intensity of the precursor. could be a numpy array
- abstract intersect(comparison_ions, tiebreaker=None)¶
- ions2array(array, channel, bin_size=1.0, precursor=0, intensity_norm=1.0, insert_mz=False, mz_norm=2000.0, rand_intensity=0.0, down_shift=0.0, channel_first=True, take_max=True, stddev_channel=None, take_sqrt=False)¶
fill out an array of fixed size with the ions. note that this func assumes spectra sorted by mz
- Parameters:
array – the array to fill out
channel – which channel to fill out in the array
bin_size – the size of each bin in the array
precursor – if nonzero, use this value to invert the spectra by subtracting mz from this value
intensity_norm – value to norm the intensity
insert_mz – instead of putting the normalized intensity in the array, put in the normalized mz
mz_norm – the value to use to norm the mz values inserted
rand_intensity – if not 0, multiply each intensity by random value 1 +/- rand_intensity
down_shift – shift mz down by this value in Da
channel_first – channel before spectrum in input array (pytorch style). tensorflow is channel last.
take_max – take the maximum intensity in a bin rather the sum of peaks in a bin
stddev_channel – which channel contains the std dev. None means no std dev
take_sqrt – take the square root of the intensity
- property join¶
returns a dictionary with information on how to join this ion set to another ion set, such as for annotation
- mask(indices, inplace=False)¶
mask out ions that are pointed to by the indices
- Parameters:
indices – indices of ions to screen out or numpy boolean mask
inplace – do operation on current ions
- Returns:
masked copy if not inplace, otherwise current ions
- static mask_ions(mask, return_ions)¶
mask out a set of ions
- Parameters:
mask – boolean mask
return_ions – ions to be masked
- Returns:
masked ions
- property mass_type¶
- merge(merge_ions, inplace=False)¶
merge another set of ions into this one.
- Parameters:
merge_ions – the ions to add in
inplace – do operation on current ions
- Returns:
merged copy if not inplace, otherwise current ions
- property mz¶
- Returns:
the mz of the precursor. could be a numpy array. optionally adds a jitter to the m/z values
- property neutral_loss¶
- property neutral_loss_charge¶
- norm(max_intensity_in=999, keep_type=True, inplace=False, ord=None)¶
norm the intensities
- Parameters:
max_intensity_in – the intensity of the most intense peak
keep_type – keep the type of the intensity array
inplace – do operation on current ions, otherwise create copy
ord – if set, normalize using norm order as in np.linalg.norm. 2 = l2
- Returns:
normed copy if not inplace, otherwise current ions
- num_ions()¶
number of ions in spectrum :return: number of ions in spectrum
- parent_filter(h2o=True, inplace=False, precursor_mz=0.0, charge=None)¶
filter parent ions, including water losses.
- Parameters:
h2o – filter out water losses
inplace – do operation on current ions, otherwise create copy
precursor_mz – precursor m/z
charge – charge of precursor
- Returns:
filtered copy if not inplace, otherwise current ions
- property rank¶
return ranks of peaks by intensity 1=most intense, rank is integer over the size of the intensity matrix
- Returns:
the rank of the ions by intensity. could be a numpy array.
- rank_ions()¶
rank the ions. intensity rank, 1=most intense, rank is integer over the size of the intensity matrix
- shift_mz(shift, inplace=False)¶
shift the mz values of all ions by the value of shift. Negative ions are masked out
- Parameters:
shift – value to shift mz
inplace – do operation on current ions
- Returns:
masked copy if not inplace, otherwise current ions
- property stddev¶
- Returns:
the std dev of the intensity per peak
- property tolerance¶
- property tolerance_type¶
- total_intensity()¶
total intensity of ions :return: total intensity
- windowed_filter(mz_window=7, num_ions=5, inplace=False)¶
filter ions by examining peaks in order of intensity and filtering peaks within a window
- Parameters:
mz_window – half size of mz_window for filtering. 0 = no filtering
num_ions – number of ions allowed in full mz_window
inplace – do operation on current ions, otherwise create copy
- Returns:
filtered copy if not inplace, otherwise current ions
- class masskit.spectra.ions.IonsIterator(ions)¶
Bases:
object
iterator over ion mz and intensity
- class masskit.spectra.ions.MassInfo(tolerance: float = None, tolerance_type: str = None, mass_type: str = None, neutral_loss: str = None, neutral_loss_charge: int = None, evenly_spaced=False, arrow_struct_accessor=None, arrow_struct_scalar=None)¶
Bases:
object
information about mass measurements of an ion peak
- masskit.spectra.ions.cosine_score_calc(spectrum1_mz, spectrum1_intensity, spectrum2_mz, spectrum2_intensity, index1, index2, mz_power=0.0, intensity_power=0.5, scale=999, skip_denom=False)¶
the Stein and Scott 94 cosine score. By convention, sqrt of score is taken and multiplied by 999. separated out from class and placed here so that can be jit compiled by numba.
- Parameters:
spectrum1_mz – query spectrum mz
spectrum1_intensity – query spectrum intensity
spectrum2_mz – the comparison spectrum2 mz
spectrum2_intensity – the comparison spectrum2 intensity
index1 – matched ions in spectrum1. may include duplicate matches
index2 – matched ions in spectrum2. may include duplicate matches
mz_power – what power to raise the mz value for each peak
intensity_power – what power to raise the intensity for each peak
scale – what value to scale the score by
skip_denom – skip computing the denominator
- Returns:
the cosine score
- masskit.spectra.ions.dedup_matches(products1, products2, index1, index2, tiebreaker='mz', skip_nomatch=True)¶
given a series of indices to matched peaks in two product ion sets, get rid of duplicate matches to peaks in the first product ion set, using a tiebreaker.
- Parameters:
products1 – first set of product ions
products2 – second set of product ions
index1 – indices into first set of product ions
index2 – indices into the second set of product ions
tiebreaker – tiebreak by ‘intensity’ or ‘mz’ of duplicate matches. ‘delete’ means don’t match either. defaults to ‘mz’
skip_nomatch – in the return values, skip missing matches to the first set of produc tions, defaults to True
- Returns:
matches of the first product ion set, matches of the second product ion set
- masskit.spectra.ions.intersect_hires(ions1_starts, ions1_stops, ions2_starts, ions2_stops)¶
find the intersections between two high resolution ion series
- Parameters:
ions1_starts – start positions of the first ion series to compare
ions1_stops – stop positions of the first ion series to compare
ions2_starts – start positions of the second ion series to compare
ions2_stops – stop positions of the second ion series to compare
- Returns:
matched peak indexes in ions1, matched peak indexes in ion2
- masskit.spectra.ions.my_intersect1d(ar1, ar2)¶
simplified version of numpy intersect1d. Pull outside of class so it can be jit compiled by numba (numba has only experimental class support). Note: this function does not work if there are ions in each spectra with identical mz!
- Parameters:
ar1 – mz values for one spectra
ar2 – mz values for another spectra
- Returns:
index of matches into array 1, index of matches into array 2
- masskit.spectra.ions.nce2ev(nce, precursor_mz, charge)¶
convert nce to ev. Equation for QE and taken from http://proteomicsnews.blogspot.com/2014/06/normalized-collision-energy-calculation.html
- Parameters:
nce – normalized collision energy
precursor_mz – precursor m/z
charge – charge
- Returns:
ev
masskit.spectra.ipython module¶
- masskit.spectra.ipython.is_notebook()¶
check to see if the code is running in a jupyter notebook
- Returns:
true if it is
masskit.spectra.join module¶
- class masskit.spectra.join.Join(*args, **kwargs)¶
Bases:
ABC
- abstract do_join(tiebreaker='mz')¶
do the join
- Parameters:
tiebreaker – how to deal with one to multiple matches. mz is closest mz value, intensity is closest intensity, None is report multiple matches
- Returns:
self
- static join_2_spectra(spectra1, spectra2, tiebreaker='mz')¶
left join of two spectra. iterate through all peaks for spectra1 and return joined spectra2 peaks. results also include unmatched spectra1 (left) peaks
- Parameters:
spectra1 – first spectra. all peaks included in result
spectra2 – second spectra
tiebreaker – how to deal with one to multiple matches to peaks in spectra1. mz is closest mz value, intensity is closest intensity, None is report multiple matches
- Returns:
list of peak ids from spectrum1, list of peak ids from spectrum2
- static join_3_spectra(experimental_spectrum, predicted_spectrum, theoretical_spectrum, tiebreaker='mz')¶
Join the peaks in a single experimental spectrum to a predicted spectrum and a theoretical spectrum. The join lists returned include all experimental and predicted peaks, but only the theoretical peaks that match the experimental spectra (and not necessarily the predicted spectrum). Note that it is possible to get a join where the theoretical peak matches the experimental peak but not the predicted peak.
- Parameters:
experimental_spectrum – the experimental spectrum
predicted_spectrum – the predicted spectrum
theoretical_spectrum – the annotated theoretical spectrum
tiebreaker – how to deal with one to multiple matches. mz is closest mz value, intensity is closest intensity, None is report multiple matches
- Returns:
3 lists with the peak ids. first is experimental peaks matching the theoretical peaks. Second are the predicted peaks that match the experimental peaks. Third are the theoretical peaks that match the experimental peaks. A value of None indicates no join.
- static list2float32(list_in)¶
- static list2float64(list_in)¶
- static list2int16(list_in)¶
- static list2uint16(list_in)¶
- static list2uint32(list_in)¶
- static list2uint64(list_in)¶
- to_pandas()¶
output the join results as a pandas dataframe
- class masskit.spectra.join.PairwiseJoin(exp_lib_map, theo_lib_map, *args, **kwargs)¶
Bases:
Join
Join 2 sets of spectra
- do_join(tiebreaker='mz')¶
do the join
- Parameters:
tiebreaker – how to deal with one to multiple matches. mz is closest mz value, intensity is closest intensity, None is report multiple matches
- Returns:
self
- class masskit.spectra.join.ThreewayJoin(exp_lib_map, pred_lib_map, theo_lib_map, *args, **kwargs)¶
Bases:
Join
Join 3 sets of spectra. The join lists returned include all experimental and predicted peaks, but only the theoretical peaks that match the experimental spectra (and not the predicted spectrum). Note that it is possible to get a join where the theoretical peak matches the experimental peak but not the predicted peak.
- do_join(tiebreaker='mz')¶
do the join
- Parameters:
tiebreaker – how to deal with one to multiple matches. mz is closest mz value, intensity is closest intensity, None is report multiple matches
- Returns:
self
masskit.spectra.spectrum_plotting module¶
- class masskit.spectra.spectrum_plotting.AnimateSpectrumPlot¶
Bases:
object
class used to create animated gifs of spectrum plots
- add_figure(fig, close_fig=True)¶
use the provided figure to draw an image for one frame of an animation
- Parameters:
fig – matplotlib figure
close_fig – if True, close the figure when done
- create_animated_gif(file, fps=5, pause=7)¶
create the animated gif from the stored figures
- Parameters:
file – the file to write the animated gif to
fps – how many frames per second the animation runs
pause – how many time to duplicate the last frame
- masskit.spectra.spectrum_plotting.draw_spectrum(spectrum, fig_format, output, figsize=(4, 2))¶
spectrum thumbnail plotting code called by the spectrum object writes to a stream
- Parameters:
spectrum – spectrum to plot
fig_format – format of the figure e.g. ‘png’
output – output stream
figsize – the size of the figure in inches
- masskit.spectra.spectrum_plotting.error_bar_plot(mz_in, intensity_in, stddev_in, color, linewidth=1)¶
plot spectra as colored error bars
- Parameters:
mz_in – mz values in daltons
intensity_in – intensity value
stddev_in – standard deviations from intensity_in
color – color of spectrum
alpha – alpha blending
- Returns:
line collection
- masskit.spectra.spectrum_plotting.error_bar_plot_lines(mz_in, intensity_in, stddev_in, color, vertical_cutoff=0.01, linewidth=1)¶
plot spectra as colored error bars using lines
- Parameters:
mz_in – mz values in daltons
intensity_in – intensity value
stddev_in – standard deviations from intensity_in
color – color of spectrum
vertical_cutoff – if the intensity/max_intensity is below this value, don’t plot the vertical line
- Returns:
line collection
- masskit.spectra.spectrum_plotting.line_plot(mz_in, intensity_in, color, linewidth=1)¶
create a LineCollection for plotting a spectrum
- Parameters:
mz_in – mz values
intensity_in – intensity value
color – color of spectrum
- Returns:
line collection
- masskit.spectra.spectrum_plotting.multiple_spectrum_plot(intensities, mz=None, mirror_intensities=None, dpi=100, min_mz=0, max_mz=2000, title='', subtitles=None, normalize=None, color=(0, 0, 1, 1), mirror_color=(1, 0, 0, 1))¶
create a spectrum plot. If subject spectrum is specified, will draw a mirror plot
- Parameters:
intensities – spectrum to be plotted. array-like
mz – mz values for plot. array-like parallel to intensities
mirror_intensities – intensities for mirror spectrum. array-like parallel to intensities.
dpi – dpi of the plot
min_mz – minimum mz of the plot
max_mz – maximum mz of the plot
title – title of the plot
subtitles – an array of strings, one title for each spectrum plot
normalize – norm the spectra intensities to this value
color – color of spectrum specified as RBGA tuple
mirror_color – color of mirrored spectrum specified as RGBA tuple
- Returns:
matplotlib figure
- masskit.spectra.spectrum_plotting.normalize_intensity(intensity, normalize=999.0)¶
norm the spectrum to the max peak
- Parameters:
intensity –
normalize – value to norm the spectrum to
- Returns:
- masskit.spectra.spectrum_plotting.spectrum_plot(axis, mz, intensity, stddev=None, mirror_mz=None, mirror_intensity=None, mirror_stddev=None, mirror=True, title=None, xlabel='m/z', ylabel='Intensity', title_size=None, label_size=None, max_mz=None, min_mz=0, color=(0, 0, 1, 1), mirror_color=(1, 0, 0, 1), stddev_color=(0.3, 0.3, 0.3, 0.5), left_label_color=(1, 0, 0, 1), normalize=1000, vertical_cutoff=0.0, vertical_multiplier=1.1, right_label=None, left_label=None, right_label_size=None, left_label_size=None, no_xticks=False, no_yticks=False, linewidth=1)¶
make a spectrum plot using matplotlib. if mirror_intensity is specified, will do a mirror plot
- Parameters:
axis – matplotlib axis
mz – mz values as array-like
intensity – intensity as array-like, parallel to mz array
stddev – standard deviation of the intensities
title – title of plot
xlabel – xlabel of plot
ylabel – ylabel of plot
title_size – size of title font
label_size – size of x and y axis label fonts
mirror_mz – mz values of mirror spectrum, corresponding to mirror_intensity. If none, uses mz
mirror_intensity – intensity of mirror spectrum as array-like, parallel to mz array. If none, don’t plot
mirror_stddev – standard deviation of the intensities
mirror – if true, mirror the plot if there are two spectra. Otherwise plot the two spectra together
max_mz – maximum mz to plot
min_mz – minimum mz to plot
normalize – if specified, norm the spectra to this value.
color – color of spectrum specified as RBGA tuple
mirror_color – color of mirrored spectrum specified as RGBA tuple
stddev_color – color of error bars
left_label_color – color of the left top label
vertical_cutoff – if the intensity/max_intensity is below this value, don’t plot the vertical line
vertical_multiplier – multiply times y max values to create white space
right_label – label for the top right of the fiture
left_label – label for the top left of the figure
right_label_size – size of label for the top right of the fiture
left_label_size – size of label for the top left of the figure
no_xticks – turn off x ticks and labels
no_yticks – turn off y ticks and lables
linewidth – width of plotted lines
- Returns:
peak_collection, mirror_peak_collection sets of peaks for picking
- masskit.spectra.spectrum_plotting.unsparsify_spectrum(spectrum, max_mz)¶
fill out array using spectrum values, placing zeros in missing mz values
- Parameters:
spectrum – the spectrum to operate on
max_mz – maximum mz value
- Returns:
the unsparsified array
masskit.spectra.theoretical_spectrum module¶
- class masskit.spectra.theoretical_spectrum.TheoreticalPeptideSpectrum(peptide, ion_types=None, mod_names=None, mod_positions=None, analysis_annotations=False, num_isotopes=2, *args, **kwargs)¶
Bases:
TheoreticalSpectrum
theoretical peptide spectrum
- class masskit.spectra.theoretical_spectrum.TheoreticalSpectrum(*args, **kwargs)¶
Bases:
Spectrum
base class to contain a theoretical spectrum
- masskit.spectra.theoretical_spectrum.annotate_peptide_spectrum(spectrum, peptide=None, precursor_charge=None, ion_types=None, mod_names=None, mod_positions=None)¶
annotate a spectrum with theoretically calculated ions