masskit.data_specs package

Submodules

masskit.data_specs.arrow_types module

class masskit.data_specs.arrow_types.JSONArrowArray

Bases: MasskitArrowArray

Extension array for JSONArrowType

class masskit.data_specs.arrow_types.JSONArrowScalarType

Bases: ExtensionScalar

arrow scalar extension class for jsonpickled objects

as_py(self)

Return this scalar as a Python object.

class masskit.data_specs.arrow_types.JSONArrowType

Bases: PyExtensionType

arrow type extension class for jsonpickled objects

to_pandas_dtype()

returns pandas extension dtype

class masskit.data_specs.arrow_types.MasskitArrowArray

Bases: ExtensionArray

Extension array for arrow arrays.

to_numpy()

Convert to numpy array of Spectrum objects

to_pylist()

Convert to list of Spectrum objects

class masskit.data_specs.arrow_types.MasskitPandasArray(values)

Bases: ExtensionArray

Base class for pandas extension arrays that contain a numpy array of objects

copy()

return a copy of the array

Returns:

copy of the array

dtype = None

isna()

return a boolean array indicating which elements of the data are None

nbytes() → int

The number of bytes needed to store this object in memory.

take(indexes, fill_value=None, allow_fill=False)

take values from array

Parameters:
  • indexes – the indexes of values to take

  • fill_value – if allow_fill is True, use this value when index is negative

  • allow_fill – use fill_value if index is negative, otherwise negative index indexes from back of array

Returns:

the taken array
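The fill behaviour described above can be illustrated on a plain Python list. This is a minimal sketch of the documented semantics only, not the masskit implementation; the function name take_sketch is hypothetical, and the rule that any negative index is filled follows the parameter description above.

```python
def take_sketch(values, indexes, fill_value=None, allow_fill=False):
    """Return the elements of `values` selected by `indexes`."""
    result = []
    for i in indexes:
        if allow_fill and i < 0:
            # With allow_fill, a negative index marks a missing value.
            result.append(fill_value)
        else:
            # Otherwise a negative index counts from the back of the array.
            result.append(values[i])
    return result


data = ["a", "b", "c"]
filled = take_sketch(data, [0, -1], fill_value=None, allow_fill=True)
wrapped = take_sketch(data, [0, -1])
```

With allow_fill=True the last position is replaced by fill_value; with the default allow_fill=False the same index selects the last element.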

to_mgf(fp)

write to an mgf file

Parameters:

fp – stream or filename

to_msp(fp, annotate_peptide=False, ion_types=None)

write to an msp file

Parameters:
  • fp – stream or filename

  • annotate_peptide – annotate the spectrum as a peptide

  • ion_types – ion types to annotate

to_mzxml(fp)

write to an mzXML file

Parameters:

fp – stream or filename

class masskit.data_specs.arrow_types.MasskitPandasDtype

Bases: ExtensionDtype

classmethod construct_array_type()

Return the array type associated with this dtype.

Returns:

type

na_value = None

name = None

type = None

class masskit.data_specs.arrow_types.MolArrowArray

Bases: MasskitArrowArray

Extension array for MolArrowType

class masskit.data_specs.arrow_types.MolArrowScalarType

Bases: ExtensionScalar

arrow scalar extension class for Mols

as_py(self)

Return this scalar as a Python object.

class masskit.data_specs.arrow_types.MolArrowType

Bases: PyExtensionType

arrow type extension class for Mols

to_pandas_dtype()

returns pandas extension dtype

class masskit.data_specs.arrow_types.MolPandasArray(values)

Bases: MasskitPandasArray

dtype = <masskit.data_specs.arrow_types.MolPandasDtype object>

class masskit.data_specs.arrow_types.MolPandasDtype

Bases: MasskitPandasDtype

classmethod construct_array_type()

Return the array type associated with this dtype.

Returns:

type

na_value = nan

name = 'Mol'

class masskit.data_specs.arrow_types.PathArrowType

Bases: JSONArrowType

to_pandas_dtype()

returns pandas extension dtype

class masskit.data_specs.arrow_types.PathPandasArray(values)

Bases: MasskitPandasArray

dtype = <masskit.data_specs.arrow_types.PathPandasDtype object>

class masskit.data_specs.arrow_types.PathPandasDtype

Bases: MasskitPandasDtype

classmethod construct_array_type()

Return the array type associated with this dtype.

Returns:

type

na_value = nan

name = 'shortest_path'

type

alias of object

class masskit.data_specs.arrow_types.SpectrumArrowArray

Bases: MasskitArrowArray

arrow array that holds spectra

class masskit.data_specs.arrow_types.SpectrumArrowScalarType

Bases: ExtensionScalar

arrow scalar extension class for spectra

as_py(self)

Return this scalar as a Python object.

class masskit.data_specs.arrow_types.SpectrumArrowType(storage_type=None)

Bases: PyExtensionType

arrow type extension class for spectra parameterized by storage_type, which can be molecules_struct or peptide_struct

to_pandas_dtype()

returns pandas extension dtype

class masskit.data_specs.arrow_types.SpectrumPandasArray(values)

Bases: MasskitPandasArray

dtype = <masskit.data_specs.arrow_types.SpectrumPandasDtype object>

class masskit.data_specs.arrow_types.SpectrumPandasDtype

Bases: MasskitPandasDtype

classmethod construct_array_type()

Return the array type associated with this dtype.

Returns:

type

na_value = nan

name = 'Spectrum'

type

alias of Spectrum

masskit.data_specs.file_schemas module

masskit.data_specs.schemas module

masskit.data_specs.schemas.compose_fields(*field_lists)

Compose field lists, retaining order but removing redundancies

Parameters:

field_lists – variable number of field lists

Returns:

combined non-redundant field list
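An order-preserving merge of this kind can be sketched in plain Python. The example below is illustrative only and is not the masskit implementation: fields are modelled as hypothetical (name, type) tuples rather than pyarrow fields, and it assumes that a field counts as redundant when its name has already been seen.

```python
def compose_fields_sketch(*field_lists):
    """Merge field lists, keeping the first occurrence of each field name."""
    seen = set()
    combined = []
    for field_list in field_lists:
        for field in field_list:
            name = field[0]
            if name not in seen:
                # First time this name appears: keep it, in original order.
                seen.add(name)
                combined.append(field)
    return combined


base = [("id", "uint64"), ("name", "string")]
extra = [("name", "string"), ("charge", "int16")]
merged = compose_fields_sketch(base, extra)
```

Here the repeated "name" field from the second list is dropped and "charge" is appended after the fields of the first list.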

masskit.data_specs.schemas.create_array(array, schema, type_name)

create an arrow array from a python or numpy array-like

Parameters:
  • schema – schema to use

  • type_name – name of the type

  • array – the input array

Returns:

arrow array

masskit.data_specs.schemas.create_getter(name)

create a generic property getter on the props dictionary of an object

Parameters:

name – name of the property

Returns:

getter function

masskit.data_specs.schemas.create_scalar(value, schema, type_name)

create a scalar from a python or numpy value

Parameters:
  • schema – schema to use

  • type_name – name of the type

  • value – the input value

Returns:

arrow scalar

masskit.data_specs.schemas.create_setter(name)

create a generic property setter on the props dictionary of an object

Parameters:

name – name of the property

Returns:

setter function

masskit.data_specs.schemas.get_field_int_metadata(field, name: str) → int

returns int encoded in metadata of field

Parameters:
  • field – arrow field, e.g. table.field('fp')

  • name – name of metadata

Returns:

int value stored in the field metadata under name

masskit.data_specs.schemas.massinfo2struct(mass_info)

put mass_info data into a pyarrow StructArray

Parameters:

mass_info – mass_info

Returns:

StructArray

masskit.data_specs.schemas.name_to_type(schema, type_name)

convert field name to arrow type of field

Parameters:
  • schema – schema to search

  • type_name – name of type

Returns:

type

masskit.data_specs.schemas.populate_properties(class_in, fields=[pyarrow.Field<id: uint64>, pyarrow.Field<instrument: string>, pyarrow.Field<instrument_type: string>, pyarrow.Field<instrument_model: string>, pyarrow.Field<ion_mode: string>, pyarrow.Field<ionization: string>, pyarrow.Field<name: string>, pyarrow.Field<casno: string>, pyarrow.Field<synonyms: string>, pyarrow.Field<scan: string>, pyarrow.Field<collision_energy: float>, pyarrow.Field<retention_time: double>, pyarrow.Field<collision_gas: string>, pyarrow.Field<insource_voltage: int64>, pyarrow.Field<sample_inlet: string>, pyarrow.Field<ev: float>, pyarrow.Field<nce: float>, pyarrow.Field<charge: int16>, pyarrow.Field<precursor_mz: double>, pyarrow.Field<exact_mass: double>, pyarrow.Field<exact_mw: double>, pyarrow.Field<set: dictionary<values=string, indices=int32, ordered=0>>, pyarrow.Field<composition: dictionary<values=string, indices=int32, ordered=0>>, pyarrow.Field<spectrum_fp: large_list<item: uint8>>, pyarrow.Field<spectrum_fp_count: int32>, pyarrow.Field<peptide: string>, pyarrow.Field<peptide_len: int32>, pyarrow.Field<peptide_type: dictionary<values=string, indices=int32, ordered=0>>, pyarrow.Field<mod_names: large_list<item: int16>>, pyarrow.Field<mod_positions: large_list<item: int32>>, pyarrow.Field<protein_id: large_list<item: string>>, pyarrow.Field<column: string>, pyarrow.Field<experimental_ri: float>, pyarrow.Field<experimental_ri_data: int32>, pyarrow.Field<experimental_ri_error: float>, pyarrow.Field<stdnp: float>, pyarrow.Field<stdnp_data: int32>, pyarrow.Field<stdnp_error: float>, pyarrow.Field<stdpolar: float>, pyarrow.Field<stdpolar_data: int32>, pyarrow.Field<stdpolar_error: float>, pyarrow.Field<vial_id: int64>, pyarrow.Field<aromatic_rings: int32>, pyarrow.Field<ecfp4: large_list<item: uint8>>, pyarrow.Field<ecfp4_count: int32>, pyarrow.Field<estimated_ri: float>, pyarrow.Field<estimated_ri_error: float>, pyarrow.Field<estimated_ri_stdnp: float>, 
pyarrow.Field<estimated_ri_stdnp_error: float>, pyarrow.Field<estimated_ri_stdpolar: float>, pyarrow.Field<estimated_ri_stdpolar_error: float>, pyarrow.Field<formula: string>, pyarrow.Field<has_2d: bool>, pyarrow.Field<has_conformer: bool>, pyarrow.Field<has_tms: int32>, pyarrow.Field<hba: int32>, pyarrow.Field<hbd: int32>, pyarrow.Field<inchi_key: string>, pyarrow.Field<inchi_key_orig: string>, pyarrow.Field<isomeric_smiles: string>, pyarrow.Field<num_atoms: int32>, pyarrow.Field<num_undef_double: int32>, pyarrow.Field<num_undef_stereo: int32>, pyarrow.Field<rotatable_bonds: int32>, pyarrow.Field<smiles: string>, pyarrow.Field<tpsa: float>, pyarrow.Field<logp: float>, pyarrow.Field<fragments: int32>])

given a class (or any object), create a set of properties from a list of fields

Parameters:
  • class_in – the class/object to be modified

  • fields – a list of pyarrow fields whose names will be used to create properties
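The pattern of generating properties backed by a props dictionary can be sketched as follows. This is a simplified illustration under stated assumptions, not the masskit code: the helper names ending in _sketch and the Record class are hypothetical, field names are passed as plain strings instead of pyarrow fields, and returning None for an unset property is an assumption.

```python
def create_getter_sketch(name):
    """Build a getter that reads `name` from the object's props dictionary."""
    def getter(self):
        return self.props.get(name)  # assumed: unset properties read as None
    return getter


def create_setter_sketch(name):
    """Build a setter that writes `name` into the object's props dictionary."""
    def setter(self, value):
        self.props[name] = value
    return setter


def populate_properties_sketch(class_in, field_names):
    """Attach one property per field name to the class."""
    for name in field_names:
        prop = property(create_getter_sketch(name), create_setter_sketch(name))
        setattr(class_in, name, prop)


class Record:
    def __init__(self):
        self.props = {}


populate_properties_sketch(Record, ["precursor_mz", "charge"])
record = Record()
record.charge = 2
```

After this, record.charge reads and writes record.props["charge"], so new fields can be exposed as attributes without writing each property by hand.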

masskit.data_specs.schemas.set_field_int_metadata(schema, field: str, name: str, value: int)

returns a new arrow schema in which the metadata entry name of the given field is set to an int value

Parameters:
  • schema – input schema

  • field – name of field

  • name – name of metadata

  • value – int value to store in the metadata (will be stored as bytes)

Returns:

new schema updated from input schema
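Arrow metadata holds byte strings for both keys and values, so an int has to be encoded on write and decoded on read. The sketch below shows one possible round trip using a plain dict in place of field metadata. The decimal-string encoding, the helper names, and the "fp_size" key are assumptions made for illustration; the encoding masskit actually uses is not stated here.

```python
def set_int_metadata_sketch(metadata, name, value):
    """Return a copy of `metadata` with `name` set to the encoded int."""
    updated = dict(metadata)  # leave the input mapping unchanged
    updated[name.encode()] = str(value).encode()  # assumed decimal encoding
    return updated


def get_int_metadata_sketch(metadata, name):
    """Decode the int stored under `name`."""
    return int(metadata[name.encode()].decode())


original = {}
with_size = set_int_metadata_sketch(original, "fp_size", 4096)
size = get_int_metadata_sketch(with_size, "fp_size")
```

Returning an updated copy rather than modifying the input mirrors the documented behaviour of returning a new schema.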

masskit.data_specs.schemas.subtract_fields(field_list, fields2bsubtracted)

delete fields from a field list

Parameters:
  • field_list – field list to be edited

  • fields2bsubtracted – list of fields to be deleted

Returns:

edited field list
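Removing fields from a list can be sketched in the same spirit. This is an illustration only, not the masskit implementation: fields are modelled as hypothetical (name, type) tuples, and matching fields by name is an assumption.

```python
def subtract_fields_sketch(field_list, fields_to_remove):
    """Return `field_list` without the fields named in `fields_to_remove`."""
    removed_names = {field[0] for field in fields_to_remove}
    # Keep the remaining fields in their original order.
    return [field for field in field_list if field[0] not in removed_names]


all_fields = [("id", "uint64"), ("name", "string"), ("charge", "int16")]
trimmed = subtract_fields_sketch(all_fields, [("name", "string")])
```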

masskit.data_specs.spectral_library module

class masskit.data_specs.spectral_library.LibraryAccessor(pandas_obj)

Bases: object

the base pandas accessor class. To use the accessor, import LibraryAccessor into your code.

Notes:
  • to add info to the dataframe itself, use _obj.tandem_peptide_library.__dict__['info'] = info

  • in the future, by caching the record id, the serialization functions can be modified to read in chunks

copy()

display()

masskit.data_specs.spectral_library.display_masskit_df(df)

Module contents