masskit.data_specs package¶
Submodules¶
masskit.data_specs.arrow_types module¶
- class masskit.data_specs.arrow_types.JSONArrowArray¶
Bases:
MasskitArrowArray
Extension array for JSONArrowType
- class masskit.data_specs.arrow_types.JSONArrowScalarType¶
Bases:
ExtensionScalar
arrow scalar extension class for jsonpickled objects
- as_py(self)¶
Return this scalar as a Python object.
- class masskit.data_specs.arrow_types.JSONArrowType¶
Bases:
PyExtensionType
arrow type extension class for jsonpickled objects
- to_pandas_dtype()¶
returns pandas extension dtype
- class masskit.data_specs.arrow_types.MasskitArrowArray¶
Bases:
ExtensionArray
Extension array for arrow arrays.
- to_numpy()¶
Convert to numpy array of Spectrum objects
- to_pylist()¶
Convert to list of Spectrum objects
- class masskit.data_specs.arrow_types.MasskitPandasArray(values)¶
Bases:
ExtensionArray
Base class for pandas extension arrays that contain a numpy array of objects
- copy()¶
return a copy of the array
- Returns:
copy of the array
- dtype = None¶
- isna()¶
returns a boolean array indicating which elements of the data are None
- nbytes() → int¶
The number of bytes needed to store this object in memory.
- take(indexes, fill_value=None, allow_fill=False)¶
take values from array
- Parameters:
indexes – the indexes of values to take
fill_value – if allow_fill is True, use this value when index is negative
allow_fill – use fill_value if index is negative, otherwise negative index indexes from back of array
- Returns:
the taken array
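The interaction between allow_fill and negative indexes, as described in the parameters above, can be illustrated with a minimal sketch on a plain Python list. This is not masskit's implementation; the function name take_values is hypothetical and the behavior is taken only from the parameter descriptions.

```python
def take_values(values, indexes, fill_value=None, allow_fill=False):
    """Sketch of take() semantics on a plain list (hypothetical helper)."""
    result = []
    for i in indexes:
        if allow_fill and i < 0:
            # with allow_fill, a negative index marks a missing value
            result.append(fill_value)
        else:
            # otherwise a negative index counts from the back of the list
            result.append(values[i])
    return result


from_back = take_values(["a", "b", "c"], [0, -1])
filled = take_values(["a", "b", "c"], [0, -1], fill_value="?", allow_fill=True)
```

In the first call the index -1 selects the last element; in the second it is replaced by fill_value.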
- to_mgf(fp)¶
write to an mgf file
- Parameters:
fp – stream or filename
- to_msp(fp, annotate_peptide=False, ion_types=None)¶
write to an msp file
- Parameters:
fp – stream or filename
annotate_peptide – annotate the spectrum as a peptide
ion_types – ion types to annotate
- to_mzxml(fp)¶
write to an mzxml file
- Parameters:
fp – stream or filename
- class masskit.data_specs.arrow_types.MasskitPandasDtype¶
Bases:
ExtensionDtype
- classmethod construct_array_type()¶
Return the array type associated with this dtype.
- Returns:
type
- na_value = None¶
- name = None¶
- type = None¶
- class masskit.data_specs.arrow_types.MolArrowArray¶
Bases:
MasskitArrowArray
Extension array for MolArrowType
- class masskit.data_specs.arrow_types.MolArrowScalarType¶
Bases:
ExtensionScalar
arrow scalar extension class for Mols
- as_py(self)¶
Return this scalar as a Python object.
- class masskit.data_specs.arrow_types.MolArrowType¶
Bases:
PyExtensionType
arrow type extension class for Mols
- to_pandas_dtype()¶
returns pandas extension dtype
- class masskit.data_specs.arrow_types.MolPandasArray(values)¶
Bases:
MasskitPandasArray
- dtype = <masskit.data_specs.arrow_types.MolPandasDtype object>¶
- class masskit.data_specs.arrow_types.MolPandasDtype¶
Bases:
MasskitPandasDtype
- classmethod construct_array_type()¶
Return the array type associated with this dtype.
- Returns:
type
- na_value = nan¶
- name = 'Mol'¶
- class masskit.data_specs.arrow_types.PathArrowType¶
Bases:
JSONArrowType
- to_pandas_dtype()¶
returns pandas extension dtype
- class masskit.data_specs.arrow_types.PathPandasArray(values)¶
Bases:
MasskitPandasArray
- dtype = <masskit.data_specs.arrow_types.PathPandasDtype object>¶
- class masskit.data_specs.arrow_types.PathPandasDtype¶
Bases:
MasskitPandasDtype
- classmethod construct_array_type()¶
Return the array type associated with this dtype.
- Returns:
type
- na_value = nan¶
- name = 'shortest_path'¶
- type¶
alias of
object
- class masskit.data_specs.arrow_types.SpectrumArrowArray¶
Bases:
MasskitArrowArray
arrow array that holds spectra
- class masskit.data_specs.arrow_types.SpectrumArrowScalarType¶
Bases:
ExtensionScalar
arrow scalar extension class for spectra
- as_py(self)¶
Return this scalar as a Python object.
- class masskit.data_specs.arrow_types.SpectrumArrowType(storage_type=None)¶
Bases:
PyExtensionType
arrow type extension class for spectra parameterized by storage_type, which can be molecules_struct or peptide_struct
- to_pandas_dtype()¶
returns pandas extension dtype
- class masskit.data_specs.arrow_types.SpectrumPandasArray(values)¶
Bases:
MasskitPandasArray
- dtype = <masskit.data_specs.arrow_types.SpectrumPandasDtype object>¶
masskit.data_specs.file_schemas module¶
masskit.data_specs.schemas module¶
- masskit.data_specs.schemas.compose_fields(*field_lists)¶
Compose field lists, retaining order but removing redundancies
- Parameters:
field_lists – variable number of field lists
- Returns:
combined non-redundant field list
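An order-preserving merge of this kind can be sketched in plain Python. The sketch below is not masskit's code: it uses (name, type) tuples as stand-ins for pyarrow fields, the function name compose_field_lists is hypothetical, and it assumes two fields are redundant when they compare equal.

```python
def compose_field_lists(*field_lists):
    """Merge field lists, keeping first occurrences in order (hypothetical helper)."""
    seen = set()
    combined = []
    for field_list in field_lists:
        for field in field_list:
            if field not in seen:
                # first time this field appears: keep it and remember it
                seen.add(field)
                combined.append(field)
    return combined


base_fields = [("id", "uint64"), ("name", "string")]
peptide_fields = [("name", "string"), ("charge", "int16")]
combined = compose_field_lists(base_fields, peptide_fields)
```

Here the duplicate name field from the second list is dropped and charge is appended after the fields of the first list.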
- masskit.data_specs.schemas.create_array(array, schema, type_name)¶
create an arrow array from a python or numpy array-like
- Parameters:
schema – schema to use
type_name – name of the type
array – the input array
- Returns:
arrow array
- masskit.data_specs.schemas.create_getter(name)¶
create a generic property getter on the props dictionary of an object
- Parameters:
name – name of the property
- Returns:
getter function
- masskit.data_specs.schemas.create_scalar(value, schema, type_name)¶
create a scalar from a python or numpy value
- Parameters:
schema – schema to use
type_name – name of the type
value – the input value
- Returns:
arrow scalar
- masskit.data_specs.schemas.create_setter(name)¶
create a generic property setter on the props dictionary of an object
- Parameters:
name – name of the property
- Returns:
setter function
- masskit.data_specs.schemas.get_field_int_metadata(field, name: str) → int¶
returns the int encoded in the metadata of a field
- Parameters:
field – arrow field, e.g. table.field('fp')
name – name of metadata
- Returns:
int value of the metadata entry
- masskit.data_specs.schemas.massinfo2struct(mass_info)¶
put mass_info data into a pyarrow StructArray
- Parameters:
mass_info – mass_info
- Returns:
StructArray
- masskit.data_specs.schemas.name_to_type(schema, type_name)¶
convert field name to arrow type of field
- Parameters:
schema – schema to search
type_name – name of type
- Returns:
type
- masskit.data_specs.schemas.populate_properties(class_in, fields=[pyarrow.Field<id: uint64>, pyarrow.Field<instrument: string>, pyarrow.Field<instrument_type: string>, pyarrow.Field<instrument_model: string>, pyarrow.Field<ion_mode: string>, pyarrow.Field<ionization: string>, pyarrow.Field<name: string>, pyarrow.Field<casno: string>, pyarrow.Field<synonyms: string>, pyarrow.Field<scan: string>, pyarrow.Field<collision_energy: float>, pyarrow.Field<retention_time: double>, pyarrow.Field<collision_gas: string>, pyarrow.Field<insource_voltage: int64>, pyarrow.Field<sample_inlet: string>, pyarrow.Field<ev: float>, pyarrow.Field<nce: float>, pyarrow.Field<charge: int16>, pyarrow.Field<precursor_mz: double>, pyarrow.Field<exact_mass: double>, pyarrow.Field<exact_mw: double>, pyarrow.Field<set: dictionary<values=string, indices=int32, ordered=0>>, pyarrow.Field<composition: dictionary<values=string, indices=int32, ordered=0>>, pyarrow.Field<spectrum_fp: large_list<item: uint8>>, pyarrow.Field<spectrum_fp_count: int32>, pyarrow.Field<peptide: string>, pyarrow.Field<peptide_len: int32>, pyarrow.Field<peptide_type: dictionary<values=string, indices=int32, ordered=0>>, pyarrow.Field<mod_names: large_list<item: int16>>, pyarrow.Field<mod_positions: large_list<item: int32>>, pyarrow.Field<protein_id: large_list<item: string>>, pyarrow.Field<column: string>, pyarrow.Field<experimental_ri: float>, pyarrow.Field<experimental_ri_data: int32>, pyarrow.Field<experimental_ri_error: float>, pyarrow.Field<stdnp: float>, pyarrow.Field<stdnp_data: int32>, pyarrow.Field<stdnp_error: float>, pyarrow.Field<stdpolar: float>, pyarrow.Field<stdpolar_data: int32>, pyarrow.Field<stdpolar_error: float>, pyarrow.Field<vial_id: int64>, pyarrow.Field<aromatic_rings: int32>, pyarrow.Field<ecfp4: large_list<item: uint8>>, pyarrow.Field<ecfp4_count: int32>, pyarrow.Field<estimated_ri: float>, pyarrow.Field<estimated_ri_error: float>, pyarrow.Field<estimated_ri_stdnp: float>, 
pyarrow.Field<estimated_ri_stdnp_error: float>, pyarrow.Field<estimated_ri_stdpolar: float>, pyarrow.Field<estimated_ri_stdpolar_error: float>, pyarrow.Field<formula: string>, pyarrow.Field<has_2d: bool>, pyarrow.Field<has_conformer: bool>, pyarrow.Field<has_tms: int32>, pyarrow.Field<hba: int32>, pyarrow.Field<hbd: int32>, pyarrow.Field<inchi_key: string>, pyarrow.Field<inchi_key_orig: string>, pyarrow.Field<isomeric_smiles: string>, pyarrow.Field<num_atoms: int32>, pyarrow.Field<num_undef_double: int32>, pyarrow.Field<num_undef_stereo: int32>, pyarrow.Field<rotatable_bonds: int32>, pyarrow.Field<smiles: string>, pyarrow.Field<tpsa: float>, pyarrow.Field<logp: float>, pyarrow.Field<fragments: int32>])¶
given a class (or any object), create a set of properties from a list of fields
- Parameters:
class_in – the class/object to be modified
fields – a list of pyarrow fields whose names will be used to create properties
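The pattern behind create_getter, create_setter and populate_properties, properties that read and write a props dictionary, might look roughly like the sketch below. It is an illustration only: the helper names, the Record class and the choice to return None for a missing entry are assumptions, and plain strings stand in for pyarrow field names.

```python
def make_getter(name):
    """Build a getter that reads `name` from the object's props dict."""
    def getter(self):
        # assumption: a property that was never set reads as None
        return self.props.get(name)
    return getter


def make_setter(name):
    """Build a setter that writes `name` into the object's props dict."""
    def setter(self, value):
        self.props[name] = value
    return setter


def add_properties(cls, names):
    """Attach one property per name to the class (hypothetical helper)."""
    for name in names:
        setattr(cls, name, property(make_getter(name), make_setter(name)))


class Record:
    def __init__(self):
        self.props = {}


add_properties(Record, ["precursor_mz", "charge"])
record = Record()
record.charge = 2
```

After the assignment, the value is stored in record.props under the key charge, and record.precursor_mz reads as None because it was never set.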
- masskit.data_specs.schemas.set_field_int_metadata(schema, field: str, name: str, value: int)¶
returns a new arrow schema in which the metadata entry name of field is set to the int value
- Parameters:
schema – input schema
field – name of field
name – name of metadata
value – int value of the metadata entry (will be stored as bytes)
- Returns:
new schema updated from input schema
- masskit.data_specs.schemas.subtract_fields(field_list, fields2bsubtracted)¶
delete fields from a field list
- Parameters:
field_list – field list to be edited
fields2bsubtracted – list of fields to be deleted
- Returns:
edited field list
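Removing fields from a field list can be sketched as a simple filter. The sketch is not masskit's code: subtract_field_list is a hypothetical name, (name, type) tuples stand in for pyarrow fields, and it assumes a new list is returned rather than the input being edited in place.

```python
def subtract_field_list(field_list, fields_to_remove):
    """Return the fields of field_list that are not in fields_to_remove."""
    removed = set(fields_to_remove)
    # keep the original order of the remaining fields
    return [field for field in field_list if field not in removed]


all_fields = [("id", "uint64"), ("name", "string"), ("charge", "int16")]
remaining = subtract_field_list(all_fields, [("name", "string")])
```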
masskit.data_specs.spectral_library module¶
- class masskit.data_specs.spectral_library.LibraryAccessor(pandas_obj)¶
Bases:
object
the base pandas accessor class. To use the accessor, import LibraryAccessor into your code.
Notes:
- to add info to the dataframe itself, use _obj.tandem_peptide_library.__dict__['info'] = info
- in the future, by caching the record id, the serialization functions can be modified to read in chunks
- copy()¶
- display()¶
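Pandas accessors are registered when the module that defines them is imported, which is presumably why LibraryAccessor has to be imported before it can be used. The sketch below shows the general mechanism with pandas' own registration decorator. The accessor name demo_library, the class DemoLibraryAccessor and its num_rows method are hypothetical and are not part of masskit.

```python
import pandas as pd


@pd.api.extensions.register_dataframe_accessor("demo_library")
class DemoLibraryAccessor:
    """Hypothetical accessor; registration happens when this class is defined."""

    def __init__(self, pandas_obj):
        # pandas passes in the dataframe the accessor is attached to
        self._obj = pandas_obj

    def copy(self):
        # return a copy of the underlying dataframe
        return self._obj.copy()

    def num_rows(self):
        return len(self._obj)


df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})
row_count = df.demo_library.num_rows()
df_copy = df.demo_library.copy()
```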
- masskit.data_specs.spectral_library.display_masskit_df(df)¶