masskit_ai package¶
Subpackages¶
- masskit_ai.apps package
- masskit_ai.conf package
- masskit_ai.mol package
- Subpackages
- Submodules
- masskit_ai.mol.mol_datasets module
- masskit_ai.mol.mol_embed module
- masskit_ai.mol.mol_prediction module
- Module contents
- masskit_ai.spectrum package
- Subpackages
- Submodules
- masskit_ai.spectrum.spectrum_base_objects module
- masskit_ai.spectrum.spectrum_datasets module
- masskit_ai.spectrum.spectrum_embed module
Embed1D
Embed1D.charge_channels()
Embed1D.charge_embed()
Embed1D.charge_singleton_channels()
Embed1D.charge_singleton_embed()
Embed1D.ev_channels()
Embed1D.ev_embed()
Embed1D.ev_singleton_channels()
Embed1D.ev_singleton_embed()
Embed1D.nce_channels()
Embed1D.nce_embed()
Embed1D.nce_singleton_channels()
Embed1D.nce_singleton_embed()
- masskit_ai.spectrum.spectrum_lightning module
BaseSpectrumLightningModule
BaseSpectrumLightningModule.calc_loss()
BaseSpectrumLightningModule.configure_optimizers()
BaseSpectrumLightningModule.forward()
BaseSpectrumLightningModule.on_test_epoch_end()
BaseSpectrumLightningModule.on_train_epoch_end()
BaseSpectrumLightningModule.on_validation_epoch_end()
BaseSpectrumLightningModule.test_step()
BaseSpectrumLightningModule.training_step()
BaseSpectrumLightningModule.training_step_end()
BaseSpectrumLightningModule.validation_step()
BaseSpectrumLightningModule.validation_test_epoch_end()
BaseSpectrumLightningModule.validation_test_step()
SpectrumLightningModule
- masskit_ai.spectrum.spectrum_losses module
- masskit_ai.spectrum.spectrum_prediction module
PeptideSpectrumPredictor
PeptideSpectrumPredictor.add_item()
PeptideSpectrumPredictor.create_dataloaders()
PeptideSpectrumPredictor.create_items()
PeptideSpectrumPredictor.create_mz_tolerance()
PeptideSpectrumPredictor.finalize_items()
PeptideSpectrumPredictor.make_spectrum()
PeptideSpectrumPredictor.single_prediction()
PeptideSpectrumPredictor.write_items()
SinglePeptideSpectrumPredictor
finalize_spectrum()
- Module contents
- masskit_ai.test_fixtures package
Submodules¶
masskit_ai.base_datasets module¶
- class masskit_ai.base_datasets.BaseDataset(*args: Any, **kwargs: Any)¶
Bases: Dataset, ABC
abstract base class for a NIST Dataset. Note: only one of these is created per GPU per epoch (or per entire run?)
- property data¶
- abstract get_data_row(index)¶
given the index, return corresponding data for the index
- get_x(data_row)¶
given the data row, return the input to the network
- abstract get_y(data_row)¶
given the data row, return the target of the network
- init_copy(worker_id=0, num_workers=1)¶
initialize a copy of a Dataset. PyTorch Lightning constructs Datasets in the main thread and then forks, so this function is called in the worker_init_fn to initialize each worker's Dataset
- Parameters:
worker_id – worker id of the thread, defaults to 0
num_workers – number of workers total, defaults to 1
- to_pandas()¶
return data as pandas dataframe
- Raises:
NotImplementedError – not implemented
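A minimal sketch of a concrete subclass, assuming the records live in a plain list of dicts with hypothetical "x" and "y" keys; the real constructor arguments and storage backend depend on the rest of the package:

```python
from masskit_ai.base_datasets import BaseDataset

class ListDataset(BaseDataset):
    """hypothetical BaseDataset subclass backed by a list of dicts"""

    def __init__(self, records, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.records = records  # e.g. [{"x": tensor, "y": tensor}, ...]

    def get_data_row(self, index):
        # given the index, return the corresponding data row
        return self.records[index]

    def get_x(self, data_row):
        # input to the network for this row ("x" is an assumed key)
        return data_row["x"]

    def get_y(self, data_row):
        # target of the network for this row ("y" is an assumed key)
        return data_row["y"]
```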
- class masskit_ai.base_datasets.DataframeDataset(*args: Any, **kwargs: Any)¶
Bases: BaseDataset
dataset for a dataframe
- get_data_row(index)¶
given the index, return corresponding data for the index
- get_y(data_row)¶
given the data row, return the target of the network
- to_pandas()¶
return data as pandas dataframe
- Raises:
NotImplementedError – not implemented
masskit_ai.base_losses module¶
- class masskit_ai.base_losses.BaseLoss(*args: Any, **kwargs: Any)¶
Bases: Module, ABC
abstract base class for losses. A loss is implemented as a pytorch module.
- abstract forward(output, batch, params=None) → torch.Tensor¶
calculate the loss
- Parameters:
output – output dictionary from the model
batch – batch data from the dataloader
params – optional dictionary of parameters, such as epoch type
- Returns:
loss tensor
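A sketch of a concrete loss, assuming hypothetical field names output.y_prime for the prediction and batch.y for the target (the actual ModelOutput/ModelInput fields may differ):

```python
import torch
import torch.nn.functional as F
from masskit_ai.base_losses import BaseLoss

class HuberLoss(BaseLoss):
    """hypothetical BaseLoss subclass wrapping torch's built-in Huber loss"""

    def forward(self, output, batch, params=None) -> torch.Tensor:
        # y_prime/y are assumed field names for prediction and target
        return F.huber_loss(output.y_prime, batch.y)
```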
- class masskit_ai.base_losses.L1Loss(*args: Any, **kwargs: Any)¶
Bases: BaseLoss
l1 loss
- forward(output, batch, params=None) → torch.Tensor¶
calculate the loss
- Parameters:
output – output dictionary from the model
batch – batch data from the dataloader
params – optional dictionary of parameters, such as epoch type
- Returns:
loss tensor
- class masskit_ai.base_losses.MSEKLLoss(*args: Any, **kwargs: Any)¶
Bases: BaseLoss
mean square error plus kl divergence
- forward(output, batch, params=None) → torch.Tensor¶
calculate the loss
- Parameters:
output – output dictionary from the model
batch – batch data from the dataloader
params – optional dictionary of parameters, such as epoch type
- Returns:
loss tensor
- class masskit_ai.base_losses.MSELoss(*args: Any, **kwargs: Any)¶
Bases: BaseLoss
mean square error
- forward(output, batch, params=None) → torch.Tensor¶
calculate the loss
- Parameters:
output – output dictionary from the model
batch – batch data from the dataloader
params – optional dictionary of parameters, such as epoch type
- Returns:
loss tensor
- masskit_ai.base_losses.get_namedtuple_dict(container, key)¶
masskit_ai.base_objects module¶
masskit_ai.callbacks module¶
masskit_ai.embed module¶
- class masskit_ai.embed.BasicEmbed(config)¶
Bases: ABC
base embedding class
- property channels¶
return the number of channels in the encoding
- Returns:
the number of channels
- embed(row)¶
call the requested embedding functions as listed in config.ml.embedding.embeddings
- Parameters:
row – the data row
- Returns:
the concatenated one hot tensor of the embeddings
- static list2one_hot(list_in, num_classes)¶
convert a list of integers into a one hot tensor
- Parameters:
list_in – the list of integers
num_classes – the number of classes
- Returns:
the one hot tensor
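The same conversion can be expressed with torch's built-in one hot helper; a small illustration (not the actual implementation):

```python
import torch
import torch.nn.functional as F

list_in = [0, 2, 1]
num_classes = 4
one_hot = F.one_hot(torch.tensor(list_in), num_classes=num_classes)
# tensor([[1, 0, 0, 0],
#         [0, 0, 1, 0],
#         [0, 1, 0, 0]])
```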
- class masskit_ai.embed.Embed(config)¶
Bases: BasicEmbed
generic embedding
each embedding has a member function ending in _embed that creates the embedding and another member function ending in _channels that gives the number of channels in the embedding. Both functions take a dict called "row" (see the sketch after this class).
- property channels¶
return the number of channels in the encoding
- Returns:
the number of channels
- embed(row)¶
call the requested embedding functions as listed in config.ml.embedding.embeddings
- Parameters:
row – the data row
- Returns:
the concatenated one hot tensor of the embeddings
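A sketch of the naming convention, adding a hypothetical "polarity" embedding that would be enabled by listing "polarity" in config.ml.embedding.embeddings (the key name and channel count are illustrative):

```python
from masskit_ai.embed import Embed

class MyEmbed(Embed):
    """hypothetical Embed subclass with one extra embedding"""

    def polarity_embed(self, row):
        # create the embedding for one data row (a dict);
        # "polarity" is an assumed key in the row
        return self.list2one_hot([row["polarity"]], self.polarity_channels(row))

    def polarity_channels(self, row):
        # number of channels in the polarity embedding
        return 2
```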
masskit_ai.lightning module¶
- class masskit_ai.lightning.BaseDataModule(*args: Any, **kwargs: Any)¶
Bases: LightningDataModule
base class for data loading
- create_loader(set_to_load=None)¶
- setup(stage=None)¶
called on every GPU
- Parameters:
stage – is set to “fit” or “test”
- Returns:
self
- test_dataloader()¶
- train_dataloader()¶
- val_dataloader()¶
- class masskit_ai.lightning.MasskitDataModule(*args: Any, **kwargs: Any)¶
Bases: BaseDataModule
standard data module that creates pytorch DataLoader(s). The DataLoader(s) in turn contain pytorch Dataset(s)
- create_loader(set_to_load=None)¶
helper function to load data
- Parameters:
set_to_load – name of the set to load
- Returns:
loader or list of loaders
- get_subsets(set_to_load)¶
create datasets
- Parameters:
set_to_load – train, valid or test dataset
- Returns:
a list of datasets
- class masskit_ai.lightning.SpectrumDataModule(*args: Any, **kwargs: Any)¶
Bases: MasskitDataModule
- class masskit_ai.lightning.XORDataModule(*args: Any, **kwargs: Any)¶
Bases: BaseDataModule
data loader for XOR toy network
- create_loader(set_to_load=None)¶
- masskit_ai.lightning.get_pytorch_ranks()¶
get ranks for this process when training in parallel
- Returns:
whether training is parallel, world_rank, world_size, num_gpus, num_nodes, node_rank, local_rank, worker_id
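A usage sketch, assuming the values come back as a plain tuple in the order listed above:

```python
from masskit_ai.lightning import get_pytorch_ranks

(is_parallel, world_rank, world_size, num_gpus,
 num_nodes, node_rank, local_rank, worker_id) = get_pytorch_ranks()
if is_parallel:
    print(f"rank {world_rank} of {world_size}")
```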
- masskit_ai.lightning.log_worker_start(worker_id)¶
function for initializing the Dataset. Note: since we are handling the sharding ourselves, it is necessary to keep the Trainer from adding a DistributedSampler by setting replace_sampler_ddp=False (see the sketch after this entry)
- Parameters:
worker_id – worker rank
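A configuration sketch tying the two settings together; my_dataset is a placeholder for one of the Dataset classes above, and replace_sampler_ddp is the flag name used in the note (newer pytorch lightning releases renamed it to use_distributed_sampler):

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader
from masskit_ai.lightning import log_worker_start

# sharding is handled by the Dataset itself, so keep the Trainer from
# injecting its own DistributedSampler
trainer = pl.Trainer(replace_sampler_ddp=False)

my_dataset = ...  # placeholder for a BaseDataset subclass
loader = DataLoader(my_dataset, num_workers=4, worker_init_fn=log_worker_start)
```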
- masskit_ai.lightning.seed_worker()¶
set the random seed for this worker. In future versions of pytorch lightning, can be replaced with pl_worker_init_function
- masskit_ai.lightning.setup_datamodule(config)¶
set up a datamodule from config using specified collate_fn factory
- Parameters:
config – configuration
masskit_ai.loggers module¶
- class masskit_ai.loggers.MSMLFlowLogger(*args: Any, **kwargs: Any)¶
Bases: MLFlowLogger
- close(artifacts=None, *args, **kwargs)¶
close the log, saving tensorboard artifacts if available to mlflow
- Parameters:
artifacts – directory with artifact directories to log
- log_figure(figure_tag, fig, global_step=None)¶
log a matplotlib figure as an artifact
- Parameters:
figure_tag – name of the image
fig – the matplotlib figure
global_step – the epoch
- log_image_file(filename, fig=None, global_step=None)¶
log an image file, including animated gifs, or, if the logger does not support saving files, log an image of the matplotlib figure
- Parameters:
fig – matplotlib figure
filename – what to name the image
global_step – epoch
- log_params_from_omegaconf_dict(params)¶
recursively examine omegaconf object
- Parameters:
params – omegaconf object
- log_params_to_mlflow()¶
log typical experiment parameters to mlflow
- mlf_log_string(string_in, filename)¶
log omegaconf to mlflow
- Parameters:
string_in – string to write
filename – name of the artifact file
- mlf_set_tag(tag, config_value, process_value=None)¶
set a particular tag to the config value; if the config value is null, use the process value instead
- Parameters:
tag – the tag
config_value – the configuration value
process_value – the process value
- mlf_setup_tags()¶
create tags for logging, as MLflow doesn't set these standard mlflow tags
- class masskit_ai.loggers.MSTensorBoardLogger(*args: Any, **kwargs: Any)¶
Bases: TensorBoardLogger
- close(*args, **kwargs)¶
close the logger, adding the model graph to the log. currently disabled, as there are multiple problems with using torch.jit.trace.
- log_figure(figure_tag, fig, global_step=None)¶
log a matplotlib figure as an artifact
- Parameters:
figure_tag – name of the image
fig – the matplotlib figure
global_step – the epoch
- log_image_file(filename, fig=None, global_step=None)¶
log a list of images as an animated gif, or, if the logger does not support animated gifs, log an image of the matplotlib figure
- Parameters:
fig – matplotlib figure
filename – what to name the image
global_step – epoch
- masskit_ai.loggers.filter_pytorch_lightning_warnings()¶
masskit_ai.metrics module¶
- class masskit_ai.metrics.BaseLossMetric(*args: Any, **kwargs: Any)¶
Bases: BaseMetric
base class for metrics
- compute()¶
- update(output, batch)¶
update the metric for each step. This is automatically called by forward() and all the arguments to forward are given to update.
- Parameters:
output – standard ModelOutput from model
batch – standard ModelInput batch information
- class masskit_ai.metrics.BaseMetric(*args: Any, **kwargs: Any)¶
Bases: Metric
base class for metrics
- compute()¶
- static extract_spectra(output, batch)¶
Given the input and output to a model, extract the spectra
- Parameters:
output – model output
batch – model input
- Returns:
predicted spectra and true spectra as Tensors
- update(output, batch)¶
update during batch
- Parameters:
output – standard ModelOutput from model
batch – standard ModelInput batch information
Note: in the current version of torchmetrics, update() is called twice on each step: once to aggregate the current step into the accumulators, and a second time to call compute() on the values for the current step (the values of the accumulators are stashed and restored during the latter). This is only done when compute_on_step is true. In future versions of torchmetrics, this behavior will become optional: https://github.com/PyTorchLightning/metrics/issues/344. 2021-09-07
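For orientation, the update()/compute() contract looks like this in a generic torchmetrics Metric; this is a sketch, not the masskit_ai implementation, and output.y_prime/batch.y are assumed field names:

```python
import torch
from torchmetrics import Metric

class MeanAbsErrorMetric(Metric):
    """generic sketch of the torchmetrics update()/compute() contract"""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        # accumulators that torchmetrics reduces across steps and processes
        self.add_state("abs_sum", default=torch.tensor(0.0), dist_reduce_fx="sum")
        self.add_state("count", default=torch.tensor(0), dist_reduce_fx="sum")

    def update(self, output, batch):
        self.abs_sum += (output.y_prime - batch.y).abs().sum()
        self.count += batch.y.numel()

    def compute(self):
        return self.abs_sum / self.count
```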
- class masskit_ai.metrics.KLMetric(*args: Any, **kwargs: Any)¶
Bases: BaseMetric
- compute()¶
- update(output, batch)¶
update during batch
- Parameters:
output – standard ModelOutput from model
batch – standard ModelInput batch information
Note: in the current version of torchmetrics, update() is called twice on each step: once to aggregate the current step into the accumulators, and a second time to call compute() on the values for the current step (the values of the accumulators are stashed and restored during the latter). This is only done when compute_on_step is true. In future versions of torchmetrics, this behavior will become optional: https://github.com/PyTorchLightning/metrics/issues/344. 2021-09-07
- class masskit_ai.metrics.L1Metric(*args: Any, **kwargs: Any)¶
Bases: BaseLossMetric
standard l1
- class masskit_ai.metrics.MSEMetric(*args: Any, **kwargs: Any)¶
Bases: BaseLossMetric
standard mean squared error
- class masskit_ai.metrics.SpectrumCosineMetric(*args: Any, **kwargs: Any)¶
Bases: BaseLossMetric
- class masskit_ai.metrics.SpectrumMSEMetric(*args: Any, **kwargs: Any)¶
Bases: BaseLossMetric
- class masskit_ai.metrics.SpectrumNormalNLLMetric(*args: Any, **kwargs: Any)¶
Bases: BaseLossMetric
masskit_ai.prediction module¶
- class masskit_ai.prediction.Predictor(config=None, *args, **kwargs)¶
Bases: ABC
- abstract add_item(item_idx, item)¶
- apply_dropout(model)¶
for use with torch.nn.Module.apply() to turn dropout back on in a model that is in eval mode (see the sketch below).
- Parameters:
model – the model
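A self-contained sketch of the pattern, assuming apply_dropout flips dropout submodules back to train mode so that repeated predictions sample different dropout masks (Monte Carlo dropout):

```python
import torch
from torch import nn

def apply_dropout_sketch(module: nn.Module):
    # assumed behavior of Predictor.apply_dropout: re-enable dropout
    # layers while the rest of the model stays in eval mode
    if isinstance(module, nn.Dropout):
        module.train()

model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(0.5), nn.Linear(4, 1))
model.eval()
model.apply(apply_dropout_sketch)

with torch.no_grad():
    x = torch.randn(2, 4)
    samples = torch.stack([model(x) for _ in range(10)])
    print(samples.std(dim=0))  # nonzero spread across stochastic passes
```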
- abstract create_dataloaders(model)¶
- abstract create_items(dataloader_idx, start)¶
create items for holding predictions
- Parameters:
dataloader_idx – the index of the dataloader in self.dataloaders
start – the start row of the batch
- abstract finalize_items(dataloader_idx, start)¶
- load_model(model_name)¶
- prep_model_for_prediction(model)¶
prepare the model for inference
- Parameters:
model – the model
dropout – should dropout be turned on?
- abstract single_prediction(model, item_idx, dataloader_idx)¶
- abstract write_items(dataloader_idx, start)¶
masskit_ai.samplers module¶
- class masskit_ai.samplers.BaseSampler(dataset)¶
Bases: ABC
base class for samplers. used by each worker thread to select which records should be included in the epoch data for that worker.
- abstract probability()¶
method to compute the probability of sampling a particular record
- Returns:
numpy array with the probability of sampling, from [0,1]
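A sketch of a concrete sampler, assuming the dataset passed to the constructor is available as self.dataset and supports len():

```python
import numpy as np
from masskit_ai.samplers import BaseSampler

class UniformSampler(BaseSampler):
    """hypothetical sampler that includes each record with probability 0.5"""

    def probability(self):
        # one sampling probability in [0, 1] per record
        return np.full(len(self.dataset), 0.5)
```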
- class masskit_ai.samplers.DatasetFromSampler(*args: Any, **kwargs: Any)¶
Bases: Dataset
Dataset to create indexes from Sampler
- class masskit_ai.samplers.DistributedSamplerWrapper(*args: Any, **kwargs: Any)¶
Bases: DistributedSampler
Wrapper over Sampler for distributed training. Allows you to use any sampler in distributed mode.
It is especially useful in conjunction with torch.nn.parallel.DistributedDataParallel. In such a case, each process can pass a DistributedSamplerWrapper instance as a DataLoader sampler, and load a subset of the subsampled data of the original dataset that is exclusive to it.
Note
Sampler is assumed to be of constant size.
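A usage sketch; it assumes a distributed process group has already been initialized (the wrapper's rank and replica count presumably default from it), and the dataset and weights are purely illustrative:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler
from masskit_ai.samplers import DistributedSamplerWrapper

dataset = TensorDataset(torch.randn(100, 4))
weights = torch.ones(100)  # uniform weights, purely illustrative
base_sampler = WeightedRandomSampler(weights, num_samples=len(weights))

# wrap the sampler so each DDP process draws an exclusive subset
loader = DataLoader(dataset, batch_size=8,
                    sampler=DistributedSamplerWrapper(base_sampler))
```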
Module contents¶
- class masskit_ai.DeviceMode(*args: Any, **kwargs: Any)¶
Bases: TorchFunctionMode
- masskit_ai.set_torch_config(torch_device=None) → None¶