masskit_ai package

Subpackages

Submodules

masskit_ai.base_datasets module

class masskit_ai.base_datasets.BaseDataset(*args: Any, **kwargs: Any)

Bases: Dataset, ABC

abstract base class for NIST datasets. Note: only one of these is created per GPU per epoch (or per entire run?)

property data

abstract get_data_row(index)

given the index, return corresponding data for the index

get_x(data_row)

given the data row, return the input to the network

abstract get_y(data_row)

given the data row, return the target of the network

init_copy(worker_id=0, num_workers=1)

initialize a copy of a Dataset. PyTorch Lightning constructs Datasets in the main thread and then forks, so to initialize a Dataset this function is called in the worker_init_fn

Parameters:
  • worker_id – worker id of the thread, defaults to 0

  • num_workers – number of workers total, defaults to 1
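
Example (a minimal sketch of wiring init_copy into a DataLoader through worker_init_fn; the DataLoader arguments and example_worker_init_fn are illustrative assumptions, and the package's own hook is masskit_ai.lightning.log_worker_start, documented below):

    from torch.utils.data import DataLoader, get_worker_info

    def example_worker_init_fn(worker_id):
        # get_worker_info() returns this worker's copy of the Dataset after the fork
        info = get_worker_info()
        info.dataset.init_copy(worker_id=worker_id, num_workers=info.num_workers)

    # loader = DataLoader(my_dataset, num_workers=4, worker_init_fn=example_worker_init_fn)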

to_pandas()

return data as pandas dataframe

Raises:

NotImplementedError – not implemented

class masskit_ai.base_datasets.DataframeDataset(*args: Any, **kwargs: Any)

Bases: BaseDataset

dataset for a dataframe

get_data_row(index)

given the index, return corresponding data for the index

get_y(data_row)

given the data row, return the target of the network

to_pandas()

return data as pandas dataframe

Raises:

NotImplementedError – not implemented

masskit_ai.base_losses module

class masskit_ai.base_losses.BaseLoss(*args: Any, **kwargs: Any)

Bases: Module, ABC

abstract base class for losses. The loss is implemented as a PyTorch Module

abstract forward(output, batch, params=None) → torch.Tensor

calculate the loss

Parameters:
  • output – output dictionary from the model

  • batch – batch data from the dataloader

  • params – optional dictionary of parameters, such as epoch type

Returns:

loss tensor
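
Example (a minimal sketch of a concrete loss; the field names output.y_prime and batch.y are assumptions based on the ModelOutput and ModelInput tuples in masskit_ai.base_objects, and the Huber loss is just an illustrative choice):

    import torch
    from masskit_ai.base_losses import BaseLoss

    class ExampleHuberLoss(BaseLoss):
        # hypothetical loss comparing the model prediction to the dataloader target
        def forward(self, output, batch, params=None) -> torch.Tensor:
            return torch.nn.functional.huber_loss(output.y_prime, batch.y)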

class masskit_ai.base_losses.L1Loss(*args: Any, **kwargs: Any)

Bases: BaseLoss

l1 loss

forward(output, batch, params=None) → torch.Tensor

calculate the loss

Parameters:
  • output – output dictionary from the model

  • batch – batch data from the dataloader

  • params – optional dictionary of parameters, such as epoch type

Returns:

loss tensor

class masskit_ai.base_losses.MSEKLLoss(*args: Any, **kwargs: Any)

Bases: BaseLoss

mean square error plus kl divergence

forward(output, batch, params=None) → torch.Tensor

calculate the loss

Parameters:
  • output – output dictionary from the model

  • batch – batch data from the dataloader

  • params – optional dictionary of parameters, such as epoch type

Returns:

loss tensor

class masskit_ai.base_losses.MSELoss(*args: Any, **kwargs: Any)

Bases: BaseLoss

mean square error

forward(output, batch, params=None) → torch.Tensor

calculate the loss

Parameters:
  • output – output dictionary from the model

  • batch – batch data from the dataloader

  • params – optional dictionary of parameters, such as epoch type

Returns:

loss tensor

masskit_ai.base_losses.get_namedtuple_dict(container, key)

masskit_ai.base_objects module

class masskit_ai.base_objects.ModelInput(x, y, index)

Bases: tuple

index

Alias for field number 2

x

Alias for field number 0

y

Alias for field number 1

class masskit_ai.base_objects.ModelOutput(y_prime, score, var)

Bases: tuple

score

Alias for field number 1

var

Alias for field number 2

y_prime

Alias for field number 0
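
Example (illustrative construction and field access; the tensor shapes are arbitrary placeholders):

    import torch
    from masskit_ai.base_objects import ModelInput, ModelOutput

    batch = ModelInput(x=torch.randn(8, 16), y=torch.randn(8, 4), index=torch.arange(8))
    output = ModelOutput(y_prime=torch.randn(8, 4), score=None, var=None)

    # fields can be read by name or by position
    assert batch.x is batch[0]
    assert output.y_prime is output[0]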

masskit_ai.callbacks module

class masskit_ai.callbacks.ConcatenateIdLogs(*args: Any, **kwargs: Any)

Bases: Callback

used to concatenate log files created from each training worker thread

on_train_epoch_end(trainer, pl_module, outputs)

class masskit_ai.callbacks.ModelCheckpointOnStart(*args: Any, **kwargs: Any)

Bases: ModelCheckpoint

used to save parameters before training begins

on_train_start(trainer, pl_module)

masskit_ai.embed module

class masskit_ai.embed.BasicEmbed(config)

Bases: ABC

base embedding class

property channels

return the number of channels in the encoding

Returns:

the number of channels

embed(row)

call the requested embedding functions as listed in config.ml.embedding.embeddings

Parameters:

row – the data row

Returns:

the concatenated one hot tensor of the embeddings

static list2one_hot(list_in, num_classes)

convert a list of integers into a one hot tensor

Parameters:
  • list_in – the list of integers

  • num_classes – the number of classes

Returns:

the one hot tensor
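
Example (a sketch of the kind of conversion list2one_hot describes, written with torch.nn.functional.one_hot; the actual implementation may differ):

    import torch
    import torch.nn.functional as F

    list_in = [0, 2, 1, 3]
    one_hot = F.one_hot(torch.tensor(list_in), num_classes=5)
    print(one_hot.shape)  # torch.Size([4, 5])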

class masskit_ai.embed.Embed(config)

Bases: BasicEmbed

generic embedding

each embedding has a member function ending in _embed that creates the embedding and another member function ending in _channels that gives the number of channels in the embedding. Both functions take a dict called “row”

property channels

return the number of channels in the encoding

Returns:

the number of channels

embed(row)

call the requested embedding functions as listed in config.ml.embedding.embeddings

Parameters:

row – the data row

Returns:

the concatenated one hot tensor of the embeddings
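
Example (a hypothetical Embed subclass following the _embed/_channels naming convention described above; the "charge" field of the data row is an assumption used only for illustration):

    import torch
    from masskit_ai.embed import Embed

    class EmbedCharge(Embed):
        @staticmethod
        def charge_channels():
            # number of channels this embedding contributes
            return 8

        def charge_embed(self, row):
            # one hot encode a hypothetical integer charge state from the data row
            return torch.nn.functional.one_hot(torch.tensor(row["charge"]), num_classes=8)

Selecting the embedding would presumably require listing "charge" in config.ml.embedding.embeddings.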

class masskit_ai.embed.EmbedXor(config)

Bases: Embed

embedding for toy xor network

static xor_channels()

the number of nce channels

Returns:

the number of nce_float channels

xor_embed(row)

embed the nce as a single float value from 0 to 1

Parameters:

row – data record

Returns:

FloatTensor

masskit_ai.lightning module

class masskit_ai.lightning.BaseDataModule(*args: Any, **kwargs: Any)

Bases: LightningDataModule

base class for data loading

create_loader(set_to_load=None)

setup(stage=None)

called on every GPU

Parameters:

stage – is set to “fit” or “test”

Returns:

self

test_dataloader()

train_dataloader()

val_dataloader()

class masskit_ai.lightning.MasskitDataModule(*args: Any, **kwargs: Any)

Bases: BaseDataModule

standard data module that creates pytorch DataLoader(s). The DataLoader(s) in turn contain pytorch Dataset(s)

create_loader(set_to_load=None)

helper function to load data

Parameters:

set_to_load – name of the set to load

Returns:

loader or list of loaders

get_subsets(set_to_load)

create datasets

Parameters:

set_to_load – train, valid or test dataset

Returns:

a list of datasets

class masskit_ai.lightning.SpectrumDataModule(*args: Any, **kwargs: Any)

Bases: MasskitDataModule

class masskit_ai.lightning.XORDataModule(*args: Any, **kwargs: Any)

Bases: BaseDataModule

data loader for XOR toy network

create_loader(set_to_load=None)

masskit_ai.lightning.get_pytorch_ranks()

get ranks for this process when training in parallel

Returns:

is this training parallel?, world_rank, world_size, num_gpus, num_nodes, node_rank, local_rank, worker_id
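
Example (a sketch of unpacking the result, assuming the values come back as a tuple in the order listed above):

    from masskit_ai.lightning import get_pytorch_ranks

    (is_parallel, world_rank, world_size, num_gpus,
     num_nodes, node_rank, local_rank, worker_id) = get_pytorch_ranks()
    if is_parallel:
        print(f"rank {world_rank} of {world_size} on node {node_rank}")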

masskit_ai.lightning.log_worker_start(worker_id)

function for initializing the Dataset. Note: since we are handling the sharding ourselves, it is necessary to disable the automatic addition of a DistributedSampler in the Trainer by using replace_sampler_ddp=False

Parameters:

worker_id – worker rank
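
Example (a minimal sketch of wiring log_worker_start into a DataLoader and disabling the automatic DistributedSampler as noted above; TensorDataset is only a stand-in for one of the package's Dataset classes, and replace_sampler_ddp assumes a PyTorch Lightning version that still accepts that flag):

    import pytorch_lightning as pl
    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from masskit_ai.lightning import log_worker_start

    dataset = TensorDataset(torch.randn(128, 4), torch.randn(128, 1))
    loader = DataLoader(dataset, batch_size=32, num_workers=4,
                        worker_init_fn=log_worker_start)
    trainer = pl.Trainer(replace_sampler_ddp=False)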

masskit_ai.lightning.seed_worker()

set the random seed for this worker. In future versions of PyTorch Lightning, this can be replaced with pl_worker_init_function

masskit_ai.lightning.setup_datamodule(config)

set up a datamodule from config using the specified collate_fn factory

Parameters:

config – configuration
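
Example (a hypothetical Hydra entry point; the config file layout and the assumption that setup_datamodule returns the configured data module are illustrative):

    import hydra
    from masskit_ai.lightning import setup_datamodule

    @hydra.main(config_path="conf", config_name="config", version_base=None)
    def main(config):
        datamodule = setup_datamodule(config)
        datamodule.setup(stage="fit")
        train_loader = datamodule.train_dataloader()
        # hand the loader(s) to a Trainer here

    if __name__ == "__main__":
        main()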

masskit_ai.loggers module

class masskit_ai.loggers.MSMLFlowLogger(*args: Any, **kwargs: Any)

Bases: MLFlowLogger

close(artifacts=None, *args, **kwargs)

close the log, saving tensorboard artifacts if available to mlflow

Parameters:

artifacts – directory with artifact directories to log

log_figure(figure_tag, fig, global_step=None)

log a matplotlib figure as an artifact

Parameters:
  • figure_tag – name of the image

  • fig – the matplotlib figure

  • global_step – the epoch
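
Example (a sketch of logging a figure, assuming the constructor passes its keyword arguments through to pytorch_lightning's MLFlowLogger):

    import matplotlib.pyplot as plt
    from masskit_ai.loggers import MSMLFlowLogger

    logger = MSMLFlowLogger(experiment_name="masskit_example")
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0.1, 0.5, 0.9])
    logger.log_figure("loss_curve", fig, global_step=0)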

log_image_file(filename, fig=None, global_step=None)

log an image file, including animated gifs, or, if the logger does not support saving files, log an image of the matplotlib figure

Parameters:
  • fig – matplotlib figure

  • filename – what to name the image

  • global_step – epoch

log_params_from_omegaconf_dict(params)

recursively examine omegaconf object

Parameters:

params – omegaconf object

log_params_to_mlflow()

log typical experiment parameters to mlflow

mlf_log_string(string_in, filename)

log omegaconf to mlflow

Parameters:
  • string_in – string to write

  • filename – name of the artifact file

mlf_set_tag(tag, config_value, process_value=None)

set a particular tag with the config value. If the config value is null, use the process value instead

Parameters:
  • tag – the tag

  • config_value – the configuration value

  • process_value – the process value

Returns:

mlf_setup_tags()

create tags for logging, as MLflow doesn’t set these standard mlflow tags

class masskit_ai.loggers.MSTensorBoardLogger(*args: Any, **kwargs: Any)

Bases: TensorBoardLogger

close(*args, **kwargs)

close the logger, adding the graph to the log. Currently disabled as there are multiple problems with using torch.jit.trace.

Returns:

log_figure(figure_tag, fig, global_step=None)

log a matplotlib figure as an artifact

Parameters:
  • figure_tag – name of the image

  • fig – the matplotlib figure

  • global_step – the epoch

log_image_file(filename, fig=None, global_step=None)

log a list of images as an animated gif, or, if the logger does not support animated gifs, log an image of the matplotlib figure

Parameters:
  • fig – matplotlib figure

  • filename – what to name the image

  • global_step – epoch

masskit_ai.loggers.filter_pytorch_lightning_warnings()

masskit_ai.metrics module

class masskit_ai.metrics.BaseLossMetric(*args: Any, **kwargs: Any)

Bases: BaseMetric

base class for metrics

compute()

update(output, batch)

update the metric for each step. This is automatically called by forward() and all the arguments to forward are given to update.

Parameters:
  • output – standard ModelOutput from model

  • batch – standard ModelInput batch information
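
Example (an illustrative update/compute cycle using one of the concrete metrics below; the random tensors and the no-argument constructor are assumptions):

    import torch
    from masskit_ai.base_objects import ModelInput, ModelOutput
    from masskit_ai.metrics import MSEMetric

    metric = MSEMetric()
    for step in range(3):
        batch = ModelInput(x=torch.randn(8, 16), y=torch.randn(8, 4), index=torch.arange(8))
        output = ModelOutput(y_prime=torch.randn(8, 4), score=None, var=None)
        metric.update(output, batch)
    print(metric.compute())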

class masskit_ai.metrics.BaseMetric(*args: Any, **kwargs: Any)

Bases: Metric

base class for metrics

compute()

static extract_spectra(output, batch)

Given the input and output to a model, extract the spectra

Parameters:
  • output – model output

  • batch – model input

Returns:

predicted spectra and true spectra as Tensors

update(output, batch)

update during batch

Parameters:
  • output – standard ModelOutput from model

  • batch – standard ModelInput batch information

Returns:

Note: in the current version of torchmetrics, update() is called twice on each step, once to aggregate the current step into the accumulators and a second time to call compute() on the values for the current step (the value of the accumulators is stashed and restored during this last process). This is only done when compute_on_step is true. In future versions of torchmetrics, this behavior will become optional: https://github.com/PyTorchLightning/metrics/issues/344. 2021-09-07

class masskit_ai.metrics.KLMetric(*args: Any, **kwargs: Any)

Bases: BaseMetric

compute()

update(output, batch)

update during batch

Parameters:
  • output – standard ModelOutput from model

  • batch – standard ModelInput batch information

Returns:

Note: in the current version of torchmetrics, update() is called twice on each step, once to aggregate the current step into the accumulators and a second time to call compute() on the values for the current step (the value of the accumulators is stashed and restored during this last process). This is only done when compute_on_step is true. In future versions of torchmetrics, this behavior will become optional: https://github.com/PyTorchLightning/metrics/issues/344. 2021-09-07

class masskit_ai.metrics.L1Metric(*args: Any, **kwargs: Any)

Bases: BaseLossMetric

standard l1

class masskit_ai.metrics.MSEMetric(*args: Any, **kwargs: Any)

Bases: BaseLossMetric

standard mean squared error

class masskit_ai.metrics.SpectrumCosineMetric(*args: Any, **kwargs: Any)

Bases: BaseLossMetric

class masskit_ai.metrics.SpectrumMSEMetric(*args: Any, **kwargs: Any)

Bases: BaseLossMetric

class masskit_ai.metrics.SpectrumNormalNLLMetric(*args: Any, **kwargs: Any)

Bases: BaseLossMetric

masskit_ai.prediction module

class masskit_ai.prediction.Predictor(config=None, *args, **kwargs)

Bases: ABC

abstract add_item(item_idx, item)

apply_dropout(model)

for use with torch’s Module.apply() to turn on dropout in a model in eval mode.

Parameters:

model – the model

abstract create_dataloaders(model)

abstract create_items(dataloader_idx, start)

create items for holding predictions

Parameters:
  • dataloader_idx – the index of the dataloader in self.dataloaders

  • start – the start row of the batch

abstract finalize_items(dataloader_idx, start)

load_model(model_name)

prep_model_for_prediction(model)

prepare the model for inference

Parameters:
  • model – the model

  • dropout – should dropout be turned on?

abstract single_prediction(model, item_idx, dataloader_idx)

abstract write_items(dataloader_idx, start)
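
Example (a hedged sketch of how the abstract methods might fit together in a prediction driver; the control flow and return values below are assumptions for illustration, not the package's actual loop):

    def run_prediction(predictor, model_name):
        # assumes load_model returns the loaded model and create_items returns a sized container
        model = predictor.load_model(model_name)
        predictor.prep_model_for_prediction(model)
        for dataloader_idx, dataloader in enumerate(predictor.create_dataloaders(model)):
            start = 0
            items = predictor.create_items(dataloader_idx, start)
            for item_idx in range(len(items)):
                prediction = predictor.single_prediction(model, item_idx, dataloader_idx)
                predictor.add_item(item_idx, prediction)
            predictor.finalize_items(dataloader_idx, start)
            predictor.write_items(dataloader_idx, start)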

masskit_ai.samplers module

class masskit_ai.samplers.BaseSampler(dataset)

Bases: ABC

base class for samplers. Used by each worker thread to select which records should be included in the epoch data for that worker.

abstract probability()

method to compute the probability of sampling a particular record

Returns:

numpy array with the probability of sampling, from [0,1]
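
Example (a hypothetical BaseSampler subclass; the use of self.dataset for the record count is an assumption):

    import numpy as np
    from masskit_ai.samplers import BaseSampler

    class UniformSampler(BaseSampler):
        def probability(self):
            # give every record the same 50% chance of appearing in a worker's epoch data
            return np.full(len(self.dataset), 0.5)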

class masskit_ai.samplers.DatasetFromSampler(*args: Any, **kwargs: Any)

Bases: Dataset

Dataset to create indexes from Sampler

class masskit_ai.samplers.DistributedSamplerWrapper(*args: Any, **kwargs: Any)

Bases: DistributedSampler

Wrapper over Sampler for distributed training. Allows you to use any sampler in distributed mode.

It is especially useful in conjunction with torch.nn.parallel.DistributedDataParallel. In such a case, each process can pass a DistributedSamplerWrapper instance as a DataLoader sampler, and load a subset of subsampled data of the original dataset that is exclusive to it.

Note

Sampler is assumed to be of constant size.
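
Example (a sketch of wrapping an arbitrary sampler; passing num_replicas and rank explicitly is an assumption mirroring torch's DistributedSampler so the snippet runs without an initialized process group):

    import torch
    from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler
    from masskit_ai.samplers import DistributedSamplerWrapper

    dataset = TensorDataset(torch.randn(100, 4))
    sampler = WeightedRandomSampler(weights=torch.rand(100), num_samples=100)
    loader = DataLoader(dataset, batch_size=10,
                        sampler=DistributedSamplerWrapper(sampler, num_replicas=1, rank=0))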

Module contents

class masskit_ai.DeviceMode(*args: Any, **kwargs: Any)

Bases: TorchFunctionMode

masskit_ai.set_torch_config(torch_device=None) → None