artifacts#

Note

See the Glossary for the meaning of the acronyms used in this guide.

A task plugins collection for working with artifacts generated by Dioptra.

mlflow.py#

A task plugin module for MLFlow artifact management.

This module contains a set of task plugins for managing artifacts generated during an entry point run.

download_all_artifacts_in_run(run_id: str, artifact_path: str, destination_path: Optional[str] = None) → str[source]#

Downloads an artifact file or directory from a previous MLFlow run.

Parameters
  • run_id – The unique identifier of a previous MLFlow run.

  • artifact_path – The relative source path to the desired artifact.

  • destination_path – The relative destination path where the artifacts will be downloaded. If None, the artifacts will be downloaded to a new uniquely-named directory on the local filesystem. The default is None.

Returns

A string pointing to the directory containing the downloaded artifacts.

See also

  • mlflow.tracking.MlflowClient.download_artifacts()

upload_data_frame_artifact(data_frame: pandas.DataFrame, file_name: str, file_format: str, file_format_kwargs: Optional[Dict[str, Any]] = None, working_dir: Optional[Union[str, pathlib.Path]] = None) → None[source]#

Uploads a DataFrame as an artifact of the active MLFlow run.

The file_format argument selects the DataFrame serializer; each format is handled by the corresponding DataFrame.to_{format} method. The string passed to file_format must match one of the following:

  • csv[.bz2|.gz|.xz] - A comma-separated values plain text file with optional compression.

  • feather - A binary feather file.

  • json - A plain text JSON file.

  • pickle - A binary pickle file.

Parameters
  • data_frame – A DataFrame to be uploaded.

  • file_name – The filename to use for the serialized DataFrame.

  • file_format – The DataFrame file serialization format.

  • file_format_kwargs – A dictionary of additional keyword arguments to pass to the serializer. If None, then no additional keyword arguments are passed. The default is None.

  • working_dir – The location where the file should be saved. If None, then the current working directory is used. The default is None.

Notes

The pyarrow package must be installed in order to serialize to the feather format.
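The mapping from a file_format string to a DataFrame.to_{format} method can be sketched as follows. This is an illustrative dispatch helper, not the plugin's actual implementation; the function name resolve_serializer and its return shape are assumptions for the example.

```python
# Hypothetical sketch of the file_format dispatch described above: the format
# string is split into a base format (naming a DataFrame.to_{format} method)
# and an optional compression suffix allowed only for the csv variants.
SUPPORTED_FORMATS = {"csv", "feather", "json", "pickle"}
CSV_COMPRESSION = {"bz2", "gz", "xz"}


def resolve_serializer(file_format: str) -> tuple[str, dict]:
    """Return the to_{format} method name and extra keyword arguments."""
    base, _, suffix = file_format.partition(".")
    if base not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {file_format!r}")
    kwargs: dict = {}
    if suffix:
        if base != "csv" or suffix not in CSV_COMPRESSION:
            raise ValueError(f"unsupported format: {file_format!r}")
        # pandas spells the gzip codec "gzip", not "gz"
        kwargs["compression"] = {"gz": "gzip"}.get(suffix, suffix)
    return f"to_{base}", kwargs


print(resolve_serializer("csv.gz"))   # ('to_csv', {'compression': 'gzip'})
print(resolve_serializer("feather"))  # ('to_feather', {})
```

The resolved method name would then be looked up on the DataFrame with getattr and called with file_format_kwargs merged in, which is why an unrecognized format raises before any file is written.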

upload_directory_as_tarball_artifact(source_dir: Union[str, pathlib.Path], tarball_filename: str, tarball_write_mode: str = 'w:gz', working_dir: Optional[Union[str, pathlib.Path]] = None) → None[source]#

Archives a directory and uploads it as an artifact of the active MLFlow run.

Parameters
  • source_dir – The directory which should be uploaded.

  • tarball_filename – The filename to use for the archived directory tarball.

  • tarball_write_mode – The write mode for the tarball, see tarfile.open() for the full list of compression options. The default is “w:gz” (gzip compression).

  • working_dir – The location where the file should be saved. If None, then the current working directory is used. The default is None.
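The archiving step described above can be sketched with the standard library's tarfile module. This is a dependency-free illustration, not the plugin's actual implementation; the helper name archive_directory is an assumption, and the final upload call to the MLFlow tracking server is omitted.

```python
import tarfile
from pathlib import Path


def archive_directory(source_dir, tarball_filename, tarball_write_mode="w:gz"):
    """Pack source_dir into a tarball in the current working directory.

    Illustrative sketch only; uses the same tarball_write_mode default
    ("w:gz", gzip compression) as the task plugin.
    """
    source_dir = Path(source_dir)
    tarball_path = Path.cwd() / tarball_filename
    with tarfile.open(tarball_path, tarball_write_mode) as tar:
        # arcname keeps paths inside the archive relative to the
        # directory's own name rather than its absolute location
        tar.add(source_dir, arcname=source_dir.name)
    return tarball_path
```

An MLFlow-based plugin would then hand the resulting file to the active run (e.g. via mlflow.log_artifact); that call is left out here so the sketch stays runnable without MLFlow installed.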

upload_file_as_artifact(artifact_path: Union[str, pathlib.Path]) → None[source]#

Uploads a file as an artifact of the active MLFlow run.

Parameters

artifact_path – The location of the file to be uploaded.

utils.py#

A task plugin module containing generic utilities for managing artifacts.

is_within_directory(directory: Union[str, pathlib.Path], target: Union[str, pathlib.Path]) → bool[source]#

Checks whether the target path resolves to a location inside the given directory.

safe_extract(tar: tarfile.TarFile, path: Union[str, pathlib.Path] = '.') → None[source]#

Extracts a tarball archive after verifying that no member would resolve to a path outside of path.

extract_tarfile(filepath: Union[str, pathlib.Path], tarball_read_mode: str = 'r:gz', output_dir: Optional[Any] = None) → None[source]#

Extracts a tarball archive into the current working directory or, if output_dir is provided, into that directory.

Parameters
  • filepath – The location of the tarball archive file provided as a string or a Path object.

  • tarball_read_mode – The read mode for the tarball, see tarfile.open() for the full list of compression options. The default is “r:gz” (gzip compression).

  • output_dir – The directory into which the archive will be extracted. If None, the current working directory is used. The default is None.
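The path-traversal guard provided by is_within_directory and safe_extract can be sketched as follows. The function names match the plugin, but the bodies are a minimal illustration of the technique (rejecting archive members whose resolved path escapes the extraction directory), not the plugin's actual source.

```python
import os
import tarfile


def is_within_directory(directory, target) -> bool:
    """Return True if target resolves to a location inside directory."""
    abs_directory = os.path.abspath(directory)
    abs_target = os.path.abspath(target)
    # commonpath avoids false positives like "/safe" vs "/safe-evil"
    return os.path.commonpath([abs_directory, abs_target]) == abs_directory


def safe_extract(tar: tarfile.TarFile, path=".") -> None:
    """Extract tar into path, refusing members that escape path."""
    for member in tar.getmembers():
        member_path = os.path.join(path, member.name)
        if not is_within_directory(path, member_path):
            raise Exception("Attempted path traversal in tar file")
    tar.extractall(path)
```

On Python 3.12 and later, tarfile also offers a built-in safeguard via tar.extractall(path, filter="data"), which rejects absolute paths and parent-directory escapes without a hand-rolled check.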

make_directories(dirs: List[Union[str, pathlib.Path]]) → None[source]#

Creates directories if they do not exist.

Parameters

dirs – A list of directories provided as strings or Path objects.
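The behavior described above maps directly onto pathlib. This is a minimal sketch under the assumption that existing directories are left untouched rather than treated as an error:

```python
from pathlib import Path


def make_directories(dirs) -> None:
    """Create each directory, including missing parents, if absent.

    Illustrative sketch: exist_ok=True makes the call a no-op for
    directories that already exist.
    """
    for d in dirs:
        Path(d).mkdir(parents=True, exist_ok=True)
```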

extract_tarfile_in_unique_subdir(filepath: Union[str, pathlib.Path], tarball_read_mode: str = 'r:gz') → pathlib.Path[source]#

Extracts a tarball archive into a unique subdirectory of the current working directory.

Parameters
  • filepath – The location of the tarball archive file provided as a string or a Path object.

  • tarball_read_mode – The read mode for the tarball, see tarfile.open() for the full list of compression options. The default is “r:gz” (gzip compression).

Returns

The path to the unique subdirectory containing the extracted archive contents.
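The "unique subdirectory" behavior can be sketched with tempfile.mkdtemp, which guarantees a freshly created, uniquely named directory. This is an illustrative sketch, not the plugin's actual implementation; it uses plain extractall for brevity where the plugin would be expected to apply the safe-extraction guard from this module.

```python
import tarfile
import tempfile
from pathlib import Path


def extract_tarfile_in_unique_subdir(filepath, tarball_read_mode="r:gz") -> Path:
    """Extract the archive into a new unique subdirectory of the cwd.

    Illustrative sketch: mkdtemp both generates a unique name and
    creates the directory atomically.
    """
    output_dir = Path(tempfile.mkdtemp(dir=Path.cwd()))
    with tarfile.open(filepath, tarball_read_mode) as tar:
        tar.extractall(output_dir)
    return output_dir
```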

exceptions.py#

A task plugin module of exceptions for the artifacts plugins collection.

exception UnsupportedDataFrameFileFormatError[source]#

Bases: dioptra.sdk.exceptions.base.BaseTaskPluginError

The requested data frame file format is not supported.