artifacts#
Note
See the Glossary for the meaning of the acronyms used in this guide.
A task plugins collection for working with artifacts generated by Dioptra.
mlflow.py#
A task plugin module for MLFlow artifacts management.
This module contains a set of task plugins for managing artifacts generated during an entry point run.
- download_all_artifacts_in_run(run_id: str, artifact_path: str, destination_path: Optional[str] = None) str [source]#
Downloads an artifact file or directory from a previous MLFlow run.
- Parameters
run_id – The unique identifier of a previous MLFlow run.
artifact_path – The relative source path to the desired artifact.
destination_path – The relative destination path where the artifacts will be downloaded. If None, the artifacts will be downloaded to a new uniquely-named directory on the local filesystem. The default is None.
- Returns
A string pointing to the directory containing the downloaded artifacts.
See also
mlflow.tracking.MlflowClient.download_artifacts()
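The destination fallback described above (a new uniquely-named local directory when destination_path is None) can be sketched with the standard library. The helper name below is illustrative, not part of the plugin's API; one plausible implementation of the fallback uses tempfile.mkdtemp:

```python
import tempfile
from typing import Optional

def resolve_destination(destination_path: Optional[str] = None) -> str:
    """Return destination_path, or a fresh uniquely-named directory if None."""
    if destination_path is None:
        # mkdtemp creates the directory and guarantees the name is unique
        return tempfile.mkdtemp(prefix="artifacts-")
    return destination_path
```

Calling resolve_destination() with no argument yields a new directory on the local filesystem; an explicit path is passed through unchanged.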
- upload_data_frame_artifact(data_frame: pandas.DataFrame, file_name: str, file_format: str, file_format_kwargs: Optional[Dict[str, Any]] = None, working_dir: Optional[Union[str, pathlib.Path]] = None) None [source]#
Uploads a DataFrame as an artifact of the active MLFlow run.
The file_format argument selects the DataFrame serializer; each format is handled by the corresponding DataFrame.to_{format} method. The string passed to file_format must match one of the following:
csv[.bz2|.gz|.xz] - A comma-separated values plain text file with optional compression.
feather - A binary feather file.
json - A plain text JSON file.
pickle - A binary pickle file.
- Parameters
data_frame – The DataFrame to be uploaded.
file_name – The filename to use for the serialized DataFrame.
file_format – The DataFrame file serialization format.
file_format_kwargs – A dictionary of additional keyword arguments to pass to the serializer. If None, then no additional keyword arguments are passed. The default is None.
working_dir – The location where the file should be saved. If None, then the current working directory is used. The default is None.
Notes
The pyarrow package must be installed in order to serialize to the feather format.
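One way to implement the to_{format} dispatch described above is to strip any compression suffix from the format string and derive the matching DataFrame method name. The helper below is a hypothetical sketch of that mapping (the names serializer_method, SUPPORTED, and COMPRESSION are assumptions, not the plugin's API); pandas itself infers compression from the output filename, so only the base format determines the method:

```python
SUPPORTED = {"csv", "feather", "json", "pickle"}
COMPRESSION = {"bz2", "gz", "xz"}

def serializer_method(file_format: str) -> str:
    """Map a file_format string such as 'csv.gz' to a DataFrame method name."""
    base, _, suffix = file_format.partition(".")
    # only csv accepts a compression suffix, per the format list above
    if base not in SUPPORTED or (suffix and (base != "csv" or suffix not in COMPRESSION)):
        raise ValueError(f"unsupported file format: {file_format}")
    return f"to_{base}"
```

For example, serializer_method("csv.gz") resolves to "to_csv", which would then be looked up on the DataFrame with getattr.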
- upload_directory_as_tarball_artifact(source_dir: Union[str, pathlib.Path], tarball_filename: str, tarball_write_mode: str = 'w:gz', working_dir: Optional[Union[str, pathlib.Path]] = None) None [source]#
Archives a directory and uploads it as an artifact of the active MLFlow run.
- Parameters
source_dir – The directory which should be uploaded.
tarball_filename – The filename to use for the archived directory tarball.
tarball_write_mode – The write mode for the tarball; see tarfile.open() for the full list of compression options. The default is "w:gz" (gzip compression).
working_dir – The location where the file should be saved. If None, then the current working directory is used. The default is None.
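The archiving step can be reproduced with the standard library alone. This sketch uses the "w:gz" write mode from the signature above; the helper name is illustrative, not the plugin's internals:

```python
import tarfile
from pathlib import Path

def make_tarball(source_dir, tarball_path, write_mode="w:gz"):
    """Archive source_dir (recursively) into tarball_path."""
    source_dir = Path(source_dir)
    with tarfile.open(tarball_path, write_mode) as tar:
        # arcname keeps archive paths relative to the directory name
        tar.add(source_dir, arcname=source_dir.name)
```

Any mode accepted by tarfile.open() works here, e.g. "w:bz2" or "w:xz" for other compression schemes.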
utils.py#
A task plugin module containing generic utilities for managing artifacts.
- is_within_directory(directory: Union[str, pathlib.Path], target: Union[str, pathlib.Path]) bool [source]#
Checks whether target resolves to a location inside directory.
- safe_extract(tar: tarfile.TarFile, path: Union[str, pathlib.Path] = '.') None [source]#
Extracts a tarball into path after verifying that no member escapes it.
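These two helpers follow the widely used guard against tar path traversal (the standard mitigation for CVE-2007-4559): every member path is resolved and checked against the extraction root before anything is written. A minimal stdlib sketch of that pattern, assuming the documented signatures (an illustration, not the plugin's verbatim source):

```python
import os
import tarfile

def is_within_directory(directory, target) -> bool:
    """True if target, once resolved, lies inside directory."""
    directory = os.path.abspath(directory)
    target = os.path.abspath(target)
    return os.path.commonpath([directory, target]) == directory

def safe_extract(tar: tarfile.TarFile, path=".") -> None:
    """Refuse to extract any member whose path would escape `path`."""
    for member in tar.getmembers():
        if not is_within_directory(path, os.path.join(path, member.name)):
            raise RuntimeError("attempted path traversal in tar file")
    tar.extractall(path)
```

A member named "../evil" resolves outside the extraction root and is rejected before extraction begins.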
- extract_tarfile(filepath: Union[str, pathlib.Path], tarball_read_mode: str = 'r:gz', output_dir: Any = None) None [source]#
Extracts a tarball archive into the current working directory.
- Parameters
filepath – The location of the tarball archive file provided as a string or a Path object.
tarball_read_mode – The read mode for the tarball; see tarfile.open() for the full list of compression options. The default is "r:gz" (gzip compression).
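A stdlib sketch of the extraction described above, using the "r:gz" read mode from the signature (the safe-extraction membership checks are elided here for brevity; the helper name and output_dir handling are assumptions, not the plugin's implementation):

```python
import tarfile
from pathlib import Path

def extract_tarball(filepath, read_mode="r:gz", output_dir="."):
    """Extract a tarball archive into output_dir (created if missing)."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    with tarfile.open(filepath, read_mode) as tar:
        tar.extractall(output_dir)
```

As with the write modes, any mode accepted by tarfile.open() (e.g. "r:bz2", "r:xz") can be passed as read_mode.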
- make_directories(dirs: List[Union[str, pathlib.Path]]) None [source]#
Creates directories if they do not exist.
- Parameters
dirs – A list of directories provided as strings or Path objects.
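The behavior described above maps directly onto pathlib; this sketch matches the documented signature (it is an illustration of the idiom, not necessarily the plugin's exact source):

```python
from pathlib import Path
from typing import List, Union

def make_directories(dirs: List[Union[str, Path]]) -> None:
    """Create each directory, including parents; existing ones are left alone."""
    for d in dirs:
        # exist_ok=True makes the call idempotent
        Path(d).mkdir(parents=True, exist_ok=True)
```

Because of exist_ok=True, calling the function twice with the same list is harmless.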
- extract_tarfile_in_unique_subdir(filepath: Union[str, pathlib.Path], tarball_read_mode: str = 'r:gz') pathlib.Path [source]#
Extracts a tarball archive into a unique subdirectory of the current working directory.
- Parameters
- Parameters
filepath – The location of the tarball archive file provided as a string or a Path object.
tarball_read_mode – The read mode for the tarball; see tarfile.open() for the full list of compression options. The default is "r:gz" (gzip compression).
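The unique-subdirectory behavior can be sketched with tempfile.mkdtemp, which both creates the directory and guarantees a unique name; the helper below is an assumption about how this could be implemented, not the plugin's verbatim code:

```python
import tarfile
import tempfile
from pathlib import Path

def extract_in_unique_subdir(filepath, read_mode="r:gz") -> Path:
    """Extract into a freshly created, uniquely-named subdirectory of the CWD."""
    # mkdtemp(dir=...) creates the subdirectory atomically with a unique name
    output_dir = Path(tempfile.mkdtemp(dir=Path.cwd()))
    with tarfile.open(filepath, read_mode) as tar:
        tar.extractall(output_dir)
    return output_dir
```

The returned Path lets callers locate the extracted files without having to guess the generated directory name.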
exceptions.py#
A task plugin module of exceptions for the artifacts plugins collection.