rmellipse.workflows.archive_interface
=====================================

.. py:module:: rmellipse.workflows.archive_interface


Exceptions
----------

.. autoapisummary::

   rmellipse.workflows.archive_interface.ReleaseRecordNotFoundError


Classes
-------

.. autoapisummary::

   rmellipse.workflows.archive_interface.ArchiveInterface
   rmellipse.workflows.archive_interface.CDCSArchive
   rmellipse.workflows.archive_interface.FileSystemArchive


Functions
---------

.. autoapisummary::

   rmellipse.workflows.archive_interface.is_url
   rmellipse.workflows.archive_interface.chunked_sha
   rmellipse.workflows.archive_interface.get_credentials
   rmellipse.workflows.archive_interface.get_interface


Module Contents
---------------

.. py:exception:: ReleaseRecordNotFoundError

   Bases: :py:obj:`Exception`


   Raised when a record is requested but does not exist.


.. py:function:: is_url(input_string)

.. py:function:: chunked_sha(f: io.BytesIO, chunk: int = 2**25)

   Generate a SHA1 for a file in chunks.

   :param f: Binary file object to hash.
   :type f: io.BytesIO
   :param chunk: How many bytes to read at a time, by default 2**25 (32 MiB)
   :type chunk: int, optional

   :returns: SHA1 hash as a string.
   :rtype: str


.. py:function:: get_credentials(host: str, user: str = None, password: str = None)

   Get credentials for a given host and user.

   Prompts the user for a missing username or password. Stores the
   credentials in the system when they are provided by prompt or as
   function arguments.

   :param host: System host name, typically a URL.
   :type host: str
   :param user: Username, by default None
   :type user: str, optional
   :param password: Password, by default None
   :type password: str, optional

   :returns: * *user* -- str
             * *password* -- str


.. py:function:: get_interface(host: str | pathlib.Path, user: str = None, password: str = None, resolve_relative_paths_to: str | pathlib.Path = Path.cwd()) -> ArchiveInterface

   Log in to an archive and return an ArchiveInterface.

   :param host: Path to the archive, either a URL or a file path.
   :type host: str | Path
   :param user: User name for the archive (if required).
   :type user: str, optional
   :param password: Password for the archive (if required).
   :type password: str, optional

   :returns: Object that implements the ArchiveInterface.
   :rtype: ArchiveInterface


.. py:class:: ArchiveInterface(host: str | pathlib.Path, user: str = None, password: str = None)

   Bases: :py:obj:`abc.ABC`


   Interface for interacting with archives, used by the command-line
   utility functions.

   Any archive should be implemented against this interface.


   .. py:attribute:: host


   .. py:attribute:: user
      :value: None



   .. py:attribute:: get_release_records


   .. py:method:: download_release(record: dict, workspace: str, max_threads: int, release_cache_folder: str | pathlib.Path, download_chunk_bytes: int, progress_bar: bool = True, file_status: bool = True)

      Download a release from the archive to ``release_cache_folder``.

      Generates worker threads to download blobs using the download_blob
      function. Provides progress bars if requested.

      :param record: Release record of the release to download.
      :type record: dict
      :param max_threads: Maximum number of worker threads used for downloading.
      :type max_threads: int
      :param progress_bar: Whether to show progress bars, by default True
      :type progress_bar: bool, optional



   .. py:method:: download_blob(blob_pid: str, target_file: pathlib.Path, update_dict: dict, expected_size: int, release_record=dict, workspace=str, chunk_size=2**25)
      :abstractmethod:


      Stream a blob stored in the archive to your local computer.

      :param blob_pid: PID of the blob.
      :type blob_pid: str
      :param target_file: Target file in local path.
      :type target_file: Path
      :param update_dict: Empty dictionary, updated with the
      :type update_dict: dict
      :param expected_size: Expected size of the blob in bytes.
      :type expected_size: int
      :param chunk_size: Chunk size for downloading, by default 2**25
      :type chunk_size: int, optional

      :returns: _description_
      :rtype: _type_



   .. py:method:: process_and_upload_blob(posix_rel_path, working_dir, release_title_versionless, release_title, workspace_title, chunk_size, verbose=False) -> Tuple[str, int]
      :abstractmethod:


      Upload a blob to the archive.

      Returns the blob PID and the number of bytes.

      :param posix_rel_path: _description_
      :type posix_rel_path: Path
      :param working_dir: Working directory of the release, from which all paths are relative.
      :type working_dir: Path
      :param release_title_versionless: Name of the release without the version code.
      :type release_title_versionless: str
      :param release_title: Name of the release with version code.
      :type release_title: str
      :param workspace_title: Name of the workspace to upload to.
      :type workspace_title: str
      :param chunk_size: Size of upload chunks.
      :type chunk_size: int
      :param verbose: Print information, by default False
      :type verbose: bool, optional

      :returns: * **blob_pid** (*str*) -- The PID of the blob (i.e. file) that was uploaded.
                * **nbytes** (*int*) -- The size of the uploaded blob in bytes.



   .. py:method:: upload_blobs_and_update_mapping(release_title: str, release_title_versionless: str, workspace: str, project_mapping: dict, project_directory: str | pathlib.Path, max_threads: int, chunk_size: int, no_blobs: bool = False)

      Upload blobs to the archive and update the project mapping.

      The project mapping metadata for each blob is updated with
      information that is only determined at upload time (i.e. the PID
      and the number of bytes).

      Should be called before ``upload_release_record``.

      :param project_mapping: Project mapping dictionary.
      :type project_mapping: dict
      :param project_directory: Root directory of the project.
      :type project_directory: str | Path
      :param max_threads: Max threads for upload processes. Each blob gets its own process.
      :type max_threads: int
      :param chunk_size: Max size of chunks for uploading.
      :type chunk_size: int
      :param no_blobs: Don't upload blobs; for debugging purposes only. By default False
      :type no_blobs: bool, optional



   .. py:method:: upload_release_record(release_record: dict) -> str
      :abstractmethod:


      Upload a release record.

      The record should be assigned a PID during the upload process.

      :param title: Title of the release with version code.
      :type title: str
      :param release_record: Release record dictionary.
      :type release_record: dict

      :returns: PID of uploaded object.
      :rtype: str



.. py:class:: CDCSArchive(host: str, user: str, password: str = None)

   Bases: :py:obj:`ArchiveInterface`


   Archive Interface for a CDCS instance.


   .. py:attribute:: curator


   .. py:attribute:: supports_repeat_releases
      :value: True



   .. py:method:: get_release_records(title_versionless: str, version_expressions: list[str], workspace='Global Public Workspace') -> dict

      Get release records matching a version code.

      :param title_versionless: Title of the release without the version.
      :type title_versionless: str
      :param version_expressions: Version expressions (e.g. ``[">=0.1.0", "<0.2.0"]`` or ``["==0.1.2"]``).
      :type version_expressions: list[str]
      :param workspace: Workspace to look through, by default "Global Public Workspace"
      :type workspace: str, optional

      :returns: Dictionary of release records. Keys follow the
                ``{title}-v{version}`` format and are sorted from oldest
                to most recent version following semver.
      :rtype: dict



   .. py:method:: download_blob(blob_pid: str, target_file: str | pathlib.Path, update_dict: dict, expected_size: int, release_record=dict, workspace=str, chunk_size: int = 2**25)

      Download a blob from a PID to a local file.

      :param blob_pid: PID of the blob; should be a URL.
      :type blob_pid: str
      :param target_file: Target path to download to.
      :type target_file: str | Path
      :param update_dict: Dictionary with ``{'size': 0, 'finished': False}``, used to
                          monitor the download process when spun up into threads.
      :type update_dict: dict
      :param expected_size: Expected size of the blob in bytes.
      :type expected_size: int
      :param chunk_size: Chunk size for downloading, by default 2**25
      :type chunk_size: int, optional



   .. py:method:: process_and_upload_blob(posix_rel_path: pathlib.Path, working_dir: pathlib.Path, release_title: str, release_title_versionless: str, workspace_title: str, chunk_size: int, verbose: bool = False) -> tuple[str]

      Upload a file to a CDCS workspace.

      :param posix_rel_path: _description_
      :type posix_rel_path: Path
      :param working_dir: Working directory of the release, from which all paths are relative.
      :type working_dir: Path
      :param release_title: Name of the release with version code. Not required for the
                            CDCS archive, but included for interface compatibility.
      :type release_title: str
      :param release_title_versionless: Name of the release without the version code.
      :type release_title_versionless: str
      :param chunk_size: Chunking size. Not used.
      :type chunk_size: int
      :param workspace_title: Name of the workspace to upload to.
      :type workspace_title: str

      :returns: * *blob_pid* -- blob_id
                * *nbytes* -- size of file in bytes



   .. py:method:: upload_release_record(release_record: dict, workspace: str) -> str

      Upload a release record.

      The record should be assigned a PID during the upload process.

      :param title: Title of the release with version code.
      :type title: str
      :param release_record: Release record dictionary.
      :type release_record: dict

      :returns: PID of uploaded object.
      :rtype: str



.. py:class:: FileSystemArchive(host: str, user: str)

   Bases: :py:obj:`ArchiveInterface`


   Archive Interface for a file system archive.

   A file system archive is an archive stored in a directory with a
   standard layout. Access permissions are based on the file system
   permissions of the system.


   .. py:attribute:: archive_path


   .. py:attribute:: host
      :value: ''



   .. py:method:: get_release_records(title_versionless: str, version_expressions: list[str], workspace='Global Public Workspace') -> dict

      Get release records matching a version code.

      :param title_versionless: Title of the release without the version.
      :type title_versionless: str
      :param version_expressions: Version expressions (e.g. ``[">=0.1.0", "<0.2.0"]`` or ``["==0.1.2"]``).
      :type version_expressions: list[str]
      :param workspace: Workspace to look through, by default "Global Public Workspace"
      :type workspace: str, optional

      :returns: Dictionary of release records. Keys follow the
                ``{title}-v{version}`` format and are sorted from oldest
                to most recent version following semver.
      :rtype: dict



   .. py:method:: download_blob(blob_pid: str, target_file: str | pathlib.Path, update_dict: dict, expected_size: int, release_record: dict, workspace: str, chunk_size: int = 2**25)

      Download a blob from a PID to a local file.

      :param blob_pid: PID of the blob; should be a URL.
      :type blob_pid: str
      :param target_file: Target path to download to.
      :type target_file: str | Path
      :param update_dict: Dictionary with ``{'size': 0, 'finished': False}``, used to
                          monitor the download process when spun up into threads.
      :type update_dict: dict
      :param expected_size: Expected size of the blob in bytes.
      :type expected_size: int
      :param release_record: Full release record.
      :type release_record: dict
      :param workspace: Workspace to look through.
      :type workspace: str
      :param chunk_size: Chunk size for downloading, by default 2**25
      :type chunk_size: int, optional



   .. py:method:: process_and_upload_blob(posix_rel_path: pathlib.Path, working_dir: pathlib.Path, release_title_versionless: str, release_title: str, workspace_title: str, chunk_size: int, verbose: bool = False) -> tuple[str]

      Upload a file to the file system archive.

      :param posix_rel_path: _description_
      :type posix_rel_path: Path
      :param working_dir: Working directory of the project.
      :type working_dir: Path
      :param release_title_versionless: Name of the release without the version code.
      :type release_title_versionless: str
      :param release_title: Name of the release with version code.
      :type release_title: str
      :param verbose: Print information, by default False
      :type verbose: bool, optional
      :param workspace_title: Name of the workspace.
      :type workspace_title: str
      :param chunk_size: Size of upload chunks.
      :type chunk_size: int

      :returns: * *blob_pid* -- blob_id
                * *nbytes* -- size of file in bytes



   .. py:method:: upload_release_record(release_record: dict, workspace_title: str) -> str

      Upload a release record.

      The record should be assigned a PID during the upload process.

      :param title: Title of the release with version code.
      :type title: str
      :param release_record: Release record dictionary.
      :type release_record: dict

      :returns: PID of uploaded object.
      :rtype: str
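
Two ideas recur in this module: hashing a file in chunks (``chunked_sha``) and tracking a threaded transfer through an ``update_dict`` with ``size`` and ``finished`` keys (``download_blob``). The following is a minimal standard-library sketch of both patterns, not the package's actual implementation. The names ``chunked_sha1_sketch`` and ``copy_with_progress`` are hypothetical, and the in-memory ``io.BytesIO`` objects stand in for a real blob and target file.

```python
import hashlib
import io
import threading


def chunked_sha1_sketch(f, chunk=2**25):
    """Return the SHA-1 hex digest of a binary file object, read in chunks."""
    digest = hashlib.sha1()
    # iter() with a sentinel keeps calling f.read(chunk) until it returns b"".
    for block in iter(lambda: f.read(chunk), b""):
        digest.update(block)
    return digest.hexdigest()


def copy_with_progress(source, target, update_dict, chunk_size=2**25):
    """Copy source to target in chunks, recording progress in update_dict."""
    for block in iter(lambda: source.read(chunk_size), b""):
        target.write(block)
        update_dict["size"] += len(block)  # bytes transferred so far
    update_dict["finished"] = True  # lets a monitoring thread know we are done


# Hypothetical stand-ins for a blob in the archive and a local target file.
payload = b"example blob contents" * 1000
source, target = io.BytesIO(payload), io.BytesIO()
status = {"size": 0, "finished": False}

worker = threading.Thread(
    target=copy_with_progress, args=(source, target, status, 4096)
)
worker.start()
worker.join()

# The chunked digest matches hashing the whole payload in one call.
same_hash = (
    chunked_sha1_sketch(io.BytesIO(payload), chunk=4096)
    == hashlib.sha1(payload).hexdigest()
)
```

In a setup like this, a caller could poll ``status["size"]`` against the expected size to drive a progress bar while the worker thread runs, then compare digests once ``status["finished"]`` is set.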