rmellipse.workflows.archive_interface
=====================================

.. py:module:: rmellipse.workflows.archive_interface


Exceptions
----------

.. autoapisummary::

   rmellipse.workflows.archive_interface.ReleaseRecordNotFoundError


Classes
-------

.. autoapisummary::

   rmellipse.workflows.archive_interface.ArchiveInterface
   rmellipse.workflows.archive_interface.CDCSArchive
   rmellipse.workflows.archive_interface.FileSystemArchive


Functions
---------

.. autoapisummary::

   rmellipse.workflows.archive_interface.is_url
   rmellipse.workflows.archive_interface.chunked_sha
   rmellipse.workflows.archive_interface.get_credentials
   rmellipse.workflows.archive_interface.get_interface


Module Contents
---------------

.. py:exception:: ReleaseRecordNotFoundError

   Bases: :py:obj:`Exception`

   Raised when a record is requested but does not exist.

   Initialize self. See help(type(self)) for accurate signature.

   ..
       !! processed by numpydoc !!

.. py:function:: is_url(input_string)

.. py:function:: chunked_sha(f: io.BytesIO, chunk: int = 2**25)

   Generate a SHA-1 hash for a file, reading it in chunks.

   :Parameters:

       **f** : io.BytesIO
           ..

       **chunk** : int, optional
           Number of bytes to read per chunk, by default 2**25 (32 MiB).

   :Returns:

       str
           SHA-1 hash as a string.

   ..
       !! processed by numpydoc !!

.. py:function:: get_credentials(host: str, user: str = None, password: str = None)

   Get credentials for a given host and user.

   Prompts the user for a missing username or password. Stores the credentials in the system when they are provided by prompt or as function arguments.

   :Parameters:

       **host** : str
           System host name, typically a URL.

       **user** : str, optional
           Username, by default None

       **password** : str, optional
           Password, by default None

   :Returns:

       **user** : str
           Username.

       **password** : str
           Password.

   ..
       !! processed by numpydoc !!

.. py:function:: get_interface(host: str | pathlib.Path, user: str = None, password: str = None, resolve_relative_paths_to: str | pathlib.Path = Path.cwd()) -> ArchiveInterface

   Log in to an archive and return an ArchiveInterface.

   :Parameters:

       **host** : str | Path
           Path to the archive, either a URL or a file path.

       **user** : str, optional
           Username for the archive (if required).

       **password** : str, optional
           Password for the archive (if required).

   :Returns:

       ArchiveInterface
           Object that implements the ArchiveInterface.

   ..
       !! processed by numpydoc !!

.. py:class:: ArchiveInterface(host: str | pathlib.Path, user: str = None, password: str = None)

   Bases: :py:obj:`abc.ABC`

   Interface for interacting with archives, used by the command-line utility functions.

   Any archive should be written against this interface.

   :Parameters:

       **ABC**
           _description_

   ..
       !! processed by numpydoc !!

   .. py:attribute:: host

   .. py:attribute:: user
      :value: None

   .. py:attribute:: get_release_records

   .. py:method:: download_release(record: dict, workspace: str, max_threads: int, release_cache_folder: str | pathlib.Path, download_chunk_bytes: int, progress_bar: bool = True, file_status: bool = True)

      Download a release from the archive to cache_folder.

      Generates worker threads to download blobs using the download_blob function. Provides progress bars if requested.

      :Parameters:

          **record** : dict
              _description_

          **max_threads** : int
              _description_

          **progress_bar** : bool, optional
              _description_, by default True

      ..
          !! processed by numpydoc !!

   .. py:method:: download_blob(blob_pid: str, target_file: pathlib.Path, update_dict: dict, expected_size: int, release_record=dict, workspace=str, chunk_size=2**25)
      :abstractmethod:

      Stream a blob stored in the archive to your local computer.

      :Parameters:

          **blob_pid** : str
              PID of the blob.

          **target_file** : Path
              Target file path on the local machine.

          **update_dict** : dict
              Empty dictionary, updated with the

          **expected_size** : int
              _description_

          **chunk_size** : _type_, optional
              _description_, by default 2**25

      :Returns:

          _type_
              _description_

      ..
          !! processed by numpydoc !!

   .. py:method:: process_and_upload_blob(posix_rel_path, working_dir, release_title_versionless, release_title, workspace_title, chunk_size, verbose=False) -> Tuple[str, int]
      :abstractmethod:

      Upload a blob to the archive.

      Returns the blob PID and the number of bytes.

      :Parameters:

          **posix_rel_path** : _type_
              _description_

          **working_dir** : _type_
              _description_

          **release_title_versionless** : _type_
              _description_

          **release_title** : _type_
              _description_

          **workspace_title** : _type_
              _description_

          **chunk_size** : _type_
              _description_

          **verbose** : bool, optional
              _description_, by default False

      :Returns:

          **blob_pid** : str
              The PID of the blob (i.e. file) that was uploaded.

          **nbytes** : int
              The size of the uploaded blob in bytes.

      ..
          !! processed by numpydoc !!

   .. py:method:: upload_blobs_and_update_mapping(release_title: str, release_title_versionless: str, workspace: str, project_mapping: dict, project_directory: str | pathlib.Path, max_threads: int, chunk_size: int, no_blobs: bool = False)

      Upload blobs to the archive and update the project mapping.

      The project mapping metadata for each blob is updated with information that is only determined at upload time (i.e. the PID and the size in bytes). Should be called before ``upload_release_record``.

      :Parameters:

          **project_mapping** : dict
              Project mapping dictionary.

          **project_directory** : str | Path
              Root directory of the project.

          **max_threads** : int
              Maximum number of threads for upload processes. Each blob gets its own process.

          **chunk_size** : int
              Maximum size of chunks for uploading.

          **no_blobs** : bool, optional
              Do not upload blobs, by default False. For debugging purposes only.

      ..
          !! processed by numpydoc !!

   .. py:method:: upload_release_record(release_record: dict) -> str
      :abstractmethod:

      Upload a release record.

      The record should be assigned a PID during the upload process.

      :Parameters:

          **title** : str
              Title of the release with version code.

          **release_record** : dict
              Release record dictionary.

      :Returns:

          str
              PID of uploaded object.

      ..
          !! processed by numpydoc !!

.. py:class:: CDCSArchive(host: str, user: str, password: str = None)

   Bases: :py:obj:`ArchiveInterface`

   Archive Interface for a CDCS instance.

   ..
       !! processed by numpydoc !!

   .. py:attribute:: curator

   .. py:attribute:: supports_repeat_releases
      :value: True

   .. py:method:: get_release_records(title_versionless: str, version_expressions: list[str], workspace='Global Public Workspace') -> dict

      Get release records matching a version code.

      :Parameters:

          **title_versionless** : str
              Title of the release without the version.

          **version_expressions** : list[str]
              Version code strings, e.g. ``[">=0.1.0", "<0.2.0"]`` or ``["==0.1.2"]``.

          **workspace** : str, optional
              Workspace to look through, by default "Global Public Workspace"

      :Returns:

          dict
              Dictionary of release records. Keys use the ``{title}-v{version}`` format and are sorted from oldest to most recent version following semver.

      ..
          !! processed by numpydoc !!

   .. py:method:: download_blob(blob_pid: str, target_file: str | pathlib.Path, update_dict: dict, expected_size: int, release_record=dict, workspace=str, chunk_size: int = 2**25)

      Download a blob identified by a PID to a local file.

      :Parameters:

          **blob_pid** : str
              PID of the blob, should be a URL.

          **target_file** : str | Path
              Target path to download to.

          **update_dict** : dict
              Dictionary with ``{'size': 0, 'finished': False}``, used to monitor the download process when spun up into threads.

          **expected_size** : int
              Expected size of the blob in bytes.

          **chunk_size** : int, optional
              Chunk size for downloading, by default 2**25

      ..
          !! processed by numpydoc !!

   .. py:method:: process_and_upload_blob(posix_rel_path: pathlib.Path, working_dir: pathlib.Path, release_title: str, release_title_versionless: str, workspace_title: str, chunk_size: int, verbose: bool = False) -> tuple[str]

      Upload a file to a CDCS workspace.

      :Parameters:

          **posix_rel_path** : Path
              _description_

          **working_dir** : Path
              Working directory of the release, from which all paths are relative.

          **release_title** : str
              Name of the release with version code.
              Not required for the CDCS archive, but included for interface compatibility.

          **release_title_versionless** : str
              Name of the release without the version code.

          **chunk_size** : int
              Chunk size. Not used.

          **workspace_title** : str
              Name of the workspace to upload to.

      :Returns:

          blob_pid
              Blob ID.

          nbytes
              Size of the file in bytes.

      ..
          !! processed by numpydoc !!

   .. py:method:: upload_release_record(release_record: dict, workspace: str) -> str

      Upload a release record.

      The record should be assigned a PID during the upload process.

      :Parameters:

          **title** : str
              Title of the release with version code.

          **release_record** : dict
              Release record dictionary.

      :Returns:

          str
              PID of uploaded object.

      ..
          !! processed by numpydoc !!

.. py:class:: FileSystemArchive(host: str, user: str)

   Bases: :py:obj:`ArchiveInterface`

   Archive Interface for a file system archive.

   A file system archive is stored in a directory with a standard layout. Access permissions are determined by the file system permissions of the host system.

   ..
       !! processed by numpydoc !!

   .. py:attribute:: archive_path

   .. py:attribute:: host
      :value: ''

   .. py:method:: get_release_records(title_versionless: str, version_expressions: list[str], workspace='Global Public Workspace') -> dict

      Get release records matching a version code.

      :Parameters:

          **title_versionless** : str
              Title of the release without the version.

          **version_expressions** : list[str]
              Version code strings, e.g. ``[">=0.1.0", "<0.2.0"]`` or ``["==0.1.2"]``.

          **workspace** : str, optional
              Workspace to look through, by default "Global Public Workspace"

      :Returns:

          dict
              Dictionary of release records. Keys use the ``{title}-v{version}`` format and are sorted from oldest to most recent version following semver.

      ..
          !! processed by numpydoc !!

   .. py:method:: download_blob(blob_pid: str, target_file: str | pathlib.Path, update_dict: dict, expected_size: int, release_record: dict, workspace: str, chunk_size: int = 2**25)

      Download a blob identified by a PID to a local file.

      :Parameters:

          **blob_pid** : str
              PID of the blob, should be a URL.

          **target_file** : str | Path
              Target path to download to.

          **update_dict** : dict
              Dictionary with ``{'size': 0, 'finished': False}``, used to monitor the download process when spun up into threads.

          **expected_size** : int
              Expected size of the blob in bytes.

          **release_record** : dict
              Full release record.

          **workspace** : str
              Workspace to look through.

          **chunk_size** : int, optional
              Chunk size for downloading, by default 2**25

      ..
          !! processed by numpydoc !!

   .. py:method:: process_and_upload_blob(posix_rel_path: pathlib.Path, working_dir: pathlib.Path, release_title_versionless: str, release_title: str, workspace_title: str, chunk_size: int, verbose: bool = False) -> tuple[str]

      Upload a file to a CDCS workspace.

      :Parameters:

          **posix_rel_path** : Path
              _description_

          **working_dir** : Path
              Working directory of the project.

          **release_title_versionless** : str
              Name of the release without the version code.

          **release_title** : str
              Name of the release with the version code.

          **verbose** : bool
              Print information.

          **workspace_title** : str, optional
              Name of the workspace.

          **chunk_size** : int
              Size of upload chunks.

      :Returns:

          blob_pid
              Blob ID.

          nbytes
              Size of the file in bytes.

      ..
          !! processed by numpydoc !!

   .. py:method:: upload_release_record(release_record: dict, workspace_title: str) -> str

      Upload a release record.

      The record should be assigned a PID during the upload process.

      :Parameters:

          **title** : str
              Title of the release with version code.

          **release_record** : dict
              Release record dictionary.

      :Returns:

          str
              PID of uploaded object.

      ..
          !! processed by numpydoc !!
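
The chunked hashing described for ``chunked_sha`` can be illustrated with a minimal sketch. This is not the module's implementation: the name ``chunked_sha1_sketch`` is hypothetical, and the sketch assumes only that the stream is opened in binary mode and that the hash is returned as a hexadecimal string. It shows how reading a fixed number of bytes at a time keeps memory use bounded for large blobs while producing the same digest as hashing the whole file at once.

```python
import hashlib
import io


def chunked_sha1_sketch(f: io.BufferedIOBase, chunk: int = 2**25) -> str:
    """Return the hex SHA-1 digest of a binary stream, read ``chunk`` bytes at a time.

    Hypothetical stand-in for illustration; not the rmellipse implementation.
    """
    digest = hashlib.sha1()
    while True:
        block = f.read(chunk)
        if not block:  # an empty bytes object signals end of stream
            break
        digest.update(block)  # feed each chunk into the running hash
    return digest.hexdigest()


# Usage: hash an in-memory stream in small chunks and compare with a
# one-shot hash of the same bytes.
payload = b"example payload" * 1000
chunked = chunked_sha1_sketch(io.BytesIO(payload), chunk=64)
one_shot = hashlib.sha1(payload).hexdigest()
matches = chunked == one_shot
```

The chunk size only affects how much data is held in memory per read; it does not change the resulting digest.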