{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n# Defining Array Schema\n\nArray schema are JSON documents that act as metadata describing the\nstructure of array-like data. The arrschema submodule provides Python bindings\nfor generating the schema in Python.\n\nSee\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Make a Registry\n\nFirst make a registry to store a collection of schema you want to\nuse. We will also import some example functions included in the\nrmellipse package, as well as the json package to format the\nJSON documents.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "import rmellipse.arrschema as arrschema\nimport rmellipse.arrschema.examples as examples\nimport xarray as xr\nimport numpy as np\nimport json\n\nREGISTRY = arrschema.ArrayClassRegistry()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will define a basic schema, called \"float_zeros\", for an array of\nfloats with arbitrary shape. When the schema is made with\nthe arrschema function, a uid is automatically added.
Then we\nadd it to the registry.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# a schema is just a dictionary that specifies\n# what the expected structure of a dataset is\nfloat_zeros_schema = arrschema.ArraySchema(\n    name='float_zeros', shape=(...,), dims=(...,), dtype=float\n)\n\n\n# you can use that dictionary to generate a\n# python class object within the context of a registry\nclass FloatZeros(arrschema.AnnotatedArray):\n    registry = REGISTRY\n    schema = float_zeros_schema\n\n\nprint(FloatZeros)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Validation\n\nValidation checks that an array conforms to the\nAnnotatedArray subclass specification.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# cast an array into the type to check that it conforms\n# to the specification\nmy_data = xr.DataArray(np.zeros((4, 4), dtype=float))\nmy_data = FloatZeros(my_data)\nmy_data.validate()\n\n# this will fail because the dtype isn't correct\nmy_data_fails = xr.DataArray(np.zeros((4, 4), dtype=complex))\ntry:\n    my_data_fails = FloatZeros(my_data_fails)\n    my_data_fails.validate()\nexcept arrschema.ValidationError as e:\n    print('caught error: \\n', e)\n\n# successfully validated datasets store the associated schema in the metadata\nprint(my_data.attrs['ARRSCHEMA'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loaders and Savers\n\nArray schema provide a system for organizing and invoking different\nencoding and decoding functions for various schema. Often\nwe work with a single in-memory representation of a particular object,\nbut may need to read and write it using multiple different\nmethods of storing that data on disk.
We do this by associating\na loader or saver with a function, identified by a module spec, and with the related\nfile extensions.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# supply the module pathspec to the function\nREGISTRY.add_loader(\n    'rmellipse.arrschema.examples:load_group_saveable',\n    ['.h5', '.hdf5'],\n    loader_type='group_saveable',\n    schema=FloatZeros.schema,\n)\n\nREGISTRY.add_saver(\n    'rmellipse.arrschema.examples:save_group_saveable',\n    ['.h5', '.hdf5'],\n    saver_type='group_saveable',\n    schema=FloatZeros.schema,\n)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# Encoding and decoding functions are expected to have function signatures\n# that look like ``fun(path, data, *args, **kwargs)``\n\nmy_data.save('example.h5', 'my_data_name')\nmy_data_read = FloatZeros.load('example.h5', group='my_data_name')\nprint(my_data_read)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conversion\n\nConverters can be assigned as well.
Converters are functions that take in\na single data set with an associated schema and return a new data set\nwith an associated schema.\n\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# define a new format we care about\n\nint_zeros_schema = arrschema.ArraySchema(\n    name='int_zeros', shape=(...,), dims=(...,), dtype=int\n)\n\n\nclass IntZeros(arrschema.AnnotatedArray):\n    registry = REGISTRY\n    schema = int_zeros_schema\n\n\nREGISTRY.add_converter(\n    'rmellipse.arrschema.examples:convert_float_to_int',\n    input_schema=FloatZeros.schema,\n    output_schema=IntZeros.schema,\n)\n\nconverted = my_data.convert_to(IntZeros)\n\nprint(converted)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.4" } }, "nbformat": 4, "nbformat_minor": 0 }