{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\n# Defining Array Schema\n\nArray schemas are JSON documents that act as metadata describing the\nstructure of array-like data. The rmellipse.arrschema submodule provides\nPython bindings for generating these schemas.\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Make a Registry\n\nFirst, make a registry to store the collection of schemas you want to\nuse. We will also import some example functions included in the\nrmellipse package, as well as the json package to format the\nJSON documents.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import rmellipse.arrschema as arrschema\nimport rmellipse.arrschema.examples as examples\nimport xarray as xr\nimport numpy as np\nimport json\n\nregistry = arrschema.ArrSchemaRegistry()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We will define a basic schema, called \"float_zeros\", for an array of floats\nwith arbitrary shape. When the schema is made with\nthe arrschema function, a uid is automatically added. Then we\nadd it to the registry.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "float_zeros = arrschema.arrschema(\n\tname='float_zeros', shape=(...,), dims=(...,), dtype=float\n)\n\nprint(json.dumps(float_zeros, indent=True))\n\nregistry.add_schema(float_zeros)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Validation\n\nCurrently validation is only implemented for xarray.DataArray objects\nand RMEmeas objects.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# the schema can be provided directly\nmy_data = xr.DataArray(np.zeros((4, 4), dtype=float))\narrschema.validate(my_data, schema=float_zeros)\n\n# referred to by a registry and uid\nmy_data = xr.DataArray(np.zeros((4, 4), dtype=float))\narrschema.validate(my_data, schema=float_zeros)\n\n# referred to by a registry and name\n# Names are not unique, so this may fail if multiple\n# schemas share a name; it is not recommended.\nmy_data = xr.DataArray(np.zeros((4, 4), dtype=float))\narrschema.validate(my_data, schema=float_zeros)\n\n# this will fail because the dtype isn't correct\nmy_data_fails = xr.DataArray(np.zeros((4, 4), dtype=complex))\ntry:\n\tarrschema.validate(my_data_fails, schema=float_zeros)\nexcept arrschema.ValidationError as e:\n\tprint(e)\n\n# validated datasets store the associated schema in the metadata\nprint(my_data.attrs)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Loaders and Savers\n\nArray schemas provide a system for organizing and invoking different\nencoding and decoding functions for various schemas. Often\nwe work with a single in-memory representation of a particular object,\nbut need to read and write it using several different\nmethods of storing that data on disk. We do this by registering\na loader or saver as a function, identified by a module spec, together with\nthe file extensions it handles.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "registry.add_loader(\n\t'rmellipse.arrschema.examples:load_group_saveable',\n\t['.h5', '.hdf5'],\n\tloader_type='group_saveable',\n\tschema=float_zeros,\n)\n\nregistry.add_saver(\n\t'rmellipse.arrschema.examples:save_group_saveable',\n\t['.h5', '.hdf5'],\n\tsaver_type='group_saveable',\n\tschema=float_zeros,\n)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Encoding and decoding functions are expected to have function signatures\n# that look like ``fun(path, data, *args, **kwargs)``\n\narrschema.save('example.h5', my_data, 'my-name', registry=registry)\nmy_data_read = arrschema.load(\n\t'example.h5', group='my-name', registry=registry, schema=float_zeros\n)\nprint(my_data_read)"
      ]
    },
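    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Because my_data was validated earlier, its schema is stored in its\nmetadata and did not need to be passed when saving.\nOtherwise, you will have to explicitly declare which schema your\ndata corresponds to.\n\n"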
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "arrschema.save('example.h5', my_data, 'my-name', registry=registry, schema=float_zeros)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Conversion\n\nConverters can be registered as well. A converter is a function that takes in\na single data set with an associated schema and returns a new data set\nwith an associated schema.\n\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# define a new format we care about\n\nint_zeros = arrschema.arrschema(name='int_zeros', shape=(...,), dims=(...,), dtype=int)\n\nregistry.add_schema(int_zeros)\n\nregistry.add_converter(\n\t'rmellipse.arrschema.examples:convert_float_to_int',\n\tinput_schema=float_zeros,\n\toutput_schema=int_zeros,\n)\n\nconverted = arrschema.convert(my_data, registry=registry, output_schema=int_zeros)\narrschema.validate(converted, schema=int_zeros)\nprint(converted)"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.13.3"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}