Adding Inputs and Outputs#

Overview#

In the Hello World Tutorial, you created a plugin with one task and ran it through an entrypoint and experiment. Now, you will extend that idea to include task inputs, task outputs, and entrypoint parameters.

This lets you parameterize a plugin task's inputs when running a job. After running multiple jobs, you will compare outputs and observe how different sample sizes affect the relationship between the observed sample mean and the underlying distribution's mean.

Prerequisites#

Before starting, ensure you have set up Dioptra and have created a User and Queue.

Workflow#

Step 1: Create a New Type#

The Plugin Task you will define will output a NumPy array. Before registering this output in your plugin task, you need to define the NumPy array type in Dioptra.

  1. Navigate to the Plugin Parameters tab.

  2. Click Create.

  3. Enter the name: NumpyArray. Add an optional short description.

  4. Click Submit.

Screenshot of creating a new type called NumpyArray.

Creating a new Parameter Type in the GUI.#


Step 2: Create the Plugin#

You will now create a new plugin with one task. This task accepts four parameters:

  • random_seed

  • sample_size

  • mean

  • var

The function samples a normal distribution, logs the mean, and then returns the array.

  1. Go to the Plugins tab and click Create Plugin.

  2. Name it sample_normal and add a short description.

  3. In the plugin list, click the row corresponding to the sample_normal plugin you just created to go to the Plugin Files table.

  4. Add a new Python file named sample_normal.py.

  5. Paste the code below into the editor.

sample_normal.py

import numpy as np
import structlog
from dioptra import pyplugs

LOGGER = structlog.get_logger()

# Helper function - not registered as a Dioptra plugin task
def sqrt(num: float) -> float:
    return np.sqrt(num)

@pyplugs.register
def sample_normal_distribution_print_mean(
    random_seed: int = 0,
    mean: float = 0,
    var: float = 1,
    sample_size: int = 100,
) -> np.ndarray:
    rng = np.random.default_rng(seed=random_seed)
    std_dev = sqrt(var)
    draws = rng.normal(loc=mean, scale=std_dev, size=sample_size)
    draws_mean = np.mean(draws)
    diff = np.abs(mean - draws_mean)
    pct = 100 * diff / mean  # note: assumes a nonzero mean is passed in

    LOGGER.info(
        "Plugin 2 - "
        f"The mean value of the draws was {draws_mean:.4f}, "
        f"which was {diff:.4f} different from the passed-in mean ({pct:.2f}%). "
        "[Passed-in Parameters]"
        f"Seed: {random_seed}; "
        f"Mean: {mean}; "
        f"Variance: {var}; "
        f"Sample Size: {sample_size};"
    )

    return draws
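Before wiring this task into an entrypoint, you can preview its behavior locally. The snippet below is a hypothetical sanity check (plain Python, outside Dioptra) that reuses the same sampling logic at two sample sizes; the `sample_mean` helper is illustrative, not part of the plugin:

```python
import numpy as np

# Hypothetical local check: the same sampling logic as the plugin task,
# run at two sample sizes, to preview what the job logs will show.
def sample_mean(random_seed: int = 0, mean: float = 10.0, var: float = 10.0,
                sample_size: int = 100) -> float:
    rng = np.random.default_rng(seed=random_seed)
    draws = rng.normal(loc=mean, scale=np.sqrt(var), size=sample_size)
    return float(np.mean(draws))

small = sample_mean(sample_size=100)      # same settings as the tutorial jobs
large = sample_mean(sample_size=10_000)
print(abs(small - 10.0), abs(large - 10.0))
```

The larger sample's mean should land much closer to the passed-in mean of 10.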

Step 3: Register the Task#

Unlike the simple logging task from the Hello World tutorial, which had no inputs or outputs, this task requires registering its inputs and outputs along with their Types. Dioptra's autodetect functionality will help here.

  1. Click Import Function Tasks (top right of the editor) to auto-detect functions from sample_normal.py.

Screenshot of the "Import Function Tasks" button.

Using “Import Tasks” to automatically detect and register plugin tasks.#

Note

Input and output types are auto-detected from Python type hints and the return annotation (->).

  2. You may see an error under Plugin Tasks: Resolve missing type for the np_ndarray output. This is because the custom type is called NumpyArray, not np_ndarray, which is the default name inferred from the return type.

Screenshot of a missing type error in Plugin Task registration.

The output type was detected as np_ndarray, but the type you created is called NumpyArray.#

Fix the mismatched parameter type:

  • Click the output badge.

  • Set Name to output and Type to NumpyArray.

Once you’ve corrected the errors, save the plugin file by clicking Submit.
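Conceptually, the autodetection reads the same annotation data you can inspect yourself with the standard library. The snippet below is illustrative only (it uses Python's `inspect` module, not Dioptra's actual internals) and shows why the return type's default inferred name comes out as np_ndarray:

```python
import inspect
import numpy as np

# Same signature as the plugin task; autodetection reads these annotations.
def sample_normal_distribution_print_mean(
    random_seed: int = 0,
    mean: float = 0,
    var: float = 1,
    sample_size: int = 100,
) -> np.ndarray:
    ...

sig = inspect.signature(sample_normal_distribution_print_mean)
input_types = {name: p.annotation.__name__ for name, p in sig.parameters.items()}
print(input_types)            # e.g. {'random_seed': 'int', 'mean': 'float', ...}
print(sig.return_annotation)  # np.ndarray - hence the inferred name np_ndarray
```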

Learn More

  • Plugins - Syntax reference for creating plugins

Step 4: Create Entrypoint Parameters#

You will create an entrypoint that accepts a parameter, allowing you to change the sample size passed to this Function Task dynamically at Job runtime.

  1. Navigate to Entrypoints and click Create Entrypoint.

  2. Name it sample_normal_ep and add a short description.

  3. Attach the tensorflow-cpu Queue to the Entrypoint.

  4. In the Entrypoint Parameters window, click Add Parameter:

    • Name: sample_size

    • Type: int

    • Default value: 100

Screenshot of adding the sample_size parameter to Entrypoint 2.

Creating an entrypoint parameter allows the parameter to be changed during a job run.#

Step 5: Define Task Graph#

Now add the task to the graph and bind the parameters.

  1. In the Task Plugins window, select sample_normal.

  2. Click Add to Task Graph. This auto-populates the YAML editor with a default structure.

Screenshot of adding sample_normal to Entrypoint 2.

Using “Add To Task Graph” to automatically populate the YAML editor.#

  3. Edit the YAML to bind the parameters. Map sample_size to the entrypoint parameter ($sample_size) and hardcode the others to reasonable values (e.g., random_seed=0, mean=10, var=10).

Screenshot of editing parameters in Entrypoint 2 task graph.

Binding the task parameters in the YAML editor.#

  4. Ensure the task graph is valid by clicking Validate Inputs. Assuming all input and output Types are set appropriately, this should pass.

  5. Click Submit Entrypoint to save.
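As a rough sketch only (the generated YAML may differ; the step name here is made up, and the exact schema is documented in the Task Graph Syntax reference), the bound graph looks something like this, with $sample_size pulling in the entrypoint parameter while the other inputs are hardcoded:

```yaml
# Illustrative sketch - consult the Task Graph Syntax reference for the exact schema.
sample_step:
  sample_normal_distribution_print_mean:
    random_seed: 0             # hardcoded
    mean: 10                   # hardcoded
    var: 10                    # hardcoded
    sample_size: $sample_size  # bound to the entrypoint parameter
```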

Learn More

See Task Graph Syntax for detailed reference documentation on Task Graph YAML syntax.

Step 6: Create an Experiment and Run Jobs#

You will create an Experiment and then run multiple Jobs within it using different parameters.

  1. Navigate to the Experiments tab. Create a new Experiment called Sample Normal.

  2. In the Entrypoints list, add the sample_normal_ep Entrypoint.

  3. Click Submit Experiment, then click the row corresponding to that Experiment.

  4. Click Create in the Jobs table.

  5. Select sample_normal_ep for the Entrypoint and tensorflow-cpu for the queue.

Submit a high sample size job:

  1. Set the sample_size parameter to 10000. Add a Job description.

  2. Click Submit Job.

Screenshot of running Entrypoint 2 with sample_size=10000.

Setting the sample size parameter for a job to 10,000.#

Submit a low sample size job:

  1. Create a second job using sample_normal_ep, but this time leave sample_size at the default 100.

Screenshot showing multiple jobs created with different sample sizes.

Jobs queue, start, and finish.#

Step 7: Inspect Results#

Once the jobs finish, inspect the logs for each.

Job with Sample Size 100:

Log Output (Small Sample)

Plugin 2 - The mean value of the draws was 10.2565, which was 0.2565 different from the passed-in mean (2.56%). [Passed-in Parameters]Seed: 0; Mean: 10; Variance: 10; Sample Size: 100

Job with Sample Size 10,000:

Log Output (Large Sample)

Plugin 2 - The mean value of the draws was 9.9971, which was 0.0029 different from the passed-in mean (0.03%). [Passed-in Parameters]Seed: 0; Mean: 10; Variance: 10; Sample Size: 10000

Notice that the sample mean was much closer to the distribution mean when the sample size was larger.

Note

This experiment is a simple illustration of the Law of Large Numbers: as the sample size increases, the sample mean tends to get closer to the population mean.
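You can reproduce this effect outside Dioptra with a few lines of NumPy, using the tutorial's distribution, N(10, 10), at increasing sample sizes (the 1,000,000-draw case is an extra illustration beyond the tutorial's two jobs):

```python
import numpy as np

# Law of Large Numbers: the sample mean of N(10, 10) draws approaches the
# population mean (10) as the sample size grows.
errors = {}
for n in (100, 10_000, 1_000_000):
    rng = np.random.default_rng(seed=0)  # fresh generator per size, like separate jobs
    draws = rng.normal(loc=10, scale=np.sqrt(10), size=n)
    errors[n] = abs(float(np.mean(draws)) - 10.0)

print(errors)  # the absolute error tends to shrink as n grows
```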

Conclusion#

You now know how to:

  • Create a custom Plugin Parameter Type

  • Register a plugin task with typed inputs and outputs

  • Define entrypoint parameters and bind them in a task graph

  • Run jobs with different parameter values and compare their outputs

Next, you’ll chain multiple tasks together into a single workflow.