Manual Result Upload

How to add a new benchmark result manually. However, in most cases it is easier to use the automated upload form.

Overview

The following are instructions for adding benchmark results to the benchmark tables and charts. Please feel free to upload benchmark results via pull requests on GitHub. More benchmark results will greatly improve the utility of the website and encourage community collaboration.

Each benchmark result is stored in a YAML file called meta.yaml in a separate directory under _data/simulations. YAML is a minimal, human-readable syntax for structured data. Each meta.yaml file stores the metadata for only one benchmark result, so a new directory is required for each new benchmark result.
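
For example, a new result directory might be laid out as sketched below; the directory name fipy_1a_serial is only a placeholder, and any descriptive name will do:

_data/simulations/
  fipy_1a_serial/    # hypothetical directory; its name becomes the result's name on the website
    meta.yaml        # YAML description of this single benchmark result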

How to Upload

To record a new benchmark result, use the following workflow.

  1. Fork the website repository.

  2. Edit the repository by adding a new directory to _data/simulations and creating a meta.yaml file in that directory. The name of the directory becomes the name of the benchmark result on the website, so use a descriptive name.

  3. Fill out the meta.yaml using the schema outlined below. This is a text-based summary of the benchmark problem, your implementation and the hardware used to execute it, and links to data displayed on the website.

  4. Submit a pull request for the new meta.yaml. At this stage the website test suite will check the meta.yaml against the schema. The website developer can then work with the benchmark uploader to refine the meta.yaml so that all the data associated with the benchmark result is available to be displayed on the website.

Minimal Example of a YAML Benchmark File

Each YAML description of a specific benchmark result contains the following three parts:

  1. benchmark: specify the benchmark problem and version you have implemented,

  2. metadata: summarize the runtime environment (software and hardware) used to produce this result and

  3. data: capture salient outputs from the benchmark result, particularly the free energy evolution to be displayed on the website.

The following is a minimal description of a benchmark result with relevant comments. The definitive archetype resides at _data/simulations/example/example.yaml. To understand the YAML syntax, consult either the Ansible documentation for a simple overview or the YAML site for a more in-depth description.

---
# minimal example with the required fields
benchmark:
  # Refer to the problem definition for appropriate value.
  id: 1a    # number+letter, from problem description
  version: 1    # number, from problem description

metadata:
  # Describe the runtime environment, hardware and software
  summary: concise description of this contribution
  author: name    # preferably yours
  email: "name@organization"    # in quotes
  timestamp: "Day, DD MM YYYY HH:MM:SS -ZONE"    # e.g. 'date -R' on Linux or any valid timestamp
  hardware:
    # relevant details of your machine or cluster
    architecture: i686   # architecture of the environment
    cores: 6    # number actually used if less than total available
  software:
    # software framework your application is built upon, as listed on the website
    name: name    # all lower-case, e.g. fipy or moose or prisms, etc.

data:
  # Values for use in tables, charts, galleries, etc.
  # Use Vega standard to help generate graphics directly; see
  # https://github.com/vega/vega/wiki/Data and
  # https://vega.github.io/vega-lite/docs/data.html.
  # Broadly, a list of key-value pairs defined minimally with
  # two keys, 'name' and 'values', to help the parser determine
  # where these data belong on the final site. If 'values' are
  # multiply defined, indent and specify keys 'time' for execution time
  # and 'value' for appropriate datum.
  - name: run_time
    # wall time, in seconds, when the specified simulation times (sim_time) were reached
    values:
      - sim_time: 0.0
        time: 0.0
      - sim_time: 2.0
        time: 1.0
      - sim_time: 8.0
        time: 2.0
  - name: memory_usage
    values: 27232    # peak, in KB
  - name: free_energy
    url: https://somewhere/data.csv
    format:
      type: csv
      parse:
        free_energy: number
        time: number

If you would like to submit additional information, each of the blocks in the example admits a details: block. This is currently not parsed for the website, but may be of use to other users or for future reference.
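
As a sketch, a details: block attached to the hardware section might look like the following; the extra keys (clock_speed and memory) and their values are hypothetical and are simply kept with the result for reference:

metadata:
  hardware:
    architecture: x86_64
    cores: 12
    details:
      # hypothetical extra information; not parsed for the website
      clock_speed: 2.6 GHz
      memory: 64 GB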

The Schema (Layout of the YAML File)

Many examples can be found in _data/simulations and these can be used as templates. The complete schema is outlined in _data/simulations/example/schema.yaml. A meta.yaml file contains three sections: benchmark, metadata and data.

Benchmark

The benchmark section includes an id and a version. This is in anticipation of version changes to the benchmark problems. The current choices are 1a, 1b, 1c, 1d, 2a, 2b, 2c and 2d for the id value and either 0 or 1 for the version value.

benchmark:
  id: 1a
  version: 1

Metadata

The metadata section describes the details about the code being used, but not the outcomes (outcomes go in the data section). See _data/simulations/example/meta.yaml for all possible fields in the metadata section.

Note that the metadata.software.details section takes any number of

- name: a name
  values: any valid JSON

pairs. This section is open for adding any specific details about the benchmark result that are important but not included in other fields within the metadata section. The metadata.software.details section uses the Vega data spec as described in the next section.
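
For example, a metadata.software.details section might record extra name and values pairs about the code used; the entries below (version and repository) and their values are hypothetical:

metadata:
  software:
    name: fipy
    details:
      - name: version
        values: "3.4"    # hypothetical value
      - name: repository
        values: "https://example.com/my-code"    # any valid JSON value, here a placeholder URL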

Data

The data section consists of any number of name and values pairs. For example, the data section can describe the run time, the memory usage and the free energy at different time steps. Note that this is data that is not known until the benchmark has been executed. The format for the data section is the Vega data spec. The basic data model used by Vega is tabular data, similar to a spreadsheet or database table. Individual data sets are assumed to contain a collection of records (or “rows”), which may contain any number of named data attributes (fields, or “columns”). The url field can currently link to either JSON or CSV data, but the possible formats can be extended as the need arises.
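
As a sketch, a data entry that links to a JSON file rather than a CSV file might look like the following; the URL is a placeholder and the free_energy and time fields match the chart requirements described below:

- name: free_energy
  url: https://example.com/free_energy.json    # placeholder URL
  format:
    type: json
    parse:
      free_energy: number
      time: number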

For the charts, there must be a free_energy section with free_energy and time fields. Other required fields will be added as more details are displayed on the website. For example, the following is in _data/simulations/moose_1d_sta/meta.yaml.

- name: free_energy
  url: https://gist.githubusercontent.com/wd15/41e21ea1090057c42a59380d90367763/raw/a211864b3269e86eb63db6f3dd9167ed18b92d08/hackathon_p1_sphere_STA.csv
  format:
    type: csv
    parse:
      TotalEnergy: number
      time: number
  transform:
    - type: formula
      field: free_energy
      expr: datum.TotalEnergy
    - type: filter
      test: "datum.time > 1.0"
    - type: filter
      test: "datum.time < 1e6"

This entry generates the free_energy and time fields from a CSV file. It describes how to parse the CSV file and filters out time values that are either too large or too small.

Please read the Vega data spec for more details.

Automated Testing of Benchmark Uploads

When the pull request is submitted, the uploaded meta.yaml file is automatically checked for compliance with the schema. The results of this check appear on the pull request page. Repairs to the meta.yaml may be necessary to pass the tests. Even if all the tests pass, the website developer will still need to check that the formatting and links work for displaying the charts and tables, which is not entirely automated by the test suite. Further repairs may be necessary at this stage.