
nestor.datasets

Helper function to load the excavator toy dataset.

Hodkiewicz, M., and Ho, M. (2016) "Cleaning historical maintenance work order data for reliability analysis" in Journal of Quality in Maintenance Engineering, Vol 22 (2), pp. 146-163.

BscStartDate | Asset | OriginalShorttext | PMType | Cost
--- | --- | --- | --- | ---
date the maintenance work order (MWO) was initiated | which excavator this MWO concerns (A, B, C, D, E) | natural-language description of the MWO | repair (PM01) or replacement (PM02) | MWO expense (AUD)
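The column table above can be turned into a lightweight schema check. This is a minimal sketch, not part of nestor: the function name `check_mwo_schema` is hypothetical, the allowed values are copied from the table (assets A to E, PM types PM01 and PM02), and the two example rows are invented for illustration rather than taken from the dataset.

```python
import pandas as pd

# Hypothetical helper (not part of nestor): expected columns and allowed values,
# copied from the column table above.
EXPECTED_COLUMNS = ["BscStartDate", "Asset", "OriginalShorttext", "PMType", "Cost"]
ALLOWED_ASSETS = {"A", "B", "C", "D", "E"}
ALLOWED_PM_TYPES = {"PM01", "PM02"}  # PM01 = repair, PM02 = replacement


def check_mwo_schema(df: pd.DataFrame) -> list[str]:
    """Return a list of problems found; an empty list means the frame looks valid."""
    problems = []
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        # Without all columns the value checks below cannot run.
        problems.append(f"missing columns: {missing}")
        return problems
    bad_assets = set(df["Asset"].dropna().astype(str)) - ALLOWED_ASSETS
    if bad_assets:
        problems.append(f"unexpected Asset values: {sorted(bad_assets)}")
    bad_pm = set(df["PMType"].dropna().astype(str)) - ALLOWED_PM_TYPES
    if bad_pm:
        problems.append(f"unexpected PMType values: {sorted(bad_pm)}")
    if not pd.api.types.is_numeric_dtype(df["Cost"]):
        problems.append("Cost is not numeric")
    return problems


# Usage with two hypothetical rows (illustrative only, not from the dataset).
example = pd.DataFrame(
    {
        "BscStartDate": pd.to_datetime(["2004-07-01", "2004-07-02"]),
        "Asset": ["A", "F"],  # "F" is deliberately outside A to E
        "OriginalShorttext": ["replace bucket tooth", "engine oil leak"],
        "PMType": ["PM02", "PM01"],
        "Cost": [1200.0, 350.0],
    }
)
print(check_mwo_schema(example))  # ["unexpected Asset values: ['F']"]
```

Returning a list of problems rather than raising on the first one lets a caller report every mismatch at once.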

Parameters:

Name | Type | Description | Default
--- | --- | --- | ---
cleaned | bool | whether to return the original dataset (False) or the dataset with keyword extraction rules applied (True), as described in Hodkiewicz and Ho (2016) | False

Returns:

Type | Description
--- | ---
pandas.DataFrame | raw data for use in testing nestor and subsequent workflows
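Because the returned object is an ordinary pandas DataFrame, standard pandas operations apply to it. The sketch below totals MWO expense per excavator and PM type. The rows are hypothetical stand-ins for the output of `load_excavators()` so the example runs offline, and modelling `Asset` and `PMType` as categoricals is an assumption about the dtypes nestor uses.

```python
import pandas as pd

# Hypothetical stand-in for load_excavators() output (illustrative rows only).
df = pd.DataFrame(
    {
        "BscStartDate": pd.to_datetime(["2004-07-01", "2004-07-05", "2004-08-10"]),
        "Asset": pd.Categorical(["A", "A", "B"], categories=list("ABCDE")),
        "OriginalShorttext": pd.array(
            ["replace bucket tooth", "repair hydraulic hose", "engine oil leak"],
            dtype="string",
        ),
        "PMType": pd.Categorical(["PM02", "PM01", "PM01"], categories=["PM01", "PM02"]),
        "Cost": [1200.0, 300.0, 450.0],
    }
).rename_axis("ID")

# Total expense (AUD) per excavator and work-order type.
# observed=True keeps only the category combinations that actually occur.
totals = df.groupby(["Asset", "PMType"], observed=True)["Cost"].sum()
print(totals)
```

Passing `observed=True` matters with categorical keys: without it, pandas may emit a row for every asset/PM-type pair, including the many empty ones.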

Source code in nestor/datasets/excavators.py
```python
# Excerpt: pandas is imported at module level; AssetType, PMType and
# _download_excavators are defined or imported elsewhere in nestor (not shown here).
import pandas as pd


def load_excavators(cleaned=False):
    """
    Helper function to load excavator toy dataset.

    Hodkiewicz, M., and Ho, M. (2016)
    "Cleaning historical maintenance work order data for reliability analysis"
    in Journal of Quality in Maintenance Engineering, Vol 22 (2), pp. 146-163.

    BscStartDate| Asset | OriginalShorttext | PMType | Cost
    --- | --- | --- | --- | ---
    initialization of MWO | which excavator this MWO concerns (A, B, C, D, E)| natural language description of the MWO| repair (PM01) or replacement (PM02) | MWO expense (AUD)

    Args:
        cleaned (bool): whether to return the original dataset (False) or the dataset with
            keyword extraction rules applied (True), as described in Hodkiewicz and Ho (2016)

    Returns:
        pandas.DataFrame: raw data for use in testing nestor and subsequent workflows

    """

    # Fetch (or locate) the CSV for the requested variant.
    csv_filename = _download_excavators(cleaned=cleaned)

    df = (
        # Parse dates; backslash escapes special characters in the free text.
        pd.read_csv(
            csv_filename, parse_dates=["BscStartDate"], sep=",", escapechar="\\"
        )
        # Apply explicit dtypes to each column.
        .astype(
            {
                "Asset": AssetType,
                "OriginalShorttext": pd.StringDtype(),
                "PMType": PMType,
                "Cost": float,
            }
        )
        # Name the row index so each MWO has an "ID".
        .rename_axis("ID")
    )

    return df
```
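The loader's parsing step can be exercised without downloading anything. This sketch replays the same `read_csv`/`astype`/`rename_axis` chain on an in-memory CSV. `AssetType` and `PMType` are assumed here to be `pd.CategoricalDtype` instances built from the categories in the column table, which may differ from how nestor actually defines them, and the CSV rows are hypothetical.

```python
import io

import pandas as pd

# Assumed dtype definitions (nestor's own AssetType/PMType may differ).
AssetType = pd.CategoricalDtype(categories=list("ABCDE"))
PMType = pd.CategoricalDtype(categories=["PM01", "PM02"])

# Hypothetical CSV text; the backslash escapes a comma inside the description.
csv_text = (
    "BscStartDate,Asset,OriginalShorttext,PMType,Cost\n"
    "2004-07-01,A,replace bucket tooth\\, left side,PM02,1200\n"
    "2004-07-05,B,engine oil leak,PM01,350.5\n"
)

df = (
    pd.read_csv(
        io.StringIO(csv_text), parse_dates=["BscStartDate"], sep=",", escapechar="\\"
    )
    .astype(
        {
            "Asset": AssetType,
            "OriginalShorttext": pd.StringDtype(),
            "PMType": PMType,
            "Cost": float,
        }
    )
    .rename_axis("ID")
)

print(df.dtypes)
print(df.loc[0, "OriginalShorttext"])
```

Using `io.StringIO` in place of a file path keeps the example self-contained while exercising the same parsing options as `load_excavators`.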