`nestor.datasets`

Helper function to load the excavator toy dataset.

Hodkiewicz, M., and Ho, M. (2016). "Cleaning historical maintenance work order data for reliability analysis." *Journal of Quality in Maintenance Engineering*, 22(2), pp. 146-163.

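| Column | Description |
| --- | --- |
| `BscStartDate` | date the maintenance work order (MWO) was initiated |
| `Asset` | which excavator the MWO concerns (A, B, C, D, or E) |
| `OriginalShorttext` | natural-language description of the MWO |
| `PMType` | repair (`PM01`) or replacement (`PM02`) |
| `Cost` | MWO expense (AUD) |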
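Because `Asset` and `PMType` take values from the small fixed sets listed above, pandas categorical dtypes fit them well. The sketch below uses hypothetical stand-ins `ASSET_DTYPE` and `PM_DTYPE` (assumptions, not nestor's own `AssetType`/`PMType` definitions) to show what such a cast does, including how an unexpected code becomes missing.

```python
import pandas as pd

# Hypothetical stand-ins for nestor's AssetType and PMType (assumed, not the library's definitions).
ASSET_DTYPE = pd.CategoricalDtype(categories=["A", "B", "C", "D", "E"])
PM_DTYPE = pd.CategoricalDtype(categories=["PM01", "PM02"])

# Casting raw strings: values outside the declared categories become missing (NaN).
assets = pd.Series(["A", "C", "F"]).astype(ASSET_DTYPE)
pm_types = pd.Series(["PM01", "PM02", "PM03"]).astype(PM_DTYPE)

# Human-readable labels for the PM codes, taken from the column table.
pm_labels = pm_types.cat.rename_categories({"PM01": "repair", "PM02": "replacement"})

print(assets.isna().tolist())       # [False, False, True]
print(pm_types.cat.codes.tolist())  # [0, 1, -1]
```

Missing values in a categorical are stored with code `-1`, so an unexpected asset letter or PM code shows up as missing rather than silently adding a new category.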

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `cleaned` | `bool` | whether to return the original dataset (`False`) or the dataset with keyword extraction rules applied (`True`), as described in Hodkiewicz and Ho (2016) | `False` |

Returns:

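| Type | Description |
| --- | --- |
| `pandas.DataFrame` | the excavator MWO records (original or cleaned, depending on `cleaned`), with the row index named `ID`, for use in testing nestor and subsequent workflows |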
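The returned frame can be summarized with ordinary pandas operations. The sketch below builds a small made-up frame with the same columns (the rows are illustrative placeholders, not taken from the dataset) and totals MWO cost per excavator and PM type; in practice `df` would come from `load_excavators()`.

```python
import pandas as pd

# Made-up rows with the same columns as load_excavators() (illustrative only, not real data).
df = pd.DataFrame(
    {
        "BscStartDate": pd.to_datetime(["2004-07-01", "2004-07-05", "2005-01-10", "2005-02-02"]),
        "Asset": pd.Categorical(["A", "A", "B", "B"], categories=["A", "B", "C", "D", "E"]),
        "OriginalShorttext": pd.array(
            ["example text 1", "example text 2", "example text 3", "example text 4"],
            dtype="string",
        ),
        "PMType": pd.Categorical(["PM01", "PM02", "PM01", "PM01"], categories=["PM01", "PM02"]),
        "Cost": [1200.0, 5400.0, 300.0, 650.0],
    }
).rename_axis("ID")

# Total expense (AUD) per excavator and PM type; observed=True drops unused category pairs.
totals = df.groupby(["Asset", "PMType"], observed=True)["Cost"].sum()

# Number of MWOs initiated per calendar year.
per_year = df["BscStartDate"].dt.year.value_counts().sort_index()

print(totals)
print(per_year)
```

Passing `observed=True` keeps the result limited to combinations that actually occur; without it, every pairing of the five assets and two PM types would appear.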

Source code in nestor/datasets/excavators.py

```python
import pandas as pd

# AssetType, PMType, and _download_excavators are defined or imported elsewhere in this module (not shown).


def load_excavators(cleaned=False):
    """
    Helper function to load excavator toy dataset.

    Hodkiewicz, M., and Ho, M. (2016)
    "Cleaning historical maintenance work order data for reliability analysis"
    in Journal of Quality in Maintenance Engineering, Vol 22 (2), pp. 146-163.

    BscStartDate| Asset | OriginalShorttext | PMType | Cost
    --- | --- | --- | --- | ---
    initialization of MWO | which excavator this MWO concerns (A, B, C, D, E)| natural language description of the MWO| repair (PM01) or replacement (PM02) | MWO expense (AUD)

    Args:
        cleaned (bool): whether to return the original dataset (False) or the dataset with
            keyword extraction rules applied (True), as described in Hodkiewicz and Ho (2016)

    Returns:
        pandas.DataFrame: raw data for use in testing nestor and subsequent workflows

    """

    # Obtain a local path to the original or cleaned CSV file.
    csv_filename = _download_excavators(cleaned=cleaned)

    df = (
        # Parse the start date as a datetime; backslash is the escape character in text fields.
        pd.read_csv(
            csv_filename, parse_dates=["BscStartDate"], sep=",", escapechar="\\"
        )
        # Enforce column dtypes: custom asset and PM-type dtypes, string text, float cost.
        .astype(
            {
                "Asset": AssetType,
                "OriginalShorttext": pd.StringDtype(),
                "PMType": PMType,
                "Cost": float,
            }
        )
        # Name the row index "ID" so each MWO has an identifier.
        .rename_axis("ID")
    )

    return df
```
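To see what the loader's `read_csv`/`astype`/`rename_axis` chain produces without downloading anything, the pipeline can be replayed on a tiny in-memory CSV. The file contents and the `ASSET_DTYPE`/`PM_DTYPE` stand-ins for `AssetType`/`PMType` are hypothetical; only the chain of pandas calls mirrors the source listing.

```python
import io

import pandas as pd

# Hypothetical stand-ins for nestor's AssetType and PMType dtypes (assumption).
ASSET_DTYPE = pd.CategoricalDtype(categories=["A", "B", "C", "D", "E"])
PM_DTYPE = pd.CategoricalDtype(categories=["PM01", "PM02"])

# A tiny made-up CSV in the same layout; the second description contains a backslash-escaped comma.
raw_csv = io.StringIO(
    "BscStartDate,Asset,OriginalShorttext,PMType,Cost\n"
    "2004-07-01,A,replace bucket tooth,PM02,1520.5\n"
    "2004-07-03,B,repair hyd leak\\, boom,PM01,380\n"
)

# Same chain of pandas calls as the loader, minus the download step.
df = (
    pd.read_csv(raw_csv, parse_dates=["BscStartDate"], sep=",", escapechar="\\")
    .astype(
        {
            "Asset": ASSET_DTYPE,
            "OriginalShorttext": pd.StringDtype(),
            "PMType": PM_DTYPE,
            "Cost": float,
        }
    )
    .rename_axis("ID")
)

print(df.dtypes)
print(df)
```

The escaped comma in the second row ends up as a literal comma inside `OriginalShorttext` instead of splitting the field, which is presumably why the loader passes `escapechar` for free-text MWO descriptions.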