bugfinder.features.extraction.word2vec.model

class bugfinder.features.extraction.word2vec.model.Word2VecModel(dataset, deprecation_warning=None)

Trains a Word2Vec model on the corpus produced by tokenizing the dataset.

execute(name, **kwargs)

Run the processing. This method reads each tokenized file in the dataset, builds a corpus from the tokens, trains the model, and saves it.

Parameters: name (str) – Name under which the trained model is saved to disk.

get_token_list()

Reads each file, retrieves its tokens, and concatenates them into a single list, which serves as the corpus.

Returns: List of all the tokens in the dataset, concatenated to form the corpus.
Return type: list
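
The concatenation step described for get_token_list() can be illustrated with a minimal sketch. This is not the bugfinder implementation: the helper name build_token_list, its token_files argument, and the assumption that each tokenized file stores whitespace-separated tokens as UTF-8 text are all hypothetical, chosen only to show how per-file token lists might be merged into one flat corpus.

```python
from pathlib import Path
from typing import Iterable, List


def build_token_list(token_files: Iterable[Path]) -> List[str]:
    """Concatenate the tokens of every file into one flat corpus list.

    Hypothetical helper, not part of bugfinder.
    """
    token_list: List[str] = []
    for path in token_files:
        # Assumption: a tokenized file holds whitespace-separated tokens.
        tokens = Path(path).read_text(encoding="utf-8").split()
        # extend() keeps file order and token order within each file.
        token_list.extend(tokens)
    return token_list
```

Under these assumptions, the tokens appear in the corpus in the order the files are supplied, so callers that need a reproducible corpus may want to sort the file paths before passing them in.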