bugfinder.features.extraction.word2vec.model

class bugfinder.features.extraction.word2vec.model.Word2VecModel(dataset, deprecation_warning=None)

Bases: AbstractProcessing

Trains a Word2Vec model on the corpus generated by tokenizing the dataset.

algorithm = 1
execute(name, **kwargs)

Runs the processing: reads each tokenized file in the dataset, builds the corpus, trains the model, and saves it.

Parameters

name (str) – Name under which the trained model is saved on disk.

get_token_list()

Reads each file, retrieves its tokens, and concatenates them into a single list, which serves as the corpus.

Returns

token_list: list containing all the tokens in the dataset, concatenated to form the corpus

Return type

list
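
The concatenation described for get_token_list() can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper names build_corpus and demo, and the whitespace-separated token file format, are assumptions made for the example.

```python
from pathlib import Path
import tempfile


def build_corpus(paths):
    """Concatenate the tokens of every file into one flat list.

    Assumption: each file holds whitespace-separated tokens; the real
    tokenized-file format used by bugfinder may differ.
    """
    corpus = []
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        corpus.extend(text.split())
    return corpus


def demo():
    """Write two small hypothetical token files and build a corpus from them."""
    with tempfile.TemporaryDirectory() as tmp:
        first = Path(tmp) / "a.txt"
        second = Path(tmp) / "b.txt"
        first.write_text("int x = 0 ;", encoding="utf-8")
        second.write_text("return x ;", encoding="utf-8")
        return build_corpus([first, second])
```

File order determines token order in the resulting list, so a caller that needs reproducible corpora would likely want to sort the paths first.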

min_count = 1
seed = None
tokens = {}
window_dim = 5
word_dim = 50
workers = 4
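
The class attributes on this page read like Word2Vec hyperparameters. As a hedged sketch, the snippet below collects their default values into keyword arguments using the parameter names of gensim's Word2Vec constructor (sg, min_count, seed, window, vector_size, workers). Whether bugfinder uses gensim, and whether algorithm = 1 selects skip-gram, are assumptions not confirmed by this page; the helper to_training_kwargs and both dictionaries are hypothetical.

```python
# Default values copied from the class attributes listed on this page.
DEFAULTS = {
    "algorithm": 1,
    "min_count": 1,
    "seed": None,
    "window_dim": 5,
    "word_dim": 50,
    "workers": 4,
}

# ASSUMED mapping from attribute names to gensim Word2Vec keyword names.
# In particular, treating "algorithm" as gensim's "sg" flag is a guess.
ASSUMED_NAME_MAP = {
    "algorithm": "sg",
    "min_count": "min_count",
    "seed": "seed",
    "window_dim": "window",
    "word_dim": "vector_size",
    "workers": "workers",
}


def to_training_kwargs(settings):
    """Rename settings to the assumed keyword names, dropping unset (None) values."""
    return {
        ASSUMED_NAME_MAP[key]: value
        for key, value in settings.items()
        if value is not None
    }
```

Under the same assumptions, the resulting dictionary could be unpacked into the model constructor alongside the corpus. Dropping seed when it is None leaves the training library free to apply its own default.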