bugfinder.features.extraction.word2vec.model
- class bugfinder.features.extraction.word2vec.model.Word2VecModel(dataset, deprecation_warning=None)
Bases: AbstractProcessing
Trains a Word2Vec model on the corpus produced by tokenizing the dataset.
- algorithm = 1
- execute(name, **kwargs)
Run the processing. Reads each tokenized file in the dataset, builds a corpus, trains the model, and saves it.
- Parameters
name (str) – Name under which the trained model is saved to disk.
- get_token_list()
Reads each file, retrieves its tokens, and concatenates them into a single list, which serves as the corpus.
- Returns
List of all the tokens in the dataset, concatenated to form the corpus.
- Return type
list
- min_count = 1
- seed = None
- tokens = {}
- window_dim = 5
- word_dim = 50
- workers = 4
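The flow described for get_token_list() and execute() can be sketched roughly as follows. This is not the bugfinder implementation: the `.tokens` file suffix, the whitespace-separated token format, the helper names (`build_corpus`, `to_gensim_kwargs`, `train_and_save`), and the mapping of the class attributes onto gensim 4.x `Word2Vec` arguments are all assumptions made for illustration. In particular, reading `algorithm = 1` as gensim's `sg=1` (skip-gram) is a guess based on the attribute names, and the source does not say how the real class shapes the corpus before training.

```python
from pathlib import Path

# Values copied from the class attributes listed above.
ATTRS = {"algorithm": 1, "min_count": 1, "seed": None,
         "window_dim": 5, "word_dim": 50, "workers": 4}


def build_corpus(token_dir):
    """Concatenate the whitespace-separated tokens of every file into one list."""
    corpus = []
    # Hypothetical layout: one "*.tokens" file per tokenized source file.
    for path in sorted(Path(token_dir).glob("*.tokens")):
        corpus.extend(path.read_text().split())
    return corpus


def to_gensim_kwargs(attrs):
    """Assumed mapping from the class attributes to gensim 4.x Word2Vec arguments."""
    kwargs = {
        "sg": attrs["algorithm"],          # in gensim, 1 selects skip-gram
        "min_count": attrs["min_count"],
        "window": attrs["window_dim"],
        "vector_size": attrs["word_dim"],
        "workers": attrs["workers"],
    }
    if attrs["seed"] is not None:          # otherwise keep gensim's default seed
        kwargs["seed"] = attrs["seed"]
    return kwargs


def train_and_save(corpus, name, out_dir):
    """Train on the corpus and save the model under `name`; requires gensim."""
    from gensim.models import Word2Vec     # third-party, imported lazily

    # Assumption: the whole corpus is passed as a single "sentence".
    model = Word2Vec(sentences=[corpus], **to_gensim_kwargs(ATTRS))
    path = str(Path(out_dir) / name)
    model.save(path)
    return path
```

With gensim installed, `train_and_save(build_corpus("tokens/"), "word2vec.model", "models/")` would train and write the model, where both directory names are placeholders and `models/` must already exist. Keeping the corpus builder separate from the training step makes it possible to inspect the token list before training.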