FEA: add split token and generate related resource #59

Open
txy77 wants to merge 39 commits into RUCAIBox:main from txy77:main

Conversation

@txy77 txy77 commented Oct 6, 2022

Collaborator
  1. Update token splitting, generation of word2vec, copy_mask, and token2id, and loading of the pretrained model
  2. Fix some bugs in the redial, inspired, and tgredial models
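
As a rough illustration of the token splitting and token2id generation mentioned above, here is a minimal sketch. It is not the code from this PR: the whitespace-based `split_text`, the `SPECIAL_TOKENS` list, and the `build_token2id` helper are all hypothetical names chosen for the example.

```python
from collections import Counter

# Hypothetical special tokens; the PR does not list the actual ones.
SPECIAL_TOKENS = ["__pad__", "__start__", "__end__", "__unk__"]


def split_text(text):
    """Split a raw utterance into tokens (lowercased whitespace split, an assumption)."""
    return text.lower().split()


def build_token2id(texts, min_count=1):
    """Map every token seen at least `min_count` times to an integer id."""
    counts = Counter(tok for text in texts for tok in split_text(text))
    # Special tokens take the lowest ids.
    token2id = {tok: idx for idx, tok in enumerate(SPECIAL_TOKENS)}
    # Sort by descending frequency, then alphabetically, so ids are deterministic.
    for tok, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count and tok not in token2id:
            token2id[tok] = len(token2id)
    return token2id


corpus = ["I like action movies", "I like comedy"]
token2id = build_token2id(corpus)
```

Sorting before assigning ids keeps the mapping reproducible across runs, which matters if other resources are indexed by the same ids.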

@txy77 txy77 left a comment

Collaborator Author

  1. Fix the bugs
  2. Reformat the code

@txy77 txy77 left a comment

Collaborator Author

Rename variables:

  1. processing -> processed_
  2. split_token -> split_text

@txy77 txy77 left a comment

Collaborator Author

Rename variables:

  1. processing -> processed_
  2. split_token -> split_text

@txy77 txy77 left a comment

Collaborator Author

  1. Add the version number of the Python package gensim

@txy77 txy77 left a comment

Collaborator Author

  1. Rename variable:
    crslabtokenizer -> Tokenizer

@txy77 txy77 left a comment

Collaborator Author

Change the way the config is loaded

@txy77 txy77 left a comment

Collaborator Author

Fix the problem with building copy_mask.npy

@txy77 txy77 left a comment

Collaborator Author

FIX: Removed unnecessary word2vec

@txy77 txy77 left a comment

Collaborator Author

FIX: Complete the integration of tokenizer classes

@txy77 txy77 left a comment

Collaborator Author

FIX: problem of data type

@txy77 txy77 left a comment

Collaborator Author

Fix: add special_token_idx to tokenizer

@txy77 txy77 left a comment

Collaborator Author

FIX: conv special_token_idx

@txy77 txy77 left a comment

Collaborator Author

FIX: variable name: CRS_Tokenizer -> crs_tokenizer

@txy77 txy77 left a comment

Collaborator Author

FIX: variable name: wordembedding -> word_embedding

@txy77 txy77 left a comment

Collaborator Author

FIX: delete redundant variable: crstokennizer

@txy77 txy77 left a comment

Collaborator Author

change variable name: BaseCrsTokenize -> BaseTokenizer

@txy77 txy77 left a comment

Collaborator Author

FIX: change as_tensor function

@txy77 txy77 left a comment

Collaborator Author

FIX: separate the word2vec & copy_mask from dictionary

@txy77 txy77 left a comment

Collaborator Author

FIX: delete npy_dict

@txy77 txy77 left a comment

Collaborator Author

FIX: delete npy_dict

@txy77 txy77 left a comment

Collaborator Author

FIX: bert_tokenize -> BertTokenizer

@txy77 txy77 left a comment

Collaborator Author

FIX: variable name
self.Tokenizer -> self.tokenizer

@txy77 txy77 left a comment

Collaborator Author

FIX: add copy_mask = None

@txy77 txy77 left a comment

Collaborator Author

Rename list_word -> word_list
Add return of copy_mask
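
For readers unfamiliar with the pattern these last comments describe (a `copy_mask` that defaults to `None` and is returned to the caller), a minimal sketch follows. It assumes, without confirmation from this PR, that `copy_mask` flags the vocabulary entries that also occur in `word_list`. The `build_copy_mask` name is hypothetical, and a plain Python list stands in for the array that would be saved as copy_mask.npy.

```python
def build_copy_mask(token2id, word_list=None):
    """Return one boolean per vocabulary id, or None when no word list is given.

    Assumes the ids in `token2id` are contiguous and start at 0.
    """
    copy_mask = None
    if word_list is not None:
        copyable = set(word_list)
        copy_mask = [False] * len(token2id)
        for token, idx in token2id.items():
            # Mark the token only if it appears in the supplied word list.
            copy_mask[idx] = token in copyable
    return copy_mask


token2id = {"__pad__": 0, "i": 1, "like": 2, "comedy": 3}
mask = build_copy_mask(token2id, word_list=["comedy", "thriller"])
no_mask = build_copy_mask(token2id)
```

Returning `None` when no word list is supplied lets callers that do not need a mask skip it without a separate code path.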
