SentencePiece tokenizer.
Supports both SentencePiece Unigram and SentencePiece BPE model files. The model type is detected from the trainer spec at construction time.
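As a rough illustration of the detection step, the sketch below reads the model type from an already-parsed trainer spec. The numeric values follow the `ModelType` enum in SentencePiece's `sentencepiece_model.proto`. The function name `detect_model_type`, the plain-dict view of the trainer spec, and the choice to reject the WORD and CHAR types are assumptions made for this example, not this class's actual implementation.

```python
from enum import IntEnum


class ModelType(IntEnum):
    # Values mirror the ModelType enum in sentencepiece_model.proto.
    UNIGRAM = 1
    BPE = 2
    WORD = 3
    CHAR = 4


# Only the two model types this tokenizer is documented to support.
SUPPORTED = {ModelType.UNIGRAM, ModelType.BPE}


def detect_model_type(trainer_spec: dict) -> ModelType:
    """Return the model type stored in a parsed trainer spec (hypothetical helper).

    `trainer_spec` is assumed to be a plain dict holding the fields of the
    TrainerSpec message; parsing the model file itself is out of scope here.
    """
    # The proto declares UNIGRAM as the default when the field is absent.
    raw = trainer_spec.get("model_type", ModelType.UNIGRAM)
    model_type = ModelType(raw)  # raises ValueError for unknown enum values
    if model_type not in SUPPORTED:
        # Assumption: unsupported types are rejected at construction time.
        raise ValueError(f"unsupported model type: {model_type.name}")
    return model_type
```

Resolving the type once at construction lets encoding dispatch to the matching algorithm without re-inspecting the model file on every call.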
Get the beginning-of-sequence token ID.
Get the end-of-sequence token ID.
Get the unknown token ID.
Get the vocabulary size.
Decode token IDs back to text.
Encode text into token IDs.
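The accessors above can be pictured with a toy stand-in. This is a minimal sketch, not the real tokenizer: `ToyTokenizer` splits on whitespace instead of running Unigram or BPE, and every method name here (`unk_id`, `bos_id`, `eos_id`, `vocab_size`, `encode`, `decode`) is hypothetical, since the reference lists descriptions only. The special-token layout (unknown = 0, BOS = 1, EOS = 2) matches SentencePiece's default trainer settings, but a given model file may differ.

```python
class ToyTokenizer:
    """Whitespace-based stand-in that mimics the documented accessor surface."""

    def __init__(self, words):
        # Assumed layout: the first three IDs are reserved for special tokens.
        self._pieces = ["<unk>", "<s>", "</s>"] + list(words)
        self._ids = {piece: i for i, piece in enumerate(self._pieces)}

    def unk_id(self) -> int:
        return 0

    def bos_id(self) -> int:
        return 1

    def eos_id(self) -> int:
        return 2

    def vocab_size(self) -> int:
        return len(self._pieces)

    def encode(self, text: str) -> list:
        # Words missing from the vocabulary map to the unknown token ID.
        return [self._ids.get(word, self.unk_id()) for word in text.split()]

    def decode(self, ids) -> str:
        # BOS/EOS markers are dropped when turning IDs back into text.
        special = {self.bos_id(), self.eos_id()}
        return " ".join(self._pieces[i] for i in ids if i not in special)


tok = ToyTokenizer(["hello", "world"])
ids = tok.encode("hello world")
framed = [tok.bos_id()] + ids + [tok.eos_id()]  # caller adds BOS/EOS explicitly
text = tok.decode(framed)
```

In this sketch `encode` does not add BOS or EOS itself; whether the real class does so is not stated in the reference, so the caller wraps the IDs explicitly.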