jax-js

    SentencePiece tokenizer.

    Supports both SentencePiece Unigram and SentencePiece BPE model files. The model type is detected from the trainer spec at construction time.
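    The two model types segment text differently. A Unigram model scores every piece with a log-probability and picks the highest-scoring segmentation with a Viterbi search. A BPE model applies learned merges greedily and needs no such search. The sketch below shows only the Unigram search. It is a generic illustration, not jax-js's implementation. The `pieces` map (piece to log-probability score), the `viterbiSegment` name, and the `UNK_PENALTY` fallback score for unknown single characters are all hypothetical.

```typescript
// Minimal sketch of Unigram segmentation via Viterbi (hypothetical, not jax-js code).
// Assumption: `pieces` maps each vocabulary piece to its log-probability score.
const UNK_PENALTY = -10; // hypothetical score for a single unknown character

function viterbiSegment(text: string, pieces: Map<string, number>): string[] {
  const chars = Array.from(text); // split by code point, not UTF-16 unit
  const n = chars.length;
  // best[i] = best total score for chars[0..i); from[i] = start of the last piece
  const best: number[] = new Array(n + 1).fill(-Infinity);
  const from: number[] = new Array(n + 1).fill(-1);
  best[0] = 0;

  // Longest piece length bounds how far back each step needs to look.
  let maxLen = 1;
  pieces.forEach((_score, piece) => {
    maxLen = Math.max(maxLen, Array.from(piece).length);
  });

  for (let end = 1; end <= n; end++) {
    for (let start = Math.max(0, end - maxLen); start < end; start++) {
      if (best[start] === -Infinity) continue;
      const piece = chars.slice(start, end).join("");
      let score = pieces.get(piece);
      // Any single character is always allowed, as an "unknown" piece.
      if (score === undefined && end - start === 1) score = UNK_PENALTY;
      if (score === undefined) continue;
      if (best[start] + score > best[end]) {
        best[end] = best[start] + score;
        from[end] = start;
      }
    }
  }

  // Follow back-pointers from the end to recover the best segmentation.
  const out: string[] = [];
  for (let end = n; end > 0; end = from[end]) {
    out.push(chars.slice(from[end], end).join(""));
  }
  return out.reverse();
}
```

    For example, with pieces "▁hello" (-3), "▁he" (-2) and "llo" (-2), the input "▁hello" stays a single piece, because -3 beats -2 + -2 = -4.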

    Constructors

    Accessors

    • get bosToken(): number

      Get the beginning-of-sequence token ID.

      Returns number

    • get eosToken(): number

      Get the end-of-sequence token ID.

      Returns number

    • get modelType(): "unigram" | "bpe"

      Get the model type detected from the trainer spec at construction time.

      Returns "unigram" | "bpe"

    • get unkToken(): number

      Get the unknown token ID.

      Returns number

    • get vocabSize(): number

      Get the vocabulary size.

      Returns number
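    The special-token accessors are typically used together when preparing model input. The sketch below wraps an already-encoded ID sequence with the BOS and EOS tokens and maps any ID outside `[0, vocabSize)` to the unknown token. The `SpecialTokenInfo` interface only mirrors the accessors listed above, and `withSpecialTokens` is a hypothetical helper, not part of jax-js. The sketch also assumes the encoder does not already add BOS/EOS itself. That behavior is not documented here, so check it before relying on this pattern.

```typescript
// Hypothetical structural type mirroring the documented accessors.
interface SpecialTokenInfo {
  readonly bosToken: number;
  readonly eosToken: number;
  readonly unkToken: number;
  readonly vocabSize: number;
}

// Hypothetical helper (not a jax-js API): add BOS/EOS around `ids` and
// replace any ID that is not a valid index into the vocabulary with unkToken.
function withSpecialTokens(tok: SpecialTokenInfo, ids: number[]): number[] {
  const safe = ids.map((id) =>
    Number.isInteger(id) && id >= 0 && id < tok.vocabSize ? id : tok.unkToken,
  );
  return [tok.bosToken, ...safe, tok.eosToken];
}
```

    Because the helper accepts any object with these four read-only properties, a tokenizer instance can be passed directly, and a plain object literal works in tests.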

    Methods

    • Decode token IDs back to text.

      Parameters

      • tokens: number[]

      Returns string

    • Encode text into token IDs.

      Parameters

      • text: string

      Returns number[]
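    SentencePiece represents spaces inside pieces with the symbol "▁" (U+2581). By default it also prepends one such marker to the input before encoding. Decoding therefore amounts to looking up each ID's piece, concatenating the pieces, and turning the markers back into spaces. The sketch below illustrates that idea. It is not jax-js's decoder. The `idToPiece` array, the `skip` set of control-token IDs (for example BOS/EOS), and the `decodePieces` name are assumptions.

```typescript
// Minimal sketch of SentencePiece-style detokenization (hypothetical, not jax-js code).
// Assumption: idToPiece[id] is the piece string for token `id`.
const WHITESPACE_MARK = "\u2581"; // "▁", SentencePiece's escaped-space symbol

function decodePieces(ids: number[], idToPiece: string[], skip: Set<number>): string {
  const text = ids
    .filter((id) => !skip.has(id)) // drop control tokens such as BOS/EOS
    .map((id) => idToPiece[id] ?? "") // unknown IDs contribute nothing here
    .join("");
  // Restore spaces, then drop the leading space from the default dummy prefix.
  return text.split(WHITESPACE_MARK).join(" ").replace(/^ /, "");
}
```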