Class TextData

java.lang.Object
ai.djl.basicdataset.utils.TextData

public class TextData extends Object
TextData is a utility for managing textual data within a Dataset.

See TextDataset for an example.

  • Constructor Details

  • Method Details

    • getDefaultConfiguration

      public static TextData.Configuration getDefaultConfiguration()
      Returns a default TextData.Configuration suitable for passing to the TextData constructor.
      Returns:
      a default TextData.Configuration suitable for passing to the TextData constructor
    • preprocess

      public void preprocess(ai.djl.ndarray.NDManager manager, List<String> newTextData) throws ai.djl.modality.nlp.embedding.EmbeddingException
      Preprocesses the text data provided by the dataset into NDArrays.
      Parameters:
      manager - the NDManager to use during preprocessing
      newTextData - the data from the dataset
      Throws:
      ai.djl.modality.nlp.embedding.EmbeddingException - if there is an error while embedding input
    • setTextProcessors

      public void setTextProcessors(List<ai.djl.modality.nlp.preprocess.TextProcessor> textProcessors)
      Sets the text processors.
      Parameters:
      textProcessors - the new textProcessors
    • setTextEmbedding

      public void setTextEmbedding(ai.djl.modality.nlp.embedding.TextEmbedding textEmbedding)
      Sets the TextEmbedding used to embed the data.
      Parameters:
      textEmbedding - the TextEmbedding to use
    • getTextEmbedding

      public ai.djl.modality.nlp.embedding.TextEmbedding getTextEmbedding()
      Gets the TextEmbedding used to embed the data.
      Returns:
      the TextEmbedding
    • setEmbeddingSize

      public void setEmbeddingSize(int embeddingSize)
      Sets the embedding size.
      Parameters:
      embeddingSize - the embedding size
    • getVocabulary

      public ai.djl.modality.nlp.Vocabulary getVocabulary()
      Gets the Vocabulary built while preprocessing the text data.
      Returns:
      the Vocabulary
    • getEmbedding

      public ai.djl.ndarray.NDArray getEmbedding(ai.djl.ndarray.NDManager manager, long index)
      Gets the text embedding for the given index of the text input.
      Parameters:
      manager - the manager for the embedding array
      index - the index of the text input
      Returns:
      the NDArray containing the text embedding
    • getRawText

      public String getRawText(long index)
      Gets the raw textual input at the given index.
      Parameters:
      index - the index of the text input
      Returns:
      the raw text
    • getProcessedText

      public List<String> getProcessedText(long index)
      Gets the textual input at the given index after preprocessing.
      Parameters:
      index - the index of the text input
      Returns:
      the list of processed tokens
    • getSize

      public int getSize()
      Returns the size of the data, that is, the number of text inputs.
      Returns:
      the number of text inputs
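
The flow described by preprocess, setTextProcessors, getVocabulary, and the index-based getters can be illustrated with a small stand-in that uses only the Java standard library. This is a sketch under stated assumptions, not DJL's implementation: the class TextDataSketch, the two processors, and the Map-based vocabulary are hypothetical, processors are assumed to run in list order, and the NDManager, TextEmbedding, and NDArray steps are left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for the preprocessing flow; not the DJL TextData class.
final class TextDataSketch {

    // Hypothetical processor: lower-cases every token.
    static final UnaryOperator<List<String>> LOWER_CASE = tokens -> {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    };

    // Hypothetical processor: splits every token on whitespace.
    static final UnaryOperator<List<String>> WHITESPACE_SPLIT = tokens -> {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            for (String piece : token.trim().split("\\s+")) {
                if (!piece.isEmpty()) {
                    out.add(piece);
                }
            }
        }
        return out;
    };

    private final List<UnaryOperator<List<String>>> processors;
    private List<String> rawText = new ArrayList<>();
    private List<List<String>> processedText = new ArrayList<>();
    private Map<String, Integer> vocabulary = new LinkedHashMap<>();

    TextDataSketch(List<UnaryOperator<List<String>>> processors) {
        this.processors = processors;
    }

    // Keeps the raw inputs, runs each one through the processors in list order
    // (an assumption), and assigns every new token the next free index.
    void preprocess(List<String> newTextData) {
        rawText = new ArrayList<>(newTextData);
        processedText = new ArrayList<>();
        vocabulary = new LinkedHashMap<>();
        for (String raw : newTextData) {
            List<String> tokens = Collections.singletonList(raw);
            for (UnaryOperator<List<String>> processor : processors) {
                tokens = processor.apply(tokens);
            }
            processedText.add(tokens);
            for (String token : tokens) {
                vocabulary.putIfAbsent(token, vocabulary.size());
            }
        }
    }

    String getRawText(long index) {
        return rawText.get(Math.toIntExact(index));
    }

    List<String> getProcessedText(long index) {
        return processedText.get(Math.toIntExact(index));
    }

    Map<String, Integer> getVocabulary() {
        return vocabulary;
    }

    int getSize() {
        return rawText.size();
    }

    // Builds a small example instance from two made-up inputs.
    static TextDataSketch demo() {
        TextDataSketch data = new TextDataSketch(Arrays.asList(LOWER_CASE, WHITESPACE_SPLIT));
        data.preprocess(Arrays.asList("Hello World", "hello DJL"));
        return data;
    }

    public static void main(String[] args) {
        TextDataSketch data = demo();
        for (long i = 0; i < data.getSize(); i++) {
            System.out.println(data.getRawText(i) + " -> " + data.getProcessedText(i));
        }
        System.out.println("vocabulary: " + data.getVocabulary());
    }
}
```

In this sketch the same index selects the raw input and its processed tokens, and getSize bounds the valid indices. The real class additionally takes an NDManager and exposes the result through getEmbedding as an NDArray, which this stand-in does not model.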