Package ai.djl.basicdataset.utils
Class TextData
java.lang.Object
ai.djl.basicdataset.utils.TextData
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static final class | TextData.Configuration | The configuration for creating a TextData value in a Dataset.
Constructor Summary
Constructors
Method Summary
Modifier and Type | Method | Description
static TextData.Configuration | getDefaultConfiguration() | Returns a good default TextData.Configuration to use for the constructor with defaults.
ai.djl.ndarray.NDArray | getEmbedding(ai.djl.ndarray.NDManager manager, long index) | Gets the text embedding for the given index of the text input.
List<String> | getProcessedText(long index) | Gets the textual input after preprocessing.
String | getRawText(long index) | Gets the raw textual input.
int | getSize() | Returns the size of the data.
ai.djl.modality.nlp.embedding.TextEmbedding | getTextEmbedding() | Gets the TextEmbedding used to embed the data.
ai.djl.modality.nlp.Vocabulary | getVocabulary() | Gets the DefaultVocabulary built while preprocessing the text data.
void | preprocess(ai.djl.ndarray.NDManager manager, List<String> newTextData) | Preprocesses the text data into NDArray form using the data provided by the dataset.
void | setEmbeddingSize(int embeddingSize) | Sets the embedding size.
void | setTextEmbedding(ai.djl.modality.nlp.embedding.TextEmbedding textEmbedding) | Sets the TextEmbedding to embed the data with.
void | setTextProcessors(List<ai.djl.modality.nlp.preprocess.TextProcessor> textProcessors) | Sets the text processors.
Constructor Details
Method Details
getDefaultConfiguration
public static TextData.Configuration getDefaultConfiguration()
Returns a good default TextData.Configuration to use for the constructor with defaults.
Returns:
a good default TextData.Configuration to use for the constructor with defaults
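One plausible reading of getDefaultConfiguration() is a defaults-then-override pattern: start from a default configuration and change only the fields you care about. The plain-Java sketch below illustrates that pattern only. TextConfigSketch, its fields, its update method and its default values are all hypothetical stand-ins, not the real TextData.Configuration API, and token processors are modeled as UnaryOperator<List<String>>.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

/** Hypothetical stand-in for a configuration object; null means "not set". */
final class TextConfigSketch {
    List<UnaryOperator<List<String>>> textProcessors;
    Integer embeddingSize;

    /** Hypothetical defaults; these values are illustrative, not DJL's. */
    static TextConfigSketch defaults() {
        TextConfigSketch c = new TextConfigSketch();
        c.textProcessors = new ArrayList<>();
        // Default processor: lowercase every token.
        c.textProcessors.add(tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) {
                out.add(t.toLowerCase(Locale.ROOT));
            }
            return out;
        });
        c.embeddingSize = 8;
        return c;
    }

    /** Overwrites this configuration's fields with every field the caller explicitly set. */
    TextConfigSketch update(TextConfigSketch overrides) {
        if (overrides.textProcessors != null) {
            textProcessors = overrides.textProcessors;
        }
        if (overrides.embeddingSize != null) {
            embeddingSize = overrides.embeddingSize;
        }
        return this;
    }
}
```

For example, a caller that only wants a larger embedding size sets embeddingSize on an otherwise empty TextConfigSketch and merges it onto defaults(), keeping the default lowercasing processor.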
preprocess
public void preprocess(ai.djl.ndarray.NDManager manager, List<String> newTextData) throws ai.djl.modality.nlp.embedding.EmbeddingException
Preprocesses the text data into NDArray form using the data provided by the dataset.
Parameters:
manager - the NDManager to use
newTextData - the data from the dataset
Throws:
ai.djl.modality.nlp.embedding.EmbeddingException - if there is an error while embedding the input
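As a rough mental model of preprocess, the sketch below tokenizes each string from the dataset, runs the configured text processors over the tokens, keeps both the raw and the processed form, and grows a vocabulary as it goes; it also mirrors getRawText, getProcessedText and getSize. It is a hypothetical, dependency-free approximation: PreprocessSketch, the whitespace tokenizer and the UnaryOperator processors are assumptions, and it produces Java lists rather than NDArrays, so it omits the embedding step that can throw EmbeddingException.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical, dependency-free sketch of a preprocessing step; not DJL code. */
final class PreprocessSketch {
    private final List<String> rawText = new ArrayList<>();
    private final List<List<String>> processedText = new ArrayList<>();
    // Token -> index, assigned in first-seen order.
    final Map<String, Integer> vocabulary = new LinkedHashMap<>();

    void preprocess(List<String> newTextData, List<UnaryOperator<List<String>>> processors) {
        for (String text : newTextData) {
            // Naive whitespace tokenization; a blank string yields no tokens.
            List<String> tokens = new ArrayList<>();
            String trimmed = text.trim();
            if (!trimmed.isEmpty()) {
                tokens.addAll(Arrays.asList(trimmed.split("\\s+")));
            }
            // Each processor consumes the previous processor's output.
            for (UnaryOperator<List<String>> processor : processors) {
                tokens = processor.apply(tokens);
            }
            rawText.add(text);
            processedText.add(tokens);
            // Give every unseen token the next free index.
            for (String token : tokens) {
                vocabulary.putIfAbsent(token, vocabulary.size());
            }
        }
    }

    String getRawText(long index) {
        return rawText.get(Math.toIntExact(index));
    }

    List<String> getProcessedText(long index) {
        return processedText.get(Math.toIntExact(index));
    }

    int getSize() {
        return rawText.size();
    }
}
```

Feeding it "the cat" and "the dog" with no processors yields two entries and a vocabulary of the=0, cat=1, dog=2.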
setTextProcessors
public void setTextProcessors(List<ai.djl.modality.nlp.preprocess.TextProcessor> textProcessors)
Sets the text processors.
Parameters:
textProcessors - the new text processors
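Text processors turn a list of tokens into another list of tokens, and because they are supplied as a list, their order can matter. The sketch assumes an interface shaped like TextProcessor (token list in, token list out) under the hypothetical name TokenProcessor, and assumes the processors run left to right; the lowercase and stop-word processors are illustrative, not DJL classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical processor interface: token list in, token list out. */
interface TokenProcessor {
    List<String> preprocess(List<String> tokens);
}

final class ProcessorChainSketch {
    /** Lowercases every token. */
    static final TokenProcessor LOWERCASE = tokens -> {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    };

    /** Drops tokens found in a case-sensitive stop-word set. */
    static TokenProcessor removeStopWords(Set<String> stopWords) {
        return tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) {
                if (!stopWords.contains(t)) {
                    out.add(t);
                }
            }
            return out;
        };
    }

    /** Applies the processors in list order, feeding each one the previous output. */
    static List<String> apply(List<TokenProcessor> processors, List<String> tokens) {
        List<String> current = tokens;
        for (TokenProcessor p : processors) {
            current = p.preprocess(current);
        }
        return current;
    }
}
```

With the stop word "the", running LOWERCASE before the filter turns ["The", "Cat"] into ["cat"], while running the filter first leaves "The" untouched and yields ["the", "cat"].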
setTextEmbedding
public void setTextEmbedding(ai.djl.modality.nlp.embedding.TextEmbedding textEmbedding)
Sets the TextEmbedding to embed the data with.
Parameters:
textEmbedding - the TextEmbedding
getTextEmbedding
public ai.djl.modality.nlp.embedding.TextEmbedding getTextEmbedding()
Gets the TextEmbedding used to embed the data.
Returns:
the TextEmbedding
setEmbeddingSize
public void setEmbeddingSize(int embeddingSize)
Sets the embedding size.
Parameters:
embeddingSize - the embedding size
getVocabulary
public ai.djl.modality.nlp.Vocabulary getVocabulary()
Gets the DefaultVocabulary built while preprocessing the text data.
Returns:
the DefaultVocabulary
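A vocabulary built during preprocessing is essentially a two-way map between tokens and integer indices. The hypothetical VocabularySketch below shows one common design, reserving index 0 for an unknown token so that looking up an unseen token still returns a valid index. The "<unk>" string, the class and its methods are assumptions for illustration, not DefaultVocabulary's actual behavior.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical bidirectional vocabulary with a reserved unknown token. */
final class VocabularySketch {
    static final String UNKNOWN = "<unk>"; // assumed placeholder, not a DJL constant
    private final Map<String, Long> tokenToIndex = new HashMap<>();
    private final List<String> indexToToken = new ArrayList<>();

    VocabularySketch(List<List<String>> processedText) {
        add(UNKNOWN); // index 0 is reserved for out-of-vocabulary tokens
        for (List<String> sentence : processedText) {
            for (String token : sentence) {
                add(token);
            }
        }
    }

    private void add(String token) {
        if (!tokenToIndex.containsKey(token)) {
            tokenToIndex.put(token, (long) indexToToken.size());
            indexToToken.add(token);
        }
    }

    /** Unseen tokens map to the reserved unknown index. */
    long getIndex(String token) {
        return tokenToIndex.getOrDefault(token, 0L);
    }

    String getToken(long index) {
        return indexToToken.get(Math.toIntExact(index));
    }

    long size() {
        return indexToToken.size();
    }
}
```

Built from [["a", "b"], ["b", "c"]], it assigns <unk>=0, a=1, b=2, c=3, and any unseen token maps to 0.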
getEmbedding
public ai.djl.ndarray.NDArray getEmbedding(ai.djl.ndarray.NDManager manager, long index)
Gets the text embedding for the given index of the text input.
Parameters:
manager - the manager for the embedding array
index - the index of the text input
Returns:
the NDArray containing the text embedding
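Conceptually, embedding the text input at a given index can be pictured as mapping each processed token to a vocabulary index and then to a row of a lookup table whose width is the embedding size. The sketch below shows that lookup with plain float arrays standing in for NDArray and no NDManager. EmbeddingSketch, the seeded Gaussian initialization and the fallback to row 0 for unknown tokens are all assumptions; a real TextEmbedding may compute embeddings differently, for example from trained weights.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical lookup-table embedding; plain arrays stand in for NDArray. */
final class EmbeddingSketch {
    // One row of length embeddingSize per vocabulary entry.
    private final float[][] table;

    EmbeddingSketch(int vocabSize, int embeddingSize, long seed) {
        Random rng = new Random(seed);
        table = new float[vocabSize][embeddingSize];
        for (float[] row : table) {
            for (int j = 0; j < row.length; j++) {
                row[j] = (float) rng.nextGaussian();
            }
        }
    }

    /**
     * Returns a [tokens.size()][embeddingSize] matrix, one row per token.
     * Assumes every index stored in the vocabulary is smaller than vocabSize.
     */
    float[][] embed(List<String> tokens, Map<String, Integer> vocabulary) {
        float[][] out = new float[tokens.size()][];
        for (int i = 0; i < tokens.size(); i++) {
            // Tokens missing from the vocabulary fall back to row 0 in this sketch.
            int row = vocabulary.getOrDefault(tokens.get(i), 0);
            out[i] = table[row].clone();
        }
        return out;
    }
}
```

Because this is a pure table lookup, the output has exactly one row per token, and repeated tokens produce identical rows.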
getRawText
public String getRawText(long index)
Gets the raw textual input.
Parameters:
index - the index of the text input
Returns:
the raw text
getProcessedText
public List<String> getProcessedText(long index)
Gets the textual input after preprocessing.
Parameters:
index - the index of the text input
Returns:
the list of processed tokens
getSize
public int getSize()
Returns the size of the data.
Returns:
the size of the data