Package ai.djl.basicdataset.nlp
Class PennTreebankText
java.lang.Object
ai.djl.training.dataset.RandomAccessDataset
ai.djl.basicdataset.nlp.TextDataset
ai.djl.basicdataset.nlp.PennTreebankText
- All Implemented Interfaces:
ai.djl.training.dataset.Dataset
The Penn Treebank (PTB) project selected 2,499 stories from a three-year Wall Street Journal
(WSJ) collection of 98,732 stories for syntactic annotation (see here for details).
Nested Class Summary
Nested classes/interfaces inherited from class ai.djl.basicdataset.nlp.TextDataset
TextDataset.Sample
Nested classes/interfaces inherited from class ai.djl.training.dataset.RandomAccessDataset
ai.djl.training.dataset.RandomAccessDataset.BaseBuilder<T extends ai.djl.training.dataset.RandomAccessDataset.BaseBuilder<T>>
Nested classes/interfaces inherited from interface ai.djl.training.dataset.Dataset
ai.djl.training.dataset.Dataset.Usage
Field Summary
Fields inherited from class ai.djl.basicdataset.nlp.TextDataset
manager, mrl, prepared, samples, sourceTextData, targetTextData, usage
Fields inherited from class ai.djl.training.dataset.RandomAccessDataset
dataBatchifier, device, labelBatchifier, limit, pipeline, prefetchNumber, sampler, targetPipeline
Method Summary
Modifier and Type | Method | Description
protected long | availableSize() |
static PennTreebankText.Builder | builder() | Creates a builder to build a PennTreebankText.
ai.djl.training.dataset.Record | get(ai.djl.ndarray.NDManager manager, long index) |
void | prepare(ai.djl.util.Progress progress) | Prepares the dataset for use with tracked progress.
Methods inherited from class ai.djl.basicdataset.nlp.TextDataset
getProcessedText, getRawText, getSamples, getTextEmbedding, getVocabulary, preprocess
Methods inherited from class ai.djl.training.dataset.RandomAccessDataset
getData, getData, getData, getData, newSubDataset, newSubDataset, randomSplit, size, subDataset, subDataset, subDataset, subDataset, toArray
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface ai.djl.training.dataset.Dataset
matchingTranslatorOptions, prepare
Method Details
builder
public static PennTreebankText.Builder builder()
Creates a builder to build a PennTreebankText.
- Returns:
a new PennTreebankText.Builder object
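The builder() factory follows the builder pattern: optional settings are collected on a mutable builder, and build() produces the dataset. The plain-Java sketch below illustrates only that pattern. ToyTextDataset, its Usage enum, optUsage and optLimit are hypothetical names chosen for illustration (they loosely echo Dataset.Usage and the inherited limit field); they are not the actual methods of PennTreebankText.Builder.

```java
import java.util.Objects;

/** Hypothetical dataset configured through a builder (illustration only, not DJL code). */
final class ToyTextDataset {
    /** Hypothetical split selector, loosely modeled on Dataset.Usage. */
    enum Usage { TRAIN, TEST, VALIDATION }

    private final Usage usage;
    private final long limit;

    // Private constructor: instances are created only through the builder.
    private ToyTextDataset(Builder builder) {
        this.usage = builder.usage;
        this.limit = builder.limit;
    }

    /** Static factory, in the same style as PennTreebankText.builder(). */
    static Builder builder() {
        return new Builder();
    }

    Usage usage() { return usage; }

    long limit() { return limit; }

    static final class Builder {
        // Defaults used when an optional setter is never called (assumed values).
        private Usage usage = Usage.TRAIN;
        private long limit = Long.MAX_VALUE;

        /** Hypothetical optional setter: which split to load. */
        Builder optUsage(Usage usage) {
            this.usage = Objects.requireNonNull(usage, "usage");
            return this;
        }

        /** Hypothetical optional setter: upper bound on the number of samples. */
        Builder optLimit(long limit) {
            if (limit < 0) {
                throw new IllegalArgumentException("limit must be non-negative");
            }
            this.limit = limit;
            return this;
        }

        ToyTextDataset build() {
            return new ToyTextDataset(this);
        }
    }
}
```

Because every optional setting has a default in the builder, build() works with no setter calls at all, and the resulting dataset object stays immutable.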
get
public ai.djl.training.dataset.Record get(ai.djl.ndarray.NDManager manager, long index) throws IOException
- Specified by:
get in class ai.djl.training.dataset.RandomAccessDataset
- Throws:
IOException
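get(manager, index) provides random access: any sample can be fetched by its index. The following is a minimal sketch under stated assumptions. ToyRandomAccessText and ToyRecord are hypothetical stand-ins (ToyRecord plays the role of ai.djl.training.dataset.Record, with plain int arrays instead of NDArrays), tokenization is whitespace splitting, and the labels are left empty. It does not reproduce what PennTreebankText.get actually returns.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical index-addressable text dataset (illustration only, not DJL code). */
final class ToyRandomAccessText {
    /** Stand-in for ai.djl.training.dataset.Record, using int arrays instead of NDArrays. */
    static final class ToyRecord {
        final int[] data;
        final int[] labels;

        ToyRecord(int[] data, int[] labels) {
            this.data = data;
            this.labels = labels;
        }
    }

    private final Map<String, Integer> vocabulary = new HashMap<>();
    private final List<int[]> samples = new ArrayList<>();

    /** Preprocesses every sentence into token ids up front, so get() is a cheap lookup. */
    ToyRandomAccessText(List<String> sentences) {
        vocabulary.put("<unk>", 0); // reserve id 0 for unknown tokens
        for (String sentence : sentences) {
            String[] tokens = sentence.trim().split("\\s+");
            int[] ids = new int[tokens.length];
            for (int i = 0; i < tokens.length; i++) {
                Integer id = vocabulary.get(tokens[i]);
                if (id == null) {
                    // First occurrence of this token: assign the next unused id.
                    id = vocabulary.size();
                    vocabulary.put(tokens[i], id);
                }
                ids[i] = id;
            }
            samples.add(ids);
        }
    }

    long size() {
        return samples.size();
    }

    /** Returns the record at the given index, mirroring get(manager, index) without a manager. */
    ToyRecord get(long index) {
        if (index < 0 || index >= samples.size()) {
            throw new IndexOutOfBoundsException("index out of range: " + index);
        }
        // Copy so callers cannot mutate the stored sample; labels are empty in this sketch.
        return new ToyRecord(samples.get((int) index).clone(), new int[0]);
    }
}
```

Doing the tokenization and id lookup once, before any get call, keeps each access constant-time, which suits samplers that request indices in random order.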
availableSize
protected long availableSize()
- Specified by:
availableSize in class ai.djl.training.dataset.RandomAccessDataset
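availableSize() reports how many samples the dataset actually holds. The inherited size() presumably combines it with the inherited limit field; the sketch below assumes that size() is availableSize() capped by limit. ToyRandomAccessDataset and ToyCorpus are hypothetical classes used only to show that relationship.

```java
/** Hypothetical base class: public size is the available count capped by a limit. */
abstract class ToyRandomAccessDataset {
    private final long limit;

    ToyRandomAccessDataset(long limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must be non-negative");
        }
        this.limit = limit;
    }

    /** How many samples this dataset actually holds; each subclass answers this. */
    protected abstract long availableSize();

    /** Samples exposed to training: never more than the limit (assumed relationship). */
    public long size() {
        return Math.min(limit, availableSize());
    }
}

/** Hypothetical concrete dataset backed by an in-memory array of lines. */
final class ToyCorpus extends ToyRandomAccessDataset {
    private final String[] lines;

    ToyCorpus(String[] lines, long limit) {
        super(limit);
        this.lines = lines.clone();
    }

    @Override
    protected long availableSize() {
        return lines.length;
    }
}
```

Keeping availableSize() protected and abstract lets each subclass report its raw count while the base class alone decides how the limit applies.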
prepare
public void prepare(ai.djl.util.Progress progress) throws IOException, ai.djl.modality.nlp.embedding.EmbeddingException
Prepares the dataset for use with tracked progress.
- Parameters:
progress - the progress tracker
- Throws:
IOException - for various exceptions depending on the dataset
ai.djl.modality.nlp.embedding.EmbeddingException
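prepare(progress) performs the one-time setup needed before samples can be read, and TextDataset exposes a prepared field. A minimal sketch, assuming prepare is meant to run its work only once and is guarded by such a flag: ToyPreparedText and the ToyProgress callback are hypothetical, and ToyProgress is far simpler than the real ai.djl.util.Progress interface.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical progress callback; ai.djl.util.Progress has a different, richer contract. */
interface ToyProgress {
    void update(long done, long total);
}

/** Hypothetical dataset whose prepare step runs at most once, guarded by a flag. */
final class ToyPreparedText {
    private final List<String> rawLines;
    private final List<String[]> samples = new ArrayList<>();
    private boolean prepared;
    private int prepareRuns;

    ToyPreparedText(List<String> rawLines) {
        this.rawLines = List.copyOf(rawLines);
    }

    void prepare(ToyProgress progress) {
        if (prepared) {
            return; // already prepared: skip the (possibly expensive) work
        }
        prepareRuns++;
        long total = rawLines.size();
        for (int i = 0; i < total; i++) {
            // Whitespace tokenization stands in for real preprocessing.
            samples.add(rawLines.get(i).trim().split("\\s+"));
            if (progress != null) {
                progress.update(i + 1, total); // report lines processed so far
            }
        }
        prepared = true;
    }

    int sampleCount() { return samples.size(); }

    int prepareRuns() { return prepareRuns; }
}
```

Setting the flag only after the loop finishes means a call that fails partway through leaves the dataset unprepared, so a later call retries instead of silently using partial data. This sketch does not clear the partial samples first; a real implementation would need to.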