Class PennTreebankText

java.lang.Object
ai.djl.training.dataset.RandomAccessDataset
ai.djl.basicdataset.nlp.TextDataset
ai.djl.basicdataset.nlp.PennTreebankText
All Implemented Interfaces:
ai.djl.training.dataset.Dataset

public class PennTreebankText extends TextDataset
The Penn Treebank (PTB) project selected 2,499 stories from a three-year Wall Street Journal (WSJ) collection of 98,732 stories for syntactic annotation.
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static class
    PennTreebankText.Builder
    A builder to construct a PennTreebankText.

    Nested classes/interfaces inherited from class ai.djl.basicdataset.nlp.TextDataset

    TextDataset.Sample

    Nested classes/interfaces inherited from class ai.djl.training.dataset.RandomAccessDataset

    ai.djl.training.dataset.RandomAccessDataset.BaseBuilder<T extends ai.djl.training.dataset.RandomAccessDataset.BaseBuilder<T>>

    Nested classes/interfaces inherited from interface ai.djl.training.dataset.Dataset

    ai.djl.training.dataset.Dataset.Usage
  • Field Summary

    Fields inherited from class ai.djl.basicdataset.nlp.TextDataset

    manager, mrl, prepared, samples, sourceTextData, targetTextData, usage

    Fields inherited from class ai.djl.training.dataset.RandomAccessDataset

    dataBatchifier, device, labelBatchifier, limit, pipeline, prefetchNumber, sampler, targetPipeline
  • Method Summary

    Modifier and Type
    Method
    Description
    protected long
    availableSize()
    static PennTreebankText.Builder
    builder()
    Creates a builder to build a PennTreebankText.
    ai.djl.training.dataset.Record
    get(ai.djl.ndarray.NDManager manager, long index)
    void
    prepare(ai.djl.util.Progress progress)
    Prepares the dataset for use with tracked progress.

    Methods inherited from class ai.djl.basicdataset.nlp.TextDataset

    getProcessedText, getRawText, getSamples, getTextEmbedding, getVocabulary, preprocess

    Methods inherited from class ai.djl.training.dataset.RandomAccessDataset

    getData, getData, getData, getData, newSubDataset, newSubDataset, randomSplit, size, subDataset, subDataset, subDataset, subDataset, toArray

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface ai.djl.training.dataset.Dataset

    matchingTranslatorOptions, prepare
  • Method Details

    • builder

      public static PennTreebankText.Builder builder()
      Creates a builder to build a PennTreebankText.
      Returns:
      a new PennTreebankText.Builder object
    • get

      public ai.djl.training.dataset.Record get(ai.djl.ndarray.NDManager manager, long index) throws IOException
      Specified by:
      get in class ai.djl.training.dataset.RandomAccessDataset
      Throws:
      IOException
    • availableSize

      protected long availableSize()
      Specified by:
      availableSize in class ai.djl.training.dataset.RandomAccessDataset
    • prepare

      public void prepare(ai.djl.util.Progress progress) throws IOException, ai.djl.modality.nlp.embedding.EmbeddingException
      Prepares the dataset for use with tracked progress.
      Parameters:
      progress - the progress tracker
      Throws:
      IOException - for various exceptions depending on the dataset
      ai.djl.modality.nlp.embedding.EmbeddingException
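The method list above implies a three-step lifecycle: obtain a builder, call prepare, then read samples by index with get. The following is a minimal, self-contained sketch of that pattern using only the JDK. Every name in it (DatasetLifecycleSketch, ToyTextDataset, addLine) is a hypothetical stand-in and not DJL code; the "prepare once, then return early" behavior is an assumption suggested by the inherited prepared field, and the real class returns a Record from get(NDManager, long) rather than a token array.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the builder / prepare / get lifecycle; not DJL code. */
public class DatasetLifecycleSketch {

    /** Toy dataset that stores whitespace-tokenized lines of text. */
    static final class ToyTextDataset {
        private final List<String> rawLines;
        private final long limit;
        private final List<String[]> samples = new ArrayList<>();
        private boolean prepared; // assumption: plays the role of the inherited "prepared" field

        private ToyTextDataset(Builder builder) {
            this.rawLines = new ArrayList<>(builder.rawLines);
            this.limit = builder.limit;
        }

        /** Entry point, analogous in shape to a static builder() factory. */
        static Builder builder() {
            return new Builder();
        }

        /** Tokenizes the raw lines once; later calls return immediately (assumed behavior). */
        void prepare() {
            if (prepared) {
                return;
            }
            for (String line : rawLines) {
                if (samples.size() >= limit) {
                    break; // honor the optional sample limit
                }
                samples.add(line.trim().split("\\s+"));
            }
            prepared = true;
        }

        /** Number of samples that get(index) can serve after prepare(). */
        long availableSize() {
            return samples.size();
        }

        /** Random access by index; rejects use before prepare(). */
        String[] get(long index) {
            if (!prepared) {
                throw new IllegalStateException("call prepare() before get()");
            }
            return samples.get(Math.toIntExact(index));
        }

        /** Fluent builder collecting configuration before the dataset is created. */
        static final class Builder {
            private final List<String> rawLines = new ArrayList<>();
            private long limit = Long.MAX_VALUE;

            Builder addLine(String line) {
                rawLines.add(line);
                return this;
            }

            Builder optLimit(long limit) {
                this.limit = limit;
                return this;
            }

            ToyTextDataset build() {
                return new ToyTextDataset(this);
            }
        }
    }

    public static void main(String[] args) {
        ToyTextDataset dataset = ToyTextDataset.builder()
                .addLine("pierre vinken will join the board")
                .addLine("mr. vinken is chairman")
                .build();
        dataset.prepare(); // must run before any get(...) call
        System.out.println(dataset.availableSize());
        System.out.println(String.join("|", dataset.get(0)));
    }
}
```

With the real class, the same order applies: configure through PennTreebankText.builder(), call prepare (which may throw IOException or EmbeddingException), and only then call get with an NDManager and an index.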