Class WikiText2

java.lang.Object
ai.djl.basicdataset.nlp.WikiText2
All Implemented Interfaces:
ai.djl.training.dataset.Dataset, ai.djl.training.dataset.RawDataset<Path>

public class WikiText2 extends Object implements ai.djl.training.dataset.RawDataset<Path>
The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. WikiText2 is its smaller variant, with roughly 2 million training tokens.
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static final class
    WikiText2.Builder
    A builder to construct a WikiText2.

    Nested classes/interfaces inherited from interface ai.djl.training.dataset.Dataset

    ai.djl.training.dataset.Dataset.Usage
  • Method Summary

    Modifier and Type
    Method
    Description
    static WikiText2.Builder
    builder()
    Creates a builder to build a WikiText2.
    Path
    getData()
    Get data from the WikiText2 dataset.
    Iterable<ai.djl.training.dataset.Batch>
    getData(ai.djl.ndarray.NDManager manager)
    Fetches an iterator that can iterate through the Dataset.
    void
    prepare(ai.djl.util.Progress progress)
    Prepares the dataset for use with tracked progress.

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface ai.djl.training.dataset.Dataset

    getData, matchingTranslatorOptions, prepare
  • Method Details

    • builder

      public static WikiText2.Builder builder()
      Creates a builder to build a WikiText2.
      Returns:
      a new WikiText2.Builder object
    • prepare

      public void prepare(ai.djl.util.Progress progress) throws IOException
      Prepares the dataset for use with tracked progress.
      Specified by:
      prepare in interface ai.djl.training.dataset.Dataset
      Parameters:
      progress - the progress tracker
      Throws:
      IOException - for various exceptions depending on the dataset
    • getData

      public Iterable<ai.djl.training.dataset.Batch> getData(ai.djl.ndarray.NDManager manager) throws IOException, ai.djl.translate.TranslateException
      Fetches an iterator that can iterate through the Dataset. WikiText2 does not implement this method because the dataset is not suited to batch iteration; calling it always returns null. Use getData() to obtain the dataset file instead.
      Specified by:
      getData in interface ai.djl.training.dataset.Dataset
      Parameters:
      manager - the NDManager to use for creating the data
      Returns:
      an Iterable of Batch for datasets that support iteration; always null for WikiText2
      Throws:
      IOException
      ai.djl.translate.TranslateException
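      Because this overload returns null for WikiText2, generic code that runs a for-each loop over the result of getData(manager) for arbitrary Dataset implementations would fail with a NullPointerException here. One way to guard against that is sketched below. The NullSafeIterables class and its methods are hypothetical helpers, not part of DJL, and they work on a plain Iterable so the example runs without the library.

```java
import java.util.Collections;

final class NullSafeIterables {
    private NullSafeIterables() {}

    /** Returns the given iterable, or an empty one if it is null. */
    static <T> Iterable<T> orEmpty(Iterable<T> iterable) {
        return iterable != null ? iterable : Collections.<T>emptyList();
    }

    /** Counts the elements of an iterable, treating null as empty. */
    static <T> int count(Iterable<T> iterable) {
        int n = 0;
        for (T ignored : orEmpty(iterable)) {
            n++;
        }
        return n;
    }
}
```

      Wrapping the call as orEmpty(dataset.getData(manager)) would make a loop over WikiText2 a no-op instead of an error. Whether silently skipping is the right behavior depends on the caller.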
    • getData

      public Path getData() throws IOException
      Get data from the WikiText2 dataset. Rather than batches, this method returns the location of the whole dataset.
      Specified by:
      getData in interface ai.djl.training.dataset.RawDataset<Path>
      Returns:
      a Path object locating the WikiText2 dataset file
      Throws:
      IOException
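      The returned Path can be read with the standard java.nio.file API. The sketch below counts whitespace-separated tokens in such a file. It assumes the file is UTF-8 plain text with tokens separated by whitespace, and the TokenCounter class is a hypothetical helper, not part of DJL.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

final class TokenCounter {
    private TokenCounter() {}

    /** Counts whitespace-separated tokens in a UTF-8 text file, streaming it line by line. */
    static long countTokens(Path file) throws IOException {
        // try-with-resources closes the underlying file handle when the stream is done
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines
                    .map(String::trim)
                    .filter(line -> !line.isEmpty())   // skip blank lines
                    .mapToLong(line -> line.split("\\s+").length)
                    .sum();
        }
    }
}
```

      With a prepared dataset, the call would look like TokenCounter.countTokens(dataset.getData()).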