Package ai.djl.basicdataset.nlp
Class WikiText2
java.lang.Object
ai.djl.basicdataset.nlp.WikiText2
- All Implemented Interfaces:
ai.djl.training.dataset.Dataset, ai.djl.training.dataset.RawDataset<Path>
The WikiText language modeling dataset is a collection of over 100 million tokens extracted from
the set of verified Good and Featured articles on Wikipedia.
-
Nested Class Summary
Nested classes/interfaces inherited from interface ai.djl.training.dataset.Dataset
ai.djl.training.dataset.Dataset.Usage
Method Summary
- static WikiText2.Builder builder(): Creates a builder to build a WikiText2.
- Path getData(): Gets data from the WikiText2 dataset.
- Iterable<ai.djl.training.dataset.Batch> getData(ai.djl.ndarray.NDManager manager): Fetches an iterator that can iterate through the Dataset (not implemented for WikiText2; returns null).
- void prepare(ai.djl.util.Progress progress): Prepares the dataset for use with tracked progress.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface ai.djl.training.dataset.Dataset
getData, matchingTranslatorOptions, prepare
-
Method Details
-
builder
public static WikiText2.Builder builder()
Creates a builder to build a WikiText2.
- Returns:
a new WikiText2.Builder object
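As a rough illustration of the builder pattern behind builder(), the sketch below configures options on a builder and then calls build(). It uses self-contained stand-in classes so it compiles without DJL: WikiTextSketch, SketchUsage (modeled loosely on Dataset.Usage), optUsage, the TRAIN default, and build() are hypothetical names chosen for this example, not the documented WikiText2.Builder API.

```java
import java.util.Objects;

// Hypothetical stand-in for ai.djl.training.dataset.Dataset.Usage; constant names are assumptions.
enum SketchUsage { TRAIN, VALIDATION, TEST }

// Hypothetical stand-in for WikiText2: instances can only be created through the builder.
final class WikiTextSketch {
    private final SketchUsage usage;

    private WikiTextSketch(Builder builder) {
        this.usage = builder.usage;
    }

    // Mirrors the documented static factory: returns a fresh builder on every call.
    static Builder builder() {
        return new Builder();
    }

    SketchUsage getUsage() {
        return usage;
    }

    static final class Builder {
        // Assumed default split when the caller sets nothing.
        private SketchUsage usage = SketchUsage.TRAIN;

        // Hypothetical option; returns this builder so calls can be chained.
        Builder optUsage(SketchUsage usage) {
            this.usage = Objects.requireNonNull(usage, "usage");
            return this;
        }

        // Creates the immutable dataset object from the collected options.
        WikiTextSketch build() {
            return new WikiTextSketch(this);
        }
    }

    public static void main(String[] args) {
        WikiTextSketch test = WikiTextSketch.builder().optUsage(SketchUsage.TEST).build();
        System.out.println(test.getUsage()); // prints TEST
    }
}
```

Keeping the constructor private means every instance goes through the builder, so optional settings can be added later without breaking existing callers.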
-
prepare
public void prepare(ai.djl.util.Progress progress) throws IOException
Prepares the dataset for use with tracked progress.
- Specified by:
prepare in interface ai.djl.training.dataset.Dataset
- Parameters:
progress - the progress tracker
- Throws:
IOException - if preparation fails; the specific causes depend on the dataset
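This page does not say what prepare does internally. One common approach for datasets that download or unpack files, sketched here under that assumption, is to do the work once, guard it with a flag so later calls return immediately, and report progress through a callback that may be null. LazyTextDataset is hypothetical, and java.util.function.LongConsumer stands in for ai.djl.util.Progress, whose interface is not described on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

// Hypothetical dataset whose (simulated) preparation work runs at most once.
final class LazyTextDataset {
    private final int totalSteps;
    private boolean prepared;
    private int preparationRuns;

    LazyTextDataset(int totalSteps) {
        this.totalSteps = totalSteps;
    }

    // Stand-in for prepare(Progress); a null callback means progress is not tracked.
    synchronized void prepare(LongConsumer progress) {
        if (prepared) {
            return; // already prepared, so repeated calls do no work
        }
        for (long step = 1; step <= totalSteps; step++) {
            // A real dataset might download or unpack one chunk of data here.
            if (progress != null) {
                progress.accept(step); // report the number of completed steps
            }
        }
        preparationRuns++;
        prepared = true;
    }

    synchronized int getPreparationRuns() {
        return preparationRuns;
    }

    public static void main(String[] args) {
        LazyTextDataset dataset = new LazyTextDataset(3);
        List<Long> reported = new ArrayList<>();
        dataset.prepare(step -> reported.add(step));
        dataset.prepare(step -> reported.add(step)); // no-op: already prepared
        System.out.println(reported + " runs=" + dataset.getPreparationRuns()); // prints [1, 2, 3] runs=1
    }
}
```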
-
getData
public Iterable<ai.djl.training.dataset.Batch> getData(ai.djl.ndarray.NDManager manager) throws IOException, ai.djl.translate.TranslateException
Fetches an iterator that can iterate through the Dataset. This method is not implemented for the WikiText2 dataset, because WikiText2 is not suitable for iteration; calling it simply returns null. Use getData() to obtain the whole dataset instead.
- Specified by:
getData in interface ai.djl.training.dataset.Dataset
- Parameters:
manager - the NDManager that would be used to create the batch data; unused by WikiText2
- Returns:
an Iterable of Batch that contains batches of data from the dataset; always null for WikiText2
- Throws:
IOException
ai.djl.translate.TranslateException
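Because this method returns null for WikiText2 rather than an empty Iterable, passing its result straight into an enhanced for loop would throw a NullPointerException. Code that handles several datasets generically can normalize the result first, as in this sketch. NullSafeBatches and its methods are hypothetical, and a Supplier<Iterable<T>> stands in for the real getData(NDManager) call so the example compiles without DJL.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.function.Supplier;

final class NullSafeBatches {
    private NullSafeBatches() {
        // static helpers only
    }

    // Returns what `source` produces, or an empty Iterable if it produced null,
    // so that callers can always use a for-each loop on the result.
    static <T> Iterable<T> orEmpty(Supplier<Iterable<T>> source) {
        Iterable<T> batches = source.get();
        return batches != null ? batches : Collections.<T>emptyList();
    }

    // Counts the elements of an Iterable; used here to show the loop runs safely.
    static <T> int count(Iterable<T> items) {
        int n = 0;
        for (T ignored : items) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Stand-in for WikiText2.getData(manager), which is documented to return null.
        Supplier<Iterable<String>> wikiText2Like = () -> null;
        // Stand-in for a dataset that does support batch iteration.
        Supplier<Iterable<String>> iterableDataset = () -> Arrays.asList("batch-0", "batch-1");
        System.out.println(count(orEmpty(wikiText2Like)));   // prints 0
        System.out.println(count(orEmpty(iterableDataset))); // prints 2
    }
}
```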
-
getData
public Path getData() throws IOException
Gets data from the WikiText2 dataset. Unlike getData(NDManager), this method returns the whole dataset at once, as a Path rather than as batches.
- Specified by:
getData in interface ai.djl.training.dataset.RawDataset<Path>
- Returns:
a Path object locating the WikiText2 dataset file
- Throws:
IOException
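Since getData() returns a Path rather than batches, reading and tokenizing the data is left to the caller. The sketch below assumes the Path refers to a UTF-8 plain-text file (this page only says it locates the dataset file) and counts lines and whitespace-separated tokens with java.nio.file.Files. TokenCounter is a hypothetical helper, and a temporary file with illustrative sample text stands in for the downloaded WikiText2 file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class TokenCounter {
    private TokenCounter() {
        // static helpers only
    }

    // Returns {lineCount, tokenCount} for a UTF-8 text file; tokens are split on whitespace.
    static long[] countLinesAndTokens(Path file) throws IOException {
        long lines = 0;
        long tokens = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines++;
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) { // blank lines contribute no tokens
                    tokens += trimmed.split("\\s+").length;
                }
            }
        }
        return new long[] {lines, tokens};
    }

    public static void main(String[] args) throws IOException {
        // Temporary file standing in for the Path returned by WikiText2.getData().
        Path file = Files.createTempFile("wikitext-sketch", ".txt");
        try {
            Files.write(file, Arrays.asList(" = Title = ", "", " a b c "), StandardCharsets.UTF_8);
            long[] counts = countLinesAndTokens(file);
            System.out.println(counts[0] + " lines, " + counts[1] + " tokens"); // prints 3 lines, 6 tokens
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

Splitting on whitespace gives only a rough count; a language-modeling pipeline would normally apply its own tokenizer and vocabulary to the file contents.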
-