Package ai.djl.basicdataset.nlp
Contains a library of built-in datasets for Application.NLP.
Classes

The AmazonReview dataset contains an Application.NLP.SENTIMENT_ANALYSIS set of reviews and their sentiment ratings.
A builder to construct an AmazonReview.
A text classification dataset that contains questions from cooking.stackexchange.com and their associated tags on the site.
A builder to construct a CookingStackExchange.
GoEmotions is a corpus of 58k carefully curated comments extracted from Reddit, with human annotations to 27 emotion categories or Neutral.
A builder to construct a GoEmotions.
The Penn Treebank (PTB) project selected 2,499 stories from a three-year Wall Street Journal (WSJ) collection of 98,732 stories for syntactic annotation.
A builder to construct a PennTreebankText.
The StanfordMovieReview dataset contains an Application.NLP.SENTIMENT_ANALYSIS set of movie reviews and their sentiment ratings.
A builder for a StanfordMovieReview.
Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.
A builder for a StanfordQuestionAnsweringDataset.
TatoebaEnglishFrenchDataset is an English-French machine translation dataset from The Tatoeba Project (http://www.manythings.org/anki/).
A builder for a TatoebaEnglishFrenchDataset.
TextDataset is an abstract dataset that can be used for natural language processing datasets where either the source or the target is text-based data.
TextDataset.Builder<T extends TextDataset.Builder<T>>: Abstract Builder that helps build a TextDataset.
A class that stores TextDataset sample information.
A Gold Standard Universal Dependencies Corpus for English, built over the source material of the English Web Treebank LDC2012T13.
A builder for a UniversalDependenciesEnglishEWT.
The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia.
A builder to construct a WikiText2.
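
The signature TextDataset.Builder<T extends TextDataset.Builder<T>> uses a self-referential type bound, which lets setters declared on the abstract builder return the concrete builder type so that calls can be chained. The following is a minimal, dependency-free sketch of that pattern only. It does not reproduce DJL's implementation: SketchTextDataset, SketchReviewDataset, addSample, optLowerCase, and size are hypothetical names invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an abstract text dataset; this is not DJL's TextDataset. */
abstract class SketchTextDataset {
    final int limit;
    final List<String> samples;

    SketchTextDataset(Builder<?> builder) {
        this.limit = builder.limit;
        this.samples = new ArrayList<>(builder.samples);
    }

    /** Number of samples visible once the limit is applied. */
    int size() {
        return Math.min(limit, samples.size());
    }

    /** Abstract builder; T is the concrete builder type that the setters return. */
    abstract static class Builder<T extends Builder<T>> {
        int limit = Integer.MAX_VALUE;
        final List<String> samples = new ArrayList<>();

        T optLimit(int limit) {
            this.limit = limit;
            return self();
        }

        T addSample(String text) {
            samples.add(text);
            return self();
        }

        /** Each concrete builder returns itself, which keeps the chain type-safe. */
        protected abstract T self();
    }
}

/** Hypothetical concrete dataset with its own nested builder. */
final class SketchReviewDataset extends SketchTextDataset {
    final boolean lowerCase;

    private SketchReviewDataset(Builder builder) {
        super(builder);
        this.lowerCase = builder.lowerCase;
    }

    static Builder builder() {
        return new Builder();
    }

    static final class Builder extends SketchTextDataset.Builder<Builder> {
        boolean lowerCase;

        Builder optLowerCase(boolean lowerCase) {
            this.lowerCase = lowerCase;
            return this;
        }

        @Override
        protected Builder self() {
            return this;
        }

        SketchReviewDataset build() {
            return new SketchReviewDataset(this);
        }
    }
}
```

With this shape, a chain such as SketchReviewDataset.builder().addSample("Great movie").optLimit(1).optLowerCase(true).build() compiles, because the inherited setters return the concrete builder instead of the abstract one, so the subclass-only optLowerCase remains reachable after them.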