german_ner
get_sentences_and_labels
Combines tokens into sentences and creates vocab sets for the training data and labels.
For simplicity, tokens with the 'O' entity label are omitted.
Parameters:

Name | Type | Description | Default
---|---|---|---
`path` | `str` | Path to the downloaded dataset file. | required
Returns:

Type | Description
---|---
`Tuple[List[str], List[str], Set[str], Set[str]]` | (sentences, labels, train_vocab, label_vocab)
Source code in fastestimator/fastestimator/dataset/data/german_ner.py
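The token-combining behavior described above can be sketched in a few lines. This is a simplified illustration, not the library source: it assumes the GermEval file layout of tab-separated columns with the token in the second column and the outer entity label in the third, blank lines between sentences, and `#`-prefixed comment lines.

```python
from typing import List, Set, Tuple


def get_sentences_and_labels_sketch(
        lines: List[str]) -> Tuple[List[str], List[str], Set[str], Set[str]]:
    """Combine one-token-per-line rows into sentences, omitting 'O' tokens.

    Assumed layout (a simplification of the real GermEval file): each row is
    `index<TAB>token<TAB>outer_label<TAB>inner_label`, sentences are separated
    by blank lines, and lines starting with '#' are comments.
    """
    sentences, labels = [], []
    train_vocab, label_vocab = set(), set()
    tokens, tags = [], []
    for line in lines + [""]:  # trailing "" flushes the final sentence
        if line.startswith("#"):
            continue
        if not line.strip():
            if tokens:
                sentences.append(" ".join(tokens))
                labels.append(" ".join(tags))
                tokens, tags = [], []
            continue
        cols = line.split("\t")
        token, tag = cols[1], cols[2]
        if tag == "O":  # for simplicity, 'O'-labeled tokens are omitted
            continue
        tokens.append(token)
        tags.append(tag)
        train_vocab.add(token)
        label_vocab.add(tag)
    return sentences, labels, train_vocab, label_vocab
```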
load_data
Load and return the GermEval dataset.

The GermEval 2014 dataset contains 31,000 sentences, corresponding to over 590,000 tokens, drawn from German Wikipedia and news corpora. Each sentence is encoded as one token per line, with information provided in tab-separated columns. Sourced from https://sites.google.com/site/germeval2014ner/data
Parameters:

Name | Type | Description | Default
---|---|---|---
`root_dir` | `Optional[str]` | The path to store the downloaded data. | `None`
Returns:

Type | Description
---|---
`Tuple[NumpyDataset, NumpyDataset, Set[str], Set[str]]` | (train_data, eval_data, train_vocab, label_vocab)
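The returned vocab sets are plain, unordered Python sets. A typical next step (an assumption about downstream usage, not part of this API) is to map tokens and labels to stable integer ids before feeding them to a model; the reserved-marker conventions below are illustrative.

```python
from typing import Dict, Iterable, Set, Tuple


def build_index_map(vocab: Set[str],
                    reserved: Tuple[str, ...] = ("<pad>", "<unk>")) -> Dict[str, int]:
    """Assign reserved markers the lowest ids, then sorted vocab entries.

    Sorting makes the mapping deterministic across runs, since sets have
    no guaranteed iteration order.
    """
    idx = {tok: i for i, tok in enumerate(reserved)}
    for tok in sorted(vocab):
        idx[tok] = len(idx)
    return idx


# `train_vocab` / `label_vocab` here stand in for the sets returned by load_data.
token2idx = build_index_map({"Berlin", "sagte"})
label2idx = build_index_map({"B-LOC", "B-PER"}, reserved=("<pad>",))
```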