import os
from pathlib import Path
from typing import List, Optional, Tuple

import numpy as np
import wget

from fastestimator.dataset.numpy_dataset import NumpyDataset
from fastestimator.util.wget_util import bar_custom


def load_data(root_dir: Optional[str] = None,
              seq_length: int = 64) -> Tuple[NumpyDataset, NumpyDataset, NumpyDataset, List[str]]:
    """Load and return the Penn TreeBank dataset.

    Args:
        root_dir: The path to store the downloaded data. When `root_dir` is not provided, the data will be saved
            into `fastestimator_data` under the user's home directory.
        seq_length: Length of each data sequence.

    Returns:
        (train_data, eval_data, test_data, vocab)
    """
    home = str(Path.home())

    if root_dir is None:
        root_dir = os.path.join(home, 'fastestimator_data', 'PennTreeBank')
    else:
        root_dir = os.path.join(os.path.abspath(root_dir), 'PennTreeBank')
    os.makedirs(root_dir, exist_ok=True)

    train_data_path = os.path.join(root_dir, 'ptb.train.txt')
    eval_data_path = os.path.join(root_dir, 'ptb.valid.txt')
    test_data_path = os.path.join(root_dir, 'ptb.test.txt')
    files = [(train_data_path, 'https://raw.githubusercontent.com/wojzaremba/lstm/master/data/ptb.train.txt'),
             (eval_data_path, 'https://raw.githubusercontent.com/wojzaremba/lstm/master/data/ptb.valid.txt'),
             (test_data_path, 'https://raw.githubusercontent.com/wojzaremba/lstm/master/data/ptb.test.txt')]

    texts = []
    for data_path, download_link in files:
        if not os.path.exists(data_path):
            # Download the file only if it is not already cached locally
            print("Downloading data: {}".format(data_path))
            wget.download(download_link, data_path, bar=bar_custom)

        # Tokenize on whitespace, appending an end-of-sentence marker to each line
        text = []
        with open(data_path, 'r') as f:
            for line in f:
                text.extend(line.split() + ['<eos>'])
        texts.append(text)

    # Build the vocabulary from the training data only
    vocab = sorted(set(texts[0]))
    word2idx = {word: idx for idx, word in enumerate(vocab)}

    # Convert words to indices and truncate each text to a multiple of `seq_length`, discarding the final
    # incomplete sequence. (Written this way so a text already divisible by `seq_length` is kept whole,
    # which a plain `text[:-(len(text) % seq_length)]` slice would instead reduce to an empty list.)
    data = [[word2idx[word] for word in text[:len(text) - len(text) % seq_length]] for text in texts]
    x_train, x_eval, x_test = [np.array(d).reshape(-1, seq_length) for d in data]

    train_data = NumpyDataset(data={"x": x_train})
    eval_data = NumpyDataset(data={"x": x_eval})
    test_data = NumpyDataset(data={"x": x_test})
    return train_data, eval_data, test_data, vocab
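

# A minimal usage sketch (an illustration, not part of the module): load PTB with the default
# sequence length and inspect the results. It assumes only the function above plus NumpyDataset's
# standard len() and integer indexing, which return the number of sequences and an {"x": sequence}
# dictionary respectively.
if __name__ == "__main__":
    train_data, eval_data, test_data, vocab = load_data(seq_length=64)
    print("vocabulary size: {}".format(len(vocab)))
    print("train/eval/test sequences: {}/{}/{}".format(len(train_data), len(eval_data), len(test_data)))
    sample = train_data[0]["x"]  # one length-64 row of word indices
    print("first 10 token ids: {}".format(sample[:10].tolist()))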