csv_dataset
CSVDataset
Bases: InMemoryDataset
A dataset from a CSV file.
CSVDataset reads entries from a CSV file whose first row is the header. The directory containing the CSV file is available as dataset.parent_path. This is useful when the CSV stores relative paths that you want to feed into, say, an ImageReader Op.
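To illustrate why the parent path matters, here is a minimal standard-library sketch that resolves relative paths stored in a CSV column against the directory holding the CSV. It does not use FastEstimator itself; the helper `resolve_relative_paths`, the file name `labels.csv`, and the column names are hypothetical, and the sketch assumes `parent_path` is simply the directory that contains the file.

```python
import csv
import os
import tempfile


def resolve_relative_paths(csv_path, column):
    """Join every value of `column` onto the directory that holds the CSV."""
    # Assumed stand-in for dataset.parent_path: the folder containing the file.
    parent_path = os.path.dirname(os.path.abspath(csv_path))
    with open(csv_path, newline="") as f:
        return [os.path.join(parent_path, row[column]) for row in csv.DictReader(f)]


# Build a tiny CSV with relative image paths in a temporary directory.
with tempfile.TemporaryDirectory() as root:
    csv_path = os.path.join(root, "labels.csv")
    with open(csv_path, "w", newline="") as f:
        f.write("image,label\nimages/cat.png,0\nimages/dog.png,1\n")
    resolved = resolve_relative_paths(csv_path, "image")
    expected_root = os.path.abspath(root)

print(resolved)
```

With the real dataset, the same join would presumably use `dataset.parent_path` in place of the locally computed `parent_path`.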
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file_path | str | The (absolute) path to the CSV file. | required |
delimiter | str | What delimiter is used by the file. | ',' |
include_if | Union[None, Dict[str, Union[Any, Iterable[Any]]], str, Callable[..., bool]] | An optional filter specifying which rows should be included. This can be a dictionary, for example {'mode': 'train', 'type': [0, 1, 2]} in which case only rows which have a 'mode' column value of 'train' AND a 'type' column value of either 0, 1, or 2 will be included in this dataset. Alternatively, this can be a query string: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#the-query-method, for example 'type >= 1'. Finally, it could be a function whose argument(s) correspond to column names and whose output is a boolean (include_if=lambda mode: mode in ['eval', 'test']). This last option is very flexible, but also slower to execute. | None |
fill_na | Optional[Any] | A fill value if data is missing. By default, this will follow pandas convention and use different types of NaNs. | 'pandas_default' |
kwargs | | Other arguments to be passed through to pandas csv reader function. See the pandas docs for details: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html. | {} |
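The dictionary and function forms of include_if can be sketched in plain Python as follows. This is only an illustration of the semantics described above, not the library's implementation: the helpers `load_rows`, `matches_dict`, and `matches_callable` and the sample data are hypothetical, and matching function arguments to columns via `inspect.signature` is an assumption about how such a filter could work. The query-string form relies on pandas and is not reproduced here.

```python
import csv
import inspect
import io
from collections.abc import Iterable

SAMPLE = "mode,type\ntrain,0\ntrain,3\neval,1\ntest,2\ntrain,2\n"


def load_rows(text, delimiter=","):
    """Parse CSV text (first row is the header) into a list of dicts."""
    rows = []
    for row in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        row["type"] = int(row["type"])  # the csv module yields strings
        rows.append(row)
    return rows


def matches_dict(row, criteria):
    """True if every column matches its value, or one of its allowed values."""
    for column, allowed in criteria.items():
        # Treat a scalar (including a string) as a one-element list.
        if isinstance(allowed, str) or not isinstance(allowed, Iterable):
            allowed = [allowed]
        if row[column] not in allowed:
            return False
    return True


def matches_callable(row, predicate):
    """Call predicate with the columns named by its parameters."""
    names = inspect.signature(predicate).parameters
    return bool(predicate(**{name: row[name] for name in names}))


rows = load_rows(SAMPLE)
by_dict = [r for r in rows if matches_dict(r, {"mode": "train", "type": [0, 1, 2]})]
by_func = [r for r in rows if matches_callable(r, lambda mode: mode in ["eval", "test"])]
print(by_dict)
print(by_func)
```

With the dataset itself, the same filters would be supplied through the documented parameter, for example `CSVDataset(file_path, include_if={'mode': 'train', 'type': [0, 1, 2]})`.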