ckanapi_harvesters.harvesters.file_formats package
Submodules
ckanapi_harvesters.harvesters.file_formats.csv_format module
The basic file format for DataStore: CSV
- class ckanapi_harvesters.harvesters.file_formats.csv_format.CsvFileFormat(options_string: str = None, *, read_kwargs: dict = None, write_kwargs: dict = None)
Bases: FileFormatABC
- append_allowed() bool
Indicates whether the file format can be written in append mode, i.e. whether append_file is implemented.
- append_file(df: DataFrame | ListRecords, file_path: str, fields: Dict[str, CkanField] | None) None
Write a DataFrame or ListRecords to a file, appending it to the end of the file.
- append_in_memory(stream: bytes, df: DataFrame | ListRecords, fields: Dict[str, CkanField] | None) bytes
This function writes a DataFrame or ListRecords to a file in memory, appending it to the end of the buffer.
- copy(dest=None)
- default_read_kwargs: dict = {'dtype': <class 'str'>, 'engine': 'python', 'keep_default_na': False, 'sep': None}
- default_write_kwargs: dict = {}
- read_buffer_full(buffer: StringIO, fields: Dict[str, CkanField] | None) DataFrame | ListRecords
Read a file from memory, as a DataFrame or ListRecords. This function reads the entire file.
- read_file(file_path: str, fields: Dict[str, CkanField] | None, allow_chunks: bool = True) DataFrame | ListRecords
Read a file from the file system, either fully (returning DataFrame or ListRecords) or by chunks (Iterator over a number of records).
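The default read arguments listed above combine sep=None with engine='python'. In pandas, that combination asks the Python parser to detect the delimiter from the first valid row using csv.Sniffer. The following stdlib-only sketch illustrates that detection step. It also keeps every value as a string and leaves empty fields as empty strings, which is roughly what dtype=str and keep_default_na=False aim for. The helper name read_csv_records is hypothetical and not part of ckanapi_harvesters.

```python
import csv
import io
from typing import Dict, List


def read_csv_records(text: str) -> List[Dict[str, str]]:
    """Parse CSV text into a list of records, detecting the delimiter.

    Hypothetical helper for illustration; not part of ckanapi_harvesters.
    """
    # Sniff the delimiter from the header row only, among common candidates.
    header = text.splitlines()[0]
    dialect = csv.Sniffer().sniff(header, delimiters=",;\t|")
    reader = csv.DictReader(io.StringIO(text), dialect=dialect)
    # The csv module returns every value as str and keeps empty fields as "".
    return [dict(row) for row in reader]


sample = "id;name;comment\n1;alpha;\n2;NA;ok\n"
records = read_csv_records(sample)
```

Note that the literal text NA stays a string here instead of being turned into a missing value, which is the behavior keep_default_na=False requests from pandas.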
ckanapi_harvesters.harvesters.file_formats.file_format_abc module
File format base class
- class ckanapi_harvesters.harvesters.file_formats.file_format_abc.FileFormatABC(options_string: str = None, *, read_kwargs: dict = None, write_kwargs: dict = None)
Bases: ABC
- abstractmethod append_allowed() bool
Indicates whether the file format can be written in append mode, i.e. whether append_file is implemented.
- append_file(df: DataFrame | ListRecords, file_path: str, fields: Dict[str, CkanField] | None) None
Write a DataFrame or ListRecords to a file, appending it to the end of the file.
- append_in_memory(stream: bytes, df: DataFrame | ListRecords, fields: Dict[str, CkanField] | None) bytes
This function writes a DataFrame or ListRecords to a file in memory, appending it to the end of the buffer.
- abstractmethod copy(dest=None)
- default_read_kwargs: dict = {}
- default_write_kwargs: dict = {}
- extra_args: list | None
- options_string: str | None
- abstractmethod read_buffer_full(buffer: IOBase, fields: Dict[str, CkanField] | None) DataFrame | ListRecords
Read a file from memory, as a DataFrame or ListRecords. This function reads the entire file.
- abstractmethod read_file(file_path: str, fields: Dict[str, CkanField] | None, allow_chunks: bool = True) DataFrame | ListRecords
Read a file from the file system, either fully (returning DataFrame or ListRecords) or by chunks (Iterator over a number of records).
- read_kwargs: dict
- resource_attributes_from_file: CkanResourceInfo | None
- abstractmethod write_file(df: DataFrame | ListRecords, file_path: str, fields: Dict[str, CkanField] | None) None
Write a DataFrame or ListRecords to a file.
- abstractmethod write_in_memory(df: DataFrame | ListRecords, fields: Dict[str, CkanField] | None) bytes
This function writes a DataFrame or ListRecords to a file in memory.
- write_kwargs: dict
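To make the relationship between append_allowed and the append methods concrete, here is a minimal sketch of the pattern using a simplified stand-in base class. Everything in it is an assumption made for illustration: SketchFormat and LineFormat are hypothetical, the real FileFormatABC has more members and different signatures, and the check inside append_in_memory is one plausible behavior rather than the library's documented one.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

Records = List[Dict[str, str]]  # simplified stand-in for ListRecords


class SketchFormat(ABC):
    """Simplified stand-in for a file format base class (not the real FileFormatABC)."""

    @abstractmethod
    def append_allowed(self) -> bool:
        """Return True if the format supports append mode."""

    @abstractmethod
    def write_in_memory(self, records: Records) -> bytes:
        """Serialize the records to bytes."""

    def append_in_memory(self, stream: bytes, records: Records) -> bytes:
        # Assumed behavior: refuse to append when the format does not support it.
        if not self.append_allowed():
            raise NotImplementedError("append mode is not supported by this format")
        return stream + self.write_in_memory(records)


class LineFormat(SketchFormat):
    """One tab-separated record per line, so appending is plain concatenation."""

    def append_allowed(self) -> bool:
        return True

    def write_in_memory(self, records: Records) -> bytes:
        lines = ["\t".join(rec.values()) for rec in records]
        return ("\n".join(lines) + "\n").encode("utf-8") if lines else b""


fmt = LineFormat()
buffer = fmt.write_in_memory([{"id": "1", "name": "alpha"}])
buffer = fmt.append_in_memory(buffer, [{"id": "2", "name": "beta"}])
```

A format whose files carry a header or a closing delimiter would return False from append_allowed in this sketch, since concatenating two serialized payloads would not produce a valid file.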
ckanapi_harvesters.harvesters.file_formats.file_format_init module
File format keyword selection
- ckanapi_harvesters.harvesters.file_formats.file_format_init.init_file_format_datastore(format: str, options_string: str = None, aux_read_fun_name: str = None, aux_write_fun_name: str = None) FileFormatABC
ckanapi_harvesters.harvesters.file_formats.json_format module
The basic file format for DataStore: JSON
- class ckanapi_harvesters.harvesters.file_formats.json_format.JsonFileFormat(options_string: str = None, *, read_kwargs: dict = None, write_kwargs: dict = None)
Bases: FileFormatABC
JSON file IO using pandas.read_json.
Reading in chunks is supported when lines=True (the file is read as one JSON object per line). In this case, use the CLI arguments --allow-chunks --read-kwargs lines=True
Recommended: use with the CLI arguments --allow-chunks --read-kwargs orient=records,lines=True
NB: the argument typ="frame" cannot be overridden in the read arguments.
- append_allowed() bool
Indicates whether the file format can be written in append mode, i.e. whether append_file is implemented.
- append_file(df: DataFrame | ListRecords, file_path: str, fields: Dict[str, CkanField] | None) None
Write a DataFrame or ListRecords to a file, appending it to the end of the file.
- append_in_memory(stream: bytes, df: DataFrame | ListRecords, fields: Dict[str, CkanField] | None) bytes
This function writes a DataFrame or ListRecords to a file in memory, appending it to the end of the buffer.
- copy(dest=None)
- default_read_kwargs: dict = {'lines': False, 'orient': 'records'}
- default_write_kwargs: dict = {'lines': True, 'orient': 'records'}
- read_buffer_full(buffer: StringIO, fields: Dict[str, CkanField] | None) DataFrame | ListRecords
Read a file from memory, as a DataFrame or ListRecords. This function reads the entire file.
- read_file(file_path: str, fields: Dict[str, CkanField] | None, allow_chunks: bool = True) DataFrame | ListRecords
Read a file from the file system, either fully (returning DataFrame or ListRecords) or by chunks (Iterator over a number of records).
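The class description ties chunked reading to lines=True, i.e. one JSON object per line. The sketch below shows why that layout lends itself to chunking: each line parses on its own, so records can be yielded in fixed-size groups without loading the whole file. It uses the standard json module rather than pandas.read_json, and the names iter_json_lines_chunks and chunk_size are hypothetical.

```python
import io
import json
from itertools import islice
from typing import Dict, Iterator, List, TextIO


def iter_json_lines_chunks(stream: TextIO, chunk_size: int) -> Iterator[List[Dict]]:
    """Yield lists of at most chunk_size records from a JSON Lines stream."""
    # Skip blank lines; every remaining line is a complete JSON object.
    lines = (line for line in stream if line.strip())
    while True:
        chunk = [json.loads(line) for line in islice(lines, chunk_size)]
        if not chunk:
            return
        yield chunk


data = io.StringIO('{"id": 1}\n{"id": 2}\n{"id": 3}\n')
chunks = list(iter_json_lines_chunks(data, chunk_size=2))
```

A file holding a single JSON array (lines=False) cannot be split this way, because no individual line is guaranteed to be a complete JSON value.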
ckanapi_harvesters.harvesters.file_formats.shp_format module
Shapefile format support
- class ckanapi_harvesters.harvesters.file_formats.shp_format.DownloadedShapeFileConversion(*values)
Bases: IntEnum
- CsvWkb = 0
- ShapefileAsIs = 3
- ShapefileProjection = 2
- class ckanapi_harvesters.harvesters.file_formats.shp_format.ShapeFileFormat(options_string: str = None, *, read_kwargs: dict = None, write_kwargs: dict = None)
Bases: FileFormatABC
- append_allowed() bool
Indicates whether the file format can be written in append mode, i.e. whether append_file is implemented.
- append_file(df: DataFrame | ListRecords, file_path: str, fields: Dict[str, CkanField] | None) None
Write a DataFrame or ListRecords to a file, appending it to the end of the file.
- append_in_memory(stream: bytes, df: DataFrame | ListRecords, fields: Dict[str, CkanField] | None) bytes
This function writes a DataFrame or ListRecords to a file in memory, appending it to the end of the buffer.
- copy(dest=None)
- default_read_kwargs: dict = {'encoding': 'utf-8'}
- default_write_kwargs: dict = {'encoding': 'utf-8'}
- downloaded_df_to_gdf(df: DataFrame, *, fields: Dict[str, CkanField] | None, context: str = None) None
- read_buffer_full(buffer: StringIO, fields: Dict[str, CkanField] | None) DataFrame | ListRecords
Read a file from memory, as a DataFrame or ListRecords. This function reads the entire file.
- read_file(file_path: str | StringIO, fields: Dict[str, CkanField] | None, allow_chunks: bool = True) DataFrame | ListRecords
Read a file from the file system, either fully (returning DataFrame or ListRecords) or by chunks (Iterator over a number of records).
ckanapi_harvesters.harvesters.file_formats.user_format module
User-defined file format for DataStore, based on custom read/write functions
- class ckanapi_harvesters.harvesters.file_formats.user_format.UserFileFormat(options_string: str, *, df_read_fun: Callable[[Any], ListRecords | DataFrame] = None, df_write_fun: Callable[[ListRecords | DataFrame, Any], Any] = None, read_kwargs: dict = None, write_kwargs: dict = None)
Bases: FileFormatABC
- append_allowed() bool
Indicates whether the file format can be written in append mode, i.e. whether append_file is implemented.
- append_file(df: DataFrame | ListRecords, file_path: str, fields: Dict[str, CkanField] | None) None
Write a DataFrame or ListRecords to a file, appending it to the end of the file.
- append_in_memory(stream: bytes, df: DataFrame | ListRecords, fields: Dict[str, CkanField] | None) bytes
This function writes a DataFrame or ListRecords to a file in memory, appending it to the end of the buffer.
- copy(dest=None)
- read_buffer_full(buffer: StringIO, fields: Dict[str, CkanField] | None) DataFrame | ListRecords
Read a file from memory, as a DataFrame or ListRecords. This function reads the entire file.
- read_file(file_path: str, fields: Dict[str, CkanField] | None, allow_chunks: bool = True) DataFrame | ListRecords
Read a file from the file system, either fully (returning DataFrame or ListRecords) or by chunks (Iterator over a number of records).
ckanapi_harvesters.harvesters.file_formats.user_format_prototypes module
User custom IO function examples
- ckanapi_harvesters.harvesters.file_formats.user_format_prototypes.read_function_example_by_chunks(file_path_or_stream: str | IOBase, *, fields: Dict[str, CkanField] | None, allow_chunks: bool = True, params: UserFileFormat = None, **kwargs) Generator
Read a file/IO stream and return a DataFrame generator. This function implements a context manager that ensures the file is closed properly when it is released. The DataFrame generator must be defined in a sub-function, such as in this example.
Implementation prototype:

    file_handle = open(file_path_or_stream, 'r')
    try:
        yield read_function_example_by_chunks_generator(file_handle)
    finally:
        file_handle.close()
- ckanapi_harvesters.harvesters.file_formats.user_format_prototypes.read_function_example_by_chunks_generator(file_handle) Generator[DataFrame | List[dict], None, None]
This is the function that actually yields the DataFrame chunks. It is called by read_function_example_by_chunks.
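The two-function split described above (an outer function that owns the file handle and an inner generator that yields chunks) can be sketched with the standard library as follows. This is a sketch under stated assumptions: it assumes the outer function is wrapped with contextlib.contextmanager, it yields lists of dicts rather than DataFrames, and the names read_by_chunks, chunk_generator and chunk_size are hypothetical.

```python
import csv
import os
import tempfile
from contextlib import contextmanager
from typing import Dict, Iterator, List


def chunk_generator(file_handle, chunk_size: int = 2) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of records read from an open CSV file handle."""
    chunk: List[Dict[str, str]] = []
    for row in csv.DictReader(file_handle):
        chunk.append(dict(row))
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # last, possibly shorter, chunk
        yield chunk


@contextmanager
def read_by_chunks(file_path: str):
    """Open the file, hand out the chunk generator, and always close the file."""
    file_handle = open(file_path, "r", newline="")
    try:
        yield chunk_generator(file_handle)
    finally:
        file_handle.close()


# Usage: write a small temporary CSV file, then consume it chunk by chunk.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("id,name\n1,alpha\n2,beta\n3,gamma\n")
    path = tmp.name

with read_by_chunks(path) as chunks:
    sizes = [len(chunk) for chunk in chunks]
os.remove(path)
```

Keeping the generator in a separate function means the try/finally block surrounds only the hand-off, so the file is closed when the with block exits even if the caller stops iterating early.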
- ckanapi_harvesters.harvesters.file_formats.user_format_prototypes.read_function_example_df(file_path_or_stream: str | IOBase, *, fields: Dict[str, CkanField] | None, allow_chunks: bool = True, params: UserFileFormat = None, **kwargs) DataFrame | List[dict]
Read a file/IO stream and return a single DataFrame. This is the simplest implementation.
- ckanapi_harvesters.harvesters.file_formats.user_format_prototypes.read_function_example_df_with_metadata(file_path_or_stream: str | IOBase, *, fields: Dict[str, CkanField] | None, allow_chunks: bool = True, params: UserFileFormat = None, **kwargs) Tuple[DataFrame | List[dict], CkanResourceInfo]
Read a file/IO stream and return a single DataFrame, together with metadata read from the file.
- ckanapi_harvesters.harvesters.file_formats.user_format_prototypes.write_function_example(df: DataFrame | List[dict], file_path_or_buffer: str | IOBase, *, fields: Dict[str, CkanField] | None, append: bool = False, params: UserFileFormat = None, **kwargs) None
This function writes a DataFrame to the given file path.
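As an illustration of the simplest read case and of the write case, the following sketch defines a pair of custom IO functions that follow the keyword layout of the prototypes above (fields, allow_chunks or append, params, **kwargs). It is only a sketch: the function names and the pipe-delimited format are hypothetical, records are plain lists of dicts instead of DataFrames, and how UserFileFormat actually invokes such functions is not shown in this reference.

```python
import csv
import io
from typing import Any, Dict, List, Optional, Union


def read_pipe_delimited(file_path_or_stream: Union[str, io.IOBase], *,
                        fields: Optional[Dict[str, Any]],
                        allow_chunks: bool = True, params: Any = None,
                        **kwargs) -> List[dict]:
    """Read a pipe-delimited file or text stream into a list of records."""
    if isinstance(file_path_or_stream, str):
        with open(file_path_or_stream, "r", newline="") as handle:
            return [dict(row) for row in csv.DictReader(handle, delimiter="|")]
    return [dict(row) for row in csv.DictReader(file_path_or_stream, delimiter="|")]


def write_pipe_delimited(df: List[dict], file_path_or_buffer: Union[str, io.IOBase], *,
                         fields: Optional[Dict[str, Any]],
                         append: bool = False, params: Any = None,
                         **kwargs) -> None:
    """Write records as pipe-delimited text; skip the header when appending."""
    def _dump(handle) -> None:
        if not df:
            return
        writer = csv.DictWriter(handle, fieldnames=list(df[0].keys()),
                                delimiter="|", lineterminator="\n")
        if not append:
            writer.writeheader()
        writer.writerows(df)

    if isinstance(file_path_or_buffer, str):
        with open(file_path_or_buffer, "a" if append else "w", newline="") as handle:
            _dump(handle)
    else:
        _dump(file_path_or_buffer)


# Round trip through an in-memory text buffer.
buffer = io.StringIO()
write_pipe_delimited([{"id": "1", "name": "alpha"}], buffer, fields=None)
write_pipe_delimited([{"id": "2", "name": "beta"}], buffer, fields=None, append=True)
buffer.seek(0)
round_trip = read_pipe_delimited(buffer, fields=None)
```

Accepting either a path or an open stream in the same parameter mirrors the file_path_or_stream and file_path_or_buffer arguments of the prototypes.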
ckanapi_harvesters.harvesters.file_formats.xls_format module
The basic file format for DataStore: XLS
- class ckanapi_harvesters.harvesters.file_formats.xls_format.ExcelFileFormat(options_string: str, *, read_kwargs: dict = None, write_kwargs: dict = None)
Bases: FileFormatABC
Excel file IO using pandas.read_excel. Supports xls, xlsx, xlsm, xlsb, odf, ods and odt formats.
Recommended: use with the CLI argument --read-kwargs sheet_name=your_sheet (default is 0, i.e. the first sheet)
- append_allowed() bool
Indicates whether the file format can be written in append mode, i.e. whether append_file is implemented.
- append_file(df: DataFrame | ListRecords, file_path: str, fields: Dict[str, CkanField] | None) None
Write a DataFrame or ListRecords to a file, appending it to the end of the file.
- append_in_memory(stream: bytes, df: DataFrame | ListRecords, fields: Dict[str, CkanField] | None) bytes
This function writes a DataFrame or ListRecords to a file in memory, appending it to the end of the buffer.
- copy(dest=None)
- read_buffer_full(buffer: StringIO, fields: Dict[str, CkanField] | None) DataFrame | ListRecords
Read a file from memory, as a DataFrame or ListRecords. This function reads the entire file.
- read_file(file_path: str, fields: Dict[str, CkanField] | None, allow_chunks: bool = True) DataFrame | ListRecords
Read a file from the file system, either fully (returning DataFrame or ListRecords) or by chunks (Iterator over a number of records).
Module contents
Classes that read specific file formats to load DataStore DataFrames/records from a file on the file system