ckanapi_harvesters.harvesters.file_formats package

Submodules

ckanapi_harvesters.harvesters.file_formats.csv_format module

The basic file format for DataStore: CSV

class ckanapi_harvesters.harvesters.file_formats.csv_format.CsvFileFormat(options_string: str = None, *, read_kwargs: dict = None, write_kwargs: dict = None)

Bases: FileFormatABC

append_allowed() → bool: This function announces that the file format is allowed to be written in append mode, and that the function append_file is implemented.

append_file(df: DataFrame | ListRecords, file_path: str, fields: Dict[str, CkanField] | None) → None: Write a DataFrame or ListRecords to a file, appending it to the end of the file.

append_in_memory(stream: bytes, df: DataFrame | ListRecords, fields: Dict[str, CkanField] | None) → bytes: This function writes a DataFrame or ListRecords to a file in memory, appending it to the end of the buffer.

copy(dest=None)

default_read_kwargs: dict = {'dtype': <class 'str'>, 'engine': 'python', 'keep_default_na': False, 'sep': None}

default_write_kwargs: dict = {}

read_buffer_full(buffer: StringIO, fields: Dict[str, CkanField] | None) → DataFrame | ListRecords: Read a file from memory as a DataFrame or ListRecords. This function reads the entire file.

read_by_chunks_allowed() → bool

read_file(file_path: str, fields: Dict[str, CkanField] | None, allow_chunks: bool = True) → DataFrame | ListRecords: Read a file from the file system, either fully (returning DataFrame or ListRecords) or by chunks (Iterator over a number of records).

write_file(df: DataFrame, file_path: str, fields: Dict[str, CkanField] | None) → None: Write a DataFrame or ListRecords to a file.

write_in_memory(df: DataFrame, fields: Dict[str, CkanField] | None) → bytes: This function writes a DataFrame or ListRecords to a file in memory.
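The default_read_kwargs of CsvFileFormat listed above are pandas.read_csv options: sep=None combined with engine='python' lets pandas detect the delimiter with Python's csv.Sniffer, while dtype=str and keep_default_na=False keep every cell as the string found in the file. The sketch below reproduces that behaviour with the standard library only, to show what the defaults imply. It is not the library's implementation, and the function name read_csv_as_strings is hypothetical.

```python
import csv
import io


def read_csv_as_strings(text: str) -> list[dict]:
    """Detect the delimiter, then return every cell as a plain string."""
    # csv.Sniffer guesses the dialect (including the delimiter) from a sample.
    dialect = csv.Sniffer().sniff(text[:1024], delimiters=",;\t|")
    reader = csv.DictReader(io.StringIO(text), dialect=dialect)
    # No type inference and no NA conversion: "" and "NA" are kept as-is.
    return [dict(row) for row in reader]


sample = "id;name;score\n1;alice;NA\n2;bob;\n"
rows = read_csv_as_strings(sample)
```

Keeping values as strings leaves type conversion to a later step, for instance one driven by the fields argument, instead of letting the parser guess.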

ckanapi_harvesters.harvesters.file_formats.file_format_abc module

File format base class

class ckanapi_harvesters.harvesters.file_formats.file_format_abc.FileFormatABC(options_string: str = None, *, read_kwargs: dict = None, write_kwargs: dict = None)

Bases: ABC

allow_chunks: bool

abstractmethod append_allowed() → bool: This function announces that the file format is allowed to be written in append mode, and that the function append_file is implemented.

append_file(df: DataFrame | ListRecords, file_path: str, fields: Dict[str, CkanField] | None) → None: Write a DataFrame or ListRecords to a file, appending it to the end of the file.

append_in_memory(stream: bytes, df: DataFrame | ListRecords, fields: Dict[str, CkanField] | None) → bytes: This function writes a DataFrame or ListRecords to a file in memory, appending it to the end of the buffer.

chunk_size: int

abstractmethod copy(dest=None)

default_read_chunksize: int = 10000

default_read_kwargs: dict = {}

default_write_kwargs: dict = {}

extra_args: list | None

options_string: str | None

primary_key_from_file: List[str] | None

print_help_cli(display: bool = True) → str

abstractmethod read_buffer_full(buffer: IOBase, fields: Dict[str, CkanField] | None) → DataFrame | ListRecords: Read a file from memory as a DataFrame or ListRecords. This function reads the entire file.

abstractmethod read_by_chunks_allowed() → bool

read_by_chunks_enabled(allow_chunks: bool = True) → bool

abstractmethod read_file(file_path: str, fields: Dict[str, CkanField] | None, allow_chunks: bool = True) → DataFrame | ListRecords: Read a file from the file system, either fully (returning DataFrame or ListRecords) or by chunks (Iterator over a number of records).

read_kwargs: dict

resource_attributes_from_file: CkanResourceInfo | None

abstractmethod write_file(df: DataFrame | ListRecords, file_path: str, fields: Dict[str, CkanField] | None) → None: Write a DataFrame or ListRecords to a file.

abstractmethod write_in_memory(df: DataFrame | ListRecords, fields: Dict[str, CkanField] | None) → bytes: This function writes a DataFrame or ListRecords to a file in memory.

write_kwargs: dict
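The append contract of FileFormatABC (append_allowed, append_file, append_in_memory) can be illustrated for a CSV-like format, where appending means adding rows after the existing payload without repeating the header. This is a minimal standard-library sketch and not the library's code. The function name append_records_in_memory and its fieldnames parameter are hypothetical, and it assumes the existing payload ends with a newline.

```python
import csv
import io


def append_records_in_memory(stream: bytes, records: list[dict], fieldnames: list[str]) -> bytes:
    """Return a new CSV payload: the existing bytes followed by the new rows."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    if not stream:
        # Empty payload: this is effectively a first write, so emit the header.
        writer.writeheader()
    writer.writerows(records)
    return stream + buffer.getvalue().encode("utf-8")


payload = append_records_in_memory(b"", [{"id": 1, "name": "alice"}], ["id", "name"])
payload = append_records_in_memory(payload, [{"id": 2, "name": "bob"}], ["id", "name"])
```

A format whose files cannot be extended this way (for example one with a closing delimiter or a binary index) would be expected to return False from append_allowed.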

ckanapi_harvesters.harvesters.file_formats.file_format_init module

File format keyword selection

ckanapi_harvesters.harvesters.file_formats.file_format_init.init_file_format_datastore(format: str, options_string: str = None, aux_read_fun_name: str = None, aux_write_fun_name: str = None) → FileFormatABC

ckanapi_harvesters.harvesters.file_formats.json_format module

The basic file format for DataStore: JSON

class ckanapi_harvesters.harvesters.file_formats.json_format.JsonFileFormat(options_string: str = None, *, read_kwargs: dict = None, write_kwargs: dict = None)

Bases: FileFormatABC

JSON file IO using pandas.read_json.

Reading by chunks is allowed in mode lines=True (the file is read as one JSON object per line). In this case, use the CLI arguments --allow-chunks --read-kwargs lines=True

Recommended: use with the CLI arguments --allow-chunks --read-kwargs orient=records,lines=True

NB: the argument typ="frame" cannot be overridden in the read arguments.
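Chunked reading depends on lines=True because a single JSON document has to be parsed as a whole, whereas a file with one JSON object per line can be consumed incrementally (pandas.read_json only accepts chunksize together with lines=True). The sketch below shows the idea with the standard library rather than pandas. The function name iter_json_lines_chunks is hypothetical and the code is an illustration, not the library's reader.

```python
import io
import json
from itertools import islice
from typing import Iterator, TextIO


def iter_json_lines_chunks(handle: TextIO, chunk_size: int) -> Iterator[list[dict]]:
    """Yield lists of at most chunk_size records from a JSON Lines stream."""
    # Skip blank lines; parse each remaining line as one JSON object.
    records = (json.loads(line) for line in handle if line.strip())
    while True:
        chunk = list(islice(records, chunk_size))
        if not chunk:
            return
        yield chunk


text = '{"id": 1}\n{"id": 2}\n{"id": 3}\n'
chunks = list(iter_json_lines_chunks(io.StringIO(text), chunk_size=2))
```

Only one chunk is held in memory at a time, which is the point of combining the allow-chunks option with lines=True for large files.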

append_allowed() → bool: This function announces that the file format is allowed to be written in append mode, and that the function append_file is implemented.

append_file(df: DataFrame | ListRecords, file_path: str, fields: Dict[str, CkanField] | None) → None: Write a DataFrame or ListRecords to a file, appending it to the end of the file.

append_in_memory(stream: bytes, df: DataFrame | ListRecords, fields: Dict[str, CkanField] | None) → bytes: This function writes a DataFrame or ListRecords to a file in memory, appending it to the end of the buffer.

copy(dest=None)

default_read_kwargs: dict = {'lines': False, 'orient': 'records'}

default_write_kwargs: dict = {'lines': True, 'orient': 'records'}

read_buffer_full(buffer: StringIO, fields: Dict[str, CkanField] | None) → DataFrame | ListRecords: Read a file from memory as a DataFrame or ListRecords. This function reads the entire file.

read_by_chunks_allowed() → bool

read_file(file_path: str, fields: Dict[str, CkanField] | None, allow_chunks: bool = True) → DataFrame | ListRecords: Read a file from the file system, either fully (returning DataFrame or ListRecords) or by chunks (Iterator over a number of records).

write_file(df: DataFrame, file_path: str, fields: Dict[str, CkanField] | None) → None: Write a DataFrame or ListRecords to a file.

write_in_memory(df: DataFrame, fields: Dict[str, CkanField] | None) → bytes: This function writes a DataFrame or ListRecords to a file in memory.

ckanapi_harvesters.harvesters.file_formats.shp_format module

Shapefile format support

class ckanapi_harvesters.harvesters.file_formats.shp_format.DownloadedShapeFileConversion(*values)

Bases: IntEnum

CsvWkb = 0

ShapefileAsIs = 3

ShapefileProjection = 2

class ckanapi_harvesters.harvesters.file_formats.shp_format.ShapeFileFormat(options_string: str = None, *, read_kwargs: dict = None, write_kwargs: dict = None)

Bases: FileFormatABC

append_allowed() → bool: This function announces that the file format is allowed to be written in append mode, and that the function append_file is implemented.

append_file(df: DataFrame | ListRecords, file_path: str, fields: Dict[str, CkanField] | None) → None: Write a DataFrame or ListRecords to a file, appending it to the end of the file.

append_in_memory(stream: bytes, df: DataFrame | ListRecords, fields: Dict[str, CkanField] | None) → bytes: This function writes a DataFrame or ListRecords to a file in memory, appending it to the end of the buffer.

copy(dest=None)

default_read_kwargs: dict = {'encoding': 'utf-8'}

default_write_kwargs: dict = {'encoding': 'utf-8'}

downloaded_df_to_gdf(df: DataFrame, *, fields: Dict[str, CkanField] | None, context: str = None) → None

read_buffer_full(buffer: StringIO, fields: Dict[str, CkanField] | None) → DataFrame | ListRecords: Read a file from memory as a DataFrame or ListRecords. This function reads the entire file.

read_by_chunks_allowed() → bool

read_file(file_path: str | StringIO, fields: Dict[str, CkanField] | None, allow_chunks: bool = True) → DataFrame | ListRecords: Read a file from the file system, either fully (returning DataFrame or ListRecords) or by chunks (Iterator over a number of records).

require_field_crs: bool

write_file(df: DataFrame, file_path: str, fields: Dict[str, CkanField] | None) → None: Write a DataFrame or ListRecords to a file.

write_in_memory(df: DataFrame, fields: Dict[str, CkanField] | None) → bytes: This function writes a DataFrame or ListRecords to a file in memory.

ckanapi_harvesters.harvesters.file_formats.user_format module

User-defined file format for DataStore

class ckanapi_harvesters.harvesters.file_formats.user_format.UserFileFormat(options_string: str, *, df_read_fun: Callable[[Any], ListRecords | DataFrame] = None, df_write_fun: Callable[[ListRecords | DataFrame, Any], Any] = None, read_kwargs: dict = None, write_kwargs: dict = None)

Bases: FileFormatABC

append_allowed() → bool: This function announces that the file format is allowed to be written in append mode, and that the function append_file is implemented.

append_file(df: DataFrame | ListRecords, file_path: str, fields: Dict[str, CkanField] | None) → None: Write a DataFrame or ListRecords to a file, appending it to the end of the file.

append_in_memory(stream: bytes, df: DataFrame | ListRecords, fields: Dict[str, CkanField] | None) → bytes: This function writes a DataFrame or ListRecords to a file in memory, appending it to the end of the buffer.

copy(dest=None)

read_buffer_full(buffer: StringIO, fields: Dict[str, CkanField] | None) → DataFrame | ListRecords: Read a file from memory as a DataFrame or ListRecords. This function reads the entire file.

read_by_chunks_allowed() → bool

read_file(file_path: str, fields: Dict[str, CkanField] | None, allow_chunks: bool = True) → DataFrame | ListRecords: Read a file from the file system, either fully (returning DataFrame or ListRecords) or by chunks (Iterator over a number of records).

write_file(df: DataFrame, file_path: str, fields: Dict[str, CkanField] | None) → None: Write a DataFrame or ListRecords to a file.

write_in_memory(df: DataFrame, fields: Dict[str, CkanField] | None) → bytes: This function writes a DataFrame or ListRecords to a file in memory.

ckanapi_harvesters.harvesters.file_formats.user_format_prototypes module

User custom IO function examples

ckanapi_harvesters.harvesters.file_formats.user_format_prototypes.read_function_example_by_chunks(file_path_or_stream: str | IOBase, *, fields: Dict[str, CkanField] | None, allow_chunks: bool = True, params: UserFileFormat = None, **kwargs) → Generator

Read a file/IO stream and return a DataFrame generator. This function implements a context manager that ensures the file is closed properly when it is released. The DataFrame generator must be defined in a sub-function, such as in this example.

Implementation prototype

    file_handle = open(file_path_or_stream, 'r')
    try:
        yield read_function_example_by_chunks_generator(file_handle)
    finally:
        file_handle.close()

ckanapi_harvesters.harvesters.file_formats.user_format_prototypes.read_function_example_by_chunks_generator(file_handle) → Generator[DataFrame | List[dict], None, None]: This is the function that actually yields the DataFrame chunks. It is called by read_function_example_by_chunks.
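The two-function pattern described above (an outer function that owns the file handle and an inner generator that yields the chunks) can be sketched as follows. This is a self-contained illustration under stated assumptions, not the library's prototype. The names read_by_chunks and chunk_generator are hypothetical, CSV is used as a stand-in format, chunks are lists of dicts rather than DataFrames, and the outer function is wrapped with contextlib.contextmanager, which the documentation above does not confirm.

```python
import csv
import io
from contextlib import contextmanager
from itertools import islice
from typing import Iterator


def chunk_generator(file_handle, chunk_size: int = 2) -> Iterator[list[dict]]:
    """Inner generator: lazily yield lists of at most chunk_size records."""
    reader = csv.DictReader(file_handle)
    while True:
        chunk = list(islice(reader, chunk_size))
        if not chunk:
            return
        yield chunk


@contextmanager
def read_by_chunks(file_path_or_stream, chunk_size: int = 2):
    """Outer function: open the source, hand out the generator, then close."""
    if isinstance(file_path_or_stream, str):
        file_handle = open(file_path_or_stream, "r", newline="")
    else:
        file_handle = file_path_or_stream
    try:
        # Single yield: the caller receives the chunk generator itself.
        yield chunk_generator(file_handle, chunk_size)
    finally:
        # Runs when the caller leaves the with-block, even after an error.
        file_handle.close()


stream = io.StringIO("id,name\n1,alice\n2,bob\n3,carol\n")
with read_by_chunks(stream) as chunks:
    sizes = [len(chunk) for chunk in chunks]
```

Splitting the work this way keeps the handle open for as long as chunks are being consumed. The handle is closed when the with-block exits, so the chunks have to be read inside it.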

ckanapi_harvesters.harvesters.file_formats.user_format_prototypes.read_function_example_df(file_path_or_stream: str | IOBase, *, fields: Dict[str, CkanField] | None, allow_chunks: bool = True, params: UserFileFormat = None, **kwargs) → DataFrame | List[dict]: Read a file/IO stream and return a single DataFrame. This case is the simplest implementation.

ckanapi_harvesters.harvesters.file_formats.user_format_prototypes.read_function_example_df_with_metadata(file_path_or_stream: str | IOBase, *, fields: Dict[str, CkanField] | None, allow_chunks: bool = True, params: UserFileFormat = None, **kwargs) → Tuple[DataFrame | List[dict], CkanResourceInfo]: Read a file/IO stream and return a single DataFrame. This case also returns metadata read from the file.

ckanapi_harvesters.harvesters.file_formats.user_format_prototypes.write_function_example(df: DataFrame | List[dict], file_path_or_buffer: str | IOBase, *, fields: Dict[str, CkanField] | None, append: bool = False, params: UserFileFormat = None, **kwargs) → None: This function writes a DataFrame to the given file path.

ckanapi_harvesters.harvesters.file_formats.xls_format module

The basic file format for DataStore: XLS

class ckanapi_harvesters.harvesters.file_formats.xls_format.ExcelFileFormat(options_string: str, *, read_kwargs: dict = None, write_kwargs: dict = None)

Bases: FileFormatABC

Excel file IO using pandas.read_excel. Supports xls, xlsx, xlsm, xlsb, odf, ods and odt formats.

Recommended: use with the CLI argument --read-kwargs sheet_name=your_sheet (the default is 0, i.e. the first sheet)

append_allowed() → bool: This function announces that the file format is allowed to be written in append mode, and that the function append_file is implemented.

append_file(df: DataFrame | ListRecords, file_path: str, fields: Dict[str, CkanField] | None) → None: Write a DataFrame or ListRecords to a file, appending it to the end of the file.

append_in_memory(stream: bytes, df: DataFrame | ListRecords, fields: Dict[str, CkanField] | None) → bytes: This function writes a DataFrame or ListRecords to a file in memory, appending it to the end of the buffer.

copy(dest=None)

read_buffer_full(buffer: StringIO, fields: Dict[str, CkanField] | None) → DataFrame | ListRecords: Read a file from memory as a DataFrame or ListRecords. This function reads the entire file.

read_by_chunks_allowed() → bool

read_file(file_path: str, fields: Dict[str, CkanField] | None, allow_chunks: bool = True) → DataFrame | ListRecords: Read a file from the file system, either fully (returning DataFrame or ListRecords) or by chunks (Iterator over a number of records).

write_file(df: DataFrame, file_path: str, fields: Dict[str, CkanField] | None) → None: Write a DataFrame or ListRecords to a file.

write_in_memory(df: DataFrame, fields: Dict[str, CkanField] | None) → bytes: This function writes a DataFrame or ListRecords to a file in memory.

Module contents

Classes that read specific file formats to load DataStore DataFrames/records from a file on the local file system