ckanapi_harvesters.harvesters package
Subpackages
- ckanapi_harvesters.harvesters.data_cleaner package
- Submodules
- ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_abc module
CkanDataCleanerABC: _add_field_from_path(), _clean_final_steps(), _clean_subvalue(), _detect_non_standard_field(), _detect_standard_field_bypass(), _extra_checks(), _replace_non_standard_subvalue(), _replace_non_standard_value(), _replace_standard_subvalue_bypass(), _replace_standard_value_bypass(), apply_new_fields_request(), clean_records(), clean_value_field(), clear_all_outputs(), clear_outputs_new_dataframe(), copy(), create_new_field(), detect_field_types_and_subs(), get_class_keyword(), merge_field_changes()
DataCleanerNone
- ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_errors module
- ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_init module
- ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_upload module
- ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_upload_1_basic module
- ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_upload_2_geom module
- ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_upload_3_assist module
- ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_upload_3_check module
- Module contents
- ckanapi_harvesters.harvesters.file_formats package
- Submodules
- ckanapi_harvesters.harvesters.file_formats.csv_format module
CsvFileFormat: append_allowed(), append_file(), append_in_memory(), copy(), default_read_kwargs, default_write_kwargs, read_buffer_full(), read_by_chunks_allowed(), read_file(), write_file(), write_in_memory()
- ckanapi_harvesters.harvesters.file_formats.file_format_abc module
FileFormatABC: allow_chunks, append_allowed(), append_file(), append_in_memory(), chunk_size, copy(), default_read_chunksize, default_read_kwargs, default_write_kwargs, extra_args, options_string, print_help_cli(), read_buffer_full(), read_by_chunks_allowed(), read_by_chunks_enabled(), read_file(), read_kwargs, resource_attributes_from_file, write_file(), write_in_memory(), write_kwargs
- ckanapi_harvesters.harvesters.file_formats.file_format_init module
- ckanapi_harvesters.harvesters.file_formats.json_format module
JsonFileFormat: append_allowed(), append_file(), append_in_memory(), copy(), default_read_kwargs, default_write_kwargs, read_buffer_full(), read_by_chunks_allowed(), read_file(), write_file(), write_in_memory()
- ckanapi_harvesters.harvesters.file_formats.shp_format module
DownloadedShapeFileConversion
ShapeFileFormat: append_allowed(), append_file(), append_in_memory(), copy(), default_read_kwargs, default_write_kwargs, downloaded_df_to_gdf(), read_buffer_full(), read_by_chunks_allowed(), read_file(), require_field_crs, write_file(), write_in_memory()
- ckanapi_harvesters.harvesters.file_formats.user_format module
- ckanapi_harvesters.harvesters.file_formats.user_format_prototypes module
- ckanapi_harvesters.harvesters.file_formats.xls_format module
- Module contents
Submodules
ckanapi_harvesters.harvesters.harvester_abc module
Harvester base class
- class ckanapi_harvesters.harvesters.harvester_abc.DatabaseHarvesterABC(params: DatabaseParams = None)
Bases:
HarvesterConnectABC, ABC
- clear_secrets()
- abstractmethod copy(*, dest=None)
- abstractmethod get_dataset_harvester(dataset_name: str) DatasetHarvesterABC
- abstractmethod get_description() str
- abstractmethod get_login_url_without_auth() str
- abstractmethod static init_from_options_string(options_string: str, *, base_dir: str = None) Tuple[DatabaseHarvesterABC, List[str]]
- abstractmethod list_datasets(return_metadata: bool = True) List[str] | OrderedDict[str, DatasetMetadata]
- update_from_ckan(ckan)
- class ckanapi_harvesters.harvesters.harvester_abc.DatasetHarvesterABC(params: DatasetParams = None)
Bases:
DatabaseHarvesterABC, ABC
- clean_dataset_metadata() DatasetMetadata
- abstractmethod copy(*, dest=None)
- abstractmethod get_table_harvester(table_name: str) TableHarvesterABC
- abstractmethod static init_from_options_string(options_string: str, *, base_dir: str = None) Tuple[DatasetHarvesterABC, List[str]]
- abstractmethod list_tables(return_metadata: bool = True) List[str] | OrderedDict[str, TableMetadata]
- abstractmethod query_dataset_metadata(cancel_if_present: bool = True) DatasetMetadata
- class ckanapi_harvesters.harvesters.harvester_abc.HarvesterConnectABC
Bases:
ABC
- abstractmethod check_connection(*, new_connection: bool = False, raise_error: bool = False) None | ContextErrorLevelMessage
- abstractmethod clear_secrets()
- abstractmethod disconnect() None
- abstractmethod update_from_ckan(ckan)
- class ckanapi_harvesters.harvesters.harvester_abc.TableHarvesterABC(params: TableParams = None)
Bases:
DatasetHarvesterABC, ABC
- clean_table_metadata() TableMetadata
- abstractmethod copy(*, dest=None)
- get_default_data_cleaner() CkanDataCleanerABC | None
- classmethod get_default_df_upload_fun() Callable[[Any], DataFrame] | None
- abstractmethod get_default_primary_key() List[str]
- abstractmethod static init_from_options_string(options_string: str, *, base_dir: str = None, file_url_attr: str = None) Tuple[TableHarvesterABC, List[str]]
- abstractmethod query_data(query: Any) List[dict] | DataFrame
- abstractmethod query_table_metadata(cancel_if_present: bool = True) TableMetadata
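The three abstract classes form a chain: DatasetHarvesterABC derives from DatabaseHarvesterABC, and TableHarvesterABC derives from DatasetHarvesterABC, so a table harvester is reached from the database level through get_dataset_harvester() and then get_table_harvester(). The sketch below mimics that navigation with hypothetical toy classes (ToyDatabaseHarvester, ToyDatasetHarvester, ToyTableHarvester) over an in-memory dictionary. It does not use the ckanapi_harvesters API, and its equality-filter query semantics are an assumption made only for illustration.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for a database server:
# dataset name -> table name -> list of records.
FAKE_SERVER: Dict[str, Dict[str, List[dict]]] = {
    "sales": {
        "orders": [{"id": 1, "amount": 9.5}, {"id": 2, "amount": 3.0}],
        "customers": [{"id": 10, "name": "Ada"}],
    },
}


class ToyDatabaseHarvester:
    """Database level: enumerates datasets and creates dataset harvesters."""

    def list_datasets(self) -> List[str]:
        return sorted(FAKE_SERVER)

    def get_dataset_harvester(self, dataset_name: str) -> "ToyDatasetHarvester":
        return ToyDatasetHarvester(dataset_name)


class ToyDatasetHarvester(ToyDatabaseHarvester):
    """Dataset level: inherits the database level and targets one dataset."""

    def __init__(self, dataset_name: str):
        self.dataset_name = dataset_name

    def list_tables(self) -> List[str]:
        return sorted(FAKE_SERVER[self.dataset_name])

    def get_table_harvester(self, table_name: str) -> "ToyTableHarvester":
        return ToyTableHarvester(self.dataset_name, table_name)


class ToyTableHarvester(ToyDatasetHarvester):
    """Table level: inherits the dataset level and queries one table."""

    def __init__(self, dataset_name: str, table_name: str):
        super().__init__(dataset_name)
        self.table_name = table_name

    def get_default_primary_key(self) -> List[str]:
        return ["id"]  # hypothetical: every toy table is keyed on "id"

    def query_data(self, query: dict) -> List[dict]:
        # Toy semantics: keep records whose values equal every key in `query`.
        rows = FAKE_SERVER[self.dataset_name][self.table_name]
        return [r for r in rows if all(r.get(k) == v for k, v in query.items())]


db = ToyDatabaseHarvester()
dataset = db.get_dataset_harvester("sales")
table = dataset.get_table_harvester("orders")
print(table.query_data({"id": 2}))  # [{'id': 2, 'amount': 3.0}]
```

Because each level inherits from the one above, a table harvester keeps access to everything its database and dataset levels provide (in the real classes, this includes the connection handled by HarvesterConnectABC).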
ckanapi_harvesters.harvesters.harvester_errors module
Errors specific to harvesting data
- exception ckanapi_harvesters.harvesters.harvester_errors.HarvestMethodRequiredError
Bases:
Exception
- exception ckanapi_harvesters.harvesters.harvester_errors.HarvesterArgumentError
Bases:
Exception
- exception ckanapi_harvesters.harvesters.harvester_errors.HarvesterArgumentRequiredError(argument: str, harvest_method: str, help: str = None)
Bases:
HarvesterArgumentError
- exception ckanapi_harvesters.harvesters.harvester_errors.HarvesterRequirementError(requirement: str, harvest_method: str)
Bases:
RequirementError
- exception ckanapi_harvesters.harvesters.harvester_errors.ResourceNotFoundError(resource_type: str, table_name: str, host: str)
Bases:
Exception
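Since HarvesterArgumentRequiredError derives from HarvesterArgumentError, an `except HarvesterArgumentError` clause also catches a missing required argument. The sketch below illustrates this with local stand-in classes that reproduce only the documented base classes and constructor arguments. The message wording and the helper `require()` are hypothetical and not part of the package.

```python
class HarvesterArgumentError(Exception):
    """Stand-in for the documented base class."""


class HarvesterArgumentRequiredError(HarvesterArgumentError):
    """Stand-in keeping the documented constructor arguments."""

    def __init__(self, argument: str, harvest_method: str, help: str = None):
        self.argument = argument
        self.harvest_method = harvest_method
        self.help = help
        # Hypothetical message wording.
        super().__init__(f"{argument} is required by the {harvest_method} harvester")


def require(options: dict, argument: str, harvest_method: str) -> str:
    """Hypothetical helper: return a mandatory option or raise."""
    if options.get(argument) is None:
        raise HarvesterArgumentRequiredError(argument, harvest_method)
    return options[argument]


try:
    require({"host": "localhost"}, "database", "mongodb")
except HarvesterArgumentError as err:  # also catches the "required" subclass
    caught = err
    print(caught)  # database is required by the mongodb harvester
```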
ckanapi_harvesters.harvesters.harvester_init module
Harvester initialization from the options_string arguments
- ckanapi_harvesters.harvesters.harvester_init.init_dataset_harvester_from_options_string(options_string: str, *, base_dir: str = None) Tuple[DatasetHarvesterABC, List[str]]
- ckanapi_harvesters.harvesters.harvester_init.init_table_harvester_from_options_string(options_string: str, *, file_url_attr: str, base_dir: str = None) Tuple[TableHarvesterABC, List[str]]
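Both initialization functions return a harvester together with a list of strings. A plausible reading, stated here as an assumption, is that the list holds the arguments the harvester did not consume. The following sketch shows that dispatch pattern with `argparse.ArgumentParser.parse_known_args`: a first parser selects the harvest method, and the selected factory parses its own options and passes on the rest. The `--harvester`, `--schema`, `--table`, `--database` and `--collection` option names, the factories and the returned dictionaries are all hypothetical.

```python
import argparse
import shlex
from typing import Callable, Dict, List, Tuple


def _make_postgre(args: List[str]) -> Tuple[dict, List[str]]:
    # Hypothetical table-level options for a PostgreSQL harvester.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--schema")
    parser.add_argument("--table")
    known, unparsed = parser.parse_known_args(args)
    return {"method": "postgre", "schema": known.schema, "table": known.table}, unparsed


def _make_mongodb(args: List[str]) -> Tuple[dict, List[str]]:
    # Hypothetical table-level options for a MongoDB harvester.
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--database")
    parser.add_argument("--collection")
    known, unparsed = parser.parse_known_args(args)
    return {"method": "mongodb", "database": known.database,
            "collection": known.collection}, unparsed


# Hypothetical registry: harvest method name -> factory.
FACTORIES: Dict[str, Callable[[List[str]], Tuple[dict, List[str]]]] = {
    "postgre": _make_postgre,
    "mongodb": _make_mongodb,
}


def init_from_options_string_sketch(options_string: str) -> Tuple[dict, List[str]]:
    """Select a factory from the hypothetical --harvester option.

    Returns the created object and the arguments that no parser consumed.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--harvester", required=True, choices=sorted(FACTORIES))
    known, remaining = parser.parse_known_args(shlex.split(options_string))
    return FACTORIES[known.harvester](remaining)


harvester, unparsed = init_from_options_string_sketch(
    "--harvester postgre --schema public --table cities --limit 10"
)
print(harvester)  # {'method': 'postgre', 'schema': 'public', 'table': 'cities'}
print(unparsed)   # ['--limit', '10']
```

Returning the leftovers instead of rejecting them lets a caller chain several parsers over the same options string.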
ckanapi_harvesters.harvesters.harvester_model module
Harvester base class
ckanapi_harvesters.harvesters.harvester_params module
Harvester parameters. The base names of the parameters are shared between harvesters.
- class ckanapi_harvesters.harvesters.harvester_params.DatabaseParams(source: DatabaseParams = None)
Bases:
object
Class representing the parameters used to connect to a database. This class manages the connection parameters, such as the proxy and CA settings, as well as the authentication parameters.
- abstractmethod copy(*, dest=None)
- initialize_from_cli_args(args: Namespace, base_dir: str = None, error_not_found: bool = True, default_proxies: dict = None, proxy_headers: dict = None) None
- static parse_harvest_method(options_string: str) str
- parse_options_string(options_string: str, *, base_dir: str = None, file_url_attr: str = None, parser: ArgumentParser = None) List[str]
- property proxies: dict
- property proxy_auth: AuthBase | Tuple[str, str]
- property proxy_string: str
- static setup_cli_harvester_parser(parser: ArgumentParser = None) ArgumentParser
- static unlock_external_url_resource_download(value: bool = True)
This function enables the download of resources hosted outside the CKAN server.
- class ckanapi_harvesters.harvesters.harvester_params.DatasetParams(source: DatasetParams = None)
Bases:
DatabaseParams
- copy(*, dest=None)
- initialize_from_cli_args(args: Namespace, base_dir: str = None, error_not_found: bool = True, default_proxies: dict = None, proxy_headers: dict = None) None
- static setup_cli_harvester_parser(parser: ArgumentParser = None) ArgumentParser
- class ckanapi_harvesters.harvesters.harvester_params.TableParams(source: TableParams = None)
Bases:
DatasetParams
- copy(*, dest=None)
- initialize_from_cli_args(args: Namespace, base_dir: str = None, error_not_found: bool = True, default_proxies: dict = None, proxy_headers: dict = None) None
- parse_options_string(options_string: str, *, base_dir: str = None, file_url_attr: str = None, parser: ArgumentParser = None) List[str]
- static setup_cli_harvester_parser(parser: ArgumentParser = None) ArgumentParser
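The parameter classes share a layered design: each level redefines the static setup_cli_harvester_parser(parser=None), copy(*, dest=None) and initialize_from_cli_args(), and the constructors accept a `source` instance to copy from. The sketch below is a minimal, hypothetical reconstruction of that pattern with two toy classes; the `--host` and `--table` options and the attribute names are assumptions, and the real methods take more arguments (base_dir, proxies, and so on).

```python
import argparse
import shlex
from typing import List, Optional


class ToyDatabaseParams:
    """Hypothetical stand-in for a database-level parameter class."""

    def __init__(self, source: "ToyDatabaseParams" = None):
        self.host: Optional[str] = None
        if source is not None:
            source.copy(dest=self)  # copy constructor built on copy()

    def copy(self, *, dest=None):
        # Fill `dest` if provided, otherwise a new instance of the same class.
        if dest is None:
            dest = type(self)()
        dest.host = self.host
        return dest

    @staticmethod
    def setup_cli_harvester_parser(parser: argparse.ArgumentParser = None) -> argparse.ArgumentParser:
        if parser is None:
            parser = argparse.ArgumentParser(add_help=False)
        parser.add_argument("--host")  # hypothetical option name
        return parser

    def initialize_from_cli_args(self, args: argparse.Namespace) -> None:
        self.host = args.host

    def parse_options_string(self, options_string: str) -> List[str]:
        # Calling the static method through self picks up a subclass override.
        parser = self.setup_cli_harvester_parser()
        known, unparsed = parser.parse_known_args(shlex.split(options_string))
        self.initialize_from_cli_args(known)
        return unparsed


class ToyTableParams(ToyDatabaseParams):
    """Hypothetical table-level subclass adding one option."""

    def __init__(self, source: "ToyTableParams" = None):
        self.table: Optional[str] = None
        super().__init__(source)

    def copy(self, *, dest=None):
        dest = super().copy(dest=dest)
        dest.table = self.table
        return dest

    @staticmethod
    def setup_cli_harvester_parser(parser: argparse.ArgumentParser = None) -> argparse.ArgumentParser:
        parser = ToyDatabaseParams.setup_cli_harvester_parser(parser)
        parser.add_argument("--table")  # hypothetical option name
        return parser

    def initialize_from_cli_args(self, args: argparse.Namespace) -> None:
        super().initialize_from_cli_args(args)
        self.table = args.table


params = ToyTableParams()
rest = params.parse_options_string("--host db.local --table cities --limit 5")
clone = ToyTableParams(params)
print(rest)  # ['--limit', '5']
```

Each subclass extends the parser produced by its parent instead of rebuilding it, so options declared at the database level stay available at the dataset and table levels.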
ckanapi_harvesters.harvesters.mongodb_data_cleaner module
Harvest from a MongoDB database using pymongo (data cleaner)
- exception ckanapi_harvesters.harvesters.mongodb_data_cleaner.BrokenMongoRefError
Bases:
Exception
- class ckanapi_harvesters.harvesters.mongodb_data_cleaner.MongoDataCleanerUpload
Bases:
CkanDataCleanerUploadGeom
Data cleaner operations specific to MongoDB objects.
- clear_all_outputs()
Clears all outputs. The cleaner is stateful: some values must not be cleared for each DataFrame upload, and they are cleared only by this method.
- clear_outputs_new_dataframe()
- copy(dest=None) MongoDataCleanerUpload
- static get_class_keyword() str
Returns the keyword identifying the class in data_cleaner_dict, defined in data_cleaner_init.py. This keyword is used to set up the data cleaner for a resource builder.
- ckanapi_harvesters.harvesters.mongodb_data_cleaner.mongo_default_data_cleaner() MongoDataCleanerUpload
- ckanapi_harvesters.harvesters.mongodb_data_cleaner.mongo_default_df_conversion(documents: List[dict], **kwargs) DataFrame | ListRecords
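The split between clear_outputs_new_dataframe() and clear_all_outputs() reflects a stateful cleaner: some outputs are reset for every DataFrame, while others accumulate across uploads and are reset only by clear_all_outputs(). The sketch below shows that two-level reset with a hypothetical ToyStatefulCleaner; which values the real MongoDataCleanerUpload keeps across uploads is not documented here, so the "fields seen so far" state is purely illustrative.

```python
from typing import List, Set


class ToyStatefulCleaner:
    """Hypothetical cleaner with per-DataFrame and persistent outputs."""

    def __init__(self):
        self.clear_all_outputs()

    def clear_outputs_new_dataframe(self) -> None:
        # Outputs describing only the DataFrame currently being cleaned.
        self.warnings: List[str] = []

    def clear_all_outputs(self) -> None:
        # Persistent state survives between DataFrame uploads
        # (here: every field name seen so far) and is reset only here.
        self.fields_seen: Set[str] = set()
        self.clear_outputs_new_dataframe()

    def clean_records(self, records: List[dict]) -> List[dict]:
        self.clear_outputs_new_dataframe()
        for record in records:
            for key in record:
                if key not in self.fields_seen:
                    self.warnings.append(f"new field: {key}")
                    self.fields_seen.add(key)
        return records


cleaner = ToyStatefulCleaner()
cleaner.clean_records([{"a": 1}])
cleaner.clean_records([{"a": 2, "b": 3}])
print(cleaner.warnings)  # ['new field: b']
```

Only "b" is reported for the second batch because "a" was remembered from the first one; calling clear_all_outputs() before a new upload session forgets it.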
ckanapi_harvesters.harvesters.mongodb_harvester module
Harvest from a MongoDB database using pymongo
- class ckanapi_harvesters.harvesters.mongodb_harvester.DatabaseHarvesterMongoServer(params: DatabaseParams = None)
Bases:
DatabaseHarvesterABC
This class manages the connection to a MongoDB server. It can list datasets (MongoDB databases), but this call may raise an error.
- check_connection(*, new_connection: bool = False, raise_error: bool = False) None | ContextErrorLevelMessage
- copy(*, dest=None)
- disconnect() None
- get_dataset_harvester(dataset_name: str) DatasetHarvesterMongoDatabase
- get_description() str
- get_login_url_without_auth() str
- static init_from_options_string(options_string: str, base_dir: str = None) Tuple[DatabaseHarvesterMongoServer, List[str]]
- list_datasets(return_metadata: bool = True) List[str] | OrderedDict[str, DatasetMetadata]
- class ckanapi_harvesters.harvesters.mongodb_harvester.DatasetHarvesterMongoDatabase(params: DatasetParams = None)
Bases:
DatabaseHarvesterMongoServer, DatasetHarvesterABC
A CKAN dataset corresponds to a MongoDB database (set of collections).
- check_connection(*, new_connection: bool = False, raise_error: bool = False) None | ContextErrorLevelMessage
- clean_dataset_metadata() DatasetMetadata
- disconnect() None
- get_description() str
- get_table_harvester(table_name: str) TableHarvesterMongoCollection
- static init_from_options_string(options_string: str, base_dir: str = None) Tuple[DatasetHarvesterMongoDatabase, List[str]]
- list_tables(return_metadata: bool = True) List[str] | OrderedDict[str, TableMetadata]
- query_dataset_metadata(cancel_if_present: bool = True) DatasetMetadata
- class ckanapi_harvesters.harvesters.mongodb_harvester.TableHarvesterMongoCollection(params: TableParamsMongoCollection = None)
Bases:
DatasetHarvesterMongoDatabase, TableHarvesterABC
A table (CKAN DataStore) corresponds to a MongoDB collection.
- check_connection(*, new_connection: bool = False, raise_error: bool = False) None | ContextErrorLevelMessage
- clean_table_metadata() TableMetadata
- copy(*, dest=None)
- disconnect() None
- get_default_data_cleaner() CkanDataCleanerABC | None
- get_default_primary_key() List[str]
- get_description() str
- static init_from_options_string(options_string: str, *, base_dir: str = None, file_url_attr: str = None) Tuple[TableHarvesterMongoCollection, List[str]]
- query_data(query: Dict[str, Any]) List[dict]
- query_table_metadata(cancel_if_present: bool = True) TableMetadata
ckanapi_harvesters.harvesters.mongodb_params module
Harvest from a MongoDB database using pymongo (parameters)
- class ckanapi_harvesters.harvesters.mongodb_params.TableParamsMongoCollection(source: TableParamsMongoCollection = None)
Bases:
TableParams
A table (CKAN DataStore) corresponds to a MongoDB collection. This subclass of TableParams adds an alias attribute for the table name, called collection.
- property collection: str
- copy(*, dest=None)
- initialize_from_cli_args(args: Namespace, base_dir: str = None, error_not_found: bool = True, default_proxies: dict = None, proxy_headers: dict = None) None
- static setup_cli_harvester_parser(parser: ArgumentParser = None) ArgumentParser
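The collection property (and, in postgre_params, the schema property for the dataset name) exposes a database-specific name for a generic attribute. A minimal sketch of such an alias with a Python property follows; the underlying attribute name `table_name` is hypothetical, and whether the real property also has a setter is an assumption.

```python
from typing import Optional


class ToyTableParams:
    """Hypothetical base class storing the generic table name."""

    def __init__(self):
        self.table_name: Optional[str] = None  # hypothetical attribute name


class ToyMongoCollectionParams(ToyTableParams):
    @property
    def collection(self) -> Optional[str]:
        """Alias: reads the generic table name."""
        return self.table_name

    @collection.setter
    def collection(self, value: str) -> None:
        # Assumption: the alias is writable as well as readable.
        self.table_name = value


params = ToyMongoCollectionParams()
params.collection = "users"
print(params.table_name)  # users
```

Since the alias stores nothing of its own, code written against the generic table name and code using the MongoDB vocabulary always see the same value.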
ckanapi_harvesters.harvesters.postgre_harvester module
Harvest from a PostgreSQL database using sqlalchemy
- class ckanapi_harvesters.harvesters.postgre_harvester.DatabaseHarvesterPostgre(params: DatabaseParams = None)
Bases:
DatabaseHarvesterABC
This class manages the connection to a PostgreSQL database server. It can list schemas (corresponding to CKAN datasets).
- check_connection(*, new_connection: bool = False, raise_error: bool = False) None | ContextErrorLevelMessage
- copy(*, dest=None)
- disconnect() None
- get_dataset_harvester(dataset_name: str) DatasetHarvesterPostgre
- get_description() str
- get_login_url_without_auth() str
- static init_from_options_string(options_string: str, base_dir: str = None) Tuple[DatabaseHarvesterPostgre, List[str]]
- list_datasets(return_metadata: bool = True) List[str] | OrderedDict[str, DatasetMetadata]
- class ckanapi_harvesters.harvesters.postgre_harvester.DatasetHarvesterPostgre(params: DatasetParamsPostgreSchema = None)
Bases:
DatabaseHarvesterPostgre, DatasetHarvesterABC
A CKAN dataset corresponds to a PostgreSQL schema (set of tables).
- check_connection(*, new_connection: bool = False, raise_error: bool = False) None | ContextErrorLevelMessage
- clean_dataset_metadata() DatasetMetadata
- disconnect() None
- get_description() str
- get_table_harvester(table_name: str) TableHarvesterPostgre
- static init_from_options_string(options_string: str, base_dir: str = None) Tuple[DatasetHarvesterPostgre, List[str]]
- list_tables(return_metadata: bool = True) List[str] | OrderedDict[str, TableMetadata]
- query_dataset_metadata(cancel_if_present: bool = True) DatasetMetadata
- class ckanapi_harvesters.harvesters.postgre_harvester.TableHarvesterPostgre(params: TableParamsPostgre = None)
Bases:
DatasetHarvesterPostgre, TableHarvesterABC
A CKAN table (DataStore) corresponds to a PostgreSQL table.
- _data_type_map_to_ckan(field_metadata: FieldMetadata) None
Translates the data types that need to be converted for CKAN.
- _get_field_query_function(field_metadata: FieldMetadata) str
Forces some data types to be returned as text.
- check_connection(*, new_connection: bool = False, raise_error: bool = False) None | ContextErrorLevelMessage
- clean_table_metadata() TableMetadata
- copy(*, dest=None)
- get_default_data_cleaner() CkanDataCleanerABC
- get_default_primary_key() List[str]
- get_description() str
- static init_from_options_string(options_string: str, *, base_dir: str = None, file_url_attr: str = None) Tuple[TableHarvesterPostgre, List[str]]
- query_data(query: Dict[str, Any]) DataFrame
- query_table_metadata(cancel_if_present: bool = True) TableMetadata
- update_from_ckan(ckan)
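_get_field_query_function() forces some data types to be returned as text. In PostgreSQL this can be done with a `::text` cast in the SELECT list. The sketch below builds such a query as a plain string; the harvester itself uses sqlalchemy, and the set of types cast to text (TYPES_AS_TEXT) and all helper names are hypothetical, since this reference does not list which types are affected.

```python
from typing import Dict

# Hypothetical choice of PostgreSQL types to return as text.
TYPES_AS_TEXT = {"uuid", "interval", "money"}


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def field_expression(name: str, data_type: str) -> str:
    """Select the column as-is, or cast it to text if its type requires it."""
    col = quote_ident(name)
    if data_type in TYPES_AS_TEXT:
        return f"{col}::text AS {col}"
    return col


def build_select(schema: str, table: str, fields: Dict[str, str]) -> str:
    """Build a SELECT over `fields` (column name -> PostgreSQL type)."""
    columns = ", ".join(field_expression(n, t) for n, t in fields.items())
    return f"SELECT {columns} FROM {quote_ident(schema)}.{quote_ident(table)}"


print(build_select("public", "cities", {"id": "integer", "token": "uuid"}))
# SELECT "id", "token"::text AS "token" FROM "public"."cities"
```

Aliasing the cast expression back to the column name keeps the result column names identical to the table's field names.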
ckanapi_harvesters.harvesters.postgre_params module
Harvest from a PostgreSQL database
- class ckanapi_harvesters.harvesters.postgre_params.DatasetParamsPostgreSchema(source: DatasetParamsPostgreSchema = None)
Bases:
DatasetParams
A CKAN dataset corresponds to a PostgreSQL schema (set of tables). This subclass of DatasetParams adds an alias attribute for the dataset name, called schema.
- copy(*, dest=None)
- initialize_from_cli_args(args: Namespace, base_dir: str = None, error_not_found: bool = True, default_proxies: dict = None, proxy_headers: dict = None) None
- property schema: str
- static setup_cli_harvester_parser(parser: ArgumentParser = None) ArgumentParser
- class ckanapi_harvesters.harvesters.postgre_params.TableParamsPostgre(source: TableParamsPostgre = None)
Bases:
TableParams
- copy(*, dest=None)
- initialize_from_cli_args(args: Namespace, base_dir: str = None, error_not_found: bool = True, default_proxies: dict = None, proxy_headers: dict = None) None
- property schema: str
- static setup_cli_harvester_parser(parser: ArgumentParser = None) ArgumentParser
ckanapi_harvesters.harvesters.pymongo_harvester module
Deprecated module name alias for mongodb_harvester
Module contents
Section of the package dedicated to harvesting data from APIs or databases.