ckanapi_harvesters.auxiliary package
Submodules
ckanapi_harvesters.auxiliary.ckan_action module
Action response common treatments
- exception ckanapi_harvesters.auxiliary.ckan_action.CkanActionError(ckan, response: CkanActionResponse, display_request: bool = True)
Bases:
Exception
- class ckanapi_harvesters.auxiliary.ckan_action.CkanActionResponse(response: Response, dry_run: bool = False)
Bases:
object
Class which decodes and checks the response of a CKAN request
- default_error(ckan) CkanActionError
Raise specific error codes depending on response
- exception ckanapi_harvesters.auxiliary.ckan_action.CkanAuthorizationError(ckan, response: CkanActionResponse, display_request: bool = True)
Bases:
CkanActionError
- exception ckanapi_harvesters.auxiliary.ckan_action.CkanNotFoundError(ckan, object_type: str, response: CkanActionResponse, display_request: bool = True)
Bases:
CkanActionError
- exception ckanapi_harvesters.auxiliary.ckan_action.CkanSqlCapabilityError(ckan, response: CkanActionResponse, display_request: bool = True)
Bases:
CkanActionError
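Because CkanAuthorizationError, CkanNotFoundError and CkanSqlCapabilityError all derive from CkanActionError, a caller can handle the specific cases first and fall back to the base class. The sketch below illustrates that pattern with stand-in classes. The stand-in names, the simplified constructors and the status-code mapping are assumptions made for illustration, not the library's implementation.

```python
# Stand-in hierarchy mirroring the documented bases (constructors simplified).
class ActionError(Exception):
    """Plays the role of CkanActionError."""


class AuthorizationError(ActionError):
    """Plays the role of CkanAuthorizationError."""


class NotFoundError(ActionError):
    """Plays the role of CkanNotFoundError."""


def error_for_status(status_code: int) -> ActionError:
    # Hypothetical mapping: the real default_error() may use other criteria.
    if status_code == 403:
        return AuthorizationError("access denied")
    if status_code == 404:
        return NotFoundError("object not found")
    return ActionError(f"request failed with status {status_code}")


def describe(status_code: int) -> str:
    # Most specific handlers first, base class last.
    try:
        raise error_for_status(status_code)
    except NotFoundError:
        return "not found"
    except AuthorizationError:
        return "unauthorized"
    except ActionError:
        return "other action error"
```

Ordering the except clauses from most to least specific matters: a leading `except ActionError` would swallow every subclass.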
ckanapi_harvesters.auxiliary.ckan_api_key module
Methods to load an API key
- class ckanapi_harvesters.auxiliary.ckan_api_key.ApiKey(remote_url: str = None, *, apikey: str = None, apikey_file: str = None, api_key_header_name: str | Iterable[str] = None)
Bases:
object
API key storage class.
- __init__(remote_url: str = None, *, apikey: str = None, apikey_file: str = None, api_key_header_name: str | Iterable[str] = None)
CKAN Database API key storage class.
- Parameters:
apikey – way to provide the API key directly (optional)
apikey_file – path to a file containing a valid API key in the first line of text (optional)
- clear() None
- copy(*, dest=None)
- get_auth_header() Dict[str, str]
Returns the correct header with the API key for the requests needing it. If no API key was loaded, returns an empty dictionary.
- input()
Prompt the user to input the API key in the console window.
- Returns:
- is_empty()
- load_apikey(apikey_file: str = None, *, base_dir: str = None, error_not_found: bool = True) bool
Load the API key from file. The file should contain a valid API key in the first line of text.
- Parameters:
apikey_file – path to the API key file. The keyword “environ” is also accepted: the API key is then looked up in the environment variables with load_from_environ
base_dir – base directory to find the API key file, if a relative path is provided
error_not_found – option to raise an exception if the API key file is not found
- Returns:
- load_from_environ(*, error_not_found: bool = False) bool
Load the API key from environment variables.
By default, no environment variables are used.
- property remote_url: str
- property remote_url_constraint: str | None
- property value: str | None
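As described for get_auth_header, the header dictionary is empty when no API key is loaded. The following is a minimal sketch of that behavior, assuming the key is sent under every configured header name. The helper name build_auth_header is hypothetical, and the default header names are the ones listed for CkanApiKey.CKAN_API_KEY_HEADER_NAME in this reference.

```python
from typing import Dict, Iterable, Optional, Union


def build_auth_header(
    apikey: Optional[str],
    header_names: Union[str, Iterable[str]] = ("Authorization", "X-CKAN-API-Key"),
) -> Dict[str, str]:
    """Hypothetical helper: one header entry per name, or {} without a key."""
    if not apikey:
        return {}
    if isinstance(header_names, str):
        # Accept a single name as well as an iterable of names.
        header_names = [header_names]
    return {name: apikey for name in header_names}
```

Returning an empty dictionary lets the caller merge the result into other headers unconditionally, for example `headers = {**base_headers, **build_auth_header(key)}`.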
- class ckanapi_harvesters.auxiliary.ckan_api_key.CkanApiKey(apikey: str = None, *, remote_url: str = None, apikey_file: str = None, apikey_auto_load: bool = True)
Bases:
ApiKey
CKAN Database API key storage class.
- API_KEY_FILE_DEFAULT_LIST = ['/home/runner/.config/__CKAN_API_KEY__.txt', '/home/runner/.ckan/__CKAN_API_KEY__.txt']
- API_KEY_FILE_ENVIRON = 'CKAN_API_KEY_FILE'
- CKAN_API_KEY_ENVIRON = 'CKAN_API_KEY'
- CKAN_API_KEY_HEADER_NAME = {'Authorization', 'X-CKAN-API-Key'}
- __init__(apikey: str = None, *, remote_url: str = None, apikey_file: str = None, apikey_auto_load: bool = True)
CKAN Database API key storage class.
- Parameters:
apikey – way to provide the API key directly (optional)
apikey_file – path to a file containing a valid API key in the first line of text (optional)
apikey_auto_load – option to automatically load the API key from file or environment variables
Order of priority:
1. Value of apikey
2. Contents of the file pointed to by apikey_file
3. Value of the environment variable CKAN_API_KEY
4. Contents of the file pointed to by the environment variable CKAN_API_KEY_FILE
5. Contents of the file at the default location: ~/.config/__CKAN_API_KEY__.txt or ~/.ckan/__CKAN_API_KEY__.txt
- copy(*, dest=None) CkanApiKey
- static get_default_apikey_file() str | None
- input()
Prompt the user to input the API key in the console window.
- Returns:
- load_from_environ(*, error_not_found: bool = False, empty_warning: bool = True) bool
Load the CKAN API key from environment variables, by order of priority:
1. CKAN_API_KEY: the raw API key (storing an API key in an environment variable is not recommended)
2. CKAN_API_KEY_FILE: path to a file containing a valid API key in the first line of text
- Parameters:
error_not_found – raise an error if the API key file was not found
- Returns:
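The order of priority documented for CkanApiKey can be sketched as a plain function. This is an illustration under stated assumptions, not the library's code: resolve_api_key and read_first_line are hypothetical names, the environment is passed in as a mapping so it can be replaced in tests, and an explicit apikey_file that does not exist simply yields None here, whereas the library may raise an error (see error_not_found).

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence


def read_first_line(path: str) -> Optional[str]:
    """Return the stripped first line of a file, or None if missing or empty."""
    p = Path(path).expanduser()
    if not p.is_file():
        return None
    with p.open(encoding="utf-8") as f:
        line = f.readline().strip()
    return line or None


def resolve_api_key(
    apikey: Optional[str] = None,
    apikey_file: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
    default_files: Sequence[str] = (
        "~/.config/__CKAN_API_KEY__.txt",
        "~/.ckan/__CKAN_API_KEY__.txt",
    ),
) -> Optional[str]:
    """Hypothetical resolver following the documented order of priority."""
    if apikey:                                   # direct value
        return apikey
    if apikey_file:                              # explicit file
        return read_first_line(apikey_file)
    if environ.get("CKAN_API_KEY"):              # raw key in the environment
        return environ["CKAN_API_KEY"]
    if environ.get("CKAN_API_KEY_FILE"):         # file named in the environment
        return read_first_line(environ["CKAN_API_KEY_FILE"])
    for candidate in default_files:              # default locations
        key = read_first_line(candidate)
        if key:
            return key
    return None
```

Only the first line of a key file is read, matching the "valid API key in the first line of text" requirement.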
ckanapi_harvesters.auxiliary.ckan_auxiliary module
Data model to represent a CKAN database architecture
- class ckanapi_harvesters.auxiliary.ckan_auxiliary.CkanFieldInternalAttrs
Bases:
object
Custom information for internal use
- copy() CkanFieldInternalAttrs
- init_from_native_type(native_type: str) None
- init_from_options_string(options_string: str) None
- merge(new_values: CkanFieldInternalAttrs) CkanFieldInternalAttrs
- update_from_ckan(ckan)
- class ckanapi_harvesters.auxiliary.ckan_auxiliary.CkanIdFieldTreatment(*values)
Bases:
IntEnum
- Keep = 0
- Remove = 2
- SetIndex = 1
- class ckanapi_harvesters.auxiliary.ckan_auxiliary.FileChunkDataFrame(df: ListRecords | DataFrame | None, file_path: str, file_index: int, chunk_index: int, file_position: int, read_line_counter: int)
Bases:
object
Class to hold a chunk of a DataFrame of a file (only for DataStores), with the file name, index and an indication of the position in the file
- __init__(df: ListRecords | DataFrame | None, file_path: str, file_index: int, chunk_index: int, file_position: int, read_line_counter: int) None
- Parameters:
df – the data of the file chunk (leave None if not loaded)
file_path – the path to the source
file_index – the index of the file in the list
chunk_index – counter of chunks read within the file
file_position – the position within the file itself (approximation)
- df: ListRecords | DataFrame | None
- file_path: str
- class ckanapi_harvesters.auxiliary.ckan_auxiliary.RequestType(*values)
Bases:
IntEnum
- Get = 1
- Post = 2
- ckanapi_harvesters.auxiliary.ckan_auxiliary.bytes_to_megabytes(size_bytes: int | None) float | None
- ckanapi_harvesters.auxiliary.ckan_auxiliary.ca_arg_to_str(ca_cert: bool | str | None, base_dir: str = None, source_string: str = None) str | None
- ckanapi_harvesters.auxiliary.ckan_auxiliary.ca_file_rel_to_dir(ca_file: str | None, base_dir: str = None) Tuple[bool | str | None, str | None]
- ckanapi_harvesters.auxiliary.ckan_auxiliary.dict_recursive_update(d: dict, u: dict) dict
- ckanapi_harvesters.auxiliary.ckan_auxiliary.empty_str_to_None(value: str | None) str | None
- ckanapi_harvesters.auxiliary.ckan_auxiliary.find_duplicates(list_str: Iterable) list
- ckanapi_harvesters.auxiliary.ckan_auxiliary.import_args_kwargs_dict(args_kwargs: str) dict
- ckanapi_harvesters.auxiliary.ckan_auxiliary.json_encode_params(params: dict) Tuple[str, dict]
For upload requests with a records field, the params must be passed in the data argument of requests instead of the json argument. NaN values, if present, are not supported by the requests encoder.
Requirement: the request must also be sent with headers=json_headers.
- Parameters:
params
- Returns:
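The NaN issue described for json_encode_params can be illustrated with the standard library alone. In this minimal sketch, NaN values are replaced by None so that they serialize as JSON null, and the body is meant to be sent through the data argument together with JSON headers. The names replace_nan, encode_params and JSON_HEADERS are hypothetical, the content of JSON_HEADERS is an assumption about json_headers, and the meaning of the dict returned by the library function is not documented here.

```python
import json
import math
from typing import Any, Dict, Tuple

# Assumed content of the json_headers mentioned above.
JSON_HEADERS = {"Content-Type": "application/json"}


def replace_nan(value: Any) -> Any:
    """Recursively replace float NaN with None so it serializes as JSON null."""
    if isinstance(value, float) and math.isnan(value):
        return None
    if isinstance(value, dict):
        return {k: replace_nan(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [replace_nan(v) for v in value]
    return value


def encode_params(params: Dict[str, Any]) -> Tuple[str, Dict[str, str]]:
    """Hypothetical encoder returning (body, headers) for a POST request."""
    # allow_nan=False makes json.dumps fail loudly if a NaN slipped through.
    body = json.dumps(replace_nan(params), allow_nan=False)
    return body, dict(JSON_HEADERS)
```

With requests, the result would be used as `requests.post(url, data=body, headers=headers)`.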
- ckanapi_harvesters.auxiliary.ckan_auxiliary.parse_geometry_native_type(geometry_type: str) Tuple[str, int]
- ckanapi_harvesters.auxiliary.ckan_auxiliary.requests_multipart_data(json_dict: dict, files: dict) dict
Generate the multipart data for a request containing JSON and a file. Used to fill the files argument of requests.post. json_headers must not be used.
- Parameters:
json_dict
files
- Returns:
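A minimal sketch of how JSON parameters and file parts can share one multipart body, assuming each top-level JSON key becomes its own form field. The helper name multipart_fields is hypothetical and the library may encode the JSON part differently. The `(None, text)` tuple is the requests convention for a part without a filename.

```python
import json
from typing import Any, Dict


def multipart_fields(json_dict: Dict[str, Any], files: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: one text part per JSON key, plus the file parts."""
    fields: Dict[str, Any] = {}
    for key, value in json_dict.items():
        # Strings are sent as-is; other values are JSON-encoded.
        text = value if isinstance(value, str) else json.dumps(value)
        fields[key] = (None, text)
    fields.update(files)
    return fields
```

The result would be passed as `requests.post(url, files=fields)` without a Content-Type header, so that requests can set the multipart boundary itself. This is consistent with the note that json_headers must not be used.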
- ckanapi_harvesters.auxiliary.ckan_auxiliary.sql_varname_escape(var_name: str) str
- ckanapi_harvesters.auxiliary.ckan_auxiliary.ssl_arguments_decompose(ca_cert: bool | str | None, *, default_ca_verify: bool = True) Tuple[bool, str | None]
Decompose the verify argument of requests into a boolean and a path to a certificate file.
- Parameters:
ca_cert
default_ca_verify – option to indicate if SSL should be enabled if ca_cert is None
- Returns:
Tuple ca_verify, ssl_server_certfile
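The contract described above can be sketched as follows. This is a hypothetical re-implementation based only on the documented signature and return tuple; the function name decompose_verify and the handling of a string value (verification enabled, string used as certificate path) are assumptions.

```python
from typing import Optional, Tuple, Union


def decompose_verify(
    ca_cert: Union[bool, str, None], *, default_ca_verify: bool = True
) -> Tuple[bool, Optional[str]]:
    """Hypothetical sketch: split a verify-style argument into (flag, path)."""
    if ca_cert is None:
        # Nothing specified: fall back to the default policy.
        return default_ca_verify, None
    if isinstance(ca_cert, bool):
        return ca_cert, None
    # A string is treated as a path to a CA bundle; verification stays enabled.
    return True, ca_cert
```

For example, `decompose_verify(False)` gives `(False, None)`, while a path string keeps verification on and carries the path in the second element.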
- ckanapi_harvesters.auxiliary.ckan_auxiliary.str_to_python_value(value: str) Any
- ckanapi_harvesters.auxiliary.ckan_auxiliary.to_jsons_indent_lists_single_line(obj, *args, reduced_size: bool = False, **kwargs) str
Modified json representation of an object. Lists with strings / integers are displayed on one line.
- Parameters:
obj – object to encode
args – args to pass to json.dumps()
reduced_size – option to not indent the json output (not human-readable)
kwargs – kwargs to pass to json.dumps()
- Returns:
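To illustrate the idea of an indented JSON output in which lists of strings or integers stay on one line, here is a small recursive sketch. It is not the library's implementation: dumps_compact_lists is a hypothetical name, dictionary keys are assumed to be strings, and the extra args and kwargs forwarded to json.dumps() by the real function are not handled.

```python
import json
from typing import Any


def dumps_compact_lists(obj: Any, indent: int = 2, _level: int = 0) -> str:
    """Pretty-print JSON, keeping lists of scalars on a single line."""
    pad = " " * (indent * (_level + 1))   # indentation of child items
    end_pad = " " * (indent * _level)     # indentation of the closing bracket
    if isinstance(obj, dict):
        if not obj:
            return "{}"
        items = [
            f"{pad}{json.dumps(str(k))}: {dumps_compact_lists(v, indent, _level + 1)}"
            for k, v in obj.items()
        ]
        return "{\n" + ",\n".join(items) + "\n" + end_pad + "}"
    if isinstance(obj, (list, tuple)):
        if all(not isinstance(v, (dict, list, tuple)) for v in obj):
            return json.dumps(list(obj))  # scalars only: one line
        items = [pad + dumps_compact_lists(v, indent, _level + 1) for v in obj]
        return "[\n" + ",\n".join(items) + "\n" + end_pad + "]"
    return json.dumps(obj)
```

The output remains valid JSON, so `json.loads(dumps_compact_lists(obj))` returns the original structure for JSON-compatible input.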
- ckanapi_harvesters.auxiliary.ckan_auxiliary.upload_prepare_requests_files_arg(*, files: dict = None, file_path: str = None, df: DataFrame = None, payload: bytes | BufferedIOBase = None, payload_name: str = None) dict
Create the files argument for requests.post from one of the following inputs, by order of priority:
- Parameters:
files – files pass through argument to the requests.post function. Use to send other data formats.
payload – bytes to upload as a file
payload_name – name of the payload to use (associated with the payload argument) - this determines the format recognized in CKAN viewers.
file_path – path of the file to transmit (binary and text files are supported here)
df – pandas DataFrame to replace resource
- Returns:
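The selection between the different inputs can be sketched as below, assuming the priority follows the order of the parameter descriptions (files, then payload, then file_path). The DataFrame case is left out to keep the sketch free of third-party imports. The name prepare_files_arg, the "upload" field name and the fallback file name "payload.bin" are assumptions, not values confirmed by this reference.

```python
import io
import os
from typing import Any, Dict, Optional, Union


def prepare_files_arg(
    *,
    files: Optional[Dict[str, Any]] = None,
    payload: Union[bytes, io.BufferedIOBase, None] = None,
    payload_name: Optional[str] = None,
    file_path: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical selection logic; the DataFrame case is omitted."""
    if files is not None:
        return files  # pass-through, highest priority
    if payload is not None:
        # The name carries the extension, which hints at the file format.
        return {"upload": (payload_name or "payload.bin", payload)}
    if file_path is not None:
        with open(file_path, "rb") as f:
            return {"upload": (os.path.basename(file_path), f.read())}
    raise ValueError("one of files, payload or file_path is required")
```

Reading the file in binary mode works for both text and binary files, which matches the note on file_path.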
ckanapi_harvesters.auxiliary.ckan_configuration module
Parameters which apply to the package
ckanapi_harvesters.auxiliary.ckan_defs module
Data model to represent a CKAN database architecture
ckanapi_harvesters.auxiliary.ckan_errors module
CKAN error types
- exception ckanapi_harvesters.auxiliary.ckan_errors.AdminFeatureLockedError
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.ApiKeyFileError
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.ArgumentError
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.CkanArgumentError(api_name: str, argument_name: str)
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.CkanMandatoryArgumentError(action_name: str, attribute_name: str)
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.CkanServerError(ckan, response: Response, msg: str, display_request: bool = True)
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.DataStoreNotFoundError(resource_id: str, error_message: str)
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.DuplicateNameError(object_type: str, names: Iterable[str])
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.ExternalUrlLockedError(url: str)
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.FileFormatRequirementError(requirement: str, file_format: str)
Bases:
RequirementError
- exception ckanapi_harvesters.auxiliary.ckan_errors.FileOrDirNotExistError(path: str)
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.ForbiddenNameError(object_type: str, names: Iterable[str])
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.FunctionMissingArgumentError(function_name: str, argument_name: str)
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.HostContraintError(host_url: str, url: str)
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.HttpRetryCodeError(status_code: int, description: str = None)
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.IncompletePatchError
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.IntegrityError
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.InvalidParameterError
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.LoginFileError
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.MandatoryAttributeError(object_type: str, attribute_name: str)
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.MaxAttemptsError(accumulated_traceback: List[str])
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.MaxRequestsCountError
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.MissingCodeFileError
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.MissingIOFunctionError(function_type: str)
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.MissingIdError(object_type: str, object_name)
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.MultipleErrors(errors: List[Exception])
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.NameFormatError
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.NoCAVerificationError
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.NoDefaultView(resource_format: str)
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.NotMappedObjectNameError
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.ReadOnlyError
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.RequestError
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.RequirementError(requirement: str, function: str)
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.SearchAllNoCountsError(api_name: str, argument_name_value: str = None)
Bases:
ArgumentError
- exception ckanapi_harvesters.auxiliary.ckan_errors.UnexpectedError
Bases:
RuntimeError
- exception ckanapi_harvesters.auxiliary.ckan_errors.UnknownCliArgumentError(extra_args: List[str], context: str)
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.UnknownTargetCRSError(source_crs, context: str)
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.ckan_errors.UrlError
Bases:
Exception
ckanapi_harvesters.auxiliary.ckan_field_types module
CKAN DataStore field types and link with PostgreSQL types
- class ckanapi_harvesters.auxiliary.ckan_field_types.CkanFieldType
Bases:
str, CkanFieldTypeABC
Role previously managed by CkanFieldTypeEnum, but accepts any string
- static from_str(s)
- class ckanapi_harvesters.auxiliary.ckan_field_types.CkanFieldTypeABC
Bases:
ABC
- abstractmethod static from_str(s)
- class ckanapi_harvesters.auxiliary.ckan_field_types.CkanFieldTypeEnum(*values)
Bases:
IntEnum
Enumeration of types encountered during development + documentation
- Numeric = 2
- Text = 1
- TimeStamp = 3
- bigint = 102
- bigserial = 151
- bit = 200
- bool = 15
- box = 223
- bson = 230
- bytea = 204
- char = 201
- cidr = 156
- circle = 224
- date = 11
- float = 14
- float4 = 106
- float8 = 107
- static from_str(s)
- geometry = 228
- inet = 155
- int = 13
- int16 = 103
- int2 = 101
- int32 = 104
- int4 = 13
- int64 = 105
- int8 = 102
- integer = 13
- json = 30
- jsonb = 31
- line = 226
- lseg = 225
- macaddr = 154
- money = 108
- oid = 153
- path = 221
- point = 220
- polygon = 222
- serial = 150
- serial4 = 150
- serial8 = 151
- time = 12
- timestamptz = 21
- timetz = 20
- uuid = 152
- varbit = 202
- varchar = 203
- xml = 32
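Several members listed above share a value (for example int4 and integer are both 13, serial and serial4 are both 150). In a Python enum, members that repeat an earlier value become aliases of the first member defined with that value. The sketch below shows this with a stand-in enum containing a few of the listed members. The class name FieldTypeDemo, the definition order and the lookup function are assumptions; the real from_str may behave differently.

```python
from enum import IntEnum


class FieldTypeDemo(IntEnum):
    """Stand-in with a few of the members listed above."""
    Text = 1
    Numeric = 2
    int4 = 13
    integer = 13   # same value as int4: alias of int4
    serial = 150
    serial4 = 150  # same value as serial: alias of serial


def from_str(s: str) -> FieldTypeDemo:
    """Hypothetical name lookup; aliases resolve to the canonical member."""
    return FieldTypeDemo[s]
```

Iterating over the enum yields only the canonical members, so aliases never appear twice in a listing, while lookups by alias name still succeed.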
ckanapi_harvesters.auxiliary.ckan_map module
Data model to represent a CKAN database architecture
- class ckanapi_harvesters.auxiliary.ckan_map.CkanMap
Bases:
CkanMapABC
Class to store an image of the CKAN database architecture. Auxiliary class of CkanApi.
- _update_datastore_info(datastore_info: CkanDataStoreInfo) None
Internal function to update the information on a DataStore without making a request.
- _update_datastore_len(resource_id: str, new_len: int) None
Internal function to update the length of a DataStore without making a request.
- Parameters:
resource_id – resource id.
new_len – value to replace
- _update_group_info(group_info: CkanGroupInfo | List[CkanGroupInfo]) None
Internal function to update information on a group.
- _update_license_info(license_info: CkanLicenseInfo | List[CkanLicenseInfo]) None
Internal function to update the information on a license.
- _update_organization_info(organization_info: CkanOrganizationInfo | List[CkanOrganizationInfo]) None
Internal function to update information on an organization.
- _update_package_info(package_info: CkanPackageInfo | List[CkanPackageInfo]) None
Internal function to update the information of a package.
NB: the indicator pkg_info.requested_datastore_info remains False until map_resources is called.
- _update_resource_info(resource_info: CkanResourceInfo | List[CkanResourceInfo]) None
Internal function to update the information on a resource.
- _update_user_info(user_info: CkanUserInfo | List[CkanUserInfo]) None
Internal function to update information on a user.
- get_datastore_info(resource_name: str, package_name: str = None, *, error_not_mapped: bool = True) CkanDataStoreInfo | None
- Parameters:
resource_name – resource name or id.
package_name – package name or id (required if resource_name is a resource name). An integrity check is performed if given.
- Returns:
- get_datastore_len(resource_name: str, package_name: str = None, *, error_not_mapped: bool = True) int | None
Retrieve the number of rows in a DataStore from the mapped data. This requires map_resources to have been called with the option datastore_info=True.
- Parameters:
resource_name – resource name or id.
package_name – package name or id (required if resource_name is a resource name). An integrity check is performed if given.
- Returns:
- get_license_id(license_name: str, *, error_not_mapped: bool = True) str
Retrieve the ID of a license based on the mapped data.
- Parameters:
license_name – license title or id.
- Returns:
- get_license_info(license_name: str, *, error_not_mapped: bool = True) CkanLicenseInfo | None
Retrieve the information on a license based on the mapped data.
- Parameters:
license_name – license title or id.
- Returns:
- get_organization_for_owner_org(organization_name: str, *, error_not_mapped: bool = True) CkanOrganizationInfo | None
Retrieve the organization name for a given organization name or id, based on the mapped data. This is the value usually used for the owner_org argument. Calls CkanOrganizationInfo.get_owner_org.
- Parameters:
organization_name – organization name or id.
- Returns:
- get_organization_id(organization_name: str, *, error_not_mapped: bool = True, search_title: bool = True) str | None
Retrieve the organization id for a given organization name based on the mapped data.
- Parameters:
organization_name – organization name, title or id.
- Returns:
- get_organization_info(organization_name: str, *, error_not_mapped: bool = True) CkanOrganizationInfo | None
Retrieve the organization info for a given organization name based on the mapped data.
- Parameters:
organization_name – organization name or id.
- Returns:
- get_package_id(package_name: str, *, error_not_mapped: bool = True, search_title: bool = True) str | None
Retrieve the package id for a given package name based on the package map.
- Parameters:
package_name – package name or id.
- Returns:
- get_package_info(package_name: str, *, error_not_mapped: bool = True) CkanPackageInfo | None
Retrieve the package info for a given package name based on the package map.
- Parameters:
package_name – package name or id.
- Returns:
- get_resource_id(resource_name: str, package_name: str = None, *, error_not_mapped: bool = True) str | None
Retrieve the resource id for a given combination of (package name and resource name) based on the package map.
- Parameters:
resource_name – resource alias, name or id.
package_name – package name or id (required if resource_name is a resource name). An integrity check is performed if given.
- Returns:
- get_resource_info(resource_name: str, package_name: str = None, *, error_not_mapped: bool = True) CkanResourceInfo | None
Retrieve the information on a given resource.
- Parameters:
resource_name – resource name or id.
package_name – package name or id (required if resource_name is a resource name). An integrity check is performed if given.
- Returns:
- get_resource_package_id(resource_name: str, package_name: str = None, *, error_not_mapped: bool = True) str | None
Retrieve the package id of a given resource.
- Parameters:
resource_name – resource name or id.
package_name – package name or id (required if resource_name is a resource name). An integrity check is performed if given.
- Returns:
- purge()
Erase known package mappings.
- Returns:
- to_dict() dict
- update_from_dict(data: dict) None
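The getters of CkanMap share one pattern: they accept a name or an id, answer from previously mapped data without making a request, and either raise or return None for unknown objects depending on error_not_mapped. A miniature version of that pattern is sketched below. PackageMapDemo and NotMapped are hypothetical stand-ins, and the real class stores far more than a name-to-id dictionary.

```python
from typing import Dict, Optional


class NotMapped(KeyError):
    """Stand-in for the error raised when a name is not in the map."""


class PackageMapDemo:
    """Hypothetical miniature of a name-to-id map."""

    def __init__(self) -> None:
        self._id_by_name: Dict[str, str] = {}

    def update(self, name: str, package_id: str) -> None:
        self._id_by_name[name] = package_id

    def get_package_id(
        self, package_name: str, *, error_not_mapped: bool = True
    ) -> Optional[str]:
        if package_name in self._id_by_name.values():
            return package_name  # the argument already is a known id
        if package_name in self._id_by_name:
            return self._id_by_name[package_name]
        if error_not_mapped:
            raise NotMapped(package_name)
        return None

    def purge(self) -> None:
        """Erase the known mappings."""
        self._id_by_name.clear()
```

Passing `error_not_mapped=False` turns a missing entry into a None result, which is convenient when the caller wants to test for existence before creating an object.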
ckanapi_harvesters.auxiliary.ckan_model module
Data model to represent a CKAN database architecture
- class ckanapi_harvesters.auxiliary.ckan_model.CkanAliasInfo(d: dict = None)
Bases:
object
- copy() CkanAliasInfo
- static from_dict(d: dict) CkanAliasInfo
- class ckanapi_harvesters.auxiliary.ckan_model.CkanCapacity(*values)
Bases:
IntEnum
- Admin = 3
- Editor = 2
- Excluded = 0
- Member = 1
- Public = 5
- SysAdmin = 4
- static from_str(s)
- class ckanapi_harvesters.auxiliary.ckan_model.CkanCollaboration(capacity: CkanCapacity = None, modified: datetime = None, group_id: str = None, d: dict = None)
Bases:
object
- capacity: CkanCapacity
- copy() CkanCollaboration
- group_id: str | None
- modified: datetime
- to_dict(user_info: CkanUserInfo, group_table: Dict[str, CkanGroupInfo], date_format: str) dict
- class ckanapi_harvesters.auxiliary.ckan_model.CkanConfigurableObjectABC
Bases:
ABC
- configurable_attributes: set = None
- extra_attributes: set = {}
- abstractmethod static get_resource_type() str
- mandatory_attributes: set = None
- class ckanapi_harvesters.auxiliary.ckan_model.CkanDataStoreInfo(d: dict = None)
Bases:
object
- aliases: List[str] | None
- copy() CkanDataStoreInfo
- details: dict
- fields_id_list: List[str]
- static from_dict(d: dict) CkanDataStoreInfo
- get_basic_field_list_dict()
- get_original_field_list_dict()
- get_recomp_field_list_dict()
- index_fields: List[str]
- resource_id: str
- class ckanapi_harvesters.auxiliary.ckan_model.CkanField(name: str, data_type: str, *, notes: str = None, native_type: str = None, type_override: bool = False, label: str = None)
Bases:
CkanConfigurableObjectABC
Object representation of a CKAN Field configuration
- configurable_attributes: set = {'label', 'name', 'notes'}
- data_type: CkanFieldType | None
- details: dict
- static get_resource_type() str
- internal_attrs: CkanFieldInternalAttrs
- label: str | None
- mandatory_attributes: set = {'name'}
- name: str
- notes: str | None
- class ckanapi_harvesters.auxiliary.ckan_model.CkanGroupInfo(d: dict)
Bases:
object
- copy() CkanGroupInfo
- description: str
- details: dict
- static from_dict(d: dict) CkanGroupInfo
- static get_resource_type() str
- id: str
- name: str
- package_members: dict[str, CkanCapacity] | None
- title: str
- user_members: dict[str, CkanCapacity] | None
- class ckanapi_harvesters.auxiliary.ckan_model.CkanLicenseDomain(*values)
Bases:
IntFlag
- Content = 4
- Data = 2
- NoDomain = 0
- Software = 1
- static _generate_next_value_(name, start, count, last_values)
Generate the next value when not given.
name: the name of the member
start: the initial start value or None
count: the number of existing members
last_values: the last value assigned or None
- static from_bool(*, domain_software: bool = False, domain_data: bool = False, domain_content: bool = False) CkanLicenseDomain
- static from_dict(d: dict) CkanLicenseDomain
- to_dict() dict
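CkanLicenseDomain is an IntFlag, so its members can be combined with bitwise operators. The sketch below shows how three booleans could be folded into one flag value and unfolded again. LicenseDomainDemo and both functions are stand-ins written for illustration; the dictionary keys reuse the argument names of from_bool and are an assumption about the format used by to_dict and from_dict.

```python
from enum import IntFlag


class LicenseDomainDemo(IntFlag):
    """Stand-in with the same member values as listed above."""
    NoDomain = 0
    Software = 1
    Data = 2
    Content = 4


def from_bool(
    *, domain_software: bool = False, domain_data: bool = False, domain_content: bool = False
) -> LicenseDomainDemo:
    """Hypothetical builder: OR together the selected domains."""
    result = LicenseDomainDemo.NoDomain
    if domain_software:
        result |= LicenseDomainDemo.Software
    if domain_data:
        result |= LicenseDomainDemo.Data
    if domain_content:
        result |= LicenseDomainDemo.Content
    return result


def to_dict(domain: LicenseDomainDemo) -> dict:
    """Hypothetical inverse: one boolean per domain bit."""
    return {
        "domain_software": bool(domain & LicenseDomainDemo.Software),
        "domain_data": bool(domain & LicenseDomainDemo.Data),
        "domain_content": bool(domain & LicenseDomainDemo.Content),
    }
```

Because the member values are distinct powers of two, any combination of domains maps to a unique integer and back.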
- class ckanapi_harvesters.auxiliary.ckan_model.CkanLicenseInfo(d: dict)
Bases:
object
- details: dict
- domain: CkanLicenseDomain
- family: str
- static from_dict(d: dict) CkanLicenseInfo
- id: str
- title: str
- url: str
- class ckanapi_harvesters.auxiliary.ckan_model.CkanOrganizationInfo(d: dict)
Bases:
object
- copy() CkanOrganizationInfo
- static from_dict(d: dict) CkanOrganizationInfo
- get_owner_org()
Returns the value used for the owner_org argument
- Returns:
- user_members: None | Dict[str, CkanCapacity]
- class ckanapi_harvesters.auxiliary.ckan_model.CkanPackageInfo(d: dict = None, *, package_name: str = None, package_id: str = None, title: str = None, description: str = None, private: bool = None, state: CkanState = None, version: str = None, url: str = None, tags: List[str] = None)
Bases:
CkanConfigurableObjectABC
- author: str | None
- author_email: str | None
- collaborators: None | Dict[str, CkanCollaboration]
- configurable_attributes: set = {'author', 'author_email', 'description', 'maintainer', 'maintainer_email', 'name', 'private', 'state', 'title', 'version'}
- copy() CkanPackageInfo
- custom_fields: OrderedDict[str, str] | None
- description: str | None
- details: dict
- extra_attributes: set = {'custom_fields', 'tags'}
- static from_dict(d: dict) CkanPackageInfo
- static get_resource_type() str
- groups: List[CkanGroupInfo]
- id: str | None
- license_id: str | None
- maintainer: str | None
- maintainer_email: str | None
- mandatory_attributes: set = {'name'}
- metadata_created: datetime | None
- metadata_modified: datetime | None
- name: str | None
- organization_info: CkanOrganizationInfo | None
- package_resources: OrderedDict[str, CkanResourceInfo]
- resources_id_index: Dict[str, str]
- tags: List[str] | None
- tags_info: Dict[str, CkanTagInfo] | None
- title: str | None
- update(refresh: CkanPackageInfo)
- update_resource(resource_info: CkanResourceInfo) int
- url: str | None
- user_access: None | Dict[str, CkanCollaboration]
- version: str | None
- class ckanapi_harvesters.auxiliary.ckan_model.CkanResourceInfo(d: dict = None, name: str = None, format: str = None, description: str = None, state: CkanState = None)
Bases:
CkanConfigurableObjectABC
- configurable_attributes: set = {'description', 'format', 'name', 'state'}
- copy() CkanResourceInfo
- created: datetime | None
- datastore_info: CkanDataStoreInfo | None
- datastore_info_error: dict | None
- description: str | None
- details: dict
- download_url: str | None
- extra_attributes: set = {'download_url'}
- format: str | None
- static from_dict(d: dict) CkanResourceInfo
- static get_resource_type() str
- id: str | None
- last_modified: datetime | None
- mandatory_attributes: set = {'name'}
- metadata_modified: datetime | None
- name: str | None
- package_id: str | None
- update(refresh: CkanResourceInfo) None
- update_missing(refresh: CkanResourceInfo) None
- update_view(view_info: CkanViewInfo | List[CkanViewInfo], view_list: bool = False) None
- views: OrderedDict[str, CkanViewInfo] | None
- class ckanapi_harvesters.auxiliary.ckan_model.CkanState(*values)
Bases:
IntEnum
- Active = 1
- Deleted = 2
- Draft = 0
- static from_str(s)
- class ckanapi_harvesters.auxiliary.ckan_model.CkanTagInfo(d: dict)
Bases:
object
- details: dict
- display_name: str
- static from_dict(d: dict) CkanTagInfo
- id: str
- name: str
- vocabulary_id: str | None
- class ckanapi_harvesters.auxiliary.ckan_model.CkanUserInfo(d: dict = None)
Bases:
object
- about: str | None
- copy() CkanUserInfo
- created: datetime | None
- details: dict | None
- display_name: str | None
- email_hash: str | None
- static from_dict(d: dict) CkanUserInfo
- fullname: str | None
- static get_resource_type() str
- id: str | None
- last_active: datetime | None
- name: str | None
- organizations: None | List[str]
- class ckanapi_harvesters.auxiliary.ckan_model.CkanViewInfo(d: dict)
Bases:
object
- copy() CkanViewInfo
- details: dict
- static from_dict(d: dict) CkanViewInfo
- id: str
- package_id: str
- resource_id: str
- title: str
- view_type: str
ckanapi_harvesters.auxiliary.ckan_progress_callbacks module
Progress callback function definition
ckanapi_harvesters.auxiliary.ckan_progress_callbacks_abc module
Progress callback function interface
- class ckanapi_harvesters.auxiliary.ckan_progress_callbacks_abc.CkanCallbackLevel(*values)
Bases:
IntEnum
- MultiFileResource = 3
- Packages = 0
- Requests = 4
- ResourceChunks = 2
- Resources = 1
- class ckanapi_harvesters.auxiliary.ckan_progress_callbacks_abc.CkanProgressBarType(*values)
Bases:
IntEnum
- NoBar = 0
- TqdmAuto = 1
- TqdmConsole = 2
- TqdmJupyter = 3
- class ckanapi_harvesters.auxiliary.ckan_progress_callbacks_abc.CkanProgressCallbackABC(callback_fun: Callable | CkanProgressCallbackABC = None, *, progress_bar_type: CkanProgressBarType = None)
Bases:
ABC
- add_context(context: str, *, level: CkanCallbackLevel = None)
- abstractmethod copy(*, dest=None)
- default_progress_bar_type = 0
- end_task(total: int, *, file_count: int = None, position: int = None, file_index: int = None, level: CkanCallbackLevel = None, info: Any = None, context: str = None, lines_chunk: int = None, total_lines_read: int = None, **kwargs) None
- extra_context: dict[CkanCallbackLevel, str]
- last_progress_file_index: dict[CkanCallbackLevel, int]
- last_progress_position: dict[CkanCallbackLevel, int]
- progress_bar_enables: dict[CkanCallbackLevel, bool]
- property progress_bar_type: CkanProgressBarType
- progress_bars: Dict[CkanCallbackLevel, Any]
- progress_callback_kwargs: dict
- release_resources()
Release resources used by the progress callback like progress bars.
- remove_context(*, level: CkanCallbackLevel = None)
- start_task(total: int, *, file_count: int = None, position: int = 0, file_index: int = 0, level: CkanCallbackLevel = None, info: Any = None, context: str = None, lines_chunk: int = None, total_lines_read: int = 0, units: CkanProgressUnits = None, **kwargs) None
- start_time: Dict[CkanCallbackLevel, float | None]
- abstractmethod update_task(position: int, total: int, *, info: Any = None, context: str = None, file_index: int = 0, file_count: int = None, lines_chunk: int = None, total_lines_read: int = None, canceled_request: bool = False, end_message: bool = False, level: CkanCallbackLevel = None, **kwargs) str | None
- verbosity: dict[CkanCallbackLevel, bool]
- class ckanapi_harvesters.auxiliary.ckan_progress_callbacks_abc.CkanProgressCallbackEmpty(callback_fun: Callable | CkanProgressCallbackABC = None, *, progress_bar_type: CkanProgressBarType = None)
Bases:
CkanProgressCallbackABC
Progress callback which does not display anything.
- copy(*, dest=None)
- end_task(total: int, *, file_count: int = None, position: int = None, file_index: int = None, level: CkanCallbackLevel = None, info: Any = None, context: str = None, lines_chunk: int = None, total_lines_read: int = None, **kwargs) None
- start_task(total: int, *, file_count: int = None, position: int = 0, file_index: int = 0, level: CkanCallbackLevel = None, info: Any = None, context: str = None, lines_chunk: int = None, total_lines_read: int = 0, units: CkanProgressUnits = None, **kwargs) None
ckanapi_harvesters.auxiliary.ckan_progress_callbacks_prototypes module
Progress callback function definition
- ckanapi_harvesters.auxiliary.ckan_progress_callbacks_prototypes.jupyter_progress_callback(position: int, total: int, info: Any = None, *, context: str = None, file_index: int = None, file_count: int = None, lines_chunk: int = None, total_lines_read: int = None, canceled_upload: bool = False, end_message: bool = False, level: CkanCallbackLevel = None, start_time: float = None, last_position: int = None, last_progress_position: int = None, **kwargs) None
Example of a progress_callback function which can be copied into a Jupyter Notebook using a progress bar:
```python
from ipywidgets import IntProgress
from IPython.display import display

f = IntProgress(min=0, max=100)
```
ckanapi_harvesters.auxiliary.ckan_progress_callbacks_simple module
Progress callback function definition
- class ckanapi_harvesters.auxiliary.ckan_progress_callbacks_simple.CkanProgressCallbackSimple(callback_fun: Callable | CkanProgressCallbackSimple = None, *, progress_bar_type: CkanProgressBarType = None)
Bases:
CkanProgressCallbackABC
- copy(*, dest=None)
- update_task(position: int, total: int, *, info: Any = None, context: str = None, file_index: int = 0, file_count: int = None, lines_chunk: int = None, total_lines_read: int = None, canceled_request: bool = False, end_message: bool = False, level: CkanCallbackLevel = None, **kwargs) str | None
Progress callback function. Use to implement a progress indication for the user.
- Parameters:
position – the position within the resource (usually, in bytes or line count)
total – the total size of the resource
info – an object from which more information can be extracted, typically, the DataFrame itself, with an indication of the data origin.
context – the context of the call (ckan instance, upload/download, single/multi-threaded)
file_index – the index of the file in the list
file_count – the number of files in the list
lines_chunk – the number of lines in the chunk currently being processed
total_lines_read – the total number of lines read, including the current chunk
canceled_request – this callback is also called when a line is ignored
end_message – boolean indicating the end of the work in progress
level – the level of the progress callback
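As an illustration of how the parameters above might be consumed, here is a minimal sketch of a user-defined callback. The function name `percent_progress_callback` and the message format are hypothetical, and treating `file_index` as zero-based is an assumption. It only mirrors part of the documented signature and does not reproduce the library's own `default_progress_callback`.

```python
from typing import Any, Optional


def percent_progress_callback(position: int, total: int, info: Any = None, *,
                              context: Optional[str] = None,
                              file_index: Optional[int] = None,
                              file_count: Optional[int] = None,
                              end_message: bool = False,
                              **kwargs) -> Optional[str]:
    """Hypothetical callback: build a one-line progress message."""
    if end_message:
        return f"[{context}] done" if context else "done"
    # Guard against an unknown or zero total.
    percent = 100.0 * position / total if total else 0.0
    message = f"{percent:.1f}% ({position}/{total})"
    # Assumption: file_index is zero-based, so add 1 for display.
    if file_index is not None and file_count:
        message = f"file {file_index + 1}/{file_count}: {message}"
    if context:
        message = f"[{context}] {message}"
    return message


print(percent_progress_callback(512, 2048, context="upload",
                                file_index=0, file_count=3))
# prints "[upload] file 1/3: 25.0% (512/2048)"
```

A function of this shape could presumably be passed as `callback_fun` to `CkanProgressCallbackSimple`. Accepting `**kwargs` keeps it tolerant of the extra keyword arguments listed in the signatures above.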
- ckanapi_harvesters.auxiliary.ckan_progress_callbacks_simple.default_progress_callback(position: int, total: int, info: Any = None, *, context: str = None, file_index: int = None, file_count: int = None, lines_chunk: int = None, total_lines_read: int = None, canceled_upload: bool = False, end_message: bool = False, level: CkanCallbackLevel = None, start_time: float = None, last_position: int = None, last_progress_position: int = None, **kwargs) str | None
ckanapi_harvesters.auxiliary.ckan_progress_callbacks_tqdm module
Progress callback function definition
- class ckanapi_harvesters.auxiliary.ckan_progress_callbacks_tqdm.CkanProgressCallbackTqdm(callback_fun: Callable | CkanProgressCallbackSimple = None, *, progress_bar_type: CkanProgressBarType = None)
Bases:
CkanProgressCallbackSimple
- copy(*, dest=None)
- default_progress_bar_type = 1
- end_task(total: int, *, file_count: int = None, position: int = None, file_index: int = None, level: CkanCallbackLevel = None, info: Any = None, context: str = None, lines_chunk: int = None, total_lines_read: int = None, **kwargs) None
- property progress_bar_type: CkanProgressBarType
- progress_bar_update_min_interval_s = 0.25
- progress_bar_update_threshold_pct = 0.5
- release_resources() None
Release resources used by the progress callback like progress bars.
- start_task(total: int, *, file_count: int = None, position: int = 0, file_index: int = 0, level: CkanCallbackLevel = None, info: Any = None, context: str = None, lines_chunk: int = None, total_lines_read: int = 0, units: CkanProgressUnits = None, **kwargs) None
- update_task(position: int, total: int, *, info: Any = None, context: str = None, file_index: int = 0, file_count: int = None, lines_chunk: int = None, total_lines_read: int = None, canceled_request: bool = False, end_message: bool = False, level: CkanCallbackLevel = None, **kwargs) str | None
Progress callback function. Use to implement a progress indication for the user.
- Parameters:
position – the position within the resource (usually, in bytes or line count)
total – the total size of the resource
info – an object from which more information can be extracted, typically, the DataFrame itself, with an indication of the data origin.
context – the context of the call (ckan instance, upload/download, single/multi-threaded)
file_index – the index of the file in the list
file_count – the number of files in the list
lines_chunk – the number of lines in the chunk currently being processed
total_lines_read – the total number of lines read, including the current chunk
canceled_request – this callback is also called when a line is ignored
end_message – boolean indicating the end of the work in progress
level – the level of the progress callback
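The class attributes `progress_bar_update_min_interval_s` and `progress_bar_update_threshold_pct` suggest that progress bar refreshes are rate-limited. The sketch below shows one plausible way such limits could work. It makes two assumptions: the threshold is expressed in percentage points, and both conditions must hold before a redraw. The class `ThrottledProgress` is hypothetical and is not the library's implementation.

```python
import time
from typing import Optional


class ThrottledProgress:
    """Hypothetical stand-in showing one way to rate-limit refreshes."""
    min_interval_s = 0.25   # mirrors progress_bar_update_min_interval_s
    threshold_pct = 0.5     # mirrors progress_bar_update_threshold_pct

    def __init__(self) -> None:
        self._last_time: Optional[float] = None
        self._last_pct = 0.0

    def should_refresh(self, position: int, total: int,
                       now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        pct = 100.0 * position / total if total else 0.0
        if self._last_time is None:
            refresh = True  # always draw the first update
        else:
            # Assumption: enough time AND enough progress are both required.
            refresh = (now - self._last_time >= self.min_interval_s
                       and pct - self._last_pct >= self.threshold_pct)
        if refresh:
            self._last_time, self._last_pct = now, pct
        return refresh
```

Throttling of this kind keeps a tight upload loop from spending most of its time redrawing the bar.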
ckanapi_harvesters.auxiliary.ckan_vocabulary_deprecated module
CKAN tag vocabulary information
- class ckanapi_harvesters.auxiliary.ckan_vocabulary_deprecated.CkanTagVocabularyInfo(d: dict)
Bases:
object
- static from_dict(d: dict) CkanTagVocabularyInfo
- class ckanapi_harvesters.auxiliary.ckan_vocabulary_deprecated.CkanVocabularyMap
Bases:
CkanMapABC
- _update_vocabulary_info(vocabulary_info: CkanTagVocabularyInfo | List[CkanTagVocabularyInfo], vocabularies_listed: bool = False) None
Internal function to update the information of a vocabulary.
- copy() CkanVocabularyMap
- static from_dict(d: dict) CkanVocabularyMap
- get_vocabulary_id(vocabulary_name: str, *, error_not_mapped: bool = True, search_title: bool = True) str | None
Retrieve the vocabulary id for a given vocabulary name based on the vocabulary map.
- Parameters:
vocabulary_name – vocabulary name or id.
- Returns:
- purge()
- to_dict() dict
- update_from_dict(data: dict) None
ckanapi_harvesters.auxiliary.deprecated module
Dead code from auxiliary functions
- class ckanapi_harvesters.auxiliary.deprecated.CkanBasicDataFieldType(*values)
Bases:
IntEnum
- Default = 0
- Numeric = 2
- Text = 1
- TimeStamp = 3
- static from_str(s)
ckanapi_harvesters.auxiliary.error_level_message module
Functions to define messages with an error level
- exception ckanapi_harvesters.auxiliary.error_level_message.ContextErrorLevelMessage(context: str, error_level: ErrorLevel, specific_message: str)
Bases:
ErrorLevelMessage
- class ckanapi_harvesters.auxiliary.error_level_message.ErrorLevel(*values)
Bases:
IntEnum
- Error = 2
- Information = 0
- Warning = 1
- static from_str(s)
- exception ckanapi_harvesters.auxiliary.error_level_message.ErrorLevelMessage(error_level: ErrorLevel, message: str)
Bases:
Exception
- error_level: ErrorLevel
- message: str
- to_dict() dict
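To make the relationship between `ErrorLevel` and `ErrorLevelMessage` concrete, the following self-contained sketch re-declares both under the same names. The enum values come from the listing above. The case-insensitive behavior of `from_str` and the keys returned by `to_dict` are assumptions, so this is a stand-in and not the library code.

```python
from enum import IntEnum


class ErrorLevel(IntEnum):
    # Values as listed in the documentation above.
    Information = 0
    Warning = 1
    Error = 2

    @staticmethod
    def from_str(s: str) -> "ErrorLevel":
        # Assumption: case-insensitive lookup by member name.
        for level in ErrorLevel:
            if level.name.lower() == s.strip().lower():
                return level
        raise ValueError(f"Unknown error level: {s!r}")


class ErrorLevelMessage(Exception):
    def __init__(self, error_level: ErrorLevel, message: str):
        super().__init__(message)
        self.error_level = error_level
        self.message = message

    def to_dict(self) -> dict:
        # Assumption: the real key names may differ.
        return {"level": self.error_level.name, "message": self.message}


# Because ErrorLevel is an IntEnum, levels can be compared and sorted.
messages = [ErrorLevelMessage(ErrorLevel.Warning, "empty field"),
            ErrorLevelMessage(ErrorLevel.Error, "missing id")]
worst = max(m.error_level for m in messages)
```

Using an `IntEnum` makes it straightforward to find the most severe message in a batch or to filter out anything below a chosen level.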
ckanapi_harvesters.auxiliary.external_code_import module
This implements functionality to dynamically call functions specified by the user. This functionality is disabled by default. You must call unlock_external_code_execution to enable external code execution. __Warning__: only run code if you trust the source!
- exception ckanapi_harvesters.auxiliary.external_code_import.ExternalUserCodeDisabledException(function_name: str, source_file: str)
Bases:
Exception
- class ckanapi_harvesters.auxiliary.external_code_import.PythonUserCode(python_file: str, base_dir: str = None)
Bases:
object
This class imports an arbitrary Python file as a module and makes it available to the rest of the code. This functionality is disabled by default. You must call unlock_external_code_execution to enable external code execution.
__Warning__: only run code if you trust the source!
- copy() PythonUserCode
- enable_external_code = False
- function_pointer(function_name: str) Callable
Obtain function pointer for a given name in the loaded Python module.
- Parameters:
function_name
- Returns:
- ckanapi_harvesters.auxiliary.external_code_import.clean_var_name(variable_name: str) str
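The mechanism described for `PythonUserCode` (loading an arbitrary file as a module, guarded by a flag that is off by default) can be sketched with the standard library alone. The class `UserCodeSketch`, the use of `PermissionError` and the direct toggling of the flag are all hypothetical. The real class raises `ExternalUserCodeDisabledException` and is unlocked through `unlock_external_code_execution`, whose internals are not shown here.

```python
import importlib.util
import os
from typing import Callable


class UserCodeSketch:
    """Hypothetical stand-in; not the library's PythonUserCode."""
    enable_external_code = False  # disabled by default, as documented

    def __init__(self, python_file: str, base_dir: str = None):
        if not UserCodeSketch.enable_external_code:
            raise PermissionError("External code execution is disabled")
        path = python_file if base_dir is None else os.path.join(base_dir, python_file)
        module_name = os.path.splitext(os.path.basename(path))[0]
        spec = importlib.util.spec_from_file_location(module_name, path)
        self.module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(self.module)  # runs the file's top-level code

    def function_pointer(self, function_name: str) -> Callable:
        # Look up a function by name in the loaded module.
        return getattr(self.module, function_name)
```

Since `exec_module` executes the file's top-level statements, the warning above applies in full: only load files from a trusted source.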
ckanapi_harvesters.auxiliary.lazy_imports module
Central implementation of lazy imports of optional dependencies / dependencies rarely used
- ckanapi_harvesters.auxiliary.lazy_imports.lazy_import_bson()
- ckanapi_harvesters.auxiliary.lazy_imports.lazy_import_geopandas_gpd()
- ckanapi_harvesters.auxiliary.lazy_imports.lazy_import_psycopg2()
- ckanapi_harvesters.auxiliary.lazy_imports.lazy_import_pymongo()
- ckanapi_harvesters.auxiliary.lazy_imports.lazy_import_pyproj()
- ckanapi_harvesters.auxiliary.lazy_imports.lazy_import_shapely()
- ckanapi_harvesters.auxiliary.lazy_imports.lazy_import_sqlalchemy()
- ckanapi_harvesters.auxiliary.lazy_imports.lazy_import_ssh_tunnel_SSHTunnelForwarder()
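The general pattern behind lazy import helpers like those listed above can be sketched as follows. The helper `lazy_import`, its `install_hint` parameter and the error message are hypothetical, and `json` is used only so that the example runs without optional dependencies.

```python
import importlib
from types import ModuleType


def lazy_import(module_name: str, install_hint: str = None) -> ModuleType:
    """Hypothetical helper: import a module on first use."""
    try:
        # Loaded modules are cached in sys.modules, so repeated calls are cheap.
        return importlib.import_module(module_name)
    except ImportError as e:
        hint = install_hint or module_name
        raise ImportError(
            f"Optional dependency missing; try: pip install {hint}") from e


def lazy_import_json() -> ModuleType:
    # One thin wrapper per optional dependency, mirroring the naming above.
    return lazy_import("json")
```

Deferring the import to the first call keeps package start-up fast and lets users who never touch, say, the geospatial features skip installing those dependencies.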
ckanapi_harvesters.auxiliary.list_records module
Give partial DataFrame behavior to a list of dictionaries
- class ckanapi_harvesters.auxiliary.list_records.ListRecords(*args, **kwargs)
Bases:
list
Give partial DataFrame behavior to a list of dictionaries
- copy() ListRecords
Return a shallow copy of the list.
- property iloc
- ckanapi_harvesters.auxiliary.list_records.records_to_df(records: List[dict] | ListRecords, df_args: dict = None, *, missing_value='', none_value='None') DataFrame
Keep the source values (with limited type inference) and fill cells whose keys are missing with a fixed value. None values are represented by none_value so that they remain distinguishable.
- Parameters:
records – input data
df_args – arguments to pass to DataFrame constructor
missing_value – value to set if a column is not specified on a row.
none_value – value to set if a value is None in the input data.
- Returns:
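The filling rules described for `records_to_df` can be illustrated without pandas. The function `fill_records` below is a hypothetical pre-processing step that applies `missing_value` and `none_value` to a list of dictionaries. It is a sketch of the documented behavior under that reading, not the library's implementation.

```python
from typing import Any, Dict, List


def fill_records(records: List[dict], *, missing_value: Any = "",
                 none_value: Any = "None") -> List[Dict[str, Any]]:
    """Hypothetical pre-processing step before building a DataFrame."""
    # Collect column names in order of first appearance.
    columns: List[str] = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    filled = []
    for record in records:
        row = {}
        for key in columns:
            if key not in record:
                row[key] = missing_value   # key absent from this row
            elif record[key] is None:
                row[key] = none_value      # explicit None in the input
            else:
                row[key] = record[key]     # keep the source value untouched
        filled.append(row)
    return filled


rows = fill_records([{"a": 1, "b": None}, {"a": 2, "c": "x"}])
# rows == [{"a": 1, "b": "None", "c": ""}, {"a": 2, "b": "", "c": "x"}]
```

A list normalized this way has the same keys on every row, so a DataFrame built from it would contain no cells left to be filled with NaN by pandas.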
ckanapi_harvesters.auxiliary.login module
Methods to load authentication credentials (user, password)
- class ckanapi_harvesters.auxiliary.login.Login(username: str = None, password: str = None, login_file: str = None, auto_load: bool = True, login_file_environ: str = None)
Bases:
object
- LOGIN_FILE_ENVIRON: str | None = None
- clear() None
- copy(*, dest=None)
- get_default_login_file() str | None
- input()
Prompt the user to input the login credentials in the console window.
- Returns:
- is_empty()
- load_from_environ(*, error_not_found: bool = False, empty_warning: bool = True) bool
Load login from environment variables, by order of priority:
LOGIN_FILE_ENVIRON: path to a file containing the username, password
- Parameters:
error_not_found – raise an error if the login file was not found
- Returns:
- load_from_file(login_file: str = None, *, base_dir: str = None, error_not_found: bool = True) bool
Load the credentials from file. The file should contain username in first line and password in second line.
- Parameters:
login_file – path to the login file. The following keywords are accepted: - “environ”: the login file will be looked up in the environment variable with load_from_environ
base_dir – base directory used to find the login file, if a relative path is provided
error_not_found – option to raise an exception if the login file is not found
- Returns:
- property password: str | None
- to_dict() Dict[str, str]
- to_tuple() Tuple[str, str]
- property username: str | None
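The file layout expected by `load_from_file` (username on the first line, password on the second) can be read with a few lines of standard-library code. The function `read_login_file` is hypothetical, and stripping whitespace from both values is an assumption.

```python
import os
from typing import Optional, Tuple


def read_login_file(login_file: str,
                    base_dir: Optional[str] = None) -> Tuple[str, str]:
    """Hypothetical reader: username on line 1, password on line 2."""
    path = login_file
    if base_dir is not None and not os.path.isabs(login_file):
        path = os.path.join(base_dir, login_file)
    with open(path, "r", encoding="utf-8") as f:
        lines = f.read().splitlines()
    if len(lines) < 2:
        raise ValueError("Expected username and password on separate lines")
    # Strip surrounding whitespace (an assumption; the library may keep it).
    return lines[0].strip(), lines[1].strip()
```

Keeping credentials in a separate file, referenced by path or through an environment variable, avoids hard-coding them in scripts and notebooks.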
ckanapi_harvesters.auxiliary.path module
Extensions of os.path and operations on urls
- exception ckanapi_harvesters.auxiliary.path.AbsolutePathError(field: str, path: str)
Bases:
Exception
- exception ckanapi_harvesters.auxiliary.path.BaseDirUndefError(path: str)
Bases:
Exception
- ckanapi_harvesters.auxiliary.path.glob_name(glob_str: str)
Extract file name glob from a glob string (last element of path, except if it is “**”)
- Parameters:
glob_str
- Returns:
- ckanapi_harvesters.auxiliary.path.glob_rm_glob(glob_str: str, *, default_rec_dir: str = None) str
Extract directory name from a glob string (first elements of path without glob characters).
- Parameters:
glob_str – the glob string
default_rec_dir – if the last removed element is “**” (directory recursion), the name of the directory to use instead
- Returns:
a path without glob characters
Examples:
>>> glob_rm_glob("test*.csv")
'test'
>>> glob_rm_glob("**\*.csv", default_rec_dir="hello")
'hello'
- ckanapi_harvesters.auxiliary.path.list_files_scandir(path: str) List[str]
- ckanapi_harvesters.auxiliary.path.make_path_relative(path: str, to_base_dir: str = None, *, default_value: str = None, source_string: str = None, keyword_exceptions: Set[str] = None, same_destination: bool = True) str
When a file is saved to a new location, make relative paths relative to the new file location so that they still point to the same destination. If same_destination is False, source_string is used instead, provided it is present and is a relative path. The source_string is the path as it appears in the original document.
- Parameters:
path – full file path (absolute, ideally output from path_rel_to_dir)
to_base_dir – the new base directory, to derive the relative paths from
default_value – the value to return if the path is None
source_string – string representing the path in the original document, without any treatments
keyword_exceptions – keywords to return as-is
- Returns:
path relative to to_base_dir or keyword/path relative to environment variable/home directory symbol (~)
- ckanapi_harvesters.auxiliary.path.path_rel_to_dir(path: str | None, base_dir: str = None, *, keyword_exceptions: Set[str] = None, error_base_dir_undef: bool = False, default_value: str = None, only_relative: bool = False, abs_error: bool = False, field: str = None) str | None
Returns the absolute path. If relative, the base directory can be specified. If not specified, the cwd is used.
- Parameters:
path – original path string
base_dir – the base directory, for relative paths if provided (default = cwd)
keyword_exceptions – some values are not replaced and must be treated after this function call.
error_base_dir_undef – Option to raise an error if no base_dir was provided (cwd is used by default).
default_value – the value to return if path is None.
only_relative – If set to True, a warning or error message is raised if an absolute path is provided.
abs_error – Condition to choose between a warning or an error message.
field – name of the field for the error message.
- Returns:
absolute path or keyword
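The core of the resolution logic described for `path_rel_to_dir` can be sketched with `os.path`. The function `resolve_path` is a hypothetical simplification: it covers `base_dir`, `keyword_exceptions` and `default_value` only, and leaves out the warning and error options as well as any handling of environment variables or `~`.

```python
import os
from typing import Optional, Set


def resolve_path(path: Optional[str], base_dir: Optional[str] = None, *,
                 keyword_exceptions: Optional[Set[str]] = None,
                 default_value: Optional[str] = None) -> Optional[str]:
    """Hypothetical simplification of the behavior described above."""
    if path is None:
        return default_value
    if keyword_exceptions and path in keyword_exceptions:
        return path  # keywords are returned as-is for later treatment
    if os.path.isabs(path):
        return os.path.normpath(path)
    if base_dir is None:
        base_dir = os.getcwd()  # documented default
    return os.path.abspath(os.path.join(base_dir, path))
```

Resolving every path against an explicit base directory, typically the directory of the configuration file that mentions it, makes the result independent of where the script is launched from.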
- ckanapi_harvesters.auxiliary.path.resolve_rel_path(base_dir: str, rel_path: str, *args: str, field: str, only_relative: bool = True) str
Alias of path_rel_to_dir, with an argument order similar to os.path.join and a requirement that the path be relative. The relative path verification can be disabled by calling unlock_relative_path_constraint.
- Parameters:
field – name of the field for the error message.
- Returns:
ckanapi_harvesters.auxiliary.proxy_config module
Setting the proxy from simple command line arguments
- exception ckanapi_harvesters.auxiliary.proxy_config.HttpsProxyDefError
Bases:
Exception
- class ckanapi_harvesters.auxiliary.proxy_config.ProxyConfig(proxy_string: str | dict = None, default_proxies: dict = None, proxy_headers: dict = None, proxy_auth: AuthBase | Tuple[str, str] = None)
Bases:
object
- __init__(proxy_string: str | dict = None, default_proxies: dict = None, proxy_headers: dict = None, proxy_auth: AuthBase | Tuple[str, str] = None) None
- Parameters:
proxy_string – string or proxies dict or ProxyConfig object.
- If a string is provided, it must be an url to a proxy or one of the following values:
“environ”: use the proxies specified in the environment variables “http_proxy” and “https_proxy”
“noproxy”: do not use any proxies
“unspecified”: do not specify the proxies
“default”: use value provided by default_proxies
- Parameters:
default_proxies – proxies used if proxy_string="default"
proxy_headers – headers used to access the proxies, generally for authentication
- static _setup_cli_proxy_parser(parser: ArgumentParser = None) ArgumentParser
Define or add CLI arguments to initialize the proxy. Parser help message:
Proxy parameters initialization
- options:
- -h, --help
show this help message and exit
- --proxy PROXY
Proxy for HTTP and HTTPS
- Parameters:
parser – option to provide an existing parser to add the specific fields needed to initialize a CKAN API connection
- Returns:
- copy() ProxyConfig
- static from_cli_args(args: Namespace, *, base_dir: str = None, error_not_found: bool = True, default_proxies: dict = None, proxy_headers: dict = None) ProxyConfig
- static from_str_or_config(proxies: str | dict | ProxyConfig, *, default_proxies: dict = None, proxy_headers: dict = None) ProxyConfig
- load_proxy_auth_from_file(file_path: str, *, base_dir: str = None, error_not_found: bool = True) bool
- property proxies: dict
- property proxy_auth: AuthBase | Tuple[str, str]
- property proxy_string: str | dict | None
- replace_default_proxy(default_proxies: dict) None
- reset() None
- ckanapi_harvesters.auxiliary.proxy_config.get_proxies_from_environ() dict
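The keyword values accepted by `proxy_string` can be pictured as a small dispatch that produces a proxies dictionary in the style used by the `requests` library. The functions below are hypothetical stand-ins. Representing "noproxy" with empty strings and "unspecified" with `None` is an assumption about a reasonable mapping, not a description of `ProxyConfig` internals.

```python
import os
from typing import Dict, Optional, Union


def get_proxies_from_environ_sketch() -> Dict[str, str]:
    # Read the "http_proxy" and "https_proxy" environment variables.
    proxies = {}
    for scheme in ("http", "https"):
        value = os.environ.get(f"{scheme}_proxy")
        if value:
            proxies[scheme] = value
    return proxies


def resolve_proxies(proxy_string: Union[str, dict, None],
                    default_proxies: Optional[dict] = None) -> Optional[dict]:
    if proxy_string is None or proxy_string == "unspecified":
        return None  # leave the decision to the HTTP client
    if isinstance(proxy_string, dict):
        return dict(proxy_string)
    if proxy_string == "environ":
        return get_proxies_from_environ_sketch()
    if proxy_string == "noproxy":
        # Assumption: empty strings disable proxies for both schemes.
        return {"http": "", "https": ""}
    if proxy_string == "default":
        return dict(default_proxies or {})
    # Otherwise treat the string as a proxy URL for HTTP and HTTPS.
    return {"http": proxy_string, "https": proxy_string}
```

Accepting a handful of keywords next to a plain URL is what allows a single `--proxy` command line argument to cover the common cases.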
ckanapi_harvesters.auxiliary.ssh_tunnel module
Class to parameterize and establish an SSH tunnel to a distant server
- class ckanapi_harvesters.auxiliary.ssh_tunnel.SshLogin(username: str = None, password: str = None, login_file: str = None, auto_load: bool = True, login_file_environ: str = None)
Bases:
Login
- LOGIN_FILE_ENVIRON: str | None = 'SSH_AUTH_FILE'
- input()
Prompt the user to input the login credentials in the console window.
- Returns:
- class ckanapi_harvesters.auxiliary.ssh_tunnel.SshTunnel(*, remote_host: str = None, remote_port: int = None, ssh_host: str = None, ssh_port: int = None, ssh_login: SshLogin = None, ssh_login_file: str = None, ssh_pkey_file: str = None, proxy: ProxyConfig = None)
Bases:
object
- __init__(*, remote_host: str = None, remote_port: int = None, ssh_host: str = None, ssh_port: int = None, ssh_login: SshLogin = None, ssh_login_file: str = None, ssh_pkey_file: str = None, proxy: ProxyConfig = None) None
SSH Tunnel parameterization functions.
SSH remote is to be configured by the caller. The other attributes can be configured by the CLI.
- Parameters:
remote_host – Remote bind host. This is the server-side service that is not exposed in cleartext.
remote_port – Remote bind port.
ssh_host – Remote SSH server host.
ssh_port – Remote SSH server port.
ssh_login_file – Path to the login file used to connect to the SSH server.
ssh_pkey_file – Path to the SSH private key file.
- close_tunnel()
Close the SSH tunnel. Close any underlying connections first.
- get_tunnel_host() str
- get_tunnel_url() str
- remote_host: str
- server: None
- socks_proxy: ProxyConfig
- ssh_host: str
- start_tunnel()
ckanapi_harvesters.auxiliary.urls module
Operations on urls
- ckanapi_harvesters.auxiliary.urls.clean_base_url(url: str | None) str | None
- ckanapi_harvesters.auxiliary.urls.url_insert_login(url: str, login: Login)
Insert user authentication parameters in a url
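Inserting credentials into a URL amounts to rewriting its network location as `user:password@host`. The sketch below does this with `urllib.parse`. The function `insert_login` is hypothetical and takes plain strings, whereas `url_insert_login` takes a `Login` object. Percent-encoding the credentials is an assumption about sensible behavior.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def insert_login(url: str, username: str, password: str) -> str:
    """Hypothetical sketch: embed credentials in the URL's netloc."""
    parts = urlsplit(url)
    # Percent-encode so characters such as "@" or ":" stay unambiguous.
    userinfo = f"{quote(username, safe='')}:{quote(password, safe='')}"
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    netloc = f"{userinfo}@{host}"  # replaces any credentials already present
    return urlunsplit((parts.scheme, netloc, parts.path,
                       parts.query, parts.fragment))


print(insert_login("https://ckan.example.org:8443/api/3?x=1", "alice", "p@ss:w"))
# prints "https://alice:p%40ss%3Aw@ckan.example.org:8443/api/3?x=1"
```

URLs built this way carry the password in clear text, so they should be kept out of logs and error messages.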
- ckanapi_harvesters.auxiliary.urls.url_join(base: str, *args: str) str
Module contents
Package with helper functions for CKAN requests using pandas DataFrames.