ckanapi_harvesters.auxiliary package

Submodules

ckanapi_harvesters.auxiliary.ckan_action module

Action response common treatments

exception ckanapi_harvesters.auxiliary.ckan_action.CkanActionError(ckan, response: CkanActionResponse, display_request: bool = True)

Bases: Exception

class ckanapi_harvesters.auxiliary.ckan_action.CkanActionResponse(response: Response, dry_run: bool = False)

Bases: object

Class which decodes and checks the response of a CKAN request

default_error(ckan) CkanActionError

Raise specific error codes depending on response
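A minimal sketch of how a response could be mapped onto the exception classes listed in this module. The status-code mapping and every name ending in `Sketch` or `_sketch` are assumptions made for illustration; the package's actual logic may inspect other parts of the response.

```python
class ActionErrorSketch(Exception):
    """Hypothetical stand-in for CkanActionError."""


class AuthorizationErrorSketch(ActionErrorSketch):
    """Hypothetical stand-in for CkanAuthorizationError."""


class NotFoundErrorSketch(ActionErrorSketch):
    """Hypothetical stand-in for CkanNotFoundError."""


def default_error_sketch(status_code: int, message: str) -> ActionErrorSketch:
    # Assumed mapping: 403 -> authorization problem, 404 -> missing object.
    if status_code == 403:
        return AuthorizationErrorSketch(message)
    if status_code == 404:
        return NotFoundErrorSketch(message)
    # Anything else falls back to the generic action error.
    return ActionErrorSketch(message)
```

Because the specific errors derive from a common base, a caller can catch the base class to handle every action failure, or a subclass to handle one case only.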

exception ckanapi_harvesters.auxiliary.ckan_action.CkanAuthorizationError(ckan, response: CkanActionResponse, display_request: bool = True)

Bases: CkanActionError

exception ckanapi_harvesters.auxiliary.ckan_action.CkanNotFoundError(ckan, object_type: str, response: CkanActionResponse, display_request: bool = True)

Bases: CkanActionError

exception ckanapi_harvesters.auxiliary.ckan_action.CkanSqlCapabilityError(ckan, response: CkanActionResponse, display_request: bool = True)

Bases: CkanActionError

ckanapi_harvesters.auxiliary.ckan_api_key module

Methods to load an API key

class ckanapi_harvesters.auxiliary.ckan_api_key.ApiKey(remote_url: str = None, *, apikey: str = None, apikey_file: str = None, api_key_header_name: str | Iterable[str] = None)

Bases: object

API key storage class.

__init__(remote_url: str = None, *, apikey: str = None, apikey_file: str = None, api_key_header_name: str | Iterable[str] = None)

CKAN Database API key storage class.

Parameters:
  • apikey – way to provide the API key directly (optional)

  • apikey_file – path to a file containing a valid API key in the first line of text (optional)

apply_constraints(*, auto_clear: bool = True, raise_error: bool = False) bool
clear() None
copy(*, dest=None)
get_auth_header() Dict[str, str]

Returns the correct header with the API key for the requests needing it. If no API key was loaded, returns an empty dictionary.
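The behaviour described above can be sketched as follows. The function name and the choice to send the same key under every configured header name are assumptions, not the package's confirmed implementation.

```python
from typing import Dict, Iterable, Optional, Union


def auth_header_sketch(
    apikey: Optional[str],
    header_names: Union[str, Iterable[str]] = "Authorization",
) -> Dict[str, str]:
    # No key loaded: return an empty dict so it can be merged into headers safely.
    if not apikey:
        return {}
    if isinstance(header_names, str):
        header_names = [header_names]
    # Assumption: the same key is sent under every configured header name.
    return {name: apikey for name in header_names}
```

The result can be merged into an existing header dictionary, for example `headers.update(auth_header_sketch(key))`.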

input()

Prompt the user to input the API key in the console window.

Returns:

is_empty()
load_apikey(apikey_file: str = None, *, base_dir: str = None, error_not_found: bool = True) bool

Load the API key from file. The file should contain a valid API key in the first line of text.

Parameters:
  • apikey_file – path to the API key file. The following keywords are accepted:
      - “environ”: the API key will be looked up in the environment variables with load_from_environ

  • base_dir – base directory to find the API key file, if a relative path is provided

  • error_not_found – option to raise an exception if the API key file is not found

Returns:

load_from_environ(*, error_not_found: bool = False) bool

Load CKAN API key from environment variables, by order of priority:

By default, no environment variables are used.

print_help_cli(display: bool = True) str
property remote_url: str
property remote_url_constraint: str | None
property value: str | None
class ckanapi_harvesters.auxiliary.ckan_api_key.CkanApiKey(apikey: str = None, *, remote_url: str = None, apikey_file: str = None, apikey_auto_load: bool = True)

Bases: ApiKey

CKAN Database API key storage class.

API_KEY_FILE_DEFAULT_LIST = ['/home/runner/.config/__CKAN_API_KEY__.txt', '/home/runner/.ckan/__CKAN_API_KEY__.txt']
API_KEY_FILE_ENVIRON = 'CKAN_API_KEY_FILE'
CKAN_API_KEY_ENVIRON = 'CKAN_API_KEY'
CKAN_API_KEY_HEADER_NAME = {'Authorization', 'X-CKAN-API-Key'}
__init__(apikey: str = None, *, remote_url: str = None, apikey_file: str = None, apikey_auto_load: bool = True)

CKAN Database API key storage class.

Parameters:
  • apikey – way to provide the API key directly (optional)

  • apikey_file – path to a file containing a valid API key in the first line of text (optional)

  • apikey_auto_load – option to automatically load the API key from file or environment variables

Order of priority:

  1. Value of apikey

  2. Contents of the file pointed to by apikey_file

  3. Value of environment variable CKAN_API_KEY

  4. Contents of the file pointed to by the environment variable CKAN_API_KEY_FILE

  5. Contents of the file at the default location: ~/.config/__CKAN_API_KEY__.txt or ~/.ckan/__CKAN_API_KEY__.txt
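The order of priority above can be sketched with the standard library only. The function names, the `default_files` argument and the silent fall-through when a file is missing are assumptions for illustration; the real class may raise an error instead.

```python
import os
from pathlib import Path
from typing import Mapping, Optional, Sequence


def read_first_line(path: str) -> Optional[str]:
    """Return the stripped first line of a file, or None if missing or empty."""
    p = Path(path).expanduser()
    if not p.is_file():
        return None
    with p.open("r", encoding="utf-8") as f:
        line = f.readline().strip()
    return line or None


def resolve_api_key_sketch(
    apikey: Optional[str] = None,
    apikey_file: Optional[str] = None,
    default_files: Sequence[str] = (),
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    # 1. Value given directly.
    if apikey:
        return apikey
    # 2. File given directly.
    if apikey_file:
        key = read_first_line(apikey_file)
        if key:
            return key
    # 3. Raw key in the environment.
    key = environ.get("CKAN_API_KEY")
    if key:
        return key
    # 4. File named by the environment.
    env_file = environ.get("CKAN_API_KEY_FILE")
    if env_file:
        key = read_first_line(env_file)
        if key:
            return key
    # 5. Default locations, tried in order.
    for path in default_files:
        key = read_first_line(path)
        if key:
            return key
    return None
```

Passing `environ` explicitly keeps the sketch testable without modifying the process environment.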

copy(*, dest=None) CkanApiKey
static get_default_apikey_file() str | None
input()

Prompt the user to input the API key in the console window.

Returns:

load_from_environ(*, error_not_found: bool = False, empty_warning: bool = True) bool

Load CKAN API key from environment variables, by order of priority:

  • CKAN_API_KEY: for the raw API key (storing an API key in an environment variable is not recommended)

  • CKAN_API_KEY_FILE: path to a file containing a valid API key in the first line of text

Parameters:

error_not_found – raise an error if the API key file was not found

Returns:

ckanapi_harvesters.auxiliary.ckan_auxiliary module

Data model to represent a CKAN database architecture

class ckanapi_harvesters.auxiliary.ckan_auxiliary.CkanFieldInternalAttrs

Bases: object

Custom information for internal use

copy() CkanFieldInternalAttrs
init_from_native_type(native_type: str) None
init_from_options_string(options_string: str) None
merge(new_values: CkanFieldInternalAttrs) CkanFieldInternalAttrs
print_help_cli(display: bool = True) str
update_from_ckan(ckan)
class ckanapi_harvesters.auxiliary.ckan_auxiliary.CkanIdFieldTreatment(*values)

Bases: IntEnum

Keep = 0
Remove = 2
SetIndex = 1
class ckanapi_harvesters.auxiliary.ckan_auxiliary.FileChunkDataFrame(df: ListRecords | DataFrame | None, file_path: str, file_index: int, chunk_index: int, file_position: int, read_line_counter: int)

Bases: object

Class to hold a chunk of a DataFrame of a file (only for DataStores), with the file name, index and an indication of the position in the file

__init__(df: ListRecords | DataFrame | None, file_path: str, file_index: int, chunk_index: int, file_position: int, read_line_counter: int) None
Parameters:
  • df – the data of the file chunk (leave None if not loaded)

  • file_path – the path to the source

  • file_index – the index of the file in the list

  • chunk_index – counter of chunks read within the file

  • file_position – the position within the file itself (approximation)
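A sketch of how such chunks could be produced while reading a CSV source. The dataclass mirrors the documented fields, but `FileChunkSketch`, `iter_csv_chunks` and the use of plain record lists instead of a DataFrame are assumptions for illustration.

```python
import csv
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class FileChunkSketch:
    records: List[Dict[str, str]]  # plays the role of df
    file_path: str
    file_index: int
    chunk_index: int
    file_position: int  # characters consumed so far
    read_line_counter: int  # data rows read so far (header excluded)


def iter_csv_chunks(
    text: str, file_path: str, file_index: int = 0, chunk_size: int = 2
) -> Iterator[FileChunkSketch]:
    position = 0

    def tracked_lines() -> Iterator[str]:
        # Feed lines to the csv reader while counting consumed characters.
        nonlocal position
        for line in text.splitlines(keepends=True):
            position += len(line)
            yield line

    reader = csv.DictReader(tracked_lines())
    records: List[Dict[str, str]] = []
    lines_read = 0
    chunk_index = 0
    for row in reader:
        records.append(row)
        lines_read += 1
        if len(records) == chunk_size:
            yield FileChunkSketch(records, file_path, file_index, chunk_index, position, lines_read)
            records = []
            chunk_index += 1
    if records:
        # Last, possibly shorter, chunk.
        yield FileChunkSketch(records, file_path, file_index, chunk_index, position, lines_read)
```

Carrying the position and line counter with each chunk lets a caller report progress or resume an interrupted upload without re-reading the file from the start.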

chunk_index: int
df: ListRecords | DataFrame | None
file_index: int
file_path: str
file_position: int
is_first_chunk: bool
read_line_counter: int
class ckanapi_harvesters.auxiliary.ckan_auxiliary.RequestType(*values)

Bases: IntEnum

Get = 1
Post = 2
ckanapi_harvesters.auxiliary.ckan_auxiliary.assert_or_raise(condition: bool, e: Exception) None
ckanapi_harvesters.auxiliary.ckan_auxiliary.bytes_to_megabytes(size_bytes: int | None) float | None
ckanapi_harvesters.auxiliary.ckan_auxiliary.ca_arg_to_str(ca_cert: bool | str | None, base_dir: str = None, source_string: str = None) str | None
ckanapi_harvesters.auxiliary.ckan_auxiliary.ca_file_rel_to_dir(ca_file: str | None, base_dir: str = None) Tuple[bool | str | None, str | None]
ckanapi_harvesters.auxiliary.ckan_auxiliary.dict_recursive_update(d: dict, u: dict) dict
ckanapi_harvesters.auxiliary.ckan_auxiliary.empty_str_to_None(value: str | None) str | None
ckanapi_harvesters.auxiliary.ckan_auxiliary.find_duplicates(list_str: Iterable) list
ckanapi_harvesters.auxiliary.ckan_auxiliary.import_args_kwargs_dict(args_kwargs: str) dict
ckanapi_harvesters.auxiliary.ckan_auxiliary.json_encode_params(params: dict) Tuple[str, dict]

For upload requests with a records field, the params must be passed in the data argument of requests instead of the json argument: NaN values, if present, are not supported by the requests encoder.

Requirement: add headers=json_headers.
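One possible way to pre-encode such parameters is sketched below. Replacing NaN with null, the helper names and the returned header dictionary are all assumptions; the documented function returns a `Tuple[str, dict]` whose exact contents are not described here.

```python
import json
import math
from typing import Any, Dict, Tuple

# Assumed equivalent of json_headers.
JSON_HEADERS_SKETCH = {"Content-Type": "application/json"}


def _nan_to_none(value: Any) -> Any:
    # Walk the structure and replace float NaN, which is not valid JSON.
    if isinstance(value, float) and math.isnan(value):
        return None
    if isinstance(value, dict):
        return {k: _nan_to_none(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [_nan_to_none(v) for v in value]
    return value


def json_encode_params_sketch(params: dict) -> Tuple[str, Dict[str, str]]:
    # allow_nan=False makes json.dumps fail loudly if a NaN slipped through.
    body = json.dumps(_nan_to_none(params), allow_nan=False)
    return body, dict(JSON_HEADERS_SKETCH)
```

The encoded string would then be sent with `requests.post(url, data=body, headers=headers)` rather than through the `json` argument.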

Parameters:

params

Returns:

ckanapi_harvesters.auxiliary.ckan_auxiliary.parse_geometry_native_type(geometry_type: str) Tuple[str, int]
ckanapi_harvesters.auxiliary.ckan_auxiliary.requests_multipart_data(json_dict: dict, files: dict) dict

Generate the multipart data for a request containing JSON and a file. Used to fill the files argument of requests.post. json_headers must not be used with this function.

Parameters:
  • json_dict

  • files

Returns:

ckanapi_harvesters.auxiliary.ckan_auxiliary.sql_varname_escape(var_name: str) str
ckanapi_harvesters.auxiliary.ckan_auxiliary.ssl_arguments_decompose(ca_cert: bool | str | None, *, default_ca_verify: bool = True) Tuple[bool, str | None]

Decompose the requests argument verify into a boolean and a path to a certificate file.

Parameters:
  • ca_cert

  • default_ca_verify – option to indicate if SSL should be enabled if ca_cert is None

Returns:

Tuple ca_verify, ssl_server_certfile
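A minimal sketch of this decomposition, assuming the usual semantics of the `verify` argument of requests (a boolean, or a path to a CA bundle). The function name is hypothetical and the real implementation may apply additional checks.

```python
from typing import Optional, Tuple, Union


def ssl_arguments_decompose_sketch(
    ca_cert: Union[bool, str, None], *, default_ca_verify: bool = True
) -> Tuple[bool, Optional[str]]:
    if ca_cert is None:
        # Nothing specified: fall back to the default verification setting.
        return default_ca_verify, None
    if isinstance(ca_cert, bool):
        # Plain on/off switch, no custom certificate file.
        return ca_cert, None
    # A string is treated as a certificate path, which implies verification.
    return True, ca_cert
```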

ckanapi_harvesters.auxiliary.ckan_auxiliary.str_is_not_empty(value: str | None) bool
ckanapi_harvesters.auxiliary.ckan_auxiliary.str_to_python_value(value: str) Any
ckanapi_harvesters.auxiliary.ckan_auxiliary.to_jsons_indent_lists_single_line(obj, *args, reduced_size: bool = False, **kwargs) str

Modified json representation of an object. Lists with strings / integers are displayed on one line.
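The idea can be sketched with a small recursive encoder. This is an illustration under assumptions (hypothetical function name, string keys only), not the package's implementation, which forwards its arguments to json.dumps().

```python
import json
from typing import Any


def to_json_compact_lists(obj: Any, indent: int = 2, _level: int = 0) -> str:
    pad = " " * (indent * (_level + 1))
    end_pad = " " * (indent * _level)
    if isinstance(obj, dict):
        if not obj:
            return "{}"
        items = [
            f"{pad}{json.dumps(str(k))}: {to_json_compact_lists(v, indent, _level + 1)}"
            for k, v in obj.items()
        ]
        return "{\n" + ",\n".join(items) + "\n" + end_pad + "}"
    if isinstance(obj, (list, tuple)):
        # Lists holding only scalars are written on a single line.
        if all(isinstance(v, (str, int, float, bool)) or v is None for v in obj):
            return "[" + ", ".join(json.dumps(v) for v in obj) + "]"
        items = [pad + to_json_compact_lists(v, indent, _level + 1) for v in obj]
        return "[\n" + ",\n".join(items) + "\n" + end_pad + "]"
    return json.dumps(obj)
```

The output remains valid JSON, so it can be read back with `json.loads`.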

Parameters:
  • obj – object to encode

  • args – args to pass to json.dumps()

  • reduced_size – option to not indent the json output (not human-readable)

  • kwargs – kwargs to pass to json.dumps()

Returns:

ckanapi_harvesters.auxiliary.ckan_auxiliary.upload_prepare_requests_files_arg(*, files: dict = None, file_path: str = None, df: DataFrame = None, payload: bytes | BufferedIOBase = None, payload_name: str = None) dict

Create files argument for requests.post, by order of priority:

Parameters:
  • files – files pass through argument to the requests.post function. Use to send other data formats.

  • payload – bytes to upload as a file

  • payload_name – name of the payload to use (associated with the payload argument) - this determines the format recognized in CKAN viewers.

  • file_path – path of the file to transmit (binary and text files are supported here)

  • df – pandas DataFrame to replace resource

Returns:
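A sketch of such a priority scheme is given below. The order is assumed to follow the parameter listing above, the DataFrame branch is left out to avoid a pandas dependency, and the function name and the `payload.bin` fallback name are hypothetical. The `upload` field name is the one used by CKAN resource actions.

```python
import os
from typing import Optional


def prepare_files_arg_sketch(
    *,
    files: Optional[dict] = None,
    payload: Optional[bytes] = None,
    payload_name: Optional[str] = None,
    file_path: Optional[str] = None,
) -> dict:
    # 1. Pass-through: the caller already built the files argument.
    if files is not None:
        return files
    # 2. Raw bytes, named so that CKAN can recognise the format.
    if payload is not None:
        return {"upload": (payload_name or "payload.bin", payload)}
    # 3. File on disk, read in binary mode.
    if file_path is not None:
        with open(file_path, "rb") as f:
            return {"upload": (os.path.basename(file_path), f.read())}
    raise ValueError("one of files, payload or file_path is required")
```

The returned dictionary has the `(filename, content)` tuple shape accepted by the `files` argument of `requests.post`.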

ckanapi_harvesters.auxiliary.ckan_configuration module

Parameters which apply to the package

ckanapi_harvesters.auxiliary.ckan_configuration.unlock_external_url_resource_download(value: bool = True) None

This function enables the download of resources hosted outside the CKAN server.
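The unlock pattern can be sketched as a package-level switch that a download routine checks before fetching a URL. All names below are hypothetical, and the assumption that the feature is locked by default is inferred from the function name only.

```python
# Package-level switches; locked by default in this sketch.
_LOCKS = {"external_url_download": False}


class ExternalUrlLockedSketch(Exception):
    """Hypothetical stand-in for ExternalUrlLockedError."""


def unlock_external_download_sketch(value: bool = True) -> None:
    _LOCKS["external_url_download"] = value


def check_download_allowed(url: str, ckan_base_url: str) -> None:
    # URLs served by the CKAN server itself are always allowed.
    if url.startswith(ckan_base_url):
        return
    if not _LOCKS["external_url_download"]:
        raise ExternalUrlLockedSketch(url)
```

Keeping the switch explicit means a script has to opt in before any request leaves the CKAN host.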

Returns:

ckanapi_harvesters.auxiliary.ckan_configuration.unlock_no_server_ca(value: bool = True) None

This function enables you to disable the CA verification of the CKAN server.

Warning: only allow this in a local environment!

Returns:

ckanapi_harvesters.auxiliary.ckan_defs module

Data model to represent a CKAN database architecture

ckanapi_harvesters.auxiliary.ckan_errors module

CKAN error types

exception ckanapi_harvesters.auxiliary.ckan_errors.AdminFeatureLockedError

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.ApiKeyFileError

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.ArgumentError

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.CkanArgumentError(api_name: str, argument_name: str)

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.CkanMandatoryArgumentError(action_name: str, attribute_name: str)

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.CkanServerError(ckan, response: Response, msg: str, display_request: bool = True)

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.DataStoreNotFoundError(resource_id: str, error_message: str)

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.DuplicateNameError(object_type: str, names: Iterable[str])

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.ExternalUrlLockedError(url: str)

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.FileFormatRequirementError(requirement: str, file_format: str)

Bases: RequirementError

exception ckanapi_harvesters.auxiliary.ckan_errors.FileOrDirNotExistError(path: str)

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.ForbiddenNameError(object_type: str, names: Iterable[str])

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.FunctionMissingArgumentError(function_name: str, argument_name: str)

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.HostContraintError(host_url: str, url: str)

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.HttpRetryCodeError(status_code: int, description: str = None)

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.IncompletePatchError

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.IntegrityError

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.InvalidParameterError

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.LoginFileError

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.MandatoryAttributeError(object_type: str, attribute_name: str)

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.MaxAttemptsError(accumulated_traceback: List[str])

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.MaxRequestsCountError

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.MissingCodeFileError

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.MissingIOFunctionError(function_type: str)

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.MissingIdError(object_type: str, object_name)

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.MultipleErrors(errors: List[Exception])

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.NameFormatError

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.NoCAVerificationError

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.NoDefaultView(resource_format: str)

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.NotMappedObjectNameError

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.ReadOnlyError

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.RequestError

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.RequirementError(requirement: str, function: str)

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.SearchAllNoCountsError(api_name: str, argument_name_value: str = None)

Bases: ArgumentError

exception ckanapi_harvesters.auxiliary.ckan_errors.UnexpectedError

Bases: RuntimeError

exception ckanapi_harvesters.auxiliary.ckan_errors.UnknownCliArgumentError(extra_args: List[str], context: str)

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.UnknownTargetCRSError(source_crs, context: str)

Bases: Exception

exception ckanapi_harvesters.auxiliary.ckan_errors.UrlError

Bases: Exception

ckanapi_harvesters.auxiliary.ckan_field_types module

CKAN DataStore field types and link with PostgreSQL types

class ckanapi_harvesters.auxiliary.ckan_field_types.CkanFieldType

Bases: str, CkanFieldTypeABC

Role previously managed by CkanFieldTypeEnum, but accepts any string

static from_str(s)
class ckanapi_harvesters.auxiliary.ckan_field_types.CkanFieldTypeABC

Bases: ABC

abstractmethod static from_str(s)
class ckanapi_harvesters.auxiliary.ckan_field_types.CkanFieldTypeEnum(*values)

Bases: IntEnum

Enumeration of types encountered during development + documentation

Numeric = 2
Text = 1
TimeStamp = 3
bigint = 102
bigserial = 151
bit = 200
bool = 15
box = 223
bson = 230
bytea = 204
char = 201
cidr = 156
circle = 224
date = 11
float = 14
float4 = 106
float8 = 107
static from_str(s)
geometry = 228
inet = 155
int = 13
int16 = 103
int2 = 101
int32 = 104
int4 = 13
int64 = 105
int8 = 102
integer = 13
json = 30
jsonb = 31
line = 226
lseg = 225
macaddr = 154
money = 108
oid = 153
path = 221
point = 220
polygon = 222
serial = 150
serial4 = 150
serial8 = 151
time = 12
timestamptz = 21
timetz = 20
uuid = 152
varbit = 202
varchar = 203
xml = 32
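Several members above share a value (for example int, int4 and integer are all 13). In a Python enumeration, members with equal values are aliases of the first one defined. The sketch below shows this with a hypothetical reduced enumeration; which name is canonical in the real class depends on the definition order in the source, which this alphabetical listing does not show.

```python
from enum import IntEnum


class FieldTypeSketch(IntEnum):
    integer = 13  # first definition: canonical member
    int4 = 13     # same value: alias of integer
    bigint = 102
    int8 = 102    # alias of bigint


def field_type_from_str(s: str) -> FieldTypeSketch:
    # Hypothetical lookup by member name; aliases resolve to the canonical member.
    return FieldTypeSketch[s.strip()]
```

Iterating over the enumeration yields canonical members only, so aliases do not appear twice.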

ckanapi_harvesters.auxiliary.ckan_map module

Data model to represent a CKAN database architecture

class ckanapi_harvesters.auxiliary.ckan_map.CkanMap

Bases: CkanMapABC

Class to store an image of the CKAN database architecture. Auxiliary class of CkanApi.

_update_datastore_info(datastore_info: CkanDataStoreInfo) None

Internal function to update the information on a DataStore without making a request.

_update_datastore_len(resource_id: str, new_len: int) None

Internal function to update the length of a DataStore without making a request.

Parameters:
  • resource_id – resource id.

  • new_len – value to replace

_update_group_info(group_info: CkanGroupInfo | List[CkanGroupInfo]) None

Internal function to update information on a group.

_update_license_info(license_info: CkanLicenseInfo | List[CkanLicenseInfo]) None

Internal function to update the information on a license.

_update_organization_info(organization_info: CkanOrganizationInfo | List[CkanOrganizationInfo]) None

Internal function to update information on an organization.

_update_package_info(package_info: CkanPackageInfo | List[CkanPackageInfo]) None

Internal function to update the information of a package.

NB: the indicator pkg_info.requested_datastore_info remains False until map_resources is called.

_update_resource_info(resource_info: CkanResourceInfo | List[CkanResourceInfo]) None

Internal function to update the information on a resource without making a request.

_update_user_info(user_info: CkanUserInfo | List[CkanUserInfo]) None

Internal function to update information on a user.

copy() CkanMap
static from_dict(d: dict) CkanMap
get_datastore_info(resource_name: str, package_name: str = None, *, error_not_mapped: bool = True) CkanDataStoreInfo | None
Parameters:
  • resource_name – resource name or id.

  • package_name – package name or id (required if resource_name is a resource name). An integrity check is performed if given.

Returns:

get_datastore_len(resource_name: str, package_name: str = None, *, error_not_mapped: bool = True) int | None

Retrieve the number of rows in a DataStore from the mapped data. This requires map_resources to have been called with the option datastore_info=True.

Parameters:
  • resource_name – resource name or id.

  • package_name – package name or id (required if resource_name is a resource name). An integrity check is performed if given.

Returns:

get_license_id(license_name: str, *, error_not_mapped: bool = True) str

Retrieve the ID of a license based on the mapped data.

Parameters:

license_name – license title or id.

Returns:

get_license_info(license_name: str, *, error_not_mapped: bool = True) CkanLicenseInfo | None

Retrieve the information on a license based on the mapped data.

Parameters:

license_name – license title or id.

Returns:

get_organization_for_owner_org(organization_name: str, *, error_not_mapped: bool = True) CkanOrganizationInfo | None

Retrieve, from the mapped data, the value usually passed as the owner_org argument for a given organization. Calls CkanOrganizationInfo.get_owner_org.

Parameters:

organization_name – organization name or id.

Returns:

get_organization_id(organization_name: str, *, error_not_mapped: bool = True, search_title: bool = True) str | None

Retrieve the organization id for a given organization name based on the mapped data.

Parameters:

organization_name – organization name, title or id.

Returns:

get_organization_info(organization_name: str, *, error_not_mapped: bool = True) CkanOrganizationInfo | None

Retrieve the organization info for a given organization name based on the mapped data.

Parameters:

organization_name – organization name or id.

Returns:

get_package_id(package_name: str, *, error_not_mapped: bool = True, search_title: bool = True) str | None

Retrieve the package id for a given package name based on the package map.
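The lookup-from-map pattern shared by the get_* methods can be sketched as follows. The class, its internal dictionaries and the exception are hypothetical; the title search enabled by search_title is left out.

```python
from typing import Dict, Optional, Set


class NotMappedSketch(KeyError):
    """Hypothetical stand-in for NotMappedObjectNameError."""


class PackageMapSketch:
    def __init__(self) -> None:
        self._ids: Set[str] = set()
        self._id_by_name: Dict[str, str] = {}

    def add(self, package_id: str, name: str) -> None:
        self._ids.add(package_id)
        self._id_by_name[name] = package_id

    def get_package_id(self, package_name: str, *, error_not_mapped: bool = True) -> Optional[str]:
        # An id is accepted as-is, a name is translated through the map.
        if package_name in self._ids:
            return package_name
        if package_name in self._id_by_name:
            return self._id_by_name[package_name]
        if error_not_mapped:
            raise NotMappedSketch(package_name)
        return None
```

No request is made: the answer depends only on what was mapped earlier, which is why an unmapped name either raises or returns None.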

Parameters:

package_name – package name or id.

Returns:

get_package_info(package_name: str, *, error_not_mapped: bool = True) CkanPackageInfo | None

Retrieve the package info for a given package name based on the package map.

Parameters:

package_name – package name or id.

Returns:

get_resource_id(resource_name: str, package_name: str = None, *, error_not_mapped: bool = True) str | None

Retrieve the resource id for a given combination of (package name and resource name) based on the package map.

Parameters:
  • resource_name – resource alias, name or id.

  • package_name – package name or id (required if resource_name is a resource name). An integrity check is performed if given.

Returns:

get_resource_info(resource_name: str, package_name: str = None, *, error_not_mapped: bool = True) CkanResourceInfo | None

Retrieve the information on a given resource.

Parameters:
  • resource_name – resource name or id.

  • package_name – package name or id (required if resource_name is a resource name). An integrity check is performed if given.

Returns:

get_resource_package_id(resource_name: str, package_name: str = None, *, error_not_mapped: bool = True) str | None

Retrieve the package id of a given resource.

Parameters:
  • resource_name – resource name or id.

  • package_name – package name or id (required if resource_name is a resource name). An integrity check is performed if given.

Returns:

purge()

Erase known package mappings.

Returns:

to_dict() dict
update_from_dict(data: dict) None
class ckanapi_harvesters.auxiliary.ckan_map.CkanMapABC

Bases: ABC

abstractmethod copy()
abstractmethod static from_dict(d: dict) CkanMap
abstractmethod purge()
abstractmethod to_dict() dict
abstractmethod update_from_dict(data: dict) None

ckanapi_harvesters.auxiliary.ckan_model module

Data model to represent a CKAN database architecture

class ckanapi_harvesters.auxiliary.ckan_model.CkanAliasInfo(d: dict = None)

Bases: object

copy() CkanAliasInfo
static from_dict(d: dict) CkanAliasInfo
to_dict(include_details: bool = True) dict
class ckanapi_harvesters.auxiliary.ckan_model.CkanCapacity(*values)

Bases: IntEnum

Admin = 3
Editor = 2
Excluded = 0
Member = 1
Public = 5
SysAdmin = 4
static from_str(s)
class ckanapi_harvesters.auxiliary.ckan_model.CkanCollaboration(capacity: CkanCapacity = None, modified: datetime = None, group_id: str = None, d: dict = None)

Bases: object

capacity: CkanCapacity
copy() CkanCollaboration
group_id: str | None
modified: datetime
to_dict(user_info: CkanUserInfo, group_table: Dict[str, CkanGroupInfo], date_format: str) dict
class ckanapi_harvesters.auxiliary.ckan_model.CkanConfigurableObjectABC

Bases: ABC

configurable_attributes: set = None
extra_attributes: set = {}
abstractmethod static get_resource_type() str
mandatory_attributes: set = None
class ckanapi_harvesters.auxiliary.ckan_model.CkanDataStoreInfo(d: dict = None)

Bases: object

aliases: List[str] | None
copy() CkanDataStoreInfo
details: dict
fields_dict: OrderedDict[str, CkanField] | None
fields_id_list: List[str]
static from_dict(d: dict) CkanDataStoreInfo
get_basic_field_list_dict()
get_original_field_list_dict()
get_recomp_field_list_dict()
index_fields: List[str]
index_size_mb: float | None
resource_id: str
row_count: int
table_size_mb: float | None
to_dict(include_details: bool = True) dict
class ckanapi_harvesters.auxiliary.ckan_model.CkanField(name: str, data_type: str, *, notes: str = None, native_type: str = None, type_override: bool = False, label: str = None)

Bases: CkanConfigurableObjectABC

Object representation of a CKAN Field configuration

configurable_attributes: set = {'label', 'name', 'notes'}
copy() CkanField
data_type: CkanFieldType | None
details: dict
static from_ckan_dict(d: dict) CkanField
static from_dict(d: dict) CkanField
static get_resource_type() str
internal_attrs: CkanFieldInternalAttrs
is_index: bool | None
label: str | None
mandatory_attributes: set = {'name'}
merge(new_values, dest: CkanField = None)
name: str
notes: str | None
notnull: bool | None
to_ckan_dict(include_details: bool = True) dict
to_dict(include_details: bool = True) dict
type_override: bool | None
uniquekey: bool | None
update_missing(other: CkanField)
class ckanapi_harvesters.auxiliary.ckan_model.CkanGroupInfo(d: dict)

Bases: object

copy() CkanGroupInfo
description: str
details: dict
static from_dict(d: dict) CkanGroupInfo
static get_resource_type() str
id: str
name: str
package_count: None | int
package_members: dict[str, CkanCapacity] | None
title: str
to_dict(include_details: bool = True) dict
user_members: dict[str, CkanCapacity] | None
class ckanapi_harvesters.auxiliary.ckan_model.CkanLicenseDomain(*values)

Bases: IntFlag

Content = 4
Data = 2
NoDomain = 0
Software = 1
static _generate_next_value_(name, start, count, last_values)

Generate the next value when not given.

name: the name of the member
start: the initial start value or None
count: the number of existing members
last_values: the last value assigned or None

static from_bool(*, domain_software: bool = False, domain_data: bool = False, domain_content: bool = False) CkanLicenseDomain
static from_dict(d: dict) CkanLicenseDomain
to_dict() dict
class ckanapi_harvesters.auxiliary.ckan_model.CkanLicenseInfo(d: dict)

Bases: object

details: dict
domain: CkanLicenseDomain
family: str
static from_dict(d: dict) CkanLicenseInfo
id: str
is_generic: bool | None
state: CkanState
title: str
to_dict(include_details: bool = True) dict
url: str
class ckanapi_harvesters.auxiliary.ckan_model.CkanOrganizationInfo(d: dict)

Bases: object

copy() CkanOrganizationInfo
static from_dict(d: dict) CkanOrganizationInfo
get_owner_org()

Returns the value used for the owner_org argument

Returns:

to_dict(include_details: bool = True) dict
user_members: None | Dict[str, CkanCapacity]
class ckanapi_harvesters.auxiliary.ckan_model.CkanPackageInfo(d: dict = None, *, package_name: str = None, package_id: str = None, title: str = None, description: str = None, private: bool = None, state: CkanState = None, version: str = None, url: str = None, tags: List[str] = None)

Bases: CkanConfigurableObjectABC

author: str | None
author_email: str | None
collaborators: None | Dict[str, CkanCollaboration]
configurable_attributes: set = {'author', 'author_email', 'description', 'maintainer', 'maintainer_email', 'name', 'private', 'state', 'title', 'version'}
copy() CkanPackageInfo
custom_fields: OrderedDict[str, str] | None
description: str | None
details: dict
extra_attributes: set = {'custom_fields', 'tags'}
static from_dict(d: dict) CkanPackageInfo
get_resource_index(resource_id: str) int
static get_resource_type() str
groups: List[CkanGroupInfo]
id: str | None
license_id: str | None
maintainer: str | None
maintainer_email: str | None
mandatory_attributes: set = {'name'}
metadata_created: datetime | None
metadata_modified: datetime | None
name: str | None
newly_created: bool
organization_info: CkanOrganizationInfo | None
package_resources: OrderedDict[str, CkanResourceInfo]
private: bool | None
requested_datastore_info: bool
resources_id_index: Dict[str, str]
resources_id_index_counts: Dict[str, int]
state: CkanState | None
tags: List[str] | None
tags_info: Dict[str, CkanTagInfo] | None
title: str | None
to_dict(include_details: bool = True) dict
update(refresh: CkanPackageInfo)
update_resource(resource_info: CkanResourceInfo) int
updated: bool
url: str | None
user_access: None | Dict[str, CkanCollaboration]
version: str | None
class ckanapi_harvesters.auxiliary.ckan_model.CkanResourceInfo(d: dict = None, name: str = None, format: str = None, description: str = None, state: CkanState = None)

Bases: CkanConfigurableObjectABC

configurable_attributes: set = {'description', 'format', 'name', 'state'}
copy() CkanResourceInfo
created: datetime | None
datastore_active: bool | None
datastore_info: CkanDataStoreInfo | None
datastore_info_error: dict | None
datastore_queried() bool
description: str | None
details: dict
download_size_mb: None | float
download_url: str | None
extra_attributes: set = {'download_url'}
format: str | None
static from_dict(d: dict) CkanResourceInfo
static get_resource_type() str
id: str | None
index_in_package: int | None
is_datastore() bool | None
last_modified: datetime | None
mandatory_attributes: set = {'name'}
metadata_modified: datetime | None
name: str | None
newly_created: bool
newly_updated: bool
package_id: str | None
state: CkanState | None
to_dict(include_details: bool = True) dict
update(refresh: CkanResourceInfo) None
update_missing(refresh: CkanResourceInfo) None
update_view(view_info: CkanViewInfo | List[CkanViewInfo], view_list: bool = False) None
view_is_full_list: bool
views: OrderedDict[str, CkanViewInfo] | None
class ckanapi_harvesters.auxiliary.ckan_model.CkanState(*values)

Bases: IntEnum

Active = 1
Deleted = 2
Draft = 0
static from_str(s)
class ckanapi_harvesters.auxiliary.ckan_model.CkanTagInfo(d: dict)

Bases: object

details: dict
display_name: str
static from_dict(d: dict) CkanTagInfo
id: str
name: str
state: CkanState | None
to_dict(include_details: bool = True) dict
vocabulary_id: str | None
class ckanapi_harvesters.auxiliary.ckan_model.CkanUserInfo(d: dict = None)

Bases: object

about: str | None
copy() CkanUserInfo
created: datetime | None
details: dict | None
display_name: str | None
email_hash: str | None
static from_dict(d: dict) CkanUserInfo
fullname: str | None
static get_resource_type() str
id: str | None
last_active: datetime | None
name: str | None
organizations: None | List[str]
state: CkanState | None
sysadmin: bool
to_dict(include_details: bool = True) dict
class ckanapi_harvesters.auxiliary.ckan_model.CkanViewInfo(d: dict)

Bases: object

copy() CkanViewInfo
details: dict
static from_dict(d: dict) CkanViewInfo
id: str
package_id: str
resource_id: str
title: str
to_dict(include_details: bool = True) dict
view_type: str
class ckanapi_harvesters.auxiliary.ckan_model.CkanVisibility(*values)

Bases: IntEnum

Private = 0
Public = 1
static from_bool_is_private(value)
static from_str(s)
to_bool_is_private()
class ckanapi_harvesters.auxiliary.ckan_model.UpsertChoice(*values)

Bases: IntEnum

Insert = 1
Update = 2
Upsert = 3

ckanapi_harvesters.auxiliary.ckan_progress_callbacks module

Progress callback function definition

ckanapi_harvesters.auxiliary.ckan_progress_callbacks_abc module

Progress callback function interface

class ckanapi_harvesters.auxiliary.ckan_progress_callbacks_abc.CkanCallbackLevel(*values)

Bases: IntEnum

MultiFileResource = 3
Packages = 0
Requests = 4
ResourceChunks = 2
Resources = 1
class ckanapi_harvesters.auxiliary.ckan_progress_callbacks_abc.CkanProgressBarType(*values)

Bases: IntEnum

NoBar = 0
TqdmAuto = 1
TqdmConsole = 2
TqdmJupyter = 3
class ckanapi_harvesters.auxiliary.ckan_progress_callbacks_abc.CkanProgressCallbackABC(callback_fun: Callable | CkanProgressCallbackABC = None, *, progress_bar_type: CkanProgressBarType = None)

Bases: ABC

add_context(context: str, *, level: CkanCallbackLevel = None)
abstractmethod copy(*, dest=None)
default_progress_bar_type = 0
end_task(total: int, *, file_count: int = None, position: int = None, file_index: int = None, level: CkanCallbackLevel = None, info: Any = None, context: str = None, lines_chunk: int = None, total_lines_read: int = None, **kwargs) None
extra_context: dict[CkanCallbackLevel, str]
last_progress_file_index: dict[CkanCallbackLevel, int]
last_progress_position: dict[CkanCallbackLevel, int]
progress_bar_enables: dict[CkanCallbackLevel, bool]
property progress_bar_type: CkanProgressBarType
progress_bars: Dict[CkanCallbackLevel, Any]
progress_callback_fun: Callable[[int, int, Any], None] | None
progress_callback_kwargs: dict
release_resources()

Release resources used by the progress callback like progress bars.

remove_context(*, level: CkanCallbackLevel = None)
start_task(total: int, *, file_count: int = None, position: int = 0, file_index: int = 0, level: CkanCallbackLevel = None, info: Any = None, context: str = None, lines_chunk: int = None, total_lines_read: int = 0, units: CkanProgressUnits = None, **kwargs) None
start_time: Dict[CkanCallbackLevel, float | None]
abstractmethod update_task(position: int, total: int, *, info: Any = None, context: str = None, file_index: int = 0, file_count: int = None, lines_chunk: int = None, total_lines_read: int = None, canceled_request: bool = False, end_message: bool = False, level: CkanCallbackLevel = None, **kwargs) str | None
verbosity: dict[CkanCallbackLevel, bool]
class ckanapi_harvesters.auxiliary.ckan_progress_callbacks_abc.CkanProgressCallbackEmpty(callback_fun: Callable | CkanProgressCallbackABC = None, *, progress_bar_type: CkanProgressBarType = None)

Bases: CkanProgressCallbackABC

Progress callback which does not display anything.

copy(*, dest=None)
end_task(total: int, *, file_count: int = None, position: int = None, file_index: int = None, level: CkanCallbackLevel = None, info: Any = None, context: str = None, lines_chunk: int = None, total_lines_read: int = None, **kwargs) None
start_task(total: int, *, file_count: int = None, position: int = 0, file_index: int = 0, level: CkanCallbackLevel = None, info: Any = None, context: str = None, lines_chunk: int = None, total_lines_read: int = 0, units: CkanProgressUnits = None, **kwargs) None
update_task(position: int, total: int, *, info: Any = None, context: str = None, file_index: int = 0, file_count: int = None, lines_chunk: int = None, total_lines_read: int = None, canceled_request: bool = False, end_message: bool = False, level: CkanCallbackLevel = None, **kwargs) str | None
class ckanapi_harvesters.auxiliary.ckan_progress_callbacks_abc.CkanProgressUnits(*values)

Bases: IntEnum

Bytes = 1
Items = 4
Pages = 3
Records = 2
Undef = 0
short_name() str

ckanapi_harvesters.auxiliary.ckan_progress_callbacks_prototypes module

Progress callback function definition

ckanapi_harvesters.auxiliary.ckan_progress_callbacks_prototypes.jupyter_progress_callback(position: int, total: int, info: Any = None, *, context: str = None, file_index: int = None, file_count: int = None, lines_chunk: int = None, total_lines_read: int = None, canceled_upload: bool = False, end_message: bool = False, level: CkanCallbackLevel = None, start_time: float = None, last_position: int = None, last_progress_position: int = None, **kwargs) None

Example of a progress_callback function which can be copied into a Jupyter Notebook using a progress bar:

    from ipywidgets import IntProgress
    from IPython.display import display
    f = IntProgress(min=0,max=100)

ckanapi_harvesters.auxiliary.ckan_progress_callbacks_simple module

Progress callback function definition

class ckanapi_harvesters.auxiliary.ckan_progress_callbacks_simple.CkanProgressCallbackSimple(callback_fun: Callable | CkanProgressCallbackSimple = None, *, progress_bar_type: CkanProgressBarType = None)

Bases: CkanProgressCallbackABC

copy(*, dest=None)
update_task(position: int, total: int, *, info: Any = None, context: str = None, file_index: int = 0, file_count: int = None, lines_chunk: int = None, total_lines_read: int = None, canceled_request: bool = False, end_message: bool = False, level: CkanCallbackLevel = None, **kwargs) str | None

Progress callback function. Use to implement a progress indication for the user.

Parameters:
  • position – the position within the resource (usually, in bytes or line count)

  • total – the total size of the resource

  • info – an object from which more information can be extracted, typically, the DataFrame itself, with an indication of the data origin.

  • context – the context of the call (ckan instance, upload/download, single/multi-threaded)

  • file_index – the index of the file in the list

  • file_count – the number of files in the list

  • lines_chunk – the number of lines in the chunk currently being processed

  • total_lines_read – the total number of lines read, including the current chunk

  • canceled_request – this callback is also called when a line is ignored

  • end_message – boolean indicating the end of the work in progress

  • level – the level of the progress callback
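
As an illustration of the parameters above, here is a minimal sketch of a standalone callback function that accepts the documented positional arguments and a few of the keyword arguments. The function name, the message layout, and the assumption that file_index is zero-based are hypothetical and are not taken from the library.

```python
from typing import Any, Optional


def percent_progress_callback(position: int, total: int, info: Any = None, *,
                              context: Optional[str] = None,
                              file_index: Optional[int] = None,
                              file_count: Optional[int] = None,
                              end_message: bool = False,
                              **kwargs) -> str:
    """Hypothetical callback: build one line of progress text, print it, return it."""
    # Guard against an unknown or zero total.
    percent = 100.0 * position / total if total else 0.0
    parts = []
    if context:
        parts.append(f"[{context}]")
    # Assumption: file_index is zero-based, so it is shown as index + 1.
    if file_index is not None and file_count:
        parts.append(f"file {file_index + 1}/{file_count}")
    parts.append("done" if end_message else f"{percent:.1f}% ({position}/{total})")
    message = " ".join(parts)
    print(message)
    return message


# Extra keyword arguments (level, lines_chunk, ...) are absorbed by **kwargs.
msg = percent_progress_callback(512, 2048, context="upload", file_index=0, file_count=3)
```

Accepting **kwargs keeps such a function tolerant of keyword arguments it does not use.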

ckanapi_harvesters.auxiliary.ckan_progress_callbacks_simple.default_progress_callback(position: int, total: int, info: Any = None, *, context: str = None, file_index: int = None, file_count: int = None, lines_chunk: int = None, total_lines_read: int = None, canceled_upload: bool = False, end_message: bool = False, level: CkanCallbackLevel = None, start_time: float = None, last_position: int = None, last_progress_position: int = None, **kwargs) str | None

ckanapi_harvesters.auxiliary.ckan_progress_callbacks_tqdm module

Progress callback function definition

class ckanapi_harvesters.auxiliary.ckan_progress_callbacks_tqdm.CkanProgressCallbackTqdm(callback_fun: Callable | CkanProgressCallbackSimple = None, *, progress_bar_type: CkanProgressBarType = None)

Bases: CkanProgressCallbackSimple

copy(*, dest=None)
default_progress_bar_type = 1
end_task(total: int, *, file_count: int = None, position: int = None, file_index: int = None, level: CkanCallbackLevel = None, info: Any = None, context: str = None, lines_chunk: int = None, total_lines_read: int = None, **kwargs) None
property progress_bar_type: CkanProgressBarType
progress_bar_update_min_interval_s = 0.25
progress_bar_update_threshold_pct = 0.5
release_resources() None

Release resources used by the progress callback like progress bars.

start_task(total: int, *, file_count: int = None, position: int = 0, file_index: int = 0, level: CkanCallbackLevel = None, info: Any = None, context: str = None, lines_chunk: int = None, total_lines_read: int = 0, units: CkanProgressUnits = None, **kwargs) None
update_task(position: int, total: int, *, info: Any = None, context: str = None, file_index: int = 0, file_count: int = None, lines_chunk: int = None, total_lines_read: int = None, canceled_request: bool = False, end_message: bool = False, level: CkanCallbackLevel = None, **kwargs) str | None

Progress callback function. Use to implement a progress indication for the user.

Parameters:
  • position – the position within the resource (usually, in bytes or line count)

  • total – the total size of the resource

  • info – an object from which more information can be extracted, typically, the DataFrame itself, with an indication of the data origin.

  • context – the context of the call (ckan instance, upload/download, single/multi-threaded)

  • file_index – the index of the file in the list

  • file_count – the number of files in the list

  • lines_chunk – the number of lines in the chunk currently being processed

  • total_lines_read – the total number of lines read, including the current chunk

  • canceled_request – this callback is also called when a line is ignored

  • end_message – boolean indicating the end of the work in progress

  • level – the level of the progress callback
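
The attributes progress_bar_update_min_interval_s and progress_bar_update_threshold_pct suggest that display updates are throttled. The sketch below shows one way such throttling could work; it is an assumption about the intent, not the library's implementation. In particular, the rule "refresh when either limit is reached" and the helper class name are hypothetical.

```python
import time


class ThrottledUpdates:
    """Hypothetical helper deciding whether a progress display should be refreshed."""

    def __init__(self, min_interval_s=0.25, threshold_pct=0.5, clock=time.monotonic):
        self.min_interval_s = min_interval_s
        self.threshold_pct = threshold_pct
        self._clock = clock          # injectable clock, convenient for testing
        self._last_time = None
        self._last_pct = None

    def should_refresh(self, position, total):
        now = self._clock()
        pct = 100.0 * position / total if total else 0.0
        first_call = self._last_time is None
        # Assumption: refresh on the first call, or when enough time has passed,
        # or when progress moved by at least threshold_pct percentage points.
        if (first_call
                or now - self._last_time >= self.min_interval_s
                or pct - self._last_pct >= self.threshold_pct):
            self._last_time, self._last_pct = now, pct
            return True
        return False


# Demonstration with a fake clock so the outcome does not depend on timing.
fake_now = [0.0]
throttle = ThrottledUpdates(clock=lambda: fake_now[0])
results = [throttle.should_refresh(0, 1000)]        # first call
fake_now[0] = 0.1
results.append(throttle.should_refresh(1, 1000))    # too soon, too small
results.append(throttle.should_refresh(10, 1000))   # progress jumped by 1 point
fake_now[0] = 0.2
results.append(throttle.should_refresh(11, 1000))   # too soon, too small
fake_now[0] = 0.5
results.append(throttle.should_refresh(11, 1000))   # enough time has passed
```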

ckanapi_harvesters.auxiliary.ckan_vocabulary_deprecated module

CKAN tag vocabulary information

class ckanapi_harvesters.auxiliary.ckan_vocabulary_deprecated.CkanTagVocabularyInfo(d: dict)

Bases: object

static from_dict(d: dict) CkanTagVocabularyInfo
to_dict(include_details: bool = True) dict
class ckanapi_harvesters.auxiliary.ckan_vocabulary_deprecated.CkanVocabularyMap

Bases: CkanMapABC

_update_vocabulary_info(vocabulary_info: CkanTagVocabularyInfo | List[CkanTagVocabularyInfo], vocabularies_listed: bool = False) None

Internal function to update the information of a vocabulary.

copy() CkanVocabularyMap
static from_dict(d: dict) CkanVocabularyMap
get_vocabulary_id(vocabulary_name: str, *, error_not_mapped: bool = True, search_title: bool = True) str | None

Retrieve the vocabulary id for a given vocabulary name based on the vocabulary map.

Parameters:

vocabulary_name – vocabulary name or id.

Returns:

purge()
to_dict() dict
update_from_dict(data: dict) None

ckanapi_harvesters.auxiliary.deprecated module

Dead code from auxiliary functions

class ckanapi_harvesters.auxiliary.deprecated.CkanBasicDataFieldType(*values)

Bases: IntEnum

Default = 0
Numeric = 2
Text = 1
TimeStamp = 3
static from_str(s)
class ckanapi_harvesters.auxiliary.deprecated.CkanCollaboratorCapacity(*values)

Bases: IntEnum

Collaboration capacities of users associated to a package/dataset

Editor = 2
Excluded = 0
Member = 1
static from_str(s)
class ckanapi_harvesters.auxiliary.deprecated.CkanGroupCapacity(*values)

Bases: IntEnum

Capacities of users in a group

Admin = 3
Excluded = 0
Member = 1
static from_str(s)

ckanapi_harvesters.auxiliary.error_level_message module

Functions to define messages with an error level

exception ckanapi_harvesters.auxiliary.error_level_message.ContextErrorLevelMessage(context: str, error_level: ErrorLevel, specific_message: str)

Bases: ErrorLevelMessage

class ckanapi_harvesters.auxiliary.error_level_message.ErrorLevel(*values)

Bases: IntEnum

Error = 2
Information = 0
Warning = 1
static from_str(s)
exception ckanapi_harvesters.auxiliary.error_level_message.ErrorLevelMessage(error_level: ErrorLevel, message: str)

Bases: Exception

error_level: ErrorLevel
message: str
to_dict() dict
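
The following self-contained sketch illustrates the idea of messages that carry an error level. It re-declares a local enum with the values listed above and uses a stand-in exception class; the dictionary keys returned by to_dict and the worst_level helper are hypothetical and are not taken from the library.

```python
from enum import IntEnum


class ErrorLevel(IntEnum):
    # Local copy of the documented values, for illustration only.
    Information = 0
    Warning = 1
    Error = 2


class LevelMessage(Exception):
    """Stand-in for an exception that carries an error level and a message."""

    def __init__(self, error_level: ErrorLevel, message: str):
        super().__init__(message)
        self.error_level = error_level
        self.message = message

    def to_dict(self) -> dict:
        # Assumption: the key names are illustrative.
        return {"level": self.error_level.name, "message": self.message}


def worst_level(messages) -> ErrorLevel:
    """Return the highest error level found, or Information if there is none."""
    return max((m.error_level for m in messages), default=ErrorLevel.Information)


collected = [
    LevelMessage(ErrorLevel.Warning, "field 'title' is empty"),
    LevelMessage(ErrorLevel.Error, "resource file not found"),
]
summary = [m.to_dict() for m in collected]
```

Because the levels form an IntEnum, they can be compared and sorted directly, which makes it simple to decide whether a batch of collected messages should stop a run.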

ckanapi_harvesters.auxiliary.external_code_import module

This implements functionality to dynamically call functions specified by the user. This functionality is disabled by default. You must call unlock_external_code_execution to enable external code execution. __Warning__: only run code if you trust the source!

exception ckanapi_harvesters.auxiliary.external_code_import.ExternalUserCodeDisabledException(function_name: str, source_file: str)

Bases: Exception

class ckanapi_harvesters.auxiliary.external_code_import.PythonUserCode(python_file: str, base_dir: str = None)

Bases: object

This class imports an arbitrary Python file as a module and makes it available to the rest of the code. This functionality is disabled by default. You must call unlock_external_code_execution to enable external code execution.

__Warning__: only run code if you trust the source!

copy() PythonUserCode
enable_external_code = False
function_pointer(function_name: str) Callable

Obtain function pointer for a given name in the loaded Python module.

Parameters:

function_name

Returns:

ckanapi_harvesters.auxiliary.external_code_import.clean_var_name(variable_name: str) str
ckanapi_harvesters.auxiliary.external_code_import.unlock_external_code_execution(value: bool = True) None

This function enables external code execution for the PythonUserCode class.

__Warning__: only run code if you trust the source!

Returns:
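
For readers unfamiliar with the technique, the sketch below shows one common way to load an arbitrary Python file as a module and fetch a function by name, guarded by a switch that is off by default. It relies only on the standard importlib machinery. The class and function names are hypothetical, and PermissionError merely stands in for the library's own exception; this is not the implementation of PythonUserCode.

```python
import importlib.util
import os
import tempfile


class UserCodeGate:
    """Hypothetical switch: external code stays disabled until explicitly enabled."""
    enabled = False


def load_function(python_file: str, function_name: str):
    """Import python_file as a module and return the named function."""
    if not UserCodeGate.enabled:
        raise PermissionError(
            f"External code is disabled: {function_name} from {python_file}")
    module_name = os.path.splitext(os.path.basename(python_file))[0]
    spec = importlib.util.spec_from_file_location(module_name, python_file)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return getattr(module, function_name)


with tempfile.TemporaryDirectory() as tmp_dir:
    user_file = os.path.join(tmp_dir, "user_code.py")
    with open(user_file, "w", encoding="utf-8") as f:
        f.write("def double(x):\n    return 2 * x\n")

    # Disabled by default: loading is refused.
    try:
        load_function(user_file, "double")
        blocked = False
    except PermissionError:
        blocked = True

    # Only enable this for files from a trusted source.
    UserCodeGate.enabled = True
    double = load_function(user_file, "double")
    result = double(21)
```

Executing the module runs all of its top-level statements, which is why the warning about trusting the source applies.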

ckanapi_harvesters.auxiliary.lazy_imports module

Central implementation of lazy imports of optional dependencies / dependencies rarely used

ckanapi_harvesters.auxiliary.lazy_imports.lazy_import_bson()
ckanapi_harvesters.auxiliary.lazy_imports.lazy_import_geopandas_gpd()
ckanapi_harvesters.auxiliary.lazy_imports.lazy_import_psycopg2()
ckanapi_harvesters.auxiliary.lazy_imports.lazy_import_pymongo()
ckanapi_harvesters.auxiliary.lazy_imports.lazy_import_pyproj()
ckanapi_harvesters.auxiliary.lazy_imports.lazy_import_shapely()
ckanapi_harvesters.auxiliary.lazy_imports.lazy_import_sqlalchemy()
ckanapi_harvesters.auxiliary.lazy_imports.lazy_import_ssh_tunnel_SSHTunnelForwarder()
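
A lazy import defers loading a dependency until the first call that needs it, so that the package can be imported without the optional dependency installed. The sketch below shows the general pattern with a hypothetical helper; the standard-library module fractions stands in for an optional third-party package so that the example runs anywhere.

```python
import importlib


def lazy_import(module_name: str, install_hint: str = None):
    """Hypothetical helper: import a module on demand with a clearer error message."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        hint = f" (try: pip install {install_hint})" if install_hint else ""
        raise ImportError(
            f"Optional dependency '{module_name}' is required for this feature{hint}"
        ) from exc


def lazy_import_fractions():
    # Stand-in for a function such as the lazy_import_* functions listed above.
    return lazy_import("fractions")


fractions = lazy_import_fractions()
half = fractions.Fraction(1, 2)
```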

ckanapi_harvesters.auxiliary.list_records module

Give partial DataFrame behavior to a list of dictionaries

class ckanapi_harvesters.auxiliary.list_records.ListRecords(*args, **kwargs)

Bases: list

Give partial DataFrame behavior to a list of dictionaries

copy() ListRecords

Return a shallow copy of the list.

property iloc
ckanapi_harvesters.auxiliary.list_records.records_to_df(records: List[dict] | ListRecords, df_args: dict = None, *, missing_value='', none_value='None') DataFrame

Convert records to a DataFrame while keeping the source values (less type inference). Cells whose key is missing from a record are filled with a fixed value (missing_value). None values are also preserved, represented by none_value.

Parameters:
  • records – input data

  • df_args – arguments to pass to DataFrame constructor

  • missing_value – value to set if a column is not specified on a row.

  • none_value – value to set if a value is None in the input data.

Returns:
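
To make the treatment of missing keys and None values concrete, here is a pure-Python sketch of the normalization described above, without pandas. It is an illustration under stated assumptions, not the library's code: the function name is hypothetical, and the column order (first appearance across records) is an assumption.

```python
def normalize_records(records, *, missing_value="", none_value="None"):
    """Return (columns, rows) where every row has a value for every column."""
    # Collect column names in order of first appearance (assumption).
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)

    rows = []
    for record in records:
        row = {}
        for col in columns:
            if col not in record:
                row[col] = missing_value      # key absent from this record
            elif record[col] is None:
                row[col] = none_value         # explicit None in the source
            else:
                row[col] = record[col]        # source value kept as-is
        rows.append(row)
    return columns, rows


columns, rows = normalize_records([
    {"id": 1, "name": "a"},
    {"id": 2, "comment": None},
])
```

The normalized rows could then be passed to a DataFrame constructor. The distinction between an absent key and an explicit None survives because the two cases map to different placeholder values.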

ckanapi_harvesters.auxiliary.login module

Methods to load authentication credentials (user, password)

class ckanapi_harvesters.auxiliary.login.Login(username: str = None, password: str = None, login_file: str = None, auto_load: bool = True, login_file_environ: str = None)

Bases: object

LOGIN_FILE_ENVIRON: str | None = None
clear() None
copy(*, dest=None)
static from_dict(values: Dict[str, str]) Login
static from_tuple(values: Tuple[str, str]) Login
get_default_login_file() str | None
input()

Prompt the user to input the login credentials in the console window.

Returns:

is_empty()
load_from_environ(*, error_not_found: bool = False, empty_warning: bool = True) bool

Load login from environment variables, by order of priority:

  • LOGIN_FILE_ENVIRON: path to a file containing the username, password

Parameters:

error_not_found – raise an error if the login file was not found

Returns:

load_from_file(login_file: str = None, *, base_dir: str = None, error_not_found: bool = True) bool

Load the credentials from file. The file should contain username in first line and password in second line.

Parameters:
  • login_file – path to the login file. The following keyword is accepted: “environ”: the credentials are looked up in the environment with load_from_environ

  • base_dir – base directory used to find the login file, if a relative path is provided

  • error_not_found – option to raise an exception if the login file is not found

Returns:
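
The file layout described above (username on the first line, password on the second) can be read with a few lines of standard Python. The sketch below is a hypothetical reader written for illustration; it does not reproduce the behavior of load_from_file regarding keywords, base_dir, or error handling.

```python
import os
import tempfile


def read_login_file(login_file: str):
    """Return (username, password) from a two-line text file."""
    with open(login_file, "r", encoding="utf-8") as f:
        lines = f.read().splitlines()  # drops the line endings only
    if len(lines) < 2:
        raise ValueError(f"Expected username and password on two lines: {login_file}")
    return lines[0], lines[1]


# Write a temporary file in the expected layout and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "login.txt")
    with open(path, "w", encoding="utf-8") as f:
        f.write("alice\ns3cret value\n")
    username, password = read_login_file(path)
```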

property password: str | None
print_help_cli(display: bool = True) str
to_dict() Dict[str, str]
to_tuple() Tuple[str, str]
property username: str | None

ckanapi_harvesters.auxiliary.path module

Extensions of os.path and operations on urls

exception ckanapi_harvesters.auxiliary.path.AbsolutePathError(field: str, path: str)

Bases: Exception

exception ckanapi_harvesters.auxiliary.path.BaseDirUndefError(path: str)

Bases: Exception

ckanapi_harvesters.auxiliary.path.glob_name(glob_str: str)

Extract file name glob from a glob string (last element of path, except if it is “**”)

Parameters:

glob_str

Returns:

Example:

>>> glob_name("**\*.csv")
'*.csv'

ckanapi_harvesters.auxiliary.path.glob_rm_glob(glob_str: str, *, default_rec_dir: str = None) str

Extract directory name from a glob string (first elements of path without glob characters).

Parameters:
  • glob_str – the glob string

  • default_rec_dir – if the last removed element is “**” (directory recursion), the name of the directory to use instead

Returns:

a path without glob characters

Examples:

>>> glob_rm_glob("test\*.csv")
'test'
>>> glob_rm_glob("**\*.csv", default_rec_dir="hello")
'hello'
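
The sketch below gives one possible reading of the two glob helpers that is consistent with the docstring examples. It is not the library's implementation. The function names, the choice of "/" as the output separator, the return value None when the last element is "**", and the restriction to relative patterns are all assumptions.

```python
import re

_GLOB_CHARS = "*?["


def _split(glob_str: str):
    # Accept both "/" and "\" as separators and drop empty elements.
    return [part for part in re.split(r"[\\/]", glob_str) if part]


def glob_name_sketch(glob_str: str):
    """Last element of the pattern, unless it is the recursion marker '**'."""
    parts = _split(glob_str)
    if not parts or parts[-1] == "**":
        return None  # assumption: nothing to return in this case
    return parts[-1]


def glob_rm_glob_sketch(glob_str: str, *, default_rec_dir: str = None) -> str:
    """Leading elements of the pattern that contain no glob characters."""
    kept = []
    for part in _split(glob_str):
        if any(ch in part for ch in _GLOB_CHARS):
            # Assumption: if the cut happens at '**', substitute default_rec_dir.
            if part == "**" and default_rec_dir is not None:
                kept.append(default_rec_dir)
            break
        kept.append(part)
    return "/".join(kept)


name = glob_name_sketch(r"**\*.csv")
directory = glob_rm_glob_sketch(r"test\*.csv")
recursive = glob_rm_glob_sketch(r"**\*.csv", default_rec_dir="hello")
```
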
ckanapi_harvesters.auxiliary.path.list_files_scandir(path: str) List[str]
ckanapi_harvesters.auxiliary.path.make_path_relative(path: str, to_base_dir: str = None, *, default_value: str = None, source_string: str = None, keyword_exceptions: Set[str] = None, same_destination: bool = True) str

When a file is saved to a new location, rewrite its relative paths so that they are relative to the new file location and still point to the same destination. If same_destination is False, source_string is used instead, provided it is present and is a relative path. The source_string is the path as written in the original document.

Parameters:
  • path – full file path (absolute, ideally output from path_rel_to_dir)

  • to_base_dir – the new base directory, to derive the relative paths from

  • default_value – the value to return if the path is None

  • source_string – string representing the path in the original document, without any treatments

  • keyword_exceptions – keywords to return as-is

Returns:

path relative to to_base_dir or keyword/path relative to environment variable/home directory symbol (~)

ckanapi_harvesters.auxiliary.path.path_rel_to_dir(path: str | None, base_dir: str = None, *, keyword_exceptions: Set[str] = None, error_base_dir_undef: bool = False, default_value: str = None, only_relative: bool = False, abs_error: bool = False, field: str = None) str | None

Returns the absolute path. If the path is relative, it is resolved against base_dir, or against the current working directory (cwd) when base_dir is not specified.

Parameters:
  • path – original path string

  • base_dir – the base directory, for relative paths if provided (default = cwd)

  • keyword_exceptions – some values are not replaced and must be treated after this function call.

  • error_base_dir_undef – Option to raise an error if no base_dir was provided (cwd is used by default).

  • default_value – the value to return if path is None.

  • only_relative – If set to True, a warning or error message is raised if an absolute path is provided.

  • abs_error – Condition to choose between a warning or an error message.

  • field – name of the field for the error message.

Returns:

absolute path or keyword
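
The core of this behavior can be sketched with os.path alone. The example below is a simplified, hypothetical stand-in covering only path, base_dir, and default_value; keyword exceptions, the only_relative check, and the error options are deliberately left out.

```python
import os


def to_absolute(path, base_dir=None, *, default_value=None):
    """Resolve a possibly relative path against base_dir (default: cwd)."""
    if path is None:
        return default_value
    if os.path.isabs(path):
        return os.path.normpath(path)
    base = base_dir if base_dir is not None else os.getcwd()
    return os.path.normpath(os.path.join(base, path))


project_dir = os.path.abspath("project")
resolved = to_absolute(os.path.join("data", "file.csv"), base_dir=project_dir)
from_cwd = to_absolute("notes.txt")
fallback = to_absolute(None, default_value="unset")
```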

ckanapi_harvesters.auxiliary.path.resolve_rel_path(base_dir: str, rel_path: str, *args: str, field: str, only_relative: bool = True) str

Alias of path_rel_to_dir, with an argument order similar to os.path.join and a requirement that the path be relative. The relative path check can be disabled by calling unlock_relative_path_constraint.

Parameters:

field – name of the field for the error message.

Returns:

ckanapi_harvesters.auxiliary.path.sanitize_path(path: str | None, *, expand_path: bool = False, keyword_exceptions: Set[str] = None) str | None

Sanitize paths from user inputs

ckanapi_harvesters.auxiliary.path.unlock_relative_path_constraint(value: bool = True) None

This function disables relative path error messages when a relative path is required.

Returns:

ckanapi_harvesters.auxiliary.proxy_config module

Setting the proxy from simple command line arguments

exception ckanapi_harvesters.auxiliary.proxy_config.HttpsProxyDefError

Bases: Exception

class ckanapi_harvesters.auxiliary.proxy_config.ProxyConfig(proxy_string: str | dict = None, default_proxies: dict = None, proxy_headers: dict = None, proxy_auth: AuthBase | Tuple[str, str] = None)

Bases: object

__init__(proxy_string: str | dict = None, default_proxies: dict = None, proxy_headers: dict = None, proxy_auth: AuthBase | Tuple[str, str] = None) None
Parameters:

proxy_string – string or proxies dict or ProxyConfig object.

If a string is provided, it must be an url to a proxy or one of the following values:
  • “environ”: use the proxies specified in the environment variables “http_proxy” and “https_proxy”

  • “noproxy”: do not use any proxies

  • “unspecified”: do not specify the proxies

  • “default”: use value provided by default_proxies

Parameters:
  • default_proxies – proxies used if proxy_string is “default”

  • proxy_headers – headers used to access the proxies, generally for authentication
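
The keyword handling described above can be pictured as a small dispatch from a string to a proxies dictionary keyed by "http" and "https". The sketch below is a hypothetical illustration and not the ProxyConfig implementation: the function name, the environ argument, returning None for “unspecified”, and representing “noproxy” with empty strings are all assumptions.

```python
import os


def resolve_proxies(proxy_string, default_proxies=None, environ=None):
    """Map a proxy string or keyword to a proxies dict (or None if unspecified)."""
    environ = os.environ if environ is None else environ
    if proxy_string is None or proxy_string == "unspecified":
        return None  # assumption: leave the choice to the HTTP client
    if proxy_string == "environ":
        # Read the two environment variables named in the documentation.
        pairs = (("http", "http_proxy"), ("https", "https_proxy"))
        return {scheme: environ[var] for scheme, var in pairs if var in environ}
    if proxy_string == "noproxy":
        return {"http": "", "https": ""}  # assumption: empty value means no proxy
    if proxy_string == "default":
        return dict(default_proxies or {})
    # Otherwise treat the string as one proxy URL used for both HTTP and HTTPS.
    return {"http": proxy_string, "https": proxy_string}


fake_environ = {"http_proxy": "http://proxy.example:3128"}
from_env = resolve_proxies("environ", environ=fake_environ)
explicit = resolve_proxies("http://proxy.example:3128")
```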

static _setup_cli_proxy_parser(parser: ArgumentParser = None) ArgumentParser

Define or add CLI arguments to initialize the proxy. Parser help message:

Proxy parameters initialization

options:
  -h, --help     show this help message and exit
  --proxy PROXY  Proxy for HTTP and HTTPS

Parameters:

parser – option to provide an existing parser to add the specific fields needed to initialize a CKAN API connection

Returns:

copy() ProxyConfig
static from_cli_args(args: Namespace, *, base_dir: str = None, error_not_found: bool = True, default_proxies: dict = None, proxy_headers: dict = None) ProxyConfig
static from_str_or_config(proxies: str | dict | ProxyConfig, *, default_proxies: dict = None, proxy_headers: dict = None) ProxyConfig
get_host_port() Tuple[str | None, int | None]
get_proxy_login() Login
is_defined() bool
load_proxy_auth_environ(*, error_not_found: bool = False) bool
load_proxy_auth_from_file(file_path: str, *, base_dir: str = None, error_not_found: bool = True) bool
property proxies: dict
property proxy_auth: AuthBase | Tuple[str, str]
property proxy_string: str | dict | None
replace_default_proxy(default_proxies: dict) None
reset() None
ckanapi_harvesters.auxiliary.proxy_config.get_proxies_from_environ() dict
ckanapi_harvesters.auxiliary.proxy_config.host_port_sep(url: str | None, *, default_port: int = None) Tuple[str | None, int | None]

ckanapi_harvesters.auxiliary.ssh_tunnel module

Class to parameterize and establish an SSH tunnel to a distant server

class ckanapi_harvesters.auxiliary.ssh_tunnel.SshLogin(username: str = None, password: str = None, login_file: str = None, auto_load: bool = True, login_file_environ: str = None)

Bases: Login

LOGIN_FILE_ENVIRON: str | None = 'SSH_AUTH_FILE'
input()

Prompt the user to input the login credentials in the console window.

Returns:

class ckanapi_harvesters.auxiliary.ssh_tunnel.SshTunnel(*, remote_host: str = None, remote_port: int = None, ssh_host: str = None, ssh_port: int = None, ssh_login: SshLogin = None, ssh_login_file: str = None, ssh_pkey_file: str = None, proxy: ProxyConfig = None)

Bases: object

__init__(*, remote_host: str = None, remote_port: int = None, ssh_host: str = None, ssh_port: int = None, ssh_login: SshLogin = None, ssh_login_file: str = None, ssh_pkey_file: str = None, proxy: ProxyConfig = None) None

SSH Tunnel parameterization functions.

SSH remote is to be configured by the caller. The other attributes can be configured by the CLI.

Parameters:
  • remote_host – Remote bind host. This is the service on the server side which is not directly exposed.

  • remote_port – Remote bind port.

  • ssh_host – Remote SSH server host.

  • ssh_port – Remote SSH server port.

  • ssh_login_file – Login to connect to the SSH server.

  • ssh_pkey_file – Path to the SSH private key file.

close_tunnel()

Close the SSH tunnel. Close any underlying connections first.

copy() SshTunnel
get_tunnel_host() str
get_tunnel_port() int
get_tunnel_url() str
is_connected() bool
is_defined() bool
print_help_cli(display: bool = True) str
remote_host: str
remote_port: int
server: None
socks_proxy: ProxyConfig
ssh_host: str
ssh_login: SshLogin
ssh_port: int
start_tunnel()
verbose: bool

ckanapi_harvesters.auxiliary.urls module

Operations on urls

ckanapi_harvesters.auxiliary.urls.clean_base_url(url: str | None) str | None
ckanapi_harvesters.auxiliary.urls.is_valid_url(url: str) bool
ckanapi_harvesters.auxiliary.urls.url_insert_login(url: str, login: Login)

Insert user authentication parameters in a url
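
One plausible reading of "insert user authentication parameters in a url" is the user:password@host form of a URL. The sketch below builds such a URL with urllib.parse. It takes the username and password as plain strings instead of a Login object, and the function name is hypothetical; whether the library encodes credentials the same way is not confirmed by this documentation.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def insert_login(url: str, username: str, password: str) -> str:
    """Return url with percent-encoded credentials placed before the host."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    # Encode reserved characters such as '@' and '/' so the URL stays parseable.
    credentials = f"{quote(username, safe='')}:{quote(password, safe='')}"
    netloc = f"{credentials}@{host}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


with_login = insert_login("https://example.org:8443/api?x=1", "alice", "p@ss/word")
```

URLs that embed credentials tend to end up in logs and shell history, so this form is best kept to cases where no header-based authentication is available.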

ckanapi_harvesters.auxiliary.urls.url_join(base: str, *args: str) str
ckanapi_harvesters.auxiliary.urls.url_matches_host(host_url: str, url: str) bool

Module contents

Package with helper functions for CKAN requests using pandas DataFrames.