ckanapi_harvesters.ckan_api package

Subpackages

Submodules

ckanapi_harvesters.ckan_api.ckan_api module

Alias to most complete CkanApi implementation

ckanapi_harvesters.ckan_api.ckan_api_0_base module

class ckanapi_harvesters.ckan_api.ckan_api_0_base.CkanApiABC

Bases: ABC

class ckanapi_harvesters.ckan_api.ckan_api_0_base.CkanApiBase(url: str = None, *, proxies: str | dict | ProxyConfig = None, apikey: str | CkanApiKey = None, apikey_file: str = None, owner_org: str = None, params: CkanApiParamsBasic = None, identifier=None)

Bases: CkanApiABC

CKAN Database API interface to CKAN server with helper functions using pandas DataFrames. This class implements the basic parameters and request functions.

CKAN_URL_ENVIRON = 'CKAN_URL'
__init__(url: str = None, *, proxies: str | dict | ProxyConfig = None, apikey: str | CkanApiKey = None, apikey_file: str = None, owner_org: str = None, params: CkanApiParamsBasic = None, identifier=None)

CKAN Database API interface to CKAN server with helper functions using pandas DataFrames.

Parameters:
  • url – url of the CKAN server

  • proxies – proxies to use for requests

  • apikey – way to provide the API key directly (optional)

  • apikey_file – path to a file containing a valid API key in the first line of text (optional)

  • owner_org – name of the organization to limit package_search (optional)

  • params – other connection/behavior parameters

  • identifier – identifier of the ckan client

__str__() str

String representation of the instance, for debugging purposes.

Returns:

URL representing the CKAN server

_api_action_request(action: str, *, method: RequestType, params: dict = None, headers: dict = None, data: dict | str | bytes = None, json: dict = None, files: List[tuple] = None, timeout: float = None, _attempt_counts: int = 0, _attempt_traceback: List[str] = None) CkanActionResponse

Send API action request and return response.

Parameters:
  • action – action name

  • method – GET / POST

  • params – params to set in the url

  • data – information to encode in the request body (only for POST method)

  • json – information to encode as JSON in the request json (only for POST method)

  • files – files to upload in the request (only for POST method)

  • headers – headers for the request (authentication tokens are added by the function)

  • timeout – request timeout in seconds

  • _attempt_counts – internal argument in case of re-post of the request to count retries

  • _attempt_traceback – internal argument in case of re-post of the request to list error history
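The _attempt_counts and _attempt_traceback arguments suggest that a failed request is re-posted while the retries and errors are recorded. The following is a minimal sketch of that retry pattern under stated assumptions: the helper name call_with_retries, the max_attempts limit, and the choice to retry only on the built-in ConnectionError are all hypothetical and are not the library's actual implementation.

```python
import time
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def call_with_retries(send: Callable[[], T], max_attempts: int = 3, delay: float = 0.0,
                      _attempt_counts: int = 0,
                      _attempt_traceback: Optional[List[str]] = None) -> T:
    """Hypothetical retry helper: re-send a request, keeping a history of errors."""
    if _attempt_traceback is None:
        _attempt_traceback = []
    try:
        return send()
    except ConnectionError as exc:  # assumption: only the built-in ConnectionError is retried
        _attempt_traceback.append(repr(exc))
        if _attempt_counts + 1 >= max_attempts:
            raise RuntimeError(
                f"request failed after {max_attempts} attempts: {_attempt_traceback}") from exc
        time.sleep(delay)  # optional pause before re-posting
        return call_with_retries(send, max_attempts, delay,
                                 _attempt_counts + 1, _attempt_traceback)
```

Passing the counter and the error history down the recursive call mirrors the two internal arguments documented above.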

Returns:

_ckan_url_request(path: str, *, method: RequestType, params: dict = None, headers: dict = None, data: dict = None, json: dict = None, files: List[tuple] = None, timeout: float = None) Response

Send request to server and return response.

Parameters:
  • path – relative path to server url

  • method – GET / POST

  • params – params to set in the url

  • data – information to encode in the request body (only for POST method)

  • headers – headers for the request (authentication tokens are added by the function)

Returns:

_cli_ckan_args_apply(args: Namespace, *, base_dir: str = None, error_not_found: bool = True, default_proxies: dict = None, proxy_headers: dict = None) None

Apply the arguments parsed by the argument parser defined by _setup_cli_ckan_parser

Parameters:
  • args – arguments parsed by the parser defined by _setup_cli_ckan_parser

  • base_dir – base directory to find the CKAN API key file, if a relative path is provided (recommended: leave None to use cwd)

  • error_not_found – option to raise an exception if the CKAN API key file is not found

  • default_proxies – proxies used if proxies=”default”

  • proxy_headers – headers used to access the proxies, generally for authentication

Returns:

_get_api_url(category: str = None)

Returns the base API url and appends the category

Parameters:

category – usually, “action”
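As an illustration, a minimal sketch of building the API URL from the server URL. It assumes the standard CKAN API prefix /api/3 (as in https://demo.ckan.org/api/3/action/package_show); the function name get_api_url is hypothetical, and the library may build the URL differently.

```python
from typing import Optional


def get_api_url(base_url: str, category: Optional[str] = None) -> str:
    """Hypothetical helper: base CKAN API URL, optionally followed by a category."""
    # Assumption: the CKAN API is served under <server>/api/3
    api_url = base_url.rstrip("/") + "/api/3"
    if category:
        api_url += "/" + category
    return api_url
```

For example, get_api_url("https://demo.ckan.org/", "action") gives "https://demo.ckan.org/api/3/action", to which an action name such as package_show is appended.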

Returns:

_init_session(*, internal: bool = False)

Initialize the session objects which are used to perform requests with this CKAN instance. This method can be overloaded to fit your needs (proxies, certificates, cookies, headers, etc.).

Parameters:

internal

Returns:

_prepare_headers(headers: dict = None, include_ckan_auth: bool = False) dict

Prepare headers for a request. If the request is destined for the CKAN server, include the authentication headers, if an API key was provided.

Parameters:
  • headers – initial headers

  • include_ckan_auth – boolean to include CKAN authentication headers
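A minimal sketch of the header preparation described above, assuming the API key is sent in the Authorization header (the header CKAN uses for API tokens). The function name and the explicit apikey argument are hypothetical; the real method reads the key from the instance.

```python
from typing import Optional


def prepare_headers(headers: Optional[dict] = None, include_ckan_auth: bool = False,
                    apikey: Optional[str] = None) -> dict:
    """Hypothetical helper: copy the headers and add CKAN authentication if requested."""
    # Copy so that the caller's dict is never modified
    result = dict(headers) if headers else {}
    if include_ckan_auth and apikey:
        # Assumption: the API token goes in the Authorization header
        result["Authorization"] = apikey
    return result
```

Keeping include_ckan_auth separate from the key itself means the token is only attached to requests known to target the CKAN server, never to third-party URLs.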

Returns:

_request_all_results_df(api_fun: Callable, *, params: dict = None, list_attrs: bool = True, limit: int = None, offset: int = 0, requests_limit: int = None, search_all: bool = True, progress_callback: CkanProgressCallbackABC = None, **kwargs) DataFrame

Repeat requests of limited size, advancing the offset parameter, until no more data is returned. The DataFrame implementation returns the concatenation of the DataFrames returned by the unitary function calls.

Parameters:
  • api_fun – function to call, typically a unitary request function

  • params – api_fun must accept params argument in order to transmit other values and enforce the offset parameter

  • limit – api_fun must accept limit argument in order to update the limit value

  • offset – api_fun must accept offset argument in order to update the offset value

  • search_all – if False, only the first request is performed

  • list_attrs – option to aggregate the DataFrame attrs field into lists (False is not tested)

  • kwargs – additional keyword arguments to pass to api_fun

Returns:

_request_all_results_list(api_fun: Callable, *, params: dict = None, limit: int = None, offset: int = 0, requests_limit: int = None, search_all: bool = True, progress_callback: CkanProgressCallbackABC = None, **kwargs) List[CkanActionResponse] | list

Repeat requests of limited size, advancing the offset parameter, until no more data is returned. The list implementation returns the list of the unitary function return values.

Parameters:
  • api_fun – function to call, typically a unitary request function

  • params – api_fun must accept params argument in order to transmit other values and enforce the offset parameter

  • limit – api_fun must accept limit argument in order to update the limit value

  • offset – api_fun must accept offset argument in order to update the offset value

  • search_all – if False, only the first request is performed

  • kwargs – additional keyword arguments to pass to api_fun

Returns:

_request_all_results_page_generator(api_fun: Callable, *, params: dict = None, limit: int = None, offset: int = 0, requests_limit: int = None, search_all: bool = True, progress_callback: CkanProgressCallbackABC = None, **kwargs) Generator[Any, Any, None]

Repeat requests of limited size, advancing the offset parameter, until no more data is returned. This lazy auxiliary function yields a result for each request.

Parameters:
  • api_fun – function to call, typically a unitary request function

  • params – api_fun must accept params argument in order to transmit other values and enforce the offset parameter

  • limit – api_fun must accept limit argument in order to update the limit value

  • offset – api_fun must accept offset argument in order to update the offset value

  • search_all – if False, only the first request is performed

  • kwargs – additional keyword arguments to pass to api_fun
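The offset-based pagination shared by the three _request_all_results_* functions can be sketched as a generator. This is a simplified, hypothetical version: api_fun is assumed to return a plain list for each page, and requests_limit is interpreted here as a maximum number of requests, which is an assumption.

```python
from typing import Any, Callable, Dict, Generator, List, Optional


def request_all_pages(api_fun: Callable[..., List[Any]], *,
                      params: Optional[Dict[str, Any]] = None, limit: int = 100,
                      offset: int = 0, requests_limit: Optional[int] = None,
                      search_all: bool = True) -> Generator[List[Any], None, None]:
    """Hypothetical sketch: yield one page per request until an empty page is returned."""
    n_requests = 0
    while True:
        page = api_fun(params=dict(params or {}), limit=limit, offset=offset)
        n_requests += 1
        if not page:
            break  # no more data transmitted
        yield page
        offset += len(page)  # the next request starts where this page ended
        if not search_all:
            break  # only the first request is performed
        if requests_limit is not None and n_requests >= requests_limit:
            break  # assumption: requests_limit caps the number of requests
```

The list variant would collect the yielded pages into a list, and the DataFrame variant would concatenate them; being lazy, the generator lets a caller stop early without issuing further requests.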

Returns:

_setup_cli_ckan_parser(parser: ArgumentParser = None) ArgumentParser

Define or add CLI arguments to initialize a CKAN API connection. Parser help message:

CKAN API connection parameters initialization

options:
-h, --help

show this help message and exit

--ckan-url CKAN_URL

CKAN URL

--apikey APIKEY

CKAN API key

--apikey-file APIKEY_FILE

Path to a file containing the CKAN API key (first line)

--policy-file POLICY_FILE

Path to a file containing the CKAN data format policy (json format)

--owner-org OWNER_ORG

CKAN Owner Organization

--default-limit DEFAULT_LIMIT

Default number of rows per request

--verbose VERBOSE

Option to set verbosity

Parameters:

parser – option to provide an existing parser to add the specific fields needed to initialize a CKAN API connection
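The help message above maps naturally to argparse. Here is a minimal sketch that reproduces the listed options with the standard library. The function name is hypothetical, and converting --default-limit to an integer is an assumption, not something the help message states.

```python
import argparse
from typing import Optional


def setup_cli_ckan_parser(parser: Optional[argparse.ArgumentParser] = None) -> argparse.ArgumentParser:
    """Hypothetical sketch: add the CKAN connection options to a (new or existing) parser."""
    if parser is None:
        parser = argparse.ArgumentParser(description="CKAN API connection parameters initialization")
    parser.add_argument("--ckan-url", help="CKAN URL")
    parser.add_argument("--apikey", help="CKAN API key")
    parser.add_argument("--apikey-file", help="Path to a file containing the CKAN API key (first line)")
    parser.add_argument("--policy-file",
                        help="Path to a file containing the CKAN data format policy (json format)")
    parser.add_argument("--owner-org", help="CKAN Owner Organization")
    # Assumption: the limit is parsed as an integer
    parser.add_argument("--default-limit", type=int, help="Default number of rows per request")
    parser.add_argument("--verbose", help="Option to set verbosity")
    return parser


args = setup_cli_ckan_parser().parse_args(
    ["--ckan-url", "https://demo.ckan.org", "--default-limit", "500"])
```

Accepting an existing parser lets a harvesting script add its own options alongside the connection options.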

Returns:

api_action_call(action: str, *, method: RequestType, params: dict = None, headers: dict = None, data: dict = None, json: dict = None, files: List[tuple] = None) CkanActionResponse
api_help_show(action_name: str, *, print_output: bool = True) str

API help command on a given action.

Parameters:
  • action_name

  • print_output – Option to print the output in the command line

Returns:

property apikey: CkanApiKey
connect()
copy(new_identifier: str = None, *, dest=None)

Returns a copy of the current instance. This is useful in a multithreaded context, where each thread works on its own copy of an initialized ckan object. It is recommended to purge the last response before making a copy (with purge_map=False).

disconnect()
download_url_proxy(url: str, *, method: str = None, auth_if_ckan: bool = None, proxies: dict = None, headers: dict = None, auth: AuthBase | Tuple[str, str] = None, verify: bool | str | None = None, stream: bool = False, timeout: float = None) Response

Download a URL using the CKAN parameters (proxy, authentication etc.)

Parameters:
  • url

  • proxies

  • headers

Returns:

download_url_proxy_test_head(url: str, *, raise_error: bool = False, auth_if_ckan: bool = None, proxies: dict = None, headers: dict = None, auth: AuthBase | Tuple[str, str] = None, verify: bool | str | None = None, context: str = None, timeout: float = None) None | ContextErrorLevelMessage

This sends a HEAD request to the url using the CKAN connection parameters via download_url_proxy. The resource is not downloaded, but the response headers indicate whether the url is valid.

Returns:

None if successful

full_unlock(unlock: bool = True, *, no_ca: bool = None, external_url_resource_download: bool = None) None

Function to unlock full capabilities of the CKAN API

Parameters:

unlock

Returns:

init_from_environ(*, init_api_key: bool = True, error_not_found: bool = False) None

Initialize CKAN from environment variables.

  • CKAN_URL for the url of the CKAN server.

And optionally:

  • CKAN_API_KEY: the raw API key (storing an API key in an environment variable is not recommended)

  • CKAN_API_KEY_FILE: path to a file containing a valid API key in the first line of text
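A minimal sketch of reading these environment variables with the standard library. The helper name and its returned dict are hypothetical; the real method stores the values on the instance and may also load the key file.

```python
import os
from typing import Dict, Mapping, Optional


def read_ckan_environ(environ: Mapping[str, str] = os.environ) -> Dict[str, Optional[str]]:
    """Hypothetical helper: collect the CKAN settings from environment variables."""
    return {
        "url": environ.get("CKAN_URL"),                   # URL of the CKAN server
        "apikey": environ.get("CKAN_API_KEY"),            # raw API key (discouraged)
        "apikey_file": environ.get("CKAN_API_KEY_FILE"),  # path to a file holding the key
    }
```

Passing a mapping instead of always reading os.environ makes the helper easy to test with a plain dict.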

Parameters:

error_not_found – raise an error if the API key file was not found

Returns:

initialize_from_cli_args(*, args: Sequence[str] = None, base_dir: str = None, error_not_found: bool = True, parser: ArgumentParser = None, default_proxies: dict = None, proxy_headers: dict = None) None

Initialize the CKAN API connection from command line arguments.

Parameters:

args – Option to provide arguments from another source.

Returns:

initialize_from_options_string(options_string: str = None, base_dir: str = None, error_not_found: bool = True, parser: ArgumentParser = None, default_proxies: dict = None, proxy_headers: dict = None) None
input_cli_args(*, base_dir: str = None, error_not_found: bool = True, only_if_necessary: bool = False, default_proxies: dict = None, proxy_headers: dict = None)

Prompt the user in the console window for the initialization parameters, in the command-line format.

Returns:

input_missing_info(*, base_dir: str = None, input_args: bool = False, input_args_if_necessary: bool = False, input_apikey: bool = True, error_not_found: bool = True)

Ask the user for missing information in the console window.

Parameters:

input_owner_org – option to ask for the owner organization.

Returns:

is_url_internal(url: str) bool

Tests whether a url points to the same server as the CKAN url.
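A possible way to perform this test with urllib.parse is sketched below. It compares the scheme and host (including the port) and then checks that the path lies under the CKAN base path. The path check is an assumption about what "same server" means, and the explicit ckan_url argument is hypothetical.

```python
from urllib.parse import urlsplit


def is_url_internal(url: str, ckan_url: str) -> bool:
    """Hypothetical sketch: does url point to the server identified by ckan_url?"""
    target, base = urlsplit(url), urlsplit(ckan_url)
    # Same scheme and same host/port (host comparison is case-insensitive)
    if (target.scheme, target.netloc.lower()) != (base.scheme, base.netloc.lower()):
        return False
    # Assumption: the path must also be under the CKAN base path, if there is one
    base_path = base.path.rstrip("/")
    return target.path == base_path or target.path.startswith(base_path + "/")
```

Checking the path prefix avoids treating https://host/ckan2/... as internal when CKAN is mounted at https://host/ckan.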

Parameters:

url

Returns:

load_apikey(apikey_file: str = None, base_dir: str = None, error_not_found: bool = True)

Load the CKAN API key from file. The file should contain a valid API key in the first line of text.
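A minimal sketch of this behavior: it resolves a relative path against base_dir, optionally raises if the file is missing, and keeps only the first line. The standalone function is hypothetical and mirrors the documented parameters only.

```python
import os
import tempfile
from typing import Optional


def load_apikey(apikey_file: str, base_dir: Optional[str] = None,
                error_not_found: bool = True) -> Optional[str]:
    """Hypothetical sketch: read the API key from the first line of a text file."""
    path = apikey_file
    if base_dir is not None and not os.path.isabs(path):
        path = os.path.join(base_dir, path)  # relative paths are resolved against base_dir
    if not os.path.isfile(path):
        if error_not_found:
            raise FileNotFoundError(path)
        return None
    with open(path, encoding="utf-8") as f:
        return f.readline().strip()  # only the first line holds the key


# Usage: write a temporary key file and load it back
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "ckan_key.txt"), "w", encoding="utf-8") as f:
        f.write("my-token\nsecond line is ignored\n")
    key = load_apikey("ckan_key.txt", base_dir=tmp)
```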

Parameters:
  • apikey_file – API key file (optional if specified at the creation of the object)

  • base_dir – base directory, if the apikey_file is a relative path

Returns:

prepare_arguments_for_url_download_request(url: str, *, auth_if_ckan: bool = None, headers: dict = None, verify: bool | str | None = None) Tuple[bool, dict]

Include CKAN authentication headers only if the URL points to the CKAN server.

Parameters:
  • url – target URL

  • headers – initial headers

  • auth_if_ckan – option to include CKAN authentication headers if the url is recognized as part of the CKAN server.

Returns:

prepare_for_multithreading(mode_reduced: bool = True) None

This method disables unnecessary writes to this object. It is recommended to enable the reduced writes mode in a multithreaded context. Do not forget to reset sessions at the beginning of each thread.

Parameters:

mode_reduced

Returns:

print_help_cli(display: bool = True) str
purge() None

Erase temporary data stored in this object

Parameters:

purge_map – whether to purge the map created with map_resources

set_limits(limit_read: int | None) None

Set default query limits. If only one argument is provided, it applies to both limits.

Parameters:

limit_read – default limit for read requests

Returns:

set_proxies(proxies: str | dict | ProxyConfig, *, default_proxies: dict = None, proxy_headers: dict = None) None

Set up the proxy configuration

Parameters:

proxies – string or proxies dict or ProxyConfig object.

If a string is provided, it must be an url to a proxy or one of the following values:
  • “environ”: use the proxies specified in the environment variables “http_proxy” and “https_proxy”

  • “noproxy”: do not use any proxies

  • “unspecified”: do not specify the proxies

  • “default”: use value provided by default_proxies
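The string values above can be resolved into a scheme-to-proxy mapping. Here is a hypothetical sketch. In particular, using an empty dict for "noproxy" and None for "unspecified" is an illustrative convention, and it does not show how the library's HTTP session actually applies them.

```python
import os
from typing import Dict, Optional, Union


def resolve_proxies(proxies: Union[str, Dict[str, str]],
                    default_proxies: Optional[Dict[str, str]] = None) -> Optional[Dict[str, Optional[str]]]:
    """Hypothetical sketch: turn the documented proxy settings into a mapping."""
    if isinstance(proxies, dict):
        return dict(proxies)
    if proxies == "environ":
        return {"http": os.environ.get("http_proxy"), "https": os.environ.get("https_proxy")}
    if proxies == "noproxy":
        return {}  # illustrative marker meaning "no proxies"
    if proxies == "unspecified":
        return None  # let the HTTP library decide
    if proxies == "default":
        return dict(default_proxies or {})
    # Any other string is taken as the URL of one proxy used for both schemes
    return {"http": proxies, "https": proxies}
```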

Parameters:
  • default_proxies – proxies used if proxies=”default”

  • proxy_headers – headers used to access the proxies, generally for authentication

Returns:

set_requests_delay(time_between_requests: int) None

Set delay between requests in seconds.

Parameters:

time_between_requests – delay between requests in seconds
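A minimum delay between requests is commonly enforced with a monotonic clock. Below is a hypothetical sketch, not the library's code: the caller invokes wait() before each request, and the helper sleeps only for the part of the delay that has not yet elapsed.

```python
import time
from typing import Optional


class RequestThrottle:
    """Hypothetical helper enforcing a minimum delay between consecutive requests."""

    def __init__(self, time_between_requests: float = 0.0) -> None:
        self.time_between_requests = time_between_requests
        self._last_request: Optional[float] = None

    def wait(self) -> None:
        # Sleep only for the remaining part of the delay, if any
        if self._last_request is not None:
            elapsed = time.monotonic() - self._last_request
            remaining = self.time_between_requests - elapsed
            if remaining > 0:
                time.sleep(remaining)
        self._last_request = time.monotonic()
```

A monotonic clock is used so that system clock adjustments cannot shorten or lengthen the delay.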

set_requests_timeout(requests_timeout: float, multi_requests_timeout=None) None

Set timeout for requests.

Parameters:
  • requests_timeout – timeout for each request (seconds)

  • multi_requests_timeout – timeout for grouped request (seconds)

Returns:

set_verbosity(verbosity: bool = True, verbose_extra: bool = None) None

Enable/disable full verbose output

Parameters:

verbosity – boolean. Cannot be None

Returns:

test_ckan_url_reachable(raise_error: bool = False) bool

Test if the CKAN URL is reachable with a HEAD request. This does not check that it is really a CKAN server, nor does it check authentication.

static unlock_external_url_resource_download(value: bool = True)

This function enables the download of resources external to the CKAN server.

static unlock_no_ca(value: bool = True)

This function enables you to disable the CA verification of the CKAN server.

Warning: only allow this in a local environment!

property url: str

ckanapi_harvesters.ckan_api.ckan_api_1_map module

class ckanapi_harvesters.ckan_api.ckan_api_1_map.CkanApiMap(url: str = None, *, proxies: str | dict | ProxyConfig = None, apikey: str | CkanApiKey = None, apikey_file: str = None, owner_org: str = None, params: CkanApiParamsBasic = None, map: CkanMap = None, identifier=None)

Bases: CkanApiBase

CKAN Database API interface to CKAN server with helper functions using pandas DataFrames. This class implements the resource mapping capabilities to obtain resource ids necessary for the requests.

__init__(url: str = None, *, proxies: str | dict | ProxyConfig = None, apikey: str | CkanApiKey = None, apikey_file: str = None, owner_org: str = None, params: CkanApiParamsBasic = None, map: CkanMap = None, identifier=None)

CKAN Database API interface to CKAN server with helper functions using pandas DataFrames.

Parameters:
  • url – url of the CKAN server

  • proxies – proxies to use for requests

  • apikey – way to provide the API key directly (optional)

  • apikey_file – path to a file containing a valid API key in the first line of text (optional)

  • owner_org – name of the organization to limit package_search (optional)

  • params – other connection/behavior parameters

  • map – map of known resources

  • identifier – identifier of the ckan client

_api_datastore_info(resource_id: str, *, params: dict = None, display_request_not_found: bool = True) CkanDataStoreInfo

API call to datastore_info. Returns the information on the DataStore. Used to know the number of rows in a DataStore.

Parameters:
  • resource_id – resource id.

  • params – N/A

  • display_request_not_found – whether to display the request in the command window, in case of a CkanNotFoundError. This option is recommended if you are testing whether the resource has a DataStore or not.

Returns:

_api_group_list(*, limit: int = None, offset: int = 0, groups: List[str] = None, all_fields: bool = True, include_users: bool = True, params: dict = None) List[CkanGroupInfo] | List[str]

API call to group_list.

Parameters:

params

Returns:

_api_group_list_all(*, all_fields: bool = True, include_users: bool = True, params: dict = None, limit: int = None, offset: int = None) List[CkanUserInfo] | List[str]

API call to group_list until an empty list is received.

See:

_api_group_list()

Parameters:

params

Returns:

_api_license_list(*, params: dict = None) List[CkanLicenseInfo]

API call to license_list.

Parameters:

params

Returns:

_api_organization_list(*, params: dict = None, all_fields: bool = True, include_users: bool = False, limit: int = None, offset: int = None) List[CkanOrganizationInfo] | List[str]

API call to organization_list.

Parameters:
  • params – typically, the request can be limited to an organization with the owner_org parameter

  • all_fields – whether to return full information or only the organization names in a list

Returns:

_api_organization_list_all(*, params: dict = None, all_fields: bool = True, include_users: bool = False, limit: int = None, offset: int = None) List[CkanOrganizationInfo] | List[str]

API call to organization_list until an empty list is received.

See:

_api_organization_list()

Parameters:

params

Returns:

_api_organization_show(id: str, *, params: dict = None) CkanOrganizationInfo

API call to organization_show.

Parameters:
  • id – organization id or name.

  • params – typically, the request can be limited to an organization with the owner_org parameter

Returns:

_api_package_collaborator_list(package_id: str, *, params: dict = None, cancel_if_present: bool = False) Dict[str, CkanCollaboration]

API call to package_collaborator_list.

Parameters:

params

Returns:

API call to package_search.

Parameters:
  • owner_org – ability to filter packages by owner_org

  • filter – dict of filters to apply, translated into the API fq argument. From the fq documentation: any filter queries to apply. Note: +site_id:{ckan_site_id} is added to this string prior to the query being executed.

  • q – the solr query. Optional. Default is ‘*:*’.

  • include_private – if True, private datasets will be included in the results. Only private datasets from the user’s organizations will be returned and sysadmins will be returned all private datasets. Optional, the default is False in the API

  • include_drafts – if True, draft datasets will be included in the results. A user will only be returned their own draft datasets, and a sysadmin will be returned all draft datasets. Optional, the default is False.

  • sort – sorting of the search results. Optional. Default: ‘score desc, metadata_modified desc’. As per the solr documentation, this is a comma-separated string of field names and sort-orderings.

  • facet – whether to enable faceted results. Default: True in API.

  • limit – maximum number of results to return. Translates to the API rows argument.

  • offset – the offset in the complete result for where the set of returned datasets should begin. Translates to the API start argument.

  • params – other parameters to pass to package_search
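The parameter translation described above (filter to fq, limit to rows, offset to start) can be sketched as follows. Joining the filter entries as space-separated key:value clauses is an assumption about the fq syntax the library generates, and the helper name is hypothetical. The argument name filter is kept from the documentation even though it shadows a Python built-in.

```python
from typing import Any, Dict, Optional


def build_package_search_params(*, filter: Optional[Dict[str, str]] = None, q: Optional[str] = None,
                                include_private: bool = True, include_drafts: bool = True,
                                sort: Optional[str] = None, facet: Optional[bool] = None,
                                limit: Optional[int] = None, offset: Optional[int] = None,
                                params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical sketch: map the documented arguments to package_search parameters."""
    query: Dict[str, Any] = dict(params or {})
    if filter:
        # Assumption: each filter becomes a "key:value" clause of the fq argument
        query["fq"] = " ".join(f"{key}:{value}" for key, value in filter.items())
    if q is not None:
        query["q"] = q
    query["include_private"] = include_private
    query["include_drafts"] = include_drafts
    if sort is not None:
        query["sort"] = sort
    if facet is not None:
        query["facet"] = facet
    if limit is not None:
        query["rows"] = limit  # limit translates to rows
    if offset is not None:
        query["start"] = offset  # offset translates to start
    return query
```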

Returns:

_api_package_search_all(*, params: dict = None, owner_org: str = None, filter: dict = None, q: str = None, include_private: bool = True, include_drafts: bool = True, sort: str = None, facet: bool = None, limit: int = None, offset: int = None, search_all: bool = True) List[CkanPackageInfo]

API call to package_search until an empty list is received.

See:

_api_package_search()

Parameters:
  • owner_org – ability to filter packages by owner_org

  • filter – dict of filters to apply, translated into the API fq argument. From the fq documentation: any filter queries to apply. Note: +site_id:{ckan_site_id} is added to this string prior to the query being executed.

  • q – the solr query. Optional. Default is ‘*:*’.

  • include_private – if True, private datasets will be included in the results. Only private datasets from the user’s organizations will be returned and sysadmins will be returned all private datasets. Optional, the default is False in the API

  • include_drafts – if True, draft datasets will be included in the results. A user will only be returned their own draft datasets, and a sysadmin will be returned all draft datasets. Optional, the default is False.

  • sort – sorting of the search results. Optional. Default: ‘score desc, metadata_modified desc’. As per the solr documentation, this is a comma-separated string of field names and sort-orderings.

  • facet – whether to enable faceted results. Default: True in API.

  • limit – maximum number of results to return. Translates to the API rows argument.

  • offset – the offset in the complete result for where the set of returned datasets should begin. Translates to the API start argument.

  • params – other parameters to pass to package_search

Returns:

_api_package_show(package_id, *, params: dict = None) CkanPackageInfo

API call to package_show. Returns the information on the package and the resources contained in the package. Not recommended for external use because this method does not return information about the DataStores. Prefer the map_resources method.

See:

map_resources()

Parameters:
  • package_id – package id.

  • params – See API documentation.

Returns:

_api_resource_show(resource_id, *, params: dict = None) CkanResourceInfo

API call to resource_show. Returns the metadata on a resource.

Parameters:
  • resource_id – resource id.

  • params – See API documentation.

Returns:

_api_resource_view_list(resource_id: str, *, params: dict = None) List[CkanViewInfo]

API call to resource_view_list.

Parameters:

params – typically, the request can be limited to an organization with the owner_org parameter

Returns:

_api_user_list(*, q: str = None, email: str = None, params: dict = None) List[CkanUserInfo]

API call to user_list.

Parameters:

params

Returns:

_api_user_show(*, params: dict = None) CkanUserInfo | None

API call to user_show. With no params, returns information on the currently logged-in user.

Returns:

dict with information on the current user

_enrich_resource_info(resource_info: CkanResourceInfo, *, datastore_info: bool = False, resource_view_list: bool = False) None

Perform additional optional queries to add more information on a resource.

Parameters:
  • resource_info

  • datastore_info – option to query datastore_info

  • resource_view_list – option to query resource_view_list

Returns:

check_package_name_arg(*, package_name: str, package_id: str, raise_error: bool = True) bool

Check the package name argument against the package ID found by the API.

Parameters:
  • package_name – package name, ID or title

  • package_id – package ID known by the API

  • raise_error – Option to raise an error

Returns:

complete_package_list(package_list: str | List[str] = None, *, owner_org: str = None, include_private: bool = True, include_drafts: bool = True, params: dict = None) List[str]

Auxiliary function to initialize a package_list argument: it lists all packages of the CKAN server or of an organization, or keeps the given list as is.

copy(new_identifier: str = None, *, dest=None)

Returns a copy of the current instance. This is useful in a multithreaded context, where each thread works on its own copy of an initialized ckan object. It is recommended to purge the last response before making a copy (with purge_map=False).

datastore_info(resource_id: str, *, params: dict = None, display_request_not_found: bool = True) CkanDataStoreInfo
get_datastore_fields_or_request(resource_id: str, *, request_missing: bool = True, error_not_mapped: bool = False, error_not_found: bool = True, return_list: bool = False) List[dict] | OrderedDict[str, CkanField] | None
get_datastore_info_or_request(resource_name: str, package_name: str = None, *, request_missing: bool = True, error_not_mapped: bool = False, error_not_found: bool = True) CkanDataStoreInfo | None

Get information on a DataStore if present in the map or perform request.

Parameters:
  • resource_name – resource name or id

  • package_name – package name or id (required if the resource name is provided)

  • request_missing – confirm to perform the request if the information is missing

  • error_not_mapped – raise error if the resource is not mapped
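The "present in the map or perform request" pattern used by the get_*_or_request methods amounts to a cache lookup with a fallback request. Here is a hypothetical sketch in which a plain dict stands in for the CKAN map and request stands in for the API call.

```python
from typing import Callable, Dict, Optional


def get_or_request(cache: Dict[str, dict], resource_id: str, request: Callable[[str], dict], *,
                   request_missing: bool = True, error_not_mapped: bool = False) -> Optional[dict]:
    """Hypothetical sketch: return cached information, or request it and store it."""
    if resource_id in cache:
        return cache[resource_id]  # already mapped: no request is sent
    if error_not_mapped:
        raise KeyError(f"resource {resource_id} is not mapped")
    if not request_missing:
        return None  # the caller did not allow a new request
    info = request(resource_id)
    cache[resource_id] = info  # remember the answer for subsequent calls
    return info
```

With this pattern, only the first call for a given id contacts the server. request_missing=False turns the function into a pure lookup.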

Returns:

get_datastore_info_or_request_of_id(resource_id: str, *, request_missing: bool = True, error_not_mapped: bool = False, error_not_found: bool = True) CkanDataStoreInfo | None

Get information on a DataStore if present in the map or perform request.

Parameters:
  • resource_id – resource id

  • request_missing – confirm to perform the request if the information is missing

  • error_not_mapped – raise error if the resource is not mapped

Returns:

get_organization_info_or_request(organization_name: str, *, request_missing: bool = True, error_not_mapped: bool = False, error_not_found: bool = True) CkanOrganizationInfo | None

Get information on an organization if present in the map or perform request.

Parameters:
  • organization_name – organization name or id

  • request_missing – confirm to perform the request if the information is missing

  • error_not_mapped – raise error if the organization is not mapped

Returns:

get_package_info_or_request(package_name: str, *, request_missing: bool = True, error_not_mapped: bool = False, error_not_found: bool = True, datastore_info: bool = None, resource_view_list: bool = None, organization_info: bool = None, license_list: bool = None) CkanPackageInfo | None

Get information on a package if present in the map or perform request.

Parameters:
  • package_name – package name or id

  • request_missing – confirm to perform the request if the information is missing

  • error_not_mapped – raise error if the package is not mapped

Returns:

get_package_page_url(package_name: str, *, error_not_found: bool = True, default_url: bool = False) str

Get URL of package presentation page in CKAN (landing page).

Parameters:
  • package_name

  • error_not_found

  • default_url – return url based on package name, even if it was not found.
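When default_url is used, the page URL is built from the package name. A hypothetical sketch follows, assuming the default CKAN web routes /dataset/<name> and /dataset/<name>/resource/<id>. A customized CKAN instance may use different routes.

```python
def package_page_url(ckan_url: str, package_name: str) -> str:
    """Hypothetical sketch: landing page of a package under the default CKAN routes."""
    return f"{ckan_url.rstrip('/')}/dataset/{package_name}"


def resource_page_url(ckan_url: str, package_name: str, resource_id: str) -> str:
    """Hypothetical sketch: landing page of a resource inside its package page."""
    return f"{package_page_url(ckan_url, package_name)}/resource/{resource_id}"
```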

Returns:

get_resource_id_or_request(resource_name: str, package_name: str, *, request_missing: bool = True, error_not_mapped: bool = False, error_not_found: bool = True) str | None
get_resource_info_or_request(resource_name: str, package_name: str = None, *, request_missing: bool = True, error_not_mapped: bool = False, error_not_found: bool = True, datastore_info: bool = False) CkanResourceInfo | None
get_resource_info_or_request_of_id(resource_id: str, *, request_missing: bool = True, error_not_mapped: bool = False, error_not_found: bool = True, datastore_info: bool = False) CkanResourceInfo | None

Get information on a resource if present in the map or perform request. For this usage, self.map.get_resource_info() is recommended instead, because resource information is already returned with the package information during the mapping process.

Parameters:
  • resource_id – resource id

  • request_missing – confirm to perform the request if the information is missing

  • error_not_mapped – raise error if the resource is not mapped

Returns:

get_resource_page_url(resource_name: str, package_name: str = None, *, error_not_mapped: bool = True) str

Get URL of resource presentation page in CKAN (landing page).

Parameters:

package_name

Returns:

get_resource_view_list_or_request(resource_id: str, error_not_found: bool = True) List[CkanViewInfo] | None

Returns either the resource view list which was already received or emits a new query for this information.

Parameters:
  • resource_id

  • error_not_found

Returns:

group_list(*, limit: int = None, offset: int = 0, groups: List[str] = None, all_fields: bool = True, include_users: bool = True, params: dict = None) List[CkanGroupInfo]
group_list_all(*, all_fields: bool = True, include_users: bool = True, cancel_if_present: bool = False, params: dict = None, limit: int = None, offset: int = None) List[CkanGroupInfo] | List[str]

API call to group_list. The call can be canceled if the list is already present (not recommended, rather use get_organization_info_or_request).

Parameters:
  • params

  • cancel_if_present – option to cancel when list is already present.

Returns:

input_missing_info(*, base_dir: str = None, input_args: bool = False, input_args_if_necessary: bool = False, input_apikey: bool = True, error_not_found: bool = True, input_owner_org: bool = False)

Ask the user for missing information in the console window.

Parameters:

input_owner_org – option to ask for the owner organization.

Returns:

license_list(*, cancel_if_present: bool = True, params: dict = None) List[CkanLicenseInfo]

API call to license_list. The call can be canceled if the list is already present.

Parameters:
  • params

  • cancel_if_present – option to cancel when list is already present.

Returns:

map_resources(package_list: str | List[str] = None, *, params: dict = None, datastore_info: bool = None, resource_view_list: bool = None, organization_info: bool = None, license_list: bool = None, only_missing: bool = True, error_not_found: bool = True, owner_org: str = None, progress_callback: CkanProgressCallbackABC = None) CkanMap

Map the resources of the given packages to obtain the resource IDs associated with each package name and its resources.

Parameters:
  • package_list – List of packages to request. If not provided, the result of package_search is used.

  • params – Additional parameters to pass to all API calls (not recommended).

  • datastore_info – If True, enables the request of the API datastore_info to return information about DataStore fields, aliases, and row count. Required to search a DataStore by alias.

  • resource_view_list – If True, enables the request of the view_list API for each resource.

  • organization_info – If True, enables the request of the organization_list API before other requests.

  • license_list – If True, enables the request of the license_list API.

  • only_missing – If True, skips requesting already-mapped packages.

  • error_not_found – If True, packages not found by the API are ignored (no error is raised).

  • owner_org – Filters packages by a specific organization (only if package_search is used).

Returns:

A mapping of resources for the specified package(s).

Note

  • Packages were previously referred to as DataSets in earlier CKAN implementations.

  • A single name can be shared across multiple resources within a package. In such cases, the first occurrence is used as a reference, and a warning is issued.
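The duplicate-name rule in the note above can be sketched for a single package_show result. This is a hypothetical helper rather than the library's CkanMap logic: it maps resource names to ids, keeps the first occurrence of a repeated name, and issues a warning.

```python
import warnings
from typing import Any, Dict


def map_package_resources(package_info: Dict[str, Any]) -> Dict[str, str]:
    """Hypothetical sketch: resource name -> resource id for one package."""
    name_to_id: Dict[str, str] = {}
    for resource in package_info.get("resources", []):
        name, resource_id = resource.get("name"), resource["id"]
        if name in name_to_id:
            # Duplicate name: keep the first occurrence as reference and warn
            warnings.warn(f"Resource name {name!r} appears several times in package "
                          f"{package_info.get('name')!r}; using the first occurrence")
            continue
        name_to_id[name] = resource_id
    return name_to_id


package = {"name": "demo", "resources": [
    {"id": "1", "name": "a"}, {"id": "2", "name": "b"}, {"id": "3", "name": "a"}]}
```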

map_user_rights(*, cancel_if_present: bool = True, progress_callback: CkanProgressCallbackABC = None) CkanMap

Map user and group access rights to the packages currently mapped by CKAN.

organization_list_all(*, cancel_if_present: bool = False, params: dict = None, all_fields: bool = True, include_users: bool = False, limit: int = None, offset: int = None) List[CkanOrganizationInfo] | List[str]

API call to organization_list. The call can be canceled if the list is already present (not recommended, rather use get_organization_info_or_request).

Parameters:
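  • params – Additional parameters to pass to the API call (optional).

  • cancel_if_present – option to cancel when list is already present.

Returns:

list of organizations, as CkanOrganizationInfo objects or as organization names (str)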

organization_show(id: str, *, params: dict = None) CkanOrganizationInfo
package_collaborator_list(package_id: str, *, params: dict = None, cancel_if_present: bool = False) Dict[str, CkanCollaboration]
package_search_all(*, params: dict = None, owner_org: str = None, filter: dict = None, q: str = None, include_private: bool = True, include_drafts: bool = True, sort: str = None, facet: bool = None, limit: int = None, offset: int = None, search_all: bool = True) List[CkanPackageInfo]
package_show(package_id, *, params: dict = None) CkanPackageInfo
purge(purge_map: bool = False) None

Erase temporary data stored in this object

Parameters:

purge_map – whether to purge the map created with map_resources

query_current_user(*, verbose: bool = None, error_not_found: bool = False) CkanUserInfo | None
remap_resources(*, params=None, purge: bool = True, datastore_info: bool = None, resource_view_list: bool = None, organization_info: bool = None, license_list: bool = None)

Perform a new request on previously mapped packages.

Parameters:
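  • params – Additional parameters to pass to the API calls (optional).

  • purge – option to reset the map before remapping.

  • datastore_info – enforce the request of the datastore_info API

  • resource_view_list – enforce the request of the view_list API for each resource

  • organization_info – enforce the request of the organization_list API

  • license_list – enforce the request of the license_list API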

resource_is_datastore(resource_id: str) bool

Basic test to determine whether a resource is a DataStore resource.

Parameters:

resource_id – resource id

Returns:

True if the resource is a DataStore resource

resource_show(resource_id, *, params: dict = None) CkanResourceInfo
resource_view_list(resource_id: str, *, params: dict = None) List[CkanViewInfo]
set_default_map_mode(datastore_info: bool = None, resource_view_list: bool = None, organization_info: bool = None, license_list: bool = None) None

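Set up the optional queries orchestrated by the map_resources function (the default values used when these arguments are not given to map_resources).

Parameters:
  • datastore_info – enable the request of the datastore_info API

  • resource_view_list – enable the request of the view_list API for each resource

  • organization_info – enable the request of the organization_list API

  • license_list – enable the request of the license_list API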

set_owner_org(owner_org: str, *, error_not_found: bool = True) None

Set the default owner organization.

Parameters:

owner_org – owner organization name, title or id.

test_ckan_connection(raise_error: bool = False) bool

Test whether the CKAN URL points to a CKAN server by calling the package_search API. This does not check authentication.

test_ckan_login(*, raise_error: bool = False, verbose: bool = None, empty_key_connected: bool = False) bool

Test if your login leads to a user account.

Parameters:
  • raise_error – option to raise an error if no account was detected

  • verbose – option to display username in console

  • empty_key_connected – option to ignore the test if the API key is empty

user_list(*, cancel_if_present: bool = False, q: str = None, email: str = None, params: dict = None) List[CkanUserInfo]

API call to user_list. The call can be canceled if the list is already present.

Parameters:
  • params – Additional parameters to pass to the API call (optional).

  • cancel_if_present – option to cancel when list is already present.

Returns:

list of users (CkanUserInfo objects)

ckanapi_harvesters.ckan_api.ckan_api_2_readonly module

class ckanapi_harvesters.ckan_api.ckan_api_2_readonly.CkanApiReadOnly(url: str = None, *, proxies: str | dict | ProxyConfig = None, apikey: str | CkanApiKey = None, apikey_file: str = None, owner_org: str = None, params: CkanApiReadOnlyParams = None, map: CkanMap = None, identifier=None)

Bases: CkanApiMap

CKAN Database API interface to CKAN server with helper functions using pandas DataFrames. This class implements requests to read data from the CKAN server resources / DataStores.

__init__(url: str = None, *, proxies: str | dict | ProxyConfig = None, apikey: str | CkanApiKey = None, apikey_file: str = None, owner_org: str = None, params: CkanApiReadOnlyParams = None, map: CkanMap = None, identifier=None)

CKAN Database API interface to CKAN server with helper functions using pandas DataFrames.

Parameters:
  • url – url of the CKAN server

  • proxies – proxies to use for requests

  • apikey – way to provide the API key directly (optional)

  • apikey_file – path to a file containing a valid API key in the first line of text (optional)

  • owner_org – name of the organization to limit package_search (optional)

  • params – other connection/behavior parameters

  • map – map of known resources

  • identifier – identifier of the ckan client

_api_datastore_dump_all(resource_id: str, *, filters: dict = None, q: str = None, fields: List[str] = None, sort: str = None, limit: int = None, offset: int = 0, format: str = None, bom: bool = None, requests_limit: int = None, params: dict = None, search_all: bool = True, return_df: bool = True, progress_callback: CkanProgressCallbackABC = None) DataFrame | Response

Successive calls to _api_datastore_dump_df until an empty list is received.

See:

_api_datastore_dump_raw()

Parameters:
  • resource_id – resource id.

  • filters – The base argument to filter values in a table (optional)

  • q – Full text query (optional)

  • fields – The base argument to filter columns (optional)

  • format – Format of the returned response: csv (default), tsv, json or xml (optional)

  • params – Additional parameters such as filters, q, sort and fields can be given. See DataStore API documentation.

  • search_all – if False, only the first request is performed

_api_datastore_dump_all_page_generator(resource_id: str, *, filters: dict = None, q: str = None, fields: List[str] = None, sort: str = None, limit: int = None, offset: int = 0, requests_limit: int = None, progress_callback: CkanProgressCallbackABC = None, format: str = None, bom: bool = None, params: dict = None, search_all: bool = True, return_df: bool = True) Generator[DataFrame, Any, None] | Generator[Response, Any, None]

Successive calls to _api_datastore_dump until an empty list is received. Generator implementation which yields one DataFrame per request.

See:

_api_datastore_dump_raw()

Parameters:
  • resource_id – resource id.

  • filters – The base argument to filter values in a table (optional)

  • q – Full text query (optional)

  • fields – The base argument to filter columns (optional)

  • format – Format of the returned response: csv (default), tsv, json or xml (optional)

  • params – Additional parameters such as filters, q, sort and fields can be given. See DataStore API documentation.

  • search_all – if False, only the first request is performed

_api_datastore_dump_df(resource_id: str, *, filters: dict = None, q: str = None, fields: List[str] = None, sort: str = None, limit: int = None, offset: int = 0, format: str = None, bom: bool = None, params: dict = None) DataFrame

Convert output of _api_datastore_dump_raw to pandas DataFrame.

_api_datastore_dump_raw(resource_id: str, *, filters: dict = None, q: str = None, fields: List[str] = None, sort: str = None, limit: int = None, offset: int = 0, format: str = None, bom: bool = None, params: dict = None, compute_len: bool = False) Response

Request to the datastore/dump URL. Dumps successive rows of the DataStore.
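As a rough illustration of a single datastore/dump request, the sketch below builds the request URL with its query string. It assumes the endpoint layout {ckan_url}/datastore/dump/{resource_id} and that the query-string names (format, bom, limit, offset, q, fields, sort) match those of CKAN's DataStore dump endpoint; the helper datastore_dump_url is hypothetical, and the accepted parameter set may differ between CKAN versions.

```python
from typing import List, Optional
from urllib.parse import urlencode


def datastore_dump_url(ckan_url: str, resource_id: str, *, format: str = "csv",
                       bom: Optional[bool] = None, limit: Optional[int] = None,
                       offset: int = 0, q: Optional[str] = None,
                       fields: Optional[List[str]] = None,
                       sort: Optional[str] = None) -> str:
    """Build a datastore/dump URL (assumed endpoint layout); None options are omitted."""
    query = {"format": format, "offset": offset}
    if bom is not None:
        query["bom"] = "true" if bom else "false"
    if limit is not None:
        query["limit"] = limit
    if q is not None:
        query["q"] = q
    if fields is not None:
        query["fields"] = ",".join(fields)  # comma-separated column list
    if sort is not None:
        query["sort"] = sort
    return f"{ckan_url.rstrip('/')}/datastore/dump/{resource_id}?{urlencode(query)}"


print(datastore_dump_url("https://ckan.example.org/", "abc", limit=10))
# https://ckan.example.org/datastore/dump/abc?format=csv&offset=0&limit=10
```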

Parameters:
  • resource_id – resource id.

  • filters – The base argument to filter values in a table (optional)

  • q – Full text query (optional)

  • fields – The base argument to filter columns (optional)

  • format – Format of the returned response: csv (default), tsv, json or xml (optional)

  • params – Additional parameters such as filters, q, sort and fields can be given. See DataStore API documentation.

Returns:

raw response

_api_datastore_search_all(resource_id: str, *, filters: dict = None, q: str = None, fields: List[str] = None, distinct: bool = None, sort: str = None, limit: int = None, offset: int = 0, format: str = None, requests_limit: int = None, progress_callback: CkanProgressCallbackABC = None, search_all: bool = True, params: dict = None, return_df: bool = True, compute_len: bool = False) DataFrame | ListRecords | Any

Successive calls to _api_datastore_search_df until an empty list is received.
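The "successive calls until an empty list is received" pattern can be sketched as an offset/limit loop. This is a simplified model, not the library's code: fetch_page is a hypothetical callable standing in for one datastore_search request, and the offset advances by the number of records actually received. search_all=False and requests_limit stop the loop early, mirroring the parameters documented here.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, object]


def fetch_all_pages(fetch_page: Callable[[int, int], List[Record]], *,
                    limit: int = 100, offset: int = 0,
                    requests_limit: Optional[int] = None,
                    search_all: bool = True) -> List[Record]:
    """Concatenate pages returned by fetch_page(offset, limit) until a page is empty."""
    records: List[Record] = []
    n_requests = 0
    while True:
        page = fetch_page(offset, limit)
        n_requests += 1
        if not page:
            break  # an empty page means there are no more records
        records.extend(page)
        offset += len(page)  # advance by what was actually received
        if not search_all:
            break  # only the first request is performed
        if requests_limit is not None and n_requests >= requests_limit:
            break  # request budget exhausted
    return records


# Usage with an in-memory table of 250 rows standing in for a DataStore
table = [{"id": i} for i in range(250)]
rows = fetch_all_pages(lambda off, lim: table[off:off + lim], limit=100)
print(len(rows))  # 250 (3 non-empty pages, then 1 empty page)
```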

See:

_api_datastore_search_raw()

Parameters:
  • resource_id – resource id.

  • filters – The base argument to filter values in a table (optional)

  • q – Full text query (optional)

  • fields – The base argument to filter columns (optional)

  • distinct – return only distinct rows (optional, default: False), e.g. to return distinct ids: fields="id", distinct=True

  • sort – Argument to sort results, e.g. sort="index, quantity desc" or sort="index asc"

  • limit – Limit the number of records per request

  • offset – Offset in the returned records

  • format – Format of the returned response: objects (default), csv, tsv or lists (optional)

  • params – Additional parameters such as filters, q, sort and fields can be given. See DataStore API documentation.

  • search_all – if False, only the first request is performed

_api_datastore_search_all_page_generator(resource_id: str, *, filters: dict = None, q: str = None, fields: List[str] = None, distinct: bool = None, sort: str = None, limit: int = None, offset: int = 0, format: str = None, search_all: bool = True, requests_limit: int = None, progress_callback: CkanProgressCallbackABC = None, params: dict = None, return_df: bool = True) Generator[DataFrame, Any, None] | Generator[CkanActionResponse, Any, None]

Successive calls to _api_datastore_search_df until an empty list is received. Generator implementation which yields one DataFrame per request.
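The generator variant yields each page as soon as it is received, so a caller can stop early without issuing the remaining requests. A minimal sketch, again with a hypothetical fetch_page callable in place of the real API call (not the library's implementation):

```python
from typing import Callable, Dict, Iterator, List, Optional

Record = Dict[str, object]


def page_generator(fetch_page: Callable[[int, int], List[Record]], *,
                   limit: int = 100, offset: int = 0,
                   requests_limit: Optional[int] = None) -> Iterator[List[Record]]:
    """Yield one page per request until an empty page is received."""
    n_requests = 0
    while requests_limit is None or n_requests < requests_limit:
        page = fetch_page(offset, limit)
        n_requests += 1
        if not page:
            return  # no more records
        yield page
        offset += len(page)


table = [{"id": i} for i in range(250)]
calls = []


def fake_fetch(offset: int, limit: int) -> List[Record]:
    calls.append(offset)  # record which offsets were requested
    return table[offset:offset + limit]


first_page = next(page_generator(fake_fetch, limit=100))
print(len(first_page), calls)  # 100 [0]  (only one request was sent)
```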

See:

_api_datastore_search_raw()

Parameters:
  • resource_id – resource id.

  • filters – The base argument to filter values in a table (optional)

  • q – Full text query (optional)

  • fields – The base argument to filter columns (optional)

  • distinct – return only distinct rows (optional, default: False), e.g. to return distinct ids: fields="id", distinct=True

  • sort – Argument to sort results, e.g. sort="index, quantity desc" or sort="index asc"

  • limit – Limit the number of records per request

  • offset – Offset in the returned records

  • format – Format of the returned response: objects (default), csv, tsv or lists (optional)

  • params – Additional parameters such as filters, q, sort and fields can be given. See DataStore API documentation.

  • search_all – if False, only the first request is performed

_api_datastore_search_df(resource_id: str, *, filters: dict = None, q: str = None, fields: List[str] = None, distinct: bool = None, sort: str = None, limit: int = None, offset: int = 0, format: str = None, params: dict = None, compute_len: bool = True) DataFrame

Convert output of _api_datastore_search_raw to pandas DataFrame.
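Converting a datastore_search response starts from the JSON envelope of the CKAN action API: a "success" flag and a "result" object which, for datastore_search, holds the "fields", "records" and "total" entries. The sketch below (hypothetical helper parse_datastore_search, standard library only, records in the default "objects" format) extracts the column names, records and total row count; it does not reproduce any additional cleaning the library may apply.

```python
import json
from typing import Any, Dict, List, Tuple


def parse_datastore_search(body: str) -> Tuple[List[str], List[Dict[str, Any]], int]:
    """Return (column names, records, total) from a datastore_search JSON response."""
    payload = json.loads(body)
    if not payload.get("success"):
        # CKAN reports failures with success=false and an "error" object
        raise RuntimeError(f"CKAN API error: {payload.get('error')}")
    result = payload["result"]
    columns = [field["id"] for field in result["fields"]]
    records = result["records"]  # list of dicts in the default "objects" format
    total = result.get("total", len(records))
    return columns, records, total


sample = json.dumps({
    "success": True,
    "result": {
        "fields": [{"id": "_id", "type": "int"}, {"id": "name", "type": "text"},
                   {"id": "qty", "type": "int"}],
        "records": [{"_id": 1, "name": "a", "qty": 3}, {"_id": 2, "name": "b", "qty": 5}],
        "total": 2,
    },
})
columns, records, total = parse_datastore_search(sample)
print(columns, total)  # ['_id', 'name', 'qty'] 2
```

With pandas installed, pandas.DataFrame.from_records(records, columns=columns) then builds the DataFrame.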

_api_datastore_search_raw(resource_id: str, *, filters: dict = None, q: str = None, fields: List[str] = None, distinct: bool = None, sort: str = None, limit: int = None, offset: int = 0, format: str = None, params: dict = None, compute_len: bool = False) CkanActionResponse

API call to datastore_search. Performs queries on the DataStore.
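For reference, a CKAN action such as datastore_search can be called with a POST to {ckan_url}/api/3/action/datastore_search and a JSON body. The sketch below assembles such a body from the arguments documented here, leaving out unset options. The helper name is hypothetical, and mapping the format argument onto CKAN's records_format key is an assumption about how this library translates it.

```python
import json
from typing import Any, Dict, List, Optional


def datastore_search_payload(resource_id: str, *,
                             filters: Optional[Dict[str, Any]] = None,
                             q: Optional[str] = None,
                             fields: Optional[List[str]] = None,
                             distinct: Optional[bool] = None,
                             sort: Optional[str] = None,
                             limit: Optional[int] = None,
                             offset: int = 0,
                             format: Optional[str] = None) -> Dict[str, Any]:
    """Build a datastore_search JSON body, leaving out options that are None."""
    payload: Dict[str, Any] = {"resource_id": resource_id, "offset": offset}
    optional = {
        "filters": filters, "q": q, "fields": fields, "distinct": distinct,
        "sort": sort, "limit": limit,
        "records_format": format,  # assumption: format maps to CKAN's records_format
    }
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


body = datastore_search_payload("abc", filters={"country": "FR"},
                                sort="index asc", limit=100)
print(json.dumps(body))
```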

Parameters:
  • resource_id – resource id.

  • filters – The base argument to filter values in a table (optional)

  • q – Full text query (optional)

  • fields – The base argument to filter columns (optional)

  • distinct – return only distinct rows (optional, default: False), e.g. to return distinct ids: fields="id", distinct=True

  • sort – Argument to sort results, e.g. sort="index, quantity desc" or sort="index asc"

  • limit – Limit the number of records per request

  • offset – Offset in the returned records

  • format – Format of the returned response: objects (default), csv, tsv or lists (optional)

  • params – Additional parameters such as filters, q, sort and fields can be given. See DataStore API documentation.

_api_datastore_search_sql_all(sql: str, *, params: dict = None, search_all: bool = True, limit: int = None, offset: int = 0, requests_limit: int = None, progress_callback: CkanProgressCallbackABC = None, return_df: bool = True) DataFrame | ListRecords

Successive calls to _api_datastore_search_sql until an empty list is received.

See:

_api_datastore_search_sql_raw()

Parameters:
  • sql – SQL query, e.g. f'SELECT * FROM "{resource_id}" WHERE "USER_ID" < 0'

  • limit – Limit the number of records per request

  • offset – Offset in the returned records

  • params – N/A

  • search_all – if False, only the first request is performed

_api_datastore_search_sql_all_page_generator(sql: str, *, params: dict = None, search_all: bool = True, limit: int = None, offset: int = 0, requests_limit: int = None, progress_callback: CkanProgressCallbackABC = None, return_df: bool = True) Generator[DataFrame, Any, None] | Generator[CkanActionResponse, Any, None]

Successive calls to _api_datastore_search_sql until an empty list is received. Generator implementation which yields one DataFrame per request.

See:

_api_datastore_search_sql_raw()

Parameters:
  • sql – SQL query, e.g. f'SELECT * FROM "{resource_id}" WHERE "USER_ID" < 0'

  • limit – Limit the number of records per request

  • offset – Offset in the returned records

  • params – N/A

  • search_all – if False, only the first request is performed

_api_datastore_search_sql_df(sql: str, *, params: dict = None, limit: int = None, offset: int = 0) DataFrame

Convert output of _api_datastore_search_sql_raw to pandas DataFrame.

_api_datastore_search_sql_raw(sql: str, *, params: dict = None, limit: int = None, offset: int = 0) CkanActionResponse

API call to datastore_search_sql. Performs SQL queries on the DataStore, which allows more complex queries than datastore_search. DataStore tables are referenced by their resource_id, surrounded by double quotes. Field names are referenced by their exact (case-sensitive) name, surrounded by double quotes.

NB: This action is not available when ckan.datastore.sqlsearch.enabled is set to false.
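Because resource ids and field names appear as double-quoted identifiers in these queries, a query string can be assembled with a small quoting helper. This is a sketch assuming PostgreSQL identifier rules (wrap in double quotes, double any embedded double quote); quote_ident and build_sql are hypothetical names, and the numeric value is coerced with int() rather than interpolated as raw text.

```python
def quote_ident(name: str) -> str:
    """Quote an SQL identifier PostgreSQL-style: wrap in double quotes, double inner quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_sql(resource_id: str, field: str, threshold: int) -> str:
    # The resource id acts as the table name; field names are case-sensitive identifiers
    return (f"SELECT * FROM {quote_ident(resource_id)} "
            f"WHERE {quote_ident(field)} < {int(threshold)}")


print(build_sql("1a2b3c", "USER_ID", 0))
# SELECT * FROM "1a2b3c" WHERE "USER_ID" < 0
```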

Parameters:
  • sql – SQL query, e.g. f'SELECT * FROM "{resource_id}" WHERE "USER_ID" < 0'

  • limit – Limit the number of records per request

  • offset – Offset in the returned records

  • params – N/A

static _get_default_bom_option(bom: bool = None, format: str = None, search_method: bool = False) bool | None

API datastore_dump includes an option to return the BOM (Byte Order Mark) for requests in CSV/TSV format. The BOM helps text-processing tools and applications determine the encoding of the file e.g. to distinguish between UTF-8 and UTF-16.

Note

To handle BOM characters correctly in pandas.read_csv, specify the encoding="utf-8-sig" parameter. This is taken into account in the decoding function.
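The effect of the BOM can be reproduced with the standard library alone. In this sketch the bytes imitate a small CSV dump starting with a UTF-8 BOM: decoding with "utf-8" leaves the U+FEFF character attached to the first column name, while "utf-8-sig" strips it, which is what encoding="utf-8-sig" achieves in pandas.read_csv. The helper first_row is hypothetical.

```python
import csv
import io
from typing import List

# UTF-8 bytes of a small CSV dump that starts with a BOM (EF BB BF)
raw = "\ufeffid,name\r\n1,alpha\r\n".encode("utf-8")


def first_row(data: bytes, encoding: str) -> List[str]:
    """Decode the bytes and return the header row parsed by the csv module."""
    return next(csv.reader(io.StringIO(data.decode(encoding))))


print(first_row(raw, "utf-8"))      # ['\ufeffid', 'name']  (BOM kept in the first column name)
print(first_row(raw, "utf-8-sig"))  # ['id', 'name']
```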

_rx_records_df_clean(df: DataFrame) None

Auxiliary function to clean a DataFrame obtained from DataStore requests.

Parameters:

df – DataFrame to clean

datastore_dump(resource_id: str, *, filters: dict = None, q: str = None, fields: List[str] = None, distinct: bool = None, sort: str = None, limit: int = None, offset: int = 0, requests_limit: int = None, progress_callback: CkanProgressCallbackABC = None, params: dict = None, search_all: bool = True, search_method: bool = True, format: str = None, return_df: bool = True) DataFrame | ListRecords | Any | List[CkanActionResponse]

Alias of datastore_search() with search_all=True by default. By default (search_method=True), it uses the datastore_search API; set search_method=False to use datastore_dump instead.

See:

datastore_search()

Parameters:
  • resource_id – resource id.

  • filters – The base argument to filter values in a table (optional)

  • q – Full text query (optional)

  • fields – The base argument to filter columns (optional)

  • distinct – return only distinct rows (optional, default: False), e.g. to return distinct ids: fields="id", distinct=True

  • sort – Argument to sort results, e.g. sort="index, quantity desc" or sort="index asc"

  • limit – Limit the number of records per request

  • offset – Offset in the returned records

  • requests_limit – Limit the number of requests

  • progress_callback – Progress callback function

  • params – Additional parameters such as filters, q, sort and fields can be given. See DataStore API documentation.

  • search_all – Option to renew the request until there are no more records.

  • search_method – API method selection (True=datastore_search, False=datastore_dump)

  • return_df – Return pandas DataFrame (True) or dict (False)

datastore_dump_page_generator(resource_id: str, *, filters: dict = None, q: str = None, fields: List[str] = None, distinct: bool = None, sort: str = None, limit: int = None, offset: int = 0, requests_limit: int = None, progress_callback: CkanProgressCallbackABC = None, params: dict = None, search_all: bool = True, search_method: bool = True, format: str = None, bom: bool = None, return_df: bool = True) Generator[DataFrame, Any, None] | Generator[CkanActionResponse, Any, None]

Alias of datastore_search_page_generator() with search_all=True by default. By default (search_method=True), it uses the datastore_search API.

See:

datastore_search_page_generator()

Parameters:
  • resource_id – resource id.

  • filters – The base argument to filter values in a table (optional)

  • q – Full text query (optional)

  • fields – The base argument to filter columns (optional)

  • distinct – return only distinct rows (optional, default: False), e.g. to return distinct ids: fields="id", distinct=True

  • sort – Argument to sort results, e.g. sort="index, quantity desc" or sort="index asc"

  • limit – Limit the number of records per request

  • requests_limit – Limit the number of requests

  • progress_callback – Progress callback function

  • offset – Offset in the returned records

  • params – Additional parameters such as filters, q, sort and fields can be given. See DataStore API documentation.

  • search_all – Option to renew the request until there are no more records.

  • search_method – API method selection (True=datastore_search, False=datastore_dump)

Preferred entry-point for a DataStore read request. Uses the API datastore_search

Parameters:
  • resource_id – resource id.

  • filters – The base argument to filter values in a table (optional)

  • q – Full text query (optional)

  • fields – The base argument to filter columns (optional)

  • distinct – return only distinct rows (optional, default: False), e.g. to return distinct ids: fields="id", distinct=True

  • sort – Argument to sort results, e.g. sort="index, quantity desc" or sort="index asc"

  • limit – Limit the number of records per request

  • offset – Offset in the returned records

  • requests_limit – Limit the number of requests

  • progress_callback – Progress callback function

  • params – Additional parameters such as filters, q, sort and fields can be given. See DataStore API documentation.

  • search_all – Option to renew the request until there are no more records.

  • search_method – API method selection (True=datastore_search, False=datastore_dump)

datastore_search_cursor(resource_id: str, *, filters: dict = None, q: str = None, fields: List[str] = None, distinct: bool = None, sort: str = None, limit: int = None, offset: int = 0, total_limit: int = None, requests_limit: int = None, progress_callback: CkanProgressCallbackABC = None, params: dict = None, search_all: bool = True, search_method: bool = True, format: str = None, bom: bool = None, return_df: bool = False) Generator[Series | dict | list | str, Any, None]

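Cursor over the rows returned by datastore_search: a generator yielding one row at a time (a pandas Series if return_df is True, otherwise a dict, list or str depending on the requested format).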
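Conceptually, such a cursor flattens the pages produced by a page generator into individual rows and stops once total_limit rows have been returned. A minimal sketch with itertools, where pages is any iterable of lists of rows (a hypothetical stand-in for the page generator, not the library's code); because both steps are lazy, pages beyond the limit are not requested.

```python
from itertools import chain, islice
from typing import Dict, Iterable, Iterator, List, Optional


def row_cursor(pages: Iterable[List[Dict]], total_limit: Optional[int] = None) -> Iterator[Dict]:
    """Yield rows one at a time from successive pages, stopping after total_limit rows."""
    rows = chain.from_iterable(pages)  # flatten lazily, page by page
    if total_limit is not None:
        rows = islice(rows, total_limit)
    yield from rows


fetched = []


def pages():
    # Hypothetical lazy page source: three pages of two rows each
    for start in (0, 2, 4):
        fetched.append(start)
        yield [{"id": start}, {"id": start + 1}]


print([row["id"] for row in row_cursor(pages(), total_limit=3)])  # [0, 1, 2]
print(fetched)  # [0, 2]  (the third page was never requested)
```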

Parameters:
  • resource_id – resource id.

  • filters – The base argument to filter values in a table (optional)

  • q – Full text query (optional)

  • fields – The base argument to filter columns (optional)

  • distinct – return only distinct rows (optional, default: False), e.g. to return distinct ids: fields="id", distinct=True

  • sort – Argument to sort results, e.g. sort="index, quantity desc" or sort="index asc"

  • limit – Limit the number of records per request

  • offset – Offset in the returned records

  • total_limit – Limit the number of records to return

  • requests_limit – Limit the number of requests

  • progress_callback – Progress callback function

  • params – Additional parameters such as filters, q, sort and fields can be given. See DataStore API documentation.

  • search_all – Option to renew the request until there are no more records.

  • search_method – API method selection (True=datastore_search, False=datastore_dump)

  • return_df – Return pandas Series (True) or dict (False)

  • format – Format of the data requested through the API. This does not change the output if return_df is True.

datastore_search_fields_type_dict(resource_id: str, *, filters: dict = None, q: str = None, distinct: bool = None, fields: List[str] = None, request_missing: bool = True, error_not_mapped: bool = False, error_not_found: bool = True) OrderedDict
datastore_search_find_one(resource_id: str, *, filters: dict = None, q: str = None, distinct: bool = None, fields: List[str] = None, offset: int = 0, return_df: bool = True) DataFrame | ListRecords | Any | List[CkanActionResponse]

Request the first result of a query.

Parameters:

resource_id – resource id

datastore_search_page_generator(resource_id: str, *, filters: dict = None, q: str = None, fields: List[str] = None, distinct: bool = None, sort: str = None, limit: int = None, offset: int = 0, requests_limit: int = None, progress_callback: CkanProgressCallbackABC = None, params: dict = None, search_all: bool = True, search_method: bool = True, format: str = None, bom: bool = None, return_df: bool = True) Generator[DataFrame, Any, None] | Generator[CkanActionResponse, Any, None] | Generator[Response, Any, None]

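Preferred entry-point for a DataStore read request, as a generator that yields one page per request (a DataFrame if return_df is True). By default (search_method=True), it uses the datastore_search API.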

Parameters:
  • resource_id – resource id.

  • filters – The base argument to filter values in a table (optional)

  • q – Full text query (optional)

  • fields – The base argument to filter columns (optional)

  • distinct – return only distinct rows (optional, default: False), e.g. to return distinct ids: fields="id", distinct=True

  • sort – Argument to sort results, e.g. sort="index, quantity desc" or sort="index asc"

  • limit – Limit the number of records per request

  • offset – Offset in the returned records

  • requests_limit – Limit the number of requests

  • progress_callback – Progress callback function

  • params – Additional parameters such as filters, q, sort and fields can be given. See DataStore API documentation.

  • search_all – Option to renew the request until there are no more records.

  • search_method – API method selection (True=datastore_search, False=datastore_dump)

  • return_df – Return pandas DataFrame (True) or dict (False)

datastore_search_row_count(resource_id: str, *, filters: dict = None, q: str = None, distinct: bool = None, fields: List[str] = None) int

Request the number of rows in a DataStore (restricted to the rows matching filters and q, if given).

Parameters:

resource_id – resource id

Returns:

number of rows

datastore_search_sql(sql: str, *, params: dict = None, search_all: bool = False, limit: int = None, offset: int = 0, requests_limit: int = None, progress_callback: CkanProgressCallbackABC = None, return_df: bool = True) DataFrame | Tuple[ListRecords, dict]

Preferred entry-point for a DataStore SQL request.

See:

_api_datastore_search_sql_raw()

NB: This action is not available when ckan.datastore.sqlsearch.enabled is set to false.

Parameters:
  • sql – SQL query, e.g. f'SELECT * FROM "{resource_id}" WHERE "USER_ID" < 0'

  • limit – Limit the number of records per request

  • offset – Offset in the returned records

  • requests_limit – Limit the number of requests

  • progress_callback – Progress callback function

  • params – N/A

  • search_all – Option to renew the request until there are no more records.

  • return_df – Return pandas DataFrame (True) or dict (False)

datastore_search_sql_cursor(sql: str, *, params: dict = None, search_all: bool = True, limit: int = None, offset: int = 0, total_limit: int = None, requests_limit: int = None, progress_callback: CkanProgressCallbackABC = None, return_df: bool = False) Generator[Series | dict, Any, None]

Preferred entry-point for a DataStore SQL request, to iterate over records.

See:

_api_datastore_search_sql_raw()

NB: This action is not available when ckan.datastore.sqlsearch.enabled is set to false.

Parameters:
  • sql – SQL query, e.g. f'SELECT * FROM "{resource_id}" WHERE "USER_ID" < 0'

  • limit – Limit the number of records per request

  • offset – Offset in the returned records

  • total_limit – Limit the number of records to return

  • requests_limit – Limit the number of requests

  • progress_callback – Progress callback function

  • params – N/A

  • search_all – Option to renew the request until there are no more records.

  • return_df – Return pandas Series (True) or dict (False)

datastore_search_sql_fields_type_dict(sql: str, *, params: dict = None) OrderedDict
datastore_search_sql_find_one(sql: str, *, params: dict = None, offset: int = 0, return_df: bool = True) DataFrame | Tuple[ListRecords, dict]

Request the first record returned by an SQL query.

datastore_search_sql_page_generator(sql: str, *, params: dict = None, search_all: bool = True, limit: int = None, offset: int = 0, requests_limit: int = None, progress_callback: CkanProgressCallbackABC = None, return_df: bool = True) Generator[DataFrame, Any, None] | Generator[CkanActionResponse, Any, None]

Preferred entry-point for a DataStore SQL request, as a generator that yields one page per request.

See:

_api_datastore_search_sql_raw()

NB: This action is not available when ckan.datastore.sqlsearch.enabled is set to false.

Parameters:
  • sql – SQL query, e.g. f'SELECT * FROM "{resource_id}" WHERE "USER_ID" < 0'

  • limit – Limit the number of records per request

  • offset – Offset in the returned records

  • requests_limit – Limit the number of requests

  • progress_callback – Progress callback function

  • params – N/A

  • search_all – Option to renew the request until there are no more records.

  • return_df – Return pandas DataFrame (True) or dict (False)

datastore_search_sql_row_count(sql: str, *, params: dict = None) int
static from_dict_df_args(fields_type_dict: OrderedDict) dict
list_datastore_aliases() List[CkanAliasInfo]
map_file_resource_sizes(*, cancel_if_present: bool = True, progress_callback: CkanProgressCallbackABC = None) None
map_resources(package_list: str | List[str] = None, *, params: dict = None, datastore_info: bool = None, resource_view_list: bool = None, organization_info: bool = None, license_list: bool = None, only_missing: bool = True, error_not_found: bool = True, owner_org: str = None, progress_callback: CkanProgressCallbackABC = None) CkanMap

Map the resources of a given package to obtain resource IDs associated with the package name and its resources.

Parameters:
  • package_list – List of packages to request. If not provided, the result of package_search is used.

  • params – Additional parameters to pass to all API calls (not recommended).

  • datastore_info – If True, enables the request of the API datastore_info to return information about DataStore fields, aliases, and row count. Required to search a DataStore by alias.

  • resource_view_list – If True, enables the request of the view_list API for each resource.

  • organization_info – If True, enables the request of the organization_list API before other requests.

  • license_list – If True, enables the request of the license_list API.

  • only_missing – If True, skips requesting already-mapped packages.

  • error_not_found – If True, an error is raised for packages not found by the API; if False, they are ignored.

  • owner_org – Filters packages by a specific organization (only if package_search is used).

Returns:

A mapping of resources for the specified package(s).

Note

  • In earlier CKAN implementations, packages were referred to as DataSets.

  • A single name can be shared across multiple resources within a package. In such cases, the first occurrence is used as a reference, and a warning is issued.

static read_fields_df_args(fields_type_dict: OrderedDict) dict
static read_fields_type_dict(fields_list_dict: List[dict]) OrderedDict
resource_download(resource_id: str, *, method: str = None, proxies: dict = None, headers: dict = None, auth: AuthBase | Tuple[str, str] = None, verify: bool | str | None = None, stream: bool = False) Tuple[CkanResourceInfo, Response | None]

Uses the link provided in resource_show to download a resource.

Parameters:

resource_id – resource id

Returns:

tuple of the resource metadata (CkanResourceInfo) and the download response, which may be None

resource_download_df(resource_id: str, *, method: str = None, proxies: dict = None, headers: dict = None, auth: AuthBase | Tuple[str, str] = None, verify: bool | str | None = None) Tuple[CkanResourceInfo, DataFrame | None]

Uses the link provided in resource_show to download a resource and interprets it as a DataFrame.

Parameters:

resource_id – resource id

Returns:

tuple of the resource metadata (CkanResourceInfo) and the downloaded DataFrame, which may be None

resource_download_test_head(resource_id: str, *, raise_error: bool = False, proxies: dict = None, headers: dict = None, auth: AuthBase | Tuple[str, str] = None, verify: bool | str | None = None) None | ContextErrorLevelMessage

Send a HEAD request to the resource download URL, using the CKAN connection parameters via resource_download. The resource is not downloaded, but the response headers indicate whether the URL is valid.

Returns:

None if successful, otherwise a ContextErrorLevelMessage describing the problem

test_sql_capabilities(*, raise_error: bool = False) bool

Test the availability of the datastore_search_sql API.

Returns:

True if datastore_search_sql is available

class ckanapi_harvesters.ckan_api.ckan_api_2_readonly.CkanApiReadOnlyParams(*, proxies: str | dict | ProxyConfig = None, ckan_headers: dict = None, http_headers: dict = None)

Bases: CkanApiParamsBasic

copy(new_identifier: str = None, *, dest=None)
default_df_download_id_field_treatment: CkanIdFieldTreatment = 1
map_all_aliases: bool = True

ckanapi_harvesters.ckan_api.ckan_api_3_policy module

class ckanapi_harvesters.ckan_api.ckan_api_3_policy.CkanApiPolicy(url: str = None, *, proxies: str | dict | ProxyConfig = None, apikey: str | CkanApiKey = None, apikey_file: str = None, owner_org: str = None, params: CkanApiPolicyParams = None, map: CkanMap = None, policy: CkanPackageDataFormatPolicy = None, policy_file: str = None, identifier=None)

Bases: CkanApiReadOnly

__init__(url: str = None, *, proxies: str | dict | ProxyConfig = None, apikey: str | CkanApiKey = None, apikey_file: str = None, owner_org: str = None, params: CkanApiPolicyParams = None, map: CkanMap = None, policy: CkanPackageDataFormatPolicy = None, policy_file: str = None, identifier=None)

CKAN Database API interface to CKAN server with helper functions using pandas DataFrames.

Parameters:
  • url – url of the CKAN server

  • proxies – proxies to use for requests

  • apikey – way to provide the API key directly (optional)

  • apikey_file – path to a file containing a valid API key in the first line of text (optional)

  • policy – data format policy to use with policy_check function

  • policy_file – path to a JSON file containing the data format policy to use with policy_check function

  • owner_org – name of the organization to limit package_search (optional)

  • params – other connection/behavior parameters

  • map – map of known resources

  • identifier – identifier of the ckan client

copy(new_identifier: str = None, *, dest=None)

Return a copy of the current instance. This is useful in a multithreaded context, where each thread should use its own copy of an initialized CKAN object. It is recommended to purge the last response before making a copy (call purge(purge_map=False)).
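The one-copy-per-thread pattern can be sketched with concurrent.futures. The Client class below is a hypothetical stand-in for an initialized CKAN object (it is not the library's class); its copy method mimics the documented copy(new_identifier) by making a shallow copy with a new identifier and cleared per-instance state, so no thread mutates a shared object.

```python
from concurrent.futures import ThreadPoolExecutor
from copy import copy as shallow_copy
from typing import List, Optional


class Client:
    """Hypothetical stand-in for an initialized CKAN object (not the library class)."""

    def __init__(self, url: str, identifier: str = "main"):
        self.url = url
        self.identifier = identifier
        self.last_response: Optional[str] = None  # per-instance mutable state

    def copy(self, new_identifier: str) -> "Client":
        dest = shallow_copy(self)        # reuse the configuration (url, ...)
        dest.identifier = new_identifier
        dest.last_response = None        # start from a purged state
        return dest


def worker(client: Client, task: int) -> str:
    client.last_response = f"task {task}"  # only this thread's copy is mutated
    return f"{client.identifier}: {client.last_response}"


base = Client("https://ckan.example.org")
clients: List[Client] = [base.copy(f"worker-{i}") for i in range(4)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(worker, clients, range(4)))
print(results)
# ['worker-0: task 0', 'worker-1: task 1', 'worker-2: task 2', 'worker-3: task 3']
```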

load_default_policy(*, error_not_found: bool = False, load_error: bool = True, cancel_if_present: bool = False, force: bool = False) CkanPackageDataFormatPolicy | None

Function to load the default data format policy from the CKAN server. The default policy is defined in ckan_configuration

Parameters:
  • error_not_found

  • cancel_if_present

  • force

Returns:

load_policy(policy_file: str, base_dir: str = None, proxies: dict = None, headers: dict = None, error_not_found: bool = True, load_error: bool = True) CkanPackageDataFormatPolicy

Load the CKAN data format policy from file (JSON format).

Parameters:
  • policy_file – path to the policy file

  • base_dir – base directory, used if policy_file is a relative path

Returns:
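The following minimal sketch shows how a relative policy_file could be resolved against base_dir before the JSON is parsed. resolve_and_load_policy is a hypothetical helper. The actual method returns a CkanPackageDataFormatPolicy object and also accepts proxies and headers, which are not modeled here.

```python
import json
import os


def resolve_and_load_policy(policy_file: str, base_dir: str = None) -> dict:
    """Hypothetical helper: resolve policy_file against base_dir and parse it as JSON."""
    if base_dir is not None and not os.path.isabs(policy_file):
        # Relative paths are interpreted with respect to base_dir.
        policy_file = os.path.join(base_dir, policy_file)
    with open(policy_file, encoding="utf-8") as f:
        return json.load(f)
```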

map_resources(package_list: str | List[str] = None, *, params: dict = None, datastore_info: bool = None, resource_view_list: bool = None, organization_info: bool = None, license_list: bool = None, only_missing: bool = True, error_not_found: bool = True, owner_org: str = None, load_policy: bool = None, progress_callback: CkanProgressCallbackABC = None) CkanMap

Map the resources of a given package to obtain resource IDs associated with the package name and its resources.

Parameters:
  • package_list – List of packages to request. If not provided, the result of package_search is used.

  • params – Additional parameters to pass to all API calls (not recommended).

  • datastore_info – If True, enables the request of the API datastore_info to return information about DataStore fields, aliases, and row count. Required to search a DataStore by alias.

  • resource_view_list – If True, enables the request of the view_list API for each resource.

  • organization_info – If True, enables the request of the organization_list API before other requests.

  • license_list – If True, enables the request of the license_list API.

  • only_missing – If True, skips requesting already-mapped packages.

  • error_not_found – If True, an error is raised for packages not found by the API; if False, they are ignored.

  • owner_org – Filters packages by a specific organization (only if package_search is used).

Returns:

A mapping of resources for the specified package(s).

Note

  • Datasets were referred to as packages in earlier CKAN versions; the API still uses the term package.

  • A single name can be shared across multiple resources within a package. In such cases, the first occurrence is used as a reference, and a warning is issued.
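The duplicate-name rule from the note above can be sketched using plain dictionaries with name and id keys, as in CKAN resource metadata. map_resource_names is hypothetical and does not build a CkanMap.

```python
import warnings
from typing import Dict, List


def map_resource_names(resources: List[dict]) -> Dict[str, str]:
    """Hypothetical helper: map resource names to ids, keeping the first occurrence."""
    name_to_id: Dict[str, str] = {}
    for resource in resources:
        name, res_id = resource["name"], resource["id"]
        if name in name_to_id:
            # Keep the first id as the reference and warn about the duplicate.
            warnings.warn(f"Duplicate resource name {name!r}: keeping id {name_to_id[name]!r}, ignoring {res_id!r}")
            continue
        name_to_id[name] = res_id
    return name_to_id
```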

policy_check(package_list: str | List[str] = None, policy: CkanPackageDataFormatPolicy = None, *, buffer: Dict[str, List[DataPolicyError]] = None, raise_error: bool = False, verbose: bool = None, auto_update: bool = None, progress_callback: CkanProgressCallbackABC = None) bool

Enforce policy on mapped packages

Parameters:

policy

Returns:

query_default_policy(*, error_not_found: bool = False, load_error: bool = True) CkanPackageDataFormatPolicy | None

Download default policy and return it without loading it in the policy attribute.

Parameters:

error_not_found

Returns:

set_default_map_mode(datastore_info: bool = None, resource_view_list: bool = None, organization_info: bool = None, license_list: bool = None, load_policy: bool = None) None

Set up the optional queries orchestrated by the map_resources function

Parameters:
  • datastore_info

  • resource_view_list

  • organization_info

  • license_list

Returns:

set_verbosity(verbosity: bool = True, verbose_extra: bool = None) None

Enable/disable full verbose output

Parameters:

verbosity – boolean. Cannot be None

Returns:

class ckanapi_harvesters.ckan_api.ckan_api_3_policy.CkanApiPolicyParams(*, proxies: str | dict | ProxyConfig = None, ckan_headers: dict = None, http_headers: dict = None)

Bases: CkanApiReadOnlyParams

copy(new_identifier: str = None, *, dest=None)

ckanapi_harvesters.ckan_api.ckan_api_4_readwrite module

class ckanapi_harvesters.ckan_api.ckan_api_4_readwrite.CkanApiReadWrite(url: str = None, *, proxies: str | dict | ProxyConfig = None, apikey: str | CkanApiKey = None, apikey_file: str = None, owner_org: str = None, params: CkanApiPolicyParams = None, map: CkanMap = None, policy: CkanPackageDataFormatPolicy = None, policy_file: str = None, data_cleaner_upload: CkanDataCleanerABC = None, identifier=None)

Bases: CkanApiPolicy

CKAN Database API interface to CKAN server with helper functions using pandas DataFrames. This class implements requests to write data to the CKAN server resources / DataStores.

__init__(url: str = None, *, proxies: str | dict | ProxyConfig = None, apikey: str | CkanApiKey = None, apikey_file: str = None, owner_org: str = None, params: CkanApiPolicyParams = None, map: CkanMap = None, policy: CkanPackageDataFormatPolicy = None, policy_file: str = None, data_cleaner_upload: CkanDataCleanerABC = None, identifier=None)

CKAN Database API interface to CKAN server with helper functions using pandas DataFrames.

Parameters:
  • url – url of the CKAN server

  • proxies – proxies to use for requests

  • apikey – way to provide the API key directly (optional)

  • apikey_file – path to a file containing a valid API key in the first line of text (optional)

  • policy – data format policy to use with policy_check function

  • policy_file – path to a JSON file containing the data format policy to use with policy_check function

  • owner_org – name of the organization to limit package_search (optional)

  • params – other connection/behavior parameters

  • map – map of known resources

  • data_cleaner_upload – data cleaner object to use before uploading to a CKAN DataStore.

  • identifier – identifier of the ckan client

_api_datapusher_submit(resource_id: str, *, params: dict = None) bool

Call to API action datapusher_submit. This triggers the normally asynchronous DataPusher service for a given resource.

Parameters:
  • resource_id – resource id

  • params

Returns:

_api_datastore_upsert_raw(records: dict | List[dict] | DataFrame, resource_id: str, *, method: UpsertChoice | str, params: dict = None, force: bool = None, dry_run: bool = False, last_insertion: bool = True) CkanActionResponse

API call to datastore_upsert.

Parameters:
  • records – records, preferably in a pandas DataFrame - they will be converted to a list of dictionaries.

  • resource_id – destination resource id

  • method – see UpsertChoice (insert, update or upsert)

  • force – set to True to edit a read-only resource. If not provided, self.default_force is used.

  • params – additional parameters

  • dry_run – set to True to abort transaction instead of committing, e.g. to check for validation or type errors

  • last_insertion – trigger for calculate_record_count (CKAN doc: updates the stored count of records, used to optimize datastore_search in combination with the total_estimation_threshold parameter. If doing a series of requests to change a resource, you only need to set this to True on the last request.)

Returns:

the inserted records as a pandas DataFrame, from the server response
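For reference, here is a sketch of the JSON body of a CKAN datastore_upsert request, whose documented parameters include resource_id, records, method, force, dry_run and calculate_record_count. build_upsert_payload is a hypothetical helper, not part of this package. A pandas DataFrame could be converted beforehand with df.to_dict(orient="records").

```python
import json
from typing import List, Union


def build_upsert_payload(records: Union[dict, List[dict]], resource_id: str, *,
                         method: str = "upsert", force: bool = False,
                         dry_run: bool = False, last_insertion: bool = True) -> str:
    """Hypothetical helper: JSON body of a CKAN datastore_upsert request."""
    if isinstance(records, dict):
        records = [records]  # a single record is wrapped in a list
    if method not in ("insert", "update", "upsert"):
        raise ValueError(f"unknown method: {method}")
    body = {
        "resource_id": resource_id,
        "records": records,
        "method": method,
        "force": force,
        "dry_run": dry_run,
        # Only the last request of a series needs to recount the rows.
        "calculate_record_count": last_insertion,
    }
    return json.dumps(body)
```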

_api_resource_patch(resource_id: str, *, name: str = None, format: str = None, description: str = None, title: str = None, state: CkanState = None, df: DataFrame = None, file_path: str = None, url: str = None, files=None, payload: bytes | BufferedIOBase = None, payload_name: str = None, params: dict = None) CkanResourceInfo

Call to resource_patch API. This call can be used to change the resource parameters via params (cf. API documentation) or to re-upload the resource file to the FileStore. The latter action replaces the current resource; if it is a DataStore, the DataStore is reset to the new contents of the file. The file can be transmitted as a URL, a file path or a pandas DataFrame. The files argument is passed through to the requests.post function. A call to datapusher_submit() may be required for the newly uploaded file to be taken into account immediately.

See:

_api_resource_create

See:

resource_create

Parameters:
  • resource_id – resource id

  • url – URL of the new resource, replacing the current one

  • params – parameters such as name, format, resource_type can be changed

For file uploads, the following parameters are taken in order of priority (see upload_prepare_requests_files_arg for an example of formatting):

Parameters:
  • files – files pass through argument to the requests.post function. Use to send other data formats.

  • payload – bytes to upload as a file

  • payload_name – name of the payload to use (associated with the payload argument) - this determines the format recognized in CKAN viewers.

  • file_path – path of the file to transmit (binary and text files are supported here)

  • df – pandas DataFrame to replace resource
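When read as a priority list, the upload parameters above behave like a first-match selection. The sketch below illustrates this with a hypothetical pick_upload_source helper. How the order is applied is an assumption, not the library's code, and payload_name is left out because it only names the payload.

```python
from typing import Any, Optional, Tuple


def pick_upload_source(*, files: Any = None, payload: Optional[bytes] = None,
                       file_path: Optional[str] = None, df: Any = None) -> Optional[Tuple[str, Any]]:
    """Hypothetical helper: return the first upload source that is set, in priority order."""
    candidates = (("files", files), ("payload", payload), ("file_path", file_path), ("df", df))
    for kind, value in candidates:
        # Compare with None explicitly: the truth value of a DataFrame is ambiguous.
        if value is not None:
            return kind, value
    return None  # nothing to upload in this sketch
```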

Returns:

copy(new_identifier: str = None, *, dest=None)

Returns a copy of the current instance. This is useful in a multithreaded context: each thread can work on its own copy of an initialized CKAN object. It is recommended to purge the last response before making a copy (with purge_map=False).

datastore_insert(records: dict | List[dict] | DataFrame, resource_id: str, *, dry_run: bool = False, limit: int = None, offset: int = 0, apply_last_condition: bool = True, always_last_condition: bool = None, data_cleaner: CkanDataCleanerABC = None, force: bool = None, params: dict = None) DataFrame

Alias function to insert data in a DataStore using datastore_upsert.

See:

_api_datastore_upsert()

Parameters:
  • records – records, preferably in a pandas DataFrame - they will be converted to a list of dictionaries.

  • resource_id – destination resource id

  • force – set to True to edit a read-only resource. If not provided, self.default_force is used.

  • params – additional parameters

  • dry_run – set to True to abort transaction instead of committing, e.g. to check for validation or type errors

Returns:

the inserted records as a pandas DataFrame, from the server response

datastore_submit(resource_id: str, *, apply_delay: bool = True, error_timeout: bool = True, params: dict = None) bool

Submits the file to re-initialize the DataStore, using the preferred method (currently datapusher_submit). This wrapper includes a call to datastore_wait.

Parameters:
  • resource_id

  • apply_delay – keep True to wait until the DataStore is ready (a datastore_search query is performed as a test)

  • params

Returns:

datastore_update(records: dict | List[dict] | DataFrame, resource_id: str, *, dry_run: bool = False, limit: int = None, offset: int = 0, apply_last_condition: bool = True, always_last_condition: bool = None, data_cleaner: CkanDataCleanerABC = None, force: bool = None, params: dict = None) DataFrame

Alias function to update data in a DataStore using datastore_upsert. The update is performed based on the DataStore primary keys

See:

_api_datastore_upsert()

Parameters:
  • records – records, preferably in a pandas DataFrame - they will be converted to a list of dictionaries.

  • resource_id – destination resource id

  • force – set to True to edit a read-only resource. If not provided, self.default_force is used.

  • params – additional parameters

  • dry_run – set to True to abort transaction instead of committing, e.g. to check for validation or type errors

Returns:

the updated records as a pandas DataFrame, from the server response

datastore_upsert(records: dict | List[dict] | DataFrame, resource_id: str, *, dry_run: bool = False, limit: int = None, offset: int = 0, force: bool = None, method: UpsertChoice | str = UpsertChoice.Upsert, apply_last_condition: bool = True, always_last_condition: bool = None, return_df: bool = None, data_cleaner: CkanDataCleanerABC = None, progress_callback: CkanProgressCallbackABC = None, params: dict = None) DataFrame | List[dict]

Wrapper around _api_datastore_upsert that splits the records into requests of a limited number of rows.

See:

_api_datastore_upsert()

Parameters:
  • records – records, preferably in a pandas DataFrame - they will be converted to a list of dictionaries.

  • resource_id – destination resource id

  • method – by default, set to Upsert

  • force – set to True to edit a read-only resource. If not provided, self.default_force is used.

  • limit – number of records per transaction

  • offset – number of records to skip - use to restart the transfer

  • params – additional parameters

  • dry_run – set to True to abort transaction instead of committing, e.g. to check for validation or type errors

  • apply_last_condition – if True, the last upsert request applies the last insert operations (calculate_record_count and force_indexing).

  • always_last_condition – if True, each request applies the last insert operations - default is False

  • return_df – if True, returns a pandas DataFrame; otherwise, a list of dictionaries.

  • data_cleaner – data cleaner instance. A data cleaner detects and changes invalid values before upload.

  • progress_callback – progress callback function

Returns:

the inserted records as a pandas DataFrame (or as a list of dictionaries if return_df is False), from the server response
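How limit, offset, apply_last_condition and always_last_condition interact can be sketched as a request plan. plan_upsert_requests is a hypothetical helper. It assumes that the last operations are applied only to the final chunk unless always_last_condition is True, as the parameter descriptions suggest.

```python
from typing import Iterator, List, Optional, Tuple


def plan_upsert_requests(records: List[dict], *, limit: int, offset: int = 0,
                         apply_last_condition: bool = True,
                         always_last_condition: Optional[bool] = None) -> Iterator[Tuple[List[dict], bool]]:
    """Hypothetical helper: yield (chunk, apply_last_operations) for each request."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    always = bool(always_last_condition)  # documented default: False
    # Records before offset are skipped, e.g. to restart an interrupted transfer.
    for start in range(offset, len(records), limit):
        chunk = records[start:start + limit]
        is_last = start + limit >= len(records)
        yield chunk, always or (apply_last_condition and is_last)
```

With five records and limit=2, this yields three requests of sizes 2, 2 and 1, and only the last one carries the last operations.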

datastore_upsert_auto(records_generator: DataFrame | List[dict] | Generator[ListRecords | DataFrame, None, None], resource_id: str, *, dry_run: bool = False, limit: int = None, offset: int = 0, request_threshold: int = None, force: bool = None, method: UpsertChoice | str = UpsertChoice.Upsert, apply_last_condition: bool = True, always_last_condition: bool = None, return_df: bool = None, data_cleaner: CkanDataCleanerABC = None, progress_callback: CkanProgressCallbackABC = None, params: dict = None) int

Version of datastore_upsert accepting generators or DataFrames. The function called (datastore_upsert_generator or datastore_upsert) is chosen based on the type of the records_generator argument.

See:

datastore_upsert_generator(), datastore_upsert()

Parameters:
  • records_generator – generator of records, e.g. chunks from a CSV file generated with pandas.read_csv(.., chunksize=1000)

  • resource_id – destination resource id

  • method – by default, set to Upsert

  • force – set to True to edit a read-only resource. If not provided, self.default_force is used.

  • limit – number of records per transaction

  • offset – number of records to skip - use to restart the transfer

  • request_threshold – number of records to accumulate before sending a request

  • params – additional parameters

  • dry_run – set to True to abort transaction instead of committing, e.g. to check for validation or type errors

  • apply_last_condition – if True, the last upsert request applies the last insert operations (calculate_record_count and force_indexing).

  • always_last_condition – if True, each request applies the last insert operations - default is False

  • return_df – if True, returns a pandas DataFrame; otherwise, a list of dictionaries.

  • data_cleaner – data cleaner instance. A data cleaner detects and changes invalid values before upload.

  • progress_callback – progress callback function

Returns:

the number of records inserted
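The type-based dispatch can be sketched as below. This sketch treats any iterator as a generator, which covers generators and also chunk readers that implement the iterator protocol; that choice is an assumption. choose_upsert_function is hypothetical and only returns the name of the function that would be called.

```python
from collections.abc import Iterator


def choose_upsert_function(records_generator) -> str:
    """Hypothetical dispatcher: iterators of chunks vs. in-memory records."""
    if isinstance(records_generator, Iterator):
        # Generators (and other iterators) yield chunks lazily.
        return "datastore_upsert_generator"
    # A list of dicts or a DataFrame is already fully in memory.
    return "datastore_upsert"
```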

datastore_upsert_generator(records_generator: Generator[ListRecords | DataFrame, None, None], resource_id: str, *, dry_run: bool = False, limit: int = None, offset: int = 0, request_threshold: int = None, force: bool = None, method: UpsertChoice | str = UpsertChoice.Upsert, apply_last_condition: bool = True, always_last_condition: bool = None, return_df: bool = None, data_cleaner: CkanDataCleanerABC = None, progress_callback: CkanProgressCallbackABC = None, params: dict = None) int

Wrapper around datastore_upsert that sends the rows in the chunks provided by records_generator.

Parameters:
  • records_generator – generator of records, e.g. chunks from a CSV file generated with pandas.read_csv(.., chunksize=1000)

  • resource_id – destination resource id

  • method – by default, set to Upsert

  • force – set to True to edit a read-only resource. If not provided, self.default_force is used.

  • limit – number of records per transaction

  • offset – number of records to skip - use to restart the transfer

  • request_threshold – number of records to accumulate before sending a request

  • params – additional parameters

  • dry_run – set to True to abort transaction instead of committing, e.g. to check for validation or type errors

  • apply_last_condition – if True, the last upsert request applies the last insert operations (calculate_record_count and force_indexing).

  • always_last_condition – if True, each request applies the last insert operations - default is False

  • return_df – if True, returns a pandas DataFrame; otherwise, a list of dictionaries.

  • data_cleaner – data cleaner instance. A data cleaner detects and changes invalid values before upload.

  • progress_callback – progress callback function

Returns:

the number of records inserted
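The request_threshold parameter suggests that small chunks are buffered until enough records are available for a request. The hypothetical accumulate_chunks helper sketches that buffering. Each batch it yields would then be passed to datastore_upsert, which may split it further according to limit.

```python
from typing import Iterable, Iterator, List


def accumulate_chunks(chunks: Iterable[List[dict]], request_threshold: int) -> Iterator[List[dict]]:
    """Hypothetical helper: merge small chunks until request_threshold records are buffered."""
    buffer: List[dict] = []
    for chunk in chunks:
        buffer.extend(chunk)
        if len(buffer) >= request_threshold:
            yield buffer
            buffer = []
    if buffer:
        yield buffer  # flush the remainder as the last request
```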

datastore_upsert_last_line(resource_id: str)

Apply last line treatments to a resource.

datastore_wait(resource_id: str, *, apply_delay: bool = True, error_timeout: bool = True) Tuple[int, float]

Wait until a DataStore has at least one row. The delay between the requests polling for the presence of the DataStore is given by the class attribute submit_delay. If the loop exceeds submit_timeout, an exception is raised.

Parameters:
  • resource_id

  • apply_delay

  • error_timeout – option to raise an exception in case of timeout

Returns:
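Here is a generic sketch of the polling loop described above. wait_until_ready and its check callable are hypothetical, with check standing in for the test datastore_search query. Returning the number of attempts and the elapsed time is an assumption about the Tuple[int, float] result, not its documented meaning.

```python
import time
from typing import Callable, Tuple


def wait_until_ready(check: Callable[[], bool], *, submit_timeout: float, submit_delay: float,
                     error_timeout: bool = True) -> Tuple[int, float]:
    """Hypothetical polling loop: call check() every submit_delay seconds until it returns True."""
    start = time.monotonic()
    attempts = 0
    while True:
        attempts += 1
        if check():
            return attempts, time.monotonic() - start
        elapsed = time.monotonic() - start
        if elapsed >= submit_timeout:
            if error_timeout:
                raise TimeoutError(f"DataStore not ready after {elapsed:.1f} s")
            return attempts, elapsed
        time.sleep(submit_delay)
```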

full_unlock(unlock: bool = True, *, no_ca: bool = None, external_url_resource_download: bool = None) None

Function to unlock full capabilities of the CKAN API

Parameters:

unlock

Returns:

resource_patch(resource_id: str, *, name: str = None, format: str = None, description: str = None, title: str = None, state: CkanState = None, df: DataFrame = None, file_path: str = None, url: str = None, files=None, payload: bytes | BufferedIOBase = None, payload_name: str = None, params: dict = None) CkanResourceInfo
set_limits(limit_read: int | None, limit_write: int = None) None

Set default query limits. If only one argument is provided, it applies to both limits.

Parameters:
  • limit_read – default limit for read requests

  • limit_write – default limit for upsert (write) requests

Returns:

set_submit_timeout(submit_timeout: float, submit_delay: float = None) None

Set timeout for the datastore_wait method. This is called after datastore_submit.

Parameters:
  • submit_timeout – timeout after which a TimeoutError is raised (seconds)

  • submit_delay – delay between the polling requests sent while waiting for DataStore initialization (datastore_wait) (seconds)

Returns:

class ckanapi_harvesters.ckan_api.ckan_api_4_readwrite.CkanApiReadWriteParams(*, proxies: str | dict | ProxyConfig = None, ckan_headers: dict = None, http_headers: dict = None)

Bases: CkanApiPolicyParams

copy(new_identifier: str = None, *, dest=None)
default_readonly: bool = False

ckanapi_harvesters.ckan_api.ckan_api_5_manage module

class ckanapi_harvesters.ckan_api.ckan_api_5_manage.CkanApiExtendedParams(*, proxies: str | dict | ProxyConfig = None, ckan_headers: dict = None, http_headers: dict = None)

Bases: CkanApiManageParams

copy(new_identifier: str = None, *, dest=None)
class ckanapi_harvesters.ckan_api.ckan_api_5_manage.CkanApiManage(url: str = None, *, proxies: str | dict | ProxyConfig = None, apikey: str | CkanApiKey = None, apikey_file: str = None, owner_org: str = None, params: CkanApiExtendedParams = None, map: CkanMap = None, policy: CkanPackageDataFormatPolicy = None, policy_file: str = None, data_cleaner_upload: CkanDataCleanerABC = None, identifier=None)

Bases: CkanApiReadWrite

CKAN Database API interface to CKAN server with helper functions using pandas DataFrames. This class implements more advanced requests to manage packages, resources and DataStores on the CKAN server.

__init__(url: str = None, *, proxies: str | dict | ProxyConfig = None, apikey: str | CkanApiKey = None, apikey_file: str = None, owner_org: str = None, params: CkanApiExtendedParams = None, map: CkanMap = None, policy: CkanPackageDataFormatPolicy = None, policy_file: str = None, data_cleaner_upload: CkanDataCleanerABC = None, identifier=None)

CKAN Database API interface to CKAN server with helper functions using pandas DataFrames.

Parameters:
  • url – url of the CKAN server

  • proxies – proxies to use for requests

  • apikey – way to provide the API key directly (optional)

  • apikey_file – path to a file containing a valid API key in the first line of text (optional)

  • policy – data format policy to use with policy_check function

  • policy_file – path to a JSON file containing the data format policy to use with policy_check function

  • owner_org – name of the organization to limit package_search (optional)

  • params – other connection/behavior parameters

  • map – map of known resources

  • data_cleaner_upload – data cleaner object to use before uploading to a CKAN DataStore.

  • identifier – identifier of the ckan client

_api_dataset_purge(package_id: str, *, params: dict = None) dict

API call to dataset_purge. This fully removes the package. This action is not reversible. It requires an admin account.

Parameters:
  • package_id

  • params

Returns:

_api_datastore_create(resource_id: str, *, records: dict | List[dict] | DataFrame = None, fields: List[dict | CkanField] = None, primary_key: str | List[str] = None, indexes: str | List[str] = None, aliases: str | List[str] = None, params: dict = None, force: bool = None) dict

API call to datastore_create. This endpoint also supports altering tables, aliases and indexes and bulk insertion.

Parameters:
  • resource_id – resource id

  • records

  • fields

  • primary_key

  • indexes

  • params

  • force

Returns:

_api_datastore_delete(resource_id: str, *, params: dict = None, force: bool = None) dict

Function to delete rows from a DataStore using the datastore_delete API. If no filter is given, the whole DataStore is erased. This function is private and should not be called directly.

Parameters:
  • resource_id

  • params

  • force – set to True to edit a read-only resource. If not provided, self.default_force is used.

Returns:

_api_package_create(name: str, private: bool, *, title: str = None, notes: str = None, owner_org: str = None, state: CkanState | str = None, license_id: str = None, tags: List[str] = None, tags_list_dict: List[Dict[str, str]] = None, url: str = None, version: str = None, custom_fields: dict = None, author: str = None, author_email: str = None, maintainer: str = None, maintainer_email: str = None, params: dict = None) CkanPackageInfo

API call to package_create.

Parameters:
  • name

  • private

  • title

  • notes

  • owner_org

  • state

  • license_id

  • tags

  • params

Returns:

_api_package_delete(package_id: str, *, params: dict = None) dict

API call to package_delete. This marks the package as deleted and does not remove data.

Parameters:
  • package_id

  • params

Returns:

_api_package_patch(package_id: str, package_name: str = None, private: bool = None, *, title: str = None, notes: str = None, owner_org: str = None, state: CkanState | str = None, license_id: str = None, tags: List[str] = None, tags_list_dict: List[Dict[str, str]] = None, url: str = None, version: str = None, custom_fields_update: dict = None, custom_fields: dict = None, author: str = None, author_email: str = None, maintainer: str = None, maintainer_email: str = None, params: dict = None) CkanPackageInfo

API call to package_patch. Use it to change the properties of a package. This method is preferred to package_update, which requires resending the full package configuration. (API doc for package_update: It is recommended to call ckan.logic.action.get.package_show(), make the desired changes to the result, and then call package_update() with it.)

Parameters:
  • package_id

  • package_name

  • private

  • title

  • notes

  • owner_org

  • state

  • license_id

  • params

Returns:
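The difference between the two approaches is sketched below using plain dictionaries. package_update replaces the whole package, so its body starts from the full package_show result, while package_patch only carries the id and the changed fields. Both helpers are hypothetical and only build request bodies.

```python
import copy
from typing import Any, Dict


def package_update_body(current: Dict[str, Any], changes: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical: package_update body, built from the full package_show result."""
    body = copy.deepcopy(current)  # every existing field must be resent
    body.update(changes)
    return body


def package_patch_body(package_id: str, changes: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical: package_patch body, with only the id and the fields to change."""
    return {"id": package_id, **changes}
```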

_api_package_resource_reorder(package_id: str, resource_ids: List[str], *, params: dict = None) dict

API call to package_resource_reorder, which reorders the resources within a package. If only some of the resource ids are supplied, they are placed first and the other resources keep their original order.

Parameters:
  • package_id – the id or name of the package to update

  • resource_ids – a list of resource ids in the order needed

  • params

Returns:
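The reordering rule can be sketched using a list of ids. reorder_resources is hypothetical, and rejecting ids that do not belong to the package is an assumption of this sketch.

```python
from typing import List


def reorder_resources(current_ids: List[str], resource_ids: List[str]) -> List[str]:
    """Hypothetical: supplied ids first, remaining ids in their original order."""
    unknown = [r for r in resource_ids if r not in current_ids]
    if unknown:
        raise ValueError(f"ids not in package: {unknown}")
    rest = [r for r in current_ids if r not in resource_ids]
    return list(resource_ids) + rest
```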

_api_resource_create(package_id: str, name: str, *, format: str = None, description: str = None, state: CkanState = None, df: DataFrame = None, file_path: str = None, url: str = None, files=None, payload: bytes | BufferedIOBase = None, payload_name: str = None, params: dict = None) CkanResourceInfo

API call to resource_create.

See:

_api_resource_patch

See:

resource_create

Parameters:
  • package_id

  • name

  • format

  • url – URL of the resource

  • params – additional parameters such as resource_type can be set

Note

For file uploads, the following parameters are taken in order of priority (see upload_prepare_requests_files_arg for an example of formatting):

Parameters:
  • files – files pass through argument to the requests.post function. Use to send other data formats.

  • payload – bytes to upload as a file

  • payload_name – name of the payload to use (associated with the payload argument) - this determines the format recognized in CKAN viewers.

  • file_path – path of the file to transmit (binary and text files are supported here)

  • df – pandas DataFrame to upload as the resource

Returns:

_api_resource_delete(resource_id: str, *, params: dict = None, force: bool = None, bypass_admin: bool = False) dict

Function to delete a resource. This permanently removes the resource. Requires enable_admin=True.

Parameters:
  • resource_id

  • params

  • force – set to True to edit a read-only resource. If not provided, self.default_force is used.

Returns:

_api_resource_view_create(resource_id: str, title: str | List[str] = None, *, view_type: str | List[str] = None, params: dict = None) List[CkanViewInfo]

API call to resource_view_create.

title and view_type must have the same length if specified as lists.

Parameters:
  • resource_id – resource id

  • title – title of the view

  • view_type – Type of view, typically recline_view for Data Explorer

  • params

Returns:
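The following sketch shows how title and view_type may be paired, one view per position, with the length check described above. normalize_view_args is hypothetical, and it does not model how the method handles a None title or view_type.

```python
from typing import List, Union


def normalize_view_args(title: Union[str, List[str], None],
                        view_type: Union[str, List[str], None]) -> List[dict]:
    """Hypothetical helper: pair titles with view types, one dict per view to create."""
    titles = [title] if isinstance(title, str) else list(title or [])
    types_ = [view_type] if isinstance(view_type, str) else list(view_type or [])
    if len(titles) != len(types_):
        raise ValueError("title and view_type must have the same length")
    return [{"title": t, "view_type": v} for t, v in zip(titles, types_)]
```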

copy(new_identifier: str = None, *, dest=None)

Returns a copy of the current instance. This is useful in a multithreaded context: each thread can work on its own copy of an initialized CKAN object. It is recommended to purge the last response before making a copy (with purge_map=False).

datastore_clear(resource_id: str, *, error_not_found: bool = True, params: dict = None, force: bool = None, bypass_admin: bool = False) dict | None

Function to clear the data of a DataStore using _api_datastore_delete. Requires enable_admin=True. This implementation adds the option error_not_found: if set to False, no error is raised when the resource is found but its DataStore is not.

See:

_api_datastore_delete()

Parameters:
  • resource_id

  • error_not_found – if False, no exception is raised if the resource exists but has no DataStore

  • params

  • force – set to True to edit a read-only resource. If not provided, self.default_force is used.

  • bypass_admin – option to bypass check of enable_admin

Returns:

datastore_create(resource_id: str, *, delete_previous: bool = False, bypass_admin: bool = False, records: dict | List[dict] | DataFrame = None, fields: List[dict | CkanField] = None, primary_key: str | List[str] = None, indexes: str | List[str] = None, aliases: str | List[str] = None, params: dict = None, force: bool = None, data_cleaner: CkanDataCleanerABC = None, inhibit_datastore_patch_indexes: bool = False, progress_callback: CkanProgressCallbackABC = None) dict

Encapsulation of the datastore_create API call. This function can optionally clear the DataStore before creating it.

Parameters:
  • resource_id

  • delete_previous – option to delete the previous DataStore, if it exists (default: False)

  • records

  • fields

  • primary_key

  • indexes

  • params

  • force

  • inhibit_datastore_patch_indexes – option to ignore primary_key and indexes in case the DataStore already exists. In certain cases, running without this option can lead to impossible updates (recomputing indexes on large tables can be costly).

Returns:

datastore_default_alias(resource_name: str, package_name: str, *, query_names: bool = True, error_not_found: bool = True) str
static datastore_default_alias_of_info(resource_info: CkanResourceInfo, package_info: CkanPackageInfo) str
static datastore_default_alias_of_names(resource_name: str, package_name: str) str
datastore_delete_rows(resource_id: str, filters: dict, *, params: dict = None, force: bool = None, calculate_record_count: bool = True) dict

Function to delete certain rows of a DataStore using _api_datastore_delete. The filters are mandatory here: without them, the whole DataStore would be erased. Prefer datastore_clear for that purpose.

See:

_api_datastore_delete()

Parameters:
  • resource_id

  • filters

  • params

  • force – set to True to edit a read-only resource. If not provided, self.default_force is used.

  • calculate_record_count

Returns:
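Here is a sketch of a datastore_delete request body that makes the filters mandatory. build_delete_rows_payload is hypothetical; it only implements the guard described above and does not send anything.

```python
from typing import Any, Dict


def build_delete_rows_payload(resource_id: str, filters: Dict[str, Any], *, force: bool = False,
                              calculate_record_count: bool = True) -> Dict[str, Any]:
    """Hypothetical helper: datastore_delete body that refuses to run without filters."""
    if not filters:
        # Without filters, datastore_delete would erase the whole DataStore.
        raise ValueError("filters are mandatory; use datastore_clear to erase everything")
    return {"resource_id": resource_id, "filters": dict(filters), "force": force,
            "calculate_record_count": calculate_record_count}
```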

static datastore_field_dict(fields: List[dict | CkanField] | OrderedDict[str, CkanField | dict] = None, fields_merge: List[dict | CkanField] | OrderedDict[str, CkanField | dict] = None, fields_update: List[dict | CkanField] | OrderedDict[str, CkanField | dict] = None, *, fields_type_override: Dict[str, str] = None, fields_description: Dict[str, str] = None, fields_label: Dict[str, str] = None, return_list: bool = False) Dict[str, CkanField] | List[dict]

Initialization of the fields parameter for datastore_create. Only the parts used by this package are included. To complete the field dictionaries with the information from the DataStore, use datastore_field_patch_dict.

Parameters:
  • fields – first source of field information, usually the fields from the DataStore

  • fields_merge – second source. Values from this dictionary will overwrite fields

  • fields_update – third source. Values from this dictionary take priority over all other values.

  • fields_type_override

  • fields_description

  • fields_label

  • return_list

Returns:

dict if return_list is False, list if return_list is True.

You can easily transform the dict to a list with the following code:

    fields = list(fields_update.values())
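The priority order between the sources can be sketched with a recursive merge keyed by field id. merge_fields is hypothetical and has three limitations. It only accepts lists of dictionaries (not CkanField or OrderedDict inputs). It applies the same recursive merge to every source. It maps the shortcut arguments to info.type_override, info.notes and info.label, as described for datastore_field_patch.

```python
import copy
from typing import Dict, List, Optional


def deep_update(target: dict, update: dict) -> dict:
    """Recursively merge update into target: only keys present in update change."""
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            deep_update(target[key], value)
        else:
            target[key] = copy.deepcopy(value)
    return target


def merge_fields(fields: Optional[List[dict]] = None, fields_merge: Optional[List[dict]] = None,
                 fields_update: Optional[List[dict]] = None, *,
                 fields_type_override: Optional[Dict[str, str]] = None,
                 fields_description: Optional[Dict[str, str]] = None,
                 fields_label: Optional[Dict[str, str]] = None) -> Dict[str, dict]:
    """Hypothetical sketch: later sources take priority, keyed by field id."""
    result: Dict[str, dict] = {}
    for source in (fields or [], fields_merge or [], fields_update or []):
        for field in source:
            deep_update(result.setdefault(field["id"], {"id": field["id"]}), field)
    # Shortcut arguments are applied last, so they override fields_update.
    shortcuts = (("type_override", fields_type_override), ("notes", fields_description),
                 ("label", fields_label))
    for info_key, mapping in shortcuts:
        for field_id, value in (mapping or {}).items():
            result.setdefault(field_id, {"id": field_id}).setdefault("info", {})[info_key] = value
    return result
```

In this sketch, fields_update={"id": {"info": {"type_override": "text"}}} and fields_type_override={"id": "text"} produce the same result.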

datastore_field_patch(resource_id: str, fields_merge: List[dict | CkanField] | OrderedDict[str, CkanField | dict] = None, fields_update: List[dict | CkanField] | OrderedDict[str, CkanField | dict] = None, *, only_if_needed: bool = False, fields: List[dict | CkanField] | OrderedDict[str, CkanField | dict] = None, fields_type_override: Dict[str, str] = None, field_description: Dict[str, str] = None, fields_label: Dict[str, str] = None) Tuple[bool, List[dict], dict | bool | None]

Helper function calling the datastore_create API to update the parameters of some fields. The initial field configuration is taken from the mapped information or requested from the server. Typically, this can be used to enforce a data type on a field; in this case, the resource data must be resubmitted with the resource_patch API. For example, fields_update={"id": {"info": {"type_override": "text"}}} is equivalent to fields_type_override={"id": "text"}.

Note

It is not possible to rename a field after creation through the API. To do this, the change must be done in the database.

Parameters:
  • resource_id – resource id

  • fields_update – dictionary of field ids and the properties to change. The property dictionary is updated recursively, so only the properties appearing in the update are changed. Values given in fields_type_override, field_description or fields_label take precedence over this argument.

  • fields_type_override – shortcut to edit the info.type_override value of each field id.

  • field_description – shortcut to edit the info.notes value of each field id.

  • fields_label – shortcut to edit the info.label value of each field id.

  • only_if_needed – Cancels the request if the changes do not affect the current configuration

Returns:

a tuple (update_needed, fields_new, update_dict)
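The recursive update of fields_update and the detection of a no-op change, as used by only_if_needed, can be sketched as follows. The names recursive_update and patch_fields are hypothetical. Fields are assumed to be plain dictionaries keyed by field id, and field ids absent from the current configuration are simply ignored in this sketch.

```python
import copy
from typing import Dict, Tuple


def recursive_update(base: dict, update: dict) -> dict:
    """Return a copy of base in which nested dictionaries are updated key by key."""
    result = copy.deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = recursive_update(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def patch_fields(current: Dict[str, dict],
                 fields_update: Dict[str, dict]) -> Tuple[bool, Dict[str, dict]]:
    """Hypothetical sketch of the (update_needed, fields_new) part of the result."""
    # Field ids that are not in current are ignored in this sketch
    fields_new = {field_id: recursive_update(props, fields_update.get(field_id, {}))
                  for field_id, props in current.items()}
    # With only_if_needed=True, the request could be skipped when update_needed is False
    update_needed = fields_new != current
    return update_needed, fields_new
```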

datastore_field_patch_dict(fields_merge: List[dict | CkanField] | OrderedDict[str, CkanField | dict] = None, fields_update: List[dict | CkanField] | OrderedDict[str, CkanField | dict] = None, *, fields_type_override: Dict[str, str] = None, fields_description: Dict[str, str] = None, fields_label: Dict[str, str] = None, return_list: bool = False, datastore_merge: bool = True, resource_id: str = None, error_not_found: bool = True) Tuple[bool | None, Dict[str, CkanField] | List[dict]]

Calls datastore_field_dict and merges attributes with those found in datastore_info if datastore_merge=True.

Parameters:
  • fields_update

  • fields_type_override

  • fields_description

  • fields_label

  • return_list

  • datastore_merge

  • resource_id – required if datastore_merge=True

Returns:

static default_resource_view(resource_format: str, is_datastore: bool = True) Tuple[str, str]

Definition of the default resource view based on the resource format.

Parameters:

resource_format

Returns:

full_unlock(unlock: bool = True, *, no_ca: bool = None, external_url_resource_download: bool = None) None

Function to unlock full capabilities of the CKAN API

Parameters:

unlock

Returns:

package_create(package_name: str, private: bool = True, *, title: str = None, notes: str = None, owner_org: str = None, state: CkanState | str = None, license_id: str = None, tags: List[str] = None, tags_list_dict: List[Dict[str, str]] = None, url: str = None, version: str = None, custom_fields_update: dict = None, custom_fields: dict = None, author: str = None, author_email: str = None, maintainer: str = None, maintainer_email: str = None, params: dict = None, cancel_if_exists: bool = True, update_if_exists=True, clear_if_deleted_state: bool = None) CkanPackageInfo

Helper function to create a new package. This first checks if the package already exists.

See:

_api_package_create()

Parameters:
  • package_name

  • private

  • title

  • notes

  • owner_org

  • license_id

  • state

  • params

  • cancel_if_exists

  • update_if_exists

  • clear_if_deleted_state – option to clear the resources of a package found in the Deleted state. The default behavior is set in the connection parameters (package_create_default_clear_if_deleted_state).

Returns:

package_delete(package_id: str, definitive_delete: bool = False, *, params: dict = None) dict

Alias function for package removal. Calls either the package_delete API, which only marks the package as deleted, or dataset_purge, which deletes it permanently.

Parameters:
  • package_id

  • definitive_delete – True: calls dataset_purge (action not reversible), False: calls API package_delete.

  • params
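The choice between the two CKAN actions can be sketched by building, without sending, the corresponding request against CKAN's action API (a POST to /api/3/action/<action> with a JSON body). The function name build_package_delete_request is hypothetical. Passing the API token in the Authorization header is an assumption about the server configuration, and the library's own request code may differ.

```python
import json
import urllib.request


def build_package_delete_request(ckan_url: str, package_id: str, definitive_delete: bool,
                                 apikey: str) -> urllib.request.Request:
    """Hypothetical sketch: build (but do not send) the CKAN action request."""
    # dataset_purge is irreversible; package_delete only marks the dataset as deleted
    action = "dataset_purge" if definitive_delete else "package_delete"
    url = f"{ckan_url.rstrip('/')}/api/3/action/{action}"
    body = json.dumps({"id": package_id}).encode("utf-8")
    headers = {"Content-Type": "application/json", "Authorization": apikey}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```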

Returns:

package_delete_resources(package_name: str, *, bypass_admin: bool = False)

Permanently delete all resources associated with the package.

Parameters:

package_name

Returns:

package_patch(package_id: str, package_name: str = None, private: bool = None, *, title: str = None, notes: str = None, owner_org: str = None, state: CkanState | str = None, license_id: str = None, tags: List[str] = None, tags_list_dict: List[Dict[str, str]] = None, url: str = None, version: str = None, custom_fields_update: dict = None, custom_fields: dict = None, author: str = None, author_email: str = None, maintainer: str = None, maintainer_email: str = None, params: dict = None) CkanPackageInfo

package_resource_reorder(package_id: str, resource_ids: List[str], *, params: dict = None) dict

API call to package_resource_reorder, which reorders the resources within a package. If only some resource ids are supplied, they are placed first and the other resources keep their original order.

Parameters:
  • package_id – the id or name of the package to update

  • resource_ids – a list of resource ids in the order needed

  • params
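The partial-reordering rule can be sketched on a plain list of resource ids. reorder_resource_ids is a hypothetical helper. Raising ValueError for unknown ids is a simplification of the error the CKAN server would return.

```python
from typing import List


def reorder_resource_ids(current_order: List[str], resource_ids: List[str]) -> List[str]:
    """Return the resource order that a (possibly partial) reorder request would produce."""
    requested = set(resource_ids)
    unknown = requested - set(current_order)
    if unknown:
        raise ValueError(f"unknown resource ids: {sorted(unknown)}")
    # Supplied ids come first, in the requested order
    head = list(resource_ids)
    # The other resources keep their original relative order
    tail = [rid for rid in current_order if rid not in requested]
    return head + tail
```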

Returns:

package_state_change(package_id: str, state: CkanState) CkanPackageInfo

Change package state using the package_patch API.

Parameters:
  • package_id

  • state

Returns:

resource_create(package_id: str, name: str, *, format: str = None, description: str = None, state: CkanState = None, params: dict = None, url: str = None, files=None, file_path: str = None, df: DataFrame = None, payload: bytes | BufferedIOBase = None, payload_name: str = None, cancel_if_exists: bool = True, update_if_exists: bool = False, reupload: bool = False, create_default_view: bool = True, auto_submit: bool = False, datastore_create: bool = False, records: dict | List[dict] | DataFrame = None, fields: List[dict] = None, primary_key: str | List[str] = None, indexes: str | List[str] = None, aliases: str | List[str] = None, inhibit_datastore_patch_indexes: bool = False, data_cleaner: CkanDataCleanerABC = None, progress_callback: CkanProgressCallbackABC = None) CkanResourceInfo

Wrapper around the resource_create API call that checks whether a resource with the same name already exists and adds the default view.

Parameters:
  • package_id

  • name

  • format

  • params

  • cancel_if_exists – check whether a resource with the same name already exists in the package on the CKAN server. If it does, the info for this resource is returned.

  • update_if_exists – If a resource with the same name already exists (and cancel_if_exists=True), a call to resource_patch is performed.

  • reupload – re-upload the resource if a resource with the same name already exists and cancel_if_exists=True and update_if_exists=True

  • create_default_view
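The interplay of cancel_if_exists, update_if_exists and reupload described above can be summarized as a decision function. This is a hypothetical sketch. The action labels it returns are illustrative only, and the case cancel_if_exists=False is assumed to skip the existence check and always attempt creation.

```python
from typing import Optional


def resource_create_action(existing_id: Optional[str], cancel_if_exists: bool = True,
                           update_if_exists: bool = False, reupload: bool = False) -> str:
    """Hypothetical summary of the decision described for resource_create."""
    if existing_id is None or not cancel_if_exists:
        # No same-name resource, or no check requested: create a new resource
        return "create"
    if not update_if_exists:
        # Return the info of the existing resource unchanged
        return "return_existing"
    # Existing resource is patched, optionally with a new upload
    return "patch_and_reupload" if reupload else "patch"
```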

Note

For file uploads, the following parameters are used in order of priority (see upload_prepare_requests_files_arg for an example of formatting):

Parameters:
  • files – passed through to the files argument of the requests.post function. Use it to send other data formats.

  • payload – bytes to upload as a file

  • payload_name – name of the payload to use (associated with the payload argument) - this determines the format recognized in CKAN viewers.

  • file_path – path of the file to transmit (binary and text files are supported here)

  • df – pandas DataFrame to replace resource
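The priority order of the upload parameters can be sketched as a function that returns the first available source, formatted as a requests-style files list. The form field name "upload" is the one CKAN uses for resource file uploads. The helper name select_upload, the default file names and the CSV serialization of df are assumptions, and payload is passed through unchanged (bytes or a file object).

```python
import os
from typing import List, Optional, Tuple


def select_upload(files: Optional[List[tuple]] = None, payload=None,
                  payload_name: Optional[str] = None, file_path: Optional[str] = None,
                  df=None) -> Tuple[str, Optional[List[tuple]]]:
    """Hypothetical sketch: pick the upload source by priority."""
    if files is not None:
        return "files", files
    if payload is not None:
        # payload_name sets the file name, hence the format recognized by CKAN viewers
        return "payload", [("upload", (payload_name or "upload.bin", payload))]
    if file_path is not None:
        with open(file_path, "rb") as f:
            return "file_path", [("upload", (os.path.basename(file_path), f.read()))]
    if df is not None:
        # Assumes a pandas-like object; CSV is only one possible serialization
        csv_bytes = df.to_csv(index=False).encode("utf-8")
        return "df", [("upload", ("data.csv", csv_bytes))]
    return "none", None
```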

Returns:

resource_delete(resource_id: str, *, params: dict = None, force: bool = None, bypass_admin: bool = False) dict

resource_view_create(resource_id: str, title: str | List[str] = None, *, view_type: str | List[str] = None, params: dict = None, error_no_default_view_type: bool = False, cancel_if_exists: bool = True, is_datastore: bool = True) List[CkanViewInfo]

Wrapper around the resource_view_create API. If no view type is provided (None), the function looks up the default view defined in default_resource_view. It also checks the existing views and skips the creation of any view whose title already exists. If provided as lists, title and view_type must have the same length.

Parameters:
  • resource_id

  • title

  • view_type

  • params

  • error_no_default_view_type

  • cancel_if_exists – option to skip the creation of a view if a view with the same title already exists
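The title-based deduplication and the length check on title and view_type can be sketched as follows. views_to_create is a hypothetical helper that only computes which (title, view_type) pairs would be sent. datatables_view and image_view in the tests are standard CKAN view types.

```python
from typing import List, Sequence, Tuple, Union


def views_to_create(existing_titles: Sequence[str],
                    title: Union[str, List[str]],
                    view_type: Union[str, List[str]],
                    cancel_if_exists: bool = True) -> List[Tuple[str, str]]:
    """Hypothetical sketch: list the (title, view_type) pairs that still need creating."""
    titles = [title] if isinstance(title, str) else list(title)
    view_types = [view_type] if isinstance(view_type, str) else list(view_type)
    if len(titles) != len(view_types):
        raise ValueError("title and view_type must have the same length")
    existing = set(existing_titles)
    # Skip views whose title already exists, unless cancel_if_exists is False
    return [(t, v) for t, v in zip(titles, view_types)
            if not (cancel_if_exists and t in existing)]
```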

Returns:

static verify_field_name_format(field_name: str, *, raise_error: bool = True, display_warnings: bool = True) bool

Verifies that the field name format is correct.
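A possible field-name check is sketched below, modeled on the rules CKAN's DataStore applies to column names: non-empty, no leading or trailing whitespace, no leading underscore, no double quote. The function name is hypothetical, and the library's own rules and warnings may differ. The 63-byte warning reflects PostgreSQL's default identifier length limit.

```python
import warnings


def sketch_verify_field_name_format(field_name: str, *, raise_error: bool = True,
                                    display_warnings: bool = True) -> bool:
    """Hypothetical field-name check modeled on CKAN DataStore rules."""
    valid = bool(
        field_name
        and field_name == field_name.strip()  # no leading or trailing whitespace
        and not field_name.startswith("_")    # leading underscore is reserved (e.g. _id)
        and '"' not in field_name             # double quotes are not allowed
    )
    if not valid and raise_error:
        raise ValueError(f"invalid field name: {field_name!r}")
    if valid and display_warnings and len(field_name.encode("utf-8")) > 63:
        # PostgreSQL truncates identifiers longer than 63 bytes by default
        warnings.warn(f"field name longer than 63 bytes: {field_name!r}")
    return valid
```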

static verify_package_name_format(package_name: str, *, raise_error: bool = True) bool

Verifies that the package name format is correct.
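Similarly, a package-name check can be sketched from CKAN's documented rule that dataset names contain only lowercase ASCII letters, digits, - and _, with a length between 2 and 100 characters. The function name is hypothetical, and CKAN applies further checks (such as uniqueness) on the server.

```python
import re

# Lowercase ASCII letters, digits, "-" and "_", 2 to 100 characters
_PACKAGE_NAME_PATTERN = re.compile(r"[a-z0-9_-]{2,100}")


def sketch_verify_package_name_format(package_name: str, *, raise_error: bool = True) -> bool:
    """Hypothetical package-name check based on CKAN's documented naming rule."""
    valid = bool(_PACKAGE_NAME_PATTERN.fullmatch(package_name or ""))
    if not valid and raise_error:
        raise ValueError(f"invalid package name: {package_name!r}")
    return valid
```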

class ckanapi_harvesters.ckan_api.ckan_api_5_manage.CkanApiManageParams(*, proxies: str | dict | ProxyConfig = None, ckan_headers: dict = None, http_headers: dict = None)

Bases: CkanApiReadWriteParams

copy(new_identifier: str = None, *, dest=None)
default_alias_enforce: bool = False
default_enable_admin: bool = False
get_num_rows_datastore_create_partial(limit: int = None) int
package_create_default_clear_if_deleted_state: bool = True
ckanapi_harvesters.ckan_api.ckan_api_5_manage.clean_table_name(variable_name: str) str

Replaces unwanted characters and spaces in variable_name to generate a name suitable for a table.
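A minimal sketch of this kind of cleaning, assuming that only ASCII letters, digits and underscores are kept and that every other run of characters becomes a single underscore. The exact character set used by clean_table_name is not documented here, so sketch_clean_table_name is a hypothetical stand-in.

```python
import re


def sketch_clean_table_name(variable_name: str) -> str:
    """Hypothetical: replace runs of characters other than [0-9a-zA-Z_] with '_'."""
    name = re.sub(r"[^0-9a-zA-Z_]+", "_", variable_name.strip())
    # Drop underscores left at either end by the replacement
    return name.strip("_")
```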

ckanapi_harvesters.ckan_api.ckan_api_params module

Basic parameters for the CkanApi class

class ckanapi_harvesters.ckan_api.ckan_api_params.CkanApiDebug

Bases: object

ckan_request_counter: int
extern_request_counter: int
last_response: Response | None
last_response_request_count: int
multi_requests_last_successful_offset: int
class ckanapi_harvesters.ckan_api.ckan_api_params.CkanApiParamsBasic(*, proxies: str | dict | ProxyConfig = None, ckan_headers: dict = None, http_headers: dict = None)

Bases: object

__init__(*, proxies: str | dict | ProxyConfig = None, ckan_headers: dict = None, http_headers: dict = None)
Parameters:
  • proxies – proxies to use for requests

  • ckan_headers – headers to use only for requests to the CKAN server

  • http_headers – headers to use for all requests, including external requests and requests to the CKAN server
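The difference between the two header dictionaries can be sketched as a merge performed for each request. The helper headers_for_request is hypothetical, and letting ckan_headers override http_headers entries with the same name is an assumption.

```python
from typing import Dict, Optional


def headers_for_request(http_headers: Optional[Dict[str, str]],
                        ckan_headers: Optional[Dict[str, str]],
                        to_ckan_server: bool) -> Dict[str, str]:
    """Hypothetical sketch: headers sent with a single request."""
    # http_headers apply to every request
    headers = dict(http_headers or {})
    if to_ckan_server:
        # Assumption: CKAN-specific headers override common headers with the same name
        headers.update(ckan_headers or {})
    return headers
```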

_cli_ckan_args_apply(args: Namespace, *, base_dir: str = None, error_not_found: bool = True, default_proxies: dict = None, proxy_headers: dict = None) None

Apply the arguments parsed by the argument parser defined by _setup_cli_ckan_parser

Parameters:
  • args

  • base_dir – base directory to find the CKAN API key file, if a relative path is provided (recommended: leave None to use cwd)

  • error_not_found – option to raise an exception if the CKAN API key file is not found

  • default_proxies – proxies used if proxies="default"

  • proxy_headers – headers used to access the proxies, generally for authentication
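Resolving and reading the API key file as described above can be sketched as follows: relative paths are resolved against base_dir or the current working directory, the key is read from the first line, and an error is optionally raised when the file is missing. sketch_read_apikey_file is a hypothetical helper, not part of the library.

```python
import os
from typing import Optional


def sketch_read_apikey_file(apikey_file: str, *, base_dir: Optional[str] = None,
                            error_not_found: bool = True) -> Optional[str]:
    """Hypothetical sketch: locate the API key file and read its first line."""
    path = apikey_file
    if not os.path.isabs(path):
        # Relative paths are resolved against base_dir, or the current working directory
        path = os.path.join(base_dir if base_dir is not None else os.getcwd(), path)
    if not os.path.isfile(path):
        if error_not_found:
            raise FileNotFoundError(path)
        return None
    with open(path, "r", encoding="utf-8") as f:
        # The API key is expected on the first line of the file
        return f.readline().strip()
```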

Returns:

static _setup_cli_ckan_parser__params(parser: ArgumentParser = None) ArgumentParser

Define or add the CLI arguments needed to initialize a CKAN API connection. Parser help message:

CKAN API connection parameters initialization

Parameters:

parser – option to provide an existing parser to add the specific fields needed to initialize a CKAN API connection
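The "define or add" behavior of the parser argument can be sketched with argparse. When no parser is given, a new one is created with the help message above; otherwise the arguments are added to the caller's parser. The option names below are hypothetical placeholders, not the library's actual CLI flags.

```python
import argparse
from typing import Optional


def sketch_setup_cli_parser(parser: Optional[argparse.ArgumentParser] = None) -> argparse.ArgumentParser:
    """Hypothetical sketch: create a parser or extend an existing one."""
    if parser is None:
        # A new parser gets the help message quoted in the docstring
        parser = argparse.ArgumentParser(description="CKAN API connection parameters initialization")
    # Hypothetical option names, for illustration only
    group = parser.add_argument_group("CKAN API connection")
    group.add_argument("--ckan-url", help="URL of the CKAN server")
    group.add_argument("--apikey-file", help="file containing the API key on its first line")
    group.add_argument("--proxies", help="proxies to use for requests")
    return parser
```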

Returns:

action_requests_retry_always: bool
property ckan_ca: bool | str | None
ckan_headers: dict
copy(*, dest=None)
default_limit_list: int | None
default_limit_read: int | None
dry_run: bool
property extern_ca: bool | str | None
http_headers: dict
max_requests_attempts: int
max_requests_count: int
multi_requests_time_between_requests: float
multi_requests_timeout: float
property proxies: dict
property proxy_auth: AuthBase | Tuple[str, str]
property proxy_string: str
requests_timeout: float | None
response_time_wait_threshold: None | float
store_last_response: bool
store_last_response_debug_info: bool
time_between_attempts: float
user_agent: str | None
verbose_extra: bool
verbose_multi_requests: bool
verbose_request: bool
verbose_request_error: bool
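Several of the attributes above control retries. A minimal retry loop using max_requests_attempts and time_between_attempts might look like the sketch below. The semantics (a total number of attempts, with a fixed pause between them) are assumed from the attribute names, and call_with_retries is a hypothetical helper.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(request: Callable[[], T], max_requests_attempts: int = 3,
                      time_between_attempts: float = 1.0) -> T:
    """Hypothetical retry loop; parameter names mirror the attributes above."""
    last_error = None
    for attempt in range(1, max_requests_attempts + 1):
        try:
            return request()
        except Exception as error:  # a real client would retry only transient errors
            last_error = error
            if attempt < max_requests_attempts:
                time.sleep(time_between_attempts)
    raise RuntimeError(f"request failed after {max_requests_attempts} attempts") from last_error
```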

Module contents

Package with helper functions for CKAN requests using pandas DataFrames.