ckanapi_harvesters.ckan_api package
Subpackages
- ckanapi_harvesters.ckan_api.deprecated package
- Submodules
- ckanapi_harvesters.ckan_api.deprecated.ckan_api_deprecated module
CkanApiDeprecated: CkanApiDeprecated._api_group_package_show(), CkanApiDeprecated._api_group_package_show_all(), CkanApiDeprecated._api_package_list(), CkanApiDeprecated._api_package_list_all(), CkanApiDeprecated._api_resource_create_default_resource_views(), CkanApiDeprecated._api_resource_search(), CkanApiDeprecated._api_resource_search_all(), CkanApiDeprecated.copy(), CkanApiDeprecated.group_package_show_all(), CkanApiDeprecated.package_list_all(), CkanApiDeprecated.resource_create_default_resource_views(), CkanApiDeprecated.resource_move(), CkanApiDeprecated.resource_search_all()
- ckanapi_harvesters.ckan_api.deprecated.ckan_api_deprecated_vocabularies module
CkanApiVocabulariesDeprecated: CkanApiVocabulariesDeprecated.__init__(), CkanApiVocabulariesDeprecated._api_vocabulary_create(), CkanApiVocabulariesDeprecated._api_vocabulary_delete(), CkanApiVocabulariesDeprecated._api_vocabulary_list(), CkanApiVocabulariesDeprecated._api_vocabulary_update(), CkanApiVocabulariesDeprecated.copy(), CkanApiVocabulariesDeprecated.initiate_vocabularies_from_policy(), CkanApiVocabulariesDeprecated.map_resources(), CkanApiVocabulariesDeprecated.set_default_map_mode(), CkanApiVocabulariesDeprecated.vocabularies_clear(), CkanApiVocabulariesDeprecated.vocabulary_delete(), CkanApiVocabulariesDeprecated.vocabulary_list(), CkanApiVocabulariesDeprecated.vocabulary_update()
- Module contents
Submodules
ckanapi_harvesters.ckan_api.ckan_api module
Alias to the most complete CkanApi implementation.
ckanapi_harvesters.ckan_api.ckan_api_0_base module
- class ckanapi_harvesters.ckan_api.ckan_api_0_base.CkanApiABC
Bases:
ABC
- class ckanapi_harvesters.ckan_api.ckan_api_0_base.CkanApiBase(url: str = None, *, proxies: str | dict | ProxyConfig = None, apikey: str | CkanApiKey = None, apikey_file: str = None, owner_org: str = None, params: CkanApiParamsBasic = None, identifier=None)
Bases:
CkanApiABC
CKAN Database API interface to CKAN server with helper functions using pandas DataFrames. This class implements the basic parameters and request functions.
- CKAN_URL_ENVIRON = 'CKAN_URL'
- __init__(url: str = None, *, proxies: str | dict | ProxyConfig = None, apikey: str | CkanApiKey = None, apikey_file: str = None, owner_org: str = None, params: CkanApiParamsBasic = None, identifier=None)
CKAN Database API interface to CKAN server with helper functions using pandas DataFrames.
- Parameters:
url – url of the CKAN server
proxies – proxies to use for requests
apikey – way to provide the API key directly (optional)
apikey_file – path to a file containing a valid API key in the first line of text (optional)
owner_org – name of the organization to limit package_search (optional)
params – other connection/behavior parameters
identifier – identifier of the ckan client
- __str__() str
String representation of the instance, for debugging purposes.
- Returns:
URL representing the CKAN server
- _api_action_request(action: str, *, method: RequestType, params: dict = None, headers: dict = None, data: dict | str | bytes = None, json: dict = None, files: List[tuple] = None, timeout: float = None, _attempt_counts: int = 0, _attempt_traceback: List[str] = None) CkanActionResponse
Send API action request and return response.
- Parameters:
action – action name
method – GET / POST
params – params to set in the url
data – information to encode in the request body (only for POST method)
json – information to encode as JSON in the request json (only for POST method)
files – files to upload in the request (only for POST method)
headers – headers for the request (authentication tokens are added by the function)
timeout – request timeout in seconds
_attempt_counts – internal argument in case of re-post of the request to count retries
_attempt_traceback – internal argument in case of re-post of the request to list error history
- Returns:
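The internal `_attempt_counts` and `_attempt_traceback` arguments indicate that a failed request can be re-posted while its error history is kept. The following is a minimal sketch of that retry pattern, not the library's implementation: `request_with_retries`, the `send` callable and `max_attempts` are hypothetical names, and the maximum number of attempts is an assumption.

```python
import traceback
from typing import Callable, List, Optional


def request_with_retries(send: Callable[[], dict], *, max_attempts: int = 3,
                         _attempt_counts: int = 0,
                         _attempt_traceback: Optional[List[str]] = None) -> dict:
    """Call send(); on failure, re-post the request and record each error (hypothetical helper)."""
    if _attempt_traceback is None:
        _attempt_traceback = []
    try:
        return send()
    except Exception:
        # Record the traceback of this failed attempt in the error history
        _attempt_traceback.append(traceback.format_exc())
        if _attempt_counts + 1 >= max_attempts:
            raise RuntimeError(f"request failed after {max_attempts} attempts:\n"
                               + "\n".join(_attempt_traceback))
        # Re-post the request with an incremented attempt counter
        return request_with_retries(send, max_attempts=max_attempts,
                                    _attempt_counts=_attempt_counts + 1,
                                    _attempt_traceback=_attempt_traceback)


# Usage: a fake request that fails twice before succeeding
calls = {"n": 0}


def flaky_send() -> dict:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return {"success": True, "result": "ok"}


response = request_with_retries(flaky_send)
```

Keeping the error history across attempts makes the final exception report every failure, not only the last one.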
- _ckan_url_request(path: str, *, method: RequestType, params: dict = None, headers: dict = None, data: dict = None, json: dict = None, files: List[tuple] = None, timeout: float = None) Response
Send request to server and return response.
- Parameters:
path – relative path to server url
method – GET / POST
params – params to set in the url
data – information to encode in the request body (only for POST method)
headers – headers for the request (authentication tokens are added by the function)
- Returns:
- _cli_ckan_args_apply(args: Namespace, *, base_dir: str = None, error_not_found: bool = True, default_proxies: dict = None, proxy_headers: dict = None) None
Apply the arguments parsed by the argument parser defined by _setup_cli_ckan_parser.
- Parameters:
args
base_dir – base directory to find the CKAN API key file, if a relative path is provided (recommended: leave None to use cwd)
error_not_found – option to raise an exception if the CKAN API key file is not found
default_proxies – proxies used if proxies=“default”
proxy_headers – headers used to access the proxies, generally for authentication
- Returns:
- _get_api_url(category: str = None)
Returns the base API URL, with the category appended.
- Parameters:
category – usually, “action”
- Returns:
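CKAN exposes its action API under `/api/3/action/` on the server URL. A minimal sketch of what this method may compute, assuming that layout (the helper name `get_api_url` is hypothetical):

```python
from typing import Optional


def get_api_url(ckan_url: str, category: Optional[str] = None) -> str:
    """Hypothetical helper: base API URL of a CKAN server, with an optional category appended."""
    base = ckan_url.rstrip("/") + "/api/3"
    if category:
        base += "/" + category.strip("/")
    return base


# The URL of an action is the "action" category followed by the action name
action_url = get_api_url("https://demo.ckan.org/", "action") + "/package_show"
print(action_url)  # https://demo.ckan.org/api/3/action/package_show
```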
- _init_session(*, internal: bool = False)
Initialize the session objects which are used to perform requests with this CKAN instance. This method can be overloaded to fit your needs (proxies, certificates, cookies, headers, etc.).
- Parameters:
internal
- Returns:
- _prepare_headers(headers: dict = None, include_ckan_auth: bool = False) dict
Prepare headers for a request. If the request is destined to the CKAN server, include authentication headers, if API key was provided.
- Parameters:
headers – initial headers
include_ckan_auth – boolean to include CKAN authentication headers
- Returns:
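The header preparation described above can be sketched as follows. This sketch assumes the API key is sent in the `Authorization` header, which is where recent CKAN versions read API tokens. `prepare_headers` and its `apikey` argument are hypothetical, since the documented method reads the key from the instance.

```python
from typing import Optional


def prepare_headers(headers: Optional[dict] = None, *, include_ckan_auth: bool = False,
                    apikey: Optional[str] = None) -> dict:
    """Hypothetical helper: copy the initial headers and add CKAN authentication if requested."""
    # Work on a copy so that the caller's dict is not modified
    prepared = dict(headers) if headers else {}
    if include_ckan_auth and apikey:
        # Assumption: CKAN reads the API token from the Authorization header
        prepared["Authorization"] = apikey
    return prepared


initial = {"Accept": "application/json"}
ckan_headers = prepare_headers(initial, include_ckan_auth=True, apikey="my-token")
external_headers = prepare_headers(initial, apikey="my-token")  # no CKAN auth for other servers
```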
- _request_all_results_df(api_fun: Callable, *, params: dict = None, list_attrs: bool = True, limit: int = None, offset: int = 0, requests_limit: int = None, search_all: bool = True, progress_callback: CkanProgressCallbackABC = None, **kwargs) DataFrame
Repeat the request with a limited page size, advancing the offset parameter, until no more data is returned. The DataFrame implementation returns the concatenation of the DataFrames returned by the unitary function calls.
- Parameters:
api_fun – function to call, typically a unitary request function
params – api_fun must accept params argument in order to transmit other values and enforce the offset parameter
limit – api_fun must accept limit argument in order to update the limit value
offset – api_fun must accept offset argument in order to update the offset value
search_all – if False, only the first request is performed
list_attrs – option to aggregate the attrs fields of the DataFrames into lists (the False option has not been tested)
kwargs – additional keyword arguments to pass to api_fun
- Returns:
- _request_all_results_list(api_fun: Callable, *, params: dict = None, limit: int = None, offset: int = 0, requests_limit: int = None, search_all: bool = True, progress_callback: CkanProgressCallbackABC = None, **kwargs) List[CkanActionResponse] | list
Repeat the request with a limited page size, advancing the offset parameter, until no more data is returned. The list implementation returns the list of the values returned by the unitary function calls.
- Parameters:
api_fun – function to call, typically a unitary request function
params – api_fun must accept params argument in order to transmit other values and enforce the offset parameter
limit – api_fun must accept limit argument in order to update the limit value
offset – api_fun must accept offset argument in order to update the offset value
search_all – if False, only the first request is performed
kwargs – additional keyword arguments to pass to api_fun
- Returns:
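The offset-based pagination shared by the _request_all_results_* methods can be sketched as follows. This is an illustration under assumptions, not the library's code: `request_all_results_list` is a hypothetical name, the offset is advanced by the number of items received, and `requests_limit` is interpreted as a maximum number of requests.

```python
from typing import Callable, List, Optional


def request_all_results_list(api_fun: Callable[..., list], *, params: Optional[dict] = None,
                             limit: int = 100, offset: int = 0,
                             requests_limit: Optional[int] = None,
                             search_all: bool = True) -> List[list]:
    """Call api_fun page by page, advancing the offset, until an empty page is returned."""
    pages = []
    n_requests = 0
    while True:
        page = api_fun(params=params, limit=limit, offset=offset)
        n_requests += 1
        if not page:
            break  # no more data transmitted
        pages.append(page)
        if not search_all:
            break  # only the first request is performed
        if requests_limit is not None and n_requests >= requests_limit:
            break  # stop after the maximum number of requests
        offset += len(page)
    return pages


# Usage with a fake API function serving 7 items
DATA = list(range(7))


def fake_api(*, params=None, limit, offset):
    return DATA[offset:offset + limit]


all_pages = request_all_results_list(fake_api, limit=3)
print(all_pages)  # [[0, 1, 2], [3, 4, 5], [6]]
```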
- _request_all_results_page_generator(api_fun: Callable, *, params: dict = None, limit: int = None, offset: int = 0, requests_limit: int = None, search_all: bool = True, progress_callback: CkanProgressCallbackABC = None, **kwargs) Generator[Any, Any, None]
Repeat the request with a limited page size, advancing the offset parameter, until no more data is returned. Lazy auxiliary function which yields a result for each request.
- Parameters:
api_fun – function to call, typically a unitary request function
params – api_fun must accept params argument in order to transmit other values and enforce the offset parameter
limit – api_fun must accept limit argument in order to update the limit value
offset – api_fun must accept offset argument in order to update the offset value
search_all – if False, only the first request is performed
kwargs – additional keyword arguments to pass to api_fun
- Returns:
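The lazy variant can be sketched as a Python generator: each request is sent only when the caller asks for the next page. The function name and the offset update by the number of items received are assumptions of this sketch.

```python
from typing import Any, Callable, Generator, Optional


def request_all_results_page_generator(api_fun: Callable[..., list], *,
                                       params: Optional[dict] = None,
                                       limit: int = 100, offset: int = 0,
                                       search_all: bool = True) -> Generator[list, Any, None]:
    """Yield one page per request; each request is sent only when the next page is requested."""
    while True:
        page = api_fun(params=params, limit=limit, offset=offset)
        if not page:
            return  # no more data transmitted
        yield page
        if not search_all:
            return  # only the first request is performed
        offset += len(page)


# Usage with a fake API function that records the offsets it receives
requested_offsets = []


def fake_api(*, params=None, limit, offset):
    requested_offsets.append(offset)
    return list(range(10))[offset:offset + limit]


pages = request_all_results_page_generator(fake_api, limit=4)
first_page = next(pages)  # a single request has been sent at this point
offsets_after_first = list(requested_offsets)
remaining_pages = list(pages)
```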
- _setup_cli_ckan_parser(parser: ArgumentParser = None) ArgumentParser
Define or add CLI arguments to initialize a CKAN API connection. Parser help message:
CKAN API connection parameters initialization
- options:
- -h, --help
show this help message and exit
- --ckan-url CKAN_URL
CKAN URL
- --apikey APIKEY
CKAN API key
- --apikey-file APIKEY_FILE
Path to a file containing the CKAN API key (first line)
- --policy-file POLICY_FILE
Path to a file containing the CKAN data format policy (json format)
- --owner-org OWNER_ORG
CKAN Owner Organization
- --default-limit DEFAULT_LIMIT
Default number of rows per request
- --verbose VERBOSE
Option to set verbosity
- Parameters:
parser – option to provide an existing parser to add the specific fields needed to initialize a CKAN API connection
- Returns:
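A parser producing the help message above can be sketched with argparse. The option names and help strings come from the help message; the types (an integer for --default-limit, a string for --verbose) and the function name `setup_cli_ckan_parser` are assumptions of this sketch.

```python
import argparse
from typing import Optional


def setup_cli_ckan_parser(parser: Optional[argparse.ArgumentParser] = None) -> argparse.ArgumentParser:
    """Create a parser, or add the CKAN connection options to an existing one."""
    if parser is None:
        parser = argparse.ArgumentParser(description="CKAN API connection parameters initialization")
    parser.add_argument("--ckan-url", type=str, help="CKAN URL")
    parser.add_argument("--apikey", type=str, help="CKAN API key")
    parser.add_argument("--apikey-file", type=str,
                        help="Path to a file containing the CKAN API key (first line)")
    parser.add_argument("--policy-file", type=str,
                        help="Path to a file containing the CKAN data format policy (json format)")
    parser.add_argument("--owner-org", type=str, help="CKAN Owner Organization")
    parser.add_argument("--default-limit", type=int, help="Default number of rows per request")
    parser.add_argument("--verbose", type=str, help="Option to set verbosity")
    return parser


args = setup_cli_ckan_parser().parse_args(["--ckan-url", "https://demo.ckan.org",
                                           "--default-limit", "500"])
print(args.ckan_url, args.default_limit)  # https://demo.ckan.org 500
```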
- api_action_call(action: str, *, method: RequestType, params: dict = None, headers: dict = None, data: dict = None, json: dict = None, files: List[tuple] = None) CkanActionResponse
- api_help_show(action_name: str, *, print_output: bool = True) str
API help command on a given action.
- Parameters:
action_name
print_output – Option to print the output in the command line
- Returns:
- property apikey: CkanApiKey
- connect()
- copy(new_identifier: str = None, *, dest=None)
Returns a copy of the current instance. This is useful for using an initialized CKAN object in a multithreaded context, where each thread has its own copy. It is recommended to purge the last response before making a copy (with purge_map=False).
- disconnect()
- download_url_proxy(url: str, *, method: str = None, auth_if_ckan: bool = None, proxies: dict = None, headers: dict = None, auth: AuthBase | Tuple[str, str] = None, verify: bool | str | None = None, stream: bool = False, timeout: float = None) Response
Download a URL using the CKAN parameters (proxy, authentication etc.)
- Parameters:
url
proxies
headers
- Returns:
- download_url_proxy_test_head(url: str, *, raise_error: bool = False, auth_if_ckan: bool = None, proxies: dict = None, headers: dict = None, auth: AuthBase | Tuple[str, str] = None, verify: bool | str | None = None, context: str = None, timeout: float = None) None | ContextErrorLevelMessage
This sends a HEAD request to the url using the CKAN connection parameters via download_url_proxy. The resource is not downloaded but the headers indicate if the url is valid.
- Returns:
None if successful
- full_unlock(unlock: bool = True, *, no_ca: bool = None, external_url_resource_download: bool = None) None
Function to unlock the full capabilities of the CKAN API.
- Parameters:
unlock
- Returns:
- init_from_environ(*, init_api_key: bool = True, error_not_found: bool = False) None
Initialize CKAN from environment variables.
CKAN_URL: URL of the CKAN server.
And optionally:
- CKAN_API_KEY: the raw API key (storing the API key in an environment variable is not recommended)
- CKAN_API_KEY_FILE: path to a file containing a valid API key in the first line of text
- Parameters:
error_not_found – raise an error if the API key file was not found
- Returns:
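A minimal sketch of reading these variables, assuming only the three names listed above are involved. `read_ckan_environ` is a hypothetical helper; accepting any mapping (not only os.environ) makes it easy to test.

```python
import os
from typing import Mapping, Optional, Tuple


def read_ckan_environ(environ: Mapping[str, str] = os.environ
                      ) -> Tuple[Optional[str], Optional[str], Optional[str]]:
    """Return (CKAN URL, raw API key, API key file path) read from the environment."""
    url = environ.get("CKAN_URL")
    apikey = environ.get("CKAN_API_KEY")  # discouraged: secret stored in the environment
    apikey_file = environ.get("CKAN_API_KEY_FILE")
    return url, apikey, apikey_file


url, apikey, apikey_file = read_ckan_environ({"CKAN_URL": "https://demo.ckan.org",
                                              "CKAN_API_KEY_FILE": "ckan_apikey.txt"})
```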
- initialize_from_cli_args(*, args: Sequence[str] = None, base_dir: str = None, error_not_found: bool = True, parser: ArgumentParser = None, default_proxies: dict = None, proxy_headers: dict = None) None
Initialize the CKAN API connection from command line arguments.
- Parameters:
args – Option to provide arguments from another source.
- Returns:
- initialize_from_options_string(options_string: str = None, base_dir: str = None, error_not_found: bool = True, parser: ArgumentParser = None, default_proxies: dict = None, proxy_headers: dict = None) None
- input_cli_args(*, base_dir: str = None, error_not_found: bool = True, only_if_necessary: bool = False, default_proxies: dict = None, proxy_headers: dict = None)
Prompt the user in the console window for the initialization parameters, in command-line format.
- Returns:
- input_missing_info(*, base_dir: str = None, input_args: bool = False, input_args_if_necessary: bool = False, input_apikey: bool = True, error_not_found: bool = True)
Ask the user for missing information in the console window.
- Parameters:
input_owner_org – option to ask for the owner organization.
- Returns:
- is_url_internal(url: str) bool
Tests whether a url points to the same server as the CKAN url.
- Parameters:
url
- Returns:
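One plausible way to perform this test is to compare the scheme and network location (host and port) of both URLs. This is an assumption, not the documented rule, and this simple version treats an explicit default port (for example :443) as a different server. `is_url_internal` here is a standalone hypothetical function taking the CKAN URL as an argument.

```python
from urllib.parse import urlsplit


def is_url_internal(url: str, ckan_url: str) -> bool:
    """Hypothetical test: the URL shares the scheme and host[:port] of the CKAN URL."""
    target, ckan = urlsplit(url), urlsplit(ckan_url)
    return (target.scheme.lower(), target.netloc.lower()) == (ckan.scheme.lower(), ckan.netloc.lower())


internal = is_url_internal("https://demo.ckan.org/dataset/my-dataset", "https://demo.ckan.org")
external = is_url_internal("https://example.com/data.csv", "https://demo.ckan.org")
```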
- load_apikey(apikey_file: str = None, base_dir: str = None, error_not_found: bool = True)
Load the CKAN API key from file. The file should contain a valid API key in the first line of text.
- Parameters:
apikey_file – API key file (optional if specified at the creation of the object)
base_dir – base directory, if the apikey_file is a relative path
- Returns:
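A minimal sketch of the key loading described above, assuming relative paths are resolved against base_dir and surrounding whitespace is stripped. `load_apikey` here is a standalone hypothetical function; unlike the documented method, it does not implement error_not_found and simply raises FileNotFoundError for a missing file.

```python
import os
import tempfile
from typing import Optional


def load_apikey(apikey_file: str, base_dir: Optional[str] = None) -> str:
    """Return the first line of apikey_file, resolved against base_dir if it is relative."""
    if base_dir is not None and not os.path.isabs(apikey_file):
        apikey_file = os.path.join(base_dir, apikey_file)
    with open(apikey_file, "r", encoding="utf-8") as f:
        # Only the first line is used; strip the trailing newline and spaces
        return f.readline().strip()


# Usage with a temporary directory standing in for base_dir
with tempfile.TemporaryDirectory() as tmp_dir:
    with open(os.path.join(tmp_dir, "ckan_apikey.txt"), "w", encoding="utf-8") as f:
        f.write("my-secret-token\nthis line is ignored\n")
    key = load_apikey("ckan_apikey.txt", base_dir=tmp_dir)
```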
- prepare_arguments_for_url_download_request(url: str, *, auth_if_ckan: bool = None, headers: dict = None, verify: bool | str | None = None) Tuple[bool, dict]
Include CKAN authentication headers only if the URL points to the CKAN server.
- Parameters:
url – target URL
headers – initial headers
auth_if_ckan – option to include CKAN authentication headers if the url is recognized as part of the CKAN server.
- Returns:
- prepare_for_multithreading(mode_reduced: bool = True) None
This method disables unnecessary writes to this object. It is recommended to enable the reduced writes mode in a multithreaded context. Do not forget to reset sessions at the beginning of each thread.
- Parameters:
mode_reduced
- Returns:
- purge() None
Erase temporary data stored in this object
- Parameters:
purge_map – whether to purge the map created with map_resources
- set_limits(limit_read: int | None) None
Set default query limits. If only one argument is provided, it applies to both limits.
- Parameters:
limit_read – default limit for read requests
- Returns:
- set_proxies(proxies: str | dict | ProxyConfig, *, default_proxies: dict = None, proxy_headers: dict = None) None
Set up the proxy configuration.
- Parameters:
proxies – string or proxies dict or ProxyConfig object.
- If a string is provided, it must be an url to a proxy or one of the following values:
“environ”: use the proxies specified in the environment variables “http_proxy” and “https_proxy”
“noproxy”: do not use any proxies
“unspecified”: do not specify the proxies
“default”: use value provided by default_proxies
- Parameters:
default_proxies – proxies used if proxies=“default”
proxy_headers – headers used to access the proxies, generally for authentication
- Returns:
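The string values listed above can be sketched as a translation into a scheme-to-proxy dictionary. This is not the library's code: `resolve_proxies` is hypothetical, ProxyConfig objects and proxy_headers are not modeled, and representing “noproxy” as an empty dict and “unspecified” as None is a convention of this sketch only.

```python
import os
from typing import Dict, Optional, Union


def resolve_proxies(proxies: Union[str, Dict[str, str], None], *,
                    default_proxies: Optional[Dict[str, str]] = None) -> Optional[Dict[str, str]]:
    """Hypothetical translation of the proxies argument into a scheme -> proxy URL dict."""
    if proxies is None or isinstance(proxies, dict):
        return proxies
    if proxies == "environ":
        # Read the proxies from the http_proxy / https_proxy environment variables
        env = {"http": os.environ.get("http_proxy"), "https": os.environ.get("https_proxy")}
        return {scheme: proxy for scheme, proxy in env.items() if proxy}
    if proxies == "noproxy":
        return {}  # convention of this sketch for "noproxy"
    if proxies == "unspecified":
        return None  # convention of this sketch for "unspecified"
    if proxies == "default":
        return default_proxies
    # Any other string is taken as the URL of a proxy used for both schemes
    return {"http": proxies, "https": proxies}


single = resolve_proxies("http://proxy.example:3128")
default = resolve_proxies("default", default_proxies={"https": "http://proxy.example:8080"})
```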
- set_requests_delay(time_between_requests: int) None
Set delay between requests in seconds.
- Parameters:
time_between_requests – delay between requests in seconds
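A minimal sketch of how a delay between requests can be enforced, assuming the delay is measured between the moments requests are sent. `RequestThrottle` is a hypothetical class, not part of the library, and it accepts fractional seconds.

```python
import time
from typing import Optional


class RequestThrottle:
    """Hypothetical helper ensuring at least time_between_requests seconds between calls to wait()."""

    def __init__(self, time_between_requests: float) -> None:
        self.time_between_requests = time_between_requests
        self._last_request: Optional[float] = None

    def wait(self) -> None:
        # Call this just before sending each request
        now = time.monotonic()
        if self._last_request is not None:
            remaining = self.time_between_requests - (now - self._last_request)
            if remaining > 0:
                time.sleep(remaining)
        self._last_request = time.monotonic()


throttle = RequestThrottle(0.05)
start = time.monotonic()
for _ in range(3):
    throttle.wait()  # a real client would send a request here
elapsed = time.monotonic() - start
```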
- set_requests_timeout(requests_timeout: float, multi_requests_timeout=None) None
Set timeout for requests.
- Parameters:
requests_timeout – timeout for each request (seconds)
multi_requests_timeout – timeout for grouped request (seconds)
- Returns:
- set_verbosity(verbosity: bool = True, verbose_extra: bool = None) None
Enable/disable full verbose output
- Parameters:
verbosity – boolean. Cannot be None
- Returns:
- test_ckan_url_reachable(raise_error: bool = False) bool
Test whether the CKAN URL is reachable with a HEAD request. This checks neither that the server is really a CKAN server nor the authentication.
- static unlock_external_url_resource_download(value: bool = True)
This function enables the download of resources external to the CKAN server.
- static unlock_no_ca(value: bool = True)
This function enables you to disable the CA verification of the CKAN server.
Warning: Only allow in a local environment!
- property url: str
ckanapi_harvesters.ckan_api.ckan_api_1_map module
- class ckanapi_harvesters.ckan_api.ckan_api_1_map.CkanApiMap(url: str = None, *, proxies: str | dict | ProxyConfig = None, apikey: str | CkanApiKey = None, apikey_file: str = None, owner_org: str = None, params: CkanApiParamsBasic = None, map: CkanMap = None, identifier=None)
Bases:
CkanApiBase
CKAN Database API interface to CKAN server with helper functions using pandas DataFrames. This class implements the resource mapping capabilities to obtain resource ids necessary for the requests.
- __init__(url: str = None, *, proxies: str | dict | ProxyConfig = None, apikey: str | CkanApiKey = None, apikey_file: str = None, owner_org: str = None, params: CkanApiParamsBasic = None, map: CkanMap = None, identifier=None)
CKAN Database API interface to CKAN server with helper functions using pandas DataFrames.
- Parameters:
url – url of the CKAN server
proxies – proxies to use for requests
apikey – way to provide the API key directly (optional)
apikey_file – path to a file containing a valid API key in the first line of text (optional)
owner_org – name of the organization to limit package_search (optional)
params – other connection/behavior parameters
map – map of known resources
identifier – identifier of the ckan client
- _api_datastore_info(resource_id: str, *, params: dict = None, display_request_not_found: bool = True) CkanDataStoreInfo
API call to datastore_info. Returns the information on the DataStore. Used to know the number of rows in a DataStore.
- Parameters:
resource_id – resource id.
params – N/A
display_request_not_found – whether to display the request in the command window, in case of a CkanNotFoundError. This option is recommended if you are testing whether the resource has a DataStore or not.
- Returns:
- _api_group_list(*, limit: int = None, offset: int = 0, groups: List[str] = None, all_fields: bool = True, include_users: bool = True, params: dict = None) List[CkanGroupInfo] | List[str]
API call to group_list.
- Parameters:
params
- Returns:
- _api_group_list_all(*, all_fields: bool = True, include_users: bool = True, params: dict = None, limit: int = None, offset: int = None) List[CkanUserInfo] | List[str]
API call to group_list until an empty list is received.
- See:
_api_group_list()
- Parameters:
params
- Returns:
- _api_license_list(*, params: dict = None) List[CkanLicenseInfo]
API call to license_list.
- Parameters:
params
- Returns:
- _api_organization_list(*, params: dict = None, all_fields: bool = True, include_users: bool = False, limit: int = None, offset: int = None) List[CkanOrganizationInfo] | List[str]
API call to organization_list.
- Parameters:
params – typically, the request can be limited to an organization with the owner_org parameter
all_fields – whether to return full information or only the organization names in a list
- Returns:
- _api_organization_list_all(*, params: dict = None, all_fields: bool = True, include_users: bool = False, limit: int = None, offset: int = None) List[CkanOrganizationInfo] | List[str]
API call to organization_list until an empty list is received.
- See:
_api_organization_list()
- Parameters:
params
- Returns:
- _api_organization_show(id: str, *, params: dict = None) CkanOrganizationInfo
API call to organization_show.
- Parameters:
id – organization id or name.
params – typically, the request can be limited to an organization with the owner_org parameter
- Returns:
- _api_package_collaborator_list(package_id: str, *, params: dict = None, cancel_if_present: bool = False) Dict[str, CkanCollaboration]
API call to package_collaborator_list.
- Parameters:
params
- Returns:
- _api_package_search(*, params: dict = None, owner_org: str = None, filter: dict = None, q: str = None, include_private: bool = True, include_drafts: bool = True, sort: str = None, facet: bool = None, limit: int = None, offset: int = None) List[CkanPackageInfo]
API call to package_search.
- Parameters:
owner_org – ability to filter packages by owner_org
filter – dict of filters to apply, which translate to the API fq argument. From the fq documentation: any filter queries to apply. Note: +site_id:{ckan_site_id} is added to this string prior to the query being executed.
q – the Solr query. Optional. Default is ‘*:*’
include_private – if True, private datasets will be included in the results. Only private datasets from the user’s organizations will be returned and sysadmins will be returned all private datasets. Optional, the default is False in the API.
include_drafts – if True, draft datasets will be included in the results. A user will only be returned their own draft datasets, and a sysadmin will be returned all draft datasets. Optional, the default is False.
sort – sorting of the search results. Optional. Default: ‘score desc, metadata_modified desc’. As per the Solr documentation, this is a comma-separated string of field names and sort-orderings.
facet – whether to enable faceted results. Default: True in API.
limit – maximum number of results to return. Translates to the API rows argument.
offset – the offset in the complete result where the set of returned datasets should begin. Translates to the API start argument.
params – other parameters to pass to package_search
- Returns:
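The argument translation described above (limit to rows, offset to start, filter to fq) can be sketched as follows. The key:"value" syntax and the AND join used to build fq, the owner_org Solr field, and the function name `build_package_search_params` are assumptions of this sketch, not the library's implementation.

```python
from typing import Dict, Optional


def build_package_search_params(*, owner_org: Optional[str] = None,
                                filter: Optional[Dict[str, str]] = None,
                                q: Optional[str] = None, include_private: bool = True,
                                include_drafts: bool = True, limit: Optional[int] = None,
                                offset: Optional[int] = None) -> dict:
    """Hypothetical translation of the wrapper arguments into package_search parameters."""
    fq_terms = [f'{key}:"{value}"' for key, value in (filter or {}).items()]
    if owner_org is not None:
        fq_terms.append(f'owner_org:"{owner_org}"')  # assumed Solr field name
    params = {"include_private": include_private, "include_drafts": include_drafts}
    if fq_terms:
        params["fq"] = " AND ".join(fq_terms)
    if q is not None:
        params["q"] = q
    if limit is not None:
        params["rows"] = limit  # limit translates to rows
    if offset is not None:
        params["start"] = offset  # offset translates to start
    return params


search_params = build_package_search_params(filter={"tags": "energy"}, limit=50, offset=100)
print(search_params)
```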
- _api_package_search_all(*, params: dict = None, owner_org: str = None, filter: dict = None, q: str = None, include_private: bool = True, include_drafts: bool = True, sort: str = None, facet: bool = None, limit: int = None, offset: int = None, search_all: bool = True) List[CkanPackageInfo]
API call to package_search until an empty list is received.
- See:
_api_package_search()
- Parameters:
owner_org – ability to filter packages by owner_org
filter – dict of filters to apply, which translate to the API fq argument. From the fq documentation: any filter queries to apply. Note: +site_id:{ckan_site_id} is added to this string prior to the query being executed.
q – the Solr query. Optional. Default is ‘*:*’
include_private – if True, private datasets will be included in the results. Only private datasets from the user’s organizations will be returned and sysadmins will be returned all private datasets. Optional, the default is False in the API.
include_drafts – if True, draft datasets will be included in the results. A user will only be returned their own draft datasets, and a sysadmin will be returned all draft datasets. Optional, the default is False.
sort – sorting of the search results. Optional. Default: ‘score desc, metadata_modified desc’. As per the Solr documentation, this is a comma-separated string of field names and sort-orderings.
facet – whether to enable faceted results. Default: True in API.
limit – maximum number of results to return. Translates to the API rows argument.
offset – the offset in the complete result where the set of returned datasets should begin. Translates to the API start argument.
params – other parameters to pass to package_search
- Returns:
- _api_package_show(package_id, *, params: dict = None) CkanPackageInfo
API call to package_show. Returns the information on the package and the resources contained in the package. Not recommended for external use because this method does not return information about the DataStores. Prefer the map_resources method.
- See:
map_resources()
- Parameters:
package_id – package id.
params – See API documentation.
- Returns:
- _api_resource_show(resource_id, *, params: dict = None) CkanResourceInfo
API call to resource_show. Returns the metadata on a resource.
- Parameters:
resource_id – resource id.
params – See API documentation.
- Returns:
- _api_resource_view_list(resource_id: str, *, params: dict = None) List[CkanViewInfo]
API call to resource_view_list.
- Parameters:
params – typically, the request can be limited to an organization with the owner_org parameter
- Returns:
- _api_user_list(*, q: str = None, email: str = None, params: dict = None) List[CkanUserInfo]
API call to user_list.
- Parameters:
params
- Returns:
- _api_user_show(*, params: dict = None) CkanUserInfo | None
API call to user_show. With no params, returns information on the currently logged-in user.
- Returns:
dict with information on the current user
- _enrich_resource_info(resource_info: CkanResourceInfo, *, datastore_info: bool = False, resource_view_list: bool = False) None
Perform additional optional queries to add more information on a resource.
- Parameters:
resource_info
datastore_info – option to query datastore_info
resource_view_list – option to query resource_view_list
- Returns:
- check_package_name_arg(*, package_name: str, package_id: str, raise_error: bool = True) bool
Check the package name argument against the package ID found by the API.
- Parameters:
package_name – package name, ID or title
package_id – package ID known by the API
raise_error – Option to raise an error
- Returns:
- complete_package_list(package_list: str | List[str] = None, *, owner_org: str = None, include_private: bool = True, include_drafts: bool = True, params: dict = None) List[str]
Auxiliary function to initialize a package_list argument: it can list all packages of the CKAN server or of an organization, or keep the provided list as is.
- copy(new_identifier: str = None, *, dest=None)
Returns a copy of the current instance. This is useful for using an initialized CKAN object in a multithreaded context, where each thread has its own copy. It is recommended to purge the last response before making a copy (with purge_map=False).
- datastore_info(resource_id: str, *, params: dict = None, display_request_not_found: bool = True) CkanDataStoreInfo
- get_datastore_fields_or_request(resource_id: str, *, request_missing: bool = True, error_not_mapped: bool = False, error_not_found: bool = True, return_list: bool = False) List[dict] | OrderedDict[str, CkanField] | None
- get_datastore_info_or_request(resource_name: str, package_name: str = None, *, request_missing: bool = True, error_not_mapped: bool = False, error_not_found: bool = True) CkanDataStoreInfo | None
Get information on a DataStore if present in the map or perform request.
- Parameters:
resource_name – resource name or id
package_name – package name or id (required if the resource name is provided)
request_missing – confirm to perform the request if the information is missing
error_not_mapped – raise error if the resource is not mapped
- Returns:
- get_datastore_info_or_request_of_id(resource_id: str, *, request_missing: bool = True, error_not_mapped: bool = False, error_not_found: bool = True) CkanDataStoreInfo | None
Get information on a DataStore if present in the map or perform request.
- Parameters:
resource_id – resource id
request_missing – confirm to perform the request if the information is missing
error_not_mapped – raise error if the resource is not mapped
- Returns:
- get_organization_info_or_request(organization_name: str, *, request_missing: bool = True, error_not_mapped: bool = False, error_not_found: bool = True) CkanOrganizationInfo | None
Get information on an organization if present in the map or perform request.
- Parameters:
organization_name – organization name or id
request_missing – confirm to perform the request if the information is missing
error_not_mapped – raise error if the organization is not mapped
- Returns:
- get_package_info_or_request(package_name: str, *, request_missing: bool = True, error_not_mapped: bool = False, error_not_found: bool = True, datastore_info: bool = None, resource_view_list: bool = None, organization_info: bool = None, license_list: bool = None) CkanPackageInfo | None
Get information on a Package if present in the map or perform request.
- Parameters:
package_name – package name or id
request_missing – confirm to perform the request if the information is missing
error_not_mapped – raise error if the package is not mapped
- Returns:
- get_package_page_url(package_name: str, *, error_not_found: bool = True, default_url: bool = False) str
Get URL of package presentation page in CKAN (landing page).
- Parameters:
package_name
error_not_found
default_url – return url based on package name, even if it was not found.
- Returns:
- get_resource_id_or_request(resource_name: str, package_name: str, *, request_missing: bool = True, error_not_mapped: bool = False, error_not_found: bool = True) str | None
- get_resource_info_or_request(resource_name: str, package_name: str = None, *, request_missing: bool = True, error_not_mapped: bool = False, error_not_found: bool = True, datastore_info: bool = False) CkanResourceInfo | None
- get_resource_info_or_request_of_id(resource_id: str, *, request_missing: bool = True, error_not_mapped: bool = False, error_not_found: bool = True, datastore_info: bool = False) CkanResourceInfo | None
Get information on a resource if present in the map or perform request. Recommended: use self.map.get_resource_info() instead for this usage, because resource information is already returned when calling package_info during the mapping process.
- Parameters:
resource_id – resource id
request_missing – confirm to perform the request if the information is missing
error_not_mapped – raise error if the resource is not mapped
- Returns:
- get_resource_page_url(resource_name: str, package_name: str = None, *, error_not_mapped: bool = True) str
Get URL of resource presentation page in CKAN (landing page).
- Parameters:
package_name
- Returns:
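In a standard CKAN installation, the landing page of a package is <CKAN URL>/dataset/<package name> and the page of a resource is <package page>/resource/<resource id>. The following sketch builds these URLs under that assumption; the helper names are hypothetical, and customized CKAN instances may use other routes.

```python
from urllib.parse import quote


def package_page_url(ckan_url: str, package_name: str) -> str:
    """Hypothetical helper: landing page of a package (standard CKAN route)."""
    return f"{ckan_url.rstrip('/')}/dataset/{quote(package_name)}"


def resource_page_url(ckan_url: str, package_name: str, resource_id: str) -> str:
    """Hypothetical helper: landing page of a resource inside its package."""
    return f"{package_page_url(ckan_url, package_name)}/resource/{quote(resource_id)}"


package_url = package_page_url("https://demo.ckan.org/", "my-dataset")
resource_url = resource_page_url("https://demo.ckan.org", "my-dataset", "1234-abcd")
```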
- get_resource_view_list_or_request(resource_id: str, error_not_found: bool = True) List[CkanViewInfo] | None
Returns either the resource view list which was already received or emits a new query for this information.
- Parameters:
resource_id
error_not_found
- Returns:
- group_list(*, limit: int = None, offset: int = 0, groups: List[str] = None, all_fields: bool = True, include_users: bool = True, params: dict = None) List[CkanGroupInfo]
- group_list_all(*, all_fields: bool = True, include_users: bool = True, cancel_if_present: bool = False, params: dict = None, limit: int = None, offset: int = None) List[CkanGroupInfo] | List[str]
API call to group_list. The call can be canceled if the list is already present (not recommended, rather use get_organization_info_or_request).
- Parameters:
params
cancel_if_present – option to cancel when list is already present.
- Returns:
- input_missing_info(*, base_dir: str = None, input_args: bool = False, input_args_if_necessary: bool = False, input_apikey: bool = True, error_not_found: bool = True, input_owner_org: bool = False)
Ask the user for missing information in the console window.
- Parameters:
input_owner_org – option to ask for the owner organization.
- Returns:
- license_list(*, cancel_if_present: bool = True, params: dict = None) List[CkanLicenseInfo]
API call to license_list. The call can be canceled if the list is already present.
- Parameters:
params
cancel_if_present – option to cancel when list is already present.
- Returns:
- map_resources(package_list: str | List[str] = None, *, params: dict = None, datastore_info: bool = None, resource_view_list: bool = None, organization_info: bool = None, license_list: bool = None, only_missing: bool = True, error_not_found: bool = True, owner_org: str = None, progress_callback: CkanProgressCallbackABC = None) CkanMap
Map the resources of a given package to obtain resource IDs associated with the package name and its resources.
- Parameters:
package_list – List of packages to request. If not provided, the result of package_search is used.
params – Additional parameters to pass to all API calls (not recommended).
datastore_info – If True, enables the request of the API datastore_info to return information about DataStore fields, aliases, and row count. Required to search a DataStore by alias.
resource_view_list – If True, enables the request of the view_list API for each resource.
organization_info – If True, enables the request of the organization_list API before other requests.
license_list – If True, enables the request of the license_list API.
only_missing – If True, skips requesting already-mapped packages.
error_not_found – If True, packages not found by the API are ignored (no error is raised).
owner_org – Filters packages by a specific organization (only if package_search is used).
- Returns:
A mapping of resources for the specified package(s).
Note
Packages were referred to as DataSets in earlier CKAN implementations.
A single name can be shared across multiple resources within a package. In such cases, the first occurrence is used as a reference, and a warning is issued.
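The duplicate-name rule in the note above can be illustrated with a minimal sketch. It assumes each resource is a plain dict with "name" and "id" keys, as in the resource list returned by CKAN's package_show; map_resource_names is a hypothetical helper, not part of ckanapi_harvesters, and the real CkanMap structure holds much more information.

```python
import warnings
from typing import Dict, List


def map_resource_names(resources: List[dict]) -> Dict[str, str]:
    """Hypothetical helper: map resource names to ids, keeping the first occurrence."""
    name_to_id: Dict[str, str] = {}
    for resource in resources:
        name, resource_id = resource["name"], resource["id"]
        if name in name_to_id:
            # Duplicate name: the first id stays the reference and a warning is issued.
            warnings.warn(f"Duplicate resource name {name!r}; using id {name_to_id[name]!r}")
            continue
        name_to_id[name] = resource_id
    return name_to_id
```

Looking a resource up by name then always returns the id of the first resource carrying that name, which is why unique resource names within a package are preferable.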
- map_user_rights(*, cancel_if_present: bool = True, progress_callback: CkanProgressCallbackABC = None) CkanMap
Map user and group access rights to the packages currently mapped by CKAN.
- Returns:
- organization_list_all(*, cancel_if_present: bool = False, params: dict = None, all_fields: bool = True, include_users: bool = False, limit: int = None, offset: int = None) List[CkanOrganizationInfo] | List[str]
API call to organization_list. The call can be canceled if the list is already present (not recommended; use get_organization_info_or_request instead).
- Parameters:
params
cancel_if_present – option to cancel when list is already present.
- Returns:
- organization_show(id: str, *, params: dict = None) CkanOrganizationInfo
- package_collaborator_list(package_id: str, *, params: dict = None, cancel_if_present: bool = False) Dict[str, CkanCollaboration]
- package_search_all(*, params: dict = None, owner_org: str = None, filter: dict = None, q: str = None, include_private: bool = True, include_drafts: bool = True, sort: str = None, facet: bool = None, limit: int = None, offset: int = None, search_all: bool = True) List[CkanPackageInfo]
- package_show(package_id, *, params: dict = None) CkanPackageInfo
- purge(purge_map: bool = False) None
Erase temporary data stored in this object
- Parameters:
purge_map – whether to purge the map created with map_resources
- query_current_user(*, verbose: bool = None, error_not_found: bool = False) CkanUserInfo | None
- remap_resources(*, params=None, purge: bool = True, datastore_info: bool = None, resource_view_list: bool = None, organization_info: bool = None, license_list: bool = None)
Perform a new request on previously mapped packages.
- Parameters:
params
purge – option to reset the map before remapping.
datastore_info – enforce the request of the datastore_info API
resource_view_list – enforce the request of the view_list API for each resource
license_list – enforce the request of the license_list API
- Returns:
- resource_is_datastore(resource_id: str) bool
Basic test of whether a resource is a DataStore.
- Parameters:
resource_id
- Returns:
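As a rough illustration of such a basic test: CKAN resource dictionaries (as returned by resource_show) carry a datastore_active flag when the DataStore extension is enabled. The sketch below assumes the test reduces to reading that flag from a plain dict; looks_like_datastore is a hypothetical helper, and the library's own check may differ.

```python
def looks_like_datastore(resource_info: dict) -> bool:
    """Hypothetical helper: True if the resource dict reports an active DataStore table."""
    # CKAN sets datastore_active=True on resources backed by a DataStore table.
    # The key can be absent (e.g. DataStore plugin disabled), hence the default.
    return bool(resource_info.get("datastore_active", False))
```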
- resource_show(resource_id, *, params: dict = None) CkanResourceInfo
- resource_view_list(resource_id: str, *, params: dict = None) List[CkanViewInfo]
- set_default_map_mode(datastore_info: bool = None, resource_view_list: bool = None, organization_info: bool = None, license_list: bool = None) None
Set up the optional queries orchestrated by the map_resources function
- Parameters:
datastore_info
resource_view_list
organization_info
license_list
- Returns:
- set_owner_org(owner_org: str, *, error_not_found: bool = True) None
Set the default owner organization.
- Parameters:
owner_org – owner organization name, title or id.
- Returns:
- test_ckan_connection(raise_error: bool = False) bool
Test whether the CKAN URL points to a CKAN server by calling the package_search API. This does not check authentication.
- test_ckan_login(*, raise_error: bool = False, verbose: bool = None, empty_key_connected: bool = False) bool
Test whether your login corresponds to a user account.
- Parameters:
raise_error – option to raise an error if no account was detected
verbose – option to display username in console
empty_key_connected – option to ignore the test if the API key is empty
- user_list(*, cancel_if_present: bool = False, q: str = None, email: str = None, params: dict = None) List[CkanUserInfo]
API call to user_list. The call can be canceled if the list is already present.
- Parameters:
params
cancel_if_present – option to cancel when list is already present.
- Returns:
ckanapi_harvesters.ckan_api.ckan_api_2_readonly module
- class ckanapi_harvesters.ckan_api.ckan_api_2_readonly.CkanApiReadOnly(url: str = None, *, proxies: str | dict | ProxyConfig = None, apikey: str | CkanApiKey = None, apikey_file: str = None, owner_org: str = None, params: CkanApiReadOnlyParams = None, map: CkanMap = None, identifier=None)
Bases:
CkanApiMap
CKAN Database API interface to CKAN server with helper functions using pandas DataFrames. This class implements requests to read data from the CKAN server resources / DataStores.
- __init__(url: str = None, *, proxies: str | dict | ProxyConfig = None, apikey: str | CkanApiKey = None, apikey_file: str = None, owner_org: str = None, params: CkanApiReadOnlyParams = None, map: CkanMap = None, identifier=None)
CKAN Database API interface to CKAN server with helper functions using pandas DataFrames.
- Parameters:
url – url of the CKAN server
proxies – proxies to use for requests
apikey – way to provide the API key directly (optional)
apikey_file – path to a file containing a valid API key in the first line of text (optional)
owner_org – name of the organization to limit package_search (optional)
params – other connection/behavior parameters
map – map of known resources
identifier – identifier of the ckan client
- _api_datastore_dump_all(resource_id: str, *, filters: dict = None, q: str = None, fields: List[str] = None, sort: str = None, limit: int = None, offset: int = 0, format: str = None, bom: bool = None, requests_limit: int = None, params: dict = None, search_all: bool = True, return_df: bool = True, progress_callback: CkanProgressCallbackABC = None) DataFrame | Response
Successive calls to _api_datastore_dump_df until an empty list is received.
- See:
_api_datastore_dump()
- Parameters:
resource_id – resource id.
filters – The base argument to filter values in a table (optional)
q – Full text query (optional)
fields – The base argument to filter columns (optional)
format – Format of the returned response: csv (default), tsv, json or xml (optional)
params – Additional parameters such as filters, q, sort and fields can be given. See DataStore API documentation.
search_all – if False, only the first request is performed
- Returns:
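The "successive calls until an empty list is received" pattern shared by the *_all methods can be sketched as follows. fetch_page is a hypothetical callable standing in for one HTTP request with a given limit and offset; the real methods also handle output formats, progress callbacks and DataFrame conversion, which are left out here. The mapping of search_all and requests_limit follows the parameter descriptions above.

```python
from typing import Callable, List, Optional

Page = List[dict]


def fetch_all(fetch_page: Callable[[int, int], Page], *, limit: int = 100, offset: int = 0,
              requests_limit: Optional[int] = None, search_all: bool = True) -> Page:
    """Call fetch_page(limit, offset) repeatedly until an empty page is returned."""
    records: Page = []
    n_requests = 0
    while True:
        page = fetch_page(limit, offset)
        n_requests += 1
        if not page:
            break  # empty page: no more records
        records.extend(page)
        offset += len(page)  # next request starts after the records already received
        if not search_all:
            break  # only the first request is performed
        if requests_limit is not None and n_requests >= requests_limit:
            break  # cap on the number of requests
    return records
```

Stopping on an empty page costs one extra request at the end, but it does not depend on the server reporting an accurate total.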
- _api_datastore_dump_all_page_generator(resource_id: str, *, filters: dict = None, q: str = None, fields: List[str] = None, sort: str = None, limit: int = None, offset: int = 0, requests_limit: int = None, progress_callback: CkanProgressCallbackABC = None, format: str = None, bom: bool = None, params: dict = None, search_all: bool = True, return_df: bool = True) Generator[DataFrame, Any, None] | Generator[Response, Any, None]
Successive calls to _api_datastore_dump until an empty list is received. Generator implementation which yields one DataFrame per request.
- See:
_api_datastore_dump()
- Parameters:
resource_id – resource id.
filters – The base argument to filter values in a table (optional)
q – Full text query (optional)
fields – The base argument to filter columns (optional)
format – Format of the returned response: csv (default), tsv, json or xml (optional)
params – Additional parameters such as filters, q, sort and fields can be given. See DataStore API documentation.
search_all – if False, only the first request is performed
- Returns:
- _api_datastore_dump_df(resource_id: str, *, filters: dict = None, q: str = None, fields: List[str] = None, sort: str = None, limit: int = None, offset: int = 0, format: str = None, bom: bool = None, params: dict = None) DataFrame
Convert output of _api_datastore_dump_raw to pandas DataFrame.
- _api_datastore_dump_raw(resource_id: str, *, filters: dict = None, q: str = None, fields: List[str] = None, sort: str = None, limit: int = None, offset: int = 0, format: str = None, bom: bool = None, params: dict = None, compute_len: bool = False) Response
Request to the datastore/dump URL. Dumps successive rows of the DataStore.
- Parameters:
resource_id – resource id.
filters – The base argument to filter values in a table (optional)
q – Full text query (optional)
fields – The base argument to filter columns (optional)
format – Format of the returned response: csv (default), tsv, json or xml (optional)
params – Additional parameters such as filters, q, sort and fields can be given. See DataStore API documentation.
- Returns:
raw response
- _api_datastore_search_all(resource_id: str, *, filters: dict = None, q: str = None, fields: List[str] = None, distinct: bool = None, sort: str = None, limit: int = None, offset: int = 0, format: str = None, requests_limit: int = None, progress_callback: CkanProgressCallbackABC = None, search_all: bool = True, params: dict = None, return_df: bool = True, compute_len: bool = False) DataFrame | ListRecords | Any
Successive calls to _api_datastore_search_df until an empty list is received.
- See:
_api_datastore_search()
- Parameters:
resource_id – resource id.
filters – The base argument to filter values in a table (optional)
q – Full text query (optional)
fields – The base argument to filter columns (optional)
distinct – return only distinct rows (optional, default: False), e.g. to return distinct ids: fields="id", distinct=True
sort – Argument to sort results, e.g. sort="index, quantity desc" or sort="index asc"
limit – Limit the number of records per request
offset – Offset in the returned records
format – Format of the returned response: objects (default), csv, tsv or lists (optional)
params – Additional parameters such as filters, q, sort and fields can be given. See DataStore API documentation.
search_all – if False, only the first request is performed
- Returns:
- _api_datastore_search_all_page_generator(resource_id: str, *, filters: dict = None, q: str = None, fields: List[str] = None, distinct: bool = None, sort: str = None, limit: int = None, offset: int = 0, format: str = None, search_all: bool = True, requests_limit: int = None, progress_callback: CkanProgressCallbackABC = None, params: dict = None, return_df: bool = True) Generator[DataFrame, Any, None] | Generator[CkanActionResponse, Any, None]
Successive calls to _api_datastore_search_df until an empty list is received. Generator implementation which yields one DataFrame per request.
- See:
_api_datastore_search()
- Parameters:
resource_id – resource id.
filters – The base argument to filter values in a table (optional)
q – Full text query (optional)
fields – The base argument to filter columns (optional)
distinct – return only distinct rows (optional, default: False), e.g. to return distinct ids: fields="id", distinct=True
sort – Argument to sort results, e.g. sort="index, quantity desc" or sort="index asc"
limit – Limit the number of records per request
offset – Offset in the returned records
format – Format of the returned response: objects (default), csv, tsv or lists (optional)
params – Additional parameters such as filters, q, sort and fields can be given. See DataStore API documentation.
search_all – if False, only the first request is performed
- Returns:
- _api_datastore_search_df(resource_id: str, *, filters: dict = None, q: str = None, fields: List[str] = None, distinct: bool = None, sort: str = None, limit: int = None, offset: int = 0, format: str = None, params: dict = None, compute_len: bool = True) DataFrame
Convert output of _api_datastore_search_raw to pandas DataFrame.
- _api_datastore_search_raw(resource_id: str, *, filters: dict = None, q: str = None, fields: List[str] = None, distinct: bool = None, sort: str = None, limit: int = None, offset: int = 0, format: str = None, params: dict = None, compute_len: bool = False) CkanActionResponse
API call to datastore_search. Performs queries on the DataStore.
- Parameters:
resource_id – resource id.
filters – The base argument to filter values in a table (optional)
q – Full text query (optional)
fields – The base argument to filter columns (optional)
distinct – return only distinct rows (optional, default: False), e.g. to return distinct ids: fields="id", distinct=True
sort – Argument to sort results, e.g. sort="index, quantity desc" or sort="index asc"
limit – Limit the number of records per request
offset – Offset in the returned records
format – Format of the returned response: objects (default), csv, tsv or lists (optional)
params – Additional parameters such as filters, q, sort and fields can be given. See DataStore API documentation.
- Returns:
- _api_datastore_search_sql_all(sql: str, *, params: dict = None, search_all: bool = True, limit: int = None, offset: int = 0, requests_limit: int = None, progress_callback: CkanProgressCallbackABC = None, return_df: bool = True) DataFrame | ListRecords
Successive calls to _api_datastore_search_sql until an empty list is received.
- See:
_api_datastore_search_sql()
- Parameters:
sql – SQL query, e.g. f'SELECT * FROM "{resource_id}" WHERE "USER_ID" < 0'
limit – Limit the number of records per request
offset – Offset in the returned records
params – N/A
search_all – if False, only the first request is performed
- Returns:
- _api_datastore_search_sql_all_page_generator(sql: str, *, params: dict = None, search_all: bool = True, limit: int = None, offset: int = 0, requests_limit: int = None, progress_callback: CkanProgressCallbackABC = None, return_df: bool = True) Generator[DataFrame, Any, None] | Generator[CkanActionResponse, Any, None]
Successive calls to _api_datastore_search_sql until an empty list is received. Generator implementation which yields one DataFrame per request.
- See:
_api_datastore_search_sql()
- Parameters:
sql – SQL query, e.g. f'SELECT * FROM "{resource_id}" WHERE "USER_ID" < 0'
limit – Limit the number of records per request
offset – Offset in the returned records
params – N/A
search_all – if False, only the first request is performed
- Returns:
- _api_datastore_search_sql_df(sql: str, *, params: dict = None, limit: int = None, offset: int = 0) DataFrame
Convert output of _api_datastore_search_sql_raw to pandas DataFrame.
- _api_datastore_search_sql_raw(sql: str, *, params: dict = None, limit: int = None, offset: int = 0) CkanActionResponse
API call to datastore_search_sql. Performs SQL queries on the DataStore. These queries can be more complex than with datastore_search. DataStores are referenced by their resource_id, surrounded by double quotes. Field names are referred to by their exact (case-sensitive) name, surrounded by double quotes. NB: This action is not available when ckan.datastore.sqlsearch.enabled is set to false.
- Parameters:
sql – SQL query, e.g. f'SELECT * FROM "{resource_id}" WHERE "USER_ID" < 0'
limit – Limit the number of records per request
offset – Offset in the returned records
params – N/A
- Returns:
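Following the quoting rules described above, a query string can be assembled as in the sketch below. It assumes PostgreSQL identifier quoting (double quotes, with embedded double quotes doubled), which applies to the PostgreSQL-backed DataStore; quote_ident and build_select are hypothetical helpers. Only identifiers are quoted here; the numeric threshold is coerced with int() instead of being interpolated as raw text.

```python
def quote_ident(name: str) -> str:
    """Hypothetical helper: quote a PostgreSQL identifier (table or field name)."""
    # Double quotes preserve case; embedded double quotes are escaped by doubling them.
    return '"' + name.replace('"', '""') + '"'


def build_select(resource_id: str, field: str, threshold: int) -> str:
    """Hypothetical helper: SELECT rows of a DataStore table where field < threshold."""
    # The DataStore table is named after the resource id.
    return f"SELECT * FROM {quote_ident(resource_id)} WHERE {quote_ident(field)} < {int(threshold)}"
```

The resulting string is what would be passed as the sql argument of datastore_search_sql.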
- static _get_default_bom_option(bom: bool = None, format: str = None, search_method: bool = False) bool | None
The datastore_dump API includes an option to prepend the BOM (Byte Order Mark) to responses in CSV/TSV format. The BOM helps text-processing tools and applications determine the encoding of the file, e.g. to distinguish between UTF-8 and UTF-16.
Note
To correctly handle BOM characters in pandas.read_csv, specify the encoding="utf-8-sig" parameter. This is taken into account in the decoding function.
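The effect of the utf-8-sig codec can be seen with the standard library alone. The sketch below assumes the CSV response body is available as bytes; parse_csv_bytes is a hypothetical helper. Decoding with plain "utf-8" would leave a "\ufeff" character glued to the first column name, while "utf-8-sig" strips it when present and otherwise behaves like "utf-8".

```python
import csv
import io
from typing import Dict, List


def parse_csv_bytes(payload: bytes) -> List[Dict[str, str]]:
    """Hypothetical helper: parse a CSV body that may start with a UTF-8 BOM."""
    # "utf-8-sig" removes a leading BOM if present; without one it decodes like "utf-8".
    text = payload.decode("utf-8-sig")
    # newline="" lets the csv module handle \r\n line endings itself.
    return list(csv.DictReader(io.StringIO(text, newline="")))
```

With pandas, pandas.read_csv(io.BytesIO(payload), encoding="utf-8-sig") gives the same clean first column name.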
- _rx_records_df_clean(df: DataFrame) None
Auxiliary function to clean DataFrames returned by DataStore requests.
- Parameters:
df
- Returns:
- datastore_dump(resource_id: str, *, filters: dict = None, q: str = None, fields: List[str] = None, distinct: bool = None, sort: str = None, limit: int = None, offset: int = 0, requests_limit: int = None, progress_callback: CkanProgressCallbackABC = None, params: dict = None, search_all: bool = True, search_method: bool = True, format: str = None, return_df: bool = True) DataFrame | ListRecords | Any | List[CkanActionResponse]
Alias of datastore_search with search_all=True by default. Uses the API datastore_search
- See:
datastore_search()
- Parameters:
resource_id – resource id.
filters – The base argument to filter values in a table (optional)
q – Full text query (optional)
fields – The base argument to filter columns (optional)
distinct – return only distinct rows (optional, default: False), e.g. to return distinct ids: fields="id", distinct=True
sort – Argument to sort results, e.g. sort="index, quantity desc" or sort="index asc"
limit – Limit the number of records per request
offset – Offset in the returned records
requests_limit – Limit the number of requests
progress_callback – Progress callback function
params – Additional parameters such as filters, q, sort and fields can be given. See DataStore API documentation.
search_all – Option to renew the request until there are no more records.
search_method – API method selection (True=datastore_search, False=datastore_dump)
return_df – Return pandas DataFrame (True) or dict (False)
- Returns:
- datastore_dump_page_generator(resource_id: str, *, filters: dict = None, q: str = None, fields: List[str] = None, distinct: bool = None, sort: str = None, limit: int = None, offset: int = 0, requests_limit: int = None, progress_callback: CkanProgressCallbackABC = None, params: dict = None, search_all: bool = True, search_method: bool = True, format: str = None, bom: bool = None, return_df: bool = True) Generator[DataFrame, Any, None] | Generator[CkanActionResponse, Any, None]
Alias of datastore_search_page_generator with search_all=True by default. Uses the API datastore_search
- See:
datastore_search_page_generator()
- Parameters:
resource_id – resource id.
filters – The base argument to filter values in a table (optional)
q – Full text query (optional)
fields – The base argument to filter columns (optional)
distinct – return only distinct rows (optional, default: False), e.g. to return distinct ids: fields="id", distinct=True
sort – Argument to sort results, e.g. sort="index, quantity desc" or sort="index asc"
limit – Limit the number of records per request
requests_limit – Limit the number of requests
progress_callback – Progress callback function
offset – Offset in the returned records
params – Additional parameters such as filters, q, sort and fields can be given. See DataStore API documentation.
search_all – Option to renew the request until there are no more records.
search_method – API method selection (True=datastore_search, False=datastore_dump)
- Returns:
- datastore_search(resource_id: str, *, filters: dict = None, q: str = None, fields: List[str] = None, distinct: bool = None, sort: str = None, limit: int = None, offset: int = 0, params: dict = None, search_all: bool = False, search_method: bool = True, requests_limit: int = None, progress_callback: CkanProgressCallbackABC = None, format: str = None, bom: bool = None, return_df: bool = True) DataFrame | ListRecords | Any | List[CkanActionResponse]
Preferred entry-point for a DataStore read request. Uses the API datastore_search
- Parameters:
resource_id – resource id.
filters – The base argument to filter values in a table (optional)
q – Full text query (optional)
fields – The base argument to filter columns (optional)
distinct – return only distinct rows (optional, default: False), e.g. to return distinct ids: fields="id", distinct=True
sort – Argument to sort results, e.g. sort="index, quantity desc" or sort="index asc"
limit – Limit the number of records per request
offset – Offset in the returned records
requests_limit – Limit the number of requests
progress_callback – Progress callback function
params – Additional parameters such as filters, q, sort and fields can be given. See DataStore API documentation.
search_all – Option to renew the request until there are no more records.
search_method – API method selection (True=datastore_search, False=datastore_dump)
- Returns:
- datastore_search_cursor(resource_id: str, *, filters: dict = None, q: str = None, fields: List[str] = None, distinct: bool = None, sort: str = None, limit: int = None, offset: int = 0, total_limit: int = None, requests_limit: int = None, progress_callback: CkanProgressCallbackABC = None, params: dict = None, search_all: bool = True, search_method: bool = True, format: str = None, bom: bool = None, return_df: bool = False) Generator[Series | dict | list | str, Any, None]
Cursor over the rows returned by datastore_search.
- Parameters:
resource_id – resource id.
filters – The base argument to filter values in a table (optional)
q – Full text query (optional)
fields – The base argument to filter columns (optional)
distinct – return only distinct rows (optional, default: False), e.g. to return distinct ids: fields="id", distinct=True
sort – Argument to sort results, e.g. sort="index, quantity desc" or sort="index asc"
limit – Limit the number of records per request
offset – Offset in the returned records
total_limit – Limit the number of records to return
requests_limit – Limit the number of requests
progress_callback – Progress callback function
params – Additional parameters such as filters, q, sort and fields can be given. See DataStore API documentation.
search_all – Option to renew the request until there are no more records.
search_method – API method selection (True=datastore_search, False=datastore_dump)
return_df – Return pandas Series (True) or dict (False)
format – Format of the data requested through the API. This does not change the output if return_df is True.
- Returns:
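A cursor of this kind can be thought of as flattening a page generator into individual rows. The sketch below assumes pages arrive as lists of dicts from any iterable (for instance a page generator); iter_rows is a hypothetical helper, and total_limit mirrors the parameter of the same name. Because itertools.islice stops pulling from the underlying iterator once total_limit rows have been yielded, a lazy page generator is not asked for further pages.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Optional


def iter_rows(pages: Iterable[List[dict]], total_limit: Optional[int] = None) -> Iterator[dict]:
    """Hypothetical helper: yield rows one by one from successive pages."""
    rows = (row for page in pages for row in page)
    # islice(rows, None) yields every row; otherwise it stops after total_limit rows.
    return islice(rows, total_limit)
```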
- datastore_search_fields_type_dict(resource_id: str, *, filters: dict = None, q: str = None, distinct: bool = None, fields: List[str] = None, request_missing: bool = True, error_not_mapped: bool = False, error_not_found: bool = True) OrderedDict
- datastore_search_find_one(resource_id: str, *, filters: dict = None, q: str = None, distinct: bool = None, fields: List[str] = None, offset: int = 0, return_df: bool = True) DataFrame | ListRecords | Any | List[CkanActionResponse]
Request the first result of a query.
- Parameters:
resource_id – resource id
- Returns:
- datastore_search_page_generator(resource_id: str, *, filters: dict = None, q: str = None, fields: List[str] = None, distinct: bool = None, sort: str = None, limit: int = None, offset: int = 0, requests_limit: int = None, progress_callback: CkanProgressCallbackABC = None, params: dict = None, search_all: bool = True, search_method: bool = True, format: str = None, bom: bool = None, return_df: bool = True) Generator[DataFrame, Any, None] | Generator[CkanActionResponse, Any, None] | Generator[Response, Any, None]
Generator variant of datastore_search, yielding one page per request. Uses the API datastore_search
- Parameters:
resource_id – resource id.
filters – The base argument to filter values in a table (optional)
q – Full text query (optional)
fields – The base argument to filter columns (optional)
distinct – return only distinct rows (optional, default: False), e.g. to return distinct ids: fields="id", distinct=True
sort – Argument to sort results, e.g. sort="index, quantity desc" or sort="index asc"
limit – Limit the number of records per request
offset – Offset in the returned records
requests_limit – Limit the number of requests
progress_callback – Progress callback function
params – Additional parameters such as filters, q, sort and fields can be given. See DataStore API documentation.
search_all – Option to renew the request until there are no more records.
search_method – API method selection (True=datastore_search, False=datastore_dump)
return_df – Return pandas DataFrame (True) or dict (False)
- Returns:
- datastore_search_row_count(resource_id: str, *, filters: dict = None, q: str = None, distinct: bool = None, fields: List[str] = None) int
Request the number of rows in a DataStore
- Parameters:
resource_id – resource id
- Returns:
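A common way to count rows is a datastore_search request with limit=0: CKAN's response still reports the number of matching rows in result["total"] while returning no records. The sketch below only parses such a JSON body; whether datastore_search_row_count issues exactly this request is an assumption, and row_count_from_response is a hypothetical helper.

```python
import json


def row_count_from_response(body: str) -> int:
    """Hypothetical helper: extract the row count from a datastore_search JSON response."""
    payload = json.loads(body)
    if not payload.get("success"):
        # CKAN action responses carry an "error" object when success is false.
        raise RuntimeError(f"CKAN API error: {payload.get('error')}")
    # datastore_search reports the number of matching rows in result["total"].
    return int(payload["result"]["total"])
```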
- datastore_search_sql(sql: str, *, params: dict = None, search_all: bool = False, limit: int = None, offset: int = 0, requests_limit: int = None, progress_callback: CkanProgressCallbackABC = None, return_df: bool = True) DataFrame | Tuple[ListRecords, dict]
Preferred entry-point for a DataStore SQL request. NB: This action is not available when ckan.datastore.sqlsearch.enabled is set to false.
- See:
_api_datastore_search_sql()
- Parameters:
sql – SQL query, e.g. f'SELECT * FROM "{resource_id}" WHERE "USER_ID" < 0'
limit – Limit the number of records per request
offset – Offset in the returned records
requests_limit – Limit the number of requests
progress_callback – Progress callback function
params – N/A
search_all – Option to renew the request until there are no more records.
return_df – Return pandas DataFrame (True) or dict (False)
- Returns:
- datastore_search_sql_cursor(sql: str, *, params: dict = None, search_all: bool = True, limit: int = None, offset: int = 0, total_limit: int = None, requests_limit: int = None, progress_callback: CkanProgressCallbackABC = None, return_df: bool = False) Generator[Series | dict, Any, None]
Preferred entry-point for a DataStore SQL request, to iterate over records.
NB: This action is not available when ckan.datastore.sqlsearch.enabled is set to false.
- See:
_api_datastore_search_sql()
- Parameters:
sql – SQL query, e.g. f'SELECT * FROM "{resource_id}" WHERE "USER_ID" < 0'
limit – Limit the number of records per request
offset – Offset in the returned records
total_limit – Limit the number of records to return
requests_limit – Limit the number of requests
progress_callback – Progress callback function
params – N/A
search_all – Option to renew the request until there are no more records.
return_df – Return pandas Series (True) or dict (False)
- Returns:
- datastore_search_sql_fields_type_dict(sql: str, *, params: dict = None) OrderedDict
- datastore_search_sql_find_one(sql: str, *, params: dict = None, offset: int = 0, return_df: bool = True) DataFrame | Tuple[ListRecords, dict]
Request the first result of an SQL query.
- datastore_search_sql_page_generator(sql: str, *, params: dict = None, search_all: bool = True, limit: int = None, offset: int = 0, requests_limit: int = None, progress_callback: CkanProgressCallbackABC = None, return_df: bool = True) Generator[DataFrame, Any, None] | Generator[CkanActionResponse, Any, None]
Preferred entry-point for a DataStore SQL request, yielding one page per request.
NB: This action is not available when ckan.datastore.sqlsearch.enabled is set to false.
- See:
_api_datastore_search_sql()
- Parameters:
sql – SQL query, e.g. f'SELECT * FROM "{resource_id}" WHERE "USER_ID" < 0'
limit – Limit the number of records per request
offset – Offset in the returned records
requests_limit – Limit the number of requests
progress_callback – Progress callback function
params – N/A
search_all – Option to renew the request until there are no more records.
return_df – Return pandas DataFrame (True) or dict (False)
- Returns:
- static from_dict_df_args(fields_type_dict: OrderedDict) dict
- list_datastore_aliases() List[CkanAliasInfo]
- map_file_resource_sizes(*, cancel_if_present: bool = True, progress_callback: CkanProgressCallbackABC = None) None
- map_resources(package_list: str | List[str] = None, *, params: dict = None, datastore_info: bool = None, resource_view_list: bool = None, organization_info: bool = None, license_list: bool = None, only_missing: bool = True, error_not_found: bool = True, owner_org: str = None, progress_callback: CkanProgressCallbackABC = None) CkanMap
Map the resources of a given package to obtain resource IDs associated with the package name and its resources.
- Parameters:
package_list – List of packages to request. If not provided, the result of package_search is used.
params – Additional parameters to pass to all API calls (not recommended).
datastore_info – If True, enables the request of the API datastore_info to return information about DataStore fields, aliases, and row count. Required to search a DataStore by alias.
resource_view_list – If True, enables the request of the view_list API for each resource.
organization_info – If True, enables the request of the organization_list API before other requests.
license_list – If True, enables the request of the license_list API.
only_missing – If True, skips requesting already-mapped packages.
error_not_found – If True, packages not found by the API are ignored (no error is raised).
owner_org – Filters packages by a specific organization (only if package_search is used).
- Returns:
A mapping of resources for the specified package(s).
Note
Packages were referred to as DataSets in earlier CKAN implementations.
A single name can be shared across multiple resources within a package. In such cases, the first occurrence is used as a reference, and a warning is issued.
- static read_fields_df_args(fields_type_dict: OrderedDict) dict
- static read_fields_type_dict(fields_list_dict: List[dict]) OrderedDict
- resource_download(resource_id: str, *, method: str = None, proxies: dict = None, headers: dict = None, auth: AuthBase | Tuple[str, str] = None, verify: bool | str | None = None, stream: bool = False) Tuple[CkanResourceInfo, Response | None]
Uses the link provided in resource_show to download a resource.
- Parameters:
resource_id – resource id
- Returns:
- resource_download_df(resource_id: str, *, method: str = None, proxies: dict = None, headers: dict = None, auth: AuthBase | Tuple[str, str] = None, verify: bool | str | None = None) Tuple[CkanResourceInfo, DataFrame | None]
Uses the link provided in resource_show to download a resource and interprets it as a DataFrame.
- Parameters:
resource_id – resource id
- Returns:
- resource_download_test_head(resource_id: str, *, raise_error: bool = False, proxies: dict = None, headers: dict = None, auth: AuthBase | Tuple[str, str] = None, verify: bool | str | None = None) None | ContextErrorLevelMessage
This sends a HEAD request to the resource download URL using the CKAN connection parameters via resource_download. The resource is not downloaded, but the response headers indicate whether the URL is valid.
- Returns:
None if successful
- class ckanapi_harvesters.ckan_api.ckan_api_2_readonly.CkanApiReadOnlyParams(*, proxies: str | dict | ProxyConfig = None, ckan_headers: dict = None, http_headers: dict = None)
Bases:
CkanApiParamsBasic
- copy(new_identifier: str = None, *, dest=None)
- default_df_download_id_field_treatment: CkanIdFieldTreatment = 1
ckanapi_harvesters.ckan_api.ckan_api_3_policy module
- class ckanapi_harvesters.ckan_api.ckan_api_3_policy.CkanApiPolicy(url: str = None, *, proxies: str | dict | ProxyConfig = None, apikey: str | CkanApiKey = None, apikey_file: str = None, owner_org: str = None, params: CkanApiPolicyParams = None, map: CkanMap = None, policy: CkanPackageDataFormatPolicy = None, policy_file: str = None, identifier=None)
Bases:
CkanApiReadOnly
- __init__(url: str = None, *, proxies: str | dict | ProxyConfig = None, apikey: str | CkanApiKey = None, apikey_file: str = None, owner_org: str = None, params: CkanApiPolicyParams = None, map: CkanMap = None, policy: CkanPackageDataFormatPolicy = None, policy_file: str = None, identifier=None)
CKAN Database API interface to CKAN server with helper functions using pandas DataFrames.
- Parameters:
url – url of the CKAN server
proxies – proxies to use for requests
apikey – way to provide the API key directly (optional)
apikey_file – path to a file containing a valid API key in the first line of text (optional)
policy – data format policy to use with policy_check function
policy_file – path to a JSON file containing the data format policy to use with policy_check function
owner_org – name of the organization to limit package_search (optional)
params – other connection/behavior parameters
map – map of known resources
identifier – identifier of the ckan client
- copy(new_identifier: str = None, *, dest=None)
Returns a copy of the current instance. This is useful in a multithreaded context, where each thread should work with its own copy of an initialized CKAN object. It is recommended to purge the last response before making a copy (with purge_map=False).
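The one-copy-per-thread pattern described above can be sketched with the standard library. DemoClient is a hypothetical stand-in whose copy() uses copy.deepcopy; the real copy() method may share or recreate resources such as HTTP sessions differently, so this only illustrates the threading pattern, not the library's implementation.

```python
import copy
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List


class DemoClient:
    """Hypothetical stand-in for a CKAN client object with mutable per-instance state."""

    def __init__(self, url: str) -> None:
        self.url = url
        self.last_response = None  # overwritten by every request

    def copy(self) -> "DemoClient":
        return copy.deepcopy(self)

    def fetch(self, resource_id: str) -> str:
        self.last_response = resource_id  # simulated request result
        return f"{self.url}/{resource_id}"


_local = threading.local()


def thread_client(base: DemoClient) -> DemoClient:
    # Lazily create one copy per worker thread; the base object is only read.
    if not hasattr(_local, "client"):
        _local.client = base.copy()
    return _local.client


def fetch_all_parallel(base: DemoClient, resource_ids: List[str]) -> List[str]:
    with ThreadPoolExecutor(max_workers=4) as pool:
        # pool.map preserves the order of resource_ids in the results.
        return list(pool.map(lambda rid: thread_client(base).fetch(rid), resource_ids))
```

Since the base instance is never used for requests, its state (here last_response) is left untouched by the worker threads.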
- load_default_policy(*, error_not_found: bool = False, load_error: bool = True, cancel_if_present: bool = False, force: bool = False) CkanPackageDataFormatPolicy | None
Load the default data format policy from the CKAN server. The default policy is defined in ckan_configuration.
- Parameters:
error_not_found
cancel_if_present
force
- Returns:
- load_policy(policy_file: str, base_dir: str = None, proxies: dict = None, headers: dict = None, error_not_found: bool = True, load_error: bool = True) CkanPackageDataFormatPolicy
Load the CKAN data format policy from file (JSON format).
- Parameters:
policy_file – path to the policy file
base_dir – base directory, used if policy_file is a relative path
- Returns:
- map_resources(package_list: str | List[str] = None, *, params: dict = None, datastore_info: bool = None, resource_view_list: bool = None, organization_info: bool = None, license_list: bool = None, only_missing: bool = True, error_not_found: bool = True, owner_org: str = None, load_policy: bool = None, progress_callback: CkanProgressCallbackABC = None) CkanMap
Map the resources of the given package(s) to obtain the resource IDs associated with each package name and its resources.
- Parameters:
package_list – List of packages to request. If not provided, the result of package_search is used.
params – Additional parameters to pass to all API calls (not recommended).
datastore_info – If True, enables the request of the API datastore_info to return information about DataStore fields, aliases, and row count. Required to search a DataStore by alias.
resource_view_list – If True, enables the request of the view_list API for each resource.
organization_info – If True, enables the request of the organization_list API before other requests.
license_list – If True, enables the request of the license_list API.
only_missing – If True, skips requesting already-mapped packages.
error_not_found – If True, an error is raised for packages not found by the API; if False, they are ignored.
owner_org – Filters packages by a specific organization (only if package_search is used).
- Returns:
A mapping of resources for the specified package(s).
Note
Packages were referred to as datasets in earlier CKAN implementations.
A single name can be shared across multiple resources within a package. In such cases, the first occurrence is used as a reference, and a warning is issued.
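The first-occurrence rule for duplicate resource names can be sketched as follows. map_resource_names is a hypothetical helper operating on plain dictionaries with "name" and "id" keys, not the internal CkanMap structure.

```python
import warnings
from typing import Dict, List


def map_resource_names(resources: List[dict]) -> Dict[str, str]:
    """Map resource names to resource IDs, keeping the first occurrence of each name."""
    name_to_id: Dict[str, str] = {}
    for res in resources:
        name, res_id = res["name"], res["id"]
        if name in name_to_id:
            # Later duplicates are ignored, but the user is warned.
            warnings.warn(f"Resource name {name!r} is used more than once; keeping ID {name_to_id[name]}")
            continue
        name_to_id[name] = res_id
    return name_to_id
```

Resources sharing a name therefore remain reachable only by ID.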
- policy_check(package_list: str | List[str] = None, policy: CkanPackageDataFormatPolicy = None, *, buffer: Dict[str, List[DataPolicyError]] = None, raise_error: bool = False, verbose: bool = None, auto_update: bool = None, progress_callback: CkanProgressCallbackABC = None) bool
Enforce the data format policy on the mapped packages.
- Parameters:
policy
- Returns:
- query_default_policy(*, error_not_found: bool = False, load_error: bool = True) CkanPackageDataFormatPolicy | None
Download the default policy and return it without loading it into the policy attribute.
- Parameters:
error_not_found
- Returns:
- set_default_map_mode(datastore_info: bool = None, resource_view_list: bool = None, organization_info: bool = None, license_list: bool = None, load_policy: bool = None) None
Set up the optional queries orchestrated by the map_resources function.
- Parameters:
datastore_info
resource_view_list
organization_info
license_list
- Returns:
- class ckanapi_harvesters.ckan_api.ckan_api_3_policy.CkanApiPolicyParams(*, proxies: str | dict | ProxyConfig = None, ckan_headers: dict = None, http_headers: dict = None)
Bases:
CkanApiReadOnlyParams
- copy(new_identifier: str = None, *, dest=None)
ckanapi_harvesters.ckan_api.ckan_api_4_readwrite module
- class ckanapi_harvesters.ckan_api.ckan_api_4_readwrite.CkanApiReadWrite(url: str = None, *, proxies: str | dict | ProxyConfig = None, apikey: str | CkanApiKey = None, apikey_file: str = None, owner_org: str = None, params: CkanApiPolicyParams = None, map: CkanMap = None, policy: CkanPackageDataFormatPolicy = None, policy_file: str = None, data_cleaner_upload: CkanDataCleanerABC = None, identifier=None)
Bases:
CkanApiPolicy
CKAN Database API interface to CKAN server with helper functions using pandas DataFrames. This class implements requests to write data to the CKAN server resources / DataStores.
- __init__(url: str = None, *, proxies: str | dict | ProxyConfig = None, apikey: str | CkanApiKey = None, apikey_file: str = None, owner_org: str = None, params: CkanApiPolicyParams = None, map: CkanMap = None, policy: CkanPackageDataFormatPolicy = None, policy_file: str = None, data_cleaner_upload: CkanDataCleanerABC = None, identifier=None)
CKAN Database API interface to CKAN server with helper functions using pandas DataFrames.
- Parameters:
url – url of the CKAN server
proxies – proxies to use for requests
apikey – way to provide the API key directly (optional)
apikey_file – path to a file containing a valid API key in the first line of text (optional)
owner_org – name of the organization to which package_search is limited (optional)
params – other connection/behavior parameters
map – map of known resources
policy – data format policy to be used with the policy_check function
policy_file – path to a JSON file containing the data format policy to load for use with the policy_check function
data_cleaner_upload – data cleaner object to use before uploading to a CKAN DataStore.
identifier – identifier of the ckan client
- _api_datapusher_submit(resource_id: str, *, params: dict = None) bool
Call to API action datapusher_submit. This triggers the normally asynchronous DataPusher service for a given resource.
- Parameters:
resource_id – resource id
params
- Returns:
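Calls such as datapusher_submit follow the CKAN Action API convention: a POST to {url}/api/3/action/{action} with a JSON body, the API token being sent in the Authorization header. The sketch below only builds such a request with urllib; build_action_request is a hypothetical helper and the URL and token are placeholders.

```python
import json
import urllib.request


def build_action_request(base_url: str, action: str, payload: dict,
                         apikey: str = None) -> urllib.request.Request:
    """Build (without sending) a POST request to the CKAN Action API."""
    url = f"{base_url.rstrip('/')}/api/3/action/{action}"
    headers = {"Content-Type": "application/json"}
    if apikey:
        # CKAN reads the API token from the Authorization header.
        headers["Authorization"] = apikey
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


req = build_action_request("https://ckan.example.org/", "datapusher_submit",
                           {"resource_id": "my-resource-id"}, apikey="my-token")
# urllib.request.urlopen(req) would send it; CKAN answers with JSON containing "success" and "result".
```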
- _api_datastore_upsert_raw(records: dict | List[dict] | DataFrame, resource_id: str, *, method: UpsertChoice | str, params: dict = None, force: bool = None, dry_run: bool = False, last_insertion: bool = True) CkanActionResponse
API call to datastore_upsert.
- Parameters:
records – records, preferably in a pandas DataFrame - they will be converted to a list of dictionaries.
resource_id – destination resource id
method – see UpsertChoice (insert, update or upsert)
force – set to True to edit a read-only resource. If not provided, this is overridden by self.default_force
params – additional parameters
dry_run – set to True to abort transaction instead of committing, e.g. to check for validation or type errors
last_insertion – trigger for calculate_record_count (CKAN doc: updates the stored count of records, used to optimize datastore_search in combination with the total_estimation_threshold parameter. If doing a series of requests to change a resource, you only need to set this to True on the last request.)
- Returns:
the inserted records as a pandas DataFrame, from the server response
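As a sketch of what such a request carries, the body below uses the parameter names of the CKAN datastore_upsert action (resource_id, records, method, force, dry_run, calculate_record_count). build_upsert_payload is hypothetical; a DataFrame would first be converted with DataFrame.to_dict(orient="records").

```python
from typing import List

VALID_METHODS = {"insert", "update", "upsert"}


def build_upsert_payload(records: List[dict], resource_id: str, *, method: str = "upsert",
                         force: bool = False, dry_run: bool = False,
                         last_insertion: bool = True) -> dict:
    """Assemble the JSON body for a CKAN datastore_upsert call (sketch)."""
    method = method.lower()
    if method not in VALID_METHODS:
        raise ValueError(f"method must be one of {sorted(VALID_METHODS)}, got {method!r}")
    return {
        "resource_id": resource_id,
        "records": records,
        "method": method,
        "force": force,
        "dry_run": dry_run,
        # Refresh the stored row count only on the last request of a series.
        "calculate_record_count": last_insertion,
    }
```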
- _api_resource_patch(resource_id: str, *, name: str = None, format: str = None, description: str = None, title: str = None, state: CkanState = None, df: DataFrame = None, file_path: str = None, url: str = None, files=None, payload: bytes | BufferedIOBase = None, payload_name: str = None, params: dict = None) CkanResourceInfo
Call to the resource_patch API. This call can be used to change the resource parameters via params (cf. API documentation) or to re-upload the resource file into the FileStore. The latter action replaces the current resource; if it has a DataStore, the DataStore is reset to the new contents of the file. The file can be transmitted as a URL, a file path or a pandas DataFrame. The files argument is passed through to the requests.post function. A call to datapusher_submit() may be required for the newly uploaded file to be taken into account immediately.
- See:
_api_resource_create
- See:
resource_create
- Parameters:
resource_id – resource id
url – URL of the replacement resource
params – parameters such as name, format, resource_type can be changed
For file uploads, the following parameters are used in order of priority (see upload_prepare_requests_files_arg for an example of formatting):
- Parameters:
files – files pass through argument to the requests.post function. Use to send other data formats.
payload – bytes to upload as a file
payload_name – name of the payload to use (associated with the payload argument) - this determines the format recognized in CKAN viewers.
file_path – path of the file to transmit (binary and text files are supported here)
df – pandas DataFrame to replace resource
- Returns:
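Assuming the upload arguments are listed in their order of priority (files, then payload with payload_name, then file_path, then df), the selection can be sketched as below. pick_upload_source is hypothetical and does not reproduce upload_prepare_requests_files_arg; df is treated as an opaque object.

```python
from typing import Any, Optional, Tuple


def pick_upload_source(*, files: Any = None, payload: Optional[bytes] = None,
                       payload_name: Optional[str] = None, file_path: Optional[str] = None,
                       df: Any = None) -> Tuple[str, Any]:
    """Return (source_kind, value) for the highest-priority upload argument provided."""
    if files is not None:
        return "files", files
    if payload is not None:
        # payload_name sets the file name, hence the format detected by CKAN viewers.
        return "payload", (payload_name, payload)
    if file_path is not None:
        return "file_path", file_path
    if df is not None:
        return "df", df
    return "none", None
```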
- copy(new_identifier: str = None, *, dest=None)
Returns a copy of the current instance. Useful for using an initialized CKAN object in a multithreaded context, where each thread works on its own copy. It is recommended to purge the last response before copying (with purge_map=False).
- datastore_insert(records: dict | List[dict] | DataFrame, resource_id: str, *, dry_run: bool = False, limit: int = None, offset: int = 0, apply_last_condition: bool = True, always_last_condition: bool = None, data_cleaner: CkanDataCleanerABC = None, force: bool = None, params: dict = None) DataFrame
Alias function to insert data into a DataStore using datastore_upsert.
- See:
_api_datastore_upsert()
- Parameters:
records – records, preferably in a pandas DataFrame - they will be converted to a list of dictionaries.
resource_id – destination resource id
force – set to True to edit a read-only resource. If not provided, this is overridden by self.default_force
params – additional parameters
dry_run – set to True to abort transaction instead of committing, e.g. to check for validation or type errors
- Returns:
the inserted records as a pandas DataFrame, from the server response
- datastore_submit(resource_id: str, *, apply_delay: bool = True, error_timeout: bool = True, params: dict = None) bool
Submit the resource file to re-initialize the DataStore, using the preferred method (currently datapusher_submit). This encapsulation includes a call to datastore_wait.
- Parameters:
resource_id
apply_delay – keep True to wait until the DataStore is ready (a datastore_search query is performed as a test)
params
- Returns:
- datastore_update(records: dict | List[dict] | DataFrame, resource_id: str, *, dry_run: bool = False, limit: int = None, offset: int = 0, apply_last_condition: bool = True, always_last_condition: bool = None, data_cleaner: CkanDataCleanerABC = None, force: bool = None, params: dict = None) DataFrame
Alias function to update data in a DataStore using datastore_upsert. The update is based on the DataStore primary keys.
- See:
_api_datastore_upsert()
- Parameters:
records – records, preferably in a pandas DataFrame - they will be converted to a list of dictionaries.
resource_id – destination resource id
force – set to True to edit a read-only resource. If not provided, this is overridden by self.default_force
params – additional parameters
dry_run – set to True to abort transaction instead of committing, e.g. to check for validation or type errors
- Returns:
the inserted records as a pandas DataFrame, from the server response
- datastore_upsert(records: dict | List[dict] | DataFrame, resource_id: str, *, dry_run: bool = False, limit: int = None, offset: int = 0, force: bool = None, method: UpsertChoice | str = UpsertChoice.Upsert, apply_last_condition: bool = True, always_last_condition: bool = None, return_df: bool = None, data_cleaner: CkanDataCleanerABC = None, progress_callback: CkanProgressCallbackABC = None, params: dict = None) DataFrame | List[dict]
Encapsulation of _api_datastore_upsert that splits the records into requests of at most limit rows each.
- See:
_api_datastore_upsert()
- Parameters:
records – records, preferably in a pandas DataFrame - they will be converted to a list of dictionaries.
resource_id – destination resource id
method – by default, set to Upsert
force – set to True to edit a read-only resource. If not provided, this is overridden by self.default_force
limit – number of records per transaction
offset – number of records to skip - use to restart the transfer
params – additional parameters
dry_run – set to True to abort transaction instead of committing, e.g. to check for validation or type errors
apply_last_condition – if True, the last upsert request applies the last insert operations (calculate_record_count and force_indexing).
always_last_condition – if True, each request applies the last insert operations - default is False
return_df – if True, return a pandas DataFrame or else, a list of dictionaries.
data_cleaner – data cleaner instance. A data cleaner detects and changes invalid values before upload.
progress_callback – progress callback function
- Returns:
the inserted records from the server response, as a pandas DataFrame or a list of dictionaries depending on return_df
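The splitting by limit and offset, and the rule that last-insertion operations run only on the final request unless always_last_condition is set, can be sketched as a planning function. plan_upsert_chunks is hypothetical and sends nothing; it only yields the chunk and the flag each request would use.

```python
from typing import Iterator, List, Tuple


def plan_upsert_chunks(records: List[dict], *, limit: int, offset: int = 0,
                       apply_last_condition: bool = True,
                       always_last_condition: bool = False) -> Iterator[Tuple[List[dict], bool]]:
    """Yield (chunk, apply_last_ops) pairs, one per datastore_upsert request."""
    if limit <= 0:
        raise ValueError("limit must be a positive integer")
    starts = range(offset, len(records), limit)  # offset skips already-sent records
    last_start = starts[-1] if len(starts) > 0 else None
    for start in starts:
        chunk = records[start:start + limit]
        is_last = start == last_start
        # calculate_record_count / force_indexing on the final request only, unless forced on each.
        yield chunk, always_last_condition or (apply_last_condition and is_last)
```

Restarting an interrupted transfer then amounts to calling it again with offset set to the number of records already sent.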
- datastore_upsert_auto(records_generator: DataFrame | List[dict] | Generator[ListRecords | DataFrame, None, None], resource_id: str, *, dry_run: bool = False, limit: int = None, offset: int = 0, request_threshold: int = None, force: bool = None, method: UpsertChoice | str = UpsertChoice.Upsert, apply_last_condition: bool = True, always_last_condition: bool = None, return_df: bool = None, data_cleaner: CkanDataCleanerABC = None, progress_callback: CkanProgressCallbackABC = None, params: dict = None) int
Version of datastore_upsert accepting generators or DataFrames. The function called depends on the type of the records_generator argument.
- See:
datastore_upsert_generator(), datastore_upsert()
- Parameters:
records_generator – generator of records, e.g. chunks from a CSV file generated with pandas.read_csv(.., chunksize=1000)
resource_id – destination resource id
method – by default, set to Upsert
force – set to True to edit a read-only resource. If not provided, this is overridden by self.default_force
limit – number of records per transaction
offset – number of records to skip - use to restart the transfer
request_threshold – number of records to accumulate before sending a request
params – additional parameters
dry_run – set to True to abort transaction instead of committing, e.g. to check for validation or type errors
apply_last_condition – if True, the last upsert request applies the last insert operations (calculate_record_count and force_indexing).
always_last_condition – if True, each request applies the last insert operations - default is False
return_df – if True, return a pandas DataFrame or else, a list of dictionaries.
data_cleaner – data cleaner instance. A data cleaner detects and changes invalid values before upload.
progress_callback – progress callback function
- Returns:
the number of records inserted
- datastore_upsert_generator(records_generator: Generator[ListRecords | DataFrame, None, None], resource_id: str, *, dry_run: bool = False, limit: int = None, offset: int = 0, request_threshold: int = None, force: bool = None, method: UpsertChoice | str = UpsertChoice.Upsert, apply_last_condition: bool = True, always_last_condition: bool = None, return_df: bool = None, data_cleaner: CkanDataCleanerABC = None, progress_callback: CkanProgressCallbackABC = None, params: dict = None) int
Encapsulation of datastore_upsert that sends the rows in the chunks provided by records_generator.
- Parameters:
records_generator – generator of records, e.g. chunks from a CSV file generated with pandas.read_csv(.., chunksize=1000)
resource_id – destination resource id
method – by default, set to Upsert
force – set to True to edit a read-only resource. If not provided, this is overridden by self.default_force
limit – number of records per transaction
offset – number of records to skip - use to restart the transfer
request_threshold – number of records to accumulate before sending a request
params – additional parameters
dry_run – set to True to abort transaction instead of committing, e.g. to check for validation or type errors
apply_last_condition – if True, the last upsert request applies the last insert operations (calculate_record_count and force_indexing).
always_last_condition – if True, each request applies the last insert operations - default is False
return_df – if True, return a pandas DataFrame or else, a list of dictionaries.
data_cleaner – data cleaner instance. A data cleaner detects and changes invalid values before upload.
progress_callback – progress callback function
- Returns:
the number of records inserted
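The role of request_threshold can be sketched as regrouping small incoming chunks into larger batches before each request. regroup_chunks is hypothetical and works on lists of dictionaries; chunks from pandas.read_csv(..., chunksize=...) would be DataFrames and need converting first.

```python
from typing import Iterable, Iterator, List


def regroup_chunks(chunks: Iterable[List[dict]], request_threshold: int) -> Iterator[List[dict]]:
    """Accumulate incoming chunks; emit a batch once it holds at least request_threshold records."""
    buffer: List[dict] = []
    for chunk in chunks:
        buffer.extend(chunk)
        if len(buffer) >= request_threshold:
            yield buffer
            buffer = []
    if buffer:
        # Flush the remainder; this final batch is where last-insertion operations would apply.
        yield buffer
```

Each emitted batch could then go through datastore_upsert, which splits it again by limit.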
- datastore_upsert_last_line(resource_id: str)
Apply the last-insertion treatments (calculate_record_count and force_indexing) to a resource.
- datastore_wait(resource_id: str, *, apply_delay: bool = True, error_timeout: bool = True) Tuple[int, float]
Wait until a DataStore has at least one row. The delay between the requests checking for the presence of the DataStore is given by the class attribute submit_delay. If the loop exceeds submit_timeout, an exception is raised.
- Parameters:
resource_id
apply_delay
error_timeout – option to raise an exception in case of timeout
- Returns:
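The waiting loop can be sketched as a poll with a delay and a timeout. wait_for_rows and its count_rows callback are hypothetical (count_rows could, for instance, read the "total" field of a datastore_search result), and returning (row_count, elapsed_seconds) is an assumption, since the meaning of the documented Tuple[int, float] is not specified here.

```python
import time
from typing import Callable, Tuple


def wait_for_rows(count_rows: Callable[[], int], *, submit_timeout: float, submit_delay: float,
                  error_timeout: bool = True) -> Tuple[int, float]:
    """Poll count_rows() until it reports at least one row; return (row_count, elapsed_seconds)."""
    start = time.monotonic()
    while True:
        n = count_rows()
        elapsed = time.monotonic() - start
        if n > 0:
            return n, elapsed
        if elapsed > submit_timeout:
            if error_timeout:
                raise TimeoutError(f"DataStore still empty after {elapsed:.1f} s")
            return n, elapsed
        time.sleep(submit_delay)  # pause between checks
```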
- full_unlock(unlock: bool = True, *, no_ca: bool = None, external_url_resource_download: bool = None) None
Unlock the full capabilities of the CKAN API.
- Parameters:
unlock
- Returns:
- resource_patch(resource_id: str, *, name: str = None, format: str = None, description: str = None, title: str = None, state: CkanState = None, df: DataFrame = None, file_path: str = None, url: str = None, files=None, payload: bytes | BufferedIOBase = None, payload_name: str = None, params: dict = None) CkanResourceInfo
- set_limits(limit_read: int | None, limit_write: int = None) None
Set default query limits. If only one argument is provided, it applies to both limits.
- Parameters:
limit_read – default limit for read requests
limit_write – default limit for upsert (write) requests
- Returns:
- set_submit_timeout(submit_timeout: float, submit_delay: float = None) None
Set the timeout for the datastore_wait method, which is called by datastore_submit.
- Parameters:
submit_timeout – timeout after which a TimeoutError is raised (seconds)
submit_delay – delay between the requests checking DataStore initialization in datastore_wait (seconds)
- Returns:
- class ckanapi_harvesters.ckan_api.ckan_api_4_readwrite.CkanApiReadWriteParams(*, proxies: str | dict | ProxyConfig = None, ckan_headers: dict = None, http_headers: dict = None)
Bases:
CkanApiPolicyParams
- copy(new_identifier: str = None, *, dest=None)
ckanapi_harvesters.ckan_api.ckan_api_5_manage module
- class ckanapi_harvesters.ckan_api.ckan_api_5_manage.CkanApiExtendedParams(*, proxies: str | dict | ProxyConfig = None, ckan_headers: dict = None, http_headers: dict = None)
Bases:
CkanApiManageParams
- copy(new_identifier: str = None, *, dest=None)
- class ckanapi_harvesters.ckan_api.ckan_api_5_manage.CkanApiManage(url: str = None, *, proxies: str | dict | ProxyConfig = None, apikey: str | CkanApiKey = None, apikey_file: str = None, owner_org: str = None, params: CkanApiExtendedParams = None, map: CkanMap = None, policy: CkanPackageDataFormatPolicy = None, policy_file: str = None, data_cleaner_upload: CkanDataCleanerABC = None, identifier=None)
Bases:
CkanApiReadWrite
CKAN Database API interface to CKAN server with helper functions using pandas DataFrames. This class implements more advanced requests to manage packages, resources and DataStores on the CKAN server.
- __init__(url: str = None, *, proxies: str | dict | ProxyConfig = None, apikey: str | CkanApiKey = None, apikey_file: str = None, owner_org: str = None, params: CkanApiExtendedParams = None, map: CkanMap = None, policy: CkanPackageDataFormatPolicy = None, policy_file: str = None, data_cleaner_upload: CkanDataCleanerABC = None, identifier=None)
CKAN Database API interface to CKAN server with helper functions using pandas DataFrames.
- Parameters:
url – url of the CKAN server
proxies – proxies to use for requests
apikey – way to provide the API key directly (optional)
apikey_file – path to a file containing a valid API key in the first line of text (optional)
owner_org – name of the organization to which package_search is limited (optional)
params – other connection/behavior parameters
map – map of known resources
policy – data format policy to be used with the policy_check function
policy_file – path to a JSON file containing the data format policy to load for use with the policy_check function
data_cleaner_upload – data cleaner object to use before uploading to a CKAN DataStore.
identifier – identifier of the ckan client
- _api_dataset_purge(package_id: str, *, params: dict = None) dict
API call to dataset_purge. This fully removes the package. This action is not reversible. It requires an admin account.
- Parameters:
package_id
params
- Returns:
- _api_datastore_create(resource_id: str, *, records: dict | List[dict] | DataFrame = None, fields: List[dict | CkanField] = None, primary_key: str | List[str] = None, indexes: str | List[str] = None, aliases: str | List[str] = None, params: dict = None, force: bool = None) dict
API call to datastore_create. This endpoint also supports altering tables, aliases and indexes and bulk insertion.
- Parameters:
resource_id – resource id
records
fields
primary_key
indexes
params
force
- Returns:
- _api_datastore_delete(resource_id: str, *, params: dict = None, force: bool = None) dict
Delete rows of a DataStore using the datastore_delete API. If no filter is given, the whole DataStore table is erased. This function is private and should not be called directly.
- Parameters:
resource_id
params
force – set to True to edit a read-only resource. If not provided, this is overridden by self.default_force
- Returns:
- _api_package_create(name: str, private: bool, *, title: str = None, notes: str = None, owner_org: str = None, state: CkanState | str = None, license_id: str = None, tags: List[str] = None, tags_list_dict: List[Dict[str, str]] = None, url: str = None, version: str = None, custom_fields: dict = None, author: str = None, author_email: str = None, maintainer: str = None, maintainer_email: str = None, params: dict = None) CkanPackageInfo
API call to package_create.
- Parameters:
name
private
title
notes
owner_org
state
license_id
tags
params
- Returns:
- _api_package_delete(package_id: str, *, params: dict = None) dict
API call to package_delete. This marks the package as deleted and does not remove data.
- Parameters:
package_id
params
- Returns:
- _api_package_patch(package_id: str, package_name: str = None, private: bool = None, *, title: str = None, notes: str = None, owner_org: str = None, state: CkanState | str = None, license_id: str = None, tags: List[str] = None, tags_list_dict: List[Dict[str, str]] = None, url: str = None, version: str = None, custom_fields_update: dict = None, custom_fields: dict = None, author: str = None, author_email: str = None, maintainer: str = None, maintainer_email: str = None, params: dict = None) CkanPackageInfo
API call to package_patch. Use it to change the properties of a package. This method is preferred over package_update, which requires resending the full package configuration. (API doc for package_update: It is recommended to call ckan.logic.action.get.package_show(), make the desired changes to the result, and then call package_update() with it.)
- Parameters:
package_id
package_name
private
title
notes
owner_org
state
license_id
params
- Returns:
- _api_package_resource_reorder(package_id: str, resource_ids: List[str], *, params: dict = None) dict
API call to package_resource_reorder. Reorders the resources within a package. If only some resource IDs are supplied, they are placed first and the other resources keep their original order.
- Parameters:
package_id – the id or name of the package to update
resource_ids – a list of resource ids in the order needed
params
- Returns:
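The partial-reorder rule can be sketched on plain lists of IDs. reorder_resources is hypothetical and computes the resulting order locally; rejecting unknown IDs is a choice of this sketch, not documented server behavior.

```python
from typing import List


def reorder_resources(current_ids: List[str], requested_ids: List[str]) -> List[str]:
    """Put requested_ids first (in the given order); other resources keep their original order."""
    unknown = [rid for rid in requested_ids if rid not in current_ids]
    if unknown:
        raise ValueError(f"Unknown resource ids: {unknown}")
    requested = set(requested_ids)
    return list(requested_ids) + [rid for rid in current_ids if rid not in requested]
```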
- _api_resource_create(package_id: str, name: str, *, format: str = None, description: str = None, state: CkanState = None, df: DataFrame = None, file_path: str = None, url: str = None, files=None, payload: bytes | BufferedIOBase = None, payload_name: str = None, params: dict = None) CkanResourceInfo
API call to resource_create.
- See:
_api_resource_patch
- See:
resource_create
- Parameters:
package_id
name
format
url – URL of the resource
params – additional parameters such as resource_type can be set
Note
For file uploads, the following parameters are used in order of priority (see upload_prepare_requests_files_arg for an example of formatting):
- Parameters:
files – files pass through argument to the requests.post function. Use to send other data formats.
payload – bytes to upload as a file
payload_name – name of the payload to use (associated with the payload argument) - this determines the format recognized in CKAN viewers.
file_path – path of the file to transmit (binary and text files are supported here)
df – pandas DataFrame to upload as the resource file
- Returns:
- _api_resource_delete(resource_id: str, *, params: dict = None, force: bool = None, bypass_admin: bool = False) dict
Delete a resource. This permanently removes the resource. Requires enable_admin=True.
- Parameters:
resource_id
params
force – set to True to edit a read-only resource. If not provided, this is overridden by self.default_force
bypass_admin – option to bypass check of enable_admin
- Returns:
- _api_resource_view_create(resource_id: str, title: str | List[str] = None, *, view_type: str | List[str] = None, params: dict = None) List[CkanViewInfo]
API call to resource_view_create.
title and view_type must have the same length if specified as lists.
- Parameters:
resource_id – resource id
title – title of the view
view_type – Type of view, typically recline_view for Data Explorer
params
- Returns:
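Pairing title and view_type into one (title, view_type) entry per view can be sketched as below. pair_views is hypothetical; broadcasting a single value against a list and defaulting to recline_view are assumptions of this sketch, while the equal-length check for two lists follows the rule above.

```python
from typing import List, Optional, Tuple, Union


def pair_views(title: Union[str, List[str], None], view_type: Union[str, List[str], None],
               default_view_type: str = "recline_view") -> List[Tuple[Optional[str], str]]:
    """Normalize title/view_type into (title, view_type) pairs, one per view to create."""
    titles = title if isinstance(title, list) else [title]
    types = view_type if isinstance(view_type, list) else [view_type]
    if len(titles) != len(types):
        if len(titles) == 1:
            titles = titles * len(types)  # assumption: broadcast a single title
        elif len(types) == 1:
            types = types * len(titles)  # assumption: broadcast a single view type
        else:
            raise ValueError("title and view_type must have the same length when both are lists")
    return [(t, vt if vt is not None else default_view_type) for t, vt in zip(titles, types)]
```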
- copy(new_identifier: str = None, *, dest=None)
Returns a copy of the current instance. Useful for using an initialized CKAN object in a multithreaded context, where each thread works on its own copy. It is recommended to purge the last response before copying (with purge_map=False).
- datastore_clear(resource_id: str, *, error_not_found: bool = True, params: dict = None, force: bool = None, bypass_admin: bool = False) dict | None
Clear the data in a DataStore using _api_datastore_delete. Requires enable_admin=True. This implementation adds the option error_not_found: if set to False, no error is raised when the resource is found but its DataStore is not.
- See:
_api_datastore_delete()
- Parameters:
resource_id
error_not_found – if False, does not raise an exception if the resource exists but has no DataStore
params
force – set to True to edit a read-only resource. If not provided, this is overridden by self.default_force
bypass_admin – option to bypass check of enable_admin
- Returns:
- datastore_create(resource_id: str, *, delete_previous: bool = False, bypass_admin: bool = False, records: dict | List[dict] | DataFrame = None, fields: List[dict | CkanField] = None, primary_key: str | List[str] = None, indexes: str | List[str] = None, aliases: str | List[str] = None, params: dict = None, force: bool = None, data_cleaner: CkanDataCleanerABC = None, inhibit_datastore_patch_indexes: bool = False, progress_callback: CkanProgressCallbackABC = None) dict
Encapsulation of the datastore_create API call. This function can optionally clear the DataStore before creating it.
- Parameters:
resource_id
delete_previous – option to delete the previous DataStore, if it exists (default: False)
records
fields
primary_key
indexes
params
force
inhibit_datastore_patch_indexes – option to ignore primary_key and indexes in case the DataStore already exists. In certain cases, running without this option can lead to impossible updates (recomputing indexes on large tables can be costly).
- Returns:
- datastore_default_alias(resource_name: str, package_name: str, *, query_names: bool = True, error_not_found: bool = True) str
- static datastore_default_alias_of_info(resource_info: CkanResourceInfo, package_info: CkanPackageInfo) str
- static datastore_default_alias_of_names(resource_name: str, package_name: str) str
- datastore_delete_rows(resource_id: str, filters: dict, *, params: dict = None, force: bool = None, calculate_record_count: bool = True) dict
Delete specific rows of a DataStore using _api_datastore_delete. The filters are mandatory here: without them, the whole DataStore would be erased. To erase all rows, prefer datastore_clear.
- See:
_api_datastore_delete()
- Parameters:
resource_id
filters
params
force – set to True to edit a read-only resource. If not provided, this is overridden by self.default_force
calculate_record_count
- Returns:
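The safeguard on filters can be sketched as a payload builder that refuses empty filters. build_delete_rows_payload is hypothetical; its keys follow the parameter names of the CKAN datastore_delete action, whose availability may depend on the server version.

```python
def build_delete_rows_payload(resource_id: str, filters: dict, *, force: bool = False,
                              calculate_record_count: bool = True) -> dict:
    """Assemble a datastore_delete body that deletes matching rows only, never the whole table."""
    if not filters:
        # Without filters, datastore_delete would erase the entire DataStore table.
        raise ValueError("filters must be a non-empty dict; use datastore_clear to empty a DataStore")
    return {
        "resource_id": resource_id,
        "filters": filters,
        "force": force,
        "calculate_record_count": calculate_record_count,
    }
```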
- static datastore_field_dict(fields: List[dict | CkanField] | OrderedDict[str, CkanField | dict] = None, fields_merge: List[dict | CkanField] | OrderedDict[str, CkanField | dict] = None, fields_update: List[dict | CkanField] | OrderedDict[str, CkanField | dict] = None, *, fields_type_override: Dict[str, str] = None, fields_description: Dict[str, str] = None, fields_label: Dict[str, str] = None, return_list: bool = False) Dict[str, CkanField] | List[dict]
Initialization of the fields parameter for datastore_create. Only parts used by this package are present. To complete the field’s dictionaries, refer to datastore_field_patch_dict.
- Parameters:
fields – first source of field information, usually the fields from the DataStore
fields_merge – second source. Values from this dictionary overwrite those in fields.
fields_update – third source. Values from this dictionary take priority over all other values.
fields_type_override
fields_description
fields_label
return_list
- Returns:
dict if return_list is False, list if return_list is True.
You can easily transform the returned dict to a list with the following code:
```python
fields = list(fields_dict.values())  # fields_dict: the dict returned by datastore_field_dict
```
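The priority order of the three sources can be sketched on plain dictionaries keyed by field id. as_field_dict and layer_fields are hypothetical; they merge one level deep (per-field properties) and ignore the fields_type_override, fields_description and fields_label shortcuts.

```python
from typing import Dict, List, Union

FieldSpec = Union[List[dict], Dict[str, dict], None]


def as_field_dict(fields: FieldSpec) -> Dict[str, dict]:
    """Accept a list of field dicts or an id-keyed dict; return an id-keyed dict of copies."""
    if fields is None:
        return {}
    items = fields.values() if isinstance(fields, dict) else fields
    return {f["id"]: dict(f) for f in items}


def layer_fields(fields: FieldSpec = None, fields_merge: FieldSpec = None,
                 fields_update: FieldSpec = None) -> Dict[str, dict]:
    """Combine three sources; later sources overwrite earlier ones property by property."""
    result: Dict[str, dict] = {}
    for source in (fields, fields_merge, fields_update):
        for field_id, props in as_field_dict(source).items():
            result.setdefault(field_id, {}).update(props)
    return result
```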
- datastore_field_patch(resource_id: str, fields_merge: List[dict | CkanField] | OrderedDict[str, CkanField | dict] = None, fields_update: List[dict | CkanField] | OrderedDict[str, CkanField | dict] = None, *, only_if_needed: bool = False, fields: List[dict | CkanField] | OrderedDict[str, CkanField | dict] = None, fields_type_override: Dict[str, str] = None, field_description: Dict[str, str] = None, fields_label: Dict[str, str] = None) Tuple[bool, List[dict], dict | bool | None]
Helper that calls the datastore_create API in order to update the parameters of some fields. The initial field configuration is taken from the mapped information or requested from the server. Typically, this can be used to enforce a data type on a field; in this case, the resource data must be resubmitted with the resource_patch API. For example, the fields_update argument could be fields_update={"id": {"info": {"type_override": "text"}}}, which is equivalent to the option fields_type_override={"id": "text"}.
Note
It is not possible to rename a field after creation through the API. To rename a field, the change must be made directly in the database.
- Parameters:
resource_id – resource id
fields_update – dictionary of field IDs and properties to change. The property dictionary is updated recursively, so only the properties appearing in the update are changed. These values can be overridden by those given in fields_type_override, field_description, or fields_label.
fields_type_override – shortcut to set the info.type_override value for each field ID.
field_description – shortcut to set the info.notes value for each field ID.
fields_label – shortcut to set the info.label value for each field ID.
only_if_needed – Cancels the request if the changes do not affect the current configuration
- Returns:
a tuple (update_needed, fields_new, update_dict)
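The recursive update and the equivalence between the fields_update form and the fields_type_override shortcut can be sketched with two hypothetical helpers, deep_update and expand_type_override. They operate on plain dictionaries and do not call the API.

```python
import copy
from typing import Dict


def deep_update(target: dict, update: dict) -> dict:
    """Recursively merge update into a copy of target; nested dicts merge, other values replace."""
    result = copy.deepcopy(target)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_update(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


def expand_type_override(fields_type_override: Dict[str, str]) -> Dict[str, dict]:
    """Turn {"id": "text"} into {"id": {"info": {"type_override": "text"}}}."""
    return {fid: {"info": {"type_override": t}} for fid, t in fields_type_override.items()}
```

Comparing the patched configuration with the current one is one way an only_if_needed check could decide whether a request is necessary.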
- datastore_field_patch_dict(fields_merge: List[dict | CkanField] | OrderedDict[str, CkanField | dict] = None, fields_update: List[dict | CkanField] | OrderedDict[str, CkanField | dict] = None, *, fields_type_override: Dict[str, str] = None, fields_description: Dict[str, str] = None, fields_label: Dict[str, str] = None, return_list: bool = False, datastore_merge: bool = True, resource_id: str = None, error_not_found: bool = True) Tuple[bool | None, Dict[str, CkanField] | List[dict]]
Calls datastore_field_dict and merges attributes with those found in datastore_info if datastore_merge=True.
- Parameters:
fields_update
fields_type_override
fields_description
fields_label
return_list
datastore_merge
resource_id – required if datastore_merge=True
- Returns:
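The fields arguments accept either a list of field dictionaries (the shape used by the CKAN datastore_create API, where each field carries an "id" and a "type") or a mapping keyed by field id, and the return type suggests that return_list selects between these two shapes. A minimal sketch of the conversion between them, using plain dictionaries; fields_to_dict and fields_to_list are hypothetical helpers.

```python
from collections import OrderedDict
from typing import Dict, List, Union


def fields_to_dict(fields: Union[List[dict], Dict[str, dict]]) -> "OrderedDict[str, dict]":
    """Index DataStore field dicts by their "id" key, preserving field order."""
    if isinstance(fields, dict):
        # Already keyed by field id: copy into an OrderedDict.
        return OrderedDict(fields)
    return OrderedDict((field["id"], field) for field in fields)


def fields_to_list(fields: Dict[str, dict]) -> List[dict]:
    """Convert back to the list form expected by datastore_create."""
    # Make sure every entry carries its "id", even if only the key held it.
    return [dict(field, id=field_id) for field_id, field in fields.items()]


fields = [{"id": "station", "type": "text"}, {"id": "value", "type": "float"}]
by_id = fields_to_dict(fields)
print(list(by_id))                      # ['station', 'value']
print(fields_to_list(by_id) == fields)  # True
```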
- static default_resource_view(resource_format: str, is_datastore: bool = True) Tuple[str, str]
Definition of the default resource view based on the resource format.
- Parameters:
resource_format
- Returns:
- full_unlock(unlock: bool = True, *, no_ca: bool = None, external_url_resource_download: bool = None) None
Unlocks the full capabilities of the CKAN API.
- Parameters:
unlock
- Returns:
- package_create(package_name: str, private: bool = True, *, title: str = None, notes: str = None, owner_org: str = None, state: CkanState | str = None, license_id: str = None, tags: List[str] = None, tags_list_dict: List[Dict[str, str]] = None, url: str = None, version: str = None, custom_fields_update: dict = None, custom_fields: dict = None, author: str = None, author_email: str = None, maintainer: str = None, maintainer_email: str = None, params: dict = None, cancel_if_exists: bool = True, update_if_exists=True, clear_if_deleted_state: bool = None) CkanPackageInfo
Helper function to create a new package. This first checks if the package already exists.
- See:
_api_package_create()
- Parameters:
package_name
private
title
notes
owner_org
license_id
state
params
cancel_if_exists
update_if_exists
clear_if_deleted_state – Option to clear the resources of a package if it was found in Deleted state. Default behavior is set in params.
- Returns:
- package_delete(package_id: str, definitive_delete: bool = False, *, params: dict = None) dict
Alias function for package removal. Calls either the API package_delete, which only marks the package as deleted, or dataset_purge, which permanently deletes it.
- Parameters:
package_id
definitive_delete – True: calls dataset_purge (action not reversible), False: calls API package_delete.
params
- Returns:
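A sketch of the two CKAN Action API requests behind this alias, assuming the standard /api/3/action/ endpoint with a JSON body containing the package "id". package_delete_request is a hypothetical helper that only builds the request; sending it and authenticating (API key header) are left out.

```python
import json
from typing import Tuple


def package_delete_request(ckan_url: str, package_id: str,
                           definitive_delete: bool = False) -> Tuple[str, str]:
    """Build the (URL, JSON body) of the CKAN action matching definitive_delete."""
    # dataset_purge removes the package permanently (not reversible);
    # package_delete only sets its state to "deleted".
    action = "dataset_purge" if definitive_delete else "package_delete"
    url = f"{ckan_url.rstrip('/')}/api/3/action/{action}"
    return url, json.dumps({"id": package_id})


print(package_delete_request("https://ckan.example.org/", "my-dataset"))
# ('https://ckan.example.org/api/3/action/package_delete', '{"id": "my-dataset"}')
print(package_delete_request("https://ckan.example.org", "my-dataset", True)[0])
# https://ckan.example.org/api/3/action/dataset_purge
```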
- package_delete_resources(package_name: str, *, bypass_admin: bool = False)
Permanently deletes all resources associated with the package.
- Parameters:
package_name
- Returns:
- package_patch(package_id: str, package_name: str = None, private: bool = None, *, title: str = None, notes: str = None, owner_org: str = None, state: CkanState | str = None, license_id: str = None, tags: List[str] = None, tags_list_dict: List[Dict[str, str]] = None, url: str = None, version: str = None, custom_fields_update: dict = None, custom_fields: dict = None, author: str = None, author_email: str = None, maintainer: str = None, maintainer_email: str = None, params: dict = None) CkanPackageInfo
- package_resource_reorder(package_id: str, resource_ids: List[str], *, params: dict = None) dict
API call to package_resource_reorder, which reorders the resources within a package. If only some of the resource ids are supplied, they are placed first and the other resources keep their original order.
- Parameters:
package_id – the id or name of the package to update
resource_ids – a list of resource ids in the order needed
params
- Returns:
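The ordering rule for a partial list of ids can be checked locally before calling the server. A minimal sketch of the documented behavior; reorder_resources and the resource ids are hypothetical, and no API call is made.

```python
from typing import List


def reorder_resources(current_ids: List[str], resource_ids: List[str]) -> List[str]:
    """Predict the order produced by package_resource_reorder.

    The ids in resource_ids come first, in the given order; the remaining
    resources keep their original relative order after them.
    """
    unknown = set(resource_ids) - set(current_ids)
    if unknown:
        raise ValueError(f"resource ids not in package: {sorted(unknown)}")
    requested = set(resource_ids)
    return list(resource_ids) + [rid for rid in current_ids if rid not in requested]


print(reorder_resources(["a", "b", "c", "d"], ["c", "a"]))  # ['c', 'a', 'b', 'd']
```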
- package_state_change(package_id: str, state: CkanState) CkanPackageInfo
Change package state using the package_patch API.
- Parameters:
package_id
state
- Returns:
- resource_create(package_id: str, name: str, *, format: str = None, description: str = None, state: CkanState = None, params: dict = None, url: str = None, files=None, file_path: str = None, df: DataFrame = None, payload: bytes | BufferedIOBase = None, payload_name: str = None, cancel_if_exists: bool = True, update_if_exists: bool = False, reupload: bool = False, create_default_view: bool = True, auto_submit: bool = False, datastore_create: bool = False, records: dict | List[dict] | DataFrame = None, fields: List[dict] = None, primary_key: str | List[str] = None, indexes: str | List[str] = None, aliases: str | List[str] = None, inhibit_datastore_patch_indexes: bool = False, data_cleaner: CkanDataCleanerABC = None, progress_callback: CkanProgressCallbackABC = None) CkanResourceInfo
Proxy to the API call resource_create that checks whether a resource with the same name already exists and adds the default view.
- Parameters:
package_id
name
format
params
cancel_if_exists – check whether a resource with the same name already exists in the package on the CKAN server. If so, the info for this resource is returned.
update_if_exists – If a resource with the same name already exists (and cancel_if_exists=True), a call to resource_patch is performed.
reupload – re-upload the resource if a resource with the same name already exists and cancel_if_exists=True and update_if_exists=True
create_default_view
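The three existence-related flags combine as a small decision table. A sketch of that table as described by the parameters above; on_existing_resource is a hypothetical helper, and the outcome for cancel_if_exists=False (no check, so a new resource is created) is an assumption.

```python
def on_existing_resource(cancel_if_exists: bool, update_if_exists: bool,
                         reupload: bool) -> str:
    """Describe what happens when a resource with the same name already exists."""
    if not cancel_if_exists:
        # Assumption: without the check, a new resource is simply created.
        return "create"
    if not update_if_exists:
        # Only the info of the existing resource is returned.
        return "return existing info"
    if reupload:
        return "resource_patch + re-upload"
    return "resource_patch"


# Signature defaults: cancel_if_exists=True, update_if_exists=False, reupload=False
print(on_existing_resource(True, False, False))  # return existing info
```

With the default arguments, an existing resource is returned unchanged, which makes repeated calls idempotent.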
Note
For file uploads, the following parameters are taken in order of priority (see upload_prepare_requests_files_arg for an example of formatting):
- Parameters:
files – files pass through argument to the requests.post function. Use to send other data formats.
payload – bytes to upload as a file
payload_name – name of the payload to use (associated with the payload argument) - this determines the format recognized in CKAN viewers.
file_path – path of the file to transmit (binary and text files are supported here)
df – pandas DataFrame to replace resource
- Returns:
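A sketch of how an upload source could be selected from these arguments, assuming the priority order is the order in which they are listed (files, payload, file_path, df); pick_upload_source is a hypothetical helper and payload_name is omitted because it only accompanies payload.

```python
from typing import Optional


def pick_upload_source(files=None, payload: Optional[bytes] = None,
                       file_path: Optional[str] = None, df=None) -> Optional[str]:
    """Return the name of the first upload argument that is set."""
    candidates = [("files", files), ("payload", payload),
                  ("file_path", file_path), ("df", df)]
    for name, value in candidates:
        # "is not None" rather than truthiness: bool() of a DataFrame raises.
        if value is not None:
            return name
    return None  # no file upload requested


print(pick_upload_source(payload=b"a,b\n1,2\n", file_path="data.csv"))  # payload
```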
- resource_delete(resource_id: str, *, params: dict = None, force: bool = None, bypass_admin: bool = False) dict
- resource_view_create(resource_id: str, title: str | List[str] = None, *, view_type: str | List[str] = None, params: dict = None, error_no_default_view_type: bool = False, cancel_if_exists: bool = True, is_datastore: bool = True) List[CkanViewInfo]
Encapsulation of the API resource_view_create. If no view type is provided (view_type=None), the function looks up the default view defined in default_resource_view. It also checks the existing views and skips the creation of any view whose title matches an existing one. If provided as lists, title and view_type must have the same length.
- Parameters:
resource_id
title
view_type
params
error_no_default_view_type
cancel_if_exists – option to skip the creation of a view if a view with the same title already exists
- Returns:
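A sketch of the title and view_type handling described above: scalars are treated as one-element lists, lengths must match, and titles that already exist are skipped when cancel_if_exists is set. plan_views is a hypothetical helper; the view type names are CKAN core view plugins used only as examples, and the default-view lookup for view_type=None is not reproduced.

```python
from typing import List, Sequence, Tuple, Union


def plan_views(title: Union[str, List[str]], view_type: Union[str, List[str]],
               existing_titles: Sequence[str],
               cancel_if_exists: bool = True) -> List[Tuple[str, str]]:
    """Return the (title, view_type) pairs that would actually be created."""
    titles = [title] if isinstance(title, str) else list(title)
    types = [view_type] if isinstance(view_type, str) else list(view_type)
    if len(titles) != len(types):
        raise ValueError("title and view_type must have the same length")
    existing = set(existing_titles)
    return [(t, v) for t, v in zip(titles, types)
            if not (cancel_if_exists and t in existing)]


print(plan_views(["Table", "Image"], ["datatables_view", "image_view"], ["Table"]))
# [('Image', 'image_view')]
```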
- class ckanapi_harvesters.ckan_api.ckan_api_5_manage.CkanApiManageParams(*, proxies: str | dict | ProxyConfig = None, ckan_headers: dict = None, http_headers: dict = None)
Bases:
CkanApiReadWriteParams
- copy(new_identifier: str = None, *, dest=None)
- ckanapi_harvesters.ckan_api.ckan_api_5_manage.clean_table_name(variable_name: str) str
Replace unwanted characters and spaces in variable_name to generate a valid table name.
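The exact characters replaced by clean_table_name are not listed here. A hypothetical sketch of such a cleaning rule, assuming runs of non-word characters become a single underscore and the result is lower-cased (as PostgreSQL does for unquoted identifiers); it is not the package's implementation.

```python
import re


def clean_table_name_sketch(variable_name: str) -> str:
    """Hypothetical rule: replace runs of non-word characters by "_", then lower-case."""
    # \W matches any character that is not a letter, digit or underscore (Unicode-aware).
    name = re.sub(r"\W+", "_", variable_name.strip()).strip("_")
    return name.lower()


print(clean_table_name_sketch("Air temperature (°C)"))  # air_temperature_c
```

Accented letters are word characters under Python's Unicode-aware \W, so they are kept; a stricter ASCII-only rule would need re.ASCII.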
ckanapi_harvesters.ckan_api.ckan_api_params module
Basic parameters for the CkanApi class
- class ckanapi_harvesters.ckan_api.ckan_api_params.CkanApiDebug
Bases:
object
- last_response: Response | None
- class ckanapi_harvesters.ckan_api.ckan_api_params.CkanApiParamsBasic(*, proxies: str | dict | ProxyConfig = None, ckan_headers: dict = None, http_headers: dict = None)
Bases:
object
- __init__(*, proxies: str | dict | ProxyConfig = None, ckan_headers: dict = None, http_headers: dict = None)
- Parameters:
proxies – proxies to use for requests
ckan_headers – headers to use for requests, only to the CKAN server
http_headers – headers to use for requests, for all requests, including external requests and to the CKAN server
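The split between ckan_headers and http_headers keeps CKAN-specific values (e.g. an Authorization header carrying the CKAN API token) from being sent to external URLs. A sketch of the resulting header set per request; request_headers is a hypothetical helper, and the precedence of ckan_headers on key conflicts is an assumption.

```python
from typing import Dict, Optional


def request_headers(is_ckan_request: bool,
                    ckan_headers: Optional[Dict[str, str]] = None,
                    http_headers: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Headers sent with a request: http_headers always, ckan_headers only to CKAN."""
    headers = dict(http_headers or {})
    if is_ckan_request:
        # Assumption: CKAN-specific values take precedence on key conflicts.
        headers.update(ckan_headers or {})
    return headers


ckan = {"Authorization": "my-token"}
common = {"User-Agent": "my-harvester"}
print(request_headers(False, ckan, common))  # {'User-Agent': 'my-harvester'}
print(request_headers(True, ckan, common))   # {'User-Agent': 'my-harvester', 'Authorization': 'my-token'}
```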
- _cli_ckan_args_apply(args: Namespace, *, base_dir: str = None, error_not_found: bool = True, default_proxies: dict = None, proxy_headers: dict = None) None
Apply the arguments parsed by the argument parser defined by _setup_cli_ckan_parser
- Parameters:
args
base_dir – base directory to find the CKAN API key file, if a relative path is provided (recommended: leave None to use cwd)
error_not_found – option to raise an exception if the CKAN API key file is not found
default_proxies – proxies used if proxies="default"
proxy_headers – headers used to access the proxies, generally for authentication
- Returns:
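Two of the options above affect how parsed values are interpreted: a relative API key file path is resolved against base_dir (or the cwd when base_dir is None), and the string "default" for proxies is replaced by default_proxies. A minimal sketch of both rules; resolve_apikey_file and resolve_proxies are hypothetical helpers.

```python
import os
from typing import Optional


def resolve_apikey_file(apikey_file: str, base_dir: Optional[str] = None,
                        error_not_found: bool = True) -> Optional[str]:
    """Resolve a relative API key file path against base_dir, or the cwd if base_dir is None."""
    if not os.path.isabs(apikey_file):
        apikey_file = os.path.join(base_dir or os.getcwd(), apikey_file)
    if not os.path.isfile(apikey_file):
        if error_not_found:
            raise FileNotFoundError(f"CKAN API key file not found: {apikey_file}")
        return None  # continue without an API key file
    return apikey_file


def resolve_proxies(proxies, default_proxies: Optional[dict] = None):
    """Substitute default_proxies when the CLI value is the string "default"."""
    return default_proxies if proxies == "default" else proxies


print(resolve_proxies("default", {"https": "http://proxy.example.org:3128"}))
# {'https': 'http://proxy.example.org:3128'}
```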
- static _setup_cli_ckan_parser__params(parser: ArgumentParser = None) ArgumentParser
Define or add the CLI arguments needed to initialize a CKAN API connection. Parser help message:
CKAN API connection parameters initialization
- Parameters:
parser – option to provide an existing parser to add the specific fields needed to initialize a CKAN API connection
- Returns:
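The "define or add" pattern means the function either creates a parser or extends one passed in. A sketch with the standard argparse module; the option names below are hypothetical, since the actual flags are not listed in this reference.

```python
import argparse
from typing import Optional


def setup_ckan_parser(parser: Optional[argparse.ArgumentParser] = None) -> argparse.ArgumentParser:
    """Create a parser, or extend an existing one, with CKAN connection arguments."""
    if parser is None:
        parser = argparse.ArgumentParser(description="CKAN API connection parameters initialization")
    # Hypothetical option names, for illustration only.
    group = parser.add_argument_group("CKAN connection")
    group.add_argument("--ckan-url", help="base URL of the CKAN server")
    group.add_argument("--apikey-file", help="path to a file containing the API key")
    group.add_argument("--proxies", help='proxy URL, or "default"')
    return parser


args = setup_ckan_parser().parse_args(["--ckan-url", "https://ckan.example.org", "--proxies", "default"])
print(args.ckan_url, args.proxies)  # https://ckan.example.org default
```

Passing an existing parser lets an application add its own options alongside the connection options in a single command line.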
- ckan_headers: dict
- copy(*, dest=None)
- http_headers: dict
- property proxies: dict
- property proxy_auth: AuthBase | Tuple[str, str]
- property proxy_string: str
- user_agent: str | None
Module contents
Package with helper functions for CKAN requests using pandas DataFrames.