ckanapi_harvesters.ckan_api.deprecated package

Submodules

ckanapi_harvesters.ckan_api.deprecated.ckan_api_deprecated module

class ckanapi_harvesters.ckan_api.deprecated.ckan_api_deprecated.CkanApiDeprecated(url: str = None, *, proxies: str | dict | ProxyConfig = None, apikey: str | CkanApiKey = None, apikey_file: str = None, owner_org: str = None, params: CkanApiExtendedParams = None, map: CkanMap = None, policy: CkanPackageDataFormatPolicy = None, policy_file: str = None, data_cleaner_upload: CkanDataCleanerABC = None, identifier=None)

Bases: CkanApiManage

CKAN Database API interface to CKAN server with helper functions using pandas DataFrames. This class implements API calls whose use is not recommended.

_api_group_package_show(group_name: str, *, params: dict = None, owner_org: str = None, include_private: bool = True, include_drafts: bool = False, sort: str = None, limit_per_request: int = None, offset: int = None) → List[CkanPackageInfo]

__Not recommended__ API call to group_package_show. Returns the datasets (packages) of a group.

Parameters:

group_name – group name or id
owner_org – filter packages by owner_org
include_private – if True, private datasets are included in the results. Only private datasets from the user's organizations are returned; sysadmins receive all private datasets. Optional, the default is False in the API.
include_drafts – if True, draft datasets are included in the results. Users receive only their own draft datasets; sysadmins receive all draft datasets. Optional, the default is False.
sort – sorting of the search results. Optional. Default: 'score desc, metadata_modified desc'. As per the Solr documentation, this is a comma-separated string of field names and sort orderings.
limit_per_request – maximum number of results to return. Translates to the API rows argument.
offset – the offset in the complete result at which the set of returned datasets should begin. Translates to the API start argument.
params – other parameters to pass to package_search

Returns:

a list of CkanPackageInfo objects

_api_group_package_show_all(group_name: str, *, params: dict = None, owner_org: str = None, include_private: bool = True, include_drafts: bool = False, sort: str = None, limit_per_request: int = None, offset: int = None) → List[CkanPackageInfo]

__Not recommended__ API call to group_package_show, repeated until an empty list is received.

See:

_api_group_package_show()

Parameters:

group_name – group name or id
owner_org – filter packages by owner_org
include_private – if True, private datasets are included in the results. Only private datasets from the user's organizations are returned; sysadmins receive all private datasets. Optional, the default is False in the API.
include_drafts – if True, draft datasets are included in the results. Users receive only their own draft datasets; sysadmins receive all draft datasets. Optional, the default is False.
sort – sorting of the search results. Optional. Default: 'score desc, metadata_modified desc'. As per the Solr documentation, this is a comma-separated string of field names and sort orderings.
limit_per_request – maximum number of results to return. Translates to the API rows argument.
offset – the offset in the complete result at which the set of returned datasets should begin. Translates to the API start argument.
params – other parameters to pass to the API

Returns:

a list of CkanPackageInfo objects
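The "until an empty list is received" behaviour can be pictured as the loop below. This is a minimal sketch and not the library's implementation: `fetch_page`, `fetch_all` and the in-memory `FAKE_PACKAGES` list are hypothetical stand-ins for a single paged API call and for the CKAN server.

```python
from typing import Callable, List

# Hypothetical data standing in for the datasets of a group on a CKAN server.
FAKE_PACKAGES = [f"dataset-{i}" for i in range(7)]


def fetch_page(limit: int, offset: int) -> List[str]:
    """Stand-in for one paged API call (assumption, not a library function)."""
    return FAKE_PACKAGES[offset:offset + limit]


def fetch_all(fetch: Callable[[int, int], List[str]],
              limit_per_request: int, offset: int = 0) -> List[str]:
    """Repeat the paged call, advancing the offset, until an empty page comes back."""
    results: List[str] = []
    while True:
        page = fetch(limit_per_request, offset)
        if not page:  # an empty list signals that everything has been read
            break
        results.extend(page)
        offset += len(page)
    return results


all_packages = fetch_all(fetch_page, limit_per_request=3)
print(len(all_packages))
```

Note that this style of paging always costs one extra request: the final call that returns the empty list.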

_api_package_list(*, params: dict = None, owner_org: str = None, limit_per_request: int = None, offset: int = None) → List[str]

__Not recommended__ API call to package_list.

Parameters:

params – typically, the request can be limited to an organization with the owner_org parameter

Returns:

a list of strings

_api_package_list_all(*, params: dict = None, owner_org: str = None, limit_per_request: int = None, offset: int = None) → List[str]

__Not recommended__ API call to package_list, repeated until an empty list is received.

See:

_api_package_list()

Parameters:

params

Returns:

a list of strings

_api_resource_create_default_resource_views(resource_id: str, *, create_datastore_views: bool = None, params: dict = None) → List[CkanViewInfo]

API call to resource_create_default_resource_views.

Parameters:

resource_id – resource id
create_datastore_views – whether to create views that rely on data being on the DataStore (optional, API defaults to False)
params

Returns:

a list of CkanViewInfo objects

_api_resource_search(query: str = None, *, order_by: str = None, limit_per_request: int = None, offset: int = None, resource_name: str = None, datastore_info: bool = None, resource_view_list: bool = None, params: dict = None) → List[CkanResourceInfo]

__Not recommended__ API call to resource_search. Prefer the package_show API: resource_search cannot filter resources by package name and does not return information on private resources.

See:

map_resources()

Parameters:

query – the search criteria: a string or list of strings of the form {field}:{term1}
order_by – a field on the Resource model that orders the results
limit_per_request
offset
resource_name – a shortcut to add the filter "name:{resource_name}"
datastore_info – an option to query the datastore info for all the resources found. If not provided, the last value for this option used with map_resources will be used.
resource_view_list – an option to query the resource views list for all the resources found. If not provided, the last value for this option used with map_resources will be used.
params – additional parameters to pass to resource_search

Returns:

a list of CkanResourceInfo objects
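To make the {field}:{term} query form and the resource_name shortcut concrete, here is a small sketch. `build_resource_query` is a hypothetical helper written for this illustration; it is not part of the library and only shows one plausible way to combine the two arguments.

```python
from typing import List, Optional, Union


def build_resource_query(query: Union[str, List[str], None] = None,
                         resource_name: Optional[str] = None) -> List[str]:
    """Normalize `query` to a list of "{field}:{term}" strings and append the name filter."""
    if query is None:
        terms: List[str] = []
    elif isinstance(query, str):
        terms = [query]
    else:
        terms = list(query)
    for term in terms:
        if ":" not in term:
            raise ValueError(f"expected '{{field}}:{{term}}', got {term!r}")
    if resource_name is not None:
        # Hypothetical equivalent of the resource_name shortcut described above.
        terms.append(f"name:{resource_name}")
    return terms


print(build_resource_query("format:csv", resource_name="traffic"))
```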

_api_resource_search_all(query: str = None, *, order_by: str = None, limit_per_request: int = None, offset: int = None, resource_name: str = None, datastore_info: bool = None, resource_view_list: bool = None, params: dict = None) → List[CkanResourceInfo]

__Not recommended__ API call to resource_search, repeated until an empty list is received. Prefer the package_show API: resource_search cannot filter resources by package name and does not return information on private resources.

See:

map_resources()
_api_resource_search()

Parameters:

query – the search criteria: a string or list of strings of the form {field}:{term1}
order_by – a field on the Resource model that orders the results
limit_per_request – maximum number of results to return
offset – the offset in the complete result at which the set of returned resources should begin
resource_name – a shortcut to add the filter "name:{resource_name}"
datastore_info – an option to query the datastore info for all the resources found. If not provided, the last value for this option used with map_resources will be used.
resource_view_list – an option to query the resource views list for all the resources found. If not provided, the last value for this option used with map_resources will be used.
params – additional parameters to pass to resource_search

Returns:

a list of CkanResourceInfo objects

copy(new_identifier: str = None, *, dest=None)

Returns a copy of the current instance. Useful for sharing an initialized CKAN object in a multithreaded context, where each thread works on its own copy. It is recommended to purge the last response before making a copy (with purge_map=False).
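The one-copy-per-thread pattern described for copy() might look like the sketch below. `FakeClient` is a hypothetical stand-in class written for this illustration, not the library class; its `copy()` only mimics the documented idea of cloning an initialized object and giving the clone a new identifier.

```python
import copy
import threading
from typing import Optional


class FakeClient:
    """Hypothetical stand-in for an initialized API object (not the library class)."""

    def __init__(self, url: str, identifier: str = "main"):
        self.url = url
        self.identifier = identifier
        self.last_response = None  # per-instance state that threads must not share

    def copy(self, new_identifier: Optional[str] = None) -> "FakeClient":
        clone = copy.copy(self)
        clone.last_response = None  # start each copy without a stored response
        if new_identifier is not None:
            clone.identifier = new_identifier
        return clone


def worker(client: FakeClient, results: dict) -> None:
    # Each thread mutates only its own copy.
    client.last_response = f"response for {client.identifier}"
    results[client.identifier] = client.last_response


base = FakeClient("https://ckan.example.org")
results: dict = {}
threads = [threading.Thread(target=worker, args=(base.copy(f"thread-{i}"), results))
           for i in range(3)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Because every worker receives its own copy, the state of `base` is left untouched by the threads.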

datastore_dump(resource_id: str, *, filters: dict = None, q: str = None, fields: List[str] = None, distinct: bool = None, sort: str = None, limit_per_request: int = None, offset: int = 0, total_limit: int = None, requests_limit: int = None, progress_callback: CkanProgressCallbackABC = None, params: dict = None, search_all: bool = True, search_method: bool = True, format: str = None, return_df: bool = True, limit: int = None) → DataFrame | ListRecords | Any | List[CkanActionResponse]

Alias of datastore_search with search_all=True by default. Uses the API datastore_search

See:

datastore_search()

Parameters:

resource_id – resource id.
filters – The base argument to filter values in a table (optional)
q – Full text query (optional)
fields – The base argument to filter columns (optional)
distinct – return only distinct rows (optional, default: false) e.g. to return distinct ids: fields="id", distinct=True
sort – Argument to sort results e.g. sort="index, quantity desc" or sort="index asc"
limit_per_request – Limit the number of records per request
offset – Offset in the returned records
total_limit – Strictly limit the number of records to return, counting from the initial offset
requests_limit – Limit the number of requests
progress_callback – Progress callback function
params – Additional parameters such as filters, q, sort and fields can be given. See DataStore API documentation.
search_all – Option to renew the request until there are no more records.
search_method – API method selection (True=datastore_search, False=datastore_dump)
return_df – Return pandas DataFrame (True) or dict (False)

Returns:

datastore_dump_page_generator(resource_id: str, *, filters: dict = None, q: str = None, fields: List[str] = None, distinct: bool = None, sort: str = None, limit_per_request: int = None, offset: int = 0, total_limit: int = None, requests_limit: int = None, progress_callback: CkanProgressCallbackABC = None, params: dict = None, search_all: bool = True, search_method: bool = True, format: str = None, bom: bool = None, return_df: bool = True, limit: int = None) → Generator[DataFrame, Any, None] | Generator[CkanActionResponse, Any, None]

Function alias to datastore_search_generator with search_all=True by default. Uses the API datastore_search

See:

datastore_search_generator

Parameters:

resource_id – resource id.
filters – The base argument to filter values in a table (optional)
q – Full text query (optional)
fields – The base argument to filter columns (optional)
distinct – return only distinct rows (optional, default: false) e.g. to return distinct ids: fields="id", distinct=True
sort – Argument to sort results e.g. sort="index, quantity desc" or sort="index asc"
limit_per_request – Limit the number of records per request
total_limit – Strictly limit the number of records to return, counting from the initial offset
requests_limit – Limit the number of requests
progress_callback – Progress callback function
offset – Offset in the returned records
params – Additional parameters such as filters, q, sort and fields can be given. See DataStore API documentation.
search_all – Option to renew the request until there are no more records.
search_method – API method selection (True=datastore_search, False=datastore_dump)
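The interaction of limit_per_request, offset, total_limit and requests_limit can be sketched with a plain generator. This is an illustration under stated assumptions, not the library's code: `search_page` and `FAKE_ROWS` are hypothetical stand-ins for a single datastore_search request and for the resource contents, and pages are lists of dictionaries rather than DataFrames.

```python
from typing import Dict, Iterator, List, Optional

# Hypothetical table standing in for a DataStore resource.
FAKE_ROWS: List[Dict[str, int]] = [{"_id": i, "value": i * i} for i in range(1, 11)]


def search_page(limit: int, offset: int) -> List[Dict[str, int]]:
    """Stand-in for one datastore_search request (assumption, no network access)."""
    return FAKE_ROWS[offset:offset + limit]


def page_generator(limit_per_request: int, offset: int = 0,
                   total_limit: Optional[int] = None,
                   requests_limit: Optional[int] = None) -> Iterator[List[Dict[str, int]]]:
    """Yield pages until the data, total_limit or requests_limit is exhausted."""
    sent, requests = 0, 0
    while True:
        if requests_limit is not None and requests >= requests_limit:
            return
        limit = limit_per_request
        if total_limit is not None:
            limit = min(limit, total_limit - sent)  # never exceed the overall cap
            if limit <= 0:
                return
        page = search_page(limit, offset + sent)
        requests += 1
        if not page:  # no more records
            return
        sent += len(page)
        yield page


pages = list(page_generator(limit_per_request=4, offset=2, total_limit=7))
print([len(page) for page in pages])
```

Here total_limit is counted from the initial offset, matching the parameter description above.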

Returns:

datastore_insert(records_generator: DataFrame | List[dict] | Iterable[ListRecords | DataFrame], resource_id: str, *, dry_run: bool = False, limit_per_request: int = None, offset: int = 0, total_limit: int = None, requests_limit: int = None, force: bool = None, apply_last_condition: bool = True, always_last_condition: bool = None, return_df: bool = None, data_cleaner: CkanDataCleanerABC = None, progress_callback: CkanProgressCallbackABC = None, params: dict = None, records: DataFrame | List[dict] = None, return_documents: bool = True, return_counters: bool = False, limit: int = None) → DataFrame | List[dict] | Tuple[DataFrame | List[dict], LinesRequestCounter] | LinesRequestCounter | None

Alias function to insert data in a DataStore using datastore_upsert.

See:

datastore_upsert()

Parameters:

records – generator of records, e.g. chunks from a CSV file generated with pandas.read_csv(.., chunksize=1000)
resource_id – destination resource id
force – set to True to edit a read-only resource. If not provided, self.default_force is used
limit_per_request – number of records per transaction
offset – number of records to skip - use to restart the transfer
total_limit – maximum number of lines to transmit, counting from the initial offset
requests_limit – maximum number of requests
params – additional parameters
dry_run – set to True to abort transaction instead of committing, e.g. to check for validation or type errors
apply_last_condition – if True, the last upsert request applies the last insert operations (calculate_record_count and force_indexing).
always_last_condition – if True, each request applies the last insert operations - default is False
return_df – if True, return a pandas DataFrame; otherwise, return a list of dictionaries.
data_cleaner – data cleaner instance. A data cleaner detects and changes invalid values before upload.
progress_callback – progress callback function

Returns:

the inserted records as a pandas DataFrame, from the server response
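The chunked transfer and the "last condition" options can be illustrated by building the request payloads without sending them. This is a sketch under assumptions: `chunked` and `build_upsert_payloads` are hypothetical helpers, the payload keys follow the arguments of CKAN's datastore_upsert action (resource_id, method, records, dry_run, calculate_record_count), and force_indexing is left out.

```python
from typing import Dict, Iterable, Iterator, List


def chunked(records: List[dict], size: int) -> Iterator[List[dict]]:
    """Split records into chunks of at most `size` (plays the role of limit_per_request)."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def build_upsert_payloads(chunks: Iterable[List[dict]], resource_id: str, *,
                          method: str = "insert", dry_run: bool = False,
                          apply_last_condition: bool = True,
                          always_last_condition: bool = False) -> List[Dict]:
    """Build one payload per chunk; only flagged requests ask for the final operations."""
    chunk_list = list(chunks)  # materialized for simplicity, to know which chunk is last
    payloads = []
    for index, chunk in enumerate(chunk_list):
        is_last = index == len(chunk_list) - 1
        finalize = always_last_condition or (apply_last_condition and is_last)
        payloads.append({
            "resource_id": resource_id,
            "method": method,
            "records": chunk,
            "dry_run": dry_run,
            "calculate_record_count": finalize,
        })
    return payloads


records = [{"id": i} for i in range(5)]
payloads = build_upsert_payloads(chunked(records, 2), "res-1")
print([p["calculate_record_count"] for p in payloads])
```

Deferring the record count to the last request avoids repeating that work for every chunk, which is presumably why always_last_condition defaults to False.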

datastore_search_sql_row_count(sql: str, *, params: dict = None) → int

datastore_update(records_generator: DataFrame | List[dict] | Iterable[ListRecords | DataFrame], resource_id: str, *, dry_run: bool = False, limit_per_request: int = None, offset: int = 0, total_limit: int = None, requests_limit: int = None, force: bool = None, apply_last_condition: bool = True, always_last_condition: bool = None, return_df: bool = None, data_cleaner: CkanDataCleanerABC = None, progress_callback: CkanProgressCallbackABC = None, params: dict = None, records: DataFrame | List[dict] = None, return_documents: bool = True, return_counters: bool = False, limit: int = None) → DataFrame | List[dict] | Tuple[DataFrame | List[dict], LinesRequestCounter] | LinesRequestCounter | None

Alias function to update data in a DataStore using datastore_upsert. The update is performed based on the DataStore primary keys

See:

datastore_upsert()

Parameters:

records – generator of records, e.g. chunks from a CSV file generated with pandas.read_csv(.., chunksize=1000)
resource_id – destination resource id
force – set to True to edit a read-only resource. If not provided, self.default_force is used
limit_per_request – number of records per transaction
offset – number of records to skip - use to restart the transfer
total_limit – maximum number of lines to transmit, counting from the initial offset
requests_limit – maximum number of requests
params – additional parameters
dry_run – set to True to abort transaction instead of committing, e.g. to check for validation or type errors
apply_last_condition – if True, the last upsert request applies the last insert operations (calculate_record_count and force_indexing).
always_last_condition – if True, each request applies the last insert operations - default is False
return_df – if True, return a pandas DataFrame; otherwise, return a list of dictionaries.
data_cleaner – data cleaner instance. A data cleaner detects and changes invalid values before upload.
progress_callback – progress callback function

Returns:

the updated records as a pandas DataFrame, from the server response

group_package_show_all(group_name: str, *, params: dict = None, owner_org: str = None, include_private: bool = True, include_drafts: bool = False, sort: str = None, limit_per_request: int = None, offset: int = None) → List[CkanPackageInfo]

__Not recommended__ API call to group_package_show, repeated until an empty list is received.

See:

_api_group_package_show()

Parameters:

group_name – group name or id
owner_org – filter packages by owner_org
include_private – if True, private datasets are included in the results. Only private datasets from the user's organizations are returned; sysadmins receive all private datasets. Optional, the default is False in the API.
include_drafts – if True, draft datasets are included in the results. Users receive only their own draft datasets; sysadmins receive all draft datasets. Optional, the default is False.
sort – sorting of the search results. Optional. Default: 'score desc, metadata_modified desc'. As per the Solr documentation, this is a comma-separated string of field names and sort orderings.
limit_per_request – maximum number of results to return. Translates to the API rows argument.
offset – the offset in the complete result at which the set of returned datasets should begin. Translates to the API start argument.
params – other parameters to pass to the API

Returns:

a list of CkanPackageInfo objects

package_list_all(*, params: dict = None, owner_org: str = None, limit_per_request: int = None, offset: int = None) → List[str]

__Not recommended__ API call to package_list, repeated until an empty list is received.

See:

_api_package_list()

Parameters:

params

Returns:

a list of strings

resource_create_default_resource_views(resource_id: str, *, create_datastore_views: bool = None, params: dict = None) → List[CkanViewInfo]

API call to resource_create_default_resource_views.

Parameters:

resource_id – resource id
create_datastore_views – whether to create views that rely on data being on the DataStore (optional, API defaults to False)
params

Returns:

a list of CkanViewInfo objects

resource_move(resource_id: str, package_name: str, dest_package_name: str, params: dict = None)

Move a resource from one dataset to another using the resource_patch API. Does not work.

Parameters:

resource_id
package_name
dest_package_name
params

Returns:

resource_search_all(query: str = None, *, order_by: str = None, limit_per_request: int = None, offset: int = None, resource_name: str = None, datastore_info: bool = None, resource_view_list: bool = None, params: dict = None) → List[CkanResourceInfo]

__Not recommended__ API call to resource_search, repeated until an empty list is received. Prefer the package_show API: resource_search cannot filter resources by package name and does not return information on private resources.

See:

map_resources()
_api_resource_search()

Parameters:

query – the search criteria: a string or list of strings of the form {field}:{term1}
order_by – a field on the Resource model that orders the results
limit_per_request – maximum number of results to return
offset – the offset in the complete result at which the set of returned resources should begin
resource_name – a shortcut to add the filter "name:{resource_name}"
datastore_info – an option to query the datastore info for all the resources found. If not provided, the last value for this option used with map_resources will be used.
resource_view_list – an option to query the resource views list for all the resources found. If not provided, the last value for this option used with map_resources will be used.
params – additional parameters to pass to resource_search

Returns:

a list of CkanResourceInfo objects

ckanapi_harvesters.ckan_api.deprecated.ckan_api_deprecated_vocabularies module

class ckanapi_harvesters.ckan_api.deprecated.ckan_api_deprecated_vocabularies.CkanApiVocabulariesDeprecated(url: str = None, *, proxies: str | dict | ProxyConfig = None, apikey: str | CkanApiKey = None, apikey_file: str = None, owner_org: str = None, params: CkanApiExtendedParams = None, map: CkanMap = None, policy: CkanPackageDataFormatPolicy = None, policy_file: str = None, data_cleaner_upload: CkanDataCleanerABC = None, identifier=None)

Bases: CkanApiDeprecated

__init__(url: str = None, *, proxies: str | dict | ProxyConfig = None, apikey: str | CkanApiKey = None, apikey_file: str = None, owner_org: str = None, params: CkanApiExtendedParams = None, map: CkanMap = None, policy: CkanPackageDataFormatPolicy = None, policy_file: str = None, data_cleaner_upload: CkanDataCleanerABC = None, identifier=None)

CKAN Database API interface to CKAN server with helper functions using pandas DataFrames.

Parameters:

url – url of the CKAN server
proxies – proxies to use for requests
ckan_headers – headers to use for requests, only to the CKAN server
http_headers – headers to use for requests, for all requests, including external requests and to the CKAN server
apikey – way to provide the API key directly (optional)
apikey_file – path to a file containing a valid API key in the first line of text (optional)
policy – data format policy to use with policy_check function
policy_file – path to a JSON file containing the data format policy to use with policy_check function
owner_org – name of the organization to limit package_search (optional)
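The apikey_file convention (a valid API key on the first line of a text file) can be pictured as follows. `read_api_key` is a hypothetical helper written for this illustration, not a function of the library, and the key written to the temporary file is a placeholder.

```python
import os
import tempfile


def read_api_key(path: str) -> str:
    """Return the first line of `path`, stripped; later lines are ignored."""
    with open(path, "r", encoding="utf-8") as handle:
        key = handle.readline().strip()
    if not key:
        raise ValueError(f"no API key found on the first line of {path}")
    return key


# Demonstration with a throwaway file holding a placeholder key.
with tempfile.TemporaryDirectory() as tmp:
    key_path = os.path.join(tmp, "apikey.txt")
    with open(key_path, "w", encoding="utf-8") as handle:
        handle.write("not-a-real-key\nthis second line is ignored\n")
    demo_key = read_api_key(key_path)
```

Keeping the key in a file outside version control, rather than passing apikey inline, avoids leaking it through source code or shell history.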

_api_vocabulary_create(vocabulary_name: str, tags_list_dict: List[Dict[str, str]], *, params: dict = None) → CkanTagVocabularyInfo

API call to vocabulary_create.

Returns:

a vocabulary info object (CkanTagVocabularyInfo)

_api_vocabulary_delete(vocabulary_id: str, *, params: dict = None) → bool

API call to vocabulary_delete.

Returns:

True on success

_api_vocabulary_list(*, params: dict = None) → List[CkanTagVocabularyInfo]

API call to vocabulary_list.

Returns:

a list of vocabulary info objects

_api_vocabulary_update(vocabulary_id: str, tags_list_dict: List[Dict[str, str]], *, params: dict = None) → CkanTagVocabularyInfo

API call to vocabulary_update.

Returns:

a vocabulary info object (CkanTagVocabularyInfo)

copy(new_identifier: str = None, *, dest=None)

Returns a copy of the current instance. Useful for sharing an initialized CKAN object in a multithreaded context, where each thread works on its own copy. It is recommended to purge the last response before making a copy (with purge_map=False).

initiate_vocabularies_from_policy(policy: CkanPackageDataFormatPolicy, *, remove_others: bool = False)

map_resources(package_list: str | List[str] = None, *, params: dict = None, datastore_info: bool = None, resource_view_list: bool = None, organization_info: bool = None, license_list: bool = None, only_missing: bool = True, error_not_found: bool = True, owner_org: str = None, load_policy: bool = None, vocabulary_list: bool = None, progress_callback: CkanProgressCallbackABC = None) → CkanMap

Map the resources of a given package to obtain resource IDs associated with the package name and its resources.

Parameters:

package_list – List of packages to request. If not provided, the result of package_search is used.
params – Additional parameters to pass to all API calls (not recommended).
datastore_info – If True, enables the request of the API datastore_info to return information about DataStore fields, aliases, and row count. Required to search a DataStore by alias.
resource_view_list – If True, enables the request of the view_list API for each resource.
organization_info – If True, enables the request of the organization_list API before other requests.
license_list – If True, enables the request of the license_list API.
only_missing – If True, skips requesting already-mapped packages.
error_not_found – If True, packages not found by the API are ignored (no error is raised).
owner_org – Filters packages by a specific organization (only if package_search is used).

Returns:

A mapping of resources for the specified package(s).

Note

Packages were previously referred to as DataSets in earlier CKAN implementations.
A single name can be shared across multiple resources within a package. In such cases, the first occurrence is used as a reference, and a warning is issued.
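The first-occurrence rule for duplicate resource names can be sketched as below. `map_resource_names` is a hypothetical helper for illustration only; the real map is a CkanMap object whose internals are not shown in this reference, and the resource dictionaries are made-up sample data.

```python
import warnings
from typing import Dict, List


def map_resource_names(resources: List[Dict[str, str]]) -> Dict[str, str]:
    """Map resource name to id, keeping the first occurrence and warning on repeats."""
    mapping: Dict[str, str] = {}
    for resource in resources:
        name, resource_id = resource["name"], resource["id"]
        if name in mapping:
            warnings.warn(
                f"resource name {name!r} is not unique; keeping {mapping[name]}"
            )
            continue
        mapping[name] = resource_id
    return mapping


# Made-up resources of a single package; "data.csv" appears twice.
sample_resources = [
    {"name": "data.csv", "id": "id-1"},
    {"name": "notes.txt", "id": "id-2"},
    {"name": "data.csv", "id": "id-3"},
]
```

Looking up "data.csv" in the resulting mapping would resolve to the first resource, so addressing the second one requires its id.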

set_default_map_mode(datastore_info: bool = None, resource_view_list: bool = None, organization_info: bool = None, license_list: bool = None, load_policy: bool = None, vocabulary_list: bool = None) → None

Set up the optional queries orchestrated by the map_resources function

Parameters:

datastore_info
resource_view_list
organization_info
license_list

Returns:

vocabularies_clear()

vocabulary_delete(vocabulary_id: str) → bool

vocabulary_list(cancel_if_present: bool = True) → List[CkanTagVocabularyInfo]

vocabulary_update(vocabulary_name: str, tags_list_dict: List[Dict[str, str]])

Module contents

Deprecated code