ckanapi_harvesters.harvesters.data_cleaner package
Submodules
ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_abc module
Functions to clean data before upload.
- class ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_abc.CkanDataCleanerABC
Bases:
ABC
Data cleaner abstract base class.
A table is defined by a list of fields, each with a data type. Each row may give a value for all or only some of the fields. When a value is nested (a dictionary or a list), the cleaning functions iterate recursively over its elements. These elements are called sub-values.
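The recursive iteration over sub-values can be pictured with a minimal, standalone sketch. It does not use the package itself: the function name `clean_nested`, the callback signature, and the dotted/bracketed path format are assumptions made for illustration. Only the idea of passing a `path` and a `level` to the per-sub-value cleaning step is taken from the signatures listed for this class.

```python
from typing import Any, Callable


def clean_nested(value: Any,
                 clean_leaf: Callable[[Any, str, int], Any],
                 path: str = "",
                 level: int = 0) -> Any:
    """Recursively apply clean_leaf to every sub-value of a nested cell.

    Hypothetical helper: dictionaries and lists are traversed, every other
    value is treated as a leaf and handed to clean_leaf with its path and
    nesting level.
    """
    if isinstance(value, dict):
        return {
            key: clean_nested(sub, clean_leaf,
                              f"{path}.{key}" if path else str(key),
                              level + 1)
            for key, sub in value.items()
        }
    if isinstance(value, list):
        return [
            clean_nested(sub, clean_leaf, f"{path}[{index}]", level + 1)
            for index, sub in enumerate(value)
        ]
    return clean_leaf(value, path, level)


def strip_strings(subvalue: Any, path: str, level: int) -> Any:
    # Example leaf cleaner: trim whitespace, leave other types untouched.
    return subvalue.strip() if isinstance(subvalue, str) else subvalue


cell = {"name": " Alice ", "tags": [" a", "b "], "geo": {"lat": 1.5}}
cleaned_cell = clean_nested(cell, strip_strings)
```

The traversal builds a new object rather than mutating the cell in place, which keeps the sketch easy to reason about; whether the library does the same is not stated here.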
- _add_field_from_path(path: str, data_type: str, new_field_name: str = None, suggest_index: bool = True, notes: str = None) None
Auxiliary method to define a new column from a nested object.
- abstractmethod _clean_final_steps(records: List[dict] | DataFrame, fields: OrderedDict[str, CkanField] | None, known_fields: OrderedDict[str, CkanField] | None) List[dict] | DataFrame
Method called at the end of clean_records.
- abstractmethod _clean_subvalue(subvalue: Any, field: CkanField, path: str, level: int, *, field_data_type: str) Any
Cleaning of a subvalue. A subvalue is a value within a nested cell.
- _detect_non_standard_field(field_name: str, values: Any | Series) CkanField
Auxiliary function of create_new_field that detects the field type when the default criteria did not match any specific case.
- _detect_standard_field_bypass(field_name: str, values: Any | Series) CkanField | None
Auxiliary function of create_new_field that detects the field type in order to bypass the default criteria.
- _extra_checks(records: List[dict] | DataFrame, fields: OrderedDict[str, CkanField] | None) None
Method called at the end of _clean_final_steps.
- _replace_non_standard_subvalue(subvalue: Any, field: CkanField, path: str, level: int, *, field_data_type: str) Any
Auxiliary function of _clean_subvalue that performs type casts and checks when none of the default criteria were met.
- _replace_non_standard_value(value: Any, field: CkanField, *, field_data_type: str) Any
Auxiliary function of clean_value_field that performs type casts and checks when none of the default criteria were met.
- _replace_standard_subvalue_bypass(subvalue: Any, field: CkanField, path: str, level: int, *, field_data_type: str) Tuple[Any, bool]
Auxiliary function of _clean_subvalue that performs type casts and checks in order to bypass the default criteria.
- _replace_standard_value_bypass(value: Any, field: CkanField, *, field_data_type: str) Tuple[Any, bool]
Auxiliary function of clean_value_field that performs type casts and checks in order to bypass the default criteria.
- apply_new_fields_request(ckan, resource_id: str)
This method performs the field patch if a new field was detected. Call before upsert.
- abstractmethod clean_records(records: List[dict] | DataFrame, known_fields: OrderedDict[str, CkanField] | None, *, inplace: bool = False) List[dict] | DataFrame
Main function to clean a list of records.
- Parameters:
records
known_fields
inplace
- Returns:
- abstractmethod clean_value_field(value: Any, field: CkanField) Any
Cleaning of a value. A value is directly the value of a cell.
- clear_all_outputs()
Clears all outputs, including the values for which the cleaner is stateful. Those values must not be cleared for each DataFrame upload, so they are cleared only here.
- clear_outputs_new_dataframe()
- abstractmethod copy(dest=None)
- abstractmethod create_new_field(field_name: str, values: Any | Series) CkanField
This method adds a new field definition.
- abstractmethod detect_field_types_and_subs(records: List[dict] | DataFrame, known_fields: OrderedDict[str, CkanField] = None) OrderedDict[str, str]
This function detects the initial fields and the necessary field renamings.
- abstractmethod static get_class_keyword() str
Returns the name of the class, according to data_cleaner_dict defined in data_cleaner_init.py. This name is used to set up the data cleaner for a resource builder.
- merge_field_changes(fields: List[dict] = None) List[dict]
This method merges the fields argument of a datastore_create with the fields detected by the data cleaner. Fields already defined in the fields argument are not overwritten.
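The merge rule described for merge_field_changes (fields passed explicitly win, detected fields only fill the gaps) can be sketched independently of the package. The function name `merge_fields` is hypothetical. The sketch assumes each field is a dictionary carrying an `id` key, as in the `fields` argument of CKAN's `datastore_create`; how the library stores its detected fields internally is not shown in this reference.

```python
from typing import List, Optional


def merge_fields(user_fields: Optional[List[dict]],
                 detected_fields: List[dict]) -> List[dict]:
    """Merge two field lists without overwriting user-defined entries.

    Hypothetical helper: fields are matched on their "id" key, and a
    detected field is appended only if no user field has the same id.
    """
    merged = [dict(field) for field in (user_fields or [])]
    known_ids = {field["id"] for field in merged}
    for field in detected_fields:
        if field["id"] not in known_ids:
            merged.append(dict(field))
            known_ids.add(field["id"])
    return merged


user_defined = [{"id": "station", "type": "text"}]
detected = [
    {"id": "station", "type": "numeric"},    # ignored: already user-defined
    {"id": "measured_at", "type": "timestamp"},
]
merged_fields = merge_fields(user_defined, detected)
```

Copying each dictionary keeps the inputs untouched, so the same detected fields can be merged again for a later request.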
- class ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_abc.DataCleanerNone
Bases:
CkanDataCleanerABC
Implementation which does nothing. Placeholder to state explicitly that no data cleaner must be used.
- clean_records(records: List[dict] | DataFrame, known_fields: OrderedDict[str, CkanField] | None, *, inplace: bool = False) List[dict] | DataFrame
Main function to clean a list of records.
- Parameters:
records
known_fields
inplace
- Returns:
- clean_value_field(value: Any, field: CkanField) Any
Cleaning of a value. A value is directly the value of a cell.
- copy(dest=None) DataCleanerNone
- create_new_field(field_name: str, values: Any | Series) CkanField
This method adds a new field definition
- detect_field_types_and_subs(records: List[dict] | DataFrame, known_fields: OrderedDict[str, CkanField] = None) OrderedDict[str, str]
This function detects the initial fields and necessary field renamings
- static get_class_keyword() str
Returns the name of the class, according to data_cleaner_dict defined in data_cleaner_init.py. This name is used to set up the data cleaner for a resource builder.
ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_errors module
Error codes for data cleaner
- exception ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_errors.CleanError
Bases:
Exception
- exception ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_errors.CleanerRequirementError(requirement: str, data_type: str)
Bases:
RequirementError
- exception ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_errors.FormatError(data: str, data_type: str)
Bases:
Exception
- exception ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_errors.UnexpectedGeometryError(found_type: str, expected_type: str)
Bases:
Exception
ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_init module
File format keyword selection
- ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_init.init_data_cleaner(data_cleaner_string: str | None) CkanDataCleanerABC | None
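The relationship between get_class_keyword, data_cleaner_dict and init_data_cleaner suggests a keyword-to-class registry. The following is a minimal sketch of that pattern, not the package's code: the class names, the keywords `"basic"` and `"geom"`, the case-insensitive lookup, and the `ValueError` on an unknown keyword are all assumptions. Returning `None` for a `None` input is inferred only from the `str | None` and `CkanDataCleanerABC | None` annotations.

```python
from typing import Dict, Optional, Type


class CleanerBase:
    """Hypothetical stand-in for the abstract cleaner class."""

    @staticmethod
    def get_class_keyword() -> str:
        raise NotImplementedError


class BasicCleaner(CleanerBase):
    @staticmethod
    def get_class_keyword() -> str:
        return "basic"  # hypothetical keyword


class GeomCleaner(BasicCleaner):
    @staticmethod
    def get_class_keyword() -> str:
        return "geom"  # hypothetical keyword


# Registry built from the keyword each class reports about itself.
CLEANER_REGISTRY: Dict[str, Type[CleanerBase]] = {
    cls.get_class_keyword(): cls for cls in (BasicCleaner, GeomCleaner)
}


def init_cleaner(keyword: Optional[str]) -> Optional[CleanerBase]:
    """Instantiate the cleaner registered under keyword, or return None."""
    if keyword is None:
        return None
    cleaner_class = CLEANER_REGISTRY.get(keyword.strip().lower())
    if cleaner_class is None:
        raise ValueError(f"Unknown data cleaner keyword: {keyword!r}")
    return cleaner_class()
```

Letting each class report its own keyword keeps the registry and the classes from drifting apart, which is one plausible reason for get_class_keyword being a static method.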
ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_upload module
Alias
ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_upload_1_basic module
Functions to clean data before upload.
- class ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_upload_1_basic.CkanDataCleanerUploadBasic
Bases:
CkanDataCleanerABC
Data cleaner for basic data types.
- clean_records(records: List[dict] | DataFrame, known_fields: OrderedDict[str, CkanField] | OrderedDict[str, dict] | List[dict | CkanField] | None, *, inplace: bool = False) List[dict] | DataFrame
Main function to clean a list of records.
- Parameters:
records
known_fields
inplace
- Returns:
- clean_value_field(value: Any, field: CkanField) Any
Cleaning of a value. A value is directly the value of a cell.
- copy(dest=None) CkanDataCleanerUploadBasic
- create_new_field(field_name: str, values: Any | Series) CkanField
This method adds a new field definition
- detect_field_types_and_subs(records: List[dict] | DataFrame, known_fields: OrderedDict[str, CkanField] = None) OrderedDict[str, CkanField]
This function detects the initial fields and necessary field renamings
- static get_class_keyword() str
Returns the name of the class, according to data_cleaner_dict defined in data_cleaner_init.py. This name is used to set up the data cleaner for a resource builder.
- class ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_upload_1_basic.CkanDataCleanerUploadDigitalColumns
Bases:
CkanDataCleanerUploadBasic
- static get_class_keyword() str
Returns the name of the class, according to data_cleaner_dict defined in data_cleaner_init.py. This name is used to set up the data cleaner for a resource builder.
- ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_upload_1_basic._pd_series_type_instance_detect(values: Series, test_type: Type)
This function checks that every row of a pandas Series which is not NaN/None/NA is an instance of test_type.
- ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_upload_1_basic.default_cleaner() CkanDataCleanerABC
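The type check described for _pd_series_type_instance_detect can be approximated in plain Python. This sketch works on any iterable instead of a pandas Series, and the helper names `is_missing` and `all_present_values_match` are hypothetical. It recognises only `None` and float NaN as missing; pandas' `NA` is not handled here, and with a real Series one would typically call `Series.dropna()` first.

```python
import math
from typing import Any, Iterable, Type


def is_missing(value: Any) -> bool:
    """Return True for None and for float NaN (pandas NA is not covered)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def all_present_values_match(values: Iterable[Any], test_type: Type) -> bool:
    """Check that every non-missing value is an instance of test_type.

    A column containing only missing values matches any type, because
    there is nothing left to contradict it.
    """
    return all(isinstance(value, test_type)
               for value in values
               if not is_missing(value))


ints_with_gap = all_present_values_match([1, None, 3], int)
mixed_column = all_present_values_match([1, "2", float("nan")], int)
```

Two caveats of this sketch: `bool` is a subclass of `int` in Python, so a column of booleans also passes an `int` test, and an empty column passes every test.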
ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_upload_2_geom module
Adding support for geometries
- class ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_upload_2_geom.CkanDataCleanerUploadGeom
Bases:
CkanDataCleanerUploadBasic
- static get_class_keyword() str
Returns the name of the class, according to data_cleaner_dict defined in data_cleaner_init.py. This name is used to set up the data cleaner for a resource builder.
- ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_upload_2_geom.has_invalid_coordinates(shape: None) Tuple[bool, bool]
- ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_upload_2_geom.shapely_geometry_from_value(value: Any) None
ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_upload_3_assist module
Functions to clean data before upload.
- class ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_upload_3_assist.CkanDataCleanerUploadAssist
Bases:
CkanDataCleanerUploadGeom
Implementation which raises an exception if a data change is recommended by the data cleaner and assists in field typing.
- clean_value_field(value: Any, field: CkanField) Any
Cleaning of a value. A value is directly the value of a cell.
- static get_class_keyword() str
Returns the name of the class, according to data_cleaner_dict defined in data_cleaner_init.py. This name is used to set up the data cleaner for a resource builder.
ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_upload_3_check module
Functions to clean data before upload.
- class ckanapi_harvesters.harvesters.data_cleaner.data_cleaner_upload_3_check.CkanDataCleanerUploadCheckOnly
Bases:
CkanDataCleanerUploadGeom
Implementation which raises an exception if a data change is recommended by the data cleaner.
- clean_value_field(value: Any, field: CkanField) Any
Cleaning of a value. A value is directly the value of a cell.
- static get_class_keyword() str
Returns the name of the class, according to data_cleaner_dict defined in data_cleaner_init.py. This name is used to set up the data cleaner for a resource builder.
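The check-only behaviour (raise instead of silently changing a value) can be sketched as a wrapper around any cleaning function. Nothing below comes from the package: the exception name `DataChangeRecommended`, the wrapper `check_only`, and the example cleaner `digits_to_int` are hypothetical, and the exception actually raised by the library is not documented in this reference.

```python
from typing import Any, Callable


class DataChangeRecommended(Exception):
    """Hypothetical exception: the cleaner would have modified a value."""


def check_only(clean: Callable[[Any], Any]) -> Callable[[Any], Any]:
    """Wrap a cleaning function so that any recommended change raises.

    The value is considered changed if the cleaned result differs in
    value or in type. NaN compares unequal to itself, so a production
    version would need special handling for missing values.
    """
    def checked(value: Any) -> Any:
        cleaned = clean(value)
        if type(cleaned) is not type(value) or cleaned != value:
            raise DataChangeRecommended(
                f"value {value!r} would be changed to {cleaned!r}"
            )
        return value
    return checked


def digits_to_int(value: Any) -> Any:
    # Example cleaner: cast digit-only strings to integers.
    return int(value) if isinstance(value, str) and value.isdigit() else value


strict_clean = check_only(digits_to_int)
unchanged = strict_clean(5)  # already an int, passes through
```

Calling `strict_clean("5")` raises `DataChangeRecommended`, because the wrapped cleaner would cast the string to an integer. This makes the upload fail early so that the source data can be corrected upstream.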
Module contents
Section of the package dedicated to the conversion of records to a CKAN-compatible format. This is linked to the data harvesters.