mobts.utils package

Submodules

mobts.utils.formatting module

Initial formatting of the dataset

This module contains:

  • standardizing input names

  • removing rows with missing counter names

  • removing counters which do not have sufficient present observations

  • transforming the timestamp column to datetime type

  • adding further temporal columns

mobts.utils.formatting._add_temporal_columns(df: DataFrame, freq: str, cols: ColumnsConfig = ColumnsConfig(counter='counter', timestamp='timestamp', count='count', weekday='weekday', week_num='week_num', how='how', hour='hour', date='date')) → DataFrame

Adding further temporal columns

  • df: Dataset

  • freq: Frequency of the data (daily or hourly)

  • cols: Column config

  • Returns, for daily frequency: Dataframe with additional date, weekday, and week number columns

  • Returns, for hourly frequency: all of the above, plus hour and hour-of-week (how) columns
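As a rough illustration of the columns described above, the following pandas sketch derives them from a datetime timestamp column. It is not the mobts implementation: the function name add_temporal_columns_sketch, the frequency strings "h" and "d", the Monday=0 weekday convention, ISO week numbering, and the formula how = weekday * 24 + hour are all assumptions.

```python
import pandas as pd


def add_temporal_columns_sketch(df: pd.DataFrame, freq: str) -> pd.DataFrame:
    """Hypothetical stand-in for _add_temporal_columns (assumed behavior)."""
    out = df.copy()
    ts = out["timestamp"]  # must already be a datetime column
    out["date"] = ts.dt.date
    out["weekday"] = ts.dt.weekday  # assumption: Monday=0 ... Sunday=6
    out["week_num"] = ts.dt.isocalendar().week.astype(int)  # assumption: ISO week
    if freq == "h":  # assumption: "h" marks hourly data
        out["hour"] = ts.dt.hour
        # assumption: hour of week runs from 0 (Monday 00:00) to 167
        out["how"] = out["weekday"] * 24 + out["hour"]
    return out


demo = pd.DataFrame(
    {
        "counter": ["A", "A"],
        "timestamp": pd.to_datetime(["2024-01-01 00:00", "2024-01-02 13:00"]),
        "count": [5, 7],
    }
)
hourly = add_temporal_columns_sketch(demo, freq="h")
daily = add_temporal_columns_sketch(demo, freq="d")
```

In this sketch the daily result carries only date, weekday, and week_num, while the hourly result additionally carries hour and how.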

mobts.utils.formatting._drop_nan_counters(df: DataFrame, cols: ColumnsConfig = ColumnsConfig(counter='counter', timestamp='timestamp', count='count', weekday='weekday', week_num='week_num', how='how', hour='hour', date='date')) → DataFrame

Remove rows with missing counter names

  • df: Dataset

  • cols: Column config

  • Dataframe with no missing counter names

  • Number of rows with missing counter names
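The step above can be sketched in pandas as follows. This is a minimal illustration, not the mobts code: the name drop_nan_counters_sketch is hypothetical, and returning the count of dropped rows alongside the dataframe is an assumption (the documented signature is annotated with DataFrame only, so the library may report the count differently).

```python
import pandas as pd


def drop_nan_counters_sketch(df: pd.DataFrame, counter_col: str = "counter"):
    """Hypothetical stand-in for _drop_nan_counters (assumed behavior)."""
    # Count the rows whose counter name is missing before dropping them.
    n_missing = int(df[counter_col].isna().sum())
    cleaned = df.dropna(subset=[counter_col]).reset_index(drop=True)
    return cleaned, n_missing


raw = pd.DataFrame(
    {
        "counter": ["A", None, "B"],
        "timestamp": ["2024-01-01", "2024-01-01", "2024-01-01"],
        "count": [3, 4, 5],
    }
)
cleaned, n_missing = drop_nan_counters_sketch(raw)
```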

mobts.utils.formatting._drop_sparse_counters(df: DataFrame, cols: ColumnsConfig = ColumnsConfig(counter='counter', timestamp='timestamp', count='count', weekday='weekday', week_num='week_num', how='how', hour='hour', date='date'), cfg_spr: SparsityConfig = SparsityConfig(drop_sparse_counters=True, sparse_threshold=0.5)) → DataFrame

Remove counters with sparse observations

  • df: Dataset

  • cols: Column config

  • cfg_spr: Sparsity config

  • Dataframe with no sparse counters
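One plausible reading of this step is sketched below. How mobts actually defines sparsity is not stated here, so the rule used in the sketch is an assumption: a counter is kept when the share of its rows with a non-missing count is at least sparse_threshold. The function name drop_sparse_counters_sketch is hypothetical.

```python
import pandas as pd


def drop_sparse_counters_sketch(
    df: pd.DataFrame, sparse_threshold: float = 0.5
) -> pd.DataFrame:
    """Hypothetical stand-in for _drop_sparse_counters (assumed rule)."""
    # Share of present (non-missing) counts per counter.
    present_share = df["count"].notna().groupby(df["counter"]).mean()
    # Assumption: counters below the threshold are considered sparse.
    keep = present_share[present_share >= sparse_threshold].index
    return df[df["counter"].isin(keep)].reset_index(drop=True)


raw = pd.DataFrame(
    {
        "counter": ["A"] * 4 + ["B"] * 4,
        "count": [1, 2, None, 4, None, None, None, 3],
    }
)
dense = drop_sparse_counters_sketch(raw, sparse_threshold=0.5)
```

Here counter A has 3 of 4 counts present and is kept, while counter B has 1 of 4 present and is dropped.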

mobts.utils.formatting._format_datetime(df: DataFrame, cols: ColumnsConfig = ColumnsConfig(counter='counter', timestamp='timestamp', count='count', weekday='weekday', week_num='week_num', how='how', hour='hour', date='date')) → DataFrame

Transform the timestamp column to datetime type

  • df: Dataset

  • cols: Column config

  • Dataframe with the timestamp column converted to datetime type
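A minimal sketch of this conversion, assuming the timestamps arrive as strings that pandas can parse. The name format_datetime_sketch is hypothetical, and any extra parsing options the library may pass (format strings, time zones) are not known from this page.

```python
import pandas as pd


def format_datetime_sketch(
    df: pd.DataFrame, timestamp_col: str = "timestamp"
) -> pd.DataFrame:
    """Hypothetical stand-in for _format_datetime (assumed behavior)."""
    out = df.copy()  # leave the caller's dataframe untouched
    out[timestamp_col] = pd.to_datetime(out[timestamp_col])
    return out


raw = pd.DataFrame(
    {
        "counter": ["A", "A"],
        "timestamp": ["2024-01-01 00:00:00", "2024-01-01 01:00:00"],
        "count": [5, 7],
    }
)
formatted = format_datetime_sketch(raw)
```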

mobts.utils.formatting._standardize_input(df: DataFrame, counter_col: str, timestamp_col: str, count_col: str, out_cols: ColumnsConfig = ColumnsConfig(counter='counter', timestamp='timestamp', count='count', weekday='weekday', week_num='week_num', how='how', hour='hour', date='date')) → DataFrame

Standardize raw input to canonical column names

  • df: Dataset

  • counter_col: Column for counter

  • timestamp_col: Column for timestamp

  • count_col: Column for counts

  • out_cols: Column config

  • Dataframe with standardized canonical names

  • This function receives its input externally at run time: the user must supply the three column names.
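The renaming described above might look like the following sketch. The name standardize_input_sketch and the raw column names in the example are hypothetical, and keeping only the three renamed columns is an assumption; the library may retain other columns.

```python
import pandas as pd


def standardize_input_sketch(
    df: pd.DataFrame, counter_col: str, timestamp_col: str, count_col: str
) -> pd.DataFrame:
    """Hypothetical stand-in for _standardize_input (assumed behavior)."""
    # Map the user-supplied names to the canonical defaults of ColumnsConfig.
    mapping = {
        counter_col: "counter",
        timestamp_col: "timestamp",
        count_col: "count",
    }
    # Assumption: only the three mapped columns are carried forward.
    return df[list(mapping)].rename(columns=mapping)


raw = pd.DataFrame(
    {
        "station_id": ["A", "B"],
        "measured_at": ["2024-01-01", "2024-01-01"],
        "n_bikes": [12, 30],
        "comment": ["ok", "ok"],
    }
)
standardized = standardize_input_sketch(
    raw, counter_col="station_id", timestamp_col="measured_at", count_col="n_bikes"
)
```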

Module contents