Preset loader

class utils.helpers.loader.LoaderBase(name: str, path: str, target: list[int], header: None | int | list[int] = None, sep: str = ',', drop_columns: None | list[int] = None, drop_rows: None | list[int] = None)

Bases: object

__init__(name: str, path: str, target: list[int], header: None | int | list[int] = None, sep: str = ',', drop_columns: None | list[int] = None, drop_rows: None | list[int] = None)

Base class for processing a dataset and running AHFS on it. Only CSV files are supported.

Parameters:
  • name (str) – Name of the instance.

  • path (str) – Path to the CSV dataset.

  • target (list[int]) – List of indexes that define target column(s). A list size greater than 1 implies one-hot encoding.

  • header (None | int | list[int]) – Index or list of indexes that define header row(s). If None, no header is extracted from the data. Default value is None.

  • sep (str) – Separator character. Default value is “,”.

  • drop_columns (None | list[int]) – List of indexes that define which column(s) to remove. If None, no removal is done. Default value is None.

  • drop_rows (None | list[int]) – List of indexes that define which row(s) to remove. If None, no removal is done. Default value is None.
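To make the index-based parameters concrete, here is a minimal sketch that uses only Python's standard csv module. It is not the LoaderBase implementation: the helper name split_csv is hypothetical, and the sketch assumes that every index refers to a position in the raw file, before any row or column is removed.

```python
import csv
import io


def split_csv(text, target, header=None, sep=",", drop_columns=None, drop_rows=None):
    """Hypothetical helper: split CSV text into header rows, features, and targets.

    Assumption: all indexes refer to positions in the raw file,
    before any row or column is removed.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))

    # Normalise header to a list of row indexes (None means no header rows).
    if header is None:
        header_idx = []
    elif isinstance(header, int):
        header_idx = [header]
    else:
        header_idx = list(header)

    skip_rows = set(drop_rows or [])
    skip_cols = set(drop_columns or [])

    header_rows = [rows[i] for i in header_idx]
    # Data rows are everything that is neither a header row nor a dropped row.
    data_rows = [
        row for i, row in enumerate(rows)
        if i not in skip_rows and i not in header_idx
    ]

    features, targets = [], []
    for row in data_rows:
        # Feature columns exclude both the dropped columns and the target columns.
        features.append(
            [v for j, v in enumerate(row) if j not in skip_cols and j not in target]
        )
        targets.append([row[j] for j in target])
    return header_rows, features, targets


raw = "id;a;b;label\n0;1.0;2.0;yes\n1;3.0;4.0;no\n2;5.0;6.0;yes\n"
header_rows, X, y = split_csv(
    raw, target=[3], header=0, sep=";", drop_columns=[0], drop_rows=[2]
)
```

In this sketch, row 0 is taken as the header, raw row 2 is discarded, column 0 is removed from the features, and column 3 is returned separately as the target. Values stay as strings; any type conversion is left out.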

run(**kwargs) → tuple[list[int], list[float], list[float], dict[str, int]] | tuple[numpy.ndarray, numpy.ndarray]

Runs an AHFS instance on the loaded dataset.

Parameters:
  • kwargs – Overrides AHFS run parameters. The accepted keys are documented below.

  • k (int) – Number of features to select.

  • data_bin (int) – Number of bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5.

  • target_bin (int) – Number of bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2.

  • save_precomp (bool) – Whether to save all basic measures computed during the precomputing phase in a binary .npy file. The path is set by save_precomp_path. Default value is True.

  • save_precomp_path (str | None) – If save_precomp is True, sets the path for saving the precomputed basic measures. If None, the file is saved in the current directory with the filename format “measures_{time.time_ns()}.npy”. Default value is None.

  • load_precomp_path (str | None) – Path to load the precomputed basic measures from, skipping the precomputing phase. The file must be a binary NumPy file. If None, precomputing is not skipped. Default value is None.

  • is_in_pipeline (bool) – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False.

  • verbose (int) – Controls global verbosity, including feature selection measures. 0: no output; 1: prints the execution time of each feature selection measure, the selected feature and metric, and the basic algorithm steps; 2: prints all steps. Default value is 1.
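The way keyword overrides combine with the documented defaults can be sketched as follows. This is an illustration only, not the library's code: the names RUN_DEFAULTS and resolve_run_params are hypothetical. Treating k as required (no default is documented for it) and rejecting unknown keys are both assumptions.

```python
import time

# Defaults as documented above; k has no documented default.
RUN_DEFAULTS = {
    "data_bin": 5,
    "target_bin": 2,
    "save_precomp": True,
    "save_precomp_path": None,
    "load_precomp_path": None,
    "is_in_pipeline": False,
    "verbose": 1,
}


def resolve_run_params(**kwargs):
    """Hypothetical helper: merge caller overrides into the documented defaults."""
    unknown = set(kwargs) - set(RUN_DEFAULTS) - {"k"}
    if unknown:
        # Assumption: unknown names are rejected; the real method may ignore them.
        raise TypeError(f"unexpected run parameter(s): {sorted(unknown)}")
    if "k" not in kwargs:
        # Assumption: k is required because no default is documented.
        raise TypeError("k (number of features to select) must be given")

    params = {**RUN_DEFAULTS, **kwargs}

    # Documented fallback: a timestamped file name in the current directory.
    if params["save_precomp"] and params["save_precomp_path"] is None:
        params["save_precomp_path"] = f"measures_{time.time_ns()}.npy"
    return params


params = resolve_run_params(k=10, target_bin=3, verbose=0)
```

Here only target_bin and verbose are overridden; data_bin keeps its default of 5, and because save_precomp stays True with no explicit path, a timestamped “measures_….npy” name is generated.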

Returns:

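A tuple containing the list of selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration. The signature also allows a tuple of two NumPy arrays, presumably X and y when is_in_pipeline is True.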

save(path: str | None = None) → None

Saves the transformed preset dataset into a .csv file.

Parameters:

  • path (str | None) – Path to save the dataset to. If None, the dataset is saved in the working directory. Default value is None.