AHFS class

class ahfs_class.ahfs.AHFS(k: int, data_bin: int = 5, target_bin: int = 2, save_precomp: bool = True, save_precomp_path: str | None = None, load_precomp_path: str | None = None, is_in_pipeline: bool = False, verbose: int = 1)
__init__(k: int, data_bin: int = 5, target_bin: int = 2, save_precomp: bool = True, save_precomp_path: str | None = None, load_precomp_path: str | None = None, is_in_pipeline: bool = False, verbose: int = 1)

Implements the Adaptive Hybrid Feature Selection (AHFS) algorithm by Viharos et al. https://doi.org/10.1016/j.patcog.2021.107932

Parameters:
  • k (int) – Number of features to select.

  • data_bin (int) – How many bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5.

  • target_bin (int) – How many bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2.

  • save_precomp (bool) – Whether to save all basic measures computed during the precomputing phase in a binary .npy file. Path is set by save_precomp_path. Default value is True.

  • save_precomp_path (str | None) – Path for saving the precomputed basic measures; used only if save_precomp is True. If None, the file is saved in the current directory under the name “measures_{time.time_ns()}.npy”. Default value is None.

  • load_precomp_path (str | None) – Path to load the precomputed basic measures from, skipping the precomputing phase. File must be a binary numpy file. If None, precomputing is not skipped. Default value is None.

  • is_in_pipeline (bool) – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. When True, transform() returns X and y instead of the selection results. Default value is False.

  • verbose (int) – Controls global verbosity, including that of the feature selection measures. 0: no output. 1: prints the execution time of each feature selection measure, the selected feature and metric, and the basic algorithm steps. 2: prints all steps. Default value is 1.
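
The binning and caching parameters above can be illustrated with a small NumPy sketch. It is not the library's implementation: discretize and load_or_compute_measures are hypothetical helpers, equal-width binning is an assumption (the parameter descriptions do not state the binning scheme), and the measures array is stand-in data. Only the rule that 0 disables discretization, the .npy format, and the default file name pattern come from the parameter descriptions.

```python
import os
import tempfile
import time

import numpy as np


def discretize(values, n_bins):
    """Hypothetical helper: equal-width binning of a 1-D array (an assumption)."""
    if n_bins == 0:
        # Documented behaviour: 0 disables discretization.
        return values
    edges = np.linspace(values.min(), values.max(), n_bins + 1)
    # Use interior edges only, so labels run from 0 to n_bins - 1.
    return np.digitize(values, edges[1:-1])


def load_or_compute_measures(compute, save_precomp=True,
                             save_precomp_path=None, load_precomp_path=None):
    """Hypothetical helper mirroring the documented save/load parameters."""
    if load_precomp_path is not None:
        # Loading skips the precomputing phase entirely.
        return np.load(load_precomp_path)
    measures = np.asarray(compute())
    if save_precomp:
        if save_precomp_path is None:
            # Documented default: current directory, timestamped file name.
            save_precomp_path = f"measures_{time.time_ns()}.npy"
        np.save(save_precomp_path, measures)
    return measures


# data_bin=5 and target_bin=2 applied to toy vectors.
feature_bins = discretize(np.arange(10.0), 5)
target_classes = discretize(np.array([0.1, 0.4, 0.6, 0.9]), 2)

# Compute once and save, then reload without recomputing.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "measures.npy")
    saved = load_or_compute_measures(lambda: np.arange(6.0).reshape(2, 3),
                                     save_precomp_path=path)
    loaded = load_or_compute_measures(None, load_precomp_path=path)
```

In the second call the compute argument is never used, which mirrors the description of load_precomp_path: when a file is supplied, the precomputing phase is skipped.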

entropies_wrapper(index: tuple[int, int] | tuple[int, int, int]) → list[float, tuple]

evaluate() → tuple[int, float | floating, float | floating]

Evaluates the selected feature set using a specific evaluator.

Returns:

The newly selected feature, the loss, and the accuracy.

Return type:

tuple[int, float, float]

fit(X: ndarray, y: ndarray) → None

Placeholder function to ensure compatibility with the scikit-learn interface.

nn_one_fold(candidate: int, train_index: ndarray, test_index: ndarray, nn_layers: list[[int, Callable[[float], float] | None]]) → tuple[int, float, float]

One fold of an evaluation. Used to run the evaluation phase in parallel when it executes on the CPU.

Parameters:
  • candidate (int) – Candidate feature index.

  • train_index (np.ndarray) – Row index of train samples.

  • test_index (np.ndarray) – Row index of test samples.

  • nn_layers (list[[int, Callable[[float], float]|None], ]) – Layers of the neural network.

Returns:

Candidate index, loss and accuracy associated with the candidate.

Return type:

tuple[int, float, float]
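
A minimal sketch of how per-fold evaluation can be spread over workers. one_fold and k_fold_indices are hypothetical helpers, not the library's nn_one_fold: a nearest-class-mean rule stands in for the neural network, zero-one loss stands in for the network loss, and a thread pool stands in for whatever parallel backend the library uses. Only the return shape (candidate index, loss, accuracy) follows the description above.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def one_fold(X, y, candidate, train_index, test_index):
    """Score one candidate feature on one fold; returns (candidate, loss, accuracy)."""
    x_train, x_test = X[train_index, candidate], X[test_index, candidate]
    y_train, y_test = y[train_index], y[test_index]
    classes = np.unique(y_train)
    # Stand-in model: per-class mean of the single candidate feature.
    centroids = np.array([x_train[y_train == c].mean() for c in classes])
    nearest = np.abs(x_test[:, None] - centroids[None, :]).argmin(axis=1)
    accuracy = float((classes[nearest] == y_test).mean())
    loss = 1.0 - accuracy  # zero-one loss, a stand-in for the network loss
    return candidate, loss, accuracy


def k_fold_indices(n_samples, n_folds, seed=0):
    """Yield (train_index, test_index) pairs over a shuffled sample order."""
    order = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(order, n_folds)
    for i in range(n_folds):
        train_index = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_index, folds[i]


# Toy data: feature 0 determines the class, feature 1 is noise.
x0 = np.linspace(-1.0, 1.0, 200)
X = np.column_stack([x0, np.random.default_rng(1).normal(size=200)])
y = (x0 > 0).astype(int)

folds = list(k_fold_indices(len(y), n_folds=5))
with ThreadPoolExecutor() as pool:
    # Each fold is independent, so the folds can be scored concurrently.
    results = list(pool.map(lambda fold: one_fold(X, y, 0, *fold), folds))

mean_accuracy = float(np.mean([acc for _, _, acc in results]))
```

Returning the candidate index with each result lets the caller match results to candidates when many candidates and folds are scored in one pool.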

transform(X: ndarray, y: ndarray) → tuple[list[int], list[float], list[float], dict[str, int], list[float], float] | tuple[ndarray, ndarray]

Applies the Adaptive Hybrid Feature Selection algorithm on the dataset.

Parameters:
  • X (np.ndarray) – Numpy array holding the data.

  • y (np.ndarray) – Target vector.

Returns:

If is_in_pipeline is False: a list of the selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration.

If is_in_pipeline is True: the original dataset reduced to the selected features, and the target variable.
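
The reduced dataset described for pipeline mode can be pictured with plain NumPy column indexing. The selected list below is made up for illustration; it stands in for the list of selected features that transform() returns when is_in_pipeline is False. Whether the library keeps the columns in selection order or in their original order is not stated, so the ordering here is an assumption.

```python
import numpy as np

X = np.arange(20.0).reshape(4, 5)  # toy dataset: 4 samples, 5 features
y = np.array([0, 1, 0, 1])         # toy target vector
selected = [3, 0]                  # made-up feature indices

# Keep only the selected columns; rows (samples) and y are unchanged.
X_reduced = X[:, selected]
```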