AHFS class
- class ahfs_class.ahfs.AHFS(k: int, data_bin: int = 5, target_bin: int = 2, save_precomp: bool = True, save_precomp_path: str | None = None, load_precomp_path: str | None = None, is_in_pipeline: bool = False, verbose: int = 1)
- __init__(k: int, data_bin: int = 5, target_bin: int = 2, save_precomp: bool = True, save_precomp_path: str | None = None, load_precomp_path: str | None = None, is_in_pipeline: bool = False, verbose: int = 1)
Implements the Adaptive Hybrid Feature Selection (AHFS) algorithm by Viharos et al. https://doi.org/10.1016/j.patcog.2021.107932
- Parameters:
k (int) – Number of features to select.
data_bin (int) – How many bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5.
target_bin (int) – How many bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2.
save_precomp (bool) – Whether to save all basic measures computed during the precomputing phase in a binary .npy file. Path is set by save_precomp_path. Default value is True.
save_precomp_path (str | None) – Path for saving the precomputed basic measures; only used if save_precomp is True. If None, the file is saved in the current directory with a file name of the form “measures_{time.time_ns()}.npy”. Default value is None.
load_precomp_path (str | None) – Path to load the precomputed basic measures from, skipping the precomputing phase. File must be a binary numpy file. If None, precomputing is not skipped. Default value is None.
is_in_pipeline (bool) – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False.
verbose (int) – Controls global verbosity, including that of the feature selection measures. 0: no output; 1: prints the execution time of each feature selection measure, the selected feature and metric, and the basic algorithm steps; 2: prints all steps. Default value is 1.
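The effect of data_bin, target_bin and save_precomp_path can be illustrated with a small standard-library sketch. It does not call the AHFS class: the helper names equal_width_bins and default_precomp_path are hypothetical, and equal-width binning is an assumption made for illustration, since the binning strategy is not stated here.

```python
import time


def equal_width_bins(values, n_bins):
    """Map each value to a bin index in [0, n_bins - 1].

    Assumption: equal-width bins. n_bins == 0 mirrors the documented
    "no discretization" case and returns the values unchanged.
    """
    if n_bins == 0:
        return list(values)
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant column: a single bin
    width = (hi - lo) / n_bins
    # The maximum would fall into bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def default_precomp_path(save_precomp_path=None):
    """Return the given path, or a timestamped file name in the current directory."""
    if save_precomp_path is not None:
        return save_precomp_path
    return f"measures_{time.time_ns()}.npy"


feature = [0.0, 1.0, 2.5, 5.0, 7.5, 10.0]
binned_feature = equal_width_bins(feature, 5)  # analogous to data_bin=5
binned_target = equal_width_bins(feature, 2)   # analogous to target_bin=2
print(binned_feature)  # [0, 0, 1, 2, 3, 4]
print(binned_target)   # [0, 0, 0, 1, 1, 1]
```

With target_bin = 2 the continuous values collapse into two classes, which is what "effectively sets the number of classes" refers to.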
- entropies_wrapper(index: tuple[int, int] | tuple[int, int, int]) → list[float, tuple]
- evaluate() → tuple[int, float | floating, float | floating]
Evaluates the selected feature set using a specific evaluator.
- Returns:
The newly selected feature, loss, accuracy.
- Return type:
tuple[int, float, float]
- fit(X: ndarray, y: ndarray) → None
Placeholder function that ensures compatibility with the scikit-learn interface.
- nn_one_fold(candidate: int, train_index: ndarray, test_index: ndarray, nn_layers: list[[int, Callable[[float], float] | None]]) → tuple[int, float, float]
One fold of an evaluation. Used to run the evaluation phase in parallel when the CPU is used.
- Parameters:
candidate (int) – Candidate feature index.
train_index (np.ndarray) – Row index of train samples.
test_index (np.ndarray) – Row index of test samples.
nn_layers (list[[int, Callable[[float], float] | None]]) – Layers of the neural network.
- Returns:
Candidate index, loss and accuracy associated with the candidate.
- Return type:
tuple[int, float, float]
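The per-fold contract described above (candidate index in; candidate index, loss and accuracy out) can be sketched with the standard library. This is not the library's implementation: the neural network is swapped for a nearest-class-mean classifier on the single candidate column, loss is taken to be the 0-1 error, folds are run on a thread pool, and the names make_folds, one_fold and evaluate_candidate are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def make_folds(n_samples, n_folds):
    """Split row indices into (train_index, test_index) pairs."""
    indices = list(range(n_samples))
    folds = []
    for f in range(n_folds):
        test_index = indices[f::n_folds]  # every n_folds-th row, offset f
        test_set = set(test_index)
        train_index = [i for i in indices if i not in test_set]
        folds.append((train_index, test_index))
    return folds


def one_fold(candidate, train_index, test_index, X, y):
    """Score one candidate feature on one fold (stand-in evaluator, not a NN)."""
    # Per-class mean of the candidate column, using training rows only.
    sums, counts = {}, {}
    for i in train_index:
        sums[y[i]] = sums.get(y[i], 0.0) + X[i][candidate]
        counts[y[i]] = counts.get(y[i], 0) + 1
    means = {label: sums[label] / counts[label] for label in sums}
    # Predict the class whose mean is closest to the test value.
    correct = 0
    for i in test_index:
        value = X[i][candidate]
        predicted = min(means, key=lambda label: abs(value - means[label]))
        if predicted == y[i]:
            correct += 1
    accuracy = correct / len(test_index)
    return candidate, 1.0 - accuracy, accuracy


def evaluate_candidate(candidate, X, y, n_folds=3):
    """Run all folds concurrently and average loss and accuracy."""
    folds = make_folds(len(X), n_folds)
    with ThreadPoolExecutor() as pool:
        results = list(
            pool.map(lambda fold: one_fold(candidate, fold[0], fold[1], X, y), folds)
        )
    mean_loss = sum(r[1] for r in results) / len(results)
    mean_acc = sum(r[2] for r in results) / len(results)
    return candidate, mean_loss, mean_acc


# Column 0 separates the classes; column 1 carries no class information.
X = [[0.1, 5], [0.2, 1], [0.3, 4], [0.9, 5], [1.0, 1], [1.1, 4]]
y = [0, 0, 0, 1, 1, 1]
print(evaluate_candidate(0, X, y))  # (0, 0.0, 1.0)
print(evaluate_candidate(1, X, y))  # (1, 0.5, 0.5)
```

Returning the candidate index alongside the scores lets results from concurrently running folds be matched back to the feature they belong to.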
- transform(X: ndarray, y: ndarray) → tuple[list[int], list[float], list[float], dict[str, int], list[float], float] | tuple[ndarray, ndarray]
Applies the Adaptive Hybrid Feature Selection algorithm on the dataset.
- Parameters:
X (np.ndarray) – Numpy array holding the data.
y (np.ndarray) – Target vector.
- Returns:
If is_in_pipeline is False: a list of the selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration. If is_in_pipeline is True: the original dataset restricted to the selected features, and the target variable.
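The pipeline-mode return value, a dataset containing only the selected features together with the target, amounts to a column selection. A minimal standard-library sketch follows; keep_selected_columns is a hypothetical helper, and keeping the columns in selection order is an assumption, since the column order of the returned array is not specified here.

```python
def keep_selected_columns(X, selected):
    """Return a copy of X with only the columns in `selected`, in that order."""
    return [[row[j] for j in selected] for row in X]


X = [[1.0, 10.0, 100.0],
     [2.0, 20.0, 200.0],
     [3.0, 30.0, 300.0]]
y = [0, 1, 0]

selected = [2, 0]  # hypothetical indices of the chosen features
X_reduced = keep_selected_columns(X, selected)
print(X_reduced)  # [[100.0, 1.0], [200.0, 2.0], [300.0, 3.0]]
print(y)          # the target is passed through unchanged
```

For a two-dimensional NumPy array the same selection is written as X[:, selected].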