Preset configurations

class utils.presets.AbaloneDataset

Bases: LoaderBase

__init__()

Base class for processing datasets and running AHFS. Only CSV files are supported.

Parameters:

name (str) – Name of the instance.
path (str) – Path to the CSV dataset.
target (list[int]) – List of indexes that define target column(s). A list size greater than 1 implies one-hot encoding.
header (None | int | list[int]) – List of indexes that define header row(s). If None, no header is extracted from the data. Default value is None.
sep (str) – Separator character. Default value is “,”.
drop_columns (None | list[int]) – List of indexes that define which column(s) to remove. If None, no removal is done. Default value is None.
drop_rows (None | list[int]) – List of indexes that define which row(s) to remove. If None, no removal is done. Default value is None.
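
To make the loading options above concrete, here is a minimal standard-library sketch of how sep, header, drop_columns and drop_rows could be applied to CSV text. The helper name load_csv_rows is hypothetical and not part of utils.presets. The sketch assumes zero-based indexes, with header indexes referring to raw file rows and drop_rows indexes referring to data rows after the header is removed; the library itself may interpret them differently.

```python
import csv
import io


def load_csv_rows(text, sep=",", header=None, drop_columns=None, drop_rows=None):
    """Hypothetical helper: parse CSV text and apply the documented options.

    Assumption: all indexes are zero-based; header indexes refer to raw file
    rows, drop_rows indexes refer to data rows after header removal.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=sep))

    # Normalise header to a list of row indexes (None means no header rows).
    if header is None:
        header_idx = []
    elif isinstance(header, int):
        header_idx = [header]
    else:
        header_idx = list(header)
    header_set = set(header_idx)

    header_rows = [rows[i] for i in header_idx]
    data = [row for i, row in enumerate(rows) if i not in header_set]

    drop_r = set(drop_rows or [])
    drop_c = set(drop_columns or [])

    def keep_columns(row):
        # Remove the dropped column positions from a single row.
        return [cell for j, cell in enumerate(row) if j not in drop_c]

    data = [keep_columns(row) for i, row in enumerate(data) if i not in drop_r]
    header_rows = [keep_columns(row) for row in header_rows]
    return header_rows, data
```

With a semicolon-separated file whose first row is a header, load_csv_rows(text, sep=";", header=0, drop_columns=[0]) would return the header without its first column and the data rows as lists of strings.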

run(**kwargs) → tuple[list[int], list[float], list[float], dict[str, int]] | tuple[numpy.ndarray, numpy.ndarray]

Runs an AHFS instance on the loaded dataset.

Parameters:

kwargs – Overrides AHFS run parameters. See below for detailed documentation.
k (int) – Number of features to select.
data_bin (int) – Number of bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5.
target_bin (int) – Number of bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2.
save_precomp (bool) – Whether to save all basic measures computed during the precomputing phase to a binary .npy file. The path is set by save_precomp_path. Default value is True.
save_precomp_path (str | None) – Path for saving the precomputed basic measures when save_precomp is True. If None, the file is saved into the current directory with the filename format “measures_{time.time_ns()}.npy”. Default value is None.
load_precomp_path (str | None) – Path to load the precomputed basic measures from, skipping the precomputing phase. The file must be a binary NumPy file. If None, precomputing is not skipped. Default value is None.
is_in_pipeline (bool) – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False.
verbose (int) – Controls global verbosity, including feature selection measures. 0: no output; 1: prints the feature selection measure execution time, the selected feature and metric, and basic algorithm steps; 2: prints all steps. Default value is 1.
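
As an illustration of how keyword overrides could combine with the defaults listed above, the following sketch merges caller-supplied values over a dictionary of the documented defaults. The names DOCUMENTED_DEFAULTS and resolve_run_params are hypothetical; the real run() may resolve its parameters differently, and each preset may ship its own values. No default is documented for k, so the sketch accepts it only when given.

```python
# Defaults copied from the parameter list above; k has no documented default.
DOCUMENTED_DEFAULTS = {
    "data_bin": 5,
    "target_bin": 2,
    "save_precomp": True,
    "save_precomp_path": None,
    "load_precomp_path": None,
    "is_in_pipeline": False,
    "verbose": 1,
}
ALLOWED_KEYS = set(DOCUMENTED_DEFAULTS) | {"k"}


def resolve_run_params(**kwargs):
    """Hypothetical helper: merge caller overrides over the documented defaults."""
    unknown = set(kwargs) - ALLOWED_KEYS
    if unknown:
        # Reject misspelled keys early instead of silently ignoring them.
        raise TypeError(f"unknown run parameter(s): {sorted(unknown)}")
    return {**DOCUMENTED_DEFAULTS, **kwargs}
```

Calling resolve_run_params(k=3, verbose=0) keeps every documented default except verbose and adds k.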

Returns:

A tuple containing the list of selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration.

class utils.presets.BankDataset

Bases: LoaderBase

__init__()

Base class for processing datasets and running AHFS. Only CSV files are supported.

Parameters:

name (str) – Name of the instance.
path (str) – Path to the CSV dataset.
target (list[int]) – List of indexes that define target column(s). A list size greater than 1 implies one-hot encoding.
header (None | int | list[int]) – List of indexes that define header row(s). If None, no header is extracted from the data. Default value is None.
sep (str) – Separator character. Default value is “,”.
drop_columns (None | list[int]) – List of indexes that define which column(s) to remove. If None, no removal is done. Default value is None.
drop_rows (None | list[int]) – List of indexes that define which row(s) to remove. If None, no removal is done. Default value is None.
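
The target parameter states that more than one target column implies one-hot encoding. As a rough sketch of what that means for the data, the helper below collapses a group of one-hot target columns into a single class index per row. The function one_hot_to_labels is hypothetical; the source does not say how the library consumes the encoded columns internally.

```python
def one_hot_to_labels(rows, target):
    """Collapse one-hot target columns into a single class index per row.

    rows   -- sequence of rows, each a sequence of numeric values
    target -- list of column indexes that together form the one-hot target
    """
    labels = []
    for row in rows:
        flags = [float(row[i]) for i in target]
        # A valid one-hot group has exactly one 1 and zeros elsewhere.
        if sum(flags) != 1.0 or any(f not in (0.0, 1.0) for f in flags):
            raise ValueError(f"row is not one-hot over columns {target}: {row}")
        labels.append(flags.index(1.0))
    return labels
```

The class index is the position within target, not the column index in the file.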

run(**kwargs) → tuple[list[int], list[float], list[float], dict[str, int]] | tuple[numpy.ndarray, numpy.ndarray]

Runs an AHFS instance on the loaded dataset.

Parameters:

kwargs – Overrides AHFS run parameters. See below for detailed documentation.
k (int) – Number of features to select.
data_bin (int) – Number of bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5.
target_bin (int) – Number of bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2.
save_precomp (bool) – Whether to save all basic measures computed during the precomputing phase to a binary .npy file. The path is set by save_precomp_path. Default value is True.
save_precomp_path (str | None) – Path for saving the precomputed basic measures when save_precomp is True. If None, the file is saved into the current directory with the filename format “measures_{time.time_ns()}.npy”. Default value is None.
load_precomp_path (str | None) – Path to load the precomputed basic measures from, skipping the precomputing phase. The file must be a binary NumPy file. If None, precomputing is not skipped. Default value is None.
is_in_pipeline (bool) – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False.
verbose (int) – Controls global verbosity, including feature selection measures. 0: no output; 1: prints the feature selection measure execution time, the selected feature and metric, and basic algorithm steps; 2: prints all steps. Default value is 1.
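
The data_bin and target_bin parameters describe discretization into a fixed number of bins, with 0 disabling it. The source does not state which binning scheme AHFS uses, so the sketch below assumes simple equal-width binning purely for illustration; discretize is a hypothetical name.

```python
def discretize(values, n_bins):
    """Equal-width binning sketch; n_bins == 0 returns the values unchanged.

    Assumption: equal-width bins over [min, max]. The library may use a
    different scheme (for example equal-frequency bins).
    """
    if n_bins == 0:
        return list(values)
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; put everything in bin 0.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # Clamp so the maximum value lands in the last bin instead of bin n_bins.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]
```

With target_bin set to 2, a continuous target is reduced to two classes under this scheme, which matches the documented remark that target_bin effectively sets the number of classes.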

Returns:

A tuple containing the list of selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration.

class utils.presets.CalculatedCuttingFcDataset

Bases: LoaderBase

__init__()

Base class for processing datasets and running AHFS. Only CSV files are supported.

Parameters:

name (str) – Name of the instance.
path (str) – Path to the CSV dataset.
target (list[int]) – List of indexes that define target column(s). A list size greater than 1 implies one-hot encoding.
header (None | int | list[int]) – List of indexes that define header row(s). If None, no header is extracted from the data. Default value is None.
sep (str) – Separator character. Default value is “,”.
drop_columns (None | list[int]) – List of indexes that define which column(s) to remove. If None, no removal is done. Default value is None.
drop_rows (None | list[int]) – List of indexes that define which row(s) to remove. If None, no removal is done. Default value is None.

run(**kwargs) → tuple[list[int], list[float], list[float], dict[str, int]] | tuple[numpy.ndarray, numpy.ndarray]

Runs an AHFS instance on the loaded dataset.

Parameters:

kwargs – Overrides AHFS run parameters. See below for detailed documentation.
k (int) – Number of features to select.
data_bin (int) – Number of bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5.
target_bin (int) – Number of bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2.
save_precomp (bool) – Whether to save all basic measures computed during the precomputing phase to a binary .npy file. The path is set by save_precomp_path. Default value is True.
save_precomp_path (str | None) – Path for saving the precomputed basic measures when save_precomp is True. If None, the file is saved into the current directory with the filename format “measures_{time.time_ns()}.npy”. Default value is None.
load_precomp_path (str | None) – Path to load the precomputed basic measures from, skipping the precomputing phase. The file must be a binary NumPy file. If None, precomputing is not skipped. Default value is None.
is_in_pipeline (bool) – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False.
verbose (int) – Controls global verbosity, including feature selection measures. 0: no output; 1: prints the feature selection measure execution time, the selected feature and metric, and basic algorithm steps; 2: prints all steps. Default value is 1.
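
The interaction between save_precomp and save_precomp_path can be summarised in a few lines. The sketch below only resolves the output path and does not write any file; resolve_precomp_path is a hypothetical helper, and the fallback filename follows the format quoted in the parameter description.

```python
import os
import time


def resolve_precomp_path(save_precomp=True, save_precomp_path=None):
    """Return the path the measures would be saved to, or None if saving is off."""
    if not save_precomp:
        return None
    if save_precomp_path is None:
        # Documented fallback: current directory, nanosecond-timestamped name.
        return os.path.join(os.getcwd(), f"measures_{time.time_ns()}.npy")
    return save_precomp_path
```

Because the fallback name embeds time.time_ns(), each run without an explicit path produces a new file, so passing the saved path as load_precomp_path on later runs requires recording that name.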

Returns:

A tuple containing the list of selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration.

class utils.presets.CalculatedCuttingPDataset

Bases: LoaderBase

__init__()

Base class for processing datasets and running AHFS. Only CSV files are supported.

Parameters:

name (str) – Name of the instance.
path (str) – Path to the CSV dataset.
target (list[int]) – List of indexes that define target column(s). A list size greater than 1 implies one-hot encoding.
header (None | int | list[int]) – List of indexes that define header row(s). If None, no header is extracted from the data. Default value is None.
sep (str) – Separator character. Default value is “,”.
drop_columns (None | list[int]) – List of indexes that define which column(s) to remove. If None, no removal is done. Default value is None.
drop_rows (None | list[int]) – List of indexes that define which row(s) to remove. If None, no removal is done. Default value is None.

run(**kwargs) → tuple[list[int], list[float], list[float], dict[str, int]] | tuple[numpy.ndarray, numpy.ndarray]

Runs an AHFS instance on the loaded dataset.

Parameters:

kwargs – Overrides AHFS run parameters. See below for detailed documentation.
k (int) – Number of features to select.
data_bin (int) – Number of bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5.
target_bin (int) – Number of bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2.
save_precomp (bool) – Whether to save all basic measures computed during the precomputing phase to a binary .npy file. The path is set by save_precomp_path. Default value is True.
save_precomp_path (str | None) – Path for saving the precomputed basic measures when save_precomp is True. If None, the file is saved into the current directory with the filename format “measures_{time.time_ns()}.npy”. Default value is None.
load_precomp_path (str | None) – Path to load the precomputed basic measures from, skipping the precomputing phase. The file must be a binary NumPy file. If None, precomputing is not skipped. Default value is None.
is_in_pipeline (bool) – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False.
verbose (int) – Controls global verbosity, including feature selection measures. 0: no output; 1: prints the feature selection measure execution time, the selected feature and metric, and basic algorithm steps; 2: prints all steps. Default value is 1.
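
The three verbose levels behave like a threshold: a message is printed only if the configured level is at least the level the message belongs to. A minimal sketch of that gating, assuming nothing about how the library actually prints, could look like this (make_logger is a hypothetical name):

```python
def make_logger(verbose=1):
    """Return a log(level, message) function gated by the verbose setting.

    Level 1 stands for the timing, selected feature, metric and basic step
    messages; level 2 stands for the remaining, more detailed step messages.
    """
    def log(level, message):
        if verbose >= level:
            print(message)
            return True  # message was emitted
        return False     # message was suppressed
    return log
```

With verbose=0 every call is suppressed, and with verbose=2 both level 1 and level 2 messages are printed.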

Returns:

A tuple containing the list of selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration.

class utils.presets.CalculatedCuttingRaDataset

Bases: LoaderBase

__init__()

Base class for processing datasets and running AHFS. Only CSV files are supported.

Parameters:

name (str) – Name of the instance.
path (str) – Path to the CSV dataset.
target (list[int]) – List of indexes that define target column(s). A list size greater than 1 implies one-hot encoding.
header (None | int | list[int]) – List of indexes that define header row(s). If None, no header is extracted from the data. Default value is None.
sep (str) – Separator character. Default value is “,”.
drop_columns (None | list[int]) – List of indexes that define which column(s) to remove. If None, no removal is done. Default value is None.
drop_rows (None | list[int]) – List of indexes that define which row(s) to remove. If None, no removal is done. Default value is None.

run(**kwargs) → tuple[list[int], list[float], list[float], dict[str, int]] | tuple[numpy.ndarray, numpy.ndarray]

Runs an AHFS instance on the loaded dataset.

Parameters:

kwargs – Overrides AHFS run parameters. See below for detailed documentation.
k (int) – Number of features to select.
data_bin (int) – Number of bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5.
target_bin (int) – Number of bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2.
save_precomp (bool) – Whether to save all basic measures computed during the precomputing phase to a binary .npy file. The path is set by save_precomp_path. Default value is True.
save_precomp_path (str | None) – Path for saving the precomputed basic measures when save_precomp is True. If None, the file is saved into the current directory with the filename format “measures_{time.time_ns()}.npy”. Default value is None.
load_precomp_path (str | None) – Path to load the precomputed basic measures from, skipping the precomputing phase. The file must be a binary NumPy file. If None, precomputing is not skipped. Default value is None.
is_in_pipeline (bool) – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False.
verbose (int) – Controls global verbosity, including feature selection measures. 0: no output; 1: prints the feature selection measure execution time, the selected feature and metric, and basic algorithm steps; 2: prints all steps. Default value is 1.

Returns:

A tuple containing the list of selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration.

class utils.presets.CalculatedCuttingTDataset

Bases: LoaderBase

__init__()

Base class for processing datasets and running AHFS. Only CSV files are supported.

Parameters:

name (str) – Name of the instance.
path (str) – Path to the CSV dataset.
target (list[int]) – List of indexes that define target column(s). A list size greater than 1 implies one-hot encoding.
header (None | int | list[int]) – List of indexes that define header row(s). If None, no header is extracted from the data. Default value is None.
sep (str) – Separator character. Default value is “,”.
drop_columns (None | list[int]) – List of indexes that define which column(s) to remove. If None, no removal is done. Default value is None.
drop_rows (None | list[int]) – List of indexes that define which row(s) to remove. If None, no removal is done. Default value is None.

run(**kwargs) → tuple[list[int], list[float], list[float], dict[str, int]] | tuple[numpy.ndarray, numpy.ndarray]

Runs an AHFS instance on the loaded dataset.

Parameters:

kwargs – Overrides AHFS run parameters. See below for detailed documentation.
k (int) – Number of features to select.
data_bin (int) – Number of bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5.
target_bin (int) – Number of bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2.
save_precomp (bool) – Whether to save all basic measures computed during the precomputing phase to a binary .npy file. The path is set by save_precomp_path. Default value is True.
save_precomp_path (str | None) – Path for saving the precomputed basic measures when save_precomp is True. If None, the file is saved into the current directory with the filename format “measures_{time.time_ns()}.npy”. Default value is None.
load_precomp_path (str | None) – Path to load the precomputed basic measures from, skipping the precomputing phase. The file must be a binary NumPy file. If None, precomputing is not skipped. Default value is None.
is_in_pipeline (bool) – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False.
verbose (int) – Controls global verbosity, including feature selection measures. 0: no output; 1: prints the feature selection measure execution time, the selected feature and metric, and basic algorithm steps; 2: prints all steps. Default value is 1.

Returns:

A tuple containing the list of selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration.

class utils.presets.CarDataset

Bases: LoaderBase

__init__()

Base class for processing datasets and running AHFS. Only CSV files are supported.

Parameters:

name (str) – Name of the instance.
path (str) – Path to the CSV dataset.
target (list[int]) – List of indexes that define target column(s). A list size greater than 1 implies one-hot encoding.
header (None | int | list[int]) – List of indexes that define header row(s). If None, no header is extracted from the data. Default value is None.
sep (str) – Separator character. Default value is “,”.
drop_columns (None | list[int]) – List of indexes that define which column(s) to remove. If None, no removal is done. Default value is None.
drop_rows (None | list[int]) – List of indexes that define which row(s) to remove. If None, no removal is done. Default value is None.

run(**kwargs) → tuple[list[int], list[float], list[float], dict[str, int]] | tuple[numpy.ndarray, numpy.ndarray]

Runs an AHFS instance on the loaded dataset.

Parameters:

kwargs – Overrides AHFS run parameters. See below for detailed documentation.
k (int) – Number of features to select.
data_bin (int) – Number of bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5.
target_bin (int) – Number of bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2.
save_precomp (bool) – Whether to save all basic measures computed during the precomputing phase to a binary .npy file. The path is set by save_precomp_path. Default value is True.
save_precomp_path (str | None) – Path for saving the precomputed basic measures when save_precomp is True. If None, the file is saved into the current directory with the filename format “measures_{time.time_ns()}.npy”. Default value is None.
load_precomp_path (str | None) – Path to load the precomputed basic measures from, skipping the precomputing phase. The file must be a binary NumPy file. If None, precomputing is not skipped. Default value is None.
is_in_pipeline (bool) – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False.
verbose (int) – Controls global verbosity, including feature selection measures. 0: no output; 1: prints the feature selection measure execution time, the selected feature and metric, and basic algorithm steps; 2: prints all steps. Default value is 1.

Returns:

A tuple containing the list of selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration.

class utils.presets.CommCrimeDataset

Bases: LoaderBase

__init__()

Base class for processing datasets and running AHFS. Only CSV files are supported.

Parameters:

name (str) – Name of the instance.
path (str) – Path to the CSV dataset.
target (list[int]) – List of indexes that define target column(s). A list size greater than 1 implies one-hot encoding.
header (None | int | list[int]) – List of indexes that define header row(s). If None, no header is extracted from the data. Default value is None.
sep (str) – Separator character. Default value is “,”.
drop_columns (None | list[int]) – List of indexes that define which column(s) to remove. If None, no removal is done. Default value is None.
drop_rows (None | list[int]) – List of indexes that define which row(s) to remove. If None, no removal is done. Default value is None.

run(**kwargs) → tuple[list[int], list[float], list[float], dict[str, int]] | tuple[numpy.ndarray, numpy.ndarray]

Runs an AHFS instance on the loaded dataset.

Parameters:

kwargs – Overrides AHFS run parameters. See below for detailed documentation.
k (int) – Number of features to select.
data_bin (int) – Number of bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5.
target_bin (int) – Number of bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2.
save_precomp (bool) – Whether to save all basic measures computed during the precomputing phase to a binary .npy file. The path is set by save_precomp_path. Default value is True.
save_precomp_path (str | None) – Path for saving the precomputed basic measures when save_precomp is True. If None, the file is saved into the current directory with the filename format “measures_{time.time_ns()}.npy”. Default value is None.
load_precomp_path (str | None) – Path to load the precomputed basic measures from, skipping the precomputing phase. The file must be a binary NumPy file. If None, precomputing is not skipped. Default value is None.
is_in_pipeline (bool) – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False.
verbose (int) – Controls global verbosity, including feature selection measures. 0: no output; 1: prints the feature selection measure execution time, the selected feature and metric, and basic algorithm steps; 2: prints all steps. Default value is 1.

Returns:

A tuple containing the list of selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration.

class utils.presets.ForestFiresDataset

Bases: LoaderBase

__init__()

Base class for processing datasets and running AHFS. Only CSV files are supported.

Parameters:

name (str) – Name of the instance.
path (str) – Path to the CSV dataset.
target (list[int]) – List of indexes that define target column(s). A list size greater than 1 implies one-hot encoding.
header (None | int | list[int]) – List of indexes that define header row(s). If None, no header is extracted from the data. Default value is None.
sep (str) – Separator character. Default value is “,”.
drop_columns (None | list[int]) – List of indexes that define which column(s) to remove. If None, no removal is done. Default value is None.
drop_rows (None | list[int]) – List of indexes that define which row(s) to remove. If None, no removal is done. Default value is None.

run(**kwargs) → tuple[list[int], list[float], list[float], dict[str, int]] | tuple[numpy.ndarray, numpy.ndarray]

Runs an AHFS instance on the loaded dataset.

Parameters:

kwargs – Overrides AHFS run parameters. See below for detailed documentation.
k (int) – Number of features to select.
data_bin (int) – Number of bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5.
target_bin (int) – Number of bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2.
save_precomp (bool) – Whether to save all basic measures computed during the precomputing phase to a binary .npy file. The path is set by save_precomp_path. Default value is True.
save_precomp_path (str | None) – Path for saving the precomputed basic measures when save_precomp is True. If None, the file is saved into the current directory with the filename format “measures_{time.time_ns()}.npy”. Default value is None.
load_precomp_path (str | None) – Path to load the precomputed basic measures from, skipping the precomputing phase. The file must be a binary NumPy file. If None, precomputing is not skipped. Default value is None.
is_in_pipeline (bool) – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False.
verbose (int) – Controls global verbosity, including feature selection measures. 0: no output; 1: prints the feature selection measure execution time, the selected feature and metric, and basic algorithm steps; 2: prints all steps. Default value is 1.

Returns:

A tuple containing the list of selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration.

class utils.presets.HousingDataset

Bases: LoaderBase

__init__()

Base class for processing datasets and running AHFS. Only CSV files are supported.

Parameters:

name (str) – Name of the instance.
path (str) – Path to the CSV dataset.
target (list[int]) – List of indexes that define target column(s). A list size greater than 1 implies one-hot encoding.
header (None | int | list[int]) – List of indexes that define header row(s). If None, no header is extracted from the data. Default value is None.
sep (str) – Separator character. Default value is “,”.
drop_columns (None | list[int]) – List of indexes that define which column(s) to remove. If None, no removal is done. Default value is None.
drop_rows (None | list[int]) – List of indexes that define which row(s) to remove. If None, no removal is done. Default value is None.

run(**kwargs) → tuple[list[int], list[float], list[float], dict[str, int]] | tuple[numpy.ndarray, numpy.ndarray]

Runs an AHFS instance on the loaded dataset.

Parameters:

kwargs – Overrides AHFS run parameters. See below for detailed documentation.
k (int) – Number of features to select.
data_bin (int) – Number of bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5.
target_bin (int) – Number of bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2.
save_precomp (bool) – Whether to save all basic measures computed during the precomputing phase to a binary .npy file. The path is set by save_precomp_path. Default value is True.
save_precomp_path (str | None) – Path for saving the precomputed basic measures when save_precomp is True. If None, the file is saved into the current directory with the filename format “measures_{time.time_ns()}.npy”. Default value is None.
load_precomp_path (str | None) – Path to load the precomputed basic measures from, skipping the precomputing phase. The file must be a binary NumPy file. If None, precomputing is not skipped. Default value is None.
is_in_pipeline (bool) – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False.
verbose (int) – Controls global verbosity, including feature selection measures. 0: no output; 1: prints the feature selection measure execution time, the selected feature and metric, and basic algorithm steps; 2: prints all steps. Default value is 1.

Returns:

A tuple containing the list of selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration.

class utils.presets.IrisDataset

Bases: LoaderBase

__init__()

Base class for processing datasets and running AHFS. Only CSV files are supported.

Parameters:

name (str) – Name of the instance.
path (str) – Path to the CSV dataset.
target (list[int]) – List of indexes that define target column(s). A list size greater than 1 implies one-hot encoding.
header (None | int | list[int]) – List of indexes that define header row(s). If None, no header is extracted from the data. Default value is None.
sep (str) – Separator character. Default value is “,”.
drop_columns (None | list[int]) – List of indexes that define which column(s) to remove. If None, no removal is done. Default value is None.
drop_rows (None | list[int]) – List of indexes that define which row(s) to remove. If None, no removal is done. Default value is None.

run(**kwargs) → tuple[list[int], list[float], list[float], dict[str, int]] | tuple[numpy.ndarray, numpy.ndarray]

Runs an AHFS instance on the loaded dataset.

Parameters:

kwargs – Overrides AHFS run parameters. See below for detailed documentation.
k (int) – Number of features to select.
data_bin (int) – Number of bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5.
target_bin (int) – Number of bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2.
save_precomp (bool) – Whether to save all basic measures computed during the precomputing phase to a binary .npy file. The path is set by save_precomp_path. Default value is True.
save_precomp_path (str | None) – Path for saving the precomputed basic measures when save_precomp is True. If None, the file is saved into the current directory with the filename format “measures_{time.time_ns()}.npy”. Default value is None.
load_precomp_path (str | None) – Path to load the precomputed basic measures from, skipping the precomputing phase. The file must be a binary NumPy file. If None, precomputing is not skipped. Default value is None.
is_in_pipeline (bool) – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False.
verbose (int) – Controls global verbosity, including feature selection measures. 0: no output; 1: prints the feature selection measure execution time, the selected feature and metric, and basic algorithm steps; 2: prints all steps. Default value is 1.

Returns:

A tuple containing the list of selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration.

class utils.presets.MNISTTrain

Bases: LoaderBase

__init__()

Base class for processing datasets and running AHFS. Only CSV files are supported.

Parameters:

name (str) – Name of the instance.
path (str) – Path to the CSV dataset.
target (list[int]) – List of indexes that define target column(s). A list size greater than 1 implies one-hot encoding.
header (None | int | list[int]) – List of indexes that define header row(s). If None, no header is extracted from the data. Default value is None.
sep (str) – Separator character. Default value is “,”.
drop_columns (None | list[int]) – List of indexes that define which column(s) to remove. If None, no removal is done. Default value is None.
drop_rows (None | list[int]) – List of indexes that define which row(s) to remove. If None, no removal is done. Default value is None.

run(**kwargs) → tuple[list[int], list[float], list[float], dict[str, int]] | tuple[numpy.ndarray, numpy.ndarray]

Runs an AHFS instance on the loaded dataset.

Parameters:

kwargs – Overrides AHFS run parameters. See below for detailed documentation.
k – Number of features to select. int
data_bin – How many bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5. int
target_bin – How many bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2. int
save_precomp – Whether to save all basic measures computed during the precomputing phase in a binary .npy file. Path is set by save_precomp_path. Default value is True. bool
save_precomp_path – If save_precomp is True, sets the path for saving the precomputed basic measures. If this variable’s value is None, saves into the current directory with filename format “measures_{time.time_ns()}.npy”. Default value is None. str | None
load_precomp_path – Path to load the precomputed basic measures from, skipping the precomputing phase. File must be a binary numpy file. If None, precomputing is not skipped. Default value is None. str | None
is_in_pipeline – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False. bool
verbose – Controls global verbosity, including feature selection measures. 0: no output; 1: prints feature selection measure execution time, the selected feature and metric, and basic algorithm steps; 2: prints all steps. Default value is 1. int

Returns:

A list of the selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration.
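
Since `run()` takes its options as keyword overrides, the effective configuration can be pictured as the documented defaults with the caller's values layered on top. The sketch below illustrates that pattern only. `resolve_run_params` and `DOCUMENTED_DEFAULTS` are hypothetical names, and rejecting unknown keys is an assumption rather than documented behaviour.

```python
# Defaults as documented above; `k` has no documented default, so it is left unset.
DOCUMENTED_DEFAULTS = {
    "data_bin": 5,
    "target_bin": 2,
    "save_precomp": True,
    "save_precomp_path": None,
    "load_precomp_path": None,
    "is_in_pipeline": False,
    "verbose": 1,
}


def resolve_run_params(**kwargs):
    """Hypothetical helper: merge caller overrides into the documented defaults."""
    allowed = set(DOCUMENTED_DEFAULTS) | {"k"}
    unknown = set(kwargs) - allowed
    if unknown:
        # Assumption: unknown names are treated as errors.
        raise TypeError(f"unexpected run parameter(s): {sorted(unknown)}")
    return {**DOCUMENTED_DEFAULTS, **kwargs}


params = resolve_run_params(k=3, data_bin=0, verbose=0)
print(params["k"], params["data_bin"], params["target_bin"], params["verbose"])
```

Only the three overridden values change; `target_bin` and the remaining options keep their documented defaults.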

class utils.presets.MeasuredCuttingFcDataset¶

Bases: LoaderBase

__init__()¶

Base class for dataset processing and AHFS running. Only CSV files are supported.

Parameters:

name (str) – Name of the instance.
path (str) – Path to the CSV dataset.
target (list[int]) – List of indexes that define target column(s). A list size greater than 1 implies one-hot encoding.
header (None | int | list[int]) – List of indexes that define header row(s). If None, no header is extracted from the data. Default value is None.
sep (str) – Separator character. Default value is “,”.
drop_columns (None | list[int]) – List of indexes that define which column(s) to remove. If None, no removal is done. Default value is None.
drop_rows (None | list[int]) – List of indexes that define which row(s) to remove. If None, no removal is done. Default value is None.
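
When `target` lists more than one column, those columns are taken to be a one-hot encoding of the class. As a rough illustration of what that means for the data, the sketch below converts one-hot rows into class indexes. `one_hot_to_labels` is a hypothetical helper; how the library itself consumes these columns is not specified on this page.

```python
def one_hot_to_labels(target_rows):
    """Hypothetical helper: map each one-hot row to the index of its single 1."""
    labels = []
    for row in target_rows:
        is_binary = all(v in (0, 1) for v in row)
        if not is_binary or sum(row) != 1:
            raise ValueError(f"not a one-hot row: {row}")
        labels.append(row.index(1))
    return labels


# Three target columns, as would result from a three-element `target` list.
rows = [[1, 0, 0], [0, 0, 1], [0, 1, 0]]
labels = one_hot_to_labels(rows)
print(labels)
```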

run(**kwargs) → tuple[list[int], list[float], list[float], dict[str, int]] | tuple[numpy.ndarray, numpy.ndarray]¶

Function for running an AHFS instance on the loaded dataset.

Parameters:

kwargs – Overrides AHFS run parameters. See below for detailed documentation.
k – Number of features to select. int
data_bin – How many bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5. int
target_bin – How many bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2. int
save_precomp – Whether to save all basic measures computed during the precomputing phase in a binary .npy file. Path is set by save_precomp_path. Default value is True. bool
save_precomp_path – If save_precomp is True, sets the path for saving the precomputed basic measures. If this variable’s value is None, saves into the current directory with filename format “measures_{time.time_ns()}.npy”. Default value is None. str | None
load_precomp_path – Path to load the precomputed basic measures from, skipping the precomputing phase. File must be a binary numpy file. If None, precomputing is not skipped. Default value is None. str | None
is_in_pipeline – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False. bool
verbose – Controls global verbosity, including feature selection measures. 0: no output; 1: prints feature selection measure execution time, the selected feature and metric, and basic algorithm steps; 2: prints all steps. Default value is 1. int

Returns:

A list of the selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration.
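
The `data_bin` and `target_bin` options describe discretization into a fixed number of bins, with 0 disabling it. This page does not say which binning scheme is used, so the sketch below assumes equal-width bins purely for illustration. `discretize` is a hypothetical helper.

```python
def discretize(values, bins):
    """Hypothetical helper: equal-width binning; bins == 0 leaves values unchanged."""
    if bins == 0:
        return list(values)
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column has a single bin.
        return [0 for _ in values]
    width = (hi - lo) / bins
    # Clamp so the maximum falls into the last bin instead of one past it.
    return [min(int((v - lo) / width), bins - 1) for v in values]


target = [0.0, 1.0, 2.0, 3.0, 4.0]
classes = discretize(target, 2)  # two bins, so two classes
raw = discretize(target, 0)      # 0 disables discretization
print(classes)
print(raw)
```

With `bins=2` the target collapses to two class labels, which matches the documented remark that `target_bin` effectively sets the number of classes.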

class utils.presets.MeasuredCuttingPDataset¶

Bases: LoaderBase

__init__()¶

Base class for dataset processing and AHFS running. Only CSV files are supported.

Parameters:

name (str) – Name of the instance.
path (str) – Path to the CSV dataset.
target (list[int]) – List of indexes that define target column(s). A list size greater than 1 implies one-hot encoding.
header (None | int | list[int]) – List of indexes that define header row(s). If None, no header is extracted from the data. Default value is None.
sep (str) – Separator character. Default value is “,”.
drop_columns (None | list[int]) – List of indexes that define which column(s) to remove. If None, no removal is done. Default value is None.
drop_rows (None | list[int]) – List of indexes that define which row(s) to remove. If None, no removal is done. Default value is None.

run(**kwargs) → tuple[list[int], list[float], list[float], dict[str, int]] | tuple[numpy.ndarray, numpy.ndarray]¶

Function for running an AHFS instance on the loaded dataset.

Parameters:

kwargs – Overrides AHFS run parameters. See below for detailed documentation.
k – Number of features to select. int
data_bin – How many bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5. int
target_bin – How many bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2. int
save_precomp – Whether to save all basic measures computed during the precomputing phase in a binary .npy file. Path is set by save_precomp_path. Default value is True. bool
save_precomp_path – If save_precomp is True, sets the path for saving the precomputed basic measures. If this variable’s value is None, saves into the current directory with filename format “measures_{time.time_ns()}.npy”. Default value is None. str | None
load_precomp_path – Path to load the precomputed basic measures from, skipping the precomputing phase. File must be a binary numpy file. If None, precomputing is not skipped. Default value is None. str | None
is_in_pipeline – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False. bool
verbose – Controls global verbosity, including feature selection measures. 0: no output; 1: prints feature selection measure execution time, the selected feature and metric, and basic algorithm steps; 2: prints all steps. Default value is 1. int

Returns:

A list of the selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration.
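
The interaction between `save_precomp` and `save_precomp_path` can be summarised as a small path-resolution rule. The sketch below follows the documented fallback filename format. `resolve_precomp_path` is a hypothetical helper, and returning `None` when saving is disabled is an assumption made for the example.

```python
import time
from pathlib import Path


def resolve_precomp_path(save_precomp=True, save_precomp_path=None):
    """Hypothetical helper: return the target .npy path, or None if saving is off."""
    if not save_precomp:
        return None
    if save_precomp_path is None:
        # Documented fallback: current directory, nanosecond timestamp in the name.
        return Path.cwd() / f"measures_{time.time_ns()}.npy"
    return Path(save_precomp_path)


default_path = resolve_precomp_path()
custom_path = resolve_precomp_path(save_precomp_path="cache/measures.npy")
disabled = resolve_precomp_path(save_precomp=False)
print(default_path.name)
print(custom_path)
print(disabled)
```

Because the fallback name embeds a timestamp, every run with the defaults writes a new file; passing an explicit path makes the file easy to reuse through `load_precomp_path`.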

class utils.presets.MeasuredCuttingRaDataset¶

Bases: LoaderBase

__init__()¶

Base class for dataset processing and AHFS running. Only CSV files are supported.

Parameters:

name (str) – Name of the instance.
path (str) – Path to the CSV dataset.
target (list[int]) – List of indexes that define target column(s). A list size greater than 1 implies one-hot encoding.
header (None | int | list[int]) – List of indexes that define header row(s). If None, no header is extracted from the data. Default value is None.
sep (str) – Separator character. Default value is “,”.
drop_columns (None | list[int]) – List of indexes that define which column(s) to remove. If None, no removal is done. Default value is None.
drop_rows (None | list[int]) – List of indexes that define which row(s) to remove. If None, no removal is done. Default value is None.

run(**kwargs) → tuple[list[int], list[float], list[float], dict[str, int]] | tuple[numpy.ndarray, numpy.ndarray]¶

Function for running an AHFS instance on the loaded dataset.

Parameters:

kwargs – Overrides AHFS run parameters. See below for detailed documentation.
k – Number of features to select. int
data_bin – How many bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5. int
target_bin – How many bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2. int
save_precomp – Whether to save all basic measures computed during the precomputing phase in a binary .npy file. Path is set by save_precomp_path. Default value is True. bool
save_precomp_path – If save_precomp is True, sets the path for saving the precomputed basic measures. If this variable’s value is None, saves into the current directory with filename format “measures_{time.time_ns()}.npy”. Default value is None. str | None
load_precomp_path – Path to load the precomputed basic measures from, skipping the precomputing phase. File must be a binary numpy file. If None, precomputing is not skipped. Default value is None. str | None
is_in_pipeline – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False. bool
verbose – Controls global verbosity, including feature selection measures. 0: no output; 1: prints feature selection measure execution time, the selected feature and metric, and basic algorithm steps; 2: prints all steps. Default value is 1. int

Returns:

A list of the selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration.

class utils.presets.MeasuredCuttingTDataset¶

Bases: LoaderBase

__init__()¶

Base class for dataset processing and AHFS running. Only CSV files are supported.

Parameters:

name (str) – Name of the instance.
path (str) – Path to the CSV dataset.
target (list[int]) – List of indexes that define target column(s). A list size greater than 1 implies one-hot encoding.
header (None | int | list[int]) – List of indexes that define header row(s). If None, no header is extracted from the data. Default value is None.
sep (str) – Separator character. Default value is “,”.
drop_columns (None | list[int]) – List of indexes that define which column(s) to remove. If None, no removal is done. Default value is None.
drop_rows (None | list[int]) – List of indexes that define which row(s) to remove. If None, no removal is done. Default value is None.

run(**kwargs) → tuple[list[int], list[float], list[float], dict[str, int]] | tuple[numpy.ndarray, numpy.ndarray]¶

Function for running an AHFS instance on the loaded dataset.

Parameters:

kwargs – Overrides AHFS run parameters. See below for detailed documentation.
k – Number of features to select. int
data_bin – How many bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5. int
target_bin – How many bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2. int
save_precomp – Whether to save all basic measures computed during the precomputing phase in a binary .npy file. Path is set by save_precomp_path. Default value is True. bool
save_precomp_path – If save_precomp is True, sets the path for saving the precomputed basic measures. If this variable’s value is None, saves into the current directory with filename format “measures_{time.time_ns()}.npy”. Default value is None. str | None
load_precomp_path – Path to load the precomputed basic measures from, skipping the precomputing phase. File must be a binary numpy file. If None, precomputing is not skipped. Default value is None. str | None
is_in_pipeline – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False. bool
verbose – Controls global verbosity, including feature selection measures. 0: no output; 1: prints feature selection measure execution time, the selected feature and metric, and basic algorithm steps; 2: prints all steps. Default value is 1. int

Returns:

A list of the selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration.
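
The three `verbose` levels behave like a threshold: a message is shown only when the configured level is at least the level the message belongs to. The following is a minimal sketch of that gating. `make_logger` and the message texts are hypothetical and only illustrate the idea.

```python
def make_logger(verbose=1, sink=print):
    """Hypothetical helper: emit a message only if `verbose` reaches its level."""
    def log(level, message):
        if verbose >= level:
            sink(message)
    return log


captured = []
log = make_logger(verbose=1, sink=captured.append)
log(1, "basic algorithm step")  # shown at verbose >= 1
log(2, "detailed step")         # shown only at verbose >= 2
print(captured)
```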

class utils.presets.MitbihTest¶

Bases: LoaderBase

__init__()¶

Base class for dataset processing and AHFS running. Only CSV files are supported.

Parameters:

name (str) – Name of the instance.
path (str) – Path to the CSV dataset.
target (list[int]) – List of indexes that define target column(s). A list size greater than 1 implies one-hot encoding.
header (None | int | list[int]) – List of indexes that define header row(s). If None, no header is extracted from the data. Default value is None.
sep (str) – Separator character. Default value is “,”.
drop_columns (None | list[int]) – List of indexes that define which column(s) to remove. If None, no removal is done. Default value is None.
drop_rows (None | list[int]) – List of indexes that define which row(s) to remove. If None, no removal is done. Default value is None.

run(**kwargs) → tuple[list[int], list[float], list[float], dict[str, int]] | tuple[numpy.ndarray, numpy.ndarray]¶

Function for running an AHFS instance on the loaded dataset.

Parameters:

kwargs – Overrides AHFS run parameters. See below for detailed documentation.
k – Number of features to select. int
data_bin – How many bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5. int
target_bin – How many bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2. int
save_precomp – Whether to save all basic measures computed during the precomputing phase in a binary .npy file. Path is set by save_precomp_path. Default value is True. bool
save_precomp_path – If save_precomp is True, sets the path for saving the precomputed basic measures. If this variable’s value is None, saves into the current directory with filename format “measures_{time.time_ns()}.npy”. Default value is None. str | None
load_precomp_path – Path to load the precomputed basic measures from, skipping the precomputing phase. File must be a binary numpy file. If None, precomputing is not skipped. Default value is None. str | None
is_in_pipeline – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False. bool
verbose – Controls global verbosity, including feature selection measures. 0: no output; 1: prints feature selection measure execution time, the selected feature and metric, and basic algorithm steps; 2: prints all steps. Default value is 1. int

Returns:

A list of the selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration.

class utils.presets.ParkinsonsTelemonitoringMotorDataset¶

Bases: LoaderBase

__init__()¶

Base class for dataset processing and AHFS running. Only CSV files are supported.

Parameters:

name (str) – Name of the instance.
path (str) – Path to the CSV dataset.
target (list[int]) – List of indexes that define target column(s). A list size greater than 1 implies one-hot encoding.
header (None | int | list[int]) – List of indexes that define header row(s). If None, no header is extracted from the data. Default value is None.
sep (str) – Separator character. Default value is “,”.
drop_columns (None | list[int]) – List of indexes that define which column(s) to remove. If None, no removal is done. Default value is None.
drop_rows (None | list[int]) – List of indexes that define which row(s) to remove. If None, no removal is done. Default value is None.

run(**kwargs) → tuple[list[int], list[float], list[float], dict[str, int]] | tuple[numpy.ndarray, numpy.ndarray]¶

Function for running an AHFS instance on the loaded dataset.

Parameters:

kwargs – Overrides AHFS run parameters. See below for detailed documentation.
k – Number of features to select. int
data_bin – How many bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5. int
target_bin – How many bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2. int
save_precomp – Whether to save all basic measures computed during the precomputing phase in a binary .npy file. Path is set by save_precomp_path. Default value is True. bool
save_precomp_path – If save_precomp is True, sets the path for saving the precomputed basic measures. If this variable’s value is None, saves into the current directory with filename format “measures_{time.time_ns()}.npy”. Default value is None. str | None
load_precomp_path – Path to load the precomputed basic measures from, skipping the precomputing phase. File must be a binary numpy file. If None, precomputing is not skipped. Default value is None. str | None
is_in_pipeline – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False. bool
verbose – Controls global verbosity, including feature selection measures. 0: no output; 1: prints feature selection measure execution time, the selected feature and metric, and basic algorithm steps; 2: prints all steps. Default value is 1. int

Returns:

A list of the selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration.

class utils.presets.ParkinsonsTelemonitoringTotalDataset¶

Bases: LoaderBase

__init__()¶

Base class for dataset processing and AHFS running. Only CSV files are supported.

Parameters:

name (str) – Name of the instance.
path (str) – Path to the CSV dataset.
target (list[int]) – List of indexes that define target column(s). A list size greater than 1 implies one-hot encoding.
header (None | int | list[int]) – List of indexes that define header row(s). If None, no header is extracted from the data. Default value is None.
sep (str) – Separator character. Default value is “,”.
drop_columns (None | list[int]) – List of indexes that define which column(s) to remove. If None, no removal is done. Default value is None.
drop_rows (None | list[int]) – List of indexes that define which row(s) to remove. If None, no removal is done. Default value is None.

run(**kwargs) → tuple[list[int], list[float], list[float], dict[str, int]] | tuple[numpy.ndarray, numpy.ndarray]¶

Function for running an AHFS instance on the loaded dataset.

Parameters:

kwargs – Overrides AHFS run parameters. See below for detailed documentation.
k – Number of features to select. int
data_bin – How many bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5. int
target_bin – How many bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2. int
save_precomp – Whether to save all basic measures computed during the precomputing phase in a binary .npy file. Path is set by save_precomp_path. Default value is True. bool
save_precomp_path – If save_precomp is True, sets the path for saving the precomputed basic measures. If this variable’s value is None, saves into the current directory with filename format “measures_{time.time_ns()}.npy”. Default value is None. str | None
load_precomp_path – Path to load the precomputed basic measures from, skipping the precomputing phase. File must be a binary numpy file. If None, precomputing is not skipped. Default value is None. str | None
is_in_pipeline – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False. bool
verbose – Controls global verbosity, including feature selection measures. 0: no output; 1: prints feature selection measure execution time, the selected feature and metric, and basic algorithm steps; 2: prints all steps. Default value is 1. int

Returns:

A list of the selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration.

class utils.presets.ReplicatedParkinsonDataset¶

Bases: LoaderBase

__init__()¶

Base class for dataset processing and AHFS running. Only CSV files are supported.

Parameters:

name (str) – Name of the instance.
path (str) – Path to the CSV dataset.
target (list[int]) – List of indexes that define target column(s). A list size greater than 1 implies one-hot encoding.
header (None | int | list[int]) – List of indexes that define header row(s). If None, no header is extracted from the data. Default value is None.
sep (str) – Separator character. Default value is “,”.
drop_columns (None | list[int]) – List of indexes that define which column(s) to remove. If None, no removal is done. Default value is None.
drop_rows (None | list[int]) – List of indexes that define which row(s) to remove. If None, no removal is done. Default value is None.

run(**kwargs) → tuple[list[int], list[float], list[float], dict[str, int]] | tuple[numpy.ndarray, numpy.ndarray]¶

Function for running an AHFS instance on the loaded dataset.

Parameters:

kwargs – Overrides AHFS run parameters. See below for detailed documentation.
k – Number of features to select. int
data_bin – How many bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5. int
target_bin – How many bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2. int
save_precomp – Whether to save all basic measures computed during the precomputing phase in a binary .npy file. Path is set by save_precomp_path. Default value is True. bool
save_precomp_path – If save_precomp is True, sets the path for saving the precomputed basic measures. If this variable’s value is None, saves into the current directory with filename format “measures_{time.time_ns()}.npy”. Default value is None. str | None
load_precomp_path – Path to load the precomputed basic measures from, skipping the precomputing phase. File must be a binary numpy file. If None, precomputing is not skipped. Default value is None. str | None
is_in_pipeline – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False. bool
verbose – Controls global verbosity, including feature selection measures. 0: no output; 1: prints feature selection measure execution time, the selected feature and metric, and basic algorithm steps; 2: prints all steps. Default value is 1. int

Returns:

A list of the selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration.

class utils.presets.SonarDataset¶

Bases: LoaderBase

__init__()¶

Base class for dataset processing and AHFS running. Only CSV files are supported.

Parameters:

name (str) – Name of the instance.
path (str) – Path to the CSV dataset.
target (list[int]) – List of indexes that define target column(s). A list size greater than 1 implies one-hot encoding.
header (None | int | list[int]) – List of indexes that define header row(s). If None, no header is extracted from the data. Default value is None.
sep (str) – Separator character. Default value is “,”.
drop_columns (None | list[int]) – List of indexes that define which column(s) to remove. If None, no removal is done. Default value is None.
drop_rows (None | list[int]) – List of indexes that define which row(s) to remove. If None, no removal is done. Default value is None.

run(**kwargs) → tuple[list[int], list[float], list[float], dict[str, int]] | tuple[numpy.ndarray, numpy.ndarray]¶

Function for running an AHFS instance on the loaded dataset.

Parameters:

kwargs – Overrides AHFS run parameters. See below for detailed documentation.
k – Number of features to select. int
data_bin – How many bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5. int
target_bin – How many bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2. int
save_precomp – Whether to save all basic measures computed during the precomputing phase in a binary .npy file. Path is set by save_precomp_path. Default value is True. bool
save_precomp_path – If save_precomp is True, sets the path for saving the precomputed basic measures. If this variable’s value is None, saves into the current directory with filename format “measures_{time.time_ns()}.npy”. Default value is None. str | None
load_precomp_path – Path to load the precomputed basic measures from, skipping the precomputing phase. File must be a binary numpy file. If None, precomputing is not skipped. Default value is None. str | None
is_in_pipeline – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False. bool
verbose – Controls global verbosity, including feature selection measures. 0: no output; 1: prints feature selection measure execution time, the selected feature and metric, and basic algorithm steps; 2: prints all steps. Default value is 1. int

Returns:

A list of the selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration.

class utils.presets.SuperConductDataset¶

Bases: LoaderBase

__init__()¶

Base class for dataset processing and AHFS running. Only CSV files are supported.

Parameters:

name (str) – Name of the instance.
path (str) – Path to the CSV dataset.
target (list[int]) – List of indexes that define target column(s). A list size greater than 1 implies one-hot encoding.
header (None | int | list[int]) – List of indexes that define header row(s). If None, no header is extracted from the data. Default value is None.
sep (str) – Separator character. Default value is “,”.
drop_columns (None | list[int]) – List of indexes that define which column(s) to remove. If None, no removal is done. Default value is None.
drop_rows (None | list[int]) – List of indexes that define which row(s) to remove. If None, no removal is done. Default value is None.

run(**kwargs) → tuple[list[int], list[float], list[float], dict[str, int]] | tuple[numpy.ndarray, numpy.ndarray]¶

Function for running an AHFS instance on the loaded dataset.

Parameters:

kwargs – Overrides AHFS run parameters. See below for detailed documentation.
k – Number of features to select. int
data_bin – How many bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5. int
target_bin – How many bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2. int
save_precomp – Whether to save all basic measures computed during the precomputing phase in a binary .npy file. Path is set by save_precomp_path. Default value is True. bool
save_precomp_path – If save_precomp is True, sets the path for saving the precomputed basic measures. If this variable’s value is None, saves into the current directory with filename format “measures_{time.time_ns()}.npy”. Default value is None. str | None
load_precomp_path – Path to load the precomputed basic measures from, skipping the precomputing phase. File must be a binary numpy file. If None, precomputing is not skipped. Default value is None. str | None
is_in_pipeline – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False. bool
verbose – Controls global verbosity, including feature selection measures. 0: no output; 1: prints feature selection measure execution time, the selected feature and metric, and basic algorithm steps; 2: prints all steps. Default value is 1. int

Returns:

A list of the selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration.

class utils.presets.WineDataset¶

Bases: LoaderBase

__init__()¶

Base class for dataset processing and AHFS running. Only CSV files are supported.

Parameters:

name (str) – Name of the instance.
path (str) – Path to the CSV dataset.
target (list[int]) – List of indexes that define target column(s). A list size greater than 1 implies one-hot encoding.
header (None | int | list[int]) – List of indexes that define header row(s). If None, no header is extracted from the data. Default value is None.
sep (str) – Separator character. Default value is “,”.
drop_columns (None | list[int]) – List of indexes that define which column(s) to remove. If None, no removal is done. Default value is None.
drop_rows (None | list[int]) – List of indexes that define which row(s) to remove. If None, no removal is done. Default value is None.

run(**kwargs) → tuple[list[int], list[float], list[float], dict[str, int]] | tuple[numpy.ndarray, numpy.ndarray]¶

Function for running an AHFS instance on the loaded dataset.

Parameters:

kwargs – Overrides AHFS run parameters. See below for detailed documentation.
k – Number of features to select. int
data_bin – How many bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5. int
target_bin – How many bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2. int
save_precomp – Whether to save all basic measures computed during the precomputing phase in a binary .npy file. Path is set by save_precomp_path. Default value is True. bool
save_precomp_path – If save_precomp is True, sets the path for saving the precomputed basic measures. If this variable’s value is None, saves into the current directory with filename format “measures_{time.time_ns()}.npy”. Default value is None. str | None
load_precomp_path – Path to load the precomputed basic measures from, skipping the precomputing phase. File must be a binary numpy file. If None, precomputing is not skipped. Default value is None. str | None
is_in_pipeline – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False. bool
verbose – Controls global verbosity, including feature selection measures. 0: no output; 1: prints feature selection measure execution time, the selected feature and metric, and basic algorithm steps; 2: prints all steps. Default value is 1. int

Returns:

A list of the selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration.

class utils.presets.WineQualityRedDataset¶

Bases: LoaderBase

__init__()¶

Base class for dataset processing and AHFS running. Only CSV files are supported.

Parameters:

name (str) – Name of the instance.
path (str) – Path to the CSV dataset.
target (list[int]) – List of indexes that define target column(s). A list size greater than 1 implies one-hot encoding.
header (None | int | list[int]) – List of indexes that define header row(s). If None, no header is extracted from the data. Default value is None.
sep (str) – Separator character. Default value is “,”.
drop_columns (None | list[int]) – List of indexes that define which column(s) to remove. If None, no removal is done. Default value is None.
drop_rows (None | list[int]) – List of indexes that define which row(s) to remove. If None, no removal is done. Default value is None.

run(**kwargs) → tuple[list[int], list[float], list[float], dict[str, int]] | tuple[numpy.ndarray, numpy.ndarray]¶

Function for running an AHFS instance on the loaded dataset.

Parameters:

kwargs – Overrides AHFS run parameters. See below for detailed documentation.
k – Number of features to select. int
data_bin – How many bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5. int
target_bin – How many bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2. int
save_precomp – Whether to save all basic measures computed during the precomputing phase in a binary .npy file. Path is set by save_precomp_path. Default value is True. bool
save_precomp_path – If save_precomp is True, sets the path for saving the precomputed basic measures. If this variable’s value is None, saves into the current directory with filename format “measures_{time.time_ns()}.npy”. Default value is None. str | None
load_precomp_path – Path to load the precomputed basic measures from, skipping the precomputing phase. File must be a binary numpy file. If None, precomputing is not skipped. Default value is None. str | None
is_in_pipeline – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False. bool
verbose – Controls global verbosity, including feature selection measures. 0: no output. 1: prints the execution time of each feature selection measure, the selected feature and metric, and the basic algorithm steps. 2: prints all steps. Default value is 1. int

Returns:

A tuple containing the list of selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration.
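
The call pattern described above can be sketched as follows. This is a minimal illustration, not the library's own example: the helper run_preset and its result keys are hypothetical, and it assumes that a preset class can be constructed without arguments (as the __init__() signature suggests) and that run() returns the four-element tuple rather than the pipeline form.

```python
from typing import Any


def run_preset(preset_cls: type, **overrides: Any) -> dict[str, Any]:
    """Instantiate a preset loader, run AHFS, and label the four results.

    Hypothetical helper for illustration; it is not part of utils.presets.
    """
    # Assumption: the preset's __init__() takes no arguments.
    loader = preset_cls()
    # Keyword arguments override the documented run parameters
    # (k, data_bin, target_bin, verbose, ...).
    selected, loss, accuracy, per_iteration = loader.run(**overrides)
    return {
        "selected_features": selected,
        "loss": loss,
        "accuracy": accuracy,
        "per_iteration": per_iteration,
    }


if __name__ == "__main__":
    try:
        from utils.presets import WineQualityRedDataset
    except ImportError:
        print("utils.presets is not importable here; skipping the demo run.")
    else:
        result = run_preset(
            WineQualityRedDataset, k=5, verbose=0, save_precomp=False
        )
        print(result["selected_features"])
```

Unpacking into four names fails loudly if is_in_pipeline is set to True, since the return value then has a different shape; the helper is only meant for the default setting.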

class utils.presets.WisconsinBreastCancerDataset¶

Bases: LoaderBase

__init__()¶

Base class for dataset processing and AHFS running. Only CSV files are supported.

Parameters:

name (str) – Name of the instance.
path (str) – Path to the CSV dataset.
target (list[int]) – List of indexes that define target column(s). A list size greater than 1 implies one-hot encoding.
header (None | int | list[int]) – List of indexes that define header row(s). If None, no header is extracted from the data. Default value is None.
sep (str) – Separator character. Default value is “,”.
drop_columns (None | list[int]) – List of indexes that define which column(s) to remove. If None, no removal is done. Default value is None.
drop_rows (None | list[int]) – List of indexes that define which row(s) to remove. If None, no removal is done. Default value is None.

run(**kwargs) → tuple[list[int], list[float], list[float], dict[str, int]] | tuple[numpy.ndarray, numpy.ndarray]¶

Function for running an AHFS instance on the loaded dataset.

Parameters:

kwargs – Overrides AHFS run parameters. See below for detailed documentation.
k – Number of features to select. int
data_bin – How many bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5. int
target_bin – How many bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2. int
save_precomp – Whether to save all basic measures computed during the precomputing phase in a binary .npy file. Path is set by save_precomp_path. Default value is True. bool
save_precomp_path – If save_precomp is True, sets the path for saving the precomputed basic measures. If this variable’s value is None, saves into the current directory with filename format “measures_{time.time_ns()}.npy”. Default value is None. str | None
load_precomp_path – Path to load the precomputed basic measures from, skipping the precomputing phase. File must be a binary numpy file. If None, precomputing is not skipped. Default value is None. str | None
is_in_pipeline – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False. bool
verbose – Controls global verbosity, including feature selection measures. 0: no output. 1: prints the execution time of each feature selection measure, the selected feature and metric, and the basic algorithm steps. 2: prints all steps. Default value is 1. int

Returns:

A tuple containing the list of selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration.
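
The default naming rule for save_precomp_path can be mirrored in a few lines. This is a sketch of the documented behaviour only; the function name resolve_precomp_path is hypothetical, and how AHFS resolves the path internally is not shown in this reference.

```python
import time
from pathlib import Path
from typing import Optional


def resolve_precomp_path(save_precomp_path: Optional[str] = None) -> Path:
    """Return the file path the precomputed measures would be written to.

    Hypothetical helper that follows the documented rule: an explicit path is
    used as given, and None falls back to the current directory with the
    filename format "measures_{time.time_ns()}.npy".
    """
    if save_precomp_path is not None:
        return Path(save_precomp_path)
    # time.time_ns() gives a nanosecond timestamp, so repeated runs
    # are unlikely to overwrite each other's files.
    return Path.cwd() / f"measures_{time.time_ns()}.npy"


if __name__ == "__main__":
    print(resolve_precomp_path())
    print(resolve_precomp_path("results/wbc_measures.npy"))
```

Choosing an explicit path up front makes it easier to pass the same value as load_precomp_path in a later run, which skips the precomputing phase.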

class utils.presets.YearPredDataset¶

Bases: LoaderBase

__init__()¶

Base class for dataset processing and AHFS running. Only CSV files are supported.

Parameters:

name (str) – Name of the instance.
path (str) – Path to the CSV dataset.
target (list[int]) – List of indexes that define target column(s). A list size greater than 1 implies one-hot encoding.
header (None | int | list[int]) – List of indexes that define header row(s). If None, no header is extracted from the data. Default value is None.
sep (str) – Separator character. Default value is “,”.
drop_columns (None | list[int]) – List of indexes that define which column(s) to remove. If None, no removal is done. Default value is None.
drop_rows (None | list[int]) – List of indexes that define which row(s) to remove. If None, no removal is done. Default value is None.

run(**kwargs) → tuple[list[int], list[float], list[float], dict[str, int]] | tuple[numpy.ndarray, numpy.ndarray]¶

Function for running an AHFS instance on the loaded dataset.

Parameters:

kwargs – Overrides AHFS run parameters. See below for detailed documentation.
k – Number of features to select. int
data_bin – How many bins to discretize the dataset into, excluding the target variable. If 0, no discretization is performed. Default value is 5. int
target_bin – How many bins to discretize the target into. Effectively sets the number of classes. If 0, no discretization is performed. Default value is 2. int
save_precomp – Whether to save all basic measures computed during the precomputing phase in a binary .npy file. Path is set by save_precomp_path. Default value is True. bool
save_precomp_path – If save_precomp is True, sets the path for saving the precomputed basic measures. If this variable’s value is None, saves into the current directory with filename format “measures_{time.time_ns()}.npy”. Default value is None. str | None
load_precomp_path – Path to load the precomputed basic measures from, skipping the precomputing phase. File must be a binary numpy file. If None, precomputing is not skipped. Default value is None. str | None
is_in_pipeline – Set to True if the algorithm is intended to be part of a scikit-learn pipeline. Changes the return value of transform() to X and y. Default value is False. bool
verbose – Controls global verbosity, including feature selection measures. 0: no output. 1: prints the execution time of each feature selection measure, the selected feature and metric, and the basic algorithm steps. 2: prints all steps. Default value is 1. int

Returns:

A tuple containing the list of selected features, the loss of the selected feature set, the accuracy of the selected feature set, and the features selected per iteration.
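
To make the effect of data_bin and target_bin concrete, the sketch below discretizes a numeric column into a fixed number of bins. It assumes equal-width binning, which this reference does not confirm; AHFS may use a different strategy, and the function discretize is hypothetical.

```python
from typing import Sequence


def discretize(values: Sequence[float], bins: int) -> list:
    """Map each value to a bin index in the range 0 .. bins - 1.

    Assumption: equal-width bins between the column minimum and maximum.
    As documented for data_bin and target_bin, 0 means no discretization,
    so the values are returned unchanged.
    """
    if bins == 0:
        return list(values)
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column falls entirely into the first bin.
        return [0 for _ in values]
    width = (hi - lo) / bins
    # Clamp so that the maximum value lands in the last bin.
    return [min(int((v - lo) / width), bins - 1) for v in values]


if __name__ == "__main__":
    years = [1960, 1975, 1988, 1994, 2001, 2010]
    # target_bin=2 turns a numeric target into a two-class problem.
    print(discretize(years, 2))
    # data_bin=5 is the documented default for the feature columns.
    print(discretize(years, 5))
```

With a numeric target such as a release year, the number of target bins effectively sets the number of classes the selected features are evaluated against.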
