
sklearn: pos_label=1 is not a valid label: It should be one of ['1' '2']

HPOBench (0.0.10) + scikit-learn (1.5.2)

HPOBench sets up its metric scorers with default keyword arguments:

# hpobench/dependencies/ml/ml_benchmark_template.py
metrics = dict(
    acc=accuracy_score,
    bal_acc=balanced_accuracy_score,
    f1=f1_score,
    precision=precision_score,
)

metrics_kwargs = dict(
    acc=dict(),
    bal_acc=dict(),
    f1=dict(average="macro", zero_division=0),
    precision=dict(average="macro", zero_division=0),
)
# build the scorers with the default params
self.scorers = dict()
for k, v in metrics.items():
    self.scorers[k] = make_scorer(v, **metrics_kwargs[k])

For f1_score and precision_score, pos_label defaults to 1:

# sklearn\metrics\_classification.py
def f1_score(
    y_true,
    y_pred,
    *,
    labels=None,
    pos_label=1, # look here
    average="binary",
    sample_weight=None,
    zero_division="warn",
):
...
def precision_score(
    y_true,
    y_pred,
    *,
    labels=None,
    pos_label=1, # look here
    average="binary",
    sample_weight=None,
    zero_division="warn",
):
...
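These defaults can be confirmed programmatically; as shown later, the scorer machinery reads exactly this signature default. A small standalone check (no HPOBench needed):

```python
from inspect import signature

from sklearn.metrics import f1_score, precision_score

# Both scoring functions declare pos_label=1 in their signatures.
for fn in (f1_score, precision_score):
    print(fn.__name__, signature(fn).parameters["pos_label"].default)
```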

In the experiment, the error occurs during the validation step, while computing y_pred (before any specific metric is evaluated):

When the scorer calls `_get_response_values` to compute y_pred, a ValueError is raised.
Note that `classes = ['1' '2']` (string labels) is determined by the dataset.
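A minimal reproduction outside HPOBench (toy data made up for illustration; behavior as observed under scikit-learn 1.5.2):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, make_scorer

rng = np.random.default_rng(0)
X = rng.random((20, 3))
y = np.array(["1", "2"] * 10)  # string class labels, as in the dataset

clf = LogisticRegression().fit(X, y)

# make_scorer picks up pos_label=1 from f1_score's signature default,
# which the scorer then checks against the estimator's string classes_.
scorer = make_scorer(f1_score, average="macro", zero_division=0)
try:
    print(scorer(clf, X, y))
except ValueError as err:
    print(err)  # under 1.5.2: pos_label=1 is not a valid label: ...
```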

# sklearn/metrics/_scorer.py:372
def _score(self, method_caller, estimator, X, y_true, **kwargs):
    ...
    y_pred = method_caller(
        estimator, response_method.__name__, X, pos_label=pos_label
    )
    ...

# sklearn/metrics/_scorer.py:89
def _cached_call(cache, estimator, response_method, *args, **kwargs):
    ...
    result, _ = _get_response_values(
        estimator, *args, response_method=response_method, **kwargs
    )

# sklearn/utils/_response.py:113
def _get_response_values(
    estimator,
    X,
    response_method,
    pos_label=None,
    return_response_method_used=False,
):
...
    if is_classifier(estimator):
        prediction_method = _check_response_method(estimator, response_method)
        classes = estimator.classes_
        target_type = type_of_target(classes)

        if target_type in ("binary", "multiclass"):
            if pos_label is not None and pos_label not in classes.tolist():
                raise ValueError(
                    f"pos_label={pos_label} is not a valid label: It should be "
                    f"one of {classes}"
                )
            elif pos_label is None and target_type == "binary":
                pos_label = classes[-1]

        y_pred = prediction_method(X)
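The check above fires because the integer default 1 can never match string classes, while `type_of_target` still reports the problem as binary. A small illustration mirroring that branch:

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

classes = np.array(["1", "2"])  # classes_ learned from string-labeled data
print(type_of_target(classes))  # 'binary'
print(1 in classes.tolist())    # False -> the ValueError branch fires
```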

In theory, the parameter documentation of f1_score / precision_score already states this clearly:

pos_label : int, float, bool or str, default=1
    The class to report if `average='binary'` and the data is binary,
    otherwise this parameter is ignored.

With average="macro", pos_label should therefore be ignored, yet _get_response_values does not handle this case properly.

A quick workaround for the ValueError is to pass pos_label=None explicitly:

metrics_kwargs = dict(
    acc=dict(),
    bal_acc=dict(),
    f1=dict(average="macro", zero_division=0, pos_label=None),
    precision=dict(average="macro", zero_division=0, pos_label=None),
)
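With pos_label=None in the kwargs, make_scorer forwards None instead of the signature default, and scoring succeeds on string-labeled data; f1_score ignores pos_label under average="macro". A standalone sketch with toy data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, make_scorer

rng = np.random.default_rng(0)
X = rng.random((20, 3))
y = np.array(["1", "2"] * 10)  # string class labels

clf = LogisticRegression().fit(X, y)

# An explicit pos_label=None overrides the signature default of 1;
# _get_response_values then resolves pos_label itself (classes[-1]).
scorer = make_scorer(f1_score, average="macro", zero_division=0, pos_label=None)
score = scorer(clf, X, y)
print(score)
```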

Original post: https://blog.csdn.net/yuanyang5917/article/details/143874855
