sklearn: pos_label=1 is not a valid label: It should be one of ['1' '2']
Environment: HPOBench 0.0.10 + scikit-learn 1.5.2
HPOBench sets up its metric scorers with default parameters:
# hpobench/dependencies/ml/ml_benchmark_template.py
from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    f1_score,
    make_scorer,
    precision_score,
)

metrics = dict(
    acc=accuracy_score,
    bal_acc=balanced_accuracy_score,
    f1=f1_score,
    precision=precision_score,
)
metrics_kwargs = dict(
    acc=dict(),
    bal_acc=dict(),
    f1=dict(average="macro", zero_division=0),
    precision=dict(average="macro", zero_division=0),
)

# build the scorers with the default parameters
self.scorers = dict()
for k, v in metrics.items():
    self.scorers[k] = make_scorer(v, **metrics_kwargs[k])
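With these defaults, the scorers break as soon as the dataset encodes its class labels as the strings '1' and '2' (which is the case for the dataset used in the experiment). The following minimal sketch is not HPOBench code: the random features and the LogisticRegression model are assumptions for illustration, but the f1 scorer is built exactly as above and should reproduce the error under scikit-learn 1.5.2:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, make_scorer

rng = np.random.RandomState(0)
X = rng.rand(20, 3)
y = np.array(["1", "2"] * 10)  # class labels are strings, as in the dataset

clf = LogisticRegression().fit(X, y)
scorer = make_scorer(f1_score, average="macro", zero_division=0)

# expected to raise:
# ValueError: pos_label=1 is not a valid label: It should be one of ['1' '2']
scorer(clf, X, y)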
For f1_score and precision_score, pos_label defaults to 1:
# sklearn/metrics/_classification.py
def f1_score(
    y_true,
    y_pred,
    *,
    labels=None,
    pos_label=1,  # look here
    average="binary",
    sample_weight=None,
    zero_division="warn",
):
    ...

def precision_score(
    y_true,
    y_pred,
    *,
    labels=None,
    pos_label=1,  # look here
    average="binary",
    sample_weight=None,
    zero_division="warn",
):
    ...
In the experiment, the error occurs during the validation step, while y_pred is being computed, i.e. before any metric value is actually calculated: the ValueError is raised inside _get_response_values (the metric's own _check_set_wise_labels is never even reached), via the scorer call chain shown below.
Note that the classes ['1' '2'] in the error message are string labels determined by the dataset.
# sklearn/metrics/_scorer.py:372
def _score(self, method_caller, estimator, X, y_true, **kwargs):
    ...
    y_pred = method_caller(
        estimator, response_method.__name__, X, pos_label=pos_label
    )
    ...

# sklearn/metrics/_scorer.py:89
def _cached_call(cache, estimator, response_method, *args, **kwargs):
    ...
    result, _ = _get_response_values(
        estimator, *args, response_method=response_method, **kwargs
    )

# sklearn/utils/_response.py:113
def _get_response_values(
    estimator,
    X,
    response_method,
    pos_label=None,
    return_response_method_used=False,
):
    ...
    if is_classifier(estimator):
        prediction_method = _check_response_method(estimator, response_method)
        classes = estimator.classes_
        target_type = type_of_target(classes)

        if target_type in ("binary", "multiclass"):
            if pos_label is not None and pos_label not in classes.tolist():
                raise ValueError(
                    f"pos_label={pos_label} is not a valid label: It should be "
                    f"one of {classes}"
                )
            elif pos_label is None and target_type == "binary":
                pos_label = classes[-1]

        y_pred = prediction_method(X)
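Where does pos_label=1 come from, given that metrics_kwargs never sets it? As far as I can tell from _scorer.py (its _get_pos_label helper), when pos_label is not among the kwargs given to make_scorer, the scorer falls back to the default declared in the score function's own signature, and for f1_score and precision_score that default is 1. The defaults themselves are easy to confirm (inspect is used here purely for illustration):

import inspect
from sklearn.metrics import f1_score, precision_score

for fn in (f1_score, precision_score):
    default = inspect.signature(fn).parameters["pos_label"].default
    print(fn.__name__, "pos_label default:", default)  # prints 1 for both

That integer 1 is then compared against the string classes ['1' '2'] inside _get_response_values, which is exactly the ValueError above.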
In principle, the parameter documentation of f1_score / precision_score already makes this clear:
pos_label : int, float, bool or str, default=1
    The class to report if `average='binary'` and the data is binary,
    otherwise this parameter is ignored.
So with average="macro", pos_label should simply be ignored; however, the scorer still forwards its default value, and _get_response_values does not handle this case gracefully.
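Calling the metric functions directly confirms the documented behaviour: with average="macro" the default pos_label=1 is ignored even for string labels, so the error only appears on the scorer path. A small check with made-up labels:

from sklearn.metrics import f1_score, precision_score

y_true = ["1", "2", "2", "1"]
y_pred = ["1", "2", "1", "1"]

# works fine: pos_label is ignored because average != 'binary'
print(f1_score(y_true, y_pred, average="macro", zero_division=0))
print(precision_score(y_true, y_pred, average="macro", zero_division=0))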
A quick way to skip the ValueError is to pass pos_label=None explicitly in the scorer kwargs:
metrics_kwargs = dict(
    acc=dict(),
    bal_acc=dict(),
    f1=dict(average="macro", zero_division=0, pos_label=None),
    precision=dict(average="macro", zero_division=0, pos_label=None),
)
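With pos_label=None present in the kwargs, the scorer no longer falls back to the score function's default (per the fallback described above), so the check in _get_response_values is skipped (for a binary target it just picks classes[-1], see the snippet earlier), and f1_score / precision_score ignore pos_label anyway because average="macro". Applied to the same assumed setup as the reproduction sketch at the beginning, the patched scorer runs without error:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, make_scorer

rng = np.random.RandomState(0)
X = rng.rand(20, 3)
y = np.array(["1", "2"] * 10)
clf = LogisticRegression().fit(X, y)

fixed_scorer = make_scorer(f1_score, average="macro", zero_division=0, pos_label=None)
print(fixed_scorer(clf, X, y))  # no ValueError, returns the macro-averaged f1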
Original post: https://blog.csdn.net/yuanyang5917/article/details/143874855