A Non-Recursive Decision Tree Generation Algorithm
Implementation approach of the non-recursive tree-building algorithm

The tree is built with a queue (level-order, i.e. breadth-first, traversal), taking one node at a time and attaching its children as the traversal proceeds. Alongside the queue, every processed node is appended to an auxiliary list nodes_to_process so that, once the tree is complete, the list can be walked in reverse to compute each node's leaf_num (the number of leaves in its subtree). For each node taken from the queue, the algorithm either marks it as a leaf when a stopping condition is met, or selects the best splitting feature and enqueues one child per branch. Finally, leaf counts are filled in by traversing nodes_to_process in reverse order (from the leaves back toward the root), setting each internal node's leaf_num to the sum of its children's leaf_num values.
Code implementation
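The generate_tree method below relies on a Node helper class, collections.deque, and the attributes self.MaxDepth, self.columns and self.choose_best_feature_to_split of the surrounding class, none of which are shown in the post. A minimal sketch of the assumed imports and Node class, inferred from how the attributes are used (the default values here are assumptions), could look like this:

from collections import deque

class Node:
    """Single tree node; attribute names match how generate_tree uses them."""
    def __init__(self):
        self.high = 0               # depth of the node (the root is at depth 0)
        self.is_leaf = False        # True once the node is marked as a leaf
        self.leaf_class = None      # predicted class for a leaf node
        self.leaf_num = 0           # number of leaves in this node's subtree
        self.feature_name = None    # name of the splitting feature
        self.feature_index = None   # column index of the splitting feature
        self.impurity = None        # impurity / gain score of the chosen split
        self.is_continuous = False  # whether the split is on a continuous feature
        self.split_value = None     # threshold for a continuous split
        self.subtree = {}           # branch value (or '>= v' / '< v') -> child Node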
def generate_tree(self, X, y):
    root = Node()
    root.high = 0  # the root node has depth 0
    queue = deque([(root, X, y)])  # queue of (node, data) tuples to process
    nodes_to_process = []  # record every node so leaf_num can be computed afterwards
    while queue:
        current_node, current_X, current_y = queue.popleft()
        nodes_to_process.append(current_node)
        # Leaf condition: maximum depth reached, a single class left, or no features left
        if current_node.high >= self.MaxDepth or current_y.nunique() == 1 or current_X.empty:
            current_node.is_leaf = True
            current_node.leaf_class = current_y.mode()[0]
            current_node.leaf_num = 1  # a leaf counts as one leaf
            continue
        # Choose the best feature to split on
        best_feature_name, best_impurity = self.choose_best_feature_to_split(current_X, current_y)
        current_node.feature_name = best_feature_name
        current_node.impurity = best_impurity[0]
        current_node.feature_index = self.columns.index(best_feature_name)
        feature_values = current_X[best_feature_name]
        if len(best_impurity) == 1:  # discrete feature
            current_node.is_continuous = False
            unique_vals = feature_values.unique()
            sub_X = current_X.drop(best_feature_name, axis=1)
            for value in unique_vals:
                child_node = Node()
                child_node.high = current_node.high + 1
                queue.append((child_node, sub_X[feature_values == value], current_y[feature_values == value]))
                current_node.subtree[value] = child_node
        elif len(best_impurity) == 2:  # continuous feature
            current_node.is_continuous = True
            current_node.split_value = best_impurity[1]
            up_part = '>= {:.3f}'.format(current_node.split_value)
            down_part = '< {:.3f}'.format(current_node.split_value)
            child_node_up = Node()
            child_node_down = Node()
            child_node_up.high = current_node.high + 1
            child_node_down.high = current_node.high + 1
            queue.append((child_node_up, current_X[feature_values >= current_node.split_value],
                          current_y[feature_values >= current_node.split_value]))
            queue.append((child_node_down, current_X[feature_values < current_node.split_value],
                          current_y[feature_values < current_node.split_value]))
            current_node.subtree[up_part] = child_node_up
            current_node.subtree[down_part] = child_node_down
    # Walk nodes_to_process in reverse to compute each node's leaf_num
    while nodes_to_process:
        node = nodes_to_process.pop()
        if node.is_leaf:
            node.leaf_num = 1
        else:
            node.leaf_num = sum(child.leaf_num for child in node.subtree.values())
    return root
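Once the tree is built, it can be traversed in the same iterative spirit. The helper below is not part of the original post; it is a hypothetical sketch of single-sample prediction, assuming sample is a pandas Series (or dict) indexed by feature name and that the children are keyed exactly as generate_tree stores them:

def predict_single(root, sample):
    """Walk from the root to a leaf and return its class (hypothetical helper)."""
    node = root
    while not node.is_leaf:
        if node.is_continuous:
            # Continuous split: children are keyed by the '>= v' / '< v' strings built above
            if sample[node.feature_name] >= node.split_value:
                key = '>= {:.3f}'.format(node.split_value)
            else:
                key = '< {:.3f}'.format(node.split_value)
            node = node.subtree[key]
        else:
            # Discrete split: children are keyed by the raw feature values
            value = sample[node.feature_name]
            if value not in node.subtree:
                return None  # value never seen during training; handle as appropriate
            node = node.subtree[value]
    return node.leaf_class

Because every split stores its children in node.subtree, prediction only needs the attributes set by generate_tree, and no recursion is required here either.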
Original article: https://blog.csdn.net/Fuxiao___/article/details/143656159