econml.policy.PolicyForest

class econml.policy.PolicyForest(n_estimators=100, *, criterion='neg_welfare', max_depth=None, min_samples_split=10, min_samples_leaf=5, min_weight_fraction_leaf=0.0, max_features='auto', min_impurity_decrease=0.0, max_samples=0.5, min_balancedness_tol=0.45, honest=True, n_jobs=-1, random_state=None, verbose=0, warm_start=False)

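Bases: econml._ensemble._ensemble.BaseEnsemble

Welfare-maximization policy forest. Trains a forest to maximize the objective \(\frac{1}{n} \sum_i \sum_j a_j(X_i) \, y_{ij}\), where \(a(X)\) is constrained to take the value 1 on exactly one coordinate and zero on all others. This corresponds to a policy optimization problem.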
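To make the objective concrete, here is a minimal sketch that evaluates the average welfare of a fixed one-hot assignment. The function name `policy_welfare` and the toy data are illustrative assumptions, not part of the econml API.

```python
def policy_welfare(assignments, rewards):
    """Average welfare 1/n * sum_i sum_j a_j(X_i) * y_ij.

    assignments: one one-hot list per sample (hypothetical policy output).
    rewards: one list per sample holding the reward of every treatment.
    """
    total = 0.0
    for a_i, y_i in zip(assignments, rewards):
        # The policy must pick exactly one treatment per sample.
        if sum(a_i) != 1 or any(v not in (0, 1) for v in a_i):
            raise ValueError("each assignment must be one-hot")
        total += sum(a * y for a, y in zip(a_i, y_i))
    return total / len(rewards)


# Toy data: 3 samples, 2 treatments.
rewards = [[0.0, 1.0], [2.0, 0.5], [1.0, 3.0]]
assignments = [[0, 1], [1, 0], [0, 1]]
welfare = policy_welfare(assignments, rewards)
```

The forest searches over tree-structured assignments rather than evaluating a fixed one, but the quantity being maximized has this form.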

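Parameters

n_estimators (int, default 100) – The total number of trees in the forest. The forest consists of sqrt(n_estimators) sub-forests, each containing sqrt(n_estimators) trees.
criterion ({'neg_welfare'}, default 'neg_welfare') – The type of criterion.
max_depth (int, default None) – The maximum depth of the tree. If None, nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
min_samples_split (int or float, default 10) – The minimum number of samples required to split an internal node:
- If int, min_samples_split is used as the minimum number.
- If float, min_samples_split is a fraction and ceil(min_samples_split * n_samples) is the minimum number of samples for each split.
min_samples_leaf (int or float, default 5) – The minimum number of samples required at a leaf node. A split point at any depth is only considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.
- If int, min_samples_leaf is used as the minimum number.
- If float, min_samples_leaf is a fraction and ceil(min_samples_leaf * n_samples) is the minimum number of samples for each node.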
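The int-or-float convention shared by min_samples_split and min_samples_leaf can be sketched as follows. The helper `resolve_min_samples` is a hypothetical name used only for illustration; it mirrors the rule stated above rather than the library's internal code.

```python
import math


def resolve_min_samples(value, n_samples):
    """Turn an int-or-float threshold into an absolute sample count."""
    if isinstance(value, int):
        # An int is used directly as the minimum number of samples.
        return value
    # A float is a fraction of the training set, rounded up.
    return math.ceil(value * n_samples)


split_count = resolve_min_samples(10, 1001)   # int threshold
leaf_count = resolve_min_samples(0.25, 1001)  # ceil(0.25 * 1001)
```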
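min_weight_fraction_leaf (float, default 0.0) – The minimum weighted fraction of the sum total of weights (of all the input samples) required at a leaf node. Samples have equal weight when sample_weight is not provided.
max_features (int, float, {"auto", "sqrt", "log2"}, or None, default "auto") – The number of features to consider when looking for the best split:
- If int, consider max_features features at each split.
- If float, max_features is a fraction and int(max_features * n_features) features are considered at each split.
- If "auto", max_features=n_features.
- If "sqrt", max_features=sqrt(n_features).
- If "log2", max_features=log2(n_features).
- If None, max_features=n_features.
Note: the search for a split does not stop until at least one valid partition of the node samples is found, even if that requires inspecting more than max_features features.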
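The options for max_features can be summarized in a small resolver. This is a sketch under stated assumptions: `resolve_max_features` is a hypothetical helper, and rounding "sqrt" and "log2" down to an integer with a floor of 1 is an assumption, since the text above does not say how fractional results are handled.

```python
import math


def resolve_max_features(max_features, n_features):
    """Map a max_features setting to a number of features per split."""
    if max_features is None or max_features == "auto":
        return n_features
    if max_features == "sqrt":
        # Assumption: round down, but always keep at least one feature.
        return max(1, int(math.sqrt(n_features)))
    if max_features == "log2":
        # Assumption: round down, but always keep at least one feature.
        return max(1, int(math.log2(n_features)))
    if isinstance(max_features, int):
        return max_features
    # Remaining case: a float fraction of the features.
    return int(max_features * n_features)


resolved = {
    setting: resolve_max_features(setting, 16)
    for setting in (None, "auto", "sqrt", "log2", 3, 0.5)
}
```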
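min_impurity_decrease (float, default 0.0) – A node is split if the split induces a decrease of the impurity greater than or equal to this value. The weighted impurity decrease is computed as: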
```
N_t / N * (impurity - N_t_R / N_t * right_impurity
                    - N_t_L / N_t * left_impurity)
```
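where N is the total number of samples, N_t is the number of samples at the current node, N_t_L is the number of samples in the left child, and N_t_R is the number of samples in the right child. N, N_t, N_t_R and N_t_L all refer to weighted sums if sample_weight is passed.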
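As a worked instance of the weighted impurity decrease, the sketch below plugs hypothetical node counts and impurities into the formula. The function name and the numbers are illustrative only.

```python
def weighted_impurity_decrease(N, N_t, N_t_L, N_t_R,
                               impurity, left_impurity, right_impurity):
    """Weighted impurity decrease of one candidate split."""
    return N_t / N * (impurity
                      - N_t_R / N_t * right_impurity
                      - N_t_L / N_t * left_impurity)


# Hypothetical node: 40 of 100 samples, split 10 (left) / 30 (right).
decrease = weighted_impurity_decrease(
    N=100, N_t=40, N_t_L=10, N_t_R=30,
    impurity=0.5, left_impurity=0.2, right_impurity=0.4,
)
# The node is split only if the decrease reaches min_impurity_decrease.
should_split = decrease >= 0.0
```

Here the decrease works out to 0.4 * (0.5 - 0.75 * 0.4 - 0.25 * 0.2) = 0.06.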
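max_samples (int or float in (0, 1], default .5) – The number of samples in the subsample used to train each tree:
- If int, train each tree on max_samples samples, drawn without replacement from all the samples.
- If float, train each tree on ceil(max_samples * n_samples) samples, drawn without replacement from all the samples.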
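The subsampling rule can be sketched with the standard library. `draw_subsample` and its `seed` argument are hypothetical; the actual random number handling inside the library is not described here and may differ.

```python
import math
import random


def draw_subsample(n_samples, max_samples, seed=0):
    """Draw the indices of one tree's subsample, without replacement."""
    if isinstance(max_samples, int):
        n_draw = max_samples
    else:
        n_draw = math.ceil(max_samples * n_samples)
    rng = random.Random(seed)
    # random.sample never repeats an index, i.e. no replacement.
    return sorted(rng.sample(range(n_samples), n_draw))


inds_fraction = draw_subsample(11, 0.5)  # ceil(0.5 * 11) indices
inds_count = draw_subsample(11, 4)       # exactly 4 indices
```

Reusing the same seed reproduces the same indices, which is the idea behind regenerating the fitted subsamples after the fact.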
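min_balancedness_tol (float in [0, .5], default .45) – How imbalanced a split we can tolerate. Each split must leave at least a (.5 - min_balancedness_tol) fraction of the samples on each side; or, when sample_weight is not None, that fraction of the total sample weight. The default value ensures that at least 5% of the parent node weight falls on each side of the split. By this formula, a value of .5 imposes no balance requirement and a value of 0.0 requires perfectly balanced splits. For the formal inference theory to be valid, the minimum fraction on each side has to be a positive constant bounded away from zero.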
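A minimal sketch of the balance rule, assuming it is applied literally as "each side holds at least a (.5 - min_balancedness_tol) fraction of the node weight". The function `split_is_balanced` is a hypothetical helper, not a library call.

```python
def split_is_balanced(weight_left, weight_right, min_balancedness_tol=0.45):
    """Check that both children hold enough of the parent node's weight."""
    total = weight_left + weight_right
    min_fraction = 0.5 - min_balancedness_tol
    return min(weight_left, weight_right) / total >= min_fraction


ok_split = split_is_balanced(10, 90)       # 10% on the smaller side
rejected_split = split_is_balanced(2, 98)  # only 2% on the smaller side
```

With this reading of the formula, min_balancedness_tol=0.5 accepts any split and min_balancedness_tol=0.0 accepts only an exact 50/50 split.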
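honest (bool, default True) – Whether the data should be split into two equally sized samples, with one half used to determine the optimal split at each node and the other half used to determine the value of each node.
n_jobs (int or None, default -1) – The number of jobs to run in parallel for both fit and predict. None means 1 unless in a joblib.parallel_backend() context. -1 means using all processors. See the Glossary for more details.
verbose (int, default 0) – Controls the verbosity when fitting and predicting.
random_state (int, RandomState instance, or None, default None) – If int, random_state is the seed used by the random number generator; if a RandomState instance, random_state is the random number generator; if None, the random number generator is the RandomState instance used by np.random.
warm_start (bool, default False) – When set to True, reuse the solution of the previous call to fit and add more estimators to the ensemble; otherwise, fit a whole new forest.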

feature_importances_

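The feature importances, based on the amount of parameter heterogeneity each feature creates. The higher the value, the more important the feature.

Type: ndarray of shape (n_features,)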

__init__(n_estimators=100, *, criterion='neg_welfare', max_depth=None, min_samples_split=10, min_samples_leaf=5, min_weight_fraction_leaf=0.0, max_features='auto', min_impurity_decrease=0.0, max_samples=0.5, min_balancedness_tol=0.45, honest=True, n_jobs=-1, random_state=None, verbose=0, warm_start=False)

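Methods

`__init__`([n_estimators, criterion, ...])
`apply`(X)	Apply the trees in the forest to X and return leaf indices.
`decision_path`(X)	Return the decision path in the forest.
`feature_importances`([max_depth, ...])	The feature importances, based on the amount of parameter heterogeneity each feature creates.
`fit`(X, y, *[, sample_weight])	Build a forest of trees from the training set (X, y) and any other auxiliary variables.
`get_params`([deep])	Get parameters for this estimator.
`get_subsample_inds`()	Re-generate the sample indices used at fit time, using the same pseudo-randomness.
`predict`(X)	Predict the best treatment for each sample.
`predict_proba`(X)	Predict the probability of recommending each treatment.
`predict_value`(X)	Predict the expected value of each treatment for each sample group.
`set_params`(**params)	Set the parameters of this estimator.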

Attributes

feature_importances_

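apply(X)

Apply the trees in the forest to X and return leaf indices.

Parameters: X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
Returns: X_leaves – For each datapoint x in X and for each tree in the forest, the index of the leaf x ends up in.
Return type: ndarray of shape (n_samples, n_estimators)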

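decision_path(X)

Return the decision path in the forest.

Parameters

X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.

Returns

indicator (sparse matrix of shape (n_samples, n_nodes)) – A node indicator matrix in which non-zero elements indicate that the sample goes through the corresponding node. The matrix is in CSR format.
n_nodes_ptr (ndarray of shape (n_estimators + 1,)) – The columns from indicator[n_nodes_ptr[i]:n_nodes_ptr[i+1]] give the indicator values for the i-th estimator.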
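The role of n_nodes_ptr is easiest to see on a toy example. The sketch below uses plain Python lists in place of the sparse matrix, and the node layout is hypothetical; with a real CSR matrix the same column range would be taken with a column slice.

```python
def columns_for_estimator(indicator_rows, n_nodes_ptr, i):
    """Return the indicator columns that belong to the i-th estimator."""
    start, stop = n_nodes_ptr[i], n_nodes_ptr[i + 1]
    return [row[start:stop] for row in indicator_rows]


# Hypothetical forest: tree 0 has 3 nodes, tree 1 has 5 nodes.
n_nodes_ptr = [0, 3, 8]
# One row per sample; a 1 marks a node the sample passes through.
indicator_rows = [
    [1, 1, 0, 1, 0, 1, 0, 1],
    [1, 0, 1, 1, 1, 0, 0, 0],
]
tree0 = columns_for_estimator(indicator_rows, n_nodes_ptr, 0)
tree1 = columns_for_estimator(indicator_rows, n_nodes_ptr, 1)
```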

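feature_importances(max_depth=4, depth_decay_exponent=2.0)

The feature importances, based on the amount of parameter heterogeneity each feature creates. The higher the value, the more important the feature.

Parameters

max_depth (int, default 4) – Splits of depth larger than max_depth are not used in this calculation.
depth_decay_exponent (double, default 2.0) – The contribution of each split to the total score is weighted by 1 / (1 + depth)**depth_decay_exponent, that is, 1 / (1 + depth)**2.0 with the default.

Returns

feature_importances_ – Normalized importance of each feature, measured by the total parameter heterogeneity it induces.

Return type

ndarray of shape (n_features,)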
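The depth weighting and the max_depth cutoff can be sketched as below. The per-split heterogeneity scores, the tuple layout, and the function name are all hypothetical, and normalizing so the importances sum to 1 is an assumption about what "normalized" means here.

```python
def normalized_importances(splits, n_features, max_depth=4,
                           depth_decay_exponent=2.0):
    """Aggregate depth-weighted split scores into per-feature importances.

    splits: iterable of (feature_index, depth, heterogeneity) tuples.
    """
    scores = [0.0] * n_features
    for feature, depth, heterogeneity in splits:
        if depth > max_depth:
            # Splits deeper than max_depth are ignored.
            continue
        weight = 1.0 / (1.0 + depth) ** depth_decay_exponent
        scores[feature] += weight * heterogeneity
    total = sum(scores)
    # Assumption: importances are rescaled to sum to 1.
    return [s / total for s in scores] if total > 0 else scores


splits = [(0, 0, 4.0), (1, 1, 4.0), (0, 5, 100.0)]
importances = normalized_importances(splits, n_features=3)
```

The root split keeps its full score, the depth-1 split is divided by 4, and the depth-5 split is dropped because it exceeds max_depth.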

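fit(X, y, *, sample_weight=None, **kwargs)

Build a forest of trees from the training set (X, y) and any other auxiliary variables.

Parameters

X (array_like of shape (n_samples, n_features)) – The training input samples. Internally, it will be converted to dtype=np.float64.
y (array_like of shape (n_samples,) or (n_samples, n_treatments)) – The outcome values for each sample and for each treatment.
sample_weight (array_like of shape (n_samples,), default None) – Sample weights. If None, samples are equally weighted. Splits that would create child nodes with net zero or negative weight are ignored while searching for a split in each node.
**kwargs (dictionary of array_like items of shape (n_samples, d_var)) – Auxiliary random variables.

Returns

self

Return type

object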

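get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, return the parameters for this estimator and for contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict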

get_subsample_inds()

Re-generate the sample indices used at fit time, using the same pseudo-randomness.

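predict(X)

Predict the best treatment for each sample.

Parameters: X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
Returns: treatment – The recommended treatment, i.e. the index of the treatment predicted to have the highest reward for each sample. Recommended treatments are aggregated over the trees in the ensemble, and the treatment that receives the most votes is returned. Use predict_proba to get the fraction of trees in the ensemble that recommend each treatment for each sample.
Return type: array_like of shape (n_samples,)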
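The voting scheme described for predict, and the vote fractions described for predict_proba, can be sketched for a single sample as follows. The per-tree recommendations are made up, both function names are hypothetical, and breaking ties in favor of the lowest treatment index is an assumption.

```python
from collections import Counter


def vote_shares(tree_recommendations, n_treatments):
    """Fraction of trees recommending each treatment for one sample."""
    counts = Counter(tree_recommendations)
    n_trees = len(tree_recommendations)
    return [counts.get(j, 0) / n_trees for j in range(n_treatments)]


def majority_treatment(tree_recommendations, n_treatments):
    """Treatment index with the most votes (ties: lowest index wins)."""
    shares = vote_shares(tree_recommendations, n_treatments)
    return max(range(n_treatments), key=lambda j: shares[j])


# Hypothetical recommendations of 5 trees for one sample, 3 treatments.
recommendations = [2, 0, 2, 1, 2]
shares = vote_shares(recommendations, 3)
best = majority_treatment(recommendations, 3)
```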

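predict_proba(X)

Predict the probability of recommending each treatment.

Parameters

X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
check_input (bool, default True) – Allows bypassing several input checks. Do not use this parameter unless you know what you are doing.

Returns

treatment_proba – The probability of recommending each treatment.

Return type

array_like of shape (n_samples, n_treatments)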

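predict_value(X)

Predict the expected value of each treatment for each sample group.

Parameters: X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
Returns: welfare – The conditional average welfare of each treatment for the group of each sample, where groups are defined by the trees.
Return type: array_like of shape (n_samples, n_treatments)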
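For intuition, the group-level averages can be sketched for a single tree: samples that land in the same leaf form a group, and each treatment's reward is averaged within that group. The helper below is hypothetical and ignores both honest splitting and how values are combined across trees, neither of which is specified above.

```python
def leaf_mean_rewards(leaf_ids, rewards):
    """Average each treatment's reward within every leaf of one tree."""
    sums, counts = {}, {}
    for leaf, y_i in zip(leaf_ids, rewards):
        if leaf not in sums:
            sums[leaf] = [0.0] * len(y_i)
            counts[leaf] = 0
        sums[leaf] = [s + y for s, y in zip(sums[leaf], y_i)]
        counts[leaf] += 1
    return {leaf: [s / counts[leaf] for s in sums[leaf]] for leaf in sums}


# Hypothetical leaf index and per-treatment rewards of 3 samples.
leaf_ids = [1, 1, 2]
rewards = [[0.0, 2.0], [2.0, 4.0], [5.0, 1.0]]
means = leaf_mean_rewards(leaf_ids, rewards)
```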

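set_params(**params)

Set the parameters of this estimator.

This method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter>, so that each component of a nested object can be updated.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance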