econml.grf._base_grf.BaseGRF

class econml.grf._base_grf.BaseGRF(n_estimators=100, *, criterion='mse', max_depth=None, min_samples_split=10, min_samples_leaf=5, min_weight_fraction_leaf=0.0, min_var_fraction_leaf=None, min_var_leaf_on_val=False, max_features='auto', min_impurity_decrease=0.0, max_samples=0.45, min_balancedness_tol=0.45, honest=True, inference=True, fit_intercept=True, subforest_size=4, n_jobs=-1, random_state=None, verbose=0, warm_start=False)

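Bases: econml._ensemble._ensemble.BaseEnsemble

Base class for Generalized Random Forests that solve linear moment equations of the form

E[J * theta(x) - A | X = x] = 0

where J is a (d, d) random matrix, A is a (d, 1) random vector, and theta(x) is a local parameter to be estimated, which may contain both relevant and nuisance parameters.

Warning: This class should not be used directly. Use derived classes instead.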
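To make the moment equation concrete, the following minimal sketch (plain NumPy, not EconML code) solves its unweighted empirical analogue, theta = mean(J)^{-1} mean(A). The choice J = z zᵀ and A = z·y with z = (T, 1) is an illustrative assumption, as are all variable names; derived classes define their own J and A. Under this choice, theta[0] plays the role of a relevant parameter (the effect of T) and theta[1] of a nuisance parameter (the intercept).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
T = rng.normal(size=n)                          # hypothetical treatment
y = 2.0 * T + 1.0 + 0.1 * rng.normal(size=n)    # outcome: effect 2.0, intercept 1.0

z = np.column_stack([T, np.ones(n)])            # z_i = (T_i, 1)
J = z[:, :, None] * z[:, None, :]               # per-sample (d, d) matrices J_i = z_i z_i^T
A = z * y[:, None]                              # per-sample (d,) vectors A_i = z_i * y_i

# Empirical analogue of E[J * theta - A] = 0: solve mean(J) theta = mean(A)
theta = np.linalg.solve(J.mean(axis=0), A.mean(axis=0))
print(theta)  # close to [2.0, 1.0]
```

A forest replaces the plain means with means weighted by how often each training sample shares a leaf with the query point x, which is what makes theta depend on x.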

__init__(n_estimators=100, *, criterion='mse', max_depth=None, min_samples_split=10, min_samples_leaf=5, min_weight_fraction_leaf=0.0, min_var_fraction_leaf=None, min_var_leaf_on_val=False, max_features='auto', min_impurity_decrease=0.0, max_samples=0.45, min_balancedness_tol=0.45, honest=True, inference=True, fit_intercept=True, subforest_size=4, n_jobs=-1, random_state=None, verbose=0, warm_start=False)

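Methods

`__init__`([n_estimators, criterion, ...])
`apply`(X)	Apply the trees in the forest to X and return leaf indices.
`decision_path`(X)	Return the decision path in the forest.
`feature_importances`([max_depth, ...])	Compute feature importances based on the amount of parameter heterogeneity each feature creates. The higher the heterogeneity, the more important the feature. The importance of a feature is computed as the (normalized) total heterogeneity that the feature creates.
`fit`(X, T, y, *[, sample_weight])	Build a forest of trees from the training set (X, T, y) and any other auxiliary variables.
`get_params`([deep])	Get parameters for this estimator.
`get_subsample_inds`()	Re-generate the same sample indices as those used at fit time, using the same pseudo-randomness.
`oob_predict`(Xtrain)	Return the relevant output predictions for each training data point, using only the trees that did not use that data point. This method is not available if the estimator was trained with warm_start=True.
`predict`(X[, interval, alpha])	Return the prefix of relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs].
`predict_alpha_and_jac`(X[, slice, parallel])	Return the conditional Jacobian E[J | X=x] and the conditional alpha E[A | X=x], using the forest as kernel weights.
`predict_and_var`(X)	Return the prefix of relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs], together with their covariance matrix.
`predict_full`(X[, interval, alpha])	Return the fitted local parameters for each x in X, i.e. theta(x).
`predict_interval`(X[, alpha])	Return the confidence interval for the relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs].
`predict_moment_and_var`(X, parameter[, ...])	Return the value of the conditional expected moment vector at each sample, evaluated at the given parameter estimate for that sample.
`predict_projection`(X, projector)	Return the inner product of the prefix of relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs], with a projection vector projector(x).
`predict_projection_and_var`(X, projector)	Return the inner product of the prefix of relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs], with a projection vector projector(x), together with its variance.
`predict_projection_var`(X, projector)	Return the variance of the inner product of the prefix of relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs], with a projection vector projector(x).
`predict_tree_average`(X)	Return the prefix of relevant fitted local parameters for each X, i.e. theta(X)[1..n_relevant_outputs]. This method simply returns the average of the parameters estimated by each tree. predict should be preferred over predict_tree_average, as it performs a more stable averaging across trees.
`predict_tree_average_full`(X)	Return the fitted local parameters for each X, i.e. theta(X). This method simply returns the average of the parameters estimated by each tree. predict_full should be preferred over predict_tree_average_full, as it performs a more stable averaging across trees.
`predict_var`(X)	Return the covariance matrix of the prefix of relevant fitted local parameters for each x in X.
`prediction_stderr`(X)	Return the standard deviation of each coordinate of the prefix of relevant fitted local parameters for each x in X.
`set_params`(**params)	Set the parameters of this estimator.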

Attributes

feature_importances_

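apply(X)

Apply the trees in the forest to X and return leaf indices.

Parameters: X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
Returns: X_leaves – For each datapoint x in X and for each tree in the forest, the index of the leaf that x ends up in.
Return type: ndarray of shape (n_samples, n_estimators)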

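decision_path(X)

Return the decision path in the forest.

Parameters

X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.

Returns

indicator (sparse matrix of shape (n_samples, n_nodes)) – A node indicator matrix whose non-zero elements indicate that the sample goes through the corresponding node. The matrix is in CSR format.
n_nodes_ptr (ndarray of shape (n_estimators + 1,)) – The columns indicator[n_nodes_ptr[i]:n_nodes_ptr[i+1]] give the indicator values for the i-th estimator.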

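feature_importances(max_depth=4, depth_decay_exponent=2.0)

Compute feature importances based on the amount of parameter heterogeneity each feature creates. The higher the heterogeneity, the more important the feature. The importance of a feature is computed as the (normalized) total heterogeneity that the feature creates. For each tree and for each split on that feature, the quantity

parent_weight * (left_weight * right_weight)
    * mean((value_left[k] - value_right[k])**2) / parent_weight**2

is added to the importance of the feature. Each such quantity is also weighted by the depth of the split. These importances are normalized at the tree level and then averaged across trees.

Parameters

max_depth (int, default 4) – Splits of depth larger than max_depth are not used in this calculation.
depth_decay_exponent (double, default 2.0) – The contribution of each split to the total score is re-weighted by 1 / (1 + depth)**2.0.

Returns

feature_importances_ – The importance of each feature, measured as the normalized total parameter heterogeneity it induces.

Return type

ndarray of shape (n_features,)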
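The per-split contribution described above can be sketched in a few lines of plain Python. The function name `split_contribution` and the toy numbers are hypothetical, and the sketch assumes that the depth weight generalizes to 1 / (1 + depth)**depth_decay_exponent; it leaves out the per-tree normalization and the averaging across trees.

```python
def split_contribution(parent_weight, left_weight, right_weight,
                       value_left, value_right, depth,
                       depth_decay_exponent=2.0):
    """Heterogeneity created by a single split, re-weighted by its depth."""
    sq_diffs = [(l - r) ** 2 for l, r in zip(value_left, value_right)]
    mean_sq_diff = sum(sq_diffs) / len(sq_diffs)
    raw = (parent_weight * (left_weight * right_weight)
           * mean_sq_diff / parent_weight ** 2)
    return raw / (1 + depth) ** depth_decay_exponent


# Hypothetical split of 100 samples into 40 / 60, with two-dimensional node values.
root = split_contribution(100.0, 40.0, 60.0, [1.0, 0.0], [3.0, 0.0], depth=0)
deeper = split_contribution(100.0, 40.0, 60.0, [1.0, 0.0], [3.0, 0.0], depth=1)
print(root, deeper)  # 48.0 12.0
```

With the default exponent, the same split counts a quarter as much at depth 1 as at the root, so features used near the top of the trees dominate the score.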

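fit(X, T, y, *, sample_weight=None, **kwargs)

Build a forest of trees from the training set (X, T, y) and any other auxiliary variables.

Parameters

X (array_like of shape (n_samples, n_features)) – The training input samples. Internally, its dtype will be converted to dtype=np.float64.
T (array_like of shape (n_samples, n_treatments)) – The treatment vector for each sample.
y (array_like of shape (n_samples,) or (n_samples, n_outcomes)) – The outcome values for each sample.
sample_weight (array_like of shape (n_samples,), default None) – Sample weights. If None, samples are equally weighted. When searching for a split in each node, splits that would create child nodes with net zero or negative weight are ignored.
**kwargs (dictionary of array_like items of shape (n_samples, d_var)) – Auxiliary random variables that enter the moment function (e.g. instrument, censoring, etc.). Each of these variables is passed as-is to the get_pointJ and get_alpha methods of the child classes.

Returns

self

Return type

object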

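get_params(deep=True)

Get parameters for this estimator.

Parameters: deep (bool, default=True) – If True, return the parameters for this estimator and for contained subobjects that are estimators.
Returns: params – Parameter names mapped to their values.
Return type: dict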

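get_subsample_inds()

Re-generate the same sample indices as those used at fit time, using the same pseudo-randomness.

oob_predict(Xtrain)

Return the relevant output predictions for each training data point, using only the trees that did not use that data point. This method is not available if the estimator was trained with warm_start=True.

Parameters: Xtrain ((n_training_samples, n_features) matrix) – Must be exactly the same X matrix that was passed to the forest at fit time.
Returns: oob_preds – The out-of-bag predictions of the relevant output parameters for each training point.
Return type: (n_training_samples, n_relevant_outputs) matrix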
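The out-of-bag idea can be illustrated with a simplified sketch in plain Python. It assumes each tree exposes a scalar prediction per training point and a set of subsample indices (the kind of information get_subsample_inds regenerates); the helper `oob_average` and the toy data are hypothetical, and the library may aggregate trees differently from this plain average.

```python
def oob_average(tree_preds, subsample_inds, n_samples):
    """Average per-tree predictions over the trees that did not train on each point.

    tree_preds[t][i]  : prediction of tree t for training point i
    subsample_inds[t] : set of training indices used to fit tree t
    """
    out = []
    for i in range(n_samples):
        oob = [preds[i] for preds, inds in zip(tree_preds, subsample_inds)
               if i not in inds]
        # A point used by every tree has no out-of-bag prediction.
        out.append(sum(oob) / len(oob) if oob else float("nan"))
    return out


tree_preds = [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5], [0.5, 1.5, 2.5]]
subsample_inds = [{0, 1}, {1, 2}, {0, 2}]
oob = oob_average(tree_preds, subsample_inds, 3)
print(oob)  # [1.5, 1.5, 3.0]
```

Because the result depends on matching row i of Xtrain to the indices drawn at fit time, the rows must be in the same order as in the matrix passed to fit.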

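predict(X, interval=False, alpha=0.05)

Return the prefix of relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs].

Parameters

X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
interval (bool, default False) – Whether to also return a confidence interval.
alpha (float in (0, 1), default 0.05) – The significance level of the confidence interval. A symmetric (alpha/2, 1-alpha/2) confidence interval is returned, so the default gives a 95% interval.

Returns

theta(X)[1, .., n_relevant_outputs] (array_like of shape (n_samples, n_relevant_outputs)) – The estimated relevant parameters for each row of X.
lb(x), ub(x) (array_like of shape (n_samples, n_relevant_outputs)) – The lower and upper ends of the confidence interval for each parameter. Omitted if interval=False.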

predict_alpha_and_jac(X, slice=None, parallel=True)

Return the conditional Jacobian E[J | X=x] and the conditional alpha E[A | X=x], using the forest as kernel weights, i.e.

alpha(x) = (1/n_trees) sum_{trees} (1/ |leaf(x)|) sum_{val sample i in leaf(x)} w[i] A[i]
jac(x) = (1/n_trees) sum_{trees} (1/ |leaf(x)|) sum_{val sample i in leaf(x)} w[i] J[i]

where w[i] is the sample weight (1.0 if sample_weight is None).

Parameters

X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
slice (list of int or None, default None) – If not None, only the trees whose index is in slice are used to compute the mean and the variance.
parallel (bool, default True) – Whether the averaging should happen using parallelism. Parallelism adds some overhead but is faster when there are many trees.

Returns

alpha (array_like of shape (n_samples, n_outputs)) – The estimated conditional A, alpha(x), for each sample x in X.
jac (array_like of shape (n_samples, n_outputs, n_outputs)) – The estimated conditional J, jac(x), for each sample x in X.
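The two formulas above can be traced by hand for a single query point x. The sketch below is plain NumPy, not the library's implementation; `leaf_members`, which lists for each tree the validation samples sharing x's leaf, and the toy A, J and w arrays are all hypothetical.

```python
import numpy as np


def alpha_and_jac(leaf_members, A, J, w):
    """Forest-weighted alpha(x) and jac(x) for one query point x.

    leaf_members[t] holds the indices of the samples sharing x's leaf in tree t.
    """
    alphas = [(w[idx][:, None] * A[idx]).sum(axis=0) / len(idx)
              for idx in leaf_members]
    jacs = [(w[idx][:, None, None] * J[idx]).sum(axis=0) / len(idx)
            for idx in leaf_members]
    # Average the per-tree leaf means across trees.
    return np.mean(alphas, axis=0), np.mean(jacs, axis=0)


A = np.array([[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]])        # (n, d) per-sample A[i]
J = np.stack([np.eye(2) * s for s in (1.0, 2.0, 3.0)])    # (n, d, d) per-sample J[i]
w = np.ones(3)                                            # sample weights
leaf_members = [np.array([0, 1]), np.array([1, 2])]       # two trees

alpha, jac = alpha_and_jac(leaf_members, A, J, w)
print(alpha)  # [3. 2.]
print(jac)    # 2 * identity
```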

predict_and_var(X)

Return the prefix of relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs], together with their covariance matrix.

Parameters

X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.

Returns

theta(x)[1, .., n_relevant_outputs] (array_like of shape (n_samples, n_relevant_outputs)) – The estimated relevant parameters for each row of X.
var(theta(x)) (array_like of shape (n_samples, n_relevant_outputs, n_relevant_outputs)) – The covariance of theta(x)[1, .., n_relevant_outputs].

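predict_full(X, interval=False, alpha=0.05)

Return the fitted local parameters for each x in X, i.e. theta(x).

Parameters

X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
interval (bool, default False) – Whether to also return a confidence interval.
alpha (float in (0, 1), default 0.05) – The significance level of the confidence interval. A symmetric (alpha/2, 1-alpha/2) confidence interval is returned, so the default gives a 95% interval.

Returns

theta(x) (array_like of shape (n_samples, n_outputs)) – The estimated parameters for each row x of X.
lb(x), ub(x) (array_like of shape (n_samples, n_outputs)) – The lower and upper ends of the confidence interval for each parameter. Omitted if interval=False.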

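predict_interval(X, alpha=0.05)

Return the confidence interval for the relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs].

Parameters

X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
alpha (float in (0, 1), default 0.05) – The significance level of the confidence interval. A symmetric (alpha/2, 1-alpha/2) confidence interval is returned, so the default gives a 95% interval.

Returns

lb(x), ub(x) – The lower and upper ends of the confidence interval for each parameter.

Return type

array_like of shape (n_samples, n_relevant_outputs)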
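As a rough illustration of what a symmetric (alpha/2, 1-alpha/2) interval means, the sketch below builds one from a point estimate and a standard error using only the Python standard library. It assumes a normal approximation, which this page does not state explicitly; `normal_interval` is a hypothetical helper, not part of the library.

```python
from statistics import NormalDist


def normal_interval(point, stderr, alpha=0.05):
    """Symmetric (alpha/2, 1 - alpha/2) interval under a normal approximation."""
    dist = NormalDist(mu=point, sigma=stderr)
    return dist.inv_cdf(alpha / 2), dist.inv_cdf(1 - alpha / 2)


# Hypothetical estimate theta(x)[1] = 1.0 with standard error 0.5.
lb, ub = normal_interval(point=1.0, stderr=0.5, alpha=0.05)
print(round(lb, 3), round(ub, 3))  # 0.02 1.98
```

A smaller alpha widens the interval: alpha=0.01 would use the 0.5% and 99.5% quantiles instead.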

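predict_moment_and_var(X, parameter, slice=None, parallel=True)

Return the value of the conditional expected moment vector at each sample, evaluated at the given parameter estimate for that sample:

M(x; theta(x)) := E[J | X=x] theta(x) - E[A | X=x]

where the conditional expectations are estimated with the forest weights, i.e.

M_tree(x; theta(x)) := (1/ |leaf(x)|) sum_{val sample i in leaf(x)} w[i] (J[i] theta(x) - A[i])
M(x; theta(x)) = (1/n_trees) sum_{trees} M_tree(x; theta(x))

where w[i] is the sample weight (1.0 if sample_weight is None). The method also returns the variance of the local moment vector across trees:

Var(M_tree(x; theta(x))) = (1/n_trees) sum_{trees} M_tree(x; theta(x)) @ M_tree(x; theta(x)).T

Parameters

X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
parameter (array_like of shape (n_samples, n_outputs)) – An estimate of the parameter theta(x) for each sample x in X.
slice (list of int or None, default None) – If not None, only the trees whose index is in slice are used to compute the mean and the variance.
parallel (bool, default True) – Whether the averaging should happen using parallelism. Parallelism adds some overhead but is faster when there are many trees.

Returns

moment (array_like of shape (n_samples, n_outputs)) – The estimated conditional moment M(x; theta(x)) for each sample x in X.
moment_var (array_like of shape (n_samples, n_outputs)) – The variance of the conditional moment Var(M_tree(x; theta(x))) across trees for each sample x in X.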
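The formulas above can be followed step by step for a single query point. This NumPy sketch is illustrative only: `moment_and_var`, `leaf_members` and the toy arrays are hypothetical, and the function returns the (d, d) matrix exactly as written in the displayed Var formula.

```python
import numpy as np


def moment_and_var(leaf_members, A, J, w, theta):
    """Forest moment M(x; theta) and its across-tree second moment for one x."""
    m_trees = []
    for idx in leaf_members:                       # one entry per tree
        resid = J[idx] @ theta - A[idx]            # rows are J[i] theta - A[i]
        m_trees.append((w[idx][:, None] * resid).sum(axis=0) / len(idx))
    m_trees = np.array(m_trees)                    # (n_trees, d)
    moment = m_trees.mean(axis=0)
    # (1/n_trees) * sum over trees of M_tree @ M_tree.T
    var = np.einsum("ti,tj->ij", m_trees, m_trees) / len(m_trees)
    return moment, var


A = np.array([[1.0, 0.0], [3.0, 2.0], [5.0, 4.0]])        # (n, d) per-sample A[i]
J = np.stack([np.eye(2) * s for s in (1.0, 2.0, 3.0)])    # (n, d, d) per-sample J[i]
w = np.ones(3)
leaf_members = [np.array([0, 1]), np.array([1, 2])]       # two trees
theta = np.array([1.5, 1.0])                              # candidate theta(x)

moment, var = moment_and_var(leaf_members, A, J, w, theta)
print(moment)  # [0. 0.]
print(var)
```

Here the two per-tree moments are (0.25, 0.5) and (-0.25, -0.5): they cancel in the forest average, so theta solves the forest moment equation, while the second moment across trees stays positive and records how much the trees disagree.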

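predict_projection(X, projector)

Return the inner product of the prefix of relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs], with a projection vector projector(x), i.e.

mu(x) := <theta(x)[1..n_relevant_outputs], projector(x)>

Parameters

X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
projector (array_like of shape (n_samples, n_relevant_outputs)) – The projection vector for each sample x in X.

Returns

mu(x) – The estimated inner product of the relevant parameters with the projection vector, for each row x of X.

Return type

array_like of shape (n_samples, 1)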

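predict_projection_and_var(X, projector)

Return the inner product of the prefix of relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs], with a projection vector projector(x), i.e.

mu(x) := <theta(x)[1..n_relevant_outputs], projector(x)>

together with the variance of mu(x).

Parameters

X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
projector (array_like of shape (n_samples, n_relevant_outputs)) – The projection vector for each sample x in X.

Returns

mu(x) (array_like of shape (n_samples, 1)) – The estimated inner product of the relevant parameters with the projection vector, for each row x of X.
var(mu(x)) (array_like of shape (n_samples, 1)) – The variance of the estimated inner product.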
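For intuition, the projection and its variance can be reproduced from a point estimate and a covariance matrix with plain NumPy. The sketch uses the standard identity Var(<theta, p>) = pᵀ Cov(theta) p for a fixed vector p; whether the library computes the variance exactly this way is an assumption, and the arrays are made-up toy values.

```python
import numpy as np

# Hypothetical fitted values for two samples and two relevant outputs.
theta = np.array([[1.0, 2.0], [0.5, -1.0]])               # (n_samples, k)
cov = np.array([[[0.04, 0.01], [0.01, 0.09]],
                [[0.01, 0.00], [0.00, 0.16]]])            # (n_samples, k, k)
projector = np.array([[1.0, 1.0], [2.0, 0.0]])            # (n_samples, k)

# mu(x) = <theta(x), projector(x)>, one inner product per row
mu = np.einsum("ij,ij->i", theta, projector)[:, None]     # (n_samples, 1)
# Var(mu(x)) = projector(x)^T Cov(theta(x)) projector(x)
var_mu = np.einsum("ij,ijk,ik->i", projector, cov, projector)[:, None]

print(mu.ravel())      # [3. 1.]
print(var_mu.ravel())  # [0.15 0.04]
```

A projector equal to a unit vector simply picks out one coordinate of theta(x) and the matching diagonal entry of the covariance.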

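predict_projection_var(X, projector)

Return the variance of the inner product of the prefix of relevant fitted local parameters for each x in X, i.e. theta(x)[1..n_relevant_outputs], with a projection vector projector(x), i.e.

Var(mu(x)) for mu(x) := <theta(x)[1..n_relevant_outputs], projector(x)>

Parameters

X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
projector (array_like of shape (n_samples, n_relevant_outputs)) – The projection vector for each sample x in X.

Returns

var(mu(x)) – The variance of the estimated inner product.

Return type

array_like of shape (n_samples, 1)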

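predict_tree_average(X)

Return the prefix of relevant fitted local parameters for each X, i.e. theta(X)[1..n_relevant_outputs]. This method simply returns the average of the parameters estimated by each tree. predict should be preferred over predict_tree_average, as it performs a more stable averaging across trees.

Parameters: X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
Returns: theta(X)[1, .., n_relevant_outputs] – The estimated relevant parameters for each row of X.
Return type: array_like of shape (n_samples, n_relevant_outputs)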
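One way to see why the two kinds of averaging can differ is a toy comparison in NumPy. The sketch assumes that the "more stable" average pools the per-tree J and A estimates before solving once, whereas the tree average solves within each tree and then averages the solutions; this reading, and the numbers, are assumptions made for illustration.

```python
import numpy as np

# Hypothetical per-tree local estimates of J (d x d) and A (d,) at one point x, with d = 1.
J_trees = np.array([[[2.0]], [[0.5]]])
A_trees = np.array([[2.0], [1.0]])

# Tree average: solve J theta = A within each tree, then average the solutions.
theta_tree_avg = np.mean(
    [np.linalg.solve(J, A) for J, A in zip(J_trees, A_trees)], axis=0
)

# Pooled: average J and A across trees first, then solve a single system.
theta_pooled = np.linalg.solve(J_trees.mean(axis=0), A_trees.mean(axis=0))

print(theta_tree_avg, theta_pooled)  # [1.5] [1.2]
```

A tree whose local J is close to singular can produce an extreme per-tree solution and dominate the tree average, while pooling first dampens that effect.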

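predict_tree_average_full(X)

Return the fitted local parameters for each X, i.e. theta(X). This method simply returns the average of the parameters estimated by each tree. predict_full should be preferred over predict_tree_average_full, as it performs a more stable averaging across trees.

Parameters: X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
Returns: theta(X) – The estimated parameters for each row of X.
Return type: array_like of shape (n_samples, n_outputs)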

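predict_var(X)

Return the covariance matrix of the prefix of relevant fitted local parameters for each x in X.

Parameters: X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
Returns: var(theta(x)) – The covariance of theta(x)[1, .., n_relevant_outputs].
Return type: array_like of shape (n_samples, n_relevant_outputs, n_relevant_outputs)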

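prediction_stderr(X)

Return the standard deviation of each coordinate of the prefix of relevant fitted local parameters for each x in X.

Parameters: X (array_like of shape (n_samples, n_features)) – The input samples. Internally, it will be converted to dtype=np.float64.
Returns: std(theta(x)) – The standard deviation of theta(x)[i] for each i in {1, .., n_relevant_outputs}.
Return type: array_like of shape (n_samples, n_relevant_outputs)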
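The relationship between a per-sample covariance matrix and per-coordinate standard deviations can be sketched in NumPy: the standard deviations are the square roots of the diagonal. The `cov` array below is a made-up stand-in for a covariance of shape (n_samples, n_relevant_outputs, n_relevant_outputs); that the library derives its standard errors exactly like this is an assumption.

```python
import numpy as np

# Hypothetical covariance matrices for two samples and two relevant outputs.
cov = np.array([[[0.04, 0.01], [0.01, 0.09]],
                [[0.01, 0.00], [0.00, 0.16]]])

# Take the diagonal of each (k, k) matrix, then the square root.
stderr = np.sqrt(np.diagonal(cov, axis1=1, axis2=2))      # (n_samples, k)
print(stderr)  # rows: [0.2 0.3] and [0.1 0.4]
```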

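set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter>, so that each component of a nested object can be updated.

Parameters: **params (dict) – Estimator parameters.
Returns: self – Estimator instance.
Return type: estimator instance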