broom系列包: 整理模型输出结果

broom包

说明

tidy、augment和glance函数的输出总是一个小tibble。
输出从来没有行名。这确保了您可以将它与其他整洁的输出组合在一起，而不用担心丢失信息(因为R中的行名不能包含重复)。
有些列名保持一致，这样它们就可以跨不同的模型进行组合。

tidy()

输出中的每一行通常代表一些定义良好的概念，例如回归中的一个术语、一个测试或一个集群/类。
常见的列名包括:

term: 被估计的回归或模型中的名称
p.value: 为了与R的内置统计包中的函数保持一致，选择了这个拼写(而不是常见的替代方法，如pvalue、pvalue或pval)
statistic: 统计量一种检验统计量，通常用于计算p值。在许多亚组中结合这些是进行(例如)自助假设检验的可靠方法估计
conf.low： 估计值的置信区间下限
conf.high： 估计值的置信区间上限
df：自由度

augment()

augment(model, data) 向原始数据添加列。
如果数据参数缺失，则从模型重构数据的 augment 尝试(注意，这可能并不总是可行的，并且通常不会包含模型中未使用的列)。
augment() 输出中的每一行都与原始数据中的相应行匹配。
如果原始数据包含行名，则augment()将它们转换为名为.rownames的列以避免覆盖原始数据
常见的列名包括:

.fitted: 预测值，与数据在同一尺度上。
.resid: 残差实际y值减去拟合值
.cluster:集群分配

glance()

glance()总是返回一个单行tibble。

唯一的例外是glance(NULL)返回一个空tibble。

我们避免包含给建模函数的参数。例如，glm glance输出不需要包含family字段，因为这是由用户调用glm而不是建模函数本身决定的。

常见的列名包括:

r.squared: 由模型解释的方差分数
adj.r.squared: 根据自由度调整的平方R2
sigma: 残差估计方差的平方根

broom.mixed包

函数解释

各类模型能够调用的函数

class	tidy	glance	augment
allFit	TRUE	TRUE	FALSE
brmsfit	TRUE	TRUE	TRUE
gamlss	TRUE	TRUE	FALSE
gamm4	TRUE	TRUE	TRUE
glmmadmb	TRUE	TRUE	TRUE
glmmTMB	TRUE	TRUE	TRUE
gls	TRUE	TRUE	TRUE
lme	TRUE	TRUE	TRUE
lmList4	TRUE	FALSE	FALSE
mcmc	TRUE	FALSE	FALSE
mcmc.list	TRUE	FALSE	FALSE
MCMCglmm	TRUE	FALSE	FALSE
merMod	TRUE	TRUE	TRUE
MixMod	TRUE	FALSE	FALSE
ranef.mer	FALSE	FALSE	TRUE
rjags	TRUE	FALSE	FALSE
rlmerMod	TRUE	FALSE	FALSE
stanfit	TRUE	FALSE	FALSE
stanreg	TRUE	TRUE	FALSE
TMB	TRUE	FALSE	FALSE
varComb	TRUE	FALSE	FALSE
varFunc	TRUE	FALSE	FALSE

broom.helper包

功能：整理数据+绘制森林图

结果输出的每列需要调用的函数

Column 列	Function 功能	Description 描述
original_term	tidy_disambiguate_terms(), tidy_multgee() or tidy_zeroinfl()	Original term before disambiguation. This columns is added only when disambiguation is needed (i.e. for mixed models). Also used for “multgee”, “zeroinfl” and “hurdle” models. 消歧前的原始术语。只有在需要消除歧义时(例如，对于混合模型)才添加此列。也用于“ multgee”，“ zeroinfl”和“ hurdle”模型。
variable	tidy_identify_variables()	String of variable names from the model. For categorical variables and polynomial terms defined with stats::poly(), terms belonging to the variable are identified. 模型中的变量名字符串。对于分类变量和多项式术语定义的统计: : 聚() ，属于变量的术语被识别。
var_class	tidy_identify_variables()	Class of the variable. 变量的类。
var_type	tidy_identify_variables()	One of “intercept”, “continuous”, “dichotomous”, “categorical”, “interaction”, “ran_pars” or “ran_vals” “拦截”、“连续”、“二分法”、“范畴”、“互动”、“运行”或“运行”之一
var_nlevels	tidy_identify_variables()	Number of original levels for categorical variables 分类变量的原始级数
contrasts	tidy_add_contrasts() ()	Contrasts used for categorical variables. Require “variable” column. If needed, will automatically apply tidy_identify_variables(). 对比用于分类变量。需要“变量”列。如果需要，将自动应用整齐 _ 标识 _ 变量()。
contrasts_type	tidy_add_contrasts() ()	Type of contrasts (“treatment”, “sum”, “poly”, “helmert”, “sdif”, “other” or “no.contrast”). “pairwise is used for pairwise contrasts computed with tidy_add_pairwise_contrasts(). 对比的类型(“处理”，“总和”，“多边形”，“赫尔默特”，“ sdif”，“其他”或“不对比”)。“成对对比用于成对对比，用于计算整齐的tidy_add_pairwise_contrasts()
reference_row	tidy_add_reference_rows() ()	Logical indicating if a row is a reference row for categorical variables using a treatment or a sum contrast. Is equal to NA for variables who do not have a reference row. Require “contrasts” column. If needed, will automatically apply tidy_add_contrasts(). tidy_add_reference_rows() will not populate the label of the reference term. It is therefore better to apply tidy_add_term_labels() after tidy_add_reference_rows() rather than before. 使用处理或和对比表示行是否为分类变量的引用行的逻辑。等于没有引用行的变量的 NA。需要“对比度”列。如果需要，将自动应用 tidy_add_contrasts()。tidy_add_reference_rows() 将不会填充引用术语的标签。因此，在tidy_add_term_labels()之后应用 tidy_add_reference_rows()比之前更好。
var_label	tidy_add_variable_labels()	String of variable labels from the model. Columns labelled with the labelled package are retained. It is possible to pass a custom label for an interaction term with the labels argument. Require “variable” column. If needed, will automatically apply tidy_identify_variables(). 模型中的变量标签字符串。保留标有标签的包装的栏。可以为带有 label 参数的交互项传递自定义标签。需要“变量”列。如果需要的话，将自动应用奖励的整齐 _ 标识 _ 变量()。
label 标签	tidy_add_term_labels() ()	String of term labels based on (1) labels provided in labels argument if provided; (2) factor levels for categorical variables coded with treatment, SAS or sum contrasts; (3) variable labels when there is only one term per variable; and (4) term name otherwise. Require “variable_label” column. If needed, will automatically apply tidy_add_variable_labels(). Require “contrasts” column. If needed, will automatically apply tidy_add_contrasts(). 基于(1)标签参数中提供的标签的术语标签串; (2)用处理、 SAS 或和对比编码的分类变量的因子水平; (3)每个变量只有一个术语时的变量标签; 以及(4)其他术语名称。需要“ variable _ label”列。如果需要，将自动应用 tidy_add_variable_labels().。需要“对比度”列。如果需要，将自动应用tidy_add_contrasts().
header_row	tidy_add_header_rows() ()	Logical indicating if a row is a header row for variables with several terms. Is equal to NA for variables who do not have an header row.Require “label” column. If needed, will automatically apply tidy_add_term_labels(). It is better to apply tidy_add_header_rows() after other tidy_* functions 逻辑指示一行是否为具有多个术语的变量的标题行。等于没有标题行的变量的 NA。需要“标签”列。如果需要，将自动应用 tidy_add_term_labels()。最好是在其他tidy_* 函数之后应用 tidy_add_header_rows()
n_obs	tidy_add_n() ()	Number of observations 观察次数
n_event	tidy_add_n() ()	Number of events (for binomial and multinomial logistic models, Poisson and Cox models) 事件数(二项式和多项式逻辑模型，泊松和考克斯模型)
exposure	tidy_add_n() ()	Exposure time (for Poisson and Cox models) 暴露时间(泊松和考克斯模型) Add the (weighted) number of observations — tidy_add_n • broom.helpers (larmarange.github.io)

Column

列

Function

功能

Description

描述

original_term

tidy_disambiguate_terms(), tidy_multgee() or tidy_zeroinfl()

Original term before disambiguation. This columns is added only when disambiguation is needed (i.e. for mixed models). Also used for “multgee”, “zeroinfl” and “hurdle” models.

消歧前的原始术语。只有在需要消除歧义时(例如，对于混合模型)才添加此列。也用于“ multgee”，“ zeroinfl”和“ hurdle”模型。

variable

tidy_identify_variables()

String of variable names from the model. For categorical variables and polynomial terms defined with stats::poly(), terms belonging to the variable are identified.

模型中的变量名字符串。对于分类变量和多项式术语定义的统计: : 聚() ，属于变量的术语被识别。

var_class

tidy_identify_variables()

Class of the variable.

变量的类。

var_type

tidy_identify_variables()

One of “intercept”, “continuous”, “dichotomous”, “categorical”, “interaction”, “ran_pars” or “ran_vals”

“拦截”、“连续”、“二分法”、“范畴”、“互动”、“运行”或“运行”之一

var_nlevels

tidy_identify_variables()

Number of original levels for categorical variables

分类变量的原始级数

contrasts

tidy_add_contrasts()

()

Contrasts used for categorical variables.
Require “variable” column. If needed, will automatically apply tidy_identify_variables().

对比用于分类变量。需要“变量”列。如果需要，将自动应用整齐 _ 标识 _ 变量()。

contrasts_type

tidy_add_contrasts()

()

Type of contrasts (“treatment”, “sum”, “poly”, “helmert”, “sdif”, “other” or “no.contrast”). “pairwise is used for pairwise contrasts computed with tidy_add_pairwise_contrasts().

对比的类型(“处理”，“总和”，“多边形”，“赫尔默特”，“ sdif”，“其他”或“不对比”)。“成对对比用于成对对比，用于计算整齐的tidy_add_pairwise_contrasts()

reference_row

tidy_add_reference_rows()

()

Logical indicating if a row is a reference row for categorical variables using a treatment or a sum contrast. Is equal to NA for variables who do not have a reference row.
Require “contrasts” column. If needed, will automatically apply tidy_add_contrasts().
tidy_add_reference_rows() will not populate the label of the reference term. It is therefore better to apply tidy_add_term_labels() after tidy_add_reference_rows() rather than before.

使用处理或和对比表示行是否为分类变量的引用行的逻辑。等于没有引用行的变量的 NA。需要“对比度”列。如果需要，将自动应用 tidy_add_contrasts()。tidy_add_reference_rows() 将不会填充引用术语的标签。因此，在tidy_add_term_labels()之后应用 tidy_add_reference_rows()比之前更好。

var_label

tidy_add_variable_labels()

String of variable labels from the model. Columns labelled with the labelled package are retained. It is possible to pass a custom label for an interaction term with the labels argument.
Require “variable” column. If needed, will automatically apply tidy_identify_variables().

模型中的变量标签字符串。保留标有标签的包装的栏。可以为带有 label 参数的交互项传递自定义标签。需要“变量”列。如果需要的话，将自动应用奖励的整齐 _ 标识 _ 变量()。

label

标签

tidy_add_term_labels()

()

String of term labels based on (1) labels provided in labels argument if provided; (2) factor levels for categorical variables coded with treatment, SAS or sum contrasts; (3) variable labels when there is only one term per variable; and (4) term name otherwise.
Require “variable_label” column. If needed, will automatically apply tidy_add_variable_labels().
Require “contrasts” column. If needed, will automatically apply tidy_add_contrasts().

基于(1)标签参数中提供的标签的术语标签串; (2)用处理、 SAS 或和对比编码的分类变量的因子水平; (3)每个变量只有一个术语时的变量标签; 以及(4)其他术语名称。需要“ variable _ label”列。如果需要，将自动应用 tidy_add_variable_labels().。需要“对比度”列。如果需要，将自动应用tidy_add_contrasts().

header_row

tidy_add_header_rows()

()

Logical indicating if a row is a header row for variables with several terms. Is equal to NA for variables who do not have an header row.Require “label” column. If needed, will automatically apply tidy_add_term_labels().
It is better to apply tidy_add_header_rows() after other tidy_* functions

逻辑指示一行是否为具有多个术语的变量的标题行。等于没有标题行的变量的 NA。需要“标签”列。如果需要，将自动应用 tidy_add_term_labels()。最好是在其他tidy_* 函数之后应用 tidy_add_header_rows()

n_obs

tidy_add_n()

()

Number of observations

观察次数

n_event

tidy_add_n()

()

Number of events (for binomial and multinomial logistic models, Poisson and Cox models)

事件数(二项式和多项式逻辑模型，泊松和考克斯模型)

exposure

tidy_add_n()

()

Exposure time (for Poisson and Cox models)

暴露时间(泊松和考克斯模型)

Add the (weighted) number of observations — tidy_add_n • broom.helpers (larmarange.github.io)

各类属性调整时，需要调用的函数

Attribute 属性	Function 函数	Description 功能
exponentiate	tidy_and_attach	估计是否为指数形式	Estimates were exponentiated
conf.level	tidy_and_attach	置信区间的置信度	Level of confidence for confidence intervals
coefficients_type	tidy_add_coefficients_type	系数类型	Type of coefficients
coefficients_label	tidy_add_coefficients_type	系数标签	Coefficients label
variable_labels	tidy_add_variable_labels	将自定义变量标签传递给 tidy_add_variable_labels	Custom variable labels passed to tidy_add_variable_labels
term_labels	tidy_add_term_labels	将自定义术语标签传递给tidy_add_term_labels	Custom term labels passed to tidy_add_term_labels
N_obs	tidy_add_n	观测总数	Total number of observations
N_event	tidy_add_n	事件总数	Total number of events
Exposure	tidy_add_n	总暴露时间	Total of exposure time
component	tidy_zeroinfl	参数传递到tidy_zeroinfl()	component argument passed to tidy_zeroinfl()

支持这些模型的模型整理任务

回归模型
- 所有模型汇总
  
  betareg::betareg()、biglm::bigglm()、biglmm::bigglm()、brms::brm()、fixest::feglm()、fixest::femlm()、fixest::feNmlm()、fixest::feols()、glmmTMB::glmmTMB()、gam::gam()、geepack::geeglm()、logitr::logitr()、MASS::glm.nb()、MASS::polr()、mgcv::gam()、mice::mira()、mmrm::mmrm()、nnet::multinom()、ordinal::clm()、ordinal::clmm()、rstanarm::stan_glm()、stats::glm()、plm::plm()、stats::lm()、stats::nls()
- Beta Regression 模型（Beta 回归模型）
  - betareg::betareg(): Beta 回归模型是一种用于处理介于 0 和 1 之间的连续百分比或概率型数据的回归模型。它适合模拟遵循 Beta 分布的数据。
- Generalized Linear Model（广义线性模型）
  - stats::glm(): 广义线性模型适用于探索因变量和自变量之间的关系，可根据因变量的分布选择不同的链接函数。
  - MASS::glm.nb(): 负二项广义线性模型，用于处理计数型数据，适合模拟计数数据的过度离散性。
  - MASS::polr(): 有序 logistic 回归模型，用于处理有序分类数据。
- Generalized Additive Model（广义加性模型）
  - gam::gam(): 广义加性模型通过非参数函数来拟合数据的非线性关系，适合处理复杂数据结构。
  - mgcv::gam(): 广义加性模型的具体实现，用于拟合非线性样条函数，可以处理连续和分类预测变量。
- Mixed Effects Model（混合效应模型）
  - biglm::bigglm() 和 biglmm::bigglm(): 用于大数据集的广义线性混合效应模型，适合应对大规模数据的建模需求。
  - brms::brm(): 贝叶斯混合效应模型，用于探究固定效应和随机效应在数据中的作用。
- Fixed Effects Model（固定效应模型）
  - fixest::feglm(), fixest::femlm(), fixest::feNmlm(), fixest::feols(): 固定效应模型用于处理面板数据，通过固定效应消除个体间的异质性。
- Generalized Estimating Equations（广义估计方程）
  - geepack::geeglm(): 广义估计方程模型，用于处理相关数据和重复测量数据，适合考虑数据相关性的分析。
- Generalized Additive Mixed Model（广义加性混合模型）
  - glmmTMB::glmmTMB(): 广义加性混合模型，结合了非线性关系和随机效应的建模方法。
- Logistic Regression（逻辑回归模型）
  - logitr::logitr(): 逻辑回归模型适用于二分类或多分类问题的建模和预测。
  - nnet::multinom(): 多项 logistic 回归模型，用于多分类问题的建模和预测。
- Ordinal Regression（有序回归模型）
  - ordinal::clm(), ordinal::clmm(): 有序回归模型适用于有序分类数据的建模和预测。
- Survival Analysis（生存分析模型）
  - rstanarm::stan_glm(): 基于 Stan 的生存分析模型，用于处理具有时间到事件或失败的数据。
- Panel Data Models（面板数据模型）
  - plm::plm(): 面板数据模型适用于分析面板数据，可以考虑固定效应和随机效应。
- Linear Regression（线性回归模型）
  - stats::lm(): 线性回归模型，用于研究自变量和因变量之间的线性关系。
- Nonlinear Regression Model（非线性回归模型）
  - stats::nls(): 非线性最小二乘回归模型，适合拟合非线性关系的数据。
- Missing Data Imputation Models（缺失数据插补模型）
  - mice::mira(): 用于缺失数据插补的多重插补方法。
- Repeated Measures ANOVA Models（重复测量 ANOVA 模型）
  - mmrm::mmrm(): 用于处理重复测量设计的混合模型。
生存分析
- 所有模型汇总
  
  cmprsk::crr()、survival::cch()、survival::clogit()、survival::coxph()、survival::survreg()、survey::svycoxph()、survey::svyglm()、survey::svyolr()、tidycmprsk::crr()
- Cox Proportional Hazards Model（Cox 比例风险模型）
  - survival::coxph(): Cox 比例风险模型是一种常用的生存分析方法，用于研究自变量对风险函数比例的影响。
- Accelerated Failure Time Model（加速失效时间模型）
  - survival::survreg(): 加速失效时间模型是生存分析的另一种方法，用于研究生存时间的分布和影响因素。
- Competing Risks Regression Model（竞争风险回归模型）
  - cmprsk::crr(): 竞争风险回归模型用于处理存在竞争事件的生存数据，考虑了不同类型风险之间的相互影响。
- Conditional Logistic Regression Model（条件 logistic 回归模型）
  - survival::clogit(): 条件 logistic 回归模型适用于研究匹配案例-对照的生存数据，用于探究协变量与事件发生之间的关系。
- Proportional Hazards Regression （比例风险回归模型）
  - survival::cch(): proportional hazards regression model to case-cohort data 病例队列数据的比例风险回归模型
- Survey-weighted Cox Proportional Hazards Model（加权调查 Cox 比例风险模型）
  - survey::svycoxph(): 加权调查 Cox 比例风险模型用于处理调查样本中的生存数据，考虑了复杂调查设计所带来的权重。
- Survey-weighted Generalized Linear Models（加权调查广义线性模型）
  - survey::svyglm(): 加权调查广义线性模型适用于处理调查数据的生存分析，考虑了调查数据的权重设计。
- Survey-weighted Ordinal Logistic Regression（加权调查有序 logistic 回归模型）
  - survey::svyolr(): 加权调查有序 logistic 回归模型处理调查数据的有序分类生存分析，考虑了调查数据的权重设计。
- Tidy Version of Competing Risks Regression Model（竞争风险回归模型的简洁版本）
  - tidycmprsk::crr(): 简洁而易用的竞争风险回归模型，用于处理具有竞争风险的生存数据，提供了更直观的数据处理方式。
分类模型
- 所有模型汇总
  - multgee::nomLORgee()、multgee::ordLORgee()、MASS::polr()、nnet::multinom()。
- Generalized Estimating Equations Model（广义估计方程模型）
  - multgee::nomLORgee(): 广义估计方程模型用于处理具有分类响应变量的长期观测数据，适用于二元分类模型。
  - multgee::ordLORgee(): 广义估计方程模型的有序 logistic 回归扩展，用于处理有序分类响应变量的长期观测数据。
- Proportional Odds Logistic Regression Model（比例几率 logistic 回归模型）
  - MASS::polr(): 比例几率 logistic 回归模型用于处理有序分类响应变量的数据，通过拟合分类阈值来预测不同类别的概率。
- Multinomial Logistic Regression Model（多项 logistic 回归模型）
  - nnet::multinom(): 多项 logistic 回归模型适用于处理具有多个离散分类响应变量的数据，用于预测每个类别的条件概率。
其他模型
所有模型汇总
- lavaan::lavaan()、lfe::felm()、parsnip::modelfit()、pscl::hurdle()、pscl::zeroinfl()、VGAM::vglm()。

Structural Equation Modeling（结构方程模型）

lavaan::lavaan(): 结构方程模型用于建立变量间的因果关系模型，通过观察变量之间的协方差和误差来估计模型参数。

Fixed Effects Linear Regression Model（固定效应线性回归模型）

lfe::felm(): 固定效应线性回归模型用于处理面板数据，控制个体固定效应的同时估计因变量与自变量之间的关系。

General Model Fitting（通用模型拟合）

parsnip::modelfit(): 通用模型拟合提供了一种灵活的方式来拟合各种类型的模型，用于在统一的框架下估计不同模型的系数和效果。

Hurdle Model（障碍模型）

pscl::hurdle(): 障碍模型是一种处理具有零膨胀和非零成分的计数数据的模型，用于建模数据的生成过程并估计相关参数。

Zero-Inflated Model（零膨胀模型）

pscl::zeroinfl(): 零膨胀模型是一种处理具有零膨胀和非零成分的计数数据的模型，适用于解释数据中零值出现的原因和其他成分的影响。

Vector Generalized Linear Model（向量广义线性模型）

VGAM::vglm(): 向量广义线性模型是广义线性模型的推广，用于处理多元响应变量的数据，并允许灵活建模不同响应变量的关系

代码示例参考学习

library(broom.helpers)
library(gtsummary)
library(ggplot2)
library(dplyr)
model_logit <- glm(response ~ trt + grade, trial, family = binomial)
broom::tidy(model_logit) # 使用broom包中的tidy函数整理逻辑回归模型，得到模型的摘要信息

# 已被后续做森林的数据准备
tidy_forest <- model_logit %>%
  # 对模型进行初始整理
  tidy_and_attach(exponentiate = TRUE, conf.int = TRUE) %>%
  # 添加分类变量的参考行
  tidy_add_reference_rows() %>%
  # 添加要在图中显示的参考值
  tidy_add_estimate_to_reference_rows() %>%
  # 添加变量标签
  tidy_add_term_labels() %>%
  # 从模型中删除截距估计
  tidy_remove_intercept()   

# 绘制矢森林图
tidy_forest %>%
  # 添加新的列 plot_label，用于构建绘图标签
  mutate(
    plot_label = paste(var_label, label, sep = ":") %>%  # 将变量标签和标签组合起来作为绘图标签
      forcats::fct_inorder() %>%  # 按照顺序重新排序
      forcats::fct_rev()  # 反转顺序
  ) %>%
  ggplot(aes(x = plot_label, y = estimate, ymin = conf.low, ymax = conf.high, color = variable)) +  # 设置绘图 aesthetics
  geom_hline(yintercept = 1, linetype = 2) +  # 添加水平参考线
  geom_pointrange() +  # 添加点范围
  coord_flip() +  # 反转坐标轴
  theme(legend.position = "none") +  # 设置图例位置
  labs(
    y = "Odds Ratio",  # y轴标签
    x = " ",  # x轴标签为空
    title = "Forest Plot using broom.helpers"  # 图表标题
  )  # 设置标签

参考文献：

Getting Started with broom.helpers • broom.helpers (larmarange.github.io)https://larmarange.github.io/broom.helpers/dev/articles/tidy.htmlcran.r-project.org/web/packages/broom.mixed/broom.mixed.pdfhttps://cran.r-project.org/web/packages/broom.mixed/broom.mixed.pdf