Title: | Multiple Imputation using Chained Random Forests |
---|---|
Description: | An R package for multiple imputation using chained random forests. Implemented methods can handle missing data in mixed types of variables by using prediction-based or node-based conditional distributions constructed using random forests. For prediction-based imputation, the method based on the empirical distribution of out-of-bag prediction errors of random forests and the method based on normality assumption for prediction errors of random forests are provided for imputing continuous variables. And the method based on predicted probabilities is provided for imputing categorical variables. For node-based imputation, the method based on the conditional distribution formed by the predicting nodes of random forests, and the method based on proximity measures of random forests are provided. More details of the statistical methods can be found in Hong et al. (2020) <arXiv:2004.14823>. |
Authors: | Shangzhi Hong [aut, cre], Henry S. Lynn [ths] |
Maintainer: | Shangzhi Hong <[email protected]> |
License: | GPL-3 |
Version: | 2.1.8 |
Built: | 2025-02-21 04:30:24 UTC |
Source: | https://github.com/shangzhi-hong/rfempimp |
Convert variables to factors
conv.factor(data, convNames = NULL, exceptNames = NULL, uniqueNum = 5)
data | Input data frame. |
convNames | Names of variables to convert; the default is NULL. |
exceptNames | Names of variables to be excluded from conversion; the default is NULL. |
uniqueNum | Variables with at most this number of unique values are converted to factors; the default is 5. |
A data frame of converted variables.
nhanes.fix <- conv.factor(data = nhanes, convNames = c("age", "hyp"))
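The conversion rule described above can be sketched in base R. This is only an illustration, not the package's implementation: the helper name `to_factor_sketch` is hypothetical, and it assumes that the `uniqueNum` rule is applied only when `convNames` is not supplied.

```r
# Hypothetical helper, not part of the package: convert selected columns to factors.
to_factor_sketch <- function(data, convNames = NULL, exceptNames = NULL,
                             uniqueNum = 5) {
  if (is.null(convNames)) {
    # Assumption: count distinct non-missing values per column.
    nUnique <- vapply(data,
                      function(col) length(unique(col[!is.na(col)])),
                      integer(1))
    convNames <- names(data)[nUnique <= uniqueNum]
  }
  # Drop any variables the caller asked to exclude.
  convNames <- setdiff(convNames, exceptNames)
  if (length(convNames) > 0) {
    data[convNames] <- lapply(data[convNames], as.factor)
  }
  data
}

toy <- data.frame(grp = c(1, 2, 2, NA, 1, 2),
                  val = c(0.5, 1.2, 3.3, 4.1, 5.8, 6.4))
toyFix <- to_factor_sketch(toy, uniqueNum = 2)
```

Here `grp` has two distinct observed values and becomes a factor, while `val` stays numeric.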
Generate missing (completely at random) cells in a data set
gen.mcar(df, prop.na = 0.2, warn.empty.row = TRUE, ...)
df | Input data frame or matrix. |
prop.na | Proportion of generated missing cells. The default is 0.2. |
warn.empty.row | Show a warning if empty rows are present in the output data set. |
... | Other parameters (will be ignored). |
A data frame or matrix containing generated missing cells.
Shangzhi Hong
data("mtcars")
mtcars.mcar <- gen.mcar(mtcars, warn.empty.row = FALSE)
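As a rough base-R sketch of what "missing completely at random" means here, cells can be chosen uniformly over the whole data set and set to NA. The helper `gen_mcar_sketch` is hypothetical, and the package may select cells differently (for example, with an independent draw per cell).

```r
# Hypothetical helper, not part of the package: blank out a fixed share of cells.
gen_mcar_sketch <- function(df, prop.na = 0.2) {
  nCell <- nrow(df) * ncol(df)
  nMiss <- round(nCell * prop.na)
  # Draw cell positions uniformly, without replacement (column-major order).
  idx <- sample.int(nCell, size = nMiss)
  rowIdx <- (idx - 1) %% nrow(df) + 1
  colIdx <- (idx - 1) %/% nrow(df) + 1
  for (k in seq_along(idx)) {
    df[rowIdx[k], colIdx[k]] <- NA
  }
  df
}

set.seed(1)
toy <- data.frame(a = 1:10, b = seq(0.1, 1, by = 0.1))
toyMcar <- gen_mcar_sketch(toy, prop.na = 0.2)
```

Because every cell is equally likely to be removed, missingness does not depend on any observed or unobserved value.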
The RfEmp multiple imputation method is for mixed types of variables, and calls the corresponding functions based on variable types. Categorical variables should be of type factor or logical, etc.
RfPred.Emp is used for continuous variables, and RfPred.Cate is used for categorical variables.
imp.rfemp( data, num.imp = 5, max.iter = 5, num.trees = 10, alpha.emp = 0, sym.dist = TRUE, pre.boot = TRUE, num.trees.cont = NULL, num.trees.cate = NULL, num.threads = NULL, print.flag = FALSE, ... )
data | A data frame or a matrix containing the incomplete data. Missing values should be coded as NA. |
num.imp | Number of multiple imputations. The default is 5. |
max.iter | Number of iterations. The default is 5. |
num.trees | Number of trees to build. The default is 10. |
alpha.emp | The "significance level" for the empirical distribution of out-of-bag prediction errors; can be used to guard against outliers (helpful for highly skewed variables). For example, set alpha.emp = 0.05 to use the 95% confidence level. The default is 0. |
sym.dist | If |
pre.boot | If |
num.trees.cont | Number of trees to build for continuous variables. The default is NULL. |
num.trees.cate | Number of trees to build for categorical variables. The default is NULL. |
num.threads | Number of threads for parallel computing. The default is NULL. |
print.flag | If |
... | Other arguments to pass down. |
For continuous variables, mice.impute.rfpred.emp is called, performing imputation based on the empirical distribution of out-of-bag prediction errors of random forests.
For categorical variables, mice.impute.rfpred.cate is called, performing imputation based on predicted probabilities.
An object of S3 class mids.
Shangzhi Hong
Hong, Shangzhi, et al. "Multiple imputation using chained random forests." Preprint, submitted April 30, 2020. https://arxiv.org/abs/2004.14823.
Zhang, Haozhe, et al. "Random Forest Prediction Intervals." The American Statistician (2019): 1-20.
Shah, Anoop D., et al. "Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study." American journal of epidemiology 179.6 (2014): 764-774.
Malley, James D., et al. "Probability machines." Methods of information in medicine 51.01 (2012): 74-81.
# Prepare data: convert categorical variables to factors
nhanes.fix <- nhanes
nhanes.fix[, c("age", "hyp")] <- lapply(nhanes[, c("age", "hyp")], as.factor)
# Perform imputation using imp.rfemp
imp <- imp.rfemp(nhanes.fix)
# Do repeated analyses
anl <- with(imp, lm(chl ~ bmi + hyp))
# Pool the results
pool <- pool(anl)
# Get pooled estimates
reg.ests(pool)
The RfNode.Cond multiple imputation method is for mixed types of variables, using the conditional distribution formed by the predicting nodes of random forests (out-of-bag observations are excluded).
imp.rfnode.cond( data, num.imp = 5, max.iter = 5, num.trees = 10, pre.boot = TRUE, print.flag = FALSE, ... )
data | A data frame or a matrix containing the incomplete data. Missing values should be coded as NA. |
num.imp | Number of multiple imputations. The default is 5. |
max.iter | Number of iterations. The default is 5. |
num.trees | Number of trees to build. The default is 10. |
pre.boot | If |
print.flag | If |
... | Other arguments to pass down. |
During imputation using imp.rfnode.cond, for missing observations, the candidate non-missing observations are found by the predicting nodes of the random trees in the random forest model. Only the in-bag observations for each random tree are used for imputation.
An object of S3 class mids.
Shangzhi Hong
Hong, Shangzhi, et al. "Multiple imputation using chained random forests." Preprint, submitted April 30, 2020. https://arxiv.org/abs/2004.14823.
Zhang, Haozhe, et al. "Random Forest Prediction Intervals." The American Statistician (2019): 1-20.
Shah, Anoop D., et al. "Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study." American journal of epidemiology 179.6 (2014): 764-774.
Malley, James D., et al. "Probability machines." Methods of information in medicine 51.01 (2012): 74-81.
# Prepare data: convert categorical variables to factors
nhanes.fix <- nhanes
nhanes.fix[, c("age", "hyp")] <- lapply(nhanes[, c("age", "hyp")], as.factor)
# Perform imputation using imp.rfnode.cond
imp <- imp.rfnode.cond(nhanes.fix)
# Do repeated analyses
anl <- with(imp, lm(chl ~ bmi + hyp))
# Pool the results
pool <- pool(anl)
# Get pooled estimates
reg.ests(pool)
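The node-based idea described for imp.rfnode.cond can be illustrated with a toy base-R sketch. Everything here is an assumption made for illustration: the helper `node_cond_draw_sketch`, the matrix layout of terminal node IDs and in-bag counts, and the choice to pool donors across all trees before drawing. The package's internal procedure may differ.

```r
# Hypothetical helper, not part of the package.
# yObs:    observed responses (length n)
# nodeObs: n x trees matrix of terminal node IDs for the observed cases
# inbag:   n x trees matrix of in-bag counts (0 means out-of-bag)
# nodeMis: terminal node ID reached by the missing case in each tree
node_cond_draw_sketch <- function(yObs, nodeObs, inbag, nodeMis) {
  donors <- unlist(lapply(seq_len(ncol(nodeObs)), function(t) {
    # Keep only in-bag observations sharing the predicting node in tree t.
    yObs[inbag[, t] > 0 & nodeObs[, t] == nodeMis[t]]
  }))
  # Assumption: draw one donor value uniformly from the pooled candidates.
  donors[sample.int(length(donors), 1)]
}

yObs <- c(10, 11, 20, 21, 30)
nodeObs <- cbind(c(1, 1, 2, 2, 3), c(1, 2, 2, 3, 3))
inbag <- cbind(c(1, 0, 2, 1, 1), c(1, 1, 0, 1, 1))
nodeMis <- c(2, 2)
set.seed(1)
draw <- node_cond_draw_sketch(yObs, nodeObs, inbag, nodeMis)
```

In this toy setup the candidate donors are 20 and 21 from the first tree and 11 from the second; observation 3 is out-of-bag in the second tree and is therefore skipped there.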
The RfNodeProx multiple imputation method is for mixed types of variables, using conditional distributions formed by proximity measures of random forests (both in-bag and out-of-bag observations are used for imputation).
imp.rfnode.prox( data, num.imp = 5, max.iter = 5, num.trees = 10, pre.boot = TRUE, print.flag = FALSE, ... )
data | A data frame or a matrix containing the incomplete data. Missing values should be coded as NA. |
num.imp | Number of multiple imputations. The default is 5. |
max.iter | Number of iterations. The default is 5. |
num.trees | Number of trees to build. The default is 10. |
pre.boot | If |
print.flag | If |
... | Other arguments to pass down. |
During imputation using imp.rfnode.prox, for missing observations, the candidate non-missing observations are found by whether two observations can be retrieved from the same predicting node during prediction. The observations used for imputation may not necessarily be contained in the terminal node of the random forest model.
An object of S3 class mids.
Shangzhi Hong
Hong, Shangzhi, et al. "Multiple imputation using chained random forests." Preprint, submitted April 30, 2020. https://arxiv.org/abs/2004.14823.
Zhang, Haozhe, et al. "Random Forest Prediction Intervals." The American Statistician (2019): 1-20.
Shah, Anoop D., et al. "Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study." American journal of epidemiology 179.6 (2014): 764-774.
Malley, James D., et al. "Probability machines." Methods of information in medicine 51.01 (2012): 74-81.
# Prepare data: convert categorical variables to factors
nhanes.fix <- nhanes
nhanes.fix[, c("age", "hyp")] <- lapply(nhanes[, c("age", "hyp")], as.factor)
# Perform imputation using imp.rfnode.prox
imp <- imp.rfnode.prox(nhanes.fix)
# Do repeated analyses
anl <- with(imp, lm(chl ~ bmi + hyp))
# Pool the results
pool <- pool(anl)
# Get pooled estimates
reg.ests(pool)
Please note that functions with names starting with "mice.impute" are exported to be visible for the mice sampler functions. Please do not call these functions directly unless you know exactly what you are doing.
RfEmpImp multiple imputation method, adapter for mice samplers.
These functions can be called by the mice sampler function. In the mice() function, set method = "rfemp" to use the RfEmp method.
mice.impute.rfemp is for mixed types of variables, and it calls the corresponding functions according to variable types. Categorical variables should be of type factor or logical, etc.
For continuous variables, mice.impute.rfpred.emp is called, performing imputation based on the empirical distribution of out-of-bag prediction errors of random forests.
For categorical variables, mice.impute.rfpred.cate is called, performing imputation based on predicted probabilities.
mice.impute.rfemp( y, ry, x, wy = NULL, num.trees = 10, alpha.emp = 0, sym.dist = TRUE, pre.boot = TRUE, num.trees.cont = NULL, num.trees.cate = NULL, ... )
y | Vector to be imputed. |
ry | Logical vector of length |
x | Numeric design matrix with |
wy | Logical vector of length |
num.trees | Number of trees to build; defaults to 10. |
alpha.emp | The "significance level" for the empirical distribution of prediction errors; can be used to guard against outliers (useful for highly skewed variables). For example, set alpha.emp = 0.05 to use the 95% confidence level for the empirical distribution of prediction errors. The default is 0. |
sym.dist | If |
pre.boot | Perform bootstrap prior to imputation to get 'proper' multiple imputation, i.e. accommodating sampling variation in estimating population regression parameters (see Shah et al. 2014). It should be noted that if |
num.trees.cont | Number of trees to build for continuous variables; defaults to NULL. |
num.trees.cate | Number of trees to build for categorical variables; defaults to NULL. |
... | Other arguments to pass down. |
RfEmpImp imputation sampler: mice.impute.rfemp calls mice.impute.rfpred.emp if is.numeric is TRUE for the variable; otherwise it calls mice.impute.rfpred.cate.
Vector with imputed data, same type as y, and of length sum(wy).
Shangzhi Hong
Hong, Shangzhi, et al. "Multiple imputation using chained random forests." Preprint, submitted April 30, 2020. https://arxiv.org/abs/2004.14823.
Zhang, Haozhe, et al. "Random Forest Prediction Intervals." The American Statistician (2019): 1-20.
Shah, Anoop D., et al. "Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study." American journal of epidemiology 179.6 (2014): 764-774.
Malley, James D., et al. "Probability machines." Methods of information in medicine 51.01 (2012): 74-81.
# Prepare data: convert categorical variables to factors
nhanes.fix <- conv.factor(nhanes, c("age", "hyp"))
# This function is exported to be visible to the mice sampler functions, and
# users can set method = "rfemp" in call to mice to use this function.
# Users are recommended to use the imp.rfemp function instead:
impObj <- mice(nhanes.fix, method = "rfemp", m = 5, maxit = 5,
               maxcor = 1.0, eps = 0,
               remove.collinear = FALSE, remove.constant = FALSE,
               printFlag = FALSE)
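The type-based dispatch described for mice.impute.rfemp can be sketched in a few lines of base R. The helper `pick_sampler_sketch` is hypothetical and only mirrors the documented rule (is.numeric decides between the continuous and categorical samplers); it does not call the package.

```r
# Hypothetical helper, not part of the package: choose a sampler name by type.
pick_sampler_sketch <- function(y) {
  if (is.numeric(y)) "rfpred.emp" else "rfpred.cate"
}

toyCols <- list(
  cont = c(1.5, NA, 2.0),          # numeric: continuous sampler
  cate = factor(c("a", NA, "b")),  # factor: categorical sampler
  flag = c(TRUE, NA, FALSE)        # logical: categorical sampler
)
chosen <- vapply(toyCols, pick_sampler_sketch, character(1))
```

Note that is.numeric returns FALSE for factors and logical vectors, which is why both fall through to the categorical sampler.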
Please note that functions with names starting with "mice.impute" are exported to be visible for the mice sampler functions. Please do not call these functions directly unless you know exactly what you are doing.
RfNode imputation methods, adapter for mice samplers.
These functions can be called by the mice sampler functions.
mice.impute.rfnode.cond is for imputation using the conditional distribution formed by the predicting nodes of random forests. To use this function, set method = "rfnode.cond" in the mice function.
mice.impute.rfnode.prox is for imputation based on proximity measures from random forests, and provides functionality similar to mice.impute.rf. To use this function, set method = "rfnode.prox" in the mice function.
mice.impute.rfnode is the main function for performing imputation, and both mice.impute.rfnode.cond and mice.impute.rfnode.prox call this function. By default, mice.impute.rfnode works like mice.impute.rfnode.cond.
mice.impute.rfnode( y, ry, x, wy = NULL, num.trees.node = 10, pre.boot = TRUE, use.node.cond.dist = TRUE, obs.eq.prob = FALSE, do.sample = TRUE, num.threads = NULL, ... )
mice.impute.rfnode.cond( y, ry, x, wy = NULL, num.trees = 10, pre.boot = TRUE, obs.eq.prob = FALSE, ... )
mice.impute.rfnode.prox( y, ry, x, wy = NULL, num.trees = 10, pre.boot = TRUE, obs.eq.prob = FALSE, ... )
y | Vector to be imputed. |
ry | Logical vector of length |
x | Numeric design matrix with |
wy | Logical vector of length |
num.trees.node | Number of trees to build; defaults to 10. |
pre.boot | Perform bootstrap prior to imputation to get 'proper' imputation, i.e. accommodating sampling variation in estimating population regression parameters (see Shah et al. 2014). |
use.node.cond.dist | If |
obs.eq.prob | If |
do.sample | If |
num.threads | Number of threads for parallel computing. The default is NULL. |
... | Other arguments to pass down. |
num.trees | Number of trees to build; defaults to 10. |
Advanced users can get more flexibility from the mice.impute.rfnode function, as it provides more options than mice.impute.rfnode.cond or mice.impute.rfnode.prox.
Vector with imputed data, same type as y, and of length sum(wy).
Shangzhi Hong
Hong, Shangzhi, et al. "Multiple imputation using chained random forests." Preprint, submitted April 30, 2020. https://arxiv.org/abs/2004.14823.
Doove, Lisa L., Stef Van Buuren, and Elise Dusseldorp. "Recursive partitioning for missing data imputation in the presence of interaction effects." Computational Statistics & Data Analysis 72 (2014): 92-104.
# Prepare data: convert categorical variables to factors
nhanes.fix <- conv.factor(nhanes, c("age", "hyp"))
# Using "rfnode.cond" or "rfnode"
impRfNodeCond <- mice(nhanes.fix, method = "rfnode.cond", m = 5, maxit = 5,
                      maxcor = 1.0, eps = 0, printFlag = FALSE)
# Using "rfnode.prox"
impRfNodeProx <- mice(nhanes.fix, method = "rfnode.prox", m = 5, maxit = 5,
                      maxcor = 1.0, eps = 0,
                      remove.collinear = FALSE, remove.constant = FALSE,
                      printFlag = FALSE)
Please note that functions with names starting with "mice.impute" are exported to be visible for the mice sampler functions. Please do not call these functions directly unless you know exactly what you are doing.
For categorical variables only.
Part of project RfEmpImp, the function mice.impute.rfpred.cate is for categorical variables, performing imputation based on predicted probabilities for the categories.
mice.impute.rfpred.cate( y, ry, x, wy = NULL, num.trees.cate = 10, use.pred.prob.cate = TRUE, forest.vote.cate = FALSE, pre.boot = TRUE, num.threads = NULL, ... )
y | Vector to be imputed. |
ry | Logical vector of length |
x | Numeric design matrix with |
wy | Logical vector of length |
num.trees.cate | Number of trees to build for categorical variables; defaults to 10. |
use.pred.prob.cate | Logical, |
forest.vote.cate | Logical, |
pre.boot | Perform bootstrap prior to imputation to get 'proper' multiple imputation, i.e. accommodating sampling variation in estimating population regression parameters (see Shah et al. 2014). It should be noted that if |
num.threads | Number of threads for parallel computing. The default is NULL. |
... | Other arguments to pass down. |
RfEmpImp imputation sampler for categorical variables, based on predicted probabilities.
Vector with imputed data, same type as y, and of length sum(wy).
Shangzhi Hong
Hong, Shangzhi, et al. "Multiple imputation using chained random forests." Preprint, submitted April 30, 2020. https://arxiv.org/abs/2004.14823.
Shah, Anoop D., et al. "Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study." American journal of epidemiology 179.6 (2014): 764-774.
Malley, James D., et al. "Probability machines." Methods of information in medicine 51.01 (2012): 74-81.
# Prepare data
mtcars.catmcar <- mtcars
mtcars.catmcar[, c("gear", "carb")] <- gen.mcar(mtcars.catmcar[, c("gear", "carb")], warn.empty.row = FALSE)
mtcars.catmcar <- conv.factor(mtcars.catmcar, c("gear", "carb"))
# Perform imputation
impObj <- mice(mtcars.catmcar, method = "rfpred.cate", m = 5, maxit = 5,
               maxcor = 1.0, eps = 0,
               remove.collinear = FALSE, remove.constant = FALSE,
               printFlag = FALSE)
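Imputation "based on predicted probabilities" can be pictured as drawing each missing category from its row of class probabilities. The sketch below uses only base R; the helper `draw_cate_sketch` and the hand-written probability matrix are hypothetical stand-ins for the probabilities a random forest would produce.

```r
# Hypothetical helper, not part of the package: one random draw per row.
draw_cate_sketch <- function(probMat) {
  lev <- colnames(probMat)
  draws <- vapply(seq_len(nrow(probMat)), function(i) {
    # Sample a single category using the row's predicted probabilities.
    sample(lev, size = 1, prob = probMat[i, ])
  }, character(1))
  factor(draws, levels = lev)
}

# Toy predicted probabilities for two missing cases and three categories.
probMat <- matrix(c(0.7, 0.2, 0.1,
                    0.0, 1.0, 0.0),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(NULL, c("a", "b", "c")))
set.seed(1)
imputed <- draw_cate_sketch(probMat)
```

Drawing from the probabilities, rather than always taking the most likely category, keeps between-imputation variability in the imputed values.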
Please note that functions with names starting with "mice.impute" are exported to be visible for the mice sampler functions. Please do not call these functions directly unless you know exactly what you are doing.
For continuous variables only.
This function is for the RfPred.Emp multiple imputation method, adapter for mice samplers. In the mice() function, set method = "rfpred.emp" to call it.
The function performs multiple imputation based on the empirical distribution of out-of-bag prediction errors of random forests.
mice.impute.rfpred.emp( y, ry, x, wy = NULL, num.trees.cont = 10, sym.dist = TRUE, alpha.emp = 0, pre.boot = TRUE, num.threads = NULL, ... )
y | Vector to be imputed. |
ry | Logical vector of length |
x | Numeric design matrix with |
wy | Logical vector of length |
num.trees.cont | Number of trees to build for continuous variables. The default is 10. |
sym.dist | If |
alpha.emp | The "significance level" for the empirical distribution of out-of-bag prediction errors; can be used to guard against outliers (useful for highly skewed variables). For example, set alpha.emp = 0.05 to use the 95% confidence level. The default is 0. |
pre.boot | If |
num.threads | Number of threads for parallel computing. The default is NULL. |
... | Other arguments to pass down. |
num.trees | Number of trees to build. The default is |
RfPred.Emp imputation sampler.
Vector with imputed data, same type as y, and of length sum(wy).
Shangzhi Hong
Hong, Shangzhi, et al. "Multiple imputation using chained random forests." Preprint, submitted April 30, 2020. https://arxiv.org/abs/2004.14823.
Zhang, Haozhe, et al. "Random Forest Prediction Intervals." The American Statistician (2019): 1-20.
Shah, Anoop D., et al. "Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study." American journal of epidemiology 179.6 (2014): 764-774.
Malley, James D., et al. "Probability machines." Methods of information in medicine 51.01 (2012): 74-81.
# Users can set method = "rfpred.emp" in call to mice to use this method
data("airquality")
impObj <- mice(airquality, method = "rfpred.emp", m = 5, maxit = 5,
               maxcor = 1.0, eps = 0,
               remove.collinear = FALSE, remove.constant = FALSE,
               printFlag = FALSE)
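The prediction-plus-empirical-error idea can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: the helper `impute_emp_sketch` is hypothetical, the predictions and out-of-bag errors are supplied by hand, and alpha.emp is interpreted here as trimming the errors to their central 1 - alpha.emp quantile range.

```r
# Hypothetical helper, not part of the package.
# pred:   random forest predictions for the missing cases
# oobErr: out-of-bag prediction errors from the observed cases
impute_emp_sketch <- function(pred, oobErr, alpha.emp = 0) {
  if (alpha.emp > 0) {
    # Assumption: keep only errors inside the central quantile range.
    bounds <- quantile(oobErr,
                       probs = c(alpha.emp / 2, 1 - alpha.emp / 2),
                       names = FALSE)
    oobErr <- oobErr[oobErr >= bounds[1] & oobErr <= bounds[2]]
  }
  # Add a randomly drawn empirical error to each prediction.
  pred + oobErr[sample.int(length(oobErr), length(pred), replace = TRUE)]
}

pred <- c(5, 6)
oobErr <- c(-1, 0, 1, 100)  # one extreme error
set.seed(1)
impAll <- impute_emp_sketch(pred, oobErr)
impTrim <- impute_emp_sketch(pred, oobErr, alpha.emp = 0.5)
```

With trimming, the extreme error of 100 can no longer be drawn, which is the sense in which a positive alpha.emp protects against outliers.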
Please note that functions with names starting with "mice.impute" are exported to be visible for the mice sampler functions. Please do not call these functions directly unless you know exactly what you are doing.
For continuous variables only.
This function is for the RfPred.Norm multiple imputation method, adapter for mice samplers. In the mice() function, set method = "rfpred.norm" to call it.
The function performs multiple imputation based on normality assumption using out-of-bag mean squared error as the estimate for the variance.
mice.impute.rfpred.norm( y, ry, x, wy = NULL, num.trees.cont = 10, norm.err.cont = TRUE, alpha.oob = 0, pre.boot = TRUE, num.threads = NULL, ... )
y | Vector to be imputed. |
ry | Logical vector of length |
x | Numeric design matrix with |
wy | Logical vector of length |
num.trees.cont | Number of trees to build for continuous variables. The default is 10. |
norm.err.cont | Use normality assumption for prediction errors of random forests. The default is TRUE. |
alpha.oob | The "significance level" for individual out-of-bag prediction errors used in the calculation of the out-of-bag mean squared error; useful in the presence of extreme values. For example, set alpha.oob = 0.05 to use the 95% confidence level. The default is 0. |
pre.boot | If |
num.threads | Number of threads for parallel computing. The default is NULL. |
... | Other arguments to pass down. |
RfPred.Norm imputation sampler.
Vector with imputed data, same type as y, and of length sum(wy).
Shangzhi Hong
Shah, Anoop D., et al. "Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study." American journal of epidemiology 179.6 (2014): 764-774.
# Users can set method = "rfpred.norm" in call to mice to use this method
data("airquality")
impObj <- mice(airquality, method = "rfpred.norm", m = 5, maxit = 5,
               maxcor = 1.0, eps = 0,
               remove.collinear = FALSE, remove.constant = FALSE,
               printFlag = FALSE)
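The normality-based variant can be sketched in the same spirit: the out-of-bag mean squared error serves as the variance of a normal error added to each prediction. The helper `impute_norm_sketch` is hypothetical and takes hand-supplied predictions and out-of-bag errors instead of fitting a forest.

```r
# Hypothetical helper, not part of the package.
# pred:   random forest predictions for the missing cases
# oobErr: out-of-bag prediction errors from the observed cases
impute_norm_sketch <- function(pred, oobErr) {
  # Out-of-bag mean squared error as the variance estimate.
  oobMse <- mean(oobErr^2)
  pred + rnorm(length(pred), mean = 0, sd = sqrt(oobMse))
}

set.seed(1)
impNorm <- impute_norm_sketch(pred = c(5, 6, 7), oobErr = c(-1, 0.5, 1, -0.5))
# With no out-of-bag error at all, the draws collapse onto the predictions.
impExact <- impute_norm_sketch(pred = c(1, 2), oobErr = c(0, 0, 0))
```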
ranger
The observation indexes (row numbers) constituting the terminal node associated with each observation are queried using the ranger object and the training data.
The parameter keep.inbag = TRUE should be applied to the call to ranger.
query.rf.pred.idx(obj, data, id.name = FALSE, unique.by.id = FALSE, ...)
obj | An R object of class |
data | Input for training data. |
id.name | Use the IDs of the terminal nodes as names for the lists. |
unique.by.id | Only return results of unique terminal node IDs. |
... | Other parameters (will be ignored). |
The observations are found based on terminal node IDs. It should be noted that the out-of-bag observations are not present in the indexes.
A nested list of length num.trees.
Shangzhi Hong
data(iris)
rfObj <- ranger(
    Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
    data = iris, num.trees = 5, keep.inbag = TRUE)
outList <- query.rf.pred.idx(rfObj, iris)
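For a single tree, the grouping of in-bag row numbers by terminal node can be sketched with base R's split(). The helper `node_index_sketch` and the toy node IDs and in-bag counts are hypothetical; in practice both would come from a fitted ranger object.

```r
# Hypothetical helper, not part of the package: one tree at a time.
# nodeId:     terminal node ID reached by each training observation
# inbagCount: how many times each observation was drawn into the tree's sample
node_index_sketch <- function(nodeId, inbagCount) {
  # Out-of-bag observations (count 0) are dropped before grouping.
  keep <- which(inbagCount > 0)
  split(keep, nodeId[keep])
}

nodeId <- c(3, 3, 5, 5, 5, 8)
inbagCount <- c(1, 0, 2, 1, 0, 1)
idxByNode <- node_index_sketch(nodeId, inbagCount)
```

The result is a list named by terminal node ID; rows 2 and 5 are out-of-bag and therefore absent, matching the note that out-of-bag observations do not appear in the indexes.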
ranger
The observed values (for the response variable) constituting the terminal node associated with each observation are queried using the ranger object and the training data.
The parameter keep.inbag = TRUE should be applied to the call to ranger.
query.rf.pred.val(obj, data, id.name = FALSE, unique.by.id = FALSE, ...)
obj | An R object of class |
data | Input for training data. |
id.name | Use the IDs of the terminal nodes as names for the lists. |
unique.by.id | Only return results of unique terminal node IDs. |
... | Other parameters (will be ignored). |
The observations are found based on terminal node IDs. It should be noted that the out-of-bag observations are not present in the indexes.
A nested list of length num.trees.
Shangzhi Hong
data(iris)
rfObj <- ranger(
    Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species,
    data = iris, num.trees = 5, keep.inbag = TRUE)
outList <- query.rf.pred.val(rfObj, iris)
ranger function
This function serves as a workaround for the ranger function.
rangerCallerSafe(...)
... | Parameters to pass down. |
Constructed ranger object.
Get the estimates with corresponding confidence intervals after pooling.
reg.ests(obj, ...)
obj | Pooled object from function |
... | Other parameters to pass down. |
A data frame containing coefficient estimates and corresponding confidence intervals.
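As a rough illustration of how confidence intervals follow from pooled estimates, a t-based interval can be computed from an estimate, its standard error, and the degrees of freedom. The helper `ci_sketch`, its argument names, and its output column names are hypothetical; the columns actually returned by reg.ests may differ.

```r
# Hypothetical helper, not part of the package: t-based confidence intervals.
ci_sketch <- function(estimate, std.error, df, conf.level = 0.95) {
  # Two-sided critical value from the t distribution.
  tCrit <- qt(1 - (1 - conf.level) / 2, df)
  data.frame(estimate = estimate,
             lower = estimate - tCrit * std.error,
             upper = estimate + tCrit * std.error)
}

# Toy pooled results for two coefficients.
res <- ci_sketch(estimate = c(1.0, -0.5),
                 std.error = c(0.2, 0.1),
                 df = c(10, 10))
```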