In BayesRTMB, you can freely define models using the rtmb_code block, but we also provide wrapper functions to quickly execute common statistical analyses (like t-tests, regressions, and factor analysis).

These functions allow you to specify models with a syntax similar to standard R functions like lm() and lme4::glmer() (though lme4 itself is not required to run BayesRTMB), while fully utilizing the powerful estimation features of BayesRTMB (MCMC, MAP, ADVI).

This page introduces the usage of the main wrapper functions.


1. rtmb_ttest (Bayesian t-test)

This function examines the difference in means between two groups. In addition to a standard t-test, rtmb_ttest is designed to make effect size (delta) estimation and Bayes factor calculation easy.

First, let’s generate some sample data and analyze it.

library(BayesRTMB)
set.seed(123)

# Data generation (2 groups)
y1 <- rnorm(30, mean = 0, sd = 1)
y2 <- rnorm(30, mean = 1, sd = 1)

# Method passing vectors directly
mdl_ttest <- rtmb_ttest(y1, y2)

# Method using formula format (when you have a dataframe)
# df <- data.frame(Y = c(y1, y2), group = rep(c("A", "B"), each = 30))
# mdl_ttest <- rtmb_ttest(Y ~ group, data = df)

# MCMC estimation
fit_ttest <- mdl_ttest$sample()
fit_ttest$summary()
## variable    mean    sd     map    q2.5   q97.5  ess_bulk  ess_tail  rhat 
## lp        -88.05  1.23  -87.08  -91.16  -86.63      1543      2467  1.00 
## delta       1.24  0.30    1.25    0.65    1.84      1830      1654  1.01 
## mean        0.57  0.12    0.58    0.34    0.81      3058      2436  1.00 
## sd          0.93  0.09    0.92    0.78    1.13      2371      2755  1.00 
## diff        1.15  0.25    1.11    0.64    1.63      2309      1695  1.01 
## mean0      -0.00  0.17   -0.01   -0.34    0.34      2645      2468  1.00 
## mean1       1.14  0.17    1.14    0.80    1.49      2507      2364  1.00 

Checking the Generated Code

You can check the rtmb_code that the wrapper function generated internally by calling the print_code() method on the model object. This is a useful reference if you want to extend the wrapper's model into your own custom model (a sketch follows the output below).

# Display the internally generated RTMB code
mdl_ttest$print_code()
## === RTMB Model Code ===
## 
## rtmb_code(
##   parameters = {
##     mean = Dim(1)
##     sd = Dim(1, lower = 0)
##     delta = Dim(1)
##   }, 
##   transform = {
##     diff <- delta * sd
##     mean0 <- mean - diff/2
##     mean1 <- mean + diff/2
##   }, 
##   model = {
##     Y1 ~ normal(mean0, sd)
##     Y2 ~ normal(mean1, sd)
##     delta ~ cauchy(0, r)
##     mean ~ normal(0, 10)
##     sd ~ exponential(0.1)
##   }, 
##   <environment>
## )
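For example, you could copy the printed code into your own rtmb_code() call and modify a prior. The sketch below widens the prior on mean; note that r (the Cauchy prior scale on delta) is normally bound internally by the wrapper, so here it is set by hand, and attaching the data (Y1, Y2) follows the Quick Start workflow:

# Prior scale for delta; the wrapper normally supplies this value internally
r <- sqrt(2) / 2

# Same structure as the generated code, with one prior modified
mdl_custom <- rtmb_code(
  parameters = {
    mean = Dim(1)
    sd = Dim(1, lower = 0)
    delta = Dim(1)
  },
  transform = {
    diff <- delta * sd
    mean0 <- mean - diff/2
    mean1 <- mean + diff/2
  },
  model = {
    Y1 ~ normal(mean0, sd)
    Y2 ~ normal(mean1, sd)
    delta ~ cauchy(0, r)
    mean ~ normal(0, 100)   # widened from normal(0, 10)
    sd ~ exponential(0.1)
  }
)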

Calculating the Bayes Factor

By using the bayes_factor() method, you can compute a comparison (Bayes factor) against a null model where the effect size is δ = 0.

# Compare with the model where effect size delta = 0
bf <- fit_ttest$bayes_factor(null_model = "delta")
bf
## Bayes Factor (BF12) : 4777.563 
## Log Bayes Factor    : 8.4717 (Approx. Error = 0.0039)
## Interpretation      : Decisive evidence for Model 1 

2. rtmb_lm (Linear Regression Analysis)

You can perform linear regression using the same formula syntax as standard lm(). Here, we use the discussion data included in the package (simulated data on discussion satisfaction).

data(discussion)

# Model predicting satisfaction using talk and skill
mdl_lm <- rtmb_lm(satisfaction ~ talk + skill, data = discussion)

# Quick check via MAP estimation
fit_lm <- mdl_lm$optimize()
fit_lm$summary()
## Call:
## MAP Estimation via RTMB
## 
## Negative Log-Posterior: 416.94
## Approx. Log Marginal Likelihood (Laplace): -425.12
## 
## Point Estimates and 95% Wald CI:
##    variable  Estimate  Std. Error  Lower 95%  Upper 95% 
## Intercept     2.14693     0.20761    1.74001    2.55384 
## b[talk]       0.28612     0.05434    0.17961    0.39264 
## b[skill]      0.20106     0.06604    0.07162    0.33050 
## sigma         0.92284     0.03725    0.85265    0.99880 
## Intercept_c   3.43324     0.05333    3.32871    3.53777 

When calculating Bayes factors (via bridge sampling or the bayes_factor() method), the choice of priors strongly influences the results. BayesRTMB can automatically configure appropriate weakly informative priors: we recommend setting use_weak_info = TRUE and specifying the theoretical minimum and maximum of the response variable in y_range.

# Create a model using weakly informative priors (assuming satisfaction ranges from 1 to 5)
mdl_lm_weak <- rtmb_lm(satisfaction ~ talk + skill,
  data = discussion,
  use_weak_info = TRUE,
  y_range = c(1, 5)
)
mdl_lm_weak$print_code()
## === RTMB Model Code ===
## 
## rtmb_code(
##   setup = {
##     N <- length(Y)
##     K <- ncol(X)
##     half_d_y <- diff(y_range)/2
##     base_scale <- half_d_y * weak_info_prior$sd_ratio
##     alpha_prior_sd <- half_d_y
##     mid_y <- mean(y_range)
##     sigma_rate <- 1/base_scale
##     tau_rate <- 1/base_scale
##     X_mean <- apply(X, 2, mean)
##     X_sd <- apply(X, 2, sd)
##     beta_prior_sd <- weak_info_prior$max_beta * base_scale/X_sd
##     X_c <- X - rep(1, N) %*% t(X_mean)
##   }, 
##   parameters = {
##     Intercept_c <- Dim(1)
##     b <- Dim(K)
##     sigma <- Dim(1, lower = 0)
##   }, 
##   transform = {
##     Intercept <- Intercept_c - sum(X_mean * b)
##   }, 
##   model = {
##     # Transform
##     eta <- as.vector(Intercept_c + X_c %*% b)
##     # Likelihood (Data)
##     Y ~ normal(eta, sigma)
##     # Priors
##     sigma ~ exponential(sigma_rate)
##     Intercept_c ~ normal(mid_y, alpha_prior_sd)
##     b ~ normal(0, beta_prior_sd)
##   }
## )

3. rtmb_glm (Generalized Linear Models)

By specifying the family argument, you can perform logistic regression, Poisson regression, and more. As an example, we run a logistic regression predicting the experimental condition (condition: 0 or 1).

# Logistic regression (family = "bernoulli")
mdl_glm <- rtmb_glm(condition ~ satisfaction + skill,
  data = discussion,
  family = "bernoulli"
)

fit_glm <- mdl_glm$sample()
fit_glm$summary()
##        variable     mean    sd      map     q2.5    q97.5  ess_bulk  ess_tail  rhat 
## lp               -213.72  1.24  -212.80  -216.90  -212.31      1567      2449  1.00 
## Intercept          -1.35  0.50    -1.32    -2.31    -0.38      3740      2794  1.00 
## b[satisfaction]     0.40  0.13     0.38     0.16     0.64      3127      2838  1.00 
## b[skill]           -0.01  0.15     0.01    -0.30     0.28      3445      2925  1.00 
## Intercept_c        -0.00  0.12    -0.03    -0.23     0.23      4129      2544  1.00 

Available Distributions (family)

In rtmb_glm and rtmb_glmer, the following distributions can be specified. Standard link functions are configured internally for each distribution.

family        Link Function  Description
gaussian      identity       Normal distribution. For standard regression analysis.
lognormal     log            Log-normal distribution. For data that takes positive values.
student_t     identity       t-distribution. Useful when you want to suppress the impact of outliers.
gamma         log            Gamma distribution. For positive, skewed data.
bernoulli     logit          Bernoulli distribution. For binary 0/1 data.
binomial      logit          Binomial distribution. For data representing counts of success/trials.
poisson       log            Poisson distribution. For count data.
neg_binomial  log            Negative binomial distribution. For overdispersed count data.
ordered       logit          Ordered logistic regression. For ordinal categorical data.
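
For example, switching to Poisson regression only requires changing family. Below is a minimal sketch on simulated count data (the data frame d and its columns are hypothetical, created here for illustration):

# Simulate a hypothetical count outcome
set.seed(1)
d <- data.frame(x = rnorm(100))
d$count_y <- rpois(100, lambda = exp(0.5 + 0.8 * d$x))

# Poisson regression: the log link is configured internally
mdl_pois <- rtmb_glm(count_y ~ x, data = d, family = "poisson")
fit_pois <- mdl_pois$sample()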

4. rtmb_glmer (Generalized Linear Mixed Models)

You can handle models that include random effects (varying effects) using the (1 | group) notation, similar to the lme4 package. Let's build a model that accounts for the fact that individual satisfaction varies with the group an individual belongs to.

# Random intercept model
mdl_glmer <- rtmb_glmer(satisfaction ~ talk + (1 | group),
  data = discussion
)

# When including random effects, optimize() with Laplace approximation is fast
opt_glmer <- mdl_glmer$optimize(laplace = TRUE)
opt_glmer$summary()
## Call:
## MAP Estimation via RTMB
## 
## Negative Log-Posterior: 404.65
## Approx. Log Marginal Likelihood (Laplace): -412.59
## Note: Random effects are stored in $random_effects
## 
## Point Estimates and 95% Wald CI:
##    variable  Estimate  Std. Error  Lower 95%  Upper 95% 
## Intercept     2.60450     0.17902    2.25363    2.95537 
## b[talk]       0.27441     0.05477    0.16706    0.38176 
## sigma         0.77389     0.03867    0.70170    0.85351 
## sd[Int]       0.51833     0.06587    0.40406    0.66492 
## Intercept_c   3.43322     0.06846    3.29904    3.56740 

Regularization

In rtmb_lm, rtmb_glm, and rtmb_glmer, you can use regularization. Regularization shrinks the coefficients of unnecessary predictors toward 0, allowing you to build more parsimonious models with high predictive power.

In the BayesRTMB package, you can select the horseshoe prior (rhs) or the spike-and-slab prior (ssp) via the penalty option. rhs is relatively easy to estimate, but coefficients with small effects do not shrink exactly to 0. ssp is computationally heavier, but it can shrink coefficients exactly to 0.

When using regularization, you must also use weakly informative priors, which means you need to supply the range of the response variable. Since our data uses a 5-point scale, we set y_range = c(1, 5). MAP estimation often struggles with regularized models, so MCMC is the better choice.

mdl_glmer <-
  rtmb_glmer(
    satisfaction ~ talk + performance + skill + condition + (1 | group),
    data = discussion,
    penalty = "ssp",
    y_range = c(1, 5)
  )

mcmc_glmer <- mdl_glmer$sample(parallel = TRUE)
mcmc_glmer
mcmc_glmer$draws("b[condition]") |> plot_dens()
(Figure: shrunk posterior distribution of b[condition])
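
The horseshoe prior uses the same interface; only the penalty option changes (a sketch mirroring the model above):

# Same model with the horseshoe prior instead of spike-and-slab
mdl_glmer_rhs <-
  rtmb_glmer(
    satisfaction ~ talk + performance + skill + condition + (1 | group),
    data = discussion,
    penalty = "rhs",
    y_range = c(1, 5)
  )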

5. rtmb_corr (Correlation Matrix Estimation)

This function estimates correlations between variables. Here, we introduce both a 2-variable case and a full correlation matrix, using a portion of the Big Five personality data.

2-Variable Correlation and Bayes Factor

We estimate the correlation between two variables (e.g., extraversion items BF1 and BF6), and calculate a Bayes factor testing the null hypothesis of zero correlation (r = 0).

data(BigFive)

# Correlation between BF1 and BF6
mdl_corr2 <- rtmb_corr(BigFive[, c("BF1", "BF6")])
mcmc_corr2 <- mdl_corr2$sample()
mcmc_corr2$summary()
##  variable     mean    sd      map     q2.5    q97.5  ess_bulk  ess_tail  rhat 
## lp         -493.27  1.55  -492.22  -497.08  -491.16      1799      2658  1.00 
## corr[rho]    -0.60  0.05    -0.60    -0.69    -0.50      2365      2539  1.00 
## mean[BF1]     3.56  0.08     3.56     3.39     3.72      3031      2914  1.00 
## mean[BF6]     2.71  0.09     2.73     2.54     2.89      2978      2872  1.00 
## sd[BF1]       1.10  0.06     1.07     0.99     1.22      3151      3313  1.00 
## sd[BF6]       1.15  0.06     1.16     1.04     1.28      3072      2477  1.00 

# Compare against the zero-correlation model (null_model = "corr")
bf_corr <- mcmc_corr2$bayes_factor(null_model = "corr")
bf_corr
## Bayes Factor (BF12) : 6.839475e+15 
## Log Bayes Factor    : 36.4615 (Approx. Error = 0.0043)
## Interpretation      : Decisive evidence for Model 1 

Correlation Matrix Estimation

If you specify multiple variables at once, the entire correlation matrix is estimated.

# Estimate the correlation matrix of 5 variables
mdl_corr_mat <- rtmb_corr(BigFive[, 1:5])
opt_corr_mat <- mdl_corr_mat$optimize()
opt_corr_mat$summary()
## Call:
## MAP Estimation via RTMB
## 
## Negative Log-Posterior: 1254.77
## Approx. Log Marginal Likelihood (Laplace): -1289.58
## 
## Point Estimates and 95% Wald CI:
##      variable  Estimate  Std. Error  Lower 95%  Upper 95% 
## corr[BF1,BF1]   1.00000     0.00000    1.00000    1.00000 
## corr[BF2,BF1]   0.09243     0.07555   -0.05565    0.24050 
## corr[BF3,BF1]   0.09382     0.07552   -0.05420    0.24184 
## corr[BF4,BF1]   0.08807     0.07562   -0.06014    0.23629 
## corr[BF5,BF1]  -0.02158     0.07619   -0.17090    0.12775 
## corr[BF1,BF2]   0.09243     0.07555   -0.05565    0.24050 
## corr[BF2,BF2]   1.00000     0.00000    1.00000    1.00000 
## corr[BF3,BF2]  -0.03374     0.07610   -0.18289    0.11542 
## corr[BF4,BF2]  -0.13588     0.07481   -0.28250    0.01074 
## corr[BF5,BF2]  -0.08526     0.07567   -0.23357    0.06306 

6. rtmb_fa (Exploratory Factor Analysis)

Estimates the common latent factors behind observed variables.

Factor Rotation

Factor analysis can be time-consuming with MCMC, so MAP estimation is often faster and easier to use. Factor rotation can be performed via the rotate option. The rotated factor loadings have the rotation name appended, like _promax. Because the rotated factor loadings are placed in the generate block, MAP does not calculate standard errors for them by default; by specifying se_sampling = TRUE, you can compute their standard errors as well.

# 5-factor model, specifying Promax rotation
mdl_fa <- rtmb_fa(BigFive, nfactors = 5, rotate = "promax")

opt_fa <- mdl_fa$optimize(se_sampling = TRUE)

# Check the factor loadings (L), etc.
opt_fa$summary()
## Call:
## MAP Estimation via RTMB
## 
## Negative Log-Posterior: 4791.18
## Approx. Log Marginal Likelihood (Laplace): -5016.21
## 
## Point Estimates and 95% Wald CI:
##         variable  Estimate  Std. Error  Lower 95%  Upper 95% 
## L_promax[BF1,1]    0.86657     0.05905    0.72847    0.95902 
## L_promax[BF2,1]    0.02202     0.05325   -0.07631    0.12786 
## L_promax[BF3,1]   -0.09276     0.06886   -0.20404    0.06562 
## L_promax[BF4,1]    0.04927     0.06391   -0.07333    0.17433 
## L_promax[BF5,1]   -0.02101     0.04856   -0.11742    0.07748 
## L_promax[BF6,1]   -0.80459     0.05957   -0.89651   -0.66314 
## L_promax[BF7,1]   -0.01111     0.05307   -0.11843    0.08566 
## L_promax[BF8,1]   -0.07429     0.05917   -0.16758    0.06246 
## L_promax[BF9,1]   -0.08225     0.07363   -0.23533    0.05242 
## L_promax[BF10,1]   0.03086     0.05210   -0.07256    0.13540 

The sort_loadings() function is useful for sorting and displaying the rotated factor loadings neatly.

opt_fa$generate$L_promax |> sort_loadings()
##        V1    V2    V3    V4    V5
## V1   .867 -.117 -.179  .089  .057
## V6  -.805  .000  .048  .200 -.065
## V11  .584  .087  .126  .169 -.197
## V16  .516  .087  .276 -.009  .040
## V2   .022 -.806 -.076 -.013  .043
## V7  -.011 -.800 -.175 -.054  .002
## V12  .065 -.797 -.045 -.083 -.088
## V17  .007 -.763  .070  .009 -.012
## V8  -.074  .172  .830 -.044 -.061
## V3  -.093  .044  .533  .101  .060
## V18 -.036 -.196  .512 -.156 -.066
## V13 -.005 -.078  .488 -.121  .154
## V4   .049  .118 -.080  .607  .058
## V14 -.096 -.037 -.031  .586 -.022
## V19  .094  .054  .001  .506 -.085
## V9  -.082  .064 -.059  .463  .019
## V5  -.021 -.132  .021  .206 -.795
## V20 -.022 -.126  .161  .321 -.762
## V15  .000 -.138  .075  .228  .730
## V10  .031 -.120  .047  .317  .712

Factor Scores

Factor scores can also be output by using the score = TRUE option. These scores are added to the generate output.

mdl_fa <- rtmb_fa(BigFive, nfactors = 5, rotate = "promax", score = TRUE)

opt_fa <- mdl_fa$optimize()

opt_fa$generate$score |> head()
##             [,1]        [,2]        [,3]       [,4]       [,5]
## [1,]  0.49722956 -1.18644757 -1.29011492 -0.1941201 -0.4712234
## [2,] -1.21954058 -0.46063346  1.17665101 -0.8095066  1.0632201
## [3,] -0.05617309  0.09471994 -1.47004442  0.5897946 -0.1808377
## [4,]  0.48324361 -1.40302694  0.04676041 -0.8297736 -0.4993311
## [5,] -0.23744265  1.18767815  0.51330882 -0.3034767 -1.6688545
## [6,]  0.20824541 -0.38166057 -1.12921376  0.8736080 -0.5483280

Regularized Factor Analysis

rtmb_fa() also supports regularized factor analysis, using a spike-and-slab prior (SSP) to estimate sparse factor loadings.

Regularization shrinks the effects of unnecessary parameters toward zero, facilitating the construction of more parsimonious and predictive models. When applied to factor analysis, it can remove rotational indeterminacy, yielding a unique solution.

Since regularized factor analysis can sometimes be difficult to estimate via MAP, it is usually better to use MCMC or ADVI. Here, we present the results from ADVI (Automatic Differentiation Variational Inference).

mdl_fa <- rtmb_fa(BigFive, nfactors = 5, rotate = "ssp")

vb_fa <- mdl_fa$variational(iter = 5000, parallel = TRUE)
vb_fa
## variable      mean     sd       map      q2.5     q97.5 
## lp        -5362.15  33.02  -5353.45  -5449.35  -5322.61 
## L[BF1,1]     -0.02   0.07     -0.00     -0.22      0.04 
## L[BF2,1]     -0.82   0.04     -0.83     -0.88     -0.74 
## L[BF3,1]      0.00   0.04     -0.00     -0.05      0.05 
## L[BF4,1]      0.02   0.07      0.00     -0.07      0.25 
## L[BF5,1]     -0.02   0.06     -0.00     -0.19      0.05 
## L[BF6,1]      0.00   0.03      0.00     -0.02      0.05 
## L[BF7,1]     -0.82   0.04     -0.82     -0.88     -0.73 
## L[BF8,1]      0.01   0.05      0.00     -0.05      0.11 
## L[BF9,1]      0.01   0.04      0.00     -0.03      0.08 

The summary output of MCMC or ADVI can be accessed via either EAP or MAP estimates. With regularization, the MAP summary is easier to read because the shrunk parameters collapse exactly to 0.

vb_fa$MAP("L") |> sort_loadings()
##        V1    V2    V3    V4    V5
## V7  -.823  .000 -.001  .001  .010
## V2  -.820  .001  .000  .000  .001
## V12 -.801 -.001  .000  .001  .000
## V17 -.738  .000  .000  .000  .001
## V1  -.001 -.795 -.001  .000  .001
## V6   .000  .773  .001 -.045  .002
## V11  .004 -.610  .096 -.099 -.021
## V16 -.001 -.554 -.001  .000 -.212
## V15 -.001 -.001 -.822 -.001 -.001
## V10  .000 -.001 -.809  .000  .000
## V5  -.004 -.002  .711 -.339 -.001
## V20  .000 -.001  .668 -.448 -.007
## V4   .002 -.001  .000 -.637  .002
## V19  .001  .000 -.001 -.569  .000
## V14 -.001 -.001 -.001 -.540  .002
## V9   .000  .002  .001 -.429  .002
## V8   .001 -.001  .000  .001 -.854
## V3  -.001 -.001 -.001  .000 -.487
## V13 -.027 -.001 -.003  .000 -.428
## V18 -.220 -.001  .001  .001 -.425

7. rtmb_irt (Item Response Theory)

rtmb_irt() constructs Item Response Theory (IRT) models suitable for analyzing test data or questionnaire surveys. Currently, the following models are supported:

  • 1PL / 2PL / 3PL Models: Analysis of binary data (correct/incorrect).
  • Graded Response Model (GRM): Analysis of polytomous data (e.g., Likert scales).

Because IRT models are computationally demanding, for large datasets it is highly effective to use the fast approximation via variational() or to marginalize the ability parameters using optimize(laplace = TRUE).
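
A minimal usage sketch follows. The first argument and the model argument name are assumptions based on the other wrappers, not confirmed by this page; see ?rtmb_irt for the actual interface:

# Hypothetical binary response matrix `resp` (rows = persons, columns = items);
# the `model` argument name is an assumption
mdl_irt <- rtmb_irt(resp, model = "2PL")

# Fast approximation for large datasets
vb_irt <- mdl_irt$variational()

# Or MAP with the ability parameters marginalized via the Laplace approximation
opt_irt <- mdl_irt$optimize(laplace = TRUE)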


Summary

By leveraging the wrapper functions in BayesRTMB, you gain the following benefits:

  • Perform sophisticated analyses with familiar syntax, without writing complex model code from scratch.
  • Seamlessly switch between MAP, MCMC, and VB estimation using the same model object (see the sketch below).
  • Calculate Bayes factors smoothly.
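
For example, all three back-ends shown on this page are methods on the same model object:

# One model object, three estimation back-ends
mdl <- rtmb_lm(satisfaction ~ talk, data = discussion)
fit_map  <- mdl$optimize()     # MAP
fit_mcmc <- mdl$sample()       # MCMC
fit_vb   <- mdl$variational()  # ADVI (VB)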

If you require a more customized model, check out the Quick Start and try building your own using rtmb_code().

