library(cmdstanr)
library(tidyverse)
library(tidybayes)
The challenge
Studying selection on phenotypic plasticity is challenging. First, because phenotypic plasticity is a slope – it is the change in a trait when an environmental variable changes. Secondly, because many traits of animals also cause other traits. For example, arriving later at a breeding site causes an individual to lay fewer eggs (because less food is available).
The system
Let’s begin just with a simulation of three interrelated traits:
- The date when a bird arrives, which determines
- How many eggs they lay, out of which there is
- Some number of surviving offspring
## how many birds
<- 57
nbirds # simulate arrival dates -- two weeks before and after whatever the average is
<- runif(nbirds, min = -14, max = 14) |> round()
darrive
## simulate clutch sizes -- decrease by 4% each day
<- 4.5
avg_clutch <- .97
effect_per_day
<- rpois(nbirds, exp(log(avg_clutch) + log(effect_per_day)*darrive))
clutch
plot(darrive, clutch)
As an aside, this would be 0-truncated, since birds who don’t lay eggs don’t get observed at all.
# simulate hatching success
<- rbinom(nbirds, size = clutch, prob = .86)
success
plot(darrive, success)
what’s important to see here is that there is a negative correlation, even though the arrival date has no direct effect on the outcome
summary(glm(success ~ darrive, family = "poisson"))
Call:
glm(formula = success ~ darrive, family = "poisson")
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 1.330746 0.068901 19.314 < 2e-16 ***
darrive -0.039128 0.008791 -4.451 8.54e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 78.184 on 56 degrees of freedom
Residual deviance: 58.402 on 55 degrees of freedom
AIC: 232.38
Number of Fisher Scoring iterations: 5
But, if we use a binomial model that knows about the number of possible successful chicks, then we see what we expect:
<- (glm(cbind(success, clutch - success) ~ 1, family = binomial(link = "logit")))
bin_mod
plogis(coef(bin_mod))
(Intercept)
0.8842975
Which matches the simulation above.
if we put darrive
in the model, the effect should be very close to 0 with overlap.
If we imagine that the laying date effects the survival p, then we should see an effect close to 0
summary(glm(
cbind(success, clutch - success) ~ 1 + darrive,
family = binomial(link = "logit")))
Call:
glm(formula = cbind(success, clutch - success) ~ 1 + darrive,
family = binomial(link = "logit"))
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.03026 0.20204 10.049 <2e-16 ***
darrive -0.03052 0.02683 -1.137 0.255
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 63.900 on 55 degrees of freedom
Residual deviance: 62.595 on 54 degrees of freedom
AIC: 102.23
Number of Fisher Scoring iterations: 4
One Stan model can model all of these at the same time
<- cmdstanr::cmdstan_model(
one_indiv ::here("posts/2023-02-02-selection-on-plasticity/one_indiv.stan")) here
In file included from stan/src/stan/model/model_header.hpp:11:
stan/src/stan/model/model_base_crtp.hpp:198: warning: 'void stan::model::model_base_crtp<M>::write_array(boost::random::ecuyer1988&, std::vector<double, std::allocator<double> >&, std::vector<int>&, std::vector<double, std::allocator<double> >&, bool, bool, std::ostream*) const [with M = one_indiv_model_namespace::one_indiv_model; boost::random::ecuyer1988 = boost::random::additive_combine_engine<boost::random::linear_congruential_engine<unsigned int, 40014, 0, 2147483563>, boost::random::linear_congruential_engine<unsigned int, 40692, 0, 2147483399> >; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
198 | void write_array(boost::ecuyer1988& rng, std::vector<double>& theta,
|
C:/Users/UTILIS~1/AppData/Local/Temp/RtmpAv8s8z/model-2870599e2fe2.hpp:349: note: by 'one_indiv_model_namespace::one_indiv_model::write_array'
349 | write_array(RNG& base_rng, std::vector<double>& params_r, std::vector<int>&
|
stan/src/stan/model/model_base_crtp.hpp:136: warning: 'void stan::model::model_base_crtp<M>::write_array(boost::random::ecuyer1988&, Eigen::VectorXd&, Eigen::VectorXd&, bool, bool, std::ostream*) const [with M = one_indiv_model_namespace::one_indiv_model; boost::random::ecuyer1988 = boost::random::additive_combine_engine<boost::random::linear_congruential_engine<unsigned int, 40014, 0, 2147483563>, boost::random::linear_congruential_engine<unsigned int, 40692, 0, 2147483399> >; Eigen::VectorXd = Eigen::Matrix<double, -1, 1>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
136 | void write_array(boost::ecuyer1988& rng, Eigen::VectorXd& theta,
|
C:/Users/UTILIS~1/AppData/Local/Temp/RtmpAv8s8z/model-2870599e2fe2.hpp:349: note: by 'one_indiv_model_namespace::one_indiv_model::write_array'
349 | write_array(RNG& base_rng, std::vector<double>& params_r, std::vector<int>&
|
one_indiv
data{
int nbirds;
vector[nbirds] darrive;
array[nbirds] int clutch;
array[nbirds] int success;
}parameters {
real logit_psuccess;
real log_avgclutch;
real log_b_date;
}model {
success ~ binomial_logit(clutch, logit_psuccess);
clutch ~ poisson_log(log_avgclutch + log_b_date * darrive);1, .2);
logit_psuccess ~ normal(1, .2);
log_avgclutch ~ normal(0, .2);
log_b_date ~ normal( }
<- one_indiv$sample(
one_indiv_post data = list(nbirds = nbirds,
clutch = clutch,
success = success,
darrive = darrive))
Running MCMC with 4 sequential chains...
Chain 1 Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 1 Iteration: 100 / 2000 [ 5%] (Warmup)
Chain 1 Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 1 Iteration: 300 / 2000 [ 15%] (Warmup)
Chain 1 Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 1 Iteration: 500 / 2000 [ 25%] (Warmup)
Chain 1 Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 1 Iteration: 700 / 2000 [ 35%] (Warmup)
Chain 1 Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 1 Iteration: 900 / 2000 [ 45%] (Warmup)
Chain 1 Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 1 Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 1 Iteration: 1100 / 2000 [ 55%] (Sampling)
Chain 1 Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 1 Iteration: 1300 / 2000 [ 65%] (Sampling)
Chain 1 Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 1 Iteration: 1500 / 2000 [ 75%] (Sampling)
Chain 1 Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 1 Iteration: 1700 / 2000 [ 85%] (Sampling)
Chain 1 Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 1 Iteration: 1900 / 2000 [ 95%] (Sampling)
Chain 1 Iteration: 2000 / 2000 [100%] (Sampling)
Chain 1 finished in 0.1 seconds.
Chain 2 Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 2 Iteration: 100 / 2000 [ 5%] (Warmup)
Chain 2 Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 2 Iteration: 300 / 2000 [ 15%] (Warmup)
Chain 2 Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 2 Iteration: 500 / 2000 [ 25%] (Warmup)
Chain 2 Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 2 Iteration: 700 / 2000 [ 35%] (Warmup)
Chain 2 Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 2 Iteration: 900 / 2000 [ 45%] (Warmup)
Chain 2 Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 2 Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 2 Iteration: 1100 / 2000 [ 55%] (Sampling)
Chain 2 Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 2 Iteration: 1300 / 2000 [ 65%] (Sampling)
Chain 2 Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 2 Iteration: 1500 / 2000 [ 75%] (Sampling)
Chain 2 Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 2 Iteration: 1700 / 2000 [ 85%] (Sampling)
Chain 2 Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 2 Iteration: 1900 / 2000 [ 95%] (Sampling)
Chain 2 Iteration: 2000 / 2000 [100%] (Sampling)
Chain 2 finished in 0.1 seconds.
Chain 3 Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 3 Iteration: 100 / 2000 [ 5%] (Warmup)
Chain 3 Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 3 Iteration: 300 / 2000 [ 15%] (Warmup)
Chain 3 Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 3 Iteration: 500 / 2000 [ 25%] (Warmup)
Chain 3 Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 3 Iteration: 700 / 2000 [ 35%] (Warmup)
Chain 3 Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 3 Iteration: 900 / 2000 [ 45%] (Warmup)
Chain 3 Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 3 Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 3 Iteration: 1100 / 2000 [ 55%] (Sampling)
Chain 3 Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 3 Iteration: 1300 / 2000 [ 65%] (Sampling)
Chain 3 Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 3 Iteration: 1500 / 2000 [ 75%] (Sampling)
Chain 3 Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 3 Iteration: 1700 / 2000 [ 85%] (Sampling)
Chain 3 Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 3 Iteration: 1900 / 2000 [ 95%] (Sampling)
Chain 3 Iteration: 2000 / 2000 [100%] (Sampling)
Chain 3 finished in 0.1 seconds.
Chain 4 Iteration: 1 / 2000 [ 0%] (Warmup)
Chain 4 Iteration: 100 / 2000 [ 5%] (Warmup)
Chain 4 Iteration: 200 / 2000 [ 10%] (Warmup)
Chain 4 Iteration: 300 / 2000 [ 15%] (Warmup)
Chain 4 Iteration: 400 / 2000 [ 20%] (Warmup)
Chain 4 Iteration: 500 / 2000 [ 25%] (Warmup)
Chain 4 Iteration: 600 / 2000 [ 30%] (Warmup)
Chain 4 Iteration: 700 / 2000 [ 35%] (Warmup)
Chain 4 Iteration: 800 / 2000 [ 40%] (Warmup)
Chain 4 Iteration: 900 / 2000 [ 45%] (Warmup)
Chain 4 Iteration: 1000 / 2000 [ 50%] (Warmup)
Chain 4 Iteration: 1001 / 2000 [ 50%] (Sampling)
Chain 4 Iteration: 1100 / 2000 [ 55%] (Sampling)
Chain 4 Iteration: 1200 / 2000 [ 60%] (Sampling)
Chain 4 Iteration: 1300 / 2000 [ 65%] (Sampling)
Chain 4 Iteration: 1400 / 2000 [ 70%] (Sampling)
Chain 4 Iteration: 1500 / 2000 [ 75%] (Sampling)
Chain 4 Iteration: 1600 / 2000 [ 80%] (Sampling)
Chain 4 Iteration: 1700 / 2000 [ 85%] (Sampling)
Chain 4 Iteration: 1800 / 2000 [ 90%] (Sampling)
Chain 4 Iteration: 1900 / 2000 [ 95%] (Sampling)
Chain 4 Iteration: 2000 / 2000 [100%] (Sampling)
Chain 4 finished in 0.1 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 1.0 seconds.
one_indiv_post
variable mean median sd mad q5 q95 rhat ess_bulk ess_tail
lp__ 19.63 19.97 1.28 1.02 17.07 20.96 1.00 1891 2648
logit_psuccess 1.56 1.56 0.13 0.13 1.35 1.77 1.00 3807 2735
log_avgclutch 1.41 1.41 0.06 0.06 1.30 1.51 1.00 3253 2731
log_b_date -0.04 -0.04 0.01 0.01 -0.05 -0.02 1.00 4038 2929
reasonably close to true values:
plogis(1.44)
[1] 0.8084547
log(avg_clutch)
[1] 1.504077
log(effect_per_day)
[1] -0.03045921
<- function(nbirds = 57,
simulate_some_birds log_b_date = log(.97),
log_avgclutch = log(4.5),
logit_psuccess = qlogis(.84)){
# simulate arrival dates -- two weeks before and after whatever the average is
<- runif(nbirds, min = -14, max = 14) |> round()
darrive
## simulate clutch sizes -- decrease each day
<- rpois(nbirds, exp(log_avgclutch + log_b_date*darrive))
clutch
## simulate success
<- rbinom(nbirds, size = clutch, prob = plogis(logit_psuccess))
success
return(list(
data_list = list(
nbirds = nbirds,
darrive = darrive,
clutch = clutch,
success = success
),true_values = tribble(
~variable, ~true_value,
"log_b_date", log_b_date,
"log_avgclutch", log_avgclutch,
"logit_psuccess", logit_psuccess
)
))
}
<- simulate_some_birds()
data_for_simulation
<- one_indiv$sample(data = data_for_simulation$data_list,
one_indiv_post refresh = 0)
Running MCMC with 4 sequential chains...
Chain 1 finished in 0.1 seconds.
Chain 2 finished in 0.1 seconds.
Chain 3 finished in 0.1 seconds.
Chain 4 finished in 0.1 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.1 seconds.
Total execution time: 0.7 seconds.
<- one_indiv_post |>
comparison # tidybayes::gather_rvars(logit_psuccess, log_avgclutch, log_b_date) |>
::tidy_draws() |> tidybayes::gather_variables() |>
tidybayesright_join(data_for_simulation$true_values, by = c(".variable" = "variable"))
|>
comparison ggplot(aes(y = .variable, x = .value)) +
stat_halfeye() +
geom_point(aes(y = variable, x = true_value),
col = "orange",
pch = "|",
size = 10, data = data_for_simulation$true_values)
No 0 birds
This system is a little challenging, since we never observe 0 eggs per bird – if a bird cannot lay eggs (e.g. it does not find a nest spot) then it goes uncounted
To simulate this, I’ll drop the 0 clutches before doing the rest of the simulations. This means that sample size will be less than or equal to the “nbirds” argument.
<- function(nbirds = 57,
simulate_some_birds_nonzero log_b_date = log(.97),
log_avgclutch = log(4.5),
logit_psuccess = qlogis(.84)){
# simulate arrival dates -- two weeks before and after whatever the average is
<- runif(nbirds, min = -14, max = 14) |> round()
darrive
## simulate clutch sizes -- decrease each day
<- rpois(nbirds, exp(log_avgclutch + log_b_date*darrive))
clutch # drop 0 nests
<- which(clutch > 0)
nonzero_clutch
## simulate success
<- rbinom(nbirds, size = clutch, prob = plogis(logit_psuccess))
success
return(list(
data_list = list(
nbirds = length(nonzero_clutch),
darrive = darrive[nonzero_clutch],
clutch = clutch[nonzero_clutch],
success = success[nonzero_clutch]
),true_values = tribble(
~variable, ~true_value,
"log_b_date", log_b_date,
"log_avgclutch", log_avgclutch,
"logit_psuccess", logit_psuccess
)
))
}
set.seed(1234)
<- simulate_some_birds_nonzero(nbirds = 200) some_nonzeros
a plot to confirm that it works:
$data_list |>
some_nonzerosas.data.frame() |>
ggplot(aes(x = darrive, y = clutch)) + geom_point()
And fit the posterior
<- function(simdata, stanmodel){
plot_posterior_true <- stanmodel$sample(data = simdata$data_list,
model_post refresh = 0, parallel_chains = 4)
<- model_post |>
comparison ::tidy_draws() |>
tidybayes::gather_variables() |>
tidybayesright_join(simdata$true_values, by = c(".variable" = "variable"))
|>
comparison ggplot(aes(y = .variable, x = .value)) +
stat_halfeye() +
geom_point(
aes(y = variable, x = true_value),
col = "orange",
pch = "|",
size = 10, data = simdata$true_values)
}
plot_posterior_true(some_nonzeros, one_indiv)
Running MCMC with 4 parallel chains...
Chain 1 finished in 0.2 seconds.
Chain 2 finished in 0.2 seconds.
Chain 3 finished in 0.2 seconds.
Chain 4 finished in 0.2 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.2 seconds.
Total execution time: 0.5 seconds.
There’s already some bias happening! let’s try what happens when the average is lower (and gives more 0s)
set.seed(420)
simulate_some_birds_nonzero(log_avgclutch = log(2.4),
log_b_date = log(.7),
nbirds = 300) |>
plot_posterior_true(one_indiv)
Running MCMC with 4 parallel chains...
Chain 1 finished in 0.5 seconds.
Chain 2 finished in 0.6 seconds.
Chain 3 finished in 0.5 seconds.
Chain 4 finished in 0.6 seconds.
All 4 chains finished successfully.
Mean chain execution time: 0.6 seconds.
Total execution time: 0.8 seconds.
some preliminary repetitions show that it usually misses either the average or the hatching success, frequently both.
<- cmdstanr::cmdstan_model(
one_indiv_noZero ::here("posts/2023-02-02-selection-on-plasticity/one_indiv_noZero.stan")) here
In file included from stan/src/stan/model/model_header.hpp:11:
stan/src/stan/model/model_base_crtp.hpp:198: warning: 'void stan::model::model_base_crtp<M>::write_array(boost::random::ecuyer1988&, std::vector<double, std::allocator<double> >&, std::vector<int>&, std::vector<double, std::allocator<double> >&, bool, bool, std::ostream*) const [with M = one_indiv_noZero_model_namespace::one_indiv_noZero_model; boost::random::ecuyer1988 = boost::random::additive_combine_engine<boost::random::linear_congruential_engine<unsigned int, 40014, 0, 2147483563>, boost::random::linear_congruential_engine<unsigned int, 40692, 0, 2147483399> >; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
198 | void write_array(boost::ecuyer1988& rng, std::vector<double>& theta,
|
C:/Users/UTILIS~1/AppData/Local/Temp/RtmpAv8s8z/model-2870114b1a0.hpp:363: note: by 'one_indiv_noZero_model_namespace::one_indiv_noZero_model::write_array'
363 | write_array(RNG& base_rng, std::vector<double>& params_r, std::vector<int>&
|
stan/src/stan/model/model_base_crtp.hpp:136: warning: 'void stan::model::model_base_crtp<M>::write_array(boost::random::ecuyer1988&, Eigen::VectorXd&, Eigen::VectorXd&, bool, bool, std::ostream*) const [with M = one_indiv_noZero_model_namespace::one_indiv_noZero_model; boost::random::ecuyer1988 = boost::random::additive_combine_engine<boost::random::linear_congruential_engine<unsigned int, 40014, 0, 2147483563>, boost::random::linear_congruential_engine<unsigned int, 40692, 0, 2147483399> >; Eigen::VectorXd = Eigen::Matrix<double, -1, 1>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
136 | void write_array(boost::ecuyer1988& rng, Eigen::VectorXd& theta,
|
C:/Users/UTILIS~1/AppData/Local/Temp/RtmpAv8s8z/model-2870114b1a0.hpp:363: note: by 'one_indiv_noZero_model_namespace::one_indiv_noZero_model::write_array'
363 | write_array(RNG& base_rng, std::vector<double>& params_r, std::vector<int>&
|
one_indiv_noZero
data{
int nbirds;
vector[nbirds] darrive;
array[nbirds] int clutch;
array[nbirds] int success;
}parameters {
real logit_psuccess;
real log_avgclutch;
real log_b_date;
}model {
vector[nbirds] alpha = log_avgclutch + log_b_date * darrive;
success ~ binomial_logit(clutch, logit_psuccess);1, .2);
logit_psuccess ~ normal(1, .2);
log_avgclutch ~ normal(0, .2);
log_b_date ~ normal(
clutch ~ poisson_log(alpha);// no zeros -- this normalizes the poisson density for a 0-truncated variable
target += -log1m_exp(-exp(alpha));
}
set.seed(420)
simulate_some_birds_nonzero(log_avgclutch = log(4.4),
log_b_date = log(.9),
nbirds = 300) |>
plot_posterior_true(one_indiv_noZero)
Running MCMC with 4 parallel chains...
Chain 1 finished in 1.5 seconds.
Chain 2 finished in 1.5 seconds.
Chain 3 finished in 1.6 seconds.
Chain 4 finished in 1.5 seconds.
All 4 chains finished successfully.
Mean chain execution time: 1.5 seconds.
Total execution time: 1.7 seconds.
Truncating using a different syntax
The manual uses a different syntax. To write the equation above this way, I found I needed to make a few changes:
poisson_log
has to be replaced withpoisson
and theexp()
link function
… that was actually the only change. It runs at the same speed as the previous way of writing it, and gets the same answer:
<- cmdstanr::cmdstan_model(
one_indiv_zerotrunc ::here("posts/2023-02-02-selection-on-plasticity/one_indiv_zerotrunc.stan")) here
In file included from stan/src/stan/model/model_header.hpp:11:
stan/src/stan/model/model_base_crtp.hpp:198: warning: 'void stan::model::model_base_crtp<M>::write_array(boost::random::ecuyer1988&, std::vector<double, std::allocator<double> >&, std::vector<int>&, std::vector<double, std::allocator<double> >&, bool, bool, std::ostream*) const [with M = one_indiv_zerotrunc_model_namespace::one_indiv_zerotrunc_model; boost::random::ecuyer1988 = boost::random::additive_combine_engine<boost::random::linear_congruential_engine<unsigned int, 40014, 0, 2147483563>, boost::random::linear_congruential_engine<unsigned int, 40692, 0, 2147483399> >; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
198 | void write_array(boost::ecuyer1988& rng, std::vector<double>& theta,
|
C:/Users/UTILIS~1/AppData/Local/Temp/RtmpAv8s8z/model-287024f02b4b.hpp:369: note: by 'one_indiv_zerotrunc_model_namespace::one_indiv_zerotrunc_model::write_array'
369 | write_array(RNG& base_rng, std::vector<double>& params_r, std::vector<int>&
|
stan/src/stan/model/model_base_crtp.hpp:136: warning: 'void stan::model::model_base_crtp<M>::write_array(boost::random::ecuyer1988&, Eigen::VectorXd&, Eigen::VectorXd&, bool, bool, std::ostream*) const [with M = one_indiv_zerotrunc_model_namespace::one_indiv_zerotrunc_model; boost::random::ecuyer1988 = boost::random::additive_combine_engine<boost::random::linear_congruential_engine<unsigned int, 40014, 0, 2147483563>, boost::random::linear_congruential_engine<unsigned int, 40692, 0, 2147483399> >; Eigen::VectorXd = Eigen::Matrix<double, -1, 1>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
136 | void write_array(boost::ecuyer1988& rng, Eigen::VectorXd& theta,
|
C:/Users/UTILIS~1/AppData/Local/Temp/RtmpAv8s8z/model-287024f02b4b.hpp:369: note: by 'one_indiv_zerotrunc_model_namespace::one_indiv_zerotrunc_model::write_array'
369 | write_array(RNG& base_rng, std::vector<double>& params_r, std::vector<int>&
|
one_indiv_zerotrunc
data{
int nbirds;
vector[nbirds] darrive;
array[nbirds] int clutch;
array[nbirds] int success;
}parameters {
real logit_psuccess;
real log_avgclutch;
real log_b_date;
}model {
success ~ binomial_logit(clutch, logit_psuccess);1, .2);
logit_psuccess ~ normal(1, .2);
log_avgclutch ~ normal(0, .2);
log_b_date ~ normal(vector[nbirds] alpha = log_avgclutch + log_b_date * darrive;
T[1,];
clutch ~ poisson(exp(alpha)) }
set.seed(420)
simulate_some_birds_nonzero(log_avgclutch = log(4.4),
log_b_date = log(.9),
nbirds = 300) |>
plot_posterior_true(one_indiv_zerotrunc)
Running MCMC with 4 parallel chains...
Chain 3 finished in 1.9 seconds.
Chain 4 finished in 2.0 seconds.
Chain 1 finished in 2.1 seconds.
Chain 2 finished in 2.0 seconds.
All 4 chains finished successfully.
Mean chain execution time: 2.0 seconds.
Total execution time: 2.2 seconds.
These two ways of writing the model both work. Which to choose? Well, I was pleased with myself for manually normalizing the Poisson likelihood in the first model. However the second is clearer to the reader. In the first, it takes two lines of code – not even necessarily next to each other. In the second, the big T
for Truncation indicates clearly what is going on. Code is communication; clarity wins.
Zero inflated binomial success
Once in a while, a nest will just be completely destroyed by, say, a predator. This has nothing to do with anything, probably, and is just a catastrophic Act of Weasel. Let’s imagine that some small proportion of the nests just die:
<- function(nbirds = 57,
simulate_some_birds_nonzero_zeroinflated log_b_date = log(.97),
log_avgclutch = log(4.5),
logit_psuccess = qlogis(.84),
logit_pfail = qlogis(.1)){
# simulate arrival dates -- two weeks before and after whatever the average is
<- runif(nbirds, min = -14, max = 14) |> round()
darrive
## simulate clutch sizes -- decrease each day
<- rpois(nbirds, exp(log_avgclutch + log_b_date*darrive))
clutch # drop 0 nests
<- which(clutch > 0)
nonzero_clutch <- length(nonzero_clutch)
n_laid
# simulate success -- among birds which laid at least 1 egg, there is a chance of failing completely
<- rbinom(n_laid,
success_among_nonzero size = clutch[nonzero_clutch],
prob = plogis(logit_psuccess))
<- rbinom(n_laid, size = 1, prob = plogis(logit_pfail))
failed_nests
<- success_among_nonzero * (1 - failed_nests)
success_zi
# success <- rbinom(nbirds,
# size = clutch,
# prob = plogis(logit_psuccess))
# failed_nests <- rbinom(nbirds, size = 1, prob = plogis(logit_pfail))
#
# success_zi <- success * (1 - failed_nests)
return(list(
data_list = list(
nbirds = n_laid,
darrive = darrive[nonzero_clutch],
clutch = clutch[nonzero_clutch],
success = success_zi#[nonzero_clutch]
),true_values = tribble(
~variable, ~true_value,
"log_b_date", log_b_date,
"log_avgclutch", log_avgclutch,
"logit_psuccess", logit_psuccess,
"logit_pfail", logit_pfail
)
)) }
<- cmdstanr::cmdstan_model(
one_indiv_ztrunc_zinf ::here("posts/2023-02-02-selection-on-plasticity/one_indiv_ztrunc_zinf.stan")) here
In file included from stan/src/stan/model/model_header.hpp:11:
stan/src/stan/model/model_base_crtp.hpp:198: warning: 'void stan::model::model_base_crtp<M>::write_array(boost::random::ecuyer1988&, std::vector<double, std::allocator<double> >&, std::vector<int>&, std::vector<double, std::allocator<double> >&, bool, bool, std::ostream*) const [with M = one_indiv_ztrunc_zinf_model_namespace::one_indiv_ztrunc_zinf_model; boost::random::ecuyer1988 = boost::random::additive_combine_engine<boost::random::linear_congruential_engine<unsigned int, 40014, 0, 2147483563>, boost::random::linear_congruential_engine<unsigned int, 40692, 0, 2147483399> >; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
198 | void write_array(boost::ecuyer1988& rng, std::vector<double>& theta,
|
C:/Users/UTILIS~1/AppData/Local/Temp/RtmpAv8s8z/model-2870469c7a24.hpp:420: note: by 'one_indiv_ztrunc_zinf_model_namespace::one_indiv_ztrunc_zinf_model::write_array'
420 | write_array(RNG& base_rng, std::vector<double>& params_r, std::vector<int>&
|
stan/src/stan/model/model_base_crtp.hpp:136: warning: 'void stan::model::model_base_crtp<M>::write_array(boost::random::ecuyer1988&, Eigen::VectorXd&, Eigen::VectorXd&, bool, bool, std::ostream*) const [with M = one_indiv_ztrunc_zinf_model_namespace::one_indiv_ztrunc_zinf_model; boost::random::ecuyer1988 = boost::random::additive_combine_engine<boost::random::linear_congruential_engine<unsigned int, 40014, 0, 2147483563>, boost::random::linear_congruential_engine<unsigned int, 40692, 0, 2147483399> >; Eigen::VectorXd = Eigen::Matrix<double, -1, 1>; std::ostream = std::basic_ostream<char>]' was hidden [-Woverloaded-virtual=]
136 | void write_array(boost::ecuyer1988& rng, Eigen::VectorXd& theta,
|
C:/Users/UTILIS~1/AppData/Local/Temp/RtmpAv8s8z/model-2870469c7a24.hpp:420: note: by 'one_indiv_ztrunc_zinf_model_namespace::one_indiv_ztrunc_zinf_model::write_array'
420 | write_array(RNG& base_rng, std::vector<double>& params_r, std::vector<int>&
|
one_indiv_ztrunc_zinf
data{
int nbirds;
vector[nbirds] darrive;
array[nbirds] int clutch;
array[nbirds] int success;
}parameters {
real logit_psuccess;
real log_avgclutch;
real log_b_date;
real logit_pfail;
}model {
1, .5);
logit_pfail ~ normal(-1, .2);
logit_psuccess ~ normal(1, .2);
log_avgclutch ~ normal(0, .2);
log_b_date ~ normal(
// Eggs laid -- at least one
vector[nbirds] alpha = log_avgclutch + log_b_date * darrive;
T[1,];
clutch ~ poisson(exp(alpha))
// nestling success
for (n in 1:nbirds) {
if (success[n] == 0) {
target += log_sum_exp(
log_inv_logit(logit_pfail),0 | clutch[n], logit_psuccess)
log1m_inv_logit(logit_pfail) + binomial_logit_lpmf(
);else {
} target += log1m_inv_logit(logit_pfail) + binomial_logit_lpmf(success[n] | clutch[n], logit_psuccess);
}
}
}
set.seed(477)
<- simulate_some_birds_nonzero_zeroinflated(log_avgclutch = log(4.4),
some_zi_birds log_b_date = log(.9),
nbirds = 300)
|>
some_zi_birdsplot_posterior_true(one_indiv_ztrunc_zinf)
Running MCMC with 4 parallel chains...
Chain 1 finished in 5.0 seconds.
Chain 4 finished in 5.1 seconds.
Chain 2 finished in 5.3 seconds.
Chain 3 finished in 5.5 seconds.
All 4 chains finished successfully.
Mean chain execution time: 5.2 seconds.
Total execution time: 5.6 seconds.
I initially failed to recover the parameter for logit_pfail
. Here I some things I learned:
- Simulating zero-inflated numbers can be tricky! I flipped back and forth between simulating 0-inflation for all nests, and simulating for only those with at least 1 egg. In retrospect, it is clear that these are equivalent. There are two independent things here: the probability of a clutch having 0 eggs and the probability of a 0 coming from the zero-inflated binomial.
- this zero-inflated model is quite sensitive to the prior. That’s because there are not very many zeros being inflated: in my simulation at least, failed nests are rare and success is naturally low. it would be pretty important to decide in advance if sudden nest failure (e.g. by predation) is rare or common