My new(ish) lab mate Dr. Diego Barneche and I are developing an `R` package for estimating monotonic rates from biological data, and we are in desperate need of beta testers! So… if you ever need to estimate a linear or monotonic rate from messy biological data, take a look at our GitHub page and download LoLinR!

# Jetsam

Contemplations cast out to lighten the load, and maybe spark a discussion

# Caveat emptor… what it means (to me) to be a graduate student

This post is intended for current and prospective graduate students. If I am lucky enough to one day hold a faculty position with a lab and students of my own, I hope it will prove useful to my prospective students. It is meant to serve as a fossil record of what I think being a graduate student means, and of the characteristics I consider hallmarks of successful graduate students. I have made it a personal rule not to change anything once I have written it down, so that each entry remains an accurate representation of my thinking at the time. Hopefully, the way these thoughts change over time will itself prove useful to readers.

– **Embracing ignorance: it’s turtles all the way down…**

This is a big subject, and since others have addressed it before me, I will keep this relatively brief. Also, many of the thoughts I will collect here are variations on this theme, and I’d like to avoid too much repetition.

Above all, being a graduate student means being willing to be ignorant. No, enjoying being ignorant. Good science pushes the boundaries of the collective knowledge of everyone in the field… if you are asking the right questions, there will be no answers, just a series of fascinating insights and a myriad of related, interesting questions. It’s turtles all the way down. Being a graduate student means being comfortable sailing these uncharted waters: enjoying taking responsibility for your own education and relying on your own judgement to explore your questions of interest. Many students are daunted by this, and many can’t stand it. It can take graduate students years to come to terms with the reality that they can never be *right*, only engaged in an interesting line of inquiry. Some never do. Many faculty and students never learn, or are unwilling to accept, that there are no right answers… However, in my experience, those who routinely feel that they are ‘right’ about anything in science usually: 1) are feeding an ego that demands validation, 2) fundamentally do not understand the complexity of the questions being asked and the limits of the understanding they can achieve within the framework of current research, or 3) are purposefully over-simplifying the question as a heuristic tool (which can be good or bad, depending on the context).

Personally, I have become deeply suspicious of any feeling of certainty about my own science. Whenever I do feel certain, I ask myself whether it is for any of these three reasons, and this often helps me avoid pitfalls and biases in my own thinking. Learning to self-check like this is, I feel, one of the first lessons graduate students must learn in order to be successful.

**– Ask yourself before asking others**

This is pretty straightforward, and a big part of embracing ignorance, taking responsibility for your own understanding, and learning to rely on your own judgement. Invariably, every good graduate student I have ever known does this: whenever they are stumped or frustrated, or feel the need for ‘answers’ or the urge to go ask their peers or supervisor a question… they stop. They take a breath. And then they feverishly investigate and refine their question(s) before they ever open their mouths. Usually, the exercise of carefully examining their questions leads them to the solution they were seeking. And if not… they have a far better understanding of what their question really is when they do seek feedback from others. Which leads naturally to how to seek feedback…

**– Inspire feedback**

This is a crucial point, and will go a long way towards improving interactions with both peers and supervisors. Graduate students cannot *expect* useful feedback to help them navigate their questions and research… they must *inspire* it. Feedback can only be as good as the questions and explanations you offer to your peers/supervisor, and the discussion *you* create. This goes hand in hand with ‘asking yourself before asking others’ and ‘turtles’. A graduate student must explore their own questions thoroughly before seeking feedback. Remember, it’s turtles all the way down… the key to getting good feedback is inspiring your peers/supervisor with your fascinating line of inquiry, NOT asking them questions that you expect answers to. Remember too that inviting others down your rabbit hole means knowing your way around the warren… often the most useful part of seeking feedback is the exercise of preparing yourself to guide others through your lines of inquiry.

Possible future thoughts:

– Building System 2 endurance

– The professional/personal boundary

# Best methods section ever

I love entomologists…

# The many faces of the Negative Binomial variance function

Ecologists and evolutionary biologists often confront statistical models with overdispersed count data. One common strategy for dealing with such data is to use generalized linear models or generalized linear mixed models with a negative binomial (NB) error distribution. The typical interpretation of the NB is that it describes the number of failures before r successes in independent trials, each with a fixed probability of success. Alternatively, the NB can be derived as a hierarchical gamma-Poisson process in which the Poisson intensity (the expected number of events) itself follows a gamma distribution. This second formulation is perhaps more intuitive for biological analyses, but the two are mathematically equivalent.
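To see why the two formulations agree, here is a sketch of the standard gamma-Poisson mixture argument. The shape-rate notation for the gamma distribution (shape α, rate β) is my choice for illustration; integrating the Poisson intensity λ out of the model gives an NB probability mass function and a variance that grows quadratically with the mean:

$$
\begin{aligned}
y \mid \lambda &\sim \text{Poisson}(\lambda), \qquad \lambda \sim \text{Gamma}(\alpha, \beta), \qquad \mu = E[\lambda] = \frac{\alpha}{\beta},\\
\Pr(y) &= \int_0^\infty \frac{e^{-\lambda}\lambda^{y}}{y!}\,\frac{\beta^{\alpha}\lambda^{\alpha-1}e^{-\beta\lambda}}{\Gamma(\alpha)}\,d\lambda
= \frac{\Gamma(y+\alpha)}{\Gamma(y+1)\,\Gamma(\alpha)}\left(\frac{\beta}{1+\beta}\right)^{\alpha}\left(\frac{1}{1+\beta}\right)^{y},\\
\operatorname{Var}(y) &= E[\operatorname{Var}(y \mid \lambda)] + \operatorname{Var}(E[y \mid \lambda]) = \frac{\alpha}{\beta} + \frac{\alpha}{\beta^{2}} = \mu + \frac{\mu^{2}}{\alpha}.
\end{aligned}
$$

The marginal is exactly the "failures before r successes" NB with size r = α and success probability p = β/(1 + β), and writing θ = 1/α gives the quadratic (NB2) variance function.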

Part of what makes the NB so useful for overdispersed data is that its variance function can be formulated in several ways to describe a variety of mean~variance relations. The well-known NB1 and NB2 variance functions are

$$\text{NB1:}\quad \operatorname{Var}(x) = \mu + \theta\mu$$

$$\text{NB2:}\quad \operatorname{Var}(x) = \mu + \theta\mu^{2},$$

and allow for the analysis of linear and quadratic mean~variance relations, respectively. Lindén & Mäntyniemi (2011) demonstrated that the variance function can be extended by incorporating a second estimable overdispersion parameter, ω:

$$\text{Lindén:}\quad \operatorname{Var}(x) = \omega\mu + \theta\mu^{2}.$$

This formulation provides more flexibility in describing data arising from processes whose mean~variance relation is neither purely linear nor purely quadratic, and it reduces to NB2 when ω = 1 (and to a linear, NB1-type relation when θ = 0).
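To make the differences concrete, here is a quick worked comparison at a single mean. The values μ = 10, θ = 0.5 and ω = 2 are arbitrary and chosen purely for illustration:

$$
\begin{aligned}
\text{NB1:}\quad & 10 + 0.5 \times 10 = 15\\
\text{NB2:}\quad & 10 + 0.5 \times 10^{2} = 60\\
\text{Lindén},\ \omega = 2:\quad & 2 \times 10 + 0.5 \times 10^{2} = 70\\
\text{Lindén},\ \omega = 1:\quad & 1 \times 10 + 0.5 \times 10^{2} = 60 \quad (= \text{NB2})\\
\text{Lindén},\ \omega = 1.5,\ \theta = 0:\quad & 1.5 \times 10 = 15 \quad (= \text{NB1 with } \theta = 0.5)
\end{aligned}
$$

More generally, the Lindén function with θ = 0 reproduces NB1 when ω equals 1 plus the NB1 θ, while ω and θ together let the data decide how much of the variance scales linearly versus quadratically with the mean.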

As part of an analysis of pollinator count data, I wanted to apply this alternative version of the NB variance function, but could not find readily available code in SAS or R. Luckily, while trawling SAS forums for advice and ideas, I stumbled on a 2011 post by Adam Smith (Dept. Nat. Resources, University of Rhode Island) looking to do exactly the same thing. Neither of us appeared to have found a satisfying answer on the forums, so I took a long shot and emailed Adam to see whether he had ever figured it out. We ended up passing code back and forth (mostly he gave me code), and together came up with a working implementation of the Lindén NB variance function in SAS. Unfortunately, it only appears to work in PROC NLMIXED, which is extremely inefficient for larger models, or for models that require any variable selection… but anyway, here is what we came up with, along with a sample dataset provided by Adam.

```sas
data counts;
   input ni @@;          /* number of observations for this subject */
   sub = _n_;            /* subject ID = data-step iteration */
   do i = 1 to ni;
      input x y @@;
      output;
   end;
   datalines;
 1 29 0
 6 2 0 82 5 33 0 15 2 35 0 79 0
19 81 0 18 0 85 0 99 0 20 0 26 2 29 0 91 2 37 0 39 0
   9 1 33 0 3 0 60 0 87 2 80 0 75 0 3 0 63 1
 9 18 0 64 0 80 0 0 0 58 0 7 0 81 0 22 3 50 0
15 91 0 2 1 14 0 5 2 27 1 8 1 95 0 76 0 62 0 26 2 9 0 72 1 98 0 94 0 23 1
 2 34 0 95 0
18 48 1 5 0 47 0 44 0 27 0 88 0 27 0 68 0 84 0 86 0
   44 0 90 0 63 0 27 0 47 0 25 0 72 0 62 1
13 28 1 31 0 63 0 14 0 74 0 44 0 75 0 65 0 74 1 84 0 57 0 29 0 41 0
 9 42 0 8 0 91 0 20 0 23 0 22 0 96 0 83 0 56 0
 3 64 0 64 1 15 0
 4 5 0 73 2 50 1 13 0
 2 0 0 41 0
20 21 0 58 0 5 0 61 1 28 0 71 0 75 1 94 16 51 4 51 2
   74 0 1 1 34 0 7 0 11 0 60 3 31 0 75 0 62 0 54 1
 2 66 1 13 0
 5 83 7 98 1 11 1 28 0 18 0
17 29 5 79 0 39 2 47 2 80 1 19 0 37 0 78 1 26 0 72 1
   6 0 50 3 50 4 97 0 37 2 51 0 45 0
17 47 0 57 0 33 0 47 0 2 0 83 0 74 0 93 0 36 0 53 0
   26 0 86 0 6 0 17 0 30 0 70 1 99 0
 7 91 0 25 1 51 4 20 0 61 1 34 0 33 2
14 60 0 87 0 94 0 29 0 41 0 78 0 50 0 37 0 15 0 39 0 22 0 82 0 93 0 3 0
16 68 0 26 1 19 0 60 1 93 3 65 0 16 0 79 0 14 0 3 1
   90 0 28 3 82 0 34 0 30 0 81 0
19 48 3 48 1 43 2 54 0 45 9 53 0 14 0 92 5 21 1 20 0
   73 0 99 0 66 0 86 2 63 0 10 0 92 14 44 1 74 0
 8 34 1 44 0 62 0 21 0 7 0 17 0 0 2 49 0
13 11 0 27 2 16 1 12 3 52 1 55 0 2 6 89 5 31 5 28 3 51 5 54 13 64 0
 9 3 0 36 0 57 0 77 0 41 0 39 0 55 0 57 0 88 1
 7 2 0 80 0 41 1 20 0 2 0 27 0 40 0
18 73 1 66 0 10 0 42 0 22 0 59 9 68 0 34 1 96 0 30 0
   13 0 35 0 51 2 47 0 60 1 55 4 83 3 38 0
17 96 0 40 0 34 0 59 0 12 1 47 0 93 0 50 0 39 0 97 0
   19 0 54 0 11 0 29 0 70 2 87 0 47 0
13 59 0 96 0 47 1 64 0 18 0 30 0 37 0 36 1 69 0 78 1 47 1 86 0 88 0
15 66 0 45 1 96 1 17 0 91 0 4 0 22 0 5 2 47 0 38 0 80 0 7 1 38 1 33 0 52 0
12 84 6 60 1 33 1 92 0 38 0 6 0 43 3 13 2 18 0 51 0 50 4 68 0
;

proc sort data=counts;
   by sub i;
run;

proc print data=counts;
run;

title 'Negative binomial';
proc glimmix data=counts method=quad;
   class sub;
   model y = x / link=log s dist=nb;
   random int / subject=sub;
run;

title 'Linden parameterization';
proc nlmixed data=counts;
   /* Optional constraints (add one statement and remove that parameter from PARMS):
        omega = 1;  constrains the model to a quadratic mean-variance relationship, i.e., NB2
        theta = 0;  constrains the model to a linear mean-variance relationship, i.e., NB1/quasi-Poisson;
                    the omega starting value must then be > 1 to avoid a zero denominator in r  */
   /* Starting fixed-effect values are the rounded coefficient estimates from the NB model.
      The starting value for omega should be > 1 (or at least not 1) to avoid dividing by 0 in r. */
   parms b_0=-1 b_1=0 omega=4 theta=1 log_sdSUB=0;  /* log_sdSUB: log SD of the random intercept */
   eta_lambda = b_0 + b_1*x + u;                     /* u: subject-level random intercept */
   lambda = exp(eta_lambda);
   r = lambda / (omega - 1 + theta*lambda);          /* NB size giving Var = omega*lambda + theta*lambda**2 */
   p = r / (r + lambda);
   loglike = lgamma(y+r) - lgamma(y+1) - lgamma(r) + r*log(p) + y*log(1-p);
   model y ~ general(loglike);                       /* fit the model with the custom log-likelihood */
   random u ~ normal(0, exp(2*log_sdSUB)) subject=sub;
run;

/* Replacing r, p, and loglike above with the following gives the
   equivalent gamma-Poisson formulation (gamma shape alpha, rate beta):

   alpha   = lambda / (omega - 1 + theta*lambda);
   beta    = 1 / (omega - 1 + theta*lambda);
   loglike = lgamma(y+alpha) - lgamma(y+1) - lgamma(alpha)
             + alpha*log(beta/(1+beta)) + y*log(1/(1+beta));
*/
```
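Why does that particular choice of r produce the Lindén variance function? A short derivation, using the textbook mean and variance of an NB with size r and success probability p (the form coded in `loglike` in the NLMIXED step):

$$
\begin{aligned}
E[y] &= \frac{r(1-p)}{p}, \qquad \operatorname{Var}(y) = \frac{r(1-p)}{p^{2}},\\
p = \frac{r}{r+\lambda} \;&\Rightarrow\; E[y] = \lambda, \qquad \operatorname{Var}(y) = \lambda + \frac{\lambda^{2}}{r},\\
r = \frac{\lambda}{\omega - 1 + \theta\lambda} \;&\Rightarrow\; \operatorname{Var}(y) = \lambda + \lambda(\omega - 1 + \theta\lambda) = \omega\lambda + \theta\lambda^{2}.
\end{aligned}
$$

Setting ω = 1 gives r = 1/θ (NB2), while setting θ = 0 gives r = λ/(ω − 1), which is why the comments in the code warn that ω must not equal 1 in that case.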

# A quote to set the stage

“Our job as scientists is to decide between the possible, the plausible, and the consequential.”

– Sam Scheiner, in *The Nature of Scientific Evidence*, Ch. 3.