Multivariate Poisson & Multivariate Exponential Distributions (not everything needs a copula ;)

While I am preparing a more “in-depth” treatment of this Twitter thread that sparked some interest (thank my lucky stars!), I ran into a couple of curious distributions that I think highlight an important lesson: not EVERYTHING needs a copula 😉 Sure, copulas are flexible and powerful modelling approaches. But at the end of the day there are other methods to create multivariate, non-normal distributions. And many of them have very elegant theoretical properties that allow us to expand our intuitions about these distributions from the univariate to the multivariate setting.

So here I present two distributions which can be generalized from their univariate to a multivariate definition without invoking a copula.

Multivariate Poisson Distribution

The Poisson distribution is closed under convolutions. For the purposes of this post, that means that if X_{1} and X_{2} are independent, Poisson-distributed random variables (with parameters \lambda_{1} and \lambda_{2}, respectively), then Y=X_{1}+X_{2} is also Poisson-distributed (with parameter… Yup! You got it! \lambda_{Y}=\lambda_{1}+\lambda_{2} ).
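In case you want to see this property in action, here is a minimal numerical check (the parameter values and the truncation of the support at 60 are arbitrary choices of mine):

```python
# Check numerically that the convolution of two Poisson pmfs is the
# pmf of a Poisson with the summed rate. Values here are illustrative.
import numpy as np
from scipy.stats import poisson

lam1, lam2 = 2.0, 3.5
k = np.arange(60)  # truncate the (infinite) support; tail mass is negligible

# Convolve the two pmfs numerically...
pmf_sum = np.convolve(poisson.pmf(k, lam1), poisson.pmf(k, lam2))[:len(k)]

# ...and compare with the Poisson(lam1 + lam2) pmf.
print(np.max(np.abs(pmf_sum - poisson.pmf(k, lam1 + lam2))))  # ~1e-16
```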

So let’s start easy, the bivariate case. To construct a bivariate Poisson random vector (X,Y)' we can use the following stochastic representation:

X = X_{1} + X_{1,2}

Y = X_{2} + X_{1,2}

Where X_{1}, X_{2}, X_{1,2} are independent Poisson random variables with parameters \lambda_{1}, \lambda_{2}, \lambda_{1,2} respectively. Since Poisson distributions are closed under convolutions, X and Y are Poisson-distributed with parameters (and hence means and variances) \lambda_{1}+\lambda_{1,2} and \lambda_{2}+\lambda_{1,2} respectively, and their covariance is Cov(X, Y) = \lambda_{1,2}, the parameter of the shared term. Notice that this construction implies the restriction Cov(X, Y) \geq 0 .
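A small simulation sketch of this construction makes the moments easy to eyeball (the sample size and rates below are arbitrary choices of mine):

```python
# Simulate the common-shock construction of a bivariate Poisson vector
# and check the implied means, variances, and covariance.
import numpy as np

rng = np.random.default_rng(42)
lam1, lam2, lam12 = 1.0, 2.0, 0.5
n = 1_000_000

X1 = rng.poisson(lam1, n)
X2 = rng.poisson(lam2, n)
X12 = rng.poisson(lam12, n)  # the shared term

X, Y = X1 + X12, X2 + X12

print(X.mean(), X.var())   # both ~ lam1 + lam12 = 1.5
print(Y.mean(), Y.var())   # both ~ lam2 + lam12 = 2.5
print(np.cov(X, Y)[0, 1])  # ~ lam12 = 0.5
```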

Using the above property we can derive the joint probability function of (X,Y)' . Consider P(X=r, Y=s) for fixed values r, s. Whenever X_{1,2}=k, we must have X_{1}=r-k and X_{2}=s-k, and k can take any value between 0 and min(r,s). Since X_{1}, X_{2}, X_{1,2} are mutually independent, we can write the joint probability function as:

P(X=r, Y=s)=\sum_{k=0}^{min(r,s)}f_1(r-k)f_2(s-k)f_{1,2}(k)

where f_1, f_2, f_{1,2} are the probability functions of X_1, X_2, X_{1,2} respectively. By making the proper substitutions in the f's and some collecting of terms we have:

P(X=r, Y=s)=e^{-(\lambda_1+\lambda_2+\lambda_{1,2})}\frac{\lambda^{r}_{1}}{r!}\frac{\lambda^{s}_{2}}{s!}\sum_{k=0}^{min(r,s)}\binom{r}{k}\binom{s}{k}k!\left ( \frac{\lambda_{1,2}}{\lambda_1\lambda_2} \right ) ^{k}
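As a sanity check, here is a sketch that evaluates this pmf directly and compares it against empirical frequencies from the construction (the function name and all parameter values are mine):

```python
# Evaluate the closed-form bivariate Poisson pmf and compare it with
# the empirical frequency from simulating the construction.
import numpy as np
from math import comb, exp, factorial

def biv_poisson_pmf(r, s, lam1, lam2, lam12):
    tail = sum(comb(r, k) * comb(s, k) * factorial(k)
               * (lam12 / (lam1 * lam2)) ** k
               for k in range(min(r, s) + 1))
    return (exp(-(lam1 + lam2 + lam12))
            * lam1 ** r / factorial(r)
            * lam2 ** s / factorial(s) * tail)

rng = np.random.default_rng(0)
lam1, lam2, lam12, n = 1.0, 2.0, 0.5, 1_000_000
X12 = rng.poisson(lam12, n)
X, Y = rng.poisson(lam1, n) + X12, rng.poisson(lam2, n) + X12

r, s = 2, 3
print(biv_poisson_pmf(r, s, lam1, lam2, lam12))  # closed form
print(np.mean((X == r) & (Y == s)))              # empirical frequency
```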

From here we can expand to, say, a trivariate Poisson random vector by expressing the 3-D vector (X, Y, Z)' as:

X = X_{1} + X_{1,2} + X_{1,3} +X_{1,2,3}

Y = X_{2} + X_{1,2} + X_{2,3} +X_{1,2,3}

Z = X_{3} + X_{1,3} + X_{2,3} +X_{1,2,3}

Where all the X’s are themselves independent and Poisson-distributed, and the terms with double (and triple) subscripts control the level of covariance among the Poisson marginal distributions. If you repeat this pattern iteratively, adding one independent term per subset of coordinates, you can increase the dimension of the multivariate Poisson distribution as much as you like (see the sketch below).
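This bookkeeping is easy to mechanize. Here is one possible sketch of the trivariate case, with one independent Poisson term per subset of coordinates (the rates are illustrative values of mine):

```python
# Build a trivariate Poisson vector by adding one independent Poisson
# "shock" per subset of coordinates into every coordinate it touches.
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000

# One rate per subset of {0, 1, 2} (illustrative values).
rates = {(0,): 1.0, (1,): 1.5, (2,): 2.0,
         (0, 1): 0.3, (0, 2): 0.2, (1, 2): 0.4, (0, 1, 2): 0.1}

sample = np.zeros((n, 3), dtype=np.int64)
for subset, lam in rates.items():
    shock = rng.poisson(lam, n)
    for coord in subset:
        sample[:, coord] += shock

# Cov(X, Y) collects every rate whose subset contains both coordinates:
print(np.cov(sample.T)[0, 1])  # ~ 0.3 + 0.1 = 0.4
```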

Multivariate Exponential Distribution

The univariate exponential distribution is also (sort of) closed under convolution. But, in this very specific case, it’s closed under taking weighted minima. For this post, that means that if Y_1, Y_2, Y_3, ..., Y_n are independent, exponential random variables with rates \lambda_1, \lambda_2, \lambda_3, ..., \lambda_n, then min(w_1Y_1, w_2Y_2, w_3Y_3, ..., w_nY_n) is also exponentially-distributed for any weights w_1, w_2, w_3, ..., w_n > 0 (each w_iY_i is exponential with rate \lambda_i/w_i, so the minimum is exponential with rate \sum_{i}\lambda_i/w_i).
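A quick empirical check of this property (the rates and weights below are arbitrary):

```python
# The weighted minimum of independent exponentials is exponential with
# rate sum(lam_i / w_i), since w_i * Y_i is exponential with rate lam_i / w_i.
import numpy as np

rng = np.random.default_rng(7)
lam = np.array([1.0, 2.0, 0.5])  # rates of Y_1, Y_2, Y_3
w = np.array([2.0, 1.0, 3.0])    # positive weights
n = 1_000_000

Y = rng.exponential(1.0 / lam, size=(n, 3))  # Exp(lam_i) draws
M = np.min(w * Y, axis=1)                    # the weighted minimum

rate = np.sum(lam / w)       # theoretical rate of the minimum
print(M.mean(), 1.0 / rate)  # both ~ 0.375
```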

We can start very similarly to the previous case by defining what the bivariate distribution would look like. Here we start with:

X = min(Y_{1}, Y_{1,2})

Y = min(Y_{2}, Y_{1,2})

where Y_1, Y_2, Y_{1,2} are independent, exponentially-distributed random variables with rate parameters \lambda_{1}, \lambda_{2}, \lambda_{1,2} (I SWEAR to God the standard mathematical notation for the parameter of both the exponential and the Poisson distribution is \lambda). Notice that the sums of the Poisson construction are replaced here by minima, which is exactly the operation the exponential distribution is closed under. Because of this, X and Y are also exponentially-distributed with parameters \lambda_{1}+\lambda_{1,2} and \lambda_{2}+\lambda_{1,2} respectively. Since Y_1, Y_2, Y_{1,2} are independent, we have:

P(X>x, Y>y)=P(Y_1>x, Y_{1,2}>x, Y_2>y, Y_{1,2}>y)=

e^{-\lambda_1x}e^{-\lambda_2y}e^{-\lambda_{1,2}max(x,y)}

where the last factor comes from P(Y_{1,2}>x, Y_{1,2}>y)=P(Y_{1,2}>max(x,y)).

And the joint cumulative distribution function of the bivariate vector (X,Y)' follows by inclusion-exclusion, F(x, y)=1-P(X>x)-P(Y>y)+P(X>x, Y>y), which gives:

F(x, y)=1-e^{-(\lambda_{1}+\lambda_{1,2})x}-e^{-(\lambda_{2}+\lambda_{1,2})y}+e^{-\lambda_{1}x-\lambda_{2}y-\lambda_{1,2}max(x,y)}
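To close the loop, here is a simulation sketch of this bivariate exponential (this construction is usually known as the Marshall-Olkin bivariate exponential; the parameter values and evaluation point are mine) that checks the joint survival function above:

```python
# Simulate the common-shock bivariate exponential and compare the
# empirical joint survival probability with the closed-form expression.
import numpy as np

rng = np.random.default_rng(3)
lam1, lam2, lam12 = 1.0, 2.0, 0.5
n = 1_000_000

Y1 = rng.exponential(1 / lam1, n)
Y2 = rng.exponential(1 / lam2, n)
Y12 = rng.exponential(1 / lam12, n)  # the shared shock

X, Y = np.minimum(Y1, Y12), np.minimum(Y2, Y12)

x, y = 0.4, 0.7
emp = np.mean((X > x) & (Y > y))
thy = np.exp(-lam1 * x - lam2 * y - lam12 * max(x, y))
print(emp, thy)  # the two should be close (~0.1165)
```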

Moral of the story

If you know me, you’ll know that I tend to be overly critical of the things I like the most, which is a habit that doesn’t always make me a lot of friends, heh. Because I like copula modelling and I like the idea of non-normal, multivariate structures, I also like to see and understand the cases where defining multivariate structures without a copula may give us insights. For the two cases above, I am exploiting the fact that combining these types of random variables (through sums or minima) results in the same type of random variable which, for better or worse, is a very useful property that not many univariate probability distributions have. Nevertheless, when they *do* have it, it is perhaps wise to use it because, at the end of the day, using copulas to either simulate or model multivariate data does not imply the copula distribution *becomes* the “multivariate” version of that distribution. A Gaussian copula with gamma-distributed marginals is not a “multivariate gamma distribution”. It is… well… a Gaussian copula with gamma-distributed marginals. If you change the copula function for something else (say, an Archimedean copula) the multivariate properties change as well.

Which brings us to a very sobering realization: with the exception of some very select types of multivariate distributions (usually those closed under convolution) we don’t always have well-defined multivariate extensions of univariate distributions. For example, there are more than 10 different ways to define distributions that would satisfy what one would call a “multivariate t distribution”. Whichever characterization one chooses is usually contingent on the intended use for it. I guess this helps highlight one of the oldest pieces of advice I ever received: you know you have become a good methodologist when you realize the only correct answer to every data analysis question is simply “it depends”.