[SciPy-dev] GLMs ?

Bruce Southey bsouthey@gmail....
Sat Aug 15 08:19:04 CDT 2009


On Sat, Aug 15, 2009 at 8:12 AM, <josef.pktd@gmail.com> wrote:
> On Sat, Aug 15, 2009 at 7:35 AM, <josef.pktd@gmail.com> wrote:
>> On Sat, Aug 15, 2009 at 3:32 AM, Pierre GM<pgmdevlist@gmail.com> wrote:
>>>
>>> On Aug 15, 2009, at 3:00 AM, David Warde-Farley wrote:
>>>
>>>>
>>>> On 14-Aug-09, at 7:29 PM, josef.pktd@gmail.com wrote:
>>>>
>>>>>> Fab'.
>>>>>> FYI, I need to fit Tweedie distributions to precipitation series. I
>>>>>> have already coded the distributions in the scipy standard, and
>>>>>> now I
>>>>>> need to estimate the parameters...
>>>>>> Thanks again
>>>>
>>>> As I understand it, the Tweedie distributions are a further
>>>> generalization of the exponential family.
>>>
>>> Indeed.
>>>
>>>> Are you saying that your
>>>> parametric assumption is that they are Tweedie but not any of the
>>>> standard ones like Gaussian, Poisson, Gamma?
>>>
>>> Yes, something intermediate between Poisson and Gamma, with a variance
>>> proportional to the mean to a power 1<=p<=2.
>>>
>>>>> Are you trying to estimate parameters of the distribution themselves,
>>>>> or parameters of the distribution as function of some explanatory
>>>>> variables? In the first case, GLM won't be of much help.
>>>>
>>>> Is it that you have samples of a (nonstandard) Tweedie random variable
>>>> that you want to regress on explanatory variables?
>>>> You can probably do it by gradient descent but I don't foresee it
>>>> being pretty and probably not even convex. Either way, a GLM package
>>>> probably won't  help.
>>>
>>> I'm not sure yet whether GLMs are the way to go to my particular
>>> problem. I'm trying to reproduce an approach to model precipitation
>>> patterns (keeping track of both the number and intensities of rainfall
>>> events) described in several papers. I know that at term, I'll have to
>>> introduce extra variables and then GLMs will be the way to go. I just
>>> wanted to check what algorithms were already available.
>>> Thanks a lot for your comments.
>>
>> Using models.GLM could be as easy as adding a new distribution to the
>> family. The main algorithm is (supposed to be) independent of the
>> distribution, and all distribution specific code is supposed to be in
>> family.
>>
>> If Tweedie is like Poisson and Gamma, mainly with a different variance
>> function, then I think it *should* work with very little work.
>>
>> If you try this, then this would be a good check for how general our
>> implementation is, and whether there are still some hidden,
>> distribution specific assumptions left.
>>
>> And it will be good if we soon have more eyes on the models code,
>> because I don't think we have settled on a good API yet.
>>
>> Josef
>>
>
> I should have done a bit of homework first.
>
> The tweedie family of distributions looks very interesting, and it
> should fit in both glm and maximum likelihood framework. R/S has
> tweedie in GLM and ML in fbasics.
>
> So I would be very interested in seeing it both in models.glm and in
> scipy.stats.distributions. However, in the short term, I see two
> potential issues
>
> Wikipedia: "Apart from the four special cases identified above, their
> probability density function have no closed form. However, software is
> available that enables the accurate computation of the Tweedie
> densities (and probability distribution functions)"
>
> Currently the models code is in pure python, which makes distribution
> as a standalone package much easier, until the dust has settled, and
> models is reintegrated into scipy. Do you have the numerical
> calculations in python or compile, fortran,C? I didn't find tweedie in
> hydroclimpy.
>
> Wikipedia: "For 1 < p < 2, the distribution is continuous on the
> positive reals, plus an added mass (exact zero) at Y = 0"
>
> The generic framework for the distributions in stats.distributions
> doesn't handle, currently, distributions that have continuous and
> discrete support (masspoints). In some cases, this can be extended by
> delegation, but we could think about to handle the mixed case.
>
> Josef
>

Hi,
Technically generalized linear models only apply to a exponential
family of distributions. However, it does work for some other like
negative binomial so you have quazi-likelihoods. Typically you require
the link and inverse link functions to do it and nonlinear modeling or
iteratively reweighted least squares to solve. See Wikipedia:
http://en.wikipedia.org/wiki/Generalized_linear_model

While I have not used the Tweedie formulation the wikipedia page
http://en.wikipedia.org/wiki/Tweedie_distributions notes that these do
not have closed form. Technically this is a problem but there is an R
package:
http://cran.r-project.org/web/packages/tweedie/index.html

Given the relationships on the wikipedia, I would tend to favor using
the special cases first and see what support you have for these.

Bruce


More information about the Scipy-dev mailing list