# [SciPy-User] FW: curve fitting by a sum of gaussian with scipy

Charles R Harris charlesr.harris@gmail....
Thu Apr 18 10:16:16 CDT 2013

```
On Thu, Apr 18, 2013 at 9:05 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:

>
>
> On Thu, Apr 18, 2013 at 8:59 AM, Charles R Harris <
> charlesr.harris@gmail.com> wrote:
>
>>
>>
>> On Thu, Apr 18, 2013 at 6:24 AM, Stéphanie haaaaaaaa <
>> flower_des_iles@hotmail.com> wrote:
>>
>>> Dear all,
>>>
>>>
>>> I'm doing bioinformatics and we map small RNAs onto mRNAs. We have the
>>> mapping coordinates of a protein on each mRNA, and we calculate the relative
>>> distance between the place where the protein is bound on the mRNA and the
>>> site that is bound by a small RNA.
>>> I obtain the following dataset:
>>>
>>>
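>>> dist  eff
>>>  -69    3
>>>  -68    2
>>>  -67    1
>>>  -66    1
>>>  -60    1
>>>  -59    1
>>>  -58    1
>>>  -57    2
>>>  -56    1
>>>  -55    1
>>>  -54    1
>>>  -52    1
>>>  -50    2
>>>  -48    3
>>>  -47    1
>>>  -46    3
>>>  -45    1
>>>  -43    1
>>>    0    1
>>>    1    2
>>>    2   12
>>>    3   18
>>>    4   18
>>>    5   13
>>>    6    9
>>>    7    7
>>>    8    5
>>>    9    3
>>>   10    1
>>>   13    2
>>>   14    3
>>>   15    2
>>>   16    2
>>>   17    2
>>>   18    2
>>>   19    2
>>>   20    2
>>>   21    3
>>>   22    1
>>>   24    1
>>>   25    1
>>>   26    1
>>>   28    2
>>>   31    1
>>>   38    1
>>>   40    2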
>>>
>>>
>>> When I plot the data, I see 3 peaks: one at around 3/4, another around
>>> 20, and a last one around -50 (see attached file, upper graph).
>>>
>>> I tried cubic spline interpolation, but it doesn't work very well for my
>>> data (see attached file 2, red curve).
>>> My idea was to do curve fitting with a sum of Gaussians. For example, in
>>> my case, estimate 3 Gaussian curves around the peaks (at points 5, 20 and -50).
>>> How can I do so?
>>> I looked at scipy.optimize.curve_fit(), but how can I fit the curve over a
>>> specific interval? How can I add the curves together to get one single curve?
>>>
>>>
>> That's interesting. On thinking about it, I think that if you used the design
>> matrix for, say, fitting a uniform spline with fairly closely spaced sample
>> points, it would be pretty singular, which would be a good thing
>> because the pseudo-inverse would minimize the sum of squares of the
>> coefficients, which in turn would knock down the curve where there is no
>> data. Mind, I'm just speculating here; I haven't tried it. Is the data you
>> posted complete?
>>
>
> And thinking some more, always a bad sign here, this looks like a
> histogram, but you have left out all the distance data points that had zero
> matches. I think you need to keep them in.
>
>
And as a histogram, kernel density estimation
<http://en.wikipedia.org/wiki/Kernel_density_estimation> might
be a good way to go. The statsmodels folks should have something for
that.

Chuck
```
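The suggestions in the thread can be sketched together in a few lines. This is only an illustration under stated assumptions, not a recommended analysis. The `(dist, eff)` pairs are one reading of the flattened table in the question. The initial guesses, the parameter bounds, and the KDE bandwidth are hand-picked guesses based on the peak positions mentioned in the question (around -50, 3/4 and 20). The helper name `three_gaussians` is hypothetical. The sketch first restores the zero-count distances, then fits a sum of three Gaussians with `scipy.optimize.curve_fit`, and finally computes a weighted kernel density estimate with `scipy.stats.gaussian_kde` (the `weights` argument needs SciPy 1.2 or later).

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import gaussian_kde

# (dist, eff) pairs as read from the question's table (assumed column split).
dist = np.array([-69, -68, -67, -66, -60, -59, -58, -57, -56, -55, -54, -52,
                 -50, -48, -47, -46, -45, -43, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
                 10, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26, 28,
                 31, 38, 40])
eff = np.array([3, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,
                2, 3, 1, 3, 1, 1, 1, 2, 12, 18, 18, 13, 9, 7, 5, 3,
                1, 2, 3, 2, 2, 2, 2, 2, 2, 3, 1, 1, 1, 1, 2,
                1, 1, 2], dtype=float)

# Keep the distances that had zero matches: build a full integer grid
# and fill in counts only where the dataset reports them.
x = np.arange(dist.min(), dist.max() + 1)
y = np.zeros(x.size)
y[dist - dist.min()] = eff


def three_gaussians(x, a1, m1, s1, a2, m2, s2, a3, m3, s3):
    """Sum of three Gaussians with amplitude a, mean m and width s each."""
    total = np.zeros_like(x, dtype=float)
    for a, m, s in ((a1, m1, s1), (a2, m2, s2), (a3, m3, s3)):
        total += a * np.exp(-0.5 * ((x - m) / s) ** 2)
    return total


# Initial guesses and bounds are assumptions taken from the peak
# positions described in the question; the bounds keep each component
# near "its" peak, so no per-interval fitting is needed.
p0 = [3, -50, 8, 18, 4, 2, 3, 20, 5]
lower = [0, -70, 0.5, 0, -5, 0.5, 0, 10, 0.5]
upper = [np.inf, -40, 30, np.inf, 10, 30, np.inf, 40, 30]
popt, pcov = curve_fit(three_gaussians, x, y, p0=p0,
                       bounds=(lower, upper), max_nfev=5000)
fitted = three_gaussians(x, *popt)  # one single curve over the whole range

# Kernel density estimate, weighting each distance by its count.
# bw_method=0.15 is an arbitrary choice to keep the three peaks visible.
kde = gaussian_kde(dist.astype(float), weights=eff, bw_method=0.15)
density = kde(x.astype(float))

print("fitted (amplitude, mean, width) triples:")
print(popt.reshape(3, 3))
```

Because the model function already returns the sum of the three components, the fit produces one curve directly. The individual components can be recovered by evaluating a single Gaussian with each row of `popt.reshape(3, 3)`. The KDE curve is a probability density, so it would need rescaling by the total count before it can be compared with the raw counts on the same axis.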