[SciPy-User] FW: curve fitting by a sum of gaussian with scipy
Charles R Harris
Thu Apr 18 10:16:16 CDT 2013
On Thu, Apr 18, 2013 at 9:05 AM, Charles R Harris <firstname.lastname@example.org> wrote:
> On Thu, Apr 18, 2013 at 8:59 AM, Charles R Harris <
> email@example.com> wrote:
>> On Thu, Apr 18, 2013 at 6:24 AM, Stéphanie haaaaaaaa <
>> firstname.lastname@example.org> wrote:
>>> Dear all,
>>> I'm doing bioinformatics and we map small RNAs onto mRNAs. We have the
>>> mapping coordinate of a protein on each mRNA and we calculate the relative
>>> distance between the place where the protein is bound on the mRNA and the
>>> site that is bound by a small RNA.
>>> I obtained the following dataset:
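>>> dist  eff
>>>  -69    3
>>>  -68    2
>>>  -67    1
>>>  -66    1
>>>  -60    1
>>>  -59    1
>>>  -58    1
>>>  -57    2
>>>  -56    1
>>>  -55    1
>>>  -54    1
>>>  -52    1
>>>  -50    2
>>>  -48    3
>>>  -47    1
>>>  -46    3
>>>  -45    1
>>>  -43    1
>>>    0    1
>>>    1    2
>>>    2   12
>>>    3   18
>>>    4   18
>>>    5   13
>>>    6    9
>>>    7    7
>>>    8    5
>>>    9    3
>>>   10    1
>>>   13    2
>>>   14    3
>>>   15    2
>>>   16    2
>>>   17    2
>>>   18    2
>>>   19    2
>>>   20    2
>>>   21    3
>>>   22    1
>>>   24    1
>>>   25    1
>>>   26    1
>>>   28    2
>>>   31    1
>>>   38    1
>>>   40    2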
>>> When I plot the data, I see 3 peaks: one at around 3 to 4, another one
>>> around 20 and a last one around -50 (see attached file, upper graph).
>>> I tried cubic spline interpolation, but it doesn't work very well for my
>>> data (see attached file 2, red curve).
>>> My idea was to do curve fitting with a sum of Gaussians. For example, in
>>> my case, estimate 3 Gaussian curves around the peaks (at points 5, 20 and
>>> -50). How can I do so?
>>> I looked at scipy.optimize.curve_fit(), but how can I fit each curve over
>>> a precise interval? How can I add the curves together to get one single
>>> curve?
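A minimal sketch of the sum-of-Gaussians fit with scipy.optimize.curve_fit, assuming the dataset above is read as (dist, eff) pairs. It fills the missing distances with zero counts, as recommended later in this thread. The "precise interval" for each peak is expressed through curve_fit's bounds argument, which confines each centre to a range; those ranges and the starting values in p0 are illustrative guesses based on the described peaks, not values from the thread. Adding the curves together is just a matter of returning the sum of the three components from the model function.

```python
import numpy as np
from scipy.optimize import curve_fit

# (dist, eff) pairs read from the dataset above.
dist = np.array([-69, -68, -67, -66, -60, -59, -58, -57, -56, -55, -54, -52,
                 -50, -48, -47, -46, -45, -43, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
                 10, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26, 28,
                 31, 38, 40])
eff = np.array([3, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,
                2, 3, 1, 3, 1, 1, 1, 2, 12, 18, 18, 13, 9, 7, 5, 3,
                1, 2, 3, 2, 2, 2, 2, 2, 2, 3, 1, 1, 1, 1, 2,
                1, 1, 2])

# Keep the empty distances: one point per integer distance, zero where
# nothing was observed.
x = np.arange(dist.min(), dist.max() + 1)
y = np.zeros(x.size)
y[dist - dist.min()] = eff


def gaussian(x, amp, mu, sigma):
    return amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)


def three_gaussians(x, a1, m1, s1, a2, m2, s2, a3, m3, s3):
    # The single fitted curve is just the sum of the three components.
    return (gaussian(x, a1, m1, s1) + gaussian(x, a2, m2, s2)
            + gaussian(x, a3, m3, s3))


# Starting values (amplitude, centre, width) per peak: assumed, read off the plot.
p0 = [18, 4, 2, 2, 20, 5, 2, -50, 8]
# Bounds keep each centre inside its own interval and keep amplitudes and
# widths positive; the interval limits are illustrative assumptions.
lower = [0, -5, 0.5, 0, 10, 0.5, 0, -70, 0.5]
upper = [50, 12, 30, 50, 35, 30, 50, -40, 30]
popt, pcov = curve_fit(three_gaussians, x, y, p0=p0, bounds=(lower, upper))

# Evaluate the summed model on a fine grid to draw one smooth curve.
fine_x = np.linspace(x.min(), x.max(), 500)
fitted = three_gaussians(fine_x, *popt)
print("fitted centres:", popt[1], popt[4], popt[7])
```

Plotting fine_x against fitted gives the single combined curve; each component can be drawn on its own by calling gaussian with the corresponding three entries of popt.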
>> That's interesting. On thinking about it, I think if you used the design
>> matrix for, say, fitting a uniform spline with fairly closely spaced sample
>> points, it would be pretty singular, which would be a good thing
>> because the pseudo-inverse would minimize the sum of squares of the
>> coefficients, which in turn would knock down the curve where there is no
>> data. Mind, I'm just speculating here, haven't tried it. Is the data you
>> posted complete?
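The pseudo-inverse idea can be sketched as follows, under assumptions that are not from the thread: degree-1 (hat function) B-splines stand in for the "uniform spline", the knot spacing of 1.5 is arbitrary, and the sparse data is used as posted. Because the basis has more functions than there are data points, the design matrix is rank deficient, and numpy.linalg.pinv returns the least-squares solution with the smallest sum of squared coefficients. Basis functions that touch no data get a zero coefficient, so the curve drops to zero in the gaps.

```python
import numpy as np

# Sparse data as posted (zero-count distances left out).
dist = np.array([-69, -68, -67, -66, -60, -59, -58, -57, -56, -55, -54, -52,
                 -50, -48, -47, -46, -45, -43, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
                 10, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26, 28,
                 31, 38, 40], dtype=float)
eff = np.array([3, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,
                2, 3, 1, 3, 1, 1, 1, 2, 12, 18, 18, 13, 9, 7, 5, 3,
                1, 2, 3, 2, 2, 2, 2, 2, 2, 3, 1, 1, 1, 1, 2,
                1, 1, 2], dtype=float)

spacing = 1.5                           # assumed "closely spaced" knot distance
centres = np.arange(-70, 42, spacing)   # 75 uniformly spaced basis centres


def hat_basis(x, centres, h):
    # Degree-1 uniform B-splines ("hat" functions): one column per centre.
    return np.clip(1.0 - np.abs(x[:, None] - centres[None, :]) / h, 0.0, None)


A = hat_basis(dist, centres, spacing)   # 46 x 75 design matrix
print(A.shape, np.linalg.matrix_rank(A))

# Minimum-norm least-squares coefficients via the pseudo-inverse.
coef = np.linalg.pinv(A) @ eff

grid = np.linspace(-69, 40, 1000)
curve = hat_basis(grid, centres, spacing) @ coef
```

The printed rank is necessarily smaller than the 75 columns, which is the singularity Charles describes; the curve between -43 and 0, where there is no data, comes out as zero.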
> And thinking some more, always a bad sign here, this looks like a
> histogram, but you have left out all the distance data points that had zero
> matches. I think you need to keep them in.
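Keeping the empty distances is straightforward with NumPy. A minimal sketch, assuming one bin per integer distance between the smallest and largest observed values (the bin width of 1 is an assumption, not something stated in the thread):

```python
import numpy as np

dist = np.array([-69, -68, -67, -66, -60, -59, -58, -57, -56, -55, -54, -52,
                 -50, -48, -47, -46, -45, -43, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
                 10, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26, 28,
                 31, 38, 40])
eff = np.array([3, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,
                2, 3, 1, 3, 1, 1, 1, 2, 12, 18, 18, 13, 9, 7, 5, 3,
                1, 2, 3, 2, 2, 2, 2, 2, 2, 3, 1, 1, 1, 1, 2,
                1, 1, 2])

# Shift distances so the smallest maps to bin 0, then sum the counts per bin;
# bins with no observations stay at zero instead of disappearing.
offset = dist.min()
counts = np.bincount(dist - offset, weights=eff).astype(int)
all_dist = np.arange(offset, offset + counts.size)
print(f"{counts.size} bins, {np.count_nonzero(counts == 0)} of them empty")
# prints: 110 bins, 64 of them empty
```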
And as a histogram, kernel density estimation might be a good way to go.
The statsmodels folks should have something for that.
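A minimal kernel density sketch. It swaps in scipy.stats.gaussian_kde rather than a statsmodels routine, and it assumes each count in the dataset is one observed distance, so the histogram is expanded back into individual observations with numpy.repeat. The bandwidth factor of 0.1 is an arbitrary assumption: larger values smooth the peaks together, smaller ones follow the individual counts more closely.

```python
import numpy as np
from scipy.stats import gaussian_kde

dist = np.array([-69, -68, -67, -66, -60, -59, -58, -57, -56, -55, -54, -52,
                 -50, -48, -47, -46, -45, -43, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
                 10, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26, 28,
                 31, 38, 40])
eff = np.array([3, 2, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1,
                2, 3, 1, 3, 1, 1, 1, 2, 12, 18, 18, 13, 9, 7, 5, 3,
                1, 2, 3, 2, 2, 2, 2, 2, 2, 3, 1, 1, 1, 1, 2,
                1, 1, 2])

# Assumption: each count is one observed distance, so rebuild the raw sample.
samples = np.repeat(dist, eff)

# A scalar bw_method is used directly as the bandwidth factor (assumed value).
kde = gaussian_kde(samples, bw_method=0.1)
grid = np.linspace(-80, 50, 521)
density = kde(grid)

# Scale back to expected counts per unit distance for comparison with the histogram.
expected_counts = density * samples.size
peak = grid[np.argmax(density)]
print(f"highest density near dist = {peak:.1f}")
```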