[SciPy-user] [half OT?] best way to store a spectrum

Bruce Southey bsouthey@gmail....
Wed May 27 15:44:50 CDT 2009


Davide Cittaro wrote:
>
> On May 27, 2009, at 7:02 PM, Bruce Southey wrote:
>> Hi,
>>
>> Can you please be more specific?
>>
>
> You're right :-)
>
>> Exactly what do you mean by 'analysis'?
>
> These spectra come from proteomic experiments in which peptides are 
> fragmented into ion series, which should be matched with theoretical 
> spectra (predicted from the peptide amino acid sequence) to identify 
> the sequences themselves...
>
>> Do you actually use the intensity values or only those values above a
>> set threshold?
>
> :-) Not in a first attempt, I think; theoretical spectra are difficult 
> to model in terms of intensities. I may use intensity values to keep 
> only the most intense peaks.
>
>> What do you really mean by a 'bunch of spectra'?
>
> Usually thousands or tens of thousands...
>
>> Does each experimental spectrum have a unique corresponding theoretical
>> spectrum?
>
> No
>
>> Do you compare the 'bunch of spectra' to a single theoretical spectrum?
>> Do you compare the 'bunch of spectra' to a bunch of theoretical 
>> spectra?
>
> I have to find which "theoretical" spectrum best matches an experimental 
> one (or vice versa)...
>
>> What exactly do you mean by 'match'?
>>
>
> LOL! Sorry if I laugh... scoring a match is a whole other story :-)
>
>> To be efficient, you probably want to:
>> 1) Vectorize the operations to avoid looping over each spectrum. A
>> single large array may help.
>> 2) Find a suitable approach for your analysis, as there may be more than
>> one approach. In particular, getting as many of the calculations as
>> possible into LAPACK functions rather than pure Python should be faster.
>> 3) Try factoring out constants.
>
> Thanks
>
> d
Hi,
I do have some very basic understanding of the problem.

Without knowing the approach(es) that you are using, there is not a lot 
to add. Basically, you need to store the spectra in a way that lets you 
quickly access them in the desired format. For example, most approaches 
filter first on the overall 'protein' mass, so it may be important to 
quickly retrieve spectra based on a range of masses rather than going 
through each spectrum one by one.
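A minimal NumPy sketch of that storage idea, assuming each spectrum is reduced to a precursor mass plus a list of fragment m/z values (intensities ignored, as Davide suggests for a first attempt). The toy data, the 0.5 tolerance and the function name score_mass_window are hypothetical, and the score is a plain matched-peak count, not a real identification score. Spectra are sorted by precursor mass and all their peaks are concatenated into one large array, so a mass window becomes a single contiguous slice found with np.searchsorted, and every peak in that window is compared against the theoretical spectrum in one vectorized pass instead of a loop over spectra.

```python
import numpy as np

# Hypothetical toy data: one precursor mass per experimental spectrum and
# its fragment m/z values (intensities ignored for this sketch).
precursor_mass = np.array([1200.6, 850.4, 1500.8, 1201.1, 990.5])
peak_lists = [
    np.array([175.1, 262.1, 375.2, 488.3]),
    np.array([147.1, 244.2]),
    np.array([175.1, 303.2, 416.3, 529.3, 642.4]),
    np.array([175.1, 262.1, 380.0]),
    np.array([120.1, 233.2, 346.2]),
]

# Sort spectra by precursor mass and store all peaks in one large array;
# the spectrum at sorted position k is all_peaks[offsets[k]:offsets[k + 1]].
order = np.argsort(precursor_mass)
sorted_mass = precursor_mass[order]
all_peaks = np.concatenate([peak_lists[i] for i in order])
offsets = np.concatenate(([0], np.cumsum([len(peak_lists[i]) for i in order])))


def score_mass_window(theoretical, lo, hi, tol=0.5):
    """Count peaks within tol of a theoretical peak for every spectrum
    whose precursor mass lies in [lo, hi].

    Returns (original spectrum indices, matched-peak counts).
    Assumes the theoretical spectrum has at least two peaks.
    """
    # Binary search on the sorted masses instead of scanning every spectrum.
    start = np.searchsorted(sorted_mass, lo, side="left")
    stop = np.searchsorted(sorted_mass, hi, side="right")
    if start == stop:
        return order[start:stop], np.zeros(0, dtype=int)

    # The candidates' peaks form one contiguous slice of the big array.
    peaks = all_peaks[offsets[start]:offsets[stop]]
    theo = np.sort(np.asarray(theoretical, dtype=float))  # sorted once

    # Distance from every observed peak to its nearest theoretical peak.
    idx = np.clip(np.searchsorted(theo, peaks), 1, len(theo) - 1)
    nearest = np.minimum(np.abs(peaks - theo[idx - 1]), np.abs(peaks - theo[idx]))
    matched = (nearest <= tol).astype(float)

    # Sum the matches per spectrum by labelling each peak with its spectrum.
    lengths = np.diff(offsets[start:stop + 1])
    labels = np.repeat(np.arange(stop - start), lengths)
    counts = np.bincount(labels, weights=matched, minlength=stop - start)
    return order[start:stop], counts.astype(int)


ids, counts = score_mass_window([175.1, 262.1, 375.2, 488.3, 601.4], 1199.0, 1203.0)
print(ids, counts)  # [0 3] [4 2]
```

Sorting the theoretical spectrum once per query is the kind of constant work worth factoring out of the comparison; a real search would swap the plain peak count for a proper scoring function.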

As Gary suggests, hdf5/PyTables may be beneficial.
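A hedged sketch of the same layout in HDF5 through PyTables, assuming PyTables 3.x: one table row per spectrum holding its precursor mass, with an index on that column for range queries, and a variable-length array holding each spectrum's peaks. The file name, the in-memory driver (used here only so nothing is written to disk) and the names SpectrumRow, build_store and spectra_in_mass_range are illustrative, not anything from this thread.

```python
import numpy as np
import tables


class SpectrumRow(tables.IsDescription):
    """One row per experimental spectrum (hypothetical layout)."""
    spectrum_id = tables.Int64Col(pos=0)
    precursor_mass = tables.Float64Col(pos=1)


def build_store(precursor_mass, peak_lists, path="spectra.h5"):
    # In-memory HDF5 file; drop the driver arguments to write a real file.
    h5 = tables.open_file(path, mode="w", driver="H5FD_CORE",
                          driver_core_backing_store=0)
    table = h5.create_table("/", "spectra", SpectrumRow)
    peaks = h5.create_vlarray("/", "peaks", tables.Float64Atom())
    row = table.row
    for i, (mass, mz) in enumerate(zip(precursor_mass, peak_lists)):
        row["spectrum_id"] = i
        row["precursor_mass"] = mass
        row.append()
        peaks.append(np.asarray(mz, dtype=np.float64))  # row i of the VLArray
    table.flush()
    # An index lets range queries on the mass avoid a full table scan.
    table.cols.precursor_mass.create_index()
    return h5


def spectra_in_mass_range(h5, lo, hi):
    """Return {spectrum_id: peak array} for lo <= precursor_mass <= hi."""
    table = h5.root.spectra
    hits = table.where("(precursor_mass >= lo) & (precursor_mass <= hi)",
                       condvars={"lo": lo, "hi": hi})
    # Read the field while iterating; Row objects are reused by PyTables.
    ids = [int(r["spectrum_id"]) for r in hits]
    return {i: h5.root.peaks[i] for i in ids}


h5 = build_store([1200.6, 850.4, 1201.1], [[175.1, 262.1], [147.1], [175.1]])
print(sorted(spectra_in_mass_range(h5, 1199.0, 1203.0)))  # [0, 2]
h5.close()
```

Keeping the masses in an indexed table and the peaks in a separate variable-length array means the mass filter only touches the small table, and peak data is read just for the spectra that pass it.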

Bruce

