[SciPy-User] peer review of scientific software

Thøger Emil Rivera-Thorsen trive@astro.su...
Sun Jun 2 17:31:34 CDT 2013


On 02-06-2013 22:06, zetah wrote:
> Charles R Harris wrote:
>>> If we speak about errors, I think that most of them, as taught in
>>> numerical analysis courses, are due to the human factor: not understanding
>>> data types, and the variety of data sources representing data differently.
>>> A trivial example is that SQL and netCDF databases represent the same data
>>> in different formats. Similarly for other data sources, which in turn can be
>>> just plain text dumps. If that is handled correctly and the user is familiar
>>> with the tool used, there shouldn't be any surprises.
>>>
>> At least when no one checks ;) The errors that the gods of analysis gift to
>> us are often hidden away and are easy to overlook. They also tend to creep
>> in when one is overconfident. It's all part of the divine sense of humor.
> Probably true. I know this comes with experience, of which I don't have enough.
>
>
>> I confess to my shame that I have never learned to use a spreadsheet for
>> any but the simplest things. It's just so darn complicated ;)
> That's fine; maybe it's just a legacy habit no one wants to break, or a preference for a familiar data manipulation environment.
>
> For myself, even with all that numpy broadcasting magic, I'd spend much more time slicing data in Python than doing it the way I currently prefer, since I'd have to use more abstractions for the same outcome. Viewing the values while calculating feels more natural to me and provides instant "validation", so to say. But if I want real validation, I can set up a validation scenario.
>
> Earlier, my only annoyance with pivoted data was that I couldn't do more than trivial calculations on values in the pivoted view without a programmatic approach. Now that's possible (with DAX), and I can't imagine what else could make data manipulation more intuitive to me.
>
> There are many aspects to this subject, and please do continue if I stepped in too carelessly :)

You may of course be perfectly happy with your current work setup, but
it seems to me that you could do everything you describe without leaving
Python, by using Pandas: pivot tables, slicing and dicing of
heterogeneous data types, indexing by multi-level labels, arbitrary
operations on pivoted, sliced and diced data frames, importing/exporting
CSV, ASCII, HTML and even LaTeX, quick plotting for data inspection
purposes etc. Of course, the interactive element isn't there. On the
other hand, it is very powerful, and you don't have to switch between
several different environments and tools.
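
To give a rough idea, here is a minimal sketch of a pivot followed by a
calculation on the pivoted values. The column names and numbers are made
up purely for illustration, and the keyword names follow current Pandas
releases, so they may differ in older versions.

```python
import pandas as pd

# Made-up measurements, purely for illustration.
df = pd.DataFrame({
    "station": ["A", "A", "B", "B", "A", "B"],
    "year": [2011, 2012, 2011, 2012, 2011, 2012],
    "temp": [10.0, 12.0, 8.0, 9.0, 14.0, 11.0],
})

# Pivot: one row per station, one column per year,
# each cell holding the mean temperature for that pair.
pivoted = df.pivot_table(values="temp", index="station",
                         columns="year", aggfunc="mean")

# Arithmetic on the pivoted view: change from 2011 to 2012 per station.
change = pivoted[2012] - pivoted[2011]

# Multi-level labels: index by (station, year), then select one station.
by_both = df.set_index(["station", "year"]).sort_index()
station_a = by_both.loc["A"]

# Export the pivoted table as CSV text.
csv_text = pivoted.to_csv()

print(pivoted)
print(change)
```

The result of the pivot is itself a data frame, so anything you can do
to the raw data you can also do to the pivoted view.
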
The frames are basically enhanced numpy arrays, so the data can be
passed directly to numpy or matplotlib functions. Also, if working in
the IPython qtconsole or notebook, simply typing the dataframe's name
will show it nicely rendered as an HTML table.
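
As a small sketch of what I mean, assuming a toy frame with made-up
columns "x" and "y": numpy functions accept the columns as they are, and
the HTML the notebook displays can also be produced explicitly.

```python
import numpy as np
import pandas as pd

# Toy data lying exactly on the line y = 2x + 1, for illustration only.
df = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0],
                   "y": [1.0, 3.0, 5.0, 7.0]})

# A numpy ufunc applied to a column returns a labelled Series.
root = np.sqrt(df["y"])

# The underlying data as a plain 2-D numpy array.
arr = df.to_numpy()

# Columns can be handed straight to numpy routines, here a linear fit.
slope, intercept = np.polyfit(df["x"], df["y"], 1)

# The HTML table markup for the frame, as a string.
html = df.to_html()

print(arr.shape, slope, intercept)
```
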
I have definitely enjoyed working with it.

Sorry for going slightly off-topic.

/Emil

>
> Cheers
>


