[SciPy-User] peer review of scientific software
Thøger Emil Rivera-Thorsen
Sun Jun 2 17:31:34 CDT 2013
On 02-06-2013 22:06, zetah wrote:
> Charles R Harris wrote:
>>> If we speak about errors, I think that most of them, as taught in
>>> numerical analysis courses, are due to the human factor: not understanding
>>> data types, and the variety of data sources representing data differently.
>>> A trivial example is that SQL and netCDF databases represent the same data
>>> in different formats. The same goes for other data sources, which in turn
>>> can be just plain text dumps. If that is handled correctly and the user is
>>> familiar with the tool used, there shouldn't be any surprises.
>> At least when no one checks ;) The errors that the gods of analysis gift to
>> us are often hidden away and are easy to overlook. They also tend to creep
>> in when one is overconfident. It's all part of the divine sense of humor.
> Probably true. I know this comes from experience that I have not enough
>> I confess to my shame that I have never learned to use a spreadsheet for
>> any but the simplest things. It's just so darn complicated ;)
> That's fine, maybe it's just a legacy habit no one wants to break, or a preference for a familiar data manipulation environment.
> For myself, even with all that numpy broadcasting magic, I'd spend much more time slicing data in Python than doing it the way I currently prefer, since I'd have to use more abstractions for the same outcome. Viewing the values while calculating feels more natural to me and provides instant "validation", so to speak. But if I want real validation, I can set up a validation scenario.
> Earlier, my only annoyance with pivoted data was that I couldn't do more than trivial calculations on values in the pivoted view without a programmatic approach. Now that's possible (with DAX), and I can't imagine what else could make data manipulation more intuitive to me.
> There are many aspects to this subject, and please do continue if I stepped in too carelessly :)
You may of course be perfectly happy with your current work setup, but
it seems to me like you could do everything you describe without leaving
Python, by using Pandas. Pivot tables, slicing and dicing of
heterogeneous data types, indexing by multi-layer labels, arbitrary
operations on pivoted, sliced and diced data frames, importing/exporting
CSV, ASCII, HTML and even LaTeX, quick plotting for data inspection
purposes etc. Of course, the interactive element isn't there. On the
other hand, it is very powerful, and you don't have to switch between
several different environments and tools.
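
To give a rough idea of the pivoting and "arbitrary operations" part, here is
a minimal sketch. The data, the column names (site, year, value) and the
choice of a mean as the aggregation are all made up for illustration, and it
assumes a reasonably recent Pandas:

```python
import pandas as pd

# Made-up measurements, purely for illustration.
df = pd.DataFrame({
    "site": ["A", "A", "B", "B", "A", "B"],
    "year": [2011, 2012, 2011, 2012, 2011, 2012],
    "value": [1.0, 2.0, 3.0, 4.0, 3.0, 6.0],
})

# Pivot: one row per site, one column per year; duplicate
# (site, year) pairs are averaged.
pivoted = df.pivot_table(index="site", columns="year",
                         values="value", aggfunc="mean")

# Arithmetic directly on the pivoted view: relative change
# from 2011 to 2012 for every site at once.
change = (pivoted[2012] - pivoted[2011]) / pivoted[2011]

# Label-based slicing: all years for site "A".
site_a = pivoted.loc["A"]

print(pivoted)
print(change)
```

The calculation on the pivoted frame is ordinary column arithmetic, so
anything numpy can do elementwise should work the same way.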
The frames are built on top of numpy arrays, so the data can be
passed directly to numpy or matplotlib. Also, if working in the IPython
qtconsole or notebook, simply typing the dataframe's name will show it
nicely rendered as an HTML table.
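
A small sketch of what passing a frame to numpy looks like in practice. The
frame and its column names (x, y) are invented for the example, and the calls
shown are the ones I would expect to work in a reasonably recent Pandas:

```python
import numpy as np
import pandas as pd

# Made-up frame with labelled rows, purely for illustration.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [2.0, 4.0, 6.0]},
                  index=["a", "b", "c"])

# NumPy ufuncs accept the frame directly and keep the labels.
roots = np.sqrt(df)

# The underlying data as a plain 2-D ndarray.
arr = df.values

# Columns can be handed to NumPy routines like any 1-D array.
slope = np.polyfit(df["x"], df["y"], 1)[0]

# Export as text: HTML table markup and CSV.
html = df.to_html()
csv_text = df.to_csv()
```

In the notebook, the same HTML rendering is what you get when the frame is the
last expression in a cell.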
I have definitely enjoyed working with it.
Sorry for going slightly off-topic.