[SciPy-User] SciPy-User Digest, Vol 118, Issue 4

Nadav Horesh nadavh@visionsense....
Mon Jun 3 00:26:22 CDT 2013


For me, an important use case is file transfer over the VPN. Is there any way to test it?

   Nadav.
________________________________________
From: scipy-user-bounces@scipy.org on behalf of scipy-user-request@scipy.org
Sent: 03 June 2013 01:26
To: scipy-user@scipy.org
Subject: SciPy-User Digest, Vol 118, Issue 4

Send SciPy-User mailing list submissions to
        scipy-user@scipy.org

To subscribe or unsubscribe via the World Wide Web, visit
        http://mail.scipy.org/mailman/listinfo/scipy-user
or, via email, send a message with subject or body 'help' to
        scipy-user-request@scipy.org

You can reach the person managing the list at
        scipy-user-owner@scipy.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of SciPy-User digest..."


Today's Topics:

   1. Re: peer review of scientific software (zetah)
   2. Re: peer review of scientific software (Charles R Harris)
   3. Re: peer review of scientific software (Matthew Brett)
   4. Re: peer review of scientific software (zetah)
   5. Re: peer review of scientific software
      (Thøger Emil Rivera-Thorsen)


----------------------------------------------------------------------

Message: 1
Date: Sun, 02 Jun 2013 20:00:54 +0200
From: "zetah" <otrov@hush.ai>
Subject: Re: [SciPy-User] peer review of scientific software
To: "SciPy Users List" <scipy-user@scipy.org>
Message-ID: <20130602180055.5CC67A6E40@smtp.hushmail.com>
Content-Type: text/plain; charset="UTF-8"

Thomas Kluyver wrote:
>'type of users' might have been a more accurate phrase, but it has an
>unfortunate negative ring that I wanted to avoid. There are a lot of people
>doing important data analysis in quite risky and hard-to-maintain ways.
>Using spreadsheets where some simple code might be more reliable is one
>symptom of that, and there have been a couple of major examples from
>economics where spreadsheet errors led to serious mistakes.
>The discussion is revolving roughly around whether and how we can push
>those users towards better tools and methods, like coding, version control
>and testing.

Thanks for the overview, Thomas. I read all the emails on the subject and will comment briefly, for the sake of participation, although the topic is huge.

I don't have experience with critical modeling, but I do data analysis with historical data, and I am still learning.

If we speak about errors, I think most of them, as taught in a numerical analysis course, come down to human factors: not understanding data types, and the variety of data sources representing the same data differently. A trivial example is that SQL and netCDF databases store the same data in different formats; other sources may be just plain text dumps. If that is handled correctly and the user is familiar with the tool, there shouldn't be any surprises.
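
To make that concrete, here is a minimal sketch of such a cross-source
check; the file, table, variable and column names are all hypothetical:

    import sqlite3

    import numpy as np
    from scipy.io import netcdf_file

    # Hypothetical netCDF source: values may be packed (e.g. int16 plus a
    # scale_factor attribute).
    nc = netcdf_file("station.nc", "r")
    raw = nc.variables["temp"]
    nc_temp = raw[:] * getattr(raw, "scale_factor", 1.0)

    # Hypothetical SQL source: the same series stored as REAL (float64).
    conn = sqlite3.connect("station.db")
    rows = conn.execute("SELECT temp FROM readings ORDER BY t").fetchall()
    sql_temp = np.array([r[0] for r in rows])

    # The two sources represent the data differently, so the comparison
    # has to be made explicit.
    print(nc_temp.dtype, sql_temp.dtype)
    assert np.allclose(np.asarray(nc_temp, dtype=np.float64), sql_temp)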

If it is of any interest, I thought I would generalize my usual workflow as a single-user example (I hope it's not useless):
 - collecting data: if the data is not directly available I use Python, and depending on the source I do validation. I don't change the format if it's not necessary.
 - pre-processing: if I preprocess (usually with Python), I store the data to an SQL server.
 - using data: a single set or multiple datasets in PowerPivot (limited only by the amount of RAM), where DAX allows calculations on pivoted-view values. I haven't yet found any other tool that allows such diverse views in such a short time.
 - post-processing: when needed I export results to CSV, usually just to load into a numpy array and plot with Matplotlib, or for 3D viewing in VisIt or Gephi (a minimal sketch of this step follows below).
 - versioning: the data in the source database(s) stays intact, and all calculations can be saved to a file (with values) and then opened again even if the data source is not available.

So I use Excel mainly for data manipulation, with Python back and forth, plus additional tools for 3D visualization.
I never liked learning versioning systems, and I'm happy with my current scheme.
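
As a concrete sketch of the post-processing step above (the file name and
column layout are made up):

    import numpy as np
    import matplotlib.pyplot as plt

    # Hypothetical CSV exported from PowerPivot: a header row, then
    # time,value pairs.
    data = np.loadtxt("results.csv", delimiter=",", skiprows=1)
    t, y = data[:, 0], data[:, 1]

    plt.plot(t, y, label="pivoted result")
    plt.xlabel("time")
    plt.ylabel("value")
    plt.legend()
    plt.savefig("results.png")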



------------------------------

Message: 2
Date: Sun, 2 Jun 2013 12:38:09 -0600
From: Charles R Harris <charlesr.harris@gmail.com>
Subject: Re: [SciPy-User] peer review of scientific software
To: SciPy Users List <scipy-user@scipy.org>
Message-ID:
        <CAB6mnx+R_4Tk3FCsQKQ3nMeSF6fYMYBpVPasLEuN8As2YipEjQ@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

On Sun, Jun 2, 2013 at 12:00 PM, zetah <otrov@hush.ai> wrote:

> [snip]
>
> If we speak about errors, I think most of them, as taught in a numerical
> analysis course, come down to human factors: not understanding data types,
> and the variety of data sources representing the same data differently. A
> trivial example is that SQL and netCDF databases store the same data in
> different formats; other sources may be just plain text dumps. If that is
> handled correctly and the user is familiar with the tool, there shouldn't
> be any surprises.
>

At least when no one checks ;) The errors that the gods of analysis gift to
us are often hidden away and easy to overlook. They also tend to creep in
when one is overconfident. It's all part of the divine sense of humor.


> [snip]
>
> So I use Excel mainly for data manipulation, with Python back and forth,
> plus additional tools for 3D visualization. I never liked learning
> versioning systems, and I'm happy with my current scheme.
>

I confess to my shame that I have never learned to use a spreadsheet for
any but the simplest things. It's just so darn complicated ;)

Chuck

------------------------------

Message: 3
Date: Sun, 2 Jun 2013 12:51:00 -0700
From: Matthew Brett <matthew.brett@gmail.com>
Subject: Re: [SciPy-User] peer review of scientific software
To: SciPy Users List <scipy-user@scipy.org>
Message-ID:
        <CAH6Pt5oYPSakKp-e8KV5Bi-uRE11GOM-qtHsGVVj9tkE5RWtgg@mail.gmail.com>
Content-Type: text/plain; charset=ISO-8859-1

Hi,

On Sun, Jun 2, 2013 at 11:38 AM, Charles R Harris
<charlesr.harris@gmail.com> wrote:
>
> On Sun, Jun 2, 2013 at 12:00 PM, zetah <otrov@hush.ai> wrote:
>> [snip]
>
> At least when no one checks ;) The errors that the gods of analysis gift to
> us are often hidden away and easy to overlook. They also tend to creep in
> when one is overconfident. It's all part of the divine sense of humor.

Yes - when no-one checks!

I wish I still shared the feeling that mostly when I do stuff it's
correct, or mostly correct, or correct enough. It was only when I
started checking that I started to worry. I well remember the happier
times when I'd write a 100-line analysis script with no tests and be
"pretty sure" that it was correct.

Cheers,

Matthew


------------------------------

Message: 4
Date: Sun, 02 Jun 2013 22:06:01 +0200
From: "zetah" <otrov@hush.ai>
Subject: Re: [SciPy-User] peer review of scientific software
To: "SciPy Users List" <scipy-user@scipy.org>
Message-ID: <20130602200602.49733A6E42@smtp.hushmail.com>
Content-Type: text/plain; charset="UTF-8"

Charles R Harris wrote:
>> [snip]
>
> At least when no one checks ;) The errors that the gods of analysis gift to
> us are often hidden away and easy to overlook. They also tend to creep in
> when one is overconfident. It's all part of the divine sense of humor.

Probably true. I know this comes from experience, which I don't yet have enough of.


>I confess to my shame that I have never learned to use a spreadsheet for
>any but the simplest things. It's just so darn complicated ;)

That's fine; maybe it's just a legacy habit no one wants to break, or a preference for a familiar data-manipulation environment.

For myself, even with all the numpy broadcasting magic, I'd spend much more time slicing data in Python than doing it the way I currently prefer, because I'd have to use more abstractions for the same outcome. Seeing the values while I calculate feels more natural to me and gives instant "validation", so to speak. And if I want real validation, I can build a validation scenario.

Earlier, my only annoyance with pivoted data was that I couldn't do more than trivial calculations on values in a pivoted view without a programmatic approach. Now that's possible (with DAX), and I can't imagine what else could make data manipulation more intuitive for me.

There are many aspects to this subject, so please do continue if I stepped in too carelessly :)

Cheers



------------------------------

Message: 5
Date: Mon, 03 Jun 2013 00:31:34 +0200
From: Thøger Emil Rivera-Thorsen <trive@astro.su.se>
Subject: Re: [SciPy-User] peer review of scientific software
To: SciPy Users List <scipy-user@scipy.org>
Message-ID: <51ABC7C6.3080506@astro.su.se>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 02-06-2013 22:06, zetah wrote:
> [snip]
>
> Earlier, my only annoyance with pivoted data was that I couldn't do more
> than trivial calculations on values in a pivoted view without a
> programmatic approach. Now that's possible (with DAX), and I can't imagine
> what else could make data manipulation more intuitive for me.
>
> There are many aspects to this subject, so please do continue if I stepped
> in too carelessly :)

You may of course be perfectly happy with your current setup, but it
seems to me that you could do everything you describe without leaving
Python, by using Pandas: pivot tables, slicing and dicing of
heterogeneous data types, indexing by multi-level labels, arbitrary
operations on pivoted, sliced and diced data frames, importing/exporting
CSV, ASCII, HTML and even LaTeX, and quick plotting for data inspection.
The interactive element isn't there, of course; on the other hand, it is
very powerful, and you don't have to switch between several different
environments and tools.
The frames are basically enhanced numpy arrays, so the data can be
passed directly to numpy or matplotlib. Also, if you work in the IPython
qtconsole or notebook, simply typing a DataFrame's name will show it
nicely rendered as an HTML table.
I have definitely enjoyed working with it.
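
For instance, a minimal sketch of the pivot-table part (the column names
and numbers are invented):

    import pandas as pd

    # Hypothetical long-format data of the kind described above.
    df = pd.DataFrame({
        "station": ["A", "A", "B", "B", "A", "B"],
        "year":    [2011, 2012, 2011, 2012, 2012, 2011],
        "temp":    [10.1, 10.9, 8.4, 8.7, 11.2, 8.1],
    })

    # Pivot: stations as rows, years as columns, mean temperature as values.
    pivot = pd.pivot_table(df, values="temp", index="station",
                           columns="year", aggfunc="mean")
    print(pivot)

    # The frame is backed by a numpy array, so it can go straight to
    # numpy or matplotlib.
    print(pivot.values)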

Sorry for going slightly off-topic.

/Emil




------------------------------

_______________________________________________
SciPy-User mailing list
SciPy-User@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-user


End of SciPy-User Digest, Vol 118, Issue 4
******************************************



