# [Numpy-discussion] newbie question - large dataset

Bruce Southey bsouthey@gmail....
Sat Apr 7 15:59:27 CDT 2007

```Hi,
Why use tuples, given that your g() function operates on floats and
h() operates on strings, with g() and h() independent of each other?

You should be able to vectorize this very easily since everything just
depends on the tuple value (you need to write g() and/or h() in
matrix/vector notation first). But you need to supply either the
functions or some description of what g() and h() are doing. Further,
you probably want to look at the full algorithm that you are using, as
the 'many more functions' may require refinement, especially if
function f() is the final product and everything else is temporary.

Bruce

On 4/7/07, Steve Staneff <staneff@constructiondatares.com> wrote:
> Hi,
>
> I'm looking for a better solution to managing a very large calculation.
> Set A is composed of tuples a, each of the form a = [float, string]; set B
> is composed of tuples of similar structure (b = [float, string]).  For
> each possible combination of a and b I'm calculating c, of the form c =
> f(a,b) = [g(a[0], b[0]), h(a[1], b[1])] where g() and h() are non-trivial
> functions.
>
> There are now 15,000 or more tuples in A, and 100,000 or more tuples in B.
>  B is expected to grow with time as the source database grows.  In
> addition, there are many more elements in a and b than I've stated (and
> many more functions operating on them).  I'm currently using python to
> loop through each a in A and each b in B, which takes days.
>
> If anyone can point me to a better approach via numpy (or anything
> else!), I'd be very appreciative.
>