[Numpy-discussion] Re: [GS-discuss] Re: NumPy, Python DB-API and MySQL

Andy Dustman adustman at comstar.net
Mon Apr 24 15:19:44 CDT 2000


On Fri, 14 Apr 2000, Tim Churches wrote:

> Andy Dustman wrote:
> 
> Yes, but the problem with mysql_store_result() is the large amount of
> memory required to store the result set. Couldn't the user be
> responsible for predetermining the size of the array via a query such as
> "select count(*) from sometable where...." and then pass this value as a
> parameter to the executeNumPy() method? In MySQL at least such count(*)
> queries are resolved very quickly so such an approach wouldn't take
> twice the time. Then mysql_use_result() could be used to populate the
> initialised NumPy array with data row by row, so there is only ever one
> complete copy of the data in memory, and that copy is in the NumPy
> array.
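Tim's count-then-fill proposal can be sketched as below. This is a minimal sketch with three assumptions. First, Python's built-in sqlite3 module stands in for MySQL so the example runs without a server. Second, `execute_numpy` is a hypothetical function named after the proposed executeNumPy() method; it is not an existing MySQLdb API. Third, a list of lists stands in for a Numeric array created with zeros().

```python
import sqlite3


def execute_numpy(cursor, sql, n_rows):
    """Hypothetical stand-in for the proposed executeNumPy().

    The caller supplies n_rows (e.g. from a prior SELECT COUNT(*)), so the
    destination is allocated exactly once and rows are copied in one at a
    time instead of first buffering the whole result set.
    """
    cursor.execute(sql)
    n_cols = len(cursor.description)
    # Pre-sized destination; with Numeric this would be zeros((n_rows, n_cols), 'd').
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        row = cursor.fetchone()  # one row per call, in the spirit of mysql_use_result()
        if row is None:          # the result set was smaller than predicted
            break
        out[i] = [float(v) for v in row]
    return out


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (x REAL, y REAL)")
cur.executemany("INSERT INTO t VALUES (?, ?)", [(1, 2), (3, 4), (5, 6)])

# Step 1: the user sizes the array with a cheap COUNT(*) query.
(n,) = cur.execute("SELECT COUNT(*) FROM t").fetchone()
# Step 2: stream the rows into the pre-sized array.
data = execute_numpy(cur, "SELECT x, y FROM t ORDER BY x", n)
```

The COUNT(*) and the SELECT run as separate statements. Rows inserted or deleted between them can make the predicted size stale. This sketch leaves trailing zeros when the result set is shorter than predicted and leaves surplus rows unread when it is longer.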

After some more thought on this subject, and some poking around at NumPy,
I came to the following conclusions:

Since NumPy arrays are fixed-size but otherwise behave as sequences (in the
multi-dimensional case, sequences of sequences), the best approach would
be for the user to pass in a pre-sized array (e.g. one from zeros(); btw,
the docstring for zeros is way wrong), and _mysql would simply access it
through the Sequence object protocol, updating as many values as it
could: if you passed a 100-row array, it would fill 100 rows or as many as
were in the result set, whichever is less.
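The fill rule described above can be sketched in plain Python. It writes as many rows as fit, stops when either the array or the result set runs out, and touches the destination only through len() and item assignment. The signature is an assumption: this hypothetical `fetch_rows_into_array` takes any iterable of row tuples instead of reading from a live _mysql connection, so it runs on its own.

```python
from itertools import islice


def fetch_rows_into_array(rows, array):
    """Sketch of the proposed fill rule (hypothetical signature).

    rows  -- any iterable of tuples, standing in for a result set
    array -- any mutable sequence of mutable sequences (a list of lists,
             or a 2-D array created with zeros())
    Only len() and item assignment are used, so nothing NumPy-specific
    is needed.
    """
    # islice stops after len(array) rows, so min(len(array), #rows) get filled.
    for i, row in enumerate(islice(rows, len(array))):
        target = array[i]
        for j, value in enumerate(row):
            target[j] = value  # IndexError here if the array is too narrow
    return array  # the array argument itself is returned


buf = [[0, 0] for _ in range(3)]
fetch_rows_into_array(iter([(1, 2), (3, 4)]), buf)  # 2 rows, 3 slots
```

Because the function uses only indexing, it accepts a list of lists or a 2-D NumPy array. With a NumPy array, `array[i]` is a view of row i, so assigning into it updates the array in place.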

Since this approach requires no special knowledge of NumPy, it could be a standard
addition (no conditional compilation required). This method (tentatively
_mysql.fetch_rows_into_array(array)) would return the array argument as
the result. IndexError would likely be raised if the array was too narrow
(more columns in the result set than in the array). Probably this would not be a
MySQLdb.Cursor method, but perhaps I can have a separate module with a
cursor subclass which returns NumPy arrays.
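A cursor subclass that fills caller-supplied arrays might look like the sketch below. sqlite3 again stands in for MySQLdb, because its `Connection.cursor()` accepts a `factory` argument for Cursor subclasses. `ArrayCursor` and `fetch_into` are hypothetical names and are not part of MySQLdb.

```python
import sqlite3


class ArrayCursor(sqlite3.Cursor):
    """Hypothetical cursor subclass that fills a caller-supplied 2-D array.

    sqlite3 stands in for MySQLdb; ArrayCursor and fetch_into are
    illustrative names only.
    """

    def fetch_into(self, array):
        n_cols = len(self.description)
        if len(array) and len(array[0]) < n_cols:
            # The too-narrow case: more result columns than array columns.
            raise IndexError(
                f"result has {n_cols} columns, array has {len(array[0])}")
        for target in array:
            row = self.fetchone()
            if row is None:  # result set exhausted before the array was full
                break
            for j, value in enumerate(row):
                target[j] = value
        # Like the proposed _mysql method, hand back the array argument itself.
        return array


conn = sqlite3.connect(":memory:")
cur = conn.cursor(factory=ArrayCursor)
cur.execute("CREATE TABLE m (a REAL, b REAL)")
cur.executemany("INSERT INTO m VALUES (?, ?)", [(0, -1), (-3, 2)])
cur.execute("SELECT a, b FROM m ORDER BY rowid")
matrix = cur.fetch_into([[0.0, 0.0], [0.0, 0.0]])
```

Checking the width before reading any rows means a too-narrow array fails cleanly. Without that check, the array could be left half-filled when the IndexError surfaces.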

> > Question: Would it be adequate to put all columns returned into the array?
> > If label columns need to be returned, this could pose a problem. They may
> > have to be returned as a separate query. Or else non-numeric columns would
> > be excluded and returned in a list of tuples (this would be harder).
> 
> Yes, more thought needed here - my initial thought was one NumPy array
> per column, particularly since NumPy arrays must be homogeneous wrt data
> type. Each NumPy array could be named the same as the column from which
> it is derived.
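Tim's alternative is one homogeneous sequence per column, named after its source column. It can be sketched from cursor.description, which the DB-API specification defines as a sequence of per-column items whose first element is the column name. sqlite3 stands in for MySQL, and `fetch_columns` is a hypothetical helper. Text label columns simply stay as lists of strings, which sidesteps the label-column question.

```python
import sqlite3


def fetch_columns(cursor):
    """Return {column_name: [values...]} for the current result set.

    One sequence per column, named after the column. Numeric columns
    could then be converted individually (e.g. with Numeric.array(values)).
    """
    names = [d[0] for d in cursor.description]
    columns = {name: [] for name in names}
    for row in cursor:
        for name, value in zip(names, row):
            columns[name].append(value)
    return columns


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE obs (label TEXT, weight REAL)")
cur.executemany("INSERT INTO obs VALUES (?, ?)", [("a", 1.5), ("b", 2.5)])
cur.execute("SELECT label, weight FROM obs ORDER BY label")
cols = fetch_columns(cur)
```

Duplicate column names (for example, from a join) would collide as dict keys. Aliasing them with AS in the query avoids that.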

Okay, I think I know what you mean by one array per column. You want to return each
column as a (vertical) vector, whereas I am thinking along the lines of
returning the result set as a matrix. Is that correct? Since it appears
you can efficiently slice out column vectors as a[:,n], is my idea
acceptable? For example:

>>> import Numeric
>>> a=Numeric.multiarray.zeros( (2,2),'d')
>>> a[1,1]=2
>>> a[0,1]=-1
>>> a[1,0]=-3
>>> a
array([[ 0., -1.],
       [-3.,  2.]])
>>> a[:,0]
array([ 0., -3.])
>>> a[:,1]
array([-1.,  2.])
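The matrix layout does not rule out column-oriented use, because each column vector can be pulled out afterwards. Here is a pure-Python sketch of the a[:, n] slice, using the same values as the session above. The helper names and the column names are illustrative assumptions.

```python
def column(matrix, n):
    """Pure-Python equivalent of a[:, n] on a row-major list of rows."""
    return [row[n] for row in matrix]


def named_columns(matrix, names):
    """One column vector per name: the per-column layout recovered from a matrix."""
    return {name: column(matrix, n) for n, name in enumerate(names)}


a = [[0.0, -1.0],
     [-3.0, 2.0]]
cols = named_columns(a, ["x", "y"])  # "x" and "y" are made-up column names
```

On a NumPy array, a[:, n] does this without copying, since basic slicing returns a view.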

-- 
andy dustman       |     programmer/analyst     |      comstar.net, inc.
telephone: 770.485.6025 / 706.549.7689 | icq: 32922760 | pgp: 0xc72f3f1d
"Therefore, sweet knights, if you may doubt your strength or courage, 
come no further, for death awaits you all, with nasty, big, pointy teeth!"




