[SciPy-user] SQLite columns to scipy arrays

Vincent vincent.nijs@gmail....
Fri Aug 17 17:40:53 CDT 2007


I looked into this a few weeks back and posted about it on the numpy
list and the sqlite list.

http://projects.scipy.org/pipermail/numpy-discussion/2007-July/028584.html
http://lists.initd.org/pipermail/pysqlite/2007-July/001084.html

Below is a test program that shows probably the fastest way to get
data from an sqlite database into a numpy recarray. It is very slow
for large blocks of data. You might want to take a look at pytables
(http://www.pytables.org/moin) it is almost as fast as cPickle and has
some of the same advantages as sql databases of cPickle.

Best,

Vincent


def test_save_sqlite(fname, table = 'data'):
	# saving recarray to an sqlite file
	conn = sqlite3.connect('%s.sqlite' % fname)

	c = conn.cursor()

	# getting the variable names
	varnm = data.dtype.names
	nr_var = len(varnm)

	# transform to types sqlite knows
	types = []
	for i in data[0]:
		if type(i) == N.string_: types.append('text')
		if type(i) == N.float_: types.append('real')
		if type(i) == N.int_: types.append('integer')

	create_string = ",".join(["%s %s" % (v,t) for v,t in
zip(varnm,types)])

	# create a table if it doesn't exist yet
	try:
		c.execute('drop table %s' % table)
	except sqlite3.OperationalError:
		pass

	c.execute('create table %s (%s)' % (table,create_string))

	# putting the data into the table
	exec_string = 'insert into %s values %s' % (table,'(%s)' %
(','.join(('?')*nr_var)))
	c.executemany(exec_string, data)

	# commiting the data to the database
	conn.commit()

	# closing the connection
	conn.close()

def test_load_sqlite(fname, table = 'data', str_length = 20):

	conn = sqlite3.connect('%s.sqlite' % fname)
	c = conn.cursor()

	# get all data
	c.execute('select * from %s' % table)

	# getting data types
	types = []
	for i in c.fetchone():
		if type(i) == unicode: types.append('S%s' % str_length)
		if type(i) == float: types.append('float')
		if type(i) == int: types.append('int')

	# variable names
	varnm = [i[0] for i in c.description]

	# autodetected dtype
	dtype = zip(varnm,types)
	data = N.fromiter(c, dtype = dtype)

	# closing the connection
	conn.close()

if __name__ == '__main__':
	from timeit import Timer
	import numpy as N
	import os, sqlite3

	# making a directory to store simulate data
	if not os.path.exists('./data'): os.mkdir('./data')

	# creating simulated data and variable labels
	varnm = ['id','a','b','c','d','e','f','g','h','i','j']			# variable
labels
	nobs = 500000
	data1 =	N.random.randn(nobs,5)
	data2 =	N.random.randint(-100, high = 100, size = (nobs,5))

	# adding a string variable
	id = [('id'+str(i)) for i in range(nobs)]

	data1 = [i for i in data1.T]
	data2 = [i for i in data2.T]

	d = []
	d.append(N.array(id))
	d.extend(data1)
	d.extend(data2)

	descr = [(varnm[i],d[i].dtype) for i in xrange(len(varnm))]
	data = N.rec.fromarrays(d, dtype=descr)

	n = 20
	fname = './data/data'

	# testing sqlite
	t2 = Timer('test_save_sqlite(\"%s\")' % fname, 'from __main__ import
test_save_sqlite')
	print "\n\nTest saving recarray with sqlite\n"
	print "%.6f sec/pass" % (t2.timeit(number=n)/n)

	# testing sqlite
	t4 = Timer('test_load_sqlite(\"%s\")' % fname, 'from __main__ import
test_load_sqlite')
	print "\n\nTest loading recarray with sqlite\n"
	print "%.6f sec/pass" % (t4.timeit(number=n)/n)


On Aug 17, 9:51 am, Bill Dandreta <wjdandr...@att.net> wrote:
> What is the fastest/best way to put the columns of an SQLite database
> into scipy arrays?
>
> I've been doing it with the following pseudo-code but it is kind of slow
> for large blocks of data.
>
> col1=[]
> col2=[]
> ...
> for r in cursor:
>     col1.append(r['col1'])
>     col2.append(r['col2'])
>     ...
> acol1=S.array(col1)
> acol2=S.array(col2)
> ...
>
> --
> Bill
>
> wjdandr...@att.net
>
> Gentoo Linux X86_64 2.6.20-gentoo-r8
>
> Reclaim Your Inbox withhttp://www.mozilla.org/products/thunderbird/
>
> All things cometh to he who waiteth as long as he who waiteth worketh like hell while he waiteth.
>
> _______________________________________________
> SciPy-user mailing list
> SciPy-u...@scipy.orghttp://projects.scipy.org/mailman/listinfo/scipy-user



More information about the SciPy-user mailing list