[SciPy-user] Record Array: How to add a column?

John Hunter jdh2358@gmail....
Tue Oct 14 05:35:06 CDT 2008

On Mon, Oct 13, 2008 at 7:41 PM, Robert Kern <robert.kern@gmail.com> wrote:

> This is somewhat more straightforward:
> http://projects.scipy.org/pipermail/numpy-discussion/2007-September/029357.html

I took Robert's suggestion from the link above and added
rec_append_fields to matplotlib.mlab.  I think it may have been
called rec_append_field in 0.98.3, but we altered it in svn HEAD to
support adding multiple columns at once.  There are a number of nice
helper functions for recarrays there:

   * rec2txt          : pretty print a record array
   * rec2csv          : store record array in CSV file
   * csv2rec          : import record array from CSV file with type inspection
   * rec_append_fields: add field(s)/array(s) to record array
   * rec_drop_fields  : drop fields from record array
   * rec_join         : join two record arrays on sequence of fields
   * rec_groupby      : summarize data by groups (similar to SQL GROUP BY)
   * rec_summarize    : helper code to filter rec array fields into new fields
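
For readers who want to try the append/drop idea without the mlab
helpers, here is a minimal sketch using NumPy's own
numpy.lib.recfunctions module. That module is a stand-in chosen for
illustration, not the mlab code described above, and the toy data and
field names ('day', 'price') are made up.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# toy record array with two fields (made-up data)
r = np.array([(1, 10.0), (2, 11.0), (3, 12.1)],
             dtype=[('day', int), ('price', float)]).view(np.recarray)

# build a new column: fractional change from the previous row
gains = np.zeros(len(r))
gains[1:] = np.diff(r.price) / r.price[:-1]

# append it as a new field; usemask=False returns a plain (unmasked) array
r2 = rfn.append_fields(r, 'gains', gains, usemask=False, asrecarray=True)

# drop a field again
r3 = rfn.drop_fields(r2, 'day', usemask=False, asrecarray=True)

print(r2.dtype.names)
print(r3.dtype.names)
```

Both calls return a new array rather than modifying the input, which
is also why the mlab example reassigns the result to r1 and r2.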

rec_join is really nice -- supports inner and outer joins with default
fill values and customizable postfixing of column names when joining
two record arrays with identically named fields.
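
To make the inner/outer distinction and the postfixing concrete, here
is a small sketch with made-up data.  It uses numpy.lib.recfunctions.join_by
from NumPy as a stand-in (an assumption for illustration; it is not
mlab.rec_join and its arguments differ), and it fills the missing
entries of the outer join explicitly rather than through a defaults
argument.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# two toy tables sharing the key 'date' and a clashing field 'gains'
a = np.array([(1, 0.10), (2, 0.20), (3, 0.30)],
             dtype=[('date', int), ('gains', float)])
b = np.array([(2, 0.02), (3, 0.03), (4, 0.04)],
             dtype=[('date', int), ('gains', float)])

# inner join: only dates present in both; clashing names get postfixes
inner = rfn.join_by('date', a, b, jointype='inner',
                    r1postfix='1', r2postfix='2', usemask=False)

# outer join: all dates; entries missing on one side come back masked
outer = rfn.join_by('date', a, b, jointype='outer',
                    r1postfix='1', r2postfix='2')

# replace the masked (missing) entries with a fill value of our choosing
gains1_filled = outer['gains1'].filled(0.0)
gains2_filled = outer['gains2'].filled(0.0)

print(inner.dtype.names)
print(outer['date'].filled(-1), gains1_filled, gains2_filled)
```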

Here is an example showing many of these functions in action:

"""
Illustrate the rec array utility functions by loading prices from a
csv file, computing the daily returns, appending the results to the
record arrays, and joining on date
"""
import urllib
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab

# grab the price data off yahoo
u1 = urllib.urlretrieve('http://ichart.finance.yahoo.com/table.csv?s=AAPL&d=9&e=14&f=2008&g=d&a=8&b=7&c=1984&ignore=.csv')
u2 = urllib.urlretrieve('http://ichart.finance.yahoo.com/table.csv?s=GOOG&d=9&e=14&f=2008&g=d&a=8&b=7&c=1984&ignore=.csv')

# load the CSV files into record arrays
r1 = mlab.csv2rec(file(u1[0]))
r2 = mlab.csv2rec(file(u2[0]))

# compute the daily returns and add these columns to the arrays
gains1 = np.zeros_like(r1.adj_close)
gains2 = np.zeros_like(r2.adj_close)
gains1[1:] = np.diff(r1.adj_close)/r1.adj_close[:-1]
gains2[1:] = np.diff(r2.adj_close)/r2.adj_close[:-1]
r1 = mlab.rec_append_fields(r1, 'gains', gains1)
r2 = mlab.rec_append_fields(r2, 'gains', gains2)

# now join them by date; the default postfixes are 1 and 2
r = mlab.rec_join('date', r1, r2)

# long aapl, short goog
g = r.gains1-r.gains2
tr = (1+g).cumprod()  # the total return

# plot the return
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(r.date, tr)
ax.set_title('total return: long aapl, short goog')
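
The return arithmetic in the script can be checked by hand on a few
made-up prices.  The sketch below needs only NumPy; the price values
and the helper name daily_returns are hypothetical and exist only for
this illustration.

```python
import numpy as np

# made-up adjusted closing prices for two instruments on three days
p1 = np.array([100.0, 110.0, 99.0])
p2 = np.array([50.0, 50.0, 55.0])

def daily_returns(prices):
    """Fractional change from the previous day; the first day is 0."""
    gains = np.zeros_like(prices)
    gains[1:] = np.diff(prices) / prices[:-1]
    return gains

# long the first, short the second
g = daily_returns(p1) - daily_returns(p2)   # [0, 0.1, -0.2]
tr = (1 + g).cumprod()                      # [1, 1.1, 0.88]
print(tr)
```

The first entry of each gains array is left at zero because there is
no previous day to compare against, so the total return starts at 1.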
