[SciPy-User] scipy central comments
Thu Sep 8 11:52:17 CDT 2011
Back on list.
On Thu, Sep 8, 2011 at 12:43 PM, denis <email@example.com> wrote:
> re central data: definitely useful -- see R,
> but it should be separate from scipy-central:
> don't do everything at once.
Many of the same datasets are available in R. Ours could be made available
as a separate package, though some of the attributes (endog, exog) are
specific to our testing and example needs.
> The functionality oughta include
> listing: what's available, how big is it ?
> load / loadtxt to a single array
> splitting, sanitizing, summarizing: diverse, difficult
> BUT scipy-central-data may satisfy no one, in which case forget it.
> (Personally I'd like to spec it first, shoot later, but.)
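As a rough illustration of the "listing" item above (what's available, how big is it), here is a minimal sketch using only the standard library. The name list_datasets and the idea of one flat directory of data files are assumptions made for the example, not part of any existing package.

```python
import os


def list_datasets(datadir):
    """Return sorted (filename, size_in_bytes) pairs for the files in datadir.

    Hypothetical helper: assumes every dataset is a regular file sitting
    directly in datadir; subdirectories are skipped.
    """
    entries = []
    for name in sorted(os.listdir(datadir)):
        path = os.path.join(datadir, name)
        if os.path.isfile(path):
            entries.append((name, os.path.getsize(path)))
    return entries
```

Printing the pairs gives a quick "what's there and how big" overview before anything is loaded.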
What we've used:
> What I use today is this, ~ 3 pages:
> def getdata( source, N=0, Ntest=0, classcol=-1, centre=0, verbose=0,
>         datadir=Datadir ):
>     """ X = getdata( slearn/xx uciml/yy ... classcol=None )
>             findfile, load or loadtxt
>         X, classes = getdata( ... classcol = 0 or -1 )
>             split off classes, astype(int)
>         X, y, Xtest, ytest = getdata( Ntest > 0 )
>             split first N / last Ntest
>         centre: 0 noop, 1 -= mean, 2 /= sd, 3 winsorise, 4 winsor + to_11
>     """
> def findfile( filename, datadir ):
>     """ try datadir + filename + .npy .csv .csv.gz .txt .txt.gz
>         expand $vars, glob # cf openplus
>     """
> got no satisfactory answer
> but you might ask the scikits-learn guys again, see where they are
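The findfile outline quoted above could be sketched roughly as follows. This is only a guess at the behaviour from the docstring: the suffix order is taken from the quote, while trying the bare filename first, returning the first sorted glob match, and raising FileNotFoundError are assumptions added for the example.

```python
import glob
import os

# Suffix order from the quoted docstring; the leading "" (bare filename)
# is an assumption added here.
SUFFIXES = ("", ".npy", ".csv", ".csv.gz", ".txt", ".txt.gz")


def findfile(filename, datadir="."):
    """Return the first existing path matching datadir/filename + suffix."""
    # Expand ~ and $VARS before globbing.
    base = os.path.expandvars(
        os.path.expanduser(os.path.join(datadir, filename)))
    for suffix in SUFFIXES:
        matches = sorted(glob.glob(base + suffix))
        if matches:
            return matches[0]
    raise FileNotFoundError(
        "no data file found for %r in %r" % (filename, datadir))
```

With this sketch, findfile("iris", datadir) would prefer iris.csv over iris.txt if both exist, simply because of the suffix order.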
Our datasets module and theirs both evolved from David C.'s original
proposal, I believe.
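For reference, the splitting and centring steps listed in the quoted getdata docstring might look something like this with NumPy. The function names are hypothetical, and several details are assumptions: N=0 is read as "all rows not in the test set", centre mode 2 is treated as "subtract the mean, then divide by the sd", and the winsorising modes 3 and 4 are left out because the quote does not define them.

```python
import numpy as np


def split_classes(X, classcol=-1):
    """Split a 2-D array into (features, integer class labels)."""
    X = np.asarray(X)
    classes = X[:, classcol].astype(int)
    features = np.delete(X, classcol, axis=1)
    return features, classes


def split_train_test(X, y, N=0, Ntest=0):
    """Return the first N rows for training and the last Ntest for testing.

    Assumption: N=0 means "every row that is not in the test set".
    """
    if Ntest <= 0:
        raise ValueError("Ntest must be positive to split off a test set")
    ntrain = N if N > 0 else len(X) - Ntest
    return X[:ntrain], y[:ntrain], X[-Ntest:], y[-Ntest:]


def centre_columns(X, mode=0):
    """Mode 0: no-op, 1: subtract column means, 2: also divide by column sd."""
    X = np.asarray(X, dtype=float)
    if mode not in (0, 1, 2):
        raise NotImplementedError("modes 3 and 4 (winsorising) not sketched")
    if mode >= 1:
        X = X - X.mean(axis=0)
    if mode == 2:
        X = X / X.std(axis=0)  # a constant column would divide by zero
    return X
```

Keeping the three steps as separate functions is a design choice for the sketch; the quoted getdata bundles them behind keyword arguments instead.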