[SciPy-user] maxentropy

Matthew Cooper m.cooper at computer.org
Mon Mar 13 19:46:37 CST 2006

Hi Ed,

Thanks very much for your reply.  I think it helped a lot, but I may
be a bit confused about conditional versus unconditional modeling.

What I'm doing is similar to text categorization.  I observe some text
(vector X) and want to determine if a binary label (scalar y) should
be applied (y=1) or not (y=0).  So, I look at this as using maxent to
estimate P(y|X).  In this case, is my sample space simply {0,1} or is
it the space from which X is sampled?  I had thought this was
conditional modeling.  However, I don't want to train a separate model
for every distinct X; rather, I want to select and weight features from
the whole corpus that in turn imply, for each X', some P(y|X').  I
assumed it was conditional because I'm not modeling the joint
distribution P(y',X').
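
For concreteness, here is a minimal sketch of the conditional
formulation as I understand it.  It uses plain NumPy rather than the
scipy maxentropy module, and every name in it (features,
conditional_probs, fit) and the toy data are hypothetical.  The
assumption is that one shared weight vector is trained over the whole
corpus and that the normalization for each X runs only over the labels
{0,1}:

```python
import numpy as np

def features(x, y):
    """Joint feature vector f(x, y): the words in x are active only when y == 1.

    This feature design is an assumption made for illustration.
    """
    return x * y  # f(x, 0) is the zero vector

def conditional_probs(lam, x):
    """Return [P(y=0|x), P(y=1|x)], normalizing over the labels {0, 1} only."""
    scores = np.array([lam @ features(x, y) for y in (0, 1)])
    scores -= scores.max()  # subtract the max for numerical stability
    p = np.exp(scores)
    return p / p.sum()

def fit(X, y, steps=500, lr=0.5):
    """Gradient ascent on the conditional log-likelihood sum_n log P(y_n|X_n)."""
    lam = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = np.zeros_like(lam)
        for x_n, y_n in zip(X, y):
            p = conditional_probs(lam, x_n)
            # Model expectation of f under P(.|x_n), summed over both labels.
            expected = sum(p[k] * features(x_n, k) for k in (0, 1))
            grad += features(x_n, y_n) - expected
        lam += lr * grad / len(X)
    return lam

# Toy corpus: rows are documents, columns are binary word indicators.
X = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1]], dtype=float)
y = np.array([1, 1, 0, 0])
lam = fit(X, y)
print(conditional_probs(lam, X[0]))
```

With features of this form the model reduces to logistic regression
without a bias term, so a single set of weights yields P(y|X') for any
new X' without training a separate model per X.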

I have been trying to build models using a sample space whose elements
are tuples like (X_n,y_n).  I am then greedily building a set of
feature functions, using information gain to select features.  It's not
quite working, and I'm worried I'm not defining the sample space
properly.  I will try to figure out how to define the sample space as
{0,1} and then redefine the features, but I'd appreciate any advice.
Also, if I get this working, I'd be happy to help with adding some
feature selection examples to your documentation.  At the moment, as
you can tell, I'm still figuring out what I'm doing.
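
For reference, a minimal sketch of the information gain score used for
selection, assuming binary word-presence features and a binary label.
The function names and the toy data are hypothetical.  It ranks each
feature once on its own; it does not account for redundancy between
features already selected, which a fully greedy procedure would need to
handle:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy in bits of an array of 0/1 labels."""
    if len(labels) == 0:
        return 0.0
    p = np.mean(labels)
    if p in (0.0, 1.0):
        return 0.0
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

def information_gain(feature, labels):
    """IG(y; f) = H(y) - sum_v P(f=v) H(y | f=v) for a binary feature f."""
    present = feature > 0
    h_cond = 0.0
    for mask in (present, ~present):
        if mask.any():
            h_cond += mask.mean() * entropy(labels[mask])
    return entropy(labels) - h_cond

# Toy corpus: rows are documents, columns are binary word indicators.
X = np.array([[1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 1]])
y = np.array([1, 1, 0, 0])

gains = [information_gain(X[:, j], y) for j in range(X.shape[1])]
best = int(np.argmax(gains))  # index of the highest-scoring feature
print(gains, best)
```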

