# [SciPy-user] building (very) large Markov chains

nicky van foreest vanforeest@gmail....
Tue Jun 23 01:53:27 CDT 2009

```Hi David,

The problem is like this. The states I have to deal with represent
schedules of jobs to process on a machine. Such schedules can be quite
complicated. Currently I store these schedules as a tuple.

Generating the transition matrix is a pain. The problem is that I do not
have a
simple method to enumerate all the states in the state space. I use a
trick instead that resembles building a graph by means of induction.
Specifically, for a given state ( that is a schedule stored as a
tuple) I have a rule to compute the states to which the markov chain
can move. I implemented this rule in a function generateRow(state)
which returns a list of states to which the chain can jump and a list
of probabilities associated to these states. The induction works as
follows. I apply generateRow to an initial state. This yields a set of
new states. To each of these new states I apply generateRow, which
generate new states and perhaps some old ones. I add the new states to
a set called frontier, and move the state which I just have dealt with
from this frontier set to the state space set. Then I pop another
state form the frontier set and move it to the state space set, apply
generateRow to this state, put new states in the frontier set,
etc...Once the frontier set is empy (due to the popping of states), I
must have visited all reachable states.

Now all these tuples may consume considerable amounts of memory. Worse
yet, two identitical tuples may sometimes claim memory twice. To see
this, suppose I generate the tuple (3,4,4,4,4,4,) and do a lot of
computations as sketched above. It may occur during these computations
that this very tuple is generated again. Python, now, may claim new
memory for this newly generated tuple, although the memory has already
been claimed for the initial tuple.

I am breaking my head over this problem. One way to circumvent this is
by using a dictionary that mappes tuples to indices. I include the
code below. However, I do not like to use this dictionary as it means
extra storage. Do you perhaps have a memory efficient work around?

Thanks

bye

Nicky

class StateSpace():
def __init__(self):
self.state2index = {}
self.index2state = {}
def __call__(self, state):
if type(state) == tuple:
self.returnIndex(state)
if type(state) == int:
return self.index2state[state]
def returnIndex(self, state):
if state not in self.state2index:
index = len(self.state2index)
self.state2index[state] = index
self.index2state[index] = state
return self.state2index[state]

2009/6/21 David Warde-Farley <dwf@cs.toronto.edu>

> Can you elaborate on what you're trying to do with the chains and what
> problem you're having?
>
> David
>
> On 21-Jun-09, at 2:53 PM, nicky van foreest wrote:
>
> > Hi,
> >
> > I am trying to get into contact with somebody that is interested in,
> > or has experience with, building very large sparse Markov chains (+5e5
> > states) in python.  I tried some different approaches to solve my
> > problems, but none of these meet my demands. Perhaps one of you has a
> > better idea.
> >
> > bye
> >
> > Nicky
> > _______________________________________________
> > SciPy-user mailing list
> > SciPy-user@scipy.org
> > http://mail.scipy.org/mailman/listinfo/scipy-user
>
> _______________________________________________
> SciPy-user mailing list
> SciPy-user@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-user
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mail.scipy.org/pipermail/scipy-user/attachments/20090623/872e3757/attachment-0001.html
```