[IPython-user] Colorizing lines in debuggers and post-mortem

R. Bernstein rocky@panix....
Wed Feb 14 16:15:53 CST 2007


And one more follow-up thought on tokenization just to document
somewhere, or in case someone wants to pick this up.

Overall, writing such an incremental tokenizer looks like a nice,
interesting computer science problem.

Here are the states I would imagine:
  unknown (don't know what came before)
  in a string
  probably in a string
  not in a string
  probably not in a string

A debugger may know, for example, that you can't be inside a string
(or a comment) if you've requested a listing of a method, class,
function, or object. That's why the tokenizer would take an optional
initial-state parameter. A debugger will also know that you probably
aren't in a string (or comment) when printing a source line, say from
a breakpoint.
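
To make the states and the optional state parameter concrete, here is
a rough sketch in modern Python 3. The names StringState and
initial_state, and the request kinds passed in, are made up for
illustration; no debugger exposes exactly this.

```python
from enum import Enum


class StringState(Enum):
    """The five tokenizer start states listed above."""
    UNKNOWN = "unknown"
    IN_STRING = "in a string"
    PROBABLY_IN_STRING = "probably in a string"
    NOT_IN_STRING = "not in a string"
    PROBABLY_NOT_IN_STRING = "probably not in a string"


def initial_state(request_kind=None):
    """Pick a start state from what the debugger knows.

    request_kind is a hypothetical label for what the user asked for.
    """
    # Listing a whole method/class/function/object starts at a
    # statement boundary, so we cannot be inside a string or comment.
    if request_kind in ("method", "class", "function", "object"):
        return StringState.NOT_IN_STRING
    # A single source line (say at a breakpoint) is only likely to
    # start outside a string.
    if request_kind == "source_line":
        return StringState.PROBABLY_NOT_IN_STRING
    # Otherwise we don't know what came before.
    return StringState.UNKNOWN
```

A colorizer could then take this value as its optional starting state
and fall back to UNKNOWN when the caller passes nothing.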

This holds provided the line doesn't have a semicolon in it.
Multi-statement lines can complicate things, for example:

   """Now
   is the time"""; foo='bar'

Here's a heuristic that comes from Emacs TeX modes. If you are in one
of the unknown states and you see an unescaped quote that is preceded
by white space (or is the beginning of the line/string), then it's
probably an opening quote. If you see an unescaped quote in an unknown
state that is followed by a space or line end (or semicolon) it is
probably a closing quote. But again there are Python idioms like
";".join(lines)

which defeat this. So running tokenize.tokenize over the line first
(i.e. assuming we are not in a string) may help. First checking
whether all the quotes on the line are matched might also help.
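
Both ideas can be sketched roughly as follows. The function names
classify_quote and tokenizes_cleanly are hypothetical, and the sketch
assumes Python 3, where tokenize.generate_tokens is the str-based
counterpart of the tokenize.tokenize call mentioned above. Treat it as
an illustration of the heuristic, not a finished colorizer.

```python
import io
import tokenize


def classify_quote(line, i):
    """Guess the role of the quote character at line[i]."""
    # An odd number of backslashes right before the quote escapes it.
    backslashes = 0
    j = i - 1
    while j >= 0 and line[j] == "\\":
        backslashes += 1
        j -= 1
    if backslashes % 2 == 1:
        return "escaped"

    before = line[i - 1] if i > 0 else ""
    after = line[i + 1] if i + 1 < len(line) else ""
    # Preceded by white space or start of line: looks like an opener.
    opens = before == "" or before.isspace()
    # Followed by white space, end of line or semicolon: looks like
    # a closer.
    closes = after == "" or after.isspace() or after == ";"
    if opens and not closes:
        return "probably opening"
    if closes and not opens:
        return "probably closing"
    return "ambiguous"


def tokenizes_cleanly(line):
    """True if the line tokenizes by itself, assuming it does not
    start inside a string."""
    reader = io.StringIO(line.strip() + "\n").readline
    try:
        for tok in tokenize.generate_tokens(reader):
            # Older Pythons report a stray quote as an ERRORTOKEN.
            if tok.type == tokenize.ERRORTOKEN:
                return False
    except (tokenize.TokenError, SyntaxError):
        # Unterminated strings and similar problems end up here.
        return False
    return True
```

For the idiom above, classify_quote('";".join(lines)', 0) comes back
"ambiguous" because the quote looks like both an opener and a closer,
while tokenizes_cleanly on the same line succeeds, which is the case
where trying the real tokenizer first pays off.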

A long time ago I wrote a program that tried to figure out whether a
dot (.) was the end of a sentence (a period), a dot in an
abbreviation, or part of a number. In that program I had various
heuristics like this and basically assigned weights as to which kind
of dot we had. Something like that -- assigning weights and using the
most-likely outcome -- could also be done here.
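
A minimal sketch of that weighting idea applied to quotes might look
like this. The weights and the name guess_start_state are invented for
illustration and are not tuned on anything; escapes and triple quotes
are ignored to keep it short.

```python
from collections import defaultdict

# Hypothetical weights: illustrative guesses, not tuned on real data.
W_DEBUGGER_HINT = 3.0       # debugger says we start in code
W_FIRST_QUOTE_OPENS = 2.0   # first quote looks like an opening quote
W_FIRST_QUOTE_CLOSES = 2.0  # first quote looks like a closing quote


def guess_start_state(line, debugger_says_code=False):
    """Guess whether `line` starts inside a string by summing
    weighted votes and taking the most likely outcome."""
    scores = defaultdict(float)
    if debugger_says_code:
        scores["not in a string"] += W_DEBUGGER_HINT

    # Look only at the first quote character on the line.
    positions = [i for i, ch in enumerate(line) if ch in "'\""]
    if positions:
        i = positions[0]
        before = line[i - 1] if i > 0 else ""
        after = line[i + 1] if i + 1 < len(line) else ""
        # An opening quote first means the line started outside
        # a string.
        if before == "" or before.isspace():
            scores["not in a string"] += W_FIRST_QUOTE_OPENS
        # A closing quote first means the line started inside one.
        if after == "" or after.isspace() or after == ";":
            scores["in a string"] += W_FIRST_QUOTE_CLOSES

    inside = scores["in a string"]
    outside = scores["not in a string"]
    if inside == outside:
        return "unknown"
    if inside > outside:
        return "probably in a string"
    return "probably not in a string"
```

With these made-up weights, guess_start_state('";".join(lines)') is a
tie and returns "unknown", and passing debugger_says_code=True tips it
to "probably not in a string".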


