[IPython-user] Colorizing lines in debuggers and post-mortem
Wed Feb 14 16:15:53 CST 2007
And one more follow-up thought on tokenization just to document
somewhere, or in case someone wants to pick this up.
Overall, writing such an incremental tokenizer looks like a nice,
interesting computer science problem.
Here are the states I would imagine:
   unknown (don't know what came before)
   in a string
   probably in a string
   not in a string
   probably not in a string
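As a rough sketch only, the five states above could be written down as a
Python enum. The class name, member names, and the is_certain helper are
all made up here for illustration; nothing like this exists in IPython.

```python
from enum import Enum


class StringState(Enum):
    """Hypothetical names for the five states listed above."""
    UNKNOWN = "unknown"  # don't know what came before
    IN_STRING = "in a string"
    PROBABLY_IN_STRING = "probably in a string"
    NOT_IN_STRING = "not in a string"
    PROBABLY_NOT_IN_STRING = "probably not in a string"


def is_certain(state):
    """True only for the two states that leave no doubt."""
    return state in (StringState.IN_STRING, StringState.NOT_IN_STRING)
```

The "probably" states are kept apart from the certain ones so that a
caller can decide how much to trust the colorizing it gets back.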
A debugger may know for example that you can't be inside a string (or
a comment) if you've requested listing a method, class, function, or
object. That's why there would be an optional state parameter. Also a
debugger will know that you probably aren't in a string (or comment)
when printing a source line, say from a breakpoint.
This is true, provided the line doesn't have a semicolon in it. There
could be some complication with multi-statement lines such as:
   is the time"""; foo='bar'
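To make the optional state parameter a bit more concrete, here is a
minimal sketch of how a debugger might pick the starting state. The
function name and the request labels "definition" and "source_line" are
hypothetical, and falling back to "unknown" when a semicolon is present
is only my assumption about one cautious way to handle that case.

```python
def initial_state(request, line=""):
    """Guess the starting state from what the debugger was asked to show.

    `request` is a hypothetical label: "definition" for listing a method,
    class, function, or object; "source_line" for a single line printed
    at, say, a breakpoint; anything else means there is no context.
    """
    if request == "definition":
        # A listing of a whole definition cannot start inside a string.
        return "not in a string"
    if request == "source_line":
        # Assumption: a semicolon hints at a multi-statement line,
        # so do not commit to a guess.
        if ";" in line:
            return "unknown"
        return "probably not in a string"
    return "unknown"
```

For the multi-statement example above, initial_state("source_line", ...)
gives "unknown", which leaves the decision to whatever heuristics run
next.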
Here's a heuristic that comes from Emacs TeX modes. If you are in one
of the unknown states and you see an unescaped quote that is preceded
by white space (or is the beginning of the line/string), then it's
probably an opening quote. If you see an unescaped quote in an unknown
state that is followed by a space or line end (or semicolon) it is
probably a closing quote. But again there are Python idioms like
which defeat this. So running tokenize.tokenize over the line first
(i.e. assuming we are not in a string) may help. Checking first whether
all the quotes on the line are matched might also help.
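The "tokenize first" idea might look something like the sketch below. It
uses the standard tokenize module and treats any tokenizer error as a
sign that the line does not stand on its own. The function name is
invented, and since the details of error reporting differ between Python
versions, both exceptions and ERRORTOKEN are checked.

```python
import io
import tokenize


def tokenizes_cleanly(line):
    """Return True if `line` tokenizes on its own, i.e. under the
    assumption that we do not start inside a string."""
    try:
        for tok in tokenize.generate_tokens(io.StringIO(line).readline):
            # Some Python versions report a stray quote as ERRORTOKEN
            # instead of raising.
            if tok.type == tokenize.ERRORTOKEN:
                return False
    except (tokenize.TokenError, SyntaxError):
        # For example, a triple-quoted string that never closes.
        return False
    return True
```

A line that fails this check is a hint, not proof, that it starts or
ends inside a multi-line string; a line that passes could still sit
inside one.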
A long time ago I wrote a program that tried to figure out if a dot
(.) was the end of a sentence (a period) or the dot of an abbreviation,
or inside a number. In that I had various heuristics like this and
basically assigned weights as to which kind of dot we had. And
something like that -- assigning weights and using the most-likely
outcome -- could also be done here.
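Here is a minimal sketch of what weighting might look like for the quote
heuristic described earlier. The function name is hypothetical and the
weights of 1.0 are arbitrary placeholders, not tuned values; a real
version would presumably add more kinds of evidence.

```python
def classify_quote(line, i):
    """Guess whether the quote character at line[i] opens or closes a
    string. Returns "opening", "closing", "escaped", or "unknown"."""
    # An odd number of backslashes right before the quote means it is
    # escaped, so it is not a delimiter at all.
    prefix = line[:i]
    backslashes = len(prefix) - len(prefix.rstrip("\\"))
    if backslashes % 2 == 1:
        return "escaped"

    scores = {"opening": 0.0, "closing": 0.0}
    before = line[i - 1] if i > 0 else ""
    after = line[i + 1] if i + 1 < len(line) else ""

    # Preceded by white space, or at the start of the line.
    if before == "" or before.isspace():
        scores["opening"] += 1.0  # placeholder weight
    # Followed by white space, the end of the line, or a semicolon.
    if after == "" or after.isspace() or after == ";":
        scores["closing"] += 1.0  # placeholder weight

    # No evidence, or evidence that cancels out: do not guess.
    if scores["opening"] == scores["closing"]:
        return "unknown"
    return max(scores, key=scores.get)
```

With these two rules alone, the first quote in foo='bar' comes back as
"unknown" because it is preceded by "=" rather than white space, which
shows why more evidence than this would be needed in practice.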