Parsing Log Files with Python and the pyparsing module
Submitted by theCamel on Mon, 02/12/2007 - 21:44.
I got a wild hair and have been trying to decide the best way to parse Java thread dumps with Python. After a few failed attempts, Paul McGuire, author of the pyparsing Python module, came to my rescue. Pyparsing is a fairly simple-to-use parsing module (considering what it is doing) that makes quick work of cryptic log files...
I was trying to parse a typical Java thread dump, like this one, "web81.prod.dump":
"FILE Message Writer" daemon prio=5 tid=0x0093d7c0 nid=0xf in Object.wait() [a4e81000..a4e819c0]
at java.lang.Object.wait(Native Method)
- waiting on <0xbe71cb70> (a java.util.LinkedList)
at java.lang.Object.wait(Object.java:429)
at com.sitraka.pas.common.util.queue.ListQueue.dequeue(ListQueue.java:137)
- locked <0xbe71cb70> (a java.util.LinkedList)
at com.sitraka.pas.common.log.FileLogTarget$MessageWriter.run(FileLogTarget.java:359)
at java.lang.Thread.run(Thread.java:534)
"Timestamp Updater" daemon prio=10 tid=0x00ab9fa0 nid=0xe waiting on condition [a4f81000..a4f819c0]
at java.lang.Thread.sleep(Native Method)
at com.sitraka.pas.agent.recording.Timestamp$Updater.run(Timestamp.java:159)
at java.lang.Thread.run(Thread.java:534)
"Signal Dispatcher" daemon prio=10 tid=0x000f3660 nid=0x8 waiting on condition [0..0]
"Finalizer" daemon prio=8 tid=0x000f0488 nid=0x6 in Object.wait() [f9581000..f95819c0]
at java.lang.Object.wait(Native Method)
- waiting on <0xbe71cd70> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:111)
- locked <0xbe71cd70> (a java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:127)
at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)
"Reference Handler" daemon prio=10 tid=0x000eeb58 nid=0x5 in Object.wait() [f9681000..f96819c0]
at java.lang.Object.wait(Native Method)
- waiting on <0xbe71cdd8> (a java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:429)
at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:115)
- locked <0xbe71cdd8> (a java.lang.ref.Reference$Lock)
"main" prio=5 tid=0x000387e0 nid=0x1 in Object.wait() [ffbee000..ffbee4a4]
at java.lang.Object.wait(Native Method)
- waiting on <0xbe71d2a0> (a weblogic.t3.srvr.T3Srvr)
at java.lang.Object.wait(Object.java:429)
at weblogic.t3.srvr.T3Srvr.waitForDeath(T3Srvr.java:1208)
- locked <0xbe71d2a0> (a weblogic.t3.srvr.T3Srvr)
at weblogic.t3.srvr.T3Srvr.run(T3Srvr.java:390)
at weblogic.Server.main(Server.java:32)
And here's the Python code:
#!/usr/bin/env python3
from pyparsing import (Combine, Group, Literal, SkipTo, Word, ZeroOrMore,
                       alphanums, dblQuotedString, nums, restOfLine)

# Read the whole thread dump into memory.
with open("web81.prod.dump", "r") as dump_file:
    data = dump_file.read()

#------------------------------------------------------------------------
# Define Grammars
#------------------------------------------------------------------------
integer = Word(nums)
hexnums = Word(alphanums)
end = Literal("\n").suppress()
rest = SkipTo(end)  # everything up to the end of the current line

# Thread header fields
threadname = dblQuotedString
daemon = Literal("daemon")
objectwait = Literal("in Object.wait()")
waitmon = Literal("waiting for monitor entry")
waitcon = Literal("waiting on condition")
runnable = Literal("runnable")
runstate = objectwait | runnable | waitmon | waitcon
memloc = Word(alphanums + "[].")

# Lock lines that follow the header; "at ..." stack frames are ignored
waitlock = Combine(Group(Literal("- waiting to lock") + rest))
waiton = Combine(Group(Literal("- waiting on") + rest))
locked = Combine(Group(Literal("- locked") + rest))
verbline = Combine(Group("at " + rest))
condition = waitlock | waiton | locked
cond = ZeroOrMore(condition + restOfLine).setResultsName("condition")
cond.ignore(verbline)

priority = "prio=" + integer.setResultsName("prio")
tidref = "tid=" + hexnums.setResultsName("tid")
nidref = "nid=" + hexnums.setResultsName("nid")

# Note: "daemon" is required here, so a non-daemon thread such as "main"
# is not matched by this grammar.
logEntry = threadname.setResultsName("threadname") + daemon + priority + tidref + nidref \
    + runstate.setResultsName("runstate") + memloc.setResultsName("memloc") \
    + cond
#------------------------------------------------------------------------
for tokens in logEntry.searchString(data):
    print()
    print("THREADNAME =\t " + tokens.threadname)
    print("PRIORITY =\t " + tokens.prio)
    print("TID =\t\t " + tokens.tid)
    print("NID =\t\t " + tokens.nid)
    print("RUNSTATE =\t " + tokens.runstate)
    print("MEMORY ADDRESS = " + tokens.memloc)
    print()
    print("CONDITIONS:")
    print()
    for x in tokens.condition:
        print(x)
    print(50 * "-")
Which outputs this:
THREADNAME = "FILE Message Writer"
PRIORITY = 5
TID = 0x0093d7c0
NID = 0xf
RUNSTATE = in Object.wait()
MEMORY ADDRESS = [a4e81000..a4e819c0]
CONDITIONS:
- waiting on <0xbe71cb70> (a java.util.LinkedList)
- locked <0xbe71cb70> (a java.util.LinkedList)
--------------------------------------------------
THREADNAME = "Timestamp Updater"
PRIORITY = 10
TID = 0x00ab9fa0
NID = 0xe
RUNSTATE = waiting on condition
MEMORY ADDRESS = [a4f81000..a4f819c0]
CONDITIONS:
--------------------------------------------------
THREADNAME = "Signal Dispatcher"
PRIORITY = 10
TID = 0x000f3660
NID = 0x8
RUNSTATE = waiting on condition
MEMORY ADDRESS = [0..0]
CONDITIONS:
--------------------------------------------------
THREADNAME = "Finalizer"
PRIORITY = 8
TID = 0x000f0488
NID = 0x6
RUNSTATE = in Object.wait()
MEMORY ADDRESS = [f9581000..f95819c0]
CONDITIONS:
- waiting on <0xbe71cd70> (a java.lang.ref.ReferenceQueue$Lock)
- locked <0xbe71cd70> (a java.lang.ref.ReferenceQueue$Lock)
--------------------------------------------------
THREADNAME = "Reference Handler"
PRIORITY = 10
TID = 0x000eeb58
NID = 0x5
RUNSTATE = in Object.wait()
MEMORY ADDRESS = [f9681000..f96819c0]
CONDITIONS:
- waiting on <0xbe71cdd8> (a java.lang.ref.Reference$Lock)
- locked <0xbe71cdd8> (a java.lang.ref.Reference$Lock)
--------------------------------------------------
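The output above has no entry for the "main" thread. Its header line has no "daemon" keyword, and the grammar requires one, so searchString skips it. A minimal sketch of one possible adjustment, assuming it is acceptable to wrap the keyword in pyparsing's Optional; the names header, daemon_line and main_line are made up for this illustration, and only the start of the header line is parsed here:

```python
from pyparsing import Literal, Optional, Word, alphanums, dblQuotedString, nums

# Header grammar with "daemon" made optional (an assumption for this sketch,
# not the grammar used in the script above).
header = (
    dblQuotedString.setResultsName("threadname")
    + Optional(Literal("daemon").setResultsName("daemon"))
    + "prio=" + Word(nums).setResultsName("prio")
    + "tid=" + Word(alphanums).setResultsName("tid")
    + "nid=" + Word(alphanums).setResultsName("nid")
)

# Two header lines copied from the thread dump.
daemon_line = '"Finalizer" daemon prio=8 tid=0x000f0488 nid=0x6 in Object.wait() [f9581000..f95819c0]'
main_line = '"main" prio=5 tid=0x000387e0 nid=0x1 in Object.wait() [ffbee000..ffbee4a4]'

for line in (daemon_line, main_line):
    # parseString only needs the start of the line to match the grammar.
    tokens = header.parseString(line)
    is_daemon = tokens.daemon == "daemon"  # missing names read back as ""
    print(tokens.threadname, tokens.prio, tokens.tid, tokens.nid, is_daemon)
```

With the keyword optional, both lines parse, and the daemon result name tells the two kinds of thread apart.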
So now I need to put these tokens into a dictionary and I can start searching for problems...
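Here is a rough sketch of what that dictionary step might look like. It is not part of the original script: the threads list is hand-written sample data shaped like the named results printed above, and the helper names index_by_runstate and lock_holders are hypothetical. In the real script each record would presumably be built from a tokens object inside the loop.

```python
import re
from collections import defaultdict

# Hypothetical records, copied by hand from the output above.
threads = [
    {
        "threadname": '"FILE Message Writer"',
        "prio": "5",
        "tid": "0x0093d7c0",
        "nid": "0xf",
        "runstate": "in Object.wait()",
        "conditions": [
            "- waiting on <0xbe71cb70> (a java.util.LinkedList)",
            "- locked <0xbe71cb70> (a java.util.LinkedList)",
        ],
    },
    {
        "threadname": '"Timestamp Updater"',
        "prio": "10",
        "tid": "0x00ab9fa0",
        "nid": "0xe",
        "runstate": "waiting on condition",
        "conditions": [],
    },
]

# Matches the lock address inside angle brackets, e.g. <0xbe71cb70>.
LOCK_ADDRESS = re.compile(r"<(0x[0-9a-fA-F]+)>")


def index_by_runstate(records):
    """Map each run state to the names of the threads in that state."""
    by_state = defaultdict(list)
    for rec in records:
        by_state[rec["runstate"]].append(rec["threadname"])
    return dict(by_state)


def lock_holders(records):
    """Map each lock address to the names of the threads holding it."""
    holders = defaultdict(list)
    for rec in records:
        for cond in rec["conditions"]:
            match = LOCK_ADDRESS.search(cond)
            if cond.startswith("- locked") and match:
                holders[match.group(1)].append(rec["threadname"])
    return dict(holders)


by_state = index_by_runstate(threads)
holders = lock_holders(threads)
print(by_state)
print(holders)
```

The same pattern could be pointed at "- waiting to lock" lines to see which threads are blocked on a lock that another thread holds.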
You can download pyparsing here

Totally sweet. I am trying to parse a similar trace, which is a 10 GB file, so performance is a big deal for me. I can't decide which parsing module to use in Python.
Your post helps, thanks for the example.
Glad you found this useful. I will add one more endorsement for the pyparsing module: I have yet to post a question about pyparsing (under many made-up usernames) to ANY Python forum where Paul McGuire, the author of the module, hasn't answered my question personally. The guy is omnipotent, brilliant, friendly, and tireless. You'll never lack support with pyparsing.
Very nice example and clear output. I will definitely give pyparsing a closer look now.
Great Example. Worked right out of the box. !!!Was very helpful!!!
Thanks!!