Strace to the Rescue!

For some unfathomable reason, a mailing list that I run decided to stop working last week. Whenever anyone sent an email to the list, the message bounced with an error. I had been forced to reboot the server two weeks ago, but I had not changed any settings since. So, I was rather puzzled as to what was going on.

The first thing I did was to check the mail server logs and the error surprised me:

Jul 2 15:41:23 localhost postfix/pipe[10572]: 70CB7401A4: to=, orig_to=, relay=mlmmj, delay=1.5, delays=0.96/0.02/0/0.49, dsn=5.3.0, status=bounced (Command died with signal 11: "/usr/bin/mlmmj-recieve")

This did not make any sense to me. MLMMJ was segfaulting for no apparent reason. I tried googling around for a solution but found none. MLMMJ is not a very well documented mailing list manager either. So, I was left to find my own solution to the problem.

Segfaults are usually memory related. So, I thought that the server might be running out of RAM. To test this, I stopped some other services to free up RAM. Then, I sent a test email to the list and saw the same segfault error. So, it was not a case of “lack of memory”. Then, I thought that it might be a file permissions problem, so I made the test mailing list world readable and writable. This didn’t work either.

So, it was time to dig deeper into the “mlmmj-receive” programme. Reading the man pages, I realised that it is only a front end that calls two other back-end programmes to do the dirty work. It calls “mlmmj-process” to process the email and “mlmmj-send” to send the email. So, it was possible that any one of these three programmes was failing.

The next thing that I tried was to create a blank mailing list with no subscribers and send a test email to it. Surprisingly, “mlmmj-receive” completed without much ado. This indicated to me that the problem was likely in the “send” stage. But it got me no closer to an answer, as I couldn’t think of any reason why this stage would fail when the main SMTP server was running perfectly.

So, it was time to roll up the sleeves and engage STRACE. I had only used it once before for debugging, and now it was time to see what was going on behind the scenes. To deploy it, I first needed to intercept all the messages being sent to and returned from MLMMJ. So, I wrote a wrapper script “/usr/bin/mlmmj-receive” to wrap around the real “/usr/bin/mlmmj-recieve” (note the spelling difference).

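#!/bin/sh
# Save the incoming message so that it can be inspected later.
cat > /tmp/mlmmj.in
# Run the real binary under strace; -f follows child processes, -o names the trace file.
strace -f -o /tmp/mlmmj.log /usr/bin/mlmmj-recieve "$@" < /tmp/mlmmj.in > /tmp/mlmmj.out
# Pass the output of the real binary back to the caller.
cat /tmp/mlmmj.out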

This simple script first intercepts all the data piped to MLMMJ and saves it in “/tmp/mlmmj.in”. The strace line is the magic: it runs the real programme under strace, which dumps all the system calls to “/tmp/mlmmj.log”, while the output of the programme is saved in “/tmp/mlmmj.out” and passed back by the last line. Studying this log revealed the interesting inner workings of MLMMJ and also provided a clue as to what was going wrong.

socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 5
connect(5, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("xx.xx.xxx.xxx")}, 28) = 0
fcntl64(5, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(5, F_SETFL, O_RDWR|O_NONBLOCK) = 0
gettimeofday({1215041316, 932534}, NULL) = 0
poll([{fd=5, events=POLLOUT, revents=POLLOUT}], 1, 0) = 1
send(5, "\v\352\1\1\nlocalhost\t\1\1", 28, 0) = 28
poll([{fd=5, events=POLLIN, revents=POLLIN}], 1, 5000) = 1
ioctl(5, FIONREAD, [28]) = 0
recvfrom(5, "\v\352\205\203\1\nlocalhost\t\1\1", 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("xx.xx.xxx.xxx")}, [16]) = 28
close(5) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++

This bit showed that just before segfaulting, the programme (in this case, mlmmj-send) was querying the DNS server for the IP address of “localhost”. This did not make sense, as a computer should not need to ask a DNS server about “localhost”, which is defined in “/etc/hosts”. In fact, earlier in the trace, the programme actually opens “/etc/hosts” and tries to find “localhost” in it.
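A trace made with -f can run to thousands of lines, so it helps to jump straight to the crash. The following is only a minimal sketch of one way to do that, assuming the trace is in “/tmp/mlmmj.log” as in the wrapper script. The helper name crash_context is made up for illustration and is not part of strace or MLMMJ.

```shell
# Sketch: print the system calls that lead up to a segfault in an strace log.
# crash_context is a hypothetical helper name used only for this example.
crash_context() {
    log=$1            # path to the strace output, e.g. /tmp/mlmmj.log
    lines=${2:-12}    # number of preceding lines to show (default 12)
    # -n prints line numbers, -B prints that many lines before each match,
    # -e is needed because the pattern itself starts with dashes.
    grep -n -B "$lines" -e '--- SIGSEGV' "$log"
}
```

Running crash_context /tmp/mlmmj.log shows the signal line together with the calls just before it. A plain grep -n '/etc/hosts' /tmp/mlmmj.log would similarly show where the hosts file is opened earlier in the trace.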

It seems that there was a formatting problem in “/etc/hosts” that I had not noticed. Strangely, none of the other programmes had any problems connecting to localhost, only MLMMJ. So, I fixed the file and voila, the mailing list was working again.
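The exact formatting problem is not recorded here, so the following is only a sketch of a generic sanity check, not the fix itself. It assumes the usual hosts layout of an address followed by one or more whitespace-separated names, and reports which addresses a file maps a given name to. The helper name hosts_lookup is hypothetical.

```shell
# Sketch: list the address(es) that a hosts-format file maps a name to.
# hosts_lookup is a hypothetical helper. It only reads the file and does
# not consult DNS, and the name comparison is case-sensitive.
hosts_lookup() {
    name=$1
    file=${2:-/etc/hosts}
    awk -v name="$name" '
        { sub(/#.*/, "") }              # strip comments
        NF >= 2 {                       # usable entry: address plus names
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; found = 1 }
        }
        END { exit found ? 0 : 1 }      # non-zero status if nothing matched
    ' "$file"
}
```

For example, hosts_lookup localhost || echo "no usable localhost entry" flags a file in which the entry is missing or run together with its address. On glibc systems, getent hosts localhost shows what the system resolver actually returns, which is a useful cross-check.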

Anyway, I now think that STRACE is a god-like tool. It gives great insight into the inner workings of any software application, which is very useful for reverse engineering as well as for debugging and optimisation. It has options to output timestamps and to show how long each system call takes. I will look into this tool more and learn how to exploit it thoroughly. This is one to keep in the tool bag.
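As a rough illustration of the timing options: strace -T appends the time spent in each call as a trailing <seconds> field, and -tt prefixes each line with a wall-clock timestamp. The sketch below assumes a log recorded with something like strace -f -tt -T -o /tmp/mlmmj.log and totals the time per system call. The helper name syscall_times is made up for this example, and calls that strace splits into "unfinished" and "resumed" halves are simply skipped.

```shell
# Sketch: total the time spent per system call in a log made with strace -T.
# syscall_times is a hypothetical helper name used only for this example.
syscall_times() {
    log=$1
    LC_ALL=C awk '
        # Only lines that end in a <seconds> field; skip resumed calls.
        !/resumed>/ && match($0, /<[0-9]+\.[0-9]+>$/) {
            secs = substr($0, RSTART + 1, RLENGTH - 2)
            call = $0
            sub(/\(.*/, "", call)    # keep the text before the first "("
            sub(/^.* /, "", call)    # drop any pid or timestamp prefix
            total[call] += secs
            count[call]++
        }
        END {
            for (c in total)
                printf "%.6f %d %s\n", total[c], count[c], c
        }
    ' "$log" | LC_ALL=C sort -rn     # slowest system calls first
}
```

Each output line gives the total seconds, the number of calls and the system call name. strace also has a -c option that prints a similar summary table on its own, which may be all that is needed for a first look.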

Published by

Shawn Tan

Chip Doctor, Chartered/Professional Engineer, Entrepreneur, Law Graduate.
