You did WHAT with iptables? (Part 1: The Problem)
Technically speaking: stateless source address and port translation. Only on relevant packets, which are indistinguishable at the packet level from irrelevant packets. Why? Well, Music On Hold, of course.
The article that follows is dedicated to all of my friends and family who sum up my entire career as: "He fixes computers". Here is a brief glimpse at a small part of what I actually do. In reality, I'm not even good at fixing computers anymore, and I don't like doing it. Especially when the computer is running Windows. Double especially when the computer is running Windows 8. What an atrocity.
WARNING: In order to understand anything written here, it is recommended that you have moderate to expert knowledge of the following technologies: Linux, NetFilter/IPTables Firewalls, UDP/IP Packets, SIP/SDP, RTP, NAT, and Cisco UCM. I think that should do it, for this article at least.
Oh, you're still here? This article is meant more as a brain dump of documentation while it's still fresh in my mind. I can't believe you're still planning to read it. Well then, you asked for it.
It all started when we were rolling our branches over to the new version of UCM. I was testing the fancy new call queuing features when I noticed that callers from the PSTN over SIP were not able to hear the initial announcement when entering the queue, or the music on hold while waiting patiently in the queue. The caller would simply hear dead air until either they hung up, or the call was transferred to an available rep. I worked on it for a few weeks, and then got bored and gave up, never really discovering the true nature of the problem. Well, eventually, after ignoring it for as long as I could, I dove back into the problem head first. By this point, we were fully operational on our new UCM Cluster, and the old UCM Publisher had been happily retired to the basement.
I began troubleshooting and discovered that the MOH treatment was failing on inbound toll-free calls and on outbound calls. Oddly enough, it worked on inbound local calls with an identical configuration. Surely this has to be a provider issue, right? I have a working and a non-working setup with identical configurations, and the only difference is at the provider. Well, yes and no, but I'll get into that later.
I put in a couple of support tickets with my provider, which I always regret because I typically get the "we can't support late offer" dance (I am always tempted to scream "shibboleet"), even though my old UCM cluster did nothing but delayed offer and it worked perfectly. Thanks. So I kept digging. An oddity I noticed early on is that in the new version of UCM (actually, in every version since 8.5 or so), they started sending an "m=audio 4000," line in the MOH SDP answer. In normal SIP, that 4000 would be the UDP port to which the remote destination is supposed to send its RTP stream. It's typically a random high port. But that shouldn't matter in this case, because the other new thing in this SDP is "a=sendonly", which means this is a send-only stream and that we aren't expecting to receive any RTP data back from the remote party anyway. Also, remember that this can't be the problem, because it works fine with inbound local calls, and my UCM is doing the exact same thing for those. Just to be safe, I tried removing the "a=sendonly" line, but it didn't help.
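For anyone who has not stared at SDP before, here is a minimal Python sketch of how the two pieces discussed above (the audio port on the "m=" line and the direction attribute) can be pulled out of an SDP body. The sample SDP, its addresses, and the function name are made up for illustration; this is not a capture from my UCM, and a real parser would need to handle far more than this.

```python
# Illustrative SDP body. Addresses and values are invented (192.0.2.0/24 is a
# documentation range); only the "m=audio 4000" and "a=sendonly" parts mirror
# what is described in the article.
SAMPLE_SDP = """v=0
o=- 0 0 IN IP4 192.0.2.10
s=-
c=IN IP4 192.0.2.10
t=0 0
m=audio 4000 RTP/AVP 0
a=rtpmap:0 PCMU/8000
a=sendonly
"""

DIRECTIONS = {"sendrecv", "sendonly", "recvonly", "inactive"}


def audio_port_and_direction(sdp):
    """Return (port, direction) for the first audio media section.

    A direction attribute inside the audio section wins over a session-level
    one; if neither is present the stream is treated as sendrecv, which is
    the default when the attribute is missing.
    """
    port = None
    session_dir = None
    media_dir = None
    section = "session"
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m="):
            if port is not None:
                break  # the first audio section has ended
            fields = line[2:].split()
            if fields and fields[0] == "audio":
                # The port field may look like "4000/2"; keep the base port.
                port = int(fields[1].split("/")[0])
                section = "audio"
            else:
                section = "other"
        elif line.startswith("a=") and line[2:] in DIRECTIONS:
            if section == "session":
                session_dir = line[2:]
            elif section == "audio":
                media_dir = line[2:]
    return port, (media_dir or session_dir or "sendrecv")
```

Run against the sample, this reports port 4000 with a send-only direction; strip the "a=sendonly" line and the direction falls back to sendrecv.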
I pored over packet capture after packet capture for hours, analyzing SIP packet after SIP packet and SDP body after SDP body, looking for something, anything, different between the working and non-working examples. Alas, there was nothing. I was finally defeated enough to bring my old, decrepit UCM Publisher up from the basement, restore my old SIP proxy from backups, and re-create that whole old environment so that I could get a packet capture of a working example from that. By this point I was capturing the full SIP dialog and the RTP stream post firewall, just to make sure there wasn't an issue with NAT traversal (that can be a bane when dealing with SIP, you know). So I captured only what I needed (all UDP traffic on the public interface of my production firewall), and the first thing I noticed was that I was receiving an RTP stream from the remote end even when the call was on hold. Interesting. Come to think of it, that makes sense because the old UCM is not sending an "a=sendonly" audio attribute. But I was still stumped because I had tried removing that in my new environment and it did not help. Not to mention, the local inbound calls work fine even with a true send-only stream.
Intrigued by this new discovery (read: not overlooking this behavior anymore), I flipped my "a=sendonly" to "a=sendrecv" in my new environment (removing it is harder, and the SDP RFC says it's the default when missing anyway). Sure enough, a new packet capture showed that I was receiving an RTP stream from the remote device during an MOH session. But still no music on hold. Comparing the old environment SDP to the new environment SDP, I went through every difference and modified my new environment to match the old one exactly, with one exception: "m=audio 4000,".
In typical networking, a client sends data to a destination server at a particular destination port. The client must also provide a source port for that data, on which it expects to receive possible responses. In a similar way, nearly all SIP devices will use the port they advertised as their RTP destination (the one the remote end learned during the SDP exchange) as the source port of their own RTP stream. This is smart because it helps with the traversal of stateful firewalls: when a connection is initiated from a trusted network, the firewall typically allows the return traffic (traffic with the opposite source and destination IP/port) through dynamically, without question, because once a connection is initiated, traffic is expected to return. This makes a firewall administrator's life a lot easier. With iptables, the command which does this looks something like the following:
sudo iptables -A FORWARD -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT
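To make the "opposite source and destination" idea concrete, here is a toy Python model of that return-traffic rule. It is only a sketch of the matching logic under simplifying assumptions: the class and method names are invented, and real connection tracking also deals with timeouts, NAT, protocol helpers, and much more.

```python
class StatefulFirewall:
    """Toy model: remember outbound flows, allow only their exact replies."""

    def __init__(self):
        # Each flow is stored as (src_ip, src_port, dst_ip, dst_port).
        self._flows = set()

    def outbound(self, src_ip, src_port, dst_ip, dst_port):
        """A packet leaves the trusted network, so its flow is recorded."""
        self._flows.add((src_ip, src_port, dst_ip, dst_port))

    def inbound_allowed(self, src_ip, src_port, dst_ip, dst_port):
        """Inbound traffic passes only if it mirrors a recorded flow."""
        return (dst_ip, dst_port, src_ip, src_port) in self._flows


fw = StatefulFirewall()
# An inside host at 10.0.0.5 sends UDP from port 12345 to 203.0.113.9:4000.
fw.outbound("10.0.0.5", 12345, "203.0.113.9", 4000)

# A reply from 203.0.113.9:4000 mirrors the flow and is let through.
reply_ok = fw.inbound_allowed("203.0.113.9", 4000, "10.0.0.5", 12345)

# The same host answering from a different source port does not match.
reply_wrong_port = fw.inbound_allowed("203.0.113.9", 24580, "10.0.0.5", 12345)
```

The point to notice is that the source port of the return traffic matters just as much as the address: same host, wrong port, no match.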
Now, I already knew that this was how most SIP devices behaved, so as a test at one point in all of this I flipped any UDP traffic destined for my provider from my UCM to come from source port 4000. This still didn't work, so I quickly flipped it back to normal. The UCM box not only does RTP for music on hold, but also does call conferencing and media relay for our voicemail system (apologies to anyone who was trying to leave a voicemail or run a conference call during that test). The odd thing to remember is that all two-way audio works perfectly fine. What in the world!?!
So I did like Winnie The Pooh, and went back to my old UCM packet capture. Sure enough, the source port of its MOH RTP stream was in fact the same as the destination port which it had advised the remote end to send to in the SDP (even though it's music on hold). Then a light bulb went on. I'm probably not the only sysadmin in the world that runs a dynamic firewall for established connections. So the old system worked because of the combination of a proper source port and the remote destination actually sending its RTP stream to the advertised destination port. That RTP stream is required to establish a connection through the remote firewall, even though it's never actually used. OK, now I had a solid theory and I needed to test it. I don't really like disrupting production traffic, so I had to do something different this time. That's when I stumbled onto rtptools. Sweet, so I can use my test SIP proxy to completely obliterate my UCM's SDP, and make the remote end send RTP to what will be my desktop. This creates an established connection on the remote firewall, expecting return traffic from the correct address/port (my desktop, remember). Then, with my live capture running, I can discover the current RTP destination address and port of my cell phone, and use rtpplay to send a sample RTP stream with the proper source and destination parameters. OK, here goes:
./rtpplay -f ../pcap/g711u.rtp -s 4000 188.8.131.52/12345
Wow, the most beautiful 5 second slice of hold music I've ever heard played from a cell phone speaker. I played it over and over again because I couldn't believe what I was hearing.
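If you don't have rtptools handy, the essential trick (sending RTP from a chosen source port) can be sketched in a few lines of Python. This is a rough illustration and not what I ran: the function names, the SSRC value, and the use of G.711 u-law silence as payload are all assumptions made for the example, and it makes no attempt to replay a recorded stream the way rtpplay does.

```python
import socket
import struct
import time


def build_rtp_packet(seq, timestamp, ssrc, payload, payload_type=0):
    """Build a minimal RTP packet: version 2, no padding/extension/CSRC."""
    header = struct.pack(
        "!BBHII",
        0x80,                    # V=2, P=0, X=0, CC=0
        payload_type & 0x7F,     # marker bit clear; 0 = PCMU (G.711 u-law)
        seq & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )
    return header + payload


def send_rtp(src_port, dst_host, dst_port, frames,
             ssrc=0x12345678, bind_host="", interval=0.02):
    """Send each frame as one RTP packet from a fixed UDP source port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Binding before sending is what pins the source port.
        sock.bind((bind_host, src_port))
        timestamp = 0
        for seq, payload in enumerate(frames):
            packet = build_rtp_packet(seq, timestamp, ssrc, payload)
            sock.sendto(packet, (dst_host, dst_port))
            timestamp += len(payload)  # G.711: one sample per payload byte
            time.sleep(interval)
    finally:
        sock.close()


# Hypothetical usage: about five seconds of u-law silence (250 frames of
# 20 ms each) from source port 4000 to a placeholder destination.
# send_rtp(4000, "198.51.100.7", 12345, [b"\xff" * 160] * 250)
```

The bind call is the whole point: without it the operating system picks an arbitrary source port, and the remote firewall has no established connection to match the stream against.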
Now I had positively identified the problem. How would I go about solving it? Let's walk through a few things I tried in Part 2: The Almost Solution.