VoIP and SIP Deep Dive — SIP Messages, SDP, RTP, Codecs, NAT Traversal, and PBX Architecture
SIP (Session Initiation Protocol) is the text-based signaling protocol behind much of modern voice and video communication. VoIP calls, video conferences, and instant messaging sessions built on SIP use messages such as INVITE, BYE, and REGISTER to set up, manage, and tear down real-time sessions across IP networks.
What You’ll Learn
- SIP message structure and the six core methods: INVITE, BYE, REGISTER, ACK, CANCEL, OPTIONS
- SDP (Session Description Protocol) for media negotiation — codecs, IP addresses, and ports
- RTP and RTCP for real-time media transport and quality monitoring
- Codec selection: G.711, G.729, Opus, and bandwidth trade-offs
- NAT traversal using STUN, TURN, and ICE
- PBX architectures: Asterisk and FreeSWITCH
- SIP trunking for connecting to the PSTN
Why This Matters
Traditional PSTN networks are being retired worldwide. BT plans to retire the UK’s PSTN by 2027, and AT&T has announced plans to retire most of its legacy copper network by the end of the decade. Voice traffic is steadily migrating to SIP-based VoIP, so understanding SIP in depth, not just the basics, is essential for anyone building, deploying, or troubleshooting modern communication systems.
Doda Browser uses SIP-inspired WebRTC signaling for its embedded audio/video calling. DodaZIP applies SIP session patterns for managing long-running compression tasks with state tracking.
Learning Path
flowchart LR
A["Network Basics"] --> B["VoIP & SIP Basics"]
B --> C["VoIP/SIP Deep Dive<br/>You are here"]
C --> D["RTP & Codec Tuning"]
C --> E["NAT Traversal & Security"]
D --> F["PBX Deployment"]
style C fill:#f90,color:#fff
SIP Message Architecture
SIP is modeled after HTTP — request/response with headers, methods, and status codes. Every SIP message has a start line, headers, an empty line, and an optional body (usually SDP).
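The four-part layout described above (start line, headers, empty line, optional body) can be sketched by assembling a request by hand. This is a minimal illustration, not a complete SIP stack: the helper name `build_sip_request`, the addresses, the tags, and the Call-ID are all made up for the example.

```python
def build_sip_request(method, uri, headers, body=""):
    """Assemble a SIP request: start line, headers, blank line, optional body."""
    lines = [f"{method} {uri} SIP/2.0"]
    all_headers = dict(headers)
    # Content-Length is the size of the body in bytes
    all_headers["Content-Length"] = str(len(body.encode("utf-8")))
    lines += [f"{name}: {value}" for name, value in all_headers.items()]
    # SIP uses CRLF line endings; the empty line separates headers from body
    return "\r\n".join(lines) + "\r\n\r\n" + body


# Hypothetical OPTIONS request with illustrative header values
msg = build_sip_request(
    "OPTIONS",
    "sip:bob@example.com",
    {
        "Via": "SIP/2.0/UDP 192.0.2.10:5060;branch=z9hG4bKexample1",
        "From": "<sip:alice@example.com>;tag=1928301774",
        "To": "<sip:bob@example.com>",
        "Call-ID": "a84b4c76e66710",
        "CSeq": "1 OPTIONS",
        "Max-Forwards": "70",
    },
)
head, _, body = msg.partition("\r\n\r\n")
print(head.split("\r\n")[0])  # the start line
```

An OPTIONS request carries no body here, so the message ends right after the empty line and Content-Length is 0.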
SIP Methods (The Six Core)
| Method | Purpose | Direction |
|---|---|---|
| INVITE | Initiate or renegotiate a session | Client → Server |
| ACK | Confirm receipt of the final response to an INVITE | Client → Server |
| BYE | Terminate a session | Either party |
| REGISTER | Register a SIP URI with a location server | Client → Registrar |
| CANCEL | Cancel a pending INVITE | Client → Server |
| OPTIONS | Query server or UA capabilities | Client → Server |
SIP Status Codes
| Range | Meaning | Examples |
|---|---|---|
| 1xx | Provisional | 100 Trying, 180 Ringing, 183 Session Progress |
| 2xx | Success | 200 OK |
| 3xx | Redirection | 302 Moved Temporarily |
| 4xx | Client Error | 401 Unauthorized, 404 Not Found, 486 Busy Here |
| 5xx | Server Error | 500 Server Internal Error, 503 Service Unavailable |
| 6xx | Global Failure | 603 Decline |
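The table maps directly onto the first digit of the status code. A small sketch, with the helper names `classify_status` and `is_final` chosen for this example only:

```python
SIP_STATUS_CLASSES = {
    1: "Provisional",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
    6: "Global Failure",
}


def classify_status(code):
    """Return the response class for a SIP status code (100-699)."""
    if not 100 <= code <= 699:
        raise ValueError(f"not a valid SIP status code: {code}")
    return SIP_STATUS_CLASSES[code // 100]


def is_final(code):
    """1xx responses are provisional; anything from 200 up ends the transaction."""
    return code >= 200


for code in (100, 180, 200, 486, 603):
    print(code, classify_status(code), "final" if is_final(code) else "provisional")
```

The provisional/final split matters in practice: a caller keeps waiting after 180 Ringing but stops after any response of 200 or higher.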
Complete SIP Call Flow
sequenceDiagram
participant A as Alice (UA)
participant P as SIP Proxy
participant L as Location Service
participant B as Bob (UA)
A->>P: REGISTER sip:proxy.example.com
P->>L: Store A's contact
P->>A: 200 OK
Note over A,B: ---- Registration Complete ----
A->>P: INVITE sip:bob@example.com
P->>L: Where is Bob?
L->>P: Bob is at 192.168.1.20:5060
P->>B: INVITE (with SDP)
B->>P: 180 Ringing
P->>A: 180 Ringing
B->>P: 200 OK (with SDP)
P->>A: 200 OK
A->>B: ACK
Note over A,B: RTP Media Session (bidirectional audio)
A->>P: BYE
P->>B: BYE
B->>P: 200 OK
P->>A: 200 OK
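The caller's side of the flow above can be modeled as a small state machine. This is a deliberately simplified sketch: the state and event names are invented for illustration, and it does not reproduce the transaction state machines defined in RFC 3261.

```python
# (current state, event) -> next state, from Alice's point of view
TRANSITIONS = {
    ("idle", "send INVITE"): "calling",
    ("calling", "recv 180"): "ringing",
    ("calling", "recv 200"): "answered",
    ("ringing", "recv 200"): "answered",
    ("answered", "send ACK"): "established",
    ("established", "send BYE"): "terminating",
    ("terminating", "recv 200"): "terminated",
}


def run_call(events, state="idle"):
    """Apply events in order; reject any event that is invalid in the current state."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"unexpected {event!r} in state {state!r}")
        state = TRANSITIONS[key]
    return state


final_state = run_call(
    ["send INVITE", "recv 180", "recv 200", "send ACK", "send BYE", "recv 200"]
)
print(final_state)
```

Modeling the call this way makes out-of-order messages easy to spot, for example a BYE sent before the dialog is established.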
SDP — Session Description Protocol
SDP is embedded in SIP message bodies (Content-Type: application/sdp) to negotiate media capabilities between endpoints.
v=0 -- Version
o=alice 2890844526 2890844526 IN IP4 10.0.0.1 -- Origin
s=Call with Bob -- Session name
c=IN IP4 10.0.0.1 -- Connection data
t=0 0 -- Time (start and stop)
m=audio 7078 RTP/AVP 0 8 101 -- Media: port, transport, formats
a=rtpmap:0 PCMU/8000 -- G.711 μ-law
a=rtpmap:8 PCMA/8000 -- G.711 A-law
a=rtpmap:101 telephone-event/8000 -- DTMF tones
a=sendrecv -- Direction
SDP Negotiation Flow
- Caller (Alice) sends INVITE with SDP offer — lists all supported codecs
- Callee (Bob) responds with 200 OK containing SDP answer — picks codec from offer
- Both sides now know: which codec, IP address, and port for RTP
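The offer/answer steps above can be sketched as a codec-selection function. This is a simplified model that assumes the answerer picks the first codec in the offer's order that it also supports. The function names and Bob's supported set are hypothetical, and a real answer may list more than one codec.

```python
import re


def parse_audio_formats(sdp):
    """Return ([payload types in offer order], {payload type: encoding})."""
    m_line = re.search(r"^m=audio \d+ \S+ (.+)$", sdp, re.MULTILINE)
    if m_line is None:
        raise ValueError("no audio media line in SDP")
    payload_types = [int(pt) for pt in m_line.group(1).split()]
    rtpmap = {
        int(pt): encoding
        for pt, encoding in re.findall(r"^a=rtpmap:(\d+) (\S+)", sdp, re.MULTILINE)
    }
    return payload_types, rtpmap


def choose_codec(offer_sdp, supported):
    """Pick the first offered voice codec the answerer supports (skips DTMF events)."""
    payload_types, rtpmap = parse_audio_formats(offer_sdp)
    for pt in payload_types:
        encoding = rtpmap.get(pt)
        if encoding in supported and not encoding.startswith("telephone-event"):
            return pt, encoding
    return None


offer = """v=0
o=alice 2890844526 2890844526 IN IP4 10.0.0.1
s=Call with Bob
c=IN IP4 10.0.0.1
t=0 0
m=audio 7078 RTP/AVP 0 8 101
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000
a=sendrecv"""

bob_supported = {"PCMA/8000", "telephone-event/8000"}  # assumed for the example
print(choose_codec(offer, bob_supported))
```

If no codec overlaps, the function returns None, which corresponds to the answerer rejecting the offer.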
RTP and RTCP
RTP (Real-time Transport Protocol) carries the actual audio/video data. RTCP (RTP Control Protocol) monitors quality.
RTP Header Structure
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Key fields:
- PT (Payload Type): 0 = PCMU, 8 = PCMA, 18 = G.729
- Sequence number: Detect packet loss (gaps in sequence)
- Timestamp: Sample timing for jitter buffer
- SSRC: Unique stream identifier
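The fixed 12-byte header shown above can be decoded with the standard `struct` module. A minimal sketch: the function name `parse_rtp_header` and the sample packet values are made up for the example, and CSRC entries and header extensions are not parsed.

```python
import struct


def parse_rtp_header(packet):
    """Decode the fixed 12-byte RTP header (CSRC list and extensions are ignored)."""
    if len(packet) < 12:
        raise ValueError("RTP packet shorter than the 12-byte fixed header")
    # Network byte order: two single bytes, 16-bit sequence, 32-bit timestamp, 32-bit SSRC
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Synthetic packet: version 2, payload type 0 (PCMU), 160 bytes of payload
sample = struct.pack("!BBHII", 0x80, 0, 1000, 160, 0x12345678) + b"\xff" * 160
header = parse_rtp_header(sample)
print(header["version"], header["payload_type"], header["sequence"])
```

Watching `sequence` for gaps and comparing `timestamp` against arrival time is how a receiver detects loss and feeds its jitter buffer.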
RTCP Reports
RTCP provides feedback on call quality:
- Fraction lost: % of RTP packets lost
- Cumulative lost: Total packets lost during call
- Interarrival jitter: Variance in packet timing
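- Round-trip delay: Network latency, derived by the sender from the last-SR timestamp (LSR) and delay-since-last-SR (DLSR) fields of receiver reports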
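Two of these metrics are easy to reproduce by hand. The sketch below follows the running jitter estimate from RFC 3550 (each new transit-time difference moves the estimate by one sixteenth) and decodes the 8-bit fraction-lost field. The function names and sample values are illustrative, and timestamps are assumed to be in the same clock units.

```python
def interarrival_jitter(packets):
    """packets: iterable of (rtp_timestamp, arrival_time) in the same clock units."""
    jitter = 0.0
    prev_transit = None
    for rtp_timestamp, arrival in packets:
        transit = arrival - rtp_timestamp
        if prev_transit is not None:
            d = abs(transit - prev_transit)
            # RFC 3550 running estimate: J = J + (|D| - J) / 16
            jitter += (d - jitter) / 16
        prev_transit = transit
    return jitter


def fraction_lost_percent(field):
    """The RTCP fraction-lost field is an 8-bit fixed-point value (lost / expected * 256)."""
    return field / 256 * 100


steady = [(0, 1000), (160, 1160), (320, 1320)]   # evenly paced packets
one_late = [(0, 0), (160, 160), (320, 336)]      # third packet arrives 16 units late
print(interarrival_jitter(steady), interarrival_jitter(one_late))
print(fraction_lost_percent(64))
```

The one-sixteenth gain smooths the estimate, so a single late packet nudges jitter only slightly while sustained variation drives it up.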
Codec Comparison
| Codec | Bitrate | MOS Score | Bandwidth (RTP + IP + Eth) | Use Case |
|---|---|---|---|---|
| G.711 (PCMU/PCMA) | 64 kbps | 4.1 | ~90 kbps | PSTN interconnect, toll quality |
| G.729 | 8 kbps | 3.9 | ~34 kbps | Low-bandwidth links (remote offices) |
| G.722 | 64 kbps | 4.5 | ~90 kbps | HD voice, wideband audio |
| Opus | 6-510 kbps | 4.5+ | Variable | Best quality, adaptive bitrate |
| G.726 | 32 kbps | 3.8 | ~55 kbps | Legacy VoIP deployments |
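The bandwidth column can be approximated from the codec bitrate and per-packet overhead. This sketch assumes 20 ms packetization and counts RTP (12 bytes), UDP (8), IPv4 (20), and the Ethernet header plus FCS (18). Other packetization intervals or layer-2 accounting give different numbers, so the results land near the rounded table values rather than exactly on them.

```python
# Assumed per-packet overhead in bytes: RTP + UDP + IPv4 + Ethernet header and FCS
OVERHEAD_BYTES = 12 + 8 + 20 + 18


def per_call_bandwidth_bps(codec_bitrate_bps, ptime_ms=20, overhead_bytes=OVERHEAD_BYTES):
    """One-direction bandwidth of a call, including packet headers."""
    packets_per_second = 1000 / ptime_ms
    payload_bytes = codec_bitrate_bps / 8 / packets_per_second
    return (payload_bytes + overhead_bytes) * 8 * packets_per_second


for name, bitrate in [("G.711", 64000), ("G.729", 8000), ("G.726", 32000)]:
    print(f"{name}: {per_call_bandwidth_bps(bitrate) / 1000:.1f} kbps")
```

Because the header overhead is fixed per packet, it dominates for low-bitrate codecs: G.729 saves far less on the wire than its 8 kbps payload rate suggests.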
Real-World Codec Decision
# Estimate concurrent call capacity per codec for a call center
call_bandwidth = {
    "g711": 90000,  # bps per call
    "g729": 34000,  # bps per call
    "opus": 50000,  # bps per call (average)
}
link_bandwidth = 100_000_000  # 100 Mbps internet
for codec, bw in call_bandwidth.items():
    max_calls = link_bandwidth // bw
    print(f"{codec.upper()}: {max_calls} concurrent calls on 100 Mbps link")
Expected output:
G711: 1111 concurrent calls on 100 Mbps link
G729: 2941 concurrent calls on 100 Mbps link
OPUS: 2000 concurrent calls on 100 Mbps link
SIP Architecture Components
flowchart TB
subgraph "SIP Architecture"
UA1["User Agent<br/>Softphone"] --> Proxy["SIP Proxy"]
UA2["IP Phone"] --> Proxy
UA3["ATA"] --> Proxy
Proxy --> Registrar["Registrar Server"]
Proxy --> Redirect["Redirect Server"]
Proxy --> B2BUA["B2BUA"]
B2BUA --> Media["Media Server<br/>RTP Mixer"]
end
Component Roles
| Component | Function |
|---|---|
| User Agent (UA) | Endpoint — phone, softphone, ATA |
| Proxy Server | Routes SIP messages between UAs |
| Registrar Server | Accepts REGISTER requests, maps AOR to contact |
| Redirect Server | Returns alternative contact instead of proxying |
| B2BUA | Back-to-back UA — manages both call legs independently |
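The registrar's job of mapping an address-of-record (AOR) to a contact can be sketched as a dictionary with expiry times. This is a toy model: the class and method names are invented for the example, and a real registrar can hold several contacts per AOR and authenticates requests first.

```python
import time


class Registrar:
    """Toy location store: one contact per AOR, with an expiry time."""

    def __init__(self):
        self._bindings = {}  # AOR -> (contact URI, expiry timestamp)

    def register(self, aor, contact, expires=3600, now=None):
        now = time.time() if now is None else now
        if expires == 0:
            # Expires: 0 removes the binding (un-register)
            self._bindings.pop(aor, None)
        else:
            self._bindings[aor] = (contact, now + expires)

    def lookup(self, aor, now=None):
        """Return the current contact for an AOR, or None if absent or expired."""
        now = time.time() if now is None else now
        binding = self._bindings.get(aor)
        if binding is None or binding[1] <= now:
            return None
        return binding[0]


registrar = Registrar()
registrar.register("sip:bob@example.com", "sip:bob@192.168.1.20:5060", expires=60, now=0)
print(registrar.lookup("sip:bob@example.com", now=30))   # still registered
print(registrar.lookup("sip:bob@example.com", now=120))  # binding expired
```

A proxy performs this lookup when an INVITE arrives for an AOR, which is why a lapsed registration makes a phone unreachable.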
NAT Traversal (STUN, TURN, ICE)
NAT is one of the biggest challenges in VoIP. Private IP addresses embedded in SIP headers and SDP bodies are not routable on the public internet, so the far end cannot send signaling or media back to them.
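A quick way to spot this condition is to scan an SDP body for connection addresses that fall in the RFC 1918 private ranges. A minimal sketch: the function name is hypothetical, only IPv4 `c=` lines are checked, and non-literal addresses such as host names are skipped.

```python
import ipaddress
import re

RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def private_sdp_addresses(sdp):
    """Return the IPv4 connection addresses in an SDP body that are RFC 1918 private."""
    found = []
    for value in re.findall(r"^c=IN IP4 (\S+)", sdp, re.MULTILINE):
        try:
            addr = ipaddress.ip_address(value)
        except ValueError:
            continue  # host name or multicast notation; not checked in this sketch
        if any(addr in net for net in RFC1918_NETWORKS):
            found.append(value)
    return found


sdp = """v=0
o=alice 2890844526 2890844526 IN IP4 10.0.0.1
s=Call with Bob
c=IN IP4 10.0.0.1
t=0 0
m=audio 7078 RTP/AVP 0 8 101"""
print(private_sdp_addresses(sdp))
```

A non-empty result on a call that crosses the public internet suggests that NAT traversal is required before media can flow in both directions.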
The NAT Problem
SIP Phone at 192.168.1.10
Sends INVITE with SDP: c=IN IP4 192.168.1.10
↓
Remote endpoint receives the INVITE and sends RTP to 192.168.1.10, which is unroutable from the internet
→ One-way audio (no return path)
Solutions
| Solution | How It Works | When To Use |
|---|---|---|
| STUN | Client discovers its public IP and port via STUN server | Cone NATs (full, restricted, port-restricted); fails with symmetric NAT |
| TURN | Relays all media through a TURN server | Symmetric NAT, firewalls |
| ICE | Combines STUN + TURN, tries all candidate pairs | Universal — always preferred |
ICE Flow
- Gather candidates: host IP, STUN-reflexive IP, TURN-relayed IP
- Prioritize: host > STUN > TURN (TURN is last resort — expensive)
- Pair and check: each side tries connectivity checks
- Nominate: best working pair is selected
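The prioritization step can be made concrete with the candidate priority formula from RFC 8445, which combines a type preference, a local preference, and the component ID. The sketch below uses the type preference values recommended there (host 126, server-reflexive 100, relay 0) and assumes a single network interface, so the local preference is 65535. The function name is illustrative.

```python
# Recommended type preferences from RFC 8445
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def ice_priority(candidate_type, local_preference=65535, component_id=1):
    """priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)."""
    type_pref = TYPE_PREFERENCE[candidate_type]
    return (1 << 24) * type_pref + (1 << 8) * local_preference + (256 - component_id)


for kind in ("host", "srflx", "relay"):
    print(f"{kind:>5}: {ice_priority(kind)}")
```

Component ID 1 is the RTP component. The large weight on the type preference is what keeps relayed candidates at the bottom of the list regardless of the other two terms.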
# Simulate ICE candidate prioritization
# "priority" here is a simplified stand-in for the full ICE priority value
candidates = [
    {"type": "host", "ip": "192.168.1.10", "port": 7078, "priority": 126},
    {"type": "srflx", "ip": "203.0.113.5", "port": 3478, "priority": 100},
    {"type": "relay", "ip": "198.51.100.1", "port": 3478, "priority": 50},
]
sorted_candidates = sorted(candidates, key=lambda c: c["priority"], reverse=True)
print("ICE candidate priority order:")
for c in sorted_candidates:
    print(f" {c['type']:>5}: {c['ip']}:{c['port']} (priority {c['priority']})")
Expected output:
ICE candidate priority order:
  host: 192.168.1.10:7078 (priority 126)
 srflx: 203.0.113.5:3478 (priority 100)
 relay: 198.51.100.1:3478 (priority 50)
PBX Architecture (Asterisk, FreeSWITCH)
Asterisk Configuration
; /etc/asterisk/sip.conf (chan_sip syntax; chan_sip was removed in Asterisk 21, which uses pjsip.conf instead)
[general]
context=public
allow=ulaw,alaw,g729,g722
nat=force_rport,comedia
qualify=yes
[1001]
type=friend
host=dynamic
secret=secure_pass
context=internal
mailbox=1001@default
[1002]
type=friend
host=dynamic
secret=secure_pass
context=internal
mailbox=1002@default
[trunk_sipgate]
type=peer
host=sip.sipgate.com
username=myaccount
secret=mypass
fromuser=myaccount
context=from_trunk
insecure=port,invite
; /etc/asterisk/extensions.conf
[internal]
; Internal extensions
exten => 1001,1,Dial(SIP/1001,30)
exten => 1001,n,VoiceMail(1001@default)
exten => 1002,1,Dial(SIP/1002,30)
exten => 1002,n,VoiceMail(1002@default)
; Outbound calls via SIP trunk
exten => _0X.,1,Dial(SIP/${EXTEN}@trunk_sipgate)
exten => _0X.,n,Hangup()
; IVR Menu
exten => s,1,Answer()
exten => s,n,Background(welcome)
exten => s,n,WaitExten(5)
exten => 1,1,Dial(SIP/1001,20)
exten => 2,1,Dial(SIP/1002,20)
FreeSWITCH Comparison
| Feature | Asterisk | FreeSWITCH |
|---|---|---|
| Configuration | Multiple config files | XML-based unified config |
| Media handling | Bridged RTP | Native media engine |
| Scalability | Good for small-medium | Better for high-density |
| Protocol support | SIP, IAX2, MGCP | SIP, WebRTC, Verto |
| Clustering | Limited native | Built-in clustering |
SIP Trunking
SIP trunks connect your PBX to the PSTN via the internet, replacing traditional phone lines and ISDN PRI.
Asterisk PBX → SIP Trunk Provider → PSTN → Mobile/Landline
Advantages over PRI:
✓ No physical lines needed
✓ Scalable (add channels as needed)
✓ Geographic number portability
✓ Usually cheaper per channel
Common Errors
1. One-way audio (no return path)
RTP packets flow in only one direction. Root causes: NAT without STUN/TURN, firewall blocking RTP ports, SIP ALG corrupting SDP. Check: tcpdump -ni any portrange 10000-20000 to see RTP flow.
2. SIP 403 Forbidden on REGISTER
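The registrar refused the request. A 403 typically means the credentials were wrong after the 401 challenge, or the account or source IP is not permitted. Check the username and password, and verify that the realm matches. In Asterisk, qualify=yes only monitors peer reachability with OPTIONS requests; it does not validate credentials.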
3. Codec mismatch — “No common codecs”
The SDP offer contains only codecs that the answerer doesn’t support. Ensure both sides allow at least one overlapping codec. Use allow=all temporarily for debugging, then restore an explicit codec list.
4. SIP ALG corruption
Many consumer routers rewrite SIP/SDP packets, breaking VoIP. Disable SIP ALG in the router configuration. This is one of the most common causes of VoIP problems.
5. Jitter buffer underrun / overrun
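Network jitter causes audio gaps or delays. In Asterisk’s chan_sip, enable the jitter buffer with jbenable=yes and tune jbmaxsize and jbresyncthreshold. The jitterbuffer, maxjitterbuffer, and resyncthreshold options belong to chan_iax2.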
6. Registration timeout with failover
Phones fail to register due to temporary network issues. Configure the registration timeout and retry interval (the exact setting names are vendor-specific) and secondary registrars for high availability.
7. Echo on calls
Acoustic or electrical echo. Causes: speakerphone, poor headset, or network echo from the PSTN gateway. If the gateway is an Asterisk system with DAHDI hardware, check the echocancel and echocancelwhenbridged settings in chan_dahdi.conf.
Practice Questions
What is the purpose of the SDP body in a SIP INVITE? SDP negotiates media parameters — codecs, IP addresses, ports, and stream direction (sendrecv/sendonly/recvonly).
How does ICE solve NAT traversal? ICE gathers multiple candidate addresses (host, STUN-reflexive, TURN-relayed), pairs them, and performs connectivity checks to find the best working path.
What is the difference between G.711 and G.729? G.711 carries 64 kbps companded PCM: toll quality but high bandwidth. G.729 compresses speech to 8 kbps: slightly lower quality at one-eighth the codec bitrate (roughly 34 kbps versus 90 kbps on the wire once packet headers are counted), which suits low-speed links.
What is a B2BUA in SIP? A Back-to-Back User Agent terminates both call legs independently — it receives the call as a UAS and originates a new call as a UAC. Used in PBXs for features like call recording and transfer.
Why should SIP ALG be disabled on routers? SIP ALG (Application Layer Gateway) rewrites SIP/SDP packets, often corrupting them and breaking VoIP. Disabling it is the commonly recommended fix.
Challenge: Design a distributed VoIP architecture for a company with 500 employees across three offices (New York, London, Singapore). Office locations are connected via MPLS with 10 Mbps links. Specify (1) PBX type per office, (2) codec selection for inter-office calls, (3) NAT traversal for remote workers, (4) SIP trunk provider selection for PSTN access per region, (5) failover strategy if the MPLS link goes down.
Try It Yourself
Use Python to parse and analyze SIP messages:
import re

sip_invite = """INVITE sip:bob@192.168.1.20 SIP/2.0
Via: SIP/2.0/UDP 192.168.1.10:5060;branch=z9hG4bK74b43
From: <sip:alice@example.com>;tag=12345
To: <sip:bob@example.com>
Call-ID: abcdef-12345
CSeq: 1 INVITE
Content-Type: application/sdp
Content-Length: 142

v=0
o=alice 2890844526 2890844526 IN IP4 192.168.1.10
s=-
c=IN IP4 192.168.1.10
t=0 0
m=audio 7078 RTP/AVP 0 8 101
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:101 telephone-event/8000"""

# Split the message at the first blank line: header section, then SDP body
head, _, sdp_part = sip_invite.partition("\n\n")
header_lines = head.split("\n")

# Parse SIP method from the request line
method = header_lines[0].split(" ")[0]
print(f"SIP Method: {method}")

# Parse headers (skip the request line so it is not mistaken for a header)
headers = {}
for line in header_lines[1:]:
    if ":" in line and not line.startswith(" "):
        key, val = line.split(":", 1)
        headers[key.strip()] = val.strip()
print(f"From: {headers.get('From', 'N/A')}")
print(f"To: {headers.get('To', 'N/A')}")

# Parse SDP codecs
codecs = re.findall(r"a=rtpmap:(\d+) (.+)", sdp_part)
print(f"Offered codecs: {[c[1] for c in codecs]}")
Expected output:
SIP Method: INVITE
From: <sip:alice@example.com>;tag=12345
To: <sip:bob@example.com>
Offered codecs: ['PCMU/8000', 'PCMA/8000', 'telephone-event/8000']
What’s Next
| Tutorial | What You’ll Learn |
|---|---|
| VoIP Basics Guide | Foundational VoIP concepts |
| 5G Networks | How 5G enhances mobile VoIP |
| SS7 Signaling | Legacy telecom signaling protocol |
Built by the developers of Doda Browser, DodaZIP, and Durga Antivirus Pro. Updated 2026-06-20.