Georg-August-Universität Göttingen, Zentrum für Informatik
ISSN 1612-6793, Number GAUG-ZFI-BM-2007-28
Master's thesis in the degree programme "Applied Computer Science"
RTP over Datagram TLS
John-Patrick Wowra
Computer Networks Group
Bachelor's and Master's Theses of the Zentrum für Informatik
at the Georg-August-Universität Göttingen
17 September 2007
Georg-August-Universität Göttingen, Zentrum für Informatik
Lotzestraße 16-18, 37083 Göttingen, Germany
Tel. +49 (5 51) 39-1 44 14
Fax +49 (5 51) 39-1 44 15
Email office@informatik.uni-goettingen.de
WWW www.informatik.uni-goettingen.de
I hereby declare that I have written this thesis independently and have used no sources or aids other than those indicated.
Göttingen, 17 September 2007
Supervised by Prof. Dr. Fu, Computer Networks Group
Georg-August-Universität Göttingen
Acknowledgement
I would like to acknowledge my advisor Prof. Dr. Xiaoming Fu for excellent guidance, motivation and encouragement, my parents and Katerina for their support and Christian Dickmann for his patience and helpfulness.
Abstract
The popularity of Internet Telephony has been rising continuously in recent years. With a rising number of users, the number of malicious users inevitably rises as well. Hence security is a major concern for Internet Telephony.
RTP is commonly used in Internet Telephony for the transmission and reception of audio and video data. Traditionally, RTP runs over UDP, and RTP traffic is in most cases transmitted without any protection.
Datagram TLS is a modified version of TLS that functions properly over datagram transport. This thesis studies an RTP extension based on DTLS. It includes the development of a prototype implementation and a further analysis of the design towards securing RTP and thus Internet Telephony.
Contents
1 Introduction
  1.1 Motivation
  1.2 Goals
  1.3 Thesis Organisation
2 Background
  2.1 Voice over IP
  2.2 Real Time Transport Protocol
  2.3 SSL/TLS and DTLS
  2.4 Session Initiation Protocol SIP
3 Related Work
  3.1 Security in VoIP
    3.1.1 Internet Protocol Security, IPsec
    3.1.2 Comparison between IPsec and DTLS
  3.2 Secure Real Time Transport Protocol
4 Security Considerations for VoIP
  4.1 Introduction
    4.1.1 Confidentiality in VoIP
    4.1.2 Availability in VoIP
  4.2 Threats and Attacks
5 RTP over DTLS
  5.1 Introduction to RTP over DTLS
    5.1.1 SRTP Compatibility Mode
    5.1.2 Packet size Comparison
    5.1.3 Security Considerations
6 Implementation Design
  6.1 Analysis of Requirements
  6.2 System Idea/Intent
    6.2.1 DTLS
    6.2.2 RTP
    6.2.3 SIP Softphone
  6.3 RTP over DTLS
  6.4 Choice of Libraries
    6.4.1 OpenSSL
    6.4.2 CCRTP
    6.4.3 Twinkle Softphone
7 Design Details
  7.1 Design Components: RTP - ccRTP, DTLS - OpenSSL and SIP - Twinkle
    7.1.1 OpenSSL
    7.1.2 Socket Initialisation
    7.1.3 Session Initialisation with ccRTP
    7.1.4 Sending Data
    7.1.5 Receiving Data
    7.1.6 Closing Sessions
    7.1.7 Types of Sessions
  7.2 SIP Session Initiation with Twinkle
  7.3 Implementation Process
  7.4 Class Structure
  7.5 Problems and Discussion
8 Testing
  8.1 Testing Methodology
  8.2 Testbed Setup
  8.3 Measurement Methods and Tools
  8.4 Results
  8.5 Standard RTP Packet Delay
  8.6 RTP over DTLS Packet Delay
  8.7 CPU Usage
  8.8 Test Summary
9 Conclusion and Future Work
  9.1 Conclusion
  9.2 Future Work and Open Issues
Bibliography
List of Figures
2.1 Structure of an RTP packet
2.2 Schematic representation of the SSL handshake protocol with two-way authentication with certificates [1]
2.3 DTLS in the TCP/IP stack
2.4 DTLS packet structure
2.5 DTLS state machine [2]
2.6 Initialisation of a SIP session
3.1 IPsec in the TCP/IP stack
3.2 Structure of an IPsec packet with AH
3.3 Structure of an IPsec packet with ESP
5.1 Structure of an RTP packet sent over DTLS
7.1 Implementation status after phase 1
7.2 Implementation status after phase 2
7.3 Implementation status after phase 3
7.4 RTP over DTLS class structure
8.1 Testbed for RTP over DTLS tests
8.2 Delay for normal RTP packets
8.3 Delay for RTP over DTLS packets
1 Introduction
1.1 Motivation
Today enterprises have to maintain two networks in order to use both Internet and telephone services. However, traditional landline phones are gradually being replaced by Internet phones because of the advantages these offer.
Internet Telephony is the routing of voice information over the Internet (or other IP-based networks). The telephone calls are handled by protocols which are commonly referred to as Voice over Internet Protocol (VoIP). VoIP technology provides a wide range of services to users; video calls, for example, are offered as an additional feature. VoIP calls are also cheaper than traditional phone calls, and calls between two VoIP participants are even free. Enterprises with branches in different cities that are connected by a VPN can use VoIP technology for internal communication between the branches and thereby reduce costs significantly.
Besides reducing the cost of calls, VoIP has made the infrastructure more flexible because, in contrast to traditional Telephony, it provides open platforms. In traditional Telephony networks, standards were known only to a small circle of developers at the network provider. With VoIP, the protocols, software and tools can be improved and adjusted to the needs of the users by anyone, not only by their original developers.
The total number of VoIP users has been rising continuously over the past years. With a rising number of users, the number of malicious users inevitably rises as well. Hence security is a major concern for Internet Telephony. Since VoIP is based on IP [3], it is vulnerable to all of the attacks that can plague traditional IP networks, such as packet snooping, unauthorised access, spoofing and especially denial of service attacks. A conversation over a traditional phone is usually established over the communication provider's network. All companies involved in the connection are known and have to be trusted. With VoIP, data is transmitted through many networks whose providers are not all known. Anyone with access to a machine along the path of communication could access the transmitted data. Therefore VoIP calls are more vulnerable to eavesdropping than landline calls. This problem is, however, already known from other applications that transmit confidential data over an insecure network such as the Internet.
Cryptographic protocols can be used to protect data from being eavesdropped on or altered. A well-known security protocol that is considered reliable is Secure Sockets Layer/Transport Layer Security (SSL/TLS) [4]. SSL/TLS resides above the transport layer and commonly uses the Transmission Control Protocol (TCP) [5] or a similar reliable transport. TCP is a reliable, connection-oriented protocol with mechanisms for buffering and retransmission. These mechanisms ensure that the received data is exactly the same as the data transmitted. This is, however, not the primary requirement in VoIP, and the buffering and retransmission mechanisms are precisely the problem. The data is sent over IP, which is unreliable: packets might not arrive in the order in which they were transmitted, or they can get lost on the way. TCP puts the packets back into the right order and waits for lost packets to be retransmitted so that the data can be reassembled as it was sent. This is very useful for services like e-mail, where the received data must be identical to the data transmitted. In a VoIP call, however, a stream of data is played continuously to the receiver, and a delay caused by retransmission results in a pause in the media stream.
A delay in VoIP is defined as the time the voice takes on its way from the mouth of the speaker to the ear of the listener. It is the sum of the time needed to digitise the voice into audio data, to split the stream of audio data into data packets and to transmit the data to the destination. Delays are well known from traditional telephony. For instance, long distance phone calls used to have quite long delays before the spoken word was received on the other side. Such delays make a fluent conversation, as in a face-to-face conversation, impossible. Therefore the highest priority in VoIP traffic is not the exact transmission of the data; instead, the data needs to be transmitted to the receiver as fast as possible to reduce the delays caused by the transmission over the Internet. Thus VoIP protocols such as the Real Time Transport Protocol (RTP) [6] rely on connectionless transmission using the User Datagram Protocol (UDP) [7]. UDP has no mechanism for the retransmission of lost data packets. Hence in RTP lost, damaged or late packets are discarded and the media stream is played continuously. If a packet gets lost, the next received packet is played immediately, and as long as the number of lost packets does not exceed a certain threshold, the receiver does not even notice that packets are missing.
With the goal of securing real-time media such as VoIP, TLS was enhanced to work with UDP datagrams. This enhancement of TLS is called Datagram Transport Layer Security (DTLS) [8]; it was standardised in spring 2006 by the Internet Engineering Task Force (IETF, footnote 1). At the same time the IETF published an Internet Draft on RTP over DTLS [9].
The core of this thesis is the design, implementation and testing of a prototype of RTP over DTLS.
Footnote 1 (IETF): http://www.ietf.org/
1.2 Goals
The goal of this thesis is to contribute to the design of a unified media security framework for Internet Telephony, using RTP and DTLS. The focus lies on the interaction of the RTP and DTLS components of the framework. A critical aspect for the efficiency of the implementation is packet loss. Packet loss in media streaming occurs when data packets do not arrive within the time limit that still allows them to be inserted into the data stream. The critical factor here is the delay, the total time it takes to transmit voice data from caller to callee. The recommended threshold for delays in Telephony is 150 milliseconds according to the Telecommunication Standardisation Sector of the International Telecommunication Union (ITU-T, footnote 2). For Telephony a packet loss rate of up to 5% is still acceptable according to the ITU-T. Therefore the implementation of RTP over DTLS shall provide a packet loss rate lower than 5%.
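The relation between delay and packet loss described above can be made concrete with a small sketch. It assumes that a packet counts as lost either when it never arrives or when its one-way delay exceeds a playout deadline, and it uses the 150 ms and 5% figures quoted above as that deadline and target. The function names and the sample values are hypothetical and are not taken from the prototype.

```python
# Illustrative sketch only: names and sample data are assumptions.
MAX_DELAY_MS = 150.0    # delay threshold quoted in the text
MAX_LOSS_RATE = 0.05    # acceptable packet loss rate quoted in the text


def effective_loss_rate(delays_ms, deadline_ms=MAX_DELAY_MS):
    """Return the fraction of packets that cannot be played.

    delays_ms holds one entry per sent packet: the one-way delay in
    milliseconds, or None if the packet never arrived.
    """
    if not delays_ms:
        return 0.0
    unusable = sum(1 for d in delays_ms if d is None or d > deadline_ms)
    return unusable / len(delays_ms)


def meets_loss_target(delays_ms):
    """True if the effective loss rate stays below the 5% target."""
    return effective_loss_rate(delays_ms) < MAX_LOSS_RATE


# 100 packets: 96 arrive in time, 2 arrive too late, 2 never arrive.
sample = [40.0] * 96 + [180.0, 210.0] + [None, None]
```

For this sample, four of the 100 packets are unusable, so the effective loss rate is 4% and the target is met even though only two packets were actually dropped by the network.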
A prototype of a VoIP application using RTP over DTLS was implemented in order to determine whether RTP traffic can be transmitted effectively over DTLS without compromising the quality of the call. This prototype was developed on the basis of existing implementations of RTP and DTLS. The technical premises and detailed requirements of this implementation had to be analysed in order to arrive at an adequate approach. The prototype was tested with respect to the critical aspects in order to determine the usability of the approach and to pave the way for further development of the unified media security framework.
1.3 Thesis Organisation
This thesis is organised as follows. Chapter 2 gives an overview of the basic VoIP components as an introduction and as background for the rest of the thesis. Chapter 3 presents related work and discusses alternative concepts for securing Internet Telephony. Chapter 4 provides security considerations and an overview of possible attacks on VoIP traffic. Chapter 5 presents the general concept of RTP over DTLS. Chapter 6 deals with the implementation design and the choice of libraries. Chapter 7 describes the implementation process and the problems encountered, and explains in detail how the libraries work together. In Chapter 8 the prototype is tested in order to evaluate the approach. Chapter 9 contains the conclusion and deals with open issues and future work.
Footnote 2 (ITU-T): http://www.itu.int/ITU-T/
2 Background
This chapter describes the basic concepts that are necessary to understand this thesis. An introduction to Internet Telephony is given along with a description of the main protocols used in this thesis. Due to space limitations the level of detail is kept moderate; interested readers are referred to the references of this thesis.
2.1 Voice over IP
VoIP, also called IP Telephony, Internet telephony, Broadband telephony, Broadband Phone and Voice over Broadband, is the routing of voice conversations over the Internet or any other IP-based network. The first transmissions of digitised audio data from one computer to another were achieved in 1973 in the Advanced Research Projects Agency Network (ARPANET, footnote 1) with a throughput of 3,490 bit/s [10].
A VoIP call is established in a similar way to a traditional phone call. There are three general phases: connection initiation, transmission of voice and connection termination. Initiation and termination are handled by a signalling protocol. A common signalling protocol today is the Session Initiation Protocol (SIP) [11], which is presented in detail in Section 2.4; others are H.323 [12], IAX (InterAsterisk eXchange) [10] and Skype (footnote 2) [13].
Footnote 1 (ARPANET): http://www.darpa.mil/
Footnote 2 (Skype): http://www.skype.com
To initiate a VoIP call with SIP, the caller invites the callee to a so-called session. In order to establish a connection, the caller needs to know whether and where the callee is available. Subscribers of a SIP provider have a so-called SIP Uniform Resource Identifier (SIP URI). These addresses are written in URI format and resemble e-mail addresses (e.g. sip:username@example.com). Before a user can call another user or receive a call, the user's terminal device must register with the central server of the SIP provider and thereby inform the provider that the user is online and ready to receive calls. The server now has information about the location of the logged-in user, so that the user is reachable by other SIP users through the server.
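The registration step can be pictured as a simple lookup table on the provider's server. The following sketch is not a SIP implementation: the class name, the method names and the expiry handling are assumptions made for illustration, and no SIP messages are parsed or sent.

```python
import time


class Registrar:
    """Toy location service mapping a SIP URI to a device's current contact."""

    def __init__(self):
        self._bindings = {}   # sip_uri -> (contact_address, expiry_time)

    def register(self, sip_uri, contact, expires_s=3600, now=None):
        """Record where the user can currently be reached."""
        now = time.time() if now is None else now
        self._bindings[sip_uri] = (contact, now + expires_s)

    def lookup(self, sip_uri, now=None):
        """Return the contact address, or None if the user is not reachable."""
        now = time.time() if now is None else now
        binding = self._bindings.get(sip_uri)
        if binding is None:
            return None
        contact, expiry = binding
        if now >= expiry:
            # The registration has lapsed, so the user counts as offline.
            del self._bindings[sip_uri]
            return None
        return contact


registrar = Registrar()
registrar.register("sip:username@example.com", "192.0.2.10:5060")
```

A caller only needs the stable SIP URI; the server resolves it to whatever address the callee's device registered most recently, which is why the callee stays reachable when changing location.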
For connection initiation the caller sends an invite message to the server, which forwards it to the callee, whose terminal device then starts ringing. When the callee accepts the call, an accept message is sent back to the caller along with the current IP address of the callee. Once the session has been initiated, the servers are no longer needed for it. The media channel is now established directly between the participants with RTP. A detailed description of a VoIP call initiation using SIP is provided in Section 2.4. It is generally possible (e.g. with SIP) to establish a connection directly between caller and callee without servers, but then the IP address of the callee must be known to the caller. This is impractical, as experience with telephone numbers and with the Internet shows: nobody remembers a website by its IP address, but rather by its domain name. A name is much easier to associate with a person or company and much easier to remember. Furthermore, IP addresses depend on the user's location (e.g. at work or at home).
The transport of the audio data is achieved with the Real Time Transport Protocol (RTP) [6], which is presented in detail in Section 2.2. RTP divides the audio data stream into small packets, which are then transmitted via IP, usually directly from speaker to listener. At the receiving end an audio stream is reconstructed from the received data packets and played to the listener.
In enterprises VoIP is used more and more to reduce infrastructure costs, since only one network infrastructure is needed instead of two, one for IP and one for Telephony. For enterprises and private users alike, a great benefit is the saving on call charges. Calls from VoIP to VoIP are normally free. Enterprises therefore tend to use VoIP for internal communication and traditional Telephony for outbound calls. Connections to landline phones are possible through gateway services (provided, for example, by SIP providers), but these connections are usually charged. So that customers can be reached from a traditional phone through such a gateway, providers offer them a landline phone number in addition to their SIP address. As with e-mail, users are reachable under the same address or telephone number worldwide, regardless of their current location, as long as they are connected to the Internet.
A large variety of devices that can connect to networks can be used as terminal devices (IP phones, mobile phones, PCs, PDAs, analogue phones with special adapters, ...). Another benefit of VoIP is the flexibility provided by the open standards, which allow new services to be added to VoIP easily. With reduced costs, increased reachability, flexibility and additional services like video calls, VoIP will play a significant role in the future of Telephony.
2.2 Real Time Transport Protocol
In VoIP, RTP [6] is the protocol commonly used for the transmission of audio data. RTP provides end-to-end delivery services for data with real-time characteristics, such as interactive audio and video or simulation data. These services include payload type identification, sequence numbering, timestamping and delivery monitoring.
Typically RTP runs over UDP in order to achieve timely delivery of the data packets. TCP is not used because of its retransmission mechanism: since TCP delivers data strictly in order, waiting for a retransmitted packet leads to head-of-line blocking in the media stream, which delays packets that arrived in time. RTP itself does not provide any mechanism to ensure timely delivery or other quality of service guarantees, but relies on lower-layer services to do so (e.g. NSIS, footnote 3 [14]).
Footnote 3 (NSIS): http://tools.ietf.org/wg/nsis/
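Head-of-line blocking can be illustrated with a few numbers. The sketch below compares when packets become available to the application under strictly ordered delivery, as with TCP, and under datagram delivery, where a lost packet is simply skipped. The function names and the arrival times are invented for the example; no real network traffic is involved.

```python
def in_order_delivery_times(arrival_ms):
    """Delivery time of each packet when data is handed over strictly in order.

    arrival_ms[i] is the time at which packet i finally arrives,
    including any retransmission delay.
    """
    delivered = []
    latest = 0
    for t in arrival_ms:
        latest = max(latest, t)   # a packet has to wait for all earlier ones
        delivered.append(latest)
    return delivered


def datagram_delivery_times(arrival_ms):
    """Delivery times when lost packets (None) are skipped, not awaited."""
    return [t for t in arrival_ms if t is not None]


# Packets are sent every 20 ms with 30 ms network delay. Packet 2 is lost;
# with retransmission it arrives only after 270 ms.
reliable = [30, 50, 270, 90, 110]
unreliable = [30, 50, None, 90, 110]
```

With ordered delivery, packets 3 and 4 are held back until 270 ms although they arrived at 90 ms and 110 ms. With datagram delivery they are played on time and only packet 2 is missing.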
VoIP applications using RTP require at least two participants who communicate by transmitting multimedia (voice and/or video) data to each other. An association among a set of participants communicating with RTP is called an RTP session or conference. A participant may be involved in multiple RTP sessions at the same time.
The data transport of RTP is augmented by the RTP Control Protocol (RTCP) [6], which allows monitoring of data delivery in a manner scalable to large multicast networks and provides minimal control and identification functionality. RTCP is based on the periodic transmission of control packets to all participants in the session. Its primary function is to provide feedback on the quality of the data distribution. Its second function is to carry a persistent transport-level identifier for an RTP source, called the canonical name or CNAME. While other identifiers, such as the SSRC explained later, may change during a session, the CNAME remains the same. It is used to identify a participant during a session. Because each participant sends its control packets to all other participants of a session, each can independently observe the number of participants. This number is used to calculate the rate at which the control packets are sent: more users in a session result in less frequent transmission of RTCP packets by each participant. This is necessary because otherwise the RTCP traffic could take bandwidth from the connection and CPU cycles that are needed by the RTP data traffic.
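How the reporting interval grows with the number of participants can be sketched as follows. This is a simplified form of the interval computation in the RTP specification [6]: the 5% share of the session bandwidth and the 5 second minimum are taken from that specification and not from this thesis, while the separate treatment of senders and receivers and the randomisation of the interval are omitted. The function name and the default packet size are assumptions.

```python
def rtcp_interval_s(participants, session_bw_bps, avg_rtcp_packet_bytes=100,
                    rtcp_fraction=0.05, min_interval_s=5.0):
    """Approximate time between RTCP packets sent by one participant."""
    # Bandwidth reserved for all RTCP traffic in the session, in bytes/s.
    rtcp_bw_bytes_per_s = session_bw_bps * rtcp_fraction / 8.0
    # All participants share this budget, so each one sends less often
    # as the session grows.
    interval = participants * avg_rtcp_packet_bytes / rtcp_bw_bytes_per_s
    return max(interval, min_interval_s)


small_session = rtcp_interval_s(2, 64000)      # clamped to the minimum
large_session = rtcp_interval_s(100, 64000)    # about 25 seconds
```

Because the total RTCP budget is fixed, the control traffic stays at a constant share of the session bandwidth no matter how many participants join.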
To establish an RTP session a pair of ports is reserved, one for audio data and the other for control (RTCP) packets. Each participant in the RTP session uses an audio conferencing application (the so-called VoIP phone), which sends audio data in small chunks of approximately 20 ms duration. Each chunk of audio is preceded by an RTP header that indicates what kind of audio encoding (e.g. PCM, ADPCM or LPC) is contained in the packet, so that senders can change the encoding during a conference. To cope with lost packets and delays, the RTP header contains timing information and a sequence number that allow the receivers to reconstruct the timing produced by the source. Hence the audio stream can be played out continuously. Conferences with both audio and video are realised by transmitting each medium in a separate RTP session.
Figure 2.1: Structure of an RTP packet
In case one of the participants of an RTP session has a lower-bandwidth connection to the network than the other participants, an RTP proxy (a so-called mixer) can be used. The mixer is placed near the low-bandwidth area and resynchronises incoming audio packets from multiple sources into a single audio stream. The audio data can additionally be compressed further with a different codec, which enables the user with the low-bandwidth connection to receive the audio of multiple sources. Mixers can also be used to combine the video streams of multiple sources into a single video stream that shows a group scene of the participating users.
The source of a stream of RTP packets is identified by a numeric value in the header of the RTP packets. This 32-bit value is called the Synchronisation Source (SSRC) identifier; it makes the identification of a source independent of the network address. Since all packets from an SSRC form part of the same timing and sequence number space, a receiver can group packets by SSRC for playback. The outgoing RTP packets of a mixer are identified by the mixer's own SSRC value.
The structure of an RTP packet is shown in Figure 2.1. The RTP payload consists of the media data being transmitted. The RTP header contains information related to the payload, e.g. the source and the encoding. The RTP packet is wrapped in a UDP packet, which in turn is encapsulated in an IP packet to be transferred over an IP-based network.
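The header fields mentioned in this section (payload type, sequence number, timestamp and SSRC) can be shown in a minimal sketch of the 12-byte fixed RTP header as laid out in the RTP specification [6]. The sketch is an illustration and not the code of the prototype: it ignores header extensions and padding, and the function names are hypothetical.

```python
import struct

RTP_VERSION = 2
# Two flag bytes, 16-bit sequence number, 32-bit timestamp, 32-bit SSRC,
# all in network byte order: 12 bytes in total.
_FIXED_HEADER = struct.Struct("!BBHII")


def pack_rtp(payload_type, seq, timestamp, ssrc, payload, marker=False):
    """Build an RTP packet without padding, extension or CSRC entries."""
    byte0 = RTP_VERSION << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = _FIXED_HEADER.pack(byte0, byte1, seq & 0xFFFF,
                                timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def unpack_rtp(packet):
    """Split an RTP packet into its header fields and payload."""
    byte0, byte1, seq, timestamp, ssrc = _FIXED_HEADER.unpack_from(packet)
    if byte0 >> 6 != RTP_VERSION:
        raise ValueError("not an RTP version 2 packet")
    csrc_count = byte0 & 0x0F            # each CSRC entry is 4 bytes long
    offset = _FIXED_HEADER.size + 4 * csrc_count
    return {
        "payload_type": byte1 & 0x7F,
        "marker": bool(byte1 >> 7),
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[offset:],
    }


# One 20 ms chunk of 8 kHz audio at one byte per sample is 160 bytes.
packet = pack_rtp(payload_type=0, seq=1, timestamp=160,
                  ssrc=0x1234ABCD, payload=b"\x00" * 160)
```

The resulting packet is 172 bytes long, 12 bytes of header plus 160 bytes of audio, before the UDP and IP headers are added.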
2.3 SSL/TLS and DTLS
The first versions of SSL were developed by Netscape (footnote 4) as a security protocol for Internet traffic with the Netscape Internet browser. Netscape's competitor Microsoft (footnote 5) developed its own security protocol, the Private Communications Technology (PCT), which was derived from the second version of SSL. In May 1996 the IETF chartered the Transport Layer Security (TLS) working group to standardise an SSL-like protocol and to harmonise the different approaches, with the result that SSL has since been developed further under the name TLS.
Footnote 4 (Netscape): http://www.netscape.com
Footnote 5 (Microsoft): http://www.microsoft.com
Today TLS is a widely deployed protocol for securing network traffic. It is used for protecting Internet traffic (e.g. Internet banking) with the Hypertext Transfer Protocol Secure (HTTPS) [15] and for e-mail protocols. It provides a secure channel to applications with three primary security features:
• Authentication of the server
• Confidentiality of the communication channel
• Message integrity of the communication channel
Optionally TLS can provide authentication of the client. Authentication uses public-key-based digital signatures backed by certificates. The server authenticates itself by decrypting a secret encrypted under its public key or by signing a random challenge.
The TLS handshake is a conventional two-round-trip algorithm negotiation and key establishment protocol. The most common variant is the RSA-based handshake [16]. Figure 2.2 presents the handshake, which can be divided into four phases. In phase one the client sends a client_hello to the server, which responds with a server_hello. These messages carry the latest supported protocol version, used to negotiate the version to be used, a 32-byte random value that is later used in generating the Master Secret, the Session Identifier (Session ID) and the cipher suite to use. Phases two and three are optional. In phase two the server identifies itself to the client with a certificate. In phase three the client identifies itself to the server if a certificate is available. Additionally the client verifies the server certificate, which contains the public key of the server. If the certificate cannot be verified, the connection is closed. The handshake finishes in phase four with the generation of the Master Secret, from which the symmetric keys used for encrypting and decrypting messages during the connection are derived. From then on all messages are transmitted encrypted.
Figure 2.2: Schematic representation of the SSL handshake protocol with two-way authentication with certificates [1].
Figure 2.3: DTLS in the TCP/IP stack
With the rising popularity of VoIP and other multimedia services it became necessary to use TLS with the faster UDP protocol as well. TLS itself could not be used directly, because after a packet loss the following data packets can no longer be decrypted.
Datagram Transport Layer Security (DTLS) [8], which was standardised in April 2006, is a datagram-capable version of TLS and is therefore very similar to it. The DTLS protocol allows client/server applications to communicate in a way that is designed to prevent eavesdropping, tampering and message forgery. DTLS reuses almost all the protocol elements of TLS, with minor but important modifications that allow it to work properly over unreliable transport protocols. Figure 2.3 shows DTLS in the five-layer TCP/IP protocol stack.
DTLS packets are structured as shown in Figure 2.4. In contrast to TLS, the DTLS handshake protocol uses a stateless cookie exchange to prevent denial of service. In addition, message fragmentation and reassembly were added. Since transmission takes place over datagram transport, DTLS handshake messages may be lost; DTLS therefore needs a retransmission mechanism during the handshake. This is achieved with a timer at each endpoint: each endpoint keeps retransmitting its last message until a reply is received.
Figure 2.4: DTLS packet structure
Furthermore, DTLS, unlike TLS, is vulnerable to two types of denial of service attack. The first is a standard resource consumption attack. The second is an amplification attack, in which the attacker sends a client_hello message apparently sourced by the victim, so that the server's much larger response is directed at the victim. To counter these attacks, DTLS uses the cookie exchange technique that has been used in protocols such as Photuris [17].
Before the handshake proper begins, the client must replay a cookie provided by the server in order to demonstrate that it is capable of receiving packets at its claimed IP address. The DTLS client_hello message contains a cookie field, which is empty if there is no cached cookie from a prior exchange. The message also contains the DTLS version and a list of algorithms and compression methods that the client will accept. The server responds with three messages: the server_hello contains the server's choice of version and algorithms, the certificate message contains the server's certificate chain, and the server_hello_done informs the client that the server's part of the hello exchange is complete. Because DTLS handshake messages may get lost, DTLS implements retransmission using a single timer at each endpoint. Each endpoint keeps retransmitting its last message until a reply is received.
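The cookie exchange can be illustrated with a minimal sketch. It assumes, following the suggestion in the DTLS specification [8], that the server computes the cookie as an HMAC over the client's address and hello parameters, so that the server does not have to store any per-client state before the cookie is replayed. The function names, the handling of the secret and the choice of SHA-256 are illustrative assumptions and not part of the prototype described in this thesis.

```python
import hashlib
import hmac
import os

# Server-side secret, generated once at start-up (assumption for this sketch).
SERVER_SECRET = os.urandom(32)


def make_cookie(client_ip, client_port, client_params, secret=SERVER_SECRET):
    """Derive a cookie from the client's address and hello parameters."""
    message = f"{client_ip}|{client_port}|".encode() + client_params
    return hmac.new(secret, message, hashlib.sha256).digest()


def verify_cookie(cookie, client_ip, client_port, client_params,
                  secret=SERVER_SECRET):
    """Check a replayed cookie by recomputing it, without stored state."""
    expected = make_cookie(client_ip, client_port, client_params, secret)
    return hmac.compare_digest(cookie, expected)


# The server sends this cookie to the claimed address of the client.
cookie = make_cookie("192.0.2.1", 5004, b"client-hello-parameters")
```

Only a client that really receives packets at the claimed address learns the cookie and can replay it. A client_hello with a spoofed source address therefore never gets past this step, and the server has committed no resources to it.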
A state machine implements the timer and the resulting retransmissions; it is shown in Figure 2.5. Once in the 'Read Message Fragment' state, transitions are triggered by the arrival of data fragments or by the expiry of the retransmission timer. If a data fragment is the expected next handshake message, the fragment is returned to the higher layers and the timer is cancelled. Otherwise the fragment is buffered or discarded as appropriate and the timer is allowed to continue ticking. When the retransmission timer expires, the implementation retransmits the last messages that it transmitted.
Figure 2.5: DTLS state machine [2]
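The retransmission behaviour can be sketched as a simple loop. The sketch is heavily simplified compared with the state machine: it does not handle fragments or buffering, and the network is represented by two callables supplied by the caller. The initial timeout of one second and its doubling after each expiry follow the recommendation in the DTLS specification [8] and are not stated in this thesis; the function name and the attempt limit are assumptions.

```python
def send_with_retransmit(send, wait_for_reply, initial_timeout_s=1.0,
                         max_timeout_s=60.0, max_attempts=5):
    """Send the last handshake flight until a reply arrives.

    send() transmits the flight; wait_for_reply(timeout) returns the reply,
    or None when the retransmission timer expires first.
    """
    timeout = initial_timeout_s
    for attempt in range(1, max_attempts + 1):
        send()
        reply = wait_for_reply(timeout)
        if reply is not None:
            return reply, attempt          # timer is cancelled implicitly
        # Timer expired: back off and retransmit the same flight.
        timeout = min(timeout * 2, max_timeout_s)
    raise TimeoutError("no reply after %d attempts" % max_attempts)
```

In a real endpoint, wait_for_reply would read from the UDP socket with the given timeout, and send would write the buffered handshake messages to it again.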
DTLS is well suited for use with VoIP because it combines the security of TLS with the fast delivery of UDP and thereby fills a gap left by the existing protocols.
2.4 Session Initiation Protocol SIP
The Session Initiation Protocol is a protocol that enables multi-user communication sessions regardless of media content. SIP is specified by the IETF in RFC 3261 [11]. SIP emerged in the mid-1990s from the work of researchers, some of whom were also involved in the specification of RTP. SIP is an application-layer control protocol that can establish, modify and terminate multimedia sessions such as VoIP calls.
SIP transparently supports name mapping and redirection services, which support
personal mobility. SIP thereby provides the basic functions required in communications, such as:
• User location
• User availability
• User capabilities
• Session setup
• Session management
User location: SIP determines the location of a user by a registration process. When a VoIP
phone is activated, it sends out a registration to the SIP server announcing availability
to the communications network.
User availability: User availability is a method of determining whether a user would be
willing to answer a request to communicate. A user can have several locations
registered, but might only accept incoming communications on one device. If the call is
not answered there, it is transferred to another device or an application, such as voice-mail.
User capabilities: There are many methods and standards of multimedia communication;
this method checks the users’ capabilities, for example whether a camera for video
calls is available or which encryption/decryption methods a user can support.
Session setup: SIP establishes the session parameters for both ends of the communication.
This is the actual session establishment, when one user calls and another user answers.
Session management: This method manages for example the transfer of a call from one
device to another (e.g. from a laptop to a mobile-phone and vice versa) without caus-
ing a noticeable impact to the communication partner. Another example is the invi-
tation to a third user to a VoIP session and thereby the establishment of a conference
call (multiuser session).
SIP is not a vertically integrated communications system. Rather, SIP is a component that
can be used with other IETF standards, such as RTP, to build a complete multimedia
architecture. An important feature of SIP is that it does not define the type of session that
is being established, only how it should be managed. This flexibility means that SIP can
be used for a huge number of applications and services. To date, the 3G community
(http://www.3gpp.org/) has selected SIP as the session control mechanism for the next
generation of cellular networks. Microsoft has chosen SIP for its real-time communications
strategy and has deployed it in various products.
There are four major components in the SIP architecture:
• SIP User Agents
• SIP Registrar Server
• SIP Proxy Servers
• SIP Redirect Servers
These components exchange messages that carry Session Description Protocol (SDP)
[18] bodies defining their content and characteristics to complete a SIP session. The terminal
devices of SIP are called SIP User Agents (UAs), which can be any kind of device capable
of transmitting voice or other media over a network (e.g. cell phones, PCs, PDAs, ...).
These devices are used to create and manage a SIP session. Every User Agent needs a
unique identifier, which is called a SIP-URI. SIP addresses use a URI format similar to
e-mail addresses: “sip:user@example.com”. Another addressing scheme is the URL for
telephone calls (tel-uri), described in [19], by which a traditional phone number can be mapped
to a SIP address. This is used by gateway servers that many SIP providers maintain in order
to enable traditional phone users to call VoIP users. Basically, a connection is established
when a User Agent Client (caller) sends an invitation message and the User Agent Server
(callee) responds to it. This initiation can be achieved directly (peer-to-peer) if the
current IP address of the User Agent Server is known. For the user it is more convenient
to initiate the session through the SIP provider using a SIP-URI.
The SIP Registrar Servers are databases that contain the location of all User Agents within
a domain. These servers retrieve and send participants’ messages and other information to
the SIP Proxy Server.
SIP Proxy Servers accept session requests made by a SIP User Agent and query the SIP
Registrar Server to obtain the recipient User Agent’s addressing information. The SIP
Proxy Server then forwards the session invitation directly to the recipient User Agent
if it is located in the same domain, or to a Proxy Server if it is located in another domain.
Figure 2.6: Initialisation of a SIP session
The SIP Redirect Servers allow SIP Proxy Servers to redirect SIP session invitations to
external domains. The SIP Redirect Server, the SIP Registrar Server and the SIP Proxy Server
may reside on the same hardware. Figure 2.6 on page 25 illustrates the establishment of a
SIP session between two Internet Service Providers (ISPs).
Before any session can be established, both users must power on their devices and register
their availability and their IP addresses with the SIP Proxy Server in the ISP’s network,
provided the connection is established through a SIP provider. User A initiates the call by
sending the Proxy Server in domain A.com a request to communicate with User B.
1. The SIP Proxy Server in Domain A recognises that User B is outside its domain upon
reception of the request from User A.
2. SIP Proxy Server A then sends a query for User B’s IP address to the SIP Redirect
Server, which can be located in Domain A or B. Note that the lookup at the Redirect
Server is not a SIP query; it is, for instance, a DNS lookup.
3. The SIP Redirect Server returns User B’s Proxy Server address.
4. The SIP Proxy Server in Domain A forwards the session initiation request to the SIP
Proxy Server in Domain B.
5. The SIP Proxy requests the current IP Address of User B from the Registrar Server in
Domain B.
6. The Registrar Server returns User B’s current IP address.
7. The SIP Proxy relays User A’s invitation to communicate with User B to User B. This
request includes information about the media (audio and/or video). Hereby SDP is
used.
8. User B informs the SIP Proxy that User A’s invitation is accepted and that he is ready
to receive the message.
9. The response from User B is forwarded to User A. The return path is known
since all servers left their address in a specific field of the invitation.
10. The response from User B is forwarded to User A.
11. User A and B create a point-to-point RTP connection enabling them to interact.
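The routing decisions in the steps above can be sketched in a few lines. This is a simplified model, not a SIP implementation: the dictionaries REGISTRARS and PROXIES, the addresses in them and the function names are hypothetical, and the redirect lookup is reduced to a dictionary access.

```python
# Hypothetical registrations: SIP URI -> current IP address, per domain.
REGISTRARS = {
    "a.com": {"sip:usera@a.com": "192.0.2.10"},
    "b.com": {"sip:userb@b.com": "198.51.100.20"},
}
# Stand-in for the redirect (e.g. DNS) lookup: domain -> proxy name.
PROXIES = {"a.com": "proxy.a.com", "b.com": "proxy.b.com"}


def domain_of(sip_uri: str) -> str:
    """Return the domain part of a SIP URI such as sip:user@example.com."""
    return sip_uri.split("@", 1)[1]


def route_invite(caller: str, callee: str) -> list:
    """Return the hops an invitation takes from the caller to the callee."""
    hops = [caller, PROXIES[domain_of(caller)]]
    callee_domain = domain_of(callee)
    if callee_domain != domain_of(caller):
        hops.append(PROXIES[callee_domain])         # steps 1-4: remote proxy
    callee_ip = REGISTRARS[callee_domain][callee]   # steps 5-6: registrar lookup
    hops.append(callee_ip)                          # step 7: relay to the callee
    return hops
```

The response in steps 8 to 10 travels the same hops in reverse order, which corresponds to reversing the returned list.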
3 Related Work
The following chapter presents related work such as alternative approaches to secure VoIP
traffic. The IPsec protocol is presented and compared to the chosen DTLS protocol along
with reasons for this choice.
3.1 Security in VoIP
Securing a traditional phone is neither an easy task nor cheap since additional devices
would have to be installed to secure the communication channel. Both communication
partners would need such a device which results in high additional costs on both sides. It
is much easier to implement a security service to a VoIP phone; both communication part-
ners may download and install the software and are thereby able to communicate over a
secured channel without much effort. Surprisingly many consumer VoIP solutions do not
support any encryption yet. Hence it is not a complex task to eavesdrop on VoIP calls and
even change their content [10].
There are some open source solutions that facilitate sniffing of VoIP conversations. One
example is the Voice Over Misconfigured Internet Phones (VOMIT) software
(http://vomit.xtdnet.nl/) [20], which enables even inexperienced users to easily eavesdrop
on VoIP calls. The software extracts the audio data from a stream of data that is being
transmitted over an insecure network like the Internet. Some vendors use compression to
make eavesdropping more difficult. The existing secure standard SRTP [21] and the new
ZRTP [22] protocol are available on Analogue Telephone Adapters (ATAs) as well as various
softphones. Although some devices support SRTP, and thus enable encrypted VoIP calls, the
problem is that in the standard configuration the keying material is transmitted in clear
text over the net. Eavesdroppers are thereby able to access the keying material, which makes
the encryption (almost) useless. Furthermore, users need to study the manual to find out how
to enable secured key sharing [23].
It is possible to use IPsec to secure peer-to-peer VoIP by using opportunistic encryption,
which will be presented in the coming section. Skype, a proprietary peer-to-peer Internet
Telephony network with over 200 million users worldwide, is closed source, which means that
the source code is not published. Skype does not use SRTP, but uses encryption which
is transparent to the Skype provider. The user cannot turn encryption on or off, and has to
rely on the software and the provider. The Voice VPN solution provides secure voice for
enterprise VoIP networks by applying Internet Protocol Security (IPsec) [24] encryption to
the digitised voice stream [10]. IPsec will be explained in the upcoming section as an
alternative approach to secure VoIP traffic.
3.1.1 Internet Protocol Security, IPsec
Internet Protocol Security [24] is a suite of protocols for securing Internet Protocol (IP)
[3] communications by authenticating and/or encrypting each IP packet in a data stream.
Additionally, IPsec includes protocols for cryptographic key establishment. IPsec was
developed in 1998 as an approach to address the security shortcomings of IP. IPsec
provides confidentiality, authenticity and integrity.
The main IPsec document describes the architecture of the protocol suite, referencing the
following RFCs upon which IPsec relies: Authentication Header (AH) [33], Encapsulating
Security Payload (ESP) [25] and Internet Key Exchange (IKE) [26].
Figure 3.1: IPsec in the TCP/IP stack
IPsec operates at the network layer; it is therefore capable of securing TCP- and UDP-based
protocols, which reside at the transport layer, as illustrated in figure 3.1 on page 30. IPsec
operates in two different modes: transport mode and tunnel mode. In transport mode,
only the payload of the data packet is encrypted and/or authenticated. Routing remains
intact, since the IP header is neither modified nor encrypted. Transport mode is used for
peer-to-peer communications. In tunnel mode, the entire packet is encrypted and/or
authenticated; therefore it must be encapsulated in a new IP packet for routing to work.
Tunnel mode is used for peer-to-peer communication as well as for network-to-network
and host-to-network connections.
The first thing that needs to be done upon connection initiation is the exchange of the
keying material. This is handled by IKE, possibly the most complex component of IPsec. IKE
uses the Diffie-Hellman Key Agreement Method [27] to exchange keys over an
insecure network and is based on the Internet Security Association and Key Management
Protocol (ISAKMP) [28], the IPsec Domain of Interpretation (DOI) [29], the Oakley Key
Determination protocol [30] and SKEME [31]. Both sides of the connection need to
authenticate themselves to the other side and agree on a keying algorithm.
The AH guarantees connectionless integrity and data origin authentication of IP
datagrams. It can optionally protect against replay attacks by using the sliding window
technique and discarding old packets. AH protects the IP payload and all header fields of an
IP datagram except for those that might be changed during transmission. Figure 3.2 on page
31 shows a TCP packet before and after the AH is inserted.
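The sliding window technique mentioned above can be illustrated as follows. This is a generic sketch of a bitmap-based replay check over packet sequence numbers, not code taken from an IPsec stack; the class name ReplayWindow and the default window size of 64 are assumptions made for this example.

```python
class ReplayWindow:
    """Sliding-window replay check over packet sequence numbers."""

    def __init__(self, size: int = 64):
        self.size = size
        self.highest = 0   # highest sequence number accepted so far
        self.bitmap = 0    # bit i set => (highest - i) has been seen

    def accept(self, seq: int) -> bool:
        """Return True if the packet is new, False if it is old or a replay."""
        if seq > self.highest:
            # New highest packet: slide the window forward and mark it seen.
            shift = seq - self.highest
            mask = (1 << self.size) - 1
            self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.highest = seq
            return True
        offset = self.highest - seq
        if offset >= self.size:
            return False               # too old: left of the window
        if self.bitmap & (1 << offset):
            return False               # already seen: replayed packet
        self.bitmap |= 1 << offset     # late but new packet
        return True
```

Packets that arrive out of order are still accepted as long as they fall inside the window, which matters for datagram transport where reordering is normal.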
Figure 3.2: Structure of an IPsec packet with AH
Figure 3.3: Structure of an IPsec packet with ESP
The ESP protocol provides origin authenticity, integrity and confidentiality protection for a
packet. Unlike AH, ESP does not protect the IP packet header. Figure 3.3 on page 31 shows a
TCP packet before and after ESP is applied in tunnel mode. IPsec support is usually
implemented in the kernel, while key management is carried out from user space.
However, as there is a standard interface for key management, it is possible to control one
kernel IPsec stack using key management tools from a different implementation.
IPsec is part of IPv6. It was intended to provide either transport mode or tunnel mode,
where packets can be provided to several machines; furthermore it can be used to create
Virtual Private Networks [32]. In contrast to TLS, IPsec is a peer-to-peer protocol,
designed as a generic security mechanism for Internet protocols. There are a number of
problems with using IPsec to secure datagram traffic generated by client-server applications,
which will be discussed in the comparison of IPsec and DTLS in the next section.
3.1.2 Comparison between IPsec and DTLS
In contrast to DTLS, IPsec consists of three protocols: Authentication Header (AH) [33],
Encapsulating Security Payload (ESP) [25] and Internet Key Exchange (IKE) [26]. These
technologies work together to provide security for IP traffic. The IETF standardised IPsec
first in [28], ’Internet Security Association and Key Management Protocol’ (ISAKMP). The
architecture of IPsec is described in [24]. The key exchange and parameter management of
IPsec is provided by ISAKMP and IKE while data protection is provided by AH and ESP.
All these separate developments are connected through security associations (SAs). ISAKMP
and IKE are used to establish SAs, which are used by AH and ESP to protect the data.
An SA is roughly comparable to a DTLS session, except that an SA is unidirectional
whereas a DTLS session is not. Each SA has a unique 32-bit identification tag which is
carried in each packet. IPsec has two methods to establish an SA, manual and automatic
keying, where automatic keying is similar to the DTLS handshake. The IKE key exchange is
based on STS [34], Oakley [30] and SKEME [31]. The two parties exchange Diffie-Hellman
public keys and use the shared key to derive traffic encryption and message authentication
keys.
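The principle of exchanging Diffie-Hellman public keys and deriving separate keys from the shared secret can be sketched as follows. This is a toy illustration only: the prime P, the generator G, the function names and the key derivation by hashing with a label are assumptions made for this example and do not reflect the groups or the pseudo-random function that IKE actually uses.

```python
import hashlib
import secrets

# Toy parameters for illustration only; far too small for real use.
P = 2**127 - 1   # a Mersenne prime
G = 3


def keypair():
    """Return a (private, public) Diffie-Hellman key pair."""
    private = secrets.randbelow(P - 3) + 2   # private value in [2, P-2]
    return private, pow(G, private, P)


def derive_keys(own_private: int, peer_public: int):
    """Compute the shared secret and derive encryption and MAC keys from it."""
    shared = pow(peer_public, own_private, P)
    raw = shared.to_bytes(16, "big")
    enc_key = hashlib.sha256(b"enc" + raw).digest()   # traffic encryption key
    mac_key = hashlib.sha256(b"mac" + raw).digest()   # message authentication key
    return enc_key, mac_key
```

Both parties arrive at the same pair of keys although only the public values are sent over the insecure network.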
A major disadvantage of IPsec is that it fails completely as soon as a router performs
Network Address Translation (NAT). This technology allows a large number of users to share a
small number of IP addresses. The machines behind a NAT router obtain private IP
addresses and the router translates the private address to a public address when a
machine connects to a server in the Internet. Since the user’s private IP address is not known
outside the sub network behind the router and IPsec is used to create connections between
machines, IPsec cannot establish a connection between the private IP address behind the
router and an IP address in the Internet, since outside the sub network only the public
address of the sub network is known.
Another problem is the lack of standardisation among IPsec APIs, resulting in portability
problems when an application wishes to control the keying policy. With DTLS, portability can
be achieved although DTLS APIs are not standardised either, since an application can be
shipped along with the DTLS toolkit. For IPsec this is not so easily achievable because
it resides in kernel space, in contrast to DTLS, which resides in application space.
In order to simplify key negotiation, IPsec uses a reliable TCP connection to secure a sepa-
rate datagram channel. This design is smart but has some problems. First, the application
now has to manage two different sockets and synchronise them, where synchronisation is a
significant programming problem. If the TCP connection is left open after key negotiation,
unnecessary system resources are wasted. On the other hand when the TCP connection is
closed after key negotiation, any renegotiation must be done over UDP requiring another
implementation for the keying negotiation over UDP which would make the key nego-
tiation over TCP obsolete. Therefore it is more useful to have key negotiation and data
transfer on the same channel.
To secure RTP traffic, DTLS is more suitable: since RTP runs over UDP, any additional
connection (e.g. TCP for key negotiation) is a waste of system resources. VoIP is time
sensitive; therefore the added security overhead should cost as few system
resources as possible while still providing enough security to be reliable. Furthermore,
since IPsec resides in the kernel, using it on a system that does not support IPsec requires
changes to the TCP/IP stack. To secure an application with DTLS, only another application
needs to be used.
3.2 Secure Real Time Transport Protocol
The Secure Real Time Transport Protocol (SRTP) [21] defines a profile for RTP, for which
implementations already exist, that is intended to provide encryption, message authentication
and integrity, and replay protection to the RTP data in unicast and multicast applications.
Note that SRTP must not be confused with RTP over DTLS.
SRTP was published as RFC 3711 [21] in March 2004. This tightly coupled encryption mode
for RTP provides a number of benefits. The RTP header is left unencrypted which enables
header compression (see [35], [36], and [37]) and easy debugging. The packets appear to be
RTP packets, which is a benefit for firewall compatibility. There is a zero header overhead.
SRTP relies on an external key management protocol to set up the initial master key. Two
protocols specifically designed to be used with SRTP are ZRTP [22] and MIKEY [38]. There
are also other methods to negotiate the SRTP keys; several vendors offer products that use
the SDES key exchange method.
For encryption and decryption of the data flow, SRTP standardises the use of only a
single cipher, the Advanced Encryption Standard (AES) [39]. AES can be
used in two cipher modes, which turn the AES block cipher into a stream cipher.
Since SRTP does not provide a keying mechanism and has to rely on other protocols, it
cannot be regarded as a complete solution to secure VoIP traffic. In combination with ZRTP,
VoIP traffic can be secured. However, SRTP is not widely used, since users cite reduced audio
quality as a reason to turn ZRTP protection off. Furthermore, ZRTP is not a widely known
security architecture like TLS and is therefore not as trustworthy as RTP over DTLS can be.
4 Security Considerations for VoIP
As already mentioned, the transport of voice data over insecure networks such as the Internet
is exposed to various threats. This chapter provides security considerations about these
threats, pointing out attacks and the security goals to achieve in order to counter these
attacks.
4.1 Introduction
In order to classify the threats to VoIP properly, the security goals must first be formulated.
VoIP is IP traffic and is thus exposed to the same attacks as other IP traffic. This is why VoIP
calls are vulnerable to a variety of threats that traditional telephone calls are not. Any data
being transmitted is at some risk of being eavesdropped, and data packets can be eavesdropped
on anywhere along the transmission path. Alternatively, the intercepted data could be changed
and transmitted to the receiver, who would not notice receiving altered data; this is called a
man-in-the-middle attack. By transmitting the same message, e.g. an invitation to a VoIP
phone call, many times, the receiving machine could be kept so busy that no real calls can
come through. This is called a denial of service attack.
There are three classical primary security goals in modern communication systems:
• Confidentiality
• Integrity
• Availability
Confidentiality has been defined by the International Organisation for Standardisation
(ISO) as "ensuring that information is accessible only to those authorised to have access"
[40]. Integrity is the protection against unauthorised alteration of the transmitted data.
Message integrity is, like confidentiality, a part of DTLS. It assures the user that the received
data has not been changed without his notice. Availability means that the transmitted data
will reach its destination and will thereby be available to the receiver. The integrity of the
voice data is an important issue here. Certainly it is easier to recognise whether someone
on the phone is the person he or she claims to be than to recognise whether an e-mail was
really written by the declared sender. This argument however applies mostly to private
communication and communication among people who know each other well. Moreover, voice
messages can be recorded, edited and replayed so that the receiver does not notice
that the caller is not the person he or she claims to be. Besides the voice
data, the signalling data also needs to be intact and unaltered.
The identity of the caller and the callee needs to be protected. If an attacker manages to
manipulate his own identity, he might achieve that the callee is shown a different
caller ID upon reception of a call. This can be used to reach persons on the phone
who usually do not take calls from just anybody (e.g. the chief executive of a company). By
acquiring a fake identity, the billing of the VoIP provider can be bypassed and calls will
be charged to the original owner of the account.
4.1.1 Confidentiality in VoIP
Confidentiality is an important security goal. In the context of VoIP the focus lies on the
confidentiality of the voice data. This means that calls cannot be eavesdropped. In VoIP
confidentiality is threatened more than in traditional telephony. In traditional telephony the
attacker needs to have physical access to the network. Traditional telephony runs mostly
over separate networks while in VoIP the voice data is transmitted over the Internet, where
all connected machines have the potential to be accessed through security holes. Many
protocols in traditional telephony are barely published; therefore the analysis of and attacks
on traditional phone calls require special hardware and software. The number of people who are
capable of eavesdropping on phone calls is thereby reduced, but eavesdropping is not impossible.
4.1.2 Availability in VoIP
Availability in VoIP networks has two primary meanings. The first is the availability of the
telephone service, which means that in the case of SIP the SIP Proxy Servers are available and
able to initiate sessions properly. Availability may also be harmed by unwanted calls, a problem
which will be explained in the upcoming section. The second aspect of availability is the
quality of the VoIP call. Both communication partners need to be able to understand each
other clearly.
4.2 Threats and Attacks
As described in the preceding section, VoIP calls are threatened by the same attacks as other
applications running over IP networks. Therefore an overview of different attacks is given
in this section to classify the threats to VoIP calls. However, a new technology that offers
new services to users also offers new possibilities to attackers.
Attacks can be divided into two groups. The first group consists of passive attacks,
which include eavesdropping on calls and sniffing messages which are transmitted over the
Internet. Much stronger is the second group, active attacks, in which messages are manipulated
during their transmission or faked messages are sent. An example of such an active
attack is the man-in-the-middle attack, where the attacker gains control of a router between
two communicating systems and redirects transmitted packets. So-called network or port
scans are used to plan an attack by searching for weaknesses: the attacker sends
various requests to a network or host in order to acquire information needed for further
steps, like the operating system or installed services. For a so-called spoofing attack,
messages or data packets with faked information are used. For example the IP address or
MAC address of the sender can be changed so that the receiving machine assumes that
the packet was sent from a trustworthy source. Another example of spoofing is DNS
spoofing, where DNS answers are changed, which causes the requesting machine to
communicate with a machine the attacker prepared. Denial of service attacks replay request
messages to servers in such high numbers that the server’s service is no longer available
to regular users, targeting the availability of a system.
VoIP might also be the target of new attacks which are enabled through VoIP. Spam is a
commonly known problem these days. Spamming is the abuse of e-mail to indiscriminately
send unsolicited bulk messages. E-mail spam involves sending nearly identical messages
to numerous recipients. As already mentioned, SIP uses a similar address format to e-mail,
thus the problem of e-mail spam might become a problem for VoIP in the future. VoIP
spam is not yet an actual problem; nonetheless it receives a great deal of attention from
marketers and the trade press. VoIP spam is also referred to as SPIT (Spam over Internet
Telephony). Malicious users in this context could be telemarketers or prank callers. Currently
there are rules for e-mail systems that block unwanted e-mail; such systems could (and
probably will) also be applied to VoIP systems. SIP has been designed
to support presence natively, so callers know the availability of the callee before even
attempting to initiate a call.
The three security services are realised through DTLS and implemented in the OpenSSL
library which makes it a reasonable choice to secure VoIP traffic. Unfortunately no encryp-
tion can prevent the biggest threat, a virus or trojan on the endpoint giving a hacker access
to the machine and thereby to the decrypted data.
5 RTP over DTLS
This section describes the basic idea of RTP over DTLS and possibilities for its realisation.
Furthermore, the performance of the approach is considered in comparison to SRTP.
5.1 Introduction to RTP over DTLS
RTP uses UDP to transmit data over IP-based networks. Implementations typically
have interfaces to UDP socket classes to open/close sockets and transmit/receive data.
DTLS uses UDP sockets as well for transmission and reception of data. Therefore RTP
can also operate on top of DTLS instead of just UDP, when functionality for connection
initiation is added. Thus an encryption scheme is added to RTP, providing key exchange
and encryption/decryption of data. The basic idea to realise this is an interface that is used
by an alternative RTP class for the underlying transport protocol and that manages connection
requests and connection acknowledgements. Thereby SIP softphones could simply start an
RTP over DTLS session as an alternative RTP profile, instead of a standard RTP session.
Since normal RTP and RTCP payloads are sent in a UDP packet, they can be sent in
a DTLS packet as well. An RTP packet sent over DTLS therefore has the layout shown in
figure 5.1 on page 40.
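The nesting of an RTP packet inside a DTLS record can be illustrated with a byte-level sketch. It only shows the framing: the fragment is NOT encrypted or authenticated here, whereas a real DTLS record would carry the ciphertext including MAC and padding. The function names, the example field values and the assumption of a DTLS 1.0 record header with content type application_data are choices made for this illustration.

```python
import struct


def rtp_packet(seq: int, timestamp: int, ssrc: int, payload: bytes,
               payload_type: int = 18) -> bytes:
    """Build a minimal RTP packet: 12-byte fixed header plus payload."""
    first = 2 << 6                    # version 2, no padding/extension/CSRC
    second = payload_type & 0x7F      # marker 0; 18 is the static type for G.729
    header = struct.pack("!BBHII", first, second, seq, timestamp, ssrc)
    return header + payload


def dtls_record(epoch: int, record_seq: int, fragment: bytes) -> bytes:
    """Prefix a fragment with a 13-byte DTLS 1.0 record header."""
    content_type = 23                 # application_data
    version = b"\xfe\xff"             # DTLS 1.0 as it appears on the wire
    seq48 = record_seq.to_bytes(6, "big")
    return (bytes([content_type]) + version + struct.pack("!H", epoch)
            + seq48 + struct.pack("!H", len(fragment)) + fragment)


# 20 ms of G.729 audio is 20 bytes of payload (placeholder bytes here).
rtp = rtp_packet(seq=1, timestamp=160, ssrc=0x1234, payload=b"\x00" * 20)
record = dtls_record(epoch=1, record_seq=0, fragment=rtp)
```

The resulting record would then be handed to the UDP socket in place of the bare RTP packet.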
RTP typically opens two sessions, one for data traffic and one for RTCP traffic. In order to
secure RTP traffic, a DTLS session should be initiated at least for the data session.
Securing the RTCP session would be possible as well, but since the RTCP packets do not
contain confidential data, this is not mandatory.
Figure 5.1: Structure of an RTP packet sent over DTLS
RTP over DTLS is a trustworthy approach to achieving secured VoIP calls. DTLS is
well suited to a VoIP scenario and, because of its well-known predecessor TLS, is
likely to gain the trust of users as well.
5.1.1 SRTP Compatibility Mode
SRTP Compatibility Mode is a profile for RTP over DTLS which is presented in [9]; it
depends on two extensions to DTLS which reduce the per-record bandwidth of the data
channel and allow partial encryption of record bodies.
This profile depends on ’Extensions for DTLS in Low Bandwidth Environments’ [41] and
on ’TLS Partial Encryption Mode’ [42]. In this profile, the RTP header is left unencrypted,
which enables header compression. With unencrypted headers the packets appear as RTP
packets which results in firewall compatibility. Furthermore this profile provides encryp-
tion with a zero header overhead, and thus improved performance in comparison to RTP
over DTLS. For this profile, implementations need to negotiate the TLS partial encryption
extension, the DTLS implicit application data header and the TLS MAC truncation exten-
sion. Thereby the RTP over DTLS packets would look identical to SRTP packets with a
10-byte MAC value. They can only be distinguished with access to the DTLS or SRTP key-
ing material.
Since the RTP header is sent in the clear, header compression and debugging both work. The
security properties of DTLS are not affected by these extensions. This extension to RTP over DTLS
is not part of the implementation conducted in this thesis but is worth noting for future
development of the profile.
5.1.2 Packet size Comparison
This section provides a comparison of packet sizes in order to estimate the performance of
RTP over DTLS in comparison to RTP, SRTP and the SRTP compatibility mode.
Since most of the RTP infrastructure is reused, the overhead for SRTP is low. A 20 ms RTP
packet encoded with the G.729 codec has a size of 60 bytes. This RTP packet would be just 4
bytes longer when SRTP is used, but only as long as SRTP is used without a master key
identifier. As already described in a previous section, this is not desired. With a master
key identifier the SRTP packet has a size of 68 bytes. When DTLS is used, the same packet
would be 98 bytes long, while in SRTP compatibility mode the packet size could be reduced
to 70 bytes, which is an excellent result. Therefore the SRTP compatibility mode should
be added to RTP over DTLS in the future.
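The relative overhead behind these figures can be made explicit with a short calculation. The packet sizes are the ones quoted in the text; the dictionary SIZES and the function overhead are names chosen for this illustration.

```python
# Packet sizes in bytes for one 20 ms G.729 packet, as quoted in the text.
SIZES = {
    "RTP": 60,
    "SRTP (no MKI)": 64,
    "SRTP (with MKI)": 68,
    "RTP over DTLS": 98,
    "SRTP compatibility mode": 70,
}


def overhead(sizes: dict, baseline: str = "RTP") -> dict:
    """Return (extra bytes, overhead in percent) relative to the baseline."""
    base = sizes[baseline]
    return {name: (size - base, round(100 * (size - base) / base, 1))
            for name, size in sizes.items()}


for name, (extra, pct) in overhead(SIZES).items():
    print(f"{name:26s} +{extra:2d} bytes ({pct:5.1f} %)")
```

With these numbers, plain RTP over DTLS adds 38 bytes (about 63 %) to each packet, whereas the SRTP compatibility mode adds 10 bytes (about 17 %), close to the 8 bytes of SRTP with a master key identifier.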
5.1.3 Security Considerations
RTP over DTLS can be considered secure since DTLS is based on TLS, which has seen
extensive security analysis. The handshake algorithm incorporated in DTLS works over an
insecure channel; only the certificates have to be verified. In the standard
authentication strategy of DTLS a PKIX [43] certificate is exchanged. When the client verifies
the certificate, it checks whether the name in the certificate matches the server’s domain
name. This works because there is a relatively small number of servers with well-defined
names, a situation which does not usually occur in the VoIP context [9]. Alternatively the
certificates could be self-signed, but then the client must be able to verify the server’s
certificate correctly and vice versa. An approach to address this uses SIP [11] and the Session
Description Protocol (SDP) [18] and is described in [44] and [45].
6 Implementation Design
This chapter provides an analysis of requirements along with a description of the choice
of implementations used in this thesis. Hereby the chosen libraries are presented as well.
The system idea is presented in a more detailed way along with the functionality and
interaction of the single components used for the prototype implementation.
6.1 Analysis of Requirements
The most important phase of a software project is the analysis. Empirical studies on
failures of software projects have shown that indistinct formulation of goals and requirements
is by far the most common reason for failure. Small mistakes rooted in the
early development stage and caused by inaccuracies can lead to big problems in the final stage
of the development process because of error propagation. The detailed documentation of
requirements in the early stage of the development process is therefore indispensable as a
guideline through the project.
The requirements and goals formulated in this section are based on studies of the protocols and
their implementations and on discussions with my supervisor Prof. Dr. Xiaoming Fu. During
the development process, requirements can be altered or extended to reach the goals and
to react flexibly to problems on the way.
6.2 System Idea/Intent
The concept of the system to be implemented is based on H. Tschofenig's Internet Draft [9]
and was discussed in sessions with H. Tschofenig and Prof. Dr. X. Fu.
The basic idea is to secure RTP data traffic using the DTLS protocol. In order to achieve this,
a prototype needs to be implemented to demonstrate that the idea works. In the following
steps the surrounding software framework needs to be extended to support this option.
In the first step DTLS has to be investigated thoroughly in order to determine which changes
are needed on the RTP side of the project. The second step involves an analysis of the RTP
implementation and protocol to determine the functionality that the DTLS connection has
to provide. In the subsequent step the interfaces of the implementations are used to derive
the implementation design of the project.
Once audio data is successfully transmitted with RTP over DTLS, the next step is to
demonstrate the functionality in a SIP application such as a softphone.
6.2.1 DTLS
The DTLS protocol is designed to secure data between communicating applications. It is
designed to run in application space, without requiring any kernel modifications.
DTLS normally uses one UDP socket per connection and endpoint. Upon connection
initiation a socket is therefore created at each endpoint before the DTLS handshake can be
started. After successful completion of the handshake the sockets are ready to transmit
and receive secured data. Upon termination of the connection both sockets are closed.
6.2.2 RTP
RTP has no means of initiating a connection between two hosts itself. Therefore SIP is
additionally used to initiate a session between two computers. Upon connection initiation
RTP initialises two sessions on each host, one for data and one for RTCP traffic. Each of
these sessions normally consists of two sockets, one for reception and one for transmission.
Next the RTP stack is started, and packet transmission and reception run on each session
until the execution of the RTP stack is stopped.
Besides unicast conferences RTP is also capable of multicast conferences. This feature
cannot be mapped to a DTLS-secured session, since the key exchange protocol of DTLS is
designed only for host-to-host communication and the DTLS key exchange is one of the
cornerstones of the benefits DTLS brings to the implementation.
RTP data (and control) packets are usually transmitted via UDP; RTP therefore comes with
an underlying transport layer similar to the transport layer DTLS uses. Reuse of these
functions will be reviewed in the upcoming design section in order to keep the changes
slim and simple.
6.2.3 SIP Softphone
The RTP media channel is initiated through SIP as presented in figure 2.6 on page 25.
Therefore the RTP over DTLS session will also be initiated by the SIP softphone client
application. A softphone application is the best choice for testing the implementation
framework. A media channel between two hosts will be established using the RTP stack
with an underlying DTLS connection. Options for key generation and administrative
functions for the certificate files should be implemented as part of this.
6.3 RTP over DTLS
The requirements, independent of the implementation design, can now be formulated in
more detail. In order to build a unified media security framework, changes need to be
made to all components that interact with each other.
Before any connection with RTP over DTLS can take place the user must choose the option
that RTP over DTLS should be used if it is available for both communication partners. The
SIP component needs to support the mechanisms necessary to cope with four basic cases.
In the first case the connection can be established without errors, because both
communication partners have a properly running system which supports RTP over DTLS.
In the second case there is an error on the caller side, which might occur when certificates
cannot be accessed. The caller should be notified of this as early as when the settings are
adjusted to use RTP over DTLS for calls. In the third case the RTP over DTLS feature is not
supported by the callee; either the connection is established without any protection, or the
next security mechanism supported by both sides is used. The caller must of course be
notified that the connection is not secured in the intended way. In the fourth case the
security certificates cannot be verified or the DTLS connection cannot be initialised
properly for other reasons, so that a secure connection cannot be guaranteed. In this case
the user needs to be informed immediately about the situation and advised on what it
means and what to do.
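The four cases can be summarised as a small decision function. The following Python
sketch is purely illustrative: the names Outcome and negotiate_media_security and the
three boolean inputs are assumptions made for this example and are not part of Twinkle,
ccRTP or OpenSSL.

```python
from enum import Enum


class Outcome(Enum):
    """The four cases the SIP component has to cope with (illustrative names)."""
    SECURED = "RTP over DTLS session established"
    LOCAL_ERROR = "caller-side error, e.g. certificates cannot be accessed"
    FALLBACK = "callee lacks RTP over DTLS; fall back and warn the caller"
    HANDSHAKE_FAILED = "verification or DTLS initialisation failed; warn the user"


def negotiate_media_security(local_certs_ok, callee_supports_dtls, handshake_ok):
    """Map the state of both endpoints to one of the four cases (hypothetical helper)."""
    if not local_certs_ok:
        # Second case: detected on the caller side, ideally already at setup time.
        return Outcome.LOCAL_ERROR
    if not callee_supports_dtls:
        # Third case: unprotected call or the next mechanism both sides support.
        return Outcome.FALLBACK
    if not handshake_ok:
        # Fourth case: certificates not verifiable or DTLS could not be initialised.
        return Outcome.HANDSHAKE_FAILED
    # First case: both sides support RTP over DTLS and the handshake succeeded.
    return Outcome.SECURED
```

In a real client each outcome other than SECURED would additionally trigger a user
notification, as required above.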
When the call is accepted by the callee and both parties have RTP over DTLS available, this
component is started to initialise the DTLS sockets. The RTP session needs to be divided
into a server (passive) and a client (active) part, where the client initiates the DTLS
connection to the server and the server accepts the client's connection request. When the
connection is established, the RTP data transfer can start. At the end of the session the
DTLS connection needs to be shut down properly. DTLS negotiates the ciphers during the
handshake (see Background section) and exchanges certificates and keys. These keys must
be generated and the certificates provided. This task will be done by the SIP application
in connection with functions provided by OpenSSL.
6.4 Choice of Libraries
This section presents the choice of libraries implementing the protocols used, namely
DTLS, RTP and SIP. The choice of the DTLS implementation on which the prototype is based
is straightforward, since to the best of our knowledge OpenSSL is the only implementation
supporting this protocol.
For RTP a choice has to be made since several implementations exist (e.g. oRTP, ccRTP).
In comparison to oRTP, ccRTP provides object-oriented C++ code and is therefore better
suited for the change to a different underlying transport protocol. The online
documentation of the ccRTP library is a great help in understanding the class structure of
the library. This makes the choice of the ccRTP library easy.
To complete the prototype, the Twinkle softphone client is the most reasonable choice
as a SIP client, since it uses the ccRTP library as its RTP stack.
6.4.1 OpenSSL
OpenSSL (http://www.openssl.org/) [46] is the de facto standard open source TLS/SSL
implementation [2]. It has proven to be stable and is used by numerous production-quality
servers such as the Apache web server.
OpenSSL implements SSLv2, SSLv3, TLSv1 and DTLSv1. These protocols are implemented
by sharing as much code as possible, with virtual functions handling protocol
differences. The library is implemented in C, and from the library's standpoint DTLS
appears to be another version of the TLS protocol.
6.4.2 CCRTP
GNU ccRTP (http://www.gnu.org/software/ccrtp/) is an implementation of RTP, the
real-time transport protocol from the IETF (RFC 3550, RFC 3551, and RFC 3555). The library
is implemented in C++ and based on GNU Common C++
(http://www.gnu.org/software/commoncpp/). It provides a high-performance, flexible and
extensible standards-compliant RTP stack with full RTCP support. It is designed as an
application layer framework rather than as a typical Internet transport protocol such as
TCP or UDP.
The design of ccRTP considers support for audio and video data. Unicast, multi-unicast
and multicast transport models are supported, as well as multiple active synchronization
sources, multiple RTP sessions (SSRC spaces), and multiple RTP applications
(CNAME spaces). This allows its use for building all forms of Internet standards based
audio and video conferencing systems [47].
ccRTP uses packet queues for reception and transmission of data packets. The
synchronisation of both outgoing and incoming media is handled automatically within the
packet queues. There is support for RTCP and other standard and extended features needed
for both compatible and advanced streaming applications. The implementation uses
templates to isolate threading and socket related dependencies, so that it can be used to
implement real-time streaming with different threading models and underlying transport
protocols, which is an essential feature for this work. At its highest level, ccRTP provides
classes for the real-time transport of data through RTP sessions, as well as the control
functions of RTCP. The main concept in the ccRTP implementation of RTP sessions is the use
of packet queues to handle transmission and reception of RTP data packets (application
data units). In ccRTP, a data block is transmitted by putting it into the transmission
(outgoing packets) queue, and received by getting it from the reception (incoming packets)
queue.
6.4.3 Twinkle Softphone
Twinkle (http://www.twinklephone.com/) [48] is a softphone for VoIP and instant
messaging communications using the SIP protocol; it is open source and based on open
standards. Twinkle uses the ccRTP stack, which qualifies it as the SIP application in the
RTP over DTLS prototype implementation. As a useful feature the Twinkle softphone also
implements direct IP-to-IP phone communication, where a SIP proxy is not needed. The SIP
invitation is submitted directly to the IP address of the callee. This is a useful feature
for development and testing, since the whole SIP architecture including the proxy does not
need to be replicated in the testbed.
The current version does not provide video calls. This feature is planned for future
releases, and its absence is not a problem at this stage for the RTP over DTLS prototype
since the focus lies primarily on functionality tests for voice calls.
Video calls and securing them will be an interesting topic for future work once it is
shown that RTP over DTLS works without problems.
7 Design Details
This chapter describes the implementation process, its milestones and the problems which
were handled along the way. First the protocol operations are presented, followed by the
way the components in the prototype implementation of the unified media security
framework function together.
The previous chapter provides an analysis with all the information necessary to design a
solution. In this chapter the architecture and interfaces of the component to be developed
are designed, and their adaptation to the existing structure and interfaces is planned.
7.1 Design Components: RTP - ccRTP, DTLS - OpenSSL and SIP - Twinkle
This section describes the interaction of the components used to design the unified media
security framework. Each library used is described together with its interaction with the
other libraries.
7.1.1 OpenSSL
The OpenSSL website provides online documentation of the application programming
interface (API) to ease the implementation of a secure socket. However, although DTLS has
been supported by OpenSSL for more than a year, DTLS is not mentioned at all in the
documentation. Only TLS is mentioned as an optional protocol version.
7.1.2 Socket Initialisation
According to the documentation the library must be initialised first, whereby all available
ciphers and digests are registered. Next an SSL_CTX object is created as a framework for
establishing SSL-based connections. An SSL_METHOD object supplied when the context is
created determines the protocol version used. Various options regarding certificates,
algorithms etc. can be set in the context object. After a network connection has been
created, it can be assigned to an SSL object, which is created from the SSL_CTX object
set up before. Next the handshake is performed with SSL_accept on the server side or
SSL_connect on the client side. The SSL_read and SSL_write functions are used to read and
write data on the connection, while SSL_shutdown is used to shut down the connection.
Additional hints on how a DTLS connection can be established are provided by the
demonstration programs s_server and s_client. These all-round examples are able to
establish any kind of SSL connection with their roughly 3500 lines of code. The code
itself is barely commented and gives little indication of which functions need to be
called to establish a connection. As an example, a comment in line 735 of the s_client.c
file starts with "This is an ugly hack that does a lot of assumptions [...]" [46]. However,
there is a large mailing list archive which contains a handful of discussions about DTLS
connections.
7.1.3 Session Initialisation with ccRTP
Upon initialisation of an RTP session an object of the class RTPSession is created. There
are two kinds of constructors. The first one takes two mandatory arguments: local net-
work address and local transport port, which is the place where incoming packets will be
expected. The second constructor is not of interest, since it takes a multicast address as
argument to join a multicast group. By calling the startRunning() method, an RTPSession
object is signalled to start execution of the stack thread.
After these steps, the application can receive data, but will not transmit to any destination.
In order to transmit, the method addDestination is called along with the Internet address
and port of the host to be transmitted to.
7.1.4 Sending Data
Data packets are sent through the method putData, which takes as its first parameter the
RTP timestamp for the data specified as the second parameter. By default, the marker bit of
the sent packets is not set. Its value for the next packet (the one that will convey the
data provided in the next call to putData) can be set through the setMark method, which
takes a Boolean as argument.
ccRTP also supports fragmenting data blocks into several RTP packets. The
setMaxSendSegmentSize method can be used to request that no RTP packet be transmitted
with a payload length greater than the specified value.
When data blocks greater than the maximum segment size are provided through putData,
two or more packets will be inserted into the outgoing packet queue. All of these packets
except the last one will have a length equal to the maximum segment size, whereas the
size of the last one will be lower than or equal to the maximum segment size.
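The fragmentation rule can be illustrated with a few lines of Python. This is a minimal
sketch of the splitting behaviour described above, not ccRTP code; the function name
fragment_payload is an assumption made for this example, and details such as how
timestamps are assigned to the fragments are deliberately left out.

```python
def fragment_payload(data, max_segment_size):
    """Split a data block into payloads of at most max_segment_size bytes.

    Hypothetical helper: every fragment except possibly the last one has exactly
    max_segment_size bytes; the last one carries the remainder.
    """
    if max_segment_size <= 0:
        raise ValueError("max_segment_size must be positive")
    return [data[offset:offset + max_segment_size]
            for offset in range(0, len(data), max_segment_size)]


# A 400 byte block with a maximum segment size of 160 bytes yields three payloads.
fragments = fragment_payload(bytes(400), 160)
fragment_sizes = [len(fragment) for fragment in fragments]  # 160, 160 and 80 bytes
```

The values 400 and 160 are arbitrary example sizes and do not come from the measurements
in this thesis.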
7.1.5 Receiving Data
To receive data from the incoming packet queue the getData method is used. This method
checks with a defined timeout whether data can be read from the socket and, if so, returns
a pointer to an AppDataUnit object rather than a pointer to a memory block. In ccRTP,
application data units are represented by objects of the AppDataUnit class, which provides
access to the synchronization source of the data and other related properties. The
incoming packet queue takes care of functions such as packet reordering and filtering out
duplicate packets.
7.1.6 Closing Sessions
To close an RTP session, the RTPSession objects simply have to be destroyed. When the
destructor of a session is called, the stack transmits a BYE packet, indicating the end of
the session, to all destinations.
7.1.7 Types of Sessions
Upon creation of an RTPSession object, two DualRTPChannel objects are created with
DualUDPIPv4Socket. This defines a communication channel for RTP and/or RTCP streams.
In this class a socket is implemented as a pair of UDP IPv4 sockets, allowing both
transmission and reception of packets.
The implementation relies on the Common C++ UDP Socket class and provides a flat
interface that includes all the services required by the RTP stack. There are two ways to
use this class: to instantiate the DualSocket template, which is then used to instantiate
the RTP stack template, or to instantiate an RTP stack template directly. This class
offers an example of the interface that other classes should provide in order to
specialise the ccRTP stack for different underlying protocols.
7.2 SIP Session Initiation with Twinkle
As already mentioned Twinkle can be operated in regular SIP mode using a SIP provider
for discovery of a communication partner by the SIP Address as seen in figure 2.6 on page
25 or in direct mode. In both cases when the callee accepts the call, the RTP media channel
is set up. Twinkle uses a Symmetric RTP session consisting of two Single Thread RTP
sessions, one for data traffic and one for control packets. In order to establish a secure
RTP over DTLS session the modified templates in the ccRTP library are used instead of the
regular ones.
Figure 7.1: Implementation status after phase 1
7.3 Implementation Process
The implementation process is divided into three consecutive steps. This sequential model
allows changes to the implementation and the marking of milestones to confirm the
progress achieved.
In the first phase a DTLS client/server application is implemented in C++ as the basis for
any further development. As a result of the poor documentation provided by OpenSSL, this
step posed a much greater challenge than expected. A DTLS client/server example written
in C (found at http://linux.softpedia.com/get/Security/DTLS-Client-Server-Example-19026.shtml)
was used as a guideline in this phase because the examples provided by OpenSSL were not
clearly arranged and therefore not usable.
The result of the first step is illustrated in figure 7.1 on page 53, where a DTLS
connection is established between host A and host B. As soon as a secured connection
between two hosts is possible, this connection initiation functionality can be used by the
ccRTP stack, replacing the regular underlying UDP sockets with DTLS sockets.
At the end of this stage, first transmissions of audio data with test applications should
work and demonstrate the functionality of RTP over DTLS as presented in figure 7.2 on
page 54. Here test.au is an audio file in the au file format, a simple audio file format
introduced by Sun Microsystems (http://www.sun.com). Further information can be found
at [49]. Upon setting up the RTP connection between the two hosts, the DTLS connection is
established during initialisation of the transport channel, at the point where the UDP
sockets were initiated before.
Figure 7.2: Implementation status after phase 2
In the last stage all parts of the preceding steps have to work together in order to
function as a secured VoIP call. Figure 7.3 on page 54 illustrates the progress at this
stage.
Figure 7.3: Implementation status after phase 3
While stage 3 marks the goal of this thesis, it is not the end of the process. Further
implementation work is needed to provide a usable application. These steps will be
presented in the future work section at the end of this thesis.
Figure 7.4: RTP over DTLS class structure
7.4 Class Structure
This section describes the changes applied to the libraries and presents the new files
inserted into the class hierarchy.
In the ccRTP library the channel.h file defines the RTPSession types and the initialisation
of the underlying transport protocols. The regular RTP session inherits the UDP socket from
the Common C++ UDP Socket class and implements the functionality for the RTP stack.
In order to implement RTP over DTLS, two template classes were added to the channel.h
file, RTPDTLSServer and RTPDTLSClient. Each of these templates is associated with an
interface to the OpenSSL library which provides the functionality for connection initiation
and certificate verification. These files are placed in the /src/rd directory of ccRTP. These
classes make direct use of the OpenSSL API and socket classes. Figure 7.4 on page 55
presents the structure of the added components. With this structure any program using RTP
is able to initialise RTP over DTLS sessions instead of, or as an alternative to, regular
RTP sessions.
7.5 Problems and Discussion
The solution provided is functional but unfortunately not perfect, due to the way RTP
works. Upon initiation of an RTP session two sockets are created, one for transmission
and one for reception. Since RTP supports multicast, these sockets do not have any
information about the destination host upon initialisation. RTP uses a setPeer function
which is called periodically to set the destination IP address on the socket. This feature
is not compatible with a DTLS connection, because the destination IP address must be known
when a DTLS connection is initialised. The DTLS connection is therefore initialised upon
the first call of the setPeer function and not in the constructor of the session, since
the destination IP address could not be handed to the constructor without changes to the
RTP stack implementation. Further calls to the addDestination function therefore must not
cause any action.
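The deferred initialisation can be modelled in a few lines of Python. This is a conceptual
sketch only and not the C++ code of the prototype; the class LazyDtlsChannel, the method
name set_peer and the handshake callback are hypothetical stand-ins for the modified
ccRTP channel and the OpenSSL handshake.

```python
class LazyDtlsChannel:
    """Hypothetical model of a channel that defers the DTLS handshake."""

    def __init__(self, handshake):
        # handshake is a callable(address, port) standing in for the DTLS handshake;
        # the peer is not yet known when the channel is constructed.
        self._handshake = handshake
        self.peer = None

    def set_peer(self, address, port):
        """Run the handshake on the first call only; ignore all later calls."""
        if self.peer is not None:
            return
        self._handshake(address, port)
        self.peer = (address, port)
```

A usage example with a callback that merely records its invocations:

```python
performed = []
channel = LazyDtlsChannel(lambda address, port: performed.append((address, port)))
channel.set_peer("192.0.2.10", 5004)  # first call triggers the handshake
channel.set_peer("192.0.2.10", 5004)  # periodic repeat, no action
```

The address and port are documentation example values and are not taken from the testbed.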
8 Testing
The prototype implementation of RTP over DTLS is tested in order to confirm the usability
of the approach. Many aspects of the approach could be tested; however, due to space and
time restrictions in this thesis, not all aspects of RTP over DTLS have been analysed so
far.
8.1 Testing Methodology
Before the results of the tests are presented, an overview of the testing methodology
and testbed is given. As formulated in the goals of this thesis, a packet loss rate of up
to 5% is still acceptable for telephony according to the ITU-T. Therefore the
implementation of RTP over DTLS shall provide a packet loss rate lower than 5%. Changing
the underlying transmission to a DTLS connection does not in itself cause more packets to
be lost. However, encryption, decryption and the increased header size, which raises the
bandwidth needed to achieve the same throughput, might cause data packets to reach the
destination too late to be inserted into the output media stream. The question to be
answered is therefore whether RTP over DTLS is capable of delivering the audio data within
the strict time limits that allow acceptable voice quality during a call. During the phone
call the delays must be kept within the restrictions for VoIP traffic to allow a fluent
conversation. According to the ITU-T, delays in telephony should not exceed 150 ms in
order to provide satisfying quality for all users.
8.2 Testbed Setup
The experiments were run on standard PCs running SUSE Linux (kernel 2.6.18-05) with the
following hardware:
• Machine A:
  Intel Pentium D processor with 3.06 GHz
  512 MB RAM
  40 GB hard disk
  1 x 100 Mbit network interface card (NIC)
• Machine B:
  AMD Duron processor with 800 MHz
  612 MB RAM
  60 GB hard disk
  1 x 100 Mbit NIC
The hosts are connected in a 100 Mbit Ethernet network with the topology presented in
figure 8.1 on page 59.
8.3 Measurement Methods and Tools
To demonstrate the functionality and usability of the RTP over DTLS prototype, modified
versions of the demonstration programs provided with the ccRTP library were used.
Timestamp output was added to the applications in order to determine the delay between
transmission and reception of a data packet. OpenOffice Calc (http://www.openoffice.org/),
a spreadsheet program, was used to calculate the delay of a data packet as the time
difference between the transmission and reception timestamps. Plots and summaries of the
tests were generated from the report files with Gnuplot (http://gnuplot.info) [50].
Figure 8.1: Testbed for RTP over DTLS tests
8.4 Results
This section presents the results of the experiments. The aim of the performance test is
to determine the delay caused by the DTLS encryption of RTP traffic. Tests were performed
with modified versions of the ccRTP demonstration programs audiorx and audiotx. These
applications initiate RTP sessions and transmit audio data from audiotx to audiorx, where
audiorx plays the audio data over the system's audio interface. In the original version
these applications use the loopback address to simulate RTP traffic on a single machine.
By changing the IP addresses used, these programs are capable of transmitting data from
one host to another.
audiorx uses a 50 ms jitter buffer to ensure a continuous media stream during reception.
Jitter is the variation of the packet interarrival time. While the sender is expected to
transmit a packet every 20 ms, packets can be delayed in the network and may not arrive
at the same regular interval at the receiver. The difference between the time a packet is
expected and the time it is actually received is the jitter. The jitter buffer conceals
this interarrival delay variation. Data packets arriving with a delay greater than 50 ms
are not played; instead the next packet that has arrived is played. In VoIP applications
the jitter buffer is flexible in order to adapt to the delay in the current call. To
analyse the performance of RTP over DTLS, the RTP over DTLS server and client session
objects were initialised in these applications instead of a regular RTP session. In order
to obtain comparable results, a 62.5 KB audio file with a play time of 7 seconds was used
for transmission to simulate the voice data of a call. In total 399 data packets of audio
data were transmitted.
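The evaluation of the timestamps can be sketched as follows. This Python sketch is not
the tooling used in the thesis, where the calculation was done in a spreadsheet; the
function names are hypothetical, the timestamps are synthetic, and it assumes that the
transmission and reception timestamps are given in microseconds on comparable clocks. Only
the packet count (399), the 20 ms packet interval and the 50 ms limit are taken from the
text.

```python
def packet_delays_ms(tx_us, rx_us):
    """Per-packet delay in milliseconds from matching timestamps in microseconds."""
    return [(rx - tx) / 1000.0 for tx, rx in zip(tx_us, rx_us)]


def late_packet_rate(delays_ms, deadline_ms=50.0):
    """Fraction of packets that miss the jitter buffer deadline and count as lost."""
    late = sum(1 for delay in delays_ms if delay > deadline_ms)
    return late / len(delays_ms)


# Synthetic example: 399 packets sent every 20 ms, all delayed by 13 ms,
# except for one packet that is delayed by 51 ms and therefore misses the deadline.
tx_us = [index * 20000 for index in range(399)]
rx_us = [tx + 13000 for tx in tx_us]
rx_us[190] = tx_us[190] + 51000

delays = packet_delays_ms(tx_us, rx_us)
loss_percent = round(100 * late_packet_rate(delays), 2)  # 1 of 399 packets, about 0.25 %
```

With one late packet out of 399 the sketch gives a loss rate of about 0.25%.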
Taking account of possible measurement inaccuracy and errors due to the experimental
environment, all tests were done repeatedly to verify the results.
[Plot "Transmission of Audio Data", series "RTP Packet Delay": delay in microseconds (0 to 100000) over packet number (0 to 350)]
Figure 8.2: Delay for normal RTP packets
8.5 Standard RTP Packet Delay
The first performance test examines the packet delay of regular RTP packets in order to
obtain a reference value for comparison. Figure 8.2 shows a typical output for this test.
The average delay of an RTP packet was measured at 13 ms. However, some packets arrived
significantly later; the maximum delay was measured at 51 ms, while the minimum delay was
only 4 ms. The standard deviation of the delay during the experiment was calculated at
6.1 ms. During this experiment one data packet did not arrive within the preset time
limit; therefore the packet loss rate is 0.25%. In the graph the lost packet is marked by
the high peak (51 ms delay) shortly before the 200th packet. As the audio file is played
during reception, the subjective impression of the result can be reported as well. The
sound file was played continuously without any disturbance, as clearly as it would be
played locally.
[Plot "Transmission of Encrypted Audio Data", series "RTP over DTLS Packet Delay": delay in microseconds (0 to 100000) over packet number (0 to 350)]
Figure 8.3: Delay for RTP over DTLS packets
8.6 RTP over DTLS Packet Delay
The second experiment determines the delay of RTP traffic over DTLS. For this experiment
the same demo applications were used as in the preceding one, with the change that the
programs now initiate RTP over DTLS sessions. Repeated tests showed results similar to
those in figure 8.3. The average delay of an encrypted RTP packet was measured at 34 ms.
However, some packets arrived significantly later; the maximum delay was measured at
92 ms, while the minimum delay was only 9 ms. The standard deviation of the delay during
the experiment was calculated at 7.7 ms. During this experiment one data packet did not
arrive within the preset time limit; therefore the packet loss rate is 0.25%. In the graph
the lost packet is marked by the high peak (92 ms delay) at the beginning. The sound file
was played continuously without any disturbance, as clearly as it would be played locally.
8.7 CPU Usage
In a separate experiment the average CPU load was measured. Since a machine has to
handle both transmission and reception when performing a VoIP call, this experiment was
carried out on a single machine. The 3.06 GHz machine was used for this experiment.
audiotx and audiorx initialised the DTLS connection and transmitted an audio file with a
length of 1:25 minutes. Repeated tests showed that RTP has an average CPU load of 1.45%,
while RTP over DTLS has an average CPU load of 3.4%.
The significant increase is caused by the handshake and the encryption/decryption during
the session. For normal PCs used for VoIP this increase is not a problem, but it could be
a problem for today's generation of handheld devices. Further investigation is therefore
necessary to analyse the impact of the increased CPU load on different terminal devices
such as cell phones.
8.8 Test Summary
The test results of the two experiments show that RTP over DTLS works in an acceptable
manner. The delay of encrypted RTP packets was expected to be higher than the delay
of unencrypted packets, due to encryption/decryption operations and extended packet
overhead by DTLS. The question to be answered was whether the delay of encrypted RTP
packets meets the requirements for VoIP traffic.
In the experiments the delay of unencrypted RTP packets averaged approximately 13 ms,
with a maximum of 51 ms, a minimum of 4 ms and a standard deviation of 6.1 ms. The delay
of encrypted RTP packets averaged approximately 34 ms, with a maximum of 92 ms, a minimum
of 9 ms and a standard deviation of 7.7 ms. The important values in the results of these
experiments are the average delay, the packet loss rate and the standard deviation. The
average delay increases by approximately 20 ms when DTLS encryption is used.
According to the ITU-T a delay of 125 ms is noticeable by humans; it therefore recommends
that delays should not exceed 150 ms. A delay of 200 to 280 ms still satisfies most users,
while delays higher than 300 ms dissatisfy some users, and a delay higher than 400 ms is
unacceptable because most users are dissatisfied [51]. Most of the delay in real scenarios
is caused by the network infrastructure. For a distance of less than 5000 km VoIP
connections are likely to experience a delay smaller than 150 ms. For intercontinental
connections delays in the mid-200 ms range can be expected, which according to the ITU-T
is not a problem because users expect differences compared to regional calls.
Compared directly, the average RTP over DTLS delay is more than twice as long as the
regular RTP delay, but the delays should be set in relation to the ITU-T restrictions. An
average delay increase of 20 ms corresponds to about 13% of the recommended 150 ms limit.
The small increase (1.6 ms) in the standard deviation is a good result as well: it means
that the jitter buffer does not need to be enlarged significantly. Therefore RTP over DTLS
is well suited for the encryption of live media as in VoIP.
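The figures in this summary can be reproduced with a short calculation. The Python lines
below are only a worked check of the arithmetic on the rounded values reported above; the
variable names are chosen for this example.

```python
rtp_mean_ms, dtls_mean_ms = 13.0, 34.0  # average delays from sections 8.5 and 8.6
rtp_std_ms, dtls_std_ms = 6.1, 7.7      # standard deviations of the delays
itu_limit_ms = 150.0                    # recommended upper bound for the delay

mean_increase_ms = dtls_mean_ms - rtp_mean_ms          # 21 ms, i.e. roughly 20 ms
mean_ratio = dtls_mean_ms / rtp_mean_ms                # about 2.6, "more than twice"
budget_share_rounded = 20.0 / itu_limit_ms             # about 0.13 of the 150 ms limit
budget_share_exact = mean_increase_ms / itu_limit_ms   # 0.14 with the unrounded 21 ms
std_increase_ms = round(dtls_std_ms - rtp_std_ms, 1)   # 1.6 ms
```

Using the unrounded difference of 21 ms instead of 20 ms changes the share of the 150 ms
limit from about 13% to 14%, which does not affect the conclusion.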
9 Conclusion and Future Work
9.1 Conclusion
The growing acceptance of VoIP telephony among users also brings new challenges in terms
of security. VoIP calls are threatened by various known attacks since the data is
transported over insecure networks. The security considerations section pointed out these
attacks along with the security goals to achieve. Furthermore, new attacks (e.g. SPIT)
which are enabled by the extended capabilities and new services introduced by VoIP may
threaten the widespread use of this technology in the future. Security is therefore a
major concern in the further development of VoIP services. So far no solution that secures
VoIP calls in an acceptable manner is widely used. The approach of RTP over DTLS has the
potential to overcome the shortcomings of other approaches and to take part in future
developments of a security framework for VoIP. DTLS provides:
Authentication: This allows both participants of the call to verify the identity of the
other party.
Confidentiality: This ensures that the VoIP call cannot be eavesdropped on or understood
by a third party.
Integrity: This allows VoIP applications to detect whether data was modified during
transmission.
Unfortunately, DTLS cannot solve all issues in securing Internet Telephony. Denial of
Service attacks against the SIP infrastructure cannot be prevented by DTLS, since RTP over DTLS
starts only after the SIP exchange that sets up the session has taken place. For the same
reason the approach does not address the issue of SPIT. Authentication could help here,
since SPIT calls could be traced back to their originators, but this would only be possible
if all users used DTLS authentication. This is not feasible yet, since reachability
through traditional phones is still desired.
In this thesis a prototype of RTP over DTLS was implemented and tested in order to demonstrate
the usability of the approach.
The upcoming sections summarise and evaluate the test results of the prototype. Furthermore,
an outlook is given on future work that will be necessary to reach the goal of developing
a unified media security framework for VoIP.
The datagram-capable version of TLS was designed to secure media streaming
without compromising the quality of the streamed media or the widely accepted security
features of TLS. The test results show that the prototype implementation of RTP over DTLS
performs well in comparison to unencrypted RTP. The increase in delay of
approximately 20 ms is within an acceptable range and allows secure communication
without impairing the quality of the VoIP call. These results allow planning the future steps
that need to be taken on the way to a unified media security framework for VoIP, which are
presented in the upcoming section.
9.2 Future Work and Open Issues
The prototype implementation of RTP over DTLS is capable of connection establishment
and data transmission. The performance of the DTLS and RTP components was tested
with acceptable results, but not in detail, so there is potential for improving
performance. Some optimisation may be possible in the connection establishment,
since this part was developed with almost no documentation from the developers
of DTLS in OpenSSL. The next suggested step in further development is improvement
at the DTLS level. H. Tschofenig and E. Rescorla introduced the SRTP compatibility mode
[9]. With the enhancements to RTP over DTLS presented there, performance can be
increased since the overhead is reduced to a value comparable to ZRTP. Subsequently, the
performance of the SRTP compatibility mode of RTP over DTLS can be compared experimentally
with ZRTP in order to evaluate the approach.
The integration into the Twinkle softphone is also not completely finished. This thesis
focuses on contributing to the development of a unified security framework covering all
components in the system. Due to time restrictions, the focus lies on the interaction of the RTP
and DTLS components to provide a basis for further development. A concept for user
interaction with the encryption scheme needs to be designed and integrated
into the softphone and SIP. The challenge here lies in combining an understanding
of what is happening with ease of use in order to achieve acceptance among users.
The management of certificates needs to be integrated into the softphone, along with
user notification about the security state of the connection and proper error handling
upon a possible DTLS handshake failure. Furthermore, at the
SIP (and SDP) side of the framework, RTP over DTLS needs to be integrated into the session
invitation, so that the caller is able to inform the callee about the wish to establish an RTP
over DTLS session when connections are initiated over the SIP network.
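How such a session invitation might look at the SDP level can be sketched as follows. This is an illustration under stated assumptions, not the signalling used by the prototype: the transport token `UDP/TLS/RTP/SAVP` and the `a=fingerprint` and `a=setup` attributes are taken from later IETF specifications for DTLS-secured media and may differ from the indicators proposed in [45]. The function names, addresses and certificate bytes are hypothetical placeholders.

```python
import hashlib


def fingerprint(cert_der: bytes) -> str:
    """Format the SHA-256 hash of a certificate as colon-separated hex."""
    digest = hashlib.sha256(cert_der).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def build_offer(ip: str, port: int, cert_der: bytes, payload_type: int = 0) -> str:
    """Build a minimal SDP offer announcing DTLS-protected audio.

    The transport token and attributes are assumptions borrowed from
    later IETF specifications, not from the thesis prototype.
    """
    lines = [
        "v=0",
        f"o=- 0 0 IN IP4 {ip}",
        "s=-",
        f"c=IN IP4 {ip}",
        "t=0 0",
        # Media line: audio on the given port, static payload type 0 (PCMU).
        f"m=audio {port} UDP/TLS/RTP/SAVP {payload_type}",
        # Binds the certificate used in the DTLS handshake to the signalling.
        f"a=fingerprint:SHA-256 {fingerprint(cert_der)}",
        # Offerer is willing to act as DTLS client or server.
        "a=setup:actpass",
    ]
    return "\r\n".join(lines) + "\r\n"


# Placeholder address (documentation range) and placeholder certificate bytes.
offer = build_offer("192.0.2.1", 5004, b"placeholder certificate bytes")
print(offer)
```

Carrying the certificate fingerprint in the invitation lets the callee check that the certificate presented in the DTLS handshake belongs to the party that sent the SIP request, provided the signalling itself is integrity-protected.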
Bibliography
[1] Christian Friedrich. Schematic representation of the SSL handshake protocol with
two-way authentication with certificates. Wikipedia, the free encyclopedia, 2007.
[Online; accessed August 2007].
[2] N. Modadugu and E. Rescorla. The Design and Implementation of Datagram TLS,
2004.
[3] J. Postel. Internet Protocol. RFC 791 (Standard), 1981. Updated by RFC 1349.
[4] T. Dierks and E. Rescorla. The Transport Layer Security (TLS) Protocol Version 1.1.
RFC 4346 (Proposed Standard), 2006. Updated by RFCs 4366, 4680, 4681.
[5] J. Postel. Transmission Control Protocol. RFC 793 (Standard), 1981. Updated by RFC
3168.
[6] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. RTP: A Transport Protocol
for Real-Time Applications. RFC 3550 (Standard), 2003.
[7] J. Postel. User Datagram Protocol. RFC 768 (Standard), 1980.
[8] E. Rescorla and N. Modadugu. Datagram Transport Layer Security. RFC 4347 (Pro-
posed Standard), 2006.
[9] H. Tschofenig and E. Rescorla. Real Time Transport Protocol (RTP) over Datagram
Transport Layer Security. Internet Draft, February 2006.
[10] Wikipedia. Voice over IP. Wikipedia, the free encyclopedia, 2007. [Online; accessed
22-April-2007].
[11] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks,
M. Handley, and E. Schooler. SIP: Session Initiation Protocol. RFC 3261 (Proposed
Standard), 2002. Updated by RFCs 3265, 3853, 4320, 4916.
[12] H. Schulzrinne and C. Agboh. Session Initiation Protocol (SIP)-H.323 Interworking
Requirements. RFC 4123 (Informational), 2005.
[13] T. Berson. Skype security evaluation, October 2005.
[14] R. Hancock, G. Karagiannis, J. Loughney, and S. Van den Bosch. Next Steps in Sig-
naling (NSIS): Framework. RFC 4080 (Informational), 2005.
[15] E. Rescorla. HTTP Over TLS. RFC 2818 (Informational), May 2000.
[16] R. Rivest, A. Shamir, and L. M. Adleman. Cryptographic communications system and
method. US Patent 4405829, 1977.
[17] P. Karn and W. Simpson. Photuris: Session-Key Management Protocol. RFC 2522
(Experimental), 1999.
[18] D. Brezinski and T. Killalea. Guidelines for Evidence Collection and Archiving. RFC
3227 (Best Current Practice), 2002.
[19] H. Schulzrinne. The tel URI for Telephone Numbers. RFC 3966 (Proposed Standard),
2004.
[20] Voice over misconfigured internet telephones - (vomit). http://vomit.xtdnet.nl/.
[21] M. Baugher, D. McGrew, M. Naslund, E. Carrara, and K. Norrman. The Secure Real-
time Transport Protocol (SRTP). RFC 3711 (Proposed Standard), 2004.
[22] P. Zimmermann, A. Johnston (Ed.), and J. Callas. ZRTP: Media Path Key Agreement
for Secure RTP. Internet Draft, 2007.
[23] André Adelsbach, Mark Manulis, and Jörg Schwenk. VoIPSEC Studie. Technical report,
Bundesamt für Sicherheit in der Informationstechnik.
[24] S. Kent and K. Seo. Security Architecture for the Internet Protocol. RFC 4301 (Pro-
posed Standard), 2005.
[25] S. Kent. IP Encapsulating Security Payload (ESP). RFC 4303 (Proposed Standard),
2005.
[26] C. Kaufman. Internet Key Exchange (IKEv2) Protocol. RFC 4306 (Proposed Stan-
dard), 2005.
[27] E. Rescorla. Diffie-Hellman Key Agreement Method. RFC 2631 (Proposed Standard),
1999.
[28] D. Maughan, M. Schertler, M. Schneider, and J. Turner. Internet Security Association
and Key Management Protocol (ISAKMP). RFC 2408 (Proposed Standard), 1998.
Obsoleted by RFC 4306.
[29] D. Piper. The Internet IP Security Domain of Interpretation for ISAKMP. RFC 2407
(Proposed Standard), 1998. Obsoleted by RFC 4306.
[30] H. Orman. The OAKLEY Key Determination Protocol. RFC 2412 (Informational),
1998.
[31] H. Krawczyk. SKEME: A versatile secure key exchange mechanism for Internet. In
Proceedings of the 1996 Symposium on Network and Distributed System Security (SNDSS
’96), 1996.
[32] Wikipedia. IPsec. Wikipedia, the free encyclopedia, 2007. [Online; accessed June
2007].
[33] S. Kent. IP Authentication Header. RFC 4302 (Proposed Standard), 2005.
[34] Whitfield Diffie, Paul C. van Oorschot, and Michael J. Wiener. Authentication and
authenticated key exchanges. Designs, Codes and Cryptography, 2(2):102–125, 1992.
[35] S. Casner and V. Jacobson. Compressing IP/UDP/RTP Headers for Low-Speed Se-
rial Links. RFC 2508 (Proposed Standard), 1999.
[36] C. Bormann, C. Burmeister, M. Degermark, H. Fukushima, H. Hannu, L-E. Jons-
son, R. Hakenberg, T. Koren, K. Le, Z. Liu, A. Martensson, A. Miyazaki, K. Svanbro,
T. Wiebke, T. Yoshimura, and H. Zheng. RObust Header Compression (ROHC):
Framework and four profiles: RTP, UDP, ESP, and uncompressed. RFC 3095 (Pro-
posed Standard), 2001. Updated by RFCs 3759, 4815.
[37] T. Koren, S. Casner, J. Geevarghese, B. Thompson, and P. Ruddy. Enhanced Com-
pressed RTP (CRTP) for Links with High Delay, Packet Loss and Reordering. RFC
3545 (Proposed Standard), 2003.
[38] D. Ignjatic, L. Dondeti, F. Audet, and P. Lin. MIKEY-RSA-R: An Additional Mode
of Key Distribution in Multimedia Internet KEYing (MIKEY). RFC 4738 (Proposed
Standard), November 2006.
[39] Joan Daemen and Vincent Rijmen. The Design of Rijndael: AES—The Advanced Encryp-
tion Standard. Springer-Verlag, 2002.
[40] ISO/IEC. Information technology – security techniques – code of practice for infor-
mation security management, June 2005.
[41] N. Modadugu and E. Rescorla. Extensions for DTLS in low bandwidth environments.
draft-rescorla-tls-partial-00, October 2005.
[42] E. Rescorla. TLS partial encryption mode. draft-rescorla-tls-partial-00, October 2005.
[43] A. Kapoor, R. Tschalar, and T. Kause. Internet X.509 public key infrastructure:
transport protocols for CMP. Internet Draft, February 2004.
http://tools.ietf.org/id/draft-ietf-pkix-cmp-transport-protocols-05.txt.
[44] J. Fischl and H. Tschofenig. Session Initiation Protocol (SIP) for media over Transport
Layer Security (TLS), February 2006.
[45] J. Fischl and H. Tschofenig. Session Description Protocol (SDP) indicators for Datagram
Transport Layer Security (DTLS). draft-fischl-mmusic-sdp-dtls-00, February 2006.
[46] The OpenSSL project. http://www.openssl.org.
[47] The GNU ccRTP library. http://www.gnu.org/software/ccrtp/.
[48] The Twinkle softphone project. http://www.twinklephone.com/.
[49] Header file for the AU file format.
http://www.opengroup.org/public/pubs/external/auformat.html.
[50] Gnuplot. http://www.gnuplot.info/.
[51] International Telecommunication Union. Recommendation G.114 - One-way Transmission
Time. Series G: Transmission Systems and Media, Digital Systems and Networks,
May 2003.