Saarland University
Faculty of Natural Sciences and Technology I
Department of Computer Science

Master Thesis
Virtualization of Video Streaming Functions

Submitted by: Birhan Tadele Teklehaimanot
Advisor: Goran Appelquist
Supervisor: Prof. Dr.-Ing. Thorsten Herfet
Reviewers: Prof. Dr.-Ing. Thorsten Herfet, Prof. Dr. Dietrich Klakow

April 25, 2016


Eidesstattliche Erklärung
Ich erkläre hiermit an Eides Statt, dass ich die vorliegende Arbeit selbstständig verfasst und
keine anderen als die angegebenen Quellen und Hilfsmittel verwendet habe. Ich erkläre hiermit
an Eides Statt, dass die vorliegende Arbeit mit der elektronischen Version übereinstimmt.
Statement in Lieu of an Oath
I hereby confirm that I have written this thesis on my own and that I have not used any other
media or materials than the ones referred to in this thesis. I hereby confirm the congruence of
the contents of the printed data and the electronic version of the thesis.
Saarbrücken, April 25, 2016
Birhan Tadele Teklehaimanot
Einverständniserklärung
Ich bin damit einverstanden, dass meine (bestandene) Arbeit in beiden Versionen in die
Bibliothek der Informatik aufgenommen und damit veröffentlicht wird.
Declaration of Consent
I agree to make both versions of my thesis (with a passing grade) accessible to the public by
having them added to the library of the Computer Science Department.
Saarbrücken, April 25, 2016
Birhan Tadele Teklehaimanot
Abstract
Edgeware is a leading provider of video streaming solutions to network and service operators.
The Edgeware Video Consolidation Platform (VCP) is a complete video streaming solution consisting of the Convoy management system and Orbit streaming servers. The Orbit streaming servers are purpose-designed hardware platforms composed of a dedicated hardware streaming engine and a purpose-designed flash storage system. The Orbit streaming server is an accelerated HTTP streaming cache server which has up to 80 Gbps of bandwidth and can stream to 128,000 clients from a single rack unit. In line with the trend of moving more and more functionality towards a virtualized or software environment, the main goal of this thesis is to make a performance comparison between Edgeware's Orbit streaming server and one of the best generic HTTP accelerators (reverse proxy servers) after implementing the Orbit's logging functionality on top of it. This is achieved by implementing test cases for the use cases that can help to evaluate those servers. Finally, after evaluating several proxy servers, Varnish is selected, and the modified Varnish is compared with the Orbit to investigate the performance difference.
Acknowledgements
First and foremost, I would like to express my heartfelt gratitude to my supervisor Prof. Dr.-Ing. Thorsten Herfet for providing me the opportunity to write my thesis with him. My sincere thanks to Goran Appelquist for his patience and invaluable guidance. His constructive suggestions during our periodic discussions helped me gain a wonderful experience while doing my thesis, and I have learned a lot while working with him. Furthermore, I would like to thank my immediate family, specifically my father and my mother, for helping me reach this point even though, in the culture of our village, it is difficult to send girls to school. Last but not least, my wholehearted gratitude goes to my brothers, sisters and my husband for their love, encouragement, endless motivation and support.
Contents
1.2.2 True Streaming Technologies . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2.3 HTTP Adaptive Bitrate Streaming Technologies . . . . . . . . . . . . . . 4
2 Background and Related Works 6
2.1 Content Delivery Networks(CDN) . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1.1 Components of CDN . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Proxy servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3 The Orbit streaming server 12
3.1 Edgeware solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.3 Proposed solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
4.1 Squid . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
4.3 Nginx . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
4.4 Varnish . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.5 Aiscaler . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
4.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
5.1.1 Live test case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5.1.2 100% cache hit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
5.1.3 90% cache hit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
5.2 Implementation of test cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
5.3 Configuration of proxy servers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
5.4 Test Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
5.4.1 Test setup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
6.1 Live test case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
6.2 100% cache hit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
6.3 90% cached . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
6.6 Orbit and modified Varnish performance comparison . . . . . . . . . . . . . . . . 43
6.6.1 Varnish results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
6.6.2 Orbit results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
WMS Windows Media Services
TCP Transmission Control Protocol
IIS Internet Information Services
ATS Apache Traffic Server
1.1 Multimedia Streaming
The Internet was originally designed to support data traffic transmission. Later, in the early 1990s, the need for multimedia transmission emerged due to the growth of the Internet in terms of users, applications and nodes. Nowadays, multimedia content composes a large portion of Internet traffic, as more and more users access multimedia content over the Internet. For instance, the share of mobile video traffic is expected to reach up to 67% in 2017 [7], and it is estimated that multimedia transmission will cover up to 90% of Internet traffic within the next few years [6]. Multimedia includes text, still images, audio, animation and video in an integrated manner.
Multimedia streaming refers to the transmission of multimedia content from a streaming sender to a streaming receiver in compressed form, without downloading the whole content at the receiver device. The basic difference between multimedia streaming and textual data transfer is that multimedia streaming requires real-time delivery but can tolerate a certain amount of data loss. The main components of a multimedia streaming system are the encoder, the streaming server, the streaming client, the media transfer protocol and the underlying physical network.
The minimal set of actions performed by a streaming system is as follows. First, a camera captures and produces either still images or video. The output from the camera can be raw media without any compression; this requires a very large bandwidth, as its size is very large compared to a compressed version. Therefore, the media should be compressed using appropriate compression techniques at the sender end before transmission over the Internet. The compressed media is then stored on the server together with its metadata, which is a description of the media such as location and timing information. When the receiver requests a certain media item, the sender sends that media and its metadata. The media is received in the form of packets and reassembled into the original compressed stream at the receiver end. Then, the decoder takes this compressed stream and decodes the media. Finally, the decoded media is passed to the renderer for display.
In general, depending on the media, streaming can be classified into on-demand and live streaming. In on-demand streaming, media is pre-recorded, compressed and stored on the streaming server, and delivered to clients when requested. In live streaming, on the other hand, media is captured, compressed and transmitted on the fly.
1.2 Streaming Technologies
In this section the background of the most widely used streaming technologies is described in
short.
1.2.1 Traditional HTTP Download Technologies
Traditional HTTP download is the basic technology for transmitting content over the Internet. It uses the HTTP protocol to download the content to the receiver device, which then plays it out locally. The most commonly used traditional HTTP download methods are the following:
1.2.1.1 HTTP Download
HTTP download is a widely used technology for data transfer. When using HTTP download, the content is downloaded and stored on the receiver's device, and playback does not begin until the media is downloaded completely, which may cause delays for large media.
1.2.1.2 HTTP Progressive Download
HTTP progressive download is a widely used media streaming technology based on the HTTP/TCP protocol stack. In progressive download, content is downloaded partially and progressively stored on the user's device [14], which improves on the plain HTTP download method by reducing the delay before playback begins. First, the metadata, which tells the player how to play the media, is downloaded. Playback then begins once the metadata and sufficient data have been buffered on the receiver's device, and the rest of the content continues to be downloaded and saved while the player plays the data that has already arrived. However, there is no bandwidth adaptation, since it does not consider the variation of network conditions between client and server. In addition, it cannot be used to stream live media, as it needs offline preparation, and it can be inefficient in controlling bandwidth from the ISP's point of view [9].
1.2.1.3 HTTP Pseudo Streaming
HTTP pseudo streaming is very similar to HTTP progressive download, except that the player can seek forward or backward even if the content is not yet downloaded. The player uses a byte offset or the number of seconds from the start of the video to find the desired part of the video. The player can buffer the content without saving it on the receiver's device, and it is not mandatory to download the video from start to finish, which means the player can stop the stream and jump to a different point.
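The byte-offset seeking described above can be sketched with an HTTP Range request. The following Python sketch is purely illustrative: the URL is made up, and the constant-bitrate assumption behind seek_offset is a simplification, not how any particular player computes offsets.

```python
from urllib.request import Request

def seek_offset(seconds, bitrate_bps):
    """Translate a seek position in seconds into a byte offset,
    assuming a roughly constant-bitrate encoding."""
    return seconds * bitrate_bps // 8

def range_request(url, start_byte, end_byte=None):
    """Build an HTTP request for a byte range, as a pseudo-streaming
    player does when jumping to a point not yet downloaded."""
    end = "" if end_byte is None else end_byte
    return Request(url, headers={"Range": f"bytes={start_byte}-{end}"})

# Seeking to 60 s in a 2 Mbit/s video starts roughly at byte 15,000,000.
# Passing this request to urllib.request.urlopen() would return a
# 206 Partial Content response if the server honours Range requests.
req = range_request("http://example.com/video.mp4", seek_offset(60, 2_000_000))
```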
1.2.2 True Streaming Technologies
True streaming technologies are the most popular streaming protocols for Flash and Windows media streaming. These technologies create a connection to a dedicated media server, and the content is sent to the end-user device as a series of small packets. True streaming technologies use a stateful protocol, which means that from the first time a client connects to the streaming server until the time it disconnects, the server keeps track of the client's state. Commands like PLAY, PAUSE and STOP can be issued by the end user for playback control, and multi-bitrate delivery is supported. However, it is not common to switch between bitrates once streaming has started. The two most common implementations are Adobe Flash Media Server and Microsoft Windows Media Services.
1.2.2.1 Adobe Flash Media Server
Adobe Flash Media Server (FMS) uses proprietary technology from Adobe Systems (formerly Macromedia) and is a hugely popular streaming platform. FMS is commonly installed on a media or origin server running the Linux operating system; however, FMS is also supported on the Windows Server operating system. FMS supports both stored Video on Demand (VOD) and live media delivery [14].
1.2.2.2 Microsoft Windows Media Services
Microsoft Windows Media Services (WMS) supports both VOD and live media delivery. WMS
is normally installed on a media or origin server running the Windows Server 2003 or 2008
operating system; however, there are proprietary variants which run on non-Windows servers.
WMS has the ability to enforce authentication and impose connection limits. The preferred protocol for WMS is the Real Time Streaming Protocol (RTSP) [14].
1.2.3 HTTP Adaptive Bitrate Streaming Technologies
HTTP adaptive bitrate streaming is currently the most sophisticated method for streaming media delivery [13]. The content is encoded at multiple bitrates, allowing selection among the different encoded versions during streaming based on the available network bandwidth and client resources. For each encoded version, the content is divided into a series of smaller segments or chunks, each 2-10 seconds in length, which are reassembled and played back as a single continuous stream. This makes it very easy for the receiving player to jump forward or backward in the video. Finally, a manifest file is created to act as a table of contents for the segments.
The quality of the segments can be adapted during streaming based on network conditions: the player is able to switch seamlessly between the different fragments at any time during playback, so it can select the desired quality level and adjust automatically based on real-time network conditions. Therefore, during streaming only the relevant segments are requested by the player and sent for reassembly and playback on the receiver device. An additional benefit of HTTP adaptive bitrate streaming is the ability to utilize CDNs to cache video content closer to end-viewers.
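The adaptive rate selection described above can be illustrated with a toy heuristic: pick the highest encoded bitrate that fits within a safety fraction of the measured throughput. This is a sketch, not the algorithm of any specific player; the bitrate ladder and safety margin are invented for the example.

```python
def pick_bitrate(ladder_bps, measured_bps, safety=0.8):
    """Pick the highest encoded bitrate that fits within a safety
    fraction of the measured network throughput."""
    budget = measured_bps * safety
    candidates = [b for b in sorted(ladder_bps) if b <= budget]
    # Fall back to the lowest rendition if even it exceeds the budget.
    return candidates[-1] if candidates else min(ladder_bps)

# A typical ladder: 400 kbit/s up to 6 Mbit/s.
ladder = [400_000, 1_000_000, 2_500_000, 6_000_000]
# With 3 Mbit/s measured, the budget is 2.4 Mbit/s, so 1 Mbit/s wins.
print(pick_bitrate(ladder, measured_bps=3_000_000))  # 1000000
```

A real player re-runs this decision before each segment request, which is what produces the seamless quality switches described above.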
In addition, Apple HTTP Live Streaming, Adobe HTTP Dynamic Streaming, Microsoft IIS Smooth Streaming and MPEG Dynamic Adaptive Streaming over HTTP are the most widely used examples of HTTP adaptive bitrate streaming technologies. Each of them is described briefly as follows.
1.2.3.1 Apple HTTP Live Streaming
Apple HTTP Live Streaming (HLS) is an HTTP-based media streaming technology introduced by Apple in 2009 [15]. In HLS, video and/or audio inputs are typically encoded using H.264/Advanced Audio Coding (AAC), and then the stream segmenter breaks the stream into a series of short segments that are saved as transport stream files (TS files), along with an index file (.m3u8) which indicates the order of the TS files. These TS files are stored on a standard HTTP web server for distribution, along with the URL of the index file. The receiving player begins by fetching the index file using its URL, then reads it to request the appropriate segments, and displays the content without any pauses or gaps as a continuous stream. HLS is optimized for delivery to iOS devices and Safari browsers, but there are solutions for all other platforms as well, with varying quality.
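For illustration, a minimal media playlist (.m3u8) of the kind the segmenter produces might look as follows; the segment names and durations are invented for the example:

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10.0,
segment0.ts
#EXTINF:10.0,
segment1.ts
#EXTINF:10.0,
segment2.ts
#EXT-X-ENDLIST
```

The player fetches this playlist, requests the listed TS files in order, and for a live stream simply re-fetches the playlist as new segments are appended.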
1.2.3.2 Adobe HTTP Dynamic Streaming
HTTP Dynamic Streaming (HDS) is Adobe's method for adaptive bitrate streaming; it supports live and on-demand delivery of MP4 media over regular HTTP connections. HDS allows for adaptive streaming over HTTP to any device that is compatible with Adobe Flash or Adobe Integrated Runtime (AIR). It converts the content into a fragmented MP4 file format and delivers high-definition video/audio using any Flash-compatible codec (H.264/AAC, VP6/MP3). On-demand content is converted using a file packager utility as a post-processing step, whereas live streams are converted in real time using Adobe's live packager utility. An open file specification is used for fragmentation, with .F4M for the manifest (index) file and .F4F for the media segments, which are stored in a single file. Moreover, Adobe's HTTP Origin Module is installed on a standard HTTP web server to handle fragment requests, and the streams are played using Adobe Flash Player 10.1 or AIR.
1.2.3.3 Microsoft IIS Smooth Streaming
IIS (Internet Information Services) Smooth Streaming is Microsoft's HTTP adaptive streaming technology, which uses Microsoft Silverlight as an application framework similar to Adobe Flash. It supports multiple audio and video codecs. IIS is installed on an origin server running Windows Server 2008, and Smooth Streaming is a plug-in software module for IIS. In IIS, media content is stored as a single file, with the segments stored as fragmented MP4 within the file [14].
1.2.3.4 MPEG-Dynamic Adaptive Streaming over HTTP
MPEG Dynamic Adaptive Streaming over HTTP (DASH) is a standard defined by MPEG to enable interoperability between servers and clients of different vendors. It is a generic solution based on HLS, HDS and Microsoft IIS Smooth Streaming. DASH was introduced in order to standardize these proprietary solutions and provide interoperability among them. As a result, client applications can use all the streaming formats of the different proprietary solutions [10]. DASH, like HLS, HDS and Microsoft IIS Smooth Streaming, uses the concept of segments and the equivalent of a playlist or manifest file, known as a Media Presentation Description (MPD) file. Moreover, DASH can treat the video stream as a single file (it does not have to create segment files), so the MPD file points to offsets in the origin file rather than to segment files [13].
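As an illustration of an MPD that points into a single origin file rather than to separate segment files, a minimal static manifest might look roughly as follows. The URL, durations and index range are invented, and real manifests carry considerably more attributes:

```
<?xml version="1.0"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static"
     profiles="urn:mpeg:dash:profile:isoff-on-demand:2011"
     mediaPresentationDuration="PT30S" minBufferTime="PT2S">
  <Period>
    <AdaptationSet mimeType="video/mp4">
      <Representation id="720p" bandwidth="2500000" width="1280" height="720">
        <BaseURL>video_720p.mp4</BaseURL>
        <!-- The segment index inside the file maps time to byte ranges,
             so the client fetches segments with HTTP Range requests. -->
        <SegmentBase indexRange="0-999"/>
      </Representation>
    </AdaptationSet>
  </Period>
</MPD>
```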
Chapter 2
Background and Related Works
In this chapter, the concept of content delivery networks (CDN) is explained along with its components. Furthermore, the concepts of proxy servers and how caching works with reverse proxy servers are introduced briefly to ensure a better understanding of this work.
2.1 Content Delivery Networks(CDN)
The high bandwidth requirements and rate variation of videos in compressed format introduce some challenging issues for end-to-end delivery over wide area networks. As a consequence, Content Delivery Networks (CDNs) have evolved in order to overcome these challenges and improve the accessibility of the Internet. A CDN is a large, geographically distributed network of specialized servers that accelerate the delivery of web content and rich media to Internet-connected devices [8].
The main concept behind this technology is delivery at edge points of the network, in proximity to the request areas, to improve the user's perceived Quality of Service (QoS) when accessing Web content. CDNs use edge caching, which entails storing replicas of static text, image, audio and video content, including various forms of interactive media streaming, on multiple servers around the "edges" of the Internet, so that user requests can be served by a nearby edge server rather than by a far-off origin server. The purpose of caching is to reduce network traffic to a minimum; this is achieved by delivering content from caches as close to the requesting user as possible, but also by ensuring the delivery device has effectively cached the content from previous requests [14]. CDNs typically host static content including images, video, media clips, advertisements and other embedded objects for dynamic Web content. Typical customers of a CDN are media and Internet advertisement companies, data centers, Internet Service Providers (ISPs), online music retailers, mobile operators, consumer electronics manufacturers and other carrier companies.
Figure 2.1: CDN Architecture [8].
2.1.1 Components of CDN
Most CDN architectures are constructed from the following key components:
• Content delivery component: contains the origin server and a set of edge servers (cache servers) that replicate the content and are deployed as near as possible to the users. The origin servers are the master sources of the content and can be deployed within the operator's network or, more commonly, within a content owner's infrastructure. The primary purpose of the content delivery component is to deliver data to end users.
• Content distribution component: moves content from the origin server to the cache servers and ensures consistency. These can be deployed in a hierarchical model to allow tiered caching and to protect the origin servers.
• Request-routing component: directs user requests to cache servers and interacts with the distribution component to keep the content fresh.
• Accounting component: maintains logs of client accesses and records usage of the servers, which assists in traffic reporting and usage-based billing.
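As an illustration of the request-routing component, a toy router might map each client region to the "nearest" edge server. The server names and distance metric below are invented; real CDNs route via DNS, anycast or HTTP redirection using much richer metrics such as load, liveness and cost.

```python
def route_request(client_region, edges):
    """Pick the edge server with the smallest 'distance' to the client.
    A real request-routing component would also consult the
    distribution component about content freshness and server health."""
    return min(edges, key=lambda e: e["distance"][client_region])["name"]

# Hypothetical edge inventory with per-region distance scores.
edges = [
    {"name": "edge-eu", "distance": {"eu": 1, "us": 5}},
    {"name": "edge-us", "distance": {"eu": 5, "us": 1}},
]
print(route_request("eu", edges))  # edge-eu
```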
2.2 Proxy servers
A proxy server is an intermediary server that intercepts requests from clients seeking resources from different servers across the Internet. Those resources can be images, files, web pages, video, audio, etc. A proxy server facilitates communication between clients and servers and can filter requests based on various rules. It can allow or reject communications by validating the requests against the available rules. There are different kinds of proxy servers; three basic types are described here, although the focus of this work is on reverse proxy servers.
2.2.1 Forward proxy servers
A forward proxy server intermediates traffic between a client and the destination chosen by the client. It enables a client to connect to a remote network to which it normally does not have access. Moreover, it can also be used to cache data, reducing load on the networks between the forward proxy and the remote web server. A forward proxy cache needs explicit configuration of the browser to direct all requests to the proxy cache rather than to the target web server.
2.2.2 Transparent proxy servers
A transparent proxy cache achieves the same goal as a forward proxy cache, but operates transparently to the browser: the browser does not need to be explicitly configured to access the cache. Transparent caches are especially useful to ISPs, because they require no browser setup modification. Moreover, they are the simplest way to use a cache internally on a network, because they do not require explicit coordination with other caches. However, many services like YouTube are currently trying to prevent the use of transparent proxies, since they want full control of the communication between the service providers and their clients.
2.2.3 Reverse proxy servers
A reverse proxy, also known as a web server accelerator, is an intermediary server which stores responses from the origin server (a server that contains the content) in its cache and serves subsequent requests for the same content from this cache. It proxies on behalf of servers and appears to the end users as the origin server. The origin servers are never accessed directly from outside, since every request for the origin server passes through the reverse proxies. When a client requests some content, DNS routes the request to the reverse proxy server instead of the origin server. The reverse proxy checks for the content in its cache; if it is not there, it connects to the origin server, fetches the requested content into its cache and serves the user. Requested content can be fetched from one or more origin servers, but to the user it looks like the content of a single server. Reverse proxy servers check the validity of the stored data using additional HTTP headers received from the origin server. In addition, the origin server controls, via HTTP headers, whether given content should be cached by the proxy server or not.
When it receives a request on behalf of a server, the reverse proxy checks whether the requested data is in the cache and still valid. If the content is not in the cache, it forwards the request to the origin server. If the data is in the cache but no longer valid, it deletes the content from the cache and forwards the request to the origin server. On the other hand, if the data is in the cache and still valid, the reverse proxy serves the requested data to the client from its cache. When it receives a response from the origin servers, it also checks whether the response is cacheable before storing the content in its cache.
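The decision procedure above can be sketched as follows. This is a schematic model, not the logic of any particular proxy: freshness is reduced to a single expiry timestamp, and the origin is a stub function invented for the example.

```python
import time

class ReverseProxyCache:
    """Toy reverse-proxy cache: serve fresh hits locally, evict stale
    entries and refetch, and forward misses to the origin."""

    def __init__(self, fetch_from_origin):
        self.store = {}                     # url -> (expires_at, body)
        self.fetch = fetch_from_origin

    def get(self, url):
        entry = self.store.get(url)
        if entry and entry[0] > time.time():
            return entry[1]                 # cache hit, still fresh
        self.store.pop(url, None)           # stale or missing: drop it
        expires_at, body = self.fetch(url)  # forward to the origin
        self.store[url] = (expires_at, body)
        return body

# Origin stub: every response is cacheable for 60 seconds.
calls = []
def origin(url):
    calls.append(url)
    return time.time() + 60, f"body-of-{url}"

cache = ReverseProxyCache(origin)
cache.get("/a"); cache.get("/a")
print(len(calls))  # 1 — the second request was a cache hit
```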
Reverse proxy reduces load on the origin server rather than reducing upstream network band-
width on the client side as forward and transparent proxy servers. Reverse proxy handles all
traffics before it can reach the origin server by sitting between the client and the origin server.
Reverse proxy servers are used to reduce bandwidth usage and improve performance by stor-
ing static contents like images, videos, audios, etc on its cache and then serving users without
going to the origin server. This can also help to offload a very busy server and to reduce the
response time and enhance customer’s browsing experience. Moreover, proxy servers protect
origin servers and act as additional defence against security attacks because they intercept
requests to the origin servers.
2.2.3.1 How caching works on reverse proxy
Clients always use HTTP when talking to a caching proxy (reverse proxy), even if the requested resource is an FTP transfer.
• Is it cacheable?
A response is called cacheable if it can be used to answer a future request. A cache decides whether a particular response is cacheable or not by checking certain parts of the request and response. In particular, it checks the following: the response status code, the request method, the response cache-control directives, a response validator and request authentication. Moreover, in some caches, valuable (frequently requested) responses are more likely to be cached than those requested only once. The most important HTTP header fields used by a reverse proxy to check the validity of cached content and whether a response from the origin is cacheable are:
– Last-Modified: tells the proxy when the file was last modified.
– Expires: tells the proxy when to drop the file from the cache.
– Cache-Control: tells the proxy if the file should be cached.
– Pragma: also tells the proxy if the file should be cached.
• Definition of cache hit and miss
When a cache receives a request, it checks to see if the response has been cached. If it is found in the cache, we call it a cache hit; otherwise we call it a cache miss. When the object is found, the cache has to decide whether the stored copy is fresh or stale. A cached response is called fresh if its expiration time has not been reached; otherwise it is stale. A fresh response is sent to the client immediately; a stale response, on the other hand, requires validation from the origin server.
The hit ratio is used to measure the effectiveness of a cache. It refers to the percentage of requests that are satisfied as cache hits, and it usually includes both validated and non-validated hits.
• Cache replacement policies
Cache replacement refers to the process of removing old responses when the cache is full and space is needed for new ones. Usually the cache assigns some kind of value to each cached object, and the least valuable objects are removed first, although the definition of "valuable" differs from cache to cache. Typically, an object's value is related to the probability that it will be requested again, thus maximizing the hit ratio. The best-known cache replacement algorithms according to caching researchers are listed below.
1 Least Recently Used (LRU): the most popular replacement algorithm, providing high performance in almost all situations. It removes the objects that have not been used for a long time. It can be implemented with a simple list: every time an object is accessed, it moves to the top of the list, so the least recently accessed object automatically ends up at the bottom.
2 First In First Out (FIFO): even simpler to implement than LRU. Objects are removed in the order in which they were added to the cache.
3 Least Frequently Used (LFU): similar to LRU, but instead of selecting objects only by time since last access, it also considers the number of accesses as a significant parameter. LFU replaces objects with a small access count and keeps objects with a high access count.
4 Size: a size-based algorithm uses the object size as the primary removal criterion, so the largest object is removed first from the cache. However, it needs an additional mechanism that measures how long an object has stayed in the cache in order to remove old objects first; otherwise, the cache will end up holding only small objects.
5 GreedyDual-Size (GDS): assigns a value to every object based on the cost of a cache miss and the size of the object. Since GDS does not specify what "cost" means, it offers a lot of flexibility to optimize for different goals. For example, cost can be defined as the latency, i.e. the time it takes to receive the response; it can also be defined as the number of packets transmitted over the network, or the number of hops between the origin server and the cache.
6 GreedyDual-Size-Frequency (GDSF): proposed to maximize the hit and byte-hit rates of WWW proxies. This caching strategy incorporates the main characteristics of a file, such as file size, file access frequency and recency of the last access. It is an improvement on the GreedyDual-Size algorithm, the current champion among the replacement strategies proposed for Web proxy caches. In general, GDSF-like replacement policies emphasizing frequency have a better byte-hit ratio but result in a worse file hit ratio.
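To make the LRU policy above concrete, here is a minimal sketch using Python's ordered dictionary; the capacity and keys are invented for the example, and a production cache would track sizes and expiry as well.

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: accessing an object moves it to the 'top';
    when full, the least recently accessed object is evicted."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()          # oldest entry first

    def get(self, key):
        if key not in self.items:
            return None                     # cache miss
        self.items.move_to_end(key)         # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1); cache.put("b", 2)
cache.get("a")                              # "a" is now most recent
cache.put("c", 3)                           # evicts "b", not "a"
print(sorted(cache.items))  # ['a', 'c']
```

FIFO would drop the `move_to_end` calls, and LFU would replace the recency ordering with per-object access counters.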
Chapter 3
The Orbit streaming server
This chapter starts with an introduction to Edgeware's solutions; then the Orbit server's features and functionalities, which are mainly based on Edgeware's marketing material, are explained. Finally, we describe the solution proposed in this thesis.
3.1 Edgeware solutions
Edgeware is a leading provider of video streaming solutions to network and service operators.
It has a special platform for providing this solutions called Edgeware Video Consolidation
Platform(VCP). VCP is a highly accelerated and consolidated platform which significantly
reduces the high infrastructure costs required for the delivery of TV and video services [11].
VCP is a highly scalable platform to deliver a high quality video services to any screen, across
any network topology. It supports all major adaptive streaming frameworks such as Microsoft,
Apple and Adobe. It also provides CDN (described in 2.1) video delivery management functions.
VCP reduces capital expenditure of a company by at least 50 % [11]. As shown in 3.1 below, VCP
contains a highly accelerated video origin, called the VCP Origin, and a widely deployed operator
Content Delivery Network (CDN) solution, called the VCP Edge. The VCP Origin solution
Figure 3.1: Components of Edgeware Video Consolidation Platform [11]
12
Chapter 3. The Orbit streaming server 13
reduces complexity, performance requirements and cost of the origin servers by offloading all
recording, ingest, re-packaging and play out capacity [11]. Therefore, load balancers or complex
file systems are not required. The VCP Edge is based on Edgeware’s widely deployed Distributed
Video Delivery Network (D-VDN) solution, incorporated into the new Video Consolidation
Platform [11]. VCP Edge is the main part of Edgeware video consolidation platform and it
addresses the network infrastructure costs of networks service providers and operators which
are growing due to the rapid increase of TV and video demand over the Internet. Moreover,
VCP Edge is a highly optimized CDN caching and distribution solution for a wide range of
service applications. It is designed to deliver next generation video services with the highest
Quality of Experience (QoE) and scalability to any screen [11]. In addition, VCP Edge is
easily integrated with any content management, middleware, conditional access and resource
management systems. The VCP Edge has two fully integrated components: the Orbit hardware
platform and the Convoy management software. The Convoy management software allows an operator
to set up and manage a complete video delivery network across any network topology. It ensures
efficient and effective configuration, content management, license control, session management,
monitoring, and an open integration framework in close integration with the optimized Orbit
Delivery Servers.
3.2 Edgeware Orbit server
The Orbit servers are fully integrated with Convoy Management Software, providing highly
scalable asset propagation, session management and fault tolerance. The Orbit servers offer
advanced capabilities for operators and content providers to offer a full range of Cloud TV and
video services, irrespective of network topology and core bandwidth. These servers use a
combination of a dedicated hardware streaming engine and a purpose-designed flash-based storage
system, coupled with a Linux-based control plane, to deliver up to 80 Gbps or 128,000 streams
from a single unit.
The main functionalities of the Orbit server are ingest, repackaging, encryption, caching and
streaming; repackaging and encryption are done just in time. The Orbit platform also has
functionalities such as session handling, logging and a backend selector.
• Backend selector: in Edgeware, the origin servers that contain the content are organized
into server groups, each containing a set of nodes that model physical computers. Each node
contains a set of IP addresses that model network interfaces, and the server groups model
data centers. The main functionality of the backend selector is load balancing and fail-over.
The load can be spread randomly over the servers in a group, or based on the requested
content to optimize cache utilization.
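The grouping and selection logic described above can be sketched as follows; the class name, the hashing scheme and the node layout are illustrative assumptions, not Edgeware's actual implementation:

```python
import hashlib
import random

class ServerGroup:
    """Models a data center: a group of nodes, each with interface IPs."""

    def __init__(self, nodes):
        # nodes: dict mapping node name -> list of interface IP addresses
        self.nodes = nodes

    def healthy_nodes(self, down=()):
        # Fail-over: skip nodes currently reported as down.
        return [n for n in self.nodes if n not in down]

    def pick_random(self, down=()):
        # Spread the load randomly over the servers in the group.
        return random.choice(self.healthy_nodes(down))

    def pick_by_content(self, url, down=()):
        # Content-based selection: hashing the requested URL maps the
        # same content to the same node, optimizing cache utilization.
        nodes = sorted(self.healthy_nodes(down))
        digest = int(hashlib.md5(url.encode()).hexdigest(), 16)
        return nodes[digest % len(nodes)]

group = ServerGroup({"node-a": ["10.0.0.1"], "node-b": ["10.0.0.2"]})
# The same URL always selects the same healthy node.
assert group.pick_by_content("/asset1/frag42.ts") == group.pick_by_content("/asset1/frag42.ts")
```

Content hashing trades perfectly even load for cache locality; the random mode does the opposite, which is why both spreading strategies are offered.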
• Session handling: a module used to limit access to content, to set the maximum number of
TCP connections per session (client), and to group related requests, e.g. during fragmented
streaming, when a client sends HTTP requests a few seconds apart.
• Session logger: logging is enabled together with the session handling module, and it
records:
– TIMESTAMP: Time, relative to epoch, when gathering of data for the sample
started.
– DURATION: Duration from first request to the last transfer.
– SENDTIME: The time spent streaming from the video server.
– IP: IP address of the client (remote host) that initiated the session.
– SESSIONID: The identifier of a session.
– CONTENT: URI of the initial request.
– BYTES: Number of bytes transferred from the video server.
– REFERRER: The Referer HTTP request header, if provided by the client.
– USERAGENT: The User-Agent HTTP request header, if provided by the client.
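A session log record with the fields above could be assembled as in this sketch; the field order and the tab separator are assumptions for illustration, not the Orbit server's actual log format:

```python
import time

def session_log_line(session):
    """Render one session record with the fields listed above,
    tab-separated; '-' stands for a missing optional header."""
    fields = [
        str(session["timestamp"]),     # TIMESTAMP: epoch-relative sample start
        str(session["duration"]),      # DURATION: first request to last transfer
        str(session["sendtime"]),      # SENDTIME: time spent streaming
        session["ip"],                 # IP: client (remote host) address
        session["session_id"],         # SESSIONID: session identifier
        session["content"],            # CONTENT: URI of the initial request
        str(session["bytes"]),         # BYTES: bytes transferred
        session.get("referrer", "-"),  # REFERRER, if provided by the client
        session.get("useragent", "-"), # USERAGENT, if provided by the client
    ]
    return "\t".join(fields)

line = session_log_line({
    "timestamp": int(time.time()), "duration": 12.4, "sendtime": 11.9,
    "ip": "192.0.2.10", "session_id": "s-0001",
    "content": "/asset1/fragment0.ts", "bytes": 135168,
})
```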
3.3 Proposed solution
The main aim of this thesis is to replace the Orbit server hardware with a software
implementation and then assess the impact on performance in the Edgeware solution. To do this,
we first select three proxy servers based on their general behaviour and then implement test
cases for the use cases that help to evaluate those three proxy servers. Finally, after
evaluating the three proxy servers, we select the one to be used as a cache server and implement
some of the Orbit server functionality explained in Section 3.2 on top of this selected web
server (reverse proxy server).
Chapter 4
General comparisons of HTTP reverse proxy servers
HTTP reverse proxy servers are proxy servers that intercept HTTP requests coming from clients,
as described in Section 2.2.3. In this chapter the most commonly used HTTP reverse proxy
servers, namely Squid, Apache Traffic Server, Nginx, Varnish and aiCache, are compared in
general terms based on performance, flexibility and license. However, it is not easy to find
up-to-date research papers on reverse proxy servers, and every proxy server adds new features
and functionalities over time; comparing these reverse proxy servers is therefore not an easy
task. Moreover, their performance depends on their implementation, their architecture and the
room they leave for optimization. In this chapter the pros and cons of each reverse proxy server
are described one by one. In addition, we tried to find benchmarks on the performance of these
servers from companies currently using them, where available. Finally, we have selected three
reverse proxy servers for further evaluation and testing.
4.1 Squid
Squid originated from the Harvest project in the 1990s and is the oldest and best-known of the
popular HTTP reverse proxy servers [2]. It is open source software licensed under the GNU
GPL and it supports HTTP, HTTPS, FTP. Squid offers a rich access control, authorization
and logging environment. It runs on many platforms including Linux, FreeBSD, and Microsoft
Windows, and it typically runs as a single-process, single-threaded, asynchronous event
processor. Squid stores content in RAM until the RAM is full and then on disk; therefore, the
RAM size and disk speed are important factors for its performance. Thousands of websites
around the Internet use Squid to considerably speed up their content delivery [2].
The cache replacement algorithms used in Squid are LRU (described in Section 2.2.3.1);
Greedy-Dual Size Frequency (GDSF), which keeps smaller objects in cache; and Least Frequently
Used with Dynamic Aging (LFUDA), which keeps popular objects in cache regardless of their size
and thus optimizes byte hit rate at the expense of hit rate, since one large, popular object
will prevent many smaller, slightly less popular objects from being cached.
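The difference between GDSF and LFUDA comes down to whether object size enters the eviction key. Below is a minimal sketch of both keys, with the retrieval cost fixed at 1 (a common simplification); the class design is illustrative, not Squid's implementation:

```python
class AgingCache:
    """Evict the object with the smallest key; the cache age L inflates
    new keys so long-resident objects eventually age out (dynamic aging)."""

    def __init__(self, policy):
        self.policy = policy   # "GDSF" or "LFUDA"
        self.age = 0.0         # L: key of the most recently evicted object
        self.objects = {}      # name -> {"freq": F, "size": S, "key": K}

    def _key(self, freq, size):
        if self.policy == "GDSF":
            return self.age + freq / size  # favors small objects -> hit rate
        return self.age + freq             # LFUDA ignores size -> byte hit rate

    def access(self, name, size):
        obj = self.objects.setdefault(name, {"freq": 0, "size": size})
        obj["freq"] += 1
        obj["key"] = self._key(obj["freq"], obj["size"])

    def evict(self):
        victim = min(self.objects, key=lambda n: self.objects[n]["key"])
        self.age = self.objects[victim]["key"]  # dynamic aging step
        del self.objects[victim]
        return victim

gdsf = AgingCache("GDSF")
gdsf.access("small", size=1)
gdsf.access("large", size=100)
# With equal access frequency, GDSF evicts the large object first.
assert gdsf.evict() == "large"
```

Under LFUDA the same two objects would have equal keys, so the large, popular object is kept just as readily as the small one, which is exactly why LFUDA trades hit rate for byte hit rate.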
4.1.1 Pros and cons
Squid has the following advantages:
• Caching of static objects. These are served much faster, assuming that your cache size is
big enough to keep the most frequently requested objects in the cache.
• Buffering of dynamic content
• Nonlinear URL space/server setup. Squid can be used to do some tricks with the URL
space and/or domain-based virtual server support.
• Features: Squid is richer in features than any other available reverse proxy server
The disadvantages are:
• Buffering limit for log records: Squid cannot keep more than 64 KB of log records in its buffer.
• Speed: Squid is not very fast compared with the other reverse proxy servers available today.
There is a reason to use Squid only if you rely on a lot of dynamic features, and then only if
the application and the server are designed with caching in mind.
• Memory usage: Squid uses quite a bit of memory; it can grow to three times the limit provided
in the configuration file.
• Stability: Compared to the other reverse proxy servers, Squid is not the most stable.
• Scalability: Squid's scalability on modern multi-core systems is limited, since it runs as a
single-process, single-threaded, asynchronous event processor.
4.2 Apache traffic server
Apache Traffic Server was originally developed by Inktomi and later donated to the Apache
Software Foundation (ASF) by Yahoo. It is a fast, scalable and feature-rich proxy server [1]
with plugin APIs for developing extensions. As a multi-threaded, event-driven server, it
combines asynchronous event processing and multi-threading to deal with concurrency. Apache
Traffic Server draws benefits from both technologies, but this also makes the code complex and
sometimes difficult to understand. Apache Traffic Server is free and open source software with
robust plugin APIs to extend and modify its behaviour and functionality, and it scales very
well on modern multi-core systems because it is a multi-threaded, event-driven proxy server.
There are a small number of “worker threads” in Apache traffic server; each such worker thread
is running its own asynchronous event processor. In a typical setup, this means Traffic Server
will run with around 20-40 threads only. This is configurable, but increasing the number of
threads above the default (which is 3 threads per CPU core) will yield worse performance due
to the overhead caused by more threads [18]. In ATS, the cache eviction algorithms supported by
the RAM cache are LRU, LFU and Clocked Least Frequently Used by Size (CLFUS), which balances
recency, frequency and size to maximize the hit rate. The default algorithm is CLFUS, but the
user can select another in the ATS configuration. Besides, Apache Traffic Server uses a FIFO
algorithm to update its disk cache.
In the Yahoo CDN, Apache Traffic Server delivers 350,000 requests per second and 30 Gbps (95th
percentile) across around 100 servers distributed over the whole world. Additionally, in their
lab they measured 105,000 requests per second out of one cache for small content and 3.6 Gbps
out of one server for large content. Comcast also uses ATS in their CDN.
4.2.1 Pros and cons
The advantages of using Apache Traffic Server:
• It is very scalable, needs little configuration and can work in many modes
• It uses an efficient storage subsystem
The disadvantages are:
• It is less stable than the others.
• It needs a restart in some cases
4.3 Nginx
Nginx is an HTTP web server that can also function as an HTTP reverse proxy server. It is free
and open source software with many plug-ins, released under a BSD-like license.
Nginx uses multiple event-driven worker processes to handle concurrency, which requires little
CPU [3]. In addition to HTTP, it can proxy several other TCP protocols, and it also has a
flexible plugin interface for extending and adding to its behaviour and functionality. It is
also well documented and widely deployed compared to the other reverse proxy servers. Nginx
uses a persistent disk-based cache, and the OS page cache keeps objects in RAM. Moreover,
Nginx uses the LRU cache replacement policy to evict contents from its cache when the
configured cache size is exceeded.
4.3.1 Pros and Cons
• It has high performance and is stable, with simple configuration.
• It consumes less CPU power and memory
Disadvantages:
• Nginx requires a recompile of the entire application to add new plugin modules.
• It has some latency in accepting new connections
• Storage time is unlimited
4.4 Varnish
Varnish is free and open source software licensed under the two-clause BSD license. It was
initiated in 2005 and the first version was released in 2006 [18]. It focuses mainly on
performance and flexibility. Varnish was originally designed as a reverse proxy with the
principles of solving real problems, optimizing for modern hardware (64-bit, multi-core, etc.)
and modern workloads, working with the kernel rather than against it, and innovation rather
than regurgitation [19]. It takes advantage of modern kernel features to simplify the code.
Moreover, Varnish does not keep track of whether your cache is on disk or in memory. Instead,
Varnish requests a large chunk of memory and leaves it to the operating system to figure out
where that memory really is; the operating system can generally do a better job than a
user-space program. This gives a simpler design and reduces the amount of work Varnish needs
to do, but it sacrifices portability. For example, on a 32-bit system the virtual memory
address space is limited to 3 GB, which limits the cache size and the number of concurrent
users [4].
Varnish is developed and tested on GNU/Linux and FreeBSD, and its development is governed by
the Varnish Governance Board (VGB). Varnish moves a lot of complexity to the kernel by using
advanced operating system features such as accept filters, epoll and kqueue. All caching is
done using virtual memory provided by the operating system, and each active connection uses up
a thread. Besides, Varnish uses the LRU cache replacement algorithm both in RAM and on disk to
remove contents from the caches.
Varnish uses its own domain-specific configuration language, the Varnish Configuration
Language (VCL), which is translated to C code, compiled with a normal C compiler and then
dynamically linked directly into Varnish at run-time. VCL is lightning fast and gives freedom
to system administrators, allowing them to define their own policies rather than being
constrained by the Varnish developers. Varnish also supports modules called VMODs, which make
it easy to extend Varnish, add new functionality, or integrate Varnish with other software
such as databases or other network services; examples are integration with GeoIP databases or
device detection for mobile users.
Varnish has two processes, a parent and a child. The parent starts the child process when the
varnishd daemon starts and restarts it if it dies for any reason.
Varnish contains different subroutines such as vcl_recv, vcl_fetch, vcl_pipe, vcl_pass,
vcl_hit, vcl_miss, and vcl_error, but most VCL tasks can be performed in:
• vcl_recv: receives requests, parses them, and decides whether to serve from the cache or a
backend, etc. It is also able to alter the request headers.
• vcl_fetch: this subroutine is called when an object is retrieved from a backend. The basic
operations here are to change headers and to change the backend if the previous one was
unhealthy.
Varnish Plus is the commercial version of Varnish, which contains all the features of Varnish
plus additional ones. Varnish Plus has a measured performance of up to 20 Gbit per second per
server for video and audio streaming, and it can stream to as many as 6,500 users from a
single server [20].
The main advantages of Varnish are:
• It is very flexible compared to other reverse proxy servers due to its own configuration
language, VCL.
• Varnish gives access to very detailed logs that are useful when debugging problems, at no
extra cost.
• Developers can implement their own policies using VCL.
• It has modules, called VMODs, which help extend its functionality.
Disadvantages of Varnish:
• It opens a new thread for every connection, relying on the operating system to cope with the
resulting number of threads.
• It is not portable to all systems because it is designed for modern hardware (64-bit,
multi-core, etc.).
4.5 Aiscaler
aiScaler is not a pure caching solution; rather, it is an all-in-one application delivery
controller (ADC) solution which is normally installed as a reverse proxy on a dedicated
machine. Some features of aiScaler are caching, SSL offloading, DDoS protection, multiplexed
session management, mobile device detection and IP-based geo-content delivery. aiScaler is
easy to configure and creates a better user experience by increasing the speed and
availability of a web site: it offloads request processing from the web servers, reduces code
complexity and reduces the cost of servers, space, power and cooling. aiCache is a Linux
application, custom-written in C. It is a right-threaded application, which means it uses a
limited number of threads (processes).
4.5.1 Pros and cons
The advantages of aiCache, the caching feature of aiScaler, are:
• High performance
• Low resource usage
Disadvantages of Aiscaler:
• aiScaler performs well especially with dynamic web sites, but in our case we need a reverse
proxy for static content.
Pages load faster, with over 250,000 requests per second served directly from aiScaler.
However, we are not interested in aiScaler because it is an all-in-one solution rather than a
pure caching solution.
4.6 Conclusion
Based on the advantages and disadvantages of the above HTTP reverse proxy servers, we selected
Apache Traffic Server, Nginx and Varnish for further evaluation and testing of their
performance and behaviour.
Chapter 5
Test methodology
We need to define test cases to evaluate the performance of the generic HTTP acceleration
servers (cache servers) described in Chapter 4 as well as the Orbit server described in
Chapter 3. The performance of cache servers can be measured using cache hit, cache miss and
live test cases. However, in the case of a cache miss, the cache server must fetch the content
from the origin server (the server that holds the content), so the performance depends on both
the cache server and the origin server. Therefore, due to this limitation, only the live, 100%
cache hit and 90% cache hit test cases are implemented and evaluated as part of this thesis.
Moreover, to compare the cache servers we need parameters (characteristics) as performance
measures. In this thesis the characteristics used to evaluate the cache servers and the Orbit
server are response time, CPU usage and network traffic (bandwidth). We describe them in more
detail in Chapter 6.
The video assets used in our test cases are stored on an origin server, from which the proxy
servers and the Orbit server fetch them. All assets are chunked into small fragments of
different lengths, and the fragments are grouped by content bitrate. In other words, an asset
contains many fragments of different lengths and sizes. Therefore, the number of assets, the
number of fragments, their length and the quality (content bitrate) of the fragments must be
stated as input in all test cases.
This chapter starts with the definition of the test cases used for our evaluation. Following
the definitions, their implementation is explained, and then the configuration of the generic
proxy servers selected in Chapter 4 is demonstrated. Finally, the setup of the test
environment is briefly summarized.
5.1 Definition of test cases
As mentioned above, we have used three test cases, namely live, 100% cache hit and 90% cache
hit, to evaluate the performance of the three reverse proxy servers selected in Chapter 4 and
of the Orbit server. For Apache Traffic Server, however, the 90% cache hit test case was not
implemented due to the poor results collected from the live and 100% cache hit test cases. In
the 100% and 90% cache hit test cases, the video transmission was not real video on demand
(VoD), since all clients were synchronized to spread over the assets and request the fragments
of the assets sequentially, i.e. all clients were requesting the same portion of the assets at
the same time; in real VoD, different clients can request different portions of the assets at
any time. The size of each fragment used in our test cases was 132 KB.
5.1.1 Live test case
In this test case all clients were requesting a single asset with 5-second fragments at 300
kbps. The asset was not in the cache before the test, so each server fetches the first request
for a fragment from the origin server and then serves subsequent requests for the same
fragment from its cache. Proxy servers keep data in memory to increase performance by reducing
hard-disk access. Every client requested the fragments sequentially and continuously, as
represented in Figure 5.1. The fragment length is the time a client waits for a fragment
before requesting the next one; if the client does not receive the fragment within 5 seconds,
its timeout is reached and the request is counted as a late request. In this test case we
determined the maximum number of clients (streams) for each server. In addition, the response
time, CPU usage and egress bandwidth were measured with the same number of clients for all
servers.
Figure 5.1: Live fragments
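As a sanity check on the scale of this test, the nominal load can be computed directly from the stream parameters; these are back-of-the-envelope figures, not measured results:

```python
clients = 25_000      # concurrent live streams (see Section 5.4.2)
bitrate_kbps = 300    # content bitrate per stream
fragment_seconds = 5  # each client requests one fragment per 5 s

# Aggregate egress if every client streams at the content bitrate:
egress_gbps = clients * bitrate_kbps * 1_000 / 1e9
print(f"nominal egress: {egress_gbps:.1f} Gbps")   # 7.5 Gbps

# Request rate at the server: one fragment request per client per 5 s.
requests_per_second = clients / fragment_seconds
print(f"request rate: {requests_per_second:.0f} req/s")  # 5000 req/s
```

The nominal 7.5 Gbps fits within a single 10 Gbps interface of the test machines, so the egress links are not the bottleneck at this client count.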
5.1.2 100% cache hit
In this test case all assets were saved in the caches of the proxy servers and the Orbit
server before the test, and all clients requested those assets to simulate video-on-demand
streaming. For the proxy servers, the clients were spread over the assets and requested the
fragments of each asset sequentially. Proxy servers keep as much data as possible in the RAM
cache to reduce the load of disk access and to keep contents in RAM for fast access. In the
case of ATS the user can specify the size of this RAM cache in its configuration, whereas in
Varnish and Nginx contents are stored in RAM as long as there is free space. The assets are
chunked into 5-second fragments with a 300 kbps content bitrate. Figure 5.2 represents the
fragments in this test case; the length of a fragment is the time a client waits to receive
it, as stated in 5.1.1. The focus of this test case was to find the number of assets that each
proxy server can serve without hitting any limitation, which depends on its resource usage.
Then, with the maximum number of assets that fit in the RAM cache of each proxy server, we
tested and compared the response time, CPU usage and bandwidth of all servers.
Figure 5.2: 100% cached fragments.
5.1.3 90% cache hit
To assess the ingest performance of the proxy servers, the 90% cache hit test case was
implemented. In this test case, all clients spread over the assets and request the fragments
sequentially, as in the 100% cache hit test case. The reverse proxy servers serve 90% of the
requests from their cache and fetch the rest from the origin. Figure 5.3 represents the 90%
cache hit fragments: the fragments saved in the cache before the test are shown in green, and
the others are the fragments not present in the cache server.
Figure 5.3: 90% cached fragments.
5.2 Implementation of test cases
The test cases are implemented in Python and bash scripts. The implementations are based on an
in-house tool called rq, which is used to generate TCP load. rq is configured with parameters
such as the destination hosts and ports, the number of clients, the number of video assets,
the number of fragments in each asset, the quality and the fragment length. A reporter module
is used to generate the test results in PDF format. Besides, vmstat and network statistics are
collected from the proxy server machine to produce the CPU usage and network traffic figures,
respectively. For each test case, the number of assets and fragments varies, as can be clearly
seen in Table 6.2. However, for the 100% and 90% cache hit test cases, the video-on-demand
property was not simulated exactly, as stated in Section 5.1.
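Since rq is an in-house tool, its interface is not public; the sketch below only illustrates the request pattern the test cases configure it to generate — sequential fragment GETs paced by the fragment length, with requests not answered within that interval counted as late. All function and URL names here are hypothetical:

```python
import time
import urllib.request

def http_fetch(url, timeout):
    """Fetch one fragment over HTTP; False means timeout/error (late)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
        return True
    except Exception:
        return False

def stream_asset(fetch, base_url, asset, n_fragments, fragment_seconds=5):
    """Request an asset's fragments sequentially, one per fragment
    interval, counting requests not served within the interval as late."""
    late = 0
    for i in range(n_fragments):
        start = time.monotonic()
        if not fetch(f"{base_url}/{asset}/fragment{i}.ts", fragment_seconds):
            late += 1
        # Pace the client: a new fragment is requested every interval.
        time.sleep(max(0.0, fragment_seconds - (time.monotonic() - start)))
    return late
```

A load generator like rq would run thousands of such loops concurrently, spread over the assets, e.g. `stream_asset(http_fetch, "http://cache-host", "asset1", 180)`.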
5.3 Configuration of proxy servers
In this work, each proxy server was configured as a cache server. The configuration differs
for each proxy server, and the default configurations could not give reliable results for the
test cases discussed above. Therefore, many optimizations were made to each proxy server to
obtain results comparable with the Orbit server.
In the case of ATS, we configured it as a reverse proxy server that can be used as a cache
server; it is only configured for the live and 100% cache hit test cases. The main
configuration files are called records, remap, storage and cache. The records configuration of
ATS is used to make the server act as a reverse proxy and to set how much RAM should be used
to store the most accessed assets. It also sets the IP address and port on which ATS is
accessed, the IP address for reaching the origin server, and some further tunings. In the
remap configuration file we defined both map and reverse-map rules. A map rule translates the
URL in a client request into the URL of the origin server where the content is located: ATS
constructs a complete request URL from the client URL and its headers and then looks for a
match in its list of target URLs in the remap rules. A reverse-map rule translates the URLs of
redirect responses from the origin server into the address of ATS, so that clients are
redirected to ATS instead of accessing the origin server directly; therefore, clients cannot
access the origin server without going through Apache Traffic Server. Furthermore, the storage
configuration sets how much hard disk Apache Traffic Server will use. Likewise, some caching
rules are set in the cache configuration file.
Varnish is configured using its own configuration language, the Varnish Configuration Language
(VCL). This language was used to tell Varnish which origin server to use and to define all
required caching rules. Unlike the Apache Traffic Server configuration, VCL is a very flexible
configuration language. Moreover, another configuration file was used to tweak Varnish and to
set the IP address and port on which Varnish should listen.
Nginx has its own configuration format, and caching rules are defined to configure Nginx as a
reverse proxy server. In addition, we defined in the configuration file the address of the
origin server, the storage to be used, the address and port on which Nginx listens, and many
optimizations.
5.4 Test Environment
The test environment used in our test cases contains the following components: a client
machine, the Orbit server, an origin server and a switch. Only the Orbit machine is replaced
by the proxy server machine when testing the proxy servers.
5.4.1 Test setup
The proxy servers were installed and configured one by one on the same machine so that the
test environment is the same for all test cases. The components of the test environment are
briefly described as follows.
• Client machine with Ubuntu 12.04
An application called rq is installed on this machine. rq is an in-house, purpose-built
streaming performance application used to generate TCP load. It can do progressive streaming
and adaptive bitrate streaming, can be configured with the number of clients, the number of
assets, the number of fragments, the fragment length, ramp-up, timeout, duration, etc., and it
creates TCP sockets and sends GET requests according to the implementation of the test cases.
On this machine, two network interfaces of 10 Gbps each were used to communicate with the
proxy server machine. In addition, rq can emulate thousands of concurrent HTTP clients.
• Proxy server with Ubuntu 12.04
This is the machine on which we installed the reverse proxy servers to be tested. They are
used as cache servers: when a request comes from a client, the proxy server checks whether the
requested data is in the cache. If it is, the proxy serves the client from the cache;
otherwise it fetches the data from the origin server and serves the clients simultaneously.
This machine has 11 GB of RAM, we specified 50 GB of disk space for the tests, and two 10 Gbps
interfaces were used: one shared between the client machine and the origin server, and the
other connected to the client machine only. All proxy servers use a RAM cache, in addition to
the specified cache storage, to serve objects as quickly as possible and to reduce the load on
the disks. Therefore, memory and CPU are the basic constraints on the proxy servers. This
machine is replaced with the Orbit server during the Orbit test runs.
• Origin server with Ubuntu 12.04
In order to serve HTTP requests, the lighttpd web server is installed and configured on this
machine. Video segments of different bitrates are stored on this server, and this is the
server from which the reverse proxy servers fetch content when there is a cache miss.
• Switch
The above components are connected through the switch as shown in Figures 5.4 and 5.5, both in
the Orbit and the proxy server test environments.
Figure 5.4: Orbit test. Figure 5.5: Proxy servers test.
5.4.2 Parameters
The parameters used in all test cases are: the number of clients, the number of assets, the
number of fragments in each asset, the ramp-up in milliseconds, the timeout in seconds, the
quality, the fragment length, the duration in minutes, and the content type, which is video.
The value of each parameter varies with the specific test case, as can be seen in Table 5.1
below. We used 10,000 fragments in the live test case to make sure the proxy servers download
the first request for each fragment from the origin server and serve subsequent requests from
their cache, simulating live streaming. However, in the 100% and 90% cache hit test cases
there are 180 fragments per asset (the test duration of 900 seconds divided by the fragment
length of 5 seconds). The number of assets differs between the proxy servers in the 100% and
90% cache hit test cases due to the disk access penalty and the memory limitation of the proxy
server machine; in other words, the number of assets that a proxy server can serve depends on
its memory and CPU usage.
Parameters                  Live test case   100% cache hit   90% cache hit
Clients                     25000            25000            25000
Fragments                   10000            180              180
Quality                     300 kbps         300 kbps         300 kbps
Fragment length in seconds  5 s              5 s              5 s
Duration in minutes         15 m             15 m             15 m
Table 5.1: Parameters of test cases.
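The fragment counts in Table 5.1 follow directly from the test duration and fragment length, and the cache footprint per asset can be estimated the same way; these are approximate figures derived from the 132 KB fragment size stated in Section 5.1, using decimal units:

```python
duration_s = 15 * 60   # 15-minute test
fragment_s = 5         # fragment length in seconds
fragment_kb = 132      # fragment size (Section 5.1)

# VoD test cases: one fragment per 5 s of the 900 s test.
fragments_per_asset = duration_s // fragment_s
assert fragments_per_asset == 180

# Cache footprint of one VoD asset, and of the single live asset.
vod_asset_mb = fragments_per_asset * fragment_kb * 1_000 / 1e6
live_asset_gb = 10_000 * fragment_kb * 1_000 / 1e9
print(f"VoD asset ~ {vod_asset_mb:.1f} MB, live asset ~ {live_asset_gb:.2f} GB")
```

One VoD asset thus occupies roughly 24 MB, so the number of assets that fit in the 11 GB of RAM of the proxy server machine is on the order of a few hundred, consistent with the memory limitation noted above.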
As part of the test setup, the test cases are implemented on the client machine, and the proxy
server machine was configured and optimized for each proxy server as described in Sections 5.2
and 5.3.
Chapter 6
Performance comparison of Orbit and proxy servers
The results of the above test cases for the generic proxy servers and the Orbit server are
presented in this chapter, comparing the response time, CPU usage and bandwidth of all
servers. The comparison is accompanied by an analysis of each result. The Orbit, client and
proxy server machines each have two 10 Gbps interfaces, as shown in Figures 5.4 and 5.5. As
can be seen in the figures below, the response time is shown as a heat map with a blue scale
on its right side representing the percentage of requests: 0-20% of the requests are
represented by a light blue color, and the blue darkens proportionally with the percentage of
requests. The y-axis represents the response time in milliseconds and the x-axis the test
duration in percent.
In the CPU usage figures, the red graph represents the time during which the CPU is idle,
while the blue and green graphs show the CPU time spent in the system and in user space,
respectively. The cyan graph is the CPU time spent waiting for I/O operations (disk access).
In the figures of the network streaming ports, for the proxy server tests red and blue
represent the egress bandwidth of the two interfaces, and green and cyan the ingest bandwidth;
in the Orbit test, blue and red are used for both the egress and ingest bandwidths.
6.1 Live test case
In this test case, 25,000 clients requested a single asset containing 10,000 fragments, each 5
seconds long at 300 kbps, with a ramp-up of 2 milliseconds between clients to simulate live
streaming. Orbit, Varnish and Nginx create only one connection to the origin server for each
cache miss, even when many clients request the same fragment at the same time. However, ATS
might send more than one request per fragment to the origin server if many client requests
arrive at the same time for that fragment, although it tries to reduce the number of
connections to the origin server for the same fragment.
Proxy servers keep data in the RAM cache to increase performance by reducing disk cache
access. Since the size of each fragment was 132 KB and there is a single asset in this test
case, the overall size of the asset was 10,000 times 132 KB, i.e. 1.32 GB. Therefore, the RAM
cache was large enough to hold all the fragments, and the proxy servers served them from RAM
in this test case.
In the live test case, we encountered a limitation related to the number of concurrent requests that each proxy server can support. The maximum number of concurrent requests that Orbit, ATS and Varnish can support is 32000. With more than 32000 clients, Varnish hits its memory limitation and restarts in the middle of the test. Apache Traffic Server becomes very slow when the number of concurrent clients exceeds 32000, resulting in many late requests. Nginx, on the other hand, can support up to 45000 concurrent requests without any late requests, which implies that Nginx can handle many more concurrent requests than Varnish and Apache Traffic Server with the same resources. The different limits stem from the way the proxy servers handle incoming requests: Nginx uses an asynchronous, event-driven connection-handling model in which it does not create a new thread for each request, whereas Varnish is a multi-threaded program that uses one thread per connection, and ATS uses a hybrid event-driven engine with a multi-threaded processing model.
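The contrast between a thread-per-connection model and an event-driven one can be illustrated with a minimal multiplexing loop; this sketch uses Python's selectors module to serve several connections from a single thread, in the spirit of Nginx's model, and is not code from any of the servers:

```python
import selectors
import socket

def serve_ready_connections(conns):
    """Serve many connections from one thread by reacting to readiness
    events, instead of dedicating a thread (and its stack) to each
    connection as a thread-per-connection server does."""
    sel = selectors.DefaultSelector()
    for conn in conns:
        conn.setblocking(False)
        sel.register(conn, selectors.EVENT_READ)
    served = 0
    while served < len(conns):
        for key, _ in sel.select():                   # which sockets are readable?
            data = key.fileobj.recv(4096)             # request arrives
            key.fileobj.sendall(b"response:" + data)  # reply on the same thread
            sel.unregister(key.fileobj)
            key.fileobj.close()
            served += 1
    sel.close()
    return served
```

A single thread cycles through whichever connections are ready, so the per-client cost is one socket and a selector entry rather than a whole thread, which is why an event-driven server can sustain more concurrent clients with the same resources.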
In addition to the number of concurrent requests, we used response time, CPU usage and network traffic as the main criteria to compare the proxy servers and the Orbit server. A test run with 25000 clients is used for all proxy servers and the Orbit server in this section to evaluate these criteria. The results of the live test case for all proxy servers and the Orbit server are presented below.
• Response time
Response time is the amount of time between a client request and the receipt of the response. Given enough memory, Varnish is the best reverse proxy server in terms of response time. Its response time is comparable to that of the Orbit server, as can be seen in figures 6.3 and 6.4. In Nginx (6.2) and ATS (6.1), however, the response time of 0-20% of the total requests reaches up to 1.1 and 2.4 seconds respectively. Varnish is therefore very fast compared to Nginx and Apache Traffic Server in the live test case.
Figure 6.1: ATS response time. Figure 6.2: Nginx response time.
Figure 6.3: Varnish response time. Figure 6.4: Orbit response time.
• CPU usage
As can be seen from the figures below, in Apache Traffic Server (6.5) and Varnish (6.7) the CPU is 75% idle, while in Nginx (6.6) it is around 80% idle. Nginx therefore uses less CPU than Varnish and Apache Traffic Server in the live test case. In the Orbit server (6.8), however, 95% of the CPU is idle, since Orbit does not use the CPU and main memory for its streaming activities: it uses an FPGA instead of the CPU, and RAM plus flash memory as storage.
Figure 6.5: ATS CPU usage. Figure 6.6: Nginx CPU usage.
Figure 6.7: Varnish CPU usage. Figure 6.8: Orbit CPU usage.
• Network traffic
As can be seen below, the ingest bandwidth on each interface is around zero, since all the servers send a single request per fragment to fetch it from the origin server, while the egress bandwidth reaches 3.5 Gbps on each interface. The egress bandwidth starts from zero, grows gradually until all clients have joined, and then stays constant. The total egress bandwidth is around 7 Gbps for every server, since each server has two interfaces, as stated in figures 5.4 and 5.5 above.
Figure 6.9: ATS network traffic. Figure 6.10: Nginx network traffic.
Figure 6.11: Varnish network traffic. Figure 6.12: Orbit network traffic.
Results of all servers for the live test case are summarized in the table below.
Servers Response time CPU usage Egress bandwidth
Orbit < 1 ms 10% 2*3.5 Gbps
Varnish < 1 ms 20% 2*3.5 Gbps
Nginx <= 10 ms 20% 2*3.5 Gbps
ATS <= 20 ms 30% 2*3.5 Gbps
Table 6.1: Live test case results of all servers.
6.2 100% cache hit
In this test case all assets were saved in the cache before the test, and the number of assets that each proxy server can support differs due to differences in their memory and CPU usage. 25000 clients made requests for 5-second fragments with a 300 kbps content bitrate. There were 180 fragments in each asset and the size of every fragment was 132 KB. The ramp-up between clients was 4 milliseconds, and the test ran for 15 minutes on the proxy servers and 10 minutes on the Orbit server.
As stated above, the RAM size of the proxy server machine is 11 GB, and the proxy servers put content in the RAM cache to serve objects as quickly as possible and reduce the load on the disk cache. However, since the RAM of the proxy server machine is not large enough to store all the content, the proxy servers keep as much content as possible in the RAM cache and access the rest from the disk cache. As a result, the proxy servers perform very poorly once the RAM cache is full and they start to access more assets from the disk cache, even though enough disk cache storage was configured in all proxy servers. In other words, when they start to serve more requests from the disk cache, the CPU spends much of its time waiting for disk I/O operations (I/O wait). Therefore, one of the criteria we used to compare the proxy servers with each other was how many assets each proxy server can support without late requests and without hitting any limitation, which depends on the memory usage of the proxy servers. The Orbit server uses flash memory as its storage system, hence there is no limitation on the number of assets up to the size of the storage system.
ATS can only support 32 assets without late requests in the given environment, which is less than 0.76 GB of content (32 assets × 180 fragments per asset × 132 KB per fragment). When the number of assets is increased, ATS accesses the disk cache many times and most of the CPU time is consumed by disk I/O wait. The main reason ATS supports only a few assets in our test cases is that ATS puts only extremely popular objects in the RAM cache, i.e. it only places a requested object in the RAM cache if it believes the object is accessed many times. In other words, with more assets the number of requests per fragment decreases, since the clients spread over the assets and request fragments sequentially, so a fragment may not be placed in the RAM cache because ATS considers it unpopular. Varnish and Nginx, in contrast, put an object in the RAM cache if it is accessed at least once. Varnish has two storage options, namely malloc (memory) and file (disk) storage; we used the disk option in our tests to apply the same criteria as for the other proxy servers. With the disk storage option, Varnish uses the OS page cache to keep content in RAM for fast access. The maximum number of assets that Varnish could serve without hitting its memory limitation was 330, which is 7.8 GB. This is because Varnish needs some memory for its own activities and has an overhead of about 1 kB per object regardless of object size, i.e. as the number of objects stored in Varnish increases, the overhead becomes significant and Varnish's memory usage grows accordingly. In our case, Varnish was using around 3.2 GB of RAM for its own activities and for the per-fragment overhead. If the number of assets is increased further, Varnish uses more RAM to store the assets, hits the memory limitation for its activities and overhead, and restarts automatically in the middle of the test. Moreover, Varnish's file storage is not persistent, which means it does not retain the cached objects if Varnish stops or restarts. Nginx, on the other hand, can support 500 assets, which is 11.88 GB, implying that Nginx uses very little memory for its own activities; it was also serving some fragments from the disk cache. When the number of assets was increased further in Nginx, the limitation we hit was CPU time (the CPU time spent waiting for I/O increased accordingly). The limit on the number of assets in Nginx was therefore entirely due to the bottleneck of disk access.
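The asset limits above correspond to the quoted cache footprints, which can be reproduced with a few lines of arithmetic (180 fragments of 132 KB per asset, counting 1 GB as 10^6 KB, consistent with the figures quoted above):

```python
FRAGMENTS_PER_ASSET = 180
FRAGMENT_KB = 132

def cache_footprint_gb(assets):
    """Total size of `assets` fully cached assets in GB (1 GB = 10^6 KB)."""
    return assets * FRAGMENTS_PER_ASSET * FRAGMENT_KB / 1_000_000

# Limits observed in the tests:
#   ATS:     32 assets  -> ~0.76 GB
#   Varnish: 330 assets -> ~7.8 GB
#   Nginx:   500 assets -> 11.88 GB
```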
We also compared the response time, CPU usage and bandwidth, within the per-server asset limits stated in the paragraph above. To compare the proxy servers and the Orbit server in terms of those characteristics, we used 350 assets, which is 8.3 GB, for Nginx, so that the environment matched the other servers. In other words, all proxy servers were serving all the assets from the RAM cache and there was no impact from disk I/O operations. With all assets in the RAM cache, the response time, CPU usage and egress bandwidth of the proxy servers and the Orbit server are presented below.
• Response time
Varnish is the best reverse proxy server in terms of response time. However, the response time of 0-20% of the total requests in Nginx (6.14) and ATS (6.13) reaches 2 and 2.5 seconds respectively. Varnish (6.15) is therefore very fast compared to Nginx (6.14) and Apache Traffic Server (6.13), as in the live test case, given enough RAM cache in the proxy servers. As can be seen in figure 6.16, the Orbit server has a very short response time, and its performance is not affected even when the number of assets is increased, as described in the paragraphs above.
Figure 6.13: ATS response time Figure 6.14: Nginx response time
Figure 6.15: Varnish response time Figure 6.16: Orbit response time
• Network traffic
As shown in the figures below, there is no ingest at all, since all the content was saved in the cache before the test; the egress bandwidth starts at zero and gradually grows to 3.5 Gbps on each interface. The total egress bandwidth is therefore 7 Gbps for all servers.
Figure 6.17: ATS network traffic Figure 6.18: Nginx network traffic
Figure 6.19: Varnish network traffic Figure 6.20: Orbit network traffic
• CPU usage
The figures below show the CPU usage of all servers. In Apache Traffic Server (6.21) and Varnish (6.23), 70% of the CPU is idle, while in Nginx (6.22) the CPU is 75% idle. Nginx therefore uses less CPU than both Apache Traffic Server and Varnish, as in the live test case. The Orbit server does not use the CPU, as stated for the live test case above and as can be seen in figure 6.24 below.
Figure 6.21: ATS CPU usage. Figure 6.22: Nginx CPU usage.
Figure 6.23: Varnish CPU usage. Figure 6.24: Orbit CPU usage
The overall results for the 100% cache hit test case are summarized in the table below.
Servers Response time CPU usage Bandwidth
Orbit < 1 ms 15% 2*3.5 Gbps
Varnish <= 5 ms 30% 2*3.5 Gbps
Nginx <= 15 ms 25% 2*3.5 Gbps
ATS <= 27 ms 30% 2*3.5 Gbps
Table 6.2: 100% cache hit results of all servers.
6.3 90% cached
In this test case we tested only Nginx and Varnish, due to the poor results ATS produced in the test cases above. Since Varnish uses virtual memory as cache storage and Nginx was serving many assets from disk, the main focus of this test case is the ingest bandwidth of the proxy servers. This test case is not implemented for Orbit, so we did not test the Orbit server here.
6.3.1 Nginx
In the Nginx test, 32000 clients were requesting 5-second fragments with a 300 kbps content bitrate. The test ran for 15 minutes and the ramp-up between clients was 4 milliseconds. Nginx was serving 90% of the fragments from its cache and 10% from the origin server. The main focus was the number of assets Nginx can support and the number of clients that can be served without hitting any limitation and without late requests. Nginx can serve 550 assets, each with 180 five-second fragments, and can handle up to 60000 concurrent requests, although at that level there were some late requests because the CPU was fully utilized and many operations were waiting for it. We therefore used the test run with 32000 clients in this case.
6.3.2 Varnish
As stated above, Varnish can use virtual memory as cache storage in addition to disk (file) storage. Using this storage option, we obtained the following results for the 90% cache hit case in Varnish. The test was run with 32000 clients requesting 3000 assets, each with 180 fragments of 5 seconds; the ramp-up between clients was 2 milliseconds and the data rate was 300 kbps. Varnish was serving 90% of the fragments from its cache and 10% from the origin server. The cache was 5 GB of virtual memory, and Varnish evicted the least recently used (LRU) fragments when the cache was full.
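The eviction policy described above can be sketched with a minimal byte-budgeted LRU cache; this illustrates the policy only and is not Varnish's actual implementation:

```python
from collections import OrderedDict

class LRUByteCache:
    """Minimal LRU cache with a byte budget: inserting past the budget
    evicts the least recently used entries first, as Varnish does when
    its memory storage fills up."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()   # key -> bytes, least recently used first

    def get(self, key):
        value = self.entries.get(key)
        if value is not None:
            self.entries.move_to_end(key)   # mark as recently used
        return value

    def put(self, key, value):
        if key in self.entries:
            self.used -= len(self.entries.pop(key))
        # Evict LRU entries until the new value fits the budget.
        while self.entries and self.used + len(value) > self.capacity:
            _, evicted = self.entries.popitem(last=False)
            self.used -= len(evicted)
        self.entries[key] = value
        self.used += len(value)
```

A fragment that is requested again moves to the back of the queue, so under a full cache the fragments that have gone unrequested the longest are the first to be dropped.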
• Response time
As shown in figures 6.25 and 6.26, the response time of Nginx reaches up to 2.7 seconds for some requests, while in Varnish only a few requests have a response time of up to 9 seconds during the first few seconds of the test. Overall, Varnish has a good response time compared to Nginx when using virtual memory as storage in the 90% cache hit test case.
Figure 6.25: Nginx response time. Figure 6.26: Varnish response time.
• CPU usage
Figures 6.27 and 6.28 show the CPU usage of Nginx and Varnish respectively. The CPU time spent waiting for the disk (I/O wait) reaches 50% in Nginx, since Nginx was accessing the disk for many requests, and the CPU time spent in system mode is around 25%. The CPU is therefore only 25% idle in the worst case for Nginx. In Varnish, by contrast, 55% of the CPU is idle, since it was using virtual memory and not accessing the disk at all.
Figure 6.27: Nginx CPU usage. Figure 6.28: Varnish CPU usage
• Network load
As shown in the figures below, the ingest bandwidth is around 0.96 Gbps and the egress bandwidth is 4.5 Gbps per interface for both Nginx (6.29) and Varnish (6.30). The total ingest bandwidth is thus 0.96 Gbps, since only one interface is connected to the origin, and the total egress bandwidth is 9 Gbps, since two interfaces are connected to the clients, for both servers. Both the ingest and egress bandwidth start at zero, increase gradually, and stay constant after all clients have joined.
Figure 6.29: Nginx network traffic. Figure 6.30: Varnish network traffic
6.4 Summary
We have selected Varnish based on the results of the above comparisons and the flexibility of
extending its functionality through Vmods.
6.5 Implementation of logger
In this section, we describe the implementation of one of the Orbit's features (modules), the logger (session logger), on top of Varnish. The Orbit server has other functionalities, such as session management and the backend selector described in 3.2, but these are not expected to have any major impact on the performance of Varnish and are already implemented, in the same or a different way, in Varnish as well.
The logger module calculates the duration from the start of the first request to the end of the last request, and sums the bytes sent and the send time over each minute of the test, for all sessions (clients). Varnish's logging can give us the bytes sent and the send time (time to finish the request) for a single request, but we need these values per minute for each session. A session can generate hundreds of requests per minute, and we have to catch and add up the bytes sent and the send time of all requests in that interval. The duration is the period between the start of the first request and the end (start time plus send time) of the last request in the interval. In other words, the duration is not exactly one minute, but as close to one minute as possible while only including those segments that were completely delivered during the sample interval. On average, the sample duration is one minute.
The logger is implemented as a Varnish module (Vmod) in C++, in order to see whether it has any impact on the performance of Varnish. Since Varnish puts all its logs in shared memory, we wrote some configuration to log the session ID, URL, start time of the request, time taken to finish the request, and the total bytes sent to the client for that request into a file instead of shared memory. The current implementation takes this log file as input and reads it line by line to extract the required columns.
To implement the logger functionality, a session ID is added to the URL of each request in the test case implementation on the client machine, and rewrite rules are written in Varnish's VCL configuration to remove this session ID and send the correct URL to the origin server, since the origin URL does not include the session ID. The session ID is assigned uniquely to each client in the test case implementation and is used to identify clients in our logger implementation. For every session, the logger records:
• Session ID: an ID which is used to identify the session (the client) uniquely.
• IP: IP address of the client (remote host) that initiated the session.
• Bytes sent: the total number of bytes sent to that session during the interval.
• Send time: the total time spent actually streaming to the client, i.e. the sum of the time spent fulfilling each HTTP request in the interval.
• Duration: the time from the beginning of the first request to the end of the last request in that interval.
From these parameters we can approximate the download bitrate that the client achieved, by dividing the bytes sent by the send time, and the content encoding bitrate, by dividing the bytes sent by the duration of the interval, as shown in figure 6.31 below.
Figure 6.31
This implementation was a first, simplified implementation to obtain initial results. To implement the logger as it is implemented in the Orbit server, the session handling feature would need to be implemented first, with the logger taking its input from the session handling feature rather than from a file.
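The per-interval aggregation described in this section can be sketched as follows; the record format and field names are illustrative and do not match the exact columns of our Varnish log file:

```python
from collections import defaultdict

def aggregate_sessions(records, interval=60.0):
    """Aggregate (session_id, start_time, send_time, bytes_sent) request
    records into per-session, per-interval statistics, approximating the
    logger's Bytes sent, Send time and Duration fields."""
    buckets = defaultdict(list)
    for session_id, start, send_time, nbytes in records:
        # Group requests by session and one-minute sample interval.
        buckets[(session_id, int(start // interval))].append((start, send_time, nbytes))
    stats = {}
    for key, reqs in buckets.items():
        first_start = min(start for start, _, _ in reqs)
        last_end = max(start + send_time for start, send_time, _ in reqs)
        bytes_sent = sum(nbytes for _, _, nbytes in reqs)
        send_time = sum(st for _, st, _ in reqs)
        duration = last_end - first_start   # first request start to last request end
        stats[key] = {
            "bytes_sent": bytes_sent,
            "send_time": send_time,
            "duration": duration,
            # Approximations from section 6.5, in bits per second:
            "download_bitrate": 8 * bytes_sent / send_time,
            "encoding_bitrate": 8 * bytes_sent / duration,
        }
    return stats
```

Note how the duration covers only completely delivered requests inside the interval, so it is close to, but not exactly, one minute.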
6.6 Orbit and modified Varnish performance comparison
In this section we describe the results of Varnish before and after the implementation of logging, and then the results of the Orbit server with the same parameters. In the Orbit server, the logger is tied to the session handling functionality: a session object containing the session ID, bytes sent, send time and duration is created by the session management module, and the logger module then uses those values to generate the log file described in 3.2. Due to time constraints, however, we implemented the logger without implementing Orbit's session handling functionality. The result of the implementation is therefore not fully optimized, and we can only show that there is an impact on the performance of Varnish when the logger functionality is enabled. This is because Varnish uses the CPU and memory, unlike the Orbit server, which uses an FPGA instead. The parameters used in this case are stated in the table below.
Parameters Live test case 100% cache hit
Clients 9800 9800
Fragments 10000 100
Quality 2 MB 2 MB
Fragment length in seconds 5s 2s
Duration in minutes 10m 10m
Table 6.3: Parameters of the test cases.
6.6.1 Varnish results
The implemented logger plugin uses a lot of resources, since it reads and processes a file to produce the expected result. The implementation is not fully optimized because it takes a log file as input. As a result, the figures below only show that the additional processing has an impact on the streaming performance of Varnish. The limitation here was the CPU: the logger consumed a lot of CPU time reading and processing the file, which affected the response time of Varnish, since most of the CPU was used by the logger module. All assets were served from the RAM cache, and if we increase the number of assets mo