
Freie Universität Berlin
Fachbereich für Mathematik und Informatik
Institut für Informatik

Master's Thesis

Large Scale Supercomputer Assisted and Live Video Encoding with Image Statistics

18 August 2016

Author: Hauke Jürgen Mönck, Brentanostr. 63, 12163 Berlin, [email protected]
Reviewer: Prof. Dr. Raúl Rojas
Supervisor: Prof. Dr. Tim Landgraf


Abstract

The primary purpose of this thesis is to elaborate mechanisms for encoding images as videos, both for live recordings and for the post-hoc encoding of large amounts of images on a supercomputer. A major focus was set on quality preservation and on optimizing the recording setup.

Researching the behaviour of social insects has long been of human interest for various reasons. To examine colonies with many individuals and complex social interactions, large amounts of sample data are necessary. In the Beesbook project this sample data comes in the form of images of marked honeybees. These images shall be evaluated automatically using custom software, which requires excellent image quality and resolution. This master thesis introduces various improvements to the acquisition process to obtain images of optimal quality at minimal cost. Large amounts of existing data also have to be compressed, aided by a supercomputer. For the live recordings a custom IR lighting system was introduced, recording software using a GPU encoder was developed, and image statistics for calibration and surveillance have been introduced. Software was also created to automatically perform large-scale compression of videos.

Evaluation has shown that the newly introduced lighting system not only comes at 23.7% of the price of a conventional lighting system, but also has good illumination properties. Different mechanisms are provided to analyse images and to configure the recording setup for optimal quality. Existing data could be compressed using the HEVC video codec from 289 TB to 80.6 TB, saving 72.1% of space with negligible loss in image quality. Finally, recording software was developed to achieve this level of compression during live recording.


Statutory Declaration (Eidesstattliche Erklärung)

I hereby declare in lieu of oath that I have written this thesis independently and without outside help, and that I have used no aids other than those stated. This thesis has not been submitted to any other examination authority in the same or a similar form.

Berlin, 18 August 2016

..............................................

Hauke Jürgen Mönck


Acknowledgements

Besides the author, several people are involved in the successful completion of a thesis. I would like to thank some of them here.

I would first like to thank my thesis advisor Prof. Dr. Tim Landgraf of the Biorobotics Lab at Freie Universität Berlin. The door to Prof. Landgraf's office was always open whenever I ran into a trouble spot or had a question about my research and writing. He always steered me in the right direction whenever I needed it. I am also grateful to Mr. Fernando Wario Vazques, who made his support available in a number of ways. I would like to express my gratitude to all the other members of the Biorobotics Lab as well, for many informative conversations and the good working atmosphere they provided. Furthermore, I would like to thank Mr. Lutz Freitag, who always made his support available and shared his knowledge about electronics.

Finally, I must express my very profound gratitude to my parents Jürgen and Petra Mönck for providing me with support and encouragement throughout my years of study and through the process of researching and writing this thesis. This accomplishment would not have been possible without them. Thank you.

Hauke Jürgen Mönck


Contents

1. Introduction
   1.1. Biorobotics Lab
   1.2. Overview
   1.3. Experiments
        1.3.1. Legacy Honeybee Experiment System
        1.3.2. Bumblebee Training
        1.3.3. Bumblebee Experiment Setup
   1.4. Improving the Recording
   1.5. Considered Compression Schemes

2. State of the Art
   2.1. Tracking of insects
   2.2. Live Video Encoding
        2.2.1. Infrared Camera Flash
   2.3. Post-Hoc Encoding

3. Supercomputer Assisted Large Scale Video Encoding
   3.1. Hardware and Software Facilities
   3.2. Code and Adjustments
   3.3. Choice of Codec
   3.4. Container Format
   3.5. Compression Quality
        3.5.1. Choosing the Quality
        3.5.2. Choosing the Size and Encoding Time Trade-off
   3.6. Verification
        3.6.1. Timelines
        3.6.2. Frame Counting

4. GPU Assisted Video Recording
   4.1. Software
        4.1.1. Acquisition Application
        4.1.2. Watcher Application
        4.1.3. Storage Management Application
   4.2. Hardware
        4.2.1. Lighting
        4.2.2. Recording Hardware

5. Evaluation
   5.1. Supercomputer Assisted Video Encoding
        5.1.1. Filesize and NPL cost
        5.1.2. Verification
   5.2. Live encoding
        5.2.1. Lighting
        5.2.2. Image Statistics
        5.2.3. Stability

6. Discussion
   6.1. Supercomputer Assisted Video Encoding
   6.2. Live Video Encoding

7. Concluding Remarks
   7.1. Supercomputer Assisted Video Encoding
   7.2. Live Video Encoding

Appendices

A. Calculations and Tables
   A.1. HEVC And AVC Compression Ratio For Beesbook Recordings
   A.2. Lighting System Cost Calculation
   A.3. Disk Space Cost Calculation

B. Glossary


1. Introduction

Collective intelligence is a form of intelligence that emerges from individual agents collaborating. The agents sample local information, process it and communicate it to their peers. This enables the agents to perform consensus decision making, delegation and organization of tasks, and so on. Observing the set of agents as a whole, a collective intelligence emerges. Many social insects use collective intelligence in practice. In the Biorobotics Lab of the Freie Universität Berlin, honeybees (Apis mellifera) are the major model organism. Other animals, such as bumblebees (Bombus) and guppies (Poecilia reticulata), are also surveyed.

By observing these organisms, behavioural models may be derived. These models not only describe individual behaviour, but also how swarm behaviour arises from the individuals. In this sense, the individual organism acts as an agent, the swarm intelligence being a form of collective intelligence. Bees in general have a sophisticated social structure and behavioural patterns, major parts of which remain unresearched. Seeley described in Wisdom of the Hive(37) a multitude of experiments for researching honeybee behaviour, worker bees' egg-laying properties for instance. The experiments required an extensive amount of manual preparation and evaluation. It is assumed that highly complex observations are required to fully capture the swarm's intelligence in all its parts.(44) Looking back at the experiments described by Seeley, more automated observations are required to gain deeper insights.

1.1. Biorobotics Lab

The Biorobotics Lab developed an elaborate automated observation scheme to address the issue illustrated above. In the experiment, a beehive consisting of one comb accommodating 1000 to 2000 bees at a time is recorded continuously for several weeks. Four high-resolution cameras are set up for the recording. The recording is done for several weeks during summer, when the bees are most active and produce honey, depending on temperature.(18) Each bee is marked using a unique tag on the back of its thorax, as depicted in figure 1.1. In recent experiments the tags are applied once a bee hatches from its larval state. For each tag, the day it was attached to a bee is noted; in consequence, the date of hatching is known for every individual bee. In previous experiments all existing bees were marked at once in the beginning. In post-hoc analysis the positions of the individual tags are found for every frame in the


Figure 1.1.: Tags for bee and bumblebee identification. (A) The circular matrix design exhibits 15 regions used to represent a unique binary code (cells 0 to 11), to encode the tag's orientation (cells 12 and 13) and to produce a clear contrast with the bee's thorax (ring 14). (B) Tagged bees inside the hive. (44)

high-resolution recordings. Additionally, low-resolution cameras observe the hive at 100 Hz, enabling the detection of waggle dances. Having the tag positions, motion vectors may be calculated. Sets of motion vectors can be combined to recover the position of each honeybee over its entire life cycle in the hive. Merged with the knowledge about the waggle dances, individual behaviour and interactions may eventually be deduced. Ultimately, swarm behaviour can be inferred.
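The derivation of motion vectors from tag positions can be illustrated with a short sketch. The following is not the lab's actual pipeline code but a minimal illustration, assuming detections are given as (frame, tag id, x, y) tuples:

from collections import defaultdict

def motion_vectors(detections):
    """detections: iterable of (frame, tag_id, x, y) tuples."""
    tracks = defaultdict(list)
    for frame, tag_id, x, y in detections:
        tracks[tag_id].append((frame, x, y))
    vectors = defaultdict(list)
    for tag_id, points in tracks.items():
        points.sort()  # order each tag's detections by frame index
        # pair consecutive detections of the same tag into displacement vectors
        for (f0, x0, y0), (f1, x1, y1) in zip(points, points[1:]):
            vectors[tag_id].append((f0, f1, x1 - x0, y1 - y0))
    return vectors

Chaining these per-tag vectors over all frames yields the full trajectory of an individual bee.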

Besides the honeybee experiment there is a bumblebee experiment of a slightly different nature. It is a continuation of the experiment by Jin et al.(23) Here the bumblebees have their wings cut and are assigned slightly bigger tags than depicted in figure 1.1, to fit their thorax; the bumblebees' tags are 4 mm in diameter. They live in a small colony of around 50 individuals in a terrarium serving as the hive. A second terrarium, called the arena, may be accessed to collect food in several training sessions. The purpose of the experiment is to acquire information about the bumblebees' strategies for finding food sources.

To enable the algorithms to generate reliable information from the recordings, image quality is an important factor. This means properties such as image sharpness, contrast, lighting and resolution are of major importance. As a high image quality is required to reliably detect tags, large amounts of generated data are the logical consequence. Optimizing these quality measures while decreasing the amount of data to be stored is an aim of this thesis.

1.2. Overview

The Biorobotics Lab used the legacy system for the honeybee recordings in 2014 and 2015. This thesis transfers the legacy system into an advanced system in two major work packages, as depicted in figure 1.2. In work package 1 a new data format is found to most efficiently encode existing image data. The amount of existing image data is large; it was hence encoded into the new format using a supercomputer. This work package is handled in chapter 3.


                     Legacy                        Advanced                       Work Package
Existing Data        JPEG image compression,       HEVC video compression,        WP 1
                     ca. 160 TB per season         ca. 40 TB per season
Hardware             constant lights,              asynchronous flashing          WP 2.1
                     lower contrast                lights, high contrast
Recording Software   JPEG image compression        HEVC video compression         WP 2.2
                     on CPU, high bandwidth        on GPU, low bandwidth
                     and large files               and smaller files

Figure 1.2.: Overview of the thesis structure. The experiment setups of the Biorobotics Lab are composed of recording hardware, software and the resulting data. In the course of this thesis the previous system, henceforth called the legacy system, will be transferred into a new system, called the advanced system, in two work packages. Work package two is divided into two subpackages, for hardware and software respectively.

Work package 2 revolves around improving the legacy hardware and recording software. Improvements to the hardware may yield more contrast-rich and eventually better recorded images. Improvements to the software are necessary not only to accommodate the changes in hardware, but also to generate the video data: the video data shall be stored live in the new format, according to the elaborations done in work package 1.

1.3. Experiments

1.3.1. Legacy Honeybee Experiment System

There are four high-resolution cameras directed at the beehive, two at each side as depicted in figure 1.3. These cameras are connected to a single computer which manages the acquisition process. The images are compressed in JPEG format and bundled into TAR archives of 256 files. As the total amount of data is too vast to be stored locally, it is sent to the Cray supercomputer. The Cray is a supercomputer operated by the HLRN, who sponsor the Biorobotics Lab with computation time and storage. A local NAS can buffer data for a short period, in case the Cray encounters maintenance or other downtimes. It also stores striping data; that is, four times a day a short recording sequence is permanently stored on the NAS. This is done to even out permanent or temporary unavailability of the Cray in the future. Recordings have been done from 24.07.2014 to 18.09.2014 and from 19.08.2015 to 26.10.2015, so far yielding 132 TB and 157 TB of data respectively (space effectively used on the file systems: season 2014 on home, a GPFS file system, season 2015 on gfs1, a Lustre file system).

The hive is protected by a glass pane. The FLEA 3 FL3-U3-88S2C-C camera is used in combination with a regular lens (available online at http://www.rmaelectronics.com/ricoh-fl-cc1214a-2m-2-3-12mm-f1-4-manual-iris-c-mount-lens-2-megapixel-rated/, retrieved 13.06.2016). Additionally there is one modified PS-Eye camera on each side, with an IR filter installed. Constant infra-red lighting using a set of 24 lamps of type TV6700 was used to illuminate the hive.(7)


Figure 1.3.: The legacy honeybee experiment setup. A shows a schematic of the setup. There are three cameras on each side of the observation hive: two high-resolution FLEA3 cameras and one modified PS-Eye camera. The PS-Eye cameras have their white light filter replaced by an IR filter and their lens replaced. The constant lights are positioned in a circular fashion around the cameras. B shows a photo of the 2015 setup, not including the PS-Eye cameras. Images by Fernando Wario Vazques.

These lamps have a light sensor, which dims the IR LEDs accordingly. These sensors have been deactivated to guarantee constant lighting of the beehive. The lamps emit light of 840 nm centroid wavelength, which is invisible to bees and bumblebees; the range of visible light of either species goes up to roughly 600 nm,(26)(38) corresponding to the colour orange in human perception.

The bees may leave the hive to the exterior of the building through a small tunnel. The hive itself is inside an observation room which is kept at around 30°C at all times. This guarantees bee activity throughout the recording time to a certain degree.(18) In this thesis several improvements to this setup will be introduced.

1.3.2. Bumblebee Training

So far it is not fully understood whether, and to what extent, bumblebees use landmarks and their visual capabilities to locate themselves and potential food sources. The purpose of the training is to train the bumblebees to find food according to landmarks. To achieve unambiguous results, other means of orientation have to be rendered ineffective, for example by clearing the floor to eliminate odour tracks, creating an indistinguishable environment apart from the explicitly presented landmarks.

The bumblebees have been trained four times a week in cooperation with Fernando Wario Vazques and Franziska Lojewski. The training was done Monday to Thursday, with a day of testing on Friday.

There is one training session a day, consisting of four iterations. Each iteration lasts one hour. At the beginning of each training iteration, the food is positioned relative to the same landmarks. In the subsequent training iteration, the new position of the food is chosen randomly from the four positions shown in figure 1.4.



Figure 1.4.: Sketch of the bumblebee experiment panorama layout. The entrance to the arena is marked with an arrow. Within the arena, numbers 1 to 4 indicate possible food locations. The circles and stripes represent an example panorama design.

The panorama has to be turned so that the food is again in the correct relative position. Bumblebees still in the arena have to be placed back into the hive manually.

When all iterations are done, the tunnel to the arena is closed and the white light is switched off. The recording software on the PC corresponding to the arena is stopped while there is no ongoing experiment.

Testing is done the same way as training, but no food is supplied: the panorama is just rotated and the bumblebees are allowed to enter the arena. This tests whether the bumblebees apply the learned behaviour without having other indicators for the food.

1.3.3. Bumblebee Experiment Setup

The bumblebee experiment was conducted in 2016. It was the first to use the advanced system and served as a testbed. The arena box, the hive box and the frame have been acquired from a regular building supplies store. The framing is closed on all sides using chipboard. The front is not nailed but only attached using magnets, so as not to disturb the bumblebees unnecessarily.

Everything in the arena is designed to be as symmetric as possible. In figure 4.2.1 one may notice that there are two extra holes where there are no actual cameras. They are only there to preserve the symmetry, so that the panorama is the only possible landmark for orientation.


Figure 1.5.: Sketch of the bumblebee experiment setup. The upper portion shows the top view. Note that the holes for the cameras and lights are symmetric on both the horizontal and vertical axes. The lower portion shows the side view of the setup. The exact positioning of the boxes is not indicated; in fact, the boxes may be moved around freely. To achieve a reproducible positioning of the boxes during the experiment, markers on the floor have been used. The dashed line indicates where a vertical cardboard sheet may be added to shut off the hive from the arena in terms of lighting.

1.4. Improving the Recording

As mentioned previously, improving the image quality while decreasing the amount of data to store is the purpose of this thesis. This comes in two aspects. First, the recording


system needs to be improved. There are different schemes to be considered for compressing the image data; the live encoding schemes are elaborated in section 1.5. To improve the quality of the camera recordings, several steps have been taken:

• The recording system was extended to contain a configuration mode. In this configuration mode, multiple quality measures of the recorded images are taken live. These measures may be optimized by manually adjusting the camera lens, lighting, and so on.

• A unique solution to illuminate the beehive was developed. The previous setup used constant lighting. This caused low-contrast images: the light emitted on one side of the hive lights up the comb itself, as the combs consist of light-transmissive wax. This results in bright combs on either side, and finally in bright bees on a bright background, in other words poor contrast. This is why flashing the individual sides asynchronously improves the contrast of the recordings. As the framerate is less than or equal to 4 FPS at all times, this is feasible.

• The improved encoding scheme leads to smaller files at equivalent quality. This comes with multiple advantages. The recording quality may be improved using the capacities freed by improving the encoding scheme. Longer potential downtimes of the Cray may be buffered and longer stripe sequences may be saved. This improves the overall reliability of the recording system. In the future it might enable local storage and postprocessing of the recorded data.

These issues are being addressed in chapter 4.

Besides improving future recordings, the vast amounts of already recorded data must be addressed. Since this data amounts to as much as 289 TB, transferring it to local machines for processing and re-transferring it to the Cray seems an inept approach: the bandwidth to and from the Cray, as well as the local resources, are limited. The computational resources of the Cray, however, are immense, as will be described in section 3.1. But re-encoding the data on the Cray is a subtle task, due to the amount of data and the need to parallelize the work. The existing data is encoded in JPEG format and hence comes with compression artefacts. Which parameters must be applied to guarantee acceptable quality for the recognition tasks? How should the videos be muxed to provide optimal accessibility for various tasks? The task also includes building custom software for the Cray architecture for encoding, and devising a strategy to deploy the input data to a multitude of processes concurrently.


1.5. Considered Compression Schemes

As pointed out previously, the amount of raw image data is very large, and storing it compressed in JPEG format is not sufficient. Different approaches to compressing the videos have been considered.

The first, and surely best compressing, approach was the following. The images of bees are stored as low resolution and low quality videos. Before discarding the original images, the localizer is run on them. The localizer is a software which finds all tags in the images with an accuracy of 90.6%.(44) Note that more recent developments may enable the localizer to reliably detect more than 99% using CNNs. The localized tags are then copied next to each other into a new image, preserving most of their original quality. The problems with this approach are manifold, however. The major issue is that most of the original data is lost. This excludes the possibility to later improve the localizer and retrospectively increase the number of detected tags. As mentioned in chapter 1, acquiring enough usable high quality data is crucial. Hence this method of compression is quite dangerous, as it is yet unknown what quality and meta data are actually required. Furthermore, tasks other than tracking the bees may hardly be done on data compressed this way. One example is the classification of objects in the beehive as described by Ziese,(48) where the images were used to classify portions of the beehive using neural networks. As shown by Ziese, more fine grained classification (e.g. classifying a bee's head and thorax) requires a higher level of image detail than basic classification (e.g. classifying bees and combs).

Secondly, compression using better image compression codecs was considered. The recently developed BPG codec(4) was the most promising candidate, according to studies such as the one by Hofbauer et al.(19) However, the compression rate was still insufficient.

Finally, compression using videos was considered. The datasets compress well as videos, as the images are of a sequential nature and contain temporal redundancy. Encoding may be feasible using different encoders; video encoders are elaborated in section 2.2.


2. State of the Art

2.1. Tracking of insects

Animal tracking and behavioural analysis is a well studied field in biological science. Tracking larger animals can be done using GPS trackers, as done by Patrick et al,(6) for instance. When tracking insects such as honeybees and bumblebees, the issue arises that the individuals are very small. Seeley observed a beehive of 2000 to 4000 bees by attaching circular markers to the thorax.(37) The tags had a distinct colour every 500 tags and an Arabic number from 1 to 500 printed on them. Additionally the bees were marked using a colour dot on the abdomen to distinguish which feeder they visited. This enabled him to make detailed observations of individual behaviour. However, this method does not scale well, as the tags are not well suited for machine readability and hence automated recognition. Seeley read all the tags manually and noted his observations by pen or voice recording. This method is not suitable for large scale applications due to its labour intensive nature.

A more automated approach was taken by Mersch et al.(27) The recording was done using an FCi4-14000 monochrome camera at 4560×3048 pixels. They tracked an ant colony by attaching square tags similar to QR codes.(13) In the given setup this resulted in a size of 25–30 pixels per marker, each marker having a side length of 1.6 mm. This yields a resolution of 17–18 pixels/mm. Every marker encodes 36 bits, 26 of them being error correction. Tracking was done live using the ARTag library. While high resolution recordings were discarded after processing, low resolution samples were kept to visualize and verify tracking results. The results of the tracking seem appealing: 225 immobile tags could be recognized with 99% accuracy, yet only 88±17% of attached tags could be read. In the worst case, these results are only slightly better than the results achieved by Wario et al with an accuracy of 65.98%.(44)

Using the recording system of Mersch et al was considered initially. The cameras owned by Wario et al have a slightly smaller resolution, but have a regular colour CMOS sensor using a Bayer pattern. The tags used by Mersch et al have been tried on the bees; however, the bees start removing the tags from each other's backs. The tags have to be square, as this is a proprietary solution. Downscaling them and applying them to a round, bent physical tag structure is not feasible either: they are either too small or have too high a curvature to ever be readable as a whole.


2.2. Live Video Encoding

For many years, beginning in 2003, the AVC codec dominated the market in terms of compression to quality ratio.(35)(46)(33) Recently codecs have been developed which compete with AVC for this position; ahead of all, there are HEVC and its free counterpart VP9. Studies have shown that these codecs do in fact come with a better compression to quality ratio, HEVC performing best.(36) HEVC has seen uses in science and industry to compress video data more efficiently. For example, Xiao et al developed a background subtraction method for HEVC for Chinese public surveillance camera systems.(47) This approach aims to be less computationally expensive while not having a significantly better compression to quality ratio. The video streaming platform Netflix now offers streaming its media in HEVC, yet it is unknown what kind of encoding technology they use.(1)

As HEVC is still a young codec, there are few choices when it comes to encoding. Available maintained software encoders are DivxLabs x265,(10) Kvazaar,(43) the Heinrich Hertz Institute's HEVC encoder(5) and MulticoreWare x265,(20) which is used by ffmpeg. There are only two consumer-feasible products for encoding HEVC via fixed hardware functions. The first is the Intel Media Server Studio.(22) Support is limited to more recent Xeon and 4th, 5th or 6th generation CPUs. This framework also supports encoding VP9 via fixed hardware functions. The second is the NVEncoder by NVidia. The NVEncoder supports encoding HEVC videos as of the second generation Maxwell GPUs, as well as AVC on Kepler GPUs or newer.(30) For consumer (GeForce desktop) GPUs encoding is limited to 2 parallel encoding sessions per system. System in this case means per OS installation, which is a licensing limitation.(29)

2.2.1. Infrared Camera Flash

As lighting is a crucial part of the recording setup, it will also be considered thoroughly in this thesis. When it comes to IR lighting systems, there is a multitude of different IR camera flashes and IR lights which might be suitable for the use case of the Biorobotics Lab, for instance the IR camera flash Meike MK930 (specifications at http://www.lightinthebox.com/meike-mk930-camera-flash-speedlite-for-canon-speedlite-dslr-400d-450d-500d-550d-600d-650d-1100d-vs-yongnuo-yn-560-ii_p1525323.html). Flashes like this usually come as a camera mount and are battery powered, as in the given example. This might be an obstacle for the particular use case, but does not rule out using them entirely. There are also IR flash cameras for animal surveillance, like the Browning Command OPs (specifications at http://www.trailcampro.com/collections/red-glow-infrared-flash-trail-cameras/products/2016-browning-command-ops?variant=17398568965).


These recording systems come as off-the-shelf solutions with camera-flash synchronisation and usually have flashing intervals of up to 5 seconds. Simple boards with LEDs are also available; some of them are easy to power manually to achieve a flash. High-powered flashing LED systems do not seem to exist off-the-shelf.

2.3. Post-Hoc Encoding

As mentioned previously in section 1.3.1, the Cray supercomputer provided by the Zuse Institute Berlin is available for computation for the Beesbook project. The previous section 2.2 listed all available software and hardware encoders. Hardware encoders inarguably outperform software encoders in regular desktop scenarios. This is intuitive, since they are dedicated hardware implementing some or all of a software encoder's functionality. However, the Cray supercomputer offers vast computational resources in the form of CPUs.(3) Software encoders offer a considerably more fine-grained configuration interface than hardware encoders, leaving more space for optimization.(42)(31) Due to the given CPU resources, software encoders have been used in the post-hoc case.


3. Supercomputer Assisted Large Scale Video Encoding

In the legacy system the Beesbook project stored recorded image files in JPEG format, bundled in TAR files, on the Cray supercomputer provided by the HLRN. These account for approximately 132 TB for the recordings of season 2014 and 157 TB for season 2015, as mentioned in the introduction, section 1.3.1.

Storing the recordings in JPEG format requires more space than video compression. This is intuitive, since JPEG only compresses a single image at a time, not exploiting temporal redundancy. Using suitable video compression the recordings can be processed in a much more effective way. The reasons to improve the compression of the recordings are simple. First, it is necessary to acquire a scientifically sufficient amount of statistics. Observing a single colony for two or three seasons is hardly sufficient; if more seasons could be observed, models could be verified better and the weight of any discovery would be greater. But storing more than three seasons at most is beyond the limitations of the supercomputer using legacy JPEG compression. Also, using the current compression scheme, the data throughput per second from recording station to final storage is high. A raw image is 4000×3000 pixels in size, yielding 12 MB of data. Compressed as JPEG at 80% quality, an image amounts to roughly 3 MB. At 3 FPS this requires a connection and storage throughput of 3 MB × 3 FPS × 4 cameras = 36 MB/s. Acquiring more frames at higher quality is desirable, yet hardly feasible. This chapter elaborates different compression schemes and quality considerations, and finally describes how the data was re-encoded in the advanced system. It describes work package 1 as per section 1.2.
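The bandwidth figure above can be reproduced with a few lines of arithmetic; the roughly 60 recording days used for the season extrapolation below are an assumption for illustration only, meant to show that the order of magnitude matches the measured 132 TB and 157 TB:

assert 4000 * 3000 == 12 * 10**6   # one byte per pixel gives the 12 MB raw frame

JPEG_BYTES = 3 * 10**6             # ~3 MB per image at 80% JPEG quality
FPS, CAMERAS = 3, 4

throughput = JPEG_BYTES * FPS * CAMERAS        # bytes per second
per_season = throughput * 60 * 60 * 24 * 60    # assumed ~60 days of recording
print(f"{throughput / 10**6:.0f} MB/s, ~{per_season / 10**12:.0f} TB per season")
# -> 36 MB/s, ~187 TB per season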

3.1. Hardware and Software Facilities

The HLRN supports the project by offering accounts on the supercomputer, a file quota of about 800 TB maximum and a certain amount of NPL. NPL measures the amount of computation time a certain account may use. The supercomputer is organized into different kinds of nodes; for this project only the computation nodes and management nodes are relevant (see figure 3.1 for all relevant node types). NPL only accounts for computations on the computation nodes, while management nodes may be used at will; they are, however, shared between users and limited in number. They are not meant to be used for computing, but for managing data input and output of the computation nodes.


The computation nodes run in a batch system: there is a set of nodes and a job queue. Jobs may be submitted by a user to the job queue and are processed by a subset of free nodes. The scheduler prefers jobs occupying more nodes.(3) However, a scheduled set of larger jobs may leave some node offcuts, and smaller jobs may trickle into these offcuts. This is also called the Sandkornprinzip (grain-of-sand principle). The login nodes' major purpose is interactive use, as any process running longer than approximately 30 minutes is killed automatically.

node type      #CPU   CPU model    CPU #cores   CPU clock   total RAM
login(1..2)    2      E5-2670      8            2.6 GHz     256 GiB
login(3..4)    2      E5-2650v2    8            2.6 GHz     256 GiB
data(1..4)     2      E5-2609      4            2.4 GHz     64 GiB
MPP2 compute   2      E5-2680v3    12           2.5 GHz     64 GiB
MPP2 compute   2      E5-2680v3    12           2.4 GHz     64 GiB
(...)

Figure 3.1.: Table of relevant node types in the Cray supercomputer. For the full list see: https://www.hlrn.de/home/view/System3/CrayHardware

Software is provided via modules.(15) Using modules, users can easily load, unload and switch environment applications. A variety of software and software libraries is available for loading via modules on the Cray, for instance cmake, automake, QT and OpenCV. Unfortunately, for the encoding task an encoding library and possibly a front end are required which are not provided. Consequently these and their dependencies had to be built manually. There exist many off-the-shelf solutions which may be able to do the task. As the front end the application ffmpeg was chosen, with HEVC as the video encoder and Matroska as the container format. Using ffmpeg we can easily read different input formats, encode the videos and mux them into any common container format. Furthermore it offers convenience features, such as glob pattern input. Code adjustments to Wichmann's observer, the choice of codec and the choice of container are elaborated in the subsequent sections.

3.2. Code and Adjustments

Wichmann has developed a tool for the project to batch analyse the recorded images on the Cray.(45) Its major functionality is as follows:

• Extract and distribute the contents of a folder of TAR files to several job directories.

• Each job may occupy several computation nodes, thus having a variable number of cores.

• Jobs are automatically submitted and run while the next jobs are being prepared and submitted, up to a configurable limit.


• Job termination is detected and results are saved to a results folder.

• Clearing the input data is left to the encoding process. Input that has not been deleted is considered unprocessed; it is retrieved by the image distribution machinery and eventually re-distributed for processing.

• In case of a crash, the process may be restarted at any time thanks to a Python shelf file.

Although slight alterations were required, this software is a well suited foundation for organizing the encoding process.

The distribution of images was changed to a more sophisticated scheme, as round robin would break the sequential order of the images. Each core shall be supplied with a given number X of subsequent images. Sequentiality is determined by the file names of the images: every image is named according to pattern 3.1 (recent adjustments use the full-length UTC timestamp format).

Cam_#CAMID#_YYYYMMDDhhmmss_iiiiii.jpeg    (3.1)

An image is subsequent if and only if its time stamp is at most one second later than that of its predecessor. The file naming allows ordering the images lexicographically and hence finding a subset of sequential images among a set of extracted files easily, using conventional sorting algorithms. If there is no sequence of size Y with Y ≥ X, then the core is supplied with only Y images. This results in shorter videos and shorter encoding times, which eventually wastes a small amount of NPL; in those cases it can hardly be avoided that some cores are idle while waiting for the other cores on a node to finish. The resulting videos are named according to pattern 3.2.

Cam_#CAMID#_YYYYMMDDhhmmss_iiiiii_TO_Cam_#CAMID#_YYYYMMDDhhmmss_iiiiii.mkv    (3.2)
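The grouping logic described above can be sketched as follows. This is an illustration rather than the deployed code; it assumes the file names of a single camera and the legacy naming pattern 3.1:

from datetime import datetime, timedelta
import re

NAME = re.compile(r"Cam_(\d+)_(\d{14})_(\d+)\.jpeg$")

def split_into_sequences(filenames, chunk_size):
    """Yield lists of at most chunk_size consecutive images (one list per core)."""
    sequences, current, last = [], [], None
    for name in sorted(filenames):  # lexicographic order equals temporal order
        m = NAME.search(name)
        if not m:
            continue
        stamp = datetime.strptime(m.group(2), "%Y%m%d%H%M%S")
        # an image is subsequent iff its timestamp is at most 1 s after its predecessor
        if last is not None and (stamp - last) > timedelta(seconds=1):
            sequences.append(current)
            current = []
        current.append(name)
        last = stamp
    if current:
        sequences.append(current)
    for seq in sequences:
        # chunk each run into the X images supplied to a single core
        for i in range(0, len(seq), chunk_size):
            yield seq[i:i + chunk_size]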

Deleting the input is delegated to the results collecting process, because deletion cannot be done atomically in this case. In the previous scenario, each image was analysed independently: if the image still exists, it has not been fully analysed yet, otherwise it has. A video encoding process, in contrast, would need to delete some thousands of input images. If the process gets killed while deleting, the original amount would need to be recalculated from the output file name. It is also necessary to check whether the video was encoded completely; encoding may have aborted, leaving a broken video file. To simplify this, ffmpeg was modified to create a .dne file when it finishes with error code 0. This signals the results collector that the images may be deleted and the output is correct. If this file does not exist upon completion, all output is deleted and the input retrieved. There are cases in which a minor error, like a single broken frame, will still lead to ffmpeg terminating with error code 0.



In these cases the error is printed to the console. To detect and resolve such issues, the console output is forwarded to a text file and analysed for whether it contains "error" or similar keywords. If so, all the input and output is moved to a separate error directory for further manual investigation. This approach is feasible, as such errors happen rarely: retrospectively, this happened twice in 130773 video files with an average frame count of 1024 frames. In both cases a single image was unreadable.
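A minimal sketch of the collector-side check may clarify the mechanism. The file names, log location and keyword list are illustrative assumptions, not the deployed code:

from pathlib import Path

ERROR_KEYWORDS = ("error", "invalid", "corrupt")  # illustrative keyword list

def collect(job_dir: Path, error_dir: Path) -> bool:
    video = next(job_dir.glob("*.mkv"), None)
    # the modified ffmpeg writes a .dne sentinel only on exit code 0
    done = video is not None and Path(str(video) + ".dne").exists()
    log_file = job_dir / "ffmpeg.log"
    log = log_file.read_text(errors="replace").lower() if log_file.exists() else ""
    if done and not any(word in log for word in ERROR_KEYWORDS):
        for img in job_dir.glob("*.jpeg"):  # only now is it safe to delete the input
            img.unlink()
        return True
    # incomplete or suspicious output: keep everything for manual inspection
    job_dir.rename(error_dir / job_dir.name)
    return False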

3.3. Choice of Codec

Rerabek et al have shown that HEVC is currently the best codec in terms of the ratio of file size to quality.(36) The average bit rate savings were calculated to be 57.3% compared to AVC and 33.6% compared to VP9. In terms of perceived quality, as measured in PSNR, HEVC was similarly ahead. However, the image data of Beesbook differs from the sample videos used in any papers, to the best of our knowledge. The main differences are the static background, grey scale as colour space and a frame rate of around 3 images per second. For this reason HEVC compression was compared to AVC using ffmpeg, as seen in appendix A.1.

The experiment has shown that HEVC performs worse than estimated from Rerabek et al's findings, but still performs significantly better than AVC. The file size of the HEVC video is 56% to 71% of the size of the AVC counterpart at roughly the same quality, according to the codecs' CRF function. This is not an exact comparison, as the CRF implementations may vary.(8) It is, however, good enough to verify the results of Rerabek et al. The PSNR has not been used here, as there is no proven correlation between PSNR and the classification accuracy of computer vision algorithms.
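The comparison can be reproduced along the following lines. This is a sketch, not the exact procedure of appendix A.1; the input pattern and CRF value are illustrative, and, as noted above, CRF values are not directly comparable across the two encoders:

import os
import subprocess

def encode(codec: str, crf: int, out: str):
    # encode the same JPEG sequence with the given encoder at the given CRF
    subprocess.run([
        "ffmpeg", "-y", "-framerate", "25",
        "-pattern_type", "glob", "-i", "images/*.jpeg",
        "-c:v", codec, "-crf", str(crf), out,
    ], check=True)

encode("libx265", 20, "sample_hevc.mkv")
encode("libx264", 20, "sample_avc.mkv")
ratio = os.path.getsize("sample_hevc.mkv") / os.path.getsize("sample_avc.mkv")
print(f"HEVC file is {ratio:.0%} of the AVC file size")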

3.4. Container Format

Beesbook is comprised of many different sub-projects, each relying on the recorded information. Thus, the video data will be accessed by many heterogeneous clients. The major focus has been set on loadability in C++, Python, the JAABA framework and consequently Matlab (the JAABA framework requires the Matlab Compiler Runtime: http://jaaba.sourceforge.net/). The container also needs to be capable of storing videos at a 3 FPS frame rate and should ideally officially support HEVC as the contained video stream. Multiple sources state that Matroska is the only container besides MPEG4 to officially support HEVC.(2)(24) MPEG4 is supposed to work well, but does not support a frame rate of 3 FPS.(9) Matroska containers with HEVC content can be loaded and decoded using the OpenCV framework in C++ and Python. This has been tested on Ubuntu 15.10 using the sample code provided along with this thesis. The JAABA framework, which is based on Matlab, is also able to load Matroska containers. Matlab uses the gstreamer plugin on Linux operating systems and


DirectShow on Windows respectively to decode the video stream.(21) Both are verified to be compatible with MKV/HEVC.(2)
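A minimal Python example of such a client, with an illustrative file name; it requires an OpenCV build whose FFmpeg backend supports HEVC decoding:

import cv2

cap = cv2.VideoCapture("Cam_0_20150819120000_000000_TO_Cam_0_20150819120534_001023.mkv")
while True:
    ok, frame = cap.read()   # frame is a numpy array once decoding succeeds
    if not ok:
        break
    # process or display the frame here
cap.release()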

3.5. Compression Quality

For encoding the existing images of seasons 2014 and 2015, research on the optimal encoding quality and encoder parametrization was done. When encoding a video one encounters an optimization triangle of file size, encoding time and video quality, as depicted in figure 3.2. To solve this problem, first an optimal quality configuration was chosen using the current tag detection pipeline of the Beesbook project. Secondly, an encoder configuration was found which compresses very well on a single CPU core in feasible time. So the bias here was set on file size.

Figure 3.2.: Parameter optimization problem. The dashed line indicates the point at which optimization of one parameter comes at inappropriately high cost to the other parameters. The 'optimal quality parameter' line is found using the pipeline. The optimal configuration point X is determined manually.

3.5.1. Choosing the Quality

For later analysis of the videos it is not important how the videos look, but how well the tags are detectable by the pipeline. For this purpose, evaluation of the quality is solely based on the pipeline. The tags should also still be human-readable, but this criterion is likely to match machine-readability. In the pipeline, tags are processed in multiple steps: Preprocessing, Localizing, Ellipse Fitting, Grid Fitting and Decoding. The approach to reading tags is the following. Tags are localized first and false positives are filtered using a neural network. Details are described by Wario et al.(44) For every found 100x100 pixel partition of the image, a 3D model of a tag is fitted to the image partition. This requires finding the orientation, rotation and size of the tag. Finally the decoder can read the bits from the 3D model. For a few images Ground Truth data is available (online at https://github.com/BioroboticsLab/deeplocalizer_data, retrieved 13.06.2016). The Ground Truth data was manually created and describes all tags in the respective image almost perfectly.


The scheme for finding the optimal quality setting is to encode a video containing the Ground Truth frame at different quality settings. Then the pipeline is run on this frame and on the Ground Truth frame, and the performance is compared. Comparing the HEVC re-encoded frame with its original JPEG counterpart seems biased, as there will always be a quality loss unless encoding losslessly, which is not feasible. In fact, the used version of the NVEncoder will crash upon using lossless encoding settings (tested using video driver Linux-x86_64-358.16, CUDA 7.5.18, NVEnc 5.0.1). But as will be shown, in near-lossless situations the performance barely changes except for noise.

As a quality metric for encoding, the CRF parameter was used. CRF encoding uses a variable bitrate in general; however, how the bitrate is chosen in accordance with the parameter is implementation dependent, so it is only safe to compare results within a certain encoder implementation. Recent encoders also involve the framerate in the CRF function.(8)(21) The following measures are based on 25 FPS. Videos on the Cray have also been encoded at 25 FPS for the sake of comparability. In a later muxing step, videos have been changed to 3 FPS and muxed into MKV containers. The reason for doing encoding and muxing in two steps is that the HLRN NPL for the Cray had to be used by a given deadline, while the Biorobotics Lab committee had not yet finally decided on the requirements for muxing information at that point.
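The later muxing step can be sketched as follows. This is one possible way to re-stamp a 25 FPS HEVC stream to 3 FPS without re-encoding, by going through the raw bitstream; it is not necessarily the exact procedure used on the Cray, and the file names are illustrative:

import subprocess

def remux_to_3fps(src: str, dst: str):
    raw = dst + ".hevc"
    # 1. dump the elementary HEVC bitstream, no re-encoding involved
    subprocess.run(["ffmpeg", "-y", "-i", src, "-c:v", "copy", raw], check=True)
    # 2. re-read the raw stream at 3 FPS and mux it into a Matroska container
    subprocess.run(["ffmpeg", "-y", "-framerate", "3", "-i", raw,
                    "-c:v", "copy", dst], check=True)

remux_to_3fps("video_25fps.mkv", "video_3fps.mkv")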

Three different measures have been developed. Let $E_x$ be the encoded frame of CRF value $x$ in the video and $G$ the corresponding Ground Truth frame. Let there be a set of tags $T$ per frame, with $y \in 1..n$ where $n$ is the number of tags in the frame, so that every single tag can be identified as $T_y$ within its respective frame. $Y$ is an index set of the tags in the Ground Truth frame.

First, compare the pixel coverage of the entire mid ring of a tag. Let $S^{O}_{E_x,T_y}$ be the set of pixels covered by the mid ring of tag $T_y$ in $E_x$, and $S^{O}_{G,T_y}$ the corresponding set in $G$.

\[ \sum_{y \in Y} \frac{|S^{O}_{E_x,T_y} \cap S^{O}_{G,T_y}|}{|S^{O}_{G,T_y}|} \Big/ |Y| \tag{3.3} \]

Second, compare the pixel coverage of every single mid ring section. Let $S^{I}_{E_x,T_y,M_z}$ be the set of pixels covered by mid ring section $M_z$, where $M$ is the set of mid ring sections and $z$ an individual section, $z \in 0..11$.

\[ \sum_{y \in Y} \frac{\sum_{z=0}^{11} |S^{I}_{E_x,T_y,M_z} \cap S^{I}_{G,T_y,M_z}|}{\sum_{z=0}^{11} |S^{I}_{G,T_y,M_z}|} \Big/ |Y| \tag{3.4} \]


Third, compare the Hamming distance of the decoded tag bits in $S_{E_x,T_y}$ and $S_{G,T_y}$.

\[ \sum_{y \in Y} \mathrm{HAM}(S_{E_x,T_y}, S_{G,T_y}) \Big/ |Y| \tag{3.5} \]

The index $y$ in $T_y$ always points, by design of the algorithm, to the best fitting tag with regard to the Ground Truth tag set. For $S_{G,T_y}$ there is obviously a one-to-one relation. For $S_{E_x,T_y}$ the best fitting tag is selected, disregarding which candidate is the best choice according to the pipeline. These measures have been plotted as performance against CRF value. As all of them indicate the same point of decreased detection performance, only one of them is shown, in figure 3.3.
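Measures 3.3 to 3.5 translate directly into code. The sketch below assumes the pipeline has already extracted, per tag y, the mid-ring pixel sets, the twelve section pixel sets and the decoded bit strings; the data layout is an illustrative assumption:

def ring_coverage(ring, ring_gt):
    """Measure (3.3): ring[y] / ring_gt[y] are sets of (x, y) pixel coordinates."""
    return sum(len(ring[y] & ring_gt[y]) / len(ring_gt[y]) for y in ring_gt) / len(ring_gt)

def section_coverage(sec, sec_gt):
    """Measure (3.4): sec[y][z] is the pixel set of mid-ring section z of tag y."""
    total = 0.0
    for y in sec_gt:
        hit = sum(len(sec[y][z] & sec_gt[y][z]) for z in range(12))
        ref = sum(len(sec_gt[y][z]) for z in range(12))
        total += hit / ref
    return total / len(sec_gt)

def mean_hamming(bits, bits_gt):
    """Measure (3.5): bits[y] is the decoded bit string of tag y."""
    return sum(sum(a != b for a, b in zip(bits[y], bits_gt[y]))
               for y in bits_gt) / len(bits_gt)

A tag missed by the localizer contributes empty pixel sets, which is exactly the strong penalization discussed below.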

Figure 3.3.: Average difference in % per CRF value; mid-ring results for camera 1. An insignificant slope, dominated by noise, can be seen from 0 to 20. At roughly 20 the downturn starts, and it becomes significant at 25.

As it is yet unknown which algorithms will be used in the future, the CRF was chosen to be 20. Besides this insight, the plots revealed that the performance differs per camera. Unfortunately these measures are not feasible for camera calibration in the future, as each calibration step would require the creation of new Ground Truth data.

As can be seen in formulas 3.3 to 3.5, poor localizer performance is penalized strongly: a failure of the localizer accounts for zero pixels for the entire respective tag. This was found to be acceptable, as the localizer is the most reliable yet most important component. Not penalizing a localizer failure strongly would result in high evaluation noise for videos of poor quality. For example, when not taking unlocalized tags into account, a blank image would perform perfectly: all found tags, which are none, would have been decoded


perfectly. Having found the optimal quality parameter for the video, the optimal configuration X as in figure 3.2 can be determined.

3.5.2. Choosing the Size and Encoding Time Trade-off

The HEVC encoder has a multitude of options which impact the trade-off between file size and encoding time. To simplify this for end users, the CLI encapsulates 29 of them in the -preset switch.(20) The user may choose from ultrafast to veryslow or placebo. However, these presets were optimized for natural videos; the beehive recordings are special in terms of colour, framerate and so on. Finding the optimal solution here means doing a gradient descent in 29 dimensions. A gradient descent, however, will only find the global optimum if there is exactly one local optimum. It cannot be assumed that the loss function is convex, meaning there may be more than one local optimum in this parameter search.

As encoding a single sample takes significant time (encoding 30 frames ranges from 128 s to 573 s on the test hardware; for details see the attached CD), exhaustively doing this kind of parameter search is not feasible. Instead a more manual approach was taken. First, a suitable preset parameter was found at which the improvement in file size becomes insignificant.

filesize (bytes)   preset      gain
22613412           superfast   -
14841618           fast        34.37%
14889477           medium      -0.32%

Figure 3.4.: Gain in file size compared to the respective previous preset. The gain here describes the percentage by which a file is smaller than its predecessor; e.g. if A is 100 MB and B is 90 MB, the gain from A to B is 10%. Irrelevant results have been truncated. As can be seen, there is a significant gain from superfast to fast. From fast to medium the gain is small enough to be masked by noise, which explains the negative gain.

As shown in figure 3.4, the gain from fast to medium is -0.32%, small enough to be covered by noise. For this reason fast was chosen as the optimal preset. The documentation then yields suitable starting values to simplify the manual parameter optimization. A script was created to further optimize parameters thoroughly. When specifying some of the parameters set by the preset fast, the preset has to be set explicitly as well; the preset is applied first, then the fine grained parameters, regardless of the ordering. The parameter description of --ctu states the following: "The larger the maximum CU size, the more efficiently x265 can encode flat areas of the picture, giving large reductions in bitrate. However this comes at a loss of parallelism with fewer rows of CUs that can be encoded in parallel, and less frame parallelism as well."(20) CTUs are Coding Tree Units and describe blocks of size 8x8 to 64x64 into which HEVC fragments the input image; a CTU gets partitioned into CUs, Coding Units. The videos contain large portions of what basically is background. High ctu settings will encode this well but penalize parallelism. As every video is encoded on a single core, this penalty does not apply in this use case.


As every video is encoded on a single core, the parallelism penalty does not apply in this use case. Furthermore, it seems reasonable to choose a large minimal CU size. This is because the framerate is low and hence the objects move large distances between two frames, compared to natural videos.

Further parameters, namely early-skip, rd, fast-intra, tskip-fast and rdoq-level, may considerably improve encoding time; their penalty on file size is not clear, however, so they were tested. The HEVC encoder estimates motion in the video over time and creates so-called motion vectors. me, max-merge and merange impact the search pattern, length and temporal resolution of these motion vectors.(14) How well motion estimation works using small vectors is neither obvious nor determinable other than through testing. Again, the object motions in the videos are likely to be large due to the low framerate; for this reason, increasing the search range was evaluated. In addition to the performance-optimizing parameters, the two parameters b-adapt=0 and rc-lookahead=5 have been set. This reduces the memory usage of the encoding process, which is necessary as it may run out of memory otherwise. The impact on the motion estimation is thought to be minimal, as the temporal resolution is sparse and hence the default parameters may hardly be applicable. All test results are attached to this thesis on CD. The used parameter set can be seen in listing 3.1.

Listing 3.1: Final Parameter Set

--preset fast --x265-params rd=2:ctu=64:min-cu-size=32:max-cu-size=64:early-skip=1:fast-intra=1:tskip-fast=1:rdoq-level=0:me=1:max-merge=5:merange=114:b-adapt=0:rc-lookahead=5

3.6. Verification

The legacy image data acquired in the experiments of 2014 and 2015 is crucial to the project; data loss must be avoided at all times. To guarantee that the encoded videos are not missing any data, steps have been taken to validate them.

3.6.1. Timelines

Every video has a start timestamp and an end timestamp in its title. From these timestamps, timelines have been created. For verification purposes it is not relevant whether a certain period of time is covered by one or by multiple videos; the purpose of the timelines is to verify whether it is covered at all.

To generate the timelines, all timestamp pairs have been put into a set. If the end time of one pair matches the start time of another, they are merged into a single pair. If they overlap or one contains the other, they are also merged. These steps are applied exhaustively to the entire set.


A similar process was applied to the original images. If and only if two images are consecutive in lexicographic order among all images and their timestamps differ by less than one second, they are considered consecutive. The algorithm sorts the images lexicographically and goes through them one by one, following the order. Along the way, pairs equivalent to the video pairs are created, provided the videos have been created correctly.
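The exhaustive merge is equivalent to sorting the pairs and sweeping over them once. A minimal Python sketch of this step (the actual script may differ in details):

from datetime import datetime

def merge_timelines(pairs):
    # Merge (start, end) pairs that touch, overlap or contain one another.
    merged = []
    for start, end in sorted(pairs):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)  # absorb into the previous timeline
        else:
            merged.append([start, end])
    return [tuple(p) for p in merged]

# Two touching videos and one contained video collapse into a single timeline.
t = lambda s: datetime.strptime(s, "%H:%M:%S")
print(merge_timelines([(t("10:00:00"), t("10:05:41")),
                       (t("10:05:41"), t("10:11:22")),
                       (t("10:01:00"), t("10:02:00"))]))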

The resulting sets of pairs are effectively timelines. They have been checked for discrepancies using a Python script, and any gaps found have been fixed manually. Additionally, known downtimes may be inserted into the sets and annotated. The sets finally have been visualized using the Google timelines API.(16) As displaying an entire season in one timeline might not be appropriate in terms of size, there is also a version which splits the timelines day-wise. An example is given in figure 3.5.

Figure 3.5.: Timeline of the video data of 28.09.2015. Hovering over a section may give details. Annotated sections are always red, found video data blue and gaps blank. In this example, details for the annotated downtime (red) are shown in the info box below.

3.6.2. Frame Counting

In the preceding section the timelines have been used to verify the completeness of the video data. However, one cannot be sure that a video contains the number of frames implied by the timespan in its title; for an individual file it is not certain whether its content is consistent with its title. Inconsistencies might occur due to errors while providing the images, or due to erroneous images which get skipped during creation of the video. To ensure consistency, the frames in every video have been counted using ffmpeg with the command in listing 3.2. Again, found gaps and malformed videos have been fixed manually.

Listing 3.2: Counting frames using ffmpeg

ffmpeg -i VIDEO -vcodec copy -acodec copy -f null /dev/null 2>&1 | grep 'frame=' | cut -f 2 -d ' '
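The check can be scripted around this command. A minimal Python sketch (a hypothetical helper, not part of the original tooling) that counts the frames and compares them with the count implied by a video's title timestamps at 3 FPS:

import subprocess

def count_frames(path):
    # ffmpeg writes progress lines such as "frame= 1024 fps=..." to stderr.
    result = subprocess.run(
        ["ffmpeg", "-i", path, "-vcodec", "copy", "-acodec", "copy",
         "-f", "null", "/dev/null"],
        capture_output=True, text=True)
    lines = [l for l in result.stderr.splitlines() if "frame=" in l]
    return int(lines[-1].split("frame=")[1].split()[0])

def is_consistent(path, start, end, fps=3):
    # start and end are datetime objects parsed from the video title.
    expected = round((end - start).total_seconds() * fps)
    return count_frames(path) == expected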


4. GPU Assisted Video Recording

This chapter elaborates work package 2 as described in section 1.2, that is, transferring and improving the legacy hardware and software into advanced ones. This includes improvements to the lighting, camera calibration, image compression and finally the file transfer. In the legacy system the hive was illuminated with constant IR light of 840 nm. This required a large amount of IR lights: 22 lamps, each having 24 LEDs. The LEDs emit very directed light (approx. ±5°) and hence produce a great deal of reflection on the glass pane and the tags. The lighting from one side acts as back lighting to the cameras of the respective other side. This is unwanted, as it decreases contrast in the recorded images. The ambition is to create more diffuse lighting and to reduce or eliminate back lighting.

Previously the recording application bb_ImageAcquisition saved the retrieved images in JPEG format, with all configuration parameters hardcoded. The JPEG images were saved to a ramdisk, as the produced workload was large. A single image was approximately 3 MB. This results in a workload as calculated in equation 4.1.

4 cameras ∗ 3 frames/s ∗ 3 MB/frame = 48 MB/s (4.1)

The images were then stored in a TAR archive and again written to the ramdisk. As TAR does not use compression,(25) this results in one read and two writes of the workload calculated in 4.1. The archives are then sent to a long-term storage on the HLRN supercomputer Cray, as well as stored on a local NAS. The NAS caches the data in case of Cray failure or maintenance, and additionally stores stripes of data every day for the long term, which amounts to one more read. So the workload totals two reads and two writes of each image, which is 192 MB/s. This cannot be handled by a consumer hard drive, and it would also greatly impact the lifetime of multi-hard-drive or SSD setups. The recordings last 9 weeks, which results in a data amount as per equation 4.2. The calculation is an estimate: since there are maintenance breaks, failures and varying file sizes, the actual recorded amount is 157 TB. Hence equation 4.2 may be considered a worst-case scenario. This scenario does not cover any gaps in the data, like gaps due to long Cray maintenance or system failure.

48 MB/s · 604800 s/w · 9 w = 261273600 MB ≈ 261 TB¹ (4.2)

The objective here is to reduce the workload and, most of all, to reduce the total amount of produced data.

¹ s/w is seconds per week.



All of these targets are constrained by limited, yet not strictly defined, financial resources.

4.1. Software

In section 2.2 the state of the art in video encoding was elaborated. Choosing which technology to use comes with certain obstacles. First of all: is live software encoding feasible? Video encoding in general is a very computationally laborious task. The recording machines must be able to encode 4 videos at 4000x3000 pixels and 3 frames per second in real time. As shown in the appendix in table A.1, the fastest tested AVC software encoding run took 1.5 s. This is just about feasible for real-time encoding: with 4 cameras at 3 frames per second, it produces an average CPU workload of 60%. However, the compression ratio is poor compared to the HEVC codec, or even to h264 at other configurations; using four recording machines would improve this only to an unsatisfying degree. Looking at the results of the post-hoc encoding, it is known that a season may produce 40 TB of HEVC-encoded data. The table also tells us that there is a saving of 8 MB, from approx. 50 MB to 42 MB, between certain settings. Taking 42 MB as 100%, 50 MB is roughly 120%, and 120% of 40 TB are 48 TB. A reasonably cheap local storage is the 8 TB HDD by Seagate at approximately 248€ per unit.2 As file security is highly desirable, at least four extra TB are required to do some kind of RAID. So 8 MB extra in the table result in 372€ extra cost per season. It can be concluded that hardware encoders are financially advantageous in this use case.

The two feasible options for hardware encoding are the NVEncoder(30) and the Intel Media Server Studio.(22) As two systems and hence two NVidia GPUs are required, hardware costs are roughly 330€3 minimum. The cheapest Intel CPU having the desired hardware support appears to be the E3-1286v3 for 229€4. The major cost factor, however, is the licence required to use the Intel Media Server Studio: the only licence with HEVC capability is the Intel Media Server Studio Professional Edition for $3999.5

The NVidia GPU was chosen for multiple reasons. First, the GPUs are by far the cheapest solution. Second, other tasks such as those mentioned in section 1.5 profit heavily from a powerful GPU; as the recordings of bees take approximately 9 weeks and those of the bumblebees 6 weeks per year, there is a decent amount of spare time for the GPUs to serve a dual purpose. Finally, one GeForce 960 GTX GPU was already available at the lab and could be used for feasibility checks and development.

2 Price per Amazon, retrieved 21.06.2016: https://www.amazon.de/Seagate-Archive-interne-Festplatte-Cache/dp/B00QGFEQXU/ref=sr_1_sc_1?ie=UTF8&qid=1466504694&sr=8-1-spell&keywords=seagote+8TB+storage

3 Price per Amazon, retrieved 03.05.2016: http://www.amazon.de/MSI-V320-059R-GTX-950-Graphics/dp/B013SUP76U/ref=sr_1_1?s=computers&ie=UTF8&qid=1462280394&sr=1-1&keywords=gtx+950

4 Price taken from Computeruniverse.net, retrieved 03.05.2016: https://www.computeruniverse.net/products/90648641/intel-xeon-e5-2603-v3.asp

5 Price taken from the Intel homepage, retrieved 05.05.2016: https://software.intel.com/en-us/intel-media-server-studio/try-buy


The Biorobotics Lab finally acquired two GeForce 960 GTX GPUs. The MSI edition having 4 GB of RAM was chosen in favour of the GeForce 950 GTX, which does not seem to come with 4 GB of memory.

4.1.1. Acquisition Application

For the acquisition and encoding task, existing software was extended. The application bb_imageAcquisition was developed by Fernando Wario back in 2014 in order to save the recordings in JPEG format. A branch for video encoding was created and is publicly available.6 Previously the camera configurations were fixed in the source code, and for every camera there was a thread retrieving the images. The camera was triggered by its internal software, so the retrieving thread had to retrieve the image, store it as JPEG and log the event within the 3 FPS cycle. If the thread was too slow to do so, caused by random hard drive load for instance, data loss may have occurred.

The acquisition software was extended to process the input as displayed in figure 4.1.

Figure 4.1.: Software architecture of the advanced acquisition application. Each worker may hold one encoding session at any time. Interconnections depend on buffer order and may be changed by switching around buffers as well. HD buffer entries 1 and 3 in the JSON file are connected to worker one, and entries 2 and 4 to worker two, respectively.

Each camera is still being read by a single reader thread. Every reader thread holds a pointer to a concurrent linked-list buffer structure; let this structure be the HD buffer. Each HD buffer has a corresponding configuration in a JSON file. This configuration specifies how the input is supposed to be acquired and how it will be encoded by the GPU encoder. The former comprises all the camera configurations: camera serial number, logical camera ID, resolution, shutter time, gain and so forth. Buffers store raw images. The GPU encoder unfortunately does not support a monochrome chroma format, so the grayscale images are converted by the encoder thread first.(30) Let the grayscale value be W.

6Repository link: https://git.imp.fu-berlin.de/bioroboticslab/bb_imgacquisition


Since W = R = G = B, equation 4.4 can be derived from equation 4.3.(40) The U and V channels always have the constant value 128 and are hence omitted in the conversion process; pointers to a constant memory region containing only this value are passed for performance reasons.

Y = 0.257 · R + 0.504 · G + 0.098 · B + 16 (4.3)

Y = 0.859 · W + 16 (4.4)
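A minimal NumPy sketch of this conversion, assuming the encoder consumes NV12 (a Y plane followed by a half-height interleaved U/V plane); the actual implementation avoids allocating the constant chroma plane per frame by passing pointers to constant memory:

import numpy as np

def gray_to_nv12(gray):
    h, w = gray.shape
    # Equation 4.4: Y = 0.859 * W + 16, with W = R = G = B.
    y = (0.859 * gray.astype(np.float32) + 16.0).astype(np.uint8)
    # U and V are constant 128 (neutral chroma), interleaved in one plane.
    uv = np.full((h // 2, w), 128, dtype=np.uint8)
    return np.concatenate([y, uv])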

The workers do the encoding. A single worker may be assigned one or two HD buffers as well as corresponding LD buffers, each of them being registered to a writeHandler, which will be elaborated later. LD buffers may be enabled or disabled via the configuration file. Similarly to HD buffers, the LD buffers specify their input, but here the input has to be an HD buffer. Whilst encoding the HD buffer, the processed image will not be discarded but formatted according to the LD buffer and written to it. Quality settings for the encoding may be applied to every buffer, regardless of kind, via the configuration file. In this fashion one may specify a high-resolution, high-quality HD buffer and a corresponding low-resolution, low-quality LD buffer, as in the configuration sketch below.
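A hypothetical sketch of such a configuration; the field names are invented for illustration, only the structure (camera parameters per HD buffer, LD buffers referencing an HD buffer, per-buffer quality settings) follows the description above:

{
  "hd_buffers": [
    { "id": 1, "serial": "00000000", "camera_id": 0,
      "width": 4000, "height": 3000, "shutter_ms": 5, "gain": 17,
      "frames_per_video": 1024, "quality": 20, "worker": 1 }
  ],
  "ld_buffers": [
    { "input": 1, "enabled": true,
      "width": 2000, "height": 1500, "quality": 30 }
  ]
}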

The worker will always choose the buffer which consumes the most memory and is feasible for encoding, preferring HD buffers. LD buffers may be infeasible for the following reason. Assume an LD buffer has the exact same quality and resolution settings as its HD counterpart, and let the HD buffer's frames per video be 10 and the LD buffer's 11. As the HD buffer's images are transformed into LD images alongside encoding, the LD buffer will hold 10 images after completion of encoding of the HD buffer. Upon selecting the next buffer to be encoded, the LD buffer may not be chosen, as it would wait for the 11th frame infinitely. The only other means to prevent this lock would be to directly transform the HD buffers into LD ones, or to do the transformation on demand. However, the use case is to have an LD video whose length is k times the length of the HD video, so either of the proposed approaches would exhaust unnecessary amounts of memory. Switching between buffers while encoding is not possible, as only 2 encoding sessions may be opened at once due to licensing restrictions. Ending one encoding session effectively means breaking the exploitation of sequentiality, as compression over time works only within one encoding session; it would therefore render video encoding useless in the worst case. However, video streams can be concatenated.

The encoding function itself is based on the NVEncoder SDK examples. An encoder object is configured and an encoder session is opened via framework functions. Images are retrieved from the buffers in a loop, as mentioned previously, and passed to the encoder framework function. Results are passed to the writeHandler. The input image is transformed and passed to an LD buffer, or discarded.7


Once all the frames are encoded, the encoding session is closed and the thread returns to buffer selection.

7 Creation is optional and may be configured.

Unfortunately, only one process at a time may have access to the camera devices. However, for many purposes it is desired to grab images at any time, for example to see the contents of the hive live without opening it and possibly disturbing the bees. To enable this, a simple shared memory interface was created. The shared memory holds a deep copy of the most recent image and timestamp, but does not hold any cache. The interface comes with two advantages: First, any operation on raw images may be implemented, patched and tested without interrupting the acquisition process. Second, the complexity of the acquisition process remains constant. New features are always critical to the stability of the recordings, which has highest priority; they have to run through several weeks of testing until considered stable.
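To illustrate the intent of the interface, a minimal Python sketch of a consumer; the segment name and the layout (an 8-byte timestamp followed by the raw frame) are assumptions, not the actual binary interface:

from multiprocessing import shared_memory
import numpy as np

W, H = 4000, 3000
SEGMENT = "bb_cam0_latest"  # hypothetical segment name

def read_latest_frame():
    shm = shared_memory.SharedMemory(name=SEGMENT)
    try:
        timestamp = int(np.frombuffer(shm.buf, dtype=np.int64, count=1)[0])
        frame = np.frombuffer(shm.buf, dtype=np.uint8, count=W * H,
                              offset=8).reshape(H, W).copy()  # deep copy, then detach
    finally:
        shm.close()
    return timestamp, frame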

Write Handler

Files are named according to the pattern described in section 3.2. To separate data processing from output, the writeHandler class was created. The writeHandler only operates on the virtual camera ID, as well as the configurable output and temporary folders. To prevent the storage application from attempting to send malformed or unfinished videos to their final destination, they are held in a temporary folder. Once a buffer is associated with the writeHandler, a video data file and a textfile are created. A writing process may simply use the public lock handle to write a data frame. Then the log function has to be used to log the timestamps of the frames; they will be formatted according to conventions, written and immediately flushed to the textfile. Upon destruction of the writeHandler, text and data files are moved to the output directory. The log function keeps track of the first and last timestamp and names the video data file accordingly. If the application crashes at any point, partial video files can be recovered thanks to the text file, which holds the timestamps of all frames written so far. An interrupted encoding session will only harm decoding and playback of the video if half a frame was written to the video data file, which happens if the application crashed during file writing. Even this does not render the entire data useless, but will only cause errors upon processing the data.

4.1.2. Watcher Application

An application was created to watch the acquisition process's output. There are two main functionalities. The first is to always grab the most recent frame and display it in a window; this is done using the Qt GUI framework. The second is to save the most current picture, or a series of pictures, as RAW images, JPG or PNG. For the latter two formats Qt functions are used as well, while RAW images are just a copy of the shared memory.


Live Statistics

Some image statistics are computed live on the images every minute using the shared memory interface. Though the live statistics code is part of the original bb_imgacquisition code, it functions as a watcher in a separate process. This is because the acquisition process uses the same code for the calibration task. This may be outsourced into a library and separate projects later.

Live statistics are done for multiple reasons. One may want to find out whether a camera got decalibrated during the running experiment; this might happen accidentally or on purpose, when feeding or changing the experiment setup slightly. As manual evaluation is a cumbersome and error-prone task, support via image statistics is desirable. The statistics may indicate differences between cameras, as well as whether a single camera has changed in recording quality. Additionally, other information can be derived, for example bee activity in general or in specific parts of the hive.

The question is: what properties can be used to determine the quality of an image? The focus was set on three measures: noise, sharpness and contrast. Farooque et al. described image noise as a random variation of brightness and colour in the image.(12) In the paper, different kinds of noise are identified and measures for removing noise are analysed. Our goal is not to remove noise, but only to detect and measure its quantity. Some kinds of noise are of much less interest because we cannot prevent them from occurring. An example of this is quantization noise, which may occur due to the discretization of image values. It usually is uniform over the entire image and hence can hardly be filtered by a median filter, for instance.

According to Farooque, Gaussian noise is sensor noise caused by poor illumination and/or high temperature, and/or transmission, e.g. electronic circuit noise. These issues may be resolved by changing the hardware setup and shall be detected and measured. A means to remove this kind of noise is the median filter. The idea is now to filter an image using the median filter and calculate the sum of squared differences of the original and filtered images according to formula 4.5. Here N and M are the respective image dimensions and E is the median filter operation.

Noise(I) = ( Σ_{x=1}^{N} Σ_{y=1}^{M} |I(x, y) − E(I)(x, y)|² ) / (N · M) (4.5)
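A minimal OpenCV/NumPy sketch of equation 4.5; the 3x3 median kernel is an assumption:

import cv2
import numpy as np

def noise(img):
    filtered = cv2.medianBlur(img, 3)  # E(I), the median-filtered image
    diff = img.astype(np.float64) - filtered.astype(np.float64)
    return float(np.mean(diff ** 2))   # normalized by N * M via the mean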

The same approach can be applied to image contrast. According to Starck et al., an image can be contrast-enhanced using histogram equalization, wavelet and curvelet transformation.(39) It was concluded that curvelets enhance contrast best, particularly if there is a considerable amount of image noise. In our use case, however, we do not want to enhance an image but rather find a measure for contrast.


In the setup, some regions of the image are rather irrelevant, as they are not in the scope of the experiment setup. So improving contrast by lighting in regions exterior to the setup may improve the measure, although relevant sections of the image may or may not become worse. There are two intuitive solutions to this problem. The first is to crop the image to the relevant sections. The second is to compare the image to an image having good contrast. The good image can be acquired by taking a picture from the camera and enhancing its contrast using third-party tools. The image is then provided as refIm.jpg to the application and colour histograms are compared; again the sum of squared differences is taken. Additionally, the contrast ratio has been calculated. The contrast ratio is defined as the lowest pixel value divided by the highest pixel value found in a ROI, as illustrated in equation 4.6.

ContrastRatio(I) = min(I) / max(I) (4.6)

Obviously this metric is prone to errors caused by outliers, such as dead pixels and overexposed areas which are of no interest. For this reason the image is first filtered with a median filter. Then five ROIs of adjustable size are selected in the pattern of the five-spot face of a regular die. Finally, the average over the five regions is taken, as in the sketch below.
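A sketch of this measure; the ROI size and the exact placement of the five regions are assumptions:

import cv2

def contrast_ratio(img, roi=100):
    f = cv2.medianBlur(img, 3)  # suppress dead pixels and overexposure outliers
    h, w = f.shape
    centers = [(h // 4, w // 4), (h // 4, 3 * w // 4), (h // 2, w // 2),
               (3 * h // 4, w // 4), (3 * h // 4, 3 * w // 4)]
    ratios = []
    for cy, cx in centers:
        r = f[cy - roi // 2:cy + roi // 2, cx - roi // 2:cx + roi // 2]
        ratios.append(float(r.min()) / max(float(r.max()), 1.0))
    return sum(ratios) / len(ratios)  # average over the five ROIs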

Sharpness is the most important factor in adjusting the setup. Most digital cameras are able to focus the camera lens automatically; however, the hardware used in section 1.3.1 requires manual focusing. There have been several studies on how to find the overall sharpness of an image. Gross et al. have evaluated variance, sum modulus difference and the perceptual sharpness metric for video colonoscopy.(17) Moreover, Chern et al. have compared a multitude of methods, as depicted in figure 4.2. The results are sobering, as almost all thinkable methods are able to do the job; yet there are differences in the performance of the algorithms, for example the steepness of the sharpness function near the sharpest point, and the computation time. The sum modulus difference and the perceptual sharpness metric have been chosen for implementation. SMD was chosen because it is computationally inexpensive and performed well in both mentioned papers. The calculation was done as in equation 4.7, according to Gross's paper.(17)

S_SMD(I) = Σ_{i=1}^{M−1} Σ_{j=1}^{N−1} ( |I(i, j) − I(i−1, j)| + |I(i, j) − I(i, j−1)| ) (4.7)

The perceptual sharpness metric was chosen since it was not analysed by Chern et al. and hence completes the view on suitable sharpness metrics. Here the image is split into blocks of 8x8 pixels, and a discrete cosine transform is applied to each block as in equation 4.8.

DCT_{m,n}(k1, k2) = Σ_{i=0}^{7} Σ_{j=0}^{7} x_{i,j} · cos[(π/8)(i + 1/2)k1] · cos[(π/8)(j + 1/2)k2] (4.8)


Figure 4.2.: Plot of the performance of image sharpness functions. It can be seen that almost all intuitively meaningful functions work reasonably well for finding the correct global maximum. (28)

The coefficients of the first row and column are then weighted according to equation 4.9 and added as in equation 4.10.

SSF(d) = d^0.269 · (−3.533 + 3.533 · d) · exp(−0.548 · d) (4.9)

S_{m,n} = Σ_{d=1}^{8} SSF(d) · [ DCT_{m,n}(0, d−1) + DCT_{m,n}(d−1, 0) ] (4.10)

Finally, the blocks are summed and normalized by the image size as in 4.11.

S_PSM = (1 / (M · N)) · Σ_{m=1}^{M/8} Σ_{n=1}^{N/8} S_{m,n} (4.11)
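A sketch of equations 4.8 to 4.11, using SciPy's DCT-II, which matches equation 4.8 up to a constant scaling factor:

import numpy as np
from scipy.fftpack import dct

def ssf(d):
    return d ** 0.269 * (-3.533 + 3.533 * d) * np.exp(-0.548 * d)  # equation 4.9

def psm(img):
    h, w = (img.shape[0] // 8) * 8, (img.shape[1] // 8) * 8  # crop to 8x8 blocks
    i = img[:h, :w].astype(np.float64)
    total = 0.0
    for m in range(0, h, 8):
        for n in range(0, w, 8):
            block = dct(dct(i[m:m + 8, n:n + 8], axis=0), axis=1)  # equation 4.8
            total += sum(ssf(d) * (block[0, d - 1] + block[d - 1, 0])
                         for d in range(1, 9))                     # equation 4.10
    return total / (h * w)                                         # equation 4.11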

All of the functions described in this section have been implemented. Their results are written live to a textfile, along with the timestamp of the corresponding image and the camera ID.

Memory Footprint

Images in buffers are handled using the standard library's shared pointers. The reference counting of shared pointers guarantees that the memory is freed once no valid reference to it is left anywhere in the code. This effectively means that the reader can simply pass the same pointer to the HD buffer and the shared memory thread; despite the threaded nature, there is no need to negotiate when or by whom the memory has to be freed.

Memory consumption is highly dependent on the number of images which need to be cached before processing.


This number depends on the number of frames per video, on whether LD videos are created or not, and on the HD-buffer-to-worker connections. Images are held uncompressed in the buffers.

The use case is HD images of dimensions 4000x3000 consuming 12 MB of memory each and LD images of dimensions 2000x1500 consuming 2.86 MB each. The videos are of length 1024 frames, HD and LD, at 3 frames per second. There are two workers per system and one encoding session per worker. According to NVidia's documentation, the 2nd-generation Maxwell HEVC encoder is able to process at least 133 frames per second, depending on the configuration.(30) So a buffer containing 1024 images can be processed in approximately 7.7 s. Let us assume two cameras are connected and assigned to two different workers. In this example, each may only process its assigned HD buffer or the corresponding LD buffer. As the HD buffer is processed, the LD buffer will accumulate 2.9 GB of data. In the process of encoding the LD buffer, the HD buffer will again acquire 288 MB of images, that is, image size times images per second times encoding time, rounded up. As weighing the build-up in the HD buffer against the memory freed from the LD buffer may be fragile to lag spikes caused by external applications, it seems safe to round the total memory usage up to 3.4 GB per worker. This results in 6.8 GB total peak memory usage in this use case.

Consider another use case: four cameras of the same configuration in one system. Now two HD buffers need to be cached as well, resulting in four buffers served per worker in total. While encoding HD buffer 1, HD buffer 2 and LD buffer 1 build up memory to 12.3 GB + 2.9 GB = 15.2 GB. Encoding HD buffer 2 causes LD buffer 2 to build up. After this there is a total memory consumption of 288 MB (HD 1) + 6.8 GB (LD 1+2). Next up are the two LD buffers. Eventually the peak memory usage is 15.2 GB per worker, concluding in 30.4 GB per system.

Calibration

The application has a separate calibration mode, which may be launched by running it with the parameter --calibrate. It will not perform the normal recording procedure, but only calculate the quality measures described in section 4.1.2. This is done in real time in a best-effort manner; as the whole task is computationally intensive, the full framerate might not be achieved. It is hence advised to calibrate only one camera at a time. The results are printed to the console, formatted as a table. The user may now change the setup to optimize the quality measures live.

However, for some setups the measures might be misleading. For example, in the bumblebee hive the focal point is desired to be on the hive, which is somewhat higher than the floor. Since the floor consists of cat litter, it has a lot of structure; as a result, an optimal perceptual sharpness metric will put the focal point on the floor. This issue may be tackled by setting the resolution and region of interest of the camera accordingly, which enables the user to set the focus on an arbitrary rectangular subimage.


4.1.3. Storage Management Application

The storage management application is called bb_imageStorage and was originally created by Christian Tietz.(41) The legacy version accumulated the images created by bb_imageAcquisition into TAR archives, which were then sent to the Cray supercomputer. Its features were as follows:

• Transfer TAR archives automatically to Cray using SCP and keyfile authentication.

• Cache files in a separate temporary directory in case of Cray failure or maintenance.

• Upload cached files automatically as soon as possible.

• Monitor the Cray status and alert the user via console output.

• Store stripes of data to a separate folder as per a given schedule.

• All sources and targets are local folders, except for the Cray. To use a NAS, folders need to be mounted locally.

• Sources and targets are hard coded. Changes require recompilation.

• Qmake project files for compilation on Windows and Linux

The functionality was altered to meet the requirements of the video recordings, though the workflow remains unchanged. The project was converted to CMake to meet the Biorobotics Lab standards, and all configuration is now done via a JSON configuration file. Accumulating files is no longer necessary, and hence file IO has been reduced; video files and textfiles take the place of the TAR files. The application can and must distinguish between LD videos and HD videos: they are named identically but are uploaded to different directories on Cray. In case of Cray failure or maintenance, LD videos will be discarded, as they can be reproduced from the HD videos.

4.2. Hardware

4.2.1. Lighting

As mentioned in section 1.1, good lighting is crucial for acquiring pictures of good quality. The legacy recording system, as introduced in section 1.3.1, used a high number of constant lights. This setup comes with many disadvantages:

• Conventional LEDs have a small lighting angle and hence cause strong reflections on tags, glass panes and similar surfaces.

• Positioning a high number of lights to light the scene equally well is an elaborate task.


Figure 4.3.: Circuit diagram of the LED boards.

• Most of the light emitted is not required: 3 frames per second and a shutter time of 5 ms result in only 1.5% of the light actually being used.

• The overall system is expensive. It amounts to roughly 1300€, as calculated in appendix A.2.

Board Design

For the advanced hardware setup, new LED boards were designed. They run at 5 V, hence regular computer power supply units can be used to power them. Each board has a status LED indicating three different statuses; in diagram 4.3 this is LED D3. Off means there is no power supplied to the board. The red LED D3 being on means there is power, but the board is not flashing. Flickering means that the board is flashing, where one off-phase indicates one flash. The resistors R4 to R7 and R8 to R11, respectively, are parallel banks of series resistors for the LEDs; as the current of 4 A is rather high, four resistors have been chosen to prevent overheating. Q1 and Q2 are MOSFETs switching the LEDs on and off. The power supply may not be able to supply the power instantly, as the surge current is high; the capacitors C1 to C4 smooth this current and guarantee a short response time while relieving stress on the power supply.

The status LED D3 is connected between VCC and the trigger line of the MOSFET. As a result, it switches off whenever the MOSFET is triggered: the voltage between VCC and the signal line becomes negative, causing the LED to switch off. Due to its diode functionality, there is almost no current flowing back to VCC. If there is no signal coming from the signal amplifier, the LED is on and nothing else happens; the current drains over the pull-down resistor R2. The voltage on the signal line will then be roughly 1.5 V relative to ground. This way the MOSFET will not be triggered, as the trigger voltage is roughly 3 V between gate and source.


Figure 4.4.: Voltage between the LED anode and ground during flashing at 3 flashes per second, measured using an oscilloscope. In this test circuit only one LED was used and no signal amplifiers. The time delta per grid box is 50 ms; the base voltage is 3 V.

The power consumption of the boards is pulsed in nature, yet the power supply has to be able to serve every peak, not the average. In figure 4.4 the LED phases can be seen clearly. Switching on the LED causes the voltage to fall by 1.5 V; the power supply slowly recovers from this surge. After switching off the LED, the voltage rises to 0.5 V relative to the 3 V base, because the capacitors are recharging. Eventually the capacitors are not fully sufficient to smooth the current, so all the voltages on the power supply will drop briefly. When connecting other devices, one has to be aware that the supplied voltage is not smooth and constant; a connected light bulb, for instance, would flicker on every flash of the LEDs.

The layout of the board is single-sided. Power lines have been designed as polygons where possible; the outer polygon is ground and the inner one is VCC. So in figure 4.5 the upper connector is ground, the middle one VCC and the bottom one the trigger. The drill holes do not come with any pads, to save space, so using synthetic screws is highly advised; the Beesbook setup uses polypropylene screws. LED D3 is sketched to be on top of the board in this design; however, it is advised to solder it facing back, so as not to obstruct the light emission of the IR LEDs.8

However, since the IR light is now flashing at 3 Hz, the PS-Eye cameras can no longer record under the IR light at 100 Hz.

8 Minor changes to the ordered version included. See CD for details.


Figure 4.5.: Circuit board layout

Some frames would be too dark, as there is hardly any light in between the flashes, while some would be overexposed, resulting in a flickering video. To resolve this issue, the white light filters were installed again into the PS-Eye cameras. Panels of red LEDs are used to illuminate the hive constantly for the low-resolution recordings. This way, two distinct kinds of recordings can be done simultaneously using light which is invisible to the honeybees.

Setup

This subsection describes how to wire the parts of the LED boards and cameras. All wiring is according to the Biorobotics Lab use case.

An Arduino Duemilanove was used to create the trigger impulses for the cameras and flash boards. The signal for two flash boards was bundled onto a single PWM port. As there are only two sides of the beehive, only two different signals are required; splitting them further onto two PWM ports each was done to increase the signal strength per trigger line and to increase the flexibility of the setup. The cameras are also triggered in pairs of two. As the flashing is done asynchronously, two pairs of different flashing cycles have been connected to a single power supply. This way a single power supply only has to provide for two flash boards at a time.

Three wires are soldered to the LED boards as depicted in the top two images of figure 4.6, the smaller connector being the trigger line. Any small connector will fit this purpose, for example JST-PH-2.0 2-pin connectors. Whenever a signal is applied to this line, the LEDs will turn on. The MOLEX Standard .093" connector matches the +5 V power supply as seen in common computers. Note that the power lines are inverted here and will not fit on the regular connectors of computer power supplies. This is done to prevent naive use, most of all connecting too many LED boards to a single power supply.


To connect the trigger lines of the boards to the Arduino, Cat-7 twisted-pair wires have been used. One end was cut open and a 1-pin connector plug was attached to each lead of the wire. On the other end, each twisted pair was bundled to a single one-pin male jumper wire. This way a single PWM port of the Arduino may trigger two LED boards simultaneously, as described previously.

The cameras are triggered using a separate wire. Matching the camera, 8-pin connectors with part number HR25-7TR-8SA were connected to a Cat-7 twisted-pair wire in a similar fashion as the trigger for the LED boards. To trigger the cameras, only GPIO ports 1 (trigger in) and 5 (GND) are required.(34) These two were each connected to a twisted pair, the striped wire being ground. On the other end, all ground leads were bundled to a single one-pin male black jumper wire; the others were bundled in pairs of two.

Finally, a simple bridge which short-circuits multiple one-pin connectors was created, as in section C of figure 4.6. This is used as a compensating current bridge. This bridge is important: the voltage a power supply creates is relative to its ground potential. If the voltages among all connected devices are not relative to the same ground potential, undesired behaviour may occur. For example, the signal might be +5 V relative to the Arduino's ground but only +2 V relative to the power supply's ground; as a result, the LED will not switch on. In other cases, too high a voltage or power consumption may damage devices. Note that to create the same potential on all connected devices, a small current may flow from one connected device's ground to another. To minimize this, one can supply all devices from the same power outlet. It is a good idea to check the compensating flow using a multimeter; in the tested experiment setup, the compensating flow was hardly measurable using lab devices.

A black one-pin male jumper wire has been soldered to the ground of each power supply, to be connected to the compensating current bridge.

If the Arduino is running while the power supplies are being switched on, damage may be caused to them: there is roughly a 6%9 chance of the switch-on time of the LEDs stacking with the inrush current of the capacitors. If the power supply is not powerful enough to serve this, the fuse will break.

4.2.2. Recording Hardware

The cameras being used are FL3-U3-88S2C-C by Point Grey. The maximum resolution is 4096x2160 at 21 FPS, but may be set to 4000x3000 at 15 FPS in free-running trigger mode. Note that in external trigger mode, only half the frame rate can be achieved in general.(34)

9 There are three flashes, each lasting 20 ms, in one second, resulting in a 6% chance.


Figure 4.6.: LED boards and wiring. Top left: the LED board. Top right: MOLEX connectors of the LED board. Bottom left: compensating current bridge. Bottom right: trigger wire.

The cameras offer a USB3 interface for retrieving images and configuring the cameras via software. They also provide 8 GPIO ports, connected via a Hirose HR25 8-pin connector.10 The GPIO ports are used in this scenario for external triggering only, as described in the previous section.

In the advanced recording system, two recording machines are being used, each coming with the following features:

• At least 16 GB RAM

• NVIDIA GTX960 GPU including 4 GB memory

• At least a dual core CPU running at 3 GHz

• Two USB3 interfaces

• Gigabit Ethernet adapter

• A dedicated HDD for the recording cache, having 3 TB disk space

The expected data throughput per camera can be estimated as follows; the throughput may vary according to video content, camera configuration and encoder configuration. For HD videos at 1024 frames, 3 FPS, 4000x3000 and CRF 20 or equivalent, the expected file size is 400 MB to 900 MB.

Workload = (900 MB · 3 frames/s) / 1024 frames · 2 = 5.27 MB/s (4.12)

10 Hirose HR25 8-pin connector, part number HR25-7TR-8SA.


where the factor 2 accounts for writing and reading each file. For LD videos at 1024 frames, 3 FPS, 2000x1500 and CRF 30 or equivalent, the expected file size is 5 MB to 40 MB.

Workload = (40 MB · 3 frames/s) / 1024 frames · 2 = 0.24 MB/s (4.13)

So the total HDD workload per second is

2 · 5.27 MB/s + 2 · 0.24 MB/s ≈ 11 MB/s (4.14)

In conclusion, with two cameras the worst-case throughput per HDD is 22 MB/s, which is feasible for consumer HDDs. To transfer this amount of data, Gigabit Ethernet may be necessary.
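The estimate of equations 4.12 to 4.14, collected in a small helper:

def stream_mb_per_s(file_mb, fps=3, frames_per_video=1024, rw_passes=2):
    # Worst-case sustained HDD load of one video stream, read and write included.
    return file_mb * fps / frames_per_video * rw_passes

total = 2 * stream_mb_per_s(900) + 2 * stream_mb_per_s(40)  # two HD plus two LD streams
print(f"{total:.1f} MB/s")  # roughly 11 MB/s, as in equation 4.14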


5. Evaluation

Both the evaluation and the discussion come in two parts: the first part evaluates the supercomputer-assisted large-scale video encoding, the second the live video encoding.

5.1. Supercomputer Assisted Video Encoding

5.1.1. Filesize and NPL cost

The legacy data of the seasons 2014 and 2015 accounted for 132 TB and 157 TB, respectively, as mentioned in section 1.3.1. Encoding both seasons in HEVC format yielded 44.3 TB and 36.3 TB in the advanced system. Despite the smaller input, 2014 was encoded into larger files: the recording quality of 2014 was worse than that of 2015, and due to time constraints an exact analysis was only done for 2015. As the parameters were unsuitable for the 2014 data, the quality settings were set higher by estimation, including a certain buffer to be safe. In hindsight, the 2014 video data probably has better quality than actually required.

The generated previews total 389 GB for season 2014 and 483 GB for season 2015. According to the HLRN account information, the overall video encoding process used 21063 NPL, including test runs; creating the LD videos consumed roughly 2000 NPL. On MPP2, 12 cores (0.5 nodes) cost 1.2 NPL per hour, so 21063 NPL correspond to 210630 core hours, or ca. 24.04 core years, which have been used for the HD videos. In practice, the image analysis, i.e. the detection of all tags in all images of an entire season, accounts for 120346 NPL, which is 1203460 core hours on MPP2 or 137.38 core years. In conclusion, the NPL costs of video encoding are not negligible, but relatively small.

5.1.2. Verification

The verification methods discovered various issues and anomalies in the Beesbook legacy data and the newly encoded data.

• Error logging to textfiles revealed two erroneous videos in total, both from season 2014. In both cases ffmpeg reported a broken frame which was skipped. These issues have been fixed manually by repeating the previous frame, replacing the broken one; then the video was encoded again.


• The timelines have shown various gaps. The majority of them resulted from known issues during the start of the recordings, when reconfiguration and connectivity issues caused gaps frequently. For example, in camera 0 of season 2015 there were 41 gaps in the first two days and 15 more in the rest of the season. Three gaps were due to reported maintenance, 5 gaps due to broken or incomplete TAR files, and 4 gaps were small missing sections of less than 30 seconds. Three more larger gaps have been found, one of which was caused by a Cray failure. Detailed statistics can be found on the CD.

• By counting frames, some unusually short videos have been discovered. For instance, in season 2015 seven such videos have been found. Six of them were too short due to the aforementioned gaps; only one was too short for unknown reasons. This file has been re-encoded manually.

• Checking that every video's file size is greater than zero did not reveal any errors.

• The loss in quality cannot be quantified any better than has been done in section 3.5.1. The reason is that previously analysed images cannot be analysed again in their video format due to the NPL cost; analysing smaller samples has been done in section 3.5.1. Any upcoming image analysis will be done using a fully restructured pipeline and is hence not comparable.

5.2. Live encoding

5.2.1. Lighting

The camera trigger synchronization with the LED flash works seamlessly. The videos look perfectly illuminated and no negative effects of the flashing can be seen; the histogram in 5.2 shows that all image information seems to be captured in the recent recordings. Quantifying the exact difference in illumination is subtle, though. The vendor of the legacy LED lamps does not specify what kind of LEDs are embedded, so the lamps have been taken apart, the LEDs tested thoroughly and compared to available specified ones. The LED TSHG5210 by Vishay has been deemed equivalent or better. In figure 5.1 the radiant flux of the SFH 4716S used in the LED boards is depicted. The LEDs of the advanced system run at 4 A for 20 ms, which should yield 3.5 × Φe (rounded values). The total radiant flux is specified as Φe = 1030 mW, whereas the Vishay LED emits Φe = 55 mW. Scaling the HPFL (high-power flashing LED) up to 4 A, we get Φe = 1030 mW × 3.5 = 3605 mW. This makes a single HPFL emit 65.5 times more light than the regular ones by Vishay. As there are 24 LEDs per lamp, one HPFL may replace 2.73 lamps. There are 4 LED boards per side of the beehive, each having two HPFLs; in theory these can replace 2.73 × 8 = 21.8 lamps. In the original setup, 22 lamps have been used per side. Consequently the LED boards should be able to replace the lamps sufficiently. However, the lighting system using the LED boards does not seem to yield as much light:


Figure 5.1.: Radiant flux of the SFH 4716S by Osram. (32)

closing the aperture to f-stop 16, the resulting images are too dark, some portions being entirely black. So further investigations have been done.

In the legacy 2015 setup, the aperture of the cameras was closed to f-stop 16, which is the minimum for this lens. Due to the wave nature of light, this causes a certain degree of diffraction. The shutter time was 8 ms. Douvos provided a formula to calculate the optimal aperture, which is given in equation 5.1.(11)

N_optimum = sqrt( 375 · ( u_n·f / (u_n − f) − u_f·f / (u_f − f) ) ) (5.1)

Here u_n is the distance of the nearest object, u_f that of the furthest object, and f the focal length of the lens. In this case the nearest object is 48 cm away, the furthest 51 cm, and the focal length is 12 mm. Inserting the values into equation 5.1, we get approximately 0.083, or 1/12, as the optimal f-stop, disregarding lens aberration. In conclusion, the scenery was previously overly illuminated, while it is now illuminated below par.

Nonetheless, the advanced setup for 2016 was done the following way: the shutter time was set to 5 ms, the gain to 17 and the f-stop to 8. This results in darker-looking images, but the entire brightness spectrum is still being captured. In figure 5.2 the brightness histograms of the 2015 recordings and the recent ones are compared.


Figure 5.2.: Comparison of illumination. A.1: recording example of 2016. A.2: corresponding colour histogram to A.1, where left is darkest and right is brightest. B.1: recording example of 2015. B.2: corresponding colour histogram to B.1. In B.2 different classes can be seen: the Gaussian peak represents the combs, whereas the level to the left represents the bees' bodies; the slight peak to the very right represents overexposed image portions. A.2 shows that the comb and bee classes have apparently merged. The overly illuminated sections are missing and the Gaussian has moved slightly to the left.


5.2.2. Image Statistics

The image statistics have been evaluated by plotting a sequence of results as a graph. A simple scenario has shown that the algorithms respond as expected and desired; the results are plotted in figure 5.3. DCT was excluded since it behaves very similarly to SMD but is more computationally expensive. The contrast ratio was excluded due to its very high standard deviation. It can be clearly seen that all metrics responded strongly when the aperture was opened and the image became nearly white. After adjusting gain and focus, the metrics stabilized at the expected values: the image became more noisy and less sharp. The contrast did not exactly return to its previous value, in this case indicating that the images have better contrast. However, the variance increased although the image is in fact less sharp, which also contradicts the results of the SMD.

5.2.3. Stability

Prior to executing the bumblebee experiment, recording was run for three weeks without an experiment to test the advanced hardware and software. At first, the image statistics and image acquisition were run as different threads in a single process. This had the advantage that shared pointers could be used to free image memory by reference counting. In the first two weeks the acquisition crashed regularly, every few days. Eventually a freeze occurred which could not be recovered by the watchdog script, as the application never terminated.


Figure 5.3.: Evaluation of image statistics. All cameras are recording the hive. Cameras 0 to 2 are reference cameras: their internal gain parameter was set to 17, the lens focused manually and the aperture closed to f-stop 8, which seems to be a near-optimal setup. Camera 3 was selected for testing: the gain was set to 0, the lens was refocused and the aperture opened to f-stop 2. Note that values may not be comparable between cameras, since different portions of the hive have been recorded. Plots created by Benjamin Wild.


Due to these issues, further improvements have been made. To debug any upcoming errors, an event handler was integrated; it catches every segmentation fault and prints the stack trace to the console. To keep track of all relevant information, the acquisition process is now run in a GNU screen session whose output is dumped to log files. The image statistics functions are likely to be changed in the future and are not crucial to the acquisition process; for this reason the application has been modified to run the statistics in a separate process with a separate watchdog, acquiring the images via the shared memory interface. Furthermore, a kind of watchdog thread was created within the acquisition process. This thread watches all buffers in existence. If a buffer has seen a change once, it is considered active; if an active buffer is not affected by any change within 60 seconds, the entire process is killed and restarted. Which buffer has been inactive gets logged to the console. A sketch of this logic is given below.

Additionally, a script was created to post to the Biorobotics Lab Slack channel whenever the software crashes, along with a short description of the error, e.g. a segmentation fault, an SDK-related error or camera connectivity issues. The error level indicates whether it is self-recoverable or not.
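A sketch of the watchdog thread's logic, rendered in Python for brevity (the actual implementation is a thread inside the C++ acquisition process):

import os
import signal
import time

def buffer_watchdog(buffers, timeout=60, poll=5):
    last = {}       # buffer name -> (last seen size, time of last change)
    active = set()  # buffers that have changed at least once
    while True:
        now = time.time()
        for name, buf in buffers.items():
            size = len(buf)
            if name not in last or size != last[name][0]:
                last[name] = (size, now)
                active.add(name)  # a buffer that changed once is considered active
            elif name in active and now - last[name][1] > timeout:
                print(f"buffer {name} inactive for {timeout} s, restarting")
                os.kill(os.getpid(), signal.SIGKILL)  # the outer watchdog restarts us
        time.sleep(poll)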

After these changes, crashes occurred rarely. The remaining crashes originated from an SDK function. The SDK errors are catchable and can be resolved by restarting the application. Recovering from these errors may be possible but was not implemented, as the error is not easily reproducible. Statistically, this kind of error occurred twice a week, for which reason the loss was thought to be negligible. When the error gets caught, the restart is done quickly; possible gaps in the recorded data amount to as much as one minute. The reason may be the watchdog thread's timeout of one minute for a thread to be considered dead.


6. Discussion

6.1. Supercomputer Assisted Video Encoding

In the course of this thesis, all of the existing legacy data has been successfully encoded as videos, and LD videos have also been created. As the evaluation has shown, the cost of encoding, at approximately 23000 NPL, was reasonably low. The quality of the videos was evaluated thoroughly and found to be suitable for the previously mentioned tracking algorithms. The file size went down to 0.336 and 0.282 times the original size, which is around 40 TB of data. Moreover, it was verified that the encoding process did not result in any quantitative data loss.

Work package 1, reducing the overall file size by encoding the legacy JPEG image data as HEVC videos, was completed successfully. This was done without losing data in quantity or quality. The amounts of data are still reasonably large and might not yet be stored locally, because the requirements on video quality and resolution are high; also, an extra buffer in terms of quality was used, as the data is thought to be crucial for the project. This might improve in the future due to changes mentioned in section 7.1, but is not within the realm of possibility for now. The thesis has elaborated different compression schemes and found the presented compression to be the best semi-off-the-shelf solution. Improving compression even more might be possible using, for example, custom motion estimation algorithms in HEVC, or discarding data in quality or quantity. The former seems possible, but is beyond the scope of this thesis; the latter idea has been discarded as explained previously.

6.2. Live Video Encoding

The aim of live encoding was to improve recording quality while compressing the data in real time, which is work package 2.

Work package 2.1 was advancing the recording software to generate HEVC videos on the GPU. The live encoding is done according to the quality properties determined in work package 1. Application stability has been proven, and data transfer is even more reliable than before: since the videos are much smaller, a cache of the same size is sufficient for a much longer period of time. Furthermore, the acquisition application has been improved to offer a simple interface for interacting with the recording data live. This makes later additions to the acquisition easier and more reliable, without requiring elaborate tests. Live statistics have been implemented and found to be meaningful: semi-real-time statistics are able to reveal changes in the setup and can indicate improvements or deteriorations in image quality.

The advanced lighting system developed for work package 2.2 is not only cheaper, but also more space efficient, and it yields better quality in the recordings since there is no back light. However, the number of lamps accounted for is not sufficient, and building additional lights will be appropriate.

7. Concluding Remarks

7.1. Supercomputer Assisted Video Encoding

The task of encoding the existing data was meant to be done only once. However, there are further uses for the modified observer script and the verification scripts.

Quality Adaptation for a New Pipeline

In section 3.5 the quality evaluations were carried out; the desired quality was based on how well the pipeline performs. The Biorobotics Lab is currently working on a new pipeline which is entirely different in nature: it uses deep convolutional neural networks (CNNs) to localize and decode tags in two steps. It is a common pattern to downsample the input when using CNNs, since this reduces complexity. For localization, first tests have shown that downsampling the image works very well. It is not yet known how well the decoder will perform on such data: the CNN approach might perform well on lower-resolution input, though, while unlikely, the contrary might be the case as well. Evaluating this is definitely a task for when the upcoming pipeline comes into use. The methods introduced in section 3.5 will not work for evaluating the quality as they stand, but the general approach seems adaptable.
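As an illustration of the downsampling step, the localizer input could be prepared as sketched below, assuming OpenCV is used for the image handling; the factor of two per axis matches the LD resolution of 2000x1500 mentioned in the glossary. The function name is hypothetical.

    #include <opencv2/opencv.hpp>

    // Downsample a full-resolution frame by a factor of two per axis
    // (e.g. 4000x3000 -> 2000x1500) before feeding it to the CNN localizer.
    cv::Mat downsampleForLocalizer(const cv::Mat& frame) {
        cv::Mat small;
        cv::resize(frame, small, cv::Size(), 0.5, 0.5, cv::INTER_AREA);
        return small;
    }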

Observer Scripts

In his thesis, Simon Wichmann pointed out that extending the observer might be a future task. In fact, one of the changes he named as future work has been implemented in this thesis: establishing sequentiality in the images provided by the observer. He stated that this might be useful for the pipeline and tracking tasks, which is still valid, and a foundation for such tasks has now been laid. The only job left is to enable the observer to extract videos instead of TAR archives: a video together with a frame list file can be treated as the equivalent of a TAR file (see the sketch below). Alternatively, the video and file list tuple could be supplied to the pipeline tasks as a whole.
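The following sketch shows one way to iterate such a (video, frame list) tuple as if it were a TAR archive of images. It assumes a hypothetical frames.txt containing one original image name per encoded frame, in encoding order, and uses OpenCV's VideoCapture for decoding; the real observer may be implemented differently.

    #include <fstream>
    #include <iostream>
    #include <string>
    #include <opencv2/opencv.hpp>

    int main(int argc, char** argv) {
        if (argc != 3) {
            std::cerr << "usage: extract <video.mkv> <frames.txt>\n";
            return 1;
        }
        cv::VideoCapture video(argv[1]);
        std::ifstream frameList(argv[2]);
        if (!video.isOpened() || !frameList) return 1;

        std::string frameName;
        cv::Mat image;
        // Each decoded frame is paired with the original image name, which is
        // what the pipeline tasks previously read from the TAR archive.
        while (std::getline(frameList, frameName) && video.read(image)) {
            std::cout << frameName << ": " << image.cols << "x" << image.rows << "\n";
            // ... hand (frameName, image) to the pipeline instead of a TAR member
        }
        return 0;
    }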

7.2. Live Video Encoding

There are some aspects of the live encoding which may be extended or improved in thefuture.


Additions to Calibration

The image statistics are currently only used to ensure constant quality during the recordings and to assist during calibration. The calibration process could be improved so that the software not only assists, but partially takes over the calibration task. The camera lenses and aperture can only be adjusted manually, but the parameters of the camera software can be adjusted automatically. A possible extension is therefore to implement focusing and configuration adjustment algorithms similar to those of point-and-shoot cameras (see the sketch below). Using a different kind of lens, or extending the available ones, it may be possible to also adjust the aperture and focus via software.
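A minimal sketch of the core of such an algorithm, assuming OpenCV is available: the variance of the Laplacian is a common contrast-based sharpness score (cf. [28]). A configuration sweep would evaluate this score for each candidate parameter value and keep the setting that maximizes it.

    #include <opencv2/opencv.hpp>

    // Contrast-based sharpness score: variance of the Laplacian response.
    // Higher values indicate a sharper, i.e. better focused, image.
    double sharpness(const cv::Mat& gray) {
        cv::Mat laplacian;
        cv::Laplacian(gray, laplacian, CV_64F);
        cv::Scalar mean, stddev;
        cv::meanStdDev(laplacian, mean, stddev);
        return stddev[0] * stddev[0];
    }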

Live- and Near-Live Streaming

The shared memory currently is a simple interface to grab the most recent image without obstructing the recordings in any way. It could be used to stream the images as live video to other locations, not only to a local viewer: one could inspect or maintain the recordings live over the internet, or run a public live stream. For instance, this could be coupled with the pipeline to show entire tracks or partial tracking results live.
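A minimal POSIX sketch of such a consumer is given below. The segment name /beesbook_latest_frame and the layout (two 32-bit dimensions followed by 8-bit grayscale pixels) are assumptions for illustration only; the real interface may differ.

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <cstdint>
    #include <cstdio>

    int main() {
        // Attach read-only to the shared memory segment written by the recorder.
        const int fd = shm_open("/beesbook_latest_frame", O_RDONLY, 0);
        if (fd < 0) { perror("shm_open"); return 1; }

        struct stat st {};
        if (fstat(fd, &st) != 0) { perror("fstat"); return 1; }

        auto* base = static_cast<uint8_t*>(
            mmap(nullptr, static_cast<size_t>(st.st_size), PROT_READ, MAP_SHARED, fd, 0));
        if (base == MAP_FAILED) { perror("mmap"); return 1; }

        // Assumed layout: width and height as 32-bit values, then the pixels.
        const auto* dims = reinterpret_cast<const uint32_t*>(base);
        const uint8_t* pixels = base + 2 * sizeof(uint32_t);
        std::printf("latest frame: %ux%u, first pixel %u\n", dims[0], dims[1], pixels[0]);

        munmap(base, static_cast<size_t>(st.st_size));
        close(fd);
        return 0;
    }

From here, each grabbed frame could be fed to a streaming encoder to serve a live view over the network.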

Exposing Buffers for Live Processing

In the same fashion as the most recent image is exposed via shared memory, entire buffers could be exposed. This would make live localization and decoding more feasible. The decoding task is very time-intensive, but it is the only task which requires very high resolution data; if it can be done live, HD videos might not be necessary any more.

Appendices

A. Calculations and Tables

A.1. HEVC And AVC Compression Ratio For Beesbook Recordings

Rerabek et al. have shown that HEVC compresses up to 50% better than AVC [36]. As the Beesbook recordings are of a special nature, this ratio was verified for the Beesbook data. Figure A.1 shows that HEVC does in fact perform significantly better than AVC, but does not reach 50%.

Threads   preset fast      preset superfast   preset ultrafast   codec
4         19.1s – 65.4MB   04.4s – 71.3MB     01.6s – 88.5MB     x264
4         36.4s – 42.5MB   31.0s – 50.8MB     16.8s – 50.1MB     x265
8         16.6s – 65.4MB   03.5s – 71.3MB     01.5s – 88.5MB     x264
8         34.9s – 42.5MB   30.1s – 50.8MB     16.4s – 50.1MB     x265
16        13.0s – 65.4MB   03.6s – 71.3MB     01.6s – 88.5MB     x264
16        34.9s – 42.5MB   30.3s – 50.8MB     16.5s – 50.1MB     x265

Figure A.1.: Comparison of encoding performance of x265 and x264 for Beesbook recordings. The sample images were a sequence of 30 images recorded from the beehive in raw format. The CRF parameter was set to 15. The first column indicates the value of the thread parameter being used, the last one the codec. The fields show the respective performance according to the preset parameter in the header row. Although no real quality measure was taken apart from the CRF value, it can be seen that x265 performs better in terms of file size. All encoding times are from a single test PC and are hence only comparable among each other. Note that the fastest possible encoding speed is 1.5s; using a better compressing preset, the fastest is 3.5s. The file size savings range from 0.57 to 0.71 times the AVC encoded version.

A.2. Lighting System Cost Calculation

In figures A.2 and A.3, the prices of the legacy and the advanced lighting system are calculated. The advanced system costs 347.12 €, or 0.268 times the price of the previous one.

A.3. Disk Space Cost Calculation

The Biorobotics Lab strives to store the recorded videos locally in the future. For this reason, a cost estimation for hard disk space has been made.


Product                                Amount   Vendor         Cost
PCB board, 23x51, 8 days delivery      8        pcb-pool.com   62.77
MOSFET AO4752                          16       digikey.com    4.53
LED OSRAM SFH-4716S                    10       digikey.com    45.68
Signal Amplifier LM358DT               10       digikey.com    2.83
2.2Ohm Resistor CRM2512-JW-2R2ELF      64       digikey.com    10.30
10kOhm Resistor RC0603JR-0710KL        25       digikey.com    0.20
120Ohm Resistor RC0603JR-07120RL       10       digikey.com    0.11
2.2mF Capacitor JMK107BJ225KA-T        50       digikey.com    4.15
Hirose plug HR25-7TP-8P                4        digikey.com    89.84
Subtotal                                                       220.41
Power supply, 500W, LC500H-12          2        conrad.de      79.98
Wire, black 10m, SH1927                1        conrad.de      5.95
Wire, red 10m, SH1926                  1        conrad.de      5.95
Arduino Duemilanove                    1        amazon.de      17.79
Losi Connector Set: Micro-T/B/DT       1        amazon.de      5.61
Cat 7 Wire, 2m                         2        amazon.de      9.90
Male to Male Jumper Wires, Mixed Set   1        amazon.de      1.53
Total                                                          347.12

Figure A.2.: Price calculation of the newly designed flash boards of the advanced system, including equipment. All prices in Euro. The subtotal covers the equipment which had to be bought by the Biorobotics Lab; the second part was on stock and did not have to be bought. All prices retrieved 21.06.2016.

Product                           Amount   Vendor      Cost
IR Light ABUS TV6700              24       conrad.de   971.76
VOLTCRAFT USPS-600 Power Supply   24       conrad.de   323.76
Total                                                  1295.52

Figure A.3.: Price calculation of the legacy lighting system. All prices in Euro. Instead of using 24 small power supplies, a single larger one may be used.

Currently, the archive HDD 'ST8000AS0002' by Seagate comes at a good price-performance ratio of 248 €¹ per 8 TB and is of feasible quality for this task. As high reliability is desired, a RAID 5 array of three disks is considered; since one third of the raw capacity is used for parity, the cost per usable terabyte is multiplied by 1.5. Slightly rounded amounts of required disk space are used: 160 TB for image data and 40 TB for video data. Costs of management devices are not taken into account.

HDD cost (images) = 160 TB / 8 TB × 248 € × 1.5 = 7440 €    (A.1)

HDD cost (videos) = 40 TB / 8 TB × 248 € × 1.5 = 1860 €    (A.2)

Comparing equations A.1 and A.2 shows that storing the data as HEVC video files instead of JPEG images saves about 5580 € on hard drives per season.

¹Price per Amazon, retrieved 21.06.2016: https://www.amazon.de/Seagate-Archive-interne-Festplatte-Cache/dp/B00QGFEQXU/ref=sr_1_sc_1?ie=UTF8&qid=1466504694&sr=8-1-spell&keywords=seagote+8TB+storage

B. Glossary

AVC Advanced Video Coding. A coding standard. Also H.264/MPEG-4 AVC.

CLI Command Line Interface.

CPU Central Processing Unit.

CRF Constant Rate Factor.

GND Chassis Ground (electronics).

GPU Graphics Processing Unit.

HD High Definition. Here: Videos of size 4000x3000 and high quality.

HEVC High Efficiency Video Coding. Also H.265 or MPEG-H Part 2.

HLRN Norddeutscher Verbund zur Förderung des Hoch- und Höchstleistungsrechnens.

IO Input/Output.

IR Infrared.

JAABA Janelia Automatic Animal Behaviour Annotator.

JPEG Joint Photographic Experts Group.

JSON JavaScript Object Notation.

LD Low Definition. Here: Videos of size 2000x1500 and low quality.

LED Light-Emitting Diode.

MKV Matroska Video. A video container format.

MOLEX Molex Incorporated. A manufacturer of electronic systems.

MPEG4 Moving Picture Experts Group 4. A video coding standard.

NAS Network Attached Storage.

NPL Norddeutsche Parallelrechner-Leistungseinheit.

OS Operating System.

PSNR Peak Signal-to-Noise Ratio.

PWM Pulse Width Modulation.

QT Qt, pronounced 'cute'. A cross-platform programming library.

RAID Redundant Array of Independent Disks.

SCP Secure Copy.

TAR Tape Archiver. Here: The format for storing archives.

Bibliography

[1] Anne Aaron. High Quality Video Encoding at Scale, 2015. URL http://techblog.netflix.com/2015/12/high-quality-video-encoding-at-scale.html. [Online, Accessed 2016-03-15].

[2] Wikipedia Authors. Comparison of video container formats, 2016. URL https://en.wikipedia.org/wiki/Comparison_of_video_container_formats. [Online, Accessed 2016-05-26].

[3] Wolfgang Baumann. HLRN Homepage, 2016. URL https://www.hlrn.de/. [Online, Accessed 2016-08-05].

[4] Fabrice Bellard. BPG Image format, 2015. URL http://bellard.org/bpg/. [Online, Accessed 2016-08-05].

[5] Frank Bossen, David Flynn, Karl Sharman, and Karsten Sühring. JCTVC Software Manual. 2016. URL https://hevc.hhi.fraunhofer.de/svn/svn_HEVCSoftware/trunk/doc/software-manual.pdf. [Online, Accessed 2016-03-22].

[6] Patrick E. Clark, Douglas E. Johnson, Mark A. Kniep, Phillip Jermann, Brad Huttash, Andrew Wood, Michael Johnson, Craig McGillivan, and Kevin Titus. An Advanced, Low-Cost, GPS-Based Animal Tracking System. Rangeland Ecology & Management, 59(3):334–340, may 2006. doi: 10.2111/05-162R.1.

[7] Security Center of Group ABUS Corporation. IR-Scheinwerfer. pages 1–60. URL https://firstmall.de/etc/pdf/TV6700_INT.pdf. [Online, Accessed 2016-08-05].

[8] X264 Developers. X264 Changelog, 2016. URL http://x264.nl/x264/changelog.txt. [Online, Accessed 2016-05-26].

[9] Ogle Player development Team. MPEG headers Quick Reference, 2004. URL http://dvd.sourceforge.net/dvdinfo/mpeghdrs.html. [Online, Accessed 2016-03-16].

[10] DivXLabs. DivX HEVC Community Encoder, 2014. URL http://labs.divx.com/divx265. [Online, Accessed 2016-08-05].

[11] George Douvos. OptimumCS-Pro Optical Science. URL www.georgedouvos.com/douvos/OptimumCS-Pro_Optical_Science.html. [Online, Accessed 2016-07-12].

[12] Mohd Awais Farooque and Jayant S Rohankar. Survey on Various Noises and Techniques for Denoising the Color Image. International Journal of Application or Innovation in Engineering and Management, 2(11):217–221, 2013.

[13] Mark Fiala. ARTag Rev2 Fiducial Marker System: Vision-based Tracking for AR. Library, 2005. URL http://campar.in.tum.de/twiki/pub/ISMAR/IarAbstractARTag/IarDetailsFialaSlides.pdf. [Online, Accessed 2016-08-05].

[14] Andrew Ford. x265 Command line options, 2014. URL http://x265.readthedocs.io/en/default/cli.html#input-output-file-options. [Online, Accessed 2016-05-26].

[15] John L. Furlani and Peter W. Osel. Modules – Software Environment Management, 2012. URL http://modules.sourceforge.net/. [Online, Accessed 2016-08-05].

[16] Google. Timelines API, 2016. URL https://developers.google.com/chart/interactive/docs/gallery/timeline. [Online, Accessed 2016-05-26].

[17] Sebastian Gross, Martin Schink, Thomas Stehle, Alexander Behrens, Jens Tischendorf, Christian Trautwein, and Til Aach. Echtzeitfähige Extraktion scharfer Standbilder in der Video-Koloskopie. In Bildverarbeitung für die Medizin (BVM), page accepted, March 14–16 2010.

[18] Bernd Heinrich. The Mechanisms and Energetics of Honeybee Swarm Temperature Regulation, volume 91. 1981. doi: 10.1126/science.212.4494.565.

[19] Heinz Hofbauer, Christian Rathgeb, Johannes Wagner, Andreas Uhl, and Christoph Busch. Investigation of better portable graphics compression for iris biometric recognition. In Lecture Notes in Informatics (LNI), Proceedings - Series of the Gesellschaft für Informatik (GI), volume P-245, 2015. doi: 10.1109/BIOSIG.2015.7314609.

[20] MulticoreWare Inc. Preset Options — x265 documentation, 2014. URL http://x265.readthedocs.org/. [Online, Accessed 2016-08-05].

[21] MathWorks Incorporation. MATLAB VideoReader - MathWorks Deutschland, 2016. URL http://de.mathworks.com/help/matlab/ref/videoreader.html. [Online, Accessed 2016-05-26].

[22] Intel Corporation. Ultra HD Experiences Made Ultra Simple. (Vdi), 2016. URL http://www.sigmasd.com/partner/wp-content/uploads/2015/09/Intel-Media-Server-Studio-2016.pdf. [Online, Accessed 2016-08-05].

[23] Nanxiang Jin, Tim Landgraf, Simon Klein, and Randolf Menzel. Walking bumblebees memorize panorama and local cues in a laboratory test of navigation. Animal Behaviour, 97(November):13–23, 2014. doi: 10.1016/j.anbehav.2014.08.013.

[24] Brother John. Containerformate, 2012. URL http://encodingwissen.de/formatedschungel/container#fn6. [Online, Accessed 2016-03-16].

[25] Gnu.org Maintainers. Basic Tar Format, 1990. URL http://www.gnu.org/software/tar/manual/html_node/Standard.html. [Online, Accessed 2016-05-26].

[26] Randolf Menzel and M Blakers. Comparative Colour Receptors in the Bee Eye - Morphology and Spectral Sensitivity. Journal of Comparative Physiology A, 108(1):11–33, 1976. doi: 10.1007/BF00625437.

[27] Danielle P Mersch, Alessandro Crespi, and Laurent Keller. Tracking individuals shows spatial fidelity is a key regulator of ant social organization. Science Express, 340(6136):1090–3, 2013. doi: 10.1126/science.1234316.

[28] N. Ng Kuang Chern, Poo Aun Neow, and M.H. Ang. Practical issues in pixel-based autofocusing for machine vision. In Proceedings 2001 ICRA. IEEE International Conference on Robotics and Automation (Cat. No.01CH37164), volume 3, pages 2791–2796. IEEE, 2001. doi: 10.1109/ROBOT.2001.933045.

[29] NVidia Corporation. Nvidia Video Codec Sdk, 2013. URL https://developer.nvidia.com/nvidia-video-codec-sdk. [Online, Accessed 2016-08-05].

[30] NVidia Corporation. Nvidia video encoder 5.0. (November):1–13, 2014. URL http://developer.download.nvidia.com/compute/nvenc/v5.0_beta/NVENC_DA-06209-001_v06.pdf. [Online, Accessed 2016-08-05].

[31] NVidia Corporation. NVENC - NVIDIA Video Encoder API Reference Manual version 6.0. 2015.

[32] Osram. OSLON Black Series (850 nm) - Datasheet. 2012. URL http://www.osram-os.com/Graphics/XPic7/00204371_0.pdf/SFH%204716S.pdf. [Online, Accessed 2016-08-05].

[33] J. Ostermann, J. Bormans, P. List, D. Marpe, M. Narroschke, F. Pereira, T. Stockhammer, and T. Wedi. Video coding with H.264/AVC: tools, performance, and complexity. IEEE Circuits and Systems Magazine, 4(1):7–28, jan 2004. doi: 10.1109/MCAS.2004.1286980.

[34] PointGrey. Flea3 USB 3.0 Digital Camera Technical Reference version 7.0. 2014. URL http://www.ptgrey.com/support/downloads/10120/. [Online, Accessed 2016-02-23].

[35] ITU-T Recommendations. ITU-T Recommendation database, 2003. URL http://www.itu.int/ITU-T/recommendations/rec.aspx?rec=2820. [Online, Accessed 2016-04-29].

[36] Martin Rerabek, Philippe Hanhart, Pavel Korshunov, and Touradj Ebrahimi. Quality evaluation of HEVC and VP9 video compression in real-time applications. In 2015 Seventh International Workshop on Quality of Multimedia Experience (QoMEX), volume 2013, pages 1–6. IEEE, may 2015. doi: 10.1109/QoMEX.2015.7148088.

[37] Thomas D Seeley. The wisdom of the hive: the social physiology of honey bee colonies, 1997. ISSN 00280836.

[38] Peter Skorupski and Lars Chittka. Photoreceptor spectral sensitivity in the bumblebee, Bombus impatiens (Hymenoptera: Apidae). PLoS ONE, 5(8), 2010. doi: 10.1371/journal.pone.0012049.

[39] Jean-Luc Starck, Fionn Murtagh, Emmanuel J Candès, and David L Donoho. Gray and Color Image Contrast Enhancement by the Curvelet Transform. IEEE Transactions on Image Processing, 12(6):706–717. doi: 10.1109/TIP.2003.813140.

[40] Gary Sullivan and Stephen Estrop. Recommended 8-Bit YUV Formats for Video Rendering (Windows), 2008. URL http://msdn.microsoft.com/de-de/library/dd206750(v=vs.85).aspx. [Online, Accessed 2016-08-05].

[41] Christian Tietz. Entwurf und Implementierung einer Speicherarchitektur für Bildmassendaten zur Verhaltensanalyse von Honigbienenkolonien. 2015.

[42] VideoLAN. Multicoreware x265 wiki. URL https://bitbucket.org/multicoreware/x265/overview. [Online, Accessed 2016-08-05].

[43] Marko Viitanen, Koivula Ari, and Lemmetti Ari. Ultra Video Joiner, 2010. URL http://ultravideo.cs.tut.fi/#encoder. [Online, Accessed 2016-08-05].

[44] Fernando Wario, Benjamin Wild, Margaret J. Couvillon, Raúl Rojas, and Tim Landgraf. Automatic methods for long-term tracking and the detection and decoding of communication dances in honeybees. Frontiers in Ecology and Evolution, 3(September):1–14, 2015. doi: 10.3389/fevo.2015.00103.

[45] Simon Wichmann. Storage and Parallel Processing of Image Data Gathered for the Analysis of Social Structures in a Bee Colony, 2014.

[46] Thomas Wiegand, Heiko Schwarz, Anthony Joch, Faouzi Kossentini, and Gary J. Sullivan. Rate-constrained coder control and comparison of video coding standards. IEEE Transactions on Circuits and Systems for Video Technology, 13(7):688–703, 2003. doi: 10.1109/TCSVT.2003.815168.

[47] Jing Xiao, Liang Liao, Jinhui Hu, Yu Chen, and Ruimin Hu. Exploiting global redundancy in big surveillance video data for efficient coding. Cluster Computing, pages 531–540, 2015. doi: 10.1007/s10586-015-0434-z.

[48] Mareike Ziese. Scene Labeling eines Bienenstocks. 2016.
