Animal Population Estimation Using Flickr Images · 2018-05-04 · Animal Population Estimation Using Flickr Images Conference’17, July 2017, Washington, DC, USA were obtained,

Animal Population Estimation Using Flickr ImagesSreejith Menon

Bloomberg LPNew York New York 10022smenon59bloombergnet

Tanya Berger-WolfUniversity of Illinois at Chicago

Chicago Illinois 60607tanyabwuicedu

Emre KicimanMicroso Research

Redmond Washington 98052emrekmicrosocom

Lucas JoppaMicroso Research

Redmond Washington 98052lujoppamicrosocom

Charles V StewartRensselaer Polytechnic Institute

Troy New York 12180stewartcsrpiedu

Jason ParhamRensselaer Polytechnic Institute

Troy New York 12180parhajrpiedu

Jonathan CrallRensselaer Polytechnic Institute

Troy New York 12180cralljrpiedu

Jason HolmbergWildMeorg

Portland Oregon 97217jasonwildmeorg

Jonathan Van OastWildMeorg

Portland Oregon 97217jonwildmeorg

AbstractWhile the technologies of the Information Age have produced stag-gering amounts of data about people they are by and large failingthe worldrsquos wildlife Even the simplest and most critical piece ofinformation the number of animals of a species is either unknownor is uncertain for most species Here we propose to use imagesof wildlife posted on social media platforms together with animalrecognition soware and mark-recapture models to estimate pop-ulation sizes We show that population size estimates from socialmedia photographs of animals can produce robust results yet morework is needed to understand biases inherent in the approach

ACM Reference formatSreejith Menon Tanya Berger-Wolf Emre Kiciman Lucas Joppa Charles VStewart Jason Parham Jonathan Crall Jason Holmberg and Jonathan VanOast 2016 Animal Population Estimation Using Flickr Images In Proceed-ings of ACM Conference Washington DC USA July 2017 (Conferencersquo17)4 pagesDOI 101145nnnnnnnnnnnnnn

1 IntroductionOf the 87 Million terrestrial species which are estimated to existon earth humans have discovered and described a mere 14 [17]e International Union for Conservation of Nature (IUCN) RedList of reatened Speciestrade[23] is an internationally recognizedocial body for tracking conservation status of species across theworld [25] IUCN currently maintains and tracks conservationstatus of over 79000 species Of those over 23000 species arethreatened with extinction [13 25] e Living Planet report themost comprehensive eort to track the population dynamics ofspecies around the world includes 10300 populations of just 3000species [10] Such scarce data make it exceptionally dicult toassess some of the most pressing problems in conservation - how

Permission to make digital or hard copies of all or part of this work for personal orclassroom use is granted without fee provided that copies are not made or distributedfor prot or commercial advantage and that copies bear this notice and the full citationon the rst page Copyrights for components of this work owned by others than ACMmust be honored Abstracting with credit is permied To copy otherwise or republishto post on servers or to redistribute to lists requires prior specic permission andora fee Request permissions from permissionsacmorgConferencersquo17 Washington DC USAcopy 2016 ACM 978-x-xxxx-xxxx-xYYMM $1500DOI 101145nnnnnnnnnnnnnn

healthy are wildlife populations and are our conservation manage-ment actions having a positive eect Answering those questionsis impossible without leveraging computational tools - the opera-tional and nancial burdens of large scale censuses sharply limittheir use Today one of the most abundant sources of informationabout wildlife are images taken by scientists trail cameras eldassistants tourists and opportunistic photographers e laertwo image sources are available only if the photographs are shareddirectly with the scientists or publicly with the world A promisingsolution to tracking wildlife populations at large spatio-temporalscales is to turn to an opportunistic form of citizen science miningpublicly available social media photos of animals

e current era is marked by extensive use and inuence ofsocial media platforms in our day-to-day lives e proportionof total world population active on social media has seen a sharprise in the past ve years and is likely to rise up to 267 billion bythe year 2018 [8] One of the primary activities on social mediais sharing images over 3 billion images are uploaded and sharedon various social networks every day [14] While most of thoseimages do not pertain to wildlife the amount that do still dwarfs theinformation available from standard scientic surveys Howevercombining these new sources of data with traditional populationand range models is not straight-forward In this paper we discusshow accurately can we estimate population of Grevyrsquos zebra (Equusgrevyi) from images that are scraped from Flickr1 and demonstrate aproof of concept approach for using social media images for wildlifepopulation estimates

2 Methods21 Estimating population of wildlife speciesA standard method of estimating population of a wildlife speciesis using the Mark-Recapture approach [6 22 24] Typically itproceeds in two stages First some number of animals are capturedand these animals are marked with some kind of unique identiablemarking At the second stage considered to be independent of therst another set of animals from the same population is capturedand of those some already have been marked in the rst stagee simplest mark-recapture approach assumes that the ratio ofanimals with initial markings to the number of animals capturedin the second sample has to be proportional to number of animals1hpswwwflickrcom

Conferencersquo17 July 2017 Washington DC USASreejith Menon Tanya Berger-Wolf Emre Kiciman Lucas Joppa Charles V Stewart Jason Parham Jonathan Crall Jason Holmberg

and Jonathan Van Oast

captured in the rst sample to the actual population of the speciesus let n be the number of individuals captured during the rststage (capture) K be the number of individuals captured during thesecond stage and k be the number of individuals with the uniquemarking from rst stage that were captured in the second stage (iethe animals that were re-captured) en the estimated populationsize Nest is computed using the simple ratio formula known asthe Lincoln Petersen estimator [20]

Nest =Kn

ere are several major assumptions that this method makesFirst that every animal has an equal probability of being cap-tured [15] that the sampling strategy is the same in the rst andsecond stages and they are independent of each other and nallythat there are no new animals joining or animals leaving the popu-lation during the sampling period

Animal population estimation using photographic data can bedone using a very similar approach e marks and recaptures canbe transformed into sights and re-sights across dierent pictures [312] For the extent of our studies we extend the same sight-resightmodel but in a social media seing All the pictures that are knownto be shared are divided into various time epochs and a populationsize can be estimated across any (preferably consecutive) pair ofepochs is method can be used not only for population sizeestimation for a given period but can be used to observe trends anddynamics of population changes over a longer period of time

Note however that using the simple Lincoln Petersen estima-tor may not be appropriate as its assumptions may be violated bythe social media image data It is not guaranteed that the popu-lation is closed and more importantly that every animal has thesame probability of being photographed (captured) However moresophisticated models of mark-recapture such as Jolly-Seber [15]require knowledge of more parameters of the sampling processand the biases herein than we currently possess of the social mediaimage data us in absence of this information we use the simplestmethod and leave the estimation of the biases of the social mediafor future research (see Section 4)

22 Identifying individual animals using computer visionA population estimation method for wildlife species requires anability to detect and identify individual animals in our case inphotographic data HotSpoer [7] is a fast and accurate algorithmfor identifying individual animals of species with unique markingson their bodies from their pictures HotSpoer algorithm canidentify species such as Grevyrsquos zebra Plains zebra Reticulatedgirae Masai girae Humpback Whales (using ukes) Bolenosedolphins (using dorsal ns) Iberian lynx Giant sea bass Geometrictortoises Hawksbill sea turtles Giant mantas Ragged tooth nursesharks and many others

Wildbooktrade [29] is an open source (GPL v2) soware system thatstarts with images of animals and connects the image analysis witha data-management layer to enable queries about animals Wild-booktrade provides an interface for citizen scientists to report theirsightings of animals and upload animal images ese images arepassed through a detection algorithm where species of the animala bounding box of where the animal is located in the picture andother aspects are determined Every detection is assigned a uniqueannotation ID Every annotation in an image is then passed through

the HotSpoer algorithm where it is compared to already labelledannotations in the reference database to determine whether theanimal was previously sighted or not Previously unseen animalsare assigned a new unique ID while previously sighted animalsare reassigned their existing IDs Access to the photographic de-tection and identication modules is exposed as REST APIs withinWildbooktrade [21]

3 Experiment31 Grevyrsquos ZebraFor the purpose of our study we chose the species of the Grevyrsquoszebra e population of Grevyrsquos zebra in the wild is limited toonly some parts of Kenya and Ethiopia [26] is gives us a uniqueopportunity to estimate global population of a species and also thechance to validate the accuracy of the results since the ground truthpopulation of Grevyrsquos zebra is known and is a closed populationIn addition the stripe paern on Grevyrsquos zebra provides the uniquemarking which allows HotSpoer to uniquely identify individuals

e ground truth population size as well as the images for thereference database of the existing population of Grevyrsquos zebracome from the massive citizen science event e Great GrevyrsquosRally [1 21] which took place in central and northern Kenya inJanuary 2016 e Great Grevyrsquos Rally spanned over two daysabout 160 people participated in the rally and over 40000 imageswere captured in the event Using HotSpoer 1942 individuals wereuniquely identied and named e population of Grevyrsquos zebrawas then calculated using standard Lincoln-Petersen estimatordividing the animal sights and resights across day 1 and day 2 ofthe rally ese estimates are also used as the ocial populationsize estimates by the IUCN Redlist [26]

We use the population estimation stated on the IUCNrsquos Redlistas the ground truth population against which our estimates arecompared e ground truth population as of year 2016 is 2250 plusmn93 [26] e number is from January 2016 and previous populationsize estimates indicate higher numbers in years prior to 2014 (albeitwith a much lower condence level)

32 Data

Figure 1 Snapshot of images results returned by searching usingthe search term ldquozebrardquo on wwwflickrcom

Using Flickrrsquos image search APIrsquos [9] we scraped publicly sharedimages that were tagged as ldquozebrardquo ese images were then up-loaded to the HotSpoer instance and then detections and identi-cations were performed on these images Out of all the images that

Animal Population Estimation Using Flickr Images Conferencersquo17 July 2017 Washington DC USA

were obtained we used the images that had at least one Grevyrsquoszebra in it and then used those images to estimate the populationsize of the species us 1701 images taken between 2004mdash2015were scraped from Flickr Both Plains zebra and Grevyrsquos zebra im-ages were found in this sample A total of 2047 annotations weredetected and 1080 unique individuals were identied and namedFor the images scraped from Flickr we also extract spatio-temporalinformation about every image ie when and where each imagewas taken Only images of wild Grevyrsquos zebras (not zoo) takenin Kenya were used for analysis e images were then dividedinto sets on the basis of the year in which the image was takenTable 1 summarizes the images that were obtained from scrapingthe publicly shared albums on Flickr

Search query zebraNumber of images 1701Number of annotations 2047Species detected Grevyrsquos zebra Plains zebraIndividuals identied 1080Date range 2004-2015

Table 1 Search query total number of images annotations speciesdetected and individuals identied from the images scraped fromFlickr

e population was then estimated using Lincoln-Petersen esti-mator by assigning the images taken in a particular year as marksand the images taken on the subsequent year as recaptures isprocess is then repeated for all the years starting from 2004 to 2015

33 Analysis detailsPopulation estimate of Grevyrsquos zebra does not take into accountanimals in zoos is is primarily because of the higher likelihoodof zoo animals to be photographed in comparison to the animalsin the wild A higher likelihood of capture means a higher re-capture rate A higher recapture rate in the mark-recapture modelindicates lower population size us inclusion of zoo animals inthe corpus of images used for population estimation will lead toan underestimate of the population size erefore we removed allimages of zoo animals from the corpus

Let P be the ground truth population P be the estimated popu-lation and ϵ be the percent estimation error or simply error Weassume the value of P to be equal to the upper limit of knownground truth population to simplify our calculations We then com-pute the error in population estimate ϵ using the below formula

ϵ =P minus P

Ptimes 100

34 ResultsWe computed population size estimates from Flickr images fromyears 2004mdash2015 using sight-resight estimates on pairs of consec-utive years Table 2 summarizes the results and the error againstthe Januray 2016 population size estimate In the years 2014 and2015 the estimate is very accurate with 38 and 09 percent errorrespectively e population of the Grevyrsquos zebra is considered tohave been roughly stable over the last 10 years [26]

Sight year Resight Year Population Estimate Error2004 2005 No resights NA2005 2006 No resights NA2006 2007 2035 -952007 2008 1842 -18132008 2009 1521 -3242009 2010 2782 23642010 2011 6580 19242011 2012 3624 6112012 2013 4447 9762013 2014 2336 382014 2015 2272 09

Table 2 Population size estimates calculated annually using simplesight-resight model for Grevyrsquos zebra using Flickr images

4 Discussion and Future Worke results of our experiment and analysis as summarized in Ta-ble 2 provide evidence that population estimation from social mediaimages is a viable and reasonable approach While there are errorsin some years we have a good estimate from the pictures that weretaken in the years 2014 and 2015 and the estimates for prior yearsmay too be accurate when compared to the actual populationnumbers in those years e census event for Grevyrsquos zebra tookplace in the January of year 2016 thus giving us a good benchmarkagainst which we can compare our estimates for the years 2014 and2015

e next steps in our research involves repeating the experimentsand estimating population of more species In principle for everyspecies that can be detected and identied using Wildbooktrade imageanalysis tools we can estimate the population of that species usingimages from social media While Flickr has been an importantsource for all publicly shared images we are currently extractingimages from Bing and exploring options to scrape images fromother social media platforms such as Twier Facebook and others

While estimating population of a species from social media di-rectly we completely neglected bias in any form that inuences thetype of image samples we scraped from the Flickr database ereare multiple biases that inuences the nal outcome of estimatingpopulation of a certain species from images that are obtained fromsocial media Some of the most prominent biases which inuencethe data we obtain from social media are outlined in Figure 2 ereare several layers of biases accumulating in the resulting bias ofestimating animal population properties from images First thereare biases in the types of animals that people typically photographin sucient numbers in the rst place ese may be charismaticor endangered species or simply the ones easily observed Secondthere are biases in what images people take versus which onesthey decide to share publicly on social media ese range fromthe Hawthorne Eect [2 19 27 28] of changing behavior whenknowing to be observed to biases introduced by the demographicsof the person sharing [4 19] and the choice of the social mediaplatform [5 11 18] ere are biases of our notions of beauty andaesthetics and cultural dierences Any mark-recapture model usedto estimate the population size makes many assumptions and in-troduces its own biases e fundamental question however is Doany of these actually aect the estimates of the population size and

Figure 2 High level schematic representation of the problem of population estimation of wildlife species using images from social mediaits challenges and biases in play

other parameters and if so how Menon has begun to answeringthis question [16] but a lot of work remains to be done

Once we nd out a way to quantify some of these biases andinvestigate the eects of biases on a particular species of animalswe can build an accurate population estimator and adjust the resultsof the estimator for the known biases Moreover social media datacan serve as an augmenting source to more traditional data sourcesor provide the rst baseline where none exists

References[1] T Berger-Wolf J Crall J Holmberg J Parham CV Stewart B Low Mackey P

Kahumbu and D Rubenstein 2016 e Great Grevyrsquos Rally e Need MethodsFindings Implications and Next Steps Technical Report Technical Report GrevyrsquosZebra Trust Nairobi Kenya

[2] M S Bernstein A Monroy-Hernandez D Harry P Andre K Panovich andG G Vargas 2011 4chan andb An Analysis of Anonymity and Ephemeralityin a Large Online Community In ICWSM 50ndash57

[3] D T Bolger T A Morrison B Vance D Lee and H Farid 2012 A computer-assisted system for photographic markndashrecapture analysis Methods in Ecologyand Evolution 3 5 (2012) 813ndash822 DOIhpdxdoiorg101111j2041-210X201200212x

[4] J Brenner and M Duggan 2013 e demographics of social media usersConsultado en (2013)

[5] A Bruns and S Stieglitz 2014 Twier data what do they represent it-Information Technology 56 5 (2014) 240ndash245

[6] M Chap and M Chap 1975 Fallow Deer their history distribution and biology(1975)

[7] J Crall C Stewart T Berger-Wolf D Rubenstein and S Sundaresan 2013Hotspoer - paerned species instant recognition Applications of ComputerVision (WACV) (2013) 230ndash237

[8] eMarketer Statista 2017 Number of social media users worldwide from2010 to 2020 (in billions) (2017) hpswwwstatistacomstatistics278414number-of-worldwide-social-network-users Accessed May 1 2017

[9] Flickr 2005 e App Garden ickrphotossearch (2005) hpswwwflickrcomservicesapiflickrphotossearchhtml Accessed May 14 2017

[10] World Wide Fund for Nature 2016 e Living Planet Report (2016) hpwwfpandaorgabout our earthall publicationslpr 2016 Accessed May 12017

[11] S Gonzalez-Bailon N Wang A Rivero J Borge-Holthoefer and Y Moreno2014 Assessing the bias in samples of large online networks Social Networks 38(2014) 16ndash27

[12] L Hiby W D Paterson P Redman J Watkins S D Twiss and P Pomeroy 2013Analysis of photo-id data allowing for missed matches and individuals identiedfrom opposite sides Methods in Ecology and Evolution 4 3 (2013) 252ndash259 DOIhpdxdoiorg1011112041-210x12008

[13] C Hilton-Taylor 2000 2000 IUCN red list of threatened species IUCN[14] Kleiner Perkins Caueld Byers (KPCB) 2016 Internet Triends 2016 - Code

Conference (2016) hpwwwkpcbcominternet-trends Accessed May 12017

[15] CJ Krebs 1999 Ecological Methodology BenjaminCummings hpsbooksgooglecombooksid=1GwVAQAAIAAJ

[16] S Menon 2017 Animal Wildlife Population Estimation Using Social Media ImagesMasterrsquos thesis University of Illinois at Chicago Chicago IL

[17] C Mora D P Tiensor S Adl A G Simpson and B Worm 2011 How manyspecies are there on Earth and in the ocean PLoS Biol 9 8 (2011) e1001127

[18] F Morstaer J Pfeer and H Liu 2014 When is it biased assessing the repre-sentativeness of twierrsquos streaming API In Proceedings of the 23rd InternationalConference on World Wide Web ACM 555ndash556

[19] A Olteanu C Castillo F Diaz and E Kiciman 2016 Social Data BiasesMethodological Pitfalls and Ethical Boundaries (2016) 1ndash44

[20] S W Pacala and J Roughgarden 1985 Population experiments with the Anolislizards of St Maarten and St Eustatius Ecology 66 1 (1985) 129ndash141

[21] J Parham J Crall C Stewart T Berger-Wolf and D Rubenstein 2017 AnimalPopulation Censusing at Scale with Citizen Science and Photographic Identica-tion In e AAAI 2017 Spring Symposium on AI for Social Good (AISOC)

[22] R Pradel 1996 Utilization of capture-mark-recapture for the study of recruitmentand population growth rate Biometrics (1996) 703ndash709

[23] IUCN Global Species Programme 2017 e IUCN Red List of reatened Specieswebsite (2017) hpwwwiucnredlistorg Accessed May 14 2017

[24] D S Robson and H A Regier 1964 Sample Size in Petersen MarkndashRecaptureExperiments Transactions of the American Fisheries Society 93 3 (1964) 215ndash226 DOIhpdxdoiorg1015771548-8659(1964)93[215SSIPME]20CO2arXivhpdxdoiorg1015771548-8659(1964)93[215SSIPME]20CO2

[25] A S Rodrigues J D Pilgrim J F Lamoreux M Homann and T M Brooks2006 e value of the IUCN Red List for conservation Trends in ecology ampevolution 21 2 (2006) 71ndash76

[26] D Rubenstein B Low Mackey ZD Davidson F Kebede and SRB King 2016Equus grevyi e IUCN Red List of reatened Species eT7950A89624491 (2016)hpdxdoiorg102305IUCNUK2016-3RLTST7950A89624491en

[27] S Y Schoenebeck 2013 e Secret Life of Online Moms Anonymity andDisinhibition on YouBeMom com In ICWSM Citeseer

[28] M Shelton K Lo and B Nardi 2015 Online media forums as separate sociallives A qualitative study of disclosure within and beyond reddit iConference2015 Proceedings (2015)

[29] WildMe 2016 Wildbook (2016) hpwwwwildbookorg Accessed May 12017

Abstract

1 Introduction

2 Methods

21 Estimating population of wildlife species

22 Identifying individual animals using computer vision

3 Experiment

31 Grevys Zebra

32 Data

33 Analysis details

34 Results

4 Discussion and Future Work

References