Olivia Klose | Technical Evangelist, Microsoft
@oliviaklose
blogs.technet.com/oliviaklose
Meet Olivia | @oliviaklose
• Microsoft Technical Evangelist
  – Focus: Big Data, Hadoop, Hive, etc.
• Machine Learning
  – Computer Science with Mathematics at the University of Cambridge, TU München and IIT Bombay
  – Medical imaging
  – Nuclear medicine clinic in Munich
• IT experience in large enterprises
Agenda
Module  Content
1  Intro & Big Data Buzzwords
   - Big Data, Hadoop, MapReduce, HDInsight
2  Big Data Scenario: Twitter Analysis
3  Manage: Extracting and Storing Data
   - Windows Azure Blob Storage, Windows Azure SQL Database, VM
4  Analyze: Analyzing Data
   - HDInsight, Hive
5  Insights: Gaining Insights from Data
   - ODBC Driver, PowerPivot & PowerView
Module 1
Intro & Big Data Buzzwords
• Big Data
• Hadoop
• MapReduce
• HDInsight
What is Big Data?
Module 1 – Intro & Big Data Buzzwords
The Large Hadron Collider (the particle accelerator at CERN) produces 15 PB of data per year.
http://home.web.cern.ch/about/computing
But what if I don't own a Large Hadron Collider…
Perhaps data from…
• Large factories
• Vehicle fleets
• Smart grids
• Green energy
• Stock exchanges
• Host logs
• Data centers
• Server farms
• Google Analytics
• …
“Big data is a term describing the storage and analysis of large and/or complex data sets using a series of techniques including, but not limited to: NoSQL, MapReduce and machine learning.”
http://www.technologyreview.com/view/519851/the-big-data-conundrum-how-to-define-it/
arxiv.org/abs/1309.5821
“Big data is high-volume, high-velocity and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.”
Gartner's Definition of Big Data
Laney, Douglas. The Importance of “Big Data”: A Definition. Gartner. Retrieved 21 June 2012.
The 3 Vs
• Volume: MB → GB → TB → PB
• Velocity: batch → periodic → real time
• Variety: table → database → unstructured → web
Big Data, Gesellschaft für Informatik, 2013, http://www.gi.de/service/informatiklexikon/detailansicht/article/big-data.html
In my own words…
Big Data comprises large and unstructured volumes of data from diverse sources that are generated and analyzed within a very short time.
What is Hadoop?
History (timeline: 2002, 2004, 2006)
• Nutch
• GFS (Google File System) → NDFS (Nutch Distributed File System)
• MapReduce → Nutch MapReduce
• Hadoop
Doug Cutting | New York Times, 16 March 2009,
http://www.nytimes.com/imagepages/2009/03/16/business/17cloud.2.inline.ready.html
Hadoop Components
MapReduce
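To make the MapReduce component more concrete, here is a minimal, single-process sketch of the classic word-count job. It only simulates the three phases (map, shuffle/sort, reduce) in plain Python, assuming a handful of hypothetical tweet texts as input. It is not Hadoop's Java API: in a real cluster, mappers and reducers run in parallel on many nodes, and the framework performs the shuffle.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group all emitted values by key (Hadoop does this for you)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(lines: Iterable[str]) -> Dict[str, int]:
    # In Hadoop, each mapper would process one input split on a different node;
    # here the phases simply run one after another in a single process.
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    tweets = [  # hypothetical sample input
        "big data with hadoop",
        "hadoop on hdinsight",
        "big data big insights",
    ]
    print(map_reduce(tweets))
    # → {'big': 3, 'data': 2, 'hadoop': 2, 'hdinsight': 1, 'insights': 1, 'on': 1, 'with': 1}
```

The mapper and reducer are deliberately independent of each other. That independence lets Hadoop distribute them across the cluster and only bring data together in the shuffle step.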
What is HDInsight?
HDInsight
Legend:
• Red = Core Hadoop
• Blue = Data processing
• Green = Packages
• Dark blue = Microsoft integration points and value adds
• Orange = Data movement
HDInsight / Hadoop architecture
• Distributed Storage (HDFS)
• Distributed Processing (MapReduce)
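The distributed storage layer (HDFS) stores a large file as fixed-size blocks and keeps several copies of each block on different machines. The sketch below is a simplified model of that idea. It assumes a 128 MB block size (the Hadoop 2.x default; Hadoop 1.x used 64 MB) and the default replication factor of 3. Replicas are placed round-robin on hypothetical node names, whereas real HDFS uses a rack-aware placement policy.

```python
from typing import Dict, List

BLOCK_SIZE_MB = 128  # assumption: Hadoop 2.x default block size (Hadoop 1.x: 64 MB)
REPLICATION = 3      # HDFS default replication factor


def split_into_blocks(file_size_mb: int, block_size_mb: int = BLOCK_SIZE_MB) -> List[int]:
    """Return the sizes (in MB) of the blocks a file of the given size is split into."""
    n_full, remainder = divmod(file_size_mb, block_size_mb)
    blocks = [block_size_mb] * n_full
    if remainder:
        blocks.append(remainder)  # the last block only takes up as much space as needed
    return blocks


def place_replicas(n_blocks: int, nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Naive round-robin placement: each block is stored on `replication` distinct nodes.

    Simplification: real HDFS uses a rack-aware policy instead of round-robin.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(n_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(300)  # a hypothetical 300 MB file
    print(blocks)  # → [128, 128, 44]
    print(place_replicas(len(blocks), ["node1", "node2", "node3", "node4"]))
```

Because every block exists on several nodes, the MapReduce layer can run each map task on a machine that already holds its input block. The cluster can also tolerate the loss of a single node without losing data.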
©2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Office, Azure, System Center, Dynamics and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.