Olivia Klose | Technical Evangelist, Microsoft ...download.microsoft.com/.../1_Intro.pdf · 1 Intro...

Olivia Klose | Technical Evangelist, Microsoft

@oliviaklose

blogs.technet.com/oliviaklose

Meet Olivia | @oliviaklose

• Microsoft Technical Evangelist
– Focus: Big Data, Hadoop, Hive, etc.

• Machine Learning
– Computer science with mathematics at the University of Cambridge, TU München, and IIT Bombay
– Medical imaging
– Nuclear medicine clinic in Munich

• IT experience in large enterprises

Agenda

Module  Content

1  Intro & Big Data Buzzwords
   - Big Data, Hadoop, MapReduce, HDInsight

2  Big Data scenario: Twitter analysis

3  Manage: extract and store data
   - Windows Azure Blob Storage, Windows Azure SQL Database, VM

4  Analyze: analyze data
   - HDInsight, Hive

5  Insights: gain insights from data
   - ODBC driver, PowerPivot & PowerView

Module 1

Intro & Big Data Buzzwords

• Big Data

• Hadoop

• MapReduce

• HDInsight

What is Big Data?

Module 1 – Intro & Big Data Buzzwords

The Large Hadron Collider (particle accelerator at CERN) produces 15 PB of data per year.
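To get a feel for that volume, the yearly figure can be broken down into a daily and a per-second rate. The following is a rough back-of-the-envelope sketch, not a measurement: it assumes decimal units (1 PB = 10^15 bytes), a 365-day year, and an evenly spread data rate, none of which the slide states.

```python
# Back-of-the-envelope conversion of the slide's figure (15 PB per year).
# Assumptions (not from the slide): decimal units, 365-day year, constant rate.
PB = 10**15
TB = 10**12
MB = 10**6

bytes_per_year = 15 * PB
bytes_per_day = bytes_per_year / 365
bytes_per_second = bytes_per_day / (24 * 60 * 60)

print(f"{bytes_per_day / TB:.1f} TB/day")    # 41.1 TB/day
print(f"{bytes_per_second / MB:.1f} MB/s")   # 475.6 MB/s
```

Under these assumptions, 15 PB per year corresponds to roughly 41 TB per day, or a sustained rate of about 476 MB per second.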

http://home.web.cern.ch/about/computing

But what if I don't own a Large Hadron Collider…

Perhaps data from…

• Large factory
• Vehicle fleet
• Smart grids
• Green electricity
• Stock exchange
• Host Protocols
• Data centers
• Server farm
• Twitter
• Facebook
• Google Analytics

“Big data is a term describing the storage and analysis of large and/or complex data sets using a series of techniques including, but not limited to: NoSQL, MapReduce and machine learning.”

http://www.technologyreview.com/view/519851/the-big-data-conundrum-how-to-define-it/

arxiv.org/abs/1309.5821

“Big data is high-volume, high-velocity and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.”

Gartner's Definition of Big Data

Laney, Douglas. The Importance of “Big Data”: A Definition. Gartner. Retrieved 21 June 2012.

The 3 Vs

[Figure: the three Vs of Big Data. Volume: MB, GB, TB, PB. Velocity: batch, periodic, real time. Variety: table, database, unstructured, web.]

Big Data, Gesellschaft für Informatik, 2013, http://www.gi.de/service/informatiklexikon/detailansicht/article/big-data.html

In my own words…

Big Data comprises large, unstructured volumes of data from diverse data sources that are generated and analyzed in a very short time.

What is Hadoop?

History

[Timeline: 2002, 2004, 2006. Nutch; GFS → NDFS; MapReduce → Nutch MapReduce; Hadoop]

Doug Cutting | New York Times, 16 March 2009, http://www.nytimes.com/imagepages/2009/03/16/business/17cloud.2.inline.ready.html

Hadoop Components

MapReduce

What is HDInsight?

HDInsight

Legend
• Red = Core Hadoop
• Blue = Data processing
• Green = Packages
• Dark blue = Microsoft integration points and value adds
• Orange = Data movement

HDInsight / Hadoop architecture

• Distributed Storage (HDFS)
• Distributed Processing (MapReduce)
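The processing layer is easiest to picture with a tiny example. The sketch below imitates the three MapReduce steps (map, shuffle, reduce) for a word count in plain Python on a single machine. It is an illustration of the programming model only, not the Hadoop API; the function names and the sample tweets are made up for this example.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    # On a cluster, map tasks would run in parallel on the nodes holding the data;
    # here everything runs sequentially in one process.
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

# Hypothetical sample input, standing in for records stored in HDFS.
tweets = ["big data is big", "hadoop stores big data"]
counts = word_count(tweets)
print(counts)
```

Because each map call only looks at its own record and each reduce call only at one key, both steps can be spread over many machines, which is what the distributed processing layer takes care of.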

©2013 Microsoft Corporation. All rights reserved. Microsoft, Windows, Office, Azure, System Center, Dynamics and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
