29
© Fraunhofer IESE Big Data meets Big Data Wie die Integration von Big-Data-Lösungen über Unternehmensgrenzen gelingt Torsten Lenhart Fraunhofer IESE Architekturen 2016 Hildesheim, 23.06.2016

Big Data meets Big Data - Gesellschaft für Informatik...Spark vs Flink –High-Level Comparison Flink Origin TU Berlin University of California, Berkeley Execution Model Directed

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

© Fraunhofer IESE

Big Data meets Big DataWie die Integration von Big-Data-Lösungen über Unternehmensgrenzen gelingt

Torsten LenhartFraunhofer IESE

Architekturen 2016Hildesheim, 23.06.2016

© Fraunhofer IESE

The Fraunhofer-Gesellschaft at a Glance

The Fraunhofer-Gesellschaft undertakes applied research of direct utility to private and public enterprise and of wide benefit to society.

24,000 staff

More than 70%is derived from contracts with industry and from publicly financed research projects.

Almost 30%is contributed by the German federal and Länder Governments.

67 institutes and research units Fin

an

ce v

olu

me

€2.1 billion

2015

Co

ntr

act

Rese

arc

h

€1.8 billion

Major infrastructure capital expenditure and defense research

© Fraunhofer IESE

Fraunhofer IESE The institute for software and systems engineering methods

Founded in 1996, headquartered in Kaiserslautern

Over 155 full-time equivalents (FTEs)

Our solutions can be scaled flexibly and are suitable for companies of any size

Our most important business areas:

Automotive and Transportation Systems

Automation and Plant Engineering

Health Care

Information Systems

Energy Management

E-Government

© Fraunhofer IESE

Our Competencies – for Your Benefit

SOFTWARE-ENABLED INNOVATIONS

for

innovative

systems

© Fraunhofer IESE

Our Competencies – for Your Benefit

SOFTWARE-ENABLED INNOVATIONS

IS /MobileES/CPS Smart Ecosystems

© Fraunhofer IESE

Data

Data

DataData

Motivation

CTR SE

ABC AG

BSP GmbH

MTR Inc

© Fraunhofer IESE

Motivation

CTR SE

ABC AG

BSP GmbH

MTR Inc…

Car Manufaturer (OEM)

1st Tier Suppliers

2nd Tier Suppliers

… … … 3rd Tier Suppliers

© Fraunhofer IESE

The research project PRO-OPT

Evaluation Partner and Data Supplierfor Automotive Diagnostics

Technology and Evaluation Partner for Production Systems

Research Partner for Data Mining andIntegration of System Components

Project Lead, Technology andEvaluation Partner for Automotive Diagnostics

Technical Project Lead, Research Partner for Access Restriction, SW Architecture & Data Quality

Multiplicator

Data Supplier andEvaluation Partner

Partners

Visualization

PRO-OPT aims at identifying valuable data and making it available for

creating additional benefit for all members of a Smart Ecosystem.

Project Duration: 01.01.2015 - 31.12.2017

PRO-OPT @ CeBIT Hannover, 14.03. – 18.03.2016

How can we bridge the isolated data islands of the partners in the ecosystem so that analysis can be

performed across these islands while the original owner of the data keeps control of it?

Main Challenge 1: Build a Technology Bridge

© Fraunhofer IESE

Main Challenge 2: Data Usage Control

Data is the DNA of a company

There are common, but also conflicting interests in the ecosystem

Data access is often perceived as a binary decision

→ Fine granular access policies with additional control and protection mechanisms are a key success factor

Data

© Fraunhofer IESE

Main Challenge 3: Substantiation of Benefits

High-Level: Everyone agrees

Concrete use cases are sometimes difficult to define

But they are needed to justify investments & compromises

A research project is well suited to resolve this deadlock

Without access tothe data, we

cannotsubstantiate the

benefits

Without knowingthe concrete

benefits, I don‘tallow access to my

data

Step 0: Gathering an holistic overview

© Fraunhofer IESE

PRO-OPT Big Data Landscape

Step 1:

Identify ing potential pillarsfor the technology brigde

© Fraunhofer IESE

Observation

Rise of multi-purpose data processing engines:

© Fraunhofer IESE

Spark vs Flink – High-Level Comparison

Flink

Origin TU Berlin University of California, Berkeley

Execution Model Directed Acyclic Graphs (DAGs) Directed Acyclic Graphs (DAGs)

Streaming Support Native Micro-Batches

Latest Stable Release 1.0.3 (11.05.2016) 1.6.1 (09.03.2016)

Spreading

Contributors ~ 200 ~ 900

Sprachen Scala, Java Scala, Python, (Java)

Project Homepage http://flink.apache.org http://spark.apache.org

Backing Company http://data-artisans.com https://databricks.com

Step 2:

Defining the overall architecture

© Fraunhofer IESE

High-Level PRO-OPT Platform Architecture

Company nCompany 1

PRO-OPT Platform

PRO-OPT Connector

Catalog

PRO-OPT Frontend

Cluster

PRO-OPT API

PEPs PDP

Company k

Catalog

FlinkCluster

Catalog

Cluster

PRO-OPT Bridge

Spark Adapter

PRO-OPT Connector

PEPs PDP

Flink Adapter

PRO-OPT Connector

PEPs PDP

Spark Adapter

© Fraunhofer IESE

PRO-OPT Components

UI for general analysis and reporting features

Also used for catalog and platform management

Interface for defining PRO-OPT programs

Inspired by the Spark and Flink API – but detailed structure is work in progress

Acts as some kind of pre-compiler and dispatcher

Conveys the PRO-OPT programs into one or multiple Spark and/or Flink programs

Passes the Spark and/or Flink programs together with the identity of theoriginator to the respective PRO-OPT connectors

Collects the results from the connectors and applies some additional processingto generate the final result

PRO-OPT Frontend

PRO-OPT API

PRO-OPT Bridge

© Fraunhofer IESE

PRO-OPT Components

Contains a list & description of all data sources of this ecosystem member

Private: Policy for each data source with fine granular usage rules:

Who is allowed to use the data?

How often can it be used?

Wich parts can be used?

What parts can be actually returned?

Do additional measures have to be applied (e.g. pseudonymisation)?

...

Catalogue

Catalogue Datenquellen

Public

Private

DatenquellenData Sources

Descriptor

Metadata Policies

© Fraunhofer IESE

PRO-OPT Components

Receives Spark and/or Flink programs (dependent on the cluster that is installed at this particular ecosystem member)

Performs a pre-processing step (e.g. replacing data source ids with actual addresses)

Enforces the data usage control rules defined through the repective policies by applying Policy Enforcement Points (PEP) and Policy Decision Points (PDP)

Data usage control is based on the Fraunhofer IESE IND2UCE Framework

PRO-OPT Connector

Integrated Distributed Data Usage Control Enforcement

© Fraunhofer IESE

Sample Scenario: Warranty Claim Process

OEM(Car Manufacturer)

Supplier

Warranty Claim

Compensation orRejection

warranty_claims_oem.csv:

vin;itemid;description;…

XYZ000051T2123456;I2726373;…;…

XYZ000068K3526889;I9102889;…;…

XYZ000052T2671171;I2727384;…;…

?

ProblemAnalysis

Cassandra table items:

type desc part1

P2728-C … … …

type desc part1 part2 part3

P2231-A … … … …

type desc part1 part2

P2728-C … … …

I2726373

I2724415

I2726329

© Fraunhofer IESE

OEM Catalog

OEM Catalog

DataSource inspection_logs

DataSource quantities

{

"id": 126,

"name":"configs",

"description":"Information on car orders",

"type":"structured",

"columns":{

"vin":{

"description":"Vehicle Identification Number",

"type":"String"

},

"dealer":{

"description":"Id of the dealer",

"type":"integer"

},

"enginetype":{

"description":"Type of the engine (diesel, petrol, etc)",

"type":"String"

}, […]

},

"restrictions":{ […] }

}

DataSource configs

© Fraunhofer IESE

OEM Catalog

OEM Catalog

DataSource inspection_logs

DataSource quantities

{

"id": 126,

[…]

"restrictions":{

"limitrows": {

"max": 100

},

"frequency": {

"maxperday": 1,

"maxpermonth": 10

},

„pseudonymize": {

"column" : "dealer"

}

}

}

DataSource configs

© Fraunhofer IESE

Sample Scenario: Warranty Claim Process

OEM(Car Manufacturer)

Zulieferer

Warranty Claim

Compensation orRejection

warranty_claims_oem.csv:

vin;itemid;description;…

XYZ000051T2123456;I2726373;…;…

XYZ000068K3526889;I9102889;…;…

XYZ000052T2671171;I2727384;…;…

ProblemAnalysis

Cassandra table items:

type desc part1

P2728-C … … …

type desc part1 part2 part3

P2231-A … … … …

type desc part1 part2

P2728-C … … …

I2726373

I2724415

I2726329

OEM

(Automobilhersteller)

Data source configs:

VIN Dealer Enginetype …

XYZ000051T2123456 122 Diesel …

XYZ000051T2123457 256 Petrol …

XYZ000051T2123458 122 Perrol …

© Fraunhofer IESE

Sample Program

PR

O-O

PT

Sup

plierO

EMP

RO

-OP

T

Read warranty_claims_oem.csv → ProOptDataSet<String> input

Map input → ProOptDataSet<WarrantyClaim> claims

Extract List<String> vins from claims Extract List<String> itemIds from claims

Filter ordersDF by vins → ordersDF

Collect ordersDF → Row[] orders

Join orders, items, claims → ProOptDataSet<WarrantyClaim,Order,Item> result

Filter itemsDF by itemIds → itemsDF

Collect itemsDF → Row[] items

Anonymize column dealer → ordersDF

Limit orders → orders

Read source with id 126 → DataFrame ordersDF Read source with id 89 → DataFrame itemsDFRead source from path → DataFrame itemsDFRead source from path → DataFrame ordersDF

Analyze result → ECU for petrol engines used

for Diesel engine

Creates or invokesPRO-OPT Program

CSV

© Fraunhofer IESE

Status & Outlook

Implementation

UI, API & Bridge (ongoing)

Connector (ongoing)

Spark Adaptor (ongoing)

Flink Adaptor (starts soon)

Extension of Catalog & Policy Language

Other Topics:

Data Quality

Crowd Sourcing

Not in scope for now: privacy issues

© Fraunhofer IESE

Contact

Torsten Lenhart

[email protected]

Fraunhofer IESE

Fraunhofer-Platz 1

67663 Kaiserslautern

Germany

Phone +49 631/6800-0

www.iese.fraunhofer.de

© Fraunhofer IESE

Image Credits

Slide Description Author Link/Information License

1 Data World Map Geralt https://pixabay.com/de/bin%C3%A4r-eins-null-kontinente-erde-1414315/ CC0 1.0

7 Car DSA GmbH © DSA GmbH – used with kind approval of DSA GmbH Proprietary

9 Golden Gate Unsplash https://pixabay.com/de/golden-gate-br%C3%BCcke-san-francisco-388917/ CC0 1.0

10 Remote control JJuni https://pixabay.com/en/remote-control-one-trillion-kinds-1143461/ CC0 1.0

10 Crown OpenClipartVectors https://pixabay.com/de/krone-golden-gelb-kaiser-zubeh%C3%B6r-576226/ CC0 1.0

11 Chicken silhouette ClkerFreeVectorImages https://pixabay.com/de/hen-huhn-gefl%C3%BCgel-bauernhof-tier-311285/ CC0 1.0

12 Earth from space Skeeze https://pixabay.com/en/panorama-earth-canada-landscape-1241289/ CC0 1.0

14 Under the bridge Unsplash https://pixabay.com/en/bridge-river-under-cityscape-1149241/ CC0 1.0

17 Blueprint Wokandapix https://pixabay.com/en/blueprint-ruler-architecture-964630/ CC0 1.0

22 Sheet icon Paomedia http://www.iconarchive.com/show/small-n-flat-icons-by-paomedia.html PD

22 Factory IconsMind https://www.iconsmind.com/ Custom

22 ECU DSA GmbH © DSA GmbH – used with kind approval of DSA GmbH Proprietary

22 Test bed DSA GmbH © DSA GmbH – used with kind approval of DSA GmbH Proprietary

26 Bulb IconLeakhttp://iconleak.com http://www.iconarchive.com/show/or-icons-by-iconleak/light-bulb-icon.html

Custom

27 Fernrohr Hans https://pixabay.com/de/fernrohr-durchblick-aussicht-blick-122960/ CC0 1.0