Tanmaya Mahapatra Technische Universität München Fakultät für Informatik Lehrstuhl für Software & Systems Engineering Garching, 05.07.2018 aFlux: Big Data Analytics in IoT Mashup Tools

aFlux: Big Data Analytics in IoT Mashup Tools • Introduction to new JVM based Mashup Tool: aFlux • aFlux Internals • Supporting Big Data Analytics • Towards a Unified Approach



Tanmaya Mahapatra

Technische Universität München

Fakultät für Informatik

Lehrstuhl für Software & Systems Engineering

Garching, 05.07.2018

aFlux: Big Data Analytics in IoT Mashup Tools

Slide Contents

• IoT Application Development Landscape & Challenges
• IoT Mashup Tools: Examples, Limitations
• Introduction to new JVM based Mashup Tool: aFlux
• aFlux Internals
• Supporting Big Data Analytics
• Towards a Unified Approach for querying Big Data
• Stream Analytics in aFlux
• aFlux: Supporting Spark
• aFlux: Supporting Flink
• Road ahead

IoT Application Development: Landscape & Challenges

IoT & Application Landscape

IoT has been defined as the interconnection of ubiquitous computing devices for the realization of value to end users.

• The true potential of IoT is realized only through its application landscape.
• Data collection from devices:
  • For analysis to better understand the environmental context.
• Task automation for time optimization and enhancing the quality of human life.

Usage of IoT Data

Insights from data can allow the creation of sophisticated & high-impact applications. E.g. traffic congestion can be avoided by using learned traffic patterns.

• Mining individual and group preferences
• Mining patterns of end-users (mobility models, …)
• Analyzing the state of engineering structures (structural health monitoring, …)
• Predicting the future state of the physical environment (flood prediction in rivers, …)

Difficulties in IoT Application Development

• Unfortunately, the development of software applications for IoT is not simple. Developers have to:
  • Write complex boilerplate code to access devices
  • Perform data mediation
  • Handle device identification & coordination

Streamlining Application Development for IoT: Mashups

A mashup is a composite application developed starting from reusable data, application logic, and/or user interfaces typically sourced from the web.

• User Interface
• Logic
• Data

IoT Mashup Tools: Examples & Limitations

IoT Mashup Tools

• Mashup tools typically offer graphical interfaces for specifying the data flow between sensors, actuators & services, which lowers the barrier to creating IoT applications for end-users.

State of the Art of Existing Mashup Tools

Device Management
• Provide simplified mechanisms to register IoT devices to the platform
• Devices can be named, grouped, deleted

Mashup Creation
• Provide a visual programming environment to combine different services
• Provide support for pseudo code/code snippets for enhancing the business logic

Mashup Deployment
• Provide a run-time environment to execute the mashups
• Deployed mashups can be accessed by REST or can be re-mashed

IBM Node-RED

• Node-RED is an open-source mashup tool developed by IBM.
• Provides a GUI where users drag and drop blocks that represent components of a larger system; these can be devices, software platforms or web services that are to be connected.
• These blocks are called nodes.
• A node is a visual representation of a block of JavaScript code designed to carry out a specific task.
• Additional blocks (nodes) can be placed in between these components to represent software functions that house business logic/data mediation logic.
• With Node-RED the time and effort spent on writing boilerplate code is greatly reduced, and the developer can focus on the business logic.

Flow of a Mashup App in Node-RED

Figure: mashup design of a custom Intrusion Detection System from firewall logs. Labelled stages: Data from Sensors, Data Mediation, Business Logic Services.

Strengths of Mashup Tools: At a Glance

• Assist the user in application development
• Abstract the low-level complexity
• Flexible & intuitive
• Support integrated administration
• Developmental methodology: Mashup + Model-based

Research Direction: Why Is Integration Potentially Beneficial?

Mashup Tools
• Simple & single-threaded
• Good for specifying control flow
• Operate on specific datasets

Big Data Tools
• Complex & massive parallelism
• Good for analysing huge data-sets
• Operate on a more abstract level & heterogeneous datasets

Full-fledged integration would enable mashup developers to specify Big Data analytics jobs and consume their results within a single application model.

Existing Limitations of Mashup Tools

• Blocking execution and synchronous communication in mashups
• Single-threaded mashups
• Visual programming limitations
• End-user focus in mashup tools

aFlux: New JVM-based Mashup Tool

aFlux

Features compared between IBM Node-RED and aFlux:
• Asynchronous execution
• Multi-threaded mashups
• Support for Big Data analytics
• Towards unified Big Data querying
• Easy creation of domain-specific apps for end-users

aFlux: Conceptual Approach

• In the actor model, an actor is the foundation of concurrency: an agent that does the actual work.
• It is analogous to a process or thread.
• Actors are very different from objects:
  • An object can interact directly with another object, i.e. change its values or invoke a method.
  • This causes synchronization issues in multi-threaded programs.
• In an actor system there is no direct way to invoke or interact with an actor.
  • Actors respond to messages.
• On receiving a message, an actor may change its internal state, perform some computation, fork new actors or send messages to other actors.
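The message-driven behaviour of actors can be illustrated with a minimal sketch in plain Python. This is neither Akka nor aFlux code: the `Actor` and `Counter` classes and the `"increment"` message are hypothetical names chosen for illustration. Each actor owns a private mailbox, and its state is modified only by its own worker thread, so the senders need no locks.

```python
import queue
import threading


class Actor:
    """Minimal actor: private state, a mailbox, one worker thread."""

    _STOP = object()  # sentinel message that ends the worker loop

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message):
        """The only way to interact with an actor: send it a message."""
        self._mailbox.put(message)

    def stop(self):
        """Let the actor finish its queued messages, then wait for it."""
        self._mailbox.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                break
            self.receive(message)

    def receive(self, message):
        raise NotImplementedError


class Counter(Actor):
    """Counts 'increment' messages; callers never touch `count` directly."""

    def __init__(self):
        self.count = 0
        super().__init__()

    def receive(self, message):
        if message == "increment":
            self.count += 1


def send_many(actor, times):
    for _ in range(times):
        actor.tell("increment")


counter = Counter()
senders = [threading.Thread(target=send_many, args=(counter, 100)) for _ in range(4)]
for sender in senders:
    sender.start()
for sender in senders:
    sender.join()
counter.stop()
print(counter.count)  # 4 senders x 100 messages
```

Four threads send concurrently, yet the count is exact because only the actor's own thread ever updates it.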

How Do Actors Behave?

aFlux

• In designing aFlux, we decided to go with Akka, a popular actor system.
• The main intuition is that when a user designs a flow, we model this flow internally in terms of actors, i.e. an actor is the basic execution unit of the mashup tool.
• For example, with three actors A, B and C, the computation starts with actor A. On completion it sends a message to actor B, and so on.
• Since the Akka system can be configured in many different ways for parallel and distributed operation, and abstracts away how the actors execute within it, the user-created program also becomes parallel and distributed.
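As a rough illustration of modelling a flow as a chain of actors (A sends to B, B sends to C), the sketch below wires an ordered list of steps into threads that communicate only through mailboxes. It is a plain-Python approximation, not aFlux or Akka code; `NodeActor`, `build_flow` and the use of `None` as an end-of-input marker are assumptions made for the example.

```python
import queue
import threading


class NodeActor(threading.Thread):
    """One flow node: applies `work` to each message and forwards the result."""

    def __init__(self, name, work):
        super().__init__(daemon=True)
        self.name = name
        self.work = work
        self.mailbox = queue.Queue()
        self.next_mailbox = None  # set when the flow is wired

    def run(self):
        while True:
            message = self.mailbox.get()
            if message is None:  # None marks the end of the input
                self.next_mailbox.put(None)
                return
            self.next_mailbox.put(self.work(message))


def build_flow(steps):
    """Turn an ordered list of (name, function) pairs into wired actors."""
    actors = [NodeActor(name, work) for name, work in steps]
    output = queue.Queue()
    for current, following in zip(actors, actors[1:]):
        current.next_mailbox = following.mailbox
    actors[-1].next_mailbox = output
    for actor in actors:
        actor.start()
    return actors[0].mailbox, output


entry, output = build_flow([
    ("A", lambda x: x + 1),
    ("B", lambda x: x * 10),
    ("C", lambda x: f"result={x}"),
])
for value in (1, 2, 3):
    entry.put(value)
entry.put(None)

results = []
while (item := output.get()) is not None:
    results.append(item)
print(results)
```

While C formats one value, B and A can already work on the next ones, which is the sense in which the flow becomes parallel without the flow designer writing any threading code.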

Execution Pattern

• In aFlux, a user can create a mashup flow, which is called a “flux”.
• A flux is analogous to a flow in IBM Node-RED.
• The only requirement for designing a flux is that it should have a start node and an end node.

Asynchronous Execution Patterns

Synchronous
• Synchronous components typically block the execution flow, i.e. when they receive a message on their input port they start execution and pass the message through their output ports on completion.

Asynchronous
• Asynchronous-capable components, on the other hand, have two different types of output ports, namely blocking and non-blocking ports.
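The difference between the two component kinds can be sketched as follows. The slide only names the two port types, so the semantics used here are an assumption: the non-blocking port fires as soon as the component starts working, and the blocking port fires once the work has completed. All function names are hypothetical.

```python
import threading
import time


def run_synchronous(work, message, on_output):
    """Synchronous component: output is emitted only after the work completes."""
    on_output(work(message))


def run_asynchronous(work, message, on_blocking, on_non_blocking):
    """Asynchronous-capable component (assumed semantics, see lead-in)."""

    def task():
        on_blocking(work(message))  # fires when the work has finished

    worker = threading.Thread(target=task)
    worker.start()
    on_non_blocking(message)  # fires right away, the caller is not held up
    return worker


def slow_square(x):
    time.sleep(0.2)  # stands in for a long-running job
    return x * x


events = []
run_synchronous(slow_square, 3, lambda result: events.append(("sync", result)))
worker = run_asynchronous(
    slow_square,
    4,
    on_blocking=lambda result: events.append(("blocking", result)),
    on_non_blocking=lambda message: events.append(("non-blocking", message)),
)
worker.join()
print(events)
```

Components wired to the non-blocking port can continue while the slow job is still running; components wired to the blocking port wait for its result.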

Asynchronous Execution Patterns

Logical Structuring Units in aFlux

• To abstract away independent logic within a main application flow, the system supports logical structuring units called sub-flows.
• A sub-flow encompasses a complete piece of business logic and is independent of other parts of the mashup.
• The sub-flows in the system are like normal asynchronous-capable components. They have input ports and two sets of output ports (i.e. blocking and non-blocking).
• They encompass within themselves a complete flow of graphical components used for Big Data querying.

A Sub-Flow can house a complete Flux

Supporting Big Data Analytics in aFlux

• aFlux has graphical components specific to Big Data query languages.

Supporting Big Data Analytics in aFlux

• The last component used in a Big Data query is the Big Data executor.
• This executor component parses all the prior connections and generates the final query to be passed on to the Big Data system for execution.
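A minimal sketch of what such an executor step could look like, here for a Pig Latin target: it walks the connected components in order and emits one statement per component. The `Load`, `Filter` and `Store` classes, the relation aliases and the file names are hypothetical; only the generated `LOAD`, `FILTER` and `STORE` statements are standard Pig Latin. How aFlux implements this internally is not shown on the slides.

```python
from dataclasses import dataclass


@dataclass
class Load:
    path: str
    schema: str


@dataclass
class Filter:
    condition: str


@dataclass
class Store:
    path: str


def generate_pig_script(components):
    """Walk the connected components in order and emit one statement each."""
    lines = []
    previous = None  # alias of the relation produced by the previous component
    for index, component in enumerate(components):
        alias = f"r{index}"
        if isinstance(component, Load):
            lines.append(
                f"{alias} = LOAD '{component.path}' "
                f"USING PigStorage(',') AS ({component.schema});"
            )
            previous = alias
        elif previous is None:
            raise ValueError("the flow must start with a Load component")
        elif isinstance(component, Filter):
            lines.append(f"{alias} = FILTER {previous} BY {component.condition};")
            previous = alias
        elif isinstance(component, Store):
            lines.append(f"STORE {previous} INTO '{component.path}';")
        else:
            raise TypeError(f"unsupported component: {component!r}")
    return "\n".join(lines)


script = generate_pig_script([
    Load("sensors.csv", "id:chararray, temp:double"),
    Filter("temp > 30.0"),
    Store("hot_sensors"),
])
print(script)
```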

A Pig Flux in aFlux

Towards a Unified Approach for querying Big Data

Towards Unified Analytics

• The graphical components for Big Data queries are tightly coupled to the semantics of the target query language.
  • The target languages may evolve, and hence more components need to be created.
  • This complicates end-user development.
• The idea is to have a set of uniform graphical components for query framing.
• Develop a unified query language to permit end-users to formulate their data queries.
• Internally use translation models to generate queries in different target languages, depending on the need as well as the context.
• This introduces a layer of abstraction between the language elements shown on the user interface, i.e. visual elements, and the target query language which is executed on Big Data systems.
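The translation idea can be sketched with one target-independent query description and two translators. Everything here is illustrative: the `UnifiedQuery` structure, the translator functions and the choice of SQL and Pig Latin as targets are assumptions for the example, not the unified language developed for aFlux.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnifiedQuery:
    """Target-independent description of a simple select/filter query."""
    source: str
    schema: tuple  # (column name, Pig type) pairs, needed by the Pig target
    select: tuple
    where: str


def to_sql(query):
    columns = ", ".join(query.select)
    return f"SELECT {columns} FROM {query.source} WHERE {query.where};"


def to_pig(query):
    schema = ", ".join(f"{name}:{kind}" for name, kind in query.schema)
    columns = ", ".join(query.select)
    return "\n".join([
        f"data = LOAD '{query.source}' USING PigStorage(',') AS ({schema});",
        f"filtered = FILTER data BY {query.where};",
        f"result = FOREACH filtered GENERATE {columns};",
    ])


TRANSLATORS = {"sql": to_sql, "pig": to_pig}


def translate(query, target):
    """Pick the translation model for the requested target language."""
    return TRANSLATORS[target](query)


query = UnifiedQuery(
    source="readings",
    schema=(("id", "chararray"), ("temp", "double")),
    select=("id", "temp"),
    where="temp > 30",
)
sql_text = translate(query, "sql")
pig_text = translate(query, "pig")
print(sql_text)
print(pig_text)
```

Adding a new target language then means adding one translator, while the graphical components that build the `UnifiedQuery` stay unchanged.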

Unified Analytics: Internals

Stream Analytics in aFlux

Data Processing

• Streams are NOT collections
• A stream is a sequence of ongoing events ordered in time
• Streams have no beginning and no end (unbounded data)
• Ever-changing sets of data and values, always in motion
• Traversable only once

IoT and Stream Analytics

Stream Analytics Platforms

Solution

• Stream analytics is important in the IoT business, but
  − IoT mashups are not optimized for stream analytics
  − IoT mashup tools (e.g. Node-RED) are single-threaded and implement a synchronous model

Potential Benefits & Goals

Benefits:
− No complicated set-ups
− No coding to analyze data streams
− Non-experts can use this tool
− Short learning curve
− Easy to use & user-friendly UI

Goals:
1. Implement a stream analytics suite
2. Parameterize stream analytics properties
3. Evaluate our stream analytics tool

Solution

• Akka Streams: library used for stream analytics (Scala & Java)
• Higher-level abstraction over Akka’s existing actor model
• Implementation of the Reactive Streams specification

3 component types:
1. Processing: Filter, Count, Moving Average, Max, Min
2. Fan-in: Merge, Zip
3. Fan-out: Broadcast
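The three component types can be approximated with Python generators, as a sketch of what each kind does to a stream. This is not Akka Streams code; the function names are hypothetical, and `itertools.tee` stands in for a broadcast stage.

```python
from collections import deque
from itertools import tee


def stream_filter(predicate, source):
    """Processing: pass on only the elements that satisfy the predicate."""
    for item in source:
        if predicate(item):
            yield item


def moving_average(source, size):
    """Processing: average over the last `size` elements seen so far."""
    window = deque(maxlen=size)
    for item in source:
        window.append(item)
        yield sum(window) / len(window)


def broadcast(source, branches=2):
    """Fan-out: every branch receives every element."""
    return tee(source, branches)


def zip_streams(*sources):
    """Fan-in: combine one element from each input into a tuple."""
    return zip(*sources)


readings = iter([10, 20, 30, 40])
raw, for_average = broadcast(readings)
averaged = moving_average(for_average, size=2)
pairs = zip_streams(raw, averaged)
alerts = list(stream_filter(lambda pair: pair[1] > 12, pairs))
print(alerts)
```

Each reading is paired with the moving average at that point, and only pairs whose average exceeds 12 pass the filter.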

Stream Analytics in aFlux

Stream analytics component architecture

Figure labels: Akka component, Akka Streams component

Benefits & Property Parameterization

• Ensure that the receiving side is not forced to buffer arbitrary amounts of data (overflow strategy defined by the user, e.g. back-pressure).
• Non-blocking asynchronous communication between components.
• Allow the queues which mediate between threads to be bounded (queue size defined by the user, no failures due to a big workload of data).
  − The benefits of asynchronous processing would be negated if the communication of back-pressure were synchronous (like the single-threaded Node-RED).

Property Parameterization: buffer size, overflow strategy, window parameters
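The effect of a bounded queue between a fast producer and a slow consumer can be shown with the standard library alone. This is a simplified sketch: `queue.Queue.put` blocks the producing thread when the queue is full, whereas Reactive Streams signals demand without blocking; the function name and sizes are chosen for illustration.

```python
import queue
import threading
import time


def run_pipeline(item_count, buffer_size):
    """Fast producer, slow consumer, connected by a bounded queue."""
    buffer = queue.Queue(maxsize=buffer_size)
    consumed = []
    observed_sizes = []

    def producer():
        for item in range(item_count):
            buffer.put(item)  # waits while the queue is full: back-pressure
        buffer.put(None)  # end-of-stream marker

    def consumer():
        while True:
            observed_sizes.append(buffer.qsize())
            item = buffer.get()
            if item is None:
                return
            time.sleep(0.001)  # the consumer is slower than the producer
            consumed.append(item)

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return consumed, max(observed_sizes)


consumed, peak = run_pipeline(item_count=50, buffer_size=4)
print(len(consumed), peak)
```

No element is lost and the buffer never grows beyond the configured size, because the producer is slowed down to the consumer's pace.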

Property Parameterization

• Buffer size of the source queue
• Overflow strategies:
  • dropHead
  • dropTail
  • dropNew
  • dropBuffer
• Window parameters:
  • Window type (content/time)
  • Window method (tumbling/sliding)
  • Window size
  • Sliding step
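The four overflow strategies can be sketched on a small bounded buffer. The behaviour below follows the like-named Akka Streams `OverflowStrategy` options (dropHead removes the oldest buffered element, dropTail the youngest, dropNew discards the incoming element, dropBuffer discards everything buffered); that aFlux exposes exactly these semantics is an assumption, and `offer` and `fill` are hypothetical helper names.

```python
from collections import deque


def offer(buffer, item, capacity, strategy):
    """Add `item` to `buffer`, applying the overflow strategy when full."""
    if len(buffer) < capacity:
        buffer.append(item)
    elif strategy == "dropHead":
        buffer.popleft()  # drop the oldest buffered element
        buffer.append(item)
    elif strategy == "dropTail":
        buffer.pop()  # drop the youngest buffered element
        buffer.append(item)
    elif strategy == "dropNew":
        pass  # drop the incoming element
    elif strategy == "dropBuffer":
        buffer.clear()  # drop everything buffered so far
        buffer.append(item)
    else:
        raise ValueError(f"unknown strategy: {strategy}")


def fill(strategy, items, capacity):
    buffer = deque()
    for item in items:
        offer(buffer, item, capacity, strategy)
    return list(buffer)


outcomes = {
    strategy: fill(strategy, [1, 2, 3, 4, 5], capacity=3)
    for strategy in ("dropHead", "dropTail", "dropNew", "dropBuffer")
}
for strategy, content in outcomes.items():
    print(strategy, content)
```

With five elements offered to a buffer of size three and no consumer, each strategy leaves a different set of elements behind, which is why the choice is exposed as a user-defined property.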

Evaluation

Scenario:
• A9 highway in Munich
• All cars run in the same direction (South to North)
• 3 loop detectors on 3 lanes
• 4th lane (shoulder lane) is initially closed
• 500th tick → accident happens
• When the average occupancy rate of the loop detectors > 30% → open the shoulder lane
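The decision rule of the scenario (open the shoulder lane when the average occupancy exceeds 30%) can be combined with content-based windows as in the sketch below. The occupancy values are made up for illustration and are not taken from the experiment; the window helpers are hypothetical and count elements rather than time.

```python
from collections import deque


def tumbling_windows(values, size):
    """Content-based tumbling window: consecutive, non-overlapping groups."""
    window = []
    for value in values:
        window.append(value)
        if len(window) == size:
            yield list(window)
            window.clear()


def sliding_windows(values, size, step):
    """Content-based sliding window: emit the last `size` values every `step`."""
    window = deque(maxlen=size)
    since_emit = 0
    for value in values:
        window.append(value)
        since_emit += 1
        if len(window) == size and since_emit >= step:
            yield list(window)
            since_emit = 0


def open_shoulder_lane(window, threshold=30.0):
    """Decision rule from the scenario: average occupancy above 30%."""
    return sum(window) / len(window) > threshold


# Illustrative occupancy percentages, not measured data.
occupancy = [10, 20, 25, 35, 40, 45, 50, 20]

tumbling_decisions = [open_shoulder_lane(w) for w in tumbling_windows(occupancy, size=4)]
sliding_decisions = [open_shoulder_lane(w) for w in sliding_windows(occupancy, size=4, step=2)]
print(tumbling_decisions)
print(sliding_decisions)
```

The sliding window produces a decision every two readings instead of every four, at the cost of processing overlapping data.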

Experiment Setup

• aFlux flow for stream processing
• SUMO traffic simulator (Traffic simulation A9 project)
  − TraCI + Kafka

Testing Parameters & Analytics Methods Used

• Testing parameters:
  1. Loop detectors’ occupancy
  2. Mean speeds of cars
  3. Lane state
  4. Time
• Analytics methods tested:
  1. One-by-one processing (no window)
  2. Content-based tumbling window processing of size 50
  3. Content-based tumbling window processing of size 300
  4. Content-based tumbling window processing of size 500
  5. Content-based sliding window processing of size 500

Average mean speed of cars: No Window

Average mean speed of cars: Different Windows

Summary of Results

Method        Responsiveness  Settling Time  Scalability
No Window     Very Slow       Long           Low
Tumbling 50   Very Fast       Very Long      Very Low
Tumbling 300  Slow            Very Short     High
Tumbling 500  Slow            None           Very High
Sliding 500   Slow            None           Very High

aFlux: Supporting Spark

Enabling Spark Queries in aFlux

Sample Spark Application

Spark Flow in aFlux

Typical Structure of Big Data Jobs

Spark Jobs in aFlux: Concepts

Zooming into the Core Idea

aFlux: Supporting Apache Flink

Apache Flink: Overview

Smart Santander

City-scale experimental research facility
• 3000 IEEE 802.15.4 devices
• 200 GPRS modules
• 2000 joint RFID tag/QR code labels
• Static locations (streetlights, façades, bus stops) as well as on board mobile vehicles (buses, taxis)

Source: http://smartsantander.eu

Smart Santander

Use cases
• Traffic Intensity Monitoring
  − Around 60 devices located at the main entrances of the city of Santander
• Environmental Monitoring
  − Around 2000 IoT devices installed mainly at the city center
  − Sensors installed in around 150 public vehicles, including buses, taxis and police cars
• Outdoor parking area management
• Parks and gardens irrigation

Smart Santander

Enabling Stream Analytics in Smart Santander with Flink
Mappings of the API required:
• DataStream API for stream processing
• CEP Library for event processing

Source: https://flink.apache.org/

Steps: Overview

Task #1: SmartSantander connector for Flink

Task #2: aFlux actors

Task #3: Flink API mapper

Task #4: Front-end Validation

Implementation Tasks: aFlux Actors

Actors are in charge of:
• Generating Java code from the user-defined properties
• Generating the final Flink job
• Exchanging messages with previous/subsequent actors → FlinkFlowMessage

Set of aFlux tools developed (shown as a figure on the slide).
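The three responsibilities of the actors can be sketched as a message that travels along the flow and accumulates generated Java statements, with the last actor assembling the job body. `FlinkFlowMessage` is named on the slide, but its fields, the actor functions and the `SmartSantanderSource` class used in the generated code are hypothetical; the Flink calls appearing in the generated text (`getExecutionEnvironment`, `addSource`, `filter`, `print`, `execute`) are part of the DataStream API.

```python
from dataclasses import dataclass, field


@dataclass
class FlinkFlowMessage:
    """Message passed between actors; the `statements` field is an assumption."""
    statements: list = field(default_factory=list)


def source_actor(message, source_class):
    """Generate the statement that creates the input stream."""
    message.statements.append(
        f"DataStream<Double> stream = env.addSource(new {source_class}());"
    )
    return message


def filter_actor(message, threshold):
    """Generate a filter statement from a user-defined property."""
    message.statements.append(
        f"stream = stream.filter(value -> value > {threshold});"
    )
    return message


def executor_actor(message, job_name):
    """Last actor in the flow: assemble the final job body."""
    body = [
        "StreamExecutionEnvironment env = "
        "StreamExecutionEnvironment.getExecutionEnvironment();",
        *message.statements,
        "stream.print();",
        f'env.execute("{job_name}");',
    ]
    return "\n".join(body)


message = FlinkFlowMessage()
message = source_actor(message, "SmartSantanderSource")  # hypothetical class
message = filter_actor(message, threshold=25.0)
job_body = executor_actor(message, "temperature-monitor")
print(job_body)
```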

Implementation Tasks: aFlux Actors

Structure of a Flink job

Implementation Tasks: Front-End Validation

Semantics between nodes:
• An array of ToolSemanticsCondition can be defined when creating a new tool
• Errors are shown to the user when creating the flow

Condition template: <Tool A> should/must come (immediately) before/after <Tool B>
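A condition of the form "<Tool A> should/must come (immediately) before/after <Tool B>" can be checked against the order of tools in a flow as sketched below. `ToolSemanticsCondition` is the name used on the slide; its fields, the `validate` function and the tool names are assumptions made for this example, and the flow is simplified to a linear list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSemanticsCondition:
    tool: str
    other: str
    position: str  # "before" or "after"
    immediately: bool = False
    mandatory: bool = True  # True: "must" (error), False: "should" (warning)


def validate(flow, conditions):
    """Return one message per condition that the linear flow violates."""
    problems = []
    for condition in conditions:
        if condition.tool not in flow:
            continue  # the condition only applies when the tool is used
        if condition.other in flow:
            tool_index = flow.index(condition.tool)
            other_index = flow.index(condition.other)
            if condition.position == "before":
                offset = other_index - tool_index
            else:
                offset = tool_index - other_index
            satisfied = offset == 1 if condition.immediately else offset >= 1
        else:
            satisfied = False
        if not satisfied:
            level = "error" if condition.mandatory else "warning"
            verb = "must" if condition.mandatory else "should"
            adverb = "immediately " if condition.immediately else ""
            problems.append(
                f"{level}: {condition.tool} {verb} come "
                f"{adverb}{condition.position} {condition.other}"
            )
    return problems


conditions = [
    ToolSemanticsCondition("window", "source", "after", immediately=True),
    ToolSemanticsCondition("filter", "sink", "before", mandatory=False),
]
good = validate(["source", "window", "filter", "sink"], conditions)
bad = validate(["window", "source", "sink", "filter"], conditions)
print(good)
print(bad)
```

Running such checks in the front end lets the user see ordering mistakes while composing the flow, before any Flink code is generated.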

Evaluation

Goal → prove how easy it is to create Flink jobs from aFlux
Evaluation based on two use cases:

UC1: Real Time Data Processing

AggregateFunction, AllWindowedStream, DataStream, FilterFunction, MapFunction, RichSourceFunction, SlidingWindow, StreamExecutionEnvironment, TumblingWindow

Code   Description
UC1E1  Temperature vs. air quality in a certain area in relation with the average of the city
UC1E2  Air quality vs. traffic charge in the city center
UC1E3  Noise vs. traffic charge in the city center
UC1E4  Max/min monitor

UC2: Pattern Detection

DataStream, Pattern, PatternSelectFunction, PatternStream

Code   Description
UC2E1  Traffic increasing in a certain area
UC2E2  Heatwave in the city

Evaluation

UC1E1: live data from the SmartSantander API (6 June 2018, 9am-10am, 5-minute windows)


Road Ahead …

What’s Next?

Current Agenda:
1. Paper in review: Stream Analytics in IoT Mashup Tools
2. Two papers in the pipeline: Spark and Flink
3. Development of a uniform methodology, thesis writing
4. Thesis submission planned for June 2019

Thanks -- Questions? Ideas?