09.02.2015 Dipl-Inf. (FH) Johannes Hoppe

2015 02-09 - NoSQL Vorlesung Mosbach

Trends!

1. Data
2. Networking
3. Individualization

Scale-up
Vertical scaling
Tuning a server for more performance

Scale-out
Horizontal scaling
Adding nodes (compute nodes)

› No relational data model (no SQL)
› Distributed and horizontal scalability
› Schema-free / weak schema restrictions
› A different consistency model

Schema-free
› No ALTER TABLE
› No maintenance window *
› Data versioning in code!

* sleep in in the morning
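
"Data versioning in code" can be sketched as an upgrade step that runs whenever a document is read. This is a minimal sketch under assumptions: the `schemaVersion` field, the `upgrades` list and the rename of `Message` to `Text` are hypothetical and do not come from the slides.

```javascript
// Hypothetical upgrade steps: index n upgrades a document from version n to n + 1.
const upgrades = [
  // v0 -> v1: documents without a category list get an empty one
  (doc) => ({ ...doc, Categories: doc.Categories || [] }),
  // v1 -> v2: rename Message to Text (an assumed change, for illustration only)
  (doc) => {
    const { Message, ...rest } = doc;
    return { ...rest, Text: Message };
  },
];

// Bring a stored document up to the current version when it is read.
function upgradeDocument(doc) {
  let current = { ...doc };
  let version = current.schemaVersion || 0;
  while (version < upgrades.length) {
    current = upgrades[version](current);
    version += 1;
  }
  return { ...current, schemaVersion: version };
}

const oldNote = { Title: "Mittag", Message: "nicht vergessen" };
console.log(upgradeDocument(oldNote));
```

Old and new documents can then live side by side in the same collection, and no ALTER TABLE style migration window is needed.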

Requirements for a distributed system

› Consistency
› Availability
› Partition tolerance (fault tolerance)

CAP Theorem
2000: E. Brewer, N. Lynch

You can satisfy at most 2 out of the 3 requirements

Consistency
› The system is in a consistent state after an operation
› All clients see the same data
› Strong consistency (ACID) vs. eventual consistency (BASE)

ACID: Atomicity, Consistency, Isolation and Durability

BASE: Basically Available, Soft state, Eventually consistent

Availability
› System is “always on”, no downtime
› Node failure tolerance: all clients can find some available replica
› Software/hardware upgrade tolerance

Partition tolerance
› System continues to function even when split into disconnected subsets (network disruption)
› Not only for reads, but writes as well

CAP Theorem CA
› Single site clusters (easier to ensure all nodes are always in contact)
› When a partition occurs, the system blocks
› e.g. usable for two-phase commits (2PC) which already require/use blocks

Obviously, any horizontal scaling strategy is based on data partitioning; therefore, we are forced to decide between consistency and availability.

CAP Theorem CP
› Some data may be inaccessible (availability sacrificed), but the rest is still consistent/accurate
› e.g. sharded database

CAP Theorem AP
› System is still available under partitioning, but some of the data returned may be inaccurate
› Need some conflict resolution strategy
› e.g. Master/Slave replication
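
The conflict resolution an AP system needs can be sketched with a last-write-wins merge. This is only one common strategy and not something the slides prescribe; the replica objects, the `write` and `merge` helpers and the `timestamp` field are assumptions made for the illustration.

```javascript
// Each replica maps key -> { value, timestamp }.
function write(replica, key, value, timestamp) {
  replica[key] = { value, timestamp };
}

// Last-write-wins merge: for every key keep the entry with the newer timestamp.
function merge(a, b) {
  const merged = { ...a };
  for (const [key, entry] of Object.entries(b)) {
    if (!(key in merged) || entry.timestamp > merged[key].timestamp) {
      merged[key] = entry;
    }
  }
  return merged;
}

// During a partition both replicas stay available and accept writes.
const replicaA = {};
const replicaB = {};
write(replicaA, "note1:title", "Mittag", 1);
write(replicaB, "note1:title", "Abendessen", 2);

// A client that reads from replicaA now sees the older value.
console.log(replicaA["note1:title"].value);

// After the partition heals, both sides converge on the newer write.
const healed = merge(replicaA, replicaB);
console.log(healed["note1:title"].value);
```

Last-write-wins silently drops the older write, which is why the choice of strategy deserves thought before committing to an AP design.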

“So let whoever binds himself forever check first.”
Friedrich Schiller

Classification
› Key-value stores: Redis
› Document stores: MongoDB & RavenDB
› Wide column stores
› Graph databases
› and many more

Redis

Caching

Queuing

Counting views

Speed + persistence
Snapshot or journal

[Diagram: key → value, e.g. key customer_22 → value: string, binary safe]

Value types
› Strings
› Lists
› Sets
› Sorted sets
› Hashes (string pairs)

GET & SET
In the shell

› SET note1:title "Mittag"

› SET note1:message "nicht vergessen"

› KEYS note1:*

› GET note1:title

› DEL note1:title note1:message
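
What the commands above do can be modeled with a plain map. The sketch below is an in-memory stand-in and not a Redis client; the `TinyStore` class is hypothetical, and its `keys` method only understands the trailing `*` form used in the example.

```javascript
// An in-memory stand-in for a key-value store; it does not talk to Redis.
class TinyStore {
  constructor() {
    this.data = new Map();
  }

  set(key, value) {
    this.data.set(key, String(value));
    return "OK";
  }

  get(key) {
    return this.data.has(key) ? this.data.get(key) : null;
  }

  // Only a trailing "*" is treated as a wildcard in this sketch.
  keys(pattern) {
    const prefix = pattern.endsWith("*") ? pattern.slice(0, -1) : null;
    return [...this.data.keys()].filter((k) =>
      prefix === null ? k === pattern : k.startsWith(prefix)
    );
  }

  // Returns the number of keys that were actually removed.
  del(...keys) {
    return keys.filter((k) => this.data.delete(k)).length;
  }
}

const store = new TinyStore();
store.set("note1:title", "Mittag");
store.set("note1:message", "nicht vergessen");
console.log(store.keys("note1:*"));
console.log(store.get("note1:title"));
console.log(store.del("note1:title", "note1:message"));
```

The `note1:` prefix is just a naming convention: the store itself has no notion of a "note", only flat keys.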

GET
with C# / .NET

Live Demo https://github.com/JohannesHoppe/WebNoteNoSQL

RavenDB

JSON Transactional

LINQ Lucene.NET first AGPL / dual

RavenDB
Written by Oren Eini aka Ayende Rahien
› Hibernating Rhinos
› Rhino Mocks & Rhino.ServiceBus

Written in C#

Deployment
Get it via NuGet
Change defaults in Raven.Server.exe.config
› It’s safe by default

Just run the Raven.Server.exe in the /server/ folder

Units
› Documents

› Collections

› Indexes

› Attachments

Safe by default
Useful defaults
› E.g. limited page size: no accidental SELECT *

ACID (Transactional) *

Designed to “just work”
Schema Free
› Hardly any mapping required
› dynamic (C# 4) yields great power

Designed to “just work” (with .NET)

Fluent API
Unit of Work Pattern

Extensible – Plugin Support

Makes developers happy
› Testable
› Interfaces all over
› In-Memory Database
› Extensible – Plugin Support

In Memory Instance
Embedded Mode

using (var documentStore = new EmbeddableDocumentStore { RunInMemory = true }.Initialize())
{
    using (var session = documentStore.OpenSession())
    {
        // Run complex test scenarios
    }
}

APIs
› Native .NET Client API
› HTTP API (Pseudo REST)

Indexes
› Written as LINQ queries
› Indexed with Lucene.NET
› Lucene syntax for querying

“While being RESTful is a goal of the HTTP API, it is secondary to the goal of exposing easy to use and powerful functionality”

Ayende Rahien on the HTTP API - http://ravendb.net/documentation/docs-http-api-restful

HTTP API
› Caching
› ETags
› Lucene queries possible

C:\>curl -X GET http://localhost:8080/docs/Categories/1 -i
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
ETag: 00000000-0000-0200-0000-000000000004

{
  "Name" : "Normal Importance",
  "Color" : "green"
}

MongoDB

BIG data

Scale-out
Horizontal scaling
Adding nodes (compute nodes)

Database Timeline

1966    IBM’s IMS
1969    CODASYL model published
1970    Codd publishes relational model paper
1973    INGRES
1974    SQL invented
1977    Oracle founded
1985    Term “object-oriented database” appears
1990s   Agile becoming more popular
2000    Brewer’s CAP born
2004    Google BigTable
2007    Amazon Dynamo, 10gen founded
2008    Apache Cassandra initial release
2009    MongoDB initial release, NoSQL Movement

NoSQL

MongoDB Quick Reference Cards
http://www.10gen.com/reference

› BSON
› Master/Slave
› JavaScript
› C# Driver
› Sharding
› GNU AGPL *

“Deployment”
› Create the default directory: c:\data\db
› Server start: mongod.exe
› Shell: mongo.exe

CRUD – Create
In the shell
› use WebNote
› db.Notes.save({ Title: 'Mittag', Message: 'nicht vergessen' });

How the command works
› db.Notes.save

CRUD – Create
…with a bit of JavaScript

for (var i = 0; i < 1000; i++) {
  ['quiz', 'essay', 'exam'].forEach(function(name) {
    var score = Math.floor(Math.random() * 50) + 50;
    db.scores.save({student: i, name: name, score: score});
  });
}
db.scores.count();

CRUD – Read
Queries are likewise specified in document style

› db.Notes.find();

› db.Notes.find({ Title: /Test/i });

› db.Notes.find( { "Categories.Color": "red"}).limit(1);
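
How a query document selects documents can be sketched in plain JavaScript. The `valuesAt` and `matches` helpers below are hypothetical and cover only what the three queries above use: equality, regular expressions and dotted paths that reach into arrays. They are not how MongoDB is implemented.

```javascript
// Collect every value reachable under a dotted path, descending into arrays.
function valuesAt(value, parts) {
  if (Array.isArray(value)) {
    return value.flatMap((item) => valuesAt(item, parts));
  }
  if (parts.length === 0) return [value];
  if (value === null || typeof value !== "object") return [];
  return valuesAt(value[parts[0]], parts.slice(1));
}

// A document matches when every condition is met by at least one value.
function matches(doc, query) {
  return Object.entries(query).every(([path, condition]) =>
    valuesAt(doc, path.split(".")).some((value) =>
      condition instanceof RegExp
        ? typeof value === "string" && condition.test(value)
        : value === condition
    )
  );
}

// Made-up sample notes for the illustration.
const notes = [
  { Title: "Test 1", Categories: [{ Color: "red" }] },
  { Title: "Mittag", Categories: [{ Color: "green" }] },
  { Title: "another test", Categories: [] },
];

console.log(notes.filter((n) => matches(n, {})).length);
console.log(notes.filter((n) => matches(n, { Title: /Test/i })).length);
console.log(notes.filter((n) => matches(n, { "Categories.Color": "red" })).slice(0, 1));
```

The empty query matches everything, just as `find()` without arguments returns the whole collection.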

CRUD – Update

› db.Notes.update({Title: 'Test'}, {'$set': {Categories: []}});

› db.Notes.update({Title: 'Test'}, {'$push': {Categories: {Color: 'Red'}}});
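
The effect of the two update operators can be sketched on a plain object. The `applyUpdate` helper is hypothetical and handles only top-level fields with `$set` and `$push`; it is meant to show the semantics, not to replace the database.

```javascript
// Apply a small subset of update operators to a copy of the document.
function applyUpdate(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // deep copy, fine for plain data

  // $set replaces the field value, creating the field if necessary.
  for (const [field, value] of Object.entries(update.$set || {})) {
    result[field] = value;
  }

  // $push appends one element; a missing field becomes a new array.
  for (const [field, value] of Object.entries(update.$push || {})) {
    if (result[field] === undefined) result[field] = [];
    if (!Array.isArray(result[field])) {
      throw new Error("cannot $push to non-array field " + field);
    }
    result[field].push(value);
  }
  return result;
}

let note = { Title: "Test" };
note = applyUpdate(note, { $set: { Categories: [] } });
note = applyUpdate(note, { $push: { Categories: { Color: "Red" } } });
console.log(JSON.stringify(note));
```

Only the named fields change; the rest of the document is left untouched, which is the main difference from saving a whole replacement document.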

CRUD – Delete

› db.dropDatabase();

› db.Notes.drop();

› db.Notes.remove();

C# Driver

Live Demo https://github.com/JohannesHoppe/WebNoteNoSQL

Consistency

Strong consistency / Eventual consistency
[Diagram: C# driver, one primary, two secondaries; labels: Read, Write, Read]

Strong Consistency
[Diagram: C# driver]

Eventual Consistency
[Diagram: C# driver, one primary, two secondaries; labels: Read, Write, Read]

Sharding
[Diagram: C# driver, three primaries]

C# Driver: Write
› Fire and forget
› Wait for error
› Wait for fsync
› Wait for journal sync
› Wait for replication

Atomic!

› No relational data model (no SQL)
› Distributed and horizontal scalability
› Schema-free / weak schema restrictions
› A different consistency model

Hands ON!

Data Import
(hands-on.zip)

cd dump_training
mongorestore -d training -c scores scores.bson

cd dump_digg
mongorestore -d digg -c stories stories.bson

Test
(in the shell)

use digg
db.stories.findOne();

Exercises

1. Find all scores less than 65.

2. Find the lowest quiz score. Find the highest quiz score.

3. Write a query to find all digg stories where the view count is greater than 1000.

4. Query for all digg stories whose media type is either 'news' or 'images' and where the topic name is 'Comedy'.

5. Find all digg stories where the topic name is 'Television' or the media type is 'videos'. Skip the first 5 results, and limit the result set to 10.

CRUD – Update

› use digg;

› db.people.update({name: 'Smith'}, {'$set': {interests: []}});

› db.people.update({name: 'Smith'}, {'$push': {interests: 'chess'}});

Exercises

1. Set the proper 'grade' attribute for all scores. For example, users with scores greater than 90 get an 'A'. Set the grade to 'B' for scores falling between 80 and 90.

2. You're being nice, so you decide to add 10 points to every score on every “final” exam whose score is lower than 60. How do you do this update?

“MapReduce is the Uzi of aggregation tools. Everything described with count, distinct and group can be done with MapReduce, and more.”

Kristina Chodorow, Michael Dirolf in MongoDB: The Definitive Guide

Map Reduce

[Diagram: input data → MAP → intermediate data → REDUCE → output data]

MapReduce
To use map-reduce, you first write a map function.

var map = function() {
  // emit one post per story so that reduce can sum the counts
  emit(this.user.name, {diggs: this.diggs, posts: 1});
};

MapReduce
The reduce function then aggregates those docs by key.

var reduce = function(key, values) {
  var diggs = 0;
  var posts = 0;
  values.forEach(function(doc) {
    diggs += doc.diggs;
    posts += doc.posts; // summing keeps the result correct when reduce runs on partial results
  });
  return {diggs: diggs, posts: posts};
};

MapReduce
Now both are used to perform custom aggregation.

db.stories.mapReduce(map, reduce, {out: 'digg_users'});

db.digg_users.find();
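
The data flow of such a job can be sketched outside the database. In the emulation below, `mapReduce` is a hypothetical in-memory helper: the document and `emit` are passed in as arguments instead of being available as `this` and a global, and the sample stories are made up. The sketch emits `posts: 1` and sums it, so the reducer also works on its own output.

```javascript
// Run a map function over every document and group the emitted values by key.
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  docs.forEach((doc) => map(doc, emit));

  const out = [];
  for (const [key, values] of groups) {
    // With a single value the reducer is skipped, so map output and
    // reduce output need to have the same shape.
    out.push({ _id: key, value: values.length === 1 ? values[0] : reduce(key, values) });
  }
  return out;
}

const map = (doc, emit) => emit(doc.user.name, { diggs: doc.diggs, posts: 1 });

const reduce = (key, values) =>
  values.reduce(
    (acc, v) => ({ diggs: acc.diggs + v.diggs, posts: acc.posts + v.posts }),
    { diggs: 0, posts: 0 }
  );

// Made-up sample stories, not taken from the digg data set.
const stories = [
  { user: { name: "anna" }, diggs: 10 },
  { user: { name: "ben" }, diggs: 3 },
  { user: { name: "anna" }, diggs: 5 },
];

console.log(mapReduce(stories, map, reduce));
```

Skipping the reducer for single-value keys is the reason the emitted value and the reduced value share one shape.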

Careful, my friend!

“MapReduce is slower and is not supposed to be used in ‘real time’. You ran MapReduce as a background job.”

Kristina Chodorow, Michael Dirolf in MongoDB: The Definitive Guide

Schema Design

BSON
http://bsonspec.org

[Diagram: JSON → BSON]

All JSON documents are stored in a binary format called BSON. BSON supports a richer set of types than JSON.
http://bsonspec.org
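
Why a richer type set matters shows up as soon as a date passes through plain JSON. The sketch below demonstrates only the JSON side of the comparison; that BSON has a dedicated datetime type is stated in its specification, and the `Added` field is a made-up example.

```javascript
// A document with a date value, as it might be held in application code.
const original = { Title: "Mittag", Added: new Date("2015-02-09T00:00:00Z") };

// JSON has no date type, so a round trip turns the Date into a string.
const viaJson = JSON.parse(JSON.stringify(original));

console.log(original.Added instanceof Date); // true
console.log(typeof viaJson.Added);           // "string"
console.log(viaJson.Added);                  // "2015-02-09T00:00:00.000Z"
```

A store that keeps the type information can hand the value back as a date, so range queries and sorting do not depend on string formats.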

Terminology

RDBMS            MongoDB
Table            Collection
Row(s)           JSON Document
Index            Index
Join             Embedding & Linking
Partition        Shard
Partition Key    Shard Key

Schema Design
Relational database

Schema Design
Document-based DB
› embedding
› linking

Schema Design
Document-based DB

Patterns

Inheritance

Inheritance - Table

id   type     area   radius   length   width
1    circle   3.14   1        NULL     NULL
2    square   4      NULL     2        NULL
3    rect     10     NULL     5        2

Inheritance - Document

> db.shapes.find()
{ _id: "1", type: "c", area: 3.14, radius: 1 }
{ _id: "2", type: "s", area: 4, length: 2 }
{ _id: "3", type: "r", area: 10, length: 5, width: 2 }

// Find shapes with radius > 0
> db.shapes.find( { radius: { $gt: 0 } } )
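
How application code can work with such mixed documents is sketched below on a plain array. The `areaByType` lookup is a hypothetical helper; the three shapes are the ones from the slide.

```javascript
const shapes = [
  { _id: "1", type: "c", area: 3.14, radius: 1 },
  { _id: "2", type: "s", area: 4, length: 2 },
  { _id: "3", type: "r", area: 10, length: 5, width: 2 },
];

// Counterpart of { radius: { $gt: 0 } }: documents without the field never match.
const withRadius = shapes.filter((s) => s.radius > 0);

// The type field acts as a discriminator that selects the behaviour in code.
const areaByType = {
  c: (s) => Math.PI * s.radius ** 2,
  s: (s) => s.length ** 2,
  r: (s) => s.length * s.width,
};
const recomputed = shapes.map((s) => ({ _id: s._id, area: areaByType[s.type](s) }));

console.log(withRadius.map((s) => s._id));
console.log(recomputed);
```

Unlike the table variant, no document carries NULL columns for attributes that belong to other shape types.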

One to Many

One to Many
Embedded Array

blogs: {
  author: "Johannes",
  date: ISODate("2011-09-18T09:56:06.298Z"),
  comments: [
    {
      author: "Klaus",
      date: ISODate("2011-09-19T09:56:06.298Z"),
      text: "toller Artikel"
    }
  ]
}

Everything is allowed!

One to Many
Normalized (2 collections)

blogs: { _id: 1000, author: "Johannes", date: ISODate("2011-09-18"), comments: [ { comment: 1 } ] }

comments: { _id: 1, blog: 1000, author: "Klaus", date: ISODate("2011-09-19") }

> blog = db.blogs.findOne({ text: "Destination Moon" });
> db.comments.find( { blog: blog._id } );
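
The trade-off between linking and embedding can be sketched with plain arrays standing in for the two collections. The extra comments and the `text` field on the blog are made up for the illustration.

```javascript
// Two "collections" held in memory; the data is made up.
const blogs = [{ _id: 1000, author: "Johannes", text: "Destination Moon" }];
const comments = [
  { _id: 1, blog: 1000, author: "Klaus" },
  { _id: 2, blog: 1000, author: "Anna" },
  { _id: 3, blog: 2000, author: "Ben" },
];

// Linked: one lookup for the blog, a second one for its comments.
const blog = blogs.find((b) => b.text === "Destination Moon");
const linkedComments = comments.filter((c) => c.blog === blog._id);

// Embedded: the same data as a single document that is read in one step.
const embedded = {
  ...blog,
  comments: linkedComments.map(({ blog: _parent, ...rest }) => rest),
};

console.log(linkedComments.length);
console.log(JSON.stringify(embedded));
```

Linking keeps comments independently addressable; embedding saves the second lookup at the cost of a larger parent document.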

Many - Many

// Each product links the IDs of its categories
products:
{ _id: 10, name: "Destination Moon", category_ids: [ 20, 30 ] }

// Each category links the IDs of its products
categories: { _id: 20, name: "adventure", product_ids: [ 10, 11, 12 ] }
categories: { _id: 21, name: "movie", product_ids: [ 10 ] }

// All categories for a product
> db.categories.find( { product_ids: 10 } )

Everything is allowed!

Alternative: Many - Many

// Each product links the IDs of its categories
products:
{ _id: 10, name: "Destination Moon", category_ids: [ 20, 30 ] }

// Categories contain no associations
categories: { _id: 20, name: "adventure" }

// All products for a category
> db.products.find( { category_ids: 20 } )

// All categories for a product
> product = db.products.findOne( { _id: some_id } )
> db.categories.find( { _id: { $in: product.category_ids } } )
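
Both directions of the one-sided variant can be sketched with plain arrays. The second product, the name of category 30 and the helper functions `productsIn` and `categoriesOf` are assumptions added for the illustration.

```javascript
// Only the products carry the association; extra entries are made up.
const products = [
  { _id: 10, name: "Destination Moon", category_ids: [20, 30] },
  { _id: 11, name: "Explorers on the Moon", category_ids: [20] },
];
const categories = [
  { _id: 20, name: "adventure" },
  { _id: 30, name: "comic" },
];

// All products for a category: one query on the array field.
const productsIn = (categoryId) =>
  products.filter((p) => p.category_ids.includes(categoryId));

// All categories for a product: load the product first, then an $in-style lookup.
function categoriesOf(productId) {
  const product = products.find((p) => p._id === productId);
  if (!product) return [];
  return categories.filter((c) => product.category_ids.includes(c._id));
}

console.log(productsIn(20).map((p) => p.name));
console.log(categoriesOf(10).map((c) => c.name));
```

Storing the link on one side only means a single place to update, paid for with a two-step lookup in the other direction.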

› JSON = BSON
› BSON in, BSON inside, BSON out
› Embedding or linking
› Everything is allowed *

Software Tests

Your code is broken
…until proven otherwise!

Unit Test Checklist

1. deserialize
2. map-reduce
3. queries

Most important things to test.

External Dependencies
Integration Tests

“Integration Tests are a Scam” J.B. Rainsberger

Usual Problems with Integration Tests

1. false red
   unpredictable, network down, software updates…

2. long running
   slow feedback, no feedback, false security

3. bad design
   excessive setup, AAA AAAAAAA, hides defects

4. comfortable
   managers, business constraints, pragmatic solutions, own laziness…
   Bugs will come back to haunt you!

Solutions
Or better: how to reduce the number of problems

1. false red
   Express
2. long running
3. bad design

In Memory Instance
Embedded Mode

using (var documentStore = new EmbeddableDocumentStore { RunInMemory = true }.Initialize())
{
    using (var session = documentStore.OpenSession())
    {
        // Run complex test scenarios
    }
}

Thank you very much!

› NoSQL: Einstieg in die Welt nicht-relationaler Web 2.0 Datenbanken
› MongoDB: The Definitive Guide
› MongoDB in Action
› RavenDB Mythology Documentation
  https://s3.amazonaws.com/daily-builds/RavenDBMythology-11.pdf

Image credits

Bug © 123RF Stock Foto
Cloud web © vege – Fotolia.com
Race car - red and black © braverabbit – Fotolia.com
PC - Computerkomponenten - Icons Nr. 1 © vanhorden – Fotolia.com
Der Ordner © beermedia – Fotolia.com
Ausgewählter Ordner © Spectral-Design – Fotolia.com
funny cartoon builder © artenot – Fotolia.com
3D rendering of an architecture model 2 © Franck Boston – Fotolia.com

All logos and trademarks used are the property of their registered owners.

BREAK!