Yearly Archives: 2016

Hadoop stream processing

Framework: Apache Storm | Spark | Apache Samza | Apache Flink | Apache Apex
Developer: Hortonworks (Twitter) | Databricks | LinkedIn | dataArtisans | DataTorrent
Computation model: Storm – streaming, Trident – micro-batching | Micro-batching | Streaming | Streaming or batching | Streaming with time boundaries
API: Storm – programmatic, Trident – declarative

Posted in General, BigData, Java, Messaging

Hadoop file format comparison

Use case and environment: IoT data lake use case. 6000 devices (each with a unique ID), measuring 3 values 60 times per second (60 Hertz). One day of data (24 hours) – records in database. Row in a table – [ID:int, timestamp:long,
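The daily record count for this setup can be estimated with simple arithmetic. The sketch below is only an illustration: it assumes that each device writes one row per sample (with all 3 measured values in that row), which the excerpt does not confirm, and the function name `records_per_day` is made up for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds in 24 hours


def records_per_day(devices: int, samples_per_second: int) -> int:
    """Rows written in 24 hours, assuming one row per device per sample."""
    return devices * samples_per_second * SECONDS_PER_DAY


if __name__ == "__main__":
    # 6000 devices sampling at 60 Hz, as described in the use case
    print(records_per_day(6000, 60))  # 31104000000
```

Under a different assumption, for example one row per measured value, the result would be three times larger.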

Posted in General, BigData, DWH

Hive vs Spark vs Impala

Version: Hive 0.13 | Spark 1.6 | Impala 2.1
Support: Hortonworks + Yahoo | DataBricks + Yahoo | Cloudera
Cluster Management: YARN | YARN, Mesos, local | YARN (Llama)
Engine: MR, Tez | Spark | impalad
Where are tables stored: HDFS | HDFS (through Hive Metastore). Distributed shared object space

Posted in General, BigData