Monthly Archives: April 2016

Hadoop file format comparison

Use case and environment: IoT data lake use case. 6000 devices (each with a unique ID), measuring 3 values 60 times per second (60 Hz). One day of data (24 hours): 31,104,000,000 records in the database. Row in a table: [ID:int, timestamp:long,
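The daily record count quoted in this excerpt can be checked with simple arithmetic. The sketch below assumes that each device writes one record per sampling tick and that the record carries all 3 measured values; that reading is an assumption, but it is the one that reproduces the stated total. The function and constant names are illustrative only.

```python
# Assumption: one record per device per tick, holding all 3 measured values.
DEVICES = 6000             # devices with a unique ID
SAMPLES_PER_SECOND = 60    # 60 Hz sampling rate
SECONDS_PER_DAY = 24 * 60 * 60


def records_per_day(devices=DEVICES, rate_hz=SAMPLES_PER_SECOND,
                    seconds=SECONDS_PER_DAY):
    """Return the number of records written in one day."""
    return devices * rate_hz * seconds


print(records_per_day())  # 31104000000
```

Changing `devices` or `rate_hz` gives a quick estimate of how the table grows with a different fleet size or sampling rate.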

Posted in General, BigData, DWH

Hive vs Spark vs Impala

Versions compared: Hive 0.13, Spark 1.6, Impala 2.1.
Support: Hortonworks + Yahoo (Hive); Databricks + Yahoo (Spark); Cloudera (Impala).
Cluster management: YARN (Hive); YARN, Mesos, local (Spark); YARN (Llama) (Impala).
Engine: MR, Tez (Hive); Spark (Spark); impalad (Impala).
Where are tables stored: HDFS (Hive); HDFS (through Hive Metastore). Distributed shared object space

Posted in General, BigData