Tag Archives: HBase I/O

How-to: Manage Time-Dependent Multilayer Networks in Apache Hadoop

Categories: Graph Processing Hadoop Use Case

Using an appropriate network representation and the right tool set are the key factors in successfully merging structured and time-series data for analysis.

In Part 1 of this series, you took your first steps in using Apache Giraph, the highly scalable graph-processing system, alongside Apache Hadoop. In this installment, you’ll explore a general use case for analyzing time-dependent Big Data graphs built from multiple data sources.


Introduction to Apache HBase Snapshots, Part 2: Deeper Dive

Categories: HBase

In Part 1 of this series about Apache HBase snapshots, you learned how to use the new Snapshots feature and a bit of theory behind the implementation. Now, it’s time to dive into the technical details a bit more deeply.

What is a Table?

An HBase table comprises a set of metadata and a set of key/value pairs:

  • Table Info: A manifest file that describes the table “settings”,


Apache HBase I/O – HFile

Categories: HBase

Introduction

Apache HBase is the open-source, distributed, versioned storage manager for Hadoop, well suited for random, realtime read/write access.

Wait, what? Random, realtime read/write access?
How is that possible? Isn't Hadoop just a sequential read/write, batch-processing system?

Yes, we’re talking about the same thing, and in the next few paragraphs, I’m going to explain how HBase achieves random I/O, how it stores data, and how the HFile format has evolved.
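To give a rough intuition for how sorted files on a sequential-oriented file system can still serve random reads, here is a minimal sketch under simplifying assumptions: key/value pairs are stored in sorted order, split into fixed-size blocks, and an in-memory index keeps the first key of each block. A lookup binary-searches the index to pick the single block that could hold the key, then searches only within that block. The class name `SortedBlockFile` and the `block_size` parameter are hypothetical; this is a toy model of the idea, not the actual HFile layout or any HBase API.

```python
from bisect import bisect_right


class SortedBlockFile:
    """Hypothetical toy model: sorted key/value pairs split into blocks,
    plus an index holding the first key of every block.
    This is NOT the real HFile on-disk format."""

    def __init__(self, items, block_size=4):
        pairs = sorted(items)  # keys are kept in sorted order
        self.blocks = [pairs[i:i + block_size]
                       for i in range(0, len(pairs), block_size)]
        # Block index: first key of each block, small enough to keep in memory.
        self.index = [block[0][0] for block in self.blocks]

    def get(self, key):
        # Find the last block whose first key is <= the requested key.
        pos = bisect_right(self.index, key) - 1
        if pos < 0:
            return None  # key sorts before every stored key
        # In a real store, only this one block would be read from disk.
        block = self.blocks[pos]
        keys = [k for k, _ in block]
        i = bisect_right(keys, key) - 1
        if i >= 0 and keys[i] == key:
            return block[i][1]
        return None
```

For example, with ten rows `row00`…`row09` and `block_size=3`, a lookup for `row05` touches only the index and the second block. The real HFile adds further machinery (such as a multi-level block index and Bloom filters) that this sketch deliberately leaves out.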
