Blog

Off-heap Memory in Apache Flink and the curious JIT compiler

16 Sep 2015 by Stephan Ewen (@stephanewen)

Running data-intensive code in the JVM and making it well-behaved is tricky. Systems that put billions of data objects naively onto the JVM heap face unpredictable OutOfMemoryErrors and Garbage Collection stalls. Of course, you still want to to keep your data in memory as much as possible, for speed and responsiveness of the processing applications. In that context, "off-heap" has become almost something like a magic word to solve these problems.

In this blog post, we will look at how Flink exploits off-heap memory. The feature is part of the upcoming release, but you can try it out with the latest nightly builds. We will also give a few interesting insights into the behavior for Java's JIT compiler for highly optimized methods and loops.

Continue reading »




Introducing Gelly: Graph Processing with Apache Flink

24 Aug 2015

This blog post introduces Gelly, Apache Flink’s graph-processing API and library. Flink’s native support for iterations makes it a suitable platform for large-scale graph analytics. By leveraging delta iterations, Gelly is able to map various graph processing models such as vertex-centric or gather-sum-apply to Flink dataflows.

Continue reading »


Announcing Apache Flink 0.9.0

24 Jun 2015

The Apache Flink community is pleased to announce the availability of the 0.9.0 release. The release is the result of many months of hard work within the Flink community. It contains many new features and improvements which were previewed in the 0.9.0-milestone1 release and have been polished since then. This is the largest Flink release so far.

Continue reading »



Juggling with Bits and Bytes

11 May 2015 by Fabian Hüske (@fhueske)

Nowadays, a lot of open-source systems for analyzing large data sets are implemented in Java or other JVM-based programming languages. The most well-known example is Apache Hadoop, but also newer frameworks such as Apache Spark, Apache Drill, and also Apache Flink run on JVMs. A common challenge that JVM-based data analysis engines face is to store large amounts of data in memory - both for caching and for efficient processing such as sorting and joining of data. Managing the JVM memory well makes the difference between a system that is hard to configure and has unpredictable reliability and performance and a system that behaves robustly with few configuration knobs.

In this blog post we discuss how Apache Flink manages memory, talk about its custom data de/serialization stack, and show how it operates on binary data.

Continue reading »


Announcing Flink 0.9.0-milestone1 preview release

13 Apr 2015

The Apache Flink community is pleased to announce the availability of the 0.9.0-milestone-1 release. The release is a preview of the upcoming 0.9.0 release. It contains many new features which will be available in the upcoming 0.9 release. Interested users are encouraged to try it out and give feedback. As the version number indicates, this release is a preview release that contains known issues.

Continue reading »



Peeking into Apache Flink's Engine Room

13 Mar 2015 by Fabian Hüske (@fhueske)

Joins are prevalent operations in many data processing applications. Most data processing systems feature APIs that make joining data sets very easy. However, the internal algorithms for join processing are much more involved – especially if large data sets need to be efficiently handled. In this blog post, we cut through Apache Flink’s layered architecture and take a look at its internals with a focus on how it handles joins.

Continue reading »