16 Sep 2015 by Stephan Ewen (@stephanewen)
Running data-intensive code in the JVM and making it well-behaved is tricky. Systems that naively put billions of data objects onto the JVM heap face unpredictable OutOfMemoryErrors and garbage collection stalls. Of course, you still want to keep your data in memory as much as possible, for the speed and responsiveness of the processing applications. In that context, "off-heap" has become something of a magic word for solving these problems.
In this blog post, we will look at how Flink exploits off-heap memory. The feature is part of the upcoming release, but you can try it out with the latest nightly builds. We will also give a few interesting insights into the behavior of Java's JIT compiler for highly optimized methods and loops.
Continue reading »
03 Sep 2015
Flink Forward 2015 is the first
conference with Flink at its center that aims to bring together the
Apache Flink community in a single place. The organizers are holding
the conference on October 12 and 13 in Berlin, the place where
Apache Flink started.
Continue reading »
01 Sep 2015
The Flink community is happy to announce that Flink 0.9.1 is now available.
Continue reading »
24 Aug 2015
This blog post introduces Gelly, Apache Flink’s graph-processing API and library. Flink’s native support
for iterations makes it a suitable platform for large-scale graph analytics.
By leveraging delta iterations, Gelly is able to map various graph processing models such as
vertex-centric or gather-sum-apply to Flink dataflows.
Continue reading »
24 Jun 2015
The Apache Flink community is pleased to announce the availability of the 0.9.0 release. The release is the result of many months of hard work within the Flink community. It contains many new features and improvements which were previewed in the 0.9.0-milestone-1 release and have been polished since then. This is the largest Flink release so far.
Continue reading »
14 May 2015 by Kostas Tzoumas (@kostas_tzoumas)
The monthly update from the Flink community, including the availability of a new preview release, lots of meetups and conference talks, and a great interview about Flink.
Continue reading »
11 May 2015 by Fabian Hüske (@fhueske)
Nowadays, a lot of open-source systems for analyzing large data sets are implemented in Java or other JVM-based programming languages. The most well-known example is Apache Hadoop, but newer frameworks such as Apache Spark, Apache Drill, and Apache Flink also run on JVMs. A common challenge that JVM-based data analysis engines face is storing large amounts of data in memory, both for caching and for efficient processing such as sorting and joining of data. Managing the JVM memory well makes the difference between a system that is hard to configure and has unpredictable reliability and performance, and a system that behaves robustly with few configuration knobs.
In this blog post we discuss how Apache Flink manages memory, talk about its custom data de/serialization stack, and show how it operates on binary data.
Continue reading »
13 Apr 2015
The Apache Flink community is pleased to announce the availability of
the 0.9.0-milestone-1 release. The release is a preview of the
upcoming 0.9.0 release. It contains many new features which will be
available in the upcoming 0.9 release. Interested users are encouraged
to try it out and give feedback. As the version number indicates, this
release is a preview release that contains known issues.
Continue reading »
07 Apr 2015
March has been a busy month in the Flink community.
Continue reading »
13 Mar 2015 by Fabian Hüske (@fhueske)
Joins are prevalent operations in many data processing applications. Most data processing systems feature APIs that make joining data sets very easy. However, the internal algorithms for join processing are much more involved – especially if large data sets need to be efficiently handled. In this blog post, we cut through Apache Flink’s layered architecture and take a look at its internals with a focus on how it handles joins.
Continue reading »