Apache Flink 0.7.0 available

04 Nov 2014

We are pleased to announce the availability of Flink 0.7.0. This release includes new user-facing features as well as performance and bug fixes, brings the Scala and Java APIs in sync, and introduces Flink Streaming. A total of 34 people have contributed to this release, a big thanks to all of them!

Download Flink 0.7.0 here

See the release changelog here

Overview of major new features

Flink Streaming: The gem of the 0.7.0 release is undoubtedly Flink Streaming. Available currently in alpha, Flink Streaming provides a Java API on top of Apache Flink that can consume streaming data sources (e.g., from Apache Kafka, Apache Flume, and others) and process them in real time. A dedicated blog post on Flink Streaming and its performance is coming up here soon. You can check out the Streaming programming guide here.

New Scala API: The Scala API has been completely rewritten. The Java and Scala APIs have now the same syntax and transformations and will be kept from now on in sync in every future release. See the new Scala API here.

Logical key expressions: You can now specify grouping and joining keys with logical names for member variables of POJO data types. For example, you can join two data sets as persons.join(cities).where(“zip”).equalTo(“zipcode”). Read more here.

Hadoop MapReduce compatibility: You can run unmodified Hadoop Mappers and Reducers (mapred API) in Flink, use all Hadoop data types, and read data with all Hadoop InputFormats.

Collection-based execution backend: The collection-based execution backend enables you to execute a Flink job as a simple Java collections program, bypassing completely the Flink runtime and optimizer. This feature is extremely useful for prototyping, and embedding Flink jobs in projects in a very lightweight manner.

Record API deprecated: The (old) Stratosphere Record API has been marked as deprecated and is planned for removal in the 0.9.0 release.

BLOB service: This release contains a new service to distribute jar files and other binary data among the JobManager, TaskManagers and the client.

Intermediate data sets: A major rewrite of the system internals introduces intermediate data sets as first class citizens. The internal state machine that tracks the distributed tasks has also been completely rewritten for scalability. While this is not visible as a user-facing feature yet, it is the foundation for several upcoming exciting features.

Note: Currently, there is limited support for Java 8 lambdas when compiling and running from an IDE. The problem is due to type erasure and whether Java compilers retain type information. We are currently working with the Eclipse and OpenJDK communities to resolve this.

Contributors

  • Tamas Ambrus
  • Mariem Ayadi
  • Marton Balassi
  • Daniel Bali
  • Ufuk Celebi
  • Hung Chang
  • David Eszes
  • Stephan Ewen
  • Judit Feher
  • Gyula Fora
  • Gabor Hermann
  • Fabian Hueske
  • Vasiliki Kalavri
  • Kristof Kovacs
  • Aljoscha Krettek
  • Sebastian Kruse
  • Sebastian Kunert
  • Matyas Manninger
  • Robert Metzger
  • Mingliang Qi
  • Till Rohrmann
  • Henry Saputra
  • Chesnay Schelper
  • Moritz Schubotz
  • Hung Sendoh Chang
  • Peter Szabo
  • Jonas Traub
  • Fabian Tschirschnitz
  • Artem Tsikiridis
  • Kostas Tzoumas
  • Timo Walther
  • Daniel Warneke
  • Tobias Wiens
  • Yingjun Wu