Announcing Apache Flink 1.1.0

08 Aug 2016

Important: The Maven artifacts published with version 1.1.0 on Maven central have a Hadoop dependency issue. It is highly recommended to use 1.1.1 or 1.1.1-hadoop1 as the Flink version.

The Apache Flink community is pleased to announce the availability of Flink 1.1.0.

This release is the first major release in the 1.X.X series of releases, which maintains API compatibility with 1.0.0. This means that your applications written against stable APIs of Flink 1.0.0 will compile and run with Flink 1.1.0. In total, 95 contributors provided bug fixes, improvements, and new features, resolving more than 450 JIRA issues. See the complete changelog for more details.

We encourage everyone to download the release and check out the documentation. Feedback through the Flink mailing lists is, as always, very welcome!

Some highlights of the release are listed in the following sections.

Connectors

The streaming connectors are a major part of Flink’s DataStream API. This release adds support for new external systems and further improves on the available connectors.

Continuous File System Sources

A frequently requested feature for Flink 1.0 was the ability to monitor directories and process files continuously. Flink 1.1 now adds support for this via FileProcessingModes:

DataStream<String> stream = env.readFile(
  textInputFormat,
  "hdfs:///file-path",
  FileProcessingMode.PROCESS_CONTINUOUSLY,
  5000, // monitoring interval (millis)
  FilePathFilter.createDefaultFilter()); // file path filter

This will monitor hdfs:///file-path every 5000 milliseconds. Check out the DataSource documentation for more details.

Kinesis Source and Sink

Flink 1.1 adds a Kinesis connector for both consuming from (FlinkKinesisConsumer) and producing to (FlinkKinesisProducer) Amazon Kinesis Streams, which is a managed service purpose-built to make it easy to work with streaming data on AWS.

DataStream<String> kinesis = env.addSource(
  new FlinkKinesisConsumer<>("stream-name", schema, config));
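
Writing to Kinesis works the other way around: a FlinkKinesisProducer is attached as a sink. A minimal sketch, assuming producerConfig is a Properties object holding your AWS credentials and region, and with the stream name as a placeholder:

FlinkKinesisProducer<String> producer =
  new FlinkKinesisProducer<>(new SimpleStringSchema(), producerConfig);
producer.setDefaultStream("stream-name"); // Kinesis stream to write to
producer.setDefaultPartition("0");        // default partition key
stream.addSink(producer);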

Check out the Kinesis connector documentation for more details.

Cassandra Sink

The Apache Cassandra sink allows you to write from Flink to Cassandra. Flink can provide exactly-once guarantees if the query is idempotent, meaning it can be applied multiple times without changing the result.

CassandraSink.addSink(input)
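
In practice, the sink builder is also configured with a CQL insert statement and a cluster builder before it is built. A minimal sketch, assuming a DataStream<Tuple2<String, Long>> named input and a locally running cluster (the keyspace, table, and contact point are placeholders):

CassandraSink.addSink(input)
  .setQuery("INSERT INTO example.values (id, counter) VALUES (?, ?);")
  .setClusterBuilder(new ClusterBuilder() {
    @Override
    public Cluster buildCluster(Cluster.Builder builder) {
      return builder.addContactPoint("127.0.0.1").build();
    }
  })
  .build();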

Check out the Cassandra Sink documentation for more details.

Table API and SQL

The Table API is a SQL-like expression language for relational stream and batch processing that can be easily embedded in Flink’s DataSet and DataStream APIs (for both Java and Scala).

Table custT = tableEnv
  .toTable(custDs, "name, zipcode")
  .where("zipcode = '12345'")
  .select("name")

An initial version of this API was already available in Flink 1.0. For Flink 1.1, the community put a lot of work into reworking the architecture of the Table API and integrating it with Apache Calcite.

In this first version, SQL (and Table API) queries on streams are limited to selection, filter, and union operators. Compared to Flink 1.0, the revised Table API supports many more scalar functions and is able to read tables from external sources and write them back to external sinks.

Table result = tableEnv.sql(
  "SELECT STREAM product, amount FROM Orders WHERE product LIKE '%Rubber%'");

A more detailed introduction can be found in the Flink blog and the Table API documentation.

DataStream API

The DataStream API now exposes session windows and allowed lateness as first-class citizens.

Session Windows

Session windows are ideal for cases where the window boundaries need to adjust to the incoming data. This enables you to have windows that start at individual points in time for each key and that end once there has been a certain period of inactivity. The configuration parameter is the session gap, which specifies how long to wait for new data before a session is considered closed.

input.keyBy(<key selector>)
    .window(EventTimeSessionWindows.withGap(Time.minutes(10)))
    .<windowed transformation>(<window function>);

Support for Late Elements

You can now specify how a windowed transformation should deal with late elements and how much lateness is allowed. The parameter for this, called allowed lateness, specifies by how much time elements may be late.

input.keyBy(<key selector>).window(<window assigner>)
    .allowedLateness(<time>)
    .<windowed transformation>(<window function>);

Elements that arrive within the allowed lateness are still put into windows and are considered when computing window results. If elements arrive after the allowed lateness, they will be dropped. Flink will also make sure that any state held by the windowing operation is garbage collected once the watermark passes the end of a window plus the allowed lateness.
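
For example, a one-minute tumbling event-time window that still accepts elements arriving up to 30 seconds late (again assuming a keyed tuple stream; the window size and lateness are illustrative):

input.keyBy(0)
    .window(TumblingEventTimeWindows.of(Time.minutes(1)))
    .allowedLateness(Time.seconds(30))
    .sum(1);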

Check out the Windows documentation for more details.

Scala API for Complex Event Processing (CEP)

Flink 1.0 added the initial version of the CEP library. The core of the library is a Pattern API, which allows you to easily specify patterns to match against in your event stream. While in Flink 1.0 this API was only available for Java, Flink 1.1 now exposes the same API for Scala, allowing you to specify your event patterns in a more concise manner.
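
To give a flavor of the Pattern API, the following Java sketch matches a temperature spike directly followed by a second spike within ten seconds; the Event class, its getTemperature() accessor, and the rackId key field are hypothetical, and the new Scala API expresses the same pattern more concisely:

Pattern<Event, ?> warning = Pattern.<Event>begin("first")
  .where(evt -> evt.getTemperature() > 100.0)
  .next("second")
  .where(evt -> evt.getTemperature() > 100.0)
  .within(Time.seconds(10));
PatternStream<Event> patternStream = CEP.pattern(input.keyBy("rackId"), warning);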

A more detailed introduction can be found in the Flink blog and the CEP documentation.

Graph generators and new Gelly library algorithms

This release includes many enhancements and new features for graph processing. Gelly now provides a collection of scalable graph generators for common graph types, such as complete, cycle, grid, hypercube, and RMat graphs. A variety of new graph algorithms have been added to the Gelly library, including Global and Local Clustering Coefficient, HITS, and similarity measures (Jaccard and Adamic-Adar).
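
As a small example, the generators are driven by the ExecutionEnvironment and a vertex count. The following sketch builds a complete graph with 1,000 vertices (the count is arbitrary; class and type names follow the Gelly generator package):

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
Graph<LongValue, NullValue, NullValue> graph =
  new CompleteGraph(env, 1000).generate();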

For a full list of new graph processing features, check out the Gelly documentation.

Metrics

Flink’s new metrics system allows you to easily gather and expose metrics from your user application to external systems. You can add counters, gauges, and histograms to your application via the runtime context:

Counter counter = getRuntimeContext()
  .getMetricGroup()
  .counter("my-counter");
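
Metrics are typically registered in the open() method of a rich function and updated while records are processed. A minimal sketch (the function and metric names are illustrative):

public class MyMapper extends RichMapFunction<String, String> {
  private Counter counter;

  @Override
  public void open(Configuration parameters) {
    // register the counter once per parallel task instance
    this.counter = getRuntimeContext()
      .getMetricGroup()
      .counter("my-counter");
  }

  @Override
  public String map(String value) {
    this.counter.inc(); // count each processed record
    return value;
  }
}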

All registered metrics will be exposed via reporters. Out of the box, Flink comes with support for JMX, Ganglia, Graphite, and StatsD. In addition to your custom metrics, Flink exposes many internal metrics like checkpoint sizes and JVM stats.

Check out the Metrics documentation for more details.

List of Contributors

The following 95 people contributed to this release:

  • Abdullah Ozturk
  • Ajay Bhat
  • Alexey Savartsov
  • Aljoscha Krettek
  • Andrea Sella
  • Andrew Palumbo
  • Chenguang He
  • Chiwan Park
  • David Moravek
  • Dominik Bruhn
  • Dyana Rose
  • Fabian Hueske
  • Flavio Pompermaier
  • Gabor Gevay
  • Gabor Horvath
  • Geoffrey Mon
  • Gordon Tai
  • Greg Hogan
  • Gyula Fora
  • Henry Saputra
  • Ignacio N. Lucero Ascencio
  • Igor Berman
  • Ismaël Mejía
  • Ivan Mushketyk
  • Jark Wu
  • Jiri Simsa
  • Jonas Traub
  • Josh
  • Joshi
  • Joshua Herman
  • Ken Krugler
  • Konstantin Knauf
  • Lasse Dalegaard
  • Li Fanxi
  • MaBiao
  • Mao Wei
  • Mark Reddy
  • Martin Junghanns
  • Martin Liesenberg
  • Maximilian Michels
  • Michal Fijolek
  • Márton Balassi
  • Nathan Howell
  • Niels Basjes
  • Niels Zeilemaker
  • Phetsarath, Sourigna
  • Robert Metzger
  • Scott Kidder
  • Sebastian Klemke
  • Shahin
  • Shannon Carey
  • Shannon Quinn
  • Stefan Richter
  • Stefano Baghino
  • Stefano Bortoli
  • Stephan Ewen
  • Steve Cosenza
  • Sumit Chawla
  • Tatu Saloranta
  • Tianji Li
  • Till Rohrmann
  • Todd Lisonbee
  • Tony Baines
  • Trevor Grant
  • Ufuk Celebi
  • Vasudevan
  • Yijie Shen
  • Zack Pierce
  • Zhai Jia
  • chengxiang li
  • chobeat
  • danielblazevski
  • dawid
  • dawidwys
  • eastcirclek
  • erli ding
  • gallenvara
  • kl0u
  • mans2singh
  • markreddy
  • mjsax
  • nikste
  • omaralvarez
  • philippgrulich
  • ramkrishna
  • sahitya-pavurala
  • samaitra
  • smarthi
  • spkavuly
  • subhankar
  • twalthr
  • vasia
  • xueyan.li
  • zentol
  • 卫乐