Announcing Apache Flink 1.1.0

08 Aug 2016

Important: The Maven artifacts published with version 1.1.0 on Maven central have a Hadoop dependency issue. It is highly recommended to use 1.1.1 or 1.1.1-hadoop1 as the Flink version.

The Apache Flink community is pleased to announce the availability of Flink 1.1.0.

This release is the first major release in the 1.X.X series of releases, which maintains API compatibility with 1.0.0. This means that your applications written against stable APIs of Flink 1.0.0 will compile and run with Flink 1.1.0. In total, 95 contributors provided bug fixes, improvements, and new features, resolving more than 450 JIRA issues. See the complete changelog for more details.

We encourage everyone to download the release and check out the documentation. Feedback through the Flink mailing lists is, as always, very welcome!

Some highlights of the release are listed in the following sections.

Connectors

The streaming connectors are a major part of Flink’s DataStream API. This release adds support for new external systems and further improves on the available connectors.

Continuous File System Sources

A frequently requested feature for Flink 1.0 was the ability to monitor directories and process files continuously. Flink 1.1 now adds support for this via FileProcessingModes:

DataStream<String> stream = env.readFile(
  textInputFormat,
  "hdfs:///file-path",
  FileProcessingMode.PROCESS_CONTINUOUSLY,
  5000, // monitoring interval (millis)
  FilePathFilter.createDefaultFilter()); // file path filter

This will monitor hdfs:///file-path every 5000 milliseconds. Check out the DataSource documentation for more details.

Kinesis Source and Sink

Flink 1.1 adds a Kinesis connector for both consuming from (FlinkKinesisConsumer) and producing to (FlinkKinesisProducer) Amazon Kinesis Streams, which is a managed service purpose-built to make it easy to work with streaming data on AWS.

DataStream<String> kinesis = env.addSource(
  new FlinkKinesisConsumer<>("stream-name", schema, config));
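
Writing to Kinesis works the other way around: a FlinkKinesisProducer is attached as a sink. A minimal sketch, assuming producerConfig is a Properties object holding your AWS credentials and region, and with the stream name as a placeholder:

FlinkKinesisProducer<String> producer =
  new FlinkKinesisProducer<>(new SimpleStringSchema(), producerConfig);
producer.setDefaultStream("stream-name"); // Kinesis stream to write to
producer.setDefaultPartition("0");        // default partition key
stream.addSink(producer);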

Check out the Kinesis connector documentation for more details.

Cassandra Sink

The Apache Cassandra sink allows you to write from Flink to Cassandra. Flink can provide exactly-once guarantees if the query is idempotent, meaning it can be applied multiple times without changing the result.

CassandraSink.addSink(input)
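
In practice, the sink builder is also configured with a CQL insert statement and a cluster builder before it is built. A minimal sketch, assuming a DataStream<Tuple2<String, Long>> named input and a locally running cluster (the keyspace, table, and contact point are placeholders):

CassandraSink.addSink(input)
  .setQuery("INSERT INTO example.values (id, counter) VALUES (?, ?);")
  .setClusterBuilder(new ClusterBuilder() {
    @Override
    public Cluster buildCluster(Cluster.Builder builder) {
      return builder.addContactPoint("127.0.0.1").build();
    }
  })
  .build();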

Check out the Cassandra Sink documentation for more details.

Table API and SQL

The Table API is a SQL-like expression language for relational stream and batch processing that can be easily embedded in Flink’s DataSet and DataStream APIs (for both Java and Scala).

Table custT = tableEnv
  .toTable(custDs, "name, zipcode")
  .where("zipcode = '12345'")
  .select("name")

An initial version of this API was already available in Flink 1.0. For Flink 1.1, the community put a lot of work into reworking the architecture of the Table API and integrating it with Apache Calcite.

In this first version, SQL (and Table API) queries on streams are limited to selection, filter, and union operators. Compared to Flink 1.0, the revised Table API supports many more scalar functions and is able to read tables from external sources and write them back to external sinks.

Table result = tableEnv.sql(
  "SELECT STREAM product, amount FROM Orders WHERE product LIKE '%Rubber%'");

A more detailed introduction can be found in the Flink blog and the Table API documentation.

DataStream API

The DataStream API now exposes session windows and allowed lateness as first-class citizens.

Session Windows

Session windows are ideal for cases where the window boundaries need to adjust to the incoming data. This enables you to have windows that start at individual points in time for each key and that end once there has been a certain period of inactivity. The configuration parameter is the session gap, which specifies how long to wait for new data before a session is considered closed.

input.keyBy(<key selector>)
    .window(EventTimeSessionWindows.withGap(Time.minutes(10)))
    .<windowed transformation>(<window function>);

Support for Late Elements

You can now specify how a windowed transformation should deal with late elements and how much lateness is allowed. The parameter for this, called allowed lateness, specifies by how much time elements may be late.

input.keyBy(<key selector>).window(<window assigner>)
    .allowedLateness(<time>)
    .<windowed transformation>(<window function>);

Elements that arrive within the allowed lateness are still put into windows and are considered when computing window results. If elements arrive after the allowed lateness, they will be dropped. Flink will also make sure that any state held by the windowing operation is garbage collected once the watermark passes the end of a window plus the allowed lateness.
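
For example, a one-minute tumbling event-time window that still accepts elements arriving up to 30 seconds late (again assuming a keyed tuple stream; the window size and lateness are illustrative):

input.keyBy(0)
    .window(TumblingEventTimeWindows.of(Time.minutes(1)))
    .allowedLateness(Time.seconds(30))
    .sum(1);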

Check out the Windows documentation for more details.

Scala API for Complex Event Processing (CEP)

Flink 1.0 added the initial version of the CEP library. The core of the library is a Pattern API, which allows you to easily specify patterns to match against in your event stream. While in Flink 1.0 this API was only available for Java, Flink 1.1 now exposes the same API for Scala, allowing you to specify your event patterns in a more concise manner.
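
To give a flavor of the Pattern API, the following Java sketch matches a temperature spike directly followed by a second spike within ten seconds; the Event class, its getTemperature() accessor, and the rackId key field are hypothetical, and the new Scala API expresses the same pattern more concisely:

Pattern<Event, ?> warning = Pattern.<Event>begin("first")
  .where(evt -> evt.getTemperature() > 100.0)
  .next("second")
  .where(evt -> evt.getTemperature() > 100.0)
  .within(Time.seconds(10));
PatternStream<Event> patternStream = CEP.pattern(input.keyBy("rackId"), warning);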

A more detailed introduction can be found in the Flink blog and the CEP documentation.

Graph generators and new Gelly library algorithms

This release includes many enhancements and new features for graph processing. Gelly now provides a collection of scalable graph generators for common graph types, such as complete, cycle, grid, hypercube, and RMat graphs. A variety of new graph algorithms have been added to the Gelly library, including Global and Local Clustering Coefficient, HITS, and similarity measures (Jaccard and Adamic-Adar).
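
As a small example, the generators are driven by the ExecutionEnvironment and a vertex count. The following sketch builds a complete graph with 1,000 vertices (the count is arbitrary; class and type names follow the Gelly generator package):

ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
Graph<LongValue, NullValue, NullValue> graph =
  new CompleteGraph(env, 1000).generate();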

For a full list of new graph processing features, check out the Gelly documentation.

Metrics

Flink’s new metrics system allows you to easily gather and expose metrics from your user application to external systems. You can add counters, gauges, and histograms to your application via the runtime context:

Counter counter = getRuntimeContext()
  .getMetricGroup()
  .counter("my-counter");
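
Metrics are typically registered in the open() method of a rich function and updated while records are processed. A minimal sketch (the function and metric names are illustrative):

public class MyMapper extends RichMapFunction<String, String> {
  private Counter counter;

  @Override
  public void open(Configuration parameters) {
    // register the counter once per parallel task instance
    this.counter = getRuntimeContext()
      .getMetricGroup()
      .counter("my-counter");
  }

  @Override
  public String map(String value) {
    this.counter.inc(); // count each processed record
    return value;
  }
}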

All registered metrics will be exposed via reporters. Out of the box, Flink comes with support for JMX, Ganglia, Graphite, and StatsD. In addition to your custom metrics, Flink exposes many internal metrics like checkpoint sizes and JVM stats.

Check out the Metrics documentation for more details.

List of Contributors

The following 95 people contributed to this release:

  • Abdullah Ozturk
  • Ajay Bhat
  • Alexey Savartsov
  • Aljoscha Krettek
  • Andrea Sella
  • Andrew Palumbo
  • Chenguang He
  • Chiwan Park
  • David Moravek
  • Dominik Bruhn
  • Dyana Rose
  • Fabian Hueske
  • Flavio Pompermaier
  • Gabor Gevay
  • Gabor Horvath
  • Geoffrey Mon
  • Gordon Tai
  • Greg Hogan
  • Gyula Fora
  • Henry Saputra
  • Ignacio N. Lucero Ascencio
  • Igor Berman
  • Ismaël Mejía
  • Ivan Mushketyk
  • Jark Wu
  • Jiri Simsa
  • Jonas Traub
  • Josh
  • Joshi
  • Joshua Herman
  • Ken Krugler
  • Konstantin Knauf
  • Lasse Dalegaard
  • Li Fanxi
  • MaBiao
  • Mao Wei
  • Mark Reddy
  • Martin Junghanns
  • Martin Liesenberg
  • Maximilian Michels
  • Michal Fijolek
  • Márton Balassi
  • Nathan Howell
  • Niels Basjes
  • Niels Zeilemaker
  • Phetsarath, Sourigna
  • Robert Metzger
  • Scott Kidder
  • Sebastian Klemke
  • Shahin
  • Shannon Carey
  • Shannon Quinn
  • Stefan Richter
  • Stefano Baghino
  • Stefano Bortoli
  • Stephan Ewen
  • Steve Cosenza
  • Sumit Chawla
  • Tatu Saloranta
  • Tianji Li
  • Till Rohrmann
  • Todd Lisonbee
  • Tony Baines
  • Trevor Grant
  • Ufuk Celebi
  • Vasudevan
  • Yijie Shen
  • Zack Pierce
  • Zhai Jia
  • chengxiang li
  • chobeat
  • danielblazevski
  • dawid
  • dawidwys
  • eastcirclek
  • erli ding
  • gallenvara
  • kl0u
  • mans2singh
  • markreddy
  • mjsax
  • nikste
  • omaralvarez
  • philippgrulich
  • ramkrishna
  • sahitya-pavurala
  • samaitra
  • smarthi
  • spkavuly
  • subhankar
  • twalthr
  • vasia
  • xueyan.li
  • zentol
  • 卫乐