Ecosystem


Apache Flink supports a broad ecosystem and integrates with many other data processing projects and frameworks.

Connectors

Connectors provide code for interfacing with various third-party systems.

Currently these systems are supported:

To run an application that uses one of these connectors, additional third-party components usually have to be installed and launched, e.g. the servers for the message queues. Further instructions can be found in the corresponding subsections.

Third-Party Projects

This is a list of third-party packages (i.e., libraries, system extensions, or examples) built on Flink. The Flink community collects links to these packages but does not maintain them. They therefore do not belong to the Apache Flink project, and the community cannot provide support for them. Is your project missing? Please let us know on the user/dev mailing list.

Apache Zeppelin

Apache Zeppelin (incubator) is a web-based notebook that enables interactive data analytics and can use Flink as an execution engine (alongside other engines). See also Jim Dowling’s Flink Forward talk about Zeppelin on Flink.

Apache Mahout

Apache Mahout is a machine learning library that will feature Flink as an execution engine soon. Check out Sebastian Schelter’s Flink Forward talk about the Mahout-Samsara DSL.

Cascading

Cascading enables a user to build complex workflows easily on Flink and other execution engines. Cascading on Flink is built by dataArtisans and Driven, Inc. See Fabian Hueske’s Flink Forward talk for more details.

Apache Beam (incubating)

Apache Beam (incubating) is an open source, unified programming model that you can use to create a data processing pipeline. Flink is one of the back-ends supported by the Beam programming model.

GRADOOP

GRADOOP enables scalable graph analytics on top of Flink and is developed at Leipzig University. Check out Martin Junghanns’ Flink Forward talk.

BigPetStore

BigPetStore is a benchmarking suite that includes a data generator; it will be available for Flink soon. See Suneel Marthi’s Flink Forward talk for a preview.

FastR

FastR is an implementation of the R language in Java. FastR Flink executes R workloads on top of Flink.

Apache SAMOA

Apache SAMOA (incubating) is a streaming ML library that will feature Flink as an execution engine soon. Albert Bifet introduced SAMOA on Flink in his Flink Forward talk.

Python Examples on Flink

A collection of examples using Apache Flink’s Python API.

WordCount Example in Clojure

A small WordCount example that shows how to write a Flink program in Clojure.
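For readers unfamiliar with WordCount, the computation itself can be sketched in a few lines of plain Python. This is only an illustration of the logic (split lines into words, group by word, sum the counts). It does not use the Flink API and is not taken from the linked Clojure project. The function name `word_count` is a hypothetical name chosen for this sketch.

```python
from collections import Counter
import re


def word_count(lines):
    """Count word occurrences across an iterable of text lines.

    Hypothetical helper for illustration only; it runs locally
    and does not use Flink.
    """
    counts = Counter()
    for line in lines:
        # Split step: lowercase the line and extract word tokens.
        words = re.findall(r"\w+", line.lower())
        # Group-and-sum step: Counter adds 1 for each occurrence.
        counts.update(words)
    return dict(counts)


if __name__ == "__main__":
    sample = ["To be, or not to be", "that is the question"]
    for word, count in sorted(word_count(sample).items()):
        print(word, count)
```

In a Flink program the same three steps are expressed as distributed operators on a data set or stream rather than as a local loop.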

Anomaly Detection and Prediction in Flink

flink-htm is a library for anomaly detection and prediction in Apache Flink. The algorithms are based on Hierarchical Temporal Memory (HTM) as implemented by the Numenta Platform for Intelligent Computing (NuPIC).

Apache Ignite

Apache Ignite is a high-performance, integrated and distributed in-memory platform for computing and transacting on large-scale data sets in real time. See the Flink streaming sink connector for injecting data into an Ignite cache.