To demonstrate how Flink can be applied to unbounded datasets, here’s a selection of real-world Flink users and the problems they’re solving with Flink.
For more examples, please see the Powered by Flink page.
Optimization of e-commerce search results in real time: Alibaba’s search infrastructure team uses Flink to update product detail and inventory information in real time, improving relevance for users.
Stream processing-as-a-service for data science teams: King (the creators of Candy Crush Saga) makes real-time analytics available to its data scientists via a Flink-powered internal platform, dramatically shortening the time to insights from game data.
Network / sensor monitoring and error detection: Bouygues Telecom, one of the largest telecom providers in France, uses Flink to monitor its wired and wireless networks, enabling a rapid response to outages throughout the country.
ETL for business intelligence infrastructure: Zalando uses Flink to transform data for easier loading into its data warehouse, converting complex payloads into relatively simple ones and ensuring that analytics end users have faster access to data.
We can tease out common threads from these use cases. Based on the examples above, Flink is well-suited for:
A variety of (sometimes unreliable) data sources: When data is generated by millions of different users or devices, it’s safe to assume that some events will arrive out of the order in which they actually occurred, and in the case of more significant upstream failures, some events might arrive hours later than they’re supposed to. Late data needs to be handled so that results are accurate.
Applications with state: When an application does more than simply filter or enrich single data records, managing its state (e.g., counters, windows of past data, state machines, embedded databases) becomes hard. Flink provides tools that make state efficient, fault tolerant, and manageable from the outside, so you don’t have to build these capabilities yourself.
Data that is processed quickly: These use cases focus on real-time or near-real-time scenarios, where insights from data should be available at nearly the same moment that the data is generated. Flink is capable of meeting these latency requirements when necessary.
Data in large volumes: These programs need to be distributed across many nodes (in some cases, thousands) to support the required scale. Flink can run on large clusters just as seamlessly as it runs on small ones.
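The first two requirements in this list, out-of-order or late events and application state, can be made concrete with a small sketch. The following is plain Python and does not use Flink’s API. It assumes a "bounded out-of-orderness" watermark, a running estimate of event-time progress that trails the largest timestamp seen so far by a fixed delay, similar in spirit to the watermarks Flink uses. The names `Event`, `BoundedOutOfOrdernessCounter`, and `max_delay`, as well as the choice to simply set late events aside, are hypothetical and chosen for illustration. The per-key window counts are the kind of state the second item describes; unlike Flink, this sketch makes no attempt to keep that state fault tolerant.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    key: str        # e.g. a user or device id
    timestamp: int  # event time: when the event occurred, not when it arrived


class BoundedOutOfOrdernessCounter:
    """Hypothetical per-key counter over tumbling event-time windows.

    The watermark trails the largest timestamp seen so far by `max_delay`.
    A window [start, start + window_size) is emitted once the watermark
    reaches its end; events for an already-emitted window count as late.
    """

    def __init__(self, window_size: int, max_delay: int) -> None:
        self.window_size = window_size
        self.max_delay = max_delay
        self.max_timestamp: Optional[int] = None
        # Application state: (key, window_start) -> number of events so far.
        self.counts: dict[tuple[str, int], int] = defaultdict(int)
        # Events whose window was already emitted; this sketch just keeps them.
        self.late: list[Event] = []

    def watermark(self) -> Optional[int]:
        if self.max_timestamp is None:
            return None
        return self.max_timestamp - self.max_delay

    def process(self, event: Event) -> list[tuple[str, int, int]]:
        """Ingest one event; return (key, window_start, count) for finished windows."""
        start = event.timestamp - event.timestamp % self.window_size
        wm = self.watermark()
        if wm is not None and start + self.window_size <= wm:
            self.late.append(event)  # its window has already been emitted
        else:
            self.counts[(event.key, start)] += 1
        if self.max_timestamp is None or event.timestamp > self.max_timestamp:
            self.max_timestamp = event.timestamp
        return self._fire()

    def _fire(self) -> list[tuple[str, int, int]]:
        # Emit (and drop from state) every window whose end the watermark has reached.
        wm = self.watermark()
        ready = sorted(k for k in self.counts if k[1] + self.window_size <= wm)
        return [(key, start, self.counts.pop((key, start))) for key, start in ready]


if __name__ == "__main__":
    counter = BoundedOutOfOrdernessCounter(window_size=10, max_delay=5)
    # ("a", 9) arrives after ("b", 12): out of order, but within max_delay.
    # ("a", 3) arrives after the watermark has passed the end of window [0, 10).
    arrivals = [("a", 1), ("a", 4), ("b", 12), ("a", 9), ("a", 17), ("a", 3), ("b", 26)]
    results = []
    for key, ts in arrivals:
        results.extend(counter.process(Event(key, ts)))
    print(results)       # [('a', 0, 3), ('a', 10, 1), ('b', 10, 1)]
    print(counter.late)  # [Event(key='a', timestamp=3)]
```

Here the out-of-order event `("a", 9)` is still counted because it arrives within the assumed delay bound, while `("a", 3)` arrives after its window’s result was emitted and is set aside. Whether late events should be dropped, used to update an earlier result, or routed elsewhere is an application decision; Flink itself offers options such as allowed lateness and side outputs for late data.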
For more user stories, we recommend the sessions from Flink Forward 2016, the annual Flink user conference.