This documentation is for an unreleased version of Apache Flink. We recommend you use the latest stable version.
Flink APIs #
Apache Flink provides two main APIs for building streaming and batch applications: the Table API and the DataStream API. Both APIs can be used with Java, Scala, and Python.
Choosing the Right Approach #
Flink offers a spectrum from fully declarative to fully imperative programming:
| Approach | Description | When to Use |
|---|---|---|
| Declarative (SQL / Table API) | Built-in relational operators | Standard ETL, analytics, joins, aggregations |
| Extended (+ User-Defined Functions) | Custom functions within declarative pipelines | When built-in functions don’t cover your needs |
| Hybrid (+ ProcessTableFunction) | Low-level control for specific operations | Event-driven patterns, state/timers within a Table pipeline |
| Imperative (DataStream API) | Full control over the entire application | Custom operators, fine-grained state and timer logic, non-relational workloads |
Start Declarative #
For most use cases, start with SQL or the Table API:
- Flink optimizes your queries automatically
- Built-in operators cover common patterns (joins, aggregations, windows)
- Pattern matching is available via `MATCH_RECOGNIZE`
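
As a sketch of the declarative style, the following Flink SQL query computes hourly totals with the `TUMBLE` windowing table-valued function. The `orders` table and its columns (`order_time`, `amount`) are hypothetical placeholders:

```sql
-- Tumbling-window aggregation over a hypothetical 'orders' table.
-- Flink plans and optimizes the windowing and aggregation automatically.
SELECT window_start, window_end, SUM(amount) AS total
FROM TABLE(
    TUMBLE(TABLE orders, DESCRIPTOR(order_time), INTERVAL '1' HOUR))
GROUP BY window_start, window_end;
```

The same result in the DataStream API would require explicitly wiring up key selection, window assignment, and an aggregate function by hand.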
Extend When Needed #
When built-in capabilities aren’t enough:
- User-Defined Functions (UDFs): Add custom scalar, table, or aggregate functions
- ProcessTableFunction: Access state and timers for specific parts of your pipeline while staying in the Table API
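A UDF slots into a declarative pipeline like any built-in function. As a sketch, assuming a Java scalar function implemented in a hypothetical class `com.example.ParseRegion` and a hypothetical `customers` table:

```sql
-- Register the (hypothetical) Java implementation under a SQL name...
CREATE TEMPORARY FUNCTION parse_region
    AS 'com.example.ParseRegion' LANGUAGE JAVA;

-- ...then call it exactly like a built-in function.
SELECT parse_region(address) AS region, COUNT(*) AS cnt
FROM customers
GROUP BY parse_region(address);
```

Only the function body is imperative; the surrounding query remains declarative and fully subject to Flink's optimizer.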
Go Imperative for Full Control #
Use the DataStream API when:
- You want to build the entire application imperatively
- Your use case doesn’t fit the relational/table abstraction
- You need control over aspects that the Table API doesn’t expose
Comparing the APIs #
| | Table API | DataStream API |
|---|---|---|
| Paradigm | Declarative (what to compute) | Imperative (how to compute) |
| Abstraction | Relational operations on tables | Stream transformations |
| Optimization | Automatic query optimization | Manual optimization |
| State management | Automatic (with ProcessTableFunction for custom state) | Manual (fine-grained control) |
| SQL integration | Full SQL support | Limited |
Mixing APIs #
The Table API and DataStream API can be used together. You can:
- Convert a DataStream to a Table for relational operations
- Convert a Table back to a DataStream for low-level processing
- Use SQL within a DataStream application
See DataStream API Integration for details.
Where to Go Next #
- Table API: Declarative API for relational operations.
- DataStream API: Imperative API for stream processing.
- Configuration: Project setup and dependencies.
Getting Started Tutorials #
- Flink SQL Tutorial: Get started with SQL (no programming required).
- Table API Tutorial: Build a streaming pipeline with the Table API.
- DataStream API Tutorial: Build an event-driven application with the DataStream API.