This documentation is for an unreleased version of Apache Flink. We recommend you use the latest stable version.
Orc Format #
Format: Serialization Schema Format: Deserialization Schema
The Apache Orc format supports reading and writing Orc data.
Dependencies #
To use the ORC format, the following dependencies are required, both for projects using a build automation tool (such as Maven or SBT) and for the SQL Client with SQL JAR bundles.
| Maven dependency | SQL Client |
|---|---|
| | Only available for stable releases. |
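For Maven-based projects, the dependency declaration typically looks like the following sketch. The artifact name `flink-orc` and the version placeholder are assumptions here; verify them against the dependency table above for your Flink release.

```xml
<!-- Sketch of the ORC format dependency for a Maven project.
     The artifact name and version placeholder are assumptions;
     check the dependency table for your Flink release. -->
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-orc</artifactId>
  <version><!-- your Flink version --></version>
</dependency>
```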
How to create a table with Orc format #
Here is an example of creating a table using the Filesystem connector and the Orc format.
CREATE TABLE user_behavior (
user_id BIGINT,
item_id BIGINT,
category_id BIGINT,
behavior STRING,
ts TIMESTAMP(3),
dt STRING
) PARTITIONED BY (dt) WITH (
'connector' = 'filesystem',
'path' = '/tmp/user_behavior',
'format' = 'orc'
)
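As a usage sketch, assuming the table above has been created, rows can be written to and read back from the Orc files with regular SQL statements; the sample values below are illustrative only.

```sql
-- Write a row into the partitioned Orc table (sample values are made up).
INSERT INTO user_behavior
VALUES (1001, 2002, 3003, 'buy', TIMESTAMP '2024-01-01 12:00:00', '2024-01-01');

-- Read it back; Flink scans the Orc files under /tmp/user_behavior.
SELECT user_id, behavior
FROM user_behavior
WHERE dt = '2024-01-01';
```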
Format Options #
| Option | Required | Default | Type | Description |
|---|---|---|---|---|
| format | required | (none) | String | Specify what format to use; here it should be 'orc'. |
The Orc format also supports the table properties defined by Apache Orc (see Orc's Table properties documentation).
For example, you can configure orc.compress=SNAPPY to enable snappy compression.
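For example, a minimal sketch of passing an Orc table property through the WITH clause; the table name and path here are illustrative assumptions:

```sql
-- Illustrative table; the Orc table property 'orc.compress' is passed
-- through the WITH clause alongside the connector options.
CREATE TABLE compressed_behavior (
  user_id BIGINT,
  behavior STRING
) WITH (
  'connector' = 'filesystem',
  'path' = '/tmp/compressed_behavior',  -- illustrative path
  'format' = 'orc',
  'orc.compress' = 'SNAPPY'             -- enable snappy compression
);
```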
Data Type Mapping #
Orc format type mapping is compatible with Apache Hive. The following table lists the type mapping from Flink type to Orc type.
| Flink Data Type | Orc physical type | Orc logical type |
|---|---|---|
| CHAR | bytes | CHAR |
| VARCHAR | bytes | VARCHAR |
| STRING | bytes | STRING |
| BOOLEAN | long | BOOLEAN |
| BYTES | bytes | BINARY |
| DECIMAL | decimal | DECIMAL |
| TINYINT | long | BYTE |
| SMALLINT | long | SHORT |
| INT | long | INT |
| BIGINT | long | LONG |
| FLOAT | double | FLOAT |
| DOUBLE | double | DOUBLE |
| DATE | long | DATE |
| TIMESTAMP | timestamp | TIMESTAMP |
| ARRAY | - | LIST |
| MAP | - | MAP |
| ROW | - | STRUCT |
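As an illustrative sketch of the mapping above, a table whose columns cover several of the listed Flink types (all names and the path are made up) would be stored with the corresponding Orc logical types:

```sql
-- Each column's Orc logical type follows the mapping table above.
CREATE TABLE typed_events (
  id BIGINT,                                 -- Orc logical type LONG
  price DECIMAL(10, 2),                      -- Orc logical type DECIMAL
  tags ARRAY<STRING>,                        -- Orc logical type LIST
  attrs MAP<STRING, STRING>,                 -- Orc logical type MAP
  meta ROW<source STRING, ts TIMESTAMP(3)>   -- Orc logical type STRUCT
) WITH (
  'connector' = 'filesystem',
  'path' = '/tmp/typed_events',              -- illustrative path
  'format' = 'orc'
);
```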