zarrow

zarrow

English 中文

Apache Arrow, built natively for Zig.

zarrow is a Zig-native implementation of Apache Arrow for zero-copy, columnar data interchange. It provides Arrow-compatible array layouts, builders, validation, IPC stream support, and Arrow C Data / C Stream ABI interoperability without wrapping C++ or Rust.

The goal is to make Zig a serious option for Arrow-based systems, analytics tooling, storage engines, query engines, and data interoperability work.


Why zarrow?

Apache Arrow is one of the core foundations of modern data infrastructure. It defines a language-independent columnar memory format used across analytics engines, databases, dataframe libraries, and storage systems.

Zig has strong properties for this space: explicit memory management, predictable performance, simple cross-language boundaries, and excellent systems-level control. zarrow exists to bring Arrow into Zig directly, as a native library rather than as a wrapper around another ecosystem.

Use zarrow when you want to:

  • construct Arrow arrays in Zig;
  • validate Arrow-compatible memory layouts;
  • read or write Arrow IPC streams;
  • exchange Arrow data with other runtimes through the Arrow C Data / C Stream ABI;
  • build downstream Arrow tooling such as compute kernels, IPC utilities, query engines, or storage integrations.

Status

zarrow is under active development. It is already useful for experimentation, interoperability testing, and early downstream libraries, but the public API is still evolving.

Area Status
Core Arrow memory layout Supported
Array builders Supported
Layout validation Supported
Zero-copy slicing Supported
Primitive arrays Supported
Boolean, string, binary arrays Supported
List, struct, map, union, dictionary, run-end encoded arrays Supported / evolving
Temporal and decimal types Supported / evolving
Record batches Supported
Arrow IPC stream read/write Supported
Arrow IPC interop fixtures Supported
Arrow C Data / C Stream ABI Supported
zarrow-core module for downstream libraries Supported
Full compute kernels Downstream project: zarrow-compute
Parquet Downstream project: zarrow-parquet

The project favors correctness, explicit ownership, interop testing, and spec-aligned behavior over premature API stabilization.


Highlights

  • Zig-native Apache Arrow implementation
  • Arrow-compatible in-memory array layouts
  • Builders for primitive, boolean, string, binary, nested, dictionary, temporal, decimal, and advanced array types
  • Null bitmap handling and layout validation
  • Zero-copy slicing
  • Record batch support
  • Arrow IPC stream read/write
  • Arrow C Data / C Stream ABI import/export
  • Shared C ABI library surface: zarrow_c
  • zarrow-core module for downstream libraries that do not need IPC/FFI
  • Interop smoke tests against Arrow Go, Arrow C++, and arrow-rs tooling
  • Compute framework foundation for downstream zarrow-compute
  • Examples, benchmarks, fuzz corpus, and CI-oriented compatibility tools

Requirements

master currently targets Zig 0.15.x.

Because Zig has frequent breaking changes between compiler versions, zarrow maintains versioned tags that encode both:

  • v<zarrow-version> = zarrow release version
  • zig0.15 = Zig compiler major/minor compatibility target

For example, v0.0.2-zig0.15 means:

  • zarrow release version 0.0.2
  • Zig compatibility target 0.15.x

Recommended tag mapping:

Zig version Use tag
0.15.x v<zarrow-version>-zig0.15
0.14.x v<zarrow-version>-zig0.14
0.13.x v<zarrow-version>-zig0.13

If you are using older Zig versions, use the matching -zig0.xx tag instead of master.


Installation

Add zarrow to your Zig project:

zig fetch --save "git+https://github.com/tylitianrui/zarrow#master"

For release-pinned usage, prefer a release tag:

zig fetch --save "git+https://github.com/tylitianrui/zarrow#v<zarrow-version>-zig0.15"

Then wire the module in your build.zig:

const zarrow_dep = b.dependency("zarrow", .{
    .target = target,
    .optimize = optimize,
});

exe.root_module.addImport("zarrow", zarrow_dep.module("zarrow"));

If your downstream library only needs core data structures and compute framework APIs, import zarrow-core instead:

exe.root_module.addImport("zarrow-core", zarrow_dep.module("zarrow-core"));

Quick Start

Create a nullable Int32Array:

const std = @import("std");
const zarrow = @import("zarrow");

pub fn main() !void {
    var builder = try zarrow.Int32Builder.init(std.heap.page_allocator, 3);
    defer builder.deinit();

    try builder.append(10);
    try builder.appendNull();
    try builder.append(30);

    var arr_ref = try builder.finish();
    defer arr_ref.release();

    const arr = zarrow.Int32Array{ .data = arr_ref.data() };

    std.debug.print("len={d}, v0={d}, isNull1={any}, v2={d}\n", .{
        arr.len(),
        arr.value(0),
        arr.isNull(1),
        arr.value(2),
    });
}

Expected output:

len=3, v0=10, isNull1=true, v2=30

Public Modules

zarrow exposes two Zig modules:

Module Purpose
zarrow Full public package surface, including arrays, schemas, record batches, IPC, and public Arrow data structures.
zarrow-core Core-only surface for downstream libraries that do not require IPC or FFI dependencies. Useful for compute libraries such as zarrow-compute.

The build also provides a C ABI shared library:

Target Purpose
zarrow_c Shared C ABI surface for Arrow C Data / C Stream import and export.

Build the C ABI library with:

zig build c-api-lib

Feature Overview

The Status table above is the single capability summary source in this README.

Use this section as a quick navigation map:

Common commands:

zig build example-ipc_stream
zig build c-api-lib
zig build example-c_abi_roundtrip
zig build interop-fixture-writer -- .interop-fixtures/zarrow.arrow canonical stream
zig build interop-fixture-check -- .interop-fixtures/zarrow.arrow canonical stream

Compute Framework

zarrow includes a compute framework foundation for downstream projects such as zarrow-compute.

The core library provides framework-level building blocks:

  • FunctionRegistry
  • Kernel
  • KernelSignature
  • ExecContext
  • Datum
  • typed options
  • result type inference
  • scalar / array / chunked input representation
  • null propagation helpers
  • unary, binary, and n-ary execution helpers
  • chunk-aware helpers for take, filter, and permutation-style operations

zarrow intentionally does not try to own the full compute function catalog in this repository. Concrete kernels such as arithmetic, comparison, filter, sort, cast, and aggregate functions should live in downstream libraries such as zarrow-compute.

For the downstream API boundary, see:


Examples

Run the default example:

zig build run

Run all examples:

zig build examples

Run a specific example:

zig build example-primitive_builder
zig build example-string_builder
zig build example-struct_builder
zig build example-ipc_stream
zig build example-c_abi_roundtrip
zig build example-compute_lifecycle

Available examples include builders for primitive, unsigned integer, boolean, string, large string, binary, large binary, decimal, dictionary, list, large list, fixed-size list, fixed-size binary, map, struct, union, run-end encoded, temporal, interval, view, IPC, C ABI, and compute lifecycle workflows.


Development

Clone the repository and run tests:

git clone https://github.com/tylitianrui/zarrow.git
cd zarrow
zig build test

Run formatting before opening a PR:

zig fmt src tools examples benchmarks build.zig

Run benchmarks:

zig build benchmark-smoke
zig build benchmark
zig build benchmark-full

Run built-in fuzz corpus replay:

zig build fuzz-corpus

Run IPC fuzz harnesses manually:

zig build fuzz-array-layout
zig build fuzz-ipc-reader

Interoperability Verification

zarrow treats interoperability as a core project requirement.

Current interop areas include:

Area Coverage
Arrow Go IPC Fixture generation and validation in both directions
Arrow C Data ABI Import/export smoke checks
Arrow C Stream ABI Streaming-preserving import/export path
Arrow C++ C ABI smoke tooling
arrow-rs C ABI smoke tooling
PyArrow compute Compatibility check tooling for compute-alignment cases

Useful commands:

zig build c-api-lib
zig build example-c_abi_roundtrip
zig build compute-compat-check -- numeric

See the interop docs for the full local verification matrix:


API Stability

The public API is evolving.

The project uses three stability levels:

Level Meaning
Stable Intended for external use, with compatibility preserved across patch/minor releases.
Experimental Public but still evolving; may change with release notes.
Internal No compatibility guarantees.

See:


Ecosystem

zarrow is intended to be the base layer for a Zig Arrow ecosystem.

Downstream and related projects:

  • zarrow-compute — Arrow compute kernels in Zig, built on top of zarrow / zarrow-core.
  • zarrow-parquet — Parquet read/write integration built as a downstream project.

Planned or natural ecosystem directions:

  • compute kernels;
  • dataframe/query-engine integration;
  • Parquet integration;
  • Arrow Flight / IPC tooling;
  • PyArrow / Arrow C++ / arrow-rs interoperability examples;
  • more downstream examples using zarrow-core.

If your project uses zarrow, feel free to open a PR adding it to this section.


Roadmap

Near-term priorities:

  • keep Arrow layout behavior spec-aligned;
  • expand type coverage tests and examples;
  • strengthen IPC and C ABI interop coverage;
  • improve downstream zarrow-core ergonomics;
  • support zarrow-compute with stable compute framework boundaries;
  • add more real-world interop demos;
  • improve documentation for ownership, memory layout, and lifecycle rules.

Non-goals for this repository:

  • becoming a full query engine;
  • owning every compute kernel directly inside zarrow;
  • implementing Parquet in the core repository;
  • hiding Arrow ownership semantics behind implicit runtime behavior.

Contributing

Contributions are welcome, especially in these areas:

  • correctness tests;
  • error-path tests;
  • Arrow IPC edge cases;
  • C ABI lifecycle coverage;
  • examples and documentation;
  • benchmarks;
  • downstream integration feedback;
  • type coverage improvements.

Before opening a PR:

zig fmt src tools examples benchmarks build.zig
zig build test
zig build benchmark-smoke

Please keep changes focused and mention any compatibility impact on public API, IPC behavior, or interop behavior.

See CONTRIBUTING.md for details.


License

Licensed under the Apache License, Version 2.0.

See LICENSE.