Notes on Streaming JSON

This document is not formally part of the Connected JSON or Extended CJ specification.

It describes the rationale for the Streaming recommendations within the Connected JSON spec. The concepts in this document apply to all JSON formats that might get streamed.

Recommendations for streaming JSON.

Streaming JSON is easy with formats like JSON Lines (aka JSONL), where each line is a complete JSON value. However, formats like Connected JSON have a more complex nesting structure. With some care, such formats can also be streamed.
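To illustrate why line-delimited formats are the easy case, here is a minimal sketch of a JSON Lines reader. The function name read_json_lines and the sample records are made up for illustration. The reader never holds more than one line in memory, which is only possible because each line is a self-contained JSON value.

```python
import io
import json
from typing import Iterator, TextIO


def read_json_lines(stream: TextIO) -> Iterator[object]:
    """Yield one parsed JSON value per non-empty line of the stream."""
    for line in stream:
        line = line.strip()
        if line:
            # Each line is a complete JSON document, so it can be parsed
            # on its own without looking at earlier or later lines.
            yield json.loads(line)


# Hypothetical sample input: two records, one per line.
sample = io.StringIO('{"id": "n1"}\n{"id": "n2"}\n')
records = list(read_json_lines(sample))
```

A nested format cannot be split on line breaks like this, which is why the ordering of properties inside an object starts to matter.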

What happens technically when streaming data?

  • A sender generates a data stream

  • A receiver parses the incoming stream first as a stream of bytes, then characters, then JSON events, and finally events on the domain object layer. A database might get filled or domain objects might be created.

Crucially, data is streamed to avoid buffering it completely in memory. Let us look at each step in the receiver pipeline:

Table 1. Receiver Pipeline

Layer       Events                                                                  Required buffer
----------  ----------------------------------------------------------------------  ----------------------------------------------
Bytes       byte(x)                                                                 none
Characters  codepoint(x)                                                            few bytes, e.g., to decode UTF-8
JSON        object-start, object-end, array-start, array-end, key(x), primitive(x)  as large as a primitive value (could be split)
Domain      domain objects                                                          usually whole domain objects are constructed
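The "few bytes" buffer of the character layer can be sketched with Python's standard incremental decoder. This is only an illustration under assumptions: the function name decode_chunks and the sample payload are hypothetical, and the chunk boundary is deliberately placed in the middle of a two-byte UTF-8 sequence.

```python
import codecs
from typing import Iterable, Iterator


def decode_chunks(chunks: Iterable[bytes]) -> Iterator[str]:
    """Turn a stream of byte chunks into a stream of text pieces."""
    # The incremental decoder keeps at most the bytes of one incomplete
    # code point between calls; that is the whole buffer of this layer.
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush at end of stream; raises if the stream ends mid-character.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# Hypothetical payload, split between the two bytes that encode "ü".
data = '{"label": "Zürich"}'.encode("utf-8")
split = data.index(b"\xc3") + 1
pieces = list(decode_chunks([data[:split], data[split:]]))
```

The first piece ends before the "ü" and the second piece starts with it, so no character is ever emitted half-decoded.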

Connected JSON is challenging due to the recursive nature of graphs: graphs have nodes, nodes may have subgraphs, these subgraphs again have nodes, and so on.

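Domain objects (e.g., graph) consist of attribute properties (e.g., id, label) and child properties (e.g., nodes and edges).
For good streaming, all attribute properties need to be sent before all child properties, so that a receiver knows an object's attributes before its potentially large list of children starts to arrive.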

Attribute properties contain JSON primitives or smaller, non-recursive JSON structures.
Child properties may contain a large or even unbounded number of child elements and/or may have recursion as part of their data model.
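On the sender side, the attribute-first ordering could look like the following sketch. It is a simplified illustration, not the Connected JSON schema: the function write_graph, its parameters, and the sample graph are assumptions, and only the property names id, label, and nodes are taken from the text above. The attributes are written first, then the nodes are written one at a time from an iterator, so the sender never needs the whole child list in memory.

```python
import io
import json
from typing import Iterable, TextIO


def write_graph(out: TextIO, graph_id: str, label: str,
                nodes: Iterable[dict]) -> None:
    """Write one graph object, attribute properties before child properties."""
    # Attribute properties: small, non-recursive values, sent first.
    out.write('{"id":' + json.dumps(graph_id))
    out.write(',"label":' + json.dumps(label))
    # Child property: possibly unbounded, written element by element.
    out.write(',"nodes":[')
    for index, node in enumerate(nodes):
        if index > 0:
            out.write(",")
        out.write(json.dumps(node))
    out.write("]}")


# Hypothetical usage: the nodes come from a generator, not a list.
buffer = io.StringIO()
write_graph(buffer, "g1", "Example", ({"id": f"n{i}"} for i in range(3)))
parsed = json.loads(buffer.getvalue())
```

A receiver reading this output sees id and label before the first node, so it can create the graph object (or a database row) immediately and attach nodes as they arrive.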