Notes on Streaming JSON
This document is not formally part of the Connected JSON or Extended CJ specification.
It describes the rationale for the Streaming recommendations within the Connected JSON spec. The concepts in this document apply to all JSON formats that might get streamed.
Streaming JSON is easy with formats like JSON Lines (aka JsonL). However, formats like Connected JSON have a more complex nesting structure. With some care, such formats can also be streamed.
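To illustrate why line-oriented formats are easy to stream, here is a minimal sketch of a JSON Lines reader. It assumes each non-empty line holds one complete JSON value; the function name `read_json_lines` is made up for this example and is not part of any specification.

```python
import io
import json
from typing import Iterator, TextIO


def read_json_lines(stream: TextIO) -> Iterator[object]:
    """Yield one parsed JSON value per non-empty line of the stream."""
    for line in stream:
        line = line.strip()
        if line:  # skip blank lines between records
            yield json.loads(line)


# Usage: only the current line is held in memory at any time.
sample = io.StringIO('{"id": 1}\n\n{"id": 2}\n')
for record in read_json_lines(sample):
    print(record)
```

The buffer never grows beyond a single line, which is exactly the property that a nested format has to work harder to achieve.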
What happens technically when streaming data?
- A sender generates a data stream.
- A receiver parses the incoming stream first as a stream of bytes, then characters, then JSON events, and finally events on the domain object layer. A database might get filled or domain objects might be created.
Crucially, data is streamed to avoid buffering it completely in memory. Let us look at each step in the receiver pipeline:
| Layer | Events | Required buffer |
|---|---|---|
| Bytes | byte(x) | none |
| Characters | codepoint(x) | a few bytes, e.g., to decode UTF-8 |
| JSON | object-start, object-end, array-start, array-end, key(x), primitive(x) | as large as a primitive value (could be split) |
| Domain | domain objects | Usually whole domain objects are constructed |
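The step from the byte layer to the character layer can be sketched with Python's standard incremental decoder. This is only an illustration of the "few bytes" buffer in the table; the helper name `decode_chunks` is an assumption made for this example.

```python
import codecs
from typing import Iterable, Iterator


def decode_chunks(chunks: Iterable[bytes]) -> Iterator[str]:
    """Decode UTF-8 byte chunks to text, buffering only incomplete sequences."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        text = decoder.decode(chunk)  # keeps a trailing partial character internally
        if text:
            yield text
    # Flush at end of stream; raises UnicodeDecodeError if bytes are left over.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# Usage: the two-byte character "é" is split across two network chunks.
for piece in decode_chunks([b'{"k": "caf\xc3', b'\xa9"}']):
    print(repr(piece))
```

The decoder holds back at most the bytes of one unfinished character, regardless of how large the overall stream is.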
Connected JSON is challenging to stream because graphs are recursive: graphs have nodes, nodes can have subgraphs, those subgraphs have nodes again, and so on.
Domain objects (e.g., a graph) consist of attribute-properties (e.g., id, label) and child-properties (e.g., nodes and edges). For good streaming, all attribute-properties need to be sent before all child-properties.
Attribute-properties contain JSON primitives or smaller, non-recursive JSON structures.
Child-properties may contain a large or even unbounded number of child elements and/or may have recursion as part of their data model.
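The benefit of sending attribute-properties first can be sketched from the receiver's side. The example below assumes a hypothetical event source that delivers the JSON-layer events from the table as tuples, and a deliberately simplified graph with only `id`, `label`, and a flat `nodes` array. It is not the full Connected JSON data model, and `receive_graph` is a name invented for this illustration.

```python
from typing import Any, Dict, Iterable, Iterator, Tuple

Event = Tuple[Any, ...]  # hypothetical shape, e.g. ("key", "id") or ("object-start",)


def receive_graph(events: Iterable[Event]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield ("graph", attrs) once, then ("node", attrs) for each node."""
    depth = 0                          # object nesting depth: 1 = graph, 2 = node
    key = None                         # most recently seen key
    graph_attrs: Dict[str, Any] = {}   # attribute-properties of the graph
    node_attrs: Dict[str, Any] = {}    # attribute-properties of the current node
    graph_announced = False
    for event in events:
        kind = event[0]
        if kind == "object-start":
            depth += 1
            if depth == 2:
                node_attrs = {}        # start collecting a new node
        elif kind == "object-end":
            if depth == 2:
                yield ("node", node_attrs)   # node is complete, hand it on
            elif depth == 1 and not graph_announced:
                graph_announced = True       # graph without a nodes array
                yield ("graph", dict(graph_attrs))
            depth -= 1
        elif kind == "key":
            key = event[1]
        elif kind == "primitive":
            if depth == 1:
                graph_attrs[key] = event[1]
            elif depth == 2:
                node_attrs[key] = event[1]
        elif kind == "array-start":
            if depth == 1 and key == "nodes" and not graph_announced:
                # All graph attributes have arrived; the graph can be
                # created before any child element is read.
                graph_announced = True
                yield ("graph", dict(graph_attrs))


# Usage: events for {"id": "g1", "label": "Demo", "nodes": [{"id": "n1"}, {"id": "n2"}]}
events = [
    ("object-start",), ("key", "id"), ("primitive", "g1"),
    ("key", "label"), ("primitive", "Demo"),
    ("key", "nodes"), ("array-start",),
    ("object-start",), ("key", "id"), ("primitive", "n1"), ("object-end",),
    ("object-start",), ("key", "id"), ("primitive", "n2"), ("object-end",),
    ("array-end",), ("object-end",),
]
for item in receive_graph(events):
    print(item)
```

In this sketch the receiver holds at most one node in memory at a time. If `label` arrived only after `nodes`, the graph would be announced without its label, so a receiver that needs complete attributes up front would have to buffer every node until the graph object ends.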