Connected JSON Specification

1. Introduction

We want a JSON-based document for exchanging graphs. Graphs contain nodes and edges. Edges may be undirected, directed (DAG), typed (Hello RDF), weighted (Hello flow algorithms), or even hyper-edges (Hello biologists). We want subgraphs (Hello diagrams). We want data attached to nodes and edges (Hello knowledge graphs).

1.1. Goals and Motivation

Yes, we know, but the last effort (JGF, the JSON Graph Format) is over 10 years old and GraphML over 20 years by now. And some GraphML features (mixed hyper-edges, nested graphs) are not supported in JGF. In fact, none of the existing JSON graph interchange formats has the same breadth of features as the over 20-year-old XML-based GraphML.

Connected JSON aims to be a full GraphML replacement. It supports the semantic capabilities and data representation found in GraphML, while adopting a more flexible, schema-less JSON approach.

This format is intended as a universal interchange format for all kinds of graphs, which can be as complex as what GraphML allows — and that is a lot.

To learn how similar, much more flexible formats can be interpreted unambiguously as Connected JSON, see Extended CJ.

To support streaming of large graphs (> 1 GB) and to make textual diffing of Connected JSON files easy, we also define Canonical Connected JSON.

1.2. Example

Connected JSON Example File
{
  "connectedJson": {
    "versionDate": "2025-07-14",
    "versionNumber": "5.0.0"
  },
  "baseUri": "http://example.org/",
  "graphs": [{
    "nodes": [
      { "id":  "12" },
      { "id":  "a",
        "ports": [
          { "id": "a1"},
          { "id": "a2",
            "ports": [ { "id": "a2-1" }, { "id": "a2-2" } ]
          }]},
      { "id":  "b", "data": {"foo": "bar"} },
      { "id":  "c" },
      { "id":  "d" },
      { "id":  "e" },
      { "id":  "f" }
    ],
    "edges": [
      { "endpoints": [
        { "direction": "in", "node":  "12"},
        { "direction": "out", "node":  "a"}
      ]},
      { "endpoints":  [
        { "direction": "in", "node": "12", "port":  "a2-1"},
        { "direction": "out", "node": "a"}
      ]},
      { "endpoints":  [
        { "direction": "in", "node": "12"},
        { "direction": "in", "node": "a", "port": "a2-1" },
        { "direction": "out", "node": "d"},
        { "direction": "out", "node": "e"}
      ]},
      { "endpoints":  [
        { "direction": "in", "node": "12"},
        { "direction": "in", "node": "a"},
        { "direction": "out", "node": "d"},
        { "direction": "out", "node": "e"},
        { "direction": "undir", "node": "f"}
      ]}
    ],
    "data": {
      "hello": ["My data","can be","here"]
    }
  }]
}

1.3. Change Log

2025-07-14: Version 5.0.0
  • Split spec into two parts: Connected JSON for writing strict files, where there is always only one option to encode a structure, and Extended CJ, which is much more liberal and flexible in parsing.

  • Moved edgeDefault to Extended CJ.

2025-07-10: Version 4.0.0
  • Simplified graph nesting. Now a CJ document is a graph (or array of graphs).

2025-07-03: Version 3.0.0
  • Renamed all properties with a dash to camelCase form. This makes it easier in practice to represent properties in programming languages as variable names or enum values.

    • type-node → typeNode

    • type-uri → typeUri

  • Renamed some lowercase properties to camelCase form. This avoids IDEs and editors complaining about spelling.

    • baseuri → baseUri

    • edgedefault → edgeDefault

2025-06-26: Version 2.0.0
  • Multilingual labels (Label): switched from a JSON object with language tags as property keys to a more canonical array-form.

2025-04-30: Version 1.1.0
  • Clearer ID section

  • Allow graph inside edge (consistent with diagrams and GraphML)

2025-04-08: Version 1.0.0
  • Initial public release

2. Overview

Suggested MIME type: application/connected+json (not yet registered).

We define two main formats:

Connected JSON (CJ)

A strict format for writing. There is always only one option to encode a structure.

Extended CJ (ECJ)

A relaxed superset of CJ for reading. It offers many aliases, shortcuts and variants to interpret JSON as a graph. See Extended CJ Specification.

These main formats are further refined by whether they allow comments (JSON5 adds comments to JSON) and by canonicalization:

Table 1. The Connected JSON Formats

| Defined in | Name | Default file extension | Purpose | Allows JSON Comments |
|---|---|---|---|---|
| Connected JSON (this specification) | Connected JSON | .cj or .cj.json | Written by tools | no |
| Connected JSON (this specification) | Connected JSON | .cj.json5 | Written by tools, commented by humans. | yes |
| Connected JSON (this specification) | Canonical Connected JSON | .cj | Optimized for streaming and diffing | no |
| Extended CJ | Extended Connected JSON | .json | Read diverse JSON files | no |
| Extended CJ | Extended Connected JSON | .json5 | Read diverse JSON files | yes |

All formats restrict JSON to the I-JSON subset defined in RFC 7493: no duplicate object properties, UTF-8 encoding, and no unpaired surrogates.
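As an illustration of these restrictions, the following minimal Python sketch rejects duplicate properties, invalid UTF-8 and unpaired surrogates while loading. The function names load_i_json and reject_duplicate_keys are hypothetical and not part of this specification; the sketch does not cover the remaining I-JSON recommendations (for example number ranges).

```python
import json


def reject_duplicate_keys(pairs):
    """object_pairs_hook: fail if a JSON object repeats a property name."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate object property: {key!r}")
        result[key] = value
    return result


def load_i_json(raw: bytes):
    """Parse bytes as JSON under the I-JSON restrictions named above."""
    text = raw.decode("utf-8")  # raises UnicodeDecodeError on invalid UTF-8
    document = json.loads(text, object_pairs_hook=reject_duplicate_keys)
    # An escaped lone surrogate such as "\ud800" parses fine in Python,
    # but cannot be encoded as UTF-8, so re-encoding reveals it.
    json.dumps(document, ensure_ascii=False).encode("utf-8")
    return document
```

Both error cases surface as subclasses of ValueError, so a caller can handle them uniformly.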

2.1. Conceptual Model

Before diving into JSON structures, it is helpful to describe how Connected JSON sees a graph. In general, Connected JSON supports hyperedges with mixed directionality, like GraphML. It also keeps the node and optional port model from GraphML. It supports two ways of Graph Nesting. Connected JSON allows (multilingual) labels on many elements.

  • A document contains graphs.

  • A graph contains nodes and edges.

  • A node may optionally consist of a hierarchical tree of ports.

  • An edge refers to nodes via endpoints.

  • An endpoint defines the direction of each edge-node connection: the node goes into the edge, comes out of the edge, or has no direction.

  • An endpoint can connect to a node and optionally fine-tune to a port within that node.

Figure 1. Conceptual Model
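The conceptual model can be mirrored in code roughly as follows. This is a minimal sketch using Python dataclasses; the class names are illustrative assumptions, and labels, data and nested graphs are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Port:
    id: str
    ports: List["Port"] = field(default_factory=list)  # nested sub-ports


@dataclass
class Node:
    id: str
    ports: List[Port] = field(default_factory=list)  # optional port tree


@dataclass
class Endpoint:
    node: str                   # id of the node this endpoint attaches to
    port: Optional[str] = None  # optionally narrows the attachment to a port
    direction: str = "undir"    # "in", "out" or "undir"


@dataclass
class Edge:
    endpoints: List[Endpoint] = field(default_factory=list)


@dataclass
class Graph:
    nodes: List[Node] = field(default_factory=list)
    edges: List[Edge] = field(default_factory=list)


@dataclass
class Document:
    graphs: List[Graph] = field(default_factory=list)
```

An edge with more than two endpoints is a hyperedge; mixed directionality simply means its endpoints carry different direction values.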

3. Elements

3.1. Document

Every file is a document.

Table 2. Property Table in Canonical / Streaming Order

| Property | Type | Description |
|---|---|---|
| connectedJson | object(Document Metadata) | Optional. Document Metadata |
| baseUri | string(URI) | Optional. Is used to fine-tune the Interpretation as RDF. |
| data | any | Optional. Allows user-attached Data. |
| graphs | array(Graph []) | Default: Empty. See also Graph Nesting. |

3.1.1. Document Metadata

A graph may state a connectedJson property, which is only interpreted at root level.

| Property | Type | Description |
|---|---|---|
| versionDate | string | Optional. Version date identifier to define the Connected JSON version used by the document. E.g. 2025-07-10 |
| versionNumber | string | Optional. Version number identifier to define the Connected JSON version used by the document. E.g. 4.0.0 |

3.2. ID

IDs (identifiers) are used in Connected JSON to address nodes, ports, edges and graphs. Ids are strings.

If an array contains elements with an id (this mechanism is used in graphs, nodes, edges), then the ids must be unique within that array. If an id is used by multiple entries in the array, later entries are interpreted as a JSON Merge Patch on the earlier ones and a parse warning MUST be emitted. The merging is done as defined in RFC 7386 (since obsoleted by RFC 7396).
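The duplicate-id rule could be implemented along these lines. This is a sketch, not a normative algorithm: merge_patch follows the JSON Merge Patch procedure (objects merge recursively, null removes a property, any other value replaces), while merge_by_id and the use of Python's warnings module for the parse warning are illustrative assumptions.

```python
import warnings


def merge_patch(target, patch):
    """Apply a JSON Merge Patch to target and return the result."""
    if not isinstance(patch, dict):
        return patch  # non-objects replace the target entirely
    result = dict(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)  # null removes the property
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


def merge_by_id(elements):
    """Merge later entries with a repeated id into the first such entry."""
    result = []
    position = {}  # id -> index of the first entry with that id
    for element in elements:
        element_id = element.get("id")
        if element_id is not None and element_id in position:
            warnings.warn(f"duplicate id {element_id!r}: merged into earlier entry")
            index = position[element_id]
            result[index] = merge_patch(result[index], element)
        else:
            if element_id is not None:
                position[element_id] = len(result)
            result.append(element)
    return result
```

Entries without an id (for example edges) are passed through unchanged, and the merged entry keeps the position of its first occurrence.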

3.2.1. Identifier Scope

The identifiers for different elements have different scopes in which they must be unique.

| Scope | Comment |
|---|---|
| Document | Node ids, Edge ids and Graph ids are unique per document. Nested graphs do not provide a new id scope. |
| Node | Port ids are only unique within their corresponding Node. |

3.3. Label

Labels are used in Connected JSON to label nodes, ports, edges and graphs. In Connected JSON, labels are multilingual: a label is an array of label entries, and each entry is an object with an optional language property and a required value property.

[
    {"language":"de", "value": "Hallo, Welt"},
    {"language":"en", "value": "Hello, World"},
    // a value without language information is also allowed
    { "value": "Hi"}
]

If a language tag (including the empty one) is used multiple times, later entries are interpreted as JSON Merge Patch on the earlier ones and a parse warning MUST be emitted. The merging is done as defined in RFC 7386.

Table 3. Property Table in Canonical / Streaming Order

| Property | Type | Description |
|---|---|---|
| language | string | Optional. Language tag. Usually according to BCP 47. |
| value | string | Required. The label value. |
| data | any | Optional. Allows user-attached Data. |

Multilingual labels in Connected JSON are modelled on labels in the expanded form of JSON-LD 1.1.
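A consumer typically needs one display string per label. The following sketch shows one possible lookup over a label array; the function name pick_label and the fallback to an entry without a language are assumptions made for illustration, since this specification does not define a selection policy.

```python
def pick_label(label, language):
    """Return the value for the given language tag, else the first untagged value, else None."""
    untagged = None
    for entry in label:
        if entry.get("language") == language:
            return entry["value"]
        if "language" not in entry and untagged is None:
            untagged = entry["value"]
    return untagged


label = [
    {"language": "de", "value": "Hallo, Welt"},
    {"language": "en", "value": "Hello, World"},
    {"value": "Hi"},
]
print(pick_label(label, "de"))  # prints: Hallo, Welt
print(pick_label(label, "fr"))  # prints: Hi
```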

3.4. Graph

A graph contains zero or more nodes and zero or more edges.

Table 4. Property Table in Canonical / Streaming Order

| Property | Type | Description |
|---|---|---|
| id | string | Optional. Unique identifier for the graph within a Document. See ID. |
| meta | object(Graph Metadata) | Optional. Graph Metadata |
| label | array(Label entry []) | Optional. Label (name) of the graph. See Label. |
| data | any | Optional. Allows user-attached Data. |
| nodes | array(Node []) | 0 to n nodes. Default: Empty. |
| edges | array(Edge []) | 0 to n edges (which may be bi- or hyperedges). Default: Empty. |
| graphs | array(Graph []) | Default: Empty. See Graph Nesting. |

3.4.1. Graph Metadata

To make handling large graphs easier, a graph may include a meta header. This header is most useful at the root graph, before any nodes and edges are stated.

| Property | Type | Description |
|---|---|---|
| canonical | boolean | Optional. If true, this graph is considered a canonical representation of the graph. I.e., all properties are ordered according to the property tables. Default: false. |
| nodeCountTotal | number(integer) | Optional. Total count of nodes in this graph including all nodes in subgraphs. |
| edgeCountTotal | number(integer) | Optional. Total count of edges in this graph including all edges in subgraphs. |
| nodeCountInGraph | number(integer) | Optional. The count of nodes directly in this graph excluding nodes in subgraphs. |
| edgeCountInGraph | number(integer) | Optional. The count of edges directly in this graph excluding edges in subgraphs. |
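A writer could derive the four counters as sketched below. The helper names are hypothetical. The sketch assumes that "subgraphs" covers graphs nested directly in the graph as well as graphs nested in its nodes and edges; this specification does not spell that out, so treat it as one possible reading.

```python
def count_in_graph(graph):
    """Return (nodeCountInGraph, edgeCountInGraph) for a graph given as a dict."""
    return len(graph.get("nodes", [])), len(graph.get("edges", []))


def count_total(graph):
    """Return (nodeCountTotal, edgeCountTotal), descending into all nested graphs."""
    nodes, edges = count_in_graph(graph)
    # Subgraphs may hang off the graph itself or off any of its nodes and edges.
    containers = [graph] + graph.get("nodes", []) + graph.get("edges", [])
    for container in containers:
        for subgraph in container.get("graphs", []):
            sub_nodes, sub_edges = count_total(subgraph)
            nodes += sub_nodes
            edges += sub_edges
    return nodes, edges
```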

3.5. Node

A node is an atom in the graph.

Table 5. Property Table in Canonical / Streaming Order

| Property | Type | Description |
|---|---|---|
| id | string | Required. Unique identifier for the node. See ID. |
| label | array(Label entry []) | Optional. Label (name) of the node. See Label. |
| ports | array(Port []) | Optional array of Port. |
| data | any | Optional. Allows user-attached Data. |
| graphs | array(Graph []) | Optional. Graph(s) nested within the node. This turns the node into a compound node. The edges in a subgraph can refer to nodes higher up in the tree of graphs. See Graph Nesting. |

3.6. Port

A port is always a part of a Node. A layout should place a port on the border of the node widget. Ports may be hierarchically nested. In practice, this is used in graphical editors, where a port is a connection point on a node.

Table 6. Property Table in Canonical / Streaming Order

| Property | Type | Description |
|---|---|---|
| id | string | Required. ID unique within the Node. All ports, even nested ones, share the same ID space per node. See also ID. |
| label | array(Label entry []) | Optional. Label (name) of the port. See Label. |
| ports | array(Port []) | Optional array of sub-ports. Recursively. |
| data | any | Optional. Allows user-attached Data. |

3.7. Edge

An edge uses endpoints to link to nodes. However, simple bi-edges with only two ends have a shortcut syntax.

The structural model for any edge is this:

Figure 2. Edge Model
  • An edge has n endpoints.

  • An endpoint defines the direction of the attached node, relative to the edge: the node is incoming, outgoing or undirected (from the perspective of the edge).

  • A target can be a node or a port attached to a node. Yes, a port can also be nested within other ports, forming a kind of recursive port-tree. GraphML has this.

Edges have been modelled like GraphML. They have been extended with a type-property, to make it easier to express RDF.

Table 7. Property Table in Canonical / Streaming Order

| Property | Type | Description |
|---|---|---|
| id | string | Optional id. Unique per graph. See ID. |
| label | array(Label entry []) | Optional. Label (name) of the edge. See Label. |
| type | string | Optional. The kind of edge. Any type defined here applies to all endpoints. Endpoints override this type, if set. See Edge Endpoint and Interpretation as RDF. |
| typeUri | string | |
| typeNode | string | |
| endpoints | array(Edge Endpoint []) | The endpoints define the nodes to which this edge is attached. |
| data | any | Optional. Allows user-attached Data. |
| graphs | array(Graph []) | Optional. Graph(s) nested within the edge. This turns the edge into a compound edge. The edges in a subgraph can refer to nodes higher up in the tree of graphs. See Graph Nesting. |

Precedence between type, typeUri and typeNode is the same as defined for Edge Endpoint.

3.8. Edge Endpoint

Table 8. Property Table in Canonical / Streaming Order

| Property | Type | Description |
|---|---|---|
| node | string | Required. Node id. A string containing a single nodeId (ID). This is the id of the Node to which this endpoint is attached. |
| port | string | Optional. Port id. Port ids are only unique per node. See ID. If a port is referenced, it defines, in addition to the node, where precisely the endpoint is attached. NOTE: All port ids are unique within a node (see Identifier Scope), so that a single string can address all ports directly. |
| direction | One of: in, out or undir | Optional. Maps to incoming (in), outgoing (out), or undirected (undir). Default is undir. |
| type | string | Optional. The type of relation from the edge entity to the endpoint node. If a URI is given, use typeUri instead. This property states the relation as a string, e.g. works at or knows. Default is related. |
| typeUri | string(URI) | Optional. The type of relation from the edge entity to the endpoint node. |
| typeNode | string | Uses a node in the graph (referenced by node id, see ID) to define the kind of relation. This is the same strategy that RDF uses: property URIs are themselves RDF resources, which can have a label and other edges attached to them. |
| data | any | Optional. Allows user-attached Data. |

Edge Type (type, typeUri, typeNode)
  • Either type, typeUri, or typeNode MAY be used. If several are given, typeUri has precedence, then typeNode, then type. Usually, the type of edge is defined at the Edge level. However, in hyper-edges more complex relations (tuples) may need to be expressed. In this case, endpoint-level typing can be used.
    If both edge and endpoint types are given, the endpoint type has precedence. See also Interpretation as RDF.
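The precedence rules can be condensed into a small resolver. This is a sketch of one reading: the function names are hypothetical, and it assumes that any type property on the endpoint overrides all type properties on the edge, with the documented default related applied last.

```python
TYPE_PRECEDENCE = ("typeUri", "typeNode", "type")  # highest precedence first


def own_type(element):
    """Return (property, value) for the strongest type property on one element, or None."""
    for key in TYPE_PRECEDENCE:
        if key in element:
            return key, element[key]
    return None


def effective_type(edge, endpoint):
    """Endpoint-level typing wins over edge-level typing; 'related' is the default."""
    return own_type(endpoint) or own_type(edge) or ("type", "related")


edge = {"type": "knows", "typeUri": "http://example.org/knows"}
print(effective_type(edge, {"node": "a"}))
# prints: ('typeUri', 'http://example.org/knows')
print(effective_type(edge, {"node": "b", "type": "works at"}))
# prints: ('type', 'works at')
```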

4. Features

4.1. Data

User-defined data can be attached to Document, Graph, Node, Edge, Port and Edge Endpoint via the data property. The value may be any JSON value. An array can be used, together with the OCIF extension mechanism.

This can be used, for example, to attach style data (e.g. line-color), domain data (e.g. population, sales volume), provenance data (e.g. source), or any other relevant information.

4.2. Graph Nesting

Graphs can be nested within other graphs (Graphs In Graphs) or within other nodes and edges (Graphs In Nodes And Edges; a GraphML mechanism). The nesting depth is not limited. This allows for hierarchical, recursive graph structures.

All nodes in a top-level graph, including all nodes nested within subgraphs, recursively, share the same ID space. The same is true for edges. Any edges, including those nested in nested graphs, may link to any node within the top-level graph, including those within nested graphs.

Figure 3. Graph Nesting
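Because nesting does not open a new id scope, a reader can collect all nodes first and then check every endpoint against that index. The sketch below does this for a parsed document; the function names are hypothetical, and it assumes a single document-wide node id scope as stated under Identifier Scope.

```python
def iter_graphs(graphs):
    """Yield every graph, including graphs nested in graphs, nodes and edges."""
    for graph in graphs:
        yield graph
        yield from iter_graphs(graph.get("graphs", []))
        for element in graph.get("nodes", []) + graph.get("edges", []):
            yield from iter_graphs(element.get("graphs", []))


def port_ids(ports):
    """Yield the ids of all ports in a port tree; they share one id space per node."""
    for port in ports:
        yield port["id"]
        yield from port_ids(port.get("ports", []))


def dangling_endpoints(document):
    """Return endpoints whose node (or port within that node) does not exist."""
    graphs = list(iter_graphs(document.get("graphs", [])))
    ports_by_node = {
        node["id"]: set(port_ids(node.get("ports", [])))
        for graph in graphs
        for node in graph.get("nodes", [])
    }
    problems = []
    for graph in graphs:
        for edge in graph.get("edges", []):
            for endpoint in edge.get("endpoints", []):
                known_ports = ports_by_node.get(endpoint["node"])
                if known_ports is None:
                    problems.append(endpoint)  # unknown node id
                elif "port" in endpoint and endpoint["port"] not in known_ports:
                    problems.append(endpoint)  # port not found in that node
    return problems
```

An edge in a nested graph may thus point at a node declared in an outer graph, or the other way round, without being reported.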

4.2.1. Graphs In Graphs

Nesting graphs in graphs partitions nodes and edges into subsets. All nodes and edges are treated as one large graph. Any edge can refer to any node. The subgraph is merely used as a container entity. Its id and label do not contribute to the resulting nodes and edges model.

4.2.2. Graphs In Nodes And Edges

In Connected JSON, like in GraphML, nodes and edges can also contain subgraphs. Those subgraphs additionally turn their container node into a compound node (or their container edge into a compound edge).

In a compound node, the ID and Label of the subgraph(s) are mapped to id and label of synthetic, implied compound node(s). Typically, this is represented in an application by adding synthetic 'contains'-edges from container element to contained elements.

4.3. Streaming

JSON in general is not ideal for streaming data, see also Notes on Streaming JSON. However, Canonical CJ is designed to be streamed efficiently. The property tables are sorted for optimized stream processing. This order is in contrast to RFC 8785 (JSON Canonicalization Scheme, JCS), which defines strict lexicographical order. Canonical CJ requires the order of properties to be followed exactly.

Rationale

Most entities are expected to be reasonably small, so that they can be completely processed in memory. Some entities may occur a large number of times. In general, small properties must come before large properties (those whose values have many child elements).

5. Canonical Connected JSON

Canonical CJ defines a strict order on property keys, compatible with Streaming, so that files can also be used in textual diffs. Canonical CJ is a strict subset of Connected JSON. It forbids using comments (no JSON5). Canonical CJ mandates strict formatting, described below. Properties in which the value is an empty array should be omitted.

Summary
  • Mandatory pretty-printing

  • Mandatory property order

5.1. Formatting

There is no RFC defining JSON pretty-printing. So here is a small spec. We need a compact, well-defined format, so that different CJ tools create the exact same syntax. Also, we need line-breaks to make textual diffing work. Canonical CJ compliant tools MUST adhere to these rules:

Indentation
  • Each level of nesting within an object or array must be indented.

  • The indentation must consist of two spaces. Tabs must not be used.

Line-Breaks
  • The line break character is \n.

  • The opening brace { of an object and the opening bracket [ of an array must be placed on the same line as their corresponding key or at the beginning of the document.

  • Each key-value pair in an object and each element in an array must be placed on its own line.

  • The closing brace } or bracket ] must be placed on a new line, aligned with the indentation level of its opening brace or bracket.

Spacing
  • There must be one space after the colon : in a key-value pair.

  • No other whitespace (except the indentation spaces and line-breaks) is permitted.

Commas
  • A comma , must follow every element in an array and every key-value pair in an object, except for the last one.
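A writer for these rules can be sketched with Python's standard json module, whose indent=2 output uses two-space indentation, \n line breaks, one space after each colon, and one element per line. The property orders below are copied from the property tables of this specification; the names ORDER, CHILD, reorder and to_canonical are hypothetical, and keeping unknown properties (sorted, after the known ones) is an assumption of this sketch rather than a rule of the format.

```python
import json

ORDER = {
    "document": ["connectedJson", "baseUri", "data", "graphs"],
    "documentMeta": ["versionDate", "versionNumber"],
    "graph": ["id", "meta", "label", "data", "nodes", "edges", "graphs"],
    "graphMeta": ["canonical", "nodeCountTotal", "edgeCountTotal",
                  "nodeCountInGraph", "edgeCountInGraph"],
    "node": ["id", "label", "ports", "data", "graphs"],
    "port": ["id", "label", "ports", "data"],
    "edge": ["id", "label", "type", "typeUri", "typeNode",
             "endpoints", "data", "graphs"],
    "endpoint": ["node", "port", "direction", "type", "typeUri",
                 "typeNode", "data"],
    "label": ["language", "value", "data"],
}

# Which kind of element a property's value (or array entries) holds.
CHILD = {
    "connectedJson": "documentMeta", "graphs": "graph", "meta": "graphMeta",
    "label": "label", "nodes": "node", "edges": "edge", "ports": "port",
    "endpoints": "endpoint",
}


def reorder(value, kind):
    """Return value with properties in table order and empty arrays omitted."""
    if isinstance(value, list):
        return [reorder(item, kind) for item in value]
    if not isinstance(value, dict):
        return value
    result = {}
    for key in ORDER[kind]:
        if key not in value or value[key] == []:
            continue  # absent, or an empty array that should be omitted
        child = value[key]
        # User data (the data property) is kept exactly as given.
        result[key] = reorder(child, CHILD[key]) if key in CHILD else child
    # Assumption: unknown properties are kept, sorted, after the known ones.
    for key in sorted(set(value) - set(ORDER[kind])):
        result[key] = value[key]
    return result


def to_canonical(document):
    """Serialize a parsed document in the formatting described above."""
    return json.dumps(reorder(document, "document"), indent=2, ensure_ascii=False)
```

Passing ensure_ascii=False keeps non-ASCII characters such as "ö" as UTF-8 text instead of \u escapes, which keeps diffs readable.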

Example
{
  "connectedJson": {
    "versionDate": "2025-07-14",
    "versionNumber": "5.0.0"
  },
  "baseUri": "http://example.org/",
  "data": {
    "author": "Max Völkel"
  },
  "graphs": [
    {
      "id": "graph1",
      "meta": {
        "canonical": true
      },
      "label": [
        {
          "language": "en",
          "value": "Example Graph"
        }
      ],
      "nodes": [
        {
          "id": "node1",
          "label": [
            {
              "language": "en",
              "value": "Node 1"
            }
          ]
        }
      ],
      "edges": [
        {
          "id": "edge1",
          "label": [
            {
              "language": "en",
              "value": "Edge from Node 1 to Node 2"
            }
          ],
          "endpoints": [
            {
              "node": "node1",
              "direction": "out"
            }
          ]
        }
      ]
    }
  ]
}

Appendix A: JSON Schema

Download

Appendix B: Reserved Property Names

The following property names are used by Connected JSON in certain places.

| Property | Usage |
|---|---|
| baseUri | Graph base URI for RDF interpretation |
| connectedJson | Document |
| canonical | Graph |
| data | Reserved property for user data. Connected JSON does not interpret this property for any element. |
| direction | Edge Endpoint direction (in/out/undir) |
| edges | Graph edges |
| endpoints | Edge endpoints |
| graphs | Node nested graphs, Edge nested graphs |
| id | Node id, Edge id, Graph id, Port id |
| label | Node, Edge, Graph, Port |
| language | Label |
| meta | Graph |
| node | Edge Endpoint referenced node id |
| nodes | Graph nodes |
| port | Edge Endpoint referenced port id |
| ports | Node ports |
| type | Edge, Edge Endpoint |
| typeNode | Edge, Edge Endpoint |
| typeUri | Edge, Edge Endpoint |
| value | Label |
| versionDate | Document Metadata |
| versionNumber | Document Metadata |