How LadybugDB’s rich type system transforms property graphs from flat key-value stores into expressive, queryable knowledge structures — and why this matters for AI memory, sovereign agents, and graph
Introduction: Properties Are Not an Afterthought
In most graph databases, properties are second-class citizens. You get a string key, a scalar value, maybe a JSON blob if you’re lucky — and that’s it. The graph structure carries the meaning; properties are just metadata hanging off the side.
LadybugDB takes a fundamentally different position. Built on the Kùzu engine, LadybugDB enforces typed schemas at write time: every node table and every relationship table declares its properties with explicit types, and the database rejects anything that doesn’t conform. This isn’t just type safety for its own sake. It means properties can be complex, nested, queryable, and semantically meaningful — not just bags of bytes attached to graph elements.
This article explores LadybugDB’s property type system in depth. We’ll cover STRUCT for complex embedded data, LIST and ARRAY for multi-valued properties, UNION for polymorphic values, and then step back to ask a deeper question: what happens when you stop treating properties as attributes and start treating them as first-class graph citizens? The answer leads us into Semantic Spacetime and the HAS_PROPERTY relation — a pattern that turns the property graph model inside out.
STRUCT — Complex Types That Actually Work
What STRUCT Is
A STRUCT in LadybugDB is a fixed-schema composite type: a mapping of named keys to typed values, where every instance of that STRUCT must contain the same set of keys. Think of it as an embedded row — a typed record living inside a single property column.
RETURN {first: 'Ada', last: 'Lovelace'} AS name;
The result is not a string. It’s a structured value with two fields, each independently typed and individually accessible via dot notation:
RETURN {first: 'Ada', last: 'Lovelace'}.first AS given_name;
// Returns: 'Ada'
You can also construct STRUCTs explicitly with STRUCT_PACK:
RETURN STRUCT_PACK(first := 'Ada', last := 'Lovelace') AS name;
Defining STRUCT Properties in Node Tables
The real power appears when you embed STRUCTs in your schema definitions:
CREATE NODE TABLE Agent (
id STRING PRIMARY KEY,
name STRING,
config STRUCT(
model STRING,
temperature DOUBLE,
max_tokens INT64
),
location STRUCT(
lat DOUBLE,
lon DOUBLE,
city STRING
)
);
Every Agent node now carries a structured configuration and a structured location — not as opaque JSON, not as flattened columns, but as typed nested records that the query engine understands at the storage level.
Nested STRUCTs: Depth Without Compromise
STRUCTs compose. You can nest a STRUCT inside a STRUCT, and query into the nesting with chained dot notation:
CREATE NODE TABLE Document (
id STRING PRIMARY KEY,
title STRING,
metadata STRUCT(
author STRUCT(name STRING, email STRING),
created TIMESTAMP,
version INT32
)
);
// Querying nested fields:
MATCH (d:Document)
WHERE d.metadata.author.name = 'Volodia'
RETURN d.title, d.metadata.version;
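Because the nested layout is declared in the schema, the engine resolves d.metadata.author.name to a typed field when it plans the query; nothing is parsed from a string at query time.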
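To make the idea of a fixed-schema nested record concrete, here is a minimal Python sketch. It is a toy model of the concept, not LadybugDB's implementation: METADATA_SCHEMA, validate, and get_path are hypothetical names, and the example email address is made up.

```python
# Illustrative only: a toy model of a fixed-schema STRUCT, not the engine's code.
from datetime import datetime

# Hypothetical schema mirroring Document.metadata; a dict value marks a nested STRUCT.
METADATA_SCHEMA = {
    "author": {"name": str, "email": str},
    "created": datetime,
    "version": int,
}

def validate(value, schema, path="metadata"):
    """Raise TypeError unless value has exactly the declared fields and types."""
    if not isinstance(value, dict) or set(value) != set(schema):
        raise TypeError(f"{path}: expected fields {sorted(schema)}")
    for key, expected in schema.items():
        if isinstance(expected, dict):
            validate(value[key], expected, f"{path}.{key}")
        elif value[key] is not None and not isinstance(value[key], expected):
            raise TypeError(f"{path}.{key}: expected {expected.__name__}")

def get_path(value, dotted):
    """Follow a dotted field path such as 'author.name'."""
    for part in dotted.split("."):
        value = value[part]
    return value

doc_metadata = {
    "author": {"name": "Volodia", "email": "v@example.org"},  # example values
    "created": datetime(2025, 3, 14, 10, 0),
    "version": 2,
}
validate(doc_metadata, METADATA_SCHEMA)
print(get_path(doc_metadata, "author.name"))  # Volodia
```

The point of the sketch is the order of operations: the shape is checked once, at write time, so a later field lookup never has to ask whether the key exists or what type it holds.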
Why STRUCT Beats JSON
The question inevitably arises: why not just store JSON? LadybugDB does support a JSON extension, and JSON blobs are convenient. But there are structural reasons to prefer STRUCT for data you intend to query.
Schema enforcement. A STRUCT column is typed at table creation. If your schema says config STRUCT(model STRING, temperature DOUBLE), then every row must provide exactly those fields with exactly those types. JSON blobs are bags of bytes — missing keys, wrong types, and schema drift are invisible until you try to parse at query time.
Storage efficiency. Because the engine knows the fixed layout of a STRUCT at write time, it can use columnar storage for each field independently. The temperature field across a million Agent nodes is stored as a contiguous column of doubles, which means vectorized scans are possible. A million JSON blobs are a million variable-length strings that require parsing before any comparison.
Query optimization. When you write WHERE d.metadata.author.name = 'Volodia', the query planner knows it’s filtering on a STRING field at a known offset inside a known structure. It can push this predicate down, apply indexes if available, and avoid deserializing the entire metadata blob. With JSON, the best you can do is a string-contains predicate or a function call that extracts a value at runtime.
Type safety. JSON has a single number type, so it cannot distinguish an integer from a float, and nothing in a blob guarantees that a key is present rather than missing or null. A STRUCT fixes both the field set and each field’s type. In a knowledge graph where property semantics matter, where similarity: 0.85 is a DOUBLE and not the string “0.85”, this distinction is not academic.
In short: JSON is for interchange. STRUCT is for storage and query.
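The storage argument can be illustrated with a small Python sketch that contrasts the two layouts. It says nothing about LadybugDB's actual storage format; the rows and model names are hypothetical, and array("d") merely stands in for a contiguous column of doubles.

```python
import json
from array import array

# Hypothetical data: config values for three agents.
rows = [
    {"model": "m-a", "temperature": 0.2, "max_tokens": 1024},
    {"model": "m-b", "temperature": 0.7, "max_tokens": 4096},
    {"model": "m-c", "temperature": 0.4, "max_tokens": 2048},
]

# Blob layout: one serialized string per row; every filter must parse first.
blobs = [json.dumps(r) for r in rows]
cold_from_blobs = [i for i, b in enumerate(blobs) if json.loads(b)["temperature"] < 0.5]

# Columnar layout: one typed, contiguous column per STRUCT field.
temperature = array("d", (r["temperature"] for r in rows))
cold_from_column = [i for i, t in enumerate(temperature) if t < 0.5]

print(cold_from_blobs, cold_from_column)  # [0, 2] [0, 2]
```

Both filters return the same rows, but the blob version parses every document to read one number, while the columnar version touches only the temperature values.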
Handling JSON as STRUCT
In practice, data often arrives as JSON — from APIs, from LLM outputs, from log streams. LadybugDB (through the JSON extension inherited from Kùzu) can ingest JSON files, and nested JSON objects are automatically mapped to STRUCT types.
Consider a JSON file agents.json:
[
{
"id": "agent-001",
"name": "Scout",
"config": {
"model": "claude-sonnet-4-20250514",
"temperature": 0.7,
"max_tokens": 4096
}
}
]
When you load this:
LOAD FROM 'agents.json' RETURN *;
LadybugDB infers the config field as STRUCT(model STRING, temperature DOUBLE, max_tokens INT64). You can then COPY FROM directly into a node table whose schema matches this structure. The JSON-to-STRUCT mapping is seamless — but once the data is inside the database, it’s no longer JSON. It’s typed, columnar, and queryable.
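The shape of that mapping can be sketched in a few lines of Python. This is an illustration of the idea only, under the assumption that JSON objects map to STRUCT, integers to INT64, and floats to DOUBLE as described above; infer_type is a hypothetical helper, not the extension's inference algorithm.

```python
import json

def infer_type(value):
    """Map a parsed JSON value to a type name in the notation used in this article."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "INT64"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "STRING"
    if isinstance(value, dict):
        fields = ", ".join(f"{k} {infer_type(v)}" for k, v in value.items())
        return f"STRUCT({fields})"
    if isinstance(value, list) and value:
        return f"{infer_type(value[0])}[]"  # assumes a homogeneous list
    raise ValueError(f"cannot infer a type for {value!r}")

record = json.loads(
    '{"model": "claude-sonnet-4-20250514", "temperature": 0.7, "max_tokens": 4096}'
)
print(infer_type(record))
# STRUCT(model STRING, temperature DOUBLE, max_tokens INT64)
```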
Using STRUCT in Queries
STRUCT properties participate fully in Cypher expressions:
// Filter on nested STRUCT fields
MATCH (a:Agent)
WHERE a.config.temperature < 0.5
AND a.location.city = 'Berlin'
RETURN a.name, a.config.model;
// Return STRUCT as a whole
MATCH (a:Agent {id: 'agent-001'})
RETURN a.config AS agent_configuration;
// Project specific STRUCT fields
MATCH (a:Agent)
RETURN a.name,
a.config.model AS model,
a.location.lat AS latitude;
STRUCT for Conversations: Messages as a Single Node
Here is where STRUCT becomes especially interesting for AI memory systems. Consider the problem of storing a conversation. The naive approach creates a node per message and links them in a chain:
(Conv)-[:HAS_MSG]->(M1)-[:NEXT]->(M2)-[:NEXT]->(M3)
This works, but it means a single conversation turns into potentially hundreds of nodes and edges. For an agent that has thousands of conversations, the graph becomes dominated by message plumbing rather than semantic content.
With STRUCT and LIST, you can represent an entire conversation as a single node:
CREATE NODE TABLE Conversation (
id STRING PRIMARY KEY,
topic STRING,
started_at TIMESTAMP,
ended_at TIMESTAMP,
messages STRUCT(
role STRING,
content STRING,
timestamp TIMESTAMP
)[],
summary STRING,
embedding FLOAT[384]
);
The messages field is a LIST of STRUCTs: an ordered sequence of typed records. Each message has a role (“user”, “assistant”, “system”), content, and timestamp, all type-checked and stored efficiently.
Inserting a conversation:
CREATE (:Conversation {
id: 'conv:2025-03-14-tea',
topic: 'Liu Bao aging chemistry',
started_at: timestamp('2025-03-14T10:00:00'),
ended_at: timestamp('2025-03-14T10:32:00'),
messages: [
{role: 'user', content: 'How does post-fermentation affect Liu Bao?',
timestamp: timestamp('2025-03-14T10:00:00')},
{role: 'assistant', content: 'The golden flowers (Eurotium cristatum) produce...',
timestamp: timestamp('2025-03-14T10:00:15')},
{role: 'user', content: 'Is this similar to shou puer processing?',
timestamp: timestamp('2025-03-14T10:01:30')}
],
summary: 'Discussion of Liu Bao post-fermentation and comparison with shou puer',
embedding: NULL
});
Querying into the message list:
// Find conversations where the user mentioned 'puer'
MATCH (c:Conversation)
WHERE any(m IN c.messages
WHERE m.role = 'user' AND m.content CONTAINS 'puer')
RETURN c.id, c.topic, c.started_at;
The advantage is compression without loss of structure. The conversation node participates in graph relationships — it can connect to topic nodes, agent nodes, knowledge entities extracted from the conversation — while the message sequence lives compactly inside a single property. This is the right granularity for memory graphs: conversations are the atomic units of experience, not individual messages.
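For readers who think in application code, the same single-record design can be sketched in Python. The Message class and user_mentions helper are hypothetical; the data repeats the example conversation above.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Message:
    role: str
    content: str
    timestamp: datetime

conversation = {
    "id": "conv:2025-03-14-tea",
    "messages": [
        Message("user", "How does post-fermentation affect Liu Bao?",
                datetime(2025, 3, 14, 10, 0, 0)),
        Message("assistant", "The golden flowers (Eurotium cristatum) produce...",
                datetime(2025, 3, 14, 10, 0, 15)),
        Message("user", "Is this similar to shou puer processing?",
                datetime(2025, 3, 14, 10, 1, 30)),
    ],
}

def user_mentions(conv, term):
    """True if any user message contains the term (an existential test over the list)."""
    return any(m.role == "user" and term in m.content for m in conv["messages"])

print(user_mentions(conversation, "puer"))      # True
print(user_mentions(conversation, "Eurotium"))  # False: only the assistant said it
```

The whole exchange stays one value with an intact order, and a question about its contents is a scan over one list rather than a walk along a chain of message nodes.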
Lists, Arrays, and Multi-Label Emulation
LIST: Variable-Length Typed Collections
A LIST in LadybugDB is an ordered, variable-length sequence of values, all of the same type. It is declared with square bracket syntax:
CREATE NODE TABLE Entity (
id STRING PRIMARY KEY,
label STRING,
tags STRING[],
scores DOUBLE[]
);
Lists are created with bracket notation:
CREATE (:Entity {
id: 'e:berlin',
label: 'Berlin',
tags: ['city', 'capital', 'european', 'historical'],
scores: [0.95, 0.88, 0.72]
});
And queried with list functions:
// Filter on list contents
MATCH (e:Entity)
WHERE list_contains(e.tags, 'capital')
RETURN e.label;
// Extract by index (list indexing is 1-based)
MATCH (e:Entity {id: 'e:berlin'})
RETURN e.tags[1] AS first_tag;
// List length
MATCH (e:Entity)
RETURN e.label, len(e.tags) AS tag_count;
ARRAY: Fixed-Length Typed Collections
ARRAY is a special case of LIST where the length is fixed at schema definition. This is critical for embedding vectors:
CREATE NODE TABLE Memory (
id STRING PRIMARY KEY,
content STRING,
embedding FLOAT[384]
);
Every Memory node must have exactly 384 floats in its embedding. This fixed-length constraint enables HNSW vector indexes for approximate nearest neighbor search — you can’t build a vector index on a variable-length list.
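A short Python sketch shows why the fixed length matters. It uses a tiny dimension and an exhaustive cosine scan, which is the exact computation an HNSW index approximates; it is not how the index itself works. DIM, check_embedding, and the toy vectors are assumptions made for the example.

```python
import math

DIM = 4  # the schema above uses 384; a small value keeps the example readable

def check_embedding(vec, dim=DIM):
    """Reject vectors whose length differs from the declared dimension."""
    if len(vec) != dim:
        raise ValueError(f"expected {dim} floats, got {len(vec)}")
    return [float(x) for x in vec]

def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical memories with toy embeddings.
memories = {
    "m1": check_embedding([1, 0, 0, 0]),
    "m2": check_embedding([0.9, 0.1, 0, 0]),
    "m3": check_embedding([0, 0, 1, 0]),
}
query = check_embedding([1, 0, 0, 0])
nearest = max(memories, key=lambda k: cosine(memories[k], query))
print(nearest)  # m1
```

Distance functions like this are only defined when both vectors have the same number of components, which is the guarantee the ARRAY type gives the index.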
Multi-Label Emulation with LIST
In LadybugDB, every node belongs to exactly one node table (one “label”). This is a consequence of the typed-table architecture: a row lives in one table. But real-world entities often have multiple roles. A person is simultaneously an author, a researcher, and a tea enthusiast. A concept is both a “tool” and an “algorithm.”
The idiomatic solution is a labels LIST property:
CREATE NODE TABLE EntityNode (
id STRING PRIMARY KEY,
label STRING,
kind STRING,
labels STRING[],
layer STRING
);
CREATE (:EntityNode {
id: 'e:ladybugdb',
label: 'LadybugDB',
kind: 'tool',
labels: ['database', 'graph-database', 'embedded-database', 'kùzu-fork'],
layer: 'instance'
});
Now you can query multi-label semantics:
// Find all entities that are both a database and embedded
MATCH (e:EntityNode)
WHERE list_contains(e.labels, 'database')
AND list_contains(e.labels, 'embedded-database')
RETURN e.label, e.labels;
// Find entities by any label in a set
MATCH (e:EntityNode)
WHERE list_contains(e.labels, 'graph-database')
OR list_contains(e.labels, 'knowledge-graph')
RETURN e.label;
This is more expressive than a single kind field, and it avoids the explosion of node tables that you’d need if every label combination required its own table. The LIST property becomes a lightweight set-membership predicate.
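Viewed from application code, the two label queries are plain set operations. The sketch below assumes hypothetical in-memory rows and helper names (has_all, has_any); it only illustrates the membership logic.

```python
# Hypothetical entities; labels mirror the STRING[] column described above.
entities = [
    {"label": "LadybugDB",
     "labels": ["database", "graph-database", "embedded-database", "kùzu-fork"]},
    {"label": "Berlin", "labels": ["city", "capital"]},
]

def has_all(entity, required):
    """Conjunction: every required label is present."""
    return set(required) <= set(entity["labels"])

def has_any(entity, candidates):
    """Disjunction: at least one candidate label is present."""
    return bool(set(candidates) & set(entity["labels"]))

both = [e["label"] for e in entities if has_all(e, ["database", "embedded-database"])]
either = [e["label"] for e in entities if has_any(e, ["graph-database", "knowledge-graph"])]
print(both, either)  # ['LadybugDB'] ['LadybugDB']
```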
UNION Types: Polymorphic Properties
What UNION Is
UNION in LadybugDB (inherited from Kùzu) is a tagged variant type, analogous to std::variant in C++ or a sum type in functional programming. A UNION column can hold a value of one of several declared types, with a tag field indicating which type is currently active.
Internally, UNION is implemented as a STRUCT with an extra tag key.
CREATE NODE TABLE PropertyValue (
id STRING PRIMARY KEY,
name STRING,
value UNION(
str_val STRING,
int_val INT64,
float_val DOUBLE,
bool_val BOOLEAN,
list_val STRING[]
)
);
A PropertyValue node can hold a string, an integer, a double, a boolean, or a list of strings — but only one at a time. The tag field tells you which.
Why UNION Matters for Knowledge Graphs
Knowledge graphs are inherently polymorphic. The “value” of a property can be anything: a name (string), an age (integer), a confidence score (double), a truth value (boolean), or a set of tags (list). In a traditional property graph, you’d either create separate columns for each type (wasteful and rigid) or serialize everything to strings (losing type information).
UNION lets you store the actual typed value while maintaining a single column:
CREATE (:PropertyValue {
id: 'pv:age-alice',
name: 'age',
value: union_value(int_val := 34)
});
CREATE (:PropertyValue {
id: 'pv:name-alice',
name: 'name',
value: union_value(str_val := 'Alice')
});
CREATE (:PropertyValue {
id: 'pv:skills-alice',
name: 'skills',
value: union_value(list_val := ['Rust', 'Python', 'Cypher'])
});
Querying by tag:
MATCH (pv:PropertyValue)
WHERE union_tag(pv.value) = 'int_val'
RETURN pv.name, union_extract(pv.value, 'int_val') AS int_value;
This pattern is essential when you model properties as graph nodes (more on this below).
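If the tagged-variant idea is unfamiliar, a small Python model may help. The tag names follow the UNION declaration above; the UnionValue class and the extract helper are hypothetical, and returning None for an inactive tag is a choice made for this sketch rather than a statement about the database.

```python
from dataclasses import dataclass
from typing import Any

# Tag names follow the article's UNION declaration; the class itself is hypothetical.
ALLOWED = {"str_val": str, "int_val": int, "float_val": float,
           "bool_val": bool, "list_val": list}

@dataclass(frozen=True)
class UnionValue:
    tag: str
    value: Any

    def __post_init__(self):
        expected = ALLOWED.get(self.tag)
        if expected is None:
            raise ValueError(f"unknown tag {self.tag!r}")
        # bool is a subclass of int, so reject it explicitly for int_val
        is_bool_as_int = expected is int and isinstance(self.value, bool)
        if not isinstance(self.value, expected) or is_bool_as_int:
            raise TypeError(f"{self.tag} expects {expected.__name__}")

def extract(u, tag):
    """Return the payload if the tag is active, else None (a choice made for this sketch)."""
    return u.value if u.tag == tag else None

age = UnionValue("int_val", 34)
name = UnionValue("str_val", "Alice")
print(age.tag, extract(age, "int_val"), extract(name, "int_val"))  # int_val 34 None
```

Exactly one alternative is active per value, and the tag is what lets a reader recover the right type without guessing.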
Properties in Nodes — The Power of Typed Schemas
Properties as Ontological Commitments
When you define a node table in LadybugDB, you are not merely allocating storage. You are making an ontological commitment: “every entity of this type has these properties, with these types, and I will enforce this at write time.”
CREATE NODE TABLE SimilarEdge (
id STRING PRIMARY KEY,
layer STRING,
kind STRING,
context STRING,
similarity DOUBLE,
learned_at TIMESTAMP,
expired_at TIMESTAMP
);
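This declaration says: similarity is always a DOUBLE, context is always a STRING, and every edge-node has learned_at and expired_at columns for its temporal bounds. The types are a contract, not a suggestion. The expected 0 to 1 range of similarity is a modeling convention, though; the DOUBLE type alone does not enforce it.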
The power of this contract is that queries can rely on it:
MATCH (src:EntityNode)-[:CONNECTS]->(se:SimilarEdge)-[:BINDS]->(tgt:EntityNode)
WHERE se.similarity > 0.8
AND se.expired_at IS NULL
AND se.layer = 'domain'
RETURN src.label, se.similarity, tgt.label
ORDER BY se.similarity DESC;
The query planner knows that similarity is a DOUBLE, that expired_at is a nullable TIMESTAMP, and that layer is a STRING. It can compile this into efficient vectorized operations without any runtime type checking.
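Two parts of this contract live outside the column types: a value range such as 0 to 1, and the meaning of the learned_at and expired_at pair. A minimal Python sketch of how an application might handle both follows; the rows and the helpers check_similarity and valid_at are hypothetical.

```python
from datetime import datetime

# Hypothetical edge-node rows shaped like SimilarEdge.
edges = [
    {"id": "se:1", "similarity": 0.91,
     "learned_at": datetime(2024, 1, 1), "expired_at": None},
    {"id": "se:2", "similarity": 0.85,
     "learned_at": datetime(2024, 1, 1), "expired_at": datetime(2024, 6, 1)},
    {"id": "se:3", "similarity": 0.40,
     "learned_at": datetime(2025, 1, 1), "expired_at": None},
]

def check_similarity(edge):
    """Application-side range check; a DOUBLE column by itself accepts any double."""
    if not 0.0 <= edge["similarity"] <= 1.0:
        raise ValueError(f"{edge['id']}: similarity out of [0, 1]")

def valid_at(edge, when):
    """An edge holds at `when` if it was learned by then and had not yet expired."""
    not_expired = edge["expired_at"] is None or when < edge["expired_at"]
    return edge["learned_at"] <= when and not_expired

for e in edges:
    check_similarity(e)

now = datetime(2025, 3, 14)
current_strong = [e["id"] for e in edges if valid_at(e, now) and e["similarity"] > 0.8]
print(current_strong)  # ['se:1']
```

The same valid_at test with an earlier date answers "what did the agent believe then?", which is the main reason to keep expired rows instead of deleting them.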
Properties on Relationships
Relationship tables in LadybugDB can also carry properties:
CREATE REL TABLE CONNECTS (
FROM EntityNode TO SimilarEdge,
FROM EntityNode TO ContainsEdge,
FROM EntityNode TO HasPropertyEdge,
FROM EntityNode TO LeadsToEdge,
weight DOUBLE DEFAULT 1.0,
added_at TIMESTAMP DEFAULT current_timestamp()
);
This is important for the bipartite Semantic Spacetime schema. The CONNECTS and BINDS relationships themselves can carry metadata — a weight, a timestamp, an annotation — without polluting the edge-node’s own property space.
Properties as Graph Nodes — The HAS_PROPERTY Relation
The Shift in Perspective
Everything we’ve discussed so far treats properties as columns inside a table. This is the standard property graph view: nodes have attributes, attributes are scalar or nested values, and you query them with dot notation.
But there’s another way to think about properties — one that aligns with how Semantic Spacetime models knowledge: properties are themselves entities, connected to their subjects by the HAS_PROPERTY relation.
In Mark Burgess’s γ(3,4) framework, EXPRESSES_PROPERTY (or HAS_PROPERTY) is one of the four fundamental relation types alongside NEAR/SIMILAR_TO, LEADS_TO, and CONTAINS. It is not merely an attribute assignment — it is a directed assertion that a subject manifests a particular quality, and that quality has its own identity, context, and lifetime.
The HasPropertyEdge Pattern
In the Semantic Spacetime schema for LadybugDB, we make this explicit:
CREATE NODE TABLE HasPropertyEdge (
id STRING PRIMARY KEY,
layer STRING,
kind STRING, // 'intrinsic' | 'relational' | 'derived'
property_name STRING,
learned_at TIMESTAMP,
expired_at TIMESTAMP
);
And we connect property values as entities:
// The entity (assumes an EntityNode table that includes learned_at and expired_at columns)
CREATE (:EntityNode {
id: 'e:alice',
label: 'Alice',
kind: 'actor',
layer: 'instance',
learned_at: timestamp('2025-01-01T00:00:00'),
expired_at: NULL
});
// The property value as an entity
CREATE (:EntityNode {
id: 'e:skill-rust',
label: 'Rust Programming',
kind: 'skill',
layer: 'domain',
learned_at: timestamp('2023-06-15T00:00:00'),
expired_at: NULL
});
// The HAS_PROPERTY edge-node connecting them
CREATE (:HasPropertyEdge {
id: 'hp:alice-rust',
layer: 'instance',
kind: 'relational',
property_name: 'has_skill',
learned_at: timestamp('2023-06-15T00:00:00'),
expired_at: NULL
});
// Wire the bipartite connections
MATCH (alice:EntityNode {id: 'e:alice'}),
(hp:HasPropertyEdge {id: 'hp:alice-rust'})
CREATE (alice)-[:CONNECTS]->(hp);
MATCH (hp:HasPropertyEdge {id: 'hp:alice-rust'}),
(rust:EntityNode {id: 'e:skill-rust'})
CREATE (hp)-[:BINDS]->(rust);
Why Promote Properties to Nodes?
This might seem like unnecessary indirection. Why not just store skills: ['Rust', 'Python'] as a LIST property? There are several reasons:
Property-to-property relations. Once a property is a node, it can participate in relationships. “Rust Programming” can be connected via SimilarEdge to “C++ Programming” (with similarity: 0.72). It can be contained in “Systems Programming Languages” via ContainsEdge. It can lead to “Memory Safety Expertise” via LeadsToEdge. None of this is possible if “Rust” is a string in a list.
Temporal provenance. The HasPropertyEdge node carries its own learned_at and expired_at timestamps. When did Alice learn Rust? When did she stop using it? This temporal provenance is first-class in the graph, not an annotation buried in a property.
Contextual attribution. The kind field on HasPropertyEdge distinguishes intrinsic properties (height, birth date), relational properties (friendship, membership), and derived properties (computed similarity, inferred category). This matters for reasoning: intrinsic properties are stable, relational properties depend on context, and derived properties need recomputation.
Query expressiveness. With properties as nodes, you can ask questions that are impossible in a flat property model:
// What skills are similar to Alice's skills?
MATCH (alice:EntityNode {id: 'e:alice'})
-[:CONNECTS]->(hp:HasPropertyEdge {property_name: 'has_skill'})
-[:BINDS]->(skill:EntityNode)
-[:CONNECTS]->(se:SimilarEdge)
-[:BINDS]->(related_skill:EntityNode)
WHERE se.similarity > 0.6
RETURN skill.label AS known_skill,
related_skill.label AS related,
se.similarity;
// What properties do Alice and Bob share?
MATCH (alice:EntityNode {id: 'e:alice'})
-[:CONNECTS]->(hp1:HasPropertyEdge)-[:BINDS]->(prop:EntityNode),
(bob:EntityNode {id: 'e:bob'})
-[:CONNECTS]->(hp2:HasPropertyEdge)-[:BINDS]->(prop)
RETURN prop.label AS shared_property,
hp1.property_name AS property_type;
These are graph traversals through the property space — something that flat attributes simply cannot support.
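The traversal pattern itself is simple enough to model in plain Python, which can help when reasoning about what the Cypher is doing. The sketch below keeps a hypothetical in-memory copy of the bipartite wiring (entity, then edge-node, then entity); the hop helper and the ids for the C++ skill and the SimilarEdge are made up for the example, while the 0.72 similarity is the figure used earlier in this section.

```python
# Hypothetical in-memory copy of the bipartite wiring: entity -> edge-node -> entity.
entities = {
    "e:alice": "Alice",
    "e:skill-rust": "Rust Programming",
    "e:skill-cpp": "C++ Programming",
}
edge_nodes = {
    "hp:alice-rust": {"type": "HasPropertyEdge", "property_name": "has_skill"},
    "se:rust-cpp": {"type": "SimilarEdge", "similarity": 0.72},
}
connects = [("e:alice", "hp:alice-rust"), ("e:skill-rust", "se:rust-cpp")]  # entity -> edge-node
binds = [("hp:alice-rust", "e:skill-rust"), ("se:rust-cpp", "e:skill-cpp")]  # edge-node -> entity

def hop(source, edge_type, predicate=lambda e: True):
    """Yield (edge_node, target) pairs reachable from source through one edge-node."""
    for src, en in connects:
        edge = edge_nodes[en]
        if src == source and edge["type"] == edge_type and predicate(edge):
            for en2, target in binds:
                if en2 == en:
                    yield edge, target

related = [
    (entities[skill], entities[other], se["similarity"])
    for _, skill in hop("e:alice", "HasPropertyEdge",
                        lambda e: e["property_name"] == "has_skill")
    for se, other in hop(skill, "SimilarEdge", lambda e: e["similarity"] > 0.6)
]
print(related)  # [('Rust Programming', 'C++ Programming', 0.72)]
```

Each hop passes through an edge-node that can be filtered on its own fields, which is exactly what a string inside a list cannot offer.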
Property Sets and Polymorphic Entities
Objects as Sets of Properties
There is a deep idea here that connects to foundational thinking in philosophy and type theory: an object is not a thing with properties — an object is its set of properties.
In the Semantic Spacetime view, an EntityNode is a point in semantic space. Its identity is established by its id, but its meaning is determined by the constellation of HasPropertyEdge connections radiating outward. “Alice” is the intersection of “has skill Rust”, “lives in Berlin”, “works at Anthropic”, “speaks Ukrainian” — and if you removed all these properties, what remains is an empty identifier.
This is the property-set view of entities. It aligns with:
Feature-based semantics in linguistics, where the meaning of a word is a vector of features.
Prototype theory in cognitive science, where category membership is determined by shared properties.
Structural typing in programming languages, where a type is defined by its interface (the set of operations/properties it supports), not its name.
Polymorphism Through Property Sets
In LadybugDB, polymorphism emerges naturally from the property-set view. Two entities don’t need to share a node table to be “similar” — they need to share property connections.
// Find entities with similar property profiles
MATCH (a:EntityNode)-[:CONNECTS]->(hp:HasPropertyEdge)-[:BINDS]->(prop:EntityNode),
(b:EntityNode)-[:CONNECTS]->(:HasPropertyEdge)-[:BINDS]->(prop)
WHERE a.id < b.id // '<' instead of '<>' so each pair is reported once
WITH a, b, collect(prop.label) AS shared_properties,
count(prop) AS overlap
WHERE overlap >= 3
RETURN a.label, b.label, shared_properties, overlap
ORDER BY overlap DESC;
This query finds entities that share at least three property-values — regardless of what “type” they are. A person and an organization might both “have property: located in Berlin”, “have property: works with Rust”, “have property: publishes in English.” The shared property set makes them structurally similar even if they belong to different ontological categories.
This is polymorphism at the knowledge level, not the code level.
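The overlap computation is easy to prototype outside the database. In the Python sketch below, the profiles are hypothetical property sets of the kind such a traversal would collect. The Jaccard ratio is an addition of this sketch, included because a raw count favors entities with many properties; it is not part of the query above.

```python
from itertools import combinations

# Hypothetical property sets, as they might be collected from HasPropertyEdge traversals.
profiles = {
    "Alice": {"located in Berlin", "works with Rust",
              "publishes in English", "speaks Ukrainian"},
    "Acme Org": {"located in Berlin", "works with Rust", "publishes in English"},
    "Bob": {"located in Berlin"},
}

def shared_profiles(profiles, min_overlap=3):
    """Return (a, b, shared, jaccard) for each unordered pair meeting the threshold."""
    results = []
    for a, b in combinations(sorted(profiles), 2):
        shared = profiles[a] & profiles[b]
        if len(shared) >= min_overlap:
            jaccard = len(shared) / len(profiles[a] | profiles[b])
            results.append((a, b, sorted(shared), jaccard))
    return sorted(results, key=lambda r: len(r[2]), reverse=True)

for a, b, shared, jaccard in shared_profiles(profiles):
    print(a, b, shared, round(jaccard, 2))
# Acme Org Alice ['located in Berlin', 'publishes in English', 'works with Rust'] 0.75
```

Nothing in the comparison looks at what kind of thing each entity is; a person and an organization match purely on the properties they share.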
FIRE Ontology and Property Kinds
The kinds of properties map to the FIRE ontology pattern (Facts, Inferences, Relations, Experiences):
Intrinsic properties are Facts: stable, self-describing attributes (name, creation date, type).
Relational properties are Relations: attributes that exist only in context (membership, skill-application, authorship).
Derived properties are Inferences: computed or learned attributes (embedding similarity, predicted category, aggregated score).
By tagging each HasPropertyEdge with its kind, the agent can reason about which properties to trust (intrinsic), which might be context-dependent (relational), and which need recomputation (derived).
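As a minimal sketch, an agent could turn that tagging into a lookup. The kind values and their FIRE categories come from the mapping above; the handling labels (trust, check_context, recompute) and the plan function are hypothetical names chosen for the example.

```python
from enum import Enum

class Kind(Enum):
    INTRINSIC = "intrinsic"
    RELATIONAL = "relational"
    DERIVED = "derived"

# Categories follow the mapping in the text; the handling labels are hypothetical.
FIRE_CATEGORY = {
    Kind.INTRINSIC: "Fact",
    Kind.RELATIONAL: "Relation",
    Kind.DERIVED: "Inference",
}
HANDLING = {
    Kind.INTRINSIC: "trust",
    Kind.RELATIONAL: "check_context",
    Kind.DERIVED: "recompute",
}

def plan(edge):
    """Return (FIRE category, handling) for a HasPropertyEdge-like record."""
    kind = Kind(edge["kind"])  # raises ValueError for an unknown kind
    return FIRE_CATEGORY[kind], HANDLING[kind]

print(plan({"id": "hp:alice-rust", "kind": "relational"}))  # ('Relation', 'check_context')
```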
Bringing It Together — A Memory Schema in Practice
Let’s assemble a practical schema that uses all these features — STRUCT for compact embedded data, LIST for multi-valued attributes, UNION for polymorphic values, and HasPropertyEdge for queryable property relations:
// Entities with structured metadata and multi-label support
CREATE NODE TABLE EntityNode (
id STRING PRIMARY KEY,
label STRING,
kind STRING,
labels STRING[], // multi-label emulation
layer STRING,
metadata STRUCT(
source STRING,
confidence DOUBLE,
version INT32
),
learned_at TIMESTAMP,
expired_at TIMESTAMP
);
// Conversations with embedded message sequences
CREATE NODE TABLE Conversation (
id STRING PRIMARY KEY,
topic STRING,
messages STRUCT(role STRING, content STRING, ts TIMESTAMP)[],
summary STRING,
embedding FLOAT[384],
started_at TIMESTAMP,
ended_at TIMESTAMP
);
// Edge-node for property relations
CREATE NODE TABLE HasPropertyEdge (
id STRING PRIMARY KEY,
layer STRING,
kind STRING,
property_name STRING,
value UNION(
str_val STRING,
int_val INT64,
float_val DOUBLE,
bool_val BOOLEAN,
list_val STRING[]
),
learned_at TIMESTAMP,
expired_at TIMESTAMP
);
// Bipartite wiring
CREATE REL TABLE CONNECTS (
FROM EntityNode TO HasPropertyEdge,
FROM Conversation TO HasPropertyEdge
);
CREATE REL TABLE BINDS (
FROM HasPropertyEdge
TO EntityNode
);
In this schema:
STRUCT handles the metadata block on entities and the messages list on conversations — compact, typed, queryable.
LIST handles multi-label (labels STRING[]) and the message sequence (STRUCT(...)[]).
UNION handles the polymorphic value field on property edge-nodes, where a property might carry a string, a number, or a list.
HasPropertyEdge promotes property attribution to a first-class graph relation, enabling property-to-property traversals and temporal provenance.
The result is a graph where properties are simultaneously efficient (columnar storage, vectorized scans) and expressive (queryable, composable, connected). This is what it means to take properties seriously in a graph database — not as metadata, but as structure.
Properties All the Way Down
The property graph model gets its name from properties, yet most implementations treat them as flat annotations. LadybugDB’s type system (STRUCT, LIST, ARRAY, and UNION, plus a MAP type not covered in this article) gives properties the same structural richness that nodes and edges enjoy.
But the deeper insight comes from Semantic Spacetime: properties are not just values on nodes. They are relationships between entities and qualities, carrying their own identity, temporality, and semantics. The HAS_PROPERTY relation in the γ(3,4) framework is as fundamental as SIMILAR_TO, CONTAINS, and LEADS_TO.
When you model properties as graph nodes, several things happen at once. Property-to-property relations become possible. Temporal provenance becomes natural. Polymorphic entities emerge from shared property sets. And the boundary between “schema” and “data” blurs productively — because in a knowledge graph, the schema is the knowledge.
LadybugDB gives you both levels. Use STRUCT and LIST for properties that should be compact and fast. Promote to HasPropertyEdge for properties that need to participate in the graph. The two approaches coexist, and the choice between them is itself a meaningful ontological decision.
This is the view from Semantic Spacetime: objects are sets of properties, properties are graph relations, and the graph is the ontology. With LadybugDB, all of this runs on your laptop — or your phone — embedded in your agent process, enforced at write time, and queryable in Cypher.