Project details
The RFP seeks to develop tools and techniques that refine how we work with graph-based data and integrate with the reasoning capabilities of AGI systems. Photrek aims to build a flexible framework that improves the process of specifying digital systems in a rich, connected hypergraph.
Data can mean different things in different contexts, and sometimes those contexts overlap in complicated ways. Duplicate data should be stored only once on the file system, with every use referencing the same link. Sometimes you're working with a large set of data with uniform types; other times you're working with a smaller set of data with rich connectivity between the nodes in the graph. General graph-based knowledge management systems need to account for both situations, allow them to coexist in the same knowledge space, and give the user control over how that information is stored efficiently on disk.
Performance is often given higher priority than flexibility, but there are benefits to exploring the possibility space before optimizing a solution. MeTTa is an interesting language for describing relationships between symbolic representations and for unifying logical constraints within a set of rules, with potential for building flexible systems. We will investigate how our approach to knowledge graphs maps to MeTTa’s semantics, and we will develop a strategy for how Dims should augment the Hyperon framework with added capabilities. Integrating the MORK graph database with Dims will enhance many of the use cases we hope to improve, but our main aim is to maximize flexibility by leaving most constraints to the user, who can then experiment with different underlying architectures and interfaces. These choices will be exposed to the user as part of the knowledge graph itself through the metaprogramming features of our implementation language, Zig.
The first and easiest step in building Dims will be to implement a prototype of its core semantic primitives and use them to help formalize the spec. Because the system’s architecture is meant to be part of the knowledge graph, we don’t need to focus on performance up front. If the initial implementation works, iterating on it should be fairly easy for anyone who wants to make changes. In the following paragraphs, we will try to make clear why this is the case by walking through the thought process behind the core primitives of Dims and how they benefit knowledge management.
At the lowest level, Dims aims to store all data in bit-vectors, or binary blobs with specified lengths, along with sets of rules that dictate how they should be interpreted in a given context. Rules are essentially functions, or logical predicates, that return true or false depending on whether the binary data conforms to its contextual constraints. Rule sets should be composable as long as they don’t cause logical contradictions, and you can have as many or as few rules as you like. Ordered by inclusion, two rule sets can be composed either by their least upper bound (the union, which enforces all constraints of both sets) or by their greatest lower bound (the intersection, which keeps only the rules both sets share). Among consistent rule sets, the greatest lower bound always exists, but the least upper bound exists only when the union introduces no contradiction, so the structure is a semilattice rather than a full lattice. Do you want to enforce all constraints of both sets, or only agree on the rules that exist in both? Are there rules in one set that preclude rules in the other? Our approach aligns with the logic of truthmaker semantics (Fine, 2017; Champollion, 2024).
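The two composition strategies can be sketched in Python. This is a minimal illustration under stated assumptions, not the Dims design: blobs are plain `bytes` (so lengths are whole bytes rather than bits), a rule set is a hypothetical dictionary from rule names to predicates, and two rules with the same name are assumed to be the same predicate. `compose_all` takes the union of two sets and `compose_shared` takes their intersection; contradiction checking is left out.

```python
from typing import Callable, Dict

# Hypothetical types: a rule is a predicate over a binary blob,
# and a rule set maps rule names to rules.
Rule = Callable[[bytes], bool]
RuleSet = Dict[str, Rule]


def conforms(blob: bytes, rules: RuleSet) -> bool:
    """True if the blob satisfies every rule in the set."""
    return all(rule(blob) for rule in rules.values())


def compose_all(a: RuleSet, b: RuleSet) -> RuleSet:
    """Union: enforce every constraint from both sets.
    Assumes rules that share a name are the same predicate."""
    return {**a, **b}


def compose_shared(a: RuleSet, b: RuleSet) -> RuleSet:
    """Intersection: keep only the rules that both sets contain."""
    return {name: rule for name, rule in a.items() if name in b}


def is_utf8(blob: bytes) -> bool:
    try:
        blob.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Two hypothetical contexts that interpret the same bytes differently.
short_text: RuleSet = {"utf8": is_utf8, "max_64_bytes": lambda b: len(b) <= 64}
text_line: RuleSet = {"utf8": is_utf8, "ends_with_newline": lambda b: b.endswith(b"\n")}

strict = compose_all(short_text, text_line)
shared = compose_shared(short_text, text_line)

print(conforms(b"hello", strict))  # False: no trailing newline
print(conforms(b"hello", shared))  # True: only the shared "utf8" rule applies
```

In terms of the questions above, `compose_all` enforces all constraints of both sets, while `compose_shared` agrees only on the rules that exist in both.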
For the rest of this description, it will be beneficial for us to focus on higher level goals and save the technical details for the formal spec. First, we will take a look at the internal structure of the Git version control system, and let that inspire our approach. Then, we will take a look at some ideas from personal knowledge management (PKM), and combine them with our ideas from Git. Finally, we will look at ways to govern the expansion of knowledge in a distributed environment.
When you use Git for version control in your software project, it stores compressed copies of your files in the `.git` folder and references them by the cryptographic hashes of their content. Folders are stored as trees: lists of references to file data and to other trees that represent subdirectories in your file hierarchy. Git keeps every version of every file, compressing similar versions as deltas in its packfiles, and maintains a persistent history so that you can go back and check out any previous version, if it makes sense to do so.
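The content addressing described above can be sketched in a few lines of Python. `git_blob_id` follows Git's blob hashing (SHA-1 over the header `blob <size>`, a NUL byte, and the content), so in a default SHA-1 repository its output matches `git hash-object` for the same bytes. The `ObjectStore` class and its tree encoding are hypothetical simplifications: real Git trees use a binary format that includes file modes, and real objects are zlib-compressed on disk.

```python
import hashlib
from typing import Dict, List, Tuple


def git_blob_id(content: bytes) -> str:
    # Git hashes a blob as SHA-1 over b"blob <size>" + NUL byte + content.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


class ObjectStore:
    """Hypothetical content-addressed store: each distinct object is kept once."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put_blob(self, content: bytes) -> str:
        oid = git_blob_id(content)
        self.objects.setdefault(oid, content)  # identical content -> one entry
        return oid

    def put_tree(self, entries: List[Tuple[str, str]]) -> str:
        # Simplified tree: one "name object-id" line per entry, sorted by name.
        # This is NOT Git's binary tree format, just the same idea.
        body = "".join(f"{name} {oid}\n" for name, oid in sorted(entries)).encode()
        oid = hashlib.sha1(b"tree " + str(len(body)).encode() + b"\0" + body).hexdigest()
        self.objects.setdefault(oid, body)
        return oid


store = ObjectStore()
readme = store.put_blob(b"hello\n")
copy = store.put_blob(b"hello\n")  # same bytes, same ID, not stored again
src = store.put_tree([("main.txt", store.put_blob(b"print\n"))])
root = store.put_tree([("README", readme), ("COPY", copy), ("src", src)])

print(readme)              # ce013625030ba8dba906f756967f9e9ca394464a
print(len(store.objects))  # 4: two blobs and two trees
```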
One of the main advantages of Git over centralized version control systems is the ability to push and pull changes between any two instances of a Git repository. There is no need for a central server, and you can create as many branches as you want to work on different features. A Git repository's history is a directed acyclic graph of commits, in which people can fork or merge code however they like as they iterate on the projects they contribute to. With a slightly different approach to Git's internal structure, and by adding some simple type constraints, it's possible to construct much richer knowledge graphs with much more powerful features.
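The fork-and-merge structure can be illustrated with a small commit graph. This sketch is hypothetical: commit IDs are plain strings instead of hashes, and `merge_bases` computes the best common ancestors in the spirit of `git merge-base` rather than reproducing Git's algorithm. Requiring every parent to exist before its child is what keeps the graph acyclic.

```python
from typing import Dict, Set, Tuple


class History:
    """Hypothetical commit DAG: each commit ID maps to its parent IDs."""

    def __init__(self) -> None:
        self.parents: Dict[str, Tuple[str, ...]] = {}

    def commit(self, cid: str, *parents: str) -> str:
        # Commits are immutable and may only point at existing commits,
        # so no cycle can ever form.
        if cid in self.parents:
            raise ValueError(f"commit {cid} already exists")
        missing = [p for p in parents if p not in self.parents]
        if missing:
            raise KeyError(f"unknown parents: {missing}")
        self.parents[cid] = parents
        return cid

    def ancestors(self, cid: str) -> Set[str]:
        """Every commit reachable from cid, including cid itself."""
        seen: Set[str] = set()
        stack = [cid]
        while stack:
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(self.parents[c])
        return seen

    def merge_bases(self, a: str, b: str) -> Set[str]:
        """Common ancestors of a and b that no other common ancestor descends from."""
        common = self.ancestors(a) & self.ancestors(b)
        return {c for c in common
                if not any(c in self.ancestors(o) - {o} for o in common)}


h = History()
h.commit("root")
h.commit("a1", "root")     # work on the main line
h.commit("b1", "root")     # a fork for a feature
h.commit("a2", "a1")
h.commit("m", "a2", "b1")  # a merge commit has two parents

print(h.merge_bases("a2", "b1"))  # {'root'}
print(h.merge_bases("m", "b1"))   # {'b1'}
```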
There are some limitations of Git that we aim to address with Dims. Git's diffs and merges operate on lines of text, as it was primarily designed for storing text files. When working with arbitrary binary data, it makes more sense to encode changes as a list of ranges over the immutable historical version of the file. Each element in this overlay should point to the new data that replaces that part of the old file. Constraints on the start and end positions of each range can be applied based on how you tokenize the data.
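A minimal sketch of such an overlay, assuming byte offsets, half-open ranges, and non-overlapping patches (all assumptions for illustration; `Patch` and `apply_overlay` are hypothetical names). The base version is never modified; a new version is materialized by copying unchanged spans and splicing in the replacement data.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Patch:
    start: int          # inclusive byte offset into the base version
    end: int            # exclusive byte offset; start == end means a pure insertion
    replacement: bytes  # new data that replaces base[start:end]


def apply_overlay(base: bytes, overlay: List[Patch]) -> bytes:
    """Build a new version from an immutable base and non-overlapping patches."""
    out: List[bytes] = []
    cursor = 0
    for p in sorted(overlay, key=lambda patch: patch.start):
        if not (cursor <= p.start <= p.end <= len(base)):
            raise ValueError(f"invalid or overlapping range {p.start}..{p.end}")
        out.append(base[cursor:p.start])  # unchanged bytes before this patch
        out.append(p.replacement)         # new bytes for this range
        cursor = p.end
    out.append(base[cursor:])             # unchanged tail
    return b"".join(out)


base = bytes([0, 1, 2, 3, 4, 5])
overlay = [Patch(1, 3, b"\xff"), Patch(4, 4, b"\xaa\xbb")]
print(apply_overlay(base, overlay))  # b'\x00\xff\x03\xaa\xbb\x04\x05'
```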
For example, if you specify your data as UTF-8 encoded and enforce line-based tokenization, you get a result similar to how Git works. However, there are many other ways to tokenize data for different use cases. Your interfaces in the knowledge graph should allow you to structurally edit data at the tokenization level of your choosing, whether you’re doing version control or mapping text to an LLM’s vector space. Overlays are one of the core structures Dims uses to specify how data at a given contextual level maps to other parts of the knowledge graph.
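Tokenization constraints can be expressed as the set of offsets where a range is allowed to start or end. The two hypothetical tokenizers below both assume UTF-8 input: one allows only line boundaries, which gives Git-like granularity, and the other allows any character boundary. The same edit range can be valid under one and invalid under the other.

```python
from typing import Iterable, Set, Tuple


def line_boundaries(data: bytes) -> Set[int]:
    """Offsets at the start of each line, plus the end of the data."""
    data.decode("utf-8")  # raises UnicodeDecodeError if the data is not UTF-8
    bounds = {0, len(data)}
    bounds.update(i + 1 for i, byte in enumerate(data) if byte == 0x0A)  # after each b"\n"
    return bounds


def char_boundaries(data: bytes) -> Set[int]:
    """Offsets that do not fall inside a multi-byte UTF-8 character."""
    data.decode("utf-8")
    # UTF-8 continuation bytes look like 0b10xxxxxx; a boundary never precedes one.
    return {i for i in range(len(data) + 1)
            if i == len(data) or (data[i] & 0xC0) != 0x80}


def respects(bounds: Set[int], ranges: Iterable[Tuple[int, int]]) -> bool:
    """True if every (start, end) range begins and ends on an allowed offset."""
    return all(start in bounds and end in bounds for start, end in ranges)


doc = "héllo\nworld\n".encode("utf-8")  # "é" takes two bytes
lines, chars = line_boundaries(doc), char_boundaries(doc)

print(respects(lines, [(0, 7)]))  # True: replaces the whole first line
print(respects(lines, [(0, 3)]))  # False: stops mid-line
print(respects(chars, [(0, 3)]))  # True: ends after "hé"
print(respects(chars, [(0, 2)]))  # False: splits "é"
```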
Photrek plans to integrate ideas from PKM systems to add rich documentation capabilities to your knowledge graphs. Graph-based PKM systems, like Obsidian, allow you to link ideas together in arbitrary ways. Using links to connect your documents means that they don't have to conform to the constraints of a file hierarchy. It's often beneficial to use the same idea in many different contexts at the same time. One strategy in PKM systems is to use maps of content (MOCs) as entry points instead of folders, since files and other MOCs can be referenced in many places at the same time. Another beneficial feature of PKM systems is the ability to see backlinks, or all the places where a document is referenced.
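Backlinks are the inverse of a note's outgoing links, so they can be derived rather than maintained by hand. In this sketch the note titles are hypothetical, and each map of content (MOC) is just another note whose links act as an entry point; the same note can appear in several MOCs at once.

```python
from collections import defaultdict
from typing import Dict, Set


def backlinks(links: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert forward links (note -> notes it links to) into
    backlinks (note -> notes that link to it)."""
    back: Dict[str, Set[str]] = defaultdict(set)
    for source, targets in links.items():
        for target in targets:
            back[target].add(source)
    return dict(back)


links = {
    "MOC: Version control": {"Git internals", "Overlays"},
    "MOC: Knowledge management": {"Backlinks", "Overlays"},
    "Overlays": {"Git internals"},
}

back = backlinks(links)
print(sorted(back["Overlays"]))       # ['MOC: Knowledge management', 'MOC: Version control']
print(sorted(back["Git internals"]))  # ['MOC: Version control', 'Overlays']
```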
Adding these features to a system like Git frees its version control capabilities from the constraints of individual file hierarchies. You could clone multiple repositories into the same knowledge graph without worrying about where they are stored on disk. If repositories share substructures with the same cryptographic hash, that data only needs to be stored once on your file system. Collaboration, where work is shared between projects, is an important goal here. This is effectively what package managers do, but it doesn't require a unified registry, since Git-style version control is decentralized. Different people could create their own registries that can be forked and merged as anyone sees fit.
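To illustrate how shared substructures are stored once, the sketch below clones two hypothetical repositories into a single content-addressed store. Directories are hashed Merkle-style from their sorted entries, so an identical `vendor` directory gets an identical hash in both repositories. It uses SHA-256 and a made-up encoding rather than Git's object format, and the repository names are placeholders.

```python
import hashlib
from typing import Dict, Union

# A directory maps names to file contents (bytes) or to nested directories.
Node = Union[bytes, dict]


class SharedStore:
    """One hypothetical content-addressed store shared by every cloned repository."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self.roots: Dict[str, str] = {}  # repository name -> root hash

    def _put(self, kind: bytes, body: bytes) -> str:
        oid = hashlib.sha256(kind + b"\0" + body).hexdigest()
        self.objects.setdefault(oid, body)  # identical objects are stored once
        return oid

    def add(self, node: Node) -> str:
        if isinstance(node, bytes):
            return self._put(b"blob", node)
        # A directory's hash depends only on its entries' names and hashes.
        entries = "".join(f"{name}\t{self.add(child)}\n"
                          for name, child in sorted(node.items()))
        return self._put(b"tree", entries.encode())

    def clone(self, repo: str, tree: dict) -> str:
        self.roots[repo] = self.add(tree)
        return self.roots[repo]


vendored = {"lib.txt": b"shared library code\n"}
store = SharedStore()
store.clone("repo-a", {"main.txt": b"A\n", "vendor": vendored})
store.clone("repo-b", {"main.txt": b"B\n", "vendor": vendored})

# 8 objects were added in total, but the vendored blob and tree are shared.
print(len(store.objects))  # 6
```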
Our approach treats AGI systems like any other user contributing to the shared knowledge space. Users retain full control over their local repositories of knowledge, and others can subscribe to public work that benefits them. When people work only with private data, there is no need for constraints beyond those they impose on themselves. Mutual subscription to a shared context, on the other hand, requires some process of agreement.
With Dims, we will define a context as a hyperedge in the knowledge graph that attaches a set of rules to the limited resources, or commons, that they govern within that scope. Many contexts can be specified in this way: a garbage collector manages the allocation of a limited amount of memory, a consent process limits how funds are allocated to proposals, an agreement on a wage allocates an employee’s time, CRDT-based collaborative editing manages a shared perspective on data (Almeida, 2023), the Pony programming language uses deny capabilities to manage references between concurrent actors without the need for locks (Clebsch et al., 2015), and the list goes on. The point is that limiting governance to only the scopes that require agreement can maximize concurrent experiments at all levels of the knowledge space.
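A context can be sketched as a single object that ties together the rules, the commons they govern, and the members who consented to them. The sketch below is hypothetical and models only the funding example: two proposals draw on a shared budget of 100 units, and an allocation is accepted only if the proposer is a member and every rule still holds afterwards. It does not model how members reach agreement on the rules themselves.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet, List, Set

State = Dict[str, int]          # resource -> amount allocated so far
Rule = Callable[[State], bool]  # a predicate over the shared state


@dataclass
class Context:
    """Hypothetical hyperedge joining a rule set to the commons it governs."""
    rules: List[Rule]
    commons: FrozenSet[str]
    members: Set[str]  # participants who consented to the rules
    state: State = field(default_factory=dict)

    def propose(self, actor: str, resource: str, amount: int) -> bool:
        if actor not in self.members or resource not in self.commons:
            return False  # only members may act, and only on resources this context governs
        candidate = dict(self.state)
        candidate[resource] = candidate.get(resource, 0) + amount
        if all(rule(candidate) for rule in self.rules):
            self.state = candidate  # commit the change only if every rule holds
            return True
        return False


budget = Context(
    rules=[lambda s: sum(s.values()) <= 100,            # total budget of 100 units
           lambda s: all(v >= 0 for v in s.values())],  # no negative allocations
    commons=frozenset({"proposal-1", "proposal-2"}),
    members={"alice", "bob"},
)

print(budget.propose("alice", "proposal-1", 60))  # True
print(budget.propose("bob", "proposal-2", 50))    # False: would exceed the budget
print(budget.propose("bob", "proposal-2", 40))    # True
print(budget.propose("carol", "proposal-2", 1))   # False: carol has not joined
```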
The picture we have painted so far encompasses a wide range of governance structures, from a scenario where everyone acts independently and ignores others, to nationwide democratic governance, or even a project under the sole control of a benevolent dictator for life. The system derived from Git grants individuals autonomy: they are only subject to a set of rules if they consent to them in order to access the commons. If you disagree with a project's direction, you can fork it and make your own changes, or you can accept the developers' choices and focus on what matters most to you.
With this approach, all governance is non-invasive, as everyone has the right to choose where to participate. Anyone can fork a process and introduce changes, fostering a multitude of concurrent experiments in the idea space. People can adopt the ideas that make sense to them, whether from humans or AGIs, and good ideas will spread through grassroots innovation, exploring endless possibilities.