Welcome

Syscat - the Systematic Catalogue

Originally it was the System Catalogue ("From spare parts to org charts"), and I was treating it separately from general-purpose knowledge-management.

Over time, I came to understand that the hippies are right: everything is interconnected. The richer your knowledgebase gets, the less sense these arbitrary distinctions make. So now it's a knowledge-management system that's pretty good as a Single Source of Truth for IT environments.

What is it?

The first truly comprehensive Single Source of Truth for an IT environment.

It's designed specifically to track all the things and all the interconnections between them, both within and between layers. It does this across multiple organisations, and is designed to be user-extensible and to scale under load. Importantly for an IT infrastructure team, its API-first design makes it as easy as possible to build automation around it.

It's designed to represent the environment you have, in whatever level of detail you actually have, without opinions about how you should have architected it. However, it does have opinions about how things are described: the strongest is that it makes a distinction between how things should be, and how they've been observed to be.

The storage layer is Neo4j, an extremely capable graph database.

That's great. But what does it do?

Captures your entire infrastructure in one place, in a structured, self-consistent form with a consistent, predictable REST-like HTTP API, and gets out of your way.

That's actually all it does, by itself. The value is that it captures everything, it's all in one place, and it can be updated or queried by anything that speaks HTTP and JSON - one API, one convention.
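For instance, anything with an HTTP client can pull that data directly. Assuming the raw API supports listing a whole resourcetype - the examples further down this page suggest it does, though they don't show it - fetching every recorded device might look like this, using the same example host as those later samples:

curl -s http://192.0.2.1/raw/v1/Devices | jq .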

Now think about what you can't do right now because your SSoT doesn't record it, or won't let you make the connection - or what's simply made harder by the friction of having some of your information in this system with one authentication scheme, and some of it over there in that one with its own. Now imagine something that records all of those things in one place and removes all that friction: that is what Syscat does. The greater the number and variety of data sources, and of systems that use that data, the more friction it removes.

If you're scanning this page for keywords, it gives you all of these in one system:

But how can this be the source of truth, if all its data comes from other places?

There's an important difference between the source of the data, and the source of truth. Sometimes they're the same thing, but they don't have to be.

Syscat is designed and intended to be a shared reference for all that data, from the perspective of the people and systems that use that data.

To use an analogy, if your organisation is a village, Syscat is its well. The well taps into water that was carried here by rivers, and the rivers in turn are fed by springs. The springs are each of the systems that data is fetched from, and the rivers are integrations which fetch that data and update Syscat with it.

The villagers can trek to each of the springs to get their water; some will go to one, while some will go to another. But with a well in the centre of the village, everybody can go to the same place, and draw on water mixed from all the same sources.

Status

Current development status: beta, minimum viable product.

It's very much at the early-adopter evaluation stage: not yet ready for prime-time, but you can already do useful things with it.

Features

Ideas for new features are tracked as issues in both the Syscat project itself and in Restagraph, the engine that Syscat is built on.

Use-cases, or "So what can it do for me?"

Infrastructure - system administrators and network engineers

Security

Why Syscat? Aren't there already plenty of SSoTs on the market?

There are. However, I'm still not aware of one that's truly comprehensive.

If there were, I'd have been using it instead of building this thing.

Why doesn't it have built-in discovery/monitoring/insert-feature-here?

It's entirely passive with regard to data entry: all data has to be entered via the HTTP API, one way or another. This is a deliberate design decision because

In the same vein, it doesn't initiate any actions in the outside world. I do plan to implement a webhook-style feature, where additions/changes/deletions of data will trigger an HTTP call to some other service, but that's somewhere in the future.
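Coming back to data entry, here's a sketch of what "one way or another" can mean in practice: an integration polls some source of data, then pushes its findings in over HTTP. The resource name and payload format below are my assumptions, extrapolated from the URI patterns shown further down this page:

# Hypothetical: record a newly-discovered device via the HTTP API.
curl -X POST -d 'uid=switch3' http://192.0.2.1/raw/v1/Devices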

Technical details

Database

The Neo4j property-graph database provides the actual data storage. It's capable of multi-datacentre clustering with read-replicas, providing geographically diverse resilience as well as maintaining speed of response across dispersed operations.

This layer can be scaled independently from Syscat's application server, according to where the performance bottleneck is actually located and what kind of performance or resilience challenges you're addressing.

APIs

Similar to REST, it uses POST, GET, PUT and DELETE for the CRUD verbs. However, the basic idea is adapted to suit a graph database, where REST assumes a relational one.

The main API enforces what types of resource you can store, what attributes each type can have, and what relationships you can record between which resourcetypes, according to a schema stored within the database itself.

The thinking behind the design was to provide a schema with the same spirit as a relational database, while taking advantage of the referential flexibility that you can only get from a graph database, without losing the ACID assurances of data integrity.
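As a rough sketch of how those verbs might map onto that schema-driven graph - the resourcetype here comes from the examples below, while the attribute name and parameter format are assumptions for illustration only:

# Create a resource
curl -X POST -d 'uid=router1' http://192.0.2.1/raw/v1/Devices
# Read it back
curl http://192.0.2.1/raw/v1/Devices/router1
# Update an attribute (hypothetical attribute name)
curl -X PUT -d 'serial_number=ABC123' http://192.0.2.1/raw/v1/Devices/router1
# Delete it
curl -X DELETE http://192.0.2.1/raw/v1/Devices/router1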

The APIs are:

Predictable and consistent URIs in the raw API

URIs are dynamically validated against the schema, with an indefinitely-repeatable pattern of Resourcetype/uid/RELATIONSHIP.

For example, if I wanted all the addresses on network interface eth0 on router1, I'd make this query:

curl http://192.0.2.1/raw/v1/Devices/router1/INTERFACES/NetworkInterfaces/eth0/ADDRESSES

That would return a JSON list of Ipv4Addresses and Ipv6Addresses objects (assuming the interface itself is configured for dual-stack operation).

If I just wanted the IPv6 addresses from that interface, I'd extend that URI to include the resourcetype at the end of that relationship:

curl http://192.0.2.1/raw/v1/Devices/router1/INTERFACES/NetworkInterfaces/eth0/ADDRESSES/Ipv6Addresses

Yes, those URIs are verbose. But after you're done screaming in horror, remember that this API is not designed for human interaction. It's designed for people to build automation tools against, and to build GUIs on, and for those purposes it's more valuable to be consistent than it is to be concise. It could be worse: it could encode all that in XML.

Basic design

Design principles

This is the functional specification - the "what" that comes before the "how".

Architecture

A web application server fronting a Neo4j graph database. The schema is defined in the database, and is used to dynamically construct the API in response to each query - this is the key to Syscat's extensibility.
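A corollary is that the schema itself is data you can fetch. If Syscat follows its Restagraph engine in serving the schema over HTTP - an assumption on my part, including the exact path - inspecting it could be as simple as:

curl -s http://192.0.2.1/schema/v1 | jq .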

Because the schema is in the database, you can bypass the server and use Cypher to guide your analyses more directly.
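For example - assuming the node labels and uid properties mirror the API's resourcetypes, which the design suggests but which is an assumption here - a direct query via Neo4j's cypher-shell might look like:

cypher-shell -a neo4j://192.0.2.1 -u neo4j -p '<password>' \
  'MATCH (d:Devices {uid: "router1"})-[:INTERFACES]->(i:NetworkInterfaces)-[:ADDRESSES]->(a) RETURN i.uid, labels(a), a.uid'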

The API is HTTP-based and REST-inspired, though it also bears some resemblance to GraphQL. It uses the standard HTTP verbs, and it has only a few fixed, well-defined endpoints; the rest of each URI is dynamically validated according to both the data and the schema that defines its structure.

The API validates incoming data, ensuring that anything added to the database through that API adheres to the schema. As useful as I've tried to make the API, really sophisticated analysis will require querying the database directly - but it's pre-organised, making that analysis easier.

Why a graph database?

Relational databases just run out of breath: in practical terms, they can't provide the flexibility that this degree of interconnection demands. RDF databases are optimised for offline analysis of very large datasets, and this absolutely needs to be an online system that's continually being updated.

Although I wasn't thinking in those terms when I began this project, it turned out that there's a crucial difference in the worldview of relational vs graph databases: graph databases separate what something means from what it is, using relationships to represent that meaning in terms of context, whereas relational databases conflate the thing with its meaning. And Neo4j provides the ACID dependability that we've learned to rely on from an RDBMS.

Additionally, a graph database frees you from the fixed frame of reference that relational models are prone to. Instead of being baked into the schema at design-time, the reference-point is defined dynamically by the starting-point of a query. In concrete terms: when this kind of application is based on a relational database, all queries are usually expressed in terms of how things relate to the organisation, because that's the natural way to design the schema. With this system, by contrast, it's a question of which organisation your query starts with, or whether you start with a person, or a network device, or...
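To make that concrete, here are two hypothetical Cypher queries over the same graph, differing only in their starting node; the Organisations and People resourcetypes are assumptions for illustration:

# Frame of reference: an organisation.
cypher-shell -a neo4j://192.0.2.1 -u neo4j -p '<password>' \
  'MATCH (o:Organisations {uid: "acme"})--(d:Devices) RETURN d.uid'
# Frame of reference: a person.
cypher-shell -a neo4j://192.0.2.1 -u neo4j -p '<password>' \
  'MATCH (p:People {uid: "jdoe"})--(x) RETURN labels(x), x.uid'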

It's true that you can do this in a relational database. However, all those many-to-many join tables accumulate quickly, and the DBMS eventually just grinds to a halt. There are problems for which they're just not a good fit, and this is one of them.

Contact

If you have feedback, or if you want to know more: