Syscat - the Systematic Catalogue
Originally it was the System Catalogue ("From spare parts to org charts"), and I was treating it separately from general-purpose knowledge-management.
Over time, I came to understand that the hippies are right: everything is interconnected. The richer your knowledgebase gets, the less sense these arbitrary distinctions make. So now it's a knowledge-management system that's pretty good as a Single Source of Truth for IT environments.
What is it?
The first truly comprehensive Single Source of Truth for an IT environment.
It's designed specifically to track all the things and all the interconnections between them, both within and between layers. It does this across multiple organisations, and is designed to be user-extensible and to scale under load. Importantly for an IT infrastructure team, its API-first design makes it as easy as possible to build automation around it.
It's designed to represent the environment you have, in whatever level of detail you actually have, without opinions about how you should have architected it. However, it does have opinions about how things are described: the strongest is that it makes a distinction between how things should be, and how they've been observed to be.
The storage layer is Neo4j, an extremely capable graph database.
That's great. But what does it do?
Captures your entire infrastructure in one place, in a structured, self-consistent form with a consistent, predictable REST-like HTTP API, and gets out of your way.
That's actually all it does, by itself. The value is that it captures everything, it's all in one place, and it can be updated or queried by anything that speaks HTTP and JSON. Because all of that data sits behind the same HTTP API, you only need to query one API with one convention.
Now think about what you can't do right now because your SSoT doesn't record something, or won't let you make a connection, or simply because of the friction of having some of your information in this system with one authentication scheme, and some of it over there in that one with its own. Now imagine something that records all those things in one place and removes all that friction - that is what Syscat does. The greater the number and variety of data-sources, and of systems that use that data, the more friction it removes.
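To give a flavour of that, here's a minimal sketch of a query; the hostname and the Devices resourcetype are illustrative, not guaranteed to be in your schema:

# List every resource of type Devices, as JSON (resourcetype name is illustrative):
curl http://192.0.2.1/raw/v1/Devices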
If you're scanning this page for keywords, it gives you all of these in one system:
- Asset management
- Configuration Management Database (CMDB)
- IP Address Management (IPAM)
But how can this be the source of truth, if all its data comes from other places?
There's an important difference between the source of the data, and the source of truth. Sometimes they're the same thing, but they don't have to be.
Syscat is designed and intended to be a shared reference for all that data, from the perspective of the people and systems that use that data.
To use an analogy, if your organisation is a village, Syscat is its well. The well taps into water that was carried here by rivers, and the rivers in turn are fed by springs. The springs are each of the systems that data is fetched from, and the rivers are integrations which fetch that data and update Syscat with it.
The villagers can trek to each of the springs to get their water; some will go to one, while some will go to another. But with a well in the centre of the village, everybody can go to the same place, and draw on water mixed from all the same sources.
Status
Current development status: beta, minimum viable product.
It's very much at the early-adopter evaluation stage, but it's already useful to some extent.
- Core functionality is in place.
- Very basic GUI.
- Available as a set of Docker images, plus an example `docker-compose` file, for ease of deployment.
- The schema and API are roughly where they should be.
- The API should be stable.
- There may be minor tweaks, but any major changes will most likely result in a new version (yes, the API is versioned).
- Documentation, test suites and general hardening still need lots of work.
- It's fairly resilient in the sense that it's unlikely to break in response to bad input.
- Security is still an issue: it's very much personal/internal-use only, at the moment. More time and expertise are needed.
- Feedback on the current design will be very welcome, especially on the schema, but also the API itself.
- It's vitally important that this describes things you actually have in your environment, in terms that are clear and familiar, so I really do want to know where I got it wrong.
- The API design works for me(TM). If there are things about its design that you think could be improved, I'd love to know about them.
- Security: it has not been properly examined by an expert.
- Do not put sensitive information in a publicly-accessible instance! I don't trust it that far yet, and neither should you.
- It will become something you can trust. It's just not there yet.
Summary: it's not yet ready for prime-time, but you can do useful things with it.
Features
- Comprehensive
- It's designed explicitly to track everything, and it's multi-organisational by design. Extend your map of your own environment to include partners, ISPs, subsidiaries and service vendors, plus anything you know about the connections between them, all at the same level of detail that's available for your own organisation.
- Provides a real single source of truth, ensuring full consistency.
- "Everything" means that it's theoretically capable of mapping the entire internet, including every last computer, spare part, network cable and person involved. Everything.
- User-extensible - if it's not included in the default installation, you can add it.
- Extensions are first-class elements, not tacked on afterwards.
- Share schema extensions with other users, to gain a shared view and vocabulary, and to avoid duplicated data-modelling work.
- You can extend it to cover new technologies or approaches, without waiting for the vendor to get around to it.
- API-first design - if you can do it in the GUI, you can already do it via the API.
- Automation-friendly: one of the design goals was to make it as easy as possible for users to build their own tooling around it.
- You can build your own GUI, if the provided one doesn't do what you need.
- IPAM API that takes care of all the details around subnet and address management
- IPv4/IPv6 dual-stack capable.
- Multiple VRFs.
- Multiple organisations.
- File upload/download API
- It can hold your reference documents, scanned contracts, and photos of the back of that router with the weird cabling.
- Distinguishes between intended and actual state, i.e. what should be there vs what's actually there
- You use different terms when allocating than when configuring. Syscat embraces this, instead of trying to pretend they're the same thing.
- Horizontally scalable
- Deploy as many Syscat instances as necessary, without additional configuration. Want a read-only instance for a batch-process to hammer, without slowing down the one that serves the web GUI? No problem.
- Because it's based on Neo4j, the database layer can also scale out horizontally, independently of the appserver layer.
- Variable levels of detail
- Record what information you do have, and flesh it out in more detail as you decide/discover more.
- Sometimes you just don't need fine-grained detail in order to get useful things done.
- Designed for production deployment
- Docker is the main deployment method, so it's easy to install and operate.
- Schemas are uploaded as JSON documents in text files, making it easy to manage them in a version-control system, and simplifying the dev -> test -> staging -> production workflow.
- Versioned schemas
- Create a new schema-version at any time.
- Rollback (and roll-forward) between schema versions is trivial, making it easier to test changes and recover from mistakes.
- Old versions can be deleted, so you can remove cruft instead of accumulating it.
- Versioned data
- Every time you modify a resource (like this page) a new version is created.
- That means you can roll back to a previous version, or compare two versions.
- Note: this GUI does nothing with versions yet, but it's already there in the API.
- Separation of upgrades between the engine and the schema.
- When you deploy a new version of the Docker image, it won't automatically upgrade the schema, even if it includes a newer version.
- You can download and install a new version of the schema without having to upgrade the engine.
- This does mean that if you want to upgrade both the engine and the schema, you have to do both things as separate operations.
- No AI.
- It doesn't have any, and it isn't going to.
- It's here to extend the abilities of your mind (and the collective abilities of your team's minds). It's not here to replace them.
Ideas for new features are tracked as issues in both the Syscat project itself and in Restagraph, the engine that Syscat is built on.
Use-cases, or "So what can it do for me?"
Infrastructure - system administrators and network engineers
- IP Address Management
- Asset tracking
- Auto-generate configs for monitoring, firewalls, AWS security-groups, etc.
- Intent-driven deployment: define what should be there, and add tooling to close the gaps.
- Note that this tooling has not yet been built, but it wants to be.
- Incident response
- Fault isolation
- Impact analysis
- Stakeholder identification, for communications
- Easier handover between shifts/teams in long-running, complex incidents.
- Change planning
- Cost analysis
Security
- Attack-path analysis
- Identify unexpected network traffic
- Identify things in the environment that shouldn't be there (like rogue WiFi APs) by auditing discovered devices and comparing them with what was expected.
- This automation has not yet been built either, partly because each environment has different requirements.
- This will almost certainly require multiple separate tools.
- Vulnerability management
- You can't patch what you don't know about. Remember scrambling to identify all your deployed instances of `log4j`? If it's all in Syscat, you can just ask it where they are, and who manages each of those things.
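As a rough sketch of what that lookup could look like, assuming you've extended the schema with an Applications resourcetype and a RUNS_ON relationship (both are assumptions here, not part of the default installation):

# Hypothetical query: list everything recorded as running log4j.
# The Applications resourcetype and RUNS_ON relationship are assumed schema extensions.
curl http://192.0.2.1/raw/v1/Applications/log4j/RUNS_ON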
Why Syscat? Aren't there already plenty of SSoTs on the market?
There are. However, I'm still not aware of one that's truly comprehensive.
If there were, I'd have been using it instead of building this thing.
Why doesn't it have built-in discovery/monitoring/insert-feature-here?
It's entirely passive with regard to data entry: all data has to be entered via the HTTP API, one way or another. This is a deliberate design decision, because:
- Data comes from a variety of sources, each with their own interfaces and their own take on the world, e.g. Active Directory synchronisation, network discovery tools, and human data-entry via GUIs.
- It's just not feasible to cover all possible data sources from within a single tool.
- It's simply more scalable to provide a common API that any discovery tool can use, and then build those tools separately. Because the same interface is presented to customers and third-party vendors, they can fill their own gaps without having to wait for me, then share those tools with other users.
- There isn't a one-size-fits-all approach anyway, especially for network discovery. Some environments are a good fit for a single, centralised service, others really need a distributed fleet of agents, and then there's the case of querying the AWS API to find out what's in there.
- Trying to fit all these capabilities into a single product leads to a bloated, complex thing that's increasingly hard to maintain and to get value out of. Better to have a simple core, with just the set of add-ons that serve the needs you actually have.
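To illustrate how thin such an integration can be, here's a hedged sketch of a discovery script recording one newly-found device via the HTTP API; the `uid` parameter, the attribute name and the Devices resourcetype are assumptions, so check the API reference against your installed schema:

# Hypothetical example: record a device that a discovery run just found.
curl -X POST -d 'uid=switch01' http://192.0.2.1/raw/v1/Devices
# Then set an attribute on it (attribute names depend entirely on your schema):
curl -X PUT -d 'serial_number=ABC12345' http://192.0.2.1/raw/v1/Devices/switch01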
In the same vein, it doesn't initiate any actions in the outside world. I do plan to implement a webhook-style feature, where additions/changes/deletions of data will trigger an HTTP call to some other service, but that's somewhere in the future.
Technical details
Database
The Neo4j property-graph database provides the actual data storage. It's capable of multi-datacentre clustering with read-replicas, providing geographic resilience as well as maintaining speed of response across dispersed operations.
This layer can be scaled independently from Syscat's application server, according to where the performance bottleneck is actually located and what kind of performance or resilience challenges you're addressing.
APIs
Similar to REST, it uses POST, GET, PUT and DELETE for the CRUD verbs. However, the basic idea is adapted to suit a graph database, where REST assumes a relational one.
The main API enforces what types of resource you can store, what attributes each type can have, and what relationships you can record between which resourcetypes, according to a schema stored within the database itself.
The thinking behind the design was to provide a schema with the same spirit as a relational database, while taking advantage of the referential flexibility that you can only get from a graph database, without losing the ACID assurances of data integrity.
The APIs are:
- `/raw` - the main one, for almost all CRUD operations.
- `/ipam` - for IP Address Management.
  - Adding and removing IP addresses, subnets and VRFs, and for searching for them.
  - Takes care of all the book-keeping that you really don't want to do manually, e.g. when splitting or merging subnets from which you've already allocated a bunch of addresses.
- `/files` - for uploading, downloading and deleting files.
  - File metadata is accessible via the `/raw` API, so you can link a contract to a scan of its contents, for example, or confirm who uploaded a particular file.
  - Deduplication is performed automatically, so there's a set of metadata for each time a different user uploads the same file, but only one copy of the file itself is stored on disk.
- `/schema` - for managing the schema itself, and thus the API that's derived from it.
  - Use this to upload schema fragments as JSON documents, query the current schema, and create a new schema version or roll back to a previous one.
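The exact parameters for each of these are covered in the API documentation; as a hedged sketch of the general shape of such calls (the field names and endpoint details here are assumptions):

# Upload a schema fragment as a JSON document (the 'schema' field name is an assumption):
curl -X POST --data-urlencode schema@my_extension.json http://192.0.2.1/schema/v1
# Upload a file (the field names are assumptions):
curl -X POST -F 'name=router-rear-photo' -F 'file=@back-of-router.jpg' http://192.0.2.1/files/v1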
Predictable and consistent URIs in the raw API
URIs are dynamically validated against the schema, with an indefinitely-repeatable pattern of `Resourcetype/uid/RELATIONSHIP`.
For example, if I wanted all the addresses on network interface `eth0` on router `router1`, I'd make this query:
curl http://192.0.2.1/raw/v1/Devices/router1/INTERFACES/NetworkInterfaces/eth0/ADDRESSES
That would return a JSON list of `Ipv4Addresses` and `Ipv6Addresses` objects (assuming the interface itself is configured for dual-stack operation).
If I just wanted the IPv6 addresses from that interface, I'd extend that URI to include the resourcetype at the end of that relationship:
curl http://192.0.2.1/raw/v1/Devices/router1/INTERFACES/NetworkInterfaces/eth0/ADDRESSES/Ipv6Addresses
Yes, those URIs are verbose. But after you're done screaming in horror, remember that this API is not designed for human interaction. It's designed for people to build automation tools against, and to build GUIs on, and for those purposes it's more valuable to be consistent than it is to be concise. It could be worse: it could encode all that in XML.
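For completeness, writes follow the same URI pattern. The sketch below shows roughly how the resources in the example above might be created in the first place; the parameter names (`uid`, `target`) and the handling of dependent resources are assumptions drawn from Restagraph's conventions, so treat it as illustrative only:

# Create the router itself (assumption: resources are created by POSTing a uid to their resourcetype):
curl -X POST -d 'uid=router1' http://192.0.2.1/raw/v1/Devices
# Link it to an interface (assumption: relationships are created by POSTing a target path to the relationship URI):
curl -X POST -d 'target=/NetworkInterfaces/eth0' http://192.0.2.1/raw/v1/Devices/router1/INTERFACES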
Basic design
Design principles
This is the functional specification - the "what" that comes before the "how".
- Model the whole thing: all the things, and all the relationships between them.
- Model the network as it is, however messed-up that might be - both the design and the reality.
- If it doesn't fit some theoretical model of how things should be deployed, too bad. Model it anyway. A tool that gets in the way for the sake of ideological purity is a tool that gets in the way of solving the problem.
- Distinguish between how things should be, and how they actually are.
- These are different things, even if they have the same value.
- Distinguish between values you intend, and those that have to be discovered.
- On one hand, subnet allocations; on the other, SNMP index IDs.
- Conveniently, these correspond neatly to "intended" and "actual" groupings, so they can be managed together.
- Automation-friendliness is vital, so build the API first, then build the GUI on top of that.
- Anything you can do with a GUI can also be automated.
- If the vendor-supplied GUI doesn't suit the user's use-case, now they can build one that does.
- User-extensibility must be provided, and must be first-class.
- No one model fits all use-cases, and all organisations have some custom use-cases. Provide users with a way to seamlessly extend it to cover any gaps.
- Make sure this integration is first-class, not some afterthought gaffer-taped to the side.
- People don't always have the full depth of detail. Allow for this.
- Enable users to record whatever information they do have on hand, and evolve the picture as new information arrives. E.g. you can assign an IP address to a host, then move it to the correct interface, then assign that interface to a routing instance, all without losing any information such as incoming links from other things.
- Design from the start to cover multiple organisations, because very few IT organisations are totally air-gapped.
- Troubleshooting becomes much easier when you have reference information about connected networks and how they relate to yours.
- This also allows for mergers, spin-offs, subsidiaries and subcontractors.
- Enable modelling of secondary resources, i.e. things that only make sense in the context of a parent entity.
- Open standards beat proprietary ones.
Architecture
A web application server fronting a Neo4j graph database. The schema is defined in the database, and is used to dynamically construct the API in response to each query - this is the key to Syscat's extensibility.
Because the schema is in the database, you can bypass the server and use Cypher to guide your analyses more directly.
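For instance, a direct query through Neo4j's own cypher-shell might look like the sketch below; the node labels and relationship names are assumptions about how the schema maps onto the graph, so inspect your own database before relying on them:

# Ask Neo4j directly which devices have interfaces carrying IPv6 addresses.
echo 'MATCH (d:Devices)-[:INTERFACES]->(i:NetworkInterfaces)-[:ADDRESSES]->(a:Ipv6Addresses)
      RETURN d.uid, i.uid, a.uid' \
  | cypher-shell -a bolt://192.0.2.1:7687 -u neo4j -p '<password>'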
The API is HTTP-based, and REST-inspired, though it also bears similarity to GraphQL. It uses the standard HTTP verbs, but has a few well-defined endpoints, and the rest is dynamically validated according to both the data and the schema that defines its structure.
The API validates incoming data, ensuring that anything added to the database through that API adheres to the schema. As useful as I've tried to make the API, really sophisticated analysis will require querying the database directly - but it's pre-organised, making that analysis easier.
Why a graph database?
Relational databases just run out of breath - in practical terms, they can't provide the flexibility this needs. RDF databases are optimised for offline analysis of very large datasets, and this absolutely needs to be an online system that's continually being updated.
Although I wasn't thinking in those terms when I began this project, it turned out that there's a crucial difference in the worldview of relational vs graph databases: graph databases separate what something means from what it is, and use relationships to represent that meaning in terms of context, whereas relational databases conflate the thing with its meaning. And Neo4j provides the ACID dependability that we've learned to rely on from an RDBMS.
Additionally, a graph database frees you from the fixed frame of reference that relational models are prone to. Instead of being baked into the schema at design-time, the reference-point is defined dynamically by the starting-point of a query. In concrete terms: when this kind of application is based on a relational database, all queries are usually expressed in terms of how things relate to the organisation, because that's the natural way to design the schema. With this system, by contrast, it's a question of which organisation your query starts with, or whether you start with a person, or a network device, or...
It's true that you can do this in a relational database. However, all those many-to-many join tables accumulate quickly, and the DBMS eventually just grinds to a halt. There are problems for which they're just not a good fit, and this is one of them.
Contact
If you have feedback, or if you want to know more:
- I'm on Mastodon (a.k.a. the Fediverse).
- You can email me at support(at)sysc.at
- You can always create a ticket or pull-request on Codeberg if that's your jam.