ASDF Systems - the Syscat data model (Wikipages)

System boundaries in Common Lisp are not a simple topic.

It starts not-simple, and gets increasingly less simple the closer you look at it.

The naïve view: one repo, one .asd file, one app, yes?

Well... no.

You see, CL is an image-based language, like Smalltalk. I already knew that conceptually, but it was only when I tried to properly model this stuff that the implications really started sinking in.

With languages like C, it's very cut and dried: you have the source code on one side, a compiler in the middle, and a discrete executable file (or library) on the other. When you execute one of those compiled artifacts, it gets loaded into memory, its instructions are carried out, and then it's erased from memory. If it depends on some libraries, those get loaded into memory before the instructions are followed, and then they're dropped from memory afterwards as well.

(Yes, I'm glossing over things like memory caching. That isn't relevant here.)

With Lisp or Smalltalk, it works more like a virtual machine. The runtime environment is started (the VM is created), and a bootstrap image is loaded (the OS is booted inside the VM). Then you define functions and variables, or load them in the form of libraries; this is equivalent to hot-patching the VM while it's running. There is no discrete output artifact that you can simply execute. SBCL has save-lisp-and-die for creating executable images, but this is more akin to a self-contained Docker image than to a C-style executable. On a practical level you can present that image as an application, and it'll do the same job, but under the surface it just isn't the same thing. This has important implications for endeavours like this one, or like Quicklisp.

CL, thanks to ASDF, has systems. These are analogous to packages, crates or DLLs in other languages. There might be one or several in one repo; at the very least there should be the main system plus a system containing its test suite. There might be subsystems. Each of these might be defined in a separate .asd file, or there might be one file containing all the definitions, or any combination in between.
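As a concrete illustration of how systems and .asd files can map onto each other: ASDF's naming convention is that a system called foo/tests is a secondary system, which ASDF expects to find defined in foo.asd, whereas a system called foo-tests is normally expected in its own foo-tests.asd. The Python sketch below applies that rule (the same split that ASDF's primary-system-name performs: everything before the first "/") to a hypothetical list of system names from one repo. The names are invented, and the sketch only models the naming convention; it can't see cases where an author has put unrelated defsystem forms into the same file.

```python
from collections import defaultdict


def primary_system_name(system_name: str) -> str:
    """Return the part of an ASDF system name before the first '/'.

    Mirrors the naming rule behind ASDF's primary-system-name:
    'foo/tests' is a secondary system whose primary system is 'foo'.
    """
    return system_name.split("/", 1)[0]


def expected_asd_files(system_names: list[str]) -> dict[str, list[str]]:
    """Group system names by the .asd file ASDF would expect to define them in.

    This only reflects the naming convention; nothing stops an author from
    putting unrelated defsystem forms in one file, which this cannot detect.
    """
    files: dict[str, list[str]] = defaultdict(list)
    for name in system_names:
        files[primary_system_name(name) + ".asd"].append(name)
    return dict(files)


# Hypothetical systems that might all live in a single repo.
systems = ["myapp", "myapp/tests", "myapp/web", "myapp-utils"]
for asd_file, names in expected_asd_files(systems).items():
    print(asd_file, names)
# myapp.asd ['myapp', 'myapp/tests', 'myapp/web']
# myapp-utils.asd ['myapp-utils']
```

So even one repo with one naming scheme can spread four systems over two files, and nothing in the names alone says which of them is "the" application.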

This has important implications for creating a model of CL's idea of packages, systems, and versioning of stuff. Not only are their boundaries fuzzy and permeable, but there also isn't really such a thing as a "CL application." There's a project, and there are libraries (oh wow, do we ever have those!) and there are app-servers such as Hunchentoot, but applications, per se? Not so much.

The problem is complicated by the fact that Syscat is intended to be comprehensive and inclusive - it's not just for CL and ASDF, but also for Python, Haskell and the rest. This means dealing with the fact that terms like "software project" have different meanings in different contexts, so terms get a little fuzzy around the edges.

The data model

The moving parts, and how they connect to each other.

Some of these are in the base Syscat subschema, and some are specific to the ASDF one. For the purposes of this document, it doesn't matter which are which.

Each heading in this section is the name of a resourcetype, hence the use of PascalCase rather than separate words.

SourceCodeRepositories

These are individual repos, assumed to be hosted online somewhere, at least for this purpose.

SoftwareProjects

This is one of the fuzzier concepts (and therefore one of the fuzzier terms) once you try to make it work across languages.

For Syscat's purposes, I'm using source-code repos as the default delimiter, i.e. a project's boundary is the repo it's stored/developed in. It could be argued that I may as well just use the same term to refer to both, but there are two reasons not to do that:

  1. They're not the same thing. Pretending that they are would create more confusion than convenience.
  2. Projects get forked into multiple repos. Treating them as separate things makes it possible to keep track of the various sources from which a project is available. There are several considerations here:
    • Sometimes the original repo ceases to be maintained, and maintenance is continued elsewhere.
    • Sometimes development is just multi-headed. These forks can be regarded as a type of branch of the same project.
    • Sometimes a fork turns into a recognisably separate project altogether. Once that's happened, it's not hard to define that new project in Syscat, and to change which project that repo is recorded as serving.

For these reasons, I've defined the relationship from projects to repos as 1:many. That is, a project can be available from any number of repos, but each repo serves only one project.

It's possible those assumptions aren't correct, and that there's a surprising number of repos that serve multiple projects, which aren't just mono-repos with lots of sub-projects. If that's shown to be the case, it can be redefined as many:many, but for now I'm betting against this.
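A minimal sketch of that 1:many relationship, written as a hypothetical in-memory Python model rather than Syscat's actual schema or API: each repo points at exactly one project, a project's repos are found by looking for the repos that point at it, and the "a fork has become its own project" case from the list above is just repointing one repo. The class and method names (Catalogue, repos_for, reassign) and the example.com URLs are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SoftwareProject:
    name: str


@dataclass
class SourceCodeRepository:
    url: str
    # The 1:many link lives on the repo side: each repo records the one
    # project it serves, so a project can have any number of repos.
    project: SoftwareProject


@dataclass
class Catalogue:
    repos: list[SourceCodeRepository] = field(default_factory=list)

    def repos_for(self, project: SoftwareProject) -> list[str]:
        """URLs of every repo (original, fork or successor) serving a project."""
        return [r.url for r in self.repos if r.project is project]

    def reassign(self, url: str, new_project: SoftwareProject) -> None:
        """Record that the repo at url now serves a different project,
        e.g. because a fork has become a separate project in its own right."""
        for r in self.repos:
            if r.url == url:
                r.project = new_project
                return
        raise KeyError(url)


# Hypothetical example: one project, developed in Alice's repo and forked by Bob.
widget = SoftwareProject("widget")
catalogue = Catalogue([
    SourceCodeRepository("https://example.com/alice/widget", widget),
    SourceCodeRepository("https://example.com/bob/widget", widget),
])
print(catalogue.repos_for(widget))

# Bob's fork diverges far enough to count as its own project.
gadget = SoftwareProject("gadget")
catalogue.reassign("https://example.com/bob/widget", gadget)
print(catalogue.repos_for(widget))  # ['https://example.com/alice/widget']
print(catalogue.repos_for(gadget))  # ['https://example.com/bob/widget']
```

Keeping the link on the repo side is what makes "one project per repo" structural rather than a rule that has to be checked. If many:many turns out to be needed, the single project field would become a set of projects, or a separate link table.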

AsdfSystems

A single ASDF system, as defined via defsystem.

Any number of these could be present in a single repo, but the working assumption here is that each of those systems represents some subdivision of a single overall project.

There is no mention of .asd files here, because I genuinely don't see the value in tracking them in Syscat. I'll add them if a real-world use case arises.
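Under the working assumption above, a system's project doesn't need to be stored separately; it can be derived from the repo the system lives in. The sketch below is again a hypothetical Python model, not Syscat's schema: linking AsdfSystem to a SourceCodeRepository (rather than directly to a project) is an assumption made for this illustration, as are all the names and the URL. There's deliberately no field for the .asd file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoftwareProject:
    name: str


@dataclass(frozen=True)
class SourceCodeRepository:
    url: str
    project: SoftwareProject  # each repo serves exactly one project


@dataclass(frozen=True)
class AsdfSystem:
    # The name given to defsystem, e.g. "myapp" or "myapp/tests".
    # Deliberately no .asd file field: it isn't tracked.
    name: str
    repo: SourceCodeRepository

    @property
    def project(self) -> SoftwareProject:
        # Working assumption: every system in a repo is a subdivision of
        # the single project that repo serves.
        return self.repo.project


def systems_by_project(systems: list[AsdfSystem]) -> dict[str, list[str]]:
    """Group system names under the name of the project they roll up to."""
    grouped: dict[str, list[str]] = {}
    for s in systems:
        grouped.setdefault(s.project.name, []).append(s.name)
    return grouped


myapp = SoftwareProject("myapp")
repo = SourceCodeRepository("https://example.com/alice/myapp", myapp)
systems = [AsdfSystem("myapp", repo), AsdfSystem("myapp/tests", repo)]
print(systems_by_project(systems))  # {'myapp': ['myapp', 'myapp/tests']}
```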