ASDF
ASDF is the de facto standard build-system for Common Lisp.
It defines systems, which are analogous enough to packages in other languages that they serve that purpose for software distribution.
Quicklisp
Quicklisp was a huge advance in making it easy for people to discover and install CL libraries in the form of ASDF systems. It rapidly became the de facto means of finding and downloading ASDF systems.
It uses cl-test-grid to distribute the load of testing 1800+ systems.
However, some limitations have revealed themselves over time:
- It's showing signs of being overwhelmed by scale.
- New distributions used to be released every month. Now it's down to 2-3 releases a year.
- It's based around the idea of distributions, which are a point-in-time snapshot of the latest release on the `main` branch of each project, or whichever branch was specified at the time of inclusion.
  - There's no provision for specifying a version, commit or commit-tag for a given project.
- It tries to be universal, so inclusion of a package requires that it build on a variety of distributions.
  - This is both a feature and a limitation, depending on your view of this criterion.
- It only uses the latest commit on the specified branch of a project, rather than pinning the last-known-good one.
  - This causes libraries to be dropped as a consequence of newly-introduced bugs that are difficult to reproduce, diagnose and/or resolve.
- Only one source is considered for each project.
  - Forks are not handled; there's one canonical source for a system, and that's it.
Enter Syscat
On that basis, I'm investigating whether and how Syscat can step into the breach by providing the existing capabilities, such as discovery of ASDF systems, while adding more:
- Track dependencies between system versions, in a way that can be easily browsed and followed.
  - In both directions, so you can see what systems depend on this one, as well as those that it depends on.
  - The reason for specifying versions is that the dependencies for a system can change from one release to another, so it's simply not meaningful to state the dependencies for a system in general.
- Record any number of specific versions of a system.
  - Track which versions of its dependencies this version is known to build with.
  - Now you can pull one specific version of a system, plus the web of its dependencies that are known good, without being tied to a point-in-time snapshot of the entire catalogue.
- Record permutations of platforms that a system might (or might not) build on, like this (there's a small code sketch after this list):
  - CL implementation, per `(lisp-implementation-type)`
    - Versions of CL implementations, per `(lisp-implementation-version)`
  - Operating systems, per `(software-type)`
    - Versions of OSes, per `(software-version)`. Or of kernels, in the case of Linux.
  - Hardware architectures, per `(machine-type)`
- Now we can track which specific version of a system builds and passes its tests (or doesn't) on which platform permutation.
  - A given system may compile on one CL implementation, but not another. Or on different versions of the same one. Or on X86-64, but not ARM6.
  - The same applies to whether it passes its tests.
  - Whether a system builds is tracked separately from whether its tests pass, and these are tracked independently for each platform-permutation.
  - This system won't attempt to build any software or to run tests itself. It only records the result of a test performed on a CI server, or on somebody's workstation.
  - These reports will not be mandated; it'll just record what information has been provided. Thus, the absence of a pass/fail only means "we don't know for sure."
- Record multiple possible sources of a system.
  - Now we can choose between forks.
  - Remember the part about dependencies changing between releases? This makes it even more complicated.
  - Mark a repo as abandoned/unmaintained.
- Mark versions of a system as being in various states:
  - Stable
    - This could be especially helpful to distinguish between projects where work has ceased because they're "done," vs projects that have been abandoned.
  - Development
  - Buggy/broken/insecure/do-not-use
  - Obsolete
    - E.g., a DB driver for versions of the DBMS protocol that are no longer supported.
- Multiple Quicklisp-style distributions, each with their own versions, so you can create and/or follow whichever suits you best.
- The catalogue and the client are loosely coupled by an explicit, documented protocol.
  - Now you can integrate the catalogue into whatever would benefit from it.
  - Or just write your own client with a UI that suits you better.
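To make the platform-permutation idea above a little more concrete, here's a minimal sketch of how a client might gather those fields, using only the standard CL functions listed; the function name and plist keys are my own placeholders, not part of Syscat:

```lisp
;; Minimal sketch: collect the platform-permutation fields via standard
;; CL introspection functions. The plist keys are illustrative only.
(defun platform-permutation ()
  "Return a plist describing the platform this Lisp image is running on."
  (list :lisp-implementation (lisp-implementation-type)
        :lisp-version        (lisp-implementation-version)
        :operating-system    (software-type)
        :os-version          (software-version)
        :architecture        (machine-type)))
```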
Important note: this catalogue could be hosted on another, CL-specific instance of the software. There's no reason it has to be tied to this one. In fact, a CL-specific instance is probably the better idea, so think of this as a prototype or proof of concept, rather than the definitive solution.
Further, it's possible that this entire exercise will result in the definition of a protocol, which then gets implemented on Syscat/Restagraph, and possibly in other ways as well.
Benefits of using Syscat to keep track of ASDF systems
These are the ones I can see, not counting those I forgot to write down:
- Potential for a multi-moderator system.
  - This may need some kind of workflow capability, which Restagraph doesn't have (so far).
- Because it's a server-based system with a defined API, we can have as many clients and integrations as we have use-cases (and time and willingness, of course).
- Multiple, mutually compatible ways of updating the system with build/test status for a version:
  - Automated updates from build systems, so a CI system can automatically update the build and test statuses for a given version and platform.
  - End-user client apps can do the same.
- Maintainers can see what other systems are using things they built.
- Provides for multiple concurrent approaches to package-curation:
  - Quicklisp-style point-in-time distributions (plural), each with their own set of criteria.
  - Python `pip`-style "latest version of this system" and/or "that version/tag/commit of this system."
- System updates become available to clients as soon as their version/tag/commit is added to the database.
  - No more waiting for the next distribution update.
Possible other features/capabilities
Of course, this is just a sketch. I haven't even begun to properly look at things like:
- Implementing a client.
  - I did make sure I based the platform elements on things that can be queried by standard CL functions.
- Implementing integrations with things like source-code repos and CI systems, such as cl-test-grid.
- Moderation options.
  - I've never moderated anything in my life. Experienced input will be very welcome, because otherwise I'm going to be figuring it out as I go, according to whatever references I can find.
- Authorisation policies.
  - Admittedly, this is partly because Restagraph itself is quite unsophisticated in that respect.
- I just realised that dependencies can vary from one version of a system to another, as capabilities are added, removed and re-implemented.
  - This means probably having to drop the tracking of dependencies between systems, and managing it exclusively at the level of versions.
- A "use this version instead" link from a known-buggy release to its replacement.
  - I'm noting this idea here in case it looks like something people will actually use.
- The Racket Package Index has a great design in this exact space.
  - There's an excellent chance I'll ~~swipe~~ take inspiration from some of its features.
Potential issues
Library/DLL-hell
It would be possible to have one version of a library required via some transitive dependency, when at the same time you want to use another. This would be problematic.
Worse yet, conflicting versions could be required via different transitive dependencies.
Arguably, this is already possible, and graph-traversal would make these conflicts easier to find.
Syscat would only record the versioning requirements already specified in a project; it's up to the client (and possibly its user) whether to obey them. I see value in a client that identifies these conflicts, and communicates them back to the user to decide how to resolve them.
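As a rough illustration of the kind of conflict-detection such a client could do, here's a minimal sketch. It assumes the client has already walked the dependency graph and collected (system-name . required-version) pairs; the function name and data layout are my own, not part of Syscat:

```lisp
;; Minimal sketch: find systems for which the dependency walk collected more
;; than one distinct version requirement. REQUIREMENTS is a list of
;; (system-name . required-version) conses gathered during the traversal.
(defun conflicting-requirements (requirements)
  "Return an alist of (system-name . distinct-versions) for conflicting systems."
  (let ((table (make-hash-table :test #'equal))
        (conflicts '()))
    (loop for (name . version) in requirements
          do (pushnew version (gethash name table '()) :test #'equal))
    (maphash (lambda (name versions)
               (when (> (length versions) 1)
                 (push (cons name versions) conflicts)))
             table)
    conflicts))

;; Example:
;; (conflicting-requirements '(("alexandria" . "1.4") ("cl-ppcre" . "2.1.1")
;;                             ("alexandria" . "1.2")))
;; => (("alexandria" "1.2" "1.4"))
```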
ASDF-specific complications
- No equivalent of Python's `virtualenv`.
  - It's possible we could build one, by controlling where ASDF searches for systems (see the sketch after this list).
    - This may involve a `virtualenv`-style shell script, or using ASDF's `clear-source-registry` and `initialize-source-registry` functions.
  - Feature idea: fork-handling, such that if one or more repos and/or forks are available for a given package, the user can select among them.
    - asdf-world looks like a start on this kind of thing, but I haven't explored it yet.
- Versions (and versioned dependencies) are supported by ASDF, but:
  - I lack confidence that enough project maintainers systematically update these.
    - I would be delighted to be proven wrong.
    - I'm including support for these anyway, just in case (see the `defsystem` example after this list).
  - Sometimes you need a specific commit between versions (hopefully temporarily) where one thing has been fixed/added, but another has not yet been broken/removed.
  - Having inherited and maintained a project that depended crucially on an ancient version of that one library, I think it's a good idea for the client app to check for a newer version on startup, and prompt the user to try upgrading to it.
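For the `virtualenv`-style idea above, the basic mechanism would be pointing ASDF at a project-local tree of pinned dependencies. Here's a minimal sketch, assuming the client has checked those dependencies out under a hypothetical deps/ directory:

```lisp
;; Minimal sketch: restrict ASDF's system search to a project-local tree,
;; roughly analogous to activating a virtualenv. The deps/ path is a
;; placeholder for wherever a client would check out pinned dependencies.
(asdf:clear-source-registry)
(asdf:initialize-source-registry
 '(:source-registry
   (:tree "/path/to/project/deps/")
   :ignore-inherited-configuration))
```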
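On the versioning side, ASDF's `defsystem` already lets a project declare its own version and a minimum version for a dependency, which is exactly the kind of data this catalogue could record. A small example, with purely illustrative system names:

```lisp
;; Example defsystem form declaring a system version and a minimum version
;; for one of its dependencies. All names here are illustrative.
(asdf:defsystem "example-app"
  :version "0.3.1"
  :depends-on ("alexandria"
               (:version "cl-ppcre" "2.1.1"))
  :components ((:file "main")))
```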
Things still to be discussed/negotiated
Everything's still negotiable at this stage, but there are things I have strong opinions about, and things that need further untangling:
Marking a system's build and test status on a given platform permutation
There's no simple, single answer here. Unfortunately, this is also one thing that we'd have to handle in a single, agreed manner if it's going to be of any use. The basic options that I see are:
- A relationship from `<system-version>` to `<platform-permutation>`, indicating the status:
  - `ASDF_SYSTEM_BUILDS_ON_PERMUTATION`
  - `ASDF_SYSTEM_FAILS_TO_BUILD_ON_PERMUTATION`
  - `ASDF_SYSTEM_PASSES_TESTS_ON_PERMUTATION`
  - `ASDF_SYSTEM_FAILS_TO_PASS_TESTS_ON_PERMUTATION`
- A resource/thing connected to both the system-version and the platform-permutation, which encapsulates the time-stamped results for a single client, indicating success/failure of the build and test steps.
I see advantages and disadvantages of both.
The first one is simple and clear. At first, I thought it was a bad thing that a system-version could be recorded as both working and not-working on a given platform-permutation at the same time, but there may be cases where it works for one person and not for another, most likely due to something like differences between Linux distributions. The downside is that it offers no information about who it works or doesn't work for, which makes it harder to ask follow-up questions.
The second option is richer in information, and potentially gives a proportional view of how many people it works or fails for. But would this information really be that useful? There's also the objection that a large number of failure reports could be demoralising for a maintainer, to which I'm sympathetic. It could tell the maintainer who it failed for so they can ask for more information, but library users can already contact the maintainer, so that point is moot. This would also enable secondary social effects, where people can see who is/isn't submitting reports, and from there we can get social pressure that is less than constructive.
So on balance, I'm leaning towards the first option. As you can probably tell from the very specific list above, I've already implemented it, so it also has that momentum going for it :)
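To make the first option a little more concrete, here's a hedged sketch of how a CI job or end-user client might record a build result over HTTP, assuming the Dexador client library. The URL scheme, resource-type names and parameter name are placeholders of my own, not the actual Syscat/Restagraph API:

```lisp
;; Hedged sketch only: record a build result as a relationship from a
;; system-version to a platform-permutation. The URL layout and the "target"
;; parameter are placeholders, not the real API.
(defun report-build-status (catalogue-base system-version-uid permutation-uid builds-p)
  "Record whether SYSTEM-VERSION-UID builds on PERMUTATION-UID."
  (dex:post (format nil "~a/systemversions/~a/~a"
                    catalogue-base
                    system-version-uid
                    (if builds-p
                        "ASDF_SYSTEM_BUILDS_ON_PERMUTATION"
                        "ASDF_SYSTEM_FAILS_TO_BUILD_ON_PERMUTATION"))
            :content `(("target" . ,permutation-uid))))
```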
Things you need to know about Syscat
Until I port the main documentation, these get listed here, to help make sense of this thing's UI, and of some of the design decisions described in this document.
Two separate types of link
You'll see these in all current Restagraph-based sites, because I built one general-purpose GUI, and deployed it everywhere.
- Within the text.
  - These are Markdown hyperlinks, so they're inline in the text.
- In the "Dependent resources" and "Outbound links" sections at the bottom of the page.
  - These are actually recorded in the database itself.
  - These are what provide explicit context for a thing (resource), by presenting its relationships to other things.
In the backend, that text is stored as text, and that's it - no assumptions are made about the formatting of the contents. That means that if you want to build an app on Restagraph that uses, say, an XML-based markup, you can do that.
This means there's no inherent relationship between links in the text and database links between resources, and no such relationship will ever be implemented in the backend. It'd be possible to add on some kind of helper service that automatically adds database links according to the content, and edits the text to update links when the targets get renamed, but
- that would always be a separate service
- it's not feasible right now because I haven't implemented a webhooks-type feature. It's something I see the value in; I just haven't gotten to it yet.
Named relationships
This is based on a graph database, not a relational one. Ironically, graph databases are the ones that give the relationships between things equal status to the things themselves. They make it possible to separate a thing's intrinsic characteristics from the context that gives it meaning, so you get a clearer picture of which is which.
Dependent resources
This is a feature I had to invent for Restagraph. It expresses the idea that some things only exist (or perhaps make sense) in the context of some other thing:
- Reviews only exist in the context of the thing being reviewed.
- Chapters only exist within books.
- Language implementations only exist in the context of a specification.
- Implementation versions only exist in the context of an implementation.
...you get the idea.
This is why you see two sections at the bottom of a page:
- Dependent resources
  - Things that exist solely in the context of the thing you're looking at.
- Outbound links
  - Other kinds of things, to which this thing has a named relationship.
This is also why things like this source-code commit for Syscat have such ludicrously long URLs.
API-first design
The REST-ish HTTP API server is the heart of this thing; the rudimentary GUI you're looking at is 100% replaceable.
This is because I wanted to make sure that everything you can do with these systems can be done via automation; if it's in the GUI, it's already there in the API. I also wanted to make sure that people could build their own UIs on top of it, to meet their own needs. Thus, it's already possible to implement a REPL client app that uses this system.
The engine stores information - it doesn't do things
Syscat began (and continues) as a means of tracking information about things - a single source of truth.
Early in development, I considered the idea of extending it to incorporate things like network monitoring, and quickly decided that this is best left to more specialised apps, which can fetch config information from Syscat.
The closest it will come to taking action on the outside world is when I add a webhook-style feature, so that when a thing is added/changed/deleted, the engine will make an HTTP call out to some other app with that information to be acted on.
Feedback
This is just the first draft, so please feel free to provide feedback via the Fediverse (a.k.a. "Mastodon"). I'm @gothnbass@linuxrocks.online.
Or you can go straight to the source (literally) and comment on this issue on Codeberg.
This document incorporates some feedback that I've already received via Fedi, and I'll continue updating it as feedback comes in. At some point it should settle into relative maturity, and the updates will tail off.