Briefly looked at the vanilla repo and I really like it.
The focus seems to be on the paper, with the software published along with it as a mere attachment; I can tell as much from the doc consisting of an excerpt of that paper. There's no guide on how to run a simple demo. It doesn't state license conditions. It uses tabling by default, but that's "Wild West" in your own words ;) The canonical tabling implementation would be XSB Prolog anyway, yet the software seems to require SWI. It uses "modules"*, but even if that were justified by the size of the code base, for Prolog code that works on theories stored in the primary Prolog db specifically, such as ILP, this is just calling for trouble with predicate visibility and permissions, and it shows in idiosyncratic code where both call(X) and user:call(X) are used.
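To spell out the kind of confusion I mean, here's a made-up sketch (nothing from the repo): the same goal resolves against different clauses depending on the database it is run in.

    % visibility.pl -- hypothetical illustration only
    :- module(learner, [run/0]).

    :- dynamic foo/1.             % declared dynamic in *this* module

    run :-
        assertz(foo(local)),      % lands in learner:foo/1
        assertz(user:foo(db)),    % lands in user:foo/1
        call(foo(X1)),            % X1 = local: the module's own clause wins
        user:call(foo(X2)),       % X2 = db: explicitly the user database
        format("~w vs ~w~n", [X1, X2]).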
What do you really expect from putting it on GitHub? It's nice that you've got unit tests, as you say, but there aren't any in the repo. I'm sure the code serves its original purpose well, as a proof-of-concept or prototype in the context of academic discourse, on the happy path. But because of the issues I mentioned, picking it up already involves a nontrivial time investment, and as it stands it isn't certain to save time compared to developing the same thing from scratch based on widely available academic literature. In contrast, Aleph has test data for which the authors have gone out of their way to publish reproducible results (as stated in TFA); end-user documentation; coverage in the academic literature for e.g. troubleshooting; a version history with five, or at least two, widely used versions; a small community behind it, or at the very least a demonstration that more than a single person could make sense of it; and a perspective for long-term maintenance as an ISO port.
*) Not to speak of the "modules" spec having been withdrawn and considered bogus and unhelpful for a really long time now, nor of "modules" pushing Prolog as a general-purpose language when the article is very much about using basic, idiomatic Prolog combinatorial search for practical planning problems in the context of code generation by LLMs.
With that in mind, I was personally more interested in ILP as a complementary technique to LLM code generation.
Re ISO: as I said, the focus here is on solving practical problems in a commercial setting, with standardization as an alignment driver between newer Prolog developers/engines. You know, as opposed to people funded by public money sitting on de facto implementations for decades, which are still not great targets for LLMs. Apart from SWI, this would in particular be the case for YAP Prolog, the system Aleph was originally developed for, making use of its heuristic term indexing; YAP, however, has been on a long-term refactoring spree, such that it's difficult to compile on modern systems.
Note: Vanilla is a "learning and reasoning engine" so you can't do anything with it stand-alone. You have to develop a learning system on top of it. There are four example systems that come with Vanilla: Simpleton, (a new implementation of) Metagol, (a new implementation of) Louise, and Poker. You'll find them under <vanilla>/lib/, e.g. <vanilla>/lib/poker is the new system, Poker.
>> The focus seems to be on the paper the sw was published along with as mere attachment;
Do you mean this paper?
https://hmlr-lab.github.io/pdfs/Second_Order_SLD_ILP2024.pdf
That, and a more recent pre-print, use Vanilla, but they don't go into any serious detail on the implementation. Previous Meta-Interpretive Learning (MIL) systems did not separate the learning engine from the learning system, so you would not be able to implement Vanilla from scratch based on the literature; there is no real literature on it to speak of. I feel there is very little academic interest in scholarly articles about implementation details, at least in AI and ILP where I tend to publish, so I haven't really bothered. I might at some point submit something to a Logic Programming venue, or just write up a tech report/manual, or complete Vanilla's README.
To clarify, the documentation I said I have is in the structured comments accompanying the source. These can be nicely displayed in a browser with SWI-Prolog's PlDoc library. You get that automatically if you start Vanilla by consulting the `load_project.pl` project load file, which also launches the SWI IDE, or with `?- doc_browser.` if you start Vanilla in "headless" mode with `?- [load_headless].` Or you can just read the comments as text, of course. I should really have put all that in my incomplete README file.
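For reference, the two start-up routes described above look like this at the SWI-Prolog prompt (file names as given; `doc_browser/0` is standard SWI-Prolog/PlDoc):

    % Start with the project load file; this also launches the SWI IDE
    % and the documentation server:
    ?- [load_project].

    % Or start "headless" and launch the PlDoc browser by hand:
    ?- [load_headless].
    ?- doc_browser.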
The current version of Vanilla definitely "requires" SWI, in the sense that I developed it in SWI and I have no idea whether, or how, it will run in other Prologs. Probably not great, as usual. SWI has predicates to remove and rebuild tables on the fly, which XSB does not. That's convenient because it means you don't need to restart the Prolog session between learning runs. Still, it's a long-term plan to get Vanilla to run on as many Prolog implementations as possible, so I really wouldn't expect you (or any other Prolog dev) to do anything to port it. That's my job.
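A minimal sketch of what that convenience looks like in SWI-Prolog (the tabled predicate is made up here; Vanilla's actual tabling setup may differ):

    :- table prove/2.        % some tabled meta-interpreter predicate

    % Between learning runs, drop all answer tables. They are rebuilt
    % lazily the next time a tabled predicate is called, so the Prolog
    % session does not need to be restarted.
    reset_learning :-
        abolish_all_tables.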
Prolog modules are a shitshow of a dumpster fire, no objection from me. Still, better with than without, although I do have to jump through hoops and rudely hack the SWI module implementation to get stuff done. Long story short: in all the learners that come with Vanilla, a dataset is a module, and all dataset modules have the same module name, "experiment_file". That is perfectly safe as long as the usage instructions are followed, which I have yet to write (because I only recently figured it out myself). A tutorial is a good idea.
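Roughly what that convention looks like, as a hedged sketch (the interface predicates here are invented for illustration; they are not Vanilla's actual ones):

    % my_dataset.pl -- a hypothetical dataset file. Every dataset declares
    % the same module name, experiment_file.
    :- module(experiment_file, [background_knowledge/1
                               ,positive_example/1
                               ]).

    background_knowledge([parent/2]).

    positive_example(ancestor(stathis, kostas)).

    parent(stathis, kostas).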
>> What do you really expect from putting it on github?
Feedback, I suppose. Like I say, no, you wouldn't be able to implement Vanilla by reading the literature on MIL, which is languishing a couple of years behind the point where Vanilla was created. Aleph enjoys ~30 years of publications and engagement with the community, but so does SWI and you're developing your own Prolog engine so I shouldn't have to argue about why that's OK.
From my point of view, Aleph is the old ILP that didn't work as promised: no real ability to learn recursion, no predicate invention, and Inverse Entailment is incomplete [1]. Vanilla is the new ILP that ticks all the boxes: learning recursion, predicate invention, sound and complete inductive inference. And it's efficient even [2].
Unfortunately there is zero interest in it, so I'm taking my time developing it. Nobody's paying me to do it anyway.
>> It's nice that you've got unit tests as you're saying but there aren't any in the repo.
Oh. I thought I had them included. Thanks.
>> With that in mind, I was personally more interested in ILP as a complementary technique to LLM code generation.
I have no interest in that at all! But good luck I guess. I've certainly noticed some interest in combining Logic Programming with LLMs. I think this is in the belief that the logic in Logic Programming will somehow magickally percolate up to the LLM token generation. That won't happen of course. You'd need all the logic happening before the first token is generated, otherwise you're in a generate-and-filter regime that is juuust a little bit more controllable than an LLM, and more wasteful. Think of all the poor tokens you're throwing away!
Edit: Out of curiosity, did you have to edit Aleph's code to get it to run on your engine? I had to use Aleph on the job a while ago and I couldn't get it to work, not with YAP, not with SWI (I tried a version that was supposed to be specifically for SWI, but it didn't work). I only managed to get it to run from a version that was sent to me by a colleague, which I think had been tweaked to work with SWI (but wasn't the "official" ish SWI port).
_____________
[1] Akihiro Yamamoto, Which hypotheses can be found with inverse entailment? (2005)
https://link.springer.com/chapter/10.1007/3540635149_58
[2] Without tabling.