Hacker News

tromp · today at 11:33 AM

The corresponding repo https://github.com/VictorTaelin/LamBench describes this as:

    λ-bench
    A benchmark of 120 pure lambda calculus programming problems for AI models.
    → Live results
    What is this?
    λ-bench evaluates how well AI models can implement algorithms using pure lambda calculus. Each problem asks the model to write a program in Lamb, a minimal lambda calculus language, using λ-encodings of data structures to implement a specific algorithm.
    The model receives a problem description, data encoding specification, and test cases. It must return a single .lam program that defines @main. The program is then tested against all input/output pairs — if every test passes, the problem is solved.
Note: the "Live results" link incorrectly points to https://victortaelin.github.io/LamBench/ instead of the correct https://victortaelin.github.io/lambench/
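For readers unfamiliar with λ-encodings: the idea is that data structures are represented as pure functions, so arithmetic and the like can be done with nothing but application. A minimal sketch of Church numerals in Python (illustrative only, not actual Lamb syntax):

```python
# Church encoding sketch: numbers as pure functions.
# A Church numeral n is a function that applies f to x exactly n times.
zero = lambda f: lambda x: x
succ = lambda n: lambda f: lambda x: f(n(f)(x))
add  = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

def to_int(n):
    # Decode a Church numeral by applying "plus one" n times to 0.
    return n(lambda k: k + 1)(0)

two   = succ(succ(zero))
three = succ(two)
print(to_int(add(two)(three)))  # 5
```

A Lamb solution would presumably express the same thing with raw lambdas and a @main definition, tested against the problem's input/output pairs.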

An example task (writing a lambda calculus evaluator) can be seen at https://github.com/VictorTaelin/lambench/blob/main/tsk/algo_...

Curiously, gpt-5.5 is noticeably worse than gpt-5.4, and opus-4.7 is slightly worse than opus-4.6.


Replies

lioeters · today at 2:22 PM

As an admirer of your work with binary lambda calculus, etc., I'm curious to hear your thoughts on the author's company and its work on HVM and interaction combinators: https://higherorderco.com/ I've always felt there was untapped potential in this area, and their work seems like a path toward practical parallel computing, and perhaps toward leveraging LLMs with a minimal language specification.