Anthropic's original take home assignment open sourced

204 points • by myahio • today at 2:54 AM • 74 comments

Comments

I consider myself rather smart and good at what I do. It's nice to have a look at problems like these once in a while, to remind myself of how little I know, and how much closer I am to the average than to the top.

pvalue005 • today at 5:36 AM

I suspect this was released by Anthropic as a DDoS attack on other AI companies. I prompted 'how do we solve this challenge?' into Gemini CLI in a cloned repo and it's been running non-stop for 20 minutes :)

languid-photic • today at 6:50 AM

Naively tested a set of agents on this task.

Each ran the same spec headlessly in its native harness (one shot).

Results:

    Agent                        Cycles     Time
    ─────────────────────────────────────────────
    gpt-5-2                      2,124      16m
    claude-opus-4-5-20251101     4,973      1h 2m
    gpt-5-1-codex-max-xhigh      5,402      34m
    gpt-5-codex                  5,486      7m
    gpt-5-1-codex                12,453     8m
    gpt-5-2-codex                12,905     6m
    gpt-5-1-codex-mini           17,480     7m
    claude-sonnet-4-5-20250929   21,054     10m
    claude-haiku-4-5-20251001    147,734    9m
    gemini-3-pro-preview         147,734    3m
    gpt-5-2-codex-xhigh          147,734    25m
    gpt-5-2-xhigh                147,734    34m

Clearly none beat Anthropic's target, but gpt-5-2 did slightly better in much less time than "Claude Opus 4 after many hours in the test-time compute harness".

bytesandbits • today at 6:41 AM

Having done a bunch of take-homes for big (and small) AI labs during interviews, this is the 2nd most interesting one I have seen so far.

sureglymop • today at 5:29 AM

Having recently learned more about SIMD, PTX and optimization techniques, this is a nice little challenge to learn even more.

As a take-home assignment, though, I would have failed: I would probably have taken 2 hours or more just to sketch out ideas on my tablet while reading the code, before even changing it.

avaer • today at 5:17 AM

It's pretty interesting how close this assignment looks to demoscene [1] golf [2].

[1] https://en.wikipedia.org/wiki/Demoscene

[2] https://en.wikipedia.org/wiki/Code_golf

It even uses Chrome tracing tools for profiling, which is pretty cool: https://github.com/anthropics/original_performance_takehome/...

NitpickLawyer • today at 5:57 AM

The writing has been on the wall (publicly) for about half a year now. OpenAI's 2nd place at the AtCoder world championship was the first sign, and I remember it being dismissed at the time. Sakana also got 1st place in another AtCoder competition a few weeks ago. Google also published a blog post a few months back on Gemini 2.5 netting them a 1% reduction in training time on real-world tasks by optimising kernels.

If the models get a good feedback loop + easy (cheap) verification, they get to bang their tokens against the wall until they find a better solution.

Maro • today at 6:03 AM

> This repo contains a version of Anthropic's original performance take-home, before Claude Opus 4.5 started doing better than humans given only 2 hours.

Was the screening format here that this problem was sent out, and candidates had to reply with a solution within 2 hours?

Or, are they just saying that the latest frontier coding models do better in 2 hours than human candidates have done in the past in multiple days?

kristianpaul • today at 5:44 AM

“If you optimize below 1487 cycles, beating Claude Opus 4.5's best performance at launch, email us at [email protected] with your code (and ideally a resume) so we can be appropriately impressed and perhaps discuss interviewing.”

Incipient • today at 6:43 AM

> so we can be appropriately impressed and perhaps discuss interviewing.

Something comes across really badly here for me. Some weird mix of bragging and mocking, with a hint of aloofness.

I feel these top-end companies like the smell of their own farts and would be insufferable places to work. This does nothing but reinforce that impression.

tucnak • today at 5:31 AM

The snarky writing of "if you beat our best solution, send us an email and MAYBE we think about interviewing you" is really something, innit?

tayo42 • today at 6:34 AM

I wonder if the AI is doing anything novel? Or is it more like a brute-force search that applies every kind of existing optimization that has already been written about?

koolba • today at 4:26 AM

What is the actual assignment here?

The README only gives numbers without any information on what you’re supposed to do or how you are rated.

mips_avatar • today at 5:13 AM

Going through the assignment now. Man, it’s really hard to pack the vectors right.

dhruv3006 • today at 6:05 AM

I wonder if OpenAI follows suit.

greesil • today at 5:17 AM

This is a knowledge test of GPU architecture?

zeroCalories • today at 5:19 AM

It shocks me that anyone supposedly good enough for Anthropic would subject themselves to such a one-sided waste of time.

jackblemming • today at 4:44 AM

Seems like they’re trying to hire nerds who know a lot about hardware or compiler optimizations. That will only get you so far. I guess hiring for creativity is a lot harder.

And before some smart aleck says you can be creative on these types of optimization problems: not in two hours. It’s far too risky compared with regurgitating some standard set of tried and true algos.
