Hacker News

Anthropic's original take home assignment open sourced

204 points by myahio today at 2:54 AM | 74 comments

Comments

lbreakjai today at 7:10 AM

I consider myself rather smart and good at what I do. It's nice to have a look at problems like these once in a while, to remind myself of how little I know, and how much closer I am to the average than to the top.

pvalue005 today at 5:36 AM

I suspect this was released by Anthropic as a DDoS attack on other AI companies. I prompted 'how do we solve this challenge?' into Gemini CLI in a cloned repo and it's been running non-stop for 20 minutes :)

languid-photic today at 6:50 AM

Naively tested a set of agents on this task.

Each ran the same spec headlessly in their native harness (one shot).

Results:

    Agent                        Cycles     Time
    ─────────────────────────────────────────────
    gpt-5-2                      2,124      16m
    claude-opus-4-5-20251101     4,973      1h 2m
    gpt-5-1-codex-max-xhigh      5,402      34m
    gpt-5-codex                  5,486      7m
    gpt-5-1-codex                12,453     8m
    gpt-5-2-codex                12,905     6m
    gpt-5-1-codex-mini           17,480     7m
    claude-sonnet-4-5-20250929   21,054     10m
    claude-haiku-4-5-20251001    147,734    9m
    gemini-3-pro-preview         147,734    3m
    gpt-5-2-codex-xhigh          147,734    25m
    gpt-5-2-xhigh                147,734    34m
Clearly none beat Anthropic's target, but gpt-5-2 did slightly better in much less time than "Claude Opus 4 after many hours in the test-time compute harness".

bytesandbits today at 6:41 AM

I've done a bunch of take-homes for big (and small) AI labs during interviews, and this is the 2nd most interesting one I have seen so far.

sureglymop today at 5:29 AM

Having recently learned more about SIMD, PTX and optimization techniques, I find this a nice little challenge for learning even more.

As a take-home assignment, though, I would have failed: I would probably have taken 2 hours just to sketch out ideas (and more) on my tablet while reading the code, before even changing it.

avaer today at 5:17 AM

It's pretty interesting how close this assignment looks to demoscene-style [1] code golf [2].

[1] https://en.wikipedia.org/wiki/Demoscene [2] https://en.wikipedia.org/wiki/Code_golf

It even uses Chrome tracing tools for profiling, which is pretty cool: https://github.com/anthropics/original_performance_takehome/...

NitpickLawyer today at 5:57 AM

The writing has been on the wall (publicly) for about half a year now. OpenAI's 2nd place at the AtCoder world championship was the first sign, and I remember it being dismissed at the time. Sakana also got 1st place in another AtCoder competition a few weeks ago. Google also released a blog post a few months back about Gemini 2.5 netting them a 1% reduction in training time on real-world tasks by optimising kernels.

If the models get a good feedback loop + easy (cheap) verification, they get to bang their tokens against the wall until they find a better solution.

Maro today at 6:03 AM

> This repo contains a version of Anthropic's original performance take-home, before Claude Opus 4.5 started doing better than humans given only 2 hours.

Was the screening format here that this problem was sent out, and candidates had to reply with a solution within 2 hours?

Or are they just saying that the latest frontier coding models do better in 2 hours than human candidates have done in the past over multiple days?

kristianpaul today at 5:44 AM

“If you optimize below 1487 cycles, beating Claude Opus 4.5's best performance at launch, email us at [email protected] with your code (and ideally a resume) so we can be appropriately impressed and perhaps discuss interviewing.”

Incipient today at 6:43 AM

> so we can be appropriately impressed and perhaps discuss interviewing.

Something about this comes across really badly to me: some weird mix of bragging and mocking, with a hint of aloofness.

I feel these top-end companies like the smell of their own farts and would be insufferable places to work. This does nothing but reinforce that for some reason.

tucnak today at 5:31 AM

The snarky writing of "if you beat our best solution, send us an email and MAYBE we think about interviewing you" is really something, innit?

tayo42 today at 6:34 AM

I wonder if the AI is doing anything novel? Or is it more like a brute-force search, applying every kind of optimization that already exists and has been written about?

koolba today at 4:26 AM

What is the actual assignment here?

The README only gives numbers without any information on what you’re supposed to do or how you are rated.

mips_avatar today at 5:13 AM

Going through the assignment now. Man, it’s really hard to pack the vectors right.

dhruv3006 today at 6:05 AM

I wonder if OpenAI follows suit.

greesil today at 5:17 AM

This is a knowledge test of GPU architecture?

zeroCalories today at 5:19 AM

It shocks me that anyone supposedly good enough for Anthropic would subject themselves to such a one-sided waste of time.

jackblemming today at 4:44 AM

Seems like they’re trying to hire nerds who know a lot about hardware or compiler optimizations. That will only get you so far. I guess hiring for creativity is a lot harder.

And before some smart aleck says you can be creative on these types of optimization problems: not in two hours; it’s far too risky compared to regurgitating some standard set of tried-and-true algos.
