Hacker News

Tiny hackable CUDA language model implementation

26 points by markusheimerl, last Friday at 5:41 PM | 2 comments

Comments

yobbo, today at 5:37 AM

Looks very nice, but I can't find numerical gradient checks, which are helpful for verifying that the backward pass is correct:

https://github.com/markusheimerl/gpt/blob/main/transformer/a...
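For context, a numerical gradient check compares a hand-written backward pass against central finite differences of the loss. The sketch below is only an illustration of the idea in Python, not code from the linked repository: the one-neuron tanh model and every name in it (`loss`, `analytic_grad`, `numerical_grad`, `max_rel_error`) are hypothetical, and the step size `eps=1e-5` is an assumed value that suits double precision.

```python
import math

def loss(w, x, y):
    """Squared error of a one-neuron tanh model: 0.5 * (tanh(w . x) - y)^2."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 0.5 * (math.tanh(z) - y) ** 2

def analytic_grad(w, x, y):
    """Hand-derived backward pass for loss() (the code under test)."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    a = math.tanh(z)
    dz = (a - y) * (1.0 - a * a)  # chain rule through tanh
    return [dz * xi for xi in x]

def numerical_grad(f, w, eps=1e-5):
    """Central differences: (f(w + eps*e_i) - f(w - eps*e_i)) / (2*eps)."""
    grad = []
    for i in range(len(w)):
        w_plus = list(w)
        w_plus[i] += eps
        w_minus = list(w)
        w_minus[i] -= eps
        grad.append((f(w_plus) - f(w_minus)) / (2.0 * eps))
    return grad

def max_rel_error(a, b):
    """Largest per-component relative difference between two gradients."""
    return max(abs(ai - bi) / max(abs(ai) + abs(bi), 1e-12)
               for ai, bi in zip(a, b))

# Hypothetical toy inputs, chosen only for illustration.
w = [0.3, -0.7, 0.5]
x = [1.0, 2.0, -1.5]
y = 0.25

g_analytic = analytic_grad(w, x, y)
g_numeric = numerical_grad(lambda p: loss(p, x, y), w)
err = max_rel_error(g_analytic, g_numeric)
print("max relative error:", err)
```

A small relative error suggests the two gradients agree, while an error near 1 usually points to a bug in the backward pass. In single precision, as is common in CUDA code, a larger step and a looser tolerance would likely be needed.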
