NiloCK • yesterday at 3:32 PM • 1 reply

The data sets aren't naively fed into the training runs.

Instead, training attempts to sample more heavily from higher quality sources, with, I'm sure, a mix of manual and heuristic labeling.
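To make the idea concrete, here is a minimal sketch of quality-weighted sampling. The source names and weights are made up for illustration, and this is not a description of any lab's actual pipeline; it only shows how a weight per source shifts how often each source is drawn.

```python
import random
from collections import Counter

# Hypothetical sources and quality weights (illustrative values only).
SOURCE_WEIGHTS = {
    "curated_reference": 3.0,
    "news": 1.5,
    "web_scrape": 0.5,
}


def mixing_proportions(weights):
    """Normalize raw weights into sampling probabilities that sum to 1."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def sample_sources(weights, n, seed=0):
    """Draw n source names with probability proportional to their weight."""
    rng = random.Random(seed)
    names = list(weights)
    return rng.choices(names, weights=[weights[k] for k in names], k=n)


if __name__ == "__main__":
    print(mixing_proportions(SOURCE_WEIGHTS))
    print(Counter(sample_sources(SOURCE_WEIGHTS, 10_000)))
```

With these assumed weights, the curated source is drawn about six times as often as the raw web scrape, regardless of how many documents each one contains.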

Replies

ffsm8 • yesterday at 5:32 PM

fwiw, no LLM I've ever used has generated text in the writing style that newspapers and news sites use, so I honestly doubt those sources have been given a meaningful boost in relevancy.

Otherwise their idioms would leak occasionally.
