I thought the title meant the training data used was ethics content and ethical reasoning. Turns out "ethically trained" means the training data used doesn't violate copyright laws.
Wouldn't that training data already be past the end of copyright protection, making that a no-op?
I believe the works are no longer under copyright. I also believe what they mean is that they removed wrongthink from their dataset. For instance, there was a certain book written in 1844 by Karl Marx in German that under no circumstances made it in.
This ofc means that the LLM is completely pointless.
I thought it was trained using Victorian ethics at first... Like it was only trained on computers powered by coal mined by children.