Take a look at when this was invented; it's a critical detail in evaluating all this: it was 1913! They were working with the very limited technology of the time. They couldn't detect the letters and map each one to a distinct tone or chord that might be easier to understand; that tech just wasn't possible [0]. They had to directly translate the image of the letters, as seen by simple photoreceptors, into corresponding frequency values.
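To make the "direct translation" idea concrete, here is a minimal Python sketch of it, not a model of the actual hardware. It assumes a vertical strip of five receptors, each tied to one fixed tone, and assumes a receptor's tone sounds when it sees ink. The frequency values, the receptor count, the 0/1 glyph encoding, and the function names are all hypothetical choices for illustration.

```python
# Hypothetical tone frequencies (Hz), one per receptor, top row to bottom row.
# Illustrative assumption only, not the historical device's tuning.
ROW_FREQS_HZ = [784.0, 659.3, 587.3, 523.3, 392.0]


def column_to_tones(column, freqs=ROW_FREQS_HZ):
    """Return the frequencies sounded for one vertical slice of a glyph.

    `column` holds one 0/1 reading per receptor; 1 means the receptor sees ink
    (assumed here to be the condition that makes its tone sound).
    """
    if len(column) != len(freqs):
        raise ValueError("need exactly one receptor reading per tone")
    return [freq for freq, ink in zip(freqs, column) if ink]


def glyph_to_chords(glyph_rows, freqs=ROW_FREQS_HZ):
    """Scan a glyph left to right; each pixel column becomes one chord."""
    columns = zip(*glyph_rows)  # transpose rows into columns
    return [column_to_tones(col, freqs) for col in columns]


# A crude 5x3 bitmap of the letter "L" (made up for this example).
LETTER_L = [
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 0, 0],
    [1, 1, 1],
]

chords = glyph_to_chords(LETTER_L)
for i, chord in enumerate(chords):
    print(f"column {i}: {chord}")
```

Note there is no recognition step anywhere: the listener hears the raw shape of each letter as a sequence of chords and has to learn to decode it by ear.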
[0] As I was writing this I did have the wild thought that, in theory, if you already had the weights, you could implement a very basic character-recognition neural net in analog circuitry using vacuum tubes, one that could recognize letters for direct mapping to sound. But it's entirely impractical to create from scratch in a reasonable time frame. Maybe over the span of decades you could manually tune one?