I worked for a while on extremophile archaeal viruses - the kind that infect organisms that manage to live in volcanic hot springs, for instance. These are ecological niches that are old and extremely divergent. There's little genetic exchange between life around the hot springs and life within them.
The typical route to discovering those viruses was genetic first. When you got a genome (especially back when this work was initiated), you'd BLAST every gene sequence against all known organisms to look for homologs. That's how you'd annotate what each gene does. Much more often than not, you'd get back zero results - these genes had absolutely no detectable sequence similarity to anything else known.
My PI would go through and clone every gene of the virus into bacteria to express the protein. If the protein was soluble, we'd crystallize it. And basically every time, once the structure was solved, if you did a 3D structure search (using the Dali server or PDBeFold), there would be a number of near-identical hits.
In other words, these genes had diverged entirely at the sequence level while changing essentially nothing at the structural (and thus functional) level.
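To make the sequence-versus-structure gap concrete, here's a toy sketch in plain Python. Everything in it is made up for illustration: the two sequences, the Cα coordinates, and the helper names `percent_identity` and `rmsd` are hypothetical, and it assumes the sequences are already aligned and the structures already superposed. Real tools have to find that alignment and superposition themselves, so this is not how BLAST, Dali, or PDBeFold actually score things.

```python
import math

def percent_identity(a, b):
    """Percent of aligned, non-gap columns where the residues match."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def rmsd(p, q):
    """Root-mean-square deviation between two already-superposed point lists."""
    if len(p) != len(q) or not p:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum((a - b) ** 2 for u, v in zip(p, q) for a, b in zip(u, v))
    return math.sqrt(total / len(p))

# Made-up aligned sequences: only the first residue matches.
seq_a = "MKTAYIAKQR"
seq_b = "MLSPEDVHGW"

# Made-up C-alpha coordinates (in angstroms) that nearly coincide.
ca_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
ca_b = [(0.1, 0.0, 0.0), (3.8, 0.1, 0.0), (7.6, 0.0, 0.1)]

print(f"sequence identity: {percent_identity(seq_a, seq_b):.1f}%")
print(f"structural RMSD:   {rmsd(ca_a, ca_b):.2f} A")
```

The point is just that the two numbers are measured on different things: one compares letters, the other compares positions in space, so a pair of proteins can look unrelated on the first and nearly identical on the second.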
Presumably, if AlphaFold is finding the relationship, there's some information preserved at the sequence level - but that could potentially be indirect, such as co-evolution. Either way, it's finding things no human-guided algorithm has been able to find.
What about convergent evolution? Are you ruling that out because you reason that there are many possible structures that could do the same job, so it's too much of a coincidence how closely it matches?
Can you explain to a layman how wildly different genes can produce identical proteins?
> Presumably, if AlphaFold is finding the relationship, there's some information preserved at the sequence level
This is not my area of expertise, and maybe I'm misunderstanding this, but I thought that what AlphaFold does is extrapolate a structure from the sequence. The actual relationship with the other existing proteins would have been found by the investigators through other, more traditional means (like the 3D search you mentioned).