I guess the meaning of "essentially different" essentially depends on the strength of the mathematical theory that you use to classify the machines!
When I first heard the phrase, I thought of simple symmetry arguments (e.g. swapping left moves and right moves). Maybe there are also more elaborate symmetry arguments of some kind.
Isn't it fair to say that there is no single objective definition of what differences between machines are "essential" here? If you have a stronger theory and stronger tools, you can draw more distinctions between TMs; with a weaker theory and weaker tools, you can draw fewer distinctions.
By analogy, suppose you were looking at groups. As you get a more sophisticated group theory you can understand more reasons or ways that two groups are "the same". I guess there is a natural limit at group isomorphism, but perhaps there are still other senses in which group structure or behavior is "the same" in some interesting or important way, even between non-isomorphic groups?
If you count arbitrary transition tables, then you get a count of 63403380965376 [1].
If you count transition tables in which states are reachable and canonically ordered, then you get a count of 632700 * 4^10 = 663434035200 [2]. These machines can behave differently on arbitrary tape contents.
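As a quick check of the arithmetic in the two counts above, here is a minimal sketch. It assumes the setting is 5 states and 2 symbols (so 10 table entries), and that each entry of an arbitrary table picks a symbol to write, a direction, and one of 6 targets (the 5 states plus halt). That reading is an assumption, not something stated in the comments; the factor 632700 is taken as given from [2] and is not derived here.

```python
# Assumed setting: 5 states, 2 symbols, so the table has 5 * 2 = 10 entries.
STATES = 5
SYMBOLS = 2
ENTRIES = STATES * SYMBOLS

# Arbitrary tables (assumed reading): each entry picks a symbol to write,
# a direction, and a next state (one of the 5 states or a halt target).
choices_per_entry = SYMBOLS * 2 * (STATES + 1)  # 2 * 2 * 6 = 24
arbitrary_tables = choices_per_entry ** ENTRIES

# Reachable, canonically ordered tables: 632700 is taken as given;
# the factor 4 per entry is (symbol to write) * (direction).
canonical_tables = 632700 * 4 ** ENTRIES

print(arbitrary_tables)  # 63403380965376
print(canonical_tables)  # 663434035200
```

Under that reading, the first figure is 24^10, and the second is roughly 96 times smaller.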
Tree normal form (TNF) further reduces these counts by examining machine behaviour when starting on an empty tape.
Turing machine program states are conventionally represented with letters: A, B, C, etc. The starting state is A.
Now suppose you are running a Turing machine program from the beginning. The only state it has visited so far is state A. It runs until it reaches a state that has not been visited yet. What state is it? B? C? D? According to "tree normal form", the name for that next state just is the earliest unused state name, which in this case is B.
The visited states so far are A and B. Run until an unvisited state is reached again. What is its name? C? D? E? Tree normal form says that the state will be called C. And the next newly visited state will be D, etc. In general, the canonical form for a Turing machine program is the one in which states are first visited in alphabetical order. (This concept also applies to the multi-color case.)
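The renaming described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the program format (a dict from `(state, symbol)` to `(write, move, next_state)`), the `"H"` halt marker, the `max_steps` budget, and the function names are all made up for this example, and it only renames states (it does not handle symbol ordering in the multi-color case).

```python
from collections import defaultdict

HALT = "H"  # hypothetical halt marker for this sketch

def first_visit_order(program, max_steps=1000):
    """Run from a blank tape; return states in order of first visit."""
    tape = defaultdict(int)
    pos, state = 0, "A"
    order = ["A"]
    for _ in range(max_steps):
        write, move, nxt = program[(state, tape[pos])]
        tape[pos] = write
        pos += 1 if move == "R" else -1
        if nxt == HALT:
            break
        if nxt not in order:
            order.append(nxt)
        state = nxt
    return order

def normalize(program, max_steps=1000):
    """Rename states so that first visits happen in alphabetical order."""
    order = first_visit_order(program, max_steps)
    # States not visited within the budget are placed after the visited ones.
    rest = sorted({s for s, _ in program} - set(order))
    rename = {old: chr(ord("A") + i) for i, old in enumerate(order + rest)}
    rename[HALT] = HALT
    return {
        (rename[s], sym): (w, m, rename[n])
        for (s, sym), (w, m, n) in program.items()
    }

# Example: this program visits C before B, so it is not in normal form.
program = {
    ("A", 0): (1, "R", "C"),
    ("A", 1): (1, "L", "B"),
    ("B", 0): (1, "R", HALT),
    ("B", 1): (1, "R", "A"),
    ("C", 0): (1, "L", "A"),
    ("C", 1): (1, "R", "B"),
}

print(first_visit_order(program))             # ['A', 'C', 'B']
print(first_visit_order(normalize(program)))  # ['A', 'B', 'C']
```

The two programs behave identically on a blank tape; only the state names differ, which is why they count as one machine.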
In general it's not possible to tell whether an arbitrary TM program is in tree normal form, since that depends on which states the program eventually visits. But this proof doesn't look at arbitrary programs. It generates candidate programs by tracing out the normal tree starting from the root, thereby bypassing non-normal programs altogether.
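To make "tracing out the tree from the root" concrete, here is a rough sketch of that style of generation. It is not the proof's actual enumerator. It assumes 2 symbols, a dict-based program format, an `"H"` halt marker, and a fixed step budget, all of which are choices made for this example. It starts from an empty table, runs the machine on a blank tape, and whenever an undefined transition is hit, branches over every way to fill it in, allowing as targets only the states used so far, the first unused state, or halt.

```python
from collections import defaultdict

HALT = "H"  # hypothetical halt marker for this sketch

def run(program, max_steps):
    """Run from a blank tape. Return the first undefined (state, symbol)
    reached, or None if the machine halts or the step budget runs out."""
    tape = defaultdict(int)
    pos, state = 0, "A"
    for _ in range(max_steps):
        key = (state, tape[pos])
        if key not in program:
            return key
        write, move, nxt = program[key]
        tape[pos] = write
        pos += 1 if move == "R" else -1
        if nxt == HALT:
            return None
        state = nxt
    return None

def enumerate_tree(n_states, max_steps=50):
    """Yield the (possibly partial) tables at the leaves of the tree."""
    all_states = [chr(ord("A") + i) for i in range(n_states)]
    stack = [{}]  # root: no transitions defined yet
    while stack:
        program = stack.pop()
        key = run(program, max_steps)
        if key is None:
            yield program  # halted or out of budget: a leaf in this sketch
            continue
        # States used so far always form a prefix A, B, ... of all_states.
        used = {"A"} | {n for (_, _, n) in program.values() if n != HALT}
        targets = all_states[: min(len(used) + 1, n_states)] + [HALT]
        for write in (0, 1):
            for move in ("L", "R"):
                for nxt in targets:
                    child = dict(program)
                    child[key] = (write, move, nxt)
                    stack.append(child)

leaves = list(enumerate_tree(2, max_steps=20))
print(len(leaves))
```

Because a new state can only be introduced as "the first unused name" and is entered immediately, a program that would visit C before B is never generated. Real enumerators typically prune further and pass the out-of-budget leaves to separate deciders, which this sketch does not attempt.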
That is what "essentially different" means here.