No human baseline to compare it to. Without one, you lose an important check on whether the task itself is poorly constructed. More importantly, the framing relies on a reference point that is never provided: it implies that people would have done better, or that perfect agreement is achievable.