I might've missed it, but I feel this analysis is lacking a control? A category which there is no reason to assume would flinch. How about scoring how much it flinches when encountering, say, foods? If the words sausage, juice, cauliflower and burrito results in a non-0 flinch score, that would indicate that there's something funky going on, or that 0 isn't necessarily the value we should expect for a non-flinching model.