Semantics/irrelevant. There are two motivations: you're either trying to highlight ways in which things are the same or similar, or ways in which they are different. If your ideal premise is that, while we know they are different, we shouldn't assume mainstream groupthink is accurate, or that what is thought of as better actually isn't, that may be noble, but it is basically the definition of a gotcha test.
I won't agree that the tests are only for two reasons. But I see your point about pushing the extremes. No doubt either approach can be used to sway and manipulate. So intent does matter, but so does the methodology, and a lot of other things. And the "Emphasis" or "Gotcha" test is not inherently a bad idea either.
You yourself pointed out the obvious "gotcha" benefit: reduce cost and maintain quality. I'll just make up this copper wire example. Supplier A provides the copper wire, 99.997% pure, that we use to make our pickups. They are forced to raise prices, so our pickups would need a price increase. We search for a different supplier and find one with a great price, but its wire is 99.975% pure. If we can use that and get exactly the same result, we can not only keep our price to the customer the same, but also add some extra money to the employee college fund. So yes, I want to do a "gotcha" test where the intent is to see whether there are any reliably (<= Remember that word) different opinions of the product made with 99.975% pure wire.
If I have three different people each make three runs on three different instruments (fancy industry term: multi-vari chart), and there is no consistent pattern of differences by person, part, or in this case guitar, we are going to call it a "GOTCHA."
But there isn't anything diabolical, biased, or ill-intentioned about it. It's a straight-up test of people, product, and tool bias.
But you can throw a whole test like that into the garbage if you simply label one pickup as a Duncan and another as a GFS, even if they are the exact same pickup. Why? People's cognitive biases will have them imagining things that are not there.
So yes, this is a "gotcha" test. But your Yngwie example shows him doing that to himself. The listening environment you described had him in a practical setting, trying to decide whether the differences he heard mattered to him. I will say I'm surprised. I have seen him play, and that Play Loud sticker... he means it!
But again, I'm not doing this for Yngwie. I'm doing this for what I believe are the majority of people yapping about this stuff. And I stick with what I said:
I want you to CALL YOUR SHOT
I want you to NAME YOUR favorite
I want you to listen to multiple blind recordings and name what they were
I want you to SIGN YOUR NAME
And then we'll see who has skill and ears, and who is full of crap...
I'm not saying, nor do I believe I ever said, that there are no actual differences, even in the sound. And if the magnets went from smooth to roughcast, etc., well, hell, that's almost a different pickup. But as with all things, just because it is old doesn't mean it is good, or better, or that you can even tell it is different. We'll see.
More on "reliably" different later...