Okay, the basic concept is simple: start with a question, propose a theoretical answer, and devise a test to see whether that theory is true. If it holds up to testing, it is; if it doesn't, it's not.
The execution of this process, however, is anything but simple. "Does wood make a difference in tone?" may seem a deceptively simple question, but look closer to devise a test and it quickly cascades into an incredibly complex series of questions and tests. Possible influences include:
Influence on primary signal (low-volume transmission, resonance, reflection, and damping).
Influence on secondary signal (mid- to high-volume feedback resonance and transmission).
Resonant bar vibrations of the neck.
Resonant plate vibrations of the body.
Initial impedance at boundary points.
Cross-influence between strings at boundary points.
Vibrations induced in pickups through connection to chassis.
Those are a few of the main concerns, but the list could go on much further.
To start, you can take a shotgun approach, covering as wide a swath as possible: bundle the reasonable range of influences into the most (speculatively) divergent packages you can, within the scope of materials typically used. A hard maple neck and body against an African mahogany neck and body would be a fair start. Use identical hardware, identical pickups, etc., and have all the parts CNC'd to identical specs (this could be done with templates and careful measurement, but for a small-scale test it would be easier and cheaper just to have them CNC'd).
Then comes the testing procedure. Both low-volume and high-volume tests would be needed for any meaningful results, ideally accompanied by the much more complicated blind tests of actual playing. First you must be diligent about ensuring identical setups, from string height, to pickup mounting and height, to the way the strings are wrapped on the tuners, and piles of other little details. Then you need a consistent drive mechanism. Mechanical picks may seem appealing, but in fact their consistency is quite difficult to ensure. The more traditional lab approach for this type of testing is the wire-pluck method. I typically use magnet wire from 38 up to 41 gauge, wrap it around a string at a marked point, and pull the ends over a rest to ensure a consistent direction of drive. Fine magnet wire breaks at a remarkably consistent force, so it can deliver a very consistent drive. This is a fairly well-proven method in industry research.
Still, to be sure your results are reliable, you need to test your methods. This would include repeatedly tearing down and setting up a single test sample in the same manner you would with different test samples, to ensure you can get consistent results with no variables changed, and to establish a range of error. This is where testing gets quite interesting and at times frustrating: no matter how perfect and consistent you think your controls are, there will always be a number of unexpected bugs to work out.
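The teardown/re-setup study boils down to simple repeatability statistics. Below is a minimal Python sketch, assuming each cycle produces one scalar reading (say, a sustain figure in dB/s) and assuming a band of plus or minus 2 standard deviations as the error range; both the metric and the factor k=2 are my assumptions, not an established standard.

```python
import statistics

def repeatability(readings):
    """Mean and sample standard deviation of repeated readings taken on ONE
    test guitar, with a full teardown and re-setup before each reading."""
    if len(readings) < 2:
        raise ValueError("need at least two teardown/setup cycles")
    return statistics.mean(readings), statistics.stdev(readings)

def error_band(readings, k=2.0):
    """Return (low, high) = mean +/- k standard deviations.
    k = 2.0 is an assumed convention, not a requirement."""
    mean, sd = repeatability(readings)
    return mean - k * sd, mean + k * sd
```

For example, five hypothetical readings of 10.0, 10.2, 9.8, 10.1 and 9.9 dB/s give a mean of 10.0 and a standard deviation of about 0.16, so the band is roughly 9.68 to 10.32. A reading from a different guitar landing inside that band tells you nothing.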
Then, once the methods are proven and the error range established, testing can begin to look for meaningful differences. That's just one step, though. To be worth doing, you would also have to test at high volumes for influence from resonant feedback. This could be done in a controlled chamber with a signal generator (or any consistent signal) running at unaltered levels through an amp placed near the guitar. The metered signal could be taken directly from the pickups, or the test could perhaps be arranged so that natural sustained feedback is mic'd without a signal generator, though demonstrating the consistency of that method would be much more challenging.
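Once each guitar has its own repeatability data, one crude way to ask whether two samples really differ is to compare the difference in their mean readings to the combined standard error. The sketch below is essentially a Welch-style t statistic checked against a fixed cutoff k (an assumption here), not a proper significance test with a p-value; the function name is hypothetical.

```python
import math
import statistics

def differs_beyond_error(readings_a, readings_b, k=2.0):
    """Crude check for a real difference between two test guitars.

    Each list holds repeated readings (full teardown/re-setup between them)
    for one guitar. Returns (difference_in_means, verdict), where verdict is
    True when |difference| exceeds k combined standard errors of the means.
    """
    if min(len(readings_a), len(readings_b)) < 2:
        raise ValueError("each guitar needs at least two readings")
    diff = statistics.mean(readings_b) - statistics.mean(readings_a)
    std_err = math.sqrt(
        statistics.variance(readings_a) / len(readings_a)
        + statistics.variance(readings_b) / len(readings_b)
    )
    return diff, abs(diff) > k * std_err
```

With placeholder readings whose means sit 1.0 dB/s apart and a combined standard error of 0.1, the verdict is True; shrink the gap to 0.1 dB/s and it is False. The same check applies unchanged to the high-volume feedback readings, as long as that rig has its own repeatability data.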
The guitar would have to have a reliable mounting stand, preferably with locator pins and some sort of shock-mount suspension, both to ensure consistent positioning and to isolate it from the stand to a reasonable degree (a neck hanger with locators at the strap button positions could be a reasonable approach if designed well). This test layout then needs to be tested and refined to prove consistent response with the same materials, and a range of error established, before moving on to comparative testing.
At this point there are two main possibilities: either notable differences are detected, or they are not. If not, you can either call it done or start over with further testing of different layouts and materials in search of a change that may have been missed. If a notable change is observed, though, you're just getting started. Now there are so many other areas to branch out into.
First may be auditory testing: double-blind listening tests to determine whether these changes can be reliably identified by average or trained listeners. This presents a whole slew of other challenges, such as devising ways in which the player cannot know which sample they are demonstrating. That could mean not only painting the instruments in opaque colors with consistent textures, but also suspending them on a mount while they are played so that differences in weight can't be perceived. And of course there are issues of player consistency, player fatigue, listener fatigue, and a controlled listening environment. The more you know about this type of testing and all its potential interferences, the more complicated you realize it can be.
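ABX testing is one common double-blind format (my choice for illustration; the post doesn't name a protocol): on each trial the listener hears A, B, and an unknown X and says which one X is. Scoring a run is an exact binomial calculation, sketched below in Python using only `math.comb`; the function name is hypothetical.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """One-sided exact binomial p-value for an ABX run.

    Probability of scoring at least `correct` out of `trials` by pure
    guessing, where each trial is a 50/50 coin flip.
    """
    if trials < 1 or not 0 <= correct <= trials:
        raise ValueError("need trials >= 1 and 0 <= correct <= trials")
    ways = sum(comb(trials, k) for k in range(correct, trials + 1))
    return ways / 2 ** trials
```

For instance, 12 correct out of 16 trials gives p = 2517/65536, about 0.038, while 9 of 16 gives about 0.40, which is entirely consistent with guessing. Fatigue limits how many trials one listener can do, which is part of why these tests get expensive.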
Of course, so far we haven't even touched on whether our focus would be on timbre or sustain, and how test methods would have to be configured differently and repeated to look at each. Then let's say that after all of this you have found that differences between two particular wood samples can be detected both by controlled readings and by auditory testing. What will you have accomplished at this point? Very little, other than showing that further testing may be warranted.
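Sustain, at least, is fairly easy to reduce to a single number. A minimal sketch, assuming a mono recording of one wire-plucked note as a list of floats: compute frame-by-frame RMS level, convert to dB, and fit a straight line; the negated slope is a decay rate in dB/s (higher means less sustain). The frame size is an arbitrary assumption, and timbre would need entirely different metrics and a separate test configuration.

```python
import math

def decay_rate_db_per_s(samples, sample_rate, frame=512):
    """Estimate a note's decay rate in dB/s.

    Splits the signal into non-overlapping frames, takes the RMS level of
    each frame in dB, and fits a least-squares line of level vs. time.
    Returns the negated slope, so a faster decay gives a larger number.
    """
    times, levels = [], []
    for start in range(0, len(samples) - frame + 1, frame):
        chunk = samples[start:start + frame]
        rms = math.sqrt(sum(x * x for x in chunk) / frame)
        if rms > 0:  # skip digital silence, where log10 is undefined
            times.append((start + frame / 2) / sample_rate)  # frame centre, seconds
            levels.append(20 * math.log10(rms))
    if len(times) < 2:
        raise ValueError("recording too short for the chosen frame size")
    mean_t = sum(times) / len(times)
    mean_l = sum(levels) / len(levels)
    slope = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, levels)) / sum(
        (t - mean_t) ** 2 for t in times
    )
    return -slope
```

For a pure exponential decay with time constant τ (in seconds), the expected rate is 20·log10(e)/τ, about 8.69/τ dB/s, which makes a handy sanity check on a synthetic tone before the analysis touches real recordings.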
You won't have settled any internet spats, because people will still find reasons to disagree on the test methods (no matter how bulletproof you make them) and on the final impact on application in the field. More importantly, little data of practical interest will have been gained for professionals in the field.
To gain any data of practical value, these tests would have to be followed up with a much larger sample size covering other wood species, as well as comparisons within the same species. You would have to continue with focused testing on the influence of neck materials vs. body materials, or on whether changes have more or less impact with different body/neck styles and configurations. Great value could be gained by looking into which properties of the wood are most significant, and which aspects of the sound they affect. What degree of change is driven solely by density and mass, vs. stiffness, elasticity, and damping coefficient? These would be valuable things to know if you want your test results to have any usefulness in guiding decisions for builders and buyers (and even for technicians troubleshooting).
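As a worked illustration of why density and stiffness have to be separated experimentally, take an idealized uniform rectangular bar, free at both ends (a very rough stand-in for a neck that ignores the truss rod, frets, taper and headstock). Euler-Bernoulli beam theory gives its first bending frequency as f1 = (β1·L)² / (2π·L²) · sqrt(E·h² / (12·ρ)), with β1·L ≈ 4.730, where L is length, h thickness, E Young's modulus and ρ density. Stiffness and density only enter as the ratio E/ρ, and damping doesn't appear at all, so it has to be measured on its own. The function below is a hypothetical helper for that textbook formula only.

```python
import math

# First free-free bending mode of a uniform Euler-Bernoulli beam: beta_1 * L
BETA1_L = 4.730040745

def first_bending_mode_hz(length_m, thickness_m, youngs_modulus_pa, density_kg_m3):
    """Idealized first bending frequency of a uniform rectangular bar, free at
    both ends. For a rectangular section I/A = h^2 / 12, so width cancels out.
    No damping term: this model says nothing about how fast vibrations die."""
    i_over_a = thickness_m ** 2 / 12.0  # second moment of area / cross-section area
    return (BETA1_L ** 2 / (2 * math.pi * length_m ** 2)) * math.sqrt(
        youngs_modulus_pa * i_over_a / density_kg_m3
    )
```

Doubling both E and ρ leaves f1 unchanged, while doubling E alone raises it by a factor of sqrt(2); that is exactly the kind of confound a density-vs-stiffness study would have to design around.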
And as long-winded as this post may be, trust me that it is just the tip of the iceberg in executing reliable testing that would yield any meaningful or useful results. I have both a personal and a professional interest in doing this sort of testing, but not enough to warrant much investment beyond a pet project. I also know enough about reliable testing standards to recognize that if tests are simplified much beyond this, they would be fairly useless, in my opinion.
I know some have different priorities in testing, and different standards for proclaiming a conclusion certain. I'm always interested in seeing what others can try and come up with, but if I can find too many holes in the methods and reasoning, I have a hard time seeing much value in their efforts.
Reliable testing is often very hard, and the unshakable certainty some seem to hold (on either side) without such effort is not only unjustified but, I have to say, a bit disturbing.
*please excuse any typos. I blame auto-correct, and I'm not going back to edit them all.