The equivalence of inertial and gravitational mass is the foundation of the geometric description of spacetime provided by general relativity. Even before relativity, it was regarded as a remarkable and somewhat mysterious feature of nature, certainly not to be brushed away as a mere coincidence. Newton measured it to a precision¹ of about one part in a thousand using a simple setup with identical pendula containing gold, silver, lead, glass, sand, salt, wood, water, and wheat. Later experiments, from those of Bessel and Eötvös up to the measurements taken by the MICROSCOPE satellite, generally focused on increasing precision at the cost of relying on more theory.
For instance, the MICROSCOPE satellite reached a precision of 10⁻¹⁵ by comparing the free fall of masses made of titanium and platinum alloys. The materials were chosen to maximise the difference in the neutron/proton ratio, which is considerably higher in platinum (~1.5) than in titanium (~1.2). In effect, the experiment tests whether protons and neutrons respond to gravity in the same way, the reasoning being that ordinary matter is mostly made of protons and neutrons anyway. Strictly speaking, this is not an unconditional test of the universality of free fall: it is conditional on atomic theory and related ideas about how matter is organised at the microscopic level. By comparison, Newton’s approach was agnostic to atomic theory: he tested a bunch of materials, some of which are pure elements (gold, silver, lead), some compounds (water, salt, glass) and some neither (sand, wood, wheat). Since Newton’s work predated most of modern chemistry, this was the obvious approach to take. Later on, as we grew confident in our understanding of the constituents of matter, experimentalists gave up testing the universality of free fall on stuff like wood or wheat, which we know to be made up of carbon compounds and water after all.
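As a quick sanity check on the quoted ratios, here is a rough estimate of N/Z from each element's atomic number and standard atomic weight. It is a sketch under two loud assumptions: the average nucleon number is taken to equal the atomic weight in daltons (ignoring nuclear binding energy and electron masses), and it treats the pure elements rather than the actual alloys flown on MICROSCOPE.

```python
# Rough neutron/proton ratios for the main elements of MICROSCOPE's test masses.
# Assumption: mean nucleon number A is approximated by the standard atomic weight
# in daltons; binding energy and electron masses are ignored (sub-percent effects).
ELEMENTS = {
    # name: (atomic number Z, standard atomic weight)
    "titanium": (22, 47.867),
    "platinum": (78, 195.084),
}


def neutron_proton_ratio(z: int, atomic_weight: float) -> float:
    """Estimate N/Z as (A - Z) / Z, with A taken as the atomic weight."""
    return (atomic_weight - z) / z


for name, (z, weight) in ELEMENTS.items():
    print(f"{name}: N/Z ~ {neutron_proton_ratio(z, weight):.2f}")
```

This prints about 1.18 for titanium and 1.50 for platinum, in line with the figures above.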
If you have ever written and debugged code, you are probably acutely aware that what you think is going on inside your code and what is really going on are two different beasts. Think of that tensor that obviously should have shape (64, 32) but, when you actually check, has shape (32, 64). If anything, nature ought to be harder to understand than man-made code, which suggests that modern experiments may have a blind spot. By forgoing tests on a disparate array of compounds, modern experiments gained in precision but lost in breadth. What if zinc chloride experiences a subtle deviation from the universality of free fall? We may never find out.²
Luckily we can fix this blind spot with… drum roll… citizen science!
The idea is to mass-produce a simple setup, like an aluminium or plastic support with two identical pendula attached to it, pre-fill the bobs of the pendula with equal gravitational masses of various (preferably nontoxic and cheap) compounds, and sell the experimental kits to high schools, science clubs, libraries or even interested individuals. The buyers would then run the experiment and report the results. Measuring the period difference between the two pendula should be well within the means of any high-school lab technician or university student in physics or chemistry, and at any rate we could try to make assembling the equipment as Ikea-like an experience as possible.
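To see what "measuring the period difference" has to resolve, here is a sketch for an ideal simple pendulum with a point-like bob at small amplitude, where the period is T = 2π√((m_i/m_g)·L/g). Damping, air buoyancy and the finite size of the bob are ignored, and g = 9.81 m/s² is an assumed value. The parameter η below is one of the "similar ratios" of footnote 1: it is written in terms of r = m_i/m_g, which for equal lengths is proportional to T², and to first order it matches the usual Eötvös parameter defined from the two accelerations. The function names are illustrative.

```python
import math


def period(length_m: float, inertial_over_grav: float = 1.0, g: float = 9.81) -> float:
    """Small-angle period of an ideal pendulum whose bob has m_i/m_g = inertial_over_grav."""
    return 2 * math.pi * math.sqrt(inertial_over_grav * length_m / g)


def eta_from_periods(t_a: float, t_b: float) -> float:
    """2(r_a - r_b)/(r_a + r_b) with r = m_i/m_g, which is proportional to T**2
    when both pendula have the same length."""
    r_a, r_b = t_a ** 2, t_b ** 2
    return 2 * (r_a - r_b) / (r_a + r_b)


def swings_to_antiphase(eta: float) -> float:
    """Swings after which two pendula released together are half a cycle apart.
    Each swing adds a lag of about dT/T = eta/2 of a cycle."""
    return 1 / abs(eta)


# Hypothetical example: compound A has m_i/m_g larger by one part in a thousand.
t_a = period(1.0, inertial_over_grav=1.001)
t_b = period(1.0)
eta = eta_from_periods(t_a, t_b)
print(f"T_B = {t_b:.4f} s, T_A - T_B = {1e3 * (t_a - t_b):.3f} ms, eta = {eta:.2e}")
print(f"swings until antiphase: {swings_to_antiphase(eta):.0f}")
```

With these idealised numbers, a one-part-in-a-thousand difference amounts to roughly a millisecond per swing on a period of about 2 s, and the two pendula drift into antiphase after about a thousand swings. Comparing the pendula against each other over many swings, rather than timing each one in isolation, is what makes such a small effect accessible with modest equipment.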
If a single experimental kit costs around $50 to make (see below), testing a hundred compounds³ should set us back a few grand, which could be recouped by selling the kits. Buyers would get to use the kit as a demonstration tool in lab class, but also as a way to take part in actual science. They would be asked to report back with their measurements within a reasonable time frame. Ideally, once data collection is complete for a large enough fraction of the kits, we would publish a paper. Individual buyers could be acknowledged by name or even included as coauthors.
Some redundancy is probably necessary, since some people will buy the kit and never take measurements, or will screw up in some other way. The optimal level of redundancy is to be determined empirically, but every compound should appear at least twice. This way, if the same compound triggers a violation detection twice, we will know we may be onto something! We could also include a few setups that are rigged to produce a violation signal (the simplest way would be to invisibly lower or raise the center of mass of one bob, giving it a different effective length) and a few others that are guaranteed not to produce a violation (for instance by filling both bobs with the same compound). This would keep people honest, and would let us estimate the fraction of participants who report fake results without having performed the experiment. Obviously, in the very unlikely event that a genuine violation is found, we would repeat the experiment for the compounds in question with much more accurate equipment.
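The redundancy and control-kit logic above can be made quantitative with a couple of back-of-the-envelope formulas. This sketch assumes that kits behave independently, that each kit testing a non-violating compound raises a false alarm with some probability p_false (a hypothetical parameter that the same-compound control kits would let us measure), and that any honest measurement on a rigged kit detects the rigged shift. The function names and all numbers in the example are illustrative, not data.

```python
from math import comb


def prob_spurious_repeat(p_false: float, copies: int, min_hits: int) -> float:
    """Chance that at least min_hits of the `copies` kits carrying the same
    (non-violating) compound flag a violation, assuming independent false alarms."""
    return sum(
        comb(copies, k) * p_false ** k * (1 - p_false) ** (copies - k)
        for k in range(min_hits, copies + 1)
    )


def estimate_rates(rigged_reported_null: int, rigged_total: int,
                   null_reported_violation: int, null_total: int) -> tuple[float, float]:
    """Naive estimates from the control kits:
    - rigged kits reported as 'no violation' point to fabricated (or botched) results;
    - same-compound kits reported as 'violation' estimate the false-alarm rate."""
    return rigged_reported_null / rigged_total, null_reported_violation / null_total


# Hypothetical inputs: 5% false alarms per kit, every compound in two kits.
p = prob_spurious_repeat(0.05, copies=2, min_hits=2)
print(f"per compound: {p:.4f}, expected over 100 compounds: {100 * p:.2f}")

cheat_rate, false_alarm_rate = estimate_rates(3, 20, 1, 20)
print(f"fake-report rate ~ {cheat_rate:.2f}, false-alarm rate ~ {false_alarm_rate:.2f}")
```

With these made-up inputs, a 5% per-kit false-alarm rate gives a 0.25% chance that both copies of a given compound flag a violation, or about 0.25 spurious "double hits" across 100 compounds. Note that the fake-report estimate lumps together dishonest reports and honest experiments that simply failed to see the rigged shift.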
Any input on how to improve on this scheme or on how to make it happen?
Addendum: more details on the contraption
Here is a quick sketch of how the experimental apparatus might look. The support would be telescopic plastic tubing that can be screwed in place or attached to a table with a clip. Perhaps we can include a built-in level to make sure it’s horizontal. The two plastic bobs come pre-filled with the compounds, with their centers of mass at the same height. If the two compounds have very different densities, this will require some lightweight padding inside the bob holding the denser material. The position of the center of mass should be controlled to within 1 mm if the wires are 1 m long. Each bob has a thin protrusion underneath that can be observed from the side. A camera or phone may be attached to record video, and a mirror with a ruler on the opposite side makes it easy to check the position of the bobs.
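The 1 mm tolerance can be read as a requirement on the spurious signal. Since T² is proportional to the effective length, two bobs with identical contents but centers of mass at different heights mimic a violation. This sketch assumes the effective length equals the pivot-to-center-of-mass distance (the point-bob approximation; an extended bob adds a small correction) and reuses the 2(r_a − r_b)/(r_a + r_b) form of η. The 5 mm rigged shift in the example is a hypothetical value, not a design figure, and the function names are illustrative.

```python
def spurious_eta(length_m: float, com_offset_m: float) -> float:
    """Apparent eta for two bobs with identical contents whose effective lengths
    differ by com_offset_m (T**2 is proportional to the effective length)."""
    l_a, l_b = length_m + com_offset_m, length_m
    return 2 * (l_a - l_b) / (l_a + l_b)


def max_offset(length_m: float, target_eta: float) -> float:
    """Largest center-of-mass mismatch keeping the spurious eta at target_eta.
    Exact inverse of spurious_eta: offset = 2 L eta / (2 - eta)."""
    return 2 * length_m * target_eta / (2 - target_eta)


# 1 m wires with a 1 mm mismatch: spurious eta of about 1e-3, Newton's level.
print(f"1 mm mismatch: eta ~ {spurious_eta(1.0, 0.001):.1e}")
# Hypothetical rigged bob with its center of mass shifted by 5 mm.
print(f"5 mm rigged shift: eta ~ {spurious_eta(1.0, 0.005):.1e}")
print(f"offset allowed for eta = 1e-3: {1e3 * max_offset(1.0, 1e-3):.2f} mm")
```

In other words, a 1 mm mismatch on a 1 m wire produces a fake signal comparable to Newton's part in a thousand, while a rigged shift several times larger should stand out clearly for anyone who actually runs the experiment.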
Costs
Bobs cost a few cents each. A quick Google search turns up lab supports for $12, but we can probably do much better. A km of fishing line is under $10, and fishing swivels are 50 cents each. So one unit can be built for about $15, ignoring labor and shipping costs and, of course, the chemical compounds to be tested.
Lab-grade magnesium sulphate costs $27 for half a kg. Let’s say that each bob holds around 60 cm³: that’s about 160 g, or $9. A bit too much for my taste, but maybe we do not need lab grade. Let’s budget $15 on average for filling the two bobs.
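The fill estimate is easy to reproduce. This sketch assumes the bob is packed solid at the crystal density of anhydrous magnesium sulphate, about 2.66 g/cm³ (my assumption; a loose powder packs less densely, so the real mass per bob would be lower), and uses the $27 per 500 g price quoted above. The helper names are illustrative.

```python
def fill_mass_g(volume_cm3: float, density_g_cm3: float) -> float:
    """Mass of compound needed to fill a bob of the given volume."""
    return volume_cm3 * density_g_cm3


def fill_cost(mass_g: float, pack_price: float, pack_mass_g: float) -> float:
    """Cost of that mass when the compound is sold in packs of pack_mass_g."""
    return mass_g / pack_mass_g * pack_price


# Assumed: 60 cm^3 bob, anhydrous MgSO4 at ~2.66 g/cm^3, $27 per 500 g pack.
mass = fill_mass_g(60, 2.66)
cost = fill_cost(mass, pack_price=27, pack_mass_g=500)
print(f"~{mass:.0f} g per bob, ~${cost:.0f} per bob")
```

This gives roughly 160 g and $9 per bob; swapping in the density and price of each candidate compound would turn this into the per-compound budget.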
Now on to labor costs. Weighing the compounds and preparing the package should take about half an hour once the procedure is streamlined. At twice the minimum wage this is $15, at least in Québec, before overhead⁴.
All told, at roughly $15 each for hardware, compounds and labor, one experimental kit would cost ~$50 plus shipping. This seems reasonable.
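For bookkeeping, the per-kit figures above and the size of the whole campaign can be tallied in a few lines. This sketch assumes each kit compares two different compounds, that every compound appears in two kits, and that 20 control kits (rigged or same-compound) are added; the number of controls is a hypothetical choice, and shipping is excluded.

```python
import math

# Per-kit figures from the estimates above (USD, taking CAD:USD = 1); no shipping.
KIT_COSTS = {
    "hardware (support, line, swivels, bobs)": 15.0,
    "compounds for two bobs": 15.0,
    "labor (half an hour at twice minimum wage)": 15.0,
}


def kit_cost(costs: dict[str, float]) -> float:
    """Total cost of one kit."""
    return sum(costs.values())


def campaign_kits(n_compounds: int, copies: int, control_kits: int) -> int:
    """Kits needed if each kit compares two different compounds and every
    compound appears in `copies` kits, plus the control kits (assumption)."""
    return math.ceil(n_compounds * copies / 2) + control_kits


per_kit = kit_cost(KIT_COSTS)
n_kits = campaign_kits(n_compounds=100, copies=2, control_kits=20)  # 20 controls: hypothetical
print(f"per kit: ${per_kit:.0f}; kits: {n_kits}; campaign: ${n_kits * per_kit:,.0f} before shipping")
```

This gives $45 per kit and about $5,400 for 120 kits, consistent with the "few grand" mentioned earlier.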
Pitfalls, limitations, criticism
Why build many experimental kits and distribute them? It seems that a single lab could test a few hundred compounds over a few weeks.
Well, if we manage to sell the kits (or to give them away in exchange for a small donation), we can make the project scalable. Ideally, we could offset all the costs initially incurred for assembling the kits and procuring the compounds, and maybe also the costs of data aggregation and analysis. If we pull this off, we can test a larger number of compounds. High school labs and freshman courses in physics, engineering, etc. rely on hands-on demonstrations and experiments with pendula anyway. Wouldn’t it be cool if they could contribute to fundamental physics research while they are at it? Finally, we can collect data on how honest people are when performing experiments and reporting experimental results. How many rigged bobs will be reported as showing no violation?
¹ Typically expressed as (m_g − m_i)/(m_g + m_i) or similar ratios. Feel free to debate whether I should have used the word “accuracy” instead.
² The argument that the binding energy of any chemical compound is a tiny fraction of its rest mass is clearly putting the cart before the horse.
³ Picking and sourcing the compounds may be harder than it seems.
⁴ At this stage we can pretend CAD:USD = 1. This should balance out the overhead lost to taxes, etc. It’s just a very rough ballpark estimate.
https://botland.store/motion-sensors/18691-ir-beam-interruption-sensor-led-5mm-0-100cm-5904422360603.html
It seems that the closest we got to systematically testing different materials is the work of Potter 1923 (https://royalsocietypublishing.org/doi/pdf/10.1098/rspa.1923.0130), who tested brass, lead, steel, ammonium fluoride, bismuth, paraffin wax, duralumin, and mahogany. He reports that Bessel 1827 tested iron, zinc, lead, silver, gold, Fe₃O₄, CaCO₃, clay, quartz, and water.
Repeating Potter’s experiments with ~10² compounds (possibly containing tens of different elements) would be a pretty big step forward.