Letter to the editor - Sent August 29th, 2022
Dear Professor Moseley,
We demand an immediate retraction of the recently published paper by Fadil et al. (Fadil, F.; Samol, C.; Berger, R.S.; Kellermeier, F.; Gronwald, W.; Oefner, P.J.; Dettmer, K. Isotope Ratio Outlier Analysis (IROA) for HPLC–TOFMS-Based Metabolomics of Human Urine. Metabolites 2022, 12, 741. https://doi.org/10.3390/metabo12080741) in Metabolites because its methods, and therefore its results, are erroneous. Furthermore, the paper is slanderous and defamatory, since the failures it reports stem from analytical errors and negligence on the part of the author(s).
In brief, the process described in this publication bears no relationship to the TruQuant protocol as published: 1) The LTRS (Long Term Reference Standard) is diluted to half (0.5X) the concentration required by the protocol. 2) The internal standard (IS), which is formulated for 30 samples, is distributed over 60 samples and diluted to 0.66X the required concentration. 3) The IS-Only samples (IROA blanks) are at 0.33X the required concentration. 4) The authors did not perform the calibration process that balances the sample concentration against the internal standard concentration, a required prerequisite for assuring adequate LC-MS signal strength. 5) Although ClusterFinder Version 4 was released in June 2019 to support the TruQuant protocol, the authors report using Version 3, which had only minimal support for the protocol. Each of these five points is discussed in more detail below. Any one of them would jeopardize the successful completion of the protocol; taken together, they suggest negligence on the part of experimenter(s) who clearly did not comprehend the analytical process.
In depth:
1) All of the dilutions differ both from one another and from the published protocol. The IROA TruQuant protocol explicitly ensures that the LTRS, IS-Only (blank), and IS samples have identical concentrations (20 µg of 95% 13C-IS per sample) so that the samples can be found, identified, quantitated, and normalized. Because of the altered concentrations, accurate results were not possible: many peaks would not be found in one or more of the sample types, and peaks that are lost cannot be normalized.
a. “For the internal standard, five vials of IROA–IS were used. To each vial, 600 µL of pure water was added, instead of the recommended 1200 µL, to keep a proper concentration after dilution with urine. All five resulting IS solutions were mixed to form a homogenous 3 mL IROA–IS solution. Then, 10 µL of IROA–IS was added to each diluted urine sample, resulting in a final volume of 30 µL. Two in-house QC urine samples were prepared similarly. QC1 and QC2 have creatinine concentrations of 19.71 mM and 8.92 mM, respectively, and were diluted 1:8 and 1:4 with water correspondingly, to reach a volume of 120 µL. Sixty microliters of IS was then added to each QC, which allowed one to maintain the same sample: IS ratio used to prepare the samples. The final volume of 180 µL of each QC was necessary to allow multiple injections throughout the experiment.”
Adding 10 µL of double-strength IS to 20 µL of sample resulted in a concentration of 66% of the protocol requirement, i.e., not “a proper concentration” as the publication claims. Note that this is the strongest concentration of any of the reagents used (see below: the LTRS is at 50% strength and the IS blank at 33% strength). None of the samples, controls, or QA samples were used at the strength required by the protocol. The protocol requires that all protocol samples (LTRS, experimental samples, and blanks) contain the same concentration on the isotopic side, as noted above, 20 µg of 95% 13C-IS per sample. The software relies on this equality when processing the dataset, and equal concentrations also ensure that the different sample types behave more similarly chromatographically. Unfortunately, the injection volume for these extremely dilute samples is not given, but in any case this dilution would render the samples unsuitable for processing, i.e., too dilute for the identification and quantitation of most metabolites.
Note: Points 1a, 1b, and 1c consider only the 13C isotopic concentrations. The natural-abundance sample concentration is discussed in Point 2.
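The arithmetic behind the 66% figure can be checked with a short Python sketch. It treats the IS reconstituted in the recommended 1200 µL as 1x (protocol) strength, which is how the 66% figure is reached; `relative_strength` is a hypothetical helper for illustration, not part of any IROA software.

```python
from fractions import Fraction


def relative_strength(stock_strength: Fraction, aliquot_ul: int, final_ul: int) -> Fraction:
    """Strength relative to protocol (1 = protocol strength) after an aliquot
    of a stock of `stock_strength` is brought to a final volume of `final_ul`."""
    return stock_strength * Fraction(aliquot_ul, final_ul)


# Each IS vial reconstituted in 600 µL instead of the recommended 1200 µL -> 2x stock.
is_stock = Fraction(1200, 600)

# 10 µL of the 2x IS added to 20 µL of diluted urine -> 30 µL final volume.
is_in_sample = relative_strength(is_stock, 10, 30)

print(f"IS in experimental samples: {float(is_in_sample):.2f}x protocol strength")
```

Exact fractions are used so that the result, 2/3 of protocol strength, is not obscured by rounding.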
b. “For the LTRS, 80 µL of water was added to reach a similar concentration of the IS in the samples.”
The LTRS was brought to 80 µL with “water” (purified?) despite the explicit instruction to bring it to 40 µL so that the LTRS and IS contain equal concentrations. This dilution is at 50% of the protocol requirement and is therefore unequal to the concentration of the IS in the experimental samples (66%). We speculate that this was done because the experiment included 244 samples plus 10 to 20 additional QA samples, and the experimenter needed to make LTRS injections before and after approximately every 10 injections; the LTRS was therefore (incorrectly) diluted to provide the number of injections needed. Despite the authors' statement that this was done to match the concentration of the IS, the LTRS concentration did not in fact match it. Thus, this was not only a deviation from the protocol but also a fundamental error in calculation.
The authors had additional LTRS, and our recommendation in such situations, as described in the literature supplied with the reagents, is to pool 2 or 3 LTRS samples when more injections are needed, as was done with the IS, so that the LTRS concentration remains equivalent to the IS concentration. We normally see between 500 and more than 1,000 IROA peaks in the LTRS. Although the authors do not state the injection volume, they must have injected only 1 or 2 µL of the now half-strength LTRS in order to obtain the 26+ injections that were needed. At half protocol strength and with such small injections, there was no possibility of finding 500 to 1,000 IROA peaks.
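The injection budget behind this estimate can be sketched in Python. The bracketing scheme (one LTRS injection before the first block of 10 runs and one after each block), the use of the lower QA estimate, and the zero default dead volume are assumptions for illustration; `ltrs_injections_needed` and `max_injection_ul` are hypothetical helpers.

```python
import math

PROTOCOL_LTRS_UL = 40.0  # volume the protocol specifies for one LTRS vial
USED_LTRS_UL = 80.0      # volume reported in the paper

# Same amount of material in twice the volume -> half strength.
ltrs_strength = PROTOCOL_LTRS_UL / USED_LTRS_UL


def ltrs_injections_needed(n_runs: int, runs_per_block: int = 10) -> int:
    """LTRS injections needed to bracket every block of `runs_per_block`
    injections: one before the first block and one after each block."""
    return math.ceil(n_runs / runs_per_block) + 1


def max_injection_ul(vial_ul: float, n_injections: int, dead_ul: float = 0.0) -> float:
    """Largest equal injection volume a vial can supply `n_injections` times.
    `dead_ul` is a hypothetical autosampler dead volume (not given in the paper)."""
    return (vial_ul - dead_ul) / n_injections


n_runs = 244 + 10  # 244 samples plus the low estimate of 10 QA injections
n_ltrs = ltrs_injections_needed(n_runs)

print(f"LTRS strength: {ltrs_strength:.2f}x protocol, LTRS injections: {n_ltrs}")
print(f"One vial at 80 µL (half strength): {max_injection_ul(USED_LTRS_UL, n_ltrs):.1f} µL max")
print(f"Two pooled vials at 40 µL each (full strength): "
      f"{max_injection_ul(2 * PROTOCOL_LTRS_UL, n_ltrs):.1f} µL max")
```

Pooling two vials gives the same injection budget as the single over-diluted vial, but at full strength; any real dead volume lowers the per-injection volume further.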
c. “IS blank was prepared by adding 80 µL water to a 40 µL IS solution.”
The IS blank is supposed to be prepared at full strength; the experimenter prepared it at 33% of the protocol requirement. This is critical because the IS blank is used to determine the values with the least ion suppression. This lack of coordination renders this QA sample useless. Again, no injection volume is given, but unacceptably large injections would have been required to overcome the extreme dilution of these critical samples. As in point 1b, this is another critical deviation from the protocol.
2) “Normalization is needed to balance out these differences among individuals and reveal actual biological variance. Creatinine is commonly used to achieve that as it is mainly excreted by glomerular filtration at a constant rate without reabsorption in the renal tubule [26]. Urine was diluted with pure water to a similar creatinine concentration to correct for urinary output.”
The experimental samples were diluted to a “similar” creatinine concentration (as described in the authors' previous publication, Ref. 26) without the prerequisite calibration step that would have determined the creatinine-equivalence level needed to balance the IS. The 12C and 13C peaks of most compounds are measured optimally when their ratio is close to 1:1. A calibration experiment would have determined the creatinine-equivalence value for the IS that yields the greatest number of peaks found and assures their accuracy. The results make clear that the samples were too dilute for the IS concentration; a calibration step here would have provided the correct balance.
The Calibration experiment, its importance, and the benefits derived from it are described in great detail in the instructions included with the reagents. Although it is not part of the protocol itself, it is described as the preliminary experiment required to calibrate the sample-preparation SOP to the IS, and it is recommended once per SOP.
The “normalization” the authors addressed is an important pre-injection normalization. However, as they discuss in their introduction, the TruQuant protocol provides both a correction for in-source ionization losses (nominally, ion suppression) and a post-injection, sample-to-sample normalization that uses the suppression-corrected data. Both steps are specifically designed to reduce the impact of differences among the starting samples. The authors did not attempt to acquire or use suppression-corrected or post-injection-normalized data; because they did not follow the TruQuant protocol, they could perform neither the suppression correction nor the post-injection normalization.
In the recommended Calibration experiment, samples are prepared that differ in concentration by more than 10-fold and yet are completely normalized. Given the range of correction routinely achieved in the Calibration experiment, the authors may not have needed the pre-injection normalization at all: the post-injection normalization may have been sufficient, and they would then have been using suppression-corrected data and had access to more compounds in the more concentrated samples. Is this better than the traditional dilute-and-shoot approach? We do not know, but that would have been a more worthwhile exploration.
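The letter states the goal of the Calibration experiment (the creatinine-equivalence level that gives the most IROA peaks, with 12C and 13C peaks near 1:1) but not its scoring details. The Python sketch below is a hypothetical illustration of that selection step, not IROA's documented procedure: `CalibrationLevel`, the ranking rule, and the use of the median log2 ratio are all assumptions.

```python
import math
from dataclasses import dataclass, field
from statistics import median


@dataclass
class CalibrationLevel:
    """One sample of a hypothetical dilution series, all spiked with the same IS."""
    creatinine_mM: float                                       # creatinine-equivalent load
    ratios_12c_13c: list[float] = field(default_factory=list)  # positive ratio per IROA pair found


def score(level: CalibrationLevel) -> tuple[int, float]:
    """Rank first by number of pairs found, then by closeness of the median ratio to 1:1."""
    if not level.ratios_12c_13c:
        return (0, -math.inf)
    imbalance = abs(math.log2(median(level.ratios_12c_13c)))
    return (len(level.ratios_12c_13c), -imbalance)


def best_creatinine_equivalence(levels: list[CalibrationLevel]) -> float:
    """Creatinine-equivalent load whose sample scores highest."""
    return max(levels, key=score).creatinine_mM
```

In this sketch the returned load would then set the dilution target for all experimental samples prepared under that SOP.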
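The two post-injection steps described above can be sketched in Python. This is a simplified illustration, not the TruQuant algorithm: it assumes every sample received the same amount of 13C-IS, treats the IS-only blank as the least-suppressed reference for each compound (as in point 1c), and uses median normalization as a generic stand-in for the sample-to-sample step. Both function names are hypothetical.

```python
from statistics import median


def suppression_correct(c12: dict[str, float], c13: dict[str, float],
                        c13_blank: dict[str, float]) -> dict[str, float]:
    """Scale each 12C signal by the loss observed in its co-eluting 13C partner.

    Because every sample carries the same 13C-IS amount, c13 / c13_blank
    estimates the fraction of signal surviving ionization for that compound.
    Compounds without a usable 13C partner are dropped.
    """
    corrected = {}
    for cpd, signal in c12.items():
        if c13.get(cpd, 0.0) > 0 and cpd in c13_blank:
            corrected[cpd] = signal * c13_blank[cpd] / c13[cpd]
    return corrected


def median_normalize(samples: dict[str, dict[str, float]]) -> dict[str, dict[str, float]]:
    """Divide each sample's corrected signals by that sample's median signal."""
    normalized = {}
    for name, values in samples.items():
        m = median(values.values())
        normalized[name] = {cpd: v / m for cpd, v in values.items()}
    return normalized
```

The ordering matters in this sketch: normalization operates on suppression-corrected values, so sample-to-sample scaling is not distorted by compound-specific ionization losses.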
3) The authors used an outdated version of ClusterFinder. ClusterFinder Version 3 lacked the optimizations needed to find and use TruQuant data. ClusterFinder Version 4 was released in June 2019, over three years before the submission of this paper. The software is distributed free of charge to users of our reagents. It times out after six months, at which point most users routinely request an updated version. Most importantly, all users of IROA materials purchased through IROA Technologies were alerted to the release of Version 4. Likewise, requests for software from users who purchased the materials through Sigma-Aldrich are also directed to IROA. IROA also alerted metabolomics users to the new version through ASMS (attendees who visited our posters or booth, or who listed metabolomics as an interest), the Metabolomics Society, and the Metabolomics Association of North America (MANA). We have no record that this group ever requested the new version of the software.
Version 4 introduced the TruQuant protocol as a semi-automated process that takes all sample types into account and handles each accordingly. The new version was also significantly upgraded to improve identification through better peak-library handling and to advance quantitation with routine ion-suppression correction and normalization.
4) “The LTRS data files were evaluated in ClusterFinder, and a reference library was created. This library consists of the peak pairs of the main 12C and 13C peaks, i.e., the unlabeled and the fully labeled peaks of each compound, respectively. ClusterFinder did not manage to handle all 331 data files at once. Therefore, further data evaluation had to be performed in MZmine 2 using the reference library to identify peak pairs. Peak pairs were matched manually, and orphan entries were excluded.”
ClusterFinder finds and interprets the entire isotopic envelope for all isotopic balances: natural abundance, U-13C 5%, and U-13C 95%. Because of the nature of the labeling in these situations, the full isotopic envelope must be summed to determine the actual quantity of material on either side. For natural abundance, the error from ignoring the envelope is real but minimal. For both 5% and 95% labeling, the isotopic peak envelopes (the collection of isotopologues, or “peaks,” and the number of isotopomers that make up each of them) are very closely approximated by a binomial distribution that depends entirely on the number of carbons in the molecule. Thus, for a molecule containing 6 carbons, the M+1 and M-1 peaks alone represent 37% of the height of the base peak, and for a 10-carbon molecule they represent 51% of the molecules in the base peak. However, the data used for the analysis were not generated by ClusterFinder but by MZmine 2, and only data from the monoisotopic peaks were collected, which would not have been sufficient for accurate measurements. The full effect of this discrepancy is hard to gauge, but it fundamentally means that every peak ratio was incorrect.
ClusterFinder can be, and has been, used for experiments with several hundred samples. The central strategy of CF has always been to run an intensive non-targeted analysis of the standard samples (in this case, the concurrently run LTRS samples) and write the results out as a project-specific “Search Library” file. This Search Library file is then used to perform a targeted analysis against any number of experimental files. This is strategic: unlike the non-targeted search, the targeted search is very fast, uses very little memory, and produces non-sparse datasets. Because the project-specific Search Library can be run against any number of experimental samples, either individually or as multiple sets, those samples will always be searched for the same set of compounds.
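The binomial behavior described above can be reproduced with a short Python sketch. It is idealized: it considers only carbon, assumes exactly 95% enrichment for the IS and roughly 1.1% natural 13C abundance, and ignores instrument resolution, so its numbers will not exactly match the percentages quoted above. The hypothetical `monoisotopic_bias` function shows how far a 12C/13C ratio built from monoisotopic peaks alone departs from a ratio of summed envelopes when a natural-abundance sample is paired with a 95% 13C IS.

```python
from math import comb

NATURAL_13C = 0.011  # approximate natural abundance of 13C
IS_13C = 0.95        # nominal enrichment of the 13C internal standard


def envelope(n_carbons: int, p13c: float) -> list[float]:
    """Binomial isotopologue distribution; index k = number of 13C atoms."""
    return [comb(n_carbons, k) * p13c**k * (1 - p13c) ** (n_carbons - k)
            for k in range(n_carbons + 1)]


def monoisotopic_bias(n_carbons: int) -> float:
    """Factor by which a 12C/13C ratio from monoisotopic peaks alone
    (all-12C peak vs. all-13C peak) overstates the ratio of full envelopes."""
    light = envelope(n_carbons, NATURAL_13C)[0]  # fraction in the all-12C peak
    heavy = envelope(n_carbons, IS_13C)[-1]      # fraction in the all-13C peak
    return light / heavy


for n in (6, 10):
    heavy_env = envelope(n, IS_13C)
    print(f"C{n}: M-1/M = {heavy_env[-2] / heavy_env[-1]:.2f}, "
          f"monoisotopic bias = {monoisotopic_bias(n):.2f}")
```

Because the bias grows with carbon number, it differs from compound to compound, so it cannot be removed by a single global correction factor.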
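The library-then-targeted-search strategy can be illustrated with a small Python sketch. It is not ClusterFinder's implementation: `LibraryEntry`, `Peak`, `targeted_search`, the 5 ppm m/z and 0.2 min retention-time tolerances, and the choice to report 0.0 for an entry with no matching peak are all assumptions. What it shows is structural: every file is queried for the same fixed list of compounds, so each file yields a row with identical columns.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryEntry:
    """One IROA pair written to a project-specific search library."""
    name: str
    mz_12c: float  # m/z of the all-12C peak
    mz_13c: float  # m/z of the all-13C peak
    rt: float      # retention time, minutes


@dataclass(frozen=True)
class Peak:
    mz: float
    rt: float
    intensity: float


def _intensity_at(peaks: list[Peak], mz: float, rt: float, ppm: float, rt_tol: float) -> float:
    """Most intense peak within the m/z (ppm) and retention-time windows, else 0.0."""
    hits = [p.intensity for p in peaks
            if abs(p.mz - mz) <= mz * ppm * 1e-6 and abs(p.rt - rt) <= rt_tol]
    return max(hits, default=0.0)


def targeted_search(library: list[LibraryEntry], peaks: list[Peak],
                    ppm: float = 5.0, rt_tol: float = 0.2) -> dict[str, tuple[float, float]]:
    """One (12C, 13C) intensity pair per library entry for a single data file."""
    return {e.name: (_intensity_at(peaks, e.mz_12c, e.rt, ppm, rt_tol),
                     _intensity_at(peaks, e.mz_13c, e.rt, ppm, rt_tol))
            for e in library}
```

Running `targeted_search` once per file, with the same library each time, yields a table with one row per file and one column per library entry, however many files there are; each call only needs the library and one file's peaks in memory.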
5) “Further data curation was done based on quality control. For instance, the 12C peaks should be minimal in the IS blank, so if a considerable 12C peak (12C/13C > 20%) was detected there, the pair was excluded.”
Any signal in the 12C channel of the IS-only (blank) sample represents carry-over. By acknowledging that they allowed up to 20% carry-over, the authors throw all of their analysis into question. Worse, the 20% carry-over was measured against a diluted (33% strength) IS blank and thus in reality represents a 60% carry-over, raising the question of whether they actually saw this much carry-over. It is not clear how even to discuss this point, since the dilutions were irresponsibly incorrect.
6) Had the reviewer(s) compared the Methods of this paper with the published IROA TruQuant protocol, the paper would never have been accepted, because the original protocol was not followed. Neither the experimenter nor the reviewer(s) understood the underlying principles on which the protocol is based. The failure of the experiment is entirely attributable to the Methods employed: the concentrations used in this paper were inconsistent with the published protocol (always diluted, which reduces signal), and the experimental samples were maximally diluted rather than adjusted for equivalence. The reviewer(s) should have noticed the text quoted above in points 1 and 2 and immediately realized that the analytical integrity of the protocol had been abandoned.
7) Since the results were not positive, it is not clear why a reviewer would accept such a paper. Negative data should generally be unpublishable, since the source of the error can be almost anywhere; in this case, however, the cause is clear: the published protocol was not followed, the appropriate software was not used to gather the data, and there was a general misunderstanding of the principles involved.
Full disclosure: In December 2019, the lead author contacted IROA Technologies because he saw so few peaks. 1) He assured us that he had followed the protocol and used ClusterFinder (CF). 2) We also saw far fewer peaks than normal but could not understand why he saw so few. We believed that he must have been using CF V4, since it had been available for over half a year. CF V3 would have found the same number of IROA bins as CF V4, but CF V4 introduced a reintegration step that significantly increased quantitative accuracy. Upon reading this paper, it is clear that the low number of peaks was due to the dilution of all sample types and the use of smaller injections. We believe that most of the error was due to using MZmine 2 rather than ClusterFinder for data extraction.
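The curation rule quoted in point 5 can be written as a small Python filter; `curate_pairs` and its input layout are hypothetical. The `max_ratio` parameter makes the contrast explicit: the authors' threshold of 0.20 admits pairs with substantial 12C signal in the blank, whereas treating any 12C signal as carry-over corresponds to `max_ratio=0.0`.

```python
def curate_pairs(blank: dict[str, tuple[float, float]], max_ratio: float = 0.20) -> list[str]:
    """Return the compounds kept by the IS-blank check.

    `blank` maps compound -> (12C intensity, 13C intensity) measured in the
    IS-only blank. A pair is kept when 12C/13C <= max_ratio; pairs with no
    13C signal in the blank cannot be evaluated and are dropped.
    """
    kept = []
    for cpd, (c12, c13) in blank.items():
        if c13 > 0 and c12 / c13 <= max_ratio:
            kept.append(cpd)
    return kept
```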
Wishing the best for metabolomics,
Chris Beecher