Dear editor, dear reviewer,
We carefully read the retraction request made by the concerned readers, in fact the company IROA technologies. It goes without saying that we take the readers’ concerns very seriously. We are very sorry that the paper is perceived negatively by the company. We do not share this sentiment, as the paper also states advantages of using the IROA standard. We employed the IROA standard as a stable isotope labelled internal standard for untargeted metabolomics of a highly complex matrix that is hugely different from yeast. A major concern seems to be the low number of IROA peak pairs described. Of course, this number can be increased, but the major focus of the work is the correlation of the 12C/13C ratios with quantitative data and not the number of detected peaks. In the paper we clearly state:
“In general, using a yeast extract for urine analysis is not optimal. We, however, did not address the other aspects of using the kit here, such as the range of covered metabolites, which would be readily affected by the matrix type. We, instead, focus on a few identified metabolites and investigate the use of a complex IS and its ability to improve the data correlation to a reference dataset.”
In the following letter we will address the concerns raised. Paragraphs taken from the retraction request are marked in blue.
“the process described in this publication bears no relationship to the TruQuant protocol as published”
The reader is correct: we did not follow the protocol slavishly but used a strategy adapted to the special requirements of urine samples. We basically used the kit as a source of a complex stable isotope labelled internal standard. In retrospect we realize that we should have stated more clearly that we, for good reasons, did not intend to employ the full workflow. Nevertheless, all steps are detailed in the experimental section, and every deviation was a compromise we deemed acceptable to achieve a practical usage of the kit in our workflow and to test its ability to be incorporated in the analysis of patients’ urine specimens.
Point 1
“All of the dilutions are different both from one another and the published protocol.”
“The LTRS (Long Term Reference Standard) is diluted to half (0.5X) of the concentration required by the protocol.”
This is a deviation from the protocol, but we only used the LTRS to identify IROA peak pairs. The number of pairs detected will of course go down with a lower LTRS concentration, but in the protocol published on the Sigma website, an injection volume of 3 or even 2 µL is recommended for the protocol in positive mode. Injecting 5 µL of 0.5X LTRS (i.e., 2.5 µL 1.0X LTRS) falls within this range.
Please note: Multiple times, the reader points out that the injection volume is not given. This is not correct. In the experimental section 4.3, we clearly state “Sample volumes of 5 μL were injected.” (A sample is every vial in the autosampler of the LC system.)
The number of detected peak pairs is also influenced by the LC conditions (stationary phase, gradient, etc.) and of course the mass spectrometer in use.
It is correct that 10 µL of double strength IS were added to 20 µL of sample, resulting in 66% of the protocol recommendation. This was chosen to keep the dilution of urine samples with low concentrations to a minimum. All samples with a creatinine concentration below 3 mM were used directly (20 µL pure urine plus 10 µL IS). There is a consensus in the metabolomics community that sample preparation in untargeted metabolomics should be non-selective and as simple as possible to keep the sample composition constant. Often, a dilute-and-shoot strategy is used, where all samples, irrespective of their individual creatinine concentration, are diluted by a constant factor such as 1:4 or 1:2. Unnecessarily drying the samples and reconstituting them in IS solution is not only time-consuming and impractical with large sample sets, but it also bears a large risk of changing the sample composition and thereby introducing additional unwanted variance in the data. Therefore, we did not follow the recommendation in the protocol to dry the samples but used the described compromise.
“Adding 10 ul of double strength IS to 20 ul of sample resulted in a concentration of 66% of the protocol requirements, i.e. not “the correct concentration” as stated in the text of the publication in question.“
The quoted expression “the correct concentration” is not to be found in our publication. We wrote:
“To each vial, 600 µL of pure water was added, instead of the recommended 1200 µL, to keep a proper concentration after dilution with urine.”
We did not want to overdilute the IS, aiming for an acceptable concentration.
(Note that by writing “instead of the recommended 1200 µL” we did point out a deviation from protocol.)
The only deviation in concentration is between the samples/IS blanks (both 66%) and the LTRS (50%). The “IS concentration” was 66% both in the samples and in the IS blanks. Adding 80 µL water to 40 µL IS solution (of course the double strength solution, prepared as described) gives the same ratio as adding 10 µL IS solution to 20 µL of urine. Hence, all the concerns raised multiple times throughout the letter with respect to the IS blank concentration are void, as it is identical to the IS concentration in the samples.
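The dilution arithmetic behind the 66% figure can be checked with a few lines of Python. This is only a sketch of the calculation described above; the function name `relative_strength` is introduced here for illustration and is not part of any IROA software.

```python
def relative_strength(stock_strength, stock_volume_ul, diluent_volume_ul):
    """Strength of a mixture relative to the protocol concentration (1.0 = protocol)."""
    total_volume_ul = stock_volume_ul + diluent_volume_ul
    return stock_strength * stock_volume_ul / total_volume_ul


# One IS vial dissolved in 600 µL instead of the recommended 1200 µL: double strength.
double_strength = 1200 / 600

# Sample: 10 µL double strength IS added to 20 µL urine.
sample = relative_strength(double_strength, 10, 20)

# IS blank: 40 µL double strength IS added to 80 µL water.
is_blank = relative_strength(double_strength, 40, 80)

print(f"sample: {sample:.3f}  IS blank: {is_blank:.3f}")
```

Both mixtures come out at two thirds of the protocol concentration, which is the 66% stated above.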
Point 2:
“The authors did not follow a calibration process to balance the sample concentration to the internal standard concentration, which is a required prerequisite to assure an adequate LCMS signal strength.”
Urine is a complex and very variable matrix. Even after normalization to creatinine, metabolite concentrations can vary by factors of 5 to 10, or even higher (for ranges see for example: Bouatra et al. The human urine metabolome. PLoS One. 2013 Sep 4;8(9):e73076.). As mentioned above, a dilute-and-shoot approach is widely used in urinary metabolomics. We refined the dilute-and-shoot approach by diluting the samples to similar creatinine concentrations, because a uniform dilution would also dilute samples with a low creatinine concentration. Therefore, samples with creatinine concentrations below 3 mM were not diluted, but only mixed with IS solution. We chose the urine dilution factors based on previous experience, the creatinine concentration in our samples and the overall signal intensities in the base peak chromatograms (BPCs) of urine samples. Urine diluted to 1–2 mM creatinine still shows highly abundant, even saturated peaks in the BPCs. In our original sample set of 244 urine specimens, creatinine ranged from 0.7 to 26.3 mM with a median value of 6.19 mM.
With this wide creatinine range, the selection of a sample for the calibration experiment is almost impossible. One can choose a pool sample, but individual samples will deviate from the pool. The recommended calibration experiment requires drying down different sample (extract) volumes. Since we did not intend to dry the urine samples, we did not perform a full calibration experiment, but chose the urine dilution factor as described above. (Note: In the product information from Sigma Aldrich we find the following sentence: “it is recommended that a “calibration” step is done ahead of time, i.e. test different amounts of standard prep with the same amount of Internal Standard to figure out how much to balance with the Internal Standard.” The step is recommended, not mandatory.)
However, to address the concern with regard to the calibration, we have now performed a calibration experiment according to the protocol, using the IS and LTRS dilutions from the protocol and 11 creatinine concentrations (0.25, 0.5, 1.25, 1.5, 2, 2.5, 3.75, 5, 7.5, 10, and 20 mM) in triplicate. All samples (40 µL) were dried and reconstituted in 40 µL IS solution.
As a result, a graph (see Figure 1) is obtained that indicates a creatinine concentration that “yields an overall mass spectral signal that is equal to the overall mass spectral signal of the IS. This is the amount of sample that will most accurately be measured using the IS in the future, i.e. well balanced by the standard 40 µL of IS.” (Ref: product information sheet from Sigma Aldrich). In Figure 1, this is the intersection of the line of the normalized IS MSTUS values (blue squares) with the lines of the normalized C12 MSTUS values (green crosses) and the suppression corrected C12 values (red circles). In our case, this would be 6.5 mM creatinine! We sent the graph to IROA technologies to make sure that we had indeed interpreted it correctly. We then repeated the original experiment using a subset of 26 urine specimens with original creatinine concentrations equal to or greater than 6.5 mM. Aliquots of 40 µL of urine with a creatinine concentration of 6.5 mM (either pure or prediluted to 6.5 mM) were dried and reconstituted in IS (dissolved according to protocol in 1.2 mL), so that each sample had a final concentration corresponding to 6.5 mM creatinine. The LTRS and IS blanks were also prepared according to protocol. Additionally, aliquots from the same 26 urine specimens were diluted to 2 mM creatinine to obtain concentrations comparable to our original data set, and 40 µL of each were likewise dried and reconstituted in IS.
Figure 1. Calibration experiment results. Note that the X axis shows the creatinine concentrations multiplied by a factor of 100, solely because the software had trouble recognising decimals.
The data was analyzed with ClusterFinder4 (CF4). We also analyzed the files of the same specimens from our original experiment with CF4 (see Table 1). Using a creatinine concentration of 6.5 mM as recommended by the calibration resulted in 342 IROA features (pairs), with 299 features containing no missing values. Interestingly, when we used a creatinine concentration of 2 mM, we obtained 323 IROA features, with 301 features containing no missing values. Clearly, the use of highly concentrated samples brings little benefit for the number of features detected. Granted, the number in our original data set was much lower with 190 IROA features, with 180 features containing no missing values. This may be attributed to the lower IS concentration.
“The C12 and C13 peaks for most compounds are optimally measured when they are closer to a 1:1 ratio.”
We also evaluated the 12C/13C ratios of the features. Only 9.6% of the features in the 6.5 mM creatinine samples had ratios between 0.8 and 1.2. Hence, the intended ratios are seen only for a small subset of features. The percentages in the old data set and in the 2 mM creatinine samples were lower, but not drastically so. It is also worth mentioning that the 6.5 mM creatinine samples had higher proportions of features with ratios >2 (27.8%) or even >10 (7.5%). This would in turn indicate that the samples are too concentrated. One should also keep in mind that maintaining a creatinine concentration of 6.5 mM over all samples would require drying down about 371 µL of urine with a creatinine concentration of 0.7 mM! Moreover, these overly concentrated samples will most likely cause a rapid contamination of the system and a fast deterioration in instrument performance.
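The volume of about 371 µL follows from a simple mass balance: the dried aliquot must contain as much creatinine as 40 µL of a 6.5 mM solution. A short sketch of this calculation (the function name `volume_to_dry_ul` is chosen here for illustration only):

```python
def volume_to_dry_ul(target_mm, reconstitution_ul, urine_mm):
    """Urine volume that contains as much creatinine as `reconstitution_ul`
    of a solution at `target_mm` (mass balance: c1 * V1 = c2 * V2)."""
    return target_mm * reconstitution_ul / urine_mm


# Lowest creatinine concentration in the original sample set: 0.7 mM.
required_ul = volume_to_dry_ul(target_mm=6.5, reconstitution_ul=40, urine_mm=0.7)
print(f"{required_ul:.0f} µL")
```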
Most importantly, we also evaluated the quantitative performance by correlating the peak areas and 12C/13C ratios with quantitative data as performed in the publication. Data was analyzed with ClusterFinder4 and MZmine (see Table 2). The files from the original experiment corresponding to the 26 specimens were also evaluated as seen in Table 2.
While we extracted more features with CF4, which is a clear benefit, the overall quantitative performance did not significantly improve when employing the protocol with dried urine samples and the suggested IS concentration. Only for arginine was a clearly improved correlation seen for the ratios when using the samples adjusted to 6.5 mM creatinine and evaluated with CF4, i.e., where we exactly followed the suggested protocol (columns 6 and 7 of Table 2). Therefore, as we already stated in our original publication, the improvement in quantitative performance obtained by employing the IROA kit is only moderate.
Table 2: Coefficients of determination. Missing entries correspond to cases where no bin(s) were present in the data.
“They did not attempt to acquire or use the suppression-corrected or post-injection normalization data. These two steps are specifically designed to reduce the impact of starting sample differences. Because of the failure to follow the TruQuant protocol they could not do either the suppression-correction nor postinjection normalization.”
The sentence above is a bit confusing, as it is not possible to acquire suppression corrected data; data can only be corrected for ion suppression after acquisition. Computing the ratio between the C12 and C13-IS values corrects the data for ion suppression, and this is what we did. The next step, deriving from these ratios the true areas that accurately reflect biological concentrations, is somewhat problematic, as detailed below. These areas are influenced both by ion suppression, for which we corrected, and by general ionization efficiencies, which may vary greatly across compounds and are unknown.
As the reader says, the aim of post-injection normalization is to reduce, for example, starting sample differences, i.e., to make data more comparable across samples. However, here we first compared, for each individual sample, the values obtained by an absolute quantitative method to those obtained using the IROA kit by means of a correlation analysis. Therefore, a normalization to make data more comparable across samples is not required for this kind of analysis. Such a normalization is helpful for the subsequent analysis of differences between groups, as we did with the PCA analyses shown in Figures 5 and 7 of our original contribution. There, we used probabilistic quotient normalization (PQN), introduced by Dieterle et al. (Anal. Chem. 2006, 78, 13, 4281–4290), for all data sets, as it is a widely used method for human urine and can be applied both to pure area data and to the ratios obtained with the IROA-IS. In this context, please also see the paragraph on data normalization below.
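For readers less familiar with PQN, the principle can be sketched as follows. This is a minimal illustration with invented numbers, not the code used for our analyses; it uses the feature-wise median over all samples as reference spectrum and omits the integral normalization that usually precedes PQN.

```python
from statistics import median


def pqn(samples):
    """Probabilistic quotient normalization of a list of samples.

    Each sample is a list of feature values (areas or 12C/13C ratios).
    """
    n_features = len(samples[0])
    # Reference spectrum: feature-wise median over all samples.
    reference = [median(row[j] for row in samples) for j in range(n_features)]
    normalized = []
    for row in samples:
        # Quotients of the sample relative to the reference, feature by feature.
        quotients = [v / ref for v, ref in zip(row, reference) if ref > 0]
        # The median quotient is taken as the most probable dilution factor.
        factor = median(quotients)
        normalized.append([v / factor for v in row])
    return normalized


data = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],   # same profile, twice as concentrated
    [1.0, 2.0, 3.0, 12.0],  # same dilution as the first sample, one feature regulated
]
result = pqn(data)
print(result)
```

In this toy example the twice as concentrated sample is scaled back to the profile of the first one, whereas the single regulated feature in the third sample is left untouched because it does not shift the median quotient.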
Ion suppression corrected areas:
With respect to the computation of ion suppression corrected peak areas (corrected area = X × C12/IROA-IS, with X being the least suppressed value of this analyte), one should keep in mind that the obtained areas are corrected for differences in ion suppression and for the amount of internal standard present for this compound. Differences in the general ionization efficiency of a compound, which may vary greatly across compounds, are not corrected for. Therefore, differences in corrected areas between different analytes may not truly reflect biological differences and should be treated with care.
The selection of the highest IS area (picked from the raw 13C data) in the software is a bit problematic. It is picked from one sample (not the average over all IS blank samples) that shows the highest 13C area for a particular feature. The highest area for the next feature can stem from a different IS blank sample.
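The following sketch illustrates the correction for a single feature as we read the formula above. All peak areas are invented, and taking X as the maximum 13C area over the listed injections is our simplified reading, not the ClusterFinder implementation.

```python
# Hypothetical peak areas of one feature in three injections (arbitrary units).
# The analyte amount is assumed to be identical; only ion suppression differs.
c12_areas = [4.0e5, 2.0e5, 1.0e5]  # endogenous 12C signal
c13_areas = [8.0e5, 4.0e5, 2.0e5]  # 13C internal standard signal in the same injections

# X: the highest, i.e. least suppressed, 13C IS area found for this feature.
x = max(c13_areas)

# The 12C/13C ratio cancels the suppression shared by both isotopologues.
ratios = [c12 / c13 for c12, c13 in zip(c12_areas, c13_areas)]

# Scaling by X converts the ratio back to an area on the scale of that injection.
corrected_areas = [x * r for r in ratios]
print(ratios, corrected_areas)
```

All three injections end up with the same corrected area. Note that X depends on which single injection happens to show the highest 13C area for this feature, which is the point raised above, and that the scale of the corrected area still reflects the ionization efficiency of the compound.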
Normalization:
The MSTUS (mass spectrometry total useful signal) approach is a variation of the common normalization to a constant sum (CS), as described in Craig et al., Anal. Chem. 2006, 78, 2262-2267. It attempts to limit the contributions of xenobiotics and artifacts to the normalization factor by including only a subset of signals for computation of the normalization factor. As IROA technologies states in its white paper “all the peaks we used in computing a normalization factor had minimum criteria to qualify: 1) both the C12 and C13 isotopic clusters have to be present in all samples, 2) they both have to be above a minimum peak area, and 3) and the ratio between the C13-IS and the C12 monoisotopic peaks has to be greater than 0.001.” (Ref: Integration of Standards for Ion Suppression Correction and QC in an Untargeted Metabolomics Workflow, DOI: 10.13140/RG.2.2.28112.74245). Next, the areas of all remaining signals will be corrected for ion suppression.
The computation of the MSTUS correction factor is described in an IROA poster as follows: “The calculation for the MSTUS normalization correction factors is: sumSCC12/sumSCIS where sumSCC12 is the total suppression corrected area of all considered C12 compounds, and sumSCIS is the total suppression corrected area of all considered IS compounds. The suppression corrected values are multiplied by these factors to normalize them all to the same base.” (Ref: Poster, Lorenzi et al. “Correction of Ion Suppression and Normalization for Improved Quantitative Rigor and Reproducibility Using IROA”, Metabolomics Society 2018).
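A minimal sketch of the quoted factor, using invented suppression corrected areas for features that are assumed to have already passed the qualification criteria; the function name `mstus_factor` is ours and does not refer to any ClusterFinder routine.

```python
def mstus_factor(sc_c12_areas, sc_is_areas):
    """sumSCC12 / sumSCIS for one sample, following the quoted definition."""
    return sum(sc_c12_areas) / sum(sc_is_areas)


# Two hypothetical samples with the same IS amount; sample A is twice as concentrated.
factor_a = mstus_factor([2.0, 4.0, 6.0], [3.0, 3.0, 3.0])
factor_b = mstus_factor([1.0, 2.0, 3.0], [3.0, 3.0, 3.0])
print(factor_a, factor_b)
```

Because the factor is built from a sum over all qualifying features, it shares the basic assumption of a constant sum normalization.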
Therefore, this is clearly an improved version of the common CS normalization. However, CS normalization in general has its limitations. Its inherent assumption is that across groups only a relatively small number of features is regulated in approximately equal shares up and down. As Craig et al. correctly state for urine: “For a series of spectra with highly similar internal peak ratios but differing in total intensity because of such dilution or concentration effects, CS normalization of each spectrum can be considered to approximate the relative concentration of species (i.e., as in solute). Importantly, this approximation will break down when large perturbations occur to intensities in some spectra (e.g., those from certain toxin-treated animals or the use of diuretic drugs, for example). This is easily seen because if some peak areas are increased and the total is normalized to a constant, others will appear to have decreased, and this effect in “closed” data sets has been noted previously.” (Ref. Craig et al., Anal. Chem. 2006, 78, 2262-2267). Due to the inherent limitations of CS normalization especially for urine, the normalization strategy suggested by the reader will be appropriate in many cases such as for many cell culture experiments but is of only limited use for urine. For a discussion on different normalization approaches see for example Wulff and Mitchell, Advances in Bioscience and Biotechnology, 2018, 9, 339-351.
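The closure effect described by Craig et al. can be reproduced with a toy example. The numbers are invented and serve only to illustrate the argument.

```python
def constant_sum(areas, total=100.0):
    """Scale all areas of one sample so that they add up to `total`."""
    area_sum = sum(areas)
    return [total * a / area_sum for a in areas]


baseline = [10.0, 20.0, 30.0, 40.0]
perturbed = [10.0, 20.0, 30.0, 140.0]  # only the last peak has changed

norm_baseline = constant_sum(baseline)
norm_perturbed = constant_sum(perturbed)
print(norm_baseline)
print(norm_perturbed)
```

After normalization, the three unchanged peaks of the perturbed sample appear halved, although their true amounts are identical in both samples.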
Point 3 - 5:
“The Authors used an outdated version of ClusterFinder. ClusterFinder Version 3 did not have any of the optimizations needed to find and use the TruQuant data.”
One would expect that version 3 of a software is mature, in particular when the option to analyze TruQuant data is given. Moreover, although CF4 is the better version, we still encountered many errors with it and needed several attempts before we managed to get results.
We downloaded our software from the Sigma Aldrich homepage in November 2019 and were not alerted to the newer version that shows a good performance improvement.
The first author of the paper was in contact with IROA at the end of 2019, but the updated CF4 software version never came up in the discussion. We did inform IROA that we did not detect as many metabolites in LTRS (diluted according to protocol) as they claimed. They did reply at first and we sent them a data file. They also ended up getting low find rates (despite seeing a lot of IROA peaks when inspecting the spectra). Suspecting issues with the conversion, they asked for the original (not converted) data file, which we also provided. After not hearing from IROA for three weeks, we contacted them again in January 2020 asking whether they had found out what seemed to be the problem, but our inquiry was left unanswered.
Nevertheless, we used CF4 to analyze our data and agree that CF4 performs better and yields more IROA pairs than our MZmine workflow.
“ClusterFinder finds and interprets the entire isotopic envelope for all isotopic balances, including natural abundance, U-13C 5%, U-13C 5%. Because of the nature of the labeling in these situations the full isotopic envelope needs to be summed to determine the actual quantity of material on either side.”….”The data used for the analysis was not generated by ClusterFinder but rather by MzMine2 and only data from the monoisotopic peaks was collected which would not have been sufficient for accurate measurements. It is hard to fathom the effect of this discrepancy, but it fundamentally means that every peak ratio was incorrect.”
There is no need to sum the isotope signals. The isotopic pattern of the internal standards will not change from sample to sample, only the signal intensity of the isotopic peaks will vary relative to each other due to ion suppression, etc. The isotopic pattern of the endogenous metabolite from the urine sample is also fixed. The monoisotopic peaks reflect the abundance of the endogenous metabolite and IS in the sample. Nevertheless, we compared the ratios of CF4 with our MZmine ratios and, as expected, they show excellent correlation (Table 2). The claim that the MZmine ratios are wrong is therefore not correct.
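The argument can be illustrated with a small sketch. It assumes, as stated above, that for a given compound the monoisotopic peak is a fixed fraction of the whole isotopic envelope; the fractions and areas below are invented.

```python
# Assumed fixed fractions of the total envelope found in the monoisotopic peak.
mono_fraction_c12 = 0.75  # endogenous metabolite (hypothetical value)
mono_fraction_c13 = 0.60  # 13C labelled IS (hypothetical value)

# Hypothetical summed envelope areas in three samples.
total_c12 = [100.0, 250.0, 400.0]
total_c13 = [200.0, 200.0, 160.0]

# Ratios from the summed envelopes and from the monoisotopic peaks only.
summed_ratios = [a / b for a, b in zip(total_c12, total_c13)]
mono_ratios = [
    (mono_fraction_c12 * a) / (mono_fraction_c13 * b)
    for a, b in zip(total_c12, total_c13)
]

# The two kinds of ratios differ only by a constant, compound specific factor.
scale = [m / s for m, s in zip(mono_ratios, summed_ratios)]
print(scale)
```

Under this assumption the monoisotopic ratios are the summed ratios multiplied by a constant, so both correlate equally well with a reference quantification.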
In addition, our very strict data curation (“For instance, the 12C peaks should be minimal in the IS blank, so if a considerable 12C peak (12C/13C > 20%) was detected there, the pair was excluded.”) is interpreted as carryover by the reader. Under the reader’s incorrect assumption that the IS concentration in the IS blank is not the same as in the sample, the carryover is even inflated to 60%. This is not the case. The higher ratios seen for a few pairs are most likely caused by mismatched peaks. Blank water samples (the water was, of course, always purified by a PURELAB Plus system, ELGA LabWater, Celle, Germany) were inspected for carryover.
Dear editor, we are happy to include the new data generated according to the suggested protocol in an addendum that will show that a higher number of detected IROA pairs is achieved but it will not demonstrate a significant improvement in quantitative performance over our original protocol.
We hope that we have addressed all concerns raised by the reader.
Sincerely,
Katja Dettmer-Wilde & Fadi Fadil