Last week I had a friend tell me he was doing a semi-targeted analysis by retrieving only peaks that he knew he could name. He still couldn't quantitate them but at least he thought he knew what they were. He was justifiably proud to be doing a higher grade of metabolomics work than he had when he first started. I was too.
At present, metabolomics is generally divided into two camps: Non-targeted (sometimes called unbiased) metabolomics and Targeted metabolomics. Targeted work is generally based on quantitative analysis, with clinical-style analytical measurements of compounds. While a range of techniques are considered “Targeted Analyses”, in general the term refers to measurements that are quantitative, of known compounds, with the implicit need to compare or normalize different samples against one another. Non-targeted metabolomics experiments, on the other hand, represent the need to make metabolomics a true omics science in which one measures everything and tries to determine which, of all the peaks seen, have a possible statistical relationship with different experimental subgroups. Non-targeted metabolomics initially started out as biomarker discovery on steroids, pre-Bonferroni, and it was not unusual for an untargeted analysis to consider every one of the peaks that were seen in the mass spectrum. This sometimes meant including 30,000 to 40,000 peaks and was called the Complete Reporting Analytical Protocol.
The reality is that these two techniques represent ideals that most researchers do not conform to. Most people, like my friend, do not do pure targeted or non-targeted work but rather work somewhere between these two goals. They take shortcuts or make enhancements according to their abilities or needs, balancing the quantity of data they can get against the need to guard data quality. After all, generating the dataset is the easy part, maybe a day or two, while the analysis of the data may take months, so why waste your time with bad data?
In this post I am laying out a framework to think about the space between these two approaches, all of which logically carries the name Semi-targeted.
The framework rests on one assumption: there are three main ways (error axes) in which most researchers try to enhance their data:
1) The first is by correctly assigning the peaks to compounds,
2) The second is by normalizing the dataset against sample-to-sample variations, and
3) The third is by correcting for in-source errors and inefficiencies.
If each semi-targeted technique is scored on each of these aspects on a zero-to-ten scale (where zero is no effect and ten is a complete solution), then any Semi-targeted analysis protocol can be placed somewhere in this three-dimensional space. I will write the coordinates in the order (normalization, identification, source correction). My friend, by using only peaks that he has positively identified, would be somewhere along the identification axis. (He knows a lot of compounds, so let’s say he gets a 5 out of 10 for his ability to identify his peaks.) He has no ability to correct for source errors and no normalization scheme in his normal workflow, so his method rests at coordinates (0, 5, 0) in this space. If he introduced a sample-to-sample normalization routine based on some internal standards (giving him a reasonable way to normalize his samples and a score of 3 out of 10 in normalization), then his protocol would be a semi-targeted protocol at (3, 5, 0). If he also used more internal standards and a standard sample to correct for source errors, then his protocol might give him a semi-targeted profile of (3, 5, 2).
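For readers who like to see such things written down, here is a minimal Python sketch of the scoring idea. The class name `Profile`, the coordinate order (normalization, identification, source correction), the choice of (10, 10, 10) as the "fully corrected" corner, and the use of straight-line distance to that corner are all my own assumptions for illustration; nothing here is an established metric.

```python
from dataclasses import dataclass, astuple
import math


@dataclass(frozen=True)
class Profile:
    """A protocol scored 0 (no effect) to 10 (complete solution) on each axis.

    Field order is an assumption: (normalization, identification,
    source correction).
    """
    normalization: float
    identification: float
    source_correction: float

    def __post_init__(self):
        # Reject scores outside the zero-to-ten scale.
        for score in astuple(self):
            if not 0 <= score <= 10:
                raise ValueError("each score must lie between 0 and 10")

    def distance_to(self, other: "Profile") -> float:
        # Straight-line (Euclidean) distance between two protocols.
        # Hypothetical choice of metric, used only to compare positions.
        return math.dist(astuple(self), astuple(other))


# Assumed reference corner: every error axis completely solved.
FULLY_CORRECTED = Profile(10, 10, 10)

# The three stages of my friend's protocol described above.
identified_only = Profile(0, 5, 0)
with_normalization = Profile(3, 5, 0)
with_source_correction = Profile(3, 5, 2)

for label, profile in [
    ("identified peaks only", identified_only),
    ("plus normalization", with_normalization),
    ("plus source correction", with_source_correction),
]:
    gap = profile.distance_to(FULLY_CORRECTED)
    print(f"{label}: {astuple(profile)}, distance to corner = {gap:.2f}")
```

Each improvement moves the protocol closer to the fully corrected corner, which is the whole point of placing protocols in this space. Whether a single distance is a fair summary of three different kinds of error is an open question.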
This seems pretty straightforward. There are a number of well-known techniques for correcting each of these error axes. If each of them could be assigned an “effectiveness” score from 0 to 10, then we could likely place all of the techniques that researchers use in this space and develop a picture of where the world of semi-targeted analyses is today. It might surprise us. It might be fun, and it would surely lead to better metabolomics tomorrow.
I am very interested in these definitions. Please send me a quick note about whether your laboratory workflow is targeted, semi-targeted, or non-targeted metabolomics and what you think of this. Are there additional aspects in need of correction that you can think of?