Saturday, February 9, 2013

The seven deadly sins of DNA Barcoding (6)

However, there is no a priori reason to assume that a universal threshold is applicable, as coalescent depths among species will vary considerably due to differences in population size, rate of mutation and time since speciation (Collins and Cruickshank, 2012).

Inappropriate use of fixed distance thresholds


Sin number six has indeed been debated extensively, to the point where I wonder where all the agitation comes from. Fixed values have rarely been used in DNA Barcoding studies. The fascinating observation is that in a variety of cases a generic threshold actually works to some extent. It is correct to consider thresholds arbitrary, and false-positive and false-negative errors are indeed very likely. Even so, it is striking that fixed cut-off points exist for species groups in a number of cases.

Everyone will agree that thresholds should be optimized based on the data at hand. Nor should there be fixed values such as the 2% that has been around for some years; I have no idea where that number actually came from. The DNA Barcoding classic by Hebert et al. (2003) discusses 3% for Lepidoptera in general, and the famous Astraptes fulgerator paper deals with similar values (and some lower ones). The rule I have come across far more often is the "10x rule", which represents a basic form of a threshold optimized from the data: the cut-off point is reached when the interspecific distance is ten times higher than the intraspecific distance. Not a perfect solution, but again it works quite well as a first proxy.
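
To make the 10x rule concrete, here is a minimal Python sketch comparing it with a fixed 2% cut-off. It assumes you already have pairwise distances as proportions (0.02 = 2%). It also assumes that "intraspecific distance" means the mean of the intraspecific pairwise distances, which is how I read the rule. The distances and the helper names (tenfold_threshold, classify) are hypothetical and chosen only for illustration.

```python
from statistics import mean

# Hypothetical pairwise distances (proportions, e.g. 0.02 = 2%),
# invented purely for illustration; they are not from any real study.
intraspecific = [0.002, 0.004, 0.003]                 # pairs within one species
candidate_pairs = {"pair_a": 0.025, "pair_b": 0.041}  # pairs of uncertain status

def tenfold_threshold(intra):
    """Cut-off at ten times the intraspecific distance (the '10x rule').

    Assumption: the intraspecific distance is summarised by its mean.
    """
    if not intra:
        raise ValueError("need at least one intraspecific distance")
    return 10 * mean(intra)

def classify(distance, threshold):
    """Label a pairwise distance relative to a cut-off."""
    return "distinct" if distance >= threshold else "not distinct"

fixed = 0.02                                # the oft-quoted fixed 2% value
derived = tenfold_threshold(intraspecific)  # data-derived cut-off, about 3%

for name, d in candidate_pairs.items():
    print(f"{name} ({d:.1%}): fixed 2% -> {classify(d, fixed)}, "
          f"10x rule ({derived:.1%}) -> {classify(d, derived)}")
```

With these made-up numbers the 2.5% pair counts as distinct under the fixed 2% value but not under the data-derived 3% cut-off, while the 4.1% pair is distinct under both.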

The last year has seen the rise of more sophisticated software and protocols that calculate optimized thresholds or other parameters to determine group membership, and Collins and Cruickshank did a good job of listing them for anyone interested.
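
For readers who want to see what "optimizing" a threshold can mean in practice, here is a minimal Python sketch of one simple approach: a grid search for the cut-off that minimizes the combined number of errors. It is not the method of any of the tools Collins and Cruickshank list. The labelled distances, the candidate range (0.1% to 10% in 0.1% steps) and the function names are all hypothetical. Papers disagree on which error type is the "false positive", so the code counts splits (conspecific pairs at or above the cut-off) and lumps (heterospecific pairs below it) instead.

```python
# Hypothetical labelled pairwise distances, invented for illustration:
# (distance as a proportion, True if both specimens are the same species)
pairs = [
    (0.001, True), (0.004, True), (0.012, True), (0.018, True),
    (0.015, False), (0.032, False), (0.047, False), (0.060, False),
]

def error_counts(pairs, threshold):
    """Return (splits, lumps) for a given cut-off.

    splits: conspecific pairs at or above the cut-off (one species split)
    lumps:  heterospecific pairs below the cut-off (two species merged)
    """
    splits = sum(1 for d, same in pairs if same and d >= threshold)
    lumps = sum(1 for d, same in pairs if not same and d < threshold)
    return splits, lumps

def optimise_threshold(pairs, candidates):
    """Grid search: keep the first candidate with the fewest splits + lumps."""
    best = None
    for t in candidates:
        splits, lumps = error_counts(pairs, t)
        if best is None or splits + lumps < best[1] + best[2]:
            best = (t, splits, lumps)
    return best

# Candidate cut-offs from 0.1% to 10% in 0.1% steps.
candidates = [i / 1000 for i in range(1, 101)]
t, splits, lumps = optimise_threshold(pairs, candidates)
print(f"best cut-off {t:.1%}: {splits} split, {lumps} lumped")
# prints: best cut-off 1.3%: 1 split, 0 lumped
```

With these invented data no cut-off reaches zero errors, because one heterospecific pair (1.5%) is closer than one conspecific pair (1.8%). The search can only pick the least bad value for the sample at hand.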

After all, threshold values are as much in flux as species descriptions: they hold until the next data point is added. In theory they should be a good heuristic once sampling is complete, but how do we determine that?

But that's a different story.
