I am evaluating two models on a test set. Each model returns a prediction for an instance only when it has enough evidence that the prediction will be highly accurate. If there is not enough evidence for a test instance, the model returns no prediction for that instance.

This means that the two models return output vectors of different sizes. For example:

  • Model 1 returns a prediction for 20% of the test set.
  • Model 2 returns a prediction for 60% of the test set.

How can I perform a t-test to compare the means of the two approaches?

  • One solution I have been considering is to run the t-test only on the instances for which both models returned a prediction (the overlap).

  • Another solution would be to return a random prediction when there is not enough evidence, but I find this a bit misleading given the nature of the task (predicting a geolocation).
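
To make the first option concrete, here is a minimal sketch of a paired t-test restricted to the overlap. It assumes that each model's output is stored as a dict mapping a test-instance id to a per-instance error (for example, the distance in km between the predicted and the true location). The function name `paired_t_on_overlap` and the sample values are hypothetical, chosen only for illustration. The sketch returns the t statistic and the degrees of freedom; the p-value would still have to be looked up in a t distribution.

```python
import math
import statistics


def paired_t_on_overlap(errors_1, errors_2):
    """Paired t statistic on the instances that both models predicted.

    errors_1, errors_2: dicts mapping instance id -> per-instance error
    (assumed layout; instances without a prediction are simply absent).
    Returns (t_statistic, degrees_of_freedom, overlap_size).
    """
    # Keep only the instances for which both models returned a prediction.
    shared = set(errors_1) & set(errors_2)
    if len(shared) < 2:
        raise ValueError("need at least two shared instances")

    # Per-instance difference in error, model 1 minus model 2.
    diffs = [errors_1[k] - errors_2[k] for k in shared]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")

    n = len(diffs)
    t_statistic = mean_diff / (sd_diff / math.sqrt(n))
    return t_statistic, n - 1, n


# Hypothetical errors in km; model 2 covers more instances than model 1.
model_1 = {"a": 12.0, "b": 30.0, "c": 8.0}
model_2 = {"a": 10.0, "b": 25.0, "c": 9.0, "d": 40.0, "e": 3.0}

t_statistic, dof, overlap = paired_t_on_overlap(model_1, model_2)
print(f"t = {t_statistic:.4f}, df = {dof}, overlap = {overlap}")
```

Note that this compares the models only on the instances where both were confident enough to answer, so the result says nothing about the instances that only one model covered.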
