What would be good metrics, then? Of course, metrics are just indicators that can be interpreted incorrectly. Still, we have to measure something tangible. What would you propose? I am aware of the limitations and would gladly use something better...
Some people mention the Matthews correlation coefficient, Youden's J statistic, Cohen's kappa, etc., but I haven't seen them in any Deep Learning paper so far, and I bet they have large blind spots as well.
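For anyone who hasn't used them: a minimal sketch, assuming binary 0/1 labels and a made-up, heavily imbalanced toy dataset, of how these three metrics are computed from a confusion matrix and how they differ from plain accuracy. The formulas are the standard definitions; returning 0.0 when a denominator is zero is a convention chosen here, not something every library does. (scikit-learn ships `matthews_corrcoef` and `cohen_kappa_score` in `sklearn.metrics` if you'd rather not roll your own.)

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    # Correlation between predicted and true labels, in [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: 0.0 when a row/column of the matrix is empty.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def youdens_j(tp, tn, fp, fn):
    # J = sensitivity + specificity - 1, in [-1, 1].
    if tp + fn == 0 or tn + fp == 0:
        return 0.0  # assumed convention: undefined without both classes
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


def cohens_kappa(tp, tn, fp, fn):
    # Observed agreement corrected for agreement expected by chance.
    n = tp + tn + fp + fn
    p_observed = (tp + tn) / n
    # Chance agreement from the marginal rates of "1" and "0".
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((tn + fn) / n) * ((tn + fp) / n)
    p_expected = p_yes + p_no
    return 0.0 if p_expected == 1 else (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    # Hypothetical toy data: 95 negatives, 5 positives,
    # and a "classifier" that always predicts 0.
    y_true = [0] * 95 + [1] * 5
    y_pred = [0] * 100
    counts = confusion_counts(y_true, y_pred)
    print("accuracy:", accuracy(*counts))
    print("MCC:     ", matthews_corrcoef(*counts))
    print("Youden J:", youdens_j(*counts))
    print("kappa:   ", cohens_kappa(*counts))
```

On this toy data, accuracy comes out at 0.95 for a classifier that never predicts the positive class, while MCC, Youden's J and kappa all come out at 0. That is the blind spot they are meant to cover; they have their own (e.g. J ignores class prevalence entirely).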
The problem with discussions like this is that they never provide systematic examples of how a portfolio of metrics or qualitative checking can be integrated into a modeling problem. There’s a lot of finger pointing at metrics and complacency about problems, but the solutions are super vague, like the sanctimonious passage in this article about hiring from under-indexed groups in tech companies and just listening to first-person accounts (which is probably a bad idea if you actually want to help).
Ultimately I agree with the underlying idea, but I think that to be helpful you have to present case studies that reproduce research, with metric optimization swapped out for a holistic variety of metrics plus qualitative checking.
I recommend the books Bayesian Data Analysis by Gelman et al. and Data Analysis Using Regression and Multilevel/Hierarchical Models by Gelman and Hill if you want to read good accounts of doing this in practice with real data sets.
There’s definitely room for a book like this that focuses on more domain specific models in NLP, computer vision and deep neural networks.
righto. if you're looking for input on what metric should be used, i'm afraid you'd have to pony up more information. target the least subjective and least manipulatable metrics possible. hopefully things you can also access and objectively measure on your own.
I think it's important to be careful with the vocabulary here (if only because of the site we're on).
A metric can become a target and still remain a good metric. For example, in many machine learning competitions, the submissions optimize a known, given metric, but the test data is not known. Because the test data is hidden, the metric can't be gamed directly, so it is still a good metric.