Question:

Machine learning algorithms, such as SVM, GBM, random forest etc., generally have some free parameters that, beyond some rule-of-thumb guidance, need to be tuned to each data set. This is generally done with some kind of re-sampling technique (bootstrap, cross-validation etc.) in order to fit the set of parameters that gives the best generalisation error.

My question is, can you go too far here? People talk about doing grid searches and so forth, but why not simply treat this as an optimisation problem and drill down to the best possible set of parameters? I asked about some of the mechanics of this in another question, but it hasn't received much attention. Maybe that question was badly asked, but perhaps the question itself represents a bad approach that people generally do not take?

What bothers me is the lack of regularisation. I might find by re-sampling that the best number of trees to grow in a GBM for this data set is 647 with an interaction depth of 4, but how sure can I be that this will hold for new data (assuming the new population is identical to the training set)? With no reasonable value to 'shrink' towards (or, if you will, no informative prior information), re-sampling seems like the best we can do. I just don't hear any talk about this, so it makes me wonder whether there is something I'm missing.

Obviously there is a large computational cost associated with doing many, many iterations to squeeze every last bit of predictive power out of a model, so clearly this is something you would do only if you've got the time/grunt to do the optimisation and every bit of performance improvement is valuable. This question may well be too open-ended to get a definitive answer, but hopefully not.

Answer:

There is a definitive answer to this question, which is "yes, it is certainly possible to overfit a cross-validation based model selection criterion and end up with a model that generalises poorly!". In my view, this appears not to be widely appreciated, but it is a substantial pitfall in the application of machine learning methods, and it is the main focus of my current research; I have written two papers on the subject so far:

Cawley and Talbot, Preventing over-fitting during model selection via Bayesian regularisation of the hyper-parameters, Journal of Machine Learning Research, volume 8, pages 841-861, April 2007,

which demonstrates that over-fitting in model selection is a substantial problem in machine learning (and that you can get severely biased performance estimates if you cut corners in model selection during performance evaluation), and

Cawley and Talbot, On over-fitting in model selection and subsequent selection bias in performance evaluation, Journal of Machine Learning Research, volume 11, pages 2079-2107, July 2010.
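The selection bias the answer warns about is easy to reproduce in simulation. Below is a minimal pure-Python sketch (not from the original post; the data, candidate "models", and all numbers are illustrative): on pure-noise data, picking the best of many candidates by its score on the selection criterion yields an optimistic estimate, while fresh data reveals chance-level accuracy.

```python
import random

random.seed(0)
n, n_candidates = 100, 500

# Pure noise: labels are coin flips and features carry no information,
# so every candidate model has a true accuracy of exactly 0.5.
y = [random.choice([0, 1]) for _ in range(n)]
X = [[random.random() for _ in range(n)] for _ in range(n_candidates)]

def accuracy(feature, labels):
    # Fixed decision stump: predict class 1 iff the feature value exceeds 0.5.
    return sum((x > 0.5) == (lab == 1)
               for x, lab in zip(feature, labels)) / len(labels)

# "Model selection": keep the candidate that scores best on the selection
# criterion computed from this one sample (standing in for a CV score --
# the stump has no trained parameters, so the optimism comes purely from
# taking the maximum of many noisy scores).
scores = [accuracy(X[k], y) for k in range(n_candidates)]
best = max(range(n_candidates), key=lambda k: scores[k])

# Evaluate the winning model on fresh data from the same (noise) process.
y_new = [random.choice([0, 1]) for _ in range(n)]
x_new = [random.random() for _ in range(n)]
print(f"selection score of winner: {scores[best]:.2f}")  # well above 0.5
print(f"fresh-data accuracy:       {accuracy(x_new, y_new):.2f}")  # near 0.5
```

The gap between the two printed numbers is exactly the over-fitting of the selection criterion: the more candidates you compare (the deeper you "drill down"), the larger the optimism of the winning score, even though no candidate is genuinely better than chance.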