Machine learning, bootstrapping, null models and why we are still not 100% sure which marks were made by crocodiles

Data science and open science are two of the more interesting recent developments influencing how research is conducted and disseminated. Data science generally applies sophisticated, newly accessible methods of quantitative analysis to large data sets, and the field is evolving rapidly. Open science represents a drive to make the scientific process, from experimental design and data collection to analysis and publication, more transparent and accessible (Wilkinson et al., 2016). Here, we argue that these two developments are interdependent by examining a recently published paper (Domínguez-Rodrigo and Baquedano, 2018) in the ongoing and often contentious debate over the interpretation of bone surface modifications. We show how the application of machine learning in that study artificially inflated the classification success rate and obscured a far simpler explanation for the differentiation of marks based on their measurements. We do this by replicating the study, following the published description of its methods. We also simulated random and patterned data to generate expectations for the machine learning model provided by the study’s authors, and analyzed the results. Beyond what our findings might mean for interpreting who or what made marks on bones, we use this example to highlight the growing importance of the open science emphasis on methodological transparency as more sophisticated data science protocols are brought into paleoanthropology.
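The idea of testing a classifier against a null model can be illustrated with a small simulation. The sketch below is not the code or the model from the study under discussion. It assumes, purely for illustration, that one way a success rate can be inflated is by bootstrapping (resampling with replacement) before the data are split into training and test sets, so that copies of the same observation land on both sides of the split. The data are random numbers with random labels "A" and "B", the classifier is a simple one-nearest-neighbor stand-in, and all function names and parameter values are hypothetical.

```python
import random


def one_nn_predict(train, features):
    """Return the label of the training row closest to `features`."""
    nearest = min(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], features)),
    )
    return nearest[1]


def accuracy(train, test):
    """Fraction of test rows whose label the classifier recovers."""
    hits = sum(one_nn_predict(train, x) == y for x, y in test)
    return hits / len(test)


def split(data, rng, test_frac=0.3):
    """Shuffle a copy of the data and cut it into training and test sets."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]


def null_data(n, n_features, rng):
    """Null model: random measurements with labels assigned at random."""
    return [
        ([rng.gauss(0, 1) for _ in range(n_features)], rng.choice(["A", "B"]))
        for _ in range(n)
    ]


def run_null_model(seed=1, n=60, n_features=4, n_boot=1000, n_splits=50):
    rng = random.Random(seed)
    data = null_data(n, n_features, rng)

    # Honest evaluation: split the original rows, averaged over many splits.
    honest = sum(accuracy(*split(data, rng)) for _ in range(n_splits)) / n_splits

    # Leaky evaluation (assumed mechanism): bootstrap first, then split, so
    # duplicates of one original row can sit in both training and test sets.
    boot = [rng.choice(data) for _ in range(n_boot)]
    leaky = accuracy(*split(boot, rng))
    return honest, leaky


if __name__ == "__main__":
    honest, leaky = run_null_model()
    print(f"split only:           {honest:.2f}")
    print(f"bootstrap then split: {leaky:.2f}")
```

Because the labels carry no information, an honest evaluation should hover near chance. A much higher score under the second procedure reflects duplicated rows rather than any real pattern, which is the kind of expectation a null model makes visible.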