Why You Need Standardized Cleaning Rules

P. Allison Minugh, Ph.D., Susan L. Janke, M.S., and Renee Saris-Baglama, Ph.D.

It may be surprising to find how little guidance exists on handling problematic data effectively. Although professional guidelines cover general principles of data collection and data analysis, they leave considerable “researcher degrees of freedom” in how data are cleaned and edited (Simmons, Nelson, & Simonsohn, 2011). Without scientific guidelines that clearly indicate what is and is not appropriate when editing messy data, many researchers and scientists proceed idiosyncratically. In practice, when presented with the same messy data, different scientists will “fix” the problem in different ways, yielding widely differing results (Leahey, Entwisle, & Einaudi, 2003).

For example, Lam and colleagues (2013) demonstrated that the choice of data editing approach for handling inconsistencies in Global Youth Tobacco Survey data changed the resulting estimates of smoking prevalence. As they aptly note, “accurate comparisons between two studies can be made only if the same approach in handling inconsistent data is used.” Documentation of data editing is therefore required to reproduce and compare study results. This matters particularly for meta-analyses, which combine estimates from studies that may have edited their data differently.
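To make the point concrete, here is a minimal sketch in Python of how three plausible editing rules can produce three different prevalence estimates from the same records. Everything in it is an assumption made for illustration: the toy records, the two survey items (`ever_smoked` and `days_smoked_past_30`), and the three rules are hypothetical and are not the data, items, or methods examined by Lam and colleagues.

```python
# Hypothetical toy records, invented for illustration only.
# Each record is (ever_smoked, days_smoked_past_30).
records = [
    ("no", 0), ("no", 0), ("no", 0), ("no", 0), ("no", 0), ("no", 0),
    ("yes", 5),
    ("yes", 0),
    ("no", 3),   # inconsistent: "never smoked" but reports smoking days
    ("no", 10),  # inconsistent
]


def is_inconsistent(ever_smoked, days):
    """A respondent who denies ever smoking but reports recent smoking days."""
    return ever_smoked == "no" and days > 0


def prevalence_excluding(records):
    """Rule A (assumed): drop inconsistent records before estimating."""
    kept = [(e, d) for e, d in records if not is_inconsistent(e, d)]
    smokers = sum(1 for _, d in kept if d > 0)
    return smokers / len(kept)


def prevalence_trust_days(records):
    """Rule B (assumed): trust the days item; inconsistent records count as smokers."""
    smokers = sum(1 for _, d in records if d > 0)
    return smokers / len(records)


def prevalence_trust_ever(records):
    """Rule C (assumed): trust the ever-smoked item; inconsistent records count as non-smokers."""
    smokers = sum(1 for e, d in records if e == "yes" and d > 0)
    return smokers / len(records)


for label, rule in [
    ("Exclude inconsistent records", prevalence_excluding),
    ("Recode as current smoker", prevalence_trust_days),
    ("Recode as non-smoker", prevalence_trust_ever),
]:
    print(f"{label}: {rule(records):.1%}")
```

With these ten invented records, the three rules give current smoking prevalence of 12.5%, 30.0%, and 10.0%, respectively. None of the rules is presented here as the correct one; the sketch only shows why the rule that was applied has to be documented before two estimates can be compared.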

Because different data cleaning approaches produce different results, Datacorp has worked to standardize data cleaning processes. These procedures can be applied across a broad spectrum of data used for a variety of purposes.

Contact us at dataqualityassessments@mjdatacorp.com to learn how our technical experts can help you prevent, detect, and resolve data errors to improve your data.

References

Lam, E., Rolle, I., Shin, M., & Lee, K. A. (2013). Impact of data editing methods on estimates of smoking prevalence, Global Youth Tobacco Survey, 2007-2009. Preventing Chronic Disease, 10, E38.

Leahey, E., Entwisle, B., & Einaudi, P. (2003). Diversity in everyday research practice: The case of data editing. Sociological Methods and Research, 32, 63-89.

Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22, 1359-1366.