Beyond preparing the project report for the European Union, I have spent the last month thinking about the steps needed to clean the real-world study data. Data cleaning is the first step in a chain of tasks that must be completed to preprocess the data for further analysis.
Cleaning the data will involve both manual and algorithmic work, which I will carry out in parallel over the coming months. A good example of this is partitioning the data so that we know which portion was collected at which level of familiarity. While algorithmic approaches such as map matching can help with this, the quality of the location data is weak at times, so manual annotation is also needed.
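One way this split between algorithmic and manual work could look in practice is a simple triage step: fixes whose reported location accuracy is good enough go to the automatic map-matching pipeline, while the rest are queued for manual annotation. The field names, the accuracy threshold, and the `Fix` record below are hypothetical illustrations, not the study's actual data model.

```python
from dataclasses import dataclass

@dataclass
class Fix:
    # Hypothetical location record; real study data will differ.
    lat: float
    lon: float
    accuracy_m: float  # reported horizontal accuracy in metres

def split_by_quality(fixes, max_accuracy_m=25.0):
    """Triage fixes: precise ones can be map-matched automatically,
    imprecise ones are queued for manual annotation."""
    auto, manual = [], []
    for f in fixes:
        (auto if f.accuracy_m <= max_accuracy_m else manual).append(f)
    return auto, manual

fixes = [
    Fix(48.10, 11.50, 8.0),    # precise enough for map matching
    Fix(48.20, 11.60, 120.0),  # too noisy, needs manual review
    Fix(48.30, 11.70, 15.0),
]
auto, manual = split_by_quality(fixes)
```

With the example data above, two fixes land in the automatic bucket and one is routed to manual annotation; in practice the threshold would be tuned against how well the map matcher copes with noisy segments.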