Clean your data
Earlier this month we sorted out your workspace. Today’s task is to clean your data. This may be a job you start today or plan to begin soon – and continue over a few days or weeks. Data cleaning always takes longer than you expect, especially if you’ve never cleaned your data before and have a backlog to work through.
In my view, keeping your data clean is an essential research skill that we should practise routinely. And yet we’re rarely taught how to do it, or given any time or encouragement to build it into our work schedule.
What does data cleaning mean? It means checking through your datasets, records, photos, films, or transcripts to ensure everything is tidy and correct.
That may include:
- running frequencies on quantitative data to check there are no values that shouldn’t be there (for example, a 7 on a five-point scale).
- checking through quantitative databases for missing data and ensuring this is coded correctly.
- reading through interview transcripts and noting any spelling errors (especially if you’ve used automated transcription tools).
- listening to interviews or focus groups and comparing the conversations to what’s on your transcript (ensuring pauses, laughter, or inaudible segments are noted).
- ensuring all participant details (addresses, names, dates, etc.) are correct, while being aware of confidentiality and anonymity issues.
- filing any paperwork in the right order (e.g. by date or participant number).
- ensuring all data you’ve collected and wish to store complies with the relevant data protection regulations for your country/state.
- if you don’t use a data management tool you might want to investigate some of these (you can find a list of recommendations under ‘Chapter Two’ of this link, scroll down to find them).
- if you are entering references manually you may want to learn how to use reference management software to keep things under control and less likely to be lost.
- ensuring all documentation relating to your project is stored in the right place (on or offline).
- if you use photographs or images, check you are not storing duplicates and archive work you no longer need.
- if you work with film, ensuring edits and storage are organised rather than allowing a backlog of disorganised clips to build up.
- sorting through your books and papers to make sure you know where everything is, and assessing which things are in current use and which are your archive sources.
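For the first two items on the list, a check on quantitative data might look something like the sketch below. It is written in Python and is only an illustration: the coding scheme (answers coded 1 to 5, with 99 meaning missing), the function name `frequency_check` and the responses are all made up, so swap in whatever your own codebook says.

```python
from collections import Counter

# Hypothetical coding scheme: answers are coded 1-5 and 99 means "missing".
VALID_CODES = {1, 2, 3, 4, 5}
MISSING_CODE = 99


def frequency_check(values, valid_codes=VALID_CODES, missing_code=MISSING_CODE):
    """Count each value and flag anything that is neither valid nor coded as missing."""
    counts = Counter(values)
    # Blank cells (None or empty string) suggest missing data that was never coded.
    uncoded_missing = sum(n for v, n in counts.items() if v is None or v == "")
    # Anything else outside the codebook is a number that shouldn't be there.
    unexpected = {
        v: n
        for v, n in counts.items()
        if v not in valid_codes and v != missing_code and v is not None and v != ""
    }
    return {
        "counts": dict(counts),
        "coded_missing": counts.get(missing_code, 0),
        "uncoded_missing": uncoded_missing,
        "unexpected": unexpected,
    }


# Made-up responses for illustration only.
responses = [1, 3, 5, 99, 4, 7, None, 3, 2, 55]
report = frequency_check(responses)
print(report["unexpected"])       # values to go back and check against the source
print(report["uncoded_missing"])  # blanks that still need a missing-data code
```

The function only reports problems and never changes the data, so you can decide what to correct by going back to the original questionnaire or record.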
If you have a lot of data or are working as a team, this may be something you work on together. That matters particularly if colleagues are helping to translate your work or to provide audio or text descriptions for you. And if you’ve generated a lot of digital data this past year through research and teaching, it would be good to ensure it is organised so you can find it easily should you need it.
As you clean your data, remember to make backups of everything! It can also help to keep notes to remind yourself what you did and when. This is crucial if more than one person is involved in the cleaning process – you don’t want to throw away something vital!
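If your data lives in files on a computer, backing up and note-keeping can be done in one step. The Python sketch below is one possible way of doing it, not a recommended tool: the function name `backup_and_log`, the timestamped file naming and the tab-separated log layout are all assumptions for illustration.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_and_log(data_file, backup_dir, log_file, note, who="unknown"):
    """Copy data_file into backup_dir with a timestamp, then record what was done."""
    data_file = Path(data_file)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # The timestamp goes in the copy's name so earlier backups are not overwritten.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup_path = backup_dir / f"{data_file.stem}_{stamp}{data_file.suffix}"
    shutil.copy2(data_file, backup_path)  # copy2 also keeps the file's timestamps

    # Append one line per cleaning step: when, who, which file, what was done.
    with open(log_file, "a", encoding="utf-8") as log:
        log.write(f"{stamp}\t{who}\t{data_file.name}\t{note}\n")
    return backup_path
```

You would call it just before each change, for example `backup_and_log("survey.csv", "backups", "cleaning_log.tsv", "recoded blanks as 99", who="PB")`, where the file names and initials are placeholders. A paper notebook or shared document does the same job if you prefer.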
Keep a note if you run into any problems or questions. For example, if you discover lost data or data you’re no longer sure about, you can check with colleagues or supervisors about what’s going on, or make corrections. As well as ensuring your data is tidy, cleaning helps you gain a better sense of what your work is about and where it is going.
If you found this difficult
As with the previous task of cleaning your workspace, tackling your data can be overwhelming if you have never done it before. You may realise you have a huge backlog of work to sort through. If this is the case then you may want to create a timetable where you break down work to be cleaned and set aside time to do that.
Some people also struggle if they fear cleaning the data equals losing data. Remember you’re not deleting for the sake of it. Instead you are ensuring the data or materials you are working with and analysing are present and correct. It is absolutely fine to keep your uncleaned data somewhere if you are super anxious, as long as the data you work on is the spruced-up version. (Note also that data cleaning doesn’t mean deleting or removing findings that don’t suit your research angle. That’s not cleaning, that’s fraud!)
It’s also common to hold onto datasets or other materials that are out of date or part of projects you worked on years ago – either for sentimental reasons or because you’re uncertain about what you can and can’t keep. Taking advice from your employers, IT or HR, or professional organisations (for independent researchers) can help clarify what you can archive and what you can delete/shred.
Cleaning your data is like cleaning your home. It doesn’t mean throwing everything away. It means discarding rubbish, reorganising what you have, knowing where to find everything, and making sure your living space is comfortable.
Data cleaning is part of analysis, and thinking of it as such makes it much easier to fit into your work routine. As with backups, cleaning can be done regularly so you know you’re keeping the right things safe.
Remember once you’ve cleaned your data to give yourself a reward for all your hard work!
And if you want a stepwise plan on how to keep your data tidy you can check Chapter Seven of The Research Companion.