Clean your data

Earlier this month we sorted out your workspace. Today’s task is to clean your data. This may be a job you start today and continue over a few days or weeks, especially if you’ve never cleaned your data before and have a backlog to work through.

In my view, keeping your data clean is an essential research skill that we should practise routinely. And yet we’re rarely taught how to do it, or given any time or encouragement to build it into our work schedule.

What does data cleaning mean? It means checking through your datasets, records, photos, films, or transcripts to ensure everything is tidy and correct.

That may include:

  • running frequencies on quantitative data to check there are no numbers there that shouldn’t be
  • checking through quantitative databases for missing data and ensuring this is coded correctly
  • reading through interview transcripts and noting any spelling errors
  • listening to interviews or focus groups and comparing the conversations to what’s on your transcript (ensuring pauses, laughter, or inaudible segments are noted)
  • ensuring all participant details (addresses, names, dates etc) are correct, while being aware of confidentiality and anonymity issues
  • filing any paperwork in the right order (e.g. by date or participant number)
  • if your work comes under GDPR, ensuring all the data you have collected complies
  • if you don’t use a data management tool, investigating some options (you can find a list of recommendations under ‘Chapter Two’ of this link; scroll down to find them)
  • if you are entering references manually, learning how to use reference management software so your references stay under control and are less likely to be lost
  • ensuring all documentation relating to your project is stored in the right place (online or offline)
  • if you use photographs or images, check you are not storing duplicates and archive work you no longer need
  • if you work with film, ensuring edits and storage are organised rather than allowing a backlog of disorganised clips to build up
  • sorting through your books and papers to make sure you know where everything is, and assessing which items are in current use and which are archive sources.
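
If your quantitative data can be loaded into Python, the first two checks in the list above (running frequencies and checking missing data) might look something like the sketch below. It is only an illustration, not a recommended tool: the responses, the 1 to 5 answer scale, and the use of 99 as a missing-data code are all made-up assumptions, so swap in the values from your own codebook.

```python
from collections import Counter

# Hypothetical survey answers on a 1-5 scale. 99 is an assumed code for
# "missing", 7 is a typing error, and None is a blank that was never coded.
responses = [1, 3, 5, 2, 7, 99, 4, None, 3, 3]

VALID_CODES = {1, 2, 3, 4, 5}  # assumed valid answers
MISSING_CODE = 99              # assumed code for missing data


def frequency_table(values):
    """Count how often each value appears, including blanks (None)."""
    return Counter(values)


def unexpected_values(values, valid=VALID_CODES, missing=MISSING_CODE):
    """Return values that are neither valid answers nor the missing code."""
    return sorted(
        {v for v in values if v is not None and v not in valid and v != missing}
    )


def uncoded_blanks(values):
    """Return the positions of blanks that were never given a missing code."""
    return [i for i, v in enumerate(values) if v is None]


if __name__ == "__main__":
    for value, count in frequency_table(responses).most_common():
        print(value, count)
    print("Numbers that shouldn't be there:", unexpected_values(responses))
    print("Blank entries at positions:", uncoded_blanks(responses))
```

Anything the sketch flags is a prompt to go back to the original questionnaire or record and check it, not something to correct automatically.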

If you have a lot of data or are working as a team, this may be something you work on together. This is particularly important if colleagues are helping to translate your work or provide audio or text descriptions for you.

As you clean your data, remember to make backups of everything! It also helps to keep notes to remind yourself what you did and when. This is crucial if more than one person is involved in the cleaning process: you don’t want to throw away something vital!
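
If you are comfortable with a little scripting, one way to combine the backup and the note-taking is to copy a file before you touch it and add a line to a running log at the same time. The sketch below is just one possible approach; the function name `backup_and_log`, the folder and file names, and the tab-separated log layout are all hypothetical choices for illustration.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_and_log(data_file, backup_dir, log_file, note, who="unknown"):
    """Copy data_file into backup_dir with a timestamp and record a note."""
    data_file = Path(data_file)
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)

    # Timestamp in the file name keeps earlier backups from being replaced
    # (two backups made within the same second would share a name).
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup_path = backup_dir / f"{data_file.stem}_{stamp}{data_file.suffix}"
    shutil.copy2(data_file, backup_path)  # the original file is left untouched

    # One line per cleaning step: when, who, which file, what was done.
    with open(log_file, "a", encoding="utf-8") as log:
        log.write(f"{stamp}\t{who}\t{data_file.name}\t{note}\n")
    return backup_path
```

A call such as `backup_and_log("survey.csv", "backups", "cleaning_log.txt", "Recoded blank ages as 99", who="PB")` (all example values) would be run before each cleaning session, so anyone on the team can see what changed and go back to an earlier copy if needed.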

As you clean your work, keep a note of any problems or questions you run into. For example, if you discover lost data or data you’re no longer sure about, you can check with colleagues or supervisors about what’s going on, or make corrections. As well as keeping your data tidy, cleaning helps you gain a better sense of what your work is about and where it is going.

If you found this difficult
As with the previous task of cleaning your workspace, tackling your data can be overwhelming if you have never done it before. You may realise you have a huge backlog of work to sort through. If this is the case then you may want to create a timetable where you break down work to be cleaned and set aside time to do that.

Some people also struggle because they fear that cleaning data means losing data. Remember you’re not deleting for the sake of it. Instead, you are ensuring the data or materials you are working with and analysing are present and correct. It is absolutely fine to keep your uncleaned data somewhere if you are super anxious, as long as the data you work on is the spruced-up version. (Note also that data cleaning doesn’t mean deleting or removing findings that don’t suit your research angle. That’s not cleaning, that’s fraud!)

It’s also common to hold onto datasets or other materials that are out of date or part of projects you worked on years ago, either for sentimental reasons or because you are unsure what you can and can’t keep. Taking advice from your employer, IT or HR department, or professional organisation (for independent researchers) can help clarify what you can archive and what you can delete or shred.

Cleaning your data is like cleaning your home. It doesn’t mean throwing everything away. It means discarding rubbish, reorganising what you have, knowing where to find everything, and making sure your living space is comfortable.

Data cleaning is part of analysis, and thinking of it that way makes it much easier to fit into your work routine. As with backups, cleaning can be done regularly so you know you’re keeping the right things safe.

Remember once you’ve cleaned your data to give yourself a reward for all your hard work!

Got data cleaning tips and hacks? Leave them in the comments or share them using the #ResearcherRenew hashtag on Twitter or Instagram.

And if you want a stepwise plan on how to keep your data tidy you can check Chapter Seven of The Research Companion.