2 Comments

Clear and practical advice - as always!

Could you please explain the reasons for not using 9, 99 etc. to represent missing values?

I remember reading somewhere that it is best not to leave any blanks. There’s no way of knowing whether a cell was left blank intentionally (missing data) or accidentally (data entry error). Maybe your recommendation to have a column to record the reason for the missing value will solve this problem.

Expand full comment
author

Thanks for asking. So the main reason I avoid it is because from time to time people include 999 (e.g.) in a column where a valid measurement = 999 could actually exist. I do take the point about not leaving a blank because it could be ambiguous, but that ambiguity, given there are multiple ways for data to be missing, would still exist unless you use multiple indicators. As you say, I think this is "solved" by just using a second column. Finally, having a mix of characters (to reflect a missing value) and numbers in the same column can result in a column classed as character data when it's imported into the analysis software - this is admittedly a very minor inconvenience though.

Expand full comment