Hacker News

hu3, yesterday at 8:42 PM

Interesting. I'm not experienced in data cleaning. About Python vs. Excel: isn't manual cleaning of data in Excel prone to permanent errors? Because:

- it's hard to version control/diff

- it's done by a human fat-fingering spreadsheet cells

- it's not reproducible. If you need to redo the cleaning of all the dates, in a Python script you could just fix the parsing part and rerun the script against the source data again. And you can easily track changes with git
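
To make the reproducibility point concrete, here is a rough sketch of what I have in mind. The format list, the function names, and the sample values are all made up for illustration; a real dataset would need its own list.

```python
from datetime import datetime

# Hypothetical formats seen in the raw export. If a new one turns up,
# add it here and rerun the script; the source file is never edited by hand.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def clean_date(raw):
    """Return the date as an ISO 8601 string, or None if no format matches."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next known format
    return None  # leave unparseable values visible instead of guessing


def clean_dates(rows):
    """Apply clean_date to every raw value, preserving order."""
    return [clean_date(r) for r in rows]


# Made-up sample input standing in for a column read from the source file.
raw_dates = ["2024-01-05", " 05/01/2024 ", "5.1.2024", "not a date"]
cleaned = clean_dates(raw_dates)
print(cleaned)
```

Since the fix lives in `DATE_FORMATS` and not in the cells, a `git diff` shows exactly what changed between two cleaning runs.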

In practice I think the speed tradeoff could be worth the occasional mistake. But it would depend on the field, I guess.