Power Query + Power Pivot + M. I don't use formulas in cells. The sheets are just a canvas for Pivot Tables, final tables, and charts connected to the data coming out of Power Query and Power Pivot.
I deal with hundreds of API integrations involving various JSON, CSV, TSV, and XML files of mixed validity. My workflow: a visual check in Notepad++ -> prototype everything in Excel. I give users a "visual", connect it to real data, and only then migrate the final logic to BI dashboards or databases.
Nothing else delivers results this fast. SQL, BI tools, and Python are slower at this stage because they generally expect "clean" data up front, and cleaning and validation take too much time there. In Excel, it's mostly just a few clicks.
PS: I spent 2 years (2022-2023) using LibreOffice Calc. I didn't touch Excel once, thinking I needed to break the habit. In the end, I did break the habit, but it was replaced by a pile of scripts and utilities because Calc couldn't do what I needed (or do it fast enough). The experience reminds me of testing Krita for 2 years (2018-2020) — I eventually returned to Adobe Photoshop (but that's another story).
PS2: About (Query + Pivot + BI): this combination lets you process millions of rows (bypassing the worksheet's ~1M-row grid limit) and compress everything into an OLAP cube that takes up little space and queries quickly.
Interesting. I'm not experienced in data cleaning. About Python vs Excel: isn't manual data cleaning in Excel prone to permanent errors? Because:
- it's hard to version control/diff
- it's done by a human fat-fingering spreadsheet cells
- it's not reproducible. If you need to redo the cleaning of all the dates, in a Python script you could just fix the date-parsing part and rerun it against the source, and you can easily track the changes with git (sketch below)
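
For example, a minimal sketch of what I mean, using pandas. The file and column names (`raw_orders.csv`, `order_date`, `amount`) are made up for illustration:

```python
# clean.py -- minimal sketch of a reproducible cleaning step.
# File and column names here are hypothetical.
import pandas as pd

RAW = "raw_orders.csv"    # untouched source export, never edited by hand
OUT = "clean_orders.csv"  # regenerated from scratch on every run

def clean(df: pd.DataFrame) -> pd.DataFrame:
    # The date-parsing rule lives in exactly one place. If it turns out
    # the source is day-first, you change this line, rerun the script,
    # and every row gets re-cleaned the same way.
    df["order_date"] = pd.to_datetime(df["order_date"], dayfirst=True, errors="coerce")
    # Coerce amounts to numbers; bad values become NaN instead of
    # silently staying as text.
    df["amount"] = pd.to_numeric(df["amount"], errors="coerce")
    # Drop rows that failed to parse, so problems surface instead of hiding.
    return df.dropna(subset=["order_date", "amount"])

if __name__ == "__main__":
    clean(pd.read_csv(RAW)).to_csv(OUT, index=False)
```

The point being: the fix is one line in one place, the output is rebuilt from the raw source every run, and `git diff` shows exactly what changed.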
In practice, I think the speed tradeoff could be worth the occasional mistake. But it would depend on the field, I guess.