There are modern VLIW architectures. I think Groq uses one. The lessons on what works and what doesn't are worth learning from history.
A more everyday example is the Hexagon DSP ISA in Qualcomm chips. Four-wide VLIW + SMT.
The new TI C2000 F29 series of microcontrollers is VLIW.
I meant that narrowly, about IA64 only. There is certainly some lessons-learned value.
IA64 was EPIC, which was itself a "lessons learned" VLIW design. It had stop bits to explicitly mark dependency boundaries, so that instructions from multiple words could be issued together on future hardware with more parallelism. It also had speculative execution and speculative loads, which, well, see the article on how the speculative loads were a mixed blessing.
https://en.wikipedia.org/wiki/Explicitly_parallel_instructio...
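To make the stop-bit idea concrete, here is a toy Python sketch. It is not the real IA-64 encoding: actual hardware packs three instructions into a 128-bit bundle and encodes stops in the bundle's template field, and it also has per-unit issue limits and latencies that this ignores. The names `Instr`, `split_groups`, and `cycles` are made up for illustration. The only point is that the same annotated instruction stream needs fewer cycles on a wider machine without being recompiled.

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Instr:
    text: str
    stop: bool = False  # True: a dependency boundary follows this instruction


def split_groups(program):
    """Split an instruction stream into groups at each stop.

    Instructions within one group are declared independent by the compiler,
    so the hardware may issue them in any order or all at once.
    """
    groups, current = [], []
    for ins in program:
        current.append(ins)
        if ins.stop:
            groups.append(current)
            current = []
    if current:  # trailing instructions without a final stop
        groups.append(current)
    return groups


def cycles(program, width):
    """Cycles needed by an idealized machine issuing `width` instructions per cycle.

    Assumption: every instruction takes one cycle, and a group must fully
    issue before the next group starts.
    """
    return sum(ceil(len(group) / width) for group in split_groups(program))


program = [
    Instr("r1 = load a"),
    Instr("r2 = load b"),
    Instr("r3 = load c"),
    Instr("r4 = load d", stop=True),   # the adds below depend on these loads
    Instr("r5 = r1 + r2"),
    Instr("r6 = r3 + r4", stop=True),  # the multiply depends on both adds
    Instr("r7 = r5 * r6", stop=True),
]

for width in (1, 2, 4):
    print(width, cycles(program, width))
```

Under these assumptions the seven instructions take 7 cycles at width 1, 4 cycles at width 2, and 3 cycles at width 4. A classic VLIW binary, by contrast, bakes the issue width into each instruction word.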
VLIW works for workloads where the compiler can somewhat accurately predict what will be resident in cache. It’s used everywhere in DSP, was common in GPUs for a while, and is present in lots of niche accelerators. It’s a dead end for situations where cache residency is not predictable, like any kind of multitenant general-purpose workload.
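A rough way to see why cache predictability matters: in a statically scheduled, in-order machine the compiler places the use of a loaded value a fixed number of cycles after the load, based on an assumed latency. The Python sketch below is a simplified model, not any real pipeline. The function `run`, the op tuples, and the latencies of 2 and 20 cycles are all illustrative assumptions. It stalls the whole bundle whenever an operand is not ready yet.

```python
def run(schedule, load_latency):
    """Execute a static schedule on a toy in-order machine.

    schedule: list of bundles, one per planned cycle. Each bundle is a list
              of ops: ("load", name), ("use", name), or ("work",).
    load_latency: dict mapping a loaded name to its actual latency in cycles.
    Returns the total number of cycles, including stalls.
    """
    ready = {}  # name -> first cycle at which the loaded value is available
    cycle = 0
    for bundle in schedule:
        # The whole bundle waits until every value it reads has arrived.
        needed = [ready[op[1]] for op in bundle if op[0] == "use"]
        cycle = max([cycle] + needed)
        for op in bundle:
            if op[0] == "load":
                ready[op[1]] = cycle + load_latency[op[1]]
        cycle += 1
    return cycle


# The compiler assumed a 2-cycle cache hit and filled the gap with other work.
schedule = [
    [("load", "x")],
    [("work",)],
    [("use", "x")],
]

print(run(schedule, {"x": 2}))   # load hits in cache as predicted
print(run(schedule, {"x": 20}))  # load misses; the schedule cannot adapt
```

With the predicted 2-cycle latency the schedule finishes in 3 cycles with no stall. With a 20-cycle miss it takes 21, because nothing in the static schedule can be reordered around the late load. An out-of-order core would try to find other ready work at run time instead.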