Hacker News

vbezhenar, yesterday at 4:45 PM

An interesting point is that most web pages are generated with text templates. There's some text processing, like escaping special characters, but the output is mostly text that happens to be (somewhat) valid HTML.

So extracting information from this text with regexps often makes perfect sense.
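A minimal sketch of the idea, assuming a hypothetical page whose list items all come from one template line like `<li class="item"><a href="{url}">{title}</a></li>` with HTML escaping applied to the values. The regex mirrors that template rather than trying to parse HTML in general, and `html.unescape` undoes the template's escaping. `PAGE`, `ITEM_RE` and `extract_items` are illustrative names, not from any real site.

```python
import html
import re

# Hypothetical template output: every row was rendered by the same text template,
# with special characters escaped (& -> &amp;, < -> &lt;, > -> &gt;).
PAGE = """
<ul>
<li class="item"><a href="/a?id=1&amp;x=2">Fish &amp; Chips</a></li>
<li class="item"><a href="/b">Tea &lt;hot&gt;</a></li>
</ul>
"""

# The pattern copies the (assumed) template line literally; escaped text
# cannot contain a raw '<', so [^<]* safely captures the title.
ITEM_RE = re.compile(r'<li class="item"><a href="([^"]*)">([^<]*)</a></li>')


def extract_items(page: str) -> list[tuple[str, str]]:
    """Return (url, title) pairs, reversing the template's HTML escaping."""
    return [
        (html.unescape(url), html.unescape(title))
        for url, title in ITEM_RE.findall(page)
    ]


if __name__ == "__main__":
    for url, title in extract_items(PAGE):
        print(url, "|", title)
```

The trade-off is that the pattern is only as stable as the template: if the site changes its markup, the regex silently stops matching, whereas a real HTML parser would tolerate more variation.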