Hacker News

lwhi, last Tuesday at 1:43 PM

Which is a process of reverse engineering and guesswork.


Replies

wizzwizz4, last Tuesday at 2:23 PM

Unless the formats are clearly documented and not overcomplicated. The WordPerfect format is philosophically similar to RTF, except that it's easier to get a plain-text version. Quoth http://justsolve.archiveteam.org/wiki/WordPerfect:

> If you're a programmer attempting to get a program to extract the plain text out of a WordPerfect document, and are not interested in the fancy formatting and other features, this is a fairly simple process; just make the program skip the parts that are not text.

The "fancy formatting" is pretty easy to parse too, as I understand it (though I've never tried it): it maps pretty much one-to-one onto what's shown in the program's UI, which is literally designed to be easy to understand.
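To make the "skip the parts that are not text" idea concrete, here is a rough sketch in Python. It does not implement the real WordPerfect layout, which differs between versions and should be taken from the format documentation. It assumes a hypothetical toy byte stream in which printable ASCII is text, 0x0A is a line break, any byte from 0xD0 upward starts a formatting code followed by a 2-byte little-endian payload length and then the payload, and every other byte is a one-byte code. The function name and all of those byte ranges are my own assumptions for illustration.

```python
def extract_text(data: bytes) -> str:
    """Pull plain text out of a HYPOTHETICAL toy format (not real WordPerfect).

    Assumed layout, for illustration only:
      0x20-0x7E  printable ASCII, kept as text
      0x0A       line break, kept as a newline
      0xD0-0xFF  formatting code: code byte, 2-byte little-endian
                 payload length, then the payload (all skipped)
      other      one-byte code (skipped)
    """
    out = []
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        if 0x20 <= b <= 0x7E:
            out.append(chr(b))       # ordinary text byte
            i += 1
        elif b == 0x0A:
            out.append("\n")         # line break
            i += 1
        elif b >= 0xD0:
            if i + 3 > n:
                break                # truncated code header: stop cleanly
            length = int.from_bytes(data[i + 1:i + 3], "little")
            i += 3 + length          # jump over the header and its payload
        else:
            i += 1                   # one-byte code: skip it
    return "".join(out)


sample = (
    b"Hello"
    + bytes([0xD0, 0x03, 0x00]) + b"abc"   # formatting code with 3-byte payload
    + b", world"
    + bytes([0x80])                        # one-byte code
    + b"\n"
)
print(repr(extract_text(sample)))
```

The point is only that when every non-text run announces its own length, a reader can step over it without understanding what it means. That is the property a well-documented format gives you for free.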

Formats like DOC (Microsoft Office's pre-DOCX format) and PSD (Photoshop's horrid mess) require reverse engineering, even given the (atrocious) documentation, because they're overcomplicated and the documentation is incomplete. That kind of format is what I'm saying should be prohibited. We don't need to mandate that people use existing protocols or file formats.
