Hacker News

billconan, yesterday at 4:42 PM

I don't think HTML is the right approach. HTML is better than PDF, but it is still a format for displaying/rendering.

The actual paper content format should be separated from its rendering.

I.e., it should contain abstract, sections, equations, figures, citations etc., but it shouldn't have font sizes, layout etc.

The viewer platforms should then be able to style the content differently.


Replies

cluckindan, yesterday at 5:37 PM

HTML alone is in fact not a format for displaying/rendering. Done properly, it is a structural representation of the content. (This is often called “semantic HTML”.)

They are converting to HTML to make the content more accessible. Accessibility in this context means a11y: in effect, “more accessible” equates to “more compatible with screen readers”.

While PDF documents can be made accessible, it is way easier to do it in HTML, where browsers build an actual AOM (accessibility object model) tree and expose it to screen readers.

>it should contain abstract, sections, equations, figures, citations etc.

So <article>, <section>, <math>, <figure>, <cite>, etc.
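To illustrate the point, here is a minimal sketch, assuming a made-up paper fragment (the `PAPER` string, its ids, and the `StructureCounter` helper are hypothetical, not anything arXiv actually emits). It uses Python's standard `html.parser` to show that the structure can be recovered from the markup alone, with no font or layout information involved:

```python
from html.parser import HTMLParser

# Hypothetical paper fragment: structural elements only, no fonts or layout.
PAPER = """
<article>
  <h1>A Sample Paper</h1>
  <section id="abstract"><h2>Abstract</h2><p>We study things.</p></section>
  <section id="results">
    <h2>Results</h2>
    <p>As shown by <cite>Doe 2020</cite>,
       <math><mi>x</mi><mo>=</mo><mn>1</mn></math>.</p>
    <figure><figcaption>Figure 1: a plot.</figcaption></figure>
  </section>
</article>
"""

# The elements named in the comment above.
STRUCTURAL = {"article", "section", "math", "figure", "cite"}


class StructureCounter(HTMLParser):
    """Counts structural elements and records section ids."""

    def __init__(self):
        super().__init__()
        self.counts = {}
        self.section_ids = []

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURAL:
            self.counts[tag] = self.counts.get(tag, 0) + 1
        if tag == "section":
            # attrs is a list of (name, value) pairs
            self.section_ids.append(dict(attrs).get("id"))


def summarize(html):
    parser = StructureCounter()
    parser.feed(html)
    return parser.counts, parser.section_ids


counts, section_ids = summarize(PAPER)
```

A screen reader, an indexer, or an alternative viewer could walk the same elements in the same way; none of them needs to know how the page is styled.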

m-schuetz, yesterday at 6:31 PM

That's a purist stance that's never going to work out in practice. Authors will always want to adjust the presentation of content, and HTML might be even better suited for that than LaTeX, which is bad at both.

dimal, yesterday at 4:53 PM

Perfect is the enemy of good. HTML is good enough. Let’s get this done.

And as another commenter has pointed out, HTML does exactly what you ask for. If it’s done correctly, it doesn’t contain font sizes or layout. Users can style HTML differently with custom CSS.
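As a rough sketch of what that looks like, assuming a toy document and two made-up stylesheets (`CONTENT`, `STYLES`, and `render_page` are all hypothetical names for illustration), the same unstyled markup can be wrapped with different CSS without the content itself changing:

```python
# Hypothetical paper content: no font sizes or layout in the markup.
CONTENT = "<article><h1>Title</h1><section><p>Body text.</p></section></article>"

# Two illustrative stylesheets a viewer platform or a user might supply.
STYLES = {
    "screen": "article { max-width: 40em; font-family: sans-serif; }",
    "print": "article { font-family: serif; font-size: 11pt; }",
}


def render_page(content, css):
    """Wrap the same content in a page that carries the given stylesheet."""
    return (
        "<!DOCTYPE html><html><head>"
        f"<style>{css}</style>"
        f"</head><body>{content}</body></html>"
    )


pages = {name: render_page(CONTENT, css) for name, css in STYLES.items()}
```

Both pages embed the identical `CONTENT` string; only the `<style>` block differs, which is the separation being asked for.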

bob1029, yesterday at 5:08 PM

> HTML is better than PDF

I disagree. PDF is the most desirable format for printed media and its analogues. Any time I plan to seriously entertain a paper from arXiv, I print it out first. I prefer to have the author's original intent in hand. Arbitrary page breaks and layout shifts that result from my specific hardware/software configuration are not desirable to me in this context of use.

afavour, yesterday at 4:42 PM

Wouldn’t that be CSS?
