Hacker News

ArXiv Declares Independence from Cornell

333 points by bookstore-romeo today at 4:24 AM | 90 comments

Comments

frankling_ today at 7:39 AM

The recent announcement to reject review articles and position papers already smelled like a shift towards a more "opinionated" stance, and this move smells worse.

The vacuum that arXiv originally filled was the need for a glorified PDF hosting service with just enough of a reputation to allow some preprints to be cited in a formally published paper, and with just enough moderation to not devolve into spam and chaos. It has also been instrumental in pushing publishers towards open access (i.e., to finally give up).

Unfortunately, over the years, arXiv has become something like a "venue" in its own right, particularly in ML, with some decently cited papers never formally published and "preprints" being cited left and right. Consider the impression you get when seeing a reference to an arXiv preprint vs. a link to an author's institutional website.

In my view, arXiv fulfills its function better the less power it has as an institution, and I thus have exactly zero trust that the split from Cornell is driven by that function. We've seen the kind of appeasement prose from their statement and FAQ [1] countless times before, and it's now time for the usual routine of snapshotting the site to watch the inevitable amendments to the mission statement.

"What positive changes should users expect to see?" - I guess the negative ones we'll have to see for ourselves.

[1] https://tech.cornell.edu/arxiv/

swiftcoder today at 9:29 AM

> raised concerns about the proposed $300,000 salary for arXiv’s new CEO, saying it seemed high

Is a mid-to-high engineering salary outlandish for a CEO of what is likely to be a fairly major non-profit? Even non-profits have to be somewhat competitive when it comes to salary, and the ideal candidate is likely someone who would be balancing this against a tenured position at a major university.

halperter today at 5:43 AM

Statement by arXiv: https://tech.cornell.edu/arxiv/

psalminen today at 5:54 AM

I might be missing something, but I still don't get the why. I don't see any "problem" that needs to be solved.

vedantxn today at 11:09 AM

we got this before gta 6

ACCount37 today at 10:52 AM

Frankly, the only beef I have with arXiv as is: its insistence on blocking AI access.

I had to tell my AI to set up an MCP for "fetch while bypassing arXiv's rate limit" so that it doesn't burn 40k tokens looking for workarounds every time it wants to look at a paper and gets hit with a "sorry, meatbags only" wall.

Very annoying, given how relevant arXiv papers are for ML specifically, and how many papers there are. Can't "human flesh search" through all of them to pick the relevant ones for your work, and they just had to insist on making it harder for AIs to do it too.

dataflow today at 5:32 AM

This sounds terrible. Of course there's a huge risk of it being made for-profit. It almost makes you wonder if the academic publishers are behind this push somehow.

Could they not have made it into some legal structure that puts universities at the top? Say, with a bunch of universities owning shares that comprise the entirety of the ownership of arXiv, but that would allow arXiv to independently raise funds?

asimpleusecase today at 7:59 AM

I wonder if there are plans to licence the content for AI training

Aerolfos today at 8:31 AM

And they hired a LinkedIn business idiot to run the new organization - so the aim is for an infinite growth tech startup in terms of governance, despite the technical legal status of non-profit. It shows in the language they use in the announcement, too ("improved financial viability in the long run")

OpenAI shows exactly how well that works and what that kind of governance does to a company and to its support of science and the commons.

TL;DR, it's fucked.

bonoboTP today at 9:42 AM

I fear their Mozilla-ification and Wikipedia-ification. Scope creep, various outreach feel-good programs, ballooning costs, lost focus etc. And other types of enshittification.

Any change to the basic premise will be a negative step.

They should just be boring, quiet, unopinionated, neutral background infrastructure.

tornikeo today at 5:43 AM

Now the question is, will arXiv wage a decade-long bloody war with Cornell, using heavy infantry (PhD students), archers (reviewers) and field artillery (AI slop papers), or will the independence be mostly peaceful? Only time can tell.

Peteragain today at 7:13 AM

... and soon to be dependent on US military funding? Controlled by someone who has run-ins with universities? This'll end in tears.

Garlef today at 7:36 AM

Maybe they should implement a graph-based trust system:

You need your favourite academic gatekeeper (= thesis advisor) to vouch for you in order to be allowed to upload.

Then AI slop gets flagged and the shame spreads through the graph. And flaggings need to have evidence attached that can again be flagged.
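
For illustration only, here is a minimal sketch of how such a vouching graph might look. Everything in it is an assumption rather than an existing arXiv mechanism: the names `TrustGraph`, `Flag`, `vouch`, `can_upload` and `shame` are hypothetical, as are the set of trusted "root" users and the rule that shame halves with each step up the chain of vouchers.

```python
from dataclasses import dataclass, field


@dataclass
class Flag:
    target: str             # user whose upload was flagged
    evidence: str           # flags without evidence are rejected
    disputed: bool = False  # set when the flag itself has been flagged


@dataclass
class TrustGraph:
    voucher: dict = field(default_factory=dict)  # user -> who vouched for them
    flags: list = field(default_factory=list)

    def vouch(self, voucher: str, newcomer: str) -> None:
        """Record that `voucher` (e.g. a thesis advisor) vouches for `newcomer`."""
        self.voucher[newcomer] = voucher

    def can_upload(self, user: str, roots: set) -> bool:
        """A user may upload if a chain of vouchers leads to a trusted root."""
        seen = set()
        while user not in roots:
            if user in seen or user not in self.voucher:
                return False  # cycle, or nobody vouched for this user
            seen.add(user)
            user = self.voucher[user]
        return True

    def flag(self, target: str, evidence: str) -> Flag:
        """File a flag against a user's upload; evidence is mandatory."""
        if not evidence:
            raise ValueError("a flag needs evidence attached")
        new_flag = Flag(target, evidence)
        self.flags.append(new_flag)
        return new_flag

    def shame(self, decay: float = 0.5) -> dict:
        """Spread each undisputed flag up the voucher chain, decaying per hop."""
        scores = {}
        for f in self.flags:
            if f.disputed:
                continue
            user, weight, seen = f.target, 1.0, set()
            while user is not None and user not in seen:
                seen.add(user)
                scores[user] = scores.get(user, 0.0) + weight
                user = self.voucher.get(user)
                weight *= decay
        return scores
```

With `vouch("advisor", "student")` and `vouch("student", "friend")`, a flag against `friend` also puts a smaller amount of shame on `student` and `advisor`. Marking the flag as disputed removes its effect again, which is the "flaggings can again be flagged" part.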

shevy-java today at 8:25 AM

"Recently arXiv’s growth has accelerated. Since 2022, it has expanded its staff to 27, in large part to deal with a 50% increase in submitted manuscripts."

I am wary of that. IMO the business model is damaged therein. You can say in 2022 we had 27; bankrupt in 2030.

OutOfHere today at 7:07 AM

With 300K for the CEO, its enshittification will commence imminently. It will now serve to maximize revenue. Just wait and watch while they issue a premium membership, payment requirements for authors, and other revenue generators to please their investors.

adamnemecek today at 5:24 AM

Good call, ArXiv seems like one of the most important institutions out there right now.

davnicwil today at 6:26 AM

Very unrelated to the article, but I think 'arXiv' as a brand is bad, and really detrimental to what the institution aims to accomplish.

That is, it's not readily parseable, and it really gives off an insider-term vibe, like this isn't for you if you don't already know what it means or how you should read or say it. It sort of reminds me of the overuse of Latin and Latinate terms generally in the old professions and, well, the academy.

Just always struck me as being somewhat at odds with the goal.
