The arXiv license is the “default” license under which most preprints are submitted to the arXiv, at least in my subject. Out of dark curiosity, I am wondering how well it does what it is meant to do, namely ensure that these preprints remain widely and freely available for the foreseeable future. Let me quote the license in full:

The URI is used to record the fact that the submitter granted the following license to arXiv.org on submission of an article:

I grant arXiv.org a perpetual, non-exclusive license to distribute this article.

I certify that I have the right to grant this license.

I understand that submissions cannot be completely removed once accepted.

I understand that arXiv.org reserves the right to reclassify or reject any submission.

I am wondering what this entails in any of the following hypothetical scenarios:

  • Cornell University decides to sell off arXiv.org for whatever reason (which may be far less drastic than bankruptcy: for example, someone might catch wind of the fact that a great many PDF files are not ADA-compliant; or publishers might unleash a barrage of lawsuits on Cornell for hosting what they believe are not quite preprints; or it is simply decided that continued hosting of the arXiv is too much of a cost center), and the new proprietors don’t see public access as a priority. Someone with a full dump uploads it to a server in Ukraine.
    (Comparable cases: SSRN bought by Elsevier, although the full-dump analogy is broken here — I don’t know if anyone ended up re-hosting the papers taken down.)

  • The HTTP protocol and the WWW are superseded by something new and shiny, and the “.org” domain and the notion of a “server” lose their meaning; arXiv evolves into a service which may have a hard time arguing that it is the same entity (“a highly-automated electronic archive and distribution server”) that the license was granted to. (Comparable cases: The precursor of arXiv.org was a mailing list; it is far from obvious that mailing out a preprint on an ephemeral medium like a mailing list grants any rights for future perpetual hosting on the internet. Now imagine the next step after the mailing list and the internet, whatever that may be; ignore the current social media hype, which is not a relevant development for hosting documents.)

  • Various countries block the official arXiv domains (or force arXiv to geo-fence them out), leading to the creation of multiple not-quite-official mirrors, some even on the dark web (.onion) or otherwise hidden from view. How can these mirrors argue that the arXiv license was granted to them?

  • The arXiv team splits along a political fault-line, resulting in two different groups/servers/teams with claims to the arXiv name. Are they both allowed to host the papers?

  • The TeX and PDF formats lose their universal support, and new formats come up (or new versions, breaking backwards compatibility); the arXiv team can no longer keep up with writing compatibility scripts, and volunteers end up fixing the papers and posting them on GitHub. (The compatibility nightmare is already happening to some extent: the arXiv has its share of broken PDFs, and I recall even seeing a TeX source that did not compile until I made a tweak. So far, most of the damage has been repaired, probably with a lot of manual drudgery, but the arXiv is getting more and more papers, and the next generation of formats to be deprecated will have far more papers posted in it.)

What these scenarios all have in common is that, in a sense, the arXiv does not disappear; it just evolves, changes its skin, reincarnates, as times change. My question is: Does the license follow it, or will the “new arXiv” have trouble proving that it still has a right to host preprints uploaded under the (standard) arXiv license back in the early 2000s?
