@ton I've had a few exchanges with IA over the "Save page now" URL and automated submission.

They actively endorse its use in this way.

I do suspect there's a prospect for abuse (I've seen rate limits / delays in submission where I've submitted many manually), but in general, "other people found this link of interest" is in fact a useful archival heuristic. It's used, for example, in deciding what YouTube content to archive (mentions on Twitter will trigger an archival). That's discussed on the IA blog.


@ton I've employed the method several times on extracts of dying / sunsetted Web services. Specifically Google+ and Joindiaspora.

In each case, it was possible to create an extract of my own content, but of course, the origin would die.

The extracts were both in JSON format. I've learned a fair bit of jq as a consequence, and could use that to pull out (or generate) the original URLs, and then submit those to the Wayback Machine using a very simple and persistent bash script.

(Adding a brief delay to the process seems to help ensure that most content is in fact captured.)
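
A rough sketch of that workflow, in Python rather than the jq and bash mentioned above. It assumes the export stores each post's address under a key named `url` (hypothetical; real exports may differ), and it uses the public "Save page now" prefix `https://web.archive.org/save/`. The 5-second delay and 3 retries are illustrative guesses, not values from this thread.

```python
import time
import urllib.request

SAVE_PREFIX = "https://web.archive.org/save/"


def extract_urls(export, key="url"):
    """Recursively collect every string stored under `key` in parsed JSON."""
    found = []
    if isinstance(export, dict):
        for k, v in export.items():
            if k == key and isinstance(v, str):
                found.append(v)
            else:
                found.extend(extract_urls(v, key))
    elif isinstance(export, list):
        for item in export:
            found.extend(extract_urls(item, key))
    return found


def save_url(url):
    """Build the "Save page now" address for one original URL."""
    return SAVE_PREFIX + url


def http_get(url, timeout=60):
    """Fetch a URL and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def submit_all(urls, fetch=http_get, delay=5.0, retries=3):
    """Submit each URL, pausing after every request.

    Returns a dict mapping each original URL to True/False.
    `delay` and `retries` are assumed values; tune them to taste.
    """
    results = {}
    for url in urls:
        ok = False
        for _ in range(retries):
            try:
                ok = fetch(save_url(url)) == 200
            except OSError:  # covers URLError, HTTPError and timeouts
                ok = False
            time.sleep(delay)  # the brief pause between submissions
            if ok:
                break
        results[url] = ok
    return results
```

To use it, parse the export with `json.load`, pass the result to `extract_urls`, and hand that list to `submit_all`. The `fetch` parameter is there so the loop can be dry-run with a stand-in function before anything is sent to the archive.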

The main limitation is that Archive.Org often only captures some of a page or is otherwise imperfect: for G+, only some comments are included; for Diaspora*, NO comments were.

Archive.Today is another option, and tends to more accurately capture sites, but has no fully-automated process and tends to throw up CAPTCHAs on multiple repeat submissions. It is possible to bulk-generate URLs which start the submission process, and I've used that to the scale of several thousands of posts (over a few weeks of working at it). My submission rate was about 100--200/hour or so, sustained.
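
A small sketch of the bulk-generation step, in Python. The `?run=1&url=` prefix is an assumption taken from the form used by Archive.Today's bookmarklet, not something stated in this thread; each generated link still has to be opened in a browser, where a CAPTCHA may appear. The post count in the usage note is illustrative only.

```python
import urllib.parse

# Assumed submission prefix (bookmarklet-style); adjust if the site changes it.
SUBMIT_PREFIX = "https://archive.today/?run=1&url="


def submission_links(urls, prefix=SUBMIT_PREFIX):
    """Return one link per URL that starts the submission process when opened."""
    return [prefix + urllib.parse.quote(u, safe="") for u in urls]


def hours_needed(count, per_hour):
    """Estimate hours of active submitting at a sustained hourly rate."""
    return count / per_hour
```

For a hypothetical 3,000 posts, `hours_needed(3000, 200)` and `hours_needed(3000, 100)` give 15 and 30 hours of active submitting, which is consistent with spreading the work over a few weeks.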

@ton There's also an Archive.Org verification API which lets you know if a specific URL has been captured. You might want to schedule a run of that a few hours / a day or so after running your initial request to confirm success, and re-try if the first attempt didn't take.

(I have notes on this I can pull up if you'd like, though the information's on IA's website if you look for it.)
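
A minimal sketch of such a follow-up check, in Python, assuming the API meant here is the Wayback Machine availability endpoint at `https://archive.org/wayback/available`. The response shape used below (`archived_snapshots` containing a `closest` entry with an `available` flag) is how that endpoint is documented as far as I know; treat it as an assumption and check IA's own documentation.

```python
import json
import urllib.parse
import urllib.request

AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_url(url):
    """Build the availability query for one original URL."""
    return AVAILABILITY_API + "?" + urllib.parse.urlencode({"url": url})


def is_captured(response):
    """True if the parsed JSON response reports an available snapshot."""
    closest = response.get("archived_snapshots", {}).get("closest")
    return bool(closest and closest.get("available"))


def fetch_json(url, timeout=60):
    """Fetch a URL and parse the body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def needs_retry(urls, lookup=fetch_json):
    """Return the URLs that have no available snapshot yet."""
    return [u for u in urls if not is_captured(lookup(availability_url(u)))]
```

Run `needs_retry` over the submitted list a few hours or a day later and resubmit whatever it returns. Note that this only confirms that some snapshot exists; it says nothing about whether comments or other page content were captured.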
