Webrecorder: web archiving service

Webrecorder provides an integrated Python-based platform for creating high-fidelity, verifiable, standards-compliant web archives while browsing, and for sharing the archived content.

Webrecorder is an open source, easy-to-use, and secure web archiving service. Its aim is to create a high-fidelity, verifiable archival record of any web page, website, or group of sites. The project was developed at Rhizome to make web archiving simple and accessible.

Webrecorder is written in Python and builds on several open source frameworks and tools: Bottle, Cork, and Beaker. It also uses pywb, warcprox, Redis, and Nginx. Webrecorder can be run locally using Docker and Docker Compose. At the moment the service is still at the beta prototype stage, so only advanced users can deploy it. The default storage option is the local file system, but Amazon S3 is also supported.

Webrecorder.io provides an archive-as-you-browse service. A user visits webrecorder.io, enters a URL, and browses the website. Afterwards the user can view a list of the browsed pages and download either a single page or the whole website as a standard compressed archive file. Webrecorder records the HTTP request/response traffic, including all images, stylesheets, scripts, and other resources.

The service captures the interactive content that the user triggers on a page, and recorded pages can be replayed with that content at any later time. This makes Webrecorder a reliable way to archive dynamic web content, such as social media, as interactive, contextual archives. The user only has to interact with the pages so that all content is loaded and all JavaScript is executed. There are limitations, such as form submission and large video files, but Webrecorder is still new and will be enhanced.

Visitors can use the service anonymously or create an account with an online archive. In the latter case users have full control over their own archives, which can be made public, kept private, or shared privately. Archived data is stored in compressed WARC files. WARC (Web ARChive) is an international standard that specifies a method of storing and representing web archival data. The format is widely used and is compatible with many other web archiving tools.
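To make the WARC format more concrete, here is a minimal sketch that writes a single WARC/1.0 "response" record, compresses it with gzip, and parses it back. It uses only the Python standard library and is not how Webrecorder, pywb, or warcprox implement recording; the function names build_warc_record and parse_warc_record and the example URL are assumptions made for illustration.

```python
import gzip
import uuid
from datetime import datetime, timezone


def build_warc_record(target_uri: str, http_payload: bytes) -> bytes:
    """Serialize one uncompressed WARC/1.0 'response' record (illustrative helper)."""
    headers = [
        ("WARC-Type", "response"),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "application/http; msgtype=response"),
        ("Content-Length", str(len(http_payload))),
    ]
    head = "WARC/1.0\r\n" + "".join(f"{k}: {v}\r\n" for k, v in headers) + "\r\n"
    # The content block is followed by two CRLFs that terminate the record.
    return head.encode("utf-8") + http_payload + b"\r\n\r\n"


def parse_warc_record(raw: bytes):
    """Split one uncompressed record into (version, headers, content block)."""
    head, _, rest = raw.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    version = lines[0]
    headers = dict(line.split(": ", 1) for line in lines[1:])
    length = int(headers["Content-Length"])
    return version, headers, rest[:length]


# The captured HTTP response (status line, headers, body) is the record's content block.
http_response = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<html><body>Hello</body></html>"
)

record = build_warc_record("http://example.com/", http_response)
compressed = gzip.compress(record)  # in a .warc.gz file each record is its own gzip member

version, headers, block = parse_warc_record(gzip.decompress(compressed))
print(version, headers["WARC-Type"], headers["WARC-Target-URI"], len(block))
```

Because every record carries its own headers and length, any tool that understands WARC can read a file produced by another. In practice a maintained WARC library is a better choice than hand-rolled parsing like this.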

One of Webrecorder’s main objectives is to provide security and privacy for users:

  • Each user’s recording session uses temporary cookies, and its data cannot be accessed by any other (unauthorized) user.
  • Webrecorder.io does not host the recording data. The only way to keep it is to download the archive within half an hour after the recording or replay session finishes.
  • After 30 minutes of inactivity all user data is removed, whether or not it was downloaded. This approach improves security and saves storage space.
  • An Erase button deletes the browsed and archived data on demand.
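The inactivity rule and the Erase button can be pictured with a small in-memory sketch. This is an assumption-laden illustration rather than Webrecorder's actual code: the SessionStore class, its method names, and the injectable clock are all hypothetical, and a real deployment would more likely rely on key expiry in a store such as Redis.

```python
import time

SESSION_TTL_SECONDS = 30 * 60  # the 30-minute inactivity window described above


class SessionStore:
    """Hypothetical store that forgets sessions after a period of inactivity."""

    def __init__(self, ttl=SESSION_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock          # injectable so expiry can be tested without waiting
        self._sessions = {}          # session_id -> (last_activity, data)

    def touch(self, session_id, data=None):
        """Record activity for a session, creating it if necessary."""
        _, existing = self._sessions.get(session_id, (None, {}))
        if data:
            existing.update(data)
        self._sessions[session_id] = (self._clock(), existing)

    def purge_expired(self):
        """Delete every session idle for at least the TTL; return their ids."""
        now = self._clock()
        expired = [sid for sid, (last, _) in self._sessions.items()
                   if now - last >= self._ttl]
        for sid in expired:
            del self._sessions[sid]
        return expired

    def get(self, session_id):
        """Return session data, or None if it expired or never existed."""
        self.purge_expired()
        entry = self._sessions.get(session_id)
        return entry[1] if entry else None

    def erase(self, session_id):
        """Explicit deletion, analogous to the Erase button."""
        self._sessions.pop(session_id, None)


# Usage with a fake clock measured in seconds.
fake_now = [0.0]
store = SessionStore(clock=lambda: fake_now[0])
store.touch("abc", {"url": "http://example.com/"})

fake_now[0] = 29 * 60
print(store.get("abc"))   # still available: {'url': 'http://example.com/'}

fake_now[0] = 31 * 60
print(store.get("abc"))   # None, removed after 30 minutes without activity
```

Note that reading a session does not count as activity in this sketch; only touch resets the timer. Whether replay or download requests extend the window in the real service is not stated in the text.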

People interested in web archiving or digital preservation can use recorded pages as a verifiable record: because the records are digitally signed, their authenticity and time of archiving can be verified. Researchers and media organizations can use archived files for research, reference, or citation, or as a backup in case the original content is removed.
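The text does not say how the records are signed, so the following is only a related sketch of integrity checking, not Webrecorder's signing scheme. WARC records commonly carry a payload digest header written as an algorithm label followed by a Base32-encoded hash (for example "sha1:..."); recomputing that digest shows whether the stored payload has changed. The helper names payload_digest and verify_payload are hypothetical.

```python
import base64
import hashlib
import hmac


def payload_digest(payload: bytes) -> str:
    """Return a 'sha1:<base32>' digest string in the style used in WARC headers."""
    raw = hashlib.sha1(payload).digest()
    return "sha1:" + base64.b32encode(raw).decode("ascii")


def verify_payload(payload: bytes, recorded_digest: str) -> bool:
    """Check that the payload still matches the digest stored with the record."""
    return hmac.compare_digest(payload_digest(payload), recorded_digest)


original = b"<html><body>Hello</body></html>"
recorded = payload_digest(original)           # value stored at capture time

print(verify_payload(original, recorded))                  # True
print(verify_payload(original + b"<!-- edited -->", recorded))  # False
```

A digest alone only detects accidental or naive modification. Proving who created a record and when requires a signature or trusted timestamp on top of it, which is the property the paragraph above refers to.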

Webrecorder offers an integrated platform for creating standards-compliant web archives, together with an on-demand archiving service that runs in the browser. You can try it out at https://webrecorder.io/ or find more information in the Webrecorder repository.
