David and Kris, with help from staff at the Internet Archive, put together a list of 13 areas that are already causing problems for Web preservation:
- Database driven features
- Complex/variable URI formats
- Dynamically generated URIs
- Rich, streamed media
- Incremental display mechanisms
- Multi-sourced, embedded content
- Dynamic login, user-sensitive embeds
- User agent adaptation
- Exclusions (robots.txt, user-agent, …)
- Exclusion by design
- Server-side scripts, RPCs
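To make the "Exclusions (robots.txt, user-agent, …)" item concrete, here is a minimal sketch of how a site's robots.txt can shut out one crawler while leaving the pages open to everyone else. It uses Python's standard `urllib.robotparser`. The robots.txt content, the crawler name `example-archiver`, and the `example.org` URLs are all made up for illustration; they do not come from David's post or from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one named crawler entirely,
# and blocks /private/ for every other user agent.
ROBOTS_TXT = """\
User-agent: example-archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.org/index.html"
    for agent in ("example-archiver", "some-browser"):
        print(agent, is_allowed(ROBOTS_TXT, agent, page))
```

Under these assumed rules, the same page is available to an ordinary visitor but off limits to the named crawler, so a crawler that honors robots.txt cannot collect it.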
Read more about this on David’s blog:
- Harvesting and Preserving the Future Web, by David Rosenthal, DSHR’s Blog (May 7, 2012).
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.