I know one approach from over 10 years ago was to have your editable site on an isolated server that would periodically copy its content to the hosting site. This meant that any changes to the hosted site would get stamped over by the master copy.
Still, nothing is perfect, and this is a good read about an issue a lot of people are not aware of.
What's frustrating to me -- as someone who's getting closer and closer to publicly launching a shared hosting service -- is that this problem should be solved already: we have twice-daily backups for our websites, going back up to a year. It's all automated, and customers can access the backups directly. If your site's been hacked, you can log in to the backup server, view the changes for your site's directory, and download the most recent good copy.
For most sites that store their templates as regular files and their contents in a database, this is plenty good enough. For sites that store their content as regular files too, it only takes a few extra minutes to separate the good stuff from the bad stuff.
This is super easy to implement. Every web host should be doing it.
I would urge some caution with regards to backups. If they're easily accessible from the site, then when your site is hacked they are easily accessible to the hacker too. It is a commonly overlooked area, and a weakness I have seen in many hosted/colo sites where they have all the servers isolated and yet still linked via some all-singing, all-dancing backup server.
Yeah, that's something I considered pretty carefully. The backup server is completely isolated from the rest of the network, it pulls the backups via ssh/rsync using a special user account that only has sudo permissions for the rsync command (and can only authenticate via ssh certificate). The only way to break the backup server from a compromised server would be to replace OpenSSH on the compromised server and then wait for the backup server to connect -- and then try to somehow break rsync.
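Roughly, the moving parts look something like this (usernames, hostnames, and paths are illustrative, and a plain key line is shown for brevity even though we use signed certificates):

# /etc/sudoers.d/backup on each hosted server: the unprivileged
# "backup" account may run rsync as root, and nothing else.
backup ALL=(root) NOPASSWD: /usr/bin/rsync

# ~backup/.ssh/authorized_keys: key-only login, no tty or forwarding.
no-pty,no-port-forwarding,no-agent-forwarding,no-X11-forwarding ssh-rsa AAAA... backup-puller

# On the isolated backup server, each pull is essentially:
rsync -a --rsync-path="sudo rsync" backup@www1.example.net:/var/www/ /backups/www1/current/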
I agree with Zenst below that it's a separate (though no less important) issue. The backups are there to get the customer back on their feet as quickly as possible, and to protect their data. A surprising number of our clients have had website problems related to not having any backups or local copies ... web-based CMSs have made this a remarkably easy trap to fall into.
The nature of the compromise is something we'd be interested in, and hopefully something that they'd bring to our attention. I'd like to be able to have our log monitoring software watch for attempts at common exploits and automatically block them, but it doesn't do that yet. Which is one reason it's not ready for launch yet. :-)
Content and software are separate. We're talking about content and not the software. Sure, you fix the flaw in the software that allowed the exploit, or the weak password, or however the site was taken over, but that is a separate issue.
They are separate in the PHP software that gets targeted by "the vast majority of hacks". Those hacks are against popular CMS packages that can be scanned for and exploited in an automated fashion. In a CMS, the software being exploited and the content are separate, in PHP as in every other language.
The PHP files where the content and the system are one and the same (hand written pages not using a packaged CMS) aren't part of "the vast majority of hacks" category. Compared to exploiting a WordPress vulnerability present in 50+ million installs, someone trying to mess with the black box that is someone's custom-written page is vanishingly rare. Your retort doesn't hold water.
> The PHP files where the content and the system are one and the same (hand written pages not using a packaged CMS) aren't part of "the vast majority of hacks" category.
Speaking from experience, this is simply not true. There are automated scanners in the wild which will attempt to detect and exploit common vulnerabilities in simple PHP templating systems and CMSes. One frequently exploited vulnerability is in applications which use URLs of the form:
index.php?page=foobar
With supporting code along the lines of:
$page = $_GET["page"]; /* if register_globals isn't set */
include("pages/$page.html");
Until relatively recently, when PHP started rejecting filenames with embedded null bytes, code like this was vulnerable to input such as:
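index.php?page=../../../../../etc/passwd%00

(or some variation on that theme). The embedded %00 terminated the string at the C level, so the trailing ".html" was never appended, and the ../ sequences walked the include() out of the pages/ directory to whatever file the attacker named. A minimal defensive sketch, assuming the valid page names are few enough to whitelist (illustrative code, not anyone's production fix):

<?php
// Hypothetical whitelist version: only known page names are ever included,
// so neither path traversal nor a null byte can reach arbitrary files.
$allowed = array('home', 'about', 'contact');
$page = isset($_GET['page']) ? $_GET['page'] : 'home';
if (!in_array($page, $allowed, true)) {
    $page = 'home';
}
include dirname(__FILE__) . "/pages/" . $page . ".html";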
This completely falls over for popular packages like Joomla, unfortunately, which have miserably bad caching systems and file upload mechanisms and web-based upgrade functions and module installers and the like.