Formerly JulioFlores.com - random ramblings about web2py, Python, Zope and a bit of C#

Added Feb 11 2009 , Modified Aug 30 2009 - 03:03 PM

This is an update of my post originally published Feb 14, 2009. It has been updated and refined a bit; I hope you find it interesting.

Since I traded my venerable MacBook for a powerhouse Phenom PC, and then moved on to my current setup, a nice Dell Inspiron 530 running Fedora 11 (don't ask :), I returned to my roots; after a while I could not live without Linux.

Once you move away from OS X (at least as your primary computer), you soon realize that there were many things you took for granted, such as how easy it is to set up wireless connections, operating system updates and, most importantly, a nice gem that arrived with Leopard (Mac OS X 10.5) and subsequent updates: Time Machine.

I was one more of the lazy drones googling "time machine for Linux" almost every day. Many options do exist, but I doubt many of them were created by actual Mac users. Don't get me wrong, most of them work, but I wanted one that was smart enough not to just "copy" files, but to synchronize them, perform incremental backups and avoid duplicating the same data across backups.

Have you ever wondered how, on a Mac, a 500GB external drive can hold several snapshots of your home directory and still have, say, half of its space free?

The answer is hard links, which Time Machine relies on (on HFS+, OS X even supports hard links to directories for this purpose). A hard link is just another name for the same file data on disk. For example, assume that I have a file called myfile.txt containing 1 KB of data. If I copy it to some other folder with cp myfile.txt ./somefolder, I end up with 2 files taking up 2 KB of disk space. If instead I create a hard link with ln myfile.txt ./somefolder/myfile.txt, I still end up with 2 files, but they take up only 1 KB of disk space, because both names point to the same data.
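
You can see the difference for yourself in a scratch directory. This is a minimal sketch assuming GNU coreutils (stat -c is the GNU syntax; OS X's stat uses different flags); the file and folder names simply mirror the example above.

```bash
#!/bin/bash
# Compare a copy (cp) with a hard link (ln) inside a throwaway directory.
set -e
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT              # remove the scratch directory on exit
cd "$demo"
mkdir somefolder copyfolder
head -c 1024 /dev/zero > myfile.txt     # a 1 KB file
cp myfile.txt copyfolder/myfile.txt      # a real copy: new inode, another 1 KB on disk
ln myfile.txt somefolder/myfile.txt      # a hard link: same inode, no extra data
# The first column is the inode number; the hard link shares it with the original.
ls -i myfile.txt somefolder/myfile.txt copyfolder/myfile.txt
# %h is the hard-link count: 2 for the linked pair, 1 for the independent copy.
stat -c '%h %n' myfile.txt somefolder/myfile.txt copyfolder/myfile.txt
```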

Something similar happens in my version of "Time Machine for Linux" (or, as I like to call it, a poor man's Time Machine). The trick is to first make a normal, full backup (via cp, rsync or ssh) to your external hard disk, and then create each subsequent backup by copying only what has changed; everything else becomes a hard link to the copy already stored in the previous backup. This is essentially how Time Machine works in OS X (minus the nice GUI Time Machine provides, of course).

Assumptions

  • You have an external backup drive mounted (for this example I'll use /media/LinuxTimeMachine)
  • rsync is installed on your system

Consider the script below (an explanation follows it):

  1. #!/bin/bash
  2. # Generates incremental backups of my home folder using rsync.
  3. # Note that since I use an external USB drive for my backups,
  4. # ssh-ing into another server is not required; however, adding support
  5. # for this should be easy enough.
  6. # Cron Suggestions:
  7. # If you are going to run this script, say, every two hours (12 times per day)
  8. # and want to keep a month's worth of data, then MAX_BACKUPS should be around 360
  9. # (12 x 30). In my case I just want to keep the last 25 backups, regardless of when
  10. # I run the script.
  11. # Change these variables below for your own purposes:
  12. MOUNTPOINT="/media/LinuxTimeMachine"
  13. BACKUP_DIR="$MOUNTPOINT/Backups.teroknor/julio"
  14. SOURCE_LOC="/home/julio"
  15. MAX_BACKUPS=25
  16. LOG_FILE="${SOURCE_LOC}/bin/rsync.log"
  17. EXCLUDE_FILES="$SOURCE_LOC/bin/excludes.rsync"
  18. RSYNC_OPTS="-aHvxog --delete --progress --log-file=$LOG_FILE --exclude-from=$EXCLUDE_FILES"
  19. # (Optional) - Check if my mountpoint is actually mounted:
  20. mountpoint -q "$MOUNTPOINT" || { echo "$MOUNTPOINT is invalid or not a mount point" >&2; exit 1; }
  21. # Also check that the backup directory exists:
  22. [ -d "$BACKUP_DIR" ] || { echo "$BACKUP_DIR not found" >&2; exit 1; }
  23. # Next is a very simple but efficient way to check whether this is the first time
  24. # we make a backup: it relies on a symlink in the backup folder that points
  25. # to the latest backup. Note that even if you already have a backup system
  26. # running, if you REMOVE the "latest" symlink
  27. # (/media/LinuxTimeMachine/Backups.teroknor/julio/latest in my original example),
  28. # the script treats the next run as the first one: it skips the rotation and syncs
  29. # backup.0 in place without --link-dest (a full, slow copy if backup.0 is missing)
  30. if [ ! -L "$BACKUP_DIR/latest" ] ; then
  31. echo "Initial Backup, this may take some time..."
  32. rsync $RSYNC_OPTS "$SOURCE_LOC/" "$BACKUP_DIR/backup.0"
  33. ln -s "$BACKUP_DIR/backup.0" "$BACKUP_DIR/latest"
  34. else
  35. # This next segment takes care of the rotation; I end up with the following structure:
  36. # /media/LinuxTimeMachine/Backups.teroknor/julio/backup.0
  37. # /media/LinuxTimeMachine/Backups.teroknor/julio/backup.1
  38. # /media/LinuxTimeMachine/Backups.teroknor/julio/backup.2
  39. # ...
  40. # ...
  41. # /media/LinuxTimeMachine/Backups.teroknor/julio/backup.23
  42. # /media/LinuxTimeMachine/Backups.teroknor/julio/backup.24 (oldest backup when MAX_BACKUPS=25)
  43. # /media/LinuxTimeMachine/Backups.teroknor/julio/latest (symlink to backup.0, the latest backup)
  44. #
  45. # index of the oldest backup to keep (backup.0 ... backup.$cur_backup = MAX_BACKUPS backups)
  46. cur_backup=$((MAX_BACKUPS - 1))
  47. # remove the oldest backup if it exists
  48. if [ -d "$BACKUP_DIR/backup.$cur_backup" ] ; then
  49. rm -rf "$BACKUP_DIR/backup.$cur_backup"
  50. fi
  51. # Shift every previous backup up by one (backup.0 to backup.1, ..., backup.23 to backup.24),
  52. # all this in order to leave backup.0 free for rsyncing the latest files.
  53. for i in $(seq $((cur_backup - 1)) -1 0)
  54. do
  55. # slot this backup moves into
  56. next_backup=$((i + 1))
  57. # move the previous backup out of the way
  58. if [ -d "$BACKUP_DIR/backup.$i" ] ; then
  59. mv "$BACKUP_DIR/backup.$i" "$BACKUP_DIR/backup.$next_backup"
  60. fi
  61. done
  62. rsync $RSYNC_OPTS --link-dest="$BACKUP_DIR/backup.1" "$SOURCE_LOC/" "$BACKUP_DIR/backup.0"
  63. # Remove the current "latest" symlink, since it'll change right away
  64. rm -f "$BACKUP_DIR/latest"
  65. ln -s "$BACKUP_DIR/backup.0" "$BACKUP_DIR/latest"
  66. fi;
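
Sizing MAX_BACKUPS for a cron schedule is simple arithmetic: runs per day times the number of days to keep (12 runs a day for 30 days is 360 backups). The helper below is hypothetical and not part of the script, and the crontab line in its comments assumes the script is saved as /home/julio/bin/timemachine.sh, a made-up name since the post does not name the file.

```bash
#!/bin/bash
# Hypothetical helper: how many backups to keep (MAX_BACKUPS) for a given
# cron interval (in hours, dividing 24 evenly) and retention period (in days).
# Example crontab entry for a run every two hours (script path is hypothetical):
#   0 */2 * * * /home/julio/bin/timemachine.sh
max_backups_for() {
    local interval_hours=$1 days=$2
    echo $(( (24 / interval_hours) * days ))
}

max_backups_for 2 30    # every two hours for a month -> 360
max_backups_for 24 25   # once a day for 25 days -> 25
```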

Information

Lines 12-18 - Modify them to suit your needs. Line 17 can be dropped, provided that you also remove --exclude-from=$EXCLUDE_FILES from line 18.
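
The script reads its exclude patterns from the file in EXCLUDE_FILES, one rsync pattern per line. A minimal sketch for creating one follows, assuming SOURCE_LOC is your home directory as in the post; the patterns are only examples, not the author's list. Because LOG_FILE lives inside the source tree, excluding it keeps rsync from backing up its own log while it is being written.

```bash
#!/bin/bash
# Write an example exclude list for rsync's --exclude-from option.
# The default path matches EXCLUDE_FILES in the script; the patterns are examples.
# Note: this overwrites any existing file at that path.
EXCLUDE_FILES="${EXCLUDE_FILES:-$HOME/bin/excludes.rsync}"
mkdir -p "$(dirname "$EXCLUDE_FILES")"
cat > "$EXCLUDE_FILES" <<'EOF'
# Lines starting with '#' are ignored by rsync.
# A leading '/' anchors a pattern to the top of the transfer (the source dir).
/bin/rsync.log
# A trailing '/' matches directories only.
.cache/
.thumbnails/
.local/share/Trash/
EOF
echo "Wrote $(grep -vc '^#' "$EXCLUDE_FILES") patterns to $EXCLUDE_FILES"
```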

Line 13 - Make sure you create this directory structure on your backup drive before running the script. It will be empty at first, but it is where your backups will live. Feel free to rename the Backups.teroknor folder, and the last folder (julio), to your liking.
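
Creating that directory only once the drive is really mounted avoids a subtle mistake: if the mount point exists as an empty folder while the drive is unplugged, mkdir would happily create the tree on your internal disk. A minimal sketch, assuming the mountpoint command from util-linux; prepare_backup_dir is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical one-off helper: create the backup directory on the mounted drive.
# Usage: prepare_backup_dir MOUNTPOINT BACKUP_DIR
prepare_backup_dir() {
    local mount=$1 dir=$2
    # Refuse to create anything if the drive is not mounted; otherwise the
    # directory could end up on the internal disk instead of the backup drive.
    if ! mountpoint -q "$mount"; then
        echo "$mount is not a mount point" >&2
        return 1
    fi
    mkdir -p "$dir"
}

# With the paths used in this post (may need sudo, depending on the drive's permissions):
# prepare_backup_dir /media/LinuxTimeMachine /media/LinuxTimeMachine/Backups.teroknor/julio
```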

Lines 36-43 - This is the final structure you will end up with on your external drive.
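
With that structure in place, the hard-link count tells you what each run actually stored. Files in backup.0 that were unchanged since the previous run are hard links shared with backup.1 (link count 2 or more), while new or modified files have a link count of 1. A minimal sketch under that assumption; changed_files is a hypothetical name, and files that were already hard-linked inside your home directory (preserved by -H) will also show a count above 1, so treat the result as an approximation.

```bash
#!/bin/bash
# Hypothetical helper: list the files stored as new data in the latest snapshot.
# Unchanged files in backup.0 are hard links shared with older snapshots
# (link count >= 2); new or modified files have a link count of exactly 1.
# Usage: changed_files BACKUP_DIR
changed_files() {
    find "$1/backup.0" -type f -links 1
}

# Example with the paths from this post:
# changed_files /media/LinuxTimeMachine/Backups.teroknor/julio
```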

Conclusion

I now have a fully functional personal backup system that is smart enough not to copy unchanged files: each run performs an incremental backup, storing new or modified files in full and hard-linking everything else to the previous snapshot. (rsync's delta-transfer algorithm, which sends only the changed parts of a file, matters when copying over a network; when both source and destination are local paths, as with a USB drive, rsync copies changed files whole by default.) All this has been tested with CentOS 4.5 (my production server) and Fedora 11 (my desktop system). I hope you find it as useful as I did.
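
To see those savings on your own drive, and answer the 500GB question from the start of this post, GNU du helps: within a single invocation it counts each hard-linked file only once. Listing the snapshots newest first therefore shows backup.0 at its full size and each older snapshot with only the data not already counted. A minimal sketch assuming GNU coreutils and the backup.N layout above; snapshot_usage is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper: disk usage per snapshot, newest first.
# GNU du counts a hard-linked file only once per invocation, so backup.0 shows
# its full size and each older snapshot only the data not already counted.
# Usage: snapshot_usage BACKUP_DIR
snapshot_usage() {
    local dir=$1 i=0
    local snapshots=()
    # Collect backup.0, backup.1, ... in order until one is missing
    # (the rotation keeps the numbering contiguous).
    while [ -d "$dir/backup.$i" ]; do
        snapshots+=("$dir/backup.$i")
        i=$((i + 1))
    done
    [ ${#snapshots[@]} -gt 0 ] || { echo "no snapshots in $dir" >&2; return 1; }
    du -sh "${snapshots[@]}"
}

# Example with the paths from this post:
# snapshot_usage /media/LinuxTimeMachine/Backups.teroknor/julio
```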

My next undertaking is creating a UI for this script. I believe this can be accomplished with a web-based approach; I'm thinking of using web2py for it.

Any takers?

Comments for "Time Machine for Linux.. kinda.."
Added Aug 30 2009 , Modified Aug 30 2009 - 05:05 PM By rj..@gmail.com
This style of backup is something I both need and want. Since
I am less hard core than you, I'll wait (im)patiently for someone
to create a GUI using [I hope...] Web2py, a fine framework.

Thanks for creating this handy utility!

ron k jeffries
http://identi.ca/ronkjeffries
http://blogt.eronj.com
Added Aug 31 2009 , Modified Aug 31 2009 - 01:31 AM By JulioF
Thanks Ron for your post,

The script above does indeed work just as Time Machine does. I use it daily as my personal backup; in fact, I run it automatically as a cron job every day at 11:00. A web GUI does not sound like a bad idea at all. I was thinking of making some sort of UI using wxWidgets or something like that, but creating an interface in web2py might just be the way to go. Good thinking.

JulioF


 
Proudly Powered by Python

TechFuel.net | Web Standards xhtml 1.1 and css 2.1 | Rel 14