Formerly JulioFlores.com - random ramblings about web2py, python, Zope and a bit of C#

This is an update of a post I originally published on Feb 14, 2009. It has been updated and refined a bit; I hope you find it interesting.

Since I traded my venerable MacBook for a powerhouse Phenom PC, and then moved on to my current setup, a nice Dell Inspiron 530 running Fedora 11 (don't ask :), I have returned to my roots, and after a while I could not live without Linux.

Moving away from OS X (at least as the platform for my primary computer), you soon realize how many things you took for granted, such as the ease of setting up wireless connections, operating system updates and, most importantly, a nice gem that came with Leopard and subsequent releases: Time Machine.

I was one more of the lazy drones googling "time machine for Linux" almost every day. Many options do exist, but I doubt many of them were created by actual Mac users. Don't get me wrong, most of them work, but I wanted one smart enough not to just "copy" files: it should synchronize, perform incremental backups and not duplicate the same data across backups.

Have you ever wondered how, on a Mac, a 500GB external drive can hold several snapshots of your home directory and still have, say, half of its disk space free?

OS X makes use of its own equivalent of hard links. A hard link is really just another name for a file that already exists in your filesystem. For example, assume that I have a file called myfile.txt whose contents amount to 1KB worth of data. Now, let's make a copy of the file in some other folder, something like this: cp myfile.txt ./somefolder. At the end of the process you will have 2 files, eating 2KB of disk space. Now, say that instead of copying the file, you create a hard link to it: ln myfile.txt ./somefolder/myfile.txt. At the end of this process you will have 2 file names, eating up only 1KB of disk space between them.
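If you want to see the difference for yourself, here is a small sketch that compares a copy with a hard link inside a throwaway directory. The directory and file names are made up for the demo, and it assumes GNU stat (as found on Fedora or CentOS).

```shell
#!/bin/bash
# Sketch: compare a copy with a hard link in a throwaway directory.
# All names (demo dir, file names) are made up for illustration.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir somefolder

# A 1 KB file of zeros.
head -c 1024 /dev/zero > myfile.txt

cp myfile.txt somefolder/copy.txt        # a second, independent file
ln myfile.txt somefolder/hardlink.txt    # a second name for the same file

# Link count of the original: how many names point at its data.
link_count=$(stat -c %h myfile.txt)

# -ef is true when both names refer to the same inode.
if [ myfile.txt -ef somefolder/hardlink.txt ]; then same_as_link=yes; else same_as_link=no; fi
if [ myfile.txt -ef somefolder/copy.txt ]; then same_as_copy=yes; else same_as_copy=no; fi

echo "link count: $link_count"
echo "hard link shares the inode: $same_as_link"
echo "copy shares the inode: $same_as_copy"
```

The hard link and the original are the same file under two names, so editing one "changes" the other, while the copy lives on its own.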

Something like this happens in my version of the "Time Machine for Linux" (or as I like to call it, a poor man's Time Machine). The trick is to first create a normal backup (via copy/rsync/ssh) on your external hard disk, and from then on create each subsequent backup by copying only what has changed; everything else remains as a hard link to the previous backup. This is essentially the way Time Machine works in OS X (except for the nice GUI Time Machine provides, of course), so be confident.
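The idea can be tried on a small scale before pointing it at a real home folder. The following is only a sketch: it uses rsync's --link-dest option on two temporary directories with hypothetical file names, and it needs rsync installed.

```shell
#!/bin/bash
# Sketch of the snapshot idea with rsync --link-dest, run in a throwaway directory.
# Directory and file names are hypothetical; requires rsync to be installed.
command -v rsync >/dev/null || { echo "rsync not found, skipping demo"; exit 0; }

demo=$(mktemp -d)
mkdir "$demo/source" "$demo/backups"
echo "never changes" > "$demo/source/stable.txt"
echo "version 1"     > "$demo/source/notes.txt"

# First backup: a plain full copy.
rsync -a "$demo/source/" "$demo/backups/backup.1"

# Change one file, then take a second snapshot that hard-links
# anything unchanged against the previous one.
echo "version 2, now longer" > "$demo/source/notes.txt"
rsync -a --link-dest="$demo/backups/backup.1" "$demo/source/" "$demo/backups/backup.0"

# Unchanged file: both snapshots should point at the same inode.
if [ "$demo/backups/backup.0/stable.txt" -ef "$demo/backups/backup.1/stable.txt" ]; then
    unchanged_shared=yes
else
    unchanged_shared=no
fi
# Changed file: the new snapshot gets its own copy, the old one is untouched.
if [ "$demo/backups/backup.0/notes.txt" -ef "$demo/backups/backup.1/notes.txt" ]; then
    changed_shared=yes
else
    changed_shared=no
fi
old_notes=$(cat "$demo/backups/backup.1/notes.txt")

echo "unchanged file shared between snapshots: $unchanged_shared"
echo "changed file shared between snapshots: $changed_shared"
echo "older snapshot still holds: $old_notes"
```

Each snapshot looks like a complete copy when you browse it, yet only the changed file takes up new space.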

Assumptions

  • You have an external backup drive mounted (for this example I'll use /media/LinuxTimeMachine)

Begin - Consider the source code below (explanation follows after)

  1. #!/bin/bash
  2. # Generates incremental backups of my home folder using rsync.
  3. # Note that since I use an external USB drive for my backup operation
  4. # ssh into another server is not required, however, adding support
  5. # for this should be easy enough.
  6. # Cron Suggestions:
  7. # If you are going to run this script, say, every two hours (12 times per day)
  8. # and want to keep a month's worth of data, then MAX_BACKUPS should be in the
  9. # 12x30 = 360 range. In my case I just want to have the last 25 backups
  10. # regardless of when I run the script.
  11. # Change these variables below for your own purposes:
  12. MOUNTPOINT="/media/LinuxTimeMachine"
  13. BACKUP_DIR="$MOUNTPOINT/Backups.teroknor/julio"
  14. SOURCE_LOC="/home/julio"
  15. MAX_BACKUPS=25
  16. LOG_FILE="${SOURCE_LOC}/bin/rsync.log"
  17. EXCLUDE_FILES="$SOURCE_LOC/bin/excludes.rsync"
  18. RSYNC_OPTS="-aHvxog --delete --progress --log-file=$LOG_FILE --exclude-from=$EXCLUDE_FILES"
  19. # (Optional) - Check if my mountpoint is actually mounted:
  20. mountpoint -q "$MOUNTPOINT" || { echo "$MOUNTPOINT is invalid or not a mount point" ; exit 1; }
  21. # Also check if the backup directory exists:
  22. [ -d "$BACKUP_DIR" ] || { echo "$BACKUP_DIR not found" ; exit 1; }
  23. # Next is a very simple but efficient way to check if this is the first time
  24. # we make a backup. It relies on a soft link in the backup folder that points
  25. # to the latest backup. Note that even if you have a backup system already
  26. # running, if you REMOVE the soft link "latest"
  27. # (/media/LinuxTimeMachine/Backups.teroknor/julio/latest in my example)
  28. # the script will treat the backup as the first one: it skips the rotation and
  29. # syncs straight into backup.0 instead of hard-linking against the previous one.
  30. if [ ! -L "$BACKUP_DIR/latest" ] ; then
  31.     echo "Initial Backup, this may take some time..."
  32.     rsync $RSYNC_OPTS "$SOURCE_LOC/" "$BACKUP_DIR/backup.0"
  33.     ln -s "$BACKUP_DIR/backup.0" "$BACKUP_DIR/latest"
  34. else
  35.     # This next segment takes care of the rotation. Basically I'll have the following structure:
  36.     # /media/LinuxTimeMachine/Backups.teroknor/julio/backup.0
  37.     # /media/LinuxTimeMachine/Backups.teroknor/julio/backup.1
  38.     # /media/LinuxTimeMachine/Backups.teroknor/julio/backup.2
  39.     # ...
  40.     # ...
  41.     # /media/LinuxTimeMachine/Backups.teroknor/julio/backup.24
  42.     # /media/LinuxTimeMachine/Backups.teroknor/julio/backup.25
  43.     # /media/LinuxTimeMachine/Backups.teroknor/julio/latest (symlinked to backup.0 - Latest Backup)
  44.     #
  45.     # index of the oldest slot (backup.0 through backup.$MAX_BACKUPS are kept)
  46.     cur_backup=$MAX_BACKUPS
  47.     # remove oldest backup if it exists
  48.     if [ -d "$BACKUP_DIR/backup.$cur_backup" ] ; then
  49.         rm -fr "$BACKUP_DIR/backup.$cur_backup"
  50.     fi
  51.     # Move each previous backup up by one (i.e. backup.0 to backup.1, backup.11 to backup.12),
  52.     # all this in order to leave backup.0 free for rsyncing the latest files.
  53.     for i in $(seq "$cur_backup" -1 0)
  54.     do
  55.         # slot this backup moves into
  56.         next_backup=$((i + 1))
  57.         # move previous backup out of the way
  58.         if [ -d "$BACKUP_DIR/backup.$i" ] ; then
  59.             mv "$BACKUP_DIR/backup.$i" "$BACKUP_DIR/backup.$next_backup"
  60.         fi
  61.     done
  62.     rsync $RSYNC_OPTS --link-dest="$BACKUP_DIR/backup.1" "$SOURCE_LOC/" "$BACKUP_DIR/backup.0"
  63.     # Remove the current "latest" symlink, since it'll change right away
  64.     rm -f "$BACKUP_DIR/latest"
  65.     ln -s "$BACKUP_DIR/backup.0" "$BACKUP_DIR/latest"
  66. fi

Information

Lines 12-18 - Modify them to suit your needs. Line 17 can be removed, provided that you also remove --exclude-from=$EXCLUDE_FILES from line 18.
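If you do keep the excludes file, it is just a list of rsync patterns, one per line. The sketch below writes a sample one; the patterns are only suggestions of mine (nothing in the script depends on them), and the file goes to a temporary path unless you pass the real one as the first argument.

```shell
#!/bin/bash
# Sketch: write an example excludes file for rsync's --exclude-from.
# The patterns are only suggestions; the target path defaults to a temp file
# so nothing real is overwritten. Pass the real path as the first argument.
excludes_file="${1:-$(mktemp)}"

cat > "$excludes_file" <<'EOF'
# One pattern per line; lines starting with '#' are comments.
# A leading '/' anchors the pattern at the top of the directory being backed up.
/.cache/
/.local/share/Trash/
*.tmp
EOF

# Count the lines that are actual patterns (everything that is not a comment).
pattern_count=$(grep -cv '^#' "$excludes_file")
echo "wrote $pattern_count patterns to $excludes_file"
```

To use it with the backup script you would point it at the same path as EXCLUDE_FILES, which is /home/julio/bin/excludes.rsync in my setup.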

Line 13 - Make sure you create this directory structure on your backup drive before running the script. It will be empty at first, but it is where your backups will live. Feel free to rename the /Backups.teroknor folder to your liking, and the last folder as well.
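One way to create that directory without accidentally writing to your root filesystem when the drive is unplugged is sketched below. prepare_backup_dir is a hypothetical helper of mine, not part of the backup script, and it relies on the same mountpoint command the script already uses.

```shell
#!/bin/bash
# Sketch: create the backup directory only when the drive is really mounted.
# prepare_backup_dir is a hypothetical helper, not part of the backup script.
prepare_backup_dir() {
    local mount="$1" dir="$2"
    if ! mountpoint -q "$mount"; then
        echo "$mount is not a mount point, refusing to create $dir" >&2
        return 1
    fi
    mkdir -p "$dir"
}

# Example call with the paths used in this post (uncomment to run):
# prepare_backup_dir /media/LinuxTimeMachine /media/LinuxTimeMachine/Backups.teroknor/julio
```

The check matters because mkdir -p would otherwise happily build the whole path on your internal disk.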

Lines 36-43 - This is the final structure that you will end up with on your external drive.
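To get a feel for how that structure builds up, the rotation can be replayed on empty directories. This is a sketch that mirrors the loop in the script, with a temporary directory and MAX_BACKUPS=3 as demo values and a plain mkdir standing in for the rsync run.

```shell
#!/bin/bash
# Sketch: replay the rotation on empty directories to see the resulting layout.
# MAX_BACKUPS=3 and the temp directory are demo values, not the real settings.
BACKUP_DIR=$(mktemp -d)
MAX_BACKUPS=3

rotate() {
    # Drop the oldest slot, then shift every remaining snapshot up by one.
    rm -rf "$BACKUP_DIR/backup.$MAX_BACKUPS"
    for i in $(seq "$MAX_BACKUPS" -1 0); do
        if [ -d "$BACKUP_DIR/backup.$i" ]; then
            mv "$BACKUP_DIR/backup.$i" "$BACKUP_DIR/backup.$((i + 1))"
        fi
    done
    # Stand-in for the rsync run that fills the fresh backup.0.
    mkdir "$BACKUP_DIR/backup.0"
    rm -f "$BACKUP_DIR/latest"
    ln -s "$BACKUP_DIR/backup.0" "$BACKUP_DIR/latest"
}

# Six "backup runs", each leaving a marker in the newest snapshot.
for run in 1 2 3 4 5 6; do
    rotate
    echo "run $run" > "$BACKUP_DIR/backup.0/marker"
done

ls "$BACKUP_DIR"
snapshot_count=$(find "$BACKUP_DIR" -maxdepth 1 -type d -name 'backup.*' | wc -l)
oldest=$(cat "$BACKUP_DIR/backup.$MAX_BACKUPS/marker")
echo "snapshots kept: $snapshot_count, oldest one is from: $oldest"
```

Note that the slots run from backup.0 through backup.$MAX_BACKUPS, so the rotation keeps MAX_BACKUPS + 1 directories once it has been running for a while.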

Conclusion

I now have a fully functional personal backup system that performs incremental backups: unchanged files are hard-linked to the previous snapshot instead of being copied again, so only the files that actually changed take up new space. (For a local destination such as a USB drive, rsync copies each changed file whole; transferring only the modified parts of a file is what it does by default when syncing over a network.) All this has been tested with CentOS 4.5 (my production server) and Fedora 11 (my desktop system). I hope you find it as useful as I have.

The next undertaking is creating a UI for this script. I believe this can be accomplished with a web-based approach, and I'm thinking of using web2py to provide it.

Any takers?


 