A backup and restore component

This article shows how to develop a backup component using Delphi 2009 and the zlib compression algorithm. First, some theory of data compression is reviewed and, second, a compression component is presented and developed. A test program is provided with the component.

If you download many programs and files off the Internet, you've probably encountered ZIP files before. This compression system is a very handy invention, especially for Web users because it lets you reduce the overall number of bytes in a file so it can be transmitted faster over slower Internet connections, or take up less space on disk. Here, we will develop a component that compresses files using the zlib compression algorithm.

A short review of the theory

Most types of computer files are fairly redundant -- they have the same information listed over and over again. File-compression programs simply get rid of the redundancy. Instead of listing a piece of information over and over again, a file-compression program lists that information once and then refers back to it whenever it appears in the original program.

There are two main types of data compression:

The popular WinZip program is an example of lossless compression. JPEG is an example of lossy compression. Lossy compression is not discussed in this article.

In 1977 and 1978, Ziv and Lempel[Ziv77][Ziv78] introduced the LZ class of adaptive dictionary data compression techniques that have become so very popular today. In competition with these techniques are the Markov algorithms that give better compression but run much more slowly. These two methods are compared in an article by Ross Williams (1991).

Most compression programs use a variation of the LZ adaptive dictionary-based algorithm to shrink files. "LZ" refers to Lempel and Ziv, the algorithm's creators, and "dictionary" refers to the method of cataloging pieces of data. The system for arranging dictionaries varies, but it could be as simple as a numbered list.

Zlib Data Compression Library

The Zlib data-compression library is designed to be a free, general-purpose, legally unencumbered -- that is, not covered by any patents -- lossless data-compression library for use on virtually any computer hardware and operating system. The zlib compression method is a variant of the LZ77 compression method and its data format is itself portable across platforms.

Zlib was written by Jean-Loup Gailly (compression) and Mark Adler (decompression) . Jean-Loup is also the primary author/maintainer of gzip, the author of the compression FAQ list and the former maintainer of Info-ZIP's Zip; Mark is also the author of gzip's and UnZip's main decompression routines and was the original author of Zip. Not surprisingly, the compression algorithm used in zlib is essentially the same as that in gzip and Zip, namely, the "deflate" method that originated in PKWARE's PKZIP 2.x.

Using zlib with Delphi

The Delphi unit containing the zlib declarations and functions is zlibex.pas. (specifically the Delphi zlib 1.2.3.2009, the zlib version 1.2.3 for Delphi 5, 6, 7, 8, 2005, 2006, 2007 and 2009). It defines three classes:

In practice, if ZStream is defined as a TZCompressionStream, the statement ZStream.CopyFrom(AStream, 0) compresses the content of AStream into ZStream and vice versa if ZStream is declared as a TZDecompressionStream.

Additionally, the zlib compression algorithm uses CRC (an acronym for "Cyclic Redundancy Check"), a non-secure hash function designed to detect accidental changes to raw computer data. A CRC-enabled algorithm calculates a short, fixed-length binary sequence, known as the CRC code or just CRC, for each block of data and sends or stores them both together. When a block is read, the calculation is repeated; if the new CRC does not match the one calculated earlier, then the block contains a data error.

The core component

The component is defined as a descendant of TComponent and its compression algorithm is the zlib compression library. Its objective is to take a sequence of files and compress each of them into a single archive file as is done by most backup programs. This archive file is comprised of a sequence of chunks that contains the compressed version of the file, each corresponding to a file of the list. This sequence is followed by a registry that contains a list of file records, one for each file in the list. This data record contains specifics of the chunk such as its position in the stream, its length, the date of the input file, and so on. The archive file is terminated by a termination record that gives the position of the registry in the stream.

With the component, it is possible to take a list of files and compress them into an archive file, and vice versa, restore the files from the archive file. We will now find out how to perform these tasks with the component.

How to backup?

There are three public methods that initiate the backup:

function TGtroBackup.BackupFromPattern(const Pattern: string; 
  Target: string; BackupMode: TBackupMode): Boolean;
function TGtroBackup.BackupFromPatterns(const Patterns: Tstrings; 
  Target: string; BackupMode: TBackupMode): Boolean;
function TGtroBackup.Backup(const FileList: TStrings; 
	ATarget: string; BackupMode: TBackupMode): Boolean;

All three methods require three arguments. Target is the name of the archive file (with path) and BackupMode is either complete (bmAll) or synchronized (bmNewer).

Inputting patterns

For BackupFromPattern() and BackupFromPatterns, the first argument is a pattern or a list of patterns. A pattern is simply a path to a directory of the shell followed by an optional suffix "/s" that implies that all files in all the subdirectories will be included in the enumeration.

When patterns are used as input to these two methods, an algotithm that visits the directory and its subdirectories (if the suffix "/s" is included) recursively. It uses the FindFirst(), FindNext() and FindClose() functions declared in sysutils.pas to enumerate the files that this or these directories contain. The file list thus produced is then presented as the first argument of the Backup() method.

Inputting a list of files

Once a list of the files to be backed up has been produced, the Backup() method is used as shown in Figure 1 below. The second argument of the Backup() method is called Target. It is the name and path of the file that will become the archive. The third argument is the backup mode: it can be Complete (bmAll) or Synchronized (bmNewer).

functionTGtroBackup.Backup(const FileList: TStrings; ATarget: string; BackupMode: TBackupMode): Boolean;
begin
  ...
  Success:= BackupFiles(FileList, BackupMode);
  DeleteFlag:= True;
  if Success then // BackupFiles successfull?
  begin
    Result:= Success;
    if FileExists(fTarget) then // if archive already exists
    begin
      DeleteFlag:= DeleteFile(fTarget); // delete it
      ...
    end; // if FileExists(fTarget)
    if DeleteFlag then // existing archive was deleted successfully
    begin
      if not RenameFile(TmpFileName, fTarget) then // TempFileName contains the new archive; rename it
        ShowMessage(strCannotchange + TmpFileName + ' en ' + fTarget);
      ...
      DeleteFile(TmpFileName); // delete the temporary archive
      CloseArchive;
    end // if DeleteFlag
    else
      Result:= False;
  end
  else
  begin
    if Assigned(FOnFileList) then FOnFileList(Self, True, strFailure);
    Result:= False;
  end;
end;

The private method BackupFiles(), highlighted in yellow above, is at the center of the backup process:

function TGtroBackup.BackupFiles(FileList: TStrings; Mode: TBackupMode): Boolean;
var
  TempStream: TStream;
begin
  ...
  TmpFileName:= GetTmpFileName; // Get the name of a temp file in "temp" directory
  fArchiveStream:= TFileStream.Create(TmpFileName, fmCreate); // Create Output stream
  ... // code used by the synchronized backup only removed from listing  
  // Now compress files and archive them --- fill output stream with new chunks
  fNumFiles:= FileList.Count;
  if Assigned(FOnFileList) then
    FOnFileList(Self, False, strCompressingFileList  + IntToStr(FileList.Count) + ')');
  TriggerProgressEvent(0, '');
  fTotalSize:= FileList.Count;
  fArchiveStream:= AddFiles(FileList, fArchiveStream);
  if Assigned(FOnFileList) then FOnFileList(Self, True, strSavingRegistry);
  Result:= fRegistry.SaveToStream(fArchiveStream, fCompressionLevel);
  fArchiveStream.Free; // ???
end;

Now it is time to consider the differences in the process due to the selection of the Backup Mode. It can be either Complete or Synchronized.

Complete backup

This is the simplest way to backup files into the archive. It takes all the files of the list of files and compresses them all into the archive file selected by the user. The process is as follows:

  1. A temporary file stream is created;
  2. Each file of the list is compressed into a chunk and the chunks are stored sequentially in the temporary stream. This is performed by the AddFiles() method that follows:
  3. function TGtroBackup.AddFiles(FileList: TStrings; const InStream: TStream): TStream;
    var
      FileInfo: pZLBFileRec; SuccessFlag: Boolean;
    begin
      while FileList.Count > 0 do
      begin // for each file in FileList
        if FileExists(FileList[0]) then
        begin
          FileInfo:= AllocMem(SizeOf(TZLBFileRec));
          SuccessFlag:= CompressFileToStream(FileList[0], InStream, FileInfo);
          fRunningTotal:= fTotalSize - FileList.Count;
          TriggerProgressEvent(fRunningTotal, FileList[0]);
          if SuccessFlag then
            fRegistry.Add(FileInfo); // File information added to registry
          FileList.Delete(0); // remove the file from FileList
        end; // if FileExists
      end; // while FileList.Count
      Result:= InStream;
    end;

    specifically by the CompressToStream() method:

    function TGtroBackup.CompressFileToStream(const FileName: string;
      OutStream: TStream; FileInfo: pZLBFileRec): Boolean;
    var
      InFileStream, TmpStream: TStream; ZStream: TZCompressionStream;
      MemStreamFlag: Boolean; TmpFile: string;
    begin
     Result:= False;
      try
       InFileStream:= TFileStream.Create(FileName, fmOpenRead + fmShareDenyWrite);
        try
          ...
          if InFileStream.Size <= (20 * (1024*1024)) then
          begin // Use memory or file stream?  modified 21 Apr 2006
            TmpStream:= TMemoryStream.Create;
            MemStreamFlag:= true;
            TmpFile:= '';
          end
          else
          begin
            TmpFile:= GetTmpFileName;
            TmpStream:= TFileStream.Create(TmpFile, fmCreate);
            MemStreamFlag:= False;
          end;
          try // Compress data using TZCompressionStream
            ZStream:= TZCompressionStream.Create(TmpStream, fCompressionLevel);
            try
              ...
              ZStream.CopyFrom(InFileStream, 0); // File is compressed here
            finally
              ZStream.Free;
            end; // try...finally (ZStream)
            ... // Fill the FileInfo record
    
            if TmpStream.Size >= InFileStream.Size then
            begin  // do not compress
              ...
              OutStream.CopyFrom(InfileStream, 0); // Copy uncompressed data to OutFile
            end // if TmpStrm
            else
            begin // compress
              ...
              OutStream.CopyFrom(TmpStream, 0); // Copy compressed data to Outfile
            end; // else
          finally
            TmpStream.Free;
            ...
          end; // try...finally (TmpStrm)
        finally
          InFileStream.Free;
          ...
          Result:= True;
        end; // try...finally (InFile)
      except // handles exceptions when the file is in use or locked by Windows
        Result:= False;
        ShowMessage('File ' + FileName + ' could not be opened and was skipped');
      end;
    end;
  4. For each file, a file information record is appended to the Registry by the method fRegistry.Add(). Once all files have been compressed and their file information record appended to the Registry, the Registry is appended to the stream and the stream is terminated by the archive record. If it exists, the original archive file is deleted and the temporary stream identified to the new archive file.

Important!
If the file cannot be opened because it is in use or locked by Windows, the file is skipped and the user notified of the event.

Synchronized backup

The objective of this backup mode is to backup only those files that have been created or modified since the last backup. As shown in the listing hereunder, the archive specified by the user is opened (it must exist) as a file stream and a registry object is created that contains all the file information records of the files in the archive.

if (Mode <> bmAll) and FileExists(fTarget) then // Synchronized backup
  begin
    TempStream:= OpenArchive(fTarget, True); // the stream created must not be freed
    if TempStream <> nil then // if the temporary file exists
    begin
      FilterFileList(FileList); // Keep only the files created or modified; mark modified for deletion
      fArchiveStream:= DeleteFiles(TempStream);
    end;
  end; // if (Mode <> bmAll)

The FilterFileList() method is called to filter out of the list of files all those files that are not going to be backed up. At the same time, the files that have been modified are marked for deletion from the archive (their registry record is checked). The DeleteFiles() method only performs garbage collection: it copies all the chunks and unchecked registry entries in a new stream from which all references to modified files have been deleted. At this point, the process reverts to the scheme used for the complete backup.

Structure of the archive file

I have made a lot of reference to the archive file. It has the structure shown in Figure 2. It is comprised of a sequence of chunks (a sequence of bytes correspondiing to the compressed file, one for each file), a registry containing the file information (one recors per file) and an Archive record terminating the file and containing global file attributes.

Archive
Structure of the archive file produced by the component

Each chunk contains the result of compressing each file of the backup. Corresponding to each chunk, there in a file information entry in the Registry that characterizes the information pertaining to each file: its name, its path, its age, its original size, its compressed size, the start point of its chunk in the archive file, its CRC, its status (compressd if it was, stored otherwise), its compression level and the Checked field. Finally, the archive record contains the number of files in the archive, the size of the Registry, the total size of the chunks and an indication of the compression of the Registry.

How to restore?

In order to restore, the archive file must first be opened as a stream. During the process the Registry is extracted and the remaining of the stream freed. At this point, the Registry becomes available for the calling program to display it and let the user select which files he wants to restore.

Restore
Flowchart of the restore/copy process

One way for the calling application to display the content of the registry is to use a TListView with the properties ViewStyle and CheckBoxes set to vsReport to vsReport and true respectively. This following snippet can be used for this display:


  ListView.Clear;
  ListView.Items.BeginUpdate;
  try
    RegList:= TList(Node.Data);
   for i:= 0 to RegList.Count - 1 do
   begin
     Reg:= pZLBFileRec(RegList.Items[i]);
     FA:= FileDateToDateTime(Reg^.Age);
     LVItem := ListView.Items.Add;
     LVItem.Caption:= Reg^.Name;
     LVItem.SubItems.Add(FormatDateTime('dd-mm-yy hh:nn', FA));
     LVItem.SubItems.Add(IntToStr(Reg^.start));
     LVItem.SubItems.Add(IntToStr(Reg^.oSize));
     LVItem.SubItems.Add(IntToStr(Reg^.cSize));
     LVItem.SubItems.Add(FileStatusStr[Reg^.Status]);
     LVItem.SubItems.Add(FileCompStr[Reg^.CompressionLevel]);
     LVItem.SubItems.Add(IntToHex(Reg^.CRC, 8));
     LVItem.Checked:= Reg^.Checked;
     LVItem.Data:= Pointer(Reg);
   end; // for i
  finally
    ListView.Items.EndUpdate;
  end; // try...finally}

This display let the user choose which files are to be restored by checking the appropriate checkboxes on the display. Once this selection has been made, the calling application calls the Restore() public method of the component.

function TGtroBackup.Restore(ArchiveFileName, NewPath: string; RestoreMode: TRestoreMode): Boolean;
var
  i: Integer; Item: pZLBFileRec; NumRestored: Integer; fNewPath: string;
begin
  ...
  for i:= 0 to Registry.Count - 1 do // scan the registry
  begin
    Item:= Registry.Items[i];
    if Item.Checked {or not RegOpen} then // if the item is checked
    begin // if RegOpen is false, all files are restored
      Inc(NumRestored);
      if Assigned(FOnReport) then FOnReport(Self, True, strExtractingFile +
       Item^.Path + Item^.Name + ' [' + IntToStr(Item^.oSize) + ']');
      Extract(i, NewPath, RestoreMode); // extract it
    end; // if Item.Checked or not RegOpen
  end; // for i:= 0 to Registry.Count - 1
  ...
  Result:= True;
end;

Restore() requires three arguments:

  1. ArchiveFile - the name of the archive file from which the user selects files to restore;
  2. CopyPath - the path where the restored files are copied. If set to an empty string, the files are restored in their original location; if not, files are copied in a directory selected by the user;
  3. RestoreMode - a TRestoreMode type used to define how the files will be restored.

Highlighted in yellow, above, the Extract() method is called to fetch the chunk corresponding to the file information of the registry and decompress it.

procedure TGtroBackup.Extract(Index: Integer; ATarget: string);
// Extracts file[i] from the archive, decompress and restore it to ATarget
var
  OutFile: TFileStream; FileInfo: pZLBFileRec; Target: string;
begin
  ...
  if Index <> -1 then
  begin
    FileInfo:= Registry.Items[Index]; // get the file information of the file from the registry
    if ATarget <> '' then 
      Target:= FileInfo^.Path + FileInfo^.Name // copy mode
    else Target:= FileInfo^.Path + FileInfo^.Name; // restore mode
    if DoRestore(Target, FileInfo) then
    begin
      ForceDirectories(ExtractFilePath(Target));
      OutFile:= TFileStream.Create(Target, fmCreate);
      try
        ExtractFileFromStream(Index, OutFile);
        FileSetDate(OutFile.Handle, FileInfo^.Age);
      finally
        OutFile.Free;
      end; // try...finally
    end; // if DoRestore
  end; // if Index
  fProgressIndex:= 0;
end;

Two methods are called in this method: DoRestore() and ExtractFileFromStream(). DoRestore() simply applies the RestoreMode selected by the user whereas ExtrackFileFromStream() locates the chunk associated with the file to be restored, decompresses it (if it was) and restore it in its original location.

How to copy?

Copying files is a bit more complex than restoring them. During the restoration process, the Registry already contain the path and the name of the files and the files are restored in this original position in the shell. Copying them requires that these coordinates be modified. First, a common path and the location where the files are to be copied need to be determined.

FindCommonPath()

The method FindCommonPath() calculated the path that is common to all the files that are selected for the copy operation. Its code is as follows:

function TRegistry.FindCommonPath: string;
// used to find the common path of checked files of the registry
var
  i, j, Index: Integer; Path, CommonPath: string; Paths: TStringList; Stop: Boolean;
begin
  ...
  Paths:= TStringList.Create; // creates a string list to contain the paths
  try
    j:= 1; // initial setting for Posex()
    repeat
      Paths.Clear; // empty the stringlist
      for i:= 0 to Count - 1 do // scan the registry
      begin
        if Items[i].Checked then // considers only checked file information records
        begin
          Index:= Posex('\', Items[i].Path, j); // find jth position of "\"
          Path:= Copy(Items[i].Path, 0, Index); // path to "\"
          Paths.Add(Path); // add to stringlist
          if (i > 0) and (Path <> Paths[0]) or (Index = 0) then
          begin
            Result:= CommonPath;
            Stop:= True;
          end; // if (i > 0) and (Path <> Paths[0]) or (Index = 0)
        end; // if Items[i].Checked
      end; // for i:= 0 to Count - 1
      j:= Index+1; // update j for Posex()
      if Paths.Count > 0 then
        CommonPath:= Paths[0]
      else
      begin // no file seleted
        Result:= 'No file selected';
        Stop:= True;
      end; // if Paths.Count > 0
    until Stop;
  finally
    Paths.Free;
  end; // try...finally
end;

Where to copy

The way to determine the location where the files are to be copied is not implemented in the component but a simple call to the function SelectDirectory() declared in FileCtrl.pas is recommended.

A Test Program

The test program that I have developed is called the GTRO_Backup program. It provides en execution environment and uses the TgtroBackup component to perform the backup and restore operations.

Backing up

Figure 4 shows how the program presents itself to the user (the backup page is activated).

GTRO_Backup Backup
The GTRO_Backup program in Backup mode

The visual component on the left of the Figure is a TgtroCheckShellTreeView component described on this Web site and on About.com/Delphi Programming. The user of the program can select files to be backed up by checking nodes in the component. Checking a collapsed node selects all the files in the disectory as well as all the files in its subdirectories whereas checking an expanded one selects only the files contained in the directory. The component uses a process similar to the one used to expand the list of files within the component and display the list of the selected files on the right hand side visual component.

Once such selection has been performed, the user can save the patterns corresponding to his selection to a text file by clicking the "Save patterns to File" button. He can use this text file later on to backup the same selection by clicking the "Load patterns from File" button and opening this text file. Once this is done, the TGtroCheckShellTreeView component will be updated to show his selection.

Additionally, the user can backup the selected files to an archive file with the extension .zlb by clicking on the "Backup" button. The user is then invited to select the archive file. Prior to doing this, he must determine if the backup will be complete or synchronized. Additionally, he can select the compression level (Normal is the default value).

Restoring/Copying

Figure 5 shows the form of the program when the Restore page is activated and an archive file has been opened.

Restore
The GTRO_Backup program in Restore mode

When the Restore page is activated, no archive is open and the display shown in Figure 5 is empty. The user must open an archive file and the archive is opened as shown above in a Windows Explorer-like display. The left hand side component is a TgtroCheckRegTreeview component that displays the directories contained in the archive file as a tree. When a node is selected, its content is displayed in the right-hand side TListview component (with Checkboxes). Then, the user can select complete directories by clicking checkboxes in the tree or by selecting individual files from the TListView component.

Once this is done, the user can either restore the files (clicking on the "Restore" button) to their original location or copy them (by clicking on the "Copy" button) to a new location.

Conclusion

This component has been developed as part of the development of a backup program where it is used to do the bulk of the backup and restore tasks. The latest zlib compression algorithm that performs compression and decompression is used without any modification in the component.

The test program is called GTRO_Backup.exe. It is a utility program that does not require any installation. You just drop it in a directory of your choice and execute it.

The source code of the component and the test program as well as the components needed for the test program can be downloaded from here. The gtroBackup.zip file contains one main directory (gtrobackup) and three sub-directories:

  1. gtroBackup contains the code of component and its sub-directory, Delphi zlib 1,2,3, 2009, contains the ZlibEx.pas unit and all it needs to operate;
  2. Gtro_Backup contains the code of the test program as well as its executable GTRO_Backup.exe ready to go; and
  3. CheckTreeView contains the components TgtroCheckShellTreeView and TgtroCheckTreeView that are needed to operate the test program.

Before the test program can be compiled and executed, the TgtroBackup (GTroBackup.pas in the gtroBackup directory), TgtroCheckShellTreeView (GTroCheckShellTreeView.pas in the CheckTreeView directory) and TgtroCheckTreeview (GTroCheckTreeView.pas in the CheckTreeView directory) components need to be put in a design package and installed.

This test program can be very useful for limited backups such as that of user data, music and videos but it has limitations. It cannot backup any file that is in use or locked by Windows when the backup is performed. Such file is automatically skipped and the user is notified when this happens. Correcting this would require using the Volume Shadow Copy Service as Microsoft Backup programs do but it appeared too complicated for a test program.

Warning!
This code was developed for the pleasure of it. Anyone who decides to use it does so at its own risk and agrees not to hold the author responsible for its failure.


Questions or comments?
E-Mail
Last modified: September 3rd 2014 22:48:56. []