This article shows how to develop a backup component using Delphi 2009 and the zlib compression algorithm. First, some theory of data compression is reviewed and, second, a compression component is presented and developed. A test program is provided with the component.
If you download many programs and files off the Internet, you've probably encountered ZIP files before. This compression system is a very handy invention, especially for Web users because it lets you reduce the overall number of bytes in a file so it can be transmitted faster over slower Internet connections, or take up less space on disk. Here, we will develop a component that compresses files using the zlib compression algorithm.
A short review of the theory
Most types of computer files are fairly redundant -- they have the same information listed over and over again. File-compression programs simply get rid of the redundancy. Instead of listing a piece of information over and over again, a file-compression program lists that information once and then refers back to it whenever it appears in the original program.
There are two main types of data compression:
- Lossless data compression, the compressed-then-decompressed data is an exact replication of the original data.
- Lossy data compression, the decompressed data may be different from the original data. Typically, there is some distortion between the original and the reproduced signal.
The popular WinZip program is an example of lossless compression. JPEG is an example of lossy compression. Lossy compression is not discussed in this article.
In 1977 and 1978, Ziv and Lempel[Ziv77][Ziv78] introduced the LZ class of adaptive dictionary data compression techniques that have become so very popular today. In competition with these techniques are the Markov algorithms that give better compression but run much more slowly. These two methods are compared in an article by Ross Williams (1991).
Most compression programs use a variation of the LZ adaptive dictionary-based algorithm to shrink files. "LZ" refers to Lempel and Ziv, the algorithm's creators, and "dictionary" refers to the method of cataloging pieces of data. The system for arranging dictionaries varies, but it could be as simple as a numbered list.
Zlib Data Compression Library
The Zlib data-compression library is designed to be a free, general-purpose, legally unencumbered -- that is, not covered by any patents -- lossless data-compression library for use on virtually any computer hardware and operating system. The zlib compression method is a variant of the LZ77 compression method and its data format is itself portable across platforms.
Zlib was written by Jean-Loup Gailly (compression) and Mark Adler (decompression) . Jean-Loup is also the primary author/maintainer of gzip, the author of the compression FAQ list and the former maintainer of Info-ZIP's Zip; Mark is also the author of gzip's and UnZip's main decompression routines and was the original author of Zip. Not surprisingly, the compression algorithm used in zlib is essentially the same as that in gzip and Zip, namely, the "deflate" method that originated in PKWARE's PKZIP 2.x.
Using zlib with Delphi
The Delphi unit containing the zlib declarations and functions is zlibex.pas. (specifically the Delphi zlib 1.2.3.2009, the zlib version 1.2.3 for Delphi 5, 6, 7, 8, 2005, 2006, 2007 and 2009). It defines three classes:
- TCustomZStream - It is an abstract class which is the basis of the TZCompressionStream and TZDecompressionStream;
- TZCompressionStream - is a write-only sequential stream that compresses data when written to the stream;
- TZDecompressionStream - is a write-only unidirectional stream that decompresses data when written to the stream.
In practice, if ZStream is defined as a TZCompressionStream, the statement ZStream.CopyFrom(AStream, 0) compresses the content of AStream into ZStream and vice versa if ZStream is declared as a TZDecompressionStream.
Additionally, the zlib compression algorithm uses CRC (an acronym for "Cyclic Redundancy Check"), a non-secure hash function designed to detect accidental changes to raw computer data. A CRC-enabled algorithm calculates a short, fixed-length binary sequence, known as the CRC code or just CRC, for each block of data and sends or stores them both together. When a block is read, the calculation is repeated; if the new CRC does not match the one calculated earlier, then the block contains a data error.
The core component
The component is defined as a descendant of TComponent and its compression algorithm is the zlib compression library. Its objective is to take a sequence of files and compress each of them into a single archive file as is done by most backup programs. This archive file is comprised of a sequence of chunks that contains the compressed version of the file, each corresponding to a file of the list. This sequence is followed by a registry that contains a list of file records, one for each file in the list. This data record contains specifics of the chunk such as its position in the stream, its length, the date of the input file, and so on. The archive file is terminated by a termination record that gives the position of the registry in the stream.
With the component, it is possible to take a list of files and compress them into an archive file, and vice versa, restore the files from the archive file. We will now find out how to perform these tasks with the component.
How to backup?
There are three public methods that initiate the backup:
function TGtroBackup.BackupFromPattern(const Pattern: string;
Target: string; BackupMode: TBackupMode): Boolean;
function TGtroBackup.BackupFromPatterns(const Patterns: Tstrings;
Target: string; BackupMode: TBackupMode): Boolean;
function TGtroBackup.Backup(const FileList: TStrings;
ATarget: string; BackupMode: TBackupMode): Boolean;
All three methods require three arguments. Target is the name of the archive file (with path) and BackupMode is either complete (bmAll) or synchronized (bmNewer).
Inputting patterns
For BackupFromPattern() and BackupFromPatterns, the first argument is a pattern or a list of patterns. A pattern is simply a path to a directory of the shell followed by an optional suffix "/s" that implies that all files in all the subdirectories will be included in the enumeration.
When patterns are used as input to these two methods, an algotithm that visits the directory and its subdirectories (if the suffix "/s" is included) recursively. It uses the FindFirst(), FindNext() and FindClose() functions declared in sysutils.pas to enumerate the files that this or these directories contain. The file list thus produced is then presented as the first argument of the Backup() method.
Inputting a list of files
Once a list of the files to be backed up has been produced, the Backup() method is used as shown in Figure 1 below. The second argument of the Backup() method is called Target. It is the name and path of the file that will become the archive. The third argument is the backup mode: it can be Complete (bmAll) or Synchronized (bmNewer).
functionTGtroBackup.Backup(const FileList: TStrings; ATarget: string; BackupMode: TBackupMode): Boolean;
begin
...
Success:= BackupFiles(FileList, BackupMode);
DeleteFlag:= True;
if Success then // BackupFiles successfull?
begin
Result:= Success;
if FileExists(fTarget) then // if archive already exists
begin
DeleteFlag:= DeleteFile(fTarget); // delete it
...
end; // if FileExists(fTarget)
if DeleteFlag then // existing archive was deleted successfully
begin
if not RenameFile(TmpFileName, fTarget) then // TempFileName contains the new archive; rename it
ShowMessage(strCannotchange + TmpFileName + ' en ' + fTarget);
...
DeleteFile(TmpFileName); // delete the temporary archive
CloseArchive;
end // if DeleteFlag
else
Result:= False;
end
else
begin
if Assigned(FOnFileList) then FOnFileList(Self, True, strFailure);
Result:= False;
end;
end;
The private method BackupFiles(), highlighted in yellow above, is at the center of the backup process:
function TGtroBackup.BackupFiles(FileList: TStrings; Mode: TBackupMode): Boolean;
var
TempStream: TStream;
begin
...
TmpFileName:= GetTmpFileName; // Get the name of a temp file in "temp" directory
fArchiveStream:= TFileStream.Create(TmpFileName, fmCreate); // Create Output stream
... // code used by the synchronized backup only removed from listing
// Now compress files and archive them --- fill output stream with new chunks
fNumFiles:= FileList.Count;
if Assigned(FOnFileList) then
FOnFileList(Self, False, strCompressingFileList + IntToStr(FileList.Count) + ')');
TriggerProgressEvent(0, '');
fTotalSize:= FileList.Count;
fArchiveStream:= AddFiles(FileList, fArchiveStream);
if Assigned(FOnFileList) then FOnFileList(Self, True, strSavingRegistry);
Result:= fRegistry.SaveToStream(fArchiveStream, fCompressionLevel);
fArchiveStream.Free; // ???
end;
Now it is time to consider the differences in the process due to the selection of the Backup Mode. It can be either Complete or Synchronized.
Complete backup
This is the simplest way to backup files into the archive. It takes all the files of the list of files and compresses them all into the archive file selected by the user. The process is as follows:
- A temporary file stream is created;
- Each file of the list is compressed into a chunk and the chunks are stored sequentially in the temporary stream. This is performed by the AddFiles() method that follows:
- For each file, a file information record is appended to the Registry by the method fRegistry.Add(). Once all files have been compressed and their file information record appended to the Registry, the Registry is appended to the stream and the stream is terminated by the archive record. If it exists, the original archive file is deleted and the temporary stream identified to the new archive file.
function TGtroBackup.AddFiles(FileList: TStrings; const InStream: TStream): TStream;
var
FileInfo: pZLBFileRec; SuccessFlag: Boolean;
begin
while FileList.Count > 0 do
begin // for each file in FileList
if FileExists(FileList[0]) then
begin
FileInfo:= AllocMem(SizeOf(TZLBFileRec));
SuccessFlag:= CompressFileToStream(FileList[0], InStream, FileInfo);
fRunningTotal:= fTotalSize - FileList.Count;
TriggerProgressEvent(fRunningTotal, FileList[0]);
if SuccessFlag then
fRegistry.Add(FileInfo); // File information added to registry
FileList.Delete(0); // remove the file from FileList
end; // if FileExists
end; // while FileList.Count
Result:= InStream;
end;
specifically by the CompressToStream() method:
function TGtroBackup.CompressFileToStream(const FileName: string;
OutStream: TStream; FileInfo: pZLBFileRec): Boolean;
var
InFileStream, TmpStream: TStream; ZStream: TZCompressionStream;
MemStreamFlag: Boolean; TmpFile: string;
begin
Result:= False;
try
InFileStream:= TFileStream.Create(FileName, fmOpenRead + fmShareDenyWrite);
try
...
if InFileStream.Size <= (20 * (1024*1024)) then
begin // Use memory or file stream? modified 21 Apr 2006
TmpStream:= TMemoryStream.Create;
MemStreamFlag:= true;
TmpFile:= '';
end
else
begin
TmpFile:= GetTmpFileName;
TmpStream:= TFileStream.Create(TmpFile, fmCreate);
MemStreamFlag:= False;
end;
try // Compress data using TZCompressionStream
ZStream:= TZCompressionStream.Create(TmpStream, fCompressionLevel);
try
...
ZStream.CopyFrom(InFileStream, 0); // File is compressed here
finally
ZStream.Free;
end; // try...finally (ZStream)
... // Fill the FileInfo record
if TmpStream.Size >= InFileStream.Size then
begin // do not compress
...
OutStream.CopyFrom(InfileStream, 0); // Copy uncompressed data to OutFile
end // if TmpStrm
else
begin // compress
...
OutStream.CopyFrom(TmpStream, 0); // Copy compressed data to Outfile
end; // else
finally
TmpStream.Free;
...
end; // try...finally (TmpStrm)
finally
InFileStream.Free;
...
Result:= True;
end; // try...finally (InFile)
except // handles exceptions when the file is in use or locked by Windows
Result:= False;
ShowMessage('File ' + FileName + ' could not be opened and was skipped');
end;
end;
Important!
If the file cannot be opened because it is in use or locked by Windows, the file is skipped and the user notified of the event.
Synchronized backup
The objective of this backup mode is to backup only those files that have been created or modified since the last backup. As shown in the listing hereunder, the archive specified by the user is opened (it must exist) as a file stream and a registry object is created that contains all the file information records of the files in the archive.
if (Mode <> bmAll) and FileExists(fTarget) then // Synchronized backup
begin
TempStream:= OpenArchive(fTarget, True); // the stream created must not be freed
if TempStream <> nil then // if the temporary file exists
begin
FilterFileList(FileList); // Keep only the files created or modified; mark modified for deletion
fArchiveStream:= DeleteFiles(TempStream);
end;
end; // if (Mode <> bmAll)
The FilterFileList() method is called to filter out of the list of files all those files that are not going to be backed up. At the same time, the files that have been modified are marked for deletion from the archive (their registry record is checked). The DeleteFiles() method only performs garbage collection: it copies all the chunks and unchecked registry entries in a new stream from which all references to modified files have been deleted. At this point, the process reverts to the scheme used for the complete backup.
Structure of the archive file
I have made a lot of reference to the archive file. It has the structure shown in Figure 2. It is comprised of a sequence of chunks (a sequence of bytes correspondiing to the compressed file, one for each file), a registry containing the file information (one recors per file) and an Archive record terminating the file and containing global file attributes.
Each chunk contains the result of compressing each file of the backup. Corresponding to each chunk, there in a file information entry in the Registry that characterizes the information pertaining to each file: its name, its path, its age, its original size, its compressed size, the start point of its chunk in the archive file, its CRC, its status (compressd if it was, stored otherwise), its compression level and the Checked field. Finally, the archive record contains the number of files in the archive, the size of the Registry, the total size of the chunks and an indication of the compression of the Registry.
How to restore?
In order to restore, the archive file must first be opened as a stream. During the process the Registry is extracted and the remaining of the stream freed. At this point, the Registry becomes available for the calling program to display it and let the user select which files he wants to restore.
One way for the calling application to display the content of the registry is to use a TListView with the properties ViewStyle and CheckBoxes set to vsReport to vsReport and true respectively. This following snippet can be used for this display:
ListView.Clear;
ListView.Items.BeginUpdate;
try
RegList:= TList(Node.Data);
for i:= 0 to RegList.Count - 1 do
begin
Reg:= pZLBFileRec(RegList.Items[i]);
FA:= FileDateToDateTime(Reg^.Age);
LVItem := ListView.Items.Add;
LVItem.Caption:= Reg^.Name;
LVItem.SubItems.Add(FormatDateTime('dd-mm-yy hh:nn', FA));
LVItem.SubItems.Add(IntToStr(Reg^.start));
LVItem.SubItems.Add(IntToStr(Reg^.oSize));
LVItem.SubItems.Add(IntToStr(Reg^.cSize));
LVItem.SubItems.Add(FileStatusStr[Reg^.Status]);
LVItem.SubItems.Add(FileCompStr[Reg^.CompressionLevel]);
LVItem.SubItems.Add(IntToHex(Reg^.CRC, 8));
LVItem.Checked:= Reg^.Checked;
LVItem.Data:= Pointer(Reg);
end; // for i
finally
ListView.Items.EndUpdate;
end; // try...finally}
This display let the user choose which files are to be restored by checking the appropriate checkboxes on the display. Once this selection has been made, the calling application calls the Restore() public method of the component.
function TGtroBackup.Restore(ArchiveFileName, NewPath: string; RestoreMode: TRestoreMode): Boolean;
var
i: Integer; Item: pZLBFileRec; NumRestored: Integer; fNewPath: string;
begin
...
for i:= 0 to Registry.Count - 1 do // scan the registry
begin
Item:= Registry.Items[i];
if Item.Checked {or not RegOpen} then // if the item is checked
begin // if RegOpen is false, all files are restored
Inc(NumRestored);
if Assigned(FOnReport) then FOnReport(Self, True, strExtractingFile +
Item^.Path + Item^.Name + ' [' + IntToStr(Item^.oSize) + ']');
Extract(i, NewPath, RestoreMode); // extract it
end; // if Item.Checked or not RegOpen
end; // for i:= 0 to Registry.Count - 1
...
Result:= True;
end;
Restore() requires three arguments:
- ArchiveFile - the name of the archive file from which the user selects files to restore;
- CopyPath - the path where the restored files are copied. If set to an empty string, the files are restored in their original location; if not, files are copied in a directory selected by the user;
- RestoreMode - a TRestoreMode type used to define how the files will be restored.
Highlighted in yellow, above, the Extract() method is called to fetch the chunk corresponding to the file information of the registry and decompress it.
procedure TGtroBackup.Extract(Index: Integer; ATarget: string);
// Extracts file[i] from the archive, decompress and restore it to ATarget
var
OutFile: TFileStream; FileInfo: pZLBFileRec; Target: string;
begin
...
if Index <> -1 then
begin
FileInfo:= Registry.Items[Index]; // get the file information of the file from the registry
if ATarget <> '' then
Target:= FileInfo^.Path + FileInfo^.Name // copy mode
else Target:= FileInfo^.Path + FileInfo^.Name; // restore mode
if DoRestore(Target, FileInfo) then
begin
ForceDirectories(ExtractFilePath(Target));
OutFile:= TFileStream.Create(Target, fmCreate);
try
ExtractFileFromStream(Index, OutFile);
FileSetDate(OutFile.Handle, FileInfo^.Age);
finally
OutFile.Free;
end; // try...finally
end; // if DoRestore
end; // if Index
fProgressIndex:= 0;
end;
Two methods are called in this method: DoRestore() and ExtractFileFromStream(). DoRestore() simply applies the RestoreMode selected by the user whereas ExtrackFileFromStream() locates the chunk associated with the file to be restored, decompresses it (if it was) and restore it in its original location.
How to copy?
Copying files is a bit more complex than restoring them. During the restoration process, the Registry already contain the path and the name of the files and the files are restored in this original position in the shell. Copying them requires that these coordinates be modified. First, a common path and the location where the files are to be copied need to be determined.
FindCommonPath()
The method FindCommonPath() calculated the path that is common to all the files that are selected for the copy operation. Its code is as follows:
function TRegistry.FindCommonPath: string;
// used to find the common path of checked files of the registry
var
i, j, Index: Integer; Path, CommonPath: string; Paths: TStringList; Stop: Boolean;
begin
...
Paths:= TStringList.Create; // creates a string list to contain the paths
try
j:= 1; // initial setting for Posex()
repeat
Paths.Clear; // empty the stringlist
for i:= 0 to Count - 1 do // scan the registry
begin
if Items[i].Checked then // considers only checked file information records
begin
Index:= Posex('\', Items[i].Path, j); // find jth position of "\"
Path:= Copy(Items[i].Path, 0, Index); // path to "\"
Paths.Add(Path); // add to stringlist
if (i > 0) and (Path <> Paths[0]) or (Index = 0) then
begin
Result:= CommonPath;
Stop:= True;
end; // if (i > 0) and (Path <> Paths[0]) or (Index = 0)
end; // if Items[i].Checked
end; // for i:= 0 to Count - 1
j:= Index+1; // update j for Posex()
if Paths.Count > 0 then
CommonPath:= Paths[0]
else
begin // no file seleted
Result:= 'No file selected';
Stop:= True;
end; // if Paths.Count > 0
until Stop;
finally
Paths.Free;
end; // try...finally
end;
Where to copy
The way to determine the location where the files are to be copied is not implemented in the component but a simple call to the function SelectDirectory() declared in FileCtrl.pas is recommended.
A Test Program
The test program that I have developed is called the GTRO_Backup program. It provides en execution environment and uses the TgtroBackup component to perform the backup and restore operations.
Backing up
Figure 4 shows how the program presents itself to the user (the backup page is activated).
The visual component on the left of the Figure is a TgtroCheckShellTreeView component described on this Web site and on About.com/Delphi Programming. The user of the program can select files to be backed up by checking nodes in the component. Checking a collapsed node selects all the files in the disectory as well as all the files in its subdirectories whereas checking an expanded one selects only the files contained in the directory. The component uses a process similar to the one used to expand the list of files within the component and display the list of the selected files on the right hand side visual component.
Once such selection has been performed, the user can save the patterns corresponding to his selection to a text file by clicking the "Save patterns to File" button. He can use this text file later on to backup the same selection by clicking the "Load patterns from File" button and opening this text file. Once this is done, the TGtroCheckShellTreeView component will be updated to show his selection.
Additionally, the user can backup the selected files to an archive file with the extension .zlb by clicking on the "Backup" button. The user is then invited to select the archive file. Prior to doing this, he must determine if the backup will be complete or synchronized. Additionally, he can select the compression level (Normal is the default value).
Restoring/Copying
Figure 5 shows the form of the program when the Restore page is activated and an archive file has been opened.
When the Restore page is activated, no archive is open and the display shown in Figure 5 is empty. The user must open an archive file and the archive is opened as shown above in a Windows Explorer-like display. The left hand side component is a TgtroCheckRegTreeview component that displays the directories contained in the archive file as a tree. When a node is selected, its content is displayed in the right-hand side TListview component (with Checkboxes). Then, the user can select complete directories by clicking checkboxes in the tree or by selecting individual files from the TListView component.
Once this is done, the user can either restore the files (clicking on the "Restore" button) to their original location or copy them (by clicking on the "Copy" button) to a new location.
Conclusion
This component has been developed as part of the development of a backup program where it is used to do the bulk of the backup and restore tasks. The latest zlib compression algorithm that performs compression and decompression is used without any modification in the component.
The test program is called GTRO_Backup.exe. It is a utility program that does not require any installation. You just drop it in a directory of your choice and execute it.
The source code of the component and the test program as well as the components needed for the test program can be downloaded from here. The gtroBackup.zip file contains one main directory (gtrobackup) and three sub-directories:
- gtroBackup contains the code of component and its sub-directory, Delphi zlib 1,2,3, 2009, contains the ZlibEx.pas unit and all it needs to operate;
- Gtro_Backup contains the code of the test program as well as its executable GTRO_Backup.exe ready to go; and
- CheckTreeView contains the components TgtroCheckShellTreeView and TgtroCheckTreeView that are needed to operate the test program.
Before the test program can be compiled and executed, the TgtroBackup (GTroBackup.pas in the gtroBackup directory), TgtroCheckShellTreeView (GTroCheckShellTreeView.pas in the CheckTreeView directory) and TgtroCheckTreeview (GTroCheckTreeView.pas in the CheckTreeView directory) components need to be put in a design package and installed.
This test program can be very useful for limited backups such as that of user data, music and videos but it has limitations. It cannot backup any file that is in use or locked by Windows when the backup is performed. Such file is automatically skipped and the user is notified when this happens. Correcting this would require using the Volume Shadow Copy Service as Microsoft Backup programs do but it appeared too complicated for a test program.