In the late 90s, I developed a personal information manager that I am still using. Until now, I was satisfied with it, but I realized that adding a spell and grammar checker, by automating MS Word to this end, would be a real improvement. This article provides a short overview of OLE Automation and COM, describes how Word can be automated, and details the approach and the process used to implement this capability.
Among other inputs, the Personal Information Manager contains an editor that the user utilizes to input his text and format it in a limited way (bold, italic, underline, font color and bullet). One helper was missing: spell and grammar check.
The personal information manager saves its data in a table in a Paradox database. The field holding the data is an fmtMemo that can accommodate Rich Text Format (RTF) content. The editor is a data-aware TFrame component (fully described in "A data-aware framed editor") that allows the user to write his text in a TDBRichEdit component.
The project is to automate MS Word: when the button is clicked, the content of the page is transferred to MS Word, where it is spell and grammar checked. The corrected page is then transferred back to the editor to replace the original page, and the cursor is placed at the end of the page, ready for the user to continue writing.
The following sections show how this spell and grammar checking capability was implemented by automating MS Word from Delphi 2009. I will start with a definition of spell and grammar checkers, follow with a general overview of Automation, and then give more details on how this project was achieved.
In computing, a spell checker (or spell check) is an application program that flags words in a document that may not be spelled correctly. Spell check mainly focuses on the correctness or proper way of how a word is spelled whereas, grammar check (or grammar checker) is an application program that flags sentences or the structure of the sentence and how it is written or typed. It usually involves proper punctuation, word agreements and proper use and placement of nouns, adjectives, adverbs, verbs, etc. It points out why the sentence or sentences is in disagreement according to language rules and format. It basically focuses on the structure of a sentence [from What is the difference between spelling and grammar check?].
There are several such applications available on the Web but, since I had Office 2010 on my computer, I decided to use MS Word's spell and grammar checker through automation.
Overview of automation
A certain confusion surrounds the Component Object Model (COM) technology because Microsoft has used different names for it for marketing reasons. It started with Object Linking and Embedding (OLE), an extension of the Dynamic Data Exchange (DDE) model. Later on, Microsoft updated OLE to OLE2 and started to add new capabilities such as OLE Automation and OLE Controls. The Windows 95 shell and subsequent versions of Windows were then built using OLE technology and interfaces, and OLE Controls were renamed ActiveX controls after the specification was lightened.
Under Windows, Automation is an inter-process communication mechanism based on a subset of the Component Object Model. It is a convention by which one application can control another. The controlling application is referred to as the automation client, and the one being controlled is referred to as the automation server. The client manipulates the server application's components by accessing those components' properties and methods.
In fact, with COM, applications communicate with one another through interfaces that form a hierarchy at the root of which one finds the IUnknown interface. An interface can be described as an object of which only the declaration part (the interface section) is available. It looks like a type definition in the interface section of a Delphi program, save for the presence of a GUID (a 128-bit globally unique identifier), the safecall directive after every function declaration and the lack of implementation.
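To make the description above concrete, here is a minimal sketch of what such a declaration can look like in Delphi. The interface name ISpellService, its members and the GUID are all hypothetical and chosen only for illustration; they are not part of Word's type library.

```delphi
unit SpellServiceIntf;

interface

type
  // Hypothetical interface, for illustration only; it does not exist in Word.
  ISpellService = interface(IUnknown)
    ['{6F1B2C3D-4E5F-4A6B-8C7D-9E0F1A2B3C4D}'] // made-up GUID identifying the interface
    // Declarations only: the bodies live in whatever class implements the interface.
    function CheckWord(const AWord: WideString): WordBool; safecall;
    function Get_Language: Integer; safecall;
    property Language: Integer read Get_Language;
  end;

implementation

end.
```

A client that holds an ISpellService reference can call CheckWord without knowing anything about the class behind it, which is the point of programming against interfaces.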
A COM object is an instance of a class that implements one or more COM interfaces. It provides services as defined by its interfaces. Early and late binding, when applied to a COM object, refers to how the client code uses the automation interface (i.e., IDispatch) of the COM object.
- Early binding to a COM object refers to the use of the automation interface with foreknowledge of the interfaces being accessed. This foreknowledge is typically gained by the client having access, at build-time, to the type-library information provided by the server so that it is written into the compiled code. This way, at run-time, the client makes calls knowing precisely what to request. Because of this, early binding is significantly faster than late binding.
- Late binding refers to the discovery, at run-time, of the function offsets into the dispatch interface being accessed through the automation interface. This discovery is made by an extra function call to IDispatch's GetIDsOfNames method. The implementation of the IDispatch interface, in the server, looks up the offset for the requested name and returns that offset to the client. This allows the client to be built without complete knowledge of the server's dispatch interface; however, the extra function calls to GetIDsOfNames make it much slower.
Their advantages and disadvantages are described in the table below.
Approach | Advantages | Disadvantages |
---|---|---|
Early binding through interfaces | Resulting application is faster. Code is cleaner. Makes maximum use of compiler's ability to check references. | Have to use Type Library with Delphi. |
Late binding: the controller's variables are declared as OLE Variants, and each method or property reference is resolved at run time. | No need to work with Type Libraries. | Compiler can't check your references (a frequent source of bugs); they won't be checked until runtime. In addition, resolving references at runtime is slow. |
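For comparison with the early-binding code used later in this article, the following is a minimal sketch of the late-binding approach, assuming Word is installed and registered under the ProgID 'Word.Application'. The program name is hypothetical. No type library unit is needed: the controller is an OleVariant, and every property or method name is resolved at run time.

```delphi
program LateBindingDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, ActiveX, ComObj, Variants;

var
  Word: OleVariant; // late-bound controller: the compiler cannot check member names

begin
  CoInitialize(nil); // a console program has to initialize COM itself
  try
    Word := CreateOleObject('Word.Application'); // raises EOleSysError if the ProgID is unknown
    try
      Word.Visible := False;                         // name resolved at run time
      Writeln('Word version: ', VarToStr(Word.Version));
    finally
      Word.Quit;          // close the Word instance that was started
      Word := Unassigned; // release the reference before COM is uninitialized
    end;
  finally
    CoUninitialize;
  end;
end.
```

A misspelled member name such as Word.Visibel would compile without complaint and fail only when the line executes, which is the main drawback listed in the table.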
Using Word as the server
By using Word Automation (the server), we can use Delphi (the client) to dynamically create a new document, add some text we want to spell check, and then have Word check the spelling. If we keep Microsoft Word minimized, users might never know!
Delphi can be used to fully control virtually all the features of MS Word. There is very little that can be done from inside that cannot also be automated from outside. In other words, Word can be fully controlled from Delphi applications using OLE Automation.
The COM interface exposed by MS Word offers a number of mechanisms for using its spelling engine.
- The Word.Application object can call the spelling engine for a single word or string. The return value is a Boolean indicating True (no errors) or False (errors).
- The Document object contained in the Word.Application object exposes all the functionality of MS Word. In particular, the MS Word dialog boxes for spelling and grammar checking can be used.
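The first mechanism can be sketched as follows, using late binding for brevity. This is only an illustration under assumptions: the program name and the test word are hypothetical, and the empty document is added as a precaution because some Word versions are reported to need an open document before proofing calls succeed.

```delphi
program CheckWordDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils, ActiveX, ComObj, Variants;

var
  Word: OleVariant;
  TestWord: WideString;
  IsCorrect: Boolean;

begin
  CoInitialize(nil);
  try
    Word := CreateOleObject('Word.Application');
    try
      Word.Visible := False;
      Word.Documents.Add;                      // precaution: have one document open
      TestWord := 'recieve';                   // hypothetical, deliberately misspelled
      IsCorrect := Word.CheckSpelling(TestWord); // True means no spelling error was found
      if IsCorrect then
        Writeln(TestWord, ': no error found')
      else
        Writeln(TestWord, ': flagged by the spelling engine');
    finally
      Word.Quit(0);       // 0 = wdDoNotSaveChanges
      Word := Unassigned; // release the reference before COM is uninitialized
    end;
  finally
    CoUninitialize;
  end;
end.
```

This route returns only a yes/no answer. The dialog-based route through the Document object, which is the one used in the rest of this article, also lets the user pick replacements and covers grammar.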
Delphi 2009 has simplified the use of Microsoft Office applications by pre-installing some ready-to-use components that wrap the automation interface for Office 2000 and expose the same properties, methods and events as the COM interface. These components are available in the Servers page of the Component Palette. They can also be created dynamically in code.
Implementation
I will automate MS Word from my application in order to check the spelling and the grammar and give a list of suggested correct replacements. The text to be spell and grammar checked is located in a Paradox database and displayed in a TDBRichEdit component by the application. It has to be transferred to the Word spelling engine, be processed there and then returned to its source while preserving its RTF-formatted content.
First, I had to select between possible options. I chose early binding because it is faster. I found out that the RTF-formatted content of the source (the TDBRichEdit component) could be transferred to the engine and returned to its component using the clipboard as a communication channel.
The required functionality could have been fully implemented as a procedure inserted in the code of the application. I decided otherwise. The functionality would be wrapped into a Delphi class that I called TGtroWordSpeller. If the code is to be used in many instances, it is preferable that it be a class since modifications and bug correction will then be centralized.
The early-binding code
With the Word2000.pas unit listed in the uses clause of the interface section, it is fairly easy to create an instance of the class, an object. Look at this:
uses Word2000, ...
var
  FWordApp: TWordApplication;
  FWordDoc: _Document;
procedure TGtroWordSpeller.Connect;
begin
  FWordApp := TWordApplication.Create(Application); // create Word application
  ...
  FWordDoc:= FWordApp.Documents.Add(EmptyParam, EmptyParam, EmptyParam, EmptyParam); // create Word document
  ...
end;
It creates the Word Application and generates a Word Document. Obviously, Word must be installed on the computer for this to work.
Transfer of the content
There seems to be no way to transfer the RTF-formatted content of a TRichEdit or a TDBRichEdit component to Word programmatically except through the clipboard. What is available from the Text or the Lines members of these components is plain text. I have tried using streams, but the Range type in Word has neither a LoadFromStream nor a SaveToStream method. As a consequence, I used the clipboard and planned to proceed as follows:
1. Connect to Word.App, create WordDoc and configure the connection;
2. Copy the RTF content of the DBRichEdit component to the clipboard;
3. Paste into WordDoc.Content, a Range type;
4. Do the spell and grammar check and modify WordDoc.Content accordingly;
5. Copy WordDoc.Content to the clipboard;
6. Paste back to DBRichEdit;
7. Disconnect from Word.
In the code, FText is the TDBRichEdit component and FWordDoc is the Word document. In this code, the RTF-formatted text held by FText is copied to the clipboard and pasted into the Word document.
FText.SelectAll; // select all the content of TDBRichEdit
FText.CopyToClipboard; // Copy it to the clipboard
FWordDoc.Content.Paste; // Paste content of clipboard in Word document
After the checking, the still RTF-formatted text is copied back to the clipboard and pasted back to FText.
FWordDoc.Content.Copy; // Copy corrected text to clipboard
FText.PasteFromClipboard; // Paste back to Target
and it works!
The following figure illustrates the process
Call to the spelling engine
When a user clicks on the button of the editor, the application gets connected with Word and the following code is executed:
function TGtroWordSpeller.CheckSpelling(Text: TDBRichEdit; LanguageID: TOleEnum): Boolean;
var
  Cursor: TCursor;
begin
  Result:= False;
  Cursor:= Screen.Cursor;
  Screen.Cursor:= crHourGlass;
  FLanguageID:= LanguageID;
  FText:= Text;
  FText.DataSource.DataSet.Edit; // Puts FText in "Edit" mode
  FText.SelectAll; // Select all the content of the component
  FText.CopyToClipboard; // Copy it to the clipboard
  try
    Connect;
    CreateWordDoc; // Create new empty document and configure
    FWordDoc.Content.Paste; // Paste content of clipboard in Word document
    // Call Word's spell/grammar check dialog
    if FWordApp.Dialogs.Item(wdDialogToolsSpellingAndGrammar).Show(EmptyParam) <> 0 then
    begin
      FWordDoc.Content.Copy; // Copy corrected text to clipboard
      FText.PasteFromClipboard; // Paste back to the component
      Result:= True;
    end;
  finally
    Disconnect;
    Screen.Cursor:= Cursor; // Restore the cursor saved on entry
  end;
end;
With the call to the Word spell and grammar dialog, Word takes full control of the process and the user cannot intervene before the verification is terminated unless the "Cancel" button is pressed.
Code of the component
The code of the component is listed hereunder:
unit gtrowordspeller;
{*******************************************************************************
* Unit gtrowordspeller.pas containing the TGtroWordSpeller class - 10 June 2013*
* developed by Georges Trottier, www.gtro.com                                  *
*******************************************************************************}
interface
uses Windows, SysUtils, Classes, ActnList, Forms, StdCtrls, Dialogs, StdActns, ActnRes, Messages, ComObj,
  ActiveX, Variants, Word2000, DBCtrls, OleServer, Comctrls, Controls, ClipBrd;
const
  MSWordWndClass = 'OpusApp';
  MSDialogWndClass2000 = 'bosa_sdm_Microsoft Word 9.0';
  MSDialogWndClass97 = 'bosa_sdm_Microsoft Word 8.0';
  DEBUG = False;
type TGtroWordSpeller = class (TComponent)
  private
    FConnected: Boolean;
    FText: TDBRichEdit;
    FWordApp: TWordApplication;
    FWordDoc: _Document;
    FWordVersion, FWordDialogClass: string;
    Handle: HWND;
    FLanguageID: TOleEnum;
    procedure Connect;
    procedure Disconnect;
    procedure CreateWordDoc;
  protected
  public
    constructor Create(AOwner: TComponent); override;
    destructor Destroy; override;
    function CheckSpelling(Text: TDBRichEdit; LanguageID: TOleEnum): Boolean;
    function WordInstalled: Boolean;
  end;
implementation
{ TGtroWordSpeller }
function TGtroWordSpeller.CheckSpelling(Text: TDBRichEdit; LanguageID: TOleEnum): Boolean;
var
  Cursor: TCursor;
begin
  Result:= False;
  Cursor:= Screen.Cursor;
  Screen.Cursor:= crHourGlass;
  FLanguageID:= LanguageID;
  FText:= Text;
  FText.DataSource.DataSet.Edit; // Puts FText in "Edit" mode
  FText.SelectAll; // select all the content of Target
  FText.CopyToClipboard; // Copy content to clipboard
  try
    Connect;
    CreateWordDoc; // Create new empty document and configure
    FWordDoc.Content.Paste; // Paste content of clipboard in Word document
    // Call Word's spell/grammar check dialog
    if FWordApp.Dialogs.Item(wdDialogToolsSpellingAndGrammar).Show(EmptyParam) <> 0 then
    begin
      FWordDoc.Content.Copy; // Copy corrected text to clipboard
      FText.PasteFromClipboard; // Paste back to Target the traditional way
      // Garbage collection - added after the "insert lines" bug was investigated
      FText.SelectAll; // select all
      FText.SelStart:= FText.SelLength - 3; // start new selection
      FText.SelLength:= 100; // extend it to the end
      FText.SelAttributes.Style:= []; // remove all formatting
      FText.SelText:= ''; // replace the selection by ''
      Result:= True;
    end;
  finally
    Disconnect;
    Screen.Cursor:= Cursor;
  end;
end;
procedure TGtroWordSpeller.Connect;
begin
  if FConnected then Exit; // don't create two instances
  FWordApp := TWordApplication.Create(Application); // create Word application
  FWordApp.ConnectKind := ckNewInstance; // define kind of connection
  FWordApp.Connect; // connect to Word
  FWordApp.WindowState := $00000002; // minimise
  FWordApp.Visible := False; // and default to NOT Visible
  FWordApp.ScreenUpdating := False; // speed up winword's processing
  FWordVersion := FWordApp.Version;
  if FWordVersion[1] = '9' then
    FWordDialogClass := MSDialogWndClass2000
  else
    FWordDialogClass := MSDialogWndClass97;
  FConnected:= True;
end;
constructor TGtroWordSpeller.Create(AOwner: TComponent);
begin
  inherited Create(AOwner);
  FConnected:= False;
end;
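destructor TGtroWordSpeller.Destroy;
begin
  FConnected:= False;
  inherited Destroy; // let TComponent release its own resources
end;
procedure TGtroWordSpeller.Disconnect;
var
  SaveChanges: OleVariant;
begin
  if not FConnected then Exit; // nothing to disconnect
  SaveChanges:= wdDoNotSaveChanges;
  FWordDoc:= nil; // release the document interface
  FWordApp.Quit(SaveChanges); // quit Word without saving the document
  FreeAndNil(FWordApp); // free the wrapper component
  FConnected:= False; // allow a later call to Connect
end;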
function TGtroWordSpeller.WordInstalled: Boolean;
var
  TestApp: TWordApplication;
begin
  Result:= True;
  try
    TestApp:= TWordApplication.Create(nil); // temporary instance, freed right away
    TestApp.Free;
  except
    Result:= False; // report the failure to the caller instead of aborting
    ShowMessage('Erreur!'); // "Error!"
  end;
end;
procedure TGtroWordSpeller.CreateWordDoc;
var
  s: string;
begin
  FWordDoc:= FWordApp.Documents.Add(EmptyParam, EmptyParam, EmptyParam, EmptyParam);
  s := FWordDoc.Name + ' - ' + FWordApp.Name;
  Handle := FindWindow(MSWordWndClass, PChar(s)); // find Word's main window by its caption
  SetWindowPos(Handle, HWND_TOPMOST, 0, 0, 0, 0, SWP_NOACTIVATE + SWP_HIDEWINDOW); // dialog always on top
  FWordDoc.Content.LanguageID := FLanguageID; // use the proofing language requested by the caller
end;
end.
It is a component called TGtroWordSpeller that can be used dynamically in this way (FSpeller: TGtroWordSpeller):
FSpeller:= TGtroWordSpeller.Create(nil);
FWordInstalled:= FSpeller.WordInstalled; // FWordInstalled: Boolean;
FSpeller.Free;
in order to test whether MS Word is installed on the computer. The function returns True if Word is installed, False otherwise. It allows the application to disable all the buttons and menu items that relate to the spell/grammar checker.
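As a complement to the test above, a common way to detect Word without creating any object is to ask COM whether the ProgID 'Word.Application' is registered. The sketch below is an alternative technique, not the one used in the component; the unit name WordDetect and the function name IsWordRegistered are hypothetical. Note that a registered ProgID only shows that Word was installed at some point, not that it will start successfully.

```delphi
unit WordDetect;

interface

// Returns True when the ProgID 'Word.Application' is known to COM on this machine.
function IsWordRegistered: Boolean;

implementation

uses
  Windows, ActiveX;

function IsWordRegistered: Boolean;
var
  ClassID: TCLSID;
begin
  // CLSIDFromProgID looks the ProgID up in the registry and does not launch Word
  Result := Succeeded(CLSIDFromProgID('Word.Application', ClassID));
end;

end.
```

The result can be used in the same way as FWordInstalled, for instance to enable or disable the spell-check button when the form is created.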
Given that Word is installed, the spell/grammar checker is started by inserting this code in the button's click event handler:
FSpeller:= TGtroWordSpeller.Create(nil);
try
  FSpeller.CheckSpelling(DBRichEdit, SpellLanguage);
finally
  FSpeller.Free; // free the speller even if the check raises an exception
end;
where DBRichEdit is the name of the TDBRichEdit component whose text we want to spell and grammar check. SpellLanguage is the code of the proofing language to be used by Word (French and English are implemented). The text of the source component is then selected, transferred to Word through the clipboard, spell and grammar checked by Word (a dialog appears for each error) and returned to the source component through the clipboard.
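How SpellLanguage gets its value is not shown in this article. One possible sketch is given below: the unit name, the constant names and the function SpellLanguageFor are hypothetical, and the numeric values are the standard Windows language identifiers for English (United States) and French (Canada), which I assume correspond to the language IDs that Word expects.

```delphi
unit SpellLanguages;

interface

uses
  ActiveX; // declares TOleEnum

const
  // Hypothetical names; the values are standard Windows language identifiers
  LangEnglishUS    = $0409; // English (United States)
  LangFrenchCanada = $0C0C; // French (Canada)

// Returns the language ID to pass to CheckSpelling.
function SpellLanguageFor(UseFrench: Boolean): TOleEnum;

implementation

function SpellLanguageFor(UseFrench: Boolean): TOleEnum;
begin
  if UseFrench then
    Result := LangFrenchCanada
  else
    Result := LangEnglishUS;
end;

end.
```

With this helper, the call could be written as FSpeller.CheckSpelling(DBRichEdit, SpellLanguageFor(True)) to check a French text.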
Garbage collection
The code performs the spell and grammar check correctly, but something bothered me when I noticed that the process appended empty new lines to the text retrieved from Word. At first, I thought that it was a bug but, from interactions on the Web, I soon found out that it was the normal behaviour of the component.
The problem is illustrated in the figure next to this paragraph. In this test program, I use two RichEdit components. In RichEdit1, I generate three formatted lines programmatically, select all the content, copy it to the clipboard and paste it into RichEdit2. In a normal display, only the three lines show, but the problem becomes evident when I select the content of both components (as shown in the figure): there are trailing empty lines at the end of the text in both components, one in RichEdit1 and two in RichEdit2.
Something similar occurred in my editor after every spell/grammar check. From inspection, the new lines seemed to be structured like this: a paragraph mark/no text + a paragraph mark + some formatting codes. It could be normal behaviour but, in the context of my application, it was garbage.
Since I could do nothing about the display of this garbage in the TDBRichEdit component, I decided to perform garbage collection as follows:
// Garbage collection
FText.SelectAll; // select all
FText.SelStart:= FText.SelLength - 3; // start new selection
FText.SelLength:= 100; // extend it to the end
FText.SelAttributes.Style:= []; // remove all formatting
FText.SelText:= ''; // replace the selection by ''
This code is executed once the text has been retrieved from Word through the clipboard. The whole text is selected, and the length of that selection is used to start a new selection near the end of the text; the requested length of 100 is more than what remains, so the selection simply extends to the end. The formatting is removed from the selection and the selection is replaced by an empty string. The value of 3 in the assignment to SelStart is the result of trial and error.
With this code, the process puts the cursor at the end of the last line of the editor.
Conclusion
Now, with this code, I can verify the grammar and the spelling of the RTF-formatted text that I write in my application. The component is configured specifically for the TDBRichEdit component that I use but it can be used for TRichEdit components if the type of the source is changed in the code.
This article results from merging the ideas and code presented in Utilisation du correcteur orthographique de Word and Using MS Word as a Spelling and Grammar Checker for Delphi. The first article uses early binding and calls the Word spell and grammar check dialog, whereas the second shows how to transfer RTF-formatted text from a TRichEdit component to the Word spell and grammar checker. My debt to both is gratefully acknowledged.