A spell and grammar checker using Word

In the late 90s, I developed a personal information manager that I am still using. Until now I was satisfied with it, but I realized that a spell and grammar checker, obtained by automating MS Word, would be a real improvement. This article provides a short overview of OLE Automation and COM, describes how Word can be automated, and details the approach and the process used to implement this capability.

Among other inputs, the personal information manager contains an editor in which the user enters text and formats it in a limited way (bold, italic, underline, font color and bullets). One helper was missing: a spell and grammar check.

The personal information manager saves its data in a table of a Paradox database. The field holding the data is an fmtMemo field that can hold Rich Text Format (RTF) content. The editor is a data-aware TFrame component (fully described in "A data-aware framed editor") that lets the user write text in a TDBRichEdit component.

The project is to automate MS Word: when the aaa button is clicked, the content of the page is transferred to MS Word, where it is spell and grammar checked. The corrected page is then transferred back to the editor, where it replaces the original page, and the cursor is placed at the end of the page, ready for the user to continue writing.

View of the application with an expanded view of the framed editor

The following sections show how this spell and grammar checking capability was implemented by automating MS Word from Delphi 2009. I will start with a definition of spell and grammar checkers, follow with a general overview of Automation, and then give more details on how this project was carried out.

In computing, a spell checker (or spell check) is an application program that flags words in a document that may not be spelled correctly. A spell check focuses on whether each word is spelled correctly, whereas a grammar checker (or grammar check) is an application program that flags sentences according to their structure and the way they are written. It usually covers punctuation, word agreement, and the proper use and placement of nouns, adjectives, adverbs, verbs, etc. It points out why a sentence disagrees with the rules of the language. It basically focuses on the structure of a sentence [from What is the difference between spelling and grammar check?].

There are several such applications available on the Web but, since I had Office 2010 on my computer, I decided to use MS Word's spell and grammar checker through automation.

Overview of automation

A certain confusion surrounds the Component Object Model (COM) technology because Microsoft has used different names for it for marketing reasons. It started with Object Linking and Embedding (OLE), an extension of the Dynamic Data Exchange (DDE) model. Later on, Microsoft updated OLE to OLE2 and started to add new capabilities such as OLE Automation and OLE Controls. The Windows 95 shell and subsequent versions of Windows were then built using OLE technology and interfaces, and OLE Controls were renamed ActiveX controls after the specification was lightened.

Under Windows, Automation is an inter-process communication mechanism based on a subset of the Component Object Model. It is a convention by which one application can control another. The controlling application is referred to as the automation client, and the one being controlled is referred to as the automation server. The client manipulates the server application's components by accessing those components' properties and methods.

In fact, with COM, applications communicate with one another through interfaces that form a hierarchy, at the root of which one finds the IUnknown interface. An interface can be described as an object of which only the declaration part is available. It looks like a type definition in the interface section of a Delphi unit, save for the presence of a GUID (a 128-bit globally unique identifier), the safecall directive after every method declaration, and the lack of implementation.
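As an illustration of this description, here is a minimal sketch of what such a declaration can look like in Delphi. The interface ISpellChecker, its members and its GUID are hypothetical and were invented for this example; they are not part of Word or of the component presented in this article.

```pascal
unit SpellIntf;

interface

type
  // Hypothetical interface, for illustration only: declarations, no implementation
  ISpellChecker = interface(IDispatch)
    ['{8F2C1B6A-3D4E-4F5A-9B7C-1A2B3C4D5E6F}'] // made-up GUID identifying the interface
    function CheckWord(const AWord: WideString): WordBool; safecall;
    function Get_LanguageID: Integer; safecall;
    property LanguageID: Integer read Get_LanguageID;
  end;

implementation

end.
```

IDispatch itself descends from IUnknown, which is how every interface ends up in the hierarchy mentioned above.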

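A COM object is an instance of a class that implements one or more COM interfaces. It provides services as defined by its interfaces. Early and late binding, when applied to a COM object, refer to how the client code reaches the members of the object: early binding calls the typed interfaces described in a type library and is resolved at compile time, whereas late binding goes through the automation interface (i.e., IDispatch) and is resolved at run time.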

Their advantages and disadvantages are described in the table below.

Early binding (through interfaces)
  Advantages: the resulting application is faster; the code is cleaner; it makes maximum use of the compiler's ability to check references.
  Disadvantage: a type library has to be used with Delphi.

Late binding (the controller's variables are declared as OLE Variants, and each method or property reference is resolved at run time)
  Advantage: no need to work with type libraries.
  Disadvantages: the compiler cannot check your references, a frequent source of bugs that are not detected until run time; in addition, resolving references at run time is slow.
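To make the late-binding case concrete, here is a small console sketch that is not part of the component described in this article. It assumes that Word is installed; the program name LateBindingDemo and the procedure ShowWordVersion are names chosen for this example only.

```pascal
program LateBindingDemo;

{$APPTYPE CONSOLE}

uses
  SysUtils, Variants, ActiveX, ComObj;

procedure ShowWordVersion;
var
  WordApp: OleVariant; // late bound: member names are resolved at run time
begin
  // Raises an exception if Word is not registered on the computer
  WordApp := CreateOleObject('Word.Application');
  try
    // The compiler cannot verify that "Version" exists;
    // a misspelled name would only fail when this line runs.
    Writeln('Word version: ', VarToStr(WordApp.Version));
  finally
    WordApp.Quit; // close the Word instance started above
  end;
end; // WordApp is released here, before CoUninitialize is called

begin
  CoInitialize(nil); // a console program must initialize COM itself
  try
    try
      ShowWordVersion;
    except
      on E: Exception do
        Writeln('Automation failed: ', E.Message);
    end;
  finally
    CoUninitialize;
  end;
end.
```

With early binding, the same property would be read through the typed wrapper (for instance FWordApp.Version on a TWordApplication), and a misspelled member name would be rejected by the compiler.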

Using Word as the server

By using Word as the automation server, we can use Delphi (the client) to dynamically create a new document, add the text we want to spell check, and then have Word check the spelling. If we keep Microsoft Word minimized, users might never know!

Delphi can be used to control virtually all the features of MS Word. There is very little that can be done from inside Word that cannot also be automated from outside, from a Delphi application using OLE Automation.

The COM interface exposed by MS Word offers a number of mechanisms for using its spelling engine.
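One of these mechanisms is the CheckSpelling method of the Word Application object, which tests a single word without showing a dialog. The sketch below is only an illustration and is not the approach retained in this article. It uses late binding so that it does not depend on a particular type library, the function name IsSpelledCorrectly is an assumption made for this example, and the code is meant to be called from a VCL application, where COM is already initialized.

```pascal
unit WordSpellSketch;

interface

function IsSpelledCorrectly(const AWord: string): Boolean;

implementation

uses
  ComObj, Variants;

function IsSpelledCorrectly(const AWord: string): Boolean;
var
  WordApp: OleVariant;
begin
  WordApp := CreateOleObject('Word.Application'); // start a separate Word instance
  try
    WordApp.Documents.Add; // open an empty document before calling the proofing tools
    Result := WordApp.CheckSpelling(AWord); // True when Word accepts the spelling
  finally
    WordApp.Quit(0); // 0 = wdDoNotSaveChanges
  end;
end;

end.
```

Starting Word for every word would be slow; a real application would keep one instance alive for the whole check.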

Delphi 2009 has simplified the use of Microsoft Office applications by preinstalling some ready-to-use components that wrap the automation interface for Office 2000 and expose the same properties, methods and events as the COM interface. These components are available in the Servers page of the Component Palette. They can also be created dynamically.

Implementation

I will automate MS Word from my application in order to check the spelling and the grammar and to get a list of suggested replacements. The text to be spell and grammar checked is located in a Paradox database and displayed in a TDBRichEdit component by the application. It has to be transferred to the Word spelling engine, processed there and then returned to its source while preserving its RTF-formatted content.

First, I had to choose between the possible options. I chose early binding because it is faster. I found out that the RTF-formatted content of the source (the TDBRichEdit component) could be transferred to the engine and returned to its component using the clipboard as a communication channel.

The required functionality could have been fully implemented as a procedure inserted in the code of the application. I decided otherwise: the functionality is wrapped into a Delphi class that I called TGtroWordSpeller. If the code is to be used in many places, it is preferable that it be a class since modifications and bug fixes are then centralized.

The early-binding code

With the Word2000.pas unit listed in the uses clause of the interface section, it is fairly easy to create an instance of the class, that is, an object. Look at this:

uses Word2000, ...

var
  FWordApp: TWordApplication;
  FWordDoc: _Document;

procedure TGtroWordSpeller.Connect;
begin
  FWordApp := TWordApplication.Create(Application); // create Word application
...
  FWordDoc:= FWordApp.Documents.Add(EmptyParam, EmptyParam, EmptyParam, EmptyParam); // create Word document
...
end;

It creates the Word Application and generates a Word Document. Obviously, Word must be installed on the computer for this to work.

Transfer of the content

There seems to be no way to pass the RTF-formatted content of a TRichEdit or a TDBRichEdit component to Word programmatically except through the clipboard. What is available from the Text or the Lines members of these components is plain text. I tried using streams, but the Range type in Word has neither a LoadFromStream nor a SaveToStream method. As a consequence, I used the clipboard and planned to do what follows:

The Plan
1. Connect to Word.App, create WordDoc and configure the connection;
2. Copy the RTF content of the DBRichEdit component to the clipboard;
3. Paste into WordDoc.Content, a Range type;
4. Do the spell and grammar check and modify WordDoc.Content accordingly;
5. Copy WordDoc.Content to the clipboard;
6. Paste back to DBRichEdit;
7. Disconnect from Word.

In the code, FText is the TDBRichEdit component that holds the content and FWordDoc is the Word document. In this code, the RTF-formatted text of FText is copied to the clipboard and pasted into the Word document.

  FText.SelectAll; // select all the content of TDBRichEdit
  FText.CopyToClipboard; // Copy it to the clipboard
  FWordDoc.Content.Paste; // Paste content of clipboard in Word document

After the checking, the still RTF-formatted text is copied back to the clipboard and pasted back to FText.

  FWordDoc.Content.Copy; // Copy corrected text to clipboard
  FText.PasteFromClipboard; // Paste back to Target

and it works!
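Since the clipboard is a shared resource, it can be useful to verify that it really holds RTF data before pasting back. The following helper is a sketch that is not part of the original component; the unit and function names are assumptions made for this example, and it relies on the CF_RTF format name declared in Delphi's RichEdit unit.

```pascal
unit ClipboardRtfSketch;

interface

function ClipboardHasRtf: Boolean;

implementation

uses
  Windows, Clipbrd, RichEdit;

function ClipboardHasRtf: Boolean;
var
  RtfFormat: UINT;
begin
  // CF_RTF is the format name 'Rich Text Format' declared in the RichEdit unit;
  // registering an already registered name simply returns its identifier
  RtfFormat := RegisterClipboardFormat(CF_RTF);
  Result := (RtfFormat <> 0) and Clipboard.HasFormat(Word(RtfFormat));
end;

end.
```

It could guard the paste, for example: if ClipboardHasRtf then FText.PasteFromClipboard;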

The following figure illustrates the process

Illustration of the spell/grammar check process

Call to the spelling engine

When a user clicks on the button of the editor, the application gets connected with Word and the following code is executed:

function TGtroWordSpeller.CheckSpelling(Text: TDBRichEdit; LanguageID: TOleEnum): Boolean;
var
  Cursor: TCursor;
begin
  Result:= False;
  Cursor:= Screen.Cursor;
  Screen.Cursor:= crHourGlass;
  FLanguageID:= LanguageID;
  FText:= Text;
  FText.DataSource.DataSet.Edit; // Puts FText in "Edit" mode
  FText.SelectAll; // Select all the content of the component
  FText.CopyToClipboard; // Copy it to the clipboard
  try
    Connect;
    CreateWordDoc; // Create new empty document and configure
    FWordDoc.Content.Paste; // Paste content of clipboard in Word document
    // Call Word's spell/grammar check dialog
    if FWordApp.Dialogs.Item(wdDialogToolsSpellingAndGrammar).Show(EmptyParam) <> 0 // Spell/grammar dialog is called
    then
      begin
        FWordDoc.Content.Copy; // Copy corrected text to clipboard
        FText.PasteFromClipboard; // Paste back to the component
        Result:= True;
      end;
  finally
    Disconnect;
    Screen.Cursor:= Cursor;
  end;
end;

With the call to the Word spell and grammar dialog, Word takes full control of the process and the user cannot intervene until the verification is finished, unless the "Cancel" button is pressed.

Code of the component

The code of the component is listed below:

unit gtrowordspeller;
{*******************************************************************************
* Unit gtrowordspeller.pas containing the TGtroWordSpeller class - 10 June 2013*
* developed by Georges Trottier, www.gtro.com                                  *
*******************************************************************************}

interface

uses Windows, SysUtils, Classes, ActnList, Forms, StdCtrls, Dialogs, StdActns, ActnRes, Messages, ComObj,
     ActiveX, Variants, Word2000, DBCtrls, OleServer, Comctrls, Controls, ClipBrd;

const
  MSWordWndClass = 'OpusApp';
  MSDialogWndClass2000 = 'bosa_sdm_Microsoft Word 9.0';
  MSDialogWndClass97 = 'bosa_sdm_Microsoft Word 8.0';
  DEBUG = False;

type TGtroWordSpeller = class (TComponent)
private
  FConnected: Boolean;
  FText: TDBRichEdit;
  FWordApp: TWordApplication;
  FWordDoc: _Document;
  FWordVersion, FWordDialogClass: string;
  Handle: HWND;
  FLanguageID: TOleEnum;
  procedure Connect;
  procedure Disconnect;
  procedure CreateWordDoc;
protected
public
  constructor Create(AOwner: TComponent); override;
  destructor Destroy; override;
  function CheckSpelling(Text: TDBRichEdit; LanguageID: TOleEnum): Boolean;
  function WordInstalled: Boolean;
end;

implementation

{ TGtroWordSpeller }

function TGtroWordSpeller.CheckSpelling(Text: TDBRichEdit; LanguageID: TOleEnum): Boolean;
var
  Cursor: TCursor;
begin
  Result:= False;
  Cursor:= Screen.Cursor;
  Screen.Cursor:= crHourGlass;
  FLanguageID:= LanguageID;
  FText:= Text;
  FText.DataSource.DataSet.Edit; // Puts FText in "Edit" mode
  FText.SelectAll; // select all the content of FText
  FText.CopyToClipboard; // Copy content to clipboard
  try
    Connect;
    CreateWordDoc; // Create new empty document and configure
    FWordDoc.Content.Paste; // Paste content of clipboard in Word document
    // Call Word's spell/grammar check dialog
    if FWordApp.Dialogs.Item(wdDialogToolsSpellingAndGrammar).Show(EmptyParam) <> 0 then
    begin
      FWordDoc.Content.Copy; // Copy corrected text to clipboard
      FText.PasteFromClipboard; // Paste back to FText, replacing the selection
      // Garbage collection - added after the "insert lines" bug was investigated
      FText.SelectAll; // select all
      FText.SelStart:= FText.SelLength - 3; // start new selection
      FText.SelLength:= 100; // extend it to the end
      FText.SelAttributes.Style:= []; // remove all formatting
      FText.SelText:= ''; // replace the selection by ''
      Result:= True;
    end;
  finally
    Disconnect;
    Screen.Cursor:= Cursor;
  end;
end;

procedure TGtroWordSpeller.Connect;
begin
  if FConnected then Exit; // don't create two instances
  FWordApp := TWordApplication.Create(Application); // create Word application
  FWordApp.ConnectKind := ckNewInstance; // define kind of connection
  FWordApp.Connect; // connect to Word
  FWordApp.WindowState := wdWindowStateMinimize; // minimise
  FWordApp.Visible := False; // and default to NOT Visible
  FWordApp.ScreenUpdating := False; // speed up winword's processing
  FWordVersion := FWordApp.Version;
  if FWordVersion[1] = '9' then
    FWordDialogClass := MSDialogWndClass2000
  else
    FWordDialogClass := MSDialogWndClass97;
  FConnected:= True;
end;

constructor TGtroWordSpeller.Create(AOwner: TComponent);
begin
  inherited Create(AOwner);
  FConnected:= False;
end;

destructor TGtroWordSpeller.Destroy;
begin
  FConnected:= False;
  inherited Destroy; // let TComponent release its own resources
end;

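procedure TGtroWordSpeller.Disconnect;
var
  SaveChanges: OleVariant;
begin
  if not FConnected then Exit; // nothing to disconnect
  SaveChanges:= wdDoNotSaveChanges;
  FWordApp.Quit(SaveChanges); // quit Word without saving the document
  FreeAndNil(FWordApp); // release the Word application wrapper
  FConnected:= False; // allow a later call to Connect
end;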

function TGtroWordSpeller.WordInstalled: Boolean;
begin
  Result:= True;
  try
    FWordApp := TWordApplication.Create(Application); // create Word application
    FreeAndNil(FWordApp); // the test instance is no longer needed
  except
    Result:= False; // report the failure to the caller instead of aborting
    ShowMessage('Error!');
  end;
end;

procedure TGtroWordSpeller.CreateWordDoc;
var
  s: string;
begin
  FWordDoc:= FWordApp.Documents.Add(EmptyParam, EmptyParam, EmptyParam, EmptyParam);
  s := FWordDoc.Name + ' - ' + FWordApp.Name;
  Handle := FindWindow(MSWordWndClass, PChar(s)); // winword
  SetWindowPos(Handle, HWND_TOPMOST, 0, 0, 0, 0, SWP_NOACTIVATE or SWP_HIDEWINDOW); // dialog always on top
  FWordDoc.Content.LanguageID := FLanguageID; // select the proofing language, e.g. French (Canada)
end;

end.

It is a component called TGtroWordSpeller that can be used dynamically in this way (FSpeller: TGtroWordSpeller):

FSpeller:= TGtroWordSpeller.Create(nil); 
FWordInstalled:= FSpeller.WordInstalled; // FWordInstalled: Boolean;
FSpeller.Free;

in order to test if MS Word is installed on the computer. The function returns True if Word is installed, False otherwise. It allows the application to disable all the buttons and menu items that relate to the spell/grammar checker.
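A lighter way to perform this kind of test, shown here only as a sketch and not as the code of the component, is to ask COM whether the ProgID 'Word.Application' is registered, without creating any Word object. The unit and function names are assumptions made for this example, and it is assumed that the function is called from an application that already uses COM.

```pascal
unit WordDetectSketch;

interface

function IsWordRegistered: Boolean;

implementation

uses
  Windows, ActiveX;

function IsWordRegistered: Boolean;
var
  ClassID: TCLSID;
begin
  // Succeeds only if the ProgID 'Word.Application' is registered with COM
  Result := Succeeded(CLSIDFromProgID('Word.Application', ClassID));
end;

end.
```

A registered ProgID does not guarantee that Word will actually start, so the call to the checker should still be protected against exceptions.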

Given that Word is installed, the spell/grammar checker is started by inserting this code in the click event handler of the aaa button:

FSpeller:= TGtroWordSpeller.Create(nil);
try
  FSpeller.CheckSpelling(DBRichEdit, SpellLanguage);
finally
  FSpeller.Free; // released even if the check raises an exception
end;

where DBRichEdit is the name of the TDBRichEdit component whose text we want to spell and grammar check. SpellLanguage is the code of the language to be used by Word (French and English are implemented). The text of the source component is then selected, transferred to Word through the clipboard, spell/grammar checked by Word (a dialog appears for each error) and returned to the source component through the clipboard.
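The way the application derives SpellLanguage is not shown in this article. As a sketch under that assumption, the value could come from a small mapping onto the language constants of the Word2000 unit. The type TSpellLanguage and the function SpellLanguageID are hypothetical names introduced for this example, and the choice of wdEnglishUS for English is also an assumption.

```pascal
unit SpellLanguages;

interface

uses
  ActiveX, Word2000;

type
  TSpellLanguage = (slEnglish, slFrench); // hypothetical type, for illustration only

function SpellLanguageID(Language: TSpellLanguage): TOleEnum;

implementation

function SpellLanguageID(Language: TSpellLanguage): TOleEnum;
begin
  case Language of
    slFrench: Result := wdFrenchCanadian; // French (Canada)
  else
    Result := wdEnglishUS; // assumed default for English
  end;
end;

end.
```

The call then becomes FSpeller.CheckSpelling(DBRichEdit, SpellLanguageID(slFrench));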

Illustration of the problem

Garbage collection

The code performs the spell and grammar check correctly, but something bothered me: the process appended empty new lines to the text retrieved from Word. At first, I thought that it was a bug but, from interactions on the Web, I soon found out that it was the normal behaviour of the component.

The problem is illustrated in the figure next to this paragraph. In this test program, I use two RichEdit components. In RichEdit1, I generate three formatted lines programmatically, select all the content, copy it to the clipboard and paste it into RichEdit2. In a normal display, only the three lines show, but the problem becomes evident when I select the content of both components (as shown in the figure): there are trailing empty new lines at the end of the text in both components, one in RichEdit1 and two in RichEdit2.

Something similar occurred in my editor after every spell/grammar check. From inspection, the new lines seemed to be structured like this: a paragraph mark/no text + a paragraph mark + some formatting codes. It could be normal behaviour but, in the context of my application, it was garbage.

Since I could do nothing about the display of this garbage in the TDBRichEdit component, I decided to perform garbage collection as follows:

      // Garbage collection
      FText.SelectAll; // select all
      FText.SelStart:= FText.SelLength - 3; // start new selection
      FText.SelLength:= 100; // extend it to the end
      FText.SelAttributes.Style:= []; // remove all formatting
      FText.SelText:= ''; // replace the selection by ''

This code is executed once the text has been retrieved from Word through the clipboard. The whole text is selected and the length of that selection is used to start a new selection near the end, which is then extended to the end of the text. The formatting is removed from this new selection, which is then replaced by an empty string. The value of 3 in the assignment to SelStart is the result of trial and error.

With this code, the process puts the cursor at the end of the last line of the editor.

Conclusion

Now, with this code, I can verify the grammar and the spelling of the RTF-formatted text that I write in my application. The component is configured specifically for the TDBRichEdit component that I use but it can be used for TRichEdit components if the type of the source is changed in the code.

This article results from merging the ideas and code presented in Utilisation du correcteur orthographique de Word and Using MS Word as a Spelling and Grammar Checker for Delphi. The first article uses early binding and calls the Word spell and grammar check dialog, whereas the second shows how to transfer RTF-formatted text from a TRichEdit component to the Word spell and grammar checker. Indebtedness is hereby acknowledged.

Warning!
This code was developed for the pleasure of it. Anyone who decides to use it does so at their own risk and agrees not to hold the author responsible for its failure.


Last modified: September 3rd 2014 11:54:40.