This article introduces jQuery Chili, a syntax highlighter plug-in for jQuery. It consists of an analysis of its coding which shows all the steps that the plug-in takes to transform plain code into highlighted code for programming languages such as Delphi, HTML, Javascript, CSS and PHP.
For a Delphi programmer, Javascript is a new and intriguing paradigm. Up to now, this web site had been a showcase for Delphi articles, with snippets syntax-highlighted by means of a Delphi code parser. Until the beginning of February of this year, the site was relatively static except for the PHP Layers Menus System, which is PHP-generated and Javascript-driven. More needed to be done.
I discovered Javascript and DHTML while developing a way to include tooltips on this site. First, these tooltips were css-based tooltips that were occasionally clipped. DHTML was a solution to this problem and led me to discover the jQuery library. When I thought of an alternative to the Delphi code parser to highlight the syntax of the code snippets presented on this site, I discovered the jQuery.chili plug-in.
This jQuery plug-in is authored by Andrea Ercolino from Barcelona, Spain. When I first used it, I liked what I saw. It has a very clean footprint: you just put the code you want highlighted within a <code> tag with an appropriate class and Chili will highlight all the symbols of the language the code is written in. The process is driven by a recipe file that provides a set of matches and styles for each supported language: Javascript, PHP, HTML, CSS and Delphi. Additionally, it is fairly easy to add a recipe for any other language by means of a set of regular expressions.
How it works
I have used jQuery.chili with complete satisfaction for a few days and, because I like to adapt and personalize the display of the code that I borrow (I did it with the layers menus), I needed to understand how this plug-in works. I made a short investigation of the code and developed the flow diagram that follows.
Linking to the jQuery function property
When writing a jQuery plug-in, it is customary to start by adding a new function property to the jQuery.fn object. The name of the property is the name of the plug-in; here, it is "chili".
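The convention can be sketched as follows. This is only an illustration: the jQuery object below is a stand-in so that the snippet runs without the library, and the body of the function is a placeholder, not Chili's actual code.

```javascript
// Stand-in for the real library so the sketch runs on its own (assumption).
var jQuery = { fn: {} };

// A plug-in is registered by adding a function property to jQuery.fn.
// The property name becomes the method name: here "chili".
jQuery.fn.chili = function () {
    // In a real plug-in, `this` is the jQuery object holding the selected
    // elements. Returning it keeps the call chainable.
    return this;
};

// Simulate a jQuery object: real jQuery objects inherit from jQuery.fn.
var selection = Object.create( jQuery.fn );
var result = selection.chili();
```

With the real library loaded, the same method would be reached through a selector, for instance on the <code> elements of the page.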
Program starts
As shown, the program checks whether the page contains <code> elements that have the proper class (js for Javascript, php for PHP, html for HTML, css for CSS and delphi for Delphi). If so, for each such element, the program calls askDish( this ) where this is the current object. This function discriminates between a dynamic and a static setup. With the jQuery.chili files on the site, I used the static setup.
cook()
cook(ingredients, recipe, blockName) is then called. ingredients is the text of the code snippet surrounded by the <code> tags, recipe is a language recipe stored in the recipes.js file and blockName is an identifier for the element being worked on.
checkSpices()
After some cleanup, this function calls checkSpices( recipe ), which places the style sheets of the language in the <head> section of the page.
prepareBlock()
prepareBlock(recipe, blockName) is then called. It fills steps with the details of the recipe and associates them with the blockName. In fact, steps is a literal object containing several sub-elements:
- recipe - the programming language;
- blockName - usually _main;
- stepName - the type of match from the recipe (ml_comment, sl_comment, ...);
- exp - the regular expression associated with the value of stepName fetched from the recipe. It is wrapped in parentheses as "(" + exp + ")";
- length - the number of capturing groups that the step contributes: 1 for the added parentheses plus the number of submatches found in exp; and
- replacement - the replacement of the matched symbol.
knowHow()
knowHow( recipe ) takes the recipe and puts together all the regular expressions associated with each stepName of steps, ORs them, prepends a prolog and appends an epilog. This composite regular expression is then used as a first argument to the Javascript replace() function that is dealt with in the next section.
Here follows the regular expression that knowHow() produces for the Javascript language.
/((?:\s|\S)*?)(?:(\/\*[^*]*\*+(?:[^\/][^*]*\*+)*\/)|(\/\/.*)|((?:\'[^\'\\\n]*(?:\\.[^\'\\\n]*)*\')|(?:\"[^\"\\\n]*(?:\\.[^\"\\\n]*)*\"))|(\b[+-]?(?:\d*\.?\d+|\d+\.?\d*)(?:[eE][+-]?\d+)?\b)|((?:\w+\s*)\/[^\/\\\n]*(?:\\.[^\/\\\n]*)*\/[gim]*(?:\s*\w+))|(\/[^\/\\\n]*(?:\\.[^\/\\\n]*)*\/[gim]*)|([\{\}])|(\b(with|while|var|try|throw|switch|return|if|for|finally|else|do|default|continue|const|catch|case|break)\b)|(\b(URIError|TypeError|SyntaxError|ReferenceError|RangeError|EvalError|Error)\b)|(\b(String|RegExp|Object|Number|Math|Function|Date|Boolean|Array)\b)|(\b(undefined|arguments|NaN|Infinity)\b)|(\b(parseInt|parseFloat|isNaN|isFinite|eval|encodeURIComponent|encodeURI|decodeURIComponent|decodeURI)\b)|(\b(void|typeof|this|new|instanceof|in|function|delete)\b)|(\b(sun|netscape|java|Packages|JavaPackage|JavaObject|JavaClass|JavaArray|JSObject|JSException)\b))|((?:\s|\S)+)/g
There is more on this at Annex B.
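The assembly of such a composite expression can be sketched as follows. This is a simplification and not the code of knowHow(): buildComposite and the two sample steps are hypothetical, and only the prolog and the epilog are copied from the expression quoted above.

```javascript
// Hypothetical helper: joins step expressions in the same layout as the
// composite expression: prolog, ORed steps, then the epilog as a fallback.
function buildComposite( exps ) {
    var prolog = "((?:\\s|\\S)*?)";  // lazily captures the text before a match
    var epilog = "((?:\\s|\\S)+)";   // captures what is left when no step matches
    var wrapped = exps.map( function ( exp ) {
        return "(" + exp + ")";      // one capturing group per step
    } );
    return new RegExp( prolog + "(?:" + wrapped.join( "|" ) + ")|" + epilog, "g" );
}

// Two sample steps only: single-line comments and a couple of keywords.
var composite = buildComposite( [ "\\/\\/.*", "\\b(?:var|return)\\b" ] );
var firstMatch = composite.exec( "var x = 1; // one" );
// firstMatch[1] is the prolog, firstMatch[2] the comment group,
// firstMatch[3] the keyword group and firstMatch[4] the epilog.
```

Because each step owns a capturing group, the position of the filled group tells which step matched.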
A mysterious piece of code
At this point, we arrive where the real action is: highlighting the syntactic symbols of the language, at least those activated by the developer of the plug-in:
var perfect = ingredients.replace( kh, function() {
return chef.apply( { steps: steps }, arguments ); // repeated 38 times
} );
To me, it is a very mysterious and involved statement. I had to learn more about Javascript functions and how they are called before I could make sense of it.
Overview of Javascript functions
In Javascript, functions are extremely flexible. Any function can receive any number of arguments irrespective of the number of parameters in its signature. Arguments can be any type. It is the responsibility of the function to detect and react to these different arguments. Troubling!
A function is an object. It can contain members just as other objects. This allows a function to contain its own data tables. It also allows an object to act as a class, containing a constructor and a set of related methods. A function can be a member of an object. When a function is a member of an object, it is called a method. JavaScript does not have a void type, so every function must return a value. The default value is undefined, except for constructors, where the default return value is this.
In addition, every function receives an implicit parameter, available through the keyword this, and an array-like object named arguments. The arguments passed in during function invocation are stored in the arguments object, which can be used to retrieve all of them in the order they were passed in.
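A small sketch of the arguments object, using a hypothetical function named countArgs that declares a single parameter:

```javascript
// Declared with one parameter, yet callable with any number of arguments.
function countArgs( first ) {
    // `arguments` is array-like: it has a length and indexed entries.
    return {
        declared: countArgs.length,   // number of parameters in the signature
        received: arguments.length,   // number of arguments actually passed
        last: arguments[ arguments.length - 1 ]
    };
}

var none = countArgs();
var three = countArgs( "a", "b", "c" );
```

The function is never told how many arguments to expect; it has to inspect arguments itself, which is exactly what chef() does later on.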
There are four ways to call a function: the function form, the method form, the constructor form, and the apply form. They differ in what they do with this. In the first three cases, this will be the global object, the object that owns the method and the new object respectively. The fourth case uses the apply(thisArg, arguments) function. Its first argument sets the value of this to the object given in thisArg. In other words, it allows the invocation of a function as if it were a method of some other object. The second parameter of apply() is an array (or array-like object) holding the values that the function will receive in arguments. In short, apply() lets us set up the conditions under which a function is invoked.
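The four forms can be compared side by side. The names whoAmI, owner, other, Maker and sum are hypothetical and exist only for this illustration.

```javascript
function whoAmI() { return this; }

var owner = { whoAmI: whoAmI };
var other = { label: "other" };
function Maker() { this.made = true; }

var viaFunction = whoAmI();               // global object (undefined in strict mode)
var viaMethod = owner.whoAmI();           // owner
var viaConstructor = new Maker();         // the newly created object
var viaApply = whoAmI.apply( other, [] ); // other, although whoAmI is not its method

// apply() also spreads an array over the arguments of the called function.
function sum() {
    var total = 0;
    for ( var i = 0; i < arguments.length; i++ ) {
        total += arguments[ i ];
    }
    return total;
}
var total = sum.apply( null, [ 1, 2, 3 ] );
```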
The test bench
The findings stated above were obtained using a test bench and Firebug. I selected a short snippet at random: the Javascript code of lines 73 to 90 of the jquery.chili-2.2.js file. Here it is:
function prepareStep( recipe, blockName, stepName ) {
    var step = recipe[ blockName ][ stepName ];
    var exp = ( typeof step._match == "string" ) ? step._match : step._match.source;
    return {
        recipe: recipe
        , blockName: blockName
        , stepName: stepName
        , exp: "(" + exp + ")"
        , length: 1 // add 1 to account for the newly added parentheses
            + (exp // count number of submatches in here
                .replace( /\\./g, "%" ) // disable any escaped character
                .replace( /\[.*?\]/g, "%" ) // disable any character class
                .match( /\((?!\?)/g ) // match any open parenthesis, not followed by a ?
            || [] // make sure it is an empty array if there are no matches
            ).length // get the number of matches
        , replacement: step._replace ? step._replace : book.defaultReplacement
    };
} // prepareStep
Something went awry when I looked at it in my browser. On lines 82 and 87, the plug-in failed to match the single-line comments! Why? I didn't know! To understand, I had to delve into the code of the plug-in and figure out what really happens when this snippet is executed:
var perfect = ingredients.replace( kh, function() {
return chef.apply( { steps: steps }, arguments ); // repeated 38 times
} );
Observations
I have observed the execution of this code with Firebug and my observations are summarized below:
- ingredients has 26 sub-items;
- steps is a literal object based on recipe that contains 14 regex and 14 replacements for the symbols matched by the regex;
- return chef.apply( {steps: steps}, arguments); is called 38 times, each time with an arguments object of length 26 (it is always 26 irrespective of the number of lines in the text of the snippet, but it varies with the selected language);
- on each of these 38 calls, arguments[0] contains a segment of the text of the snippet that tends to terminate with a symbol matched by a regex. The length of each segment is variable and depends on the location of the match;
- arguments[1] contains the part of the segment (prolog) that precedes the matched symbol, whereas arguments[arguments.length - 1] contains the value of ingredients. arguments[arguments.length - 2] always contains a number that is zero on the first pass of the function and increases monotonically on the following passes; it is the offset of the segment within ingredients;
- segments #24 and #35 are different: each of them contains one symbol (a single-line comment) missed by the regex;
- a scan of arguments is made to discover which argument is not an empty string. The scan starts with arguments[2] and stops when one of the arguments[i] is not empty. Going by the order of the groups in the composite regular expression, if it stops at arguments[2], a multi-line comment has been matched; if it stops at arguments[3], a single-line comment; if it stops at arguments[8], a curly brace has been found [see Annex C for the whole list].
Analysis
I will try to explain what happens when this snippet is executed.
var perfect = ingredients.replace( kh, function() {
return chef.apply( { steps: steps }, arguments );
} );
In the first statement, perfect is obtained by replacing, one match at a time, each segment of ingredients matched by the composite regular expression kh (the result of knowHow()) with the value returned by chef.apply( { steps: steps }, arguments ). Here, chef() is invoked as if it were a method of the literal object { steps: steps }, so that chef() can reach steps through this.steps while receiving all the values held in arguments.
Each time replace() finds a match of kh in ingredients, it calls the anonymous function, which forwards its own arguments to chef(). The mysterious part is that arguments is already structured as follows when chef.apply() is executed for the first time, because replace() itself fills it in:
- the segment that terminates with the match is stored in arguments[0];
- the prolog (text of the segment that precedes the match) is stored in arguments[1];
- depending on the type of match, the arguments[i] that corresponds to that type (see Annex C) is filled with the matched symbol if the match was not a word-boundary match. Otherwise, two consecutive arguments[i] are filled with the matched symbol, because the word-boundary expressions contain an inner group;
- the epilog (the text that follows the last match) is stored in arguments[arguments.length-3];
- a number is stored in arguments[arguments.length-2]; and
- the text of the snippet, as an object and amputated of its first line, is stored in arguments[arguments.length-1].
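This layout is not specific to Chili: it is how replace() calls any replacement function. The sketch below records the calls for a reduced composite expression with only two steps (single-line comment and the var keyword); the names source and calls are hypothetical.

```javascript
// Reduced composite in the same shape: prolog, two step groups, epilog.
var kh = /((?:\s|\S)*?)(?:(\/\/.*)|(\b(?:var)\b))|((?:\s|\S)+)/g;
var source = "var a; // note\n";
var calls = [];

source.replace( kh, function () {
    // Copy the array-like arguments object into a real array for inspection.
    calls.push( Array.prototype.slice.call( arguments ) );
    return arguments[ 0 ];  // leave the text unchanged
} );

// Each recorded call holds, in order: the whole segment, the prolog,
// one entry per step group, the epilog, the offset and the full string.
var second = calls[ 1 ];
// second[0] is " a; // note", second[1] is " a; ", second[2] is "// note"
```

With four groups in the expression, every call receives 4 + 3 = 7 values, in the same way that the Javascript recipe always yields 26.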
Since chef() is invoked as a method of the { steps: steps } object,
- it sets the type of match from the ordinal of arguments[i] after scanning arguments until it finds a non-empty string;
- it sets the matched symbol from the value of arguments[i]; and
- it determines its replacement based on the type of match.
This operation is repeated on the successive segments until the end of the snippet is reached. At the end, the result of this algorithm is stored in the variable perfect.
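The whole mechanism can be imitated in a few lines. This is a rough sketch and not Chili's chef(): miniChef, the two-step steps array and the span markup are assumptions, and the text is not HTML-escaped as a real highlighter would require.

```javascript
// Hypothetical steps, in the same order as the groups of the expression.
var steps = [ { stepName: "sl_comment" }, { stepName: "keyword" } ];
var kh = /((?:\s|\S)*?)(?:(\/\/.*)|(\b(?:var)\b))|((?:\s|\S)+)/g;

function miniChef() {
    var prolog = arguments[ 1 ] || "";
    // Indexes 2 to length-4 belong to the steps; find the first filled one.
    for ( var i = 2; i < arguments.length - 3; i++ ) {
        if ( arguments[ i ] ) {
            var name = this.steps[ i - 2 ].stepName;  // this is { steps: steps }
            return prolog + '<span class="' + name + '">' + arguments[ i ] + "</span>";
        }
    }
    // No step matched: this call carries the epilog, returned unchanged.
    return arguments[ arguments.length - 3 ] || "";
}

var perfect = "var a; // note\n".replace( kh, function () {
    return miniChef.apply( { steps: steps }, arguments );
} );
```

The anonymous wrapper is needed because replace() chooses the value of this for its callback; apply() is what hands steps to the callee.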
Flawed lines
On faulty lines 82 and 87, arguments[6] was filled with "exp // count" and "length // get" respectively. arguments[6] stands for a reg_not match, meaning that the plug-in believed it was avoiding the mismatch of a regular expression. Knowing this, I commented out the section of the recipe that defines the reg_not match, and the snippet displayed as it should have.
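The behaviour can be reproduced outside the plug-in. In the sketch below, the two patterns are copied from the composite expression quoted earlier; the variable names are hypothetical.

```javascript
// reg_not and single-line comment patterns taken from the composite expression.
var regNot = /(?:\w+\s*)\/[^\/\\\n]*(?:\\.[^\/\\\n]*)*\/[gim]*(?:\s*\w+)/;
var slComment = /\/\/.*/;

var line82 = "+ (exp // count number of submatches in here";
var notHit = regNot.exec( line82 );        // matches "exp // count" at index 3
var commentHit = slComment.exec( line82 ); // matches from "//" at index 7

var line87 = ").length // get the number of matches";
var notHit87 = regNot.exec( line87 );      // matches "length // get"
```

Since the lazy prolog stops at the first position where any alternative succeeds, reg_not wins at "exp" before the comment alternative is ever tried at "//".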
Conclusion
During this review, I made an in-depth investigation of the code of jquery.chili-2.2.js, the most recent version of jQuery.chili. It is a cleverly designed piece of code that tries to cover all the highlighting requirements of the languages that it supports, even in very exceptional circumstances.
One of these exceptional circumstances is that, if regular expressions are matched and highlighted wherever they are found, an expression like " a / b / c " could be misinterpreted as a regular expression, hence the addition of the reg_not step to the Javascript recipe. This addition is very reasonable for a piece of code that becomes a jQuery plug-in, but correcting the problem is way beyond my expertise in Javascript.
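The ambiguity is easy to observe with the two patterns of the composite expression, used here on their own under hypothetical variable names:

```javascript
// Regular expression literal pattern and its reg_not guard.
var regExpStep = /\/[^\/\\\n]*(?:\\.[^\/\\\n]*)*\/[gim]*/;
var regNotStep = /(?:\w+\s*)\/[^\/\\\n]*(?:\\.[^\/\\\n]*)*\/[gim]*(?:\s*\w+)/;

var division = "a / b / c";
var asLiteral = regExpStep.exec( division )[ 0 ]; // "/ b /" looks like a literal
var asNot = regNotStep.exec( division )[ 0 ];     // "a / b / c" is caught first
```

The guard relies on a word before and after the slashes, which is also what a word followed by a single-line comment looks like.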