
Rewrite JavaScript highlighting, and add/rewrite TypeScript and JSX highlighting

Jeffery To requested to merge jefferyto/gtksourceview:js-rewrite into master

This is:

  • A rewrite of the JavaScript language definition (using contexts)
  • A rewrite of the TypeScript language definition (using contexts, built on top of the rewritten JS definition)
  • An addition of JSX and TypeScript JSX language definitions (again, using contexts and built on top of the JavaScript and TypeScript definitions)
  • A fix-up of HTML, JSON and Objective-J definitions to work with the rewritten JS definition
    • The Objective-J definition is also a semi-rewrite, based on examining the Objective-J compiler and actual Objective-J code

When I started working on TypeScript and JSX highlighting (last year, after completing CSS/SCSS/Less), it became clear to me that the existing JavaScript definition, which relies only on regular expressions and keyword matching, would not be a sufficient base to build on.

Example TypeScript JSX code:

var elementNotType = true ?
        <tag                        // JSX element
            attr="value"></tag>
        : boolean                   // ternary operator else clause
    , typeNotElement
        : boolean                   // type annotation
        = num
        <tag                        // less than operator
            && attr > value;

Using regular expressions alone, it would not be possible to distinguish (with a high degree of certainty) whether <tag is the start of a JSX element or a less-than comparison, or whether : boolean is a TypeScript type annotation or a ternary operator else clause.

So I rewrote JavaScript highlighting using contexts, so that it "understands" the grammar of JS (expressions, statements, etc). Then I added TypeScript and JSX (and TypeScript JSX) definitions that extend that grammar.
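
To illustrate why tracking context resolves the : ambiguity from the example above, here is a minimal sketch in plain JavaScript. It is not how the language definitions in this merge request are written (those are GtkSourceView contexts); classifyColons and its token list are hypothetical names made up for this illustration, and the sketch assumes the input is already split into tokens and ignores object literals, case labels and optional parameters.

```javascript
// Hypothetical sketch, not part of GtkSourceView: decide what each ":" means
// by remembering whether a "?" is still waiting for its else clause.
function classifyColons(tokens) {
  const kinds = [];
  let pendingTernaries = 0; // number of "?" seen without a matching ":"
  for (const token of tokens) {
    if (token === "?") {
      pendingTernaries += 1;
    } else if (token === ":") {
      if (pendingTernaries > 0) {
        // A "?" is open, so this ":" starts its else clause.
        pendingTernaries -= 1;
        kinds.push("ternary-else");
      } else {
        // No open "?", so (in this simplified model) it is a type annotation.
        kinds.push("type-annotation");
      }
    }
  }
  return kinds;
}

// Simplified token stream modelled on the example above.
const tokens = ["var", "a", "=", "true", "?", "x", ":", "y", ",",
                "b", ":", "boolean", "=", "num"];
console.log(classifyColons(tokens));
```

For this token stream the first colon is classified as ternary-else and the second as type-annotation. A regular expression that looks only at the text around each colon has no access to the pending "?" count, which is the kind of state a context-based definition carries along.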

A few screenshots:

  • javascript
  • using-keywords-as-identifiers
  • syntax-madness
  • typescript

I've had this work on GitHub for a few weeks to gather bug reports and feedback. (The git history is a bit of a lie; this is the result of many months of work.)

I'm actually a bit conflicted as to whether merging this here is a good idea or not.

Obviously I think these definitions are superior to what is currently in the repo. For example, issues like #81 (closed) and #82 (closed) exist because the current highlighting depends on regular expressions only; the regexes can be patched but there will always be corner cases. (Having context information makes it much easier to tell whether a / is the start of a regular expression literal or is the division operator.)
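
As a rough illustration of the / case, the sketch below classifies each slash by whether the previous significant token could end an expression. This is an assumption-laden toy in plain JavaScript, not the approach used in the actual language definitions: classifySlashes and REGEX_PRECEDING_KEYWORDS are hypothetical names, and the scanner ignores strings, comments, template literals and character classes such as [/] inside a regular expression.

```javascript
// Hypothetical list: keywords after which an expression (and so a regex) may start.
const REGEX_PRECEDING_KEYWORDS = new Set(["return", "typeof", "case", "in", "of", "delete", "void"]);

// Hypothetical helper, not part of GtkSourceView: label every "/" in source
// as "division" or "regex" based on the preceding token.
function classifySlashes(source) {
  const kinds = [];
  let prevEndsExpression = false; // true after an identifier, number, ")" or "]"
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    if (/\s/.test(ch)) {
      i += 1;
    } else if (/[A-Za-z_$0-9]/.test(ch)) {
      // Read a whole word (identifier, keyword or number).
      let j = i;
      while (j < source.length && /[A-Za-z_$0-9]/.test(source[j])) j += 1;
      prevEndsExpression = !REGEX_PRECEDING_KEYWORDS.has(source.slice(i, j));
      i = j;
    } else if (ch === "/" && prevEndsExpression) {
      kinds.push("division");
      prevEndsExpression = false;
      i += 1;
    } else if (ch === "/") {
      kinds.push("regex");
      // Skip the regex body up to the closing unescaped "/", then any flags.
      let j = i + 1;
      while (j < source.length && source[j] !== "/") {
        j += source[j] === "\\" ? 2 : 1;
      }
      j += 1;
      while (j < source.length && /[a-z]/.test(source[j])) j += 1;
      prevEndsExpression = true; // a regex literal is itself an expression
      i = j;
    } else {
      prevEndsExpression = ch === ")" || ch === "]";
      i += 1;
    }
  }
  return kinds;
}

console.log(classifySlashes("return /x/.test(s) / 2"));
```

Here the first slash follows the keyword return, so it is treated as the start of a regular expression literal, while the slash after the closing parenthesis is treated as division. Even this toy needs state that spans tokens, which is why patching individual regexes keeps leaving corner cases.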

On the other hand, these definitions are complex and spread across many files. Maintaining them will require deep knowledge of JavaScript (or the patience to read Appendix A of the ECMAScript spec). Updates will need to be made as new constructs are added to the language, even if these constructs do not involve highlighting, to ensure files are "parsed" correctly.

I've tried to keep these as simple to understand as possible, but there is an inherent complexity in (pseudo-)parsing a language that cannot be simplified. I'm committed to fixing and updating these definitions as necessary, but obviously I can't guarantee that I can do so indefinitely or always in a timely manner.

I'd be happy if this is merged but I would also understand if it is not.

(If there were a way to keep the existing definitions as "simple" highlighting and add this as "advanced" highlighting, that could work too, but I guess this would require more code changes.)

(Also my apologies to @nunocastromartins, I know this tramples on your recent work.)
