Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tagged templates ES3 & 5 #1589

Merged
merged 30 commits into from
Feb 26, 2015
Merged

Conversation

ivogabe
Copy link
Contributor

@ivogabe ivogabe commented Jan 4, 2015

I've implemented basic tagged templates emit when targeting ES3 or ES5.

foo `A ${1} B ${2} C`;

is compiled to

foo(["A ", " B ", " C"], 1, 2);

See also #1590. I haven't updated the tests yet.

Example:
foo `bar` should compile to foo([“bar”])
@Arnavion
Copy link
Contributor

Arnavion commented Jan 5, 2015

Perhaps a bug with template strings even before your change, but: Ignore this. Lack of morning coffee.


function foo(a: string[], b: number, c: number) { }

foo`A${ 1 }B${ 2, 3 }C`;

Now, with your change, this emits

foo(["A", "B", "C"], 1, 2, 3);

whereas it should emit 1, (2, 3)

Regular untagged template strings have a function that checks whether each expression inside needs to be parenthesized or not - see how needsParens is computed inside emitTemplateExpression. You need to do something similar.

Example:
foo`A${ 1 }B${ 2, 3 }C`;
@ivogabe
Copy link
Contributor Author

ivogabe commented Jan 5, 2015

Thanks! Fixed

@DanielRosenwasser
Copy link
Member

You should have a function like emitDownlevelTaggedTemplate which actually shouldn't call emitTemplateExpression IMO.

So I'd rather have the raw property on - if we're doing it, we should do it more correctly, and t's only a matter of time until someone actually wants that. I think that if we're supporting downlevel, let's just do it here. However, let's wait for input from @mhegazy and @jonathandturner.

@Arnavion
Copy link
Contributor

Arnavion commented Jan 5, 2015

And while it's true that the comma operator has the lowest precedence so it's enough to directly check (<BinaryExpression> templateSpan.expression).operator === SyntaxKind.CommaToken, it might be cleaner to have a comparePrecedenceToComma function just like the one in emitTemplateExpression.

@@ -2048,7 +2048,12 @@ module ts {
}

function getTemplateLiteralAsStringLiteral(node: LiteralExpression): string {
return '"' + escapeString(node.text) + '"';
if (node.parent.kind === SyntaxKind.TaggedTemplateExpression) {
// Emit tagged template as foo(["string"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 spaces instead of tabs.

@ivogabe
Copy link
Contributor Author

ivogabe commented Jan 5, 2015

@DanielRosenwasser How would the raw property be emitted? Something like this:

var __template; // Create variable at top of the file

foo((__template = ['A', 'B'], __template.raw = ['A', 'B'], __template), 1);
@Arnavion
Copy link
Contributor

Arnavion commented Jan 6, 2015

Using a single __template at the top of the file gets nasty with nested template strings.

var x = foo`12\n3${ bar`45\n6${ 789 }` }`;
var x = foo((__template = ["12\n3", ""], __template.raw = ["12\\n3", ""], __template), bar((__template = ["45\n6", ""], __template.raw = ["45\\n6", ""], __template), 789));

It does seem to be well-defined per the ES spec. __template is assigned 123... first and then overwritten with 456... before calling bar(), but foo is still called with 123... because that was what __template evaluated to at the time.

Maybe create a new temporary for each tagged template string, just like destructuring creates a new temporary for each destructuring assignment.

var a = ["45\n6", ""]; a.raw = ["45\\n6", ""];
var b = ["12\n3", ""]; b.raw = ["12\\n3", ""];
var x = foo(b, bar(a, 789));

Edit: Fixed emit example.

@Arnavion
Copy link
Contributor

Arnavion commented Jan 6, 2015

For comparison, 6to5 uses a helper function:

var _taggedTemplateLiteral = function (strings, raw) {
  return Object.freeze(Object.defineProperties(strings, {
    raw: {
      value: Object.freeze(raw)
    }
  }));
};

var x = foo(_taggedTemplateLiteral(["12\n3", ""], ["12\\n3", ""]), bar(_taggedTemplateLiteral(["45\n6", ""], ["45\\n6", ""]), 789));

(We could simplify it to { strings.raw = raw; return strings; } if we don't care about freezing the two arrays.)

@ivogabe
Copy link
Contributor Author

ivogabe commented Jan 6, 2015

I wouldn't prefer a function for this, it doesn't really reduce the complexity very much since the code with the variable isn't very complicated. I don't see an advantage (yet) of using multiple vars. I could see the advantage of having one variable per function (instead of per file), since uglifyjs for example can mangle those names very well.

@Arnavion
Copy link
Contributor

Arnavion commented Jan 6, 2015

The advantage of multiple vars or the utility function is that it avoids using the same variable for multiple invocations, like in the nested template string example I gave.

Granted, I haven't been able to think of a way to break it, since ES guarantees that

  • parameters are evaluated left-to-right, and
  • function bodies bind their arguments at the time they're called

Since the strings and strings.raw expressions can only contain strings, not code, they can't have any other side effects on the shared variable apart from the assignment. Any code which does have side effects (such as a template expression containing a tagged template string) will appear after them in the parameter list and so won't affect them.

It just looks ugly and unreadable, and you have to resort to language-lawyering like above to demonstrate its correctness. It's also arguably not the kind of JS a human would write, which is one of TS's principles.

So my personal order of preference is:

  1. One var per tagged template string. Used by Traceur.
  2. One function call per tagged template string. Used by 6to5 (single function, reused for all strings) and jstransform (individual IIFE per string).
  3. A single variable shared by multiple template strings, either file-level or function-level as you later suggested.
@DanielRosenwasser
Copy link
Member

It would appear that you can get away with one variable if you concede some other functionality.

Here are the concerns:

  • Function level variables are "annoying" to emit - you need to keep track of them with a flag - probably in NodeCheckFlags. The problem is where would you keep track of how many tagged template invocations you use? Given this, it's probably easier to emit a single variable somewhere or use a helper function.
  • Template string arrays are supposed to be _immutable_, so you should get a runtime error if you modify them. We can either...
    • Freeze them so you get a runtime error - however...
      • Freezing is only available in ES5.
    • Serve up a unique value for each invocation so that the effects are not visible outside the call.
  • Referential equality breaks if we do not properly intern the array, whereas the spec mandates this. I believe each template array should be the same so long as the constituent raw portions are equal.
  • In a hot loop, you don't want to keep allocating the template arrays, which is where interning kicks in.

We could support interning, but because we don't stop you from modifying the array, it's actually at odds with immutability - unless we freeze the arrays, in which case we can't support ES3.

My personal intuition is that if you want performance, there's an easy enough workaround - don't use a tagged template. Convert it to a function call.

Again, I'd like to get feedback from @mhegazy but we can certainly continue discussing this.

@Arnavion
Copy link
Contributor

Arnavion commented Jan 7, 2015

@DanielRosenwasser

Function level variables are "annoying" to emit - you need to keep track of them with a flag - probably in NodeCheckFlags. The problem is where would you keep track of how many tagged template invocations you use?

Destructuring assignment already emits a local variable per assignment. Looking at the code, it just tries increasing variable names until it finds one that isn't used in the current scope and all parent scopes. Temporaries are kept track of in a tempVariables array that is updated for each scope as emit traverses the AST.

Template string arrays are supposed to be immutable.

You could use Object.freeze for ES5 and not for ES3, or you could just not use it altogether as a compromise. If you go with the utility function route you can let the user override it with their own implementation of the function (ala __extends) that does use Object.freeze() if they so desire. That said...

Referential equality breaks if we do not properly intern the array, whereas the spec mandates this. I believe each template array should be the same so long as the constituent raw portions are equal.

This is interesting. I didn't know about this, but you're right. (Template strings are registered in a registry, and if the strings and raw strings of a particular template are identical to one already in the registry, the registered string's template object is used for the first argument to the tag function. See 12.3.7 and 12.2.8.2.2)

FWIW it does not seem to be followed by FF at the moment.

function foo(arr) { return arr; }
foo`123` === foo`123`; // false

(And also not by any other ES6 shims.) Perhaps it would be fine for TS's emit to also not bother.

Or, as you said, you could use Object.freeze() to atleast maintain immutability, and not support this downlevel emit for ES3.

@DanielRosenwasser
Copy link
Member

Spoke with @mhegazy about this. We'd like to get this in for 1.5 if possible.

The approach we're favoring is actually one similar to destructuring - one variable for every invocation. This is to favor a reader's perspective when trying to reason about the code, so it may be more ideal. If you need any pointers in pursuing this direction, just let us know.

@ivogabe
Copy link
Contributor Author

ivogabe commented Jan 9, 2015

Ok, will try to do that. Just to be sure, where will the variable statement be emitted? I think we want before the statement which contains a tagged template. Thus if the tagged template is a part of another tagged template and another statement, it will be written above that statement.

The only problem or inconsistency I see with this approach is the case when a tagged template is used within the condition of a for (or while) loop. The idea is that a template array won't be reused but will be reassigned every time it's needed. But in a for loop, how will that work? Since if you'd emit it like I described it above, it will emit:

var a = ["Lorem", "Ipsum"];
a.raw = ["Lorem", "Ipsum"];
for (var i = 0; foo(a, i); i++) {}
// Ugly alternative:
var a;
for (var i = 0; a = ["Lorem", "Ipsum"],
a.raw = ["Lorem", "Ipsum"], foo(a, i); i++) {}

Is this an edge-case that doesn't matter?

@Arnavion
Copy link
Contributor

Arnavion commented Jan 9, 2015

Just do the same as destructuring:

var x, y;
for (; { x, y } = { x: 5, y: 6 };) {
}
var x, y;
for (; (_a = { x: 5, y: 6 }, x = _a.x, y = _a.y, _a);) {
}
var _a;
@DanielRosenwasser
Copy link
Member

I think we want before the statement which contains a tagged template. Thus if the tagged template is a part of another tagged template and another statement, it will be written above that statement.

I think that would actually be ideal and appropriate.

Just do the same as destructuring

Exactly

@ivogabe
Copy link
Contributor Author

ivogabe commented Jan 9, 2015

Ok, I've implemented the variable approach, except for the loops. Current codegen:

// Typescript
foo `${foo `A${1}B`}`;

// Javascript
var _a = ["", ""];
_a.raw = ["", ""];
var _b = ["A", "B"];
_b.raw = ["A", "B"];
foo(_a, foo(_b, 1));

Have I implemented the functionality in the right functions and files?

Also when you have a comment before a statement with a tagged template, the temporary variable will be emitted before the comment:

// Typescript

// Comment 
foo `${foo `A${1}B`}`;

// Javascript

var _a = ["", ""];
_a.raw = ["", ""];
var _b = ["A", "B"];
_b.raw = ["A", "B"];
// Comment
foo(_a, foo(_b, 1));

Can this be solved easily?

@@ -371,9 +372,25 @@ module ts {
function getDestructuringParameterName(node: Declaration) {
return "__" + indexOf((<SignatureDeclaration>node.parent).parameters, node);
}

function bindTaggedTemplateExpression(node: TaggedTemplateExpression) {
if (file.languageVersion >= ScriptTarget.ES6) return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Newline/curlies around the if body.

@Arnavion
Copy link
Contributor

For lambda expressions returning the tagged template string, the temporaries are generated outside the lambda instead of inside. Regular functions are fine:

var bar = () => foo`A${ 1 }B`;
(() => foo`C${ 2 }D`)();
var baz = function () { return foo`E${ 3 }F`; };
(function () { return foo`G${ 4 }H`; })();
var _a = ["A", "B"];
_a.raw = ["A", "B"];
var bar = function () { return foo(_a, 1); };
var _b = ["C", "D"];
_b.raw = ["C", "D"];
(function () { return foo(_b, 2); })();
var baz = function () {
    var _a = ["E", "F"];
    _a.raw = ["E", "F"];
    return foo(_a, 3);
};
(function () {
    var _a = ["G", "H"];
    _a.raw = ["G", "H"];
    return foo(_a, 4);
})();

_a and _b for the first two could be emitted inside the anonymous function instead, similar to the last two.

@Arnavion
Copy link
Contributor

This crashes the compiler:

function foo(...args) { }

class A {
    x = foo`A${ 1 }B`;
}
C:\Stuff\Sources\TypeScript\built\local\tsc.js:6576
            statement.downlevelTaggedTemplates.push(node);
                     ^
TypeError: Cannot read property 'downlevelTaggedTemplates' of undefined
    at bindTaggedTemplateExpression (C:\Stuff\Sources\TypeScript\built\local\tsc.js:6576:22)
    at bind (C:\Stuff\Sources\TypeScript\built\local\tsc.js:6691:21)
    at child (C:\Stuff\Sources\TypeScript\built\local\tsc.js:3185:24)
    at Object.forEachChild (C:\Stuff\Sources\TypeScript\built\local\tsc.js:3217:179)
    at bindChildren (C:\Stuff\Sources\TypeScript\built\local\tsc.js:6462:16)
    at bindDeclaration (C:\Stuff\Sources\TypeScript\built\local\tsc.js:6506:13)
    at bind (C:\Stuff\Sources\TypeScript\built\local\tsc.js:6608:21)
    at children (C:\Stuff\Sources\TypeScript\built\local\tsc.js:3194:34)
    at Object.forEachChild (C:\Stuff\Sources\TypeScript\built\local\tsc.js:3328:139)
    at bindChildren (C:\Stuff\Sources\TypeScript\built\local\tsc.js:6462:16)

(You can use these as tests even if you change the code per DanielRosenwasser's comments.)

Edit: Same with

function bar(x = foo`A${ 1 }B`) { }

and

class C {
    bar(x = foo`A${ 1 }B`) { }
}

but not with

var y = {
    bar: (x = foo`A${ 1 }B`) => undefined
};

var z = {
    bar(x = foo`A${ 1 }B`) { return undefined; }
};
@ivogabe
Copy link
Contributor Author

ivogabe commented Jan 12, 2015

@DanielRosenwasser Thanks for the feedback. I think the difficulty is that for destructuring, the variable is emitted in the current node (if it is a variable declaration) or before it (for example in a for loop), but for the tagged templates we need it before the node. For destructuring the variable is added to an array and when it's possible to emit these variable declarations it will emit them. The variable declarations are thus emitted after the destructuring node, but that doesn't matter since the variable will be hoisted.

For these templates we want the variables before the tagged template, since only the declaration, not the initialization will be hoisted. An alternative would be to emit

for (var i = 0; foo((_a = ["Lorem", "Ipsum"], _a.raw = ["Lorem", "Ipsum"], _a), i); i++) {}
var _a;

But we decided we didn't want that. So I think it's necessary to determine where those temporary variables should be emitted before the emit actually starts. Or am I missing something?

@Arnavion It searches for a statement that is the parent of the tagged template, but as you pointed out that isn't working always, since not all tagged templates have to have a statement as parent, or it might point to the wrong parent.

@DanielRosenwasser
Copy link
Member

@ivogabe assignment expressions permit binding patterns on the left-hand side as well, so destructurings are actually also an expression-level construct, and suffer from the same issue.

Currently the emit for those temps occurs at the end of the enclosing scope - this is permissible given JavaScript's hoisting semantics.

See emitTempDeclarations for where this occurs, but all you'll have to concern yourself with is ensureIdentifier and recordTempDeclaration in the emitter. Just extract those two functions out of emitDestructuring so that you can use them and the rest should follow.

@CyrusNajmabadi
Copy link
Contributor

Looks great! I just had a few nits to take care of (mostly about commenting functions to make their purpose and behavior clear).


// Newline normalization
text = text.replace(/\r\n?/g, "\n");
text = escapeString(text);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't these steps for the cooked strings?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh maybe these are the steps that uncook the string

@JsonFreeman
Copy link
Contributor

👍

Conflicts:
	tests/baselines/reference/taggedTemplateStringsTypeArgumentInference.js

tests/baselines/reference/taggedTemplateStringsWithOverloadResolution3.j
s

tests/baselines/reference/taggedTemplateStringsWithTypeErrorInFunctionEx
pressionsInSubstitutionExpression.js
	tests/baselines/reference/templateStringInObjectLiteral.js
@ivogabe
Copy link
Contributor Author

ivogabe commented Feb 23, 2015

Thanks for the feedback everyone! I've added some new commits, can you re-review it?

@CyrusNajmabadi That was indeed for the raw strings. I've added a comment to explain it. Is that clear enough?

@DanielRosenwasser
Copy link
Member

Odd, the baselines seem to have changed for a few files. I swear, we're almost there!

@ivogabe
Copy link
Contributor Author

ivogabe commented Feb 24, 2015

That's strange, on my machine they pass. I've run the tests again and they don't error. Any idea how that's happening?

@DanielRosenwasser
Copy link
Member

The tests pass on my system too, so try pulling from master and pushing again. Travis might just need to be triggered.

@ivogabe
Copy link
Contributor Author

ivogabe commented Feb 25, 2015

There were some baselines that changed after merging the master, should be fixed now. Does Travis pull commits from both the master as the branch that is tested?

@DanielRosenwasser
Copy link
Member

Yeah, I believe what happens is that Travis will run tests on a given branch and then on a hypothetically merged branch. Just waiting on @mhegazy to take a look at this change.

@DanielRosenwasser
Copy link
Member

Due to tests introduced by #2143, can you pull in from master and update the baselines?

@mhegazy
Copy link
Contributor

mhegazy commented Feb 26, 2015

👍

thanks @ivogabe

@ivogabe
Copy link
Contributor Author

ivogabe commented Feb 26, 2015

I've merged the master again. It looks like taggedTemplateStringsWithUnicodeEscapes is wrong, the \u before {1f4a9} is missing?

DanielRosenwasser added a commit that referenced this pull request Feb 26, 2015
@DanielRosenwasser DanielRosenwasser merged commit a77d39b into microsoft:master Feb 26, 2015
@DanielRosenwasser
Copy link
Member

No that's fine, it's meant to be broken until #2047 is fixed. Actually, I'm more concerned that we don't preserve string escapes in the output, but I'll take care of it there where we'll have easier means.

Well I think that's it - really big thanks for this feature! Our team, and I think the community, will definitely be happy to see this, and you've done a great job!

@microsoft microsoft locked and limited conversation to collaborators Jun 18, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
7 participants