Skip to content

Conversation

@dpchamps
Copy link
Contributor

@dpchamps dpchamps commented Feb 24, 2022

Ref: #359

It was discovered that when compiling against locale variants (e.g. en-US, es-MX etc etc...) output is generated against expectations.

For example

const MessageFormat = require("@messageformat/core");
const compileMessageModule = require("@messageformat/core/lib/compile-module");

const messagePacks = {
  "en-US": {
    utilsDate: "{date, date, ::EEEMMMd}"
  },
  "es-MX": {
    utilsDate: "{date, date, ::EEEMMMd}"
  },
};

const mf = new MessageFormat("*");
const result = compileMessageModule(mf, messagePacks);

Where result is

const date_en_EEEMMMd_tmb11z = (function() {
  var opt = {"weekday":"short","month":"short","day":"numeric"};
  var dtf = new Intl.DateTimeFormat("en", opt);
//------------------------------------^ this is problematic
  return function(value) { return dtf.format(value); }
})();

export default {
  "en-US": {
    utilsDate: (d) => date_en_EEEMMMd_tmb11z(d.date)
  },
  "es-MX": {
    utilsDate: (d) => date_en_EEEMMMd_tmb11z(d.date)
  }
}

The expectation is instead

const date_en_EEEMMMd_tmb11z = (function() {
  var opt = {"weekday":"short","month":"short","day":"numeric"};
  var dtf = new Intl.DateTimeFormat("en", opt);
  return function(value) { return dtf.format(value); }
})();

const date_es_EEEMMMd_m7z124 = (function() {
  var opt = {"weekday":"short","month":"short","day":"numeric"};
  var dtf = new Intl.DateTimeFormat("es", opt);
  return function(value) { return dtf.format(value); }
})();

export default {
  "en-US": {
    utilsDate: (d) => date_en_EEEMMMd_tmb11z(d.date)
  },
  "es-MX": {
    utilsDate: (d) => date_es_EEEMMMd_m7z124(d.date)
  }
}

Variants are an expected part of locale strings, so it makes sense to support them inside of imports. This bug was caught in our codebase because we want to maintain parity with the icu variant spec.

The patch proposed here is (i think) pretty un-controversial: if a given plural key can be normalized, perform the lookup based on the normalized value.

for (const key of Object.keys(src)) {
const pl = (plurals && plurals[key]) || plural;
const normalizedPluralsKey = key.length > 2 ? normalize(key) : key;
const pl = (plurals && plurals[normalizedPluralsKey]) || plural;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered initially making this a double fallback:

const pl = (plurals && plurals[normalizedPluralsKey]) || (plurals && plurals[key]) || plural;

But could not determine whether this covers a desirable fallback path. Could use guidance here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if 1) normalization is to be trusted, and 2) only normalized values are to be supported, then this appears to be a safe refactor

@dpchamps
Copy link
Contributor Author

dpchamps commented Feb 24, 2022

I'm not quite sure how to address the test failures... They seem to be unrelated to my changeset and passing locally under the same node versions, is there known flakyness here?

It looks like there are some tests failing on master under some versions of node, see: 2bf89a7 for proposed fix

const result: StringStructure = {};
for (const key of Object.keys(src)) {
const pl = (plurals && plurals[key]) || plural;
const normalizedPluralsKey = key.length > 2 ? normalize(key) : key;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we just normalize 💯 % of the time? seems like this length check is already implemented there. may be nice to dedupe it

Copy link
Contributor Author

@dpchamps dpchamps Feb 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

normalize will throw if given unexpected input, so no (given that there will be single length keys at some point, we need to ensure that it's a candidate for normalization). Unless we want to catch here, which seems less desirable. Or, we could duplicate the normalize function to be more permissive.

This to me seems like the simplest solution: if the key can be normalized, normalize it. Otherwise use it as-is. Suppose could move to a separate fn, maybeGetNormalizedValue, but it seems like overkill.

'Dhuʻl-Hijjah 2, 1426',
'Dhuʻl-Hijjah 2, 1426 AH'
'Dhuʻl-Hijjah 2, 1426 AH',
'2 Dhuʻl-Hijjah 1426 AH'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, this looks like the right fix for this test. Would you be ok if I cherry-picked this commit separately from the rest of the PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please do :)

Copy link
Member

@eemeli eemeli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it may be a breaking change. Consider this case, and how it works at the moment:

// input
const  mf = new MessageFormat(['en', 'no']) // English & Norwegian
compileModule(mf, { 'no-one': '{var, plural, one{one} other{other}}' })
// output
import { en } from "@messageformat/runtime/lib/cardinals";
import { plural } from "@messageformat/runtime";
export default {
  "no-one": (d) => plural(d["var"], 0, en, { one: "one", other: "other" })
}

With normalized as well as exact keys getting tested, that message would end up using Norwegian rather than English pluralisation rules. That's a breaking change.

The current documented behaviour is that:

If the messageformat instance has been initialized with support for more than one locale, using a key that matches the locale's identifier at any depth of a messages object will set its child elements to use that locale.

which in your case ends up interacting with the '*' locale identifier:

If locale has the special value '*', it will match all available locales. This may be useful if you want your messages to be completely determined by your data, but may provide surprising results if your input message object includes any 2-3 character keys that are not locale identifiers.

The documentation bug here is that the latter part refers to "locale identifiers" and only implicitly specifies that to really mean "BCP-47 primary language subtag".

While I agree that we ought to find a solution for your use case, I think this change shouldn't be it.

@dpchamps
Copy link
Contributor Author

dpchamps commented Feb 27, 2022

@eemeli Yes, I was concerned that there may be some edge cases like this. Makes total sense.

Ideas for how we might be able to find a solution that works without a breaking change:

  1. implement a canonicalization procedure (level 1 should be sufficient)
    a. Alternatively, just continue extract the key without the canonicalization procedure via normalize (like I'm doing here in this PR)
  2. determine if country code / variant is valid
  3. If so, select the locale identifier from the parsed input
  4. (optionally) put this behind a configuration flag, something like supportLocaleKeyVariants

That way, by ensuring that the country code / variant is in the set of valid codes if would be very difficult to accidentally specify a key that would be misinterpreted by a variant.

Supporting a configuration flag would prevent any accidents at all from happening.

However, this would require an additional set of all valid country codes / variants...

@dpchamps
Copy link
Contributor Author

dpchamps commented Mar 9, 2022

Hey @eemeli hope you're well. Do you have any thoughts wrt #360 (comment)

Apologies for the bump. We have some work that is currently blocked by this and I'm more than happy to contribute to the codebase if it makes sense. If not, totally understandable-- I can find a workaround in the interim. Thanks so much!

@cdaringe
Copy link
Contributor

hey @eemeli, hope all is well. re-reading thru this issue i'm still trying to degrok things, but curious if you'd be willing to give @dpchamps or myself any concrete guidance here for a solution you'd be interested in supporting. Thanks for your time and consideration

@eemeli
Copy link
Member

eemeli commented Mar 11, 2023

Closing this in favour of #386.

@eemeli eemeli closed this Mar 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants