[jitsi-dev] Syntax and plurals


#1

Hi folks,

I've been working on the Gaelic localization and I've got some questions.

1) There are strings like "are calling..." and "is calling...", "is writing a message" and so on. Is there a way I can force a placeholder into this because in Gaelic the word order is Verb-Subject-Predicate, which means a sentence like "Frank is calling" comes out as "is Frank calling" so the username at the start of the sentence is something that just can't be done, linguistically, even if I try to be very "creative". Given the languages already around, this problem has most likely already occurred - could someone let me know what the best solution is?

2) Plurals in the PO file seem to use an English 2-plural-form pattern (e.g. "one more unread conversation ..." vs "{1} more unread conversation..."), no matter how I modify the PO header. There are a lot of languages which use more complex patterns - including Gaelic which requires 4 plural forms. I probably just missed it - could someone let me know what I must modify to cover all the plural forms we need?

Thanks!

Michael


#2

I've been working on the Gaelic localization and I've got some questions.

1) There are strings like "are calling..." and "is calling...", "is
writing a message" and so on. Is there a way I can force a placeholder
into this because in Gaelic the word order is Verb-Subject-Predicate,
which means a sentence like "Frank is calling" comes out as "is Frank
calling" so the username at the start of the sentence is something that
just can't be done, linguistically, even if I try to be very "creative".
Given the languages already around, this problem has most likely already
occurred - could someone let me know what the best solution is?

Normally, a non-static text/message should look like "{0} is calling...",
where {0} would be replaced by the caller's name. There might be certain
strings where this pattern hasn't been followed and we need to fix that.
Please write an e-mail to the list with resource-name of such string and we
will fix them in the code.
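
As a rough illustration of the "{0} is calling..." pattern described above, here is a minimal sketch using java.text.MessageFormat. The class name, method name and the second pattern are made up for this example and are not taken from the Jitsi sources; the verb-first pattern simply mirrors the "is Frank calling" word order from the question.

```java
import java.text.MessageFormat;

// Hypothetical demo class; the patterns are illustrative, not Jitsi's actual resources.
class PlaceholderDemo {
    // Substitutes the caller's name for {0} in a (translated) pattern.
    static String render(String pattern, String name) {
        return MessageFormat.format(pattern, name);
    }

    public static void main(String[] args) {
        // Subject-first order, as in English.
        System.out.println(render("{0} is calling...", "Frank"));
        // Verb-first order: the translator moves {0}, the code stays unchanged.
        System.out.println(render("is {0} calling...", "Frank"));
    }
}
```

Because the name is part of the pattern rather than glued on in code, each translation can put the placeholder wherever its grammar needs it.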

2) Plurals in the PO file seem to use an English 2-plural-form pattern
(e.g. "one more unread conversation ..." vs "{1} more unread
conversation..."), no matter how I modify the PO header. There are a lot
of languages which use more complex patterns - including Gaelic which
requires 4 plural forms. I probably just missed it - could someone let
me know what I must modify to cover all the plural forms we need?

There are a few issues here which aren't easy at all to solve:
- We use Java's resource bundle technology which isn't aware of plural forms
- Even if PO could handle these situations correctly, they would be lost
during the conversion
- PO's capabilities for multiple plurals are generally limited

When you asked to open the Gaelic translation, we had a short discussion
about using a different localization system (in the code) to cover such
problems. The problem is that this would be a HUGE change and we currently
simply don't have the resources to handle this (well, maybe the "backend"
change, but not all the repetitive tasks of replacing the calls to the
backend).

I know that Polish also has four plurals, so maybe Paweł can help us out
here how this was done for the Polish translation?

Thanks!
Michael

Ingo


#3

On 29/07/2013 07:56, Ingo Bauersachs wrote:

Normally, a non-static text/message should look like "{0} is calling...", where {0} would be replaced by the caller's name. There might be certain strings where this pattern hasn't been followed and we need to fix that. Please write an e-mail to the list with resource-name of such string and we will fix them in the code.

Ok, I will collect them and mail them to the list. I'll finish most of the translation first so I can collect them all in one go, I'm already at 41% so it shouldn't take too long.

There are a few issues here which aren't easy at all to solve:
- We use Java's resource bundle technology which isn't aware of plural forms
- Even if PO could handle these situations correctly, they would be lost
during the conversion
- PO's capabilities for multiple plurals are generally limited

That's not true - at least the bit about PO. I guess it depends on how the source was prepared in terms of gettext (and I'm no expert) but PO is perfectly capable of handling all sorts of crazy plural rules. Generally what happens (from the translator's point of view) is that when you open a PO file which has a plural rule specified in the header, it offers the right number of fields for a given string. For example WordPress specifies the following in the header:
"Plural-Forms: nplurals=3; plural=n < 2 ? 0 : n == 2 ? 1 : 2;\n"

And the gettext looks like this:
#: wp-content/admin-plugins/wpcom-billing/class.my-upgrades.php:823
#: wp-content/mu-plugins/subscriptions.php:1238
msgid "%s month"
msgid_plural "%s months"
msgstr[0] "%s mhìos"
msgstr[1] "%s mhìos"
msgstr[2] "%s mìosan"
msgstr[3] "%s mìos"
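
To show how such a header rule picks a form at runtime, here is a small sketch in Java. It hard-codes the rule exactly as quoted in the header above (plural=n < 2 ? 0 : n == 2 ? 1 : 2), so only indices 0 to 2 are ever selected; the class and method names are invented for this illustration and this is not how gettext or Jitsi implements it.

```java
// Hypothetical demo class; it mirrors the quoted Plural-Forms rule by hand.
class PluralDemo {
    // Same expression as in the header: plural=n < 2 ? 0 : n == 2 ? 1 : 2
    static int pluralIndex(long n) {
        return n < 2 ? 0 : n == 2 ? 1 : 2;
    }

    // Picks the form for n and fills in the number.
    static String format(String[] forms, long n) {
        return String.format(forms[pluralIndex(n)], n);
    }

    public static void main(String[] args) {
        // msgstr[0..2] from the example above.
        String[] forms = {"%s mhìos", "%s mhìos", "%s mìosan"};
        for (long n : new long[] {1, 2, 3}) {
            System.out.println(format(forms, n));
        }
    }
}
```

A real implementation would parse the expression from the header instead of hard-coding it, which is the part a plain Java resource bundle does not provide.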

I can't speak for Java but somehow, given the complex plural systems in Slavonic languages, I can't imagine they're just forcing "All your bases is belong to us" type plurals on users of Russian and Polish?

When you asked to open the Gaelic translation, we had a short discussion
about using a different localization system (in the code) to cover such
problems. The problem is that this would be a HUGE change and we currently
simply don't have the resources to handle this (well, maybe the "backend"
change, but not all the repetitive tasks of replacing the calls to the
backend).

I know that Polish also has four plurals, so maybe Paweł can help us out
here how this was done for the Polish translation?

I understand the issue of resources - and if the plurals are currently broken, I understand that a fix may take a long time. But I think it's worthwhile considering the issue and dedicating a certain amount of developer time to it because it has a strong impact on the GUI and won't simply go away and the more localizations join, the more languages will fall foul of this.

Michael


#4

On 29/07/2013 07:56, Ingo Bauersachs wrote:

Normally, a non-static text/message should look like "{0} is
calling...", where {0} would be replaced by the caller's name. There
might be certain strings where this pattern hasn't been followed and
we need to fix that. Please write an e-mail to the list with
resource-name of such string and we will fix them in the code.

Ok, I will collect them and mail them to the list. I'll finish most of
the translation first so I can collect them all in one go, I'm already
at 41% so it shouldn't take too long.

Sure, that’s fine.

There are a few issues here which aren't easy at all to solve:
- We use Java's resource bundle technology which isn't aware of plural forms
- Even if PO could handle these situations correctly, they would be lost
during the conversion
- PO's capabilities for multiple plurals are generally limited

That's not true - at least the bit about PO. I guess it depends on how
the source was prepared in terms of gettext (and I'm no expert) but PO is
perfectly capable of handling all sorts of crazy plural rules.
Generally what happens (from the translator's point of view) is that when
you open a PO file which has a plural rule specified in the header, it
offers the right number of fields for a given string. For example
WordPress specifies the following in the header:
"Plural-Forms: nplurals=3; plural=n < 2 ? 0 : n == 2 ? 1 : 2;\n"

And the gettext looks like this:
#: wp-content/admin-plugins/wpcom-billing/class.my-upgrades.php:823
#: wp-content/mu-plugins/subscriptions.php:1238
msgid "%s month"
msgid_plural "%s months"
msgstr[0] "%s mhìos"
msgstr[1] "%s mhìos"
msgstr[2] "%s mìosan"
msgstr[3] "%s mìos"

I wasn't referring to such "easy" plurals, but rather something that handles "At DATE (he|she) wrote (one|N) message(s)". I know that PO can handle the cases you showed above.

I can't speak for Java but somehow, given the complex plural systems in
Slavonic languages, I can't imagine they're just forcing "All your bases
is belong to us" type plurals on users of Russian and Polish?

There are possibilities [1] to do that, but they are way too complex and don't handle all cases.

When you asked to open the Gaelic translation, we had a short discussion
about using a different localization system (in the code) to cover such
problems. The problem is that this would be a HUGE change and we currently
simply don't have the resources to handle this (well, maybe the "backend"
change, but not all the repetitive tasks of replacing the calls to the
backend).

I know that Polish also has four plurals, so maybe Paweł can help us out
here how this was done for the Polish translation?

I understand the issue of resources - and if the plurals are currently
broken, I understand that a fix may take a long time. But I think it's
worthwhile considering the issue and dedicating a certain amount of
developer time to it because it has a strong impact on the GUI and won't
simply go away and the more localizations join, the more languages will
fall foul of this.

Please don't get me wrong, I know that UI is a really important part, if not the most important. But we need help. Overall this is not really complicated once the infrastructure is right but rather a tedious task. We will probably have a look at it when it comes to the localization of the Android port.

Michael

Ingo

[1] http://docs.oracle.com/javase/tutorial/i18n/format/choiceFormat.html
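
For readers who have not used the approach in [1], here is a minimal sketch of a choice pattern inside java.text.MessageFormat. The pattern text is adapted from the "unread conversation" example earlier in the thread and uses {0} for the count; the class name and the pattern itself are assumptions for illustration, not an existing Jitsi resource.

```java
import java.text.MessageFormat;

// Hypothetical demo class showing a choice sub-format for English-style plurals.
class ChoiceDemo {
    // 0# -> exactly 0, 1# -> exactly 1, 1< -> anything greater than 1.
    static final String PATTERN =
        "{0,choice,0#no unread conversations"
        + "|1#one more unread conversation"
        + "|1<{0,number,integer} more unread conversations}";

    static String render(int count) {
        return MessageFormat.format(PATTERN, count);
    }

    public static void main(String[] args) {
        System.out.println(render(0));
        System.out.println(render(1));
        System.out.println(render(5));
    }
}
```

The choice format selects by numeric ranges only, so a rule that depends on remainders (as in Polish) or on scattered values would have to be spelled out range by range, which is presumably part of why it is described above as too complex.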


#5

Hi Ingo

On 29/07/2013 13:46, Ingo Bauersachs wrote:

I wasn't referring to such "easy" plurals, but rather something that handles "At DATE (he|she) wrote (one|N) message(s)". I know that PO can handle the cases you showed above.

Ah that's going into l20n, yes, I can see how that might get tricky without resorting to chopping up strings which is always a bad idea.

Please don't get me wrong, I know that UI is a really important part, if not the most important. But we need help. Overall this is not really complicated once the infrastructure is right but rather a tedious task. We will probably have a look at it when it comes to the localization of the Android port.

I did get you wrong it seems - my apologies. I run a lot of localization projects for our language and quite often you run into projects which give you a look like you said "let's send a spaceship through the sun" when you say "plural formatting", which is a bit frustrating. But the way you explained it makes sense and it's good that the UI is considered so important and that you will look at it in the future, so that's all good, I will do my best to work around for now.

Michael


#6

Ok, it's been a while but I got up to 87% and found all the strings which look like they should have a placeholder, so I'm posting them as Ingo asked me to:

#: service.gui.ARE_CALLING
msgid "are calling..."
msgstr ""

#: service.gui.AUTHORIZATION_ACCEPTED
msgid "contact has accepted your authorization request."
msgstr ""

#: service.gui.AUTHENTICATION_REJECTED
msgid "contact has rejected your authorization request."
msgstr ""

#: service.gui.IS_CALLING
msgid "is calling..."
msgstr ""

#: service.gui.PROACTIVE_NOTIFICATION
msgid "is writing a message"
msgstr ""

#: service.gui.RECEIVED
msgid "received"
msgstr ""

#: service.gui.STATUS_CHANGED_CHAT_MESSAGE
msgid "has become {0}"
msgstr ""

The other question I have is as follows - I have tried to stick to the doubling of '' but Gaelic uses the apostrophe a lot and I'm sure I missed some, simply because I'm really fast at translating and I tend to forget about the '' as it's a rather unnatural thing to have to do, counting ''. Is there some check that can be run over the po to flag up any I've missed?

Cheers,

Michael


#7

Hi Michael

Thank you very much for pointing these out and the Gaelic translation!

Ok, it's been a while but I got up to 87%

I updated the templates on Pootle - so Gaelic (along with a couple of other
languages) is now down again to 95%. I'm trying with our build maintainers
to get at least the Pootle-import (source code -> Pootle) automated.

and found all the strings
which look like they should have a place holder, so I'm posting them as
Ingo asked me to:

#: service.gui.ARE_CALLING
msgid "are calling..."
msgstr ""

#: service.gui.AUTHORIZATION_ACCEPTED
msgid "contact has accepted your authorization request."
msgstr ""

#: service.gui.AUTHENTICATION_REJECTED
msgid "contact has rejected your authorization request."
msgstr ""

#: service.gui.IS_CALLING
msgid "is calling..."
msgstr ""

Fixed.

#: service.gui.PROACTIVE_NOTIFICATION
msgid "is writing a message"
msgstr ""

This one is a bit tricky: the message is being used for the popup. The
contact's name is used as the title and the above message is the popup's
text/body. I left it as is.

#: service.gui.RECEIVED
msgid "received"
msgstr ""

Fixed.

#: service.gui.STATUS_CHANGED_CHAT_MESSAGE
msgid "has become {0}"
msgstr ""

Similar to the proactive notification: This is used when the chat window is
open and a peer changes its status. The peer's name is in the grey area
above the message. In any case, these messages are currently disabled as
they flooded the chat window (grab a Jitsi 2.0 if you want to experience
them...). Left as is.

The other question I have is as follows - I have tried to stick to the
doubling of '' but Gaelic uses the apostrophe a lot and I'm sure I
missed some, simply because I'm really fast at translating and I tend to
forget about the '' as it's a rather unnatural thing to have to do,
counting ''. Is there some check that can be run over the po to flag up
any I've missed?

Other than grepping with a fancy regex, I don't know of anything. French was
equally annoying...
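
One possible shape for such a regex check, as a rough sketch in Java: it flags msgstr lines (and their quoted continuation lines) that contain a straight apostrophe with no second apostrophe next to it. The class and method names are invented here, and the sketch assumes every translated string goes through MessageFormat, where a single ' starts a quoted section and '' produces a literal apostrophe. It would also flag intentional quoting such as '{', so treat hits as candidates to review.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper; not part of Jitsi or Pootle.
class ApostropheCheck {
    // A straight apostrophe with no apostrophe directly before or after it.
    private static final Pattern LONE = Pattern.compile("(?<!')'(?!')");

    // Returns 1-based line numbers of msgstr lines with a lone apostrophe.
    static List<Integer> suspectLines(List<String> poLines) {
        List<Integer> hits = new ArrayList<>();
        boolean inMsgstr = false;
        for (int i = 0; i < poLines.size(); i++) {
            String line = poLines.get(i);
            if (line.startsWith("msgstr")) {
                inMsgstr = true;             // start of a translation
            } else if (!line.startsWith("\"")) {
                inMsgstr = false;            // anything but a continuation line ends it
            }
            if (inMsgstr && LONE.matcher(line).find()) {
                hits.add(i + 1);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "msgid \"don't\"",
            "msgstr \"a'b\"",     // lone apostrophe: flagged
            "msgid \"x\"",
            "msgstr \"a''b\"");   // doubled: fine
        System.out.println(suspectLines(sample));
    }
}
```

To run it over a real file, the sample list could be replaced with the lines of the PO file, for example read via java.nio.file.Files.readAllLines.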

Cheers,
Michael

Ingo