[jitsi-dev] Message history log file can get corrupted by invalid XML/HTML chars


#1

Hi,

The way message history is currently stored in Jitsi, it is possible to corrupt the message history file. Worse, once the file is corrupted it gets truncated: the XML is invalid, so it cannot be parsed and the existing content is not preserved. The root cause is that some characters are invalid in XML even when written as a numeric character reference (the &#123; notation), for example &#2;. See http://en.wikipedia.org/wiki/Character_encodings_in_HTML#Illegal_characters for a list of illegal characters.

I encountered this by accident: IRC uses control codes in the range 0-31, most of which are illegal in HTML and XML (XML 1.0 allows only tab, line feed and carriage return from that range). I currently drop these control characters, and once I have HTML formatting set up I will convert them to actual formatting.
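Dropping the illegal characters can be sketched as follows. This is a minimal illustration based on the "Char" production of XML 1.0, not Jitsi's actual code; the class and method names (XmlChars, isLegal, stripIllegal) are hypothetical.

```java
// Hypothetical helper, not part of Jitsi: filters text down to the
// characters that XML 1.0 allows in a document.
final class XmlChars {
    private XmlChars() {}

    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isLegal(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the text with every code point that XML 1.0 forbids removed. */
    static String stripIllegal(String text) {
        StringBuilder out = new StringBuilder(text.length());
        // codePoints() pairs up surrogates; unpaired ones fall in the
        // D800-DFFF gap and are filtered out as well.
        text.codePoints().filter(XmlChars::isLegal).forEach(out::appendCodePoint);
        return out.toString();
    }
}
```

Filtering on code points rather than on the 0-31 range alone also catches the other forbidden values (unpaired surrogates, U+FFFE, U+FFFF), which would otherwise cause the same kind of failure.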

The actual problem arises the next time Jitsi is started after having received such a message with illegal chars, as soon as you "get in contact" with the history file, for example by chatting again with the contact that sent the illegal character. While the history file is being opened, a parser exception is thrown and the history is truncated.
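The parser failure can be shown in isolation with the JDK's own DOM parser. This is a sketch under the assumption that the history file is read with a standard, conforming XML 1.0 parser; the class name CharRefCheck and the element names in the sample strings are made up for the illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: reports whether a piece of XML is accepted
// by the JDK's DOM parser.
final class CharRefCheck {
    private CharRefCheck() {}

    static boolean parses(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors without printing to stderr.
        builder.setErrorHandler(new DefaultHandler());
        try {
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Thrown, among other reasons, when a character reference
            // names a character that XML 1.0 does not allow.
            return false;
        }
    }
}
```

With this helper, a document containing &#123; (the character "{") is accepted, while one containing &#2; is rejected, because a character reference must still refer to a legal XML character.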

I believe this should be fixed in the general history processor (somewhere around MessageHistoryServiceImpl).

Danny


#2

Hey,

Can you send me a fragment of such a broken history XML file, so I can
take a look? I think we already escape some chars.

Thanks
damencho

On Sun, Feb 9, 2014 at 5:53 PM, Danny van Heumen <danny@dannyvanheumen.nl> wrote: (see #1)

_______________________________________________
dev mailing list
dev@jitsi.org
Unsubscribe instructions and other list options:
http://lists.jitsi.org/mailman/listinfo/dev


#3

Hi damencho,

See the attached jitsi.log file. I may have misunderstood from the error
message that the escaped char was already stored. (I didn't find it in
the XML file.) It does however truncate the log. Once I got that error,
the existing XML file was just an empty 'history' tag. I think that
isn't supposed to happen.
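One way to avoid ending up with only an empty 'history' tag is to move an unparsable file aside before starting a fresh one. The following is a sketch of that idea only; it is not how Jitsi loads history, and the names SafeHistoryLoader, loadOrQuarantine and the ".corrupt" suffix are assumptions made for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical loader: keeps a copy of a history file that fails to
// parse instead of silently replacing it with an empty document.
final class SafeHistoryLoader {
    private SafeHistoryLoader() {}

    static Document loadOrQuarantine(Path file)
            throws IOException, ParserConfigurationException {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler());
        try (InputStream in = Files.newInputStream(file)) {
            return builder.parse(in);
        } catch (SAXException e) {
            // Fall through: the stream is already closed at this point.
        }
        // Preserve the unparsable file next to the original location.
        Path backup = file.resolveSibling(file.getFileName() + ".corrupt");
        Files.move(file, backup, StandardCopyOption.REPLACE_EXISTING);
        // Start over with an empty <history> root element.
        Document fresh = builder.newDocument();
        fresh.appendChild(fresh.createElement("history"));
        return fresh;
    }
}
```

The messages stored before the bad record then remain recoverable from the ".corrupt" file, even though they are no longer shown in the client.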

I get this message immediately after I get a private message from the
user (gribble), when, I guess, the message log is first loaded and this
"malformed" message is received.

I can help with testing if needed. I only need to disable parsing the
message in order to get this raw formatting code. (And the response is
from a bot, so very predictable.)

Kind regards,
Danny

jitsi.log (2.77 KB)

On 02/10/2014 08:12 AM, Damian Minkov wrote: (see #2)


#4

Hi,

The message part of the record looks strange. It contains 3 CDATA
sections, while I think it is supposed to have only one.
Can you confirm the exact message coming from that bot?

Regards
damencho

On Tue, Feb 11, 2014 at 1:01 AM, Danny van Heumen <danny@dannyvanheumen.nl> wrote: (see #3)


#5

Hi damencho,

The message is basically: "#bitcoin: Beware of scams! Scammers are sending users private messages with bitcoin-stealing malware and offers to trade. We are unable to stop them, so *you must protect yourself*. NEVER download or run programs from strangers! When in doubt, ask the ops.".
(In case the HTML formatting doesn't come through, "you must protect yourself" is in bold.)

The IRC control char that indicates bold formatting is 0x02; a second 0x02 ends the bold formatting. I suspect that the closing and reopening of CDATA is caused by the HTML numeric character reference. I think CDATA is literal text, so a numeric reference can only be interpreted if it is placed outside a CDATA section. I haven't dug deep enough to see whether we explicitly open a new CDATA section, or whether this happens inside some (third-party) library.
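The toggle behaviour of 0x02 described above can be turned into HTML along these lines. It is a minimal sketch that handles only the bold code; the class name IrcBold and the choice of <b> tags are assumptions, and the IRC implementation in Jitsi may do this differently.

```java
// Hypothetical converter: maps the IRC bold toggle (0x02) to <b>...</b>
// and escapes the characters that are special in HTML.
final class IrcBold {
    private static final char BOLD = '\u0002';

    private IrcBold() {}

    static String toHtml(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        boolean bold = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == BOLD) {
                // Each occurrence flips the state: first opens, second closes.
                out.append(bold ? "</b>" : "<b>");
                bold = !bold;
            } else if (c == '&') {
                out.append("&amp;");
            } else if (c == '<') {
                out.append("&lt;");
            } else if (c == '>') {
                out.append("&gt;");
            } else {
                out.append(c);
            }
        }
        if (bold) {
            // An unmatched toggle runs to the end of the message.
            out.append("</b>");
        }
        return out.toString();
    }
}
```

Because the control character is replaced by markup, the resulting string no longer contains 0x02 and can be stored without triggering the problem discussed in this thread.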

Kind regards,
Danny

On 02/11/2014 08:07 AM, Damian Minkov wrote: (see #4)


#6

Hi,

Well, thanks, but I meant the chars rather than the text content: does it
contain CDATA, or is it HTML formatted?
Is it possible to connect somewhere and reproduce the issue myself?
That would be the easiest way to track it down.

Thanks
damencho

On Tue, Feb 11, 2014 at 8:28 PM, Danny van Heumen <danny@dannyvanheumen.nl> wrote: (see #5)


#7

Hi,

I just thought of something. This could make reproduction quite a bit
easier.

String messageText = "Hello \u0002World\u0002!";

Create a message and put this messageText variable as the message's
text/content. Fire that message as an incoming (received) message and
you should get the same issue.
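Outside of Jitsi, the same text can be pushed through a write/read round trip with the JDK's XML classes. This is only a stand-in for the real history code: the class HistoryRoundTrip, the element names and the use of a CDATA section are assumptions, and whether the raw text survives depends on the serializer in use, so the sketch just reports the outcome.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the history store: writes one message as
// CDATA and reads it back.
final class HistoryRoundTrip {
    private HistoryRoundTrip() {}

    /** Serializes the text inside <history><msg><![CDATA[...]]></msg></history>. */
    static String write(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element root = doc.createElement("history");
        Element msg = doc.createElement("msg");
        msg.appendChild(doc.createCDATASection(text));
        root.appendChild(msg);
        doc.appendChild(root);
        StringWriter out = new StringWriter();
        TransformerFactory.newInstance().newTransformer()
            .transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    /** Parses the XML again and returns the text of the root element. */
    static String read(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler());
        return builder.parse(new InputSource(new StringReader(xml)))
            .getDocumentElement().getTextContent();
    }

    /** True if the text comes back unchanged; false on any failure. */
    static boolean survives(String text) {
        try {
            return text.equals(read(write(text)));
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        String messageText = "Hello \u0002World\u0002!";
        System.out.println(write(messageText));
        System.out.println("survives: " + survives(messageText));
    }
}
```

Printing the serialized form makes it possible to inspect how the serializer treats the control characters, for instance whether it splits the CDATA section around them.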

Danny

On 02/11/2014 07:47 PM, Damian Minkov wrote: (see #6)


#8

Hey,

I just committed a fix so that the history file no longer breaks. Those
control chars will still be lost, but at least the messages will not
be.
Could you please test with the latest build/source?

Regards
damencho

On Thu, Feb 13, 2014 at 12:49 AM, Danny van Heumen <danny@dannyvanheumen.nl> wrote: (see #7)


#9

Hi damencho,

Great. Losing the control characters is not a big issue. Especially
since I've already got some code in place that should either drop them
or create a formatted HTML message anyway. So either way, I *shouldn't*
hit that code. But ... since the implementation is still very new, I
wouldn't be surprised if I missed some codes, so better safe than sorry.

Danny

On 02/18/2014 01:08 PM, Damian Minkov wrote: (see #8)