[jitsi-dev] Message history log file can get corrupted by invalid XML/HTML chars


Hi damencho,

I know. It is quite literally what I said. The message is plain text, content type text/plain, encoding UTF-8. The chars are literally the content I explained, including the 0x02 chars surrounding the bold text.

If you want to reproduce, you can do the following:

Get the following code: https://github.com/cobratbq/jitsi/commit/a20f31c4e279ae2001dcf55afbd2d66744c08745
and make the following change (see the diff for the correct method): make Utils.parse(String text) simply return the original text value, which prevents the control codes from being dropped. Search for the usages of Utils.parse to find the 2 locations where received messages are processed.

Then create an IRC account, connect to chat.freenode.org and join channel #bitcoin. Wait (very shortly) until ‘gribble’ automatically sends you that message.

Kind regards,
Danny

----- Original Message -----

From: Damian Minkov [mailto:damencho@jitsi.org]

To: danny@dannyvanheumen.nl

Cc: dev@jitsi.org

Sent: Tue, 11 Feb 2014 20:47:11 +0200

Subject: Re: [jitsi-dev] Message history log file can get corrupted by invalid XML/HTML chars

Hi,

Well, thanks. I meant not the text content but the chars … does it
contain CDATA, or is it HTML formatted?
Is it possible to connect somewhere and reproduce the issue myself?
That would be the easiest way to track it down.

Thanks
damencho

On Tue, Feb 11, 2014 at 8:28 PM, Danny van Heumen danny@dannyvanheumen.nl wrote:

Hi damencho,

The message is basically: "#bitcoin: Beware of scams! Scammers are sending
users private messages with bitcoin-stealing malware and offers to trade. We
are unable to stop them, so you must protect yourself. NEVER download or run
programs from strangers! When in doubt, ask the ops.".
(In case the html formatting doesn’t come through, “you must protect
yourself” is in bold.)

The IRC control char to indicate bold formatting is 0x02, and a second 0x02
indicates the end of bold formatting. I suspect that the closing and
reopening of CDATA may be due to the HTML numeric character reference. I
think that CDATA is literal text, so a numeric reference can only be placed
outside a CDATA section if it is to be interpreted. I haven’t dug deep
enough to see whether we explicitly open a new CDATA section, or whether
this happens inside some (third party) library.
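
The point about CDATA being literal text can be checked with the JDK's own DOM parser. The following is only a minimal sketch: the class name CdataReferenceDemo, the helper textOf and the msg element are hypothetical and are not taken from the Jitsi history code. It shows that a numeric character reference inside a CDATA section stays as literal text, while the same reference placed between two CDATA sections is decoded.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of Jitsi.
final class CdataReferenceDemo {

    /** Parses the XML and returns the text content of the root element. */
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Inside CDATA the reference is not interpreted.
        System.out.println(textOf("<msg><![CDATA[a&#66;c]]></msg>"));
        // Outside CDATA the reference &#66; is decoded to the letter B.
        System.out.println(textOf("<msg><![CDATA[a]]>&#66;<![CDATA[c]]></msg>"));
    }
}
```

A writer that wants a reference to be interpreted therefore has to close the CDATA section, emit the reference and open a new section, which would be consistent with a record that ends up with 3 CDATA sections.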

Kind regards,
Danny

On 02/11/2014 08:07 AM, Damian Minkov wrote:

Hi,

The message part of the record looks strange. It contains 3 CDATA
sections, while I think it is supposed to have only one.
Can you confirm the exact message coming from that bot?

Regards
damencho

On Tue, Feb 11, 2014 at 1:01 AM, Danny van Heumen danny@dannyvanheumen.nl wrote:

Hi damencho,

See the attached jitsi.log file. I may have misunderstood from the error
message that the escaped char was already stored. (I didn’t find it in
the XML file.) It does however truncate the log. Once I got that error,
the existing XML file was just an empty ‘history’ tag. I think that
isn’t supposed to happen.

I get this message immediately after I get a private message from the
user (gribble), when, I guess, the message log is first loaded and this
“malformed” message is received.

I can help with testing if needed. I only need to disable parsing the
message in order to get this raw formatting code. (And the response is
from a bot, so very predictable.)

Kind regards,
Danny

On 02/10/2014 08:12 AM, Damian Minkov wrote:

Hey,

can you send me a fragment of such a broken history XML file, so I can
take a look? I think we already escape some chars.

Thanks
damencho

On Sun, Feb 9, 2014 at 5:53 PM, Danny van Heumen danny@dannyvanheumen.nl wrote:

Hi,

The way message history is stored in Jitsi currently, it is possible to
corrupt the message history file. Also, when the history file gets
corrupted, the file gets truncated, because the XML is invalid and therefore
isn’t parsed correctly and preserved. The root cause for this is that there
are still some chars that are invalid, even when written as a numeric
character reference (the &#123; notation). See
http://en.wikipedia.org/wiki/Character_encodings_in_HTML#Illegal_characters
for a list of illegal characters.
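
One possible way to guard the history writer against such chars is to filter every message against the "Char" production of the XML 1.0 specification before it is stored. This is a minimal sketch under that assumption: XmlCharFilter and its methods are hypothetical names, and nothing here is taken from MessageHistoryServiceImpl.

```java
// Hypothetical helper, not part of Jitsi.
final class XmlCharFilter {

    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns the text with every code point that XML 1.0 forbids removed. */
    static String stripInvalidXmlChars(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints()
            .filter(XmlCharFilter::isValidXmlChar)
            .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // 0x02 is the IRC bold toggle described in this thread.
        String raw = "so \u0002you must protect yourself\u0002. NEVER";
        System.out.println(stripInvalidXmlChars(raw));
    }
}
```

Dropping the chars loses the formatting information, so this only protects the file; whether the history should instead store an escaped form is a separate design decision.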

I encountered this by accident as IRC has some control codes in the range
0-31, most of which is illegal in XML (XML 1.0 allows only tab, line feed
and carriage return from that range). I currently drop these control
characters, and once I get HTML formatting set up I will convert them to
actual formatting codes.
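
The conversion to actual formatting codes could look roughly like the sketch below. It handles only the bold toggle (0x02) mentioned in this thread, escapes the HTML special chars, and drops every other control char. IrcBoldToHtml and toHtml are hypothetical names and this is not the code from the IRC plugin.

```java
// Hypothetical helper, not part of Jitsi.
final class IrcBoldToHtml {

    private static final char BOLD = '\u0002';

    /** Converts IRC bold toggles to b tags and escapes HTML special chars. */
    static String toHtml(String ircText) {
        StringBuilder html = new StringBuilder();
        boolean bold = false;
        for (int i = 0; i < ircText.length(); i++) {
            char c = ircText.charAt(i);
            if (c == BOLD) {
                // The same code opens and closes bold formatting.
                html.append(bold ? "</b>" : "<b>");
                bold = !bold;
            } else if (c == '&') {
                html.append("&amp;");
            } else if (c == '<') {
                html.append("&lt;");
            } else if (c == '>') {
                html.append("&gt;");
            } else if (c >= 0x20 || c == '\t') {
                html.append(c);
            }
            // Any other control char is dropped, as is done currently.
        }
        if (bold) {
            // Close a bold run that the sender left open.
            html.append("</b>");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.println(toHtml("so \u0002you must protect yourself\u0002. NEVER"));
    }
}
```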

The actual problem arises when Jitsi is started the next time after having
received such a message with illegal chars, and you “get in contact” with
the history file. For example, by again chatting with the same contact that
sent you the illegal character previously. While the history file is being
opened, a parser exception is thrown and history is truncated.
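
The parser exception itself is easy to reproduce outside Jitsi with the JDK's DOM parser. This is a minimal sketch: HistoryParseDemo, parses and the history/msg element names are hypothetical stand-ins, not the real history file layout.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of Jitsi.
final class HistoryParseDemo {

    /** Returns true if the XML is well-formed, false if the parser rejects it. */
    static boolean parses(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new ByteArrayInputStream(
                xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<history><msg>hello</msg></history>"));
        // 0x02 written as a numeric character reference.
        System.out.println(parses("<history><msg>&#2;</msg></history>"));
        // Raw 0x02 inside a CDATA section.
        System.out.println(parses("<history><msg><![CDATA[a\u0002b]]></msg></history>"));
    }
}
```

Both forms of the 0x02 char are rejected, so a reader that rewrites the file after a failed parse would start from an empty document.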

I believe this should be fixed in the general history processor. (Somewhere
around MessageHistoryServiceImpl)

Danny


dev mailing list
dev@jitsi.org
Unsubscribe instructions and other list options:
http://lists.jitsi.org/mailman/listinfo/dev

