[sip-comm-dev] [gsoc] news.google.com patch and RSS support objectives for interim report


#1

Hello everybody,
As noted a few times on the list, one of the problems with the current
RSS support implementation was the inability to retrieve feeds from
news.google.com. After a few packet dumps and telnet requests, I came to
the conclusion that, for some unknown reason, the User-Agent Java/x.y.z
(Java/1.6.0 on my machine) is blacklisted on the news.google.com
servers, so every request gets a 403 Forbidden response.
Fixing the problem is just a matter of changing the default User-Agent
that ROME sends along with the HTTP request. I found that the current
version string works just fine, but it can be anything you like as long
as it's not blacklisted :P
I've attached the patch file to this mail. I hope it's OK, as this is my
first patch ever :)
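
For readers without the attachment, here is a minimal sketch of the general idea using only java.net. It is not the attached patch and does not go through ROME; the class name, the User-Agent value and the exact feed URL are assumptions made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.net.URLConnection;

public class UserAgentSketch {
    // Hypothetical value: any string the server does not reject should do.
    static final String USER_AGENT = "ExampleFeedReader/1.0";

    /** Prepares (but does not yet open) a connection with a custom User-Agent. */
    static URLConnection open(String feedUrl) {
        try {
            URLConnection conn = URI.create(feedUrl).toURL().openConnection();
            // Replaces the JVM default, which looks like "Java/<version>".
            conn.setRequestProperty("User-Agent", USER_AGENT);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // No network traffic happens until the connection is actually read.
        URLConnection conn = open("http://news.google.com/");
        System.out.println(conn.getRequestProperty("User-Agent"));
    }
}
```

The header has to be set before the connection is opened; once the stream is read, the request has already gone out with whatever User-Agent was configured at that point.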

On a somewhat different matter I would like to talk a little about the
objectives I set for the interim deadline with Emil and Vincent. Very
briefly these would be:
  1. implementing the favicon contact image (as presented in my SoC
application).
  2. better HTML rendering (right now the information presented is
more or less usable due to a not so friendly rendering / choice of the
representative info)
  3. catching and squishing of any bugs found during 1 and 2 above :)

Have a nice day,
Mihai

news.google.com.patch (732 Bytes)


#2

Hi Mihai,

Good catch!
I have tested your patch and it works like a charm.

It's a good day :)

Vincent

PS: It works with Google at least, but other RSS feeds are still problematic with ROME, for example: http://www.freenews.fr/feeds/rss.php

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@sip-communicator.dev.java.net
For additional commands, e-mail: dev-help@sip-communicator.dev.java.net


#3

Hi Mihai,

Very nice catch indeed!

I've committed your fix (I only added the program name at the beginning of the User-Agent header) and acknowledged your contribution on the Team and Contributors page.

I am really looking forward to your future contributions! :)

Cheers
Emil



#4

Vincent Lucas wrote:

> PS: It works at least with google, but other RSS flows are still problematic with ROME like: http://www.freenews.fr/feeds/rss.php.

RSS feeds like the one above (version 0.91) have no date, so their entries are neither added nor processed.

See net.java.sip.communicator.impl.protocol.rss.RssFeedReader, line 110.

If we skip that "if" check it will work, but then the feed is retrieved every time we run sip-communicator, so
a chat window will pop up.
I think we must come up with some other approach for comparing RSS feeds, as there are cases where there is no modification date.

damencho



#5

You are right!
For the moment we are using:
- the published date.
- the title.
- the link (maybe we should replace this with the URI?).
In my humble opinion, it may be worth using the link/URI (which must be filled in for each feed entry and must be unique) for comparing RSS feeds.

What do you think?

Vincent



#6

Hi,
I just tested with freenews and got a grasp of the problem. I'll try
to come up with a solution ASAP after my last exam on Friday. I think
using the URI could do the job. Another thing I was thinking
about was using the HistoryService to avoid retrieving
the same feed twice (i.e. on different startups of SIP Communicator).
And, correct me if I'm wrong, but isn't there a way to search
through the conversation archives (for protocols like Yahoo, Jabber,
MSN, etc.)? (I think I saw a discussion on this matter on the list, but
I can't search for it right now.)
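
To make the "don't show the same entry again after a restart" idea concrete, here is a rough sketch. It does not use the HistoryService (its API is not shown in this thread); it simply keeps a set of already-seen entry keys in a plain text file. The class name, the file format and the key strings are all assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashSet;
import java.util.Set;

public class SeenEntriesSketch {
    private final Path store;
    private final Set<String> seen = new LinkedHashSet<>();

    /** Loads the keys remembered by a previous run, if the file exists. */
    SeenEntriesSketch(Path store) {
        this.store = store;
        try {
            if (Files.exists(store)) {
                seen.addAll(Files.readAllLines(store, StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true only the first time a key is offered, and saves it to disk. */
    boolean markIfNew(String key) {
        if (!seen.add(key)) {
            return false; // already shown in this run or an earlier one
        }
        try {
            // One key per line, so keys must not contain line breaks.
            Files.write(store, seen, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("seen-entries", ".txt");
        SeenEntriesSketch firstRun = new SeenEntriesSketch(file);
        System.out.println(firstRun.markIfNew("link:http://example.org/a"));
        // Simulates a restart: a fresh object reading the same file.
        SeenEntriesSketch secondRun = new SeenEntriesSketch(file);
        System.out.println(secondRun.markIfNew("link:http://example.org/a"));
    }
}
```

Rewriting the whole file on every new key is the simplest thing that works for small feeds; a real implementation would probably also want to expire old keys.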

Have a nice day,
Mihai

On 6/27/07, Vincent Lucas <lucas@clarinet.u-strasbg.fr> wrote:

> For the moment we are using:
> - the published date.
> - the title.
> - the link (maybe we have to replace this by the URI ?).
> In my humble opinion, it may be interesting to use the link/URI (which
> is obligatory filled for each feed's entry and must be unique) for
> comparing rss flows.
>
> What do you think?



#7

Hello Vincent,

We've been discussing this issue with the guys that previously worked on RSS, and couldn't find a satisfactory solution at the time.

Vincent Lucas wrote:

> - the link (maybe we have to replace this by the URI ?).
> In my humble opinion, it may be interesting to use the link/URI (which is obligatory filled for each feed's entry and must be unique) for comparing rss flows.

Are you certain that there is no way the same URL could be used twice in the same feed? That would really be terrific and save us a lot of trouble! I'd appreciate it if you could point us to the right

To tell you the truth, however, I am a bit sceptical that this would be enough. RFC 4287, for example, defines the atom:id element, which is the only one that is supposed to be unique (I think).

Other standards use other elements (I saw a guid tag in another syndication format, for example).

So, unfortunately, I think we'd have to handle these on a per-standard basis.

Emil



#8

Hello Emil,

> Are you certain that there is no way the same URL could be used twice in the same feed? That would really be terrific and save us a lot of trouble! I'd appreciate it if you could point us to the right

Having the same URL twice is frequent. The same URI should be less frequent, which still means that it might happen.

> To tell you the truth, however, I am a bit sceptical that this would be enough. RFC 4287, for example, defines the atom:id element which is the only one that is supposed to be unique (I think).
>
> Other standards use other elements (I saw a guid tag in another syndication format for example).
>
> So, unfortunately, I think we'd have to handle these on a per-standard basis.

You are surely right, but it is worth investigating the URI and the Dublin Core identifier first: "SyndEntryImpl.getDCModule().getIdentifier()".

Vincent



#9

Hi Vincent,

Vincent Lucas wrote:

> To have the same URL is frequent. The same URI must be less frequent, which means that it might happen.

I am not sure I see which tag you are talking about. I did see the getUri() method in ROME's SyndEntry class, but I don't see what it corresponds to in the feed. Take the RSS feed you gave as an example earlier:

http://www.freenews.fr/feeds/rss.php

Is there a URI tag in there?

> You are surely right, but it is worth investigating first for URI and for the Dublin Core identifier: "SyndEntryImpl.getDCModule().getIdentifier()".

Is this guaranteed to be non-null? Do you have an idea as to how it is constructed?

Cheers
Emil



#10

For the "freenews" feed, all fields are null, empty or unusable except for the following:
- SyndEntryImpl.getDescription().getValue() := Retrouvez depuis ce matin la 16ème édition d'Online sur Freen...
- SyndEntryImpl.getLink() := http://www.freenews.fr/nat/4974-freenews-online-16-a-l-antenne.html
- SyndEntryImpl.getTitle() := Online 16 à l'antenne
- SyndEntryImpl.getTitleEx().getValue() := Online 16 à l'antenne

This means that neither the URI nor the Dublin Core identifier can be used directly.
However, the web page "http://wiki.java.net/bin/view/Javawsxml/Rome05URIMapping" describes our problem and shows that the URIs are created on a per-standard basis but can still be null.
A possible solution is to use 3 or 4 fields in order of preference (if the 1st is null, use the 2nd; if the 2nd is null too, use the 3rd; and so on) to identify a feed entry:
1) URI
2) Date
3) URL
4) Title
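
The order of preference above could be sketched roughly as follows. This is only an illustration, not code from RssFeedReader: the class and method names and the "uri:"/"date:" prefixes are assumptions. With ROME, the four arguments would presumably come from SyndEntry's getUri(), getPublishedDate(), getLink() and getTitle().

```java
import java.util.Date;

public class EntryKeySketch {
    /**
     * Returns the first usable identifier in the order URI, date, URL, title,
     * or null if the entry has none of them.
     */
    static String entryKey(String uri, Date published, String link, String title) {
        // The prefix records which field was used, so that a title can never
        // be mistaken for a link with the same text.
        if (usable(uri)) {
            return "uri:" + uri.trim();
        }
        if (published != null) {
            return "date:" + published.getTime();
        }
        if (usable(link)) {
            return "link:" + link.trim();
        }
        if (usable(title)) {
            return "title:" + title.trim();
        }
        return null;
    }

    /** Treats null and blank strings alike as "not filled in". */
    private static boolean usable(String s) {
        return s != null && !s.trim().isEmpty();
    }

    public static void main(String[] args) {
        // The freenews entry: no URI and no date, so the link is used.
        System.out.println(entryKey(null, null,
            "http://www.freenews.fr/nat/4974-freenews-online-16-a-l-antenne.html",
            "Online 16"));
    }
}
```

One caveat with this order: a date alone does not distinguish two entries published at the same instant, so the date step may need to be combined with another field.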

Hope this helps,
Vincent


#11

That sounds reasonable, and I agree that it might be the only way we have to handle the issue. I also expect that we'll probably have to include other elements such as the atom:id or the guid (which I saw somewhere, but I don't remember where exactly).

Emil
