[sip-comm-dev] Report about work on RSS protocol for SC


#1

Hi everybody!

After many hours of work (most of them spent understanding the philosophy of SC and its existing implementation, and how NetBeans works too!), we have reached the goal of our project, which was to add an RSS feed reader to SC. You can read more about this project here <http://www.sip-communicator.org/index.php/Development/SupportForRSSFlows>.

At this time, and according to our tests, the main features we chose to implement are fully working. You can find them in chapter III, "Our implementation". We haven't committed our code to CVS yet: we would rather you look at our work directly first (that is the purpose of this mail). So try it!

In the following lines we briefly explain how to use our implementation (we assume you know the SC world very well, much better than we do!).

_*I/ How to try our RSS protocol implementation in SC*_

1/ Download the latest SIP Communicator sources from java.net with a CVS checkout (e.g. from NetBeans),

2/ save the archive attached to this mail (rss4sc.tar.gz) in your SC main folder and extract it there,

3/ modify some configuration files according to the instructions in the "modif.txt" file contained in the archive,

4/ compile the new project (rebuild target) with Ant or in NetBeans,

5/ run SC, and try to add an RSS account...

_*II/ RSS: what is it?*_

A good definition from Wikipedia:

*"RSS* (an acronym for Really Simple Syndication) is a family of web feed <http://en.wikipedia.org/wiki/Web_feed> formats used to publish frequently updated digital content, such as blogs <http://en.wikipedia.org/wiki/Blog>, news feeds or podcasts <http://en.wikipedia.org/wiki/Podcasts>. RSS is analogous to a table of contents. An RSS "feed" provides a table of contents for a site's content for a certain period of time; it does not provide the content itself, but links to the content. RSS is useful because it helps aggregate lots of content into an easily accessible place."

This is the theory; the reality is less simple. There is currently no single standard for this "protocol": there are RSS 1.x, RSS 2.x and the newer Atom. Without going into detail, the IETF has finally standardized Atom (RFC 4287), but the historical RSS is not dead!
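To make the format zoo a bit more concrete: the families can usually be told apart by the root element of the document, `<rss>` for RSS 0.91/0.92/2.0, an RDF `<rdf:RDF>` root for RSS 1.0, and `<feed>` in the Atom namespace for Atom 1.0. INFORMA handles this for us, so the sketch below is not part of our implementation; `FeedFormatSniffer` is a hypothetical helper that uses only the XML parser bundled with the JRE.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

// Hypothetical helper, not part of rss4sc: guesses the feed family from the root element.
class FeedFormatSniffer {
    enum Format { RSS_1, RSS_2, ATOM, UNKNOWN }

    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    /** Returns the feed family, or UNKNOWN if the text can't be parsed or isn't recognized. */
    static Format detect(String xml) {
        Element root;
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed to check the RDF and Atom namespaces
            root = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            return Format.UNKNOWN;
        }
        String name = root.getLocalName();
        String ns = root.getNamespaceURI();
        if ("rss".equals(name)) {
            return Format.RSS_2; // <rss version="0.91|0.92|2.0">: the "RSS 2.x" family
        }
        if ("RDF".equals(name) && RDF_NS.equals(ns)) {
            return Format.RSS_1; // RSS 1.0 documents are RDF
        }
        if ("feed".equals(name) && ATOM_NS.equals(ns)) {
            return Format.ATOM;  // Atom 1.0 (RFC 4287)
        }
        return Format.UNKNOWN;
    }
}
```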

With that in mind, we started the project by testing several applications built on different RSS stacks, to determine which would best suit the SC environment. Then we learned to use NetBeans and got a general understanding of SC.

_*III/ Our implementation*_

Attachments: rss4sc.tar.gz (2.35 MB), rss4sc.pdf (117 KB)

We won't explain our code in detail here; it is entirely based on the Gibberish protocol. We simply implemented a new RSS Protocol Provider Service using the INFORMA library <http://informa.sourceforge.net/>. We implemented the RSS functionality partly by modifying some existing files (their comments give a more precise overview). Here is the main role of each in our implementation:

    * ContactRssImpl.java: /*defines the main characteristics of an RSS feed*/

    * OperationSetBasicInstantMessagingRssImpl.java: /*manages the most
      useful features of our protocol, like retrieving a feed,
      creating/stopping the timer, sending messages...*/

    * OperationSetPersistentPresenceRssImpl.java: /*what we do when
      adding a new feed/contact*/

    * RssStatusEnum.java: /*defines the two statuses, ONLINE and OFFLINE*/

    * ProtocolProviderServiceRssImpl.java: /*starts or stops the timer
      when the status changes*/

and the new files we created:

    * RssThread.java: /*what the thread does when it is launched,
      automatically or manually*/

    * RssTimerRefreshFeed.java: /*what the timer does when it expires*/

    * RssFeedReader.java: /*manages all the actions supported on an RSS
      feed (retrieving the feed, parsing it, extracting the interesting
      data, creating the message to send...)*/

Now, here are the actions available when running SC with RSS support:

    * add an RSS account. You can add only one RSS account (there is
      no use for more), and you don't need to specify an ID or a password.

    * add one or more groups, as with many of the other protocols
      implemented in SC.

    * add as many RSS feeds as you want. These feeds are treated as
      contacts. When a feed is added, it is automatically retrieved and
      parsed. If the feed really exists at the specified URL, the
      display name of your new feed/contact becomes the title of the
      retrieved feed.

    * switch between two statuses: ONLINE and OFFLINE.

      /_ONLINE mode_/:

            Entering this status actually starts a timer, which
            periodically (every 10 minutes) launches a new thread.

            This thread goes through the existing feeds/contacts and,
            for each one, tries to retrieve the corresponding .xml file,
            parses it and posts a message to the current account.

      /_OFFLINE mode_/:

            Entering this status simply stops the timer. You can still
            (as in ONLINE status, for that matter) refresh any single
            feed/contact you want: click on it and send anything, just
            as you would send a standard message with another protocol.
            This actually launches a new thread, just like the timer
            does, but only for that specific contact/feed.
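The ONLINE/OFFLINE behaviour above can be sketched as follows. This is a minimal illustration assuming a `java.util.Timer`; the class `RssRefreshScheduler` and its methods are hypothetical and are not the actual RssTimerRefreshFeed/RssThread code. The `Consumer` stands in for "retrieve, parse and post one feed", and we also assume the first automatic refresh happens one period after going ONLINE, which the description above doesn't specify.

```java
import java.util.List;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical sketch of the ONLINE/OFFLINE refresh logic, not the rss4sc classes.
class RssRefreshScheduler {
    /** Refresh period described in the mail: every 10 minutes. */
    static final long TEN_MINUTES_MS = 10 * 60 * 1000L;

    private final List<String> feedUrls = new CopyOnWriteArrayList<>();
    private final Consumer<String> refreshFeed; // retrieve, parse and post one feed
    private final long periodMs;
    private Timer timer; // non-null only while ONLINE

    RssRefreshScheduler(Consumer<String> refreshFeed, long periodMs) {
        this.refreshFeed = refreshFeed;
        this.periodMs = periodMs;
    }

    void addFeed(String url) {
        feedUrls.add(url);
    }

    /** ONLINE: start a timer whose every expiry launches a new thread over all feeds. */
    synchronized void goOnline() {
        if (timer != null) {
            return; // already ONLINE
        }
        timer = new Timer("rss-refresh-timer", true);
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                new Thread(() -> feedUrls.forEach(refreshFeed), "rss-refresh").start();
            }
        }, periodMs, periodMs);
    }

    /** OFFLINE: just stop the timer; manual refresh keeps working. */
    synchronized void goOffline() {
        if (timer != null) {
            timer.cancel();
            timer = null;
        }
    }

    synchronized boolean isOnline() {
        return timer != null;
    }

    /** Manual refresh ("send anything" to a feed): one new thread for that feed only. */
    Thread refreshOne(String url) {
        Thread thread = new Thread(() -> refreshFeed.accept(url), "rss-manual-refresh");
        thread.start();
        return thread;
    }
}
```

In SC the period would be `TEN_MINUTES_MS`; a short period is only useful for trying the sketch out.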

/_Receiving a message:_/

When a thread successfully retrieves an RSS feed, the RSS stack (i.e. the INFORMA <http://informa.sourceforge.net/> library) returns all the data available for this feed, such as its title and all its items, each with its own attributes: title, link, date. See the JavaDoc of this library here <http://informa.sourceforge.net/apidocs/index.html> for more details.

In our implementation, we use these item dates to decide which items are new and therefore whether to post them to the user. There are several cases:

    * sometimes the library can't parse the date because the XML file
      doesn't follow a standard format (like the L'Equipe
      <http://www.lequipe.fr/> website). In this case, we simply post
      the last ten items retrieved.

    * when a new feed/contact is added, no date has been stored yet, so
      we also post the last ten items retrieved.

    * when we retrieve a feed and a date is available, we find the most
      recent item, take its date and store it as persistent data in the
      contactlist.xml file, simply by firing the listeners via the
      method "fireContactPropertyChangeEvent()".

    * when SC starts, the metaContactList loads the last update date of
      each contact from this file. We use this date as the reference
      for the first query of the current session, and update the
      persistent data as needed.
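The four cases above boil down to a small selection rule. Here is a minimal sketch of it, not the code of ContactRssImpl. It assumes a hypothetical `FeedItem` type whose date is `null` when the library couldn't parse it, treats a feed as "undated" if any item lacks a date, and reads "the last ten items retrieved" as the first ten items in document order (feeds usually list their newest items first). It returns both the items to post and the date to persist as the new reference.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "which items are new?" rule; names are illustrative only.
class NewItemSelector {
    /** How many items to post when dates can't be used (the "last ten items"). */
    static final int FALLBACK_COUNT = 10;

    /** Hypothetical parsed item; date is null when the library could not parse it. */
    static final class FeedItem {
        final String title;
        final String link;
        final Instant date;

        FeedItem(String title, String link, Instant date) {
            this.title = title;
            this.link = link;
            this.date = date;
        }
    }

    /** What to post now, and the reference date to persist for the contact. */
    static final class Selection {
        final List<FeedItem> toPost;
        final Instant newReference; // null if no usable date was ever seen

        Selection(List<FeedItem> toPost, Instant newReference) {
            this.toPost = toPost;
            this.newReference = newReference;
        }
    }

    /** lastUpdate is the date stored in contactlist.xml, or null for a newly added feed. */
    static Selection select(List<FeedItem> items, Instant lastUpdate) {
        boolean allDated = items.stream().allMatch(item -> item.date != null);
        if (!allDated) {
            // Non-standard dates: post the first ten items and keep the old reference.
            return new Selection(firstItems(items), lastUpdate);
        }
        Instant newest = newestDate(items);
        if (lastUpdate == null) {
            // New feed/contact: nothing stored yet, so post the first ten items.
            return new Selection(firstItems(items), newest);
        }
        List<FeedItem> fresh = new ArrayList<>();
        for (FeedItem item : items) {
            if (item.date.isAfter(lastUpdate)) {
                fresh.add(item);
            }
        }
        Instant reference = (newest != null && newest.isAfter(lastUpdate)) ? newest : lastUpdate;
        return new Selection(fresh, reference);
    }

    private static List<FeedItem> firstItems(List<FeedItem> items) {
        return new ArrayList<>(items.subList(0, Math.min(FALLBACK_COUNT, items.size())));
    }

    /** Most recent date among items that all carry a date; null for an empty list. */
    private static Instant newestDate(List<FeedItem> items) {
        Instant newest = null;
        for (FeedItem item : items) {
            if (newest == null || item.date.isAfter(newest)) {
                newest = item.date;
            }
        }
        return newest;
    }
}
```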

The method getPrintedFeed() called on a ContactRssImpl gives us a String containing all the items newer than this reference date. Finally, we just need to send this new message via the method fireMessageReceived() of the current OperationSetBasicInstantMessagingRssImpl.
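Once the new items are known, building the message is plain string assembly. The sketch below is hypothetical: the layout (feed title, then one `* title <link>` line per item) is our assumption and not the actual output of getPrintedFeed(), and the `Consumer` merely stands in for fireMessageReceived(). It also skips delivery when nothing new arrived, which is a design choice of the sketch.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: turn new items into one message and hand it to a delivery callback.
class PrintedFeed {
    /** Hypothetical minimal item: only what the message needs. */
    static final class Item {
        final String title;
        final String link;

        Item(String title, String link) {
            this.title = title;
            this.link = link;
        }
    }

    /** Builds one message: the feed title, then one "* title <link>" line per new item. */
    static String print(String feedTitle, List<Item> newItems) {
        if (newItems.isEmpty()) {
            return "";
        }
        StringBuilder message = new StringBuilder(feedTitle).append('\n');
        for (Item item : newItems) {
            message.append("* ").append(item.title)
                   .append(" <").append(item.link).append(">\n");
        }
        return message.toString();
    }

    /** Calls deliver (standing in for fireMessageReceived()) only if there is something new. */
    static boolean deliverIfNew(String feedTitle, List<Item> newItems, Consumer<String> deliver) {
        String message = print(feedTitle, newItems);
        if (message.isEmpty()) {
            return false;
        }
        deliver.accept(message);
        return true;
    }
}
```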

That's all!

/_Specific notes on the date_/

Even if you subscribe to RSS feeds on servers elsewhere in the world, i.e. in other time zones, we work only with OUR local time: the parser gives us the date of each item converted to our local time. So if you see a difference between the time of an item (e.g. in Firefox's page source view of an RSS URL) and the one saved in your contactlist.xml, don't panic, that's normal! All date comparisons are made in our own time zone (GMT, GMT+1, GMT-1, ...).
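A small illustration of why this conversion is harmless: dates are compared as absolute instants, and the time zone only changes how the same instant is written. The sketch parses RSS 2.0 pubDate values with the JDK's `DateTimeFormatter.RFC_1123_DATE_TIME` (java.time requires Java 8 or later); the class name `FeedDates` is hypothetical, and this is not how INFORMA parses dates internally. Dates the formatter rejects come back empty, which corresponds to the "non-standard date, post the last ten items" case described earlier.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical helper: parse feed dates to zone-free instants, display them in any zone.
class FeedDates {
    /**
     * Parses an RSS 2.0 pubDate such as "Sun, 27 May 2007 12:00:00 +0200".
     * Returns empty for formats the RFC 1123 parser rejects.
     */
    static Optional<Instant> parsePubDate(String text) {
        try {
            OffsetDateTime parsed = OffsetDateTime.parse(text.trim(), DateTimeFormatter.RFC_1123_DATE_TIME);
            return Optional.of(parsed.toInstant()); // absolute point in time, zone-free
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    /** Renders an instant in a given zone, e.g. the user's; the instant itself is unchanged. */
    static ZonedDateTime inZone(Instant instant, ZoneId zone) {
        return instant.atZone(zone);
    }
}
```

For example, "Sun, 27 May 2007 12:00:00 +0200" and "Sun, 27 May 2007 10:00:00 GMT" parse to the same instant, so comparing them gives the same answer whichever zone you later display them in.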

For more details, have a look at the code itself and its comments.

_*IV/ What remains to be done*_

At this time, we have implemented the RSS protocol on top of a single stack, INFORMA. We did test several other existing stacks, in particular ROME, and kept the one that seemed best to us. This evaluation wasn't exhaustive, so you may find a better one in the future and build a more relevant implementation.

As the number of secured servers is increasing rapidly, another thing that would be interesting to implement is reading and parsing secured feeds; we didn't have enough time to study this question.

Since our code is based on the Gibberish protocol, some code that is useless for RSS may remain: don't hesitate to remove it!

Finally, I think we haven't really respected the SC coding conventions, so please be indulgent: you can fix that yourself too!

_*Conclusion*_

So please test this implementation (I'm sure there are many bugs in it!) and report all bugs/comments/suggestions to us or to our tutor (Emil Ivov).

Thanks to the contributors on the dev mailing list, and... sorry for my bad English!

Bye.

Javes.


#2

Hello Jean-Albert,

This is some very good work! I've been testing it over the past few days and it works really well! I'd really like us to include it in the main source tree. There is, however, one thing that bothers me:

SIZE!

The whole plugin has a size of 1.6MB, which is a bit too much. AFAICS the heftiest chunk is Xerces, so I am wondering whether we could try to get rid of it. Would it be too much work to make INFORMA use the XML libs integrated in the JRE instead of using Xerces directly?
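For a rough idea of what the JRE already offers: since Java 1.4 the JRE ships a JAXP DOM parser, so reading the basic fields of an RSS 2.0 feed needs no extra jar. The sketch below is hypothetical (the class name `JdkRssReader` is ours); it does not show how to make INFORMA itself switch parsers, and it ignores RSS 1.0 and Atom.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: read RSS 2.0 items with the JAXP parser that ships with the JRE.
class JdkRssReader {
    /** One RSS 2.0 <item>; missing elements become empty strings. */
    static final class Item {
        final String title;
        final String link;
        final String pubDate;

        Item(String title, String link, String pubDate) {
            this.title = title;
            this.link = link;
            this.pubDate = pubDate;
        }
    }

    /** Uses the configured JAXP implementation; on a plain JRE that is the built-in parser. */
    static List<Item> readItems(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse feed", e);
        }
        NodeList nodes = doc.getElementsByTagName("item");
        List<Item> items = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element item = (Element) nodes.item(i);
            items.add(new Item(firstText(item, "title"), firstText(item, "link"), firstText(item, "pubDate")));
        }
        return items;
    }

    /** Text of the first descendant element with this tag, or "" if there is none. */
    private static String firstText(Element parent, String tag) {
        NodeList found = parent.getElementsByTagName(tag);
        return found.getLength() == 0 ? "" : found.item(0).getTextContent().trim();
    }
}
```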

WDYT?

If you don't have the time to do this yourself, then perhaps someone else might take it (Mihai, you still interested?)

Emil

Jean-Albert Vescovo wrote:


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@sip-communicator.dev.java.net
For additional commands, e-mail: dev-help@sip-communicator.dev.java.net



#3

Hello!
As I've already told Emil, I'd be interested in further maintaining the
RSS support and eventually porting it to the version of Xerces included in
the JRE, once I'm done with my exams :)

Best regards,
Mihai


On 5/27/07, Emil Ivov <emcho@emcho.com> wrote: