Inserting locally connected device data into chat

Let me paint a picture. You are in a chat with another person at some remote location. Sitting next to that person is a robotic arm (or other device… in this case it’s an open source industrial robotic arm called Dexter, from Haddington Dynamics, Inc.). The robot arm is connected to the computer via the local network, over CAT5, because low latency is critical. The remote person gives you permission to move their robotic arm and goes off to do other things. Sitting next to you is another robotic arm. You put the local arm into “follow me” mode and take hold of it. Your local robot arm follows your movements. Those movements are transmitted to your local computer, then out through the chat of the Jitsi meeting (or some other channel if available) and into the computer on the remote end, where they are routed to the remote robot arm, making it move. From your local position, you move things around and do useful work on the remote end. If the robot on the remote end runs into something, it sends back data about the forces it is experiencing, and those are used to drive the motors in the local arm, so that you can feel the responses.

The missing bit is routing the data from the robot arm into the chat at this end, and out of the chat into the robot arm at that end. All the rest of that paragraph already exists, except the sentences about sending the movements through the chat. We can already control one (or many) robot arms with another on the local, very low latency network, including stunningly realistic haptic feedback; we are just missing the ability to go over the internet.

There are two ways I can imagine this being done:

  1. We host Jitsi ourselves and modify the web pages it serves so that the browser reaches out to the robots via WebSockets. We have a browser / WebSocket interface already. However: there are cross-site scripting issues. Bigger issue: this limits the computational work on the robot’s motion to what can be accomplished in the browser.

  2. We have a VERY advanced IDE already, which installs as an Electron app and has gobs of code in it for kinematics, which can really help make predictions about robot movement to reduce latency and provide a better experience. If that application can connect to the room as if it were a separate user, and send PMs to its remote counterpart, then no change to Jitsi would be required. The human operator can see / hear the remote PC via video / audio, and move the arm via text chat PMs between the applications.

Can you advise me on how to allow an application which is not a web browser to join a meeting and send PM texts to another participant? Or in general, how would you implement transfer of data, other than video / audio, between devices attached to PCs?

For that, just XMPP over BOSH will work. Just join the MUC myroomname@conference.mydomain.com and send the message to the participant you want. Choose language and libraries to your taste; just make sure they support XMPP and BOSH… and if the connections to your server will not go over the public internet, you can open port 5222, connect directly, and skip BOSH.

Not sure what a “muc” is… “Multi User Chat”? So if we are browsing

https://meet.jit.si/massmind

then our program would connect to massmind@meet.jit.si ?

Is there any documentation or example of the XMPP message we would compose and send? I assume that would be a very basic XMPP message, just specifying the destination “user” (the remote application) and the content?

We typically work in node.js / Electron. Is this a good NPM package to try?

https://www.npmjs.com/package/strophe.js

Is that all we need?

Pardon the newbie questions, but none of us have worked with XMPP or BOSH or Jitsi before now. Our lead developer from MIT has a lot of experience with NPM and has written a really nice development environment for the robot called DDE (Dexter Development Environment), so I’m sure he can figure it out with a few more hints.

The document seems to be unreachable at the moment :) https://xmpp.org/extensions/xep-0045.html has many examples.
MUC is Multi User Chat, and from https://meet.jit.si/config.js you can see the address of the MUC: ‘conference.meet.jit.si’. So if you open https://meet.jit.si/myroom, that will be room myroom@conference.meet.jit.si.
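
To put that mapping in code, here is a tiny Node sketch (it just assumes the ‘conference.’ MUC domain published in config.js, and lowercases the room name):

// Sketch: derive the MUC room JID from a Jitsi Meet URL,
// assuming the MUC service is 'conference.' + host per /config.js
function roomJid(meetUrl) {
  const { host, pathname } = new URL(meetUrl) // e.g. 'meet.jit.si', '/myroom'
  return pathname.slice(1).toLowerCase() + "@conference." + host
}

console.log(roomJid("https://meet.jit.si/myroom")) // myroom@conference.meet.jit.si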


I have one question about the from= when joining a room (it’s called out below, where I walk through joining). I’m including my understanding of everything here in case you want to check it over and make sure I have it right.

So what I get is that they have a format of:
room@service/nick where “nick” is the username in the room, “room” is one specific meeting room or identifier for a video call, and “service” is the host domain. So if it were “James Newton” in
https://meet.jit.si/massmind then they would say
massmind@meet.jit.si/James%20Newton
Probably best to avoid a nick with a space…

They call room@service/nick the JID. I’m guessing that JID is “Jabber ID” since this used to be called Jabber.

“A user enters a room (i.e., becomes an occupant) by sending directed presence to <room@service/nick>.”

“An occupant exits a room by sending presence of type “unavailable” to its current <room@service/nick>.”

Section 6.4 is how to ask a room what features it supports. We want to make sure it’s not private and doesn’t require a login. Not sure this is needed, at least initially.

Section 7.2.1 shows how to join a room:

<presence
    from='hag66@shakespeare.lit/pda'
    id='n13mt3l'
    to='coven@chat.shakespeare.lit/thirdwitch'>
  <x xmlns='http://jabber.org/protocol/muc'/>
</presence>

So the <x xmlns='http://jabber.org/protocol/muc'/> just says “I’m speaking your language”. The to= is in the room@service/nick form, which I think means that a user known as “hag66@shakespeare.lit/pda”, who wants to be known as “thirdwitch”, is joining a MUC room called “coven” on the “chat.shakespeare.lit” server. Apparently the from field is required? But on Jitsi there is no user signup, so do we just make that up? (That’s my open question.)

I think the id is just a random string so that replies can be matched to requests and messages don’t get misrouted.
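
If I have that right, then building that same join stanza from strophe.js might be as simple as this sketch (conn standing for an already-connected Strophe.Connection; my understanding is the server stamps from= with our session address anyway, so the client can leave it off):

// Sketch: join the room by sending directed presence to room@service/nick
// (conn is an already-connected Strophe.Connection; the server fills in from=)
var $pres = require("strophe.js").$pres

var join = $pres({to: 'coven@chat.shakespeare.lit/thirdwitch'})
    .c('x', {xmlns: 'http://jabber.org/protocol/muc'}) // "I'm speaking MUC"
conn.send(join)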

Section 7.5 shows how to send a private message to another occupant of the MUC:

<message
    from='wiccarocks@shakespeare.lit/laptop'
    id='hgn27af1'
    to='coven@chat.shakespeare.lit/firstwitch'
    type='chat'>
  <body>I'll give thee a wind.</body>
  <x xmlns='http://jabber.org/protocol/muc#user' />
</message>

So here, it appears that the from= is the user’s real JID, but the server replaces it with their nick in the room (wiccarocks had joined as “secondwitch”) before sending the message on to the to= user. The to= starts with the current room@service, and the nick of the intended destination just gets added to that. So the message gets sent to the nick “firstwitch”, which the server happens to know is actually crone1@shakespeare.lit/desktop:

<message
    from='coven@chat.shakespeare.lit/secondwitch'
    id='hgn27af1'
    to='crone1@shakespeare.lit/desktop'
    type='chat'>
  <body>I'll give thee a wind.</body>
  <x xmlns='http://jabber.org/protocol/muc#user' />
</message>
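
If that reading is right, sending the first of those stanzas from strophe.js might look like this sketch (again, conn stands for an already-joined Strophe.Connection, and the server rewrites our from=):

// Sketch: private message to another occupant, addressed to room@service/nick.
// The MUC service rewrites from= to our own room nick before delivering it.
var $msg = require("strophe.js").$msg

var pm = $msg({to: 'coven@chat.shakespeare.lit/firstwitch', type: 'chat'})
    .c('body').t("I'll give thee a wind.").up()
    .c('x', {xmlns: 'http://jabber.org/protocol/muc#user'})
conn.send(pm)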

It’s possible that this is all we need to know about the system. Next we just need to figure out how BOSH works.

Oh, hang on… I missed the “conference.” part.

So we would use:

<presence
    from='Dexter@hdrobotic.com/DEX-12345'
    id='n13mt3l'
    to='massmind@conference.meet.jit.si/Dexter'>
  <x xmlns='http://jabber.org/protocol/muc'/>
</presence>

But we still connect to meet.jit.si, right? We are trying to run this code (again, basically node.js):

// doc at: http://strophe.im/strophejs/doc/1.3.0/files/strophe-js.html
var Strophe = require("strophe.js").Strophe

//var conn = new Strophe.Connection("/http-bind/") //the local server???
var conn = new Strophe.Connection("https://meet.jit.si/massmind")
console.log(conn)

//conn.getUniqueId() //probably needed in sending a message?

var send_stanza = Strophe.xmlElement("presence",
              {from: 'Dexter@hdrobotic.com/DEX-12345',
              id: conn.getUniqueId(), //'n13mt3l',
              to: 'massmind@conference.meet.jit.si/dde' 
                 //dde is the "user name", massmind is the "room"
              })
conn.send(send_stanza)

//maybe this is for listening to incoming message
conn.addHandler(function(stanza) {
   console.log("got: " + stanza)
   return true //returning true keeps the handler installed
   }
   //more args to select when this handler is called
)

conn.disconnect("reason to disconnect")

When we run this, the conn object’s connected attribute is false, which seems wrong.

conn.send doesn’t error, but the callback function is not called.

That is wrong; that is not a BOSH connection address.

Looks like we were also missing calling the .connect method on the conn object.

So… should it be “meet.jit.si/http-bind”? When we use that, we get:
1 CONNECTING
3 AUTHENTICATING
2 CONNFAIL
6 DISCONNECTED

or “meet.jit.si/http-pre-bind”? When we use that, we get:
1 CONNECTING
2 CONNFAIL
6 DISCONNECTED

In each case, this is the code we are trying to use; we are not even trying to join a room yet, just trying to get connected.

var conn = new Strophe.Connection("https://meet.jit.si/http-bind")
conn.connect('massmind@meet.jit.si/cfry', //fake JID, OK?
             "", //don't need a password?
             function(status_code) { //status_code is a small non-negative integer
                   console.log("connected with status: " + status_code + " " +
                               strophe_status_code_to_name(status_code)) //our own helper
             })

We understood that user names are not really required for Jitsi, so the “massmind@meet.jit.si/cfry” is just made up, and we don’t understand why authentication fails.
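
One guess we have not tried yet: the Strophe docs say SASL ANONYMOUS is used when the JID has no user part, so maybe the trick is to connect with just the bare domain and no password, something like:

// Sketch of our next guess: meet.jit.si allows anonymous login, and Strophe
// picks SASL ANONYMOUS when the JID is a bare domain with no user part
var conn = new Strophe.Connection("https://meet.jit.si/http-bind")
conn.connect("meet.jit.si", null, function(status_code) {
    if (status_code === Strophe.Status.CONNECTED) {
        console.log("connected anonymously as: " + conn.jid) //server-assigned JID
    }
})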

Is there any documentation for opening a BOSH connection on jit.si?

We are attempting to add support for Jitsi to a robot control IDE so that robots can be controlled in Jitsi calls for remote access and collaboration. This will enable medical treatment over the internet with a FOSS robot arm and may well save lives.

So far, after reading the (massive) XMPP spec and the (massive) documentation available for Strophe.js, the above is where we are. Any help would be deeply appreciated.

Were you able to get the system up and running?

I am investigating using the Jitsi video feed and pipeline over the internet for some telepresence / rover group / school projects during the extended stay-away-from-people COVID exercises. I am currently testing Jitsi as a video-only feed on Android or Raspberry Pi while controlling robot movements via RemoteXY on Android talking to local ESP32 WiFi devices. My goal is a web interface, with Jitsi only, for both tasks.

If you did get it running, do you have any notes on the control latency and how well it syncs with the video latency?

Thanks

We were not able to get it to work. The code is very tied to the browser environment, so using it in any other environment is quite difficult.

We ended up doing our own thing, implementing a very simplified BOSC system on a free Google app server project. With that, we had round-trip latencies of 120 to 600 ms. Video was run separately.

Thank you for the results.

Curious about the browser environment difficulties… was that a case where proficiency with coding in that format was the main issue, or did you determine it to be difficult and cumbersome even for those who are very experienced?

Basically, there was no interface to go from a non-browser environment into the code to send a chat or other data message. None that we could find, anyway. E.g. you could launch a call, and then the browser environment would take over; there was a user interface for humans to type in chat messages, but no way for the code that launched that call to send a chat message.

We could probably eventually have figured out how to add that interface, but having to fire up an entire browser just to send text seemed silly. BOSC is (or can be) VERY lightweight and simple, and super fast even on a free server, so it didn’t seem worth the extra trouble to keep the video together with it. We could probably have sent compressed video data over BOSC as well, although we haven’t.

In the end, we think that standard video compression isn’t all that good for robot teleoperation anyway, because doing a 3D scan at the robot end and transmitting back voxels and textures ultimately uses less bandwidth, at least when you aren’t focused on a human moving around.

E.g. if the robot is moving in a 3D environment that isn’t changing, and the motion of the robot can be subtracted from the image, then no data other than the movement needs to be sent, and an accurate camera image from the robot can still be produced at the human end. Producing the same estimated camera image at the robot end, then comparing it with the actual image, means we only need to send the difference, and only once.
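
As a sketch of that idea (every name here is hypothetical; renderPredicted() would be driven by the 3D scan plus the robot’s known pose):

// Sketch of the frame-difference idea. actual and predicted are same-sized
// grayscale pixel buffers; we ship only the pixels the prediction got wrong.
function frameDelta(actual, predicted, threshold) {
  var changed = [] // [index, newValue] pairs worth transmitting
  for (var i = 0; i < actual.length; i++) {
    if (Math.abs(actual[i] - predicted[i]) > threshold) {
      changed.push([i, actual[i]])
    }
  }
  return changed // empty when the prediction was good: nothing to send
}

// e.g. frameDelta(cameraFrame, renderPredicted(currentPose), 8)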

James, thanks for the extra details.

I will take a look at the BOSC approach and consider it when I get around to finding some assistance with this project.

If you have any video of your arm in action that is sharable, that would be cool to check out.

Good luck!

If you want more info about the BOSC setup, see: