LibJitsi BouncyCastle FIPS port


#1

I have recently started working on porting LibJitsi to work with the BC-FIPS version of BouncyCastle.

So far, I have managed to build the VideoBridge with my modified libjitsi jar, deploy a server, and, with much effort, get as far as receiving a DTLS-SRTP feed and stats via the DtlsPacketTransformer.

Sometimes everything seems to work correctly: fingerprints are verified, keys are exchanged, packets begin flowing, and a peer connection is established on the client.

The issue is that it fails intermittently, and in general I am seeing lockups inside BouncyCastle, particularly in the clientHandshake method of DTLSClientProtocol, which is called from connect in the runInConnectThread method of DtlsPacketTransformer.

src/org/jitsi/impl/neomedia/transform/dtls/DtlsPacketTransformer.java:1016

In all likelihood this has more to do with the BouncyCastle FIPS implementation than with Jitsi, but I imagine you guys have a lot of experience debugging the VideoBridge and these sorts of things, and I was hoping someone here might have some insight or ideas about what exactly might be causing this, or how to debug it.

So far I have been using jdb on my server to trace through libjitsi, and it seems to be stopping in the clientHandshake portion, sometimes indefinitely. I suspect it would be easier to debug by looking at the network side of things, and I thought perhaps you guys already had some tools or mechanisms for doing this.

I seem to remember there is a way to dump a pcap file or something along those lines?

Also, I was wondering if anyone else, or the Jitsi team, would be interested in this work?

So far it is only a proof of concept at best, and will need some cleaning up once it’s working entirely.

Any help or pointers would be greatly appreciated.


#2

During some rewrite work I noticed that the version of bouncycastle we’re using is deprecated and I considered updating it, but decided to table it for now to focus on getting the other work in before going back to that…that’s about as close as I’ve come to any bouncycastle work. That being said, I did change the way we interact with it quite a bit in the new code. If there’s a chance the lockup you’re seeing is happening due to some libjitsi threading issues, then it may be fixed (or, at least, broken in a different way :)) in the new code. I also have some easy harnesses in the new code to feed pcaps in; I haven’t used them for interacting with the DTLS code but they’d serve as good examples on how to do it.

Not sure how much that helps, since things aren’t done yet…but maybe it will help you isolate some things to determine where the problems lie.

The “new” libjitsi is here. Here and here are some examples of feeding pcap files in.


#3

Have you looked at the handshake in wireshark?


#4

@bbaldino what efforts beyond setting the version to 1.60 would be needed?


#5

I think I probably did when I was debugging getting the handshake working, but I’m afraid that was a while back. EDIT: lol oops, obviously this wasn’t directed towards me.

You mean to integrate a newer bouncycastle version in the new libjitsi? I’m not sure. It’s possible the deprecated API we’re using has been removed completely, so it would have to be replaced. Either way I imagine you’d want to change a fair amount of the code in the dtls dir (DtlsStack and the other classes).


#6

Yes. I did compile jvb today with bouncycastle 1.60. It does produce the deprecation warnings on AESFastEngine. Not sure if it works though, will try it eventually.


#7

@bbaldino, actually the idea that threads could be stepping on one another in some way had crossed my mind, so it’s useful to hear you say that is a possibility.

As far as I can tell, most of the access to datagramTransport and the dtlsClientProtocol object is done in the connectThread that is created as part of the runInConnectThread method of DtlsPacketTransformer.

Is there any higher-level documentation or an architecture map for libjitsi that might help me conceptualize what might be causing thread contention?

@jesse I haven’t tried WireShark yet… I suppose if it is in the client handshake process, I should be able to see the exchange unencrypted.

@bbaldino, I will take a look at your port, and maybe use parts of it. I think the BC-FIPS library I am using exposes either a somewhat older or somewhat newer API than standard BouncyCastle; many things had to be changed with regards to certificates and getting hold of block ciphers, and maybe some of this work is related.

Thanks for your quick reply, and helpful suggestions.


#8

There is no diagram for that code, I’m afraid. Things do get complicated with regards to threads once the handshake is done, but you’re right that things should be handled by the connect thread there. The new versions might be a little easier to read, so maybe it will help things make sense a little. Agree that wireshark could be interesting: maybe the bridge is re-sending packets and not getting responses? Or getting packets and sending no response? etc.


#9

So, it’s not a Wireshark trace, but I did put some print statements in DatagramTransportImpl at src/org/jitsi/impl/neomedia/transform/dtls/DatagramTransportImpl.java:290

And in doSend at: src/org/jitsi/impl/neomedia/transform/dtls/DatagramTransportImpl.java:172

For now they just print that something was sent, along with the size of the packet.
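In case it’s useful to anyone else doing this: logging the first few bytes of each packet in addition to the size makes these traces much easier to read, since the first byte of a DTLS record is its content type (22 = handshake, 20 = change_cipher_spec, 23 = application data). A hypothetical helper (the class and method names are mine, not libjitsi’s):

```java
// Hypothetical sketch: describe a packet by its length and the hex of
// its first bytes, so traces like "DO SEND 145" also show the DTLS
// record content type and version.
public class PacketLogSketch {
    static String describe(byte[] packet, int length) {
        StringBuilder sb = new StringBuilder();
        sb.append("len=").append(length).append(" head=");
        int n = Math.min(length, 4);
        for (int i = 0; i < n; i++) {
            sb.append(String.format("%02x", packet[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 0x16 = handshake record, 0xFEFD = DTLS 1.2 on the wire
        byte[] handshakeRecord = { 0x16, (byte) 0xFE, (byte) 0xFD, 0x00 };
        System.out.println(describe(handshakeRecord, handshakeRecord.length));
        // prints: len=4 head=16fefd00
    }
}
```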

In the scenario where the handshake succeeds and doesn’t lock:

JVB 2019-01-16 22:28:36.720 WARNING: [2127] org.jitsi.impl.neomedia.transform.dtls.DTLSClientProtocolWrapper.connect() JRT: Before Client Handshake.
JVB 2019-01-16 22:28:36.721 INFO: [2127] org.jitsi.impl.neomedia.transform.dtls.DatagramTransportImpl.log() JRT: DO SEND 145
JVB 2019-01-16 22:28:36.768 INFO: [2118] org.jitsi.videobridge.Channel.log() CAT=stat transport_connected,conf_id=fd2b29be2ef27ce0,content=video,ch_id=1f45958587dfbb48,endp_id=33380132,stream=116559061
JVB 2019-01-16 22:28:36.792 INFO: [2118] org.jitsi.videobridge.Channel.log() CAT=stat transport_connected,conf_id=fd2b29be2ef27ce0,content=data,ch_id=f61f33a85a9a3a18,endp_id=33380132
JVB 2019-01-16 22:28:36.794 INFO: [170] org.jitsi.impl.neomedia.transform.dtls.DatagramTransportImpl.log() JRT: QUEUE RECEIVE 658
JVB 2019-01-16 22:28:36.800 INFO: [2127] org.jitsi.impl.neomedia.transform.dtls.DtlsControlImpl.log() TLS Signing Algorithm: 1.2.840.10045.4.3.2
JVB 2019-01-16 22:28:36.812 INFO: [170] org.jitsi.impl.neomedia.transform.dtls.DatagramTransportImpl.log() JRT: QUEUE RECEIVE 658
JVB 2019-01-16 22:28:36.813 INFO: [2127] org.jitsi.impl.neomedia.transform.dtls.DtlsControlImpl.log() REMOTE Fingerprint Algorithm: sha-256, CA:0B:B8:D0:1E:12:50:46:93:69:FD:63:EE:59:94:CF:98:3C:88:57:BD:3E:8D:C6:81:0F:D2:E5:EB:B3:67:AC
JVB 2019-01-16 22:28:36.815 INFO: [2127] org.jitsi.impl.neomedia.transform.dtls.DtlsControlImpl.log() Searching for digest algorithm from SHA256WITHECDSA

JVB 2019-01-16 22:28:36.897 INFO: [170] org.jitsi.impl.neomedia.transform.dtls.DatagramTransportImpl.log() JRT: QUEUE RECEIVE 658
JVB 2019-01-16 22:28:36.908 WARNING: [2127] org.jitsi.impl.neomedia.transform.dtls.DatagramTransportImpl.log() Unknown DTLS handshake message type: 45
JVB 2019-01-16 22:28:36.909 INFO: [2127] org.jitsi.impl.neomedia.transform.dtls.DatagramTransportImpl.log() JRT: DO SEND 795
JVB 2019-01-16 22:28:36.913 INFO: [170] org.jitsi.impl.neomedia.transform.dtls.DatagramTransportImpl.log() JRT: QUEUE RECEIVE 91
JVB 2019-01-16 22:28:36.929 INFO: [170] org.jitsi.impl.neomedia.transform.dtls.DatagramTransportImpl.log() JRT: QUEUE RECEIVE 157
JVB 2019-01-16 22:28:36.930 WARNING: [2127] org.jitsi.impl.neomedia.transform.dtls.DatagramTransportImpl.log() Unknown DTLS handshake message type: 83
JVB 2019-01-16 22:28:36.930 INFO: [2127] org.jitsi.impl.neomedia.transform.dtls.DatagramTransportImpl.log() JRT: DO SEND 795
JVB 2019-01-16 22:28:36.931 INFO: [170] org.jitsi.impl.neomedia.transform.dtls.DatagramTransportImpl.log() JRT: QUEUE RECEIVE 91
JVB 2019-01-16 22:28:36.932 INFO: [2127] org.jitsi.impl.neomedia.transform.dtls.DatagramTransportImpl.log() JRT: DO SEND 795
JVB 2019-01-16 22:28:36.933 INFO: [170] org.jitsi.impl.neomedia.transform.dtls.DatagramTransportImpl.log() JRT: QUEUE RECEIVE 91
JVB 2019-01-16 22:28:36.941 INFO: [2127] org.jitsi.impl.neomedia.transform.dtls.DtlsPacketTransformer.log() JRT: AFTER connect DtlsPacketTransformer

The ‘Before Client Handshake’ message is logged inside DTLSClientProtocol.java right before clientHandshake is called.

And the ‘AFTER connect’ log message happens right after dtlsClientProtocol.connect has finished in DtlsPacketTransformer.java.

And for the circumstance that does not succeed:

2019-01-16 22:31:44.350 WARNING: [3194] org.jitsi.impl.neomedia.transform.dtls.DTLSClientProtocolWrapper.connect() JRT: Before Client Handshake.
JVB 2019-01-16 22:31:44.351 INFO: [3194] org.jitsi.impl.neomedia.transform.dtls.DatagramTransportImpl.log() JRT: DO SEND 145
JVB 2019-01-16 22:31:44.353 INFO: [170] org.jitsi.impl.neomedia.transform.dtls.DatagramTransportImpl.log() JRT: QUEUE RECEIVE 657
JVB 2019-01-16 22:31:44.403 INFO: [170] org.jitsi.impl.neomedia.transform.dtls.DatagramTransportImpl.log() JRT: QUEUE RECEIVE 657
JVB 2019-01-16 22:31:44.502 INFO: [170] org.jitsi.impl.neomedia.transform.dtls.DatagramTransportImpl.log() JRT: QUEUE RECEIVE 657
JVB 2019-01-16 22:31:44.702 INFO: [170] org.jitsi.impl.neomedia.transform.dtls.DatagramTransportImpl.log() JRT: QUEUE RECEIVE 657
JVB 2019-01-16 22:31:44.951 FINE: [55] org.jitsi.videobridge.xmpp.ComponentImpl.processIQ() (serving component ‘JitsiVideobridge’) Processing IQ (packetId adoxy-386):
JVB 2019-01-16 22:31:44.951 FINE: [55] org.jitsi.videobridge.xmpp.ComponentImpl.log() RECV:
JVB 2019-01-16 22:31:45.102 INFO: [170] org.jitsi.impl.neomedia.transform.dtls.DatagramTransportImpl.log() JRT: QUEUE RECEIVE

The responses seem to all group together, and the thread dealing with verifyAndValidateCertificate in DtlsControlImpl doesn’t seem to be getting notified or triggered…

So it’s as if the queue of messages is backed up and not being serviced.
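For anyone following along, here is an illustrative sketch (not the actual DatagramTransportImpl code) of how a synchronized receive queue like this can appear backed up and unserviced: if the reader waits without a timeout and a notify is missed, receive() blocks forever, whereas a deadline at least makes the hang visible:

```java
import java.util.ArrayDeque;

// Illustrative sketch of a synchronized receive queue. If a writer
// enqueues but the reader's wakeup is missed (or the reader waits with
// no timeout), receive() can hang indefinitely -- the lockup symptom
// described above. The timeout here turns a silent hang into a null.
public class ReceiveQueueSketch {
    private final ArrayDeque<byte[]> queue = new ArrayDeque<>();

    public synchronized void enqueue(byte[] packet) {
        queue.add(packet);
        notifyAll(); // wake any thread blocked in receive()
    }

    /** Returns a packet, or null on timeout instead of waiting forever. */
    public synchronized byte[] receive(long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (queue.isEmpty()) {
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return null; // give up rather than deadlock
            }
            wait(remaining);
        }
        return queue.poll();
    }

    public static void main(String[] args) throws InterruptedException {
        ReceiveQueueSketch q = new ReceiveQueueSketch();
        q.enqueue(new byte[] { 0x16 });
        System.out.println(q.receive(1000).length); // 1
        System.out.println(q.receive(50));          // null (timed out)
    }
}
```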

The log for the Fingerprint algorithm I placed at src/org/jitsi/impl/neomedia/transform/dtls/DtlsControlImpl.java:917

Here is a branch on github with the very rough, and preliminary proof of concept FIPS changes.

It was forked from 1058, which matches the current version of my VideoBridge, and may not work against the latest VideoBridge.


#10

I wonder if these “Unknown DTLS handshake message type: XX” messages are relevant somehow?


#11

Wireshark might make things easier (or at least another data point) as it can parse DTLS.


#12

Those do smell. Neither DTLS 1.2 nor (draft) DTLS 1.3 defines a handshake message of type 45 or 83:
https://tools.ietf.org/html/rfc6347#section-4.3.2
https://tools.ietf.org/html/draft-ietf-tls-tls13-28
Does that mean somehow the handshake state is messed up and non-handshake packets are being (mis)interpreted as handshake messages?
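One way to sanity-check that theory: packets arriving on the media port are demultiplexed by their first byte (RFC 5764), and only payloads whose first byte falls in 20..63 should ever reach the DTLS stack. If record framing slips, or a non-DTLS packet whose first byte happens to land in that range gets through, the bytes that follow get parsed as record and handshake fields, which would produce nonsense message types like 45 or 83. A hypothetical sketch of the demux rule (class and names are mine, not libjitsi’s):

```java
// Hypothetical sketch of the RFC 5764 demultiplexing rule: the first
// byte of each UDP payload decides which stack a packet belongs to.
public class DemuxSketch {
    enum PacketKind { STUN, DTLS, SRTP, UNKNOWN }

    static PacketKind classify(byte firstByte) {
        int b = firstByte & 0xFF;
        if (b <= 3) return PacketKind.STUN;               // 0..3
        if (b >= 20 && b <= 63) return PacketKind.DTLS;   // 20..63
        if (b >= 128 && b <= 191) return PacketKind.SRTP; // 128..191
        return PacketKind.UNKNOWN;
    }

    public static void main(String[] args) {
        // 22 is the DTLS handshake *record* content type; the 45/83
        // values above are handshake *message* types read from inside
        // the record body, so they point at mis-framed record parsing
        // rather than at stray STUN/SRTP packets.
        System.out.println(classify((byte) 22));  // DTLS
        System.out.println(classify((byte) 0));   // STUN
        System.out.println(classify((byte) 144)); // SRTP
    }
}
```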

Wireshark/tcpdump and/or -Djavax.net.debug=ssl along with more debug statements should help (although I don’t know how much extra logging you’ll get from that option, since we’re using BC and Jitsi for most of the stack; maybe try -Djavax.net.debug=all, but expect a ton of logs with either option).


#13

@Jason_Thomas one more helpful thing if you haven’t already done it is a thread dump when it is locked up.


#14

Just a small update to those interested, I just got back to working on this, and looking at Wireshark.

When everything is locked up, the trace looks like the following:

And when it succeeds, it looks like this instead:

It’s interesting that we seem to get lots of Server Hellos fairly quickly, and it actually can’t be a total deadlock, since the server seems to be doing its job and sending the packets expected of it.

Sometimes, if I wait a very long time, it will eventually respond with an equal number of Certificate and Client Key Exchange messages, one for each Server Hello, which you can see eventually happened in the first trace above.

It’s odd, because I can see the server messages hitting my client, but my client isn’t responding immediately.

However, it eventually either responds to all of those messages, or I receive an ICE Failure message. I suspect the client itself is waiting on something to happen before it sends a response to the Server Hello?

I can attach the capture files if someone would like to go through them.


#15

Looking deeper at the packet captures of my videobridge with the regular libjitsi versus the FIPS version, I’ve identified a small difference between the Server Hello DTLS responses:

In particular, an “Extended Master Secret” extension, which adds an additional 4 bytes to the packet and has its ‘Length’ field set to 0.

Perhaps some misconfiguration of the BCFIPS library, or maybe something I’m not setting up correctly could explain it?

The messages regarding unrecognized DTLS message types appear in both the regular JVB and my patched version, so I believe that is a red herring.


#16

Interestingly, the Client Hello specifies an extended_master_secret extension that also has a length of 0, and it looks like the BouncyCastle source detects this when parsing the Client Hello and produces a matching extended_master_secret extension in its response, also with length 0.

https://tools.ietf.org/html/draft-ietf-tls-session-hash-06
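For what it’s worth, a zero ‘Length’ is the normal wire encoding here: the session-hash spec (later RFC 7627) defines extended_master_secret (extension type 0x0017) with an empty extension_data, so a length of 0 is the well-formed encoding, not a truncated extension. A hypothetical sketch of walking an extensions block to check for it (names are mine):

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: walk a TLS hello's extensions block and report
// whether extended_master_secret (type 0x0017) is present. Its body is
// defined to be empty, so length 0 on the wire is the expected form.
public class ExtensionsSketch {
    static final int EXTENDED_MASTER_SECRET = 0x0017;

    static boolean hasExtendedMasterSecret(byte[] extensionsBlock) {
        ByteBuffer buf = ByteBuffer.wrap(extensionsBlock); // big-endian
        while (buf.remaining() >= 4) {
            int type = buf.getShort() & 0xFFFF;
            int length = buf.getShort() & 0xFFFF;
            if (type == EXTENDED_MASTER_SECRET) {
                return length == 0; // must be empty per the spec
            }
            buf.position(buf.position() + length); // skip this body
        }
        return false;
    }

    public static void main(String[] args) {
        // Example block: renegotiation_info (0xFF01, len 1, body 0x00)
        // followed by extended_master_secret (0x0017, len 0).
        byte[] block = {
            (byte) 0xFF, 0x01, 0x00, 0x01, 0x00,
            0x00, 0x17, 0x00, 0x00
        };
        System.out.println(hasExtendedMasterSecret(block)); // true
    }
}
```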

An interesting note on the logic in the BC FIPS code that deals with this says:

     * draft-ietf-tls-session-hash-04 4. Clients and servers SHOULD NOT accept handshakes
     * that do not use the extended master secret [..]. (and see 5.2, 5.3)

I wonder if there is some way to get Chrome to offer the hello with a non-zero extended_master_secret?

Or to get it to not offer it at all?

It seems to be ignoring the Server Hello responses, and this seems like the most likely culprit at the moment.


#17

I wonder if this is related:
https://bugs.chromium.org/p/webrtc/issues/detail?id=4223


#18

I think maybe it’s related to the BouncyCastle patches you guys are applying not being applied to the FIPS library? (They literally can’t be, because the binary can’t be modified without breaking the signing.)

(Actually, it appears those patches are still being applied… but I imagine the SDP is somehow signaling the Chrome client to include these broken extension fields?)

Can anyone explain to me what the motivation is behind modifying the server extensions that are being advertised?

Basically the contents of resources/install/debian/patches/bouncycastle-1.51.patch:59

I am definitely seeing the Client Hello that Chrome generates include a larger array of extensions, including the zero-length extended_master_secret, which doesn’t show up in the Client Hello against the normal JVB.


#19

So, basically Chrome is just not responding to the ServerHello messages from the BCFIPS bouncycastle clientHandshake.

For some odd reason, the ClientHello that Chrome sends first has a different set of extensions depending on which libjitsi my backend is using, including the zero-length extended_master_secret.

Does anyone know what exactly configures which extensions Chrome sends via its ClientHello?

The SDPs are pretty much identical, and I can only suspect it’s something in STUN/ICE?

Dumping out logs for dtlssrtptransport.cc and opensslstreamadapter.cc in Chrome yields the following on Chrome’s side:

[92848:28675:0120/185916.968921:INFO:opensslidentity.cc(45)] Making key pair
[92848:28675:0120/185916.969143:INFO:opensslidentity.cc(93)] Returning key pair
[92848:28675:0120/185916.969171:INFO:opensslcertificate.cc(58)] Making certificate for WebRTC
[92848:28675:0120/185916.969573:INFO:opensslcertificate.cc(105)] Returning certificate
[92848:28675:0120/185917.000721:INFO:dtlssrtptransport.cc(60)] Setting RTCP Transport on audio transport 0x0
[92848:28675:0120/185917.000769:INFO:dtlssrtptransport.cc(65)] Setting RTP Transport on audio transport 0x7fe8d9aa77c0
[92848:28675:0120/185917.001476:INFO:dtlssrtptransport.cc(60)] Setting RTCP Transport on video transport 0x0
[92848:28675:0120/185917.001492:INFO:dtlssrtptransport.cc(65)] Setting RTP Transport on video transport 0x7fe8d98441b0
[92848:28675:0120/185917.001649:INFO:dtlssrtptransport.cc(60)] Setting RTCP Transport on data transport 0x0
[92848:28675:0120/185917.001664:INFO:dtlssrtptransport.cc(65)] Setting RTP Transport on data transport 0x7fe8d7760050
[92848:28675:0120/185917.059821:INFO:dtlssrtptransport.cc(60)] Setting RTCP Transport on audio transport 0x0
[92848:28675:0120/185917.059844:INFO:dtlssrtptransport.cc(65)] Setting RTP Transport on audio transport 0x7fe8d9aa77c0
[92848:28675:0120/185917.059929:INFO:dtlstransport.cc(376)] DtlsTransport[audio|1|]: DTLS setup complete.
[92848:28675:0120/185917.869110:VERBOSE1:dtlstransport.cc(492)] DtlsTransport[audio|1|]: ice_transport writable state changed to 1
[92848:28675:0120/185917.869128:INFO:dtlstransport.cc(791)] DtlsTransport[audio|1|]: configuring DTLS handshake timeout 1310 based on ICE RTT 655
[92848:28675:0120/185917.869143:INFO:opensslstreamadapter.cc(766)] BeginSSL with peer.
[92848:28675:0120/185917.869290:VERBOSE1:opensslstreamadapter.cc(820)] ContinueSSL
[92848:28675:0120/185917.869365:VERBOSE1:opensslstreamadapter.cc(851)] – error want read
[92848:28675:0120/185917.869386:INFO:dtlstransport.cc(704)] DtlsTransport[audio|1|]: DtlsTransport: Started DTLS handshake
[92848:28675:0120/185917.869399:VERBOSE1:dtlstransport.cc(773)] DtlsTransport[audio|1|]: set_dtls_state from:0 to 1
[92848:28675:0120/185917.869426:VERBOSE1:dtlstransport.cc(524)] DtlsTransport[audio|1|]: ice_transport receiving state changed to 1
[92848:28675:0120/185918.964574:VERBOSE1:opensslstreamadapter.cc(727)] OpenSSLStreamAdapter::OnEvent SE_READ
[92848:28675:0120/185918.964603:VERBOSE1:opensslstreamadapter.cc(820)] ContinueSSL
[92848:28675:0120/185918.964959:VERBOSE1:opensslstreamadapter.cc(851)] – error want read
[92848:28675:0120/185920.274813:INFO:opensslstreamadapter.cc(938)] DTLS timeout expired
[92848:28675:0120/185920.274920:VERBOSE1:opensslstreamadapter.cc(820)] ContinueSSL
[92848:28675:0120/185920.274940:VERBOSE1:opensslstreamadapter.cc(851)] – error want read
[92848:28675:0120/185922.895081:INFO:opensslstreamadapter.cc(938)] DTLS timeout expired
[92848:28675:0120/185922.895181:VERBOSE1:opensslstreamadapter.cc(820)] ContinueSSL
[92848:28675:0120/185922.895213:VERBOSE1:opensslstreamadapter.cc(851)] – error want read
[92848:28675:0120/185928.135177:INFO:opensslstreamadapter.cc(938)] DTLS timeout expired
[92848:28675:0120/185928.135269:VERBOSE1:opensslstreamadapter.cc(820)] ContinueSSL
[92848:28675:0120/185928.135295:VERBOSE1:opensslstreamadapter.cc(851)] – error want read
[92848:28675:0120/185938.258951:VERBOSE1:dtlstransport.cc(524)] DtlsTransport[audio|1|]: ice_transport receiving state changed to 0
[92848:28675:0120/185938.615142:INFO:opensslstreamadapter.cc(938)] DTLS timeout expired
[92848:28675:0120/185938.615235:VERBOSE1:opensslstreamadapter.cc(820)] ContinueSSL
[92848:28675:0120/185938.615265:VERBOSE1:opensslstreamadapter.cc(851)] – error want read
[92848:28675:0120/185943.038558:VERBOSE1:dtlstransport.cc(492)] DtlsTransport[audio|1|]: ice_transport writable state changed to 0
[92848:28675:0120/185959.574495:INFO:opensslstreamadapter.cc(938)] DTLS timeout expired
[92848:28675:0120/185959.574544:VERBOSE1:opensslstreamadapter.cc(820)] ContinueSSL
[92848:28675:0120/185959.574572:VERBOSE1:opensslstreamadapter.cc(851)] – error want read

It doesn’t say anything obvious about failing to parse the ServerHello, or about a certificate mismatch or anything along those lines like I would expect; instead it acts as if the ServerHello is never seen at all, even though according to Wireshark, which is running on the client machine, it is definitely being received.

I’m really stuck here, so any insight from any of you guys that have spent more time mucking around with DTLS would be greatly greatly appreciated.


#20

:facepalm: I was reading that all backwards: the ClientHello is from Jitsi, and the ServerHello is from Chrome.

It appears that the loop inside BouncyCastle FIPS is what isn’t reading/processing the ServerHello coming back from Chrome.

It could possibly be a bug in the FIPS implementation of BouncyCastle.

At least I have fairly concrete proof of this now.

It could possibly still be a thread-starvation issue, which would explain why it intermittently succeeds. But as far as I can tell, the reads are being processed on the same thread that writes the ClientHello, so it seems more likely to be a protocol issue.

I managed to remove the extra extended_master_secret extensions from the ClientHello, and they appear to be a red herring; removing them does not affect the underlying broken behavior.