[jitsi-dev] [libjitsi] AES Bench (#80)


#1

You can view, comment on, or merge this pull request online at:

  https://github.com/jitsi/libjitsi/pull/80

-- Commit Summary --

  * Log AES benchmark result at info level
  * Make more accurate AES benchmark

-- File Changes --

    M src/org/jitsi/impl/neomedia/transform/srtp/AES.java (69)

-- Patch Links --

https://github.com/jitsi/libjitsi/pull/80.patch
https://github.com/jitsi/libjitsi/pull/80.diff

···

---
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/libjitsi/pull/80


#2

The AES benchmark is used at runtime, right? Consequently, I'm not sure I'd like to (1) log with info level or (2) run the benchmark for a very long time (i.e. 10_000 times). Anyway, I'll try to give these a second thought (after a refresher on the implementation details) soon.

···

---
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/libjitsi/pull/80#issuecomment-183358337


#3

Hi @lyubomir,
The benchmark is indeed done at runtime,
and as noted in my commit doing it 10000 times only take 0,3s for the 4 providers.
The problem with doing it only say 1000 times is that the JIT doesn't kick in, and you choose OpenSSL when in reality JCE is faster ... (just tested)
I would prefer to do it only once at program startup, but 100_000, to be 100% sure of what we choose.

About the log it's 2 lines per conference (for now), where you dump entire xmpp packets in the log at info level.

Can you share your plans about the "refresher on the implementation details" on the ML?

···

---
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/libjitsi/pull/80#issuecomment-183370357


#4

Oh, by "referesher on the implementation details" I mean that I need to read through the source code again before giving a final comment on your changes because I've forgotten some of the implementation details.

···

---
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/libjitsi/pull/80#issuecomment-183376523


#5

another important note,
i'm using perf-java-top (https://github.com/jrudolph/perf-map-agent) to see what function take more cpu (live), and i see that slowly (10min+ in conference) "Interpreter" go down the list, which indicate (i think) that the JIT slowly optimize part of the code, so we need to run really long bench to be sure (not at runtime but when we "optimize").

···

---
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/libjitsi/pull/80#issuecomment-183377962


#6

we need to run really long bench to be sure (not at runtime but when we "optimize")

Well, it depends on what we want to achieve really. My choice was to run the benchmark multiple times knowing that the JIT would eventually kick in and we would end up with the most optimal implementation even if we didn't start with it. Given that I did the whole crypto optimization for the purposes of Videobridge, (1) it being a server which would run for very long periods of time and (2) it handling very large numbers of conferences make me prefer multiple short benchmarks. (Anyway, as I said, I'll think about it more.)

···

---
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/libjitsi/pull/80#issuecomment-183386395


#7

The JIT only kick in if you use the code, and if one (say BouncyCastle) get chosen at one round, it will win many of the following bench round as the JIT will kick in for this one and not the others.

1000 times encrypting 1024 bytes isn't enough to choose JCE, 10000 is, for now we are encrypting 1 times 16384 bytes, so it might take you between 62.5 and 625 bench (read 1000 new participants) to choose the right provider in theory (poor extrapolation),
and in practice the current bench is really really too small so the measurement imprecision are bigger than the tests, so we are close to random :frowning:

To always have the same comportment each run (and from the start) I really prefer one long bench at startup (i can provide patches)

I will let you think about it more

Regards

···

---
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/libjitsi/pull/80#issuecomment-183399644


#8

Can one of the admins verify this patch?

···

---
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/libjitsi/pull/80#issuecomment-191945329


#9

Hi @lyubomir,
now that #97 is merged, what do you think about dropping everything and just use JCE as it's the fastest fallback with java 7/8 (it uses AES-NI)

···

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/libjitsi/pull/80#issuecomment-205694827


#10

@champtar, could you please be more specific about what exactly you want to drop? I think my OpenSSL Digest implementation was faster than BouncyCastle's and Java's and it was useful on its own because there are places outside of SRTP in which hashing functions are used.

···

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/libjitsi/pull/80#issuecomment-205845992


#11

@lyubomir i mean drop AES.java, as JCE is always faster than BC or SunPKCS11 on java 7/8

JNI calls are really expensive, so i'm not found of keeping generic inefficient JNI wrapper around, and prefer to implement specific JNI wrapper for the fast path. I'll do some more bench tomorrow.

···

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/libjitsi/pull/80#issuecomment-205898739


#12

@champtar, no, it's not always faster because it's not always available.

···

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/libjitsi/pull/80#issuecomment-205911645


#13

if JCE is not always available, why not just call ```javax.crypto.Cipher.getInstance("AES/ECB/NoPadding")```, this should use JCE on most system. JavaDoc state that ```AES/ECB/NoPadding``` is mandatory (https://docs.oracle.com/javase/7/docs/api/javax/crypto/Cipher.html)

···

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/libjitsi/pull/80#issuecomment-205919894


#14

AES-NI is a hardware feature, it's not always available. Even on hardware which supports it, virtualization may choose not to enable it. I do not approve any removal and I disapprove of the removal of the OpenSSL Digest.

···

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/libjitsi/pull/80#issuecomment-205922371


#15

I've heard you loud and clear for OpenSSL Digest :), but still JCE or BC interface is bad for JNI, ie 4 JNI calls instead of 1 is a waste of cpu cycles / performance.

For AES we should statically link jnopenssl against libcrypto so it work out of the box for everyone and AES.java is a fallback that is never used, and OpenSSL take care of using fastest version possible.

···

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/libjitsi/pull/80#issuecomment-205934787


#16

@lyubomir was there a removal in #97 that you didn't want removed? If so, how did that go through in a month long conversation? @champtar is right to avoid as many JNI calls as possible, i.e. to calculate HMACs in one go. Calculations passing individual chunks to native code might be slightly faster than BC or JCE, but that advantage is only needed it in tight loops - where we have a better way now (by not using chunks).

···

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/libjitsi/pull/80#issuecomment-205944859


#17

For AES we should statically link jnopenssl against libcrypto so it work out of the box for everyone and AES.java is a fallback that is never used, and OpenSSL take care of using fastest version possible.

I don't think we should statically link to libcrypto because we want to depend on the system libcrypto version which is going to be updated with security fixes much more often than we are going to rebuild our JNI.

···

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/libjitsi/pull/80#issuecomment-205947318


#18

There should definitely be no static linking to libcrypto. Distributions like Debian forbid that anyway in packages. And now that ice4j has been freed from the non-free javax.sdp getting into Debian has become an option again for libjitsi.

···

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/libjitsi/pull/80#issuecomment-205950242


#19

Until such a day comes that AES-NI support is as common as support for arithmetic operations AES.java isn't going anywhere.

Unless I am missing some significant pain point, the removal of OpenSSLHMAC in PR #127 could also have waited. It is non-trivial, well written and well working code and I don't think we have enough of a history on Jitsi and JVB without it to go and remove it. @lyubomir agrees that bringing this latter one back would be non-trivial so we might as well do it once we actually need it again. But on the whole I'd like us to only remove working lower-layer code only after we've had significant history of running without using them.

···

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/libjitsi/pull/80#issuecomment-205965309


#20

To sum up:
-I've removed OpenSSLDigest (sha1) because there was no user for it (if OpenSSLHMAC doesn't work, OpenSSLDigest doesn't work), it contains at least one bug, and it was implementing BC interface (init/update/dofinal) which is not optimal at all because JNI call are expensive. We could have keep it because it's faster than JCE or BC, but this encourage us to do half optimisation, or to use it outside of the fast path thus making code more complex for no real gain

-I've removed OpenSSLBlockCipher because it's slower than JCE once the JIT kick in (that's the problem of micro benchmark) and there was no user (if SRTPCipherCTROpenSSL doesn't work, OpenSSLBlockCipher doesn't work)

-I've fixed OpenSSLHMAC without modifying it much (still 4 JNI calls per packet) because the real way to go is only one JNI call to encrypt + mac or mac + decrypt

for now the top 12 is

	(t  5,4,s  5,4) java.lang.Object.wait
	(t  3,1,s  3,0) java.net.PlainDatagramSocketImpl.receive0
	(t  2,2,s  2,2) java.net.PlainDatagramSocketImpl.send
	(t  1,3,s  1,3) sun.misc.Unsafe.park
	(t  1,2,s  1,2) org.jitsi.impl.neomedia.transform.srtp.OpenSSLHMAC.HMAC_Update
	(t  2,6,s  1,1) java.util.concurrent.locks.AbstractQueuedSynchronizer.release
	(t  1,0,s  1,0) java.lang.Object.notifyAll
	(t  0,9,s  0,9) java.util.concurrent.locks.ReentrantLock$Sync.tryRelease
	(t  0,9,s  0,9) sun.misc.Unsafe.unpark
	(t  1,4,s  0,9) org.ice4j.socket.MultiplexingXXXSocketSupport.acceptBySocketsOrThis
	(t  6,2,s  0,8) org.ice4j.ice.harvest.SinglePortUdpHarvester.runInHarvesterThread
	(t  0,8,s  0,8) org.jitsi.impl.neomedia.transform.srtp.SRTPCipherCTROpenSSL.AES128CTR_CTX_process

so if we want better performance, we need to implement AES-GCM, and to remove most synchronized

···

---
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/jitsi/libjitsi/pull/80#issuecomment-206184735