Jibri - Chrome hangs and crashes

Hi everyone,

We have been seeing occasional Chrome crashes when recording with Jibri. The logs look like this:

INFO: [17] org.jitsi.jibri.selenium.JibriSelenium.run() Jibri client receive bitrates: {audio={download=46, upload=0}, download=918, upload=0, video={download=872, upload=0}}, all clients muted? false
java.lang.Thread.run(Thread.java:748)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
org.jitsi.jibri.util.extensions.SchedulerExecutorServiceExtsKt$sam$java_lang_Runnable$0.run(SchedulerExecutorServiceExts.kt)
org.jitsi.jibri.selenium.JibriSelenium$startRecurringCallStatusChecks$1.invoke(JibriSelenium.kt:119)
org.jitsi.jibri.selenium.JibriSelenium$startRecurringCallStatusChecks$1.invoke(JibriSelenium.kt:308)
kotlin.sequences.TransformingSequence$iterator$1.next(Sequences.kt:172)
org.jitsi.jibri.selenium.JibriSelenium$startRecurringCallStatusChecks$1$event$1.invoke(JibriSelenium.kt:119)
org.jitsi.jibri.selenium.JibriSelenium$startRecurringCallStatusChecks$1$event$1.invoke(JibriSelenium.kt:196)
org.jitsi.jibri.selenium.status_checks.EmptyCallStatusCheck.run(EmptyCallStatusCheck.kt:31)
org.jitsi.jibri.selenium.status_checks.EmptyCallStatusCheck.isCallEmpty(EmptyCallStatusCheck.kt:46)
org.jitsi.jibri.selenium.pageobjects.CallPage.getNumParticipants(CallPage.kt:74)
org.openqa.selenium.remote.RemoteWebDriver.executeScript(RemoteWebDriver.java:480)
org.openqa.selenium.remote.RemoteWebDriver.execute(RemoteWebDriver.java:543)
org.openqa.selenium.remote.service.DriverCommandExecutor.execute(DriverCommandExecutor.java:83)
org.openqa.selenium.remote.HttpCommandExecutor.execute(HttpCommandExecutor.java:158)
org.openqa.selenium.remote.http.AbstractHttpResponseCodec.decode(AbstractHttpResponseCodec.java:44)
org.openqa.selenium.remote.http.AbstractHttpResponseCodec.decode(AbstractHttpResponseCodec.java:80)
org.openqa.selenium.remote.http.JsonHttpResponseCodec.reconstructValue(JsonHttpResponseCodec.java:40)
org.openqa.selenium.remote.ErrorHandler.throwIfResponseFailed(ErrorHandler.java:166)
org.openqa.selenium.remote.ErrorHandler.createThrowable(ErrorHandler.java:214)
java.lang.reflect.Constructor.newInstance(Constructor.java:423)
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
Session ID: 0834061907286b394e2e7b0611c8a9fc with stack: 
Capabilities {acceptInsecureCerts: false, acceptSslCerts: false, applicationCacheEnabled: false, browserConnectionEnabled: false, browserName: chrome, chrome: {chromedriverVersion: 78.0.3904.105 (60e2d8774a81..., userDataDir: /tmp/.com.google.Chrome.tAmQvX}, cssSelectorsEnabled: true, databaseEnabled: false, goog:chromeOptions: {debuggerAddress: localhost:32787}, handlesAlerts: true, hasTouchScreen: false, javascriptEnabled: true, locationContextEnabled: true, mobileEmulationEnabled: false, nativeEvents: true, networkConnectionEnabled: false, pageLoadStrategy: normal, platform: LINUX, platformName: LINUX, proxy: Proxy(), rotatable: false, setWindowRect: true, strictFileInteractability: false, takesHeapSnapshot: true, takesScreenshot: true, timeouts: {implicit: 0, pageLoad: 300000, script: 30000}, unexpectedAlertBehaviour: ignore, unhandledPromptBehavior: ignore, version: 78.0.3904.97, webStorageEnabled: true}
Driver info: org.openqa.selenium.chrome.ChromeDriver
System info: host: 'jibri-pod', ip: '0.0.0.0', os.name: 'Linux', os.arch: 'amd64', os.version: '4.14.193-149.317.amzn2.x86_64', java.version: '1.8.0_252'
Build info: version: 'unknown', revision: 'unknown', time: 'unknown'
Command duration or timeout: 0 milliseconds
  (Driver info: chromedriver=78.0.3904.105 (60e2d8774a8151efa6a00b1f358371b1e0e07ee2-refs/branch-heads/3904@{#877}),platform=Linux 4.14.193-149.317.amzn2.x86_64 x86_64) (WARNING: The server did not provide any stacktrace information)
  (Session info: chrome=78.0.3904.97)
from tab crashed
2020-10-08 17:59:21.239 SEVERE: [17] org.jitsi.jibri.selenium.JibriSelenium.invoke() Error while running call status checks: org.openqa.selenium.WebDriverException: unknown error: session deleted because of page crash
2020-10-08 17:59:21.241 FINE: [53] org.jitsi.jibri.statsd.JibriStatsDClient.incrementCounter() Incrementing statsd counter: stop:recording
2020-10-08 17:59:21.241 INFO: [53] org.jitsi.jibri.service.impl.FileRecordingJibriService.onServiceStateChange() File recording service transitioning from state Running to Error: ChromeHung SESSION Chrome hung
2020-10-08 17:59:21.241 INFO: [53] org.jitsi.jibri.selenium.JibriSelenium.onSeleniumStateChange() Transitioning from state Running to Error: ChromeHung SESSION Chrome hung
2020-10-08 17:59:21.242 INFO: [53] org.jitsi.jibri.util.JibriSubprocess.ffmpeg.stop() Stopping ffmpeg process
2020-10-08 17:59:21.242 INFO: [53] org.jitsi.jibri.service.impl.FileRecordingJibriService.stop() Stopping capturer
2020-10-08 17:59:21.242 INFO: [53] org.jitsi.jibri.JibriManager.stopService() Stopping the current service
2020-10-08 17:59:21.282 INFO: [53] org.jitsi.jibri.service.impl.FileRecordingJibriService.stop() Quitting selenium
2020-10-08 17:59:21.282 INFO: [53] org.jitsi.jibri.util.JibriSubprocess.ffmpeg.stop() ffmpeg exited with value 255
2020-10-08 17:59:21.281 INFO: [51] org.jitsi.jibri.capture.ffmpeg.FfmpegCapturer.onFfmpegStateMachineStateChange() Ffmpeg capturer transitioning from state Running to Finished
2020-10-08 17:59:21.304 INFO: [53] org.jitsi.jibri.service.impl.FileRecordingJibriService.stop() Participants in this recording: []
2020-10-08 17:59:21.325 INFO: [53] org.jitsi.jibri.selenium.JibriSelenium.leaveCallAndQuitBrowser() Recurring call status checks cancelled
2020-10-08 17:59:21.325 INFO: [53] org.jitsi.jibri.selenium.JibriSelenium.leaveCallAndQuitBrowser() Leaving call and quitting browser

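The logs indicate that Jibri detected that Chrome had hung: the tab crashed, the Selenium session was deleted, and the recording service transitioned from Running to the ChromeHung error state.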
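To see how often this happens, a small script can scan a Jibri log for the two markers visible in the excerpt above. This is only a rough sketch: the marker strings and the timestamp format are taken from the log lines in this post, while the function name and the way the log is read are illustrative assumptions.

```python
import re

# Marker strings copied from the log excerpt in this post.
CRASH_MARKERS = (
    "session deleted because of page crash",
    "ChromeHung",
)

# Matches the leading "YYYY-MM-DD HH:MM:SS.mmm" timestamp of a log line.
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s")


def find_crash_lines(lines):
    """Return (timestamp, marker) pairs for timestamped lines containing a marker.

    Hypothetical helper: lines without a leading timestamp (for example
    stack-trace frames) are skipped, and only the first matching marker
    per line is reported.
    """
    hits = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue
        for marker in CRASH_MARKERS:
            if marker in line:
                hits.append((match.group(1), marker))
                break
    return hits
```

It could be fed an open log file, e.g. `find_crash_lines(open(path))`, where `path` is wherever your Jibri instance writes its log.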

We also found a suggestion that addresses a similar problem in the following post, which points to an undersized /dev/shm as a possible cause: https://stackoverflow.com/questions/53902507/unknown-error-session-deleted-because-of-page-crash-from-unknown-error-cannot
In our case we allocate 64MB for /dev/shm.

So far we have not been able to confirm whether /dev/shm usage actually increases drastically and comes close to the limit. We’ll run some more tests to investigate and I’ll report back with our findings.

We have a few questions:

  • Do you have any experience with these or similar crashes, and do you know what could cause them? Could it really be related to /dev/shm, or is that a dead end?
  • If we decide to mount a disk location at /dev/shm, could that cause any unwanted side effects in Jibri? We ask because disk is obviously going to be slower than memory.
  • If we set either --no-sandbox or --disable-dev-shm-usage in Jibri’s config, could either flag cause unwanted side effects in Jibri? Would either of these flags actually solve the problem?

Thanks!

EDIT: Sorry for the many edits. I appended some more logs and reversed the log order to a more intuitive direction (top to bottom :stuck_out_tongue:).

Just a follow-up: we can confirm that /dev/shm usage is significant during recording. It reaches 80% soon after the recording starts, so we’re quite confident the crashes and hangs are related to that.
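For anyone who wants to take the same kind of measurement, here is a minimal sketch of how /dev/shm usage could be sampled while a recording is running. It assumes Python 3 is available inside the Jibri container; the function names, sampling interval, and sample count are illustrative choices, not anything Jibri provides.

```python
import shutil
import time


def usage_percent(used: int, total: int) -> float:
    """Return used/total as a percentage (0.0 if total is not positive)."""
    if total <= 0:
        return 0.0
    return 100.0 * used / total


def sample_usage(path: str = "/dev/shm") -> float:
    """Return the current usage, in percent, of the filesystem at `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage_percent(usage.used, usage.total)


def watch(path: str = "/dev/shm", interval_s: float = 5.0, samples: int = 12) -> None:
    """Print a timestamped usage reading every `interval_s` seconds."""
    for _ in range(samples):
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} {path}: {sample_usage(path):.1f}% used")
        time.sleep(interval_s)
```

Calling `watch()` in the container right after a recording starts should show whether usage climbs toward the limit of the mount.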

Just found this, which seems related: https://github.com/jitsi/jibri/issues/267