Delay in setting the JVB Graceful Shutdown Flag

Hi Jitsi Community,

We are working on auto scaling for JVB, and we implemented scale-in by following the Amazon EC2 Auto Scaling guide “Creating a custom termination policy with Lambda”.

When the Lambda is invoked, our logic sets the ‘graceful-shutdown’ flag to true on an instance if the Lambda cannot find any instance with zero conferences.
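A minimal sketch of that decision, written in bash like the test script further down. It assumes the Lambda already has a conference count per instance (for example from the `conferences` field of /colibri/stats), and `choose_scale_in_action` is a hypothetical helper name. The post does not say which instance is drained when none is idle; picking the least-loaded one is an assumption.

```bash
#!/bin/bash
# Hypothetical sketch of the scale-in decision.
# Input (stdin): one "<instance> <conference-count>" pair per line.
# Output: "terminate <instance>" for the first instance with zero conferences,
# otherwise "graceful-shutdown <instance>" for the least-loaded instance
# (choosing the least-loaded one is an assumption). Returns 1 on empty input.
choose_scale_in_action() {
  local instance count best="" best_count=""
  # The "|| [[ -n ... ]]" also handles a last line without a trailing newline.
  while read -r instance count || [[ -n "$instance" ]]; do
    [[ -z "$instance" ]] && continue
    if (( count == 0 )); then
      echo "terminate ${instance}"
      return 0
    fi
    if [[ -z "$best" ]] || (( count < best_count )); then
      best="$instance"
      best_count="$count"
    fi
  done
  if [[ -n "$best" ]]; then
    echo "graceful-shutdown ${best}"
    return 0
  fi
  return 1
}
```

For example, `printf 'i-a 3\ni-b 1\n' | choose_scale_in_action` prints `graceful-shutdown i-b`.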

During testing, we observed that the AWS Auto Scaling group sometimes calls the Lambda twice, back-to-back, within one second. Suppose the first invocation sets ‘graceful-shutdown’ to true on an instance: when the Lambda is triggered the second time, the JVB does not yet report ‘graceful-shutdown’ as true.
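Since the second invocation can arrive before the stats reflect the first one, one option (an assumption, not something the JVB provides) is for the Lambda to keep its own record of instances it has already asked to drain and consult it before /colibri/stats. The file-based store, `DRAIN_STATE_FILE`, `mark_draining` and `is_draining` are all hypothetical; a local file in a Lambda only survives while the same execution environment is reused, so a durable shared store would be needed in practice.

```bash
#!/bin/bash
# Hypothetical sketch: remember which JVBs were already asked to drain, so a
# back-to-back invocation does not depend only on the graceful_shutdown value
# reported by /colibri/stats.
# DRAIN_STATE_FILE is an assumption; use durable shared storage in practice.
DRAIN_STATE_FILE="${DRAIN_STATE_FILE:-/tmp/jvb-draining.txt}"

mark_draining() {
  # Record the instance id once (exact, fixed-string line match).
  grep -qxF -- "$1" "$DRAIN_STATE_FILE" 2>/dev/null || echo "$1" >> "$DRAIN_STATE_FILE"
}

is_draining() {
  # Exit status 0 if the instance was marked by an earlier invocation.
  grep -qxF -- "$1" "$DRAIN_STATE_FILE" 2>/dev/null
}
```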

Queries:

  • Doesn’t the JVB set its ‘graceful-shutdown’ flag to true before returning the response to the HTTP POST?
  • Do you have any suggestions for making sure that we read the correct graceful-shutdown flag?

Please see the sample script below, written to reproduce this issue (it usually reproduces within a couple of retries):

root@ip-x-y-z-a:/home/manikandan/test# cat jvb-gs-timing-test.sh
#!/bin/bash
PUB_IP="x.y.z.a"
echo "PUB IP: ${PUB_IP}"

echo $(date +"%T.%N") ' Invoke Graceful Shutdown'
# Request graceful shutdown; the response body is discarded
curl -s -o /dev/null -H "Content-Type: application/json" -d '{ "graceful-shutdown": "true" }' "http://${PUB_IP}:8888/colibri/shutdown"
while true
do
        # Poll /colibri/stats in a tight loop and print a timestamp with the flag
        CONFERENCES=$(curl -s "http://${PUB_IP}:8888/colibri/stats")
        #echo 'CONFERENCE STATS: ' ${CONFERENCES}
        GS_FLAG=$(echo "${CONFERENCES}" | jq '.graceful_shutdown')
        echo $(date +"%T.%N") ' ' ${GS_FLAG}
done

Output of the script: ‘/colibri/stats’ did not report the graceful-shutdown flag as true until about 1.5 seconds after the request, by which time we had received at least 11 ‘false’ responses.

root@ip-x-y-z-a:/home/manikandan/test# ./jvb-gs-timing-test.sh

PUB IP: x.y.z.a
13:54:58.660541850  Invoke Graceful Shutdown
13:54:59.242566008   false
13:54:59.317345281   false
13:54:59.397017889   false
13:54:59.473709830   false
13:54:59.556854295   false
13:54:59.648259561   false
13:54:59.715696681   false
13:54:59.784402507   false
13:54:59.848223487   false
13:54:59.916332130   false
13:55:00.024463731   false
13:55:00.123340582   true
13:55:00.183201115   true
13:55:00.235462711   true
13:55:00.287853439   true
13:55:00.339401940   true
13:55:00.391030934   true
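The timing figures can be checked mechanically. This hypothetical `summarise_gs_log` helper (plain awk, no JVB API involved) reads a saved log in the format above and reports how many ‘false’ readings arrived before the first ‘true’ and how long after the shutdown request the flag appeared. It assumes all timestamps fall on the same day.

```bash
#!/bin/bash
# Summarise output of jvb-gs-timing-test.sh read from stdin:
# stale   = number of "false" readings before the first "true"
# elapsed = seconds from "Invoke Graceful Shutdown" to the first "true"
# Assumes HH:MM:SS.nnnnnnnnn timestamps within a single day.
summarise_gs_log() {
  awk '
    function secs(ts,   p) { split(ts, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
    /Invoke Graceful Shutdown/ { start = secs($1); next }
    $2 == "false" && !seen { stale++ }
    $2 == "true" && !seen {
      seen = 1
      printf "stale=%d elapsed=%.3fs\n", stale, secs($1) - start
    }
  '
}
```

For example, save the output with `./jvb-gs-timing-test.sh > gs.log`, stop it with Ctrl+C, then run `summarise_gs_log < gs.log`. On the output shown above it prints `stale=11 elapsed=1.463s`.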

Thanks for the great product and support.

Regards
Mani

Ping @Boris_Grozev @Jonathan_Lennox

It does set the flag immediately (before returning the HTTP response). However, the statistics used to populate /colibri/stats are cached and re-calculated once every few seconds. This also means the graceful-shutdown flag is propagated to jicofo with a slight delay.
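Given that caching, a caller that has just sent the shutdown POST can treat a ‘false’ from /colibri/stats as possibly stale and poll until the flag shows up, with a bounded number of attempts. This is a client-side workaround sketch, not a JVB feature; `wait_for_graceful_shutdown`, the probe function and the defaults (20 attempts, 0.5 s apart, roughly 10 s in total, chosen to outlast a cache refresh of a few seconds) are assumptions.

```bash
#!/bin/bash
# Poll a probe command until it prints "true" or the attempt budget runs out.
# The probe prints the current graceful_shutdown value; for example
# (assuming jq is installed, as in the test script):
#   probe() { curl -s "http://${PUB_IP}:8888/colibri/stats" | jq -r '.graceful_shutdown'; }
# Arguments: probe name, max attempts (default 20), interval in s (default 0.5).
wait_for_graceful_shutdown() {
  local probe="$1" max_attempts="${2:-20}" interval="${3:-0.5}" attempt value
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    value="$("$probe")"
    if [[ "$value" == "true" ]]; then
      echo "graceful_shutdown visible after ${attempt} attempt(s)"
      return 0
    fi
    sleep "$interval"
  done
  echo "graceful_shutdown still not visible after ${max_attempts} attempts" >&2
  return 1
}
```

The non-zero exit status on timeout lets the caller distinguish "flag never appeared" from "flag confirmed".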

I would suggest adding a new HTTP endpoint that returns just the graceful-shutdown status, without caching. A PR would be welcome.

Boris