I can tell you how this integration works in jitsi-meet. Whether it will work in your environment, and what changes need to be made, you will need to figure out yourself.
So we have this component, jigasi (https://github.com/jitsi/jigasi), which reuses code from Jitsi Desktop, including its ability to create SIP calls and XMPP Jingle calls and to merge them.
You install jigasi and configure it with the SIP client credentials and with where to connect as an XMPP component (by default this is localhost, since the default deployment runs jitsi-meet, the XMPP server and jigasi on the same host).
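When jigasi connects as an XMPP component, the server authenticates it with a shared component secret. Assuming the standard Jabber Component Protocol (XEP-0114), the component proves it knows the secret by sending the SHA-1 of the stream id concatenated with the secret. A minimal sketch of that computation (the stream id and secret below are hypothetical placeholders):

```python
import hashlib


def component_handshake(stream_id: str, secret: str) -> str:
    """Return the XEP-0114 handshake value: hex SHA-1 of stream id + secret."""
    return hashlib.sha1((stream_id + secret).encode("utf-8")).hexdigest()


def handshake_element(stream_id: str, secret: str) -> str:
    # Sent by the component after the server opens the stream with this id.
    return "<handshake>" + component_handshake(stream_id, secret) + "</handshake>"


if __name__ == "__main__":
    # Hypothetical values: use the stream id your XMPP server sends and the
    # component secret configured for jigasi on that server.
    print(handshake_element("3BF96D32", "s3cr3t"))
```

If the secret in jigasi's config and the one on the XMPP server differ, this value will not match and the server will reject the component connection.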
Then jicofo (our focus component, which orchestrates the conference) sees a jigasi instance and offers that option to the jitsi-meet clients, which can then dial out. When a client dials out, a Rayo message is sent to jigasi, which then joins the MUC. Jicofo invites the web client and the jigasi client, doing the Jingle part, and then jigasi creates a SIP call, which it bridges with the XMPP call.
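A minimal sketch of what such a Rayo dial request could look like, built with Python's standard library. The `dial` element and its namespace follow Rayo (XEP-0327); the header name `JvbRoomName` used here to tell jigasi which room to join, and all JIDs and URIs, are assumptions for illustration:

```python
import xml.etree.ElementTree as ET
from typing import Optional

RAYO_NS = "urn:xmpp:rayo:1"  # Rayo namespace (XEP-0327)


def build_dial_iq(to_jid: str, sip_uri: str, room_jid: str,
                  from_uri: Optional[str] = None, iq_id: str = "dial-1") -> str:
    """Build a Rayo <dial/> IQ asking for an outgoing SIP call into a room.

    Assumption: the target room travels in a header named "JvbRoomName".
    """
    iq = ET.Element("iq", {"type": "set", "to": to_jid, "id": iq_id})
    attrs = {"xmlns": RAYO_NS, "to": sip_uri}
    if from_uri:
        attrs["from"] = from_uri
    # xmlns as a plain attribute is written verbatim, putting <dial/> and
    # its child <header/> into the Rayo namespace.
    dial = ET.SubElement(iq, "dial", attrs)
    ET.SubElement(dial, "header", {"name": "JvbRoomName", "value": room_jid})
    return ET.tostring(iq, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical JIDs and SIP URI.
    print(build_dial_iq("callcontrol.meet.example.com",
                        "sip:+15551234567@pbx.example.com",
                        "myroom@conference.meet.example.com"))
```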
In the other direction, jigasi sees an incoming SIP call, extracts from the SIP headers the room it needs to join and joins the MUC; the same flow, with jicofo inviting the participants, is executed again, and the calls are connected.
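Extracting the room boils down to reading one header from the incoming INVITE. A minimal sketch, assuming the header is called `Jitsi-Conference-Room` (the name your setup actually uses may differ or be configurable) and ignoring header folding and compact header forms:

```python
from typing import Dict, Optional

# Assumption: name of the SIP header carrying the conference room.
ROOM_HEADER = "Jitsi-Conference-Room"


def parse_sip_headers(message: str) -> Dict[str, str]:
    """Parse the header section of a SIP request into a lowercase-keyed dict."""
    lines = message.replace("\r\n", "\n").split("\n")
    headers: Dict[str, str] = {}
    for line in lines[1:]:  # skip the request line, e.g. "INVITE sip:... SIP/2.0"
        if not line.strip():
            break  # an empty line ends the headers; the body follows
        name, sep, value = line.partition(":")
        if sep:
            # SIP header names are case-insensitive, so normalize them.
            headers[name.strip().lower()] = value.strip()
    return headers


def room_from_invite(message: str, header: str = ROOM_HEADER) -> Optional[str]:
    """Return the room named in the INVITE, or None if the header is absent."""
    return parse_sip_headers(message).get(header.lower())
```

For example, an INVITE carrying `Jitsi-Conference-Room: myroom` yields `"myroom"`, while an INVITE without that header yields `None`, and the call has no room to join.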
Jigasi decodes all the audio coming from the multiple streams on the XMPP side, mixes it, then encodes the mix and sends it to the SIP side. The incoming audio from SIP is likewise transcoded to the codec used in the XMPP call (Opus) and sent to the bridge.
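The decode and encode steps need a real codec library, but the mixing step itself is simple. A minimal sketch on already-decoded 16-bit PCM, assuming the frames are time-aligned and of equal length; this is naive sum-and-clip mixing for illustration, not necessarily the algorithm jigasi uses:

```python
from typing import List, Sequence

INT16_MIN, INT16_MAX = -32768, 32767


def mix_frames(frames: Sequence[Sequence[int]]) -> List[int]:
    """Mix time-aligned 16-bit PCM frames by summing samples and clipping."""
    if not frames:
        return []
    length = len(frames[0])
    if any(len(frame) != length for frame in frames):
        raise ValueError("all frames must have the same number of samples")
    mixed: List[int] = []
    for i in range(length):
        total = sum(frame[i] for frame in frames)
        # Clip to the int16 range so loud overlapping speech does not wrap around.
        mixed.append(max(INT16_MIN, min(INT16_MAX, total)))
    return mixed
```

The single mixed frame is what would then be encoded for the SIP side, which is why the SIP peer only ever sees one audio stream in this mode.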
Jigasi does not support video and never will.
There is also a translator mode for jigasi, which does not transcode anything but simply re-sends all the media streams it receives to the SIP server and lets the SIP server take care of the mixing. This is for the case where the SIP side supports multiple streams.
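To contrast this with the mixing mode, here is a minimal sketch of the translator idea: packets are never decoded, and each source keeps its own stream, identified by its SSRC. The `RtpPacket` type and `translate` function are hypothetical, for illustration only:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RtpPacket:
    ssrc: int       # identifies which participant's stream the packet belongs to
    payload: bytes  # encoded audio, forwarded as-is (no decode/re-encode)


def translate(packets: List[RtpPacket]) -> Dict[int, List[bytes]]:
    """Translator-mode sketch: keep every source as its own untouched stream.

    Returns each SSRC mapped to its payloads in arrival order; a mixer
    would instead collapse them all into a single stream.
    """
    streams: Dict[int, List[bytes]] = defaultdict(list)
    for packet in packets:
        streams[packet.ssrc].append(packet.payload)
    return dict(streams)
```

With three participants talking, the SIP server receives three separate streams and has to do the mixing itself, which only works if it supports multiple streams.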