Author |
|
juice Vetran
Joined: December 05 2006 Location: United States Posts: 139
|
Posted: July 26 2007 at 12:42pm | IP Logged
|
|
|
Hello LanScape Team,
We have experienced a serious issue when a user agent is behind a NAT. For example, let us say that two clients are behind the same NAT, and are both set to use RTP port 10000. First client goes out through router and informs through INVITE that his media is at 10000. LanScape tells him to get media at 8001. Client 1 is happily talking along.
Then, the next client places an INVITE, and tells LanScape the same thing, my media is at port 10000. LanScape says ok, and informs user to get media at 8003. Client 2 starts sending audio to LanScape at 8003. However, since user is behind a NAT, the Router port is random.. say 20000. So, using WireShark on the LanScape server, the problem becomes immediately apparent:
Client 1 starts receiving the RTP data intended for Client 2. This is because, LanScape simply sends all media to the port requested in the Sip message. There is no way for us to solve this issue here. The Sip Proxy in use has no idea what the real port of the Client 2 is, since Sip proxy only handles signaling.
LanScape media engine should immediately recognize that the audio arriving at port 8003 (Client 2) is from a different port (possibly IP, since client may give LAN IP) than what was requested in the SIP message. Of course, LanScape should only let this IP/Port change one time to prevent audio hijacking.
|
Back to Top |
|
|
juice Vetran
Joined: December 05 2006 Location: United States Posts: 139
|
Posted: August 13 2007 at 3:24pm | IP Logged
|
|
|
Hello again,
We have not heard any response from this issue yet, and it is very important as many users are behind NATs, and ports (as well as IPs) will many times be different then what the client said in the Sip message.
|
Back to Top |
|
|
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
|
Posted: August 15 2007 at 9:32am | IP Logged
|
|
|
Hi juice,
We are not sure how we want to handle this natively in the media engine. We understand what you have described and it is a real world deployment issue – especially when user agents are deployed behind symmetrical NATs.
If you want to solve this issue quickly, consider deploying our Centrex SIP proxy products along with one or more of our VOIP Media proxies. They are designed to handle the situation you have described.
If you cannot use our proxy products, we will have to continue to work together with your group to get to workable solution.
Support
|
Back to Top |
|
|
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
|
Posted: August 15 2007 at 9:49am | IP Logged
|
|
|
juice,
One additional item: So we are absolutely sure what the possibilities are to get around this issue, tell me briefly about your VOIP app (what is it exactly) and how it will most often be deployed. All we know is that your VOIP app is a PSTN gateway type of an application that can be treated as a SIP/RTP server.
Additional info will help.
Support
|
Back to Top |
|
|
juice Vetran
Joined: December 05 2006 Location: United States Posts: 139
|
Posted: August 15 2007 at 4:54pm | IP Logged
|
|
|
Hi Support,
Thanks for taking up this issue. The deployment we have is as the media engine in use as a B2BUA which acts as a PSTN gateway. The media engine is deployed in a public IP location, so could easily talk to any other party behind a NAT. We currently use a Sip Proxy that handles traffic PC to PC and PC to PSTN (which goes through media engine application). We can detect various things about the client at the Sip Proxy (which we do, like actual IP) and modify SDP messages. However, when it comes to SDP RTP port detection, we cannot modify it ourselves from the Sip Proxy.
This is because the actual RTP data goes to the Media Engine, not the Sip Proxy. So, it needs to be properly detected at the Media Engine side. So, to be clear, Sip traffic goes from:
User Agent <--> Sip proxy <--> B2BUA (Media Engine)
RTP (should) Goes:
User Agent <--> B2BUA (Media Engine)
Currently, though, it goes through:
User Agent <--> RTP Relay <--> B2BUA (Media Engine)
Our RTP relay can detect the changed Port/IP of users behind a NAT. However, having an RTP Relay:
* Duplicates traffic
* Increases lag
* Increase possible dropped packets
* Adds an unneeded failure point to the system
* Adds additional, unneeded processing
Since the Media Engine handles RTP data itself quite nicely, it should be a simple matter of detecting that an Invite said it would come in on 192.168.0.1:8080 but really came in on xx.xx.xx.xx:xxxx and allow a one time switch to occur. We realize that this allows a user to possibly hijack a session, but timing would have to be pretty good for someone to accomplish that. And if someone did hijack a port, the other end would likely hang up right away anyway, closing the connection. So not really an issue. The big issue is that with NATs giving a different port then the client requested there is no possibility of connection. And, it could also just be a startup param to allow it or not.
|
Back to Top |
|
|
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
|
Posted: August 16 2007 at 8:18am | IP Logged
|
|
|
Hi juice,
Your explanation is perfect. Thanks for taking the time to put all the ideas together.
The explanation you give above sounds reasonable. Let's tackle this issue after we get the "call state" issue resolved.
Support
|
Back to Top |
|
|
juice Vetran
Joined: December 05 2006 Location: United States Posts: 139
|
Posted: August 31 2007 at 2:14pm | IP Logged
|
|
|
Hi Support,
As we are currently testing the newest build which should fix the call state issue, we were really hoping to continue on with resolving this NAT issue. It should be a simply fix and require very little code change, so we were hoping that this could be incorporated in the final version where the RTP locking fix is released in. That way our major difficulties would be overcome in one swoop :-)
Thanks
|
Back to Top |
|
|
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
|
Posted: November 06 2007 at 5:34pm | IP Logged
|
|
|
Hi Andrew,
We have updated your support FTP account with a version of the media engine that will handle this RTP media situation. See the "v5.12.7.18 DLL Only" directory in your support FTP account.
There is an updated developer reference there too. Take a look at how your PSTN gateway application can process the new SipReceivedRtpMediaConflict immediate event. Also look at the RECEIVED_RTP_MEDIA_CONFLICT structure. Post any new questions to his thread.
Thanks,
Support
|
Back to Top |
|
|
juice Vetran
Joined: December 05 2006 Location: United States Posts: 139
|
Posted: November 26 2007 at 7:16pm | IP Logged
|
|
|
This is really good news. We have been really busy so just noticed this, we can't wait to test it and report back. Removing a media proxy from the chain will definitely reduce network lag and issues relating from media proxy failures.
Thanks
|
Back to Top |
|
|
juice Vetran
Joined: December 05 2006 Location: United States Posts: 139
|
Posted: December 05 2007 at 12:00pm | IP Logged
|
|
|
Ok, we have been testing this, and so far there are a couple thing that seem wrong.
First issue, is that even after we tell lanscape not to change the RTP port/ip, we keep getting this message over and over. For instance, we only want to change this once, since later attempts possibly mean someone is trying to steal the session. Anyway, not a huge issue, but a lot of message handling for nothing. I propose, after we tell Lanscape to use the old IP/Port, it should not bother us again. Alternatively, you could add another flag to the structure, something like, LockRtpConnectionSettings
Second issue, kind of strange, but definitely noticeable. After we have locked the Rtp port to a specific source address, it works... However, if someone floods the Lanscape listening Rtp port with a different location (we get the flood above), but worse yet, there is also notifiable audio artifacts.. in the form of buzzing, latency, and a vibrato kind of effect. This happens most often after the bad connection stops sending data.. seems lanscape is trying to mix the data into the stream?
Third issue, and also sort of bad. We've only seen it once. We are also not sure the exact cause, or why we keep getting this. But, here is some logging:
[1196814864]: [SIP_MESSAGE]: SipReceivedRtpMediaConflict : 0 : 178 : 0
[1196814864]: [NOTE] SipReceivedRtpMediaConflict - ignoring change request
[1196814864]: [SIP_MESSAGE]: SipReceivedRtpMediaConflict : 0 : 178 : 0
[1196814864]: [NOTE] SipReceivedRtpMediaConflict - ignoring change request
[1196814864]: [SIP_MESSAGE]: SipReceivedRtpMediaConflict : 0 : 178 : 0
[1196814864]: [NOTE] SipReceivedRtpMediaConflict - ignoring change request
[1196814864]: [SIP_MESSAGE]: SipReceivedRtpMediaConflict : 0 : 178 : 0
[1196814864]: [NOTE] SipReceivedRtpMediaConflict - ignoring change request
[1196814864]: [SIP_MESSAGE]: SipReceivedRtpMediaConflict : 0 : 178 : 0
[1196814864]: [NOTE] SipReceivedRtpMediaConflict - ignoring change request
[1196814864]: [SIP_MESSAGE]: SipReceivedRtpMediaConflict : 0 : 178 : 0
[1196814864]: [NOTE] SipReceivedRtpMediaConflict - ignoring change request
[1196814864]: [SIP_MESSAGE]: SipReceivedRtpMediaConflict : 0 : 178 : 0
[1196814864]: [NOTE] SipReceivedRtpMediaConflict - ignoring change request
[1196814864]: [SIP_MESSAGE]: SipReceivedRtpMediaConflict : 0 : 178 : 0
[1196814864]: [NOTE] SipReceivedRtpMediaConflict - ignoring change request
[1196814864]: [SIP_MESSAGE]: SipReceivedRtpMediaConflict : 0 : 178 : 0
[1196814864]: [NOTE] SipReceivedRtpMediaConflict - ignoring change request
[1196814864]: [SIP_MESSAGE]: SipReceivedRtpMediaConflict : 0 : 178 : 0
[1196814864]: [NOTE] SipReceivedRtpMediaConflict - ignoring change request
[1196814864]: [SIP_MESSAGE]: SipReceivedRtpMediaConflict : 0 : 178 : 0
[1196814864]: [NOTE] SipReceivedRtpMediaConflict - ignoring change request
[1196814864]: [SIP_MESSAGE]: SipByeReceived : 99 : 116 : 2
[1196814864]: [SIP_MESSAGE]: SipReceivedRtpMediaConflict : 0 : 178 : 0
[1196814864]: [NOTE] SipReceivedRtpMediaConflict - ignoring change request
[1196814864]: [SIP_MESSAGE]: SipSendByeAck : 99 : 117 : 2
[1196814864]: [SIP_MESSAGE]: SipReceivedRtpMediaConflict : 0 : 178 : 0
[1196814864]: [NOTE] SipReceivedRtpMediaConflict - ignoring change request
[1196814864]: [SIP_MESSAGE]: SipReceivedRtpMediaConflict : 0 : 178 : 0
[1196814864]: [NOTE] SipReceivedRtpMediaConflict - ignoring change request
[1196814864]: [SIP_MESSAGE]: SipCallComplete : 99 : 122 : 2
[1196814864]: [SIP_MESSAGE]: SipOnHook : 99 : 67 : 2
[1196814864]: [SIP_MESSAGE]: SipReceivedRtpMediaConflict : 0 : 178 : 0
[1196814864]: [NOTE] SipReceivedRtpMediaConflict - ignoring change request
[1196814864]: [SIP_MESSAGE]: SipReceivedRtpMediaConflict : 0 : 178 : 0
[1196814864]: [NOTE] SipReceivedRtpMediaConflict - ignoring change request
[1196814864]: [SIP_MESSAGE]: SipReceivedRtpMediaConflict : 0 : 178 : 0
[1196814864]: [NOTE] SipReceivedRtpMediaConflict - ignoring change request
[1196814864]: [LINE_HISTORY] : 240 Line#: 99 : SipOutgoingCallStart SipDialTone SipDialing SipSendInvite SipStartOutgoingRing SipReceivedProvisionalResponse SipReceived100Trying SipWaitForInviteOk SipReceived183SessionProgress SipInviteOkReceived SipSendInviteAck SipOutgoingCallConnected SipInCall SipByeReceived SipSendByeAck SipCallComplete SipOnHook
[1196814864]: [SIP_MESSAGE]: SipReceivedRtpMediaConflict : 0 : 178 : 0
[1196814864]: [NOTE]: Engine terminated call line: 0
[1196814864]: [NOTE] SipReceivedRtpMediaConflict - ignoring change request
[1196814864]: [SIP_MESSAGE]: SipSendBye : 0 : 118 : 2
[1196814864]: Line: 99 State 'INCALL' to 'FREE' [C077EE66-A2C9-11DC-8181-001372519133]
[1196814865]: [SIP_MESSAGE]: SipReceivedRtpMediaConflict : 0 : 178 : 0
[1196814865]: [NOTE] SipReceivedRtpMediaConflict - ignoring change request
[1196814865]: [SIP_MESSAGE]: SipReceivedRtpMediaConflict : 0 : 178 : 0
[1196814865]: [NOTE] SipReceivedRtpMediaConflict - ignoring change request
[1196814865]: [SIP_MESSAGE]: SipReceivedRtpMediaConflict : 0 : 178 : 0
[1196814865]: [NOTE] SipReceivedRtpMediaConflict - ignoring change request
[1196814865]: [SIP_MESSAGE]: SipReceivedRtpMediaConflict : 0 : 178 : 0
[1196814865]: [NOTE] SipReceivedRtpMediaConflict - ignoring change request
[1196814865]: [SIP_MESSAGE]: SipReceivedByeAck : 0 : 119 : 2
[1196814865]: [SIP_MESSAGE]: SipReceivedRtpMediaConflict : 0 : 178 : 0
[1196814865]: [NOTE] SipReceivedRtpMediaConflict - ignoring change request
If you notice, I'm sure you do, we keep getting this spam: SipReceivedRtpMediaConflict. First off, not sure why.. The way we currently do logic is, once we get a good Rtp packet (from Rtp call back) we lock the connection to only allow one IP/port. Or, the first time we get that message, we lock the line. So, seems we locked the line, but then Rtp connection origination changed (probably a client issue, hard to say).
However, the most important item in the log is that phone line 0 (bridged with line 99) never got Sip On hook. We called Terminate line, but as you can see, continue to get more Rtp messages, until the other side hung up, after which there were no more messages. But, the line never got any other Sip messages from Lanscape, so basically ends up being stuck. I think it is related to the Rtp change message. Perhaps some locking conflict.
If you need, we can show you the Sip log, since now we have matching call ids it is easy to match up ;). However, the logs seem normal, just that it looks like some sort of logic issue.
|
Back to Top |
|
|
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
|
Posted: December 05 2007 at 2:21pm | IP Logged
|
|
|
Hi juice,
Thanks for the feedback. We hope you are not freaking out. We will get there. :)
Remember the binary images you are receiving are right out of development. There could be bugs.
Item 1:
Repeatedly getting the SipReceivedRtpMediaConflict event for every received RTP media packet:
This is not the corret behavior. You are only supposed to get this event once per call. We checked the source code and we screwed up. The if() case for one-shot variable that keeps track of if the event was sent to the app was commented out. So now we know why this is occuring.
Item 2:
RTP media rejection:
Are you saying that the media engine RTP port for a call may be receiving RTP packets from more than one call endpoint?
If the answer is yes, then the media engine probably is trying to assemble all the received RTP media into a single stream which will sound like crap. What we need is to dump any RTP that is received that is not coming from the “known” RTP far end.
Item 3:
Not seeing SipOnHook state.
Lets get items 1+2 corrected and then we can look at this.
If you can respond to this post by tomorrow, we will get a new image for test.
Support
|
Back to Top |
|
|
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
|
Posted: December 07 2007 at 9:01am | IP Logged
|
|
|
juice,
An updated product image is in your support FTP account. Should take care of items 1 and 2. Read the engineering release notes that are with the distro.
Post back with your results.
Support
|
Back to Top |
|
|
juice Vetran
Joined: December 05 2006 Location: United States Posts: 139
|
Posted: December 10 2007 at 3:48pm | IP Logged
|
|
|
We have not had a chance to test new build, but thanks for working on this, we will immediately look into it and post back results.
On issue #3, pretty serious as all lines eventually got filled up with bad status, since we never received SipOnHook for certain lines. However, we will try with the latest build, the reduction in messages from your fixes above should help us isolate the issue and post more details.
|
Back to Top |
|
|
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
|
Posted: December 10 2007 at 3:54pm | IP Logged
|
|
|
Thanks juice. Will wait for your feedback.
Support
|
Back to Top |
|
|
juice Vetran
Joined: December 05 2006 Location: United States Posts: 139
|
Posted: December 14 2007 at 1:16pm | IP Logged
|
|
|
Hi again, we have some more details, although exact cause is still unknown to us. We can provide Sip log if really needed, but here is an internal log of messages coming into the system at the time of problem. Also, we should note, that this matter is becoming more urgent, as the last functioning build of Lanscape (which is working pretty good for us), was set to timebomb sometime around this month.. So, it is pretty imperative that we resolve this issue as soon as possible.
Anyway, it looks to us like the call to terminate a phone line somehow stuck our thread, and eventually led to the sticking of lanscape thread(s).
//////////////////////////////////////////////////////////// ////////////////////////////
//Thread Lanscape callback//////////////////////////////////////////////////// //////////
//////////////////////////////////////////////////////////// ////////////////////////////
[1197580770]: [SIP_MESSAGE]: SipByeReceived : 197 : 117 : 2
[1197580770]: [SIP_MESSAGE]: SipSendByeAck : 197 : 118 : 2
[1197580770]: [SIP_MESSAGE]: SipCallComplete : 197 : 123 : 2
[1197580770]: [SIP_MESSAGE]: SipOnHook : 197 : 68 : 2
//////////////////////////////////////////////////////////// ////////////////////////////
//Event Handler Thread //////////////////////////////////////////////////////////// /////
//////////////////////////////////////////////////////////// ////////////////////////////
[1197580770]: [LINE_HISTORY] : 178 Line#: 197 : SipOutgoingCallStart SipDialTone SipDialing SipSendInvite SipStartOutgoingRing SipReceivedProvisionalResponse SipReceived100Trying SipWaitForInviteOk
SipReceived180Ringing SipInviteOkReceived SipSendInviteAck SipOutgoingCallConnected SipInCall SipByeReceived SipSendByeAck SipCallComplete SipOnHook
[1197580770]: [NOTE]: Engine terminated call line: 2
//////////////////////////////////////////////////////////// ////////////////////////////
//Thread Lanscape callback//////////////////////////////////////////////////// //////////
//////////////////////////////////////////////////////////// ////////////////////////////
[1197580770]: [SIP_MESSAGE]: SipSendBye : 2 : 119 : 2
[1197580770]: [SIP_MESSAGE]: SipReceivedByeAck : 2 : 120 : 2
[1197580772]: Line: 197 State 'INCALL' to 'FREE' [52EAC5C8-A9C0-11DC-8181-001372519133]
//////////////////////////////////////////////////////////// ////////////////////////////
//Thread Lanscape callback//////////////////////////////////////////////////// //////////
//////////////////////////////////////////////////////////// ////////////////////////////
[1197580772]: [SIP_MESSAGE]: SipCallComplete : 2 : 123 : 2
[1197580772]: [SIP_MESSAGE]: SipOnHook : 2 : 68 : 2
[1197580772]: [LINE_HISTORY] : 135 Line#: 2 : SipIncomingCallStart SipSendTrying SipSendRinging SipStartIncomingRing SipOkToAnswerCall SipAnsweringCall SipSend200Ok SipReceivedInviteAck
SipIncomingCallConnected SipInCall SipSendBye SipReceivedByeAck SipCallComplete SipOnHook
[1197580772]: Line: 2 State 'INCALL' to 'FREE' [52EAC5C8-A9C0-11DC-8181-001372519133]
//////////////////////////////////////////////////////////// ////////////////////////////
//Event Handler Thread //////////////////////////////////////////////////////////// /////
//////////////////////////////////////////////////////////// ////////////////////////////
[1197580772]: [DELETE]: Session: [52EAC5C8-A9C0-11DC-8181-001372519133] Phone Line: 2
[1197580887]: [SIP_MESSAGE]: SipIncomingCallAssignPhoneLine : -1 : 90 : 0
[1197580887]: Line: 2 State 'FREE' to 'IN_REQUESTED' [No Session]
[1197580887]: [SIP_MESSAGE]: SipIncomingCallStart : 2 : 89 : 2
[1197580887]: [SIP_MESSAGE]: SipIncomingCallInitialized : 2 : 91 : 0
[1197580887]: [SIP_MESSAGE]: SipSendTrying : 2 : 94 : 2
[1197580887]: [SIP_MESSAGE]: SipSendRinging : 2 : 95 : 2
[1197580887]: [SIP_MESSAGE]: SipStartIncomingRing : 2 : 93 : 2
[1197580887]: [SIP_MESSAGE]: SipOkToAnswerCall : 2 : 97 : 2
[1197580887]: Line: 2 State 'IN_REQUESTED' to 'OUT_REQUESTED' [No Session]
//////////////////////////////////////////////////////////// ////////////////////////////
//Event Handler Thread //////////////////////////////////////////////////////////// /////
//////////////////////////////////////////////////////////// ////////////////////////////
[1197580889]: [INCOMING] CallID: c4d3b42883b3af00@192.168.2.102 Destination: 49228285130@83.120.12.8:5060
[1197580889]: Line: 2 State 'OUT_REQUESTED' to 'INPROGRESS' [No Session]
[1197580889]: Line: 197 State 'FREE' to 'OUT_REQUESTED' [5DD0C4DC-A9C1-11DC-8181-001372519133]
//////////////////////////////////////////////////////////// ////////////////////////////
//Thread Lanscape callback//////////////////////////////////////////////////// //////////
//////////////////////////////////////////////////////////// ////////////////////////////
[1197580889]: [SIP_MESSAGE]: SipOutgoingCallInitializing : 197 : 71 : 0
[1197580889]: [SIP_MESSAGE]: SipOutgoingCallStart : 197 : 73 : 2
[1197580889]: [SIP_MESSAGE]: SipDialTone : 197 : 74 : 2
[1197580889]: [SIP_MESSAGE]: SipDialing : 197 : 75 : 2
[1197580889]: [SIP_MESSAGE]: SipSendInvite : 197 : 76 : 2
[1197580889]: [SIP_MESSAGE]: SipStartOutgoingRing : 197 : 77 : 2
[1197580889]: [SIP_MESSAGE]: SipReceivedProvisionalResponse : 197 : 78 : 2
[1197580889]: [SIP_MESSAGE]: SipReceived100Trying : 197 : 79 : 2
[1197580889]: [SIP_MESSAGE]: SipWaitForInviteOk : 197 : 85 : 2
[1197580889]: [SIP_MESSAGE]: SipReceived180Ringing : 197 : 80 : 2
[1197580889]: Line: 197 State 'OUT_REQUESTED' to 'TRYING' [5DD0C4DC-A9C1-11DC-8181-001372519133]
[1197580911]: [SIP_MESSAGE]: SipInviteOkReceived : 197 : 86 : 2
[1197580911]: [SIP_MESSAGE]: SipSendInviteAck : 197 : 87 : 2
[1197580911]: [SIP_MESSAGE]: SipOutgoingCallConnected : 197 : 88 : 2
[1197580911]: [SIP_MESSAGE]: SipInCall : 197 : 104 : 2
[1197580911]: Line: 197 State 'TRYING' to 'INCALL' [5DD0C4DC-A9C1-11DC-8181-001372519133]
[1197580912]: [SIP_MESSAGE]: SipReceivedInviteAck : 2 : 100 : 2
[1197580912]: [SIP_MESSAGE]: SipIncomingCallConnected : 2 : 102 : 2
[1197580938]: [SIP_MESSAGE]: SipByeReceived : 197 : 117 : 2
[1197580938]: [SIP_MESSAGE]: SipSendByeAck : 197 : 118 : 2
[1197580938]: [SIP_MESSAGE]: SipCallComplete : 197 : 123 : 2
[1197580938]: [SIP_MESSAGE]: SipOnHook : 197 : 68 : 2
//////////////////////////////////////////////////////////// ////////////////////////////
//Event Handler Thread //////////////////////////////////////////////////////////// /////
//////////////////////////////////////////////////////////// ////////////////////////////
[1197580938]: [LINE_HISTORY] : 179 Line#: 197 : SipOutgoingCallStart SipDialTone SipDialing
SipSendInvite SipStartOutgoingRing SipReceivedProvisionalResponse SipReceived100Trying
SipWaitForInviteOk SipReceived180Ringing SipInviteOkReceived SipSendInviteAck
SipOutgoingCallConnected SipInCall SipByeReceived SipSendByeAck SipCallComplete SipOnHook
[1197580938]: [NOTE]: Engine terminated call line: 2
************************************************************ ****************************
******** Looks like event thread stuck here ********************************************
************************************************************ ****************************
//////////////////////////////////////////////////////////// ////////////////////////////
//Thread Lanscape callback//////////////////////////////////////////////////// //////////
//////////////////////////////////////////////////////////// ////////////////////////////
[1197580978]: [SIP_MESSAGE]: SipIncomingCallAssignPhoneLine : -1 : 90 : 0
[1197580978]: Line: 3 State 'FREE' to 'IN_REQUESTED' [No Session]
[1197580978]: [SIP_MESSAGE]: SipIncomingCallStart : 3 : 89 : 2
[1197580978]: [SIP_MESSAGE]: SipIncomingCallInitialized : 3 : 91 : 0
[1197580978]: [SIP_MESSAGE]: SipSendTrying : 3 : 94 : 2
[1197580978]: [SIP_MESSAGE]: SipSendRinging : 3 : 95 : 2
[1197580978]: [SIP_MESSAGE]: SipStartIncomingRing : 3 : 93 : 2
[1197580978]: [SIP_MESSAGE]: SipOkToAnswerCall : 3 : 97 : 2
[1197581165]: [SIP_MESSAGE]: SipIncomingCallAssignPhoneLine : -1 : 90 : 0
[1197581165]: Line: 4 State 'FREE' to 'IN_REQUESTED' [No Session]
[1197581165]: [SIP_MESSAGE]: SipIncomingCallStart : 4 : 89 : 2
[1197581165]: [SIP_MESSAGE]: SipIncomingCallInitialized : 4 : 91 : 0
[1197581165]: [SIP_MESSAGE]: SipSendTrying : 4 : 94 : 2
[1197581165]: [SIP_MESSAGE]: SipSendRinging : 4 : 95 : 2
[1197581165]: [SIP_MESSAGE]: SipStartIncomingRing : 4 : 93 : 2
[1197581165]: [SIP_MESSAGE]: SipOkToAnswerCall : 4 : 97 : 2
[1197581213]: [SIP_MESSAGE]: SipCallCanceled : 4 : 56 : 2
[1197581213]: [SIP_MESSAGE]: SipOnHook : 4 : 68 : 2
[1197581235]: [SIP_MESSAGE]: SipByeReceived : 199 : 117 : 2
[1197581235]: [SIP_MESSAGE]: SipSendByeAck : 199 : 118 : 2
[1197581235]: [SIP_MESSAGE]: SipCallComplete : 199 : 123 : 2
[1197581235]: [SIP_MESSAGE]: SipOnHook : 199 : 68 : 2
[1197581236]: [SIP_MESSAGE]: SipByeReceived : 0 : 117 : 2
[1197581236]: [SIP_MESSAGE]: SipSendByeAck : 0 : 118 : 2
[1197581236]: [SIP_MESSAGE]: SipCallComplete : 0 : 123 : 2
[1197581236]: [SIP_MESSAGE]: SipOnHook : 0 : 68 : 2
[1197581322]: [SIP_MESSAGE]: SipByeReceived : 1 : 117 : 2
[1197581322]: [SIP_MESSAGE]: SipSendByeAck : 1 : 118 : 2
[1197581322]: [SIP_MESSAGE]: SipCallComplete : 1 : 123 : 2
[1197581322]: [SIP_MESSAGE]: SipOnHook : 1 : 68 : 2
[1197581345]: [SIP_MESSAGE]: SipByeReceived : 198 : 117 : 2
[1197581345]: [SIP_MESSAGE]: SipSendByeAck : 198 : 118 : 2
[1197581345]: [SIP_MESSAGE]: SipCallComplete : 198 : 123 : 2
[1197581345]: [SIP_MESSAGE]: SipOnHook : 198 : 68 : 2
//////////////////////////////////////////////////////////// ////////////////////////////
//Call Update thread / //////////////////////////////////////////////////////////// /////
//////////////////////////////////////////////////////////// ////////////////////////////
//Looks like since event thread never returns from termination of the line, that it never
//notices lines are inactive... so, it tries to force them closed
[1197581355]: Shutting down inactive phone line (blank audio)
[1197581355]: [NOTE]: Engine terminated call line: 0
[1197581445]: Shutting down inactive phone line (blank audio)
[1197581445]: [NOTE]: Engine terminated call line: 1
//////////////////////////////////////////////////////////// ////////////////////////////
//Thread Lanscape callback//////////////////////////////////////////////////// //////////
//////////////////////////////////////////////////////////// ////////////////////////////
[1197581603]: [SIP_MESSAGE]: SipIncomingCallAssignPhoneLine : -1 : 90 : 0
[1197581603]: Line: 5 State 'FREE' to 'IN_REQUESTED' [No Session]
[1197581603]: [SIP_MESSAGE]: SipIncomingCallStart : 5 : 89 : 2
[1197581603]: [SIP_MESSAGE]: SipIncomingCallInitialized : 5 : 91 : 0
[1197581603]: [SIP_MESSAGE]: SipSendTrying : 5 : 94 : 2
[1197581603]: [SIP_MESSAGE]: SipSendRinging : 5 : 95 : 2
[1197581603]: [SIP_MESSAGE]: SipStartIncomingRing : 5 : 93 : 2
[1197581603]: [SIP_MESSAGE]: SipOkToAnswerCall : 5 : 97 : 2
[1197581625]: [SIP_MESSAGE]: SipCallCanceled : 5 : 56 : 2
[1197581625]: [SIP_MESSAGE]: SipOnHook : 5 : 68 : 2
If you look carefully, we never seemed to return from hanging up line 2, nor did we ever see any more messages regarding line 2. So, our event handling thread sticks and can no longer process lanscape messages pushed to the queue. And, sometime later (around 30 minutes, timestamps are in Unix time) lanscape stops posting any new messages to the callback. Not sure why this happens, as this problem is unique to the last two builds of the Lanscape engine that we have tested (with the RTP / Call Id fixes). Note, that a separate thread (are call update thread) also tried to terminate two other lines (199, 0), but those calls did not stick that thread - though, the lines were already terminated from lanscape, so maybe that is why the thread did not lock?
Note, this was from running the engine at low capacity, (only ever a few calls going through at a time) for around 22 hours. It was working, working, and then just started acting up at the very end. As stated previously, we can provide a sip log if needed, hopefully you can quickly locate the issue by seeing what events were happening at the same time and that we were simply trying to terminate the line.
Only 4 phone lines were thought to be in use, and 3 others were simply in a weird state from missing lanscape events.
All help is appreciated.
|
Back to Top |
|
|
juice Vetran
Joined: December 05 2006 Location: United States Posts: 139
|
Posted: December 14 2007 at 7:41pm | IP Logged
|
|
|
This just happened again. This time, after being up for only 5 1/2 hours. Same situation:
[1197667931]: [SIP_MESSAGE]: SipOutgoingCallInitializing : 199 : 71 : 0
[1197667931]: [SIP_MESSAGE]: SipOutgoingCallStart : 199 : 73 : 2
[1197667931]: [SIP_MESSAGE]: SipDialTone : 199 : 74 : 2
[1197667931]: [SIP_MESSAGE]: SipDialing : 199 : 75 : 2
[1197667931]: [SIP_MESSAGE]: SipSendInvite : 199 : 76 : 2
[1197667931]: [SIP_MESSAGE]: SipStartOutgoingRing : 199 : 77 : 2
[1197667931]: [SIP_MESSAGE]: SipReceivedProvisionalResponse : 199 : 78 : 2
[1197667931]: [SIP_MESSAGE]: SipReceived100Trying : 199 : 79 : 2
[1197667931]: [SIP_MESSAGE]: SipWaitForInviteOk : 199 : 85 : 2
[1197667931]: [SIP_MESSAGE]: SipReceived180Ringing : 199 : 80 : 2
[1197667931]: Line: 199 State 'OUT_REQUESTED' to 'TRYING' [06CE7CB2-AA8C-11DC-8181-001372519133]
[1197667943]: [SIP_MESSAGE]: SipInviteOkReceived : 199 : 86 : 2
[1197667943]: [SIP_MESSAGE]: SipSendInviteAck : 199 : 87 : 2
[1197667944]: [SIP_MESSAGE]: SipOutgoingCallConnected : 199 : 88 : 2
[1197667944]: [SIP_MESSAGE]: SipInCall : 199 : 104 : 2
[1197667944]: Line: 199 State 'TRYING' to 'INCALL' [06CE7CB2-AA8C-11DC-8181-001372519133]
[1197667944]: [SIP_MESSAGE]: SipAnsweringCall : 0 : 101 : 2
[1197667944]: [SIP_MESSAGE]: SipSend200Ok : 0 : 98 : 2
[1197667944]: [SIP_MESSAGE]: SipReceivedInviteAck : 0 : 100 : 2
[1197667944]: [SIP_MESSAGE]: SipIncomingCallConnected : 0 : 102 : 2
[1197667964]: [SIP_MESSAGE]: SipByeReceived : 199 : 117 : 2
[1197667964]: [SIP_MESSAGE]: SipSendByeAck : 199 : 118 : 2
[1197667964]: [SIP_MESSAGE]: SipCallComplete : 199 : 123 : 2
[1197667964]: [SIP_MESSAGE]: SipOnHook : 199 : 68 : 2
[1197667964]: [LINE_HISTORY] : 14 Line#: 199 : SipOutgoingCallStart SipDialTone SipDialing SipSendInvite SipStartOutgoingRing SipReceivedProvisionalResponse SipReceived100Trying SipWaitForInviteOk SipReceived180Ringing SipInviteOkReceived SipSendInviteAck SipOutgoingCallConnected SipInCall SipByeReceived SipSendByeAck SipCallComplete SipOnHook
[1197667964]: [NOTE]: Engine terminated call line: 0
After calling terminate line, no error, no return, just stuck.
|
Back to Top |
|
|
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
|
Posted: December 18 2007 at 11:50am | IP Logged
|
|
|
Juice,
This is Randal, the CTO.
We need to get this issue elevated so we can find a solution and move on. This is what I propose:
1)
Set up a user account for you in our VOIP system here. I assume you have a spare SIP VOIP phone you can use to access our VOIP system. That way we can speak to each other directly and accelerate the transfer of info and get to the solution.
2)
I need you to explain in a bit more detail the contents of your post from “14 December 2007 at 1:16pm”.
3)
I need you to explain some other details:
Is the host machine multi core?
What exactly is the threading model of your app.
etc…
It’s important that we get to the bottom of this so that we can all move on to more productive work.
Repost to this thread when you are available to speak with me and we will get things set up at this end. I would prefer we start immediately or a soon as possible this week.
Thanks,
Randal
|
Back to Top |
|
|
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
|
Posted: December 18 2007 at 12:30pm | IP Logged
|
|
|
Juice,
In your post from “14 December 2007 at 7:41pm” you made this statement:
You >>>
“After calling terminate line, no error, no return, just stuck.”
Are you saying your code called the TerminateCall() API procedure and the TerminateCall() procedure blocked indefinitely????
Thanks,
Randal
|
Back to Top |
|
|
juice Vetran
Joined: December 05 2006 Location: United States Posts: 139
|
Posted: December 18 2007 at 12:39pm | IP Logged
|
|
|
Yes, the call to terminate simply blocks forever. The media engine is internally responsive for a short time afterwards, as you can see it is still sending events for incoming calls, call hang ups, etc. However, since our thread which handles the events locked up calling terminate, it no longer processes any more events. Then, the media engine itself no longer pushed any events to the callback. So, somehow internally media engine gets locked too.
|
Back to Top |
|
|
juice Vetran
Joined: December 05 2006 Location: United States Posts: 139
|
Posted: December 18 2007 at 1:16pm | IP Logged
|
|
|
support wrote:
Juice,
This is Randal, the CTO.
We need to get this issue elevated so we can find a solution and move on. This is what I propose:
1)
Set up a user account for you in our VOIP system here. I assume you have a spare SIP VOIP phone you can use to access our VOIP system. That way we can speak to each other directly and accelerate the transfer of info and get to the solution.
|
|
|
We would prefer a less involved communication medium (phone calls will take up more time, and are not really ideal for passing techincall info around), would you be willing to use something like IRC or another IM service? WIth IRC, we can easily setup a private/public room, involve multiple people (if needed), and copy/paste logs/messages/info around. I would be available all day tommorrow (9AM-6PM PST) for such a conversation, since it would not require constant attention as a voice call would.
support wrote:
2)
I need you to explain in a bit more detail the contents of your post from “14 December 2007 at 1:16pm”.
|
|
|
The format of the log is pretty simple. [SIP_MESSAGE] is logged from the lanscape callback thread when a message arrives. it is timestamp, message name, phone line, message enum value, and notification type.
So, [1197580770]: [SIP_MESSAGE]: SipOnHook : 197 : 68 : 2
Shows at the time 1197580770 (GMT Unix Time), SipOnHook came for line 197.
The first part of the log actually shows line 2 and line 197 in a good transaction. Line 197 hung up, so we proceed to hang up line 2. And this time, it worked as expected and we change the state of the line to free:
[1197580772]: Line: 197 State 'INCALL' to 'FREE' [52EAC5C8-A9C0-11DC-8181-001372519133]
Note, that is done in our event handling thread, but as a result of the SipOnHook message from lanscape.
However, the next time we used those lines - call comes in, we give it line 2, we make outgoing call, it gets line 197. Line 197 hangs up pretty quick (last message is SipIncomingCallConnected for line 2) - looks like 100 seconds for the sequence, we try to hang up line 2, but then it sticks (thread lock). Where:
******** Looks like event thread stuck here ******************************************** is marked. Right at the terminate call api procedure.
Notice, there are no more messages for Line 2 coming from lanscape engine.
Hopefully this is clearer, let me know if not and I can try to explain in more detail.
Quote:
3)
I need you to explain some other details:
Is the host machine multi core?
What exactly is the threading model of your app.
etc…
|
|
|
Yes, running on a multicore system (four cores). We have a few threads:
* Thread 1 - Call update thread, where we update some information about currently running cals, and try to detect no audio conditions to hang up lines. You can see this in the log where blank audio is logged.
* Thread 2 - Media Engine Event Processing thread - this is the thread that handles queued media events, and is the thread getting stuck from terminate call. This processes each message from the queue.
* Thread 3 - Another utility thread which occasionally reloads some config settings relating to routing destinations, etc.
And, then there is the event callback thread from Lanscape.
It is very important to mention that this issue only showed up in the last 2 releases we have received. We can send a sip log if you think that would help you locate the issue. Also, we are aware of the holidays coming up, and what that means is less time available for finding/fixing things like this.
Currently, we are running a successfully deployed 5.12.4.8 version of the media engine, only this one is said to have a timebomb for the end of the year, so that is bad for us since the latest version is causing the application to lock anywhere from 2 hours, to 24+ hours after starting up. So far, we have not identified a cause for this. Would it be possible to possibly get an extension on the 5.12.4.8 build so that we do not lose service during this time?
Thanks
|
Back to Top |
|
|
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
|
Posted: December 20 2007 at 11:25am | IP Logged
|
|
|
juice,
We are looking into the TerminateCall() API procedure deadlock. We will repost when we have something useful.
Your current v5.12.4.8 DLL image will function up to 12-31-07. After that its dead.
Your support FTP account has been updated with a v5.12.4.8 DLL image time bombed to the end of 2008. See the “v5.12.4.8 DLL Only - expires 2008” directory. We can use this new image so that you are not in a jam. Be sure to run and verify the new v5.12.4.8 image with your app before customer deployment. Just to be on the safe side.
Your observations are correct in that this deadlock may be a new issue. The latest media engine product builds are built using an updated tool chain and the internal threading model of the product was updated to allow for higher call rates.
Important item:
We have asked this of you before but we will try and ask again. If you are very clear regarding the call traffic that leads up to the situation you described, have one of your R+D guys code together a small console app that exhibits the problem. This could be a simple media engine app that calls itself. If you can give us code that experiences the problem, it will get resolved all that much quicker.
Thanks,
Support
|
Back to Top |
|
|
juice Vetran
Joined: December 05 2006 Location: United States Posts: 139
|
Posted: December 20 2007 at 11:47am | IP Logged
|
|
|
Good news on the updated build, thank you.
In regards to a console app, we are working right now to duplicate this in a
reliable way. Once we identify the exact cause, if possible, we can provide a
sample app/code that reproduces and/or more details on the cause.
We will test the latest build of the .8 version immediately, since we are fully
aware of the time of the year and what that could mean in terms of people
not being around for support for the next week ;-)
|
Back to Top |
|
|
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
|
Posted: December 20 2007 at 1:23pm | IP Logged
|
|
|
One thing that would be a benefit is to post how your VOIP app is calling the TerminateCall() API procedure. We are interested in the parameter values you pass.
|
Back to Top |
|
|
juice Vetran
Joined: December 05 2006 Location: United States Posts: 139
|
Posted: December 20 2007 at 2:14pm | IP Logged
|
|
|
First we try to get line status, so actually, perhaps the querying of the line status died (we have the terminatecall api method wrapped in a helper function, sorry for the confusion). Anyway, here is a code snippet:
if( GetLineStatus(mSipEngine, phoneLine, &state) == SipSuccess )
{
TELEPHONY_RETURN_VALUE s = TerminateCall(mSipEngine, phoneLine, false, 30000);
if(s == SipAlreadyOnHook || s == SipSuccess)
return;
else
mLog.logMessage("[ERROR]: Terminate call returned : " + parseInt(s), Log::HighPriority);
}
Simply passing false to do it async, and 30000 is not really important in that case I don't think.
One thing we just noticed looking carefully over the logs is something rather strange.
In the latest log, where a phone line got stuck, something strange appears in the Sip Log. The initial connection looks good - Invite received (we give him line 0), we send out an invite (line 199) to connect the caller with a destination, once other party sends 200 ok, we answer initial caller with 200 ok.
Then, they are connected for around 20 seconds. After which, the sip log shows that the initial caller sent a BYE. In fact, initial caller sends three BYE's. Lanscape ignores all of them. Until, destination sends a bye (line 199), Lanscape immediately processes this BYE by sending ACK and then sends us the SipByeReceived message stating line 199 hung up. When in fact, line 0 should have been the person hanging up. Going further, we try to hang up line 0 since line 199 hung up... That is when it sticks.
So, it looks like processing on Line 0 is frozen? We really have no clue how your threading model works internally, but it looks from this that perhaps the sip parsing for line 0 may be stuck, so when we call getlinestatus or termiantecall the engine freezes our thread? what do you think, does this information maybe help locate the problem, or do you need to small portion of Sip log to better see.
|
Back to Top |
|
|
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
|
Posted: December 21 2007 at 1:35pm | IP Logged
|
|
|
Juice,
(This post pertains to the v5.12.7.21 terminate call deadlock you reported)
Another thing that will help us locate the problem is this:
1)
Create a new worker thread in your app. Call this thread X1.
2)
Have thread X1 monitor when there is a “stuck” line due to the TerminateCall() API procedure deadlocking.
3)
Have thread X1 call the GetLineStatus() to get the API line state. Log this value somewhere.
Immediately after you call the GetLineStatus() proc above, also call the undocumented API procedure GetInternalStateString() and log the string value it returns. The signature for this undocumented proc is:
Code:
char * VOIP_API GetInternalStateString(SIPHANDLE hStateMachine, int PhoneLine)
|
|
|
We need to see this internal “non API” state.
To respond to some of your other thoughts:
Your last explanation contained good information. Especially the description of the call flows at termination and the behavior you have observed. Good effort on your part.
1)
You >>>
…we try to get line status, so actually, perhaps the querying of the line status died
<<< Support
GetLineStatus() is ok no matter where it gets called and will never cause a deadlock no matter what thread calls the proc.
2)
You >>>
Then, they are connected for around 20 seconds. After which, the sip log shows that the initial caller sent a BYE. In fact, initial caller sends three BYE's. Lanscape ignores all of them.
<<< Support
Assuming your originating SIP device is a media engine based app (soft phone), a total of three BYEs will get sent by default if the media engine based gateway does not respond within the call terminate time as set by the soft phone calling the SetCallTerminateTimeout() API proc. The question is: why doesn’t the media engine phone line of the gateway app respond immediately to the first BYE received.
The only time we see this type of behavior (not responding to BYE packets received) is if the network is dropping UDP packets or if the server app (the gateway app in this case) is consuming 100% CPU and the received UDP data off the network is not getting processed fast enough. If your gateway app calls GetLineStatus() very often in a polling manner, make sure your gateway app is not railing at 100% CPU utilization. 100% CPU utilization will have an effect on when WinSock2 hands off UDP packets to the media engine process and may cause latency.
You explanation regarding the three BYEs is interesting. This is definitely a major clue.
Some of us will be working over the holidays. We plan to continue working this problem so we can get it resolved. We have still not reproduced it here – which is driving us a bit crazy. Any effort to get us that sample console app that experiences the same problem would be great.
More to come…
Support
Notes:
This post discusses VOIP Media Engine undocumented API procedures that are used for internal test purposes. Do not use these API procedures in your VOIP applications.
|
Back to Top |
|
|
|
|