Author |
|
juice Vetran
Joined: December 05 2006 Location: United States Posts: 139
|
Posted: May 19 2008 at 11:31am | IP Logged
|
|
|
Thanks for the update, we will check this immediately.
Question, you state we have to use the temp license to enable DTMF? So the old license will work but just not enabled DTMF? Or, the trial license is absolutely needed to make it work?
|
|
|
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
|
Posted: May 19 2008 at 1:59pm | IP Logged
|
|
|
Hi Juice,
<<< You
Question, you state we have to use the temp license to enable DTMF?
Support >>>
Yes. You will have to use the supplied temp trial license (LanScapeVME.c) to enable the new v6 “fully integrated DTMF” capability.
<<< You
So the old license will work but just not enabled DTMF?
Support >>>
Exactly. That’s the theory anyway :)
<<< You
…Or, the trial license is absolutely needed to make it work?
Support >>>
You do not need the trial license files to make your existing builds work just like the previous v5.12.8.1 image. You should simply be able to swap in the new static import lib and DLL image for your current build.
Support
|
|
|
juice Vetran
Joined: December 05 2006 Location: United States Posts: 139
|
Posted: May 19 2008 at 5:29pm | IP Logged
|
|
|
Thanks. We went ahead and continued with our current license; we did not want to enable features at this moment that could impact our current DTMF handling.
So far, tests have worked well. Of course we started with the conference example we sent you, which worked flawlessly. We have now deployed it in our live environment for more real-world testing. We will let you know how testing goes, but so far it looks to work perfectly.
Thanks again.
|
|
|
juice Vetran
Joined: December 05 2006 Location: United States Posts: 139
|
Posted: May 23 2008 at 11:47am | IP Logged
|
|
|
When testing the latest release, we seem to be experiencing an issue similar to:
http://www.lanscapecorp.com/forum/forum_posts.asp?TID=452&PN=1
We are not doing any registrations from the Media Engine, though, so it is slightly different. The problem also happens after some time; we are not sure of the triggering event. This seems to be something new, or at least we have not seen this exact behavior before.
What happens is that when INVITEs start coming in, LanScape fails to send 100 and 180 responses back. Also, we do not appear to be getting incoming-call events for these last calls (hard to say exactly, as this happens on a production machine with no debugging). After a little while, the UAs send CANCELs, but LanScape sends back a "transaction does not exist" message. We are trying to reproduce this error in one of our small test apps, but we uploaded a SIP log to the FTP area to maybe help identify the issue.
Also, whatever lines were in use at the time seem to stay stuck and never go back on hook, and shutting down the media engine seems to lock up as well, so something is definitely locked up: either our code or the media engine. This happens much more with this release than we have seen previously. It may or may not be related to conferencing (conference is enabled, but not being used on this machine).
|
|
|
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
|
Posted: May 23 2008 at 1:55pm | IP Logged
|
|
|
We do not see anything untoward or bad in the SIP that is being received by the media engine. However, we will test a few of the INVITEs being received from your log to make sure nothing in those INVITEs is triggering a bug of some sort.
We do see what you have described (by looking at your SIP log). It's as though the received INVITE requests are simply being ignored. Hmmmm…. We have not seen this.
One thing that would be useful would be to dump the media engine internal phone line states to a log file. This will give us a snapshot of what is going on for each line when this problem occurs.
Here are two undocumented APIs that will dump phone line info to a log file or remote log server:
Code:
// Undocumented diagnostic entry points exported by the media engine (C linkage).
extern "C"
{
TELEPHONY_RETURN_VALUE VOIP_API DumpPhoneLineInfoToFile(SIPHANDLE hStateMachine, char *pFileName);
TELEPHONY_RETURN_VALUE VOIP_API DumpPhoneLineInfoToRemoteServer(SIPHANDLE hStateMachine, char *pServerName, int ServerPort);
}
|
|
|
If you can, try to call one of these procs when you detect that received INVITE requests are getting ignored. If we can get a log file from you, that may help. We will start looking into this problem at this end…
Support
Notes:
This post discusses VOIP Media Engine undocumented API procedures that are used for internal test purposes. Do not use these API procedures in your VOIP applications.
|
|
|
juice Vetran
Joined: December 05 2006 Location: United States Posts: 139
|
Posted: May 23 2008 at 4:10pm | IP Logged
|
|
|
Thanks, we will add that to our shutdown routine, so that we can generate that report when trying to restart.
While shutting down one system, we noticed that one line was stuck in a call; after terminating the line, it never went to SIP on hook. This may or may not be related (the engine was still processing other calls), but we will have more info now since it is running with the phone line dump API you posted.
|
|
|
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
|
Posted: May 24 2008 at 6:50am | IP Logged
|
|
|
Juice,
Is the machine you used for testing a multi-core?
Support
|
|
|
juice Vetran
Joined: December 05 2006 Location: United States Posts: 139
|
Posted: May 28 2008 at 12:21pm | IP Logged
|
|
|
Hi Support, didn't notice this question until just now. Yes, our development machine with testing is 4 core (2 Hyper threaded CPUs), and the other 2 CPUs.
Right now, the issue is showing up about every other day or so. We just caught the issue on our production machine and tried to shut down the server. We had modified it to use those extra API calls to log to a file. However, it appears to have locked up during that call :( The log was created, but only has:
************* Log Opened (May 28 18:08:58) *************
We have tested the API procedure already and know it works, so whatever state the lines get into internally must be preventing logging from happening. I've waited 5 minutes and the log still has no other data in it, so I assume it is just stuck in that call. Our own log also backs that up: we have not reached the point of shutting down the media engine, only entered that API call.
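One way to keep a shutdown routine from hanging inside a diagnostic call like this is to run the call on a worker thread and stop waiting after a timeout. The sketch below is a generic C++ illustration, not part of the media engine: `RunWithTimeout` is a hypothetical helper, and the task passed to it stands in for whatever blocking call is being guarded.

```cpp
#include <cassert>
#include <chrono>
#include <exception>
#include <functional>
#include <future>
#include <memory>
#include <thread>

// Hypothetical helper: runs `task` on a detached worker thread and waits up to
// `timeout` for it to finish. Returns true if the task finished (or threw) in
// time, false if it is presumed stuck. A stuck worker is left running, because
// a blocked thread cannot be safely killed from the outside.
bool RunWithTimeout(std::function<void()> task, std::chrono::milliseconds timeout) {
    assert(task && "task must be callable");
    auto done = std::make_shared<std::promise<void>>();
    std::future<void> finished = done->get_future();

    // The worker owns copies of the task and the promise, so both stay alive
    // even if the caller gives up and returns first.
    std::thread([task, done]() {
        try {
            task();
            done->set_value();
        } catch (...) {
            done->set_exception(std::current_exception());
        }
    }).detach();

    return finished.wait_for(timeout) == std::future_status::ready;
}
```

A shutdown routine could wrap the dump call in a lambda that captures its arguments by value, pass it to `RunWithTimeout` with a limit of, say, 30 seconds, and carry on shutting down if the result is false. Abandoning the worker is only reasonable when the process is about to exit anyway.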
One thing to note, we see the following behavior:
With two lines (line 0 and line 199), line 0 hangs up. We then proceed to hang up line 199. Terminate call works (I recall that is where we saw deadlocking before), and we get control back from that method. The media engine sends BYE and receives the ack (the SipReceivedByeAck event happens)... But we never get SipCallComplete or SipOnHook for that line. So it is now a lost line (and we also have an open call session because the call never completed).
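An application can at least detect this kind of lost line on its own side by remembering when each hangup was requested and flagging lines whose on-hook event never arrives. The sketch below is a generic C++ illustration: `HangupWatch` and its method names are hypothetical, and mapping `OnHookEvent` to the SipOnHook event is an assumption about how the application's event handler would be wired up.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical tracker for lines that were told to hang up but have not yet
// reported on-hook. Not thread-safe: call it from one thread or add a mutex.
class HangupWatch {
public:
    // Record the moment the application asked the engine to terminate a call.
    void OnTerminateRequested(int line, Clock::time_point now) { pending_[line] = now; }

    // Call from the on-hook event handler (assumed to correspond to SipOnHook).
    void OnHookEvent(int line) { pending_.erase(line); }

    // Lines whose hangup has been pending longer than `limit`.
    std::vector<int> StuckLines(Clock::time_point now, std::chrono::seconds limit) const {
        assert(limit.count() >= 0);
        std::vector<int> stuck;
        for (const auto& entry : pending_) {
            if (now - entry.second > limit) stuck.push_back(entry.first);
        }
        return stuck;
    }

private:
    std::map<int, Clock::time_point> pending_;
};
```

Polling `StuckLines` from a timer gives a log entry naming the affected line at the time it goes missing, rather than only discovering it at shutdown.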
Any thoughts on what to do next? Maybe it helps you to know that the line dump call locks up and which events never come in.
|
|
|
juice Vetran
Joined: December 05 2006 Location: United States Posts: 139
|
Posted: May 28 2008 at 12:22pm | IP Logged
|
|
|
Let me correct first response, the first machine is 4 Cores (2 real, 2 HT), the second computer is 2 CPUs. I note that my response is a little confusing :)
|
|
|
juice Vetran
Joined: December 05 2006 Location: United States Posts: 139
|
Posted: June 18 2008 at 11:42am | IP Logged
|
|
|
Any insight on why the call to log all line status would be locked up? Maybe a clue as to what is getting stuck?
|
|
|
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
|
Posted: June 19 2008 at 4:34pm | IP Logged
|
|
|
Hi Juice,
Thanks for waiting for this post. It's been a while since we posted regarding this thread.
The reason it has taken so long to get back to this thread is that we are revamping how we currently perform development and testing of the media engine.
We are replacing all of our single-core LanScape host machines and moving all developers and testers who are involved with the media engine to multiprocessor hardware. It’s a bit costly, but we decided that this change is necessary in order for us all to move forward. Frankly, we are tired of chasing threading issues. The only way to remove threading issues effectively is to develop and test on nothing but multiprocessor machines, period. We will be throwing away any single-core dev or test machines we were using and will start using quad-core or better development and test machines.
Since we are still in the process of updating our development and test environments, it may take a bit longer to investigate the original problem that you reported. We hope to get back on this shortly.
In the meantime, if you want, you can download v5.12.8.4 from your support FTP account and run tests against that image. It has updates that may help us. See file “v5.12.8.4 DLL Only - Expires 12-30-08.zip”.
You >>>
Any insight on why the call to log all line status would be locked up? Maybe a clue as to what is getting stuck?
<<< Support
Not sure at this point. Will probably understand why this is when we dig deeper.
Support
|
|
|
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
|
Posted: July 11 2008 at 11:33am | IP Logged
|
|
|
Hi Juice,
Thanks for being patient and waiting for this post. Developing, testing and QAing the media engine always takes way more time than we want. It's just the way it is.
We have placed the v5.12.8.6 “engineering release” of the media engine into your support FTP user account. See the “DLL Only v5.12.8.6” directory. Download the entire contents of this directory.
The contents of the directory:
File:
v5.12.8.6 DLL Only - Expires 12-30-08.zip
Description:
Contains the new release. Replace your current files with the ones in this archive.
File:
XXXXXXX License Fri Jul 11 11_21_38 2008.zip
Description:
New license files. The new license has the fully integrated DTMF capability enabled and all other media engine options enabled. It will allow the media engine to enable up to 512 concurrent lines.
The license will also allow you to instantiate the media engine up to 1024 times on the same host machine. We wanted to do this to make your test/eval of this version easier. Also, it’s a small way we can say “thank you” for making your group wait so long for this version.
Files:
VOIP Media Engine v5.12.8.3 - Engineering Release Notes.htm
VOIP Media Engine v5.12.8.4 - Engineering Release Notes.htm
VOIP Media Engine v5.12.8.5 - Engineering Release Notes.htm
VOIP Media Engine v5.12.8.6 - Engineering Release Notes.htm
Description:
Release notes.
This version of the media engine has gone through the highest degree of multi-core testing to this date. The good news is that we have uncovered multiple and very subtle multi-core issues as the result of our testing on quad and dual quad core based host machines. In addition to that, multiple multi-core related performance improvements are included in this release that will improve call handling performance.
It is important that you continue testing and development using this new product image.
Other Notes:
Regarding conference call deadlock issues: This release has been verified to remove a previously identified conference deadlock issue that you reported. We did end up verifying the existence of this problem when performing our testing on quad and higher multi-core host machines. As is often the case, simple human error was the culprit. The problem was the typical deadlock: multiple threads trying to access and fighting for the same resources.
As a rule, we make every possible attempt to remove serialization of code in order to get the most out of the underlying host machine and processors. Doing so however is a tough task especially where conferencing is concerned. The internal logic that is responsible for coordinating phone lines, their states and their media is pretty involved. Because of this multithreading complexity, bugs can creep in. Without this complexity, we would not be able to achieve the multithreaded performance we currently enjoy.
The example conference test app you submitted to us works well with this updated version of the media engine. I forget the exact number of calls we allowed it to process but it was in the millions of calls as it executed for a week or so.
We did identify why the DumpPhoneLineInfoToFile() and DumpPhoneLineInfoToRemoteServer() APIs were deadlocking. These 2 APIs used the same internal thread logic as is used by conferencing. If conferencing logic is deadlocked, these two APIs will also deadlock.
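The "typical deadlock" described above usually comes from two threads taking the same pair of locks in opposite order. The sketch below is a generic C++ illustration of one standard remedy, acquiring both locks together with `std::lock`; the `Line` type and function names are hypothetical stand-ins and say nothing about how the media engine is actually implemented.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a per-line state record.
struct Line {
    std::mutex lock;
    int mixedBlocks = 0;
};

// Updates two distinct lines as one operation. std::lock acquires both
// mutexes using a deadlock-avoidance algorithm, so callers may name the
// lines in either order without risking a lock-order deadlock.
void MixBetween(Line& a, Line& b) {
    assert(&a != &b);
    std::lock(a.lock, b.lock);
    std::lock_guard<std::mutex> guardA(a.lock, std::adopt_lock);
    std::lock_guard<std::mutex> guardB(b.lock, std::adopt_lock);
    ++a.mixedBlocks;
    ++b.mixedBlocks;
}

// Runs two threads that touch the same pair of lines in opposite order and
// returns how many updates the first line received.
int StressOppositeOrder(int iterations) {
    Line x, y;
    std::thread t1([&] { for (int i = 0; i < iterations; ++i) MixBetween(x, y); });
    std::thread t2([&] { for (int i = 0; i < iterations; ++i) MixBetween(y, x); });
    t1.join();
    t2.join();
    return x.mixedBlocks;
}
```

With plain nested `lock()` calls in place of `std::lock`, the same two threads could each hold one mutex and wait forever for the other. That kind of bug shows up far more readily on multi-core hardware, where the threads genuinely run at the same time.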
Known issues:
As you know, we are chasing down a Vista audio playback issue that was recently reported. We will get this problem resolved shortly.
Post back to this thread as needed.
Thanks juice!
Support
|
|
|
juice Vetran
Joined: December 05 2006 Location: United States Posts: 139
|
Posted: July 16 2008 at 11:28am | IP Logged
|
|
|
This is excellent news! The engine (while periodically locking up certain lines and having issues shutting down) has been relatively stable as of late, so the problem became a little less urgent and waiting was not so bad. However, it is excellent that this new release should fix the remaining issues. We will test and provide updates as we find them.
Thanks again for your hard work. And we will get back to you soon.
Thanks.
|
|
|
juice Vetran
Joined: December 05 2006 Location: United States Posts: 139
|
Posted: July 23 2008 at 12:22pm | IP Logged
|
|
|
Ok, after a few days testing, we have a bit of feedback. So far, the deadlock has not shown up. However, another serious issue has come up, which we've not seen before.
Twice, the engine has hit a break point inside the MS runtime: _lseeki64_nolock. The first time this happened was when we interacted with our test client by simply entering a text command in a console window to display a count of all handled sessions so far.
The next time, it hit the break point after we had simply left the console client alone.
We've uploaded a mini crash dump of the issue to the FTP account. Maybe this can help locate the cause? All we see is the call stack coming from a Lanscape thread. At the time of this crash, it looks like a session was deleted (lines hung up and resources being freed).
|
|
|
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
|
Posted: July 23 2008 at 2:01pm | IP Logged
|
|
|
Juice,
The deadlock news is good.
About the “break point inside MS runtime”:
Do you have a step by step recipe that we can use to reproduce?
In other words, what steps did you perform in the console window of your test app to get the issue to occur.
Is the “test client” the same test app we ran here?
Are you running your test client under the debugger + IDE, or as a release build?
We will look into it right away. We have not seen what you describe.
Support
|
|
|
juice Vetran
Joined: December 05 2006 Location: United States Posts: 139
|
Posted: July 23 2008 at 2:35pm | IP Logged
|
|
|
No, we do not currently know what triggers the break point. It has simply happened twice, each time after about a day of running. It is a debug build running under the debugger. We hope the dump will allow you to examine your threads. Also, it is in a larger program, not the demo application from before.
|
|
|
juice Vetran
Joined: December 05 2006 Location: United States Posts: 139
|
Posted: July 24 2008 at 1:34pm | IP Logged
|
|
|
Interesting, after a night of running, another break point hit. Though, this time we did not make a dump. Also, it was a different break point (with no source available) from a less deep call stack. Appeared to come after a SipInCall message. We will have to try to reproduce this with the demo app.
However, was the previous stack trace we provided of any help?
|
|
|
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
|
Posted: July 25 2008 at 7:43am | IP Logged
|
|
|
Juice,
Thanks for the additional info. More info is always better...
We will be looking at your minidump this morning. Probably in the next 15 minutes. If the break point hits again, create a minidump with full heap if you can and upload as normal. It will be a large image but who cares.
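For builds that are not running under a debugger, a dump that includes heap memory can also be written programmatically. The sketch below uses the Windows DbgHelp function `MiniDumpWriteDump` with the `MiniDumpWithFullMemory` dump type; the helper name `WriteFullHeapDump` and the file name handling are hypothetical, and on non-Windows platforms the helper is a stub that returns false.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#include <dbghelp.h>
#pragma comment(lib, "dbghelp.lib")
#endif

// Hypothetical helper: writes a dump of the current process, including all
// accessible process memory (and therefore the heap), to `path`.
// `exceptionPointers` is the EXCEPTION_POINTERS* handed to an exception
// filter, or nullptr when no exception is being handled.
bool WriteFullHeapDump(const char* path, void* exceptionPointers) {
    assert(path != nullptr);
#ifdef _WIN32
    HANDLE file = CreateFileA(path, GENERIC_WRITE, 0, nullptr, CREATE_ALWAYS,
                              FILE_ATTRIBUTE_NORMAL, nullptr);
    if (file == INVALID_HANDLE_VALUE) return false;

    MINIDUMP_EXCEPTION_INFORMATION info = {};
    info.ThreadId = GetCurrentThreadId();
    info.ExceptionPointers = static_cast<EXCEPTION_POINTERS*>(exceptionPointers);
    info.ClientPointers = FALSE;

    BOOL ok = MiniDumpWriteDump(GetCurrentProcess(), GetCurrentProcessId(), file,
                                MiniDumpWithFullMemory,
                                exceptionPointers ? &info : nullptr,
                                nullptr, nullptr);
    CloseHandle(file);
    return ok != FALSE;
#else
    (void)exceptionPointers;  // Stub: dumps in this format are Windows-only.
    return false;
#endif
}
```

A typical arrangement is to call this from a filter installed with `SetUnhandledExceptionFilter`, passing along the `EXCEPTION_POINTERS*` the filter receives. Writing a dump from inside a process whose memory may already be corrupted is not fully reliable, so treat this as a best-effort fallback to saving the dump from the debugger.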
We will post when we have more info.
Support
|
|
|
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
|
Posted: July 25 2008 at 10:30am | IP Logged
|
|
|
Juice,
Thanks for the dump file. We see what occurred and why.
Grab the updated image via support FTP account. See the “DLL Only v5.12.8.7” directory. Use your existing license files.
Read the release notes, test further and repost with your results.
Thanks,
Support
|
|
|
juice Vetran
Joined: December 05 2006 Location: United States Posts: 139
|
Posted: July 25 2008 at 12:15pm | IP Logged
|
|
|
Thanks for the update, we will download and retest immediately.
|
|
|
juice Vetran
Joined: December 05 2006 Location: United States Posts: 139
|
Posted: August 03 2008 at 7:43pm | IP Logged
|
|
|
Well, this last build is also pretty unstable. Getting a couple random(ish) crashes/breaks. Still looking to get a debug capture of the issue. Weird, how the original build was not causing breakpoints/crashes, only some deadlocks - but now we are seeing break points.
Hopefully, we can get you a dump soon.
|
|
|
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
|
Posted: August 04 2008 at 12:07pm | IP Logged
|
|
|
Juice,
Not the feedback we wanted to get.
We will have to get a mini dump from you. There must be an uninitialized pointer or memory corruption occurring somewhere that our testing is not discovering.
It may be due to your specific SIP call flows or something similar. The rub is – we have been testing the SH** out of these latest versions and are not seeing the situation you are reporting. Software…..
We will investigate and fix immediately as soon as we get a dump from you.
Support
|
|
|
juice Vetran
Joined: December 05 2006 Location: United States Posts: 139
|
Posted: August 13 2008 at 9:55pm | IP Logged
|
|
|
It's hard to get the sample app to crash, but such is expected with real traffic going to/from different UA's and SIP Providers. We understand tracking down the issue is going to be a bit painful.
We just experienced a break/crash, and uploaded a dump.dmp file to the ftp site. This looks different than the others, though if it is some kind of pointer issue, it's unlikely that any crash/break will look like another. This appears to have happened within some small callstack of a lanscape thread as an incoming call was happening.
Thanks again for your continued support in this matter.
|
|
|
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
|
Posted: August 14 2008 at 6:50am | IP Logged
|
|
|
Juice,
We will look into this immediately this morning. Good work getting the dump image. We know it's a pain...
One question:
This time, did it take a long time to experience the crash?
We will repost as soon as we locate something.
Support
|
|
|
support Administrator
Joined: January 26 2005 Location: United States Posts: 1666
|
Posted: August 14 2008 at 9:50am | IP Logged
|
|
|
Juice,
Never mind our last question. We get what you were saying – It is hard to get the thing to crash - It takes a while to see the crash.
Looking at the dump image:
1)
This “crash” is a different issue from the last snafu you reported.
2)
It appears that a G729 call is active on a phone line when the crash occurs.
3)
Internally the media engine was in the middle of performing an audio mix operation of sample block audio that is destined to be transmitted out the phone line.
4)
When the crash occurs, the mix operation is attempting to convert internal sample block data from 8k PCM to G729 just before sending the converted G729 sample block data to the phone line’s RTP transmitter code. The format/rate conversion from PCM to G729 looks to be failing, possibly due to a memory corruption somewhere that is affecting the format/rate conversion, or due to the call being terminated while the digital mix operation is being performed (a critical section issue). Even though the digital mixing and format/rate code is multithreaded internally and is properly protected by critical sections, there may be a critical section or multithreading logic hole; it's possible.
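The second possibility in point 4, a call being torn down while a mix is in progress, is the kind of race a critical section is meant to close. The sketch below is a generic C++ illustration using `std::mutex` in place of a Win32 critical section; `LineTx` and its members are hypothetical and do not reflect the media engine's real internals.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical per-line transmit state. One mutex guards both the "call is
// active" flag and the outgoing buffer, so a mix either finishes before the
// teardown starts or sees that the call is already gone.
class LineTx {
public:
    // Appends one PCM sample block for transmission if the call is still up.
    bool MixBlock(const std::vector<short>& pcm) {
        assert(!pcm.empty());
        std::lock_guard<std::mutex> guard(lock_);
        if (!active_) return false;  // Call already terminated: touch nothing.
        out_.insert(out_.end(), pcm.begin(), pcm.end());
        return true;
    }

    // Marks the call as ended and releases its buffered audio.
    void Terminate() {
        std::lock_guard<std::mutex> guard(lock_);
        active_ = false;
        out_.clear();
    }

    // Number of samples currently buffered for transmission.
    std::size_t Buffered() {
        std::lock_guard<std::mutex> guard(lock_);
        return out_.size();
    }

private:
    std::mutex lock_;
    bool active_ = true;
    std::vector<short> out_;
};
```

The "logic hole" case would be any path that reads or frees the buffer without holding the same lock, for example checking the active flag outside the guarded region and then mixing afterwards.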
At this point it is hard to see post mortem what actually is causing the crash. What we need is a mini dump with full heap in order to really see what led up to the crash. We need to inspect heap memory. Just looking at stack space and stack variables doesn’t give us the info we need.
If we try to hack in a change, we will only be guessing at what the real problem is. We could end up guessing for many software iterations and still not get it resolved.
Please let us know if you can get us a full heap mini dump.
In the meantime, we will try to synthesize the failure here.
As always, if you have a sample app that exhibits this crash – that will be a huge help.
Support
|
|
|