Our Wavecom Fastrack 9000E GSM modem suddenly started returning an error and refusing to send text messages. The device is attached to a CentOS 5.x VM, which in turn uses a serial port on an ESXi 5.0 server. We use smsd to send the text messages.
# tail -f /var/log/smsd.log
2013-07-27 10:35:18,6, GSM1: Modem is registered to the network
2013-07-27 10:35:18,6, GSM1: Selecting PDU mode
2013-07-27 10:35:18,7, GSM1: -> AT+CMGF=0
2013-07-27 10:35:18,7, GSM1: Command is sent, waiting for the answer
2013-07-27 10:35:19,7, GSM1: <- OK
2013-07-27 10:35:19,7, GSM1: -> AT+CMGS=116
2013-07-27 10:35:19,7, GSM1: Command is sent, waiting for the answer
2013-07-27 10:35:19,7, GSM1: <- >
2013-07-27 10:35:19,7, GSM1: -> 0011000C914457099662760000FF742A95313CA73BCB74D0B42CB7A7C76550905D96D3552A500B242D0E9FD6A234AB03D1D1B458EB5E16D75BF698CB1C9ED35DEE32DD555FEB40436815C47C87C9A0F41CF45C8250CF25A80562BFC36450D85E9687CF651D48E6B2CD5820994B966381622EDC2D05
2013-07-27 09:45:22,7, GSM1: Command is sent, waiting for the answer
2013-07-27 09:45:24,7, GSM1: <- +CMS ERROR: 512
2013-07-27 09:45:24,3, GSM1: The modem said ERROR or did not answer.
2013-07-27 09:45:24,5, GSM1: Waiting 10 sec. before retrying
CMS Error 512 seems to be manufacturer-specific, which didn't really help. All config settings looked untouched, and a reboot of the modem seemed to fix it for a few moments (or 2 texts, to be accurate). We then swapped the modem out for a completely new one. Again it worked for 1-2 texts, then fell over.
None of the following helped: rebooting the CentOS VM, deleting all the messages from the SIM card using AT commands (as I suspected it may have been full), changing the serial cable, and migrating the VM to another ESXi host (and therefore another serial port). We suspected it could be the network, but that didn't seem likely as the log files suggested a modem error.
As a last resort we moved the GSM modem to the top of the rack, as the only remaining variable was signal strength. This seemed to fix the issue, as we had no trouble for a few days. However, we wanted to confirm this, so we started to investigate by writing a script that connected to the GSM modem and grabbed the signal strength. To do this we used minicom to connect to the serial port and issued the AT+CSQ command to get the strength.
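For anyone wanting to script the same check, here is a minimal sketch of how the AT+CSQ reply could be parsed. It is not the script we actually used. It assumes the standard reply format `+CSQ: <rssi>,<ber>`, where rssi runs from 0 to 31 (99 means unknown) and maps to roughly -113 + 2 * rssi dBm, and ber is a bit error rate class from 0 to 7 (99 means unknown). The names `parse_csq`, `SignalQuality` and `ber_is_elevated` are made up for illustration, and actually reading the reply from the serial port is left out.

```python
import re
from typing import NamedTuple, Optional


class SignalQuality(NamedTuple):
    rssi: int            # raw value reported by the modem (0-31, 99 = unknown)
    ber: int             # bit error rate class (0-7, 99 = unknown)
    dbm: Optional[int]   # approximate signal strength, None if unknown


# Matches the "+CSQ: <rssi>,<ber>" line anywhere in the modem's reply.
_CSQ_PATTERN = re.compile(r"\+CSQ:\s*(\d+)\s*,\s*(\d+)")


def parse_csq(response: str) -> Optional[SignalQuality]:
    """Extract rssi and ber from an AT+CSQ reply; None if no +CSQ line is found."""
    match = _CSQ_PATTERN.search(response)
    if match is None:
        return None
    rssi = int(match.group(1))
    ber = int(match.group(2))
    # 99 means "not known or not detectable", so no dBm value can be derived.
    dbm = None if rssi == 99 else -113 + 2 * rssi
    return SignalQuality(rssi, ber, dbm)


def ber_is_elevated(quality: SignalQuality) -> bool:
    """True when the modem reports a bit error class above the lowest one."""
    return quality.ber not in (0, 99)


if __name__ == "__main__":
    # Hypothetical reply text, not captured from our modem.
    reply = "AT+CSQ\r\n+CSQ: 18,0\r\n\r\nOK\r\n"
    print(parse_csq(reply))
```

Feeding the parsed values into a grapher is then just a matter of printing `rssi` (or `dbm`) and `ber` in whatever format the graphing tool expects.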
With the modem in its new position, @mr_jamesparker decided to graph the GSM signal strength using Cacti (which isn't as easy as it sounds, as he'll testify). Below you'll see the results. We moved the GSM modem back to its original position at 12:00 and, as you can see, the signal started to become flaky.
From the Wavecom documentation:
Via minicom, the modem now reported low signal strength along with "bit errors" (also shown below).
So that proved it was just the position of the GSM modem (which was half-way down the rack). When we moved it to the top of the rack the problem was resolved.