Stephen Gutknecht (SAPDB)
2002-10-04 08:53:52 UTC
As some of you know, I'm pulling my hair out trying to track down why we
have both driver crashes and complete stalls of the SAPDB software. On our
production web sites, about every 24 hours we run into a stall of the
database server or a crash of the driver. Stalls can last as long as 5
minutes, during which SAPDB doesn't respond. We believe we have eliminated all
hardware / network issues (yes, even DNS).
Our site has a lot of concurrency, so I've been trying to devise "stress
tests" that do a lot of small transactions over and over like we do on our
web site. The problem seems to crop up only after accumulated usage,
which points to small leaks or other problems...
A week or so ago I posted a command-line dotNet stress test program that
could quickly generate errors from the ODBC driver. The most common
were -709 errors.
Now I have a new symptom of the problem. While running my web site tests
today, I was watching DBMCLI and noticed that it had returned an error once!
So I decided to stress test DBMCLI.
The following batch file runs DBMCLI in a loop and stops once it
hits an error.
==== BEGIN Win32 BATCH FILE =====
@ECHO OFF
REM ***
REM *** Change the following line to be unique for each run
REM ***
SET O=lasterr1.txt
SET Udbm=dbm,dbm
SET DB1=TST
SET A=
REM *** A gains one "!" per pass; every 18th pass shows output and pauses.
:Top
SET A=%A%!
IF %A%==!!!!!!!!!!!!!!!!!! GOTO ShowOne
REM *** Quiet pass: output goes to the file named by O.
dbmcli -n localhost -d %DB1% -u %Udbm% -uSQL -c info state > %O%
IF ERRORLEVEL 1 GOTO ERROR1
GOTO Top
:ShowOne
REM *** Visible pass: delete the old output file and print to the console.
ERASE %O%
dbmcli -n localhost -d %DB1% -u %Udbm% -uSQL -c info state
IF ERRORLEVEL 1 GOTO ERROR1
rem *** ping a host you can't find to simulate a sleep command.
ping -w 1000 -n 1 192.187.188.2
SET A=
GOTO Top
:ERROR1
ECHO ***
ECHO *** Error encountered!
ECHO ***
IF EXIST %O% TYPE %O%
==== END Win32 BATCH FILE =====
On its own, it works great and runs for hours without a problem.
The problem starts when you open a second CMD.EXE prompt and
run a second instance of the batch file at the same time.
IMPORTANT: To run more than one test at the same time, you need to make a
second copy of the BATCH file and revise the O parameter on line 5. The O
parameter needs to be unique for each (concurrent) instance. For example,
TEST1.BAT has O=test1.txt and TEST2.BAT has O=test2.txt on line 5.
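For anyone who wants to repeat this on Linux, or run several loops without hand-editing copies of the batch file, here is a minimal Python sketch of the same test. It is not the original batch file: it assumes `dbmcli` is on the PATH, reuses the node, database and `dbm,dbm` user from the batch file, runs N worker threads that each loop over the same `info state` command, writes each worker's last output to its own file (hypothetical names `lasterr1.txt`, `lasterr2.txt`, ...) so the unique O= rule is handled automatically, and stops all workers at the first nonzero exit code.

```python
"""Concurrent dbmcli polling loop: a hypothetical sketch, not the original test."""
import subprocess
import sys
import threading
import time

# Assumed to match the batch file: same node, database and DBM user.
CMD = ["dbmcli", "-n", "localhost", "-d", "TST", "-u", "dbm,dbm",
       "-uSQL", "-c", "info", "state"]
SHOW_EVERY = 18  # the batch file shows every 18th pass (18 "!" marks)


def output_file(worker: int) -> str:
    """Hypothetical per-worker file name, replacing the hand-edited O= line."""
    return f"lasterr{worker}.txt"


def worker_loop(worker: int, stop: threading.Event, cmd=CMD) -> None:
    """Run cmd repeatedly until it fails or another worker has failed."""
    run = 0
    while not stop.is_set():
        run += 1
        try:
            result = subprocess.run(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            print(f"*** worker {worker}: {cmd[0]} not found on PATH")
            stop.set()
            return
        output = result.stdout + result.stderr
        with open(output_file(worker), "w") as f:  # like "> %O%"
            f.write(output)
        if result.returncode != 0:  # like "IF ERRORLEVEL 1"
            print(f"***\n*** worker {worker}: error encountered on pass {run}\n***")
            print(output)
            stop.set()  # unlike the batch file, stop every worker
            return
        if run % SHOW_EVERY == 0:
            print(f"worker {worker}, pass {run}:\n{output}")
            time.sleep(1)  # replaces the ping-based one-second pause


def main(workers: int = 2) -> None:
    """Start one polling thread per worker and wait for all of them."""
    stop = threading.Event()
    threads = [threading.Thread(target=worker_loop, args=(n, stop))
               for n in range(1, workers + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    main(int(sys.argv[1]) if len(sys.argv) > 1 else 2)
```

Run it as `python dbmstress.py 2` (the script name is arbitrary) for two concurrent loops. Stopping every worker on the first failure is a design choice that keeps the report short; the separate batch windows each stop on their own.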
After only a few minutes, I start getting errors like:
1. Error! Connection failed to node localhost for database TST:
ERR_USRFAIL: user authorization failed
2. -24988,ERR_SQL: sql error
-4008,Unknown user name/password combination
3. Error! Connection failed to node localhost for database TST: could not
connect to socket [10048]
4. The syntax of the command is incorrect.
Alternatively, run just ONE instance while a looping ODBC
application is running, and you start getting random errors like this from
the ODBC side:
ERROR [08001] [SAP AG][SQLOD32 DLL][SAP DB]Unable to connect to data
source;-709 CONNECT: (could not connect to socket [10048])
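Winsock error 10048 is WSAEADDRINUSE ("address already in use"). One hedged possibility worth ruling out alongside a leak inside SAPDB is client-port exhaustion: each dbmcli or ODBC connect takes a fresh local port, and after the connection closes that port can sit in TIME_WAIT before it is reused. The sketch below assumes the Windows 2000/XP defaults as I understand them (client ports 1025 through 5000, set by MaxUserPort, and a 240-second TIME_WAIT, set by TcpTimedWaitDelay); please check the values on your own servers. It estimates the sustainable connection rate and counts TIME_WAIT entries in saved `netstat -an` output.

```python
"""Back-of-envelope check for client-port exhaustion (hypothetical sketch)."""

# Assumed Windows 2000/XP defaults; verify MaxUserPort and TcpTimedWaitDelay
# under HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters.
FIRST_EPHEMERAL_PORT = 1025
MAX_USER_PORT = 5000
TIME_WAIT_SECONDS = 240


def max_connect_rate(first: int = FIRST_EPHEMERAL_PORT,
                     last: int = MAX_USER_PORT,
                     time_wait: int = TIME_WAIT_SECONDS) -> float:
    """Connections per second that can be sustained before every client
    port is stuck in TIME_WAIT (assumes one local port per connection)."""
    ports = last - first + 1
    return ports / time_wait


def count_time_wait(netstat_text: str) -> int:
    """Count lines in `netstat -an` output whose state is TIME_WAIT."""
    return sum(1 for line in netstat_text.splitlines() if "TIME_WAIT" in line)


if __name__ == "__main__":
    print(f"about {max_connect_rate():.1f} connections/second sustained")
```

With the assumed defaults this works out to 3976 ports / 240 s, roughly 16.6 new connections per second. Saving `netstat -an > ns.txt` during a failing run and passing the file's text to `count_time_wait` would show whether TIME_WAIT entries are piling up into the thousands; if they are not, that points back toward the server side.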
Correct me if I'm wrong, but DBMCLI doesn't use ODBC, so is the problem
deeper within SAPDB than the ODBC driver? I notice DBMCLI produces
different errors depending on how long things have been running. If you
reboot and start fresh, it takes a while for errors to appear -- and the
pattern of errors seems to change after the tests have been running for some
time. Leaks?
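Since the error mix seems to shift the longer the tests run, it may help to record each failure with its elapsed time and tally the categories per interval, so a run from a fresh reboot can be compared with a long-running one. A minimal sketch, assuming you collect `(seconds_since_start, message)` pairs yourself (for example by noting `time.monotonic()` when a loop fails); the category labels and the 10-minute bucket size are arbitrary choices, and the matching substrings come from the messages quoted above.

```python
"""Tally dbmcli/ODBC error messages by elapsed time (hypothetical sketch)."""
from collections import Counter, defaultdict

# Hypothetical labels; substrings taken from the errors quoted in this post.
# Checked in order, so the first matching pattern wins.
PATTERNS = {
    "auth": "ERR_USRFAIL",
    "sql_-4008": "-4008",
    "socket_10048": "[10048]",
    "syntax": "The syntax of the command is incorrect",
}


def classify(message: str) -> str:
    """Return the label of the first pattern found in message, else 'other'."""
    for label, needle in PATTERNS.items():
        if needle in message:
            return label
    return "other"


def tally(events, bucket_seconds: float = 600):
    """events: iterable of (seconds_since_start, message) pairs.
    Returns {bucket_index: Counter(label -> count)}."""
    buckets = defaultdict(Counter)
    for elapsed, message in events:
        buckets[int(elapsed // bucket_seconds)][classify(message)] += 1
    return dict(buckets)
```

Printing `tally(events)` after a fresh reboot and again after several hours would make any change in the pattern of errors concrete enough to post.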
Please, if you have time, try to reproduce and track down these problems.
Does anyone have time to test Linux for the same problem?
Thank you.
Stephen Gutknecht
Renton, Washington USA