I have a client who recently move a database server from on-premise to a Cloud provider. A database on this database server had a database link to their E-business database in the Oracle Cloud. Since the move, any sessions in the database that use the database link to the E-business database would hang if the query was to return large dataset.
Below is selecting from dual over the database link that worked:
SQL> set timing on
SQL> set autotrace on
SQL> select * from dual@ebs;
D
-
X
Elapsed: 00:00:00.13
Execution Plan
----------------------------------------------------------
Plan hash value: 272002086
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Inst |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT REMOTE| | 1 | 2 | 2 (0)| 00:00:01 | |
| 1 | TABLE ACCESS FULL | DUAL | 1 | 2 | 2 (0)| 00:00:01 | PWGPSI |
----------------------------------------------------------------------------------------
Note
-----
- fully remote statement
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
0 consistent gets
0 physical reads
0 redo size
511 bytes sent via SQL*Net to client
492 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
SQL>
But selecting from a table hangs:
SQL> select * from XXX_PER_PEOPLE_F@ebs;
Hangs!......
The session in the source database shows the wait “SQL*Net more data from dblink”:
SID USERNAME STATUS MACHINE TERMINAL PROGRAM EVENT WAIT_CLASS WAIT_TIME SECONDS_IN_WAIT STATE
---------- -------------- -------- ------------------------------- ------------------------------ ------------------------------------------------ -------------------------------------------- ---------------------------- --------- --------------- -------------------
144 XXXXXXXXX ACTIVE XXX unknown sqlplus@xxxxx.xxxxx.xxx (TNS V1-V3) SQL*Net more data from dblink Network 0 341 WAITING
The session in the target database shows the wait “SQL*Net message from client”:
SID USERNAME EVENT SQL_ID MACHINE PROGRAM PROCESS
--- -------- --------------------------- -------- ------------- ------------------------------ ----------
35 XXX SQL*Net message from client XXX XXX.XXX.XXX oracle@XXX.XXX.XXX (TNS V1-V3) 15340
I traced the source session and could see the session hangs waiting for more data from the target (EBS) database:
WAIT #2: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1495920328443464
*** 2018-07-17 11:13:47.934
WAIT #2: nam='SQL*Net message from client' ela= 11336000 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1495920339779586
WAIT #0: nam='single-task message' ela= 179653 p1=0 p2=0 p3=0 obj#=-1 tim=1495920339960679
WAIT #0: nam='SQL*Net message to dblink' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920339989423
WAIT #0: nam='SQL*Net message from dblink' ela= 29064 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340018526
WAIT #0: nam='SQL*Net message to dblink' ela= 0 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340018982
WAIT #0: nam='SQL*Net message from dblink' ela= 30534 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340049542
WAIT #0: nam='SQL*Net message to dblink' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340049793
WAIT #0: nam='SQL*Net message from dblink' ela= 42171 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340091996
WAIT #0: nam='SQL*Net more data from dblink' ela= 4 driver id=1413697536 #bytes=17 p3=0 obj#=-1 tim=1495920340092053
WAIT #0: nam='SQL*Net message to dblink' ela= 0 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340092253
WAIT #0: nam='SQL*Net message from dblink' ela= 28448 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340120720
WAIT #0: nam='SQL*Net message to dblink' ela= 0 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340120854
WAIT #0: nam='SQL*Net message from dblink' ela= 30884 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340151761
WAIT #0: nam='SQL*Net message to dblink' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340151816
WAIT #0: nam='SQL*Net message from dblink' ela= 28590 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340180420
WAIT #0: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1495920340180491
WAIT #0: nam='SQL*Net message from client' ela= 207 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1495920340180713
WAIT #0: nam='SQL*Net message to dblink' ela= 0 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340180787
WAIT #0: nam='SQL*Net message from dblink' ela= 28635 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340209434
WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1495920340209472
WAIT #0: nam='SQL*Net message from client' ela= 70 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1495920340209554
WAIT #0: nam='SQL*Net message to dblink' ela= 0 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340209601
WAIT #0: nam='SQL*Net message from dblink' ela= 29357 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340238969
WAIT #0: nam='SQL*Net more data from dblink' ela= 6 driver id=1413697536 #bytes=4 p3=0 obj#=-1 tim=1495920340239133
WAIT #0: nam='SQL*Net more data from dblink' ela= 37 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340239196
WAIT #0: nam='SQL*Net more data from dblink' ela= 124 driver id=1413697536 #bytes=2 p3=0 obj#=-1 tim=1495920340239340
WAIT #0: nam='SQL*Net more data from dblink' ela= 37 driver id=1413697536 #bytes=4 p3=0 obj#=-1 tim=1495920340239409
WAIT #0: nam='SQL*Net more data from dblink' ela= 93 driver id=1413697536 #bytes=3 p3=0 obj#=-1 tim=1495920340239528
WAIT #0: nam='SQL*Net more data from dblink' ela= 79 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340239632
WAIT #0: nam='SQL*Net more data from dblink' ela= 28330 driver id=1413697536 #bytes=2 p3=0 obj#=-1 tim=1495920340268062
WAIT #0: nam='SQL*Net more data from dblink' ela= 4 driver id=1413697536 #bytes=2 p3=0 obj#=-1 tim=1495920340268111
WAIT #0: nam='SQL*Net more data from dblink' ela= 168 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340268297
WAIT #0: nam='SQL*Net more data from dblink' ela= 55 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340268378
WAIT #0: nam='SQL*Net more data from dblink' ela= 69 driver id=1413697536 #bytes=4 p3=0 obj#=-1 tim=1495920340268483
WAIT #0: nam='SQL*Net more data from dblink' ela= 47 driver id=1413697536 #bytes=4 p3=0 obj#=-1 tim=1495920340268553
Hangs!...
After raising a SR with My Oracle Support (MOS) for the database we manage and for the one Oracle manage in the Oracle Cloud, it was concluded the package size (Session Data Unit) was larger then the allowable for a network component between the two databases. They referred to MOS note:
Query on “bigtable” from remote Client hangs (certain queries or fields) (Doc ID 2104257.1)
“SYMPTOMS
Certain queries are hanging when run from some remote Clients. However, other (smaller) queries are successful.
This is especially evident on queries that require more than 2kb of data to be returned.
Some examples:
select * from v$database; –> hangs
select count(*) from v$database; –> works
DESCRIBE with large data results –> hangs
DESCRIBE with small data results –> works
This might also show up with Database Links (DBLINK) as well.
CAUSE
A Network “security” device or setting (possibly local especially on a Microsoft Windows machine) is preventing or “altering” larger TCP packets from being transported across the network.
This in turn is causing the Client to wait on the Server for the data from the query, and the Server to wait on the Client (which thinks part of the packet is still on the way).
1. Check for settings like the DF (“Don’t Fragment”) bit being set.
2. Check for ALG SQL settings being enabled.
*Note: these causes are all external to Oracle so provided only as potential causes.
SOLUTION
Workaround
~~~~~~~~~
As a workaround (or test to prove this is or is not the issue) lower the SQL*Net SDU from the default size of 8192 to 1400 (see reference below for more details on this setting):
1. Add the following single line to the sqlnet.ora file on BOTH ends of the communication:
DEFAULT_SDU_SIZE = 1400
2. Restart the Listener(s) servicing the Database in question, make a new connection from the Client, and test the query that was hanging.
3. If this corrects the issue and allows queries to complete, then there is a network / system device or setting causing fragmentation, detention, or alteration of SQL packets mid-stream.”
Oracle Support set the SDU in the sqlnet.ora file on the target database server:
DEFAULT_SDU_SIZE = 1400
More info on DEFAULT_SDU_SIZE can be found here:
Database Net Services Reference -> 5 Parameters for the sqlnet.ora File -> DEFAULT_SDU_SIZE
And then restarted the listener. I did the same for the source database server and then the session no longer hanged 🙂 :
SQL> select * from XXX_PER_PEOPLE_F@ebs;
...
1069 rows selected.
Elapsed: 00:00:02.57
Update
28/08/2018 The issue wasn’t resolved and the MTU of the network cards on both the source and target servers was also changed to 1400. It could possibly been resolved by reducing the SDU further but it was decided to change the MTU on network cards. In most cases the SDU change will fix the issue, but otherwise the MTU on network card can also resolve the issue as in this case.
If you found this blog post useful, please like as well as follow me through my various Social Media avenues available on the sidebar and/or subscribe to this oracle blog via WordPress/e-mail.
Thanks
Zed DBA (Zahid Anwar)