I have a client who recently move a database server from on-premise to a Cloud provider. A database on this database server had a database link to their E-business database in the Oracle Cloud. Since the move, any sessions in the database that use the database link to the E-business database would hang if the query was to return large dataset.
Below is selecting from dual over the database link that worked:
SQL> set timing on SQL> set autotrace on SQL> select * from dual@ebs; D - X Elapsed: 00:00:00.13 Execution Plan ---------------------------------------------------------- Plan hash value: 272002086 ---------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Inst | ---------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT REMOTE| | 1 | 2 | 2 (0)| 00:00:01 | | | 1 | TABLE ACCESS FULL | DUAL | 1 | 2 | 2 (0)| 00:00:01 | PWGPSI | ---------------------------------------------------------------------------------------- Note ----- - fully remote statement Statistics ---------------------------------------------------------- 0 recursive calls 0 db block gets 0 consistent gets 0 physical reads 0 redo size 511 bytes sent via SQL*Net to client 492 bytes received via SQL*Net from client 2 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 1 rows processed SQL>
But selecting from a table hangs:
SQL> select * from XXX_PER_PEOPLE_F@ebs; Hangs!......
The session in the source database shows the wait “SQL*Net more data from dblink”:
SID USERNAME STATUS MACHINE TERMINAL PROGRAM EVENT WAIT_CLASS WAIT_TIME SECONDS_IN_WAIT STATE ---------- -------------- -------- ------------------------------- ------------------------------ ------------------------------------------------ -------------------------------------------- ---------------------------- --------- --------------- ------------------- 144 XXXXXXXXX ACTIVE XXX unknown sqlplus@xxxxx.xxxxx.xxx (TNS V1-V3) SQL*Net more data from dblink Network 0 341 WAITING
The session in the target database shows the wait “SQL*Net message from client”:
SID USERNAME EVENT SQL_ID MACHINE PROGRAM PROCESS --- -------- --------------------------- -------- ------------- ------------------------------ ---------- 35 XXX SQL*Net message from client XXX XXX.XXX.XXX oracle@XXX.XXX.XXX (TNS V1-V3) 15340
I traced the source session and could see the session hangs waiting for more data from the target (EBS) database:
WAIT #2: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1495920328443464 *** 2018-07-17 11:13:47.934 WAIT #2: nam='SQL*Net message from client' ela= 11336000 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1495920339779586 WAIT #0: nam='single-task message' ela= 179653 p1=0 p2=0 p3=0 obj#=-1 tim=1495920339960679 WAIT #0: nam='SQL*Net message to dblink' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920339989423 WAIT #0: nam='SQL*Net message from dblink' ela= 29064 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340018526 WAIT #0: nam='SQL*Net message to dblink' ela= 0 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340018982 WAIT #0: nam='SQL*Net message from dblink' ela= 30534 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340049542 WAIT #0: nam='SQL*Net message to dblink' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340049793 WAIT #0: nam='SQL*Net message from dblink' ela= 42171 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340091996 WAIT #0: nam='SQL*Net more data from dblink' ela= 4 driver id=1413697536 #bytes=17 p3=0 obj#=-1 tim=1495920340092053 WAIT #0: nam='SQL*Net message to dblink' ela= 0 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340092253 WAIT #0: nam='SQL*Net message from dblink' ela= 28448 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340120720 WAIT #0: nam='SQL*Net message to dblink' ela= 0 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340120854 WAIT #0: nam='SQL*Net message from dblink' ela= 30884 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340151761 WAIT #0: nam='SQL*Net message to dblink' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340151816 WAIT #0: nam='SQL*Net message from dblink' ela= 28590 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340180420 WAIT #0: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1495920340180491 WAIT #0: nam='SQL*Net message from client' ela= 207 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1495920340180713 WAIT #0: nam='SQL*Net message to dblink' ela= 0 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340180787 WAIT #0: nam='SQL*Net message from dblink' ela= 28635 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340209434 WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1495920340209472 WAIT #0: nam='SQL*Net message from client' ela= 70 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=1495920340209554 WAIT #0: nam='SQL*Net message to dblink' ela= 0 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340209601 WAIT #0: nam='SQL*Net message from dblink' ela= 29357 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340238969 WAIT #0: nam='SQL*Net more data from dblink' ela= 6 driver id=1413697536 #bytes=4 p3=0 obj#=-1 tim=1495920340239133 WAIT #0: nam='SQL*Net more data from dblink' ela= 37 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340239196 WAIT #0: nam='SQL*Net more data from dblink' ela= 124 driver id=1413697536 #bytes=2 p3=0 obj#=-1 tim=1495920340239340 WAIT #0: nam='SQL*Net more data from dblink' ela= 37 driver id=1413697536 #bytes=4 p3=0 obj#=-1 tim=1495920340239409 WAIT #0: nam='SQL*Net more data from dblink' ela= 93 driver id=1413697536 #bytes=3 p3=0 obj#=-1 tim=1495920340239528 WAIT #0: nam='SQL*Net more data from dblink' ela= 79 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340239632 WAIT #0: nam='SQL*Net more data from dblink' ela= 28330 driver id=1413697536 #bytes=2 p3=0 obj#=-1 tim=1495920340268062 WAIT #0: nam='SQL*Net more data from dblink' ela= 4 driver id=1413697536 #bytes=2 p3=0 obj#=-1 tim=1495920340268111 WAIT #0: nam='SQL*Net more data from dblink' ela= 168 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340268297 WAIT #0: nam='SQL*Net more data from dblink' ela= 55 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1495920340268378 WAIT #0: nam='SQL*Net more data from dblink' ela= 69 driver id=1413697536 #bytes=4 p3=0 obj#=-1 tim=1495920340268483 WAIT #0: nam='SQL*Net more data from dblink' ela= 47 driver id=1413697536 #bytes=4 p3=0 obj#=-1 tim=1495920340268553 Hangs!...
After raising a SR with My Oracle Support (MOS) for the database we manage and for the one Oracle manage in the Oracle Cloud, it was concluded the package size (Session Data Unit) was larger then the allowable for a network component between the two databases. They referred to MOS note:
Query on “bigtable” from remote Client hangs (certain queries or fields) (Doc ID 2104257.1)
“SYMPTOMS
Certain queries are hanging when run from some remote Clients. However, other (smaller) queries are successful.
This is especially evident on queries that require more than 2kb of data to be returned.
Some examples:
select * from v$database; –> hangs
select count(*) from v$database; –> works
DESCRIBE with large data results –> hangs
DESCRIBE with small data results –> works
This might also show up with Database Links (DBLINK) as well.
CAUSE
A Network “security” device or setting (possibly local especially on a Microsoft Windows machine) is preventing or “altering” larger TCP packets from being transported across the network.
This in turn is causing the Client to wait on the Server for the data from the query, and the Server to wait on the Client (which thinks part of the packet is still on the way).
1. Check for settings like the DF (“Don’t Fragment”) bit being set.
2. Check for ALG SQL settings being enabled.
*Note: these causes are all external to Oracle so provided only as potential causes.
SOLUTION
Workaround
~~~~~~~~~
As a workaround (or test to prove this is or is not the issue) lower the SQL*Net SDU from the default size of 8192 to 1400 (see reference below for more details on this setting):
1. Add the following single line to the sqlnet.ora file on BOTH ends of the communication:
DEFAULT_SDU_SIZE = 1400
2. Restart the Listener(s) servicing the Database in question, make a new connection from the Client, and test the query that was hanging.
3. If this corrects the issue and allows queries to complete, then there is a network / system device or setting causing fragmentation, detention, or alteration of SQL packets mid-stream.”
Oracle Support set the SDU in the sqlnet.ora file on the target database server:
DEFAULT_SDU_SIZE = 1400
More info on DEFAULT_SDU_SIZE can be found here:
Database Net Services Reference -> 5 Parameters for the sqlnet.ora File -> DEFAULT_SDU_SIZE
And then restarted the listener. I did the same for the source database server and then the session no longer hanged 🙂 :
SQL> select * from XXX_PER_PEOPLE_F@ebs; ... 1069 rows selected. Elapsed: 00:00:02.57
Update
28/08/2018 The issue wasn’t resolved and the MTU of the network cards on both the source and target servers was also changed to 1400. It could possibly been resolved by reducing the SDU further but it was decided to change the MTU on network cards. In most cases the SDU change will fix the issue, but otherwise the MTU on network card can also resolve the issue as in this case.
If you found this blog post useful, please like as well as follow me through my various Social Media avenues available on the sidebar and/or subscribe to this oracle blog via WordPress/e-mail.
Thanks
Zed DBA (Zahid Anwar)
Hello Zed
The post is good. Keep it up
LikeLike
Thanks 🙂
LikeLike
i have similar issue we are trying to migrate the data from one db to another via db link. when the data set is large like 30M or so it will complete fetching data over the db link and hten insert in the calling db. Now it again does a query over the same dbl ink to get next set of 30 M and it will hang. Session needs to be killed. But if i end the session and call the next 30M in different script it will work. What is reason for that ? any workarounds ?
LikeLike
Hi Aarti,
It’s hard to say without knowing the exact details. I’d say check the wait events on both source and target. Use tracing if necessary like I did. Failing this, raise an SR with Oracle Support to assist you.
Hope that helps
Kind Regards
ZedDBA
LikeLike