A. Solaris mpathadm and Auto-probing
mpathadm has an Auto Probing feature, but its value is always NA (see Table-1). Auto-probing would be useful for path restore (which is different from path failover/failback) after a logical storage problem, as opposed to a hardware fault, has caused some paths to fail (see section B).
If you ask Oracle through a Service Request, Oracle says that this is a closed feature and will never be reopened; it is an Oracle engineering decision. They add that if you want the same behavior, you can create a crontab job that runs the command "luxadm probe" every minute, which has the same effect. Oracle's explanation is simply that, instead of enabling the auto-probing feature in mpathadm, they ask storage vendors to send notifications (such as RSCN) to Solaris.
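
Oracle's suggested workaround could be set up roughly as follows. This is a minimal sketch, not Oracle's own script: it assumes luxadm is installed as /usr/sbin/luxadm and that the job runs from root's crontab. The wrapper name, its install path, the log file, and the PROBE_CMD and PROBE_LOG variables are all hypothetical.

```shell
#!/bin/sh
# probe_paths.sh (hypothetical name): wrapper for the "luxadm probe" workaround.
# Assumed root crontab entry that would run it every minute:
#   * * * * * /usr/local/bin/probe_paths.sh

# Both defaults are assumptions; override them to match your system.
PROBE_CMD="${PROBE_CMD:-/usr/sbin/luxadm probe}"
PROBE_LOG="${PROBE_LOG:-/var/tmp/probe_paths.log}"

# Run the probe once, discard its output, and log only failures
# so that a job firing every minute does not fill the log.
run_probe() {
    if $PROBE_CMD > /dev/null 2>&1; then
        return 0
    fi
    echo "probe command failed: $PROBE_CMD" >> "$PROBE_LOG"
    return 1
}

# When installed as the cron wrapper, the script would end with:
#   run_probe
```

Keeping the command in a variable makes it possible to try the wrapper with a harmless command before pointing it at luxadm.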
If you ask storage vendors through a Service Request, they say that, according to the SCSI RFC, RSCN notifications are generated for hardware issues. For logical problems, multipathing software generally checks paths periodically and restores failed paths once the issue is resolved. Many multipathing products have a similar path-checking feature (see Table-2), so sending notifications such as RSCN to the OS is not needed.
Table-1. mpathadm output showing the Auto Probing feature
[solaris-1]/# mpathadm show lu /dev/rdsk/c0t60060E801330AB00502030AB0000164Dd0s2
Table-2. Some multipath software with a similar feature
B. Auto-probing: why is it so important?
The SCSI RFC covers hardware issues (port failure, cable failure, etc.): RSCN notifications are sent to Solaris and mpathadm handles them perfectly, so there is no problem there. But there may also be logical issues on the storage side. For example, suppose your LUN comes from a Hitachi GAD pair. A GAD pair is formed from two Hitachi storage boxes of exactly the same model with exactly the same configuration. Hitachi GAD is an Active/Active configuration; it is neither Hitachi HUR nor ALUA.
Suppose GAD synchronization between the two storage boxes fails for some reason. When this happens, half of your paths fail. mpathadm notices, because it issues I/O in round-robin (RR) fashion and detects the failure when I/O on some paths does not succeed. Now suppose the logical problem on the SAN is resolved and GAD synchronization is re-established, so the failed paths should become active again. Without a path-checking method such as auto-probing, mpathadm never rechecks the failed paths to see whether they have become active again, so they remain failed forever. mpathadm simply keeps issuing I/O in RR fashion among its remaining active paths.
So auto-probing is important for path restore when logical storage errors have caused path failures. It is not necessary when the problem is purely a hardware failure, because the RSCN notification mechanism restores the paths automatically once the hardware problem is fixed.
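
The missing path check described above can be approximated from outside mpathadm. The sketch below assumes a POSIX-compatible shell and assumes that "mpathadm show lu" reports each path on a line of the form "Path State:  <state>", with OK meaning healthy. Treating every other state as a reason to probe is also an assumption, and the function names and the PROBE_CMD override are hypothetical.

```shell
#!/bin/sh
# Count paths whose "Path State:" line reports anything other than OK.
# Reads the output of "mpathadm show lu <device>" on stdin.
# Assumption: the state is the last whitespace-separated field on that line.
count_failed_paths() {
    awk '/Path State:/ { if ($NF != "OK") n++ } END { print n + 0 }'
}

# Run a probe only when at least one path of the given LU is not OK.
# $1 is the raw device of the logical unit.
# PROBE_CMD is a hypothetical override; the default follows the
# "luxadm probe" advice reported in section A.
probe_if_needed() {
    failed=$(mpathadm show lu "$1" | count_failed_paths)
    if [ "$failed" -gt 0 ]; then
        ${PROBE_CMD:-/usr/sbin/luxadm probe} > /dev/null 2>&1
    fi
}
```

Calling probe_if_needed with the LU device from a periodic job would probe only while some path is reported as not OK, instead of probing unconditionally every minute.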
Please feel free to contact me at bulent.yucesoy@gmail.com