A.    Solaris mpathadm and Auto-probing

mpathadm has an Auto Probing feature, but its value is always NA (see Table-1).

Auto-probing may be useful for path restore (which is different from path failover/failback) after a logical storage problem (not a hardware issue) has occurred and some paths have failed (see section B).

If you ask Oracle through a Service Request, Oracle says that this is a closed feature that will never be opened again, and that this is an Oracle engineering decision. But they also say that if you want this behavior, you can create a crontab job that runs the command "luxadm probe" every minute, which will have the same effect. Oracle simply explains that, instead of enabling the auto-probing feature in mpathadm, they ask storage vendors to send notifications (such as RSCN) to Solaris.
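The cron workaround Oracle describes could look like the entry below. This is only a sketch: it assumes luxadm is installed at /usr/sbin/luxadm and that the entry is added to root's crontab (for example with "crontab -e"); discarding the output is my own choice to avoid a cron mail every minute.

```shell
# root crontab entry: run "luxadm probe" every minute
# (assumed path /usr/sbin/luxadm; output discarded so cron does not send mail)
* * * * * /usr/sbin/luxadm probe >/dev/null 2>&1
```

Whether a one-minute interval is acceptable depends on your environment; the five fields can be adjusted like any other crontab schedule.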

If you ask storage vendors through a Service Request, they say that, per the SCSI RFC, RSCN notifications are generated for hardware issues. For logical problems, multipathing software generally checks paths periodically and restores failed paths once the issue is solved. Many multipathing products have a similar path-checking feature (see Table-2), so sending notifications such as RSCN to the OS is not needed.

Table-1. mpathadm output showing the Auto Probing feature

[solaris-1]/#mpathadm show lu /dev/rdsk/c0t60060E801330AB00502030AB0000164Dd0s2
Logical Unit:  /dev/rdsk/c0t60060E801330AB00502030AB0000164Dd0s2
        mpath-support:  libmpscsi_vhci.so
        Vendor:  HITACHI
        Product:  OPEN-V      -SUN
        Revision:  7303
        Name Type:  unknown type
        Name:  60060e801330ab00502030ab0000164d
        Asymmetric:  no
        Current Load Balance:  round-robin
        Logical Unit Group ID:  NA
        Auto Failback:  on
        Auto Probing:  NA
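
To check a single field such as "Auto Probing" without reading the whole output, the listing above can be filtered. The helper below is a minimal sketch; the function name mpath_field is hypothetical, and it assumes a sed that understands POSIX character classes and the "Field:  value" layout shown in Table-1.

```shell
# Hypothetical helper: print the value of one field from
# "mpathadm show lu" output supplied on standard input.
# $1 is the field name exactly as printed, e.g. "Auto Probing".
mpath_field() {
    # Strip leading whitespace, the field name, the colon and the
    # following whitespace; print only lines where that matched.
    sed -n "s/^[[:space:]]*$1:[[:space:]]*//p"
}
```

For example, "mpathadm show lu /dev/rdsk/c0t60060E801330AB00502030AB0000164Dd0s2 | mpath_field 'Auto Probing'" would print just the value of that field.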

 

Table-2. Some multipathing software with a similar feature

Multipathing software     Path-checking setting
Linux DM-Multipath        polling_interval
AIX MPIO                  hcheck_interval
Veritas DMP               dmp_restore_interval
Hitachi HDLM              Path Health Checking

B.    Auto-probing: why is it so important?

Hardware issues (port failure, cable failure, etc.) are covered by the SCSI RFC: RSCN notifications are sent to Solaris, and mpathadm handles them perfectly, so there is no problem there. But there may also be logical issues on the storage side.

For example, suppose your LUN comes from a Hitachi GAD pair. A GAD pair is formed from two Hitachi storage boxes of exactly the same model with exactly the same configuration. Hitachi GAD is an Active/Active configuration; it is not Hitachi HUR, and it is not ALUA. Now suppose GAD synchronization between the two boxes fails for some reason. When this happens, half of your paths fail. mpathadm notices, because it issues I/O in round-robin (RR) fashion and detects the failures when I/O on some paths cannot complete.

Now suppose the logical problem on the SAN is solved and GAD synchronization is re-established, so the failed paths should become active again. Without a path-checking method such as auto-probing, mpathadm never re-checks the failed paths to see whether they have become active again, so they remain failed forever. mpathadm keeps issuing I/O round-robin among its remaining active paths only and happily carries on with them.

So auto-probing is important for path restore when logical storage errors have caused path failures. It is not necessary when the problem is purely a hardware failure, because the RSCN notification mechanism restores the paths automatically once the hardware problem is solved.
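
A rough way to spot paths that stayed failed after such a logical problem is to compare the total and operational path counts that "mpathadm list lu" reports per LUN. The sketch below is not an Oracle-provided method. The function names degraded_lus and probe_if_degraded are hypothetical, and it assumes the usual "Total Path Count:" / "Operational Path Count:" lines under each /dev/rdsk/... entry. On Solaris you may need nawk instead of awk.

```shell
# Hypothetical helper: read "mpathadm list lu" output on stdin and
# print every LUN whose operational path count is below its total.
degraded_lus() {
    awk '
        /\/dev\/rdsk\//            { lu = $1 }
        /Total Path Count:/        { total = $NF }
        /Operational Path Count:/  { if ($NF + 0 < total + 0) print lu }
    '
}

# Hypothetical wrapper: run "luxadm probe" only when some LUN has
# fewer operational paths than total paths.
probe_if_degraded() {
    bad=$(mpathadm list lu | degraded_lus)
    if [ -n "$bad" ]; then
        echo "LUNs with failed paths, running luxadm probe:"
        echo "$bad"
        luxadm probe
    fi
}
```

Calling probe_if_degraded from a periodic job would probe only when something actually looks wrong, instead of unconditionally every minute. Whether the probe restores the paths depends on the storage-side problem really being solved.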

* Please feel free to contact me at bulent.yucesoy@gmail.com