Hi,
Our datacenter environment consists of 2 ESXi servers attached to a SAN, and we back up VMs with IBM TSM for VE through the API.
We have had scheduled TSM for VE backups of specific VMs running for several months with no problems.
However, I have just hit a problem with an Oracle DB VM on Linux while the backup process was creating, transferring, and removing the snapshot.
The VM froze and reported the error: "The redo log of EBS_DB_PRD-000001.vmdk is corrupted. If the problem persists, discard the redo log."
We tried to find the root cause in the ESXi logs.
hostd.log
......
2014-10-03T20:14:38.720Z [6DAA1B70 info 'Vmsvc.vm:/vmfs/volumes/5379e76e-ad7aea00-b1cd-90e2ba62bb14/EBS_DB_PRD/EBS_DB_PRD.vmx'] State Transition (VM_STATE_CREATE_SNAPSHOT -> VM_STATE_ON)
2014-10-03T20:14:38.720Z [6DAE2B70 verbose 'Hostsvc'] Received state change for VM '158'
2014-10-03T20:14:38.720Z [6DAE2B70 info 'Guestsvc.GuestFileTransferImpl'] Entered VmPowerStateListener
2014-10-03T20:14:38.720Z [6DAE2B70 info 'Guestsvc.GuestFileTransferImpl'] VmPowerStateListener succeeded
2014-10-03T20:14:38.720Z [6DAA1B70 info 'Vimsvc.TaskManager'] Task Completed : haTask-158-vim.VirtualMachine.createSnapshot-162084035 Status success
2014-10-03T20:14:38.721Z [6F701B70 verbose 'Hbrsvc'] Replicator: VmReconfig ignoring VM 158 not configured for replication
2014-10-03T20:14:38.721Z [6F701B70 info 'Snmpsvc'] VmConfigListener: vm state change received, queueing reload request
2014-10-03T20:14:38.721Z [6F701B70 info 'Snmpsvc'] QueueReloadRequest: reload already scheduled, discarding event
2014-10-03T20:14:38.721Z [6DAE2B70 info 'Hbrsvc'] Replicator: powerstate change VM: 158 Old: 1 New: 1
2014-10-03T20:14:38.722Z [6DAE2B70 verbose 'Hbrsvc'] Replicator: Remove group no matching entry for VM (id=158)
2014-10-03T20:14:38.722Z [6DAE2B70 info 'Snmpsvc'] WriteV1Trap: generic 6 specific 1
2014-10-03T20:14:38.723Z [6DAE2B70 info 'Snmpsvc'] WriteV1Trap: serialized 3 varbinds
2014-10-03T20:14:38.723Z [6DAE2B70 info 'Snmpsvc'] WriteV1Trap: wrote 229 bytes
2014-10-03T20:14:38.723Z [6DAE2B70 info 'Snmpsvc'] NotifyAgent: write(76, /var/run/snmp.ctl, N) 1 bytes to snmpd
2014-10-03T20:14:38.723Z [6DAE2B70 info 'Snmpsvc'] WriteV1Trap: agent was notified
2014-10-03T20:14:38.747Z [FFB49B70 verbose 'Default' opID=14fdbdcc user=vpxuser] AdapterServer: target='vim.PerformanceManager:ha-perfmgr', method='GetPerfCounter'
2014-10-03T20:14:38.752Z [FFB49B70 verbose 'Locale' opID=14fdbdcc user=vpxuser] Default resource used for 'counter.vsanDomObj.writeThroughput.summary' expected in module 'perf'.
2014-10-03T20:14:38.765Z [FFB49B70 verbose 'Default' opID=14fdbdcc user=vpxuser] AdapterServer: target='vim.HostSystem:ha-host', method='retrieveInternalCapability'
2014-10-03T20:14:38.766Z [FFB49B70 verbose 'Default' opID=14fdbdcc user=vpxuser] AdapterServer: target='vim.PerformanceManager:ha-perfmgr', method='queryPerfCounterInt'
2014-10-03T20:14:38.774Z [FFB49B70 verbose 'Default' opID=14fdbdcc user=vpxuser] AdapterServer: target='vim.LicenseManager:ha-license-manager', method='GetLicenses'
2014-10-03T20:14:38.774Z [FFB49B70 verbose 'Vimsvc.ha-license-manager' opID=14fdbdcc user=vpxuser] Load: Loading existing file: /etc/vmware/license.cfg
2014-10-03T20:14:38.784Z [FFB49B70 verbose 'Default' opID=14fdbdcc user=vpxuser] ha-license-manager:Validate -> Valid license detected for "VMware ESX Server 5.0" (lastError=0, desc.IsValid:Yes)
2014-10-03T20:14:40.022Z [6F701B70 warning 'Statssvc.vim.PerformanceManager'] Calculating read OIO for scsi0:0 - delta is negative, prevTime = 1412367260 curTime = 1412367280 previIOTime = 310943703130 curIOTime = 1116324
2014-10-03T20:14:40.022Z [6F701B70 warning 'Statssvc.vim.PerformanceManager'] Calculating read I/O size for scsi0:0 -- commands delta is negative,prevBytes = 2623783380992 curBytes = 7589888 prevCommands = 217367621curCommands = 724
2014-10-03T20:14:40.022Z [6F701B70 warning 'Statssvc.vim.PerformanceManager'] Calculating write OIO for scsi0:0 - delta is negative, prevTime = 1412367260 curTime = 1412367280 previIOTime = 126822959837 curIOTime = 428952
2014-10-03T20:14:40.022Z [6F701B70 warning 'Statssvc.vim.PerformanceManager'] Calculating write I/O size for scsi0:0 -- commands delta is negative,prevBytes = 411122904064 curBytes = 102400 prevCommands = 47757172curCommands = 18
2014-10-03T20:14:40.064Z [6F701B70 verbose 'Statssvc.vim.PerformanceManager'] HostCtl Exception in stats collection: Sysinfo error on operation returned status : Not found. Please see the VMkernel log for detailed error information
2014-10-03T20:14:40.064Z [6F701B70 verbose 'Statssvc.vim.PerformanceManager'] HostCtl Exception in stats collection: Sysinfo error on operation returned status : Not initialized. Please see the VMkernel log for detailed error information
2014-10-03T20:14:40.064Z [6F701B70 verbose 'Statssvc.vim.PerformanceManager'] HostCtl Exception in stats collection. Turn on 'trivia' log for details
2014-10-03T20:14:43.566Z [6F701B70 info 'Snmpsvc'] DoReport: VM Poll State cache - report starting
2014-10-03T20:14:43.567Z [6F701B70 info 'Snmpsvc'] ReportVMs: processing vm 148
2014-10-03T20:14:43.568Z [6F701B70 info 'Snmpsvc'] ReportVMs: processing vm 149
2014-10-03T20:14:43.569Z [6F701B70 info 'Snmpsvc'] ReportVMs: processing vm 150
....
2014-10-03T20:14:59.903Z [FFB49B70 verbose 'SoapAdapter'] Responded to service state request
2014-10-03T20:15:00.065Z [6EC60B70 verbose 'Statssvc.vim.PerformanceManager'] HostCtl Exception in stats collection: Sysinfo error on operation returned status : Not found. Please see the VMkernel log for detailed error information
2014-10-03T20:15:00.065Z [6EC60B70 verbose 'Statssvc.vim.PerformanceManager'] HostCtl Exception in stats collection: Sysinfo error on operation returned status : Not initialized. Please see the VMkernel log for detailed error information
2014-10-03T20:15:00.065Z [6EC60B70 verbose 'Statssvc.vim.PerformanceManager'] HostCtl Exception in stats collection. Turn on 'trivia' log for details
2014-10-03T20:15:01.547Z [6EE40B70 verbose 'Vmsvc.vm:/vmfs/volumes/5379e76e-ad7aea00-b1cd-90e2ba62bb14/ EBS_DB_PRD/ EBS_DB_PRD.vmx'] Handling message _vmx2: The redo log of EBS_DB_PRD-000001.vmdk is corrupted. If the problem persists, discard the redo log.
-->
2014-10-03T20:15:01.547Z [6EE40B70 warning 'Vmsvc.vm:/vmfs/volumes/5379e76e-ad7aea00-b1cd-90e2ba62bb14/ EBS_DB_PRD/ EBS_DB_PRD.vmx'] Failed to find activation record, event user unknown.
2014-10-03T20:15:01.549Z [6EE40B70 info 'Vimsvc.ha-eventmgr'] Event 39448 : Message on EBS_DB_PROD on cloud01.a-host.co.th in ha-datacenter: The redo log of EBS_DB_PRD-000001.vmdk is corrupted. If the problem persists, discard the redo log.
-->
2014-10-03T20:15:01.549Z [6EE40B70 verbose 'Vmsvc.vm:/vmfs/volumes/5379e76e-ad7aea00-b1cd-90e2ba62bb14/ EBS_DB_PRD/ EBS_DB_PRD.vmx'] Setting current question to '_vmx2'
2014-10-03T20:15:01.773Z [FFB49B70 verbose 'SoapAdapter'] Responded to service state request
pam_per_user: create_subrequest_handle(): doing map lookup for user "root"
pam_per_user: create_subrequest_handle(): creating new subrequest (user="root", service="system-auth-generic")
Accepted password for user root from 172.25.2.2
....
We continued investigating in vmkernel.log
...
2014-10-03T20:14:36.420Z cpu11:39112)VSCSI: 3792: handle 8218(vscsi0:0):Creating Virtual Device for world 39099 (FSS handle 16330839) numBlocks=1363148800 (bs=512)
2014-10-03T20:14:36.420Z cpu11:39112)VSCSI: 271: handle 8218(vscsi0:0):Input values: res=0 limit=-2 bw=-1 Shares=1000
2014-10-03T20:14:47.471Z cpu21:33648)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 0:(0):3271: FCP cmd x4d failed <1/0> sid x010400, did x010200, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
2014-10-03T20:14:47.472Z cpu4:39102)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 1:(0):3271: FCP cmd x4d failed <0/4> sid x010500, did x010100, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
2014-10-03T20:14:47.472Z cpu4:39102)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 1:(0):3271: FCP cmd x4d failed <1/6> sid x010500, did x010300, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
2014-10-03T20:14:47.472Z cpu4:39102)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 1:(0):3271: FCP cmd x4d failed <0/1> sid x010500, did x010100, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
2014-10-03T20:14:47.472Z cpu21:32834)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 0:(0):3271: FCP cmd x4d failed <1/2> sid x010400, did x010200, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
2014-10-03T20:14:47.472Z cpu4:39102)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 1:(0):3271: FCP cmd x4d failed <0/3> sid x010500, did x010100, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
2014-10-03T20:14:47.472Z cpu4:39102)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 1:(0):3271: FCP cmd x4d failed <0/5> sid x010500, did x010100, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
2014-10-03T20:14:47.472Z cpu4:39102)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 1:(0):3271: FCP cmd x4d failed <0/7> sid x010500, did x010100, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
2014-10-03T20:15:01.345Z cpu0:39106)WARNING: VmfsSparse: 3784: Real sector 1375731720 exceeds free sector 43712
2014-10-03T20:15:12.274Z cpu0:33649)NMP: nmp_ThrottleLogForDevice:2321: Cmd 0x85 (0x412e8b062480, 34572) to dev "naa.6005076040a1bfc01b03912c71ca3000" on path "vmhba0:C2:T0:L0" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0. Act:NONE
2014-10-03T20:15:12.274Z cpu0:33649)ScsiDeviceIO: 2337: Cmd(0x412e8b062480) 0x85, CmdSN 0x336 from world 34572 to dev "naa.6005076040a1bfc01b03912c71ca3000" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
2014-10-03T20:15:12.274Z cpu0:33649)ScsiDeviceIO: 2337: Cmd(0x412e8b062480) 0x4d, CmdSN 0x337 from world 34572 to dev "naa.6005076040a1bfc01b03912c71ca3000" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x20 0x0.
2014-10-03T20:15:12.274Z cpu0:33649)ScsiDeviceIO: 2337: Cmd(0x412e8b062480) 0x1a, CmdSN 0x338 from world 34572 to dev "naa.6005076040a1bfc01b03912c71ca3000" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0.
2014-10-03T20:15:47.381Z cpu0:41269)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 0:(0):3271: FCP cmd x4d failed <1/0> sid x010400, did x010200, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
2014-10-03T20:15:47.382Z cpu0:41269)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 1:(0):3271: FCP cmd x4d failed <0/4> sid x010500, did x010100, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
2014-10-03T20:15:47.382Z cpu0:39605)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 1:(0):3271: FCP cmd x4d failed <1/6> sid x010500, did x010300, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
2014-10-03T20:15:47.382Z cpu0:32813)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 1:(0):3271: FCP cmd x4d failed <0/1> sid x010500, did x010100, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
2014-10-03T20:15:47.382Z cpu0:32813)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 1:(0):3271: FCP cmd x4d failed <1/2> sid x010500, did x010300, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
2014-10-03T20:15:47.382Z cpu0:32813)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 0:(0):3271: FCP cmd x4d failed <0/3> sid x010400, did x010000, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
2014-10-03T20:15:47.382Z cpu0:32813)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 1:(0):3271: FCP cmd x4d failed <0/5> sid x010500, did x010100, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
2014-10-03T20:15:47.382Z cpu0:32813)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 0:(0):3271: FCP cmd x4d failed <0/7> sid x010400, did x010000, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
2014-10-03T20:16:17.027Z cpu29:34893)CBT: 2214: Created device 142e0b9-cbt for cbt driver with filehandle 21160121
......
2014-10-03T20:16:46.876Z cpu28:33097)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 0:(0):3271: FCP cmd x4d failed <1/0> sid x010400, did x010200, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
2014-10-03T20:16:46.876Z cpu28:33097)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 0:(0):3271: FCP cmd x4d failed <0/4> sid x010400, did x010000, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
2014-10-03T20:16:46.876Z cpu28:33097)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 0:(0):3271: FCP cmd x4d failed <1/6> sid x010400, did x010200, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
2014-10-03T20:16:46.876Z cpu28:33097)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 0:(0):3271: FCP cmd x4d failed <0/1> sid x010400, did x010000, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
2014-10-03T20:16:46.876Z cpu28:33097)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 0:(0):3271: FCP cmd x4d failed <1/2> sid x010400, did x010200, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
2014-10-03T20:16:46.876Z cpu28:33097)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 0:(0):3271: FCP cmd x4d failed <0/3> sid x010400, did x010000, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
2014-10-03T20:16:46.876Z cpu28:33097)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 0:(0):3271: FCP cmd x4d failed <0/5> sid x010400, did x010000, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
2014-10-03T20:16:46.876Z cpu28:33097)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 0:(0):3271: FCP cmd x4d failed <0/7> sid x010400, did x010000, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
2014-10-03T20:17:48.472Z cpu30:33131)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 1:(0):3271: FCP cmd x4d failed <1/0> sid x010500, did x010300, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
2014-10-03T20:17:48.472Z cpu30:33131)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 1:(0):3271: FCP cmd x4d failed <0/4> sid x010500, did x010100, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
2014-10-03T20:17:48.472Z cpu2:39099)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 0:(0):3271: FCP cmd x4d failed <1/6> sid x010400, did x010200, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
2014-10-03T20:17:48.472Z cpu30:33131)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 1:(0):3271: FCP cmd x4d failed <0/1> sid x010500, did x010100, oxid xffff SCSI Chk Cond - Illegal Req: Data(x2:x5:x20:x0)
....
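In case it helps anyone reading the vmkernel lines above, here is a minimal sketch of how the ScsiDeviceIO entries could be decoded into readable names. The function name decode_line and the lookup tables are my own and only cover the values that appear in this excerpt (names taken from the standard SCSI command set); it only translates the codes and says nothing about whether they are related to the redo log error.

```python
import re

# Lookup tables limited to the values seen in the excerpt above.
# Anything not listed is reported as "unknown".
OPCODES = {0x1A: "MODE SENSE(6)", 0x4D: "LOG SENSE", 0x85: "ATA PASS-THROUGH(16)"}
DEVICE_STATUS = {0x0: "GOOD", 0x2: "CHECK CONDITION"}
SENSE_KEYS = {0x5: "ILLEGAL REQUEST"}
ASC_ASCQ = {
    (0x20, 0x00): "INVALID COMMAND OPERATION CODE",
    (0x24, 0x00): "INVALID FIELD IN CDB",
}

# Matches the "ScsiDeviceIO: ... Cmd(<addr>) <opcode>, ... H:.. D:.. P:.. Valid sense data: k asc ascq" form.
LINE_RE = re.compile(
    r"Cmd\(0x[0-9a-fA-F]+\) (0x[0-9a-fA-F]+),.*?"
    r"H:(0x[0-9a-fA-F]+) D:(0x[0-9a-fA-F]+) P:(0x[0-9a-fA-F]+) "
    r"Valid sense data: (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+) (0x[0-9a-fA-F]+)"
)


def decode_line(line):
    """Return a dict of decoded fields, or None if the line has no SCSI status."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    opcode, _host, device, _plugin, key, asc, ascq = (int(v, 16) for v in match.groups())
    return {
        "opcode": OPCODES.get(opcode, "unknown"),
        "device_status": DEVICE_STATUS.get(device, "unknown"),
        "sense_key": SENSE_KEYS.get(key, "unknown"),
        "additional_sense": ASC_ASCQ.get((asc, ascq), "unknown"),
    }


if __name__ == "__main__":
    sample = (
        '2014-10-03T20:15:12.274Z cpu0:33649)ScsiDeviceIO: 2337: Cmd(0x412e8b062480) 0x1a, '
        'CmdSN 0x338 from world 34572 to dev "naa.6005076040a1bfc01b03912c71ca3000" '
        "failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x5 0x24 0x0."
    )
    print(decode_line(sample))
```

Feeding it each ScsiDeviceIO line from the excerpt shows which command the storage rejected and with which sense code; lines in other formats (for example the lpfc ones) simply return None.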
I am not sure where the root cause lies: the backup solution, storage, SAN switch, ESXi, or connectivity.
I am fairly sure this LUN has sufficient free space for creating the snapshot.
Could you please suggest what the root cause might be?
Thank you in advance.