Problem |
TSM Client cluster password deleted when generic resource brought online |
Solution |
Manually updating the cluster node password while the MSCS generic resource is offline causes the checkpoint file held on the quorum disk to become out of sync with the password entry for the cluster node in the registry. When the generic resource is brought online, the password entry for the cluster node is deleted and the ANS2050E error is observed in the error log.
Assumptions:
In a cluster environment, the generic service resource used by TSM controls the stopping and starting of the scheduler service. It is also used to start the TSM scheduler service on the failover machine when a failover occurs.

When the generic service resource is initialized, it compares the registry value of:

HKEY_LOCAL_MACHINE\SOFTWARE\IBM\ADSM\CurrentVersion\BackupClient\Nodes\NODENAME\SERVERNAME

with a checkpoint file (.cpt file) located on the quorum drive. If the password for the client node is changed while the generic service resource is offline, this checkpoint file and the registry may become out of sync. When this occurs, the generic service resource will either overwrite the value in the registry with the value in the checkpoint file or remove the password value from the registry.

One way to verify whether the checkpoint file and registry have become out of sync is to take the generic service resource offline, reset the password for the client node (using DSMC Q SE -OPTFILE=XXXX from the client command line), and try to start the TSM scheduler service without the generic service resource. If the scheduler service starts and maintains a "started" state, this confirms the out-of-sync state between the checkpoint file and the registry.

There are two possible solutions. One is to contact Microsoft support to recreate the checkpoint file. The other is to follow the steps below, which should also create the checkpoint file.
At this point the new checkpoint file has been written and matches the registry key, and the cluster node TSM scheduler can once again run under the control of the cluster Generic Resource. The second password reset is essential: because it is done while the Generic Resource is running, the Generic Resource rewrites the checkpoint file. Without that second password reset, the Generic Resource fails as soon as the cluster fails over. |
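When checking whether the password entry survived bringing the generic resource online, it can help to look at the registry key directly. The following is a minimal Python sketch, not part of the original technote. It assumes the key layout quoted above, uses NODENAME and SERVERNAME as placeholders, and relies on the standard-library `winreg` module, which is only available on Windows. The helper names `node_key_path` and `list_value_names` are hypothetical. It lists only the value names under the key and makes no assumption about what the password value is called.

```python
import sys

# Registry location quoted in the text, relative to HKEY_LOCAL_MACHINE.
BASE_KEY = r"SOFTWARE\IBM\ADSM\CurrentVersion\BackupClient\Nodes"


def node_key_path(node_name: str, server_name: str) -> str:
    """Build the HKLM-relative path of the key for a node/server pair."""
    return "\\".join([BASE_KEY, node_name, server_name])


def list_value_names(node_name: str, server_name: str):
    """Return the value names under the key, or None if the key is missing.

    Windows only: winreg is imported here so the module still loads elsewhere.
    """
    import winreg

    path = node_key_path(node_name, server_name)
    try:
        key = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path)
    except FileNotFoundError:
        return None
    with key:
        # QueryInfoKey returns (subkey count, value count, last-modified time).
        value_count = winreg.QueryInfoKey(key)[1]
        return [winreg.EnumValue(key, i)[0] for i in range(value_count)]


if __name__ == "__main__":
    node, server = "NODENAME", "SERVERNAME"  # placeholders, replace with real names
    print("HKEY_LOCAL_MACHINE\\" + node_key_path(node, server))
    if sys.platform == "win32":
        names = list_value_names(node, server)
        print("key missing" if names is None else names)
```

Running this before and after bringing the generic resource online shows whether the key or any of its values disappeared. If they did, that matches the deletion behaviour described above.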
Monday, June 20, 2011