The usual IT babble
Posts tagged Linux-HA
OCF agent for Tivoli Storage Manager: redux
Jun 5th
Well, after I finished my first OCF agent back in October 2008, we’ve now had it running in production for about ten months. During that time, we found quite a few points where we’d like to improve how Linux-HA handles TSM:
- Shutdown TSM nicely if possible (Cancel client sessions, cancel running processes and dismount mounted volumes)
- Better error handling
So, after another week of writing and testing with a small instance, I present the new OCF agent for Tivoli Storage Manager. It still has one or two weak points, but they are negligible. I still need to write the documentation for it, but the script should just work …
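To give a rough idea of what "shutting down nicely" could look like in a stop action, here is a minimal sketch. It is not the agent itself: the function names and the `TSM_ADMIN_ID` / `TSM_ADMIN_PW` variables are made up for the example, and it assumes that `select process_num from processes` returns one process number per line and that `query mount` reports mounted volumes in messages of the form `... volume NAME is mounted ...`.

```shell
#!/bin/sh
# Sketch of a graceful TSM stop sequence (assumptions are marked below).

# Hypothetical wrapper around the administrative client.
# TSM_ADMIN_ID and TSM_ADMIN_PW are assumed to be set by the caller.
tsm_admin() {
    dsmadmc -id="$TSM_ADMIN_ID" -password="$TSM_ADMIN_PW" \
            -dataonly=yes -noconfirm "$@"
}

# Read process numbers (one per line) on stdin and print one
# "cancel process N" command for every purely numeric line.
cancel_process_cmds() {
    awk 'NF == 1 && $1 ~ /^[0-9]+$/ { print "cancel process " $1 }'
}

# Read "query mount" output on stdin and print one "dismount volume NAME"
# command per line that looks like "... volume NAME is mounted ...".
# Assumption: this is the message layout of the server in use.
dismount_volume_cmds() {
    awk '{
        for (i = 1; i <= NF - 2; i++)
            if ($i == "volume" && $(i + 2) == "is")
                print "dismount volume " $(i + 1)
    }'
}

# Cancel sessions, cancel processes, dismount volumes, then halt.
tsm_graceful_stop() {
    tsm_admin "cancel session all"

    # </dev/null keeps the client from eating the loop's input.
    tsm_admin "select process_num from processes" | cancel_process_cmds |
        while read -r cmd; do tsm_admin "$cmd" </dev/null; done

    tsm_admin "query mount" | dismount_volume_cmds |
        while read -r cmd; do tsm_admin "$cmd" </dev/null; done

    tsm_admin "halt"
}
```

Note that the sketch does not wait for cancelled processes to actually end or for the dismounts to complete before it halts the server; a real stop action would poll for that and fall back to killing dsmserv after a timeout.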
TSM: Restoring the database/recovery log to a point-in-time
Apr 24th
Well, my co-worker just called on my cell (it’s Friday, 16:00), and asked me which start-up script he needed to change in order to restore the database. My first response was, “ummm, that’s gonna be hard, we’re using heartbeat”.
Okay, so after a bit of asking I got out of him what he wanted to achieve by changing the start-up script. Apparently he did something to crash Tivoli Storage Manager (or rather, to crash it repeatedly) and wanted to restore the database. He talked to one of the system partners we have (and I’m happy we have them most of the time), who in turn told him how to do it, but my co-worker forgot it a minute after he hung up the phone.
So I went digging while he was still telling me how he got Tivoli to kick his ass … After a bit, I thought “hrrrrrm, shouldn’t this be covered in the Tivoli documentation?”, and surprisingly, it is.
It’s actually rather simple.
- Stop the dsmserv Linux-HA cluster service (tsm-control ha stop tsm1)
- Set up the environment (export DSMSERV_DIR and DSMSERV_CONFIG, since we’re running multiple instances of Tivoli Storage Manager)
- Change into the directory of the server instance (cd /opt/tivoli/tsm/server/tsm1)
- Run dsmserv restore db
- Wait some time (took about half an hour to restore the 95G database and the 10G recovery log)
- Start the dsmserv Linux-HA cluster service (tsm-control ha start tsm1)
- Update the server-to-server communication, since the restore db changes the communication verification token
> tsm-control ha stop tsm1
- tsm1 (dsmserv) -> ha: [ OK ]
> export DSMSERV_DIR=/opt/tivoli/tsm/server/bin
> export DSMSERV_CONFIG=/opt/tivoli/tsm/server/tsm1/dsmserv.opt
> cd /opt/tivoli/tsm/server/tsm1
> /opt/tivoli/tsm/server/bin/dsmserv restore db todate=TODAY totime=08:00:00 source=dbbackup preview=no
.... wait some time ....
> tsm-control ha start tsm1
- tsm1 (dsmserv) -> ha: [ OK ]
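If you have to do this more than once, the steps above could be wrapped into a small function. The following is only a sketch under a few assumptions: `run`, `restore_tsm_db` and the `DRY_RUN` switch are names invented for this example, the paths are the ones from our setup shown above, and `tsm-control` is our own wrapper script, so adjust all of that to your installation.

```shell
#!/bin/sh
# Sketch: point-in-time restore of one TSM instance under Linux-HA.

# Hypothetical helper: with DRY_RUN=1, print the command instead of running it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: restore_tsm_db INSTANCE TODATE TOTIME
restore_tsm_db() {
    instance=$1
    todate=$2
    totime=$3
    base=/opt/tivoli/tsm/server   # path taken from the transcript above

    # Point dsmserv at the right instance (we run several of them).
    DSMSERV_DIR=$base/bin
    DSMSERV_CONFIG=$base/$instance/dsmserv.opt
    export DSMSERV_DIR DSMSERV_CONFIG

    # Stop the cluster service first, so heartbeat does not restart dsmserv.
    run tsm-control ha stop "$instance" || return 1
    run cd "$base/$instance" || return 1

    # If the restore fails, return without starting the service again.
    run "$DSMSERV_DIR/dsmserv" restore db todate="$todate" totime="$totime" \
        source=dbbackup preview=no || return 1

    run tsm-control ha start "$instance"
}
```

Running `DRY_RUN=1 restore_tsm_db tsm1 TODAY 08:00:00` only prints the four commands, which is handy for checking the paths before touching a production instance. The last step from the list (updating the server-to-server communication) is deliberately left out; on the partner servers it is probably something along the lines of `update server <name> forcesync=yes`, but treat that as an assumption and check the Administrator’s Reference first.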