Been a while

Posted on Saturday, 16th February, 2008 in Life

Well, it’s been quite a while since most people last heard a word from me. For the last few months I’ve been extremely busy with work-related tasks (and, as a side effect, didn’t want to spend much time in front of the computer after 9 hours of work). I also started spending more and more time in the gym, nearly two hours every Tuesday and Thursday.

  • I finally fixed our replication issues; we now have a working MySQL Multi-Master replication setup as the database back end for our TYPO3 vHosts (1. Node, 2. Node; bear in mind, these boxes are *only* serving MySQL and nothing else, so don’t use these configurations on mixed setups).
  • all the web nodes are now serving the content from a clustered, shared SAN volume (is that a good thing ? :P - don’t know yet …)
  • our VI environment is getting more and more acceptance (even if you hear some complaints now and then, like “awww, damn that crap my 4GiB RAM, 2×3.0GHz Windows 2008 is running soooo choppy” - simple answer, don’t use Windows Server 2008 and/or Windows Vista!)
  • I finished prepping our VM templates (at least the Windows ones)
  • we’re still putting together the plans on whether or not to invest in a VDI solution.

The next few weeks are gonna be as frantic as the last ones: I still have to migrate a lot of TYPO3 installations to our new cluster (which sadly takes time, as we need to wait for DNS changes to propagate). Honestly, I might end up extending the SAN volume for the MySQL data storage, as even with only three somewhat busy sites, the binary log of the last 5 days is about 2GiB in size. And we still have ~20 other busy sites on a separate box.

Lucky me, I created the MySQL data storage on a logical volume, so I can easily extend the volume in the SAN manager semi-online (the file system needs to be unmounted, which means stopping the MySQL process), then extend the physical volume (LVM2 PV), afterwards the logical volume (LV), and finally the ext3 file system on top of it.
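The order of those steps matters, so here is a minimal sketch that only prints the commands in sequence (it does not run them). The device, volume group, logical volume, mount point and init script names are made up for illustration, and the SAN-side resize is assumed to have been done in the SAN manager already.

```python
import shlex


def resize_plan(pv_device, vg, lv, mount_point, init_script="/etc/init.d/mysql"):
    """Return the shell commands, in order, for growing an ext3 file system
    on LVM2 after the SAN volume itself has been enlarged."""
    lv_path = f"/dev/{vg}/{lv}"
    steps = [
        [init_script, "stop"],                      # MySQL has to go down first
        ["umount", mount_point],                    # ext3 is resized offline here
        ["pvresize", pv_device],                    # grow the PV to the new LUN size
        ["lvextend", "-l", "+100%FREE", lv_path],   # hand all free extents to the LV
        ["e2fsck", "-f", lv_path],                  # resize2fs wants a fresh check
        ["resize2fs", lv_path],                     # grow the file system to fill the LV
        ["mount", lv_path, mount_point],
        [init_script, "start"],
    ]
    return [" ".join(shlex.quote(part) for part in step) for step in steps]


if __name__ == "__main__":
    # All names below are hypothetical.
    for command in resize_plan("/dev/sdb1", "vgmysql", "lvdata", "/mysql"):
        print(command)
```

Depending on the setup, the host may also need to rescan the device before pvresize sees the new size; that part is not covered here.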

As some of you know by now, I am on extended leave. I don’t have tree access (at my own request), though I’m gonna try to keep up with Chris and 2008.0 … So long!


Typo3 and MySQL replication

Posted on Saturday, 8th September, 2007 in Life

Apparently the TYPO3 version we are using doesn’t play too nice with the MySQL Master<->Master replication.

Sometimes, something like this is going to happen:

070826  0:44:32 [ERROR] Slave: Error 'Duplicate entry '75-222419149' for key 1' on query. Default database: 't3nb'. Query: 'INSERT INTO cache_pagesection
070826  0:44:32 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'dbc-mysql1.000192' position 611861372

Well, as you can see from the last line in the log, the slave SQL thread found a duplicate entry and thought it was smart to just shut itself down instead of disregarding the entry it had just tried to make. So from now on both databases drift apart, since there ain’t no replication anymore until someone kick-starts it again (someone being me).
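Since "someone" currently has to notice this by hand, a small log watcher could at least pull the stop position out of the error log. This is only a sketch: the regular expression is written against the one log line quoted above, and the function name is made up.

```python
import re

# Matches the tail of the "slave SQL thread aborted" message quoted above.
STOP_RE = re.compile(r"We stopped at log '(?P<log>[^']+)' position (?P<pos>\d+)")


def parse_stop_position(line):
    """Return (binlog file, position) if the line reports an aborted slave
    SQL thread, otherwise None."""
    match = STOP_RE.search(line)
    if match is None:
        return None
    return match.group("log"), int(match.group("pos"))


if __name__ == "__main__":
    sample = (
        "070826  0:44:32 [ERROR] Error running query, slave SQL thread aborted. "
        "Fix the problem, and restart the slave SQL thread with \"SLAVE START\". "
        "We stopped at log 'dbc-mysql1.000192' position 611861372"
    )
    print(parse_stop_position(sample))
```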

Anyway, I think I finally traced the fucker down: supposedly one of the problematic cases is located in t3lib/class.t3lib_tstemplate.php on line 362.

$GLOBALS['TYPO3_DB']->exec_DELETEquery('cache_pagesection', 'page_id='.intval($GLOBALS['TSFE']->id).' AND mpvar_hash='.t3lib_div::md5int($GLOBALS['TSFE']->MP));
$GLOBALS['TYPO3_DB']->exec_INSERTquery('cache_pagesection', $insertFields);

Basically what TYPO3 is doing is a DELETE and an INSERT right afterwards. But apparently, it doesn’t check whether the DELETE even succeeded. I hacked it for now, simply adding this:

-                               $GLOBALS['TYPO3_DB']->exec_INSERTquery('cache_pagesection', $insertFields);
+                               // Only insert a new cache entry with the same value, if the DELETE succeeded
+                               if ($GLOBALS['TYPO3_DB']->sql_affected_rows() == 1)
+                                       $GLOBALS['TYPO3_DB']->exec_INSERTquery('cache_pagesection', $insertFields);
+

Sadly, this looks more and more like a race condition between the two boxes (as in the replication / UPDATE being too slow) when users visit an edited page that hasn’t had its cache regenerated yet. Problem is, it ain’t just this single spot, but also the search indexing, the image cache and the whole page cache. For now we switched the cluster to active/passive load balancing, till we have a chance to see if a newer TYPO3 fixes those issues.
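To make the suspected race a bit more tangible, here is a small simulation of how one box might see the statements. It uses an in-memory SQLite table rather than MySQL, the table layout is a simplified assumption (only the two key columns from the error message plus a content column), and the interleaving is just one possible ordering, not something I captured from the real boxes.

```python
import sqlite3

DELETE = "DELETE FROM cache_pagesection WHERE page_id = ? AND mpvar_hash = ?"
INSERT = "INSERT INTO cache_pagesection (page_id, mpvar_hash, content) VALUES (?, ?, ?)"
KEY = (75, 222419149)  # the key from the error log above


def replay(events):
    """Apply (origin, sql, params) events to a fresh table and return the
    origin of the first statement that hits a duplicate key, or None."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE cache_pagesection ("
        " page_id INTEGER, mpvar_hash INTEGER, content TEXT,"
        " PRIMARY KEY (page_id, mpvar_hash))"
    )
    try:
        for origin, sql, params in events:
            try:
                db.execute(sql, params)
            except sqlite3.IntegrityError:
                return origin
        return None
    finally:
        db.close()


# Both boxes regenerate the same cache entry; the replicated DELETE arrives
# before the local INSERT, the replicated INSERT after it.
interleaved = [
    ("local", DELETE, KEY),
    ("replicated", DELETE, KEY),
    ("local", INSERT, KEY + ("from box A",)),
    ("replicated", INSERT, KEY + ("from box B",)),
]

# Same statements, but the other box's pair arrives only after ours is done.
serial = [
    ("local", DELETE, KEY),
    ("local", INSERT, KEY + ("from box A",)),
    ("replicated", DELETE, KEY),
    ("replicated", INSERT, KEY + ("from box B",)),
]

if __name__ == "__main__":
    print(replay(interleaved))  # replicated
    print(replay(serial))       # None
```

In the interleaved case it is the replicated INSERT that fails, which would match the slave SQL thread being the one that aborts.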


Continuing on SLES10

Posted on Saturday, 16th June, 2007 in Life

OK, it turns out that I was rather stupid when configuring the my.cnf. The effect I was seeing was due to the presence of two log-bin lines, which looked like the following:

[mysqld]
port = 3306
datadir = /mysql/dbase
log = /mysql/logs/dbc-mysql1.log
log-error = /mysql/logs/dbc-mysql1.err
socket = /var/lib/mysql/mysql.sock
bind = 172.16.234.31
 
# custom paths for binary logs
log-bin = /mysql/binlogs/dbc-mysql1
log-bin-index = /mysql/binlogs/dbc-mysql1.idx
relay-log = /mysql/binlogs/dbc-mysql1.relay

And some lines down there was this:

# custom paths for binary logs
log-bin
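A quick check for this kind of mistake is easy to script. The following is a rough sketch that lists options appearing more than once in the same section; it treats dashes and underscores in option names as the same thing (as MySQL does) and does not handle includes or inline comments. The function name and the sample text are just for illustration.

```python
from collections import defaultdict


def duplicate_options(text):
    """Return {(section, option): [line numbers]} for every option that is
    set more than once within the same section."""
    seen = defaultdict(list)
    section = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip()
            continue
        # Options may come with or without a value ("log-bin" vs "log-bin = ...").
        name = line.split("=", 1)[0].strip().replace("_", "-")
        seen[(section, name)].append(lineno)
    return {key: lines for key, lines in seen.items() if len(lines) > 1}


SAMPLE = """\
[mysqld]
port = 3306
log-bin = /mysql/binlogs/dbc-mysql1
log-bin-index = /mysql/binlogs/dbc-mysql1.idx
# custom paths for binary logs
log-bin
"""

if __name__ == "__main__":
    print(duplicate_options(SAMPLE))  # {('mysqld', 'log-bin'): [3, 6]}
```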

The next thing I ran into came up while importing our old databases (they are about 1.1GiB each, 25 databases total). The second MySQL master (and its slave) will choke as soon as you dump the data into the first master too fast, as the binlog events seem to be too big for MySQL to transfer via TCP (something like “Packet too large - try increasing max_packet_size” in the error log; the only problem was that the setting in question, max_allowed_packet, was already at 1GiB, which is the absolute maximum for MySQL 5.0 according to the handbook).

A way around this (thanks to a co-worker who pushed me down this road) is to disable all the MySQL Master/Slave stuff in your my.cnf, start the mysql daemon as a simple, dumb database, import all your databases, stop the mysql daemon, tar up the whole BASEDIR and scp/rssh it to your second master.

Clean out the BASEDIR on the second master, untar your tarball, edit your my.cnf again to include the whole Master/Slave portions on both boxes and you should be up and running :grin:
I haven’t run any tests on the Master<->Master replication yet, but I’ll do that as soon as I’m back at work (which is the 27th of June, as I’ve been on vacation since yesterday, yay!)
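For the record, the tar-up and unpack steps from the workaround above can be sketched roughly like this. It only covers packing the directory and restoring it into a cleaned-out target; the transfer between the boxes, stopping mysqld and fixing file ownership afterwards are left out, and the function names are made up.

```python
import shutil
import tarfile
from pathlib import Path


def pack_datadir(datadir, archive):
    """Tar up the whole data directory (mysqld must be stopped first)."""
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the directory itself.
        tar.add(Path(datadir), arcname=".")
    return Path(archive)


def restore_datadir(archive, datadir):
    """Clean out the target directory and unpack the tarball into it."""
    datadir = Path(datadir)
    if datadir.exists():
        shutil.rmtree(datadir)  # destructive: wipes whatever was there before
    datadir.mkdir(parents=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(datadir)
    return datadir
```

On the real boxes plain tar and scp do the same job; the point is only the order: stop, pack, copy, clean, unpack, then re-enable replication in my.cnf on both sides.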