HipHop-PHP APC memory leaks

I run a very busy personal hobby website that runs on HipHop and gets a lot of queries.  99% of the site is APC-cached data with a very low TTL, and HipHop really hates this.  HipHop's apc_store() doesn't seem to free expired items, and without regular restarts it would quickly overflow the 24 GB of RAM in the server.

So, if you are using HipHop, please don't use apc_store().  Try memcache or Redis instead and avoid killing your server.
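
To make the failure mode concrete, here is a toy Python model of a TTL cache. It is not HipHop's code and makes no claim about how HipHop stores APC entries; `TtlCache`, `purge_on_store` and `simulate` are names made up for this sketch. It only shows why lots of short-lived keys that are never purged keep growing, while the same workload with purging stays tiny.

```python
import time


class TtlCache:
    """Toy TTL cache; `purge_on_store` is a hypothetical switch for this demo."""

    def __init__(self, purge_on_store, clock=time.monotonic):
        self.purge_on_store = purge_on_store
        self.clock = clock
        self.items = {}  # key -> (expires_at, value)

    def store(self, key, value, ttl):
        now = self.clock()
        if self.purge_on_store:
            # Drop every entry whose TTL has already passed.
            self.items = {k: v for k, v in self.items.items() if v[0] > now}
        self.items[key] = (now + ttl, value)

    def fetch(self, key):
        entry = self.items.get(key)
        if entry is None or entry[0] <= self.clock():
            return None  # missing or expired
        return entry[1]


def simulate(purge_on_store, writes=1000, ttl=5):
    """Store `writes` unique keys, one per fake second; return how many entries are held."""
    now = [0]
    cache = TtlCache(purge_on_store, clock=lambda: now[0])
    for i in range(writes):
        cache.store(f"page:{i}", "x" * 100, ttl)
        now[0] += 1
    return len(cache.items)
```

With the defaults, `simulate(False)` ends up holding all 1000 entries even though only the last few are still valid, while `simulate(True)` holds 5.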

Missing dependency Cache::Cache with Munin mysql_

If you're getting this error on Ubuntu, run the following (the two install lines are entered at the CPAN shell prompt):

apt-get install libcache-cache-perl
perl -MCPAN -e shell
install Cache::Cache
install IPC::ShareLite
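
To confirm the modules are now visible to Perl, something like this Python sketch could be used. It relies on `perl -MSome::Module -e 1` exiting non-zero when a module cannot be loaded; the function names are made up for this example, and the `run` parameter exists only so the check can be tried without Perl installed.

```python
import subprocess


def perl_module_command(module):
    # `perl -MSome::Module -e 1` exits non-zero when the module cannot be loaded.
    return ["perl", f"-M{module}", "-e", "1"]


def missing_perl_modules(modules, run=subprocess.run):
    """Return the subset of `modules` that Perl fails to load."""
    missing = []
    for module in modules:
        result = run(perl_module_command(module), capture_output=True)
        if result.returncode != 0:
            missing.append(module)
    return missing
```

Calling `missing_perl_modules(["Cache::Cache", "IPC::ShareLite"])` should return an empty list once both installs have worked.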

Fatal HipHop-PHP bug

Whatever you do, do not trust HipHop-PHP for any sort of packed string / binary operations where the packed string *could* start with \x00.

There is a very nasty bug that produces incorrect results when code hits this case.
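
This post doesn't pin down which operations break, so the Python sketch below only shows how easy it is to end up with such a string; it does not reproduce the bug. It uses Python's `struct.pack` with the `>I` format, which corresponds to PHP's `pack('N', ...)` (32-bit unsigned, big-endian). The helper name is made up for this example.

```python
import struct


def packed_starts_with_nul(fmt, *values):
    """Return True if struct.pack(fmt, *values) begins with a NUL byte."""
    return struct.pack(fmt, *values)[:1] == b"\x00"


# Any big-endian 32-bit integer below 2**24 has a zero top byte,
# so its packed form starts with NUL.
small_id = 255
large_id = 2**24

small_is_risky = packed_starts_with_nul(">I", small_id)  # packed as 00 00 00 ff
large_is_risky = packed_starts_with_nul(">I", large_id)  # packed as 01 00 00 00
```

In other words, ordinary small ids packed big-endian are exactly the kind of input that could hit this.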

Forefront TMG 2010 Install errors

Do you get “The upgrade patch cannot be installed by the Windows Installer service because the program to be upgraded may be missing, or the update….” error message when you attempt to install Forefront TMG SP2?

If you do, make sure that you install SP1 Update 1 first, as there seem to be some issues when going directly to SP2.

Building HipHop with Percona MySQL

If you try to build HipHop with the Percona MySQL libraries, you will get CMake "MYSQL_INCLUDE_DIR not found" errors. To fix this, symlink the headers into /usr/include/mysql/:

ln -s /usr/include/mysql.h /usr/include/mysql/

ln -s /usr/include/mysql_version.h /usr/include/mysql/
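
If this needs to be repeated on several build boxes, the same two links could be scripted. The following is a rough Python sketch under a couple of assumptions: `link_headers` is a name made up here, and unlike the plain `ln -s` lines it creates the target directory if it is missing and skips links that already exist.

```python
import os


def link_headers(source_dir, target_dir, names=("mysql.h", "mysql_version.h")):
    """Symlink each header from source_dir into target_dir, skipping ones already there."""
    os.makedirs(target_dir, exist_ok=True)  # extra step; `ln -s` expects the directory to exist
    created = []
    for name in names:
        source = os.path.join(source_dir, name)
        target = os.path.join(target_dir, name)
        if not os.path.exists(source):
            raise FileNotFoundError(source)
        if os.path.lexists(target):
            continue  # leave existing files or links alone
        os.symlink(source, target)
        created.append(target)
    return created
```

For the case above the call would be `link_headers("/usr/include", "/usr/include/mysql")`, run as root.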

Redis vs MySQL or should it be MySQL + Redis?

One thing that I’ve found with very large tables in MySQL is that they suck when you need to return a large subset of ordered records.  The query for this is below (a baseline EXPLAIN shows a filesort of 1,000,000 rows).  This demo table has 10,000,000 rows.

select detail1, detail2, city, country from data (...lots of joins...) where country = 'us' order by username desc limit 200000,1000; -- Time: 43.503ms

Even with indexes set up, this takes 20-40 seconds to execute on my backup server, which is extremely painful (the real select has three joins and IF conditions).

No matter what indexes were configured, forced or ignored, the query just would not execute fast enough.  This is where Redis was used to provide some much needed help.

Redis has extremely useful sorted sets, which were used here to give MySQL another query condition, thus dropping the number of rows in the query.  Once the script code has been updated to ZADD each row's id to a sorted set keyed by country code, and the Redis cache has been loaded with the existing MySQL data, the process looks like this (in semi-php):

$start = $redis->zrevrange('us', 210000, 210000); // returns 5299000
$end = $redis->zrevrange('us', 200000, 200000);   // returns 5421000

The two Redis calls to get the values take milliseconds.

Feed those values into MySQL and the new query becomes:

select detail1, detail2, country from data (...lots of joins...) where country = 'us' and id >= 5299000 and id < 5421000; -- Time: 0.406ms
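
The idea can be tried without a Redis or MySQL server. The Python sketch below is a stand-in, not my production code: sqlite3 replaces MySQL, a small `SortedIds` class replaces the Redis sorted set (with redis-py the equivalent calls would be `zadd` and `zrevrange`), the table is a made-up 1,000-row cut-down of `data`, and rows are ordered by id rather than username. All function and class names are invented for the example, and out-of-range offsets are not handled.

```python
import sqlite3
from bisect import insort


class SortedIds:
    """Stand-in for one Redis sorted set whose members (and scores) are row ids."""

    def __init__(self):
        self._ids = []  # kept in ascending order

    def zadd(self, row_id):
        insort(self._ids, row_id)

    def zrevrange_one(self, rank):
        # Member at 0-based `rank`, counting from the highest id down.
        return self._ids[len(self._ids) - 1 - rank]


def build_demo(rows=1000):
    """Create an in-memory table and a matching 'us' index (every even id is 'us')."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE data (id INTEGER PRIMARY KEY, country TEXT, detail1 TEXT)")
    us_index = SortedIds()
    for row_id in range(1, rows + 1):
        country = "us" if row_id % 2 == 0 else "gb"
        conn.execute("INSERT INTO data VALUES (?, ?, ?)", (row_id, country, f"row {row_id}"))
        if country == "us":
            us_index.zadd(row_id)  # index updated at insert time
    return conn, us_index


def page_by_offset(conn, offset, size):
    # The slow shape: the database has to walk past `offset` rows first.
    sql = "SELECT id FROM data WHERE country = 'us' ORDER BY id DESC LIMIT ? OFFSET ?"
    return [r[0] for r in conn.execute(sql, (size, offset))]


def page_by_id_range(conn, index, offset, size):
    # Two rank lookups give the id bounds; the query becomes a primary-key range scan.
    high = index.zrevrange_one(offset)
    low = index.zrevrange_one(offset + size - 1)
    sql = "SELECT id FROM data WHERE country = 'us' AND id BETWEEN ? AND ? ORDER BY id DESC"
    return [r[0] for r in conn.execute(sql, (low, high))]
```

Both functions return the same page; the second one just never asks the database to count its way to the offset. The sketch uses inclusive bounds on both ends, which is slightly tighter than the `>=` / `<` pair in the query above.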

As I am updating Redis on insert anyway, adding another Redis command to the pipeline adds next to nothing to the overall processing.  Sure, there is a risk of data getting out of sync if you purge data directly, outside the scope of the control scripts, but I don’t do that.  It wouldn’t take more than a couple of minutes to add deletion code and maybe a check/resync.  Either way, this speed is a major benefit to my site and it stops a Google crawl from killing MySQL.

Pros – Order-of-magnitude speed increase, much lower system load, faster web pages, less site lag.

Cons – Redis uses 600 MB of memory for the data, possible data sync issues, some extra coding.

Wishes – That MySQL would talk to Redis, and that Redis sorted sets could optionally take a packed int as the score instead of just a double, as this would save a lot of memory.  It might even fit into the ziplist.

Microsoft SQL recovery from cluster failure

Recently I was tasked with the recovery of a 300 GB SQL database.  No problem, I thought, until I was given the details.  The server was one half of a Microsoft Cluster, the cluster service itself had failed and, because of this, the SQL and IIS application weren’t available.  Still not a problem, I thought, until they went on to say that the server was off the network on an ESX server and wasn’t allowed to contact a domain controller for authentication.  This is where things started to get difficult.  Cluster server needs to authenticate as it needs certain rights.  It won’t run as LOCALSYSTEM, NETWORK_SERVICE or from the non-existent cached credentials.

This is how I got clustered SQL to work without Cluster server, thus making the IIS application work.  Note that SQL can't use Shared Memory client access when it thinks it's clustered.

  1. Set the SQL service (in this case, MSSQL.1) to Automatic and to run as NETWORK SERVICE.  You can enter the username with any random password and it will be accepted.
  2. Set the Cluster service to Disabled, just in case.
  3. Add the clustered IP number to the local network card.
  4. In HKEY_LOCAL_MACHINE\Software\Microsoft\Microsoft SQL Server\MSSQL.1, delete the Cluster key, and do the same for MSSQL.2 and MSSQL.3 (the server had Reporting and Analysis services running as well).
  5. In HKEY_LOCAL_MACHINE\Software\Microsoft\Microsoft SQL Server\MSSQL.1\MSSQLServer\SuperSocketNetLib\Tcp, delete all the subkeys except IP1, IP2, IP3 and IPAll.  In those keys, populate the IpAddress fields with the IP address of the interface, the clustered IP number for the SQL resource, and 127.0.0.1.
  6. In the SQL Configuration Manager, disable Shared Memory for the server and client.  Set the server Named Pipe name to \\.\pipe\sql\query (note that the old field might read something like \\.\pipe\$$\CLUSTERNAME\sql\query).
  7. Edit each web.config so that the connection string for each application is set to the name of the server and not the name of the cluster.
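
With more than a handful of applications, step 7 gets tedious by hand. Here is a rough Python sketch of how it could be scripted, under some assumptions: the connection strings live in the standard `<connectionStrings>` section, they name the server with `Data Source=` or `Server=`, and `SQLCLUSTER` / `NODE1` are placeholder names, not the real ones. Note that `xml.etree.ElementTree` drops comments and original formatting when it writes the file back out, so keep a backup of each web.config.

```python
import re
import xml.etree.ElementTree as ET


def retarget_connection_strings(config_xml, cluster_name, server_name):
    """Swap cluster_name for server_name in every <connectionStrings>/<add> entry.

    Returns (new_xml, number_of_strings_changed).
    """
    root = ET.fromstring(config_xml)
    # Match "Data Source=NAME" or "Server=NAME", followed by ; \ , or end of string,
    # so that NAME\INSTANCE and NAME,PORT forms are handled too.
    pattern = re.compile(
        r"((?:Data Source|Server)\s*=\s*)" + re.escape(cluster_name) + r"(?=\s*(?:;|\\|,|$))",
        re.IGNORECASE,
    )
    changed = 0
    for entry in root.findall("connectionStrings/add"):
        value = entry.get("connectionString")
        if value is None:
            continue
        new_value, count = pattern.subn(lambda m: m.group(1) + server_name, value)
        if count:
            entry.set("connectionString", new_value)
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


# Placeholder config for illustration only.
sample = (
    "<configuration><connectionStrings>"
    '<add name="App" connectionString="Data Source=SQLCLUSTER;Initial Catalog=AppDb;'
    'Integrated Security=True" />'
    "</connectionStrings></configuration>"
)
updated, count = retarget_connection_strings(sample, "SQLCLUSTER", "NODE1")
```

After this runs, `updated` holds the rewritten XML with `Data Source=NODE1` and `count` is 1.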

SQL started without Cluster and the IIS application was able to start.  This took some time poking around and reverting snapshots, but it shows that you can get a node up without much in the way of domain services.  This process effectively strips Cluster server out of the application and service layers.

FYI, Event Viewer was showing a lot of errors in the run-up to this:

  • SQL Server could not spawn FRunCM thread (EventID 17120)
  • Could not start the network library because of an internal error in the network library (EventID 17826)
  • TDSSNIClient initialization failed with error 0x6d9 (EventID 17182)
  • Could not find any IP address that this SQL Server instance depends upon.  Make sure that the cluster service is running, that the dependency relationship between SQL Server and Network Name resources is correct, and that the IP addresses on which this SQL Server instance depends are available.