test suite Archives - Random hacking

You know everything about how MySQL treats UTF8 and LATIN1 charsets and how the collation table impacts on selection and insertion of data, right?

Great, then stop reading :)

I don't and since I'm in the process of setting up a new version of the Opera accounts database, I really don't want to screw up things. I tried to fully understand how MySQL works in this respect (charsets, collations, etc…) but reading documentation and memorizing it wasn't very easy. Plus, there's a thousands blog posts on the matter, not always 100% accurate.

So I thought I'd better get hands on and I wrote a kind of database test suite.

Now this test suite is hooked up to the main project builds on Jenkins. Here's a sample output:


[...]
[workspace] $ /bin/sh -xe /tmp/hudson3255767718598715423.sh
+ ./bin/run-dbtest-suite
basedir=/var/lib/jenkins/jobs/auth-db/workspace
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/__initdb__.my ........................................... 
1..2
ok 1 - Using utf8tests database
ok 2 - Server charset is latin1
ok
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/collation-utf8_bin.my ................................... 
1..6
ok 1 - All our records are there. No duplicate key error.
ok 2 - utf8_bin collation does not collate a/Ã¢/Ã /A/...
ok 3 - utf8_bin collation does not collate a/Ã¢/Ã /A/...
ok 4 - utf8_bin collation does not collate a/Ã¢/Ã /A/...
ok 5 - Query for mixed-case username does not return lowercase username
ok 6 - Query for upper-case username does not return lowercase username
ok
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/collation-utf8_general_ci.my ............................ 
1..7
ok 1 - Collation for t007 is utf8_general_ci
ok 2 - utf8_general_ci collation normalizes accents, diacritics and the like
ok 3 - A and Ã… are collated to the same character in the utf8_general_ci table
ok 4 - Ã¥ and Ã… are collated to the same character in the utf8_general_ci table
ok 5 - lower/upper case chars are collated in the utf8_general_ci table
ok 6 - lower/upper case chars are collated in the utf8_general_ci table
ok 7 - We are allowed to insert all records just because there is no unique constraint
ok
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/collation-utf8_unicode_ci.my ............................ 
1..7
ok 1 - Collation for t005 is utf8_unicode_ci
ok 2 - utf8_unicode_ci collation normalizes accents, diacritics and the like
ok 3 - A and Ã… are collated to the same character in the utf8_unicode_ci table
ok 4 - Ã¥ and Ã… are collated to the same character in the utf8_unicode_ci table
ok 5 - lower/upper case chars are collated in the utf8_unicode_ci table
ok 6 - lower/upper case chars are collated in the utf8_unicode_ci table
ok 7 - We are allowed to insert all records just because there is no unique constraint
ok
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/default-table-charset.my ................................ 
1..3
ok 1 - Default character set is utf8 when no charset is specified (from server)
ok 2 - Default character set is utf8 when "CHARSET utf8" specified in the CREATE TABLE
ok 3 - Default character set is utf8 when "CHARSET utf8" and "COLLATE" specified in the CREATE TABLE
ok
...
/var/lib/jenkins/jobs/auth-db/workspace/t/database-tests/username-with-utf8-chars.my ............................. 
1..5
ok 1 - We have some UTF-8 encoded string in our hands (hex)
ok 2 - We have some UTF-8 encoded string in our hands (charset)
ok 3 - Can select back UTF-8 content from a CHARSET utf8 table
ok 4 - Given string is exactly 24 bytes long (length)
ok 5 - Given string is exactly 8 (wide) characters long (char_length)
ok
All tests successful.
Files=11, Tests=80, 0.739731 wallclock secs ( 0.05 usr  0.02 sys +  0.07 cusr  0.01 csys =  0.15 CPU)
Result: PASS
Recording test results
Finished: SUCCESS

And here's an example of "sanity check" test case, which doesn't do much:


   1 -- Check that we can insert and retrieve UTF-8 content correctly
   2 
   3 BEGIN;
   4 
   5 SET NAMES utf8;
   6 
   7 SELECT tap.plan(5);
   8 
   9 USE auth_utf8tests;
  10 
  11 SET @username = 'ä»Šæ—¥è¯é¢˜ä»Šæ—¥è¯é¢˜';
  12 SET @encoded  = 'C1BB8AE697A5E8AF9DE9A298E4BB8AE697A5E8AF9DE9A298';
  13 
  14 SELECT tap.eq(
  15     HEX(@username),
  16     @encoded,
  17     'We have some UTF-8 encoded string in our hands (hex)'
  18 );
  19 
  20 SELECT tap.eq(
  21     CHARSET(@username),
  22     'utf8',
  23     'We have some UTF-8 encoded string in our hands (charset)'
  24 );
  25 
  26 INSERT INTO t001 (f1) VALUES (@username);
  27 
  28 SELECT tap.eq(
  29     (SELECT HEX(f1) FROM t001 WHERE f1 = @username),
  30     @encoded,
  31     'Can select back UTF-8 content from a CHARSET utf8 table'
  32 );
  33 
  34 SELECT tap.eq(
  35     (SELECT LENGTH(f1) FROM t001 WHERE f1 = @username),
  36     24,
  37     'Given string is exactly 24 bytes long (length)'
  38 );
  39 
  40 SELECT tap.eq(
  41     (SELECT CHAR_LENGTH(f1) FROM t001 WHERE f1 = @username),
  42     8,
  43     'Given string is exactly 8 (wide) characters long (char_length)'
  44 );
  45 
  46 -- Finish the tests and clean up.
  47 CALL tap.finish();
  48 ROLLBACK;

This SQL test code uses mytap. You can see how the SELECT tap.* calls are just the equivalents of the TAP testing framework of Perl. SELECT tap.eq() is the equivalent of Test::More::is(), and so on.

Another, more interesting test case, is the following:


   1 --
   2 -- Verify how the utf8_unicode_ci collation works
   3 --
   4 
   5 BEGIN;
   6 
   7 SET NAMES utf8;
   8 
   9 SELECT tap.plan(12);
  10 
  11 USE auth_utf8tests;
  12 

  [...]

  40 SELECT tap.eq(
  41     (SELECT TABLE_COLLATION FROM information_schema.TABLES WHERE TABLE_SCHEMA=SCHEMA() AND TABLE_NAME='t015'),
  42     'utf8_unicode_ci',
  43     'Collation for t015 is utf8_unicode_ci'
  44 );

  [...]

  48 
  49 SELECT tap.eq(
  50     (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'testuser1a' ORDER BY id),
  51     '10',
  52     'utf8_unicode_ci collation normalizes accents, diacritics and the like'
  53 );
  54 
  55 SELECT tap.eq(
  56     (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'testuser1Ã…' ORDER BY id),
  57     '10',
  58     'A and Ã… are collated to the same character in the utf8_unicode_ci table'
  59 );
  60 
  61 SELECT tap.eq(
  62     (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'testuser1Ã¥' ORDER BY id),
  63     '10',
  64     'Ã¥ and Ã… are collated to the same character in the utf8_unicode_ci table'
  65 );
  66 
  67 SELECT tap.eq(
  68     (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'TestUser1A' ORDER BY id),
  69     '10',
  70     'lower/upper case chars are collated in the utf8_unicode_ci table'
  71 );
  72 
  73 SELECT tap.eq(
  74     (SELECT GROUP_CONCAT(id) FROM t015 WHERE username = 'TESTUSER1A' ORDER BY id),
  75     '10',
  76     'lower/upper case chars are collated in the utf8_unicode_ci table'
  77 );
  78 
  79 SELECT tap.eq(
  80     (SELECT COUNT(*) FROM t015),
  81     1,
  82     'We are allowed to insert only 1 record, because the others collate to the same string'
  83 );
  84 
  85 -- Finish the tests and clean up.
  86 CALL tap.finish();
  87 ROLLBACK;

An interesting thing that I didn't know how to do in the beginning is how to trap errors. I left out that part from the test code to simplify, but here it is:


  13 DELIMITER //
  14 
  15 DROP PROCEDURE IF EXISTS populate_table //
  16 
  17 CREATE PROCEDURE populate_table ()
  18 BEGIN
  19 
  20     DECLARE CONTINUE HANDLER FOR SQLSTATE '23000' BEGIN
  21         SELECT tap.ok(
  22             1,
  23             'We should get dupkey errors when inserting data with collation utf8_unicode_ci'
  24         );
  25     END;
  26 
  27     INSERT INTO t015 (id,username,note) VALUES (10, 'testuser1a', 'plain');
  28     INSERT INTO t015 (id,username,note) VALUES (20, 'testuser1Ã¢', 'circumflex a');
  29     INSERT INTO t015 (id,username,note) VALUES (30, 'testuser1Ã ', 'a grave');
  30     INSERT INTO t015 (id,username,note) VALUES (40, 'testuser1Ã…', 'A circ');
  31     INSERT INTO t015 (id,username,note) VALUES (50, 'TestUser1A', 'mixed case');
  32     INSERT INTO t015 (id,username,note) VALUES (60, 'TESTUSER1A', 'upper case');
  33 
  34 END;
  35 
  36 //
  37 
  38 DELIMITER ;
  39 
  46 /* Should generate 5 dupkey errors (taken as successful tests) */
  47 CALL populate_table;

It's a bit convoluted. To trap errors you have use the DECLARE HANDLER statement. DECLARE CONTINUE HANDLER FOR SQLSTATE '23000' means that whenever SQLSTATE is '23000', and that corresponds to a duplicate key error, then execute this block of code. All of that must necessarily be wrapped into a stored procedure. Handlers outside of stored procedures are not allowed.

In this particular tests, the table uses the utf8_unicode_ci collation table, so we are expecting a duplicate key error on username whenever we insert the string 'testuser1Ã ' or 'TESTUSER1A', because 'testuser1a' was already inserted at the beginning. Of all the INSERT statements, only the first one is bound to succeed, so I put a SELECT tap.ok(1) for the duplicate key HANDLER and I expect 5 tests when I make the CALL populate_table;.

This of course may seem trivial. And I guess it is, but for me it's a much better way of learning than scouring through the manuals or the many blog posts out there that may or may not reflect the environment I'm working with.

Routinely running this kind of test suite makes it possible and easy to verify the database behaviour:

instantly
after upgrades (5.1 -> 5.5? -> 6?) or storage engine changes
after mysql configuration changes. For example, I discovered in this way that adding default-charset=utf8 in my MySQL config breaks everything.

I consider this my live documentation on how MySQL works. I would really appreciate if you have any feedback on this. Have fun!

Today I was trying to run some Test::Class-based unit tests for a project I'm working on. This project has lots of functional tests that go through Test::WWW::Mechanize, so they run inside Apache and mod_perl.

What I was trying to do was testing the same code outside the Apache server, just as regular command-line-runnable tests. Being a Perl project, you have all the nice infrastructure already done for you, and so you can just run make test or Build test depending on the exact tool you're using.

However, when I run this, I got a strange error:

PERL_DL_NONLAZY=1 /usr/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*/*.t
t/unit-tests/basic-sanity.t .. ok
t/unit-tests/handler-xml.t ... Can't load '/usr/lib/perl5/auto/APR/Request/Apache2/Apache2.so' for module APR::Request::Apache2: /usr/lib/perl5/auto/APR/Request/Apache2/Apache2.so: undefined symbol: modperl_xs_sv2request_rec at /usr/lib/perl/5.10/DynaLoader.pm line 193.
 at /usr/lib/perl5/Apache2/Request.pm line 3
Compilation failed in require at /usr/lib/perl5/Apache2/Request.pm line 3.
BEGIN failed--compilation aborted at /usr/lib/perl5/Apache2/Request.pm line 3.

The weird error is:

Can't load '/usr/lib/perl5/auto/APR/Request/Apache2/Apache2.so' for module APR::Request::Apache2: /usr/lib/perl5/auto/APR/Request/Apache2/Apache2.so: undefined symbol: modperl_xs_sv2request_rec at /usr/lib/perl/5.10/DynaLoader.pm line 193.

and specifically, undefined symbol: modperl_xs_sv2request_rec.
Why is this happening?

A search for this error message gives back very few results, and not very useful. However, this more specific search turns up fewer results, but that led me to the solution.

The first hit, in particular, I have no idea what that page says, because it's Japanese, but the code blocks are clearly referring to the PERL_DL_NONLAZY environment variable:

PERL_DL_NONLAZY

    Set to one to have perl resolve all undefined symbols when it
    loads a dynamic library.  The default behaviour is to resolve symbols
    when they are used.

    Setting this variable is useful during testing of extensions as it
    ensures that you get an error on misspelled function names even if
    the test suite doesn't call it.

AFAIK, it all boils down to the modperl_xs_sv2request_rec function not being available if your code is not running inside Apache with mod_perl loaded. So, it makes sense to run the tests with PERL_DL_NONLAZY=0 (that is, be lazy!), like the Japanese guy suggests.

However, changing that in the Makefile incantations seems to be a Royal Pain, so I just gave prove a shot. And once more, I had the proof that prove is just the way to run your test suite, because prove seems to be smart enough to set PERL_DL_NONLAZY automatically to zero.

So now, I'm running prove through a shell script, like this:

...
BASE_DIR=<project_root>
...
export CONFIG_FILE=/etc/blah-blah
export WHATEVER_ELSE=...
...
prove -I $BASE_DIR/lib $* $BASE_DIR/t/unit-tests

And that's it, all tests successful! o/

Random hacking

Assume nothing. Code defensively. Keep it simple, stupid!

Tag Archives: test suite

Verifying MySQL behaviour with automated test suites and mytap

Undefined symbol: modperl_xs_sv2request_rec at /usr/lib/perl/5.10/DynaLoader.pm line 193