• Importing .rdb files into Redis with Protocol Streams

    As a preface, read the Redis documentation on mass insertion for the motivation and concept behind Redis protocol streams (as usual, the Redis documentation is amazingly well written and a joy to read).

    redis-rdb-tools is a utility that creates a protocol stream from a .rdb file. However, the original repo has a Unicode decoding bug that prevents it from working properly. Thankfully, someone forked it and patched it, and I can confirm that the patch works for me. To install (make sure you’re on Python 2.x, not 3.x):

    $ pip install git+https://github.com/lesandr/redis-rdb-tools@1f7bcf366073adf5510ad18f1efe0bf46ae5e0c1
    

    (I’m installing the specific patch commit because it’s a fork and who knows what’ll happen to it in the future.)

    Then, to import the file, just a simple one-liner:

    $ rdb --command protocol /path/to/dump.rdb | redis-cli --pipe
    

    If successful, you’ll see something like:

    All data transferred. Waiting for the last reply...
    Last reply received from server.
    errors: 0, replies: 4341259
    

    as well as this in your Redis server logs:

    95086:M 01 Mar 21:53:42.071 * 10000 changes in 60 seconds. Saving...
    95086:M 01 Mar 21:53:42.072 * Background saving started by pid 98223
    98223:C 01 Mar 21:53:44.277 * DB saved on disk
    

    Notes:

    1. Make sure you already have a Redis server running. The --pipe flag is only available from Redis 2.6 onwards.
    2. If you want to inspect the protocol stream visually before importing, leave out the pipe to redis-cli and the stream will be printed to STDOUT (or redirect it to a text file) - an example of what it looks like is shown below.
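
    For reference, the protocol stream is just RESP (the Redis serialization protocol): each command is sent as an array of bulk strings, with every line terminated by \r\n. A single SET foo bar, for instance, is encoded roughly like this (the command, key and value are illustrative - what the tool actually emits depends on the data types in your dump):

    *3
    $3
    SET
    $3
    foo
    $3
    bar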

    The alternative way to import is to copy the .rdb file to the location specified in your redis.conf, or to modify redis.conf to point to your .rdb file. The relevant directives are shown below.
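
    For reference, the two redis.conf directives that control where Redis looks for its dump file are dir and dbfilename; the paths here are just placeholders:

    # redis.conf (placeholder paths)
    dir /var/lib/redis
    dbfilename dump.rdb

    However, I think using protocol streams is a cooler solution. 😎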

  • Partial Functions in Haskell Prelude

    Partial functions are functions that are not defined for all possible arguments of their specified types.

    The most common example of a partial function is head. It has the innocent-looking type signature of

    head :: [a] -> a
    

    head fails when an empty list is given:

    head [1,2,3] -- 1
    head []      -- Exception!
    

    Because head is polymorphic in a, there is no way for it to conjure up a value of type a when the list contains none; and thanks to Haskell’s type erasure, it cannot even inspect what the supplied type is at runtime.

    In Brent Yorgey’s CIS 194 Haskell course:

    head is a mistake! It should not be in the Prelude. Other partial Prelude functions you should almost never use include tail, init, last, and (!!).

    Haskell’s official wiki provides a complete list of partial functions in Prelude. Note that some of these functions are considered partial functions because they do not terminate if given an infinite list.

    Given all that, I think using something like head is okay if it is composed with other functions that guarantee the list is non-empty, or if the function’s type uses NonEmpty, a list type that guarantees at least one element.
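
    For example, base’s Data.List.NonEmpty exports a head that is total for NonEmpty lists, and nonEmpty converts an ordinary list into a Maybe (NonEmpty a). A minimal sketch of a safe head built on top of them (safeHead is my own name, not a Prelude function):

    import Data.List.NonEmpty (nonEmpty)
    import qualified Data.List.NonEmpty as NE

    -- total: the empty case is pushed into the Maybe
    safeHead :: [a] -> Maybe a
    safeHead = fmap NE.head . nonEmpty

    safeHead [1,2,3] -- Just 1
    safeHead []      -- Nothing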

    For example, consider lastItem, a function which returns the last item in a list:

    lastItem :: [a] -> a
    lastItem = head . reverse
    
    lastItem [1,2] -- 2
    lastItem []    -- Exception!
    

    In addition, consider toDigits, a function which, when given an Integral value, returns a list of its constituent digits:

    toDigits :: Integral a => a -> [a]
    toDigits x
      | x < 0 = toDigits $ x * (-1)
      | x < 10 = [x]
      | otherwise = toDigits (div x 10) ++ [mod x 10]
    
    toDigits 123   -- [1,2,3]
    toDigits 0     -- [0]
    toDigits (-43) -- [4,3]
    

    A function like toDigits guarantees a non-empty list, and when combined with lastItem, we can get the last digit of an Integral value:

    lastDigit :: Integral a => a -> a
    lastDigit = lastItem . toDigits
    
    lastDigit 123   -- 3
    lastDigit 01    -- 1
    lastDigit (-1)  -- 1
    

    Or consider (!!), which accesses an element in a list by index. If the index provided is too large (or negative), an exception is thrown:

    (!!) [1,2,3] 0 -- 1
    (!!) [1,2,3] 4 -- Exception!
    

    One idea is to wrap (!!) in a function that returns a Maybe instead:

    findByIndex :: [a] -> Int -> Maybe a
    findByIndex xs index
      | index < 0 || index >= length xs = Nothing   -- guard against negative indices too
      | otherwise                       = Just (xs !! index)
    
    findByIndex [1,2,3] 0 -- Just 1
    findByIndex [1,2,3] 4 -- Nothing
    

    I’m surprised that such commonly used functions in the Prelude are so dangerous, so it’s good to pay attention when using them. Partial functions like head are easy to replace with pattern matching, but others may be harder to supplant.
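
    As an illustration (describeFirst is an invented example, not a Prelude function), a call site that reaches for head can usually be rewritten with an explicit empty case instead:

    -- instead of: describeFirst xs = "first: " ++ show (head xs)
    describeFirst :: Show a => [a] -> String
    describeFirst []      = "empty list"
    describeFirst (x : _) = "first: " ++ show x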

  • Agnostic HTTP Endpoint Testing with Jasmine and Chai

    In this post, I’m going to share my strategy for endpoint testing. It has a few cornerstones:

    1. It should test against a running server by sending HTTP requests to it, instead of hooking onto the server instance directly, like supertest does. This way, the strategy becomes agnostic and portable - it can be used to test any endpoint server, even servers written in other languages, as long as they communicate through HTTP.

    2. Each suite should be written as a narrative. To this end, BDD-style testing is very suitable. As an example, consider the narrative describing the authentication flow for an app:

    I register as a user, providing a suitable email and password. The server should return a 200 response and an authentication token. Then, I log in using the same email and password as before. The server should return a 200 response and an authentication token. I log in using a different email and password; this time, the server should return a 401 response. If I register with the same email as before, the server should return a 422 response and an error message in the response body indicating that the email has been taken.

    A few points to take note of:

    1. Even though the strategy is meant to be as agnostic as possible, you need to find a way to run the server against an empty test database, and then have some (hopefully scripted) way to drop it once the tests are complete. This part will depend on what database adapter/ORM you are using. I will share my solution for an Express server backed by RethinkDB later.

    2. Remember that the database is a giant, singular hunk of state. If you’re going to adopt this style of testing, there is no way around that. You’re not just going to be running GET requests - you’re going to be running POST and PUT and DELETE requests as well. This means that you need to be very careful about tests running concurrently or in parallel. It’s great to have performant tests, but don’t trade away tests that are easy to reason about, and which reveal clearly which parts of your app are breaking, just to gain performance.

    I tried Ava first, and was actually halfway through writing the test suite for a project with it. I really liked it, but Ava was built for running tests concurrently and in parallel. There came a point where the test suite would fail unpredictably depending on the order in which the tests were run. Although it’s possible to run Ava tests in serial, I felt like I was fighting against the framework.

    I also considered Tape, but I find Ava superior to Tape for stateless unit testing. If you’re using Tape, do consider checking out Ava for future projects. Their syntaxes are very similar, and Ava is noticeably faster.

    In the end, I settled on Jasmine, although I imagine Mocha would be equally suitable. There are three technical issues I’d like to talk about: how I write the Jasmine specs in ES2015 JavaScript, how and why I use Chai, and how to set up and tear down the test database.

    ES2015

    Only two words are needed to explain why this is so important here:

    async/await

    (I know - technically, it’s not part of the ES2015 spec, but let’s dispense with the pedantry here.)

    Thankfully, jasmine-es6 exists, and installing it works exactly the same way as plain Jasmine. It ships with async support out of the box.

    Chai

    Jasmine ships with its own BDD-style expect assertions, but I chose to overwrite them in favour of Chai’s assertions, which come with a much richer plugin ecosystem. In particular, the existence of chai-http prompted the switch. chai-http provides assertions for HTTP testing, as well as a thin superagent wrapper with which to make requests. Perfecto!

    It’s not really difficult to roll your own assertions and request wrapper, as I did with Ava, but why bother if you can piggyback on the hard work of others?

    Database Setup/Teardown

    Setup is quite straightforward - depending on what server framework you’re using, configure it (ideally using environment variables passed in through the command line) to connect to a test database using a different set of credentials from your usual development credentials. A sketch of such a config is shown below.
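
    For instance, here is a minimal sketch of the kind of config module the snippets below require as ../../config - the file name, fields and credentials are illustrative, not this project’s actual config:

    // config.js (illustrative sketch)
    var env = process.env.NODE_ENV || 'development'

    module.exports = {
      env: env,
      rethinkdb: {
        development: { host: 'localhost', port: 28015, db: 'myapp_dev' },
        test:        { host: 'localhost', port: 28015, db: 'myapp_test' }
      }
    }

    The server would then connect using config.rethinkdb[config.env], so starting it with NODE_ENV=test (as the start-test-server script does later) points everything at the test database.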

    I also reset the database in between each narrative (one spec file, or top-level describe block, in Jasmine terms). I find that this is a good balance between not resetting at all, which would make keeping track of database state untenable, and resetting after each expectation, which makes setup and teardown much more tedious and slows testing down (e.g. registering a user before each expectation).

    With that in mind, a good rule of thumb emerges. If a narrative becomes so long as to make the database state confusing to reason about, it’s probably time to split it up.

    As for database teardown, I rolled my own solution. For this particular project, I’m using thinky, an ORM for RethinkDB. thinky exposes the RethinkDB driver as r, which allows me to write this:

    // spec/utils/teardown.js
    var config = require('../../config') // this file contains database connection credentials
    var thinky = require('thinky')(config.rethinkdb.test)
    
    thinky.r.dbDrop(config.rethinkdb.test.db).run(function() {
      console.log('Tests complete. Test database dropped.')
      process.exit()
    })
    

    which can then be run after the tests are complete:

    "scripts": {
      "start-test-server": "env NODE_ENV=test nodemon index.js",
      "test": "jasmine ; node ./spec/utils/teardown.js"
    }
    

    Generally speaking, as long as you have access to the exposed database driver, you can write a variant of the above.

    Code Examples

    Below, I show an abridged snippet from the test suite I wrote using this strategy:

    // spec/AuthSpec.js
    import chai from 'chai'
    import chaiHttp from 'chai-http'
    import { resetTables } from './helpers/databaseHelper'
    chai.use(chaiHttp)
    
    // overwrite Jasmine's expect global with Chai's
    const expect = chai.expect
    const req    = chai.request('http://localhost:9005/')
    
    // helper function to avoid ugly try/catch clauses in async calls
    async function tryCatch(promise) {
      try { return await promise }
      catch(e) { return e }
    }
    
    describe("Authentication", () => {
    
      // we reset the tables before and after each spec
      beforeAll(async () => await resetTables())
      afterAll(async () => await resetTables())
    
      it("should fail registration without any parameters", async () => {
        const res = await tryCatch(req
          .post('auth/register')
        )
        expect(res).to.have.status(422)
      })
    
      it("should pass registration with appropriate email and password", async () => {
        const res = await req
          .post('auth/register')
          .send({ email: 'a@a.com', password: 12341234 })
        expect(res).to.have.status(200)
        expect(res.body).to.have.all.keys(['token'])
      })
    
      it("should fail registration with the same email", async () => {
        const res = await tryCatch(req
          .post('auth/register')
          .send({ email: 'a@a.com', password: 12341234 })
        )
        expect(res).to.have.status(422)
      })
    
      it("should should pass login with correct email and password", async () => {
        const res = await req
          .post('auth/login')
          .send({ email: 'a@a.com', password: 12341234 })
        expect(res).to.have.status(200)
        expect(res.body).to.have.all.keys(['token'])
      })
    
      it("should should fail login with incorrect email and password", async () => {
        const res = await tryCatch(req
          .post('auth/login')
          .send({ email: 'b@a.com', password: 12341234 })
        )
        expect(res).to.have.status(401)
      })
    
    })
    

    The code for resetting the database tables between each spec is as follows:

    // spec/helpers/databaseHelper.js
    var config = require('../../config')
    var thinky = require('thinky')(config.rethinkdb.test)
    var r      = thinky.r
    var testDb = config.rethinkdb.test.db
    
    export async function resetTables() {
    
      // first, get the list of all the tables in the database
      const tableList = await r.db(testDb).tableList()
    
      // then we create an Array of all the promises returned by r.table(table).delete() and await on them to complete before the function returns
      await Promise.all(tableList.map(table => r.table(table).delete()))
    
    }
    

    I spent a week, on and off, working on refining this strategy, and I hope it will prove to be portable across future projects for me.

  • Submit Behaviour in the button HTML Element

    Not sure how I only found out about this today, but a <button> HTML element without a type attribute defaults to the submit type. Test it yourself with this test rig:

    <html>
    <body>
      <form id="test">
        <input type="text" />
        <button type="submit">Button with Submit Type</button>
        <button>Button With No Type</button>
        <button type="button">Button with Button Type</button>
        <button type="reset">Button With Reset Type</button>
      </form>
    
      <script>
        var form = document.getElementById('test')
        form.addEventListener('submit', function(e) {
          e.preventDefault()
          console.log('form submitted!')
        }, false)
      </script>
    </body>
    </html>
    

    So if you still want to have buttons in the form that do not trigger the submit event, you have to explicitly give them a type of button.

    This is confirmed by the W3C spec:

    The missing value default is the Submit Button state.
    - https://www.w3.org/TR/2011/WD-html5-20110525/the-button-element.html#attr-button-type

  • Preprocessing in Searchkick

    According to Searchkick’s documentation, one can control what data is indexed with the search_data method. It’s not immediately apparent, but what actually happens is that if a search_data instance method is present, its return value is what goes into Elasticsearch when the model is indexed.

    With this in mind, we can use it to preprocess data before it goes into Elasticsearch. Searchkick’s documentation shows a trivial example:

    class Product < ActiveRecord::Base
      def search_data
        as_json only: [:name, :active]
        # or equivalently
        {
          name: name,
          active: active
        }
      end
    end
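
    One thing worth remembering (standard Searchkick behaviour, not specific to this example): documents that are already in the index keep their old shape, so after changing search_data you need to trigger a reindex:

    Product.reindex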
    

    But we can go further. For example, I have a serialized field in my example Post model called metadata which might take the following form (don’t go hating on the data modelling - sometimes one has no choice when taking on legacy code):

    > Post.first.metadata
    => [{"type"=>"title", "content"=>"some title"}, {"type"=>"category", "content"=>"a category"}]
    

    We can’t be chucking the entire serialized metadata in - there’d be stuff like "type"=>"title" in there, which would really mess up the text searching.

    Instead, we can do something like:

    def search_data
      {
        content: content,
        # index only the human-readable content strings from the serialized metadata
        metadata: { body: body, metadata: metadata.map { |x| x["content"] }.join(" ") }
      }
    end
    

    which will concatenate all of the actual content before assigning it to the metadata field.

    This is a trivial example. Because search_data is an instance method, you have access to everything a typical Active Record instance method has access to, which gives you a lot of latitude.
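
    For instance (a hypothetical sketch - the author association and word_count field are made up for illustration), search_data can just as easily pull in associations or computed values:

    class Post < ActiveRecord::Base
      belongs_to :author

      def search_data
        {
          content: content,
          # reach across an association and compute a derived value
          author_name: author && author.name,
          word_count: content.to_s.split.size
        }
      end
    end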