A river flowing from PostgreSQL to Elasticsearch

So, we wanted to try the querying power of Elasticsearch on top of our PostgreSQL database. Using the JDBC river plugin, we can import our tables by setting up a river, which looks a little bit like this:

curl -XPUT "localhost:9200/_river/my_table/_meta" -d ' {  
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:postgresql://localhost:5432/my_database",
        "user" : "a_great_username",
        "password" : "a_super_secret_password",
        "sql" : "select * from my_table;"
        "index" : "jdbc_index",
        "type" : "jdbc_type",
        "strategy" : "oneshot"
    }
}'

Note that the jdbc index and type are for you to set up; they make your data accessible at http://localhost:9200/jdbc_index/jdbc_type/.

The river will try its best to give your table fields the correct type in Elasticsearch, but if you want to set up custom type mappings, you can add another key in "jdbc" called "type_mapping". Be careful, though: its value has to be a JSON string, so if you write it by hand you're going to end up with something like this:

"type_mapping" : "{\"jdbc_type\":{\"properties\": {\"field\" : { \"type\" : \"string\" ... }}}}"

To simplify this a bit, we can set up the type mapping manually after setting up the river.

curl -XPUT "localhost:9200/jdbc_index/jdbc_type/_mapping" -d ' {  
    "jdbc_type" : {
        "properties" : {
            "field" : {
                "type" : "string",
                ...
            }
        }
    }
}'

You've probably noticed the "strategy" : "oneshot" property in the first JSON. This runs the query right away, but only once. After that we can call the same _river again with the "simple" strategy, setting up a cron "schedule" that will poll our database every hour. You could skip the oneshot strategy entirely, but then you would have to wait until the schedule first fires, and you wouldn't want to wait an hour!

curl -XPUT "localhost:9200/_river/my_table/_meta" -d ' {  
    "type" : "jdbc",
    "jdbc" : {
        "url" : "jdbc:postgresql://localhost:5432/my_database",
        "user" : "a_great_username",
        "password" : "a_super_secret_password",
        "sql" : "select * from my_table;"
        "index" : "jdbc_index",
        "type" : "jdbc_type",
        "autocommit" : true,
        "schedule" : "0 0 * * * ?",
        "strategy" : "simple"
    }
}'
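The "schedule" value is a Quartz-style cron expression, which starts with a seconds field (unlike classic Unix cron). Splitting it out makes the "every hour" claim easy to check:

```python
# Quartz cron expressions have six (or seven) fields, starting with
# seconds: second minute hour day-of-month month day-of-week
schedule = "0 0 * * * ?"
names = ["second", "minute", "hour", "day_of_month", "month", "day_of_week"]
fields = dict(zip(names, schedule.split()))

# second and minute are both 0, every hour/day matches:
# the trigger fires hourly, on the hour
print(fields)
```

The "?" in the day-of-week field is Quartz's "no specific value" placeholder, used because day-of-month already has a wildcard.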

Now let's put some Rivers on some tables


You can keep adding one river per table, but don't refrain from using a table JOIN in your SQL if you need some data that you won't query as a separate entity. If you do want to keep the table relationships, one cool thing you can do is alias your primary key to _id and your foreign key to _parent:

"sql" : "select id as _id, relation_id as _parent ..."

The river will set up parent/child types in Elasticsearch. Be aware that you have to set up and fill your parent river before the child one, or the universe as we know it may cease to exist.
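With the parent/child mapping in place, you can search parents by matching their children using a has_child query. A sketch of the request body, reusing the hypothetical index, type, and field names from above:

```python
import json

# A has_child query: return parent documents that have at least one
# child of the given type matching the inner query. "jdbc_type" and
# "field" follow the illustrative examples above; "some_value" is
# a placeholder term.
query = {
    "query": {
        "has_child": {
            "type": "jdbc_type",
            "query": {"term": {"field": "some_value"}}
        }
    }
}

# This body would be POSTed to http://localhost:9200/jdbc_index/_search
print(json.dumps(query, indent=2))
```

There is a mirror-image has_parent query as well, for fetching children based on properties of their parent.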

Now with all this setup, you can start doing some awesome queries!