CouchDB Query Protocol
16/Nov 2016
Recently at $DAYJOB
I had the opportunity to look deeper into how CouchDB indexing works and compared the performance of different indexers. Hopefully this series of articles will shed some light on how to get the most out of CouchDB’s view indexing.
In this article we’ll be going over the concept of CouchDB views and find out just how CouchDB indexes documents with the help of external query servers.
The version of CouchDB we will be using is 1.6.
CouchDB views
CouchDB is a document-oriented database that’s super easy to get started. You serialize your data into JSON, and throw it on the wire to CouchDB via an HTTP interface. To get the data out (querying) though, you will need to write “design docs” that contain map/reduce functions. Internally, CouchDB uses the map/reduce functions to build up a B+Tree index, so querying against that view is a simple tree lookup.
Out-of-the-box, CouchDB supports views written in JavaScript, however, support for other languages are available thanks to the CouchDB query server protocol. CouchDB process itself does not do any querying. It simply loops through all the changes that have not been indexed since the last time, and send the changes to the external query server.
MapReduce view updater
View indexing is triggered by the map-reduce view updater.
The following snippet is where it happens (couch_mrview/src/couch_mrview_updater.erl
):
map_docs(Parent, State0) ->
case couch_work_queue:dequeue(State0#mrst.doc_queue) of
...
{ok, Dequeued} ->
...
QServer = State1#mrst.qserver,
DocFun = fun
...
({Id, Seq, Doc}, {SeqAcc, Results}) ->
{ok, Res} = couch_query_servers:map_doc_raw(QServer, Doc),
{erlang:max(Seq, SeqAcc), [{Id, Res} | Results]}
end,
...
couch_work_queue:queue(State1#mrst.write_queue, Results),
map_docs(Parent, State1)
end.
and couch_query_servers:map_doc_raw
is encodes the document into JSON and sends it over to the external query server process:
map_doc_raw(Proc, Doc) ->
Json = couch_doc:to_json_obj(Doc, []),
{ok, proc_prompt_raw(Proc, [<<"map_doc">>, Json])}.
...
proc_prompt_raw(#proc{prompt_fun = {Mod, Func}} = Proc, Args) ->
apply(Mod, Func, [Proc#proc.pid, Args]).
External Query Server
The document is sent along with the action to be performed (map_doc) to beh external query server process’ stdin. Let’s have a look at how an external query server is implemented. We’ll take couchpy as an example but it applies to any external query servers.
When a map_docs
command is received via stdin of the couchpy process, the following code is being executed:
def map_doc(doc):
results = []
for function in functions:
try:
results.append([[key, value] for key, value in function(doc)])
except Exception as e:
log.error('runtime error in map function: %s', e,
exc_info=True)
results.append([])
_log(traceback.format_exc())
return results
Nothing too special here. The doc is passed to all functions
inside the closure and the output of the function is gathered into a list to be returned. functions
here is a variable available to the map_doc
closure and is initialized by the add_lib
call, which is triggered when a view is being indexed.
def add_fun(string):
string = BOM_UTF8 + string.encode('utf-8')
globals_ = {}
try:
util.pyexec(string, {'log': _log}, globals_)
except Exception as e:
return {'error': {
'id': 'map_compilation_error',
'reason': e.args[0]
}}
err = {'error': {
'id': 'map_compilation_error',
'reason': 'string must eval to a function '
'(ex: "def(doc): return 1")'
}}
if len(globals_) != 1:
return err
function = list(globals_.values())[0]
if type(function) is not FunctionType:
return err
functions.append(function)
return True
util.pyexec
is a simple wrapper around the exec
keyword in Python that turns code string into a function object:
def pyexec(code, gns, lns):
exec(code, gns, lns)
What this tells us is that CouchDB’s view functions inside the same design document get indexed together. If you have a design document that contains multiple view functions that operate on different types of documents, this could negatively impact your performance. It’s usually a good idea to have one view function per design document, unless the view functions have very strong cohesion.
CouchDB Query Protocol in Practice
Now that we looked at how CouchDB invokes external query servers and how an external query server respond to certain protocol commands, let’s put this into practice. We’d like to examine what’s going on when a view is being indexed. We’ll use couchpy again as an example.
First, let’s create a couple of docs, each with a key “type” to indicate the type of documents:
$ curl -XPOST -H"Content-Type:application/json" http://localhost:5984/test -d'{"type": "foo"}'
{"ok":true,"id":"db13c4d7dbf11110f1eadce071001e87","rev":"1-52308d7a89c9fb18c4c4e47d759834ed"}
$ curl -XPOST -H"Content-Type:application/json" http://localhost:5984/test -d'{"type": "foo"}'
{"ok":true,"id":"db13c4d7dbf11110f1eadce071000f53","rev":"1-52308d7a89c9fb18c4c4e47d759834ed"}
$ curl -XPOST -H"Content-Type:application/json" http://localhost:5984/test -d'{"type": "bar"}'
{"ok":true,"id":"db13c4d7dbf11110f1eadce07100189b","rev":"1-17ac4eb1776289a8d73ea6990401c56a"}
Then let’s create a design doc that has a view function to map docs according to their types:
def map(doc):
yield doc.get('type', None), None
Save it as _design/byType
and view name: byType
.
To turn on couchpy logging, we can update CouchDB configuration via Futon. Go to http://localhost:5984/_utils/config.html and find key python
the section query_servers
, change the command to /usr/local/bin/couchpy --debug --log-file=/tmp/couchpy.log
. Erlang config change does not require server reload, so this should take effect immediately.
Next, let’s tail the log file:
log -F /tmp/couchpy.log
Now, trigger the view index by querying the view you have just created:
curl http://localhost:5984/test/_design/byType/_view/byType
You should be seeing in /tmp/couchpy.log
:
[2016-12-05 23:52:31,380] [DEBUG] Processing [u'reset', {u'timeout': 5000, u'reduce_limit': True}]
[2016-12-05 23:52:31,381] [DEBUG] Returning True
[2016-12-05 23:52:31,381] [DEBUG] Processing [u'add_fun', u"def map(doc):\n yield doc.get('type', None), None"]
[2016-12-05 23:52:31,381] [DEBUG] Returning True
[2016-12-05 23:52:31,381] [DEBUG] Processing [u'map_doc', {u'_rev': u'1-17ac4eb1776289a8d73ea6990401c56a', u'_id': u'db13c4d7dbf11110f1eadce071002ee1', u'type': u'bar'}]
[2016-12-05 23:52:31,381] [DEBUG] Returning [[[u'bar', None]]]
[2016-12-05 23:52:31,382] [DEBUG] Processing [u'map_doc', {u'_rev': u'1-52308d7a89c9fb18c4c4e47d759834ed', u'_id': u'db13c4d7dbf11110f1eadce071002546', u'type': u'foo'}]
[2016-12-05 23:52:31,382] [DEBUG] Returning [[[u'foo', None]]]
[2016-12-05 23:52:31,382] [DEBUG] Processing [u'map_doc', {u'_rev': u'1-52308d7a89c9fb18c4c4e47d759834ed', u'_id': u'db13c4d7dbf11110f1eadce071001e87', u'type': u'foo'}]
[2016-12-05 23:52:31,382] [DEBUG] Returning [[[u'foo', None]]]
There we have it! CouchDB querying in action!
reset
[u'reset', {u'timeout': 5000, u'reduce_limit': True}]
The first step is a reset
operation when the internal state of the query server is cleared. Couchpy simply clears the functions
list:
def reset(config=None):
del functions[:]
return True
Couchpy seems to be ignoring the timeout
and reduce_limit
parameters passed in. Hrm…
add_fun
[u'add_fun', u"def map(doc):\n yield doc.get('type', None), None"]
This is the next stage of indexing. CouchDB server sends the functions in the design doc to the query server. As we have seen earlier, couchpy calls Python’s exec
on the code, and save it as function objects internally.
map_doc
[u'map_doc', {u'_rev': u'1-17ac4eb1776289a8d73ea6990401c56a', u'_id': u'db13c4d7dbf11110f1eadce071002ee1', u'type': u'bar'}]
After the internal state of the query server is setup, CouchDB will pass all eligible documents to the query server along with the operation map_doc
. This will invoke all functions added via add_fun
call in the query server’s process on the document being passed.
The result of the function is collected into a list and returned to CouchDB via the processes’ stdout.
Index New Changes
Adding a New Document
Now what happens if we add another document to the database?
Let’s try it.
$ curl -XPOST -H"Content-Type:application/json" http://localhost:5984/test -d'{"type": "bar"}'
{"ok":true,"id":"db13c4d7dbf11110f1eadce071003893","rev":"1-17ac4eb1776289a8d73ea6990401c56a"}
There we added a second bar
document. Observe /tmp/couchpy.log
. Nothing! That’s right. Views in CouchDB 1.x does not get indexed until the view is being queried. If you’re using CouchDB 1.x, it pays to have a cron job that periodically query all your views just to keep the views “warm”, so your less-frequently used views are kept up to update and don’t get hit with long indexing time when they do get queried.
Let’s trigger query the view:
$ curl http://localhost:5984/test/_design/byType/_view/byType
{"total_rows":4,"offset":0,"rows":[
{"id":"db13c4d7dbf11110f1eadce071002ee1","key":"bar","value":null},
{"id":"db13c4d7dbf11110f1eadce071003893","key":"bar","value":null},
{"id":"db13c4d7dbf11110f1eadce071001e87","key":"foo","value":null},
{"id":"db13c4d7dbf11110f1eadce071002546","key":"foo","value":null}
]}
and now we get something in couchpy.log
file:
[2016-12-06 00:49:22,327] [DEBUG] Processing [u'reset', {u'timeout': 5000, u'reduce_limit': True}]
[2016-12-06 00:49:22,327] [DEBUG] Returning True
[2016-12-06 00:49:22,327] [DEBUG] Processing [u'add_fun', u"def map(doc):\n yield doc.get('type', None), None "]
[2016-12-06 00:49:22,328] [DEBUG] Returning True
[2016-12-06 00:49:22,328] [DEBUG] Processing [u'map_doc', {u'_rev': u'1-17ac4eb1776289a8d73ea6990401c56a', u'_id': u'db13c4d7dbf11110f1eadce071003893', u'type': u'bar'}]
[2016-12-06 00:49:22,328] [DEBUG] Returning [[[u'bar', None]]]
As you can see, reset
and add_fun
are called again, because couchpy processes do not preserve their internal state once a view is finish indexing, and the map_doc
operation is only called on the new document.
Updating a Document
Let’s now change one of the docs:
$ curl -XPOST -H"Content-Type:application/json" http://localhost:5984/test -d'{"_id": "db13c4d7dbf11110f1eadce071001e87", "_rev": "1-52308d7a89c9fb18c4c4e47d759834ed", "type": "quux"}'
{"ok":true,"id":"db13c4d7dbf11110f1eadce071001e87","rev":"2-2f95b6ed98722bf4a7d8750faa316313"}
and trigger view indexing. See what we’ve got in couchpy logs:
[2016-12-06 00:55:33,411] [DEBUG] Processing [u'reset', {u'timeout': 5000, u'reduce_limit': True}]
[2016-12-06 00:55:33,412] [DEBUG] Returning True
[2016-12-06 00:55:33,412] [DEBUG] Processing [u'add_fun', u"def map(doc):\n yield doc.get('type', None), None "]
[2016-12-06 00:55:33,412] [DEBUG] Returning True
[2016-12-06 00:55:33,412] [DEBUG] Processing [u'map_doc', {u'_rev': u'2-2f95b6ed98722bf4a7d8750faa316313', u'_id': u'db13c4d7dbf11110f1eadce071001e87', u'type': u'quux'}]
[2016-12-06 00:55:33,412] [DEBUG] Returning [[[u'quux', None]]]
No surprises here. Couchpy only gets sent the new revision the doc was changed to. The query server does not need to know anything else.
Deleting a Document
What about deletes? Now remember in CouchDB, deleting a document does not zero out the content, but rather, it creates a new revision of the document, marking it as “deleted”.
$ curl -XDELETE http://localhost:5984/test/db13c4d7dbf11110f1eadce071001e87?rev=2-2f95b6ed98722bf4a7d8750faa316313
{"ok":true,"id":"db13c4d7dbf11110f1eadce071001e87","rev":"3-0dbaf84cadb9754e6619a10159852d4e"}
Documents marked as deleted won’t be returned by any queries. Now let’s have a look at how the query server respond to the request. Trigger the view query and observe /tmp/couchpy.log
file.
[2016-12-06 01:01:59,143] [DEBUG] Processing [u'reset', {u'timeout': 5000, u'reduce_limit': True}]
[2016-12-06 01:01:59,144] [DEBUG] Returning True
[2016-12-06 01:01:59,144] [DEBUG] Processing [u'add_fun', u"def map(doc):\n yield doc.get('type', None), None "]
[2016-12-06 01:01:59,144] [DEBUG] Returning True
Nothing! That’s right, even though the indexer was triggered, no other commands were sent to the query server. Deleted documents only get excluded from the final result, not from the view index file itself. To get rid of the deleted documents from the view, you will have to trigger a view compaction.
Reduce
We have been ignoring one important functionality of CouchDB view so far - reduce functions. Reduce functions let you aggregate the mapped data. For example, if we want to sum up the number of docs per type, we will be providing the view a reduce function.
def reduce(keys, values):
return len(keys)
Here we’re counting the number of keys. Save the reduce function into that view, trigger it with
$ curl http://localhost:5984/test/_design/byType/_view/byType?group=true
{"rows":[
{"key":"bar","value":2},
{"key":"foo","value":1}
]}
and observe the output in couchpy.log
:
[2016-12-06 01:18:55,286] [DEBUG] Processing [u'reset', {u'timeout': 5000, u'reduce_limit': True}]
[2016-12-06 01:18:55,286] [DEBUG] Returning True
[2016-12-06 01:18:55,286] [DEBUG] Processing [u'add_fun', u"def map(doc):\n yield doc.get('type', None), None "]
[2016-12-06 01:18:55,286] [DEBUG] Returning True
[2016-12-06 01:18:55,286] [DEBUG] Processing [u'map_doc', {u'_rev': u'1-17ac4eb1776289a8d73ea6990401c56a', u'_id': u'db13c4d7dbf11110f1eadce071003893', u'type': u'bar'}]
[2016-12-06 01:18:55,286] [DEBUG] Returning [[[u'bar', None]]]
[2016-12-06 01:18:55,287] [DEBUG] Processing [u'map_doc', {u'_rev': u'1-17ac4eb1776289a8d73ea6990401c56a', u'_id': u'db13c4d7dbf11110f1eadce071002ee1', u'type': u'bar'}]
[2016-12-06 01:18:55,287] [DEBUG] Returning [[[u'bar', None]]]
[2016-12-06 01:18:55,287] [DEBUG] Processing [u'map_doc', {u'_rev': u'1-52308d7a89c9fb18c4c4e47d759834ed', u'_id': u'db13c4d7dbf11110f1eadce071002546', u'type': u'foo'}]
[2016-12-06 01:18:55,287] [DEBUG] Returning [[[u'foo', None]]]
[2016-12-06 01:31:51,135] [DEBUG] Processing [u'reset', {u'timeout': 5000, u'reduce_limit': True}]
[2016-12-06 01:31:51,135] [DEBUG] Returning True
[2016-12-06 01:31:51,135] [DEBUG] Processing [u'reduce', [u'def reduce(keys, values):\n return len(keys)'], [[[u'bar', u'db13c4d7dbf11110f1eadce071003893'], None], [[u'bar', u'db13c4d7dbf11110f1eadce071002ee1'], None]]]
[2016-12-06 01:31:51,135] [DEBUG] Returning [True, [2]]
[2016-12-06 01:31:51,136] [DEBUG] Processing [u'reset', {u'timeout': 5000, u'reduce_limit': True}]
[2016-12-06 01:31:51,137] [DEBUG] Returning True
[2016-12-06 01:31:51,137] [DEBUG] Processing [u'reduce', [u'def reduce(keys, values):\n return len(keys)'], [[[u'foo', u'db13c4d7dbf11110f1eadce071002546'], None]]]
[2016-12-06 01:31:51,137] [DEBUG] Returning [True, [1]]
First you will notice is that since the design document is changed, all documents (changes) are reindexed. If you have a huge database, you may want to think twice before updating a design doc. Better yet, don’t ever update the design document, but rather, migrate to a new design document (and finish indexing) and cut over to the new design document.
Then there are the reduce
commands from CouchDB server to the query server. The reducer is sent with the result of the map function in the forms of keys
, values
. Since we don’t emit any values, values
will always be None
.
Now our test database only contains 3 documents. Let’s make it more interesting and add a lot more documents in.
Let’s add 100 bar
documents and 100 foo
documents:
for x in {1..100}; do curl -XPOST -H"Content-Type:application/json" http://localhost:5984/test -d'{"type": "bar"}'; done
for x in {1..100}; do curl -XPOST -H"Content-Type:application/json" http://localhost:5984/test -d'{"type": "foo"}'; done
Let’s call the reducer:
$ curl http://localhost:5984/test/_design/byType/_view/byType?group=true
{"error":"os_process_error","reason":"{exit_status,1}"}
Hrmm… we got an error! Couchpy process crashed. Here’s another complaint I have about CouchDB. View errors are very opaque. If you look at CouchDB logs, you will see a huge Erlang stack trace. Not much help there. However, the couchpy log does give us something useful. Maybe the query protocol can be amended to allow query servers to return more descriptive error message and having CouchDB relaying these error messages back to the user?
Anyhow, the error message we have in the couchpy log are:
[2016-12-06 01:34:59,213] [DEBUG] Processing [u'rereduce', [u'def reduce(keys, values):\n return len(keys)'], [14, 18, 18, 18, 18, 18]]
[2016-12-06 01:34:59,214] [ERROR] Error: object of type 'NoneType' has no len()
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/couchdb/view.py", line 146, in run
retval = handlers[cmd[0]](*cmd[1:])
File "/usr/local/lib/python2.7/dist-packages/couchdb/view.py", line 129, in rereduce
return reduce(*cmd, **{'rereduce': True})
File "/usr/local/lib/python2.7/dist-packages/couchdb/view.py", line 124, in reduce
results = function(keys, vals)
File "<string>", line 2, in reduce
TypeError: object of type 'NoneType' has no len()
Interesting! The operation that failed was actually rereduce
. Rereduce
is a CouchDB’s optimization applied to divide-and-conquer large data sets. During rereduce
, no keys
are sent and only values
are sent with rereduce
flag set to True
.
Let’s modify our reduce
function to take this into account.
def reduce(keys, values, rereduce):
if rereduce:
return sum(values)
return len(keys)
and trigger the view:
$ curl http://localhost:5984/test/_design/byType/_view/byType?group=true
{"rows":[
{"key":"bar","value":102},
{"key":"foo","value":101}
]}
That’s it! And if we take a look at how CouchDB distributes documents to the reducer:
[2016-12-06 01:47:16,882] [DEBUG] Processing [u'reset', {u'timeout': 5000, u'reduce_limit': True}]
[2016-12-06 01:47:16,882] [DEBUG] Returning True
[2016-12-06 01:47:16,882] [DEBUG] Processing [u'reduce', [u'def reduce(keys, values, rereduce):\n if rereduce:\n return sum(values)\n return len(keys)'], [[[u'bar', u'db13c4d7dbf11110f1eadce07103bf77'], None], [[u'bar', u'db13c4d7dbf11110f1eadce07103b6b2'], None], [[u'bar', u'db13c4d7dbf11110f1eadce07103a903'], None], [[u'bar', u'db13c4d7dbf11110f1eadce07103a06c'], None], [[u'bar', u'db13c4d7dbf11110f1eadce071039212'], None], [[u'bar', u'db13c4d7dbf11110f1eadce071038246'], None], [[u'bar', u'db13c4d7dbf11110f1eadce0710372d6'], None], [[u'bar', u'db13c4d7dbf11110f1eadce071036338'], None], [[u'bar', u'db13c4d7dbf11110f1eadce071035d9e'], None], [[u'bar', u'db13c4d7dbf11110f1eadce071035735'], None], [[u'bar', u'db13c4d7dbf11110f1eadce071035594'], None], [[u'bar', u'db13c4d7dbf11110f1eadce071034722'], None], [[u'bar', u'db13c4d7dbf11110f1eadce071033ef4'], None], [[u'bar', u'db13c4d7dbf11110f1eadce071033105'], None], [[u'bar', u'db13c4d7dbf11110f1eadce071032d0f'], None], [[u'bar', u'db13c4d7dbf11110f1eadce071031f75'], None], [[u'bar', u'db13c4d7dbf11110f1eadce071031b47'], None], [[u'bar', u'db13c4d7dbf11110f1eadce07100b399'], None], [[u'bar', u'db13c4d7dbf11110f1eadce07100a846'], None], [[u'bar', u'db13c4d7dbf11110f1eadce07100987c'], None], [[u'bar', u'db13c4d7dbf11110f1eadce071009107'], None], [[u'bar', u'db13c4d7dbf11110f1eadce071008240'], None], [[u'bar', u'db13c4d7dbf11110f1eadce0710080e9'], None], [[u'bar', u'db13c4d7dbf11110f1eadce071007510'], None], [[u'bar', u'db13c4d7dbf11110f1eadce071007295'], None], [[u'bar', u'db13c4d7dbf11110f1eadce071007169'], None], [[u'bar', u'db13c4d7dbf11110f1eadce071006178'], None], [[u'bar', u'db13c4d7dbf11110f1eadce07100600b'], None], [[u'bar', u'db13c4d7dbf11110f1eadce07100591f'], None], [[u'bar', u'db13c4d7dbf11110f1eadce071004c04'], None], [[u'bar', u'db13c4d7dbf11110f1eadce07100425a'], None], [[u'bar', u'db13c4d7dbf11110f1eadce071003aa3'], None], [[u'bar', u'db13c4d7dbf11110f1eadce071003893'], None], [[u'bar', u'db13c4d7dbf11110f1eadce071002ee1'], None]]]
[2016-12-06 01:47:16,882] [DEBUG] Returning [True, [34]]
[2016-12-06 01:47:16,883] [DEBUG] Processing [u'reset', {u'timeout': 5000, u'reduce_limit': True}]
[2016-12-06 01:47:16,883] [DEBUG] Returning True
[2016-12-06 01:47:16,883] [DEBUG] Processing [u'rereduce', [u'def reduce(keys, values, rereduce):\n if rereduce:\n return sum(values)\n return len(keys)'], [17, 17, 17, 17, 34]]
[2016-12-06 01:47:16,883] [DEBUG] Returning [True, [102]]
[2016-12-06 01:47:16,884] [DEBUG] Processing [u'reset', {u'timeout': 5000, u'reduce_limit': True}]
[2016-12-06 01:47:16,884] [DEBUG] Returning True
[2016-12-06 01:47:16,884] [DEBUG] Processing [u'reduce', [u'def reduce(keys, values, rereduce):\n if rereduce:\n return sum(values)\n return len(keys)'], [[[u'foo', u'db13c4d7dbf11110f1eadce07106d438'], None], [[u'foo', u'db13c4d7dbf11110f1eadce07106ccf4'], None], [[u'foo', u'db13c4d7dbf11110f1eadce07106bec2'], None], [[u'foo', u'db13c4d7dbf11110f1eadce07106bcce'], None], [[u'foo', u'db13c4d7dbf11110f1eadce07106b7ec'], None], [[u'foo', u'db13c4d7dbf11110f1eadce07106b0d0'], None], [[u'foo', u'db13c4d7dbf11110f1eadce07106a6d4'], None], [[u'foo', u'db13c4d7dbf11110f1eadce07106a50b'], None], [[u'foo', u'db13c4d7dbf11110f1eadce07106a12e'], None], [[u'foo', u'db13c4d7dbf11110f1eadce0710692e5'], None], [[u'foo', u'db13c4d7dbf11110f1eadce071068702'], None], [[u'foo', u'db13c4d7dbf11110f1eadce071067f9c'], None], [[u'foo', u'db13c4d7dbf11110f1eadce07106780f'], None], [[u'foo', u'db13c4d7dbf11110f1eadce071066912'], None], [[u'foo', u'db13c4d7dbf11110f1eadce07106665a'], None], [[u'foo', u'db13c4d7dbf11110f1eadce071065d4b'], None], [[u'foo', u'db13c4d7dbf11110f1eadce0710437d2'], None], [[u'foo', u'db13c4d7dbf11110f1eadce071042cff'], None], [[u'foo', u'db13c4d7dbf11110f1eadce071042b61'], None], [[u'foo', u'db13c4d7dbf11110f1eadce071042375'], None], [[u'foo', u'db13c4d7dbf11110f1eadce071041800'], None], [[u'foo', u'db13c4d7dbf11110f1eadce07104172e'], None], [[u'foo', u'db13c4d7dbf11110f1eadce071040aa1'], None], [[u'foo', u'db13c4d7dbf11110f1eadce0710404f1'], None], [[u'foo', u'db13c4d7dbf11110f1eadce0710402d4'], None], [[u'foo', u'db13c4d7dbf11110f1eadce071040234'], None], [[u'foo', u'db13c4d7dbf11110f1eadce07103f423'], None], [[u'foo', u'db13c4d7dbf11110f1eadce07103e63a'], None], [[u'foo', u'db13c4d7dbf11110f1eadce07103d6c0'], None], [[u'foo', u'db13c4d7dbf11110f1eadce07103c7df'], None], [[u'foo', u'db13c4d7dbf11110f1eadce07103c6c0'], None], [[u'foo', u'db13c4d7dbf11110f1eadce07103c183'], None], [[u'foo', u'db13c4d7dbf11110f1eadce071002546'], None]]]
[2016-12-06 01:47:16,884] [DEBUG] Returning [True, [33]]
[2016-12-06 01:47:16,885] [DEBUG] Processing [u'reset', {u'timeout': 5000, u'reduce_limit': True}]
[2016-12-06 01:47:16,885] [DEBUG] Returning True
[2016-12-06 01:47:16,885] [DEBUG] Processing [u'rereduce', [u'def reduce(keys, values, rereduce):\n if rereduce:\n return sum(values)\n return len(keys)'], [17, 17, 17, 17, 33]]
[2016-12-06 01:47:16,885] [DEBUG] Returning [True, [101]]
Because we turned on the group
flag, CouchDB will send the keys of the same group down to the reducer. When the reducer finishes and returns the results, CouchDB further sends the collected results to query server with the rereduce
command to further aggregate the results.
For more information regarding reduce
vs rereduce
, see the section from CouchDb: The Definitive Guide.
Conclusion
The CouchDB query protocol is hauntingly simple. It uses JSON for inter-process communication and stdin/stdout as communication channel - very unixy and very easy to understand and to extend. For example, it wouldn’t take too long for a seasoned programmer to write a Ruby query server to support Ruby views. However, it may not be the most efficient way to query the document store as we will find out in the next article.