CouchDB

Apache CouchDB is a document-oriented database that can be queried using JavaScript in a MapReduce style. CouchDB also offers incremental replication with bi-directional conflict detection and resolution.

Contents


  • 1 Features
  • 2 Documentary database without schema
  • 3 MapReduce-style queryable
    • 3.1 Map and Reduce
  • 4 Accessible by REST
  • 5 Integrated replication
  • 6 Scalable Transactions for Cloud Web Applications
  • 7 APIs for access to CouchDB
  • 8 Downloading Software
  • 9 Sources

Features

CouchDB offers a JSON API that can be accessed from any environment that allows HTTP requests. Many third-party client libraries make this even easier from your programming language of choice. CouchDB's built-in web administration console talks directly to the database using HTTP requests issued from the browser.

CouchDB is written in Erlang, a robust functional programming language designed for building concurrent, distributed systems. Erlang allows for a flexible design that is easily scalable and extensible.

Documentary database without schema

CouchDB stores data as documents. Everything stored is a schema-free document, so documents with different fields can be saved together in the same database. Documents are stored as JSON, a lightweight format that is simple to use from any language. Example of a typical CouchDB document:

{
  "_id": "234a41170621c326ec63382f846d5764",
  "_rev": "1-480277b989ff06c4fa87dfd0366677b6",
  "type": "article",
  "title": "This is a test",
  "body": "I am the content of a test article",
  "tags": ["cinema", "comedy"]
}

The _id field lets CouchDB distinguish the document from all others and is used to retrieve it later. It is a string that can contain whatever we want. If we do not supply one, CouchDB generates a UUID. A UUID is universally unique, which is useful for replication. The _rev field is special: CouchDB uses it to track the version of the document. Every time a change to a document is saved, the revision changes: the number before the hyphen increases by 1 and the rest of the value changes. This is useful because every save must include the revision being modified, so if CouchDB sees that a change is being saved over an old revision, it returns an error and does not allow the update. The remaining fields can hold any valid JSON value, as in the example, where the tags attribute is an array of strings. A field could also hold an object such as {"key1": "value1", "key2": "value2"}, a number (2), and so on.
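The revision check described above can be illustrated with a small in-memory sketch. This is not CouchDB's implementation. The ToyStore class, the generateId and fnv1a helpers, and the hash used after the hyphen are all hypothetical. Only the behaviour follows the description: the number before the hyphen grows by one on each save, and a save carrying an outdated _rev is rejected.

```javascript
// Hypothetical toy store imitating CouchDB's revision check (not CouchDB code).

// Generate a 32-character hex id, similar in shape to CouchDB's generated UUIDs
function generateId() {
  let id = "";
  for (let i = 0; i < 32; i++) {
    id += Math.floor(Math.random() * 16).toString(16);
  }
  return id;
}

// Small FNV-1a hash, used here only to give each revision a hex suffix
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

class ToyStore {
  constructor() {
    this.docs = new Map(); // _id -> latest stored document
  }

  // Save a document; the caller must send the _rev it last read (none for new docs)
  save(doc) {
    const id = doc._id || generateId();
    const current = this.docs.get(id);
    const currentRev = current ? current._rev : undefined;
    if (doc._rev !== currentRev) {
      // Stale or missing revision: reject the write instead of overwriting
      return { ok: false, error: "conflict" };
    }
    const generation = currentRev ? parseInt(currentRev.split("-")[0], 10) + 1 : 1;
    const { _rev, ...body } = doc;
    const rev = generation + "-" + fnv1a(JSON.stringify(body) + generation);
    this.docs.set(id, { ...body, _id: id, _rev: rev });
    return { ok: true, id, rev };
  }
}

const store = new ToyStore();
const first = store.save({ _id: "article-1", type: "article", title: "This is a test" });
const second = store.save({ _id: "article-1", _rev: first.rev, title: "Edited" });
// Saving again with the old revision is refused
const stale = store.save({ _id: "article-1", _rev: first.rev, title: "Late edit" });
```

A real CouchDB server reports the rejected save as an HTTP 409 Conflict response. The client then has to re-read the document, reapply its change and save again with the new _rev.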

Because it works without a schema, the system adapts to changes in the structure of the documents being stored. The user does not have to define in advance the structure of what is entered into the database.

MapReduce-style queryable

CouchDB does not offer an SQL-like query language. Instead, it offers a MapReduce-based system for retrieving data. This system is made up of a Map part and a Reduce part.

Map and Reduce

  • Map: a function that is executed for each document. It receives the document itself as a parameter and can emit key-value pairs. A single input document can produce 0, 1 or more of these pairs. At first glance this may seem very inefficient, but the function runs only once per document. Its results are stored in an index that relates keys to values, and subsequent queries read from this index. When documents in the database are modified, the index is updated, but only for the modified documents.

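Example:

function (doc) {
  // Emit one row per tag: the tag is the key, the whole document the value
  if (doc.tags) {
    for (var i = 0; i < doc.tags.length; i++) {
      emit(doc.tags[i], doc);
    }
  }
}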

Map functions (and Reduce functions) are written in JavaScript by default. CouchDB offers a pluggable architecture through which they can also be written in other languages, such as Python or Ruby.

This function emits each tag as a key and the document itself as the value. Executed on the example document, it produces 2 rows: one for "cinema" and another for "comedy", both with the document itself as the value. This set of results can then be filtered by key, or by a pair of start and end keys. For example, to find all the articles about cinema, we would filter for the rows whose key is "cinema". Keys can be any data type supported by JSON, such as arrays, numbers or objects, which can be useful for more advanced queries.
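The row-building step can be imitated locally to show what the map function produces for the example document. The runMap helper and the explicit emit parameter are hypothetical conveniences for running outside CouchDB. Inside CouchDB, emit is a global supplied by the server. The sort uses plain JavaScript string comparison, which simplifies CouchDB's own collation rules.

```javascript
// Local sketch of how a map function turns documents into sorted view rows.
// emit is passed in as a parameter here; in CouchDB it is a server-provided global.
function runMap(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    // Each emitted pair becomes a row that remembers its source document id
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  // View rows are kept ordered by key, then by document id
  rows.sort((a, b) =>
    a.key < b.key ? -1 : a.key > b.key ? 1 : a.id < b.id ? -1 : a.id > b.id ? 1 : 0
  );
  return rows;
}

// The tag map function from the text, adapted to receive emit as a parameter
function byTag(doc, emit) {
  if (doc.tags) {
    for (const tag of doc.tags) {
      emit(tag, doc);
    }
  }
}

const articles = [
  {
    _id: "234a41170621c326ec63382f846d5764",
    type: "article",
    title: "This is a test",
    body: "I am the content of a test article",
    tags: ["cinema", "comedy"],
  },
];

const tagRows = runMap(articles, byTag);
// Filtering by key: keep only the rows whose key is "cinema"
const cinemaRows = tagRows.filter((row) => row.key === "cinema");
```

Storing these sorted rows once and filtering them on each query is the idea behind the view index. A real server also avoids recomputing rows for documents that have not changed.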

  • Reduce: broadly speaking, this aggregates the results of the Map into a single value, typically a number. For example, suppose the previous Map part were written like this:

function (doc) {
  // Emit the value 1 once for each tag of the document
  if (doc.tags) {
    for (var i = 0; i < doc.tags.length; i++) {
      emit(doc.tags[i], 1);
    }
  }
}


A reduce function could then be defined as:


function (keys, values, rereduce) {
  // sum() is a helper provided by CouchDB's JavaScript query server
  return sum(values);
}

The Reduce function receives the emitted keys and values as input. Using the sum function provided by CouchDB, the 1s emitted by the map function are added up. Queried as is, the view returns a single row with the overall total. Queried with group=true, it returns one row per tag, with the tag as the key and the number of documents carrying that tag as the value. In CouchDB terminology, a pair of map and reduce functions is called a view (defining the reduce part is optional).
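The difference between reducing the whole view and reducing per key (CouchDB's group=true query option) can be sketched locally. The queryView helper, the sample documents and the explicit emit parameter are hypothetical. The sum function here is a plain JavaScript stand-in for the one CouchDB supplies. The (keys, values, rereduce) signature follows CouchDB's reduce convention.

```javascript
// Stand-in for the sum() helper that CouchDB's JavaScript query server provides
function sum(values) {
  return values.reduce((total, v) => total + v, 0);
}

// Reduce in CouchDB's (keys, values, rereduce) shape; summing works in both passes
function countReduce(keys, values, rereduce) {
  return sum(values);
}

// Map that emits 1 per tag (emit is passed in for this local sketch)
function tagCountMap(doc, emit) {
  if (doc.tags) {
    for (const tag of doc.tags) {
      emit(tag, 1);
    }
  }
}

// Hypothetical helper: reduce all rows at once (group=false) or per key (group=true)
function queryView(docs, group) {
  const rows = [];
  for (const doc of docs) {
    tagCountMap(doc, (key, value) => rows.push({ key, id: doc._id, value }));
  }
  if (!group) {
    const keys = rows.map((r) => [r.key, r.id]);
    return [{ key: null, value: countReduce(keys, rows.map((r) => r.value), false) }];
  }
  const groups = new Map();
  for (const r of rows) {
    if (!groups.has(r.key)) groups.set(r.key, []);
    groups.get(r.key).push(r);
  }
  return [...groups.keys()].sort().map((key) => {
    const groupRows = groups.get(key);
    const keys = groupRows.map((r) => [r.key, r.id]);
    return { key, value: countReduce(keys, groupRows.map((r) => r.value), false) };
  });
}

const sampleDocs = [
  { _id: "doc-a", tags: ["cinema", "comedy"] },
  { _id: "doc-b", tags: ["cinema", "drama"] },
];

const total = queryView(sampleDocs, false); // one row: 4 tags in total
const perTag = queryView(sampleDocs, true); // cinema: 2, comedy: 1, drama: 1
// A rereduce pass combines partial results; CouchDB passes null as keys there
const combined = countReduce(null, [2, 1, 1], true);
```

Addition gives the same result however the partial sums are grouped. That is why a plain sum stays correct when CouchDB re-reduces intermediate results. CouchDB also ships built-in reduce functions such as _sum and _count that do this natively.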

Accessible by REST

REST allows access to data in a very simple way through URLs. For example, to retrieve the document with id 6e1295ed6c29495e54cc05947f18c8af from the database "albums", you would request the following URL, which returns the corresponding JSON document:

http://localhost:5984/albums/6e1295ed6c29495e54cc05947f18c8af
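The same request can be made from code. This sketch assumes a runtime with a global fetch function, such as Node.js 18 or later. It also assumes a database that can be read without credentials. CouchDB 3.x restricts new databases to administrators by default, so in practice an Authorization header would usually be needed. The documentUrl and getDocument names are hypothetical.

```javascript
// Build <server>/<database>/<document id>, escaping each path segment
function documentUrl(server, db, docId) {
  return `${server}/${encodeURIComponent(db)}/${encodeURIComponent(docId)}`;
}

// GET a document and parse the JSON body (needs a global fetch, e.g. Node.js 18+)
async function getDocument(server, db, docId) {
  const response = await fetch(documentUrl(server, db, docId), {
    headers: { Accept: "application/json" },
  });
  if (!response.ok) {
    // CouchDB answers 404 Not Found when the id does not exist
    throw new Error(`GET failed with HTTP status ${response.status}`);
  }
  return response.json();
}

const albumUrl = documentUrl(
  "http://localhost:5984",
  "albums",
  "6e1295ed6c29495e54cc05947f18c8af"
);
// With a server running:
// getDocument("http://localhost:5984", "albums", "6e1295ed6c29495e54cc05947f18c8af")
//   .then((doc) => console.log(doc));
```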

Similarly, to access a view like the one described in the Map section and retrieve one of its results, we request the URL:

http://localhost:5984/blog/_design/doc/_view/tag?key="cinema"

This URL accesses the database called blog and, within it, the design document called doc (design documents are where views are stored within a database). Within that design document it selects the view called tag and requests the results identified by the key "cinema". This URL returns a result similar to this:
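One detail is easy to miss: CouchDB parses view parameters such as key as JSON. A string key therefore has to be sent with its double quotes (key="cinema", not key=cinema). The hypothetical viewUrl helper below builds such URLs. It JSON-encodes each parameter value and then lets the standard URLSearchParams class percent-encode it. It is meant only for JSON-valued parameters such as key, startkey and endkey.

```javascript
// Build a view URL; each parameter value is JSON-encoded, then percent-encoded.
// Intended only for JSON-valued parameters such as key, startkey and endkey.
function viewUrl(server, db, designDoc, view, params = {}) {
  const query = new URLSearchParams();
  for (const [name, value] of Object.entries(params)) {
    query.set(name, JSON.stringify(value));
  }
  const base = `${server}/${db}/_design/${designDoc}/_view/${view}`;
  const qs = query.toString();
  return qs ? `${base}?${qs}` : base;
}

const cinemaUrl = viewUrl("http://localhost:5984", "blog", "doc", "tag", { key: "cinema" });
// cinemaUrl ends with "/_view/tag?key=%22cinema%22" (%22 is the encoded double quote)
```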

 

{
  "total_rows": 4,
  "offset": 0,
  "rows": [
    {
      "id": "9280b03239ca11af9cfedf66b021ae88",
      "key": "cinema",
      "value": {
        "_id": "9280b03239ca11af9cfedf66b021ae88",
        "_rev": "1-0289d70fe05850345fd4e9118934a99b",
        "tags": ["cinema", "comedy"]
      }
    },
    {
      "id": "a92d03ff82289c259c9012f5bfeb639c",
      "key": "cinema",
      "value": {
        "_id": "a92d03ff82289c259c9012f5bfeb639c",
        "_rev": "2-97377eef95764a4dbf107d8142187f53",
        "tags": ["cinema", "drama"]
      }
    }
  ]
}

The key and value fields hold the expected results: the tag and the document that contains it. In addition, CouchDB includes the id of the document that produced each row, which is the document passed as a parameter to the Map function. The response also reports total_rows, the total number of rows in the view, and offset, the position of the first returned row within the view. Instead of the key parameter, a pair of startkey and endkey parameters can be passed to the view to obtain a range of results (for example, in a view whose keys are strings representing dates).
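The startkey/endkey range from the last sentence can be sketched with a view keyed by date strings. The rows, document ids and dates below are invented for illustration, and rangeQuery is a hypothetical local helper. Against a real server, the same range would be requested with startkey and endkey query parameters, JSON-encoded in the same way as key.

```javascript
// Rows of a hypothetical view whose keys are ISO-8601 date strings
const postsByDate = [
  { id: "post-1", key: "2009-11-02", value: null },
  { id: "post-2", key: "2010-01-15", value: null },
  { id: "post-3", key: "2010-03-08", value: null },
  { id: "post-4", key: "2010-07-21", value: null },
];

// Keep rows with startkey <= key <= endkey; CouchDB includes endkey by default
// (inclusive_end=true). Plain string comparison simplifies CouchDB's collation,
// but it orders ISO-8601 dates correctly.
function rangeQuery(rows, startkey, endkey) {
  return rows.filter((row) => row.key >= startkey && row.key <= endkey);
}

const firstHalf2010 = rangeQuery(postsByDate, "2010-01-01", "2010-06-30");
// firstHalf2010 contains post-2 and post-3
```

Because view rows are already sorted by key, a server can answer such a range by scanning one contiguous slice of the index instead of filtering every row.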

Integrated replication

Replication is a relatively unusual feature that lets a database synchronize its data with another remote or local database in a very simple way: a single REST call triggers it. In this way, one or more replicas of a database can easily be maintained to implement high-availability or load-balancing architectures. The previously mentioned _rev attribute also allows CouchDB to detect cases in which the same document has been modified in several databases at the same time, because each copy of the document would have a different _rev.
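The single REST call mentioned above is a POST to the server's /_replicate endpoint with a JSON body naming the source and target. The sketch below only builds that request, and the backup host is hypothetical. Credentials are omitted, although a real server will usually require them. Setting continuous to true asks CouchDB to keep replicating new changes instead of stopping after one pass.

```javascript
// Describe a POST /_replicate request as arguments for fetch(url, init).
// Source and target are given as full database URLs.
function replicationRequest(server, source, target, continuous = false) {
  return {
    url: `${server}/_replicate`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ source, target, continuous }),
    },
  };
}

// Hypothetical hosts: push the local "albums" database to a backup server
const pushAlbums = replicationRequest(
  "http://localhost:5984",
  "http://localhost:5984/albums",
  "http://backup.example.com:5984/albums",
  true
);
// With servers running: fetch(pushAlbums.url, pushAlbums.init).then((r) => r.json())
```

When a document has diverged between replicas, reading it with ?conflicts=true returns the losing revisions in a _conflicts array. The application can use that array to resolve the conflict.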

Scalable Transactions for Cloud Web Applications

NoSQL cloud data stores provide scalability and high availability for web applications, but they sacrifice data consistency. However, many applications cannot afford any data inconsistency. CloudTPS is a scalable transaction manager that guarantees full ACID properties for multi-item transactions issued by web applications, even in the presence of server failures and network partitions. Its authors implemented this approach on top of the two main families of scalable data layers: Bigtable and SimpleDB. A performance evaluation on top of HBase (an open-source clone of Bigtable) in the authors' local cluster and on Amazon SimpleDB in the Amazon cloud shows that the system scales linearly up to at least 40 nodes in the local cluster and 80 nodes in the Amazon cloud.

APIs for access to CouchDB

There are APIs in several languages for accessing CouchDB databases, for example: JavaScript, Erlang, .NET, Java, Perl, PHP, Python, Ruby and Lua.

 
