Monday, February 26, 2018

Step by step guide: Migrate data from MongoDB/Postgres/File to Elasticsearch

Migrating data from MongoDB (or other data sources such as Postgres, Redis, RethinkDB, or plain files) to Elasticsearch can be a painful task. In most deployments, Elasticsearch is used mainly for text-based search and similar niche use-cases. Few teams consider using it as a primary datastore (which is generally not recommended) or for storing metadata, even though Elasticsearch is more than capable of handling such use cases.

Once you decide to move your existing datastore to Elasticsearch, you will find that there is not much tooling available for the job. There are, however, a few open source tools that can get it done.

Transporter

GitHub link: https://github.com/compose/transporter

Quote from the Transporter GitHub page – “Transporter allows the user to configure a number of data adaptors as sources or sinks. These can be databases, files or other resources. Data is read from the sources, converted into a message format, and then sent down to the sink, where the message is converted into a writable format for its destination. The user can also create data transformations in JavaScript which can sit between the source and sink and manipulate or filter the message flow.

Adaptors may be able to track changes as they happen in source data. This “tail” capability allows a Transporter to stay running and keep the sinks in sync.”
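
To make the transformation hook concrete, here is a minimal pipeline sketch that drops a field from every document on its way to the sink. It reuses the same MongoDB and Elasticsearch URIs as the sample later in this post; omit is one of Transporter's built-in transformer functions, and the field name "password" is just a hypothetical example:

[javascript]// Minimal sketch: copy documents from MongoDB to Elasticsearch,
// dropping a hypothetical "password" field along the way.
var source = mongodb({
  "uri": "mongodb://127.0.0.1:27017/test"
})

var sink = elasticsearch({
  "uri": "https://localhost:9200/test_index"
})

t.Source("source", source, "/.*/")
  .Transform(omit({"fields": ["password"]})) // built-in transformer; see the docs for the JavaScript (goja) transformer
  .Save("sink", sink, "/.*/")[/javascript]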

 

I have personally used this tool and it works flawlessly.

Steps to use Transporter (simple one-time data migration use-case):

  1. Download the latest Transporter binary (v0.5.2 at the time of writing) from https://github.com/compose/transporter/releases/tag/v0.5.2.
  2. Open a command prompt on Windows or a terminal on Linux.
  3. Navigate to the directory where you downloaded the Transporter’s binary.
  4. Create a file named pipeline.js in that directory. (You can name the file anything you like.)
  5. Edit the file and add the necessary configuration; refer to the Transporter documentation for details. A sample pipeline.js to move data from MongoDB to Elasticsearch is shown below:

    [javascript]var source = mongodb({
      "uri": "mongodb://127.0.0.1:27017/test"
      // "timeout": "30s",
      // "tail": false,
      // "ssl": false,
      // "cacerts": ["/path/to/cert.pem"],
      // "wc": 1,
      // "fsync": false,
      // "bulk": false,
      // "collection_filters": "{}",
      // "read_preference": "Primary"
    })

    var sink = elasticsearch({
      "uri": "https://localhost:9200/test_index"
      // "timeout": "10s", // defaults to 30s
      // "aws_access_key": "ABCDEF", // used for signing requests to AWS Elasticsearch service
      // "aws_access_secret": "ABCDEF" // used for signing requests to AWS Elasticsearch service
    })

    t.Source(source).Save(sink)
    // t.Source("source", source).Save("sink", sink)
    // t.Source("source", source, "namespace").Save("sink", sink, "namespace")[/javascript]

  6. Run the transporter command:

    [shell]$ ./transporter run [-log.level "info"] pipeline.js[/shell]

  7. Check the output to confirm that the command executed successfully. You can also query the target Elasticsearch index to verify that the expected documents have been indexed.

For other use-cases, please explore the Transporter documentation on its GitHub page.
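
One use-case worth calling out is continuous synchronization. The “tail” capability quoted earlier keeps the pipeline running after the initial copy and streams subsequent changes into the sink. A rough sketch, assuming MongoDB runs as a replica set (which oplog tailing requires), is:

[javascript]// Sketch: enable tailing so the pipeline keeps the Elasticsearch index
// in sync with ongoing changes in MongoDB after the initial copy.
var source = mongodb({
  "uri": "mongodb://127.0.0.1:27017/test",
  "tail": true // requires MongoDB to run as a replica set (oplog access)
})

var sink = elasticsearch({
  "uri": "https://localhost:9200/test_index"
})

t.Source("source", source, "/.*/").Save("sink", sink, "/.*/")[/javascript]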
