Tuesday, February 27, 2018

How to solve: '_csv.reader' object has no attribute 'next' : Python 3.6 CSV reader error

If you are trying to read a CSV file in Python 3.6 using this:

[python]
import csv

with open(filepath) as f:
    reader = csv.reader(f, delimiter=',', quotechar='"', skipinitialspace=True)
    header = reader.next()
    # rest of the code
[/python]

Then you might get the following (or a similar) error:

[shell]

Traceback (most recent call last):
  File "<filepath>", line 17, in csv_to_json
    header = reader.next()
AttributeError: '_csv.reader' object has no attribute 'next'

[/shell]

This is a very simple error to solve.

[python]reader.next()[/python]

is actually Python 2.x syntax; in Python 3.x the reader object no longer has a next() method, which is why it doesn't work. To make it work in Python 3.x, use the built-in next() function:

[python]next(reader)[/python]
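For completeness, here is the full snippet updated for Python 3 (the built-in next() has existed since Python 2.6, so this version works in both):

[python]
import csv

with open(filepath) as f:
    reader = csv.reader(f, delimiter=',', quotechar='"', skipinitialspace=True)
    header = next(reader)  # works in both Python 2.6+ and Python 3
    # rest of the code
[/python]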

That’s it. Please comment if you face any issues.

Monday, February 26, 2018

Step by step guide: Migrate data from MongoDB/Postgres/File to Elasticsearch

Migrating data from MongoDB (or other data sources like Postgres/Redis/RethinkDB/File) to Elasticsearch can be a painful task. In most deployments, Elasticsearch is used for text-based search engines and similar niche use cases. Very few use it as a primary datastore (which is not recommended) or for storing metadata. However, Elasticsearch is more than suitable for such use cases.

And once you decide to replace your existing datastore with Elasticsearch, you will find that there is not much of an ecosystem of tools available to do that. However, there are some open source tools with which you can achieve the task.

Transporter

GitHub link: https://github.com/compose/transporter

Quote from the Transporter GitHub page: "Transporter allows the user to configure a number of data adaptors as sources or sinks. These can be databases, files or other resources. Data is read from the sources, converted into a message format, and then sent down to the sink where the message is converted into a writable format for its destination. The user can also create data transformations in JavaScript which can sit between the source and sink and manipulate or filter the message flow.

Adaptors may be able to track changes as they happen in source data. This "tail" capability allows a Transporter to stay running and keep the sinks in sync."

I have personally used this tool and it works flawlessly.

Steps to use Transporter (simple one-time data migration use case):

  1. Download the latest (v0.5.2 at the time of writing) Transporter binary from https://github.com/compose/transporter/releases/tag/v0.5.2.
  2. Open Command prompt on Windows or terminal on Linux.
  3. Navigate to the directory where you downloaded the Transporter’s binary.
  4. Create a file named pipeline.js in that directory. (You can name the file anything you like; the name just has to match the one passed to the run command below.)
  5. Edit the file and add the necessary information. Refer to the Transporter documentation for details. A sample pipeline.js to move data from MongoDB to Elasticsearch is:

    [javascript]var source = mongodb({
      "uri": "mongodb://127.0.0.1:27017/test"
      // "timeout": "30s",
      // "tail": false,
      // "ssl": false,
      // "cacerts": ["/path/to/cert.pem"],
      // "wc": 1,
      // "fsync": false,
      // "bulk": false,
      // "collection_filters": "{}",
      // "read_preference": "Primary"
    })

    var sink = elasticsearch({
      "uri": "https://localhost:9200/test_index"
      // "timeout": "10s", // defaults to 30s
      // "aws_access_key": "ABCDEF", // used for signing requests to AWS Elasticsearch service
      // "aws_access_secret": "ABCDEF" // used for signing requests to AWS Elasticsearch service
    })

    t.Source(source).Save(sink)
    // t.Source("source", source).Save("sink", sink)
    // t.Source("source", source, "namespace").Save("sink", sink, "namespace")[/javascript]

  6. Run the transporter command:

    [shell]$ ./transporter run [-log.level "info"] pipeline.js[/shell]

  7. Check the output to see whether the command executed successfully; a quick verification is sketched below.
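If the run succeeded, the documents should now be in the target index. As a quick check (assuming the sink URI from the sample above, i.e. an index named test_index on a local Elasticsearch; adjust the scheme and host for your setup), ask Elasticsearch for a document count:

[shell]
# Count the documents that landed in the target index
curl -s "http://localhost:9200/test_index/_count?pretty"
[/shell]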

For other use-cases, please explore Transporter documentation on its GitHub page.

Tuesday, February 20, 2018

How to fix out of memory error in Maven – “java.lang.OutOfMemoryError: PermGen”

While compiling/building a project in Maven, you might sometimes encounter the following error:

[shell]java.lang.OutOfMemoryError: PermGen[/shell]

It can be caused by any of the following:

  1. You are building a very big multi-module project. Each module requires a certain amount of memory, so the required memory grows with the number of modules until the JVM finally runs out of "Java heap space".
  2. You are using plugins that perform memory-intensive operations, such as analyzing the class files of all project dependencies.
  3. You are using the Maven Compiler Plugin with the option fork=false (the default) and your project has a lot of source files to compile. When the Java compiler runs in embedded mode, each compiled class consumes heap memory, and depending on the JDK being used this memory is not subject to garbage collection, i.e. the memory allocated for the compiled classes is never freed. The resulting error message typically says "PermGen space". A common fix is shown below.
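The usual remedy is to give the Maven JVM more memory via the MAVEN_OPTS environment variable. A minimal sketch (the values are examples; -XX:MaxPermSize only applies to JDK 7 and earlier, as PermGen was removed in JDK 8):

[shell]
# Linux/macOS: raise the heap and PermGen limits for the Maven JVM
export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=256m"

# Windows (same idea):
# set MAVEN_OPTS=-Xmx1024m -XX:MaxPermSize=256m

mvn clean install
[/shell]

For cause 3, you can also configure the Maven Compiler Plugin with fork=true (plus its meminitial/maxmem parameters), so the compiler runs in a separate JVM whose memory is released when compilation finishes.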

How to fix “redis and transport connections not initialized” error while starting Sensu / Uchiwa?

It is a simple fix:

  1. Install and start Redis server if not already done.
  2. Configure Redis with Sensu.
  3. Install and start RabbitMQ if not already done.
  4. Configure RabbitMQ with Sensu.
  5. Restart Redis and RabbitMQ.

This should fix the error.
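For reference, here is a minimal sketch of the Sensu configuration that wires up both connections, typically placed in /etc/sensu/config.json (hosts, ports, and credentials below are placeholders; adjust them for your setup):

[javascript]
{
  "redis": {
    "host": "127.0.0.1",
    "port": 6379
  },
  "rabbitmq": {
    "host": "127.0.0.1",
    "port": 5672,
    "vhost": "/sensu",
    "user": "sensu",
    "password": "secret"
  }
}
[/javascript]

Remember to restart the Sensu services (sensu-server, sensu-api) after changing the configuration so they pick up the new connection settings.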

How to add entry to crontab using bash script or in a single command?

It is very simple to do. Run the following command as the user whose crontab the entry should go into, or add the command to a bash script and run that script as the same user.
Shell command:

[shell]
(crontab -l 2>/dev/null; echo "* * * * * /path/to/script or command here") | crontab -
[/shell]

If you need to add the entry to root's crontab, run the command after logging in as root, run it with sudo as shown below, or run your bash script as root:

[shell]
(sudo crontab -l 2>/dev/null; echo "* * * * * /path/to/script or command here") | sudo crontab -
[/shell]

Executing the above might print the message "no crontab for <user>" if the user has no crontab yet. You can ignore it, as the entry is created anyway.
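To confirm that the entry was actually created, list the crontab afterwards:

[shell]
crontab -l
# or, for root's crontab:
sudo crontab -l
[/shell]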

Eclipse: “Workspace in use or cannot be created, choose a different one.” How to unlock a workspace?

Sometimes when opening the Eclipse IDE, you may get the error “Workspace in use or cannot be created, choose a different one.” This generally happens when a previous Eclipse process didn’t quit gracefully and thus failed to delete the .lock file in the workspace directory. The .lock file ensures that only one instance of Eclipse works in a particular workspace at a time. Fixing this is very simple: just delete the .lock file in the .metadata directory of your Eclipse workspace and restart Eclipse.
Shell command:

[shell]
rm <path to Eclipse workspace>/.metadata/.lock
[/shell]

Caution: don't delete the .metadata folder itself. If you delete the .metadata folder, all your Eclipse preferences will be lost.
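If the error reappears immediately after deleting the lock file, a stale Eclipse process may still be holding the workspace; check for one before retrying (a generic check, the process name may vary by platform):

[shell]
# Look for a leftover Eclipse process; terminate it before deleting the lock file
pgrep -fl eclipse
[/shell]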

How to print an elasticsearch query made through its Java client?

The Elasticsearch Java client uses TCP (default port 9300) to connect to the ES cluster and run queries, unlike the REST clients, which use HTTP (default port 9200). A query made through the Java client can therefore be hard to read, as it is expressed in Java code rather than JSON. However, there is an easy way to print the JSON query that the Java client executes in the background.

The following code snippet achieves this:

[java]
SearchRequestBuilder searchRequestBuilder = elasticSearchClient()
        .prepareSearch("your_index_name")
        .setQuery(QueryBuilders.matchAllQuery()) // put your query here
        .setSize(100);
// Prints the query that will be sent to the cluster, as JSON
System.out.println(searchRequestBuilder.internalBuilder());
[/java]