How to handle JSON files in Hadoop's MapReduce program
A JSON configuration file needs to be parsed in order to simplify the Java program and its processing logic. However, Hadoop itself does not provide a built-in JSON parser, so a third-party JSON toolkit is required; json-simple is chosen here. First, a brief introduction:

(1) The command to execute a Java program on Hadoop is as follows:
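A representative invocation (the driver class and the HDFS input/output paths below are placeholders, not from the original article) might look like this:

# driver class and HDFS paths are placeholders
hadoop jar My-MapReduce.jar com.example.LogDriver /input/logs /output/result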

(2) My-MapReduce.jar is a MapReduce program for log processing. Now suppose it needs to handle a configuration file in JSON format. Here we ignore the details of how files are read in a Hadoop cluster and focus only on how to use the JSON toolkit. The following is a simple HelloWorld program:
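A minimal sketch of such a HelloWorld, assuming the json-simple JSONParser/JSONObject API and a hard-coded JSON string standing in for the configuration file contents:

import org.json.simple.JSONObject;
import org.json.simple.parser.JSONParser;
import org.json.simple.parser.ParseException;

public class HelloWorld {
    public static void main(String[] args) throws ParseException {
        // A hard-coded JSON string stands in for the configuration file
        // that would normally be read from the Hadoop cluster.
        String text = "{\"name\":\"log-job\",\"retention_days\":7}";

        // Parse the string into a JSONObject.
        JSONParser parser = new JSONParser();
        JSONObject json = (JSONObject) parser.parse(text);

        // Modify the object and print it to verify parsing and modification.
        json.put("greeting", "HelloWorld");
        System.out.println(json.toJSONString());
    }
}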

The HelloWorld program only needs to modify the JSON object and print its contents, which is enough to verify the process of parsing and modifying JSON content.

Second, compilation:

Because a MapReduce program is submitted to the Hadoop cluster for execution, the json-simple jar that HelloWorld depends on must be present on the cluster's classpath. If the corresponding jar is not on the cluster, the following exception occurs when HelloWorld is executed:

Exception in thread "main" java.lang.NoClassDefFoundError: org/json/simple/JSONObject

The simple solution is to bundle the json-simple classes directly into the compiled HelloWorld output: decompress the json-simple jar, package its classes together with the HelloWorld classes into HelloWorld.jar, and then run it with the command hadoop jar HelloWorld.jar.

The compilation and packaging commands are as follows:
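A sketch of these commands, assuming json-simple-1.1.1.jar sits in the current directory (the jar file name and version are assumptions):

# Compile HelloWorld against the json-simple jar
javac -classpath json-simple-1.1.1.jar HelloWorld.java

# Unpack json-simple so its classes can be bundled into HelloWorld.jar
mkdir classes
(cd classes && jar -xf ../json-simple-1.1.1.jar)
rm -rf classes/META-INF

# Package the HelloWorld class together with the json-simple classes,
# setting HelloWorld as the jar's entry point
cp HelloWorld.class classes/
jar cvfe HelloWorld.jar HelloWorld -C classes .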

Third, running HelloWorld:
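Since the json-simple classes are now packaged inside HelloWorld.jar, the program can be run with hadoop jar alone. With the sketch above, the output would be the modified JSON object (key order may vary):

hadoop jar HelloWorld.jar
# example output:
# {"name":"log-job","retention_days":7,"greeting":"HelloWorld"}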