In this snippet, I print my name in the composite() method and write a second composite() that handles the command containing an equipartitioned join. I also change the code in summarizeResult() so that it outputs the result of the equipartitioned join.
```java
String job = "";
// ...
else if (args.length == 7) // If number of input arguments is 7 and
                           // first input is == "composite"; parses
                           // data to composite
{
    if (job.equals("composite")) {
        composite(args[1], args[2], args[3], args[4], args[5], Integer.parseInt(args[6]));
    } else // In case the function name doesn't match up
    {
        System.err.println("Please check the name of the function you wish to call and try again");
    }
} else if (args.length == 8) // If number of input arguments is 8 and
                             // first input is == "composite"; parses
                             // data to composite
{
    if (job.equals("composite")) {
        composite(args[1], args[2], args[3], args[4], args[5], args[6], Integer.parseInt(args[7]));
    }
    // ...

// ...
if (i % 2 == 1) // As i increments at the last step, for odd i, interim2
                // is the input directory
{
    deleteDirectory(interim1); // Deletes the other directory
    counter++;
    if (nameInput != null) {
        translate(nameInput, interim2, interim1, reducers);
        finish(interim1, output, reducers);
    } else {
        finish(interim2, output, reducers);
    }
    summarizeResult(output);
} else // For even i, interim1 is the input directory
{
    deleteDirectory(interim2); // Deletes the other directory
    counter++;
    if (nameInput != null) {
        translate(nameInput, interim1, interim2, reducers);
        finish(interim2, output, reducers);
    } else {
        finish(interim1, output, reducers);
    }
    summarizeResult(output);
}
// ...
```
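The parity check at the end of the driver decides which interim directory holds the newest ranks and which one can be cleared and reused. As a rough sketch of that bookkeeping, the hypothetical class below (FinalStagePlan is not part of the original code, and it never touches HDFS) only reports which directory each step would use. It follows the driver's comment that an odd i means interim2 holds the latest output.

```java
import java.util.Arrays;

// Hypothetical helper (not in the original driver) that mirrors the
// end-of-loop branching. It only reports directory names; it does not
// delete, translate, or finish anything.
class FinalStagePlan {
    final String latest;      // directory holding the last iteration's output
    final String stale;       // directory that is deleted and then reused
    final String finishInput; // directory that finish() reads from

    FinalStagePlan(int i, String interim1, String interim2, boolean hasNameInput) {
        // i was incremented after the last iteration, so an odd i
        // means the newest ranks are in interim2 (per the driver's comment).
        boolean odd = i % 2 == 1;
        latest = odd ? interim2 : interim1;
        stale = odd ? interim1 : interim2;
        // With a name file, translate() writes into the cleared directory and
        // finish() reads from there; otherwise finish() reads the ranks directly.
        finishInput = hasNameInput ? stale : latest;
    }

    public static void main(String[] args) {
        FinalStagePlan plan = new FinalStagePlan(3, "interim1", "interim2", true);
        System.out.println(Arrays.asList(plan.latest, plan.stale, plan.finishInput));
        // prints [interim2, interim1, interim1]
    }
}
```

Reusing the stale directory this way means the job only ever needs two scratch directories, however many iterations it runs.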
```java
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException, IllegalArgumentException {
    String line = value.toString(); // Converts Line to a String
    /*
     * TODO: Just echo the input, since it is already in adjacency list format.
     */
    String[] sections = line.split(":");
    if (sections.length == 2) {
        context.write(new Text(sections[0].trim()), new Text(sections[1].trim()));
    } else
        throw new IOException("Incorrect data format");
}
}
```
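Stripped of the Hadoop types, the mapper above is just a line parser. Here is a minimal sketch, assuming input lines of the form `node: neighbor1 neighbor2 ...`. The class name InitMapperSketch is hypothetical.

```java
import java.io.IOException;
import java.util.Arrays;

// Plain-Java sketch of the mapper's parsing step (no Hadoop types).
class InitMapperSketch {
    // Returns {key, value} exactly as the mapper would pass them to context.write().
    static String[] parse(String line) throws IOException {
        String[] sections = line.split(":");
        if (sections.length != 2) {
            throw new IOException("Incorrect data format");
        }
        return new String[] { sections[0].trim(), sections[1].trim() };
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Arrays.toString(parse("A: B C D")));
        // prints [A, B C D]
    }
}
```

Because `split(":")` splits on every colon, a node name that itself contains `:` would trigger the exception. `line.split(":", 2)` splits only on the first colon, if that matters for the data.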
InitReducer.java
input: key = nodeIdentifier, values = vertices the node links to
emit: key = nodeIdentifier+rank_value, value = adjacency list
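The InitReducer code is not shown here, but the description above pins down what it emits. Below is a sketch of building one output record. The delimiter `"$"` and the starting rank `1.0` are assumptions standing in for whatever PageRankDriver actually defines. The tab between key and value matches Hadoop's default TextOutputFormat separator and the `split("\t")` in the next mapper.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of one InitReducer output record. DELIMITER and INITIAL_RANK are
// hypothetical stand-ins for the values PageRankDriver actually uses.
class InitReducerSketch {
    static final String DELIMITER = "$";     // assumed MARKER_DELIMITER
    static final double INITIAL_RANK = 1.0;  // assumed starting rank

    // key = node + delimiter + rank, value = space-separated adjacency list,
    // joined by a tab the way the default TextOutputFormat writes them.
    static String emit(String node, List<String> values) {
        String key = node + DELIMITER + INITIAL_RANK;
        String value = String.join(" ", values);
        return key + "\t" + value;
    }

    public static void main(String[] args) {
        // prints "A$1.0", a tab, then "B C"
        System.out.println(emit("A", Arrays.asList("B", "C")));
    }
}
```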
```java
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException, IllegalArgumentException {
    String line = value.toString(); // Converts Line to a String
    String[] sections = line.split("\t"); // Splits it into two parts. Part 1: node+rank | Part 2: adj list

    if (sections.length > 2) // Checks if the data is in the incorrect format
    {
        throw new IOException("Incorrect data format");
    }
    if (sections.length != 2) {
        return;
    }
    /*
     * TODO: emit key: adj vertex, value: computed weight.
     *
     * Remember to also emit the input adjacency list for this node!
     * Put a marker on the string value to indicate it is an adjacency list.
     */
    String[] pair = sections[0].split("\\" + PageRankDriver.MARKER_DELIMITER);
    double rank = Double.valueOf(pair[1]);
    String[] adjList = sections[1].split(" ");
    double weight = rank / adjList.length;
    for (int i = 0; i < adjList.length; i++) {
        context.write(new Text(adjList[i]), new Text(String.valueOf(weight)));
    }
    context.write(new Text(pair[0]), new Text(PageRankDriver.MARKER_ADJ + ":" + sections[1]));
}
}
```
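The mapper above divides a node's current rank evenly among its out-links. It also re-emits the adjacency list under a marker so the reducer can rebuild the graph.

Below is a plain-Java sketch of that step. The delimiter `"$"` and the marker `"ADJ"` are hypothetical values for PageRankDriver.MARKER_DELIMITER and PageRankDriver.MARKER_ADJ. `Pattern.quote()` stands in for the manual `"\\"` escape.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Plain-Java sketch of the iteration mapper: split a node's rank evenly
// across its out-links and re-emit its adjacency list under a marker.
class IterMapperSketch {
    static final String DELIMITER = "$";    // hypothetical MARKER_DELIMITER
    static final String ADJ_MARKER = "ADJ"; // hypothetical MARKER_ADJ

    // Returns the emitted (key, value) pairs in the order the mapper writes them.
    static List<String[]> map(String line) {
        List<String[]> out = new ArrayList<>();
        String[] sections = line.split("\t");
        if (sections.length > 2) {
            throw new IllegalArgumentException("Incorrect data format");
        }
        if (sections.length != 2) {
            return out; // a node without an adjacency list emits nothing
        }
        String[] pair = sections[0].split(Pattern.quote(DELIMITER));
        double rank = Double.parseDouble(pair[1]);
        String[] adjList = sections[1].split(" ");
        double weight = rank / adjList.length; // each neighbor gets an equal share
        for (String neighbor : adjList) {
            out.add(new String[] { neighbor, String.valueOf(weight) });
        }
        out.add(new String[] { pair[0], ADJ_MARKER + ":" + sections[1] });
        return out;
    }

    public static void main(String[] args) {
        for (String[] kv : map("A$1.0\tB C")) {
            System.out.println(kv[0] + " -> " + kv[1]);
        }
        // prints B -> 0.5, C -> 0.5, A -> ADJ:B C (one per line)
    }
}
```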
IterReducer.java
input: key = node, values = rank values from adjacent vertices, or the adjacency list
```java
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class IterReducer extends Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        double d = PageRankDriver.DECAY; // Decay factor
        /*
         * TODO: emit key:node+rank, value: adjacency list
         * Use PageRank algorithm to compute rank from weights contributed by incoming edges.
         * Remember that one of the values will be marked as the adjacency list for the node.
         */
        double sum = 0;
        String adjList = "";
        for (Text val : values) {
            String val_s = val.toString();
            if (val_s.startsWith(PageRankDriver.MARKER_ADJ))
                adjList = val_s.split(":")[1]; // The marked value carries the adjacency list
            else
                sum += Double.valueOf(val_s); // Every other value is an incoming weight
        }
        double value = (1 - d) + d * sum;
        context.write(new Text(key.toString() + PageRankDriver.MARKER_DELIMITER + String.valueOf(value)),
                new Text(adjList));
    }
}
```
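The reducer applies the non-normalized PageRank update PR(v) = (1 - d) + d * Σ PR(u)/outdeg(u). The sum runs over the nodes u that link to v, and the mapper has already divided each incoming weight by outdeg(u).

Below is a plain-Java sketch of that update. It assumes d = 0.85 (a common choice; the real value is PageRankDriver.DECAY) and the same hypothetical `"$"` and `"ADJ"` markers. It uses `substring()` rather than `split(":")[1]` so any colons inside the adjacency list survive. That is a deliberate small deviation from the code above.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the IterReducer logic on plain strings.
class IterReducerSketch {
    static final double DECAY = 0.85;       // assumption; the real value is PageRankDriver.DECAY
    static final String ADJ_MARKER = "ADJ"; // hypothetical MARKER_ADJ
    static final String DELIMITER = "$";    // hypothetical MARKER_DELIMITER

    // values holds the incoming weights plus one "ADJ:..." adjacency entry.
    static String reduce(String node, List<String> values) {
        double sum = 0;
        String adjList = "";
        for (String v : values) {
            if (v.startsWith(ADJ_MARKER)) {
                // Keep everything after "ADJ:", including any further colons.
                adjList = v.substring(ADJ_MARKER.length() + 1);
            } else {
                sum += Double.parseDouble(v);
            }
        }
        double rank = (1 - DECAY) + DECAY * sum; // PR(v) = (1 - d) + d * sum
        return node + DELIMITER + rank + "\t" + adjList;
    }

    public static void main(String[] args) {
        // B receives 0.5 and 1.0 from two in-links, so its new rank is about 1.425.
        System.out.println(reduce("B", Arrays.asList("0.5", "ADJ:C", "1.0")));
    }
}
```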
```java
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException, IllegalArgumentException {
    String line = value.toString(); // Converts Line to a String
    String[] sections = line.split("\t"); // Splits each line
    if (sections.length > 2) // Checks for incorrect data format
    {
        throw new IOException("Incorrect data format");
    }
    /**
     * TODO: read node-rank pair and emit: key:node, value:rank
     */
    // Escape the delimiter for the regex, as IterMapper does
    String[] pair = sections[0].split("\\" + PageRankDriver.MARKER_DELIMITER);
    if (pair.length == 2)
        context.write(new Text(pair[0]), new Text(pair[1]));
}
```
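The node+rank key has to be split on an escaped delimiter, the way IterMapper does with `"\\" + PageRankDriver.MARKER_DELIMITER`. A prefix of `"//"` is not a regex escape, so the key would normally stay unsplit, `pair.length` would be 1, and no rank would be emitted. The demo below makes the difference visible. It assumes, hypothetically, that the delimiter is `"$"`.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// String.split() takes a regular expression, so the delimiter must be escaped.
// "$" is a hypothetical MARKER_DELIMITER value and happens to be a regex metacharacter.
class DelimiterSplitDemo {
    static final String DELIMITER = "$";

    // Pattern.quote() escapes the delimiter whatever it is, even multi-character ones.
    static String[] splitKey(String key) {
        return key.split(Pattern.quote(DELIMITER));
    }

    public static void main(String[] args) {
        String key = "A$1.425";
        // The regex //$ means "two slashes at the end of the input": no match, no split.
        System.out.println(key.split("//" + DELIMITER).length); // prints 1
        // The Java literal "\\$" is the regex \$, a literal dollar sign.
        System.out.println(key.split("\\" + DELIMITER).length); // prints 2
        System.out.println(Arrays.toString(splitKey(key)));     // prints [A, 1.425]
    }
}
```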
```java
public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    double[] ranks = new double[2];
    /*
     * TODO: The list of values should contain two ranks. Compute and output their difference.
     */
    int i = 0;
    for (Text val : values) {
        ranks[i++] = Double.valueOf(val.toString());
        if (i >= 2) break;
    }
    context.write(new Text(String.valueOf(Math.abs(ranks[0] - ranks[1]))), new Text());
}
}
```
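The diff job expects each node to arrive with exactly two ranks, presumably one from each of two consecutive iterations. Below is a plain-Java sketch of the reducer's arithmetic. DiffReducerSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the DiffReducer arithmetic on plain strings.
class DiffReducerSketch {
    static double rankDifference(List<String> values) {
        double[] ranks = new double[2];
        int i = 0;
        for (String v : values) {
            ranks[i++] = Double.parseDouble(v);
            if (i >= 2) break; // only the first two values are used
        }
        // With a single value, ranks[1] stays 0.0, so the whole rank counts as change.
        return Math.abs(ranks[0] - ranks[1]);
    }

    public static void main(String[] args) {
        System.out.println(rankDifference(Arrays.asList("1.5", "1.25"))); // prints 0.25
        System.out.println(rankDifference(Arrays.asList("2.0")));         // prints 2.0
    }
}
```

The order of the two values does not matter because of `Math.abs`, which is convenient since Hadoop does not guarantee the order of values within a key.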
```java
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException, IllegalArgumentException {
    String s = value.toString(); // Converts Line to a String
```
```java
public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    double diff_max = 0.0; // Sets diff_max to a default value
    /*
     * TODO: Compute and emit the maximum of the differences
     */
    for (Text val : values) {
        double diff = Double.valueOf(val.toString());
        diff_max = Math.max(diff, diff_max);
    }
    context.write(new Text(String.valueOf(diff_max)), new Text());
}
}
```
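The maximum of the per-node differences is a single number describing how much the ranks still move between iterations. A driver can compare it against a threshold to decide when to stop.

The in-memory sketch below chains the pieces together: an update step, the per-node difference, and the maximum. The decay 0.85, the starting rank 1.0, the threshold EPSILON and the iteration cap are all assumptions, not values taken from the original driver.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory sketch of the whole loop: an update step (IterMapper + IterReducer),
// per-node |new - old| (DiffReducer), and the maximum of those (the max reducer).
class PageRankLoopSketch {
    static final double DECAY = 0.85;   // assumption
    static final double EPSILON = 1e-6; // hypothetical convergence threshold

    static Map<String, Double> iterate(Map<String, List<String>> graph, Map<String, Double> ranks) {
        Map<String, Double> next = new HashMap<>();
        for (String node : graph.keySet()) {
            next.put(node, 1 - DECAY); // the (1 - d) term every node receives
        }
        for (Map.Entry<String, List<String>> e : graph.entrySet()) {
            List<String> out = e.getValue();
            if (out.isEmpty()) continue; // dangling node: nothing to distribute
            double weight = ranks.get(e.getKey()) / out.size();
            for (String target : out) {
                next.putIfAbsent(target, 1 - DECAY);
                next.merge(target, DECAY * weight, Double::sum);
            }
        }
        return next;
    }

    // Largest absolute change over the nodes of 'current'; a missing old rank counts as 0.0.
    static double maxDiff(Map<String, Double> current, Map<String, Double> previous) {
        double diffMax = 0.0;
        for (Map.Entry<String, Double> e : current.entrySet()) {
            double diff = Math.abs(e.getValue() - previous.getOrDefault(e.getKey(), 0.0));
            diffMax = Math.max(diff, diffMax);
        }
        return diffMax;
    }

    static Map<String, Double> run(Map<String, List<String>> graph, int maxIterations) {
        Map<String, Double> ranks = new HashMap<>();
        for (String node : graph.keySet()) {
            ranks.put(node, 1.0); // assumed initial rank
        }
        for (int i = 0; i < maxIterations; i++) {
            Map<String, Double> next = iterate(graph, ranks);
            boolean converged = maxDiff(next, ranks) < EPSILON;
            ranks = next;
            if (converged) break;
        }
        return ranks;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("A", Arrays.asList("B", "C"));
        graph.put("B", Arrays.asList("C"));
        graph.put("C", Arrays.asList("A"));
        System.out.println(run(graph, 500));
    }
}
```

With no dangling nodes, this form of the update keeps the sum of all ranks equal to the number of nodes. That makes a handy sanity check on real output.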
```
United_States_postal_abbreviations 17400.21276502926
United_States 13078.758142261851
Geographic_coordinate_system 11721.388760989967
Biography 9046.94404539674
2008 7074.6425361942565
2007 6800.352714147496
United_Kingdom 4910.857511340169
Music_genre 4873.710868083095
France 4742.359910022161
Record_label 4693.041866616612
Biological_classification 4504.695513826754
England 4235.032041106078
Canada 3690.885542951598
Personal_name 3576.978399648568
2006 3575.6745346145035
Internet_Movie_Database 3463.2832928822127
India 3106.5889721148874
Binomial_nomenclature 2900.531729940027
Germany 2813.7080608753836
Australia 2801.5539246554167
2005 2753.2105256013992
Japan 2738.5863561278975
Studio_album 2653.126455458513
Village 2443.744194498339
Record_producer 2390.2859118177685
Football_(soccer) 2309.2199488266556
Politician 2297.477993147134
Romania 2223.1518281845247
English_language 2213.6348292720913
Time_zone 1995.897478274637
Departments_of_France 1989.2385407660036
Wiktionary 1986.601595665923
Geocode 1957.2310712914546
UN/LOCODE 1927.7685672468049
2004 1923.7717011191974
Television 1887.5449027795114
Italy 1876.5179226178395
Europe 1876.4133549775547
Album 1862.5000237009542
Conservation_status 1822.3255838594016
Website 1786.3698129091722
Animal 1755.4690932603446
London 1750.4268351247983
IUCN_Red_List 1693.081548359981
Wikimedia_Commons 1691.2920868385286
Poland 1660.9849679859221
Population_density 1594.3872097107585
Public_domain 1524.3655342422294
Actor 1430.0997921719966
Digital_object_identifier 1394.3430354664986
2001 1372.891851411744
Elevation 1359.6035580473917
Norway 1354.81422265507
China 1290.9928602406003
School 1176.2465753102497
```
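The listing is ordered by rank, highest first. The code of summarizeResult() is not shown, so this is a hypothetical sketch only. It assumes the final output has one `name<TAB>rank` line per page; sorting those entries by descending rank and keeping the first n gives a list like the one above.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: parse "name<TAB>rank" lines and keep the n highest ranks.
class TopRanksSketch {
    static List<Map.Entry<String, Double>> topN(List<String> lines, int n) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split("\t");
            entries.add(new AbstractMap.SimpleEntry<String, Double>(parts[0], Double.parseDouble(parts[1])));
        }
        // Highest rank first, like the listing above.
        entries.sort(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()));
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "England\t4235.032041106078",
                "United_States\t13078.758142261851",
                "Canada\t3690.885542951598");
        for (Map.Entry<String, Double> e : topN(lines, 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```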
5 reducers (1 master, 4 cores): 9 min
10 reducers (1 master, 9 cores): 6 min 11 s (first run), 5 min 30 s (second run)
20 reducers (1 master, 19 cores): 4 min 25 s (first run), 4 min 52 s (second run)
Comparing the run times for the different numbers of reducers, we see that running the job twice with exactly the same arguments can still give a notable difference (41 s apart with 10 reducers, 27 s apart with 20). We also doubled the reducers from 10 to 20, but the run time did not improve nearly as much as the doubling would suggest. With too many reducers, more time is spent on communication and on calling functions (starting and coordinating tasks), which offsets the benefit of having more reducers. That means adding reducers beyond some point may even slow down the overall job.
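To put the reducer argument in numbers, the sketch below computes speedups from the listed run times. It averages the two runs where two were recorded. It pairs 5, 10, and 20 reducers with the 4-, 9-, and 19-core clusters in the order they are listed, which is an assumption about how the measurements line up. Perfect scaling would make each doubling a 2.00x speedup.

```java
import java.util.Locale;

// Speedup arithmetic for the measured run times (seconds).
class SpeedupSketch {
    static double seconds(int minutes, int secs) {
        return minutes * 60 + secs;
    }

    static double speedup(double before, double after) {
        return before / after;
    }

    public static void main(String[] args) {
        double t5 = seconds(9, 0);                          // 540 s
        double t10 = (seconds(6, 11) + seconds(5, 30)) / 2; // 350.5 s (average of two runs)
        double t20 = (seconds(4, 25) + seconds(4, 52)) / 2; // 278.5 s (average of two runs)
        // Locale.ROOT keeps "." as the decimal separator regardless of system locale.
        System.out.printf(Locale.ROOT, "5 -> 10 reducers: %.2fx (ideal 2.00x)%n", speedup(t5, t10));
        System.out.printf(Locale.ROOT, "10 -> 20 reducers: %.2fx (ideal 2.00x)%n", speedup(t10, t20));
        // prints 1.54x and 1.26x
    }
}
```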
Ankai Liang's Blog
Strong but not aggressive, weak but not feeble.
This is an original article, licensed under the CC BY-NC-SA 4.0 agreement. For complete reproduction, please acknowledge the source as Courtesy of Ankai Liang's Blog.