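I have a large file in which every line is a mix of Chinese and English words (in other words, the keys are Chinese), and I want to sort it globally. Running the hadoop-example terasort on it produces the error below. Does sorting Chinese keys require changing the Trie implementation?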
14/03/06 16:13:46 INFO terasort.TeraSort: starting
14/03/06 16:13:47 INFO mapred.FileInputFormat: Total input paths to process : 1
14/03/06 16:13:47 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
14/03/06 16:13:47 INFO compress.CodecPool: Got brand-new compressor [.deflate]
Making 2 from 13 records
Step size is 6.5
14/03/06 16:13:47 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/03/06 16:13:47 INFO mapred.JobClient: Running job: job_201403061609_0004
14/03/06 16:13:48 INFO mapred.JobClient: map 0% reduce 0%
14/03/06 16:13:55 INFO mapred.JobClient: map 50% reduce 0%
14/03/06 16:13:57 INFO mapred.JobClient: Task Id : attempt_201403061609_0004_m_000000_0, Status : FAILED
java.lang.ArrayIndexOutOfBoundsException: -23
at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner$InnerTrieNode.findPartition(TeraSort.java:91)
at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.getPartition(TeraSort.java:221)
at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.getPartition(TeraSort.java:57)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:526)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:38)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child
14/03/06 16:14:04 INFO mapred.JobClient: Task Id : attempt_201403061609_0004_m_000000_1, Status : FAILED
java.lang.ArrayIndexOutOfBoundsException: -23
at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner$InnerTrieNode.findPartition(TeraSort.java:91)
at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.getPartition(TeraSort.java:221)
at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.getPartition(TeraSort.java:57)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:526)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:38)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child
14/03/06 16:14:09 INFO mapred.JobClient: Task Id : attempt_201403061609_0004_m_000000_2, Status : FAILED
java.lang.ArrayIndexOutOfBoundsException: -23
at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner$InnerTrieNode.findPartition(TeraSort.java:91)
at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.getPartition(TeraSort.java:221)
at org.apache.hadoop.examples.terasort.TeraSort$TotalOrderPartitioner.getPartition(TeraSort.java:57)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:526)
at org.apache.hadoop.mapred.lib.IdentityMapper.map(IdentityMapper.java:38)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.Child.main(Child
14/03/06 16:14:15 INFO mapred.JobClient: Job complete: job_201403061609_0004
14/03/06 16:14:15 INFO mapred.JobClient: Counters: 29
14/03/06 16:14:15 INFO mapred.JobClient: File System Counters
14/03/06 16:14:15 INFO mapred.JobClient: FILE: Number of bytes read=156
14/03/06 16:14:15 INFO mapred.JobClient: FILE: Number of bytes written=171585
14/03/06 16:14:15 INFO mapred.JobClient: FILE: Number of read operations=0
14/03/06 16:14:15 INFO mapred.JobClient: FILE: Number of large read operations=0
14/03/06 16:14:15 INFO mapred.JobClient: FILE: Number of write operations=0
14/03/06 16:14:15 INFO mapred.JobClient: HDFS: Number of bytes read=266
14/03/06 16:14:15 INFO mapred.JobClient: HDFS: Number of bytes written=0
14/03/06 16:14:15 INFO mapred.JobClient: HDFS: Number of read operations=2
14/03/06 16:14:15 INFO mapred.JobClient: HDFS: Number of large read operations=0
14/03/06 16:14:15 INFO mapred.JobClient: HDFS: Number of write operations=0
14/03/06 16:14:15 INFO mapred.JobClient: Job Counters
14/03/06 16:14:15 INFO mapred.JobClient: Failed map tasks=1
14/03/06 16:14:15 INFO mapred.JobClient: Launched map tasks=5
14/03/06 16:14:15 INFO mapred.JobClient: Data-local map tasks=5
14/03/06 16:14:15 INFO mapred.JobClient: Total time spent by all maps in occupied slots (ms)=26032
14/03/06 16:14:15 INFO mapred.JobClient: Total time spent by all reduces in occupied slots (ms)=0
14/03/06 16:14:15 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/03/06 16:14:15 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/03/06 16:14:15 INFO mapred.JobClient: Map-Reduce Framework
14/03/06 16:14:15 INFO mapred.JobClient: Map input records=7
14/03/06 16:14:15 INFO mapred.JobClient: Map output records=7
14/03/06 16:14:15 INFO mapred.JobClient: Map output bytes=95
14/03/06 16:14:15 INFO mapred.JobClient: Input split bytes=105
14/03/06 16:14:15 INFO mapred.JobClient: Combine input records=0
14/03/06 16:14:15 INFO mapred.JobClient: Combine output records=0
14/03/06 16:14:15 INFO mapred.JobClient: Spilled Records=7
14/03/06 16:14:15 INFO mapred.JobClient: CPU time spent (ms)=410
14/03/06 16:14:15 INFO mapred.JobClient: Physical memory (bytes) snapshot=376647680
14/03/06 16:14:15 INFO mapred.JobClient: Virtual memory (bytes) snapshot=1845735424
14/03/06 16:14:15 INFO mapred.JobClient: Total committed heap usage (bytes)=331022336
14/03/06 16:14:15 INFO mapred.JobClient: org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
14/03/06 16:14:15 INFO mapred.JobClient: BYTES_READ=88
14/03/06 16:14:15 INFO mapred.JobClient: Job Failed: NA
java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1388)
at org.apache.hadoop.examples.terasort.TeraSort.run(TeraSort.java:248)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.examples.terasort.TeraSort.main(TeraSort.java:257)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
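Reply from a uj5u.com user:
That is a Java array index out of bounds error. Try setting a breakpoint and stepping through it in a debugger.
Reply from a uj5u.com user: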
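The exception is thrown in findPartition. It is caused by the Chinese keys: their bytes fall outside the ASCII range. Java's byte type is signed, so any UTF-8 byte above 0x7F reads as a negative number, and using it directly as an array index fails. The -23 in the exception is consistent with this, since it is the signed value of 0xE9, a common lead byte for Chinese characters in UTF-8.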
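The signed-byte behaviour described above can be reproduced outside Hadoop. The following is only a minimal sketch, not the TeraSort source: the class name ByteIndexDemo, the helper methods, and the 256-entry child array are hypothetical stand-ins for a trie node with one child per byte value. It assumes the key is UTF-8 encoded and uses the character U+9577 as an example key, whose first UTF-8 byte is 0xE9.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it does not reproduce the actual TeraSort code.
public class ByteIndexDemo {

    /** Converts a signed Java byte into an index in the range 0..255. */
    public static int unsignedIndex(byte b) {
        return b & 0xff;
    }

    /** Returns the first UTF-8 byte of the key as Java sees it (signed). */
    public static byte firstByte(String key) {
        return key.getBytes(StandardCharsets.UTF_8)[0];
    }

    public static void main(String[] args) {
        // Stand-in for a trie node that has one child slot per byte value.
        int[] child = new int[256];

        // U+9577 is encoded in UTF-8 as E9 95 B7.
        byte b = firstByte("\u9577");
        System.out.println("signed value:   " + b);                // -23
        System.out.println("unsigned value: " + unsignedIndex(b)); // 233

        try {
            // Indexing with the raw signed byte uses -23 as the index.
            int unused = child[b];
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("raw byte index failed");
        }

        // Masking with 0xff keeps the index inside the array.
        int ok = child[unsignedIndex(b)];
        System.out.println("masked index works, value = " + ok);
    }
}
```

If the partitioner in your Hadoop version indexes its child array with the raw key byte, masking that byte in the same way in a patched copy of the partitioner may be enough, without redesigning the Trie itself. Check the findPartition source in your version before relying on this.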
