How can I check whether Hadoop and MATLAB are integrated properly?

Hi,
We integrated MATLAB R2016b with Hadoop 2.7.2, but we are not sure it is working properly. How can we check that the program is running on the cluster and that each node is contributing to the processing?
1. Map-reduce on the cluster (with 50 nodes) takes more time than MATLAB map-reduce on a single computer.
2. Where do we set MATLAB Distributed Computing Server properties, such as how many nodes there are, the parallel pool, etc.?
3. How do we see the MATLAB+Hadoop cluster configuration in the MATLAB interface?
Please provide the details in an answer. Thanks.
  1 Comment
lov kumar on 2 Jun 2019
Please help me.
How do I fix this error?
Error using mapreduce (line 124)
The HADOOP job failed to submit. It is possible that there is some issue with the HADOOP configuration.
Error in bg1 (line 9)
meanDelay = mapreduce(ds,@meanArrivalDelayMapper,@meanArrivalDelayReducer,mr,...
I am using this code:
setenv('HADOOP_HOME', 'C:/hadoop-2.8.0');   % point MATLAB at the local Hadoop installation
cluster = parallel.cluster.Hadoop;          % cluster object describing the Hadoop/Yarn cluster
mr = mapreducer(cluster);                   % make mapreduce run on the cluster instead of locally
ds = datastore('hdfs://localhost:9000/lov/airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', 'ArrDelay', 'ReadSize', 1000);
preview(ds)
outputFolder = 'hdfs://localhost:9000/results/out1';
meanDelay = mapreduce(ds, @meanArrivalDelayMapper, @meanArrivalDelayReducer, mr, ...
    'OutputFolder', outputFolder)
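For reference, the mapper and reducer names above match the MathWorks airline-delay example; a minimal version of those two functions (a sketch of that example, not necessarily the poster's exact files) is:
function meanArrivalDelayMapper(data, info, intermKVStore)
% Emit a partial [count, sum] of the non-missing arrival delays in this chunk.
delays = data.ArrDelay(~isnan(data.ArrDelay));
add(intermKVStore, 'PartialCountSumDelay', [length(delays), sum(delays)]);
end

function meanArrivalDelayReducer(intermKey, intermValIter, outKVStore)
% Combine the partial [count, sum] pairs into one overall mean delay.
count = 0;
total = 0;
while hasnext(intermValIter)
    countSum = getnext(intermValIter);
    count = count + countSum(1);
    total = total + countSum(2);
end
add(outKVStore, 'MeanArrivalDelay', total/count);
end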


Accepted Answer

Rick Amos on 12 May 2017
To check whether MATLAB is running on the Hadoop cluster correctly, your best bet is to look at the Hadoop/Yarn web UI. By default, this is:
http://hadoophostname:8088/
where hadoophostname should be replaced by the hostname of the head node of Hadoop. During a mapreduce operation in MATLAB, you should see a running job in the web UI.
If you don't see a job running, it is possible that the Hadoop installation you provided to MATLAB is not configured to run jobs in cluster mode. This can happen if the Hadoop property mapreduce.jobtracker.address found in ${HADOOP_INSTALL}/etc/hadoop/mapred-site.xml has not been set or has been set to "local". This property should be set to the hostname of the head node of the cluster; the quick check below shows one way to inspect it from MATLAB.
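A sketch of that check, assuming HADOOP_HOME points at the same installation MATLAB uses:
% Print the mapred-site.xml that this Hadoop installation will read, so the
% value of mapreduce.jobtracker.address can be checked by eye.
hadoopHome = getenv('HADOOP_HOME');                                  % or the cluster's HadoopInstallFolder
confFile = fullfile(hadoopHome, 'etc', 'hadoop', 'mapred-site.xml');
disp(fileread(confFile))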
In a Hadoop cluster, the number of workers that are launched is controlled by Hadoop. By default, it will run as many workers as it can fit in the memory given to it; the sketch below shows one way to pass memory-related properties along with the job.
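If you want to hand extra Hadoop properties to the job from the MATLAB side, the cluster object exposes a HadoopProperties map (it appears in the property dump later in this thread). A hedged sketch, assuming the standard Hadoop 2.x property names apply to your distribution:
cluster = parallel.cluster.Hadoop;
props = cluster.HadoopProperties;              % containers.Map handed to Hadoop at job submission
props('mapreduce.map.memory.mb') = '2048';     % illustrative Yarn container size for map tasks
props('mapreduce.reduce.memory.mb') = '2048';  % and for reduce tasks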
  1 Comment
Pulkesh Haran on 16 May 2017
So we have integrated MATLAB and Hadoop correctly; we can see the running program in the Hadoop interface. But I have the following questions:
1. Why does the MATLAB+Hadoop cluster (with 50 nodes) take more time to run a map-reduce job than MATLAB map-reduce on a single machine?
2. What is the role of MDCS?
3. How do we define or configure MDCS properties?
4. In the MATLAB GUI, how can we see the MATLAB+Hadoop cluster, or the program running on the cluster, with detailed information?
5. The MATLAB+Hadoop cluster with 50 nodes takes more time than normal MATLAB map-reduce running on a single machine. How can we resolve that problem?
6. While converting 50K images into a sequence file, we are getting errors such as:
i. Unable to read MAT-file ...: not a binary file.
ii. A serialization error.
How do we resolve them?
Thanks.
Kojiro on 11 May 2017 at 0:51
1. That seems strange.
2. MDCS (MATLAB Distributed Computing Server) performs the parallel execution of mapreduce on Hadoop.
3. In order to use MDCS with Hadoop, you need to set the following parameters.
In the Hadoop 2.x settings, set "yarn.nodemanager.user-home-dir" in $HADOOP_HOME/etc/hadoop/yarn-site.xml,
and in MATLAB, set the HADOOP_HOME environment variable and create a parallel.cluster.Hadoop object and a mapreducer, as in the sketch below.
This example will help you.
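A minimal sketch of the MATLAB side, with illustrative paths rather than ones from this thread:
setenv('HADOOP_HOME', '/usr/local/hadoop');               % where Hadoop is installed locally
cluster = parallel.cluster.Hadoop;                        % picks up HADOOP_HOME by default
cluster.ClusterMatlabRoot = '/usr/local/MATLAB/R2016b';   % MATLAB root on the worker nodes
mr = mapreducer(cluster);                                 % later mapreduce calls now run on Hadoop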
4. You can monitor the status using the Hadoop web UI (http://YOUR_HADOOP_HOST:8088/) or with the following command in a terminal:
yarn application -status ID
5, 6. Could you give us more detail on your MATLAB scripts? Or did you also ask MathWorks technical support?
Pulkesh Haran on 14 May 2017 at 7:08
We are facing this problem while running our MATLAB script on the cluster for 50K images. We are creating a sequence file for 50 thousand images. I am attaching the MATLAB code and other details. Please help us with this.
------------------------------- MATLAB Error 1 -------------------------------
Parallel mapreduce execution on the Hadoop cluster:
********************** MAPREDUCE PROGRESS **********************
Map 0% Reduce 0%
Map 1% Reduce 0%
Map 33% Reduce 0%
Map 93% Reduce 0%
Error using mapreduce (line 118)
Unable to read MAT-file /tmp/tp1ce5fe8e_0189_4e64_85a3_b671c61453a4/task_0_675_MAP_4.mat: not a binary MAT-file.
Error in create_seq (line 101)
seqds = mapreduce(imageDS, @identityMap, @identityReduce,'OutputFolder',output_identity);
------------------------------- MATLAB Error 2 -------------------------------
> whos -file '/home/nitw_viper_user/task_0_1081_MAP_4.mat'
  Name      Size      Bytes    Class         Attributes
  Error     1x1       2336     MException
>> Error
Error =
  MException with properties:
    identifier: 'parallel:internal:DeserializationException'
       message: 'Deserialization threw an exception.'
         cause: {0×1 cell}
         stack: [3×1 struct]
------------------------------- MATLAB Error 3 -------------------------------
>> create_seq
Hadoop with properties:
                 HadoopInstallFolder: '/home/nitw_viper_user/hadoop-2.7.2'
             HadoopConfigurationFile: ''
                  SparkInstallFolder: ''
                    HadoopProperties: [2×1 containers.Map]
                     SparkProperties: [0×1 containers.Map]
                   ClusterMatlabRoot: '/usr/local/MATLAB/R2016b'
    RequiresMathWorksHostedLicensing: 0
                       LicenseNumber: ''
                     AutoAttachFiles: 1
                       AttachedFiles: {}
                     AdditionalPaths: {}
Parallel mapreduce execution on the Hadoop cluster:
********************** MAPREDUCE PROGRESS **********************
Map 0% Reduce 0%
Map 1% Reduce 0%
Map 2% Reduce 0%
Map 22% Reduce 0%
Map 40% Reduce 0%
Map 80% Reduce 0%
Error using mapreduce (line 118)
The HADOOP job failed to complete.
Error in create_seq (line 101)
seqds = mapreduce(imageDS, @identityMap, @identityReduce,'OutputFolder',output_identity);
Caused by:
Error using distcompdeserialize
Deserialization threw an exception.
(this "distcompdeserialize" error pair is repeated eight times in the output)
>>
Date: 07-05-2017
[WARN] BlockReaderFactory - I/O error constructing remote block reader.
java.io.IOException: Got error, status message opReadBlock BP-581788350-127.0.1.1-1490718884252:blk_1075155042_1414461 received exception org.apache.hadoop.hdfs.server.datanode.ReplicaNotFoundException: Replica not found for BP-581788350-127.0.1.1-1490718884252:blk_1075155042_1414461, for OP_READ_BLOCK, self=/192.168.192.128:60935, remote=/192.168.193.177:50010, for file /fv_13l_2/.matlaberror/task_0_797_MAP_1.mat, for pool BP-581788350-127.0.1.1-1490718884252 block 1075155042_1414461
at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:140)
at org.apache.hadoop.hdfs.RemoteBlockReader2.checkSuccess(RemoteBlockReader2.java:456)
at org.apache.hadoop.hdfs.RemoteBlockReader2.newBlockReader(RemoteBlockReader2.java:424)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:818)
at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:697)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:656)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:882)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:934)
at java.io.DataInputStream.read(Unknown Source)
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1792)
at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1769)
at org.apache.commons.io.IOUtils.copy(IOUtils.java:1744)
[WARN] DFSClient - Failed to connect to /192.168.193.177:50010 for block, add to deadNodes and continue. (same java.io.IOException and stack trace as above)
[INFO] DFSClient - Successfully connected to /192.168.193.167:50010 for BP-581788350-127.0.1.1-1490718884252:blk_1075155042_1414461
Error using mapreduce (line 118)
Unable to read MAT-file /tmp/tp3c745940_f508_4326_93f7_fc5f6fb9ef06/task_0_805_MAP_1.mat: not a binary MAT-file.
Error in main (line 270)
res = mapreduce(seqds, @Ltrp_db1_seq_file_mapper, @Ltrp_db1_reducer, 'OutputFolder', Ltrp_db1_seq_file_result);
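Since the identityMap/identityReduce functions that fail above are not shown in this thread, a minimal identity pair in MATLAB's mapreduce API (a hypothetical reconstruction, not the poster's actual files) would look like:
function identityMap(data, info, intermKVStore)
% Pass each chunk through unchanged, keyed by the file it came from.
add(intermKVStore, info.Filename, data);
end

function identityReduce(intermKey, intermValIter, outKVStore)
% Write every intermediate value back out under the same key.
while hasnext(intermValIter)
    add(outKVStore, intermKey, getnext(intermValIter));
end
end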


More Answers (0)
