MATLAB Answers


Can I install Parallel Computing Toolbox without a full installation of MATLAB?

Asked by Ernesto Samano on 13 Nov 2018
Latest activity Commented on by Jason Ross
on 6 Dec 2018
I need to run MATLAB software inside a Hadoop cluster. The cluster is maintained by a third party that is not willing to support a full MATLAB installation, but can allow the installation of a "plugin" or smaller "library".
The idea is to develop algorithms on desktop computers and then send them to Hadoop to be executed. The only requirement is not to install anything inside the cluster, or at least not a full MATLAB installation.



1 Answer

Answer by Jason Ross
on 14 Nov 2018

It is possible to compile an application in MATLAB and then submit that application to run on the cluster. The documentation is here and there is an example of the workflow here.
Note also that there are other ways to tackle the problem. For example, if you have a common filesystem that the Hadoop cluster nodes have access to, you could place a MATLAB installation on the common filesystem and set the ClusterMatlabRoot property to use that, with no local installation required. This does make a lot of assumptions about your computing environment, though: that you have a common filesystem, that you have the network capacity to run MATLAB over the network at your scale, etc.
Another approach is to install only the toolboxes you require. It is recommended to install MDCS (MATLAB Distributed Computing Server) with all toolboxes, so that any arbitrary command from any arbitrary client will have access to the same functions on the cluster and avoid "function not found" errors. If you know that you only use functions from a few toolboxes, you can reduce the installation size by installing only those, but this takes more tending and upkeep than just installing everything.


Thanks for this answer, it's definitely a lot of help.
I have tried to find a detailed guide with these alternatives, do you know where I can find one?
For the common filesystem installation, all you do is install MATLAB as normal, but in a location that is shared on your network. When someone submits a job to the Hadoop cluster, they set the ClusterMatlabRoot property on the Hadoop object to that location. Then, when the job runs, it executes MATLAB from that area. The upside of this install method: you only have one installation location, so no local installs. The downside: you are executing an application from a network location, so you will have more network traffic and the application will run somewhat more slowly than if it were coming off local disk. It is highly dependent on your network capability and architecture.
I don't know that any detailed guide exists for the above, as it's the same process as sharing/exporting anything else on a network filesystem.
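As a rough illustration of the client-side step described above, a minimal sketch might look like the following. Both paths are hypothetical placeholders, not values from this thread, and the snippet assumes Parallel Computing Toolbox is installed on the desktop and the shared location is mounted at the same path on every cluster node.

```matlab
% Minimal sketch: point a Hadoop cluster object at a shared MATLAB install.
% Both paths below are hypothetical; substitute your site's values.
hadoopInstall = '/usr/lib/hadoop';             % assumed Hadoop install folder on the client
sharedMatlab  = '/shared/apps/MATLAB/R2018b';  % assumed network location of MATLAB

% Create the cluster object and tell the workers where to find MATLAB.
cluster = parallel.cluster.Hadoop( ...
    'HadoopInstallFolder', hadoopInstall, ...
    'ClusterMatlabRoot',   sharedMatlab);

% Use the cluster as the execution environment for mapreduce.
mr = mapreducer(cluster);
```

Subsequent mapreduce calls that are passed mr would then start MATLAB on the nodes from the shared location rather than from a local install.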
For the installation of only the toolboxes you require, all you need to do is select only those toolboxes during the installation of MDCS. The default mode is to install everything, but you can also select only what you want/need to slim it down. The reason for the recommendation of installing everything is to ensure that submitted jobs find all the functions they require from any client. If you know with 100% certainty what toolboxes are used, there's no hard requirement to install ones you don't need. As I mentioned before, the big risk here is that someone starts using a new toolbox that isn't installed on the cluster, and gets "function not found" errors when their job runs, causing rework of the cluster installations to add that toolbox.
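One way to get a first estimate of which toolboxes a job depends on is a dependency analysis of its entry-point function. The sketch below assumes a hypothetical file named myJobFunction.m. The analysis is static, so it can miss functions that are only resolved at run time (for example, names built as strings), and the result should be treated as a starting point rather than proof.

```matlab
% Minimal sketch: list the MathWorks products a function appears to need.
% 'myJobFunction.m' is a hypothetical entry point for the submitted job.
[files, products] = matlab.codetools.requiredFilesAndProducts('myJobFunction.m');

% products is a struct array; the Name field holds each product name.
productNames = {products.Name}';
disp(productNames);
```

Any product in that list other than MATLAB itself would need to be present in the slimmed-down cluster installation.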
