I wrote a MATLAB script that solves the Laplace equation with oblique boundary conditions using the Boundary Element Method (BEM). Everything worked surprisingly smoothly until I modified the code to use sparse matrices.
I decided not to initialize the matrix from triplets but to use spalloc instead. As far as I know, the triplets have to be held in RAM alongside the sparse matrix being created, which was impossible given the size of the problem.
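To make the two construction routes concrete, here is a minimal small-scale sketch. The size and the entries (n, rows, cols, vals) are made up for illustration and are not taken from my script; it only shows that both routes produce the same matrix.

```matlab
% Toy comparison of the two construction routes (illustrative values only).
n    = 5;
rows = [1 1 3 5];
cols = [2 4 3 1];
vals = [10 20 30 40];

% Route 1: triplets. rows, cols and vals stay in memory while sparse() builds A.
A = sparse(rows, cols, vals, n, n);

% Route 2: preallocate with spalloc, then fill one row at a time.
B = spalloc(n, n, numel(vals));
for m = 1:n
    k = (rows == m);                 % entries that belong to row m
    if any(k)
        B(m, :) = sparse(1, cols(k), vals(k), 1, n);
    end
end

assert(isequal(A, B));               % both routes give the same matrix
```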
My code for assembling this matrix is as follows:
M = spalloc(sz,sz,elements_sum);   % preallocate room for all non-zeros
parfor m = 1:sz                    % one row of M per iteration
    [row,ind] = whole_Mrow(mesh_generated.Points,mesh_generated.node_panels,m,far_zone_dist,sz);
    M(m,:) = sparse(1,ind,row,1,sz);
end
The size of M is 283 560 x 283 560.
In the best case, it has only 11882253152 non-zero elements (I can count them exactly before assembling).
The server used for the computation has 512 GB of RAM installed. I was able to run this script when M was dense and had a size of 150 000 x 150 000.
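For reference, a back-of-the-envelope memory estimate for these sizes. It assumes 64-bit MATLAB and the usual compressed-column storage for a sparse double matrix (8 bytes per value, 8 bytes per row index, 8 bytes per column pointer); the variable names are only for this sketch.

```matlab
% Rough memory estimate, assuming 64-bit MATLAB and compressed-column storage.
n       = 283560;          % dimension of the sparse M
nnz_est = 11882253152;     % non-zero count quoted above

bytes_sparse = 16 * nnz_est + 8 * (n + 1);   % values + row indices + column pointers
bytes_dense  = 8 * 150000^2;                 % the dense case that did run

fprintf('sparse M:               %.1f GB\n', bytes_sparse / 1e9);
fprintf('dense 150000 x 150000:  %.1f GB\n', bytes_dense / 1e9);
fprintf('fill ratio of sparse M: %.1f %%\n', 100 * nnz_est / n^2);
```

Under these assumptions the sparse M alone needs roughly 190 GB, slightly more than the 180 GB of the dense 150 000 x 150 000 case, and about 15% of its entries are non-zero.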
mod_parfor_progress is a function that tracks parfor progress (a modified version of parfor_progress, written by Jeremy Scheff).
The code was working well as long as M was dense.
However, if M is sparse, the script hangs. I can see from the contents of the file parafor_progres.txt that the loop filled all rows of M, but then everything was stuck for a very long time (more than twice the time spent on the loop execution, 76 hours!).
Could you explain to me what MATLAB is doing? My assumption is that there is some sort of memory defragmentation, which is inefficient in the case of such a big matrix.
I ran a RAM usage test using a small logger written in bash, but the results puzzle me even more. The memory usage tends to explode (by 200 GB or more) after the assembly loop ends.
Summing up:
1) The progress file suggests that the matrix M is assembled; however, nothing after the loop is executed.
2) The htop output confirms that all workers have terminated. Only one MATLAB process is running, at 100% CPU usage (a single CPU core).
3) The memory usage fluctuates a lot, but only after M was technically assembled. In some cases the whole script is terminated for this reason.
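One reduced-size experiment that might help narrow this down is sketched below. It is a toy under stated assumptions: the sizes n and per_row are hypothetical, random data stands in for whole_Mrow, and a serial for loop replaces the parfor. It times the row-wise fill used in the script against a column-wise fill of the transpose, since MATLAB stores sparse matrices by column.

```matlab
% Toy timing experiment (hypothetical sizes, random data instead of whole_Mrow).
n       = 2000;    % toy dimension, far below 283560
per_row = 50;      % toy number of non-zeros per row

% Row-wise fill of a preallocated sparse matrix, as in the script.
M = spalloc(n, n, n * per_row);
tic;
for m = 1:n
    ind = randperm(n, per_row);                        % unique column indices
    M(m, :) = sparse(1, ind, rand(1, per_row), 1, n);
end
t_rows = toc;

% Column-wise fill of the transpose, matching the column-oriented storage.
Mt = spalloc(n, n, n * per_row);
tic;
for m = 1:n
    ind = randperm(n, per_row);                        % unique row indices
    Mt(:, m) = sparse(ind, 1, rand(1, per_row), n, 1);
end
t_cols = toc;

fprintf('row-wise: %.2f s, column-wise: %.2f s\n', t_rows, t_cols);
```

Repeating this for a few increasing values of n would show how each variant scales, which may indicate whether the row-wise assignment itself is a plausible source of the cost.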
I hope that I have provided enough details about the problem, but any suggestions for further debugging are welcome.