How to use memmapfiles safely for inter-process communication?

I am interested in using memory mapped files to implement inter-process communication between a Matlab process and a foreign process. Portability (Windows / Linux) is a concern, but my main concern is reliability.
Looking at the example in Share Memory Between Applications, I am surprised the code is that simple. Does the code actually work? The shared byte m.Data(1) controls which process is allowed to access the shared data, but m.Data(1) itself doesn't seem to be protected against data races. To implement this example in C++, one would typically add some synchronization object, either a locking one (mutex, semaphore or condition variable) or a lock-free one (involving some kind of memory barrier). The Boost library provides some good examples of such mechanisms.
Can we use such synchronization objects with Matlab's memmapfiles? Or is there some kind of magic the Matlab compiler adds behind the scenes, that makes my concern pointless?
Edit: I am specifically concerned by the compiled code of this example.

Answers (1)

In MATLAB on Intel (x86) platforms I believe this code is safe and will work correctly. Because the communication process is token based and designed to work only between two processes, a simple mechanism is possible. If this were done in C with a memory mapped file, the only change needed would be to declare the first memory location atomic.
Also note that this is not the most efficient way to do this type of thing. The sleep calls dictate the maximum call throughput and the polling mechanism is inefficient. Doing this with one or two mex files and proper inter-process communication would have much better performance.
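The remark above about declaring the first memory location atomic can be sketched in C++ roughly as follows. This is only an illustration under stated assumptions, not the code from the MATLAB example: it assumes a POSIX system (mmap, fork), uses an anonymous shared mapping in place of the mapped file, and the names SharedRegion and run_handshake are made up for this sketch. It also assumes std::atomic<std::uint8_t> is lock-free on the target, which is what allows it to be shared between processes.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <new>
#include <sched.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical layout of the shared region (assumption for this sketch):
// the token says whose turn it is, the rest is ordinary data.
struct SharedRegion {
    std::atomic<std::uint8_t> token;  // 0: producer's turn, 1: consumer's turn
    int payload;                      // value handed over in each round
    long sum_seen;                    // running total kept by the consumer
};

// Hands `rounds` values from a parent process to a forked child and
// returns the sum the child observed, or -1 if mmap/fork failed.
long run_handshake(int rounds) {
    assert(rounds >= 0);
    void* mem = mmap(nullptr, sizeof(SharedRegion), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return -1;

    SharedRegion* region = new (mem) SharedRegion;
    region->token.store(0, std::memory_order_relaxed);
    region->payload = 0;
    region->sum_seen = 0;

    pid_t pid = fork();
    if (pid < 0) {
        munmap(mem, sizeof(SharedRegion));
        return -1;
    }

    if (pid == 0) {
        // Child: consumer. The acquire load pairs with the producer's
        // release store, so payload is visible once token == 1 is seen.
        for (int i = 0; i < rounds; ++i) {
            while (region->token.load(std::memory_order_acquire) != 1) {
                sched_yield();
            }
            region->sum_seen += region->payload;
            region->token.store(0, std::memory_order_release);  // hand back
        }
        _exit(0);
    }

    // Parent: producer. Waits for its turn, writes the data, then
    // publishes it by flipping the token with a release store.
    for (int i = 1; i <= rounds; ++i) {
        while (region->token.load(std::memory_order_acquire) != 0) {
            sched_yield();
        }
        region->payload = i;
        region->token.store(1, std::memory_order_release);
    }

    waitpid(pid, nullptr, 0);
    long result = region->sum_seen;
    munmap(mem, sizeof(SharedRegion));
    return result;
}
```

Calling run_handshake(100) should return 5050 (1 + 2 + ... + 100), since each value is handed over exactly once. On Windows the mapping and process creation would need the Win32 equivalents.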

11 Comments

Thanks for this quick reply.
I am not concerned with efficiency here, just portability and reliability. I would like to be very sure the compiled code of this example works correctly.
I am not sure I understand your reasoning about "token based" and "simple mechanism". But if the equivalent C program would require additional synchronization, such as atomic operations (which I believe it would), the question is: why is there no such requirement in the MATLAB code?
Understanding would be nice, but it is not a requirement for me, if I can have some guarantee...
MATLAB will not optimize a matrix read operation on a memmapped object's data, so there is no need for special handling. In C the atomic would be needed to prevent the polling loop from being optimized out or the data element from being placed in a register.
This is a simple example because there are well defined ownership/writing rules and only two processes. If you tried to have multiple processing threads then this solution would break down.
I am not sure what you mean by compiled code. Are you planning on using MATLAB Coder to produce a mex file or the MATLAB Compiler to produce an application?
By compiled code I mean what the MATLAB Compiler generates.
If I'm correct, you are saying MATLAB will not try to optimize read (and probably also write) operations on a memmapped object. This means there will be no instruction reordering at compile time.
But what about reordering at execution time (out-of-order execution)? Does the MATLAB compiler add some stuff behind the scenes to prevent reordering at execution time?
The following link is related to my question (though not specific to MATLAB). why-is-it-in-valid-to-synchronize-a-single-reader-and-a-single-writer I've found the accepted answer very informative, and point 3 of "What goes wrong" is about out-of-order execution.
That link is correct, and theoretically the MATLAB code could have a problem due to out-of-order execution. However, given the amount of code that most likely executes between the reads in typical MATLAB code execution, I feel that encountering a problem is somewhere between very unlikely and never going to happen. The optimized C code in that example is a completely different situation. In the future it is possible that MATLAB or processor changes could affect the odds...
The MATLAB Compiler runs code identically to how it runs in MATLAB, so its use has no effect on the answer.
I don't see the point with the amount of code being executed between two reads. If this amount is high, surely the reads and writes to m.Data(1) are very rare compared to the remaining parts. But out-of-order execution can apply potentially to any memory access. So the probability that it affects m.Data(1) in a way that breaks the expected global behavior only depends on the immediate execution context when m.Data(1) is accessed and the specific rules for out-of-order execution on the platform.
As far as I understand, the MATLAB system (just) guarantees that no instruction reordering is done at compile time when dealing with memmapped objects. That's a good point, but not enough to get proper synchronization. Still, I don't get why, even for a simple 1 producer 1 consumer setup, C/C++ (and other) programmers carefully make use of synchronization objects to guarantee data consistency, while MATLAB users would just have to say "it just isn't going to happen".
Considering that the scientists at my office do care about the consistency of their data, what would you recommend?
Is there any safe way to use MATLAB memmapped objects for a simple 1 producer 1 consumer task, within the MATLAB language, or do we have to do the actual I/O in C++ via mex files to guarantee data consistency?
There is a confusion here between MATLAB Compiler and MATLAB Coder.
MATLAB Compiler is MATLAB without the desktop or command prompt, so any MATLAB behavior will be the same under MATLAB Compiler. MATLAB Compiler builds threaded data structures that are interpreted at run time -- it is, in that way, much the same as pcode, all the tokens parsed into data structures. MATLAB Compiler does not generate C/C++ code. There is no possibility of instructions being reordered compared to what MATLAB itself would use.
MATLAB Coder generates C/C++ code, and there are risks with that in the area of optimization and reordering.
The confusion, if any, can't be mine: I am not concerned with C/C++ code, which I am familiar with, and if I wanted to ask anything about C/C++ code, I surely would ask somewhere else.
When I mentioned C or C++ code, it was to give some examples of how people deal with synchronization in concurrent programming in other languages, since many concepts in this field are the same no matter the language.
And since Philip Borghesani asked me, I said that by compiled code I meant what (whatever) the MATLAB Compiler generates, NOT what you could obtain by using MATLAB Coder and compiling the generated C++ code yourself.
Philip Borghesani already clarified to me the fact that the MATLAB compiler runs code identically to how it runs in MATLAB. That's an important point to know, but doesn't help much regarding my concern.
He also already explained that no instruction reordering is done at compile time (at least when dealing with memmapfile objects), which is a very good point regarding data race safety. It is, though, only part of what is needed to get proper synchronization, and the easiest part by the way. More details can be found by following the link I've given to a related question on Stack Overflow.
"But what about reordering at execution time (out-of-order execution)? Does the MATLAB compiler add some stuff behind the scenes to prevent reordering at execution time?"
No, the MATLAB Compiler does not add anything. It is exactly the same execution engine as MATLAB itself. If MATLAB gets the order right then MATLAB Compiler gets the order right.
MATLAB only reorders execution for some combinations of mathematical operations that it recognizes as matching the LINPACK or BLAS patterns or MKL, and then only when the operations are actually dispatched to those libraries. For example, A + B.*C could potentially get dispatched through a Multiply and Add routine that was technically B .* C + A, and that library in turn might convert that into the SSE2 version of Fused Multiply and Add using SIMD for up to 8 fused operations per instruction. It could happen. But it isn't going to reorder your test of the synchronization value, for example.
If I write "if (data[0] == value) b = data[1];" in C then that could end up being only a few assembly statements, and it is quite possible that a processor could execute the fetch for data[1] before the fetch for data[0]. If I write the equivalent statements in MATLAB there will be more intervening instructions, and calls, between the two operations than the size of the execution pipeline and probably more than the instruction decode pipeline, such that out-of-order execution of the two MATLAB statements is impossible for the processor.
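For comparison, the usual way to rule out that reordering in C++ is a release/acquire pair on the flag. The sketch below is illustrative only: Mailbox, publish and try_consume are hypothetical names, with `ready` standing in for data[0] and `value` for data[1], and the two sides would normally run in different threads or processes.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical stand-in for the two shared slots discussed above:
// `ready` plays the role of data[0], `value` the role of data[1].
struct Mailbox {
    std::atomic<int> ready{0};
    int value = 0;
};

// Writer side: fill in the payload first, then publish the flag.
// The release store keeps the payload write from moving after it.
void publish(Mailbox& box, int v) {
    box.value = v;
    box.ready.store(1, std::memory_order_release);
}

// Reader side: test the flag first, then read the payload.
// The acquire load keeps the payload read from moving before it.
// Returns false (and leaves `out` untouched) if nothing is published yet.
bool try_consume(Mailbox& box, int& out) {
    if (box.ready.load(std::memory_order_acquire) != 1) {
        return false;
    }
    out = box.value;
    return true;
}
```

With this pairing, a reader that sees ready == 1 is guaranteed to see the value written before the flag was set, regardless of how the compiler or the processor would otherwise reorder the two accesses.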
It is conceivable that there could be problems if you did mydata=m.Data and then accessed mydata(1) and mydata(2), but m.Data must go through an overloaded subsref operation.
If you were to access m.Data(I,J) with vector I or J then the order of the accesses to the elements is not defined and could be different for large enough arrays (because the pattern of copying out to call the high performance libraries could be different).
@Philip Borghesani
I misunderstood your point about the intervening instructions that MATLAB generates. I get it now.
In no way am I an expert in this field, but it seems to me that, regarding data race issues with out-of-order execution, one should not compare the intervening instructions with the size of the execution pipeline, nor the instruction decode pipeline. It seems to me the intervening instructions should be compared to the sizes of the reorder buffer or the reservation station. These can be fairly large: for example the reorder buffer has 224 uop entries and the reservation station has 97 uop entries on the Intel Skylake.
One should notice that, even if "only" one word of data is altered, this could lead to huge data losses: this one word could be an index needed to interpret other data.
Another thing to keep in mind is that either of the two processes can be halted for a short time if the processor that runs it decides to switch to some urgent task. This could potentially increase data loss/corruption.

Asked: 6 Jun 2017
Commented: 15 Jun 2017