double parameter not copying correctly

We are attempting to use the rapidjson project (https://code.google.com/p/rapidjson/) to parse a JSON file, but we get a segfault when running it natively in the Xeon Phi Linux environment. It runs fine on the Xeon chip, however. The segfault occurs in the following lines of code in the rapidjson project:

reader.h

d *= internal::Pow10(exp + expFrac);
handler.Double(minus ? -d : d);

document.h

//this is the function that is called above
void Double(double d) { new (stack_.template Push<ValueType>()) ValueType(d); }

I don't have a clue what the .template Push is doing; in fact, I have never even seen syntax like that. Regardless, I don't believe it is the source of the issue: when I step through it with gdb, the error occurs because the "d" passed in to void Double in document.h is some random double (like 3.34155418345e-317) with the address of 0x0. GDB output is below:

Breakpoint 1, rapidjson::GenericReader<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator> >::ParseNumber<1u, rapidjson::GenericInsituStringStream<rapidjson::UTF8<char> >, rapidjson::GenericDocument<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator> > > (this=0x7fffffffcbe8,
stream=..., handler=...) at /home/jjekeli/workspace/com.src.ewir.keystonert_xeonphi/src/rapidjson/reader.h:636
636 d *= internal::Pow10(exp + expFrac);
(gdb) print d
$1 = 45,000
(gdb) next
637 handler.Double(minus ? -d : d);
(gdb) print d
$2 = 4500
(gdb) print /a d
$3 = 0x1194
(gdb) print minus
$4 = false
(gdb) step
0x000000000044e7e4 in rapidjson::GenericDocument<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator> >::Double (this=0x83df98,
d=4.2699109613558058e-317) at /home/jjekeli/workspace/com.src.ewir.keystonert_xeonphi/src/rapidjson/document.h:776
776 void Double(double d) { new (stack_.template Push<ValueType>()) ValueType(d); }
(gdb) print d
$4 = 4.2699109613558058e-317
(gdb) print /a d
$5 = 0x0

As you can see, before being passed into the Double function, d has a valid value and a valid memory address, but upon entering the Double function it no longer has a valid memory address or a valid value, which causes a segfault.

Any thoughts as to why this may be occurring?

Could you print out the assembly code for the function call and the entry point and send it to us? In addition, you might want to try compiling with a lower optimization level to see whether the code works in that case.

How can I get the assembly code for the function call and entry point? Also, we have already tried the lowest optimization level, with no luck.

A long shot, but have you checked that you have enough stack space allocated? 

The values I see on the card are:

% ulimit -a
-f: file size (blocks) unlimited
-t: cpu time (seconds) unlimited
-d: data seg size (kb) unlimited
-s: stack size (kb) 8192
-c: core file size (blocks) 0
-m: resident set size (kb) unlimited
-l: locked memory (kb) 64
-p: processes 61357
-n: file descriptors 10240
-v: address space (kb) unlimited
-w: locks unlimited

It might be worth trying with ulimit -s unlimited.

Somewhat worried about setting the stack size to unlimited... isn't that a good way to smash one's stack and destroy the kernel?

Nothing you do with the stack should be able to cause the kernel to crash.

However, if you are paranoid, by all means just double the size and see whether that affects what happens; if it does, that's a strong signal that stack space may be the problem. (If your code is threaded, you may of course need to change the pthread stack size rather than just using ulimit.)

You should also be able to work out from the debugger how much stack you're using. Look at the value of %rsp at the point of the crash, then go back to the top of the thread's stack and look at the address of a local variable there. The difference is (close to) the stack usage. If it's nearly the stack limit, that's a strong hint.

Also, if %rsp at the point of the crash is just below a page boundary, that's suspicious. If you then look at /proc/pid/maps for the process, you should be able to see whether %rsp is pointing at valid memory.

Setting the stack size to unlimited did not alleviate the problem. Sorry.

The stack pointer at the beginning of the thread was 0x7fffffffe100. At the point of the crash, the stack pointer was at 0x7fffffffc2f0, a difference of 7696. That seemed fairly close to the original stack size, but setting ulimit -s unlimited did not help.

I wasn't able to access the maps through /proc/pid/maps: the pid was 7497, but there was no 7497 directory.

It doesn't seem to be the stack, then: the limit is in KB, not bytes, so you are a long way from it.

I'll crawl back under my stone :-)
