project-navigation
Personal tools

Author Topic: Build of 2.5-dev fails to initialize com_genericPool before use on Darwin/Clang  (Read 8247 times)

Offline Jon_dArc

  • Squad Leader
  • ****
  • Posts: 134
    • View Profile
I'll start off by recognizing that I don't think anyone's ever said that Clang is supported as a build environment, but what I've found investigating the bug so far suggests that it isn't a pure compiler-compatibility issue, so here goes. Also, though I consider myself pretty competent with C, the issue looks to be solidly in C++-land, so there's some risk of me starting to talk out of my ass at some point.

Symptoms: attempting to run built ufo executable results in segfault. Attempting to run ufomodel exits with message "Error: Mem_Alloc: Error - no pool given [newline] alloc: src/common/mem.cpp:#656".

Last known good commit: b7ffc466749335dc45024fbc10bf014f7ea2227e (henceforth "Good")
First known bad commit: 8adf00afba41a139f7ff8c8a9a26ecc659084a21 (henceforth "Bad")

The issue, based on what I've been able to find with the debugger, is that as of Bad a custom definition of new in mem.cpp makes reference to com_genericPool, which is initialized in Mem_Init, which breakpointing has confirmed is never called in Bad.

Based on the stack trace, it looks to me like the issue is that the redefinition of new goes into effect immediately (at least relative to what we're doing), but the calls to Mem_Init (inside Qcommon_Init) occur inside the UFO:AI main, which looks to be getting inlined into CustomApplicationMain in SDL, which in turn generates a call to new before getting around to running the UFO:AI main where initialization happens.

Tracking down this issue was complicated by the fact that the main ufo executable detects the uninitialized pool and tries to upload a crash report (which, incidentally, is something I don't remember having been disclosed, nor do I remember having been offered a chance to opt out…), but because none of the initialization has happened yet net_ipv4 is still null when we try to access a member in NET_GetAddrinfoForNode, which is where the segfault comes from.

The next question, I guess, is why everyone else hasn't noticed the same problem. If anyone is building and running 2.5-dev on Darwin and isn't seeing this issue, please let me know immediately (and, in particular, if you're using GCC). Otherwise, I'm going to start looking into the platform-specific entry points to see what the difference could be.

~J

Offline Jon_dArc

  • Squad Leader
  • ****
  • Posts: 134
    • View Profile
For the record, the applicable stack traces:

ufomodel built from 8adf00afba41a139f7ff8c8a9a26ecc659084a21 stopped at line from Mem_Alloc error:
Code: [Select]
* thread #1: tid = 0x1f03, 0x000000010001482c ufomodel`operator new(unsigned long) + 12 at mem.cpp:656, stop reason = breakpoint 1.1
    frame #0: 0x000000010001482c ufomodel`operator new(unsigned long) + 12 at mem.cpp:656
    frame #1: 0x00007fff93be1509 CarbonCore`SCClientSession::checkinWithServer(unsigned int*) + 141
    frame #2: 0x00007fff93be1323 CarbonCore`_ZL22connectToCoreServicesDv + 71
    frame #3: 0x00007fff93be12a5 CarbonCore`_ZL9getStatusv + 24
    frame #4: 0x00007fff93be121f CarbonCore`scCreateSystemServiceVersion + 50
    frame #5: 0x00007fff895e22a1 LaunchServices`_ZL45SetupCoreApplicationServicesCommunicationPortv + 147
    frame #6: 0x00007fff895e2206 LaunchServices`IsLSServerConnected(bool) + 24
    frame #7: 0x00007fff895f6c89 LaunchServices`_LSApplicationCheckIn + 125
    frame #8: 0x00007fff8b1773af HIServices`_RegisterApplication + 1139
    frame #9: 0x00007fff8b176f0d HIServices`GetCurrentProcess + 31
    frame #10: 0x00007fff88f3162f HIToolbox`_GetAggregateUIMode + 37
    frame #11: 0x00007fff88f315e8 HIToolbox`IsMenuBarVisible + 42
    frame #12: 0x00007fff89caa0eb AppKit`_NSInitializeAppContext + 47
    frame #13: 0x00007fff89ca9627 AppKit`-[NSApplication init] + 519
    frame #14: 0x00007fff89ca924e AppKit`+[NSApplication sharedApplication] + 142
    frame #15: 0x0000000100023c2e ufomodel`main [inlined] CustomApplicationMain + 71 at SDLMain.m:204
    frame #16: 0x0000000100023be7 ufomodel`main + 135 at SDLMain.m:377
    frame #17: 0x0000000100004c74 ufomodel`start + 52

ufo built from 8adf00afba41a139f7ff8c8a9a26ecc659084a21 stopped at EXC_BAD_ACCESS:
Code: [Select]
* thread #1: tid = 0x1f03, 0x00000001000b5122 ufo`NET_GetAddrinfoForNode + 66 at net.cpp:928, stop reason = EXC_BAD_ACCESS (code=1, address=0x3c)
    frame #0: 0x00000001000b5122 ufo`NET_GetAddrinfoForNode + 66 at net.cpp:928
    frame #1: 0x00000001000b5e2a ufo`NET_ResolvNode(char const*, char*, unsigned long) + 42 at net.cpp:1189
    frame #2: 0x0000000100095dfc ufo`HTTP_ResolvURL + 156 at http.cpp:160
    frame #3: 0x0000000100095b59 ufo`HTTP_PutFile(char const*, char const*, char const*, upparam_s const*) + 89 at http.cpp:214
    frame #4: 0x00000001000a3fc7 ufo`Com_UploadCrashDump(char const*) + 135 at common.cpp:694
    frame #5: 0x0000000100134bff ufo`Sys_Backtrace() + 447 at unix_main.cpp:274
    frame #6: 0x000000010013497b ufo`Sys_Error(char const*, ...) + 315 at unix_main.cpp:83
    frame #7: 0x00000001000b2051 ufo`_Mem_Alloc(unsigned long, int, memPool_t*, int, char const*, int) + 65 at mem.cpp:298
    frame #8: 0x00000001000b2e7f ufo`operator new(unsigned long) + 47 at mem.cpp:656
    frame #9: 0x00007fff93be1509 CarbonCore`SCClientSession::checkinWithServer(unsigned int*) + 141
    frame #10: 0x00007fff93be1323 CarbonCore`_ZL22connectToCoreServicesDv + 71
    frame #11: 0x00007fff93be12a5 CarbonCore`_ZL9getStatusv + 24
    frame #12: 0x00007fff93be121f CarbonCore`scCreateSystemServiceVersion + 50
    frame #13: 0x00007fff895e22a1 LaunchServices`_ZL45SetupCoreApplicationServicesCommunicationPortv + 147
    frame #14: 0x00007fff895e2206 LaunchServices`IsLSServerConnected(bool) + 24
    frame #15: 0x00007fff895f6c89 LaunchServices`_LSApplicationCheckIn + 125
    frame #16: 0x00007fff8b1773af HIServices`_RegisterApplication + 1139
    frame #17: 0x00007fff8b176f0d HIServices`GetCurrentProcess + 31
    frame #18: 0x00007fff88f3162f HIToolbox`_GetAggregateUIMode + 37
    frame #19: 0x00007fff88f315e8 HIToolbox`IsMenuBarVisible + 42
    frame #20: 0x00007fff89caa0eb AppKit`_NSInitializeAppContext + 47
    frame #21: 0x00007fff89ca9627 AppKit`-[NSApplication init] + 519
    frame #22: 0x00007fff89ca924e AppKit`+[NSApplication sharedApplication] + 142
    frame #23: 0x00000001001c336e ufo`main [inlined] CustomApplicationMain + 71 at SDLMain.m:204
    frame #24: 0x00000001001c3327 ufo`main + 135 at SDLMain.m:377
    frame #25: 0x0000000100007404 ufo`start + 52

For reference, ufo built from b7ffc466749335dc45024fbc10bf014f7ea2227e stopped at breakpoint set on Mem_Init:
Code: [Select]
* thread #1: tid = 0x1f03, 0x00000001000b2b39 ufo`Mem_Init() + 25 at mem.cpp:625, stop reason = breakpoint 2.1
    frame #0: 0x00000001000b2b39 ufo`Mem_Init() + 25 at mem.cpp:625
    frame #1: 0x00000001000a5091 ufo`Qcommon_Init(int, char**) + 1473 at common.cpp:1177
    frame #2: 0x0000000100132f85 ufo`SDL_main + 37 at osx_main.cpp:68
    frame #3: 0x00000001001c43d0 ufo`-[SDLMain applicationDidFinishLaunching:] + 48 at SDLMain.m:300
    frame #4: 0x00007fff8f9f1d0e Foundation`__-[NSNotificationCenter addObserver:selector:name:object:]_block_invoke_1 + 47
    frame #5: 0x00007fff924e97ba CoreFoundation`_CFXNotificationPost + 2634
    frame #6: 0x00007fff8f9ddfc3 Foundation`-[NSNotificationCenter postNotificationName:object:userInfo:] + 65
    frame #7: 0x00007fff89cb44e3 AppKit`-[NSApplication _postDidFinishNotification] + 212
    frame #8: 0x00007fff89cb4249 AppKit`-[NSApplication _sendFinishLaunchingNotification] + 78
    frame #9: 0x00007fff89cb2f10 AppKit`-[NSApplication(NSAppleEventHandling) _handleAEOpenEvent:] + 242
    frame #10: 0x00007fff89cb2c71 AppKit`-[NSApplication(NSAppleEventHandling) _handleCoreEvent:withReplyEvent:] + 330
    frame #11: 0x00007fff92533541 CoreFoundation`-[NSObject performSelector:withObject:withObject:] + 65
    frame #12: 0x00007fff8fa147c7 Foundation`__-[NSAppleEventManager setEventHandler:andSelector:forEventClass:andEventID:]_block_invoke_1 + 101
    frame #13: 0x00007fff8fa1374e Foundation`-[NSAppleEventManager dispatchRawAppleEvent:withRawReply:handlerRefCon:] + 283
    frame #14: 0x00007fff8fa135dc Foundation`_NSAppleEventManagerGenericHandler + 105
    frame #15: 0x00007fff8f82cc25 AE`aeDispatchAppleEvent(AEDesc const*, AEDesc*, unsigned int, unsigned char*) + 200
    frame #16: 0x00007fff8f82cb03 AE`_ZL25dispatchEventAndSendReplyPK6AEDescPS_ + 38
    frame #17: 0x00007fff8f82c9f7 AE`aeProcessAppleEvent + 250
    frame #18: 0x00007fff88f3fd7d HIToolbox`AEProcessAppleEvent + 102
    frame #19: 0x00007fff89cb007d AppKit`_DPSNextEvent + 1247
    frame #20: 0x00007fff89caf735 AppKit`-[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 135
    frame #21: 0x00007fff89cac071 AppKit`-[NSApplication run] + 470
    frame #22: 0x00000001001c4a23 ufo`main [inlined] CustomApplicationMain + 1164 at SDLMain.m:227
    frame #23: 0x00000001001c4597 ufo`main + 135 at SDLMain.m:377
    frame #24: 0x0000000100006784 ufo`start + 52

~J

Offline Jon_dArc

  • Squad Leader
  • ****
  • Posts: 134
    • View Profile
I got lazy, and also had a hankering to actually play for a bit (been a few weeks now), so as a hackish stopgap I simply commented out the operator declarations in mem.c/h ; this was sufficient to resolve the issue, so at least I'm not barking entirely up the wrong tree.

~J

Nokim

  • Guest
I got lazy, and also had a hankering to actually play for a bit (been a few weeks now), so as a hackish stopgap I simply commented out the operator declarations in mem.c/h ; this was sufficient to resolve the issue, so at least I'm not barking entirely up the wrong tree.
Thanks! This solved my issue with crash on exit too. My config is different: Gentoo 64-bit, GCC 4.5.3. I have tried valgrind (just start game and exit, no loading saves), it shows many wrong memory usage (reading beyond limits). After hack no wrong access, just memory leaks (3665 bytes in 19 blocks).
« Last Edit: June 08, 2012, 08:24:23 am by Nokim »

Offline Jon_dArc

  • Squad Leader
  • ****
  • Posts: 134
    • View Profile
Huh. I might have to break down and install GCC, as my hypothesis for what precisely was going on would, I think, have required you to either get the "no pool given" error or be unaffected—but I may just be getting mixed up about what is guaranteed to be in what state before initialization.

(Thanks for running Valgrind, incidentally. I'd actually spent some time debating whether to try installing it or not, and eventually decided it would be too much hassle. Need to get around to it someday.)

~J

Nokim

  • Guest
I think we have different bugs in same subsystem. In my case it looks like miscalculated memory size to be allocated somewhere (like somebody thinks that sizeof(int) always equal to sizeof(uint32)), in your case it can be related to uninitialized variables (yes, valgrind reported such things too). Anyway, after commenting out overriding operators new and delete those errors vanished, so it's clear where to dig.

Offline Jon_dArc

  • Squad Leader
  • ****
  • Posts: 134
    • View Profile
Well, I'm stumped. I've confirmed that everything goes pear-shaped before anything in UFO:AI's main is run, so it's not a simple issue of reordering the initialization functions; digging around has failed to uncover any SDL-approved way of running anything before SDL's own initialization runs. The closest thing I could find to this sort of situation involved people giving the relevant memory pools static duration, which seems like it would be a pretty big change.

Looking at the commit that introduced the change, the commit message only discusses std::min and std::max-related changes—can anyone discuss the purpose of this change? I'm assuming that somehow the pools were used before it was made…

Secondarily, is there someone out there working on the Mac-specific parts who might be able to help figure out why other platforms don't seem to be having a problem?

~J

Offline Mattn

  • Administrator
  • PHALANX Commander
  • *****
  • Posts: 4831
  • https://github.com/mgerhardy/vengi
    • View Profile
    • Vengi Voxel Tools
the std::min/std::max change was made by me as a little optimization. it now uses the inline versions of min/max, not the macros, thus, it only executes the statements that are given to the functions once.

Nokim

  • Guest
My guess is that external libs start to use our operator new. And do so before pool get initialized. And this should be linker-related, not compiler.

In my case was usage in external libs of our delete operator after global freeing memory.

So I commented out those new and delete operator and have all running fine. They are supposed to speed up allocating memory and get some stat on it, but i prefer stability. And i'm using debug version which is slow anyway.

You can try to set breakpoint in operator new to find when it's first time called. And from where.

Offline Jon_dArc

  • Squad Leader
  • ****
  • Posts: 134
    • View Profile
the std::min/std::max change was made by me as a little optimization. it now uses the inline versions of min/max, not the macros, thus, it only executes the statements that are given to the functions once.
Sorry, what I meant was: the addition of the new/delete operator redefinitions occurs in a commit whose commit message only discusses std::min/std::max. I was asking for expansion on the operator redefinitions—the rest of the changes in the commit are indeed std::min/max related.

My guess is that external libs start to use our operator new. And do so before pool get initialized. And this should be linker-related, not compiler.
Yeah, that's what I concluded—though I'm not sure it's linker-related, as it does seem to be doing exactly what it was told. I'd been looking to see if there wasn't something that should be done to restrict the scope of the redefinition, or if SDL might not have some way to provide an initialization block that would run before SDL's internal initialization, but no dice.

Quote
So I commented out those new and delete operator and have all running fine. They are supposed to speed up allocating memory and get some stat on it, but i prefer stability.
Are we sure the redefinitions are (ostensibly) purely an optimization?

Quote
You can try to set breakpoint in operator new to find when it's first time called. And from where.
For about five minutes I was smacking myself on the forehead wondering why I didn't think of this, but I realized that it's because I already have the stack trace (posted above). In any event, since it happens strictly before anything in the UFO:AI entry point runs the details of precisely where it all goes wrong don't seem critical.

On the other hand, I think this makes for a satisfactory explanation of why the uninitialized pool only shows up on OS X (it just happens that on OS X SDL's initialization generates a call to new that doesn't appear on other platforms).

But other than simply removing the operators, I'm still no closer to identifying a solution.

~J

Nokim

  • Guest
Are we sure the redefinitions are (ostensibly) purely an optimization?
Optimization and statistics. Speed up because of reuse of already allocated memory and some statistics on memory usage in different pools which represent different game subsystems. However more common way to do so is to replace malloc function, like using TCMalloc (which is added to app just by linker, no code editing if no profiling required). But in game engine in all parts is used our Mem_Alloc/Mem_Free, so there is no real need to redeclare new/delete if you not trying to speed up external code like STL in such way.

Offline Jon_dArc

  • Squad Leader
  • ****
  • Posts: 134
    • View Profile
It's not clear to me exactly what distinction I was trying to draw there. Maybe I'd just gotten confused.

My best candidate for a solution at the moment (other than removing the operator redefinitions) is to stick memory pool initialization, at least for com_genericPool, into a function with a constructor attribute to force initialization before main. As a solution it makes me feel a little unhappy, but I've confirmed that it works on ufomodel at least.

~J

Offline Jon_dArc

  • Squad Leader
  • ****
  • Posts: 134
    • View Profile
Repeating a question from IRC in case no one ends up seeing it there:

Do we know that there are clear benefits to the redefinitions? Benchmarks showing clear performance improvements, or additional known-valuable information about allocations, or somesuch? I'm considering digging into Nokim's issue, but I want to make sure that they're actually worth salvaging before putting in the effort.

~J

Offline Mattn

  • Administrator
  • PHALANX Commander
  • *****
  • Posts: 4831
  • https://github.com/mgerhardy/vengi
    • View Profile
    • Vengi Voxel Tools
the clear benefit will be visible once we use sharedptr and scopedptr implementations. then we can just use new/delete but in the back we are still using our pools.

Offline Jon_dArc

  • Squad Leader
  • ****
  • Posts: 134
    • View Profile
Ok, so it's groundwork for using the pools transparently in the future. Time to install that VM, then!

~J