Speed up ufo2map.exe by over 100% (gcc optimization)

Muton:
GCC documentation:
http://gcc.gnu.org/onlinedocs/gcc-4.3.3/gcc/i386-and-x86_002d64-Options.html
http://gcc.gnu.org/onlinedocs/gcc-4.3.3/gcc/Optimize-Options.html
http://gcc.gnu.org/onlinedocs/gcc-4.3.3/gcc/Preprocessor-Options.html

The default gcc options (from ufoai\build\projects\ufo.cbp) are:

--- Code: --- <Add option="-Wall" />
<Add option="-ffloat-store" />
<Add option="-D__GNUWIN32__" />
<Add option="-DWINVER=0x501" />
<Add option="-DNODEBUG" />

--- End code ---

Optimization for an AMD K8 system:

--- Code: --- <Add option="-march=k8" />
<Add option="-O3" />
<Add option="-msse3" />
<Add option="-mfpmath=sse" />
<Add option="-mieee-fp" />
<Add option="-D__GNUWIN32__" />
<Add option="-DWINVER=0x501" />
<Add option="-DNODEBUG" />

--- End code ---
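To confirm which instruction-set options a given -march/-msse* combination actually turned on, a small C file can report GCC's predefined target macros. This is a sketch under assumptions: `describe_isa` is a hypothetical helper name, and it relies on GCC-specific predefines (`__SSE__`, `__SSE2__`, `__SSE3__`, `__SSSE3__`, `__3dNOW__`, `__SSE2_MATH__`) that other compilers may not set.

```c
#include <stdio.h>
#include <string.h>

/* Fills buf with a space-separated list of the x86 extensions GCC reports
 * as enabled for this translation unit, or "none".  These are GCC
 * predefined macros; other compilers may not define them. */
size_t describe_isa(char *buf, size_t size)
{
    static const char *const enabled[] = {
#ifdef __SSE__
        "sse",
#endif
#ifdef __SSE2__
        "sse2",
#endif
#ifdef __SSE3__
        "sse3",
#endif
#ifdef __SSSE3__
        "ssse3",
#endif
#ifdef __3dNOW__
        "3dnow",
#endif
#ifdef __SSE2_MATH__
        "sse2-math",   /* double arithmetic done in SSE2 (-mfpmath=sse) */
#endif
        NULL
    };
    size_t used = 0;
    size_t i;

    if (size == 0)
        return 0;
    buf[0] = '\0';
    for (i = 0; enabled[i] != NULL; i++) {
        int n = snprintf(buf + used, size - used, "%s%s",
                         used > 0 ? " " : "", enabled[i]);
        if (n < 0 || (size_t)n >= size - used) {
            buf[used] = '\0';   /* out of space: drop the partial entry */
            break;
        }
        used += (size_t)n;
    }
    if (enabled[0] == NULL)
        snprintf(buf, size, "none");
    return strlen(buf);
}
```

Calling it from a small main() and printing the buffer, once with the default options and once with the K8 set above, should show sse3 only in the second build; the sse2-math entry should appear only when -mfpmath=sse is in effect with SSE2 available.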

The -ffloat-store option costs the most CPU time.
If you remove it, you can build maps twice as fast.

I've done so
and the maps build with and without -ffloat-store are the same (hashsum)
This options should prevent false float calculation (as far as i understood it)
So
if you remove -ffloat-store be shure the resulting map is the same as build with def. gcc options
To do that you must build maps using -t 1 option and hash both maps
md5sum.exe -b V:\MinGW\ufoai\base\maps\bunker.bsp
md5sum.exe is part of the C::B (Code::Blocks) package.
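If md5sum.exe is not at hand, a byte-for-byte comparison answers the same yes/no question. A minimal sketch; `files_identical` is a hypothetical helper, and the two paths are whichever builds of the same map you want to compare (for example bunker.bsp built with and without -ffloat-store).

```c
#include <stdio.h>

/* Compares two files byte by byte.
 * Returns 1 if the contents are identical, 0 if they differ
 * (including different lengths), -1 if either file cannot be opened. */
int files_identical(const char *path_a, const char *path_b)
{
    FILE *a = fopen(path_a, "rb");
    FILE *b = fopen(path_b, "rb");
    int result = -1;

    if (a != NULL && b != NULL) {
        int ca, cb;

        result = 1;
        do {
            ca = getc(a);
            cb = getc(b);
            if (ca != cb) {     /* differing byte, or one file ended early */
                result = 0;
                break;
            }
        } while (ca != EOF);    /* both reached EOF together: identical */
    }
    if (a != NULL)
        fclose(a);
    if (b != NULL)
        fclose(b);
    return result;
}
```

A check like `files_identical("bunker_default.bsp", "bunker_fast.bsp") == 1` (hypothetical file names) would then stand in for comparing the two md5 sums.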

Optimization for a Core 2 Duo system:

--- Code: --- <Add option="-march=core2" />
<Add option="-O3" />
<Add option="-mssse3" />
<Add option="-mfpmath=sse" />
<Add option="-mieee-fp" />
<Add option="-D__GNUWIN32__" />
<Add option="-DWINVER=0x501" />
<Add option="-DNODEBUG" />

--- End code ---

odie:
I was wondering whether anyone is considering applying this to the current ufo2map.exe yet? :D

geever:
I think the problem is that it's hardware-specific:
it may run much faster on a K8 but slower on any other CPU...

-geever

Mattn:
We need that option because the FPU has different accuracy on Intel and AMD (not sure about others).

Muton:
To check for calculation errors, see http://gcc.gnu.org/ml/gcc/2004-03/msg01494.html,
then download http://www.netlib.org/paranoia/paranoia.c and compile it with the flags you want to test:

My machine: K8 (Windsor)
gcc.exe -O3 -ffloat-store -msse -mfpmath=sse -march=pentium3 V:\codeblocks\paranoia.c -lm -> error
gcc.exe -O3 -msse -mfpmath=sse -march=pentium3 V:\codeblocks\paranoia.c -lm -> program hang
gcc.exe -O3 -msse -mfpmath=sse -march=pentium-m V:\codeblocks\paranoia.c -lm -> no error
gcc.exe -O3 -msse3 -mfpmath=sse -march=k8 V:\codeblocks\paranoia.c -lm -> no error
gcc.exe -O3 -march=k8 V:\codeblocks\paranoia.c -lm -> program hang
gcc.exe -O3 -ffloat-store -march=k8 V:\codeblocks\paranoia.c -lm -> error
gcc.exe -O3 -ffloat-store -march=i386 V:\codeblocks\paranoia.c -lm -> error
gcc.exe -O3 -msse -mfpmath=sse -march=pentium4 V:\codeblocks\paranoia.c -lm -> no error
gcc.exe -O3 -msse -mfpmath=sse -march=prescott V:\codeblocks\paranoia.c -lm -> no error
gcc.exe -O3 -m3dnow -march=athlon V:\codeblocks\paranoia.c -lm -> program hang
gcc.exe -O3 -msse -mfpmath=sse -march=athlon-xp V:\codeblocks\paranoia.c -lm -> program hang
gcc.exe -O3 -msse2 -mfpmath=sse -march=athlon64 V:\codeblocks\paranoia.c -lm -> no error
gcc.exe -O3 -msse3 -mfpmath=sse,387 -march=k8 V:\codeblocks\paranoia.c -lm -> far more errors
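paranoia.c probes properties such as the machine epsilon and the rounding behaviour, and results like these depend on whether intermediates stay in 80-bit x87 registers. A much smaller probe of the same idea, as a sketch (the function names are hypothetical and this is not code taken from paranoia.c): find the power of two `eps` at which `1.0 + eps/2` rounds back to 1.0. Storing each sum through a `volatile double` forces rounding to 64-bit double, roughly what -ffloat-store does for every floating-point variable.

```c
#include <float.h>

/* Smallest power of two eps for which 1.0 + eps/2 rounds back to 1.0.
 * Each sum is written to a volatile double, so it is rounded to double
 * precision even if the compiler would otherwise keep it in an 80-bit
 * x87 register.  The result should therefore be DBL_EPSILON (2^-52). */
double forced_epsilon(void)
{
    double eps = 1.0;
    volatile double sum;

    for (;;) {
        sum = 1.0 + eps / 2.0;
        if (sum == 1.0)
            break;
        eps /= 2.0;
    }
    return eps;
}

/* The same loop without the forced store.  With x87 code and no
 * -ffloat-store the comparison may use the unrounded extended-precision
 * sum, so the loop can run further and return less than DBL_EPSILON. */
double unforced_epsilon(void)
{
    double eps = 1.0;

    while (1.0 + eps / 2.0 != 1.0)
        eps /= 2.0;
    return eps;
}

/* 1 if the forced-store probe finds the IEEE double epsilon, else 0. */
int forced_rounding_ok(void)
{
    return forced_epsilon() == DBL_EPSILON;
}
```

Built with -mfpmath=sse (SSE2), both functions should return DBL_EPSILON. Built for the 387 without -ffloat-store, unforced_epsilon() may come out smaller, for example 2^-63 when the FPU runs at full extended precision.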

Conclusion:
If you compile for targets older than pentium-m and athlon64, better use -ffloat-store.
If you make use of the 387 (classic FPU), you are forced to use -ffloat-store.
If you mix SSE and 387 (-mfpmath=sse,387), you are forced to use -ffloat-store.
If you use 3DNow!, you are forced to use -ffloat-store.
By the way, I had no problems compiling and running maps with -O3 -msse -mfpmath=sse -march=pentium3.
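Whether a build falls into the 387 cases above can also be asked of the compiler: C99's `FLT_EVAL_METHOD` macro in <float.h> reports the precision used for intermediate floating-point results. A sketch; `eval_method_description` is a hypothetical helper, and the usual GCC mapping (0 with -mfpmath=sse and SSE2, 2 for 387 code) is typical behaviour worth confirming for the compiler version in use.

```c
#include <float.h>

/* Describes C99 FLT_EVAL_METHOD, the precision the compiler uses for
 * intermediate floating-point results:
 *    0  each operation in its own type (no excess precision)
 *    1  float and double operations evaluated as double
 *    2  all operations evaluated as long double (x87 extended precision)
 *   -1  indeterminable
 * Other values are implementation-defined. */
const char *eval_method_description(void)
{
#ifndef FLT_EVAL_METHOD
    return "FLT_EVAL_METHOD not defined (compile in a C99 mode)";
#else
    switch (FLT_EVAL_METHOD) {
    case 0:
        return "0: no excess precision";
    case 1:
        return "1: float evaluated as double";
    case 2:
        return "2: extended (long double) precision, as on the 387";
    case -1:
        return "-1: indeterminable";
    default:
        return "other implementation-defined value";
    }
#endif
}
```

A result of 2 means intermediates can carry extra precision, which is exactly the situation -ffloat-store is meant to guard against.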

It's not a problem to add more target options (on the safe side) to C::B (I've already done this with ufo.exe),
but I won't start working on it until I get a green light from a dev.
===============================================

gcc.exe -O3 -msse -mfpmath=sse -march=k8 V:\codeblocks\paranoia.c -lm
Run a.exe:

--- Code: ---....
To continue, press RETURN

Diagnosis resumes after milestone Number 220 Page: 10

No failures, defects nor flaws have been discovered.
Rounding appears to conform to the proposed IEEE standard P754.
The arithmetic diagnosed appears to be Excellent!
END OF TEST.

--- End code ---

gcc.exe -O3 -march=k8 V:\codeblocks\paranoia.c -lm
Run a.exe:

--- Code: ---....
Diagnosis resumes after milestone Number 120 Page: 10

The Underflow threshold is 0.00000000000000000e+000, below which
calculation may suffer larger Relative error than merely roundoff.
Since underflow occurs below the threshold
UfThold = (2.00000000000000000e+000) ^ (-1.#INF0000000000000e+000)
only underflow should afflict the expression
(2.00000000000000000e+000) ^ (-1.#INF0000000000000e+000);
actually calculating yields: 0.00000000000000000e+000 .
This computed value is O.K.

Testing X^((X + 1) / (X - 1)) vs. exp(2) = 7.38905609893065040e+000 as X -> 1.
^C

--- End code ---
It hangs on that test.

gcc.exe -O3 -ffloat-store -msse -mfpmath=sse -march=pentium3 V:\codeblocks\paranoia.c -

--- Code: ---Diagnosis resumes after milestone Number 220 Page: 10

The number of FLAWs discovered = 1.

The arithmetic diagnosed seems Satisfactory though flawed.
END OF TEST.

--- End code ---
