UFO:Alien Invasion

Technical support => Windows => Topic started by: Muton on May 22, 2009, 04:51:25 pm

Title: speed up ufo2map.exe over 100% (gcc optimization)
Post by: Muton on May 22, 2009, 04:51:25 pm
GCC documentation:
http://gcc.gnu.org/onlinedocs/gcc-4.3.3/gcc/i386-and-x86_002d64-Options.html
http://gcc.gnu.org/onlinedocs/gcc-4.3.3/gcc/Optimize-Options.html
http://gcc.gnu.org/onlinedocs/gcc-4.3.3/gcc/Preprocessor-Options.html


The default gcc options (ufoai\build\projects\ufo.cbp) are:
Code: [Select]
  <Add option="-Wall" />
  <Add option="-ffloat-store" />
  <Add option="-D__GNUWIN32__" />
  <Add option="-DWINVER=0x501" />
  <Add option="-DNODEBUG" />

Optimization for an AMD K8 system:
Code: [Select]
  <Add option="-march=k8" />
  <Add option="-O3" />
  <Add option="-msse3" />
  <Add option="-mfpmath=sse" />
  <Add option="-mieee-fp" />
  <Add option="-D__GNUWIN32__" />
  <Add option="-DWINVER=0x501" />
  <Add option="-DNODEBUG" />

The -ffloat-store option costs the most CPU time. If you remove it, you can build maps twice as fast.

I have done so, and the maps built with and without -ffloat-store are identical (same hash sum). As far as I understand it, this option is meant to prevent incorrect floating-point calculations.
So if you remove -ffloat-store, be sure the resulting map is the same as the one built with the default gcc options. To do that, build both maps with the -t 1 option and hash them:
md5sum.exe -b V:\MinGW\ufoai\base\maps\bunker.bsp
md5sum is part of the C::B package.
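
If md5sum.exe is not at hand, the same comparison can be sketched in Python. This is only an illustration: the function names md5_of_file and maps_identical are made up for this example, and it assumes both .bsp files were built as described above and kept under different names.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in binary mode (like md5sum -b)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large .bsp files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def maps_identical(reference_bsp, candidate_bsp):
    """True if both compiled maps have the same MD5 hash (hypothetical helper)."""
    return md5_of_file(reference_bsp) == md5_of_file(candidate_bsp)
```

Call maps_identical() with the map built using the default options and the one built without -ffloat-store; a False result means the optimized build changed the output.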

Core 2 Duo optimization:
Code: [Select]
  <Add option="-march=core2" />
  <Add option="-O3" />
  <Add option="-mssse3" />
  <Add option="-mfpmath=sse" />
  <Add option="-mieee-fp" />
  <Add option="-D__GNUWIN32__" />
  <Add option="-DWINVER=0x501" />
  <Add option="-DNODEBUG" />
Title: Re: speed up ufo2map.exe over 100% (gcc optimization)
Post by: odie on June 10, 2009, 05:19:42 am
Was wondering if anyone is considering implementing this on the current ufo2map.exe yet?? :D
Title: Re: speed up ufo2map.exe over 100% (gcc optimization)
Post by: geever on June 10, 2009, 11:10:10 am
I think the problem is that it's hardware specific..
It may go much faster on a K8 but slower on any other...

-geever
Title: Re: speed up ufo2map.exe over 100% (gcc optimization)
Post by: Mattn on June 10, 2009, 06:31:57 pm
We need that option because the FPU has a different accuracy on Intel and AMD (not sure about others).
Title: Re: speed up ufo2map.exe over 100% (gcc optimization)
Post by: Muton on June 20, 2009, 08:55:12 am
To check for calculation errors, see http://gcc.gnu.org/ml/gcc/2004-03/msg01494.html
and download http://www.netlib.org/paranoia/paranoia.c

My machine: K8 (Windsor)
gcc.exe -O3 -ffloat-store -msse -mfpmath=sse -march=pentium3 V:\codeblocks\paranoia.c -lm -> error
gcc.exe -O3 -msse -mfpmath=sse -march=pentium3 V:\codeblocks\paranoia.c -lm -> program hang
gcc.exe -O3 -msse -mfpmath=sse -march=pentium-m V:\codeblocks\paranoia.c -lm -> no error
gcc.exe -O3 -msse3 -mfpmath=sse -march=k8 V:\codeblocks\paranoia.c -lm -> no error
gcc.exe -O3 -march=k8 V:\codeblocks\paranoia.c -lm -> program hang
gcc.exe -O3 -ffloat-store -march=k8 V:\codeblocks\paranoia.c -lm -> error
gcc.exe -O3 -ffloat-store -march=i386 V:\codeblocks\paranoia.c -lm -> error
gcc.exe -O3 -msse -mfpmath=sse -march=pentium4 V:\codeblocks\paranoia.c -lm -> no error
gcc.exe -O3 -msse -mfpmath=sse -march=prescott V:\codeblocks\paranoia.c -lm -> no error
gcc.exe -O3 -m3dnow -march=athlon V:\codeblocks\paranoia.c -lm -> program hang
gcc.exe -O3 -msse -mfpmath=sse -march=athlon-xp V:\codeblocks\paranoia.c -lm -> program hang
gcc.exe -O3 -msse2 -mfpmath=sse -march=athlon64 V:\codeblocks\paranoia.c -lm -> no error
gcc.exe -O3 -msse3 -mfpmath=sse,387 -march=k8 V:\codeblocks\paranoia.c -lm -> far more errors

Conclusion:
If you compile for a target older than pentium-m or athlon64, better use -ffloat-store.
If you use the 387 (classic FPU), you are forced to use -ffloat-store.
If you mix SSE and 387, you are forced to use -ffloat-store.
If you use 3DNow!, you are forced to use -ffloat-store.
By the way, I had no problems compiling and running maps using -O3 -msse -mfpmath=sse -march=pentium3.
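
The rules of thumb above can be written down as a small check. This is only a sketch of the conclusions from the paranoia runs listed in this post, not a guarantee for other compilers or CPUs; the name needs_float_store and its parameters are invented for the example.

```python
# Targets for which the paranoia runs in this post reported "no error"
# with -mfpmath=sse and without -ffloat-store.
MARCH_OK_WITH_SSE_MATH = {"pentium-m", "pentium4", "prescott", "athlon64", "k8"}


def needs_float_store(march, fpmath=None, use_3dnow=False):
    """Rule of thumb from the conclusions above (hypothetical helper).

    march     -- value given to -march=
    fpmath    -- value given to -mfpmath=, or None if the flag is absent
    use_3dnow -- True if -m3dnow is on the command line
    """
    if use_3dnow:
        return True
    # No -mfpmath=sse means the classic 387 is used; "sse,387" mixes both.
    if fpmath != "sse":
        return True
    # Pure SSE math: only the targets that passed the test go without it.
    return march not in MARCH_OK_WITH_SSE_MATH
```

Unknown targets fall through to True, so the check errs on the side of keeping -ffloat-store.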

It's not a problem to add more target options (to be on the safe side) to C::B (I have already done this for ufo.exe),
but I won't start working on it until I get a green light from a dev.

===============================================


gcc.exe -O3  -msse -mfpmath=sse -march=k8 V:\codeblocks\paranoia.c -lm
run a.exe
Code: [Select]
....
To continue, press RETURN

Diagnosis resumes after milestone Number 220          Page: 10



No failures, defects nor flaws have been discovered.
Rounding appears to conform to the proposed IEEE standard P754.
The arithmetic diagnosed appears to be Excellent!
END OF TEST.

gcc.exe -O3 -march=k8 V:\codeblocks\paranoia.c -lm
a.exe
Code: [Select]
....
Diagnosis resumes after milestone Number 120          Page: 10


The Underflow threshold is 0.00000000000000000e+000,  below which
calculation may suffer larger Relative error than merely roundoff.
Since underflow occurs below the threshold
UfThold = (2.00000000000000000e+000) ^ (-1.#INF0000000000000e+000)
only underflow should afflict the expression
        (2.00000000000000000e+000) ^ (-1.#INF0000000000000e+000);
actually calculating yields: 0.00000000000000000e+000 .
This computed value is O.K.

Testing X^((X + 1) / (X - 1)) vs. exp(2) = 7.38905609893065040e+000 as X -> 1.
^C
It hangs on that test.

gcc.exe -O3 -ffloat-store -msse -mfpmath=sse -march=pentium3 V:\codeblocks\paranoia.c -
Code: [Select]
Diagnosis resumes after milestone Number 220          Page: 10


The number of  FLAWs  discovered =           1.

The arithmetic diagnosed seems Satisfactory though flawed.
END OF TEST.

Title: Re: speed up ufo2map.exe over 100% (gcc optimization)
Post by: Muton on December 26, 2009, 07:19:43 pm
Currently the fastest:
bunker.map
standard: 1077 sec
standard -O3 -fno-strict-aliasing -march=k8-sse3: 754 sec
fully optimized (see below): 445 sec
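
To put these timings in relation to the "over 100%" in the thread title, here is a small sketch of the arithmetic. The helper names speedup and percent_faster are made up for this example; the numbers are the bunker.map times above.

```python
def speedup(baseline_sec, optimized_sec):
    """How many times faster the optimized build is than the baseline."""
    return baseline_sec / optimized_sec


def percent_faster(baseline_sec, optimized_sec):
    """Speed gain in percent; 100 means twice as fast."""
    return (speedup(baseline_sec, optimized_sec) - 1.0) * 100.0


STANDARD = 1077          # standard build, seconds
O3_K8_SSE3 = 754         # -O3 -fno-strict-aliasing -march=k8-sse3, seconds
FULLY_OPTIMIZED = 445    # fully optimized option set, seconds

print("O3/k8-sse3: %.2fx" % speedup(STANDARD, O3_K8_SSE3))
print("fully optimized: %.2fx (%.0f%% faster)" % (
    speedup(STANDARD, FULLY_OPTIMIZED),
    percent_faster(STANDARD, FULLY_OPTIMIZED),
))
```

That works out to roughly 1.43x for the plain -O3 build and roughly 2.42x for the fully optimized one.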

In build\projects\ufo2map.cbp, replace
Code: [Select]
<Add option="-ffloat-store" />
with this:
Code: [Select]
<Add option="-march=...." />
<Add option="-m....." />
<Add option="-O1" />
<Add option="-fthread-jumps" />
<Add option="-falign-functions" />
<Add option="-falign-jumps" />
<Add option="-falign-loops" />
<Add option="-falign-labels" />
<Add option="-fcaller-saves" />
<Add option="-fcrossjumping" />
<Add option="-fcse-skip-blocks" />
<Add option="-fdelete-null-pointer-checks" />
<Add option="-fexpensive-optimizations" />
<Add option="-fgcse-lm" />
<Add option="-foptimize-sibling-calls" />
<Add option="-fpeephole2" />
<Add option="-fregmove" />
<Add option="-freorder-blocks" />
<Add option="-freorder-functions" />
<Add option="-frerun-cse-after-loop" />
<Add option="-fsched-interblock" />
<Add option="-fsched-spec" />
<Add option="-fschedule-insns2" />
<Add option="-fno-strict-aliasing" />
<Add option="-fstrict-overflow" />
<Add option="-ftree-pre" />
<Add option="-ftree-vrp" />
<Add option="-finline-functions" />
<Add option="-funswitch-loops" />
<Add option="-fpredictive-commoning" />
<Add option="-fgcse-after-reload" />
<Add option="-ftree-vectorize" />
<Add option="-mfpmath=sse" />
<Add option="-mieee-fp" />
(-O2 and -O3 make ufo2map fail with an error if SSE is used and more than one thread is used)
(-O2 and -O3 need -fno-strict-aliasing if gcc 4.4.0 is used)
Pay attention to the first and second values (-march=.... and -m.....):

Easy way:
-march=native
-msse


Hard way:

CPU-Z (http://www.cpuid.com/cpuz.php) will tell you the type of your CPU and which instruction sets it supports.
pentium3 <- SSE
pentium-m <- SSE2
pentium4 <- SSE2
prescott <- SSE3
core2 <- SSE3, SSSE3 (SSE4.1 on later models)
athlon-4, athlon-xp, athlon-mp <- SSE (Socket A)
athlon64 <- SSE2
k8-sse3 <- SSE3
amdfam10, barcelona <- SSE4A

-msse
-msse2
-msse3
-mssse3
-msse4a
-msse4.1
-msse4.2
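
For the "hard way", the two lists above can be combined into a lookup that prints the first two C::B lines. This is just a sketch: the table is copied from the list in this post, core2 is mapped to -mssse3 as in the Core 2 Duo E6400 example, and the names MARCH_TO_SSE_FLAG and codeblocks_options are invented here.

```python
# -march value -> matching SSE flag, taken from the list in this post.
MARCH_TO_SSE_FLAG = {
    "pentium3": "-msse",
    "pentium-m": "-msse2",
    "pentium4": "-msse2",
    "prescott": "-msse3",
    "core2": "-mssse3",      # as used for the Core 2 Duo E6400 in this post
    "athlon-4": "-msse",
    "athlon-xp": "-msse",
    "athlon-mp": "-msse",
    "athlon64": "-msse2",
    "k8-sse3": "-msse3",
    "amdfam10": "-msse4a",
    "barcelona": "-msse4a",
}


def codeblocks_options(march):
    """Return the two <Add option=.../> lines for a given -march value."""
    sse_flag = MARCH_TO_SSE_FLAG[march]  # KeyError for targets not in the list
    return [
        '<Add option="-march=%s" />' % march,
        '<Add option="%s" />' % sse_flag,
    ]


for line in codeblocks_options("k8-sse3"):
    print(line)
```

A target that is not in the table raises KeyError on purpose, so nothing is guessed for CPUs the list does not cover.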

For an Intel Core 2 Duo E6400
Code: [Select]
<Add option="-march=core2" />
<Add option="-mssse3" />
<Add option="--param l1-cache-line-size=32" />
<Add option="--param l1-cache-size=32" />
<Add option="--param l2-cache-size=2048" />

Code: [Select]
<Add option="-fcse-follow-jumps" /> makes ufo2map fail with an error
<Add option="-fgcse" /> makes ufo2map fail with an error
<Add option="-fschedule-insns" /> prevents a successful compile
<Add option="-fstrict-aliasing" /> prevents a successful compile; replaced by <Add option="-fno-strict-aliasing" />