
Author Topic: speed up ufo2map.exe over 100% (gcc optimization)  (Read 6567 times)

Offline Muton

  • Sergeant
  • *****
  • Posts: 496
speed up ufo2map.exe over 100% (gcc optimization)
« on: May 22, 2009, 04:51:25 pm »
GCC documentation:
http://gcc.gnu.org/onlinedocs/gcc-4.3.3/gcc/i386-and-x86_002d64-Options.html
http://gcc.gnu.org/onlinedocs/gcc-4.3.3/gcc/Optimize-Options.html
http://gcc.gnu.org/onlinedocs/gcc-4.3.3/gcc/Preprocessor-Options.html


The default gcc options (ufoai\build\projects\ufo.cbp) are:
Code: [Select]
  <Add option="-Wall" />
  <Add option="-ffloat-store" />
  <Add option="-D__GNUWIN32__" />
  <Add option="-DWINVER=0x501" />
  <Add option="-DNODEBUG" />

Optimization for an AMD K8 system:
Code: [Select]
  <Add option="-march=k8" />
  <Add option="-O3" />
  <Add option="-msse3" />
  <Add option="-mfpmath=sse" />
  <Add option="-mieee-fp" />
  <Add option="-D__GNUWIN32__" />
  <Add option="-DWINVER=0x501" />
  <Add option="-DNODEBUG" />

The -ffloat-store option costs the most CPU time.
If you remove it, you can build maps twice as fast.

I've done so, and the maps built with and without -ffloat-store are identical (same hash sum).
This option is supposed to prevent incorrect floating-point results (as far as I understand it).
So if you remove -ffloat-store, be sure the resulting map is the same as one built with the default gcc options.
To do that, build both maps with the -t 1 option and hash them:
md5sum.exe -b V:\MinGW\ufoai\base\maps\bunker.bsp
md5sum is part of the C::B package
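
If md5sum is not at hand, the same check can be sketched in Python. This is only a minimal illustration of the comparison described above: the function names `file_md5` and `same_build` are made up for this example and are not part of ufo2map or C::B.

```python
import hashlib


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in binary mode like `md5sum -b`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large .bsp files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_build(reference_path, candidate_path):
    """True if both compiled maps are byte-for-byte identical (same MD5)."""
    return file_md5(reference_path) == file_md5(candidate_path)
```

Usage would be something like `same_build("bunker_default.bsp", "bunker_optimized.bsp")`, where both file names are hypothetical copies of the map compiled with the default and the optimized ufo2map.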

Core2Duo optimization
Code: [Select]
  <Add option="-march=core2" />
  <Add option="-O3" />
  <Add option="-mssse3" />
  <Add option="-mfpmath=sse" />
  <Add option="-mieee-fp" />
  <Add option="-D__GNUWIN32__" />
  <Add option="-DWINVER=0x501" />
  <Add option="-DNODEBUG" />

odie

  • Guest
Re: speed up ufo2map.exe over 100% (gcc optimization)
« Reply #1 on: June 10, 2009, 05:19:42 am »
Was wondering if anyone is considering implementing this on the current ufo2map.exe yet?? :D

Offline geever

  • Project Coder
  • PHALANX Commander
  • ***
  • Posts: 2561
Re: speed up ufo2map.exe over 100% (gcc optimization)
« Reply #2 on: June 10, 2009, 11:10:10 am »
I think the problem is that it's hardware specific..
It may go much faster on a K8 but slower on any other...

-geever

Offline Mattn

  • Administrator
  • PHALANX Commander
  • *****
  • Posts: 4831
Re: speed up ufo2map.exe over 100% (gcc optimization)
« Reply #3 on: June 10, 2009, 06:31:57 pm »
We need that option because the FPU has a different accuracy on Intel and AMD (not sure about others).
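
For background: -ffloat-store keeps floating-point variables out of registers, so each value is rounded to its declared type when it is stored. On the 387 FPU the registers are wider than a double, so a value can compare differently before and after a store. Python cannot reproduce 80-bit registers, so the sketch below uses double versus single precision as a stand-in (that substitution is an assumption made purely for illustration, and `store_as_float32` is a made-up helper name).

```python
import struct


def store_as_float32(value):
    """Round a Python float (a double) to single precision, as a store to a 32-bit float would."""
    return struct.unpack("f", struct.pack("f", value))[0]


# 1 + 2**-30 is exactly representable as a double but not as a single-precision float.
wide = 1.0 + 2.0 ** -30
stored = store_as_float32(wide)

print(wide == 1.0)    # False: the wider format still holds the small difference
print(stored == 1.0)  # True: the difference was rounded away by the store
```

With -mfpmath=sse the arithmetic is done in SSE registers at the declared precision, which would fit the observation that maps hash the same with and without -ffloat-store on such builds.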

Offline Muton

  • Sergeant
  • *****
  • Posts: 496
Re: speed up ufo2map.exe over 100% (gcc optimization)
« Reply #4 on: June 20, 2009, 08:55:12 am »
To check for calculation errors, see http://gcc.gnu.org/ml/gcc/2004-03/msg01494.html
and download http://www.netlib.org/paranoia/paranoia.c

My machine: K8 (Windsor)
gcc.exe -O3 -ffloat-store -msse -mfpmath=sse -march=pentium3 V:\codeblocks\paranoia.c -lm -> error
gcc.exe -O3 -msse -mfpmath=sse -march=pentium3 V:\codeblocks\paranoia.c -lm -> program hang
gcc.exe -O3 -msse -mfpmath=sse -march=pentium-m V:\codeblocks\paranoia.c -lm -> no error
gcc.exe -O3 -msse3 -mfpmath=sse -march=k8 V:\codeblocks\paranoia.c -lm -> no error
gcc.exe -O3 -march=k8 V:\codeblocks\paranoia.c -lm -> program hang
gcc.exe -O3 -ffloat-store -march=k8 V:\codeblocks\paranoia.c -lm -> error
gcc.exe -O3 -ffloat-store -march=i386 V:\codeblocks\paranoia.c -lm -> error
gcc.exe -O3 -msse -mfpmath=sse -march=pentium4 V:\codeblocks\paranoia.c -lm -> no error
gcc.exe -O3 -msse -mfpmath=sse -march=prescott V:\codeblocks\paranoia.c -lm -> no error
gcc.exe -O3 -m3dnow -march=athlon V:\codeblocks\paranoia.c -lm -> program hang
gcc.exe -O3 -msse -mfpmath=sse -march=athlon-xp V:\codeblocks\paranoia.c -lm -> program hang
gcc.exe -O3 -msse2 -mfpmath=sse -march=athlon64 V:\codeblocks\paranoia.c -lm -> no error
gcc.exe -O3 -msse3 -mfpmath=sse,387 -march=k8 V:\codeblocks\paranoia.c -lm -> far more errors

Conclusion:
If you compile for targets older than pentium-m and athlon64, better use -ffloat-store.
If you use the 387 (classic FPU), you are forced to use -ffloat-store.
If you mix SSE and 387, you are forced to use -ffloat-store.
If you use 3DNow!, you are forced to use -ffloat-store.
Btw, I had no problem compiling and running maps using -O3 -msse -mfpmath=sse -march=pentium3

It's not a problem to add more target options (to be on the safe side) to C::B (I've done this already with ufo.exe),
but I won't start working on it until I get a green light from a dev.

===============================================


gcc.exe -O3  -msse -mfpmath=sse -march=k8 V:\codeblocks\paranoia.c -lm
run a.exe
Code: [Select]
....
To continue, press RETURN

Diagnosis resumes after milestone Number 220          Page: 10



No failures, defects nor flaws have been discovered.
Rounding appears to conform to the proposed IEEE standard P754.
The arithmetic diagnosed appears to be Excellent!
END OF TEST.

gcc.exe -O3 -march=k8 V:\codeblocks\paranoia.c -lm
a.exe
Code: [Select]
....
Diagnosis resumes after milestone Number 120          Page: 10


The Underflow threshold is 0.00000000000000000e+000,  below which
calculation may suffer larger Relative error than merely roundoff.
Since underflow occurs below the threshold
UfThold = (2.00000000000000000e+000) ^ (-1.#INF0000000000000e+000)
only underflow should afflict the expression
        (2.00000000000000000e+000) ^ (-1.#INF0000000000000e+000);
actually calculating yields: 0.00000000000000000e+000 .
This computed value is O.K.

Testing X^((X + 1) / (X - 1)) vs. exp(2) = 7.38905609893065040e+000 as X -> 1.
^C
It hangs on that test.

gcc.exe -O3 -ffloat-store -msse -mfpmath=sse -march=pentium3 V:\codeblocks\paranoia.c -
Code: [Select]
Diagnosis resumes after milestone Number 220          Page: 10


The number of  FLAWs  discovered =           1.

The arithmetic diagnosed seems Satisfactory though flawed.
END OF TEST.


Offline Muton

  • Sergeant
  • *****
  • Posts: 496
Re: speed up ufo2map.exe over 100% (gcc optimization)
« Reply #5 on: December 26, 2009, 07:19:43 pm »
Currently the fastest:
bunker.map
standard: 1077 sec
standard -O3 -fno-strict-aliasing -march=k8-sse3: 754 sec
fully optimized (see below): 445 sec
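
To put the three timings above in relation, here is a small sketch of the arithmetic. The helper names `speedup` and `time_saved_percent` are made up for this example; only the timings come from the post.

```python
def speedup(baseline_sec, optimized_sec):
    """How many times faster the optimized build is than the baseline."""
    return baseline_sec / optimized_sec


def time_saved_percent(baseline_sec, optimized_sec):
    """Share of the baseline compile time that is saved, in percent."""
    return 100.0 * (baseline_sec - optimized_sec) / baseline_sec


STANDARD = 1077  # bunker.map, standard options
O3_K8 = 754      # standard -O3 -fno-strict-aliasing -march=k8-sse3
FULL = 445       # fully optimized option list

print(round(speedup(STANDARD, O3_K8), 2))            # 1.43
print(round(speedup(STANDARD, FULL), 2))             # 2.42
print(round(time_saved_percent(STANDARD, FULL), 1))  # 58.7
```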

In build\projects\ufo2map.cbp, replace
Code: [Select]
<Add option="-ffloat-store" />
with this
Code: [Select]
<Add option="-march=...." />
<Add option="-m....." />
<Add option="-O1" />
<Add option="-fthread-jumps" />
<Add option="-falign-functions" />
<Add option="-falign-jumps" />
<Add option="-falign-loops" />
<Add option="-falign-labels" />
<Add option="-fcaller-saves" />
<Add option="-fcrossjumping" />
<Add option="-fcse-skip-blocks" />
<Add option="-fdelete-null-pointer-checks" />
<Add option="-fexpensive-optimizations" />
<Add option="-fgcse-lm" />
<Add option="-foptimize-sibling-calls" />
<Add option="-fpeephole2" />
<Add option="-fregmove" />
<Add option="-freorder-blocks" />
<Add option="-freorder-functions" />
<Add option="-frerun-cse-after-loop" />
<Add option="-fsched-interblock" />
<Add option="-fsched-spec" />
<Add option="-fschedule-insns2" />
<Add option="-fno-strict-aliasing" />
<Add option="-fstrict-overflow" />
<Add option="-ftree-pre" />
<Add option="-ftree-vrp" />
<Add option="-finline-functions" />
<Add option="-funswitch-loops" />
<Add option="-fpredictive-commoning" />
<Add option="-fgcse-after-reload" />
<Add option="-ftree-vectorize" />
<Add option="-mfpmath=sse" />
<Add option="-mieee-fp" />
(-O2 and -O3 make ufo2map fail with errors if SSE is used together with more than one thread)
(-O2 and -O3 need -fno-strict-aliasing if gcc 4.4.0 is used)
Pay attention to the 1st and 2nd values (the -march=.... and -m..... placeholders):

Easy way:
-march=native
-msse


Hard way:

CPU-Z will tell you the CPU type and which instruction sets it supports
pentium3 <- SSE
pentium-m <- SSE2
pentium4 <- SSE2
prescott <- SSE3
core2 <- SSSE3 (SSE4.1 only on the later 45 nm Core 2 models; SSE4.2 came after Core 2)
athlon-4, athlon-xp, athlon-mp <- SSE (Socket A)
athlon64 <- SSE2
k8-sse3 <- SSE3
amdfam10, barcelona <- SSE4A

-msse
-msse2
-msse3
-mssse3
-msse4a
-msse4.1
-msse4.2
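
The table above can also be written as a small lookup that prints the two C::B option lines for a given CPU type. This is only a sketch: `MARCH_TO_SSE` and `codeblocks_options` are made-up names, and the core2 entry uses -mssse3 as in the Core 2 Duo examples in this thread.

```python
# -march value -> matching SSE switch, taken from the list above.
MARCH_TO_SSE = {
    "pentium3": "-msse",
    "pentium-m": "-msse2",
    "pentium4": "-msse2",
    "prescott": "-msse3",
    "core2": "-mssse3",
    "athlon-4": "-msse",
    "athlon-xp": "-msse",
    "athlon-mp": "-msse",
    "athlon64": "-msse2",
    "k8-sse3": "-msse3",
    "amdfam10": "-msse4a",
    "barcelona": "-msse4a",
}


def codeblocks_options(march):
    """Return the two <Add option=... /> lines for the given -march value."""
    sse_flag = MARCH_TO_SSE[march]  # raises KeyError for CPU types not in the table
    return [
        '<Add option="-march=%s" />' % march,
        '<Add option="%s" />' % sse_flag,
    ]


for line in codeblocks_options("k8-sse3"):
    print(line)
```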

For an Intel Core 2 Duo E6400
Code: [Select]
<Add option="-march=core2" />
<Add option="-mssse3" />
<Add option="--param l1-cache-line-size=64" />
<Add option="--param l1-cache-size=32" />
<Add option="--param l2-cache-size=2048" />

Code: [Select]
<Add option="-fcse-follow-jumps" /> <!-- makes ufo2map fail with errors -->
<Add option="-fgcse" /> <!-- makes ufo2map fail with errors -->
<Add option="-fschedule-insns" /> <!-- prevents a successful compile -->
<Add option="-fstrict-aliasing" /> <!-- prevents a successful compile; replaced by -fno-strict-aliasing -->