Advanced Optimization Via Profiling With Gcc4


29 replies to this topic

#16 notaz

notaz

    Mega GP Mania

  • GP Guru
  • 1737 posts
  • Location:Lithuania

Posted 07 August 2007 - 08:43 PM

Reviving this thread to say that I have been using -fprofile-generate and -fprofile-use for quite some time now. They give a ~20% improvement in recent uae4all builds, for example.

But there is one thing: it is quite important to do those profiling runs on the GP2X itself. That way you don't need to port your program to PC, and you avoid all those CRC errors. The only problem is that you need a Linux box to do it properly. It might be doable using cygwin, though.

Here is the method I am using, step by step:
  1. If you haven't already done so, install the open2x toolchain on your Linux box. You may also need a prebuilt library pack.
  2. Make a directory on your Linux box at a path that can also be created on your GP2X. I use "sudo mkdir -p /mnt/sd/tmp" followed by "sudo chown notaz /mnt/sd/tmp".
  3. Copy your whole source tree there. Add the "-fprofile-generate" option to your compile flags. Make sure it gets passed both when compiling the c/cpp files and in the linking phase.
  4. Clean out old .o files, if there are any, and recompile. This should create a bunch of .gcno files alongside the usual .o ones.
  5. Copy the resulting binary from /mnt/sd/tmp on your PC to /mnt/sd/tmp on the GP2X. It's important that the paths match, because the full path is now embedded in the binary and will be used for the output profiling data files. You don't need to copy any .gcno files.
  6. Run your newly built program on the GP2X. If it's an emulator, load some heavy titles, the ones that cause the most slowdowns. I mostly load up 3-4 games and play each of them for a minute or two. Note that your program will run very slowly, because it is collecting profiling data on top of doing what it was meant to do (or not :)).
  7. Exit your program. Do not kill it. Oh heck, I almost forgot: make sure your program doesn't run the menu on exit (my emus have a command line option for this), or else it won't produce the needed files. If everything went right, you should now have a bunch of *.gcda files in your GP2X /mnt/sd/tmp directory (and sub-dirs, if your source tree is more complicated).
  8. Copy all those *.gcda files from the GP2X /mnt/sd/tmp to your PC's /mnt/sd/tmp. They should pair up with the *.gcno files if you sort them by name.
  9. Open up your Makefile again and change every "-fprofile-generate" to "-fprofile-use". Do not change any other compile options.
  10. Clean your old .o files again (but do not delete the .gcno and .gcda files).
  11. Recompile.
Now you should have your optimized binary. How much better it performs depends on the program itself. For example, PicoDrive doesn't perform much better with Genesis games (as the rendering code is asm anyway), but the Sega CD scaling/rotation chip code performs much better (maybe a 50% improvement?).
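
The steps above could be scripted roughly as follows. This is only a sketch under assumptions that are not in the post: that the Makefile honours CFLAGS and LDFLAGS given on the command line, that -O2 stands in for your real optimization flags, and that GNU make is used. The helper names (pgo_flag, build_pass, unpaired_gcda) are made up for illustration.

```shell
#!/bin/sh
# Sketch of the two-pass build, driven from the shared build directory.
# Assumed, not from the thread: the Makefile accepts CFLAGS/LDFLAGS
# overrides, and -O2 is a placeholder for your own optimization flags.

BUILD_DIR=${BUILD_DIR:-/mnt/sd/tmp}   # must exist under the same path on the GP2X
BASE_FLAGS=${BASE_FLAGS:--O2}         # keep identical in both passes (step 9)

# Print the profiling flag for a pass: "generate" (steps 3-4) or "use" (steps 9-11).
pgo_flag() {
    case "$1" in
        generate) echo "-fprofile-generate" ;;
        use)      echo "-fprofile-use" ;;
        *)        echo "usage: pgo_flag generate|use" >&2; return 1 ;;
    esac
}

# Remove stale .o files (leaving .gcno/.gcda alone), then rebuild with the
# chosen flag passed to both the compile step and the link step.
build_pass() {
    flag=$(pgo_flag "$1") || return 1
    find "$BUILD_DIR" -name '*.o' -exec rm -f {} + &&
    make -C "$BUILD_DIR" CFLAGS="$BASE_FLAGS $flag" LDFLAGS="$flag"
}

# List .gcda files that have no .gcno with the same base name (step 8 check).
unpaired_gcda() {
    find "$BUILD_DIR" -name '*.gcda' | while read -r f; do
        [ -f "${f%.gcda}.gcno" ] || echo "$f"
    done
}
```

A session would then be "build_pass generate", the run on the GP2X and the copy back of the .gcda files (steps 5 to 8), "unpaired_gcda" to check the pairing, and finally "build_pass use".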

#17 Squidge

Squidge

    Mega GP Mania

  • X-treme Team
  • 8498 posts
  • Gender:Male
  • Location:UK

Posted 07 August 2007 - 09:00 PM

Thanks notaz, I'll be sure to try this and report back :)


#18 critical

critical

    Mega GP Mania

  • GP Guru
  • 666 posts

Posted 08 August 2007 - 09:21 AM

You're a legend. Cheers dude!

#19 JyCet

JyCet

    GP Mania

  • GP32 Hardcore
  • 464 posts
  • Location:France

Posted 09 August 2007 - 05:55 PM

Thanks for this excellent profiling help!
I have some questions:
On the GP2X, do I run the GPE file or the main .o file?
To get a GPE file built with -fprofile-generate I need to add -lgcov.

Do I need to launch the binary via telnet, or can I launch it from a menu?

Does it work with a static lib? (I haven't used the OPEN2X firmware yet...) So I need to statically link the library to run the binary.

Thanks for your help
;)

Edit1: Sorry, I have my answer:
The GPE can be launched from a menu.
It works with a static library, and I need to copy all the .o files along with the GPE ;)
Good, I'm making progress ;)

Edit2: Unfortunately I gained nothing :(
But it's very good to know how to set up and use profiling now :rolleyes:
Thanks for your explanation, notaz :)

Edited by JyCet, 09 August 2007 - 06:14 PM.


#20 notaz

notaz

    Mega GP Mania

  • GP Guru
  • 1737 posts
  • Location:Lithuania

Posted 09 August 2007 - 07:38 PM

I'm doing all this on the standard 2.1.1 firmware (with my SDHC patch, but that shouldn't affect anything), only compiling with the open2x toolchain and statically linking the resulting binary (.gpe).
Strange that you need to copy .o files; I never copy them. I never needed to add -lgcov either. For the profiling gpe I would try not using a static library (just link all the .o files instead). Only after all the profiling is done and all .o files are rebuilt with -fprofile-use would I make the .a file out of them (by "static library" you meant a .a file, didn't you?).
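
A rough sketch of that ordering, assuming the open2x tools are installed under the names arm-open2x-linux-gcc and arm-open2x-linux-ar (adjust CC and AR if yours differ). The function names and the file names in the usage note are hypothetical.

```shell
#!/bin/sh
# Assumed tool names; override CC/AR to match your toolchain.
CC=${CC:-arm-open2x-linux-gcc}
AR=${AR:-arm-open2x-linux-ar}

# Profiling build: link every object straight into the .gpe, with no .a
# in between. -fprofile-generate is repeated at link time so the gcov
# runtime gets pulled in.
link_profiling_gpe() {
    out=$1; shift
    "$CC" -static -fprofile-generate -o "$out" "$@"
}

# Final build: archive the objects only after they have been rebuilt
# with -fprofile-use.
archive_objects() {
    lib=$1; shift
    "$AR" rcs "$lib" "$@"
}
```

For example, "link_profiling_gpe emu.gpe main.o cpu.o" for the profiling run, and later "archive_objects libcpu.a cpu.o" once the -fprofile-use objects exist.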

#21 Simon Parzer

Simon Parzer

    GP32 Hardcore

  • Member
  • 126 posts

Posted 10 August 2007 - 07:29 AM

Thanks for your tut, notaz. Gained 20-30% through it.

#22 Adventus

Adventus

    GP Mania

  • GP32 Hardcore
  • 460 posts
  • Gender:Male
  • Location:Canberra, Australia

Posted 29 January 2008 - 04:03 AM

Not sure if this has been posted before, but you can get profiling to work with a cross-compiling Windows dev environment.

Caution: Messing with binaries is not very safe, I do not take any responsibility for your actions.

Here's what I do with my port of Hexen 2:
    1. Compile the program with the compiler and linker flag "-fprofile-generate". My compiler (the one that comes with devkitGP2X) doesn't seem to like this flag on sources containing goto statements... you can either rewrite those sources without goto or remove the flag for them. I also needed to add -lgcov as a linker flag.
    2. Using a string search-and-replace program (notepad is enough), open your binary and replace all instances of the string *your source directory* with a Linux-style directory on your SD card that has exactly the same string length and already exists. For example:

    I compiled in "C:\2xHexen2 v0.02 Src\hexen2" so I replaced this string with "/mnt/sd/Hexen2ProfilingTemp"

    3. Transfer the binary to anywhere on your SD card and run it.
    4. After a sufficiently intense run-through, exit your program and check the "same length" directory on your SD card. There should be *.gcda files there.
    5. Copy these .gcda files to your compile directory on your PC.
    6. Recompile with "-fprofile-use" as the compiler and linker flag.
This may also be possible using cygwin, but I find this easier. Automating it shouldn't be hard either.
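
One way the path replacement in step 2 might be automated. This swaps notepad for perl, which is not what the post uses, and it assumes perl is installed and the paths are plain ASCII. The function names are hypothetical. It refuses to patch when the two strings differ in length, since that would shift everything after them in the binary.

```shell
#!/bin/sh
# same_length OLD NEW: succeed only if both strings are equally long.
same_length() {
    [ "${#1}" -eq "${#2}" ]
}

# patch_path FILE OLD NEW: replace every occurrence of OLD in FILE with NEW,
# keeping an untouched copy as FILE.orig. \Q...\E makes perl treat OLD as a
# literal string, so backslashes in Windows paths are not regex escapes.
patch_path() {
    same_length "$2" "$3" || {
        echo "paths differ in length; refusing to patch" >&2
        return 1
    }
    cp "$1" "$1.orig" &&
    OLD=$2 NEW=$3 perl -0777 -pi -e 's/\Q$ENV{OLD}\E/$ENV{NEW}/g' "$1"
}
```

With made-up paths of equal length, a call would look like: patch_path hexen2.gpe 'C:\src\hexen2' '/mnt/sd/prof1'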

Edited by Adventus, 30 January 2008 - 05:07 AM.


#23 pupnik

pupnik

    GP32 Hardcore

  • Member
  • 106 posts

Posted 11 March 2008 - 04:11 PM

Hi, I've seen good improvements with uae4all as well. If you get gcov-related errors at link time, make sure you're adding -fprofile-generate to your LDFLAGS.

I haven't been able to find answers to some questions though:

When I create a binary with -fprofile-generate for the profiling run, should I add optimization CFLAGS like
-fomit-frame-pointer -fforce-addr -fforce-mem
-falign-loops=2 -falign-functions=2 -falign-jumps=2
-funroll-loops
to the initial build?

Or will gcc decide whether and where to use such optimizations in the -fprofile-use stage, based on the run-time .gcda data? In other words, if I specify e.g. -funroll-loops in either the 'generate' or 'use' stage, am I forcing gcc to apply a potentially slower optimization?

I'm using gcc-3.4 codesourcery - haven't gotten a 4.x running yet.

Thanks - cheers.

#24 slaanesh

slaanesh

    Mega GP Mania

  • GP Guru
  • 1918 posts
  • Gender:Male
  • Location:Melbourne, Australia
  • Interests:GP32, GP2X, Zodiac, PSP, Dingoo, Pandora.

Posted 17 August 2009 - 02:16 AM

Thanks for the great guide to profiling. I've just gone through the motions on the Dingoo A320 and all went well. The speed-up was about 20%, which is fantastic! My executable is also smaller, interestingly also by about 20%. Another bonus for the memory-starved Dingoo.

A couple of pointers for other people doing this:

The .gcda files are written, on the machine that runs the profiling build, to the same path where the equivalent .o file was placed at build time.
I've cunningly set up my build so that all object files go to one place.

Once compiled with -fprofile-generate, you can see where they will be written by running something like the following command:

strings emulator.exe | grep path_to_exe_objects

where "path_to_exe_objects" is the path where you built your executable. As notaz and others have already mentioned, this path is always built into your new profiling executable.
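
A variant of the same check that lists the embedded .gcda paths directly, sketched with tr and grep instead of strings so that it runs where binutils is not installed. The function name is made up, and the output may include some noise if other printable bytes happen to sit right in front of a path.

```shell
#!/bin/sh
# embedded_gcda_paths BINARY: turn every run of non-printable bytes into a
# newline (a crude stand-in for "strings"), then keep only the lines that
# end in ".gcda".
embedded_gcda_paths() {
    tr -cs '[:print:]' '\n' < "$1" | grep '\.gcda$'
}
```

Running "embedded_gcda_paths emulator.exe" should print one line per instrumented object file, which shows the directories that must exist on the device.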

Now that my proof-of-concept run was successful, I will do some more comprehensive profiling.

#25 slaanesh

slaanesh

    Mega GP Mania

  • GP Guru
  • 1918 posts
  • Gender:Male
  • Location:Melbourne, Australia
  • Interests:GP32, GP2X, Zodiac, PSP, Dingoo, Pandora.

Posted 15 September 2009 - 01:21 AM

I have found that later versions of gcc (>= 4.4.0) allow you to specify the path where .gcda files are generated - very handy!

e.g. -fprofile-generate=/usr/local/profile

This means you don't have to change your compiling directory or create links or hand-edit your .o files.
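
A small sketch of how the two passes might share one profile directory with this option. GCC also accepts a path on -fprofile-use. The directory below and the helper name are hypothetical, and how the files are named inside that directory can differ between GCC versions, so check what your compiler actually writes.

```shell
#!/bin/sh
# Hypothetical location; it has to be writable on the device that runs
# the profiling build.
PROFILE_DIR=${PROFILE_DIR:-/mnt/sd/profile}

# Print the flag for the "generate" or "use" pass, pointing at PROFILE_DIR.
pgo_dir_flag() {
    case "$1" in
        generate|use) echo "-fprofile-$1=$PROFILE_DIR" ;;
        *)            echo "usage: pgo_dir_flag generate|use" >&2; return 1 ;;
    esac
}
```

The result can be spliced into a build line, for instance: make CFLAGS="-O2 $(pgo_dir_flag generate)" LDFLAGS="$(pgo_dir_flag generate)"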

#26 barzoule

barzoule

    GP32 User

  • Member
  • 70 posts

Posted 20 November 2011 - 11:40 PM

Hi all,
I'm using devkitGP2X (gcc 4.0.2). When I enable profiling (-fprofile-generate) I get the following error:
SRC/bz_3D.c: In function 'drawtritex16b_FixPoint_split2':
SRC/bz_3D.c:1893: internal compiler error: in int_mode_for_mode, at stor-layout.c:251

If I comment out the function (which isn't used anyway) it says the same thing on the next one.

The only info Google gave me is that some other projects had the same error, but "only with gcc4.0.x".

When looking for other compilers, the only one I found was the official GP2X dev kit (gcc 3.4.6). I tried compiling with DevCPP's profiling switch (-lgmon -pg). It complained about not finding libgmon (or something like that). So I tried compiling with the -pg switch alone, but then the app crashes on loading. (As a side note, my app is 3 times slower with this compiler.)

What compiler are you guys using? Any equivalent available on Win32?

All I want is the timings and calls so I can know where to optimize and/or cut.
thanks.

#27 critical

critical

    Mega GP Mania

  • GP Guru
  • 666 posts

Posted 23 November 2011 - 01:52 PM

QUOTE(barzoule @ Nov 20 2011, 11:40 PM)

What compiler are you guys using? Any equivalent available on Win32?

All I want is the timings and calls so I can know where to optimize and/or cut.
thanks.


As Notaz posted, the Open2x toolchain:
http://wiki.open2x.o...ain#Linux_users

Note the section for Windows users too.

Good luck!

#28 Ziz

Ziz

    GP32 User

  • Member
  • 91 posts

Posted 23 November 2011 - 06:12 PM

Thanks for this great tutorial, notaz.

Today I experimented a bit with profiling, but no matter what I do, the profiled build isn't faster than my normal optimized build. Sometimes it is even slower (just a few fps, but still slower). Nevertheless, thanks a lot; I hope I'll be able to use it in another project. In my current game it is pointless. :-\

#29 barzoule

barzoule

    GP32 User

  • Member
  • 70 posts

Posted 24 November 2011 - 04:40 AM

Critical: thanks I'll give it a try. Somehow I was sure it was Linux-only

Ziz: If you see it as the compiler not being able to optimize better than you can, then it's a good thing :) Losing performance when using profile-guided optimization is scary though..

#30 Ziz

Ziz

    GP32 User

  • Member
  • 91 posts

Posted 24 November 2011 - 11:14 AM

barzoule: It is not really slower, but it is always smaller. It is only slower if I use "static" instead of inline. ;-)