How to profile applications on Mac OS X.
The version I'm running is Mac OS X 10.6.1 (Snow Leopard). I have installed the Xcode developer tools and the usual tools such as autoconf, automake, etc. Depending on your configuration, you may also need to run the profiler as root. Mac OS X provides a great program for profiling code written in different languages. It's called "Shark".
1. To run it, press Cmd+Space and type Shark (Spotlight search is handy for this).
2. Now that Shark is open, select "Launch" in the last combo box. (It also lists all the running processes, and you could even profile everything, but we are only interested in our app.)
3. Click "Start". A dialog box will ask you for the executable path and its arguments. Set them as needed. You can also set environment variables for debugging or anything else.
4. Click OK and wait for the results.
Once the results are generated, you can sort them by any of the columns. The "Self" and "Total" columns show the cost of each part, which points you to the most critical sections. These may be simple pieces of code that are executed a great many times. So even if a part looks very simple, improving it just a bit can give you a real performance gain!
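To make the difference between the two columns concrete, here is a minimal sketch in Python. It is not Shark output and does not use any Shark API. It assumes the usual profiler convention that "Self" counts the samples taken in a function's own code, while "Total" also includes the samples of everything that function calls. The function names and sample counts are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Frame:
    """One function in a hypothetical call tree."""
    name: str
    self_samples: int  # samples taken in the function's own code
    children: List["Frame"] = field(default_factory=list)

    def total_samples(self) -> int:
        # Total = own samples plus everything spent in callees.
        return self.self_samples + sum(c.total_samples() for c in self.children)


def cost_table(root: Frame) -> List[Tuple[str, float, float]]:
    """Return (name, self %, total %) rows, sorted by self % descending."""
    whole = root.total_samples()
    rows = []
    stack = [root]
    while stack:
        frame = stack.pop()
        rows.append((
            frame.name,
            100.0 * frame.self_samples / whole,
            100.0 * frame.total_samples() / whole,
        ))
        stack.extend(frame.children)
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up numbers: is_digit is trivial, but parse calls it constantly.
ROOT = Frame("main", 5, [
    Frame("parse", 15, [Frame("is_digit", 60)]),
    Frame("render", 20),
])

if __name__ == "__main__":
    for name, self_pct, total_pct in cost_table(ROOT):
        print(f"{self_pct:5.1f}%  {total_pct:5.1f}%  {name}")
```

Sorted by Self, the tiny is_digit function comes first, which is exactly the "simple code executed a lot of times" case. Sorted by Total, main and parse would lead instead, because they include the cost of what they call.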
Now you can explore some of Shark's features. You can expand the traces of the function calls as a hierarchical list, and the self/total cost percentage is split among the individual function calls. So you may see a grouped call at 24% and, on expanding it, find one call that uses 10%, another 5%, and another 9%. This way you can build a mental picture of the execution flow. Shark can also show the profile by thread, and the call stack as "Heavy", "Tree", or both.
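The 24% = 10% + 5% + 9% breakdown can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names and costs are invented, and it assumes "Tree" means a top-down view by caller while "Heavy" means a bottom-up view that sums each function's own cost across all the places it is called from.

```python
# Each node is (name, self cost in percentage points, list of child nodes).
# All names and numbers are hypothetical.
TREE = ("handle_request", 0, [
    ("parse", 4, [("memcpy", 6, [])]),
    ("lookup", 5, []),
    ("format", 3, [("memcpy", 6, [])]),
])


def total(node):
    """Total cost of a node: its own cost plus that of all its callees."""
    _name, self_cost, children = node
    return self_cost + sum(total(child) for child in children)


def tree_view(node, depth=0):
    """Top-down view: one indented line per call, showing its total cost."""
    name, _self_cost, children = node
    lines = [f"{'  ' * depth}{total(node):>3}%  {name}"]
    for child in children:
        lines.extend(tree_view(child, depth + 1))
    return lines


def heavy_view(node):
    """Bottom-up view: self cost summed per function name, largest first."""
    acc = {}

    def visit(n):
        name, self_cost, children = n
        acc[name] = acc.get(name, 0) + self_cost
        for child in children:
            visit(child)

    visit(node)
    return sorted(acc.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    print("\n".join(tree_view(TREE)))
    print(heavy_view(TREE))
```

In this sketch the tree view shows handle_request at 24%, expanding into 10%, 5%, and 9%. The heavy view puts memcpy on top with 12%, a cost that no single branch of the tree reveals, which is why looking at both views can be useful.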
If you don't have the source code, don't worry. Shark disassembles the application for you, lol!
So now you may want to check the code of that part. Shark will help you with that too. Double-click the function call and the code will be displayed automatically. The CPU cost appears next to each important line.
The rest is up to you. You may need to work out where a loop is costing a lot, or where a performance improvement can be made. You can also have a look at the generated charts. The ID of the CPU can also be specified here.
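As an example of the kind of loop fix a profile might point you to, here is a small sketch in Python. Shark itself profiles native code, so take this purely as a language-neutral illustration. The function names are hypothetical, and the idea shown is simply moving work that never changes out of the loop.

```python
import math


def scale_slow(values, factor):
    """Divide every value by sqrt(factor), recomputing the root each time."""
    out = []
    for v in values:
        # math.sqrt(factor) is evaluated on every iteration,
        # even though factor never changes inside the loop.
        out.append(v / math.sqrt(factor))
    return out


def scale_fast(values, factor):
    """Same result, but the invariant square root is computed once."""
    root = math.sqrt(factor)  # hoisted out of the loop
    return [v / root for v in values]


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    print(scale_slow(data, 4.0))
    print(scale_fast(data, 4.0))
```

Both functions return the same values. Whether the second one is noticeably faster depends on how hot the loop is, so it is worth profiling again after a change like this.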