Hello, my name is Daniel Hanrahan. Do you think my games should have optional calls to assembly functions for certain CPUs and GPUs, in order to reach the maximum possible performance by using non-standard parts/functions of those CPUs and GPUs for tasks the standard parts are not well suited to? For example: instead of using the standard instructions of the Z80 for randomization, you use the refresh register instead. Let me be clear: my games already have good performance.
Link to my games: https://daniel-hanrahan-tools-and-games.github.io/
This is called compute dispatching and is super common. I have done a whole bunch of DSP implementations where you use a CUDA or AVX kernel depending on availability. Otherwise you fall back to a standard-library or even a Python kernel.
Do you think compute dispatching is worth it for my video games?
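To make the idea concrete, here is a minimal sketch of dispatching in Python. It is only an illustration: the names `dot_pure`, `dot_numpy`, and `select_dot` are made up for this example, and a real game would more likely pick between native kernels (e.g. AVX vs. scalar) than between NumPy and plain Python.

```python
import importlib.util


def dot_pure(a, b):
    """Portable fallback: plain Python, works everywhere."""
    return sum(x * y for x, y in zip(a, b))


def dot_numpy(a, b):
    """Faster path, only used when NumPy is installed."""
    import numpy as np
    return float(np.dot(np.asarray(a, dtype=float), np.asarray(b, dtype=float)))


def select_dot():
    """Pick the implementation once, at startup, based on what is available."""
    if importlib.util.find_spec("numpy") is not None:
        return dot_numpy
    return dot_pure


# Hypothetical usage: every caller uses the same name after selection.
dot = select_dot()
result = dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(result)
```

The point of the pattern is that the check happens once, not on every call, and the rest of the program never needs to know which kernel was chosen.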
Do you want to? Go for it.
Does your game crawl? Have you identified this code as the bottleneck? Are you certain that asm will give you a meaningful performance increase, and that your issue doesn’t lie with your approach to the problem? Sure, I guess. You said your game runs fine though, so this probably doesn’t apply.
Is your game fast already? If you don’t want to do it, don’t.
Writing asm by hand is almost always a waste of time. There are only a few times where it’s actually necessary, and unless you’re writing a bootloader and running your game on bare metal, I can’t imagine why it’d be necessary. But you know your code better than anyone else here, so you should know whether it’s needed or not more than any of us do.
To begin with, you’re apparently targeting the Z80, which I haven’t seen used for games in the wild… probably in my entire life? Maybe an arcade machine I played on once used it, but I can’t think of any other times. If your targets need custom assembly, then you should already know that. We don’t know your targets.
I was just using a feature of the Z80 as an example. Thank you for your help; if anyone wants to add that functionality to their game to increase performance, they can.
I would argue that if your games are already performant on the platforms you care about, you would get diminishing returns. The only reason to experiment with specialist asm would be for your own experience and enrichment, which is a perfectly reasonable reason to pursue it.
It’s probably not worth comparing to an OS where even shaving a few cycles off of code that runs all the time on millions of computers across the world would end up with significant impact.
Thanks for the information.
You need to profile your binaries to find out where they spend most of their CPU time, and try to optimise those areas with more efficient code before you even consider micro-optimisations like asm for specific CPUs. Considerations like algorithm choice and the cache efficiency of your data will likely have a larger effect.
Thank you, but I already made sure that my games were as efficient as possible. I was asking whether I can make my games more performant by adding optional assembly functions that use non-standard parts of the CPU and GPU, to make sure that my games are at maximum performance, because Linux does something similar.
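As a rough illustration of what "profile first" looks like, here is a small sketch using Python's built-in `cProfile`. The functions `update_physics`, `draw_hud`, and `frame` are hypothetical stand-ins, not anything from the games in question; for a native binary you would use a native profiler (e.g. `perf` on Linux) instead, but the workflow is the same.

```python
import cProfile
import io
import pstats


def update_physics(n):
    # Deliberately heavy stand-in for a hot loop (hypothetical workload).
    return sum(i * i for i in range(n))


def draw_hud():
    # Deliberately cheap stand-in.
    return "score: 0"


def frame():
    update_physics(200_000)
    draw_hud()


# Record where time is spent over a handful of simulated frames.
profiler = cProfile.Profile()
profiler.enable()
for _ in range(5):
    frame()
profiler.disable()

# Sort by cumulative time and keep the five most expensive entries.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer).sort_stats("cumulative")
stats.print_stats(5)
report = buffer.getvalue()
print(report)
```

In a report like this the heavy function should dominate, which tells you where optimisation effort could pay off and where it would be wasted.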
“have optional calls” is not really how this works.
If you’re in an interpreted or JIT-compiled language, like Python, Java, or C#, you don’t have to do anything, because the runtime is already built for, or compiles your code for, the architecture it’s running on, i.e. using whatever CPU features are available.
If you have a compiled language, and your users compile themselves, then they are choosing which CPU features to use, so you don’t have to do anything. If you distribute pre-built binaries, then you simply have to compile it once for each architecture you want to support, and distribute the correct binary to each user (usually done with an installer).
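A sketch of the "distribute the correct binary" step, assuming an installer or launcher script written in Python. The file names in `BUILDS` and the function `pick_build` are hypothetical; a real project would list its own build artifacts.

```python
import platform

# Hypothetical file names, keyed by what platform.machine() reports.
BUILDS = {
    "x86_64": "mygame-x86_64",
    "amd64": "mygame-x86_64",
    "aarch64": "mygame-arm64",
    "arm64": "mygame-arm64",
}


def pick_build(machine=None):
    """Return the prebuilt binary name for this machine, or None if unsupported."""
    machine = (machine or platform.machine()).lower()
    return BUILDS.get(machine)


chosen = pick_build()
print(chosen or "no prebuilt binary for this architecture; build from source")
```

The choice is made once at install time, so the game itself contains no per-CPU branches.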
For graphics, your graphics API also already takes care of using system-specific instructions, and shaders are compiled by it before/while running also using system-specific instructions.
So there’s really no “optional” path that you have to specifically put into your program, so nothing like
Func work()
    If isArm then
        doArmStuff()
    Else if isZen4 then
        doZen4Stuff()
    ...
End
The reason I ask is because I know Linux does something like that to make sure it is at maximum performance.
This won’t meaningfully improve performance, especially for CPU stuff.
Thank you, but are you sure? I know Linux does something similar to make sure it is at maximum performance.
On modern CPUs it doesn’t matter that much. And any optimization would have to be updated for each CPU type (Zen 4, Alder Lake, etc.). Modern CPUs have insane out-of-order execution that makes compiler-generated code nearly as fast as the most optimized handwritten code. On older CPUs you’d see more of a performance bump.
Thanks for the information.
99% of the time, memory is the bottleneck. It’s why DMA is such a huge feature.



