Could I procedurally map all ASM by condition-ignorant tracing? (A: No.)

Started by Silver Skree, May 15, 2014, 08:04:11 AM

Previous topic - Next topic

Silver Skree

EDIT:
Actually, further discussion brought up things like arbitrary code execution through input and X-Ray climbing out of bounds to run around RAM, meaning we hit the halting problem and the game is no longer a closed system. Especially if we're talking about handling code other people write.

I feel dumb.  :blush:

Still, I'll leave the topic open, with the original post below if anyone is interested by this sort of thing and wants to talk about it.

Silver Skree

I want to say this before I start:
I'm nowhere near ready to write this. Right now I just want to know if this is possible with the model I have in mind.

I need to borrow the brainpower of people that are familiar with Super Metroid's code to some extent, and 65816 / SNES software limits in general.
I wanted to start discussion on this as soon as I thought of it, because the possibility of this seems very tough to determine.

The goal is as follows:
Given the address of the emulation starting point, evaluate all ASM that is ever pointed to (by a jump or either condition of a branch) into individual "data block" objects. The evaluation would determine whether or not the code for a condition branches to a place within the current block, or if it can be treated as the start of a new block. These blocks would have two lists of references to other blocks: ASM blocks that they point to, and ASM blocks that point to them. This would, in theory, allow you to move ASM around and auto-repoint anything that pointed there, as long as the move is within the bounds of every relevant pointer (e.g. no cross-bank moving if every jump into and out of the routine isn't a JSL). If combined with a hex editor in the same program (which I'm planning for), you could treat compiled ASM like putty.

I'm aware of the monstrous complexity of what I'm talking about, but I have a key concern, which is whether or not this is actually possible on any practical level. I've given it a few hours of thought and had a discussion or two about it, all of which I will summarize here:




I do have the advantage of Super Metroid being a fully closed system, so this actually might not be impossible.
I could write a "demi-emulator" which, every time there's a conditional branch, dies and spawns a new instance of itself for every possible outcome of that condition, each with its own current read location and RAM snapshot.
It wouldn't allow endless loops, though, because the ROM-Map object prevents elements from being submitted more than once, and it won't "walk" or "trace" anything it already knows about.
The program would only run one of these demi-emulator objects at a time, to keep the CPU intensity of the process from growing exponentially out of control.
Would this loop-preventing and depth-first model break the tracing? I don't think it would, because I'm following all paths regardless of the actual conditions of any branches.

Honestly, if the pointer itself isn't actually in the RAM, I think it'll work fine.
But is there anywhere in Super Metroid where pointers aren't hardcoded? If the game uses something in RAM as a pointer, or to preform math on a pointer, then the model needs to change.
That might be solved by each demi-emulator instance having its own RAM,though.
I mean, the pointer would have to be created from data in the ROM, right?
Even if there's some code like "If left is pressed, go there, and if not, go here", I'd be ignoring the "If left is pressed" part and just following both pointers anyway.
As long as I properly follow all the ASM instructions and keep the values in RAM updated, I'd be able to derive any pointers created in RAM.
This model would, in theory, have an emulator instance for every possible condition a routine could run under, and thus exhaust all the places it could possibly point to.

[06:57:53] <PY> for every memory access or jump, determine whether the address is hardcoded or dynamic. For every dynamic address, determine the range it can exist within. Anything hardcoded and outside dynamic ranges can be moved, anything within dynamic ranges could plausably be moved if you can prove it's only an offset+n
[06:58:19] <PY> of course, you only need one jump that takes arbitrary data to break the whole thing




So, I'd like the opinions of Metroid Construction's greatest minds. Could this be accomplished with the model described above? If not, what is the factor that makes this impossible?

Also, I'm not looking for responses like "You have no idea what you're getting yourself into" or "This is an unreasonable amount of work, don't do this."
I believe I understand the amount of work this would take...  :pale:
I'm counting on being able to get a lot of help if this is determined to be possible. I'm certainly no SNES or assembly expert, but working on this pet project might change that.

If I've talked over your head, but you have knowledge that is relevant here or you think you could help me in some way, please ask as many things as you need to. I'm happy to explain more about what I'm trying to do in this theory for anyone that doesn't understand some of the terminology here but thinks they could still be of help.