There are a few edge cases in the TIA emulation in Stella. I know about all of them, and have a reasonably good idea what's happening, but I haven't had time to do more research and properly fix them. In general, Stella is extremely accurate in all the common cases, but it sometimes falls down in these few edge cases:
- 'illegal' HMOVEs: still not emulated 100%, but much better than pre-3.0 versions (probably most noticeable in Kool Aid Man)
- changing NUSIZx while drawing is happening: works for most ROMs, but isn't fully emulated (we cheat in the source code and use hard-coded values; most noticeable in Meltdown)
- lack of VSYNC doesn't cause rolling: this is one that I hope to fix sooner rather than later; in fact, there's preliminary code in the TIA class already
Basically, if you don't do late HMOVEs, don't change NUSIZx while drawing with it, and don't forget to include VSYNC, then Stella is very accurate. If you want to do any of the previously mentioned things, I strongly advise also testing on real hardware.
Note that an error isn't always apparent. Sometimes Stella does things that real hardware wouldn't (i.e., it is more lenient with programming errors, making your ROM look like it's working correctly when it isn't). Other times, there's nothing wrong with your code, and the output you get from Stella seems wrong (but is right on real hardware).
In everything but the edge cases, I'm fairly confident in trusting Stella output.
EDIT: As for the palette, I'm not sure this can ever be properly addressed. Even real hardware is inconsistent. You are able to create a custom palette file and have Stella use that instead, but I suspect that even if I were to include it in Stella, someone else would come along and say the colours are wrong. Never The Same Colour indeed.
EDIT 2: I forgot to mention, I think the only emulation issues are in the TIA code, not elsewhere. One area of the TIA emulation that only Stella deals with (AFAIK) is the behaviour of 'floating' TIA pins. In this case, there's a command-line argument to introduce random data onto the bus, which clearly shows whether the ROM has been programmed correctly. Also, Stella is currently the only emulator that properly deals with inadvertent reads from the write port of cartridge RAM space. In most cases, if you do that you'll get back garbage data and overwrite what was in RAM at that point. That tends to lead to graphical garbage, just as you'd see on a real system.
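To make the write-port case concrete, here's a rough toy model in Python. It is only a sketch and not Stella's actual code: the `CartRam` class, its method names, and the layout (128 bytes of RAM, write port at offsets 0x00-0x7F, read port at 0x80-0xFF, as on Superchip-style carts) are my assumptions for illustration. The garbage value is simulated with a random number; on a real system it is whatever happens to be on the data bus.

```python
import random

RAM_SIZE = 128  # assumption: Superchip-style cart with 128 bytes of RAM


class CartRam:
    """Toy model of cartridge RAM with separate write and read ports.

    Hypothetical layout (offsets within the cartridge address space):
      0x00-0x7F  write port
      0x80-0xFF  read port
    """

    def __init__(self, seed=None):
        self.ram = [0] * RAM_SIZE
        self._rng = random.Random(seed)  # stands in for the undriven data bus

    def write(self, offset, value):
        """CPU write: only the write port stores data in this sketch."""
        if 0 <= offset < RAM_SIZE:
            self.ram[offset] = value & 0xFF
        # Writes to the read port are simply ignored here.

    def read(self, offset):
        """CPU read: the read port is safe, the write port is destructive."""
        if 0 <= offset < RAM_SIZE:
            # Inadvertent read from the write port: nothing drives the bus,
            # so the CPU sees garbage, and that garbage is stored in RAM.
            garbage = self._rng.randrange(256)
            self.ram[offset] = garbage
            return garbage
        if RAM_SIZE <= offset < 2 * RAM_SIZE:
            return self.ram[offset - RAM_SIZE]
        raise ValueError("offset outside the modelled RAM ports")


if __name__ == "__main__":
    cart = CartRam(seed=1)
    cart.write(0x10, 0xAB)             # store a value via the write port
    print(hex(cart.read(0x90)))        # read port returns the stored value
    bad = cart.read(0x10)              # bug: reading from the write port
    print(cart.read(0x90) == bad)      # RAM now holds the garbage value
```

The point of the model is the first branch in `read`: a stray read doesn't just return junk, it also clobbers the byte you were trying to look at, which is why the damage tends to show up later as corrupted graphics rather than at the offending instruction.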