I've been playing with this again recently, and thought it might be worth mentioning that to send the 16K of UART boot code, you simply need to write one byte at a time at 19200bps. That means actually pacing the writes at about that speed, not just configuring the port for that speed. msleep(4) between bytes seems to be sufficient.
Also, only CfgBOOTMODE0 needs to be pulled down. My USB-serial cable currently has RTS connected to CfgBOOTMODE0 (pin 9) and DTR connected to Power_En (pin 6), so I can power cycle and choose between UART and NAND boot in software.
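For anyone wanting to try the same wiring, here is a rough sketch of driving RTS and DTR from software on Linux using the TIOCMBIS/TIOCMBIC ioctls. The helper names, the two polarity macros and the 500 ms hold-off are my own assumptions, not something from the setup above. Whether asserting a line pulls the pin low or high depends on your cable, so check with a meter and flip the macros if needed.

```c
#define _DEFAULT_SOURCE
#include <sys/ioctl.h>
#include <unistd.h>

/* Assumptions, not from the post: flip these to match your cable's polarity. */
#define UART_BOOT_ASSERTED 1  /* RTS asserted -> CfgBOOTMODE0 pulled down */
#define POWER_ON_ASSERTED  1  /* DTR asserted -> Power_En enabled */

/* Assert (on != 0) or clear one modem-control line such as TIOCM_RTS. */
static int set_modem_line(int fd, int line, int on)
{
    return ioctl(fd, on ? TIOCMBIS : TIOCMBIC, &line);
}

/* Power the board off, select the boot source, then power it back on.
 * uart_boot != 0 selects UART boot, 0 selects NAND boot. */
static int power_cycle(int fd, int uart_boot)
{
    int rts_on = uart_boot ? UART_BOOT_ASSERTED : !UART_BOOT_ASSERTED;

    if (set_modem_line(fd, TIOCM_DTR, !POWER_ON_ASSERTED) < 0)
        return -1;
    if (set_modem_line(fd, TIOCM_RTS, rts_on) < 0)
        return -1;
    usleep(500000);  /* hold-off time is a guess, adjust as needed */
    return set_modem_line(fd, TIOCM_DTR, POWER_ON_ASSERTED);
}
```

With the port opened via open() on something like /dev/ttyUSB0 (device name is just an example), power_cycle(fd, 1) should bring the board up in UART boot and power_cycle(fd, 0) in NAND boot, provided the polarity macros match the cable.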
EDIT: make that usleep(417), seems I was out by a digit :/
EDIT2: or do it properly and call tcdrain, although that's a fair bit slower
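In case it helps, a minimal sketch of the byte-at-a-time send covering both variants: a fixed usleep between bytes, or tcdrain after each byte. The function names and the gap_us parameter are made up for illustration, and the raw 8N1 setup via cfmakeraw is my assumption since only the 19200bps speed is stated above. Error handling is kept to the bare minimum.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <termios.h>
#include <unistd.h>

/* Put an already-open serial port into raw mode at 19200 bps.
 * Raw 8N1 framing is an assumption; only the speed comes from the post. */
static int configure_19200(int fd)
{
    struct termios tio;

    if (tcgetattr(fd, &tio) < 0)
        return -1;
    cfmakeraw(&tio);
    cfsetispeed(&tio, B19200);
    cfsetospeed(&tio, B19200);
    return tcsetattr(fd, TCSANOW, &tio);
}

/* Write buf one byte at a time.
 * gap_us > 0:  sleep that many microseconds after each byte (e.g. 417).
 * gap_us == 0: call tcdrain() after each byte instead (slower). */
static int send_paced(int fd, const unsigned char *buf, size_t len,
                      unsigned gap_us)
{
    for (size_t i = 0; i < len; i++) {
        if (write(fd, &buf[i], 1) != 1)
            return -1;
        if (gap_us > 0)
            usleep(gap_us);
        else if (tcdrain(fd) < 0)
            return -1;
    }
    return 0;
}
```

Usage would be configure_19200(fd) once after opening the port, then send_paced(fd, image, image_len, 417) for the usleep version or send_paced(fd, image, image_len, 0) for the tcdrain version, where image and image_len stand for the 16K boot code loaded into memory.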
Edited by pseudonym404, 21 February 2012 - 08:53 PM.