Get yourself a copy of Altium to evaluate. Its FPGA dev tools are far more user-friendly than the WebPACK by itself, and they give away a 30-day trial on their website. They also support C-to-hardware compiling, which might get you round the limitations you are facing with using a micro for this job.

Personally I would go for an FPGA card with a PCI interface on it, as that way you could theoretically add support for EMC at a later date, whereas a USB solution would be problematic for that because of the state of USB support under RT-Linux.

EDIT: personally this kind of approach doesn't appeal to me, as I am looking to move towards having drives and IO on a standard fieldbus like EtherCAT, with a small embedded PC for machine control. The benefit is that you don't have to bit-bang steps out of the printer port, so your timing becomes less problematic, because you are commanding the drive setpoints directly over Ethernet. I have heard of people getting 20 kHz position update rates over EtherCAT, which is more than enough for any CNC machine. You can get off-the-shelf interface ICs that speak EtherCAT, and the host software is free, with an open source implementation available.
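
To put the 20 kHz figure in perspective, here is a rough back-of-envelope sketch in Python. The 6000 mm/min feed rate is just a number I picked for illustration, not something from a real machine, and the function names are made up for this example.

```python
# Back-of-envelope check of what a given setpoint update rate means.
# Assumption: the feed rate below is an illustrative value, not a measurement.

def cycle_period_us(update_rate_hz: float) -> float:
    """Time between two position setpoints, in microseconds."""
    return 1e6 / update_rate_hz


def travel_per_update_um(feed_mm_per_min: float, update_rate_hz: float) -> float:
    """Distance an axis moves between two setpoints, in micrometres."""
    feed_mm_per_s = feed_mm_per_min / 60.0
    return feed_mm_per_s * 1000.0 / update_rate_hz


if __name__ == "__main__":
    rate_hz = 20_000.0      # update rate mentioned above
    feed_mm_min = 6_000.0   # assumed feed rate, purely illustrative

    print(f"cycle period: {cycle_period_us(rate_hz):.1f} us")
    print(f"travel per update: {travel_per_update_um(feed_mm_min, rate_hz):.1f} um")
```

With those assumed numbers the drive gets a new setpoint every 50 µs, and the axis moves about 5 µm between setpoints, so the drive only has to interpolate over a very short distance.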