You have rightly observed that the biggest cause of the runtime performance issue is the many round trips the page makes to the server to fetch the small JS files.
While the modularized design of Dojo is very beneficial at design time (widget extensions, namespacing, etc.), at runtime you are expected to optimize the Dojo bits, and the way to do that is a custom build.
Doing a custom build will give you a big performance boost: the hundreds of round trips will be reduced to one or two, and the size of the payload will also decrease dramatically. We have seen a 50x performance improvement with a custom build.
A custom build creates an optimized, minified JS file that contains only the code you use in the app.
You can define multiple layers depending on how you want to segregate your application JS files (for example, one single compressed file versus multiple files included in different UIs).
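To make the idea of layers more concrete, here is a minimal sketch of what a build profile for the 1.7-style build system might look like. The `app` package, the `app/main`, `app/admin` and `app/admin/widgets` module ids, and the directory layout are placeholders I made up for illustration, so substitute your own, and check the option names against the build documentation for your Dojo version (pre-1.7 profiles use a different format).

```javascript
// Hypothetical profile file, e.g. app.profile.js, for the Dojo 1.7+ builder.
// "app" and every app/* module id below are placeholders, not real modules.
var profile = {
    basePath: "..",             // resolved relative to this profile file
    releaseDir: "./release",    // where the built output is written
    action: "release",
    layerOptimize: "closure",   // minify the layer files
    cssOptimize: "comments",    // strip comments from CSS

    packages: [
        { name: "dojo", location: "dojo" },
        { name: "dijit", location: "dijit" },
        { name: "app", location: "app" }
    ],

    layers: {
        // Main layer: the loader plus everything app/main pulls in,
        // so the common case is served by a single file.
        "dojo/dojo": {
            include: ["dojo/dojo", "app/main"],
            boot: true,
            customBase: true
        },
        // Second layer for a UI that only some users open. Modules that
        // already ship with app/main are excluded so they are not duplicated.
        "app/admin": {
            include: ["app/admin/widgets"],
            exclude: ["app/main"]
        }
    }
};
```

The profile is then passed to the build script that ships in `util/buildscripts` (`build.sh` or `build.bat`) in the Dojo source distribution. Whether you want one layer or several is a trade-off: a single file means the fewest requests, while a separate layer keeps rarely used code out of the initial download.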
Depending on the version of Dojo you are using, see:
http://dojotoolkit.org/reference-guide/1.7/build/index.html#build-index
http://dojotoolkit.org/reference-guide/1.7/build/pre17/build.html#build-pre17-build
While it looks daunting at first, stick with it and you will be able to create an optimized version and see the benefits :)