I recently wrote an application using the threading module from the Python standard library. The application searches a DNA sequence for Open Reading Frames (ORFs), and its workload appeared to be mostly CPU-bound.
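For readers unfamiliar with the task, here is a minimal sketch of the kind of CPU-bound scanning involved. It is not the application's actual code. It assumes the simplest definition of an ORF: an ATG start codon followed by in-frame codons up to the first stop codon (TAA, TAG or TGA). It scans only the three forward reading frames, while real ORF finders usually also scan the reverse complement. The function name `find_orfs` and its `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of ORFs on the forward strand.

    Hypothetical simplified definition: ATG followed by in-frame codons
    up to and including the first stop codon. `end` is exclusive, and
    `min_codons` counts the start and stop codons too.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):  # the three forward reading frames
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                # Later ATGs inside an open ORF are ignored, so each ORF
                # begins at the first ATG seen in its frame.
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGGG"))  # prints [(2, 11)]
```

Every codon is a Python-level string slice and comparison, which is why this kind of work keeps the interpreter (and the GIL) busy.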
Running the application in a single thread took about 6 seconds on my test data. Running it continuously in 3 threads took about 30 seconds per run! The more threads I added, the slower each run was on average. That is actually what I expected, because Python's Global Interpreter Lock (GIL) allows only one thread at a time to execute Python bytecode, so CPU-bound threads take turns instead of running in parallel. I decided to look at the multiprocessing module to see if I could get the average run time back down to 6 seconds.
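To illustrate the effect, here is a minimal sketch of a per-thread timing harness in the spirit of the measurements below. It is not the application's code: `cpu_bound_work` is a hypothetical pure-Python stand-in for ORF detection, and the loop size is arbitrary. Because of the GIL, starting three such threads typically makes each one take roughly three times as long as the single-threaded baseline, though exact numbers depend on the machine and the Python version.

```python
import threading
import time


def cpu_bound_work(n: int) -> int:
    # Hypothetical stand-in for ORF detection: pure-Python arithmetic
    # that holds the GIL while it runs.
    total = 0
    for i in range(n):
        total += i * i % 7
    return total


def timed_worker(n: int, results: dict) -> None:
    start = time.perf_counter()
    cpu_bound_work(n)
    elapsed = time.perf_counter() - start
    name = threading.current_thread().name
    results[name] = elapsed  # each thread writes a distinct key
    print(f"{name} took {elapsed:.3f} seconds")


def run_threads(num_threads: int = 3, n: int = 2_000_000) -> dict:
    results = {}
    # Names are passed explicitly so the output reads "Thread-1", etc.
    threads = [
        threading.Thread(target=timed_worker, args=(n, results), name=f"Thread-{i}")
        for i in range(1, num_threads + 1)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    timed_worker(2_000_000, {})  # single-threaded baseline on the main thread
    run_threads()
```

Comparing the baseline line with the three threaded lines shows the slowdown: the threads do not run faster together, they just interleave.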
Here are the results of running the same data in 3 threads with the threading module, compared with the same setup using 3 processes with the multiprocessing module.
Threading Data

Thread-3 took 22.3910000324 seconds
Thread-1 took 23.2190001011 seconds
Thread-2 took 38.8129999638 seconds
Thread-3 took 24.7969999313 seconds
Thread-1 took 26.375 seconds
Thread-2 took 35.2030000687 seconds
Thread-3 took 30.1089999676 seconds
Thread-1 took 29.375 seconds
Thread-1 took 24.109000206 seconds
Thread-3 took 26.5 seconds
Thread-2 took 36.0160000324 seconds
Thread-1 took 29.390999794 seconds
Thread-3 took 30.6720001698 seconds
Thread-2 took 32.5779998302 seconds
Thread-1 took 31.25 seconds
Thread-3 took 30.8439998627 seconds
Thread-2 took 32.0150001049 seconds
Thread-1 took 30.9220001698 seconds
Thread-3 took 30.6089999676 seconds
Thread-2 took 23.125 seconds
AVERAGE = 29.4 seconds

Multiprocessing Data

OrfDetection-2 took 6.65599989891 seconds
OrfDetection-1 took 12.4379999638 seconds
OrfDetection-3 took 12.4530000687 seconds
OrfDetection-2 took 6.43799996376 seconds
OrfDetection-2 took 6.375 seconds
OrfDetection-1 took 12.3589999676 seconds
OrfDetection-3 took 12.4070000648 seconds
OrfDetection-2 took 6.39099979401 seconds
OrfDetection-2 took 6.35900020599 seconds
OrfDetection-1 took 12.3280000687 seconds
OrfDetection-3 took 12.4059998989 seconds
OrfDetection-2 took 6.45399999619 seconds
OrfDetection-2 took 6.3900001049 seconds
OrfDetection-1 took 12.25 seconds
OrfDetection-3 took 12.2660000324 seconds
OrfDetection-2 took 6.43799996376 seconds
OrfDetection-2 took 6.42199993134 seconds
OrfDetection-1 took 12.3439998627 seconds
OrfDetection-3 took 12.2650001049 seconds
OrfDetection-2 took 6.15600013733 seconds
AVERAGE = 9.4 seconds

Besides the faster average run times, one other difference between the two implementations was that the application using threading tended to run at about 40% of CPU whereas the one using multiprocessing ran at 100% of CPU (each python process took about 33%).
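The switch to multiprocessing can be sketched the same way. `multiprocessing.Process` deliberately mirrors the `threading.Thread` API (`start()`, `join()`, `name`), so the harness barely changes. Each process runs its own interpreter with its own GIL, which is what lets the operating system spread the workers across cores. As before, `cpu_bound_work` is a hypothetical stand-in, and the process names only imitate the OrfDetection-N labels in the output above. Two details differ from a threaded version: results come back through a `multiprocessing.Queue` because processes do not share memory, and the `if __name__ == "__main__":` guard is required on platforms that start children with the spawn method (the default on Windows and macOS).

```python
import multiprocessing as mp
import time


def cpu_bound_work(n: int) -> int:
    # Hypothetical stand-in for ORF detection (pure-Python loop).
    total = 0
    for i in range(n):
        total += i * i % 7
    return total


def timed_worker(n: int, queue) -> None:
    start = time.perf_counter()
    cpu_bound_work(n)
    elapsed = time.perf_counter() - start
    name = mp.current_process().name
    queue.put((name, elapsed))  # send the timing back to the parent
    print(f"{name} took {elapsed:.3f} seconds")


def run_processes(num_procs: int = 3, n: int = 2_000_000) -> dict:
    queue = mp.Queue()
    procs = [
        mp.Process(target=timed_worker, args=(n, queue), name=f"OrfDetection-{i}")
        for i in range(1, num_procs + 1)
    ]
    for p in procs:
        p.start()
    # Drain the queue before joining so no child blocks on a full pipe.
    results = dict(queue.get() for _ in procs)
    for p in procs:
        p.join()
    return results


if __name__ == "__main__":
    run_processes()
```

The queue is drained before `join()` is called because the multiprocessing documentation warns that joining a process that still has items buffered in a queue can deadlock.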
I just noticed that in the multiprocessing implementation, OrfDetection-2 always took around 6 seconds, whereas OrfDetection-1 and OrfDetection-3 always took around 12 seconds, twice as long. Hmmmm. I wonder what that means. I'll have to investigate further; I expected each process to finish in around 6 seconds.
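One possible way to start that investigation, sketched here under assumptions: record both wall-clock time and CPU time inside each worker. `time.process_time()` counts only the CPU time the current process actually consumed (it excludes time spent sleeping or waiting), so if a 12-second worker reports about 6 seconds of CPU time, it spent the difference waiting (for a free core, a lock, or I/O) rather than doing twice the work. If its CPU time is also about 12 seconds, it really did more work. The `measure` and `classify` helpers and the 0.8 threshold are hypothetical illustrations, not part of the original application.

```python
import time


def measure(func, *args):
    """Run func(*args) and return (result, wall_seconds, cpu_seconds).

    time.process_time() counts CPU time consumed by the current process
    only, so for a single-threaded worker process it excludes time spent
    waiting rather than computing.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    result = func(*args)
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return result, wall, cpu


def classify(wall: float, cpu: float, threshold: float = 0.8) -> str:
    # Hypothetical rule of thumb: a process that was on the CPU for less
    # than `threshold` of its wall-clock time mostly waited.
    if wall <= 0:
        return "too fast to tell"
    return "busy" if cpu / wall >= threshold else "waiting"


if __name__ == "__main__":
    _, wall, cpu = measure(sum, range(5_000_000))
    print(f"wall={wall:.2f}s cpu={cpu:.2f}s -> {classify(wall, cpu)}")
```

Wrapping each worker's main call in `measure` and printing both numbers next to the existing "took N seconds" line would distinguish the two explanations without changing the rest of the program.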