Suppose we had to count the occurrences of every word across a million pages within one day (24 hours). How could we do that? Here is a possible design:
- Pick a dictionary that contains every word that exists, and count the words in it. Call this number word-count.
- Keep a file that lists the links to the million pages to be counted.
- Now run the program to count the occurrences of any one word across all the pages, and note the time it takes. From this we work out the number of threads, thread-count. If the run takes nearly a day, we plan for one thread per word, that is, word-count threads. If it takes more than a day, we give each word enough threads for the program to finish within a day. If it takes less than half a day, we use the same thread for two words, and so on. In this fashion we get the thread-count. In the meantime, note down the memory required too.
- Now look for a parallel programming system that can support word-count * thread-count processes running at the same time (like a CUDA system with an OMG number of cores assembled together!). And yes, by now we also know the amount of memory required.
- Every other constraint is to be suitably assumed, recorded, and added to the process (like the master machine that carries out all this, the program that assigns and calls the threads, etc.).
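
The thread-count step can be sketched roughly in Python. This is only an illustration under some loud assumptions: the function name `total_threads` and its parameters are made up for this example, the work for one word is assumed to split evenly across threads, and thread-count is read here as the overall number of threads needed for all words together.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def total_threads(word_count, seconds_per_word, deadline=SECONDS_PER_DAY):
    """Estimate how many threads are needed to finish within the deadline.

    word_count       -- number of words in the dictionary
    seconds_per_word -- measured time to count ONE word over all pages
                        on a single thread (must be > 0)
    Assumption: work divides evenly, with no coordination overhead.
    """
    if seconds_per_word <= 0:
        raise ValueError("seconds_per_word must be positive")

    if seconds_per_word >= deadline:
        # One word takes a day or more: give each word several threads.
        threads_per_word = math.ceil(seconds_per_word / deadline)
        return word_count * threads_per_word

    # One word takes less than a day: one thread can handle several words.
    words_per_thread = int(deadline // seconds_per_word)
    return math.ceil(word_count / words_per_thread)
```

For example, with 1000 words and a measured time of half a day per word, each thread can take two words, so `total_threads(1000, SECONDS_PER_DAY / 2)` comes to 500. Real runs would need some slack on top of this, since threads rarely share work perfectly.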