| Home | Trees | Indices | Help |
|
|---|
|
|
1 7 1435 54 5817 import multiprocessing 18 multiprocessing.freeze_support() 19 20 self.np = multiprocessing.cpu_count() 21 self.pool = multiprocessing.Pool(self.np*2)2261 pool = start() 62 next_i = 0 63 for i,n_i in enumerate(pool.imap(_sqr, xrange(57))): 64 #print i,n_i 65 assert i**2==n_i,"i**2=%d,n_i=%d"%(i**2,n_i) 66 assert i == next_i 67 next_i += 1 68 #return 69 next_i = 0 70 for i,n_i in pool.imap(_sqr, xrange(57), enumerated=True): 71 assert i**2==n_i,"i**2=%d,n_i=%d"%(i**2,n_i) 72 assert i == next_i 73 next_i += 1 74 for i,n_i in pool.imap(_sqr, xrange(57), 75 enumerated=True, ordered=False): 76 #print i,n_i 77 assert i**2==n_i,"i**2=%d,n_i=%d"%(i**2,n_i)78 79 if __name__ == "__main__": test() 80 81 # Design idea: partition kernel. 82 # Use cases: gemm, fem solvers, spin wave relaxation 83 # 84 # Split the problem across multiple nodes. Perform work on each node. 85 # Share values with neighbouring nodes. User needs to supply partition(), 86 # workunit() and share(). Kernel handles transfer and sync. Number 87 # of partitions can be larger than the number of workers. Want to 88 # maximize partition size to minimize boundary transfers. Want to 89 # minimize partition size to maximize load balancing. The former 90 # will dominate. Will work best for dynamic partition schemes, 91 # where computation of a partition may spawn many other partitions 92 # to achieve the desired accuracy. 93 # 94 # Note: we can implement a partition kernel on top of map-reduce 95 # by having the map assign the partitions to the queue and the 96 # reduce exchange data between the partitions. This will help 97 # turn our parallel machine from a batch system to a time-share 98 # system with little idle time on the nodes. 99 # 100 # Note: Data locality: prefer a work unit that uses the same data 101 # as already available on the node. Communication with master could 102 # kill the scheme, as could communication bandwidth. 103 104 # Design idea: dataflow kernel. 105 # Use cases: data reduction 106 # 107 # Inherently serial problems such as audio signal processing (first 108 # do one operation on a signal, then another, then another, ...) 109 # can be improved by putting the operations on separate CPUs, with 110 # different stages running in parallel. The delay will be fixed 111 # by the cost of executing each stage (plus overhead for communication 112 # between stages). The data rate will be limited by the cost of the 113 # most expensive stage. 114 # 115 # This pattern does not maximize system throughput, but it can 116 # help provide the user with live views of the data. Where the 117 # data is not live, the wall clock time for processing the signal 118 # can be reduced to the cost of each stage plus the cost of 119 # pushing the entire signal through the most expensive stage. In 120 # systems with a mix of dataflow and non-dataflow processes and 121 # fixed CPU allocations this can reduce process idle and make 122 # high priority jobs more responsive. 123 # 124 # Setup is pretty simple since it is just a list of stages with 125 # the expected amount of computation for each stage. The kernel 126 # can decide how best to distribute the stages between the cpus. 127 # 128 # Note: the infrastructure needs to pull output and push input. 129 # This allows the executive to control buffering. If the stages 130 # push output, then slow stages will need to take time to buffer 131 # input from fast stages slowing them further. 132 # 133 # Note: Need a scheme for mixing dataflow and map-reduce on a 134 # time-share machine. One strategy is to map workunits to all 135 # nodes so that they can run while the dataflow is blocked waiting 136 # for upstream or downstream processes. 137 # 138 # Note: checkpoint/restore can be done by sending a checkpoint 139 # message through the stream. Each stage will report its state 140 # when the checkpoint goes through. On restore the stage is 141 # prepared with its checkpoint value and the stream is restarted. 142 # This restore will incur another pipeline fill delay, but restores 143 # will be rare so this case does not need to be optimized, and 144 # the simplicity of the outlined strategy will dominate. 145 # 146 # Note: some stages might buffer the entire pipeline before 147 # doing their work. Such stages should be broken into a collection 148 # phase which gathers the data, doing what ever work it can and 149 # a production phase which takes the gathered data and emits it 150 # as it becomes available. The pipeline will be broken between 151 # collection and emission so that the cluster does not have keep 152 # the idle stages alive. 153 154 # Design idea: butterfly kernel. 155 # Use cases: sort, FFT. 156 # 157 # Sort is challenging. It can be seen as a dataflow stage which 158 # buffers the stream into runs, sorts the runs and emits them one 159 # by one. The following stage can merge the runs as they arrive. 160 # If the stream is long enough, another merge stage which works on 161 # the k*runs at a time can be added. That does mean the number of 162 # stages is not fixed in advance, hence the challenge. 163 # 164 # Google may already have a map-reduce based sort. 165 # 166 167 # Design idea: stochastic exchange kernel 168 # 169 # Use cases: genetic algorithms 170 # 171 # Simple implementation based on sequence of map-reduce cycles 172 173 # Design idea: precondition kernel 174 # Use cases: make 175 # 176 # Each work unit has a set of preconditions. When all preconditions are 177 # satisfied the work unit can be run. 178 # 179 # Note: could be used to implement a partition kernel, where the 180 # precondition for cycle n+1 is that the neighbours for cycle n 181 # have been calculated. 182 # 183 # Wants continuous map-reduce where new completion events arriving from 184 # the queue trigger new work units added to the queue. 185 # 186 # Scalability is not a problem if work units are expensive. If work units 187 # are cheap, then may need queue splitting to avoid queue contention. 188 # Reduce is a fundamental limitation. 189
| Home | Trees | Indices | Help |
|
|---|
| Generated by Epydoc 3.0.1 on Mon Mar 16 15:04:47 2009 | http://epydoc.sourceforge.net |