| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Cellular automata are finite deterministic machines that can model many complex systems. They have been proved to be successful in serving as an alternative framework for mathematical descriptions of physical systems. With the inspiration from the success of modeling physical systems we would as well like to provide cellular framework for computational systems also. It should be noted that this is preliminary overview of a much generic framework which is still in development and is subjected to change as improvements are adapted - and we gladly welcome constructive criticism, if any. Relevant source code can be found at [1]. What is Cellection Sort ?Cellection Sort is the result of the application of Cellection Framework for sorting problems. Cellection framework is a way of programming the programmable matter. Just the way one solves complex functions with the mechanisms of structured programming methods, cellection framework provides mechanisms for solving the problems using programmable matter methods. What distinguishes the programmable matter is the availability of a massive amount of fine-grained parallelism [3]. Cellection Sort, then, is the result of the application of programmable matter methods for sorting problems. What distinguishes this from other sorting mechanisms is the presence of data-independence along with the more important property - learning. This, and all other adaptations of Cellection Framework in general, use data-independent operations in their workings, and can "by heart" the computed results with the ability to recall and use them at any later point with full precision. So, what follows here?In the following, we examine a way of solving the sorting problem using data-independent operations. By data-independent we mean, the path of execution remains the same irrespective of the nature of the data the algorithm is working upon. The importance of data-independence need not be stressed twice, considering the fact that even the best algorithms turn out to be worst performers with mismatched data. Quick sort is one of the best examples - while it proves to be optimal on average, it performs worst for inversely sorted data (if not taken enough care in pivot selection). This is true for all data-dependent algorithms - they perform well only when they are presented with the data having the properties they are expecting. Any slight variation in the nature of data leads to large variations in the path of execution. This property of data-sensitivity poses a critical restriction on the applicability of algorithms. While we have many algorithms for solving the problem of sorting, not all of them are independent of data-sensitivity and so cannot be used as general purpose sorters. No matter how many times we have answered it before, we always encounter the problem once more- which algorithm should I use to sort this data? Good, what is the solution then?Cellection sort, just like cellection framework, operates on the relations among the data rather than the data itself. It accepts a relation table as input and produces an indexed list as output. For a problem of sorting n-elements, the relation table would be a two-dimensional array of size n x n, with the produced output list being a single dimensional array of size n. A cell [i,j] in the input table indicates the relation between ith and jth data elements, and could be written as relation( i, j ); It should be noted that the table need not be symmetric. That is, relation( i, j ) need not be equal to relation( j, i ). Typically the values of these relations would be {=,>,<,#}. Since the input data elements are treated as numbers (in case of strings, the individual characters can be treated as equivalent ASCII-numbers), the relation between any two data elements should be from one of these four mentioned relations. In fact, the relation set {=,>,<} should suffice. However, we use the fourth relation {#} to indicate the don't care condition, the significance of which would be made clear as we progress further. Equipped with these four relations, we now could turn any data-dependent sorting algorithm into data-independent one. In the following, we use the odd-even transportation sorting as the basis to prove this. In fact, any other sorting algorithm can also be used as the base to achieve the corresponding data-independent algorithm in similar fashion. Here our purpose in choosing the odd-even transportation sorting is - it is easier to demonstrate and suits best for explanation.
Figure1: Odd-even transportation sorting network (n=6) The conventional odd-even transportation sort works in phases. In even phases, elements at even positions compare their values with their next neighbors and exchange with them if necessary. In odd phases, elements at odd positions would take up the onus of comparing their values with their next neighbors and exchanging with them if appropriate. A total of n phases would require to sort the data of n elements. Figure1 presents an example with n=6. The algorithm for this conventional method would look like below. void Odd-EvenTransportationSort(int nArray[], int nCount) { for(int i = 0; i < nCount; i++) for(int j = i%2; j < nCount-1; j+=2) if(nArray[j] > nArray[j+1]) Swap(nArray+j, nArray+j+1); } As per this above code snippet, it is clear that the algorithm directly operates on the input data elements supplied in nArray[]. Depending on even phase or odd phase, determined by checking if i is even or odd, the elements nArray[j] and nArray[j+1] are compared and exchanged as necessary. Controlled by i, this process is carried out as many times as the number of elements in the array, given by nCount. The modifications that we make to this conventional method are as follows. First we create a pool of relation tables and corresponding sorted results for a reasonable value of n, and would use that pool of pre-computed values as the basis for new computations. Towards this extent we also avoid operating directly on the input data elements. Instead we build a corresponding relation table for the given data elements and use it to index into the pre-computed pool of relation table - sorted result pairs. Typically the values obtained from such indexing would once again need to be de-indexed into data elements before they could be actually used afterwards. The process of pre-computation is as follows. For a data consisting of n elements, the relation table generated would be of size n x n. Now, recall that each cell in the table represents a relation between the actual data elements and so can take any of the four values {=,>,<,#}. With two bits to represent the four relations {00 =,01 >,10 <,11 #} a table with n x n cells requires a storage of 22 x n x n units. However, we could reduce this by understanding that the lower part of the table (including the main diagonal) can be ignored without loosing any information. The remaining upper part then would be of size n x (n-1)/2; which requires a storage of 2n x (n-1) units. For example, when n=4, the size of the relation table would be n x n = 16, while the size of the upper triangle would be 4 x 3 / 2 = 6; This means, the table of 16 cells (requiring 232 units) can be equivalently represented with just 6 cells (requiring 212 units) without loosing any information. The following presents a relation table for some pseudo data elements { 56, 23, 45, 11, 78 }. Observe that the grayed region could be discarded without loosing any information.
Thus, for a data with n elements we can have utmost 2n x (n-1)/2 different relation tables. Each relation table would lead to one sorted result. Typically the sorted result would be a list of indices pointing to the actual data elements in the sorted order. For example, the above table would produce a sorted result = {3,1,2,0,4}, which when de-indexed gives us {11,23,45,56,78}; Note that we have used 0-based indexing for the sorted result. Being purely based on the indices, these sorted results are independent of the actual data elements they are sorting, and hence can be used to evaluate the sorted output of any n data elements satisfying the underlying relation (i.e.. they should produce the same relation table). Consider a pre-computation involving n=4 data elements. Among the 16 cells of the n x n relation table, we just need the upper 6 cells. By varying the relation in each cell we could get different sorted result. In total we can have utmost 26 sorted results. We could actually find these sorted results by taking any hypothetical 4-element data and applying any existing sorting algorithm such as insertion or selection sort. These sorted results would look like, ........ By representing each relational operator with two bits, as mentioned before, the above would look like the following. ........ Now, with these results in place, given any 4-element data we could give the instantaneous sorted order for them just with a single lookup into this SortedResult[] array. For example, for data = {28,13,22,-8} the sorted result could be produced instantaneously by using the lookup SortedResult[010101100101] = {3,1,2,0}; which when de-indexed gives us {-8,13,22,28}, which is the required output (Note that, the data in question produces 010101100101 in its relation table, which we have used as the index into the SortedResult[] array). Similarly any 4-element data can be sorted instantaneously with single array lookup. Reason behind choosing 4-data elements for pre-computation is - memory limitation. With 4 elements the SortedResult[] need to hold 212 listed indices, with each list composed of 4 indices, this amounts to 4096 x 4 x 2 = 32768 bits. With n=5 this would become 220 x 5 x 3 bits. As we increase the n value further the storage requirements would go beyond obvious limits. For most practical purposes, n=4 should suffice. However, when the size of data elements is less than 4, we could use dummy values to make it up 4. For these dummy values, none of the three relations {=,>,<} are valid. We need to use the fourth relation {#}, the don't care relation for them. In such cases these don't care dummy values would not be treated for sorting, and hence would be passed onto output as they are, without changing their positions. This facilitates us in discarding the dummy values easily when we need to extract the actual data from the produced output. On the other hand, when we need to sort more than 4-elements, we could achieve that by sorting 4-elements at a time and merging the results. The Odd-even transportation sort when modified to bring this effect would look like the following (with n =8).
Figure2: Modified Odd-even transportation sorting network (n=8) The modified algorithm would now look like the following: void CellectionSort(int nArray[],int nCount,SortedResult Results[]) { int nSortedOutput[4]; for(int i=0; i<nCount; i++) for(int j=i%4; j<nCount-3; j+=4) { int nSortedResultIndex = RelationTable(nArray[j],nArray[j+1], nArray[j+2],nArray[j+3]); nSortedOutput[0] = nArray[j+Results[nSortedResultIndex].Index[0]]; nSortedOutput[1] = nArray[j+Results[nSortedResultIndex].Index[1]]; nSortedOutput[2] = nArray[j+Results[nSortedResultIndex].Index[2]]; nSortedOutput[3] = nArray[j+Results[nSortedResultIndex].Index[3]]; nArray[j] = nSortedOutput[0]; nArray[j+1] = nSortedOutput[1]; nArray[j+2] = nSortedOutput[2]; nArray[j+3] = nSortedOutput[3]; } } From the above code it is clear that the algorithm depends highly on the relation table in its computations. It uses the relation table entries as the index into the pre-computed SortedResult[] array. A block of 4 elements are sorted in unit time. To Sort n elements in this method we require O(n2) time. Then Why Cellection Sort?At this point the typical questions that one might be thinking are: What is the significance of all this? How is this better than other sorting algorithms, if at all better? The obvious answers, among others, are:
With this we conclude this article by inviting the problem solvers to explore this framework as part of their problem solving mechanisms. We hope we could soon realize many yet not discovered potentials of automated systems along with the new paths of utilizing them in building more advanced mechanisms of computations. Hmmm...Anything else?
By P.GopalaKrishna http://www.geocities.com/krishnapg
Submitted: 02/03/2004 Article content copyright © P. GopalaKrishna, 2004.
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
All content copyright © 1998-2007, Generation5 unless otherwise noted.
- Privacy Policy - Legal - Terms of Use -