Delete Redundant
Blanks
A Discussion using APL
In this talk I will discuss some functions for "deleting redundant blanks". I will use "standard" (ISO) APL and "extended" APL. This is a useful exercise to show different algorithms for a particular problem, how the same algorithm is expressed in different versions of APL, and performance differences between algorithms.
First, a definition: "deleting redundant blanks" typically refers to a function ...
r„db v
... where v is a character vector with an arbitrary number of blanks, and r is the same vector with leading and trailing blanks removed, and multiple internal blanks replaced by a single blank,
For example,
showblank db ' a b c '
a.b.c
(The function "showblank", which replaces a blank with a period, may be used to highlight the positions of the blanks.)
A function for "deleting redundant blanks" is very useful. The requirement for deleting blanks occurs in many applications, such as token processing and text analysis. (An interesting aside - at least one APL interpreter had this algorithm defined as a quad function! And another - the SmartArrays object software has a character method for the vector class to remove redundant blanks. See the [ SmartArrays Documentation Manual ].)
[ Back to top ]
The following APL functions (using the specified methods) will be discussed:
dbloop | loop |
dbrl | rotate left |
dbrlo | rotate left (optimized) |
dbrlw | rotate left (words) |
dbshiftleft | shift left |
dbrr | rotate right |
dbrro | rotate right (optimized |
dbrrw | rotate right (words) |
dbshiftright | shift right |
dbscan | scan |
dbss | substring search |
An algorithm for removing redundant blanks can be quite simply stated: take each character in turn, and if the character is blank, and its neighbour to the right (i.e. the character with the next highest index) is blank, do not include the character in the result. Most of the algorithms are variations on this theme.
dbloop
’ r„dbloop v;Œio;b;i;n
[1] © delete blanks using loop
[2] ©.t 1998.6.22.23.2.24
[3] Œio„1
[4] © catenate blank front and back
[5] v„' ',v,' '
[6] b„v=' '
[7] n„(½v)-1
[8] i„0
[9] r„¼0
[10] l10:
[11] i„i+1
[12] …(n<i)/l10end
[13] © if x[i] and x[i+1] are blank, do not keep x[i]
[14] …(b[i]^b[i+1])/l10
[15] r„r,v[i]
[16] …l10
[17] l10end:
[18] © remove blank at front
[19] r„1‡r
’
dbrl
’ r„dbrl v;b
[1] © delete blanks by rotating left
[2] ©.t 1998.6.22.23.2.23
[3] b„v=' '
[4] r„1‡((1,b) 1²1,b)/' ',v
’
dbrlo
’ r„dbrlo v;b;bv
[1] © delete blanks by rotating left (optimized)
[2] ©.t 1998.6.24.23.43.35
[3] b„' '=bv„' ',v
[4] r„1‡(b 1²b)/bv
’
dbrlw
’ r„x Nand y
[1] r„x y
’
’ r„x Rotate y
[1] r„x²y
’
’ r„x Drop y
[1] r„x‡y
’
’ r„dbrlw v;b
[1] © delete blanks by rotating left (words, not glyphs)
[2] ©.t 1998.6.22.23.2.24
[3] b„v=' '
[4] r„1 Drop((1,b) Nand 1 Rotate 1,b)/' ',v
’
dbshiftleft
’ r„dbshiftleft v;b
[1] © delete blanks by shifting left (equivalent to rotating left)
[2] ©.t 1998.6.22.23.2.24
[3] b„v=' '
[4] r„1‡((1,b) b,1)/' ',v
’
dbrr
’ r„dbrr v;b
[1] © delete blanks by rotating right
[2] ©.t 1998.6.22.23.2.24
[3] b„v=' '
[4] r„¯1‡((b,1) ¯1²b,1)/v,' '
’
dbrro
’ r„dbrro v;b;vb
[1] © delete blanks by rotating right (optimized)
[2] ©.t 1998.6.24.23.48.0
[3] b„' '=vb„v,' '
[4] r„¯1‡(b ¯1²b)/vb
’
dbrrw
’ r„dbrrw v;b
[1] © delete blanks by rotating right (words, not glyphs)
[2] ©.t 1998.6.22.23.2.24
[3] b„v=' '
[4] r„¯1 Drop((b,1) Nand ¯1 Rotate b,1)/v,' '
’
dbshiftright
’ r„dbshiftright v;b
[1] © delete blanks by shifting right (equivalent to rotating right)
[2] ©.t 1998.6.22.23.2.24
[3] b„v=' '
[4] r„¯1‡((b,1) 1,b)/v,' '
’
dbscan
’ r„dbscan v
[1] © delete blanks using "less than" scan
[2] ©.t 1998.6.22.22.35.13
[3] r„v¬' '
[4] r„(rŸ1²r><\r)/v
’
dbss
’ r„dbss v
[1] © delete blanks by string search for double blank
[2] ©.t 1998.6.22.23.2.24
[3] v„' ',v,' '
[4] r„1‡¯1‡(~' 'ºv)/v
’
[ Back to top ]
’ r„showblank y
[1] © show the positions of blank in vector <y
[2] ©.t 1998.7.12.0.4.45
[3] r„y
[4] r[(r=' ')/¼½r]„'.'
’
cvector„' a b c '
showblank cvector
.a..b..c...
showblank ''
showblank dbloop cvector
a.b.c
showblank dbloop ''
showblank dbrl cvector
a.b.c
showblank dbrl ''
showblank dbscan cvector
a.b.c
showblank dbscan ''
showblank dbss cvector
a.b.c
showblank dbss ''
[ Back to top ]
0.00274 y„dbrl cvector
0.00274 y„dbshiftleft cvector
0.00275 y„dbrlo cvector
0.00275 y„dbrro cvector
0.00275 y„dbscan cvector
0.00275 y„dbshiftright cvector
0.00329 y„dbrr cvector
0.00384 y„dbrlw cvector
0.00384 y„dbrrw cvector
0.00550 y„dbss cvector
1.64619 y„dbloop cvector
[ Back to top ]
This concludes the discussion of "deleting redundant blanks", using several algorithms, and several formulations, in various versions of APL. Similarities and differences were noted.
I have always had an interest in the problem of deleting redundant blanks, even though it might be considered an "overly simple" function to analyze in depth. Perhaps because when I first learned APL and saw the array solution, I found it quite mysterious. Since then, it continues to appeal to me for several reasons. there are several subtleties and edge conditions, in spite of its apparent simplicity, it is interesting to note the various approaches for even a simple problem; it can be used to illustrate various aspects of the APL language without the distraction of an otherwise complex algorithm, and it remains a nice example of an "array-oriented" solution.
[ Back to top ]
Notes from a talk given by Richard Levine to a meeting of the Toronto APL
Special Interest Group ... June 23, 1998.
Article written ... 29 July 1998
Revised ... 24 September 2003
Thanks to ... Keith Smillie for his comments on the article; Rex Swain for his contribution of algorithm "dbscan".
[ Back to top ]