Delete Redundant Blanks
A Discussion using APL

Back to Topics in APL


Contents

Introduction

In this talk I will discuss some functions for "deleting redundant blanks". I will use "standard" (ISO) APL and "extended" APL. This is a useful exercise to show different algorithms for a particular problem, how the same algorithm is expressed in different versions of APL, and performance differences between algorithms.

First, a definition: "deleting redundant blanks" typically refers to a function ...

r„db v

... where v is a character vector with an arbitrary number of blanks, and r is the same vector with leading and trailing blanks removed, and multiple internal blanks replaced by a single blank,

For example,

showblank db ' a b c '

a.b.c

(The function "showblank", which replaces a blank with a period, may be used to highlight the positions of the blanks.)

A function for "deleting redundant blanks" is very useful. The requirement for deleting blanks occurs in many applications, such as token processing and text analysis. (An interesting aside - at least one APL interpreter had this algorithm defined as a quad function! And another - the SmartArrays object software has a character method for the vector class to remove redundant blanks. See the [ SmartArrays Documentation Manual ].)

[ Back to top ]

Discussion

The following APL functions (using the specified methods) will be discussed:

dbloop loop
dbrl rotate left
dbrlo rotate left (optimized)
dbrlw rotate left (words)
dbshiftleft shift left
dbrr rotate right
dbrro rotate right (optimized
dbrrw rotate right (words)
dbshiftright shift right
dbscan scan
dbss substring search

An algorithm for removing redundant blanks can be quite simply stated: take each character in turn, and if the character is blank, and its neighbour to the right (i.e. the character with the next highest index) is blank, do not include the character in the result. Most of the algorithms are variations on this theme.

dbloop

’ r„dbloop v;Œio;b;i;n
[1] © delete blanks using loop
[2] ©.t 1998.6.22.23.2.24
[3] Œio„1
[4] © catenate blank front and back
[5] v„' ',v,' '
[6] b„v=' '
[7] n„(½v)-1
[8] i„0
[9] r„¼0
[10] l10:
[11] i„i+1
[12] …(n<i)/l10end
[13] © if x[i] and x[i+1] are blank, do not keep x[i]
[14] …(b[i]^b[i+1])/l10
[15] r„r,v[i]
[16] …l10
[17] l10end:
[18] © remove blank at front
[19] r„1‡r

dbrl

’ r„dbrl v;b
[1] © delete blanks by rotating left
[2] ©.t 1998.6.22.23.2.23
[3] b„v=' '
[4] r„1‡((1,b) 1²1,b)/' ',v

dbrlo

’ r„dbrlo v;b;bv
[1] © delete blanks by rotating left (optimized)
[2] ©.t 1998.6.24.23.43.35
[3] b„' '=bv„' ',v
[4] r„1‡(b 1²b)/bv

dbrlw

’ r„x Nand y
[1] r„x y


’ r„x Rotate y
[1] r„x²y


’ r„x Drop y
[1] r„x‡y


’ r„dbrlw v;b
[1] © delete blanks by rotating left (words, not glyphs)
[2] ©.t 1998.6.22.23.2.24
[3] b„v=' '
[4] r„1 Drop((1,b) Nand 1 Rotate 1,b)/' ',v

dbshiftleft

’ r„dbshiftleft v;b
[1] © delete blanks by shifting left (equivalent to rotating left)
[2] ©.t 1998.6.22.23.2.24
[3] b„v=' '
[4] r„1‡((1,b) b,1)/' ',v

dbrr

’ r„dbrr v;b
[1] © delete blanks by rotating right
[2] ©.t 1998.6.22.23.2.24
[3] b„v=' '
[4] r„¯1‡((b,1) ¯1²b,1)/v,' '

dbrro

’ r„dbrro v;b;vb
[1] © delete blanks by rotating right (optimized)
[2] ©.t 1998.6.24.23.48.0
[3] b„' '=vb„v,' '
[4] r„¯1‡(b ¯1²b)/vb

dbrrw

’ r„dbrrw v;b
[1] © delete blanks by rotating right (words, not glyphs)
[2] ©.t 1998.6.22.23.2.24
[3] b„v=' '
[4] r„¯1 Drop((b,1) Nand ¯1 Rotate b,1)/v,' '

dbshiftright

’ r„dbshiftright v;b
[1] © delete blanks by shifting right (equivalent to rotating right)
[2] ©.t 1998.6.22.23.2.24
[3] b„v=' '
[4] r„¯1‡((b,1) 1,b)/v,' '

dbscan

’ r„dbscan v
[1] © delete blanks using "less than" scan
[2] ©.t 1998.6.22.22.35.13
[3] r„v¬' '
[4] r„(rŸ1²r><\r)/v

dbss

’ r„dbss v
[1] © delete blanks by string search for double blank
[2] ©.t 1998.6.22.23.2.24
[3] v„' ',v,' '
[4] r„1‡¯1‡(~' 'ºv)/v

[ Back to top ]

Examples

’ r„showblank y
[1] © show the positions of blank in vector <y
[2] ©.t 1998.7.12.0.4.45
[3] r„y
[4] r[(r=' ')/¼½r]„'.'

cvector„' a b c '
showblank cvector
.a..b..c...
showblank ''

showblank dbloop cvector
a.b.c
showblank dbloop ''

showblank dbrl cvector
a.b.c
showblank dbrl ''

showblank dbscan cvector
a.b.c
showblank dbscan ''

showblank dbss cvector
a.b.c
showblank dbss ''

[ Back to top ]

Performance

0.00274 y„dbrl cvector
0.00274 y„dbshiftleft cvector
0.00275 y„dbrlo cvector
0.00275 y„dbrro cvector
0.00275 y„dbscan cvector
0.00275 y„dbshiftright cvector
0.00329 y„dbrr cvector
0.00384 y„dbrlw cvector
0.00384 y„dbrrw cvector
0.00550 y„dbss cvector
1.64619 y„dbloop cvector

[ Back to top ]

Concluding Remarks

This concludes the discussion of "deleting redundant blanks", using several algorithms, and several formulations, in various versions of APL. Similarities and differences were noted.

I have always had an interest in the problem of deleting redundant blanks, even though it might be considered an "overly simple" function to analyze in depth. Perhaps because when I first learned APL and saw the array solution, I found it quite mysterious. Since then, it continues to appeal to me for several reasons. there are several subtleties and edge conditions, in spite of its apparent simplicity, it is interesting to note the various approaches for even a simple problem; it can be used to illustrate various aspects of the APL language without the distraction of an otherwise complex algorithm, and it remains a nice example of an "array-oriented" solution.

[ Back to top ]

Acknowledgements and References

Notes from a talk given by Richard Levine to a meeting of the Toronto APL Special Interest Group ... June 23, 1998.
Article written ... 29 July 1998
Revised ... 24 September 2003

Thanks to ... Keith Smillie for his comments on the article; Rex Swain for his contribution of algorithm "dbscan".


[ Back to top ]