Delete Redundant Blanks (using J)

Back to Topics in J


Contents

Introduction

In this talk I will discuss some functions (programs) in the J programming language for "deleting redundant blanks". This is a useful exercise to show different algorithms for a particular problem, and performance differences between algorithms. It will also no doubt be interesting to those of you in this APL audience who are eager to discover what J is all about - I hope you are not disappointed! [Editor's note - unedited remarks from the original talk - this was delivered back in 1998]

First, a definition: "deleting redundant blanks" typically refers to a function ...

r =: db v

... where v is a character vector with an arbitrary number of blanks, and r is the same vector with leading and trailing blanks removed, and multiple contiguous internal blanks replaced by a single blank,

For example,

db ' a b c '

a b c

showblank db ' a b c '

a.b.c

(The function "showblank", which replaces a blank with a period, is used to highlight the positions of the blanks.)

A function for "deleting redundant blanks" is very useful in many applications, for example, token processing and text analysis.

[ Back to top ]

Discussion

The following J functions (using the specified methods) will be presented:

dbloop loop
dbrl rotate left
dbrlw rotate left (words)
dbss substring search
dbssw substring search (words)
dbssw1 substring search (words)
dbtacit substring search (tacit formulation)

First, we introduce some definitions (synonyms) which will be useful in explaining the alogorithms using J.

Nand =: *:
Nor =: +:
Not =: -.
Take =: {.
Drop =: }.
Rotate =: |. NB. dyad

dbloop

NB. Delete redundant blanks in vector y.
NB. loop method
NB. Catenate a blank front and back
NB. Loop character by character through vector
NB. Keep character if ...
NB. character is blank and neighbour is not blank
NB. or character is not blank
dbloop =: 3 : 0
v =. ' ', y. ,' '
b =. v=' '
n =. (#v)-1
i =. _1
z =. ''
while.
NB. Do loop while index is less than n
NB. i will go from 0 to n-1 (remember origin 0)
NB. This is from the 1-st to the n-th character in v
NB. There are n+1 characters in v
i =. i+1
i < n
do.
if.
NB. Nand returns true if any argument is false
(i{b) Nand (i+1){b
do.
z =. z,i{v
end.
end.
NB. Above loop did not keep last character (a blank) so don't consider it
NB. However, remove first character of z (a blank)
1 Drop z
)

dbrl

dbrl =: 3 : 0
b =. y. = ' '
1 }. ((1,b) *: 1 |. 1,b)#' ', y.
)

dbrlw =: 3 : 0
b =. y. = ' '
1 Drop ((1,b) Nand 1 Rotate 1,b)#' ', y.
)

dbss

dbss =: 3 : 0
v =. ' ', y. ,' '
1 }. _1 }. ( -. ' ' E. v) # v
)

dbssw =: 3 : 0
v =. ' ', y. ,' '
1 Drop _1 Drop ( Not ' ' E. v) # v
)

NB. Behead means to remove from head of a list
Behead =: }. NB. }. y is 1 Drop y

NB. Curtail means to remove from the tail (end) of a list
Curtail =: }: NB. }: y is _1 Drop y

dbssw1 =: 3 : 0
v =. ' ', y. ,' '
Behead Curtail ( Not ' ' E. v) # v
)

dbtacit

dbtacit =: Behead @ Curtail @ (Not @ (' ' & E.) # [) @ (' '"_ , [ , ' '"_ )

[ Back to top ]

Examples

showblank =. 3 : 0
'.' ((y. = ' ')#i. # y.) } y.
)

showblank cvector =: ' a b c '
.a..b...c..
showblank ''


showblank dbloop cvector
a.b.c
showblank dbloop ''


showblank dbrl cvector
a.b.c
showblank dbrl ''


showblank dbss cvector
a.b.c
showblank dbss ''


showblank dbtacit cvector
a.b.c
showblank dbtacit ''

[ Back to top ]

Performance

0.00033 dbtacit cvector
0.00039 dbss cvector
0.00044 dbrl cvector
0.00044 dbssw cvector
0.00044 dbssw1 cvector
0.00049 dbrlw cvector
0.14775 dbloop cvector

[ Back to top ]

Concluding Remarks

This concludes the discussion of "deleting redundant blanks", using several algorithms, and several formulations, in J. Similarities and differences were noted. The technique of using "word synonyms" was demonstrated in J.

[ Back to top ]

Afterword - Observations on Comparing J with APL

This afterword was written as a handout for distribution after the talk, to draw attention to some points of comparison between APL and J.

For the most part the J functions are almost "word for word" translations into J of similar APL functions. However, the function "dbtacit" uses "tacit formulation" which is not available in most versions of APL.

Substituting words for the glyphs emphasizes the strong relationship between the J algorithm and the corresponding APL algorithm. Defining words for "glyphs" is also supported in APL, but in J the definition process is simpler.

The algorithm, namely, rotate vector one character to left and perform "nand", appears the same in J as in APL. The function "dbrlw" which uses word synonyms shows this quite nicely.

E. is equivalent to "epsilon underbar" in Sharp APL. This is again shown when the word synonyms are used instead of the J "glyphs" and the function dbss is compared as an APL function and as a J function.

J has a large number of primitive functions (larger than any version of APL). This is done using a large number of pairs consisting of an ASCII character followed by a period or colon. A large number of special characters would be required for an APL-type language to provide the same number of primitives.

The examples in this article may be compared with similar examples in a [ companion article in APL ] about deleting redundant blanks.

[ Back to top ]

Acknowledgements and References

Notes from a talk given by Richard Levine to a meeting of the Toronto APL Special Interest Group, June 23, 1998.
Article written ... 29 July 1998
Revised ... 30 September 2003

Thanks to ... Keith Smillie for his comments on organizing the material, and his encouragement.


[ Back to top ]