Delete Redundant Blanks
(using J)
In this talk I will discuss some functions (programs) in the J programming language for "deleting redundant blanks". This is a useful exercise to show different algorithms for a particular problem, and performance differences between algorithms. It will also no doubt be interesting to those of you in this APL audience who are eager to discover what J is all about - I hope you are not disappointed! [Editor's note - unedited remarks from the original talk - this was delivered back in 1998]
First, a definition: "deleting redundant blanks" typically refers to a function ...
r =: db v
... where v is a character vector with an arbitrary number of blanks, and r is the same vector with leading and trailing blanks removed, and multiple contiguous internal blanks replaced by a single blank,
For example,
db ' a b c '
a b c
showblank db ' a b c '
a.b.c
(The function "showblank", which replaces a blank with a period, is used to highlight the positions of the blanks.)
A function for "deleting redundant blanks" is very useful in many applications, for example, token processing and text analysis.
[ Back to top ]
The following J functions (using the specified methods) will be presented:
dbloop | loop |
dbrl | rotate left |
dbrlw | rotate left (words) |
dbss | substring search |
dbssw | substring search (words) |
dbssw1 | substring search (words) |
dbtacit | substring search (tacit formulation) |
First, we introduce some definitions (synonyms) which will be useful in explaining the alogorithms using J.
Nand =: *:
Nor =: +:
Not =: -.
Take =: {.
Drop =: }.
Rotate =: |. NB. dyad
dbloop
NB. Delete redundant blanks in vector y.
NB. loop method
NB. Catenate a blank front and back
NB. Loop character by character through vector
NB. Keep character if ...
NB. character is blank and neighbour is not blank
NB. or character is not blank
dbloop =: 3 : 0
v =. ' ', y. ,' '
b =. v=' '
n =. (#v)-1
i =. _1
z =. ''
while.
NB. Do loop while index is less than n
NB. i will go from 0 to n-1 (remember origin 0)
NB. This is from the 1-st to the n-th character in v
NB. There are n+1 characters in v
i =. i+1
i < n
do.
if.
NB. Nand returns true if any argument is false
(i{b) Nand (i+1){b
do.
z =. z,i{v
end.
end.
NB. Above loop did not keep last character (a blank) so don't consider it
NB. However, remove first character of z (a blank)
1 Drop z
)
dbrl
dbrl =: 3 : 0
b =. y. = ' '
1 }. ((1,b) *: 1 |. 1,b)#' ', y.
)
dbrlw =: 3 : 0
b =. y. = ' '
1 Drop ((1,b) Nand 1 Rotate 1,b)#' ', y.
)
dbss
dbss =: 3 : 0
v =. ' ', y. ,' '
1 }. _1 }. ( -. ' ' E. v) # v
)
dbssw =: 3 : 0
v =. ' ', y. ,' '
1 Drop _1 Drop ( Not ' ' E. v) # v
)
NB. Behead means to remove from head of a list
Behead =: }. NB. }. y is 1 Drop y
NB. Curtail means to remove from the tail (end) of a list
Curtail =: }: NB. }: y is _1 Drop y
dbssw1 =: 3 : 0
v =. ' ', y. ,' '
Behead Curtail ( Not ' ' E. v) # v
)
dbtacit
dbtacit =: Behead @ Curtail @ (Not @ (' ' & E.) # [) @ (' '"_ , [ , ' '"_ )
[ Back to top ]
showblank =. 3 : 0
'.' ((y. = ' ')#i. # y.) } y.
)
showblank cvector =: ' a b c '
.a..b...c..
showblank ''
showblank dbloop cvector
a.b.c
showblank dbloop ''
showblank dbrl cvector
a.b.c
showblank dbrl ''
showblank dbss cvector
a.b.c
showblank dbss ''
showblank dbtacit cvector
a.b.c
showblank dbtacit ''
[ Back to top ]
0.00033 dbtacit cvector
0.00039 dbss cvector
0.00044 dbrl cvector
0.00044 dbssw cvector
0.00044 dbssw1 cvector
0.00049 dbrlw cvector
0.14775 dbloop cvector
[ Back to top ]
This concludes the discussion of "deleting redundant blanks", using several algorithms, and several formulations, in J. Similarities and differences were noted. The technique of using "word synonyms" was demonstrated in J.
[ Back to top ]
This afterword was written as a handout for distribution after the talk, to draw attention to some points of comparison between APL and J.
For the most part the J functions are almost "word for word" translations into J of similar APL functions. However, the function "dbtacit" uses "tacit formulation" which is not available in most versions of APL.
Substituting words for the glyphs emphasizes the strong relationship between the J algorithm and the corresponding APL algorithm. Defining words for "glyphs" is also supported in APL, but in J the definition process is simpler.
The algorithm, namely, rotate vector one character to left and perform "nand", appears the same in J as in APL. The function "dbrlw" which uses word synonyms shows this quite nicely.
E. is equivalent to "epsilon underbar" in Sharp APL. This is again shown when the word synonyms are used instead of the J "glyphs" and the function dbss is compared as an APL function and as a J function.
J has a large number of primitive functions (larger than any version of APL). This is done using a large number of pairs consisting of an ASCII character followed by a period or colon. A large number of special characters would be required for an APL-type language to provide the same number of primitives.
The examples in this article may be compared with similar examples in a [ companion article in APL ] about deleting redundant blanks.
[ Back to top ]
Notes from a talk given by Richard Levine to a meeting of the Toronto APL
Special Interest Group, June 23, 1998.
Article written ... 29 July 1998
Revised ... 30 September 2003
Thanks to ... Keith Smillie for his comments on organizing the material, and his encouragement.
[ Back to top ]