A Complete Catalog
@Michael Berry's answer is very clear and concise. A sterling example of the J idiom table (f"0/~
). I love it because it demonstrates how the subtle design of J has permitted us to generalize and extend a concept familiar to anyone from 3rd grade: arithmetic tables (addition tables, +/~ i. 10
and multiplication tables */~ i.12
¹), which even in APL were relatively clunky.
In addition to that fine answer, it's also worth noting that there is a primitive built into J to calculate the Cartesian product, the monad {
.
For example:
> { 2 # <1 0 _1 NB. Or i:_1 instead of 1 0 _1
1 1
1 0
1 _1
0 1
0 0
0 _1
_1 1
_1 0
_1 _1
Taking Inventory
Note that the input to monad {
is a list of boxes, and the number of boxes in that list determines the number of elements in each combination. A list of two boxes produces an array of 2-tuples, a list of 3 boxes produces an array of 3-tuples, and so on.
A Tad Excessive
Given that full outer products (Cartesian products) are so expensive (O(n^m)
), it occurs to one to ask: why does J have a primitive for this?
A similar misgiving arises when we inspect monad {
's output: why is it boxed? Boxes are used in J when, and only when, we want to consolidate arrays of incompatible types and shapes. But all the results of { y
will have identical types and shapes, by the very definition of {
.
So, what gives? Well, it turns out these two issues are related, and justified, once we understand why the monad {
was introduced in the first place.
I'm Feeling Ambivalent About This
We must recall that all verbs in J are ambivalent perforce. J's grammar does not admit a verb which is only a monad, or only a dyad. Certainly, one valence or another might have an empty domain (i.e. no valid inputs, like monad E.
or dyad ~.
or either valence of [:
), but it still exists.
An valence with an empty domain is "real", but its range of valid inputs is empty (an extension of the idea that the range of valid inputs to e.g. +
is numbers, and anything else, like characters, produces a "domain error").
Ok, fine, so all verbs have two valences, so what?
A Selected History
Well, one of the primary design goals Ken Iverson had for J, after long experience with APL, was ditching the bracket notation for array indexing (e.g. A[3 4;5 6;2]
), and recognizing that selection from an array is a function.
This was an enormous insight, with a serious impact on both the design and use of the language, which unfortunately I don't have space to get into here.
And since all functions need a name, we had to give one to the selection function. All primitive verbs in J are spelled with either a glyph, an inflected glyph (in my head, the .
, :
, .:
etc suffixes are diacritics), or an inflected alphanumeric.
Now, because selection is so common and fundamental to array-oriented programming, it was given some prime real estate (a mark of distinction in J's orthography), a single-character glyph: {
².
So, since {
was defined to be selection, and selection is of course dyadic (i.e having two arguments: the indices and the array), that accounts for the dyad {
. And now we see why it's important to note that all verbs are ambivalent.
I Think I'm Picking Up On A Theme
When designing the language, it would be nice to give the monad {
some thematic relationship to "selection"; having the two valences of a verb be thematically linked is a common pattern in J, for elegance and mnemonic purposes.
That broad pattern is also a topic worthy of a separate discussion, but for now let's focus on why catalog / Cartesian product was chosen for monad {
. What's the connection? And what accounts for the other quirk, that its results are always boxed?
Bracketectomy
Well, remember that {
was introduced to replace -- replace completely -- the old bracketing subscript syntax of APL and (and many other programming languages and notations). This at once made selection easier, more useful, and also simplified J's syntax: in APL, the grammar, and consequently parser, had to have special rules for indexing like:
A[3 4;5 6;2]
The syntax was an anomaly. But boy, wasn't it useful and expressive from the programmer's perspective, huh?
But why is that? What accounts for the multi-dimensional bracketing notation's economy? How is it that we can say so much in such little space?
Well, let's look at what we're saying. In the expression above A[3 4;5 6;2]
, we're asking for the 3rd and 4th rows, the 5th and 6th columns, and the 2nd plane.
That is, we want
- plane 2, row 3, column 5, and
plane 2, row 3, column 6, and
plane 2, row 4, column 5 and
- plane 2, row 4, column 6
Think about that a second. I'll wait.
The Moment Ken Blew Your Mind
Boom, right?
Indexing is a Cartesian product.
Always has been. But Ken saw it.
So, now, instead of saying A[3 4;5 6;2]
in APL (with some hand-waving about whether []IO
is 1
or 0
), in J we say:
(3 4;5 6;2) { A
which is, of course, just shorthand, or syntactic sugar, for:
idx =. { 3 4;5 6;2 NB. Monad {
idx { A NB. Dyad {
So we retained the familiar, convenient, and suggestive semicolon syntax (what do you want to bet link being spelled ;
is also not a coincidence?) while getting all the benefits of turning {
into a first-class function, as it always should have been³.
Opening The Mystery Box
Which brings us back to that other, final, quibble. Why the heck are monad {
's results boxed, if they're all regular in type and shape? Isn't that superfluous and inconvenient?
Well, yes, but remember that an unboxed, i.e. numeric, LHA in x { y
only selects items from y
.
This is convenient because it's a frequent need to select the same item multiple times (e.g. in replacing 'abc'
with 'ABC'
and defaulting any non-abc character to '?'
, we'd typically say ('abc' i. y) { 'ABC','?'
, but that only works because we're allowed to select index 4, which is '?'
, multiple times).
But that convenience precludes using straight numeric arrays to also do multidimensional indexing. That is, the convenience of unboxed numbers to select items (most common use case) interferes with also using unboxed numeric arrays to express, e.g. A[17;3;8]
by 17 3 8 { A
. We can't have it both ways.
So we needed some other notation to express multi-dimensional selections, and since dyad {
has left-rank 0 (precisely because of the foregoing), and a single, atomic box can encapsulate an arbitrary structure, boxes were the perfect candidate.
So, to express A[17;3;8]
, instead of 17 3 8 { A
, we simply say (< 17;3;8) { A
, which again is straighforward, convenient, and familiar, and allows us to do any number of multi-dimensional selections simultaneously e.g. ( (< 17;3;8) , (<42; 7; 2) { A
), which is what you'd want and expect in an array-oriented language.
Which means, of course, that in order to produce the kinds of outputs that dyad {
expects as inputs, monad {
must produce boxes⁴. QED.
Oh, and PS: since, as I said, boxing permits arbitrary structure in a single atom, what happens if we don't box a box, or even a list of a boxes, but box a boxed box? Well, have you ever wanted a way to say "I want every index except the last" or 3rd, or 42nd and 55th? Well...
Footnotes:
¹ Note that in the arithmetic tables +/~ i.10
and */~ i.12
, we can elide the explicit "0
(present in ,"0/~ _1 0 1
) because arithmetic verbs are already scalar (obviously)
² But why was selection given that specific glyph, {
?
Well, Ken intentionally never disclosed the specific mnemonic choices used in J's orthography, because he didn't want to dictate such a personal choice for his users, but to me, Dan, {
looks like a little funnel pointing right-to-left. That is, a big stream of data on the right, and a much smaller stream coming out the left, like a faucet dripping.
Similarly, I've always seen dyad |:
like a little coffee table or Stonehenge trilithon kicked over on its side, i.e. transposed.
And monad #
is clearly mnemonic (count, tally, number of items), but the dyad was always suggestive to me because it looked like a little net, keeping the items of interest and letting everything else "fall through".
But, of course, YMMV. Which is precisely why Ken never spelled this out for us.
³ Did you also notice that while in APL the indices, which are control data, are listed to the right of the array, whereas in J they're now on the left, where control data belong?
⁴ Though this Jer would still like to see monad {
produce unboxed results, at the cost of some additional complexity within the J engine, i.e. at the expense of the single implementer, and to the benefit of every single user of the language
n There is a lot of interesting literature which goes into this material in more detail, but unfortunately I do not have time now to dig it up. If there's enough interest, I may come back and edit the answer with references later. For now, it's worth reading Mastering J, an early paper on J by one of the luminaries of J, Donald McIntyre, which makes mention of the eschewing of the "anomalous bracket notation" of APL, and perhaps a tl;dr version of this answer I personally posted to the J forums in 2014.