Every once in a while I get irritated by the edges in programming languages. One notorious edge is the absence or presence of a method on an object. The method is either there or it isn't, and if you guess wrong you have a program that simply doesn't run.
In some programming languages, there is a way around this. You can hook into the runtime to generate a method on an object when it is called but doesn't yet exist. Of course, that leaves us with the decision of what method to generate. And, I suspect that some of you reading this are wondering now whether this is some sort of rabbit hole - what problem am I trying to solve? In most cases, the fact that a method doesn't exist is important. We shouldn't gloss over it. But, sometimes we can gain advantage when we do.
Consider finder methods. You have some criterion you want to use to find an object. Maybe some unique id. You call your finder and you have to deal with the possibility that the thing you are looking for doesn't exist. Most of the time this means that you are obligated to use a null check in your language, but that's messy and people can easily forget to do it. Some functional programming languages get around this by using a Maybe or Option type, but there's another way to punt. You can make your finder always return a collection. If the thing you are looking for is not found, you receive an empty collection. "Okay," I can hear you say, "that is just passing the buck. You have to check later." The fact of the matter is that you may not have to. You could use a map to perform the same operation on each of the elements, provided there is no chance of having more than one.
finder(criterion).map(&:run)
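To make that concrete, here is a minimal Ruby sketch of the idea; the `TaskStore` class and its data are hypothetical, invented just for illustration:

```ruby
# Hypothetical in-memory store. find_all always returns an Array:
# empty when nothing matches, never nil.
class TaskStore
  def initialize(tasks)
    @tasks = tasks
  end

  def find_all(id)
    @tasks.select { |t| t[:id] == id }
  end
end

store = TaskStore.new([{ id: 1, name: "ship" }])
store.find_all(1).map { |t| t[:name] }   # => ["ship"]
store.find_all(99).map { |t| t[:name] }  # => [] -- the "not found" case needs no nil check
```

The map on the last line runs zero times for a miss and once for a hit, so the caller never branches on presence.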
This makes me wonder about something. What if we were able to treat values as collections when we are programming?
Consider this in a dynamic language.
x = 4
When we type that, we expect that x[0] will yield an error, and we expect that x[1][59][3] would yield an error also. But what if they didn't? What if each of those references returned 4? What if values could be treated as infinite collections of themselves in programming? Could that be a creative way of erasing an edge?
Consider this in a Haskell-ish language.
map (+1) [1,2,3]
We know that would yield [2,3,4].
Okay, how about this?
map (+1) 3
Couldn't that evaluate to 4?
With an operation like map, this is easy. What about fold (codenamed reduce or inject in some languages)?
If we did something like [0,1,2].reduce(:+) in Ruby, we'd expect the sum.
What should we expect for this?
2.reduce(:+)
The sensible answer is: 2. A case could be made for infinity, as it is the sum of an infinite list of 2s, but I think we can just say that it evaluates to 2 and maintain consistency.
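As a rough sketch of these semantics in today's Ruby (the helper names `as_collection`, `map_over`, and `reduce_over` are made up for illustration):

```ruby
# Treat any non-enumerable value as a collection of one.
def as_collection(value)
  value.is_a?(Enumerable) ? value : [value]
end

# Mapping over a scalar yields a scalar; over a collection, a collection.
def map_over(value, &block)
  result = as_collection(value).map(&block)
  value.is_a?(Enumerable) ? result : result.first
end

# Reducing a scalar reduces its one-element collection, yielding the scalar.
def reduce_over(value, op)
  as_collection(value).reduce(op)
end

map_over(3) { |x| x + 1 }          # => 4
map_over([1, 2, 3]) { |x| x + 1 }  # => [2, 3, 4]
reduce_over(2, :+)                 # => 2
reduce_over([0, 1, 2], :+)         # => 3
```

Note that Ruby's own `[2].reduce(:+)` already evaluates to 2, which matches the "sensible answer" above.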
Maybe there is an inconsistency in this and it all falls down. But I hope that it works and that there is an efficient way to merge scalar values and collections of the same value. I suspect I'm just reinventing APL in conventional programming languages.
Should not 2.reduce(:+) return a partial application?
Posted by: Malk'Zameth | March 18, 2013 at 02:22 AM
Malk: Can you say more about that?
Posted by: Michael Feathers | March 18, 2013 at 02:28 AM
Interesting idea that reminds me of a dilemma I was facing a little while ago in Ruby, to do with designing an API. I wanted to retrieve stored objects with zero or more named properties and zero or more conditions:
store.retrieve('Account', :name) #retrieve objects of type Account populating just the name field
store.retrieve('Account', :name, limit: 1) #retrieve first object of type Account populating just the name field
store.retrieve('Account', [:name, :last_contact_date], {tier: 'Gold', country: 'US'}) #retrieve US, Gold-tier objects of type Account populating name and last_contact_date fields
This feels very Rubyish: a single method with three parameters, the last two of which are optional (no field names means all fields, and no conditions means an unconstrained retrieval). The optional parameters may be a single thing or a list of things.
Basically this doesn't work without some nasty code (if is_a? or if kind_of?) to distinguish whether an optional argument is a single value or a collection (and, in the case of a Hash, it's actually both). Asking around for opinions, the responses I got were mixed:
* Have different names for the methods to distinguish between a single field and multiple fields. Nicely intention-explicit but a complex API rather than a single method
* Always pass in a collection (e.g. [:name]). Simple code, but the intention is less clear (do I mean :name or a collection with :name in it?) and the onus is on the caller of the method to do extra work to make the provider of the method's life easier.
* Write a little language for the query so there is always one parameter, e.g. retrieve('Account', "Name, LastContactDate where Tier = 'Gold' and Country = 'US'"). Nice, but the point of the API is to make retrieval a lot more declarative, and writing a robust parser would require more, and more complex, code than writing the API as envisaged with a bit of type checking.
Someone pointed out to me that the explode (*) operator always returns a collection so the if is_a? Array code can be replaced with *property_names which will always be a collection even if property_names is passed in as a single value. But it seems to me a language where a thing is also a collection of itself (I'd go for single value rather than infinite but suspect there are philosophical and practical problems with both) would make life a lot simpler. JQuery has some aspect of this.
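The splat trick mentioned above can be shown in a couple of lines (the variable names here are purely illustrative):

```ruby
single = :name
many   = [:name, :last_contact_date]

# The splat operator normalizes either shape to an Array,
# so the provider never needs an is_a?(Array) check.
[*single]  # => [:name]
[*many]    # => [:name, :last_contact_date]
```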
Posted by: Pauldyson.wordpress.com | March 18, 2013 at 02:49 AM
How about some kind of implicit conversion feature? You might have seen something like this in Scala, and type theorists have been studying more powerful systems with dependent types under the name of "Coercive Subtyping". Here's the type theory view
Basically, we add an inference rule:

f : B -> C    x : A    c : A sub B
----------------------------------
        f(x) = f(c x) : C

and declare certain functions as coercions A sub B (from A to B).
So, if we want values to act like collections, we can declare (\x -> [x]) as a coercion and hence (sum 2) now works.
What about (map sum 2)? Do we want the coercion to be applied twice? And what about this coercion's interaction with other coercions? (We have answers to some of these questions.)
Does this translate to a dynamically typed OO setting? Maybe - it might boil down to "if we can't find method M in class/proto C or its ancestors then look in different class D"... That might be fun to try.
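A rough sketch of that dynamic version in Ruby, assuming we retry a missing method on a singleton array of the value (monkey-patching `Numeric` like this is purely for illustration, not a recommendation):

```ruby
# When a method is missing on a number, retry it on [number] --
# i.e. "look in different class D" where D is Array.
class Numeric
  def method_missing(name, *args, &block)
    boxed = [self]
    boxed.respond_to?(name) ? boxed.public_send(name, *args, &block) : super
  end

  def respond_to_missing?(name, include_private = false)
    [self].respond_to?(name) || super
  end
end

2.reduce(:+)         # => 2
3.map { |x| x + 1 }  # => [4]
```

This works because Integer defines neither reduce nor map, so the calls fall through to method_missing and are answered by Array.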
Posted by: Paulcc_two | March 18, 2013 at 03:25 AM
It does remove an edge, but it introduces another one.
The new edge becomes obvious when you do your trick with an object that is already a collection.
You have two rules
x[0] returns x
vs
x[0] returns the first element of x which is != x
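The clash is easy to see in plain Ruby once the value is itself a collection (this snippet only illustrates the conflict between the two rules):

```ruby
x = [10, 20]
x[0]  # => 10, the first element -- not x itself
# Under "a value is a collection of itself", x[0] would have to be
# [10, 20]; the two rules cannot both hold when x is a collection.
```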
So you just replaced the edge with another one.
If you actually disallow null and use Option, you really do remove an edge from your code, or at least turn it into an explicit step.
Posted by: Jens Schauder | March 18, 2013 at 03:25 AM
Two places I can think of as having done this (at least to an extent):
- jQuery (and similar) selectors/objects, where $('#foo') will return a collection even though your base js method would be getElementById. It's a good example that "finders" don't have to return a scalar that then necessitates a null check.
- PowerShell pipelines, although they have the same issue as Jens mentions when you feed in a scalar that is enumerable, and on assignment will collapse to $null, scalar, collection (necessitating the workaround of wrapping with @() when you want to ensure the result is a collection)
You could certainly construct programs that kept everything as enumerators/generators/etc, but how many places would go from "missing null check" to "missing empty check"? For LINQ usage, you could certainly just replace your First(OrDefault)/Single(OrDefault) usages with Take(1) and see how things went from there. :)
My knee-jerk reaction is you'd end up with code that's more difficult to reason about, even if it increased brevity with removed null checks - however, given how much working code there is leveraging jQuery selectors, maybe that's either wrong, or small enough to be worth the trade off. :)
Posted by: James Manning | March 18, 2013 at 03:58 AM
Re. comment from Jens - the ambiguity only arises if your implicit rule is allowed to apply everywhere. That's why approaches like coercive subtyping only trigger when the value type does not match the expected type - i.e., it uses both the source type and the target type, not just the source type alone.
Posted by: Paulcc_two | March 18, 2013 at 04:02 AM
Perhaps there are two concerns here?
1. Caller-side concern: Scalar vs Collection stuff (regarding which your suggestions seem very sensible), and the disposition of the caller to the resulting value (is it an error for there to be no returned value?)
2. Callee-side concern: The question of whether calling a method means "invoke function X on Y" or, the more adventurous, "send message X to Y" (a la Smalltalk). If the former, a function must exist for the operation to be valid. If the latter, a message receiver only need handle the message when appropriate.
Posted by: Andrew Gibson | March 18, 2013 at 04:08 AM
Michael:
trying to say more about that:
Let's just consider, for the sake of generality, that '+' is a function, so I shall use +(1, 2, 3) instead of 1+2+3. My argument would be the same if '+' were an operator; it's just that I think the example is even more useful with functions.
so +(1) is just a partial application
let +1 = +(1)
if I understood you correctly
Collections should have the current behaviour of most languages
map (+1) [1,2,3] → [+1(1), +1(2), +1(3)] → [2, 3, 4]
map (+1) [3] → [+1(3)] → [4]
map (+1) [] → []
A scalar should behave the same way as a collection with a single element, except that the result should remain scalar
map (+1) 3 → +1(3) → 4
A null should behave like the empty collection
map (+1) ∅ → ∅
IF that understanding is correct, I DO wholeheartedly agree with you. In my eyes this would be making the Maybe monad the default behaviour with normal language syntax. I think it is a sensible default behaviour; we could have specific syntax for when we want to get nitpicky.
Well if I follow that same logic for reduce, in my humble eyes:
[].reduce(:+) → []
∅.reduce(:+) → ∅
[1, 2, 3].reduce(:+) → +(+(1,2),3) → 6
and
[2].reduce(:+) → +(2)
Which is just a single argument passed to the + function, whose arity is bigger than 1, so it is a partial application (like our +1 up there)
and hence
2.reduce(:+) → +(2)
which is the same thing
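In Ruby, that partial-application reading can be sketched with curry (the lambda `plus` is illustrative, standing in for the :+ above):

```ruby
plus = ->(a, b) { a + b }
partial = plus.curry[2]  # the "+(2)" above: + applied to 2, waiting for more
partial[3]               # => 5
```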
What is your opinion on this?
Posted by: Malk'Zameth | March 18, 2013 at 04:20 AM
My post was based on the "what" and "how" and now I realize that maybe you just wanted "why"!
so: to add a why
For me, that choice respects the semantics of the applied function. We could use a reduce with a starting element, say:
[1, 2].reduce(0, +) → +(+(0,1),2) → 3
Such a reduce is made to accept lists of a single element
We have chosen a reduce whose behaviour is to start with the two first elements of a collection and reduce from there, applying a function that needs at least two arguments but whose partial application is useful (as shown by your usage of it in map)
So it makes sense in that choice to say "we only have part of our elements lets use what we have and this would be a function accepting others"
when I think of it like that, one could also argue that
2.reduce(:+) becomes another way of saying
reduce :+ 2
and hence a partial application of reduce itself
the returned function would reduce anything passed to it as argument using the + function and using 2 as its initial element.
I have mixed feelings about which choice would be the sanest default
Posted by: Malk'Zameth | March 18, 2013 at 04:54 AM
Actually, if you want to, you can make Ruby do this pretty easily:
class Object
  include Enumerable

  # Iterating over any object yields just the object itself.
  def each
    yield self
  end

  # Indexing any object returns the object itself.
  def [](_)
    self
  end
end
However, I can't remember when I last wrote a method that could either take a singular object or an array of them (other than via varargs), so this would not benefit me much.
Posted by: Apohorecki | March 18, 2013 at 04:55 AM
At first this sounded like a great idea - an isomorphism between 2 and the list [unit, unit]. You could use a foreach loop over a number and get a while loop! Then you kept talking, and it sounded worse and worse and worse.
This kind of "It's a type error to do such and so now, but we could make up a rule for what happens instead, and people would only have to remember the rule!" (e.g. "a number behaves like a singleton list of that number, except when it behaves like an infinite stream of that number" - how would you even distinguish 1 from 2 in this scenario?) violates the continuity and symmetries of a language.
There was a long elaborate discussion in the logic community of "tonk", a connective with an introduction rule like Or and an elimination rule like And. The point is, there are symmetries and continuities in logic which are violated by adding tonk, and the general impression that when you are defining a new thing, you can define it to have whatever rules you want is wrong.
http://www.logicandlanguage.net/archives/2005/03/tonk_and_local_1.html
Posted by: Johnicholas | March 18, 2013 at 05:12 AM
Programming is hard enough without complicating reasoning about types for the sake of minor convenience.
I would sooner see functions like map and reduce accept scalars and treat them as collections-of-one than see the entire system try to coerce scalars to collections-of-one. Look at the nonsense Javascript does to coerce strings to numbers and the confusion that causes. I'd like not to encourage more of that.
Posted by: J. B. Rainsberger | March 18, 2013 at 07:53 AM
from what I understand, R does this (scalars are not really scalars) without any macroscopic issue
> a <- 1
> b <- 1:3
> class(a)
[1] "numeric"
> class(b)
[1] "numeric"
> a+b
[1] 2 3 4
> sum(a)
[1] 1
> sum(b)
[1] 6
> for(e in a) print(e)
[1] 1
> for(e in b) print(e)
[1] 1
[1] 2
[1] 3
Posted by: gabriele renzi | March 18, 2013 at 04:16 PM
PowerShell v3 allows scalars to be treated as collections:
$foo = 7
$foo.Length # 1 element
$foo[0] # returns 7
And the reverse
$bar = @( [DateTime]::Now , [DateTime]::Today )
$bar.AddDays(1) # normally a method on the scalar, applies to all items in the array.
Posted by: Jason Stangroome | March 19, 2013 at 01:11 PM
Seems to work in Python
import numpy as np
a, b, c = 1, np.arange(2), np.arange(4).reshape((2, 2))
for x in (a * 2, b * 2, c * 2): print(x)
gives
2
[0 2]
[[0 2]
[4 6]]
Posted by: Peter Williams | April 07, 2013 at 01:24 AM
Old mathematicians' trick: "Singleton sets are identified with their sole element", or something like it, is in the preface of many mathematics books ...
Posted by: H.M.Müller | November 11, 2013 at 07:21 AM