Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extending Base.stack for DimArrays #645

Open
wants to merge 15 commits into
base: main
Choose a base branch
from

Conversation

brendanjohnharris
Copy link

@brendanjohnharris brendanjohnharris commented Feb 23, 2024

Adds methods for Base.stack, and related non-exported functions from Base, that are compatible with DimArrays.

Syntax follows Base: stacking DimArrays along a given axis dims creates a new dimension. However, existing dimension data is preserved, and the new dimension becomes an AnonDim.

Optionally, a Dimension dim can be provided as the first argument to stack, in which case the new dimension is assigned as dim rather than AnonDim.

@codecov-commenter
Copy link

codecov-commenter commented Feb 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.99%. Comparing base (9846bfe) to head (884ab68).
Report is 41 commits behind head on main.

Current head 884ab68 differs from pull request most recent head 2c7b8b9

Please upload reports for the commit 2c7b8b9 to get more accurate results.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #645      +/-   ##
==========================================
+ Coverage   83.83%   83.99%   +0.15%     
==========================================
  Files          45       45              
  Lines        4102     4136      +34     
==========================================
+ Hits         3439     3474      +35     
+ Misses        663      662       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -131,15 +131,15 @@ end
"""
function Base.eachslice(A::AbstractDimArray; dims)
dimtuple = _astuple(dims)
if !(dimtuple == ())
if !(dimtuple == ())
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be nice to remove all these whitespace changes so we can see the real changes

end

newdims = first(origdims)
newdims = ntuple(d -> d == newdim ? AnonDim() : newdims[d-(d>newdim)], length(newdims) + 1)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe make this a do block so its easier to read


To fix for `AbstractDimArray`, pass new lookup values as `cat(As...; dims=$D(newlookupvals))` keyword or `dims=$D()` for empty `NoLookup`.
"""

function Base._typed_stack(::Colon, ::Type{T}, ::Type{S}, A, Aax=_iterator_axes(A)) where {T,S<:AbstractDimArray}
origdims = dims.(A)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
origdims = dims.(A)
origdims = map(dims, A)
DimArray(_A, newdims)
end

function Base.stack(dim::Dimension, A::AbstractVector{<:AbstractDimArray}; dims=nothing, kwargs...)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whats the idea with dim as the first argument?

The tests also don't hit this code.

@brendanjohnharris
Copy link
Author

Thanks for the suggestions! Went through and cleaned up the PR.

The Base.stack(dim, X; dims) method lets you supply a Dimension to assign to the newly created array dimension; not strictly necessary, but more convenient than setting or rebuilding the resulting array to overwrite the new AnonDim (is an AnonDim a suitable way to treat a newly created dimension, as I've done here?).

Let me know if you have more thoughts on this, happy to work on it further :)

@brendanjohnharris brendanjohnharris marked this pull request as ready for review February 27, 2024 05:01
true
```
"""
function Base.stack(dim::Dimension, A::AbstractVector{<:AbstractDimArray}; dims=nothing, kwargs...)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not pass dim in dims?

This method signature is a bit strange , we usually try not to add varients on Base methods, we just allow dims to specify named dimensions.

@rafaqz
Copy link
Owner

rafaqz commented Feb 27, 2024

We try not to change base method signatures besides allowing dims to be named. To keep the mental model very simple.

I would just put that new Dimension in dims like cat does. If its a constructed dimension it replaces the existing one. Otherwise Type/Symbol etc just choose them. Just check d isa Dimension

You will also need to format(newdims, A) afterwards.

@brendanjohnharris
Copy link
Author

Base.stack behaves differently than Base.cat in that it always creates a new dimension (i.e. Base.stack([A, B,...]; dims=2) inserts a dimension at position 2, increasing ndims(A) by 1); in general this new dimension is unrelated to the existing array dimensions, so we need to specify both position (dims) and a Dimension (dim).

I see your point about keeping new signatures to a minimum, though. How about allowing the keyword argument dims to be assigned as an 'Integer=>Dimension' pair, with the following behavior:

  1. If dims isa Integer: The new dimension is an AnonDim at position dims
  2. If dims isa Dimension: The new dimension is a DImension dims at position ndims(A)+1
  3. If dims isa Pair{Integer, Dimension}: The new dimension is a Dimension last(dims) at position first(dims)

We could also, instead, have that the new dimension is always an AnonDim, leaving it up to the user to set that dimension at a later time; in that case, however, there is ambiguity if there are any existing AnonDims in the input arrays.

@rafaqz
Copy link
Owner

rafaqz commented Feb 28, 2024

Using a Pair sounds like a good solution (as do points 1 and 2 as simpler cases). For 2 I guess putting it as the last dimension would be the majority use-case? (I rarely use stack)

Always AnonDim will require using set all the time with AnonDim => somedim. We may as well just put that pair in stack and skip the step.

true
```
"""
function Base.stack(A::AbstractVector{<:AbstractDimArray}; dims=Pair(ndims(A[1]), AnonDim()), kwargs...)
Copy link
Owner

@rafaqz rafaqz Mar 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
function Base.stack(A::AbstractVector{<:AbstractDimArray}; dims=Pair(ndims(A[1]), AnonDim()), kwargs...)
function Base.stack(A::AbstractVector{<:AbstractDimArray}; dims=Pair(ndims(A[1]) + 1, AnonDim()), kwargs...)

Isn't the default ndims(A) + 1 ?

Comment on lines 623 to 624
dims isa Integer && (dims = dims => AnonDim())
dims isa Dimension && (dims = ndims(A[1])+1 => dims)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
dims isa Integer && (dims = dims => AnonDim())
dims isa Dimension && (dims = ndims(A[1])+1 => dims)
if dims isa Integer
dims = dims => AnonDim())
elseif dims isa Dimension
dims = ndims(A[1])+1 => dims)
end

Assigning after && is best avoided as the assignment is hard to see when skimming code

end
newdims = format(newdims, B)

B = rebuild(B; dims=newdims)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
B = rebuild(B; dims=newdims)
B = rebuild(B; dims=format(newdims, B))

User specified dims are usually incomplete and possibly incorrect

@@ -547,6 +547,100 @@ $message on dimension $D.
To fix for `AbstractDimArray`, pass new lookup values as `cat(As...; dims=$D(newlookupvals))` keyword or `dims=$D()` for empty `NoLookup`.
"""

function Base._typed_stack(::Colon, ::Type{T}, ::Type{S}, A, Aax=_iterator_axes(A)) where {T,S<:AbstractDimArray}
Copy link
Owner

@rafaqz rafaqz Mar 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How stable do you think these methods are... could we add a method to Base.stack instead? What do we gain from touching these internals?

I know we use internals elsewhere, but we should stop:
#522

end

newdims = first(origdims)
newdims = ntuple(length(newdims) + 1) do d
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this type-stable?

src/array/methods.jl Outdated Show resolved Hide resolved
src/array/methods.jl Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
3 participants