Span – An abstraction over all types of memory available to .NET programs (github.com/dotnet)
103 points by gokhan on Oct 15, 2016 | 17 comments



Cool. Interesting value-add inclusion.

I wrote a FileSpanStream class to implement BitTorrent file writes in BTSharp years ago.

While it's not targeted toward in-memory implementations, it is a similar type of wrapper abstraction and might be well understood by developers familiar with the "Span" abstraction (esp. after the Span<T> generic is in the BCL).

I can imagine scenarios where the FileSpanStream could be leveraged for non-BitTorrent use cases (max-filesize constraints, load distribution, and so on; use your imagination).

I'm on mobile right now, but I'll follow up with a license-compatible contribution for consideration for inclusion.

P.S. Keep up the good work .NET Core Team. I've been keeping an eye on the project since you've been providing cross-platform distributions and it's extremely exciting. I loved working at Microsoft (especially when the .NET team with Brad Abrams was publishing things like .NET library design guidelines). I miss the .NET that I loved working with in 2004 and the .NET Core project is bringing me back one design at a time. Cheers.


The main use cases are actually way more mundane, but also much more impactful. Consider some of the most common string manipulation code you see: grab a substring, compare it to some stuff, branch based on results.

Since strings are immutable and non-sharing, all substring calls will create copies of the substring. With Span<T>, you could instead simply request a span of the string and, with unification in the underlying typing, you can perform all of your string manipulation with no allocations or copying.
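
As a rough analogy only, here is the same "view instead of copy" idea sketched in Java, since the Span<T> API was still being designed when this was written and no C# call is assumed here. `CharBuffer.wrap` is a real JDK method that gives a read-only view over part of a string; the class name `ViewDemo`, the helper `startsWithVerb`, and the sample request line are made up for illustration.

```java
import java.nio.CharBuffer;

final class ViewDemo {
    // Hypothetical helper: does the first space-delimited token equal `verb`?
    // CharBuffer.wrap(text, start, end) creates a read-only view of the
    // characters in [start, end); the characters themselves are not copied.
    static boolean startsWithVerb(String line, String verb) {
        int space = line.indexOf(' ');
        if (space < 0) {
            return false; // no token boundary found
        }
        CharSequence firstToken = CharBuffer.wrap(line, 0, space);
        // contentEquals compares character by character against the view.
        return verb.contentEquals(firstToken);
    }

    public static void main(String[] args) {
        String line = "GET /index.html HTTP/1.1";
        System.out.println(startsWithVerb(line, "GET"));
        System.out.println(startsWithVerb(line, "POST"));
    }
}
```

The grab-a-substring, compare, branch pattern stays the same; only the substring copy goes away. In Java the view object is still a small heap allocation, which is the part a stack-only type would avoid.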


I'm all aboard the existing use cases for Span<T>; that's what I meant by "interesting value-add inclusion." Not sure I get your comment.


Oh, just clarifying for people that it's not just unusual apps with large blocks of memory that will benefit. Even mundane apps that do a little string manipulation will see the results.


I wonder how much string manipulation you'd have to be doing to see a benefit. I still can't use C# 6 everywhere at work because some teams insist on staying on VS2010 / 2012, so I'd be pretty worried about going to C# 7.


A lot, I guess. Famously, Java's String worked that way for a long time (sharing the buffer for sub-strings and only storing offset and length), until they changed it, I think in Java 7, to what .NET currently does (copying the sub-string).

Both approaches have benefits and drawbacks, but copying the sub-string is apparently best for the vast majority of cases, and you only benefit from the shared buffer in select cases (it's also a great memory leak opportunity if you don't know how it's implemented). So I guess it makes sense for .NET to add the capability (in a more general form that also works for other things) instead of only having one or the other.
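
To make the trade-off concrete, here is a minimal Java sketch of the two strategies. It is not the actual JDK source: `SubstringSharingDemo`, `SharedSlice`, and `retainedChars` are hypothetical names, and the class only models a string that stores a backing array plus offset and length.

```java
import java.util.Arrays;

final class SubstringSharingDemo {
    // Hypothetical stand-in for a string that shares its parent's buffer.
    static final class SharedSlice {
        final char[] buffer; // reference to the ENTIRE backing array
        final int offset;
        final int length;

        SharedSlice(char[] buffer, int offset, int length) {
            this.buffer = buffer;
            this.offset = offset;
            this.length = length;
        }

        // O(1): no characters are copied, only the bounds change.
        SharedSlice slice(int start, int end) {
            return new SharedSlice(buffer, offset + start, end - start);
        }

        // O(n): copies just the visible characters into a right-sized array.
        SharedSlice copy() {
            char[] own = Arrays.copyOfRange(buffer, offset, offset + length);
            return new SharedSlice(own, 0, own.length);
        }

        // Number of chars kept reachable by holding on to this object.
        int retainedChars() {
            return buffer.length;
        }

        @Override
        public String toString() {
            return new String(buffer, offset, length);
        }
    }

    public static void main(String[] args) {
        char[] big = new char[1_000_000];
        Arrays.fill(big, 'x');
        "id42".getChars(0, 4, big, 0); // put a short ID at the front

        SharedSlice shared = new SharedSlice(big, 0, big.length).slice(0, 4);
        SharedSlice copied = shared.copy();

        System.out.println(shared + " retains " + shared.retainedChars() + " chars");
        System.out.println(copied + " retains " + copied.retainedChars() + " chars");
    }
}
```

Both slices read as the same four characters, but the shared one keeps the whole million-char array reachable for as long as it is referenced, while the copy only keeps its own four.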


The problem with sharing is that you can inadvertently keep a massive string alive by only holding on to a tiny piece of it.

Span<T> conveniently sidesteps this issue by only existing on the stack, meaning that you can't stash away the span somewhere and accidentally keep the underlying buffer alive longer than necessary.


Interesting idea, but most interesting to me is the stack-only restriction. I'm curious how they're going to enforce that, and whether that property will be limited to Span or can be used with other types.


It seems that it will be CLR-enforced, so I would guess that it's not going to be available for just any type:

  The fast representation makes the type automatically stack-only, i.e. the constraint 
  will be enforced by CLR type loader. This restriction should also be enforced by 
  managed language compilers and/or analyzers for better developer experience. For 
  the slow span, language compiler checks and/or analyzers is the only option (as 
  the runtimes won't enforce the stack-only restriction).


Right, ultimately it will be enforced by the CLR type system, much the way most types are in .NET, but practically it will be enforced by the compiler.

You can already see this in the "ref parameters" feature in C# today: they can be parameters to methods but cannot be stored in fields. This implies that they can only exist on the stack.

Similarly, when we add support for ref-locals and ref-returns in C# 7 that will still disallow ref fields, so ref variables will still only be allowed on the stack.


Not an answer, but an additional argument for making this stack only: it avoids the problem Java's substring operator had. It used to be O(1), with the resulting string sharing its data buffer with the underlying string.

The problem with that is that a small substring of a larger string prevents the data buffer of the larger string from being garbage collected.

That happened a lot when parsing files. Let's say you read a 1 GB, 10M row CSV file with a small string ID and 10 integers on each line. The strings should take maybe 100MB, but they will take 1GB. Oops.

The behaviour of the substring function changed to copy the string data in JDK 7, after quite a bit of deliberation. There was a nice writeup of the results somewhere, but I can't find it.


Awesome!

Just last month I developed an algorithm to solve performance issues related to traversing a large graph. The algorithm required an array of hundreds of millions of structs and performance was very important (as one might assume when using a lot of RAM!). Allocating on the native heap worked like a charm!

My solution was too coupled to the specific problem I was working on, so I'll be looking to use Span<T> instead as soon as I have time to convert.


Not having to copy big arrays when calling unmanaged C DLLs from .NET will be great for performance. I wonder if this is going to open up new security holes though.


You can already pass big arrays to unmanaged code without copying just fine: allocate a GCHandle of type Pinned, and pass AddrOfPinnedObject as the pointer.

The use case for Span is more within the managed realm: it allows pointer-like manipulation of memory, without having to special-case it for each type.

And although anything with pointers is of course asking for trouble to some extent, you don't lose any safety in most scenarios (those not involving obvious giveaways like 'unsafe'...)


Are there any good writeups for the new APIs that are coming to .NET? Things like the System.Buffers API don't get the attention they deserve IMO.


Well, you can watch Joe Duffy's presentation about their work on Midori and how it is influencing C# 7 and future versions.

"Safe Systems Programming in C# and .NET"

https://www.infoq.com/presentations/csharp-systems-programmi...


  Span<T> is a new type we are adding to the (.NET) platform to represent
  contiguous regions of arbitrary memory, with performance characteristics on
  par with T[]. Its APIs are similar to the array, but unlike arrays, it can
  point to either managed or native memory, or to memory allocated on the stack.



