It might not end up with the hoped-for performance per watt -- or even with all that good performance, period. It might not even be built in silicon.
But it is a damn interesting architecture.
You don't get to present bits and pieces of it over several EE380 talks unless the grownups think it is interesting, too.
(It is also a lot less like the Itanic than it initially seemed. I think we will know a lot more in about a month, since there will be a new talk in Amsterdam about the new vector model and their µthreading model.)
We do not have an FPGA implementation, although we are working on it.
The reason getting to product is slow is that we are by choice a bootstrap startup, with three full-timers and a dozen or so part-timers. Compare a CPU project at a major such as Intel or ARM, with hundreds of full-time engineers, big budgets, and five years of development before the product is even announced - all for minor modifications to an established and well understood design.
The Mill architecture is notable primarily in what it takes out of the CPU, rather than what it puts in.
Patents are a backbreaking amount of work. To economize on filing costs our attorneys have consolidated what we expected to be separate patents into single filings - no less work, just fewer/bigger and hence cheaper. So far the twenty filings that have gone in represent around two thirds of the core technology. And if you think that 80-odd pages of patentese is fun then you are more kinky than should be let out in public.
"Architecture astronaut" is a cute term, not restricted to architecture; I have more than enough experience with those full of ideas - for someone else to do. We have done a lot of architecture over the decade, but a fair amount of work too. For a live demo the result of some of our work see http://millcomputing.com/docs/specification/
While I have worked in academia, I am more comfortable in an environment making real things for real people (or trying to) rather than sheepskins.
The purpose of our talks is to get professional validation of the design in a way we can afford. It is not to sell you something; it will be quite a while before we have something to sell, and you are not our customer.
We have a private placement in process and welcome Reg-D qualified investors. If you do not know what that is then you probably aren't. For details: http://millcomputing.com/investor-list. We are not cruising Sand Hill Road. We estimate ~$6M to an FPGA, and $25M to a product. Heavy semi is serious industry.
In early days I did my first compiler for a Bob Barton-designed machine, the Burroughs B6500. That compiler is still in use forty-five years later. I wish Barton were still here today; the Mill owes a great deal to his design philosophy.
The most interesting thing I took from the Mill videos I watched is the way they define the processor features in code. They can describe the whole specification in code, then compile and test the CPU, and get enough information from the simulation to actually put together the layout for the physical CPU. You can add or remove features from the CPU by implementing the proper functions, and test the specification before committing to silicon. My background is software, so maybe it's not as impressive as it sounds, but it really caught my ear as an elegant solution to customizing a processor architecture for different needs.
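To make that concrete, here is my own invented sketch of what specification-driven design can look like (I'm only going from the talks; their actual tool is far richer, and the names and numbers below are illustrative): a family member is just data, and the simulator configuration, documentation, and sanity checks are all derived from it.

    // A made-up sketch of specification-driven design (illustrative only; the
    // Mill's real spec tool is far richer): the family member is plain data,
    // and the simulator config, documentation, and checks are derived from it.
    case class MemberSpec(
      name: String,
      beltLength: Int,   // how many recent results are addressable
      aluSlots: Int,     // functional-unit counts per instruction
      loadSlots: Int,
      vectorBits: Int
    )

    object MemberSpecSketch {
      // Tin and Gold are member names from the talks; the numbers here
      // are illustrative, not their published parameters.
      val tin  = MemberSpec("Tin",  beltLength = 8,  aluSlots = 2, loadSlots = 1, vectorBits = 128)
      val gold = MemberSpec("Gold", beltLength = 32, aluSlots = 8, loadSlots = 4, vectorBits = 512)

      // One derived artifact among many: a human-readable summary.
      def describe(m: MemberSpec): String =
        s"${m.name}: belt=${m.beltLength}, ALUs=${m.aluSlots}, loads=${m.loadSlots}, vectors=${m.vectorBits}b"

      // Another: sanity checks you can run long before committing to silicon,
      // e.g. insisting the belt length is a power of two (an invented rule).
      def validate(m: MemberSpec): Boolean =
        m.beltLength > 0 && (m.beltLength & (m.beltLength - 1)) == 0 && m.aluSlots >= 1
    }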
Oh, I learned today that VHDL has (had?) a standardization group called VASG, which is quite a nested acronym:
VASG →
VHDL Analysis and Standardization Group →
VHSIC Hardware Description Language Analysis and Standardization Group →
Very High Speed Integrated Circuit Hardware Description Language Analysis and Standardization Group
Take a look at Chisel (http://chisel.eecs.berkeley.edu)... they call it a "Hardware Construction Language" instead of an HDL, as you describe how the hardware works in Chisel code, which is just a language embedded in Scala. It is then able to generate FPGA- or VLSI-optimized Verilog, and a cycle-accurate C++ simulator. I have been using it for the past ~9 months designing a new chip.
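For anyone who hasn't seen it, a minimal Chisel module is just Scala (the accumulator below is a throwaway example, nothing to do with the chip I'm working on); the same description can then be elaborated to Verilog for the FPGA/ASIC flows or to the C++ simulator.

    import chisel3._

    // A throwaway example: a width-parameterized accumulator.
    class Accumulator(width: Int) extends Module {
      val io = IO(new Bundle {
        val in    = Input(UInt(width.W))
        val total = Output(UInt(width.W))
      })
      val acc = RegInit(0.U(width.W))   // state element, reset to zero
      acc := acc + io.in
      io.total := acc
    }

    // Elaboration to Verilog (the exact entry point varies by Chisel version):
    // (new chisel3.stage.ChiselStage).emitVerilog(new Accumulator(16))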
Yes, but you can also build a functional or cycle-accurate simulator in software, which can give you some information on how the processor could potentially perform. Of course, having a soft core running on an FPGA will most likely give you a more accurate idea of how the processor actually works.
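As a rough sketch of what that buys you (purely hypothetical code, nothing Mill-specific): apply the instruction semantics to get a functional model, then attach a per-op cost model for a first, crude cycle estimate long before any RTL exists.

    // Purely hypothetical sketch: a functional simulator just applies
    // instruction semantics; a per-op cost model bolted on gives a crude
    // first estimate of cycle counts.
    object SimSketch {
      sealed trait Insn
      case class Add(dst: Int, a: Int, b: Int) extends Insn
      case class Load(dst: Int, addr: Int)     extends Insn

      // Invented per-op latencies for the cost model.
      def cost(i: Insn): Int = i match {
        case _: Add  => 1
        case _: Load => 3
      }

      def run(prog: Seq[Insn], regs0: Vector[Long], mem: Map[Int, Long]): (Vector[Long], Int) =
        prog.foldLeft((regs0, 0)) { case ((regs, cycles), insn) =>
          val next = insn match {
            case Add(d, a, b)  => regs.updated(d, regs(a) + regs(b))
            case Load(d, addr) => regs.updated(d, mem.getOrElse(addr, 0L))
          }
          (next, cycles + cost(insn))
        }
    }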
I really hope that Mills go into production. One thing that current comments have not mentioned is that there are huge security wins using a Mill. Even if there are concerns about performance, I think that the security architecture has use cases that will see Mills brought into production.
The original of this discussion was a blog post on kevmod.com. I posted the following comment on that blog, repeated here verbatim as possibly of general interest:
++++++++++++++++++++++++++++++++++++++++++++++++++++
Your skepticism is completely justified. The Mill may never reach market – we are a startup, and most startups fail; it’s a fact of life. Although we’ve survived for over a decade, which is pretty good for startups these days.
But it sounds like you are less skeptical about Mill Computing the company than about the Mill the technology and architecture. There are fewer grounds to doubt those. As fast as we have been able to get the patents filed (I seem to have been doing nothing else for the last two years. I hate patents) we have been completely opening the kimono and showing the technical community, in detail, how each part works. Why? Because we wanted outside validation before wasting another decade on something that was fatally flawed in some way we had overlooked.
If there was any part of the public Mill that one could point at and say “See? that won’t work, because …” then the web would have been all over us. But you know? Skepticism we get, aplenty. What we don’t get is informed skepticism. In fact, the more senior and skilled the commenter, the more they fall in love with the design. Like Andy Glew said one time (and if you don’t know who that is then you are not in the CPU business) – “Yeah, it’ll work, just the way he says it will”.
Sometimes people complain that our presentations are insufficiently detailed to fairly evaluate. Guilty as charged; they are oriented to a general audience interested in the subject, not to the specialist. However, if you ask for details on our forum (millcomputing.com/forum/themill) or the comp.arch newsgroup, as hundreds have, you will get all the details you want until they flood out your ears and collect in puddles on the floor.
In these days of internet time, when idea to market is measured in days or weeks, it’s easy to forget that not all the economy works that way. Building steel mills, cement plants, and yes, CPU silicon takes a long time and a lot of money. We have deliberately swapped money for time: we are a bootstrap startup, not looking for VC funding. There’s good and bad in that choice: a decade without a paycheck is not easy, but today we own it – all of it – and feel we got a good deal.
The proof of the Mill pudding will be when there’s a product with pins on the bottom, and that won’t happen for some years yet. We learned in our first presentation not to make projections of what the eventual chip will have for numbers. Yes, we have guesstimates internally, but we’re quite sure those will be off by a factor of two. The problem is that we have no clue which direction they will be off.
If you have the technical chops to understand a CPU design from first principles then please dig as deep as you can into our stuff and tell us – and the world – what you find. Otherwise you will just have to join us as we wait and work and see. We’ve never said anything different.
I suspect that they have versions of it running in FPGAs but they aren't releasing anything to a wider audience because it's an FPGA and probably 5-50x slower than it would be if it was in custom silicon.
So it's incredibly useful for them to test out ideas, but would probably garner a lot of bad impressions even if you explain "look this is super slow" because people have the idea that revolutionary is supposed to be extra fast, extenuating circumstances be damned.
I think I remember them saying either in one of the talks or in a comment somewhere (sorry I can't be more specific), that, yes, they do have it working on an FPGA, but since the purpose is only to test the design, it doesn't run anything like impressively fast, because they aren't utilizing the resources efficiently.
EDIT: by resources, I mean, the specialized logic and chunks of RAM that exist inside FPGAs.
If you can describe something for an FPGA it's not a huge leap to get it into silicon. A lot of people got pissed once they realized that the custom bitcoin mining ASICs were actually the open-source FPGA designs simply "burned" into an ASIC.
I suspect that the reason everything is taking so long is that they have to re-write so much of the core stuff (standard library, OS, drivers, etc). But once you can boot Linux on a very slow FPGA I can't see a lot of reasons not to start making the ASICs.
You are certainly right in a sense, because both FPGAs and custom silicon are usually described using VHDL or Verilog. So once they have a working FPGA, it's not too difficult to get it into some silicon.
However, to get it into fast and power-efficient silicon is still a ridiculous amount of work. FPGA technology is vastly different from directly designing VLSI circuits. For one thing, they'll most likely have to use different pipeline designs to squeeze the cycle times low enough.
I worry that the Mill people are vastly overconfident. There are very interesting ideas there, but I suspect that they are trying to improve so many aspects of a CPU at once that they will be unable to execute any of them well. The end result is likely that sub-par execution leads to inefficiencies that accumulate to the point where any of the theoretical advantages of their design gets overwhelmed - especially since many of their ideas can be integrated by competitors as well. That said, I'd love to be proven wrong here.
So at first I totally agreed with you, which is why I didn't post anything right away. But now that my subconscious has had a day to chew on things, I think I don't agree quite as much.
The really interesting thing about the Mill is that it does away with a HUGE portion of what a CPU does: out-of-order execution (OOE). These days the amount of die space that's dedicated to reordering and dependency checking and the commit engine is substantial, and that stuff really slows a processor down because the things happening at the end of the pipeline depend on what's happening at the beginning. At 4GHz, the best-case wavelength is 75mm, and the quarter wavelength is thus about 19mm, on the order of the size of the die!
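For reference, the back-of-the-envelope numbers behind that, taking the free-space speed of light as the best case (real on-chip signals propagate slower, which only makes the point stronger):

    \lambda = \frac{c}{f} = \frac{3 \times 10^{8}\ \mathrm{m/s}}{4 \times 10^{9}\ \mathrm{Hz}} = 75\ \mathrm{mm},
    \qquad \frac{\lambda}{4} \approx 19\ \mathrm{mm}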
What the Mill does is push the reordering and dependency checking to the compiler so that the hardware itself is "dumb", at least insofar as not doing all the checking for OOE and multiple dispatch, etc. Further, since the instructions are ordered as blocks and that blocking is done by the compiler, the processor doesn't even need any logic to assign instructions to functional units; that's done at compile time.
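To make the division of labor concrete, here is a toy sketch of compile-time scheduling (a generic greedy list scheduler, not the Mill's actual algorithm; the slot kinds, counts, and latencies are made up): the compiler picks the issue cycle and functional-unit slot for every operation, so the hardware never has to discover that ordering at run time.

    // Toy static scheduler: assign each op an issue cycle and a functional-unit
    // slot at compile time. Slot kinds, counts, and latencies are hypothetical.
    object StaticScheduleSketch {
      case class Op(name: String, kind: String, deps: Set[String], latency: Int)

      val slotsPerCycle = Map("alu" -> 2, "mul" -> 1, "load" -> 1)

      // ops must be given in dependency (topological) order.
      def schedule(ops: Seq[Op]): Map[String, Int] = {
        var ready  = Map.empty[String, Int]        // op -> cycle its result is available
        var issued = Map.empty[String, Int]        // op -> cycle it issues
        var used   = Map.empty[(Int, String), Int] // (cycle, kind) -> slots taken

        for (op <- ops) {
          val earliest = if (op.deps.isEmpty) 0 else op.deps.map(ready).max
          var cycle = earliest
          while (used.getOrElse((cycle, op.kind), 0) >= slotsPerCycle(op.kind)) cycle += 1
          used   = used.updated((cycle, op.kind), used.getOrElse((cycle, op.kind), 0) + 1)
          issued = issued.updated(op.name, cycle)
          ready  = ready.updated(op.name, cycle + op.latency)
        }
        issued
      }

      // Example: load two values, multiply, then a dependent add consuming the
      // product -- the add waits out the multiply's latency, and the scheduler,
      // not the hardware, decides that.
      val example = schedule(Seq(
        Op("ld_a", "load", Set(), 3),
        Op("ld_b", "load", Set(), 3),
        Op("mul",  "mul",  Set("ld_a", "ld_b"), 3),
        Op("add",  "alu",  Set("mul"), 1)
      ))
    }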
I agree that fast and power efficient silicon is a challenge, but again because they're not mixing execution with reasoning about execution (OOE, multiple dispatch, commit engine, etc) it's much, much easier to make the layout fast. I believe that instructions and data will flow through the Mill in a way that makes regular CPU architects envious.
I still suspect that a huge amount of the delay is due largely to the general weirdness of the architecture; it's not like an x86 to ARM port because those have a lot of the same basic assumptions baked in and the only difference is the instruction and data encoding. On the Mill few of the assumptions are the same and so you might have to re-write a LOT of code to boot a Linux kernel.
You make a very good point. I would like to give you one additional aspect to consider: while the Mill saves on instruction scheduling logic (and register renaming, and certainly also has the benefit of re-thinking opcodes and instruction decode), there are other features that add complexity again. Making a subroutine call appear as if it were a single instruction has got to be tricky, for example.
In the end, I have only seen CPU design from a physical design tool support point of view. I don't really have the necessary experience to judge the relative complexity of out-of-order+register renaming vs. whatever the Mill aims to do. You're probably right that the Mill is simpler.
Part of it is that all we really know about all this comes from a bunch of talks that focus on what the Mill looks like to the user. Who knows what their physical design story is. I just hope, for their own sake, that they have been working on that part of it for a significant amount of time as well.
On the other hand, their wiki does have a lot of extremely interesting details about the processor and ISA design, and they have presented several concrete examples of how to get around the problems with Itanium.
It would be nice to hear from some more people in the industry about it, or see some more results, but I remain cautiously optimistic.
I have the same impression as you and commented here a while back. I think the answer to your question falls somewhere in the middle.
They are smart guys with interesting ideas, but will not be buying fab capacity or selling products anytime soon or most likely ever.
However there is a lot of value in provocative thinking. It adds to the collective knowledge and inspires new perspectives and approaches to problem solving. That alone makes me glad they're around.
On the high end of expectations they may sell some patents or license some IP, but still not in the form of a complete or competitive product.
I've subscribed to the hype, but my knowledge of the domain is near zilch. It would definitely be interesting to hear a high level engineer at a company like Intel or ARM comment on the design.
They don't necessarily believe that it will succeed or that the whole architecture is good -- but the consensus seems to be that it at least is interesting.
Last I heard, they've been spending most their time getting patents filed since the US switched to first-to-file in 2011. There might be more physical progress once that's done, or so I hope.
I filed the Mill CPU under "call me when you ship" quite a while ago, because the constant "let me tell you how we will solve that with Mill" thread-squats got really old, and in 30 years of doing this I've seen way too many "no, really, this is going to change all of computing as we know it" architectures proposed that never amounted to anything worthwhile, and here in 2015 I'm still buying the 8086's grandkid. At least ARM and MIPS are keeping it somewhat interesting, and Power 7 and T5 are looking at interesting architecture innovations, and, most importantly, are actually shipping something instead of hanging out on HN telling us how awesome it will be one day.
I'm interested to see how a compiler for the Mill CPU would handle such large instruction widths. Whilst it doesn't have the restriction VLIW had of having a fixed instruction width, I'm not entirely convinced we would often see 30+ operations packed in the same instruction.
I have watched most of the Mill videos, and while they are certainly impressive and innovative, I believe Ivan Godard is a perpetual non-finisher. He comes up with grandiose, complicated ideas, which turn out to be incredibly difficult to implement, debug, and close. And by the time he makes much progress, he has another idea that takes his focus, and starts working on that. He has these massive goals, which are certainly noteworthy, but he possibly does not realize it's likely not possible to do them in one step.
You have to release incrementally, get something out the door, and then move to the next. Not "Let's design and architect a complete ASIC from scratch, based on an architecture from scratch, using a prototype simulator from scratch, with a toolchain built from scratch, with a new ABI for Linux, and a complicated debugger. Oh, and this is actually a family of processors (low perf/high perf), not just a single processor we are building."
It's the opposite of MVP. If he really wanted to make progress, he would find a way to incrementally build one piece of his suite of technology IP that he can license to existing CPU manufacturers and get into production. Then you have revenue, and can rinse and repeat. Otherwise the Mill as vaporware is a virtual certainty.
You do need feedback, lots of it, but not necessarily from customers. It is not a social media platform on the web using the hottest new framework. Your feedback mostly consists of numbers from measurements. That's why they built a simulator first (and it has apparently been running for quite some time). That's also why they did work on a compiler that they did get something useful out of (they have since switched their compiler approach to making an LLVM backend instead).
But getting to a point where you can run it in silicon (even just as an FPGA) is a long, tough slog. Especially if you want decent/realistic speed. Getting good numbers on power use is also both hard and expensive.
And it's all worth nothing if they don't have the patents. And many of the innovations need the other innovations to be useful or even just make sense.
I met both Ivan and a German guy who works on the compiler late last year and have corresponded a bit with both. What I have seen so far fits very well together... but of course there are still many pieces that they haven't shown (and that I haven't seen either).
Obviously this was written from a software engineering perspective, but it seems at least marginally fitting, as the topic in question is literally computer architecture.
"Researcher" is also an appropriate term that also happens to have the virtue of not being pejorative.
Even though they're not popular in the business world, we do also need people who will change their goal when confronted with new ideas, rather than just the ones who filter ideas based on their service to a goal.
We even gave a Turing Award to someone like that a few years ago.
"Researcher" is appropriate for someone who actually produces academic output. It may not be immediately practical, but it consists of new ideas that have been formalized and tested sufficiently to pass peer review of other researchers.
Has the Mill architecture done that? Are there any actual papers about it? Anything beyond marketing fluff?
It seems like there are some ideas behind the Mill that seem interesting when you hear about them, but I haven't seen anything rigorous enough to even be reviewable.
And it doesn't look like that is the intent, either. Rather, from all appearance, it looks like they are trying to drum up interest for investment.
This may just be a semantic quibble, but I'd say that someone who produces academic output (usually in the form of original research) is an "academic", and the academy has its own peculiar rules for deciding which output has merit.
The Mill stuff is just an example of research taking place outside of the academy.
If you are not producing any academic output, are not producing any commercial output, are not even producing any patents, and all you are producing is high-level handwavy talks, then how do you distinguish that from a charlatan?
Ideas in a vacuum are not worth much. I have some random ideas for systems that I think would work better than current systems that people use all the time; stuff I would like to get to some time to develop into something real. However, without actually either rigorously formalizing those ideas and having them reviewed against the current literature, or producing a shipping implementation that demonstrates their feasibility empirically, they're pretty much worth the cost of the paper they are (not) written on.
Determination of charlatanism and crackpottery is also a function of past performance; no one considers that Shinichi Mochizuki’s proof of the ABC Conjecture is the product of a crackpot, even though people are still working to understand it nearly three years on.
According to the Financial Times:
“Godard has done eleven compilers, four instruction set architectures, an operating system, an object database, and the odd application or two. In previous lives he has been founder, owner, or CEO of companies through several hundred employees, both technical and as diverse as a small chain of Country Music nightclubs and an importer of antique luxury automobiles. Despite having taught Computer Science at the graduate and post-doctorate levels, he has no degrees and has never taken a course in Computer Science.”[1]
Mill may turn out to be another Xanadu. On the other hand, the nice thing about computers is that you can use them to simulate anything, including a better computer (with respect to X), so it's not crazy to think that Mill may have something serious to offer.
It might not work as well as they hope for, but there is nothing crackpot about the Mill so far.
Or, to put it differently, there might be crackpottery in some of the things they haven't revealed yet. Who knows.
(Naturally, this says nothing about whether it will ever actually be built or whether any actual silicon will actually run fast -- there are so many other reasons besides the purely architectural reasons why it might not. Chips are hard.)
A concrete example of something the Mill does better (at least on paper and in a simulator): it does spills and reloads of values differently, in a way that doesn't go through the data caches. The same mechanism also handles local values in many cases. If they don't pollute the data caches, you can get by with smaller and/or higher-latency caches and still get the same (or higher) performance. The loads that are still there on the Mill are those that really have to be there (and are there in the compiler's intermediate representation, before register allocation).
There are also some innovations in the instruction encoding: they can put immediates inline in the instructions, like on the x86 and other Old Skool CISC architectures. They can even do that for quite large immediates. That's better than having to spend several instructions "building" the constants or having to load them at runtime from a table, as the RISC architectures do. They do this with a very regular encoding while keeping the encoding very tight (and variable-width). The way they pack many operations into each instruction means that even if the instructions are byte-aligned, the individual operations don't have to be. If the operations only need 7 bits for their encodings, then that's what you get. (The bit length is fixed per "slot" in the encoding so it's not as variable -- and slow -- as it sounds.)
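As a toy illustration of why the fixed-per-slot widths keep decoding simple (the widths below are invented for the example, not the Mill's real encoding): every slot's bit offset within the bundle is known statically, so the decoder never has to scan to find where an operation starts, even though different slots use different widths.

    // Hypothetical slot widths in bits; the real Mill encoding differs.
    object SlotPackingSketch {
      val slotWidths = Seq(7, 7, 12, 22)   // e.g. two small slots, one wider, one with an inline immediate

      // Pack one operation code per slot into a single bundle, LSB-first.
      def pack(ops: Seq[Int]): BigInt = {
        require(ops.length == slotWidths.length)
        ops.zip(slotWidths).foldRight(BigInt(0)) { case ((op, width), acc) =>
          require(op >= 0 && op < (1 << width), s"op $op does not fit in $width bits")
          (acc << width) | BigInt(op)
        }
      }

      // Because each slot's width is fixed, unpacking needs no scanning:
      // every slot's bit offset is known statically.
      def unpack(bundle: BigInt): Seq[Int] = {
        val offsets = slotWidths.scanLeft(0)(_ + _)
        slotWidths.zip(offsets).map { case (width, off) =>
          ((bundle >> off) & ((BigInt(1) << width) - 1)).toInt
        }
      }
    }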
Instruction decoding and execution is pipelined: the first part of the instruction is decoded first and starts executing while the rest of the instruction gets decoded. This ties in with the way call/return/branch is handled so the last part of the instruction effectively overlaps its execution with decoding and execution of the first part of the instruction at the new site. This works for loops as well so the end of the loop executes overlapped with the start of the loop and so that the prolog/epilog overlaps with the body. You can think of it as a very strong form of branch delay slots -- or as branch delay slots on acid. This is combined with an innovation that allows vectorized loops with very cheap prologs/epilogs so you can opportunistically vectorize practically all your loops. If a loop only executes twice (or thereabouts) you break even.
Edit: and even if I wasn't it wouldn't make a difference - flitting from one grand project to the next without actually completing any of them is a clear diagnostic marker anyone can read in a book.
Edit 2: Questions 1 and 2 from the Adult ADHD Self-Report Scale[0] (from Harvard, endorsed by the WHO) -
1. How often do you have trouble wrapping up the final details of a project, once the challenging parts have been done?
2. How often do you have difficulty getting things in order when you have to do a task that requires organization?
[0]: http://www.hcp.med.harvard.edu/ncs/ftpdir/adhd/18Q_ASRS_English.pdf
Architecture astronaut isn't a term for starting projects and not finishing them. It's a term for people who overengineer or overdesign their system.
Overengineering can lead to not finishing a project, but in the case of someone with ADHD that's unlikely to be the only issue.
Overengineering/Overdesigning isn't a problem I have, and is probably the single biggest thing I use to judge whether or not someone is a bad programmer, which is why what you said bothered me.
This feels uncomfortably close to a personal attack.
Not everything can be built via MVP-and-iterate. How much further could a project be from the MVP-and-iterate sweet spot than a brand new CPU architecture?
People should be free to try wild new things, whether they succeed or not. A strong visionary, once at speed, won't be stopped by nitpicking comparison to less ambitious things. But I worry about how many nascent visionaries we miss out on when the culture trains us to disparage those impulses in ourselves, and so many people tell you you're doing it wrong.
B&W was pretty meh though, all hype over actual gameplay, it's the first title of his that made people step back and go "hang on, what was all the hype about?"
Note that because of the Mill's unique architecture... if given the choice between a thread-based model and a fiber-based model, you should probably use the Fiber One.