If you move your mouse continually the query may not fail. Do not stop moving (microsoft.com)
493 points by shawndumas on Jan 4, 2014 | 163 comments



I had this bug once, about 10ish years ago when I was still a Windows dev. Creating a second window on a thread that already has a window, but then pumping its main loop on a background thread, can cause this. If you do that, the loops get hooked to each other regardless of which thread is pumping. The child window has to wait until the parent window forwards the event before the second thread can pop it off. A mouse move sends a WM event and keeps the child window's loop on the other thread spinning. My WM_TIMER on the child window was stuck as well.
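A toy model of the symptom, in Python rather than Win32, may make the mechanism easier to see. It assumes a loop that only notices due work (the timer) after its blocking wait returns with some message. The names ToyPump, post and pump are made up, and real Win32 timer delivery works differently; this only illustrates "nothing happens until an unrelated event wakes the loop".

```python
from collections import deque


class ToyPump:
    """Toy message loop (not Win32): due work such as a timer is only
    examined after the loop has been woken up by some message."""

    def __init__(self):
        self.queue = deque()
        self.timer_due = False   # set when the timer interval has elapsed
        self.timer_fired = 0

    def post(self, msg):
        self.queue.append(msg)

    def pump(self):
        """One iteration of the loop. An empty queue returns None, standing
        in for a real GetMessage call that would block indefinitely."""
        if not self.queue:
            return None
        msg = self.queue.popleft()
        # Only now, after being woken, does the loop notice the due timer.
        if self.timer_due:
            self.timer_fired += 1
            self.timer_due = False
        return msg
```

With timer_due set and no input, pump() keeps returning None and the timer never fires; posting "WM_MOUSEMOVE" lets the next pump() run the timer, which is the "keep moving the mouse" symptom in miniature.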

It happened to me because I had a hidden window on the background thread to get WM events for USB disconnect and reconnect messages from the system. The bug report was funny. "Connecting USB data collection probe doesn't work unless the user moves the mouse after connecting." and the follow up bug "Data collection only works when moving the mouse."


The rule of thumb as I always heard it is that a window is "owned" by the thread that created it. You can send and post messages from a different thread, but do much else and you're asking for trouble. (And Send instead of Post is often problematic since it will block until the target thread processes it.)

Draining the message queue from a thread that doesn't own it makes absolutely no sense to me... Does that actually work? Wouldn't you see the messages on the original thread's pump too? I thought the way it worked is that each thread with UI objects has a queue, and SendMessage et al. will insert into the owning thread's queue. It took me a while to parse what you are saying: did you pass the hwnd into GetMessage from the other thread, and that somehow convinced it to peek at another thread's queue?
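The ownership rule can be modelled outside Win32: the owner thread drains its own queue, "post" only enqueues, and "send" enqueues and then waits for the owner to answer. OwnerThread, post and send are hypothetical names; real cross-thread SendMessage is delivered through a separate sent-message path rather than the posted queue, so this sketch only illustrates the blocking difference.

```python
import queue
import threading


class OwnerThread:
    """Hypothetical model of a window owned by one thread: other threads
    never touch its state directly, they only enqueue messages."""

    def __init__(self):
        self.inbox = queue.Queue()
        self.handled = []                 # only mutated by the owner thread
        self.thread = threading.Thread(target=self._loop, daemon=True)
        self.thread.start()

    def _loop(self):
        while True:
            msg, reply = self.inbox.get()
            if msg is None:               # shutdown request
                break
            self.handled.append(msg)
            if reply is not None:         # a "sent" message: wake the sender
                reply["result"] = len(self.handled)
                reply["done"].set()

    def post(self, msg):
        """Like PostMessage: enqueue and return immediately."""
        self.inbox.put((msg, None))

    def send(self, msg, timeout=None):
        """Like SendMessage: enqueue, then block until the owner handles it."""
        reply = {"done": threading.Event()}
        self.inbox.put((msg, reply))
        if not reply["done"].wait(timeout):
            raise TimeoutError("owner thread did not process the message")
        return reply["result"]

    def stop(self):
        self.inbox.put((None, None))
        self.thread.join()
```

If the owner thread is stuck, post still returns at once while send hangs (here, until its timeout), which is why Send is the riskier of the two.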


COM hates MTA threading: most of the Office object model actually insists on a single-threaded apartment (STA) on the main UI thread to even work (i.e. you cannot use BackgroundWorker())! Some guy @ MS even released this byzantine code to deal with the situation (http://blogs.msdn.com/b/andreww/archive/2008/11/19/implement...), which unfortunately doesn't work. It's completely nuts how bad the object model is.


As much as I make fun of Java developers for overcomplicated solutions, I've never quite seen anything as over-engineered and byzantine as COM. I remember at my last job we had legacy COM objects that inherited from about 5-7 different templated objects, and that was... normal. COM is the worst.

I actually like Windows as a user, but I'm amazed that it won as much developer mindshare as it did considering how much the Win32 API sucks. It's brutal.


I think the core of COM is sound (pure interfaces, QI, refcounting, IDL).

I think the ugliness is due to threading issues and being welded on to Win32 message pumps. If you have a dedicated COM thread/pool you'll mostly be fine (security stuff can still get complicated). But on the other hand if you start doing COM things on a thread you own where you share control of the message pumping, you open yourself to a world of mental anguish.


I actually like COM. Sure there are historical accidents like STA. But if you are free of that legacy and keep it simple there are some nice features: reference counting, QueryInterface, consistent error codes across components, abstraction away from linking, not having to worry about which allocator an object uses. (Though last one is only important on Windows since different DLLs can end up using different heaps.)
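The core pieces named here (reference counting, QueryInterface, consistent error codes) can be sketched without any Windows machinery. The Python below is a hypothetical stand-in, not a binding to real COM: interface ids are plain strings rather than GUIDs, and only the S_OK and E_NOINTERFACE HRESULT values are taken from real COM.

```python
S_OK = 0
E_NOINTERFACE = 0x80004002   # real COM HRESULT for "interface not supported"


class ComLikeObject:
    """Hypothetical stand-in for a COM object: explicit AddRef/Release
    and QueryInterface by interface id (strings here instead of IIDs)."""

    def __init__(self, interfaces):
        self._refs = 1                           # the creator holds one reference
        self._interfaces = set(interfaces) | {"IUnknown"}
        self.destroyed = False

    def add_ref(self):
        self._refs += 1
        return self._refs

    def release(self):
        self._refs -= 1
        if self._refs == 0:
            self.destroyed = True                # real COM would free the object here
        return self._refs

    def query_interface(self, iid):
        """Return (S_OK, self) with an extra reference, or (E_NOINTERFACE, None)."""
        if iid in self._interfaces:
            self.add_ref()                       # a successful QI hands out a reference
            return S_OK, self
        return E_NOINTERFACE, None
```

The caller never learns the concrete type; it only asks "do you support X?" and owns one reference per successful answer, which is what lets components built with different allocators and compilers interoperate.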


COM doesn't mandate anything silly like that.


Similar to COM (besides OLE, DCOM, and ActiveX) are GNOME's "Bonobo" component model [1], KDE's KParts, Mozilla's XPCOM, and CORBA.

Apple chose KHTML over Gecko for its WebKit fork [2] because of the various cons of such component models.

[1] Bonobo is officially deprecated: http://en.wikipedia.org/wiki/Bonobo_(component_model)

[2] http://en.wikipedia.org/wiki/XPCOM


I remain convinced that C++ warps people's minds into things like this :-). I'll go out on a limb and say that pretty much ALL of those things are horrible.


There are some advantages to COM, in that having a standard, somewhat introspectable notion of what an object is and what its lifetime is helped Firefox add cycle collection several years ago. (By contrast, leaks are harder to avoid in WebKit and Blink, because there is no cycle collector, at least until Blink lands Oilpan.)
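Why plain reference counting needs help: two objects that reference each other keep each other's counts above zero after all outside references are gone. A cycle collector finds such groups by "trial deletion", subtracting references that come from inside the candidate group. The sketch below illustrates that general idea under simplified assumptions (single-threaded, explicit counts, hypothetical Node/link/find_garbage names); it is not Firefox's implementation.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)          # eq=False keeps identity hashing, so nodes can go in sets
class Node:
    name: str
    refcount: int = 0
    children: list = field(default_factory=list)


def link(parent, child):
    """parent holds a counted reference to child."""
    parent.children.append(child)
    child.refcount += 1


def find_garbage(candidates):
    """Trial deletion: a node with more references than the graph itself
    accounts for is held from outside, so it and everything it reaches is
    live. Whatever remains is an unreachable cycle."""
    graph, stack = set(), list(candidates)
    while stack:                          # collect the reachable subgraph
        n = stack.pop()
        if n not in graph:
            graph.add(n)
            stack.extend(n.children)
    internal = {n: 0 for n in graph}      # references coming from inside the graph
    for n in graph:
        for c in n.children:
            internal[c] += 1
    live, stack = set(), [n for n in graph if n.refcount > internal[n]]
    while stack:                          # externally held nodes keep their targets alive
        n = stack.pop()
        if n not in live:
            live.add(n)
            stack.extend(n.children)
    return {n.name for n in graph - live}
```

After a and b are linked to each other and the last outside reference is dropped, both still report a refcount of 1, so refcounting alone leaks them; find_garbage([a]) reports {"a", "b"} as collectable.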

I think COM's biggest sin was doing all this stuff without language support. It's just not sensible to graft interfaces and automatic memory management onto C without modifying the language. Objective-C is basically a slightly more duck typed COM, but it's not nearly as reviled because the object model is so deeply integrated into the language.


Pretty certain COM was designed with C in mind, maybe C++ as an afterthought. The differences between C++ objects and COM objects are severe enough to make it not really a lot of fun unless you limit yourself to C-like behaviors anyway (or use Microsoft's dandy language extensions to generate a crapload of boilerplate for you).


I've done COM in straight C and it's not too bad. It's rather reminiscent of the Linux kernel's "operations" struct-pointers. (i.e. where `struct file` contains a `struct file_operations *` and the operations struct itself is full of function pointers - kind of like the C++ vtable concept)
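The shape being described, an object that carries a pointer to a shared table of function pointers, can be mimicked in Python for illustration. FileOperations, File, mem_read, mem_write, do_read and do_write are hypothetical names loosely modelled on the kernel's struct file_operations; COM's C bindings follow the same pattern, calling methods through the object's lpVtbl pointer.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FileOperations:
    """Table of 'function pointers' shared by every object of one kind
    (the role played by a C++ vtable or a COM interface's Vtbl struct)."""
    read: Callable[..., bytes]
    write: Callable[..., int]


@dataclass
class File:
    ops: FileOperations      # plays the role of `const struct file_operations *f_op`
    data: bytearray


def mem_read(f, n):
    return bytes(f.data[:n])


def mem_write(f, buf):
    f.data.extend(buf)
    return len(buf)


# One shared table for all in-memory files.
MEM_FILE_OPS = FileOperations(read=mem_read, write=mem_write)


def do_write(f, buf):
    # Callers dispatch through the table and pass the object explicitly,
    # like `f->f_op->write(f, ...)` or `p->lpVtbl->Method(p, ...)` in C.
    return f.ops.write(f, buf)


def do_read(f, n):
    return f.ops.read(f, n)
```

Swapping MEM_FILE_OPS for another table changes behaviour for every caller without recompiling them, which is the property both the kernel and COM rely on.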


XPCOM is... yeah, it's pretty bad. Writing a Firefox plugin vs. writing a Chrome plugin is darkness vs. light.


Firefox has a new (well, it's at least a couple years old now) system that makes writing plugins not quite so painful:

https://developer.mozilla.org/en-US/docs/Jetpack


COM made several promises, and had it delivered on those promises, the world would be a better place. I can understand why projects tried to copy COM, only to regret it later.

As noted, GNOME deprecated Bonobo even though the GNOME name was meant to emphasize the CORBA/Bonobo aspect of things (Gnu Object Manipulation Environment). Mozilla spent a lot of effort removing gratuitous use of XPCOM ( https://wiki.mozilla.org/Gecko:DeCOMtamination ).


But COM has been adopted by the Linux world too, and renamed to Bonobo. In fact, if you throw away all the VB-specific stuff, COM and its C++ implementation is a great programming exercise and is imho quite elegant.


"The VB stuff" is pretty much all you get when you work with MS-Office.

Don't know that I'd call COM "elegant", but you could probably do worse for a language-agnostic object system. Funny enough, I never really "got" COM until I sat down and read up on Bonobo. Someone had a really accessible explanation for Monikers that would've saved me so much hair-pulling if I'd had anything similar for the Windows equivalent.


If you're calling an object living in an STA, you should do it from the same thread or marshal the pointer before using it from a different thread. In the latter case, the STA's thread needs to run a message loop to process calls to the object. If the thread is showing a modal dialog or is blocked doing a lengthy operation, that doesn't happen. As you can see, there is nothing wrong with the object model itself; it just allows you to call an object that was designed to be single-threaded from threads other than its "home" thread.
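A rough model of the STA rule: calls from other threads go through a proxy that queues them to the object's home thread and waits for the answer, so nothing happens while that thread is busy. Apartment, Proxy and keep_home_thread_busy are hypothetical names; real COM marshalling uses proxies, stubs and window messages, not a Python queue.

```python
import queue
import threading


class Apartment:
    """Hypothetical single-threaded 'apartment': objects live on one thread
    and every call from another thread is queued to that thread."""

    def __init__(self):
        self.calls = queue.Queue()
        self.thread = threading.Thread(target=self._pump, daemon=True)
        self.thread.start()

    def _pump(self):
        # The home thread's message loop: calls only run while it is pumping.
        while True:
            item = self.calls.get()
            if item is None:
                return
            fn, args, result, done = item
            result.append(fn(*args))
            done.set()

    def stop(self):
        self.calls.put(None)
        self.thread.join()


class Proxy:
    """Like a marshalled interface pointer: forwards calls to the home thread."""

    def __init__(self, apartment, target):
        self._apartment = apartment
        self._target = target

    def call(self, method, *args, timeout=None):
        result, done = [], threading.Event()
        fn = getattr(self._target, method)
        self._apartment.calls.put((fn, args, result, done))
        if not done.wait(timeout):
            raise TimeoutError("home thread is not pumping")
        return result[0]


def keep_home_thread_busy(apartment, gate):
    """Queue a job that blocks the home thread until `gate` is set,
    standing in for a lengthy operation on that thread."""
    apartment.calls.put((gate.wait, (), [], threading.Event()))
```

While keep_home_thread_busy holds the home thread, any Proxy.call waits (here, until its timeout); once the gate is released, the queued calls run in order.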


Pumping from the wrong thread is undefined behavior and it results in the bug above. It's a fairly common bug.


The Excel codebase is very old, and most of it is still in recent versions.

Excel 2010 still relies on WinAPI's Fibers [1] (lightweight threads). In recent years the idea has become popular again with Lua's coroutines and Go's goroutines.

So it is manually scheduled by the application, instead of relying on the OS.
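Cooperative scheduling of this kind can be sketched with Python generators. This is a swapped-in technique: Win32 fibers are created and switched with CreateFiber and SwitchToFiber and each keeps a full stack, whereas a generator only suspends a single frame. The scheduling idea (the application, not the OS, decides who runs next) is the same; round_robin and worker are hypothetical names.

```python
from collections import deque


def round_robin(tasks):
    """Run generator-based 'fibers' cooperatively: each task runs until it
    yields, then the scheduler (not the OS) picks the next one."""
    ready = deque(tasks)
    trace = []
    while ready:
        task = ready.popleft()
        try:
            trace.append(next(task))   # run the task until its next yield
        except StopIteration:
            continue                   # task finished; drop it
        ready.append(task)             # otherwise put it back in line
    return trace


def worker(name, steps):
    for i in range(steps):
        yield f"{name}{i}"             # voluntary switch point
```

round_robin([worker("a", 2), worker("b", 3)]) interleaves the two tasks as a0, b0, a1, b1, b2: no preemption, each task gives up control only at its yield points.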

In 2014, there is still a lot of code in Excel that dates back to Excel 3 (1990). Excel 2010 still relied on the outdated MDI concept [2] that was introduced with Win 3, and only one Excel instance/main window can run.

One can embed the Excel OLE component in your own app [3]. As soon as the component gets the focus, it replaces the traditional menu bar with the custom-drawn "ribbon bar". It looks weirdly out of place (a Win9x-style OLE app with a ribbon bar). Office apps started using custom-drawn UI objects with Office 97. It had these fancy toolbars and an italic window title [4] instead of the boring Win95 look that WinAPI provided.

[1] http://msdn.microsoft.com/en-us/library/windows/desktop/ms68... ; http://msdn.microsoft.com/en-us/library/windows/desktop/ms68...

[2] http://en.wikipedia.org/wiki/Multiple_document_interface

[3] e.g. using the sample apps that come with the "Inside OLE, 2nd Edition" book.

[4] http://www.cheresources.com/economics.shtml


A few years ago I had to look at two Excel spreadsheets fullscreen side by side, each open in its own window (two-monitor setup), and this was...not easy!!! I eventually found some hack online to open up multiple instances of Excel at once. It was still bizarre, given that it's a simple use case.


Thankfully this is no longer a problem in Excel 2013. (I believe it was still a problem in 2010.) The trade-off seems to be getting help/documentation is worse than ever. :(


For Excel 2007 (on Win8.1) you 'just' middle click (wheel-click) on Excel if it is pinned to the task bar (which is how I happen to have things set up -- same thing as opening two instances as mentioned elsewhere). I agree, not very obvious, but I think multi-mon puts you in the 'top' 5%* (so you won't get as much help from a scenario point of view).

*can't find a reference for that, so I may be off.


I've had this trouble. I think the way around it is to open Excel itself multiple times, and then open your files from each instance of Excel. Or something like that, I'm not at work and don't have access to a Windows box ATM.


Agreed that it's dumb behaviour, but it's not exactly a complicated hack - just run excel again (from start menu, from win+R, etc.) and it creates a new instance.


IIRC, older versions of Excel didn't work that way. The OP talks about "a few years ago", so it may have been with an older version that didn't work that way.

I have never been able to remember how Excel and Word for Windows handle multiple windows, because it changed in subtle ways between versions (can you alt-tab between Excel documents? Between Word ones? Do documents appear in the taskbar, or windows, or just Excel?), because Word and Excel within one Office version do not appear to behave 100% the same, and because I tended to encounter different versions, for example when helping a friend, or when using RDP to a different server at work (yes, we had servers with Office installed, and we didn't even use them for COM controls).


You think I didn't try that?

IIRC opening up Excel again from the start menu didn't create a new instance, it just gave focus to the document/instance I had open. My jaw kinda dropped cause it wasn't like I had an extremely uncommon use case. There was no File > Open new window. I tried dragging the document to outside, like you can do in Firefox with a tab, nope. Opening a new document simply opened it in the current instance I had running. I needed to do a compare side by side of the documents. Switching between them didn't cut it.

Btw this is a different problem with Excel and Office altogether: I was using Excel 2007 because the files I was working on were created in 2007 and used by a customer (who had only 2007), and they would mess up if you saved them in a later version. The files had some complicated macros in them. Compatibility Mode (I think that's what they called it) WAS NOT actually compatible!! 2010 was out, and I had that installed as well, but 2007 had to be the one I used for this task.

I was just doing some simple maintenance operations on them. I had nothing to do with their evil creation.

There was some way to eventually get Excel 2007 to run two instances of itself, but like I said, it was a hack that I found online after a bunch of searching.


I may well be wrong about historical behaviour, but for at least a few years now (i.e. within my memory) it's been the case - but you need to create a new instance, not just a new document. I.e. if you open a specific file, or do anything to start a new document from within excel, it will keep the same instance.

It's still ridiculous behaviour, and drives me crazy because on a typical day I'll have a minimum of 3 spreadsheets open (3 sheets I always need quick access to) and often 10-20, and it can be a real pain in the ass remembering which sheets are in which instance, leading to regularly needing to close a sheet, start a new excel instance, then re-open the sheet in that, whenever I want to look at it side by side to another sheet. And even more annoying, if I'm opening a sheet that's an email attachment, I can't just open direct from Outlook, I have to save it locally so I can make sure to open it in the correct instance.

There's also this weird thing about how it thinks you want to shut down spreadsheets. I always forget which way is which, but sometimes it thinks you want to shut just the active sheet, sometimes it thinks every sheet you have open (in that excel instance). Just checked here and it doesn't seem to be the case on my home PC (Office Professional Plus 2010), but it definitely does on my work laptop. I think it's different behaviour between the X in the top right, double clicking the top left logo, and using alt+F4, I just can't remember which is which. But any logic would say that if you're showing multiple windows on the task bar, telling one to close doesn't mean close the others, and I think this is essentially tied to the same issue with trying to view sheets side by side.

I've never had issues with Word as others mentioned, though - possibly due to luck with versions I've used, or maybe I just don't have to compare two word documents side by side as often as I do with spreadsheets.


Excel is MDI so what I do is stretch the main window to two screens then put the child windows side by side.


Office 97 did not have italic title bars. Office 95 did. And I am not sure if Excel ever used fibers.


Can you imagine how crazy you'd think the customer was if you were giving technical support and they insisted they had to do this?


I don't think it's as crazy as telling them to pick their computer up and drop it on the desk: http://en.wikipedia.org/wiki/Apple_III#Design_flaws


Or baking your laser printer in the oven: http://h30434.www3.hp.com/t5/Other-Printing-Questions/Has-HP...


Reflowing stuff in a baking oven seems to be an increasingly common method of fixing electronics. Hell, my friend reflowed an RF dongle for a TI Chronos watch with a heat gun the other day.


Xbox 360 towel trick, anybody?


Anecdotally, I find customers are fine with (enjoy?) any physical / mechanical solution to a computer problem. It usually puts them back in their comfort zone. Wiggling the mouse is easier to explain than the control panel anyway...


Despite never being a Windows developer, I ended up reading Raymond Chen's "The Old New Thing" blog from start to finish while lying in bed with a nasty flu and wanting to die a couple years ago.

After that, I don't think anything somebody reports as a Windows application bug could make me think they're crazy.


The book's worth reading too - even if you've read the blog.


The customer? Didn't you mean the developer?

The dumbest customer will guess that a "solution" like that from support can only come from a design that sucks.

If it were a refrigerator instead of a spreadsheet it would be like support telling you that you have to keep petting the door of your refrigerator to keep it cooling.

Who is the crazy there?


I agree - but the simile is a bit exaggerated. It's more like having to tap the lever a bunch for a couple of seconds on a toaster to make it make toast when it should work the first time you interact. Annoying but also short-lived and not really worth replacing the device.


Or having to jiggle the toilet handle after flushing, a common problem that many live with.


That's a simple fix, though; you just shorten the chain between the flush handle and the flapper by a link or so.


Whenever I am waiting for something to happen, I tend to move my mouse pointer around in circles on the screen. This always caught people's attention, and I've been asked plenty of times over the years if there was a reason for it. I always joked that it helped make things go faster by letting the computer know I was still there and waiting. I guess I wasn't lying.


On my old 486 or first Pentium, when the computer would slow down and start thrashing/swapping, the screensaver would enable itself while I waited... further crashing the computer. So moving the mouse did indeed make the computer faster ;)

Later I became lazy and just pressed the ctrl or shift key. I still do it when watching movies with mplayer or in something browser-based (that does not disable the screenlock automatically).


I do exactly the same thing, because once upon a time I had a hand-crank modem: https://news.ycombinator.com/item?id=6576823


If I see a wall of text, I highlight it as I read through it. People think that's really odd too!


I do that all the time, even for small amounts of text, but only when using a mouse and not, say, a touchpad. I know other people who do it too.


Same here. I've always told myself that it helps my eye keep its position in the text. (Especially helpful if I'm distracted and turn away from the screen.) But if I really believed that, I'd be doing it consciously, whereas becoming aware that I'm doing it usually makes me stop...

P.S. I'm also slightly(?) OCDish in preferring selections that are integer fractions of the paragraph length. E.g. in a paragraph of 6.3 lines I'll tend to select 2.1 lines at a time (not _that_ precisely of course).


Whenever I encounter a spinning progress indicator, I try to spin my mouse in sync with it.


hehe, same here!


This is an 11-year-old knowledge base article for a bug in a product (Excel 97) that saw the light of day in 1997, 17 years ago. Yes, the second workaround is kind of hilarious, but let's not draw too many far-fetching conclusions from it. I believe many of us have seen crazier bugs.


This is by far craziest I've seen: "Error Message: Your Password Must Be at Least 18770 Characters and Cannot Repeat Any of Your Previous 30689 Passwords" [1]

[1] https://support.microsoft.com/kb/276304

Edit: typo


"Note that the number of required characters changes from 17,145 to 18,770 with the installation of SP1."

Wow. The story behind the extra 1625 characters must be good.


My $0.02 is on a dull explanation: the max length and #remembered parameters must be configurable and/or depend on the access method, and the printf-like call that creates the message string must have passed erroneous pointers pointing into some DLL. When that DLL changed, or when another DLL changed in size, causing the DLL to move, the pointers pointed to different, but still constant data.


What's crazy is Microsoft believing this is an acceptable solution rather than fixing/patching it.


They don't. It's listed as a workaround. The fix is Excel 2000.


The crazy part is they made you pay for the fix. Therefore, it was considered acceptable behavior for Excel 97.


Okay come on. It's not like this is a ridiculously common use case. Plus there are 2 other workarounds.

If you're playing with Oracle data that often, you're probably an enterprise customer who's going to buy the next version anyway.

It's not like this was happening every time you wanted to save or copy/paste data.


I agree it's not a very relevant problem that didn't affect too many people. Still, it shows a pretty lame bug (that is, if the Oracle folks did their job correctly). In 1997 it was already usual to offer updates through the internet.

It's still kind of embarrassing.


>I believe many of us have seen crazier bugs.

Might be, but is it a good sign for the computer industry as a whole?


It's a sign we're having trouble in handling the complexity of our computer systems. Programming is mostly human-mind-bound now (and has been for many years); we need better tools and better abstractions.


I think we need to remove that complexity and find a simpler approach to things, not just "better" (more complex) tools that layer even more complexity on top of complexity.


I also would recommend that approach! The problem is that you need more professional and better programmers who are aware of the problem and do reduce the complexity. Many self-indulgent programmers who think they are smart won't.

But the trend is more to hire more and more novice programmers (because they are cheaper) ... so we get more into trouble in the future!


Agree completely. I think that contemporary CS education is really responsible for this "more complexity is better" mentality, by encouraging overly general and abstracted solutions, and that abstraction should really be applied only when it's necessary and not as a "just because we can" thing.

Aspiring programmers could learn a lot from the demoscene, where astonishingly impressive things are done with very little code and complexity.


Let's rephrase my point: the existence of this knowledge base article should not be taken as a sign for ANYTHING. :)


>far-fetching conclusions

like bot detection on NT-based systems? Yes, let's not talk about that.


Excel does bot detection? :)


>> This problem has been reported when querying an ORACLE 7.3 data source by using the following ODBC drivers. Sqo32_73.dll is manufactured by Oracle Corporation. Microsoft makes no warranty, implied or otherwise, regarding this product's performance or reliability.

Maybe not the fault of Microsoft?


I'm always fascinated by how tightly bound various software is with UI code in Windows. It's the equivalent of random shell scripts having Xlib or Gnome dependencies just to show a progress meter.

I guess if it's an event loop thing one could also alternate 'z' and 'x' as fast as possible :-)


This was a major culture clash for me when I ended up working at a company that mainly sells Windows software. I do my best to stay objective about it and not jump to the conclusion that it's incredibly broken because of its hairiness.


Microsoft agrees that it's broken, which is why they're trying to get rid of the Windows desktop and its legacy APIs.


I think you are jumping to too many conclusions here, or maybe just not being precise about what "it" refers to. It may surprise you that it will still be possible to write poorly coupled code and create bugs with the WinRT APIs.


It's possible to do that for any system. My distaste for Microsoft is deep, but the ability to write crappy software does not say anything about Windows itself.


In your experience, is OSX or Linux graphical software any better? Genuinely curious. I feel like interface freezes happen on all major platforms, and always assumed it had more to do with the applications than with the platforms. I mean, writing interfaces that never block is a lot more work. But I don't have a lot of cross-platform experience to know if some platforms make it easier than others.


Layer violations are way easier in OSes that don't have memory protection. For example, in classical Mac OS, it wasn't unheard of, and actually fairly common, to have a driver write into video memory to give user feedback or even to have a driver pop up an alert, or to have an application move the mouse pointer by writing the appropriate memory locations in the kernel.

Historically, it was even worse: all desk accessories on Mac OS were running as loadable drivers (http://www.folklore.org/StoryView.py?story=Desk_Ornaments.tx.... Aside: that article shows that Jobs did actually design a GUI, the one for the calculator. More info at http://www.folklore.org/StoryView.py?project=Macintosh&story...)

I think Windows wasn't much different. There is/was an article about the mess that Microsoft's build system for Windows was, which explains how they spent lots of effort fixing layer violations in order to improve build times (they used to have full builds only once every few months. That meant that incompatibilities between, say, an improved menu system and the latest version of Explorer would take months to surface). Anybody remember that?


Not in my experience. I use IntelliJ IDEA (and previously used Eclipse) on OS X and Linux on a daily basis.

Anytime they start indexing or re-indexing a huge code base in a "background" thread, they end up locking the machine to varying extents. Linux is not as bad, as it only tends to lock up instances of the offending application... But that maybe because the Linux PC is a beefy workstation. OS X is on a much less powerful MacBook Pro, and freezes tend to lock up the whole system, or at least most of the other apps, especially Chrome.

I've also had problems with Chrome, and especially Flash, on OS X. Spinning beachballs on every page load. Click-to-flash was such a lifesaver, but I can now sorta see why Steve Jobs wanted Flash dead.

The spinning beachball has grown to be a frequent source of rage for me over the last 5 years. I'm surprised nobody else complains about it more. Maybe it only happens for heavy Java GUI-based apps?


I noticed that too with IntelliJ IDEA on Windows 7 :(

IntelliJ IDEA should run such background threads with lower priority! All common OSes support that, and Java supports it too: http://stackoverflow.com/questions/1617963/setting-priority-...

Can someone file a bug for it?


I stopped getting spinning beachballs when I upgraded to an SSD. OS X, especially pre-Mavericks is a real RAM hog. Java GUI apps have a reputation of being even worse RAM hogs. I wouldn't be surprised to find your spinning beach balls are from swapping to disk.

The only Java app I use is BucketExplorer though, which isn't very heavy.


I recently came across similar behaviour in Linux - if you are waiting in XNextEvent on one thread and you call glXSwapBuffers on another, it never returns. If you move the mouse, XNextEvent returns with a MotionNotify event and glXSwapBuffers can grab some internal X11 lock and complete. It then gets stuck on the next frame. The result is that you have to keep moving the mouse or nothing is drawn.

X11 is supposed to be multithreaded enough to allow this and it worked fine on older versions of Xlib, before the libxcb transition.

Fixed by using select() in the event loop for the curious: https://github.com/lcrs/6ilk/commit/d5c39abde09e0467a8a4d17d...
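The general shape of that fix, waiting in select() with a timeout instead of blocking inside the event call, can be sketched portably. In Xlib the descriptor would come from ConnectionNumber(display); here a socketpair stands in for the X connection, and event_loop and its parameters are hypothetical. This is not the code from the linked commit.

```python
import select
import socket


def event_loop(conn, frames, timeout=0.01):
    """Wait for input with a timeout instead of blocking indefinitely, so
    the loop still 'draws' a frame on every iteration even with no events."""
    events, drawn = [], 0
    for _ in range(frames):
        readable, _, _ = select.select([conn], [], [], timeout)
        if readable:                  # only read when data is actually waiting
            data = conn.recv(1024)
            if data:
                events.append(data)
        drawn += 1                    # stand-in for rendering and swapping buffers
    return events, drawn
```

Because select() returns after at most `timeout` seconds, rendering no longer depends on a MotionNotify arriving to unblock the event thread.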


Yes, it has more to do with being able to usually get away with methods that really belong in the trash.

My "first love" in OS's is AmigaOS, and while the OS is very dated in many ways (no memory protection, no SMP support), one of the things it really got right was to encourage extremely extensive multithreading. On Amiga's it was a necessity if you wanted to have full multitasking, as the machines were slow enough and memory constrained enough that not doing it would severely limit usability (frankly, it would have done PC's a world of good too, but the PC world took the simpler approach of not even trying for proper multi-tasking).

My favourite example is how cut and paste from the console worked.

It involves the device handlers for keyboard and mouse input; the Intuition input handler, which processes the raw input events and turns them into input events for specific windows; and the console.device device handler, which takes Intuition events for console windows and "cooks" them into higher-level events that get passed to the console-handler. The console-handler then finds the area you are selecting and calls a function that passes the buffer to the clipboard.device, which creates a clipboard entry that gets written to a disk volume. That involves the filesystem handler for that filesystem, which in turn likely involves a device such as ram.device or trackdisk.device to write it to the actual device.

Every single one of these steps is handled by a separate thread/process (the distinction doesn't mean much on AmigaOS due to the lack of memory protection, and are usually referred to as "task" instead).

The reason for this is all down to responsiveness: the input devices and things like trackdisk.device must deal with hardware, and so must have priority. But if the rest of the flow were given high priority, the system would be sluggish, so the minimum amount of work is done, put into a message, tacked onto a queue, and things are off to a start.

At the opposite end, things like clipboard.device must not lock up when clipboard entries are added, as while the clipboards are usually in RAM:, they are files on a filesystem, and a user with only 512KB RAM and possibly no harddrive might in fact be using a floppy drive for the clipboard - forcing the user to wait for a floppy write would have been intolerable (and why Amiga users loved to mock Windows 3.x users back in the day).

This permeated through many applications as well. It was a matter of pride for many developers, and the first chapters in many Amiga developer books tended to involve Exec (Amiga's "kernel", or parts of it) which provided a set of library functions for managing messages, lists of messages and message ports, as they were essential for talking to the OS, but also readily available for application developers. For many, before you'd written your first "hello world" app you had gotten a crash course in making things asynchronous by default.

Today hardly a day goes by without my browser freezing, or Thunderbird freezing, or some other application, both on Linux at home and OS X at work. And on the few occasions I've had to work on Windows machines, there too. Every time it happens, I dream wistfully about a world where people understood how to write software that way.

It is, in fact, not all that hard: "All it takes" is to subdivide your application into smaller components that only communicate using async message queues. Incidentally it makes the apps easier to test too, and easier to make robust (unlike under AmigaOS, on a modern OS you can separate components on process boundaries where it makes sense too, and automatically restart failed components), and it makes it easy to make them scriptable etc.
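The "small components that only talk through async queues" structure can be sketched with threads and queues. This is a hypothetical, heavily simplified stand-in for the Amiga chain described earlier (two stages instead of a dozen, strings instead of input events); stage and run_pipeline are made-up names.

```python
import queue
import threading


def stage(inbox, outbox, transform):
    """One component: take a message, do a small amount of work, pass the
    result on. None is the shutdown message and is forwarded downstream."""
    while True:
        msg = inbox.get()
        if msg is None:
            if outbox is not None:
                outbox.put(None)
            return
        result = transform(msg)
        if outbox is not None:
            outbox.put(result)


def run_pipeline(raw_events):
    raw, cooked = queue.Queue(), queue.Queue()
    clipboard = []   # stand-in for the slow storage at the end of the chain
    workers = [
        threading.Thread(target=stage, args=(raw, cooked, str.strip)),         # "cook" input
        threading.Thread(target=stage, args=(cooked, None, clipboard.append)),  # store it
    ]
    for w in workers:
        w.start()
    for ev in raw_events:
        raw.put(ev)          # the producer never waits for the slow end of the chain
    raw.put(None)
    for w in workers:
        w.join()
    return clipboard
```

Each stage could be given its own priority, tested on its own by feeding its inbox directly, or moved into a separate process, which is the robustness and testability benefit the comment points to.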


frankly, it would have done PC's a world of good too

Ah, the MS OS/2 2.0 fiasco. I wrote before about http://www.groklaw.net/pdf/iowa/www.iowaconsumercase.org/011... and how it ignores the limitations of the 32-bit Windows extenders, including the lack of preemptive multitasking.


Microsoft designed and built the OS and two of the three applications involved. And the third application was almost certainly built using Microsoft development tools according to Microsoft guidelines. How could this not be their fault?


When I acquire a mutex inside my callback, it deadlocks. Microsoft wrote the calling function and the mutex implementation! Stupid Microsoft!

Never mind that my callback violated the caller's threading model and that mutices introduce deadlocks when used incorrectly... They should fix the bug!
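The deadlock scenario described here is easy to reproduce: the caller holds a non-reentrant lock while invoking a callback, and the callback tries to take the same lock. The names below are hypothetical, and the timeout is only there so the deadlock shows up as a False return instead of a hang.

```python
import threading

lock = threading.Lock()   # non-reentrant, like a plain mutex


def call_with_callback(callback):
    """Library-side code: holds the lock while it invokes the user callback."""
    with lock:
        return callback()


def bad_callback():
    # Tries to take the lock the caller already holds on this same thread.
    # A blocking acquire() would hang forever; the timeout makes it visible.
    got_it = lock.acquire(timeout=0.1)
    if got_it:
        lock.release()
    return got_it
```

A callback that respects the caller's contract, e.g. call_with_callback(lambda: "ok"), returns normally; the bug lives in the callback, not in the lock or the caller.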

Edit: I guess what I was trying to say is that this part needs a huge "citation needed":

> was almost certainly built ... according to Microsoft guidelines

We don't know that at all. When you insert your code into another process, bugs in your code have the potential to destabilize that process. This discussion is extremely light on details but we cannot assume that the caller, rather than the callee, is at fault. If it's working correctly with another driver, I say, look at the suspicious driver, it's likely got a bug.


When was the last time you saw a mutex deadlock that could be worked around by moving the mouse?

You're right that this discussion is light on details, but there is one relevant detail from which reasonable conclusions can be drawn: there is a workaround that involves moving the mouse around continuously. I can't think of any plausible scenario that could produce that behavior that does not involve some blatantly horrible design decision by Microsoft (hence "their fault"). It's possible that such a scenario exists, but I can't think of one, and I have not seen any such plausible scenario proposed by anyone else. I also know that Microsoft products are chock-full of blatantly horrible design decisions. In particular, Windows tightly couples the graphical UI with the OS kernel in a way that other OSes do not. It seems to me exceedingly likely that whatever is causing this problem has something to do with that.


Elsewhere on this thread there is the suggestion that this could have to do with improper use of message pumps... Message pumps as a concept have equivalents in pretty much every UI framework I've looked at. (Some examples: run loops in Cocoa, g_main_loop in glib/Gtk+.) If you violate the run loop's threading requirements, it's not totally inconceivable that a mouse event could affect behavior.

I suggest it might be helpful to learn how the system works before you bash it.


I am familiar with the design of message queues, though not specifically with the design of Windows message pumps.

> If you violate the run loop's threading requirements it's not totally inconceivable that a mouse event could affect behavior.

That's true. But the possibility remains that the design of Windows message queues is so byzantine and imposes so many arcane requirements on the developer that it is easy to miss something. In which case the question of fault is still arguable even if the immediate bug is in Oracle's code. I have a very hard time imagining any circumstances under which an OS could allow a mouse event to impact a database driver -- even if the database driver is buggy -- and still be considered well (or even reasonably) designed. But I'm certainly open to the possibility that I've overlooked something.


It looks more to me that there's some problem with the background polling mechanism in the app, as implied by solution #1. Presumably something in the mouse event handler does something that prods the polling code into life, perhaps because something somewhere is trying to update the list of things on screen in order to handle hovering. Kind of hard to say without looking at the code, though...

(The Windows message queue is actually fairly straightforward, and if you read the documentation then it's simpler still. The somewhat magical WM_PAINT message is a bit ugly, but aside from that it's actually quite difficult to get things massively wrong. Which is probably why this sort of freakish bug is rather rare.)


As someone who has worked with Win32 for a long time, I think the problem is quite simple; in the message loop body someone put something like

    if (dataAvailable())
        fetchData();
and assumed regular window messages would drive this process. If you don't move the mouse or do anything else with the window, no messages are sent to it (unless a timer or something else sends one), so the message loop just waits for the next message and that code never runs. If you move the mouse around in the window, a constant stream of WM_MOUSEMOVE messages drives the loop. Polling really shouldn't be done in the main message loop; one way to fix this is to move to a completely event-driven system, where the code that detects available data sends a message that causes the main loop to run fetchData().
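Here is a toy, portable C model of the polling-in-the-message-loop theory above. It assumes the setup described there. The queue, the message IDs and `get()` are hypothetical stand-ins. A real Win32 `GetMessage()` blocks until a message arrives instead of returning false. The real event-driven fix would typically have the worker thread `PostMessage()` a custom `WM_APP`-range message to the window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message IDs: MSG_MOUSEMOVE stands in for WM_MOUSEMOVE,
   MSG_DATA_READY for a WM_APP-style custom message. Not the Win32 API. */
enum msg_id { MSG_NONE, MSG_MOUSEMOVE, MSG_DATA_READY };

#define QCAP 64

struct queue {
    enum msg_id items[QCAP];
    size_t head, count;
};

static void post(struct queue *q, enum msg_id m) {
    assert(q->count < QCAP);
    q->items[(q->head + q->count) % QCAP] = m;
    q->count++;
}

/* Returns false when the queue is empty; real GetMessage() would block here. */
static bool get(struct queue *q, enum msg_id *out) {
    if (q->count == 0)
        return false;
    *out = q->items[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return true;
}

struct source { bool ready; int fetched; };

/* Buggy pattern: the poll only runs when some unrelated message arrives. */
static int run_polling_loop(struct queue *q, struct source *src) {
    enum msg_id m;
    while (get(q, &m)) {
        /* ...dispatch m to the window procedure... */
        if (src->ready) {      /* dataAvailable() */
            src->ready = false;
            src->fetched++;    /* fetchData() */
        }
    }
    return src->fetched;
}

/* Event-driven fix: whoever notices the data posts a message for it. */
static void on_data_arrived(struct queue *q, struct source *src) {
    src->ready = true;
    post(q, MSG_DATA_READY);
}

static int run_event_loop(struct queue *q, struct source *src) {
    enum msg_id m;
    while (get(q, &m)) {
        if (m == MSG_DATA_READY && src->ready) {
            src->ready = false;
            src->fetched++;
        }
    }
    return src->fetched;
}
```

When data is ready but the queue is empty, the polling loop never fetches, and a single mouse-move message is enough to unstick it. The event-driven loop fetches as soon as its own message is posted.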


That sounds like a very plausible theory.


You overlooked the fact that the programming model of windowed/GUI applications is quite complex and very sensitive to non-cooperative code: a thread that frequently blocks for 100 ms can cause a very unpleasant hiccup. The introduction of the OLE/COM interprocess communication mechanism made things worse, as one unresponsive application could block other programs and halt their message loops. But it was quite efficient, and you didn't have to spin up multiple threads to have multitasking in your application.


> You overlooked the fact that the programming model of windowed/GUI applications is quite complex and very sensitive to non-cooperative code: a thread that frequently blocks for 100 ms can cause a very unpleasant hiccup.

No, I didn't overlook this. But this is 2013. IMO, a multitasking architecture that is "sensitive to non-cooperative code" ought to be considered broken, and so if that's the cause of this problem, that still counts as Microsoft's fault in my book.


It's now 2014, actually, and you're talking about a problem from 1997, in an OS where backwards compatibility has been an extremely high priority since the 1980s.


> It's now 2014, actually

Is that really the kind of discussion you want this to be?

> you're talking about a problem from 1997,

The last revision of the article is 2002. But your point is well taken. I had assumed this was a current issue. (Maybe it is. I don't have time to dig into those details right now.)

> in an OS where backwards compatibility has been an extremely high priority since the 1980s

OS X was introduced in 2001, and it could run OS 9 applications, so these kinds of problems can be solved if a company decides it is worthwhile to solve them.


> Is that really the kind of discussion you want this to be?

Yes. If you're going to chastise people based on what year it is, you damn well better be right about what year it is.

> OS X was introduced in 2001, and it could run OS 9 applications

OS X booted a modified copy of OS 9 to accomplish that. It didn't fix anything about OS 9. It also wasn't present on Intel Macs, nor on Leopard, the last PowerPC-compatible version of OS X. Nor was it fully compatible -- I always had trouble with it, and have kept older System 7 and OS 9 Macs around rather than put up with it.


In that case I look forward to the lisper UI toolkit, which stays responsive when application code stalls on a UI thread. I'll bet it goes well with the lisper dynamic linker, which I hear completely isolates me from bugs of code living in the same address space.

Kidding aside, I can't help but think that you keep going back to that "clearly the mutex is broken" analogy I mentioned earlier. If I read you correctly, it doesn't matter if the Oracle code is breaking the rules set out by the MS documentation, Microsoft should account for all possible bugs past, present and future and prevent them. Seems like that's asking a lot. The other platforms I've worked with don't seem to satisfy this either.

What bothers me more, though, is how preemptively dismissive you've been about Win32 while showing something of a lack of knowledge on how it works. I don't think Win32 is perfect. (Ask me about filesystem behaviors some time.) At the same time I have gotten to know it and can appreciate areas where it works well. I would also hope that before making the kinds of comments you are making about any platform, I'd get to know the framework a bit better first.


> I look forward to the lisper UI toolkit

I might not be able to show you such a toolkit, but I can certainly show you an operating system that allows you to write a database driver whose behavior is not affected by the user moving the mouse.

> it doesn't matter if the Oracle code is breaking the rules set out by the MS documentation

Of course it matters. I just couldn't imagine what rule there could possibly be that is both 1) broken by a database driver and 2) reasonable, that could possibly result in said driver displaying the behavior in question. (I can imagine such a rule now because to3m pointed out a reasonable possibility in another branch of this thread.)

> how preemptively dismissive you've been about Win32 while showing something of a lack of knowledge on how it works

I freely confess to being (at least potentially) prejudiced against Microsoft's technology by my anger at the fact that they reached market dominance by breaking the law. As a result of that prejudice, I have maintained a carefully cultivated ignorance of all things Microsoft, until recently. In the past weeks I have found myself in a position where I had to do some Windows development for the first time in my career. So I am far from an expert, but I do now speak from firsthand experience when I say that, in my humble opinion, Windows is every bit the horrible monstrosity on the inside that it has always appeared to me to be on the outside. And, BTW, my opinion is shared by colleagues who know a lot more about it than I do. So yes, my opinion on this is not as well informed as it might or should be, but I have not reached it in a total vacuum either.


It seems clear that the window message pump was being used by the Oracle driver in a way that made it break when hosted in a GUI application. Message pumps are not atypical of event-driven GUIs on many different platforms, and problems with multithreading abound everywhere shared memory is the default. IMO it's a big stretch to try and blame MS for this.


> It seems clear that the window message pump was being used by the Oracle driver in a way that made it break when hosted in a GUI application.

It's not clear to me, but maybe you know something I don't. What is it exactly that makes this clear? And what is the "way" in which the message pump is being used that makes it break in a GUI app?


Just because they give you a C compiler and a library to link doesn't mean you blame the compiler developer for letting the user shoot himself in the foot.


If I understood it correctly, the problem isn't that Oracle is using the Microsoft libraries and compilers incorrectly, it's the other way around. The two applications that are broken when using Oracle's ODBC drivers are actually Microsoft's, so it superficially looks like the company with the user-facing problems either doesn't want to fix their bugs or doesn't want to work around the bugs in the ODBC driver.

This isn't too uncommon, actually, but what usually happens is that the company with the upper hand prevails and the other one has to fix the bugs, lest they piss off their customers, who would then leave. However, when both companies keep their users in tight vendor lock-in, they just wave their cocks around for a few months blaming each other and settle for the users working around the problem, since it's the cheapest alternative.


That's true, but it's a straw-man argument, because Microsoft did not just produce the compiler. They produced the compiler AND the operating system AND two of the three applications in question.


An aside of historical trivia: On an Amiga 1000, you used to be able to wiggle the mouse "too fast" and cause the machine to crash with its infamous "GURU MEDITATION ERROR"[1].

1. http://simhq.com/forum/files/usergals/2013/02/full-4656-5091...


I wrote an ncurses frontend to the ticketing system used at a job one time. One of the users complained that if they resized their terminal window rapidly, larger and smaller, it would crash.

The fix was telling them to stop doing that. :)


I guess both of these were related to nested interrupts somehow?


This reminds me a bit of how networking works on OS X and iOS (at least if you are using the "standard APIs" the way they are supposed to be used): the networking that is going on is tied to a runloop. Some apps tie the networking stuff to the main run loop. You can see which apps do that by simply opening a context menu somewhere in the app, or opening a regular menu from the app's main menu. The run loop will be "halted" for as long as the menu is open, and thus your networking will stop. As soon as you dismiss the menu, the networking will continue.

At first this may seem totally bullshitty: Imagine a download manager. Do you really want the download to pause every time you open a context menu? Well it turns out it is not such a bad idea in many cases: What if the context menu allows you to cancel the download? If the download were to go on in the background you would have to explicitly take care of that. If you tie the networking callbacks to your main runloop this simply can't happen.

Of course there are also a lot of use cases where you want your networking code to not have anything to do with your main runloop...


This isn't really correct. Runloops have modes. Often networking happens in a mode that is not restricted (see NSRunLoopCommonModes here https://developer.apple.com/library/ios/documentation/cocoa/... ).

If you are running your network code in default mode on the main thread (which is problematic; it's probably better to use async methods these days), then yes, that can happen, but it's usually programmer error.

On Mac, entering a modal window will usually put the runloop in a mode where regular events won't fire and only those that matter to the modal window will. The common modes almost always fire, though.

I work on Apportable (YC2011) and we have reimplemented CFRunLoop/NSRunLoop down to the bare metal. We find a lot of misconceptions about how it works.
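Here is a toy C model of the run loop modes described above. It assumes that an open menu or a modal session runs the loop in a tracking mode. Everything in it is hypothetical and only loosely mirrors `NSDefaultRunLoopMode`, `NSEventTrackingRunLoopMode` and `NSRunLoopCommonModes`. The real API is CFRunLoop/NSRunLoop.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only; these names are not Apple's API. */
enum rl_mode { MODE_DEFAULT = 1 << 0, MODE_EVENT_TRACKING = 1 << 1 };
#define MODE_COMMON (MODE_DEFAULT | MODE_EVENT_TRACKING)
#define MAX_SOURCES 8

struct rl_source {
    const char *name;
    unsigned modes; /* bitmask of modes this source is registered in */
    int fired;
};

struct run_loop {
    struct rl_source *sources[MAX_SOURCES];
    size_t count;
};

static void add_source(struct run_loop *rl, struct rl_source *src) {
    assert(rl->count < MAX_SOURCES);
    rl->sources[rl->count++] = src;
}

/* One pass of the loop in `current` mode: only sources registered for it fire. */
static void run_once(struct run_loop *rl, enum rl_mode current) {
    for (size_t i = 0; i < rl->count; i++)
        if (rl->sources[i]->modes & (unsigned)current)
            rl->sources[i]->fired++;
}
```

A source registered only for the default mode stalls while the tracking mode is active, which matches the "download pauses while a menu is open" behaviour. A source registered for the common set keeps firing.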


Thanks for the clarification.

I am aware of what you mention in your answer. In my statement I simplified quite a bit. I know that runloops have different modes. What I was saying is that if you just use the "defaults" then you will see the phenomenon that I described.

And I don't think that it is a programmer error if you use the defaults... but sure: These days you would probably just use a framework or take care of these details...

May I ask why you have reimplemented CFRunLoop?


It's almost always programmer error to use the default mode for NSURLConnection unless you have a very specific reason, especially on the Mac, where the platform uses more runloop modes that don't include the default mode. File IO is usually OK in the default mode because it makes code easier and you don't have to worry about timeouts.

We make Objective-C run on Android at Apportable :-) The entire stack. Everything from clang, GCD, CoreFoundation, Foundation, UIKit, and dozens of comm frameworks. It's fairly popular with game developers.

We recently re-developed all of Foundation from scratch on top of the parts of CFLite that Apple open sources, and no longer use GNUstep.

http://docs.apportable.com/release-notes.html#1100

We have even open sourced all our tests around Foundation as well and this is one of them. Check it out:

https://github.com/apportable/FoundationTests


I still don't agree that it is programmer error per se... but we don't have to agree on that. :)

Your idea is somewhat similar to "our" idea... at Objective-Cloud we try to bring Objective-C to the cloud... :) You try to bring it to Android. What a coincidence. Nice to meet you. :)


We should talk off list. :-)


:) My email address is in my profile in case you did not already find it...


> And I don't think that it is a programmer error if you use the defaults... but sure: These days you would probably just use a framework or take care of these details...

It's usually a programmer error when you write code that does something other than what you intend, regardless of whether or not what you intend to do is the default behaviour.

I think this is perfectly understandable behaviour for a framework of such a general purpose. The whole convention-over-configuration thing works for tools like Ruby on Rails and web2py. 90% of the applications written using them are pretty much identical, and the non-technical constraints favour quick delivery over well-adapted architecture, so you can afford to make it obnoxiously hard for the remaining 10%.


Classic Mac OS had non-preemptive multitasking, which meant that a program would only have a chance to transfer data once all other programs released the CPU. The default development suite also had several loops where it wouldn't release the CPU. There are some stories around about Mac OS servers that stopped serving content because somebody left focus on the wrong element, or left the mouse on a menu in some application.

But both OS X and iOS are preemptive. Networking just keeps working, but your interface thread may stop for a while depending on how you write your programs.


That's not what he is saying. OS level networking keeps working. But an application that processes network events in the same context as it is processing UI events will cause its network related stuff to halt in some instances.

That's not the OS's fault; that's "just" bad programmers, and just as trivial to do wrong on Linux or any other OS.


My pet theory about Microsoft is that worse-is-better provides an overwhelming first-mover advantage, and an even more overwhelming second-mover advantage in the longer term (provided the first mover followed the WiB strategy.) I call this 'technical living-beyond-your-means' (rhymes with 'technical debt'). The flame that burns twice as high burns half as long, yada yada.

Eventually, you will be kicking yourself for not having made it right the first time.

The corollary is that eventually we will all be running Plan 9, so there are probably other factors at play as well, which I will quietly brush under the carpet, like air friction in a high school physics question. ;)


I have spent the past two years learning this lesson. It's so much more work to maintain a project that was not put together correctly in the beginning.


Many moons ago I had a dual Pentium Pro and had installed Slackware with one of the early Linux kernels to include SMP. When clicking on a link in Mozilla (or it might have been Netscape back then), the new website would only be rendered after I jiggled the mouse. I just figured it was some race condition in the SMP kernel that got unstuck when the PS/2 interrupt fired. It went away with some kernel upgrade, but I still jiggle the mouse every now and then in the hope that a page will render faster...


It may look weird or stupid, but it makes sense to publish it because it actually works.

I don't know what causes this kind of issue, but I remember using this workaround at times since Windows 95. It somehow kept either the application or Windows from freezing.

On the other hand, jiggling the mouse around can also have the adverse effect of crashing some other apps under other conditions, such as loading screens in games.


Internet Explorer (v2 or v3, I can't remember) suffered from a similar bug: pages would load, or load faster, if you "gave more CPU" to the thread by moving your mouse over the window. No one ever believed me, but now I feel a bit less crazy for thinking that worked.


I remember that too with IE 4 on Win95.

IE still has the same UI and mouse feel as Spyglass Mosaic. IE 1 started from its source code, and even after the major improvements of IE 3 it still feels very similar. (IE up to IE 6 mentioned "Spyglass source code" in the About dialog.)

Even the print preview window of IE 11 looks like the same window in NCSA Mosaic 2 (1995). (I just downloaded it and ran it on Win7.)


Nothing new for Microsoft: there is a reason they had to drop Office 95 and completely reimplement it. I guess the day will come when the current implementation must be abandoned too.

The trouble with today's computing is that there is just too much complexity around (starting with the OS, and that is also true of Linux, I must say!) and too few really good programmers who make things better rather than worse. In the Linux world, too, I see too much complexity and too few real professionals.

Real professionals don't try to handle complexity -- they try to minimize it!


The issue with simplicity is that computing is a complex problem.

It's all well and good to say "Just keep it simple!" It's another to implement that.

I built and maintain a fairly popular RubyMotion library. Its goal is pretty simple: DSL out the ViewController hierarchy management and make it manageable for the application developer.

Every bit of code I add to what is now a fairly mature and full featured system pains me. But every bit of code addresses some edge case that we didn't think of, but which is quite valid.

An OS (like Linux, Unix, Windows) has orders of magnitude more issues to deal with than I do. In order to keep the system simple for users, they often have to add complexity to their code.

Simple for users just isn't the same as simple under the hood.


>Simple for users just isn't the same as simple under the hood.

Right, and that is exactly why we are in desperate need of truly professional computer scientists (or programmers, whatever you want to call them). But the trend is the other way around. Everybody is looking for the quick solution and the cheapest programmers, or hackers (I even see many job offers that ask for hackers rather than professionals).

It takes a real professional to cope with the complexity and reduce it to the minimum. But with the current trend, we will see things getting worse and worse, because we have far too much coding and far too few professionals (and even they often can't afford to do a clean job).


It still happens. Here in Mexico, at the government agency that collects taxes, I have seen that when they send something to print or export PDFs, they keep moving the mouse, or else their progress bar doesn't move at all. They have made this second nature; I think it is part of the training they receive. Maybe some devs there thought it was the only way to get rid of that bug: JUST KEEP MOVING THE MOUSE!

Agency url: http://www.sat.gob.mx/sitio_internet/home.asp


Sounds like they do their long-running process in the event loop. It's an easy mistake to make for someone writing a Windows app for the first time.
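One way to keep a long job in the event loop without freezing it is to do a bounded chunk of work per loop turn and let pending input be handled in between. Below is a minimal sketch of that approach. The export job, `loop_state` and the page counts are hypothetical. The other common fix is to move the job to a worker thread that posts progress back to the UI thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical long-running job, e.g. exporting a report page by page. */
struct export_job {
    size_t next_page;
    size_t total_pages;
    size_t progress_updates;
};

/* Do a bounded amount of work; return true while pages remain. */
static bool export_step(struct export_job *job, size_t pages_per_step) {
    assert(pages_per_step > 0);
    size_t end = job->next_page + pages_per_step;
    if (end > job->total_pages)
        end = job->total_pages;
    while (job->next_page < end) {
        /* render_page(job->next_page) would go here (hypothetical) */
        job->next_page++;
    }
    job->progress_updates++; /* e.g. advance the progress bar */
    return job->next_page < job->total_pages;
}

/* Stand-in for the window's pending input and paint messages. */
struct loop_state {
    size_t pending_input;
    size_t handled_input;
};

/* One turn of the event loop: pending input first, otherwise one work step.
   Returns false once there is nothing left to do. */
static bool loop_turn(struct loop_state *ls, struct export_job *job,
                      size_t pages_per_step) {
    if (ls->pending_input > 0) {
        ls->pending_input--;
        ls->handled_input++;
        return true;
    }
    return export_step(job, pages_per_step);
}
```

Because each turn does at most one small step, input queued while the export runs is serviced within one step's worth of work. A single call that rendered all pages before returning to the loop would not behave this way.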


How random these workarounds are reminds me of a joke that made the rounds on the internet more than a decade ago, "If Cars Were Built Like Computers". The best line is:

> Occasionally for no reason whatsoever, your car would lock the door and refuse to let you in until you simultaneously lifted the door handle, turned the key, and grabbed hold of the radio antenna.


This made me laugh out loud in the old school sense of this phrase.


"in the old school sense of this phrase"

You mean it made you actually laugh out loud?


You bet. A loud laughing noise came out of my vocal cords. I felt like I was on IRC in the early 90s. Remember that feeling? Laughing, out loud? Feels great. Kids today with their loling will never know....


I don't really understand why people make fun of this. Would you rather they do not provide a workaround for the bug? And yes as every programmer should know, workarounds can be sometimes odd (I did similar things in the past to work around things).

And yes of course providing a bug fix is better, but providing a workaround is always the first step. A workaround is plenty good enough for many people for whom updating is more risk and hassle.

The whole thing reminds me a bit of the classic "known knowns" episode with Rummy where he dared to bring some decision theory into an interview answer ( http://en.wikipedia.org/wiki/There_are_known_knowns ) The reaction to it really showed more about journalists than about the statement.


I wonder if this is just feel-good nonsense for the user or if it's actually because moving the mouse causes the message loop to pump and thus prevents something or other from timing out...


I clearly remember the old times with Windows 98/ME, when if you kept moving the mouse, long-running operations like directory copying would be faster, thanks to CPU power saving not kicking in.


True, but back then there was no "CPU power saving" mode. The CPU was slow and it always ran at full speed (power was cheap back then).

In the Windows NT series (4, 2000, XP, etc.) the scheduler gives the foreground application a higher priority, so it gets more CPU time than all other applications. This was probably similar in the Win 3.x and 9x series, but probably with many hacks and glue code on top of DOS.

Moving the mouse also prevented the screensaver from starting: an application that would "save" the screen by drawing fancy graphics to prevent static phosphor burn-in on CRT monitors, and that slowed down the system.


Not completely true, there was some kind of power saving starting in those days:

If I remember correctly, one of the first moves towards power saving in a stock OS was when Win 95/98/ME (one of those) started using the HLT instruction (halt the processor until the next interrupt) in the main OS; until then it was a busy loop.

Notebook manufacturers (I worked with machines from Toshiba and Compaq in the 90s...) provided DOS TSRs and windows applications and drivers to control the CPU speed and display brightness.


that's the reason why win9x in emulators like qemu and vbox will cause those emulators to eat up the entire CPU.


So back in the early-to-mid 90s, the days of terminals and 24.4K baud modems, I would FTP into a university server to download game demos, and in many cases, to get the progress bar moving beyond a slow crawl, I would have to move the mouse around (this was on Windows 95 or something similar). The faster I moved it, the faster the download seemed to go. Looking back, I think of it as one of those times when I was just dumb and didn't know how computers worked. But maybe it really was mouse powered...


When I was working in my college IT department, I had a computer come in once that was refusing to load things. I started up firefox and nothing was really moving, then I started moving the mouse around and all of a sudden it worked. I took a look at some other things and noticed even the system clock didn't move until I moved the mouse. Never did find out what the cause was, just reimaged it and called it a day.


Does anybody remember the same thing happening with YouTube? Sometimes you had to keep moving the pointer for the video to load correctly...


Coworker had this at a dayjob recently - it involved a full screen Windows Remote Desktop run from a Citrix Remote Desktop, on a bad day run from another remote desktop. If you didn't move the mouse enough: the connection did indeed fail. The connection took 4-5 minutes anyway, so quite annoying.


Oh, so this actually works? How wrong I was all those times I scoffed at people moving their mouse around while waiting for a program.

On a related note: isn't it interesting how this meme (mostly a placebo) traveled to the majority of computer users in a pre-internet age?


Back in the day, there was a windows based IDE/simulator for the PIC 8-bit microcontrollers. The simulation ran much (2-5 times?) faster if you wiggled the mouse. Someone wrote an app to send fake mouse events to the simulator window.


Haha, that's awesome.


I thought that was a joke... :-/ I don't know if it's hilarious or just sad...


iOS has a similar (some may say the opposite) problem, especially when setting up notification-based APIs in an app that expects a notification to arrive within a certain period of time (i.e. a user needs their location now to complete an action, say a social post with location). Sleeping won't work, and neither will waiting on another thread, because there has to be activity on the main thread for the main runloop to execute. What works is looping until a timeout: sleep for a few ms, then run the main thread's runloop explicitly. Gross, but it works. Otherwise, the location notifications never arrive. Such is the downside of cooperative multitasking.
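Here is a portable C sketch of the "sleep, pump, repeat until timeout" workaround described above. `pump_fn` and `fake_location` are hypothetical stand-ins. On iOS the pump step would be something like a short `CFRunLoopRunInMode()` call on the main thread. A completion callback is usually the cleaner design.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical pump: runs one short pass of pending main-thread callbacks. */
typedef void (*pump_fn)(void *ctx);

static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Sleep briefly, then pump, until *done is set or timeout_s elapses.
   Returns true if the awaited event arrived in time. */
static bool wait_by_pumping(pump_fn pump, void *ctx, const bool *done,
                            double timeout_s) {
    assert(pump != NULL && done != NULL);
    const struct timespec nap = {0, 2 * 1000 * 1000}; /* 2 ms */
    double deadline = now_seconds() + timeout_s;
    while (!*done) {
        if (now_seconds() >= deadline)
            return false;
        nanosleep(&nap, NULL);
        pump(ctx); /* callbacks that set *done only run inside the pump */
    }
    return true;
}

/* Example context: a fake location service whose callback fires on the Nth pump. */
struct fake_location {
    int pumps_until_fix;
    bool have_fix;
};

static void fake_location_pump(void *ctx) {
    struct fake_location *loc = ctx;
    if (!loc->have_fix && --loc->pumps_until_fix <= 0)
        loc->have_fix = true;
}
```

Just sleeping without calling the pump would never set `have_fix`, because the callback only runs when the pump gives the loop a turn.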


nice way to say "it's not a bug, it's a feature".


I actually had a similar problem installing a copy of Windows 2000 years ago: the only way to get the install to complete was to sit there moving the mouse.


How does this work? What makes mouse movement relevant to this... this is computers, not magic.


Other folks have answered. I'll try to summarize:

It's an impedance mismatch: you have a long-running background operation (a SQL query) that is running within the context of an event-driven system (a windowing GUI). In this case the problem is that the subsystem that runs the query only receives cycles when the app's main thread gets mouse events. This is due to poor application design, of course, though not necessarily to the extent that it may seem; sometimes this sort of thing is a harder problem than it looks.


The exact cause would be a question for (old, probably retired) members of the Excel team at MS, but bugs like this were fairly common when working with Windows apps that mixed interprocess communication with UI updates.

Raymond Chen has a great example of a similar bug here: http://blogs.msdn.com/b/oldnewthing/archive/2005/02/17/37530...


It's probably running out of entropy. If windows is anything like Linux, it will use certain random events like mouse movements, keystrokes, boot time etc. to generate randomness.


Probably not in this case. Windows' RNG does not block when it runs low on entropy. AFAIK only Linux does that. And the task is a database query so probably does not use the RNG.

But yes, I have seen a similar case involving linux /dev/random - the user reported that outgoing emails were sometimes delayed for hours and that moving a mouse over a VNC window would sometimes speed it up. I did not believe that at first, but it was exim4 running out of entropy when generating TLS session keys. Worse, it was on a VPS with about 20 exim4 processes competing for the entropy.
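In code you control, the usual way to avoid blocking on entropy is to read key material from `/dev/urandom` instead of `/dev/random`. Below is a minimal Unix-only sketch that assumes the device is present. It says nothing about how exim or its TLS library was configured in that setup.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Fill buf with len bytes from /dev/urandom. Unlike /dev/random on Linux
   kernels of that era, it does not block when the kernel's entropy
   estimate runs low. Returns true only if all len bytes were read. */
static bool read_urandom(unsigned char *buf, size_t len) {
    assert(buf != NULL || len == 0);
    FILE *f = fopen("/dev/urandom", "rb");
    if (f == NULL)
        return false;
    size_t got = fread(buf, 1, len, f);
    fclose(f);
    return got == len;
}
```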


Bug ridden horde of elephants...


Method4: just turn off the screensaver/standby?


Seems it's more about incorrect implementation of an event loop rather than a change in active processes. I assume the purpose of the screensaver is to save the screen, not to sleep other processes when it is activated. I further assume that hardware interrupts and event creation are the things happening while moving the mouse continuously. Based on these assumptions, I'm almost certain the screensaver isn't the problem.


[deleted]


Seems you're a new user - so a healthy tip: Keep those kinds of image macros for Reddit. You'll notice that the upvoted comments here are those that contribute thoughtfully to the discussion.


Wth?!


And if you do not have a mouse ? :D


Just a wild guess, given a post describing why the problem happens: this being Windows 95, you use the Alt key to bring up the window management drop-down, choose Move, and just move the modal window around with the keys. That causes the window refresh event to fire, and the conflicting threads continue to run properly.


When I was a kid, Windows ME would hang on startup unless I kept moving the mouse.

My friend's dad, a Math/CS teacher, did not believe me until I showed him.


Microsoft and mice seem to have an odd relationship.

The first versions of Windows NT would crash if you moved the mouse while shutting down. Like a child throwing a tantrum because it's bed time...


Some version of OS/2 used to crash at boot if you randomly pressed a few keys...


I ran NT 3.1 and I don't remember that at all. Of course, I also don't know if I ever tried to move the mouse while shutting down.


lol! " it may take several minutes "



