Hacker News
Sikuli: Automate Anything You See on Screen (sikuli.org)
319 points by GuiA on June 9, 2016 | 65 comments



I used this to automate my unemployment benefits, no joke. You have to keep reporting hours (every week I think), so I just set this up and run it. It's the closest thing I have to a piece of software that makes money #sosad


Congratulations. You've made Kafka proud! :)


Don't you have to report applications/locations/etc. too? Or did you actually do the applications and log them in some type of database (flat file, etc.), and have this script auto-fill the benefits form every week?


In my state you have a few weeks or a month of grace period before they start to require resumé and interview skill classes, as well as you reporting on any applications you filled out.


I don't think I had to do that, but I did have to report whether I'm looking for a job and answer other yes/no questions. One of the reasons I automated it was the UI: the gap between each question and its yes/no options was so wide that it was hard to line them up. (I answered wrong one time and they made a case out of it: I had checked "no" on the "are you looking for a job" question.)


Did you set it up once and set it on a schedule, or did you have to run it each time? I'm curious if the tasks can be scheduled in advance, and how.


I ran it manually, and yes it can be automated even further but I'd like to keep an eye on the process + the sense of gratification of double clicking an icon == making it rain $$$


I managed to set up a schedule with a simple batch script + Windows Task Scheduler (OS: Windows). I imagine you could do the same on other OSes.
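A rough sketch of what the Task Scheduler half of that setup could look like, written as a Python helper that only builds the `schtasks` command line rather than running it. The task name and batch file path are hypothetical; the batch file would be whatever launches the Sikuli script.

```python
def build_schtasks_command(task_name, script_path, day="MON", start_time="09:00"):
    """Return the argument list for creating a weekly Windows scheduled task."""
    return [
        "schtasks", "/Create",
        "/SC", "WEEKLY",     # schedule type: once a week
        "/D", day,           # day of the week, e.g. MON
        "/ST", start_time,   # start time in 24-hour HH:MM
        "/TN", task_name,    # name shown in Task Scheduler
        "/TR", script_path,  # program or batch file to run
    ]


# Hypothetical task name and path, for illustration only.
command = build_schtasks_command("WeeklySikuliRun", r"C:\scripts\run_sikuli.bat")
print(" ".join(command))
```

On a Windows machine the list could be handed to `subprocess.run(command, check=True)`; on other systems a cron entry would play the same role.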


Great project, but it hasn't changed since 2010? Maybe we should add it to the title?

I was hoping that image based testing would eventually overtake Selenium/Appium for web/mobile testing. In principle there are several advantages:

+ It works the way a human works. If your button id has changed from #submit to #submit-application, your UI test should still work just fine. However, if your button shrank to a single pixel in the top-left corner, the test should break. Right now most frameworks do the opposite.

+ If it's based on visuals, tests would be easier to maintain. E.g. your button changes color from light grey to dark grey: would you like to update your tests?

+ Tests would be much easier and faster to read and write. Even less technical folks, like current manual QA testers, could contribute to automated testing tools.


You might be interested in http://www.sikulix.com/, which was forked from this project and is more up to date.


Oh, you beat me to it. But the good news is that the SikuliX maintainer seems to be actively developing the next major release now, and the last release is not that old (end of 2015, and the technology doesn't have to change that rapidly).



Link is a 404.


I mean, we don't rely on pixels because we usually care more about behavior than we do the actual design. It's too brittle to rely on exact pixel locations in the face of added features and small design tweaks. Maybe it makes sense to have one or two of these types of visual checks, but they should be automated so that regenerating them is easy.


> I mean, we don't rely on pixels because we usually care more about behavior than we do the actual design.

True, although for some things I do still like a visual test. Your users may be used to clicking a button that looks a certain way in a certain place, and suddenly shifting that might be worthy of requiring a change to a test. You wouldn't want all your behavioural tests done like this though.

> Maybe it makes sense to have one or two of these types of visual checks, but they should be automated so that regenerating them is easy.

I agree. Perhaps there's a nice integration between the two? If you were able to give, say, a css selector and a template image rather than just one or the other to look for you'd be able to report these errors:

* I can't find the button I was looking for, but there is something that matches the css selector [blah] that looks like this: [image] instead of [image]. Do you want to update the image in the tests? Y/N

* I can't find the css selector [blah] but the button appears here [image] with [class] and [id], do you want to update the css selector in the tests? Y/N

If the program is actually still fine then it'd be a quick auto-update, and if it's not then these bits of information are probably the first bit of debug you'd want to do anyway to figure out what broke the tests.
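The two prompts above boil down to a small decision table. A minimal sketch, assuming the test runner already knows whether the CSS selector matched and whether the template image matched (both booleans are hypothetical inputs, not part of any existing tool):

```python
def diagnose(selector_matches, image_matches):
    """Map the two lookup results to the action a combined check might suggest."""
    if selector_matches and image_matches:
        return "pass"
    if selector_matches:
        # Element exists but looks different: offer to refresh the template image.
        return "selector found, image differs: update the image in the tests? Y/N"
    if image_matches:
        # Element looks right but the selector is stale: offer to refresh it.
        return "image found, selector missing: update the css selector in the tests? Y/N"
    # Neither lookup worked, so this is a genuine failure to investigate.
    return "fail: neither selector nor image found"


for case in [(True, True), (True, False), (False, True), (False, False)]:
    print(case, "->", diagnose(*case))
```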


> True, although for some things I do still like a visual test. Your users may be used to clicking a button that looks a certain way in a certain place, and suddenly shifting that might be worthy of requiring a change to a test. You wouldn't want all your behavioural tests done like this though.

A decent example of this: I've implemented frontend features before without noticing an unintended pixel shift. I introduced a bug, albeit just a visual one, but a bug nonetheless - and I had no idea. Likewise, a minor visual issue like that can get through many stages of review and even deployment.

Pixel interaction may be terrible for early prototyping, but the idea has merit. Or, at the very least, perhaps it could identify by css/dom, and throw warnings for pixel alterations.

I like the idea of letting design people programmatically enforce standards (Though, no idea how to make them do that programmatically lol)


> Perhaps there's a nice integration between the two?

They both run in Java, so you can combine Sikuli with Selenium pretty easily [0], so long as you're not doing headless testing. I've played around with it a bit, using Selenium to get coordinates of a container element and then having SikuliX operate just on that region of the screen.

[0]: http://www.softwaretestinghelp.com/sikuli-tutorial-part-2/
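A minimal sketch of the coordinate hand-off described above, assuming the element rectangle comes as a dict with `x`, `y`, `width` and `height` keys (the shape Selenium's `WebElement.rect` returns). The `ScreenRegion` type, the `viewport_origin` offset and the `scale` factor are assumptions for illustration: the offset of the page's top-left corner on screen has to be measured separately, and the sketch assumes the page is not scrolled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreenRegion:
    """Hypothetical stand-in for a screen region handed to an image matcher."""
    x: int
    y: int
    w: int
    h: int


def element_rect_to_screen_region(rect, viewport_origin, scale=1.0):
    """Convert a page-relative element rect to absolute screen coordinates.

    rect: dict with x, y, width, height in CSS pixels.
    viewport_origin: (x, y) screen position of the page's top-left corner.
    scale: device pixel ratio, assumed 1.0 unless the display is scaled.
    """
    origin_x, origin_y = viewport_origin
    return ScreenRegion(
        x=round(origin_x + rect["x"] * scale),
        y=round(origin_y + rect["y"] * scale),
        w=round(rect["width"] * scale),
        h=round(rect["height"] * scale),
    )


# Example: a container 20px below the top of a page whose content starts
# 80px below the top of the screen (browser chrome height is an assumption).
region = element_rect_to_screen_region(
    {"x": 10, "y": 20, "width": 100, "height": 50}, viewport_origin=(0, 80)
)
print(region)
```

The resulting numbers are what would be used to restrict the image search to that part of the screen.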


The visual matchers are fuzzy matchers, not pixel perfect.


It could be some nice fuzzy algorithm with learning capabilities. E.g. if you change the background but the button is in the same place, you can just update the test.


From my experience, the fuzzy matching works well enough to handle picking out text or icons on interfaces with semi-transparent backgrounds that change on scrolling (e.g., due to a parallax effect). It's pixel-based and not scale-invariant, though, so it won't automatically handle multiple zoom levels/resolutions.


you can adjust the fuzziness
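To make "fuzzy, with adjustable fuzziness" concrete, here is a toy template matcher in pure Python. It is not Sikuli's actual implementation: the similarity score (1 minus the mean absolute grayscale difference) and the `min_similarity` default are assumptions chosen for illustration. It also shows why this style of matching is not scale-invariant: the template is compared pixel by pixel at its original size.

```python
def similarity(patch, template):
    """Score in [0, 1]: 1.0 means identical grayscale values (0-255)."""
    total = 0
    count = 0
    for patch_row, template_row in zip(patch, template):
        for p, t in zip(patch_row, template_row):
            total += abs(p - t)
            count += 1
    return 1.0 - total / (255.0 * count)


def find_best(screen, template, min_similarity=0.7):
    """Slide the template over the screen; return (x, y, score) or None."""
    th, tw = len(template), len(template[0])
    best = None
    for y in range(len(screen) - th + 1):
        for x in range(len(screen[0]) - tw + 1):
            patch = [row[x:x + tw] for row in screen[y:y + th]]
            score = similarity(patch, template)
            if score >= min_similarity and (best is None or score > best[2]):
                best = (x, y, score)
    return best


# A 4x4 "screenshot" with a slightly noisy 2x2 bright button at (1, 1).
screen = [
    [0, 0, 0, 0],
    [0, 200, 210, 0],
    [0, 190, 200, 0],
    [0, 0, 0, 0],
]
template = [[200, 200], [200, 200]]

print(find_best(screen, template, min_similarity=0.95))  # tolerant: finds the button
print(find_best(screen, template, min_similarity=0.99))  # strict: no match
```

Raising the threshold trades false positives for missed matches, which is the knob the comments above are talking about.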


If you're only worried about your tests breaking because of #submit vs #submit-application you might be interested in http://heliumhq.com. It's a commercial wrapper around Selenium and I'm one of the authors.


What is the difference from other wrappers, e.g. Selenide? It looks like you are selling a shitty wrapper, nothing more...


The only reason you say shitty is because you don't like that it's commercial. ¯\_(ツ)_/¯


Automation lead with a specialty in GUI here,

You do realize we standardize on testing to ID so that you can change everything else (location, parentage, etc.) because that's the stuff that benignly changes during development or for localization, right? The ID is supposed to be semi-permanent and not change in the normal case, so don't make trivial changes like that.

Otherwise, we're fully capable of finding buttons by coordinate, parentage, attributes, whatever. It's just crappy practice to do so. We intentionally lock down some things and not others because we know how software development works and what changes are likely to indicate greater chances of bugs and what aren't. We also code to accommodate different resolutions, adaptive interfaces, visual changes for different languages--your button location and boundaries change when your English 5 character caption becomes a German 25 character caption--etc.

As for the 1 pixel button, it's not just that it might be 1 pixel. If all you care about is static visual correctness, you can do simple bitmap comparison to get that, though that has its own issues. And we can build basic implicit checks into our frameworks for sane boundaries, z-order occlusion, etc. It's common for control selection to also validate the control could actually be accessed by the user. The bigger problem is that the human interaction details might not work right. It might not visibly depress when you hit it, for example, or might have a hit target that doesn't correspond to its boundaries, or have a janky delay after tapping, or whatever.
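As an illustration of the "sane boundaries" kind of implicit check, here is a minimal sketch. The function name, the `(x, y, w, h)` tuple layout and the 8-pixel minimum are assumptions for the example, not taken from any particular framework, and it covers only size and on-screen position, not z-order occlusion.

```python
def has_sane_bounds(rect, viewport, min_size=8):
    """Return True if a control is large enough and fully inside the viewport.

    rect: (x, y, w, h) of the control in viewport coordinates.
    viewport: (width, height) of the visible area.
    min_size: smallest acceptable width/height in pixels (hypothetical value).
    """
    x, y, w, h = rect
    view_w, view_h = viewport
    big_enough = w >= min_size and h >= min_size
    on_screen = x >= 0 and y >= 0 and x + w <= view_w and y + h <= view_h
    return big_enough and on_screen


viewport = (1280, 720)
print(has_sane_bounds((100, 100, 80, 24), viewport))   # ordinary button
print(has_sane_bounds((0, 0, 1, 1), viewport))         # the 1 pixel button
print(has_sane_bounds((1250, 100, 80, 24), viewport))  # spills off the right edge
```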

Even injecting the most user-like of user events won't always catch those so there's no substitute for actually exercising the interface. And since you're in there anyway, the visual checks become trivial and mostly automatic.

On another subject, there's no way in hell I'll trust a system that takes guesses as to what it's looking at for a regression test suite, which is what nearly all GUI automation suites are. Even if I did use this tool for something less deterministic like model-based GUI fuzzing, I'd need to have another suite built that verified beyond a doubt that the button I expect to be there is there and at least superficially behaves how I expect it to behave. That requires discrete selection and determinism, not fuzzy identification and non-determinism.

The point of a regression suite is NOT to try to make the tests pass, it's to verify that absolutely none of the assumptions I coded into it changed and alert me to manually explore that functionality if they have by failing. Then I can bless the new assumptions with an automation code change or file a bug and xfail the test until the bug resolves. But the assumptions are supposed to be crystalized. You can't trust an alert system unless you know it'd make its decisions to alert the way you would. Unpredictability isn't welcome there.

In this particular case, an ID changing means you removed the control and added another one, conceptually (and possibly practically) speaking, so the automation is supposed to break to trigger a re-examination. Your own code might reference that ID too, you know, so just you having done that means you greatly increased the chances of there being new bugs.
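A small sketch of the "file a bug and xfail the test" workflow, using Python's standard `unittest` rather than any specific GUI framework. `submit_button_id` is a hypothetical stand-in for querying the real UI; here it simulates the ID having been renamed.

```python
import io
import unittest


def submit_button_id():
    # Hypothetical stand-in for looking up the control in the real UI.
    return "submit-application"


class SubmitButtonRegression(unittest.TestCase):
    def test_button_exists(self):
        # Still a hard assertion: the control must be present.
        self.assertTrue(submit_button_id())

    @unittest.expectedFailure  # bug filed for the ID change; remove once resolved
    def test_button_id_unchanged(self):
        # The crystallized assumption: the ID is "submit".
        self.assertEqual(submit_button_id(), "submit")


def run_suite():
    """Run the regression tests quietly and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SubmitButtonRegression)
    return unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)


result = run_suite()
print("expected failures:", len(result.expectedFailures))
```

If the ID were quietly changed back, the marked test would be reported as an unexpected success, which is again a prompt to re-examine rather than a silent pass.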


Made a small bash script to do something similar: https://gist.github.com/eirikb/c189a7d8406b2897dad0e86086be1...


I ... was expecting something more complicated than that.

What a pleasant surprise. Thank you for sharing!


Nice. Any demos of it in action?


Certainly, it should be pretty easy.

Here is a demo: http://i.imgur.com/1BpMKpe.gif Script is here: https://gist.github.com/eirikb/ac8196beb0b57577a8fc47eb18427... Images are here: http://imgur.com/a/Uv3NH

What it does: Opens Google translate in Chrome, inserts some Norwegian text, copies the translated text and inserts it into this comment. To make the images I used gnome-screenshot, it let me grab areas.


This is really nice, great hack.


Another great project in a similar vein is Pulover's Macro Creator, combined with Autohotkey:

http://www.macrocreator.com/

http://ahkscript.org/

Sadly Windows only. It's one area where Windows has a leg up on Macs.


Keyboard Maestro is a good option for Macs. Fake is another one but only for automated web browsing. I've been using both for years.

http://www.keyboardmaestro.com/

http://fakeapp.com/


Thanks for this!


I am primarily a Windows user, but Automator blew me away when I first saw it a few years ago (mainly because it could access the OS X AppleScript dictionaries, which gave it a lot of additional functionality).

A cursory glance at macrocreator doesn't really give me the impression of what it does better, do you have any additional insight?


Ease of use is a big one. I tried using Automator for simple tasks in the past, and its complicated syntax and steeper learning curve compared to AHK meant I gave up. Look at the list of keyboard commands for an example of what I mean: https://autohotkey.com/docs/KeyList.htm#Keyboard

Another example, this is a macro to jump to the end of a file name in windows explorer and append today's date, then jump to the next item in the list (F3 is the trigger):

  F3::
  Macro1:
  Send, {Alt}{f}{m}{End}	; shortcut to rename, jump to end of filename
  SetKeyDelay, 0
  SendRaw, _09-06-2016	; this is what is sent
  Send, {Enter}{Down}		; jump to next file
  Return

This is extremely readable and easy to understand.
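The macro above sends a hard-coded date string. As a side note, the same suffix can be computed for any day; a minimal Python sketch, assuming the `_09-06-2016` in the macro is day-month-year (consistent with this thread's date of June 9, 2016):

```python
from datetime import date


def date_suffix(day=None):
    """Return a suffix like '_09-06-2016' (day-month-year) for the given date."""
    day = day or date.today()
    return day.strftime("_%d-%m-%Y")


print(date_suffix(date(2016, 6, 9)))  # the value hard-coded in the macro
print(date_suffix())                  # suffix for whatever today is
```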

Perhaps it's possible to do pixel searches in Automator (click on certain buttons for example), but AHK + Pulover's Macro Creator make this extremely easy - this is why I mentioned both, as these combined make a perfect drop-in replacement for Sikuli - and they're far more stable than Sikuli, which was quite unstable when I tried it a year or two ago.


FWIW I used sikuli(x) and found it very useful, but I have a chronic allergy to Java[1], so I've been re-implementing the same API in pure Python --> https://github.com/shish/sikulpy

[1] more specifically, I found sikuli scripts pretty painful to debug, since neither java debuggers nor python debuggers seem to work quite right.


This is just plain incorrect.

Yes, the original API was written in java, but Sikuli scripts are written in Jython. I did an entire automation project using Sikuli written in jython without any hiccups a couple of years ago.


> This is just plain incorrect.

Did you somehow manage to read my comment and interpret it as "Sikuli scripts are written in java"? Because that's not at all what I said (or meant) :P

Sikuli itself was (and still is) written in java - which I find a pain to work with when I want to modify the internals of the library; and then scripts are written in jython, which means I don't have access to my cpython-specific tools and extensions.



How do you tackle the debuggability problem?


I'm just using standard python tools (pudb and pycharm depending if I'm running on a remote server or locally). Also being pure python makes it easier to stick instrumentation into the core library when I need it :)


When I started at my current job (back in 2011) they were using Tevron's CitraTest product to automate software hosted on citrix. The first task I was given was to see if we could find an alternative to CitraTest, well Sikuli was the answer. All of the existing code was written in VB.NET, but with a simple wrapper class and use of XML-RPC, we dropped CitraTest completely for a free MIT-Licensed software. It has worked great for years actually.


Sikulix is great and comes with a Python IDE where you drag and drop screen captures - so a command to click on an icon has the icon itself written into the code.

Robotic process automation (e.g. Blue Prism) is coming to the fore in commercial settings - a mixture of API and computer vision (probably a horrible mix) so there's a market out there for developing something commercially.


Sikulix is the current fork, and I have used it to surprising effect in all kinds of situations, really awesome as long as you can ignore some bugs, but the good news is the dev is super responsive to reports.


Last time I tried it on OS X the experience was pretty painful but it did a decent job when all of the pieces fell into place. Has anyone tried both Sikuli and RobotJS [1] and can compare/contrast them?

[1] https://github.com/octalmage/robotjs


Used Sikuli for some casino fairness testing https://pbs.twimg.com/media/B4wlbnhIAAEti1z.jpg:large


I'd love to read more about this. Do you have a writeup somewhere?


Used Sikuli for automating systems testing a few years ago (last I heard they stopped working on it). We had to use a VNC because it was affecting system performance. Overall really good and glad SikuliX was able to continue development of it. I used it again a few months ago and the same issues with the IDE and some other setup/runtime bugs that existed a few years ago are still there. Otherwise it has huge potential for automating QA/systems testing if someone takes the time and has the resources to do so.


We use Sikuli for automation tests on both Windows and Mac. Other than occasional issues with its image recognition features it works great and serves us well. We have Jenkins automation jobs spin up VMs in VirtualBox, load build artifacts on the VM, and run Sikuli tests. All test output gets pulled back into the Jenkins output and parsed for successes/failures. Makes it possible for us to test across different OSes using the same Sikuli scripts.
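A minimal sketch of the "parse the output for successes/failures" step. The log format (`[PASS] name` / `[FAIL] name` lines) is purely hypothetical; the real scripts would print whatever convention the team settled on.

```python
import re

# Hypothetical convention: each test prints "[PASS] test_name" or "[FAIL] test_name".
RESULT_LINE = re.compile(r"^\[(PASS|FAIL)\]\s+(\S+)")


def summarize(log_text):
    """Split test names found in the log into (passed, failed) lists."""
    passed, failed = [], []
    for line in log_text.splitlines():
        match = RESULT_LINE.match(line.strip())
        if not match:
            continue  # ignore VM boot noise and other unrelated output
        status, name = match.groups()
        (passed if status == "PASS" else failed).append(name)
    return passed, failed


sample_log = """\
Starting VM win10-test...
[PASS] login_screen
[FAIL] export_dialog
[PASS] settings_panel
Shutting down VM.
"""

passed, failed = summarize(sample_log)
print("passed:", passed)
print("failed:", failed)
```

A CI job could then exit non-zero whenever `failed` is non-empty so the build is marked as broken.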



Great tool. Used it to automate SAP in a project a couple of years ago. Was using it with a .NET framework though, so it was a total pain.

That led to me writing Sikuli4NET about a year later. https://sourceforge.net/projects/sikuli4net/


I once worked on a project where we needed a tool like this. Sikuli definitely worked, but for us it only handled relatively simple cases, and it felt a bit like a science project (at the time, anyway... this was 2010). Good commercial alternatives at the time included eggPlant and T-Plan.


Reminded me of the SCAR tool many of us used to use back in the day for botting on Runescape. Will have to look into this for QA automation, as Selenium isn't always too fun when you've got a hairball of ASP.NET WebForms, Vanilla JS, and Angular for a code base.


Surprised to see SCAR mentioned. I have some connection to it ;) I still use it now and then to automate some flash games. Helps me to relax.


Wow, it really is a small world, I can't believe you replied to my comment. You're one of the only aliases I still recognize from those days as I was about 12 at the time. SCAR and the other scriptable clients were a huge spark for my interest in programming as a kid. Seeing something that could automate actions in a game I played all day blew my mind and I had to learn how it worked. Cheers to you for that!


Well, this is certainly more advanced than the xmacro* utilities. I once had to use xmacro to do stuff like this. It's a good idea to dedicate a separate X server in the background, though, so you don't have to leave your other tasks while it's running.


When I was working at a video editing shop, I used SikuliX to automate a lot of common, repetitive tasks (like inputting chapter names to chapter markers in Adobe Premiere). It's really great for automating easy repetitive tasks


I used Sikuli to automate app downloads from app store. It opens iTunes, goes to the featured apps page, enters the password if required and downloads 10 new featured apps every day.


In my experience Sikuli/X was too slow for image search on high definition desktops.

UIs also tend to change more than keyboard shortcuts.


Also great for automating repetitive/grindy tasks in computer games! (Not a bot, promise)


Does anyone know if this could be used with Go or Rust, like a C lib?


Wonder if an integration with Screenflow or Camtasia is possible?


I love sikuli! Saved me a lot of time in the past


Fun fact: if you want to click() on something that appears more than once, Sikuli will choose it at random.

Some time ago I was demoing a script to my boss and the second time I ran it, it clicked on a different place with the same text (and fortunately with the same destination).

We were shocked because the link wasn't where we expected it to be (Ciscosecure ACS 3.x).


The "region" feature helps protect against this.
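A toy illustration of why restricting the search area helps, in plain Python rather than Sikuli's own API. Matches and regions are `(x, y, w, h)` tuples, and `pick_target` is a hypothetical helper that refuses to guess when the result is ambiguous.

```python
def matches_in_region(matches, region):
    """Keep only the matches that lie fully inside the region."""
    rx, ry, rw, rh = region
    inside = [
        (x, y, w, h)
        for (x, y, w, h) in matches
        if x >= rx and y >= ry and x + w <= rx + rw and y + h <= ry + rh
    ]
    # Sort top-to-bottom, left-to-right so the order is deterministic.
    return sorted(inside, key=lambda m: (m[1], m[0]))


def pick_target(matches, region):
    """Return the single match inside the region, or raise if ambiguous."""
    inside = matches_in_region(matches, region)
    if len(inside) != 1:
        raise LookupError("expected exactly 1 match in region, found %d" % len(inside))
    return inside[0]


# Two links with identical text: one in the header, one further down the page.
matches = [(10, 10, 50, 20), (10, 300, 50, 20)]
lower_panel = (0, 250, 200, 100)

print(pick_target(matches, lower_panel))
```

Searching the whole screen with the same two matches raises `LookupError` instead of clicking whichever one happens to come back first.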



