Show HN: Ellipsis – Automated PR reviews and bug fixes (ellipsis.dev)
121 points by hunterbrooks 63 days ago | 64 comments
Hi HN, hunterbrooks and nbrad here from Ellipsis (https://www.ellipsis.dev). Ellipsis automatically reviews your PRs when opened and on each new commit. If you tag @ellipsis-dev in a comment, it can make changes to the PR (via direct commit or side PR) and answer questions, just like a human.

Demo video: https://www.youtube.com/watch?v=X61NGZpaNQA

So far, we have dozens of open source projects and companies using Ellipsis. We seem to have landed in a kind of sweet spot where there’s a good match between the current capabilities of AI tools and the actual needs of software engineers - this doesn’t replace human review, but it saves you time by catching/fixing lots of small silly stuff.

Here’s an example in the wild: https://github.com/relari-ai/continuous-eval/pull/38. Ellipsis (1) adds a PR summary; (2) finds a bug and adds a review comment; (3) after a (human) user comments, generates a side PR with the fix; and (4) after a (human) user merges the side PR and adds another commit, re-reviews the PR and approves it.

Here’s another example: https://github.com/SciPhi-AI/R2R/pull/350#pullrequestreview-..., where Ellipsis adds several comments with inline suggestions that were directly merged by the developer.

You can configure Ellipsis in natural language to enforce custom rules, style guides, or conventions. For example, here’s how the `jxnl/instructor` repo uses natural language rules to make sure that docs are kept in sync: https://github.com/jxnl/instructor/blob/main/ellipsis.yaml#L..., and here’s an example PR that Ellipsis came up with based on those rules: https://github.com/jxnl/instructor/pull/346.

Installing into your repo takes 2 clicks at https://www.ellipsis.dev. You do have to sign up to try it out because we need you to authorize our GitHub app to read your code. Don’t worry, your code is never stored or used to train models (https://docs.ellipsis.dev/security).

We’d really appreciate your feedback, thoughts, and ideas!




This looks surprisingly good! I can see the quick sanity checks being very useful in cases like the mismatched env variables. That said, since you probably already know what works well, here's some constructive criticism:

First, I'm pretty unimpressed with the PR descriptions. I'd be frustrated if my company adopted this and I started seeing PRs like the one from continuous-eval that you linked to first. It's classic LLM output: lots of words, not much actual substance. "This PR updates simple.py", "updates certain values". It's the kind of information that can be gleaned in the first 5 seconds of glancing through a PR, and if that creates the illusion that no more description is needed then we'll have lost something.

Second, in the same PR: when writing a collection to a jsonl file, I would expect an empty collection to give me an empty file, not no file. Further, I haven't looked at the rest of the context, but it seems extremely unlikely to me that dataset_generator.generate would somehow produce non-serializable objects, and a human would easily see that. These two suggestions feel at best like a waste of time and at worst wrong, and it's concerning to me that the habits this tool encourages led to the suggestions being uncritically adopted and incorporated and that this seemed to you to be a good example of the tool in use.
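To make the point concrete, here’s roughly the behavior I’d expect (a minimal sketch, not the project’s actual code; the file name is made up):

    import json

    def write_jsonl(records, path):
        # Open unconditionally, so an empty collection still yields an
        # empty file rather than no file at all.
        with open(path, "w") as f:
            for record in records:
                f.write(json.dumps(record) + "\n")

    write_jsonl([], "dataset.jsonl")  # creates an empty dataset.jsonl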

The second PR you linked to is, I think, a better example, but I'm still not sold on it. Similar to the PR descriptions, I'm concerned that a tool like this would create an illusion of security against the "simple" problems (leaving reviewers to focus on the high level), whereas I'd hope that the human reviewer would still read every line carefully. And if they're reading every line carefully, then have we really saved them that much time by paying for an LLM reviewer to look over it first? Maybe the time it takes to type out a note about a misnamed environment variable.


> Similar to the PR descriptions, I'm concerned that a tool like this would create an illusion of security against the "simple" problems (leaving reviewers to focus on the high level), whereas I'd hope that the human reviewer would still read every line carefully.

PR descriptions should be about intention, not what changed (what changed is in the PR itself), so I agree there.

With the other stuff though, if you treat it as a linter, that might be fine? The problem is when you justify getting it on the grounds that it’ll save you engineering time, but it won’t actually do that significantly unless you have a lot of people using it.


We are using both ellipsis and sweep for our open source project, and they are quite helpful in their own ways. I think selling them as an automated engineer is a little over the top at the moment, but once you get the hang of it they can spot common problems in PRs or do small documentation-related stuff quite accurately.

Take a look at this PR for example: https://github.com/julep-ai/julep/pull/311

Ellipsis caught a bunch of things that would otherwise have come up only in code review later. It also got a few things wrong, but those are easy to ignore. I like it overall: helpful once you get the hang of it, although far from a “junior dev”.


> Take a look at this PR for example: https://github.com/julep-ai/julep/pull/311

I am still confused if vector size should be 1024 or 728 lol.


Lolll. It’s 1024 but only for documents and not the tools (we changed the embedding model for RAG)


Why isn’t the AI suggesting putting it into an appropriately named const? Magic numbers are poor practice.


Good catch. The team could add this rule to their Ellipsis config file to make sure that it's always flagged: "Never use magic numbers. Always store the number in a variable and use the variable instead."

Docs: https://docs.ellipsis.dev/config#add-custom-rules
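Roughly, a rule like that goes in an ellipsis.yaml at the repo root. An illustrative sketch only - the exact keys may differ, so check the docs above:

    # ellipsis.yaml (illustrative sketch; see the docs for the exact schema)
    pr_review:
      rules:
        - "Never use magic numbers. Always store the number in a variable and use the variable instead."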


But even that isn’t ALWAYS the case. There are times when it is appropriate to have numbers inline, as long as they’re not repeated.

This is where good judgement comes in, which is difficult to encode rules for.


> I think selling them as an automated engineer is a little over the top at the moment

Indeed. Amazon originally advertised CodeGuru as being "like having a distinguished engineer on call, 24x7".[^1] That became a punchline at work for a good while.

I can definitely see the value of a tool that helps identify issues and suggest fixes for stuff beyond your typical linter, though. In theory, getting that stuff out of the way could make for more meaningful human reviews. (Just don't overpromise what it can reasonably do.)

[^1]: https://web.archive.org/web/20191203185853/https://aws.amazo...


As it stands today, Ellipsis isn't sold as an AI software engineer.

One of our biggest learnings is that state-of-the-art LLMs aren’t good enough to write code autonomously, but they are good enough to be helpful during code review.


Right, I stand corrected; I think I confused it with the branding of other competing products. I remember really liking the fact that ellipsis does _not_ sell itself as a developer. I’ll edit my comment to reflect that. :)


I’ve been following sweep and aider for a while and really love what they’re both doing, especially sweep.

Would love to get your thoughts on sweep. Does it meet your expectations? If not, where does it fall short?


Not as the “junior dev” that sweep markets itself as, but it is useful in its own ways. For example, one really nifty way I found to use it effectively is to:

- git diff

- gh issue create “sweep: update docs for this file change” for every file changed

It’s not perfect even after that, but it gives me a good starting point and often just needs a minor change.
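Roughly, as a script (a sketch only - the base branch, issue titles, and body text are whatever fits your repo):

    import subprocess

    # One docs-update issue per changed file, via the gh CLI.
    changed = subprocess.run(
        ["git", "diff", "--name-only", "main"],
        capture_output=True, text=True, check=True,
    ).stdout.split()

    for path in changed:
        subprocess.run(
            ["gh", "issue", "create",
             "--title", f"sweep: update docs for changes in {path}",
             "--body", f"{path} changed; please update the related docs."],
            check=True,
        )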


Any thoughts on aider vs. sweep so far? I am also interested in trying out both...


A sampling of PRs looks pretty good code-wise, but the commit messages/descriptions don't. They just summarize the changes done (something that can be gleaned from the diff) but don't give context or rationale around why the changes were necessary.


It's most helpful when a GitHub/Linear issue is linked because the "why" is extracted, and also for larger PR's


The “why” should be in the commit message. That’s what it’s for.


Different strokes: I prefer it in the PR, with a lot of detail, with commits that are very granular and just explain what they do. PR descriptions should have all the why, test cases, pictures, etc.


I've been using Ellipsis for a few months now. I have zero regrets about paying for it now and likely will pay them more in the future as their new features ship.

For a solo engineer like me who’s working in multiple codebases across multiple languages, it’s excellent as another set of eyes to catch big and small things in a pull request workflow I’m used to (and it has caught more than a few). I’d argue that even as a backstop for catching edge cases/screwups that would otherwise waste my time, it’s already more than paid for itself.


Interesting project for sure, but I’m trying to find the reasons for shifting the AI to the PR stage. Wouldn’t this be more efficient at development time, i.e., with the Copilot/OpenAI tool chain?


Teams should use both Copilot (synchronous code generation) and Ellipsis (async code gen).

Sure, Copilot speeds up human dev productivity, but our take is that humans should only be spending their time on the highest value code changes and use products like Ellipsis to handle the rest.

The downside of async code gen is that Ellipsis workflows take a few minutes to run because Ellipsis is actually building the project, running the tests, fixing its mistakes, etc. The upside is that a developer can have multiple workflows running at once, and each workflow delivers higher-quality code because it’s guaranteed to be working + tested.

I'm super bullish on async code gen. I think there's a whole category of tedious development tasks with unambiguous solutions that can be automated to the point where a human just needs to give it a LGTM.


Do you use your own product in that way?


Yeah, I use it for lots of boilerplate work like adding new API endpoints, new Celery jobs, and building React components.


I think we're going to see different AI tools optimized for different stages of the development workflow, with developers assembling and using a collection of them, just like they currently assemble a collection of tools for different tasks within the stack (backend language, frontend language, database, cache, infrastructure, etc.). It's unlikely that there's ever going to be one AI coding tool to rule them all.

As someone building (and frequently using) an AI coding tool that is very much focused on development time[1], I still use GH Copilot and ChatGPT Plus heavily as well. In a team setting, I could definitely see using my tool in conjunction with Ellipsis too. A feature built partly (or entirely) with AI still needs PR review. And an agent focused specifically on that job is likely going to be better at it than an agent designed for building new features from scratch.

1 - https://github.com/plandex-ai/plandex


Definitely agree with this.

One analogy I see today is the typical testing pipeline. Unit tests get run locally, then maybe a CI job runs unit tests + integration tests, then maybe there's a deployment job which does a blue/green release. At every stage the system is being "tested", but because the tests validate different capabilities, it's like concentric circles that grow confidence for the change.

A software dev lifecycle that uses AI dev tools is similar. Agents will review/contribute at the various stages of the SDLC, sometimes with overlap, but mostly additive and building on the output of one another.


That is pretty horrible, on the level of a “junior engineer” who has no idea of good industry practices and needs careful code review. I would hate to see the system as presented on any of my projects.

Summary: The point of a summary is to tell “why” the change was made and highlight unusual/non-trivial parts, and the examples absolutely fail there. To look at the first one:

- Why was the “generate” result type updated? Was it a customer request, general cleanup, or prep for some ongoing work?

- The other 3 points - are they a logical consequence of the output type update, or are they separate changes? In the latter case, you really want to list the changes (“Updated examples to use a more recent gpt-4 model”, for example)

- What’s the point of just saying “updating X in Y” if you don’t say how? This is just visual noise duplicating the “file changes” tab in the PR.

Suggested changes: those are even worse - like https://github.com/relari-ai/continuous-eval/pull/38#discuss...

- This is an example file and you know where the “dataset” comes from. Why would you have non-serializable records to begin with?

- This changes semantics from "let programmer know in case of error" to "produce corrupted/truncated data file in case of error" - which generally makes debugging harder and gives people nasty surprises when their file is somehow missing records. Sure, sometimes this is needed, but in that particular file it's all downsides. This should not have been proposed at all.

- Even if you do want the check somehow, it’s pretty terrible as written - the message doesn’t include the original error text, the bad object, or even the line number. What is one supposed to do if they see it?
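For contrast, if one insisted on a guard at all, at minimum it should keep the original error and point at the offending record - a quick sketch, not the project’s actual code:

    import json

    def write_jsonl(records, path):
        with open(path, "w") as f:
            for i, record in enumerate(records):
                try:
                    line = json.dumps(record)
                except TypeError as exc:
                    # Keep the original exception and say which record broke.
                    raise ValueError(f"record {i} is not JSON-serializable: {record!r}") from exc
                f.write(line + "\n")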

----

And I know some people say “you are supposed to review AI-generated content before submitting”, but I am also sure many new users will think the kind of crap advice that AI generates is OK.

Ellipsis authors: please stop making open source worse. Buggy patches are worse than no patches, and a 1-line hand-written summary is better than a useless AI-generated one.

Maintainers: don't install this in your repo if you don't want crap PRs.


I came here to find and agree with this comment. I've been seeing more stuff like this show up recently in OSS projects and it is extremely irritating.

> And I know some people say “you are supposed to review AI-generated content before submitting”, but I am also sure many new users will think the kind of crap advice that AI generates is OK.

Those who don’t know better think AI is awesome and will solve all their problems. Those who do know enough to see the flaws don’t need AI either.

> Ellipsis authors: please stop making open source worse. Buggy patches are worse than no patches, and a 1-line hand-written summary is better than a useless AI-generated one.

I'll extend that to say "please stop making software worse." We were already drowning in mediocrity before AI accelerated it.


> And I know some people say “you are supposed to review AI-generated content before submitting”, but I am also sure many new users will think the kind of crap advice that AI generates is OK.

We’re watching the development of idiocracy in real time. It’ll start with “yeah, we’re reviewing everything it generates” until a new generation stops thinking for itself completely and PRs switch from machine-skeptic to “who are you to argue against the computer?” (and not because the machine became so good it’s better than a human).


This looks unnecessary. Instead of a program (AI) that checks your code locally (like fixing typos), the program lives on the PR. Instead of installing something locally, you have to have some user program on GitHub that knows about your repository.

What do you get? A bot that makes commits. I don’t want to keep those commits. I would rather just squash (rebase) those typo fixes into my original commit. Just noise.

What else do you get? It seems that you get to pretend that you’re in a Google Doc For Code.[1] If people want that then they should work towards real-time online code editors.

Take a step back. The point of code review is knowledge transfer, mentorship and guaranteeing code quality. You can certainly fulfill the last point with a program. And this is AI so of course you can in principle solve the two first as well. But if the underlying point is to transfer the knowledge from the senior humans (groan) to the junior humans, then how does your AI replace that? Because the point isn’t to transfer some general-purpose (AI) knowledge but the specific knowledge of those more experienced developers.

[1] Why do people want to turn regular programs (including AI) into some high-tech dance in the cloud where you tag and ping bots on GitHub? I’m at a loss.


We're required to have code review as part of our SOC2 process, and I assume automated agents wouldn't count.

The other end of the spectrum is linting and tests, which catch errors before review.

Does Ellipsis have a role between these two? If so, what is the role?


Ellipsis increases the quality of code a developer brings to their teammates, meaning fewer PR comments, meaning the team ships faster. It's not a replacement for human review. It's most often used by the PR author after a self review but before asking for reviews from human teammates.

Ellipsis will use your existing lint/test commands when making code changes. For example, you can leave a comment on a PR that says “@ellipsis-dev fix the assertion in the failing unit test” and Ellipsis will run the tests, observe the failure, make the code change, confirm the tests pass, lint-fix the code, and push a commit or open a new PR.


> It's most often used by the PR author after a self review

Why run it as part of a PR then? I'd prefer to run a tool like this before a PR is even open, and ideally on my local machine.


The product works on draft PRs too, but not on local.

Sometimes reviewers rope in Ellipsis by asking questions (we also support natural language Q&A about a PR) or by having a design discussion via GH comments and then assigning the change to Ellipsis.


> increases the quality of code a developer brings to their teammates

Only if they're already below average.


No way, anyone can make silly mistakes. Ellipsis is a 2nd pair of eyes to catch stuff like that


> Ellipsis increases the quality of code a developer brings to their teammates, meaning fewer PR comments, meaning the team ships faster.

Interesting. Got any numbers on how it affects team velocity?


Unfortunately not. We’re early - all our data is qualitative.

But we know we need hard numbers, so we’re working on it. We don’t want to sell a novelty; the product needs to actually save time.


This is a very interesting use case.


Will this enable me to slowly, over time, add a back door without anyone detecting it?


No, but if your widely adopted, poorly supported open source project uses Ellipsis for code reviews, maybe we can catch that type of hack ;)


How could an open source project afford the $20/user/month license fee?


We offer Ellipsis to large open source projects for free. Email us at team@ellipsis.dev

I was referencing the recent xz backdoor hack.


Anything similar for hobbyist or student projects?


Hmm... probably, send me an email.


I have my doubts.


"This PR appears to add some kind of autotools gibberish to the codebase. Since autotools needs to be regularly fed gibberish in order to continue to live, this is normal and expected. However please note that some gibberish may be malicious.

As an AI code review model, I am unable to advise on whether this autotools gibberish is malicious or not. Human review will be required."


Totally fair - there’s a saturation right now of magic AI dev tools. We try to differentiate by not overpromising/underdelivering and by solving a problem that’s closely matched to what today’s state-of-the-art LLMs can handle: code review.

But the only real way to figure out if it’s useful for your team is to try it. That’s why we added a free trial.


Signed up for Beekeeper Studio; no idea how well it performs for a desktop app written with Electron and Vue.js, but we’ll see!

First two code reviews show no feedback.


What is the value add vs using an LLM agent?


Ellipsis uses a BUNCH of LLM agents internally. If you built your own code-generation LLM agent, you’d also need to build a way to execute the code that the agent writes, which is a bit of an engineering headache.

We handle this; the result is that if you set up a Dockerfile, we promise to return working, tested code: https://docs.ellipsis.dev/code#from-a-pr


Is there a list of supported languages?


Nearly all languages are supported, but some perform better than others.

JS/TS, Python, Java, C++, and Ruby are particularly well supported.


Do you have real life examples on GitHub to see?

[Edit] You can see a bunch of them here: https://github.com/search?q=%22ellipsis.dev%22&type=issues Nothing breathtaking unfortunately.


Hmm, that searches issues, which isn't the best way to see Ellipsis' work.

Example of PR review: https://github.com/getzep/zep-js/pull/67#discussion_r1594781...

Example of issue-to-PR: https://github.com/getzep/zep/issues/316

Example of bug fix on a PR: https://github.com/jxnl/instructor/pull/546#discussion_r1544...


How is this different from coderabbit, codegen and codium's PR agent?


Hi there! I recommend trying them out to see which one you like best :)


Could you provide more information or elaborate on how Ellipsis is better? I'd appreciate a more detailed explanation.


All those tools do code review, so you'll have to try them out for yourself to see which is the most helpful.

But when it comes to writing code for you, not all of those tools actually run your unit tests/linter/compiler/etc. Ellipsis will, and it’ll use the stdout/stderr to fix its own mistakes, meaning the commit delivered to you actually compiles/passes CI.


Automated review seems like a hard sell to me.

If human participation is reduced, they could approve PRs without proper review.

Eventually this stochastic parrot could throw an ``rm -rv ${TEMP}/`` in there and you are roasted.


Automated code reviews are a step in the software development lifecycle, not a replacement for human reviewers.

Ellipsis can't commit code without your permission and approval, so this particular parrot can't feather your filesystem.


Hiding your price behind a login wall is shady.


Pricing is on the landing page: $20/seat per month.


Per seat how? My GitHub repo has 10 members. Is each of them a seat, or just the person who created an account on Ellipsis?


Good job!




