saltypal · 2023-11-06T19:00:29

Previously: https://news.ycombinator.com/item?id=3947903

(I feel incredibly old right now.)

Edited to add: Finally found the full form letter [1]. (I filed a copy back then and it still shows up languishing in Feedback Assistant even now!)

[1]: https://gist.github.com/mysteriouspants/1989061

saltypal · 2023-10-03T00:33:44

Eleos Technologies (https://eleostech.com) | Remote Front-End Web Engineer | Permanently Remote (US-based applicants only) | Full time

Eleos Technologies is a growing company building communication software for truck drivers and the back-office workers who support them. We’re looking to scale up a second product engineering team to ship more advanced driver planning tools, automate outdated workflows, and deliver a bunch of other wins that make drivers’ lives easier and more efficient.

We're looking for someone with several years of professional web experience who's looking to learn and flourish at a company that's big enough to survive an economic downturn, but not so big you can't make an immediate, direct impact that'll be seen and appreciated by the whole company. (We've also spent the last decade working really hard to make sure our tools, processes, and culture respect everyone's work/life balance. I'm the CTO and I don't text people on my team, and they don't text me, and that's true throughout.)

The stack is Erlang/Elixir with a ClojureScript+React frontend with robust automated test coverage.

If you're interested, please share your info via https://jobs.lever.co/eleostech/629c95b5-2205-4d5e-af25-4ca5... or feel free to reach out to me directly if you have questions (email is in my about/bio).

saltypal · on March 1, 2023

Eleos Technologies (https://eleostech.com) | Senior Android Engineer | Permanently Remote (US-based applicants only) | Full time

Eleos Technologies is a growing company building communication software for truck drivers and field workers, and Drive Axle is our consumer-facing document scanning app. We’re looking to scale up a team to invest further into this product and ship more advanced scanning capabilities, driver planning tools, and other features to make individual drivers’ lives easier.

We're looking for someone with several years of professional Android experience who's looking to learn and grow on a team with multiple members with 10+ years of experience on mobile in a company that's big enough to survive an economic downturn, but not so big you can't make an immediate, direct impact that'll be seen and appreciated by the whole company. (We've also spent the last decade working really hard to make sure our tools, processes, and culture respect everyone's work/life balance. I'm the CTO and I don't text people on my team, and they don't text me, and that's true throughout.)

If you're interested, please share your info via https://jobs.lever.co/eleostech/edff9316-f691-43be-8947-3ff0... or feel free to reach out to me directly if you have questions (email is in my about/bio).

saltypal · on Feb 17, 2023

Oh wow, thank you!

I'd been poking through them with plain old `sqlite3` and then deserializing all the plist data with something like `pbpaste | xxd -r -p > foo.plist` for examination, but had no idea datasette existed and https://datasette.io/plugins/datasette-bplist#user-content-t... seems like the ticket for browsing these.
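For anyone who would rather skip the clipboard step, here is a minimal Python sketch of the same `xxd -r -p` decode followed by a plist parse. It assumes the text copied out of `sqlite3` is a plain hex dump with no offsets or ASCII column; `plist_from_hex` is a made-up helper name, not part of any tool mentioned here.

```python
import plistlib


def plist_from_hex(hex_text: str):
    """Decode a plain hex dump (what `xxd -r -p` does) and parse the bytes as a plist.

    Assumption: hex_text holds only hex digits and whitespace.
    """
    # Drop newlines and spaces so the hex digits form one continuous run.
    compact = "".join(hex_text.split())
    raw = bytes.fromhex(compact)
    # plistlib.loads detects binary vs. XML plists on its own.
    return plistlib.loads(raw)


if __name__ == "__main__":
    # Round-trip a small binary plist through hex to show the decode path.
    sample_hex = plistlib.dumps({"url": "https://example.com"}, fmt=plistlib.FMT_BINARY).hex()
    print(plist_from_hex(sample_hex))
```

The parsed result is an ordinary Python dict or list, so it can be pretty-printed or searched without writing `foo.plist` to disk first.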

For others: all the Caches.db files are the per-process HTTP cache that NSURLRequest/NSURLSession keeps, so if you peek at it you can see (partially) a history of network requests that process has made. Most of them seem to pull feature flag configuration from https://bag.itunes.apple.com/bag.xml, but others do more interesting things.

saltypal · on Oct 3, 2022

Eleos Technologies (https://eleostech.com) | Android Engineer | Permanently Remote (US-based applicants only) | Full time

Eleos Technologies is a growing company building communication software for truck drivers and field workers, and Drive Axle is our consumer-facing document scanning app. We’re looking to scale up a team to invest further into this product and ship more advanced scanning capabilities, driver planning tools, and other features to make individual drivers’ lives easier.

We're looking for someone with a small amount of professional Android experience (or a substantive side project) who's looking to learn and grow on a team with multiple members with 10+ years of experience on mobile.

If you're interested, please share your info via our jobs page (https://jobs.lever.co/eleostech/fd8bbee8-f5c1-4dcd-8e73-fe31...) or feel free to reach out to me directly (email is in my about/bio)

We've got a similar role open for iOS, too, if that's more your speed: https://jobs.lever.co/eleostech/62b47296-9e81-4b1e-be52-3db1...

saltypal · on June 30, 2022

Email George Blood LP (https://www.georgeblood.com/). They can read something like 100+ old magnetic formats and are wonderful pros.

passer_byer · on July 1, 2022

Another thank you! I have choices

saltypal · on March 9, 2022

Based on our telemetry, this started as NXDOMAINs for sqs.us-east-1.amazonaws.com beginning in modest volumes at 20:43 UTC and becoming a total outage at 20:48 UTC. Naturally, it was completely resolved by 20:57, 5 minutes before anything was posted in the "Personal Health Dashboard" in the AWS console.

It takes a while to find a Vice President, I guess.

mcqueenjordan · on March 9, 2022

Or perhaps triaging, root-causing, and fixing the issue is the highest-order bit?

viraptor · on March 9, 2022

Different people have different responsibilities. At Amazon scale, the comms and people doing a deep dive to fix stuff will not be the same.

jrockway · on March 9, 2022

I'd be totally fine just having alerts and metrics driving the status page. Why involve a human at all? They just get emotional.

(I have a data-driven status page for my personal website. If Oh Dear decides my website is down, the status page gets automatically updated. Obviously nobody is ever going to visit status.jrock.us if they are trying to read an article on my blog and it doesn't load, but hey at least I can say I did it.)

cj · on March 10, 2022

> Why involve a human at all?

To make a judgement call on whether the issue is severe enough to warrant the legal/financial risk of admitting your service is broken, potentially breaking customer SLAs.

viraptor · on March 10, 2022

Also prevents monitoring flukes and planned/transient-but-no-impact issues from showing up in dashboards.

jrockway · on March 10, 2022

If you're just going to lie, why have an SLA at all? It's like doing a clinical trial for a drug; a bunch of your patients die and you say "well they were going to die anyway it has nothing to do with the drug." If it's one person, maybe you can get away with that. When it's everyone in the experimental group, people start to wonder.

I have two arguments in favor of honest SLAs. One is, if customers expect that something is down, it can give them a piece of data with which to make their mitigation decisions. "A lot of our services are returning errors", check the auto-updated status page, "there may be an issue with network routes between AZs A and C". Now you know to drain out of those zones. If the status page says "there are no problems", now you spend hours debugging the issue, costing yourself far more money in your time than you spend on your cloud infrastructure in the first place. If having an SLA is the cause of that, it would be financially in your favor to not have the SLA at all. The SLA bounds your losses to what you pay for cloud resources, but your losses can actually be much higher; lost revenue, lost time troubleshooting someone else's problem, etc.

The second is, SLA violations are what justify reliability engineering efforts. If you lose $1,000,000 a year to SLA violations, and you hire an SRE for $500,000 a year to reduce SLA violations by 75%, then you just made a profit of $250,000. If you say "nah there were no outages", then you're flushing that $500,000 a year down the toilet and should fire anyone working on reliability. That is obviously not healthy; the financial aspect keeps you honest and accountable.
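The arithmetic in that paragraph can be sketched in a few lines of Python. The function name is hypothetical, and the model is deliberately crude: it assumes the only benefit of the hire is avoided SLA payouts and the only cost is salary.

```python
def sre_net_benefit(annual_sla_losses: float, sre_cost: float, reduction: float) -> float:
    """Yearly net gain from a reliability hire.

    Assumptions: `reduction` is the fraction of SLA losses avoided (0.0 to 1.0),
    and nothing else changes.
    """
    avoided_losses = annual_sla_losses * reduction
    return avoided_losses - sre_cost


if __name__ == "__main__":
    # Figures from the comment: $1,000,000 in losses, a $500,000 SRE, a 75% reduction.
    print(sre_net_benefit(1_000_000, 500_000, 0.75))
    # If outages are never acknowledged, recorded losses are zero and the hire looks like pure cost.
    print(sre_net_benefit(0, 500_000, 0.75))
```

With the comment's figures the first call comes out to a $250,000 gain, and the second to a $500,000 loss, which is the "flushing it down the toilet" case.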

All of this gets very difficult when you are planning your own SLAs. If everyone is lying to you, you have no choice but to lie to your customers. You can multiply together all the 99.5% SLAs of the services you depend on and give your customers a guarantee of 95%, but if the 99.5% you're quoted is actually 89.4%, then you can't actually meet your guarantees. AWS can afford to lie to their customers (and Congress apparently) without consequences. But you, small startup, can't. Your customers are going to notice, and they were already taking a chance going with you instead of some big company. This is a hard cycle to get out of. People don't want to lie, but they become liars because the rest of the industry is lying.
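A quick sketch of the "multiply the SLAs together" step, in Python. It assumes every dependency is required for a request to succeed and that failures are independent, which real systems rarely satisfy. The count of ten dependencies is an illustrative assumption, not a number from the comment.

```python
import math


def composite_availability(availabilities):
    """Availability of a chain of required dependencies.

    Assumption: failures are independent, so the individual availabilities multiply.
    """
    return math.prod(availabilities)


if __name__ == "__main__":
    quoted = [0.995] * 10   # ten hypothetical dependencies, each quoted at 99.5%
    actual = [0.894] * 10   # the same ten if each really delivers 89.4%
    print(f"quoted: {composite_availability(quoted):.3f}")
    print(f"actual: {composite_availability(actual):.3f}")
```

Under those assumptions ten dependencies at 99.5% land just above 95%, so a 95% guarantee is barely supportable. If the real per-service figure is lower, the product falls well short of that guarantee.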

Finally, I just want to say I don't even care about the financial aspect, really. The 5 figures spent on cloud expenses are nothing compared to the nights and weekends your team loses to debugging someone else's problem. They could have spent the time with their families or hobbies if the cloud provider just said "yup, it's all broken, we'll fix it by tomorrow". Instead, they stay up late into the night looking for the root cause, finding it 4 hours later, and still not being able to do anything except open a support ticket answered by someone who has to lie to preserve the SLA. They'll never get those hours back! And they turned out to be a complete waste of time.

It's a disaster, and I don't care if it's a hard problem. We, as an industry, shouldn't stand for it.

viraptor · on March 11, 2022

It's not just a hard problem, it's impossible to go from metrics to automatic announcements. Give me a detected scenario and I'll give you an explanation for seeing errors which are unrelated to the service health as seen by customers. From basic "our tests have bugs", to "reflected DDoS on internal systems causes rate limit on test endpoint", to "deprecated instance type removal caused a spike in failed creations", to "bgp issues cause test endpoints failures, but not customer visible ones", to...

You can't go from a metric to diagnosis for a customer - there's just no 1:1 mapping possible, with errors going both ways. AWS sucks with their status delays, but it's better than seeing their internal monitoring.

jrockway · on March 11, 2022

I don't agree. I'm perfectly happy seeing their internal metrics.

I remember one night a long time ago while working at Google, I was trying a new approach for loading some data involving an external system. To get the performance I needed, I was going to be making a lot of requests to that service, and I was a little concerned about overloading it. I ran my job and opened up that service's monitoring console, poked around, and determined that I was in fact overloading it. I sent them an email and they added some additional replicas for me.

Now I live in the world of the cloud where I pay $100 per X requests to some service, and they won't even tell me if the errors it's returning are their fault or my fault? Not acceptable.

Corrado · on March 10, 2022

Corey Quinn has a great blog post[0] on why status pages are hard, especially for large organizations with lots of products.

[0] https://www.lastweekinaws.com/blog/status-paging-you/

saghm · on March 10, 2022

> I'd be totally fine just having alerts and metrics driving the status page. Why involve a human at all?

What happens if the monitoring service goes down or has a bug that causes it to incorrectly report the status as okay? Obviously this won't usually happen at the same time as the actual service goes down, but if it did, would people really believe that?

nostrebored · on March 9, 2022

It definitely is. For an issue like this, you will see relevant teams and delegates looped in very quickly. Getting approved wording about an outage requires some very senior people though. Often they have to be paged in as well.

Having worked at a few other large tech companies now -- Amazon's incident response process is honestly great. It's one of the things I miss about working there.

saltypal · on March 9, 2022

This. We have a 4-person team and posted our own incident about this 7 minutes before Amazon did. Surely they can aim a little higher.

ElevenLathe · on March 9, 2022

IME, this actually becomes more challenging as a company gets larger, not less (but that doesn't mean it can't be done).

saltypal · on March 9, 2022

Separate teams. We have a tiny team and even _we_ appoint a group to fix and a group or individual to do nothing but communicate.

smachiz · on March 9, 2022

sure, but if those people are updating the status pages to say something isn't right and we're looking into it, we're doomed.

mhio · on March 9, 2022

The truth assuaging usually takes 15-30 minutes.

saltypal · on Nov 2, 2021

Eleos Technologies (https://eleostech.com) | Multiple: Infrastructure Observability / Senior iOS Software Engineer | Permanently Remote (US-based applicants only) | Full time

Eleos Technologies is a growing 10-year-old company building communication software for truck drivers and field workers. We’re helping a diverse mix of customers—from mom and pop operations to household names—improve how they communicate with their employees by tackling information overload, reducing phone calls, and eliminating obsolete technologies. Our app is used by thousands of big-rig and small truck drivers, day and night, every day, and we've been on a sustainable growth curve for long enough that we're ready to grow our backend team!

Our driver-facing mobile app does some unique things, including assisting drivers with planning their trips, finding stopovers, managing their electronic duty log, and more. These features are (with a couple of specific and thoughtful exceptions) natively implemented, and we take the performance and stability of the app seriously.

Both roles are listed here with a bit more description, and you can apply there, or feel free to reach out to me directly (email is in my about/bio) if you don't trust throwing your resume into a large hiring tool! https://eleostech.com/careers

(I'm the hiring manager and the actual manager for the role, so I'll see it either way.)

saltypal · on Oct 1, 2021

Eleos Technologies (https://eleostech.com) | Senior iOS Software Engineer | Permanently Remote (US only) | Full time

Eleos Technologies is a growing 10-year-old company building communication software for truck drivers and field workers. We’re helping a diverse mix of customers—from mom and pop operations to household names—improve how they communicate with their employees by tackling information overload, reducing phone calls, and eliminating obsolete technologies. Our app is used by thousands of big-rig and small truck drivers, day and night, every day, and we've been on a sustainable growth curve for long enough that we're ready to grow our backend team!

Our driver-facing mobile app does some unique things, including assisting drivers with planning their trips, finding stopovers, managing their electronic duty log, and more. These features are (with a couple of specific and thoughtful exceptions) natively implemented, and we take the performance and stability of the app seriously.

For more details and to apply, head over to: https://jobs.lever.co/eleostech/f6ccb4be-24c9-4fa0-b695-e83d...

As a remote team (since founding!), we're super lucky to have some great folks who use the ability to work from home to spend more time with their kids, help volunteer at a school, or otherwise be more fulfilled than they would be working from an office. You could join us!

If that sounds fun and rewarding to you, the full description and info about applying are at the link above.

If you have questions, feel free to comment here. If something in the post discourages you from applying, I'm also curious to hear about that! I'm happy to share info about our interview process, stack, and vertical.

saltypal · on April 1, 2021

Eleos Technologies (https://eleostech.com) | Senior Erlang/Elixir Software Engineer / Senior SRE | Permanently Remote (US only) | Full time

Eleos Technologies is a growing 9-year-old company building communication software for truck drivers and field workers. We’re helping a diverse mix of customers—from mom and pop operations to household names—improve how they communicate with their employees by tackling information overload, reducing phone calls, and eliminating obsolete technologies. Our app is used by thousands of big-rig and small truck drivers, day and night, every day, and we've been on a sustainable growth curve for long enough that we're ready to grow our backend team!

Our driver-facing mobile app does some unique things, including assisting drivers with planning their trips, finding stopovers, managing their electronic duty log, and more. This is all powered by an Erlang/Elixir OTP application, and we're looking for someone who's used this stack in a production setting before, recognizes the advantages of OTP, and would like a key role in maintaining and extending such a system. https://eleostech.com/careers/backend-elixir

We're also looking for somebody with more interest or background in the operations side of things, since we're finally at a point where we need somebody thinking about this full-time in addition to our existing backend team sharing the pager and ops responsibilities. https://eleostech.com/careers/sre

As a remote team (since founding!), we're super lucky to have some great folks who use the ability to work from home to spend more time with their kids, help volunteer at a school, or otherwise be more fulfilled than they would be working from an office. You could join us!

If that sounds fun and rewarding to you, the full description and info about applying are in the two links above.

If you have questions, feel free to comment here. If something in the post discourages you from applying, I'm also curious to hear about that! I'm happy to share info about our interview process, stack, and vertical.