Hacker News
A remotely operated vehicle livestreaming the deep sea (vice.com)
112 points by rbanffy on May 10, 2017 | 16 comments




Very happy to see the Global Explorer ROV still out there, hard at work. This vehicle was Chris Nicholson's baby and it looks like the crew at Oceaneering are keeping his legacy going.

Chris was one of the early inventors / developers of ROV technology and he was famous in the ocean engineering world. He passed away suddenly in 2015.

Some background:

http://www.rovexchange.com/nc_interviews.php

http://oceanexplorer.noaa.gov/explorations/15biolum/backgrou...

http://oceanexplorer.noaa.gov/explorations/16arctic/logs/jul...

http://www.capenews.net/falmouth/obituaries/christopher-j-ni...


Wow... that Motherboard web page is packed with enough cruft to bring my brand new Kaby Lake i5 to its knees.


Assuming nothing is listening on 127.0.0.1, this should fix the problem.

   cat << eof >> /etc/hosts
   127.0.0.1 vice-web-statics-cdn.vice.com
   127.0.0.1 fonts.googleapis.com
   127.0.0.1 www.google-analytics.com
   127.0.0.1 d31qbv1cthcecs.cloudfront.net
   127.0.0.1 d5nxst8fruw4z.cloudfront.net
   127.0.0.1 b.scorecardresearch.com
   127.0.0.1 advice-ads-cdn.vice.com
   127.0.0.1 vice-publishers-cdn.vice.com
   127.0.0.1 vice-sundry-assets-cdn.vice.com
   eof
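If the blocklist grows, the entries can also be generated from a plain list of domains instead of typed by hand. A minimal sketch (the domain list here is illustrative; as root, redirect the output into /etc/hosts):

```shell
# Emit one loopback entry per domain to block.
for d in fonts.googleapis.com www.google-analytics.com b.scorecardresearch.com; do
  printf '127.0.0.1 %s\n' "$d"
done
```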
Alternatively, the sed script below will give you just the article HTML, images, and embedded YouTube, along with all the JSON data enclosed in <pre></pre> tags.

   curl 'https://motherboard.vice.com/en_us/article/this-robot-is-livestreaming-all-the-gnarly-stuff-its-seeing-in-the-deep-sea?utm_source=mbtwitter' \
   |sed '
   /window.__PREFETCH_DATA/!d;
   s/window.__PREFETCH_DATA = //;
   s/\"[:,]\"/\
   /g;
   s/,\"/\
   /g;
   s/\":/\
   /g;
   s/.[}{]./\
   /g;
   s/[}{]./\
   /g;
   s/[][]//g;
   s/id/<pre>&/;
   s/body/&<\/pre>/;
   s/\\u003C/</g;
   s/\\u003E/>/g;
   s/\\u002F/\//g;
   s/\\"/\"/g;
   s/url/<pre>&/;
   s/embed_code/&<\/pre>/;
   s/autoplay/<pre>&/;
   s/;/<\/pre>/28;
   '
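The trickiest part of the script above is undoing the JavaScript-style escapes in the embedded JSON. That step, isolated, looks like this (a minimal sketch; the sample string is made up):

```shell
# Decode the \u003C / \u003E / \u002F / \" escapes found in the embedded JSON.
decode() {
  sed -e 's/\\u003C/</g' -e 's/\\u003E/>/g' -e 's,\\u002F,/,g' -e 's/\\"/"/g'
}
printf '%s\n' '\u003Cp\u003EHello\u003C\u002Fp\u003E' | decode
```

This prints `<p>Hello</p>`.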


I'm not entirely sure I understand what is happening, though I have some grasp of the aim/purpose, and I have pulled pages using curl before. I'm just curious: does this mean you have to analyze their page (e.g., in the browser console), find links like the ones you posted above for CloudFront, etc., and then assemble the script so you can format the output? Not my field, but I use uBlock/Adblock Plus and it's not enough all the time; some videos have an overlay which I swear has a double-click counter, where you have to open two ads before you can actually push the play button to play the video.

It is interesting to grab data and package it yourself through your own reader, though I wonder if it loses the site's design/feel... but you're probably just after the information anyway.


"... I use adblock plus and it's not enough all the time..."

Next time this happens it could be useful to make a submission to HN about it.

We might be able to identify and/or solve the problem.

I do not use an ad blocker, nor do I use Python or youtube-dl, yet I never see any ads, and I download all videos before watching them. Indeed, I access the web for the information, not the inconsistent design/feel.

What I posted above illustrates examples of two alternative approaches:

1. Block sources of undesired resources: ads, tracking, etc. The "links" listed are domains used for undesired resources.

2. Only make requests to sources of desired resources: the article and its accompanying images and video. The script extracts only what we want from the html page.

I use approach #2 more than #1.

As far as I know, ad blockers use blocking (#1) exclusively; they need to maintain a list of domains to block.
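A minimal illustration of approach #2: pull only the resource URLs you actually want out of a saved page, then request just those. (The file name and grep pattern below are made up; the pattern is a simplification, not a real URL parser.)

```shell
# Create a sample saved page, then extract the unique image URLs it references.
cat > page.html <<'EOF'
<img src="https://example.com/a.jpg"><img src="https://example.com/a.jpg">
<img src="https://example.com/b.png">
EOF
grep -oE 'https://[^"]*\.(jpg|png)' page.html | sort -u
```

Piping the output through `while read -r url; do curl -sO "$url"; done` would then fetch only those resources, and nothing else.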


I wasn't sure if you were serious about posting to HN about ad blocking, haha; I complain enough as it is about my pathetic life.

In the case of the videos, they're sourced from other domains. It's funny that the site's argument is "Google technically does this too, so why are we different?" But I think the embedded iframes that contain the video players have their own ads. I don't know; it's hard to read JS code when it's minified (you have to un-minify it).

You mentioned python?

Anyway thanks for the response.


Something like that sed script would be nice to pipe all of my browsing through, before it ever hits Chrome.


If you are serious and you want a script like this for any given page that you want de-crufted and reformatted, let me know. The offer applies to any HN reader.

I make these for myself all the time.

I do one-offs in sed, and if it is something I will reuse, I redo it in C using flex.

If you want something generalized to all web pages that removes google analytics, doubleclick, scorecardresearch, etc. of course that is possible too. I am happy to make this for anyone who wants it.

Another solution is to run a proxy on your computer that filters out garbage. Such proxies used to be more popular when advertising began to take over the web, but most of these projects seem to have been abandoned and forgotten (junkbusters, etc.).

Nowadays it seems like people try to accomplish this via Javascript running in the browser.

Turning off Javascript is also highly effective at disabling advertising. But note this may not stop things like img src tags pointing at tracking pixels, malicious iframes, and other elements that graphical browsers will load automatically.
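A crude sketch of why markup-level filtering still matters with JavaScript off: strip inline script blocks and lines referencing a known tracking domain from saved HTML. (The input and patterns are contrived; real pages need a proper HTML parser, since scripts can span lines and carry attributes.)

```shell
# Create a sample page, then drop <script> blocks and tracking-pixel lines.
cat > in.html <<'EOF'
<p>keep this paragraph</p><script>track()</script>
<img src="https://b.scorecardresearch.com/p.gif" width="1" height="1">
EOF
sed -e 's,<script>[^<]*</script>,,g' -e '/scorecardresearch/d' in.html
```

Only `<p>keep this paragraph</p>` survives.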

Also, TCP clients do not normally store or send cookies. Even in clients that do provide this functionality, e.g., curl or wget, cookies are off by default, unlike in a graphical web browser. Consider the case where the user retrieves the page with a TCP client and then views it in a browser.


megous | https://news.ycombinator.com/item?id=13226170

For each use case that is not free browsing, I create an Electron app that never executes any code from the web or uses any external style. It only uses XHR to fetch HTML pages, JSON data, and other static stuff, then transforms that data and uses it in a custom UI designed for the use case.


> Another solution is to run a proxy on your computer that filters out garbage.

That's what I was thinking; a proxy server that can run on a pi. Plug it into the wall, configure wifi, connect to it, and bam - instant safety. (Assuming you trust the code.)


>Nowadays it seems like people try to accomplish this via Javascript running in the browser.

I filter out lots of cruft by doing the opposite - using NoScript to block extraneous junk.


Yeah, that's why I don't read vice sites anymore...




Lucky for you, the next generation will accelerate the internet. Save your money.



