xz in multithreaded mode supports random access too, at least theoretically. But there's no reasonable way with xz to actually find the file you want in a tarball; that's the bit pixz provides.
I was thinking about that "no reasonable way" comment. When you uncompress the first block, you find the first tar header. From that you know the uncompressed offset of the next tar header. If the compressed stream supports random access, you should be able to uncompress just the block containing that offset (assuming the uncompressed block size is a multiple of 512 bytes) to read the next tar header. You can repeat this until you get to the file you are looking for.
With large files, this approach would be of huge value. If the files tend to be no larger than block_size - 512 bytes, there will be no speedup.
Of course, this would need to be implemented directly in tar, not by piping the output of a decompression command through tar.
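To make that concrete, here is a minimal Python sketch of the header walk. The read_at(offset, length) callback is hypothetical and stands in for whatever random-access decompression the container format provides; GNU long-name and pax extension headers are ignored for brevity.

    BLOCK = 512

    def octal(field: bytes) -> int:
        # Tar header numeric fields are NUL/space-terminated octal strings.
        return int(field.split(b"\0", 1)[0].strip() or b"0", 8)

    def find_member(read_at, wanted_name: str):
        # Walk tar headers without reading any file data. read_at(offset, length)
        # must return `length` uncompressed bytes starting at `offset`, e.g. by
        # decompressing only the compression block(s) covering that range.
        offset = 0
        while True:
            header = read_at(offset, BLOCK)
            if len(header) < BLOCK or header == b"\0" * BLOCK:
                return None  # end of archive
            name = header[0:100].split(b"\0", 1)[0].decode("utf-8", "replace")
            size = octal(header[124:136])
            if name == wanted_name:
                return offset + BLOCK, size  # data offset and length
            # Skip the member's data, padded to the next 512-byte boundary.
            offset += BLOCK + ((size + BLOCK - 1) // BLOCK) * BLOCK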
I like almost everything about LXD, except the attitude towards networking. The project has rejected simple solutions such as port forwarding, saying that managing the network shouldn't be LXD's job. Instead, they'd like the user to manually configure their own bridges or routes or iptables chains.
I can kinda understand their point of view; there's no simple solution that will please everyone. But most developers or IT folks aren't networking experts, and LXD won't be an intuitive tool for them without a simpler mode of operation.
I have just finished a small LXC deployment, and it did take a little bit of time to figure out which direction to go with the networking.
I ended up choosing to use bridging to connect all the physical network adapters to the ones in the containers. This is nice because I set up each container with its own IP address, which travels with it when the container is moved to a new host.
> [...] most developers or IT folks aren't networking experts, and LXD won't be an intuitive tool for them without a simpler mode of operation.
Really, no expertise is required, just a basic understanding of how the heck the network works. If somebody can't grasp what a bridge interface is or how NAT operates, he apparently doesn't have the qualifications for writing software.
Unfortunately it took quite a long time; the Mozilla process is confusing to a newcomer, even one with a lot of open source project experience. I definitely second the recommendation of finding a mentor.
I don't think I would have been able to get my two patches into Mozilla if it wasn't for the mentored bugs program. It's incredibly useful and I wish more projects would implement it. *ahem* OpenStack *cough*.
OpenStack is "commercial" open source. Several (if not most) developers who contribute are paid by "companies", so typically new "contributors" find mentors within their "company".
lzma parallelizes very well! My implementation of parallel xz ( https://github.com/vasi/pixz ) works quite well. There are others as well, including pxz and the alpha version of the standard xz. All these tools produce compressed files that conform to the xz format and can be decompressed by any xz utility.
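As a toy illustration of why this parallelizes (this is not how pixz is implemented internally; pixz writes a single stream with a block index so it can also do random access), you can compress independent chunks in separate processes and concatenate the resulting .xz streams, since the standard xz tool decompresses concatenated streams. A rough Python sketch, with the chunk size and worker count as arbitrary assumptions:

    import lzma
    from concurrent.futures import ProcessPoolExecutor

    CHUNK = 8 * 1024 * 1024  # 8 MiB per chunk, an arbitrary choice

    def compress_chunk(chunk: bytes) -> bytes:
        # Each chunk becomes an independent .xz stream.
        return lzma.compress(chunk, format=lzma.FORMAT_XZ, preset=6)

    def parallel_xz(src_path: str, dst_path: str, workers: int = 4) -> None:
        with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
            chunks = iter(lambda: src.read(CHUNK), b"")
            with ProcessPoolExecutor(max_workers=workers) as pool:
                # map() preserves chunk order; note it submits everything up
                # front, so this reads the whole file into memory: fine for
                # an illustration, not for huge files.
                for compressed in pool.map(compress_chunk, chunks):
                    dst.write(compressed)

The output decompresses with plain xz -d (or Python's lzma.decompress); it just lacks the index pixz adds for random access.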
Your first point is interesting, but I'm not so sure about the second one:
Given certain search strings it should be possible for amazon to detect that this [person] engages in piracy.
Amazon does not see the search as coming from an individual. Rather, the Ubuntu servers act as an intermediary. All Amazon can see is "some unidentifiable Ubuntu user is searching for this". That's hardly something they could report to any authorities.
>All Amazon can see is "some unidentifiable Ubuntu user is searching for this". That's hardly something they could report to any authorities.
It's surprisingly easy to take anonymous search data and figure out who generated it. You might remember the mess that happened when AOL released anonymized search data (hint: people's identities were compromised). http://en.wikipedia.org/wiki/AOL_search_data_leak
Consider the simple example of files that are named after the person doing the search.
Anonymization of queries is really, really hard, and I see no system, academic or otherwise, that would protect users from being identified.
For instance, if someone with an Amazon account were to accidentally click on a link to an Amazon product, that would immediately link the person and the query. Imagine someone downloads a movie, searches for it to find the file, and then mistakenly clicks the Amazon result instead of the pirated copy.
The #1 reason I want Gmail backup is to be able to migrate in case something goes wrong with Gmail. Does Gmvault make this possible? I don't see any documentation about exporting from Gmvault to another IMAP server, or to a common format like Maildir or mbox so that I could run my own server.
If the Gmvault on-disk format is well defined, why not produce a separate tool to convert it to Maildir or whatever? Gmvault does one thing well (allegedly, I've not used it); "export to some other format" just seems like feature creep, especially when conversion can be done offline.
Gmvault stores each email as an individual text file (an EML file), so it should be pretty easy to add some export functionality (it might be a separate tool). I will add that to the roadmap for v2.0.
Currently Gmvault gives you the ability to save your emails on disk and restore them to any Gmail account with all their features; for example, labels will be restored identically. Many email backup tools are very generic, and you will lose quite a lot of information when restoring your emails to a plain IMAP server. All emails are stored in individual EML files with unique filenames, so the format is quite open and it should be pretty easy to create a Maildir structure from it. I will add that to the roadmap for v2.0. I am not convinced by mbox because it is a single file with all the emails concatenated in it. I will see what to do with it.
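For what it's worth, turning a directory of EML files into a Maildir is only a few lines with Python's standard mailbox module. A minimal sketch, assuming one RFC 822 message per .eml file and hypothetical paths (Gmail-specific metadata such as labels is not carried over):

    import mailbox
    from pathlib import Path

    def eml_dir_to_maildir(eml_dir: str, maildir_path: str) -> None:
        # Copy every .eml file in eml_dir into a Maildir mailbox,
        # creating the Maildir if it does not exist yet.
        md = mailbox.Maildir(str(Path(maildir_path).expanduser()), create=True)
        try:
            for eml in sorted(Path(eml_dir).expanduser().glob("*.eml")):
                with open(eml, "rb") as fh:
                    md.add(mailbox.MaildirMessage(fh))
        finally:
            md.close()

    # Hypothetical paths:
    # eml_dir_to_maildir("~/gmvault-db/db/2012-10", "~/Maildir")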
I also realize it might not make sense to use Maildir as the native DB for Gmvault, since you don't want to store multiple copies of each email like OfflineIMAP does. A 'gmvault export' option would be sufficient for my purposes.
I use OfflineIMAP to back up Gmail and Google Apps mail. I have it set up to run every evening from cron, backing up a Gmail account as well as a couple Apps accounts. It uses Maildir format.
Yeah, I'm currently using OfflineIMAP until I find something better. But it's really not a perfect solution: it downloads emails multiple times (once per label), it doesn't restore to Gmail terribly well, it's pretty slow, and it crashes sometimes. If Gmvault adds some sort of export facility, I'd switch.
As said above, OfflineIMAP will create a plain copy of your emails, and you will lose quite a lot of very useful features when copying them to a standard IMAP server.
That said, I understand the need and will think about an approach to allow users to leave Gmail for another IMAP server in v2.0.
Another point is that Gmvault is meant to be easy to use. OfflineIMAP is a very good tool, but it is meant for advanced users like us, whereas Gmvault is meant for users with very little computer knowledge.
Gmvault v2.0 will go further, as it will have a GUI (while still having a CLI mode) to allow my Grandma to back up her emails :-).
There's a rough open source clone of Filelight for Macs[1]. Disclaimer: I'm the original author, but no longer involved.
A commercial app in the same genre, DaisyDisk[2], is much more flashy and polished. You can also get the KDE/Mac version of Filelight from MacPorts, as part of the 'kdeutils4' package.
Remember that compressors, including xz, support multiple compression levels. The default level for xz is 6, which is perhaps too far on the small-but-slow side. Levels 2 and lower tend to give compression ratios similar to bzip2, and are considerably faster.
Also, note that bzip2 decompression is very slow; xz usually beats it by a factor of two or more.
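If you want to check the size/speed trade-off on your own data, here's a quick-and-dirty Python comparison of a few xz presets against bzip2 (single-threaded; the sample file name is hypothetical):

    import bz2
    import lzma
    import time

    def benchmark(data: bytes) -> None:
        # Compare compressed size and wall-clock time for a few xz presets
        # and for bzip2 at level 9 (its default and maximum).
        for preset in (1, 2, 6, 9):
            t0 = time.perf_counter()
            out = lzma.compress(data, preset=preset)
            print(f"xz -{preset}: {len(out)} bytes in {time.perf_counter() - t0:.2f}s")
        t0 = time.perf_counter()
        out = bz2.compress(data, compresslevel=9)
        print(f"bzip2 -9: {len(out)} bytes in {time.perf_counter() - t0:.2f}s")

    if __name__ == "__main__":
        # Substitute something representative of your own workload.
        with open("sample.tar", "rb") as fh:
            benchmark(fh.read())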
I agree that the default level (6) for xz probably errs too much on favoring file size over speed. My tests with compression levels 1-2 do indeed show modestly improved size and speed performance relative to single-threaded bzip2.
The fact remains, however, that I can't seem to find a simple way to install a parallelized version of xz. Perhaps I'll post an issue in the GitHub issue tracker for pixz and see if we can't resolve that. :)
Another nice thing about pixz is that it does parallel decompression as well as compression.
(Disclaimer: I'm the original author of pixz.)