I've written software to replace this type of work; in my case it was classifying communications as to whether they were attorney-client privileged or not. It has to be cheaper: it took about a month of my time, plus about a week working with a high-priced lawyer to write the ruleset and confirm it caught everything and excluded all the right stuff. With that software they've essentially got their best lawyer going through every piece of communication.
It has to be cheaper; I can't imagine paying someone with a law degree to go through all of that. And now their lawyers are freed up to do something more useful.
Tries for text search: really the only thing I learned is that if you're searching through a body of text for 40,000-odd keywords, you need to turn the document into a trie. But that's something you'd learn as soon as you read any of the research on the topic.
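(One common way to do it is to build the trie over the keyword list and then stream the document through it; add failure links and you get Aho-Corasick. A minimal Python sketch of the idea, not the original Perl, with all names invented:)

    # Minimal sketch (Python, not the original Perl): build a trie over the
    # keyword list, then walk the document through it. Adding failure links
    # would turn this into a true single-pass Aho-Corasick scan.
    END = object()  # marker meaning "a keyword ends at this node"

    def build_trie(keywords):
        root = {}
        for word in keywords:
            node = root
            for ch in word.lower():
                node = node.setdefault(ch, {})
            node[END] = word
        return root

    def find_keywords(text, trie):
        """Yield (offset, keyword) for every keyword occurrence in text."""
        text = text.lower()
        for i in range(len(text)):
            node, j = trie, i
            while j < len(text) and text[j] in node:
                node = node[text[j]]
                j += 1
                if END in node:
                    yield i, node[END]

    # Hypothetical keyword list; the real system used ~40,000 of them.
    trie = build_trie(["privileged", "attorney", "work product"])
    hits = list(find_keywords("This memo is attorney work product.", trie))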
It's basically a programming language for text search that's stored in a database. If I had to do it again I'd base it off a parser generator and use a DSL, possibly using git for versioning.
I ended up basically storing an AST in the DB and then creating an object for each node in the decision tree. It was written in Perl, but that was more of a constraint than a choice.
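Something like this, roughly (a Python sketch of the node-per-row idea, not the actual system; the table and column names are made up):

    # Rough sketch of rule AST nodes stored as rows in a database, each loaded
    # into an object that evaluates itself against a document. The schema
    # (rule_nodes, rule_edges) is invented for illustration.
    import sqlite3

    class Contains:                     # leaf: "document mentions this phrase"
        def __init__(self, phrase): self.phrase = phrase.lower()
        def matches(self, doc): return self.phrase in doc.lower()

    class AllOf:                        # interior node: logical AND of children
        def __init__(self, children): self.children = children
        def matches(self, doc): return all(c.matches(doc) for c in self.children)

    class AnyOf:                        # interior node: logical OR of children
        def __init__(self, children): self.children = children
        def matches(self, doc): return any(c.matches(doc) for c in self.children)

    def load_node(db, node_id):
        """Recursively build the rule object for one AST node stored in the DB."""
        kind, arg = db.execute(
            "SELECT kind, arg FROM rule_nodes WHERE id = ?", (node_id,)).fetchone()
        child_ids = [r[0] for r in db.execute(
            "SELECT child_id FROM rule_edges WHERE parent_id = ? ORDER BY position",
            (node_id,))]
        children = [load_node(db, cid) for cid in child_ids]
        if kind == "contains": return Contains(arg)
        if kind == "all":      return AllOf(children)
        if kind == "any":      return AnyOf(children)
        raise ValueError(f"unknown node kind: {kind}")

    # Tiny in-memory example: "privileged" AND ("counsel" OR "attorney")
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE rule_nodes (id INTEGER, kind TEXT, arg TEXT)")
    db.execute("CREATE TABLE rule_edges (parent_id INTEGER, child_id INTEGER, position INTEGER)")
    db.executemany("INSERT INTO rule_nodes VALUES (?, ?, ?)", [
        (1, "all", None), (2, "contains", "privileged"),
        (3, "any", None), (4, "contains", "counsel"), (5, "contains", "attorney")])
    db.executemany("INSERT INTO rule_edges VALUES (?, ?, ?)", [
        (1, 2, 0), (1, 3, 1), (3, 4, 0), (3, 5, 1)])
    rule = load_node(db, 1)
    print(rule.matches("Forwarding our attorney's privileged analysis."))  # True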
Things I didn't know (no idea how accurate they are, if anyone has more detail, chime in):
On the Enron Corpus:
"He bought a copy of the database for $10,000 and made it freely available to academic and corporate researchers. Since then, it has become the foundation of a wealth of new science — and its value has endured, since privacy constraints usually keep large collections of e-mail out of reach. “It’s made a massive difference in the research community,” Dr. McCallum said.
The Enron Corpus has led to a better understanding of how language is used and how social networks function, and it has improved efforts to uncover social groups based on e-mail communication."
On future law employment:
Quantifying the employment impact of these new technologies is difficult. Mike Lynch, the founder of Autonomy, is convinced that “legal is a sector that will likely employ fewer, not more, people in the U.S. in the future.” He estimated that the shift from manual document discovery to e-discovery would lead to a manpower reduction in which one lawyer would suffice for work that once required 500 and that the newest generation of software, which can detect duplicates and find clusters of important documents on a particular topic, could cut the head count by another 50 percent.
The computers seem to be good at their new jobs. Mr. Herr, the former chemical company lawyer, used e-discovery software to reanalyze work his company’s lawyers did in the 1980s and ’90s. His human colleagues had been only 60 percent accurate, he found.
“Think about how much money had been spent to be slightly better than a coin toss,” he said.
It's mainly accurate. My wife works for a major vendor in this sector, and I'm a pre-law student with a hacking background who knows more about this technology than I can say here. However, discovery is just one part of the legal industry. Lynch's numbers are exaggerated, though his basic points are correct. Don't generalize too much from this.
Correct. This has little to no effect on transaction or corporate practice. Some of these vendors claim to handle M&A due diligence. Having been through one or two mergers/sales myself, I am convinced that no expert system in the near term will be able to replace humans in that area. Unlike discovery, which is focused on a readily-searchable data set like email, the job of piecing together a company's assets and contract obligations is extremely complex business, where the bottleneck is often the mish-mash of paper records and broken contract management tools, and nuanced judgment calls about critical asset dispositions must be made every few hours over the course of months and months. I'm not saying there isn't an obvious, eventual technology solution to these problems – just that it isn't anywhere near as mature as litigation support, discovery, and patent practice.
“Over the long run we find things for people to do. The harder question is, does changing technology always lead to better jobs? The answer is no.”
Overall, I have to disagree. Letting machines do all the mind-numbingly boring work leaves the rest for human beings. And we pay a lot less for the final products, leaving resources for new jobs, which will also have the boring parts automated.
Disagree? Hire a hundred people to do all your drudgery. Then pretend you're paying them hardly anything. I submit your life will suck less.
You're assuming most human beings are capable of doing knowledge-based work. I don't think this is true. The reason knowledge-based workers make so much is precisely because the skills necessary to do that work are rarer than those for manual labor. Fully automating manual labor means a large class of people (possibly a majority) will not have any marketable skills from which to make a living.
Furthermore, far fewer knowledge-based jobs are necessary to run an economy. At the end of the day, most knowledge work is a one-off thing. Once the program is written, it can be duplicated at no cost. Once the drug is discovered, that knowledge spreads at no cost. Of course there are many exceptions, but this is the trend. There are necessarily orders of magnitude fewer knowledge-based jobs needed to run an economy than labor jobs.
When the automation revolution comes we're going to need a new basis for the economy because the current one just doesn't fit in that world.
You seem to be assuming that knowledge-based workers can deplete the corpus of knowledge work.
I don't think that's the case; once one program is written, another will be needed, once one drug is discovered, there will be still more diseases to cure and more side effects to eliminate.
I'd even go so far as to say that knowledge work is growing and will continue to grow: the more you know, the more you know you don't know, right? There wasn't a need for people working on computers or airplanes as recently as a century ago. Now how many people are employed in those fields?
My assumption is that growth in knowledge work is dependent on there being growth in disposable income for the population. In my scenario where a large portion of the population has no marketable skills and therefore can't find a job, this effectively puts a cap on the amount of knowledge work the economy can sustain. See my reply to stretchwithme for a further explanation of my thoughts on this scenario.
The growth in knowledge is dependent on how much brain power there is. Everything we do is subject to improvement by anyone that cares to analyze and innovate.
In the scenarios of those against technology 50 years ago, they also could not see how the economy would evolve or how people would adapt. But that lack of vision isn't proof that they were right.
There are more knowledge-based jobs all the time. And more and more jobs all the time involve working with knowledge.
It seems you assume that the economy cannot continue to evolve. I see no reason to conclude that.
People will solve problems as they arise.
Just because a program or drug has large IP costs and a low marginal cost to produce doesn't mean it's a fundamentally different thing.
An oil well takes a lot of knowledge to find but not very much to profit from once you have it. McDonald's franchise system took a lot of intellectual work to perfect, but making copies of restaurants is cheaper, and more likely to succeed, than starting your own.
A lot of things have a lot of IP involved and that's nothing new.
Sure there is a lot of knowledge work associated with what you mentioned, and with most products out there. But there is a many-orders-of-magnitude difference between the manual labor aspect and the IP aspect of bringing a product to market.
(I know I'm taking a leap here, but) I believe this scale difference is a necessary part of the equation. There has been growth in knowledge-based jobs precisely because of growth in population plus growth in the productivity of manual labor. In a world where production costs are near zero because production is fully automated, there are still the built-in costs of the materials themselves, which limits how cheap a product can become. Unfortunately, 60% of the population now has no money to speak of, so even a fully "optimized" price is too much. There will potentially be many dirt-cheap products but not nearly enough customers to buy them.
As I'm sure any discussion involving economics would :) I've given it a good think and I find it hard to come up with something concrete.
In our current economy there is still the driving force of profit that creates jobs. All jobs that don't directly support the necessities of life are either directly selling products to consumers or supporting companies that do. There still has to be a mass of people to sell crap to for the economy to function. Under the assumption that all manual-labor, factory, and most service-level jobs are gone, there is only a small market for non-essential products, which effectively puts a cap on the number of knowledge workers.
Browsing through the new section on HN I came across this: http://www.slate.com/id/2287531/. It seems to support my contention that technology itself does not cause economic growth.
"Over the long run we find things for people to do."
The problem with such a prediction is that we have so very little data to base it on.
If computers are to replace most of man's intellectual activities, that change compares to the industrial revolution, but it's not a very good comparison, because this time we will also have machines and robots to replace physical labor.
It's basically a prediction by an economist, and we all know how accurate those are.
All we have is history to go on, and the fact that new jobs have always replaced old ones. Whenever something is automated, the money no longer spent on salaries is spent on other things, things which people must be paid to provide.
Which is why farm automation put 80% of the people out of work and there are still lots of jobs. Except, of course, when bad policy gums up the works.
> All we have is history to go on and the fact that new jobs have always replaced old ones.
I've always been a little skeptical of this. We all know that the official unemployment rate is a bit misleading as it only counts those actively looking for work, but I suspect historical comparisons show that it's even more misleading than we think.
Let's go back to the 1800s, when technology hadn't yet, according to the Luddite hypothesis, put tons of people out of work.
Back then, retirement didn't exist. So bam, there's one segment of the population that was employed then. Schooling went to high school at most, and higher education reached only a tiny fraction of the population, so there's another large segment of the population that was employed: all those who would now be stuck in school. Basically everyone who is now in undergraduate or graduate school would have been out working a job.
And of course, child labor was legal, so that implies an awful lot of jobs there too. And what about all those people with Downs syndrome or other forms of retardation or mental illness? They were all down on the farm helping with the baling or something. (Just counting the kids in special ed, and not all the adults, there's like 6 million of them: http://en.wikipedia.org/wiki/Special_education_in_the_United... )
Now surely some of all of that is necessary to keeping a modern economy running rather than an 1800s one. But how much of that is just killing time or keeping people out of the employment market? Thought experiment: if a fairy suddenly granted all the kids and students and retirees an education and credentials to match (all 20 or 50 million or whatever of them), how many years would it take the market to absorb them and match them up with jobs? Would it ever?
Farm automation also put draft horses out of work.
My case is Britain in the early 1800s, which had a population of about 40 million people and 4 million draft horses.
A draft horse did the manual labor of about four men, making them worth about 16 million agrarian jobs, and they were also put out of work. Horses, of course, aren't horribly adaptive and don't revolt when you keep them from reproducing, and so over the century they virtually disappeared with the jobs they lost.
Furthermore, if you consider the move in the early 1900s from a six day to a five day work week, that's a lot more human jobs that have never been replaced.
We're rapidly approaching (and have almost certainly passed) the point where it would be feasible for the average worker to work far less than 40 hours a week.
Notwithstanding the cultural & structural reasons we can't get there from here, that is.
But at some point in the future, most of what we now call "work" will not be done by people.
This has the potential to be incredibly liberating, if by then we have found ways to decouple a person's ability to meet basic economic necessities from their ability to find a job.
For those that don't know, this is the model by which electronic discovery is generally conducted today: http://edrm.net/
It's a market worth over a billion dollars, and there are plenty of enterprisey vendors that have positioned themselves into every box on that EDRM chart. Law firms and corporate counsel will happily pay for this stuff because the amounts at stake in litigation are so huge.
Actually, the EDRM model is only one model, and it is rapidly falling by the wayside. For example, ECA - early case assessment - is becoming a very important part of ediscovery but you can't fit ECA into the EDRM as the EDRM is essentially linear and doesn't allow for loops.
I admit that my understanding may be off, but it seems like ECA is just using the same EDRM model and iterating faster upfront. I feel that it's more of a vendor buzzword than a new way of conducting electronic discovery.
The industry is full of buzzwords, hype, and snake oil, alas. ECA is one example, and I'd argue that the EDRM is another. The EDRM was good a few years ago but hasn't really evolved to reflect current and developing practices.
ECA describes a part of the process that was hard to do in the early days because the tools didn't support the faster iteration upfront. As people realized the value of that iteration, the tools improved and the term gained power and marketing presence.
Second place in a market can still allow a company to be profitable and growing. Second place in a lawsuit can impact not only profits and growth but viability.
It is a real problem and not one we're trying to create. There was a lot of discussion at Legal Tech this year about the need to determine the accuracy of ediscovery processes, and in particular, review processes.
I think we can expect much more of this sort of thing. Technology always removes humans from the work force, and we now have the right set of resources - reliable, cheap computers and a large enough pool of skilled programmers - to do to office workers with software what we did to farm workers with tractors and auto workers with robots.
There is still a big piece missing -- the methods for translating the business rules these people use into workable software are error-prone, cumbersome, and ad hoc.
Early on, the programs for robotic arms were full of errors (and likely still are), but the programmers identified those errors and fixed them, over and over again. For that matter, programming in general is error-prone. Bugfixing happens.
Businesses which can hire ten men to oversee a buggy program rather than twenty men to do the actual work will do so, and over time the program will get better and need fewer overseers. It's an iterative process, and one that doesn't need perfect programs to bootstrap.
That said, the point stands that it's actually getting easier to build machines to do the thinking work than the work requiring highly articulate hands: cotton picking may be out, but sandwich making is rather hard to conquer.
Really? I actually think sandwich making will be dead easy within 10 years. You just need a robot arm that can recognize and manipulate ingredients that costs less than maybe $300k.
Assuming the $300k is the total cost of the machine, then in a shop open 16 hours a day, 365 days a year, you're replacing minimum-wage labor with a machine that has to live for over six years just to pay for itself.
Six years, without maintenance...
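Rough numbers behind that six-year figure, assuming labor at roughly minimum wage (~$8/hour, ignoring benefits, maintenance, and financing):

    # Back-of-envelope only; the wage and the $300k figure are assumptions.
    machine_cost    = 300_000            # dollars, total installed cost
    hours_per_year  = 16 * 365           # open 16 h/day, 365 days/year = 5,840 h
    hourly_wage     = 8.00               # dollars/hour, roughly minimum wage
    labor_per_year  = hours_per_year * hourly_wage   # about $46,720/year
    breakeven_years = machine_cost / labor_per_year
    print(round(breakeven_years, 1))     # ~6.4 years just to match one worker's wages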
Furthermore, a small sandwich shop (a Subway franchise) is worth only 90 to 300 thousand dollars. You're suggesting adding a machine worth the cost of the entire shop, and that's for a very high-end shop.
You may be able to build an arm that can make sandwiches, but selling it to firms over cheap minimum-wage labor (which is easily replaceable and already has all the software necessary for the process...
One of the difficulties in the legal analytics space is that there's not much incentive for lawyers to spend less time on a case. In fact it's the opposite: the more billable hours they can charge for their work, the better. Speeding up their process doesn't necessarily hit their pain point.
It is hitting the client's pain point, though. There is a huge push in the industry to lower costs, and law firms are feeling, though resisting, this push.
Consider the firm or lawyer being compensated on a contingency fee basis -- they must cover all the expenses and overhead related to prosecuting the case, including the salaries for attorneys and paralegals working the case -- and if they lose they get nothing. Since their reward for winning the case is fixed, at usually 1/3, reducing the time and cost they expend on a case, particularly during e-discovery (which can consume quite a bit of time and manpower), is both valuable and in their best interests.
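A toy example with made-up numbers shows why:

    # All figures invented for illustration.
    recovery        = 300_000        # settlement or judgment won for the client
    fee             = recovery // 3  # the usual 1/3 contingency fee = $100,000
    manual_review   = 60_000         # hypothetical cost of human document review
    software_review = 15_000         # hypothetical cost using e-discovery software
    print(fee - manual_review)       # 40000 left to cover everything else
    print(fee - software_review)     # 85000 left -- the savings go straight to the firm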
Very common for plaintiffs lawyers in many types of cases (consumer class action (sort of), personal injury, among many others). Almost impossible to have a contingency fee for a defendant. And often the defendants are the ones with the burden of producing (and first reviewing) a high level of documents.
Finally, here is a technology I can get with. Most of the web entrepreneurs seem to embrace a "Field of Dreams" breed of business plan: build it and they will come. And when they do come - advertise, upsell, follow, track and slice and dice the hell out of them! Thanks, but no thanks.
But saving millions of lawyer hours a year? Realizing substantial savings on significant business costs will always prove valuable. Win for society, win for technology, win for capitalists. While I wouldn't say that lawyers are parasites, reducing the proportion of lawyer cost strikes me as highly positive for the economy.
If you literally want to get behind it, let me know ... I can throw in "hold lawyers, SaaS providers, and vendors accountable for the quality of their offerings in a court of law."
I wonder if something like this could help solve the patent troll problem. As I see it, there are two conditions that allow patent trolls to function:
1. The ease of obtaining frivolous patents.
2. The cost of litigating patent disputes exceeds the cost of settling.
The software industry has focused primarily on the first point, but if we could reduce the costs of litigating, it may well reduce the financial benefits of being a patent troll.
I'm thinking of something like IBM's Watson being used to find prior art, auto-filing court motions, etc. Of course, it all comes down to what some jury in Texas thinks so I'm probably just dreaming.
I read somewhere that it's easier to spot a lie in written communication than from listening to a tape. People are magnificent at bluffing, if you listen to the tone of their voice, but when you look at the transcript they become a lot more transparent.
There are probably all kinds of potential jobs in the real world that are way too expensive to have a human do and that computers are better at or will be soon.
Like gold mining. I'm sure there are rivers in Canada that haven't been fully explored. Send in the drones!
This technology won't hurt legal employment, because it enables so many more lawsuits. Every document on earth holds potential tort value that can be exploited now. After discovery it's still a manual process to litigate, so the number of attorneys should rise.
Enhancing the quality of automated review tools will significantly reduce the number of lawyers employed doing sweatshop style review.
Just because a technology might enable more lawsuits (which I am not certain is the case here) doesn't mean more lawsuits will be filed. Oftentimes, finding evidence causes parties to settle rather than go to court.
Further, the existing courts can only handle so many cases per year. You can't suddenly up the throughput on the court system.
Hmm. So either you don't get the not-that-nuanced reference, or you just look for any opportunity to vote people down. I used to take downvotes more seriously -- but less and less so these days.