Good examples
– cut republican billion will pay percent benefit cost
– program fund educ cut provid health help million
– economi job see need percent continu import now
– job make work compani busi right american good
– iraq war us presid support vote administr congress
These don't really look that great to me (and the 'bad examples' are worse) but I'm not an NLP expert
The stuff you're quoting isn't what the `demo.py` script generates; it's output from latent Dirichlet allocation (this project appears to generate its text with a different algorithm, the sentence-based one).
The sentence-based one has a major flaw: it mostly "settles into" a given speech from its training set for a few paragraphs, then transitions to a paragraph from another speech whenever the statements become sufficiently generic.
Another bug: it can sometimes lapse into an infinite loop. Here's the tail end of one of my runs:
accordingly , the committee rose ; and the speaker pro tempore ( mrs. drake )
having assumed the chair , mr. gilchrest , acting chairman of the committee of
jurisdiction would ask the counsel to explain it ; and if it is a drafting
error , then that could be corrected .
if it is not a drafting error , then that could be corrected .
if it is not a drafting error , then that could be corrected .
if it is not a drafting error , then that could be corrected .
if it is not a drafting error , then that could be corrected .
if it is not a drafting error , then that could be corrected .
if it is not a drafting error , then that could be corrected .
if it is not a drafting error , then that could be corrected .
if it is not a drafting error , then that could be corrected .
if it is not a drafting error , then that could be corrected .
[... continues for many, many more lines before ending ...]
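For what it's worth, a repeat like the one above could be caught with a simple guard around the generation loop. This is only a sketch of the idea, not the project's code: `next_sentence` is a hypothetical callable standing in for whatever picks the following sentence, and `max_repeats` and `max_sentences` are made-up knobs.

```python
from collections import Counter


def generate_with_loop_guard(next_sentence, start, max_sentences=50, max_repeats=2):
    """Collect sentences until the generator ends, repeats itself, or runs long.

    next_sentence is a hypothetical callable: it takes the current sentence
    and returns the following one, or None when the speech is finished.
    """
    seen = Counter()
    output = []
    current = start
    while current is not None and len(output) < max_sentences:
        seen[current] += 1
        # Bail out once the same sentence has shown up more than max_repeats times.
        if seen[current] > max_repeats:
            break
        output.append(current)
        current = next_sentence(current)
    return output


# Toy transition table that gets stuck on "b", like the run above.
stuck = {"a": "b", "b": "b"}
print(generate_with_loop_guard(stuck.get, "a"))
```

With the toy table, the guard lets the stuck sentence through twice and then stops instead of printing it forever.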
From a technical standpoint, the paper is somewhat lacking. It's a standard Markov-chain generator with an additional prior defined on the cohesiveness of the generated text.
I'm guessing all the press is due to it being US election season, but this is really no different from King James Programming (http://kingjamesprogramming.tumblr.com), which was entertaining, but nothing revolutionary.
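For anyone who hasn't seen one, a bare-bones word-level Markov chain generator looks roughly like this. It's a sketch of the general technique only: it uses plain bigrams, leaves out the cohesiveness prior entirely, and the function names are mine, not the paper's or the repo's. The `__START__`/`__END__` markers just mirror the ones in the script's output.

```python
import random
from collections import defaultdict


def build_chain(sentences):
    """Map each word to the list of words that followed it in the training text."""
    chain = defaultdict(list)
    for sentence in sentences:
        words = ["__START__"] + sentence.split() + ["__END__"]
        for prev, nxt in zip(words, words[1:]):
            chain[prev].append(nxt)
    return chain


def generate(chain, rng, max_words=30):
    """Walk the chain from __START__ until __END__ or max_words is reached."""
    word, out = "__START__", []
    while len(out) < max_words:
        # Sampling from the list of observed successors weights them by frequency.
        word = rng.choice(chain[word])
        if word == "__END__":
            break
        out.append(word)
    return " ".join(out)


corpus = [
    "mr. speaker , i rise in opposition to this bill .",
    "mr. chairman , i rise today to support this bill .",
]
print(generate(build_chain(corpus), random.Random(0)))
```

The `max_words` cap is there so that a cycle in the chain can't run forever.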
* What are the system prerequisites?
* What are the optional command line params?
* What is an example of the expected output?
* How does it (roughly) work?
* Why not use actual markdown to highlight the code in the README.md file?
I feel bad highlighting this as it looks really interesting, but as Python is not one of my main languages I don't feel much enthusiasm for spending an unknown amount of time just trying to make sense of the demo, which is a shame.
__START__ mr. speaker , the gentleman is 110 percent right .
the gentleman from ohio mentioned an issue about being nonpartisan and being , i would say , third-party validators , i just want to make clear the supreme court cases made it clear that you could discriminate with your personal church money , but not with federal money .
all of the cases are consistent .
in fact , if my colleagues read the cases , they point out that if you are using federal money , you can not discriminate .
__END__
# Class: DY, Lambda: 0.5 #
__START__ mr. chairman , as someone who for the past 2 years has represented over 40 , 000 soldiers at fort hood , texas , who have fought for our country in iraq , i am deeply appreciative of the expeditious manner in which the gentleman from california ( chairman pombo ) and the gentleman from delaware ( mr. castle ) and the gentlewoman from colorado ( ms. degette ) and many of the rest of us , have thought we should increase spending on , medical research all across the board in all kinds of medical research .
yes , in order to make room for the president's tax cuts that have gone overwhelmingly to the wealthiest in our country , we have simply cut medical research and not done what we should as a nation do overall in medical research .
so when i hear my friends talk on this , i do not quite get how this will expand medical research while closing out one whole avenue of medical research and , at the same time , cutting spending on what we should be doing to move our country ahead .
__END__
Here was the first one I got, cleaned up a bit into paragraphs (the ones you get from this script are one-line-per-sentence, all-lowercase, and have odd spaces before punctuation marks). It was rather longwinded and rambling; shorter ones like you see around are a little more persuasive because they don't show the decoherence of the Markov chain's context:
---
Mr. Speaker, I rise in opposition of H.R. 525 and the association health plans it creates. There are 44 million Americans who are uninsured in this country and this bill will not even affect 1 percent of them. Not 1 percent. CBO found that only 360,000 uninsured Americans would join AHPs. This bill in fact hurts those who enroll in the plans and will even cause healthcare costs to go up for many other Americans. There has to be a better way to help 44 million uninsured Americans. AHPs will not be accountable to state health regulations. This will leave consumers who enroll in these plans without protection or a right to appeal if their cancer or diabetes treatment or medicines are denied. We can not let that happen again.
Members in this body are faced with a choice: representing consumers and small businesses, or big oil companies. We should not leave the American people in the integrity of this institution. Perception, as we all know, four hurricanes ravaged my home state of Florida and some of the Gulf coast. Three of them literally destroyed parts of the district that I am privileged to represent.
It will not lower their energy prices and it puts in place weak price gouging standards. It also does little to promote additional refining capacity, while gutting important environmental safeguards and creating additional corporate tax breaks. Waiving environmental protections and offering federal tax breaks to oil companies will not entice them to build new oil refineries. While more refineries would certainly help produce more gasoline, oil companies have had the opportunity and financial capability for years to increase their refining capacity. Environmental regulations are not stopping them. Rather, the inability to build profitable refineries has led oil company executives away from constructing or resurrecting them.
An alternative to this bill is being offered by Mr. Stupak of Michigan and others. The Stupak bill would strengthen the hands of the Federal Trade Commission and its willingness and ability and resources to enforce the price gouging remedies that we give the FTC.
The attorneys general of our states are elected by our constituents, they know the conditions in their states better than we do, they have the resources and the discretion under the substitute to decide whether or not it is in the best interest of both the plan and participants to enhance the independent advice market, and we urge congress to adopt this approach.
AARP urges you to stand with us in opposition to these critical provisions in H.R. 2830 in order to provide protections for older workers that are necessary, reasonable and fair, and to ensure that the case is moved along as expeditiously as possible to ensure that workplace hazards are addressed in as timely a manner as possible, thus improving worker safety and health.
OSHA, as is almost every other federal agency, is already required by law to pay attorneys' fees and costs in any proceeding in which it does not win, regardless of why it lost and notwithstanding the fact that the position of the agency was substantially justified. In effect, unless the agency can guarantee that it will win every case it brings, H.R. 742 punishes the OSHA for trying to enforce the law. It also occurs to me that there is a question of the constitutional rights of workers here, that since OSHA is given rather exclusive jurisdiction to protect the rights of our own citizens through regulation.
That is wrong. We should not do it to people serving in Iraq and Afghanistan. They have made tremendous sacrifices on behalf of their country and have served longer deployments than expected. This bill provides important new benefits for our troops and their families need desperately. It includes additional funds for health care services, mental health for veterans, active duty servicemembers and their families, and financial assistance to help members of the national guard and reservist forces lose income when they leave their civilian jobs for active duty.
The people of this country.
I just think we have struck the wrong balance. We need to sunset this bill again for a shorter period of time, and I hope my colleagues will oppose this bill so we might do this effort in a bipartisan manner.
Mr. Chairman, I rise today to express my concern about the current state of our nation's budget woes. I've been running the family ranch for several years and I know what it means to work within a budget. You may have to count your pennies, but you spend your money where it matters most. We can do that without increasing taxes. First off -- our nation's taxpayers deserve an honest budget that gives an account of all future spending. If this administration wants to privatize social security.
Mr. Speaker, I reserve the balance of my time to read the resolution that I believe ought to be before us, Mr. Speaker. The record shows otherwise. It is economics, not regulations, that have led to the shortfall in capacity.
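The mechanical part of that clean-up (joining the one-sentence-per-line output and removing the odd spaces before punctuation) can be roughly automated. A sketch, with function names I made up; it only capitalizes the first letter of each sentence, so proper nouns, acronyms and split numbers like "40 , 000" still need fixing by hand:

```python
import re


def tidy(sentence):
    """Remove tokenizer spacing around punctuation and capitalize the first letter."""
    text = re.sub(r"\s+([,.;:?!)])", r"\1", sentence)  # "right ." -> "right."
    text = re.sub(r"\(\s+", "(", text)                 # "( mrs." -> "(mrs."
    return text[:1].upper() + text[1:]


def to_paragraph(lines):
    """Join the script's one-sentence-per-line output into a single paragraph."""
    return " ".join(tidy(line.strip()) for line in lines if line.strip())


raw = [
    "all of the cases are consistent .",
    "in fact , if my colleagues read the cases , they point out that if you are using federal money , you can not discriminate .",
]
print(to_paragraph(raw))
```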
Yes. If you read the paper, that's entirely what they're doing: each sentence should be a verbatim quote from an actual congressional proceeding. If there's a flaw here it's that the lambda=0.5 parameter that I used above keeps it within a single speech for too long.
And especially in this case, you have to install a few extra packages to make it work, then download the data set and (if you're on Linux) adjust the path to that data set so that it doesn't use backslashes.
Basically a glorified Markov Chain (with "topic tagging"). Not particularly interesting and the results are actually not very great. The sentence-based approach (discussed in section 5) would've been more meaningful I think.
i suck. after installing nltk and sklearn, I get this:
$ python demo.py RY 0.25
[constructing dataset...]
Traceback (most recent call last):
File "demo.py", line 37, in <module>
(dataset,vocab) = construct_dataset([TRAIN_DIR,TEST_DIR,DEV_DIR])
File "/Users/asdf/workspace/github/conspeech/con_util.py", line 48, in construct_dataset
for f in sorted(os.listdir(p)):
OSError: [Errno 2] No such file or directory: 'convote_v1.1\\data_stage_three/training_set'
If you're on Linux, you have to change the path separator at demo.py:32 from '\\' to '/', and also download the convote data from Cornell and place it in the convote_v1.1 directory.
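A more portable fix than swapping the separator by hand would be to build the path with `os.path.join`, so the same line works on Windows and Linux. This is a sketch, not the repo's actual code: `TRAIN_DIR` and the directory names come from the traceback above, while `DATA_ROOT` and `check_data_dir` are names I've invented for illustration.

```python
import os

# DATA_ROOT is a hypothetical name; the directory names are taken from the error message.
DATA_ROOT = os.path.join("convote_v1.1", "data_stage_three")
TRAIN_DIR = os.path.join(DATA_ROOT, "training_set")


def check_data_dir(path):
    """Exit with a readable message if the data set has not been downloaded yet."""
    if not os.path.isdir(path):
        raise SystemExit(
            "Data directory not found: %s (download the convote data and unpack it there)" % path
        )
    return path


print(TRAIN_DIR)
```

Calling `check_data_dir(TRAIN_DIR)` before building the data set would turn the raw `OSError` into a message that says what is actually missing.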
$ python demo.py RY 0.25
[constructing dataset...]
[dataset constructed.]
# Class: RY, Lambda: 0.25 #
__START__ mr. speaker , i yield back the balance of my time , and i want to commend the gentleman from california ( mr. pombo ) and myself were the republicans , and , unfortunately , it took us literally 6 months to finally agree what time to meet and where .
the difficulty with the endangered species act .
i also would like to thank the chairman for supporting the green brook flood control project is saving homes and businesses and lives .
it is equally vital that our senators from new jersey take up the fight for this important project and finish the work that we have asked them to do .
i have heard friends across the aisle say that americans have journeyed freely in the past and that this goes against the very freedoms which this nation was founded on .
but the truth is , try getting on an airplane .
what the motion to recommit does is force the states to do something , or not do something ; and that goes directly against the notion of federalism that is contained in this bill and which was drafted by the committee on government reform .
__END__
Here's an example:
__START__ mr. speaker, for years, honest but unfortunate consumers have had the ability to plead their case to come under bankruptcy protection and have their reasonable and valid debts discharged. the way the system is supposed to work, the bankruptcy court evaluates various factors including income , assets and debt to determine what debts can be paid and how consumers can get back on their feet. stand up for growth and opportunity. pass this legislation .
__END__