
Before jumping into frameworks, check whether you're lucky enough to have your data stored in an HTML table. If so:

    import pandas as pd
    dfs = pd.read_html(url)
Here ‘dfs’ is a list of DataFrames, one for each HTML table on the page.

https://pandas.pydata.org/pandas-docs/stable/reference/api/p...




Sometimes it's also helpful to use Beautiful Soup to isolate the elements you want, wrap the HTML of those elements in StringIO, and pass that to read_html.
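A rough sketch of that approach, in case it helps. The HTML snippet, the "prices" id and the helper name read_table_by_id are all made up for illustration, and it assumes bs4 is installed along with a parser that read_html can use (lxml, or html5lib as a fallback). Note that it passes the element's markup via str(table), not just its text, since read_html needs the table tags to find anything.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical page: a layout table we don't care about, plus the one we want.
HTML = """
<html><body>
<table id="nav"><tr><td>Home</td><td>About</td></tr></table>
<table id="prices">
  <tr><th>item</th><th>price</th></tr>
  <tr><td>apple</td><td>1.5</td></tr>
  <tr><td>pear</td><td>2.0</td></tr>
</table>
</body></html>
"""


def read_table_by_id(html: str, table_id: str) -> pd.DataFrame:
    """Isolate one <table> with Beautiful Soup, then let pandas parse it."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        raise ValueError(f"no <table> with id={table_id!r} found")
    # str(table) is the element's HTML; read_html still returns a list,
    # which here holds exactly one DataFrame.
    return pd.read_html(StringIO(str(table)))[0]


prices = read_table_by_id(HTML, "prices")
print(prices)
```

The same pattern should work with any Beautiful Soup selector (CSS classes, a parent div, etc.); the id lookup is just the simplest case.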


Yes, this is a good idea for more complicated cases.


We made a Chrome extension that queries any HTML table in any open tab with SQL:

https://chrome.google.com/webstore/detail/sqanything/naejbcf...

You can export the results to Google Sheets too. One advantage of the extension is that it works with JS-rendered tables.


Handy! I'm also a big fan of pd.read_clipboard() for specific selections.


Holy crap, is there anything pandas can't do?


Ingest bamboo? Sorry, couldn't resist.


Woah. I’ve used pandas a fair amount and had no idea about this. Thank you!


+1, this has saved me countless hours


What does this do?


It reads HTML and returns the tables contained in the HTML as pandas dataframes. It’s a simple way to scrape tabular data from websites.
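To make that concrete, here is a small self-contained sketch that parses a literal HTML string instead of a live URL. The two tables and their contents are invented for the example, and it assumes pandas has a parser available (lxml, or bs4 plus html5lib).

```python
from io import StringIO

import pandas as pd

# Made-up HTML with two tables and some unrelated text in between.
HTML = """
<table>
  <tr><th>city</th><th>population</th></tr>
  <tr><td>Oslo</td><td>700000</td></tr>
</table>
<p>Some text between the tables.</p>
<table>
  <tr><th>language</th><th>year</th></tr>
  <tr><td>Python</td><td>1991</td></tr>
  <tr><td>Go</td><td>2009</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds.
dfs = pd.read_html(StringIO(HTML))

for i, df in enumerate(dfs):
    print(i, df.shape, list(df.columns))
```

Anything outside a table element (the paragraph here) is ignored, so you usually index into the list to pick the table you're after.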



