shubham_saboo's comments

Imagine a data scientist tasked with improving customer segmentation for a marketing campaign. Typically, they might ask an AI, "What is the best clustering algorithm to use for customer segmentation based on purchase history?" This question, while precise, limits the scope of the AI's response to just selecting an algorithm.

Instead, the data scientist decides to use a more open-ended approach: "In what innovative ways can we use data science to understand our customers' behavior and improve our marketing strategies?" This broader question doesn't just seek an algorithm; it opens the door to a wider range of data-driven insights and strategies.

The AI's response suggests not only using clustering algorithms like K-means for segmentation but also incorporating sentiment analysis of customer reviews and feedback to add another layer to understanding customer preferences. It also proposes predictive modeling to forecast future purchasing behaviors based on a combination of historical purchase data and external factors like market trends and seasonal impacts.
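For anyone who wants to see what the K-means part of that suggestion looks like in practice, here is a minimal sketch in plain Python. The customer features (orders per year, average order value), the numbers, and the `kmeans` helper are all made up for illustration; real purchase data would need feature scaling first, and you would probably use a library implementation instead.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Start from k distinct points picked at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical customers: (orders per year, average order value)
customers = [
    (2, 15.0), (3, 18.0), (1, 12.0),        # occasional, low-spend buyers
    (40, 220.0), (38, 210.0), (45, 250.0),  # frequent, high-spend buyers
]
labels, centroids = kmeans(customers, k=2)
print(labels)
```

Each customer gets a cluster label, and the two centroids summarize the "occasional" and "frequent" segments. Which label number ends up on which segment depends on the random start.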


Very recently we have also open-sourced BUDA, a top-down software stack for running ML models on Tenstorrent hardware: https://github.com/tenstorrent/tt-buda

Metalium is the bottom-up software stack that gives open access to Tenstorrent hardware.


Yes, Grayskull is Tenstorrent's entry-level devkit, and it is for inference only. Future generations of chips are set to feature training.


Wow, this is a really cool way to build full-fledged search, and in a notebook too!

Does it work end-to-end with PDF as a data structure, or do we have to run OCR and parse the text first before we can search it? Really curious.


The version in the notebook is just for simple text-based PDFs. I wrote some posts on our company blog[1] about the sheer agonies of dealing with PDF as a data format, so I wanted to keep it as simple as possible for now.

That said, I'm planning future notebooks where you can perform text-to-image or image-to-image search, integrate OCR, scale it up, serve it, deploy it, etc.

[1] https://medium.com/jina-ai


Awesome, will be on the lookout for that!


We've got quite a few other notebooks for other kinds of search on the blog. Would love to hear your thoughts!


Under the hood, it uses https://github.com/pdfminer/pdfminer.six which expects the text to be stored as text.
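A rough sketch of what that means in practice: pull the text layer with pdfminer.six's `extract_text`, and if almost nothing comes back, the file is probably a scan that needs OCR first. The `needs_ocr` helper and its 20-character threshold are my own assumptions for illustration, not something the notebook or pdfminer.six provides.

```python
def needs_ocr(extracted_text: str, min_chars: int = 20) -> bool:
    """Heuristic (an assumption): treat the PDF as scanned if almost no text came out."""
    return len(extracted_text.strip()) < min_chars


def extract_pdf_text(path: str) -> str:
    """Return the text layer of a PDF using pdfminer.six."""
    # Imported lazily so needs_ocr() stays usable without pdfminer.six installed.
    from pdfminer.high_level import extract_text

    return extract_text(path)


# Usage, with a hypothetical file name:
#   text = extract_pdf_text("report.pdf")
#   if needs_ocr(text):
#       print("No usable text layer; run OCR on the page images first.")
```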


You mean the PDFSegmenter Executor in the notebook?


Yes


PDFSegmenter also extracts images, which can then be OCR'ed in the next step of the pipeline


"PDF as a data structure"

Don't. PDF is a terrible format for storing machine-readable data. You lose a ton of information when you create the PDF, which you then painstakingly have to get back later (if that's even possible).


I may have misworded it (if I wrote those words - PDF rots the brain and my memory likewise).

Agreed on the rest. PDFs don't store machine-readable data. Often just pixelated scanned hot garbage dumpster fire text.

I hate PDFs but have to work with the satan-forsaken things. Hence the notebook. It's my little way of trying to give my little PDF-bespoked-hellscape a tiny little glow-up.


I probably didn’t read your comment closely enough. When I hear about PDF parsing or PDF as data, I immediately get flashbacks to a project years ago where I had to parse PDF files. I think I am still traumatized by the experience, so whenever I hear somebody wants to do this I just want to scream “Nooo. Don’t do this.”


I think you and I should start a support group!


Incidentally Jina Hub [0] has a few OCR Executors [1][2] you could integrate into my notebook (though you'd have to do some rewiring to take images into account since it's a text-based notebook)

[0] https://hub.jina.ai/

[1] https://hub.jina.ai/executor/w4p7905v

[2] https://hub.jina.ai/executor/78yp7etm


Raises a big question here - Is AI there to assist the teachers or replace them?


So much time is taken up on student questions when they could just RTFM. Having an AI take care of that menial bullshit doesn't hurt anyone. There are higher-value tasks a teacher can perform than schooling someone who didn't read the curriculum (which this chatbot is based on, after all).


I'm so glad you think an AI is better equipped to answer student questions than an actual human being! I'm sure your years of experience and expertise in the field of education have really helped you develop this groundbreaking opinion.


When I prompted GPT-3 with this and asked it to come up with a funny response, here is what I got: "I don't know, I can't calculate that."


The book is a pragmatic take on OpenAI's GPT-3. It illustrates how this extraordinary model tackles a wide array of tasks, such as human-like conversation, text completion, text summarization, and even coding, with stunningly good performance.

