shubham_saboo's comments

shubham_saboo · 2024-03-10T19:56:42 1710100602

Imagine a data scientist tasked with improving customer segmentation for a marketing campaign. Typically, they might ask an AI, "What is the best clustering algorithm to use for customer segmentation based on purchase history?" This question, while precise, limits the scope of the AI's response to just selecting an algorithm.

Instead, the data scientist decides to use a more open-ended approach: "In what innovative ways can we use data science to understand our customers' behavior and improve our marketing strategies?" This broader question doesn't just seek an algorithm; it opens the door to a wider range of data-driven insights and strategies.

The AI's response suggests not only using clustering algorithms like K-means for segmentation but also incorporating sentiment analysis of customer reviews and feedback to add another layer to understanding customer preferences. It also proposes predictive modeling to forecast future purchasing behaviors based on a combination of historical purchase data and external factors like market trends and seasonal impacts.
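To make the clustering suggestion concrete, here is a minimal sketch of K-means segmentation in plain Python. The customer data, the feature choice (orders per year, average order value), and the function name `kmeans` are all hypothetical, made up for illustration; a real project would more likely use a library implementation and scale the features first.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Initialise centroids by picking k distinct points at random.
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, labels


# Hypothetical customers: (orders per year, average order value in dollars)
customers = [(2, 20.0), (3, 25.0), (1, 18.0), (30, 22.0), (28, 19.0), (32, 24.0)]
centroids, labels = kmeans(customers, k=2)
print(labels)
```

On this toy data the occasional buyers and the frequent buyers end up in separate segments. The sentiment analysis and forecasting layers mentioned above would sit on top of a segmentation like this and are not shown here.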

shubham_saboo · 2024-02-05T17:56:46 1707155806

Very recently we also open-sourced BUDA, a top-down software stack for running ML models on Tenstorrent hardware: https://github.com/tenstorrent/tt-buda

Metalium is the bottom-up software stack that gives open access to Tenstorrent hardware.

shubham_saboo · 2024-02-05T17:53:32 1707155612

Yes, Grayskull is Tenstorrent's entry-level devkit, and it is inference-only. Future generations of chips will support training.

shubham_saboo · on July 25, 2022

Wow, this is a really cool way to build full-fledged search, and in a notebook too!

Really curious: does it work end-to-end with PDF as a data structure, or do we have to run OCR and parse the text first to be able to search it?

alexcg1 · on July 25, 2022

The version in the notebook is just for simple text-based PDFs. I wrote some posts on our company blog[1] about the sheer agonies of dealing with PDF as a data format, so I wanted to keep it as simple as possible for now.

That said, I'm planning future notebooks where you can perform text-to-image or image-to-image search, integrate OCR, scale it up, serve it, deploy it, etc.

[1] https://medium.com/jina-ai

shubham_saboo · on July 25, 2022

Awesome, will be on the lookout for that!

alexcg1 · on July 25, 2022

We've got quite a few other notebooks for other kinds of search on the blog. Would love to hear your thoughts!

rahimnathwani · on July 25, 2022

Under the hood, it uses https://github.com/pdfminer/pdfminer.six, which expects the text to be stored as text (not as scanned page images).
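A rough sketch of what that implies in practice, assuming pdfminer.six is installed: `extract_text` from `pdfminer.high_level` returns whatever text layer the PDF contains, so a scanned PDF comes back close to empty. The helper `needs_ocr` and its `min_chars` threshold are hypothetical, not part of pdfminer.six or the notebook.

```python
def extract_pdf_text(path):
    """Return the embedded text layer of a PDF via pdfminer.six."""
    # Imported lazily so the heuristic below works without pdfminer.six installed.
    from pdfminer.high_level import extract_text

    return extract_text(path)


def needs_ocr(text, min_chars=20):
    """Hypothetical heuristic: treat the PDF as scanned (image-only)
    if almost no non-whitespace text was extracted."""
    return len("".join(text.split())) < min_chars


# Usage, with a hypothetical file name:
#   text = extract_pdf_text("report.pdf")
#   if needs_ocr(text):
#       ...  # hand the page images to an OCR step instead
```

The threshold is arbitrary; the point is only that text extraction and OCR are separate steps, and the first one tells you whether you need the second.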

alexcg1 · on July 25, 2022

You mean the PDFSegmenter Executor in the notebook?

rahimnathwani · on July 25, 2022

alexcg1 · on July 25, 2022

PDFSegmenter also extracts images, which can then be OCR'ed in the next step of the pipeline.

spaetzleesser · on July 25, 2022

"PDF as a data structure"

Don't. PDF is a terrible format for storing machine-readable data. You lose a ton of information when you create the PDF, which you then painstakingly have to get back later (if that's even possible).

alexcg1 · on July 25, 2022

I may have misworded it (if I wrote those words - PDF rots the brain and my memory likewise).

Agreed on the rest. PDFs don't store machine-readable data. Often just pixelated scanned hot garbage dumpster fire text.

I hate PDFs but have to work with the satanforsaken things. Hence the notebook. It's my little way of trying to give my little PDF-bespoked-hellscape a tiny little glow-up.

spaetzleesser · on July 26, 2022

I probably didn’t read your comment closely enough. When I hear about PDF parsing or PDF as data, I immediately get flashbacks to a project years ago where I had to parse PDF files. I think I am still traumatized by that experience, so whenever I hear somebody wants to do this I just want to scream “Nooo. Don’t do this.”

alexcg1 · on July 26, 2022

I think you and I should start a support group!

alexcg1 · on July 25, 2022

Incidentally, Jina Hub [0] has a few OCR Executors [1][2] you could integrate into my notebook (though you'd have to do some rewiring to take images into account, since it's a text-based notebook).

[0] https://hub.jina.ai/

[1] https://hub.jina.ai/executor/w4p7905v

[2] https://hub.jina.ai/executor/78yp7etm

shubham_saboo · on July 18, 2022

This raises a big question: is AI there to assist teachers or to replace them?

alexcg1 · on July 18, 2022

So much time is taken up by student questions when they could just RTFM. Having an AI take care of that menial bullshit doesn't hurt anyone. There are higher-value tasks a teacher can perform than schooling someone who didn't read the curriculum (which this chatbot is based on, after all).

shubham_saboo · on July 18, 2022

I'm so glad you think an AI is better equipped to answer student questions than an actual human being! I'm sure your years of experience and expertise in the field of education have really helped you develop this groundbreaking opinion.

shubham_saboo · on July 5, 2022

When I prompted GPT-3 with this and asked it to come up with a funny response, here is what I got: "I don't know, I can't calculate that."

shubham_saboo · on Dec 17, 2021

The book is a pragmatic take on OpenAI's GPT-3 illustrating the capability of this extraordinary model in tackling a wide array of tasks, like having a human-like conversation, text completion, text summarization, and even coding with stunningly good performance.