Could you please elaborate on how you utilize both of them together, and for which specific use case? I'm attempting to gain a better understanding of the hybrid approach.
The key thing is to make ElasticSearch scores "comparable" to Milvus scores. There are lots of ways to do this, but no single good solution. For example, you could calculate the BM25 score offline, or use the TF-IDF score to do some kind of filtering. Again, there's no single perfect answer. You'd have to run a lot of experiments on your own use case and your own data to get the best results.
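To make that concrete, one simple approach is to rescale both result lists to a common range before blending them. A minimal sketch, assuming results come back as {doc_id: raw_score} dicts; the weight alpha and those shapes are just illustrative, not from any particular client library:

def min_max_normalize(scores):
    # Rescale raw scores to [0, 1] so BM25 and vector scores live on the same scale.
    # (If Milvus returns distances where lower is better, flip them first.)
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_merge(bm25_hits, vector_hits, alpha=0.5):
    # bm25_hits / vector_hits: {doc_id: raw_score} from ElasticSearch and Milvus.
    bm25 = dict(zip(bm25_hits, min_max_normalize(list(bm25_hits.values()))))
    vec = dict(zip(vector_hits, min_max_normalize(list(vector_hits.values()))))
    merged = {}
    for doc_id in set(bm25) | set(vec):
        merged[doc_id] = alpha * bm25.get(doc_id, 0.0) + (1 - alpha) * vec.get(doc_id, 0.0)
    # Highest blended score first.
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)

Even something this small needs the weight and the normalization tuned per dataset.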
Also a lot of tuning needs to be done during all phases:
1) query pre-processing
2) query tokenizing
3) retrieval
4) ranking and reranking
I personally would not trust any universal "hybrid-search" solutions. They're all toy demos.
It usually takes 5-10 good engineers to build a decent search engine/system for any real use case. It also requires a lot of tuning, tricks, and hand-written rules to make things work.
To keep the "memory", do you pass the embeddings along with the new text prompt in an API call? How do you combine embeddings and text prompts? I don't know much about this, sorry if the question sounds silly.
The code below takes a list of questions from an Excel file and answers each one based on the directory I pass in. I use this as a first pass for answering Statements of Work for proposals I write. Usually, I will have a number of different directories that I pass in to 'talk' to different intelligences and get a couple of different answers for each prompt. One is trained on the entire corpus of my past performance, one has a simple document discussing tone and other information, and one is trained on only the SOW itself.
import os

import pandas as pd
# Older llama_index API that exposes GPTSimpleVectorIndex at the top level
from llama_index import GPTSimpleVectorIndex, SimpleDirectoryReader

def excelGPT(dir, excel_file, sheet):
    # My OpenAI key
    os.environ['OPENAI_API_KEY'] = 'sk-~Your open AI Key Here'
    # Working directory for training (the folder of documents to index)
    root_folder = ''
    documents = SimpleDirectoryReader(root_folder).load_data()
    index = GPTSimpleVectorIndex(documents)
    # The questions live in the first column of the given sheet
    file_name = dir + excel_file
    df = pd.read_excel(file_name, sheet_name=sheet)
    answer_array = []
    df_series = df.iloc[:, 0]
    for i, x in enumerate(df_series):
        print("This is the index ", i)
        print(x)
        # Answer each question against the vector index
        response = index.query(x)
        answer_array.append(str(response))
    # Pair each question with its answer and write them out to a document
    zip_to_doc(df_series, answer_array, dir)
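For reference, a call looks something like this (the directory, file name, and sheet name are placeholders, not my real paths):

# Hypothetical usage: answer every question in the first column of 'Sheet1'
excelGPT('/path/to/sow/', 'questions.xlsx', 'Sheet1')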
Hey, is it alright if you explain this in a bit more detail? I've been playing around with llama-index myself. Do you have multiple indices? Or do you run each question through and get multiple responses? Isn't that quite expensive?
Also, how do you deal with the formatting of the various Excel files? I'd love to see the source code for this if you are willing to share.
I've been waiting for a product like this for some time now; I think there is a huge (not yet served) market for this. I've tried to implement something using Cloudflare Workers, but failed, and also tried to use Apollo Cloud through an Apollo Federation server in front of my (non-Apollo Server) API, which failed too.
Some questions:
How does it compare with Apollo Cloud in terms of feature set?
My GraphQL server load is about 20 requests/s on average. At first the pricing looks a little bit intimidating to me, but running the numbers it looks like $500/month, is that right? Hopefully it will offset some of my origin server costs.
What counts as a request? Just requests coming from the "outside", or also calls to purge, for example?
Thanks @hcentelles, that's great to hear and gives us validation that there is a need!
Compared to Apollo Cloud: We're mostly focused on the caching part right now and have a different architecture in terms of where we sit in your stack. Apollo runs a sidecar next to your application; we are a proxy in front of your API.
When it comes to the analytics part - which Apollo rather calls metrics, I think Apollo gives you field-level information, while we for now just have query-level information. However, we are fully server agnostic - you don't need to use Apollo Server. Any GraphQL API works. You just need to switch the URL in your clients. We even have customers just using the analytics part for now and disabling the caching in the beginning.
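To illustrate the "switch the URL" part: a client only changes its endpoint, and the GraphQL request itself stays exactly the same. The URLs below are made up for illustration, not real service endpoints:

import requests

# Before: queries go straight to your GraphQL origin.
# ORIGIN_URL = "https://api.example.com/graphql"
# After: queries go through the caching proxy (hypothetical endpoint).
PROXY_URL = "https://my-service.example-proxy.com/graphql"

query = "{ posts { id title } }"
resp = requests.post(PROXY_URL, json={"query": query})
print(resp.json())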
For the pricing: That is correct - you'd have about 50 million requests a month, so $500. However, the pricing there is not set in stone and we're happy to give you an early discount. Just contact us at support@graphcdn.io.
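To spell out the arithmetic behind that figure: 20 requests/s × 86,400 seconds/day × 30 days ≈ 51.8 million requests/month, so at $500/month that works out to roughly $10 per million requests at that tier.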
Right now only outside requests count as a request, no matter if cached or not. Purging calls might also count in the future.
We "support" batched and persisted queries in the sense that we don't break them, we pass them through to your origin, but we don't currently analyse them. Caching / analytics support for both of them is on our roadmap[0] in the near-term for sure!
Interesting to know. For enterprises we even lower the price per million requests, as the volume is much higher and the enterprise therefore already pays enough overall.
It's been a while now since "El Paquete" became the main distribution channel for online content in Cuba; a lot has been written about this before.
A lesser-known aspect of this topic is the net neutrality issues that this kind of distribution implies. At the end of the day, all the content comes from one mighty anonymous source that downloads and distributes it for a profit, presumably a huge profit. This source is god; he or she has the last word on what gets in and what is left out.
So, since the beginning of "El Paquete" my website revolico.com was included. Revolico's content (classified ads) is like a basic need in a market with almost 100% government control over the retail space (price fixing, availability, etc.). But about a month ago our content was left out, with a note saying it would no longer be available because it had been used for the purposes of "personal and political defamation against the country and its citizens." I was like, WTF? Is this the government infiltrating "El Paquete"? Is it a nasty move by our competitors? Who knows. The problem is that one guy has the power to decide what gets distributed and what does not. This is not good by any means.
Two weeks later, revolico came back to "El Paquete". Everything points to the customers having asked for it, so the producers were forced to include it again.
"El Paquete" is one of the best things that is happen in Cuba digital space right now, but a not centralized version is mandatory to make it less vulnerable to goverment control or other kind of arbitrariness.
Where is revolico hosted? What are your analytics like in a country with so little internet penetration? Does being in El Paquete mean that people are seeing ads that are now weeks or even months old?
The app is hosted in a typical cloud computing environment. The traffic from Cuba is 4M page views monthly. El Paquete gets updated every week, so people are seeing ads that were active the week before. We sell premium listings; our clients ask us about the right timing so their ads get into El Paquete in the first positions.
An accurate snapshot of the state of the internet in Cuba, well written from an American point of view.
As the cofounder of one of the most popular Cuban websites, revolico.com, I've been suffering this since 2007. We launched revolico in December '07; in March '08 the government blocked our IPs, and then, when we circumvented that censorship, they carried out nationwide DNS spoofing.
Nevertheless, revolico is still the #1 classified ads site on the island, way ahead of the government offering; our users do a lot of crazy and creative stuff to get access to the site.
So Cuba, besides having an internet penetration of less than 5%, strongly censors the link, which is even sadder. I predict that access will increase in the near/medium term, but unfortunately the censorship will grow proportionally.
The Cuban government is reluctant to open internet access to the people, despite already having the needed bandwidth through a submarine cable from Venezuela. It is really fascinating how Cubans have developed a highly optimized offline distribution channel to share downloaded content like websites, software, video games, TV shows, and movies, with almost the same consumption patterns as the connected world.
"Telecommunications providers will be allowed to establish the necessary mechanisms, including infrastructure, in Cuba to provide commercial telecommunications and internet services, which will improve telecommunications between the United States and Cuba."
If the Cuban government allows these kinds of companies to do business in or with Cuba, that could be huge. But if it happens, it will probably be very slow, sadly.
Disclosure: I'm the cofounder of some Cuba-related startups: a classified ads site censored by the Cuban government (https://www.youtube.com/watch?v=GUmPkb44n_w) - they block us by IP and DNS, yet despite the censorship revolico is one of the most visited sites in the country, taking into account that Cuba has 5% internet penetration; an atypical remittances platform, https://www.fonoma.com; and a crowdfunding site for Cuban artists, http://www.yagruma.org, shut down by the US government because of the kind of restrictions they are softening today.
You started revolico!? That's awesome, man. I think that project "opened" the minds of a lot of Cuban entrepreneurs. I know a few cool projects over there, and I also know a lot of visual artists trying to start projects that connect the "exile" with the people of the island. As you might know, even among those opposed to the regime, there is a lot of bias against Cubans from America (unless they are family/friends). I'm also Cuban, living in NY, and I'd like to help out with whatever I can. You'll find my email in my profile.
A book that any founder should read. It's a great set of interviews with founders telling relatively unsanitized versions of their startup stories. It serves as a great antidote to the business press's "all winners are perfect geniuses" school of reporting.