"What do you think should happen if there are multiple foreign keys connecting t...

kbenson · on June 27, 2022

Unambiguous things can become ambiguous at later points. As soon as you add a second relation between the tables, what once was unambiguous is now ambiguous, and because of something which may be entirely unrelated to the specifics of the original query.

This is where many conveniences that use implicit data run into problems. A small convenience now, in exchange for the possibility of accidental breakage from mostly unrelated changes later, is a poor trade-off for anyone who wants stable and consistent software.

This is likely one of those cases where you're better off with tooling to help make writing the correct unambiguous code easier (or automated away) than introducing a feature which leads to less stable systems in some cases.

Edit: Along the lines of what you note at the end, I would rather see joins able to use named relations as defined in the schema. If there's a relation from table movie to table actor specifically named roles in the schema, I would rather be able to join movie on roles and have actors joined correctly using that relation, aliased to roles, which I could then use. Then you're using features that are designed and stable, not implicit and subject to changing how or whether they function based on semi-unrelated changes.

That might look like: "from movie relate roles", which is equivalent to "from movie join actor roles on movie.id = roles.movie_id", but works because actor.movie_id already has a constraint in the schema, named roles, which restricts it to a movie.id.
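
To make the contrast concrete, here is a minimal Python sketch of the two resolution strategies. Everything in it is hypothetical: the `ForeignKey` record, the `implicit_join` and `relate` helpers, and the second constraint `debuts` on `actor.debut_movie_id` are made up for illustration and are not part of PRQL or any database. It only shows that inferring the join from the tables breaks once a second relation appears, while looking the join up by constraint name keeps producing the same SQL.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignKey:
    name: str        # constraint name as declared in the schema
    table: str       # referencing table
    column: str      # referencing column
    ref_table: str   # referenced table
    ref_column: str  # referenced column


def implicit_join(fks, left, right):
    """Infer the join from whatever foreign keys happen to link the two tables."""
    matches = [fk for fk in fks if fk.table == right and fk.ref_table == left]
    if len(matches) != 1:
        raise ValueError(
            f"{len(matches)} candidate relations between {left} and {right}"
        )
    fk = matches[0]
    return f"FROM {left} JOIN {right} ON {left}.{fk.ref_column} = {right}.{fk.column}"


def relate(fks, left, name):
    """Resolve the join by constraint name and alias the joined table to it."""
    matches = [fk for fk in fks if fk.name == name and fk.ref_table == left]
    if len(matches) != 1:
        raise ValueError(f"no unique relation named {name!r} on {left}")
    fk = matches[0]
    return (
        f"FROM {left} JOIN {fk.table} {name} "
        f"ON {left}.{fk.ref_column} = {name}.{fk.column}"
    )


# Original schema: a single relation between actor and movie.
schema_v1 = [ForeignKey("roles", "actor", "movie_id", "movie", "id")]

# Later, an unrelated change adds a second relation (hypothetical).
schema_v2 = schema_v1 + [
    ForeignKey("debuts", "actor", "debut_movie_id", "movie", "id")
]

print(implicit_join(schema_v1, "movie", "actor"))  # fine with one relation
print(relate(schema_v2, "movie", "roles"))         # unaffected by the new one
try:
    implicit_join(schema_v2, "movie", "actor")
except ValueError as exc:
    print("implicit join now fails:", exc)
```

The named lookup depends only on the constraint it names, so adding `debuts` cannot change what `relate roles` means.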

asqueella · on June 28, 2022

Agreed! But rather than making the query compiler infer join paths from the schema, wouldn't it make more sense to support defining common join paths (like your 'movie relate roles') in the language, and build a tool that generates such definitions from the schema as a separate step?

I don't have a specific syntax in mind yet; for illustrative purposes:

    defjoin r,m,a = %prejoin_roles() -> {     # define a common join path between three relations r,m,a:
      from r=ROLES                            # can hard-code table names or use parameters (which may refer to other parameters)
      join m=MOVIES [r.movie_id = m.movie_id]
      join a=ACTORS [r.actor_id = a.actor_id]
    }

    from r,m,a = %prejoin_roles()
    select m.title, a.character_name

This `defjoin` thing is a limited version of PRQL `table`, which -- unlike a CTE -- remembers which relation each attribute comes from. Perhaps one can instead figure out how to extend `table` to support this.
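
The "separate tool" half of this idea can be sketched in Python against SQLite, whose `PRAGMA foreign_key_list` reports the foreign keys declared on a table. This is only a rough illustration under assumptions: the `join_definitions` helper and the three-table schema are hypothetical, the output is plain SQL `JOIN` clauses rather than the `defjoin` syntax (which has no defined grammar yet), and every foreign key names its target column explicitly.

```python
import sqlite3


def join_definitions(conn, table):
    """Return one explicit JOIN clause per foreign key declared on `table`."""
    clauses = []
    # Each row is (id, seq, table, from, to, on_update, on_delete, match).
    # The table name is interpolated directly, so only pass trusted names.
    rows = conn.execute(f"PRAGMA foreign_key_list({table})")
    for _id, _seq, ref_table, from_col, to_col, *_rest in rows:
        clauses.append(
            f"JOIN {ref_table} ON {table}.{from_col} = {ref_table}.{to_col}"
        )
    # Sort so the result does not depend on the order SQLite reports keys in.
    return sorted(clauses)


# Hypothetical schema mirroring the ROLES / MOVIES / ACTORS example.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE actors (actor_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE roles (
        movie_id INTEGER REFERENCES movies(movie_id),
        actor_id INTEGER REFERENCES actors(actor_id)
    );
    """
)

definitions = join_definitions(conn, "roles")
for clause in definitions:
    print(clause)
```

Because the definitions are generated once and checked in, a later schema change alters them only when someone reruns the generator and reviews the diff, rather than silently changing what an existing query means.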

strbean · on June 27, 2022

That sounds like the perfect solution!

ximeng · on June 27, 2022

One way to avoid constraint name collisions is to include the base table and foreign table names and keys in the constraint name separated by underscores, at which point you don’t save much by using the constraint in a join.
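
A quick Python sketch of that naming scheme, using the movie/actor example from upthread. The `constraint_name` and `join_condition` helpers and the exact underscore layout are assumptions for illustration, not an established convention.

```python
def constraint_name(table, column, ref_table, ref_column):
    """Collision-free name: both table names and both keys, underscore-separated."""
    return "_".join([table, column, ref_table, ref_column])


def join_condition(table, column, ref_table, ref_column):
    """The explicit join condition the constraint name would stand in for."""
    return f"{table}.{column} = {ref_table}.{ref_column}"


fk = ("actor", "movie_id", "movie", "id")
name = constraint_name(*fk)
condition = join_condition(*fk)

print(name, len(name))
print(condition, len(condition))
```

With these inputs the name is only two characters shorter than the condition it replaces, which is the point: a name long enough to be unique carries roughly the same information as the join itself.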