Hacker News new | past | comments | ask | show | jobs | submit login

Nice visualization. Never heard/used cross join. What would be use case for something like that in practice?



I mostly use CROSS JOIN when I need "one of every combination".

Suppose we have a log of events, and each event has a user, a date, and an event action associated with it. Now I want to transform this log into a tabulation, where I have one record for each combination of user, date, and action type with an associated count. I can do something like:

    WITH tabulation AS (
        SELECT
            user,
            action,
            date,
            COUNT(*) AS num_events
        FROM events
        GROUP BY
            user,
            action,
            date
    ),
    combinations AS (
        SELECT
            users.user,
            actions.action,
            dates.date
        FROM (SELECT DISTINCT user FROM tabulation) users
        CROSS JOIN (SELECT DISTINCT action FROM tabulation) actions
        CROSS JOIN UNNEST(
            SEQUENCE(
                (SELECT MIN(date) FROM tabulation),
                (SELECT MAX(date) FROM tabulation)
            )
        ) AS dates(date)
    )
    SELECT
        user,
        action,
        date,
        COALESCE(num_events, 0) AS num_events
    FROM combinations
    LEFT JOIN tabulation USING (user, action, date)


+1 for being the only comment so far to mention USING!


They're fairly rare, but they do come up. It allows you to very quickly create data, and since it's multiplication it can very quickly create a lot of data. It's mostly a way of thinking about things rather than something you actually do.

Usually they can be written as an INNER JOIN because what you really want is a non-standard join or a partial CROSS JOIN. You want a Cartesian product. For example, maybe you want to join an employee table to a calendar table to produce a list of dates for each employee. Usually you want a subset of the calendar though, so you end up with something like:

  SELECT *
  FROM Employee e 
  INNER JOIN Calendar c 
      ON c.Cal_Date BETWEEN '2021-01-01' AND '2021-12-31'
They also come up when you define an inline table with a VALUES statement, which is like creating a virtual table in a query. So if you want one row for M-F, you could do something like this:

  SELECT *
  FROM Employee e
  CROSS JOIN (VALUES ('M'),('T'),('W'),('R'),('F')) d (Day_of_week)
Note that we have to give the system an alias for the "table" we're making and a name for the column we created, too.

Another common use is with a tally table, which is a utility table that stereotypically has a single column with integers from 1 to some arbitrary value (10,000 is common). It's got many uses. Say you want to look for non-English letters in names. You have a UNICODE() function that returns the character code of a single character, and a SUBSTRING() function that returns a substring of a character field.

  SELECT *
  FROM Employee e
  INNER JOIN Tally t
      ON t.N BETWEEN 1 AND LEN(e.Name)
  WHERE UNICODE(SUBSTRING(e.Name,t.N,1)) NOT BETWEEN 32 AND 126
Note again that it's not actually written as a CROSS JOIN here, but it's the same line of logic that leads to CROSS JOINs. We're creating a Cartesian product. You could write it as a CROSS JOIN, but this method makes it a bit more readable to understand the logic.

(The above is written with SQL Server or T-SQL. Other RDBMSs have different syntax, but can generally do the same things.)


Cross Join is also called Cartesian Join.

A (vague, handwavy) use case would be if you had a table of Towns, and a table of Pizza_Chains; by iterating over the joined table, you would see every permutation of Towns and Pizza_Chains, and (by inputting address info) you could produce a new table giving the address for each Pizza_Chain in each Town.

The cross-join would be used once, to create the new table. You want to use cross joins as little as possible, so a single-use cross join is reasonable. I admit it's a pretty stretched example; I suspect a better example might be found in a more abstract field, like maths, but I'm not feeling abstract enough right now to work one out.

You make a cross join like this:

SELECT Towns.id, Towns.name, Pizza_Chains.id, Pizza_Chains.name FROM Towns, Pizza_Chains

I.e. you simply don't have a WHERE clause.

I don't think I've ever had occasion to use a cross join. It's not a very useful query, because the resultset doesn't give you any information - or rather, it tells you every possible result you could get, if you tried the query with every possible WHERE clause.


When you want to view (or display) all possible combinations.

It doesn't make much sense for example with relations that describe things that happened or are (facts). For example you wouldn't cross join a users and addresses table.

But when relations are composable, say a product with selectable features, then it can make sense to cross join.

An example:

You ask a user a few questions and want to suggest a solution or product based on the combinations of the answers. The suggestions are provided by domain experts in advance.

In this case you might want to use a cross join to provide a matrix of all combinations to your domain experts. They can fill in valid suggestions for some and invalid ones for others.

Based on their input you can now automatically generate a questionnaire.


cross joins create data (facts) that dont exist in real life

you will probably never use them, except in rare cases for example

return employee sales by products, show 0 if no sales in a normal db, you will only have records for products with sales above 0

you will need a cross join to join every employee with all products even those with no sales

and full outer join will not work, because it will only return products with some sales for at least one employee

only a cross join will return all products

i will do this in two steps 1 cross join emp and product left join the result from 1 to product sales replace nulls with 0

rare case, stupid case too


I've only used them while using certain frameworks to dynamically generate queries referencing multiple tables. They end up getting turned into an implicit inner join though. The following queries are equivalent.

  SELECT a.*, b.*
  FROM a CROSS JOIN b
  WHERE a.id = b.a_id
  
  SELECT a.*, b.*
  FROM a, b
  WHERE a.id = b.a_id

  SELECT a.*, b.*
  FROM a INNER JOIN b ON a.id = b.a_id
IIRC .NET's Entity Framework will (or at least used to) generate queries similar to this as well.


I've only used it a few times, but it was usually to combine data from two tables that didn't have a column I could join them on.


Enumerating a cafe menu:

  select *
  from
    (values ('egg'), ('sausage'), ('spam')) main,
    (values ('baked beans'), ('chips'), ('spam')) side;
Of course, to get the complete menu, you would need to use a recursive common table expression.


In Oracle in some cases cross join works as cross apply, so you can cross join table A with table-returning structure and have it compute column based on table A.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: