db.inventory.aggregate( [
   {
     $lookup:
       {
         from: "order",
         localField: "_id",
         foreignField: "item_id",
         as: "inventory_docs"
       }
  }
] )

The $lookup joins on the item_id field, which is indexed. When 100,000 documents pass through this $lookup, the query takes about 4X longer than it does without the $lookup.
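For reference, the index assumed here on the order collection would be something like this (the field name is taken from the pipeline above):

// Index on the foreign field, so $lookup can resolve
// foreignField: "item_id" with an index scan instead of a collection scan.
db.order.createIndex( { item_id: 1 } )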

Given that the join field is indexed, a 4X slowdown is unexpected; I was expecting only a marginal increase in query time.

Is this also the case for SQL databases? Will an indexed join increase query time by 4X?

EDIT:

explain doc:

    {
      "$lookup": {
        "from": "order",
        "as": "inventory_docs",
        "localField": "_id",
        "foreignField": "item_id",
        "let": {},
        "pipeline": [
          {
            "$project": {
              "_id": 1
            }
          }
        ]
      },
      "totalDocsExamined": 0,
      "totalKeysExamined": 100008,
      "collectionScans": 0,
      "indexesUsed": [
        "_id_"
      ],
      "nReturned": 100008,
      "executionTimeMillisEstimate": 18801
    }

So it took 18 seconds to look up 100,000 documents on an indexed field, _id. That seems really slow.

  • Which SQL dbms do you have in mind?
    – jarlh
    Commented Aug 15, 2023 at 19:39
  • SQL dbms in general. Commented Aug 15, 2023 at 19:40

2 Answers


I suggest you run an explain on that aggregation. Check whether the index is actually being used and confirm that the $lookup is what is taking all the extra time. You can share the explain output here to provide more context. The docs for explaining your aggregation pipelines are here:

https://www.mongodb.com/docs/manual/reference/method/db.collection.explain/
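For example, a minimal way to get per-stage execution statistics for the pipeline from the question (collection and field names taken from the question):

// Run the aggregation through explain() to get execution statistics,
// including index usage and per-stage timing estimates.
db.inventory.explain( "executionStats" ).aggregate( [
   {
     $lookup:
       {
         from: "order",
         localField: "_id",
         foreignField: "item_id",
         as: "inventory_docs"
       }
   }
] )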

  • I have edited my question and posted the explain output of the $lookup. It is using an index, and it still took 18 seconds. The whole query takes about 20 seconds. Commented Aug 15, 2023 at 21:01
  • I have a few recommendations that might help. Before performing such a large join, evaluate whether all of these documents need to be included in the result. If possible, reduce the size of the join by filtering the data or limiting the result set (see the sketch after these comments). Commented Aug 15, 2023 at 21:24
  • Though it seems minimal, the $project stage inside the $lookup pipeline could have some impact. It might be worth evaluating whether the projection is necessary or whether it can be optimized further. Commented Aug 15, 2023 at 21:25
  • If the join is a constant need and you have control over the data model, you might consider denormalizing some of the data into the main collection. This could greatly improve performance by eliminating the need for the join (a sketch follows below). Commented Aug 15, 2023 at 21:25
  • It is also always good to check hardware and server configuration, ensuring that there is sufficient memory and CPU and that the server is configured optimally for your workload. Commented Aug 15, 2023 at 21:26
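As a sketch of the filtering/limiting suggestion above, you could reduce the number of documents that reach the $lookup. The status filter and limit here are hypothetical; replace them with conditions that match your data:

db.inventory.aggregate( [
   // Hypothetical filter and cap: shrink the input before the join.
   { $match: { status: "active" } },
   { $limit: 1000 },
   {
     $lookup:
       {
         from: "order",
         localField: "_id",
         foreignField: "item_id",
         as: "inventory_docs"
       }
   }
] )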
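And a sketch of the denormalization suggestion: embed the order fields you actually need in each inventory document, so reads avoid the join entirely. The embedded shape shown here is hypothetical:

// Hypothetical document shape with embedded order summaries.
db.inventory.insertOne( {
   _id: 1,
   sku: "abc123",
   orders: [ { order_id: 101, qty: 2 }, { order_id: 102, qty: 5 } ]
} )

// Reads become a single-collection query, with no $lookup needed.
db.inventory.find( { _id: 1 } )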

There is no general answer to your question.

Let's have a look at Oracle history. In earlier times (i.e. prior to Oracle 8i, which was released in 1997) the execution path was determined by a rule-based optimizer: the query was analyzed, the existence of indexes was checked, and based on this information the execution path was selected.

Today the Oracle optimizer also considers the data itself, using statistics. MongoDB's $lookup corresponds to an OUTER JOIN, and Oracle knows 5 different outer join types. The Oracle optimizer fills entire books.

Also, MongoDB is constantly improving its data access methods; see for example the Slot-Based Query Execution Engine.

Your question is way too broad!
