See More
Popular Forum

MBA (4887) B.Tech (1769) Engineering (1486) Class 12 (1030) Study Abroad (1004) Computer Science and Engineering (988) Business Management Studies (865) BBA (846) Diploma (746) CAT (651) B.Com (648) B.Sc (643) JEE Mains (618) Mechanical Engineering (574) Exam (525) India (462) Career (452) All Time Q&A (439) Mass Communication (427) BCA (417) Science (384) Computers & IT (Non-Engg) (383) Medicine & Health Sciences (381) Hotel Management (373) Civil Engineering (353) MCA (349) Tuteehub Top Questions (348) Distance (340) Colleges in India (334)
See More
( 8 months ago )

How to design schema denormalization to handle data changes?

General Tech Bugs & Fixes
Max. 2000 characters

Alex Wilson


( 8 months ago )


I'm designing a schema for MongoDB, and keep running into scenarios where future updates might invalidate my cached copies of data. One example is for Users, Orders and Addresses.

const UserSchema = mongoose.Schema({
    addresses: [{ street: String, city: String, state: String, zip: String }]

const OrderSchema = mongoose.Schema({
    address: { street: String, city: String, state: String, zip: String }

This seems to be a standard approach, since MongoDB isn't meant to be a relational database, to denormalize the data where possible. However, the following scenario confuses me:

  1. User adds an Address to their User document.
  2. User places an Order and selects an Address from their list of addresses.
  3. The address data is copied into the Order document when the order is persisted.
  4. Before the order is shipped, the user discovers that they mistyped the address.
  5. User changes that incorrect Address in their User document.
  6. The Address in the Order object needs to change also, otherwise it will be shipped to an invalid address.

This seems to point towards the need for a reference using mongoose.Schema.Types.ObjectId to simulate a relational structure between the collections. (In that case, there would also be an Addresses collection, of course.) However, there are other considerations such as the history aspect of the denormalization. I want to store the actual address to which the order was shipped, even if that address is later deleted or changed. With denormalization this would seem to be easier than the relational paradigm.

One approach I considered to create an Addresses collection, then mark its records as invalidated when they are deleted, in case they are already referenced in an Order. And when they are modified, I would need to check the Orders collection to see if that Address is referenced. If it's already been referenced in a shipped order, I would have to leave it alone (for historical purposes), and create an additional Address document with the new changes. All of which sounds a bit complicated, compared to the denormalization approach.

The next part of the issue regards querying and reporting. If I want to pull up a list of all Users who have ever had an address in Illinois, I would need to traverse both the Addresses collection and the Orders in order to find out. Because they may have had an Illinois address, used it in a shipped order, then deleted it from the Addresses collection.

How do the smartest MongoDB data architects handle situations like this? I'm an experienced relational database architect, but am somewhat baffled by the conceptual framework of NoSQL. Thanks!

what's your interest