We are doing a huge migration of two stores on one platform into one store on a different platform.
We're starting relatively "fresh" as far as importing anything old. Only the catalog and customers were copied, no order history and so forth.
However, over the years I've been doing weekly backups downloading the CSV data that the previous stores made available. They had separate files for products, categories, customers, and orders. They also provide XML exports which actually include additional data not found in the CSVs.
I need to take these hundreds of CSV files which are in 4 categories, across two stores, and convert them all into a type of single archive application for holding about 10 years worth of data. Sadly, they never wanted individual sales to be recording in accounting software (Quickbooks) so this is also acting like an accounting archive as well.
My skills are in PHP and MySQL as far as building DB apps, but there are some nuances to this project where I might want to explore alternate databases.
One thing is that as I did weekly backups, various bits of data might change, for example the price going up or down, or text in the description changing, or the URL being changed, etc. These are reflected in the history of my CSV files, but would be lost if I simply recorded the latest version of the product. Even an order may have changed from one week to the next if, for example, they added or removed an item from an order. Or the order was live and good but later cancelled. I don't want to lose the history of the order as recorded through my backups.
As I think about this from the RDBS perspective I don't know that it's the best choice. It would be difficult to have "one row" for a product, but then ALSO record every time-stamped variation of any given field.
I could save every backup as individual rows, but then every query would have to group things properly and filter out by date and so on. I think these queries would get very expensive always having to process dozens of rows of the same product and then parse out sorted alternative fields that might exist.
In my brain this seems better suited to an object datastore where an "array" called "Description" could contain every dated alternate text. Then each product or order could still be a single master record and all the alternate fields would be within it.
Normally storing customers, orders, and products would be pretty "relational" with a table for each one. But since I'd like to store the dated history of various fields within those, according to my backups, it complicates things a bit.
I'm either trying to figure out a good RDBMS strategy for this without creating spaghetti queries, or considering more of an object store or graph-type DB. Maybe Mongo, etc.
I also have to consider a method whereby all my CSVs can be processed and stored automatically since I don't want to enter anything manually, or even one file at a time.
Then I'll have to build a basic GUI for finding and viewing the records.