Monday, December 28, 2015

Importing large CSV files in MySQL using Laravel

I have a CSV file that can range from 50k to over 100k rows of data.

I'm currently using Laravel with Laravel Forge, MySQL, and the Maatwebsite Laravel Excel package.

This is to be used by an end user, not myself, so I have created a simple form in my Blade view like so:

{!! Form::open(
    array(
        'route' => 'import.store', 
        'class' => 'form',
        'id' => 'upload',
        'novalidate' => 'novalidate', 
        'files' => true)) !!}

    <div class="form-group">
        <h3>CSV Product Import</h3>
        {!! Form::file('upload_file', array('class' => 'file')) !!}
    </div>

    <div class="form-group">
        {!! Form::submit('Upload Products', array('class' => 'btn btn-success')) !!}
    </div>
{!! Form::close() !!}

This successfully stores the file on the server, and I'm then able to iterate through the rows, for example with a foreach loop.
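Roughly, the controller side looks like this (a simplified sketch rather than my exact code; the controller name and storage path are just placeholders):

// Simplified sketch of the upload handling -- class name and storage path are placeholders.
use Illuminate\Http\Request;
use Maatwebsite\Excel\Facades\Excel;

class ImportController extends Controller
{
    public function store(Request $request)
    {
        // Move the uploaded CSV somewhere persistent before processing.
        $file = $request->file('upload_file');
        $path = $file->move(storage_path('app/imports'), uniqid() . '.csv')->getRealPath();

        // Iterate the rows with Laravel Excel (2.x-style API).
        Excel::load($path, function ($reader) {
            foreach ($reader->get() as $row) {
                // ... per-row import logic ...
            }
        });

        return redirect()->back();
    }
}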

Now here are the issues I've faced, in chronological order, along with my fixes/attempts (testing with a 10k-row CSV file):

  1. [issue] PHP times out.
  2. [remedy] Changed it to run asynchronously via a job command.
  3. [result] Imports up to 1500 rows.
  4. [issue] Server runs out of memory.
  5. [remedy] Added a swap drive of 1gb.
  6. [result] Imports up to 3000 rows.
  7. [issue] Server runs out of memory.
  8. [remedy] Turned on chunking of the results, 250 rows per chunk (roughly as sketched after this list).
  9. [result] Imports up to 5000 rows.
  10. [issue] Server runs out of memory.
  11. [remedy] Removed some transposing/joined-tables logic.
  12. [result] Imports up to 7000 rows.
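
For reference, the job + chunking combination from steps 2 and 8 above looks roughly like this (a trimmed-down sketch, not my exact code; the class name is a placeholder and the job boilerplate varies slightly between Laravel versions):

// app/Jobs/ImportCsv.php -- trimmed-down sketch of the queued import job.
use Illuminate\Bus\Queueable;
use Illuminate\Queue\SerializesModels;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Contracts\Queue\ShouldQueue;
use Maatwebsite\Excel\Facades\Excel;

class ImportCsv implements ShouldQueue
{
    use InteractsWithQueue, Queueable, SerializesModels;

    protected $path;

    public function __construct($path)
    {
        $this->path = $path;
    }

    public function handle()
    {
        // Read the file in chunks of 250 rows so only one slice is in memory at a time.
        Excel::filter('chunk')->load($this->path)->chunk(250, function ($results) {
            foreach ($results as $row) {
                // ... per-row import logic ...
            }
        });
    }
}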

As you can see, the gains are marginal and nowhere near 50k; I can barely even get close to 10k.

I've read up and looked into possible suggestions such as:

  • Use a raw query to run LOAD DATA LOCAL INFILE.
  • Split files before importing.
  • Store on server then have server split into files and have a cron process them.
  • Upgrade my 512MB DO droplet to 1GB as a last resort.

Going with LOAD DATA LOCAL INFILE may not work because my header columns can change from file to file; that's why I have logic to process/iterate through them.
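
For completeness, the raw-query version I considered would be along these lines (illustrative only; the fixed column list is exactly what I can't rely on, and LOCAL also requires local_infile to be enabled on the connection):

// Illustrative LOAD DATA LOCAL INFILE example -- file path, table and column
// names are placeholders; the fixed column list is the problem for my case.
// Note: the PDO connection needs PDO::MYSQL_ATTR_LOCAL_INFILE => true.
$query = "LOAD DATA LOCAL INFILE '/path/to/upload.csv'
          INTO TABLE inventory
          FIELDS TERMINATED BY ',' ENCLOSED BY '\"'
          LINES TERMINATED BY '\\n'
          IGNORE 1 LINES
          (profile_id, category_id, sku, title)";

DB::connection()->getPdo()->exec($query);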

Splitting files before importing is fine for under 10k rows, but for 50k or more? That would be highly impractical.

Store the file on the server and then have the server split it and process the pieces individually without troubling the end user? Possibly, but I'm not sure how to achieve this in PHP yet; I've only briefly read about it.
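
What I picture is something along these lines, though I haven't tried it (an fgetcsv-based sketch; ImportRows is a hypothetical job class, and dispatch() stands in for however the job actually gets pushed onto the queue):

// Rough sketch of the "split on the server" idea: stream the CSV with fgetcsv()
// and queue a hypothetical ImportRows job per batch of rows.
$handle = fopen($path, 'r');
$header = fgetcsv($handle); // first line holds the (variable) column names

$batch = array();
while (($row = fgetcsv($handle)) !== false) {
    $batch[] = array_combine($header, $row);

    if (count($batch) === 1000) {
        dispatch(new ImportRows($batch));
        $batch = array();
    }
}

if (!empty($batch)) {
    dispatch(new ImportRows($batch)); // leftover rows
}

fclose($handle);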

Also worth noting: my queue worker is set to time out after 10,000 seconds, which is also very impractical and bad practice, but it seemed to be the only way to keep it running long enough before memory takes a hit.
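
In concrete terms that means running the listener with something like this (assuming queue:listen; the exact flags depend on how the worker is set up in Forge):

php artisan queue:listen --timeout=10000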

Now I could give in and just upgrade the memory to 1GB, but I feel that at best it might get me to 20k rows before it fails again. Something needs to process all these rows quickly and efficiently.

Lastly, here is a glimpse of my table structure:

Inventory
+----+------------+-------------+-------+---------+
| id | profile_id | category_id |  sku  |  title  |
+----+------------+-------------+-------+---------+
|  1 |         50 |       51234 | mysku | mytitle |
+----+------------+-------------+-------+---------+

Profile
+----+---------------+
| id |     name      |
+----+---------------+
| 50 | myprofilename |
+----+---------------+

Category
+----+------------+--------+
| id | categoryId |  name  |
+----+------------+--------+
|  1 |      51234 | brakes |
+----+------------+--------+

Specifics
+----+---------------------+------------+-------+
| id | specificsCategoryId | categoryId | name  |
+----+---------------------+------------+-------+
|  1 |                  20 |      57357 | make  |
|  2 |                  20 |      57357 | model |
|  3 |                  20 |      57357 | year  |
+----+---------------------+------------+-------+

SpecificsValues
+----+-------------+-------+--------+
| id | inventoryId | name  | value  |
+----+-------------+-------+--------+
|  1 |           1 | make  | honda  |
|  2 |           1 | model | accord |
|  3 |           1 | year  | 1998   |
+----+-------------+-------+--------+

Full CSV Sample
+----+------------+-------------+-------+---------+-------+--------+------+
| id | profile_id | category_id |  sku  |  title  | make  | model  | year |
+----+------------+-------------+-------+---------+-------+--------+------+
|  1 |         50 |       51234 | mysku | mytitle | honda | accord | 1998 |
+----+------------+-------------+-------+---------+-------+--------+------+

So, as simply as possible, a quick run-through of my logic workflow would be (a rough sketch follows the list):

  1. Load the file into Maatwebsite/Laravel-Excel and iterate through it in a chunked loop.
  2. Check that category_id and sku are not empty; otherwise skip the row and log an error to an array.
  3. Look up the category_id and pull all relevant column fields from the related tables it uses, then, if nothing is null, insert into the database.
  4. Generate a custom title with additional logic, using the fields available in the file.
  5. Rinse and repeat.
  6. Lastly, export the errors array to a file and log it in the database so the errors can be downloaded and viewed at the end.
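
Put together, the per-chunk logic is roughly this (heavily condensed; the model names match my tables above, everything else is placeholder code):

// Heavily condensed sketch of steps 1-6 above.
$errors = array();

Excel::filter('chunk')->load($path)->chunk(250, function ($results) use (&$errors) {
    foreach ($results as $row) {
        // Step 2: required keys.
        if (empty($row->category_id) || empty($row->sku)) {
            $errors[] = 'Missing category_id or sku: ' . json_encode($row->toArray());
            continue;
        }

        // Step 3: look up the category and its related fields.
        $category = Category::where('categoryId', $row->category_id)->first();
        if (!$category) {
            $errors[] = 'Unknown category_id: ' . $row->category_id;
            continue;
        }

        // Step 4: custom title logic, then insert the inventory row.
        $inventory = Inventory::create(array(
            'profile_id'  => $row->profile_id,
            'category_id' => $row->category_id,
            'sku'         => $row->sku,
            'title'       => $row->title, // custom title generation goes here
        ));

        // Insert the specifics that came in as extra CSV columns.
        foreach (array('make', 'model', 'year') as $name) {
            if (!empty($row->{$name})) {
                SpecificsValues::create(array(
                    'inventoryId' => $inventory->id,
                    'name'        => $name,
                    'value'       => $row->{$name},
                ));
            }
        }
    }
});

// Step 6: write $errors to a file and log it in the database for download.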

I hope someone can share some insight on possible ways to tackle this, keeping in mind that I'm using Laravel and that it's not a simple upload: each line needs to be processed and inserted into different related tables, otherwise I'd just LOAD DATA INFILE it all in one go.

Thanks!




