I'm trying to download a very large gzipped CSV file hosted on an S3 server and insert each CSV row into my database. To do that, I fetch the encoded stream, gunzip it, parse it, and save each row to my MySQL database.

The first 2000 or so rows are inserted into the database correctly, but after some time the process stops with an ECONNRESET error raised from onStreamRead. I have spent the last two days trying to figure out why. I'm fairly sure my internet connection is stable and fast, and I'm using the latest versions of Node and the external packages.

Goal

  1. Process a remote gzipped CSV on the fly (about 1 GB unzipped)

  2. Don't download or store the whole file before processing; process the stream directly

  3. Insert each row into the database

  4. Resolve the Promise once all rows have been inserted into the database
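
The four goals above can be sketched with Node's built-in stream utilities alone. This is only a sketch under assumptions, not the code from this question: `processStream`, `splitLines`, `toRows`, and the `handleRow` callback are hypothetical names, the tab-separated parser is deliberately naive (no quoting support) and merely stands in for fast-csv, and it uses `require` so it runs as a plain script. The idea it illustrates is that awaiting each row inside `pipeline` lets backpressure reach the source stream, and the returned Promise settles only after the last row has been handled.

```javascript
const zlib = require('node:zlib');
const { pipeline } = require('node:stream/promises');

// Split decoded text into lines, carrying a partial line across chunk boundaries.
async function* splitLines(chunks) {
  const decoder = new TextDecoder('utf-8');
  let rest = '';
  for await (const chunk of chunks) {
    rest += decoder.decode(chunk, { stream: true });
    const lines = rest.split(/\r?\n/);
    rest = lines.pop(); // last element may be an incomplete line
    yield* lines;
  }
  rest += decoder.decode();
  if (rest) yield rest;
}

// Naive TSV parsing (assumption: no quoted fields); first non-empty line is the header.
async function* toRows(lines) {
  let headers = null;
  for await (const line of lines) {
    if (line.trim() === '') continue;
    const cells = line.split('\t').map((cell) => cell.trim());
    if (!headers) {
      headers = cells;
      continue;
    }
    yield Object.fromEntries(headers.map((name, i) => [name, cells[i]]));
  }
}

// encodedStream: any readable stream of gzipped bytes (e.g. an HTTPS response).
// handleRow: hypothetical async function that stores one row and returns a Promise.
async function processStream(encodedStream, handleRow) {
  let count = 0;
  await pipeline(encodedStream, zlib.createGunzip(), async (chunks) => {
    for await (const row of toRows(splitLines(chunks))) {
      await handleRow(row); // the next row is not read until this one is stored
      count++;
    }
  });
  return count; // resolves only after every row has been handled
}
```

With a real download this might be wired up as `https.get(url, (res) => processStream(res, handleRow).then(resolve, reject))`, so that an error in any stage rejects the outer Promise.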

Error

Error: read ECONNRESET
    at TLSWrap.onStreamRead (node:internal/stream_base_commons:218:20) {
  errno: -4077,
  code: 'ECONNRESET',
  syscall: 'read'
}

Code

import https from "node:https";
import zlib from "node:zlib";
import mysql from "mysql";
import csv from 'fast-csv';

async processData(url) {
    return new Promise(async (resolve, reject) => {
        let gunzip = zlib.createGunzip();
        let csvStream = csv.parse({
            delimiter: '\t',
            headers: true,
            discardUnmappedColumns: true,
            ignoreEmpty: true,
            trim: true
        }).transform(async (row, next) => {
            await processRow(row);

            next();
        }).on('error', (e) => {
            reject(e);
        });

        https.get(url, async (encodedStream) => {
            encodedStream.pipe(gunzip).pipe(csvStream);
        }).on('error', (e) => {
            reject(e)
        }).on('finish', () => {
            resolve('done');
        });
    });
}

processRow(row) {
    let query = 'INSERT INTO `data` (external_id, external_value) VALUES(`${row.id}`, `${row.value}`)';
    this.con.query(query, [true], async (error, result, fields) => {
        if (error) {
            console.log(error, result, fields);
        }
    });
}

processData("https://some-gzipped-csv.gz");
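
One thing worth noting about the code above: `processRow` does not return a Promise, so `await processRow(row)` does not wait for the INSERT, and the `${row.id}` placeholders sit inside a single-quoted string, so they are not interpolated. A minimal sketch of a Promise-returning variant follows. `queryAsync` and `insertRow` are hypothetical names, and `con` is assumed to be a connection exposing the callback-style `query(sql, values, callback)` that the mysql package provides; whether this has any bearing on the ECONNRESET is not established here.

```javascript
// Hypothetical helper: wrap a callback-style con.query(sql, values, cb) in a Promise.
function queryAsync(con, sql, values) {
  return new Promise((resolve, reject) => {
    con.query(sql, values, (error, result) => {
      if (error) reject(error);
      else resolve(result);
    });
  });
}

// Returns a Promise, so `await insertRow(con, row)` really waits for the INSERT.
// The `?` placeholders let the driver escape the values instead of building SQL by hand.
function insertRow(con, row) {
  const sql = 'INSERT INTO `data` (external_id, external_value) VALUES (?, ?)';
  return queryAsync(con, sql, [row.id, row.value]);
}
```

Inside the transform this could be used as `insertRow(this.con, row).then(() => next(), next)`, so a failed INSERT is passed to the parser as an error instead of only being logged.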
  • There is probably a timeout or something similar being triggered, most likely on the server you're downloading from (it could also be your local network stack, but that seems unlikely). Does it always happen after the same number of seconds or the same number of bytes?
    – mb21
    Commented Jul 7 at 7:12
