One of the most dangerous engineering bugs is the one that looks correct in output, passes simple tests, and still loses user data in the real world.
That is exactly what happened in this CSV round-trip case.
Here is the trap in one sentence:
a valid exported row can still be destroyed by a bad import assumption.
At a surface level, everything looked fine:
- export produced a CSV file
- quoted fields were present
- rows looked normal on happy-path records
But that surface success hid a deeper problem: the pipeline only really worked on toy rows.
The first failure mode showed up in notes.
Users could write multi-line notes. Export handled that correctly by quoting the field. So if you only inspected the CSV output, you might think the system was sound.
It was not.
On import, the old implementation split the full file on raw newlines before doing quote-aware CSV parsing. That meant a single valid record with a newline inside a quoted notes field got cut into two half-rows, and both halves became unparseable.
That is a classic integrity trap:
the export is technically valid, but a later stage invalidates the guarantee.
The second failure mode showed up in timestamps.
The export format used minute precision for createdAt. That looked harmless until two distinct events occurred within the same minute. Once exported and re-imported, they could collide on the dedup key and be treated as one event instead of two.
Again, the problem was not obvious by casually reading the file. The file looked normal. The silent damage emerged only when import semantics met realistic usage.
That is what makes this case so useful.
It was not "a CSV parser bug."
It was a broader round-trip integrity bug class caused by assumptions that seemed reasonable in isolation:
- rows end at newlines
- minute precision is enough
- dedup keys only need obvious fields
- if export looks right, import is probably fine
Those assumptions fail under user-shaped data.
The hardening pass fixed the bug class in a way that is worth copying:
- row splitting became quote-aware
- timestamps moved to second precision
- dedup keys became stronger
- regression tests were added for multiline notes
- regression tests were added for same-minute records
- regression tests were added for pathological note content and newline normalization
The biggest lesson is simple:
If you claim round-trip support, test round-trip integrity, not just output syntax.
That means using records with:
- embedded newlines
- commas
- quotes
- emoji
- locale-shaped numbers
- near-colliding timestamps
Because that is what real users generate.
Simple CSV fixtures are fine for getting started. They are not enough to prove you are preserving data.
If this kind of deep bug breakdown is useful, I can keep building a library of engineering cases where "the output looked fine" was exactly the trap.