low performance: creating & reading files with many lines of data rather slow
I noticed that creating and reading large files (e.g. 10k+ lines of data, FFI 1001) is rather slow, taking on the order of seconds to minutes.
creation
ict = icartt.Dataset(format=icartt.Formats.FFI1001)
# ...
I create the data section from a 2D np array. Internally, that calls add for each line along the vertical axis of the array. That function has two conditionals, a loop, and two nested conditionals. It also calls np.append for each line of data, which is inefficient compared to appending to a native Python list.
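As a rough illustration (not the icartt code itself), here is a minimal comparison of the two growth patterns; the array shape and line count below are made up:

import numpy as np
import time

data = np.random.rand(10_000, 5)   # hypothetical data section: 10k lines, 5 variables

# pattern 1: grow the array with np.append, one line at a time
t0 = time.perf_counter()
arr = np.empty((0, data.shape[1]))
for row in data:
    arr = np.append(arr, [row], axis=0)   # reallocates and copies the whole array each call -> O(n^2) overall
print(f"np.append per line: {time.perf_counter() - t0:.2f} s")

# pattern 2: collect lines in a Python list, convert once at the end
t0 = time.perf_counter()
rows = []
for row in data:
    rows.append(row)                      # amortized O(1) per line
arr2 = np.asarray(rows)                   # single conversion to a 2D array
print(f"list + single conversion: {time.perf_counter() - t0:.2f} s")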
reading
ict = icartt.Dataset(f)
# ...
For the data section, this calls addBulkFromTxt; from there on, it's basically the same as for creation above.
At the moment, I'm not sure what the best way to approach this would be. What I am certain about is that it needs improvement, as I think it can scare off users. Speaking for myself, this kind of slowness is exactly what makes using nappy, the only "sort-of-official" NASA Ames Python package, painful ;-)
ideas
- load the data section with np.loadtxt or np.genfromtxt? (see the first sketch below)
  - we have that dependency already, and it would offer convenient features, such as "vectorized" methods to replace vmiss with np.nan etc.
- load the data section line by line, but use a list of lists as an intermediate structure (see the second sketch below)
  - Python's list methods are very efficient, so reading the data into a list of lists first, then converting to a numpy array once, then using vectorized methods (e.g. for vmiss-to-nan) should improve performance
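A minimal sketch of the first idea, assuming the data section has already been isolated as text and that vmiss is known from the header (the values and tiny input below are made up):

import io
import numpy as np

vmiss = -9999.0                          # hypothetical missing-data value from the header
text = "1 2 -9999\n3 -9999 6\n7 8 9\n"   # stand-in for the raw data section

data = np.genfromtxt(io.StringIO(text))  # parse all lines in one call
data[data == vmiss] = np.nan             # vectorized vmiss-to-nan replacement
print(data)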
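And a sketch of the second idea, again with made-up input: each line is split with plain Python, appended to a list, and only converted to a numpy array once at the end:

import numpy as np

vmiss = -9999.0
lines = ["1 2 -9999", "3 -9999 6", "7 8 9"]          # stand-in for the data section lines

rows = []
for line in lines:
    rows.append([float(x) for x in line.split()])    # cheap per-line parsing into a list of lists

data = np.asarray(rows)          # one conversion to a 2D array
data[data == vmiss] = np.nan     # vectorized vmiss-to-nan, as above
print(data)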