Reading and writing text to / from a file
Moderators: FourthWorld, heatherlaine, Klaus, kevinmiller, robinmiller
Reading and writing text to / from a file
Hi:
I have a file with 50,000 rows of about 25 chars per row. I only need to read roughly the most recent 200 rows until something is true, and it saves time not to load the entire file using put URL "file:filename" into tWhatever.
I can append data to the file easily (a line at a time) but then I'd have to read the file backwards - which I can't see how to do.
OR
I can prepend data to the file and then read in the normal order, but I can't seem to do that without destroying the previous data.
It's odd that I can't find a solution, as I'm sure this is a problem that has been faced by so many people so many times that there must be an elegant solution I am missing. In short, I DO NOT want to load the entire file into memory using the URL command. I just want to read the 200 most recently contributed lines for time efficiency - in this case, speed matters.
Bruce
Re: Reading and writing text to / from a file
This is extremely easy, and I have answered your question on an alternative forum which LiveCode does not want me to mention over here.
Re: Reading and writing text to / from a file
Thanks. I found your answer after a bit of confusion.
Last edited by bbalmerTotalFluency on Sat Mar 09, 2024 10:14 am, edited 1 time in total.
Re: Reading and writing text to / from a file
Hi Bruce,
look up these terms in the dictionary:
open file
read from file
close file
Best
Klaus
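For anyone following along, here is a minimal sketch of how those three commands combine to read only the tail of a file; the path and byte count are made-up examples, adjust them to your own situation:

Code: Select all

on mouseUp
   ## hypothetical path - point this at your own file
   put specialFolderPath("documents") & "/records.txt" into tFilePath
   open file tFilePath for binary read
   ## read only the last 6000 bytes instead of the whole file
   read from file tFilePath at -6000 until EOF
   close file tFilePath
   put textDecode(it, "utf8") into tData
end mouseUp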
Re: Reading and writing text to / from a file
I'm not convinced that open file for read, then using read from file and then figuring out where lines -200 to -1 are to read those in will be particularly faster than using the put URL method. I can't see a way to automatically figure out where the last 200 lines begin, and possibly counting will be slower.
I say this because the Dictionary entry for read from file states:
Dictionary on Read From File wrote: Summary: Takes data from a file that has been opened with the open file command, and places the data in the it variable.
Which is basically what put URL ("file:...") does.
And to be honest, both functions are highly optimised and I don't see a problem with either...
Re: Reading and writing text to / from a file
bbalmerTotalFluency wrote: ↑ Fri Mar 08, 2024 9:15 am
I have a file with 50,000 rows of about 25 chars per row. I only need to read roughly the most recent 200 rows until something is true and it saves time not to load the entire file using put URL "file:filename" into tWhatever
What is the evaluation test you're performing? It may also be useful to see a representative sample of data that both satisfies and doesn't satisfy the test.
And to double-check assumptions: it's only the single most recent occurrence of the element that satisfies the test that you're looking for, is that correct?
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn
Re: Reading and writing text to / from a file
"open file..." and "read from file...FOR..." gives us a way to define how much we want to read from the file,stam wrote: ↑Fri Mar 08, 2024 11:16 amI'm not convinced that open file for read, then using read from file and then figuring out where lines -200 to -1 are to read those in will be particular faster than using the put URL method. I can't see a way to automatically figure out what the last 200 lines will be and possibly counting will be slower.
I say this because the Dictionary on Read From File states:Which is basically what put URL (file://...) does.Summary: Takes data from a file that has been opened with the open file command, and places the data in the it variable.
And to be honest, both functions are highly optimised and I don't see a problem with either...
where "put url("file:"...) will read the complete file into memory!
Re: Reading and writing text to / from a file
So what if it's all in memory? We're not running on 256 KB any more, and haven't been for decades. It's speed that matters.
The question the OP asked (unless I'm mistaken) is how to access the last 200 lines of a text file with ~50,000 lines.
Maybe I'm not seeing it, but how would you do that with read from file ... for?
Yes, you can start reading from a certain character number onwards until EOF, but how will you know which character that is without iterating over the file contents? Would that not defeat the purpose?
And remember, read from file puts the text in the it variable, so if you are iterating you still need to load the entire file into memory.
I accept I may be missing the obvious coding magic that allows read from file to read the last 200 lines, and if I am, please do post the answer - and even better, post some speed comparisons.
Dictionary on Read From File wrote: Summary: Takes data from a file that has been opened with the open file command, and places the data in the it variable.
But based on what I see in the dictionary, I don't see how without iterating, in which case you'd do better reading the whole thing into a variable and getting lines -200 to -1.
Re: Reading and writing text to / from a file
Hi OP here:
The heart of the matter is being discussed.
It isn't a memory constraint (fortunately). But let's be extreme for clarity. Let's say I have temperature records like this
2024-03-08,Calgary,30C
2024-03-08,London,12C
etc. And I have such records for each of 1,000 cities. So I have 1,000 records per day. Each day, I APPEND the records to the text file.
Now, I want to interrogate the file ONLY for today's records (yes, I could do this with a db but it is being done with text files at present). Let's say I have 10 years of records so 365 days * 10 years * 1000 cities = almost 4M rows.
I don't want the computer to have to read the first nearly 4M records just to ignore them because they aren't from today. What I want - and maybe this isn't possible because of the way computer systems are architected (not my area of expertise) - is to start at EOF (can I get there instantly, or does the computer have to traverse all the records either way?) and go back line by line until the line does not start with today's date. That would potentially be as quick as just reading the first 1,000 records.
I can solve the problem, of course, in multiple ways. I can use put URL and then get the last 200 lines, or I can create a file just for today and tomorrow append that file to the main records file. The nature of my question is: is there an elegant way using the built-in cleverness of the OS and LiveCode, or must this be resolved inelegantly? What's the difference? In this case, just being able to read 1,000 lines would be a great deal faster than reading in 4M lines. Again - could it be done better with a DB? Yes. Is it being done with a DB? No.
If my question is confusing, it may be that to more qualified people my question seems insane. I can only SEEM to add lines efficiently to a text file by APPENDING them - meaning the most recent records are at the end of the file. And I can only SEEM to read a file efficiently from the start. That those two are in opposition is the problem I would like to solve.
P.S. I appreciate that I could sort the file in reverse order once I had appended the records. Unfortunately for me, they don't all come in at the same time in a single batch - which would have meant I could sort all 4M records just once. Instead the sort would happen over and over, and again it would be slow. But as I say, all sorts of CLEVER solutions could be used; I'm looking for simple, elegant and standard - clever I can do.
What also has me puzzled is that there must be millions of people around the world with a similar problem, and usually such problems have a simple, elegant and standard solution (maybe that solution is a database, but that isn't a choice right now).
Re: Reading and writing text to / from a file
Clever is not to have millions of records in the same file. Work on a separate file for today's data. Give it a filename based on the date. If you need to aggregate data you can append to a master archive file, or just combine specific files in a given date range when needed.
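As a sketch of the file-per-day idea (the folder and naming scheme here are just examples):

Code: Select all

on mouseUp
   ## build a YYYY-MM-DD stamp for today
   put the date into tDate
   convert tDate to dateItems
   put format("%04d-%02d-%02d", item 1 of tDate, item 2 of tDate, item 3 of tDate) into tStamp
   ## hypothetical folder and naming scheme: one file per day
   put specialFolderPath("documents") & "/temps-" & tStamp & ".txt" into tDayFile
   ## append one record to today's file only
   open file tDayFile for append
   write tStamp & ",Calgary,30C" & return to file tDayFile
   close file tDayFile
end mouseUp

Reading "today's records" is then just reading today's small file, no matter how big the archive grows.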
Or in a database.
Re: Reading and writing text to / from a file
I second that.
If you're appending records to the end of a text file, sooner or later you'll run into trouble.
Why not use an SQLite database? Much more efficient...
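For what it's worth, a minimal sketch with LiveCode's revDB library; the database path, table name and query here are illustrative:

Code: Select all

on mouseUp
   ## hypothetical database file
   put specialFolderPath("documents") & "/temps.sqlite" into tDBPath
   put revOpenDatabase("sqlite", tDBPath, , , ) into tDBID
   revExecuteSQL tDBID, "CREATE TABLE IF NOT EXISTS readings (day TEXT, city TEXT, temp TEXT)"
   ## fetching only today's rows is then a single query, not a 4M-line scan
   put revDataFromQuery(tab, return, tDBID, "SELECT city, temp FROM readings WHERE day = '2024-03-08'") into tToday
   revCloseDatabase tDBID
end mouseUp

With an index on the day column, the lookup stays fast regardless of how many years of records accumulate.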
Re: Reading and writing text to / from a file
Bruce,
Since you insist, you could try this:
Kind regards
Bernd
Code: Select all
on mouseUp
   ## 25 chars per row
   ## let's use 30 chars per row to be safe
   put 200 * 30 into tCharsNeeded
   ## the number of lines you want from the end of the file
   put 200 into tNumLinesNeeded
   answer file "choose file to read"
   if it is empty then exit mouseUp
   put it into tFilePath
   open file tFilePath for binary read
   ## start reading tCharsNeeded bytes before the end of the file
   read from file tFilePath at -tCharsNeeded until EOF
   ## the result should contain "eof"
   put the result into tResult
   put textDecode(it, "utf8") into tData -- encoding depends on your needs
   ## for debugging: check that you read enough lines to cut them down
   ## to the number you want; otherwise increase the chars per line
   put the number of lines of tData into tNumLines
   ## cut the data down to the last 200 lines of the file
   put line -tNumLinesNeeded to -1 of tData into tData
   ## for debugging: check
   put the number of lines of tData into tNumAfter
   close file tFilePath
   ## now process your data
   -- code
end mouseUp
Re: Reading and writing text to / from a file
It would be interesting to see how that compares speed-wise to the 'dumb' route:
I'm guessing there won't be much difference at all with smaller text files, but would be very curious to see what happened with files with over 50,000 lines that the OP reports.
Ultimately however I still think an sqlite database would be a) safer, b) quicker with the number of records discussed here.
Stam
Code: Select all
on mouseUp
   local tFile, tData
   answer file "choose file to read"
   if it is empty then exit mouseUp
   put it into tFile
   put line -200 to -1 of URL ("file:" & tFile) into tData
   // ... process tData
end mouseUp
Re: Reading and writing text to / from a file
stam wrote: ↑ Sat Mar 09, 2024 11:15 pm
Would be interesting how that compares speed-wise to the 'dumb' route ... I'm guessing there won't be much difference at all with smaller text files, but would be very curious to see what happened with files with over 50,000 lines that the OP reports.
I did a comparison using "put url" and "open file" with a text file of 56 MB with about 220,000 lines; a line has about 200 chars.
"put url" clocks in at roughly 190 milliseconds
"open file" clocks in at 1 milliseconds. (using the code above)
This is on a MacBookPro M1 processor and SSD.
190 milliseconds is not that bad considering the amount of data shuffling for "put url".
"open file" uses a lot less memory because it only takes in the specified amount of bytes.
Kind regards
Bernd
Re: Reading and writing text to / from a file
I did another test on a file containing a billion digits of pi. The file size is 1 GB.
I set the lineDelimiter to the digit 5 and then repeated the tests.
"put url" took just short of 7 seconds, whereas "open file" stayed at 1 millisecond.
(I saved everything before the test because I was not sure how LC would handle the 1 GB file. All went well)
Kind regards
Bernd