Reading and writing text to / from a file
Moderators: FourthWorld, heatherlaine, Klaus, kevinmiller, robinmiller
Reading and writing text to / from a file
Hi:
I have a file with 50,000 rows of about 25 chars per row. I only need to read roughly the most recent 200 rows until something is true, and it saves time not to load the entire file using put URL "file:filename" into tWhatever.
I can append data to the file easily (a line at a time) but then I'd have to read the file backwards - which I can't see how to do.
OR
I can prepend data to the file and then read in the normal order, but I can't seem to do that without destroying the previous data.
It's odd that I can't find a solution, as I'm sure this is a problem that has been faced by so many people so many times that there must be an elegant solution I am missing. In short, I DO NOT want to load the entire file into memory using the URL command. I just want to read the 200 most recently contributed lines for time efficiency - in this case, speed matters.
Bruce
Re: Reading and writing text to / from a file
This is extremely easy, and I have answered your question on an alternative forum which LiveCode does not want me to mention over here.
Re: Reading and writing text to / from a file
Thanks. I found your answer after a bit of confusion.
Last edited by bbalmerTotalFluency on Sat Mar 09, 2024 10:14 am, edited 1 time in total.
Re: Reading and writing text to / from a file
Hi Bruce,
look up these terms in the dictionary:
open file
read from file
close file
Best
Klaus
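For anyone following along, here is a minimal sketch of how those three commands combine to read only the tail of a file; the path and byte count are made-up examples, adjust them to your own situation:

Code: Select all

on mouseUp
   ## hypothetical path - point this at your own file
   put specialFolderPath("documents") & "/records.txt" into tFilePath
   open file tFilePath for binary read
   ## read only the last 6000 bytes instead of the whole file
   read from file tFilePath at -6000 until EOF
   close file tFilePath
   put textDecode(it, "utf8") into tData
end mouseUp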
Re: Reading and writing text to / from a file
I'm not convinced that open file for read, then using read from file and then figuring out where lines -200 to -1 are to read those in will be particularly faster than using the put URL method. I can't see a way to automatically figure out where the last 200 lines begin, and possibly counting will be slower.
I say this because the Dictionary entry for read from file states:
Dictionary on Read From File wrote: Summary: Takes data from a file that has been opened with the open file command, and places the data in the it variable.
Which is basically what put URL ("file:...") does.
And to be honest, both functions are highly optimised and I don't see a problem with either...
Re: Reading and writing text to / from a file
bbalmerTotalFluency wrote: ↑ Fri Mar 08, 2024 9:15 am
I have a file with 50,000 rows of about 25 chars per row. I only need to read roughly the most recent 200 rows until something is true and it saves time not to load the entire file using put URL "file:filename" into tWhatever
What is the evaluation test you're performing? It may also be useful to see a representative sample of data that both satisfies and doesn't satisfy the test.
And to double-check assumptions: it's only the single most recent occurrence of the element that satisfies the test that you're looking for, is that correct?
Richard Gaskin
LiveCode development, training, and consulting services: Fourth World Systems
LiveCode Group on Facebook
LiveCode Group on LinkedIn
Re: Reading and writing text to / from a file
"open file..." and "read from file...FOR..." gives us a way to define how much we want to read from the file,stam wrote: ↑Fri Mar 08, 2024 11:16 amI'm not convinced that open file for read, then using read from file and then figuring out where lines -200 to -1 are to read those in will be particular faster than using the put URL method. I can't see a way to automatically figure out what the last 200 lines will be and possibly counting will be slower.
I say this because the Dictionary on Read From File states:Which is basically what put URL (file://...) does.Summary: Takes data from a file that has been opened with the open file command, and places the data in the it variable.
And to be honest, both functions are highly optimised and I don't see a problem with either...
where "put url("file:"...) will read the complete file into memory!
Re: Reading and writing text to / from a file
So what if it's all in memory? We're not running on 256 KB any more, and haven't been for decades. It's speed that matters.
The question the OP asked (unless I'm mistaken) is how to access the last 200 lines of a text file with ~50,000 lines.
Maybe I'm not seeing it, but how would you do that with read from file ... for?
Yes, you can start reading from a certain character number onwards until EOF, but how will you know which character that is without iterating over the file contents? Would that not defeat the purpose?
And remember, read from file puts the text in the it variable, so if you are iterating you still need to load the entire file into memory.
I accept I may be missing the obvious coding magic that allows read from file to read the last 200 lines, and if I am, please do post the answer - and even better, post some speed comparisons.
Dictionary on Read From File wrote: Summary: Takes data from a file that has been opened with the open file command, and places the data in the it variable.
But based on what I see in the dictionary, I don't see how without iterating, in which case you'd do better reading the whole thing into a variable and getting lines -200 to -1.
Re: Reading and writing text to / from a file
Hi OP here:
The heart of the matter is being discussed.
It isn't a memory constraint (fortunately). But let's be extreme for clarity. Let's say I have temperature records like this
2024-03-08,Calgary,30C
2024-03-08,London,12C
etc. And I have such records for each of 1,000 cities. So I have 1,000 records per day. Each day, I APPEND the records to the text file.
Now, I want to interrogate the file ONLY for today's records (yes, I could do this with a db but it is being done with text files at present). Let's say I have 10 years of records so 365 days * 10 years * 1000 cities = almost 4M rows.
I don't want the computer to have to read the first nearly 4M records just to ignore them because they aren't from today. What I want - and maybe this isn't possible because of the way computer systems are architected (not my area of expertise) - is to start at EOF (can I get there instantly, or does the computer have to traverse all the records either way?) and go back line by line until the line does not start with today's date. That would potentially be as quick as just reading the first 1,000 records.
I can solve the problem, of course, in multiple ways. I can use put URL and then get the last 200 lines, or I can create a file just for today and tomorrow append that file to the main records file. The nature of my question is: is there an elegant way using the built-in cleverness of the OS and LiveCode, or must this be resolved inelegantly? What's the difference? In this case, just being able to read 1,000 lines would be a great deal faster than reading in 4M lines. Again - could it be done better with a DB? Yes. Is it being done with a DB? No.
If my question is confusing, it may be that to more qualified people my question seems insane. I can only SEEM to add lines efficiently to a text file by APPENDING them - meaning the most recent records are at the end of the file. And I can only SEEM to read a file efficiently from the start. That those two are in opposition is the problem I would like to solve.
P.S. I appreciate that I could sort the file in reverse order once I had appended the records. Unfortunately for me, they don't all come in at the same time in a single batch - which would have meant I could sort all 4M records just once. Instead the sort would happen over and over, and again it would be slow. But as I say, all sorts of CLEVER solutions could be used; I'm looking for simple, elegant and standard - clever I can do.
What also has me puzzled is that there must be millions of people around the world with a similar problem, and usually such problems have a simple, elegant and standard solution (maybe that solution is a database, but that isn't a choice right now).
Re: Reading and writing text to / from a file
Clever is not to have millions of records in the same file. Work on a separate file for today's data. Give it a filename based on the date. If you need to aggregate data you can append to a master archive file, or just combine specific files in a given date range when needed.
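As a sketch of the file-per-day idea (the folder and naming scheme here are just examples):

Code: Select all

on mouseUp
   ## build a YYYY-MM-DD stamp for today
   put the date into tDate
   convert tDate to dateItems
   put format("%04d-%02d-%02d", item 1 of tDate, item 2 of tDate, item 3 of tDate) into tStamp
   ## hypothetical folder and naming scheme: one file per day
   put specialFolderPath("documents") & "/temps-" & tStamp & ".txt" into tDayFile
   ## append one record to today's file only
   open file tDayFile for append
   write tStamp & ",Calgary,30C" & return to file tDayFile
   close file tDayFile
end mouseUp

Reading "today's records" is then just reading today's small file, no matter how big the archive grows.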
Or in a database.
Re: Reading and writing text to / from a file
I second that.
If you're appending records to the end of a text file, sooner or later you'll run into trouble.
Why not use an SQLite database? Much more efficient...
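For what it's worth, a minimal sketch with LiveCode's revDB library; the database path, table name and query here are illustrative:

Code: Select all

on mouseUp
   ## hypothetical database file
   put specialFolderPath("documents") & "/temps.sqlite" into tDBPath
   put revOpenDatabase("sqlite", tDBPath, , , ) into tDBID
   revExecuteSQL tDBID, "CREATE TABLE IF NOT EXISTS readings (day TEXT, city TEXT, temp TEXT)"
   ## fetching only today's rows is then a single query, not a 4M-line scan
   put revDataFromQuery(tab, return, tDBID, "SELECT city, temp FROM readings WHERE day = '2024-03-08'") into tToday
   revCloseDatabase tDBID
end mouseUp

With an index on the day column, the lookup stays fast regardless of how many years of records accumulate.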
Re: Reading and writing text to / from a file
Bruce,
Since you insist, you could try this:
Kind regards
Bernd
Code: Select all
on mouseUp
   ## 25 chars per row
   ## let's use 30 chars per row to be safe
   put 200 * 30 into tCharsNeeded
   ## the number of lines you want from the end of the file
   put 200 into tNumLinesNeeded
   answer file "choose file to read"
   if it is empty then exit mouseUp
   put it into tFilePath
   open file tFilePath for binary read
   ## start reading tCharsNeeded bytes before the end of the file
   read from file tFilePath at -tCharsNeeded until EOF
   ## the result should contain "eof"
   put the result into tResult
   put textDecode(it, "utf8") into tData -- encoding depends on your needs
   ## for debugging: check that you read enough lines to cut them down
   ## to the number you want; otherwise increase the chars per line
   put the number of lines of tData into tNumLines
   ## cut the data down to the last 200 lines of the file
   put line -tNumLinesNeeded to -1 of tData into tData
   ## for debugging: check
   put the number of lines of tData into tNumAfter
   close file tFilePath
   ## now process your data
   -- code
end mouseUp
Re: Reading and writing text to / from a file
It would be interesting to see how that compares speed-wise to the 'dumb' route:
I'm guessing there won't be much difference at all with smaller text files, but would be very curious to see what happened with files with over 50,000 lines that the OP reports.
Ultimately however I still think an sqlite database would be a) safer, b) quicker with the number of records discussed here.
Stam
Code: Select all
on mouseUp
   local tFile, tData
   answer file "choose file to read"
   if it is empty then exit mouseUp
   put it into tFile
   put line -200 to -1 of URL ("file:" & tFile) into tData
   // ... process tData
end mouseUp
Re: Reading and writing text to / from a file
stam wrote: ↑ Sat Mar 09, 2024 11:15 pm
Would be interesting how that compares speed-wise to the 'dumb' route ... I'm guessing there won't be much difference at all with smaller text files, but would be very curious to see what happened with files with over 50,000 lines that the OP reports.
I did a comparison using "put url" and "open file" with a text file of 56 MB with about 220,000 lines; a line has about 200 chars.
"put url" clocks in at roughly 190 milliseconds
"open file" clocks in at 1 milliseconds. (using the code above)
This is on a MacBookPro M1 processor and SSD.
190 milliseconds is not that bad considering the amount of data shuffling for "put url".
"open file" uses a lot less memory because it only takes in the specified amount of bytes.
Kind regards
Bernd
Re: Reading and writing text to / from a file
I did another test on a file containing a billion digits of pi. The file size is 1 GB.
I set the lineDelimiter to the digit 5 and then repeated the tests.
"put url" took just short of 7 seconds, whereas "open file" stayed at 1 millisecond.
(I saved everything before the test because I was not sure how LC would handle the 1 GB file. All went well)
Kind regards
Bernd