performance - Matlab: Speed-up reading of ASCII file
I wrote a piece of code that works fine, but it is way too slow for my purposes:
%%% load nodal data %%%
path = sprintf('%sfile.dat',directory);
fid = fopen(path);
num_nodes = textscan(fid,'%s %s %s %s %d',1,'delimiter', ' ');
num_nodes = num_nodes{5};
header = textscan(fid,'%s',7,'delimiter', '\t');
k = 0;
while ~feof(fid)
    line = fgetl(fid);
    [head,rem] = strtok(line,[' ',char(9)]);
    if head == '#'
        k = k+1;
        j = 1;
        time_steps(k) = sscanf(rem, [' output @ t = %d']);
    end
    if ~isempty(head)
        if head ~= '#'
            data(j,:,k) = str2num([head rem]);
            j = j+1;
        end
    end
end
fclose(fid);
nodal_data = struct('header',header,'num_nodes',num_nodes,'time_steps',time_steps,'data',data);
The ASCII file I am reading into MATLAB looks like this:
# number of nodes: 120453
#x y z depth vel_x vel_y wse
# output @ t = 0
76456.003 184726 3815.75 0 0 0 3815.75
76636.003 184726 3728.25 0 0 0 3728.25
76816.003 184726 3627 0 0 0 3627
76996.003 184726 3527.75 0 0 0 3527.75
77176.003 184726 3371.5 0 0 0 3371.5
# output @ t = 36000.788
76456.003 184726 3815.75 0 0 0 3815.75
76636.003 184726 3728.25 0 0 0 3728.25
76816.003 184726 3627 0 0 0 3627
76996.003 184726 3527.75 0 0 0 3527.75
77176.003 184726 3371.5 0 0 0 3371.5
While the code I wrote works for small files, it blows up on me for larger ASCII files. I had to abort loading a ~25 MB ASCII file (approximately 240k lines), and that is only a test file. Later versions of the file will be ~500 MB. Is there a way of speeding up the process of loading the file?

I am not happy with the 3 if-statements, but I did not know how to separate the '#' lines from the numbers with a switch on head, because I am not able to distinguish the class of 'head'. When I try checking with either ischar or isnumeric, the variable 'head' has always been read as a string, so ischar is always true and isnumeric is never true.

I am also not happy with using the tokenizer just to be able to use the if-cases, and then putting the line back together here: str2num([head rem]); which consumes a lot of time. However, I did not know how else to do it. If you have any useful suggestions on how to adapt the code, I would highly appreciate them!

Have a nice Sunday, and thank you in advance!
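The code below reads approx. 70000 time steps with 5 nodes per step in around 7 seconds. It does not do everything your code does, but it should be easy enough to add the remaining features of your code. There are other ways of doing this that may be faster, but this should be adequate.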
filename = 'd:\temp\input.txt';
filetext = fileread(filename);              % read the whole file into memory at once
headerlines = 2;                            % '#' lines before the first time step
valuesperline = 7;
expr = '[^\n]*[^\n]*';
lines = regexp(filetext, expr, 'match');    % split the text into lines
istimestep = cellfun(@(x) strncmp(x,'#',1), lines);   % true for every '#' line
numtimesteps = sum(istimestep) - headerlines;
nodesperstep = ((length(lines)-headerlines) / numtimesteps) - 1;
data = zeros(nodesperstep, valuesperline, numtimesteps);   % preallocate
for timestep = 1:numtimesteps
    % first data line of this time step (skips the '# output @ t = ...' line)
    lineindex = headerlines + (timestep-1) * (nodesperstep + 1) + 2;
    for node = 1:nodesperstep
        data(node, :, timestep) = sscanf(lines{lineindex},'%f');
        lineindex = lineindex + 1;
    end
end
I just tried it on a 2 million line file (340000 time steps with 5 nodes per step) and it took approx. 36 seconds to run.
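The loop above only fills data. As a rough sketch of how the other fields from the question (num_nodes, header, time_steps) could be recovered from the '#' lines, something like the following might work. The variable sampletext is a hypothetical stand-in for the output of fileread, and the sketch assumes the '#' lines always follow the layout shown in the question. It reads the time with %f rather than %d because values such as 36000.788 are not integers.

```matlab
% Hypothetical sample standing in for filetext = fileread(filename)
sampletext = sprintf(['# number of nodes: 2\n' ...
    '#x\ty\tz\tdepth\tvel_x\tvel_y\twse\n' ...
    '# output @ t = 0\n' ...
    '76456.003 184726 3815.75 0 0 0 3815.75\n' ...
    '76636.003 184726 3728.25 0 0 0 3728.25\n' ...
    '# output @ t = 36000.788\n' ...
    '76456.003 184726 3815.75 0 0 0 3815.75\n' ...
    '76636.003 184726 3728.25 0 0 0 3728.25\n']);

lines = regexp(sampletext, '[^\n]*', 'match');
ishash = strncmp(lines, '#', 1);     % strncmp accepts a cell array directly
hashlines = lines(ishash);

% First '#' line holds the node count
num_nodes = sscanf(hashlines{1}, '# number of nodes: %d');

% Second '#' line holds the column names (drop the leading '#')
header = strsplit(strtrim(hashlines{2}(2:end)));

% Remaining '#' lines are assumed to look like "# output @ t = <value>"
time_steps = cellfun(@(s) sscanf(s, '# output @ t = %f'), hashlines(3:end));

% Wrap the cell array in {} so struct() creates a single struct
% instead of a struct array with one element per column name
nodal_info = struct('header', {header}, 'num_nodes', num_nodes, ...
    'time_steps', time_steps);
```

The data array from the loop can then be added as a further field of the same struct.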
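If you want a solution that doesn't have explicitly coded loops, replace the code from
data = zeros(....
onward with
values = cellfun(@(x) sscanf(x,'%f'),lines(~istimestep),'uniformoutput',false);
data = permute(reshape(cell2mat(values), valuesperline, nodesperstep, numtimesteps), [2 1 3]);
but this takes about 50% longer to run. Note that cell2mat returns a valuesperline-by-N matrix here (one column per data line), so the reshape has to put valuesperline first and the permute then swaps the first two dimensions to give the same nodes-by-values-by-steps layout as the loop version.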
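As one of the "other ways" mentioned above, textscan can skip the '#' lines itself through its 'CommentStyle' option and parse all numeric rows in a single call. The following is only a sketch under assumptions: sampletext is a hypothetical stand-in for the file contents, nodesperstep is assumed to be known (for example from the "number of nodes" header line), and it has not been timed against the versions above, so it would need benchmarking on the real file.

```matlab
% Hypothetical sample standing in for filetext = fileread(filename)
sampletext = sprintf(['# number of nodes: 2\n' ...
    '#x\ty\tz\tdepth\tvel_x\tvel_y\twse\n' ...
    '# output @ t = 0\n' ...
    '76456.003 184726 3815.75 0 0 0 3815.75\n' ...
    '76636.003 184726 3728.25 0 0 0 3728.25\n' ...
    '# output @ t = 36000.788\n' ...
    '76456.003 184726 3815.75 0 0 0 3815.75\n' ...
    '76636.003 184726 3728.25 0 0 0 3728.25\n']);

valuesperline = 7;
nodesperstep = 2;    % assumed known, e.g. from the first header line

% Everything from '#' to the end of a line is ignored;
% 'CollectOutput' merges the seven numeric columns into one matrix
parsed = textscan(sampletext, repmat('%f', 1, valuesperline), ...
    'CommentStyle', '#', 'CollectOutput', true);
values = parsed{1};                      % (nodes*steps)-by-valuesperline

numtimesteps = size(values, 1) / nodesperstep;

% Rearrange into nodes-by-values-by-steps, matching the loop version
data = permute(reshape(values.', valuesperline, nodesperstep, numtimesteps), ...
    [2 1 3]);
```

For a real file, the same call should also accept a file identifier from fopen in place of the text, which avoids holding both the raw text and the parsed numbers in memory at once.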