C# - String caching. Memory optimization and re-use
I am working on a large legacy application that handles a large amount of string data gathered from various sources (i.e., names, identifiers, common codes relating to the business, etc.). This data alone can take up 200 MB of RAM in the application process.
A colleague of mine mentioned one possible strategy to reduce the memory footprint (as a lot of the individual strings are duplicated across the data sets): "cache" the recurring strings in a dictionary and re-use them when required. For example…
    public class StringCacher
    {
        public readonly Dictionary<string, string> _stringCache;

        public StringCacher()
        {
            _stringCache = new Dictionary<string, string>();
        }

        // Returns the cached instance for this value, adding it on first sight.
        public string AddOrReuse(string stringToCache)
        {
            if (!_stringCache.ContainsKey(stringToCache))
                _stringCache[stringToCache] = stringToCache;

            return _stringCache[stringToCache];
        }
    }
Then, to use this caching...
    public IEnumerable<string> IncomingData()
    {
        var stringCache = new StringCacher();
        var dataList = new List<string>();

        // Add the data; a fair amount of the strings will be the same.
        dataList.Add(stringCache.AddOrReuse("AAAA"));
        dataList.Add(stringCache.AddOrReuse("BBBB"));
        dataList.Add(stringCache.AddOrReuse("AAAA"));
        dataList.Add(stringCache.AddOrReuse("CCCC"));
        dataList.Add(stringCache.AddOrReuse("AAAA"));

        return dataList;
    }
As strings are immutable and a lot of internal work is done by the framework to make them work in a similar way to value types, I'm half thinking that this will just create a copy of each string in the dictionary and double the amount of memory used, rather than pass a reference to the string stored in the dictionary (which is what my colleague is assuming).
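One way to settle the copy-versus-reference doubt is to compare object identity directly. The following is only a minimal sketch: the `ReferenceCheck` class and its method names are made up for illustration, and the strings are built at run time so that the compiler cannot merge them the way it does with literals.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the original code.
public static class ReferenceCheck
{
    // Builds a string at run time, so each call yields a distinct heap object.
    public static string Build(char c, int count) => new string(c, count);

    // Returns true when the dictionary hands back the very same object
    // that was stored, even when looked up with an equal duplicate.
    public static bool DictionaryReturnsSameInstance()
    {
        var cache = new Dictionary<string, string>();
        string first = Build('A', 4);   // "AAAA"
        string second = Build('A', 4);  // equal value, different object

        cache[first] = first;           // stores a reference, no copy is made

        // Look up using the duplicate instance as the key.
        string cached = cache[second];

        return object.ReferenceEquals(cached, first)
            && !object.ReferenceEquals(cached, second);
    }
}
```

If `DictionaryReturnsSameInstance()` returns true, the dictionary is handing out the stored reference, and the duplicate instance becomes eligible for collection once nothing else refers to it.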
So, taking into account that this will be run on a massive set of string data...
Is this going to save any memory, assuming that 30% of the string values will be used twice or more?
Is the assumption that this will even work correct?
This is essentially what string interning is, except you don't have to worry about how it works. In your example you are still creating a string, then comparing it, then leaving the copy to be disposed of. .NET will do this for you in the runtime.
See String.Intern and Optimizing C# String Performance (C Calvert):
If a new string is created with code like (string goober1 = "foo"; string goober2 = "foo";) shown in lines 18 and 19, then the intern table is checked. If your string is already in there, then both variables will point at the same block of memory maintained by the intern table.
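To see where that intern-table check applies, here is a small sketch (the `InternDemo` class and method names are hypothetical). As far as I know, the automatic check covers string literals; strings built at run time, such as data read from a file or database, only end up in the intern table if you pass them to `String.Intern` explicitly.

```csharp
using System;

// Hypothetical demo class for illustration.
public static class InternDemo
{
    // Two identical literals in the same method refer to one string object.
    public static bool LiteralsShareInstance()
    {
        string goober1 = "foo";
        string goober2 = "foo";
        return object.ReferenceEquals(goober1, goober2);
    }

    // Two equal strings built at run time are separate objects.
    public static bool RuntimeStringsAreSeparate()
    {
        string a = new string(new[] { 'b', 'a', 'r' });
        string b = new string(new[] { 'b', 'a', 'r' });
        return a == b && !object.ReferenceEquals(a, b);
    }

    // String.Intern maps equal values to a single shared instance.
    public static bool InternReturnsSharedInstance()
    {
        string a = string.Intern(new string(new[] { 'b', 'a', 'z' }));
        string b = string.Intern(new string(new[] { 'b', 'a', 'z' }));
        return object.ReferenceEquals(a, b) && string.IsInterned(b) != null;
    }
}
```

So for data loaded at run time, the saving only appears if each incoming string is routed through `String.Intern` (or some other shared lookup) before it is stored.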
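So, you don't have to roll your own - it won't really provide any advantage. EDIT Unless: your strings don't usually live for as long as your AppDomain - interned strings live for the lifetime of the AppDomain, which is not necessarily great for GC. If you want short-lived strings, then you want a pool. From String.Intern: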
If you are trying to reduce the total amount of memory your application allocates, keep in mind that interning a string has two unwanted side effects. First, the memory allocated for interned String objects is not likely to be released until the common language runtime (CLR) terminates. The reason is that the CLR's reference to the interned String object can persist after your application, or even your application domain, terminates. ...
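For the short-lived case, a pool whose lifetime you control could look roughly like the sketch below. `ScopedStringPool` is a made-up name, not a framework type, and the ordinal comparer is my assumption about how the strings should be matched.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical pool for illustration; unlike String.Intern, its entries
// can be collected once the pool is cleared or goes out of scope.
public sealed class ScopedStringPool
{
    private readonly Dictionary<string, string> _pool =
        new Dictionary<string, string>(StringComparer.Ordinal);

    // Number of distinct strings currently held.
    public int Count => _pool.Count;

    // Returns the pooled instance for an equal value, adding it if absent.
    public string GetOrAdd(string value)
    {
        if (value == null) return null;

        if (_pool.TryGetValue(value, out string existing))
            return existing;            // single lookup on the hit path

        _pool.Add(value, value);
        return value;
    }

    // Drops the pool's references so unused strings can be collected.
    public void Clear() => _pool.Clear();
}
```

Using `TryGetValue` avoids the double lookup of `ContainsKey` followed by the indexer, and calling `Clear()` (or simply dropping the pool) after a data load releases the pool's hold on strings that nothing else references.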
EDIT 2: See Jon Skeet's answer here.