csv - Extract value from link in PowerShell
I have a function in PowerShell that gets the content of a file, breaks it into fields, and puts them in a CSV file. I'm wondering if there's a way to get a value from the link and add it as a column sent to the CSV file, while keeping the link column intact.
function convert2csv {
    (Get-Content $input_path) -match "href" | % {
        $data = ($_ -replace '(?:.*)href="(.*?)">date:\s*([\w\.]+)\s*([\w\:]+)\s*item:\s*(.*)</a>(?:.*)', '$1;$2;$3;$4').Split(";")
        New-Object PSObject -Property @{
            "link" = $data[0]
            "date" = $data[1]
            "time" = $data[2]
            "item" = $data[3]
        }
    } #| Export-Csv $output_file -NoTypeInformation
}
The value I'm looking for matches either
feeddefault_.*?(&) or _feed.*?(&)
Am I correct in thinking I can add some sort of if statement to the "link" = $data[0] part?
Sample output, as requested:
value in link | link                                                              | date       | time  | item
--------------|-------------------------------------------------------------------|------------|-------|-------------
bluepebbles   | http://www.domain.com/page.html?feeddefault_bluepebbles&something | 2013-05-19 | 13:30 | blue pebbles
redpebbles    | http://www.domain.com/page.html?feed_redpebbles&something         | 2013-05-19 | 13:31 | red pebbles
CSV formatted:
value in link,link,date,time,item
"bluepebbles","http://www.domain.com/page.html?feeddefault_bluepebbles&something","2013-05-19","13:30","blue pebbles"
"redpebbles","http://www.domain.com/page.html?feed_redpebbles&something","2013-05-19","13:31","red pebbles"
So, entering this:
$input_path = 'f:\mockup\area51\files\link.html'
$output_file = 'f:\mockup\area51\files\db_csv.csv'
$tstampculture = [Globalization.CultureInfo]::GetCultureInfo("en-GB")
$ie = New-Object -Com "InternetExplorer.Application"
$ie.Visible = $false
$ie.Navigate("file:///$input_path")
$ie.Document.getElementsByTagName("a") | % {
    $_.innerText -match 'date:\s*([\w\.]+)\s*([\w\:]+)\s*item:\s*(.*)'
    $obj = New-Object PSObject -Property @{
        "link" = $_.href
        "date" = $matches[1]
        "time" = $matches[2]
        "item" = $matches[3]
    }
    if ( $obj.link -match '\?feed(?:default)?_(.*?)&' ) {
        $obj | Add-Member -MemberType NoteProperty -Name "linkvalue" -Value $matches[1]
    }
    $obj
} #| Export-Csv $output_file -NoTypeInformation
returns this error:
You cannot call a method on a null-valued expression.
At line:12 char:38
+ $ie.document.getElementsByTagName <<<< ("a") | % {
    + CategoryInfo          : InvalidOperation: (getElementsByTagName:String) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull
So I'm pretty sure I messed something up. :)
First, I'd suggest using -match instead of -replace. The resulting $matches collection contains the submatches you're interested in, so there's no need to create an array manually.
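To make the role of $matches concrete, here is a minimal sketch run against a single hypothetical input line. The line itself is an assumption modeled on the link format in the question, and it uses a dotted date because the `[\w\.]+` group in the pattern does not match hyphens.

```powershell
# Hypothetical sample line; the real file contents are not shown in the question.
$line = '<a href="http://www.domain.com/page.html?feed_redpebbles&something">date: 19.05.2013 13:31 item: red pebbles</a>'

if ($line -match 'href="(.*?)">date:\s*([\w\.]+)\s*([\w\:]+)\s*item:\s*(.*)</a>') {
    # $matches[0] holds the whole match; [1]..[4] hold the capture groups.
    $link = $matches[1]
    $date = $matches[2]
    $time = $matches[3]
    $item = $matches[4]
    "$link | $date | $time | $item"
}
```

Note that $matches is only refreshed when -match succeeds, so testing the result in an if avoids reading stale values left over from an earlier line.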
Get-Content $input_path | ? { $_.Contains("href") } | % {
    # Testing the match in an if keeps its True/False result out of the
    # pipeline and skips lines that do not fit the pattern.
    if ($_ -match 'href="(.*?)">date:\s*([\w\.]+)\s*([\w\:]+)\s*item:\s*(.*)</a>') {
        $obj = New-Object PSObject -Property @{
            "link" = $matches[1]
            "date" = $matches[2]
            "time" = $matches[3]
            "item" = $matches[4]
        }
        $obj
    }
} #| Export-Csv $output_file -NoTypeInformation
The additional information can be extracted from $obj.link with a second -match and added to the custom object via Add-Member:
if ( $obj.link -match '\?feed(?:default)?_(.*?)&' ) {
    $obj | Add-Member -MemberType NoteProperty -Name "linkvalue" -Value $matches[1]
}
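As a quick check, the same pattern can be tried on the two sample URLs from the question. This is only a sketch: the $links and $values variable names are made up for illustration.

```powershell
# The two sample URLs from the question's expected output.
$links = @(
    'http://www.domain.com/page.html?feeddefault_bluepebbles&something',
    'http://www.domain.com/page.html?feed_redpebbles&something'
)

$values = foreach ($link in $links) {
    # "default" is optional, so one pattern covers both feeddefault_ and feed_.
    if ($link -match '\?feed(?:default)?_(.*?)&') {
        $matches[1]
    }
}
$values   # bluepebbles, redpebbles
```

The pattern requires a trailing &, so a link where the feed value is the last thing in the query string would not match as written.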
Also, since your input files are HTML files, you should consider using the InternetExplorer COM object, which gives you far better control over the extracted tags than processing the files line by line.
$ie = New-Object -Com "InternetExplorer.Application"
$ie.Visible = $false
$ie.Navigate("file:///$input_path")
# Wait until the page has loaded; otherwise $ie.Document may still be null.
while ( $ie.Busy ) { Start-Sleep -Milliseconds 100 }
$ie.Document.getElementsByTagName("a") | % {
    # Testing the match in an if keeps its True/False result out of the pipeline.
    if ($_.innerText -match 'date:\s*([\w\.]+)\s*([\w\:]+)\s*item:\s*(.*)') {
        $obj = New-Object PSObject -Property @{
            "link" = $_.href
            "date" = $matches[1]
            "time" = $matches[2]
            "item" = $matches[3]
        }
        if ( $obj.link -match '\?feed(?:default)?_(.*?)&' ) {
            $obj | Add-Member -MemberType NoteProperty -Name "linkvalue" -Value $matches[1]
        }
        $obj
    }
}
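One detail worth considering before exporting: a plain hashtable passed to New-Object -Property does not guarantee property order, and linkvalue is only added when the second match succeeds. A possible way to get the column order from the question's sample is to pipe through Select-Object first. The sketch below uses hypothetical stand-in objects ($rows) in place of the loop output, and ConvertTo-Csv instead of Export-Csv so the result can be inspected without writing a file.

```powershell
# Hypothetical objects standing in for what the loop emits.
$rows = @(
    New-Object PSObject -Property @{
        link = 'http://www.domain.com/page.html?feeddefault_bluepebbles&something'
        date = '2013-05-19'; time = '13:30'; item = 'blue pebbles'; linkvalue = 'bluepebbles'
    }
    New-Object PSObject -Property @{
        link = 'http://www.domain.com/page.html?feed_redpebbles&something'
        date = '2013-05-19'; time = '13:31'; item = 'red pebbles'; linkvalue = 'redpebbles'
    }
)

# Select-Object fixes the column order and gives every row a linkvalue
# column, even if the property was never added to a particular object.
$csv = $rows |
    Select-Object linkvalue, link, date, time, item |
    ConvertTo-Csv -NoTypeInformation
$csv
```

Swapping ConvertTo-Csv for Export-Csv $output_file -NoTypeInformation would write the same rows to the file.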