csv - Extract value from link in PowerShell


I have a function in PowerShell that gets the content of a file, breaks it into fields and puts them in a CSV file. I'm wondering if there's a way to get a value from the link and add it as a column sent to the CSV file while keeping the link column intact.

function convert2csv {
  (Get-Content $input_path) -match "href" | % {
    $data = ($_ -replace '(?:.*)href="(.*?)">date:\s*([\w\.]+)\s*([\w\:]+)\s*item:\s*(.*)</a>(?:.*)' , '$1;$2;$3;$4').Split(";")
    New-Object PSObject -Property @{
      "link" = $data[0]
      "date" = $data[1]
      "time" = $data[2]
      "item" = $data[3]
    }
  } #| Export-Csv $output_file -NoTypeInformation
}

The value I'm looking for is matched by either

feeddefault_.*?(&) or feed_.*?(&)

Am I correct in thinking I can add some sort of if statement to the "link" = $data[0] part?

Sample output as requested.

value in link | link                                                              | date       | time  | item         |
--------------|-------------------------------------------------------------------|------------|-------|--------------|
bluepebbles   | http://www.domain.com/page.html?feeddefault_bluepebbles&something | 2013-05-19 | 13:30 | blue pebbles |
redpebbles    | http://www.domain.com/page.html?feed_redpebbles&something         | 2013-05-19 | 13:31 | red pebbles  |

CSV formatted

value in link,link,date,time,item
"bluepebbles","http://www.domain.com/page.html?feeddefault_bluepebbles&something","2013-05-19","13:30","blue pebbles"
"redpebbles","http://www.domain.com/page.html?feed_redpebbles&something","2013-05-19","13:31","red pebbles"

So entering

$input_path = 'f:\mockup\area51\files\link.html'
$output_file = 'f:\mockup\area51\files\db_csv.csv'

$tstampculture = [Globalization.CultureInfo]::GetCultureInfo("en-gb")

$ie = New-Object -COM "InternetExplorer.Application"
$ie.Visible = $false

$ie.Navigate("file:///$input_path")

$ie.Document.getElementsByTagName("a") | % {
  $_.innerText -match 'date:\s*([\w\.]+)\s*([\w\:]+)\s*item:\s*(.*)'
  $obj = New-Object PSObject -Property @{
    "link" = $_.href
    "date" = $matches[1]
    "time" = $matches[2]
    "item" = $matches[3]
  }
  if ( $obj.link -match '\?feed(?:default)?_(.*?)&' ) {
    $obj | Add-Member -Type "NoteProperty" -Name "linkvalue" -Value $matches[1]
  }
  $obj
} #| Export-Csv $output_file -NoTypeInformation

returns this error:

You cannot call a method on a null-valued expression.
At line:12 char:38
+     $ie.Document.getElementsByTagName <<<< ("a") | % {
    + CategoryInfo          : InvalidOperation: (getElementsByTagName:String) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

So I'm pretty sure I messed something up. :)

First, I'd suggest you use -match instead of -replace. The resulting $matches collection contains the submatches you're interested in, so there's no need to create an array manually.

Get-Content $input_path | ? { $_.Contains("href") } | % {
  # Testing the result of -match skips non-matching lines and keeps
  # the Boolean it returns out of the pipeline (and out of the CSV).
  if ( $_ -match 'href="(.*?)">date:\s*([\w\.]+)\s*([\w\:]+)\s*item:\s*(.*)</a>' ) {
    $obj = New-Object PSObject -Property @{
      "link" = $matches[1]
      "date" = $matches[2]
      "time" = $matches[3]
      "item" = $matches[4]
    }
    $obj
  }
} #| Export-Csv $output_file -NoTypeInformation

The additional information can be extracted from $obj.link with a second -match and added to the custom object via Add-Member:

if ( $obj.link -match '\?feed(?:default)?_(.*?)&' ) {
  $obj | Add-Member -MemberType NoteProperty -Name "linkvalue" -Value $matches[1]
}
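To see how the two matches fit together, here is a minimal sketch that runs against two in-memory lines instead of a real file. The $lines array and the $rows variable are made up for illustration, and the dotted date format is an assumption: the date pattern [\w\.]+ accepts word characters and dots, so it would not match a hyphenated date such as the ones in the sample output. The sketch also adds linkvalue to every object (empty when the link has no feed value) and fixes the column order with Select-Object, on the assumption that you want every CSV row to have the same columns in a predictable order.

```powershell
# Two sample lines modeled on the question's data (hypothetical input;
# the dotted date format is an assumption that fits the [\w\.]+ pattern).
$lines = @(
    '<a href="http://www.domain.com/page.html?feeddefault_bluepebbles&something">date: 19.05.2013 13:30 item: blue pebbles</a>',
    '<a href="http://www.domain.com/page.html?feed_redpebbles&something">date: 19.05.2013 13:31 item: red pebbles</a>'
)

$rows = $lines | Where-Object { $_.Contains("href") } | ForEach-Object {
    # First match: split the anchor into link, date, time and item.
    if ($_ -match 'href="(.*?)">date:\s*([\w\.]+)\s*([\w\:]+)\s*item:\s*(.*)</a>') {
        $obj = New-Object PSObject -Property @{
            "link" = $matches[1]
            "date" = $matches[2]
            "time" = $matches[3]
            "item" = $matches[4]
        }

        # Second match: pull the feed value out of the link.
        # The property is always added so that every row has the same columns.
        $value = ''
        if ($obj.link -match '\?feed(?:default)?_(.*?)&') {
            $value = $matches[1]
        }
        $obj | Add-Member -MemberType NoteProperty -Name "linkvalue" -Value $value

        $obj
    }
} | Select-Object linkvalue, link, date, time, item   # fixed column order

# Show the CSV text; swap in Export-Csv $output_file -NoTypeInformation to write a file.
$rows | ConvertTo-Csv -NoTypeInformation
```

Select-Object is used here because a plain hashtable does not guarantee property order, and the CSV header is taken from the first object in the pipeline.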

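Also, since your input files are HTML files, you should consider using the InternetExplorer COM object, which gives you far better control over the extracted tags than processing the files line by line. Wait until the page has finished loading before you access the document; the error you got is what happens when the document is still null at that point.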

$ie = New-Object -COM "InternetExplorer.Application"
$ie.Visible = $false

$ie.Navigate("file:///$input_path")
# Wait for the page to load before touching the document.
while ( $ie.Busy ) { Start-Sleep -Milliseconds 100 }

$ie.Document.getElementsByTagName("a") | % {
  # Testing the result of -match skips links whose text does not fit
  # and keeps the Boolean it returns out of the output.
  if ( $_.innerText -match 'date:\s*([\w\.]+)\s*([\w\:]+)\s*item:\s*(.*)' ) {
    $obj = New-Object PSObject -Property @{
      "link" = $_.href
      "date" = $matches[1]
      "time" = $matches[2]
      "item" = $matches[3]
    }
    if ( $obj.link -match '\?feed(?:default)?_(.*?)&' ) {
      $obj | Add-Member -MemberType NoteProperty -Name "linkvalue" -Value $matches[1]
    }
    $obj
  }
}
