Work flow automation with bash script?

mimosa
Salix Warrior
Posts: 3311
Joined: 25. May 2010, 17:02

Re: Work flow automation with bash script?

Post by mimosa »

Indeed, gapan is right: the script almost certainly needs debugging on a realistic sample.

No need for the Lynx stage if you use the Python script. It extracts well-formed email addresses irrespective of how they are formatted. However, it does make certain assumptions which may cause it to fail with some formats, which is why I suggested trying it out with all the formats Google Docs downloads to. If it does fail, tweaking it will just be a matter of telling it to replace some more characters with whitespace. Or it might become quite robust and general if I rewrote it to check the pieces it throws away for further addresses. At the moment it will fail on something like this:

foo.bar@salix.com#$%&bar.foo@ubuntu.com

But to sum up: easily fixed, especially if you can post a sample :)
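
For illustration only, here is a minimal sketch of a different approach from the one described above: scanning the whole text with a regular expression instead of splitting it into pieces first. It is not the script under discussion. The name `find_addresses` and the pattern are assumptions, and the pattern is deliberately narrow (letters, digits, dots, hyphens and underscores), far short of everything an email address may legally contain.

```python
import re

# Assumption: addresses use only letters, digits, dots, hyphens and underscores.
# The domain must contain at least one dot.
ADDRESS = re.compile(r"[A-Za-z0-9._-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def find_addresses(text):
    """Return every address-like substring of text, in order of appearance."""
    return ADDRESS.findall(text)

# The run-together example from the post: two addresses glued by junk characters.
print(find_addresses("foo.bar@salix.com#$%&bar.foo@ubuntu.com"))
```

Because the pattern is matched anywhere in the text, the junk between the two addresses acts as a separator without having to be listed explicitly.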
globetrotterdk
Posts: 435
Joined: 26. Oct 2010, 13:57
Location: Denmark

Re: Work flow automation with bash script?

Post by globetrotterdk »

gapan wrote:It would help a lot if you posted part of that dump. You can edit the contact details before posting, so that the real ones don't get published here.
I thought about that as well. I tried opening it in Kate, but Kate chokes on it. Geany just gives me the info I posted. The same goes for Leafpad and nano. When I open it in Vim, I only get a bunch of HTML-style code. The only way I have found to view the contents is by opening the file in Firefox. Very weird. In Lynx, it looks like this:

Code: Select all

A2 ny 7 Apple Pear C.Th. Zzzz St. 4, 3.th 2300 Kbh. S         
35362026 abc@humanrights.dk 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 193      
                                                                                 .                                                                          
A2 0 7 Apple Orange 26 Diddely 4690 Haslev 5639 9050        
5578 8888 abc@nhs.dk
They both appear to be recognized by Lynx as e-mail addresses. When I search in Lynx for an e-mail address that I know isn't formatted properly, I get "unknown or ambiguous command" as a response.
Military justice is to justice what military music is to music. - Groucho Marx
globetrotterdk
Posts: 435
Joined: 26. Oct 2010, 13:57
Location: Denmark

Re: Work flow automation with bash script?

Post by globetrotterdk »

gapan wrote:It would help a lot if you posted part of that dump. You can edit the contact details before posting, so that the real ones don't get published here.
There isn't anything in the .xls file, just normal cells in a spreadsheet, in this case with some e-mail addresses formatted as "mailto:" hyperlinks and others not formatted. The picture is more mixed with the .csv file. Here is what an address looks like that isn't formatted as a "mailto:" hyperlink:

Code: Select all

,abc@politik.dk,,
I believe that the two following addresses are both formatted as "mailto:" hyperlinks:

Code: Select all

,,,,,,,,,abc@mail.dk

,,,,,,abc.def@jura.dk,professor,,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,242
I am not sure why they look different. Note: the abc@mail.dk address doesn't have any trailing commas.
gapan
Salix Wizard
Posts: 6368
Joined: 6. Jun 2009, 17:40

Re: Work flow automation with bash script?

Post by gapan »

Well, here's a quick and dirty sed sequence that cleans up addresses on that sample csv.

Code: Select all

cat file.csv |grep "@" | \
sed "s/.*[,^]\(.*\)@\(.*\)/\1@\2/" |sed "s/,/__FOO__/"| \
sed "s/\(.*\)__FOO__.*/\1/"
I'm assuming that the "__FOO__" string is nowhere in your file.

edit: I've added a grep in there, so it only keeps lines with addresses.
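
For comparison, a rough Python counterpart to the pipeline above, as a sketch rather than a drop-in replacement. It assumes the file really is comma-separated and that an address occupies a whole field. The function name `addresses_from_csv` and the sample rows (shortened from the ones posted earlier) are illustrative only.

```python
import csv
import io

def addresses_from_csv(lines):
    """Yield every CSV field that contains an "@", stripped of surrounding spaces."""
    for row in csv.reader(lines):
        for field in row:
            if "@" in field:
                yield field.strip()

# Stand-in for an open file; rows shortened from the sample in this thread.
sample = io.StringIO(
    ",abc@politik.dk,,\n"
    ",,,,,,,,,abc@mail.dk\n"
    "\n"
    ",,,,,,abc.def@jura.dk,professor,,0,0,1,242\n"
)
print(list(addresses_from_csv(sample)))
```

Unlike the sed version, which keeps one address per line, this keeps every field containing an "@", so a row with two addresses yields both.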
mimosa
Salix Warrior
Posts: 3311
Joined: 25. May 2010, 17:02

Re: Work flow automation with bash script?

Post by mimosa »

sed is more succinct than Python :lol:

... but it is harder to read ;)
globetrotterdk
Posts: 435
Joined: 26. Oct 2010, 13:57
Location: Denmark

Re: Work flow automation with bash script?

Post by globetrotterdk »

gapan wrote:I've added a grep in there, so it only keeps lines with addresses.
Cheers, that seems to work. I then went into Vim and ran the following to format the e-mails so that I can just copy and paste them into a BCC: line.

Code: Select all

%s/\n/, /g
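
The same joining step can be sketched outside the editor. This assumes one address per line in the input. `join_for_bcc` is a made-up name, and blank lines are skipped so that no stray separators end up in the result.

```python
def join_for_bcc(lines):
    """Join non-empty lines with ", ", leaving no separator after the last one."""
    return ", ".join(line.strip() for line in lines if line.strip())

# Lines as they would come from reading the cleaned-up file.
print(join_for_bcc(["abc@politik.dk\n", "abc@mail.dk\n", "\n", "abc.def@jura.dk\n"]))
```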
globetrotterdk
Posts: 435
Joined: 26. Oct 2010, 13:57
Location: Denmark

Re: Work flow automation with bash script?

Post by globetrotterdk »

mimosa wrote:sed is more succinct than Python :lol:

... but it is harder to read ;)
I have to agree with both of you - and as I have yet to learn either one... :o I am pretty much at your mercy :)
mimosa
Salix Warrior
Posts: 3311
Joined: 25. May 2010, 17:02

Re: Work flow automation with bash script?

Post by mimosa »

I have added a line to turn commas into whitespace:

http://pastebin.com/tDG6XgAA

All you should now need to do is download the data from Google to somefile.csv, execute

Code: Select all

$ bcc.py somefile.csv
and the file bcc.txt should contain a list of addresses separated by commas, ready to paste into your email bcc field.

The script assumes:

1) email addresses contain only alphanumeric characters and full stops with a "@" somewhere in the middle (technically, you can put all sorts of strange stuff in an email address, but I've never seen one that did)
2) they are separated in the input file by spaces, newlines, or commas

It might be an idea to add tabs to the items in 2). But let me know if it works like this!
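
The two assumptions can be sketched in a few lines. This is an illustration written from the description above, not the contents of the pastebin script; `looks_like_address` and `bcc_line` are hypothetical names. Since `str.split()` with no argument splits on any whitespace, tabs are covered as well.

```python
import string

# Assumption 1: only ASCII letters, digits and full stops around a single "@".
ALLOWED = set(string.ascii_letters + string.digits + ".")

def looks_like_address(token):
    """True if token is user@domain built only from alphanumerics and full stops."""
    user, sep, domain = token.partition("@")
    # A second "@" would end up in domain and fail the ALLOWED check.
    return bool(sep and user and domain) and set(user + domain) <= ALLOWED

def bcc_line(text):
    """Turn commas into spaces, split on whitespace, keep address-like tokens."""
    # Assumption 2: tokens are separated by spaces, newlines, tabs or commas.
    tokens = text.replace(",", " ").split()
    return ", ".join(t for t in tokens if looks_like_address(t))

raw = ",,,,,,,,,abc@mail.dk\n\n,,,,,,abc.def@jura.dk,professor,,0,0,1,242\n"
print(bcc_line(raw))
```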
mimosa
Salix Warrior
Posts: 3311
Joined: 25. May 2010, 17:02

Re: Work flow automation with bash script?

Post by mimosa »

Code: Select all

vanilla[bin]$ cat raw.txt
,,,,,,,,,abc@mail.dk

,,,,,,abc.def@jura.dk,professor,,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,242MAILTO; 

vanilla[bin]$ bcc.py raw.txt
vanilla[bin]$ cat bcc.txt
abc@mail.dk, abc.def@jura.dk
globetrotterdk
Posts: 435
Joined: 26. Oct 2010, 13:57
Location: Denmark

Re: Work flow automation with bash script?

Post by globetrotterdk »

Code: Select all

$ python bcc.py some_file.csv > 1some_file.csv
Traceback (most recent call last):
  File "bcc.py", line 88, in <module>
    main()
  File "bcc.py", line 23, in main
    address = wellFormed(address)      #strip it of forbidden elements
  File "bcc.py", line 44, in wellFormed
    user, domain = possAddress.split("@")  #divide into user and domain
ValueError: too many values to unpack
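
A note on the traceback: `split("@")` returns more than two pieces whenever a token contains more than one "@", and unpacking those into `user, domain` raises exactly this kind of ValueError. Whether that is what happened here cannot be confirmed without seeing the data, but a guarded split along these lines would avoid the crash. `split_address` is a hypothetical helper, not part of bcc.py.

```python
def split_address(poss_address):
    """Return (user, domain), or None unless the token has exactly one "@"."""
    if poss_address.count("@") != 1:
        return None  # zero or several "@": not a single address
    user, domain = poss_address.split("@")  # safe: exactly two pieces
    return user, domain

print(split_address("abc@mail.dk"))
print(split_address("foo.bar@salix.com#$%&bar.foo@ubuntu.com"))
```

Returning None leaves it to the caller to decide whether to skip the token or to look inside it for further addresses.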