Work flow automation with bash script?

mimosa
Salix Warrior
Posts: 3311
Joined: 25. May 2010, 17:02

Re: Work flow automation with bash script?

Post by mimosa »

I don't think it's causing the problem, but you shouldn't call the script like that. The output file is always bcc.txt. I'm surprised it didn't complain about too many arguments.

What's happening is a situation like the one I mentioned where you have foo@bar.com[some forbidden characters but no spaces, newlines or commas]bar@foo.com. So you have two "@"s and the script can't cope. It's an easy matter to strip out more of the forbidden characters - but if you can provide a bigger sample, it might offer a pointer as to which are needed.

I'd also like to rewrite it more robustly so it can cope with pretty much any input, but that may take a couple of hours, whereas the stripping out is thirty seconds.
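To make the failure concrete, here is a minimal sketch (Python 3, and not the actual bcc.py, whose source isn't shown in this thread). Unpacking `split("@")` into two names breaks as soon as a string holds more than one "@". One possible, more tolerant approach is to pull out every address-like run with a regular expression. The pattern and the name `find_addresses` are assumptions made up for illustration, and the pattern is deliberately loose rather than a full address validator.

```python
import re

# Hypothetical pattern: a run of permitted characters, "@", then a dotted domain.
# Deliberately loose; it does not implement full address validation.
ADDRESS_RE = re.compile(r"[A-Za-z0-9._-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def find_addresses(text):
    """Return every address-like substring in text, in order of appearance."""
    return ADDRESS_RE.findall(text)

# Two addresses glued together by a forbidden character (no space or comma).
glued = "foo@bar.com;bar@foo.com"

try:
    # split() yields three pieces here, so unpacking into two names fails.
    user, domain = glued.split("@")
except ValueError as err:
    print("split failed:", err)

print(find_addresses(glued))
```

Because the regular expression scans for matches instead of assuming exactly one "@" per chunk, glued-together addresses come out as separate items.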

Re: Work flow automation with bash script?

Post by mimosa »

Try this:
http://pastebin.com/JzqXekcX

As well as adding a few more forbidden characters to strip out, I added underscore as a permitted character (oops!) and '-' for good measure, though I've never seen this.
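For anyone reading without the pastebin to hand, the stripping idea can be sketched roughly like this (Python 3). This is not the pastebin code: the exact permitted set and the name `strip_forbidden` are assumptions for illustration only.

```python
import string

# Assumed set of permitted characters: ASCII letters, digits, ".", "@", "_" and "-".
PERMITTED = set(string.ascii_letters + string.digits + ".@_-")

def strip_forbidden(candidate):
    """Drop every character of candidate that is not in PERMITTED."""
    return "".join(ch for ch in candidate if ch in PERMITTED)

print(strip_forbidden('"<first_last@mail-host.dk>"'))
```

Note that a set like this also drops non-ASCII letters, so it is only suitable where addresses are expected to be plain ASCII.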
globetrotterdk
Posts: 435
Joined: 26. Oct 2010, 13:57
Location: Denmark

Re: Work flow automation with bash script?

Post by globetrotterdk »

$ python bcc.py 1rpf_medlemsliste.csv
Traceback (most recent call last):
  File "bcc.py", line 88, in <module>
    main()
  File "bcc.py", line 23, in main
    address = wellFormed(address)      #strip it of forbidden elements
  File "bcc.py", line 44, in wellFormed
    user, domain = possAddress.split("@")  #divide into user and domain
ValueError: too many values to unpack

Here are some more examples:

A2,,ny 7,Apple Fruit Brown,,"C.Th. Fruity St., 3.th",,2300 Kbh. S,35362026,,,,,,,,abc@humanrights.dk,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,193
A2,0,7,Apple Grapes,,Brown Fruit St. 26,Derailed,4690 Hassel,5639 9050,,,,,5578 8888,,,abc@mn.dk
A2,0,8-150,Apple Pear Strawberry,,,Hassel,8210 Worse Off V,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,194
A2,,9-140,Apple Dragonfruit,,"Alpine Way 3, 1",,2300 Kbh. S,32847209,26346074,,,,,,,abc@drip.com,Jurastud KU,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,191
A2,0,8-400,Apple Espanf,,"Black Hole 23, 1.th",,2300 København P
A2,0,9-400,Apple Strudle Rocky St. 14 B,Solvangstrup,8668 2423,,,,,8942 1359,,,abc@jura.dk,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,195

Edit: I wonder if you could be hitting a snag with the Danish æøåÆØÅ?
Military justice is to justice what military music is to music. - Groucho Marx

Re: Work flow automation with bash script?

Post by mimosa »

globetrotterdk wrote: I wonder if you could be hitting a snag with the Danish æøåÆØÅ?

Of *course*! Well, that's the advantage of live testing! :) Nonetheless I wouldn't expect it to fail like that - just mess up those addresses.

That raises the even knottier problem of addresses in different scripts, such as Greek, Cyrillic, even Chinese ... or does everyone just use the Roman alphabet?

You've got a working solution from Gapan so I won't bother you any more with this little project for now; but I'll let you know if I come up with something more robust and general. It's stimulating to have a real-world problem to get stuck into!

EDIT

It works with that sample too:

vanilla[bin]$ bcc.py raw.txt
vanilla[bin]$ cat bcc.txt
abc@mail.dk, abc.def@jura.dk, abc@humanrights.dk, abc@mn.dk, abc@drip.com, abc@jura.dk

Re: Work flow automation with bash script?

Post by globetrotterdk »

gapan wrote: Well, here's a quick and dirty sed sequence that cleans up addresses on that sample csv.

cat file.csv |grep "@" | \
sed "s/.*[,^]\(.*\)@\(.*\)/\1@\2/" |sed "s/,/__FOO__/"| \
sed "s/\(.*\)__FOO__.*/\1/"

I'm assuming that the "__FOO__" string is nowhere in your file.

Many thanks, gapan.
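For comparison, roughly the same extraction can be sketched in Python 3 with the standard `csv` module, which handles the quoted fields containing commas (such as "Black Hole 23, 1.th") without needing a placeholder string. This is an illustrative sketch, not gapan's method: the name `addresses_from_csv` is made up, and it keeps every whole field that contains "@", which is not exactly the same as what the sed sequence keeps.

```python
import csv
import io

def addresses_from_csv(lines):
    """Yield each field containing "@" from an iterable of CSV lines."""
    for row in csv.reader(lines):
        for field in row:
            if "@" in field:
                yield field.strip()

# Two rows taken from the sample posted earlier in the thread.
sample = io.StringIO(
    'A2,0,7,Apple Grapes,,Brown Fruit St. 26,Derailed,4690 Hassel,'
    '5639 9050,,,,,5578 8888,,,abc@mn.dk\n'
    'A2,0,8-400,Apple Espanf,,"Black Hole 23, 1.th",,2300 København P\n'
)

print(", ".join(addresses_from_csv(sample)))
```

Rows without any "@" simply contribute nothing, which matches the `grep "@"` step of the pipeline.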