Re: Work flow automation with bash script?

Posted: 4. Mar 2012, 12:39
by mimosa
I don't think it's causing the problem, but you shouldn't call the script like that. The output file is always bcc.txt. I'm surprised it didn't complain about too many arguments.

What's happening is a situation like the one I mentioned, where you have foo@bar.com[some forbidden characters but no spaces, newlines or commas]bar@foo.com. That single chunk contains two "@"s, and the script, which splits each candidate on "@" and expects exactly one, can't cope. It's an easy matter to strip out more of the forbidden characters, but if you can provide a bigger sample, it might offer a pointer as to which ones are needed.

I'd also like to rewrite it more robustly so it can cope with pretty much any input, but that may take a couple of hours, whereas the stripping out is thirty seconds.
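
To illustrate the two-"@" failure and one possible way around it, here is a minimal Python 3 sketch. It is not the code from bcc.py: the names `ADDRESS` and `find_addresses` are hypothetical, and the pattern is a deliberately simple assumption (letters, digits, '.', '_' and '-' around a single "@"), not a full address validator. Instead of stripping forbidden characters and then splitting on "@", it pulls out every address-like substring, so two addresses glued together by junk are both found.

```python
import re

# Assumption: a simple, permissive pattern, not an RFC-complete validator.
ADDRESS = re.compile(r"[A-Za-z0-9._-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

def find_addresses(chunk):
    """Return every address-like substring in chunk, in order of appearance."""
    return ADDRESS.findall(chunk)

# Two addresses separated only by characters that are not allowed in either.
chunk = 'foo@bar.com;"<>bar@foo.com'

# Unpacking split("@") into two names only works with exactly one "@".
try:
    user, domain = chunk.split("@")
except ValueError as err:
    print("split failed:", err)

print(find_addresses(chunk))  # ['foo@bar.com', 'bar@foo.com']
```

The trade-off is that anything the pattern does not allow acts as a separator, so addresses using characters outside that set would be cut short.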

Re: Work flow automation with bash script?

Posted: 4. Mar 2012, 12:56
by mimosa
Try this:
http://pastebin.com/JzqXekcX

As well as adding a few more forbidden characters to strip out, I added underscore as a permitted character (oops!) and '-' for good measure, though I've never seen one in an address.

Re: Work flow automation with bash script?

Posted: 4. Mar 2012, 15:25
by globetrotterdk

Code:

$ python bcc.py 1rpf_medlemsliste.csv
Traceback (most recent call last):
  File "bcc.py", line 88, in <module>
    main()
  File "bcc.py", line 23, in main
    address = wellFormed(address)      #strip it of forbidden elements
  File "bcc.py", line 44, in wellFormed
    user, domain = possAddress.split("@")  #divide into user and domain
ValueError: too many values to unpack

Here are some more examples:

Code:

A2,,ny 7,Apple Fruit Brown,,"C.Th. Fruity St., 3.th",,2300 Kbh. S,35362026,,,,,,,,abc@humanrights.dk,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,193
A2,0,7,Apple Grapes,,Brown Fruit St. 26,Derailed,4690 Hassel,5639 9050,,,,,5578 8888,,,abc@mn.dk
A2,0,8-150,Apple Pear Strawberry,,,Hassel,8210 Worse Off V,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,194
A2,,9-140,Apple Dragonfruit,,"Alpine Way 3, 1",,2300 Kbh. S,32847209,26346074,,,,,,,abc@drip.com,Jurastud KU,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,191
A2,0,8-400,Apple Espanf,,"Black Hole 23, 1.th",,2300 København P
A2,0,9-400,Apple Strudle Rocky St. 14 B,Solvangstrup,8668 2423,,,,,8942 1359,,,abc@jura.dk,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,195

Edit: I wonder if you could be hitting a snag with the Danish æøåÆØÅ?
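
Whether or not the Danish letters are the culprit, one way to take both quoting and non-ASCII text out of the picture is to let a CSV parser split the fields and keep only those containing "@". The following is a minimal Python 3 sketch under that assumption; `addresses_from_csv` is a hypothetical helper, not part of bcc.py, and the sample rows are shortened copies of the ones above.

```python
import csv
import io

def addresses_from_csv(stream):
    """Yield every field that contains '@' from an open CSV text stream."""
    for row in csv.reader(stream):
        for field in row:
            if "@" in field:
                yield field.strip()

# Shortened sample rows; the quoted street fields contain commas.
sample = io.StringIO(
    'A2,,ny 7,Apple Fruit Brown,,"C.Th. Fruity St., 3.th",,2300 Kbh. S,35362026,,abc@humanrights.dk,,,0\n'
    'A2,0,8-400,Apple Espanf,,"Black Hole 23, 1.th",,2300 København P\n'
    'A2,0,7,Apple Grapes,,Brown Fruit St. 26,Derailed,4690 Hassel,,abc@mn.dk\n'
)

print(", ".join(addresses_from_csv(sample)))  # abc@humanrights.dk, abc@mn.dk
```

For a real file, something like `open(path, newline="", encoding="utf-8")` could be passed in instead of the `StringIO` object. The encoding is an assumption; an export from an older spreadsheet may well be Latin-1 instead.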

Re: Work flow automation with bash script?

Posted: 4. Mar 2012, 15:39
by mimosa
globetrotterdk wrote:I wonder if you could be hitting a snag with the Danish æøåÆØÅ?
Of *course*! Well, that's the advantage of live testing! :) Nonetheless I wouldn't expect it to fail like that - just mess up those addresses.

That raises the even knottier problem of addresses in different scripts, such as Greek, Cyrillic, even Chinese ... or does everyone just use the Roman alphabet?

You've got a working solution from Gapan so I won't bother you any more with this little project for now; but I'll let you know if I come up with something more robust and general. It's stimulating to have a real-world problem to get stuck into!

EDIT

It works with that sample too:

Code:

vanilla[bin]$ bcc.py raw.txt
vanilla[bin]$ cat bcc.txt
abc@mail.dk, abc.def@jura.dk, abc@humanrights.dk, abc@mn.dk, abc@drip.com, abc@jura.dk

Re: Work flow automation with bash script?

Posted: 4. Mar 2012, 17:41
by globetrotterdk
gapan wrote:Well, here's a quick and dirty sed sequence that cleans up addresses on that sample csv.

Code:

cat file.csv |grep "@" | \
sed "s/.*[,^]\(.*\)@\(.*\)/\1@\2/" |sed "s/,/__FOO__/"| \
sed "s/\(.*\)__FOO__.*/\1/"

I'm assuming that the "__FOO__" string is nowhere in your file.
Many thanks, gapan.
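
For comparison, the steps of the quoted sed sequence can be sketched in Python 3. This is only an illustrative translation, with `sed_like_clean` as a hypothetical name: it keeps lines containing "@", removes everything up to the last comma that still has an "@" after it, then cuts the line at the next comma. Like the original, it assumes one address per line and no commas inside the address itself; it does not need the "__FOO__" placeholder because the cut is done with a plain split.

```python
import re

def sed_like_clean(lines):
    """Mimic the grep + three sed steps on an iterable of CSV lines."""
    cleaned = []
    for line in lines:
        line = line.rstrip("\n")
        if "@" not in line:  # same filter as: grep "@"
            continue
        # First sed: drop everything up to the last comma (or literal '^')
        # that is still followed by an '@' somewhere later in the line.
        line = re.sub(r".*[,^](.*)@(.*)", r"\1@\2", line, count=1)
        # Second and third sed: keep only the text before the first comma.
        cleaned.append(line.split(",", 1)[0])
    return cleaned

# Shortened copies of rows from the sample posted earlier in the thread.
rows = [
    'A2,0,7,Apple Grapes,,Brown Fruit St. 26,Derailed,4690 Hassel,,abc@mn.dk',
    'A2,0,8-150,Apple Pear Strawberry,,,Hassel,8210 Worse Off V,,1,194',
    'A2,,9-140,Apple Dragonfruit,,"Alpine Way 3, 1",,2300 Kbh. S,,abc@drip.com,Jurastud KU,,0,1,191',
]

print(sed_like_clean(rows))  # ['abc@mn.dk', 'abc@drip.com']
```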