I don't think it's causing the problem, but you shouldn't call the script like that. The output file is always bcc.txt. I'm surprised it didn't complain about too many arguments.
What's happening is a situation like the one I mentioned where you have foo@bar.com[some forbidden characters but no spaces, newlines or commas]bar@foo.com. So you have two "@"s and the script can't cope. It's an easy matter to strip out more of the forbidden characters - but if you can provide a bigger sample, it might offer a pointer as to which are needed.
I'd also like to rewrite it more robustly so it can cope with pretty much any input, but that may take a couple of hours, whereas the stripping out is thirty seconds.
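For what it's worth, here is a minimal sketch of what a more robust version might look like: instead of splitting on "@", it pulls every address-like substring out of the text with a regular expression, so two addresses glued together by forbidden characters come out separately. This is not the code in bcc.py; the name find_addresses and the set of permitted characters are assumptions made for illustration.

```python
import re

# Assumed permitted characters (not taken from bcc.py):
#   user part:   letters, digits, '.', '_', '+', '-'
#   domain part: letters, digits, '-', with at least one '.'
ADDRESS_RE = re.compile(r"[A-Za-z0-9._+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_addresses(text):
    """Return every address-like substring in text, in order of appearance."""
    return ADDRESS_RE.findall(text)
```

With this, find_addresses("foo@bar.com;<bar@foo.com>") gives back the two addresses separately. It will still be fooled if the separator is itself a permitted character such as '.', so treat it as a starting point only.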
Work flow automation with bash script?
Re: Work flow automation with bash script?
Try this:
http://pastebin.com/JzqXekcX
As well as adding a few more forbidden characters to strip out, I added underscore as a permitted character (oops!) and '-' for good measure, though I've never seen this.
- globetrotterdk
- Posts: 435
- Joined: 26. Oct 2010, 13:57
- Location: Denmark
Re: Work flow automation with bash script?
Code:
$ python bcc.py 1rpf_medlemsliste.csv
Traceback (most recent call last):
  File "bcc.py", line 88, in <module>
    main()
  File "bcc.py", line 23, in main
    address = wellFormed(address) #strip it of forbidden elements
  File "bcc.py", line 44, in wellFormed
    user, domain = possAddress.split("@") #divide into user and domain
ValueError: too many values to unpack
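The last frame of the traceback unpacks the result of split("@") into exactly two names, which fails as soon as the string contains more than one "@". A small sketch of a guard that avoids the exception is below. The function name split_address is hypothetical and this is not the actual code from bcc.py; it only illustrates checking the number of "@" characters before unpacking.

```python
def split_address(poss_address):
    """Split 'user@domain' into (user, domain).

    Returns None unless the string contains exactly one '@',
    so the two-name unpacking below can never raise ValueError.
    """
    if poss_address.count("@") != 1:
        return None
    user, domain = poss_address.split("@")
    return user, domain
```

A caller could then skip, or log, any candidate for which split_address returns None instead of crashing halfway through the file.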
Code:
A2,,ny 7,Apple Fruit Brown,,"C.Th. Fruity St., 3.th",,2300 Kbh. S,35362026,,,,,,,,abc@humanrights.dk,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,193
A2,0,7,Apple Grapes,,Brown Fruit St. 26,Derailed,4690 Hassel,5639 9050,,,,,5578 8888,,,abc@mn.dk
A2,0,8-150,Apple Pear Strawberry,,,Hassel,8210 Worse Off V,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1,194
A2,,9-140,Apple Dragonfruit,,"Alpine Way 3, 1",,2300 Kbh. S,32847209,26346074,,,,,,,abc@drip.com,Jurastud KU,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,191
A2,0,8-400,Apple Espanf,,"Black Hole 23, 1.th",,2300 København P
A2,0,9-400,Apple Strudle Rocky St. 14 B,Solvangstrup,8668 2423,,,,,8942 1359,,,abc@jura.dk,,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,195
Military justice is to justice what military music is to music. - Groucho Marx
Re: Work flow automation with bash script?
Of *course*! Well, that's the advantage of live testing! I wonder if you could be hitting a snag with the Danish æøåÆØÅ.

That raises the even knottier problem of addresses in different scripts, such as Greek, Cyrillic, even Chinese ... or does everyone just use the Roman alphabet?
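One quick way to test the æøå theory would be to list the characters in each candidate address that fall outside plain ASCII. The sketch below does only that; the function name non_ascii_chars is made up for illustration, and whether such characters should be stripped or accepted is a separate decision that this snippet does not make.

```python
def non_ascii_chars(address):
    """Return the characters in address that lie outside 7-bit ASCII, in order."""
    return [ch for ch in address if ord(ch) > 127]
```

An empty list means the address uses only ASCII characters, so any remaining trouble must come from somewhere else.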
You've got a working solution from Gapan so I won't bother you any more with this little project for now; but I'll let you know if I come up with something more robust and general. It's stimulating to have a real-world problem to get stuck into!
EDIT
It works with that sample too:
Code:
vanilla[bin]$ bcc.py raw.txt
vanilla[bin]$ cat bcc.txt
abc@mail.dk, abc.def@jura.dk, abc@humanrights.dk, abc@mn.dk, abc@drip.com, abc@jura.dk
Re: Work flow automation with bash script?
Many thanks gapan.
gapan wrote:
Well, here's a quick and dirty sed sequence that cleans up addresses on that sample csv. I'm assuming that the "__FOO__" string is nowhere in your file.
Code:
cat file.csv |grep "@" | \
sed "s/.*[,^]\(.*\)@\(.*\)/\1@\2/" |sed "s/,/__FOO__/"| \
sed "s/\(.*\)__FOO__.*/\1/"
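For comparison, roughly the same result can be had in Python by letting the csv module do the field splitting, which also copes with quoted fields that contain commas (such as the street addresses in the sample). This is a sketch under assumptions, not a line-for-line translation of the sed commands: the name addresses_from_csv is invented here, and a field is treated as an address simply because it contains exactly one "@".

```python
import csv
import io


def addresses_from_csv(csv_text):
    """Return the stripped fields that contain exactly one '@', in file order."""
    found = []
    for row in csv.reader(io.StringIO(csv_text)):
        for field in row:
            field = field.strip()
            if field.count("@") == 1:
                found.append(field)
    return found
```

The returned list could then be joined with ", " to build the contents of a bcc line.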