Work flow automation with bash script?

globetrotterdk
Posts: 435
Joined: 26. Oct 2010, 13:57
Location: Denmark

Work flow automation with bash script?

Post by globetrotterdk »

I could use some help to automate the following work flow – maybe with a bash script. I am doing some volunteer work for an organization. Part of the work is sending out e-mails to the members of the organization, to inform them of a major event. I use the e-mail addresses from the latest update of the membership list, which is on a spreadsheet. The membership changes regularly, so each time I send an e-mail, I need to do the following:

1) Download a copy of the spreadsheet (the organization is testing Google Docs at this time).
2) Open the spreadsheet in LibreOffice and save it as an HTML file.
3) Use Lynx to find the e-mail addresses and save them to a file.
4) Remove the spaces before each e-mail in the resulting column.
5) Replace the “new line” character after each e-mail address with a comma and a space, so all of the addresses are on one line.
6) Copy the email addresses
7) Open Alpine (I use re-Alpine), and paste the e-mail addresses into the BCC line.

At this time, I use Lynx and Vim to prepare the data, ready to copy into Alpine:

Code: Select all

lynx -dump -force-html some_file.html | sed -n 's/^ *[0-9]*\. //p' | fgrep "mailto:"

Code: Select all

:%s/\n/, /g
Everything else is done manually.
Any suggestions how I could automate the work flow?
Military justice is to justice what military music is to music. - Groucho Marx
Shador
Posts: 1295
Joined: 11. Jun 2009, 14:04
Location: Bavaria

Re: Work flow automation with bash script?

Post by Shador »

globetrotterdk wrote:1) Download a copy of the spreadsheet (the organization is testing Google Docs at this time).
wget/curl possibly with some cookie/authentication/referrer magic
globetrotterdk wrote:2) Open the spreadsheet in LibreOffice and save it as an HTML file.
There seem to be some commandline tools around to convert xls to csv (comma-separated table). csv is very straightforward to parse.
globetrotterdk wrote:3) Use Lynx to find the e-mail addresses and save them to a file.
4) Remove the spaces before each e-mail in the resulting column.
5) Replace the “new line” character after each e-mail address with a comma and a space, so all of the addresses are on one line.
If using csv, those steps would change. Probably only some sed/cut magic would be needed, or possibly a bash/python/... script to parse the data properly.
globetrotterdk wrote:6) Copy the email addresses
7) Open Alpine (I use re-Alpine), and paste the e-mail addresses into the BCC line.
Set up mail (mailx package, I think) and use it to send out the mails.
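
If the csv route is taken, the "sed/cut magic" could be as small as a single grep. A minimal sketch, assuming the export is saved under a name such as members.csv (a placeholder) and that anything matching a loose address pattern is a member address. The function name is made up for illustration:

```shell
#!/bin/bash
# Sketch: pull every e-mail-looking token out of a CSV export.
# The pattern is deliberately loose; it is not a full address validator.
extract_emails() {
    grep -oE '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' "$1" | sort -u
}

# Example (members.csv is a placeholder file name):
# extract_emails members.csv > addresses.txt
```

Because grep -o prints only the matching part of each line, stray spaces or line breaks around an address in the csv file do not end up in the output.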
mimosa
Salix Warrior
Posts: 3311
Joined: 25. May 2010, 17:02

Re: Work flow automation with bash script?

Post by mimosa »

Your task can be analysed as follows:

1) Get the data

2) Reformat it so it is usable as a plain list of email addresses

3) Email some text to those addresses

You've already solved 2), which is the hard part. However, at the moment your method involves an intermediate stage via HTML, on the principle: first make the problem simpler, *then* solve it. Shador's suggestion is to use csv as the intermediate format instead.

1) and 3) are easy (though 3) is not that demanding to do manually). Doing email "the hard way" can be very complex, but not if all you want to do is send out mail. By doing all tasks with command line tools, you make it scriptable. Although it may be possible to use a script to interact with apps such as Alpine or even Libre, it's going to be fiddly and maybe a bit fragile.
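
For the sending step, something along these lines may be enough. It is only a sketch: it assumes a mailx that accepts -s for the subject and -b for a comma-separated BCC list (heirloom mailx and bsd-mailx do, other variants may differ), and the function name and arguments are made up for illustration. By default it only prints the command, so nothing goes out until SEND=1 is set:

```shell
#!/bin/bash
# Sketch: send one message with all members in BCC.
# Assumption: the installed mailx supports -s (subject) and -b (BCC list).
send_bcc() {
    local subject=$1 bcc=$2 body_file=$3 to=$4
    if [ "${SEND:-0}" = "1" ]; then
        mailx -s "$subject" -b "$bcc" "$to" < "$body_file"
    else
        # Dry run: show what would be executed, send nothing.
        printf 'mailx -s "%s" -b "%s" "%s" < "%s"\n' \
            "$subject" "$bcc" "$to" "$body_file"
    fi
}
```

The dry run keeps a manual sanity check in the loop while still making the step scriptable.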
globetrotterdk

Re: Work flow automation with bash script?

Post by globetrotterdk »

Shador wrote:
globetrotterdk wrote:1) Download a copy of the spreadsheet (the organization is testing Google Docs at this time).
wget/curl possibly with some cookie/authentication/referrer magic
GoogleCL is in Sourcery. Maybe that would be the way to go?
Shador wrote:
globetrotterdk wrote:2) Open the spreadsheet in LibreOffice and save it as an HTML file.
There seem to be some commandline tools around to convert xls to csv (comma-separated table). csv is very straightforward to parse.
The problem here is that there is a lot of other info like names, addresses, etc. that is also in this file. I only need the "mailto:"
mimosa

Re: Work flow automation with bash script?

Post by mimosa »

globetrotterdk wrote: The problem here is that there is a lot of other info like names, addresses, etc. that is also in this file. I only need the "mailto:"
Right, so you will need to process the text some more, using sed or a script. But this is perfectly feasible.

One thing worth thinking about is robustness. You don't want a setup that works if the data is exactly in the form you expect, but falls apart if, say, there's one malformed email address in the list. Your script could check the data after extraction, too.
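
Such a check could be sketched as follows, assuming one address per line on standard input. The pattern is only a loose sanity test, and the function name is hypothetical:

```shell
#!/bin/bash
# Sketch: split a list of addresses (one per line on stdin) into
# plausible ones (stdout) and suspicious ones (stderr).
check_addresses() {
    local line
    while IFS= read -r line || [ -n "$line" ]; do
        if printf '%s\n' "$line" |
            grep -qE '^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'; then
            printf '%s\n' "$line"
        else
            printf 'suspicious: %s\n' "$line" >&2
        fi
    done
}
```

Anything reported on stderr can then be corrected by hand in the spreadsheet before the mail goes out.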
globetrotterdk

Re: Work flow automation with bash script?

Post by globetrotterdk »

Cool. GoogleCL can access Google Docs' conversion mechanism (API?), so it is possible to do the following:

Code: Select all

$ google docs get --title "some_document*" ~/downloads/some_document.html
This also works with .csv
Personally, it seems to me that an .html file has some advantages over a .csv file, if malformed e-mail addresses are a concern. In most cases, "mailto:" will be simple to extract with

Code: Select all

lynx -dump -force-html some_file.html | sed -n 's/^ *[0-9]*\. //p' | fgrep "mailto:"
whereas in the .csv file not every e-mail address is cleanly delimited, with a comma just before it, a comma just after it and an "@" in the middle. Some addresses in the converted .csv file appear to be followed by a "new line" or a "space" character.
mimosa

Re: Work flow automation with bash script?

Post by mimosa »

Looks like you're nearly there: Google CL does what I called 1), you can keep your existing 2), that leaves 3). Unless the volume of emails to send out is high, there might even be a case for leaving that as a manual step - that way you get to do a sanity check before sending.

EDIT

I imagine you've reached this point, but I can confirm it works here:

Code: Select all

vanilla[~]$ google docs get --title renda_tim_2010 ~/
Downloading renda_tim_2010 to /home/vanilla/renda_tim_2010.xls
vanilla[~]$ google docs get --title renda_tim_2010 ~/ --format csv
Downloading renda_tim_2010 to /home/vanilla/renda_tim_2010.csv
vanilla[~]$ google docs get --title renda_tim_2010 ~/ --format html
Downloading renda_tim_2010 to /home/vanilla/renda_tim_2010.html
vanilla[~]$ ls renda*
renda_tim_2010.csv  renda_tim_2010.html  renda_tim_2010.xls
8-) :D
globetrotterdk

Re: Work flow automation with bash script?

Post by globetrotterdk »

mimosa wrote:Looks like you're nearly there: Google CL does what I called 1), you can keep your existing 2), that leaves 3). Unless the volume of emails to send out is high, there might even be a case for leaving that as a manual step - that way you get to do a sanity check before sending.
Yes, I agree. Step 3 can be left out.
OK, so here are the three actions that would make up a script. I am unsure about the last one as I don't have much experience with Vim. I can confirm that the other actions work:

Code: Select all

google docs get --title "some_document*" ~/downloads/some_document.html
From what I have read, and what I can see works, the "--format" switch isn't necessary as long as the output file name includes the desired / supported format extension.

Code: Select all

lynx -dump -force-html some_document.html | sed -n 's/^ *[0-9]*\. //p' | fgrep "mailto:" > some_document.txt

Code: Select all

vim some_document.txt :%s/\n/, /g
mimosa

Re: Work flow automation with bash script?

Post by mimosa »

Although I dare say it is possible to get a script to drive vim to strip out the unwanted newlines, it's probably easier to do it with a non-interactive tool - perhaps something short and to the point with sed?

In Python an expression something like this might replace the newlines with commas:

Code: Select all

bcc = ", ".join(addressees.split("\n"))
but sed is probably more succinct.
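
A shell counterpart might look like this. It is a sketch that assumes one address per line on standard input, possibly still carrying leading spaces and the "mailto:" prefix from the lynx output. The function name is made up:

```shell
#!/bin/bash
# Sketch: turn one address per line (stdin) into "a, b, c" on one line.
join_addresses() {
    sed -e 's/^ *//' -e 's/^mailto://' -e 's/ *$//' |
        grep -v '^$' |
        paste -s -d, - |
        sed 's/,/, /g'
}

# Example (some_document.txt as produced by the lynx pipeline):
# join_addresses < some_document.txt
```

paste -s does the actual joining; the sed calls around it only trim each line and add the space after each comma.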
Shador

Re: Work flow automation with bash script?

Post by Shador »

globetrotterdk wrote:vim some_document.txt :%s/\n/, /g
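That should be equivalent to the following. Since sed reads its input one line at a time, the lines have to be gathered into the pattern space first, otherwise \n never matches:

Code: Select all

sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/, /g'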
Actually you can merge the two steps like this:

Code: Select all

lynx -dump -force-html some_document.html | sed -n 's/^ *[0-9]*\. //p' | fgrep "mailto:" | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/, /g' > some_document.txt
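
Putting the thread together, the whole job could look roughly like this. It is a sketch only: the document title, the paths and the function names are placeholders, and the `google docs get` call is the one shown earlier in the thread. The extraction is kept in its own function so it can be tried on saved lynx output without touching the network:

```shell
#!/bin/bash
# Sketch of the complete work flow. Names and paths are placeholders.

# Read `lynx -dump` output on stdin, print "a@x, b@y" on one line.
addresses_from_dump() {
    sed -n 's/^ *[0-9]*\. //p' |   # keep the numbered reference lines
        grep -F 'mailto:' |        # only the mail links
        sed 's/^mailto://' |       # drop the URL scheme
        sort -u |                  # remove duplicate addresses
        paste -s -d, - |           # join all lines with commas
        sed 's/,/, /g'             # add a space after each comma
}

main() {
    local title=$1 html=$2 out=$3
    google docs get --title "$title" "$html" || return 1
    lynx -dump -force-html "$html" | addresses_from_dump > "$out"
    echo "Addresses written to $out; check them before pasting into Alpine."
}

# Example (hypothetical title and paths):
# main "some_document*" ~/downloads/some_document.html ~/downloads/some_document.txt
```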