No need for the lynx stage if you use the Python script. It extracts well-formed email addresses regardless of how they are formatted. However, it does make some assumptions that may cause it to fail on certain formats, which is why I suggested trying it out with every format Google Docs can download to. If it does fail on one of them, tweaking it should just be a matter of telling it to replace a few more characters with whitespace. It could also become quite robust and general if I rewrote it to check the pieces it throws away for further addresses. At the moment it will fail on something like this:
foo.bar@salix.com#$%&bar.foo@ubuntu.com
But to sum up: it is easily fixed, especially if you can post a sample.
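
For what it's worth, here is a rough sketch of the "check the discarded pieces" idea. It is not the actual script: the regex, the separator set, and the function name extract_emails are all assumptions made up for illustration, and the pattern is deliberately much narrower than what the mail RFCs allow.

```python
import re

# Simplified address pattern (an assumption for this sketch, far stricter
# than RFC 5322): local part, "@", then a dotted domain.
EMAIL = re.compile(r"[A-Za-z0-9._+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")

# Hypothetical set of characters to turn into whitespace before splitting.
SEPARATORS = str.maketrans({c: " " for c in ",;:<>()[]\"'"})


def extract_emails(text):
    """Return the addresses found in text, in order of appearance."""
    found = []
    for token in text.translate(SEPARATORS).split():
        if EMAIL.fullmatch(token):
            # The whole token is a well-formed address.
            found.append(token)
        else:
            # Instead of throwing the token away, search inside it
            # for any embedded addresses.
            found.extend(EMAIL.findall(token))
    return found


if __name__ == "__main__":
    print(extract_emails("foo.bar@salix.com#$%&bar.foo@ubuntu.com"))
```

With this pattern the problem line above yields both addresses, because the characters #$%& are not allowed in the simplified local part and so act as a boundary. A stricter, RFC-style pattern would accept those characters in a local part and might split the line differently, so a real sample would still help.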
