Free linguistic software and data for Finnish
I noticed that the Salix and Slackware package sources are missing something important for us Finns: the Voikko proofreading tools. Would it be possible to get a package into the repositories? Here is a link to the project site, where you can find all the information: https://voikko.puimula.org
Re: Free linguistic software and data for Finnish
That looks like a useful project. If no one else wants to do it, I could have a go at packaging it.
Re: Free linguistic software and data for Finnish
Unfortunately, it's proved too hard a nut for me to crack - just out of practice, I guess. The first sticking point for me was that it needs this:
https://en.wikipedia.org/wiki/Foma_%28software%29
... and I can't get it to build, nor make sense of the precompiled binary (not an unreasonable option, if it worked). This may be because that project has been abandoned for eight years or so, though there is a package for it on Arch.
Perhaps someone else with sharper packaging skills will take it on!
This is what I am seeing:
mimosa[foma-0.9.18]$ make
ar cru libfoma.a int_stack.o define.o determinize.o apply.o rewrite.o lexcread.o topsort.o flags.o minimize.o reverse.o extract.o sigma.o io.o structures.o constructions.o coaccessible.o utf8.o spelling.o dynarray.o mem.o stringhash.o trie.o lex.lexc.o lex.yy.o lex.cmatrix.o regex.o
ranlib libfoma.a
gcc -O3 -Wall -D_GNU_SOURCE -std=c99 -fvisibility=hidden -fPIC -shared -Wl,-soname,libfoma.so.0 -o libfoma.so.0.9.18 int_stack.o define.o determinize.o apply.o rewrite.o lexcread.o topsort.o flags.o minimize.o reverse.o extract.o sigma.o io.o structures.o constructions.o coaccessible.o utf8.o spelling.o dynarray.o mem.o stringhash.o trie.o lex.lexc.o lex.yy.o lex.cmatrix.o regex.o -lreadline -lz -ltermcap
/usr/bin/ld: define.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: define.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: determinize.o:(.bss+0x18): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: determinize.o:(.bss+0x20): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: apply.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: apply.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: rewrite.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: rewrite.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: lexcread.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: lexcread.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: topsort.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: topsort.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: flags.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: flags.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: minimize.o:(.bss+0x0): multiple definition of `trans_array'; determinize.o:(.bss+0x0): first defined here
/usr/bin/ld: minimize.o:(.bss+0x8): multiple definition of `trans_list'; determinize.o:(.bss+0x8): first defined here
/usr/bin/ld: minimize.o:(.bss+0x10): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: minimize.o:(.bss+0x18): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: reverse.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: reverse.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: extract.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: extract.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: sigma.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: sigma.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: io.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: io.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: structures.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: structures.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: constructions.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: constructions.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: coaccessible.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: coaccessible.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: utf8.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: utf8.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: spelling.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: spelling.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: dynarray.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: dynarray.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: mem.o:(.bss+0x28): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: mem.o:(.bss+0x30): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: lex.lexc.o:(.bss+0x10): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: lex.lexc.o:(.bss+0x8): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: lex.yy.o:(.bss+0x1600): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: lex.yy.o:(.bss+0x1608): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: lex.cmatrix.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: lex.cmatrix.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: regex.o:(.bss+0x30): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: regex.o:(.bss+0x38): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
collect2: error: ld returned 1 exit status
make: *** [Makefile:70: libfoma.so.0.9.18] Error 1
mimosa[foma-0.9.18]$
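Only four symbols appear in the log above (g_defines_f, g_defines, trans_array and trans_list), each reported once per object file. That pattern usually means global variables are defined in a shared header without `extern`; GCC 10 and later default to -fno-common, which turns such tentative definitions into link errors. Whether that is the cause here is an assumption, not something checked against the foma sources; if it is, the usual workarounds are adding -fcommon to the compiler flags or declaring the variables `extern` in the header and defining them in one .c file. A minimal Python sketch for condensing a log like this one (the name summarize_duplicates is made up for illustration):

```python
import re

# Matches GNU ld lines of the form seen in the log, e.g.
# /usr/bin/ld: define.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
LD_DUPLICATE = re.compile(
    r"ld: (?P<obj>[\w.\-]+):\(\S+\): multiple definition of "
    r"`(?P<sym>[^']+)'; (?P<first>[\w.\-]+):\(\S+\): first defined here"
)


def summarize_duplicates(log_text):
    """Map each multiply defined symbol to (first definer, set of clashing objects)."""
    summary = {}
    for line in log_text.splitlines():
        match = LD_DUPLICATE.search(line)
        if match is None:
            continue  # not a "multiple definition" line
        _first, clashes = summary.setdefault(
            match["sym"], (match["first"], set())
        )
        clashes.add(match["obj"])
    return summary


# A few lines copied from the build log, used as sample input.
SAMPLE = """\
/usr/bin/ld: define.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: define.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: minimize.o:(.bss+0x0): multiple definition of `trans_array'; determinize.o:(.bss+0x0): first defined here
/usr/bin/ld: minimize.o:(.bss+0x8): multiple definition of `trans_list'; determinize.o:(.bss+0x8): first defined here
/usr/bin/ld: minimize.o:(.bss+0x18): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
collect2: error: ld returned 1 exit status
"""

if __name__ == "__main__":
    for sym, (first, clashes) in summarize_duplicates(SAMPLE).items():
        print(f"{sym}: first defined in {first}, clashes in {sorted(clashes)}")
```

To run it on the real output, capture the build with something like `make 2> build.log` and pass the file contents to summarize_duplicates.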
Re: Free linguistic software and data for Finnish
Very interesting. I was reading about these things in 2020.
You have to install graphviz and epdfview for the visualization, of course (sudo slapt-get -i graphviz epdfview).
Foma SLKBUILD and binary
The package build and the tests were done on a real installation of Salixlive-64 Xfce 15.0 on an external USB stick, which is further proof that you can carry a portable system in your pocket that runs as fast as one on an internal SSD.
Test 1
Download the english.lexc
type foma
"view net" shows the graphic
djemos[automata]$ foma
Foma, version 0.9.18alpha (svn r0)
Copyright © 2008-2015 Mans Hulden
This is free software; see the source code for copying conditions.
There is ABSOLUTELY NO WARRANTY; for details, type "help license"
Type "help" to list all commands available.
Type "help <topic>" or help "<operator>" for further help.
foma[0]: read lexc english.lexc
Root...2, Noun...6, Verb...6, Ninf...2, Vinf...5
Building lexicon...
Determinizing...
Minimizing...
Done!
1.7 kB. 32 states, 46 arcs, 42 paths.
foma[1]: define Lexicon;
defined Lexicon: 1.7 kB. 32 states, 46 arcs, 42 paths.
foma[0]: regex Lexicon;
1.7 kB. 32 states, 46 arcs, 42 paths.
foma[1]: view net
Test 2
Download english.foma
type foma
djemos[foma]$ foma
Foma, version 0.9.18alpha (svn r0)
Copyright © 2008-2015 Mans Hulden
This is free software; see the source code for copying conditions.
There is ABSOLUTELY NO WARRANTY; for details, type "help license"
Type "help" to list all commands available.
Type "help <topic>" or help "<operator>" for further help.
foma[0]: source english.foma
Opening file 'english.foma'.
defined V: 413 bytes. 2 states, 5 arcs, 5 paths.
Root...2, Noun...6, Verb...6, Ninf...2, Vinf...5
Building lexicon...
Determinizing...
Minimizing...
Done!
1.7 kB. 32 states, 46 arcs, 42 paths.
defined Lexicon: 1.7 kB. 32 states, 46 arcs, 42 paths.
defined ConsonantDoubling: 1.2 kB. 11 states, 47 arcs, Cyclic.
defined EDeletion: 1.2 kB. 11 states, 52 arcs, Cyclic.
defined EInsertion: 1.1 kB. 7 states, 43 arcs, Cyclic.
defined YReplacement: 1006 bytes. 9 states, 36 arcs, Cyclic.
defined KInsertion: 1.9 kB. 12 states, 89 arcs, Cyclic.
defined Cleanup: 332 bytes. 1 state, 2 arcs, Cyclic.
Root...2, Noun...6, Verb...6, Ninf...2, Vinf...5
Building lexicon...
Determinizing...
Minimizing...
Done!
1.7 kB. 32 states, 46 arcs, 42 paths.
redefined Lexicon: 1.7 kB. 32 states, 46 arcs, 42 paths.
defined Grammar: 2.2 kB. 47 states, 70 arcs, 42 paths.
2.2 kB. 47 states, 70 arcs, 42 paths.
foma[1]: view net
Another Ukkonen graph is here; it is generated by Python code.
Download text_algorithms.py and testing.py
type
python3 testing.py
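For anyone who wants to see the idea without downloading the files, here is a rough sketch of how a suffix structure can be turned into a Graphviz graph. It does not reproduce text_algorithms.py or testing.py, and it is not Ukkonen's linear-time algorithm: it builds a naive suffix trie in quadratic time and prints DOT text. The function names build_suffix_trie and trie_to_dot are invented for this example.

```python
def build_suffix_trie(text):
    """Naive O(n^2) suffix trie as nested dicts (not Ukkonen's construction)."""
    root = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def trie_to_dot(root):
    """Render the trie as Graphviz DOT text, one numbered node per trie node.

    Assumes the text contains no double quotes or backslashes, since edge
    labels are not escaped.
    """
    lines = ["digraph suffix_trie {"]
    counter = 0              # last node id handed out; the root is n0
    stack = [(root, 0)]
    while stack:
        node, node_id = stack.pop()
        for ch, child in sorted(node.items()):
            counter += 1
            lines.append(f'  n{node_id} -> n{counter} [label="{ch}"];')
            stack.append((child, counter))
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # "$" is the usual end marker so that every suffix ends at a leaf.
    print(trie_to_dot(build_suffix_trie("abab$")))
```

Save the output to a file such as trie.dot, run `dot -Tpdf trie.dot -o trie.pdf`, and open the PDF with epdfview.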
Re: Free linguistic software and data for Finnish
Thanks very much djemos, I'll have a look at these when I have a moment, and see if I can get any further!