Free linguistic software and data for Finnish

Posted: 9. Oct 2022, 14:17
by novrexi
I noticed that the Salix and Slackware package sources lack something important for us Finns: Voikko proofreading. Would it be possible to add it to the package sources? All the information is on the project site: https://voikko.puimula.org

Re: Free linguistic software and data for Finnish

Posted: 13. Oct 2022, 11:05
by mimosa
That looks like a useful project. If no one else wants to do it, I could have a go at packaging it.

Re: Free linguistic software and data for Finnish

Posted: 13. Oct 2022, 14:45
by novrexi
mimosa wrote: 13. Oct 2022, 11:05 That looks like a useful project. If no one else wants to do it, I could have a go at packaging it.
I would be more than grateful if you could do it. 8-)

Re: Free linguistic software and data for Finnish

Posted: 14. Oct 2022, 18:27
by mimosa
Unfortunately, it's proved too hard a nut for me to crack - just out of practice, I guess. The first sticking point for me was that it needs this:
https://en.wikipedia.org/wiki/Foma_%28software%29

... and I can't get it to build, nor make sense of the precompiled binary (not an unreasonable option, if it worked). This may be because that project has been abandoned for eight years or so, though there is a package for it on Arch.

Perhaps someone else with sharper packaging skills will take it on!

This is what I am seeing:

Code:

mimosa[foma-0.9.18]$ make
ar cru libfoma.a int_stack.o define.o determinize.o apply.o rewrite.o lexcread.o topsort.o flags.o minimize.o reverse.o extract.o sigma.o io.o structures.o constructions.o coaccessible.o utf8.o spelling.o dynarray.o mem.o stringhash.o trie.o lex.lexc.o lex.yy.o lex.cmatrix.o regex.o
ranlib libfoma.a
gcc -O3 -Wall -D_GNU_SOURCE -std=c99 -fvisibility=hidden -fPIC -shared -Wl,-soname,libfoma.so.0 -o libfoma.so.0.9.18 int_stack.o define.o determinize.o apply.o rewrite.o lexcread.o topsort.o flags.o minimize.o reverse.o extract.o sigma.o io.o structures.o constructions.o coaccessible.o utf8.o spelling.o dynarray.o mem.o stringhash.o trie.o lex.lexc.o lex.yy.o lex.cmatrix.o regex.o -lreadline -lz -ltermcap
/usr/bin/ld: define.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: define.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: determinize.o:(.bss+0x18): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: determinize.o:(.bss+0x20): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: apply.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: apply.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: rewrite.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: rewrite.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: lexcread.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: lexcread.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: topsort.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: topsort.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: flags.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: flags.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: minimize.o:(.bss+0x0): multiple definition of `trans_array'; determinize.o:(.bss+0x0): first defined here
/usr/bin/ld: minimize.o:(.bss+0x8): multiple definition of `trans_list'; determinize.o:(.bss+0x8): first defined here
/usr/bin/ld: minimize.o:(.bss+0x10): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: minimize.o:(.bss+0x18): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: reverse.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: reverse.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: extract.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: extract.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: sigma.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: sigma.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: io.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: io.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: structures.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: structures.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: constructions.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: constructions.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: coaccessible.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: coaccessible.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: utf8.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: utf8.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: spelling.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: spelling.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: dynarray.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: dynarray.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: mem.o:(.bss+0x28): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: mem.o:(.bss+0x30): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: lex.lexc.o:(.bss+0x10): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: lex.lexc.o:(.bss+0x8): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: lex.yy.o:(.bss+0x1600): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: lex.yy.o:(.bss+0x1608): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: lex.cmatrix.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: lex.cmatrix.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: regex.o:(.bss+0x30): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: regex.o:(.bss+0x38): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
collect2: error: ld returned 1 exit status
make: *** [Makefile:70: libfoma.so.0.9.18] Error 1
mimosa[foma-0.9.18]$ 
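
A note on the log above: every error is a "multiple definition" of a global placed in .bss (g_defines, g_defines_f, trans_array, trans_list), repeated for each object file. That is the pattern the linker reports when a header defines a variable without extern and the compiler runs with -fno-common, which has been the default since GCC 10. A possible workaround, sketched here and not tested against foma-0.9.18, is to rebuild with -fcommon added to the flags seen on the gcc line. It assumes the Makefile passes CFLAGS to every compile step and that a command-line CFLAGS overrides the Makefile's own value.

```shell
# Flags copied from the gcc line in the log, plus -fcommon (the assumed fix).
FOMA_CFLAGS="-O3 -Wall -D_GNU_SOURCE -std=c99 -fvisibility=hidden -fPIC -fcommon"

# Only act inside the source directory shown in the prompt of the log.
case "$PWD" in
  */foma-0.9.18)
    # Objects compiled without -fcommon must be rebuilt, so remove them first.
    rm -f ./*.o
    make CFLAGS="$FOMA_CFLAGS"
    ;;
  *)
    echo "run this from the foma-0.9.18 source directory" >&2
    ;;
esac
```

If that links cleanly, the longer-term fix would be to declare those globals extern in the header and define each one in a single .c file, so the build no longer depends on -fcommon.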

Re: Free linguistic software and data for Finnish

Posted: 29. Oct 2022, 01:35
by ChuangTzu

Re: Free linguistic software and data for Finnish

Posted: 29. Oct 2022, 08:23
by djemos
Very interesting. I was reading about these things in 2020.
For visualization you of course need to install graphviz, plus epdfview (sudo slapt-get -i graphviz epdfview).
Foma SLKBUILD and binary

The package was built and the tests were run on a real installation of Salixlive-64 Xfce 15.0 on an external USB stick, which is further proof that you can carry a portable system in your pocket that runs as fast as one on an internal SSD.

Test 1
Download the english.lexc
type foma
djemos[automata]$ foma
Foma, version 0.9.18alpha (svn r0)
Copyright © 2008-2015 Mans Hulden
This is free software; see the source code for copying conditions.
There is ABSOLUTELY NO WARRANTY; for details, type "help license"

Type "help" to list all commands available.
Type "help <topic>" or help "<operator>" for further help.

foma[0]: read lexc english.lexc
Root...2, Noun...6, Verb...6, Ninf...2, Vinf...5
Building lexicon...
Determinizing...
Minimizing...
Done!
1.7 kB. 32 states, 46 arcs, 42 paths.

foma[1]: define Lexicon;
defined Lexicon: 1.7 kB. 32 states, 46 arcs, 42 paths.

foma[0]: regex Lexicon;
1.7 kB. 32 states, 46 arcs, 42 paths.

foma[1]: view net
"view net" shows the graphic
Image


Test 2
Download english.foma
type foma
djemos[foma]$ foma
Foma, version 0.9.18alpha (svn r0)
Copyright © 2008-2015 Mans Hulden
This is free software; see the source code for copying conditions.
There is ABSOLUTELY NO WARRANTY; for details, type "help license"

Type "help" to list all commands available.
Type "help <topic>" or help "<operator>" for further help.

foma[0]: source english.foma
Opening file 'english.foma'.
defined V: 413 bytes. 2 states, 5 arcs, 5 paths.
Root...2, Noun...6, Verb...6, Ninf...2, Vinf...5
Building lexicon...
Determinizing...
Minimizing...
Done!
1.7 kB. 32 states, 46 arcs, 42 paths.
defined Lexicon: 1.7 kB. 32 states, 46 arcs, 42 paths.
defined ConsonantDoubling: 1.2 kB. 11 states, 47 arcs, Cyclic.
defined EDeletion: 1.2 kB. 11 states, 52 arcs, Cyclic.
defined EInsertion: 1.1 kB. 7 states, 43 arcs, Cyclic.
defined YReplacement: 1006 bytes. 9 states, 36 arcs, Cyclic.
defined KInsertion: 1.9 kB. 12 states, 89 arcs, Cyclic.
defined Cleanup: 332 bytes. 1 state, 2 arcs, Cyclic.
Root...2, Noun...6, Verb...6, Ninf...2, Vinf...5
Building lexicon...
Determinizing...
Minimizing...
Done!
1.7 kB. 32 states, 46 arcs, 42 paths.
redefined Lexicon: 1.7 kB. 32 states, 46 arcs, 42 paths.
defined Grammar: 2.2 kB. 47 states, 70 arcs, 42 paths.
2.2 kB. 47 states, 70 arcs, 42 paths.
foma[1]: view net
Image

Here is another graph, an Ukkonen one, which is generated by Python code.
Download text_algorithms.py and testing.py
type
python3 testing.py

Image

Re: Free linguistic software and data for Finnish

Posted: 29. Oct 2022, 18:35
by mimosa
Thanks very much djemos, I'll have a look at these when I have a moment, and see if I can get any further!