Free linguistic software and data for Finnish

If there's software you need and can't find, make a request for it.
novrexi
Posts: 7
Joined: 7. Oct 2022, 04:53

Free linguistic software and data for Finnish

Post by novrexi » 9. Oct 2022, 14:17

I noticed that Salix's (or Slackware's) package sources are missing something important for us Finns: Voikko proofreading. Would it be possible to get it into the package sources? Here's a link to the site, where you can find all the information: https://voikko.puimula.org

mimosa
Salix Warrior
Posts: 3286
Joined: 25. May 2010, 17:02

Re: Free linguistic software and data for Finnish

Post by mimosa » 13. Oct 2022, 11:05

That looks like a useful project. If no one else wants to do it, I could have a go at packaging it.

novrexi
Posts: 7
Joined: 7. Oct 2022, 04:53

Re: Free linguistic software and data for Finnish

Post by novrexi » 13. Oct 2022, 14:45

mimosa wrote (13. Oct 2022, 11:05):
That looks like a useful project. If no one else wants to do it, I could have a go at packaging it.

I would be more than grateful if you could do it. 8-)

mimosa
Salix Warrior
Posts: 3286
Joined: 25. May 2010, 17:02

Re: Free linguistic software and data for Finnish

Post by mimosa » 14. Oct 2022, 18:27

Unfortunately, it's proved too hard a nut for me to crack; I'm just out of practice, I guess. The first sticking point for me was that it needs this:
https://en.wikipedia.org/wiki/Foma_%28software%29

... and I can't get it to build, nor can I make sense of the precompiled binary (not an unreasonable option, if it worked). This may be because the project has been abandoned for eight years or so, though there is a package for it on Arch.

Perhaps someone else with sharper packaging skills will take it on!

This is what I am seeing:

Code:

mimosa[foma-0.9.18]$ make
ar cru libfoma.a int_stack.o define.o determinize.o apply.o rewrite.o lexcread.o topsort.o flags.o minimize.o reverse.o extract.o sigma.o io.o structures.o constructions.o coaccessible.o utf8.o spelling.o dynarray.o mem.o stringhash.o trie.o lex.lexc.o lex.yy.o lex.cmatrix.o regex.o
ranlib libfoma.a
gcc -O3 -Wall -D_GNU_SOURCE -std=c99 -fvisibility=hidden -fPIC -shared -Wl,-soname,libfoma.so.0 -o libfoma.so.0.9.18 int_stack.o define.o determinize.o apply.o rewrite.o lexcread.o topsort.o flags.o minimize.o reverse.o extract.o sigma.o io.o structures.o constructions.o coaccessible.o utf8.o spelling.o dynarray.o mem.o stringhash.o trie.o lex.lexc.o lex.yy.o lex.cmatrix.o regex.o -lreadline -lz -ltermcap
/usr/bin/ld: define.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: define.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: determinize.o:(.bss+0x18): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: determinize.o:(.bss+0x20): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: apply.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: apply.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: rewrite.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: rewrite.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: lexcread.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: lexcread.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: topsort.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: topsort.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: flags.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: flags.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: minimize.o:(.bss+0x0): multiple definition of `trans_array'; determinize.o:(.bss+0x0): first defined here
/usr/bin/ld: minimize.o:(.bss+0x8): multiple definition of `trans_list'; determinize.o:(.bss+0x8): first defined here
/usr/bin/ld: minimize.o:(.bss+0x10): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: minimize.o:(.bss+0x18): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: reverse.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: reverse.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: extract.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: extract.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: sigma.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: sigma.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: io.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: io.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: structures.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: structures.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: constructions.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: constructions.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: coaccessible.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: coaccessible.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: utf8.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: utf8.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: spelling.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: spelling.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: dynarray.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: dynarray.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: mem.o:(.bss+0x28): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: mem.o:(.bss+0x30): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: lex.lexc.o:(.bss+0x10): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: lex.lexc.o:(.bss+0x8): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: lex.yy.o:(.bss+0x1600): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: lex.yy.o:(.bss+0x1608): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: lex.cmatrix.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: lex.cmatrix.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: regex.o:(.bss+0x30): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: regex.o:(.bss+0x38): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
collect2: error: ld returned 1 exit status
make: *** [Makefile:70: libfoma.so.0.9.18] Error 1
mimosa[foma-0.9.18]$ 
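Every error above names the same few globals (g_defines, g_defines_f, trans_array, trans_list), and all of them sit in .bss, i.e. they are uninitialised. One likely explanation, not confirmed in this thread: since GCC 10, gcc defaults to -fno-common, so an uninitialised global that is defined without extern in more than one translation unit (typically through a shared header) becomes a link error instead of being silently merged. If that is the cause, adding -fcommon to the CFLAGS in foma's Makefile (assuming that is where the flags on the gcc line above come from) and rebuilding from scratch with make clean; make should get past it. The cleaner fix is to declare the variables extern in the header and define each one in a single .c file. The small helper below is a hypothetical aid (the function name parse_duplicates is illustrative) that tallies which symbols a build log reports as multiply defined, so you can check whether they are all of this kind.

```python
import re
from collections import Counter
from typing import Set, Tuple

# Matches ld lines such as
# /usr/bin/ld: define.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
# group 1 = object file, group 2 = section, group 3 = symbol
DUP_RE = re.compile(r"ld: (\S+?):\((\.\w+)[^)]*\): multiple definition of `([^']+)'")

# Three lines copied from the build log above.
SAMPLE = """\
/usr/bin/ld: define.o:(.bss+0x0): multiple definition of `g_defines_f'; int_stack.o:(.bss+0x0): first defined here
/usr/bin/ld: define.o:(.bss+0x8): multiple definition of `g_defines'; int_stack.o:(.bss+0x8): first defined here
/usr/bin/ld: minimize.o:(.bss+0x0): multiple definition of `trans_array'; determinize.o:(.bss+0x0): first defined here
"""


def parse_duplicates(log: str) -> Tuple[Counter, Set[str]]:
    """Count re-definitions per symbol and collect the sections they live in."""
    counts: Counter = Counter()
    sections: Set[str] = set()
    for line in log.splitlines():
        match = DUP_RE.search(line)
        if match:
            sections.add(match.group(2))
            counts[match.group(3)] += 1
    return counts, sections


if __name__ == "__main__":
    counts, sections = parse_duplicates(SAMPLE)
    for symbol, n in counts.most_common():
        print(f"{symbol}: redefined in {n} more object file(s)")
    print("sections:", ", ".join(sorted(sections)))
```

For a full log, something like make > build.log 2>&1 followed by parse_duplicates(open("build.log").read()) would tally every symbol; on the log above it should report only g_defines, g_defines_f, trans_array and trans_list, all in .bss.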


djemos
Salix Warrior
Posts: 1311
Joined: 29. Dec 2009, 13:45
Location: Greece

Re: Free linguistic software and data for Finnish

Post by djemos » 29. Oct 2022, 08:23

Very interesting. I was reading about these things back in 2020.
For visualization you will of course need to install graphviz and epdfview (sudo slapt-get -i graphviz epdfview).
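Graphviz draws graphs described in its DOT language. As a rough illustration of the kind of input it works from (this is not foma's own output, and the word list is hypothetical), the sketch below builds a trie-shaped automaton for a few words and writes it out as DOT. Unlike foma, it does not minimize the automaton.

```python
from typing import Dict, List, Set, Tuple

Arc = Tuple[int, str, int]  # (source state, symbol, target state)


def build_trie(words: List[str]) -> Tuple[List[Arc], Set[int]]:
    """Build a deterministic trie accepting exactly `words`; state 0 is the start state."""
    transitions: Dict[Tuple[int, str], int] = {}
    arcs: List[Arc] = []
    finals: Set[int] = set()
    next_state = 1
    for word in words:
        state = 0
        for symbol in word:
            key = (state, symbol)
            if key not in transitions:
                transitions[key] = next_state
                arcs.append((state, symbol, next_state))
                next_state += 1
            state = transitions[key]
        finals.add(state)  # the state reached after the last symbol accepts the word
    return arcs, finals


def to_dot(arcs: List[Arc], finals: Set[int]) -> str:
    """Render the automaton in Graphviz DOT syntax, final states as double circles."""
    lines = ["digraph automaton {", "  rankdir=LR;", "  node [shape=circle];"]
    for state in sorted(finals):
        lines.append(f"  {state} [shape=doublecircle];")
    for src, symbol, dst in arcs:
        lines.append(f'  {src} -> {dst} [label="{symbol}"];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    arcs, finals = build_trie(["cat", "cats", "city"])
    print(to_dot(arcs, finals))
```

Saving the printed text as words.dot and running dot -Tpng words.dot -o words.png renders it as an image.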
Foma SLKBUILD and binary

The package was built and the tests were run on a real installation of Salixlive-64 Xfce 15.0 on an external USB stick, which is further proof that you can carry a portable system in your pocket that runs as fast as one on an internal SSD.

Test 1
Download english.lexc, then type foma:
djemos[automata]$ foma
Foma, version 0.9.18alpha (svn r0)
Copyright © 2008-2015 Mans Hulden
This is free software; see the source code for copying conditions.
There is ABSOLUTELY NO WARRANTY; for details, type "help license"

Type "help" to list all commands available.
Type "help <topic>" or help "<operator>" for further help.

foma[0]: read lexc english.lexc
Root...2, Noun...6, Verb...6, Ninf...2, Vinf...5
Building lexicon...
Determinizing...
Minimizing...
Done!
1.7 kB. 32 states, 46 arcs, 42 paths.

foma[1]: define Lexicon;
defined Lexicon: 1.7 kB. 32 states, 46 arcs, 42 paths.

foma[0]: regex Lexicon;
1.7 kB. 32 states, 46 arcs, 42 paths.

foma[1]: view net
"view net" displays the network as a graph:
[Image: the Lexicon network as displayed by view net]
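The line Root...2, Noun...6, Verb...6, Ninf...2, Vinf...5 lists each lexicon in english.lexc with its number of entries. In lexc, every entry names a continuation class, and a word is a path from Root through continuation classes to the end marker #. The contents of english.lexc are not shown here, so the entries below are hypothetical, chosen only to have the same counts. That is enough to see where 42 paths comes from, assuming Root continues once to Noun and once to Verb: 6 × 2 noun paths + 6 × 5 verb paths = 42.

```python
from typing import Dict, List, Tuple

# (upper side, lower side, continuation class); "#" ends a word, as in lexc.
Entry = Tuple[str, str, str]

# Hypothetical lexicons: only the entry counts match the foma output above.
LEXICONS: Dict[str, List[Entry]] = {
    "Root": [("", "", "Noun"), ("", "", "Verb")],
    "Noun": [(w, w, "Ninf") for w in ("cat", "city", "fox", "panic", "try", "watch")],
    "Verb": [(w, w, "Vinf") for w in ("beg", "fox", "make", "panic", "try", "watch")],
    "Ninf": [("+N+Sg", "", "#"), ("+N+Pl", "^s", "#")],
    "Vinf": [("+V", "", "#"), ("+V+3P+Sg", "^s", "#"), ("+V+Past", "^ed", "#"),
             ("+V+PastPart", "^ed", "#"), ("+V+PresPart", "^ing", "#")],
}


def expand(lexicon: str = "Root") -> List[Tuple[str, str]]:
    """Follow continuation classes depth-first and return every (upper, lower) path."""
    if lexicon == "#":
        return [("", "")]
    paths = []
    for upper, lower, continuation in LEXICONS[lexicon]:
        for rest_upper, rest_lower in expand(continuation):
            paths.append((upper + rest_upper, lower + rest_lower))
    return paths


if __name__ == "__main__":
    for name, entries in LEXICONS.items():
        print(f"{name}...{len(entries)}")
    paths = expand()
    print(len(paths), "paths")  # 6*2 + 6*5 = 42
    print(paths[:3])
```

foma does the same expansion symbolically, building one transducer and then determinizing and minimizing it, which is what the "Building lexicon... Determinizing... Minimizing..." lines report.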


Test 2
Download english.foma, then type foma:
djemos[foma]$ foma
Foma, version 0.9.18alpha (svn r0)
Copyright © 2008-2015 Mans Hulden
This is free software; see the source code for copying conditions.
There is ABSOLUTELY NO WARRANTY; for details, type "help license"

Type "help" to list all commands available.
Type "help <topic>" or help "<operator>" for further help.

foma[0]: source english.foma
Opening file 'english.foma'.
defined V: 413 bytes. 2 states, 5 arcs, 5 paths.
Root...2, Noun...6, Verb...6, Ninf...2, Vinf...5
Building lexicon...
Determinizing...
Minimizing...
Done!
1.7 kB. 32 states, 46 arcs, 42 paths.
defined Lexicon: 1.7 kB. 32 states, 46 arcs, 42 paths.
defined ConsonantDoubling: 1.2 kB. 11 states, 47 arcs, Cyclic.
defined EDeletion: 1.2 kB. 11 states, 52 arcs, Cyclic.
defined EInsertion: 1.1 kB. 7 states, 43 arcs, Cyclic.
defined YReplacement: 1006 bytes. 9 states, 36 arcs, Cyclic.
defined KInsertion: 1.9 kB. 12 states, 89 arcs, Cyclic.
defined Cleanup: 332 bytes. 1 state, 2 arcs, Cyclic.
Root...2, Noun...6, Verb...6, Ninf...2, Vinf...5
Building lexicon...
Determinizing...
Minimizing...
Done!
1.7 kB. 32 states, 46 arcs, 42 paths.
redefined Lexicon: 1.7 kB. 32 states, 46 arcs, 42 paths.
defined Grammar: 2.2 kB. 47 states, 70 arcs, 42 paths.
2.2 kB. 47 states, 70 arcs, 42 paths.
foma[1]: view net
[Image: the resulting network as displayed by view net]
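After building the lexicon, english.foma defines a series of rewrite rules (ConsonantDoubling, EDeletion, EInsertion, YReplacement, KInsertion, Cleanup) and, judging by the names and the path count, combines them with Lexicon into Grammar, which maps intermediate forms such as try^s to surface forms. The rule bodies are not shown in the output above, so the sketch below uses hypothetical regular-expression approximations of rules with the same names, applied one after another in the order they are defined, the way a cascade of compositions feeds each rule the output of the previous one. It is only an approximation: foma compiles each rule into a transducer, and its replacement semantics differ from Python's re.sub in edge cases.

```python
import re
from typing import List, Tuple

# Hypothetical approximations of the rules named in english.foma, in the order
# they are defined there; "^" is the morpheme boundary left by the lexicon.
RULES: List[Tuple[str, str, str]] = [
    ("ConsonantDoubling", r"g(?=\^(?:ing|ed))", "gg"),       # beg^ing  -> begg^ing
    ("EDeletion", r"e(?=\^(?:ing|ed))", ""),                 # make^ing -> mak^ing
    ("EInsertion", r"(s|z|x|ch|sh)(?=\^s)", r"\1e"),         # fox^s    -> foxe^s
    ("YReplacement", r"y(?=\^s)", "ie"),                     # try^s    -> trie^s
    ("YReplacement", r"y(?=\^ed)", "i"),                     # try^ed   -> tri^ed
    ("KInsertion", r"([aeiou]c)(?=\^(?:ed|ing))", r"\1k"),   # panic^ed -> panick^ed
    ("Cleanup", r"\^", ""),                                  # drop the boundaries
]


def surface(intermediate: str) -> str:
    """Apply each rule to the output of the previous one, like a cascade of compositions."""
    form = intermediate
    for _name, pattern, replacement in RULES:
        form = re.sub(pattern, replacement, form)
    return form


if __name__ == "__main__":
    for word in ["beg^ing", "make^ing", "fox^s", "watch^s", "try^s", "try^ed", "panic^ed", "cat^s"]:
        print(f"{word:10} -> {surface(word)}")
```

Rule order matters here just as it does in the foma script: running Cleanup first, for example, would erase the boundary symbol that every other rule depends on.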

Another Ukkonen graph is here, generated by Python code.
Download text_algorithms.py and testing.py, then type:
python3 testing.py

[Image: graph generated by testing.py]
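Assuming the Ukkonen graph here is a suffix tree (Ukkonen's algorithm builds one online, one character at a time, in linear time), the sketch below shows the underlying structure using the much simpler naive method: inserting every suffix of the text into a trie. That takes quadratic time and does not compress chains of single-child nodes, so it is neither Ukkonen's algorithm nor the code in text_algorithms.py, whose contents are not shown here.

```python
from typing import Dict

Node = Dict[str, "Node"]  # each node maps a character to its child node


def suffix_trie(text: str, terminator: str = "$") -> Node:
    """Naive O(n^2) construction: insert every suffix of text + terminator into a trie."""
    text += terminator  # the terminator makes every suffix end at its own leaf
    root: Node = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
    return root


def is_substring(trie: Node, pattern: str) -> bool:
    """A pattern occurs in the text iff it spells a path from the root."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return False
        node = node[ch]
    return True


def count_nodes(node: Node) -> int:
    """Number of nodes below `node` (the node itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.values())


if __name__ == "__main__":
    trie = suffix_trie("banana")
    print(is_substring(trie, "nan"), is_substring(trie, "nab"))
    print(count_nodes(trie), "nodes besides the root")
```

Each non-root node corresponds to one distinct substring of banana$, which is why the trie grows quadratically; a suffix tree merges the single-child chains into labelled edges and keeps the size linear.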

mimosa
Salix Warrior
Posts: 3286
Joined: 25. May 2010, 17:02

Re: Free linguistic software and data for Finnish

Post by mimosa » 29. Oct 2022, 18:35

Thanks very much djemos, I'll have a look at these when I have a moment, and see if I can get any further!
