<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    Hi Balázs (and everyone)!<br>
    <br>
    The POS tags that do not start with / are my mistake. Thanks for
    reporting it, and my apologies.<br>
    Vera wrote that they fixed cases like this. It may not have worked in
    every case.<br>
    <blockquote type="cite">@Attila: sometimes, for derived words, the /
      is missing before the part of speech (e.g. Dél-dunántúli   
      Dél-dunántúli[Adj][Nom]); in principle we fixed this during
      conversion, but it would be useful to correct it in the analyzer as well.<br>
      <br>
      Regards,<br>
      Vera<br>
    </blockquote>
    <br>
    If you run the attached script on the analyzed source, it fixes
    these errors, so you can then retrain the model(s).<br>
    <br>
    Usage:<br>
    <br>
    perl postagfix.pl corpus-hibas-postagekkel.txt
    >corpus-javitott-postagekkel.txt<br>
    <br>
    Still, please check that it really does what it should... :)<br>
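    <br>
    For anyone who wants to cross-check the output independently: the
    following is only a minimal Python sketch of the kind of repair meant
    here, not the attached postagfix.pl. It assumes PurePOS-style tokens
    (word form, lemma and tag separated by #) and restores the / only for
    the POS heads that appear in Balázs's list below (Adj, N, Num, V). The
    names fix_tag and fix_token are made up for illustration.<br>
    <pre>
```python
import re

# Assumption: only these POS heads lose their slash (taken from the tag
# list in this thread); extend the tuple if other heads turn up.
POS_HEADS = ("Adj", "Num", "N", "V")

# Matches a tag that starts with "[" + POS head, followed by "|" or "]".
MISSING_SLASH = re.compile(r"^\[(" + "|".join(POS_HEADS) + r")(?=[|\]])")


def fix_tag(tag):
    """Insert the missing "/" after the opening bracket of the POS tag."""
    return MISSING_SLASH.sub(r"[/\1", tag)


def fix_token(token):
    """Repair the tag of one word#lemma#tag token; other tokens pass through."""
    head, sep, tag = token.rpartition("#")
    if not sep:
        return token
    return head + sep + fix_tag(tag)


if __name__ == "__main__":
    print(fix_token("Dél-dunántúli#Dél-dunántúli#[Adj][Nom]"))
```
    </pre>
    On the example from Vera's mail this prints
    Dél-dunántúli#Dél-dunántúli#[/Adj][Nom]. Tags such as OTHER, [Punct]
    or [_PerfPtcp_Subj=tA/Adj][Nom] are deliberately left untouched, since
    the thread does not settle how they should look.<br>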
    <br>
    Attila<br>
    <br>
    <br>
    <br>
    <br>
    <div class="moz-cite-prefix">On 2016.07.29. 10:13, Indig Balázs
      wrote:<br>
    </div>
    <blockquote
cite="mid:CAFSpsSACshO0Ziv6p7onfQt9zyoQQ2BJ=Y7XjfQtHa5MZ5856g@mail.gmail.com"
      type="cite">
      <div dir="ltr">
        <div>Hi all!</div>
        <div><br>
        </div>
        <div>1412 unique tags. Things are looking better...</div>
        <div><br>
        </div>
        The PurePOS model has been updated.
        <div><br>
        </div>
        <div>The picture is getting clearer, but there are still things I do
          not understand (shown here in PurePOS notation: word form, lemma and tag separated by #):</div>
        <div><br>
        </div>
        <div>1) "leg"-et#"#[Punct]  Why is this Punct when the sentence-final
          period, comma, etc. are OTHER?</div>
        <div>2) So shouldn't a tag start with "[/"?
          Because these tags look odd:</div>
        <div><br>
        </div>
        <div>
          <div>
            <div> 252623 OTHER</div>
            <div>   1305 [Adj][Nom]</div>
            <div>     26 [N|Acron][Acc]</div>
            <div>     17 [N|Acron][Pl][Nom]</div>
            <div>     14 [N|Acron][Transl]</div>
            <div>     12 [Num][Nom]</div>
            <div>      6 [Adj][Pl][Nom]</div>
            <div>      5 [N|Acron][Ins]</div>
            <div>      5 [N|Abbr][Dat]</div>
            <div>      4 [N][Nom]</div>
            <div>      4 [Adj|nat][Nom]</div>
            <div>      3 [N][Poss.3Sg][Nom]</div>
            <div>      3 [N|Acron][Pl][Subl]</div>
            <div>      3 [Adj][All]</div>
            <div>      2 [V][Inf]</div>
            <div>      2 [_PerfPtcp_Subj=tA/Adj][Pl][AnP][All]</div>
            <div>      2 [N][Poss.3Sg][Acc]</div>
            <div>      2 [N|Acron][Pl][All]</div>
            <div>      2 [N|Acron][Pl][Acc]</div>
            <div>      2 [N|Acron][Nom]</div>
            <div>      2 [N|Abbr][Subl]</div>
            <div>      2 [N|Abbr][All]</div>
            <div>      2 [N|Abbr][Acc]</div>
            <div>      1 [V][Pst.Def.3Sg]</div>
            <div>      1 [V][Pst.Def.1Sg]</div>
            <div>      1 [V][_Mod][Prs.NDef.3Pl]</div>
            <div>      1 [V][_Mod][Prs.Def.3Sg][Punct]</div>
            <div>      1 [Punct]</div>
            <div>      1 POS</div>
            <div>      1 [_PerfPtcp_Subj=tA/Adj][Pl][Dat]</div>
            <div>      1 [_PerfPtcp_Subj=tA/Adj][Nom]</div>
            <div>      1 [N][Poss.3Pl][Nom]</div>
            <div>      1 [N][All]</div>
            <div>      1 [N|Acron][Subl]</div>
            <div>      1 [N|Acron][Poss.1Sg][Subl]</div>
            <div>      1 [N|Acron][Poss.1Pl][All]</div>
            <div>      1 [N|Acron][Pl][Ter]</div>
            <div>      1 [N|Acron][Pl][Ins]</div>
            <div>      1 [N|Acron][Pl][Ine]</div>
            <div>      1 [N|Acron][Ade]</div>
            <div>      1 [N|Acron][Acc][Punct]</div>
            <div>      1 [N][Acc]</div>
            <div>      1 [N|Abbr][Ela]</div>
            <div>      1 [Adj][Pl][Ade]</div>
            <div>      1 [Adj][EssFor%:ként]</div>
            <div>      1 [Adj]</div>
          </div>
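          <div><br>
            A tally like the one above can be reproduced with a few lines
            of Python. This is only a sketch under assumptions: it expects
            one sentence per line with whitespace-separated
            word#lemma#tag tokens, the helper name count_odd_tags is made
            up, and the two sample lines are invented for illustration.
            <pre>
```python
from collections import Counter


def count_odd_tags(lines):
    """Count tags that do not start with "[/" in word#lemma#tag tokens."""
    counts = Counter()
    for line in lines:
        for token in line.split():
            _, sep, tag = token.rpartition("#")
            # Tokens without any "#" are skipped rather than counted.
            if sep and not tag.startswith("[/"):
                counts[tag] += 1
    return counts


if __name__ == "__main__":
    # Invented sample lines, not taken from the corpus.
    sample = [
        "A#a#[/Det|art.Def] közti#közti#[Adj][Nom] .#.#OTHER",
        "két#két#[/Num|Attr][Nom] óra#óra#[/N][Nom] .#.#OTHER",
    ]
    for tag, n in count_odd_tags(sample).most_common():
        print(n, tag)
```
            </pre>
            On the two sample lines it reports OTHER twice and [Adj][Nom]
            once.
          </div>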
          <div><br>
          </div>
          <div><br>
          </div>
          <div>Balázs</div>
          <div><br>
          </div>
        </div>
      </div>
      <div class="gmail_extra"><br>
        <div class="gmail_quote">On 29 July 2016 at 9:46, Veronika Vincze
          wrote, <span dir="ltr"><<a moz-do-not-send="true"
              href="mailto:vinczev@inf.u-szeged.hu" target="_blank">vinczev@inf.u-szeged.hu</a>></span>:<br>
          <blockquote class="gmail_quote" style="margin:0 0 0
            .8ex;border-left:1px #ccc solid;padding-left:1ex">
            <div bgcolor="#FFFFFF" text="#000000">
              <p>Hi all!</p>
              <p>We fixed the second bug in the converter; the files have
                been updated.</p>
              <p>The first bug mostly affects proper names, if I am
                right. In the Szeged Korpusz these were uniformly given a
                noun tag, even when, as here, an adjective forms part of
                the proper name. The converter works by picking, from the
                new harmonized codes, the one that best matches the
                MSD code, so here it picks a noun code to match the
                noun MSD code. That is of course incorrect, but we cannot
                decide this automatically. Unfortunately we currently
                have neither the time nor the resources (in Szeged at
                least) to disambiguate these cases by hand :(</p>
              <p>Regards,<br>
                Vera<br>
              </p>
              <div>
                <div class="h5"> <br>
                  <div>On 2016.07.29. 8:22, Indig Balázs wrote:<br>
                  </div>
                  <blockquote type="cite">
                    <div dir="ltr">Hi all!
                      <div><br>
                      </div>
                      <div>@Vera: </div>
                      <div><br>
                      </div>
                      <div>Megye -> Megy<br>
                      </div>
                      <div><br>
                      </div>
                      <div>
                        <div>Jász-Nagykun-Szolnok<span
                            style="white-space:pre-wrap"> </span>Jász-Nagykun-Szolnok<span
                            style="white-space:pre-wrap"> </span>N<span
                            style="white-space:pre-wrap"> </span>SubPOS=p|Num=s|Cas=n|NumP=none|PerP=none|NumPd=none<span
                            style="white-space:pre-wrap"> </span>Jász-Nagykun-Szolnok[/N][Nom]</div>
                        <div>Megyei<span style="white-space:pre-wrap"> </span>Megyei<span
                            style="white-space:pre-wrap"> </span>N<span
                            style="white-space:pre-wrap"> </span>SubPOS=p|Num=s|Cas=n|NumP=none|PerP=none|NumPd=none<span
                            style="white-space:pre-wrap"> </span>Megy[/N][Pl.Poss.3Sg][Nom]</div>
                      </div>
                      <div><br>
                      </div>
                      <div>And there are a lot of these... </div>
                      <div><br>
                      </div>
                      <div>And the other one:</div>
                      <div><br>
                      </div>
                      <div>
                        <div>A<span style="white-space:pre-wrap"> </span>a<span
                            style="white-space:pre-wrap"> </span>T<span
                            style="white-space:pre-wrap"> </span>SubPOS=f<span
                            style="white-space:pre-wrap"> </span>a[/Det|art.Def]</div>
                        <div>két<span style="white-space:pre-wrap"> </span>két<span
                            style="white-space:pre-wrap"> </span>M<span
                            style="white-space:pre-wrap"> </span>SubPOS=c|Num=s|Cas=n|Form=l|NumP=none|PerP=none|NumPd=none<span
                            style="white-space:pre-wrap"> </span>két[/Num|Attr][Nom]</div>
                        <div>óra<span style="white-space:pre-wrap"> </span>óra<span
                            style="white-space:pre-wrap"> </span>N<span
                            style="white-space:pre-wrap"> </span>SubPOS=c|Num=s|Cas=n|NumP=none|PerP=none|NumPd=none<span
                            style="white-space:pre-wrap"> </span>óra[/N][Nom]</div>
                        <div>közti<span style="white-space:pre-wrap"> </span>közti<span
                            style="white-space:pre-wrap"> </span>A<span
                            style="white-space:pre-wrap"> </span>SubPOS=f|Deg=p|Num=s|Cas=n|NumP=none|PerP=none|NumPd=none<span
                            style="white-space:pre-wrap"> </span>közti[/Adj][Nom]</div>
                        <div>szüntet<span style="white-space:pre-wrap">
                          </span>szüntet<span
                            style="white-space:pre-wrap"> </span>Z<span
                            style="white-space:pre-wrap"> </span>_<span
                            style="white-space:pre-wrap"> </span>[szüntet[/V][Prs.NDef.3Sg]]</div>
                      </div>
                      <div><br>
                      </div>
                      <div>I get it, but then could someone tell me
                        what the format is: does the tag start at the first [/
                        or at the first [? And if the former, what should
                        PurePOS do with the "lemma-initial [" above?</div>
                      <div><br>
                      </div>
                      <div>Right now the whole of  [szüntet[/V][Prs.NDef.3Sg]]
                        is taken as the tag, so training is painfully
                        slow (instead of the previous 1026 unique tags there
                        are now 2408, counting the junk), and it is not even
                        certain that this gives the expected result...</div>
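                      <div><br>
                        The difference between the two readings can be
                        sketched in a few lines of Python. This is only an
                        illustration: split_at is a made-up helper and
                        says nothing about how PurePOS actually parses
                        its input.
                        <pre>
```python
def split_at(analysis, marker):
    """Split a lemma+tag string at the first occurrence of marker."""
    index = analysis.find(marker)
    if index == -1:
        # No marker at all: treat the whole string as the lemma.
        return analysis, ""
    return analysis[:index], analysis[index:]


if __name__ == "__main__":
    for analysis in ("közti[/Adj][Nom]", "[szüntet[/V][Prs.NDef.3Sg]]"):
        # First rule: tag starts at the first "[".
        # Second rule: tag starts at the first "[/".
        print(split_at(analysis, "["), split_at(analysis, "[/"))
```
                        </pre>
                        For közti[/Adj][Nom] both rules agree. For the
                        bracketed line, splitting at the first [ leaves an
                        empty lemma and the whole string as the tag, while
                        splitting at the first [/ leaves a stray [ on the
                        lemma and a stray ] at the end of the tag, so
                        neither rule alone gives a clean result.
                      </div>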
                      <div><br>
                      </div>
                      <div><br>
                      </div>
                      <div>Balázs</div>
                      <div><br>
                      </div>
                    </div>
                    <div class="gmail_extra"><br>
                      <div class="gmail_quote">On 28 July 2016 at 18:37,
                        Indig Balázs wrote, <span dir="ltr"><<a
                            moz-do-not-send="true"
                            href="mailto:indig.balazs@itk.ppke.hu"
                            target="_blank">indig.balazs@itk.ppke.hu</a>></span>:<br>
                        <blockquote class="gmail_quote" style="margin:0
                          0 0 .8ex;border-left:1px #ccc
                          solid;padding-left:1ex">
                          <div dir="ltr">Hi all!<br>
                            <br>
                            <div>The PurePOS model for the new, corrected Szeged Korpusz
                              is also available here:</div>
                            <div><br>
                            </div>
                            <div><a moz-do-not-send="true"
                                href="http://pi.itk.ppke.hu/%7Edlazesz/infra/"
                                target="_blank">http://pi.itk.ppke.hu/~dlazesz/infra/</a><br>
                            </div>
                            <div><br>
                            </div>
                            <div><br>
                            </div>
                            <div><br>
                            </div>
                            <div>Regards,</div>
                            <div><br>
                            </div>
                            <div>Balázs</div>
                          </div>
                          <div>
                            <div>
                              <div class="gmail_extra"><br>
                                <div class="gmail_quote">On 28 July
                                  2016 at 15:54, Veronika Vincze wrote, <span
                                    dir="ltr"><<a
                                      moz-do-not-send="true"
                                      href="mailto:vinczev@inf.u-szeged.hu"
                                      target="_blank">vinczev@inf.u-szeged.hu</a>></span>:<br>
                                  <blockquote class="gmail_quote"
                                    style="margin:0 0 0
                                    .8ex;border-left:1px #ccc
                                    solid;padding-left:1ex">
                                    <div bgcolor="#FFFFFF"
                                      text="#000000">
                                      <p>Hi all,</p>
                                      <p>We fixed a few bugs in the
                                        conversion; in principle every
                                        line is now correctly formatted. The
                                        full Szeged Korpusz material
                                        is available at<br>
                                      </p>
                                      <a moz-do-not-send="true"
                                        href="http://www.inf.u-szeged.hu/%7Evinczev/infra/konvertalt_morf/"
                                        target="_blank">http://www.inf.u-szeged.hu/~vinczev/infra/konvertalt_morf/</a><br>
                                      <br>
                                      Please let us know if you find
                                      any problems.<br>
                                      <br>
                                      @Attila: sometimes, for derived words,
                                      the / is missing before the part of speech (e.g.
                                      Dél-dunántúli   
                                      Dél-dunántúli[Adj][Nom]); in
                                      principle we fixed this during
                                      conversion, but it would be useful to
                                      correct it in the analyzer as well.<br>
                                      <br>
                                      Regards,<br>
                                      Vera
                                      <div>
                                        <div><br>
                                          <br>
                                          <div>On 2016.07.28. 13:46,
                                            Indig Balázs wrote:<br>
                                          </div>
                                          <blockquote type="cite">
                                            <div dir="ltr">Dear
                                              all!
                                              <div><br>
                                              </div>
                                              <div>PurePOS models and a
                                                script that converts
                                                from the Szeged Korpusz
                                                format to the PurePOS
                                                input format:</div>
                                              <div><br>
                                              </div>
                                              <div><a
                                                  moz-do-not-send="true"
href="http://pi.itk.ppke.hu/%7Edlazesz/infra/" target="_blank">http://pi.itk.ppke.hu/~dlazesz/infra/</a><br>
                                              </div>
                                              <div><br>
                                              </div>
                                              <div>The Szeged Korpusz
                                                format is a bit odd.
                                                In some places there are
                                                not just 5 fields, and in
                                                many places it is fairly
                                                hard to work out what
                                                the author intended
                                                to go into
                                                PurePOS. The conversion
                                                script is full of comments
                                                on this.</div>
                                              <div><br>
                                              </div>
                                              <div>As for the "morphology
                                                inside PurePOS":</div>
                                              <div><br>
                                              </div>
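                                              <div>This is what an input
                                                line selectively annotated
                                                with morphology looks like
                                                (the Hungarian sentence is itself the example): Így néz ki egy
                                                morfológiával szelektíven
                                                annotált{{annotál[\V]||annotáció[\N]}}
                                                input  sor .</div>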
                                              <div><br>
                                              </div>
                                              <div>This will still
                                                evolve. This is what we are
                                                running with this week.</div>
                                              <div><br>
                                              </div>
                                              <div><br>
                                              </div>
                                              <div>Regards,</div>
                                              <div><br>
                                              </div>
                                              <div>Balázs</div>
                                              <div><br>
                                              </div>
                                            </div>
                                            <div class="gmail_extra"><br>
                                              <div class="gmail_quote">On
                                                25 July 2016 at 13:52,
                                                Veronika Vincze wrote, <span
                                                  dir="ltr"><<a
                                                    moz-do-not-send="true"
href="mailto:vinczev@inf.u-szeged.hu" target="_blank">vinczev@inf.u-szeged.hu</a>></span>:<br>
                                                <blockquote
                                                  class="gmail_quote"
                                                  style="margin:0 0 0
                                                  .8ex;border-left:1px
                                                  #ccc
                                                  solid;padding-left:1ex">Dear
                                                  all,<span><br>
                                                    <br>
                                                    <br>
                                                    On 2016.07.25.
                                                    12:12, Sass Bálint
                                                    wrote:<br>
                                                    <blockquote
                                                      class="gmail_quote"
                                                      style="margin:0 0
                                                      0
                                                      .8ex;border-left:1px
                                                      #ccc
                                                      solid;padding-left:1ex">
                                                      <br>
                                                      2.<br>
                                                      Vera, could you write
                                                      a few words about the
                                                      columns of the file,<br>
                                                      and which ones we
                                                      actually need
                                                      now?<br>
                                                      <br>
                                                      I had a look, and I
                                                      think it is this:<br>
                                                      column 1 =
                                                      word form<br>
                                                      column 5 =
                                                      lemma + part of speech +
                                                      analysis in the new coding<br>
                                                      <br>
                                                      These two are what
                                                      we need for
                                                      training now, right,<br>
                                                      and columns 2-3-4
                                                      are old and thus
                                                      to be
                                                      ignored?<br>
                                                      <br>
                                                      So the lemma,
                                                      part of speech
                                                      and analysis
                                                      in column 5<br>
                                                      should be used as
                                                      lemma, pos and
                                                      feature respectively<br>
                                                      for
                                                      training,
                                                      right? :)<br>
                                                    </blockquote>
                                                  </span> That is right,
                                                  columns 1 and 5 are the
                                                  important ones; we just
                                                  left the rest in when
                                                  writing the output.<span><br>
                                                    <blockquote
                                                      class="gmail_quote"
                                                      style="margin:0 0
                                                      0
                                                      .8ex;border-left:1px
                                                      #ccc
                                                      solid;padding-left:1ex">
                                                      3.<br>
                                                      Given that by the
                                                      end of the week
                                                      everything should be
                                                      running<br>
                                                      in some
                                                      form behind the
                                                      website, I ask
                                                      that<br>
                                                      everyone (those
                                                      mentioned below)<br>
                                                      train a
                                                      model on this
                                                      subcorpus,
                                                      so that we have
                                                      something,<br>
                                                      and the models
                                                      trained on the
                                                      full corpus
                                                      can follow later.<br>
                                                    </blockquote>
                                                  </span> We will also
                                                  start training syntax
                                                  (and we are converting
                                                  the remaining subcorpora
                                                  to the infra
                                                  morphology); this
                                                  will probably take us
                                                  a few days.<br>
                                                  <br>
                                                  Regards,<br>
                                                  Vera
                                                  <div>
                                                    <div><br>
                                                      <br>
                                                      <br>
_______________________________________________<br>
                                                      nlp-infra-devel
                                                      mailing list<br>
                                                      <a
                                                        moz-do-not-send="true"
href="mailto:nlp-infra-devel@nytud.mta.hu" target="_blank">nlp-infra-devel@nytud.mta.hu</a><br>
                                                      <a
                                                        moz-do-not-send="true"
href="http://corpus.nytud.hu/dltlist/listinfo/nlp-infra-devel"
                                                        rel="noreferrer"
                                                        target="_blank">http://corpus.nytud.hu/dltlist/listinfo/nlp-infra-devel</a><br>
                                                    </div>
                                                  </div>
                                                </blockquote>
                                              </div>
                                              <br>
                                            </div>
                                            <br>
                                            <fieldset></fieldset>
                                            <br>
                                          </blockquote>
                                          <br>
                                        </div>
                                      </div>
                                    </div>
                                    <br>
                                    <br>
                                  </blockquote>
                                </div>
                                <br>
                              </div>
                            </div>
                          </div>
                        </blockquote>
                      </div>
                      <br>
                    </div>
                    <br>
                    <fieldset></fieldset>
                    <br>
                  </blockquote>
                  <br>
                </div>
              </div>
            </div>
            <br>
            <br>
          </blockquote>
        </div>
        <br>
      </div>
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br>
    </blockquote>
    <br>
  </body>
</html>