Hacker News

Thanks for the explanation of how your cache works. Will you be at ACL this year?

The memoisation I refer to is called here:

https://github.com/syllog1sm/redshift/blob/segmentation/reds...

What happens is, I extract the set of token indices for S0, N0, S0h, S0h2, etc., into a struct, SlotTokens. SlotTokens is sufficient to extract the features, so I can use its hash to memoise an array of class scores. Cache utilisation is about 30-40% even at k=8.
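A minimal sketch of that memoisation idea in plain Python (not the redshift code; SlotTokens and the scorer interface here are hypothetical stand-ins): the struct of slot indices determines the features, so its hash can key a cache of score arrays.

```python
class SlotTokens:
    """Token indices for the parser slots (S0, N0, S0h, S0h2, ...).

    Hypothetical stand-in for the C struct: equal slot indices mean
    identical features, so hash/eq are defined on the index tuple.
    """
    def __init__(self, s0, n0, s0h, s0h2):
        self.slots = (s0, n0, s0h, s0h2)

    def __hash__(self):
        return hash(self.slots)

    def __eq__(self, other):
        return self.slots == other.slots


score_cache = {}

def get_scores(tokens, score_classes):
    # The slot indices fully determine the extracted features, so we
    # can skip feature extraction + scoring entirely on a cache hit.
    key = hash(tokens)
    if key not in score_cache:
        score_cache[key] = score_classes(tokens)
    return score_cache[key]
```

With beam search, distinct beam candidates often share the same slot configuration, which is why even a modest hit rate pays for the hashing.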

While I'm here...

https://github.com/syllog1sm/redshift/blob/segmentation/reds...

The big enum names all of the atomic feature values that I extract; extraction writes those values into an array, context. So context[S0w] contains the word of the token on top of the stack.

I then list the actual features as tuples, referring to those values. So I can write a group of features with something like new_features = ((S0w, S0p), (S0w,), (S0p,)). That would add three feature templates: one with the word plus the POS tag, one with just the word, and one with just the POS tag.
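The scheme above can be sketched in a few lines of Python (enum members and the context contents are illustrative, not the real feature set): an enum names positions in the context array, and each template is just a tuple of those positions.

```python
from enum import IntEnum

class F(IntEnum):
    # Hypothetical subset of the atomic-value enum.
    S0w = 0   # word on top of the stack
    S0p = 1   # its POS tag
    N0w = 2   # word of the first buffer token

# Feature templates: tuples of context indices.
new_features = ((F.S0w, F.S0p), (F.S0w,), (F.S0p,))

def extract(context, templates):
    # Each template yields one feature: the tuple of its atomic values.
    return [tuple(context[i] for i in t) for t in templates]

context = ["bank", "NN", "of"]
print(extract(context, new_features))
# → [('bank', 'NN'), ('bank',), ('NN',)]
```

The appeal is that adding a feature group is one line of declarative tuples, with no per-feature extraction code to write.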

A bit of machinery in features.pyx then takes those Python feature definitions and compiles them into a form that can be used more efficiently.
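One plausible shape for that compilation step (a sketch under my own assumptions, not what features.pyx actually does): flatten the nested tuples into contiguous index arrays, so extraction becomes a single pass that hashes each template's values into one feature key.

```python
import array

def compile_templates(templates):
    # Flatten ((i, j), (i,), ...) into a lengths array and one flat
    # index array, the sort of layout a tight C loop can walk.
    lengths = array.array('i', [len(t) for t in templates])
    indices = array.array('i', [i for t in templates for i in t])
    return lengths, indices

def extract_compiled(context, lengths, indices):
    feats = []
    pos = 0
    for n in lengths:
        # Hash the (indices, values) run into a single feature key,
        # so templates with the same values stay distinct features.
        feats.append(hash(tuple(indices[pos + j] for j in range(n)) +
                          tuple(context[indices[pos + j]] for j in range(n))))
        pos += n
    return feats
```

In Cython the same layout lets the inner loop run over C int arrays with no Python objects per feature.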




I won't be at ACL this year, maybe next. My advisor will be there though (Reut Tsarfaty).





