[Photon] Next steps

Tue Jun 17 16:05:06 UTC 2014

Great news! I am currently working on better test coverage, in particular for Germany. I am also using a new query based on cross_fields, which looks very promising. In fact it behaves very much like the collector approach, but with some important improvements:

# The index shrinks by 40% or so because the default values are no longer copied to the language-specific collectors (avoiding redundancy). Worldwide data for English and German occupies less than 32 GB; this approach could open the door to the full multilingual support needed for osm.org.

I can also import global data in only 2 hours.

# You can calibrate the weights nicely by tuning the fields individually: https://github.com/christophlingg/photon/blob/master/website/photon/app.py#L29

# It solves a lot of problems, for example one-digit house numbers. Raw and edge n-gram tokens are queried simultaneously while still having different boosts, that’s great!
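
To illustrate the points above, here is a minimal sketch of what a cross_fields query body with per-field boosts could look like, built as a plain Python dict. The field names and boost values are assumptions made up for this example; they are not the ones used in app.py.

```python
import json

# Hypothetical field names and boosts, for illustration only.
# "raw" stands for the untouched token field, "ngram" for the edge n-gram one.
FIELD_BOOSTS = {
    "name.raw": 3.0,
    "name.ngram": 1.0,
    "housenumber.raw": 2.0,
    "housenumber.ngram": 0.5,
}


def build_cross_fields_query(text, field_boosts, operator="and"):
    """Return a search body using multi_match with type cross_fields.

    Each field gets its own boost through the "field^boost" syntax, so raw
    and n-gram fields are queried together but weighted differently.
    """
    fields = [f"{name}^{boost:g}" for name, boost in sorted(field_boosts.items())]
    return {
        "query": {
            "multi_match": {
                "query": text,
                "type": "cross_fields",
                "fields": fields,
                "operator": operator,
            }
        }
    }


if __name__ == "__main__":
    body = build_cross_fields_query("hauptstrasse 5 berlin", FIELD_BOOSTS)
    print(json.dumps(body, indent=2))
```

Tuning then comes down to changing the numbers in FIELD_BOOSTS and re-running the search tests.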

The downside is apparently that fuzzy is not working; I haven’t tried it yet. Are you sure about this, Yohan?

I have also extended the test framework a little bit, but it’s still in progress.

I will dig into it further tomorrow, but I did not expect that you would have something ready by the end of this week already. Do you have some time tomorrow for a chat? I think it makes sense to exchange our findings and coordinate.

Cheers,
Christoph

On 17.06.2014 at 17:22, Yohan Boniface <ybon at enix.org> wrote:

> Next sprint in Paris this Thursday evening :)
> We will focus on finishing the "positive scoring" branch. I'm confident we can reach a stable state.
> I think I will dedicate some time on Thursday to prepare everything, and on Friday to finish.
> Christoph, if you have some time Thursday or Friday, we can do a pair session so I can explain to you everything we have done so far on the search logic.
> 
> Yohan
> 
> On 06/09/2014 05:33 PM, Christoph Lingg [mobil] wrote:
>> I'll have a look at the plugin. For testing purposes I disabled tf/idf by changing the scoring method to replace (instead of multiply). Results were good, but maybe it is not because of that; anyway, I need a closer look at it.
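
A minimal sketch of the "replace instead of multiply" idea, assuming it refers to the boost_mode of a function_score query. With boost_mode set to "replace", the text relevance score (and with it tf/idf) is discarded and only the function value is used. The "importance" field and the helper name are hypothetical and only serve the example.

```python
def with_replaced_score(inner_query, field="importance"):
    """Wrap a query so that its own score is replaced by a field value.

    boost_mode "multiply" (the default) would combine the query score with
    the function value; "replace" ignores the query score entirely.
    The field name is an assumption made for this sketch.
    """
    return {
        "query": {
            "function_score": {
                "query": inner_query,
                "functions": [{"field_value_factor": {"field": field}}],
                "boost_mode": "replace",
            }
        }
    }


if __name__ == "__main__":
    print(with_replaced_score({"match": {"name": "berlin"}}))
```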
>> 
>> I will work on a fuzzy-less version next week then and add German test cases step by step. It would be helpful if I could get your advice when I need some help. Maybe that's possible?
>> 
>> If it is tricky and complex to combine fuzzy and fuzzy-less results in one ES query, maybe it is worth trying an alternative approach that came to my mind: we can first query without fuzzy, then with fuzzy, and merge both into a final result list, with non-fuzzy results in first, third, fifth place and fuzzy results in second, fourth, ... I'll give it a try. The same could be done with location bias.
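
The merge described above could be sketched roughly as follows. This is only an illustration: it assumes each hit is a dict with an "id" key (a made-up shape, not the real photon result format), and it drops duplicates, since a fuzzy query will usually return the exact matches as well.

```python
from itertools import zip_longest

_MISSING = object()  # marks the end of the shorter result list


def interleave_results(exact_hits, fuzzy_hits, key=lambda hit: hit["id"]):
    """Alternate non-fuzzy and fuzzy hits, non-fuzzy first.

    A hit that was already emitted is skipped, so the strict
    first/third/fifth pattern only holds when both lists are disjoint.
    """
    merged = []
    seen = set()
    for exact, fuzzy in zip_longest(exact_hits, fuzzy_hits, fillvalue=_MISSING):
        for hit in (exact, fuzzy):
            if hit is _MISSING:
                continue
            hit_key = key(hit)
            if hit_key in seen:
                continue
            seen.add(hit_key)
            merged.append(hit)
    return merged


if __name__ == "__main__":
    exact = [{"id": 1}, {"id": 2}, {"id": 3}]
    fuzzy = [{"id": 2}, {"id": 4}]
    print(interleave_results(exact, fuzzy))
```

The same function would work for location bias by passing the biased and unbiased result lists instead. The price is two round trips to ES per search.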
>> 
>> I would like to join you that weekend and it would definitely make sense, but exactly this weekend is difficult for me ;-(
>> 
>> Yohan Boniface <ybon at enix.org> wrote on 09.06.2014:
>>> On 06/09/2014 03:15 PM, Christoph Lingg [mobil] wrote:
>>>> #2 search logic
>>>> This is a bit of a black box for me, as I am not aware of the
>>>> current state of Yohan's work. Can you give an update and an outlook?
>>>> How can one support you? I thought we might do a small sprint (1-2
>>>> days) so we can work on it together. Some komoot background: at komoot
>>>> we still use a very old version of photon (still Solr) and have some
>>>> severe bugs (even some big cities cannot be found). I am keen to work on
>>>> a search logic that resolves those bugs (a basic version, maybe without
>>>> fuzzy, is already satisfying as a first step). So I can easily
>>>> justify to komoot spending some days on this.
>>> 
>>> I'm in two weeks of Mozilla sprint, so no time at all until next
>>> week-end for me.
>>> Code side, the summary is that
>>> https://github.com/komoot/photon/tree/positivescoring is the best
>>> search logic we have had.
>>> But custom similarity is still not working.
>>> And I think we will want it in any case.
>>> So one thing that you can do to help is to make the plugin work
>>> (https://github.com/yohanboniface/elasticsearch-photon-similarity),
>>> because we will need to take control of TF/IDF (basically, we don't
>>> want it, I think).
>>> The other thing is to add many search tests for Germany.
>>> If there is hurry in the air, you can still remove the fuzzy part from
>>> any of the branches, and the good results will magically grow ;) But
>>> fuzzy is the current challenge ;)
>>> I'm not sure I will work on photon next week-end, but for sure next
>>> week.
>>> I'm planning a new sprint on photon in Paris on June 28/29th, at the
>>> Mozilla office, could you join maybe?
>>