for some reason the query below is allowing duplicate names. Why is that?
SELECT id, name_without_variants, SUM(relevance) as total_relevance FROM (
SELECT
card_definitions.id,
card_definitions.name_without_variants,
(MATCH(card_definitions.name_without_variants) AGAINST ('lost soul site discard')) * 0.40 AS relevance
FROM card_definitions
GROUP BY name_without_variants, id
UNION
SELECT
card_definitions.id,
card_definitions.name_without_variants,
(MATCH(card_def_identities.special_ability_text) AGAINST ('lost soul site discard')) * 0.05 AS relevance
FROM card_def_identities
INNER JOIN card_definitions ON card_def_identities.card_def_sid = card_definitions.id
GROUP BY name_without_variants, id
UNION
SELECT
card_definitions.id,
card_definitions.name_without_variants,
(MATCH(brigades.brigade_color) AGAINST ('lost soul site discard')) * 0.30 AS relevance
FROM brigades
INNER JOIN card_def_brigades ON brigades.id = card_def_brigades.brigade_sid
INNER JOIN card_definitions ON card_def_brigades.card_def_sid = card_definitions.id
GROUP BY name_without_variants, id
UNION
SELECT
card_definitions.id,
card_definitions.name_without_variants,
(MATCH(identifiers.identifier) AGAINST ('lost soul site discard')) * 0.20 AS relevance
FROM identifiers
INNER JOIN card_def_identifiers ON identifiers.id = card_def_identifiers.identifier_sid
INNER JOIN card_definitions on card_def_identifiers.card_def_sid = card_definitions.id
GROUP BY name_without_variants, id
UNION
SELECT
card_definitions.id,
card_definitions.name_without_variants,
(MATCH(card_effects.effect) AGAINST ('lost soul site discard')) * 0.05 AS relevance
FROM card_effects
INNER JOIN card_def_effects ON card_effects.id = card_def_effects.effect_sid
INNER JOIN card_definitions on card_def_effects.card_def_sid = card_definitions.id
GROUP BY name_without_variants, id
) AS combined_search
GROUP BY name_without_variants, id
HAVING total_relevance > 0
ORDER BY total_relevance DESC
LIMIT 10;
Here's the result i'm getting. Notice the two Lost Soul [Site Doubler]
2623 Lost Soul [Deck Discard] 6.35151714086533
1410 Lost Soul [Hand Discard] 6.29273346662521
1495 Lost Soul [Discard Card] 5.93360201716423
1442 Lost Soul [Demon Discard] 5.91308708190918
1497 Lost Soul [Site Doubler] 5.05888686180115
1498 Lost Soul [Site Doubler] 5.05888686180115
2572 Lost Soul [Site Guard] 4.82421946525574
2774 Lost Soul [Far Country] 3.39325473308563
2891 Fortify Site [RoA2] 2.77084048986435
1418 Lost Soul [Hopper] 2.63041100502014
Because the IDs are different and you are grouping by ID, you get multiple rows for each, that's what GROUP BY
does. If you change your top level SELECT
to
SELECT name_without_variants, SUM(relevance) as total_relevance
and the outer GROUP BY to:
GROUP BY name_without_variants
you should see distinct names, but will no longer have the id.
GROUP BY name_without_variants, id
You're grouping by name_without_variants, id. The id's differ on the two records:
1497 Lost Soul [Site Doubler] 5.05888686180115
1498 Lost Soul [Site Doubler] 5.05888686180115
You need to decide how you will manage the id.
Remove the id from group by and add an aggregate function to the id columnn in your select. Or just remove the column all together.
Here's an example which has been reduced to single query. Please understand I don't have the full perspective on your schema or data, nor has this been tested. I'm also making some assumptions here. However, if the schema is relational, this should bring back what you're looking for:
SELECT cd.id, cd.name_without_variants, (((MATCH(cd.name_without_variants) AGAINST ('lost soul site discard')) * 0.40)+
((MATCH(cdi.special_ability_text) AGAINST ('lost soul site discard')) * 0.05)+
((MATCH(b.brigade_color) AGAINST ('lost soul site discard')) * 0.30)+
((MATCH(i.identifier) AGAINST ('lost soul site discard')) * 0.20)+
((MATCH(ce.effect) AGAINST ('lost soul site discard')) * 0.05)
) as total_relevance
FROM card_definitions cd
LEFT OUTER JOIN card_def_identities cdi ON cd.id=cdi.card_def_sid
LEFT OUTER JOIN brigades b ON cd.id=b.card_def_sid
LEFT OUTER JOIN identifiers i ON i.id=cdi.identifier_sid
LEFT OUTER JOIN card_def_effects cde ON cde.card_def_sid=cd.id
LEFT OUTER JOIN card_effects ce ON ce.id=cde.effect_sid
GROUP BY cd.id, cd.name_without_variants
HAVING total_relevance > 0
ORDER BY total_relevance DESC
LIMIT 10;