A quick post because this seems to be a new development that I hadn't seen before.
I was just checking Google Scholar for any recent behavioural science research to put in the EA Behavioural Science Newsletter and noticed that it now returns posts from the EA Forum. See below:
I don't know why it doesn't retrieve the right titles. This may be something for the EA Forum team to look into. I don't know whether all posts are being indexed or whether some criteria are being used to filter them. I also don't understand why this post is the top result when searching for "effective altruism".
Not sure what the implications are, but wanted to make people aware. From my perspective, Google Scholar listing posts from the EA Forum seems weakly positive. It probably means that more scientists and researchers will read and cite posts. If we can identify the criteria for being indexed by Google Scholar then maybe we should try to ensure that good posts meet those criteria?
I have a lot of experience in search engine optimization (SEO), so I can venture an explanation for what is most likely going on here:
Google's algorithm can recognize when an article is scholarly in nature, and it might then add the article to Google Scholar results when there are no better matches for the term you are searching for.
However, if the titles and other details are not manually added to Google Scholar, they are automatically extracted (or inferred) from the source code of the page. If the page is not fully optimized for academic search, the extracted data, such as the title, may be displayed incorrectly.
For the correct titles to show, one of two things needs to be done:
1. Article is added manually to Google Scholar (During manual submission to Google Scholar, you will be asked to specify the title, author and other metadata which are then used when displaying or listing it in search results.)
2. Academic-related meta tags are added to EA Forum articles. This can only be implemented by the site devs at the code level, so that each article created automatically contains the following meta tags in its head section:
citation_title: To specify the article title to use for Google Scholar and similar websites. This can be set to the article's correct title from the main <title> tag of the HTML head section. The tags below can be filled in from the page's existing data in the same way.
citation_author: To specify the author(s)
citation_publication_date: To specify the Publication date
There are a number of other tags and even other standards and formats.
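To make the tag list above concrete, here is a minimal sketch of how a site could generate these tags for each post. It is only an illustration, not how the EA Forum actually does it: the function name `citation_meta_tags` and the sample post data are hypothetical, and it assumes one `citation_author` tag per author and a `YYYY/MM/DD` date, which is my understanding of Google Scholar's inclusion guidelines.

```python
from datetime import date
from html import escape


def citation_meta_tags(title, authors, published):
    """Build citation_* <meta> tags for the <head> of one article page.

    Hypothetical helper for illustration only.
    """
    tags = [("citation_title", title)]
    # Assumption: one citation_author tag per author, in byline order.
    tags += [("citation_author", name) for name in authors]
    # Assumption: dates are written as YYYY/MM/DD.
    tags.append(("citation_publication_date", published.strftime("%Y/%m/%d")))
    # Escape values so quotes or angle brackets in a title cannot break the HTML.
    return "\n".join(
        f'<meta name="{name}" content="{escape(value, quote=True)}">'
        for name, value in tags
    )


# Made-up post data, just to show the shape of the output.
print(citation_meta_tags(
    "An example post title",
    ["First Author", "Second Author"],
    date(2021, 3, 1),
))
```

The author value would simply be whatever name the post is published under, so pseudonymous posts would be listed under the pseudonym.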
You can learn more about all this stuff from the below links (it's sometimes called Academic Search Engine Optimization or ASEO):
On why certain articles come up high in the Google Scholar results even when they don't seem very relevant to the search:
A huge factor for ranking articles on Google Scholar is the number of citations the article has. Google counts links to an article as citations, so my guess would be that the top article in your search probably has more links to it from other websites. There is much more to this (such as the quality of linking sites, search term density in the article and lots more), but I'm just trying to greatly simplify my explanation.
Tip: To see all the EA Forum results currently indexed on Google Scholar, search with this term on Google Scholar (without the quotes).
Right now it shows that there are only about 48 links from the EA Forum indexed on Google Scholar.
Aside: I wanted to upload a screenshot to this post but I can't for the life of me figure out how to upload images to a post here. I feel it's painfully obvious but I just can't figure out how. The embed settings do not include an image embed icon. How does one add an image to an article on the EA Forum? How did you add the screenshot in your original post? Pray, please tell...
Thanks for the plausible explanation!
Re: adding images to your post, I literally just copy and paste. But you could also read a longer post on how to enable advanced editing features such as tables and images.
Hey thanks a lot for the image tip and the link. It solved my problem.
Just in case anyone else is wondering how to include images in their posts, you can just drag and drop the image from your device into the post editor directly.
Thanks a bunch for this info! I'm thinking about adding this to the Forum. My concern is that most of our content isn't academic, and I don't want Google's algorithm to think that I'm trying to spam them. Since you seem knowledgeable, do you think I should enable this for all posts and let Google decide if they're legit, or have some annoying manual process that will probably miss many actual academic posts?
(An additional note: many of our users, present company included, do not use their real names on the Forum.)
It would be a great idea to enable it for all posts.
This forum has enough high-quality scholarly content to qualify as an academic resource of note.
There will not be any issue with the Google algorithm at all. In fact, Google actively encourages using this metadata for websites with content like the EA Forum has. The algorithm will be able to decide what is relevant to it and will simply ignore the rest.
With respect to the names, that also would not be a problem. The only consequence is that Google will store and display search results using the names the authors posted under. So anyone who would like their article to be associated with their real name will need to use their real name on their post.
There will not be any other impact.
Also: Just in case someone prefers not to have their forum post indexed by Google Scholar, there are ways to remove indexed entries. But I have a feeling hardly anyone would opt for that.
Thanks a bunch! This is really helpful.
Do you have a citation or epistemic status for this? I'm happy to deploy the PR as-is on the basis of your recommendation, but I'd be even happier with more knowledge of how confident to be.
My statement was based on my experience and observations over the years as a practitioner in the field. SEO (which this falls under) is a core part of my regular day job and I consider it to be one of my strongest skills. I have 10+ years of experience in the field and I have worked (and continue to work) on stuff like this on an almost daily basis. Because of this, I'm confident that my opinion is very likely correct (>90% confident).
Unfortunately I cannot point you to a specific citation or actually tell you with 100% confidence what exactly the Google algorithm will do or how exactly it works (only Google actually knows). Google is very secretive about how their algorithm works and most of the time SEOs can only offer (educated) guesses on how the algorithm probably works (guesses being based on past observations and measurements).
The Google algorithm is now very sophisticated, and being able to filter signals while ignoring "noise" appears to be one of its strengths and its modus operandi (this opinion is based on my anecdotal and measured observations over the years, plus learnings from others in the field).
One more thing: It might take between 1 and 3 months after the changes are made for more forum posts to start appearing in the Scholar results. There is also some chance that they could appear sooner, though.
I think the explanation for this happening is pretty simple. People writing academic articles (me included) have cited EA Forum articles... thus, Google is finding them.
For better or worse, I am pretty sure there is no(t yet a) systematic attempt to integrate the EA Forum in the scholarly debate...
This actually may affect whether Wikipedia policies will count certain EA Forum articles as "reliable sources" (in the Wikipedia sense). I'll be taking this back to WikiProject: Effective Altruism to see whether this may allow us to cite specific EA Forum posts for claims where no secondary source is yet available.
I looked at my feed and didn't see any EA Forum pages listed, but that doesn't mean it's not a serious concern. Search engines are very much influenced by observation.
This is evidence that we should worry about covert attacks on EA from unknown origins. It's absurd and insane and clearly not worth the risk to ignore those threats.
On covert attack threats: This should be a very serious concern and is one of the main reasons why I suggested this project some weeks back (Note: See item #2 in this comment detailing the reasons why such a project is needed), but not many respondents seemed to think it important enough to support or comment on.
Thanks for pointing this out, Peter. As I understand it, you found this by searching for "effective altruism" and then sorting by date, not relevance.
I did not see any results for "less wrong"
But I did see similar results to your observation for "alignment forum"
"Less wrong" is a broad term that could be read in various contexts outside of EA, so there are likely other webpages which Google sees as more relevant for this term (more relevant than lesswrong.com).
It seems you are trying to get results from lesswrong.com that are indexed in Google Scholar. If that is correct, I would suggest that you use the following search: site:lesswrong.com
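If you want to run the same kind of check for several sites, a small sketch like the one below builds the search link for you. The helper name `scholar_site_search_url` is made up for this example, and the `https://scholar.google.com/scholar?q=...` address is an assumption based on what the Scholar search page shows in the browser, not a documented API.

```python
from urllib.parse import urlencode


def scholar_site_search_url(domain, extra_terms=""):
    """Return a Google Scholar search link restricted to one domain.

    Hypothetical helper; the base URL is an assumption taken from the
    address the Scholar search page shows in the browser.
    """
    # The site: operator limits results to pages hosted on the given domain.
    query = f"site:{domain} {extra_terms}".strip()
    return "https://scholar.google.com/scholar?" + urlencode({"q": query})


print(scholar_site_search_url("lesswrong.com"))
```

Open the printed link in a browser to see the results; the optional second argument lets you add ordinary search words after the site: restriction.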
This is surprising, almost surreal, with a "we are in the simulation" sort of vibe.
I've never seen a website indexed like this in Google Scholar. Has anyone else?
Posts from the Alignment Forum have been visible there for some time.