Playwright: My first steps with the browser automation tool
At my previous company, I developed a batch job that tracked metrics across social media, such as Twitter, LinkedIn, Mastodon, Bluesky, Reddit, etc. Then I realized that I could replicate it for my “personal brand”. The problem is that some media don't provide an HTTP API for the metrics I want. Below are the metrics that I want on LinkedIn:
I searched for a long time but found no API access to the above metrics. I collected them manually every morning for a long time and finally decided to automate this tedious task. Here's what I learned.
Context
The batch job is written in Python, so I wanted to stay in the same tech stack. After a quick search, I found Playwright, a browser automation tool with APIs in several languages, including Python. Playwright's primary use case is end-to-end testing, but it can also drive the browser outside a testing context.
I use Poetry to manage dependencies. Installing Playwright is as easy as:
poetry add playwright
At this point, Playwright is ready to use. It offers two distinct APIs, one synchronous and one asynchronous. For my use case, the first flavor is more than enough.
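For comparison, here is a minimal sketch of what the asynchronous flavor looks like. It is not part of my script: the `fetch_title` helper is a hypothetical name used for illustration, and the only Playwright calls it relies on are `async_playwright`, `launch`, `new_page`, `goto`, and `title`.

```python
import asyncio


async def fetch_title(url: str) -> str:
    """Open `url` in headless Chromium and return the page title."""
    # Imported inside the function so that this module can still be imported
    # on a machine where Playwright is not installed (an illustrative choice).
    from playwright.async_api import async_playwright

    async with async_playwright() as pw:
        browser = await pw.chromium.launch()
        try:
            page = await browser.new_page()
            await page.goto(url)
            return await page.title()
        finally:
            # Close the browser whether or not navigation succeeded.
            await browser.close()
```

Calling it would look like `asyncio.run(fetch_title('https://example.com'))`, where the URL is only a placeholder. Every Playwright call has to be awaited, which is the main reason the synchronous API reads more simply for a small script.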
Getting my feet wet
I like to approach development incrementally.
Here's an excerpt of the API:
It translates into the following code:
from locale import atoi
from os import getenv
from typing import List

from playwright.sync_api import Browser, Locator, Page, sync_playwright

with sync_playwright() as pw:                                                              #1
    browser: Browser = pw.chromium.launch()                                                #2
    page: Page = browser.new_page()                                                        #3
    page.goto('https://www.linkedin.com/login')                                            #4
    page.locator('#username').press_sequentially(getenv('LINKEDIN_USERNAME'))              #5
    page.locator('#password').press_sequentially(getenv('LINKEDIN_PASSWORD'))              #5
    page.locator('button[type=submit]').press('Enter')                                     #6
    page.goto('https://www.linkedin.com/dashboard/')                                       #4
    metrics_container: Locator = page.locator('.pcd-analytic-view-items-container')
    metrics: List[Locator] = metrics_container.locator('p.text-body-large-bold').all()     #7
    impressions = atoi(metrics[0].inner_text())                                            #8
    # Get other metrics
    browser.close()                                                                        #9
1. Get a playwright object.
2. Launch a browser instance. Multiple browser types are available; I chose Chromium on a whim. Note that the browser must be installed beforehand, i.e., with playwright install --with-deps chromium. By default, the browser runs headless: no window shows up. I recommend running it visibly at the beginning for easier debugging, by passing headless=False to launch().
3. Open a new browser window.
4. Navigate to a new location.
5. Locate the specified input fields and fill them in with my credentials.
6. Locate the specified button and press it.
7. Locate all the specified elements.
8. Get the inner text of the first element.
9. Close the browser to clean up.
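One fragile spot in the snippet is the `atoi` call: whether it accepts a value such as `1,234` depends on the active locale. Below is a small locale-independent sketch. The `parse_metric` helper is hypothetical, and it assumes the dashboard displays full numbers with optional thousands separators (commas or spaces); abbreviated values such as `1.2K` are rejected rather than guessed.

```python
import re


def parse_metric(text: str) -> int:
    """Convert a displayed metric such as '1,234' or '1 234' to an int."""
    # Drop commas and any whitespace, including non-breaking spaces.
    cleaned = re.sub(r'[,\s]', '', text)
    # Anything left that is not a plain decimal number is treated as an error.
    if not cleaned.isdecimal():
        raise ValueError(f'Unexpected metric format: {text!r}')
    return int(cleaned)
```

With this helper, the conversion becomes `impressions = parse_metric(metrics[0].inner_text())`.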
Storing cookies
The above works as expected. The only downside is that I receive an email from LinkedIn every time I run the script:
Hi Nicolas,
You've successfully activated Remember me on a new device HeadlessChrome, in [...]. Learn more about how Remember me works on a device.
I also met Fabien Vauchelles at the JavaCro conference. He specializes in web scraping and told me that most people in this field rely on browser profiles. Indeed, when you log in to LinkedIn, you get an authentication token stored as cookies, and you don't need to authenticate again until it expires. Fortunately, Playwright offers such a feature with its launch_persistent_context method.
We can replace the above launch with the following:
import logging
from os import getenv
from pathlib import Path

from playwright.sync_api import BrowserContext, Locator, Page, sync_playwright

logger = logging.getLogger(__name__)

with sync_playwright() as pw:
    playwright_profile_dir = f'{Path.home()}/.social-metrics/playwright-profile'
    context: BrowserContext = pw.chromium.launch_persistent_context(playwright_profile_dir)  #1
    try:                                                                                      #2
        page: Page = context.new_page()                                                       #3
        page.goto('https://www.linkedin.com/dashboard/')                                      #4
        if 'session_redirect' in page.url:                                                    #4
            page.locator('#username').press_sequentially(getenv('LINKEDIN_USERNAME'))
            page.locator('#password').press_sequentially(getenv('LINKEDIN_PASSWORD'))
            page.locator('button[type=submit]').press('Enter')
            page.goto('https://www.linkedin.com/dashboard/')
        metrics_container: Locator = page.locator('.pcd-analytic-view-items-container')
        # Same as in the previous snippet
    except Exception as e:                                                                    #2
        logger.error(f'Could not fetch metrics: {e}')
    finally:                                                                                  #5
        context.close()
1. Playwright stores the profile in the specified folder and reuses it across runs.
2. Improve exception handling.
3. The BrowserContext can also open pages.
4. We try to navigate to the dashboard. LinkedIn redirects us to the login page if we are not authenticated; we can then authenticate.
5. Close the context, whatever the outcome.
At this point, we only need to authenticate with both credentials the first time. On subsequent runs, it depends.
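Since the credentials are now only needed occasionally, it is easy to forget to set them: `getenv` then returns `None`, and the failure surfaces later inside the browser call. A minimal sketch of failing early instead; the `require_env` helper is a hypothetical name, not something Playwright provides.

```python
from os import getenv


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or fail with a clear message."""
    value = getenv(name)
    # Treat both a missing and an empty variable as an error.
    if not value:
        raise RuntimeError(f'Environment variable {name} is not set')
    return value
```

The login lines would then read `press_sequentially(require_env('LINKEDIN_USERNAME'))`, and a missing variable is reported by name.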
Adapting to reality
I was surprised to see that the above code didn't work reliably. It worked on the first run and only sometimes on subsequent ones. Because I store the browser profile across runs, when I need to authenticate, LinkedIn only asks for the password, not the login! Since the code tries to enter the login, it fails in this case. The fix is pretty straightforward:
username_field = page.locator('#username')
if username_field.is_visible():
    username_field.press_sequentially(getenv('LINKEDIN_USERNAME'))
page.locator('#password').press_sequentially(getenv('LINKEDIN_PASSWORD'))
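One caveat with this check: `is_visible()` returns immediately and does not wait for the page to render, so it can report `False` simply because the form is not there yet. The sketch below is one possible way to guard against that, not what my script does. It assumes the password field is present in both variants of the login form, and `log_in` is a hypothetical helper name.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only needed by type checkers; the function itself works with any
    # object exposing the same locator() interface.
    from playwright.sync_api import Page


def log_in(page: 'Page', username: str, password: str) -> None:
    """Fill whichever login fields are shown, then submit the form."""
    password_field = page.locator('#password')
    # Wait for a field that both form variants contain before checking
    # the optional one, because is_visible() itself never waits.
    password_field.wait_for(state='visible')
    username_field = page.locator('#username')
    if username_field.is_visible():
        username_field.press_sequentially(username)
    password_field.press_sequentially(password)
    page.locator('button[type=submit]').press('Enter')
```

Taking the page as a parameter also makes the login logic easy to exercise with a stub object, without starting a browser.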
Conclusion
Though I'm no expert in Python, I managed to achieve what I wanted with Playwright. I preferred the synchronous API because it makes the code slightly easier to reason about, and I have no performance requirements. I only used the basic features of Playwright. Playwright also allows recording videos in the context of tests, which is very useful when a test fails during a CI pipeline run.
To go further:
Originally published at A Java Geek on January 19, 2024